Five things that caught my eye during ECCV 2024
Last week, I enjoyed attending ECCV 2024, which included exciting workshops on the applied side of computer vision, oral presentations from various computer vision specialists about their frontier research, and inspirational keynotes from industry titans.
In this short blog post, I will describe the five findings that I found most interesting and surprising, from both a research and an applied perspective.
What is ECCV?
The European Conference on Computer Vision (ECCV) is one of the largest and most esteemed conferences in the field of computer vision (alongside CVPR, ICCV, and NeurIPS), having hosted famous papers like NeRF and Group Normalization. The conference spans 6 days in total: 2 days of workshops followed by 4 days of the main event. Considering the number of tracks, papers, and workshops (12 tracks, 2,387 papers, and 51 workshops), it is impossible to check and understand everything properly, as topics range from video generation and 3D depth estimation to applications like automatic inspection of defects in manufacturing or diseases in agriculture. I tried my best to follow the topics of greatest interest to me.
Highlights of the conference
In the sections below, I will try to distill the most interesting insights from a variety of fields, ranging from avatars to GenAI, explainable AI, and agriculture.
Make your own avatar with Synthesia
In their keynote, two lead researchers (Vittorio Ferrari and Lourdes Agapito) showed how the field has changed over the past 6 years, along with the respective research behind those advancements. The company shares its findings through peer-reviewed papers at conferences like ECCV, but also reaches wider audiences, as in the famous „Malaria Must Die” advertisement featuring David Beckham.
I was aware of some advancements in the field, with the majority of avatars either having low quality (if personalized), showing serious artifacts (most often around the lips), or not being very engaging.
That was still the case in 2023, but in the past 2 years they have managed to solve many of these issues: avatars have become more engaging and higher quality, with more precise lip sync across 150+ languages, more natural voices, and even experimental hand gestures.
In the near future, Synthesia also wants to add object interactions to make the avatars even more engaging; however, natural object grasping remains a hard problem at the intersection of computer vision and robotics.
Fair, transparent and accountable AI
Bias preservation checklist from Bias Preservation in Machine Learning: The Legality of Fairness Metrics Under EU Non-Discrimination Law
Another interesting keynote, „Fair, transparent and accountable AI: What is legally required, what is ethically desired, and what is technically feasible”, was given by Prof. Sandra Wachter of the University of Oxford. It offered a critical view of the EU AI Act and the current legal system, pointing out significant loopholes and overly general provisions without clear guidelines (see the presenter's related report: Limitations and loopholes of the EU AI Act). What was new to me is that more area-specific guidelines are currently being drafted to supplement the EU AI Act.
Moreover, the lecture described the classification of biases and their influence (e.g., Facebook used to let advertisers exclude users by race). An interesting example was a bias introduced in the Netherlands, where applicants of certain races and applicants living in certain areas had reduced mortgage access. This was partly due to the fact that credit scoring systems identified certain areas as high-risk and others as low-risk.
Lastly, Prof. Sandra Wachter presented a solution for this called Conditional Demographic Disparity (CDD), which was introduced into the AWS ecosystem in the bias and interpretability tool Amazon SageMaker Clarify.
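The intuition behind CDD is worth a quick sketch: demographic disparity (DD) compares the disadvantaged group's share among rejected versus accepted applicants, and CDD averages DD across strata of a legitimate conditioning attribute (e.g., region or income band), weighted by stratum size. Below is my own minimal Python illustration based on the published definition, not SageMaker Clarify's code; all column names and values are made up:

```python
import pandas as pd

def demographic_disparity(df, group_col, outcome_col, disadvantaged):
    """DD: share of the disadvantaged group among rejected applicants
    minus its share among accepted applicants (0 = parity)."""
    rejected = df[df[outcome_col] == 0]
    accepted = df[df[outcome_col] == 1]
    if len(rejected) == 0 or len(accepted) == 0:
        return 0.0
    return ((rejected[group_col] == disadvantaged).mean()
            - (accepted[group_col] == disadvantaged).mean())

def conditional_demographic_disparity(df, group_col, outcome_col,
                                      disadvantaged, strata_col):
    """CDD: size-weighted average of DD over strata of a legitimate
    conditioning attribute."""
    return sum(len(s) / len(df)
               * demographic_disparity(s, group_col, outcome_col, disadvantaged)
               for _, s in df.groupby(strata_col))

# toy mortgage-style example (all values illustrative)
apps = pd.DataFrame({
    "group":    ["A", "A", "A", "A", "B", "B", "B", "B"],
    "approved": [1,   1,   1,   0,   0,   0,   0,   1],
    "region":   ["north"] * 8,
})
cdd = conditional_demographic_disparity(apps, "group", "approved", "B", "region")
```

A positive CDD indicates that the disadvantaged group is over-represented among rejections even after conditioning on the chosen attribute.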
Read: LLMs for LegalTech - Unleashing the Potential of LLM Agents for Chatbot with the EU AI Act
Model-centric and data-centric Explainable AI
Activation of the various heads in CLIP architecture, Source: Interpreting the Second-Order Effects of Neurons in CLIP
Another interesting workshop was „Explainable Computer Vision”, with a two-part lecture from BAIR researchers presenting two different approaches to explainable AI. The first part was from Yossi Gandelsman, with his work on understanding the contributions of different attention heads in CLIP (a model used in the majority of current computer vision GenAI applications). His research was groundbreaking in that he managed to identify which attention heads account for features like:
- Geolocation
- Colors
- Textures
- Animals
- And many more.
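The core idea behind this line of work is that CLIP's final image embedding decomposes (approximately) into a sum of per-head contributions, and each contribution can then be scored against text directions for candidate concepts. The toy sketch below illustrates only that decompose-and-score structure with random tensors; the shapes, names, and scoring are my own simplification, not the paper's actual method or code:

```python
import numpy as np

rng = np.random.default_rng(1)
n_layers, n_heads, d = 12, 8, 64

# stand-in for per-head contributions to the final image embedding
head_contribs = rng.normal(size=(n_layers, n_heads, d))
# the residual stream sums head contributions into one embedding
image_embedding = head_contribs.sum(axis=(0, 1))

# hypothetical text directions for candidate concepts
concepts = ["geolocation", "colors", "textures", "animals"]
text_dirs = rng.normal(size=(len(concepts), d))
text_dirs /= np.linalg.norm(text_dirs, axis=1, keepdims=True)

# score every head's contribution against every concept direction
scores = np.einsum("lhd,cd->lhc", head_contribs, text_dirs)
# the concept each head aligns with most strongly
top_concept = scores.argmax(axis=-1)
```

In the real work, the text directions come from CLIP's text encoder and the per-head contributions from an actual forward pass; the toy version only shows why the attribution is possible at the level of individual heads.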
The second part of the lecture took a data-centric approach (with parallels to Eigenfaces, where the underlying idea was that every face in the world can be represented as a weighted combination of about 100 basis faces). In this presentation, Prof. Alexei A. Efros argued that instead of explaining the behaviour of algorithms in model-centric terms („which part of the model decided to do so”), we should move to a data-centric approach („which data contributed to that”). From his research, this seems feasible on smaller, controlled datasets, with promising prospects for larger and more diverse datasets (e.g., ImageNet or LAION-5B).
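The Eigenfaces idea referenced above reduces to plain PCA: center the face images, take the top principal components as the basis faces, and approximate any face as the mean plus a weighted combination of them. A minimal NumPy sketch, using random synthetic data in place of a real face dataset:

```python
import numpy as np

rng = np.random.default_rng(0)
# synthetic stand-in for a face dataset: 200 flattened 32x32 "images"
faces = rng.normal(size=(200, 32 * 32))

# center the data and extract principal components ("eigenfaces")
mean_face = faces.mean(axis=0)
centered = faces - mean_face
# rows of vt are orthonormal components, ordered by explained variance
_, _, vt = np.linalg.svd(centered, full_matrices=False)
eigenfaces = vt[:100]          # keep the top 100 basis faces

# any face ≈ mean face + weighted combination of the basis faces
weights = centered[0] @ eigenfaces.T       # 100 coefficients
reconstruction = mean_face + weights @ eigenfaces
```

The 100 weights are a compact description of the face, which is exactly the property the data-centric view exploits: the dataset itself, not the model, supplies the explanatory basis.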
Check: FoXAI
Clay filters and mitigating a sudden large AWS bill
Viral Clay filter from Bending Spoons, source: recodechinaai.substack.com
This presentation was from one of the sponsors, Bending Spoons, who showed the different waves of interest in their filter-generating products. In Europe, they are popular for their app that generates corporate pictures for LinkedIn from casual photos. In Asia, they are more popular due to their Clay filter feature.
They showed how advancements in Stable Diffusion helped them create interesting filters (e.g., corporate headshots), and which countries and platforms showed the largest interest (with the main contributors being Asian countries).
The last and most impactful hype wave (related to Clay filter generation) was so significant that they racked up a $300k cloud bill in a single day. They had to reduce their cloud consumption fast while preserving output quality. What they actually did was quite simple: they generated smaller pictures, used smaller and cheaper GPUs, and significantly reduced training time.
By doing so, they managed to reduce their costs by 10x, showing that small improvements can go a long way.
Advances in agriculture and phenotyping
Segmentation masks of leaf diseases. Source: A new Large Dataset and a Transfer Learning Methodology for Plant Phenotyping in Vertical Farms
Given my significant interest in agriculture, I joined the presentations of the workshop „Computer Vision in Plant Phenotyping and Agriculture” (CVPPA).
In this workshop, the main presentation that caught my eye was the one from Dr Fiora Pirri of DeepPlants, where she presented a disease detection dataset, available on HuggingFace. What was most surprising was the complexity of the pipeline (consisting of at least 5 different architectures and 2-3 modalities: RGB, depth, etc.). This dataset can help advance early disease detection (from mobile phone camera videos) in underprivileged countries and limit the impact of such diseases. Additionally, it was interesting to see various ideas applying advances in computer vision (e.g., using Stable Diffusion to generate examples of certain diseases in order to improve their classification).
Conclusions
In this short blog post, I have covered the state of the art in five different areas, ranging from avatars, through advancements in applied fields like agriculture, to model/data understanding and identifying bias in your data.
ECCV was a great opportunity to understand what the SOTA is, to see how advancements in one field influence others (e.g., agriculture or the medical industry), and to experience the frontiers of the field in person.