- keynote, October 2024
From Pixels to Preservation: The Power of Large Vision Models in Heritage Content Understanding
SUMAC '24: Proceedings of the 6th workshop on the analySis, Understanding and proMotion of heritAge Contents, Pages 3–4, https://doi.org/10.1145/3689094.3689470
Preserving cultural heritage is essential for maintaining the legacy and history of human civilization, but it presents challenges in managing vast amounts of historical artifacts and documents. Recent advances in artificial intelligence, especially ...
- research-article, October 2024
Multimodal Understanding: Investigating the Capabilities of Large Multimodal Models for Object Detection in XR Applications
LGM3A '24: Proceedings of the 2nd Workshop on Large Generative Models Meet Multimodal Applications, Pages 26–35, https://doi.org/10.1145/3688866.3689126
Extended Reality (XR), encompassing the concepts of augmented, virtual, and mixed reality, has the potential to offer unprecedented types of user interactions. An essential requirement is the automated understanding of a user's current scene, for ...
- research-article, October 2024
Domain Adaptive Object Detection for UAV-based Images by Robust Representation Learning and Multiple Pseudo-label Aggregation
EMCLR'24: Proceedings of the 1st International Workshop on Efficient Multimedia Computing under Limited, Pages 59–67, https://doi.org/10.1145/3688863.3689576
Object detection on aerial images captured by Unmanned Aerial Vehicles (UAVs) has a wide range of applications. Due to the variations in illumination, weather conditions and scene backgrounds, the testing images (target domain) typically exhibit ...
- research-article, October 2024
Instance-aware Fine-grained Micro-action Recognition
MM '24: Proceedings of the 32nd ACM International Conference on Multimedia, Pages 11320–11326, https://doi.org/10.1145/3664647.3688976
Micro-action involves low-amplitude movement of human body, which brings challenges to common action recognition. This paper focuses on the extremely small region of human body as well as the severe long-tail distribution in micro-action recognition. An ...
- research-article, October 2024
Fractional Correspondence Framework in Detection Transformer
MM '24: Proceedings of the 32nd ACM International Conference on Multimedia, Pages 5498–5506, https://doi.org/10.1145/3664647.3681613
The Detection Transformer (DETR), by incorporating the Hungarian algorithm, has significantly simplified the matching process in object detection tasks. This algorithm facilitates optimal one-to-one matching of predicted bounding boxes to ground-truth ...
- research-article, October 2024
Uni-YOLO: Vision-Language Model-Guided YOLO for Robust and Fast Universal Detection in the Open World
MM '24: Proceedings of the 32nd ACM International Conference on Multimedia, Pages 1991–2000, https://doi.org/10.1145/3664647.3681212
Universal object detectors aim to detect any object in any scene without human annotation, exhibiting superior generalization. However, the current universal object detectors show degraded performance in harsh weather, and their insufficient real-time ...
- research-article, October 2024
EPL-UFLSID: Efficient Pseudo Labels-Driven Underwater Forward-Looking Sonar Images Object Detection
MM '24: Proceedings of the 32nd ACM International Conference on Multimedia, Pages 4349–4357, https://doi.org/10.1145/3664647.3681160
Sonar imaging is widely utilized in submarine and underwater detection missions. However, due to the complex underwater environment, sonar images suffer from complex distortions and noises, making detection models hard to extract clean high-level ...
- research-article, October 2024
Adaptive Hierarchical Aggregation for Federated Object Detection
MM '24: Proceedings of the 32nd ACM International Conference on Multimedia, Pages 3732–3740, https://doi.org/10.1145/3664647.3681158
In practical object detection scenarios, distributed data and stringent privacy protections significantly limit the feasibility of traditional centralized training methods. Federated learning (FL) emerges as a promising solution to this dilemma. ...
- research-article, October 2024
SparseFormer: Detecting Objects in HRW Shots via Sparse Vision Transformer
MM '24: Proceedings of the 32nd ACM International Conference on Multimedia, Pages 4851–4860, https://doi.org/10.1145/3664647.3681043
Recent years have seen an increase in the use of gigapixel-level image and video capture systems and benchmarks with high-resolution wide (HRW) shots. However, unlike close-up shots in the MS COCO dataset, the higher resolution and wider field of view ...
- research-article, October 2024
Purified Distillation: Bridging Domain Shift and Category Gap in Incremental Object Detection
MM '24: Proceedings of the 32nd ACM International Conference on Multimedia, Pages 1197–1205, https://doi.org/10.1145/3664647.3681031
Incremental Object Detection (IOD) simulates the dynamic data flow in real-world applications, which require detectors to learn new classes or adapt to new domains while retaining knowledge from previous tasks. Most existing IOD methods focus only on ...
- research-article, October 2024
Diffusion Domain Teacher: Diffusion Guided Domain Adaptive Object Detector
MM '24: Proceedings of the 32nd ACM International Conference on Multimedia, Pages 3284–3293, https://doi.org/10.1145/3664647.3680962
Object detectors often suffer a decrease in performance due to the large domain gap between the training data (source domain) and real-world data (target domain). Diffusion-based generative models have shown remarkable abilities in generating high-...
- research-article, October 2024
WeakSAM: Segment Anything Meets Weakly-supervised Instance-level Recognition
MM '24: Proceedings of the 32nd ACM International Conference on Multimedia, Pages 7947–7956, https://doi.org/10.1145/3664647.3680960
Weakly-supervised visual recognition using inexact supervision is a critical yet challenging learning problem. It significantly reduces human labeling costs and traditionally relies on multi-instance learning and pseudo-labeling. This paper introduces ...
- research-article, October 2024
Alleviating the Equilibrium Challenge with Sample Virtual Labeling for Adversarial Domain Adaptation
MM '24: Proceedings of the 32nd ACM International Conference on Multimedia, Pages 2681–2689, https://doi.org/10.1145/3664647.3680929
Many domain adaptive object detection (DAOD) methods employ domain adversarial training to align features and mitigate the domain gap. In this approach, a feature extractor is trained to deceive a domain classifier, thereby aligning feature ...
- research-article, October 2024
LOVD: Large-and-Open Vocabulary Object Detection
MM '24: Proceedings of the 32nd ACM International Conference on Multimedia, Pages 9321–9329, https://doi.org/10.1145/3664647.3680925
Existing open-vocabulary object detectors require an accurate and compact vocabulary pre-defined during inference. Their performance is largely degraded in real scenarios where the underlying vocabulary may be indeterminate and often exponentially large. ...
- research-article, October 2024
Stochastic Context Consistency Reasoning for Domain Adaptive Object Detection
MM '24: Proceedings of the 32nd ACM International Conference on Multimedia, Pages 1331–1340, https://doi.org/10.1145/3664647.3680899
Domain Adaptive Object Detection (DAOD) aims to improve the adaptation of the detector for the unlabeled target domain by the labeled source domain. Recent advances leverage a self-training framework to enable a student model to learn the target domain ...
- research-article, October 2024
mmBox: Harnessing Millimeter-Wave Signals for Reliable Vehicle and Pedestrians Detection
ACM Transactions on Internet of Things (TIOT), Volume 5, Issue 4, Article No.: 22, Pages 1–30, https://doi.org/10.1145/3695883
Object detection plays a pivotal role in various fields, for example, a smart traffic system relies on the detected results for decision-making. However, existing studies predominately utilize optical camera and LiDAR, which exhibit limitations in adverse ...
- short-paper, October 2024
M2IoU: A Min-Max Distance-based Loss Function for Bounding Box Regression in Medical Imaging
- Anurag Kumar Shandilya,
- Kalash Shah,
- Bhavik Kanekar,
- Akshat Gautam,
- Pavni Tandon,
- Ganesh Ramakrishnan,
- Kshitij Jadhav
CIKM '24: Proceedings of the 33rd ACM International Conference on Information and Knowledge Management, Pages 4041–4045, https://doi.org/10.1145/3627673.3679958
Computer vision applications such as object detection have increased manifold in the medical domain for diagnosis and treatment purposes. Generally, object detection models such as YOLO (You Only Look Once) involve identifying the correct bounding box ...
- short-paper, October 2024
Intricate Object Detection in Self Driving Environments with Edge-Adaptive Depth Estimation(EADE)
CIKM '24: Proceedings of the 33rd ACM International Conference on Information and Knowledge Management, Pages 3837–3841, https://doi.org/10.1145/3627673.3679948
Autonomous vehicles make decisions and controls based on various object recognition results. The driving environment is characterized by the coexistence of a multitude of objects of varying shapes and sizes. Therefore, the ability to accurately recognise ...
- research-article, August 2024, JUST ACCEPTED
Learning and Vision-based approach for Human fall detection and classification in naturally occurring scenes using video data
ACM Transactions on Asian and Low-Resource Language Information Processing (TALLIP), Just Accepted, https://doi.org/10.1145/3687125
The advancement of medicine presents challenges for modern cultures, especially with unpredictable elderly falling incidents anywhere due to serious health issues. Delayed rescue for at-risk elders can be dangerous. Traditional elder safety methods like ...
- research-article, August 2024
Basic Safety Message Generation through a Video-based Analytics for Potential Safety Applications
- Abyad Enan,
- Abdullah Ai Mamun,
- Jean Michel Tine,
- Judith Mwakalonge,
- Debbie Aisiana Indah,
- Gurcan Comert,
- Mashrur Chowdhury
ACM Journal on Autonomous Transportation Systems (JATS), Volume 1, Issue 4, Article No.: 23, Pages 1–26, https://doi.org/10.1145/3643823
With the advancement of modern artificial intelligence techniques, computer vision can play a vital role in enhancing roadway safety by reducing the risk of imminent collisions. To do so, a vision-based safety application is required, where a roadside ...