ARGUS: Hallucination and Omission Evaluation in Video-LLMs
ICCV 2025, 2025
An evaluation framework and benchmark for hallucination and omission in free-form video-LLM outputs.
ArXiv, 2025
We propose Bilevel ZOFO, a novel bilevel optimization framework that bridges parameter-efficient and zeroth-order optimization techniques for efficient large language model (LLM) fine-tuning and meta-training.
ArXiv, 2025
We propose ToMoE, a novel method for converting dense large language models to mixture-of-experts through dynamic structural pruning.
CVPR 2025, 2024
A method for training a small diffusion model while suppressing unwanted concepts.
ICLR 2025, 2024
Dynamic Prompt-based Pruning of Text-to-Image Diffusion Models
ArXiv, 2024
PixelProse is a comprehensive dataset of over 16 million synthetically generated captions, leveraging cutting-edge vision-language models for detailed and accurate descriptions.
ArXiv, 2023
Prompt tuning for Graph Transformers and message-passing graph neural networks, improving the efficiency and resource utilization of Graph Transformers.
ArXiv, 2023
We propose a new incomplete multimodal data integration approach that employs transformers and generative adversarial networks to effectively exploit auxiliary modalities available during training in order to improve the performance of a unimodal model at inference.
ArXiv, 2021
We propose to classify disease severity on the Fazekas scale using visual biomarkers, namely Periventricular White Matter (PVWM) and Deep White Matter (DWM) changes, in the real-world setting of thick-slice MRI.