
Schedule

Location: The workshop will be held in Summit 326.

9:00 - 9:05 (PT) Welcome
9:05 - 10:00 (PT) Paper session #1
Laughing Matters: Introducing Audio-Driven Laughing-Face Generation with Diffusion Models - Extended Abstract Antoni Bigata Casademunt, Rodrigo Mira, Nikita Drobyshev, Konstantinos Vougioukas, Stavros Petridis, Maja Pantic
Can CLIP Help Visual Sound Localization? Sooyoung Park, Arda Senocak, Joon Son Chung
Learning Continual Audio-Visual Sound Separation Models Weiguo Pian, Yiyang Nan, Shijian Deng, Shentong Mo, Yunhui Guo, Yapeng Tian
BRAVEn: Improving Self-Supervised Pre-training for Visual and Auditory Speech Recognition - Extended Abstract Alexandros Haliassos, Andreas Zinonos, Rodrigo Mira, Stavros Petridis, Maja Pantic
Q&A session
Audio-Visual Autism Behavior Recognition with Multimodal Large Language Models Shijian Deng, Erin Kosloski, Siddhi Patel, Zeke A Barnett, Yiyang Nan, Alexander M Kaplan, Sisira Aarukapalli, William Doan, Matthew Wang, Harsh Singh, Pamela Rollins, Yapeng Tian
Dataset Distillation for Audio-Visual Datasets Saksham Singh Kushwaha, Siva Sai Nagender Vasireddy, Kai Wang, Yapeng Tian
AVQA-CoT: When CoT Meets Question Answering in Audio-Visual Scenarios Guangyao Li, Henghui Du, Di Hu
Q&A session
10:00 - 10:30 (PT) Posters & Coffee Break
10:30 - 11:15 (PT) Paper session #2
ViSpeR: Multilingual Visual Speech Recognition Yasser Abdelaziz Dahou Djilali, Sanath Narayan, Eustache Le Bihan, Ankit Singh, Hakim Hacid
Looking Similar, Sounding Different: Leveraging Counterfactual Cross-Modal Pairs for Audiovisual Representation Learning Nikhil Singh, Chih-Wei Wu, Iroro Orife, Mahdi M. Kalayeh
Q&A session
AVHuMAR: Audio-Visual Target Speech Extraction with Pre-trained AV-HuBERT and Mask-And-Recover Strategy Wenxuan Wu, Xueyuan Chen, Xixin Wu, Haizhou Li, Helen Meng
AV-Mamba: Cross-Modality Selective State Space Models for Audio-Visual Question Answering Ziru Huang, Jia Li, Wenjie Zhao, Yunhui Guo, Yapeng Tian
SparseVSR: Lightweight and Noise Robust Visual Speech Recognition - Extended Abstract Adriana Fernandez-Lopez, Honglie Chen, Pingchuan Ma, Alexandros Haliassos, Stavros Petridis, Maja Pantic
Q&A session


11:15 - 11:45 (PT) Invited talk
Alexander Richard
 
11:45 - 1:00 (PT) Lunch
1:00 - 2:00 (PT) Invited papers
Chair: Ziyang Chen

SoundingActions: Learning How Actions Sound from Narrated Egocentric Videos Changan Chen, Kumar Ashutosh, Rohit Girdhar, David Harwath, Kristen Grauman
Q&A session
TIM: A Time Interval Machine for Audio-Visual Action Recognition Jacob Chalk, Jaesung Huh, Evangelos Kazakos, Andrew Zisserman, Dima Damen
The Audio-Visual Conversational Graph: From an Egocentric-Exocentric Perspective Wenqi Jia, Miao Liu, Hao Jiang, Ishwarya Ananthabhotla, James M. Rehg, Vamsi Krishna Ithapu, Ruohan Gao
Q&A session
MeLFusion: Synthesizing Music from Image and Language Cues using Diffusion Models Sanjoy Chowdhury, Sayan Nag, Joseph K J, Balaji Vasan Srinivasan, Dinesh Manocha
Learning Spatial Features from Audio-Visual Correspondence in Egocentric Videos Sagnik Majumder, Ziad Al-Halah, Kristen Grauman
Separating the "Chirp" from the "Chat": Self-supervised Visual Grounding of Sound and Language Mark Hamilton, Andrew Zisserman, John R. Hershey, William T. Freeman
Q&A session


2:00 - 2:30 (PT) Invited talk
Ruohan Gao
 
2:30 - 3:00 (PT) Invited talk
Shyam Gollakota
 
3:00 - 3:30 (PT) Coffee Break
3:30 - 4:00 (PT) Invited talk
Hilde Kuehne
 
4:00 - 4:30 (PT) Invited talk
Samuel Clarke
 
4:30 - 5:00 (PT) Invited talk
Tengda Han
 

Previous workshops: 2018, 2019, 2020, 2021, 2022, 2023

Presentation instructions

  • Authors of accepted papers will present a 5-minute talk about their work. You may either present in person or submit a video. For the latter option, please upload the video to CMT by June 15th (11:59 PM PST) as an .mp4 supplementary file, along with the PDF of your paper.
  • We'll hold the paper presentation sessions from 9:00am to 11:15am, run as two sub-sessions with a coffee and poster break in between. Each sub-session will mix in-person and video presentations, with short Q&A slots covering the papers presented just before them. We'll also release recordings on our website for offline viewing, and we'll post the paper schedule in the coming weeks.
  • You are also welcome to present a poster during the lunch and coffee breaks. Unfortunately, we are unable to offer a hybrid option for posters.
  • Please also submit the camera-ready version of your paper via CMT by June 13th (11:59 PM PST). Papers will be made available on our website.
  • Looking forward to seeing you there!

Organizers


Andrew Owens
University of Michigan

Jiajun Wu
Stanford

Arsha Nagrani
Google

Triantafyllos Afouras
Meta

Ruohan Gao
Meta / University of Maryland

Hang Zhao
Tsinghua University

Ziyang Chen
University of Michigan


William Freeman
MIT/Google

Andrew Zisserman
Oxford

Kristen Grauman
UT Austin / Meta

Antonio Torralba
MIT

Jean-Charles Bazin
Meta