Seeing Motion from Sound with Event Cameras

December 2025 Event Cameras

Sound causes objects to vibrate, and while these vibrations are typically invisible to the naked eye, they can be observed visually under appropriate sensing conditions. Prior work has shown that the motion of an object’s surface is closely related to the underlying sound pressure, making it possible to infer audio from visual measurements alone [1]. This idea has enabled non-contact sound recovery in applications such as visual microphones, optical vibration sensing, surveillance, and the analysis of material and structural properties, especially in settings where placing microphones is difficult or undesirable [2], [3].

Event cameras operate differently from conventional cameras. Instead of recording images at fixed frame rates, each pixel independently reports changes in brightness as they occur. This asynchronous sensing makes event cameras extremely sensitive to fast and subtle motion, while producing very little data when the scene is static.
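The per-pixel behavior described above is often modeled as a contrast-threshold trigger on log intensity: a pixel fires a positive or negative event each time the log brightness moves a fixed step away from the level at its last event. Here is a minimal sketch of that idealized model (the function name and threshold value are my own illustration, not a specific camera's specification):

```python
import numpy as np

def events_from_intensity(intensity, threshold=0.2):
    """Idealized event pixel: emit +1/-1 each time log-intensity
    moves by `threshold` (the contrast threshold) from the level
    at the previous event."""
    log_i = np.log(intensity)
    ref = log_i[0]          # reference level at the last event
    events = []             # list of (sample_index, polarity)
    for k in range(1, len(log_i)):
        while log_i[k] - ref >= threshold:
            ref += threshold
            events.append((k, +1))
        while ref - log_i[k] >= threshold:
            ref -= threshold
            events.append((k, -1))
    return events

# A small sinusoidal brightness modulation triggers many events,
# while constant brightness produces none.
t = np.linspace(0, 1, 1000)
vibrating = 1.0 + 0.5 * np.sin(2 * np.pi * 5 * t)
static = np.ones_like(t)
print(len(events_from_intensity(vibrating)) > 0)   # True
print(len(events_from_intensity(static)))          # 0
```

This also illustrates why a static scene produces almost no data: with no brightness change, the threshold is never crossed.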

As a result, event cameras can reveal physical phenomena—such as micro-scale vibrations—that are invisible to standard video and even to the human eye. In this post, I describe a simple observation: although the object appears stationary to the naked eye, sound-induced vibrations produce apparent motion that becomes visible in the event stream.

Experiment Setup

The object is illuminated by a single fixed light source. The camera observes the object while sound is played from a speaker outside the camera's field of view. To the naked eye, the object appears completely static. Measures were taken to minimize sound propagation through the table, including the use of acoustic isolation foam. Below is a visualization of the event stream from a chips packet while sound is playing.

Observation

When sound is playing, the event stream exhibits clear, structured activity that is absent when no sound is present. This activity is caused by sound-induced micro-vibrations of the object, which introduce subtle temporal changes in brightness that trigger events.

When no sound is playing, the event stream is sparse and dominated by background noise. The contrast between these two conditions is immediately visible in the raw event data.
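One simple way to quantify this contrast is to bin event timestamps into fixed windows and compare events per second in each condition. A minimal sketch, using synthetic timestamps in place of real recordings (the rates and bin width are illustrative assumptions, not measurements from this experiment):

```python
import numpy as np

def event_rate(timestamps, bin_width=1e-3, duration=1.0):
    """Histogram event timestamps (seconds) into fixed-width time
    bins and return the rate (events per second) in each bin."""
    edges = np.arange(0.0, duration + bin_width, bin_width)
    counts, _ = np.histogram(timestamps, bins=edges)
    return counts / bin_width

# Synthetic stand-ins: dense activity while "sound" plays,
# sparse background noise when it does not.
rng = np.random.default_rng(0)
with_sound = np.sort(rng.uniform(0, 1, 50_000))    # ~50k events/s
without_sound = np.sort(rng.uniform(0, 1, 200))    # ~200 events/s

print(event_rate(with_sound).mean())      # orders of magnitude higher
print(event_rate(without_sound).mean())   # than the quiet condition
```

With real data, the same comparison can be run per pixel to localize which parts of the object are vibrating.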

Raw event stream with sound

Raw event stream without sound

Key takeaway:
Because event cameras can capture tiny, rapid brightness changes, the motion induced when sound physically vibrates an object becomes directly visible in the event stream.

Next Steps

Ongoing work investigates methods for extracting the sound-related information present in these event streams.
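As a starting point, one hypothetical recovery step is to treat the per-bin event count as a coarse motion signal and inspect its spectrum for the driving tone. The sketch below uses a simulated signal in place of real event data; the bin rate, tone frequency, and noise level are all assumptions made for illustration:

```python
import numpy as np

fs = 10_000           # bins per second (assumed, not from the post)
f_tone = 440.0        # frequency of the hypothetical test tone
t = np.arange(fs) / fs

# Simulated per-bin signed event counts: a 440 Hz vibration plus noise.
rng = np.random.default_rng(1)
signal = np.sin(2 * np.pi * f_tone * t) + 0.3 * rng.standard_normal(fs)

# Locate the dominant non-DC peak in the magnitude spectrum.
spectrum = np.abs(np.fft.rfft(signal))
freqs = np.fft.rfftfreq(fs, d=1 / fs)
peak = freqs[np.argmax(spectrum[1:]) + 1]   # skip the DC bin
print(peak)
```

Recovering intelligible audio (rather than a single tone) would require finer temporal binning and per-pixel weighting, which is exactly the kind of question the ongoing work addresses.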

Acknowledgments

Thank you to Prof. Robert Pless for his guidance and for providing this opportunity and access to the event camera used in these experiments. I also thank Alper Cetinkaya for the helpful discussions during this work.

References

  1. Abe Davis, Michael Rubinstein, Neal Wadhwa, Gautham J. Mysore, Frédo Durand, and William T. Freeman. The Visual Microphone: Passive Recovery of Sound from Video. ACM Transactions on Graphics (SIGGRAPH), 2014.
  2. Justin G. Chen, Neal Wadhwa, Young-Jin Cha, Frédo Durand, William T. Freeman, and Oral Büyüköztürk. Structural Modal Identification Through High-Speed Camera Video: Motion Magnification. Topics in Modal Analysis I, Springer, 2014.
  3. Mark Sheinin, Dorian Chan, Matthew O’Toole, and Srinivasa G. Narasimhan. Dual-Shutter Optical Vibration Sensing. IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2022.