Computer Vision DJ 
Installation overview:
Pose & Tunes is an interactive loop station for electronic music, controlled by gestures via a webcam. The system comprises three elements: a Three-Channel Looper, an Effects Panel, and a Keyboard. It enables hands-free control, allowing users to create loops, apply effects, and perform live, offering an interactive approach to electronic music production.
Audio Processing in Max MSP:
1. Three Channel Looper:
Our techno loops are built from three stems: bass, drums, and melody. We curated three samples for each stem, giving users 27 possible combinations. The samples are stored in Max in "buffer~" objects, with a "select" function routing each user request to the corresponding buffer. Separate "buffer~" objects were created for each stem so that each stem can load a different sample path. The key feature is continuous looping via "groove~" objects, which keeps longer samples in sync with the shortest one. Toggle switches let users mute individual stems, for example playing only melody and bass. To maintain perfect timing, a metronome quantizes user input to the music.
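The looper's state can be sketched in a few lines of JavaScript (illustrative only; the real implementation is a Max patch of "buffer~"/"groove~" objects, and the names below are hypothetical):

```javascript
// Illustrative model of the three-channel looper state.
const STEMS = ['bass', 'drums', 'melody'];
const SAMPLES_PER_STEM = 3;

class Looper {
  constructor() {
    // Each stem starts on sample 0 and switched on.
    this.selection = { bass: 0, drums: 0, melody: 0 };
    this.enabled = { bass: true, drums: true, melody: true };
  }
  select(stem, sampleIndex) {
    // Route the user's request to one of the three buffers per stem.
    if (sampleIndex < 0 || sampleIndex >= SAMPLES_PER_STEM) {
      throw new RangeError('bad sample index');
    }
    this.selection[stem] = sampleIndex;
  }
  toggle(stem) {
    // Mute/unmute an individual stem.
    this.enabled[stem] = !this.enabled[stem];
  }
}

// 3 samples per stem across 3 stems gives 3^3 = 27 combinations.
const combinations = Math.pow(SAMPLES_PER_STEM, STEMS.length);
```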
2. Effects Panel
The effects panel offers control over master volume, tempo, and low/high-pass filters on the audio output. Tempo adjustments change the playback speed of the looped samples via the "groove~" objects, synchronized by a global "tempo" object. To preserve pitch, a time-stretch function ("stretch~") is applied. The low/high-pass filters are controlled by a single slider (-1 to +1). Conditional logic scales the slider value into an appropriate cutoff frequency and linear gain, and the resulting filter coefficients are applied with the "biquad~" object. Volume control is achieved through a slider mapped to dB values that drive the master gain of the patch.
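The slider-to-filter mapping can be sketched as follows (an illustrative JavaScript sketch with an assumed 20 Hz–20 kHz range and exponential sweep; the exact scaling in our patch may differ):

```javascript
// Map the bidirectional filter slider (-1 .. +1) to filter parameters.
// Negative values sweep a low-pass filter down; positive values sweep
// a high-pass filter up; 0 bypasses both.
function sliderToFilter(s) {
  const minHz = 20, maxHz = 20000; // assumed audible range
  if (s < 0) {
    // Exponential sweep from maxHz (s=0) down to minHz (s=-1).
    return { type: 'lowpass', cutoff: maxHz * Math.pow(minHz / maxHz, -s) };
  } else if (s > 0) {
    // Exponential sweep from minHz (s=0) up to maxHz (s=+1).
    return { type: 'highpass', cutoff: minHz * Math.pow(maxHz / minHz, s) };
  }
  return { type: 'bypass', cutoff: null };
}

// Volume slider in dB mapped to the linear gain used by the master fader.
function dbToLinear(db) {
  return Math.pow(10, db / 20);
}
```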
3. Keyboard MIDI Controller:
The keyboard is controlled by placing a hand over the desired key, which triggers a selector to send the corresponding MIDI value. Key velocity is mapped from the volume control slider. To keep key entries on the beat, a metronome linked to the master tempo is employed: the keyboard plays a note only when a bang signal is received, so notes land precisely on the beat.
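The beat-quantized triggering can be sketched as a simple latch (illustrative JavaScript; the real logic lives in the Max patch):

```javascript
// A pressed key is latched until the next metronome bang, so notes
// always fire on the beat rather than the instant the hand arrives.
class QuantizedKeyboard {
  constructor() { this.pending = null; }
  press(midiNote, velocity) {
    // Latch the most recent key request.
    this.pending = { midiNote, velocity };
  }
  bang() {
    // Called by the metronome on each beat; emits the latched note
    // (or null if nothing was pressed since the last beat).
    const note = this.pending;
    this.pending = null;
    return note;
  }
}
```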
Gestural User Interface
Control Logic & Form:
The gestural user interface features a dashboard of control elements overlaying the webcam feed, so users see themselves at the center of the controls. The interface is hosted locally in a web browser, allowing rapid, real-time communication with Max through a local network port.
Gesture control is achieved using the MediaPipe human pose estimation library, which extracts the positions of both hands from the webcam feed. The JavaScript logic treats the dashboard as a 2D coordinate plane: when a hand or wrist aligns with a button or key position, the JavaScript updates the interface state and sends the corresponding OSC message to Max.
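The hit test can be sketched as follows (illustrative JavaScript; the button layout is hypothetical, and MediaPipe landmark coordinates are assumed to be normalized to 0..1):

```javascript
// Hypothetical dashboard layout on the same normalized plane as the
// MediaPipe landmarks: x/y is the top-left corner, w/h the extent.
const buttons = [
  { id: 'stem1-toggle', x: 0.05, y: 0.2, w: 0.1, h: 0.1 },
  { id: 'tempo-up',     x: 0.85, y: 0.2, w: 0.1, h: 0.1 },
];

// hand: { x, y } normalized hand/wrist landmark from MediaPipe.
// Returns the control under the hand, or null.
function hitTest(hand, controls) {
  return controls.find(b =>
    hand.x >= b.x && hand.x <= b.x + b.w &&
    hand.y >= b.y && hand.y <= b.y + b.h
  ) || null;
}
```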

Design & Interface Development 
To change values on the gesture interface, users hover a hand over the desired button for 0.7 seconds. After this delay, the interface recognizes the user's intention and performs the functions described above. The 0.7-second dwell time was determined through user testing and feedback, balancing responsiveness against accidental activation. The layout and design of the control panel were refined based on user feedback: initial testing revealed confusion over which hand controlled which element. In the second iteration, panels were placed around the user, with their pose centered, making clear that right-hand panels are controlled by the right hand and left-hand panels by the left.
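The dwell logic can be sketched as a small per-frame state machine (illustrative JavaScript; only the 0.7 s constant comes from the project):

```javascript
// A button activates only after the hand has stayed over it
// continuously for the dwell time, and fires once per hover.
const DWELL_MS = 700; // tuned through user testing

class DwellDetector {
  constructor(dwellMs = DWELL_MS) {
    this.dwellMs = dwellMs;
    this.current = null; // button currently hovered (null = none)
    this.since = 0;      // timestamp when the current hover began
    this.fired = false;  // prevents retriggering during one hover
  }
  // Call once per video frame with the hovered button id (or null)
  // and the current time; returns a button id to activate, or null.
  update(buttonId, nowMs) {
    if (buttonId !== this.current) {
      // Hand moved to a different button (or off all buttons): reset.
      this.current = buttonId;
      this.since = nowMs;
      this.fired = false;
      return null;
    }
    if (buttonId !== null && !this.fired && nowMs - this.since >= this.dwellMs) {
      this.fired = true;
      return buttonId;
    }
    return null;
  }
}
```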
Connecting the interface with MAX MSP
Data communication stream:
As the lead developer, I focused on establishing communication between Max MSP and the locally hosted browser interface. Max can use the User Datagram Protocol (UDP) for cross-network communication. UDP is connectionless, so the Max patch simply listens on a specific port and treats incoming datagrams as messages. Its low overhead makes it well suited to real-time control, ensuring a responsive user experience.
Within the UDP transport layer, Max employs Open Sound Control (OSC) messages, a format commonly used for networking audio data. Initially, I planned for the JavaScript browser interface to send OSC messages directly to the Max patch over UDP. However, for security reasons, browsers do not allow JavaScript to open raw UDP sockets. To overcome this limitation, I created a WebSocket/UDP bridge using a Node.js server. The bridge acts as a "middleman": it receives OSC messages from the browser over TCP (via WebSocket) and forwards them to a local port over UDP, where the Max patch listens with the "udpreceive" object. The diagram below illustrates the complete data-transfer pipeline.
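The payload the bridge forwards is a binary OSC packet. A minimal encoder can be sketched as follows (illustrative Node.js; the real bridge likely relied on an OSC library, and the address "/tempo" is a hypothetical example):

```javascript
// OSC strings are ASCII, null-terminated, padded to a 4-byte boundary.
function padString(s) {
  const len = Math.ceil((s.length + 1) / 4) * 4;
  const buf = Buffer.alloc(len); // zero-filled, so padding is already nulls
  buf.write(s, 'ascii');
  return buf;
}

// Encode an OSC message: address, type-tag string (',' + one tag per
// argument), then the arguments (floats as 32-bit big-endian).
function encodeOSC(address, args) {
  const tags = ',' + args.map(a => (typeof a === 'number' ? 'f' : 's')).join('');
  const parts = [padString(address), padString(tags)];
  for (const a of args) {
    if (typeof a === 'number') {
      const b = Buffer.alloc(4);
      b.writeFloatBE(a);
      parts.push(b);
    } else {
      parts.push(padString(a));
    }
  }
  return Buffer.concat(parts);
}
```

The resulting Buffer is what the bridge would hand to a UDP socket addressed at the port Max watches.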

Conclusion
During user testing and the final exhibition, our system received positive feedback and was included in the Dyson School of Design Engineering "Open House" exhibition. The gesture control interface proved intuitive and natural for new users, enhancing the overall audio experience. This outcome validated our initial hypothesis: the project offered a novel and engaging interaction that left a positive impression on users.