Sound UX

Sound Representation of Machine Learning Estimation on Image and Temperature Data by Granular Synthesis

Takuya Yamauchi
Towards Data Science

--

The purpose of this research is to represent captured image data and temperature data by sonification. Sonification is the technique of converting various kinds of data into sound, and it is used in accessibility, media art, and interaction design. We propose a system that generates sound from the minimum distances between moving objects in an image and paths predicted by machine learning, together with temperature data. The sound depends on the minimum distances to paths predicted using k-means clustering, curve fitting, and optical flow, and it was designed with granular synthesis and the beat phenomenon. We also examined the classification of images by the flow vectors of optical flow and the assignment of sounds based on the results.

Ambient intelligence is considered in the present study, and the concept of intelligent space, in which devices are connected through various protocols, has been proposed. Ambient intelligence must be embedded, context aware, personalized, adaptive, and anticipatory. [1] Machine learning and artificial intelligence are key technologies for realizing the adaptive and anticipatory functionality. We have considered environmental space based on object tracking and object recognition, and have assumed that the space, in which devices are connected to each other through various protocols, has intelligence and provides services by means of sound. We also draw on several related approaches, such as the soundscape advocated by R. Murray Schafer and Barry Truax, interaction design, and media art. [2][3]

Sonification and Machine Learning

Figure 1. Conceptual Diagram

In order to implement the proposed system, a responsive system based on sonification and machine learning was considered. Sonification is a technique for converting various types of data into sound, and it has numerous implementations. Here, we focus on the conversion from image data to sound. Figure 1 shows a conceptual diagram of this process. Data related to contrast contours in an image are mapped to designated sounds by sound design. Figure 1(a) shows real-time sound mapping from the image data, where the sound indicates only the current status of the image and contains no future information. Figure 1(b) shows real-time sound mapping from the image data that includes both sounds indicating the current status and predictions by machine learning and AI. Moreover, the system can compare the current status with the predictions as future information. The system was designed based on the type shown in Figure 1(b), and image data were sonified using both real-time data and the prediction.

・Sound Navigation

We considered sound navigation and sound representation of image and temperature. Research on sound navigation, including areas such as accessibility, interaction design, and media art, has been conducted. [4] In order to represent image or temperature data by sound, the following objectives were considered.

1 Sound representation for directions and distances

Data extracted from an image were translated into digital sound and represented as spatial information. The sound was designed according to differences in distance and direction: pitch or tone depended on the distance and direction, and the sound changed as these differences changed. In other words, if the distance between an estimated trajectory and the actual trajectory of an object is large, then the sounds assigned to these trajectories are clearly different (a sketch of such a mapping is given after item 2 below).

2 Sound representation for moving objects using optical flow

Optical flow information consists of the velocity vectors of moving objects, and points in the image are displayed as vectors. The flow vectors were used to define the current status of the scene: we calculated the status using the correlation coefficient of the vectors and assigned sound accordingly.
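As an illustration of the distance-and-direction mapping described in item 1, the following is a minimal sketch that maps a distance and a direction to a pitch and a stereo pan value. The ranges, the linear mapping, and the function name are assumptions for illustration, not the exact mapping used in the system.

```python
def map_distance_direction(distance, direction_deg,
                           d_max=300.0, f_low=220.0, f_high=880.0):
    """Map a distance (pixels) and a direction (degrees) to pitch and pan.

    Illustrative mapping: larger distances raise the pitch linearly
    between f_low and f_high, and the direction is mapped to a stereo
    pan value in [-1, 1].
    """
    d = min(max(distance, 0.0), d_max)
    pitch_hz = f_low + (f_high - f_low) * d / d_max
    pan = (direction_deg % 360.0) / 180.0 - 1.0  # -1 (left) .. +1 (right)
    return pitch_hz, pan

print(map_distance_direction(150.0, 90.0))  # pitch and pan for a mid-range distance
```

With such a mapping, a large deviation between the predicted and the actual trajectory produces a clearly higher pitch than a small one.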

Sonification of Temperatures

We considered sonification of temperatures. The implemented system could obtain temperature data from sensors continuously, and we examined sonification of the logged data.

・System model

Figure 2 System Model

Figure 2 shows the system model used in the present study. One purpose of this system is to represent direction and distance by sound. We embedded a machine learning algorithm in the system in order to estimate path predictions and to reveal differences between the predicted and actual object positions by sound.

Machine Learning

Figure 3. Object Tracking

Figure 3 shows an object tracking image. An image captured by a camera is converted to a binary image by subtracting a previous frame from the original image and thresholding the variation of intensity (right). If objects in the image move continuously, white pixels remain in the subtracted image, and numbers are assigned to the detected objects by a labeling algorithm (left). The trajectories of the objects are used as learning data for path prediction and for the status of the image.
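A minimal sketch of this kind of frame differencing, thresholding, and labeling with OpenCV is shown below; the camera index and the threshold value are assumptions for illustration.

```python
import cv2

# Open the camera (device index 0 is an assumption for illustration).
cap = cv2.VideoCapture(0)
ok, prev = cap.read()
prev_gray = cv2.cvtColor(prev, cv2.COLOR_BGR2GRAY)

while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)

    # Subtract the previous frame and threshold the intensity variation.
    diff = cv2.absdiff(gray, prev_gray)
    _, binary = cv2.threshold(diff, 30, 255, cv2.THRESH_BINARY)

    # Assign a number to each connected group of white pixels (labeling).
    num_labels, labels, stats, centroids = cv2.connectedComponentsWithStats(binary)
    for cx, cy in centroids[1:]:  # label 0 is the background
        cv2.circle(frame, (int(cx), int(cy)), 4, (0, 0, 255), -1)

    cv2.imshow("tracking", frame)
    prev_gray = gray
    if cv2.waitKey(1) == 27:  # press Esc to quit
        break

cap.release()
cv2.destroyAllWindows()
```

Recording the labeled centroids frame by frame yields the trajectories used as learning data.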

・Path Prediction by Polynomial Approximation

Trajectory points are recorded by object tracking, and path prediction is estimated by polynomial approximation. The polynomial approximation is derived by the least squares method.
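A minimal sketch of this least-squares polynomial fitting with NumPy is shown below; the polynomial degree and the sample trajectory are assumptions for illustration.

```python
import numpy as np

# Trajectory points (x, y) recorded by object tracking (illustrative values).
xs = np.array([10, 40, 80, 120, 160, 200], dtype=float)
ys = np.array([300, 280, 250, 230, 220, 215], dtype=float)

# Fit a second-degree polynomial by least squares (the degree is an assumption).
coeffs = np.polyfit(xs, ys, deg=2)
predict = np.poly1d(coeffs)

# Estimate the path beyond the observed points.
future_x = np.array([240.0, 280.0])
print(predict(future_x))
```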

Minimum Distance from Estimated Trajectory Points

Figure 4. Minimum Distance from Estimated Trajectory Point

Figure 4 shows the minimum distance from the estimated trajectory points for a fitted line (a) and a fitted curve (b). The system calculated these minimum distances from the corresponding point-to-line and point-to-curve distance formulas.
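A minimal sketch of one way to compute such a minimum distance is shown below; approximating the distance from a point to the fitted curve by sampling the curve densely is an assumption for illustration rather than the exact formula used in the system.

```python
import numpy as np

def min_distance_to_curve(point, poly, x_range, num_samples=1000):
    """Approximate the minimum distance from a point to a polynomial curve
    by sampling the curve densely over x_range."""
    xs = np.linspace(x_range[0], x_range[1], num_samples)
    ys = np.polyval(poly, xs)
    return np.hypot(xs - point[0], ys - point[1]).min()

# Fitted curve y = 0.01 x^2 - 2 x + 300 and an observed object position
# (both are illustrative values).
poly = np.array([0.01, -2.0, 300.0])
print(min_distance_to_curve((120.0, 230.0), poly, (0.0, 300.0)))
```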

K-Means Clustering for Moving Objects Trajectories

Figure 5. K-means Clustering for Moving Objects

The camera in the system can detect data related to many points on moving objects. In order to classify these points into several groups, k-means clustering was used in the system. Figure 5 shows k-means clustering for moving objects in the case of three clusters, and the three groups are indicated by different colors in the figure. Red points indicate points on moving objects corresponding to pedestrians. The classification of these points into groups depends on the number of clusters, which must be chosen in advance. The optimal number of clusters was decided based on the silhouette coefficient. [5] Figure 6 shows the relationship between the number of clusters and the silhouette coefficient. In Fig. 6, the number of clusters is 3, and the closer the silhouette coefficient is to 1, the better the clustering.

Figure 6. Silhouette Coefficient [5]
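A minimal sketch of choosing the number of clusters with the silhouette coefficient using scikit-learn is shown below; the synthetic test points and the range of cluster counts are assumptions for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Illustrative 2-D points standing in for detected moving-object points.
rng = np.random.default_rng(0)
points = np.vstack([rng.normal(loc, 0.5, size=(50, 2))
                    for loc in ((0, 0), (5, 5), (0, 5))])

# Evaluate the silhouette coefficient for several cluster counts.
for k in range(2, 6):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(points)
    score = silhouette_score(points, labels)
    print(f"k={k}: silhouette coefficient = {score:.3f}")
```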

Figure 7 shows path prediction by k-means and fitting. The system estimated the trajectory of pedestrians by fitting after classifying the moving objects into groups (three clusters). Figure 8 shows the minimum distance from the estimated trajectories. The system calculated the minimum distance between the moving objects (pedestrians) and the estimated trajectories.

Figure 7. Path Prediction
Figure 8. Minimum Distance from Estimated Trajectories

Optical Flow

The Gaussian filter of Gunnar Farneback’s algorithm was used to compute the optical flow and calculate the velocity vectors on a grid at even intervals in the images, in order to measure the vector flows as the atmosphere of the scene. Figure 9 shows an example of velocity vectors calculated by optical flow. The correlation coefficient and the sum of the vectors were also calculated. We examined the calculation of objective criteria based on the velocity vectors obtained by optical flow.

Figure 9. Example of Velocity Vectors by Optical Flow
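A minimal sketch of computing dense optical flow with OpenCV’s Farneback implementation and comparing two flow fields by their correlation coefficient is shown below; the file names, grid spacing, and Farneback parameters are assumptions for illustration.

```python
import cv2
import numpy as np

def flow_vectors(prev_path, next_path, step=16):
    """Compute dense Farneback optical flow and sample it on an even grid."""
    prev = cv2.imread(prev_path, cv2.IMREAD_GRAYSCALE)
    nxt = cv2.imread(next_path, cv2.IMREAD_GRAYSCALE)
    flow = cv2.calcOpticalFlowFarneback(prev, nxt, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    return flow[::step, ::step].reshape(-1, 2)

# Velocity magnitudes for two frame pairs; frames are assumed to share the
# same resolution, and the file names are illustrative.
v1 = np.linalg.norm(flow_vectors("frame_a0.png", "frame_a1.png"), axis=1)
v2 = np.linalg.norm(flow_vectors("frame_b0.png", "frame_b1.png"), axis=1)

# Correlation coefficient between the two sets of velocity magnitudes.
print(np.corrcoef(v1, v2)[0, 1])
```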

Temperature Data

The system uses a thermometer to obtain temperature data, which are then used to determine the status of the space. Parameters derived from the temperature data were sent to a sound engine (Max/MSP) via OSC [6], and sonification was performed.
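A minimal sketch of sending a temperature reading to Max/MSP over OSC with the python-osc library is shown below; the host, port, OSC address, and sensor-reading function are assumptions for illustration.

```python
import time
from pythonosc import udp_client

# Max/MSP is assumed to listen on this host and port (e.g., a [udpreceive 7400] object).
client = udp_client.SimpleUDPClient("127.0.0.1", 7400)

def read_temperature():
    """Placeholder for the actual sensor reading used in the system."""
    return 24.5

while True:
    client.send_message("/temperature", read_temperature())
    time.sleep(1.0)
```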

Sonification

Sound was designed for the system, and the following were considered:

Beat sounds are assigned to the distances from the predicted trajectories.

Sounds generated by granular synthesis and beat change according to control parameters derived from the distances.

Sound Assignment

Before the system notifies the listener of distance or velocity information, the sound is generated by sound mapping: the distances are mapped to sound parameters. We considered beat and granular synthesis for this sound mapping.

Beat

Figure 10. Beat

Beat is the interference pattern between two waves of slightly different frequencies. Figure 10 shows the pattern for the case in which a beat is generated by 440 Hz and 445 Hz sine waves; the pattern is a periodic vibration. Figure 11 shows an example of a Max/MSP patch for beat.

Figure 11. An Example of Max/MSP Patch (Beat)
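A minimal sketch of generating the same beat in Python rather than in Max/MSP is shown below; the sample rate, duration, and output file name are assumptions for illustration.

```python
import numpy as np
from scipy.io import wavfile

sr = 44100  # sample rate (assumption)
t = np.linspace(0.0, 3.0, 3 * sr, endpoint=False)

# Two sine waves of slightly different frequencies produce a 5 Hz beat.
wave = 0.5 * np.sin(2 * np.pi * 440.0 * t) + 0.5 * np.sin(2 * np.pi * 445.0 * t)

wavfile.write("beat.wav", sr, (wave * 0.5 * 32767).astype(np.int16))
```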

Granular Synthesis

We examined sound generation by granular synthesis in the system. The sound obtained by granular synthesis consists of grains, which are samples that are split into small wave fragments. Figure 12 shows the principle of granular synthesis. Grains were made by the following procedure.

1. Extraction of grain in samples

2. Multiply by a window function

Extracted samples s(n) are multiplied by a window function so that the values outside the interval become zero. We designed grains depending on the distances between the moving objects and the estimated trajectory by changing their amplitudes and frequencies. Cross synthesis is signal processing by convolution, which corresponds to the multiplication of two spectral functions.

Figure 12. Granular Synthesis
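A minimal sketch of the two steps above (extracting a grain and multiplying it by a window function) is shown below; the grain length and the Hann window are assumptions for illustration.

```python
import numpy as np

def make_grain(samples, start, grain_len=2048):
    """Extract a grain from a sample buffer and apply a Hann window,
    so that the values fade to zero at the edges of the interval."""
    grain = samples[start:start + grain_len].astype(float)
    window = np.hanning(len(grain))
    return grain * window

# Illustrative source buffer: one second of a 440 Hz sine wave.
sr = 44100
t = np.arange(sr) / sr
source = np.sin(2 * np.pi * 440.0 * t)

grain = make_grain(source, start=1000)
print(grain.shape)
```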

We designed sound by cross synthesis and assumed abstract sound, which includes a few pitches and environmental sounds, such as car noise. Figure 13 shows the spectrum obtained by cross synthesis using two 880-Hz samples of a sine wave and car noise.

Figure 13. Spectrum by Cross Synthesis (Car Noise and Sine Wave)

In Figure 13, the upper sine wave has a maximum amplitude of 0.7, and the lower sine wave has a maximum amplitude of 0.2. Frequencies around 880 Hz exist at the bottom for the lower sine wave due to the effects of the sine wave. In contrast, a range of frequencies exists at the top for the upper sine wave due to the effects of the car noise. This means that the entropy increased at the top for the upper sine wave. Figure 14 shows the granular synthesis patch (Max/MSP). Sound was generated depending on the control parameters concerning the distance between the moving objects and the trajectories.

Figure 14. Granular Synthesis Patch (Max/MSP)
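A minimal sketch of cross synthesis as spectral multiplication is shown below; the file names and the FFT-based processing are assumptions for illustration, and mono input files are assumed.

```python
import numpy as np
from scipy.io import wavfile

# Load two mono sources (file names are illustrative).
sr, sine = wavfile.read("sine_880hz.wav")
_, noise = wavfile.read("car_noise.wav")
n = min(len(sine), len(noise))

# Cross synthesis: multiply the two spectra and transform back to the time domain.
spectrum = np.fft.rfft(sine[:n].astype(float)) * np.fft.rfft(noise[:n].astype(float))
result = np.fft.irfft(spectrum, n)

# Normalize and write the result.
result = result / np.max(np.abs(result))
wavfile.write("cross_synthesis.wav", sr, (result * 0.5 * 32767).astype(np.int16))
```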

System Implementation

Figure 15. System Implementation

The system consisted of a data analyzer, a camera, a thermometer, and a sound engine (Figure 15). The data analyzer analyzes the image data and temperature data using OpenCV and a sensor data library. Data concerning the moving objects and the temperature data were stored in files and analyzed by Python scripts for machine learning. The obtained data were sent to the sound engine (Max/MSP) via the OSC protocol, and sound was generated by granular synthesis and beat in real time.

Summary

Figure 16. Correlation Coefficients of Velocity Vectors

We investigated the correlation coefficients of velocity vectors based on the results of the optical flow data. Figure 16 shows velocity vector images (a), (b), and (c). The red lines indicate velocity vectors; many red lines appear in an image if moving objects are present. We calculated the correlation coefficients of the velocity vectors for each combination of images, and Table 1 shows the results. Figures 16(a) and (b) show no velocity vectors, whereas numerous velocity vectors appear in Fig. 16(c). The correlation coefficient between Figs. 16(a) and 16(c) is very low (0.026171), whereas that between Figs. 16(b) and 16(c) is higher (0.2745), reflecting the similarity of the velocity vectors.

The conclusions of the present study are as follows:

1 The correlation coefficient and the average of the velocity vectors are criteria for judging the situation in an image.

2 The distance between moving objects and the trajectories obtained by path prediction is an important factor for notifying the situation in an image by sound.

REFERENCES

1. Aarts, E., and B. Eggen (eds.) [2002]. Ambient Intelligence Research in HomeLab. Philips Research, Eindhoven.

2. Martyn Dade-Robertson, “Architectural User Interfaces: Themes, Trends and Directions in the Evolution of Architectural Design and Human Computer Interaction”, International Journal of Architectural Computing, March 1, 2013.

3. Takuya Yamauchi, “Designing Sound Representations for Responsive Environments”, The 22nd Annual International Conference on Auditory Display (ICAD 2016), Australian National University, Canberra, 3–7 July 2016.

4. Takuya Yamauchi and Toru Iwatake, “Sound Jewelry”, Leonardo Music Journal, No. 18, MIT Press.

5. Sebastian Raschka, “Python Machine Learning”, Packt Publishing.


6. Wright, M., Freed, A., “Open Sound Control: A New Protocol for Communicating with Sound Synthesizers”, International Computer Music Conference, Thessaloniki, Greece, 1997.
