
Advanced Music Analytics using Machine Learning


Photo by Thomas Kelley on Unsplash

Making Sense of Big Data

For a few years, I worked for a subsidiary of Sony Music that focused on distributing indie music. Compared to the corporate offices of Sony farther uptown, the atmosphere was pretty laid back, and I made some good friendships during that time. The company did a lot more than just distribute music to services like Spotify or Apple Music; they also formed relationships with artists to market and advertise their music and nurture them professionally. As more of a boutique shop, they could work more intimately with artists. In fact, they advertise themselves as a company that "empowers artists and labels to connect with fans across the globe."

Now, I didn’t do any of that myself, at least not directly. I was just a lowly data engineer in the tech department, whose job, supposedly, was to maintain and develop the software this business model relied on. In many ways, the tech department just did what it was told, producing the best-engineered system it could but ultimately just following spec. The specification of a new product or feature was the end result of a long process of surveying clients. By the time the spec was complete and the software deployed, customer needs had already shifted. To address this problem, as well as to foster creativity within the tech org generally, the company tried to empower engineers to pursue new ideas and iterate on products quickly. In the following months, through hackathons and other initiatives aimed at getting the tech department to exercise its creativity a little more, we worked closely with product and design not just to execute on, but to actually define, new products and prototype them quickly.

In this article, I’d like to present the results of some of these efforts centered around revamping the presentation of our music analytics data. To give some background, over the prior year, in my role as a data engineer, I’d worked to improve the analytics part of the company’s platform which allowed label managers to track the number of streams their music had generated. While the various graphs and tables of music streams on the page gave a rough sense of performance, they failed to reveal major trends and yielded little insight into actions needed to improve performance. The filters on the page allowed users to drill down to certain artists or songs, but they didn’t provide a holistic view of an artist’s popularity, conflicting with the artist-centric focus of the rest of the company. To overcome these limitations, I wanted to see how machine learning could be used to discover novel information about artists that wasn’t immediately obvious from an examination of their streams. Ultimately, I came up with two complementary approaches for advanced artist-centric analytics, shown below.

(Image by author)

In the first approach, called automatic fanbase segmentation, an artist’s listeners are broken down into different categories based on how devotedly they listen. This approach was inspired by prior work by the data analytics team, which provided business insights to clients and the rest of the company but operated largely independently of the engineering team working on the analytics software. Each quarter they provided our top labels with a breakdown of their listeners into similar categories based on level of commitment, as shown below. The stream-count thresholds used to divide the categories were chosen somewhat arbitrarily, based on a statistical analysis of that label’s data alone. For example, if 50% of the listeners of a label’s music stream up to 119 times per year, then listeners with up to that many streams are categorized as "interested." Because these results are ultimately just a reflection of the underlying distribution of how many users streamed a label’s music a certain number of times, they don’t reveal what a committed versus casual fan looks like in a way that translates across labels and artists. To address these shortcomings, I used machine learning to automatically infer listener profiles across a large section of our data, allowing different artists to be easily compared.

(Image by author)

In the second approach, called artist clustering, I built on these ideas to provide another way to compare artists and identify growth opportunities. Because the company is an indie shop, with a catalogue spread across many individual artists rather than the comparatively few major artists signed by corporate, there had been some prior work on artist clustering and similarity. However, the graph methods used to compute artist similarity were slow and sensitive to noise, preventing them from being used to derive insights. Using dimensionality reduction techniques paired with Gaussian mixture model clustering, I developed a method that cuts through the noise to compare artists much faster. With this improved method, I generated actionable insights on where to market artists based on where they’re underperforming compared to artists who share a lot of common listeners.

In the remainder of this post, I’ll dive into these new approaches based on machine learning and the envisioned products centered on them.

Automatic Fanbase Segmentation

Before getting into the approach, let’s look at the data. As mentioned above, after music is distributed to vendors like Spotify and Apple Music, they return data about how users streamed that music (see below). Each row of the data corresponds to a listener streaming a track at a given time, together with other contextual data like which country they streamed from or which operating system they used. The data is retrieved from a variety of sources each day, then loaded into the data warehouse.

(Image by author)

The goal is to use this data to automatically break down an artist’s listeners into different commitment levels, or fan profiles. It’s not obvious how to formulate this problem in the traditional language of supervised or unsupervised learning, but by drawing an analogy in which fan profiles correspond to topics and music streams correspond to words, I used a popular method from topic modeling to identify fan profiles automatically, given only the number of profiles as prior knowledge. Before applying this method, I transformed the data above into a matrix analogous to the document-term matrix, as shown below. In a traditional document-term matrix, on the left, each element corresponds to the number of occurrences of a given word (column) in a given document (row). When translating the problem into the music domain, on the right, artists represent documents, but instead of keeping track of all their individual listeners, we discretize listeners into different bins depending on how many times they streamed. For example, the value of 100 in the first cell means 100 listeners streamed "Artist 1" between 1 and 50 times.

(Image by author)
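To make the transformation concrete, here’s a minimal sketch in pandas of how raw streaming rows could be binned into this artist-by-streaming-level matrix. The column names, bin edges, and toy data are illustrative, not the production schema:

```python
import numpy as np
import pandas as pd

# Hypothetical raw streaming rows: one row per stream event.
streams = pd.DataFrame({
    "artist_id":   ["a1", "a1", "a2", "a2", "a2"],
    "listener_id": ["u1", "u1", "u1", "u2", "u2"],
})

# Count how many times each listener streamed each artist.
counts = (
    streams.groupby(["artist_id", "listener_id"])
           .size()
           .reset_index(name="n_streams")
)

# Discretize stream counts into streaming-level bins.
bins = [0, 50, 100, 150, 200, np.inf]
labels = ["1-50", "51-100", "101-150", "151-200", "200+"]
counts["level"] = pd.cut(counts["n_streams"], bins=bins, labels=labels)

# Pivot into the artist-by-level matrix: each cell counts the listeners
# of that artist who fall into that streaming level.
matrix = counts.pivot_table(index="artist_id", columns="level",
                            values="listener_id", aggfunc="count",
                            fill_value=0, observed=False)
```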

Discretizing in this way not only reduces the number of columns in the table, but it also completes the analogy between topic modeling and music streams. If a topic modeling algorithm automatically determines topics represented as distributions over words using the matrix on the left, then the same process applied to the matrix on the right should return "topics" that are distributions over streaming levels. These "topics" are interpreted as fan profiles. In the context of a common generative model used in topic modeling called latent Dirichlet allocation (LDA), the topics are determined by maximizing the probability that the documents contain the words they do, given a breakdown of those documents into topics. The depiction of this generative model in the music domain is shown below. (The identities of the artists are sensitive information, so artist images have been blurred.)

(Image by author)

In this example, there are three fan profiles, or topics, representing different levels of listener engagement: passive, casual, and diehard. In the music catalogue as a whole, they occur in the proportions shown at the top. Each individual artist, however, can have a different mix of passive, casual, and diehard fans comprising their fanbase. The less popular artist, shown on the left, might have more passive or casual fans, while a more popular one has a greater proportion of diehard fans. The actual counts of streams observed for each artist are based on this breakdown, so that, for example, the artist with more diehard fans has more streams.
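Fitting a model like this is straightforward with scikit-learn’s LatentDirichletAllocation. The following is a minimal sketch rather than the original pipeline, and it assumes the `matrix` variable built in the earlier snippet:

```python
from sklearn.decomposition import LatentDirichletAllocation

# Fit LDA on the artist-by-streaming-level matrix, asking for three
# latent "topics" (the fan profiles).
lda = LatentDirichletAllocation(n_components=3, random_state=0)
lda.fit(matrix.values)

# Each row of components_ is an unnormalized distribution over
# streaming levels; normalizing yields the fan profiles.
profiles = lda.components_ / lda.components_.sum(axis=1, keepdims=True)
```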

Applying LDA to months’ worth of streaming data using three topics resulted in the fan profiles shown below. Each profile is a distribution over listens. As expected, the three profiles break down roughly into passive, casual, and diehard fans. (No prior information on their shape had been provided, so it’s nice to see three easily interpretable topics fall out of the training.) Again, these profiles can be understood by analogy to topic modeling. A topic is a distribution over words representing the chances of those words appearing in discussion of that topic; sampling from it produces words in accordance with those probabilities. Similarly, sampling from a fan profile yields a certain number of streams. Sampling from the passive fan profile is more likely to result in fewer streams than sampling from the diehard profile. In the middle of the graph, where the number of streams is between the two extremes, the algorithm has a hard time telling the three profiles apart. The choppiness of the curves is probably an artifact of discretization, which might be alleviated by widening the bins there.

(Image by author)

These fan profiles also allow you to make inferences in the reverse direction, determining which fan profile is likely to have led to a given number of music streams. Let’s say you know someone has listened to a certain artist 20 times. Which type of fan would you say they are? In the figure, imagine drawing a vertical line at 20 streams. That line intersects with all three fan profiles. The fan profile it intersects with at the highest probability level is the type of fan you should identify them as. In this instance, they’d be a passive fan, but if they had listened 120 times, they’d be a diehard fan. With this maximum likelihood approach, the crossover from passive to casual fan occurs at about 60 streams, and from casual to diehard at around 110. These thresholds may be sensitive to the discretization, but the point is that this method has the power to automatically discern these boundaries, by varying the distributions to fit the counts of observed streams.
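The maximum-likelihood rule itself is just an argmax across the profiles at the bin containing the stream count. A sketch, assuming the hypothetical `profiles` and bin edges from the earlier snippets:

```python
import numpy as np

def classify_fan(n_streams, profiles, bin_edges, profile_names):
    """Assign a listener to the fan profile most likely to have
    produced their stream count (maximum likelihood)."""
    # Find which streaming-level bin the count falls into.
    bin_idx = np.searchsorted(bin_edges, n_streams, side="left") - 1
    # Pick the profile with the highest probability in that bin.
    return profile_names[int(np.argmax(profiles[:, bin_idx]))]

names = ["passive", "casual", "diehard"]
edges = np.array([0, 50, 100, 150, 200])
# In the example above, classify_fan(20, profiles, edges, names)
# would return "passive".
```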

Representing fans using probability distributions over the full range of music listens also provides an additional layer of meaning and nuance compared to the simple threshold-based approach touched on above. With a fixed threshold, a diehard fan might be one with more than 230 streams; with just one fewer stream, they wouldn’t be considered diehard. With fan profile distributions, the probability of being one type of fan instead of another varies more smoothly. Of course, this distinction matters less if you ultimately fall back on a threshold-based rule to discriminate between fans, as discussed above. Still, the full probabilistic treatment allows even diehard fans to sometimes produce a small number of streams, and passive listeners to produce a lot, making the model more flexible in capturing the overall streams observed across the data.

Below is the fan breakdown for three different artists chosen based on overall streaming. As expected, the least popular artist, shown on the left, has fewer diehard fans than the most popular artist, on the right. The most popular artist also has more casual listeners, which makes sense given their popularity. The artist in the middle has a fan breakdown somewhere in the middle. The nice part of this analysis is that all three artists, regardless of overall streaming performance, can be compared on the same scale.

(Image by author)
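In scikit-learn terms, these per-artist breakdowns are the document-topic distributions, available via transform. Continuing the sketch from above:

```python
# Infer each artist's fan mixture (the document-topic distribution
# in topic-modeling terms).
fan_mix = lda.transform(matrix.values)   # shape: (n_artists, 3)

# Breakdown for the first artist, as percentages:
for name, share in zip(["passive", "casual", "diehard"], fan_mix[0]):
    print(f"{name}: {share:.0%}")
```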

The LDA model can also be used to deliver marketing recommendations based on demographic and geographic analyses, as shown below for the least popular artist, who stands to benefit the most from targeted growth opportunities. Each of his listeners was uniquely assigned to one of the fan types, as described above, enabling aggregates of ages and nationalities to be generated across the different fan types. Because a disproportionate number of passive and casual fans are in their twenties (note that the circled area in the figure is in error), effort could be made to attract these listeners and convert them into diehard fans. The same applies to passive and casual listeners in Germany and Sweden.

(Image by author)
(Image by author)
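Once each listener is labeled with a fan type, the aggregation reduces to a group-by. A sketch, assuming a hypothetical per-listener table and the classify_fan helper from above:

```python
import pandas as pd

# Hypothetical per-listener table with demographic context.
listeners = pd.DataFrame({
    "listener_id": ["u1", "u2", "u3"],
    "n_streams":   [20, 120, 65],
    "age":         [24, 31, 27],
    "country":     ["DE", "SE", "US"],
})

# Label each listener with their most likely fan type, then aggregate
# nationalities (or, analogously, ages) per fan type.
listeners["fan_type"] = listeners["n_streams"].apply(
    lambda n: classify_fan(n, profiles, edges, names)
)
by_country = listeners.groupby(["fan_type", "country"]).size()
```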

During a hackathon following my design and implementation of this new approach to analytics, I even had an opportunity to work with design to envision what a product centered around this approach would look like, as shown below. The company had recently developed a mobile app for displaying analytics, so the designs show what a mobile user would see when navigating to the hypothetical artist screen (left). When they click into diehard fans, for example, they’re shown some demographic and geographic insights (right). Though these mockups don’t exploit the full potential of the model, they’re a good start at delivering more targeted, artist-centric insights that may not be evident from simply monitoring streaming performance.

(Image by author)

Artist Clustering

Following the work on automatic fanbase segmentation, I explored another way of generating artist-centric insights. Instead of identifying hidden fan types that comprise an artist’s fanbase, this new approach involved clustering artists themselves into groups.

The motivation behind clustering artists is easy to understand. If two artists are in the same group, they’re somehow similar, and if they’re similar, then one artist’s streaming performance can be used to draw conclusions about the other’s. Artists can be grouped using different data, and different data yield different groupings. For example, artists can be grouped using metadata like their genre, but the groups generated that way might not be as insightful as ones based on actual listener data. See the figure below. Artists and listeners form a graph in which an arrow from an artist to a listener indicates that the listener streamed that artist. Grouping artists based on listenership means, intuitively, that artists in the same group have more listeners in common. For example, the artists within each group have two listeners in common, whereas artists from different groups share at most one listener.

(Image by author)
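As a toy illustration of co-listenership (not the production graph code): if A is a binary artist-by-listener matrix, then A @ A.T counts the listeners every pair of artists shares.

```python
import numpy as np

# Toy binary artist-by-listener matrix: A[i, j] = 1 if listener j
# streamed artist i.
A = np.array([
    [1, 1, 0, 0],   # artist 1
    [1, 1, 1, 0],   # artist 2
    [0, 1, 1, 1],   # artist 3
])

# A @ A.T gives, for each pair of artists, their shared listener count.
shared = A @ A.T
print(shared[0, 1])  # 2: artists 1 and 2 have two listeners in common
```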

Using data of actual listenership has the potential to reveal patterns that might not be immediately obvious, but it requires traversing a large graph of all artists and listeners, which is computationally intensive and sensitive to noise. That’s where dimensionality reduction comes in handy. With dimensionality reduction, you can take a big matrix of all artists and their listeners, shown below, and compress it down to a much smaller matrix that captures most of the essential information. Because there are many more listeners than artists, dimensionality reduction can greatly cut down on the number of columns. Each column of the reduced matrix then represents an artist feature, computed based on what kinds of people listen to them.

(Image by author)
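scikit-learn’s TruncatedSVD performs this compression directly on a sparse matrix. A minimal sketch, reusing the toy matrix from above (the real matrix and component count would be far larger):

```python
from scipy.sparse import csr_matrix
from sklearn.decomposition import TruncatedSVD

# The real artist-by-listener matrix is huge and sparse, so a sparse
# format keeps it tractable; the toy matrix A stands in here.
listen_matrix = csr_matrix(A.astype(float))

# Compress the listener dimension down to a handful of artist features.
svd = TruncatedSVD(n_components=2, random_state=0)
artist_features = svd.fit_transform(listen_matrix)   # (n_artists, 2)
```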

Features computed using dimensionality reduction don’t have clear interpretations in terms of familiar concepts, like genre, used to distinguish between artists. Even without clear designations, it’s often helpful to think about what the features might represent based on the data used to compute them. For features computed from actual listenership, one possibility is that they represent the overall tastes of an artist’s audience, depicted below. For the artist on the left, dimensionality reduction might reduce all the complexity of his fanbase down to just four numbers representing its preferred genres (e.g., 50% of his fanbase prefers rock to all other genres). The artist on the right has a different breakdown, reflecting an audience that favors pop and hip-hop. Comparing artists using their features is often a better way to get at the essence of their relationship than looking at their individual listeners, who may or may not listen to both.

(Image by author)

Based on this central idea, I developed another approach for making recommendations on where best to market artists. Instead of targeting regions where an artist’s passive and casual listeners reside, this approach targets areas where similar artists outperform them in terms of overall streams. Because similar artists may have audiences with similar tastes, a country where one artist became really popular may be a fruitful place to market a similar-sounding artist. See the overall flow of the system below.

(Image by author)

In the left module, the artist model is built by collecting streaming data; generating artist features using a popular dimensionality reduction technique called truncated singular value decomposition; and clustering artists based on these features using Bayesian Gaussian mixture modeling. An artist cluster can be visualized as a graph where artists are nodes and edges connect artists considered most similar based on their features. See an example artist cluster below visualized using Neo4j.

(Image by author)
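The clustering step can be sketched with scikit-learn’s BayesianGaussianMixture; the synthetic features below stand in for the real SVD output:

```python
import numpy as np
from sklearn.mixture import BayesianGaussianMixture

# Synthetic stand-in for the SVD artist features (many artists,
# a small number of features).
rng = np.random.default_rng(0)
features = rng.normal(size=(500, 20))

# The Bayesian variant can effectively switch off unneeded mixture
# components, so n_components acts as an upper bound on cluster count.
bgm = BayesianGaussianMixture(n_components=10, random_state=0)
cluster_ids = bgm.fit_predict(features)
```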

The compressed artist model is saved to disk and loaded by the second module, the geographic model. That module gets the most similar artists to a given artist using the artist model, then uses their historical streams broken down by country to draw conclusions. A product based on this system could display some of these conclusions in the following way.

(Image by author)

The places where artists most similar to the selected artist are generating the most streams are ranked and displayed on a map, providing crucial information for future marketing campaigns.

Taking the analysis one step further, the average of the most similar artists’ streams can be compared to the artist’s actual streams. Those countries with the biggest performance gap between actual and predicted streams (based on similar artists’ average performance) are identified as the top regions to target. We didn’t create a software product based on these new analytics, but I did work closely with the data analytics team to create a Looker dashboard showing the top countries to target and the gap between the artist’s streams and those of his most similar artists, which we called "potential streams." Below is an illustration of the information contained in the Looker dashboard.

(Image by author)
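The gap computation behind "potential streams" can be sketched as follows. The function, its parameters, and the toy data are hypothetical illustrations of the approach, not the production code:

```python
import numpy as np
import pandas as pd
from sklearn.metrics.pairwise import cosine_similarity

def potential_streams(artist_idx, features, country_streams, k=5):
    """Rank countries by the gap between an artist's actual streams and
    the average streams of their k most similar artists."""
    sims = cosine_similarity(features[artist_idx:artist_idx + 1],
                             features).ravel()
    sims[artist_idx] = -np.inf               # exclude the artist itself
    neighbors = np.argsort(sims)[-k:]        # k most similar artists
    predicted = country_streams.iloc[neighbors].mean()
    gap = predicted - country_streams.iloc[artist_idx]
    return gap.sort_values(ascending=False)  # biggest shortfalls first

# Toy example: 4 artists' features and their streams by country.
feats = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
streams_by_country = pd.DataFrame({"DE": [100, 900, 50, 40],
                                   "SE": [20, 300, 10, 5]})
print(potential_streams(0, feats, streams_by_country, k=1))
```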

These new Looker models were presented to internal stakeholders to empower them to make the best decisions on where to promote new and maturing artists.

Conclusion

In this article, I showed how machine learning can be used in the music industry to uncover hidden patterns contained in streaming data, like latent fan types and artist similarity. These hidden patterns can drive the development of new analytics products and internal reporting, allowing users to draw conclusions about their artists’ performance that can’t be reached through a simple presentation of streaming numbers. With advanced analytics, label managers have more tools at their disposal to make their artists as successful as possible.

Code

To explore the code for automatic fanbase segmentation, please see https://github.com/jchryssanthacopoulos/dirichlet; for artist clustering, https://github.com/jchryssanthacopoulos/artist_clustering.

