Analyzing My Google Music Library

3 minute read

I’m a massive music fan. If I’m working, running, driving, or just sitting at home, I’m listening to something. When I’m asked what I usually listen to I always claim that my music taste is varied, but in the interest of science I’d like to put that to the test. I want to back my claims up with actual data and figure out what genres and musicians I lean towards using actual data. I’ve been subscribed to Google Music for about 5 years now, so any data sourced from the service should be more than adequate.

There’s No API

Unlike other popular streaming services Google Music doesn’t have an API I can use to get my playlist data. They do however display most of the data I need in a table on the Music Library > Songs page. While this isn’t complete it has most of the data I’ll need to get started.

Thinking this would be easy I spent a little bit of time looking through Google Chrome’s Network tab trying to find a magic JSON object with all of the data I needed, but it just didn’t exist. Therefore in order to begin any kind of data crunching I’ll need to create a script that scrapes data from this page.

As the DOM is pretty well structured I made a pretty simple script that saves all of this data into an array. The kicker is that due to how Google Music lazy loads the songs you’ll need to scroll through the page at least once until all of the play list songs have been added.

const songList = {}

const addSong = () => {
  [].forEach.call(document.querySelectorAll(".song-row"), (e) => { 
    // Collects the song data from the .song-row div.
    const songId = e.dataset.id;
    const artist = e.querySelector("td[data-col='artist']").innerText
    const album = e.querySelector("td[data-col='album']").innerText
    const title = e.querySelector("td[data-col='title'] .column-content").innerText
    const plays = e.querySelector("td[data-col='play-count']").innerText

    // As a de-duping method we use Google Music's song ID to key the item
    // in the object. We also skip pushing an item if it has no plays
    // as that data isn't valueable to us.
    if (!songList[songId] && plays) {
      songList[songId] =
        {
          artist,
          album,
          title,
          plays
        }
    }
});
}

// Google Music lazy loads the data on the page. Therefore we can watch for it as we scroll down using DOMNodeInserted.
document.body.addEventListener("DOMNodeInserted", addSong, false);

This results in a data structure that looks like the following.

{
  "2a7d1fc11-26b9-35e4-95f7-bae4f84861b6": {
    "artist": "Die Antwoord",
    "album": "Mount Ninji and da Nice Time Kid",
    "title": "Street Light",
    "plays": "10"
  },
  "c89e60b3-fc37-3cfe-a385-39c54631d3c1": {
    "artist": "DJ Rashad",
    "album": "Double Cup",
    "title": "She A Go (feat. Spinn & Taso)",
    "plays": "10"
  },
  "cc2c8102-93c4-3e6d-adfb-a713722a139e": {
    "artist": "Drake",
    "album": "More Life",
    "title": "Get It Together (feat. Black Coffee & Jorja Smith)",
    "plays": "10"
  },
}

This data can then be exported from the Chrome Developer Console by typing JSON.stringify(songList) and hitting the copy button. We can also convert the data into an array with JSON.stringify(Object.values(songList)).

Getting the Genre

The missing piece is the music genre as this isn’t displayed with the song. As I can’t add it as part of the initial scraping script I’ll need to figure out an alternative route.

Last.fm has an API which lets you query a song by name and artist. In return it will provide some metadata about the song along with a series of user generated tags that can sometimes describe the songs genre.

I created a small Python script that hits this endpoint for every track in the array and attaches the tags from Last.fm.

import requests
import json


def gather_data(song_list):
    """ Gathers data from the Last.fm api. """
    song_data = list()

    for song in song_list:
        response = requests.get(
            'http://ws.audioscrobbler.com/2.0/?method=track.getInfo',
            params={
                'api_key': '###',
                'format': 'json',
                'artist': song['artist'],
                'track': song['title']
            },
        )

        json_response = response.json()

        # Ensures that the response includes a track it found.
        if 'track' in json_response:
          tags = json_response['track']['toptags']['tag']

          tag_data = list()

          for tag in tags:
            tag_data.append(tag['name'])

          song_info = {
            'artist': song['artist'],
            'album': song['album'],
            'title': song['title'],
            'plays': song['plays'],
            'tags': tag_data
          }

          song_data.append(song_info)

    return song_data

After running the script I have a list of songs with their plays, albums, and Last.fm tags. The tags provided by Last.fm are community driven and un-curated meaning the data is questionable at best. I was pretty unsatisfied with this. For some of my most played songs the tag list was entirely empty.

{
  "artist": "Kano",
  "album": "Made In The Manor",
  "title": "GarageSkankFREESTYLE [Bonus Track]",
  "plays": "22",
  "genres": []
},

And for others the tags didn’t even describe a genre at all and included some pretty ridiculous results.

{
  "artist": "Drake",
  "album": "Views",
  "title": "9",
  "plays": "27",
  "genres": ["Hip-Hop", "epic", "fav", "Legendary", "like a lot"]
},

Going back to the drawing board I decided to give the Spotify API a try. They have a search endpoint which allows you to query by artist name which seemed somewhat promising. I wrote another small Python script which does something very similar to the first one, instead this time it attaches Spotify’s genres array.

import requests
import json

def get_spotify_query(query):
  response = requests.get(
    'https://api.spotify.com/v1/search',
    headers={
      'Authorization': 'Bearer ...',
    },
    params={
      'q': query,
      'type': 'artist'
    }
  )

  json_response = response.json()

  return json_response


def gather_data(song_list):
    """ Gathers data from the Spotify api. """
    song_data = list()

    for song in song_list:
      query = get_spotify_query(song['artist'])

      # If the search results come back with nothing then genres should be an empty array.
      try:
        genres = query['artists']['items'][0]['genres']
      except IndexError:
        genres = []
        
      song_info = {
        'artist': song['artist'],
        'album': song['album'],
        'title': song['title'],
        'plays': song['plays'],
        'genres': genres
      }

      song_data.append(song_info)

    return song_data

The data returned looks much more promising. I’d prefer to query by song specifically, but the tracks query doesn’t appear to return the genre in the payload so this will have to do. Overall it looks pretty accurate compared to the Last.fm examples.


  {
    "artist": "Kano",
    "album": "Made In The Manor",
    "title": "GarageSkankFREESTYLE [Bonus Track]",
    "plays": "22",
    "genres": ["grime", "uk alternative hip hop", "uk hip hop"]
  },
  {
    "artist": "Drake",
    "album": "Views",
    "title": "Redemption",
    "plays": "21",
    "genres": [
      "canadian hip hop",
      "canadian pop",
      "hip hop",
      "pop rap",
      "rap",
      "toronto rap"
    ]
  },

Crunching the Numbers

Now that I have my song data, all that is left to do is sort through it. For this I’m going to use JavaScript’s array sort, filter and reduce methods. I can use these to get the most plays by a specific artist or genre.

// Sorts by most plays.
songs.sort((a, b) => a.plays - b.plays)

// Sorts by most plays by a specific artist.
songs
  .filter((song) => song.artist === 'Die Antwoord')
  .sort((a, b) => a.plays - b.plays)

// Sorts by most plays by a specific genre.
songs
  .filter((song) => song.genres.includes('grime'))
  .sort((a, b) => a.plays - b.plays)

Using reduce you can flatten the genres and count their occurrence. This will allow me to see what genres occur the most throughout the sample data.

const flattened = array.reduce((acc, data) => acc.concat(data.genres), [])

function countOccurances(object, genre) {
  object[genre] = (++object[genre] || 1);

  return object;
}

flattened.reduce(countOccurances, {});

Results

After going through the results I’ve concluded the following.

My top song is Big Lie by Post Malone with 80 plays. I’ve listened to this time and time again and for some reason I just really enjoy the tune.

My top three genres are Rap, followed by Grime and Trance. In the Trance category with 33 plays my top song is Thriller by Ben Gold.

Secondly in the Grime category with 56 plays is Make it Or Break It by Bugzy Malone.

And lastly in the Rap category my top song is Do Not Disturb by Drake with 72 plays.

The album I’ve listened to the most is Integrity by Jme, followed by Stoney (Deluxe) by Post Malone.

Closing Thoughts

Turns out my music tastes are only somewhat varied after all. This was a pretty fun experiment to conduct, and I’m happy that I was able to draw some results. I’d love it if Google would release a proper API for their Google Music service though.

If you have any questions please feel free to reach out on Twitter.