Karim's Portfolio

Spotify Playlist Recommendation

I built this project to develop my data analytics skills. It aims to create a recommendation system for Spotify playlists by analysing existing playlists and matching them with other songs that best fit them.

The Jupyter Notebook and project details can be found in this git repo.

Project Details

Having already used vector similarity techniques in my third-year individual project, I decided to apply them here to find similarities between songs based on their audio features. I also wanted to explore large-scale data manipulation to build my skills in data processing. The Spotify dataset seemed like both the most suitable and the most interesting choice.

The goal was to create a system that could generate personalised song recommendations by analysing a user's existing Spotify playlists. This involved processing a substantial dataset from Spotify, consisting of numerous tracks and their respective audio features, such as tempo, energy, danceability, and more.

Technologies Used:

Python/Jupyter Notebook: Used to structure and run the analysis throughout this project.
Pandas and NumPy: For data preprocessing and all numerical operations.
Scikit-learn: Used to implement machine learning algorithms and preprocessing techniques.
Matplotlib and Seaborn: For data visualisation, helping to interpret the data effectively.

Collection and Preparation

I started by gathering song data from a pre-existing dataset which included detailed features of each track such as track IDs, names, artist names, and audio features (e.g., danceability, energy, key). The preprocessing phase involved cleaning the data, handling missing values, and standardising the format of the audio features to ensure accurate comparisons.
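The cleaning and standardising steps described above could look roughly like the sketch below. It is not the notebook's actual code: the column names, the toy values, and the helper prepare_features are assumptions made for illustration, and dropping rows with missing features is only one of several reasonable ways to handle missing values.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Hypothetical toy data; the real dataset's columns may be named differently.
tracks = pd.DataFrame({
    "track_id": ["a", "b", "c", "d", "a"],
    "danceability": [0.8, 0.5, np.nan, 0.3, 0.8],
    "energy": [0.9, 0.4, 0.7, 0.2, 0.9],
    "tempo": [120.0, 95.0, 140.0, 80.0, 120.0],
})
feature_cols = ["danceability", "energy", "tempo"]


def prepare_features(df, feature_cols):
    """Drop duplicate and incomplete tracks, then standardise the features."""
    clean = (
        df.drop_duplicates(subset="track_id")
          .dropna(subset=feature_cols)
          .reset_index(drop=True)
          .copy()
    )
    # Zero mean and unit variance per column, so that tempo (in BPM) does not
    # dominate features that live on a 0-1 scale.
    clean[feature_cols] = StandardScaler().fit_transform(clean[feature_cols])
    return clean


prepared = prepare_features(tracks, feature_cols)
```

Standardising matters here because the audio features sit on very different scales, and an unscaled tempo column would otherwise swamp the others in any distance or similarity calculation.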

Feature Engineering

To understand the underlying patterns in the data, I explored the audio features and used Principal Component Analysis (PCA) to reduce dimensionality. This distilled the features into a more manageable form while retaining most of the variance in the data, making it easier to compute similarities between tracks.
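As a rough illustration of the PCA step, the sketch below reduces a synthetic feature matrix with scikit-learn. The data is randomly generated as a stand-in for the standardised audio features, and the 90% variance threshold is an assumption chosen for the example, not a value taken from the notebook.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)

# Synthetic stand-in: 8 correlated "audio features" driven by 3 latent factors.
latent = rng.normal(size=(200, 3))
mixing = rng.normal(size=(3, 8))
features = latent @ mixing + 0.1 * rng.normal(size=(200, 8))

# A float n_components keeps the fewest components whose cumulative
# explained variance exceeds that fraction (here 90%, an assumed threshold).
pca = PCA(n_components=0.9, svd_solver="full")
reduced = pca.fit_transform(features)

print(reduced.shape)
print(pca.explained_variance_ratio_.cumsum())
```

Checking the cumulative explained variance is a quick way to see how much information the reduced representation keeps.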

Recommendation Algorithm

The heart of the notebook is the recommendation algorithm. The method is a form of content-based filtering, which uses track features to find similarities between songs in a user's existing playlist and the broader Spotify library dataset. Similarity between tracks is calculated as the cosine similarity between their feature vectors.
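For reference, cosine similarity is the dot product of two vectors divided by the product of their lengths, so it measures direction rather than magnitude. The minimal sketch below computes it by hand and with scikit-learn; the two-dimensional vectors and the helper cosine are made up for illustration.

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity


def cosine(a, b):
    """Cosine similarity of two 1-D vectors: dot product over product of norms."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


# Toy feature vectors (hypothetical): one playlist track, three library tracks.
playlist_vecs = np.array([[1.0, 0.0]])
library_vecs = np.array([
    [2.0, 0.0],  # same direction as the playlist track
    [0.0, 3.0],  # orthogonal
    [1.0, 1.0],  # 45 degrees apart
])

# Rows are playlist tracks, columns are library tracks.
sims = cosine_similarity(playlist_vecs, library_vecs)
print(sims)  # approximately [[1.0, 0.0, 0.707]]
```

Because only direction matters, the first library track scores 1.0 even though it is twice as long as the playlist vector.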

The final and main function, generate_playlist_recos, takes any given user's playlist and the complete dataset as inputs, finds song similarities, and outputs a list of recommended tracks that are not yet in the user's playlist but share similar audio characteristics.
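A function with this behaviour might be sketched as follows. Only the name generate_playlist_recos comes from the notebook. The signature, the column names, and the choice to summarise the playlist by its mean feature vector are assumptions made for this example, so the real implementation may well differ.

```python
import pandas as pd
from sklearn.metrics.pairwise import cosine_similarity


def generate_playlist_recos(playlist_df, library_df, feature_cols, top_n=10):
    """Return the top_n library tracks most similar to the playlist.

    Assumption: the playlist is summarised by the mean of its feature vectors.
    """
    profile = playlist_df[feature_cols].mean(axis=0).to_numpy().reshape(1, -1)

    # Only recommend tracks that are not already in the playlist.
    in_playlist = library_df["track_id"].isin(playlist_df["track_id"])
    candidates = library_df[~in_playlist].copy()

    candidates["similarity"] = cosine_similarity(
        candidates[feature_cols].to_numpy(), profile
    ).ravel()

    ranked = candidates.sort_values("similarity", ascending=False).head(top_n)
    return ranked[["track_id", "name", "artist", "similarity"]]


# Hypothetical toy library with two already-scaled features.
library = pd.DataFrame({
    "track_id": ["t1", "t2", "t3", "t4", "t5"],
    "name": ["One", "Two", "Three", "Four", "Five"],
    "artist": ["A", "A", "B", "C", "D"],
    "f1": [1.0, 0.9, 0.8, -0.9, 0.1],
    "f2": [0.1, 0.2, 0.1, -0.2, 1.0],
})
playlist = library[library["track_id"].isin(["t1", "t2"])]

recos = generate_playlist_recos(playlist, library, ["f1", "f2"], top_n=2)
print(recos)
```

Averaging the playlist into a single profile vector is simple and fast, but it can blur playlists that mix very different styles. Scoring candidates against each playlist track individually is one alternative.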

Output and Visualisation

The final step in the notebook includes presenting the recommended tracks. The output is formatted to show only essential details like track ID, name, artist, and a similarity score, which quantifies how closely the recommended track matches the user's musical taste. Links to track images or Spotify URLs are also provided.
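A minimal sketch of this formatting step is shown below. The helper format_recommendations, the column names, and the placeholder track IDs are hypothetical. The URL pattern is the standard open.spotify.com track link, though whether the notebook builds its links this way is an assumption.

```python
import pandas as pd


def format_recommendations(recos):
    """Keep only the essential columns, round the score, and add a track link."""
    out = recos[["track_id", "name", "artist", "similarity"]].copy()
    out["similarity"] = out["similarity"].round(3)
    out["url"] = "https://open.spotify.com/track/" + out["track_id"]
    return out.reset_index(drop=True)


# Placeholder rows; real Spotify track IDs are 22-character strings.
raw = pd.DataFrame({
    "track_id": ["id_one", "id_two"],
    "name": ["Song One", "Song Two"],
    "artist": ["Artist A", "Artist B"],
    "similarity": [0.98765, 0.91234],
    "duration_ms": [210000, 185000],  # extra column that gets dropped
})

formatted = format_recommendations(raw)
print(formatted)
```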

Reflection and Future Improvements

Reflecting on this project, I've greatly expanded my knowledge and practical skills in data science, particularly in applying new technologies to real-world applications. Utilising Python, Pandas, NumPy, and Scikit-learn, I've learned the intricate processes of data manipulation, feature engineering, and the basics of building machine learning models.

Through the development and application of a recommendation system, I've gained a deeper appreciation for the potential of machine learning to create personalised experiences in music streaming services like Spotify. The project not only allowed me to work with sophisticated algorithms but also challenged me to think critically about how to tackle unique datasets.

For future work, I plan to integrate further with Spotify's API for automatic playlist creation and real-time recommendation updates. I'm particularly interested in experimenting with deep learning models, such as convolutional or recurrent neural networks, to capture complex patterns in music that cosine similarity over audio features cannot. These enhancements aim to tap into richer datasets and potentially deliver more finely tuned recommendations.