As part of my computer science degree, I delivered a capstone project: a song recommendation system that used emotional sentiment analysis, as part of broader natural language processing research, to build playlists for users based on the mood they provided.
Over the course of 8 months, the recommendation system came to depend on several pillars. Multiple APIs and song datasets were cross-referenced and pulled from to populate a training dataset.
The Genius API and Last.fm API were used to gather millions of song IDs, which were then used to web-scrape lyrics, since most datasets don't publicly share song lyric data.
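A minimal sketch of how song records from two sources might be cross-referenced is shown below. It is not the project's actual code: the record fields (`artist`, `title`, `id`) and the function names are hypothetical, and matching on a normalized artist/title pair is an assumed strategy.

```python
import re


def normalize_key(artist, title):
    """Build a comparison key: lowercase, drop punctuation, collapse spaces.

    Assumption: ASCII-only matching is good enough for this illustration;
    non-ASCII characters are simply dropped.
    """
    def clean(text):
        text = re.sub(r"[^a-z0-9 ]", "", text.lower())
        return " ".join(text.split())

    return (clean(artist), clean(title))


def cross_reference(source_a, source_b):
    """Return songs found in both sources, keeping the ID from each one."""
    # Index the second source by normalized key for constant-time lookups.
    index_b = {normalize_key(r["artist"], r["title"]): r for r in source_b}
    matched = []
    for record in source_a:
        key = normalize_key(record["artist"], record["title"])
        if key in index_b:
            matched.append({
                "artist": record["artist"],
                "title": record["title"],
                "id_a": record["id"],
                "id_b": index_b[key]["id"],
            })
    return matched
```

Indexing one source in a dictionary keeps the comparison linear in the number of records, which matters when the ID lists run into the millions.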
Once the data was scraped, cleaned, and ingested, the system was trained against a set of emotion lexicon text data from NRC.
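To illustrate the general idea of lexicon-based emotion scoring, here is a small sketch that counts how many words in a lyric map to each emotion. It assumes a simple word-to-emotions dictionary; `TOY_LEXICON` and the function names are made up for illustration and are not the NRC data or the project's actual pipeline.

```python
import re
from collections import Counter

# Hypothetical miniature lexicon mapping words to emotions.
# These entries are illustrative only and are not taken from the NRC data.
TOY_LEXICON = {
    "love": {"joy"},
    "happy": {"joy"},
    "cry": {"sadness"},
    "alone": {"sadness"},
    "fight": {"anger"},
    "afraid": {"fear"},
}


def tokenize(lyrics):
    """Lowercase the lyrics and split them into word tokens."""
    return re.findall(r"[a-z']+", lyrics.lower())


def emotion_counts(lyrics, lexicon):
    """Count how many tokens in the lyrics are associated with each emotion."""
    counts = Counter()
    for token in tokenize(lyrics):
        for emotion in lexicon.get(token, ()):
            counts[emotion] += 1
    return counts


def dominant_emotion(lyrics, lexicon):
    """Return the most frequent emotion, or None if no lexicon word appears."""
    counts = emotion_counts(lyrics, lexicon)
    if not counts:
        return None
    return counts.most_common(1)[0][0]
```

For example, `dominant_emotion("I cry alone but I love you", TOY_LEXICON)` returns `"sadness"`, since two tokens map to sadness and one to joy. A label like this could then be compared with the mood a user provides.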
Parallel processing technologies such as Apache Spark were used to handle multiple streams of data at once. Python was used to call the APIs and scrape the web.