In 2014, the pop-culture data analysis website The Pudding released an analysis of rappers’ vocabularies that still gets trotted out occasionally. (It’s the one that ranked Aesop Rock Most Verbose Rapper Ever.) They recently released what they pitch as an explicit sequel to that work, called “The Language Of Hip Hop,” and it’s just about the most interesting thing on the planet, assuming you care very much about rappers and the words they use.
They started by compiling 26 million words from the lyrics of 500 top artists, comprising some 50,000 songs. This is the language of hip hop, spanning four decades. They then made another data set of 275,905 non-hip-hop songs, which they used as a point of contrast to find the words most disproportionately used by rappers. The words at the top of that list are fascinating (chopper comes in at number one, followed by stunting, flexing, mane, and, interestingly, the name “Nina”), but just as interesting are the words least used in hip-hop: sailed, emptiness, sigh, desire, sea, broken, mountain, and other feathery singer-songwriter horseshit.
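For the curious, here is a rough idea of how a "disproportionately used" ranking like this can be computed. To be clear, this is a minimal sketch and not The Pudding's actual method: the function names, the add-one smoothing, and the two made-up toy "corpora" are all assumptions for illustration. It scores each word by its rate in one corpus divided by its rate in the other.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and pull out runs of letters and apostrophes."""
    return re.findall(r"[a-z']+", text.lower())


def distinctive_words(target_songs, reference_songs, top_n=5, smoothing=1.0):
    """Rank words by (rate in target corpus) / (rate in reference corpus).

    Hypothetical helper, not The Pudding's code. Add-one style smoothing
    (an assumption) keeps words missing from one corpus from dividing by zero.
    """
    target = Counter(w for song in target_songs for w in tokenize(song))
    reference = Counter(w for song in reference_songs for w in tokenize(song))

    vocab = set(target) | set(reference)
    target_total = sum(target.values()) + smoothing * len(vocab)
    reference_total = sum(reference.values()) + smoothing * len(vocab)

    scores = {}
    for word in vocab:
        target_rate = (target[word] + smoothing) / target_total
        reference_rate = (reference[word] + smoothing) / reference_total
        scores[word] = target_rate / reference_rate

    # Highest ratio first: words the target corpus leans on the hardest.
    return sorted(scores, key=scores.get, reverse=True)[:top_n]


# Invented toy data, not real lyrics.
hip_hop = ["chopper chopper flexing the the"]
everything_else = ["the sea the mountain the sigh"]

print(distinctive_words(hip_hop, everything_else, top_n=2))
# ['chopper', 'flexing']
```

Flip the two arguments and the same function surfaces the other end of the list, the sea-and-mountain words that the non-hip-hop corpus uses and rappers mostly don't.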
The next step was to create specific sets of language used by each rapper: that is, to find the words each emcee uses disproportionately more than other rappers do. Here’s where shit gets real fun: You can find any old rapper you want and see “their” words. A lot of it is stuff that you innately sort of know (Ghostface: likely to say “aiyo”), but it’s still illustrative of their artistry. Chance’s words include praises, wonderful, and cocoa; Future says splurge, xans, astronaut, and spaz; Ugly God pretty much exclusively talks about butts. You can sort by decade and just click around, watching the slang of Kool Moe Dee transform into the slang of Raekwon. You can also just cut to the chase and sort it to be exclusively Wu-Tang rappers, because let’s be real, if you’re reading a data analysis of hip-hop vocabulary, this is what you’re here for.
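The per-rapper version of that idea can be sketched as a one-versus-the-rest comparison: each artist's word rates get measured against everyone else's lyrics pooled together. Again, this is an illustration under stated assumptions, not The Pudding's method. The `signature_words` function, the smoothing, the whitespace tokenizing, and the placeholder artists with their invented lyrics are all hypothetical.

```python
from collections import Counter


def signature_words(lyrics_by_artist, top_n=3, smoothing=1.0):
    """For each artist, rank their words against all other artists pooled.

    Hypothetical helper. Tokenizing is a plain whitespace split and the
    smoothing constant is an assumption, both chosen to keep the sketch short.
    """
    counts = {
        artist: Counter(text.lower().split())
        for artist, text in lyrics_by_artist.items()
    }
    pooled = sum(counts.values(), Counter())
    vocab_size = len(pooled)

    signatures = {}
    for artist, own in counts.items():
        rest = pooled - own  # everyone else's word counts
        own_total = sum(own.values()) + smoothing * vocab_size
        rest_total = sum(rest.values()) + smoothing * vocab_size

        def score(word):
            own_rate = (own[word] + smoothing) / own_total
            rest_rate = (rest[word] + smoothing) / rest_total
            return own_rate / rest_rate

        # Only words the artist actually uses can be "their" words.
        signatures[artist] = sorted(own, key=score, reverse=True)[:top_n]
    return signatures


# Placeholder artists with invented lyrics, not real data.
toy_lyrics = {
    "artist_a": "aiyo aiyo money the",
    "artist_b": "praises wonderful money the the",
    "artist_c": "splurge astronaut money the",
}

for artist, words in signature_words(toy_lyrics, top_n=2).items():
    print(artist, words)
```

Words that everybody uses ("money," "the") sink to the bottom of every list, which is why the results read like a fingerprint instead of a list of the most common words.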
There are more toys to play with, like an interactive map that groups rappers into clusters with others who use similar language, but really the whole thing is just a lovely ode to the eloquence and variety of language in rap. It’s a well-structured essay, too, with some jokey asides and a clear progression of logic. The Pudding’s author Matt Daniels closes the piece with a caveat that data scientists and machine learning folks may find room for debate in the methodology, and the piece offers no analysis of the hows or whys of these clusters. Really, though, you should treat the whole thing as a set of toys to be fiddled with, broken, questioned, and enjoyed, just as rappers do the customs and conventions of the English language.