At first, I thought I would have to do an extensive literature search for an efficient solution to this problem, but my girlfriend proposed a quick one. She suggested that I just check whether the artist's name is the first one listed. On the surface this seemed reasonable, except that some artists have names that are substrings of other artists' names (e.g., "Joe", "Pink"). But this led to an efficient solution to the problem: look for names that are equal or that have a special formatting. For example, most of the featuring problems can be dealt with by looking for the regular expression /^artistName_feat_/ or /^artistName_&_/ (the underscores and ampersand are literal characters, not wildcards).
This actually worked fairly well, since I am only looking for a group of users who listened to songs from my dataset. This is not a solution to the misspelling problem, but it's a fair assumption that most people will listen to correctly spelled tracks when using a well-established site like Last.fm. This saved a lot of time and proved once again that one should always try something quick and dirty first.
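As a rough illustration of the matching rule described above, here is a minimal Python sketch. The function name `is_lead_artist` and the sample strings are made up for this example, and it assumes names are already lowercased with underscores in place of spaces, as in the patterns above. It is not the exact script I ran.

```python
import re

def is_lead_artist(artist_name, credit):
    """Return True if `credit` is exactly `artist_name`, or starts with
    `artist_name` followed by the literal separator "_feat_" or "_&_"."""
    # re.escape keeps any special characters in the name literal.
    name = re.escape(artist_name)
    # Anchoring at the start and requiring end-of-string or a known
    # separator is what stops "pink" from matching "pink_floyd".
    pattern = re.compile(rf"^{name}(?:$|_feat_|_&_)")
    return pattern.match(credit) is not None
```

With this rule, "pink" matches "pink" and "pink_feat_some_artist" but not "pink_floyd", which is the substring case that a plain prefix check would get wrong.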
Looking at the top 20, there is a definite pattern:
mariah_carey: | 135 |
busta_rhymes: | 105 |
usher: | 54 |
nelly: | 52 |
madonna: | 48 |
ludacris: | 42 |
wyclef_jean: | 40 |
santana: | 39 |
michael_jackson: | 37 |
bob_marley: | 37 |
david_bowie: | 35 |
ja_rule: | 32 |
dmx: | 31 |
nelly_furtado: | 31 |
ricky_martin: | 29 |
frank_sinatra: | 29 |
sting: | 28 |
cypress_hill: | 27 |
elton_john: | 27 |
outkast: | 25 |
One should note that artists like Mariah Carey and Busta Rhymes have not necessarily played with over a hundred different artists, because the featured artists can have different spellings, which I did not correct for (e.g., "mariah_carey_feat_boys_2_men" vs. "mariah_carey_feat_boys_ii_men"). However, the likelihood of a featured artist's name being misspelled is probably not an inherent trait of the lead artist, so we can treat it as noise. I don't think Mariah Carey has a particular fondness for easily misspelled or varied names.
One can also divide this list into about three groups (with some overlap, depending on personal genre definitions): hip-hop, rap, and older, established rock/pop artists. So, the "rap" problem may not be such a problem in terms of taste, given the list above. Also, voice and style are very central to the "musicalness" of rap and hip-hop, so featuring a different artist is probably comparable to a rock musician using an orchestra or a different instrument than normal.
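To make the over-counting concrete, here is a small Python sketch of how the distinct featured names could be collected for one lead artist. The helper name `featured_names` is hypothetical, and I am assuming the counts are tallies of distinct name strings after the separator; the sample list simply reuses the two spellings quoted above.

```python
import re

def featured_names(artist_name, credits):
    """Collect the distinct strings that follow "_feat_" or "_&_"
    after `artist_name` at the start of each credit."""
    pattern = re.compile(rf"^{re.escape(artist_name)}(?:_feat_|_&_)(.+)$")
    found = set()
    for credit in credits:
        match = pattern.match(credit)
        if match:
            # No spelling normalization: variants stay separate entries.
            found.add(match.group(1))
    return found

credits = [
    "mariah_carey",
    "mariah_carey_feat_boys_2_men",
    "mariah_carey_feat_boys_ii_men",
]
collaborators = featured_names("mariah_carey", credits)
```

Here `collaborators` ends up with two entries even though only one group is involved, which is exactly the kind of inflation that the raw counts include.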