At first, I thought I would have to do an extensive literature search for an efficient solution to this problem, but my girlfriend proposed a quick one. She suggested that I just check whether the artist's name is the first one listed. On the surface this seemed reasonable, except that some artists have names that are prefixes of other artists' names (e.g., "Joe", "Pink"). But this led to an efficient solution to the problem: look for names that are either equal to the artist's name or that follow a special format. For example, most of the "featuring" cases can be handled by matching the regular expression /^artistName_feat_/ or /^artistName_&_/ (the underscores and ampersand are literal characters, not wildcards).
This actually worked fairly well, since I am only looking for a group of users who listened to songs from my dataset. It does not solve the misspelling problem, but it's a fair assumption that most people listen to correctly spelled tracks when using a well-established site like Last.fm. This saved a lot of time and proved once again that one should always try something quick and dirty first.
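The matching rule described above can be sketched in a few lines of Python. This is only an illustration, not the code I actually ran: the function names `collaboration_pattern` and `belongs_to` are made up, the example strings are invented, and it assumes names are already lowercased with underscores in place of spaces.

```python
import re

def collaboration_pattern(artist):
    """Build a regex matching the artist's name at the start of a string,
    followed by a literal "_feat_" or "_&_" separator (hypothetical helper)."""
    # re.escape keeps any special characters in the name literal.
    return re.compile(r"^" + re.escape(artist) + r"_(?:feat|&)_")

def belongs_to(artist, listed_name):
    """Return True if listed_name is exactly the artist, or a
    collaboration string that starts with the artist's name."""
    if listed_name == artist:
        return True
    return collaboration_pattern(artist).match(listed_name) is not None

# Invented example strings, for illustration only.
print(belongs_to("joe", "joe"))                  # exact match
print(belongs_to("joe", "joe_feat_someone"))     # "feat" collaboration
print(belongs_to("joe", "joe_&_someone"))        # "&" collaboration
print(belongs_to("joe", "joe_cocker"))           # different artist, shared prefix
```

Requiring the separator right after the name is what keeps "joe" from matching "joe_cocker" and "pink" from matching "pink_floyd".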
Looking at the top 20, there is a definite pattern:
| mariah_carey: | 135 |
| busta_rhymes: | 105 |
| usher: | 54 |
| nelly: | 52 |
| madonna: | 48 |
| ludacris: | 42 |
| wyclef_jean: | 40 |
| santana: | 39 |
| michael_jackson: | 37 |
| bob_marley: | 37 |
| david_bowie: | 35 |
| ja_rule: | 32 |
| dmx: | 31 |
| nelly_furtado: | 31 |
| ricky_martin: | 29 |
| frank_sinatra: | 29 |
| sting: | 28 |
| cypress_hill: | 27 |
| elton_john: | 27 |
| outkast: | 25 |
One should note that artists like Mariah Carey and Busta Rhymes have not necessarily played with over a hundred different artists, because the featured artists can have different spellings, which I did not correct for (e.g., "mariah_carey_feat_boys_2_men" vs. "mariah_carey_feat_boys_ii_men"). However, the likelihood of a featured artist's name being misspelled is probably not an inherent trait of the lead artist, so we can treat it as noise. I don't think Mariah Carey has a particular fondness for easily misspelled or varied names.
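To make the caveat concrete, here is a small sketch of how spelling variants inflate a count like the ones above. It assumes the counts are the number of distinct "_feat_" strings per lead artist, which is my reading of the numbers rather than a description of the exact script; `count_collaboration_strings` is a hypothetical name.

```python
from collections import defaultdict

def count_collaboration_strings(names):
    """Map each lead artist to the number of distinct featured-artist
    strings seen after a literal "_feat_" separator (assumed format)."""
    variants = defaultdict(set)
    for name in names:
        lead, sep, featured = name.partition("_feat_")
        if sep:  # skip names that contain no "_feat_" separator
            variants[lead].add(featured)
    return {lead: len(featured) for lead, featured in variants.items()}

names = [
    "mariah_carey_feat_boys_2_men",
    "mariah_carey_feat_boys_ii_men",  # same group, different spelling
    "mariah_carey",                   # solo entry, not counted
]
counts = count_collaboration_strings(names)
print(counts)  # {'mariah_carey': 2}
```

The two spellings are counted as two collaborators even though they refer to the same group, which is exactly the noise described above.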
One can also divide this list into roughly three groups (with some overlap, depending on personal genre definitions): hip-hop, rap, and older, established rock/pop artists. So, given the list above, the "rap" problem may not be such a problem in terms of taste. Also, voice and style are central to the "musicalness" of rap and hip-hop, so bringing in a different artist is probably comparable to a rock musician using an orchestra or an unusual instrument.