Wednesday, August 13, 2008

Closed-set vs Open-set Tags

Ugh. The cluster is still down. I was hoping to get something together for MIREX's tag annotation contest, but there is no way I can get to it with everything else I have going on. Oh well, maybe next year. Anyway, on to the subject.

I have been examining playlist prediction using an open tag set and Pandora tags. Not surprisingly, I got this result:

This was a really simple nearest-neighbor search. While it lends support to (part of) my hypothesis that an "anything goes" open tag set will perform better than Pandora's expert-assigned closed tag set, I need to eliminate some other variables before drawing any final conclusions. Most notably, Pandora's tag set has around 500 tags, while the open tag set is (at least) on the order of 10,000. In fact, on just a subset of the USPop set, I found over 20,000 tags. I need to reduce the dimensions so the two are comparable, while still maintaining the flavor of the open set.
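
To make the comparison concrete, here is a rough sketch of what I mean by shrinking the open vocabulary and then running a nearest-neighbor search over it. This is only an illustration, not the code behind the result above: keeping the k most frequently used tags is just one possible way to reduce the dimensions, cosine similarity over binary tag sets is an assumption, and the song names, tags, and function names are all made up.

```python
from collections import Counter
from math import sqrt


def top_k_tags(song_tags, k):
    """Return the k tags applied to the largest number of songs."""
    counts = Counter(tag for tags in song_tags.values() for tag in set(tags))
    return {tag for tag, _ in counts.most_common(k)}


def restrict(song_tags, vocab):
    """Drop every tag that is not in the reduced vocabulary."""
    return {song: set(tags) & vocab for song, tags in song_tags.items()}


def cosine(a, b):
    """Cosine similarity between two binary tag sets."""
    if not a or not b:
        return 0.0
    return len(a & b) / sqrt(len(a) * len(b))


def nearest_neighbor(query, song_tags):
    """Return the song (other than the query) with the most similar tags."""
    candidates = [song for song in song_tags if song != query]
    return max(candidates, key=lambda s: cosine(song_tags[query], song_tags[s]))


# Toy data (hypothetical): a tiny stand-in for an open tag set.
open_tags = {
    "song_a": ["rock", "guitar", "90s", "seen live"],
    "song_b": ["rock", "guitar", "loud"],
    "song_c": ["jazz", "piano", "mellow"],
    "song_d": ["jazz", "piano", "rock"],
}

vocab = top_k_tags(open_tags, 4)      # shrink 8 tags down to 4
reduced = restrict(open_tags, vocab)  # same songs, smaller vocabulary
print(nearest_neighbor("song_a", reduced))
```

In the real experiment, k would be set to roughly the size of Pandora's vocabulary (around 500) so both tag sets have the same number of dimensions. The obvious worry is that a frequency cutoff throws away exactly the rare, quirky tags that give the open set its flavor, so it is a baseline to compare against rather than a final answer.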

