Friday, April 25, 2008

Rap may not be music

Ok, this isn't a serious post. I've often claimed that rap isn't music because what distinguishes a good rap from a bad one is skewed too heavily toward the lyrics, so I'd classify it as a type of poetry. Rhyme and meter matter far more than harmony or melody; the latter still have some importance, but it is greatly attenuated. Basically, if that's music, then so is what William Shatner does, and no one seriously thinks that counts as music.

To further my argument, here is a rap video by a couple of management majors, which is actually decent in terms of melody, harmony, and even production value. The only limiting factor is that the subject matter is a little dumb (for those unfamiliar with Georgia Tech lingo, the M-train refers to switching majors to management, which is considered the easy joke major). Bottom line: if these count as music, then everyone has issues.

Thursday, April 24, 2008

New Project

I'm working on a new project in language identification. Specifically, we are looking into using speech attribute detectors to enhance phonetic transcriptions. From there, supervectors are built by forming phone document vectors for each language. We are also using TempoRAl Patterns (TRAPs) as features, which have been shown to outperform MFCC + velocity + acceleration vectors. I would be interested to see how these perform on music, especially since incorporating dynamic features has had only limited effect there. I think part of the problem is that music is (generally) slower than speech, so longer analysis windows might help. TRAPs also differ from texture windows: a texture window keeps only first- and second-order statistics of the frames it spans, whereas TRAPs concatenate the original frames themselves. However, since I'm limited to USPop's feature set (MFCC), I'm not sure I'll get to see the effect any time soon.
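To make the texture-window/TRAP distinction concrete, here is a toy sketch over a random stand-in for an MFCC matrix. (Real TRAPs are usually built from long per-band energy trajectories rather than MFCCs, and the window length here is made up; this only illustrates the statistics-vs-concatenation difference.)

```python
import numpy as np

# Toy "MFCC" matrix: 100 frames x 13 coefficients (random stand-in for real features)
rng = np.random.default_rng(0)
mfcc = rng.standard_normal((100, 13))

win = 9      # frames per window (hypothetical length)
center = 50  # frame at the window's center

frames = mfcc[center - win // 2 : center + win // 2 + 1]  # shape (9, 13)

# Texture window: summarize the span with first- and second-order statistics
texture = np.concatenate([frames.mean(axis=0), frames.std(axis=0)])  # 26-dim

# TRAP-style feature: concatenate the raw frames instead of summarizing them
trap = frames.reshape(-1)  # 9 * 13 = 117-dim

print(texture.shape, trap.shape)  # (26,) (117,)
```

The texture window throws away the ordering of frames inside the span; the TRAP vector keeps it, at the cost of a much higher dimensionality.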

Thursday, April 17, 2008

Searching, a skill?

I told my girlfriend in a joking manner that I was better at "Googling." I'm not really, but I think I'm more likely to use Google than she is (actually, she is a Yahoo! person... we are so different). She's also more likely to ask someone else, but I'm lazy, and if it involves leaving my desk, I'd rather go another route. Anyway, she said that "Googling" was hardly a skill. To prove her wrong (for once), I found a few websites that discuss search strategies, including one that describes the weakness of tags. The reality is that tags provide only partial information because there is structure to them. That is, tags are not completely miscellaneous, because tags are ultimately categorical (sorry, Mr. Weinberger).

Using the "wisdom of the crowds" to find information still requires effort from the user to learn how the population generally tags items. For example, users of last.fm know that many tags are too vague to be of value, like "rock" and "pop." More importantly, there is an entire language of tags. True, new tags can always be created, but in order to be useful, tags need to be adopted by the crowds. The most successful tags are ones that have a standard definition and are discriminative. For example, if I wanted to find my sister's favorite band, I could type "female vocalist" and "goth", which brings me to Evanescence and similar bands. However, if I were to choose two other tags on Evanescence's page, I'm not going to get the same result, even if I take the most popular ones ("rock" and "female vocalist"). I think Mr. Weinberger's real focus was not just on the miscellany of the internet, but rather on the personalization that can be derived from miscellany. Rather than pre-structure, adaptive-structure strategies are needed for information content.
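The tag-combination search above can be sketched as simple set intersection over a tag index. The bands and tag sets below are made-up toy data, not real last.fm counts; the point is only that different tag pairs carve out different result sets.

```python
# Toy tag index: band -> set of crowd-assigned tags (hypothetical data)
tag_index = {
    "Evanescence":       {"rock", "female vocalist", "goth", "alternative"},
    "Nightwish":         {"metal", "female vocalist", "goth", "symphonic"},
    "Foo Fighters":      {"rock", "alternative"},
    "Within Temptation": {"female vocalist", "goth", "symphonic"},
}

def search(query_tags, index):
    """Return bands whose tag sets contain every query tag."""
    q = set(query_tags)
    return sorted(band for band, tags in index.items() if q <= tags)

print(search(["female vocalist", "goth"], tag_index))
# ['Evanescence', 'Nightwish', 'Within Temptation']
print(search(["rock", "female vocalist"], tag_index))
# ['Evanescence']
```

A discriminative pair like "female vocalist" + "goth" pulls in the whole neighborhood of similar bands, while swapping in a different pair from the same band's page yields a different slice of the index.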

Monday, April 7, 2008

Name this tune...

An interesting paper appeared in Psychology of Music, titled "Memory and metamemory for songs: the relative effectiveness of titles, lyrics, and melodies as cues for each other" by Peynircioglu, Rabinovitz, and Thompson. Their findings indicate that while people cannot remember lyrics well when given a title or melody, lyrics are better cues for recalling titles or melodies than titles or melodies are for recalling each other. If someone couldn't recall the target with certainty, they were asked to pick one of four choices and then rate how sure they were. In this case, lyrics were judged to be of little help for remembering a melody or title, even though they actually produced the best scores. Also, even though lyrics were never really remembered given a melody or title, people picked their choices with more certainty.

I find this pretty fascinating, but it would be interesting to see an additional study: the roles of tags and non-acoustic information. Many content-based retrieval algorithms are bootstrapping their acoustic classifiers with textual descriptions (e.g., tags). The basic idea stems from websites such as last.fm. However, I've never seen evidence that these tags are universal in meaning. For example, given that a song is listed with the tag "grunge," can we safely assume that everyone understands it the same way? Or are tags only valuable to the person who assigned them? It's probably somewhere in the middle, like genres. However, given enough tags, we can get a good "picture" of what the song contains.
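One simple way to turn "enough tags" into that picture is to aggregate counts across listeners, so idiosyncratic tags get washed out and broadly agreed-on ones dominate. The tag assignments below are invented for illustration.

```python
from collections import Counter

# Hypothetical tag assignments from many listeners for one song
assignments = ["grunge", "rock", "90s", "grunge", "alternative", "grunge", "rock"]

# Aggregate counts, then normalize into a weight per tag: the song's "picture"
profile = Counter(assignments)
total = sum(profile.values())
weights = {tag: n / total for tag, n in profile.items()}

print(weights)  # "grunge" carries 3/7 of the mass, one-off tags far less
```

Even if no single listener's notion of "grunge" is authoritative, the weight it accumulates across the crowd says something about the song that a lone tag assignment can't.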