Wednesday, December 24, 2008

No kidding

From the DUH pile.  Up next, college students still drink beer and Baywatch was never watched for its plot lines.

Friday, December 19, 2008

NY Governor abuses science

In arguing for a soda tax, the NY Governor, David Paterson, borrowed a page from defenders of alternative medicine, saying:

"A study by Harvard researchers found that each additional 12-ounce soft drink consumed per day increases the risk of a child becoming obese by 60 percent.  For adults, the association is similar."

The study the NY Governor cites is by Dr. David Ludwig, which by his own admission has limited statistical power.  Larger, better studies have come out documenting that snack foods and soda are not linked to obesity.  Rather, it appears that physical inactivity is to blame.  This is straight out of the alternative medicine playbook: find some small study which validates your claim and then ignore any larger, blinded studies that disprove your claim.

To quote Homer Simpson, "Oh, people can come up with anything.  14% of people know that."

Tuesday, December 16, 2008

MIR Student Researcher in Press!

Congrats, Anita, on the excellent CrunchGear writeup!  It appears that MIR research is starting to pick up steam in terms of popular opinion.  Don't forget to read the comments on the article.  Now, get back to work!  The people want more!

Monday, November 10, 2008

Old Copyright Laws Hurt Research

Note: Thanks to my brother, Josh, for his comments.  Josh is an IP lawyer in Chicago, IL.

Recently, a question was posed on a research mailing list that went more or less as follows: the researcher was conducting a listening experiment, and there was a chance that the subjects could find and keep the 15-second excerpts for personal use.  The author was worried that this constituted a copyright violation.  I pointed out that this more than likely falls under fair use.  However, reading the guidelines gives one clear impression: the law itself is rather meaningless.  First, the law only stipulates what needs to be considered in evaluating fair use, without giving guidelines or specifics.  The webpage states, "There is no specific number of words, lines, or notes that may safely be taken without permission," and that it is best to obtain permission from the copyright holder.  Further, the precedent given only offers a partial list of examples that was relevant in 1961.

These points are key to researchers in information retrieval (and in particular, music information retrieval) because these laws were based on 1960s technology.  Simply put, exchanging songs, text, images, etc., was a rather involved task.  Today, the exchange and storage can be conducted on a massive scale, unforeseen by the lawmakers fifty years ago.  With this increased capacity for storage, researchers can now test large-scale IR algorithms, creating the need for a (relatively) free, large-scale database.  However, in the case of music, such large-scale databases are impossible to find or have severe restrictions on them.  Every year, I see experiment after experiment of promising algorithms, but results must be taken only so far because of the size and scope of the testing database.  Even though some schools have access to a large library archive of recordings, researchers at other institutions are unable to duplicate their results because the data is not freely available.

Some researchers have found "loopholes" that allow them to share features extracted from audio which cannot be used to recreate the audio (e.g., Mel-cepstral coefficients).  This is still not a viable solution because no one can determine a priori the best features for all IR experiments, and experimentation with new features is impossible.  Also, a set of features which in combination may be reversible could potentially lead to the best results, but this is impossible to test if only a limited set of features is ever distributed.
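Since the post leans on Mel-cepstral coefficients as the canonical irreversible feature, here is a minimal numpy-only sketch of how such features are typically computed. Every parameter (frame size, hop, filter count, number of coefficients) is a hypothetical choice, and real toolchains differ in the details:

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mfcc(signal, sr=16000, frame_len=400, hop=160, n_filters=26, n_ceps=13):
    # 1. Slice the signal into overlapping, windowed frames.
    n_frames = 1 + (len(signal) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hamming(frame_len)

    # 2. Power spectrum of each frame.
    n_fft = 512
    spec = np.abs(np.fft.rfft(frames, n_fft)) ** 2

    # 3. Triangular mel filterbank (filters evenly spaced on the mel scale).
    mels = np.linspace(hz_to_mel(0), hz_to_mel(sr / 2), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mels) / sr).astype(int)
    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(n_filters):
        l, c, r = bins[i], bins[i + 1], bins[i + 2]
        fbank[i, l:c] = (np.arange(l, c) - l) / max(c - l, 1)
        fbank[i, c:r] = (r - np.arange(c, r)) / max(r - c, 1)
    logmel = np.log(spec @ fbank.T + 1e-10)

    # 4. DCT-II decorrelates the log filterbank energies; keeping only the
    #    first n_ceps coefficients is why the audio cannot be rebuilt.
    n = np.arange(n_filters)
    dct = np.cos(np.pi * np.outer(np.arange(n_ceps), (2 * n + 1) / (2 * n_filters)))
    return logmel @ dct.T

# A one-second synthetic tone stands in for a real song excerpt.
sig = np.sin(2 * np.pi * 440 * np.arange(16000) / 16000)
feats = mfcc(sig)
print(feats.shape)  # one 13-dimensional vector per frame
```

The point of step 4 is exactly the "loophole": the truncated DCT throws away most of the spectral detail, so the features are useful for retrieval but useless for reconstructing the recording.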

A very interesting solution comes in the form of MIREX, where a TREC-like evaluation is conducted by having researchers send in algorithms to various competitions.  However, there are a few drawbacks.  First, it is an enormous burden on the sponsoring institution, IMIRSEL at The University of Illinois.  Its livelihood is also completely dependent on the program's funding, which is fine for the next few years, but the long-term stability is not guaranteed.  Second, the evaluation is carried out only once a year, though there has been talk of extending this to a rolling model.  Third, tasks are largely fixed, and a new task is only considered if it has broad approval.  New and interesting tasks are still subject to small, private databases before their inclusion.

I applaud those at IMIRSEL for coming up with the proposed solution, and also those that supply databases in some form or another, but these are patches to the main problem, which, as I have stated, is that copyright regulations are severely out of date.  Simply put, when today's regulations were implemented, no one imagined the scalability of today's information age.  Regulations are not only needed for the public sector to address today's file-sharing "problem"; better regulations are also needed for today's researchers.

The problem ultimately stems from the current practice of common law.  Simply put, our current laws are written as loose guidelines, and the specifics are left open to the courts.  Despite what you learned in history class, our laws are not actually written by legislatures, but rather by those on the bench.  Look at the Sherman Act: a single sentence determines when the law is applicable, yet courts have expanded and contracted it as they see fit.  Instead of a coherent, well-structured law that anyone can follow, one needs a swarm of lawyers to get through any issue.  Worse, many people are completely unaware that they may be breaking copyright law.  Many researchers wrongly assume that if they use less than 30 seconds, then they are legally safe, but this is untrue.  It depends purely on whether the recording industry chooses to go after you and how good your defense team is.

So what would a good solution look like?  I have thought of one that is actually rather easy and is found in other research fields.  The handling of nuclear, biological, and chemical materials is governed by a strict set of guidelines for researchers to follow in obtaining, handling, and destroying potentially dangerous materials.  I'm actually a little surprised that a similar structure has not been suggested for the use of copyrighted materials.  Such guidelines could allow researchers access to large amounts of complete, unaltered data (i.e., full songs, raw audio), while still ensuring the rights of the copyright holders.

I can already address the objection that will be raised by the copyright holders: "But very few researchers will want to take home nuclear, biological, and chemical materials."  This is just untrue.  Many research labs conduct studies on illegal drugs, such as marijuana.  Are you telling me that no researcher would want to take home a little stash?  Again, strict guidelines are in place to ensure that researchers use these illegal substances in an ethical and legal manner while also ensuring that necessary research can be conducted.  This is definitely possible in terms of music, text, and other multimedia.

Thursday, November 6, 2008

Science Reporting, Data-Mining, and Terrorism

Disclaimer: This blog is non-political, but can discuss how science, journalism, and politics interact.  I will try my best to simply state the facts and point to where I see a misinterpretation or omission of scientific principles.  As such, I intentionally did not post this until after the election.

Recently, I wrote about the new responsibilities engineers have when describing technical findings to science journalists.  Shortly after my post, I began to see many articles stating that a committee put together by The Department of Homeland Security found that data-mining technology should not be used to track or identify terrorists, because the technology would not work and privacy rights would be violated due to false positives.  At first, I did not pay attention to this story, but I started to see more and more stories saying that, ultimately, this task is futile.

Futile?  Really?  This implies that we know the limits of data-mining as a science.  I guess we can cancel all those conferences next year.  Unlike many of the journalists, I chose to actually read the report beyond the Executive Summary and found the committee's objectives and conclusions were mischaracterized.  First, the committee said that such technologies should not be used right now "given the present state of the science" (italics added) and should never be trusted in a fully automatic sense.  The report also says (in a few places) that research should continue.

Second, this report is mostly a legal report and only uses the technological aspects as background.  One common theme in the report is that false positives will occur, which results in privacy violations.  However, the report fails to give the conditions under which a particular invasion may be justified.  Clearly, the answer is not all or none, since privacy violations occur legally in non-terrorism contexts.  For example, many common law-enforcement techniques such as DNA testing, witness accounts, and even confessions have a false-positive rate.  Where are the calls to dismiss these technologies or to stop investigating crimes in general?
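The false-positive concern is really a base-rate argument, and it is worth making the arithmetic concrete. A quick illustration with invented numbers (sensitivity, specificity, and prevalence are all hypothetical):

```python
# Back-of-the-envelope base-rate arithmetic: even a very accurate
# classifier flags mostly innocents when the condition is rare.
def positive_predictive_value(sensitivity, specificity, prevalence):
    """P(actual positive | flagged positive), via Bayes' rule."""
    tp = sensitivity * prevalence            # true-positive mass
    fp = (1 - specificity) * (1 - prevalence)  # false-positive mass
    return tp / (tp + fp)

# Suppose 1 in 100,000 people is an actual target, and the detector is
# 99% sensitive and 99% specific (generous, invented numbers).
ppv = positive_predictive_value(0.99, 0.99, 1e-5)
print(f"{ppv:.4%}")  # under 0.1%: nearly everyone flagged is a false positive
```

This is why the report insists a flag can only ever mean "investigate further," never "act": the overwhelming majority of people flagged by any realistic system are not targets at all.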

So what were the real conclusions in the report in regards to using data mining techniques for counter-terrorism efforts?

1. No fully automatic data-mining technique should be used.

Specifically, the document says that since there is always the possibility of false-positives, data-mining techniques can only be used to identify subjects for further investigation. This is not really new.

2. Technology can be used to reduce the risk of privacy intrusion.

Specifically, the technology can be used as a filter. The report gives an example, where only images with guns detected automatically are seen by humans for further investigation.

3. "Because data mining has proven to be valuable in private-sector applications... there is reason to explore its potential uses in countering terrorism."

Once again, proving my point that engineers and scientists need to be careful about how they describe their research and findings to journalists.

4. Programs need to be developed with specific goals and legal limitations in mind. In addition, programs must be subject to continual review.

The truth is that many laws or legal understandings are based on judicial precedents and are rarely cleaned up by Congress. This becomes an issue when technologies change and new laws are not written. Any legal decision is largely based on the facts of the particular case and will not encompass the facts needed to apply a law in a broader context. A similar problem is seen in our obsolete copyright laws.
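Points 1 and 2 above amount to a triage pattern: the machine only narrows the pool, and a human makes every decision. A minimal sketch, with invented scores and field names:

```python
# "Machine filters, human decides": the detector never takes action,
# it only selects candidates for analyst review.
def triage(items, score, threshold=0.9):
    """Return only the items a human analyst should review."""
    return [x for x in items if score(x) >= threshold]

# Toy data: pretend an automatic detector already scored each image
# for the presence of a gun (scores are invented).
images = [{"id": 1, "gun_score": 0.97},
          {"id": 2, "gun_score": 0.12},
          {"id": 3, "gun_score": 0.91}]
review_queue = triage(images, lambda x: x["gun_score"])
print([x["id"] for x in review_queue])  # only the high-scoring images
```

The privacy benefit is exactly that image 2 is never seen by a human at all; the filter reduces, rather than expands, what people look at.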

For what it's worth, I do not blame the reporters entirely.  Reading a 372-page document is a lot to accomplish with an ever-shrinking news cycle.  But this does demonstrate that engineers and scientists need to be careful about how they state their findings, since public perception and even legal policies can be altered by their mischaracterization in the media.

Pandora Video Series

Even after laying off a significant portion of its workforce, Pandora is continuing to look for ways to expand its business.  One such avenue is the start of a video version of its musicology podcast series (another favorite).  Personally, I love this and wonder if Pandora might one day expand into, or spin off a business in, the area of popular music education.  One can only hope that this (and other music-based educational technologies) might be a way to offer music education to public and private schools under the threat of budget cuts.

Monday, November 3, 2008

FASTLIB / MLPACK
I talked to a couple people at ISMIR about a new machine learning toolbox, called FASTLIB (although it appears to be called both FASTLIB and MLPACK). This toolbox was developed by Alexander Gray's lab in the College of Computing at Georgia Tech, and I used it extensively in his class. I highly recommend that anyone try this toolbox for their machine learning needs. Programming within the guidelines greatly reduces the programming time (almost to the simplicity of MATLAB), while retaining computational speed and memory capacity. If you are like me, you have had to make the judgment call between programming something in MATLAB and having it run a long time, or spending a long time writing and debugging C++ code so that the algorithm runs quicker.

The official place to download the package seems to be here; however, I found some issues (expected with a version 1.0). The stripped down package on an old class website seemed easier to install. The individual built-in algorithms can be added manually later. I hope to have a small series of posts demonstrating the ease of programming style.

Monday, October 6, 2008

MIR and The Media: How do we interface?

The music information retrieval (MIR) community has seen an increasing amount of press time in the past couple weeks (see here and here).  As this type of research gets more press coverage, an important issue is how the researchers interact with the press.  Recently, my favorite podcast, The Skeptic's Guide to the Universe interviewed Sharon Begley, the Senior Science Editor at Newsweek, about how science and media interact (podcast #166).

First, she discusses that relying on journalists to get the word out is a sure-fire way to never get any press.  Surprisingly for someone in "Big Media," she applauds The New Media paradigm.  Self-promotion, however, raises many ethical considerations, since a few researchers are apt to over-publicize their results well before any peer-review process has taken place.  I think it is key that researchers limit their publicity until after some form of peer review.  For example, you will not hear any results about my research on this blog until it has been approved for publication.

Second, the press is a profit-making machine.  Therefore, the fantastical will always get more press than the consensus.  The same pressures apply as to other media types, especially in the face of The New Media.  Generating readers will be the main point of any major media outlet.  Generally, the story will be modified to be mostly true, but the key point may still be lost.  It is up to the researchers to keep their message intact.

Third, researchers cannot expect the press to understand everything in their field.  At best, you can hope for some amount of scientific knowledge from the press, but they will always trust a Ph.D. in some science, even if the guy is a crank (e.g., creationists, homeopathy, etc.).  Being able to describe research in comfortable layman's terms is an essential skill for any scientist/engineer.

Wednesday, October 1, 2008

Noel Gallagher Could Run For Office

Only a politician could say something absolute, like Oasis will never give away a record like Radiohead did, and then turn around and do almost exactly what he said he wouldn't.  But Noel Gallagher did just that, since Oasis will now stream their new album for free.  True, they aren't giving it away, but it's at least 80% hypocrisy.  I can understand Noel's point: a band needs to be paid.  But maybe ownership of music is not the way to go.  Certainly, there are other ways to raise capital.  Selling music to sites that distribute it for free and then show advertisements is one way.  Encouraging ticket sales is another.  For a 90's alternative band, Oasis sure is stuck... in the 90s.

Anyway, off to see the Raconteurs.

Monday, September 29, 2008

The First Sound

Professor Whittle at the University of Virginia has put together the first sound ever created: that of the Big Bang.  The sound file and a very full and insightful description can be found here.

Sunday, September 21, 2008

YouTube and New Music: Not a Solution

I did not blog much after the first day at ISMIR because Paul Lamere was doing a very good job, so you can check out his details here.  However, there was one article I saw on CNN during the conference about YouTube being the next showcase for new music.  Personally, I hope not.  True, millions of people use the site daily, and it is a great resource for finding music if you know exactly what you are looking for.  However, try using YouTube to discover new music, especially if you are also discovering new bands.  Currently, you'll have to enter the artist name or song title to find anything.  But what if I want to find something that's like something else, but that I've never heard before?

The difference between the two concepts is ultimately the difference between retrieval, recommendation, and discovery.  The old "Google search bar" paradigm is great for retrieving something if I have a general idea of what I'd like.  However, it's still very object based.  By this, I mean that I need to know the thing I am looking for.  In recommendation, the focus is on the idea of what I might like.  For example, collaborative filtering notices that I've bought Carl Sagan's A Demon-Haunted World and will recommend James Randi's Flim-Flam!, since people that buy Sagan's book also buy Randi's.  Currently, YouTube does have both, but their discovery technology is lacking.  A great quick look at the difference between discovery and recommendation can be gained here, but ultimately discovery is learning why I like something, which is actually much harder to do.  What is it that I like about Sagan's book?  Is it because he's a great science writer?  Is it his research?  Is it his involvement in the skeptical movement?  The three have dramatically different answers.  If it's the writing, then I'll probably want Dawkins' book on scientific writing.  If it's his research, then I'll probably want another book by Sagan about astronomy.  If it's the skeptical movement, then Randi's is just one of many great selections.
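The collaborative-filtering step described above ("people who buy Sagan's book also buy Randi's") can be sketched with a toy co-purchase counter; the baskets below are invented:

```python
# Toy item-based collaborative filtering: recommend whatever is most
# often bought together with a given item.
from collections import Counter
from itertools import combinations

# Invented purchase baskets (titles borrowed from the post).
purchases = [
    {"A Demon-Haunted World", "Flim-Flam!"},
    {"A Demon-Haunted World", "Flim-Flam!", "Cosmos"},
    {"A Demon-Haunted World", "Cosmos"},
    {"Flim-Flam!", "Cosmos"},
]

# Count how often each ordered pair of items is bought together.
co_counts = Counter()
for basket in purchases:
    for a, b in combinations(sorted(basket), 2):
        co_counts[(a, b)] += 1
        co_counts[(b, a)] += 1

def recommend(item, k=1):
    """Items most often co-purchased with `item`."""
    scored = [(n, other) for (it, other), n in co_counts.items() if it == item]
    return [other for n, other in sorted(scored, reverse=True)[:k]]

print(recommend("A Demon-Haunted World"))
```

Notice that the counter never asks *why* the books go together, which is precisely the gap between recommendation and discovery.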

Currently, YouTube does not really support discovery and would probably need to be redesigned to get it right.  I've literally listened to hundreds of artists on YouTube, since I use it to play guitar.  However, there are no recommendations for new music based on what I've listened to in the past.  If I want something, I've got to first find an artist I know and then continue to click "related" videos until I finally find something cool.  The process is long, and there is a good chance that I'll get stuck in a cycle or, worse, get way off target.  For the artist, this means that generating new listeners will be difficult and will still have to come by word-of-mouth rather than automatic recommendation.

Tuesday, September 16, 2008

Unable to get outlet

I was unable to get an outlet until after the panel discussion, so there was no live blogging during the panel.  Sorry about that, but I guess that kind of speaks against live blogging, doesn't it?

Sunday, September 14, 2008

ISMIR 2008: Day 1 - Tutorials

I really should get more rest before ISMIR, but even with only three hours of sleep the night before, I was riveted by the tutorial on social tags given by Paul Lamere and Elias Pampalk.  I was a bit curious as to what the new content would be, since tags were a central theme in last year's tutorial on music recommendation by Oscar Celma and Paul Lamere.  I was happy to see that Lamere and Pampalk discussed their recent findings on how tags are used in general and, in particular, across demographics.  Eventually, I hope that it is possible to get more information on how tags mean different things to different groups of people.  Such findings would help in "tag adaptation," where someone's tag profile would reflect their own interpretation of tags and not be influenced as much by the average definition among the community.

I think the overall highlight, however, was the discussion on tags from a signal processing and machine learning perspective.  These fields can greatly help with the cold-start problem, vandalism, and popularity bias, but there are a few questions that need to be addressed.  Anyway, that is all the time I have to write on the conference tonight.  I have to get slides ready for a presentation I need to give Friday to Dr. Sagayama when he comes to visit Georgia Tech.

Thursday, September 11, 2008

More Cowbell!!!

While waiting for final simulations to run before I write up a paper, I saw the most awesomest website ever.  Sounds like they are using some kind of beat tracking and music segmentation program.  Not sure what algorithm they use, but it does a fairly good job.  The transitions are very good.  I have included a couple songs below:


Wednesday, September 10, 2008

Signal Processing Series!!!

I do not know if any of my former students read this blog or if there are any new students who come across this, but I found a blog on MATLAB pointers.  Currently, they are doing a series on signal processing, and even though I have only read today's post on sampling, it seems to do a pretty good job describing the sampling theorem.

This does raise the good point of teaching in the 21st century.  I had a teacher last fall who stated in his syllabus that we were not allowed to use materials outside of class.  Why?  The point should be whether or not I retained the information in class.  Hearing him talk about the "evils" of the internet made me wonder if he was going to yell at me to get off his lawn.  Ugh, maybe I'll write a blog post on the way teaching should change while I'm at ISMIR next week.

Anyway, to my former students: follow your teachers' rules, but after the class, please go back and try to understand the material.  A grade only helps in that first job; after that, it's recommendations and professional accomplishments, which depend entirely on your knowledge base.

Thursday, September 4, 2008

Technology Causing IP Problems Everywhere

As someone engaged in music recommendation research, I am constantly hearing about the latest arguments from both sides over music intellectual property (IP) rights. I was a little surprised to learn that college football is currently having issues over IP rights as well. Specifically, the issue seems to stem from live blogging, which is a very inefficient way to give live updates. It seems the NCAA views bloggers as a threat to TV and Internet coverage rights. It also appears that since the NCAA rents venues for private events, First Amendment issues may not come into play. Yes, it is true that the First Amendment is a protection from the government, not other people or organizations (at least, the way it is written).

However, I have a hard time believing that live blogging is a threat to TV and Internet coverage revenue. Who really says, "hey, I could watch (or listen to) the game, but I would rather continually hit refresh on my browser and read what is going on!"? The truth is that live blogging is more than likely going to enhance people's enjoyment of the event. Apparently, the NCAA is against the idea of interactive media and (like the music industry) prays we stay in the 20th century.

Even more importantly, how in the heck is this going to be enforced? As I said, pulling out a laptop and posting to a blog repeatedly throughout a game is inefficient. I could do the same with my phone using Twitter's services, and I have the cheap-o free Samsung that I got for renewing my contract with my cell provider! Technically, anyone can microblog about the game. Is the NCAA really going to stand around every section in the stadium and look for people texting? They cannot even stop the frats from sneaking liquor into the stadium and getting blind, stinking drunk!

Maybe, just maybe, we should really evaluate whether a new technology truly is an infringement on IP rights, and whether that will translate into an actual revenue loss, before we start over-reacting.

Wednesday, September 3, 2008

Feeding popularity or getting paid to buy

I just saw that a new website, Popcuts, is paying users to buy albums. Well, not quite. Basically, you pay 99 cents, like iTunes, to buy a song. However, if more people buy the song AFTER you purchase your copy, then you get paid (currently by credit to future purchases). The goal is to encourage people to buy music quicker.

Overall, I think this is a failed business model because it encourages people to buy before really evaluating the music. This reminds me of a used car dealership distracting the customer with a lot of gimmicks so the customer does not focus on whether the car meets his/her needs. After users feel pressured to buy a song they grow to dislike, I doubt they will return for more of the same.

One of the hopes Popcuts has is that people will "invest" in experimental purchases and that independent bands will have a forum to increase listenership. I remain skeptical. Overall, people are likely to learn what makes a successful hit and still largely purchase that to increase their revenue. For example, let's say I see a new Fall Out Boy song that is so catchy that it is probably one of the few videos MTV will actually show. Even if I hate that song (and I most certainly will), I am probably guaranteed to get over 99 cents in credit and be able to get more songs for free. Even if I am not one of the first, I will still make a good bit of money, since hits can remain hits for quite a while. For example, Back in Black by AC/DC came out in 1980, going platinum. In 1990, the album was 10x multi-platinum, and by 2007 it was 22x multi-platinum.
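Popcuts has not published its payout formula, but the incentive can be illustrated with a toy model: assume a fixed cut of each 99-cent sale is split equally among all earlier buyers (both numbers are invented):

```python
# Toy simulation of a Popcuts-style payout. The real split is not
# public; the 20% cut and equal-sharing rule are assumptions.
PRICE, CUT = 0.99, 0.20

def earnings(position, total_sales):
    """Credit earned by the 1-indexed `position`-th buyer after `total_sales`
    sales, with each sale's cut split equally among all earlier buyers."""
    return sum(PRICE * CUT / (n - 1) for n in range(position + 1, total_sales + 1))

# An early buyer of a modest hit recoups more than the 99-cent price...
print(round(earnings(1, 200), 2))
# ...while a late buyer earns almost nothing.
print(round(earnings(190, 200), 2))
```

Under any split of this shape, the reward concentrates on early buyers of predictable hits, which is exactly why buying the obvious chart-topper beats "investing" in an unknown band.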

Potentially, they could put a cap on the amount that people get for a certain purchase, but this limits the overall "investment" feel the creators are hoping for. Another point is that this type of behavior would contribute to more sales anyway. That is, until users become dissatisfied and eventually leave, as I stated earlier.

Still, it would be interesting to see how the long tail and short head wage war for area under the curve.

Saturday, August 30, 2008

Preparing for ISMIR

I am getting excited about ISMIR in a couple weeks. As many people suggested to me last year, I am going to ISMIR with (almost) nothing to do but take it all in. I do have a late breaking session poster, but that is on the very last session on Thursday. I do not anticipate many people attending the session since most will be trying to make flights back home. To be honest, I think they should scrap the last day or make it a full day with the dinner to end everything.

I have even started reading some of the proceedings, as Elias pointed out that they are up for everyone to see. So far I have read about five papers. I have to admit, I was a little fearful about the focus on interdisciplinary research being incorporated into every individual paper. This is largely subjective, and I feared that people would apply too broad a focus and the papers would not get into technical detail in any one area. While I am sure some of this is true, I was happy to find a couple papers that were really good. I liked the paper by Moh and Buhmann about adaptive kernels and would like to see it applied to something other than artist classification, which does not necessarily translate into general similarity, as Elias pointed out in his thesis. The paper by Matthew Riley et al. was good too. It is great to see that people are tokenizing songs to incorporate dynamics better. I think they could get more modeling power if they added HMMs and did something closer to the acoustic segment modeling approach that my adviser and I presented at ISMIR a couple years ago. I really liked Kurt Jacobson's paper on identifying artist communities in social networks, especially the attempt to incorporate audio analysis into the design. I am definitely going to discuss this with him at the conference.

If I have not mentioned your paper, then I probably have not read it yet, so do not take offense. I will get to it and I am sure I will like it, even if I do not post about this in the future.

Monday, August 18, 2008

Radio killed the radio star

One of my favorite Internet radio stations, Pandora, may finally be shutting down thanks to the recent hike in royalty fees. I think the only way people may start to notice is if an Internet radio giant takes the fall. No one ever notices a problem until a staple goes down. One wonders just how record companies plan on making a profit in the future. CDs are not selling, and everyone is praying for their demise. Digital distributors of music are closing shop, causing people to fear downloading anything having to do with DRM.

I guess this quote signifies how little record companies understand capitalism: "SoundExchange officials argue that because different media have different profit margins, it is appropriate to set different royalty rates." Really? Do record companies pay more for the raw materials to make CDs? If Jack White and I walk into a guitar store and purchase the same guitar, does he pay more? Surely, he will make a lot more money with that guitar than I ever will. If I'm not mistaken, isn't charging two different people different prices for the same product illegal when it results in decreased competition?

Wednesday, August 13, 2008

Closed-set vs Open-set Tags

Ugh. The cluster is still down. I was hoping to get something together for MIREX's tag annotation contest, but there is no way I can get to it with everything else I have going on. Oh well, maybe next year. Anyway, on to the subject.

I have been examining playlist prediction using Last.fm and Pandora tags. Not surprisingly, I got this result:

This was a really simple nearest-neighbor search. While this gives evidence for (part of) my hypothesis that Last.fm's "anything goes" open tag set will perform better than Pandora's expert-assigned closed tag set, I need to eliminate some other variables before any final conclusions are made. Most notably, Pandora's tag set has a size of around 500 tags, while Last.fm's tag set is (at least) on the order of 10,000. In fact, on just a subset of the USPop set, I found over 20,000 Last.fm tags. I need to reduce the dimensions so they are comparable, but still maintain the flavor of Last.fm's set.
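For the curious, the nearest-neighbor step might look something like this sketch, assuming binary tag vectors and cosine similarity (the post does not specify either; the vocabulary and songs below are invented):

```python
import numpy as np

def tag_vector(tags, vocab):
    """Binary indicator vector over a fixed tag vocabulary."""
    return np.array([1.0 if t in tags else 0.0 for t in vocab])

def nearest(query, library):
    """Return the (name, vector) pair most cosine-similar to the query."""
    def cos(a, b):
        return a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12)
    return max(library, key=lambda item: cos(query, item[1]))

# Invented tag vocabulary and song annotations.
vocab = ["rock", "piano", "female vocals", "sad", "90s"]
songs = {
    "song_a": {"rock", "90s"},
    "song_b": {"piano", "sad", "female vocals"},
}
library = [(name, tag_vector(t, vocab)) for name, t in songs.items()]

q = tag_vector({"piano", "sad"}, vocab)
print(nearest(q, library)[0])
```

The dimensionality problem mentioned above shows up directly here: with 20,000+ open tags, `vocab` balloons, so comparing fairly against Pandora's ~500 tags means pruning to a comparable number of dimensions first.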

Friday, August 8, 2008

Occam's Razor and the "Rap Problem"

Yesterday, I briefly described the "rap problem," which is where artists' names appear several times in a database because they feature other artists. It's probably unfair to "pick" on rap after looking at the greatest violators, but there is a clear trend that rap is a fairly big violator. Note: I'm not saying rap sucks or anything like that. I'm just saying that this presents a problem for researchers dealing in search technology. In fact, as I'll show, the people who feature lots of guest artists make a pretty impressive list of musicians and performers.

At first, I thought I would have to do an extensive literature search for an efficient solution to this problem, but my girlfriend proposed a quick one. She suggested that I just look to see if the artists' names are the first ones listed. On the surface this seemed reasonable, except that some artists have names that are sub-sequences of other artists' names (e.g., "Joe", "Pink"). But this led to an efficient solution: look for names that are equal or that have a special formatting. For example, most of the feature problems can be dealt with by looking for the regular expression /^artistName_feat_/ or /^artistName_&_/ (underscores and ampersand are not wildcards).
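That matching rule can be sketched in a few lines; the artist names and play counts below are invented, and the helper name is mine:

```python
import re

def plays_for_artist(artist, play_counts):
    """Sum plays credited to `artist`: exact, 'feat.', and '&' forms only."""
    # Anchoring at ^ plus requiring _feat_, _&_, or end-of-string avoids
    # the sub-sequence trap described above.
    pattern = re.compile(r"^%s(_feat_|_&_|$)" % re.escape(artist))
    return sum(n for name, n in play_counts.items() if pattern.match(name))

# Invented counts; "pink_floyd" is the sub-sequence trap and must not match.
counts = {"pink": 7, "pink_feat_redman": 3, "pink_floyd": 40}
print(plays_for_artist("pink", counts))  # 7 + 3, not 50
```

Quick and dirty, but the anchored pattern is exactly why "Pink" no longer swallows Pink Floyd's plays.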

This actually worked fairly well, since I am only looking for a group of users that listened to songs from my dataset. It is not a solution to the misspelling problem, but it's a fair assumption that most people will listen to correct spellings when using a well-established site like Last.fm. This saved a great deal of time and proved once again that one should always try something quick and dirty first.

Looking at the top 20, there is a definite pattern:

mariah_carey: 135
busta_rhymes: 105
usher: 54
nelly: 52
madonna: 48
ludacris: 42
wyclef_jean: 40
santana: 39
michael_jackson: 37
bob_marley: 37
david_bowie: 35
ja_rule: 32
dmx: 31
nelly_furtado: 31
ricky_martin: 29
frank_sinatra: 29
sting: 28
cypress_hill: 27
elton_john: 27
outkast: 25

One should note that artists like Mariah Carey and Busta Rhymes have not necessarily played with over a hundred different artists, because those featured artists can have different spellings, which I did not correct for (e.g., "mariah_carey_feat_boys_2_men" vs. "mariah_carey_feat_boys_ii_men"). However, the likelihood of misspelling a featured artist is probably not an inherent trait of the first artist, so we can treat it as noise. I don't think Mariah Carey has a particular fondness for collaborators with easily misspelled or varied names.

One can also divide this list into roughly three groups (with some overlap depending on personal genre definitions): hip-hop, rap, and old, established rock/pop artists. So, the "rap problem" may not be such a problem in terms of taste, given the list above. Also, voice and style are central to the "musicalness" of rap and hip-hop, so featuring a different artist is probably analogous to a rock musician using an orchestra or an instrument outside their norm.

Wednesday, August 6, 2008

The Continued Popularity of USPop2002

In order to gather some useful training data for my thesis, I need to get some preference rankings for music recommendation. It is also necessary that tag information exist, from sources such as Last.fm and Pandora. Further, I must be able to obtain audio (or some acoustic features) rather cheaply. The best data I have found is LabROSA's USPop2002. It's much larger than the RWC Database, and because the songs are based on popularity in 2002, it is much more likely to have tags than Magnatune. The downside is that I'm limited to Mel-frequency cepstral coefficients.

While LabROSA also has playlists from OpenNap, there are no preferences given; a song is either on a person's playlist or not. I've been using Last.fm's API to try to remedy this situation. First, I gathered the top listeners for each of the 400 artists in the USPop2002 set. Over the past couple of weeks I have been extracting the total combined weekly chart lists to get the number of plays of each song for each listener. While number of plays may not be a direct measure of preference (or rating), it is reasonable to assume that people will listen to songs they like more than ones they do not. At the moment, I have only downloaded about 4000 listeners (I have to download several pages per listener, and Last.fm requests a 1-second wait between requests). Also, artist names appear in several different varieties. Rap and hip-hop seem to be exceptionally bad, since they seem unable to do any song without a guest star.
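The download loop itself is straightforward; here is a minimal sketch of the polite-crawling pattern, where `fetch_page` stands in for a hypothetical function that calls the API for one page of a listener's charts:

```python
import time

def fetch_all_pages(fetch_page, n_pages, delay=1.0):
    """Politely fetch paginated results, sleeping `delay` seconds between
    requests (per the API's requested wait). `fetch_page` is a hypothetical
    callable taking a 1-indexed page number and returning a list of results."""
    results = []
    for page in range(1, n_pages + 1):
        results.extend(fetch_page(page))
        if page < n_pages:
            time.sleep(delay)
    return results

# Toy usage with a stand-in fetcher (delay=0 so the demo runs instantly):
print(fetch_all_pages(lambda p: [f"track_{p}"], 3, delay=0.0))
# ['track_1', 'track_2', 'track_3']
```

With several pages per listener and a one-second wait each, a few thousand listeners easily turns into days of wall-clock time, which is why this has been running for weeks.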

There's tons of data to play with, but for now, let's look into which artists are popular. Note: there are still thousands of users to download, and some artists' top 50 listeners have not been reached yet. These results should be taken with caution so that we don't leap to Montauk monster conclusions (it's a raccoon, let it go, people).

This kind of continued success is what I would expect to see: superstars make up the vast majority of hits and the short-lived fame of others dies out. However, one should note the artists appearing at the bottom may have more plays due to the "rap problem" described above. I also wanted to see if the data was consistent with Zipf's Law, but it is not (the bend is not deep enough).
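For the curious, a rough way to test agreement with Zipf's Law is to fit a line to the log-log rank-frequency plot; Zipf predicts a slope near -1. A sketch with toy data (not my actual play counts):

```python
import math

def zipf_slope(counts):
    """Least-squares fit of log(count) = a + b*log(rank); Zipf predicts b ~ -1."""
    counts = sorted(counts, reverse=True)
    xs = [math.log(r) for r in range(1, len(counts) + 1)]
    ys = [math.log(c) for c in counts]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den

# A perfectly Zipfian toy distribution (count proportional to 1/rank):
print(round(zipf_slope([1000 / r for r in range(1, 101)]), 2))  # -1.0
```

A much shallower fitted slope (or visible curvature in the log-log plot, the "bend" mentioned above) is a sign the data departs from Zipf.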

One neat thing occurred in the top 5 artists: Beatles, Radiohead, Pink Floyd, David Bowie, Queen. Only the Beatles and possibly David Bowie have had enough users extracted from their own top-listener lists to explain such high results. Indeed, it appears that the other artists would be just as popular if I had taken a random group of users (note: I'm sure the Beatles will also show this once I extract more pages).

I'll have more later.

Monday, August 4, 2008

X-cluster down

The X-cluster was taken down today for summer maintenance. Looks like it may be two weeks, but hopefully the file server gets back on-line soon. I'll probably post some preliminary results on a couple of experiments during this time.

Friday, August 1, 2008

How labels could profit from Radiohead and NIN 'experiment'

I just got done reading an interesting economic article by Will Page and Eric Garland on whether Radiohead's "pay what you want" experiment was successful in attracting usual torrent users to the band's website. I'm a little cautious about the Page and Garland conclusion of "yes, but with a twist," because ultimately this is a very small sample size and the novelty might have changed user behavior.

But the authors have a fantastic analysis in the form of a table comparing various legal music services and torrent sites. The authors note the various invisible costs to the users, such as "attention costs," "privacy costs," and "quality of product." What if record labels viewed torrent sites not as competition, but rather as base designs that could be improved upon?

As the authors note, a large number of people still went to torrent sites for illegal downloads even though Radiohead offered the same thing free and legal. They conclude this is because people will ultimately keep their buying habits steady unless they have a benefit to gain from switching (clearly, legality alone is not enough). Why can't labels cash in on this?

Why could labels not offer similar sites with additional content? Imagine a completely different type of business model: instead of collecting money from consumers, collect it from the artists. One of the problems with the future some advocate is that I would have to go to a bunch of different sites (one for each artist) to obtain music. Labels could offer an "online free supermarket" of music. In addition, targeted advertising could be done in a similar way to iLike and Amazon (e.g., "we've noticed that you like Band X; did you know that Band X has a show scheduled near you? Here are some T-shirts you can buy"). Artists would pay to have their music on these sites, sell shirts, etc. Recommendations would be built in as well. One day, oh one day, we'll have

Thursday, July 31, 2008

Fun with Venn

I've been extremely busy the past couple of weeks between a language identification project, a music tag annotation project, and, like Elias, trying to improve on my weak areas (for me, optimization and real analysis are the areas to improve; the good news is that my girlfriend is an unknowing math goddess). It probably did not help that I had to read about every researcher's experience booking the conference hotel for ISMIR on the Music-IR mailing list. (Side note: why is this conference always at expensive hotels? Thank you, Sun Microsystems!)

Anyway, I came across a funny site with Venn diagram and function cartoons. Hilarious. I am such a dork.

Friday, July 25, 2008


National Public Radio has released its API. It already looks to be a tremendous research resource because they have audio content. I'm new to APIs, so I'm pretty jazzed about this. I've only gotten to play with it for about 5 minutes, but I have verified that the audio is great. One potential application I foresee is music/speech detection and segmentation. Also, on the speech side, this data is great for topic identification. I'll hopefully have more to say on this later, but for now, I've got to go. It's 8 PM on a Friday and my girlfriend is telling me I have to stop working.

Sunday, July 20, 2008

Absolute pitch

I'm sure that a few of the readers have seen Yoo Ye Eun. From what I can tell, this does not appear to be a fraud.

Of course, David Huron points out in his book, Sweet Anticipation, that absolute pitch has its disadvantages such as difficulty in judging intervals.

Still, very cool.

Tuesday, July 8, 2008

MIR Group on CiteULike

I have started a group on CiteULike for music information retrieval researchers focusing on similarity and retrieval from audio. This is to allow us to see what papers others are reading on the subject. The focus is on using non-symbolic audio as the original format. For example, using MFCCs to build genre-level Gaussian mixture models is relevant. Using DTW on MIDI signals is not relevant unless the MIDI signal is a mid-level representation (e.g., "Specmurt analysis"). Onset detection is not relevant; however, using onset features to classify dance music is relevant.

I greatly encourage researchers in other subfields to start their own groups (I may also start more if others join). I felt restrictions on the scope of the group were important because MIR is becoming too broad a field. I expect that many researchers may be in several groups, which is great, and there may be a lot of overlap in the papers appearing in these groups. However, in our "Everything is Miscellaneous" world, this is not a bad thing.

I've required that new users be approved, but this is simply to generate a list of who's who. Anyone who wants in will be accepted. I am also willing to loosen restrictions on anonymous postings if people want, but I want to prevent abuse, since this is supposed to be useful and non-combative.

Friday, June 27, 2008

The Wisdom of One

I've written a couple of blog posts on The Wisdom of the Crowd in the past, but The Economist notes a study about how asking a single person the same question multiple times can yield better answers than asking just once. The results are better if the time between answers is longer. I'm not sure why this is all that surprising. The researchers, Edward Vul and Harold Pashler, believe this phenomenon may arise because the brain makes hypotheses and then updates ones that are incorrect. This is probably true to some degree. But another possible reason may come from the field of machine learning.

Fundamentally, memory is simply a feature extractor, and like all feature extraction techniques it can be quite noisy. For example, it has been shown that doctored images can make people remember events differently, even if they were at the event in question. In effect, asking many people, or asking the same person multiple times, takes a statistical sample, for which the mean is a better estimator on average (it minimizes the squared-error loss). Another effect seen here is that better answers are obtained by lengthening the time between questions. This can be explained by analogy to Markov chain Monte Carlo (MCMC) sampling. One noted effect in MCMC is that neighboring samples are correlated, but by taking samples spaced farther apart, the correlation decreases and the samples come closer to being independent and identically distributed.
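A toy simulation makes the statistical point concrete. Assuming (purely for illustration) that each recall is the true value plus independent Gaussian noise, averaging two guesses roughly halves the expected squared error:

```python
import random

def squared_error_of_mean(true_value=100.0, noise_sd=10.0,
                          n_guesses=2, trials=10000, seed=0):
    """Average squared error when each 'memory read' is truth + Gaussian
    noise (an assumption of this sketch) and we average n_guesses reads."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        mean_guess = sum(rng.gauss(true_value, noise_sd)
                         for _ in range(n_guesses)) / n_guesses
        total += (mean_guess - true_value) ** 2
    return total / trials

one = squared_error_of_mean(n_guesses=1)
two = squared_error_of_mean(n_guesses=2)
print(one, two)  # the two-guess average has roughly half the squared error
```

If the two reads were correlated (as with closely spaced questions), the reduction would be smaller, which is the MCMC-style argument for spacing the questions out.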

I wonder if this could be an explanation, considering something missing in the hypothesis of Vul and Pashler: what feedback was given to make people think their first answer was wrong? True, simply asking a question again probably makes a person question its correctness, but a control group would be needed. One design is to tell people upfront that they will be given two guesses and rewarded in inverse proportion to the combined error of their answers. For example, if the correct answer to "guess the number I am thinking of" is four, and one person bets four and nine while the other bets three and five, then the second person gets the reward.

Wednesday, June 18, 2008

Where's the RIAA on this one?

The earliest form of a computer playing music is discovered. One wonders when the RIAA will sue...

Tuesday, June 17, 2008

Dragons and pseudoscience

Brian Dunning has a must-see movie about critical thinking and the fallacy of pseudoscience. Also, on the subject are perpetual-motion and free energy machines that violate thermodynamics.

Wednesday, June 11, 2008

If you can't beat them, join them

In an attempt to be the EU of music, Merlin is trying "to turn indie bands and labels into a loose, decentralized version of the major label." Recently, it has been trying to prove its worth with a deal with online sites much like those the major labels have. Of course, there are two issues at play. The first is the continual insistence of the major labels (and multimedia distributors in general) that everyone is out to get them and that they carry no fault. It amazes me how unaware the industry is of the fact that hits have fallen faster than the industry as a whole. The bottom line is that controlling the distribution channels ultimately chokes competition and ruins quality. Call me a skeptic, but I find it hard to believe the deal between MySpace and the major labels is anything but a plot to ration music on the internet. Are we headed for Radio 2.0?

The second issue is whether Merlin actually stays true to its message. As the major labels try to ration music on the internet, will Merlin not be pulled by the appeal of control and power? Sure, there is a ton of revenue in The Long Tail, but it also involves a lot of work. What is the revenue generated from investment in a new band? As a band gets bigger, its costs go up, but the rate of return is higher (and probably more secure; when will U2 or Coldplay not sell out?). Also, you need fewer bands to make the equivalent revenue. Things are even more appealing if you can control the distribution channels. Will Merlin be able to pass this up and stay not-for-profit? Perhaps, but more likely, Merlin's indie bands will jump ship if they start becoming hits, ensuring Merlin does not eat up "too much" of the market.

So we are left with the status quo. A new distribution channel will open, independent bands will flock to it to get their message out, the record labels will sue, but eventually sign a deal which chokes competition, and repeat.

Tuesday, June 3, 2008

The Filter

I've been playing around with The Filter this morning, which is yet another recommender site. This one is a little different from Pandora or Last.fm for two reasons. First, it recommends music, movies, and web-based videos. Second, it has a more involved enrollment stage. The approach starts by asking you which 3 genres you like out of a list of about 13. However, the genres are typical genre labels, and I found myself unhappy with the selection. For starters, there is Rock/Pop as a single selection with no sub-genres under this category. Given my particular tastes, they might as well have asked, "Hey, do you like music?" Anyway, I picked three genres ("Rock/Pop", "Blues", and "Jazz"... oh, did I mention that Classical is not even a choice?).

The next page gives me a list of 3 "prototypical" artists for each of the three genres I picked. I will give The Filter credit in that they allow the user to listen to a sample selection from each artist, with a slider scale to pick a rating. There are a few problems with the choices, because they are largely generic and do not cover the range of each genre. For example, under the "Rock" category, I was given the bands Green Day, Blink 182, and U2. I did this three times and got the same bands twice, so there is probably a short list. I hardly think this represents the Rock and Pop universe. For starters, Green Day and Blink 182 are both considered Punk bands, albeit from different eras. U2 is so generic that I doubt anyone truly hates them; they might as well have listed The Beatles or Led Zeppelin. The other genres were no better, with choices like Ray Charles, B.B. King, and Miles Davis. There is a selection for "More artists," but you must change at least one of the sliders under the genre to get it to change. What if I don't know or am unfamiliar with a particular band? Further, what if I think these choices are OK, but not representative enough of my particular interests?

The movies section was the same thing: pick three genres and then rate a list of choices under each. I picked Action, Comedy, and Drama. For Drama I was given the 1940 version of "The Grapes of Wrath", "Casablanca", and "Mildred Pierce". I have heard of these, but I haven't seen them... these movies are well before my time. For Action, there was John Woo's 1989 "The Killer" (which I hadn't heard of before) and "Goldfinger" (again, who doesn't like James Bond?!?). For Comedy, I had the cartoon version of "The Grinch", "The Bank Dick" (never heard of this... a porno for rich people?), and "The Honeymooners [TV Series]", which is again long before my time. I skipped this section, since I'm not familiar enough with the choices and was more interested in music anyway.

When I finally got to the recommendations page, I was pissed. Most were bands that I would never listen to, much less buy anything from (e.g., Usher). The ones I had heard of were the exact ones they asked me to rate. "You like B.B. King, so you might like B.B. King." Ugh! Things got a little better after I incorporated my own data, in that the recommendations made more sense. However, one wonders why the enrollment phase is necessary. It's a time waster.

One very useful feature is the pair of sliders. When you select a recommended track, there are two sliders. One is for the "familiarity" of the track, so users can select how far they would like to explore The Long Tail. The other is based on newness, so a user can select whether they only want recently made music or don't care.

The Filter also has a downloadable application that interfaces with iTunes, Winamp, and Windows Media Player. I tried it with iTunes and it crashed. I'm not sure if that was a problem related to iTunes, The Filter, or something else; I'll try again later with iTunes and see what happens. I then tried it with Winamp and played a song. One thing I like is that you must play a specified length of the song (you decide how much) before it's supposed to scrobble the track. I played a single song and awaited my recommendations. However, there seem to be issues: it says it cannot make recommendations because either the playing track was not recognized or there is no related music in the library. Yet when I went to the website, my recommendations were more or less what I would expect from any recommendation service. It would probably be good to have some sort of message saying "We are sending this information".

There was one major problem in that the application was supposed to scan my library and send that information to The Filter's servers so that they would not recommend music I already have. However, this appeared not to happen based on the recommendations I received. I reinstalled the application again, but this did not help things. I'll keep The Filter open for a couple days and report back if this gets fixed.

All in all, The Filter is probably a handy little tool, depending on how the recommendations are done. The FAQ says that it uses "Bayesian mathematics" and "artificial intelligence" to make the recommendations, and that these are based on items bought or listened to. However, I have no idea whether the recommendations come from collaborative filtering, from similarity over tag data (from Last.fm or something similar), from mining sites like Last.fm, or from some combination. The enrollment question phase should be skipped altogether; almost any user will have some music locally stored or have a profile at Last.fm. Second, it needs to be made obvious to the user that data is being sent immediately when the application starts. Also, I wonder if it's possible to use the percentage of times I've skipped or played a song in my iTunes profile. This would generate recommendations immediately instead of a couple of days after installation. Anyway, follow-up in a couple of days.

Monday, June 2, 2008

A how to for myspace bands

Wired has a wiki page with a "how to" for promoting your band on MySpace. I am a Facebook guy, but still, it's a pretty cool reference. Although there are some rather obvious tips... like how attractive people are more photogenic.

Wednesday, May 28, 2008

Blaming the Internet

John C. Dvorak recently wrote that everyone is losing perspective, in large part because of the Internet. I found it hard to believe that someone so knowledgeable about technology would be so misled as to fear it. He writes that there is a "decline in general perspective," which he defines as "generalized or common knowledge." Further, this is due to the explosion of the Internet, with its bloggers, podcasts, etc. Mr. Dvorak would lead us to believe that because of the Internet, people only read the news they want to read and fail to get a general, standard perspective.

However, there are a few problems with this theory. First, at no time was there ever a general, standard perspective. Everyone has their own perspective, which may be similar to others' perspective, but is still wholly unique. The idea that there ever was a single, unique interpretation speaks of thought control (cue Pink Floyd... "Teacher leave them kids alone").

Second, how was this general perspective even decided? Majority vote? Nope. By a handful of "middle-aged white men [sitting] around a table in a room" (quote from Everything is Miscellaneous by David Weinberger). It's not general knowledge that Dvorak is begging to return, but the knowledge deemed important by a small group of people from a limited demographic. Further, there has always been bias in reporting; it has only recently come into the spotlight. Despite Fox News's stance, there is no such thing as "fair and balanced" (ask any liberal). In fact, Mr. Dvorak's example of The New York Times is hardly bias-free (ask any conservative).

Mr. Dvorak believes that custom newspapers, which tailor to a reader's interests, make people read only the news they want. While this is possible, traditional newspapers have never exactly been a solution to it either. How many people read the newspaper front to back, never missing an article? From experience, I can safely say that my Mom read the sports section about as much as I read the Home and Garden section, which was... never!

Another cause for concern, according to Mr. Dvorak, is that those "gosh darn kids today" do not read newspapers and are the ones who really fail to get the "general perspective" (quotes added from now on because the thought is complete rubbish). Maybe when Mr. Dvorak gets back from yelling at the kids to get off his lawn, he'll ask himself a few questions, such as:

How many kids were reading the newspaper before the Internet?
How many kids just read the sections that interested them (e.g., Sports, Comics, etc.)?
How many kids have now replaced reading those sections with similar sites on the Internet?

Sadly, Mr. Dvorak gives no data on any of this or any of his other claims. The truth is that we have a wealth of resources available to us. True, some are fictitious and utter nonsense, but it is not like these viewpoints only came about with the rise of the Internet or blogs. There were idiots in the past, there are idiots now, and there will continue to be idiots in the future. The Internet is a medium and nothing more. In fact, I couldn't be happier that these people have found a medium on the Internet because I can now board a plane without being harassed.

I think Mr. Dvorak assumes that a newspaper is akin to an encyclopedia, but again, even an encyclopedia may have an implicit bias. As Mr. Weinberger points out in Everything is Miscellaneous, the wisdom of the crowds has led to a truly consensus view (e.g., Wikipedia). Further, Mr. Weinberger states that one can actually gauge the degree to which consensus has been reached by looking at a page's history. This lets one see whether the post is new, is being changed a lot, or has settled into a stable state (i.e., consensus has been reached).

I am not really sure whether Mr. Dvorak hoped to look like someone who is afraid of technology or someone who is nostalgic for the "old ways," but he succeeded at both. The true cause of the decline in traditional media is that it is too static and mankind has evolved. Simply put, traditional media is just not enough anymore and is no longer a "good thing".

Saturday, May 24, 2008

Science to the Rescue!

This isn't intended to have a political side; it's just a cool example of how science can ultimately solve a problem: a teenager has figured out how to decompose a plastic bag in months. Note: I have only seen news articles on this, and as far as I know it has not yet been replicated by others. However, if it holds up, it's kind of cool. It's also a lot more promising than praying that no more plastic bags are made.

Friday, May 16, 2008

Sounds familiar

The Leading Question and Music Ally have teamed up to state the five ways the music industry can save its sorry a$$. Of course, this is just a repeat of what others have said (recycling isn't just for RSS feeds). For instance:

1) Gerd Leonhard already stated the need to bundle music into other products.

3) Kurtis Jacobson already stated that freeing music will lead to increased revenues.

4) and 5) David Jennings stated in his book that charts are vastly outdated because they "lost their 'water cooler' effect". Also, half his book is on the need for more power to be given to Savants.

Even point 2) could be seen as part of the solution proposed by David Weinberger. Judging from the speed at which companies and the government seem to work, one wonders if it's a good idea to listen to the consumer.

Wednesday, May 14, 2008

Multi-tag search

Elias demonstrates the new Last.fm playground. I love the multi-tag search. I wonder if, after 10-20 years of music sites like Last.fm and Pandora, which cater to individual taste, the vocabulary we use to describe music might start to change. Specifically, will genre labels go the way of the 8-track? They are too vague and highly variable. Thinking about it, genre was a product of record companies. It makes sense that as record companies are forced to change their structure, the vocabulary would go with it.

Monday, May 12, 2008

Science Fiction movies that do it right

I'm not much of a science fiction fan, but I did like that none of the five science fiction movies that get the science right involved neural networks. We've all heard the plot line: engineers develop a neural network, it becomes "aware," and then decides that the most logical thing to do is go on a massive killing spree (of course!). For all of those who worry about technological advances, and specifically the dangers of neural networks: there is NOTHING to worry about. This is not how they work, and it is not even close. True, they are "inspired" by early ideas of how neurons work, but that 1950s science is outdated, and the brain is far more complex than anything built with neural networks. Sorry, but I always have to roll my eyes whenever I watch Terminator. I now know how my parents feel when they watch ER. (Case in point: on a show like Chicago Hope, a person who needs CPR in the hospital will survive 64% of the time, but in real life the highest reported number is 40%, and the long-term survival rate is no more than 30%.)

Sunday, May 11, 2008

Piracy hurts public health?

It seems I can always count on California to overreact and come to some illogical conclusions. While music and video piracy is still a problem for the major record labels, I find it hard to believe that there is any detriment to public health. I have never heard of anyone being physically or even emotionally hurt by such a crime. No one has ever been dragged to the emergency room due to an illegal download. In fact, unless you are traveling around the coastal waters of Somalia and other parts of the Indian Ocean, I would have to say your chances of being hurt by anyone engaged in piracy are minimal. I'm starting to wonder if those running the Los Angeles government have confused Pirates of the Caribbean with the nightly news. They even go so far as to say the welfare of the country's citizens is at stake. I know the RIAA believes that dead people and pre-teens are horrible agitators who need to be stopped, but I have never been under any immediate threat because one of my neighbors may have obtained the newest Sun Kil Moon album illegally. The whole statement screams of such ridiculous overstatement that it should trigger one's skeptic sense.

Friday, April 25, 2008

Rap may not be music

Ok, this isn't a serious post. I've often claimed that rap is not music, because what separates a good rap from a bad rap is too heavily skewed toward lyrics, so I identify it as a type of poetry. Rhyme and meter are far more important than harmony or melody; the latter still matter, but their importance is greatly attenuated. Basically, if that's music, then so is what William Shatner does, and no one seriously thinks that counts as music.

To further my argument, here is a rap video by a couple of management majors, which is actually decent in terms of melody, harmony, and even production value. The only limiting factor is that the subject matter is a little dumb (for those unfamiliar with Georgia Tech lingo, the M-train refers to changing majors to Management, which is considered the easy joke major). Bottom line: if this counts as music, then everyone has issues.

Thursday, April 24, 2008

New Project

I'm working on a new project in language identification. Specifically, we are looking into using speech attribute detectors to enhance phonetic transcriptions. From there, supervectors are created by building phone document vectors for each language. Moreover, we are using TempoRAl Patterns (TRAPs) as features. These have been shown to be superior to MFCC + velocity + acceleration vectors. I would be interested to see how these perform on music, especially since incorporating dynamic features there has had only limited effect. I think part of the problem is that music is (generally) slower than speech, so incorporating longer windows might be better. TRAPs also differ from texture windows: texture windows are simply first- and second-order statistics of the frames within the window, whereas TRAPs concatenate the original frames. However, since I'm limited to USPop's feature set (MFCCs), I'm not sure I'll get to see the effect any time soon.
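To illustrate the structural difference (a simplified sketch; real TRAPs are built from long temporal trajectories of critical-band energies, not generic feature frames like these):

```python
def texture_window(frames):
    """Texture window: per-dimension mean and variance over the window."""
    n, d = len(frames), len(frames[0])
    means = [sum(f[j] for f in frames) / n for j in range(d)]
    variances = [sum((f[j] - means[j]) ** 2 for f in frames) / n for j in range(d)]
    return means + variances  # length 2*d, regardless of window size

def trap_style(frames):
    """TRAP-style feature: concatenate the raw frames in the window."""
    return [x for f in frames for x in f]  # length n*d

frames = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]  # 3 frames, 2 dims each
print(len(texture_window(frames)))  # 4
print(len(trap_style(frames)))      # 6
```

The key point: the texture window's dimensionality is fixed no matter how long the window is, so temporal ordering within the window is discarded; the TRAP-style concatenation grows with the window and preserves the trajectory.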

Thursday, April 17, 2008

Searching, a skill?

I told my girlfriend, in a joking manner, that I was better at "Googling." I'm not really, but I think I'm more likely to use Google than she is (actually, she is a Yahoo! person... we are so different). She's also more likely to ask someone else, but I'm lazy, and if it involves leaving my desk, I'd rather go another route. Anyway, she said that "Googling" was hardly a skill. To prove her wrong (for once), I found a few websites that discuss search strategies, including one that describes the weaknesses of tags. The reality is that tags provide information only because there is a structure to them. That is, tags are not completely miscellaneous, because tags are ultimately categorical (sorry, Mr. Weinberger).

Using the "wisdom of the crowds" to find information still requires effort from the user to learn how the population generally tags items. For example, users of Last.fm know that many tags are too vague to be of value, like "rock" and "pop." More importantly, there is an entire language of tags. True, new tags can always be created, but in order to be useful, tags need to be used by the crowds. The most successful tags are ones that have a standard definition and are discriminative. For example, if I want to find my sister's favorite band, I can type "female vocalist" and "goth", which brings me to Evanescence and similar bands. However, if I were to choose two other tags from Evanescence's page, I would not get the same result, even if I took the most popular ones ("rock" and "female vocalist"). I think Mr. Weinberger's real focus was not just the miscellany of the internet, but rather the personalization that can be derived from miscellany. Rather than pre-structured strategies, adaptive-structure strategies are needed for information content.
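The "female vocalist" + "goth" example is just a set intersection over tag assignments; a minimal sketch (with a made-up toy library, not real Last.fm data):

```python
def multi_tag_search(library, tags):
    """Return items carrying ALL of the given tags (set intersection)."""
    tags = set(tags)
    return [item for item, item_tags in library.items()
            if tags <= set(item_tags)]

# Hypothetical toy library of artist -> tags:
library = {
    "evanescence": {"rock", "female vocalist", "goth"},
    "nightwish": {"metal", "female vocalist"},
    "nirvana": {"rock", "grunge"},
}
print(multi_tag_search(library, ["female vocalist", "goth"]))  # ['evanescence']
print(multi_tag_search(library, ["rock", "female vocalist"]))  # ['evanescence']
```

This also shows why tag choice matters: here both queries happen to work, but a vague pair like "rock" alone would return half the library.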

Monday, April 7, 2008

Name this tune...

An interesting paper appeared in Psychology of Music, titled "Memory and metamemory for songs: the relative effectiveness of titles, lyrics, and melodies as cues for each other" by Peynircioglu, Rabinovitz, and Thompson. Their findings indicate that while people cannot recall lyrics well when given a title or melody, lyrics are more effective cues for recalling titles or melodies than titles or melodies are for recalling each other. When subjects could not recall the target with certainty, they were asked to pick one of four choices and then rate how sure they were. In this case, lyrics were judged to be of little help in remembering a melody or title, even though they actually produced the best scores. Also, even though lyrics were almost never remembered given a melody or title, people made those choices with more certainty.

I find this pretty fascinating, but it would be interesting to see an additional study: the roles of tags and non-acoustic information. Many content-based retrieval algorithms bootstrap their acoustic classifiers with textual descriptions (e.g., tags), an idea that stems from social tagging websites. However, I've never seen evidence that these tags are universal in meaning. For example, given that a song is listed with the tag "grunge," can we safely assume that everyone would understand this? Or are tags only valuable to the person who assigned them? It's probably somewhere in the middle, like genres. However, given enough tags, we can get a good "picture" of what the song contains.

Wednesday, March 19, 2008

Using an HMM != ASR

As I said yesterday, many MIR researchers have tried to copy the usual automatic speech recognition (ASR) paradigm by using hidden Markov models (HMMs). However, almost all of these approaches have not used HMMs correctly... at least, not if their goal was to mirror what is done in ASR research. Most MIR researchers have modeled an entire song with a single HMM with the number of states varying between 3 and 10. Usually, it's been noted that these approaches fare no better than using a GMM for the entire song. The conclusion for a long time has been that dynamic information is not important for music similarity.

The problem I have is with the model itself. Using an HMM for an entire song (or even worse, an entire genre) is NOT the same paradigm as in ASR. A song is typically 3 minutes in length, but HMMs in speech rarely cover more than a single word or phone, so the span of time for one HMM is typically on the order of tens to hundreds of milliseconds. The reality is that HMMs in speech are shared among different utterances. If one wants to copy this for MIR, then HMMs need to be shared across songs. Of course, no one has come up with a good way to produce music transcriptions from which to train HMMs in this way.

Christopher Raphael presented a paper that did try to produce transcriptions of monophonic melodies using HMMs, but no one has really shown how this would apply to polyphonic music. Imagine a slow-moving bass line with a very fast staccato melody on top. Does one start a model each time a new sound starts? Doing so would require a very high number of models, because every possible combination of notes from every voice would need its own model. What about modeling only the lowest note? Then the densities under each state would need a very high number of mixtures to account for the different notes that may be played on top (even more still if one includes different instruments). This means tons of data.

In reality, until something better comes along, unsupervised HMM tokenization is the best chance for modeling music in the same fashion as speech. The downside is that no one has a direct interpretation of what these models mean. However, there are language identification papers where phone models trained in one language are used to decode another language, even when the second language contains sounds that the models do not cover. This gives hope for those studying music similarity and classification.
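The tokenization idea can be illustrated with a drastically simplified numpy sketch. Real systems train HMMs and Viterbi-decode; here, frame-wise assignment to the nearest learned center stands in for decoding, and the data, cluster count, and function names are all invented for this toy:

```python
import numpy as np

rng = np.random.default_rng(0)

def kmeans(frames, k, iters=20):
    """Crude k-means to stand in for unsupervised 'musiphone' training."""
    centers = frames[rng.choice(len(frames), k, replace=False)]
    for _ in range(iters):
        labels = np.argmin(((frames[:, None] - centers) ** 2).sum(-1), axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = frames[labels == j].mean(axis=0)
    return centers

def tokenize(frames, centers):
    """Decode a song into a token sequence: nearest 'musiphone' per frame."""
    return np.argmin(((frames[:, None] - centers) ** 2).sum(-1), axis=1)

def histogram(tokens, k):
    """Represent a song by its normalized token counts (a 'phone document')."""
    h = np.bincount(tokens, minlength=k).astype(float)
    return h / h.sum()

# Toy corpus: two songs drawn from one distribution, one from another.
songs = [rng.normal(0, 1, (200, 13)),
         rng.normal(0, 1, (200, 13)),
         rng.normal(5, 1, (200, 13))]
k = 8
centers = kmeans(np.vstack(songs), k)
docs = [histogram(tokenize(s, centers), k) for s in songs]

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Songs 0 and 1 should look more similar than songs 0 and 2.
print(cosine(docs[0], docs[1]) > cosine(docs[0], docs[2]))  # True
```

Even though the tokens have no direct musical interpretation, songs built from similar sound inventories end up with similar token histograms, which is exactly the property the language identification work exploits.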

Tuesday, March 18, 2008

Cepstral mean subtraction worthless?

It has been cited in several research papers (most notably, Aucouturier and Pachet 2006) that performing cepstral mean subtraction (CMS) is damaging to music information retrieval, even though such an approach is commonplace in automatic speech recognition. I've noticed this with any algorithm that models global timbre. For example, Aucouturier and Pachet modeled each song with a Gaussian mixture model (GMM), compared songs with an estimated Kullback-Leibler divergence, and observed a detrimental effect from CMS. This result has been verified by other researchers (as well as by me). However, there is an important caveat: their model is built at the global song level. When models are shared among several songs, as in acoustic segment modeling (ASM) (Reed and Lee, 2006), performing CMS is not only useful but necessary. If one does not perform CMS, the ASM approach does not work: most songs sent through the Viterbi decoder will have no surviving paths, and even when paths do survive, they most often produce only a couple of "musiphones."
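For reference, CMS itself is essentially a one-liner: subtract each coefficient's mean over the frames of a song, which removes the stationary (convolutional channel) component. The variable names here are my own:

```python
import numpy as np

def cepstral_mean_subtraction(cepstra):
    """Subtract each coefficient's mean over time from a (frames x coeffs)
    cepstral matrix, removing the stationary channel/recording component."""
    return cepstra - cepstra.mean(axis=0, keepdims=True)

# Toy MFCC matrix: 100 frames of 13 coefficients with a constant channel offset.
X = np.random.randn(100, 13) + 3.0
Y = cepstral_mean_subtraction(X)
print(np.allclose(Y.mean(axis=0), 0.0))  # True
```

After CMS, every coefficient has zero mean over the song, so a global model can no longer exploit the offset, while frame-to-frame dynamics are untouched.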

It should be remembered that CMS discards information (e.g., about the recording equipment) that is genuinely useful for similarity; obviously, people who record similar types of music tend to use similar equipment. So for a global model there is no gain to be had by discarding this information. On the other hand, if one wants to use dynamic information, then discarding it via CMS is necessary. I think a lot of researchers have been citing the conclusions of Aucouturier and Pachet a little unfairly: their paper was based on global timbre models, and its results are not applicable to approaches which take dynamics into account.

However, it should be noted that just using HMMs does not necessarily capture useful dynamic information, either. One needs to use them intelligently, which will be the subject of tomorrow's post.