Monday, November 10, 2008

Old Copyright Laws Hurt Research

Note: Thanks to my brother, Josh, for his comments.  Josh is an IP lawyer in Chicago, IL.

Recently, a question was posed on a research mailing list that went more or less as follows: the researcher was conducting a listening experiment, and there was a chance that the subjects could find and keep the 15-second excerpts for personal use.  The author was worried that this constituted a copyright violation.  I pointed out that this more than likely falls under fair use.  However, reading the webpage on fair use gives one clear impression: the law itself is rather meaningless.  First, the law only stipulates what needs to be considered in evaluating fair use, without giving guidelines or specifics.  The webpage states "There is no specific number of words, lines, or notes that may safely be taken without permission," and that it is best to obtain permission from the copyright holder.  Further, the precedent cited offers only a partial list of examples, one that was relevant in 1961.

These points are key to researchers in information retrieval (and in particular, music information retrieval) because these laws were based on 1960s technology.  Back then, exchanging songs, text, images, etc., was a rather involved task.  Today, exchange and storage can be conducted on a massive scale, unforeseen by the lawmakers fifty years ago.  With this increased capacity for storage, researchers can now test large-scale IR algorithms, and a (relatively) free, large-scale database is needed.  However, in the case of music, such large-scale databases are either impossible to find or come with severe restrictions.  Every year, I see experiment after experiment of promising algorithms, but the results can be taken only so far because of the limited size and scope of the testing database.  Even though some schools have access to a large library archive of recordings, researchers at other institutions are unable to duplicate their results because the data is not freely available.

Some researchers have found "loopholes" that allow them to share features extracted from audio, which cannot be used to recreate the audio (e.g., Mel-cepstral coefficients).  This is still not a viable solution because no one can determine a priori the best features for all IR experiments, and experimentation with new features is impossible.  Also, a set of features that may be reversible in combination could potentially lead to the best results, but this is impossible to test if only a limited set of features is ever distributed.

A very interesting solution comes in the form of MIREX, where a TREC-like evaluation is conducted by having researchers send in algorithms to various competitions.  However, there are a few drawbacks.  First, it is an enormous burden on the sponsoring institution, IMIRSEL at the University of Illinois.  Its livelihood is also completely dependent on the program's funding, which is fine for the next few years, but the long-term stability is not guaranteed.  Second, the evaluation is carried out only once a year, although there has been talk of extending this to a rolling model.  A third problem is that tasks are largely fixed and a new task is only considered if it has broad approval.  Until they are included, new and interesting tasks are still confined to small, private databases.

I applaud those at IMIRSEL for coming up with the proposed solution and also those who supply databases in some form or another, but these are patches to the main problem, which, as I have stated, is that copyright regulations are severely out-of-date.  Simply put, when today's regulations were implemented, no one imagined the scalability of today's information age.  Better regulations are needed not only for the public sector, to address today's file-sharing "problem," but also for today's researchers.

The problem ultimately stems from the current practice of common law.  Our current laws are written as loose guidelines and the specifics are left open to the courts.  Despite what you learned in history class, our laws are not actually written by legislatures, but rather by those on the bench.  Look at the Sherman Act: a single sentence determines when the law is applicable; however, courts have expanded and contracted this law as they see fit.  Instead of a coherent, well-structured law that anyone can follow, one needs a swarm of lawyers to get through any issue.  Worse, many people are completely unaware that they may be breaking copyright law.  Many researchers wrongly assume that if they use less than 30 seconds, then they are legally safe.  In reality, it depends purely on whether the recording industry chooses to go after you and how good your defense team is.

So what would a good solution look like?  I have thought of one that is actually rather easy and is found in other research fields.  The handling of nuclear, biological, and chemical materials is governed by a strict set of guidelines for researchers to follow in obtaining, handling, and destroying potentially dangerous materials.  I'm actually a little surprised that a similar structure has not been suggested for the use of copyrighted materials.  Such guidelines could allow researchers access to large amounts of complete, unaltered data (i.e., full songs, raw audio), while still protecting the rights of the copyright holders.

I can already address the objection that will be raised by the copyright holders: "But very few researchers will want to take home nuclear, biological, and chemical materials."  This is just untrue.  Many research labs conduct studies on illegal drugs, such as marijuana.  Are you telling me that no researcher would want to take home a little stash?  Again, strict guidelines are in place to ensure that researchers use these illegal substances in an ethical and legal manner while also ensuring that necessary research can be conducted.  The same is definitely possible for music, text, and other multimedia.
