Thursday, November 6, 2008

Science Reporting, Data-Mining, and Terrorism

Disclaimer: This blog is non-political, but it can discuss how science, journalism, and politics interact.  I will try my best to simply state the facts and point out where I see a misinterpretation or omission of scientific principles.  For that reason, I intentionally did not post this until after the election.

Recently, I wrote about the new responsibilities engineers have when describing technical findings to science journalists.  Shortly after my post, I began to see many articles stating that a committee put together by the Department of Homeland Security had found that data-mining technology should not be used to track or identify terrorists, because the technology would not work and false positives would violate privacy rights.  At first, I did not pay attention to this story, but then I started to see more and more stories saying that, ultimately, this task is futile.

Futile?  Really?  That implies we know the limits of data mining as a science.  I guess we can cancel all those conferences next year.  Unlike many of the journalists, I actually read the report beyond the Executive Summary and found that the committee's objectives and conclusions had been mischaracterized.  First, the committee said that such technologies should not be used right now, "given the present state of the science" (italics added), and should never be trusted in a fully automatic sense.  The report also says, in several places, that research should continue.

Second, this is mostly a legal report that uses the technology only as background.  One common theme in the report is that false positives will occur and that they result in privacy violations.  However, the report never states the conditions under which a particular invasion of privacy may be justified.  Clearly, the answer is not all-or-nothing, since privacy intrusions occur legally in non-terrorism contexts.  For example, many common law-enforcement tools, such as DNA testing, witness accounts, and even confessions, have a false-positive rate.  Where are the calls to abandon these tools, or to stop investigating crimes altogether?
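To see why false positives dominate the discussion when screening a large population for very rare targets, here is a back-of-the-envelope calculation in Python.  Every number in it (a population of one million, 100 real targets, 99% sensitivity, a 1% false-positive rate) is made up for illustration and does not come from the report; `expected_flags` and `precision` are hypothetical helper names of mine.

```python
def expected_flags(population, true_targets, sensitivity, false_positive_rate):
    """Expected (true positives, false positives) when screening everyone.

    sensitivity: fraction of real targets the screen flags.
    false_positive_rate: fraction of innocent people the screen flags anyway.
    """
    innocents = population - true_targets
    true_positives = true_targets * sensitivity
    false_positives = innocents * false_positive_rate
    return true_positives, false_positives


def precision(true_positives, false_positives):
    """Fraction of flagged people who are actually targets."""
    return true_positives / (true_positives + false_positives)


if __name__ == "__main__":
    # Hypothetical illustration values only, not figures from the report.
    tp, fp = expected_flags(population=1_000_000, true_targets=100,
                            sensitivity=0.99, false_positive_rate=0.01)
    print(f"true positives:  {tp:,.0f}")
    print(f"false positives: {fp:,.0f}")
    print(f"precision:       {precision(tp, fp):.2%}")
```

With these made-up numbers, about 9,999 innocent people are flagged alongside 99 real targets, so only about 1 flag in 100 is correct.  That is exactly why a flag can justify a closer look but not automatic action, and why the interesting question is what level of intrusion a given flag justifies.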

So what were the report's real conclusions regarding the use of data-mining techniques in counter-terrorism efforts?

1. No fully automatic data-mining technique should be used.

Specifically, the document says that because false positives are always possible, data-mining techniques can only be used to identify subjects for further investigation. This is not really new.

2. Technology can be used to reduce the risk of privacy intrusion.

Specifically, the technology can be used as a filter. The report gives the example of a system in which humans review only those images in which a gun has been automatically detected.

3. "Because data mining has proven to be valuable in private-sector applications... there is reason to explore its potential uses in countering terrorism."

Once again, this proves my point that engineers and scientists need to be careful about how they describe their research and findings to journalists.

4. Programs need to be developed with specific goals and legal limitations in mind. In addition, programs must be subject to continual review.

The truth is that many of our laws and legal understandings are based on judicial precedents and are rarely cleaned up by Congress. This becomes a problem when technologies change and no new laws are written. Any legal decision rests largely on the facts of the particular case, so it rarely settles how the law applies in a broader context. We see a similar problem in our obsolete copyright laws.
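Conclusions 1 and 2 together describe one pattern: an automated detector acts only as a filter, and people look at (and act on) only what it flags.  Below is a minimal Python sketch of that pattern, assuming a detector that returns a score between 0 and 1.  `Item`, `triage`, `toy_gun_detector`, the precomputed `gun_score` field, and the 0.7 threshold are all hypothetical names and values of mine, not anything specified in the report.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable, List


@dataclass
class Item:
    item_id: str
    data: dict = field(default_factory=dict)


def triage(items: Iterable[Item],
           detector: Callable[[Item], float],
           threshold: float) -> List[Item]:
    """Return only the items whose detector score meets the threshold.

    Nothing here acts against anyone: the result is just a review queue
    for human analysts (conclusion 1), and items below the threshold are
    never shown to a person at all (conclusion 2).
    """
    return [item for item in items if detector(item) >= threshold]


def toy_gun_detector(item: Item) -> float:
    # Hypothetical stand-in for a real image classifier: it just reads a
    # precomputed score so the example stays self-contained.
    return item.data.get("gun_score", 0.0)


images = [
    Item("img-1", {"gun_score": 0.92}),
    Item("img-2", {"gun_score": 0.10}),
    Item("img-3", {"gun_score": 0.75}),
]
review_queue = triage(images, toy_gun_detector, threshold=0.7)

if __name__ == "__main__":
    for item in review_queue:
        print("send to human reviewer:", item.item_id)
```

Here only `img-1` and `img-3` reach a human, and `img-2` is never seen by anyone.  The threshold is where the false-positive tradeoff lives: raising it means analysts see fewer innocent images but may miss real ones, and lowering it does the opposite.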

For what it's worth, I do not blame the reporters entirely.  Reading a 372-page document is a lot to ask in an ever-shrinking news cycle.  But this episode does demonstrate that engineers and scientists need to be careful about how they state their findings, since public perception and even legal policy can be altered when the media mischaracterize them.
