Thursday, February 10, 2011

Building Digital Libraries to Contain the Data Deluge

The full article is at IEEE Spectrum, but you might want to dip your toe in with the PSFK article first.

"The Coming Data Deluge" talks about how digital libraries will be needed to store, filter, and organize the massive amounts of data from scientific projects. This makes me happy, because a) I love science, and b) I would really like to have a job when I graduate, and information management is right up my alley.

In the past, most scientific disciplines could be described as small data, or even data poor. Most experiments or studies had to contend with just a few hundred or a few thousand data points. Now, thanks to massively complex new instruments and simulators, many disciplines are generating correspondingly massive data sets that are described as big data, or data rich. Consider the Large Hadron Collider, which will eventually generate about 15 petabytes of data per year. A petabyte is about a million gigabytes, so that qualifies as a full-fledged data deluge.
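To get a feel for that number, here is a quick back-of-the-envelope sketch in Python. It assumes decimal (SI) units, where a petabyte is 10^15 bytes and a gigabyte is 10^9 bytes, and it assumes the data is spread evenly across the year, which is my simplification and not something the article claims. The function and variable names are made up for illustration.

```python
# Back-of-the-envelope conversion, assuming decimal (SI) units.
BYTES_PER_GB = 10**9
BYTES_PER_PB = 10**15


def petabytes_to_gigabytes(petabytes):
    """Convert a whole number of petabytes to gigabytes."""
    return petabytes * BYTES_PER_PB // BYTES_PER_GB


# Figure quoted in the article: about 15 petabytes per year.
lhc_pb_per_year = 15
gb_per_year = petabytes_to_gigabytes(lhc_pb_per_year)

# Assumption: output is spread evenly over a 365-day year.
gb_per_day = gb_per_year / 365

print(f"{gb_per_year:,} GB per year")
print(f"about {gb_per_day:,.0f} GB per day")
```

So 15 petabytes works out to 15 million gigabytes a year, or on the order of 41,000 gigabytes every single day if you average it out.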


