I came across an old article citing a 2003 study about the world’s largest databases. It doesn’t look like they did one in 2004, though they are starting to do another one in 2005.

All Environments & UNIX Only: Top honors went to France Telecom for the largest database in the All Environments and UNIX Only categories. At 29.2TB, the database was three times as big as that of the 2001 winner. France Telecom runs the Oracle Database on HP Superdome servers and HP RAID storage systems.

Windows Only: The Windows Grand Prize was awarded to comScore Networks, Inc. The 8.9TB implementation was six times the size of the database of the previous winner - none other than comScore itself! comScore is a Sybase IQ DBMS site, with Dell PowerEdge servers and EMC Symmetrix 5 storage arrays.

[...]

All Environments: The 2003 program was the first to track hybrid databases, which store data on both tape and disk. In general, these are data archives in which the majority of the data that can be queried is on tape. However, because their sizes dwarf the size of other databases, they deserve notice. Approaching a petabyte of data, the Stanford Linear Accelerator Center (SLAC) database, at 828.8TB, earned the Grand Prize. The SLAC database is managed by Objectivity DBMS on Sun Fire servers and Sun StorEdge storage arrays.

828.8TB: that is amazing! This article is fairly interesting to me since I work in a multi-terabyte environment, and database platforms always seem to be a hot topic because of speed and scalability, or lack thereof.

A few other snips from eWeek about this study:

Taking home the prize for largest database size for all OS environments and Unix, for the DSS portion was France Telecom boasting 29.2TB. France Telecom uses Oracle Corp. as its DBMS, Hewlett-Packard Co. as its storage and system vendor, and employs an SMP (symmetric multi-processing) architecture. In the Windows comparison for database size, ComScore Networks Inc. came in first with 8.9TB for its database. ComScore relies upon Sybase and its Sybase IQ offering as its DBMS, Dell for its systems, and EMC as its storage provider in a clustered architecture. In 2001, ComScore finished on top in the same category with only 1.5TB.

[...]

Within the TopTen program’s new Normalized Data Volume category, measuring data managed by the DBMS, AT&T ranked number one at 94.3TB, which is nearly three times as large as Amazon.com at 34.2TB in the number two slot.

I think the scary thing about this survey is that companies that compile massive amounts of “opt-in” consumer data (like grocery stores with discount cards, etc.) probably didn’t take part in it. I would imagine there are some bigger databases that weren’t included, both from companies like that and from other places like financial institutions, insurance companies, health companies, etc.