December 5, 2012
Big Data Is Relative
Swab your inner cheek, and your sequenced DNA takes up about 750 MB. That might not sound like much; it’s not even 1 GB. But what if you had to store the genomes of 100,000 people? Now you need about 72 TB. The entire population of the US? Roughly 222 PB. It adds up fast. But big data is not just about storing data; you also have to be able to process it. Suppose you wanted to run algorithms to find disease indicators in your own 750 MB of DNA. The computing power needed to run those algorithms could be substantial, which makes even 750 MB seem big. Finally, you may want to access, search, and visualize terabytes of DNA for entire populations. Big data is relative. It all comes down to how well you can handle the three activities of big data: store, process, and access (SPA):
- Store. Can you capture and store the data?
- Process. Can you cleanse, enrich, and analyze the data?
- Access. Can you retrieve, search, integrate, and visualize the data?
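
The storage arithmetic above is easy to check. The sketch below is only a back-of-the-envelope estimate: it takes the 750 MB per genome from the post, counts in binary units (1 TB = 1024 × 1024 MB), and assumes a US population of 318 million. The post does not state a headcount, so that figure is an assumption chosen because it reproduces the 222 PB estimate. The names `storage_needed`, `GENOME_MB`, and `US_POPULATION` are hypothetical.

```python
# Rough storage estimate for sequenced genomes, in binary units (1 GB = 1024 MB).
GENOME_MB = 750              # per-person figure from the post
US_POPULATION = 318_000_000  # assumption: the post gives no headcount


def storage_needed(people: int, genome_mb: int = GENOME_MB) -> tuple[float, str]:
    """Return (value, unit), scaled to the largest unit that keeps value below 1024."""
    size = float(people * genome_mb)
    for unit in ("MB", "GB", "TB", "PB"):
        # Stop at the first unit that fits, or at PB, the largest unit handled here.
        if size < 1024 or unit == "PB":
            return size, unit
        size /= 1024


for people in (1, 100_000, US_POPULATION):
    value, unit = storage_needed(people)
    print(f"{people:>11,} people: {value:,.0f} {unit}")
# Prints 750 MB, 72 TB, and 222 PB for the three cases.
```

In decimal units (1 TB = 1,000,000 MB) the same inputs come to 75 TB and about 239 PB, so the exact totals depend on the convention. Either way the point holds: the "store" question alone changes by orders of magnitude as the population grows.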