When all it takes to find an answer is to say “OK Google”, it’s almost impossible to imagine using an encyclopedia to look up information. By 2010, 2.5 exabytes of data were being generated on the internet every day, the equivalent of 167,000 times the information contained in all the books in the United States Library of Congress [1]. The amount of information we have access to is astronomical compared to a decade or two ago, and as we enter the age of Trillions™, it continues to grow.
The Evolution of Data Storage
Your mobile device alone can track everything from how many steps you take in a day to the locations you visit most often. As microprocessors continue to be integrated into everyday technology like smartphones and wearable devices, the Internet of Things (IoT) has allowed us to collect more data than ever before. However, once data is collected, where does it go? To understand data storage today, we need to go back to the beginning.
Although early computers had no viable memory or storage system, computers have been framed as information containers since they made their commercial debut in the ’60s. When IBM released its 1401 model in 1959, it offered one of the first electronic data processing solutions to small businesses. However, there wasn’t a way to digitally store the data it processed. Users needed external data sources, in this case decks of punched cards, to supply both the commands and the data to process, and to record the results.
From Punched Cards to The Cloud
In the ’70s, college campuses adopted a time-sharing model, in which many users share one computer’s resources by running multiple programs and tasks at the same time. They adapted the punched card system’s storage pattern into a “satellite” computing station: a row of keypunch machines, card readers, and a line printer connected to the campus mainframe (a large central computer shared by many users) via telephone wires to store data. Eventually this process was automated, with “robot typewriters” replacing the card readers and printers. Data was then stored as images of card decks on remote disk drives.
This use of mainframes provided a solid foundation for the minicomputer, and later, the personal computer (PC), by centralizing data. Instead of relying on remote, external data storage, information could be stored locally on the computer, first on floppy disks and later on hard drives backed up by floppy disks. Today, we use files to store and share information, and we have brought data storage full circle with another time-sharing model, known as The Cloud.
Grand Repository in the Sky
With The Cloud, information can be shared instantaneously, unlike the punched cards of the past. Eventually it will give rise to an Information Commonwealth that acts as one shared “cloud”, which we call the “Grand Repository in the Sky” (GRIS). What makes GRIS different from The Cloud is that it uses a distributed network of computers, a trillion-node network, in which all nodes are peers that act as both clients and servers and exchange information directly with each other. GRIS is not the Internet or a huge database. Rather, as MAYA’s founders put it, it is “a massively replicated, distributed ocean of information objects, brought into existence by shared agreement on a few simple conventions for identifying, storing, and sharing small ‘boxes’ of data by a consensual, peer-to-peer scheme” [2]. Essentially, GRIS will allow people to make better use of the data they collect and share through peer-to-peer computing. Our connected world will help build this repository through pervasive computing devices, and allow us to access more information than ever.
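To make the peer-to-peer idea more concrete, here is a minimal sketch of nodes that each store, serve, and replicate small boxes of data. It is an illustration only, not MAYA’s actual design: the `Peer` class, its method names, and the choice to identify a box by the SHA-256 hash of its contents are all assumptions made for this example.

```python
import hashlib


class Peer:
    """A hypothetical GRIS-style node that acts as both client and server."""

    def __init__(self, name):
        self.name = name
        self.boxes = {}      # box id -> raw bytes held locally
        self.neighbors = []  # peers this node can talk to directly

    def connect(self, other):
        # Links are symmetric: neither side is "the server".
        if other not in self.neighbors:
            self.neighbors.append(other)
            other.neighbors.append(self)

    def put(self, data):
        # Assumed convention: a box is identified by the hash of its content,
        # so every peer derives the same id for the same data.
        box_id = hashlib.sha256(data).hexdigest()
        self.boxes[box_id] = data
        return box_id

    def get(self, box_id, _visited=None):
        # Answer from local storage if possible, otherwise ask neighbors.
        visited = _visited if _visited is not None else set()
        visited.add(id(self))
        if box_id in self.boxes:
            return self.boxes[box_id]
        for peer in self.neighbors:
            if id(peer) in visited:
                continue  # do not ask the same peer twice
            data = peer.get(box_id, visited)
            if data is not None:
                self.boxes[box_id] = data  # keep a replica for later requests
                return data
        return None


# Three peers in a chain: a <-> b <-> c
a, b, c = Peer("a"), Peer("b"), Peer("c")
a.connect(b)
b.connect(c)

box_id = a.put(b"steps today: 8042")
print(c.get(box_id))      # c has no direct link to a; the box arrives via b
print(box_id in b.boxes)  # b kept a replica while passing the box along
```

Because the identifier is derived from the content itself, no central registry is needed to name a box, and every peer that handles a box can keep a copy, which is one simple way to picture “massively replicated”.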
How to Tame the Complexity of Data
We know we are collecting data at an exponential rate, but without a system that can intelligently identify relevant information, a shared repository like GRIS will be useless. IBM has already begun to pioneer a solution to this challenge by combining the power of artificial intelligence (AI) with a search engine. IBM Watson Machine Learning, previously known as Watson IBM Predictive Analytics service, has introduced AI as a tool to refine data retrieval in the medical field. This AI has the capacity to read 200 pages in 3 seconds, allowing medical professionals to identify the most relevant information to help patients. Watson also makes 80% of unused data available to the medical field. If we applied similar technology to GRIS, it would give people the ability to use data they didn’t know they had access to, on a vastly larger scale.
Understanding how we store information is important as we continue to move past the Internet, and can help us design better ways to find and safely share information. Additionally, by using AI and machine learning, we can optimize our access to information as we try to tame its complexity in a Trillions™ future.