Big data is the collection, storage, and management of huge amounts of digital information. Databases falling into the realm of big data today often begin in the hundreds of terabytes but can range up to petabytes (10¹⁵ bytes) or greater in size, and problems involving yottabytes (10²⁴ bytes) of data are under discussion. For a sense of perspective on big data, assume that a reasonable-quality photo in JPEG format might be about 400 kilobytes in size. If so, then a one-terabyte disk could store only about 2.5 million such photos; Facebook's photo archive currently holds about 240 billion photos. See also: Big data
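The comparison above can be checked with a few lines of arithmetic. The sketch below assumes decimal (SI) units, so that one terabyte is 10¹² bytes, and takes the 400-kilobyte photo size and the 240-billion-photo count directly from the text; the function and variable names are illustrative only.

```python
# Decimal (SI) storage units, in bytes. This is an assumption; binary
# units (1 KiB = 1024 bytes) would give slightly different figures.
KILOBYTE = 10**3
TERABYTE = 10**12
PETABYTE = 10**15

# Assumed size of one reasonable-quality JPEG photo, as stated in the text.
PHOTO_BYTES = 400 * KILOBYTE


def photos_per_disk(disk_bytes, photo_bytes=PHOTO_BYTES):
    """Number of whole photos that fit on a disk of the given size."""
    return disk_bytes // photo_bytes


def archive_size_bytes(photo_count, photo_bytes=PHOTO_BYTES):
    """Total bytes needed to store photo_count photos."""
    return photo_count * photo_bytes


photos_per_terabyte = photos_per_disk(TERABYTE)
archive_bytes = archive_size_bytes(240 * 10**9)
archive_petabytes = archive_bytes / PETABYTE
disks_needed = archive_bytes // TERABYTE

print(photos_per_terabyte)  # 2500000
print(archive_petabytes)    # 96.0
print(disks_needed)         # 96000
```

Under these assumptions, an archive of 240 billion such photos would occupy about 96 petabytes, or roughly 96,000 one-terabyte disks.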
Big data is the result of our being able to gather more data than can be processed. Common data sources include medical records, retail and banking transactions, weblogs, social media, mobile devices, image files, audio, video, and environmental and transportation sensor networks. Some of this is structured data, but much of it is unstructured. Although storage (particularly in the cloud) is now cheap and readily available, some of the data collected are stored, analyzed, and then discarded. For businesses, big data has become an artificial resource that is mined, analyzed, and harnessed for profit. See also: Cloud computing; Data mining; Data warehouse
Massive data collections, by themselves, are useless: their size overwhelms and buries most of the valuable information they hold. Only statistical analysis (analytics) of big data can identify the useful patterns it contains. Such analysis has become practical thanks to multicore processors, parallel processing, cloud computing, machine learning (including genetic algorithms) that improves automatically through experience, and predictive modeling of unknown or future outcomes. See also: Artificial intelligence; Integrated circuits; Multiprocessing
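One reason parallel processing suits this kind of analysis is that many statistics can be computed piecewise: each worker summarizes its own slice of the data, and the small partial summaries are then merged. The following is a minimal sketch of that split-and-combine pattern, not a production design. The readings are hypothetical, the function names are illustrative, and a thread pool stands in for the separate processors or cluster nodes a real system would use.

```python
from concurrent.futures import ThreadPoolExecutor


def summarize(chunk):
    """Return partial statistics (count, sum, sum of squares) for one chunk."""
    return len(chunk), sum(chunk), sum(x * x for x in chunk)


def combine(partials):
    """Merge partial statistics into a count, mean, and population variance."""
    n = sum(p[0] for p in partials)
    total = sum(p[1] for p in partials)
    total_sq = sum(p[2] for p in partials)
    mean = total / n
    variance = total_sq / n - mean * mean
    return n, mean, variance


def chunked(values, size):
    """Split a list into consecutive chunks of at most `size` items."""
    return [values[i:i + size] for i in range(0, len(values), size)]


# Hypothetical sensor readings; a real data set would be far too large
# to hold in one list, which is the point of splitting the work.
readings = list(range(1, 101))

# Each chunk is summarized independently, then the results are merged.
with ThreadPoolExecutor(max_workers=4) as pool:
    partials = list(pool.map(summarize, chunked(readings, 25)))

count, mean, variance = combine(partials)
print(count, mean, variance)
```

Only three numbers per chunk travel back to the combining step, so the approach scales to data sets that no single machine could scan in reasonable time. The sum-of-squares formula used here is simple but can lose precision on very large or tightly clustered values; numerically stabler merging schemes exist.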
Applications of big data are found in industry, government, and healthcare services, to name a few. Currently, internet search, climate research, and banking organizations are creating peta- and exabyte-scale data sets. On the web, online advertising relies on scrutiny of mountains of audience-behavior observations and on predictions of what users will do. Packet information and other network-related data are collected, stored, and analyzed continually to detect threat patterns and avert cyberattacks. See also: Cyber defense
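As a toy illustration of detecting a threat pattern in network data, the sketch below flags source addresses that contact an unusually large number of distinct destination ports, a simple signature of port scanning. The record format, the threshold, and the function name are assumptions made for this example, and the addresses come from ranges reserved for documentation; real systems analyze far richer data with more sophisticated models.

```python
from collections import defaultdict


def find_port_scanners(packets, max_distinct_ports=3):
    """Return sources that contacted more than max_distinct_ports ports.

    `packets` is an iterable of (source_address, destination_port) pairs,
    a hypothetical simplification of real packet records.
    """
    ports_by_source = defaultdict(set)
    for source, dest_port in packets:
        ports_by_source[source].add(dest_port)
    return sorted(
        source
        for source, ports in ports_by_source.items()
        if len(ports) > max_distinct_ports
    )


# Hypothetical traffic sample.
packets = [
    ("192.0.2.10", 443), ("192.0.2.10", 443), ("192.0.2.10", 80),
    ("198.51.100.7", 22), ("198.51.100.7", 23), ("198.51.100.7", 80),
    ("198.51.100.7", 443), ("198.51.100.7", 3389),
    ("203.0.113.5", 53),
]

suspects = find_port_scanners(packets)
print(suspects)  # ['198.51.100.7']
```

A fixed threshold like this one is easy to evade and prone to false alarms; it is shown only to make the idea of a "threat pattern" concrete.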
Notwithstanding its benefits, big data also has a dark side to consider. With so much personal information available online today, privacy and profiling become big issues. The possibility of drawing wrong conclusions from the data is also a major concern, for example for medical doctors or power-grid controllers.
Big data is currently enjoying a manic up phase of enthusiasm among its developers and customers. As with most new technologies, some downside may be inevitable on its way to adoption.