
When is Data considered Big Data?

What you have to know about the Buzzword Big Data

Photo by Nina Luong on Unsplash

Big Data refers to large amounts of data from areas such as the internet, mobile telephony, the financial industry, the energy sector, and healthcare, and from sources such as intelligent agents, social media, smart metering systems, and vehicles, which are stored, processed, and evaluated using special solutions [1].

Areas and Buzzwords related to Big Data - Image by Author

The 4 V’s

In order to process, store and analyze the data, the following four challenges have to be considered:

  • Volume (the sheer amount of data: 2025 is expected to see eight times more data than 2017 [2])
  • Velocity (the speed at which data is generated and processed, e.g. streaming, IoT, social media)
  • Variety (structured and increasingly unstructured data)
  • Veracity (lack of data quality and missing know-how for evaluation)
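
To get a feeling for the Volume figure, the following minimal sketch converts the "eight times more data" estimate into a yearly growth rate. It assumes smooth compound growth between 2017 and 2025, which is a simplification and not part of the cited estimate.

```python
# Assumption: the "eight times more data" figure is treated as smooth
# compound growth over the 8 years from 2017 to 2025.
import math

growth_factor = 8.0      # 2025 volume relative to 2017, from the Volume bullet
years = 2025 - 2017      # 8 years

# The annual growth rate r solves (1 + r) ** years == growth_factor.
annual_rate = growth_factor ** (1 / years) - 1

# Time needed for the data volume to double at that rate.
doubling_time = math.log(2) / math.log(1 + annual_rate)

print(f"annual growth: {annual_rate:.1%}")            # annual growth: 29.7%
print(f"doubling time: {doubling_time:.2f} years")    # doubling time: 2.67 years
```

Under this assumption the data volume grows by roughly 30% per year and doubles about every two and a half to three years.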

Technical Aspects

But when do we talk about Big Data? How much data must be collected before it counts as big data? Conventional data storage technologies usually work with megabytes or gigabytes. We speak of Big Data when the amount of data reaches terabytes or petabytes. The reason for this rule of thumb is that traditional systems are no longer powerful enough at this scale and also become significantly more expensive to run.
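
The rule of thumb can be written down as a small helper. This is only a sketch: the exact cut-off of 1 terabyte, the use of decimal units, and the function names are assumptions made for illustration, since the rule itself only names unit ranges.

```python
# Assumption: decimal units (1 TB = 10**12 bytes) and a 1 TB cut-off.
# The rule of thumb only names the unit ranges, not an exact threshold.
UNITS = ["B", "KB", "MB", "GB", "TB", "PB"]
BIG_DATA_THRESHOLD_BYTES = 10**12  # hypothetical cut-off: 1 terabyte


def human_readable(num_bytes: int) -> str:
    """Format a byte count using the largest fitting decimal unit."""
    value = float(num_bytes)
    for unit in UNITS:
        # Stop at the first unit that fits, or at the largest known unit.
        if value < 1000 or unit == UNITS[-1]:
            return f"{value:.1f} {unit}"
        value /= 1000


def is_big_data(num_bytes: int) -> bool:
    """Apply the rule of thumb: terabytes and above count as Big Data."""
    return num_bytes >= BIG_DATA_THRESHOLD_BYTES


for size in (250 * 10**6, 40 * 10**9, 3 * 10**12, 2 * 10**15):
    print(human_readable(size), "->", is_big_data(size))
```

In practice the boundary is fuzzy, and the other V's (velocity, variety, veracity) can matter as much as the raw size.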

Typical characteristics of Big Data (Storage) Technologies are:

  • Distributed Storage
  • Data Replication
  • Local Data Processing
  • High Availability
  • Data Partitioning
  • Denormalized Data
  • Working with structured and unstructured data
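
Two of these characteristics, data partitioning and data replication, can be illustrated with a few lines of Python. The node names, the replication factor of 2, and the hash-based placement are assumptions chosen for this sketch; real big data systems use more elaborate placement schemes.

```python
# Assumptions: four hypothetical nodes, a replication factor of 2, and
# hash-based placement. Real systems use more elaborate schemes.
import hashlib

NODES = ["node-0", "node-1", "node-2", "node-3"]  # hypothetical cluster
REPLICATION_FACTOR = 2


def partition_of(key: str, num_partitions: int) -> int:
    """Map a record key to a partition with a stable hash."""
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_partitions


def replicas_for(key: str) -> list[str]:
    """Return the nodes that hold a copy of the record."""
    first = partition_of(key, len(NODES))
    # Place the copies on consecutive nodes, wrapping around at the end.
    return [NODES[(first + i) % len(NODES)] for i in range(REPLICATION_FACTOR)]


for key in ("customer-17", "customer-42", "order-1001"):
    print(key, "->", replicas_for(key))
```

Because every record lives on more than one node, the loss of a single node does not make the data unavailable, which is what enables high availability.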

Querying a lot of data with Google's BigQuery in seconds - Image by Author

Analytical and Visualization Aspects

Big data technologies open up new possibilities for using and analyzing data:

  • With more available computing power (keyword: cloud), larger amounts of data can be processed and analyzed faster (essential for machine learning, for example)
  • Deep learning (based on large amounts of data, e.g. images)
  • Real-time reporting, e.g. in the area of IoT

Visualization also brings new approaches and challenges because of the huge amounts of data. New visualization techniques had to be created to make these data volumes more tangible for the user. Examples include treemaps, sunburst diagrams, and word clouds.
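
The basic idea behind a treemap is that each category gets a rectangle whose area is proportional to its value. The following sketch shows this for a single level using a simple slicing layout. The category totals are made up for illustration, and real charting libraries use more sophisticated layouts.

```python
# Assumption: made-up category totals, used only for illustration.
totals = {"Whiskey": 50.0, "Vodka": 30.0, "Rum": 15.0, "Gin": 5.0}


def slice_layout(values: dict[str, float], width: float, height: float):
    """Split a width x height rectangle into vertical slices.

    Each slice gets an area proportional to its value, which is the
    basic idea behind a single-level treemap.
    """
    total = sum(values.values())
    x = 0.0
    rects = []
    # Largest category first, a common ordering for treemaps.
    for name, value in sorted(values.items(), key=lambda kv: -kv[1]):
        slice_width = width * value / total
        rects.append((name, x, 0.0, slice_width, height))
        x += slice_width
    return rects


for name, x, y, w, h in slice_layout(totals, width=100.0, height=60.0):
    print(f"{name:8s} x={x:5.1f} w={w:5.1f} area={w * h:7.1f}")
```

However many rows the underlying data has, the chart only needs one aggregated value per category, which is why such visualizations scale well to large data sets.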

Treemap Example with Iowa liquor data - Image by Author

What will the Future bring?

Although many companies have not yet reached the big data world in terms of data volume, one or two characteristics of big data may already apply to their data. One thing is clear: the amount of data will keep growing, possibly even exponentially, so it is useful to be prepared. Companies need to take big data into consideration when planning their IT architecture and system landscapes. In addition, machine learning and deep learning models tend to improve as they are trained on more data, which makes the field a natural complement to big data. With powerful and easy-to-use public cloud services, it is possible to store, process, and analyze big data much faster and more easily. This is a particular advantage for small and medium-sized companies [4].

Conclusion

Big Data is definitely a buzzword, and not every company will need it. Still, the field offers great advantages and new ways of working with data. It also comes with technical challenges that need to be overcome. With public cloud services, these challenges are easier to master.

"The world is one big data problem." – by Andrew McAfee, co-director of the MIT Initiative

Sources

[1] Google, https://cloud.google.com/what-is-big-data (2021)

[2] McKinsey and Company, Artificial Intelligence: The Next Digital Frontier? (2017)

[3] itsvit, https://itsvit.com/blog/big-data-information-visualization-techniques/

[4] Bernice M Purce, Big data using cloud computing (2013)
