
📑 Big Data

Big data is a collection of data that is huge in volume and grows exponentially over time. Such datasets typically range from petabytes upward. Very large-scale companies such as Facebook, Twitter, and Google use big data.

Types of big data:

As the Internet age continues to grow, we generate an incomprehensible amount of data every second. So much so that the amount of data floating around the internet is estimated to reach 163 zettabytes by 2025. This data can be classified into the following types:

Structured data

Structured data has certain predefined organizational properties and is stored in a structured or tabular schema, making it easier to analyze and sort. Thanks to its predefined nature, each field is discrete and can be accessed separately or jointly along with data from other fields. This makes structured data extremely valuable, since it allows data to be collected quickly from various locations in a database.
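The idea that each field is discrete and queryable can be sketched with Python's built-in `sqlite3` module; the table and column names below are purely illustrative:

```python
import sqlite3

# A minimal sketch of structured data: a predefined tabular schema
# lets us address individual fields directly.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, name TEXT, age INTEGER)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?, ?)",
    [(1, "Alice", 30), (2, "Bob", 25)],
)

# Because every field is discrete, we can select one column and
# filter on another without touching the rest of the record.
names = [row[0] for row in conn.execute("SELECT name FROM users WHERE age > 26")]
```

Because the schema is fixed up front, the database engine can index and scan these fields efficiently, which is exactly what makes structured data so easy to analyze.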

Unstructured data

Unstructured data is information with no predefined organization, and it is not easily interpreted or analyzed by standard databases or data models. Unstructured data accounts for the majority of big data. Examples of this type include video and audio files, mobile activity, satellite imagery, and content in NoSQL stores, to name a few. Photos we upload to Facebook or Instagram and videos we watch on YouTube or any other platform contribute to the growing pile of unstructured data.

Semi-structured data

Semi-structured data is a hybrid of structured and unstructured data. It inherits a few characteristics of structured data but still contains information without a definite structure, and it does not conform to relational databases or formal data models. JSON and XML are typical examples of semi-structured data.
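The hybrid nature of semi-structured data can be illustrated with a small, hypothetical JSON record: it carries self-describing tags like structured data, yet fields may be missing or nested, so it does not fit a fixed relational schema.

```python
import json

# A hypothetical JSON record (field names are illustrative).
# Keys act like column names, but the "likes" field is absent
# from the second post, something a relational table would not allow
# without an explicit NULL column.
record = json.loads("""
{
  "user": "alice",
  "posts": [
    {"id": 1, "likes": 12},
    {"id": 2}
  ]
}
""")

# Missing fields must be handled explicitly when processing.
total_likes = sum(post.get("likes", 0) for post in record["posts"])
```

This is why JSON and XML sit between the two extremes: there is enough structure to query by key, but not enough to guarantee every record has the same shape.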

Characteristics of Big Data:

As with anything huge, we need proper categorization to improve our understanding. Accordingly, the features of big data are commonly characterized by five Vs: volume, variety, velocity, value, and veracity.

1. Volume

The prominent feature of any dataset is its size. Volume refers to the size of data generated and stored in a Big Data system. We’re talking about the size of data in the petabytes and exabytes range. These massive amounts of data necessitate the use of advanced processing technology—far more powerful than a typical laptop or desktop CPU.

2. Variety

Variety refers to the range of data types, which vary in format and in how they are organized and prepared for processing. Big names such as Facebook, Twitter, Pinterest, Google Ads, and CRM systems produce data that can be collected, stored, and subsequently analyzed.

3. Velocity

The rate at which data accumulates also influences whether the data is classified as big data or regular data. Much of this data must be evaluated in real-time; therefore, systems must be able to handle the pace and amount of data created.

4. Value

Value is another important consideration. What matters is not only the amount of data we keep or process, but whether that data is valuable and reliable: data worth saving, processing, and evaluating to gain insights.

5. Veracity

Veracity refers to the trustworthiness and quality of the data. If the data is not trustworthy or reliable, then the value of big data becomes questionable. This is especially true when working with data that is updated in real time.

The Heart of Big Data Analytics

Hadoop is an open-source software framework that is synonymous with big data storage and analysis. Its ability to store and process a wide range of data makes it ideal for supporting advanced analytics, such as predictive analytics, data mining, and machine learning. Hadoop consists of four modules, each designed to perform specialized tasks for big data analytics. The Hadoop Distributed File System (HDFS) allows fast data access across a large number of storage devices, while MapReduce enables efficient processing of large data sets. Hadoop Common provides the shared libraries that let different computer operating systems read data stored in Hadoop. Finally, YARN takes care of allocating system resources.
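The MapReduce model mentioned above can be sketched in plain Python as a toy word count, the classic MapReduce example: a map phase emits key-value pairs, a shuffle groups them by key, and a reduce phase aggregates each group. Hadoop distributes these phases across a cluster; here everything runs in a single process for illustration.

```python
from collections import defaultdict
from itertools import chain

def map_phase(line):
    # Map: emit a (word, 1) pair for every word in a line of input.
    return [(word, 1) for word in line.split()]

def shuffle(pairs):
    # Shuffle: group all values by key, as the framework would
    # between the map and reduce stages.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # Reduce: sum the counts for each word.
    return {word: sum(counts) for word, counts in groups.items()}

lines = ["big data is big", "data is everywhere"]
counts = reduce_phase(shuffle(chain.from_iterable(map_phase(l) for l in lines)))
```

In a real Hadoop job, the mapper and reducer run as separate tasks on different nodes, with HDFS holding the input splits and YARN scheduling the work; the logic per record, however, looks much like the functions above.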


Applications

- Next-generation servers
- High-performance computing
- Cloud computing