According to IBM, we create 2.5 quintillion bytes of data every day. This data originates from every sphere of activity: to name just a few sources, it comes from sensors, social media sites, digital pictures, web logs and transaction records of online purchases.
In general, data can be classified into three categories. Any data that can be stored in databases is called Structured Data. For example, transaction records of online purchases can be stored in databases, so they are Structured Data. Data that can only partially be stored in databases is called Semi-Structured Data. For example, the data in XML records can be partially stored in databases, so it is Semi-Structured Data.
Data that fits into neither of these two categories is called Unstructured Data. Data from social media sites and web logs, for example, cannot be stored, analysed and processed in databases, so it is categorised as Unstructured Data. Unstructured Data is also referred to as Big Data.
According to NASSCOM, Structured Data accounts for 10% of the total data that exists on the Internet today, Semi-Structured Data accounts for another 10%, and the remaining 80% is Unstructured Data. Organizations generally analyse Structured and Semi-Structured Data with traditional data analytics tools. No sophisticated tools were available to analyse Unstructured Data until Google developed the MapReduce framework. Later, Apache released an open-source framework called “Hadoop”, which analyses all of these kinds of data and reveals information that helps businesses make better decisions.
Hadoop has already proved its importance in several areas. According to NASSCOM, for example, many organizations have started using Big Data analytics. The National Oceanic and Atmospheric Administration (NOAA), the National Aeronautics and Space Administration (NASA) and several pharmaceutical and energy companies have started using Big Data analytics extensively, and companies are using it, among other things, to predict customer behaviour.
According to recent research from Nemertes Research, organizations perceive value in Big Data analytics and are planning to leverage it more effectively to reap its benefits. The New York Times uses Big Data tools for text analysis, and The Walt Disney Company uses them to correlate and understand customer behaviour across all of its stores and theme parks. Indian IT companies such as TCS, Wipro and Infosys, along with other key players, have also started to tap the immense potential that Big Data offers.
This clearly shows that Big Data is an emerging area in which many companies have started to explore new opportunities. However, while the use of Big Data is proving worthwhile, it has also raised privacy and data protection concerns.