What is Big Data?
Big Data refers to rapidly growing structured and unstructured datasets whose size is beyond the ability of conventional database tools to store, manage, and analyze. Beyond size and complexity, the term also covers the use of such data to support “evidence-based” decision-making, which can have a high impact on business operations.
The 3 Vs: Volume, Variety and Velocity.
Volume: large volumes of data, which may be enterprise-specific or general, and public or private
Variety: a diverse set of data being created, such as social networking feeds, video and audio files, email, sensor data and other raw data
Velocity: the speed of data inflow, as well as the rate at which this fast-moving data needs to be stored. New-age communication channels such as mobile phones, email and social networking have increased the rate of information flow
Types of Data
Structured Data refers to data that resides in formal data stores (RDBMS and data warehouses), organized in rows and columns. Accounts for ~10% of all data currently in existence
Unstructured Data comprises data formats that cannot be stored in row/column form, such as audio files, video, clickstream data, text messages, blogs, weather patterns, location coordinates and social media. Accounts for ~80% of all data currently in existence
Semi-structured Data is a form of structured data that does not conform to the formal structure of data models. Accounts for ~10% of all data currently in existence
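The three types above can be illustrated with a small Python sketch. The sample values are invented for illustration, and the choice of CSV for structured data and JSON for semi-structured data is an assumption, not something the source specifies.

```python
import csv
import io
import json

# Structured: every record has the same columns, as in an RDBMS table.
# (Hypothetical sample data.)
structured_source = "customer_id,amount\n1,19.99\n2,5.00\n"
rows = list(csv.DictReader(io.StringIO(structured_source)))

# Semi-structured: self-describing records whose fields may differ
# from one record to the next. (JSON is an assumed example format.)
semi_structured_source = (
    '[{"id": 1, "tags": ["sale"]},'
    ' {"id": 2, "location": {"lat": 51.5, "lon": -0.1}}]'
)
records = json.loads(semi_structured_source)

# Unstructured: free text with no predefined fields at all.
unstructured_source = "Great product, arrived late though."
word_count = len(unstructured_source.split())


def field_names(items):
    """Return the sorted set of keys used across a list of dict records."""
    return sorted({key for item in items for key in item})
```

Calling field_names(rows) returns the same two columns for every row, while field_names(records) returns a union of fields that no single record contains in full. The free text has no fields to list, so only generic measures such as a word count apply.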
Four key elements:
Big Data Management & Storage
Data storage infrastructure and technologies.
NoSQL databases to store unstructured data, as well as processing approaches such as Hadoop and massively parallel processing (MPP)
Big Data Analytics
Includes the technologies and tools to analyze the data and generate insight from it.
Related products, e.g. the data serialization frameworks Apache Avro and Apache Thrift
Big Data’s Application & Use
Involves making Big Data insights usable in business intelligence (BI) and end-user applications
IT services including
Project management and customization
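The parallel processing style named under management and storage (Hadoop, MPP) can be sketched in miniature. This is a simplified, single-machine illustration of the map and reduce steps, not Hadoop's actual API. The function names and sample lines are hypothetical, and the threads only stand in for the cluster nodes a real system would use.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def map_chunk(lines):
    """Map step: count words within one chunk of the input."""
    counts = Counter()
    for line in lines:
        counts.update(line.lower().split())
    return counts


def reduce_counts(partials):
    """Reduce step: merge the per-chunk counts into one total."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total


def word_count(lines, workers=2):
    """Split the input into chunks, map them concurrently, then reduce."""
    size = max(1, -(-len(lines) // workers))  # ceiling division
    chunks = [lines[i:i + size] for i in range(0, len(lines), size)]
    # Threads are an assumption standing in for separate cluster nodes.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(map_chunk, chunks))
    return reduce_counts(partials)


# Hypothetical sample input.
sample = ["big data big insight", "data at rest", "big clusters"]
totals = word_count(sample)
```

Each chunk is processed independently, so no worker needs to see the whole dataset. That independence is what lets the same pattern scale out across many machines.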
eBay.com uses two data warehouses at 7.5 petabytes and 40PB as well as a 40PB Hadoop cluster for search, consumer recommendations, and merchandising (source: “Inside eBay’s 90PB data warehouse”).
Amazon.com handles millions of back-end operations every day, as well as queries from more than half a million third-party sellers. The core technology that keeps Amazon running is Linux-based, and as of 2005 it had the world’s three largest Linux databases, with capacities of 7.8 TB, 18.5 TB, and 24.7 TB.
Walmart handles more than 1 million customer transactions every hour, which are imported into databases estimated to contain more than 2.5 petabytes (2560 terabytes) of data.
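The figures above can be sanity-checked with a few lines of arithmetic. This sketch assumes the binary convention of 1024 TB per PB, which is the one that reproduces the 2560 TB figure quoted for Walmart. The helper names are illustrative only.

```python
TB_PER_PB = 1024  # binary convention; matches the quoted 2560 TB figure


def petabytes_to_terabytes(pb):
    """Convert petabytes to terabytes using the binary convention."""
    return pb * TB_PER_PB


def per_day(per_hour):
    """Scale an hourly rate to a 24-hour day."""
    return per_hour * 24


walmart_tb = petabytes_to_terabytes(2.5)         # 2.5 PB expressed in TB
walmart_daily_transactions = per_day(1_000_000)  # "more than 1 million" per hour
ebay_total_pb = 7.5 + 40 + 40                    # two warehouses plus the Hadoop cluster
```

At the quoted hourly rate, Walmart's volume comes to at least 24 million transactions per day, and the three eBay stores sum to 87.5 PB.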
The Big Data market is expected to grow strongly over the next 5 years
Key verticals driving demand for Big Data analytics: Financial services, Retail, Telecom, Healthcare and Manufacturing
Key risk – potential shortfall of Data-Savvy Managers and Data Scientists in the US by 2018