Business Intelligence, Analytics, and Data Science: A Managerial Perspective
Fourth Edition

Chapter 7
Big Data Concepts and Tools

Copyright 2018, 2014, 2011 Pearson Education, Inc. All Rights Reserved

Learning Objectives (1 of 2)
7.1 Learn what Big Data is and how it is changing the world of analytics
7.2 Understand the motivation for and business drivers of Big Data analytics
7.3 Become familiar with the wide range of enabling technologies for Big Data analytics
7.4 Learn about Hadoop, MapReduce, and NoSQL as they relate to Big Data analytics
7.5 Compare and contrast the complementary uses of data warehousing and Big Data technologies

Learning Objectives (2 of 2)
7.6 Become familiar with select Big Data platforms and services
7.7 Understand the need for and appreciate the capabilities of stream analytics
7.8 Learn about the applications of stream analytics

Opening Vignette (1 of 4)
Analyzing Customer Churn in a Telecom Company Using Big Data Methods
Telecom is a highly competitive market segment
Its customer churn rate is higher than that of most other markets
A good example of Big Data analytics
Challenges
  Data from multiple sources
  Data volume is higher than usual
Solution
Results
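The discussion questions for this vignette ask what sessionizing is and why AT needed it. The slides do not describe AT's exact procedure, so the following is only a generic illustration: a common way to sessionize event data is to sort each customer's time-stamped events and start a new session whenever the gap since that customer's previous event exceeds an inactivity timeout. In this minimal Python sketch, the record layout (customer ID, timestamp, channel) and the 30-minute timeout are assumptions.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Assumed record layout: (customer_id, timestamp, channel).
# The 30-minute inactivity timeout is an illustrative assumption.
SESSION_TIMEOUT = timedelta(minutes=30)


def sessionize(events, timeout=SESSION_TIMEOUT):
    """Assign a per-customer session number to each event.

    A customer's first event opens session 0; a new session starts
    whenever the gap since that customer's previous event exceeds
    `timeout`. Returns (customer_id, session_no, timestamp, channel).
    """
    by_customer = defaultdict(list)
    for customer_id, ts, channel in events:
        by_customer[customer_id].append((ts, channel))

    result = []
    for customer_id in sorted(by_customer):
        session_no = 0
        previous_ts = None
        # Events may arrive out of order, so sort by time first.
        for ts, channel in sorted(by_customer[customer_id]):
            if previous_ts is not None and ts - previous_ts > timeout:
                session_no += 1
            result.append((customer_id, session_no, ts, channel))
            previous_ts = ts
    return result


if __name__ == "__main__":
    t0 = datetime(2024, 1, 1, 9, 0)
    events = [
        ("c1", t0, "web"),
        ("c1", t0 + timedelta(minutes=5), "call_center"),
        ("c1", t0 + timedelta(hours=2), "store"),
        ("c2", t0 + timedelta(minutes=1), "web"),
    ]
    for row in sessionize(events):
        print(row)
```

Grouping records this way turns scattered, multi-source events into one ordered sequence of touchpoints per session, which can then be summarized into model inputs.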

Opening Vignette (4 of 4)
Discussion Questions
1. What problem did customer service cancellation pose to AT's business survival?
2. Identify and explain the technical hurdles presented by the nature and characteristics of AT's data.
3. What is sessionizing? Why was it necessary for AT to sessionize its data?
4. Research other studies where customer churn models have been employed. What types of variables were used in those studies? How is this vignette different?

Big Data - Definition and Concepts (1 of 2)
Big Data means different things to people with different backgrounds and interests
Traditionally, Big Data = massive volumes of data
  Example: the volume of data at CERN, NASA, Google, ...
Where does Big Data come from? Everywhere!
  Web logs, RFID, GPS systems, sensor networks, social networks, Internet-based text documents, Internet search indexes, call detail records, astronomy, atmospheric science, biology, genomics, nuclear physics, biochemical experiments, medical records, scientific research, military surveillance, multimedia archives, ...

Technology Insights 7.1 (1 of 2)
The Data Size Is Getting Big, Bigger, and Bigger
Large Hadron Collider: 1 PB/sec
Boeing jet: 20 TB/hr
Facebook: 500 TB/day
YouTube: 1 TB/4 min
The proposed Square Kilometer Array telescope (the world's largest proposed telescope): 1 EB/day

Technology Insights 7.1 (2 of 2)
Name          Symbol   Value (bytes)
Kilobyte      kB       10^3
Megabyte      MB       10^6
Gigabyte      GB       10^9
Terabyte      TB       10^12
Petabyte      PB       10^15
Exabyte       EB       10^18
Zettabyte     ZB       10^21
Yottabyte     YB       10^24
Brontobyte*   BB       10^27
Gegobyte*     GeB      10^30
*Not an official SI (International System of Units) name/symbol, yet.

Big Data - Definition and Concepts (2 of 2)
Big Data is a misnomer! Big Data is more than just big
The Vs that define Big Data
  Volume
  Variety
  Velocity
  Veracity
  Variability
  Value
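The decimal prefixes in Technology Insights 7.1 make it easy to put the quoted data rates on a common footing. The minimal Python sketch below converts each rate from that slide into terabytes per day; the dictionary and function names are hypothetical, and the results are only as precise as the rounded rates on the slide.

```python
# Decimal byte prefixes from Technology Insights 7.1 (exponent of 10).
PREFIX_EXPONENT = {"kB": 3, "MB": 6, "GB": 9, "TB": 12, "PB": 15, "EB": 18}

SECONDS_PER = {"sec": 1, "min": 60, "hr": 3600, "day": 86400}


def to_bytes_per_day(amount, unit, per_amount, per_unit):
    """Convert a rate such as '1 TB per 4 min' into bytes per day."""
    total_bytes = amount * 10 ** PREFIX_EXPONENT[unit]
    seconds = per_amount * SECONDS_PER[per_unit]
    return total_bytes * SECONDS_PER["day"] / seconds


# Rates as quoted on the slide: (amount, unit, per_amount, per_unit).
RATES = {
    "Large Hadron Collider": (1, "PB", 1, "sec"),
    "Boeing jet": (20, "TB", 1, "hr"),
    "Facebook": (500, "TB", 1, "day"),
    "YouTube": (1, "TB", 4, "min"),
    "Square Kilometer Array (proposed)": (1, "EB", 1, "day"),
}

if __name__ == "__main__":
    for source, rate in RATES.items():
        tb_per_day = to_bytes_per_day(*rate) / 10 ** PREFIX_EXPONENT["TB"]
        print(f"{source}: {tb_per_day:,.0f} TB/day")
```

Expressed this way, the YouTube rate (360 TB/day) and the Boeing jet rate (480 TB/day) are of the same order as Facebook's 500 TB/day, while the proposed Square Kilometer Array (1,000,000 TB/day) is more than three orders of magnitude larger.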

A High-Level Conceptual Architecture for Big Data Solutions (by AsterData / Teradata)
[Figure: Unified Data Architecture, System Conceptual View. Panels: Big Data Sources (ERP, SCM, CRM, images, audio and video, machine logs, text, web and social); Move, Manage, Access; Data Platform; Integrated Data Warehouse; Discovery Platform; Analytic Tools & Apps; Users.]

Application Case 7.1
Alternative Data for Market Analysis or Forecasts
Questions for Discussion
1. What is a common thread in the examples discussed in this application case?
2. Can you think of other data streams that might help give an early indication of sales at a retailer?
3. Can you think of other applications along the lines presented in this application case?

Fundamentals of Big Data Analytics
Big Data by itself, regardless of its size, type, or speed, is worthless
Big Data + big analytics = value
With the value proposition, Big Data also brought about big challenges
  Effectively and efficiently capturing, storing, and analyzing Big Data
  A new breed of technologies is needed (developed, purchased, hired, or outsourced)

Big Data Considerations
You can't process the amount of data that you want to because of the limitations of your current platform.
You can't include new/contemporary data sources (e.g., social media, RFID, sensory, Web, GPS, textual data) because they do not comply with the data storage schema.
You need to (or want to) integrate data as quickly as possible to be current on your analysis.
You want to work with a schema-on-demand data storage paradigm because of the variety of data types involved.
The data is arriving so fast at your organization's doorstep that your traditional analytics platform cannot handle it.

Critical Success Factors for Big Data Analytics (1 of 2)
A clear business need (alignment with the vision and the strategy)
Strong, committed sponsorship (executive champion)
Alignment between the business and IT strategy
A fact-based decision-making culture
A strong data infrastructure
The right analytics tools
The right people with the right skills

Critical Success Factors for Big Data Analytics (2 of 2)
[Figure: Keys to Success with Big Data Analytics, surrounded by a clear business need; strong, committed sponsorship; alignment between the business and IT strategy; a fact-based decision-making culture; a strong data infrastructure; the right analytics tools; and personnel with advanced analytical skills.]

Enablers of Big Data Analytics
In-memory analytics
  Storing and processing the complete data set in RAM
In-database analytics
  Placing analytic procedures close to where data is stored
Grid computing & MPP
  Use of many machines and processors in parallel (MPP = massively parallel processing)
Appliances
  Combining hardware, software, and storage in a single unit for performance and scalability

Challenges of Big Data Analytics
Data volume
  The ability to capture, store, and process the huge volume of data in a timely manner
Data integration
  The ability to combine data quickly and at reasonable cost
Processing capabilities
  The ability to process the data quickly, as it is captured (i.e., stream analytics)
Data governance (security, privacy, access)
Skill availability (data scientists)
Solution cost (ROI)

Business Problems Addressed by Big Data Analytics (1 of 2)
Process efficiency and cost reduction
Brand management
Revenue maximization, cross-selling/up-selling
Enhanced customer experience
Churn identification, customer recruiting
Improved customer service
Identifying new products and market opportunities

Business Problems Addressed by Big Data Analytics (2 of 2)
Risk management
Regulatory compliance
Enhanced security capabilities

Application Case 7.2 (1 of 2)
Top Five Investment Bank Achieves Single Source of the Truth
Questions for Discussion
1. How can Big Data benefit large-scale trading banks?
2. How did MarkLogic infrastructure help ease the leveraging of Big Data?
3. What were the challenges, the proposed solution, and the obtained results?

Application Case 7.2 (2 of 2)
Moving from many old systems to a unified new system
Before: it was difficult to identify financial exposure across many systems (separate copies of the derivatives trade store)
After: it was possible to analyze all contracts in a single database (MarkLogic Server eliminates the need for 20 database copies)

Big Data Technologies (1 of 2)
MapReduce
Hadoop
Hive
Pig
HBase
Flume
Oozie
Ambari

Big Data Technologies (2 of 2)
Avro
Mahout
Sqoop
HCatalog
...

Big Data Technologies--MapReduce (1 of 2)
MapReduce distributes the processing of very large multi-structured data files across a large cluster of ordinary machines/processors
Goal: achieving high performance with simple computers
Developed and popularized by Google
Good at processing and analyzing large volumes of multi-structured data in a timely manner
Example tasks: indexing the Web for search, graph analysis, text analysis, machine learning, ...

Big Data Technologies--MapReduce (2 of 2)
How does MapReduce work?

Big Data Technologies--Hadoop (1 of 3)
Hadoop is an open source framework for storing and analyzing massive amounts of distributed, unstructured data
Originally created by Doug Cutting at Yahoo!
Hadoop clusters run on inexpensive commodity hardware, so projects can scale out inexpensively
Hadoop is now part of the Apache Software Foundation
Open source: hundreds of contributors continuously improve the core technology

Big Data Technologies--Hadoop (2 of 3)
How Does Hadoop Work?
Access unstructured and semi-structured data (e.g., log files, social media feeds, other data sources)
Break the data up into parts, which are then loaded into a file system (HDFS) made up of multiple nodes running on commodity hardware
Each part is replicated multiple times and loaded into the file system for replication and failsafe processing
A node acts as the Facilitator and another as Job Tracker
Jobs are distributed to the clients, and once completed, the results are collected and aggregated using MapReduce

Big Data Technologies--Hadoop (3 of 3)
Hadoop Technical Components
Hadoop Distributed File System (HDFS)
Name Node (primary facilitator)
Secondary Node (backup to Name Node)
Job Tracker
Slave Nodes (the grunts of any Hadoop cluster)
Additionally, the Hadoop ecosystem is made up of a number of complementary sub-projects: NoSQL (Cassandra, HBase), DW (Hive), ...
NoSQL = not only SQL
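The slides describe MapReduce only at a high level: work is split up, processed, and the results are collected and aggregated. The classic word-count example makes the phases concrete. The minimal Python sketch below runs the map, shuffle (group by key), and reduce phases in a single process purely to show the data flow; it does not use Hadoop's APIs, and the function names are illustrative. On a real cluster the framework runs many mappers and reducers in parallel on different nodes and moves the intermediate pairs between them.

```python
from collections import defaultdict
from itertools import chain


def map_phase(document):
    """Mapper: emit (word, 1) for every word in one input split."""
    for token in document.lower().split():
        word = token.strip(".,!?;:")
        if word:  # skip tokens that were only punctuation
            yield word, 1


def shuffle_phase(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reducer: combine all values for one key."""
    return key, sum(values)


def map_reduce(documents):
    # Each document stands in for one input split handled by a separate mapper.
    mapped = chain.from_iterable(map_phase(doc) for doc in documents)
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(k, v) for k, v in grouped.items())


if __name__ == "__main__":
    splits = ["Big Data is big", "Hadoop runs MapReduce on big clusters"]
    print(map_reduce(splits))
```

Because each mapper only sees its own split and each reducer only sees one key's values, both phases can be spread across many commodity machines, which is what lets the model scale out.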

Technology Insights 7.2
A Few Demystifying Facts about Hadoop
Hadoop consists of multiple products
Hadoop is open source but available from vendors, too
Hadoop is an ecosystem, not a single product
HDFS is a file system, not a DBMS
Hive resembles SQL but is not standard SQL
Hadoop and MapReduce are related but not the same
MapReduce provides control for analytics, not analytics
Hadoop is about data diversity, not just data volume

Application Case 7.3
eBay's Big Data Solution
Questions for Discussion
1. Why did eBay need a Big Data solution?
2. What were the challenges, the proposed solution, and the obtained results?

Application Case 7.4
Understanding Quality and Reliability of Healthcare Support Information on Twitter
Questions for Discussion
1. What was the data scientists' main concern regarding health information that is disseminated on the Twitter platform?
2. How did the data scientists ensure that nonexpert information disseminated on social media could indeed contain valuable health information?
3. Does it make sense that influential users would share more objective information whereas less influential users could focus more on subjective information? Why?

Big Data and Data Warehousing
What is the impact of Big Data on DW?
Big Data and RDBMS do not go nicely together
Will Hadoop replace data warehousing/RDBMS?
Use Cases for Hadoop
  Hadoop as the repository and refinery
  Hadoop as the active archive
Use Cases for Data Warehousing
  Data warehouse performance
  Integrating data that provides business value
  Interactive BI tools

Hadoop Versus Data Warehouse: When to Use Which Platform (1 of 2)
Table 7.1 When to Use Which Platform: Hadoop versus DW
Requirement | Data Warehouse | Hadoop
Low latency, interactive reports, and OLAP | ✓ |
ANSI 2003 SQL compliance is required | ✓ | ✓
Preprocessing or exploration of raw unstructured data | | ✓
Online archives alternative to tape | | ✓
High-quality cleansed and consistent data | ✓ | ✓
100s to 1,000s of concurrent users | ✓ | ✓
Discover unknown relationships in the data | | ✓

Hadoop Versus Data Warehouse: When to Use Which Platform (2 of 2)
Table 7.1 [continued]
Requirement | Data Warehouse | Hadoop
Parallel complex process logic | ✓ | ✓
CPU intense analysis | ✓ | ✓
System, users, and data governance | ✓ |
Many flexible programming languages running in parallel | | ✓
Unrestricted, ungoverned sandbox explorations | | ✓
Analysis of provisional data | | ✓
Extensive security and regulatory compliance | ✓ |

Coexistence of Hadoop and DW
1. Use Hadoop for storing and archiving multi-structured data
2. Use Hadoop for filtering, transforming, and/or consolidating multi-structured data
3. Use Hadoop to analyze large volumes of multi-structured data and publish the analytical results
4. Use a relational DBMS that provides MapReduce capabilities as an investigative computing platform
5. Use a front-end query tool to access and analyze data
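Items 1, 2, and 5 of the coexistence list can be illustrated end to end on a laptop. In the minimal Python sketch below, raw JSON log lines stand in for multi-structured data kept in Hadoop, a plain Python filter/transform step stands in for a Hadoop job, and an in-memory SQLite table stands in for the relational data warehouse that a front-end query tool would access. Running the final aggregation as SQL inside the database also echoes the in-database analytics enabler (placing the analytic procedure close to where the data is stored). The log format, field names, and function names are hypothetical.

```python
import json
import sqlite3

# Hypothetical raw, semi-structured log lines: no schema is imposed
# until the transform step reads them.
RAW_LOGS = [
    '{"user": "u1", "event": "view", "product": "phone", "ms": 120}',
    '{"user": "u2", "event": "purchase", "product": "phone", "price": 499.0}',
    'corrupted line that is not JSON',
    '{"user": "u1", "event": "purchase", "product": "case", "price": 19.5}',
    '{"user": "u3", "event": "purchase", "product": "phone", "price": 479.0}',
]


def filter_and_transform(lines):
    """Stand-in for a Hadoop job: parse, drop bad records, keep purchases."""
    for line in lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip malformed input instead of failing the whole job
        if record.get("event") == "purchase" and "price" in record:
            yield record["user"], record["product"], float(record["price"])


def load_and_query(rows):
    """Stand-in for loading the relational DW and running a front-end query."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE purchases (user_id TEXT, product TEXT, price REAL)")
    conn.executemany("INSERT INTO purchases VALUES (?, ?, ?)", rows)
    # The aggregation runs inside the database engine, not in Python.
    result = conn.execute(
        "SELECT product, COUNT(*), SUM(price) FROM purchases "
        "GROUP BY product ORDER BY product"
    ).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for product, n, revenue in load_and_query(filter_and_transform(RAW_LOGS)):
        print(product, n, revenue)
```

The division of labor mirrors the list: the cheap, tolerant step handles messy raw input, and only cleaned, structured rows reach the relational store where SQL tools can query them.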

Big Data Vendors
Software, hardware, service, ...
The Big Data vendor landscape is developing very rapidly
A representative list would include
  Cloudera - cloudera.com
  MapR - mapr.com
  Hortonworks - hortonworks.com
  Also IBM (Netezza, InfoSphere), Oracle (Exadata, Exalogic), Microsoft, Amazon, Google, ...

IBM InfoSphere BigInsights

Application Case 7.5
Using Social Media for Nowcasting the Flu Activity
Questions for Discussion
1. Why would social media be able to serve as an early predictor of flu outbreaks?
2. What other variables might help in predicting such outbreaks?
3. Why would this problem be a good problem to solve using the Big Data technologies mentioned in this chapter?

Big Data Platforms
Teradata Aster

Application Case 7.6
Analyzing Disease Patterns from an Electronic Medical Records Data Warehouse
Questions for Discussion
1. Why could comorbidity of diseases be different between rural and urban hospitals?
2. What is the issue with the huge difference between rural and urban patient encounters?
3. What are the main components of a network?
4. Where else can you apply the network approach?

Figure 7.11 Urban and Rural Comorbidity Networks

Technology Insights 7.3
How to Succeed with Big Data
1. Simplify
2. Coexist
3. Visualize
4. Empower
5. Integrate
6. Govern
7. Evangelize

Big Data and Stream Analytics
Data-in-motion analytics and real-time data analytics
One of the Vs in Big Data = Velocity
Analytic process of extracting actionable information from continuously flowing data
Why stream analytics? It may not be feasible to store the data, or the data may lose its value
Stream Analytics Versus Perpetual Analytics
Critical Event Processing?

Stream Analytics: A Use Case in the Energy Industry
[Figure: components shown are Energy Production System (Traditional and Renewable); Capacity Decisions; Sensor Data (Energy Production System Status); Meteorological Data (Wind, Light, Temperature, etc.); Usage Data (Smart Meters, Smart Grid Devices); Data Integration and Temporary Staging; Streaming Analytics (Predicting Usage, Production, and Anomalies); Permanent Storage Area; Energy Consumption System (Residential and Commercial); Pricing Decisions.]

Stream Analytics Applications
e-Commerce
Telecommunication
Law Enforcement and Cyber Security
Power Industry
Financial Services
Health Services
Government

Application Case 7.7
Salesforce Is Using Streaming Data to Enhance Customer Value
Questions for Discussion
1. Are there areas in any industry where streaming data is irrelevant?
2. Besides customer retention, what are other benefits of using predictive analytics?
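Stream analytics, as described in this chapter, extracts actionable information from continuously flowing data, often without storing the full stream. The minimal Python sketch below illustrates one common pattern under stated assumptions: it keeps only a fixed-size sliding window of recent readings (for example, smart-meter usage values like those in the energy use case) and flags a reading as an anomaly when it deviates from the window mean by more than three standard deviations. The window size, threshold, and function name are illustrative assumptions, not a method taken from the slides.

```python
import random
from collections import deque
from statistics import mean, pstdev


def detect_anomalies(readings, window_size=20, threshold=3.0):
    """Yield (index, value) for readings that look anomalous.

    Only the last `window_size` readings are kept in memory, so the
    stream itself never needs to be stored.
    """
    window = deque(maxlen=window_size)  # oldest reading drops out automatically
    for i, value in enumerate(readings):
        if len(window) == window_size:
            mu = mean(window)
            sigma = pstdev(window)
            if sigma > 0 and abs(value - mu) > threshold * sigma:
                yield i, value
        window.append(value)


if __name__ == "__main__":
    random.seed(42)
    usage = [random.gauss(10.0, 0.5) for _ in range(200)]
    usage[150] = 25.0  # injected spike
    for index, value in detect_anomalies(usage):
        print(f"reading {index}: {value:.2f}")
```

Because the window has a fixed size, memory use stays constant however long the stream runs; production stream-processing engines add distributed execution and fault tolerance, which this sketch omits.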
