Unistore: Project Updates

Presenter: Wei Xie
Project Members: Jiang Zhou, Mark Reyes, Jason Noble, David Cauthron and Yong Chen
Data-Intensive Scalable Computing Laboratory (DISCL)
Computer Science Department, Texas Tech University

We are grateful to Nimboxx and the Cloud and Autonomic Computing site at Texas Tech University for their valuable support of this project.

Unistore Overview

To build a unified storage architecture (Unistore) for Cloud storage systems with the co-existence and efficient integration of heterogeneous HDDs and SCM (Storage Class Memory) devices
Prototype development based on Sheepdog and/or Ceph

[Figure: Unistore architecture; the two components below are linked by an arrow labeled "guide".]
Characterization Component
  Workloads: access patterns
  Devices: bandwidth, throughput, block erasure, concurrency, wear-leveling
Data Placement Component
  I/O Pattern: random/sequential, read/write, hot/cold
  I/O Functions: Write_to_SSD, Read_from_SSD, Write_to_HDD
  Placement Algorithm: modified consistent hash
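The I/O pattern dimensions listed in the overview (random/sequential, read/write, hot/cold) could be derived from a trace roughly as follows. This is a minimal sketch, not the project's tracing component: the trace format `(op, offset, length)`, the function name `characterize` and the `hot_threshold` parameter are all assumptions made for illustration.

```python
from collections import Counter


def characterize(trace, hot_threshold=3):
    """Summarize a block I/O trace.

    `trace` is a list of (op, offset, length) tuples, op being "R" or "W".
    This record layout is hypothetical; a real tracer may log more fields.
    """
    if not trace:
        raise ValueError("empty trace")

    reads = sum(1 for op, _, _ in trace if op == "R")
    sequential = 0          # requests that start where the previous one ended
    prev_end = None
    touches = Counter()     # how often each starting offset is accessed

    for _, offset, length in trace:
        if prev_end is not None and offset == prev_end:
            sequential += 1
        prev_end = offset + length
        touches[offset] += 1

    return {
        "read_ratio": reads / len(trace),
        "sequential_ratio": sequential / max(1, len(trace) - 1),
        "hot_offsets": sorted(o for o, n in touches.items() if n >= hot_threshold),
    }


summary = characterize(
    [("R", 0, 4), ("R", 4, 4), ("W", 8, 4), ("R", 0, 4), ("R", 0, 4)]
)
print(summary)
```

A placement component could consume such a summary as a hint, for example by preferring SSD nodes for offsets reported as hot.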

Team and Leverage

Faculty: Yong Chen
Post-doc researcher: Jiang Zhou
Ph.D. student: Wei Xie
Undergraduate student: Mark Reyes
Nimboxx: Jason Noble and David Cauthron

Experimental platform: two nodes on the DISCL cluster in [email protected]
  CPU: 2 x 8-core E5-2650 v2, 2.60 GHz
  Memory: 128 GB
  Storage: 3 x 500 GB SAS HDD and 2 x 200 GB SSD
  Coprocessors: Phi 5110P
  Used as Sheepdog storage nodes

Background: Challenges in Data Distribution

Requirements of data distribution:
  Scalability
  Load balance (based on capacity): data need to be distributed randomly and in statistical proportion to node capacity
  Handling of node addition/removal
  Data replication for fault tolerance
  High performance: the throughput of storage nodes needs to be fully exploited
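One common way to meet the capacity-proportional, add/remove-friendly and replication requirements is a consistent hash ring with virtual nodes weighted by capacity. The sketch below is a generic illustration under that assumption; it is not Sheepdog's implementation, and the class name, the `vnodes_per_gb` parameter and the node names are hypothetical.

```python
import bisect
import hashlib


def _hash(key):
    """Map a string to a 64-bit integer position on the ring."""
    return int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")


class ConsistentHashRing:
    """Consistent hash ring whose virtual-node count scales with capacity."""

    def __init__(self, vnodes_per_gb=0.5):
        self.vnodes_per_gb = vnodes_per_gb
        self._ring = []  # sorted list of (position, node_name)

    def add_node(self, name, capacity_gb):
        # More capacity means more virtual nodes, hence a larger share of objects.
        count = max(1, int(capacity_gb * self.vnodes_per_gb))
        for i in range(count):
            bisect.insort(self._ring, (_hash(f"{name}#{i}"), name))

    def remove_node(self, name):
        # Only objects that mapped to this node's virtual nodes move elsewhere.
        self._ring = [(p, n) for p, n in self._ring if n != name]

    def locate(self, obj_id, replicas=1):
        """Return up to `replicas` distinct nodes, walking clockwise."""
        if not self._ring:
            raise ValueError("ring is empty")
        distinct = {n for _, n in self._ring}
        replicas = min(replicas, len(distinct))
        i = bisect.bisect_right(self._ring, (_hash(obj_id), ""))
        chosen = []
        while len(chosen) < replicas:
            node = self._ring[i % len(self._ring)][1]
            if node not in chosen:
                chosen.append(node)
            i += 1
        return chosen


ring = ConsistentHashRing()
ring.add_node("hdd-node-1", 500)
ring.add_node("hdd-node-2", 500)
ring.add_node("ssd-node-1", 200)
print(ring.locate("object-42", replicas=2))
```

Note that the weights here reflect capacity only, so the ring addresses load balance but says nothing about device throughput.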

Consistent hashing and CRUSH handle the first four requirements fairly well. CRUSH is more flexible, as it can distribute data based on the physical organization of nodes for better fault tolerance.

Background: Challenges in Data Distribution

Heterogeneous storage environment with distinct throughput:
  NVMe SSD: 2000 MB/s or more
  SATA SSD: ~500 MB/s
  Enterprise HDD: ~150 MB/s
Large SSDs are becoming available, but are still expensive:
  1.2 TB NVMe Intel 750 costs $1000
  1 TB SATA Samsung 640 EVO costs $500
  10x or more costly than HDDs
SSDs still co-exist with HDDs as accelerators instead of replacing them

Background: How to Use SSDs in Cloud-scale Storage

Traditional way of using SCMs (i.e., SSDs) in cloud-scale distributed storage: as a cache layer

  Caching/buffering generates extensive writes to the SSD, which wears out the device
  Needs a fine-tuned caching/buffering scheme
  Does not fully utilize the capacity of SSDs
  The capacity of SSDs is growing fast
Treat SSD-equipped nodes at the same level as HDD-equipped nodes:
  No need to do cache replacement or buffer flushing
  The user sees a storage system with combined capacity and maximized performance
  Fewer writes to SSDs
Load-balance-aware distribution and performance-aware distribution naturally conflict:
  SSDs are usually smaller but faster, while HDDs are larger but slower
  Existing data distribution algorithms do not consider this problem
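A back-of-the-envelope model makes the conflict concrete. The sketch below assumes a single node with the 3 x 500 GB HDDs and 2 x 200 GB SSDs of the experimental platform, the approximate HDD (~150 MB/s) and SATA SSD (~500 MB/s) bandwidths quoted in the background, uniformly accessed data and perfectly parallel devices. The function names are illustrative, and the numbers are not results from the project.

```python
def proportional_shares(devices, key):
    """Fraction of the data each device receives when placing by `key`."""
    total = sum(d[key] for d in devices)
    return [d[key] / total for d in devices]


def aggregate_throughput(devices, shares):
    """MB/s when all devices serve their share in parallel.

    The device that finishes last bounds the system, so the result is
    1 / max(share_i / bandwidth_i).
    """
    if abs(sum(shares) - 1.0) > 1e-9:
        raise ValueError("shares must sum to 1")
    return 1.0 / max(s / d["mbps"] for s, d in zip(shares, devices))


def usable_capacity(devices, shares):
    """Data set size (GB) at which the first device becomes full."""
    return min(d["gb"] / s for s, d in zip(shares, devices))


# Assumed device mix: capacities from the platform, bandwidths approximate.
devices = (
    [{"name": f"hdd{i}", "gb": 500, "mbps": 150} for i in range(3)]
    + [{"name": f"ssd{i}", "gb": 200, "mbps": 500} for i in range(2)]
)

by_capacity = proportional_shares(devices, "gb")
by_bandwidth = proportional_shares(devices, "mbps")

print(round(aggregate_throughput(devices, by_capacity)))   # 570
print(round(usable_capacity(devices, by_capacity)))        # 1900
print(round(aggregate_throughput(devices, by_bandwidth)))  # 1450
print(round(usable_capacity(devices, by_bandwidth)))       # 580
```

Under these assumptions, capacity-proportional placement uses all 1900 GB but is held to about 570 MB/s by the HDDs, while bandwidth-proportional placement reaches about 1450 MB/s but fills the SSDs once the data set grows to about 580 GB.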

Project Tasks: Overview

Data distribution management of Unistore
  Modify the data distribution algorithm (consistent hash) in Sheepdog or the CRUSH algorithm in Ceph
  Achieve load balance, reliability and performance at the same time for heterogeneous storage
  Different storage devices are unified and fully utilized
  Two-mode distribution: BigData15 short paper
  SUORA algorithm

Tracing IO operations and workload characterization
  Instrument Sheepdog for IO tracing capability
  Integrate the IO workload characterization component to serve as a hint for data distribution
  Tracing component developed by Mark Reyes

Activities

Bi-weekly meeting for the team members to report progress and discuss problems
  Each student member reports recent research and development progress
  Bring up new ideas or discuss current ideas

Presentation slides and meeting minutes are maintained

Deliverables

Two-mode paper accepted by the BigData15 conference
SUORA paper completed and being prepared for submission
A new paper called Tier-CRUSH is in preparation
IO tracing and workload characterization component is being developed
Try patent filing

Two-Mode Data Distribution

[Figure: two-mode data distributor. Components shown: Data Objects, Data Distributor, Distributor Selector, Capacity Monitor, IO Monitor, Performance-based Distributor, Capacity-based Distributor, Storage Nodes.]
Traditional data distribution only cares about load balance, i.e., it uses a capacity-based distributor
We propose to use performance-based and capacity-based distributors at the same time
Switching between the two modes is based on capacity usage and IO workload
Read and write policies handle the two modes
A mode transition strategy reduces data migration overhead

Throughput Improvement

[Figure: throughput plot, annotated "1.8 performance gain here".]
Migration overhead ignored
Significant system throughput improvement over a wide range of user input
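The role of the distributor selector in the two-mode scheme can be sketched as a small state machine fed by the capacity monitor and the IO monitor. This is only an illustration of the idea: the threshold values, the hysteresis gap between entering and leaving performance mode, and all names below are assumptions, not details taken from the two-mode paper.

```python
from dataclasses import dataclass
from enum import Enum


class Mode(Enum):
    CAPACITY = "capacity-based"
    PERFORMANCE = "performance-based"


@dataclass
class DistributorSelector:
    # Hypothetical thresholds; the slides do not give concrete values.
    ssd_enter: float = 0.70   # enter PERFORMANCE only while SSD usage is below this
    ssd_exit: float = 0.85    # leave PERFORMANCE once SSD usage reaches this
    busy_load: float = 0.60   # IO load (fraction of peak) considered heavy
    idle_load: float = 0.30   # IO load considered light
    mode: Mode = Mode.CAPACITY

    def update(self, ssd_used_fraction, io_load):
        """Pick the distributor for new writes from the two monitor readings."""
        if self.mode is Mode.CAPACITY:
            if io_load >= self.busy_load and ssd_used_fraction < self.ssd_enter:
                self.mode = Mode.PERFORMANCE
        else:
            # The gap between enter and exit thresholds avoids rapid flapping,
            # which would otherwise trigger repeated data migration.
            if ssd_used_fraction >= self.ssd_exit or io_load < self.idle_load:
                self.mode = Mode.CAPACITY
        return self.mode


selector = DistributorSelector()
for ssd_used, load in [(0.2, 0.9), (0.8, 0.9), (0.9, 0.9), (0.8, 0.9)]:
    print(ssd_used, load, selector.update(ssd_used, load).value)
```

The gap between the enter and exit thresholds is one simple way to limit mode transitions; the actual transition strategy in the project may differ.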

SUORA Algorithm

Multiple tiers; each tier represents a type of storage device with similar characteristics (performance, capacity)
Data are placed across tiers based on hotness
Within each tier, data are distributed across nodes randomly, uniformly and in proportion to capacity

Conclusions

Reconsider data distribution for heterogeneous storage devices with distinct performance metrics
The two-mode scheme aims to provide maximized performance while still maintaining load balance, without drastic changes to existing data distribution algorithms
Analysis shows the potential of the two-mode scheme
More trace-based or real-world evaluation of the scheme is still needed
The proposed algorithms received positive feedback from the BigData conference

On-going/Future Work

Starting to implement the proposed algorithms in Sheepdog or Ceph
Continue the development of the IO tracing and characterization component
Writing a new paper named Tiered-CRUSH that extends the CRUSH algorithm to support heterogeneous storage
Integrate the workload characterization component with the data distribution component
Test on the experimental platform

Thank You

Please visit: http://cac.ttu.edu/, http://discl.cs.ttu.edu/
Acknowledgement: The [email protected] is funded by the National Science Foundation under grants IIP-1362134 and IIP-1238338.

Please take a moment to fill out your L.I.F.E. forms.
http://www.iucrc.com
Select Cloud and Autonomic Computing Center, then select IAB role.
What do you like about this project? What would you change? (Please include all relevant feedback.)
