Cassandra - A Decentralized Structured Storage System

Spanner: Google's Globally-Distributed Database

James C. Corbett, Jeffrey Dean, Michael Epstein, Andrew Fikes, Christopher Frost, JJ Furman, Sanjay Ghemawat, Andrey Gubarev, Christopher Heiser, Peter Hochschild, Wilson Hsieh, Sebastian Kanthak, Eugene Kogan, Hongyi Li, Alexander Lloyd, Sergey Melnik, David Mwaura, David Nagle, Sean Quinlan, Rajesh Rao, Lindsay Rolig, Yasushi Saito, Michal Szymaniak, Christopher Taylor, Ruth Wang, Dale Woodford

Google, Inc.
Figures taken from the paper and from Alex Lloyd's presentation at Berlin Buzzwords 2012.

Agenda
  • Outline and Key Features
  • System Architecture
  • Software Stack
  • Data Model
  • TrueTime
  • Evaluation
  • Case Study

Outline
  • The next step after Bigtable along the RDBMS path, with strong time semantics.

Key Features
  • Temporal multi-version database.
  • Externally consistent global write transactions with synchronous replication.
  • Transactions across datacenters.
  • Lock-free read-only transactions.
  • Schematized, semi-relational (tabular) data model.
  • SQL-like query interface.

Key Features cont.
  • Auto-sharding, auto-rebalancing, and automatic failure response.
  • Exposes control of data replication and placement to the user/application.
  • Enables transaction serialization via global timestamps.
  • Acknowledges clock uncertainty and guarantees a bound on it.
  • Uses the novel TrueTime API for concurrency control.
  • Uses GPS devices and atomic clocks to get accurate time.

[Figure slide: System Architecture]

[Figure slide: Software Stack]
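
To make the "temporal multi-version database" and "lock-free read-only transactions" features listed above more concrete, here is a minimal Python sketch of a versioned key-value store in which every write is stamped with a timestamp and a read at a past timestamp returns the newest version at or before it. The names `VersionedStore`, `write`, and `read_at` are hypothetical and chosen for illustration only; this is a single-process toy with no replication, Paxos, or locking, not Spanner's actual interface.

```python
import bisect
from collections import defaultdict


class VersionedStore:
    """Toy multi-version store: each key keeps every (timestamp, value) written."""

    def __init__(self):
        # key -> parallel lists, both kept sorted by timestamp
        self._timestamps = defaultdict(list)
        self._values = defaultdict(list)

    def write(self, key, timestamp, value):
        """Record a new version of `key` at `timestamp` (older versions are kept)."""
        idx = bisect.bisect_right(self._timestamps[key], timestamp)
        self._timestamps[key].insert(idx, timestamp)
        self._values[key].insert(idx, value)

    def read_at(self, key, timestamp):
        """Return the newest value written at or before `timestamp`, or None."""
        stamps = self._timestamps.get(key, [])
        idx = bisect.bisect_right(stamps, timestamp)
        if idx == 0:
            return None  # nothing had been written yet at that time
        return self._values[key][idx - 1]


store = VersionedStore()
store.write("user:1", 10, "alice")
store.write("user:1", 20, "alice v2")

print(store.read_at("user:1", 15))  # alice
print(store.read_at("user:1", 25))  # alice v2
print(store.read_at("user:1", 5))   # None
```

Because old versions are never overwritten, a read at a fixed timestamp always sees the same data regardless of later writes, which is the property that lets snapshot-style reads avoid locks in this toy model.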

Software Stack cont.
  • Tablet mapping: (key:string, timestamp:int64) → string
  • Back end: Colossus (successor to GFS).
  • A Paxos state machine on top of each tablet stores the tablet's metadata and logs.
  • One replica in each Paxos group is chosen as leader, and all write requests for that group are initiated at the leader.
  • The transaction leader is the Paxos leader if the transaction involves only one Paxos group.

Software Stack cont.
  • A directory is analogous to a bucket in Bigtable.
  • Smallest unit of data placement.
  • Smallest unit for defining replication properties.
  • A directory may in turn be sharded into fragments if it grows too large.

Data Model
  • A Spanner universe supports one or more databases.
  • A database can contain an unlimited number of schematized tables.
  • Not purely relational.
  • Requires rows to have names.

  • A row's name is simply the set (possibly a singleton) of its primary-key columns.
  • In effect, it is a key-value store in which primary keys map to the non-key columns as values.

[Figure slide: Data Model]

TrueTime
  • The novel API behind Spanner's core innovation.
  • Leverages hardware such as GPS receivers and atomic clocks.
  • Implemented via the TrueTime API. The key method is now(), which returns not only the current system time but also a second value (ε) giving the maximum uncertainty in the time returned.
  • A set of time master servers per datacenter and a time slave daemon per machine.
  • The majority of time masters are fitted with GPS; a few others are fitted with atomic clocks (Armageddon masters).
  • Each daemon polls a variety of masters and reaches a consensus about the correct timestamp.

[Figure slide: TrueTime]

[Figure slide: TrueTime]

TrueTime Cont.

  • TrueTime uses both GPS and atomic clocks because they have different failure rates and failure scenarios.
  • Two other boolean methods in the API:
      • After(t) returns TRUE if t has definitely passed.
      • Before(t) returns TRUE if t has definitely not arrived.
  • Spanner uses these methods in concurrency control and to serialize transactions.
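
The three methods described above can be sketched in a few lines of Python. This is an illustrative toy, not Google's implementation: the class name `TrueTimeSketch`, the fixed `epsilon` value, and the injectable `clock` parameter are all assumptions made for the example, whereas the real system derives its uncertainty bound from GPS and atomic-clock time masters. Here `now()` returns the interval [t − ε, t + ε] rather than the pair (t, ε); the two forms carry the same information. The helper `wait_until_after` is a hypothetical illustration of how an After() check might be used to hold something back until a timestamp has definitely passed.

```python
import time
from dataclasses import dataclass


@dataclass(frozen=True)
class TTInterval:
    """Interval assumed to contain the true absolute time."""
    earliest: float
    latest: float


class TrueTimeSketch:
    """Toy stand-in for TrueTime built on a local clock.

    `epsilon` is an ASSUMED fixed uncertainty bound in seconds.
    """

    def __init__(self, epsilon=0.005, clock=time.time):
        self.epsilon = epsilon
        self._clock = clock

    def now(self):
        t = self._clock()
        return TTInterval(t - self.epsilon, t + self.epsilon)

    def after(self, t):
        """True only if t has definitely passed."""
        return t < self.now().earliest

    def before(self, t):
        """True only if t has definitely not arrived."""
        return t > self.now().latest


def wait_until_after(tt, t, poll=0.001):
    """Block until tt.after(t) holds (hypothetical helper)."""
    while not tt.after(t):
        time.sleep(poll)


# Deterministic demo using a fake clock that we advance by hand.
fake_time = [100.0]
tt = TrueTimeSketch(epsilon=0.5, clock=lambda: fake_time[0])

stamp = tt.now().latest   # 100.5
print(tt.after(stamp))    # False: 100.5 is still inside the uncertainty window
print(tt.before(stamp))   # False: it is not definitely in the future either

fake_time[0] = 101.5      # advance the clock by more than 2 * epsilon
print(tt.after(stamp))    # True: earliest is now 101.0 > 100.5
```

Note that a timestamp inside the uncertainty window is neither "after" nor "before"; both methods answer TRUE only when the answer is certain under the assumed bound.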

TrueTime Cont.
  • After() is used for Paxos leader leases: a leader uses After(Smax) to check whether Smax has passed before it abdicates and releases its slaves.
  • A Paxos leader cannot assign a timestamp Si greater than Smax to a transaction Ti, and clients cannot see the data committed by Ti until After(Si) is true.
  • After(t) returns TRUE if t has definitely passed.
  • Before(t) returns TRUE if t has definitely not arrived.
  • Each replica maintains a timestamp tsafe, the maximum timestamp at which that replica is up to date.

TrueTime Transactions

  • Read-write transactions require locks.
  • Read-only transactions are lock-free.
      • Must be declared read-only before the transaction starts.
      • Read information that is up to date.
  • Snapshot reads read information from the past, given a timestamp or a timestamp bound.
      • The user specifies either a specific past timestamp or a timestamp bound, so that data up to that point is read.

Evaluation
  • Evaluated for replication, transactions, and availability.
  • Results on the ε (epsilon) of TrueTime.
  • Benchmarked on a Spanner system with 50 Paxos groups and 250 directories.

  • Clients (applications) and zones are at a network distance of 1 ms.

[Figure slide: Evaluation - Availability]

[Figure slide: Evaluation - Epsilon]

Case Study
  • Spanner is currently in production use by Google's advertising backend, F1.
  • F1 previously used MySQL, because it requires strong transactional semantics, which made a NoSQL database solution impractical.
  • Spanner provides synchronous replication and automatic failover for F1.

Case Study cont.
  • Enabled F1 to specify data placement via Spanner directories, based on its needs.
  • [Table: F1 operation latencies measured over 24 hours]

Thank You
