Workflow Aware Storage - UBC ECE

Towards a High-Performance and Scalable Storage System for Workflow Applications

Emalayan Vairavanathan
Department of Electrical and Computer Engineering
The University of British Columbia

Background: Workflow Applications

A large number of independent tasks collectively work on a problem.

Common characteristics:
- File-based communication
- Large number of tasks
- Large amount of storage I/O
- Regular data access patterns

[Figure: modFTDock workflow]

Background: ModFTDock on the Argonne Blue Gene/P

- 1.2 M docking tasks
- File-based communication
- Large I/O volume
- Scale: 40960 compute nodes
- I/O throughput: < 1 MBps per core

[Figure labels: workflow runtime engine; application tasks, each with local storage; central storage system (e.g., GPFS, NFS)]

Background: Central Storage Bottleneck

[Figure: Montage workflow (512 BG/P CPU cores, GPFS). Source: Z. Zhang et al., SC12]

Contributions - Alleviating the storage I/O bottleneck

Intermediate Storage System
- Designed and implemented a prototype
- Integrated with workflow runtime
- Evaluated with applications on BG/P

Workflow-aware Storage System
- Identified new data access patterns
- Studied the viability of a workflow-aware storage

MosaStore Storage System
- Experimental platform for other studies

Publications:
- The Case for Cross-Layer Optimizations in Storage: A Workflow-Optimized Storage System. S. Al-Kiswany, Emalayan Vairavanathan, L. B. Costa, H. Yang, M. Ripeanu. Submitted - FAST '13.
- A Workflow-Aware Storage System: An Opportunity Study. Emalayan Vairavanathan, S. Al-Kiswany, L. B. Costa, Z. Zhang, D. Katz, M. Wilde, M. Ripeanu. CCGRID '12. Acceptance rate: 27%.
- A Case for Workflow-Aware Storage: An Opportunity Study using MosaStore. Emalayan Vairavanathan, S. Al-Kiswany, A. Barros, L. B. Costa, H. Yang, G. Fedak, D. Katz, M. Wilde, M. Ripeanu. Submitted - FGCS Journal.
- Predicting Intermediate Storage Performance for Workflow Applications. L. B. Costa, A. Barros, Emalayan Vairavanathan, S. Al-Kiswany, M. Ripeanu. Submitted - CCGRID '13.

Intermediate Storage System

Opportunities:
- Underutilized resources

[Figure labels: workflow runtime engine; compute nodes running application tasks, each with local storage; POSIX API; intermediate storage; stage in / stage out; central storage system (e.g., GPFS, NFS)]

Evaluation - modFTDock on Blue Gene/P

[Figure: results annotated with "20-40% improvement" and "2x improvement"]
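The stage-in / stage-out flow around the intermediate storage can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: plain directories stand in for the central storage system and the intermediate storage mount, and the function and file names (`run_with_intermediate_storage`, `data.in`, `data.tmp`, `data.out`) are hypothetical rather than part of the prototype.

```python
import shutil
import tempfile
from pathlib import Path


def run_with_intermediate_storage(central: Path, intermediate: Path, tasks) -> None:
    """Stage in, run tasks against intermediate storage, then stage out.

    Both paths are ordinary directories standing in for the central storage
    system and the intermediate storage mount (assumption of this sketch).
    """
    # Stage in: copy the workflow inputs from central to intermediate storage.
    for src in (central / "input").iterdir():
        shutil.copy(src, intermediate / src.name)

    # Tasks exchange files only through the intermediate storage.
    for task in tasks:
        task(intermediate)

    # Stage out: only the final results go back to central storage.
    out_dir = central / "output"
    out_dir.mkdir(exist_ok=True)
    for result in intermediate.glob("*.out"):
        shutil.copy(result, out_dir / result.name)


def stage1(store: Path) -> None:
    # Hypothetical first task: reads the staged-in input, writes an intermediate file.
    text = (store / "data.in").read_text()
    (store / "data.tmp").write_text(text.upper())


def stage2(store: Path) -> None:
    # Hypothetical second task: consumes the intermediate file, writes the result.
    text = (store / "data.tmp").read_text()
    (store / "data.out").write_text(text + "!")


def demo() -> str:
    with tempfile.TemporaryDirectory() as root:
        central = Path(root) / "central"
        intermediate = Path(root) / "intermediate"
        (central / "input").mkdir(parents=True)
        intermediate.mkdir()
        (central / "input" / "data.in").write_text("hello")
        run_with_intermediate_storage(central, intermediate, [stage1, stage2])
        # The intermediate file data.tmp never reaches central storage.
        assert not (central / "output" / "data.tmp").exists()
        return (central / "output" / "data.out").read_text()
```

In this sketch the central store sees one read of the input and one write of the result, while the file passed between the two tasks stays on the intermediate store.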

A Workflow-aware Storage System

Opportunities:
- Dedicated intermediate storage
- Exposing data location
- Regular data access patterns

[Figure labels: workflow runtime engine (task scheduling); compute nodes running application tasks, each with local storage; POSIX API; deploy intermediate storage; workflow-aware intermediate storage (shared); stage in/out; central storage system (e.g., GPFS)]

Data Access Patterns in Workflow Applications

Pattern               Optimization
Pipeline              Locality and location-aware scheduling
Broadcast             Replication
Reduce                Collocation and location-aware scheduling
Scatter and Gather    Block-level data placement

Sources: Wozniak et al., PDSW09; Katz et al., BlueWater; Shibata et al., HPDC10

Data Access Patterns in ModFTDock

[Figure: ModFTDock workflow annotated with its broadcast, reduce, and pipeline patterns]
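The pattern-to-optimization pairing above, together with location-aware scheduling, can be sketched as follows. This is a minimal illustration, not the scheduler used in the study: the `pick_node` function, its parameters, and the node and file names are hypothetical, and it assumes the storage system exposes a map from file name to the node holding that file.

```python
from typing import Dict, List, Optional

# Pattern-to-optimization pairing as listed on the slide.
OPTIMIZATIONS: Dict[str, str] = {
    "pipeline": "locality and location-aware scheduling",
    "broadcast": "replication",
    "reduce": "collocation and location-aware scheduling",
    "scatter": "block-level data placement",
    "gather": "block-level data placement",
}


def pick_node(input_files: List[str],
              file_locations: Dict[str, str],
              idle_nodes: List[str]) -> Optional[str]:
    """Return the idle node that holds the most input files.

    Falls back to the first idle node when no idle node holds any input,
    and returns None when there are no idle nodes at all.
    """
    if not idle_nodes:
        return None
    counts = {node: 0 for node in idle_nodes}
    for name in input_files:
        node = file_locations.get(name)
        if node in counts:
            counts[node] += 1
    # max() returns the first maximal element, so ties go to the earliest idle node.
    return max(idle_nodes, key=lambda node: counts[node])
```

For example, with `file_locations = {"a.tmp": "node2", "b.tmp": "node2", "c.tmp": "node1"}` and all three nodes idle, a task reading the three files is sent to `node2`, where two of its inputs can be read locally.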

Evaluation - Baselines

MosaStore, NFS, and node-local storage vs. workflow-aware storage

[Figure labels: compute nodes running application tasks, each with local storage; intermediate storage (shared): MosaStore or workflow-aware storage; stage in/out; central storage system (e.g., GPFS, NFS)]

Evaluation - Platform

Cluster of 20 machines:
- Intel Xeon 4-core, 2.33 GHz CPU
- 4 GB RAM
- 1 Gbps NIC
- RAID 1 on two 300 GB 7200 rpm SATA disks

Central storage (NFS server):
- Intel Xeon E5345 8-core, 2.33 GHz CPU
- 8 GB RAM
- 1 Gbps NIC
- Six SATA disks in a RAID 5 configuration

The NFS server is better provisioned.

Evaluation - Benchmarks and Application

Synthetic benchmark:

Workload    Pipeline                 Broadcast       Reduce
Small       100 KB, 200 KB, 10 KB    100 KB, 1 KB    10 KB, 100 KB
Medium      100 MB, 200 MB, 1 MB     100 MB, 1 MB    10 MB, 200 MB
Large       1 GB, 2 GB, 10 MB        100 MB, 2 GB    1 GB, 10 MB

Applications and workflow run-time engine: Montage, modFTDock

Synthetic Benchmark - Pipeline

Optimization: locality and location-aware scheduling

[Figure: average runtime for medium workload]

3x improvement in workflow time

Synthetic Benchmarks - Broadcast

Optimization: replication

[Figure: average runtime for medium workload on disk]

60% improvement in the runtime

Evaluation - Montage

[Figures: Montage workflow; total application time on five different systems]

10% improvement in the runtime

Contributions - Alleviating the storage I/O bottleneck

- Intermediate Storage System: designed and implemented a prototype, integrated it with a workflow runtime, and evaluated it with applications on BG/P.
- Workflow-aware Storage System: identified new data access patterns and studied the viability of a workflow-aware storage. Published as "A Workflow-Aware Storage System: An Opportunity Study", CCGRID '12. Acceptance rate: 27% (one of the top 15 papers).
- MosaStore Storage System: experimental platform for other studies.

THANK YOU

BACKUP SLIDES

Background: Many-task Workflows

- Large amount of legacy code
- Rapid application development
- Portability (workstation to supercomputers)
- Easy to debug
- Implicit fault tolerance
- Expression of natural parallelism

Background: Motivation

- Many-task applications are becoming popular
- Better utilization of costly hardware and energy savings (a lot of time is spent executing workflow applications)
- Better scalability and higher performance will help to solve large problems more accurately
- Large number of available workflow applications

Blue Gene/P Architecture

- 40960 compute nodes (160K cores)
- Torus network
- 640 I/O nodes
- Tree network (850 MBps x 640)
- Switch complex, 10 Gbps
- 10 Gb/s x 128
- GPFS: deployed on 128 file server nodes (3 petabytes storage capacity)
- 6.4 Gbps per link

Example Workflow Software Stack

- Swift script
- Swift compiler
- Intermediate code
- Workflow runtime engine (e.g., Swift)
- Tasks / notifications
- Task dispatching service (e.g., Coasters)
- Tasks / notifications
- Workers (perform storage I/O)
- Shared storage system

Intermediate Storage System - MosaStore

- A file is divided into fixed-size chunks.
- Chunks are stored on the storage nodes.
- The manager maintains a block-map for each file.
- POSIX interface for accessing the system.

[Figure: MosaStore distributed storage architecture]

Contribution - Intermediate Storage System

MosaStore Storage System:
- Support for a set of POSIX APIs (random read and write, delete, close)
- Garbage collection
- Replication (eager and lazy)
- Client-side caching
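The chunking and block-map idea described for MosaStore can be illustrated with a toy in-memory model. This is a sketch under stated assumptions, not MosaStore's implementation: the `Manager` class, the 4-byte `CHUNK_SIZE`, and the round-robin placement are all hypothetical, and for brevity the manager object also holds the storage nodes' contents, which a real deployment would keep on separate machines.

```python
from typing import Dict, List, Tuple

CHUNK_SIZE = 4  # bytes; tiny on purpose, a real system would use far larger chunks


class Manager:
    """Keeps a block-map per file: an ordered list of (storage node, chunk id)."""

    def __init__(self, num_nodes: int) -> None:
        # Each storage node is modelled as a dict from chunk id to chunk bytes.
        self.nodes: List[Dict[int, bytes]] = [{} for _ in range(num_nodes)]
        self.block_map: Dict[str, List[Tuple[int, int]]] = {}
        self._next_chunk_id = 0

    def write(self, name: str, data: bytes) -> None:
        entries = []
        for offset in range(0, len(data), CHUNK_SIZE):
            chunk = data[offset:offset + CHUNK_SIZE]
            # Round-robin placement is an assumption of this sketch.
            node = (offset // CHUNK_SIZE) % len(self.nodes)
            chunk_id = self._next_chunk_id
            self._next_chunk_id += 1
            self.nodes[node][chunk_id] = chunk
            entries.append((node, chunk_id))
        self.block_map[name] = entries

    def read(self, name: str) -> bytes:
        # Follow the block-map in order and concatenate the chunks.
        return b"".join(self.nodes[node][cid] for node, cid in self.block_map[name])
```

Writing a 10-byte file with three storage nodes yields three chunks, one per node, and `read` reassembles the original bytes by walking the block-map.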

Viability Study - Changes in MosaStore

- Optimized data placement for the pipeline pattern: priority to local writes and reads
- Optimized data placement for the reduce pattern: collocating files on a single benefactor
- Replication mechanism optimized for the broadcast pattern: parallel replication
- Data block placement for the scatter and gather patterns

Evaluation - Synthetic Benchmark on Blue Gene/P

Pipeline benchmark

[Figure: runtime at different scales]

100% performance gain in the application runtime

Synthetic Benchmarks - Reduce

Optimization: collocation and location-aware scheduling

[Figure: average runtime for medium workload]

2x improvement in the runtime

Synthetic Benchmarks - Small Workload

[Figures: reduce benchmark; broadcast benchmark]

Evaluation - ModFTDock

[Figures: ModFTDock workflow; total application time on three different systems]

20% improvement in the runtime

Evaluation - Montage

[Figures: per-stage time; total application time on five different systems]
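As a back-of-the-envelope check, the Blue Gene/P architecture figures in the backup slides can be combined into a peak per-core share of central storage bandwidth. This is only a sketch: it assumes 4 cores per compute node (40960 x 4 = 163840, matching "160K cores"), it reads "10 Gb/s x 128" as one 10 Gb/s link per GPFS file server, and it uses decimal units. It gives a ceiling on peak bandwidth, not a measurement, and the deck does not say how its "< 1 MBps / core" figure was obtained.

```python
COMPUTE_NODES = 40960
CORES_PER_NODE = 4            # assumption: 40960 x 4 = 163840, the "160K cores"
FILE_SERVERS = 128
LINK_GBPS = 10                # assumption: "10 Gb/s x 128" is one link per file server
IO_NODES = 640
TREE_MBPS_PER_IO_NODE = 850   # from "Tree network (850 MBps x 640)"


def per_core_mbps() -> float:
    """Peak central-storage bandwidth per core, in MB/s (decimal units)."""
    cores = COMPUTE_NODES * CORES_PER_NODE
    # 1 Gb/s = 1000 Mb/s = 125 MB/s
    server_side_mbps = FILE_SERVERS * LINK_GBPS * 125
    tree_side_mbps = IO_NODES * TREE_MBPS_PER_IO_NODE
    # The narrower of the two aggregate paths bounds the achievable bandwidth.
    bottleneck = min(server_side_mbps, tree_side_mbps)
    return bottleneck / cores
```

Under these assumptions the file-server side (160,000 MB/s) is narrower than the tree network (544,000 MB/s), and dividing by 163,840 cores gives a ceiling just under 1 MB/s per core, which is consistent with the throughput figure quoted in the background slides.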
