6.829: Computer Networks Data Center Networks Mohammad Alizadeh

Fall 2019
Slides adapted from presentations by Albert Greenberg and Changhoon Kim (Microsoft)

What are Data Centers?
- Large facilities with 10s of thousands of networked servers
- Compute, storage, and networking working in concert
- "Warehouse-Scale Computers"
- Huge investment: ~$0.5 billion for a large datacenter

Data Center Costs

Amortized Cost*   Component              Sub-Components
~45%              Servers                CPU, memory, disk
~25%              Power infrastructure   UPS, cooling, power distribution
~15%              Power draw             Electrical utility costs
~15%              Network                Switches, links, transit

*3 yr amortization for servers, 15 yr for infrastructure; 5% cost of money
Source: Greenberg, Hamilton, Maltz, Patel. The Cost of a Cloud: Research Problems in Data Center Networks. SIGCOMM CCR 2009.

Server Costs
30% utilization considered "good" in most data centers!

- Uneven application fit: each server has CPU, memory, and disk; most applications exhaust one resource, stranding the others
- Uncertainty in demand: demand for a new service can spike quickly
- Risk management: not having spare servers to meet demand brings failure just when success is at hand

Goal: Agility (Any Service, Any Server)
- Turn the servers into a single large fungible pool
- Dynamically expand and contract service footprint as needed
- Benefits: lower cost (higher utilization), increased developer productivity, high performance and reliability

Datacenter Networks
Provide the illusion of "One Big Switch"

- 10,000s of ports connecting compute and storage (disk, flash, ...)

Datacenter Traffic Growth
- DCN bandwidth growth demanded much more
- Today: Petabits/s in one DC, more than the core of the Internet!
Source: Jupiter Rising: A Decade of Clos Topologies and Centralized Control in Google's Datacenter Network, SIGCOMM 2015.

Latency is King
- Traditional application: app logic and data structures on a single machine; << 1 µs latency
- Large-scale web application: app tier and data tier connected by the data center fabric; 10 µs-1 ms latency
- Example questions about a user (Alice): "Who does she know?", "What has she done?"
- 1 user request -> 1000s of messages over the DC network
- Microseconds of latency matter, even at the tail (e.g., 99.9th percentile)
[Figure: user Alice and data items labeled Eric, Minnie, Pics, Apps, Videos inside the data center]
Based on slide by John Ousterhout (Stanford)
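A rough sketch of why the tail matters when one request fans out into thousands of messages: if each message independently has probability p of landing in the slow tail, the chance that at least one of them is slow is 1 - (1 - p)^N. The independence assumption and the function name are illustrative and do not come from the slides.

```python
def prob_any_slow(num_messages: int, tail_fraction: float = 0.001) -> float:
    """Probability that at least one of num_messages independent messages
    falls in the slowest tail_fraction of the latency distribution."""
    return 1.0 - (1.0 - tail_fraction) ** num_messages


# tail_fraction = 0.001 corresponds to the 99.9th percentile on the slide.
for n in (1, 100, 1000, 5000):
    print(f"{n:>5} messages -> {prob_any_slow(n):.1%} chance that one hits the tail")
```

Under these assumptions, a request that needs 1000 messages sees a 99.9th-percentile delay on roughly 63% of requests, which is why tail latency dominates user-visible response time.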

Conventional DC Network Problems

Conventional DC Network
[Figure: tree topology with the Internet at the top, core routers (CR) and access routers (AR) at DC layer 3, Ethernet switches (S) at DC layer 2, and racks of app servers (A) at the bottom]

Key:
- CR = Core Router (L3)
- AR = Access Router (L3)
- S = Ethernet Switch (L2)
- A = Rack of app. servers
~1,000 servers/pod == IP subnet
L2 pros, cons? L3 pros, cons?
Reference: Data Center: Load Balancing Data Center Services, Cisco 2004

Reminder: Layer 2 vs. Layer 3
Ethernet switching (layer 2)
+ Fixed IP addresses and auto-configuration (plug & play)
+ Seamless mobility, migration, and failover
x Broadcast limits scale (ARP)
x No multipath (Spanning Tree Protocol)
IP routing (layer 3)

+ Scalability through hierarchical addressing
+ Multipath routing through equal-cost multipath
x Can't migrate w/o changing IP address
x Complex configuration

Conventional DC Network Problems
[Figure: the conventional tree topology annotated with oversubscription ratios that grow from ~5:1 near the servers to ~40:1 and ~200:1 toward the core]
- Extremely limited server-to-server capacity

Conventional DC Network Problems
[Figure: the same topology split into IP subnet (VLAN) #1 and IP subnet (VLAN) #2, with ~200:1 oversubscription at the core]
- Extremely limited server-to-server capacity
- Resource fragmentation

And More Problems
[Figure: the same topology with IP subnet (VLAN) #1 and IP subnet (VLAN) #2, ~200:1 oversubscription at the core]
- Complicated manual L2/L3 re-configuration

VL2 Paper
- Measurements
- VL2 Design
  - Clos topology
  - Valiant LB
  - Name/location separation (precursor to network virtualization)
http://research.microsoft.com/en-US/news/features/datacenternetworking-081909.aspx

VL2 Goals

The Illusion of a Huge L2 Switch
1. L2 semantics
2. Uniform high capacity
3. Performance isolation
[Figure: racks of servers (A) all attached to one huge switch]
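To make the oversubscription ratios on the Conventional DC Network Problems slides concrete, the sketch below divides a server's NIC rate by the ratio at each layer. The 1 Gbps NIC speed and the layer labels are assumptions for illustration; the slides give only the ratios.

```python
NIC_GBPS = 1.0  # assumed server NIC speed, not stated on the slides

# Oversubscription ratios from the slides, ordered from the servers toward the core.
# The layer names are descriptive labels chosen for this sketch.
OVERSUBSCRIPTION = {
    "lowest switch layer": 5,
    "middle layer": 40,
    "core layer": 200,
}


def worst_case_mbps(nic_gbps: float, ratio: float) -> float:
    """Per-server bandwidth (Mbps) when every server sends across this layer at once."""
    return nic_gbps * 1000.0 / ratio


for layer, ratio in OVERSUBSCRIPTION.items():
    mbps = worst_case_mbps(NIC_GBPS, ratio)
    print(f"{layer:>20} (~{ratio}:1): {mbps:6.1f} Mbps per server")
```

With these assumed numbers, traffic that must cross the core gets only a few Mbps per server under full load, which is the gap that the "uniform high capacity" goal is meant to close.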

Clos Topology
Offer huge capacity via multiple paths (scale out, not up)
[Figure: VL2 Clos network with intermediate (Int) switches at the top, aggregation (Aggr) switches below them, and ToR switches with 20 servers each]

https://code.facebook.com/posts/360346274145943/introducing-data-center-fabric-the-nextgeneration-facebook-data-center-network/
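As a sketch of what "scale out, not up" buys on the Clos Topology slide, the function below sizes a VL2-style Clos network from switch port counts. The wiring rules (each aggregation switch splits its ports evenly between intermediate switches and ToRs, each ToR has two uplinks, 20 servers per ToR) follow the sizing described in the VL2 paper as recalled here; treat them, and all names in the code, as assumptions, since the figure labels did not survive in this transcript.

```python
def vl2_clos_size(d_a: int, d_i: int, servers_per_tor: int = 20) -> dict:
    """Size a VL2-style Clos network.

    d_a: ports per aggregation switch (half face up, half face down)
    d_i: ports per intermediate switch (one per aggregation switch)
    """
    if d_a % 2 or (d_a * d_i) % 4:
        raise ValueError("port counts must divide evenly")
    num_intermediate = d_a // 2      # one per upward-facing aggregation port
    num_aggregation = d_i            # one per intermediate-switch port
    num_tor = d_a * d_i // 4         # each ToR uses two aggregation down-ports
    return {
        "intermediate": num_intermediate,
        "aggregation": num_aggregation,
        "tor": num_tor,
        "servers": num_tor * servers_per_tor,
        # Every pair of aggregation switches is joined through each intermediate switch.
        "paths_between_aggregation_switches": num_intermediate,
    }


print(vl2_clos_size(d_a=144, d_i=144))
```

Capacity grows with the product of the port counts, so adding more commodity switches (rather than buying a larger chassis) is what scales the fabric, and the number of equal-cost paths grows with it.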

Building Block: Merchant Silicon Switching Chips
[Figure: switch ASIC, Facebook Wedge, and Facebook 6-pack. Image courtesy of Facebook]

ECMP Load Balancing
- Pick among equal-cost paths by a hash of the 5-tuple
- Randomized load balancing
- Preserves packet order
- Problems? Hash collisions
[Figure: flow f mapped to one of three paths, H(f) % 3 = 0]

Do Hash Collisions Matter?
VL2 paper: probabilistic flow distribution would work well enough, because:
- Flows are numerous and not huge (no elephants)
- Commodity switch-to-switch links are substantially thicker (~10x) than the maximum thickness of a flow

Impact of Fabric vs. NIC Speed Differential
Three non-oversubscribed topologies (5 racks, 100x10G servers):
- 20x10Gbps uplinks, 20x10Gbps downlinks
- 5x40Gbps uplinks, 20x10Gbps downlinks
- 2x100Gbps uplinks, 20x10Gbps downlinks

Impact of Fabric vs. NIC Speed Differential: web search workload
[Figure: average FCT (normalized to optimal, lower is better) of large (10MB, ∞) background flows versus load (%), from 30 to 80, for OQ-Switch, 20x10Gbps, 5x40Gbps, and 2x100Gbps]
- 40/100Gbps fabric: ~ same FCT as OQ
- 10Gbps fabric: FCT up to 40% worse than OQ
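One way to see why the faster fabrics track the ideal output-queued switch is to compute the probability that random hashing overloads no uplink. The sketch below assumes each flow picks an uplink uniformly at random and that a link delivers full throughput as long as it carries no more flows than fit in its capacity. The function name and the exact-counting approach are illustrative; the scenario (11 flows of 10 Gbps each, 55% load) is the one used on the Intuition slide.

```python
from fractions import Fraction
from math import comb


def prob_full_throughput(num_links: int, flows_per_link: int, num_flows: int) -> Fraction:
    """Probability that no link receives more than flows_per_link flows when
    each of num_flows flows picks one of num_links links uniformly at random."""
    # ways[j] = number of ways to place j of the flows on the links handled so far
    # without exceeding flows_per_link on any of them.
    ways = [0] * (num_flows + 1)
    ways[0] = 1
    for _ in range(num_links):
        nxt = [0] * (num_flows + 1)
        for placed, count in enumerate(ways):
            if count == 0:
                continue
            remaining = num_flows - placed
            # Choose which m of the remaining flows land on this link.
            for m in range(min(flows_per_link, remaining) + 1):
                nxt[placed + m] += count * comb(remaining, m)
        ways = nxt
    return Fraction(ways[num_flows], num_links ** num_flows)


# 11 flows of 10 Gbps each; capacity per uplink expressed in 10 Gbps flows.
for label, links, cap in (("20x10Gbps", 20, 1), ("5x40Gbps", 5, 4), ("2x100Gbps", 2, 10)):
    p = prob_full_throughput(links, cap, 11)
    print(f"{label:>10}: {float(p):.2%}")
```

For 20x10Gbps uplinks this model gives about 3.27%, matching the Intuition slide. For 2x100Gbps it gives about 99.90%, slightly below the 99.95% quoted there, which may count the overload case differently.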

Intuition
Higher speed links improve probabilistic routing. With 11x10Gbps flows (55% load):
- 20x10Gbps uplinks: prob of 100% throughput = 3.27%
- 2x100Gbps uplinks: prob of 100% throughput = 99.95%

DC Load Balancing Research
- Centralized: [Hedera, Planck, Fastpass, ...]
- Distributed
  - In-network
    - Congestion oblivious: [ECMP, WCMP, packet-spray, ...]
    - Congestion aware: [Flare, TeXCP, CONGA, DeTail, HULA, LetFlow, ...]
  - Host-based
    - Congestion oblivious: [Presto, Juggler]
    - Congestion aware: [MPTCP, MP-RDMA, Clove, FlowBender]

VL2 Performance Isolation
Why does VL2 need TCP for performance isolation?

Addressing and Routing: Name-Location Separation
VL2: switches run link-state routing and maintain only switch-level topology
- Allows use of low-cost switches
- Protects network from host-state churn
Servers use flat names
[Figure: servers x, y, z attached to ToR1 ... ToR4; packets carry the destination ToR ahead of the server name and payload (e.g., ToR3 | y | payload); the Directory Service answers lookups with x -> ToR2, y -> ToR3, z -> ToR3, later ToR4]

VL2 Agent in Action
[Figure: the VL2 agent double-encapsulates each packet: the inner packet (src AA, dst AA, payload) is wrapped in headers addressed to the destination ToR and to an intermediate (Int) switch, with H(ft) as the source IP; VLB and ECMP spread flows across the intermediate switches]
- Why use hash for Src IP?
- Why anycast & double encap?
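The steps on the last two slides can be sketched as follows: the agent looks up the destination's ToR in the directory, wraps the packet in two headers (outer to an anycast address shared by the intermediate switches, inner to the destination ToR), and places a hash of the flow's 5-tuple in the source IP field so that ECMP sees per-flow entropy. All names, the directory contents, the anycast label, and the SHA-256-based hash are hypothetical stand-ins for illustration, not the VL2 implementation.

```python
import hashlib
from dataclasses import dataclass
from typing import Dict, Sequence, Tuple

# Hypothetical directory: application address (AA) -> ToR locator, as in the slide's example.
DIRECTORY: Dict[str, str] = {"x": "ToR2", "y": "ToR3", "z": "ToR4"}
INTERMEDIATE_ANYCAST = "Int-anycast"  # hypothetical address shared by all intermediate switches


@dataclass(frozen=True)
class Packet:
    src_aa: str
    dst_aa: str
    payload: bytes


@dataclass(frozen=True)
class Encapsulated:
    outer_dst: str  # intermediate switch (VLB bounce point), reached via anycast + ECMP
    inner_dst: str  # locator of the destination's ToR
    src_ip: int     # H(ft): flow hash carried in the source IP field
    inner: Packet


def flow_hash(five_tuple: Tuple) -> int:
    """Deterministic 32-bit hash of the flow's 5-tuple (SHA-256 is a stand-in)."""
    digest = hashlib.sha256(repr(five_tuple).encode()).digest()
    return int.from_bytes(digest[:4], "big")


def encapsulate(pkt: Packet, five_tuple: Tuple, directory: Dict[str, str] = DIRECTORY) -> Encapsulated:
    """Directory lookup followed by double encapsulation."""
    return Encapsulated(INTERMEDIATE_ANYCAST, directory[pkt.dst_aa], flow_hash(five_tuple), pkt)


def ecmp_pick(src_ip: int, next_hops: Sequence[str]) -> str:
    """Switch-side ECMP: the same flow hash always selects the same next hop."""
    return next_hops[src_ip % len(next_hops)]


ft = ("10.0.0.1", "10.0.0.2", 6, 12345, 80)  # src, dst, protocol, src port, dst port
enc = encapsulate(Packet("x", "y", b"hello"), ft)
print(enc.outer_dst, enc.inner_dst, ecmp_pick(enc.src_ip, ["Int1", "Int2", "Int3"]))
```

Because switches forward only on the outer headers, they never need to learn server addresses, and because the hash is per flow, all packets of one flow take the same path while different flows spread across the intermediate switches.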

