Yeast whole-genome analysis of conserved regulatory motifs

Chromatin state dynamics in nine human cell types elucidate regulators and disease-associated SNPs

Jason Ernst
Joint work with Pouya Kheradpour, Luke Ward, Brad Bernstein and Manolis Kellis

Goal: interpreting disease-associated variants using epigenomics

[Figure: epigenomics tracks and disease variants over the alleles CATGACTG / CATGCCTG]

  • GWAS implicate hundreds of non-coding loci with disease
  • Challenges towards interpreting disease variants:
      • Find true causative SNP among many in Linkage Disequilibrium
      • Determine type of function: especially outside protein-coding
      • Reveal relevant cell type of activity
      • Link to upstream regulators and downstream target genes
  • Epigenomics tools to address these challenges

From chromatin states to disease

  • Chromatin State Introduction
  • Chromatin State Dynamics across Cell Types
  • Reveal enhancer networks: TF → enhancer → target
  • Use these to study disease-associated variants

Challenge of data integration in many marks/cells

  • Construct antibodies, pull down chromatin, ChIP-seq tracks
  • Epigenomic information retains genome state in differentiation and development
  • Two types: DNA methyl., Histone marks
  • Histone tail modifications
  • Dozens of chromatin tracks
      • Understand their function
      • Reveal their combinations
      • Annotate systematically
  • Common chromatin states
  • DNA packaged into chromatin around histone proteins
  • Explicitly model combinations
  • Unsupervised approach, probabilistic model

From chromatin marks to chromatin states

[Figure labels: Promoter states, Transcribed states, Active Intergenic, Repressed]

  • Learn de novo significant combinations of chromatin marks
  • Reveal functional elements, even without looking at sequence
  • Use for genome annotation
  • Use for studying regulation dynamics in different cell types

Ernst and Kellis, Nat Biotech 2010

ENCODE: Study nine marks in nine human cell lines

9 human cell types x 9 marks: 81 Chromatin Tracks (2^81 combinations)

Cell types:
  • HUVEC: Umbilical vein endothelial
  • NHEK: Keratinocytes
  • GM12878: Lymphoblastoid
  • K562: Myelogenous leukemia
  • HepG2: Liver carcinoma
  • NHLF: Normal human lung fibroblast
  • HMEC: Mammary epithelial cell
  • HSMM: Skeletal muscle myoblasts
  • H1: Embryonic

Marks:
  • H3K4me1, H3K4me2, H3K4me3
  • H3K27ac, H3K9ac
  • H3K27me3, H4K20me1, H3K36me3
  • CTCF
  • +WCE, +RNA

Brad Bernstein Chromatin Group
Ernst et al, Nature 2011

Chromatin states dynamics across nine cell types

  • Single annotation track for each cell type
  • Summarize cell-type activity at a glance
  • Can study 9-cell activity pattern across

Introducing multi-cell activity profiles

[Figure legend: multi-cell activity profiles]
  • Profile types: Gene expression; Chromatin States; Active TF motif enrichment; TF regulator expression; Dip-aligned motif biases
  • Cell types: HUVEC, NHEK, GM12878, K562, HepG2, NHLF, HMEC, HSMM, H1
  • Keys: ON / OFF; Active enhancer / Repressed; Motif enrichment / Motif depletion; TF On / TF Off; Motif aligned / Flat profile

Linking Distal Regulatory Elements to Genes

Which gene(s) is this active enhancer in HMEC likely regulating?

[Figure: HMEC state track with H3K27ac signal at the enhancer, IRF6 expression and C1orf107 expression across cell types]

Compute correlations between gene expression levels and enhancer associated histone modification signals

Linking Distal Regulatory Elements to Genes

Which gene(s) is this active enhancer in HMEC likely regulating?

[Figure: HMEC state track with IRF6 expression, random gene expression and random H3K27ac signal across cell types]
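The correlation step on the linking slides can be illustrated with a short sketch. It assumes one enhancer H3K27ac value and one gene expression value per cell type. The numbers are hypothetical (they are not the values plotted on the slides), the function names `pearson` and `shuffled_correlations` are illustrative, and shuffling expression across cell types is a simplified stand-in for the randomized controls used in the talk, not the authors' procedure.

```python
import math
import random

CELL_TYPES = ["HUVEC", "NHEK", "GM12878", "K562", "HepG2",
              "NHLF", "HMEC", "HSMM", "H1"]


def pearson(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def shuffled_correlations(signal, expression, n_shuffles=1000, seed=0):
    """Correlations after randomly permuting expression across cell types."""
    rng = random.Random(seed)
    results = []
    for _ in range(n_shuffles):
        permuted = expression[:]
        rng.shuffle(permuted)
        results.append(pearson(signal, permuted))
    return results


# Hypothetical per-cell-type values, in CELL_TYPES order (illustration only).
enhancer_signal = [0.1, 1.2, -0.8, -1.0, -0.9, 0.3, 2.1, 0.2, -1.2]
gene_expression = [0.0, 1.0, -0.7, -1.1, -0.6, 0.1, 2.4, 0.4, -1.5]

real_r = pearson(enhancer_signal, gene_expression)
null_rs = shuffled_correlations(enhancer_signal, gene_expression)

# Fraction of shuffled correlations at least as large as the real one.
fraction_at_least_real = sum(r >= real_r for r in null_rs) / len(null_rs)
print(f"real r = {real_r:.2f}, shuffled >= real: {fraction_at_least_real:.3f}")
```

A candidate enhancer-gene pair whose real correlation is rarely matched by the shuffled data is the kind of pair the slides propose to link.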

  • Combine intensity signal from all marks: Train logistic regression classifier to discriminate real from random correlations, conditioned on state, TSS dist, cell type
  • Compare correlations between enhancer and gene expression between real and randomized data

Enhancer-gene links supported by eQTL-gene links

[Figure: eQTL study, 15kb, individuals]

Individual   Expression level of gene   Allele
Indiv. 1     -0.5                       A
Indiv. 2     -1.5                       A
Indiv. 3     -1.8                       A
Indiv. 4      3.1                       C
Indiv. 5      1.1                       A
Indiv. 6     -1.8                       A
Indiv. 7     -1.4                       A
Indiv. 8      3.2                       C
Indiv. 9      4.4                       C
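The eQTL idea in this figure can be sketched by grouping the nine expression values by allele. The values are read from the slide. Pairing the nine alleles with Indiv. 1 to 9 in order is an assumption about the original figure layout, and the helper name `mean_expression_by_allele` is illustrative. Real eQTL analyses use diploid genotypes and a statistical test, so this is only a minimal illustration of the expression-by-genotype comparison.

```python
from statistics import mean

# Values as they appear on the slide; the allele-to-individual pairing
# (same order, Indiv. 1 to 9) is an assumption.
expression = [-0.5, -1.5, -1.8, 3.1, 1.1, -1.8, -1.4, 3.2, 4.4]
alleles = ["A", "A", "A", "C", "A", "A", "A", "C", "C"]


def mean_expression_by_allele(expression, alleles):
    """Group expression values by allele and return the mean of each group."""
    groups = {}
    for value, allele in zip(expression, alleles):
        groups.setdefault(allele, []).append(value)
    return {allele: mean(values) for allele, values in groups.items()}


means = mean_expression_by_allele(expression, alleles)
difference = means["C"] - means["A"]
print(f"mean(A) = {means['A']:.2f}, mean(C) = {means['C']:.2f}, "
      f"difference = {difference:.2f}")
```

Under that pairing, the three individuals carrying C are the three with the highest expression, which is the pattern an eQTL captures.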

Validation rationale:
  • Expression Quantitative Trait Loci (eQTLs) provide independent SNP-to-gene links
  • Do they agree with activity-based links?

Example: Lymphoblastoid (GM) cells study
  • Expression/genotype across 60 individuals (Montgomery et al, Nature 2010)
  • 120 eQTLs are eligible for enhancer-gene linking based on our datasets
  • 51 actually linked (43%) using predictions
  • 4-fold enrichment (10% exp. by chance)

[Figure label: Sequence variant at distal position]

Independent validation of links. Relevance to disease datasets.
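The 43% and 4-fold figures follow from the counts on this slide, as the sketch below shows. The binomial tail at the end is an added illustration that assumes each eligible eQTL is linked independently with the 10% chance rate; the slide does not state how significance was assessed, and `binomial_tail` is an illustrative name.

```python
import math

eligible_eqtls = 120   # eQTLs eligible for enhancer-gene linking (from the slide)
linked_eqtls = 51      # eQTLs recovered by the predicted links (from the slide)
chance_rate = 0.10     # fraction expected by chance (from the slide)

observed_rate = linked_eqtls / eligible_eqtls
fold_enrichment = observed_rate / chance_rate


def binomial_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p); assumes independent trials."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i)
               for i in range(k, n + 1))


# Illustration only: the independence assumption is ours, not the slide's.
tail_probability = binomial_tail(linked_eqtls, eligible_eqtls, chance_rate)

print(f"observed rate: {observed_rate:.1%}")
print(f"fold enrichment: {fold_enrichment:.2f}")
```

The observed rate is 51/120 = 0.425 and the fold enrichment is 0.425/0.10 = 4.25, which the slide reports as 43% and 4-fold.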

Coordinated activity reveals activators/repressors

[Figure labels: Enhancer activity, Gene activity, Predicted regulators, Activity signatures for each TF]

  • Enhancer networks: Regulator → enhancer → target gene
  • Ex1: Oct4 predicted activator of embryonic stem (ES) cells
  • Ex2: Gfi1 repressor of K562/GM cells

Causal motifs supported by dips & enhancer assays

  • Dip evidence of TF binding (nucleosome displacement)
  • Enhancer activity halved by single-motif disruption
  • Motifs bound by TF, contribute to enhancers

Revisiting disease-associated variants

[Figure labels: (Ganesh et al, Nat Genet 2009), (Teslovich et al, Nature 2010), (Stahl et al, Nat Genet 2010), (Liu et al, Nat Genet 2010), (Han et al, Nat Genet 2009), (Kathiresan et al, 2008), (Kamatani et al, Nat Genet 2009), (Soranzo et al, Nat Genet 2009), (Houlston et al, Nat Genet 2008), (Newton-Cheh et al, Nat Genet 2009); rs9271100]

Disease-associated SNPs enriched for enhancers in relevant cell type

Ex1: Systemic lupus erythematosus SNP: Ets-1 motif

  • SNP in lymphoblastoid GM enhancer state
  • Disrupts Ets1 motif instance, predicted GM regulator
  • Model: Disease SNP abolishes GM enhancer
  • Ets-1 is a predicted activator of GM enhancers
  • [Figure labels: Enhancer activity, Activity signatures for each TF, Ets expression, Ets-1 motif enrichment in enhancers]
  • Model: Ets-1 disruption would abolish enhancer state

Ex2: Erythrocyte phenotype study SNP: Gfi-1 motif

  • K562: erythroleukaemia cell type
  • Disease SNP creates motif instance for Gfi-1 repressor
  • Gfi-1 predicted repressor for K562-specific enhancers
  • Creation of repressive motif abolishes K562 enhancer
  • Gfi-1 is a predicted repressor of non-K562 enhancers
  • [Figure labels: Enhancer activity, Activity signatures for each TF, Gfi expression, Gfi-1 motif depletion in enhancers]
  • Prediction: Gfi-1 large-scale repression of non-K562
  • Motif created → Gfi-1 recruited → enhancer repressed

SNPs from GWAS Enrich for Cell Type Specific Strong Enhancer Chromatin States in Biologically Relevant Cell Types

Title | Author/Journal | Total #SNPs | Fold | Cell Type | # SNPs in Strong enhancers | FDR
------|----------------|-------------|------|-----------|----------------------------|----
Multiple loci influence erythrocyte phenotypes in the CHARGE Consortium. | Ganesh et al, Nat Genet 2009 | 35 | 17 | K562 | 9 | 0.02
Biological, clinical and population relevance of 95 loci for blood lipids | Teslovich et al, Nature 2010 | 101 | 11 | HepG2 | 13 | 0.02
Genome-wide association study meta-analysis identifies seven new rheumatoid arthritis risk loci | Stahl et al, Nat Genet 2010 | 29 | 15 | GM12878 | 7 | 0.03
Genome-wide meta-analyses identify three loci associated with primary biliary cirrhosis | Liu et al, Nat Genet 2010 | 6 | 41 | GM12878 | 4 | 0.03
Genome-wide association study in a Chinese Han population identifies nine new susceptibility loci for systemic lupus erythematosus. | Han et al, Nat Genet 2009 | 18 | 21 | GM12878 | 6 | 0.03
Six new loci associated with blood low-density lipoprotein cholesterol, high-density lipoprotein cholesterol or triglycerides in humans. | Kathiresan et al, Nat Genet 2008 | 18 | 24 | HepG2 | 5 | 0.03
Genome-wide association study of hematological and biochemical traits in a Japanese population | Kamatani et al, Nat Genet 2009 | 39 | 12 | K562 | 7 | 0.03
A genome-wide meta-analysis identifies 22 loci associated with eight hematological parameters in the HaemGen consortium. | Soranzo et al, Nat Genet 2009 | 28 | 15 | K562 | 6 | 0.03
Meta-analysis of genome-wide association data identifies four new susceptibility loci for colorectal cancer. | Houlston et al, Nat Genet 2008 | 4 | 66 | HepG2 | 3 | 0.03
Genome-wide association study identifies eight loci associated with blood pressure. | Newton-Cheh et al, Nat Genet 2009 | 9 | 30 | K562 | 4 | 0.04

Chromatin state dynamics: Contributions summary

  • Chromatin states capture mark combinations
      • Reveal promoter/enhancer/insulator/transcribed regions

  • Chromatin states capture chromatin dynamics
      • Single annotation track for each cell type
      • One 15-state track per cell type instead of 2^9 combinations
  • Activity profiles capture correlated changes
      • Gene expression vs. chromatin: Enhancer → Gene links
      • Motifs vs. TF expr vs. chromatin: Activators/Repressors
  • Regulatory predictions validated: eQTLs/dips/lucif.
      • eQTLs: links. Dips: binding. Luciferase assays: motif role
  • Interpret disease-associated variants
      • Intergenic SNPs enriched for cell-type specific enhancers
      • Mechanistic predictions reveal potential drug targets

Ever-expanding dimensions of epigenomics

  • Additional dimensions: Environment, Genotype, Disease, Gender, Stage, Age
  • Thousands of whole-genome datasets
  • [Figure axes: Chromatin marks, Cell types]
  • Today: Cell-type and chromatin-mark dimensions
  • Next: Personal epigenomes: genotype/phenotype
  • Complete matrix of conditions, individuals, alleles

Collaborators and Acknowledgements

  • Broad Institute/MGH Pathology/HHMI: Tarjei Mikkelsen, Noam Shoresh, Charles B. Epstein, Xiaolan Zhang, Li Wang, Robyn Issner, Michael Coyne, Manching Ku, Timothy Durham, Bradley E. Bernstein
  • MIT compbio group: Pouya Kheradpour, Lucas Ward, Manolis Kellis
  • ENCODE consortium
  • Funding: NHGRI, NIH, NSF, HHMI, Sloan Foundation
