ISMB 2013 report - acgt.cs.tau.ac.il

Single Sample Expression-Anchored Mechanisms Predict Survival in Head and Neck Cancer
Yang et al. 2012
Presented by Yves A. Lussier, MD, PhD, The University of Chicago

Head and Neck Squamous Cell Cancer (HNSCC)
A group of similar cancers that spread to lymph nodes and are strongly related to environmental risk factors: alcohol, smoking, UV light, viruses. About 40,000 new cases per year in the US.

Recurrence is hard to predict, and early detection is crucial.

Computational goals
Use gene expression data for patient classification and recurrence-free survival. Improve the interpretability of the classifiers by integrating pathway information. Keep classification performance high. Standardize across multiple datasets.

Sample analysis
Input: the expression profile of a sample, i.e. a vector of real values for each patient.

Step 1: rank the genes within the sample.
Step 2: calculate a score (weight) for each gene from the rank of gene g in sample s and the total number of ranked genes (the formula is not preserved in the transcript).

Gene set features
Gene sets: GO terms or KEGG pathways.
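Steps 1 and 2 above can be sketched as follows. The slide's weighting formula did not survive in this transcript, so the exponential rank weighting used here is an illustrative assumption, as are the rank direction (1 = lowest expression) and the helper name `gene_weights`. The gene names and values are made up.

```python
import math

def gene_weights(expression):
    """Map {gene: expression} for ONE sample to {gene: rank-based weight}.

    Assumptions (not from the slides): ranks run from 1 (lowest expression)
    to |G| (highest expression), and weight = rank * exp(-rank / |G|).
    Ties are broken by gene name so the result is deterministic.
    """
    genes = sorted(expression, key=lambda g: (expression[g], g))
    total = len(genes)  # total number of ranked genes, |G|
    return {g: r * math.exp(-r / total) for r, g in enumerate(genes, start=1)}

# Hypothetical expression profile of a single sample.
sample = {"TP53": 8.1, "EGFR": 11.4, "MYC": 9.7, "GAPDH": 13.2}
weights = gene_weights(sample)
```

Because the weights depend only on ranks within one sample, they can be computed without reference to any other sample, which is what makes the score usable on a single patient.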

Define the normalized centroid (NC) of a gene set as the average weight of its genes. Calculate the NC score for each gene set and for its complement. The score of a gene set is the difference between the two, which yields the FAIME profile of each sample.

FAIME summary

Differential pathways
Differential analysis for each pathway/GO feature.
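The gene set score defined above (NC of the gene set minus NC of its complement) can be sketched as below. This is a minimal illustration, not the authors' implementation: the function names, the toy weights and the toy gene sets are all hypothetical, and gene set members that were not measured in the sample are simply ignored, which is an assumption.

```python
def normalized_centroid(weights, genes):
    """NC score: the average weight of the given genes."""
    genes = list(genes)
    return sum(weights[g] for g in genes) / len(genes)

def gene_set_score(weights, gene_set):
    """NC of the gene set minus NC of its complement, within one sample."""
    inside = [g for g in weights if g in gene_set]
    outside = [g for g in weights if g not in gene_set]
    if not inside or not outside:
        raise ValueError("gene set and its complement must both be non-empty")
    return normalized_centroid(weights, inside) - normalized_centroid(weights, outside)

def faime_profile(weights, gene_sets):
    """One score per gene set: the pathway-level profile of a single sample."""
    return {name: gene_set_score(weights, members)
            for name, members in gene_sets.items()}

# Hypothetical per-gene weights for one sample and two toy gene sets.
weights = {"A": 0.2, "B": 0.4, "C": 1.0, "D": 1.4}
gene_sets = {"set_high": {"C", "D"}, "set_low": {"A", "B"}}
profile = faime_profile(weights, gene_sets)
```

A positive score means the genes of the set tend to carry larger weights than the rest of the genes in that sample.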

Healthy FAIME scores versus sick FAIME scores: get empirical p-values by shuffling the labels (1000 repeats).

Analysis overview
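The label-shuffling step can be sketched for a single pathway as follows. The slides do not say which test statistic was used, so the absolute difference in mean score is an assumption, as are the function name, the add-one correction and the toy scores.

```python
import random

def permutation_p_value(healthy, sick, n_perm=1000, seed=0):
    """Empirical two-sided p-value for one pathway's scores.

    Statistic (assumed): absolute difference between the group means.
    Labels are shuffled n_perm times; the add-one correction keeps the
    estimate strictly above zero.
    """
    rng = random.Random(seed)
    observed = abs(sum(sick) / len(sick) - sum(healthy) / len(healthy))
    pooled = list(healthy) + list(sick)
    n_healthy = len(healthy)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # reassign the healthy/sick labels at random
        perm_h, perm_s = pooled[:n_healthy], pooled[n_healthy:]
        stat = abs(sum(perm_s) / len(perm_s) - sum(perm_h) / len(perm_h))
        if stat >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)

# Hypothetical pathway scores for six controls and six tumors.
healthy_scores = [0.10, 0.20, 0.15, 0.12, 0.18, 0.11]
sick_scores = [0.90, 1.00, 0.95, 0.85, 1.10, 0.92]
p_value = permutation_p_value(healthy_scores, sick_scores)
```

With 1000 repeats the smallest attainable p-value is 1/1001, so correcting for many pathways needs either more repeats or a pooled null distribution.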

Data used
8 gene expression (GE) HNSCC datasets: 3 for training and 5 for validation.
Controls can be samples from independent healthy individuals, paired samples from a distant uninvolved site, or paired samples from the margins of the tumor.

Preprocessing
Download the raw data (CEL files) from ArrayExpress. Use MAS5 to get the expression matrix. Map probes to genes, keeping for each gene the probe with the largest interquartile range (IQR). Remove genes with negative average expression across samples. (Affymetrix's MAS5 implementation should produce non-negative values.)
Comment: the preprocessing is not sample based!
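The probe-to-gene step can be sketched as below, interpreting the range as the interquartile range (IQR), which is an assumption. The probe IDs, gene names and values are made up, and `collapse_probes` is a hypothetical helper name.

```python
import statistics

def iqr(values):
    """Interquartile range using the statistics module's default quartiles."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    return q3 - q1

def collapse_probes(probe_values, probe_to_gene):
    """Keep, for each gene, the probe whose values across samples have the largest IQR."""
    best = {}  # gene -> (probe, iqr of that probe)
    for probe, values in probe_values.items():
        gene = probe_to_gene[probe]
        spread = iqr(values)
        if gene not in best or spread > best[gene][1]:
            best[gene] = (probe, spread)
    return {gene: probe_values[probe] for gene, (probe, _) in best.items()}

# Hypothetical expression values of three probes across five samples.
probe_values = {
    "p1": [1, 2, 3, 4, 5],
    "p2": [1, 5, 10, 15, 20],
    "p3": [2, 2, 3, 3, 4],
}
probe_to_gene = {"p1": "G1", "p2": "G1", "p3": "G2"}
gene_matrix = collapse_probes(probe_values, probe_to_gene)
```

The IQR is computed across samples, which is why this step cannot be applied to one sample in isolation.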

Stability of FAIME scores
Dataset A: 22 HNSCC samples and 22 independent controls. Boxplot for each sample; the values are pathway scores.

Reproducibility tests
Compared methods:
1. FAIME, with pathway p-values calculated using Z-scores
2. Hypergeometric enrichment analysis of the differential genes (from the original papers)
3. GSEA (actually, they used GSA)
4. CORG (Lee et al. 2008; Ideker lab)
Test the overlap after running the methods on datasets A-C.
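Of the compared methods, the hypergeometric enrichment test is the easiest to illustrate. The sketch below computes the standard upper-tail probability; the function name and the example counts are hypothetical, and nothing here reflects the specific settings used in the study.

```python
from math import comb

def hypergeom_enrichment_p(n_universe, n_pathway, n_diff, n_overlap):
    """P(X >= n_overlap) when n_diff genes are drawn without replacement
    from n_universe genes, of which n_pathway belong to the pathway."""
    upper = min(n_pathway, n_diff)
    total = comb(n_universe, n_diff)
    tail = sum(
        comb(n_pathway, k) * comb(n_universe - n_pathway, n_diff - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total

# Toy example: 10 genes in total, a pathway of 4 genes, 3 differential genes,
# all 3 of which fall inside the pathway.
p_enrich = hypergeom_enrichment_p(10, 4, 3, 3)
```

Unlike the single-sample scores, this test needs a list of differential genes, so it depends on a case-control comparison and a significance cutoff.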

Reproducibility across methods
Goal: test whether FAIME's significant pathways cover those of the standard analysis.
Idea: use the standard analysis (GSEA or HG) as the gold standard. Given FAIME's pathways, calculate precision-recall curves based on pathway sizes: for a threshold k, calculate precision and recall considering only pathways with more than k genes.
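One reading of the size-thresholded precision and recall is sketched below: both the predicted and the gold-standard pathway lists are restricted to pathways with more than k genes before comparing them. That restriction of both lists is an assumption, and the pathway names, sizes and thresholds are made up.

```python
def precision_recall_at_size(predicted, gold, pathway_sizes, k):
    """Precision and recall over pathways with more than k genes.

    Returns NaN for a quantity whose denominator is empty.
    """
    pred_k = {p for p in predicted if pathway_sizes[p] > k}
    gold_k = {p for p in gold if pathway_sizes[p] > k}
    true_pos = len(pred_k & gold_k)
    precision = true_pos / len(pred_k) if pred_k else float("nan")
    recall = true_pos / len(gold_k) if gold_k else float("nan")
    return precision, recall

# Hypothetical pathway sizes and significant-pathway lists.
pathway_sizes = {"P1": 5, "P2": 20, "P3": 50, "P4": 120}
faime_pathways = {"P1", "P2", "P3"}
gold_pathways = {"P2", "P3", "P4"}

# One (precision, recall) point per size threshold gives the curve.
curve = [precision_recall_at_size(faime_pathways, gold_pathways, pathway_sizes, k)
         for k in (0, 10, 40)]
```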

Validation on independent datasets
Use the pathways and GO terms that were stable in datasets A-C (57 features) and test their ability to classify new datasets: D (n=35) and E (n=91).
Classification: simply cluster the samples into two groups (next slides), using average-linkage hierarchical clustering.
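Clustering the samples into two groups with average linkage can be sketched with a naive agglomerative loop. The Euclidean distance is an assumption (the slides do not name the distance), the sample names and two-feature vectors are made up, and a real analysis would use a library routine rather than this quadratic-per-merge toy.

```python
import math
from itertools import combinations

def average_linkage_two_clusters(samples):
    """samples: {name: feature vector}. Merge clusters until two remain."""
    clusters = [{name} for name in samples]

    def linkage(c1, c2):
        # Average pairwise Euclidean distance between the two clusters.
        total = sum(math.dist(samples[a], samples[b]) for a in c1 for b in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 2:
        # Find the closest pair of clusters and merge it.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]),
        )
        merged = clusters[i] | clusters[j]
        clusters = [c for idx, c in enumerate(clusters) if idx not in (i, j)]
        clusters.append(merged)
    return clusters

# Hypothetical pathway-score vectors for two tumors and two controls.
samples = {
    "tumor1": (1.0, 0.9),
    "tumor2": (0.9, 1.1),
    "normal1": (-1.0, -0.8),
    "normal2": (-1.2, -1.0),
}
groups = average_linkage_two_clusters(samples)
```

The clustering is unsupervised, so classification errors are counted afterwards by comparing each cluster with the known tumor and control labels.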

Clustering figures: one with no errors, one with 3 errors, and an additional dataset (supplement) with 4 errors. Feature groups shown: extracellular events; nuclear events and cell cycle; metabolism.

Survival analysis
For datasets E and F, follow-up data are available for the cancer patients. Cluster the HNSCC samples into two groups using the CLARA algorithm, then compare the sample clusters based on the survival data (recurrence-free survival).
An additional dataset is analyzed in the supplement.
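Comparing two sample clusters on recurrence-free survival is commonly done with a log-rank test. The slides do not say which test was used, so the sketch below is an assumption; the function name and the follow-up data are made up.

```python
import math

def logrank_p_value(group_a, group_b):
    """Two-group log-rank test.

    Each group is a list of (time, event) pairs, with event = 1 for an
    observed recurrence and 0 for a censored follow-up.
    Returns the p-value from a chi-square distribution with 1 degree of freedom.
    """
    data = [(t, e, 0) for t, e in group_a] + [(t, e, 1) for t, e in group_b]
    event_times = sorted({t for t, e, _ in data if e})
    obs_minus_exp = 0.0
    variance = 0.0
    for t in event_times:
        n = sum(1 for tt, _, _ in data if tt >= t)               # at risk, both groups
        n_a = sum(1 for tt, _, g in data if tt >= t and g == 0)  # at risk, group A
        d = sum(1 for tt, e, _ in data if tt == t and e)         # events at time t
        d_a = sum(1 for tt, e, g in data if tt == t and e and g == 0)
        obs_minus_exp += d_a - d * n_a / n
        if n > 1:
            variance += d * (n_a / n) * (1 - n_a / n) * (n - d) / (n - 1)
    if variance == 0:
        return 1.0
    chi2 = obs_minus_exp ** 2 / variance
    # Survival function of a chi-square variable with 1 degree of freedom.
    return math.erfc(math.sqrt(chi2 / 2))

# Hypothetical follow-up: cluster A recurs early, cluster B is mostly censored.
cluster_a = [(1, 1), (2, 1), (3, 1), (4, 1), (5, 1)]
cluster_b = [(10, 0), (11, 0), (12, 1), (13, 0), (14, 0)]
p_logrank = logrank_p_value(cluster_a, cluster_b)
```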

Survival analysis: comments
With the same flow using GSEA or Mean-G, the survival plots are not significant (p > 0.07). In the publication of dataset E, another HNSCC-specific clinical test (HPV) did not produce a significant separation.

Prostate cancer (Chen et al. 2013, BMC Medical Genomics)
Analyzed three case-control datasets. Validation: survival analysis on an independent dataset.

FAIME scores
The FAIME scores are re-scaled (not justified); the re-scaling formula is not preserved in the transcript.

Tested gene sets
Gene Ontology: ~4000 terms.
Cancer modules (CM; Segal et al. 2004): ~2000 samples covering 22 tumor types, clustered into 454 modules.

Reproducibility (figure labels: GO, FAIME, CM)

Survival analysis
Recurrence is defined by the presence of PSA (a prostate cancer blood biomarker), which suggests a bad prognosis.

Relevant to us?
Ranking-based analysis can help in integrating different datasets. Pathway-level scores provide a standardized set of features across datasets. The FAIME score can be improved. Using case-control studies for survival analysis on independent data is a strong validation scheme.
