Artificial Intelligence Techniques in Bioinformatics - D'Trends

DNA Bioinformatics

Hwa A. Lim: CompBio, bioinformatique, bio-informatics (bio/informatics), bioinformatics

[Figure: Number of entries in PDB by year, 1985-2010; vertical axis from 0 to 50k]

DNA 3000 400.0

100 2.5

1300

DNA and RNA bases: adenine (A), guanine (G), cytosine (C), thymine (T), uracil (U)

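The 20 amino acids encoded by DNA (three-letter and one-letter codes):

Gly G    Ser S    Ala A    Thr T    Val V
Asn N    Ile I    Gln Q    Leu L    Tyr Y
Phe F    His H    Pro P    Asp D    Met M
Glu E    Trp W    Lys K    Cys C    Arg R

Outline: N-gram, CRF, N-gram / binary profile / N-nary profile, SVM, LSA

N-gram

Dong et al. N-gram Statistics and Linguistic Features Analysis of Whole Genome Protein Sequences. Journal of Harbin Institute of Technology. 2004.

Example: the sequence SVYDA contains the 3-grams SVY, VYD, YDA.

Zipf's law for N-grams: if $x_r$ is the frequency of the N-gram of rank $r$, then $x_r = C / r$ for a constant $C$, or equivalently $\log x_r = c - \log(r)$ with $c = \log C$.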
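The N-gram extraction and the rank-frequency relation above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function names `ngrams`, `rank_frequency`, and `zipf_points` are hypothetical, and ties in frequency are ranked in first-seen order.

```python
from collections import Counter
import math


def ngrams(sequence, n):
    """Return all overlapping n-grams of a sequence, left to right."""
    return [sequence[i:i + n] for i in range(len(sequence) - n + 1)]


def rank_frequency(sequences, n):
    """Count n-grams over all sequences and rank them by descending count.

    Returns a list of (rank, ngram, count) with rank starting at 1.
    """
    counts = Counter(g for s in sequences for g in ngrams(s, n))
    return [(rank, gram, count)
            for rank, (gram, count) in enumerate(counts.most_common(), start=1)]


def zipf_points(sequences, n):
    """Return (log r, log x_r) pairs; under Zipf's law they lie near a line."""
    return [(math.log(r), math.log(c)) for r, _, c in rank_frequency(sequences, n)]


print(ngrams("SVYDA", 3))
```

Plotting the output of `zipf_points` for a whole proteome would be one way to check how closely the N-gram counts follow the law.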

CRF

A R N D C Q E G H I L K M F P S T W Y V ...
0 1 0 0 1 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0

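CRF

Linear-chain CRF over the labels $y_{i-1}, y_i, y_{i+1}$:

$$p(Y \mid X) = \frac{1}{Z(X)} \exp\left( \sum_{i=1}^{n} \sum_{k} \lambda_k f_k(y_{i-1}, y_i, X, i) \right)$$

$$= \frac{1}{Z(X)} \exp\left( \sum_{i=1}^{n} \sum_{k} \left[ \lambda_k t_k(y_{i-1}, y_i, X, i) + \mu_k s_k(y_i, X, i) \right] \right)$$

where $X = (x_1, x_2, \ldots, x_{i-1}, x_i, x_{i+1}, \ldots, x_n)$ and

$$Z(X) = \sum_{Y} \exp\left( \sum_{i=1}^{n} \sum_{k} \lambda_k f_k(y_{i-1}, y_i, X, i) \right)$$

Transition features $t_k(y_{i-1}, y_i, X, i)$ and state features $s_k(y_i, X, i)$:

$$t_{y,y'}(y_{i-1}, y_i, X, i) = \begin{cases} 1 & \text{if } y_{i-1} = y \text{ and } y_i = y' \\ 0 & \text{otherwise} \end{cases}$$

$$s^{pro}_{y,aa}(y_i, x_k, i) = \begin{cases} \mathrm{scale}(\mathrm{PSSM}(x_k, aa)) & \text{if } y_i = y \\ 0 & \text{otherwise} \end{cases}$$

$$s^{ASA}_{y}(y_i, x_k, i) = \begin{cases} \mathrm{ASA}(x_k) & \text{if } y_i = y \\ 0 & \text{otherwise} \end{cases}$$

$$s^{con}_{y}(y_i, x_k, i) = \begin{cases} \mathrm{grade}(x_k) / 10 & \text{if } y_i = y \\ 0 & \text{otherwise} \end{cases}$$

CRF example: SMC1HD:SCC1-C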
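To make the CRF definition concrete, the sketch below evaluates $p(Y \mid X)$ by brute-force enumeration of all label sequences, which is only practical for very short chains. Everything named here is an assumption for illustration: the labels `"I"` and `"N"`, the per-residue keys `"asa"` and `"grade"`, the weights, and the simplification that each state feature reads the residue at position $i$ itself ($k = i$). The first position has no predecessor, so transition features are taken to be 0 there.

```python
import math
from itertools import product

LABELS = ("I", "N")  # hypothetical labels, e.g. interface / non-interface


def transition_feature(y, y_next):
    """Indicator t_{y,y'}: 1 if (y_{i-1}, y_i) == (y, y'), else 0."""
    def t(y_prev, y_cur, X, i):
        return 1.0 if (y_prev == y and y_cur == y_next) else 0.0
    return t


def state_feature(y, key, scale=1.0):
    """State feature: scale * X[i][key] if y_i == y, else 0."""
    def s(y_prev, y_cur, X, i):
        return scale * X[i][key] if y_cur == y else 0.0
    return s


def score(Y, X, weighted_features):
    """Sum over positions i and features k of weight_k * f_k(y_{i-1}, y_i, X, i)."""
    total = 0.0
    for i in range(len(X)):
        y_prev = Y[i - 1] if i > 0 else None  # no predecessor at i = 0
        for weight, feature in weighted_features:
            total += weight * feature(y_prev, Y[i], X, i)
    return total


def conditional_probability(Y, X, weighted_features, labels=LABELS):
    """p(Y | X) = exp(score(Y)) / Z(X), with Z(X) summed over all label sequences."""
    z = sum(math.exp(score(Yp, X, weighted_features))
            for Yp in product(labels, repeat=len(X)))
    return math.exp(score(tuple(Y), X, weighted_features)) / z


# Two residues with made-up accessibility and conservation grade values.
X = [{"asa": 0.8, "grade": 9}, {"asa": 0.1, "grade": 2}]
features = [
    (1.0, state_feature("I", "asa")),          # analogue of s^ASA
    (0.5, state_feature("I", "grade", 0.1)),   # analogue of s^con = grade / 10
    (0.3, transition_feature("I", "I")),       # analogue of t_{y,y'}
]
print(conditional_probability(("I", "N"), X, features))
```

Practical CRF implementations compute $Z(X)$ with the forward algorithm rather than by enumeration; the enumeration is used here only because it mirrors the formula directly.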

CRF example: Ribosomal subunit 30S

CRF example: Streptococcal pyrogenic enterotoxin C (SpeC)

SCOP hierarchy: family, superfamily, fold

N-grams, binary profiles, N-nary profiles

Binary profiles

Amino acid sequence:
QTSVSPSKVILPRGGSVLVTCSTSCDQPKLLGIETPLPKKELLLPGNN

Multiple sequence alignment:
EI.IH.P.A.I.....LR...P..I...RKTTF.L..V.N.E.VS-R.P.W..FL...D...EIN.L.................
.V.IH.TEAF.......Q.P..S..EDEN...L..NWM.D..S-S.H.W.LFK..DIG.R.L.FE..GTT

Frequency profile (PSI-BLAST), two example columns:
Column 1: A: 0.03 C: 0.002 D: 0.26 E: 0.06 F: 0.01 G: 0.01 H: 0.07 I: 0.01 K: 0.01 L: 0.05 M: 0.02 N: 0.01 P: 0.18 Q: 0.02 R: 0.11 S: 0.03 T: 0.03 V: 0.02 W: 0.02 Y: 0.03
Column 2: A: 0.06 C: 0.004 D: 0.04 E: 0.03 F: 0.03 G: 0.02 H: 0.02 I: 0.2 K: 0.03 L: 0.18 M: 0.01 N: 0.05 P: 0.02 Q: 0.02 R: 0.06 S: 0.002 T: 0.05 V: 0.17 W: 0.002 Y: 0.002

Frequency threshold: 0.17

Binary profile and resulting amino acid combination:
Column 1: D: 1, P: 1, all others 0; combination DP
Column 2: I: 1, L: 1, V: 1, all others 0; combination ILV
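The thresholding step of the binary profile example can be reproduced with the short sketch below. The function names and the `COLUMN_1` / `COLUMN_2` labels are hypothetical; the frequencies and the 0.17 threshold are the ones shown on the slide, and the comparison is assumed to be inclusive because V at 0.17 is marked 1.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Frequency profile columns copied from the slide.
COLUMN_1 = {"A": 0.03, "C": 0.002, "D": 0.26, "E": 0.06, "F": 0.01,
            "G": 0.01, "H": 0.07, "I": 0.01, "K": 0.01, "L": 0.05,
            "M": 0.02, "N": 0.01, "P": 0.18, "Q": 0.02, "R": 0.11,
            "S": 0.03, "T": 0.03, "V": 0.02, "W": 0.02, "Y": 0.03}
COLUMN_2 = {"A": 0.06, "C": 0.004, "D": 0.04, "E": 0.03, "F": 0.03,
            "G": 0.02, "H": 0.02, "I": 0.2, "K": 0.03, "L": 0.18,
            "M": 0.01, "N": 0.05, "P": 0.02, "Q": 0.02, "R": 0.06,
            "S": 0.002, "T": 0.05, "V": 0.17, "W": 0.002, "Y": 0.002}


def binary_profile(freqs, threshold=0.17):
    """Map each amino acid to 1 if its frequency reaches the threshold, else 0."""
    return {aa: 1 if freqs.get(aa, 0.0) >= threshold else 0 for aa in AMINO_ACIDS}


def combination(binary):
    """Join the amino acids marked 1 into the 'amino acid combination' string."""
    return "".join(aa for aa in AMINO_ACIDS if binary[aa] == 1)


print(combination(binary_profile(COLUMN_1)))
print(combination(binary_profile(COLUMN_2)))
```

Each alignment column thus becomes one "word" such as DP or ILV, which is what lets text-style methods be applied to the profile.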

N-nary profiles

Amino acid sequence:
QTSVSPSKVILPRGGSVLVTCSTSCDQPKLLGIETPLPKKELLLPGNN

Multiple sequence alignment:
EI.IH.P.A.I.....LR...P..I...RKTTF.L..V.N.E.VS-R.P.W..FL...D...EIN.L.................
.V.IH.TEAF.......Q.P..S..EDEN...L..NWM.D..S-S.H.W.LFK..DIG.R.L.FE..GTT

Protein sequence frequency profiles (PSI-BLAST), two example columns:
Column 1: A: 0.03 C: 0.002 D: 0.26 E: 0.06 F: 0.01 G: 0.01 H: 0.07 I: 0.01 K: 0.01 L: 0.05 M: 0.02 N: 0.01 P: 0.18 Q: 0.02 R: 0.11 S: 0.03 T: 0.03 V: 0.02 W: 0.02 Y: 0.03
Column 2: A: 0.06 C: 0.004 D: 0.04 E: 0.03 F: 0.03 G: 0.02 H: 0.02 I: 0.2 K: 0.03 L: 0.18 M: 0.01 N: 0.05 P: 0.02 Q: 0.02 R: 0.06 S: 0.002 T: 0.05 V: 0.17 W: 0.002 Y: 0.002

N = 10

N-nary profiles:
Column 1: D: 2, P: 1, R: 1, all others 0
Column 2: I: 2, L: 1, V: 1, all others 0
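The slide does not state how a frequency is mapped to its N-nary value. The sketch below assumes the rule $\lfloor N \cdot f \rfloor$, which reproduces the values shown for N = 10 (D: 2, P: 1, R: 1 and I: 2, L: 1, V: 1) but may not be the authors' exact discretization. The function name and the column labels are hypothetical.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Frequency profile columns copied from the slide.
COLUMN_1 = {"A": 0.03, "C": 0.002, "D": 0.26, "E": 0.06, "F": 0.01,
            "G": 0.01, "H": 0.07, "I": 0.01, "K": 0.01, "L": 0.05,
            "M": 0.02, "N": 0.01, "P": 0.18, "Q": 0.02, "R": 0.11,
            "S": 0.03, "T": 0.03, "V": 0.02, "W": 0.02, "Y": 0.03}
COLUMN_2 = {"A": 0.06, "C": 0.004, "D": 0.04, "E": 0.03, "F": 0.03,
            "G": 0.02, "H": 0.02, "I": 0.2, "K": 0.03, "L": 0.18,
            "M": 0.01, "N": 0.05, "P": 0.02, "Q": 0.02, "R": 0.06,
            "S": 0.002, "T": 0.05, "V": 0.17, "W": 0.002, "Y": 0.002}


def n_nary_profile(freqs, n=10):
    """Discretize each frequency to an integer level, assumed to be floor(n * f).

    The small epsilon guards against floating-point products such as
    0.2 * 10 landing just below an integer.
    """
    return {aa: int(freqs.get(aa, 0.0) * n + 1e-9) for aa in AMINO_ACIDS}


def nonzero(profile):
    """Keep only the amino acids with a non-zero level."""
    return {aa: level for aa, level in profile.items() if level > 0}


print(nonzero(n_nary_profile(COLUMN_1)))
print(nonzero(n_nary_profile(COLUMN_2)))
```

Compared with the binary profile, the N-nary form keeps a coarse measure of how frequent each amino acid is, not just whether it passes a threshold.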

$\chi^2$ statistic of feature $t$ for class $c$:

$$\chi^2(t, c) = \frac{N (AD - CB)^2}{(A + C)(B + D)(A + B)(C + D)}$$

$$\chi^2_{avg}(t) = \sum_{i=1}^{m} \Pr(c_i)\, \chi^2(t, c_i)$$
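A small sketch of the two $\chi^2$ formulas follows. The slide does not define A, B, C, and D; the code assumes the usual 2x2 contingency reading (A: samples of class c containing t, B: samples of other classes containing t, C: samples of class c without t, D: samples of other classes without t, N = A + B + C + D). The function names and the example counts are hypothetical.

```python
def chi_square(a, b, c, d):
    """chi^2(t, c) = N (AD - CB)^2 / ((A+C)(B+D)(A+B)(C+D)).

    Returns 0.0 when a marginal total is zero, since the statistic is
    undefined there and the feature carries no evidence either way.
    """
    n = a + b + c + d
    denominator = (a + c) * (b + d) * (a + b) * (c + d)
    if denominator == 0:
        return 0.0
    return n * (a * d - c * b) ** 2 / denominator


def chi_square_avg(tables, class_priors):
    """chi^2_avg(t) = sum_i Pr(c_i) * chi^2(t, c_i).

    tables[i] is the (A, B, C, D) tuple of feature t for class c_i.
    """
    return sum(p * chi_square(*table) for p, table in zip(class_priors, tables))


# Feature perfectly tied to the first class, uninformative for the second.
print(chi_square(10, 0, 0, 10))
print(chi_square_avg([(10, 0, 0, 10), (5, 5, 5, 5)], [0.5, 0.5]))
```

Features can then be ranked by `chi_square_avg` and the top-scoring ones kept, which is the usual way this statistic is used for feature selection.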

LSA: singular value decomposition of the matrix $W$, $W \approx U S V^{T}$

ROC50 (cont.)

1 EMBL http://www.embl-heidelberg.de
2 GenBank http://www.ncbi.nlm.nih.gov/Web/Genbank/index.html
3 DDBJ http://www.ddbj.nig.ac.jp/
GDB http://www.gdb.org/
Ensembl http://www.ensembl.org/
MGD http://www.informatics.jax.org/
SGD http://genome-www.stanford.edu/Saccharomyces/
dbEST http://www.ncbi.nlm.nih.gov/dbEST/
dbSTS http://www.ncbi.nlm.nih.gov/dbSTS/
UniGene http://www.ncbi.nlm.nih.gov/UniGene/

PIR http://pir.georgetown.edu/
SWISS-PROT http://www.expasy.ch/sprot/sprot-top.html
TrEMBL http://www.ebi.ac.uk/trembl/
UniProt (includes PIR, SWISS-PROT, TrEMBL) http://www.uniprot.org/
PDB http://www.rcsb.org/pdb/home/home.do
MMDB http://130.14.29.110/Structure/MMDB/mmdb.shtml

dbSNP http://www3.ncbi.nlm.nih.gov/SNP/
SCOP http://scop.mrc-lmb.cam.ac.uk/scop/
DSSP http://www.sander.embl-heidelberg.de/dssp/
HSSP http://www.sander.embl-heidelberg.de/hssp/
OMIM http://www.ncbi.nlm.nih.gov:80/entrez/query.fcgi?db=OMIM
PRINTS http://www.bioinf.man.ac.uk/dbbrowser/PRINTS/
EPD http://www.epd.isb-sib.ch/
TRRD http://wwwmgs.bionet.nsc.ru/mgs/gnw/trrd/
TRANSFAC http://transfac.gbf.de/
GO http://www.geneontology.org/
PubMed http://www.ncbi.nlm.nih.gov/
BODYMAP http://bodymap.ims.u-tokyo.ac.jp/
PROSITE http://www.expasy.ch/prosite/
DBCat http://www.infobiogen.fr/services/dbcat/

EMBNet, APBioNet http://www.cbi.pku.edu.cn/chinese/mirrors.html
The Canadian Bioinformatics Resource http://www.cbr.nrc.ca/
Human Genome Working Draft http://genome.ucsc.edu/
TIGR (The Institute for Genomics Research) http://www.tigr.org/
Celera http://www.celera.com/

(Model) organism specific information:
Yeast: http://genome-www.stanford.edu/Saccharomyces/
Arabidopsis: http://www.tair.org/
Mouse: http://www.jax.org/
Fruitfly: http://www.fruitfly.org/
Nematode: http://www.wormbase.org/

Nucleic Acids Research Database Issue http://nar.oupjournals.org/ (first issue every year)

Database interfaces: Genbank/EMBL/DDBJ, Medline, SwissProt, PDB
Sequence alignment: BLAST, FASTA
Multiple sequence alignment: Clustal, MultAlin, DiAlign, PSI-Blast
Gene finding: Genscan, GenomeScan, GeneMark, GRAIL
Protein domain analysis and identification: pfam, BLOCKS, ProDom
Pattern identification/characterization: Gibbs Sampler, AlignACE, MEME
Protein folding prediction: PredictProtein, SwissModeler

Dong Qiwen, Wang Xiaolong, Lin Lei. N-gram Statistics and Linguistic Features Analysis of Whole Genome Protein Sequences. Journal of Harbin Institute of Technology. 2004.
Li MH, Lin L, Wang XL, Liu T. Protein-protein interaction site prediction based on conditional random fields. Bioinformatics (2007).
Dong QW, Wang XL, Lin L. Application of Latent Semantic Analysis to Protein Remote Homology Detection. Bioinformatics 22, 285-290 (2006).
Liu B, Lin L, Wang XL, Dong QW, Wang X. A discriminative method for protein remote homology detection based on N-nary profiles. BIRD08 (2008).
