KDD for Science Data Analysis Issues and Examples

Contents

Introduction
Data Considerations
Brief Case Studies
  Sky Survey Cataloging
  Finding Volcanoes on Venus
  Biosequence Databases
  Earth Geophysics
  Atmospheric Science
Issues and Challenges
Conclusion

Data Considerations

Image data
Time-series and sequence data
Numerical vs. categorical values
Structured and sparse data
Reliability of data

Brief Case Studies

Sky Survey Cataloging
Finding Volcanoes on Venus
Earth Geophysics
Atmospheric Science
Biosequence Databases

Sky Survey Cataloging

The survey consists of 3 terabytes of image data containing an estimated 2 billion sky objects.
The basic problem is to generate a survey catalog which records the attributes of each object along with its class: star or galaxy.
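
As a rough illustration of the cataloging problem stated above, the sketch below stores each sky object as a record of measured attributes plus a class label, then fits a one-level decision tree (a "stump") that picks the single attribute and threshold that best separate the two classes. The attribute names (area, ellipticity) and all sample values are hypothetical and were made up for this example. It is a toy under those assumptions, not the survey's actual pipeline.

```python
import math
from collections import Counter

# Hypothetical catalog records: (attributes, class). All values are made up.
CATALOG = [
    ({"area": 12.0, "ellipticity": 0.05}, "star"),
    ({"area": 15.0, "ellipticity": 0.30}, "star"),
    ({"area": 14.0, "ellipticity": 0.10}, "star"),
    ({"area": 40.0, "ellipticity": 0.25}, "galaxy"),
    ({"area": 55.0, "ellipticity": 0.08}, "galaxy"),
    ({"area": 47.0, "ellipticity": 0.35}, "galaxy"),
]


def entropy(labels):
    """Shannon entropy (in bits) of a non-empty list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def majority(labels):
    """Most common label in a non-empty list."""
    return Counter(labels).most_common(1)[0][0]


def fit_stump(records):
    """Find the attribute/threshold split with the highest information gain."""
    base = entropy([cls for _, cls in records])
    best = None
    for attr in records[0][0]:
        values = sorted({attrs[attr] for attrs, _ in records})
        # Candidate thresholds: midpoints between consecutive distinct values.
        for low, high in zip(values, values[1:]):
            thr = (low + high) / 2
            left = [c for a, c in records if a[attr] <= thr]
            right = [c for a, c in records if a[attr] > thr]
            remainder = (len(left) * entropy(left)
                         + len(right) * entropy(right)) / len(records)
            gain = base - remainder
            if best is None or gain > best["gain"]:
                best = {"attribute": attr, "threshold": thr, "gain": gain,
                        "below": majority(left), "above": majority(right)}
    return best


def predict(stump, attrs):
    """Classify one object with a fitted stump."""
    if attrs[stump["attribute"]] <= stump["threshold"]:
        return stump["below"]
    return stump["above"]


stump = fit_stump(CATALOG)
print(stump["attribute"], stump["threshold"], stump["gain"])
print(predict(stump, {"area": 50.0, "ellipticity": 0.20}))
```

On this made-up data the stump selects the attribute that cleanly separates the classes, which is the sense in which a tree can point to the important dimensions of a problem. A real catalog would have many more attributes and objects, and a full tree would split recursively.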

To achieve this, scientists developed the SKICAT system.

Reasons why SKICAT was successful

The astronomers solved the feature extraction problem.
Data mining methods contributed to solving difficult classification problems.
Manual approaches were simply not feasible. Astronomers needed an automated classifier to make the most out of the data.
Decision tree methods proved to be an effective tool for finding the important dimensions for this problem.

Finding Volcanoes on Venus

The data were collected by the Magellan spacecraft.
The first pass of Venus using the left-looking radar resulted in 30,000 images of 1000 x 1000 pixels each.
To help geologists analyze this data set, the JPL Adaptive Recognition Tool (JARtool) was developed.
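
A quick back-of-the-envelope calculation gives a sense of the scale of this data set. The image count and dimensions come from the text above. The storage figure assumes one byte per pixel, which is an assumption made here for illustration and is not stated in the source.

```python
# Figures from the text: 30,000 images of 1000 x 1000 pixels each.
NUM_IMAGES = 30_000
WIDTH = HEIGHT = 1_000

# ASSUMPTION (not from the source): each pixel is stored in one byte.
BYTES_PER_PIXEL = 1

pixels_per_image = WIDTH * HEIGHT
total_pixels = NUM_IMAGES * pixels_per_image
total_gigabytes = total_pixels * BYTES_PER_PIXEL / 1e9

print(f"{pixels_per_image:,} pixels per image")
print(f"{total_pixels:,} pixels in total")
print(f"about {total_gigabytes:.0f} GB at {BYTES_PER_PIXEL} byte(s) per pixel")
```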

Motivation for using data mining methods

Scientists did not know much about image processing or about the properties of SAR (synthetic aperture radar). Hence they could easily label images but not design recognizers.
There was little variation in the illumination and orientation of the objects of interest. Hence the mapping from pixel space to feature space can be performed automatically.
Geologists did not have any other easy means of finding the small volcanoes, hence they were motivated to cooperate by providing training data and other help.

Earth Geophysics

Given two images taken before and after an earthquake, repeatedly registering different local regions of the two images makes it possible to infer the direction and magnitude of the ground motion due to the earthquake.
An example of a geoscientific data mining system is Quakefinder, which automatically detects and measures tectonic activity in the Earth's crust by examining satellite data.

Atmospheric Science

The data mining tool used is called CONQUEST.
CONQUEST employed parallel testbeds to enable rapid extraction of spatio-temporal features for content-based access.
Among the goals of this tool is the development of learning algorithms which look for novel patterns, event clusters, etc.
Figure: Retrieved sea level pressure fields

Biosequence Databases

The largest DNA database is GENBANK, with about 400 million letters of DNA from a variety of organisms.
The pressing data mining tasks for biosequences include finding genes in the DNA sequences of various organisms. Some gene-finding programs, such as GRAIL, GeneID, GeneParser, and Genie, use neural nets and other AI or statistical methods.

Issues and Challenges

Feature extraction
Minority classes
High degree of confidence
Data mining task
Relevant domain knowledge
Scalable machines and algorithms

Conclusions

KDD applications in science may in general be easier than applications in business, finance, or other areas. This is due to the fact that science end users typically know the data in intimate detail.
