Data Stewardship Committee & Citizen Science


Improving Information Quality for Earth Science Data and Products: An Overview

H. K. (Rama) Ramapriyan, Science Systems and Applications, Inc. & NASA Goddard Space Flight Center
David Moroni, Jet Propulsion Laboratory, California Institute of Technology
Ge Peng, North Carolina State University
December 14, 2015

H. K. Ramapriyan's work was supported by NASA under contract NNG15HQ01C. David Moroni's work is supported by a NASA contract with the Jet Propulsion Laboratory, California Institute of Technology, Pasadena, CA. Ge Peng is supported by NOAA under Cooperative Agreement NA14NES432003. Government sponsorship acknowledged.

Paper #IN14A-01, presented at the AGU Fall Meeting, San Francisco

ESIP Information Quality Cluster: Objectives
- Bring together people from various disciplines to assess aspects of the quality of Earth science data
- Establish and publish a baseline of standards and best practices for data quality for adoption by inter-agency and international data providers
- Become an authoritative and responsive resource of information and guidance to data providers on how best to implement certain data quality standards and best practices for their datasets
- Build a framework for consistent capture, harmonization, and presentation of data quality for the purposes of climate change studies and Earth science and applications
(Objectives evolve with participant inputs.)

Information Quality
- Scientific quality: accuracy, precision, uncertainty, validity, and suitability for use (fitness for purpose) in various applications
- Product quality: how well the scientific quality is assessed and documented (completeness of metadata and documentation, provenance and context, etc.)
- Stewardship quality: how well data are being managed and preserved by an archive or repository; how easy it is for users to find, get, understand, trust, and use data; whether the archive has people who understand the data available to help users
Information Quality is a combination of all of the above.

Background
- QA4EO
- ISO 19157:2013 Standard, Geographic information -- Data quality
- NOAA Climate Data Record (CDR) Maturity Matrix

- NOAA Data Stewardship Maturity Matrix
- NCAR Community Contribution Pages
- NASA Making Earth System Data Records (ESDRs) for Use in Research Environments (MEaSUREs) Product Quality Checklists
- NASA Earth Science Data System Working Groups (ESDSWG) Data Quality Working Group
Much related work has occurred in recent years.

QA4EO
- Established and endorsed by the Committee on Earth Observation Satellites (CEOS) in response to a Group on Earth Observations (GEO) Task DA-06-02 (now Task DA-09-01a)
- Four international workshops: 2007, 2008, 2009, and 2011
- Key principles:
  - In order to achieve the vision of GEOSS, Quality Indicators (QIs) should be ascribed to data and products at each stage of the data processing chain, from collection and processing to delivery.
  - A QI should provide sufficient information to allow all users to readily evaluate a product's suitability for their particular application, i.e., its fitness for purpose.

  - To ensure that this process is internationally harmonized and consistent, the QI needs to be based on a documented and quantifiable assessment of evidence demonstrating the level of traceability to internationally agreed (where possible SI) reference standards.
- Framework and 10 key guidelines established (e.g., establishing a Quality Indicator, establishing measurement equivalence, expressing uncertainty)
- A few case studies are available that illustrate QA4EO-compliant methodologies (e.g., NOAA Maturity Matrix for CDRs; WELD: Web-Enabled Landsat Data, a NASA-funded MEaSUREs project; ESA Sentinel-2 Radiometric Uncertainty Tool)

ISO 19157:2013 - Geographic information -- Data quality*
- Establishes principles for describing the quality of geographic data
- Defines components for describing data quality
- Specifies components and content structure of a register for data quality measures
- Describes general procedures for evaluating the quality of geographic data
- Establishes principles for reporting data quality

- Defines a set of data quality measures for use in evaluating and reporting data quality
- Applicable to data producers providing quality information to describe and assess how well a dataset conforms to its product specification
- Applicable to data users attempting to determine whether or not specific geographic data are of sufficient quality for their particular application
- Examples of DQ elements: Completeness, Thematic Accuracy, Logical Consistency, Temporal Quality, Positional Accuracy
* From:

CDR Maturity Matrix
- The NOAA NCEI Climate Data Record (CDR) Maturity Matrix assesses the readiness of a product as a NOAA satellite CDR
- Bates, J. J. and Privette, J. L., A Maturity Model for Assessing the Completeness of Climate Data Records, Eos, Vol. 93, No. 44, 30 October 2012
- Assesses maturity in 6 categories (software readiness, metadata, documentation, product validation, public access, utility) at 6 levels
- Provides consistent guidance to data producers for improved data quality and long-term preservation
- EUMETSAT's CORE-CLIMAX Matrix is based on the CDR Maturity Matrix and contains guidance on uncertainty measures: http:// Matrix_Template.xlsx
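ISO 19157 lists Completeness among its data quality elements, covering both commission (excess data present) and omission (data absent). As a hedged illustration only, the Python sketch below counts both for a set of granule identifiers. It assumes each rate is the count divided by the number of items the product specification expects; the class and function names are hypothetical, and the normative measure definitions belong to the standard's register of data quality measures, not to this sketch.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class CompletenessResult:
    """Omission and commission counts and rates for one dataset (illustrative)."""
    missing: int            # omission: expected items that were not delivered
    excess: int             # commission: delivered items the specification does not call for
    omission_rate: float    # missing / number of expected items (assumed definition)
    commission_rate: float  # excess / number of expected items (assumed definition)


def assess_completeness(expected_ids: Iterable[str],
                        delivered_ids: Iterable[str]) -> CompletenessResult:
    """Compare delivered item identifiers with those the product specification expects."""
    expected = set(expected_ids)
    delivered = set(delivered_ids)
    if not expected:
        raise ValueError("the product specification lists no expected items")
    missing = len(expected - delivered)
    excess = len(delivered - expected)
    return CompletenessResult(
        missing=missing,
        excess=excess,
        omission_rate=missing / len(expected),
        commission_rate=excess / len(expected),
    )


# Example: a daily product expected for 1-10 December 2015 (hypothetical identifiers).
expected = [f"2015-12-{day:02d}" for day in range(1, 11)]
delivered = expected[:8] + ["2015-11-30"]   # two days missing, one unexpected file
result = assess_completeness(expected, delivered)
print(result.missing, result.excess, result.omission_rate, result.commission_rate)
# prints: 2 1 0.2 0.1
```

Working on sets of identifiers keeps the check independent of file format; a real assessment would also need to decide what counts as an "item" (granule, record, feature) under the product specification.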

Data Stewardship Maturity Matrix
- The NOAA NCEI/CICS-NC Scientific Data Stewardship Maturity Matrix (SMM) provides a unified framework for assessing the maturity of measurable stewardship practices applied to individual, publicly available digital Earth science datasets
- Assesses maturity in 9 categories (e.g., preservability, accessibility, data quality assessment, data integrity) at 5 levels
- Provides understandable data quality information to users, including scientists, and actionable information to management
- Peng, G. et al., 2015. A unified framework for measuring stewardship practices applied to digital environmental datasets, Data Science Journal, 13. doi:10.2481/dsj.14-049
- More details in paper #IN14A-05

NCAR Climate Data Guide*
- Community-contributed datasets and reviews
- Focuses on a limited selection of datasets that are most useful for large-scale climate research and model evaluation
- Contributed reviews answer 10 key questions; examples of topics addressed: strengths, limitations, and typical applications of datasets; comparable datasets; methods of uncertainty characterization; utility for climate research and model evaluation
*From Schneider, D. P., et al. (2013), Climate Data Guide Spurs Discovery and Understanding, Eos Trans. AGU, 94(13), 121.

NASA MEaSUREs: Product Quality Checklists
- Making Earth System Data Records for Use in Research Environments (MEaSUREs): NASA-funded, typically 5-year projects generating long-term, consistent time series
- Product Quality Checklists (PQC) indicate the completeness of quality assessment, metadata, documentation, etc.
- PQC templates were developed in 2011 and adopted in 2012
- Questions asked address science quality, documentation quality, and usage and user satisfaction

NASA Earth Science Data System Working Groups (ESDSWG) Data Quality Working Group (DQWG)
- Mission: Assess existing data quality standards and practices in the inter-agency and international arena to determine a working solution relevant to the Earth Science Data and Information System Project (ESDIS), Distributed Active Archive Centers (DAACs), and NASA-funded data producers
- Initiated in March 2014
- 2014-2015: 16 use cases analyzed, issues identified from users' points of view, and ~100 recommendations made for improvement; these were consolidated into 12 high-priority recommendations
- 2015-2016: 4 "low-hanging fruit" (LHF) recommendations extracted from the previous 12; implementation strategies for comprehensive integration across NASA ESDIS have been scoped out for the LHF recommendations
- Details will be covered in paper #IN14A-08

ESIP Information Quality Cluster Activities

- Coordinate use case studies with broad and diverse applications, collaborating with the ESIP Data Stewardship Committee and various national and international programs
- Identify additional needs for consistently capturing, describing, and conveying quality information
- Establish and provide community-wide guidance on roles and responsibilities of key players and stakeholders, including users and management
- Prototype innovative ways of conveying quality information to users
- Evaluate NASA ESDSWG DQWG recommendations and propose possible implementations
- Establish a baseline of standards and best practices for data quality, collaborating with the ESIP Documentation Cluster and Earth science agencies
- Engage data providers, data managers, and data user communities as resources to improve our standards and best practices

Thank you for your attention!

NOAA CDR Maturity Matrix

Level 1
  Software readiness: Conceptual development
  Metadata: Little or none
  Documentation: Draft Climate Algorithm Theoretical Basis Document (C-ATBD); paper on algorithm submitted
  Product validation: Little or none
  Public access: Restricted to a select few
  Utility: Little or none

Level 2
  Software readiness: Significant code changes expected
  Metadata: Research grade
  Documentation: C-ATBD Version 1+; paper on algorithm reviewed
  Product validation: Minimal
  Public access: Limited data availability to develop familiarity
  Utility: Limited or ongoing

Level 3
  Software readiness: Moderate code changes expected
  Metadata: Research grade; meets international standards: ISO or FGDC for collection; netCDF for file
  Documentation: Public C-ATBD; peer-reviewed publication on algorithm
  Product validation: Uncertainty estimated for select locations/times
  Public access: Data and source code archived and available; caveats required for use
  Utility: Assessments have demonstrated positive value

Level 4
  Software readiness: Some code changes expected
  Metadata: Exists at file and collection level. Stable. Allows provenance tracking and reproducibility of dataset. Meets international standards for dataset
  Documentation: Public C-ATBD; Draft Operational Algorithm Description (OAD); peer-reviewed publication on algorithm; paper on product submitted
  Product validation: Uncertainty estimated over widely distributed times/locations by multiple investigators; differences understood
  Public access: Data and source code archived and publicly available; uncertainty estimates provided; known issues public
  Utility: May be used in applications; assessments demonstrating positive value

Level 5
  Software readiness: Minimal code changes expected; stable, portable, and reproducible
  Metadata: Complete at file and collection level. Stable. Allows provenance tracking and reproducibility of dataset. Meets international standards for dataset
  Documentation: Public C-ATBD, review version of OAD, peer-reviewed publications on algorithm and product
  Product validation: Consistent uncertainties estimated over most environmental conditions by multiple investigators
  Public access: Record is archived and publicly available with associated uncertainty estimate; known issues public; periodically updated
  Utility: May be used in applications by other investigators; assessments demonstrating positive value

Level 6
  Software readiness: No code changes expected; stable and reproducible; portable and operationally efficient
  Metadata: Updated and complete at file and collection level. Stable. Allows provenance tracking and reproducibility of dataset. Meets current international standards for dataset
  Documentation: Public C-ATBD and OAD; multiple peer-reviewed publications on algorithm and product
  Product validation: Observation strategy designed to reveal systematic errors through independent cross-checks, open inspection, and continuous interrogation; quantified errors
  Public access: Record is publicly available from long-term archive; regularly updated
  Utility: Used in published applications; may be used by industry; assessments demonstrating positive value

Data Stewardship Maturity Matrix (slide figure not reproduced)
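Both maturity matrices described in this overview share one structure: a fixed set of categories, each rated at a single level on an ordinal scale (6 categories at 6 levels for the CDR Maturity Matrix, 9 categories at 5 levels for the Stewardship Maturity Matrix). The Python sketch below is a hypothetical way to record such ratings and list the categories that fall short of a chosen target level. The class name and the "target level" query are assumptions for illustration, not part of either matrix's published method. Only the CDR categories are filled in, because the text names just four of the nine SMM categories.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# The six CDR Maturity Matrix categories, as listed in the text.
CDR_CATEGORIES = (
    "software readiness", "metadata", "documentation",
    "product validation", "public access", "utility",
)


@dataclass
class MaturityScorecard:
    """Per-category maturity levels for one dataset (illustrative sketch)."""
    categories: tuple[str, ...]
    max_level: int                      # 6 for the CDR matrix, 5 for the SMM
    ratings: dict[str, int] = field(default_factory=dict)

    def rate(self, category: str, level: int) -> None:
        """Record a level, rejecting unknown categories and out-of-range levels."""
        if category not in self.categories:
            raise KeyError(f"unknown category: {category}")
        if not 1 <= level <= self.max_level:
            raise ValueError(f"level must be between 1 and {self.max_level}")
        self.ratings[category] = level

    def unrated(self) -> list[str]:
        """Categories not yet assessed, in matrix order."""
        return [c for c in self.categories if c not in self.ratings]

    def below(self, target: int) -> dict[str, int]:
        """Rated categories whose level is lower than the target level."""
        return {c: lvl for c, lvl in self.ratings.items() if lvl < target}


card = MaturityScorecard(CDR_CATEGORIES, max_level=6)
card.rate("software readiness", 4)
card.rate("metadata", 3)
print(card.below(4))    # {'metadata': 3}
print(card.unrated())   # ['documentation', 'product validation', 'public access', 'utility']
```

For an SMM assessment the same class would be built with max_level=5 and the SMM's own category list. Keeping per-category levels rather than a single summary number mirrors how both matrices present results category by category.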
