SHARING DATA TO ADVANCE SCIENCE

Data Repository Assessment & Certification: Experiences and Lessons Learned
Jared Lyle
Network of Asian Social Science Data Archives
Tokyo, Japan
January 25, 2019

Acknowledgements
  Mary Vardigan
  Nancy McGovern

Outline
  Overview of ICPSR
  Why assessment is important
  Assessment and certification options
  ICPSR's experience with assessment, including effort and resources needed
  ICPSR's recent application to CoreTrustSeal
  Benefits from assessment

ICPSR
  http://www.icpsr.umich.edu
  Established 1962
  Originally 22 members, now a consortium of 776 worldwide
  Originally political science, now all social and behavioral sciences
  Philip Converse, Warren Miller, and Angus Campbell
  Source: http://www.icpsr.umich.edu/icpsrweb/content/membership/history/timeline.html

ICPSR: Current holdings
  10,000+ studies, a quarter million files
  1,500+ are restricted studies, almost always to protect confidentiality
  Bibliography of Data-related Literature with 80,000 citations
  Approximately 60,000 active MyData (shopping cart) accounts
  Thematic collections of data about addiction and HIV, aging, arts and culture, child care and early education, criminal justice, demography, health and medical care, and minorities

ICPSR: Make data sharing feasible
  ICPSR's General Archive
    Anyone can deposit
    Curated and preserved
  Guidance over data life cycle
    Templates for consent, Institutional Review Boards, Data Management Plans consistent with transparent and reproducible access
  Incentivize data sharing
    Standard citation
    Bibliography
    Usage statistics

Why Assessment is Important
  http://www.whitehouse.gov/sites/default/files/microsites/ostp/ostp_public_access_memo_2013.pdf
  Promote the deposit of data in publicly accessible databases, where appropriate and available

Forever! Guaranteed! We promise!
  http://chronicle.com/blogs/wiredcampus/hazards-of-the-cloud-data-storage-services-crash-sets-back-researchers/52571

If we want to be able to share data, we need to store them in a trustworthy data repository. Data created and used by scientists should be managed, curated, and archived in such a way to preserve the initial investment in collecting them. Researchers must be certain that data held in archives remain useful and meaningful into the future.
  An Introduction to the Core Trustworthy Data Repositories Requirements
  https://www.coretrustseal.org/wp-content/uploads/2017/01/Intro_To_Core_Trustworthy_Data_Repositories_Requirements_2016-11.pdf

Why Assessment is Important
  Promote trust by funding agencies, data producers, and data users that data will be available for the long term
  Provide transparent view into the repository
  Improve processes and procedures
  Measure against a community standard
  Show the benefits of domain repositories
  Dillo, I., & de Leeuw, L. (2018). CoreTrustSeal. Communications of the Association of Austrian Librarians, 71(1), 162-170. https://doi.org/10.31263/voebm.v71i1.1981

Assessment Options
  Basic Certification
    CoreTrustSeal (replaces Data Seal of Approval and World Data System)
  Formal Certification
    Trustworthy Repositories Audit and Certification (TRAC)/ISO 16363 (includes site visit)
  Other alternatives
    Self-audits and peer reviews
    Digital Repository Audit Method Based On Risk Assessment (DRAMBORA)
    nestor-Seal DIN 31644

Common Elements of Assessment
  The Organization and its Framework
    Governance, staffing, policies, finances, etc.
  Treatment of the Data
    Access, integrity, process, preservation, etc.
  Technical Infrastructure
    System design, security, etc.

ICPSR Assessment Experience
  2005-2006 CRL test audit (TRAC checklist)
  2010-2012 TRAC/ISO 16363 self-assessment
  2009-2010 Data Seal of Approval certification
  2013 Data Seal of Approval (update)
  2013 World Data System certification
  2018-2019 CoreTrustSeal

CRL Test Audit, 2005-2006
  Test methodology based on RLG-NARA Checklist for the Certification of Trusted Digital Repositories
  Assessment performed by an external agency (CRL)
  Precursor to current TRAC audit/certification
  ICPSR Test Audit Report: http://www.crl.edu/sites/default/files/attachments/pages/ICPSR_final.pdf

Effort and Resources Required
  Completion of Audit Checklist
  Gathering of large amounts of data about the organization: staffing, finances, digital assets, process, technology, security, redundancy, etc.
  Weeks of staff time to do the above
  Hosting of audit group for two and a half days with interviews and meetings
  Remediation of problems discovered

Findings
  Positive review overall: Taken as a whole, ICPSR appears to provide responsible stewardship of the valuable research resources in its custody. Depositors of data to the ICPSR data archives and users of those archives can be confident about the state of its operation, and the processes, procedures, technologies, and technical infrastructure employed by the organization.

Findings
  Positive review overall, but:
    Succession and disaster plans needed
    Funding uncertainty (grants)
    Acquisition of preservation rights from depositors
    Need for more process and procedural documentation related to preservation
    Machine-room issues noted

Changes Made
  Hired a Digital Preservation Officer
  Created policies, including Digital Preservation Policy Framework, Access Policy Framework, and Disaster Plan
  Changed deposit process to be explicit about ICPSR's right to preserve content
  Continued to diversify funding (ongoing)
  Made changes to machine room

TRAC self-assessment, 2010-2012
  TRAC/ISO most rigorous method
  80+ requirements (100 in ISO)
  OAIS orientation

Procedures Followed
  Parceled out the 80+ TRAC requirements to committees across the organization
  Set up Drupal system for reporting evidence
  Gathered evidence demonstrating compliance for each guideline; rated compliance on scale
  Digital Preservation Officer and Director of Curation Services reviewing evidence
  Goal is to provide a public report

TRAC/ISO Drupal System
  https://wiki.archivematica.org/Internal_audit_tool

Example TRAC/ISO Requirements
  Documented process for testing understandability of the information content
  Process that generates the requested digital object(s) is complete
  Process that generates the requested digital object(s) is correct
  All access requests result in a response of acceptance or rejection
  Dissemination of authentic copies of the original or objects traceable to originals

Effort and Resources Required
  Time of many individuals across the organization
  Technology: developed Drupal site for data entry
  Time for high-level review and summarization
  Time/technology most likely required to address areas for improvement

DSA Self-Assessment, 2009-2010
  http://assessment.datasealofapproval.org/assessment_78/seal/pdf
  http://hdl.handle.net/2027.42/144318

Data Seal of Approval
  Started by DANS in 2009
  The objectives of the DSA are to safeguard data, to ensure high quality and to guide reliable management of data for the future without requiring the implementation of new standards, regulations, or high costs.
  http://www.datasealofapproval.org/en/information/about/

Data Seal of Approval
  16 guidelines: 3 target the data producer, 3 the data consumer, and 10 the repository
  Example guideline: (7) The data repository has a plan for long-term preservation of its digital assets.
  Self-assessments are done online with ratings and then peer-reviewed by a DSA Board member

Procedures Followed
  Digital Preservation Officer and Director of Collection Delivery conducted self-assessment, assembled evidence, completed application
  Provided a URL for each guideline

Effort and Resources Required
  Mainly time of the Digital Preservation Officer and Director of Collection Delivery
  Would estimate two days at most
  Less time required to recertify every two years

Self-Assessment Ratings
  Using the manual and guiding questions: rated ICPSR as having achieved 4 stars for all but Guideline 13, which addresses full OAIS compliance

Findings and Changes Made
  Recognized need to make policies more public, e.g., static and linkable Terms of Use (previously only dynamic)
  Reinforced work on succession planning, now integrated into Data-PASS partnership agreement
  Underscored need to comply with OAIS; building a new system based on it

DSA Self-Assessment, 2014-2015
  https://assessment.datasealofapproval.org/assessment_114/seal/pdf/
  http://hdl.handle.net/2027.42/144319

World Data System Certification, June 2013
  WDS is an effort of the International Council for Science (ICSU)
  Started in natural sciences; similar to Data Seal of Approval
  Membership and certification mechanisms

World Data System Certification, June 2013
  20+ criteria (guidelines)
  Example criterion: The facility ensures integrity and authenticity of data sets during ingest, archival storage, data quality assessment and analysis, product generation, access, and delivery

Effort and Resources Required
  Time of one individual, around two days
  Five-stage process: organization expresses interest; demonstrates its capabilities; if necessary, an on-site review may occur; accreditation; review every 3-5 years

Findings
  ICPSR certified, but members-only access questioned as WDS data is open access
  Permitted comparison of WDS and DSA content and procedures
  Resulted in WDS-DSA Working Group under the umbrella of the RDA Certification IG
  WG assessed commonalities and potential to combine efforts, which resulted in the CoreTrustSeal Data Repository certification

CoreTrustSeal, 2018-2019

CoreTrustSeal
  Developed by the DSA-WDS Partnership Working Group on Repository Audit and Certification, a Working Group of the Research Data Alliance
  Merging of the Data Seal of Approval certification and the World Data System certification
  16 criteria (guidelines)

Requirements
  16 criteria (guidelines):
    Organizational Infrastructure (6)
    Digital Object Management (8)
    Technology (2)
  Compliance level
    0 Not Applicable
    1 The repository has not considered this yet
    2 The repository has a theoretical concept
    3 The repository is in the implementation phase
    4 The guideline has been fully implemented
  Applicants will be judged against statements supported by appropriate evidence; not against self-assessed compliance levels.

Organizational Infrastructure
  The repository:
    has an explicit mission to provide access to and preserve data in its domain
    maintains all applicable licenses covering data access and use and monitors compliance
    has a continuity plan to ensure ongoing access to and preservation of its holdings
    ensures, to the extent possible, that data are created, curated, accessed, and used in compliance with disciplinary and ethical norms

Organizational Infrastructure
  The repository:
    has adequate funding and sufficient numbers of qualified staff managed through a clear system of governance to effectively carry out the mission
    adopts mechanism(s) to secure ongoing expert guidance and feedback (either in-house, or external, including scientific guidance, if relevant)

Digital Object Management
  The repository:
    guarantees the integrity and authenticity of the data
    accepts data and metadata based on defined criteria to ensure relevance and understandability for data users
    applies documented processes and procedures in managing archival storage of the data
    assumes responsibility for long-term preservation and manages this function in a planned and documented way

Digital Object Management
  The repository:
    has appropriate expertise to address technical data and metadata quality and ensures that sufficient information is available for end users to make quality-related evaluations
    enables users to discover the data and refer to them in a persistent way through proper citation
    enables reuse of the data over time, ensuring that appropriate metadata are available to support the understanding and use of the data
  Archiving takes place according to defined workflows from ingest to dissemination

Technology
  The repository functions on well-supported operating systems and other core infrastructural software and is using hardware and software technologies appropriate to the services it provides to its Designated Community
  The technical infrastructure of the repository provides for protection of the facility and its data, products, services, and users
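
The requirement groups and compliance levels listed on the Requirements slide can be sketched as a small data model. This is an illustrative sketch only, not a CoreTrustSeal tool: the class and function names, the record layout, and the placeholder ID "RX" are assumptions made for the example. It reflects the point that applicants are judged on evidence rather than on self-assessed levels, by flagging any applicable requirement that has no evidence attached or is not yet fully implemented.

```python
from dataclasses import dataclass, field
from enum import IntEnum


class ComplianceLevel(IntEnum):
    """Self-assessed compliance levels as listed on the Requirements slide."""
    NOT_APPLICABLE = 0
    NOT_CONSIDERED = 1
    THEORETICAL_CONCEPT = 2
    IMPLEMENTATION_PHASE = 3
    FULLY_IMPLEMENTED = 4


# Number of requirements per group, as listed on the Requirements slide.
REQUIREMENT_GROUPS = {
    "Organizational Infrastructure": 6,
    "Digital Object Management": 8,
    "Technology": 2,
}


@dataclass
class Response:
    """One requirement response (hypothetical record layout)."""
    requirement_id: str
    group: str
    level: ComplianceLevel
    evidence_urls: list = field(default_factory=list)


def needs_follow_up(responses):
    """Return IDs of applicable requirements lacking evidence or full implementation."""
    flagged = []
    for r in responses:
        if r.level is ComplianceLevel.NOT_APPLICABLE:
            continue
        if not r.evidence_urls or r.level < ComplianceLevel.FULLY_IMPLEMENTED:
            flagged.append(r.requirement_id)
    return flagged


responses = [
    Response("R5", "Organizational Infrastructure", ComplianceLevel.FULLY_IMPLEMENTED,
             ["https://www.icpsr.umich.edu/icpsrweb/content/about/index.html"]),
    # "RX" is a placeholder ID for illustration, not a real requirement number.
    Response("RX", "Technology", ComplianceLevel.IMPLEMENTATION_PHASE),
]

print(sum(REQUIREMENT_GROUPS.values()))  # total number of requirements
print(needs_follow_up(responses))
```

A checklist like this could help an applicant see which responses still need a public, linkable piece of evidence before submission.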

Example of Evidence
  R5 Guideline Text:
  R5. The repository has adequate funding and sufficient numbers of qualified staff managed through a clear system of governance to effectively carry out the mission

Example of Evidence
  R5 Guidance:
  The repository is hosted by a recognized institution (ensuring long-term stability and sustainability) appropriate to its Designated Community.
  The repository has sufficient funding, including staff resources, IT resources, and a budget for attending meetings when necessary. Ideally this should be for a three- to five-year period.
  The repository ensures that its staff have access to ongoing training and professional development.
  The range and depth of expertise of both the organization and its staff, including any relevant affiliations (e.g., national or international bodies), is appropriate to the mission.

ICPSR Response: R5
  With more than 55 years of service to the social sciences, ICPSR is the largest archive of digital social and behavioral science data in the world. ICPSR is a unit within the Institute for Social Research at the University of Michigan and maintains its office in Ann Arbor. [1]
  ICPSR's diversified funding model offers stability and reliability. The three primary sources of revenue include grants and contracts, membership dues, and tuition [2]. ICPSR provides data archiving and dissemination services for more than 20 government agencies and foundations, including the Bureau of Justice Statistics, the National Science Foundation, the National Institutes of Health, the Alfred P. Sloan Foundation, the Laura and John Arnold Foundation, the Bill & Melinda Gates Foundation, and the Robert Wood Johnson Foundation [3]. Some of these partnerships have been in place for decades. Membership dues from ICPSR's over 780 member institutions [4] and tuition from the Summer Program in Quantitative Methods [5] make up other revenue streams.

ICPSR Response: R5 (continued)
  A 12-person Council whose members are elected by the ICPSR membership provides guidance and oversight to ICPSR. Members serve four-year terms, and six new members are elected every two years. The Council acts on administrative, budgetary, and organizational issues on behalf of all the members of ICPSR. [6]
  ICPSR's staff of over 100 perform a variety of functions to support ICPSR's archival and training missions. The staff include data curators and managers, librarians, Web developers, communications specialists, user support specialists, administrative staff, and a small team of researchers, as well as software developers, programmers, system administrators, and desktop support specialists. Staff have expertise in digital archiving, data preservation, usability testing, Section 508 review for ADA Section 8 compliance, DOI registration, web traffic analytics, search engine optimization, storage and dissemination of sensitive data, restricted-use data agreements, and researcher credentialing. All staff are required to complete ongoing training related to data security and disclosure risk. [7]

ICPSR Response: R5 (continued)
  ICPSR operates in accord with three organizational documents: a Constitution [8], Bylaws [9], and a Memorandum of Agreement with the University of Michigan and the Institute for Social Research [10]. The organization also maintains several policies that inform and guide its work as an archive, including an overarching Strategic Plan [11] that lays out the organization's priorities for coming years. Other policies cover areas such as digital preservation [12], data access [13], collection development [14], and disaster planning [15].

ICPSR Response: R5 (continued)
  References:
  [1] ICPSR Web site, About the Organization: https://www.icpsr.umich.edu/icpsrweb/content/about/index.html (accessed 2018-10-04)
  [2] ICPSR 2016-2017 Annual Report, Financial Reports: https://www.icpsr.umich.edu/files/ICPSR/about/annualreport/2016-2017.pdf (accessed 2018-11-08)
  [3] ICPSR Web site, Thematic Data Collections: https://www.icpsr.umich.edu/icpsrweb/content/about/thematic-collections.html (accessed 2018-10-04)
  [4] ICPSR Web site, List of Member Institutions and Subscribers: https://www.icpsr.umich.edu/icpsrweb/membership/administration/institutions (accessed 2018-11-06)

Effort and Resources Required
  3-5 days of time by the Director of Metadata and Preservation
  Less time required to certify every 3 years

Findings and Changes Made
  In progress: CoreTrustSeal Secretariat will assign reviewers shortly
  Some fine tuning:
    Selection decisions about individual files in deposits
    Specifying duration of preservation commitment
    Continued compliance with OAIS (e.g., file-level citations)

Comparison of Assessments: Effort and Resources
  Test audit was the most labor- and time-intensive
  TRAC self-assessment involved the time of more people
  CoreTrustSeal (Data Seal of Approval and World Data System) certification least costly

Comparison of Assessments: Benefits
  What did we learn and did the results justify the work required?
  Test audit was first experience; resulted in greatest number of changes, greatest increase in awareness
  Fewer changes made as a result of CoreTrustSeal (DSA and WDS); also not as detailed
  TRAC assessment has surfaced additional issues to address

Benefits continued
  Difficult to quantify
  Trust of stakeholders
  Transparency
  Improvements in processes and procedures
  Use of community standards
  Greater awareness of benefits of domain repositories
  Leadership dimension also important

Thank you!

Other References
  Vardigan, M. and Lyle, J., 2014. The Inter-university Consortium for Political and Social Research and the Data Seal of Approval: Accreditation Experiences, Challenges, and Opportunities. Data Science Journal, 13, pp.PDA83-PDA87. DOI: http://doi.org/10.2481/dsj.IFPDA-14

Additional Observations
  Try not to integrate details about technology that may change
  Schedule regular reviews of policies included in the assessments
