e-Social Science: scaling up social scientific investigations

Alex Voss, Andy Turner, Rob Procter
National Centre for e-Social Science
Gabor Terstyanszky, Gabor Szmetanko, Tamas Kiss
CPC @ Westminster

Presentation at ISGC 2009, Taipei, Taiwan, 2009-04-22.

Overview
  • Background
  • Introduction
  • MoSeS
  • GENESIS
  • Demographic Modelling
    • Population Reconstruction (Initialisation)
    • Dynamic Simulation
  • Experience
  • Organisation
  • Scaling Issues
  • Future Work
  • Acknowledgements

Background
Much social science does not use advanced ICT but emergence of new analytical methods is driven by:
  • Increased availability of data about social phenomena
  • Issues with data management and integration
  • Challenges to analyse social phenomena at scale
  • Challenges to inform practical policy and decision making (e.g., evidence-based policy making)
National Centre for e-Social Science (NCeSS) in the UK is investigating ways to respond to these challenges.

  • EUAsiaGrid is supporting e-Social Science amongst other application domains

Introduction
  • Virtual worlds rich in detail are being developed
  • Digital representations (of parts) of Earth are being developed
    • Necessarily generalised models
    • Interact with the real world
    • Socio-economic
  • There can always be more detail
    • Higher spatial and temporal resolution
    • More and more detailed attributes
  • Geography and social science are no different to any other type of science in this respect
  • We are all geographers to some extent and we all interact in some way with the object of study

MoSeS
  • Modelling and simulation approaches for social science
  • First-phase research node of NCeSS
  • Core contemporary demographic model of the UK based on UK census data and other datasets
  • Using agent-based simulation to project population forward in time by 25 years
  • Explicitly model births, deaths, migration, changes in health status etc.
  • Applications in transportation research, health and social care planning and business applications

GENESIS
  • Uses and builds on MoSeS
  • A team involving experts in geovisualisation from UCL
  • Two development strands
    • Theoretical models based on restriction-free data
    • Models seeded with more restricted access data
  • More theoretical
  • Computational limits
  • Investigating what visualisations are useful
  • Considering how to validate models
  • Less emphasis on developing specific applications

  • Applications being considered in transportation planning
  • Respond ad hoc to what is in the public interest
  • Daily activity models

Demographic Modelling I
  • Generation of individual-level population data for the UK
  • Based on 2001 census data
  • Works with public release versions of the census that are restricted
    • Census Aggregate Statistics (CAS) at Output Area level
    • Sample of Anonymised Records (SAR), 1% of population (anonymisation)
  • Reconstructed data has the same attributes as the real population and the same number of individuals but is still anonymised
  • Uses a genetic algorithm to select a well-fitting set of records from the anonymised sample to assign to an output area
  • Need for attributes in the SAR to be matched with those in the CAS
    • This is often complicated because of different categories
    • Aggregation to a lowest common categorisation

Demographic Modelling II
  • Dynamic modelling
  • Daily activity modelling
    • Commuting
    • Retail modelling
    • Transportation
  • Population forecasting
    • Annual time step
    • Birth
    • Death
    • Migration

Experiences
  • Integrating existing code into grid environment required some changes to source code
    • management of input arguments
    • code scalability
    • log management
    • error handling
  • Finding the right input size and parameters for testing to keep execution times low
  • Making sense of execution failures
    • lack of ways to debug code in distributed environments
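The population reconstruction step from Demographic Modelling I can be illustrated with a minimal sketch. It is not the MoSeS implementation: the function names (`misfit`, `reconstruct`), the toy sample records, the target counts and the genetic algorithm settings are all assumptions chosen for illustration. The idea shown is only the general one from the slide: pick a set of sample records whose aggregated attribute counts fit the published aggregate counts for one output area.

```python
import random
from collections import Counter

# Hypothetical anonymised sample records (stand-in for SAR-style data).
SAMPLE = [
    {"age": "0-15", "sex": "F"}, {"age": "0-15", "sex": "M"},
    {"age": "16-64", "sex": "F"}, {"age": "16-64", "sex": "M"},
    {"age": "65+", "sex": "F"}, {"age": "65+", "sex": "M"},
]
# Hypothetical aggregate counts for one output area (stand-in for CAS-style data).
TARGETS = {
    ("age", "0-15"): 2, ("age", "16-64"): 5, ("age", "65+"): 3,
    ("sex", "F"): 6, ("sex", "M"): 4,
}


def misfit(selection, sample, targets):
    """Total absolute error between counts in the selection and the targets."""
    counts = Counter()
    for idx in selection:
        for attribute, category in sample[idx].items():
            counts[(attribute, category)] += 1
    keys = set(counts) | set(targets)
    return sum(abs(counts[k] - targets.get(k, 0)) for k in keys)


def reconstruct(sample, targets, size, generations=200, pool=30, seed=0):
    """Evolve a list of `size` sample indices that fits the target counts."""
    rng = random.Random(seed)
    candidates = [[rng.randrange(len(sample)) for _ in range(size)]
                  for _ in range(pool)]
    for _ in range(generations):
        # Keep the better half unchanged so the best fit never gets worse.
        candidates.sort(key=lambda s: misfit(s, sample, targets))
        survivors = candidates[: pool // 2]
        children = []
        while len(survivors) + len(children) < pool:
            a, b = rng.sample(survivors, 2)
            cut = rng.randrange(size)           # one-point crossover
            child = a[:cut] + b[cut:]
            child[rng.randrange(size)] = rng.randrange(len(sample))  # mutation
            children.append(child)
        candidates = survivors + children
    return min(candidates, key=lambda s: misfit(s, sample, targets))


best = reconstruct(SAMPLE, TARGETS, size=10)
print("misfit of best selection:", misfit(best, SAMPLE, TARGETS))
```

In a real setting the attribute categories of the sample and the aggregate tables would first have to be aggregated to a common categorisation, as noted on the slide, before a misfit of this kind can be computed.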

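The population forecasting idea from Demographic Modelling II (an annual time step with birth, death and migration) can be sketched as a small individual-level simulation. This is an illustrative sketch only: the `Person` class, the function names, the flat rates and the fertile age range are hypothetical placeholders, not values or code from MoSeS, which would use age- and area-specific rates.

```python
import random
from dataclasses import dataclass


@dataclass
class Person:
    age: int
    sex: str


def step_year(population, rng, birth_rate=0.06, death_rate=0.01,
              out_rate=0.02, in_migrants=0):
    """Advance the population by one year; all rates are made-up placeholders."""
    next_population = []
    for person in population:
        if rng.random() < death_rate:      # death
            continue
        if rng.random() < out_rate:        # out-migration
            continue
        if person.sex == "F" and 15 <= person.age <= 44 \
                and rng.random() < birth_rate:
            next_population.append(Person(0, rng.choice("FM")))  # birth
        # Age the survivor without mutating the input record.
        next_population.append(Person(person.age + 1, person.sex))
    for _ in range(in_migrants):           # in-migration
        next_population.append(Person(rng.randrange(0, 90), rng.choice("FM")))
    return next_population


def project(population, years=25, seed=0, **rates):
    """Apply the annual step repeatedly, e.g. for a 25-year projection."""
    rng = random.Random(seed)
    for _ in range(years):
        population = step_year(population, rng, **rates)
    return population


start = [Person(age, sex) for age in range(0, 80, 5) for sex in "FM"]
print("population after 25 years:", len(project(start, years=25)))
```

Because each person is processed independently within a year, a step of this shape can in principle be partitioned across cores or sites, which is the scaling concern raised later in the talk.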
Experiences II
  • Step-wise process works well
    • ensures we encounter problems piece by piece
    • allows us to comply with data protection / licensing
  • Population reconstruction is resource intensive
    • may run up against limits on wall clock time
  • Importance of at-elbow support
    • but hindered by data protection / licensing issues
  • Licensing means we need to limit execution to UK resources
  • Setting up VO to support secure sharing of data

Organisation

Scaling Issues I
  • Simulations
    • Need to find ways to map to different architectures, both HPC and HTC
    • Need to deal with large memory requirements and limitations imposed by OS, JVM and Java libraries
  • Exploring Terracotta
    • Distributing computation
    • Virtual heap space
    • Dependability
  • Advice would be very welcome

Scaling Issues II
  • Population model size and sophistication
    • From town size to country size (and beyond)
    • Number of variables
    • Number of constraints
  • Number of cores used
    • To reduce runtime
    • Need to go beyond using only one site
  • Community
    • Open development needs tool support
    • Number of users requires hardening of code & documentation

Future Work
  • Next steps until code runs in Taiwan with Taiwanese data
    • Proof of concept execution on Quanta cluster at ASGC
    • Definition of data outputs from
  • Develop submission to exploit multiple NGS nodes and EGEE Compute Elements
  • Improving data and code staging
  • Moving from population reconstruction to supporting the simulation process
  • Integration into science gateway for the social sciences and developing a repository for models

Acknowledgements
  • National Centre for e-Social Science
    • MoSeS Node: Mark Birkin (PI)
    • GENeSIS Node: Mike Batty (PI)
    • NCeSS Hub: Peter Halfpenny and Rob Procter
  • EUAsiaGrid Consortium
    • Marco Paganoni (Project Director)
  • CPC at Westminster University
    • Gabor Szmetanko, Gabor Terstyanszky, Tamas Kiss
  • GridPP
    • Jens Jensen and Jeremy Coles
  • National Grid Service
    • Jason Lander and Shiv Kaushal (Leeds), Steven Young (Oxford), Mike Jones (Manchester)
