EicMC: EIC-specific Google Protocol Buffers Monte-Carlo file format

Alexander Kiselev
EIC R&D Software Consortium Meeting, BNL
February 9, 2017

Motivation

  • Our October 2016 meeting:
      • Want to exchange MCEG files in a non-ROOT and non-ASCII format
      • Bring all existing EIC MC generator files to a common denominator
      -> suggestion: adapt the existing ProMC library to do the job
  • Some progress in this direction was made at the beginning:
      • The generator-neutral part is incorporated in the EicRoot framework (as an extra input file format for pure GEANT transport purposes)
  • Encoding EIC MCEG-specific info in ProMC faced difficulties:
      • ProMC is primarily Pythia-oriented -> no elegant way to extend the .proto files to maintain, say, MILOU-specific event-per-event variables
      • A few other small (and partly fake) issues were identified with the ProMC format (floating-point precision, default 64k record limit, inefficient storage of typically small EIC events, external dependencies, etc.)

Google Protocol Buffers

  • Active project, maintained and internally used by Google
      • Long-term support guaranteed as long as Google is there
  • "Protocol buffers are Google's language-neutral, platform-neutral, extensible mechanism for serializing structured data; think XML, but smaller, faster, and simpler. You define how you want your data to be structured once, then you can use special generated source code to easily write and read your structured data to and from a variety of data streams and using a variety of languages."

        message MyMessage {
          string name = 1;
          int32 id = 2;
          repeated float data = 3;
        }

        // C++ calls generated from the message definition above
        message->set_name("Crap");
        message->set_id(777);
        message->add_data(3.14159);
        message->add_data(2.71828);
        message->SerializeToOstream(&stream); // or such

Google Protocol Buffers

  • Provide a certain degree of flexibility in data description
      • All basic data types as well as nested messages (structures)
      • Some sort of unions and even STL maps
  • Custom data definition language (.proto files) and converter to C++ and other languages (resembles ADAMO, ROOT dictionaries, etc.)
  • As long as simple rules are observed when extending the format:
      • Backward compatibility is maintained (add new variables to a .proto file, recompile -> new executable is still able to read old files with missing variables)
      • Forward compatibility is maintained (add new variables to a .proto file, create a new file in this extended format -> old executable is still able to read the part of the new files it is aware of)
  -> yet looks like a step back towards the stone age compared to ROOT; fine
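The forward-compatibility rule above works because every encoded field carries its own field number and wire type, so a reader can skip what it does not know. The sketch below illustrates this in plain standard C++, hand-encoding bytes according to the documented protobuf wire format instead of using the protobuf library. All function names are hypothetical, and only wire types 0 (varint) and 2 (length-delimited) are handled for brevity.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Append an unsigned integer as a base-128 varint (protobuf wire format).
void PutVarint(std::string& out, std::uint64_t v) {
  while (v >= 0x80) {
    out.push_back(static_cast<char>((v & 0x7F) | 0x80));
    v >>= 7;
  }
  out.push_back(static_cast<char>(v));
}

// Read a varint starting at pos and advance pos past it.
std::uint64_t GetVarint(const std::string& in, std::size_t& pos) {
  std::uint64_t v = 0;
  int shift = 0;
  while (true) {
    assert(pos < in.size() && shift < 64);
    const std::uint8_t b = static_cast<std::uint8_t>(in[pos++]);
    v |= static_cast<std::uint64_t>(b & 0x7F) << shift;
    if ((b & 0x80) == 0) return v;
    shift += 7;
  }
}

// Field key = (field_number << 3) | wire_type.
void PutVarintField(std::string& out, std::uint32_t field, std::uint64_t v) {
  PutVarint(out, (static_cast<std::uint64_t>(field) << 3) | 0);
  PutVarint(out, v);
}

void PutBytesField(std::string& out, std::uint32_t field, const std::string& s) {
  PutVarint(out, (static_cast<std::uint64_t>(field) << 3) | 2);
  PutVarint(out, s.size());
  out += s;
}

// An "old" reader that only knows field 2 (id) and skips everything else.
std::uint64_t OldReaderGetId(const std::string& msg) {
  std::uint64_t id = 0;
  std::size_t pos = 0;
  while (pos < msg.size()) {
    const std::uint64_t key = GetVarint(msg, pos);
    const std::uint32_t field = static_cast<std::uint32_t>(key >> 3);
    const std::uint32_t wire = static_cast<std::uint32_t>(key & 7);
    if (wire == 0) {
      const std::uint64_t v = GetVarint(msg, pos);
      if (field == 2) id = v;  // known field: keep the value
    } else if (wire == 2) {
      const std::uint64_t len = GetVarint(msg, pos);
      assert(len <= msg.size() - pos);
      pos += static_cast<std::size_t>(len);  // unknown payload: skip it
    } else {
      assert(false && "wire type not handled in this sketch");
    }
  }
  return id;
}
```

A message written by a "new" writer with extra fields (say numbers 9 and 10) is still readable by `OldReaderGetId`, which simply steps over the fields it has never heard of.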

EicMC in brief

  • Small standalone C++ library (~3k lines of code total)
  • No external dependencies on the user (import) side
      • Except for the Google protobuf libraries, of course
      • The MCEG -> EicMC converter is realized through the eic-smear interface though, therefore ROOT is required
  • Portability: tried the codes out on SL6 and OS X Mavericks
  • ~512MB max single event size is the only real built-in limit
  -> the rest of the presentation will be EicMC vs ProMC in a snapshot fashion

Binary file layout

  • ProMC: Individually zipped event records in Google protobuf message format with a top-level directory structure provided by the third party library (with its own issues)
      • Event records can be decoded independently (so by definition no complications with direct access mode)
  • EicMC: Native stream of length-delimited event and service records (sparsification tables, direct access catalogues) in Google protobuf message format
      • Event records are independent from each other, but require extra information (sparsification tables) for decoding
      • Optional compression using the respective flavor of google protobuf stream is possible (and events can be merged together while zipping)
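A stream of length-delimited records, as described for EicMC above, can be sketched as follows: each record is written as a varint byte count followed by the payload, so a reader can walk the stream without any external directory. This is a minimal standard C++ illustration of the general technique, not the EicMC implementation; the function names are hypothetical and the payloads are opaque strings rather than real protobuf messages.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Append an unsigned integer as a base-128 varint.
void WriteVarint(std::string& out, std::uint64_t v) {
  while (v >= 0x80) {
    out.push_back(static_cast<char>((v & 0x7F) | 0x80));
    v >>= 7;
  }
  out.push_back(static_cast<char>(v));
}

// Read a varint at pos; returns false if the input ends too early.
bool ReadVarint(const std::string& in, std::size_t& pos, std::uint64_t& v) {
  v = 0;
  for (int shift = 0; shift < 64 && pos < in.size(); shift += 7) {
    const std::uint8_t b = static_cast<std::uint8_t>(in[pos++]);
    v |= static_cast<std::uint64_t>(b & 0x7F) << shift;
    if ((b & 0x80) == 0) return true;
  }
  return false;
}

// Append one record as <varint length><payload bytes>.
void AppendRecord(std::string& stream, const std::string& payload) {
  WriteVarint(stream, payload.size());
  stream += payload;
}

// Split the stream back into records; returns false on truncated input.
bool ReadRecords(const std::string& stream, std::vector<std::string>& records) {
  std::size_t pos = 0;
  while (pos < stream.size()) {
    std::uint64_t len = 0;
    if (!ReadVarint(stream, pos, len)) return false;
    if (len > stream.size() - pos) return false;
    records.push_back(stream.substr(pos, static_cast<std::size_t>(len)));
    pos += static_cast<std::size_t>(len);
  }
  assert(pos == stream.size());
  return true;
}
```

Because a record's length is known before its payload is touched, a reader can also skip records (for example service records it does not need) without decoding them.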

Direct access to the event records

  • ProMC: Top-level linear directory structure and respective Skip() and Seek() calls provided by the third party library
      • Should be faster (linear catalogue structure with direct access to individual zipped event records) compared to EicMC default mode
  • EicMC: Multi-dimensional direct access tables are injected in the event message stream as separate custom records
      • Must be much slower in default mode (layered structure with direct access to the typically coalesced chunks of zipped event records)
      • Should however be infinitely scalable
      • If scalability and file size are of no concern, a fall-back a la ProMC mode can be imitated (individually zipped events and 1D addressing)

Self-description (whatever it means)

  • ProMC: Relevant collection of .proto files can be included by hand as individual zipped records and can be retrieved later
  • EicMC: Base Record message structure matching the current library is automatically included in the file header
      • Technically the Record structure is very similar to a .proto file
      • User calls are provided which allow one of the following:
          • Build the message structure on the fly (reflection) and retrieve variables by name -> hardly of any practical use, but allows one to claim the true self-description feature of this file format
          • Dump a proper .proto file (which in addition to the Record message contains the gzip file header extension layout description), which can be used to compile a library with the message structure exactly matching this particular binary file

MCEG event records

  • Common to all generators:
      • Momentum components
      • Vertex coordinates and time
      • Status, PDG, mother(s), daughter(s)
  • Nasty part: event-per-event generator-specific variables
      • Hardcode them all as event sub-headers in the .proto file?
      • Use some sort of {tag,value} maps?
  -> NB: in the ideal ROOT-based eic-smear world these are inherited C++ classes

Philosophy of MCEG info inclusion

  • ProMC: Create separate .proto files for different MC generators (see promc & nlo examples) and compile custom library version(s) accordingly
      • This default implementation does not allow two different formats to be compiled in at once (which definitely limits the usability)
      • Optional: add plain {tag,value} arrays on an event-per-event basis
          • Would be fine for the file header; for individual events must be pretty inefficient (?)
  • EicMC: Use an identical .proto file for all generators; generator-specific info for individual events is added via sparsified {tag,value} maps

        event->AddFloatValue("trueY", 0.95);

      • Converter for all so far known DIS MCEG already implemented
          • Requires ROOT and eic-smear
      • Both float and int64 values, as well as tagged arrays, can be packed

Floating point precision

  • ProMC: Momentum and coordinate values are stored as signed integers in units of user-specified resolution
  • EicMC: Both a la ProMC storage mode and double (single) precision are possible and can be selected via user calls when the file is created
      • Double-precision floating point user interface, therefore 64-bit default storage mode for {px,py,pz;x,y,z,t}
          • Unless the actually provided values are by mistake given in single precision (which can be checked easily); then they are stored in a 32-bit floats basket
      • ProMC-like storage mode (fixed precision, say keep momenta with precision up to 1 keV/c only) is also possible, in which case values are stored in a variable-length 64-bit integers basket

User interface

  • ProMC: Internal google protobuf message structure is partly exposed to the end user
  • EicMC: Internal event structure is completely hidden from the end user
      • Basically the whole collection of expected high-level calls is provided:

            GetNextEvent()
            event->GetParticleCount()
            event->GetParticle(i)
            particle->GetPx(), etc.

        while the event is automatically unpacked from a protobuf message in the background
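To illustrate how an interface of this shape can hide the storage mode from the user, here is a minimal sketch in standard C++. The classes below are hypothetical stand-ins, not the EicMC classes: only the call names `GetNextEvent()`, `GetParticleCount()`, `GetParticle(i)` and `GetPx()` come from the slide, while the constructors, `AddParticle()`, `AddEvent()`, `SumPx()` and the choice of GeV/c as the momentum unit are assumptions. Momenta are kept internally as signed integers in units of a user-specified resolution (the fixed-precision mode) and converted back to double in the getter.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-ins for illustration; these are NOT the EicMC classes.
class Particle {
 public:
  // resolution: size of one integer step, assumed to be in GeV/c here
  // (1e-6 corresponds to 1 keV/c).
  Particle(double px, double py, double pz, double resolution)
      : res_(resolution),
        px_(Pack(px, resolution)),
        py_(Pack(py, resolution)),
        pz_(Pack(pz, resolution)) {}

  // Getters convert the stored integers back to double precision.
  double GetPx() const { return static_cast<double>(px_) * res_; }
  double GetPy() const { return static_cast<double>(py_) * res_; }
  double GetPz() const { return static_cast<double>(pz_) * res_; }

 private:
  // Round to the nearest multiple of the resolution.
  static std::int64_t Pack(double v, double res) {
    assert(res > 0.0);
    return static_cast<std::int64_t>(std::llround(v / res));
  }
  double res_;
  std::int64_t px_, py_, pz_;
};

class Event {
 public:
  void AddParticle(const Particle& p) { particles_.push_back(p); }
  std::size_t GetParticleCount() const { return particles_.size(); }
  // Returns nullptr for an out-of-range index.
  const Particle* GetParticle(std::size_t i) const {
    return i < particles_.size() ? &particles_[i] : nullptr;
  }

 private:
  std::vector<Particle> particles_;
};

class Reader {
 public:
  void AddEvent(const Event& e) { events_.push_back(e); }
  // Returns the next event, or nullptr when the input is exhausted.
  const Event* GetNextEvent() {
    return next_ < events_.size() ? &events_[next_++] : nullptr;
  }

 private:
  std::vector<Event> events_;
  std::size_t next_ = 0;
};

// Typical user loop: sum px over all particles of all remaining events.
double SumPx(Reader& reader) {
  double sum = 0.0;
  while (const Event* event = reader.GetNextEvent()) {
    for (std::size_t i = 0; i < event->GetParticleCount(); ++i) {
      sum += event->GetParticle(i)->GetPx();
    }
  }
  return sum;
}
```

With a resolution of 1e-6, a value returned by `GetPx()` differs from the input by at most half a step, and the user code never sees whether integers or floating point values were stored.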

Packaging

  • ProMC: Provided with a local copy of google protocol buffer software as well as a local copy of the third party zipping library, etc.
  • EicMC: Bare custom codes; expects google protocol buffer software (as well as optionally ROOT & eic-smear) to be pre-installed
      • Can be changed of course; but that's today's status

Sparsification and compression

  • ProMC: Uses a relatively simple event message layout almost without pre-processing; lets zlib do the compression job
  • EicMC: Uses a bit over-complicated event message layout with heavy (optional) sparsification; zlib compression is also optional
      • Can sparsify status code sequences, PDG entry sequences, 0.0 values (primary vertices in particular), duplicate (up to the sign) momentum component values, duplicate vertex coordinates, beam particles, etc.
      • Configurable zlib compression of multi-event chunks is possible (and is the default mode) on top of this pre-processing
  -> whether this complication is really needed remains a question; but it does not hurt (and an easy packing mode is still possible)

Performance

  • Ideally would like to benchmark ROOT vs ProMC vs EicMC
  • Hard to compare apples to apples though:
      • Which floating point precision was used?
      • Was the file optimized for size or import (unpacking) speed?
      • Was the file optimized for sequential or direct access?
      • Does the user code access all variables of the event record or only a few?
      • Are MCEG-specific variables considered in the comparison or not?
  • EicMC (against ProMC, leave ROOT alone):
      • Sparsification -> competitive unzipped file format flavor
      • Possibility to merge several (small) events in a single gzip record
      • Improves import speed at the cost of a certain file size increase
      • Minimizes file size at the cost of direct access performance
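The slides do not say how EicMC sparsifies status code sequences, so the following is only one plausible technique, swapped in for illustration: run-length encoding, which collapses runs of identical codes into {value, count} pairs before any zlib compression is applied. It is a minimal standard C++ sketch, and the names `Run`, `RunLengthEncode` and `RunLengthDecode` are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// One run of identical codes: {value, repeat count}.
using Run = std::pair<std::int32_t, std::uint32_t>;

// Collapse consecutive identical codes into runs.
std::vector<Run> RunLengthEncode(const std::vector<std::int32_t>& codes) {
  std::vector<Run> runs;
  for (std::int32_t c : codes) {
    if (!runs.empty() && runs.back().first == c) {
      ++runs.back().second;  // extend the current run
    } else {
      runs.push_back(Run(c, 1u));  // start a new run
    }
  }
  return runs;
}

// Expand the runs back into the original sequence.
std::vector<std::int32_t> RunLengthDecode(const std::vector<Run>& runs) {
  std::vector<std::int32_t> codes;
  for (const Run& r : runs) {
    assert(r.second > 0);
    codes.insert(codes.end(), r.second, r.first);
  }
  return codes;
}
```

For a typical event where a few beam and intermediate particles are followed by a long tail of final-state particles with the same status code, the encoded form holds only a handful of pairs, which is the kind of saving that makes an uncompressed file flavor competitive.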

Next steps

  • Finalize validation process
  • Upload codes to GitLab
  • Optimize package configuration (CMake, etc.)?
  • Include a few other converters (HepMC?) & usage examples
  • Technically one can add other (non-MC) event types
  • Tune for HepSim: file metadata, streaming, etc.
  • Tune for GEANT (multi-threading, etc.)?
