UCSD-2019: Technical Working Group: Data Management
Charge
- Identify key decisions that must be made (and justified) prior to CD-1/PDR,
- Make progress on (or actually make) those decisions,
- Lay out a timeline and process for making each decision, consistent with the post-decision work and internal reviews that will be needed to complete preparations for CD-1/PDR,
- Ensure that those timelines and processes are understood and supported by the collaboration, and that we (together) believe we can follow them.
Agenda
- L2 Overview (Julian Borrill/Tom Crawford) slides
- Subsystem Management (Julian Borrill) slides
- Data Movement (Sasha Rahlin) slides
- Software Infrastructure (Ted Kisner) slides
- Data Synthesis (Sara Simon/Andrea Zonca) slides
- Data Reduction (Colin Bischoff/Reijo Keskitalo) slides
- Transients (Don Petravick/Nathan Whitehorn) slides (note conflict with Transients parallel)
- Site Hardware (Tom Crawford) slides
- Simulations for Flowdown (Sara Simon) slides
Remote attendance
Notes
Intro
Big questions to answer here:
- What are we missing?
Note that the L2-level stuff, including the bi-weekly telecon, is all coordination and management; real work is done at L3 and below.
DM scope redefined as running from raw data coming off the telescopes to well-characterized "reduced data (maps, etc.)." (It used to end at "well-characterized maps," but transients...)
- also responsible for mock data sets to support decisions in other WBSs.
DM transitions to operations in ~2026
- but there's a data challenge scheduled for 2027; should we change that? (MEM)
- maybe say DM "begins transition" to ops in 2026.
Discussion about boundary between roles of project DM (raw data to maps) and collaboration analyzers (maps to science).
What does "well-characterized" mean? (RS)
- Something we probably need to define better, along with analysis working groups.
Is there a document stating "at stage X in DOE/NSF project maturity, we need set Y of simulations"? (SH)
- No. There probably should be.
- A worry about asking the AWGs what is needed is that they will say "everything," which is hard to deliver. (KH)
Do we really need HPC for anything? (JV)
- Yes, for work that depends on interprocess communication (which is important for capturing some types of correlations).
- We are not the only people building HPC/HTC interoperability, so we should be able to piggyback on that work. (SH)
L3: Transients
Draft WBS expected next week.
DOE doesn't do transients, so... (GG)
- before he can finish, many people jump in with "yes it does"
- so resources come from both sides?
- hardware at Pole definitely from NSF
L3: Subsystem Management
Why are Pole and Atacama computing resources being crossed off?
- because there's a new L3 for that.
Is there software for all of the Data Challenge stuff in place?
- much of it, yes, but not necessarily validated against all sites and instruments we want