UCSD-2019: Technical Working Group: Data Management
- Identify key decisions that must be made (and justified) prior to CD-1/PDR,
- Make progress on (or actually make) those decisions,
- Lay out a timeline and process for making each decision, consistent with the post-decision work and internal reviews that will be needed to complete preparations for CD-1/PDR,
- Ensure that those timelines and processes are understood and supported by the collaboration, and that we (together) believe we can follow them.
- L2 Overview (Julian Borrill/Tom Crawford) slides
- Subsystem Management (Julian Borrill) slides
- Data Movement (Sasha Rahlin) slides
- Software Infrastructure (Ted Kisner) slides
- Data Synthesis (Sara Simon/Andrea Zonca) slides
- Data Reduction (Colin Bischoff/Reijo Keskitalo) slides
- Transients (Don Petravik/Nathan Whitehorn) slides (note: conflicts with the Transients parallel session)
- Site Hardware (Tom Crawford) slides
- Simulations for Flowdown (Sara Simon) slides
Big questions to answer here:
- What are we missing?
Note that the L2-level stuff, including the bi-weekly telecon, is all coordination and management; real work is done at L3 and below.
DM scope redefined as running from raw data coming off the telescopes to well-characterized "reduced data (maps, etc.)" (previously "well-characterized maps," changed because of transients).
- also responsible for mock data sets to support decisions in other WBSs.
DM transitions to operations in ~2026
- but a data challenge is scheduled for 2027; should we change that? (MEM)
- maybe say DM "begins transition" to ops in 2026.
Discussion about boundary between roles of project DM (raw data to maps) and collaboration analyzers (maps to science).
What does "well-characterized" mean? (RS)
- Something we probably need to define better, along with analysis working groups.
Is there a document stating "at stage X in DOE/NSF project maturity, we need set Y of simulations"? (SH)
- No. There probably should be.
- A worry about asking AWGs what is needed is that they will say "everything," which is hard. (KH)
Do we really need HPC for anything? (JV)
- For things that care about interprocess communication (which is important for capturing some types of correlations).
- We are not the only people building in interoperability in HPC/HTC, so we should be able to piggyback. (SH)
Draft WBS expected next week.
DOE doesn't do transients, so... (GG)
- before he can finish, many people jump in with "yes it does"
- so resources come from both sides?
- hardware at Pole definitely from NSF
L3: Subsystem Management
Why are Pole and Atacama computing resources being crossed off? (ASR)
- because there's a new L3 for that.
Is there software for all of the Data Challenge stuff in place? (KH)
- much of it, yes, but not necessarily validated against all the sites and instruments we want
If someone asked you "what actually needs to be simulated to get to Baseline Design," what would you say? (GG)
- hardest thing is going to be instrument non-idealities like beam details
L3: Data Movement
What is the current TDRSS coverage? (MEM)
- 4 hours/day, but the total bandwidth we are allocated is only ~125 GB/day, which is at least a factor of 40 too low.
- What about Starlink and other commercial options? (TK)
- Looking into it.
- What sort of lossy compression has been investigated? (SH)
- Only downsampling. And sending back maps instead of TOD.
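The TDRSS bandwidth shortfall above can be checked with a quick back-of-the-envelope calculation. This is a minimal sketch, assuming decimal units (1 GB = 10^9 bytes) and that the whole daily allocation is transferred within the 4-hour coverage window; the helper name `rate_mbps` is hypothetical.

```python
# Hypothetical back-of-the-envelope check of the TDRSS numbers quoted above.
# Assumes decimal units (1 GB = 1e9 bytes, 1 Mbit = 1e6 bits) and that the
# whole daily allocation is transferred during the coverage window.

SECONDS_PER_HOUR = 3600


def rate_mbps(gb_per_day: float, hours_per_day: float) -> float:
    """Sustained link rate (Mbit/s) needed to move gb_per_day within hours_per_day."""
    bits = gb_per_day * 1e9 * 8
    return bits / (hours_per_day * SECONDS_PER_HOUR) / 1e6


allocated_gb_per_day = 125.0   # ~125 GB/day allocation (from the notes)
coverage_hours = 4.0           # 4 hours/day of TDRSS coverage (from the notes)
shortfall_factor = 40.0        # "at least a factor of 40 too low" (a lower bound)

# Minimum daily volume implied by the shortfall factor.
needed_gb_per_day = allocated_gb_per_day * shortfall_factor

print(f"allocation: {rate_mbps(allocated_gb_per_day, coverage_hours):.1f} Mbit/s during the pass")
print(f"need: >= {needed_gb_per_day / 1000:.1f} TB/day, i.e. "
      f">= {rate_mbps(needed_gb_per_day, coverage_hours):.0f} Mbit/s in a 4 h window "
      f"or >= {rate_mbps(needed_gb_per_day, 24):.0f} Mbit/s if spread over 24 h")
```

With these inputs the current allocation corresponds to roughly 69 Mbit/s during each pass, while the factor of 40 implies a requirement of at least 5 TB/day.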
How much data traffic do you anticipate once the data is stored?
- Any such traffic (with real data) will be tiny compared to distributing sims.
What is the cost model for data movement? (GG)
- "FedEx" model is fully costed; buying more bandwidth is not.
L3: Software Infrastructure
What database options have you looked into? (SH)
- Depends on how large the metadata will be (which in turn partly depends on how transients go).