Technology Development Telecon - (DM) Data Management


Charge to CMB-S4 Technology Development Working Group

In the first edition of the Technology Book, the experimental CMB community summarized the current state of CMB technology and evaluated its technical readiness with a 5-level Technology Status Level (TSL) and its manufacturing readiness with a 5-level Production Status Level (PSL). For each technology, we identified the Technology Development (TD) efforts necessary to advance it for possible use in CMB-S4. As a next step in this collaborative, community-wide effort, the CMB-S4 TD prioritization working group will evaluate TD topics based on the impact they have on cost, schedule, and science return. By the time of the Argonne meeting (March 2018), the working group will produce a prioritized list of the TD topics that the community should pursue to ensure timely maturity of the technologies that will enable the successful advancement of the project.

We have grouped the relevant technologies into the following areas to tackle this immense task: Telescope and Site; Cryogenics, Cryostats and Optics; Detectors and Readout; and Data Management. Calibration of evaluation metrics across the subgroups is important for fair comparison of the TD topics. In addition, many TD topics are inter-dependent. To capture these ideas, the overall working group will communicate across all the subgroups in monthly combined group meetings.

The scope of each subgroup is as follows:

  • Telescope and site: Covers telescope, mount, site, power generation, etc.
  • Cryogenics, cryostats and optics: Covers cryogenics (4K and mK), cryostats, windows, filters, lenses, HWP, etc.
  • Detector and readout: Covers detectors (detector array and holder) and readout (warm/cold), etc.
  • Data management: Covers DAQ, data transfer, simulation, analysis, publication, etc.

Tab-Separated Table

https://docs.google.com/spreadsheets/d/101ncyzfDAHrTF9O0rTPRGbX6dqut_WdfVA7-2ck9qGQ/edit?usp=sharing

Milestone

  • October (combined call on Nov 2)
    • Define the baseline against which the impact of TDs will be compared
    • Start populating the list of TD items and evaluate some entries
    • Make modifications/improvements to the organization method if necessary
  • November (combined call on Nov 30)
    • Continue to populate and evaluate the list of R&D items as a sub-group
    • Draft list by end of month
    • Evaluation may not be complete by this time
  • December (combined call on Jan 04)
    • Sub-group list with priority evaluation by end of month, so that we can start combining lists in 2018
  • January (combined call on Feb 1, maybe more combined calls)
    • Start normalization / combining of lists from different groups
    • Draft of combined list by end of month
  • February (combined call on March 1, maybe more combined calls)
    • Modify/fine-tune the combined list
    • Discuss what we'll show at the CMB-S4 workshop at ANL
  • March
    • CMB-S4 workshop at ANL: Present at the CMB-S4 meeting


CMB-S4 Technology Development Telecon 2018-2-9 Subgroup Meeting: Data Management

  • Agenda
    • Review the draft slides (cmbs4dmtdwg_anl.pdf) for the Argonne Meeting

CMB-S4 Technology Development Telecon 2017-12-19 Subgroup Meeting: Data Management

  • Agenda
    • Start filling in the Impact and TD columns of the DM spreadsheet
  • Notes
    • Present: Joy, Laura, Nathan, Ken, Christian
    • Group goes through the spreadsheet list and assigns names to each TD
    • The action item for each assignee for each item is to:
      • Define/Estimate what is the current baseline
      • Define what the TD goal would be and how it would improve the baseline
      • Assign numbers (Impact, etc) to evaluate the TD
    • One difficulty, mentioned by Ken and agreed with by others, is that many of these items don't really have a baseline yet
    • Group agrees that many of the "Science Analyses" items do not really belong to the DM group -- this group's goal is to make sure the technology and software will be ready to analyse S4 data, not to develop the science itself (e.g. improving lensing predictions). Because the data volumes in the map domain do not change substantially from now to S4, there is not much for this group to do on many of the science-analysis tasks. The main job in this area should be to ensure standardisation/choice of a map pixelisation and of data formats for exchange (maps, power spectra, etc.); a minimal example of such a shared map convention is sketched after these notes. We have greyed out the areas we believe don't belong in the DM TD memo.
  • Action Items
    • all: For the items your name is associated with, define the baseline and the possible TD, and start assigning an impact to the TD
    • all: Have some progress ready for the Jan 4 combined call
    • all: Next DM meeting, with spreadsheet filled, on Jan 12
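
As a concrete illustration of the standardisation point above, the sketch below shows map exchange through an agreed HEALPix pixelisation and FITS container using healpy. It is illustrative only: the nside, coordinate system, units, and file name are assumptions for the example, not agreed CMB-S4 conventions.

    # Minimal sketch of a shared map-exchange convention (illustrative only;
    # nside, coordinates, units, and file name are assumptions, not S4 standards).
    import healpy as hp
    import numpy as np

    NSIDE = 512                          # assumed common pixelisation
    NPIX = hp.nside2npix(NSIDE)

    # Toy IQU maps in an assumed common unit convention (uK_CMB).
    iqu = np.zeros((3, NPIX))

    # Write with explicit metadata so any group can read it back unambiguously.
    hp.write_map("s4_example_iqu.fits", iqu, coord="C",
                 column_units=["uK_CMB"] * 3, overwrite=True)

    # Any other pipeline reads the same container back, with its header.
    maps_in, header = hp.read_map("s4_example_iqu.fits", field=(0, 1, 2), h=True)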


CMB-S4 Technology Development Telecon 2017-11-17 Subgroup Meeting: Data Management

  • Agenda
    • Continue to go over the DM tab of the TD spreadsheet (drawn from the CDT report) identifying any gaps, determining which elements do not require technology development per se, and starting to flesh out the items which do.
  • Notes
    • Present: Joy, Ken, Colin, Nathan, Laura
    • Joy notes that the "Impact" and "R&D" columns should start getting filled in soon. The DM group is to define its own 1-5 impact scale, with normalization between groups to follow.
    • Science Analysis/Foregrounds:
      • Colin: This is an important issue already, work is already happening, and no huge increase in data size is foreseen. The increase in sensitivity means foreground cleaning will have to be better. This is not a huge TD effort, as the work is already being done
      • Ken: Will there be an issue from different telescopes not measuring the same modes (e.g. Chile vs. the South Pole) for foreground separation?
      • Colin: we do that for BICEP/Keck with Planck data; it is accounted for in the covariance analysis. With very different mode coverage, it will set a lower limit on foreground removal. The bottom line is that the filtering of the mode coverage has to be accounted for in the analysis.
    • Science/Results:
      • General discussion about code standardisation: Laura mentions we should talk about standardisation across maps, polarization direction, etc. Joy agrees and, more generally, thinks a BIG technology development effort for S4 is getting many people to use the same code, within at most one or two pipelines, etc.
      • Verification & Validation: Colin says we need to build some standard for validation of pipeline code across S4 that doesn't just rely on 'it worked in the past'. Build some unit testing. General discussion of one vs. multiple pipelines. If two pipelines exist, they should have component checks along the way so that the various stages can be compared (a toy example of such a stage-comparison test is sketched after these notes).
      • Null tests: There should be some form of standardisation of null tests across telescopes, even if they won't be performing exactly the same ones (beam size, location). Discussion of blind analysis: it was done on QUIET, and to some extent on ACT (less for the power spectrum, but yes for B-modes). Colin advocates methodical as more important than blind; the collaboration will need to weigh in. But we DON'T want to operate in a mode where the analysis is fine-tuned to produce a desired result. Standardisation of null tests at the telescope-map level (though different locations/aperture sizes might need different things). Null tests need to be performed on the shortest time scales too. A toy example of a map-level null test is sketched after these notes.
      • Theory/Modeling: On Planck, the computation of the power spectrum had to be improved to match the accuracy of the measurements. Similarly, improvements to the lensing and neutrino calculations may be necessary.
      • Feedback: Need to have simulation pipelines integrated closely enough with the TOD2Map pipelines that we can iterate on improving experiment models. Colin notes that currently Lloyd's simulation telecons are iterating on experiment models (bandpasses, beam size) and treating map-based systematics in a generic manner, putting constraints on overall additive systematic effects, etc.
    • Simulations:
      • General: Lloyd's telecons: there is already a structure in place to start doing map-based simulations and forecasting: simulated maps using simple prescriptions for the noise (not time-ordered data), using HEALPix at nside 512, placed on NERSC, etc. Maps originate from various places and are made available to everyone, and then people use their own tools for analysis. A toy example of this kind of map-domain simulation is sketched after these notes.
      • Time Ordered Data simulations: this is probably where the largest development effort is necessary, to bring hardware and software up to date to deal with the large increase in data volume. This is a lot of work, and we also need to figure out to what extent, and for what exactly, we want TOD simulations.
      • Instrument Systematics: Colin advocates that, first, generic simulations can put constraints on overall additive systematics, band centers, and gains. These can then provide guidance for the instruments and real-life problems. We can move from generic to specific as the instrument design progresses
    • Discussion of how analysis fits into the TD work.
      • Nathan points out there are some technology items (like computing resources)
      • Laura points out DOE does have 'sections' for exactly the work we're doing: data management / project management, scaling of software
      • Joy says the question is: what are the areas of software/algorithm development that need to be highlighted and worked on so that the analysis of S4 data is possible and organized in a coherent flow?
  • Action Items
    • all: Meet in two weeks Dec 1 11am PT
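
As a rough illustration of the standardised null tests discussed above, here is a toy map-level null-test sketch: difference two half-dataset maps (so the signal cancels) and compare the power in the difference to signal-free Monte Carlo realisations. Everything here (inputs, nside, lmax, the chi-square form) is a placeholder for illustration, not an agreed S4 procedure.

    # Toy map-level null test (illustrative only; inputs are random arrays).
    import numpy as np
    import healpy as hp

    def null_test_chi2(map_a, map_b, noise_mc_a, noise_mc_b, lmax=128):
        """Difference two half-dataset maps and compare the spectrum of the
        difference against signal-free Monte Carlo difference spectra."""
        diff = 0.5 * (map_a - map_b)                      # signal should cancel
        cl_data = hp.anafast(diff, lmax=lmax)

        cl_mc = np.array([hp.anafast(0.5 * (a - b), lmax=lmax)
                          for a, b in zip(noise_mc_a, noise_mc_b)])
        mean, std = cl_mc.mean(axis=0), cl_mc.std(axis=0)

        ell = np.arange(2, lmax + 1)                      # skip monopole/dipole
        chi2 = np.sum(((cl_data[ell] - mean[ell]) / std[ell]) ** 2)
        return chi2, len(ell)                             # compare chi2 to dof

    if __name__ == "__main__":
        nside, nmc = 64, 20
        npix = hp.nside2npix(nside)
        rng = np.random.default_rng(0)
        half_a, half_b = rng.normal(size=(2, npix))
        mc_a, mc_b = rng.normal(size=(2, nmc, npix))
        chi2, dof = null_test_chi2(half_a, half_b, mc_a, mc_b)
        print(f"chi2 = {chi2:.1f} for {dof} degrees of freedom")

A pass would be a chi-square consistent with the number of degrees of freedom; a standardised version of this kind of check could then be run per telescope, per data split, and per time scale.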

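As a rough sketch of the map-domain simulation setup described above: a CMB realisation from a toy spectrum plus a simple white-noise prescription at nside 512. Only the HEALPix/nside-512 choice comes from the notes; the spectrum, noise depth, and file name below are placeholders for illustration.

    # Toy map-domain simulation in the spirit of the setup described above:
    # a CMB realisation plus a simple white-noise prescription at nside 512.
    # The spectrum, noise depth, and file name are placeholders, not S4 values.
    import numpy as np
    import healpy as hp

    NSIDE = 512
    NPIX = hp.nside2npix(NSIDE)

    # Toy temperature spectrum as a stand-in for a real theory C_ell.
    ell = np.arange(3 * NSIDE, dtype=float)
    cl_tt = np.zeros_like(ell)
    cl_tt[2:] = 1e3 / ell[2:] ** 2.5

    # CMB realisation on the sphere from the toy spectrum.
    cmb = hp.synfast(cl_tt, NSIDE)

    # Simple white-noise prescription: assumed 10 uK-arcmin map depth.
    depth_uk_arcmin = 10.0
    pix_area_arcmin2 = hp.nside2pixarea(NSIDE, degrees=True) * 3600.0
    noise = np.random.normal(0.0, depth_uk_arcmin / np.sqrt(pix_area_arcmin2), NPIX)

    # The kind of product that would be posted (e.g. at NERSC) for everyone
    # to analyse with their own tools.
    hp.write_map("sim_map_nside512.fits", cmb + noise, overwrite=True)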

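On the verification-and-validation point above, a minimal sketch of a stage-comparison unit test is shown below. The pipeline functions and tolerance are invented for illustration; the idea is only that two pipeline implementations expose intermediate stages that can be checked against each other on the same input.

    # Toy pytest-style check comparing two hypothetical pipeline filtering
    # stages on the same input (names and tolerance are made up for illustration).
    import numpy as np

    def filter_stage_pipeline_a(tod: np.ndarray) -> np.ndarray:
        # Stand-in for pipeline A's filtering stage: remove the mean.
        return tod - tod.mean()

    def filter_stage_pipeline_b(tod: np.ndarray) -> np.ndarray:
        # Stand-in for pipeline B's filtering stage: subtract a 0th-order polynomial fit.
        x = np.arange(tod.size)
        return tod - np.polyval(np.polyfit(x, tod, 0), x)

    def test_filter_stages_agree():
        rng = np.random.default_rng(1)
        tod = rng.normal(size=10_000)
        np.testing.assert_allclose(filter_stage_pipeline_a(tod),
                                   filter_stage_pipeline_b(tod), atol=1e-10)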
CMB-S4 Technology Development Telecon 2017-11-10 Subgroup Meeting: Data Management

  • Agenda
    • Go over the DM tab of the TD spreadsheet (drawn from the CDT report) identifying any gaps, determining which elements do not require technology development per se, and starting to flesh out the items which do.
      • Next Milestone:
        • November (combined call on Nov 30)
        • Continue to populate and evaluate the list of R&D items as a sub-group
        • Draft list by end of month -- evaluation may not be complete by this time
    • Set the telecon frequency/schedule
  • Notes
    • Present: Julian B, Joy D, Ken G, Nathan W, Graca R, Yuji C.
    • Computation Resources -- not a technology development. We will use available DOE/NSF resources, and probably buy ~$100k of computers / special hardware, but there is no money to build a dedicated supercomputer
    • DAQ: the TD effort is technology dependent; for example, for uMux there is stuff to do. That might be on the Readout group's side though
    • Compression might need some R&D -- currently using FLAC for SPT3G, but it's limited: it can't handle more than 24-bit samples (a possible workaround is sketched after these notes)
    • Transmission: an issue for the South Pole. In real time we can send optimally compressed / downsampled data. Question of how many computers you want there.
    • On-site storage: very unlikely to have the resources to keep the entire dataset on spinning disk
    • Time Domain:
      • Live Monitoring: huge data rate; current live monitors are probably not going to scale (see the back-of-the-envelope rate estimate after these notes). Can look at LSST-type alerts. You have to plan all kinds of hierarchical observation checks, from daily to weekly to monthly. Develop something to catch issues early; some checks are compute-intensive, some are not. For the compute-intensive ones, we have to figure out how to do that at the South Pole given the limited transmission bandwidth. Risk is 'small' (i.e. the path forward is relatively clear), but there is lots of work to be done.
      • Pre-processing: the framework needs to be compute- and human-efficient (lots of data, and a framework common to a very large collaboration now). This needs a significant development effort. It will partly come from developing simulations to assess systematic errors as a precursor to systematics mitigation.
      • Map making: work to be done to scale to S4 data. Characterization with covariance matrices, Monte Carlo, etc. will be a big piece of it.
    • Science:
      • Foregrounds: the large-dataset issue is already dealt with at the map level, and foreground algorithms are going to get progressively better through S3, so changes in algorithms seem to be incremental (unless we need to deal with simulations right now)
  • Action Items
    • all: Meet next week Nov 17 11am PT
    • all: look through the spreadsheet and start thinking about filling in the "Impact" and "R&D" columns
    • Joy: check with Toki on when the columns need to be filled and what the 1-5 numbers mean
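
To give a feel for the scale behind the transmission, storage, and monitoring concerns above, here is a back-of-the-envelope raw data-rate estimate. The detector count, sample rate, and sample size are purely illustrative assumptions, not CMB-S4 design numbers.

    # Back-of-the-envelope raw data-rate estimate (all inputs are assumptions
    # for illustration, not CMB-S4 design values).
    n_detectors = 500_000        # assumed total detector count
    sample_rate_hz = 150.0       # assumed per-detector sample rate
    bytes_per_sample = 4         # assumed 32-bit samples, before compression

    rate_bytes_s = n_detectors * sample_rate_hz * bytes_per_sample
    print(f"raw rate: {rate_bytes_s / 1e6:.0f} MB/s")             # ~300 MB/s
    print(f"per day:  {rate_bytes_s * 86400 / 1e12:.1f} TB/day")  # ~26 TB/day

Numbers of this order are what drive the need for compression, downsampling before transmission from the South Pole, and hierarchical (daily/weekly/monthly) monitoring plans.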

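On the FLAC limitation mentioned in the notes above, one possible workaround (sketched here with plain NumPy, purely for illustration; this is not the SPT3G implementation) is to split each 32-bit sample into a 24-bit coarse stream and an 8-bit residual stream, each of which then fits within FLAC's sample-depth limit and can be compressed separately.

    # Sketch of splitting 32-bit detector samples so each stream fits within
    # a 24-bit-per-sample codec limit (illustrative only; not the SPT3G scheme).
    import numpy as np

    def split_for_flac(samples_32bit: np.ndarray):
        """Split signed 32-bit samples into a 24-bit 'coarse' stream and an
        8-bit 'residual' stream; together they reconstruct the original exactly."""
        coarse = samples_32bit >> 8                          # fits in 24 signed bits
        residual = (samples_32bit & 0xFF).astype(np.uint8)   # low 8 bits
        return coarse, residual

    def recombine(coarse: np.ndarray, residual: np.ndarray) -> np.ndarray:
        return (coarse.astype(np.int64) << 8) | residual

    x = np.array([-123456789, 0, 987654321], dtype=np.int32)
    c, r = split_for_flac(x)
    assert np.array_equal(recombine(c, r), x.astype(np.int64))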

CMB-S4 Technology Development Telecon 2017-10-05 Subgroup Meeting: Data Management

  • Agenda
    • Charge to the group
    • CDT report
    • Schedule & milestones
  • Notes
    • Present: Colin B, Julian B, Yuji C, Joy D, Salman H
    • Apologies: Laura N
    • Charge:
      • DM is a bit different from the other areas, and couples to them (e.g. in experiment modeling, design validation, systematics mitigation, ...)
    • CDT report:
      • Julian walked people through the current draft of the DM section of the CDT report, inviting comments and additions
        • Instrument Data (acquisition, transmission, storage)
        • Time Domain (live monitoring, pre-processing, map-making)
          • Time domain processing will be different for different telescopes (e.g. with or without a HWP), but should still all run within a common framework; a sketch of what such a common interface could look like follows these notes.
          • Live monitoring needs to combine data from multiple telescopes
        • Science Analysis (foregrounds, results, feedback)
          • Analysis complexity is increased by having multiple instruments of multiple types, as well as the hybrid scanning strategy.
          • Feedback can also inform changes in the instrument configuration and/or scanning strategy.
        • Simulations (experiment modeling, sky modeling, data generation)
        • Publication (data products, software tools, archiving)
          • Need to include internal data distribution; data and software standards.
        • Computational Resources (science data facility)
    • Schedule & milestones
      • Doodle poll still open and active; discussion about telecon frequency (tbd)
      • Milestones:
        • Develop a comprehensive list of DM requirements coupled to the project timeline.
        • Determine the TD required for these.
        • Prioritize this TD based on its impact on risk, cost, and science return.
        • Take this prioritized list to the full group for integration.
  • Action Items
    • Julian to update CDT report to include the comments here (done).
    • Circulate telecon time once doodle polling is complete; next telecon is week of October 16th.
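
One way to picture the "different telescopes, common framework" point from the time-domain discussion above is sketched below. All class and method names are made up for illustration and do not correspond to any existing CMB-S4 software; the per-telescope steps are crude placeholders.

    # Hypothetical sketch of a common time-domain processing interface that
    # telescope-specific steps (e.g. with or without a HWP) plug into.
    # All names are illustrative, not existing CMB-S4 code.
    from abc import ABC, abstractmethod
    import numpy as np

    class TimeDomainProcessor(ABC):
        """Common interface every telescope's pre-processing must implement."""

        @abstractmethod
        def preprocess(self, tod: np.ndarray) -> np.ndarray:
            """Return cleaned time-ordered data ready for map-making/monitoring."""

    class HWPTelescope(TimeDomainProcessor):
        def __init__(self, hwp_freq_hz: float, sample_rate_hz: float):
            self.hwp_freq_hz = hwp_freq_hz
            self.sample_rate_hz = sample_rate_hz

        def preprocess(self, tod: np.ndarray) -> np.ndarray:
            # Crude stand-in for HWP demodulation at 4x the rotation frequency.
            t = np.arange(tod.size) / self.sample_rate_hz
            reference = np.cos(2 * np.pi * 4 * self.hwp_freq_hz * t)
            return tod * reference

    class NoHWPTelescope(TimeDomainProcessor):
        def preprocess(self, tod: np.ndarray) -> np.ndarray:
            # Crude stand-in for scan filtering (e.g. mean/polynomial removal).
            return tod - np.mean(tod)

    def run_common_pipeline(processor: TimeDomainProcessor, tod: np.ndarray):
        """Downstream stages (map-making, live monitoring) see only the interface."""
        return processor.preprocess(tod)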