# LBNL-2016: Simulations & Data Analysis

## Simulations & Data Analysis Parallel Session: Monday, March 7, 10:45 AM - 11:45 AM

**Session Organizers**: Tom Crawford & Julian Borrill

**Chapter Section Coordinators:**

- Time-Ordered Data Processing: Julian Borrill
- Component Separation: Josquin Errard
- Statistics & Parameters: Tom Crawford & Cora Dvorkin
- Sky Modeling: Nick Battaglia & Jacques Delabrouille
- Data Simulation: Julian Borrill
- Forecasting: Jo Dunkley & John Kovac
- Validation & Verification: Julian Borrill
- Implementation Issues: Julian Borrill

(with many thanks to all of the other people who contributed to this draft)

**Charge For This Session**

Our charge for this session, according to the workshop wiki page (https://cosmo.uchicago.edu/CMB-S4workshops/index.php/LBNL-2016:_Cosmology_with_CMB-S4), is: "The goal for each of these sessions is to determine missing content and outstanding questions for their chapter. It is particularly important to address what they need from the larger community. During the Plenary session in the afternoon, the drafters will present their findings with emphasis on the issues for which community input is sought."

With that in mind, we ask each section coordinator to be prepared to discuss (briefly! we only have 10 minutes per section) the one or two questions that define the scope of their section, in particular questions that are still open after this draft of the chapter and questions that require input from the wider meeting, so that we can bring those up in the plenary.

Please gather those questions, along with any supporting figures or other material (but no big slide presentations, please), in the sections below.

## Rapid-fire Sessions (1 per chapter section)

**Time-Ordered Data Processing:**

- Multi-site/multi-telescope pre-processing challenges: should we be targeting a common data model after pre-processing?
- Can we produce cleaned, calibrated timelines as inputs for both map-making and noise/systematics residual estimation?
- Help is needed on the pre-processing & mission characterization subsection.

Notes:

- what data representation(s) can/will we use/deliver? Monte Carlo, pixel matrix, harmonic domain?
- different representations for different challenges
- dependence on sky fraction and resolution

- how to keep track of missing modes?
- TOD sims? pixel mixing matrices? simple 2D Fourier transfer functions with some information about higher-order correlations?
- is this a place where large- and small-aperture platforms split?

- is "mission characterization" part of TOD processing for ground-based instruments?
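As a minimal sketch of the simple 2D Fourier transfer function option for tracking missing modes, the Python below filters flat-sky white-noise realizations with a hypothetical anisotropic high-pass (zeroing modes with |k_x| below a cut, a stand-in for filtering along an assumed horizontal scan direction) and estimates the 2D transfer function as the ratio of output to input power averaged over realizations. The filter, map size, and cut are illustrative assumptions, not a proposed CMB-S4 pipeline. A real TOD filter generally mixes modes, which pixel mixing matrices or TOD sims capture and this diagonal Fourier approximation does not.

```python
import numpy as np

def highpass_x(m, kx_cut):
    """Hypothetical anisotropic filter: zero Fourier modes with |k_x| < kx_cut.

    k_x is in integer Fourier-mode units along axis 1 (the assumed scan direction).
    """
    f = np.fft.fft2(m)
    n = m.shape[1]
    kx = np.rint(np.fft.fftfreq(n) * n)          # integer mode numbers along x
    mask = np.abs(kx)[np.newaxis, :] >= kx_cut   # broadcast the same cut to every row
    return np.real(np.fft.ifft2(f * mask))

def transfer_2d(filter_fn, shape, n_sims=50, seed=0):
    """Estimate T(k_y, k_x) = <|F[filtered]|^2> / <|F[input]|^2> from white-noise sims."""
    rng = np.random.default_rng(seed)
    p_in = np.zeros(shape)
    p_out = np.zeros(shape)
    for _ in range(n_sims):
        m = rng.standard_normal(shape)
        p_in += np.abs(np.fft.fft2(m)) ** 2
        p_out += np.abs(np.fft.fft2(filter_fn(m))) ** 2
    return p_out / p_in

# Because this toy filter is diagonal in Fourier space, T is ~0 in the cut
# columns and ~1 elsewhere; a mode-mixing filter would not give a clean 0/1 mask.
T = transfer_2d(lambda m: highpass_x(m, kx_cut=4), shape=(64, 64))
```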

**Component Separation:**

- multi-site data: differing resolutions lead to a “degraded” resolution for the final CMB map; common data pre-processing is required.
- pixel vs. harmonic domain: all methods should work independently of this choice, but their computational needs should be checked.
- Q/U vs. E/B component separation: the implications should be checked on simulations.
- atmospheric residuals: although unpolarized, the atmosphere scales with frequency much like a grey body (the spectral form commonly used for thermal dust).
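To make the grey-body comparison concrete, here is a minimal Python sketch of a modified blackbody (grey body) intensity, I_ν ∝ ν^β B_ν(T), normalized at a reference frequency. The temperature, spectral index, and band centres are illustrative assumptions, not fits to atmospheric or dust data. The point is only that a component with this spectral shape enters a multi-frequency fit through the same two parameters, so an atmospheric residual with a similar shape could be hard to separate from dust using frequency information alone.

```python
import numpy as np

H = 6.62607015e-34   # Planck constant [J s]
K_B = 1.380649e-23   # Boltzmann constant [J/K]
C = 2.99792458e8     # speed of light [m/s]

def planck(nu_ghz, temp_k):
    """Blackbody specific intensity B_nu(T) in SI units [W m^-2 Hz^-1 sr^-1]."""
    nu = np.asarray(nu_ghz, dtype=float) * 1e9
    x = H * nu / (K_B * temp_k)
    return 2.0 * H * nu**3 / C**2 / np.expm1(x)

def grey_body(nu_ghz, temp_k, beta, nu0_ghz):
    """Modified blackbody (nu/nu0)^beta * B_nu(T) / B_nu0(T), equal to 1 at nu0_ghz."""
    nu = np.asarray(nu_ghz, dtype=float)
    return (nu / nu0_ghz) ** beta * planck(nu, temp_k) / planck(nu0_ghz, temp_k)

# Illustrative (assumed) parameters: T = 20 K, beta = 1.5, referenced to 150 GHz
freqs = np.array([30.0, 90.0, 150.0, 220.0, 270.0])
sed = grey_body(freqs, temp_k=20.0, beta=1.5, nu0_ghz=150.0)
```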

Notes:

- time-dependent atmosphere
- representation of missing modes: is there a compact basis?
- do we need CMB maps? can we simply estimate parameters from frequency maps?
- large angular scales are hard
- cross-check of cross-frequency analyses
- higher-order statistics
- always a compromise ... do both & test for consistency

- component separation must account for bandpass mismatch and uncertainty, relative calibration
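One concrete way to picture the "do both & test for consistency" point is the internal linear combination (ILC), a standard blind component-separation method swapped in here purely for illustration; these notes do not prescribe a method. Given the frequency-frequency covariance C of the maps (estimated per pixel region or per multipole bin, which is where the pixel-vs-harmonic choice enters) and the CMB mixing vector a (all ones in thermodynamic units), the minimum-variance weights with unit CMB response are w = C^-1 a / (a^T C^-1 a). The 3-channel covariance below is a made-up toy.

```python
import numpy as np

def ilc_weights(cov, mixing):
    """Minimum-variance weights w with w @ mixing == 1 (unit response to CMB)."""
    cinv_a = np.linalg.solve(cov, mixing)   # C^{-1} a without forming the inverse
    return cinv_a / (mixing @ cinv_a)

# Toy 3-frequency covariance (arbitrary units, assumed for illustration only):
# a common CMB term plus uncorrelated white noise per channel, no foregrounds.
mixing = np.ones(3)
cov = 1.0 * np.outer(mixing, mixing) + np.diag([0.5, 0.2, 0.8])
w = ilc_weights(cov, mixing)
residual_var = w @ cov @ w   # variance of the ILC map (CMB + residual noise)
```

With only CMB plus uncorrelated noise in the covariance, ILC reduces to inverse-noise weighting; it is foreground terms in C that make the weights nontrivial.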

**Statistics & Parameters:**

Attached file: Dvorkin Statistics and Parameters CMBS4 March2016.pdf

Outstanding questions / What do we need from the community:

- For Science Book writing:
  - How do we coordinate with the discussions of parameter-constraint-related issues in other sections (like de-lensing)?
- For general planning (in no particular order):
  - Are our covariance approximations sufficient? (Are there more covariance issues we haven't thought about? What about combining with optical data?)
  - At what stage do we combine data sets? (This may be treated in other sections too.)
  - Do we have a plan for E/B separation? Will this be done the same way in every data set?
  - What is the path toward more realistic de-lensing estimates?
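For context on the E/B question, the simplest separation (flat sky, no mask) rotates the Fourier modes of Q and U by twice the wavevector angle φ: E = Q cos 2φ + U sin 2φ and B = -Q sin 2φ + U cos 2φ. The Python sketch below assumes a periodic square patch and one particular angle convention (other conventions flip the sign of B). It deliberately ignores masks, which is where E-to-B leakage, and hence the need for an agreed plan (e.g. pure-B estimators), comes in.

```python
import numpy as np

def qu_to_eb_flat(q, u):
    """Flat-sky Q/U -> E/B on a periodic patch (no mask, no apodization).

    Returns real-space E and B maps. The angle convention phi = atan2(l_y, l_x),
    with axis 0 as y and axis 1 as x, is an assumption of this sketch.
    """
    ny, nx = q.shape
    ly = np.fft.fftfreq(ny)[:, np.newaxis]
    lx = np.fft.fftfreq(nx)[np.newaxis, :]
    phi = np.arctan2(ly, lx)
    qf, uf = np.fft.fft2(q), np.fft.fft2(u)
    ef = qf * np.cos(2 * phi) + uf * np.sin(2 * phi)
    bf = -qf * np.sin(2 * phi) + uf * np.cos(2 * phi)
    return np.real(np.fft.ifft2(ef)), np.real(np.fft.ifft2(bf))
```

As a usage check, a pure-Q plane wave along x is pure E, while the same wave along the diagonal (x + y) is pure B in this convention.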

Notes:

- how/when do we combine data from different platforms? want to avoid separation by legacy
- how to provide precision covariance estimates: analytic, Monte Carlo, ...
- survey design accounting for differences in platform (sky coverage, multipole range, ...)
- where do the simulations break down for high-resolution/high-sensitivity delensing?
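As a minimal illustration of the analytic vs. Monte Carlo covariance question, the Python sketch below compares the Gaussian (Knox-formula) variance of a power-spectrum estimate, Var(C_hat_ℓ) = 2 (C_ℓ + N_ℓ)^2 / ((2ℓ+1) f_sky), with the scatter across simulated realizations. It assumes Gaussian, isotropic signal and noise on the full sky (f_sky = 1), which is exactly the regime where the two agree; masks, anisotropic noise, and non-Gaussian foregrounds are where they can diverge and simulations become necessary. The values of ℓ, C_ℓ and N_ℓ are arbitrary.

```python
import numpy as np

def knox_variance(ell, cl, nl, fsky=1.0):
    """Analytic Gaussian variance of a power-spectrum estimate at multipole ell."""
    return 2.0 * (cl + nl) ** 2 / ((2 * ell + 1) * fsky)

def mc_variance(ell, cl, nl, n_sims=20000, seed=1):
    """Monte Carlo variance of C_hat = mean of squared modes at one multipole.

    Full sky is assumed, so the multipole has 2*ell+1 independent real Gaussian
    degrees of freedom, each with variance cl + nl.
    """
    rng = np.random.default_rng(seed)
    alm = rng.normal(0.0, np.sqrt(cl + nl), size=(n_sims, 2 * ell + 1))
    c_hat = np.mean(alm ** 2, axis=1)
    return np.var(c_hat, ddof=1)

ell, cl, nl = 50, 1.0, 0.3   # arbitrary illustrative values
ratio = mc_variance(ell, cl, nl) / knox_variance(ell, cl, nl)  # close to 1 here
```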

**Sky Modeling:**

*Galactic foregrounds*

- Can we agree on a set of nominal model sims plus a set of extensions that include more complex foreground behaviour?
- Example of new code created for this purpose: https://github.com/bthorne93/PySM
- Is there a planned release of PSM soon?

Notes:

- How do we capture the foreground complexity needed to test our methods?
- We won't know the answer to this until we have more data.
- Can we at least span the optimistic/pessimistic range?

- How should the bands be optimized across atmospheric windows?
- What bandwidths? Overlapping bands? We can fit ~7 bands with 20-30% fractional bandwidth between 30 and 300 GHz.

- Given the complexity of large scales, what angular scales should CMB-S4 target?
- Additional components not currently in models that could be show-stoppers
- Hard to implement and span possibilities
- Foregrounds suck!
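A back-of-the-envelope check of the estimate that ~7 bands with 20-30% bandwidth fit between 30 and 300 GHz: if contiguous, non-overlapping bands each have fractional bandwidth b = Δν/ν_c, consecutive band centres differ by a factor r = (1 + b/2)/(1 - b/2), so the number that fits in [ν_min, ν_max] is floor(ln(ν_max/ν_min) / ln r). The Python sketch assumes idealized top-hat bands and ignores the atmospheric lines the real layout has to avoid, so it is an upper bound rather than a band design.

```python
import math

def max_contiguous_bands(nu_min_ghz, nu_max_ghz, frac_bw):
    """Number of contiguous top-hat bands of fractional width frac_bw in [nu_min, nu_max].

    Idealized: no gaps for atmospheric lines and no band overlap.
    """
    ratio = (1 + frac_bw / 2) / (1 - frac_bw / 2)  # centre-to-centre frequency ratio
    return math.floor(math.log(nu_max_ghz / nu_min_ghz) / math.log(ratio))

for bw in (0.2, 0.25, 0.3):
    print(f"{bw:.0%} bandwidth: {max_contiguous_bands(30.0, 300.0, bw)} bands")
```

This gives 11, 9 and 7 bands for 20%, 25% and 30% bandwidth. The 30% case matches the ~7 quoted above; narrower bands fit more in principle, but the oxygen lines near 60 and 118 GHz and the water line at 183 GHz remove part of the usable range.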

*Secondaries and extragalactic foregrounds*

- The key challenges for the extragalactic sky models of CMB-S4 are to provide fast and self-consistent simulations of CMB secondary anisotropies and extragalactic sources.
- Modular approach with many groups contributing.

Questions:

- What format? HEALPix?
- Reasonably well in hand, but when do we need this by? Deadlines are always good ...
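On the format question, HEALPix map sizes follow directly from the resolution parameter: N_pix = 12 N_side^2, with a mean pixel size of roughly sqrt(4π / N_pix). The Python sketch below is plain arithmetic with no healpy dependency; the choice of N_side values and of float32 I, Q, U maps is an assumption for illustration. It shows how quickly full-sky map sets grow, which matters when deciding what gets shipped between groups.

```python
import math

def healpix_npix(nside):
    """Number of pixels in a full-sky HEALPix map."""
    return 12 * nside ** 2

def healpix_resolution_arcmin(nside):
    """Approximate pixel size: square root of the mean pixel solid angle, in arcmin."""
    return math.degrees(math.sqrt(4 * math.pi / healpix_npix(nside))) * 60

def iqu_map_bytes(nside, bytes_per_value=4):
    """Memory for one full-sky I, Q, U map set (float32 by default)."""
    return 3 * healpix_npix(nside) * bytes_per_value

for nside in (512, 2048, 8192):
    print(nside, healpix_npix(nside), round(healpix_resolution_arcmin(nside), 2),
          f"{iqu_map_bytes(nside) / 1e9:.2f} GB")
```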

Notes:

- Not all foregrounds suck!

**Data Simulation:**

- Prioritization of effects to be introduced:
  - instrument: beams, bandpasses, auto- & cross-correlated noise, ...
  - observation: scanning strategy, modulation, atmosphere, ground pickup, ...
- (How) can we achieve consistency of their representations across domains (time/pixel/multipole)?

- What is the requirements timeline (see Forecasting, V&V), including the cost/complexity vs. realism trade-off?

**Forecasting:**

- Be aware that forecasting is done differently for r, for higher-ell TT/TE/EE/kk parameters, and for kSZ/tSZ parameters.
- How many different experimental set-ups should we consider? (bands, sensitivity, sky area)
- How to define "units of effort" to judge trade-offs?

- What noise model should we use (can we draw on Stage-2/3 experiments?), and what systematics assumptions?
- What foreground model(s) should be used at low-ell for r and at high-ell for lensing etc (and can we share with other experiments)?
- Can we agree on role of map-level "mocks" to validate forecasts?

- What shall we assume about delensing residuals for r?
- Can we determine whether we need data below 40 GHz and above 270 GHz?
- What is the forecasting requirements timeline?
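One common, but here assumed rather than agreed, way to express "units of effort" is detector-years, which map onto white-noise map depth via Δ = NET · sqrt(Ω_survey / (N_det · t_obs)). The Python sketch below ignores detector yield, 1/f noise, and scan non-uniformity, and uses hypothetical NET, detector count, and survey parameters purely to show the scaling: depth improves only as the square root of detector count or observing time, and degrades as the square root of sky area. This is temperature depth; polarization depth is conventionally a factor sqrt(2) worse.

```python
import math

ARCMIN_PER_RAD = 180 * 60 / math.pi
SECONDS_PER_YEAR = 365.25 * 24 * 3600

def map_depth_uk_arcmin(net_uk_rts, n_det, years, fsky, efficiency=1.0):
    """White-noise temperature map depth in uK-arcmin for an idealized uniform survey.

    net_uk_rts: per-detector NET in uK*sqrt(s) (hypothetical input).
    efficiency: fraction of wall-clock time yielding usable data (assumed).
    """
    survey_sr = 4 * math.pi * fsky
    t_obs = years * SECONDS_PER_YEAR * efficiency
    depth_uk_rad = net_uk_rts * math.sqrt(survey_sr / (n_det * t_obs))
    return depth_uk_rad * ARCMIN_PER_RAD

# Hypothetical inputs for illustration only
base = map_depth_uk_arcmin(net_uk_rts=350.0, n_det=10_000, years=4, fsky=0.4)
doubled = map_depth_uk_arcmin(net_uk_rts=350.0, n_det=20_000, years=4, fsky=0.4)
```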

**Validation & Verification:**

- Can we produce standardized data sets for V&V? Can these overlap with forecasting?

**Implementation Issues:**

- (Evolving) computing resources & requirements
- Common data interfaces, objects & formats
- Sky & mission models
- Maps!

**Notes on the previous three topics:**

- really, what is the timeline? when do we need the final science book? (answer: not clear)
- some consensus that any Fisher / power-spectrum-based forecast must be verifiable with map-based mocks
- is there an adequate map-based representation of systematics normally treated with TOD sims?
- how to represent noise?
  - N(ℓ), validated by real noise maps

- do we know the computing requirements?
  - not yet, but the data volume is roughly 1000x Planck
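For the option of an analytic N(ℓ) validated by real noise maps, the simplest form is white noise of depth Δ deconvolved by a Gaussian beam of FWHM θ: N_ℓ = Δ^2 exp[ℓ(ℓ+1) θ^2 / (8 ln 2)], with Δ in μK·rad and θ in radians. The Python sketch below also adds an optional knee term, N_ℓ → N_ℓ [1 + (ℓ_knee/ℓ)^α], an assumed parametrization of low-ℓ (e.g. atmospheric) excess noise. The depth, beam, and knee values are placeholders that fits to real noise maps would replace.

```python
import numpy as np

ARCMIN = np.pi / (180 * 60)  # one arcminute in radians

def noise_nl(ell, depth_uk_arcmin, fwhm_arcmin, ell_knee=0.0, alpha=1.0):
    """Beam-deconvolved noise power N_ell in uK^2 (white noise plus optional knee).

    Assumes a Gaussian beam; ell_knee = 0 gives pure white noise.
    With ell_knee > 0, ell must be nonzero.
    """
    ell = np.asarray(ell, dtype=float)
    depth_rad = depth_uk_arcmin * ARCMIN
    sigma = fwhm_arcmin * ARCMIN / np.sqrt(8 * np.log(2))   # beam width in radians
    nl = depth_rad ** 2 * np.exp(ell * (ell + 1) * sigma ** 2)
    if ell_knee > 0:
        nl = nl * (1 + (ell_knee / ell) ** alpha)
    return nl

# Placeholder values: 2 uK-arcmin depth, 1.4 arcmin beam, no knee
ells = np.arange(2, 5001)
nl = noise_nl(ells, depth_uk_arcmin=2.0, fwhm_arcmin=1.4)
```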

## Plenary

**Most pressing question from this chapter: Forecasting definition & timeline.**

- The forecasting group, science groups, and instrument group need to agree on the input parameters to forecasting and the figures of merit that forecasts should return.
- All of these are likely to evolve, requiring more sophistication as the project matures.
- In particular, are we looking at power-spectrum-domain Fisher matrix estimates or something more realistic?
- For the near term, probably spectrum-based Fisher codes validated by (some) map-level sims.

- How can we be sure our foreground modeling is sufficiently flexible that we are not surprised by a foreground issue after we design the experiment?
- What priors do we use in forecasting?
  - tau in particular
  - part of framework definition (define several prior choices that individual forecasting codes can choose from)
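A minimal sketch of the near-term approach discussed above, a spectrum-based Fisher forecast with Gaussian priors (such as one on tau) offered as framework options: for a single spectrum, F_ij = Σ_ℓ (2ℓ+1) f_sky / 2 · ∂_i C_ℓ ∂_j C_ℓ / (C_ℓ + N_ℓ)^2, and an independent Gaussian prior of width σ_p on parameter i adds 1/σ_p^2 to F_ii. The spectrum, noise, derivatives, and prior widths below are toy inputs (no Boltzmann code is called), and a real forecast would use the full multi-spectrum covariance rather than one spectrum.

```python
import numpy as np

def fisher_single_spectrum(ells, dcl_dp, cl, nl, fsky):
    """Gaussian Fisher matrix for one auto-spectrum.

    dcl_dp: array of shape (n_params, n_ell) holding dC_ell/dp_i.
    """
    weight = (2 * ells + 1) * fsky / 2.0 / (cl + nl) ** 2   # per-multipole weight
    return np.einsum("il,jl,l->ij", dcl_dp, dcl_dp, weight)

def add_gaussian_priors(fisher, prior_sigmas):
    """Add independent Gaussian priors; pass np.inf for 'no prior' on a parameter."""
    prior = np.diag(1.0 / np.asarray(prior_sigmas, dtype=float) ** 2)
    return fisher + prior

def marginalized_errors(fisher):
    """1-sigma marginalized errors: square roots of the diagonal of F^{-1}."""
    return np.sqrt(np.diag(np.linalg.inv(fisher)))

# Toy inputs (arbitrary units): an amplitude-like parameter and a second,
# partly degenerate parameter that only receives a prior in the second case.
ells = np.arange(30, 3001, dtype=float)
cl = 1e3 / ells ** 2
nl = np.full_like(ells, 1e-4)
dcl = np.vstack([cl, cl * (1 + 0.1 * np.log(ells / 500.0))])
F = fisher_single_spectrum(ells, dcl, cl, nl, fsky=0.4)
err_free = marginalized_errors(F)
err_prior = marginalized_errors(add_gaussian_priors(F, [np.inf, 0.01]))
```

Keeping the prior as a separate additive step mirrors the idea of defining several prior choices at the framework level that individual forecasting codes can select.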

**Forecasting Timeline Strawman**

https://docs.google.com/spreadsheets/d/1LoSuxKYGeSeGoDN4I20-P2iHVyjMqAEkis9K5Dr8Dz4/edit#gid=0

**Longer-timescale but still very important questions:**

- How and at what stage to combine data from different platforms?
- How to account for modes that some platforms don't have (due to beam or anisotropic filtering/projection)?

- How to model / clean foregrounds and properly account for residuals in parameter estimation?
  - map- or harmonic-space-based component separation vs. power spectrum?
  - if foreground distributions aren't Gaussian and stationary, how can a power spectrum be adequate?
