November 30, 2020 - Colin Bischoff, Julian Borrill, Reijo Keskitalo, Ted Kisner, Sara Simon and Andrea Zonca
Update on December 7, 2020
The simulations used in deriving the results below included a simulated 1% calibration error for each detector and each scanset. This error induces a T->P leakage mode that is responsible for most of the mismatch between the simulation and the observation matrix treatment. To counter the effect of the gain mismatch, we implemented a pair-differencing operation in the S4 simulation pipeline and added deprojection support into OpFiltereBin. The only deprojection template needed to counter the gain mismatch is the input temperature map itself.
Here are the resulting difference maps (filter+bin vs. observation matrix) with the gain mismatch deprojected:
The temperature difference remains unchanged as the deprojection only applies to the pair-difference. Here are the pseudo spectra:
Comparing to the earlier pseudo spectra, we find minimal differences in the pipeline transfer function but a significant reduction in the filtered vs. observation matrix difference, despite the simulation including gain mismatch.
To date, CMB-S4 time-domain simulations have used pipelines build from the simulation and reduction modules available in the TOAST framework. A shortcoming of these simulations has been the absence of a BICEP/Keck-style "observation matrix" that can be used to transform input signal maps into output maps with the appropriate mode loss from time domain filtering.
We have developed a new TOAST operator, OpFilterBin, that applies a stack of filters to the TOD and bins a map. OpFilterBin can also calculate and output an observation matrix consistent with the filter stack. In this post we test OpFilterBin on a 10-day reference design tool simulation to establish the accuracy and performance of the new operator.
The TOAST filter-and-bin map is computed as
where is the pointing matrix, is the diagonal white noise matrix, is the filtering matrix:
and is the template matrix. Each column of is a template that gets projected out from the TOD. For the time being it consists of Legendre polynomials representing the ground and atmosphere. BICEP/Keck-style deprojection templates will be added in the future.
Assuming that the input and output pixelizations are the same, the observation matrix that corresponds to (1) and (2) is
Introduction of noise weighting () in the filtering matrix allows us to mask or down-weight certain samples in filtering while still retaining them in the filtered signal. This is particularly useful for bright point sources.
Note that in this formalism all of the filtering is applied in a single step, rather than in sequence. This avoids introducing previously-filtered modes back into the TOD in subsequent filtering steps when the templates are not orthogonal. It does impose a requirement on the template covariance matrix, : it has to be invertible. This means, for example, that the same polynomial orders should not be present in both the ground and half-scan templates.
As an illustration of the template degeneracy, we used the reference design simulation and the half scan filter order 5. Ground filter order was set to 10. If any of the ground polynomial orders between 0 and 5 are included in the template matrix, the reciprocal condition number of the template covariance matrix is less than . If we only consider ground polynomial orders above 5, the reciprocal condition number is better than .
We simulate 10 full observations of the SAT deep field from the South Pole. The simulated TOD only comprises scalar CMB fluctuations. As in the reference design simulations, each scanset takes 30 minutes and there are 0.25 degree elevation steps between the scan sets. We simulate a sparsely-sampled ST0 optics tube: MFLS1 band with 890 detectors covering a 36-degree FOV. Each half-scan is filtered with a 5th order polynomial and each scanset is filtered with a 10th degree ground polynomial. The sampling rate is 20Hz, and there are a total of 210 scansets in the simulation.
Here we show four maps centered at the same coordinates. "Input map" contains scalar CMB fluctuations and the solar system dipole. "Binned" shows the result of sampling the input map into time domain and binning it. "Filtered" applies the same binning after first regressing out the polynomial templates. "Obs.matrix X input" shows how our observation matrix models the filtering.
Here we look at the difference, "filtered" - "Obs.matrix X input", in IQU:
For a more comprehensive analysis, here are the pseudospectra of the maps from above. For all maps except "input", we are inverse variance weighting the pixels based on the Q and U noise level. The "input" spectrum is full sky. The "filtered" and "obs.matrix" curves are almost on top of each other, as is apparent from the difference map ("obs.matrix-filtered") power spectrum. E-B mode mixing from incomplete sky coverage is very apparent in the "binned" BB spectrum.
- It takes roughly 400 seconds to process one scanset for one detector using 4 light-weight KNL cores.
- Most of the time is spent accumulating the observation matrix.
- For our proof-of-concept test we allocated 1024 Cori KNL nodes, each with 68 KNL cores.
- It took 150 minutes to process the entire simulation into a filtered map and the accompanying observation matrix.
- Combining and writing out the observation matrix took 38 minutes. This part is likely very sub-optimal with the co-add running serially using the scipy sparse matrix facilities.
- The accumulation cost will scale linearly with scansets and detectors and by the square of the scanset length.
- The matrix co-add cost will scale with the logarithm of the MPI communicator size and depends also on the exact distribution of observed pixels.
- Naive calculation would suggest that processing one year of data for three fully-populated optics tubes using the same 1024 nodes would take 34 days or 65M NERSC hours (approximately one years' allocation for the entire CMB repository in 2020). However, the calculation is highly redundant and the observation matrices will only need to be calculated using a small subset of the data.
- Processor architecture matters. A single Haswell core performs the same scanset calculation as 4 KNL cores in the same 400 seconds. That makes the throughput of a Cori Haswell node (32 cores) double the throughput of a KNL node (68 cores) for this calculation. The calculation is also well-suited for the GPU.