Fast, software-based muon tagging with FlashCam

In the context of our involvement in the Cherenkov Telescope Array (CTA), we are currently in the process of finalising the firmware of our readout system of the FlashCam cameras destined to be installed in the middle-sized CTA-South telescopes. One question that came up was whether we could somehow support the quick identification of muon rings in triggered images – and it turns out we can reduce the amount of data the server has to look at by a factor of >100 just by adding a few bits to our event headers.

Relevance of muon rings for IACT calibration

Muon rings are caused by relativistic local muons passing through the telescope mirror and leaving a nearly instantaneous, ring-like pattern of Cherenkov photons on the camera surface. Because the expected photon density in these rings can be predicted from the measured diameter (which is related to the muon energy), muon rings have been essential in improving the calibrationFor more information I can recommend this nicely systematic review of the muon calibration method and its uncertainties. of imaging atmospheric Cherenkov telescopes: they can be used to calibrate the full optical throughput of the telescope (mirrors, camera entrance window, light concentrators, photosensors) with a wavelength spectrum that is very close to what is collected from the gamma-ray showers of actual interest (especially when using a protective window with a UV cut-off).

Full rings are rare

Measuring and fitting the shape and total charge (in terms of photo-electrons) of such a ring—and comparing that to a model—will yield a good estimate of the optical throughput. However, only a subset of the muons are truly good calibration sources: those that are nearly on-axis (parallel to the telescope pointing) and hitting the mirror at low impact distance (close to the mirror centre). If too far off this ideal, considerable asymmetries evolve, rings turn into arcs, and part of the image may be cut off at the edge of the camera’s field of view. Hence, for calibration purposes we are mainly interested in full rings, which arrive at a rate of just a few Hz.For arrays with one or more large telescopes such as H.E.S.S., muon arcs may be a useful tool for hadron rejection though. This is still good enough to get a precise estimate of the optical throughput on the scale of tens of minutes—as long as we make sure to actually catch most of the muons, which is not trivial.

The benefit of a large, multi-telescope system such as CTA is that it allows you to set the trigger threshold of each telescope low, close to the noise level, and separate chance coincidences (noise) from actual showers by requiring a multiplicity within a small time window. This coincidence requirement is one of the most powerful ways to enhance the signal-to-noise ratio—but at the same time kills many of the clean, local muons you would measure in only one telescope. So how do we keep them for calibration purposes?

Software-based muon tagging

Luckily, the CTA will not apply its coincidence trigger in hardware but in software. This means that camera servers read out the full event information and waveforms of locally triggered events and send lists of event metadata (mostly timestamps) to a central software process, the Software Array Trigger (SWAT), for coincidence matching between neighbouring telescopes. The camera servers keep the events in memory for a short period of time until they receive answer from the SWAT which event should be sent on to the array data handler (ADH) and which should be discarded.

Easy enough, cameras can include a bit in each event’s header whether it looks like a local muon that should be kept. To do this, a camera server in principle needs to look into all events for ring-like patterns. At data rates of up to several GByte s^–1, fast methods to tag muons are absolutely essential. Luckily, the tagging doesn’t need to be very precise – efficiency should be good (and there’s actually a requirement on that), but purity can be in the single-digit percentages without leading to tremendous data rates for the ADH and on-site storage.

In the FlashCam team we always assumed that it should be more or less trivial to implement some rough muon tagging like this simply based on which of the trigger patches actually fired (which is less than 1 kbit of information per event), but we did not get around to actually verifying this so far. So I was somewhat alarmed by a recent discussion with ADH, in which our colleagues from the NectarCAM and LSTCam teams claimed they may not be able to do such a quick muon tagging and would like to change the protocol between the camera servers and SWAT.

This seemed like a good incentive to check whether our assumptions about fast muon tagging with FlashCam were reasonable.

Using trigger-level information for fast muon tagging

The topology of clean, full muon rings is quite unique: each ring is spread out over a large part of the camera (compared to the dominant small, low-energy showers) and near instantaneous in time (compared to other large images from showers with high impact distances). So we expect 1) a large number of trigger patches to fire simultaneously and 2) that such a pattern should stand out reasonably well from the normal shower images.

In FlashCam, the trigger logic is implemented entirely in firmware: for each of the 4 ns samples digitised with our 250 MS s^–1 ADCs, the charge sums of 588 seamlessly overlapping trigger patches are calculated. Each patch sum is compared to its trigger threshold and, if a patch exceeds its threshold, an event is triggered.

For a near-instantenous light pulse such as from a muon, multiple patches may trigger within a given 4 ns sample. However, there is some statistical spread depending on the time synchronisation of the individual channels (which is adjusted down to the level of ±500 ps for a plane-wave illumination) and on the phase of the light pulse relative to the ADC sampling clock. Since merely one triggering patch is enough to generate an event trigger, a muon signal rising just before an ADC sample is captured may trigger an event with just one active trigger patch (the earliest patch). Hence, to identify large, fast trigger patterns we need to combine the information of the actual trigger sample and (at the very least) its subsequent one. If we encode the triggering patches as bit masks (a bit is raised if the corresponding patch sum exceeded its threshold at a given sample), we need to count the number of bits in the logical OR of two sets of 588 bits, each set distributed over twelve 64-bit integers (one for each trigger card of our readout system).

On current CPUs this may be done using a population count instruction.On x86-64 the number of raised bits may be counted with the POPCNT instruction, some neat AVX2-based vectorisation, or the AVX512-based VPOPCNTD and VPOPCNTQ; on ARM64, CNT would be available in the NEON extensions. On ancient CPUs we probably would have made use of one of those beautiful recipes from chapter 5 of Hacker’s Delight. Can we really pull off reasonable muon tagging in just a few CPU cycles? Well, our stock firmware currently doesn’t provide the information of which patches triggered an event, let alone which would have triggered at the following sample. Since it’s a fully deterministic, digital trigger we can re-calculate this information based on the captured waveforms—but this is obviously quite expensive (there’s a reason we’re doing this on 97 FPGAs in parallel in the first place). To convince ourselves that we should implement the triggered-patches bit-masks into our firmware, a muon tagging proof-of-concept was needed.

Testing fast muon tagging

To test muon tagging, we can make use of our (C-based) software reimplementationThe FlashCam trigger emulator was implemented during Simon’s doctorate for the verification of our trigger performance. of our (Verilog-based) trigger logic: we feed measured waveforms into the trigger emulator and count the number of triggered patches n_p. Since the firmware implementation would reside in a common Verilog module, our readout master would automatically provide a 12-bit mask describing which of the twelve trigger cards participated—so we calculate the number of triggered trigger cards n_c as well.

Applying this to a few-second observation of PKS 2155-304 from the commissioning period of the FlashCam prototype in the large (CT5) telescope of H.E.S.S., we obtain the following 2-d histograms for the trigger sample (left) and the two-sample window (right):

As expected, both histograms show a smooth distribution dominated by cosmic rays (which exhibit a steep power-law energy spectrum), and most events were triggered by compact, low-energy showers (lower left corner of both plots). The two-sample window shows generally larger numbers of triggered patches and cards because of the mentioned phase-dependence and the inherent time gradient of air showers (which increases with impact distance beyond the Cherenkov cone).

We expect muon rings to occupy some part in the upper middle of these histograms, but a generally sharper feature for the two-sample window. We can estimate that a full muon ring hits at least 2π θ_c θ_pix^–1 ≅ 100 pixels as a lower bound (more likely ~2× that because of ring broadening effects) and covers significant space in all of the 9 “inner” trigger cards (most likely one or two more). To get a more precise estimate of the occupied parameter space, we use the fake_muon simulator from Konrad’s sim_telarray package and limit the muon geometries to those producing full rings with nearly no leakage out of the camera field of view (up to 10 m impact distance and 0.4° off-axis angle). The event display of an ideal ring and one rather extreme image that we may still be interested in are shown below.

Event display of a simulated on-axis muon hitting H.E.S.S. CT5.

Event display of a simulated muon hitting the mirror 10 m from its center at an off-axis angle of 0.4°.

We run those simulated muon events through our trigger emulator and overlay the PKS 2155-304 data with contour lines showing the parameter space of the muon rings:

It is obvious from these plots that the one-sample trigger information is not sufficient to separate muon rings from air showers—a two-sample window is needed (and sufficient). Choosing some round numbers for cuts, such as n_c ≥ 5 and n_p ≥ 42, rejects 99.3% of the triggered events (showers) while being 100% efficient for the simulated set of full muon rings. The rate of tagged muons would be 15–20 Hz for CT5. The purity of this selection is a bit harder to estimate. Our students tell me that we currently obtain about 0.6 Hz of nicely reconstructed muon rings in CT5 data; and my biological neural network classifies about 2–3 Hz worth of data as muon rings that ought to be fittable (based on browsing 10 seconds of data after being reduced with above cuts). So the purity may lie somewhere between 5% and 15% – which is not great but good enough to do the job; especially considering that it takes just a couple of CPU cycles on our camera server.

Technical implementation

The following technical implementation could be envisaged:

each module of the readout system provides two bit masks in its event header:
- trigger_source, encoding which trigger sources (ADC channel, trigger patch, or trigger card) participated in generating the event trigger
- extended_trigger_source, encoding which trigger sources were active in the n following samples (n is a run parameter)

in the DAQ server, trivial code such as

bool is_muon_candidate(const Event &event, const Configuration &cfg)
{
  // We merely need the event header to tag muons
  const auto &header = event.header;

  // Apply a first cut on the number of triggered trigger cards (which
  // rejects ~99% of shower-like events)
  int ncards = __builtin_popcount(header.master_card.trigger_source |
                                  header.master_card.extended_trigger_source);
  if (ncards < cfg.muon_tagging.min_cards)
    return false;

  // Sum up the number of triggered patches from each trigger card
  int npatches = 0;
  for (int card = 0; card != header.num_trigger_cards; card++)
    npatches += __builtin_popcountl(header.trigger_card[card].trigger_source |
                                    header.trigger_card[card].extended_trigger_source);

  return npatches >= cfg.muon_tagging.min_patches;
}

tags muon-like events with sufficient efficiency in a couple of CPU cycles

Future studies

This quick study was just a superficial look at how the trigger information could be exploited for fast muon tagging; and it served its purpose in convincing ourselves within the FlashCam team that it is worthwhile to extend our firmware to include such information.

More detailed simulations are needed to find a proper set of cut parameters and characterise the efficiency and purity as a function of trigger threshold, night-sky background and defects such as broken pixels. In case this simple two-parameter cut is shown to lack performance (in efficiency, purity or robustness), the newly available information may be exploited further. For example, since a cut on the number of trigger cards alone already reduces the amount of data to look at by a factor of >50, I would expect that we can either improve performance using the (adaptive) circle Hough transform or train a rather simple classifier to analyse the topology of the triggered patches (which we completely ignored in this study) without having a significant impact on computing resources. A convolutional neural network could be the right tool for the job, but even a small (preferably cache-friendly) BDT may suffice.

Relevance of muon rings for IACT calibration#

Full rings are rare#

Software-based muon tagging#

Using trigger-level information for fast muon tagging#

Testing fast muon tagging#

Technical implementation#

Future studies#