Defining, obtaining, archiving, pre-processing, calibrating, processing,
extracting the sources, and distributing the CFHTLS data is a highly complex undertaking.
Three separate entities are involved in the CFHTLS global data flow that eventually brings
the data to the users:
- CFHT: data acquisition, pre-processing and calibration
- Terapix: data stacking, fine astrometric and photometric calibration, source catalog generation
- CADC: data archiving and distribution of all CFHTLS products
The following diagram, though primarily focused on the details of
the CFHT pipeline, shows the complete data flow of the CFHTLS: the black ellipses
mark the points where end CFHTLS users can download the data product of that
branch from CADC. Each black ellipse carries a time tag ("T = X t") giving
the time needed for that particular data set to reach that point from the
moment it was acquired at the telescope. Each of the steps leading to these
levels of data digestion by the various pipelines is described below:
Definition and acquisition of the data
After the Steering Group defined the fields in consultation with the CFHTLS community,
the next phase consisted of breaking down the global exposure times per field and per year
in order to optimize the observing efficiency (see the "Exposure Time Calculator"
page on the MegaPrime site for more details on such issues). The observing
strategy was then entered in the Queued Service Observing (QSO) tool, PH2. See the
"Q Service Observing Home" for more on how service observing operates
at CFHT. QSO's PH2 was flexible enough to allow the CFHTLS coordinators to tweak
the observing strategy from run to run in order to optimize the use of sky time
(bad weather is by far the worst factor affecting the observing efficiency, and
tunings are necessary to ensure the completion of the primary science drivers of the survey).
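As a rough illustration of the trade-off an exposure-time calculator evaluates,
the sketch below uses the standard background-limited scaling, where the achieved
signal-to-noise ratio grows as the square root of the integration time. The
function and the numbers are hypothetical, not taken from the MegaPrime ETC.

```python
# Illustrative only: background-limited S/N scales as sqrt(t), so the
# time needed to reach a target S/N grows quadratically with it.
def required_exposure(t_ref, snr_ref, snr_target):
    """Exposure time (s) needed to reach snr_target, given a reference
    exposure t_ref (s) that achieves snr_ref on the same target."""
    return t_ref * (snr_target / snr_ref) ** 2

# If 600 s yields S/N = 10, reaching S/N = 25 takes ~3750 s, which the
# coordinators would then split into dithered exposures across epochs.
print(required_exposure(600.0, 10.0, 25.0))  # 3750.0
```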
Raw data archiving
Within minutes after an exposure has been obtained at the telescope, the
image (a single MEF file about 700 Mbytes in size) is archived at the CFHT headquarters
in Waimea (the summit to Waimea headquarters link is a DS3 line), synchronously
on two SDLT tapes (100 Gbytes capacity each). MegaCam generates data at a rate of
approximately 100 Gbytes a night. All raw data are immediately transferred through
the network to CADC, where they are promptly put online and made available to the
users, who typically gain access to raw data within 1 day, 2 days at most,
from the time it was acquired on the sky.
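A quick back-of-the-envelope check, assuming the nominal DS3 line rate of
45 Mbit/s (actual throughput is lower, and the link is shared), shows why a
single exposure moves in minutes while a full night's output takes hours:

```python
# Back-of-the-envelope figures for the summit-to-Waimea DS3 link.
FILE_MBYTES = 700      # one raw MEF exposure
NIGHT_GBYTES = 100     # typical MegaCam output per night
DS3_MBIT_S = 45        # nominal DS3 line rate

file_minutes = FILE_MBYTES * 8 / DS3_MBIT_S / 60
night_hours = NIGHT_GBYTES * 1000 * 8 / DS3_MBIT_S / 3600

print(f"one exposure: ~{file_minutes:.1f} min")  # ~2.1 min
print(f"full night:   ~{night_hours:.1f} h")     # ~4.9 h
```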
Pre-processing for the Real Time Analysis Systems
CFHT hosts three clusters of powerful computers dedicated
to real time data analysis: two for the supernova program (SNLS: one from Canada,
one from France) using the Deep component data, and a third for GRBs (from France)
using the data from the Very Wide component. The machines are operated by CFHT, and
users connect remotely from their home institutions to install and run their software
(Linux architecture). The dedicated pipelines on these machines use data which have
been pre-processed by Elixir using master detrending frames (bias, flat-field,
fringe frames, etc.) from the previous observing run. The quality is high enough
that the science is not affected by this compromise, which in turn allows a
consistent data quality throughout the run, starting on the very
first night. Since some spectroscopic follow-up programs wait for supernova
candidates, it is important to get the raw frames transferred to the Waimea
archive as soon as possible, as Elixir real-time processing operates only
on data already archived. For that very reason, a priority scheme has
been set up so that the CFHTLS Deep frames are transferred to Waimea
as soon as they are acquired (otherwise the lag due to the limited throughput
of the DS3 line can be up to 1 hour, versus a few minutes with the priority scheme).
After the image has arrived in Waimea, it is fully processed by Elixir in about 3
minutes and immediately delivered to the Real Time Analysis Systems. The pipelines
then analyze the data (or the set of data), and the results are usually published
on the teams' sites the day after, visible to the whole world. No pixel data are made
available though, in compliance with the CFHTLS data policy.
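The priority scheme itself is not documented here; the sketch below is only a
hypothetical illustration of the idea, using a heap-based queue in which Deep
frames always jump ahead of PI frames while order is preserved within each class.
The class names and priority levels are assumptions, not CFHT's actual scheme.

```python
import heapq

# Hypothetical priority transfer queue; priorities are illustrative.
PRIORITY = {"CFHTLS-Deep": 0, "PI": 1}

class TransferQueue:
    def __init__(self):
        self._heap = []
        self._seq = 0  # tie-breaker: FIFO within a priority class

    def enqueue(self, filename, program):
        heapq.heappush(self._heap, (PRIORITY[program], self._seq, filename))
        self._seq += 1

    def next_transfer(self):
        return heapq.heappop(self._heap)[2]

q = TransferQueue()
q.enqueue("123456o.fits", "PI")
q.enqueue("123457o.fits", "CFHTLS-Deep")
print(q.next_transfer())  # 123457o.fits -- the Deep frame goes first
```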
Elixir Processing
As part of the New Observing Process, CFHT is committed to pre-processing
the data at the pixel level (removal of the instrumental signature).
The Elixir pipeline covers bad pixel masking, bias and overscan
correction, flat-fielding, photometric superflat, and fringe correction for
the i' and z' data.
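In essence, detrending amounts to subtracting the additive signatures and
dividing out the multiplicative ones. The minimal numpy sketch below shows the
core operation for one CCD, with the overscan, superflat and fringe corrections
left out for brevity; the function name and interface are assumptions, not
Elixir's actual API.

```python
import numpy as np

# Minimal detrending sketch: subtract the master bias, divide by the
# master flat, and blank out known bad pixels. Elixir's real pipeline
# also applies overscan, superflat and fringe corrections.
def detrend(raw, master_bias, master_flat, bad_pixel_mask):
    science = (raw - master_bias) / master_flat
    science[bad_pixel_mask] = np.nan
    return science

rng = np.random.default_rng(0)
raw = rng.normal(1500.0, 10.0, (256, 512))     # fake CCD frame
bias = np.full(raw.shape, 500.0)               # fake master bias
flat = np.ones(raw.shape)                      # fake master flat
mask = np.zeros(raw.shape, dtype=bool)
print(detrend(raw, bias, flat, mask).mean())   # ~1000: bias removed
```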
Elixir also derives an astrometric solution on a per-CCD basis at the 0.2
arcsec scale, and computes the zero point per filter for
the whole run using the collection of photometric standards acquired
throughout the observing run. A very detailed description of this
process is presented on the MegaPrime and CFHTLS Observing Status pages
in the "Data Preprocessing & Calibration" section. Creating the master
detrending frames and deriving all the image characteristics for the
whole observing run takes no more than a week.
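For reference, the zero point is the offset between catalog magnitudes and
instrumental magnitudes measured in counts per second. The sketch below shows
this relation for a single standard star; a real per-run solution also fits
atmospheric extinction and color terms, which are omitted here, and the
function itself is a hypothetical illustration.

```python
import numpy as np

# Hedged sketch: ZP = catalog magnitude - instrumental magnitude,
# with m_inst = -2.5 log10(counts / exptime). Extinction and color
# terms, fitted in a real run solution, are omitted.
def zero_point(counts, exptime, catalog_mag):
    m_inst = -2.5 * np.log10(np.asarray(counts) / exptime)
    return np.median(np.asarray(catalog_mag) - m_inst)

# A standard star with 1e5 ADU in 10 s and catalog magnitude 16.0
# implies ZP = 16.0 + 2.5*log10(1e4) = 26.0.
print(zero_point([1e5], 10.0, [16.0]))  # 26.0
```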
CFHT data products distribution and archiving
When Elixir has finished calibrating the whole run, the CFHT distribution
system (DADS) can then trigger the processing and distribution of
the specific subset of CFHTLS images captured during the run (among the
global pool of data that also includes the PI frames). The Elixir data come
along with the MetaData, a large collection of auxiliary data related
to the observing process: weather statistics, observer comments, sky transparency
for the night, etc. All of this is transferred through the network
to CADC, where it is made available to the community for download. This entire
process takes at most 3 weeks.
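To make the notion of MetaData concrete, the record below sketches the kind of
per-exposure auxiliary information bundled with an Elixir release; the field
names and types are hypothetical, not the actual DADS schema.

```python
from dataclasses import dataclass

# Hypothetical per-exposure metadata record; field names are
# illustrative, not the actual DADS schema.
@dataclass
class ExposureMetadata:
    exposure_id: str
    seeing_arcsec: float      # image quality measured on the frame
    sky_transparency: str     # e.g. "photometric", "thin cirrus"
    observer_comment: str

record = ExposureMetadata("123457o", 0.75, "photometric", "no issues")
print(record)
```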
Terapix processing
As soon as the Elixir data arrive at CADC, they are copied to Terapix via
a high speed network. The Terapix data center is primarily focused on handling
the CFHTLS data and serves the whole CFHTLS community with the
production of the weight and flag map images attached to each MegaCam image, the data
stacking, the fine astrometric calibration, and the source catalog generation. It takes
months to gather all the frames needed to make a proper release of such a data set;
Terapix has settled on delivering global CFHTLS releases to the LS community every six
months.
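The stacking step can be pictured as a per-pixel weighted mean of registered
exposures, where the weight maps zero out flagged pixels. The numpy sketch
below shows only the core combination; the real stacking also resamples every
frame onto a common astrometric grid first, which is omitted here.

```python
import numpy as np

# Minimal weighted-mean coadd of registered exposures; flagged pixels
# carry zero weight. Real stacking also resamples each frame onto a
# common astrometric grid, which is skipped here.
def coadd(images, weights):
    images, weights = np.asarray(images), np.asarray(weights)
    wsum = weights.sum(axis=0)
    stack = (images * weights).sum(axis=0) / np.where(wsum > 0, wsum, 1)
    return np.where(wsum > 0, stack, np.nan)

imgs = [np.full((4, 4), 10.0), np.full((4, 4), 12.0)]
wts = [np.ones((4, 4)), np.ones((4, 4))]
print(coadd(imgs, wts)[0, 0])  # 11.0
```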
Terapix data products archiving and distribution
While Terapix offers many web pages focused on data quality and the survey
progress, its primary products for the CFHTLS community (images and catalogs) are
made available only through CADC. Note that all major releases (named TXXXX, starting
at T0001) come with a large suite of data products for quality control, all of which
can be freely accessed on the Terapix web site.