\section{Data Preview 0}\label{sec:dp0}
In \citeds{RDO-011} we outlined a number of scenarios for early releases of Rubin Observatory~data. The purpose of these releases is not only to prepare the community for LSST data, but also to serve as an early integration test of existing elements of the Data Management systems and to familiarize the community with our access mechanisms.
Two major new developments have occurred since \citeds{RDO-011} was drafted:
\begin{itemize}
\item Construction delays mean that we now plan to base the Data Previews on simulated Rubin Observatory data or on-sky data from other observatories (see \secref{sec:dataset}), which would still allow us to meet some of the goals of the early releases.
\item We plan to carry out these activities at the Interim Data Facility (IDF), which is dedicated to the infrastructure needs of Pre-Ops activities, such as serving data and training operations staff. (Commissioning activities will continue at NCSA and in Chile.)
\end{itemize}
In this document we outline notable elements of DP0, the first of these planned data previews, from the Data Management and Pre-Operations perspective.
Data Preview 0 itself was broken down into two parts: DP0.1 (\appref{sec:dp0.1}), serving existing data products, and DP0.2 (\secref{sec:dp0.2}), reprocessing that data and publishing new catalogs.
Since DP0.1 has been released, its text has been moved to an appendix (\appref{sec:dp0.1}).
A DP0.3 has been mentioned, but there is no agreement to do it (apart from the requirement that it must use real data, such as HSC). No planning for it will be done until 2022, when we are confident about DP0.2.
\input{dp02}
\subsection{Risks and mitigation}
The biggest schedule risk is not getting an interim data facility in place in time.
This would delay the entire schedule, and little mitigation is available.
In the long run, costs may be higher than expected in a cloud-based IDF, primarily because of storage.
A mitigation would be to store data on our own systems (NCSA or Chile) and expose it through S3.
NCSA already has this in place, and we should consider testing it for less frequently used data sets.
There is some risk that the Butler over S3 and Postgres will not be production grade by DP0. We are working hard on that in construction. It is possible to run Gen3 over a filesystem, although that would not be ideal in the cloud. If Gen3 does not work at all, we will need a major rethink and will have to build a much simpler Butler.
Similarly, the workflow system and associated tools may not be mature enough for large-scale production. Scalability in production is also not understood. We may need to limit the size of DP0 and rethink the system.