-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathsupport.tex
42 lines (33 loc) · 4.23 KB
/
support.tex
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
\section{Fundamentals}
\label{sec:support}
The LSST Science Pipelines software is written in Python with C++ used for high performance algorithms and for core classes that are usable in both languages.
We use Python 3 \citep[having ported from python 2,][currently with a minimum version of Python 3.11]{2020ASPC..522..541J}, and the C++ layer can use C++17 features with pybind11 being used to provide the interface from Python to C++.
Additionally, the C++ layer uses \texttt{ndarray} to allow seamless passing of C++ arrays to and from Python \texttt{numpy} arrays.
This compatibility with \texttt{numpy} is important in that it makes LSST data structures available to standard Python libraries such as Scipy and Astropy \citep{2016SPIE.9913E..0GJ,2018AJ....156..123A}.
Although all the software uses the \texttt{lsst} namespace, the code base is split into individual Python products in the LSST GitHub organization\footnote{\url{https://github.com/lsst}} that can be installed independently and which declare their own dependencies.
These dependencies are managed using the EUPS \citep{EUPS,2018SPIE10707E..09J} where
most of the products are built using the SCons system \citep{2005Scons1377085} with LSST-specific extensions provided in the \texttt{sconsUtils} package enforcing standard build rules.
For logging we always use standard Python logging with an additional \texttt{VERBOSE} log level between \texttt{INFO} and \texttt{DEBUG} to provide additional non-debugging detail that can be enabled during batch processing.
This verbose logging is used for periodic logging where long-lived analysis tasks are required to issue a log message every 10 minutes to indicate to the batch system that they are still alive and actively performing work.
For logging from C++ we use Log4CXX wrapped in the \texttt{lsst.log} package to make it look more like standard Python logging, whilst also supporting deferred string formatting such that log messages are only formed if the log message level is sufficient for the message to be logged.
These C++ log messages are forwarded to Python rather than being issued from an independent logging stream.
Finally, we also provide some LSST-specific exceptions that can be thrown from C++ code and caught in Python.
As of April 2024, the Science Pipelines software is approximately 640,000 lines of Python and 225,000 lines of C++.
The number of lines in the pipelines code as a function of time is given in Fig.~\ref{fig:pipe-loc}.
\begin{figure}
\plotone{fig-pipe-loc}
\caption{The number of lines of code comprising the LSST Science Pipelines software as a function of time.
Line counts include comments but not blank lines. Python interfaces are implemented using \texttt{pybind11} and that is counted as C++ code. For the purposes of this count Science pipelines software is defined as the \texttt{lsst\_distrib} metapackage and does not include code from third party packages.}
\label{fig:pipe-loc}
\end{figure}
\subsection{Python environment}
An important aspect of running a large data processing campaign is to ensure that the software environment is well defined.
We define a base python environment using conda-forge via a meta package named \texttt{rubinenv}\footnote{\url{https://github.com/conda-forge/rubinenv-feedstock}}.
This specifies all the software needed to build and run the science pipelines software.
A Docker container is built for each software release and the fully-specified versions of all software are recorded to ensure repeatability.
\subsection{Unit Testing and Code Coverage}
Unit testing and code coverage are critical components of code quality \citep{2018SPIE10707E..09J}.
Every package comes with unit tests written using the standard \texttt{unittest} module.
We run the tests using \texttt{pytest} \citep{pytest} and this comes with many advantages in that all the tests run in the same process and requiring global parameters to be well understood, test can be run in parallel in multiple processes, plugins can be enabled to extend testing and record test coverage, and a test report can be created giving details of run times and test failures.
Coding standards compliance with PEP\,8 \citep{pep8} is enforced using GitHub actions and \texttt{pre-commit} checks.
A Jenkins system provides the team with continuous integration facilities.