LSST Interdisciplinary Network for Collaboration and Computing

LINCC Frameworks

FUNDED BY THE SCHMIDT FUTURES FOUNDATION

This page describes the LINCC Frameworks project, an ambitious 5-year program to advance the software infrastructure needed for the community's analysis tools to work at the scale and complexity demanded by the Vera C. Rubin Observatory Legacy Survey of Space and Time (LSST) data. The program involves software infrastructure development at two sites, with programs administered by LSSTC to enable community engagement at a variety of levels over the length of the program. Stay tuned for more about community engagement opportunities that will be announced later in 2021! For more information, contact Andy Connolly (ajc@astro.washington.edu) or Rachel Mandelbaum (rmandelb@andrew.cmu.edu), or sign up for the LINCC mailer.

Vision

The once-in-a-generation opportunity of the Vera C. Rubin Observatory Legacy Survey of Space and Time (LSST) to transform our knowledge of the Universe relies on the development of state-of-the-art analysis techniques that can work at the scale and complexity of the LSST data. The vision for the LINCC Frameworks, a collaboration between the University of Washington (UW), Carnegie Mellon University (CMU), and the LSSTC, is to support the LSST community through the development of cloud-based analysis frameworks for LSST science. Our goal is to enable the delivery of key computational infrastructure and code for petabyte-scale analyses, mechanisms to search for one-in-a-million events in continuous streams of data, and community organizations and communication channels that enable researchers to develop and share their algorithms and software.

LINCC Frameworks will support the LSST community in developing their analyses in collaboration with professional software engineers and data scientists. The analysis frameworks will provide JupyterHub as an interface to the data. While the system will be cloud-first, care will be taken during development to ensure the frameworks can also be employed on HPC, to suit the wide range of computing resources used by the astronomical community. 

Science Frameworks

Below we describe some example science cases and the functionality needed to support them. The objective of LINCC Frameworks is to aid in the development of software and computational infrastructure to support LSST research. Through interactions with the community in the first year of the program, we will refine the plans, identify opportunities to collaborate and coordinate with groups already working in these areas, and seek other areas where software infrastructure development could strongly impact community software development for LSST.

A framework for Solar System science

Asteroids and comets are the remnants of the Solar System's early assembly. Their history of accretion, collisions, and perturbation by existing and vanished giant planets is preserved in their orbital elements and size distributions. Rubin has the potential to discover the nearest (Near-Earth Objects; NEOs) and the most distant (Trans-Neptunian Objects; TNOs) asteroid populations, mapping the Solar System in unprecedented detail. The discoveries we expect from this framework include: (a) detection and impact probabilities for ~80% of NEOs with sizes over 140 m (the impact of which would cause devastation on a regional scale); (b) the discovery of interstellar objects as they pass through the Solar System; (c) a 10-fold increase in the number of known TNOs, which can elucidate the evolution and origin of the Solar System; and (d) the ability to find objects that originate from the inner Oort Cloud and potentially additional planets beyond 100 au (astronomical units).

Supporting these science questions requires key functionality: (a) a scalable moving object linking service to identify asteroids that the Rubin image processing pipelines missed, and to cross-link with other surveys; (b) a detection pipeline for distant Solar System objects capable of detecting and characterizing the orbits of asteroids or planets >100 au from the Sun; (c) an integrator suite for large dynamical computations to predict positions of asteroids (including uncertainty); (d) light curve fitting applications that can characterize the shapes and compositions of asteroids; (e) algorithms to automatically search and analyze the LSST alert stream to detect activity and outbursts from comets; and (f) infrastructure to coordinate and disseminate follow-up observations of NEOs or other timely events.
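To make the orbit integration item above concrete, here is a minimal sketch of a two-body orbit propagator using the leapfrog (kick-drift-kick) scheme. It is an illustration only, not the LINCC integrator: it assumes a massless body orbiting the Sun in a plane, works in units of au and years, and ignores planetary perturbations and uncertainty propagation. All function names are hypothetical.

```python
import math

# Assumption: units of au and years, in which GM of the Sun is about 4*pi^2.
GM_SUN = 4.0 * math.pi ** 2


def acceleration(x, y):
    """Gravitational acceleration (au/yr^2) from the Sun at planar position (x, y)."""
    r_cubed = (x * x + y * y) ** 1.5
    return -GM_SUN * x / r_cubed, -GM_SUN * y / r_cubed


def leapfrog_orbit(x, y, vx, vy, dt, n_steps):
    """Advance a heliocentric two-body orbit by n_steps of size dt (years).

    Uses the kick-drift-kick leapfrog scheme, which is symplectic and so
    keeps the energy error bounded over long integrations.
    """
    ax, ay = acceleration(x, y)
    for _ in range(n_steps):
        # Half kick: update velocity with the current acceleration.
        vx += 0.5 * dt * ax
        vy += 0.5 * dt * ay
        # Drift: update position with the half-step velocity.
        x += dt * vx
        y += dt * vy
        # Second half kick with the acceleration at the new position.
        ax, ay = acceleration(x, y)
        vx += 0.5 * dt * ax
        vy += 0.5 * dt * ay
    return x, y, vx, vy


def specific_energy(x, y, vx, vy):
    """Orbital energy per unit mass; conserved for an exact two-body orbit."""
    return 0.5 * (vx * vx + vy * vy) - GM_SUN / math.hypot(x, y)


# Usage: a circular orbit at 1 au has speed 2*pi au/yr and a period of 1 year.
state = leapfrog_orbit(1.0, 0.0, 0.0, 2.0 * math.pi, dt=1e-3, n_steps=1000)
```

After one simulated year the body should return close to its starting point, and the specific energy should stay close to its initial value. A production integrator would add planetary perturbers, adaptive step sizes, and covariance propagation.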

A framework for time domain science

Rubin is uniquely capable of monitoring the sky at unprecedented depths, detecting and characterizing the time variability of tens of billions of astrophysical objects. Fast and efficient access to astronomical light curves (LCs) will enable the discovery of the most energetic events in the Universe and provide the first systematic characterization of variability within the Milky Way. Scientific discoveries we expect from this framework include: (a) identification of the pre-explosion outbursts from supernova progenitors, which are poorly understood and which hold the key to understanding the chemical enrichment of the intergalactic medium; (b) understanding the evolution and death of stars by detecting the very rarest and the most distant transients that represent the most extreme ways stars die; and (c) probing the properties of dark matter by mapping the distribution of mass in the Milky Way using the kinematics of Cepheids and RR Lyrae.

Supporting these science questions requires key functionality: (a) a database or datastore for light curves that scales to the size of LSST data sets (10^10 sources); (b) algorithms to automatically search and analyze light curves, including measures of multi-band periods, detection of outbursts, generation of classification features, identification of changes in state of a source, and measures of distance between light curves accounting for phase and period variation; (c) a framework that scales user-defined algorithms to the size of LSST data; (d) trained neural network architectures for classifying sparsely sampled time series data (including trusted training samples); and (e) a fast and scalable catalog cross-matching engine to match sources from the LSST with existing data sets to provide panchromatic information for detected sources.
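As an illustration of what the catalog cross-matching item above involves, the sketch below matches each source in one catalog to its nearest neighbor in another within a fixed angular radius. It is a brute-force example with hypothetical function names and a hypothetical 1 arcsecond default radius, and it is deliberately not scalable: a real engine for 10^10 sources would partition the sky (for example with a hierarchical pixelization) rather than compare every pair.

```python
import math


def angular_separation_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees between two sky positions.

    Inputs are right ascension and declination in degrees. The haversine
    formula is used because it stays accurate for very small separations.
    """
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    sin_half_ddec = math.sin((dec2 - dec1) / 2.0)
    sin_half_dra = math.sin((ra2 - ra1) / 2.0)
    h = sin_half_ddec ** 2 + math.cos(dec1) * math.cos(dec2) * sin_half_dra ** 2
    # Clamp to 1.0 to guard against rounding slightly above the asin domain.
    return math.degrees(2.0 * math.asin(min(1.0, math.sqrt(h))))


def cross_match(catalog_a, catalog_b, radius_arcsec=1.0):
    """Nearest-neighbor match of catalog_a against catalog_b.

    Each catalog is a list of (ra_deg, dec_deg) tuples. Returns a list with
    one entry per source in catalog_a: the index of the closest source in
    catalog_b within radius_arcsec, or None if there is no such source.
    """
    radius_deg = radius_arcsec / 3600.0
    matches = []
    for ra_a, dec_a in catalog_a:
        best_index, best_sep = None, radius_deg
        for j, (ra_b, dec_b) in enumerate(catalog_b):
            sep = angular_separation_deg(ra_a, dec_a, ra_b, dec_b)
            if sep <= best_sep:
                best_index, best_sep = j, sep
        matches.append(best_index)
    return matches


# Usage with made-up coordinates: the third source has no counterpart.
survey = [(10.0, -5.0), (150.0, 2.2), (300.0, 10.0)]
reference = [(150.0001, 2.2), (10.0, -5.0002), (200.0, 45.0)]
matched_indices = cross_match(survey, reference)
```

The cost of this approach grows with the product of the two catalog sizes, which is exactly why a dedicated, spatially indexed engine is listed as required functionality.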

A framework for extragalactic science

Image credit: NASA, ESA and R. Massey (California Institute of Technology)

A key element in turning 2D images from LSST into a 4D view is determining distances to the astronomical objects studied. Scientifically, this 2D to 4D conversion is essential to mapping out the expansion history of the Universe and growth of cosmological structure as a function of time, which is the key to understanding dark energy and galaxy formation/evolution.  Photometric redshifts, or photo-z’s, use the observed colors of the galaxies (and potentially other information) to produce an estimate of the distance to an object and tell us about its nature. We propose to deliver a comprehensive photo-z development, validation, and optimization framework, which will enable scientific discovery in essentially all extragalactic science cases for LSST.  

Some key scientific outcomes of this framework include: (a) improvements in photo-z precision compared to the current state-of-the-art, which would tighten constraints on the nature of dark energy by a factor of two; (b) tests of models of dark matter based on studies of dwarf galaxies over the past 4 billion years of cosmic history; (c) the ability to promptly and reliably localize electromagnetic counterparts to gravitational wave sources; and (d) interpretable and information-rich cross-correlations against multi-wavelength datasets including the Cosmic Microwave Background.

Supporting these science questions requires key functionality in this framework: (a) tools for manipulating catalog data efficiently; (b) metrics that quantify how photo-z quality affects extragalactic science; (c) simulations and ancillary datasets for characterizing photo-z performance; (d) a database for storing and accessing probabilistic estimates of photo-z’s, i.e., probability density functions, p(z), which are often represented in terms of arrays, and which (for multiple photo-z methods) require more than the limited storage capacity devoted to photo-z in the Rubin Observatory database; (e) uncertainty estimation for individual and ensemble photo-z distributions; (f) guided training set development to identify spectroscopic observations that would maximally improve the photo-z for a science case; and (g) robust, scalable joint estimation of photo-z’s and other galaxy properties such as stellar mass.
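The idea of storing p(z) as arrays can be sketched as follows: each galaxy's probability density is sampled on a shared redshift grid, from which point estimates and ensemble distributions can be derived. This is a minimal sketch under stated assumptions, not the Rubin or LINCC data model: the gridded representation, the trapezoidal integration, and all function names are hypothetical choices made for illustration.

```python
def trapezoid(values, grid):
    """Integrate sampled values over grid using the trapezoidal rule."""
    return sum(
        0.5 * (values[i] + values[i + 1]) * (grid[i + 1] - grid[i])
        for i in range(len(grid) - 1)
    )


def normalize(pdf, z_grid):
    """Rescale a sampled p(z) so that it integrates to 1 over z_grid."""
    area = trapezoid(pdf, z_grid)
    if area <= 0.0:
        raise ValueError("p(z) must have positive total probability")
    return [p / area for p in pdf]


def mean_redshift(pdf, z_grid):
    """Point estimate: the mean of z under a normalized p(z)."""
    normalized = normalize(pdf, z_grid)
    return trapezoid([p * z for p, z in zip(normalized, z_grid)], z_grid)


def stack(pdfs, z_grid):
    """Ensemble estimate: average of the normalized per-galaxy p(z) arrays."""
    normalized = [normalize(pdf, z_grid) for pdf in pdfs]
    return [sum(column) / len(normalized) for column in zip(*normalized)]


# Usage with made-up values: one peaked and one flat p(z) on a coarse grid.
z_grid = [0.0, 0.5, 1.0, 1.5, 2.0]
peaked = [0.0, 1.0, 2.0, 1.0, 0.0]
flat = [1.0, 1.0, 1.0, 1.0, 1.0]
ensemble = stack([peaked, flat], z_grid)
```

Storing one array per galaxy per photo-z method is what drives the storage requirement mentioned above: with a few hundred grid points and several methods, the p(z) data can far exceed the size of a catalog that stores only a single point estimate per object.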

Timeline

LINCC Frameworks will engage the LSST science community in prioritizing and designing the software frameworks needed to support early science. The five-year development timeline for LINCC takes a phased approach to deliverables, their application to increasingly large and complex datasets, and engagement with an increasingly broad community. In the first year, LINCC will host a workshop to define the software infrastructure needs and priorities for the LSST science community.

Getting Involved and Building a Community

The goal of the LINCC Frameworks initiative is to support LSST science analyses for the entire LSST science community by providing analysis tools and infrastructure that enable a vast array of Rubin science use cases. To encourage the science community to help shape the design and functionality of the analysis system, we will deliver:

  • A science analysis platform built on JupyterHub to experiment with precursor data (building off the platforms developed for the Rubin Science Platform)

  • Compute resources for developing science analysis using precursor data sets with ~2M CPU hrs/year (cloud and HPC) and >1 PB of disk storage 

  • Workshops to help define the functionality and priorities for the software frameworks and the tools LINCC will help build

  • Incubators (with $20K research grants) to bring research teams together with the LINCC software team to develop their own science analyses using the LINCC tools and framework

  • Funded hackweeks and workshops to help the LSST Science Collaborations work together to learn and test the tools

Early in October we will announce the first of these initiatives, an LSST Science Collaboration Software Infrastructure Workshop to define and discuss the software and algorithm development priorities for the LSST science community. To stay informed about this workshop, please sign up for the LINCC mailing list.