LSST Interdisciplinary Network for Collaboration and Computing

LINCC Frameworks

Funding for the LINCC Frameworks project has been provided through Schmidt Futures, a philanthropic initiative of Eric and Wendy Schmidt.

We are happy to share a community white paper on cross-cutting analysis software needs that will support a wide range of early LSST science cases. This white paper is the outcome of a workshop held in March 2022, “From Data to Software to Science with the Rubin Observatory LSST”, organized by the LINCC Frameworks team with participation from across the LSST science community, including members of the LSST Science Collaborations, Rubin Observatory, NOIRLab, and the IDAC teams. See our Community post for opportunities for follow-up discussion!

This page describes the LINCC Frameworks project, an ambitious five-year program to provide the advances in software infrastructure needed to enable the community's analysis tools to work at the scale and complexity demanded by data from the Vera C. Rubin Observatory Legacy Survey of Space and Time (LSST). The program involves software infrastructure development at two sites, with programs administered by LSSTC to enable community engagement at a variety of levels over the length of the program. For more information, contact Andy Connolly (ajc@astro.washington.edu) or Rachel Mandelbaum (rmandelb@andrew.cmu.edu), or sign up for the LINCC mailer.

See the jobs page for job opportunities associated with this initiative!  LSSTC Catalyst Fellows will also have an opportunity to engage with this effort, as outlined on the Catalyst Fellowship page.

The first LINCC Frameworks community engagement activity was a workshop called "From Data to Software to Science with the Rubin Observatory LSST". Please check the meeting website for more information, and enjoy (a) the plenary session recordings from the meeting and (b) the Rubin/LSST YouTube playlist, which provides an introduction to Rubin Observatory and what the LSST data products will look like.

VISION

The once-in-a-generation opportunity of the Vera C. Rubin Observatory Legacy Survey of Space and Time (LSST) to transform our knowledge of the Universe relies on the development of state-of-the-art analysis techniques that can work at the scale and complexity of the LSST data. The vision for LINCC Frameworks, a collaboration between the University of Washington (UW), Carnegie Mellon University (CMU), and LSSTC, is to support the LSST community through the development of cloud-based analysis frameworks for LSST science. Our goal is to enable the delivery of key computational infrastructure and code for petabyte-scale analyses, mechanisms to search for one-in-a-million events in continuous streams of data, and community organizations and communication channels that enable researchers to develop and share their algorithms and software.

LINCC Frameworks will support the LSST community in developing their analyses in collaboration with professional software engineers and data scientists. The analysis frameworks will provide JupyterHub as an interface to the data. While the system will be cloud-first, care will be taken during development to ensure the frameworks can also be deployed on high-performance computing (HPC) systems, to suit the wide range of computing resources used by the astronomical community.

SCIENCE FRAMEWORKS 

The objective of the LINCC Frameworks team is to aid in the development of software and computational infrastructure to support LSST research. Below we describe the current areas of focus for the team, example scientific use cases, and some of the known technical challenges. Through interactions with the community, we will continually refine the plans, identify new opportunities to collaborate, coordinate with groups already working in these areas, and seek other areas where software infrastructure development could strongly impact community software development for LSST.

A FRAMEWORK FOR SCALABLE SPATIAL ANALYSIS

Over its lifetime, Rubin will compile an unprecedented catalog of tens of billions of astrophysical objects. This wealth of data will enable astronomers to answer a range of scientific and statistical questions about our Universe, such as: (a) understanding structure by analyzing the distribution of objects; (b) modeling the changes of variable sources over time; (c) allowing prompt localization of electromagnetic counterparts to gravitational-wave sources; and (d) providing interpretable and information-rich cross-correlations against multi-wavelength datasets, including the Cosmic Microwave Background.

Supporting these science questions requires key functionality in an analysis framework including the ability to: (a) store and manipulate catalog data at scale; (b) perform distributed computation over this data; (c) use spatial structure within searches and statistical computation; (d) interoperate with data from other surveys; and (e) access these catalogs without having to directly download them.

The LINCC Frameworks team is developing the Large Survey DataBase (LSDB), an infrastructure to facilitate the analysis of large-survey data based on efficient spatial partitioning. Driven by the initial use cases of catalog cross matching and distributed time series analysis, the team is developing an end-to-end technology stack. The work includes coordinating with the broader astronomy community on developing standardized data formats, enabling cloud or HPC-based analysis, and the development of a full suite of software tools.
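
To illustrate why spatial partitioning makes catalog cross-matching tractable, here is a minimal pure-Python sketch. It is not LSDB's API or partitioning scheme: it assumes a simple fixed-size RA/Dec grid (a stand-in for a more sophisticated sky-partitioning scheme) and compares each source only against sources in the same or neighboring cells, instead of against every source in the other catalog. The function names, cell size, and toy catalogs are hypothetical.

```python
import math
from collections import defaultdict


def angular_sep_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees (haversine formula)."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    h = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(h))))


def cell_of(ra, dec, cell_deg):
    """Index of the fixed-size RA/Dec grid cell containing a position."""
    return (int(ra // cell_deg), int(dec // cell_deg))


def partition(catalog, cell_deg):
    """Group (id, ra, dec) rows by grid cell: a toy spatial partitioning."""
    cells = defaultdict(list)
    for row in catalog:
        cells[cell_of(row[1], row[2], cell_deg)].append(row)
    return cells


def crossmatch(left, right, radius_deg, cell_deg=1.0):
    """Nearest right-catalog source within radius_deg for each left source.

    Only the 3x3 block of cells around each left source is searched.
    Simplifications: RA wrap-around at 0/360 deg and the convergence of
    RA cells toward the poles (where a match can lie several cells away
    in RA) are ignored.
    """
    if cell_deg < radius_deg:
        raise ValueError("cell_deg must be at least radius_deg")
    right_cells = partition(right, cell_deg)
    matches = []
    for lid, lra, ldec in left:
        ci, cj = cell_of(lra, ldec, cell_deg)
        best = None
        for di in (-1, 0, 1):
            for dj in (-1, 0, 1):
                for rid, rra, rdec in right_cells.get((ci + di, cj + dj), []):
                    sep = angular_sep_deg(lra, ldec, rra, rdec)
                    if sep <= radius_deg and (best is None or sep < best[2]):
                        best = (lid, rid, sep)
        if best is not None:
            matches.append(best)
    return matches


# Toy catalogs of (id, ra_deg, dec_deg); values are made up.
left = [(1, 10.0, 20.0), (2, 150.0, -30.0)]
right = [(101, 10.0001, 20.0001), (102, 10.5, 20.5), (103, 200.0, 0.0)]
print(crossmatch(left, right, radius_deg=1 / 3600))  # 1 arcsec match radius
```

Because each partition only needs its own rows plus a thin margin of neighbors, the same pattern can in principle be distributed across workers, one partition at a time.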

A FRAMEWORK FOR TIME DOMAIN SCIENCE

Rubin is uniquely capable of monitoring the sky at unprecedented depths, detecting and characterizing the time variability of tens of billions of astrophysical objects. Fast and efficient access to astronomical light curves (LCs) will enable the discovery of the most energetic events in the Universe and provide the first systematic characterization of variability within the Milky Way. Scientific discoveries we expect from this framework include: (a) identification of pre-explosion outbursts from supernova progenitors, which are poorly understood and hold the key to understanding the chemical enrichment of the intergalactic medium; (b) understanding the evolution and death of stars by detecting the rarest and most distant transients, which represent the most extreme ways stars die; and (c) probing the properties of dark matter by mapping the distribution of mass in the Milky Way using the kinematics of Cepheids and RR Lyrae stars.

Supporting these science questions requires key functionality: (a) a database or datastore for light curves that scales to the size of LSST data sets (10^10 sources); (b) algorithms to automatically search and analyze light curves, including measures of multi-band periods, detection of outbursts, generation of classification features, identification of changes in the state of a source, and measures of distance between light curves that account for phase and period variation; (c) a framework that scales user-defined algorithms to the size of LSST data; (d) trained neural network architectures for classifying sparsely sampled time series data (including trusted training samples); and (e) a fast and scalable catalog cross-matching engine to match sources from the LSST with existing data sets to provide panchromatic information for detected sources.
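
As one concrete instance of item (b), period finding on an irregularly sampled light curve, here is a minimal single-band sketch of the classical Lomb-Scargle periodogram in plain Python. It is illustrative only: it is not the lsstseries implementation, it handles one band rather than the multi-band case described above, and the frequency grid, function names, and synthetic data are hypothetical. Production code would more likely rely on an optimized implementation such as astropy.timeseries.LombScargle.

```python
import math
import random


def lomb_scargle(t, y, freqs):
    """Classical (variance-normalized) Lomb-Scargle power at each trial frequency.

    t and y are equal-length sequences of times and magnitudes; freqs are in
    cycles per unit of t.
    """
    n = len(y)
    mean = sum(y) / n
    yc = [v - mean for v in y]                      # mean-subtracted data
    var = sum(v * v for v in yc) / (n - 1)
    power = []
    for f in freqs:
        w = 2 * math.pi * f
        # The time offset tau makes the sine and cosine terms orthogonal.
        tau = math.atan2(sum(math.sin(2 * w * ti) for ti in t),
                         sum(math.cos(2 * w * ti) for ti in t)) / (2 * w)
        c = [math.cos(w * (ti - tau)) for ti in t]
        s = [math.sin(w * (ti - tau)) for ti in t]
        yc_c = sum(a * b for a, b in zip(yc, c))
        yc_s = sum(a * b for a, b in zip(yc, s))
        p = (yc_c ** 2 / sum(x * x for x in c)
             + yc_s ** 2 / sum(x * x for x in s)) / (2 * var)
        power.append(p)
    return power


def best_period(t, y, fmin, fmax, nfreq=3000):
    """Period of the highest peak on a uniform frequency grid."""
    step = (fmax - fmin) / (nfreq - 1)
    freqs = [fmin + i * step for i in range(nfreq)]
    power = lomb_scargle(t, y, freqs)
    i_max = max(range(nfreq), key=power.__getitem__)
    return 1.0 / freqs[i_max]


# Synthetic single-band light curve: a 0.55-day sinusoid sampled at 150
# random times over 100 days, with Gaussian noise (all values made up).
rng = random.Random(42)
t = sorted(rng.uniform(0, 100) for _ in range(150))
y = [15.0 + 0.4 * math.sin(2 * math.pi * ti / 0.55) + rng.gauss(0, 0.05) for ti in t]
print(best_period(t, y, fmin=0.5, fmax=5.0))
```

The brute-force loop costs O(N x number of frequencies), which is exactly the kind of per-object cost a scalable framework would need to distribute over billions of light curves.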

The LINCC Frameworks team is developing the lsstseries library to provide a framework for scalable, automated time series analysis at Rubin data scales. Interoperability with the Large Survey DataBase (LSDB) will provide further scalability and functionality, such as catalog cross-matching. The initial use cases driving development are multi-band period finding for RR Lyrae stars, and structure function calculation and CARMA (continuous-time autoregressive moving average) modeling for active galactic nuclei.
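
A structure function measures how much a source's brightness changes, on average, as a function of the time separation between observations. The sketch below assumes one common definition, the root-mean-square magnitude difference over all observation pairs binned by time lag, with no correction for photometric noise; the function name and bin edges are hypothetical, and lsstseries may use a different estimator.

```python
import math
from bisect import bisect_right
from itertools import combinations


def structure_function(t, mag, bin_edges):
    """RMS magnitude difference of all observation pairs, binned by time lag.

    bin_edges must be sorted; a lag falls in bin k if
    bin_edges[k] <= lag < bin_edges[k + 1]. Returns one
    (bin_center, sf, n_pairs) tuple per bin, with sf = None for empty bins.
    Photometric noise is not subtracted in this sketch.
    """
    nbins = len(bin_edges) - 1
    sq_sums = [0.0] * nbins
    counts = [0] * nbins
    for (t1, m1), (t2, m2) in combinations(zip(t, mag), 2):
        k = bisect_right(bin_edges, abs(t2 - t1)) - 1
        if 0 <= k < nbins:
            sq_sums[k] += (m2 - m1) ** 2
            counts[k] += 1
    result = []
    for k in range(nbins):
        center = 0.5 * (bin_edges[k] + bin_edges[k + 1])
        sf = math.sqrt(sq_sums[k] / counts[k]) if counts[k] else None
        result.append((center, sf, counts[k]))
    return result


# Toy light curve alternating between two magnitudes (made-up values).
print(structure_function([0, 1, 2, 3], [0.0, 1.0, 0.0, 1.0], [0.5, 1.5, 2.5, 3.5]))
# [(1.0, 1.0, 3), (2.0, 0.0, 2), (3.0, 1.0, 1)]
```

Enumerating all pairs is O(N^2) per light curve, which is fine for a sketch but is one reason structure functions become expensive at survey scale.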

SCALABLE FAINT OBJECT DETECTION

Asteroids and comets are the remnants of the Solar System's early assembly. Their history of accretion, collisions, and perturbation by existing and vanished giant planets is preserved in their orbital elements and size distributions. Rubin has the potential to discover the nearest (Near-Earth Objects; NEOs) and the most distant (Trans-Neptunian Objects; TNOs) asteroid populations, mapping the Solar System in unprecedented detail. The discoveries we expect from this framework include: (a) detection of, and impact probabilities for, ~80% of NEOs larger than 140 m (the impact of which would cause devastation on a regional scale); (b) the discovery of interstellar objects as they pass through the Solar System; (c) a 10-fold increase in the number of known TNOs, which can elucidate the evolution and origin of the Solar System; and (d) the ability to find objects that originate from the inner Oort Cloud and potentially additional planets beyond 100 au (astronomical units).

The LINCC Frameworks team is currently working to scale the KBMOD algorithm, a shift-and-stack search that finds objects too faint to detect in a single image. The goal is to enable the efficient detection of faint objects at Rubin’s data scales. Key technical challenges include (a) scaling this approach to Rubin’s massive data volume and (b) improving sensitivity and accuracy.
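
The core idea of shift-and-stack can be sketched in a few lines: for each candidate starting pixel and velocity, sum the pixel values along that straight-line trajectory through a time series of images, so a moving source too faint for any single frame adds up coherently while the noise grows only as the square root of the number of frames. The sketch below is a hypothetical illustration of the technique, not KBMOD's interface or implementation; it assumes nearest-pixel positions, a brute-force velocity grid, and synthetic noise-only images with one injected source.

```python
import random


def make_synthetic(n_images=10, size=20, flux=3.0, start=(3, 4), vel=(1.0, 0.5), seed=1):
    """Noise-only images (sigma = 1) plus one faint source moving linearly.

    All parameter values are made up for illustration.
    """
    rng = random.Random(seed)
    times = list(range(n_images))
    images = []
    for t in times:
        img = [[rng.gauss(0.0, 1.0) for _ in range(size)] for _ in range(size)]
        x = round(start[0] + vel[0] * t)
        y = round(start[1] + vel[1] * t)
        img[y][x] += flux
        images.append(img)
    return images, times


def stack_along(images, times, x0, y0, vx, vy):
    """Sum pixel values along x = x0 + vx*t, y = y0 + vy*t (nearest pixel).

    Positions that fall off the image contribute nothing.
    """
    total = 0.0
    for img, t in zip(images, times):
        x = round(x0 + vx * t)
        y = round(y0 + vy * t)
        if 0 <= y < len(img) and 0 <= x < len(img[0]):
            total += img[y][x]
    return total


def search(images, times, velocities):
    """Brute-force search over every start pixel and candidate velocity.

    Returns (stacked_flux, x0, y0, vx, vy) for the brightest trajectory.
    """
    height, width = len(images[0]), len(images[0][0])
    best = None
    for y0 in range(height):
        for x0 in range(width):
            for vx, vy in velocities:
                flux = stack_along(images, times, x0, y0, vx, vy)
                if best is None or flux > best[0]:
                    best = (flux, x0, y0, vx, vy)
    return best


images, times = make_synthetic()
steps = [-1.0, -0.5, 0.0, 0.5, 1.0]                  # pixels per time step
velocities = [(vx, vy) for vx in steps for vy in steps]
print(search(images, times, velocities))
```

The search cost grows as pixels x velocities x images, which is why scaling this approach to Rubin's data volume is a central technical challenge.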

COMPREHENSIVE PHOTO-Z INFRASTRUCTURE

A key element in turning 2D images from LSST into a 3D view is determining distances to the astronomical objects studied. Scientifically, this 2D-to-3D conversion is essential for mapping the expansion history of the Universe and the growth of cosmological structure as a function of time, which is key to understanding dark energy, galaxy formation and evolution, and the physical processes that drive transient and variable phenomena. Photometric redshifts, or photo-zs, use the observed colors of galaxies (and potentially other information) to estimate the distance to an object, which can be critical information for understanding the physics that may have given rise to it.
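
To make the colors-to-distance idea concrete, one of the simplest empirical approaches estimates a galaxy's redshift as the average redshift of its nearest neighbors in color space, drawn from a training set of galaxies with spectroscopic redshifts. The sketch below is a hypothetical toy with made-up training data; it is not a RAIL estimator, and real methods must also handle photometric errors, incomplete training samples, and degeneracies in the color-redshift relation.

```python
import math


def knn_photoz(colors, train_colors, train_z, k=3):
    """Mean spectroscopic redshift of the k nearest training galaxies in color space."""
    dists = (math.dist(colors, tc) for tc in train_colors)
    nearest = sorted(zip(dists, train_z))[:k]
    return sum(z for _, z in nearest) / len(nearest)


# Hypothetical training set: (u-g, g-r) colors with known redshifts (made-up values).
train_colors = [(0.50, 0.20), (0.60, 0.25), (0.55, 0.22),
                (1.50, 0.80), (1.60, 0.85), (1.55, 0.90)]
train_z = [0.10, 0.12, 0.11, 0.80, 0.82, 0.85]
print(knn_photoz((0.56, 0.23), train_colors, train_z))
```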

The development of photo-z estimation methods is an active field of research that gives rise to a number of software needs for application to Rubin data. The openly-developed Redshift Assessment Infrastructure Layers (RAIL) software library was initiated by the Dark Energy Science Collaboration (DESC) to establish a unified framework for comprehensive photo-z development, validation, and optimization within the scope of cosmological analysis. LINCC Frameworks is carrying out the following activities, all in collaboration or coordination with DESC as appropriate: (a) collaborating with members of the other LSST Science Collaborations to extend RAIL to other extragalactic use cases beyond cosmology, including the addition of new photo-z performance metrics; (b) working with Rubin Observatory on compatibility of RAIL with the Rubin Science Platform, to enable key commissioning activities; (c) extending the probabilistic representations of photo-z uncertainty, i.e., probability density functions, p(z); (d) improving uncertainty estimation for individual and ensemble photo-z distributions; and (e) generally investing effort into optimizations that improve the robustness of the software at LSST scale.
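
One simple way to represent the per-object p(z) from item (c) is a normalized histogram on a fixed redshift grid; point estimates, and the ensemble redshift distribution n(z) obtained by averaging the individual PDFs, then follow directly. The sketch below assumes that gridded representation with hypothetical function names and toy PDFs; practical libraries typically support other parameterizations (for example quantiles or mixture models) as well.

```python
def normalize(pdf, dz):
    """Rescale gridded p(z) values so they integrate to 1 (rectangle rule)."""
    area = sum(pdf) * dz
    return [p / area for p in pdf]


def pdf_mean(z_grid, pdf, dz):
    """Mean redshift of a normalized gridded p(z)."""
    return sum(z * p for z, p in zip(z_grid, pdf)) * dz


def pdf_mode(z_grid, pdf):
    """Grid redshift at which p(z) peaks."""
    return z_grid[max(range(len(pdf)), key=pdf.__getitem__)]


def ensemble_nz(pdfs):
    """Ensemble n(z): pointwise average of normalized p(z) on a shared grid."""
    return [sum(col) / len(pdfs) for col in zip(*pdfs)]


# Toy example on a coarse grid of bin centers (made-up values).
dz = 0.1
z_grid = [0.0, 0.1, 0.2, 0.3, 0.4]
pdf_a = normalize([0, 1, 2, 1, 0], dz)
pdf_b = normalize([0, 0, 1, 2, 1], dz)
nz = ensemble_nz([pdf_a, pdf_b])
print(pdf_mean(z_grid, pdf_a, dz), pdf_mode(z_grid, pdf_b), sum(nz) * dz)
```

Averaging normalized PDFs keeps n(z) normalized as well, and it retains the full uncertainty of each object rather than stacking point estimates.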

TIMELINE

LINCC Frameworks will engage the LSST science community in prioritizing and designing the software frameworks needed to support early science. The five-year development timeline for LINCC takes a phased approach to deliverables, their application to increasingly large and complex datasets, and engagement with an increasingly broad community. In the first year, LINCC will host a workshop to define the software infrastructure needs and priorities for the LSST science community.

GETTING INVOLVED AND BUILDING A COMMUNITY

The goal of the LINCC Frameworks initiative is to support LSST science analyses across the entire LSST science community by providing analysis tools and infrastructure that enable a vast array of Rubin science use cases. To encourage the science community to help shape the design and functionality of the analysis system, we will deliver:

  • A science analysis platform built on JupyterHub for experimenting with precursor data (building off the platforms developed for the Rubin Science Platform)

  • Compute resources for developing science analyses using precursor data sets, with ~2M CPU hours/year (cloud and HPC) and >1 PB of disk storage

  • Workshops to help define the functionality and priorities for the software frameworks and the tools LINCC will help build

  • Incubators (with $20K research grants) to bring research teams together with the LINCC software team to develop their own science analyses using the LINCC tools and framework

  • Funded hackweeks and workshops to help the LSST Science Collaborations work together to learn and test the tools

The white paper summarizing the workshop outcomes may be found on arXiv.