Galaxy Evolution
Published 2025-06-XX. [This page is still under construction.]
Introduction
The Subaru Prime Focus Spectrograph at the Mauna Kea observatory in Hawai'i seeks to answer the following question: What aspects of large-scale cosmological structure drive the formation and evolution of galaxies?
Researchers from all over the world are working together to construct a first-of-its-kind instrument that can conduct a spectroscopic survey "large enough" to precisely identify these causal relationships. Its observations will test our theories of dark matter, illuminate the life cycles of black holes, and trace "reionization" — the moment when the first stars flooded space with ultraviolet light, lifting the fog of the cosmic "dark ages."
Problem
For every night of scheduled observations, the PFS will conduct a small number of exposures, $T = 42$, during which each of the $K = 2394$ fibers on the instrument will observe a galaxy. Each of the $I = 338900$ galaxies is grouped by intrinsic astrophysical properties into a class, and a galaxy's class determines the integration time (i.e. the number of exposures) required for it to be completely observed. Notably, any given fiber has access to some, but not necessarily all, galaxies.
Here is the principal question that my advisor, Peter Melchior, and I seek to address:
Given the global objective of balancing the proportion of completed galaxies per class, how can we optimally make local assignments of fibers to galaxies during each exposure?
Explicitly, denote by $\mathcal{I}$ the set of galaxies, by $\mathcal{K}$ the set of fibers, and by $\mathcal{L}$ the set of exposures. Let the galaxies be partitioned into $M$ classes $\{\Theta_m\}_{m \in \mathcal{M}}$, each with $N_m := |\Theta_m|$ galaxies, based on their shared required observation time $T_m$. Introducing binary decision variables $t_{ikl} \in \{0,1\}$, indicating that fiber $k$ observes galaxy $i$ during exposure $l$, the problem we'd like to solve (henceforth denoted $\Lambda_1$) is
$$ \begin{aligned} \max_{t_{ikl} \in \{0,1\}} \quad & \min_{m \in \mathcal{M}} \frac{1}{N_m} \sum_{i \in \Theta_m} \mathcal{X}\left[ \sum_{k \in \mathcal{K}} \sum_{l \in \mathcal{L}} t_{ikl} \geq T_m \right] \\ \text{s.t.} \quad & t_{ikl} = 0 \quad \forall\, k \in \mathcal{K},\ i \notin \Psi_k,\ l \in \mathcal{L} \\ & \sum_{k \in \mathcal{K}} t_{ikl} \leq 1 \quad \forall\, i \in \mathcal{I},\ l \in \mathcal{L} \\ & \sum_{i \in \mathcal{I}} t_{ikl} \leq 1 \quad \forall\, k \in \mathcal{K},\ l \in \mathcal{L} \end{aligned} $$
where $\mathcal{X}$ is the characteristic function. It has an intuitive interpretation, despite its technical representation:
- The scientific objective is an "equitable" distribution among the galaxy classes. It maximizes the minimum fraction of galaxies completed in each class $m$ (i.e. $n_m / N_m$, where $n_m$ is the number of galaxies in $\Theta_m$ which receive at least $T_m$ observation time), taken across all classes.
- The first constraint asks that each fiber $k$ only observes a predefined subset of galaxies $\Psi_k \subseteq \mathcal{I}$.
- The second constraint asks that in each exposure, there is at most one fiber observing each galaxy.
- The third constraint asks that in each exposure, each fiber observes at most one galaxy.
This combinatorial optimization problem is extremely hard; in fact, it is NP-hard. Given the high-dimensional nature of the data (there are $IKL = 338900 \times 2394 \times 42 \sim \mathcal{O}(10^{10})$ binary decision variables!), brute force is effectively intractable. Instead, we'll need more powerful tools, with the right architecture to exploit this problem's discrete structure.
What are MPNNs?
Message-Passing Neural Networks (MPNNs), introduced by Gilmer et al. at ICML 2017 (and later generalized in Battaglia et al.'s 2018 graph-nets paper), are a unified framework for learning on graphs. Let $G=(V,E)$ be an input graph represented by node features $\mathbf{h}_v^{(0)}$ and edge features $\mathbf{e}_{uv}$ for $u, v \in V$. Then learning proceeds through a sequence of $T$ discrete message-passing layers. In each layer $t$:
- Message Phase: for every edge $(u, v) \in E$, generate the message $$ \mathbf{m}_{uv}^{(t)} = M^{(t)}(\mathbf{h}_u^{(t-1)}, \mathbf{h}_v^{(t-1)}, \mathbf{e}_{uv}) $$ where $M^{(t)}$ is a learnable (often neural) function shared across edges.
- Aggregation Phase: each node $v \in V$ aggregates incoming messages with a permutation-invariant operator $\phi$ such as a sum, mean, or max: $$ \mathbf{m}_v^{(t)} = \phi_{u \in \mathcal{N}(v)} \mathbf{m}_{uv}^{(t)} $$ where $\mathcal{N}(\cdot)$ represents the neighborhood of its input (i.e., all connected objects). Permutation-invariance is critical so that the network doesn't treat the arbitrary ordering of the nodes as meaningful information.
- Update Phase: the node state is updated via another learnable function $U^{(t)}$: $$ \mathbf{h}_v^{(t)} = U^{(t)}(\mathbf{h}_v^{(t-1)}, \mathbf{m}_v^{(t)}) $$
After $T$ rounds, the graph-level or node-level readout (e.g., pooling or attention) produces task-specific predictions, which can be used to calculate the objective and loss.
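To make the three phases concrete, here's a minimal sketch of one message-passing layer in PyTorch. The two-layer MLPs, hidden width, and sum aggregation are illustrative choices, not the architecture used in this project:

```python
import torch
import torch.nn as nn


class MPNNLayer(nn.Module):
    def __init__(self, node_dim: int, edge_dim: int, hidden_dim: int = 64):
        super().__init__()
        # M^(t): learnable message function over (h_u, h_v, e_uv)
        self.message_mlp = nn.Sequential(
            nn.Linear(2 * node_dim + edge_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim),
        )
        # U^(t): learnable update function over (h_v, aggregated message)
        self.update_mlp = nn.Sequential(
            nn.Linear(node_dim + hidden_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, node_dim),
        )

    def forward(self, h, edge_index, e):
        # h: (num_nodes, node_dim) node features
        # edge_index: (2, num_edges) source/target node indices
        # e: (num_edges, edge_dim) edge features
        src, dst = edge_index
        # Message phase: one message per directed edge (u -> v).
        m = self.message_mlp(torch.cat([h[src], h[dst], e], dim=-1))
        # Aggregation phase: permutation-invariant sum over incoming edges.
        agg = torch.zeros(h.size(0), m.size(-1), device=h.device)
        agg.index_add_(0, dst, m)
        # Update phase: combine old state with aggregated message.
        return self.update_mlp(torch.cat([h, agg], dim=-1))
```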
Class-Level Message-Passing
Graph Modeling
Since the goal of $\Lambda_1$ is to assign each fiber to a particular galaxy target during each exposure $l$, the natural structure of the problem is bipartite. One way to reduce the dimensionality of the problem comes from looking at galaxy classes, rather than individual galaxies. Let's take a look at the properties of the classes:
| | $C_1$ | $C_2$ | $C_3$ | $C_4$ | $C_5$ | $C_6$ | $C_7$ | $C_8$ | $C_9$ | $C_{10}$ | $C_{11}$ | $C_{12}$ |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| $T_m$ | 2 | 2 | 2 | 12 | 6 | 6 | 12 | 6 | 3 | 6 | 12 | 8 |
| $N_m$ | 68200 | 69300 | 96300 | 14400 | 22000 | 8300 | 14000 | 22000 | 7400 | 4500 | 2800 | 9700 |
Since there are only $M = 12$ classes, while there are $I = \sum_{m \in \mathcal{M}} N_m = 338900$ galaxies, constructing our graph with fibers and classes reduces the number of nodes in the bipartite graph by over a factor of a hundred!
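As a quick sanity check on those numbers (a standalone Python snippet, with values copied from the table above):

```python
# Class properties from the table: required exposures T_m and sizes N_m.
T_m = [2, 2, 2, 12, 6, 6, 12, 6, 3, 6, 12, 8]
N_m = [68200, 69300, 96300, 14400, 22000, 8300,
       14000, 22000, 7400, 4500, 2800, 9700]

K, M = 2394, 12
assert sum(N_m) == 338_900              # I, the total number of galaxies
print((K + sum(N_m)) / (K + M))         # node-count reduction: ~142x
```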
The construction of our graph, then, will be as follows:
- The left part $\mathcal{K}$ will represent fiber nodes.
- The right part $\mathcal{M}$ will represent class nodes.
The remaining question, then, is what the connectivity of the bipartite graph will be, and to what extent this model impacts correctness. To that end, the following (empirically true!) assumption will be useful:
Galaxies are uniformly distributed throughout the fibers' fields of view.
An important consequence of this assumption is that most galaxies will be observable by exactly one fiber, and furthermore that each fiber will have an abundance of galaxies to observe from every class, should it choose to do so. As a direct result, we can now fully describe our graph by specifying:
- The graph connectivity will be complete.
The point of this graph construction is that every fiber will predict a weight for every class, denoting the number of galaxies from that class it will observe. Because our objective depends on class-completion, not individual galaxies, and is independent of the order of observation, this collection of weights will specify a feasible solution.

In reality, there will be an intra-class ranking of galaxies that we can use to select galaxies from each class, and generally for stability we ought to complete "long-run" — large $T_m$ — targets first. But the point is that once we have the per-fiber class numbers, we may further tune these parameters to find a close-to-optimal solution.
Note that because of our assumption that each galaxy is observable by exactly one fiber, and also due to the abundance of galaxies relative to fiber-exposures ($I\gg KL$), we can be assured that the galaxies fiber $k$ chooses to observe will be distinct from those chosen by any other fiber $k'$.
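Concretely, the resulting graph is small enough to materialize directly. Here's a sketch of the complete bipartite fiber-class edge list (variable names are illustrative):

```python
import torch

K, M = 2394, 12  # fibers, galaxy classes

# Complete bipartite connectivity: every fiber node is linked to every
# class node. Class nodes are indexed K..K+M-1 so all node ids are
# distinct within a single graph.
fiber_ids = torch.arange(K).repeat_interleave(M)   # 0,0,...,0,1,1,...
class_ids = K + torch.arange(M).repeat(K)          # K,K+1,...,K+M-1,K,...
edge_index = torch.stack([fiber_ids, class_ids])   # shape (2, K*M)

print(edge_index.shape[1])  # 28728 edges, versus ~10^10 binary variables
```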
So, onto the engineering front! How can we turn this graph model into an MPNN?
The Lagrangian
Our immediate task is to define a loss function. The Lagrangian that we'll construct should:
- Reflect our modeling choice of optimizing the number of galaxies per class per fiber $n_{km}$ rather than the binary variables $t_{ikl}$.
- Replace the discrete aspects (characteristic function, hard-counts $n_{km}$) with smooth relaxations, so that the loss is differentiable.
The first requirement can be met by formalizing our prior discussion of the graph model: letting $n_{km}$ denote the number of galaxies from class $m$ that fiber $k$ completes, we want to solve
$$ \begin{aligned} \max_{n_{km} \in \mathbb{Z}_{\geq 0}} \quad & \min_{m \in \mathcal{M}} \frac{1}{N_m} \sum_{k \in \mathcal{K}} n_{km} \\ \text{s.t.} \quad & \sum_{m \in \mathcal{M}} n_{km} T_m \leq T \quad \forall\, k \in \mathcal{K} \end{aligned} $$
That's a lot less scary than $\Lambda_1$! To meet the second requirement of smoothness, though, the gradients are more meaningful if we reformulate in terms of the times that each fiber will provide each class, $t_{km} := n_{km} T_m$:
$$ \begin{aligned} \max_{t_{km} \geq 0} \quad & \min_{m \in \mathcal{M}} \frac{1}{N_m} \sum_{k \in \mathcal{K}} f_\alpha\!\left(\frac{t_{km}}{T_m}\right) \\ \text{s.t.} \quad & \sum_{m \in \mathcal{M}} t_{km} \leq T \quad \forall\, k \in \mathcal{K} \end{aligned} $$
Here, $\{f_\alpha : \mathbb{R} \to \mathbb{R}\}_{\alpha \geq 0}$ is a family of functions with the following property: $f_\alpha \to \operatorname{id}$ as $\alpha \to 0^+$, while $f_\alpha \to \lfloor \cdot \rfloor$ as $\alpha \to \infty$. During training, we'll start at $\alpha = 0$ (no discretization) and then increase $\alpha$ to a desirable sharpness.
The version of $f_\alpha$ we'll use is graphed below; the motivation comes from complex analysis and is explored in the optional section that follows. Try pressing the "play" button to automatically vary $\alpha$ from $0$ to $20$. Notice that even at $\alpha = 10$, the approximation to the floor function is rather good.
Therefore our Lagrangian loss function is
$$ L(t_{11}, \dots, t_{KM}) = - \min_{m \in \mathcal{M}} \left\{ N_m^{-1} \sum_{k \in \mathcal{K}} f_\alpha(t_{km} T_m^{-1}) \right\} + w \sum_{k \in \mathcal{K}} p\left(\sum_{m \in \mathcal{M}} t_{km} - T \right) $$ where $p = \operatorname*{LeakyReLU}^2$ is a penalty function to (smoothly) penalize each fiber's overtime and $w \geq 0$ is a weight, and $$ f_{\alpha}(x) = x + \frac{1}{\pi} \arctan \left( \frac{e^{-1/\alpha} \sin(2\pi x)}{1 - e^{-1/\alpha}\cos(2\pi x)} \right) - \frac{1}{\pi} \arctan \left( \frac{e^{-1/\alpha}}{1-e^{-1/\alpha}} \right) $$

Now, we're almost at the crux of this problem: the Message-Passing Layers themselves! We'll motivate the $f_\alpha$ and then discuss what messages are being passed in our network.
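First, though, here is a minimal PyTorch sketch of $f_\alpha$ and this loss, to pin the formulas down. The helper names (`soft_floor`, `lagrangian_loss`) and tensor layout are my own; this is an illustration rather than the project's implementation:

```python
import math

import torch
import torch.nn.functional as F


def soft_floor(x: torch.Tensor, alpha: float) -> torch.Tensor:
    # f_alpha from above: smooth for finite alpha > 0, -> floor as alpha -> inf.
    r = math.exp(-1.0 / alpha)          # the parametrization r = e^{-1/alpha}
    z = 2 * math.pi * x
    return (x
            + torch.atan(r * torch.sin(z) / (1 - r * torch.cos(z))) / math.pi
            - math.atan(r / (1 - r)) / math.pi)


def lagrangian_loss(t_km, T_m, N_m, T=42, alpha=1.0, w=1.0):
    # t_km: (K, M) nonnegative times each fiber gives each class.
    # T_m, N_m: (M,) per-class exposure requirements and class sizes.
    completed = soft_floor(t_km / T_m, alpha).sum(dim=0) / N_m  # (M,) fractions
    overtime = t_km.sum(dim=1) - T                              # (K,) per fiber
    penalty = F.leaky_relu(overtime).pow(2).sum()               # p = LeakyReLU^2
    return -completed.min() + w * penalty
```

In training, `alpha` would be annealed upward from near zero, per the schedule described above.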
Softfloor Construction (Optional)
Our goal is to conjure a family of functions $\{f_\alpha : \mathbb{R} \to \mathbb{R}\}_{\alpha \geq 0}$ such that $\lim_{\alpha \to 0^+} f_\alpha(x) = x$, $\lim_{\alpha \to \infty} f_\alpha(x) = \lfloor x \rfloor$, and $f_\alpha \in C^\infty(\mathbb{R})$ for $\alpha \geq 0$. For a well-articulated exposition on branches, consider Chapter 3 of Stein and Shakarchi's Complex Analysis, from the Princeton Lectures in Analysis series.
Let $\lfloor x \rfloor = x - \{x\}$, where $\{x\} \equiv x \, \operatorname*{mod}\, 1$ denotes the fractional part, which is discontinuous precisely at $\mathbb{Z}$. For each $r \in (0, 1)$ define
$$ \varphi_r(x) := \frac{1}{\pi} \Im \left( \log(1 - re^{2\pi ix}) \right) \equiv - \frac{1}{\pi} \arctan \left( \frac{r \sin(2\pi x)}{1 - r\cos(2\pi x)} \right) $$where $\log$ denotes the principal branch of the complex logarithm. (The equality follows from unwinding definitions: $1 - re^{2\pi ix}$ has positive real part, so its principal argument is the $\arctan$ of its imaginary part over its real part.) Some observations:
- Since $\Re(1 - re^{2\pi ix}) \geq 1 - r > 0$, the argument of the logarithm stays in the right half-plane, away from the branch cut of the principal logarithm; hence each $\varphi_r$ is smooth.
- As $r \to 1^-$, it can be shown that $\varphi_r(x) \to -\frac{1}{2} + \{x\}$ pointwise for $x \notin \mathbb{Z}$.
- As $r \to 0^+$ continuity implies $\varphi_r(x) \to 0$.
Hence, the natural candidate is $f_r(x) = x - \varphi_r(x) + \psi_r$, where $\psi_r = - \frac{1}{\pi} \arctan\left(\frac{r}{1-r}\right)$ serves as a constant correction term which tends to $-\frac{1}{2}$ as $r \to 1^-$, cancelling the $+\frac{1}{2}$ offset left by $x - \varphi_r$. The final task is to choose the parametrization $r = e^{-1/\alpha}$: as $\alpha \to 0^+$ we get $r \to 0^+$ and $f_\alpha \to \operatorname{id}$, while as $\alpha \to \infty$ we get $r \to 1^-$ and $f_\alpha \to \lfloor \cdot \rfloor$, exactly the behavior we wanted.
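A quick numerical check of these limits, using the arctan form of $\varphi_r$ (illustrative values):

```python
import math


def phi(x: float, r: float) -> float:
    # varphi_r(x) = -(1/pi) * arctan(r sin(2 pi x) / (1 - r cos(2 pi x)))
    return -math.atan(r * math.sin(2 * math.pi * x)
                      / (1 - r * math.cos(2 * math.pi * x))) / math.pi


x = 0.3
print(phi(x, 1e-9))      # ~0.0,  the r -> 0+ limit
print(phi(x, 1 - 1e-9))  # ~-0.2, the r -> 1- limit, i.e. -1/2 + {x}
```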
Message-Passing Layers
TODO.