Welcome to this week’s roundup of machine learning in astrophysics! From galaxy mergers and radio oddities to simulation-based inference for X-ray spectra, this edition highlights how data-driven tools are helping us tackle a range of astrophysical problems. I've organized the 19 papers by topic and given a brief description of the methods in each paper. Whether you're here to skim or dive deep, I hope you find something surprising, elegant, or just plain clever.
The full bib file for the papers this week can be found HERE.
One trend I noted this week is simulation-based inference (SBI), with three new papers showcasing its flexibility across domains. Regamey et al. (arxiv:2506.05457) apply SBI to galaxy cluster counts, allowing them to constrain cosmological parameters without needing an analytic likelihood, which is ideal for capturing the messy realities of surveys and selection effects. Dupourqué & Barret (arxiv:2506.05911) use neural posterior estimation (NPE), a variant of SBI that directly learns the distribution of physical parameters given observed X-ray spectra. This makes it useful for fast, flexible inference in high-resolution spectroscopy, where traditional methods are computationally expensive. Meanwhile, Hikida et al. (arxiv:2506.06087) introduce a hierarchical SBI framework that can handle multi-scale structure and stochasticity in simulations.
🧠 What is Simulation-Based Inference (SBI)?
Sometimes we don’t know how to write down the math that links data to the parameters we care about—but we can simulate it. SBI uses machine learning to “invert” a simulator: by generating lots of fake data and learning which parameters created it, a model can later infer the most likely parameters from real observations. It’s like figuring out if a die is loaded by comparing real rolls to thousands of simulated ones.
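To make the loaded-die analogy concrete, here is a minimal sketch using rejection sampling (approximate Bayesian computation), which is the simplest likelihood-free scheme and not the neural approach used in the papers above. The function names, the flat prior, the tolerance, and the "30 sixes in 100 rolls" observation are all made up for illustration.

```python
import random


def simulate_sixes(p_six, n_rolls, rng):
    """Simulator: number of sixes in n_rolls of a die with six-probability p_six."""
    return sum(rng.random() < p_six for _ in range(n_rolls))


def rejection_abc(observed_sixes, n_rolls, n_sims=20000, tolerance=2, seed=0):
    """Keep the parameter draws whose simulated data land close to the observation."""
    rng = random.Random(seed)
    accepted = []
    for _ in range(n_sims):
        p_six = rng.uniform(0.0, 1.0)  # assumed flat prior on the six-probability
        fake_sixes = simulate_sixes(p_six, n_rolls, rng)
        if abs(fake_sixes - observed_sixes) <= tolerance:
            accepted.append(p_six)
    return accepted


# Hypothetical observation: 30 sixes in 100 rolls (a fair die would give about 17).
posterior = rejection_abc(observed_sixes=30, n_rolls=100)
posterior_mean = sum(posterior) / len(posterior)
print(f"accepted {len(posterior)} draws, posterior mean p(six) = {posterior_mean:.3f}")
print(f"fair-die value for comparison: {1 / 6:.3f}")
```

The accepted draws approximate the posterior over the six-probability. Neural SBI methods replace the accept/reject step with a network trained on the simulated pairs, which scales far better when the data are high-dimensional.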
🌌 Cosmology & Large-Scale Structure
Generative Models of 21cm EoR Lightcones with 3D Scattering Transforms, Hothi et al., arxiv:2506.10061
Unsupervised – Uses scattering transforms to extract statistical features from 21cm maps and trains generative models. An elegant way to simulate EoR structures.
Power Spectrum Emulators from Neural Networks and Tree-Based Methods, Lazanu, arxiv:2506.07514
Supervised – Trains emulators to predict cosmological power spectra from parameters using neural nets and decision trees. A solid performance benchmark across ML architectures. The learning models are available as Jupyter notebooks on GitHub.
Galaxy cluster count cosmology with simulation-based inference, Regamey et al., arxiv:2506.05457
Simulation-based inference – Inference of cosmological parameters directly from mock catalogues. Especially useful for non-Gaussian observables.
Observational Insights on DBI K-essence Models Using Machine Learning and Bayesian Analysis, Ganguly et al., arxiv:2506.05674
Hybrid – Applies supervised learning for parameter fitting and Bayesian statistics to probe scalar field models of dark energy. DBI-type k-essence models are dark energy theories in which a scalar field with a non-standard kinetic term (k-essence, short for kinetic quintessence) evolves under a Dirac–Born–Infeld (DBI) action, inspired by string theory, to drive the late-time acceleration of the universe.
Learning Correlated Astrophysical Foregrounds with Denoising Diffusion Probabilistic Models, Prabhu et al., arxiv:2506.09036
Generative (Diffusion models) – Uses cutting-edge denoising diffusion models to extract structured foregrounds from CMB data. Code and plotting scripts can be found on GitHub.
🌀 Galaxy Evolution
The connection between galaxy mergers, star formation and AGN activity in the HSC-SSP, Omori et al., arxiv:2506.08469
Supervised – Fine-tunes a convolutional neural network (Zoobot) on labeled simulated galaxy images to classify mergers in HSC-SSP data, combining deep learning with citizen science training sets.
Radio-loud AGN morphology and host-galaxy properties in the LOFAR Two-Metre Sky Survey, Clews et al., arxiv:2506.08878
Supervised – Classifies AGN types based on radio morphology and links them to host properties using decision-tree classifiers.
Discovery of Odd Radio Circles and Other Peculiars in the EMU Survey using Object Detection, Gupta et al., arxiv:2506.08439
Supervised – Uses a CNN-based object detection model trained on labeled radio sources to identify rare structures like Odd Radio Circles and other unusual galaxy-scale radio morphologies in EMU survey data.
🕳️ High-Energy Astrophysics
Gamma-Ray Bursts Calibrated by Using Artificial Neural Networks from the Pantheon+ Sample, Huang et al., arxiv:2506.08929
Supervised – Applies ANNs to calibrate GRBs as distance indicators, drawing analogies from SN Ia standardization.
Mitigating Polarization Leakage in Gas Pixel Detectors through Hybrid Machine Learning and Analytic Event Reconstruction, Cibrario et al., arxiv:2506.07828
Hybrid – Combines supervised learning with analytical reconstruction for X-ray polarization data. Great for future missions.
Simulation-based inference with neural posterior estimation applied to X-ray spectral fitting, Dupourqué & Barret, arxiv:2506.05911
Simulation-based inference – Learns posterior distributions of X-ray spectral parameters using neural density estimation. Once trained, it offers a fast alternative to MCMC-based fitting.
🪐 Exoplanets and Stars
The TESS Ten Thousand Catalog: 10,001 uniformly-vetted and -validated Eclipsing Binary Stars detected in Full-Frame Image data by machine learning and analyzed by citizen scientists, Kostov et al., arxiv:2506.05631
Supervised + Human-in-the-loop – ML classifiers flagged eclipsing binaries, which were then refined by citizen scientists. A cool hybrid validation process. I picked the network structure figure as an example, but check out the paper for all sorts of cool light curves, plus plenty of problem cases where the light curves are not actually eclipses.
DART-Vetter: A Deep LeARning Tool for automatic triage of exoplanet candidates, Fiscale et al., arxiv:2506.05556
Supervised – Deep learning triage system to prioritize promising exoplanet transit candidates for follow-up.
🛰️ Instrumentation, Tools and Methods
AstroCompress: A benchmark dataset for multi-purpose compression of astronomical data, Truong et al., arxiv:2506.08306
Provides a benchmark for evaluating supervised, unsupervised, and lossless compression tools across image types and fidelity regimes. Underrated but very cool.
Multilevel neural simulation-based inference, Hikida et al., arxiv:2506.06087
Simulation-based inference – A hierarchical neural network framework for handling simulation noise and multi-scale uncertainty. The application discussed is cosmological simulation but this is a general method.
The ATLAS Virtual Research Assistant, Stevance et al., arxiv:2506.09778
Supervised – Employs Histogram-Based Gradient Boosted Decision Trees trained on real alert data to assign real/bogus and galactic/extragalactic scores.
Teaching Astronomy with Large Language Models, Ting & O'Briain, arxiv:2506.06921
LLM-based – Evaluates how LLMs like ChatGPT can support astronomy teaching and scientific communication. An interesting piece for those interested in applications of LLMs in education.
🚀 Bonus: Solar System & Planetary Missions
Study of Venera Spacecraft Trajectories and Wider Implications, Hibberd, arxiv:2506.09478
This study does not apply ML directly but provides a really cool historical-trajectory dataset. I flagged this paper for two reasons: (1) I love a good story about space junk and (2) what is the Earth Tisserand parameter?
So here is the cosmic mystery: a handful of weird, dark asteroids with nearly perfectly Earth-like orbits. So perfect, in fact, that they match the orbital fingerprints of old Soviet Venera missions from the 1970s. By analyzing their Earth Tisserand parameters (a kind of orbital DNA), the author finds a suspiciously tight match between six “Dark Comets” and the Blok-L rocket stages used to send spacecraft to Venus.
👀 Could these be stealthy comets, long-lost relics of the space age, or just an orbital coincidence? The odds say: not likely a coincidence.
Verdict? The Solar System might be hiding some Soviet-era space junk—and it’s still orbiting incognito.
To point (2): The Earth Tisserand parameter is like a gravitational fingerprint that helps astronomers figure out how an object's orbit compares to Earth's—if it's close to 3, it usually means the object shares a similar path around the Sun. Because it's mostly conserved during gentle gravitational encounters, it's a powerful clue for spotting things that may have once come near Earth—or even launched from it.
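For anyone who wants to play with the number, here is a small sketch of the standard Tisserand parameter formula with Earth as the reference body: T = a_E/a + 2 cos(i) sqrt((a/a_E)(1 - e^2)). The function name and the example orbital elements are my own illustrations, not values taken from the paper.

```python
import math


def earth_tisserand(a_au, e, inc_deg, a_earth_au=1.0):
    """Tisserand parameter of an orbit with respect to Earth.

    a_au    : semi-major axis of the object in au
    e       : orbital eccentricity
    inc_deg : inclination to Earth's orbital plane in degrees
    """
    ratio = a_au / a_earth_au
    return 1.0 / ratio + 2.0 * math.cos(math.radians(inc_deg)) * math.sqrt(
        ratio * (1.0 - e**2)
    )


# A circular, coplanar orbit at 1 au (Earth's own orbit) gives exactly 3.
print(earth_tisserand(a_au=1.0, e=0.0, inc_deg=0.0))

# Hypothetical Earth-like orbit (illustrative elements only).
print(earth_tisserand(a_au=1.1, e=0.1, inc_deg=2.0))
```

Values just below 3 are what you would expect for something on a nearly Earth-like path, which is why the parameter is handy for flagging objects that may have started out near Earth.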