Pratham Aggarwal · Machine Learning Researcher

My research focuses on generative modeling, with an emphasis on understanding and controlling model behavior at inference time. At UC San Diego's Belkin Lab, I investigate controllable generation in masked diffusion language models, developing activation-steering methods and token-selection strategies that guide generation without retraining. At the Climate Analytics Lab within Scripps Institution of Oceanography, under the supervision of Duncan Watson-Parris, I develop diffusion-based climate models for ensemble forecasting and Earth system emulation, enabling faster uncertainty-aware climate simulations. More broadly, I am interested in the foundations of controllable AI and the development of generative models for scientific discovery.

Curriculum Vitae · GitHub · LinkedIn · Email

fig. 1 · Portrait of Pratham Aggarwal, San Diego

research

Steering what generative models produce.

One question runs through my work: once a model can generate, how do we control what it generates, precisely, at inference time, without retraining? I approach it from two sides, language and physical systems, with diffusion and masked-diffusion models as the shared substrate.

2026–present Belkin Lab, UC San Diego ML Researcher

Controllable generation in masked diffusion language models

Investigating controllable generation in masked diffusion language models (MDLMs) through inference-time unmasking, and developing masking-aware activation-steering methods for partially observed sequences. Benchmarking token-selection strategies (margin, entropy, saliency, and concept-directed) for controllability and text quality, measured with perplexity, MAUVE, and classifier-based metrics on the LLaDA codebase.

MDLM
activation steering
LLaDA
controllability
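To make the margin and entropy token-selection strategies concrete, here is a minimal sketch of confidence-based unmasking. It is an illustration under stated assumptions, not the LLaDA implementation: the function names, the toy three-token vocabulary, and the probability values are all hypothetical, and a real MDLM would supply the per-position distributions from its forward pass.

```python
import math

def entropy_score(probs):
    """Negative Shannon entropy: a higher score means a more certain prediction."""
    return sum(p * math.log(p) for p in probs if p > 0.0)

def margin_score(probs):
    """Gap between the two most likely tokens: a higher score means more certain."""
    top1, top2 = sorted(probs, reverse=True)[:2]
    return top1 - top2

def select_positions(masked_probs, k, score_fn):
    """Pick the k masked positions with the highest score (returned in position order)."""
    ranked = sorted(masked_probs, key=lambda pos: score_fn(masked_probs[pos]), reverse=True)
    return sorted(ranked[:k])

# Hypothetical predictive distributions at three still-masked positions.
masked_probs = {
    0: [0.50, 0.49, 0.01],  # two near-tied candidates: tiny margin, moderate entropy
    3: [0.60, 0.20, 0.20],  # clear winner, but the remaining mass is spread out
    5: [0.90, 0.05, 0.05],  # confident under both criteria
}

by_entropy = select_positions(masked_probs, k=2, score_fn=entropy_score)
by_margin = select_positions(masked_probs, k=2, score_fn=margin_score)
print("entropy picks:", by_entropy)
print("margin picks:", by_margin)
```

On this toy input the two criteria agree on position 5 but disagree on the second pick, which is the kind of difference a benchmark of selection strategies is meant to expose.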

2025–present Scripps Institution of Oceanography ML Researcher

Neural climate surrogates & diffusion ensembles

Built a neural climate-modeling pipeline that runs ~20× faster than state-of-the-art by coupling a lightweight UNet with an ocean emulator, holding prediction accuracy across a 40-year autoregressive rollout. Trained a conditional diffusion model (DDPM) on 155 years of simulation to generate 20–50-member ensembles per timestep, recovering distributional extremes and fine spatial structure that regression baselines lose. Deployed a full-stack ML evaluation platform on GCP benchmarking 500+ model outputs with 20+ custom metrics, work that earned a $4,500 scholarship for interpretability tooling.

diffusion (DDPM)
UNet surrogate
ensembles
GCP

evaluation platform ↗ code ↗

2025 Existential Robotics Lab, UC San Diego Research Assistant

Motion planning under uncertainty

Benchmarked and optimized motion-planning algorithms (A*, RRT, MPC), cutting path-planning failures 28% while keeping average trajectory error below 0.15 m in dynamic environments. Strengthened PyBullet simulation pipelines for mapping and localization across 100+ trials, and sped up simulation 2.1× to test 10+ planners concurrently.

A*
RRT
MPC
PyBullet

2025 SFIC Quantitative Technologies Quantitative Researcher

Sequential decision-making for options

Developed a deep-learning agent with adaptive policy learning and early-exercise strategies that improved simulated options-trading returns 12% for a $1.3M student-run fund. A sequential decision step dynamically selected option type, expiration, and exercise timing from time-series data; an LSTM trained on five years of market data reached 0.96 MSE on price prediction.

deep RL
LSTM
time series

projects & awards

Selected work, built to ship and to win.

1st place · NSF / CERN hackathon

Coastal-flooding risk prediction

Built and compared ML models for coastal-flooding risk; selected CatBoost as the top approach and reached a 0.94 F1 on the final CodaBench benchmark through feature engineering on large-scale data.

CatBoost · feature engineering · Jan 2026

1st place · Hackfrontier

Homeless-services CV platform

A geospatial forecasting platform integrating 35+ transit, demographic, and geographic features (67% siting accuracy), paired with a real-time computer-vision system (Oxen.ai, EyePop.ai) tracking transient populations for data-driven service allocation.

scikit-learn · OpenCV · Jun 2025

Internship

NLP for meetings · NIOV Labs

System extracting goals, tasks, participants, and deadlines from meeting transcripts at 88% agreement with human labels; automated scheduling cut manual effort 60% with 95% successful confirmations. Containerized with Docker, deployed on Kubernetes.

NLP · Docker · Kubernetes · 2026

Research report

Black-hole seed growth simulation

Simulated black-hole growth under Eddington and super-Eddington accretion, comparing light- and heavy-seed trajectories over 12 weeks of guided research in computational astrophysics.

numerical modeling · Astropy · read the paper ↗
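For readers unfamiliar with Eddington-limited growth, the standard textbook model is exponential: dM/dt = f_Edd · (1 − ε)/ε · M / t_Edd, with t_Edd ≈ 450 Myr. The sketch below is a minimal illustration of that formula, not the code behind the report. The seed masses, the radiative efficiency ε = 0.1, and the constant-efficiency treatment of super-Eddington accretion are assumptions chosen for illustration.

```python
import math

# Eddington timescale sigma_T * c / (4 * pi * G * m_p), roughly 0.45 Gyr.
T_EDD_MYR = 450.0

def grown_mass(m0_msun, t_myr, f_edd=1.0, epsilon=0.1):
    """Black-hole mass (solar masses) after t_myr of steady accretion.

    Solves dM/dt = f_edd * (1 - epsilon) / epsilon * M / t_Edd, which gives
    exponential growth with an e-folding time of about 50 Myr when
    f_edd = 1 and epsilon = 0.1.
    Assumption: epsilon stays fixed even for f_edd > 1, a simplification.
    """
    e_fold_myr = T_EDD_MYR * epsilon / ((1.0 - epsilon) * f_edd)
    return m0_msun * math.exp(t_myr / e_fold_myr)

# Hypothetical light and heavy seeds, each grown for 500 Myr.
light_seed = grown_mass(1e2, 500.0)
heavy_seed = grown_mass(1e5, 500.0)
light_super = grown_mass(1e2, 500.0, f_edd=2.0)

print(f"light seed, Eddington:       {light_seed:.3e} Msun")
print(f"heavy seed, Eddington:       {heavy_seed:.3e} Msun")
print(f"light seed, 2x Eddington:    {light_super:.3e} Msun")
```

Because growth is exponential, doubling the accretion rate squares the growth factor over the same interval, which is why light seeds need sustained or super-Eddington accretion to catch up with heavy seeds.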

news

Recent.

2026 Wrote up a deep dive on how a 10-neuron network serves 100 features, a hands-on look at superposition.
2026 Joined the Belkin Lab to work on controllable generation in masked diffusion language models.
Jan 2026 1st place at the NSF / CERN-hosted hackathon, coastal-flooding prediction (0.94 F1).
2025 Awarded a $4,500 research scholarship at Scripps for ML interpretability tooling.
Jun 2025 1st place at Hackfrontier, homeless-services computer-vision platform.
2025 Began ML research at Scripps on neural climate modeling and diffusion ensembles.

contact

Let's talk research.

I'm happy to hear from prospective advisors, collaborators, and anyone working on controllable generation, diffusion models, or scientific ML. The fastest way to reach me is email.

praggarwal@ucsd.edu github.com/pratham-aggr linkedin.com/in/pratham-agg

toolkit: PyTorch · TensorFlow · scikit-learn · diffusion models · Python / C++ / Java · GCP · Docker · Kubernetes · Git