ph.d. applicant · machine learning

I'm an undergraduate at UC San Diego (B.S. Mathematics–Computer Science & Data Science, expected 2027) studying how generative models can be steered. Right now I work on inference-time control of masked diffusion language models at the Belkin Lab and on diffusion-based climate ensembles at Scripps Institution of Oceanography. I'm applying to Ph.D. programs in machine learning.

fig.1 · Pratham Aggarwal, San Diego

research

Steering what generative models produce.

One question runs through my work: once a model can generate, how do we control what it generates, precisely, at inference time, without retraining? I approach it from two sides, language and physical systems, with diffusion and masked-diffusion models as the shared substrate.

Controllable generation in masked diffusion language models

Investigating controllable generation in masked diffusion language models (MDLMs) through inference-time unmasking, and developing masking-aware activation-steering methods for partially observed sequences. Benchmarking token-selection strategies (margin, entropy, saliency, and concept-directed) for controllability and text quality, measured with perplexity, MAUVE, and classifier-based metrics on the LLaDA codebase.

  • MDLM
  • activation steering
  • LLaDA
  • controllability
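For readers unfamiliar with confidence-based unmasking, here is a minimal sketch of two of the selection scores named above (margin and entropy). It is a toy illustration only, not the lab's code and not the LLaDA implementation: the function names, the tiny four-token vocabulary, and the logits are all hypothetical.

```python
import math

def softmax(logits):
    """Convert raw logits to probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def margin_score(probs):
    """Gap between the two most likely tokens; larger means more confident."""
    top = sorted(probs, reverse=True)
    return top[0] - top[1]

def entropy_score(probs):
    """Negative Shannon entropy, so that larger again means more confident."""
    return sum(p * math.log(p) for p in probs if p > 0)

def select_positions(logits_by_pos, masked, k, score_fn):
    """Pick the k masked positions the model is most confident about."""
    scored = [(score_fn(softmax(logits_by_pos[i])), i) for i in masked]
    scored.sort(key=lambda t: (-t[0], t[1]))  # highest score first, ties by index
    return sorted(i for _, i in scored[:k])

# Hypothetical logits for two masked positions over a 4-token vocabulary.
toy_logits = {
    0: [5.0, 5.0, -5.0, -5.0],  # two tied candidates: low entropy, zero margin
    1: [0.7, 0.0, 0.0, 0.0],    # one mild favourite: higher entropy, clear margin
}
by_margin = select_positions(toy_logits, [0, 1], 1, margin_score)
by_entropy = select_positions(toy_logits, [0, 1], 1, entropy_score)
print("margin picks", by_margin, "entropy picks", by_entropy)
```

The toy case is built so the two scores disagree: position 0 is split between two candidates, which entropy treats as fairly certain and margin treats as a coin flip. Differences like this are what make the choice of strategy worth benchmarking.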

Neural climate surrogates & diffusion ensembles

Built a neural climate-modeling pipeline that runs ~20× faster than state-of-the-art by coupling a lightweight UNet with an ocean emulator, holding prediction accuracy across a 40-year autoregressive rollout. Trained a conditional diffusion model (DDPM) on 155 years of simulation to generate 20–50-member ensembles per timestep, recovering distributional extremes and fine spatial structure that regression baselines lose. Deployed a full-stack ML evaluation platform on GCP benchmarking 500+ model outputs with 20+ custom metrics, work that earned a $4,500 scholarship for interpretability tooling.

  • diffusion (DDPM)
  • UNet surrogate
  • ensembles
  • GCP

evaluation platform ↗ code ↗

Motion planning under uncertainty

Benchmarked and optimized motion-planning algorithms (A*, RRT, MPC), cutting path-planning failures 28% while keeping average trajectory error below 0.15 m in dynamic environments. Strengthened PyBullet simulation pipelines for mapping and localization across 100+ trials, and sped up simulation 2.1× to test 10+ planners concurrently.

  • A*
  • RRT
  • MPC
  • PyBullet

Sequential decision-making for options

Developed a deep-learning agent with adaptive policy learning and early-exercise strategies that improved simulated options-trading returns 12% for a $1.3M student-run fund. A sequential decision step dynamically selected option type, expiration, and exercise timing from time-series data; an LSTM trained on five years of market data reached 0.96 MSE on price prediction.

  • deep RL
  • LSTM
  • time series

projects & awards

Selected work, built to ship and to win.

1st place · NSF / CERN hackathon

Coastal-flooding risk prediction

Built and compared ML models for coastal-flooding risk; selected CatBoost as the top approach and reached a 0.94 F1 on the final CodaBench benchmark through feature engineering on large-scale data.

CatBoost · feature engineering · Jan 2026

1st place · Hackfrontier

Homeless-services CV platform

A geospatial forecasting platform integrating 35+ transit, demographic, and geographic features (67% siting accuracy), paired with a real-time computer-vision system (Oxen.ai, EyePop.ai) tracking transient populations for data-driven service allocation.

scikit-learn · OpenCV · Jun 2025

Internship

NLP for meetings · NIOV Labs

System extracting goals, tasks, participants, and deadlines from meeting transcripts at 88% agreement with human labels; automated scheduling cut manual effort 60% with 95% successful confirmations. Containerized with Docker, deployed on Kubernetes.

NLP · Docker · Kubernetes · 2026

Research report

Black-hole seed growth simulation

Simulated black-hole growth under Eddington and super-Eddington accretion, comparing light- and heavy-seed trajectories over 12 weeks of guided research in computational astrophysics.

numerical modeling · Astropy · read the paper ↗
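As background, the textbook picture of Eddington-limited growth is exponential: a seed accreting at a fixed fraction of the Eddington rate e-folds on a constant timescale. The sketch below uses that standard closed form and is not the model from the report. The seed masses, the radiative efficiency of 0.1, and the target mass are assumptions chosen for illustration.

```python
import math

# Physical constants in SI units.
SIGMA_T = 6.6524587e-29   # Thomson cross-section, m^2
C = 2.99792458e8          # speed of light, m/s
G = 6.67430e-11           # gravitational constant, m^3 kg^-1 s^-2
M_P = 1.67262192e-27      # proton mass, kg
SECONDS_PER_YEAR = 3.15576e7

def eddington_time_years():
    """t_Edd = sigma_T * c / (4 * pi * G * m_p), roughly 450 Myr."""
    return SIGMA_T * C / (4 * math.pi * G * M_P) / SECONDS_PER_YEAR

def efolding_time_years(f_edd=1.0, efficiency=0.1):
    """E-folding time for accretion at f_edd times the Eddington rate.

    Assumes a constant radiative efficiency, which is a simplification,
    especially in the super-Eddington regime (f_edd > 1).
    """
    return eddington_time_years() * efficiency / ((1 - efficiency) * f_edd)

def mass_at(t_years, seed_mass, f_edd=1.0, efficiency=0.1):
    """Mass after t_years of uninterrupted growth, in units of seed_mass."""
    return seed_mass * math.exp(t_years / efolding_time_years(f_edd, efficiency))

def time_to_reach(target_mass, seed_mass, f_edd=1.0, efficiency=0.1):
    """Years of uninterrupted growth needed to go from seed to target mass."""
    return efolding_time_years(f_edd, efficiency) * math.log(target_mass / seed_mass)

# Illustrative (assumed) seed masses in solar masses.
LIGHT_SEED = 1e2
HEAVY_SEED = 1e5
TARGET = 1e9

for label, seed in (("light", LIGHT_SEED), ("heavy", HEAVY_SEED)):
    for f_edd in (1.0, 2.0):
        myr = time_to_reach(TARGET, seed, f_edd) / 1e6
        print(f"{label} seed, f_edd={f_edd}: {myr:.0f} Myr to reach 1e9 solar masses")
```

Under these assumptions a heavy seed needs fewer e-folds than a light one, and doubling the accretion rate halves the growth time.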

contact

Let's talk research.

I'm happy to hear from prospective advisors, collaborators, and anyone working on controllable generation, diffusion models, or scientific ML. The fastest way to reach me is email.

praggarwal@ucsd.edu · github.com/pratham-aggr · linkedin.com/in/pratham-agg

toolkit: PyTorch · TensorFlow · scikit-learn · diffusion models · Python / C++ / Java · GCP · Docker · Kubernetes · Git