Hello, I'm

Simran Sodhi

I'm a grad student at Carnegie Mellon studying Automated Science in the Computational Biology department. My work lives at the intersection of biology and computation: I've built closed-loop optimization systems for robotic pipetting, adapted protein language models to predict mutation effects, and trained deep learning models to read chest X-rays. Before CMU, I worked as a software engineer at Amazon and interned at research labs in Munich and Delhi.

I also love teaching and mentorship. At CMU, I've TA'd courses in programming and bioinformatics, as well as a pre-college program where I helped high school students with hands-on lab work. Previously, I mentored students from underserved communities in data structures and algorithms through The Barabari Project and volunteered as a math and science teacher through NSS during my undergrad.

When I'm not debugging pipelines, you'll probably find me cooking, singing, or exploring a new city and its culture.

At a glance
  • Based in: Pittsburgh, PA
  • School: Carnegie Mellon University
  • Focus: Computational Biology & ML
  • GPA: 4.3 / 4.0
  • Prev.: Amazon · Nutanix · Samsung
Selected Work

Things I've built and explored

A few projects I'm most proud of, spanning ML, protein language modeling, genomics, software engineering, and scientific automation.

Figure: closed-loop optimization diagram. Stages: OT-2 robot (dispense), image capture (feed), YOLO detection (confidence gate), Bayesian Optimization with Optuna (acquisition function), next parameters. Search space reduced to <10%.
Capstone · Robotics · Bayesian Optimization

Closing the Loop in Robotic Pipetting

Lawrence Livermore National Laboratory · 2025

How do you teach a robot to pipette better by learning from its own mistakes? We built a closed-loop optimization pipeline that connects Bayesian Optimization with an OT-2 liquid-handling robot, using image-based feedback to iteratively improve dispensing accuracy. By gating optimization updates on YOLO prediction confidence, we reduced the effective search space to under 10% of its original size.
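To make the confidence gate concrete, here is a minimal sketch of the idea, not the project's actual code. The names `gated_loop`, `propose`, `measure`, and `min_confidence` are hypothetical, and in the toy usage random proposals stand in for Bayesian Optimization and a random number stands in for the detector's confidence. The only point it illustrates is that a measurement reaches the optimizer's history only when the detection confidence clears a threshold.

```python
import random


def gated_loop(propose, measure, n_rounds, min_confidence=0.5):
    """Run a feedback loop that only keeps confident measurements.

    propose(accepted) -> params               (hypothetical stand-in for the optimizer)
    measure(params)   -> (error, confidence)  (hypothetical stand-in for dispense + detection)
    """
    accepted = []  # (params, error) pairs the optimizer is allowed to learn from
    rejected = 0   # measurements dropped by the confidence gate
    for _ in range(n_rounds):
        params = propose(accepted)
        error, confidence = measure(params)
        if confidence < min_confidence:
            # Low-confidence detection: skip the update instead of feeding
            # a possibly wrong measurement back to the optimizer.
            rejected += 1
            continue
        accepted.append((params, error))
    return accepted, rejected


# Toy usage: random proposals stand in for Bayesian Optimization here.
rng = random.Random(0)
propose = lambda accepted: rng.uniform(0.0, 10.0)
measure = lambda p: (abs(p - 4.2), rng.random())
accepted, rejected = gated_loop(propose, measure, n_rounds=20)
if accepted:
    best_params, best_error = min(accepted, key=lambda pair: pair[1])
    print(f"kept {len(accepted)}, dropped {rejected}, best error {best_error:.3f}")
```

In a real pipeline, `propose` would be the optimizer's suggestion step and `measure` would wrap the robot run and the image-based detection; both are assumptions here.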

Python · Optuna · YOLO · Bayesian Optimization · OT-2
Figure: mutation-level effect prediction. Example sequence M K T A Y I A K L G S D V E with mutations highlighted, passed to ESM-2 (pretrained protein language model); ~70% accuracy. Also exploring Transformer + Mamba hybrids.
Research · Protein ML

Predicting Mutation Effects with Protein Language Models

Carnegie Mellon University · 2025, ongoing

Can we predict how a single amino acid change will affect a protein's function? I built a mutation-level prediction pipeline using the pretrained protein language model ESM-2, with custom tokenization and masking-based training objectives, achieving roughly 70% accuracy on held-out data. In parallel, I'm designing hybrid architectures that combine transformers with state-space models (Mamba) to better capture long-range dependencies in protein sequences.

PyTorch · ESM-2 · Mamba · Transformers · Python
Figure: UMAP plots of cell type subsetting and pseudotime trajectories for pig, mouse, and human.
Master's Thesis · Computational Biology

Tracing Beta-Cell Journeys Across Species

Institute of Computational Biology, Munich · 2022

How do insulin-producing cells develop differently in humans, mice, and pigs? Using single-cell RNA sequencing data from 16 samples, I traced beta-cell trajectories during embryonic development and applied dynamic time warping to align them across species, revealing conserved and divergent patterns in pancreatic development.
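For readers unfamiliar with dynamic time warping, the sketch below shows the textbook algorithm on two toy 1-D sequences. It is an illustration under simplifying assumptions, not the thesis pipeline: the function name `dtw_align` and the example values are hypothetical, and real trajectories would be expression profiles along pseudotime rather than short lists of numbers.

```python
def dtw_align(a, b):
    """Textbook dynamic time warping between two 1-D sequences.

    Returns the total alignment cost and the warping path as a list of
    (index_in_a, index_in_b) pairs.
    """
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = best cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # a advances, b repeats
                                 cost[i][j - 1],      # b advances, a repeats
                                 cost[i - 1][j - 1])  # both advance

    # Backtrack from the end to recover which points were matched.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        _, i, j = min((cost[i - 1][j - 1], i - 1, j - 1),
                      (cost[i - 1][j], i - 1, j),
                      (cost[i][j - 1], i, j - 1))
    path.reverse()
    return cost[n][m], path


# Toy usage: the second sequence lingers on one value for an extra step.
fast = [0, 1, 2, 3]
slow = [0, 1, 1, 2, 3]
total_cost, path = dtw_align(fast, slow)
print(total_cost, path)
```

The warping path is what makes trajectories with different tempos comparable: one point in the faster sequence can be matched to several points in the slower one.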

Python · Scanpy · scVI · DTW · Palantir · tradeSeq
Figure: CNN-based chest X-ray classification with MobileNetV2 + U-Net. Classes: MDR-TB (drug-resistant) and DS-TB (drug-susceptible). 87% accuracy · ~3,000 cases.
Deep Learning · Medical Imaging

Distinguishing Drug-Resistant TB from Chest X-Rays

BITS Pilani · 2021

Drug-resistant tuberculosis is hard to diagnose and deadly when missed. We trained CNN models, including MobileNetV2 paired with U-Net segmentation, on ~3,000 chest X-rays to distinguish multi-drug-resistant TB from drug-susceptible TB. The model reached 87% accuracy, and we deployed it as a web application.

MobileNetV2 · U-Net · TensorFlow · Flask
Journey

Where I've been

My path has zigzagged between software engineering and research, and I think that's what makes it interesting.

Mar 2025 – Present
Pittsburgh, PA

Research Assistant

Carnegie Mellon University

Building mutation-level prediction pipelines with pretrained protein language models (ESM-2) and designing hybrid transformer/Mamba architectures for long-range protein sequence modeling.

Jul 2023 – Jun 2024
Bangalore, India

Software Development Engineer 1

Amazon

Built a Java data pipeline handling SNS notifications and DynamoDB updates that cut storage costs by 40%. Designed a multilingual UI for a global trade platform, improving accessibility across English and Spanish.

Jan – Jun 2023
Bangalore, India

Software Intern

Nutanix Technologies

Migrated a critical recovery plans workflow from BackboneJS to React 16 and built interactive formatters and event handlers for the Prism website.

Aug – Dec 2022
Munich, Germany

Master's Thesis Researcher

Institute of Computational Biology, Dr. Fabian Theis

Analyzed beta-cell trajectories across species using scRNA-seq data, applying dynamic time warping to align pseudotime trajectories in humans, mice, and pigs.

May – Aug 2021
Munich, Germany

Computational Biology Research Intern

Institute of Computational Biology, Dr. Fabian Theis

Optimized unsupervised ML algorithms (trVAE, scVI) to integrate genomic data from 140,000 cells. Ran 10+ experiments with varying species ratios to find the sweet spot for preserving biological signal while eliminating batch effects.

Jun – Aug 2022
Delhi, India

Summer Intern

Samsung Research Institute

Built a multimedia API testing application for Samsung TV with a custom HTML/CSS/JS interface.

Toolkit

What I work with

Core strengths in terracotta, bioinformatics tools in green.

Python · PyTorch · Java · Go · C · JavaScript · SQL · TensorFlow · Keras · scikit-learn · Pandas · NumPy · Scanpy · scVI · Cellpose · BLAST · AWS · DynamoDB · S3 · Git · JUnit · Mockito
Education

Where I studied

M.S. Automated Science

Carnegie Mellon University
Pittsburgh, PA · GPA 4.3/4.0 · Expected May 2026
Machine Learning · Automation of Scientific Research · Bioinformatics · Programming for Scientists

B.E. Computer Science & M.Sc. Biology

BITS Pilani
Pilani, India · GPA 8.99/10 · Jul 2023
Deep Learning · Data Structures & Algorithms · Genetics · Genetic Engineering · Probability & Statistics · Linear Algebra
Say hello

Let's talk biology,
code, or both.

I'm currently looking for opportunities in software engineering, computational biology, and ML research. Whether you're hiring, collaborating, or just want to chat about the intersection of biology and computation, I'd love to hear from you.

Email · LinkedIn