Leading the RL post-training team at Covenant AI · comms efficiency and off-policiness.
ErfanMiahi
Current conditions.
Pressure: 11,000 psi · Last ping: April 18, 2026
Shipping a new trainer-to-trainer comms algorithm. (The weight-comms paper is already out, cited by Fireworks × Cursor for Composer 2.)
Nietzsche · Thus Spoke Zarathustra. Re-reading Mathematics for Machine Learning.
34 skydive jumps logged. Target: 200 by end of summer. Wind tunnels at iFLY in between.
A brief dispatch.
I'm a research engineer & scientist working on the hard parts of post-training: off-policy RL, verifiable rewards, and decentralized training infrastructure. Nine years in AI, five shipping production ML.
I did my M.Sc. at the University of Alberta's RLAI Lab under Martha White and Marlos C. Machado. Collaborated with Google DeepMind. Published across EMNLP, AIJ, MLJ, TMLR, and IEEE T-Cyb. Ten papers, 212 citations.
I'm a founding research engineer at Covenant AI, where we pre-trained Covenant-72B, a 72B LLM trained across trustless peers over the open internet. My weight-update-sparsity paper (arXiv 2602.03839, ~100× bandwidth efficiency) is cited by Fireworks in their globally-distributed training of Cursor's Composer 2. Before Covenant, founding ML research engineer at DeepR Analytics; interviewed at YC S24 (top 7%).
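The core idea behind communication-efficient weight updates can be illustrated with a generic top-k sparsification sketch: transmit only the largest-magnitude entries of a weight delta instead of the dense tensor. This is an illustrative toy, not the algorithm from the paper; the function names, the 1% density, and the dense-delta formulation are assumptions for the example.

```python
import numpy as np

def sparsify_update(prev_w: np.ndarray, new_w: np.ndarray, k_frac: float = 0.01):
    """Keep only the k_frac largest-magnitude entries of the weight delta.

    Returns flat indices and their delta values -- at k_frac=0.01 this is
    roughly 100x fewer numbers on the wire than the dense update.
    """
    delta = (new_w - prev_w).ravel()
    k = max(1, int(k_frac * delta.size))
    idx = np.argpartition(np.abs(delta), -k)[-k:]  # indices of the top-k deltas
    return idx, delta[idx]

def apply_sparse_update(w: np.ndarray, idx: np.ndarray, values: np.ndarray):
    """Apply a received sparse update to a local weight copy."""
    out = w.ravel().copy()
    out[idx] += values
    return out.reshape(w.shape)
```

The receiver's weights land exactly on the sender's values at the transmitted coordinates and keep their stale values elsewhere; real systems layer error feedback and compression on top of this basic scheme.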
When I surface from the work: 34 skydive jumps and climbing toward 200 by end of summer, wind tunnels at iFLY between weekends, parkour when a city asks for it, long-distance running when it doesn't. The body moves as much as the mind. I'm drawn to the art of movement and to anything that asks you to meet the edge honestly. The two thinkers I keep returning to are Nietzsche and Jung. I've mentored 10+ students since 2017, mostly on AI research and career.
- 212 · Citations
- 10 · Publications
- 34 · Skydives
- 1st · Meta Llama Toronto '24
- 72B · Covenant pre-train
- 10+ · Mentees since 2017
Publications & preprints.
EMNLP · AIJ · MLJ · TMLR · DeepMind collaboration · 212 citations across ten papers.
Understanding and Exploiting Weight Update Sparsity for Communication-Efficient Distributed RL
Erfan Miahi, Eugene Belilovsky
First author
arXiv preprint - 2026
How Reliable are Confidence Estimators for Large Reasoning Models?
Reza Khanmohammadi, Erfan Miahi, Sahar Kaur, Charese Smiley, Isabella Brugere
European Chapter of the ACL
EACL main - 2026
Covenant-72B: Pre-Training a 72B LLM with Trustless Peers Over-the-Internet
Joel Lidin, Amir Sarfi, Erfan Miahi, Quentin Anthony, Suraj Chauhan, Eugenios Pappas
Largest decentralized pre-train to date
arXiv preprint - 2025
Calibrating LLM Confidence by Probing Perturbed Representation Stability
Reza Khanmohammadi, Erfan Miahi, Mardikoraem, Kaur, Brugere
Empirical Methods in Natural Language Processing
EMNLP main - 2024
Investigating the Properties of Neural Network Representations in Reinforcement Learning
Han Wang, Erfan Miahi, Martha White, Marlos C. Machado, Zaheer Abbas, Raksha Kumaraswamy
DeepMind collaboration · Artificial Intelligence Journal
AIJ journal - 2024
GVFs in the Real World: Making Predictions Online for Water Treatment
Muhammad K. Janjua, Haseeb Shah, Martha White, Erfan Miahi, Marlos C. Machado, Adam White
Machine Learning Journal
MLJ journal - 2023
ResMax: An Alternative Soft-Greedy Operator for Reinforcement Learning
Erfan Miahi, Revan MacQueen, Alex Ayoub, Abbas Masoumzadeh, Martha White
Transactions on Machine Learning Research
TMLR journal - 2022
Genetic Neural Architecture Search for Automatic Assessment of Human Sperm Images
Erfan Miahi, S.A. Mirroshandel, A. Nasr
First author · NAS for medical imaging
Expert Systems with Applications journal - 2022
Scalable Transfer Evolutionary Optimization: Coping with Big Task Instances
Mojtaba Shakeri, Erfan Miahi, Abhishek Gupta, Yew-Soon Ong
NTU × A*STAR collaboration
IEEE T-Cyb journal - 2021
Effect of Deep Transfer and Multi-task Learning on Sperm Abnormality Detection
A. Abbasi, Erfan Miahi, S.A. Mirroshandel
Most-cited paper · Computers in Biology and Medicine
Comp. Bio. & Med. journal
Projects in the open.
Grail
Distributed, decentralized, incentivized post-training system for LLM reasoning. Covenant AI's open research stack.
Covenant-72B
Pre-trained a 72B LLM across trustless peers over the open internet. Largest decentralized pre-train to date.
q Evaluation Harness
First open-source evaluation framework for LLMs on q/kdb+. Published at KX Systems.
Draw Your Circuit
Meta Quest 3 circuit analyzer that turns freehand sketches into executable schematics. First place at the Meta Llama hackathon, Toronto 2024.
Deep-RL-CS285-Pytorch
PyTorch re-implementation of Berkeley CS285 deep RL assignments. Most-starred repo, used widely as a learning reference.
IntractCodeAPI
API for code completion and fine-tuning of open-source LLMs.
Dispatches from the deep.
Essays, technical notes, and things that needed to be said.
The Act of Creation
On moving from passive absorption to active curation, and finding the self through creation.
Introducing q Evaluation Harness
The first open-source evaluation framework for LLMs on q/kdb+. LLMs score 96.2% on Python HumanEval; even Grok 4 manages only 43.4% on the equivalent q problems.
Books at depth.
47 read · 20 currently reading · 176 on the shelf · 4.86 avg rating on Goodreads.
Accelerate
The Startup of You
Team Geek
The secret for harvesting from existence the greatest fruitfulness and the greatest enjoyment is: to live dangerously.
A small apprenticeship.
Since 2017 I've mentored 10+ students through monthly or bi-weekly meetings, mapping passion to path. One mentee went from a regional Iranian university to a PhD at Michigan State.
Email with 'Mentorship Program' in the subject line. Address is in the Contact section below.
- 01 · You want to do AI research or engineering, and want an honest sparring partner.
- 02 · You're willing to show up consistently, even when the work stops being exciting.
- 03 · You'd rather be told the truth than something comfortable.
When I surface.
Skydiving, wind tunnels, parkour, long-distance running. The body moves as much as the mind.