Leading the RL post-training team at Covenant AI · comms efficiency and off-policiness.
ErfanMiahi
Current conditions.
Pressure: 11,000 psi · Last ping: April 18, 2026
Shipping a new trainer-to-trainer comms algorithm. (The weight-comms paper is already out, cited by Fireworks × Cursor for Composer 2.)
Nietzsche · Thus Spoke Zarathustra. Re-reading Mathematics for Machine Learning.
34 skydive jumps logged. Target: 200 by end of summer. Wind tunnels at iFLY in between.
A brief dispatch.
I'm a research engineer & scientist working on the hard parts of post-training: off-policy RL, verifiable rewards, and decentralized training infrastructure. Nine years in AI, five shipping production ML.
I did my M.Sc. at the University of Alberta's RLAI Lab under Martha White and Marlos C. Machado. Collaborated with Google DeepMind. Published across EMNLP, AIJ, MLJ, TMLR, and IEEE T-Cyb. Ten papers, 212 citations.
I'm a founding research engineer at Covenant AI, where we pre-trained Covenant-72B, a 72B LLM trained across trustless peers over the open internet. My weight-update-sparsity paper (arXiv 2602.03839, ~100× bandwidth efficiency) is cited by Fireworks in their globally-distributed training of Cursor's Composer 2. Before Covenant, founding ML research engineer at DeepR Analytics; interviewed at YC S24 (top 7%).
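The core idea behind communication-efficient weight updates can be illustrated with a generic top-k sparsification sketch: transmit only the largest-magnitude entries of a weight delta instead of the dense tensor. This is an illustrative toy, not the algorithm from the paper; the function names, the 1% density, and the dense-delta formulation are assumptions for the example.

```python
import numpy as np

def sparsify_update(prev_w: np.ndarray, new_w: np.ndarray, k_frac: float = 0.01):
    """Keep only the k_frac largest-magnitude entries of the weight delta.

    Returns flat indices and their delta values -- at k_frac=0.01 this is
    roughly 100x fewer numbers on the wire than the dense update.
    """
    delta = (new_w - prev_w).ravel()
    k = max(1, int(k_frac * delta.size))
    idx = np.argpartition(np.abs(delta), -k)[-k:]  # indices of the top-k deltas
    return idx, delta[idx]

def apply_sparse_update(w: np.ndarray, idx: np.ndarray, values: np.ndarray):
    """Apply a received sparse update to a local weight copy."""
    out = w.ravel().copy()
    out[idx] += values
    return out.reshape(w.shape)
```

The receiver's weights land exactly on the sender's values at the transmitted coordinates and keep their stale values elsewhere; real systems layer error feedback and compression on top of this basic scheme.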
When I surface from the work: 34 skydive jumps and climbing toward 200 by end of summer, wind tunnels at iFLY between weekends, parkour when a city asks for it, long-distance running when it doesn't. The body moves as much as the mind. I'm drawn to the art of movement and to anything that asks you to meet the edge honestly. The two thinkers I keep returning to are Nietzsche and Jung. I've mentored 10+ students since 2017, mostly on AI research and career.
- 212 · Citations
- 10 · Publications
- 34 · Skydives
- 1st · Meta Llama Toronto '24
- 72B · Covenant pre-train
- 10+ · Mentees since 2017
Publications & preprints.
EMNLP · AIJ · MLJ · TMLR · DeepMind collaboration · 212 citations across ten papers.
Understanding and Exploiting Weight Update Sparsity for Communication-Efficient Distributed RL
Erfan Miahi, Eugene Belilovsky
First author
arXiv preprint - 2026
How Reliable are Confidence Estimators for Large Reasoning Models?
Reza Khanmohammadi, Erfan Miahi, Sahar Kaur, Charese Smiley, Isabella Brugere
European Chapter of the ACL
EACL main - 2026
Covenant-72B: Pre-Training a 72B LLM with Trustless Peers Over-the-Internet
Joel Lidin, Amir Sarfi, Erfan Miahi, Quentin Anthony, Suraj Chauhan, Eugenios Pappas
Largest decentralized pre-train to date
arXiv preprint - 2025
Calibrating LLM Confidence by Probing Perturbed Representation Stability
Reza Khanmohammadi, Erfan Miahi, Mardikoraem, Kaur, Brugere
Empirical Methods in Natural Language Processing
EMNLP main - 2024
Investigating the Properties of Neural Network Representations in Reinforcement Learning
Han Wang, Erfan Miahi, Martha White, Marlos C. Machado, Zaheer Abbas, Raksha Kumaraswamy
DeepMind collaboration · Artificial Intelligence Journal
AIJ journal - 2024
GVFs in the Real World: Making Predictions Online for Water Treatment
Muhammad K. Janjua, Haseeb Shah, Martha White, Erfan Miahi, Marlos C. Machado, Adam White
Machine Learning Journal
MLJ journal - 2023
ResMax: An Alternative Soft-Greedy Operator for Reinforcement Learning
Erfan Miahi, Revan MacQueen, Alex Ayoub, Abbas Masoumzadeh, Martha White
Transactions on Machine Learning Research
TMLR journal - 2022
Genetic Neural Architecture Search for Automatic Assessment of Human Sperm Images
Erfan Miahi, S.A. Mirroshandel, A. Nasr
First author · NAS for medical imaging
Expert Systems with Applications journal - 2022
Scalable Transfer Evolutionary Optimization: Coping with Big Task Instances
Mojtaba Shakeri, Erfan Miahi, Abhishek Gupta, Yew-Soon Ong
NTU × A*STAR collaboration
IEEE T-Cyb journal - 2021
Effect of Deep Transfer and Multi-task Learning on Sperm Abnormality Detection
A. Abbasi, Erfan Miahi, S.A. Mirroshandel
Most-cited paper · Computers in Biology and Medicine
Comp. Bio. & Med. journal
Projects in the open.
Grail
Distributed, decentralized, incentivized post-training system for LLM reasoning. Covenant AI's open research stack.
Covenant-72B
Pre-trained a 72B LLM across trustless peers over the open internet. Largest decentralized pre-train to date.
q Evaluation Harness
First open-source evaluation framework for LLMs on q/kdb+. Published at KX Systems.
Draw Your Circuit
Meta Quest 3 circuit analyzer that turns freehand sketches into executable schematics. First place at the Meta Llama hackathon, Toronto 2024.
Deep-RL-CS285-Pytorch
PyTorch re-implementation of Berkeley CS285 deep RL assignments. Most-starred repo, used widely as a learning reference.
IntractCodeAPI
API for code completion and fine-tuning of open-source LLMs.
Dispatches from the deep.
Essays, technical notes, and things that needed to be said.
The Act of Creation
On moving from passive absorption to active curation, and finding the self through creation.
Introducing q Evaluation Harness
The first open-source evaluation framework for LLMs on q/kdb+. LLMs score 96.2% on Python HumanEval; even Grok 4 manages only 43.4% on the equivalent q problems.
Books at depth.
47 read · 20 currently reading · 176 on the shelf · 4.86 avg rating on Goodreads.
Accelerate
The Startup of You
Team Geek
The secret for harvesting from existence the greatest fruitfulness and the greatest enjoyment is: to live dangerously.
A small apprenticeship.
Since 2017 I've mentored 10+ students through monthly or bi-weekly meetings, mapping passion to path. One mentee went from a regional Iranian university to a PhD at Michigan State.
Email with 'Mentorship Program' in the subject line. Address is in the Contact section below.
- 01 · You want to do AI research or engineering, and want an honest sparring partner.
- 02 · You're willing to show up consistently, even when the work stops being exciting.
- 03 · You'd rather be told the truth than something comfortable.
When I surface.
Skydiving, wind tunnels, parkour, long-distance running. The body moves as much as the mind.