AutoResearchStudio

Write a program.md that describes your research goal and point an AI agent at it. It modifies your code, trains, evaluates, keeps or discards — and repeats, while you sleep.

Get started
terminal
# Install and init your project
$ pip install autoresearchstudio
$ ars init
Created autoresearch.yaml
Created program.md

# Generate a dashboard API key
$ ars key
โœ“ API key saved to autoresearch.yaml

# Launch Claude Code โ€” it takes over from here
$ claude
โ•ญโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ•ฎ
โ”‚  Claude Code                         โ”‚
โ•ฐโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ•ฏ

> Hi Claude, read program.md and start the experiment.

Claude: Setting up run...
  ars setup --tag v1
  ars run -d "baseline"
  accuracy: 95.40
  ars judge
  KEEP โ€” baseline accepted

Claude: Trying dropout + cosine lr schedule...
  ars run -d "add dropout + cosine lr"
  accuracy: 97.20
  ars judge
  KEEP โ€” 95.40 โ†’ 97.20

Claude: Testing gelu activation...
  ars run -d "replace relu with gelu"
  accuracy: 96.80
  ars judge
  DISCARD โ€” regressed, reverted

Claude: Continuing to next experiment...

The loop

The agent repeats this cycle autonomously until your model converges.

Claude
claude
ModifyAI edits code
TrainRuns experiment
JudgeLLM evaluates
Keepor discard

Three files, full autonomy

You write program.md and autoresearch.yaml. The agent edits train.py.

๐Ÿ“ program.md
You are an ML researcher optimizing a
neural network for image classification.

Goal: maximize accuracy on the test set.

Strategy:
- Try architectural changes first
- Then tune hyperparameters
- Prefer simpler solutions
- Keep changes small and focused
The instructions for the AI agent. This is the file you iterate on.
โš™๏ธ autoresearch.yaml
project:
  name: mnist-research
  goal: Maximize accuracy

files:
  editable: [train.py]
  readonly: [prepare.py]

experiment:
  run_command: python train.py
  timeout: 120

metric:
  name: accuracy
  pattern: "Accuracy: ([0-9.]+)"
  direction: maximize
Config: training command, metric, time budget. One file, no boilerplate.
๐Ÿงช train.py
import torch
import torch.nn as nn

class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(784, 128)
        self.fc2 = nn.Linear(128, 10)

    def forward(self, x):
        x = torch.relu(self.fc1(x))
        return self.fc2(x)

# training loop ...
print(f"Accuracy: {acc:.2f}")
The agent modifies this file. Every change is a git commit โ€” keeps are merged, discards are reverted.
๐Ÿ˜ด๐Ÿ’ค

Now go to sleep.
Claude will take it from here.