14 changes: 7 additions & 7 deletions README.md
@@ -28,22 +28,22 @@ build:
- "libglib2.0-0"
python_version: "3.13"
python_requirements: requirements.txt
predict: "predict.py:Predictor"
run: "run.py:Runner"
```

Define how predictions are run on your model with `predict.py`:
Define how predictions are run on your model with `run.py`:

```python
from cog import BasePredictor, Input, Path
from cog import BaseRunner, Input, Path
import torch

class Predictor(BasePredictor):
class Runner(BaseRunner):
def setup(self):
"""Load the model into memory to make running multiple predictions efficient"""
self.model = torch.load("./weights.pth")

# The arguments and types the model takes as input
def predict(self,
def run(self,
image: Path = Input(description="Grayscale input image")
) -> Path:
"""Run a single prediction on the model"""
@@ -57,7 +57,7 @@ In the above we accept a path to the image as an input, and return a path to our
Now, you can run predictions on this model:

```console
$ cog predict -i image=@input.jpg
$ cog run -i image=@input.jpg
--> Building Docker image...
--> Running Prediction...
--> Output written to output.jpg
@@ -180,7 +180,7 @@ See [CONTRIBUTING.md](CONTRIBUTING.md) for how to set up a development environme
- [Take a look at some examples of using Cog](https://github.com/replicate/cog-examples)
- [Deploy models with Cog](docs/deploy.md)
- [`cog.yaml` reference](docs/yaml.md) to learn how to define your model's environment
- [Prediction interface reference](docs/python.md) to learn how the `Predictor` interface works
- [Run interface reference](docs/python.md) to learn how the `Runner` interface works
- [Training interface reference](docs/training.md) to learn how to add a fine-tuning API to your model
- [HTTP API reference](docs/http.md) to learn how to use the HTTP API that models serve

28 changes: 14 additions & 14 deletions architecture/00-overview.md
@@ -33,15 +33,15 @@ flowchart LR

### Model Source

What the model author provides: `cog.yaml` for environment config, a Predictor class with `setup()` and `predict()` methods, and optionally model weights.
What the model author provides: `cog.yaml` for environment config, a Runner class with `setup()` and `run()` methods, and optionally model weights.

**Deep dive**: [Model Source](./01-model-source.md)

---

### Python SDK

The `cog` Python package that model authors import. Provides `BasePredictor`, the type system (`Input`, `Path`, `Secret`, `ConcatenateIterator`), and the thin server entry point that launches coglet. Installed inside every Cog container as a wheel.
The `cog` Python package that model authors import. Provides `BaseRunner`, the type system (`Input`, `Path`, `Secret`, `ConcatenateIterator`), and the thin server entry point that launches coglet. Installed inside every Cog container as a wheel.

**Deep dive**: [Model Source](./01-model-source.md) (covers the SDK's public API)
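
As a rough sketch of that public surface (import names as listed above; exact signatures live in `python/cog/__init__.py`):

```python
# Everything a model imports comes from the single `cog` package.
from cog import (
    BaseRunner,           # base class the model's Runner subclasses
    Input,                # per-parameter metadata and validation
    Path,                 # file inputs/outputs; URL inputs arrive as local paths
    Secret,               # sensitive values, masked in logs and webhooks
    ConcatenateIterator,  # streaming output for token-style generation
)
```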

@@ -93,7 +93,7 @@ The command-line tool for building, testing, and deploying models.
flowchart TB
subgraph source["Model Source"]
yaml["cog.yaml"]
code["predict.py"]
code["run.py"]
weights["weights"]
end

@@ -111,7 +111,7 @@
subgraph runtime["Runtime"]
server["HTTP Server<br/>(Rust/Axum)"]
worker["Worker Subprocess<br/>(Python)"]
predictor["Predictor"]
predictor["Runner"]
end

yaml --> config
@@ -130,16 +130,16 @@

## Terminology

| Term | Meaning |
| ------------- | ------------------------------------------------------------------------- |
| **SDK** | The `cog` Python package -- the framework users build models on |
| **Predictor** | User's model class with `setup()` and `predict()` methods |
| **Schema** | OpenAPI spec describing the model's input/output interface |
| **Envelope** | Fixed request/response structure wrapping model-specific data |
| **Worker** | Isolated subprocess running user code |
| **Setup** | One-time model initialization at container start |
| **Coglet** | Rust-based prediction server that runs inside containers |
| **Slot** | A concurrency unit -- one Unix socket connection to the worker subprocess |
| Term | Meaning |
| ------------ | ------------------------------------------------------------------------- |
| **SDK** | The `cog` Python package -- the framework users build models on |
| **Runner** | User's model class with `setup()` and `run()` methods |
| **Schema** | OpenAPI spec describing the model's input/output interface |
| **Envelope** | Fixed request/response structure wrapping model-specific data |
| **Worker** | Isolated subprocess running user code |
| **Setup** | One-time model initialization at container start |
| **Coglet** | Rust-based prediction server that runs inside containers |
| **Slot** | A concurrency unit -- one Unix socket connection to the worker subprocess |
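
To make the **Envelope** entry concrete, here is an illustrative sketch of how envelope fields wrap model-specific data. The field names are assumptions for illustration only, not taken from this diff; the authoritative shape is in the HTTP API reference (`docs/http.md`):

```python
# Illustrative only: fixed envelope fields wrap whatever the Runner's schema defines.
request = {
    "input": {"prompt": "a photo of a bus"},   # model-specific, from run()'s signature
}
response = {
    "status": "succeeded",                     # envelope fields, identical across models
    "output": "https://example.com/out.png",   # model-specific, from the return type
    "error": None,
    "logs": "...",
}
```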

## Reading Order

80 changes: 40 additions & 40 deletions architecture/01-model-source.md
@@ -9,7 +9,7 @@ A Cog model consists of:
```
my-model/
├── cog.yaml # Environment configuration
├── predict.py # Predictor class
├── run.py # Runner class
└── weights/ # Model weights (optional, can be downloaded)
```

@@ -29,37 +29,37 @@ build:
run:
- curl -o /src/model.bin https://example.com/model.bin

predict: "predict.py:Predictor"
run: "run.py:Runner"

concurrency:
max: 1
```

| Field | Purpose |
| ----------------------- | -------------------------------------------- |
| `build.python_version` | Python interpreter version (3.10-3.13) |
| `build.gpu` | Enable CUDA support |
| `build.python_packages` | pip packages to install |
| `build.system_packages` | apt packages to install |
| `build.run` | Arbitrary shell commands during build |
| `predict` | Path to predictor class (`module:ClassName`) |
| `concurrency.max` | Max concurrent predictions (requires async) |
| Field | Purpose |
| ----------------------- | ------------------------------------------- |
| `build.python_version` | Python interpreter version (3.10-3.13) |
| `build.gpu` | Enable CUDA support |
| `build.python_packages` | pip packages to install |
| `build.system_packages` | apt packages to install |
| `build.run` | Arbitrary shell commands during build |
| `run` | Path to runner class (`module:ClassName`) |
| `concurrency.max` | Max concurrent predictions (requires async) |

The [Build System](./05-build-system.md) uses this configuration to produce an image containing all necessary dependencies, libraries, and the correct Python/CUDA versions.

## The Predictor Class
## The Runner Class

A predictor is a Python class with two methods:
A runner is a Python class with two methods:

```python
from cog import BasePredictor, Input, Path
from cog import BaseRunner, Input, Path

class Predictor(BasePredictor):
class Runner(BaseRunner):
def setup(self):
"""Load model into memory. Called once at container start."""
self.model = load_model("./weights")

def predict(self, prompt: str, steps: int = 50) -> Path:
def run(self, prompt: str, steps: int = 50) -> Path:
"""Run inference. Called for each prediction request."""
output = self.model.generate(prompt, steps=steps)
output.save("/tmp/output.png")
@@ -74,7 +74,7 @@ class Predictor(BasePredictor):
- Optional: if omitted, Cog proceeds directly to serving
- See [Container Runtime: Predictor Lifecycle](./04-container-runtime.md#predictor-lifecycle) for details on instance lifetime, concurrency, crash recovery, and shutdown

### predict()
### run()

- Called **for each prediction request**
- Signature defines the model's input schema (via type hints)
@@ -84,12 +84,12 @@ class Predictor(BasePredictor):

## Input Types

The types used in `predict()` parameters become the model's input schema.
The types used in `run()` parameters become the model's input schema.

### Basic Types

```python
def predict(
def run(
self,
text: str, # String input
count: int, # Integer
@@ -105,7 +105,7 @@ URLs are automatically downloaded to local files:
```python
from cog import Path

def predict(self, image: Path) -> Path:
def run(self, image: Path) -> Path:
# Client sends: {"input": {"image": "https://example.com/photo.jpg"}}
# Cog downloads the URL, `image` is a local path like /tmp/inputabc123.jpg
img = PIL.Image.open(image)
@@ -125,7 +125,7 @@ For sensitive values that shouldn't appear in logs:
```python
from cog import Secret

def predict(self, api_key: Secret) -> str:
def run(self, api_key: Secret) -> str:
# Value is masked in logs and webhooks
client = SomeAPI(api_key.get_secret_value())
...
@@ -138,7 +138,7 @@ Use `Input()` to add metadata and validation:
```python
from cog import Input

def predict(
def run(
self,
prompt: str = Input(description="The text prompt"),
steps: int = Input(default=50, ge=1, le=100, description="Inference steps"),
@@ -159,7 +159,7 @@ def predict(
```python
from typing import Literal

def predict(
def run(
self,
size: Literal["small", "medium", "large"] = "medium",
) -> str:
@@ -171,7 +171,7 @@
from typing import List
from cog import Path

def predict(
def run(
self,
images: List[Path], # Multiple file inputs
tags: List[str], # Multiple strings
@@ -183,7 +183,7 @@
```python
from typing import Optional

def predict(
def run(
self,
seed: Optional[int] = None, # Can be omitted or null
) -> str:
@@ -196,7 +196,7 @@ The return type annotation defines what the model produces.
### Basic Types

```python
def predict(self, prompt: str) -> str:
def run(self, prompt: str) -> str:
return "Generated text..."
```

@@ -207,7 +207,7 @@ Return `cog.Path` pointing to a generated file:
```python
from cog import Path

def predict(self, prompt: str) -> Path:
def run(self, prompt: str) -> Path:
# Generate file
output_path = "/tmp/output.png"
self.model.generate(prompt).save(output_path)
@@ -224,7 +224,7 @@ Return a list:
from typing import List
from cog import Path

def predict(self, prompt: str) -> List[Path]:
def run(self, prompt: str) -> List[Path]:
paths = []
for i in range(4):
path = f"/tmp/output_{i}.png"
@@ -240,7 +240,7 @@ Yield values progressively:
```python
from typing import Iterator

def predict(self, prompt: str) -> Iterator[str]:
def run(self, prompt: str) -> Iterator[str]:
for token in self.model.generate_stream(prompt):
yield token
```
@@ -254,7 +254,7 @@ For LLM-style token streaming where outputs should be concatenated:
```python
from cog import ConcatenateIterator

def predict(self, prompt: str) -> ConcatenateIterator[str]:
def run(self, prompt: str) -> ConcatenateIterator[str]:
for token in self.model.generate(prompt):
yield token # "Hello", " ", "world", "!"
# Client sees progressive: "Hello" -> "Hello " -> "Hello world" -> "Hello world!"
@@ -273,7 +273,7 @@ Include weights in your source directory - they're copied into the image during
```
my-model/
├── cog.yaml
├── predict.py
├── run.py
└── weights/
└── model.safetensors
```
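
The alternative that the surrounding text weighs against bundling is fetching weights at runtime. A minimal sketch, assuming a hypothetical weights URL and the same `load_model` placeholder used in the examples above:

```python
import urllib.request

from cog import BaseRunner

WEIGHTS_URL = "https://example.com/model.safetensors"  # hypothetical location

class Runner(BaseRunner):
    def setup(self):
        # Download once at container start instead of baking weights into the image.
        urllib.request.urlretrieve(WEIGHTS_URL, "/src/model.safetensors")
        self.model = load_model("/src/model.safetensors")  # load_model: placeholder, as above
```
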
@@ -313,11 +313,11 @@ The choice depends on your deployment needs - bundled weights make images large
For concurrent predictions, use async:

```python
class Predictor(BasePredictor):
class Runner(BaseRunner):
async def setup(self):
self.model = await load_model_async()

async def predict(self, prompt: str) -> str:
async def run(self, prompt: str) -> str:
return await self.model.generate(prompt)
```

@@ -330,10 +330,10 @@ See [Container Runtime](./04-container-runtime.md) for concurrency details.

## Code References

| File | Purpose |
| ------------------------- | --------------------------------------------------------- |
| `python/cog/__init__.py` | Public API exports |
| `python/cog/predictor.py` | BasePredictor class, type introspection, weights handling |
| `python/cog/types.py` | Path, Secret, ConcatenateIterator |
| `python/cog/input.py` | `Input()` function and field metadata |
| `pkg/config/config.go` | cog.yaml parsing |
| File | Purpose |
| ------------------------- | ------------------------------------------------------ |
| `python/cog/__init__.py` | Public API exports |
| `python/cog/predictor.py` | BaseRunner class, type introspection, weights handling |
| `python/cog/types.py` | Path, Secret, ConcatenateIterator |
| `python/cog/input.py` | `Input()` function and field metadata |
| `pkg/config/config.go` | cog.yaml parsing |