14 changes: 7 additions & 7 deletions README.md
@@ -28,22 +28,22 @@ build:
- "libglib2.0-0"
python_version: "3.13"
python_requirements: requirements.txt
predict: "predict.py:Predictor"
run: "run.py:Runner"
```

Define how predictions are run on your model with `predict.py`:
Define how predictions are run on your model with `run.py`:

```python
from cog import BasePredictor, Input, Path
from cog import BaseRunner, Input, Path
import torch

class Predictor(BasePredictor):
class Runner(BaseRunner):
def setup(self):
"""Load the model into memory to make running multiple predictions efficient"""
self.model = torch.load("./weights.pth")

# The arguments and types the model takes as input
def predict(self,
def run(self,
image: Path = Input(description="Grayscale input image")
) -> Path:
"""Run a single prediction on the model"""
@@ -57,7 +57,7 @@ In the above we accept a path to the image as an input, and return a path to our
Now, you can run predictions on this model:

```console
$ cog predict -i image=@input.jpg
$ cog run -i image=@input.jpg
--> Building Docker image...
--> Running Prediction...
--> Output written to output.jpg
@@ -180,7 +180,7 @@ See [CONTRIBUTING.md](CONTRIBUTING.md) for how to set up a development environme
- [Take a look at some examples of using Cog](https://github.com/replicate/cog-examples)
- [Deploy models with Cog](docs/deploy.md)
- [`cog.yaml` reference](docs/yaml.md) to learn how to define your model's environment
- [Prediction interface reference](docs/python.md) to learn how the `Predictor` interface works
- [Run interface reference](docs/python.md) to learn how the `Runner` interface works
- [Training interface reference](docs/training.md) to learn how to add a fine-tuning API to your model
- [HTTP API reference](docs/http.md) to learn how to use the HTTP API that models serve

28 changes: 14 additions & 14 deletions architecture/00-overview.md
@@ -33,15 +33,15 @@ flowchart LR

### Model Source

What the model author provides: `cog.yaml` for environment config, a Predictor class with `setup()` and `predict()` methods, and optionally model weights.
What the model author provides: `cog.yaml` for environment config, a Runner class with `setup()` and `run()` methods, and optionally model weights.

**Deep dive**: [Model Source](./01-model-source.md)

---

### Python SDK

The `cog` Python package that model authors import. Provides `BasePredictor`, the type system (`Input`, `Path`, `Secret`, `ConcatenateIterator`), and the thin server entry point that launches coglet. Installed inside every Cog container as a wheel.
The `cog` Python package that model authors import. Provides `BaseRunner`, the type system (`Input`, `Path`, `Secret`, `ConcatenateIterator`), and the thin server entry point that launches coglet. Installed inside every Cog container as a wheel.

**Deep dive**: [Model Source](./01-model-source.md) (covers the SDK's public API)
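
As a rough sketch of that public surface (import names as listed above; exact signatures live in `python/cog/__init__.py`):

```python
# Everything a model imports comes from the single `cog` package.
from cog import (
    BaseRunner,           # base class the model's Runner subclasses
    Input,                # per-parameter metadata and validation
    Path,                 # file inputs/outputs; URL inputs arrive as local paths
    Secret,               # sensitive values, masked in logs and webhooks
    ConcatenateIterator,  # streaming output for token-style generation
)
```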

@@ -93,7 +93,7 @@ The command-line tool for building, testing, and deploying models.
flowchart TB
subgraph source["Model Source"]
yaml["cog.yaml"]
code["predict.py"]
code["run.py"]
weights["weights"]
end

@@ -111,7 +111,7 @@
subgraph runtime["Runtime"]
server["HTTP Server<br/>(Rust/Axum)"]
worker["Worker Subprocess<br/>(Python)"]
predictor["Predictor"]
predictor["Runner"]
end

yaml --> config
@@ -130,16 +130,16 @@

## Terminology

| Term | Meaning |
| ------------- | ------------------------------------------------------------------------- |
| **SDK** | The `cog` Python package -- the framework users build models on |
| **Predictor** | User's model class with `setup()` and `predict()` methods |
| **Schema** | OpenAPI spec describing the model's input/output interface |
| **Envelope** | Fixed request/response structure wrapping model-specific data |
| **Worker** | Isolated subprocess running user code |
| **Setup** | One-time model initialization at container start |
| **Coglet** | Rust-based prediction server that runs inside containers |
| **Slot** | A concurrency unit -- one Unix socket connection to the worker subprocess |
| Term | Meaning |
| ------------ | ------------------------------------------------------------------------- |
| **SDK** | The `cog` Python package -- the framework users build models on |
| **Runner** | User's model class with `setup()` and `run()` methods |
| **Schema** | OpenAPI spec describing the model's input/output interface |
| **Envelope** | Fixed request/response structure wrapping model-specific data |
| **Worker** | Isolated subprocess running user code |
| **Setup** | One-time model initialization at container start |
| **Coglet** | Rust-based prediction server that runs inside containers |
| **Slot** | A concurrency unit -- one Unix socket connection to the worker subprocess |
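
To make the **Envelope** entry concrete, here is an illustrative sketch of how envelope fields wrap model-specific data. The field names are assumptions for illustration only, not taken from this diff; the authoritative shape is in the HTTP API reference (`docs/http.md`):

```python
# Illustrative only: fixed envelope fields wrap whatever the Runner's schema defines.
request = {
    "input": {"prompt": "a photo of a bus"},   # model-specific, from run()'s signature
}
response = {
    "status": "succeeded",                     # envelope fields, identical across models
    "output": "https://example.com/out.png",   # model-specific, from the return type
    "error": None,
    "logs": "...",
}
```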

## Reading Order

80 changes: 40 additions & 40 deletions architecture/01-model-source.md
@@ -9,7 +9,7 @@ A Cog model consists of:
```
my-model/
├── cog.yaml # Environment configuration
├── predict.py # Predictor class
├── run.py # Runner class
└── weights/ # Model weights (optional, can be downloaded)
```

@@ -29,37 +29,37 @@ build:
run:
- curl -o /src/model.bin https://example.com/model.bin

predict: "predict.py:Predictor"
run: "run.py:Runner"

concurrency:
max: 1
```

| Field | Purpose |
| ----------------------- | -------------------------------------------- |
| `build.python_version` | Python interpreter version (3.10-3.13) |
| `build.gpu` | Enable CUDA support |
| `build.python_packages` | pip packages to install |
| `build.system_packages` | apt packages to install |
| `build.run` | Arbitrary shell commands during build |
| `predict` | Path to predictor class (`module:ClassName`) |
| `concurrency.max` | Max concurrent predictions (requires async) |
| Field | Purpose |
| ----------------------- | ------------------------------------------- |
| `build.python_version` | Python interpreter version (3.10-3.13) |
| `build.gpu` | Enable CUDA support |
| `build.python_packages` | pip packages to install |
| `build.system_packages` | apt packages to install |
| `build.run` | Arbitrary shell commands during build |
| `run` | Path to runner class (`module:ClassName`) |
| `concurrency.max` | Max concurrent predictions (requires async) |

The [Build System](./05-build-system.md) uses this configuration to produce an image containing all necessary dependencies, libraries, and the correct Python/CUDA versions.

## The Predictor Class
## The Runner Class

A predictor is a Python class with two methods:
A runner is a Python class with two methods:

```python
from cog import BasePredictor, Input, Path
from cog import BaseRunner, Input, Path

class Predictor(BasePredictor):
class Runner(BaseRunner):
def setup(self):
"""Load model into memory. Called once at container start."""
self.model = load_model("./weights")

def predict(self, prompt: str, steps: int = 50) -> Path:
def run(self, prompt: str, steps: int = 50) -> Path:
"""Run inference. Called for each prediction request."""
output = self.model.generate(prompt, steps=steps)
output.save("/tmp/output.png")
@@ -74,7 +74,7 @@ class Predictor(BasePredictor):
- Optional: if omitted, Cog proceeds directly to serving
- See [Container Runtime: Predictor Lifecycle](./04-container-runtime.md#predictor-lifecycle) for details on instance lifetime, concurrency, crash recovery, and shutdown

### predict()
### run()

- Called **for each prediction request**
- Signature defines the model's input schema (via type hints)
@@ -84,12 +84,12 @@ class Predictor(BasePredictor):

## Input Types

The types used in `predict()` parameters become the model's input schema.
The types used in `run()` parameters become the model's input schema.

### Basic Types

```python
def predict(
def run(
self,
text: str, # String input
count: int, # Integer
@@ -105,7 +105,7 @@ URLs are automatically downloaded to local files:
```python
from cog import Path

def predict(self, image: Path) -> Path:
def run(self, image: Path) -> Path:
# Client sends: {"input": {"image": "https://example.com/photo.jpg"}}
# Cog downloads the URL, `image` is a local path like /tmp/inputabc123.jpg
img = PIL.Image.open(image)
@@ -125,7 +125,7 @@ For sensitive values that shouldn't appear in logs:
```python
from cog import Secret

def predict(self, api_key: Secret) -> str:
def run(self, api_key: Secret) -> str:
# Value is masked in logs and webhooks
client = SomeAPI(api_key.get_secret_value())
...
@@ -138,7 +138,7 @@ Use `Input()` to add metadata and validation:
```python
from cog import Input

def predict(
def run(
self,
prompt: str = Input(description="The text prompt"),
steps: int = Input(default=50, ge=1, le=100, description="Inference steps"),
@@ -159,7 +159,7 @@ def predict(
```python
from typing import Literal

def predict(
def run(
self,
size: Literal["small", "medium", "large"] = "medium",
) -> str:
@@ -171,7 +171,7 @@
from typing import List
from cog import Path

def predict(
def run(
self,
images: List[Path], # Multiple file inputs
tags: List[str], # Multiple strings
@@ -183,7 +183,7 @@
```python
from typing import Optional

def predict(
def run(
self,
seed: Optional[int] = None, # Can be omitted or null
) -> str:
@@ -196,7 +196,7 @@ The return type annotation defines what the model produces.
### Basic Types

```python
def predict(self, prompt: str) -> str:
def run(self, prompt: str) -> str:
return "Generated text..."
```

@@ -207,7 +207,7 @@ Return `cog.Path` pointing to a generated file:
```python
from cog import Path

def predict(self, prompt: str) -> Path:
def run(self, prompt: str) -> Path:
# Generate file
output_path = "/tmp/output.png"
self.model.generate(prompt).save(output_path)
@@ -224,7 +224,7 @@ Return a list:
from typing import List
from cog import Path

def predict(self, prompt: str) -> List[Path]:
def run(self, prompt: str) -> List[Path]:
paths = []
for i in range(4):
path = f"/tmp/output_{i}.png"
@@ -240,7 +240,7 @@ Yield values progressively:
```python
from typing import Iterator

def predict(self, prompt: str) -> Iterator[str]:
def run(self, prompt: str) -> Iterator[str]:
for token in self.model.generate_stream(prompt):
yield token
```
@@ -254,7 +254,7 @@ For LLM-style token streaming where outputs should be concatenated:
```python
from cog import ConcatenateIterator

def predict(self, prompt: str) -> ConcatenateIterator[str]:
def run(self, prompt: str) -> ConcatenateIterator[str]:
for token in self.model.generate(prompt):
yield token # "Hello", " ", "world", "!"
# Client sees progressive: "Hello" -> "Hello " -> "Hello world" -> "Hello world!"
@@ -273,7 +273,7 @@ Include weights in your source directory - they're copied into the image during
```
my-model/
├── cog.yaml
├── predict.py
├── run.py
└── weights/
└── model.safetensors
```
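
The alternative that the surrounding text weighs against bundling is fetching weights at runtime. A minimal sketch, assuming a hypothetical weights URL and the same `load_model` placeholder used in the examples above:

```python
import urllib.request

from cog import BaseRunner

WEIGHTS_URL = "https://example.com/model.safetensors"  # hypothetical location

class Runner(BaseRunner):
    def setup(self):
        # Download once at container start instead of baking weights into the image.
        urllib.request.urlretrieve(WEIGHTS_URL, "/src/model.safetensors")
        self.model = load_model("/src/model.safetensors")  # load_model: placeholder, as above
```
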
@@ -313,11 +313,11 @@ The choice depends on your deployment needs - bundled weights make images large
For concurrent predictions, use async:

```python
class Predictor(BasePredictor):
class Runner(BaseRunner):
async def setup(self):
self.model = await load_model_async()

async def predict(self, prompt: str) -> str:
async def run(self, prompt: str) -> str:
return await self.model.generate(prompt)
```

@@ -330,10 +330,10 @@ See [Container Runtime](./04-container-runtime.md) for concurrency details.

## Code References

| File | Purpose |
| ------------------------- | --------------------------------------------------------- |
| `python/cog/__init__.py` | Public API exports |
| `python/cog/predictor.py` | BasePredictor class, type introspection, weights handling |
| `python/cog/types.py` | Path, Secret, ConcatenateIterator |
| `python/cog/input.py` | `Input()` function and field metadata |
| `pkg/config/config.go` | cog.yaml parsing |
| File | Purpose |
| ------------------------- | ------------------------------------------------------ |
| `python/cog/__init__.py` | Public API exports |
| `python/cog/predictor.py` | BaseRunner class, type introspection, weights handling |
| `python/cog/types.py` | Path, Secret, ConcatenateIterator |
| `python/cog/input.py` | `Input()` function and field metadata |
| `pkg/config/config.go` | cog.yaml parsing |