
Add torchvision as a dependency so that vision doesn't fall back to torchvision's slow processor#1839

Open
RWL-Dittrich wants to merge 4 commits into exo-explore:main from RWL-Dittrich:fix/torchvision

Conversation

@RWL-Dittrich

Motivation

Without the torchvision dependency, any vision prompt shows this warning in the logs: Using use_fast=True but torchvision is not available. Falling back to the slow image processor.
To fix this, I added torchvision as a dependency and made sure the vision pipeline is compatible with it.

The fast image processor (torchvision-based) returns PyTorch tensors, not NumPy arrays. Using return_tensors="np" fails when the fast processor is active because it relies on torchvision transforms that produce torch.Tensor outputs. This fixes image processing in VisionEncoder when using the fast image processor path.

Changes

  • Added torchvision as a dependency in pyproject.toml and python/parts.nix (with ignoreMissing for Nix compatibility)
  • Changed VisionEncoder to request PyTorch tensors (return_tensors="pt") instead of NumPy ("np") from the image processor
  • Converted the PyTorch tensors to NumPy via .numpy() before passing them to mx.array()

Why It Works

The fast image processor internally uses torchvision transforms, which produce torch.Tensor objects. Requesting "np" tensors caused a failure because the processor couldn't convert its internal torch tensors to NumPy in the expected way. By requesting "pt" (PyTorch) tensors and explicitly calling .numpy() before constructing MLX arrays, we align with what the fast processor actually produces while maintaining the same data flow into MLX.
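The conversion described above can be sketched as follows. This is a minimal illustration, not the actual VisionEncoder code from the PR; the helper name pixel_values_to_numpy is hypothetical, and it assumes the processor returns a dict-like batch containing a "pixel_values" tensor.

```python
# Sketch of the fix: request "pt" tensors from the fast processor and
# convert to NumPy before constructing an MLX array.
# pixel_values_to_numpy is a hypothetical helper, not from the PR.
import numpy as np
import torch


def pixel_values_to_numpy(batch):
    """Convert a fast-processor torch.Tensor output to a NumPy array."""
    pixel_values = batch["pixel_values"]
    if isinstance(pixel_values, torch.Tensor):
        # .numpy() only works on detached CPU tensors
        pixel_values = pixel_values.detach().cpu().numpy()
    return pixel_values


# Usage inside the encoder would look roughly like:
#   batch = processor(images=images, return_tensors="pt")
#   pixels = mx.array(pixel_values_to_numpy(batch))
```

Requesting "pt" and converting explicitly avoids relying on the processor's own NumPy conversion, which is the path that fails on the fast (torchvision-based) processor.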

Test Plan

Manual Testing

Hardware: Two M4 Pro Mac minis (64 GB each), connected via Thunderbolt 4
What I did:
Launched Qwen3.5-35B and attached a PDF to the prompt. No more warnings about the slow image processor show up in the console.

Automated Testing

@rltakashige
Collaborator

Hi there! Thanks for this - we intentionally left torchvision out of the multimodality PR, as it pulls in a bunch of CUDA dependencies on Linux. How to handle these dependencies is currently being discussed, with branches such as https://github.com/exo-explore/exo/tree/vllm-nix .

Out of curiosity, how much faster is the fast image processor? The "slow" one already seems quite fast.

@RWL-Dittrich
Author

I did some benchmarking, and it seems the "fast" processor is actually a tiny bit slower (based on how long the TTFT actually is). Honestly, though, it's probably within the margin of error. I attached the sample images zip here so you can test for yourself. sample images.zip
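For reference, TTFT in a benchmark like this can be measured with a small helper along these lines (a sketch, not the actual script used; in practice `stream` would be the model's streaming response iterator):

```python
import time


def time_to_first_token(stream):
    """Return seconds elapsed until the first item arrives from a token
    stream, or None if the stream yields nothing."""
    start = time.perf_counter()
    for _ in stream:
        # The first yielded chunk marks time-to-first-token.
        return time.perf_counter() - start
    return None
```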

I used mlx-community/Qwen3.5-35B-A3B-8bit spread over two M4 Pro Mac minis with 64 GB of RAM each.

One weird and unexpected observation: Qwen seemed to struggle to recognize the bird in image 2 when the slow processor was used, while in all four benchmark runs with the fast processor it recognized the bird species. So maybe the slow and fast processors encode the images differently, somehow giving the model more context with the "fast" one.

Below is the result of the tests I did.
[screenshot: benchmark results]
