Nixie is an efficient Linux service for transparent GPU multiplexing, so multiple workloads can share a GPU without worrying about insufficient VRAM or DRAM capacity.
Our highlighted features include:
- Optimized for modern large AI models.
- Transparent GPU multiplexing, supporting popular applications like llama.cpp, SGLang, ComfyUI and more out of the box.
- Low task-switching latency.
- Configurable maximum memory size depending on user needs.
Prerequisites:
- Rust (>=1.90 stable)
Build the project with:
git clone https://github.com/XOR-op/nixie
cd nixie
cargo build --release
First, start the Nixie daemon:
nixie daemon
To configure how much memory the daemon may use, run it with:
nixie daemon --shmem <pinned-memory-size> --hostmem <paged-memory-size>
# For example, to use 16GB of pinned memory and 32GB of paged memory:
nixie daemon --shmem 16g --hostmem 32g
Then, launch applications with Nixie:
nixie run <app-name> <app-args>
To specify which GPU to use (GPU 0 in this example):
nixie run -d 0 <app-name> <app-args>
See CLI Reference for more details on the available commands and options.
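The daemon and launch steps can be combined in a small wrapper script. The following is a minimal sketch, not part of Nixie itself: `DRY_RUN`, `run`, `start_daemon`, and `launch_on_gpu` are hypothetical names introduced here. It also assumes that `nixie daemon` stays in the foreground (so it is backgrounded) and that a short sleep is enough for it to come up; neither is confirmed by this README. Only the `--shmem`, `--hostmem`, and `-d` options shown above are used.

```shell
#!/bin/sh
# Sketch only: helper functions around the documented nixie commands.
# DRY_RUN is a variable of this script, not a Nixie option.
DRY_RUN="${DRY_RUN:-1}"   # default: only print the commands

# Print a command, and execute it only when DRY_RUN=0.
run() {
    echo "+ $*"
    if [ "$DRY_RUN" = "0" ]; then
        "$@"
    fi
}

# start_daemon <pinned-memory-size> <paged-memory-size>
start_daemon() {
    echo "+ nixie daemon --shmem $1 --hostmem $2"
    if [ "$DRY_RUN" = "0" ]; then
        # Assumption: the daemon runs in the foreground, so background it.
        nixie daemon --shmem "$1" --hostmem "$2" &
        sleep 2   # assumed startup delay; adjust as needed
    fi
}

# launch_on_gpu <gpu-index> <app-name> [app-args...]
launch_on_gpu() {
    gpu="$1"
    shift
    run nixie run -d "$gpu" "$@"
}
```

After sourcing the script, `start_daemon 16g 32g` followed by `launch_on_gpu 0 <app-name> <app-args>` prints the commands it would run; setting `DRY_RUN=0` executes them instead.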