Shikhar-S/OpenBEATs

OpenBEATs

OpenBEATs is a general-purpose audio encoder pre-trained on speech, music, environmental sound, and bioacoustics. This package runs it on audio and returns patch-level embeddings, plus class probabilities when a fine-tuned checkpoint is used.

Install

pip install openbeats

This adds two commands, openbeats-infer and openbeats-download. The dependencies are kept light (torch, torchaudio, numpy, huggingface-hub, pyyaml, soundfile), and torch is pinned loosely so an existing build is not replaced. To avoid touching an existing environment, install the package in its own isolated environment with uv or pipx:

uv tool install openbeats     # or: pipx install openbeats

Usage

From the command line

Handy for a quick look:

openbeats-infer --checkpoint espnet/OpenBEATS-Large-i2-as20k \
    --audio audio.wav --out embeddings.npz

--checkpoint takes a Hugging Face repo id (downloaded automatically), a local directory, or a checkpoint file. The .npz holds patch_embeddings (num_patches, 1024), plus logits and probs when the checkpoint has a classifier. Other options: --device cuda, --max-layer N, and --chunk-seconds 10 for long recordings.
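
To inspect the saved file afterwards, something like the following may help. It is a minimal sketch that assumes only the array names mentioned above (patch_embeddings, and probs when a classifier is present); the helper names summarize_npz and load_outputs are hypothetical and not part of the openbeats package.

```python
import numpy as np


def summarize_npz(path):
    """Return {array_name: shape} for every array stored in an .npz file."""
    with np.load(path) as data:
        return {name: data[name].shape for name in data.files}


def load_outputs(path):
    """Load patch embeddings, plus probs when the file contains them.

    Hypothetical helper: the key names follow the README text above.
    """
    with np.load(path) as data:
        embeddings = data["patch_embeddings"]
        # Checkpoints without a classifier are described as having no probs.
        probs = data["probs"] if "probs" in data.files else None
    return embeddings, probs
```

Calling summarize_npz("embeddings.npz") on the file written by the command above should list patch_embeddings with a second dimension of 1024 for the Large checkpoint.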

From Python

from openbeats.model import OpenBeats
from openbeats.utils import load_audio

# load model
model = OpenBeats.from_pretrained("espnet/OpenBEATS-Large-i2-as20k", device="cuda")

# from a file with any sample rate
out = model.encode_file("audio.wav")              # pass chunk_seconds=10 for long audio

# or load the waveform as a 16 kHz mono array with values in [-1, 1]
wav, sr = load_audio("audio.wav")
# and pass it
out = model.encode(wav, sr)

print(out["patch_embeddings"].shape)               # (num_patches, 1024)
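
Patch-level embeddings are often reduced to a single clip-level vector before comparing recordings. The sketch below shows one common way to do that (mean pooling followed by cosine similarity); it is an illustration, not something the package is documented to provide. It assumes the embeddings are a NumPy array of shape (num_patches, dim), so a torch tensor would need converting first (for example with .cpu().numpy()). The names mean_pool and cosine_similarity are hypothetical, and random data stands in for out["patch_embeddings"].

```python
import numpy as np


def mean_pool(patch_embeddings):
    """Average over the patch axis: (num_patches, dim) -> (dim,)."""
    patch_embeddings = np.asarray(patch_embeddings, dtype=np.float32)
    return patch_embeddings.mean(axis=0)


def cosine_similarity(a, b):
    """Cosine similarity of two 1-D vectors; 0.0 if either has zero norm."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(np.dot(a, b) / denom) if denom > 0 else 0.0


# Random stand-in for out["patch_embeddings"] from the snippet above.
rng = np.random.default_rng(0)
fake_patches = rng.standard_normal((12, 1024)).astype(np.float32)

clip_vector = mean_pool(fake_patches)
print(clip_vector.shape)  # (1024,)
```

Mean pooling discards temporal order, which is usually acceptable for clip-level retrieval or classification but not for tasks that need per-frame outputs.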

Checkpoints

The variants (Base and Large, plus AudioSet and bioacoustics fine-tunes) live in the espnet OpenBEATs collection on Hugging Face.

Citation

If you use OpenBEATs, please cite:

@INPROCEEDINGS{11230965,
  author={Bharadwaj, Shikhar and Cornell, Samuele and Choi, Kwanghee and Fukayama, Satoru and Shim, Hye-Jin and Deshmukh, Soham and Watanabe, Shinji},
  booktitle={2025 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA)},
  title={OpenBEATs: A Fully Open-Source General-Purpose Audio Encoder},
  year={2025},
  volume={},
  number={},
  pages={1-5},
  keywords={Training;Representation learning;Codes;Conferences;Pipelines;Signal processing;Cognition;Robustness;Reproducibility of results;Question answering (information retrieval)},
  doi={10.1109/WASPAA66052.2025.11230965}}

If you use the checkpoints trained for our ICME 2025 Audio Encoder Challenge submission, please also cite:

@article{bharadwaj2026cmu,
  title={The CMU-AIST submission for the ICME 2025 Audio Encoder Challenge},
  author={Bharadwaj, Shikhar and Cornell, Samuele and Choi, Kwanghee and Shim, Hye-jin and Deshmukh, Soham and Fukayama, Satoru and Watanabe, Shinji},
  journal={arXiv preprint arXiv:2601.16273},
  year={2026}
}
