Stable Audio creates original music and sound effects from natural-language text prompts. Generate full tracks in seconds, edit with precision, and build on open-weights models trained on licensed data.
Stable Audio is built by Stability AI on fully licensed data. It combines fast inference, artist-first controllability, and open-weights availability into a single platform for music and sound design.
Describe the music or sound effect you want in natural language. The model outputs full tracks with coherent musical structure at 44.1kHz stereo.
Upload existing audio and pair it with a text prompt to change style, genre, or mood. The input audio guides the model toward your target output.
Edit specific segments of a track, replace sections, or extend audio beyond its original endpoint, all without regenerating the whole file.
Download Small and Medium model weights for self-hosting, or use the Stability AI API for managed hosting. Enterprise plans include customization and indemnification.
Stable Audio uses a fast latent diffusion architecture. A semantic-acoustic autoencoder projects audio into a compact latent space, enabling efficient generation while preserving fidelity.
Describe the audio you want: genre, mood, tempo, key instruments. The more detail you provide, the closer the output matches your intent.
The diffusion model produces full-length audio in less than two seconds on an H200 GPU. Small models run on consumer hardware, including a MacBook Pro M4.
Use inpainting to modify specific segments, change style via audio-to-audio, or extend the track with causal continuation.
Download in WAV or MP3 format. Pro users get full commercial rights. Enterprise customers receive legal indemnification.
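As a rough illustration of the prompt step above, the sketch below assembles genre, mood, tempo, and key instruments into a single text prompt. `PromptSpec` and `build_prompt` are hypothetical helpers invented for this example; they are not part of any Stable Audio library, and the comma-separated layout is an assumption rather than a documented prompt format.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class PromptSpec:
    """Hypothetical container for the details the prompt step asks for."""
    genre: str
    mood: str = ""
    tempo_bpm: Optional[int] = None
    instruments: List[str] = field(default_factory=list)


def build_prompt(spec: PromptSpec) -> str:
    """Join the non-empty fields into one comma-separated text prompt."""
    parts = [spec.genre]
    if spec.mood:
        parts.append(spec.mood)
    if spec.tempo_bpm is not None:
        parts.append(f"{spec.tempo_bpm} BPM")
    if spec.instruments:
        parts.append(", ".join(spec.instruments))
    return ", ".join(parts)


prompt = build_prompt(
    PromptSpec("lo-fi hip hop", "mellow", 82, ["electric piano", "vinyl crackle"])
)
print(prompt)
```

Keeping the fields separate makes it easy to vary one detail (say, the tempo) between generations while holding the rest of the description fixed.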
Four model variants cover everything from on-device sound effects to enterprise-grade production. Small and Medium weights are available as open weights on Hugging Face.
459M params · Sound effects
Optimized for mobile devices. Generates up to 2 minutes of sound effects. Runs offline on consumer laptops.
459M params · Short music
Full music composition on-device with open-weights availability. Generates tracks up to 2 minutes.
1.4B params · Full tracks
Generates up to 6 minutes 20 seconds of music with complex dynamic structure. Open weights are available on Hugging Face. LoRA fine-tuning is supported.
Enterprise-grade
Designed for enterprise sound production. Access via Stability AI API with customization and white-glove support.
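To put the track lengths above in concrete terms, the following sketch computes how many audio frames and how many bytes of uncompressed PCM the stated maximum durations correspond to at 44.1 kHz stereo. The 16-bit sample depth is an assumption; this page does not state the bit depth of exported WAV files.

```python
SAMPLE_RATE_HZ = 44_100  # from this page: 44.1 kHz output
CHANNELS = 2             # stereo
BYTES_PER_SAMPLE = 2     # assumption: 16-bit PCM (bit depth is not stated here)


def pcm_frames(duration_s: float) -> int:
    """Number of sample frames (one sample per channel) for a duration."""
    return round(duration_s * SAMPLE_RATE_HZ)


def pcm_bytes(duration_s: float) -> int:
    """Size of the raw PCM payload, excluding any file header."""
    return pcm_frames(duration_s) * CHANNELS * BYTES_PER_SAMPLE


durations = {
    "2 minutes": 2 * 60,
    "6 minutes 20 seconds": 6 * 60 + 20,
}
for label, seconds in durations.items():
    size_mb = pcm_bytes(seconds) / 1e6
    print(f"{label}: {pcm_frames(seconds):,} frames, {size_mb:.1f} MB")
```

Under that 16-bit assumption, a 2-minute clip is roughly 21 MB of raw audio and a 6 minute 20 second track roughly 67 MB, which is the scale to plan for when storing WAV output.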
Stable Audio combines open innovation with commercial safety. Every model is trained on licensed and Creative Commons data, so you can use the outputs with confidence.
Experiment with open-weights models. See what is under the hood. Build what comes next. Small and Medium are freely available to download and customize.
Fine-tune models on your own audio library using LoRA. Enterprise customers get guided fine-tuning support from the Stability AI Audio Research team.
Commercially safe models trained on fully licensed datasets. Legal indemnification provided under the Enterprise license. Use outputs in your commercial projects.
Stable Audio 3.0 is a family of fast latent diffusion models for variable-length audio generation and editing. At its core is a novel semantic-acoustic autoencoder that projects audio into a compact latent space, preserving fidelity while encouraging semantic structure. This representation makes diffusion efficient enough to generate up to six minutes of audio in under two seconds on an H200 GPU.
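The speed claim above can be restated as a real-time factor, meaning seconds of audio produced per second of compute. The short sketch below does that arithmetic using only the figures quoted on this page; because generation is described as taking "under two seconds", the result is a lower bound, not a measured benchmark.

```python
def realtime_factor(audio_seconds: float, wall_seconds: float) -> float:
    """Seconds of audio generated per second of wall-clock time."""
    if wall_seconds <= 0:
        raise ValueError("wall_seconds must be positive")
    return audio_seconds / wall_seconds


# Six minutes of audio in (at most) two seconds on an H200 GPU.
lower_bound = realtime_factor(6 * 60, 2.0)
print(f"at least {lower_bound:.0f}x real time")
```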
The model family supports three key interaction modes. Text-to-audio generates full tracks from natural-language descriptions. Audio-to-audio transforms uploaded audio by changing style, genre, or instrumentation. Inpainting enables targeted editing — modify a single segment, perform multi-segment edits, or extend audio coherently beyond its original endpoint via causal continuation.
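One way to picture single-segment edits, multi-segment edits, and continuation together is as a timeline plan that marks which spans are kept and which are regenerated. The sketch below builds such a plan in plain Python. It is an illustration only: `merge_segments` and `edit_plan` are hypothetical names, and the page does not describe how the model represents edit regions internally.

```python
from typing import List, Tuple

Segment = Tuple[float, float]  # (start_s, end_s)


def merge_segments(segments: List[Segment]) -> List[Segment]:
    """Sort edit windows and merge any that overlap or touch."""
    merged: List[Segment] = []
    for start, end in sorted(segments):
        if start >= end:
            raise ValueError(f"empty or reversed segment: ({start}, {end})")
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def edit_plan(duration_s: float, edits: List[Segment],
              extend_by_s: float = 0.0) -> List[Tuple[float, float, str]]:
    """Return (start, end, action) spans covering the output timeline.

    Spans inside an edit window, or past the original endpoint, are marked
    "regenerate"; everything else is marked "keep".
    """
    for start, end in edits:
        if start < 0 or end > duration_s:
            raise ValueError("edit windows must lie inside the original audio")
    windows = list(edits)
    if extend_by_s > 0:
        windows.append((duration_s, duration_s + extend_by_s))
    plan: List[Tuple[float, float, str]] = []
    cursor = 0.0
    for start, end in merge_segments(windows):
        if start > cursor:
            plan.append((cursor, start, "keep"))
        plan.append((start, end, "regenerate"))
        cursor = end
    total = duration_s + extend_by_s
    if cursor < total:
        plan.append((cursor, total, "keep"))
    return plan


# Two overlapping edits in a 60 s track, plus a 15 s continuation.
plan = edit_plan(60.0, [(10.0, 20.0), (18.0, 25.0)], extend_by_s=15.0)
for span in plan:
    print(span)
```

In this example the two overlapping edits collapse into one regenerated window from 10 s to 25 s, the span from 25 s to 60 s is kept, and the continuation appears as a final regenerated span past the original endpoint.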
Adversarial post-training reduces the number of inference steps needed while improving fidelity and prompt adherence. The result is a model that runs on consumer hardware — a MacBook Pro M4 generates audio in a few seconds — while matching the quality of much larger systems.
Read the research paper
Whether you use the Stability AI API or run models locally, getting started takes only a few steps.
Use the Stability AI API for managed hosting, or download open weights from Hugging Face for self-hosted inference.
You need Python 3.10+ and PyTorch. The open-weights models run on any CUDA-capable GPU or Apple Silicon Mac.
Load the model and pass a text prompt. A few lines of code is all it takes to generate your first track.
Use LoRA to adapt the model to your own audio library. Documentation is available alongside the weights.
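For a sense of why LoRA makes the fine-tuning step above lightweight, the sketch below compares trainable parameter counts. LoRA freezes a weight matrix W of shape d_out × d_in and trains two small matrices, B (d_out × r) and A (r × d_in), so the adapted weight is W + (alpha / r) · B · A. The layer size and rank used here are made-up illustration values, not figures from the Stable Audio models.

```python
def full_param_count(d_in: int, d_out: int) -> int:
    """Trainable parameters when fine-tuning the whole weight matrix."""
    return d_in * d_out


def lora_param_count(d_in: int, d_out: int, rank: int) -> int:
    """Trainable parameters for a rank-r LoRA adapter: A is r x d_in, B is d_out x r."""
    return rank * (d_in + d_out)


# Hypothetical layer: 1024 x 1024 projection with a rank-16 adapter.
d_in, d_out, rank = 1024, 1024, 16
full = full_param_count(d_in, d_out)
lora = lora_param_count(d_in, d_out, rank)
print(f"full: {full:,}  lora: {lora:,}  ratio: {lora / full:.2%}")
```

For this hypothetical layer the adapter trains about 3% as many parameters as full fine-tuning, which is the general reason LoRA suits adapting a model to a modest audio library.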
Start here if you want to know about licensing, hardware requirements, or what makes Stable Audio different from other AI music tools.
Every claim on this page is grounded in the research paper, official documentation, or public discussion.
Generate music and sound effects directly in the browser. Create an account and start prompting in seconds.
Full technical report detailing the latent diffusion architecture, autoencoder design, and adversarial post-training methodology.
Download Small and Medium weights. Access LoRA fine-tuning documentation and community resources.