New bomb dropped from asian researchers: YuE: Open Music Foundation Models for Full-Song Generation

Only few days ago a r/LocalLLaMA user was going to give away a kidney for this.

YuE is an open-source project by HKUST tackling the challenge of generating full-length songs from lyrics (lyrics2song). Unlike existing models limited to short clips, YuE can produce 5-minute songs with coherent vocals and accompaniment. Key innovations include:

  • A semantically enhanced audio tokenizer for efficient training.
  • Dual-token technique for synced vocal-instrumental modeling.
  • Lyrics-chain-of-thoughts for progressive song generation.
  • Support for diverse genres, languages, and advanced vocal techniques (e.g., scatting, death growl).

Check out the GitHub repo for demos and model checkpoints.