Added New Speech and Audio Generation AI Model Templates with Focus on Music and Multimodal Capabilities #79
Below are the 8 important audio generation and speech generation multimodal models that are missing from the templates section and should be added, since no speech/audio model currently exists there:
What this PR brings to the templates section?
Note: I have added proper tags and recommended on Discord that these templates be added to the website (https://dashboard.nosana.com/deploy) for proper syncing.
1. Meta MusicGen Medium
A state-of-the-art text-to-music generation model that transforms text descriptions into high-quality music samples.
Key Features:
Technical Specs:
2. Microsoft Phi-4 Multimodal
A lightweight open multimodal foundation model supporting text, image, and audio inputs.
Key Features:
Technical Specs:
3. OpenAI Whisper Large V3 Turbo
Type: Speech Recognition
4. MIT AST Speech Commands v2
Type: Audio Classification
5. NVIDIA BigVGAN v2
Type: Neural Vocoder
6. MIT Audio Spectrogram Transformer
Type: Audio Classification
7. Coqui XTTS-v2
Type: Text-to-Speech
8. F5-TTS
Type: Text-to-Speech
Implementation Details
Template Structure
Each template includes:
info.json: Metadata and categorization
job-definition.json: Deployment configuration
README.md: Comprehensive documentation
API Standardization
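As a rough illustration of the three-file layout listed under Template Structure, the sketch below reports which of those files are missing from a template directory. The function name `missing_template_files` and the idea of running such a check are assumptions made for illustration; they are not part of this PR or of any Nosana tooling, and the sketch does not inspect the contents of the files.

```python
from pathlib import Path

# File names taken from the "Template Structure" list in this PR.
REQUIRED_FILES = ("info.json", "job-definition.json", "README.md")


def missing_template_files(template_dir):
    """Return the required file names that are absent from template_dir.

    Hypothetical helper: it only checks that each file exists and makes
    no assumption about the schema of either JSON file.
    """
    root = Path(template_dir)
    return [name for name in REQUIRED_FILES if not (root / name).is_file()]
```

Calling `missing_template_files("path/to/template")` on a complete template returns an empty list; any names it does return are the files still to be added.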
Impact
Enhanced Capabilities