Multimodal AI: Image, Audio, and Video API Integration
Complete guide to integrating image generation, speech synthesis, and video generation APIs.
Image Generation APIs
Access FLUX, Wan-Image, and Doubao Image through unified endpoints. The models differ in output quality and price, so compare them on your own prompts before choosing one.
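Because the models sit behind unified endpoints, comparing them should mostly come down to changing the `model` field. The minimal sketch below builds, without sending, one request body per candidate for the same prompt. It assumes every model accepts the same `{"model", "prompt"}` payload as the curl example under Integration Patterns. Only `wan-image-2.0` appears in this guide; the FLUX and Doubao IDs are hypothetical placeholders.

```python
import json

# Endpoint taken from this guide's curl example.
IMAGE_ENDPOINT = "https://chinawhapi.com/v1/images/generations"

# Only "wan-image-2.0" is documented in this guide; the other two IDs are
# hypothetical placeholders. Replace them with the provider's real model names.
CANDIDATE_MODELS = ["wan-image-2.0", "flux-placeholder", "doubao-image-placeholder"]


def image_request_body(model: str, prompt: str) -> str:
    """Return the JSON body for one image-generation call.

    Assumes every model accepts the same {"model", "prompt"} payload,
    which is what a unified endpoint implies.
    """
    return json.dumps({"model": model, "prompt": prompt})


def comparison_batch(prompt: str) -> dict[str, str]:
    """Build one request body per candidate model for a side-by-side test."""
    return {m: image_request_body(m, prompt) for m in CANDIDATE_MODELS}


if __name__ == "__main__":
    for model, body in comparison_batch("cyberpunk city at night").items():
        print(model, body)
```

Sending each body to the same endpoint and reviewing the images next to each model's listed price gives a concrete basis for the quality versus cost decision.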
Speech Synthesis (TTS)
CosyVoice and Fish Speech offer natural-sounding Chinese voice synthesis. Use them for voice assistants and content narration.
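A speech request can be sketched the same way. This guide does not document the TTS endpoint, so the sketch assumes an OpenAI-style `/v1/audio/speech` route that takes `model`, `input`, and `voice` fields and returns raw audio bytes. The path, field names, model ID, and voice name are all assumptions to check against the provider's reference.

```python
import json
import urllib.request

# ASSUMPTION: an OpenAI-style speech route on the same host. This path is
# not documented in this guide; confirm it before use.
TTS_ENDPOINT = "https://chinawhapi.com/v1/audio/speech"


def build_tts_request(api_key: str, text: str,
                      model: str = "cosyvoice-placeholder",
                      voice: str = "voice-placeholder") -> urllib.request.Request:
    """Build (but do not send) a speech-synthesis request.

    The model and voice names are hypothetical placeholders.
    """
    # ensure_ascii=False keeps Chinese text readable in the payload;
    # the body is then encoded as UTF-8 bytes.
    payload = {"model": model, "input": text, "voice": voice}
    body = json.dumps(payload, ensure_ascii=False).encode("utf-8")
    return urllib.request.Request(
        TTS_ENDPOINT,
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def synthesize_to_file(api_key: str, text: str, path: str) -> None:
    """Send the request and write the returned audio bytes to `path`."""
    with urllib.request.urlopen(build_tts_request(api_key, text)) as resp:
        with open(path, "wb") as f:
            f.write(resp.read())
```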
Video Generation
Wan Video and CogVideoX enable AI video generation. They are best suited to short clips, such as social media content.
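Video clips take much longer to render than images, and video APIs often use a submit-then-poll pattern instead of returning the result in a single response. Assuming this provider works the same way, the polling side might look like the minimal sketch below. The status values and the `fetch_status` callback are hypothetical stand-ins for the provider's real job-status call.

```python
import time
from typing import Callable

# ASSUMPTION: polling a job returns a dict with a "status" field, and
# "succeeded" / "failed" mark a finished job. These names are hypothetical;
# map them to the provider's real response fields.
TERMINAL = {"succeeded", "failed"}


def wait_for_video(job_id: str,
                   fetch_status: Callable[[str], dict],
                   interval_s: float = 5.0,
                   timeout_s: float = 600.0,
                   sleep: Callable[[float], None] = time.sleep) -> dict:
    """Poll `fetch_status(job_id)` until the job finishes or the timeout expires.

    `fetch_status` wraps whatever HTTP call returns the job's status; it and
    `sleep` are injected so the loop can be tested without a network.
    """
    waited = 0.0
    while True:
        status = fetch_status(job_id)
        if status.get("status") in TERMINAL:
            return status
        if waited >= timeout_s:
            raise TimeoutError(f"video job {job_id} not finished after {timeout_s}s")
        sleep(interval_s)
        waited += interval_s
```

Injecting the status call keeps the retry and timeout logic separate from the transport, so the same loop works whichever video model sits behind the endpoint.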
Integration Patterns
# Image generation example (set API_KEY to your key first)
curl https://chinawhapi.com/v1/images/generations \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model":"wan-image-2.0","prompt":"cyberpunk city at night"}'
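To call the endpoint from application code rather than the shell, the request above can be sketched with Python's standard library. The URL, model, and payload come from the curl example. The response shape (`{"data": [{"url": ...}]}`) is an assumption borrowed from OpenAI-compatible image APIs and may differ for this provider.

```python
import json
import urllib.request

IMAGE_ENDPOINT = "https://chinawhapi.com/v1/images/generations"


def build_image_request(api_key: str, prompt: str,
                        model: str = "wan-image-2.0") -> urllib.request.Request:
    """Python equivalent of the curl call above (built, not sent)."""
    body = json.dumps({"model": model, "prompt": prompt}).encode("utf-8")
    return urllib.request.Request(
        IMAGE_ENDPOINT,
        data=body,
        headers={"Authorization": f"Bearer {api_key}",
                 "Content-Type": "application/json"},
        method="POST",
    )


def extract_image_urls(response: dict) -> list[str]:
    """Pull image URLs out of a parsed JSON response.

    ASSUMPTION: an OpenAI-style shape {"data": [{"url": ...}, ...]}.
    Items without a "url" key (e.g. base64 results) are skipped.
    """
    return [item["url"] for item in response.get("data", []) if "url" in item]


def generate_image(api_key: str, prompt: str) -> list[str]:
    """Send the request and return any image URLs in the JSON response."""
    with urllib.request.urlopen(build_image_request(api_key, prompt)) as resp:
        return extract_image_urls(json.load(resp))
```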