ChinaWHAPI
Global Gateway
Tags: Multimodal, Image Generation, TTS, Video, API

Multimodal AI: Image, Audio, and Video API Integration

A guide to integrating image generation, speech synthesis (TTS), and video generation APIs through the ChinaWHAPI gateway.

Image Generation APIs

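Access FLUX, Wan-Image, and Doubao Image through unified endpoints. Because the request shape is shared, you can run the same prompt against each model and compare output quality against cost for your workload.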
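
One way to set up such a comparison is to build an identical request body for each candidate model, varying only the "model" field. The sketch below is illustrative rather than definitive: only "wan-image-2.0" appears in this guide, and the other two model IDs are placeholders to replace with the names the gateway actually exposes.

```python
import json

# Only "wan-image-2.0" appears in this guide. The other two entries are
# placeholders (assumptions) to replace with the gateway's real model IDs.
CANDIDATE_MODELS = ["wan-image-2.0", "<flux-model-id>", "<doubao-image-model-id>"]


def build_comparison_payloads(prompt, models=CANDIDATE_MODELS):
    """Return one JSON request body per model, all sharing the same prompt."""
    return {model: json.dumps({"model": model, "prompt": prompt}) for model in models}


payloads = build_comparison_payloads("cyberpunk city at night")
for model, body in payloads.items():
    print(model, body)
```

Keeping the prompt fixed means any difference in the results comes from the model rather than from the input.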

Speech Synthesis (TTS)

CosyVoice and Fish Speech offer natural-sounding Chinese voice synthesis. Use them for voice assistants and content narration.
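
This guide does not document the TTS endpoint, so the following is only a sketch of what a request might look like. The path /v1/audio/speech, the model ID "cosyvoice", and the "input" and "voice" fields are all assumptions modeled on common OpenAI-style gateways; check the gateway's API reference for the real names. The request is built but not sent.

```python
import json
import urllib.request

# ASSUMPTION: this path is not taken from the guide; it is a hypothetical endpoint.
TTS_URL = "https://chinawhapi.com/v1/audio/speech"


def build_tts_request(text, api_key, model="cosyvoice", voice="default"):
    """Build (but do not send) a POST request for speech synthesis.

    The model ID, voice name, and field names are hypothetical.
    """
    body = json.dumps(
        {"model": model, "input": text, "voice": voice},
        ensure_ascii=False,  # keep Chinese text readable in the payload
    ).encode("utf-8")
    return urllib.request.Request(
        TTS_URL,
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_tts_request("你好，欢迎使用语音助手。", api_key="YOUR_API_KEY")
print(req.get_method(), req.full_url)
```

Encoding the body as UTF-8 matters here because the input text is Chinese.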

Video Generation

Wan Video and CogVideoX enable AI video creation. They are best suited to short clips and social media content.
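
Video generation usually takes longer than a single HTTP request comfortably allows, so many providers expose it as a job that is submitted and then polled. Whether this gateway works that way is an assumption, as are the status names used below. The helper takes the status call as a parameter, so it does not depend on any particular endpoint.

```python
import time


def poll_until_done(fetch_status, interval_s=5.0, timeout_s=600.0, sleep=time.sleep):
    """Call fetch_status() until it reports a terminal state or the timeout passes.

    fetch_status is a caller-supplied function returning a dict such as
    {"status": "running"} or {"status": "succeeded", "url": "..."}.
    The status names are hypothetical, not taken from the gateway's docs.
    """
    waited = 0.0
    while True:
        job = fetch_status()
        if job.get("status") in ("succeeded", "failed"):
            return job
        if waited >= timeout_s:
            raise TimeoutError(f"job still {job.get('status')!r} after {waited:.0f}s")
        sleep(interval_s)
        waited += interval_s


# Stand-in for the real status call: two "running" replies, then success.
replies = iter([
    {"status": "running"},
    {"status": "running"},
    {"status": "succeeded", "url": "https://example.com/clip.mp4"},
])
result = poll_until_done(lambda: next(replies), interval_s=0.0)
print(result["status"])
```

Passing the sleep function as a parameter keeps the helper easy to exercise without real delays.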

Integration Patterns

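# Image generation example
# Replace {key} with your API key.
# curl -d defaults to form encoding, so declare the JSON body explicitly.
curl https://chinawhapi.com/v1/images/generations \
  -H "Authorization: Bearer {key}" \
  -H "Content-Type: application/json" \
  -d '{"model":"wan-image-2.0","prompt":"cyberpunk city at night"}'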
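
For application code, the same image request can be issued from Python. The sketch below uses only the standard library and reuses the URL, model, and prompt from the curl example. The environment variable name CHINAWHAPI_API_KEY is a hypothetical choice, and the response schema is not documented in this guide, so generate_image simply returns whatever JSON the server sends.

```python
import json
import os
import urllib.request

API_URL = "https://chinawhapi.com/v1/images/generations"  # from the curl example


def build_image_request(prompt, api_key, model="wan-image-2.0"):
    """Build the POST request without sending it."""
    body = json.dumps({"model": model, "prompt": prompt}).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def generate_image(prompt, api_key, timeout_s=60):
    """Send the request and return the decoded JSON response (schema not assumed)."""
    request = build_image_request(prompt, api_key)
    with urllib.request.urlopen(request, timeout=timeout_s) as resp:
        return json.load(resp)


# CHINAWHAPI_API_KEY is a hypothetical variable name, not one the gateway defines.
api_key = os.environ.get("CHINAWHAPI_API_KEY", "YOUR_API_KEY")
req = build_image_request("cyberpunk city at night", api_key)
print(req.get_method(), req.full_url)
# Calling generate_image("cyberpunk city at night", api_key) performs the real network call.
```

Separating request construction from sending makes it possible to inspect or log the exact payload before any quota is spent.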