
Learnings from building a Creative Storyteller with Gemini Live API + ADK

Hey everyone! I built Reveria for the Creative Storyteller track and wanted to share some things I learned building with the Gemini Live API in case it helps others.

Reveria is an interactive multimodal story engine. You describe a story via voice or text, and a team of AI agents builds it live: illustrated scenes, narrated audio, and an AI Director that shapes the narrative, all streamed as interleaved output in an interactive storybook.

(Quick note: you might see "StoryForge" in the GitHub repo or some URLs. We rebranded to Reveria midway through development. Same project, new name.)

A few technical learnings that might save you time:

1. Gemini Live API native features are underused. Instead of making separate API calls for transcription and intent detection, enable input_audio_transcription, output_audio_transcription, and declare function tools directly in the LiveConnectConfig. This eliminated 3-5 extra API calls per interaction for us.

2. Character consistency in Imagen is hard. If you let Gemini write the full image prompt, it summarizes character descriptions and drops details. Our fix: extract character descriptions once, then prepend them verbatim to every image prompt. Gemini only writes the scene composition (setting, lighting, camera angle). With this split, characters look the same across every scene.

3. Gemini's native audio model (gemini-2.5-flash-native-audio) produces way better narration than Cloud TTS. It understands narrative context and varies tone with the story's mood. Worth the switch if you're doing any kind of storytelling.

4. For anyone struggling with Imagen rendering text/speech bubbles into comic-style images: put a positive "text-free" instruction at the START of your prompt (where Imagen pays most attention), not just negative constraints at the end.
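
To make point 1 concrete, here is a minimal sketch of a Live API session config with both transcription options and one function tool declared up front. It uses the plain-dict form of the config that the google-genai Python SDK accepts, as far as I know. The `set_story_mood` tool, its parameters, and the `build_live_config` helper are hypothetical names made up for illustration, not Reveria's actual code.

```python
# Hypothetical tool declaration: name and parameters are illustrative only.
set_story_mood = {
    "name": "set_story_mood",
    "description": "Change the mood of the story being told.",
    "parameters": {
        "type": "object",
        "properties": {
            "mood": {"type": "string", "description": "e.g. 'spooky' or 'cheerful'"},
        },
        "required": ["mood"],
    },
}


def build_live_config(tools):
    """Return a Live API config dict with transcription and tools enabled."""
    return {
        "response_modalities": ["AUDIO"],
        # Empty dicts switch the transcription features on with defaults.
        "input_audio_transcription": {},
        "output_audio_transcription": {},
        # Function tools are declared in the same config as the session.
        "tools": [{"function_declarations": list(tools)}],
    }


live_config = build_live_config([set_story_mood])
```

The dict would then be passed as `config=` when opening the session (for example `client.aio.live.connect(model=..., config=live_config)` in the google-genai SDK), so transcripts and tool calls should arrive on the same stream as the audio instead of needing separate requests.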

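Points 2 and 4 both come down to how the image prompt is assembled, so here is a rough sketch of that ordering: a positive "text-free" instruction first, then the stored character descriptions verbatim, then the scene composition written by Gemini. The `Character` class, `build_image_prompt`, the prefix wording, and the sample character are all assumptions for illustration, not the real Reveria code.

```python
from dataclasses import dataclass

# Hypothetical wording of the positive instruction; tune it for your art style.
TEXT_FREE_PREFIX = "Text-free illustration, clean artwork only."


@dataclass(frozen=True)
class Character:
    name: str
    description: str  # extracted once, then reused verbatim in every prompt


def build_image_prompt(characters, scene_composition):
    """Join the text-free prefix, verbatim characters, and scene composition."""
    character_block = " ".join(f"{c.name}: {c.description}." for c in characters)
    parts = [TEXT_FREE_PREFIX, character_block, scene_composition.strip()]
    # Skip empty parts (e.g. a scene with no characters) to avoid double spaces.
    return " ".join(p for p in parts if p)


# Made-up sample data to show the call shape.
cast = [Character("Mira", "a ten-year-old girl with a red scarf and short black hair")]
prompt = build_image_prompt(cast, "Mira stands at a forest edge at dusk, low camera angle.")
```

Because the character text never passes back through the model, it cannot be summarized or trimmed between scenes; only the `scene_composition` argument changes from one image to the next.
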
Project link: https://devpost.com/software/reveria-stories-from-your-imagination

Happy to answer any Gemini Live API or ADK questions. Good luck to everyone!
