---
title: CineGen
emoji: πŸ‘€
colorFrom: pink
colorTo: purple
sdk: gradio
sdk_version: 6.0.1
app_file: app.py
pinned: false
short_description: automate the process of short movie creation
tags:
  - mcp-in-action-track-creative
---
**CineGen AI Director** is an AI agent designed to automate the process of short movie creation. It transforms a simple text or image idea into a fully realized video production by handling scriptwriting, storyboard generation, character design, and video synthesis using a multi-model approach.

- **Sponsor Platforms**: Uses Google Gemini (story + character prompts) and the Hugging Face Inference Client with fal.ai hosting for Wan 2.2 TI2V video renders.
- **Autonomous Agent Flow**: StoryGenerator β†’ CharacterDesigner β†’ VideoDirector pipeline runs sequentially inside a single Gradio Blocks app, with MCP-friendly abstractions (`StoryGenerator`, `CharacterDesigner`, `VideoDirector`) designed for tool-call orchestration.
- **Evaluation Notes**: Covers reasoning (Gemini JSON storyboard spec), planning (scene/character tables that feed downstream steps), and execution (queued video renders with serialized HF jobs). 
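
The sequential StoryGenerator → CharacterDesigner → VideoDirector flow can be sketched as follows. This is an illustrative sketch only: the interface shapes, method signatures, and field names (`visualPrompt`, `referencePrompt`, etc.) are assumptions, not the app's actual types.

```typescript
// Hypothetical data shapes for the three pipeline stages.
interface Scene { id: number; visualPrompt: string; }
interface Storyboard { title: string; scenes: Scene[]; }
interface CharacterSheet { name: string; referencePrompt: string; }

type StoryGenerator = (idea: string) => Storyboard;
type CharacterDesigner = (board: Storyboard) => CharacterSheet[];
type VideoDirector = (board: Storyboard, chars: CharacterSheet[]) => string[];

// Runs the three stages strictly in order, returning one clip per scene.
function runPipeline(
  idea: string,
  generate: StoryGenerator,
  design: CharacterDesigner,
  direct: VideoDirector,
): string[] {
  const board = generate(idea);          // 1. storyboard from the idea
  const characters = design(board);      // 2. character sheets from the board
  return direct(board, characters);     // 3. video clips from both
}
```

Because each stage is a plain function, the same pipeline can be exposed as individual MCP tool calls or run end-to-end inside the Gradio app.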

## Artifacts for Reviewers

- **Social Media Proof**: Replace `<SOCIAL_LINK_HERE>` with your live tweet/thread/LinkedIn post so judges can verify community sharing.
- **Video Recording**: Upload a walkthrough of the Gradio agent (screen + narration) and swap `<DEMO_VIDEO_LINK>` with the shareable link.


## πŸš€ Key Features

*   **End-to-End Automation**: Converts a single sentence idea into a complete short film (approx. 30s-60s runtime).
*   **Intelligent Storyboarding**: Breaks down concepts into scene-by-scene visual prompts and narrative descriptions.
*   **Character Consistency System**:
    *   Automatically identifies main characters.
    *   Generates visual reference sheets (Character Anchors).
    *   Allows users to "tag" specific characters in specific scenes to ensure visual consistency in the video generation prompt.
*   **Multi-Model Video Generation**: Supports multiple state-of-the-art open-source video models via Hugging Face.
    *   **Robust Fallback System**: If the selected video model fails (e.g., server overload), the system automatically tries alternative models until generation succeeds.
*   **Interactive Editing**:
    *   Edit visual prompts manually.
    *   Add, Insert, or Delete scenes during production.
    *   Regenerate specific clips or character looks.
*   **Client-Side Video Merging**: Combines individual generated clips into a single continuous movie file directly in the browser without requiring a backend video processing server.
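
The fallback behavior described above can be sketched as a simple retry loop over an ordered model list. The `renderClip` callback here stands in for the real Hugging Face inference call and is an assumption; only the try-next-model-on-failure logic reflects the feature as described.

```typescript
// Try each video model in order until one render succeeds.
// `renderClip` is a placeholder for the actual inference call.
async function generateWithFallback(
  prompt: string,
  models: string[],
  renderClip: (model: string, prompt: string) => Promise<string>,
): Promise<{ model: string; clip: string }> {
  let lastError: unknown;
  for (const model of models) {
    try {
      return { model, clip: await renderClip(model, prompt) };
    } catch (err) {
      lastError = err; // e.g. server overload; fall through to the next model
    }
  }
  throw new Error(`All video models failed: ${String(lastError)}`);
}
```
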


## πŸ€– AI Models & API Usage

The application orchestrates two primary AI services:

### 1. Google Gemini API (`@google/genai`)
Used for the "Brain" and "Art Department" of the application.

*   **Logic & Scripting**: `gemini-2.5-flash`
    *   **Role**: Analyzes the user's idea, generates the title, creates character profiles, and writes the JSON-structured storyboard with visual prompts.
    *   **Technique**: Uses Structured Output (JSON Schema) to ensure the app can parse the story data reliably.
*   **Character Design**: `gemini-2.5-flash-image`
    *   **Role**: Generates static reference images for characters based on the script's descriptions.
    *   **Purpose**: Acts as the visual anchor that lets the user verify character appearance before video generation.
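
The structured-output step hands the app a JSON string that must match the storyboard schema before downstream stages can consume it. A minimal parsing sketch is below; the field names (`title`, `scenes`, `visualPrompt`) are assumptions about the app's actual schema, not taken from its source.

```typescript
// Assumed shape of the Gemini structured-output storyboard.
interface StoryboardScene { description: string; visualPrompt: string; }
interface StoryboardSpec { title: string; scenes: StoryboardScene[]; }

// Parse the raw JSON response and verify the minimal schema,
// so malformed model output fails loudly instead of propagating.
function parseStoryboard(raw: string): StoryboardSpec {
  const data = JSON.parse(raw);
  if (typeof data.title !== "string" || !Array.isArray(data.scenes)) {
    throw new Error("Storyboard JSON does not match the expected schema");
  }
  return data as StoryboardSpec;
}
```
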

### 2. Hugging Face Inference API (`@huggingface/inference`)
Used for the "Production/Camera" department.

*   **Video Generation Models**:
    *   **Wan 2.1 (Wan-AI)**: `Wan-AI/Wan2.1-T2V-14B` (Primary/Default)
    *   **LTX Video (Lightricks)**: `Lightricks/LTX-Video-0.9.7-distilled`
    *   **Hunyuan Video 1.5**: `tencent/HunyuanVideo-1.5`
    *   **CogVideoX**: `THUDM/CogVideoX-5b`
*   **Provider**: Defaults to `fal-ai` via Hugging Face Inference for high-performance GPU access.
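
Combining the model list with the fallback feature, the ordering logic can be sketched as a small registry: the user-selected model goes first, with the remaining models kept as fallbacks. The model identifiers come from the list above; the helper name and structure are assumptions.

```typescript
// Video model registry, in default priority order (first = primary).
const VIDEO_MODELS = [
  "Wan-AI/Wan2.1-T2V-14B",                 // primary/default
  "Lightricks/LTX-Video-0.9.7-distilled",
  "tencent/HunyuanVideo-1.5",
  "THUDM/CogVideoX-5b",
] as const;

// Put the user-selected model first, keeping the others as fallbacks.
function fallbackOrder(selected: string): string[] {
  return [selected, ...VIDEO_MODELS.filter((m) => m !== selected)];
}
```
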