Metadata-Version: 2.4
Name: api-sdk
Version: 0.1.0
Summary: API SDK for image generation and other services
Author-email: Your Name <your.email@example.com>
Requires-Python: >=3.10
Description-Content-Type: text/markdown
Requires-Dist: elevenlabs>=1.0.0
Requires-Dist: google-genai>=1.56.0
Requires-Dist: Pillow>=10.0.0
Requires-Dist: requests>=2.31.0
Requires-Dist: websocket-client>=1.6.0

# API SDK

A Python SDK for image generation using the Google Gemini API, with additional support for text generation, video generation, and text-to-speech.

## Installation

```bash
pip install -e .
```

Or install dependencies directly:

```bash
pip install -r requirements.txt
```

## Configuration

This SDK auto-loads a `.env` file colocated with the package and reads
`GEMINI_API_KEY` on import.

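- Editable install (this repository's layout): put the key in `src/api_sdk/.env`
- Installed package (site-packages): put `.env` in the installed package directory, `.../site-packages/api_sdk/.env`
- Alternatively, set the key in the process environment (no `.env` needed): `export GEMINI_API_KEY=...`

Example `.env` (do not commit it to version control; the repository already lists it in `.gitignore`):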

```bash
GEMINI_API_KEY=your-api-key
FLYTEK_APPID=your-app-id
FLYTEK_APIKey=your-api-key
FLYTEK_APISecret=your-api-secret
FLYTEK_URL=wss://tts-api-sg.xf-yun.com/v2/tts
```
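
The loading behavior described above can be illustrated with a small sketch. This is not the SDK's actual loader: `load_env_file` is a hypothetical helper, and the parsing rules (one `KEY=VALUE` pair per line, `#` for comments, optional surrounding quotes) are assumptions. It uses `os.environ.setdefault` so that variables already present in the process environment keep their values.

```python
import os
from pathlib import Path


def load_env_file(path: Path) -> dict[str, str]:
    """Load KEY=VALUE pairs from a .env file without overriding existing variables.

    Hypothetical helper for illustration; not part of api_sdk.
    """
    loaded: dict[str, str] = {}
    if not path.is_file():
        return loaded
    for raw_line in path.read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        # Skip blank lines, comments, and lines without an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        key, value = key.strip(), value.strip().strip("'\"")
        if not key:
            continue
        # setdefault keeps a value that the process environment already provides.
        os.environ.setdefault(key, value)
        loaded[key] = os.environ[key]
    return loaded
```

Calling `load_env_file(Path("src/api_sdk/.env"))` would return the effective value of each key found in the file, which makes it easy to see when an exported variable took precedence over the file.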

## Usage

### As a Python Library

```python
from pathlib import Path

from api_sdk.image_generator import generate_image
from api_sdk import generate_text
from api_sdk import generate_speech, generate_speech_bytes

# Generate an image
output_path = generate_image(
    prompt="A beautiful sunset over mountains",
    output_path="./images/sunset.png"
)
print(f"Image saved to: {output_path}")

# Generate text
text, _ = generate_text(
    prompt="给我一首关于秋天的短诗",
)
print(text)

# Synthesize speech (returns bytes; nothing is written to disk until you save them)
audio_bytes = generate_speech("你好，世界")  # returns WAV bytes by default
Path("./audio").mkdir(parents=True, exist_ok=True)
Path("./audio/hello.wav").write_bytes(audio_bytes)
```

### Using the CLI

Generate an image with a simple prompt:

```bash
python -m api_sdk.cli generate-image "A beautiful sunset over mountains"
```

Generate an image and save it to a specific location:

```bash
python -m api_sdk.cli generate-image "A cat playing piano" -o ./images/cat_piano.png
```

Use verbose mode for detailed output:

```bash
python -m api_sdk.cli generate-image "Modern architecture" -v
```

No CLI `--api-key` flag is provided; credentials are discovered automatically
via the package-local `.env` file or the process environment.

After installation with pip, you can also use the command directly:

```bash
api-sdk generate-image "Your prompt here"
```

Generate image (Volcengine Jimeng / 即梦, AK/SK):

```bash
export VOLC_ACCESSKEY=...
export VOLC_SECRETKEY=...

# Text-to-image (currently integrated: jimeng_t2i_v40)
api-sdk generate-image --provider jimeng "公共信息墙上的医学文献检索结果页面，冷静权威可信" -o ./images -n jimeng_img

# Optional: try specifying width and height (support depends on the model; advanced parameters can be passed through with --extra-config-file)
api-sdk generate-image --provider jimeng "A calm authoritative information dashboard UI" --width 1024 --height 576 -o ./images -n jimeng_16_9
```

Generate video (Gemini Veo):

```bash
api-sdk generate-video "A cinematic shot of a tiny robot walking through a rainy neon city"
api-sdk generate-video "Make it sunrise" -i examples/1.jpg -o ./videos -n sunrise

# With reference images (max 3) + optional types (ASSET / STYLE)
api-sdk generate-video "Keep the same character, new scene" \
  --reference-image examples/1.jpg --reference-type ASSET \
  --reference-image examples/2.jpg --reference-type STYLE \
  -o ./videos -n ref_demo

# Extend an existing video
api-sdk generate-video "Continue the scene with slower camera movement" \
  --extend-video ./videos/demo.mp4 \
  --duration-seconds 8 --resolution 720p -o ./videos -n extended
```
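
When the same command is assembled from a script, the paired `--reference-image` / `--reference-type` flags can be generated from a list. The sketch below is a hypothetical helper, not part of the SDK. It enforces the limit of three images mentioned in the comment above and, as a simplifying assumption, requires an explicit type for every image even though the CLI treats types as optional.

```python
from typing import Sequence

MAX_REFERENCE_IMAGES = 3  # limit stated in the CLI example above
REFERENCE_TYPES = {"ASSET", "STYLE"}


def build_reference_args(references: Sequence[tuple[str, str]]) -> list[str]:
    """Turn (image_path, reference_type) pairs into CLI arguments."""
    if len(references) > MAX_REFERENCE_IMAGES:
        raise ValueError(f"at most {MAX_REFERENCE_IMAGES} reference images are allowed")
    args: list[str] = []
    for image_path, reference_type in references:
        if reference_type not in REFERENCE_TYPES:
            raise ValueError(f"unknown reference type: {reference_type!r}")
        args += ["--reference-image", image_path, "--reference-type", reference_type]
    return args


command = [
    "api-sdk", "generate-video", "Keep the same character, new scene",
    *build_reference_args([("examples/1.jpg", "ASSET"), ("examples/2.jpg", "STYLE")]),
    "-o", "./videos", "-n", "ref_demo",
]
# Pass `command` to subprocess.run(command, check=True) to invoke the CLI.
```

Building the argument list this way keeps each image next to its type and avoids shell quoting issues.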

Generate video (Volcengine Jimeng / 即梦, AK/SK):

```bash
# Provide the Volcengine AK/SK via .env or the environment (existing process environment variables are not overridden)
export VOLC_ACCESSKEY=...
export VOLC_SECRETKEY=...

# Text-to-video (currently integrated: jimeng_t2v_v30)
api-sdk generate-video --provider jimeng "一段冷静、权威的医学文献检索信息墙界面，从加载态过渡到结果总览" \
  --aspect-ratio 16:9 --duration-seconds 5 -o ./videos -n jimeng_demo

# Jimeng also accepts an explicit frame count (121 = 5 s, 241 = 10 s)
api-sdk generate-video --provider jimeng "A calm authoritative public information panel UI" --frames 241 -o ./videos -n jimeng_10s
```
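
The two frame counts quoted in the last example (121 for 5 s, 241 for 10 s) are consistent with 24 frames per second plus one initial frame. The sketch below encodes that relationship. The formula is an inference from those two data points only, not something this document confirms for other durations, and `frames_for_duration` is a hypothetical helper.

```python
def frames_for_duration(seconds: int, fps: int = 24) -> int:
    """Return the frame count for a clip, assuming frames = fps * seconds + 1."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return fps * seconds + 1


print(frames_for_duration(5))   # 121
print(frames_for_duration(10))  # 241
```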

Generate text:

```bash
api-sdk generate-text "给我一段关于海风的散文"
api-sdk --json generate-text "Summarize: ..." --model gemini-2.5-pro --max-tokens 512

# Use Codex as text provider (non-interactive, safe sandbox)
api-sdk generate-text "写一首关于红豆的诗歌（中文）" --provider codex
python -m api_sdk.cli generate-text --provider codex "写一句关于红豆的短句（中文）"
```

Structured Output (Gemini):

```bash
# Pass a JSON schema to enforce structured JSON output.
api-sdk generate-text "Generate a recipe for butter cookies" --provider gemini --response-schema-file ./examples/recipe.schema.json
```
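
For illustration, a schema file for the recipe prompt might look like the one written below. The field names and the output path `./my_recipe.schema.json` are hypothetical. The contents of `./examples/recipe.schema.json` are not shown in this document, and the subset of JSON Schema that each provider accepts may differ.

```python
import json
from pathlib import Path

# Hypothetical schema: field names are illustrative, not taken from the repository.
recipe_schema = {
    "type": "object",
    "properties": {
        "recipe_name": {"type": "string"},
        "ingredients": {"type": "array", "items": {"type": "string"}},
        "steps": {"type": "array", "items": {"type": "string"}},
    },
    "required": ["recipe_name", "ingredients", "steps"],
}

schema_path = Path("./my_recipe.schema.json")
schema_path.write_text(json.dumps(recipe_schema, indent=2), encoding="utf-8")
```

The resulting file could then be passed as `--response-schema-file ./my_recipe.schema.json`.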

Structured Output (Codex):

```bash
api-sdk generate-text "Extract project metadata" --provider codex --response-schema-file ./examples/recipe.schema.json
```

Models:
- Default image model: `gemini-3-pro-image-preview`
- You can also pass a different image model explicitly, e.g.: `--model gemini-2.5-flash-image`
- Default image model (Jimeng req_key): `jimeng_t2i_v40`
- Default video model (Gemini Veo): `veo-3.1-generate-preview`
- Default video model (Jimeng req_key): `jimeng_t2v_v30`

Global flags:
- `--json` Emit machine-readable JSON to stdout (for pipelines)
- `-v/--verbose` Verbose logs (DEBUG)
- `-q/--quiet` Suppress non-error logs
- `--no-color` Disable decorative icons/emojis

Example JSON output:

```bash
api-sdk --json detect-objects examples/1.jpg "Find animals"
# {"ok":true,"command":"detect-objects","output_path":"images/1_detected.jpg","objects":[...],"model":"gemini-2.0-flash-exp"}
```
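
A downstream script can consume this output with the standard `json` module. The sketch below assumes that stdout contains exactly one JSON object and that a failed command reports `"ok": false`. Only the success shape is shown in this document, so the failure handling is an assumption, and `parse_cli_result` is a hypothetical helper.

```python
import json


def parse_cli_result(stdout: str) -> dict:
    """Parse one JSON object emitted by `api-sdk --json ...` and check its `ok` flag."""
    result = json.loads(stdout)
    if not result.get("ok"):
        # Only the `ok` key from the example above is relied on here.
        raise RuntimeError(f"command failed: {result}")
    return result


# Sample payload modelled on the example output above (objects list left empty).
sample = (
    '{"ok":true,"command":"detect-objects","output_path":"images/1_detected.jpg",'
    '"objects":[],"model":"gemini-2.0-flash-exp"}'
)
result = parse_cli_result(sample)
print(result["output_path"])  # images/1_detected.jpg
```

In a real pipeline, `stdout` would come from something like `subprocess.run([...], capture_output=True, text=True).stdout`.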

Shell wrapper:
- A thin shell launcher is available at `bin/api-sdk` (executes `python -m api_sdk.cli "$@"`).
  You can run `chmod +x bin/api-sdk` and add it to your PATH if desired.

### Text-to-Speech (TTS)

Supported providers: Google Gemini TTS (default, Algieba), iFlytek, and ElevenLabs.

ElevenLabs (requires `ELEVENLABS_API_KEY` in `.env` or the environment; the default voice_id is `DowyQ68vDpgFYdWVGjc3`, which can be overridden with `ELEVENLABS_VOICE_ID` or the CLI `--voice` flag):

```bash
export ELEVENLABS_API_KEY=...
export ELEVENLABS_VOICE_ID=JBFqnCBsd6RMkjVDRZzb

api-sdk synthesize-speech "The first move is what sets everything in motion." \
  --provider elevenlabs --format mp3 --model eleven_multilingual_v2 -o ./audio/eleven.mp3
```

Chinese speech synthesis examples (default output: WAV, 16 kHz, mono):

```bash
# Gemini (default, Algieba)
api-sdk synthesize-speech "你好，世界" -o ./audio/hello_google.mp3 --format mp3

# Specify the model explicitly (gemini-2.5-pro-preview-tts is already the default):
api-sdk synthesize-speech "你好，世界" --provider gemini --model gemini-2.5-pro-preview-tts -o ./audio/hello2.mp3 --format mp3

# Generate MP3 (supported on the Google side):
api-sdk synthesize-speech "Hello, this is a demo." --provider algieba --model gemini-2.5-pro-tts --format mp3 -o ./audio/demo.mp3
```

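Notes:
- Google Gemini TTS (Algieba) can produce MP3 output directly (`--format mp3`). You can optionally pass `--voice`; the available values vary by model and region.
- The default iFlytek voice (voice/vcn) is `x_xiaoyang_story`; you can choose another with `--voice`.

Custom parameters (voice, format, sample rate):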

```bash
api-sdk synthesize-speech "欢迎使用语音合成" \
  --voice x_John \
  --format wav \
  --sample-rate 16000 \
  -o ./audio/welcome.wav
```

Environment variables (or entries in `.env`):
- `FLYTEK_APPID`
- `FLYTEK_APIKey`
- `FLYTEK_APISecret`
- `FLYTEK_URL` (optional; overrides the default endpoint)
- `GEMINI_API_KEY` (optional; if unset, Google default credentials/ADC are used)
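
Before calling the iFlytek provider, a script can check that its settings are present. The sketch below is a hypothetical preflight helper, not part of the SDK. It assumes the three `FLYTEK_*` credentials are required, since only `FLYTEK_URL` and `GEMINI_API_KEY` are marked optional in the list above.

```python
import os
from typing import Mapping

# Assumed to be required for iFlytek; the list above marks only the other two as optional.
IFLYTEK_REQUIRED = ("FLYTEK_APPID", "FLYTEK_APIKey", "FLYTEK_APISecret")


def missing_iflytek_vars(env: Mapping[str, str] = os.environ) -> list[str]:
    """Return the names of iFlytek variables that are unset or empty."""
    return [name for name in IFLYTEK_REQUIRED if not env.get(name)]


missing = missing_iflytek_vars()
if missing:
    print("Missing iFlytek settings:", ", ".join(missing))
```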

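Authentication notes (Google):
- For local development, either set `GEMINI_API_KEY` directly (no interaction needed) or run `gcloud auth application-default login` to use ADC (requires a one-time interactive login).
- For production, a service account is recommended: point `GOOGLE_APPLICATION_CREDENTIALS` at the JSON key, or use the cloud environment's workload identity (no manual interaction).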

Provider aliases: `google`, `gemini`, and `algieba` all refer to Google Gemini TTS (default model: gemini-2.5-pro-tts). The default provider is `algieba`.

Note: the available voice codes (`--voice`/vcn) vary by account; use the voices you have enabled in the iFlytek console.

## API Reference

### `generate_image()`

Generate an image based on a text prompt.

**Parameters:**
- `prompt` (str): Text prompt for image generation
- `output_path` (str, optional): Path where the generated image will be saved
- `api_key` (str, optional): Google Gemini API key
- `model` (str, optional): Model to use for generation

**Returns:**
- `Path`: The path to the saved image

### `ImageGenerator` Class

A wrapper class for Google Gemini image generation.

```python
from api_sdk.image_generator import ImageGenerator

generator = ImageGenerator(api_key="your-api-key")
path = generator.generate_image(
    prompt="Your prompt",
    output_path="output.png"
)
```
