Chapter 7: Prompt Management in Practice (Versioning, A/B Testing, CI/CD) | Langfuse Deep Dive

Why prompt management is a standalone feature

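A prompt looks like "just a string," but in production it is the most important piece of business logic: it determines how the application behaves. Unless prompts can be versioned, reviewed, released, and rolled back the same way code is, you cannot even pinpoint the cause of an incident. Langfuse Prompt Management exists as a separate feature to meet these needs: a dedicated versioning layer for prompts.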

Prompt object: immutable versions + mutable labels

graph LR
  subgraph Prompt 'chat-agent'
    V1[Version 1<br/>immutable]
    V2[Version 2<br/>immutable]
    V3[Version 3<br/>immutable]
    V4[Version 4<br/>immutable]
  end
  PROD[production<br/>Label] -.points to.-> V3
  STAGE[staging<br/>Label] -.points to.-> V4
  LAT[latest<br/>Label - automatic] -.points to.-> V4
  CANARY[canary-a<br/>Label] -.points to.-> V2

Each prompt name holds multiple immutable versions, and a label is a mutable pointer to any one of them. Because the application fetches prompts by label, switching versions requires no code change.

Concept	Property	Example
Prompt Name	User-defined identifier	"chat-agent", "rag-qa", "summarize-jp"
Version	Integer, fully immutable	1, 2, 3, ... (cannot be deleted or overwritten)
Label	Mutable named pointer	production / staging / latest / canary-a / jp-only
Protected Label	Special label guarded against accidental overwrites	Protect "production" so that only privileged roles (admin/owner) can move it
Type	Prompt format	"text" (single string) / "chat" (messages array)
Config	Model settings bundled with the prompt	model, temperature, max_tokens, tools, etc.
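
The version/label model in the table can be made concrete with a small in-memory sketch. This is a toy model for illustration only, not how Langfuse stores prompts; `PromptVersion`, `PromptRegistry`, and their methods are hypothetical names. It shows that creating a version only ever appends, while promotion and rollback are nothing more than moving a label.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class PromptVersion:
    """One immutable version of a prompt (frozen: fields cannot be reassigned)."""
    version: int
    template: str


@dataclass
class PromptRegistry:
    """Toy model of a single prompt name: append-only versions plus movable labels."""
    versions: list[PromptVersion] = field(default_factory=list)
    labels: dict[str, int] = field(default_factory=dict)

    def create_version(self, template: str) -> PromptVersion:
        # Versions are append-only; "latest" follows the newest one automatically.
        pv = PromptVersion(version=len(self.versions) + 1, template=template)
        self.versions.append(pv)
        self.labels["latest"] = pv.version
        return pv

    def move_label(self, label: str, version: int) -> None:
        # Moving a label never modifies the version it used to point at.
        if not 1 <= version <= len(self.versions):
            raise ValueError(f"unknown version {version}")
        self.labels[label] = version

    def resolve(self, label: str = "production") -> PromptVersion:
        return self.versions[self.labels[label] - 1]


registry = PromptRegistry()
registry.create_version("v1 text")
registry.create_version("v2 text")
registry.move_label("production", 1)
registry.move_label("production", 2)  # promote
registry.move_label("production", 1)  # roll back: v2 still exists, untouched
```

After the rollback, `registry.resolve()` returns version 1 again while `registry.resolve("latest")` still returns version 2.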

text type and chat type

// text type: a single template string
"You are a {{role}}. Answer {{question}} in Japanese."

// chat type: a messages array
[
  { "role": "system", "content": "You are a helpful {{role}}." },
  { "role": "user",   "content": "{{question}}" }
]
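
To make the difference between the two types concrete, here is a minimal sketch of `{{variable}}` substitution for each shape. It is not the SDK's implementation; `render` and `render_chat` are hypothetical helpers, and leaving unknown placeholders untouched is an assumption of this sketch rather than documented behaviour.

```python
import re

# Matches {{ name }} with optional whitespace inside the braces.
_PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def render(template: str, **variables: str) -> str:
    """Fill {{name}} placeholders in a text-type template."""
    # Placeholders without a supplied value are left as-is.
    return _PLACEHOLDER.sub(
        lambda m: str(variables.get(m.group(1), m.group(0))), template
    )


def render_chat(messages: list[dict], **variables: str) -> list[dict]:
    """Fill placeholders in every message of a chat-type template."""
    return [{**m, "content": render(m["content"], **variables)} for m in messages]


text_result = render("You are a {{role}}. Answer {{question}}.", role="SRE", question="why?")
chat_result = render_chat(
    [
        {"role": "system", "content": "You are a helpful {{role}}."},
        {"role": "user", "content": "{{question}}"},
    ],
    role="SRE",
    question="why?",
)
```

A text prompt compiles to one string, while a chat prompt compiles to a list of messages with the same roles and filled-in content.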

Fetch → compile → call

from langfuse import get_client
# Langfuse's drop-in wrapper for the OpenAI client; it accepts the langfuse_prompt argument
from langfuse.openai import OpenAI

langfuse = get_client()
openai = OpenAI()

# 1. Fetch the prompt (the "production" label is the default)
prompt = langfuse.get_prompt("chat-agent", type="chat")  # equivalent to label="production"

# 2. Compile the variables ({{variable}} placeholders are replaced with values)
messages = prompt.compile(role="SRE", question="Why did p99 latency get worse?")

# 3. Call the LLM. Passing langfuse_prompt makes this a linked generation
completion = openai.chat.completions.create(
    model=prompt.config.get("model", "gpt-4o-mini"),
    messages=messages,
    langfuse_prompt=prompt,
)

Recording which version was used with linked generations

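When you pass langfuse_prompt, the resulting observation (generation) is permanently linked to the prompt and the exact version that produced it. This link is what makes per-version measurement possible in the UI, such as listing the traces that used prompt v3 or comparing scores before and after the move from v2 to v3.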
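
The kind of per-version comparison this link enables can be sketched offline. The rows below are made-up illustrative records, not real Langfuse output, and the field names (`prompt_name`, `prompt_version`, `score`) are assumptions about what an export might look like.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical exported generations, each carrying the linked prompt version.
generations = [
    {"prompt_name": "chat-agent", "prompt_version": 2, "score": 0.5},
    {"prompt_name": "chat-agent", "prompt_version": 2, "score": 0.75},
    {"prompt_name": "chat-agent", "prompt_version": 3, "score": 0.75},
    {"prompt_name": "chat-agent", "prompt_version": 3, "score": 1.0},
    {"prompt_name": "rag-qa", "prompt_version": 1, "score": 0.25},
]


def mean_score_by_version(rows: list[dict], prompt_name: str) -> dict[int, float]:
    """Group scores by linked prompt version and average each group."""
    buckets: dict[int, list[float]] = defaultdict(list)
    for row in rows:
        if row["prompt_name"] == prompt_name:
            buckets[row["prompt_version"]].append(row["score"])
    return {version: mean(scores) for version, scores in sorted(buckets.items())}


by_version = mean_score_by_version(generations, "chat-agent")
```

Without the link, the grouping key would not exist and a score change could not be attributed to a specific prompt version.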

Client-side cache and TTL

Because the prompt API is called at high frequency, the SDK has a built-in client-side cache.

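Cache setting	Default	Recommended	Trade-off
cache_ttl_seconds	60 seconds	30 to 300 seconds	A shorter TTL picks up changes sooner but increases API load
fallback	None	Always set (string or messages)	Keeps the application alive during network outages
label	"production"	production/staging per environment	Promoting staging → production is the basic flow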

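prompt = langfuse.get_prompt(
    "chat-agent",
    type="chat",
    label="production",
    cache_ttl_seconds=120,
    # chat-agent is a chat-type prompt, so the fallback is a messages list
    fallback=[
        {"role": "system", "content": "You are a helpful agent."},
        {"role": "user", "content": "{{question}}"},
    ],
)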
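
The interplay of TTL and fallback can be illustrated with a small stand-alone cache. This is a sketch of the general technique under stated assumptions, not the SDK's internals: `TTLPromptCache` is a hypothetical class, it refreshes synchronously, and on a failed refresh it prefers a stale entry over the fallback.

```python
import time
from typing import Callable, Optional


class TTLPromptCache:
    """Cache fetched prompts for ttl_seconds; degrade gracefully when fetching fails."""

    def __init__(self, fetch: Callable[[str], str], ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic):
        self._fetch = fetch
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so the behaviour can be tested without sleeping
        self._entries: dict[str, tuple[float, str]] = {}

    def get(self, key: str, fallback: Optional[str] = None) -> str:
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1]  # fresh hit: no network call
        try:
            value = self._fetch(key)
        except Exception:
            if entry is not None:
                return entry[1]  # refresh failed: serve the stale entry
            if fallback is not None:
                return fallback  # nothing cached yet: use the hard-coded fallback
            raise
        self._entries[key] = (now, value)
        return value
```

The fallback only matters on a cold start with the server unreachable, which is exactly the case where an application would otherwise fail to boot.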

Label-based A/B tests and canary releases

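A/B testing works by creating several labels and routing between them in the application, for example "hash the user ID into one of 100 buckets and serve canary-a to the first X buckets."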

import hashlib

def choose_label(user_id: str) -> str:
    digest = hashlib.md5(user_id.encode()).hexdigest()
    bucket = int(digest[:4], 16) % 100
    return "canary-a" if bucket < 10 else "production"  # 10% of traffic

prompt = langfuse.get_prompt("chat-agent", label=choose_label(user_id))
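
Before relying on a hash-based split, it helps to check two properties: the assignment is sticky per user, and the canary share lands near the target. The sketch below restates `choose_label` with a hypothetical `canary_percent` parameter and measures the split over synthetic user IDs (the `user-N` IDs are made up for the check).

```python
import hashlib


def choose_label(user_id: str, canary_percent: int = 10) -> str:
    """Deterministically map a user ID to a label via a hash bucket in [0, 100)."""
    digest = hashlib.md5(user_id.encode()).hexdigest()
    bucket = int(digest[:4], 16) % 100
    return "canary-a" if bucket < canary_percent else "production"


# Synthetic IDs, used only to sanity-check the distribution.
user_ids = [f"user-{i}" for i in range(10_000)]
canary_share = sum(choose_label(u) == "canary-a" for u in user_ids) / len(user_ids)

# The same user always gets the same label, so their experience is consistent.
is_sticky = all(choose_label(u) == choose_label(u) for u in user_ids[:100])
```

Taking 16 bits modulo 100 is very slightly biased because 65,536 is not a multiple of 100, but the effect is far below the sampling noise of any realistic experiment.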

GitHub Actions and CI/CD integration

sequenceDiagram
  participant Dev as Prompt Engineer
  participant UI as Langfuse UI
  participant LF as Langfuse Server
  participant GH as GitHub Actions
  participant CH as Canary Pod
  Dev->>UI: Edit and save v5
  Dev->>UI: Attach 'canary' label to v5
  UI->>LF: Label update
  LF-->>GH: Webhook (repository_dispatch)
  GH->>GH: Start evaluation workflow<br/>(DatasetRun / promptfoo / eval test)
  alt Evaluation passed
    GH->>LF: POST: move 'production' label to v5
    LF->>CH: Prompt cache expires → v5 takes effect
  else Evaluation failed
    GH->>LF: Move 'canary' label back from v5 to v4
  end

A flow in which re-pointing a label fires a webhook, GitHub Actions runs the evaluation, and the result drives automatic promotion or rollback.
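
The promote-or-rollback branch of this flow reduces to a small pure function that a CI job could call after the evaluation finishes. This is a sketch under assumptions: `EvalResult`, `decide_label_moves`, and the `max_regression` tolerance are hypothetical, and actually applying the returned moves would go through whatever Langfuse API or SDK call the pipeline uses.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EvalResult:
    """Aggregate evaluation outcome for the candidate and the current production version."""
    candidate_score: float
    baseline_score: float


def decide_label_moves(candidate: int, previous: int, result: EvalResult,
                       max_regression: float = 0.02) -> list[tuple[str, int]]:
    """Return the (label, version) moves to apply after an evaluation run."""
    if result.candidate_score >= result.baseline_score - max_regression:
        # Evaluation passed: promote the candidate to production.
        return [("production", candidate)]
    # Evaluation failed: point the canary label back at the previous version.
    return [("canary", previous)]


promote = decide_label_moves(5, 4, EvalResult(candidate_score=0.91, baseline_score=0.90))
rollback = decide_label_moves(5, 4, EvalResult(candidate_score=0.80, baseline_score=0.90))
```

Keeping the decision separate from the API call makes the gate easy to unit-test and keeps the threshold visible in code review.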

Key points of the webhook payload

{
  "event": "prompt.version.label.updated",
  "projectId": "proj_xxx",
  "prompt": {
    "name": "chat-agent",
    "version": 5,
    "labels": ["canary"]
  },
  "actor": { "id": "u_42", "email": "dev@example.com" },
  "occurredAt": "2026-04-21T10:30:00Z"
}
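
A receiver for this event mostly needs to answer one question: did the label that triggers evaluation just land on a version? The sketch below assumes the payload has exactly the shape of the sample above; the field names are taken from that sample and should be verified against real deliveries. `LabelUpdate`, `parse_label_update`, and `should_run_evals` are hypothetical helpers.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class LabelUpdate:
    name: str
    version: int
    labels: tuple[str, ...]


def parse_label_update(raw: str) -> Optional[LabelUpdate]:
    """Parse a webhook body; return None for events this handler does not care about."""
    event = json.loads(raw)
    if event.get("event") != "prompt.version.label.updated":
        return None
    prompt = event["prompt"]
    return LabelUpdate(
        name=prompt["name"],
        version=int(prompt["version"]),
        labels=tuple(prompt.get("labels", [])),
    )


def should_run_evals(update: Optional[LabelUpdate], trigger_label: str = "canary") -> bool:
    """Only kick off the evaluation workflow when the trigger label is present."""
    return update is not None and trigger_label in update.labels


sample = json.dumps({
    "event": "prompt.version.label.updated",
    "projectId": "proj_xxx",
    "prompt": {"name": "chat-agent", "version": 5, "labels": ["canary"]},
})
update = parse_label_update(sample)
```

Filtering on the label keeps unrelated label changes, such as a manual staging move, from starting a promotion run.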

Preventing accidental overwrites with protected labels

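Important labels such as production can be marked as protected labels. While a label is protected, attempts by regular members to re-point it through the UI or API are rejected; moving it requires a deliberate action by a user with the admin (or owner) role.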

Round-tripping between the Playground and production

In the UI's Playground you can open an existing prompt, fill in its variables, run it, and save the result directly as a new version.

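Find a failing case in a production trace → choose "Open in Playground" from the trace details
Revise the prompt with the real input already filled in
Run it and compare scores and cost
If it looks good, use "Save as new version" to create v_n+1
Attach the staging label to v_n+1 → CI evaluation → promote to the production label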

MCP Server: AI agents editing prompts themselves

Langfuse also exposes an MCP (Model Context Protocol) server, which lets Claude Code and other AI agents work with the prompt management API through natural language.

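MCP tool	Purpose
`list_prompts`	List the prompts in a project
`get_prompt`	Fetch a prompt by version or label
`create_prompt`	Create a new version (can be derived from the existing text)
`update_label`	Re-point a label (not allowed for protected labels)
`list_traces`	Search failing traces to find gaps in a prompt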

Summary

The next chapter covers Score / Evaluation / Dataset / Experiment and looks at how to evaluate prompt changes with numbers.
