Add experiment submission skill

2026-03-26 02:23:12 +00:00 · 2026-03-25 00:42:28 +08:00
parent f18f6d82fc
commit e7a4afd6b5
2 changed files with 695 additions and 0 deletions
--- a/.cursor/skills/batch-submit-experiment/SKILL.md
+++ b/.cursor/skills/batch-submit-experiment/SKILL.md
@@ -0,0 +1,301 @@
 ---
 name: batch-submit-experiment
 description: Batch submit experiments (notebooks) to Uni-Lab platform — list workflows, generate node_params from registry schemas, submit multiple rounds. Use when the user wants to submit experiments, create notebooks, batch run workflows, or mentions 提交实验/批量实验/notebook/实验轮次.
 ---
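 # Batch Experiment Submission Guide
 Batch-submit experiments (notebooks) through the cloud API, with parameters configurable per round. A `node_params` template is generated automatically from the workflow template detail and the local device registry.
 ## Prerequisites (all required)
 Before using this guide you **must** confirm the items below. If any is missing, **ask the user for it immediately and stop**; continue only once it has been supplied. The one exception is the device registry (item 3), which may be skipped at the cost of filling in `param` by hand.
 ### 1. ak / sk → AUTH
 Ask the user for their startup parameters and take the values from `--ak` / `--sk` or from config.py.
 Generate the AUTH token (either method works):
 ```bash
 # Option 1: generate with a Python one-liner
 python -c "import base64,sys; print('Authorization: Lab ' + base64.b64encode(f'{sys.argv[1]}:{sys.argv[2]}'.encode()).decode())" <ak> <sk>
 # Option 2: compute by hand
 # base64(ak:sk) → Authorization: Lab <token>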
 ```
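When the agent is already working in Python, the same header can be built with a small helper. This is only a sketch; the function name `build_auth_header` is illustrative and not part of the skill's scripts.

```python
import base64


def build_auth_header(ak: str, sk: str) -> str:
    """Return the full header line: 'Authorization: Lab ' + base64(ak:sk)."""
    token = base64.b64encode(f"{ak}:{sk}".encode("utf-8")).decode("ascii")
    return f"Authorization: Lab {token}"


print(build_auth_header("ak", "sk"))  # Authorization: Lab YWs6c2s=
```

The returned string can be used directly as the value of `AUTH`.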
 ### 2. --addr → BASE URL
 | `--addr` value | BASE |
 |-------------|------|
 | `test` | `https://uni-lab.test.bohrium.com` |
 | `uat` | `https://uni-lab.uat.bohrium.com` |
 | `local` | `http://127.0.0.1:48197` |
 | omitted (default) | `https://uni-lab.bohrium.com` |
 Once confirmed, set:
 ```bash
 BASE="<URL determined by addr>"
 AUTH="Authorization: Lab <base64 token from step 1>"
 ```
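The table can also be expressed as a lookup. The sketch below assumes that only the four listed cases need handling; `resolve_base` and `ADDR_TO_BASE` are illustrative names, and whether `--addr` accepts other values is not stated here.

```python
# Mapping copied from the table above; None stands for "flag not passed".
ADDR_TO_BASE = {
    "test": "https://uni-lab.test.bohrium.com",
    "uat": "https://uni-lab.uat.bohrium.com",
    "local": "http://127.0.0.1:48197",
    None: "https://uni-lab.bohrium.com",
}


def resolve_base(addr=None):
    """Return the BASE URL for an --addr value, or raise on an unlisted one."""
    if addr not in ADDR_TO_BASE:
        raise ValueError(f"unknown --addr value: {addr!r}")
    return ADDR_TO_BASE[addr]
```

Raising on an unlisted value is a deliberate choice in this sketch: it forces the agent to ask the user rather than guess an endpoint.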
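 ### 3. req_device_registry_upload.json (device registry)
 **Batch submission needs the local registry to resolve the parameter schema of each workflow node.**
 Search in priority order:
 ```
 <workspace root>/unilabos_data/req_device_registry_upload.json
 <workspace root>/req_device_registry_upload.json
 ```
 Alternatively, glob for it directly: `**/req_device_registry_upload.json`
 Once found, **check the file's modification time** and report it to the user. If it is more than 1 day old, ask whether `unilab` should be restarted to regenerate it.
 **If the file does not exist**, tell the user to run the `unilab` startup command first and retry once the registry has been generated. This step can be skipped, but then the parameter template cannot be generated automatically and the user must fill in `param` by hand.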
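The 1-day freshness check can be sketched as follows. The helper names are hypothetical, and the threshold is simply the 1 day mentioned above.

```python
import os
import time

ONE_DAY_SECONDS = 24 * 60 * 60


def registry_age_seconds(path, now=None):
    """Seconds elapsed since the registry file was last modified."""
    now = time.time() if now is None else now
    return now - os.path.getmtime(path)


def is_stale(path, now=None, max_age=ONE_DAY_SECONDS):
    """True when the file is older than max_age (1 day by default)."""
    return registry_age_seconds(path, now) > max_age
```

The `now` parameter exists only so the check can be exercised with a fixed clock; in normal use it is left unset.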
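 ### 4. workflow_uuid (target workflow)
 The user must supply the UUID of the workflow to submit. If they are unsure, list the available workflows through API #2 and let them choose.
 **Start only once all four items are in place.**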
 ## Session State
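 Throughout the conversation the agent must remember the following state so that it does not ask the user again:
 - `lab_uuid`: the lab UUID (fetched automatically on first use via API #1; **do not ask the user**)
 - `workflow_uuid`: the workflow UUID (supplied by the user or chosen from the list)
 - `workflow_nodes`: the uuid, device ID, and action name of each action node in the workflow (from API #3)
 ## Request Conventions
 All requests use `curl -s`; POST requests also need `Content-Type: application/json`.
 > **On Windows** you must use `curl.exe` (not PowerShell's `curl` alias); every `curl` in the examples means `curl.exe`.
 >
 > **Passing JSON in PowerShell**: in PowerShell, `-d '{"key":"value"}'` fails because of quote escaping. Write the JSON to a temporary file and pass `-d '@tmp_body.json'` (wrap the `@` in single quotes, otherwise it is parsed as the splatting operator).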
 ---
 ## API Endpoints
 ### 1. Get lab info (fetches lab_uuid automatically)
 ```bash
 curl -s -X GET "$BASE/api/v1/edge/lab/info" -H "$AUTH"
 ```
 Returns:
 ```json
 {"code": 0, "data": {"uuid": "xxx", "name": "lab name"}}
 ```
 Store `data.uuid` as `lab_uuid`.
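A minimal sketch of reading `lab_uuid` out of the response text. It assumes that a `code` other than 0 means failure, which this guide does not state explicitly; `extract_lab_uuid` is an illustrative name.

```python
import json


def extract_lab_uuid(raw_response: str) -> str:
    """Pull data.uuid out of the lab info response text."""
    payload = json.loads(raw_response)
    # Assumption: a non-zero code signals failure; the guide only shows code 0.
    if payload.get("code") != 0:
        raise RuntimeError(f"lab info request failed: {payload}")
    uuid_value = (payload.get("data") or {}).get("uuid")
    if not uuid_value:
        raise RuntimeError("response has no data.uuid")
    return uuid_value
```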
 ### 2. List available workflows
 ```bash
 curl -s -X GET "$BASE/api/v1/lab/workflow/workflows?page=1&page_size=20&lab_uuid=$lab_uuid" -H "$AUTH"
 ```
 Returns the workflow list. Show each workflow's `uuid` and `name` so the user can choose.
 ### 3. Get workflow template detail
 ```bash
 curl -s -X GET "$BASE/api/v1/lab/workflow/template/detail/$workflow_uuid" -H "$AUTH"
 ```
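 Returns the full workflow structure, including every action node. Extract from the response:
 - the `node_uuid` of each action node
 - the device ID of each node (`resource_template_name`)
 - the action name of each node (`node_template_name`)
 - the existing parameters of each node (`param`)
 > **Note**: the response format of this API may vary between versions. On the first call, print the full response and inspect its structure before extracting node information. The node list is commonly found at `data.nodes[]` or `data.workflow_nodes[]`.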
 ### 4. Submit an experiment (create a notebook)
 ```bash
 curl -s -X POST "$BASE/api/v1/lab/notebook" \
  -H "$AUTH" -H "Content-Type: application/json" \
  -d '<request_body>'
 ```
 Request body structure:
 ```json
 {
    "lab_uuid": "<lab_uuid>",
    "workflow_uuid": "<workflow_uuid>",
    "name": "<experiment name>",
    "node_params": [
        {
            "sample_uuids": ["<sample UUID 1>", "<sample UUID 2>"],
            "datas": [
                {
                    "node_uuid": "<node UUID in the workflow>",
                    "param": {},
                    "sample_params": [
                        {
                            "container_uuid": "<container UUID>",
                            "sample_value": {
                                "liquid_names": "<liquid name>",
                                "volumes": 1000
                            }
                        }
                    ]
                }
            ]
        }
    ]
 }
 ```
 > **Note**: `sample_uuids` must be an **array of UUIDs** (`[]uuid.UUID`), not a single string. Pass an empty array `[]` when there are no samples.
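A client-side check can catch the string-instead-of-array mistake before the request is sent. The sketch below uses Python's standard `uuid` module; `check_sample_uuids` is a hypothetical helper, and the server may well be stricter than `uuid.UUID` about accepted spellings.

```python
import uuid


def check_sample_uuids(value):
    """Return a list of problems; an empty list means the value is acceptable."""
    if not isinstance(value, list):
        return [f"sample_uuids must be a list, got {type(value).__name__}"]
    problems = []
    for i, item in enumerate(value):
        try:
            uuid.UUID(str(item))
        except ValueError:
            problems.append(f"sample_uuids[{i}] is not a valid UUID: {item!r}")
    return problems
```

An empty list passes, matching the rule that rounds without samples send `[]`.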
 ---
 ## Notebook Request Body in Detail
 ### node_params structure
 `node_params` is an array in which **each element represents one round of the experiment**:
 - 2 rounds → `node_params` has 2 elements
 - N rounds → `node_params` has N elements
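One way to produce N rounds is to deep-copy a single round's skeleton, as sketched below. `expand_rounds` is an illustrative name and `speed` is a made-up parameter used only to show that the copies are independent.

```python
import copy


def expand_rounds(round_template: dict, rounds: int) -> list:
    """Return `rounds` independent deep copies of one round's skeleton."""
    return [copy.deepcopy(round_template) for _ in range(rounds)]


one_round = {
    "sample_uuids": [],
    "datas": [{"node_uuid": "<node uuid>", "param": {}, "sample_params": []}],
}
node_params = expand_rounds(one_round, 3)
# "speed" is hypothetical; editing one round must not leak into the others.
node_params[0]["datas"][0]["param"]["speed"] = 1
print(len(node_params), node_params[1]["datas"][0]["param"])  # 3 {}
```

A deep copy matters here because each round contains nested lists and dicts; a shallow copy would share them between rounds.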
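 ### Fields of each round
 | Field | Type | Description |
 |------|------|------|
 | `sample_uuids` | array\<uuid\> | Sample UUIDs for this round; pass `[]` when there are no samples |
 | `datas` | array | Parameter configuration for each workflow node in this round |
 ### Each node in datas
 | Field | Type | Description |
 |------|------|------|
 | `node_uuid` | string | Node UUID from the workflow template (from API #3) |
 | `param` | object | Action parameters (filled in according to the local registry schema) |
 | `sample_params` | array | Sample-related parameters (liquid name, volume, etc.) |
 ### Each entry in sample_params
 | Field | Type | Description |
 |------|------|------|
 | `container_uuid` | string | Container UUID |
 | `sample_value` | object | Sample value, e.g. `{"liquid_names": "water", "volumes": 1000}` |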
 ---
 ## Generating the param Template from the Local Registry
 ### Automatic: run the script
 ```bash
 python scripts/gen_notebook_params.py \
  --auth <token> \
  --base <BASE_URL> \
  --workflow-uuid <workflow_uuid> \
  [--registry <path/to/req_device_registry_upload.json>] \
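  [--rounds <number of rounds>] \
  [--output <output file path>]
 ```
 > The script is at `scripts/gen_notebook_params.py`, next to this document.
 The script will:
 1. Call the workflow detail API to fetch all action nodes
 2. Read the local registry and look up the matching action schema for each node
 3. Generate `notebook_template.json`, containing:
   - the full `node_params` skeleton
   - the param fields of each node, with type hints
   - `_schema_info` helper information (for reference only; not submitted)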
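Before submitting a generated template, the `_schema_info` helper key has to be removed and no `$TODO` placeholder should remain. A possible pre-submit check is sketched below; both function names are hypothetical, and the sketch assumes the helper key starts with `_schema_info` and that placeholders contain the text `$TODO`.

```python
def strip_schema_info(template: dict) -> dict:
    """Drop every top-level key starting with '_schema_info'."""
    return {k: v for k, v in template.items() if not k.startswith("_schema_info")}


def find_todos(value, path="$"):
    """Yield the JSON paths of string values that still contain '$TODO'."""
    if isinstance(value, dict):
        for key, child in value.items():
            yield from find_todos(child, f"{path}.{key}")
    elif isinstance(value, list):
        for i, child in enumerate(value):
            yield from find_todos(child, f"{path}[{i}]")
    elif isinstance(value, str) and "$TODO" in value:
        yield path
```

If `list(find_todos(body))` is non-empty, the listed paths are the fields the user still needs to fill in.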
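 ### Manual
 If the script is unavailable or the registry does not exist (in the latter case skip steps 3 and 4 and have the user supply `param` directly):
 1. Call API #3 to get the workflow detail
 2. Find the `node_uuid` of each action node
 3. Look up the device's `action_value_mappings` in the local registry: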
   ```
   resources[].id == <device_id>
   → resources[].class.action_value_mappings.<action_name>.schema.properties.goal.properties
   ```
 4. Use the schema's properties as the field template for `param`
 5. Copy the `node_params` element once per round and have the user fill in the concrete values for each round
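The registry lookup in step 3 can be sketched as a short function over the loaded JSON. `lookup_goal_properties` is an illustrative name, not a function from the script.

```python
def lookup_goal_properties(registry: dict, device_id: str, action_name: str) -> dict:
    """Follow resources[].class.action_value_mappings.<action>.schema.properties.goal.properties."""
    for resource in registry.get("resources", []):
        if resource.get("id") != device_id:
            continue
        mappings = resource.get("class", {}).get("action_value_mappings", {})
        action = mappings.get(action_name, {})
        goal = action.get("schema", {}).get("properties", {}).get("goal", {})
        return goal.get("properties", {})
    # Unknown device: return an empty mapping so the caller can fall back to manual input.
    return {}
```

An empty result means either the device or the action was not found, in which case `param` has to come from the user.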
 ### Registry structure reference
 ```json
 {
  "resources": [
    {
      "id": "liquid_handler.prcxi",
      "class": {
        "module": "unilabos.devices.xxx:ClassName",
        "action_value_mappings": {
          "transfer_liquid": {
            "type": "LiquidHandlerTransfer",
            "schema": {
              "properties": {
                "goal": {
                  "properties": {
                    "asp_vols": {"type": "array", "items": {"type": "number"}},
                    "sources": {"type": "array"}
                  },
                  "required": ["asp_vols", "sources"]
                }
              }
            },
            "goal_default": {}
          }
        }
      }
    }
  ]
 }
 ```
 When filling in `param`, use the field names and types from `goal.properties`.
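The `required` list in the goal schema can be used to spot fields the user has not yet supplied. This is a minimal sketch with a hypothetical helper name; it checks presence only, not types.

```python
def missing_required(param: dict, goal_schema: dict) -> list:
    """List required goal fields that are absent from param, in schema order."""
    return [name for name in goal_schema.get("required", []) if name not in param]


goal = {
    "properties": {"asp_vols": {"type": "array"}, "sources": {"type": "array"}},
    "required": ["asp_vols", "sources"],
}
print(missing_required({"asp_vols": [100.0]}, goal))  # ['sources']
```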
 ---
 ## Full Workflow Checklist
 ```
 Task Progress:
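 - [ ] Step 1: Confirm ak/sk → generate the AUTH token
 - [ ] Step 2: Confirm --addr → set the BASE URL
 - [ ] Step 3: GET /edge/lab/info → obtain lab_uuid
 - [ ] Step 4: Confirm workflow_uuid (supplied by the user or chosen from the GET #2 list)
 - [ ] Step 5: GET workflow detail (#3) → extract each node's uuid, device ID, and action name
 - [ ] Step 6: Locate the local registry req_device_registry_upload.json
 - [ ] Step 7: Run gen_notebook_params.py or match by hand → generate the node_params template
 - [ ] Step 8: Guide the user through filling in each round's parameters (sample_uuids, param, sample_params)
 - [ ] Step 9: Build the full request body → submit with POST /lab/notebook
 - [ ] Step 10: Check the response and confirm the submission succeeded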
 ```
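For Step 9, the request can also be prepared with Python's standard library instead of curl. The sketch below only builds the request object and does not send it; `build_notebook_request` is an illustrative name, and `token` is the base64 value without the `Lab ` prefix.

```python
import json
from urllib.request import Request


def build_notebook_request(base: str, token: str, body: dict) -> Request:
    """Prepare (but do not send) the POST /api/v1/lab/notebook request."""
    data = json.dumps(body, ensure_ascii=False).encode("utf-8")
    return Request(
        f"{base.rstrip('/')}/api/v1/lab/notebook",
        data=data,
        method="POST",
        headers={
            "Authorization": f"Lab {token}",
            "Content-Type": "application/json",
        },
    )
```

Sending is then a matter of passing the result to `urllib.request.urlopen(req, timeout=30)` and checking the returned JSON as in Step 10.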
 ---
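 ## FAQ
 ### Q: The workflow has several nodes. Do I fill in parameters for every node in every round?
 Yes. The `datas` array must contain the parameters of every workflow node involved in that round. Normally each action node needs one `datas` entry.
 ### Q: Are the parameters completely different between rounds?
 Usually `param` (the device action parameters) is the same or similar across rounds, while `sample_uuids` and `sample_params` (the sample information) differ per round. When generating the template the script copies the skeleton once per round, so the user only edits what differs.
 ### Q: How do I obtain sample_uuids and container_uuid?
 These UUIDs usually come from the lab's sample management system. Ask the user, or look them up in the resource tree (API `GET /lab/material/download/$lab_uuid`).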
--- a/.cursor/skills/batch-submit-experiment/scripts/gen_notebook_params.py
+++ b/.cursor/skills/batch-submit-experiment/scripts/gen_notebook_params.py
@@ -0,0 +1,394 @@
 #!/usr/bin/env python3
 """
 Generate a node_params template for notebook submission from the workflow template detail plus the local device registry.
 Usage:
  python gen_notebook_params.py --auth <token> --base <url> --workflow-uuid <uuid> [options]
 Options:
  --auth <token>          Lab token (the result of base64(ak:sk), without the "Lab " prefix)
  --base <url>            API base URL (e.g. https://uni-lab.test.bohrium.com)
  --workflow-uuid <uuid>  UUID of the target workflow
  --registry <path>       Path to the local registry file (auto-detected by default)
  --rounds <n>            Number of rounds (default 1)
  --output <path>         Output template file path (default notebook_template.json)
  --dump-response         Print the raw response of the workflow detail API (for debugging)
 Example:
  python gen_notebook_params.py \\
    --auth YTFmZDlkNGUtxxxx \\
    --base https://uni-lab.test.bohrium.com \\
    --workflow-uuid abc-123-def \\
    --rounds 2
 """
 import copy
 import json
 import os
 import sys
 from datetime import datetime
 from urllib.request import Request, urlopen
 from urllib.error import HTTPError, URLError
 REGISTRY_FILENAME = "req_device_registry_upload.json"
 def find_registry(explicit_path=None):
    """Locate the local registry file; same logic as extract_device_actions.py"""
    if explicit_path:
        if os.path.isfile(explicit_path):
            return explicit_path
        if os.path.isdir(explicit_path):
            fp = os.path.join(explicit_path, REGISTRY_FILENAME)
            if os.path.isfile(fp):
                return fp
        print(f"Warning: the specified registry path does not exist: {explicit_path}")
        return None
    candidates = [
        os.path.join("unilabos_data", REGISTRY_FILENAME),
        REGISTRY_FILENAME,
    ]
    for c in candidates:
        if os.path.isfile(c):
            return c
    script_dir = os.path.dirname(os.path.abspath(__file__))
    workspace_root = os.path.normpath(os.path.join(script_dir, "..", "..", ".."))
    for c in candidates:
        path = os.path.join(workspace_root, c)
        if os.path.isfile(path):
            return path
    cwd = os.getcwd()
    for _ in range(5):
        parent = os.path.dirname(cwd)
        if parent == cwd:
            break
        cwd = parent
        for c in candidates:
            path = os.path.join(cwd, c)
            if os.path.isfile(path):
                return path
    return None
 def load_registry(path):
    with open(path, "r", encoding="utf-8") as f:
        return json.load(f)
 def build_registry_index(registry_data):
    """Build an index of device_id → action_value_mappings"""
    index = {}
    for res in registry_data.get("resources", []):
        rid = res.get("id", "")
        avm = res.get("class", {}).get("action_value_mappings", {})
        if rid and avm:
            index[rid] = avm
    return index
 def flatten_goal_schema(action_data):
    """Extract the goal-level schema from an action_value_mappings entry"""
    schema = action_data.get("schema", {})
    goal_schema = schema.get("properties", {}).get("goal", {})
    return goal_schema if goal_schema else schema
 def build_param_template(goal_schema):
    """Build a param template from the goal schema, with type hints"""
    properties = goal_schema.get("properties", {})
    required = set(goal_schema.get("required", []))
    template = {}
    for field_name, field_def in properties.items():
        if field_name == "unilabos_device_id":
            continue
        ftype = field_def.get("type", "any")
        default = field_def.get("default")
        if default is not None:
            template[field_name] = default
        elif ftype == "string":
            template[field_name] = f"$TODO ({ftype}, {'required' if field_name in required else 'optional'})"
        elif ftype == "number" or ftype == "integer":
            template[field_name] = 0
        elif ftype == "boolean":
            template[field_name] = False
        elif ftype == "array":
            template[field_name] = []
        elif ftype == "object":
            template[field_name] = {}
        else:
            template[field_name] = f"$TODO ({ftype})"
    return template
 def fetch_workflow_detail(base_url, auth_token, workflow_uuid):
    """Call the workflow detail API"""
    url = f"{base_url}/api/v1/lab/workflow/template/detail/{workflow_uuid}"
    req = Request(url, method="GET")
    req.add_header("Authorization", f"Lab {auth_token}")
    try:
        with urlopen(req, timeout=30) as resp:
            return json.loads(resp.read().decode("utf-8"))
    except HTTPError as e:
        body = e.read().decode("utf-8", errors="replace")
        print(f"API 错误 {e.code}: {body}")
        return None
    except URLError as e:
        print(f"网络错误: {e.reason}")
        return None
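 def extract_nodes_from_response(response):
    """
    Extract the list of action nodes from a workflow detail response.
    Tolerates several possible response formats.
    Returns: [(node_uuid, resource_template_name, node_template_name, existing_param), ...]
    """
    data = response.get("data") or response  # fall back when "data" is missing or null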
    search_keys = ["nodes", "workflow_nodes", "node_list", "steps"]
    nodes_raw = None
    for key in search_keys:
        if key in data and isinstance(data[key], list):
            nodes_raw = data[key]
            break
    if nodes_raw is None:
        if isinstance(data, list):
            nodes_raw = data
        else:
            for v in data.values():
                if isinstance(v, list) and len(v) > 0 and isinstance(v[0], dict):
                    nodes_raw = v
                    break
    if not nodes_raw:
        print("Warning: could not extract a node list from the response")
        print("Top-level response keys:", list(data.keys()) if isinstance(data, dict) else type(data).__name__)
        return []
    result = []
    for node in nodes_raw:
        if not isinstance(node, dict):
            continue
        node_uuid = (
            node.get("uuid")
            or node.get("node_uuid")
            or node.get("id")
            or ""
        )
        resource_name = (
            node.get("resource_template_name")
            or node.get("device_id")
            or node.get("resource_name")
            or node.get("device_name")
            or ""
        )
        template_name = (
            node.get("node_template_name")
            or node.get("action_name")
            or node.get("template_name")
            or node.get("action")
            or node.get("name")
            or ""
        )
        existing_param = node.get("param", {}) or {}
        if node_uuid:
            result.append((node_uuid, resource_name, template_name, existing_param))
    return result
 def generate_template(nodes, registry_index, rounds):
    """Build the notebook submission template"""
    node_params = []
    schema_info = {}
    datas_template = []
    for node_uuid, resource_name, template_name, existing_param in nodes:
        param_template = {}
        matched = False
        if resource_name and template_name and resource_name in registry_index:
            avm = registry_index[resource_name]
            if template_name in avm:
                goal_schema = flatten_goal_schema(avm[template_name])
                param_template = build_param_template(goal_schema)
                goal_default = avm[template_name].get("goal_default", {})
                if goal_default:
                    for k, v in goal_default.items():
                        if k in param_template and v is not None:
                            param_template[k] = v
                matched = True
                schema_info[node_uuid] = {
                    "device_id": resource_name,
                    "action_name": template_name,
                    "action_type": avm[template_name].get("type", ""),
                    "schema_properties": list(goal_schema.get("properties", {}).keys()),
                    "required": goal_schema.get("required", []),
                }
        if not matched and existing_param:
            param_template = existing_param
        if not matched and not existing_param:
            schema_info[node_uuid] = {
                "device_id": resource_name,
                "action_name": template_name,
                "warning": "No matching action schema found in the local registry",
            }
        datas_template.append({
            "node_uuid": node_uuid,
            "param": param_template,
            "sample_params": [
                {
                    "container_uuid": "$TODO_CONTAINER_UUID",
                    "sample_value": {
                        "liquid_names": "$TODO_LIQUID_NAME",
                        "volumes": 0,
                    },
                }
            ],
        })
    for i in range(rounds):
        node_params.append({
            "sample_uuids": [f"$TODO_SAMPLE_UUID_ROUND_{i + 1}"],  # must be an array of UUIDs, not a string
            "datas": copy.deepcopy(datas_template),
        })
    return {
        "lab_uuid": "$TODO_LAB_UUID",
        "workflow_uuid": "$TODO_WORKFLOW_UUID",
        "name": "$TODO_EXPERIMENT_NAME",
        "node_params": node_params,
        "_schema_info (reference only; delete before submitting)": schema_info,
    }
 def parse_args(argv):
    """Minimal argument parsing"""
    opts = {
        "auth": None,
        "base": None,
        "workflow_uuid": None,
        "registry": None,
        "rounds": 1,
        "output": "notebook_template.json",
        "dump_response": False,
    }
    i = 0
    while i < len(argv):
        arg = argv[i]
        if arg == "--auth" and i + 1 < len(argv):
            opts["auth"] = argv[i + 1]
            i += 2
        elif arg == "--base" and i + 1 < len(argv):
            opts["base"] = argv[i + 1].rstrip("/")
            i += 2
        elif arg == "--workflow-uuid" and i + 1 < len(argv):
            opts["workflow_uuid"] = argv[i + 1]
            i += 2
        elif arg == "--registry" and i + 1 < len(argv):
            opts["registry"] = argv[i + 1]
            i += 2
        elif arg == "--rounds" and i + 1 < len(argv):
            opts["rounds"] = int(argv[i + 1])
            i += 2
        elif arg == "--output" and i + 1 < len(argv):
            opts["output"] = argv[i + 1]
            i += 2
        elif arg == "--dump-response":
            opts["dump_response"] = True
            i += 1
        else:
            print(f"Unknown argument: {arg}")
            i += 1
    return opts
 def main():
    opts = parse_args(sys.argv[1:])
    if not opts["auth"] or not opts["base"] or not opts["workflow_uuid"]:
        print("Usage:")
        print("  python gen_notebook_params.py --auth <token> --base <url> --workflow-uuid <uuid> [options]")
        print()
        print("Required arguments:")
        print("  --auth <token>          Lab token (base64(ak:sk))")
        print("  --base <url>            API base URL")
        print("  --workflow-uuid <uuid>  Target workflow UUID")
        print()
        print("Optional arguments:")
        print("  --registry <path>       Registry file path (auto-detected by default)")
        print("  --rounds <n>            Number of rounds (default 1)")
        print("  --output <path>         Output file path (default notebook_template.json)")
        print("  --dump-response         Print the raw API response")
        sys.exit(1)
    # 1. Locate and load the local registry
    registry_path = find_registry(opts["registry"])
    registry_index = {}
    if registry_path:
        mtime = os.path.getmtime(registry_path)
        gen_time = datetime.fromtimestamp(mtime).strftime("%Y-%m-%d %H:%M:%S")
        print(f"Registry: {registry_path}  (generated: {gen_time})")
        registry_data = load_registry(registry_path)
        registry_index = build_registry_index(registry_data)
        print(f"Indexed action schemas for {len(registry_index)} devices")
    else:
        print("Warning: local registry not found; param template generation will be skipped")
        print("  The param field of each node must be filled in by hand before submitting")
    # 2. Fetch the workflow detail
    print(f"\nFetching workflow detail: {opts['workflow_uuid']}")
    response = fetch_workflow_detail(opts["base"], opts["auth"], opts["workflow_uuid"])
    if not response:
        print("Error: could not fetch the workflow detail")
        sys.exit(1)
    if opts["dump_response"]:
        print("\n=== Raw API response ===")
        print(json.dumps(response, indent=2, ensure_ascii=False)[:5000])
        print("=== End of response (truncated to 5000 characters) ===\n")
    # 3. Extract nodes
    nodes = extract_nodes_from_response(response)
    if not nodes:
        print("Error: could not extract any action nodes from the workflow")
        print("Use --dump-response to inspect the raw response structure")
        sys.exit(1)
    print(f"\nFound {len(nodes)} action nodes:")
    print(f"  {'Node UUID':<40} {'Device ID':<30} {'Action':<25} {'Schema'}")
    print("  " + "-" * 110)
    for node_uuid, resource_name, template_name, _ in nodes:
        matched = "✓" if (resource_name in registry_index and
                          template_name in registry_index.get(resource_name, {})) else "✗"
        print(f"  {node_uuid:<40} {resource_name:<30} {template_name:<25} {matched}")
    # 4. Generate the template
    template = generate_template(nodes, registry_index, opts["rounds"])
    template["workflow_uuid"] = opts["workflow_uuid"]
    output_path = opts["output"]
    with open(output_path, "w", encoding="utf-8") as f:
        json.dump(template, f, indent=2, ensure_ascii=False)
    print(f"\nTemplate written to: {output_path}")
    print(f"  Rounds: {opts['rounds']}")
    print(f"  Nodes per round: {len(nodes)}")
    print()
    print("Next steps:")
    print("  1. Open the template file and replace the $TODO placeholders with real values")
    print("  2. Delete the _schema_info field (reference only)")
    print("  3. Submit with POST /api/v1/lab/notebook")
 if __name__ == "__main__":
    main()