v0.2 · early alpha · MCP record/replay in dev

Your prompt lifecycle,
under version control.

Specs in Git. Signed releases. Test harnesses for JUnit, pytest, and Vitest. No new SaaS, no new auth — promptLM lives in the Git infra your team already trusts.

Get started View on GitHub

$ curl -fsSL https://raw.githubusercontent.com/promptLM/promptlm-app/main/scripts/install.sh | bash

promptlm.yaml · v1.0.0 → v1.1.0 +2 −2 · just now

1 id: customer_support

2 - version: 1.0.0

2 + version: 1.1.0

3 request:

4 parameters:

5 - temperature: 0.6

5 + temperature: 0.3

03 — capabilities

Spec to release to replay —
the prompt lifecycle, one tool deep.

PROMPTS AS SPEC

Define a prompt. Version it. Release it.

Every prompt is a spec file — vendor, model, parameters, messages. Versioned in Git. Released as a signed JAR your app pins.

# promptlm.yaml
id: customer_support
group: support
version: 1.0.0
request:
  vendor: openai
  model: gpt-4o
  parameters: { temperature: 0.6, maxTokens: 256 }
  messages:
    - role: system
      content: You are a support assistant.
    - role: user
      content: "Summarize: {{ticket}}"

STUDIO CATALOG

Every prompt. Every release. One catalog.

Browse, search, and filter the inventory. See status, executions, and latency at a glance. Syncs from Git on push.

promptLM studio — prompt catalog with version, model, status, and execution counts

IN YOUR APP

Pin a version. Load it. Ship it.

Released prompts ship as a JAR your app depends on. Load by id, fill placeholders, send. Vendor, model, and parameters travel with the spec — your code never drifts from the prompt.

promptLM studio — prompt detail showing version 1.8.0, model, parameters, and dev execution metrics

// SupportService.java
Prompt prompt = PromptLoader.load("customer_support");  // pinned v1.0.0 in pom.xml
String filled = prompt.fill(Map.of("ticket", ticket));

// vendor, model, parameters come from the spec — not your code
String reply = llm.send(filled, prompt.model(), prompt.parameters());

TEST SUPPORT

Test against
recorded responses.

Each released prompt ships with golden responses captured during dev. Your tests replay them locally via WireMock — no real LLM calls, no tokens billed, deterministic in CI.

Golden responses bundled with the prompt JAR
WireMock stubs auto-generated — no manual fixtures
@InjectResponse for equality assertions
Zero tokens. Zero flake. Same bytes, every run.

// SupportServiceTest.java
@EnablePromptWireMock // boots WireMock on a local port,
class SupportServiceTest {              // pre-stubbed from your prompt repo

  @Test
  void summarizesTicket(@InjectResponse(id = "customer_support") String recorded) {
    String reply = supportService.summarize(ticket); // your prod code POSTs to the LLM
    // The LLM URL points at WireMock, so the recorded reply is returned.
    // No real model call, no tokens billed, deterministic across runs.
    assertThat(reply).isEqualTo(recorded);
  }
}

# test_support_service.py
from promptlm.testing import enable_prompt_mock, inject_response

@enable_prompt_mock                                  # boots a mock LLM on a local port,
@inject_response("customer_support")                   # pre-stubbed from your prompt repo
def test_summarizes_ticket(recorded: str):
    reply = support_service.summarize(ticket)        # your prod code POSTs to the LLM
    # The LLM URL points at the mock, so the recorded reply is returned.
    # No real model call, no tokens billed, deterministic across runs.
    assert reply == recorded

// supportService.test.ts
import { enablePromptMock, injectResponse } from '@promptlm/testing';

enablePromptMock();                                  // boots a mock LLM on a local port,

test('summarizes ticket', () => {                    // pre-stubbed from your prompt repo
  const recorded = injectResponse('customer_support');
  const reply    = supportService.summarize(ticket); // your prod code POSTs to the LLM
  // The LLM URL points at the mock, so the recorded reply is returned.
  // No real model call, no tokens billed, deterministic across runs.
  expect(reply).toBe(recorded);
});

COMING SOON ROADMAP

MCP record/replay test harness.

Capture every tool call against your real MCP servers, freeze the bytes, and replay them deterministically in CI. No flaky network, no vendor rate limits, no surprise charges. Same story as prompt replay — extended to the tools your agent actually uses.

GET STARTED

Ship prompts
like code.

View on GitHub Read the docs

Your prompt lifecycle, under version control.

Spec to release to replay — the prompt lifecycle, one tool deep.