Fugue Labs · est. 2026 · independent

Agent runtimes,
written in Go.

We build the infrastructure that lets LLM agents run reliably: durable execution, structured output, multi-provider streaming, and guardrails, in a single binary with zero core dependencies.


01

Thesis · runtime, not API

Production AI does not fail because the model is too small. It fails because the runtime around the model is an afterthought.

Most agent frameworks treat durable execution, type-safe tools, and strict structured output as "enterprise features": bolted on later, if at all. We treat them as load-bearing.

We write in Go. We ship single binaries. Our agents run for days, crash, resume where they left off, and tell you exactly what they spent doing it.

The memory layer is still research. The runtime is already in production.


02

Gollem · the agent framework

docs ↗ github ↗
gollem · stable · docs ↗ · Go 1.22+ · Apache 2.0

Production-grade agent framework for Go. Type-safe agents with generics. Strict structured output validated at the schema boundary. Multi-provider streaming (Anthropic, OpenAI, Vertex AI). Native Temporal integration for durable multi-day runs. Multi-agent orchestration. Guardrails. MCP client and server. Zero core dependencies. Full reference, runnable examples, and a live agent-builder playground at gollem.fugue-labs.ai.

  • core: typed agents, structured output, tool calls, streaming
  • ext/temporal: durable execution across workflow restarts
  • ext/codetool: full coding-agent toolset (edit/grep/bash/LSP)
  • ext/team: multi-agent orchestration with handoff filters
  • ext/mcp: MCP client, server, SSE, sampling bridge
  • ext/deep: long-running agents with checkpointed context
  • ext/monty: Python execution via WASM, no CGO
  • ext/agui: stream agent UI events over SSE
gollem.nvim/internal/sidecar/server.go
// From gollem.nvim/internal/sidecar: a coding agent with the full codetool
// toolset, human-in-the-loop approval, tool events bussed to the editor.
agent := core.NewAgent[string](model, append(
    codetool.AgentOptions(cwd),                  // edit, grep, bash, LSP, read, write
    core.WithToolApproval[string](approvalFn),
    core.WithEventBus[string](bus),              // tool events → editor UI
    core.WithRunCondition[string](core.MaxRunDuration(24 * time.Hour)),
)...)

stream, _ := agent.RunStream(ctx, "remove deprecated call sites")
defer stream.Close()

// Go 1.23+ iterators. Debounced text deltas repaint the assistant pane.
for text, err := range streamutil.StreamTextDebounced(stream, 50*time.Millisecond) {
    if err != nil { return err }
    editor.SetAssistantPane(text)
}
// From brainrot-detection: typed structured output, validated at the schema boundary.
type Classification struct {
    IsBrainrot bool   `json:"is_brainrot"`
    Confidence string `json:"confidence" jsonschema:"enum=high|medium|low"`
    Reason     string `json:"reason"`
}

agent := core.NewAgent[Classification](provider,
    core.WithSystemPrompt[Classification](
        "You are a parental content filter for a 6-year-old..."),
)

res, _ := agent.Run(ctx, "Title: Skibidi Toilet Ep. 73\nChannel: ...")
if res.Output.IsBrainrot {
    sonos.PlayWarning()          // +30% volume over the soundbar
    time.Sleep(10 * time.Second)
    if stillBrainrot() { tv.PowerOff() }  // LG WebOS → off
}
// Streaming: one unified iterator across Anthropic / OpenAI / Vertex.
stream, _ := agent.RunStream(ctx, "Write a haiku about goroutines.")

for ev, err := range stream.StreamEvents() {
    if err != nil { break }      // a stream error ends the run; don't spin
    d, ok := ev.(core.PartDeltaEvent)
    if !ok { continue }
    switch x := d.Delta.(type) {
    case core.TextPartDelta:
        fmt.Print(x.ContentDelta)
    case core.ToolCallPartDelta:
        log.Printf("args=%s", x.ArgsJSONDelta)
    }
}
nvim → :Gollem "remove deprecated call sites"
sidecar │ session     sess-9f2c  model=claude-sonnet-4-6
sidecar │ tool_call   grep("Deprecated:")                42ms
sidecar │ tool_result 37 matches across 9 files          118ms
sidecar │ tool_call   edit(src/api.go:142)                89ms
sidecar │ approval    bash("go vet ./...") → ok            1.2s
sidecar │ stream      "removing 3 call sites in api.go..."
sidecar │ usage       in=3412  out=1208  cost=$0.0287

03

Sleepy · your code evolves while you don't

github ↗
sleepy · design-partner beta · BYO LLM · MCP-native

LLM-guided evolution, hosted as an MCP server. Point it at anything with a fitness signal: a hot function and a benchmark, a system prompt and an eval set, an agent topology and a task suite, an infrastructure config and a staging score. Sleepy runs populations, selection, crossover, and convergence detection until the winner beats the baseline. In the background. While you work on something else.

The twist: the server owns zero intelligence. Mutations come from your LLM subscription over MCP sampling; the evaluator (tests, benchmark, staging) runs on your machine. The server orchestrates evolution but never holds your keys and never runs your code. Every candidate is gated end-to-end by the tests you already have.

  • targets: functions, prompts, agent graphs, configs, multi-file refactors
  • languages: Go, Python, Rust, JavaScript, C++, Zig, Java
  • evolve: populations, MAP-Elites, islands, selection, crossover
  • mutation: LLM-driven SEARCH/REPLACE diffs with a fuzz-tested parser
  • reward-hacking gate: tests must pass before a benchmark is recorded
  • hosted mode: serve + worker + watch, SQLite or Postgres, durable pause/resume
  • providers: Anthropic, OpenAI, Ollama, Codex CLI, or zero-config claude
sleepy · v0.3.0 · chatgpt (gpt-5.4)
> make my codebase faster

 analyze_project(path=".")                             1.8s
    3 targets · benchmark evaluator

    1. ext/codetool/middleware.go         95%  parsing/walk hot path
    2. ext/orchestrator/sqlite/store.go   88%  DB/store overhead
    3. ext/codetool/bash.go               81%  buffer-heavy exec

  Three heaviest paths: parsing in codetool, sqlite store
  overhead, shell exec buffers. Starting evolution on all
  three, chasing the biggest wins first.

 start_evolution(max_gen=20, targets=[3 items])

  ─── evolution ────────────────────────────────────────────
    Gen 1   LLM-mutate   middleware.go    8471 ns/op   1.21×
    Gen 2   LLM-mutate   middleware.go    5103 ns/op   2.01×
    Gen 5   crossover    store.go         1984 ns/op   5.18×
    Gen 9   LLM-mutate   middleware.go     612 ns/op  16.8×
    Gen 14  MAP-Elites   bash.go           287 ns/op  35.9×  tests pass ✓

  1 ◉ middleware.go      gen 14/20  35.9×   ██████████░░
  2 ◉ sqlite/store.go    gen 11/20  18.4×   ████████░░░░
  3 ◉ bash.go            gen  9/20   9.2×   ██████░░░░░░

  EVOLVING  p pause · d diff · q stop         chat 7.5k→552 tok · $4.21

04

Research · open problems

We work on the systems research problems that only show up once your agents have been running for a thousand hours. These are the ones we're actively investing in.

Evaluator robustness under evolutionary pressure. Sleepy depends on the evaluator being harder to game than the mutator is to make clever. As LLM-driven mutation gets sharper, fitness functions get gamed in ways the author didn't anticipate. The reward-hacking gate is a floor, not a ceiling. We're building tooling to fuzz evaluators adversarially before the mutator finds the exploit you didn't.

Long-horizon context continuity. Running a coherent agent for 24 hours is a memory management problem, a prompt-cache problem, and a context-window problem at the same time. Naive solutions blow the cache; sophisticated solutions lose coherence. We're working on memory injection and eviction strategies that preserve cache hits across multi-hour sessions without breaking the agent's mental model.

Deterministic replay of agent runs. If you can replay a multi-day run deterministically, you can differential-test framework changes, A/B prompts against a fixed trace, and answer "what did the agent do, and why" in a way regulated environments actually accept. This is a runtime problem first and a research problem second. Most frameworks weren't built to make it possible.

Agent topology search. The shape of a multi-agent system (which tools, how many sub-agents, what the handoff graph looks like) is almost always hand-designed. Sleepy plus a topology search space turns "what's the right architecture for this task" from an architect's guess into a measurable optimization problem. Early work. The substrate is Sleepy.