Carrier - Interactive Experiment Platform

Accountable AI Across the Research Lifecycle

Automation | Experimentation | Collaboration

Internal Testing

Meet Carrier

What Carrier Can Do

One research platform, three connected workspaces — automate data annotation, run mixed human-AI experiments, and keep your team's agentic work transparent.

Automation

Annotate datasets at scale with large language models — using codebooks taken straight from peer-reviewed research.

AI Annotation at Scale

Annotate Your Corpus with Validated LLM Codebooks

Upload a CSV and label every row with large language models using annotators drawn straight from peer-reviewed research — sentiment, emotion, moral foundations, media framing, stance, and empathy coding. Compare OpenAI, Anthropic, and Google under one prompt, repeat rows to check consistency, and preview real token usage on a sample before you commit. Everything runs server-side and exports to CSV, Excel, or JSON.

Literature-Based Templates | Multi-Provider | Repeatable Runs | Token Preview

Annotator — Sentiment Analysis

Input Text | Model | Result

"The product was great..." | GPT-4o | Positive · 0.94

"Disappointing quality..." | Claude 3.5 | Negative · 0.91

72% · 1,440 / 2,000 rows

Batch LLM Annotation

Upload CSV datasets, annotate with multiple AI models at scale.

Upload CSV datasets and process them with multiple AI models simultaneously. Cost estimation and batch APIs are built in, and rows can be repeated for inter-rater reliability analysis. Choose from 20+ literature-based templates or create custom annotation tasks.

CSV Upload | Multi-Provider | Cost Estimation | 20+ Templates

Literature-Based Templates

A growing library of annotators from peer-reviewed papers.

Start from annotators drawn verbatim from peer-reviewed research — PNAS, Nature Machine Intelligence, Scientific Reports — each shipping its exact prompt, model settings, and the paper's reported accuracy and reliability. The catalogue keeps growing as new validated templates are added, and you can save your own and contribute them back to the shared set after an admin review.

Peer-Reviewed Sources | Always Expanding | Save & Share

Template Library — 24 and counting

Moral Foundations | PNAS · 2024 | α 0.82

Empathy Coding (EPITOME) | Nature MI · 2026 | F1 0.88

+ sentiment, emotion, stance, framing… library still growing

Token Preview & Batching

See real token usage before you spend, batch large jobs.

Run a small live sample to see real token usage extrapolated across your whole corpus, with links to each provider's pricing, before you commit budget. Send large OpenAI and Anthropic jobs through their batch APIs for cheaper, high-volume processing. (Google jobs run in standard mode rather than through a batch API.)

Sample-Based Preview | Batch APIs | Budget Control

Cost Preview — sample of 5 rows

Avg input tokens / row: 312

Avg output tokens / row: 48

Projected · 2,000 rows: ~720k tokens

See provider pricing pages for current rates →
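
The projection in the preview card is plain arithmetic: average input plus average output tokens per sampled row, scaled to the size of the corpus. A minimal Python sketch of that calculation follows. The function name and its arguments are illustrative assumptions, not part of Carrier's API.

```python
def project_tokens(avg_input_per_row: float, avg_output_per_row: float, total_rows: int) -> int:
    """Extrapolate total token usage from per-row sample averages."""
    return round((avg_input_per_row + avg_output_per_row) * total_rows)


# Figures from the preview card: 312 input + 48 output tokens per row, 2,000 rows.
projected = project_tokens(312, 48, 2000)
print(f"Projected: ~{projected / 1000:.0f}k tokens")  # Projected: ~720k tokens
```

Multiplying the projected count by a provider's current per-token rate gives a budget estimate; rates change, so they are best read from the provider's pricing page at run time.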

Repeatable, Multi-Model Runs

Repeat every row across models and passes for reliability analysis.

Re-run every row up to twenty times per model and across providers, with each pass exported as its own column — the raw material for your own consistency and inter-rater reliability analysis. (Carrier stores every repeated run; you compute the agreement statistic that fits your design.)

Up to 20× per Model | Cross-Provider | One Column per Run

Results — one column per model per run

Row | GPT-4o · r1 | GPT-4o · r2 | Claude · r1

#001 | Positive | Positive | Positive

#002 | Neutral | Positive | Neutral

Export → compute κ / α in your stats tool
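
As one example of the agreement statistics a researcher might compute from the exported columns, here is a minimal, standard-library sketch of Cohen's κ for two runs. The labels below are hypothetical illustration data, not output from Carrier, and designs with more than two runs would typically use a different statistic, such as Krippendorff's α.

```python
from collections import Counter


def cohen_kappa(run_a: list[str], run_b: list[str]) -> float:
    """Cohen's kappa for two equal-length lists of categorical labels."""
    if len(run_a) != len(run_b) or not run_a:
        raise ValueError("runs must be non-empty and the same length")
    n = len(run_a)
    # Observed agreement: share of rows where both runs gave the same label.
    observed = sum(a == b for a, b in zip(run_a, run_b)) / n
    # Chance agreement: product of each label's marginal frequencies.
    counts_a, counts_b = Counter(run_a), Counter(run_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both runs used one identical label throughout
    return (observed - expected) / (1 - expected)


# Hypothetical exported columns for two GPT-4o passes over four rows.
gpt4o_r1 = ["Positive", "Neutral", "Negative", "Positive"]
gpt4o_r2 = ["Positive", "Positive", "Negative", "Positive"]
print(round(cohen_kappa(gpt4o_r1, gpt4o_r2), 3))  # 0.556
```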

Experimentation

Design and run studies that mix real people, scripted bots, and LLM agents. Every Carrier experiment is built from four interlocking systems — Shape (chamberlines, chambers, segments: the journey), Roles (communicator, mediator, processor), Variables (what you know about each participant), and Triggers (how non-human participants behave).

Mixed Human–AI Interaction

Humans, Agents, and Every Mix Between

As social interaction moves online, it increasingly blends people and AI. Carrier lets you compose and study every configuration in one room — human-to-human, agent-to-agent, human-to-agent, and human conversations supported by AI assistants — with each agent's identity disclosed or blinded.

Human ↔ Human | Agent ↔ Agent | Human ↔ Agent | AI-Assisted

Interaction Configurations

H ↔ H

Human ↔ Human

AI ↔ B

Agent ↔ Agent

H ↔ AI

Human ↔ Agent

H ↔ H +

Human ↔ Human · AI-assisted

Any participant, any role · agent identities disclosed or blinded

Participant Role System

Three Roles, Three Zones

Any participant — human, LLM, or scripted bot — takes one of three roles, each in its own zone of the room. Communicators converse in the main chat. Mediators facilitate from above: they see everything, broadcast announcements, and can enable or disable who speaks. Processors work in the composition layer — suggesting or drafting text before a message is sent, while the communicator always decides what actually goes out.

Communicator → converse | Mediator → facilitate | Processor → assist

Chat Segment — Three Zones

Mediator · broadcasts · sees all · controls input

"2 minutes left — try to reach consensus."

Communicator · converses in the chat

Alex Human — "I'd prioritise transparency."

Claude LLM — "Fair — though cost matters too."

Processor · assists before you send

Draft: "I disagree with that…"

Suggestion: add a concrete example | Use | Dismiss

You decide what actually gets sent.

The Variable System

Define It Once, Use It Everywhere

Turn anything you know about a participant into a variable — a survey answer, a random or counterbalanced condition, or a value computed from other variables — fixed per participant at the start of their run. Then reference it anywhere with {{ }}: personalise instructions, fill LLM system prompts, decide who gets matched together, and show or hide content by condition. Define your conditions once, and the whole study reads from them.

Survey-Derived | Random & Counterbalanced | Computed Expressions | Prompts · Matching · Visibility

Variables Manager

condition | counterbalance | High / Low anonymity

age | survey | from pre-survey · Q3

persona | expression | based on {{condition}}

Referenced everywhere

Instruction: Welcome, you're in the {{condition}} group.

LLM prompt: Adopt a {{persona}} tone.

Visibility: Show debrief if {{age}} ≥ 18

Matching: Group by {{condition}}
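
To illustrate how {{ }} references might resolve against a participant's fixed variables, here is a minimal Python sketch using a regular expression. The `render` helper and the sample values are assumptions made for illustration; Carrier's actual template engine is not described in this page and may behave differently (for example, with computed expressions).

```python
import re

# Matches {{ name }} with optional whitespace inside the braces.
PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def render(template: str, variables: dict[str, object]) -> str:
    """Replace each {{name}} with its value; unknown names raise KeyError."""
    return PLACEHOLDER.sub(lambda m: str(variables[m.group(1)]), template)


# Hypothetical per-participant values, fixed at the start of the run.
participant = {"condition": "High anonymity", "persona": "formal", "age": 24}

print(render("Welcome, you're in the {{condition}} group.", participant))
print(render("Adopt a {{ persona }} tone.", participant))
```

Raising on an unknown name, rather than leaving the placeholder in place, surfaces typos in variable names before a participant ever sees them.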

Segment System

Compose a Timed Session, Step by Step

Chain activities into one timed sequence every participant moves through together — chat, vote, rank, survey, watch media, complete a task, and more. Each segment carries its own timing and a transition rule (auto, manual, synced, or host-led), and AI behaviour can be overridden per segment. Eleven activity types in all.

11 Activity Types | Timed Transitions | Sync Modes | Agent Overrides

Session Timeline

Every participant · same order

Instruction | 1m | auto

Survey | 3m | manual

Chat (agent override) | 10m | sync

Ranking | 2m | auto

11 types — chat · selection · ranking · survey · input · media · timer · task · slide · instruction · attention-check
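
One way to picture a session is as an ordered list of segment records, each with a type, a duration, and a transition rule. The sketch below models the four segments from the Session Timeline card in Python; the `Segment` class and its field names are hypothetical and do not reflect Carrier's internal schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    kind: str          # one of the activity types, e.g. "chat" or "survey"
    minutes: int       # time allotted to the segment
    transition: str    # transition rule, e.g. "auto", "manual", or "sync"
    agent_override: bool = False  # whether AI behaviour is overridden here


# The four segments from the Session Timeline card, in order.
timeline = [
    Segment("instruction", 1, "auto"),
    Segment("survey", 3, "manual"),
    Segment("chat", 10, "sync", agent_override=True),
    Segment("ranking", 2, "auto"),
]

total_minutes = sum(segment.minutes for segment in timeline)
print(total_minutes)  # 16
```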

Agentic Experiment Builder

Design the Four Systems by Conversation

Describe your study in plain language and an AI agent reads your current experiment and proposes concrete changes — adding chamberlines and chambers, configuring roles, setting variables and triggers. It works across OpenAI, Anthropic, and Google, and it is human-in-the-loop by design: the agent proposes every change for you to review and accept or reject. You stay in control of the experiment; the agent just does the wiring.

Natural-Language Design | Proposes, You Approve | Multi-Provider

Design Assistant

You

Researcher

Add a second condition with a mediator that summarises the discussion every 5 messages.

Proposed changes

+ Chamberline “Mediated” · + Mediator role (aggregate, every 5 msgs)

Accept Reject

Experiment Builder

Visual drag-and-drop experiment design — no code required.

A visual three-pane builder: drag segments and participants from a library onto a timeline canvas, then configure their properties in the inspector panel. Preview and validate experiments before deploying.

Drag & Drop | 3-Pane Layout | Visual Timeline

DELIBERATION STUDY | Validate | Preview | Deploy

Library

Segments

💬 Chat

☑️ Select

📊 Rank

📋 Survey

Chamberline A

Intro

2 seg

Discussion

3 seg

Debrief

1 seg

Discussion — Segments

Chat · 10m

Select · 3m

Survey · 5m

Inspector

Chamber Name

Discussion

Participants

👤 2 Humans

🤖 1 Bot

AI Design Assistant

Describe your experiment in natural language — the AI builds it for you.

An agentic LLM assistant that co-designs experiments through conversation. Describe what you need — "add a 3-person chat with a mediator bot" — and it configures chambers, segments, triggers, and participant slots. It understands the full experiment model and can scaffold complex designs from a brief description.

Natural Language | Agentic Design | Auto-Config

AI Assistant + Experiment Builder

AI Assistant

You

I need a group deliberation with 3 humans and a mediator bot that summarises every 5 messages

Assistant

Done! I've created a chamber with 3 communicators and 1 mediator bot. Added a periodic trigger (every 5 messages) with summary broadcast. Want me to add a post-survey?

You

Yes, and add a ranking segment after the chat

Generated Config

Chamberline A

Chamber 1 — Deliberation

3H + 1 Mediator Bot

Chat · 15m Rank · 3m Survey · 5m

Trigger: periodic (5 msgs) → summary broadcast

Smart Participant Matching

Automatic grouping by condition, survey response, or queue order.

Match participants into chatrooms automatically using three strategies. Survey-based assignment groups people by their pre-survey answers. Counterbalancing ensures even distribution across conditions. FIFO matches in queue order for speed.

Survey-Based | Counterbalance | FIFO

Matching Queue

Alex | Condition A | matched

Jordan | Condition A | matched

Sam | Condition B | waiting
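
Two of the strategies named above are simple enough to sketch in a few lines of Python. The version below assumes counterbalancing means "assign to the condition with the fewest participants so far" and FIFO means "form groups from the front of the queue"; both function names are hypothetical, and Carrier's own matching logic may differ.

```python
from collections import Counter, deque


def assign_counterbalanced(counts: Counter, conditions: list[str]) -> str:
    """Pick the condition with the fewest participants so far (ties: list order)."""
    choice = min(conditions, key=lambda c: counts[c])
    counts[choice] += 1
    return choice


def match_fifo(queue: deque, group_size: int) -> list[list[str]]:
    """Pop full groups off the front of the queue; leftovers keep waiting."""
    groups = []
    while len(queue) >= group_size:
        groups.append([queue.popleft() for _ in range(group_size)])
    return groups


counts = Counter()
assigned = [assign_counterbalanced(counts, ["A", "B"]) for _ in range(5)]
print(assigned)  # ['A', 'B', 'A', 'B', 'A']

waiting = deque(["Alex", "Jordan", "Sam"])
print(match_fifo(waiting, 2))  # [['Alex', 'Jordan']]
print(list(waiting))           # ['Sam']
```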

Multi-Provider LLMs

OpenAI, Anthropic, and Google models side by side.

Run the same experiment with different LLM providers to compare responses. Configure temperature, context window, system prompts, and response logic per agent. Supports GPT-4o, Claude, Gemini, and custom endpoints.

OpenAI | Anthropic | Google | Custom API

Agent Configuration

Agent A

GPT-4o

temp: 0.7

Agent B

Claude 3.5

temp: 0.7

Agent C

Gemini Pro

temp: 0.7

Same prompt · Same context · Compare outputs

Live Monitoring Dashboard

Real-time session tracking, alerts, and data export.

Watch every active session in real time. Get alerts for disconnects, long waits, and drop-outs. Pause, resume, or end individual participant sessions. Export all data as CSV, XLSX, or JSON.

Real-time | Alerts | Session Control | Export

Dashboard

Active

Waiting

Completed

⚠ P-0847 disconnected 45s ago — Chamber 2

No-Code Bot Scripting

13+ trigger types for scripted agent behavior — no code required.

Build sophisticated bot behavior with trigger-response rules. Rules can fire on keywords, regex matches, timed events, message counts, cross-bot chains, activity timeouts, and more. Triggers can fire conditionally, chain to other triggers, and carry cooldowns and probability controls.

Keyword & Regex | Timed Triggers | Chain Reactions | No Code

Bot Configuration

keyword "hello" → "Welcome to the study!"

time 30s → "Any initial thoughts?"

msg-count = 10 → "Let's summarize"
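
For readers curious what a trigger-response rule amounts to underneath the no-code interface, here is a minimal Python sketch of a keyword rule and a message-count rule with a cooldown. The `Trigger` class, `run_triggers` function, and the cooldown measured in messages are all assumptions for illustration, not Carrier's implementation.

```python
import re
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Trigger:
    name: str
    matches: Callable[[str, int], bool]  # (message text, message count) -> fire?
    response: str
    cooldown: int = 0                    # minimum messages between firings
    last_fired: Optional[int] = None     # message count at the last firing


def run_triggers(triggers: list[Trigger], text: str, count: int) -> list[str]:
    """Return the responses of every trigger that fires on this message."""
    fired = []
    for trigger in triggers:
        cooling = (trigger.last_fired is not None
                   and count - trigger.last_fired < trigger.cooldown)
        if not cooling and trigger.matches(text, count):
            trigger.last_fired = count
            fired.append(trigger.response)
    return fired


rules = [
    Trigger("greet", lambda text, n: bool(re.search(r"\bhello\b", text, re.I)),
            "Welcome to the study!", cooldown=5),
    Trigger("wrap-up", lambda text, n: n == 10, "Let's summarize"),
]

print(run_triggers(rules, "Hello everyone", 1))  # ['Welcome to the study!']
print(run_triggers(rules, "hello again", 2))     # [] (greet is cooling down)
print(run_triggers(rules, "I think so", 10))     # ["Let's summarize"]
```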

Template Library

Save any experiment as a reusable, forkable template.

Save any experiment as a reusable template and one-click fork it into a fresh study — all IDs regenerated. Browse by experiment type, participant count, and chamber structure.

Quick-Start | Fork & Customize | ID-Safe Cloning

Template Library

Group Deliberation | 3 chambers · 2H + 1AI | Fork

Dyad Conversation | 2 chambers · 2H | Fork

Human-AI Collab | 3 chambers · 1H + 2AI | Fork

Collaboration

Bring your team's Claude Code agent sessions into one shared, self-hosted workspace — and make the way your studies are built open to inspection.

Team Workspace

See Your Lab's Agentic Work in One Place

Carrier mirrors your team's Claude Code sessions and shared memory into a single workspace, so you can review how the lab is using AI agents without combing through everyone's local files. Because a shared agentic record makes every prompt, edit, and decision visible, the way a study was built and adjusted becomes transparent and auditable to collaborators — research manipulations are open to inspection, not hidden in someone's terminal. Link a GitHub repo or upload directly from each machine; either way the data stays on infrastructure you control.

Research Transparency | Self-Hosted | Claude Code Sessions | Shared Memory

Workspace — deliberation-study

alice@lab | “Add mediated condition” | Pushed

12 min · 4 files · 2 commits

ben@lab | “Tune trigger keywords” | Local

6 min · 1 file

Linked Repos

Auto-sync a team's shared Claude Code branch from GitHub.

Connect a repository with a GitHub token (fine-grained recommended, classic also accepted); Carrier encrypts the token at rest with AES-256-GCM. It then mirrors the team's shared Claude Code branch and re-syncs roughly every 30 seconds, so the workspace stays current as your team pushes.

Encrypted Token | ~30s Auto-Sync | Read-Only

Linked Repo

org / deliberation-study Synced 14s ago

Branch: claude-team-share

Token: fine-grained · encrypted

Auto-sync: every ~30s

Manual Upload

Sync without GitHub, straight from each machine.

A small Python client bundles each machine's local sessions and memory and uploads them straight to your Carrier server, authenticating with an API key (stored as a bcrypt hash and shown only once). Wire it to a Claude Code SessionEnd hook for hands-off syncing, with no GitHub required.

No GitHub Needed | SessionEnd Hook | Direct to Your Server

carrier_sync.py

$ python3 carrier_sync.py
packaging ~/.claude-team-share …
uploading → carrier.lab.ac.uk …
✓ 3 sessions · 2 memory · 412 KB

Runs automatically on each Claude Code SessionEnd
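
The packaging step shown in the terminal output could look something like the standard-library sketch below, which bundles a directory into an in-memory gzip-compressed tar archive. This is not the real carrier_sync.py: the `bundle_directory` helper and the demo file names are hypothetical, and the upload step is omitted because the server's endpoint and request format are not described here.

```python
import io
import tarfile
import tempfile
from pathlib import Path


def bundle_directory(root: Path) -> bytes:
    """Pack every file under `root` into an in-memory gzip-compressed tar archive."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w:gz") as archive:
        for path in sorted(root.rglob("*")):
            if path.is_file():
                # Store paths relative to the root so the archive is portable.
                archive.add(path, arcname=path.relative_to(root).as_posix())
    return buffer.getvalue()


# Demo on a throwaway directory standing in for the local share folder.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "sessions").mkdir()
    (root / "sessions" / "demo.jsonl").write_text("{}\n")
    payload = bundle_directory(root)
    with tarfile.open(fileobj=io.BytesIO(payload), mode="r:gz") as archive:
        names = archive.getnames()

print(names)  # ['sessions/demo.jsonl']
```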

Session Transcripts

Review every Claude Code session turn by turn.

Open any team member's Claude Code session and review it turn by turn — the prompts, the tools and files touched, and whether the work was committed and pushed — with duration and file and commit counts at a glance.

Turn-by-Turn | Files & Commits | Per Person

alice@lab · session

"Add mediated condition"

P "add a mediator that summarises every 5 messages"

Edit · models/Participant.js

Bash · git commit -m "mediator"

4 files · 2 commits · Pushed

Shared Memory

One home for the team's accumulated agent memory.

Browse the Claude Code memory entries the team has shared — name, type, and full body — grouped by person, in one place, so hard-won context is not stranded on a single laptop.

Grouped by Person | Full Entries | Shared Context

Shared Memory

matching-rules | project | alice@lab

consent-copy | reference | ben@lab

stimulus-set-v2 | project | alice@lab

Responsible Agentic AI

Accountable at Every Stage of the Research Lifecycle

Empirical social science is moving into the digital space — and Carrier puts LLMs and AI agents to work at every stage of it, from first design to final dataset. At each step the researcher stays in control: agents propose, you decide; every action stays visible; every method is validated and reproducible.

01 · Design · Workspace

Shape studies with agentic assistants and keep every AI-made decision visible to your team, so how a study was built stays transparent and auditable.

Human-in-the-loop

02 · Deploy · Experiment Builder

Launch mixed human–AI experiments, with real participants alongside LLM agents and scripted bots, running exactly as you designed them.

Safety & data controls

03 · Analyse · Annotator

Code and label data at scale with LLM codebooks drawn from peer-reviewed research, repeatable across models for the consistency your analysis needs.

Validated & reproducible
