Accountable AI Across the Research Lifecycle
Automation Experimentation Collaboration
Internal Testing
Meet Carrier

What Carrier Can Do

One research platform, three connected workspaces — automate data annotation, run mixed human-AI experiments, and keep your team's agentic work transparent.

Automation

Annotate datasets at scale with large language models — using codebooks taken straight from peer-reviewed research.

01
AI Annotation at Scale

Annotate Your Corpus with Validated LLM Codebooks

Upload a CSV and label every row with large language models using annotators drawn straight from peer-reviewed research — sentiment, emotion, moral foundations, media framing, stance, and empathy coding. Compare OpenAI, Anthropic, and Google under one prompt, repeat rows to check consistency, and preview real token usage on a sample before you commit. Everything runs server-side and exports to CSV, Excel, or JSON.

Literature-Based Templates Multi-Provider Repeatable Runs Token Preview
Annotator — Sentiment Analysis
Input Text · Model · Result
"The product was great..." GPT-4o Positive · 0.94
"Disappointing quality..." Claude 3.5 Negative · 0.91
72% · 1,440 / 2,000 rows
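The row-by-row workflow above can be sketched in a few lines of standard-library Python. This is a minimal illustration, not Carrier's server-side code. `label_fn` is a hypothetical placeholder for a real provider request, and the `text` column name is assumed. Excel export is omitted because it needs a third-party library.

```python
import csv
import io
import json
from typing import Callable

# Hypothetical stand-in for a provider request: (model, text) -> label.
LabelFn = Callable[[str, str], str]


def annotate_csv(csv_text: str, models: list[str], label_fn: LabelFn,
                 text_column: str = "text") -> list[dict]:
    """Label every row with every model, adding one output column per model."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    for row in rows:
        for model in models:
            row[model] = label_fn(model, row[text_column])
    return rows


def to_csv(rows: list[dict]) -> str:
    """Serialise annotated rows back to CSV (header taken from the first row)."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(rows[0]))
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


def to_json(rows: list[dict]) -> str:
    """Serialise annotated rows as a JSON array of objects."""
    return json.dumps(rows, ensure_ascii=False, indent=2)
```

A real client call would be passed in as `label_fn`. Repeated runs and multi-provider comparison would plug into the inner loop.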

Batch LLM Annotation

Upload CSV datasets, annotate with multiple AI models at scale.

Upload CSV datasets and process them with multiple AI models simultaneously. Built-in token-usage preview, batch APIs for OpenAI and Anthropic, and repeated runs for inter-rater reliability analysis. Choose from 20+ literature-based templates or create custom annotation tasks.
CSV Upload Multi-Provider Token Preview 20+ Templates

Literature-Based Templates

A growing library of annotators from peer-reviewed papers.

Start from annotators drawn verbatim from peer-reviewed research — PNAS, Nature Machine Intelligence, Scientific Reports — each shipping its exact prompt, model settings, and the paper's reported accuracy and reliability. The catalogue keeps growing as new validated templates are added, and you can save your own and contribute them back to the shared set after an admin review.
Peer-Reviewed Sources Always Expanding Save & Share
Template Library — 24 and counting
Moral Foundations PNAS · 2024 α 0.82
Empathy Coding (EPITOME) Nature MI · 2026 F1 0.88
+ sentiment, emotion, stance, framing… library still growing

Token Preview & Batching

See real token usage before you spend, batch large jobs.

Run a small live sample to see real token usage extrapolated across your whole corpus, with links to each provider's pricing, before you commit budget. Send large OpenAI and Anthropic jobs through their batch APIs for cheaper, high-volume processing. Google jobs run in standard mode.
Sample-Based Preview Batch APIs Budget Control
Cost Preview — sample of 5 rows
Avg input tokens / row: 312
Avg output tokens / row: 48
Projected · 2,000 rows: ~720k tokens
See provider pricing pages for current rates →
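The projection in the panel is plain arithmetic: average the sampled per-row counts, then multiply by the corpus size. Here that gives (312 + 48) × 2,000 = 720,000 tokens. The sketch below uses a hypothetical `project_tokens` helper and invented sample values. It leaves prices out on purpose, since rates change and the page defers to each provider's pricing pages.

```python
from statistics import mean


def project_tokens(sample_input: list[int], sample_output: list[int],
                   total_rows: int) -> dict[str, float]:
    """Extrapolate per-row token averages from a small live sample to a whole corpus."""
    avg_in = mean(sample_input)
    avg_out = mean(sample_output)
    return {
        "avg_input_per_row": avg_in,
        "avg_output_per_row": avg_out,
        "projected_total": round((avg_in + avg_out) * total_rows),
    }


# Five sampled rows (invented numbers that average 312 in / 48 out).
estimate = project_tokens([300, 310, 312, 320, 318], [40, 50, 48, 52, 50], 2000)
print(estimate["projected_total"])  # 720000
```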

Repeatable, Multi-Model Runs

Repeat every row across models and passes for reliability analysis.

Re-run every row up to twenty times per model and across providers, with each pass exported as its own column — the raw material for your own consistency and inter-rater reliability analysis. (Carrier stores every repeated run; you compute the agreement statistic that fits your design.)
Up to 20× per Model Cross-Provider One Column per Run
Results — one column per model per run
Row · GPT-4o r1 · GPT-4o r2 · Claude r1
#001 · Positive · Positive · Positive
#002 · Neutral · Positive · Neutral
Export → compute κ / α in your stats tool
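For a pair of exported run columns, Cohen's kappa is one common choice. With more than two runs or missing values, Krippendorff's α is usually the better fit. The minimal stdlib sketch below computes κ = (p_o − p_e) / (1 − p_e) for two columns. It illustrates the statistic and is not a Carrier feature; a dedicated stats package is the safer choice for real analyses.

```python
from collections import Counter


def cohens_kappa(a: list[str], b: list[str]) -> float:
    """Cohen's kappa between two label columns covering the same rows."""
    if len(a) != len(b) or not a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(a)
    # Observed agreement: share of rows where both runs gave the same label.
    p_o = sum(x == y for x, y in zip(a, b)) / n
    # Chance agreement: product of each run's marginal label frequencies.
    count_a, count_b = Counter(a), Counter(b)
    p_e = sum(count_a[label] * count_b[label] for label in count_a) / (n * n)
    if p_e == 1:
        # Both runs used one identical label for every row; kappa is undefined,
        # and this sketch reports perfect agreement by convention.
        return 1.0
    return (p_o - p_e) / (1 - p_e)


gpt_r1 = ["Positive", "Neutral"]
gpt_r2 = ["Positive", "Positive"]
print(cohens_kappa(gpt_r1, gpt_r2))  # 0.0
```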

Experimentation

Design and run studies that mix real people, scripted bots, and LLM agents. Every Carrier experiment is built from four interlocking systems: Shape (the journey through chamberlines, chambers, and segments), Roles (communicator, mediator, processor), Variables (what you know about each participant), and Triggers (how non-human participants behave).

01
Mixed Human–AI Interaction

Humans, Agents, and Every Mix Between

As social interaction moves online, it increasingly blends people and AI. Carrier lets you compose and study every configuration in one room — human-to-human, agent-to-agent, human-to-agent, and human conversations supported by AI assistants — with each agent's identity disclosed or blinded.

Human ↔ Human Agent ↔ Agent Human ↔ Agent AI-Assisted
Interaction Configurations
Human ↔ Human
Agent ↔ Agent
Human ↔ Agent
Human ↔ Human · AI-assisted
Any participant, any role · agent identities disclosed or blinded
02
Participant Role System

Three Roles, Three Zones

Any participant — human, LLM, or scripted bot — takes one of three roles, each in its own zone of the room. Communicators converse in the main chat. Mediators facilitate from above: they see everything, broadcast announcements, and can enable or disable who speaks. Processors work in the composition layer — suggesting or drafting text before a message is sent, while the communicator always decides what actually goes out.

Communicator → converse Mediator → facilitate Processor → assist
Chat Segment — Three Zones
Mediator · broadcasts · sees all · controls input
"2 minutes left — try to reach consensus."
Communicator · converses in the chat
Alex Human — "I'd prioritise transparency."
Claude LLM — "Fair — though cost matters too."
Processor · assists before you send
Draft: "I disagree with that…"
Suggestion: add a concrete example Use Dismiss
You decide what actually gets sent.
03
The Variable System

Define It Once, Use It Everywhere

Turn anything you know about a participant into a variable — a survey answer, a random or counterbalanced condition, or a value computed from other variables — fixed per participant at the start of their run. Then reference it anywhere with {{ }}: personalise instructions, fill LLM system prompts, decide who gets matched together, and show or hide content by condition. Define your conditions once, and the whole study reads from them.

Survey-Derived Random & Counterbalanced Computed Expressions Prompts · Matching · Visibility
Variables Manager
condition counterbalance High / Low anonymity
age survey from pre-survey · Q3
persona expression based on {{condition}}
Referenced everywhere
Instruction: Welcome, you're in the {{condition}} group.
LLM prompt: Adopt a {{persona}} tone.
Visibility: Show debrief if {{age}} ≥ 18
Matching: Group by {{condition}}
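Fixing variables once and resolving `{{ }}` references everywhere can be sketched as follows. All names here are hypothetical. `render` and `fix_variables` are illustrative helpers, and the survey key `Q3` comes from the mock above. The counterbalancer simply rotates levels in arrival order, and the persona rule is invented for the example.

```python
import itertools
import re
from typing import Iterator

PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def render(template: str, variables: dict) -> str:
    """Replace each {{name}} with the participant's value, failing on unknown names."""
    def substitute(match: re.Match) -> str:
        name = match.group(1)
        if name not in variables:
            raise KeyError(f"undefined variable: {name}")
        return str(variables[name])
    return PLACEHOLDER.sub(substitute, template)


def counterbalancer(levels: list[str]) -> Iterator[str]:
    """Rotate through condition levels so successive arrivals are spread evenly."""
    return itertools.cycle(levels)


def fix_variables(survey: dict, condition: str) -> dict:
    """Freeze one participant's variables at the start of their run."""
    variables = {"condition": condition, "age": survey["Q3"]}  # survey-derived
    # Computed variable; this mapping is a hypothetical example rule.
    variables["persona"] = "reserved" if condition == "High" else "open"
    return variables


conditions = counterbalancer(["High", "Low"])
p1 = fix_variables({"Q3": 24}, next(conditions))
print(render("Welcome, you're in the {{condition}} group.", p1))  # Welcome, you're in the High group.
show_debrief = p1["age"] >= 18  # a visibility rule evaluated on a variable
```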
04
Segment System

Compose a Timed Session, Step by Step

Chain activities into one timed sequence every participant moves through together — chat, vote, rank, survey, watch media, complete a task, and more. Each segment carries its own timing and a transition rule (auto, manual, synced, or host-led), and AI behaviour can be overridden per segment. Eleven activity types in all.

11 Activity Types Timed Transitions Sync Modes Agent Overrides
Session Timeline
Every participant · same order
Instruction · 1m · auto
Survey · 3m · manual
Chat · agent override · 10m · synced
Ranking · 2m · auto
11 types — chat · selection · ranking · survey · input · media · timer · task · slide · instruction · attention-check
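One way to picture the timeline is as an ordered list of segments, each with a duration and a transition rule. Nominal start and end times follow from that list. This is a hypothetical data model, not Carrier's schema, and the field names are assumptions. Manual, synced, or host-led segments can finish earlier or later than the nominal times computed here.

```python
from dataclasses import dataclass
from typing import Optional

TRANSITIONS = {"auto", "manual", "synced", "host-led"}


@dataclass
class Segment:
    kind: str                             # e.g. "instruction", "survey", "chat"
    minutes: float                        # time allotted to the segment
    transition: str                       # rule for advancing to the next segment
    agent_override: Optional[str] = None  # hypothetical per-segment AI setting

    def __post_init__(self) -> None:
        if self.transition not in TRANSITIONS:
            raise ValueError(f"unknown transition rule: {self.transition}")


def schedule(segments: list[Segment]) -> list[tuple[str, float, float]]:
    """Nominal (kind, start, end) offsets in minutes, in the order everyone follows."""
    plan: list[tuple[str, float, float]] = []
    clock = 0.0
    for seg in segments:
        plan.append((seg.kind, clock, clock + seg.minutes))
        clock += seg.minutes
    return plan


session = [
    Segment("instruction", 1, "auto"),
    Segment("survey", 3, "manual"),
    Segment("chat", 10, "synced", agent_override="mediator-summary"),
    Segment("ranking", 2, "auto"),
]
for kind, start, end in schedule(session):
    print(f"{kind:<12} {start:>5.1f} to {end:>5.1f} min")
```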
05
Agentic Experiment Builder

Design the Four Systems by Conversation

Describe your study in plain language and an AI agent reads your current experiment and proposes concrete changes — adding chamberlines and chambers, configuring roles, setting variables and triggers. It works across OpenAI, Anthropic, and Google, and it is human-in-the-loop by design: the agent proposes every change for you to review and accept or reject. You stay in control of the experiment; the agent just does the wiring.

Natural-Language Design Proposes, You Approve Multi-Provider
Design Assistant
You
Researcher
Add a second condition with a mediator that summarises the discussion every 5 messages.
Proposed changes
+ Chamberline “Mediated” · + Mediator role (aggregate, every 5 msgs)
Accept Reject
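The propose-then-approve loop can be pictured as a list of proposed edits that only touch the experiment once the researcher accepts them. This is a hypothetical sketch. The `ProposedChange` shape and the nested-dict config schema are assumptions, not Carrier's internal model.

```python
import copy
from dataclasses import dataclass
from typing import Any


@dataclass
class ProposedChange:
    description: str  # what the agent says it will do
    path: list[str]   # location in the experiment config (hypothetical schema)
    value: Any        # value to write at that location


def apply_accepted(config: dict, proposals: list[ProposedChange],
                   decisions: dict[int, bool]) -> dict:
    """Apply only the proposals the researcher accepted; the original stays untouched."""
    updated = copy.deepcopy(config)
    for index, change in enumerate(proposals):
        if not decisions.get(index, False):  # unreviewed counts as rejected
            continue
        node = updated
        for key in change.path[:-1]:
            node = node.setdefault(key, {})
        node[change.path[-1]] = copy.deepcopy(change.value)
    return updated


config = {"chamberlines": {"Baseline": {}}}
proposals = [
    ProposedChange("Add chamberline 'Mediated'",
                   ["chamberlines", "Mediated"], {"roles": {}}),
    ProposedChange("Add mediator (aggregate, every 5 msgs)",
                   ["chamberlines", "Mediated", "roles", "mediator"],
                   {"mode": "aggregate", "every_messages": 5}),
    ProposedChange("Add post-survey",
                   ["chamberlines", "Mediated", "post_survey"], True),
]
new_config = apply_accepted(config, proposals, {0: True, 1: True, 2: False})
```

The function works on a deep copy and treats unreviewed proposals as rejected. As a result, nothing the agent suggests changes the study until a person accepts it.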

Experiment Builder

Visual drag-and-drop experiment design — no code required.

Visual 3-pane builder. Drag segments and participants from a library onto a timeline canvas, configure properties in the inspector panel. Preview and validate experiments before deploying.
Drag & Drop 3-Pane Layout Visual Timeline
DELIBERATION STUDY · Validate · Preview · Deploy
Library
Segments
💬 Chat
☑️ Select
📊 Rank
📋 Survey
Chamber Line A
Intro
2 seg
Discussion
3 seg
Debrief
1 seg
Discussion — Segments
Chat · 10m
Select · 3m
Survey · 5m
Inspector
Chamber Name
Discussion
Participants
👤 2 Humans
🤖 1 Bot

AI Design Assistant

Describe your experiment in natural language, and the AI drafts it for your review.

An agentic LLM assistant that co-designs experiments through conversation. Describe what you need, for example "add a 3-person chat with a mediator bot", and it proposes the chambers, segments, triggers, and participant slots to configure. It understands the full experiment model and can scaffold complex designs from a brief description.
Natural Language Agentic Design Auto-Config
AI Assistant + Experiment Builder
AI Assistant
You
I need a group deliberation with 3 humans and a mediator bot that summarises every 5 messages
Assistant
Done! I've created a chamber with 3 communicators and 1 mediator bot. Added a periodic trigger (every 5 messages) with summary broadcast. Want me to add a post-survey?
You
Yes, and add a ranking segment after the chat
Generated Config
Chamber Line A
Chamber 1 — Deliberation
3H + 1 Mediator Bot
Chat · 15m Rank · 3m Survey · 5m
Trigger: periodic (5 msgs) → summary broadcast

Smart Participant Matching

Automatic grouping by condition, survey response, or queue order.

Match participants into chatrooms automatically using three strategies. Survey-based assignment groups people by their pre-survey answers. Counterbalancing ensures even distribution across conditions. FIFO matches in queue order for speed.
Survey-Based Counterbalance FIFO
Matching Queue
P1
Alex Condition A matched
P2
Jordan Condition A matched
P3
Sam Condition B waiting
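The three strategies can be sketched as small functions over a waiting queue. These are hypothetical illustrations with an assumed room size. The counterbalancer is plain rotation in arrival order; real designs may prefer randomised blocks.

```python
from collections import defaultdict


def fifo_rooms(queue: list[str], size: int) -> list[list[str]]:
    """FIFO: fill rooms strictly in arrival order; a partial remainder keeps waiting."""
    full = len(queue) // size * size
    return [queue[i:i + size] for i in range(0, full, size)]


def counterbalance(queue: list[str], conditions: list[str]) -> dict[str, str]:
    """Counterbalancing: rotate arrivals through conditions so group sizes stay even."""
    return {p: conditions[i % len(conditions)] for i, p in enumerate(queue)}


def survey_rooms(answers: dict[str, str], size: int) -> dict[str, list[list[str]]]:
    """Survey-based: pool participants by pre-survey answer, then fill rooms per pool."""
    pools: dict[str, list[str]] = defaultdict(list)
    for participant, answer in answers.items():  # dicts keep arrival order
        pools[answer].append(participant)
    return {answer: fifo_rooms(members, size) for answer, members in pools.items()}


print(survey_rooms({"Alex": "A", "Jordan": "A", "Sam": "B"}, size=2))
# {'A': [['Alex', 'Jordan']], 'B': []}  (Sam keeps waiting for a partner)
```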

Multi-Provider LLMs

OpenAI, Anthropic, and Google models side by side.

Run the same experiment with different LLM providers to compare responses. Configure temperature, context window, system prompts, and response logic per agent. Supports GPT-4o, Claude, Gemini, and custom endpoints.
OpenAI Anthropic Google Custom API
Agent Configuration
Agent A
GPT-4o
temp: 0.7
Agent B
Claude 3.5
temp: 0.7
Agent C
Gemini Pro
temp: 0.7
Same prompt · Same context · Compare outputs

Live Monitoring Dashboard

Real-time session tracking, alerts, and data export.

Watch every active session in real time. Get alerts for disconnects, long waits, and drop-outs. Pause, resume, or end individual participant sessions. Export all data as CSV, XLSX, or JSON.
Real-time Alerts Session Control Export
Dashboard
12
Active
3
Waiting
47
Completed
⚠ P-0847 disconnected 45s ago — Chamber 2
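Disconnect and long-wait alerts can be derived from per-participant heartbeats. This is a hypothetical sketch: the thresholds, field names, and heartbeat mechanism are assumptions, and drop-out detection is not modelled.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional

DISCONNECT_AFTER_S = 30   # hypothetical: silence before a participant counts as disconnected
LONG_WAIT_AFTER_S = 120   # hypothetical: queue time before a "long wait" alert


@dataclass
class ParticipantState:
    pid: str
    status: str                      # "active", "waiting", or "completed"
    last_heartbeat: float            # seconds, on the same clock as `now`
    chamber: Optional[str] = None
    waiting_since: Optional[float] = None


def summary(states: list[ParticipantState]) -> Counter:
    """Counts per status, as in the dashboard tiles."""
    return Counter(s.status for s in states)


def alerts(states: list[ParticipantState], now: float) -> list[str]:
    """Flag silent participants as disconnected and queued ones as waiting too long."""
    out = []
    for s in states:
        if s.status == "completed":
            continue
        silent = now - s.last_heartbeat
        if silent >= DISCONNECT_AFTER_S:
            where = f" in {s.chamber}" if s.chamber else ""
            out.append(f"{s.pid} disconnected {silent:.0f}s ago{where}")
        elif (s.status == "waiting" and s.waiting_since is not None
              and now - s.waiting_since >= LONG_WAIT_AFTER_S):
            out.append(f"{s.pid} waiting for {now - s.waiting_since:.0f}s")
    return out


states = [
    ParticipantState("P-0847", "active", last_heartbeat=100.0, chamber="Chamber 2"),
    ParticipantState("P-0901", "waiting", last_heartbeat=144.0, waiting_since=10.0),
    ParticipantState("P-0612", "completed", last_heartbeat=20.0),
]
print(alerts(states, now=145.0))
```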

No-Code Bot Scripting

13+ trigger types for scripted agent behavior — no code required.

Build sophisticated bot behavior with trigger-response rules. Keywords, regex, timed events, message counts, cross-bot chains, activity timeouts, and more. Triggers can fire conditionally, chain to other triggers, and have cooldowns and probability controls.
Keyword & Regex Timed Triggers Chain Reactions No Code
Bot Configuration
keyword "hello" → "Welcome to the study!"
time 30s → "Any initial thoughts?"
msg-count = 10 → "Let's summarize"
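A trigger rule pairs a match condition with a response, gated by a cooldown and a probability. The sketch below models four of the listed types (keyword, regex, timed, message count) under hypothetical names. Cross-bot chains, activity timeouts, and the remaining types are left out.

```python
import random
import re
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Trigger:
    kind: str                  # "keyword", "regex", "time", or "msg_count"
    arg: object                # keyword, pattern, seconds elapsed, or message count
    response: str              # what the bot posts when the trigger fires
    cooldown_s: float = 0.0    # minimum seconds between two firings
    probability: float = 1.0   # chance of firing once the condition matches
    once: bool = False         # fire at most once per session
    last_fired: Optional[float] = None

    def matches(self, message: str, elapsed_s: float, msg_count: int) -> bool:
        if self.kind == "keyword":
            return str(self.arg).lower() in message.lower()
        if self.kind == "regex":
            return re.search(str(self.arg), message) is not None
        if self.kind == "time":
            return elapsed_s >= float(self.arg)
        if self.kind == "msg_count":
            return msg_count == int(self.arg)
        raise ValueError(f"unknown trigger kind: {self.kind}")


def fire(triggers: list[Trigger], message: str, elapsed_s: float, msg_count: int,
         now: float, rng: Callable[[], float] = random.random) -> list[str]:
    """Collect responses from triggers that match, are off cooldown, and win the roll."""
    responses = []
    for t in triggers:
        if t.once and t.last_fired is not None:
            continue
        if not t.matches(message, elapsed_s, msg_count):
            continue
        if t.last_fired is not None and now - t.last_fired < t.cooldown_s:
            continue
        if rng() >= t.probability:
            continue
        t.last_fired = now
        responses.append(t.response)
    return responses


bot = [
    Trigger("keyword", "hello", "Welcome to the study!", cooldown_s=60),
    Trigger("time", 30, "Any initial thoughts?", once=True),
    Trigger("msg_count", 10, "Let's summarize"),
]
print(fire(bot, "Hello everyone", elapsed_s=5, msg_count=1, now=5))  # ['Welcome to the study!']
```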

Template Library

Save any experiment as a reusable, forkable template.

Save any experiment as a reusable template and one-click fork it into a fresh study — all IDs regenerated. Browse by experiment type, participant count, and chamber structure.
Quick-Start Fork & Customize ID-Safe Cloning
Template Library
Group Deliberation 3 chambers · 2H + 1AI Fork
Dyad Conversation 2 chambers · 2H Fork
Human-AI Collab 3 chambers · 1H + 2AI Fork
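ID-safe cloning means every object in the copy gets a new identifier and internal references are rewritten to match, so a fork never collides with its source. The sketch assumes a hypothetical JSON-like template in which objects carry an `"id"` field and references use keys ending in `_id`. Carrier's actual schema may differ.

```python
import uuid


def fork(template: dict) -> dict:
    """Copy an experiment template, giving every "id" a fresh UUID and
    rewriting "*_id" references to match (assumed naming convention)."""
    mapping: dict[str, str] = {}

    def collect(node) -> None:
        # First pass: assign a new UUID to every object id in the template.
        if isinstance(node, dict):
            if isinstance(node.get("id"), str):
                mapping[node["id"]] = uuid.uuid4().hex
            for value in node.values():
                collect(value)
        elif isinstance(node, list):
            for item in node:
                collect(item)

    def rewrite(node):
        # Second pass: build a new structure with ids and references remapped.
        if isinstance(node, dict):
            out = {}
            for key, value in node.items():
                is_ref = key == "id" or key.endswith("_id")
                if is_ref and isinstance(value, str):
                    out[key] = mapping.get(value, value)
                else:
                    out[key] = rewrite(value)
            return out
        if isinstance(node, list):
            return [rewrite(item) for item in node]
        return node  # JSON scalars (str, number, bool, None) are immutable, so sharing is safe

    collect(template)
    return rewrite(template)


template = {
    "id": "exp1",
    "chambers": [{"id": "c1", "segments": [{"id": "s1", "kind": "chat"}]}],
    "triggers": [{"id": "t1", "segment_id": "s1"}],
}
copy_of_study = fork(template)
```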

Collaboration

Bring your team's Claude Code agent sessions into one shared, self-hosted workspace — and make the way your studies are built open to inspection.

01
Team Workspace

See Your Lab's Agentic Work in One Place

Carrier mirrors your team's Claude Code sessions and shared memory into a single workspace, so you can review how the lab is using AI agents without combing through everyone's local files. Because a shared agentic record makes every prompt, edit, and decision visible, the way a study was built and adjusted becomes transparent and auditable to collaborators — research manipulations are open to inspection, not hidden in someone's terminal. Link a GitHub repo or upload directly from each machine; either way the data stays on infrastructure you control.

Research Transparency Self-Hosted Claude Code Sessions Shared Memory
Workspace — deliberation-study
alice@lab · “Add mediated condition” · Pushed
12 min · 4 files · 2 commits
ben@lab · “Tune trigger keywords” · Local
6 min · 1 file

Linked Repos

Auto-sync a team's shared Claude Code branch from GitHub.

Connect a repository with a GitHub personal access token (fine-grained recommended; classic tokens also accepted), which Carrier encrypts at rest with AES-256-GCM. Carrier mirrors the team's shared Claude Code branch and re-syncs roughly every 30 seconds, so the workspace stays current as your team pushes.
Encrypted Token ~30s Auto-Sync Read-Only
Linked Repo
org / deliberation-study Synced 14s ago
Branch: claude-team-share
Token: fine-grained · encrypted
Auto-sync: every ~30s

Manual Upload

Sync without GitHub, straight from each machine.

A small Python client bundles each machine's local sessions and memory and uploads them directly to your Carrier server, authenticating with an API key (stored hashed with bcrypt and shown only once). Wire it to a Claude Code SessionEnd hook for hands-off syncing, with no GitHub required.
No GitHub Needed SessionEnd Hook Direct to Your Server
carrier_sync.py
$ python3 carrier_sync.py
packaging ~/.claude-team-share …
uploading → carrier.lab.ac.uk …
✓ 3 sessions · 2 memory · 412 KB
Runs automatically on each Claude Code SessionEnd
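The client's job can be sketched with the standard library: pack the share directory into a gzipped tarball and build an authenticated POST. This is not the real carrier_sync.py. The `/api/sync/upload` endpoint, the Bearer header, and the function names are assumptions for illustration, and the request is only built, not sent.

```python
import io
import tarfile
import urllib.request
from pathlib import Path


def bundle(share_dir: Path) -> bytes:
    """Pack the shared sessions/memory directory into an in-memory .tar.gz."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        tar.add(str(share_dir), arcname=share_dir.name)
    return buf.getvalue()


def build_upload(server: str, api_key: str, payload: bytes) -> urllib.request.Request:
    """Build, but do not send, an authenticated POST of the bundle."""
    return urllib.request.Request(
        url=f"{server.rstrip('/')}/api/sync/upload",   # hypothetical endpoint
        data=payload,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",      # hypothetical auth scheme
            "Content-Type": "application/gzip",
        },
    )


# Example (not executed here):
#   share = Path.home() / ".claude-team-share"
#   req = build_upload("https://carrier.lab.ac.uk", "YOUR_API_KEY", bundle(share))
#   urllib.request.urlopen(req)
```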

Session Transcripts

Review every Claude Code session turn by turn.

Open any team member's Claude Code session and review it turn by turn: the prompts, the tools and files touched, and whether the work was committed and pushed, with duration, file count, and commit count at a glance.
Turn-by-Turn Files & Commits Per Person
alice@lab · session
"Add mediated condition"
P "add a mediator that summarises every 5 messages"
Edit · models/Participant.js · Bash · git commit -m "mediator"
4 files · 2 commits · Pushed

Shared Memory

One home for the team's accumulated agent memory.

Browse the Claude Code memory entries the team has shared — name, type, and full body — grouped by person, in one place, so hard-won context is not stranded on a single laptop.
Grouped by Person Full Entries Shared Context
Shared Memory
matching-rules project alice@lab
consent-copy reference ben@lab
stimulus-set-v2 project alice@lab
Responsible Agentic AI

Accountable at Every Stage of the Research Lifecycle

Empirical social science is moving into the digital space — and Carrier puts LLMs and AI agents to work at every stage of it, from first design to final dataset. At each step the researcher stays in control: agents propose, you decide; every action stays visible; every method is validated and reproducible.

  1. Design · Workspace

    Shape studies with agentic assistants and keep every AI-made decision visible to your team, so how a study was built stays transparent and auditable.

    Human-in-the-loop
  2. Deploy · Experiment Builder

    Launch mixed human–AI experiments, with real participants alongside LLM agents and scripted bots, running exactly as you designed them.

    Safety & data controls
  3. Analyse · Annotator

    Code and label data at scale with LLM codebooks drawn from peer-reviewed research, repeatable across models for the consistency your analysis needs.

    Validated & reproducible