
Carrier — A Researcher's Guide


Who this guide is for

This guide is written for researchers — psychologists, social scientists, HCI researchers, and study designers — who want to use Carrier to run interactive experiments involving any combination of human participants, LLM chatbots, scripted chatbots, and Claude Agents.

It assumes you are comfortable thinking about experiments in the usual methodological terms — conditions, manipulations, between- vs. within-subjects designs, counterbalancing, blinding, attention checks — but not that you have any prior experience with the platform or with the engineering vocabulary used internally to describe it.

Whenever Carrier uses a term with a recognisable counterpart in research methodology, we name that counterpart explicitly the first time the term appears.

How to read this guide

The guide has two parts.

Part I — Designing an Experiment is conceptual. It introduces the four systems out of which every Carrier experiment is built. Each chapter begins with the research problem the system solves, then names the building blocks Carrier offers, and finally describes the design decisions you make as a researcher. Short builder walkthroughs (still images and short GIFs) and worked examples accompany each chapter.

Part II — Operating the Platform is operational. It covers the day-to-day mechanics of running a study: activating an experiment, monitoring participants, exporting data, managing accounts. It is short and assumes you have read at least the relevant chapters of Part I.

Appendices at the end provide a glossary, the type-by-role compatibility matrix, and quick-reference indexes for segment types and trigger types.

The four systems

A Carrier experiment is built out of four interlocking systems. Each answers one of four design questions:

System | Question | What it gives you
Chamberlines, chambers, segments | What is the shape of a participant's journey? | A way to specify conditions, group participants, and lay out the activities they move through.
Roles | Who takes part, and in what capacity? | Three roles (communicator, mediator, processor) that any human, LLM chatbot, scripted chatbot, or agent can occupy, with one exception (a scripted chatbot cannot be a processor).
Variables | What do we know about each participant, and how should that change their journey? | A way of carrying participant attributes through the experiment and using them to filter matches, gate visibility, and personalise instructions.
Triggers | How should non-human participants behave? | A rule-based system that defines, for every non-human participant, the conditions under which they speak or act.

Schematically, they fit together like this:

Experiment
  Chamberline (condition)
    Chamber 1: segments; slots / roles
    Chamber 2: segments; slots / roles
  Variables (cut across the structure): travel with each participant; decide which chamberline, which slots, which chambers, and what each activity says.
  Triggers (cut across the structure): govern the behaviour of every non-human participant inside any chamber.
The four systems of a Carrier experiment. Chamberlines hold chambers; chambers hold segments and slots; variables and triggers cut across the structure.

The systems are introduced in this order — shape, then occupants, then information, then behaviour — because each layer presupposes the one before it. Once you have read all four chapters you can return to each independently.

A note on terminology

Carrier term | Closest research counterpart
Chamberline | Condition / experimental arm
Chamber | A timed grouping of matched participants
Segment | An activity / phase within a chamber
Slot | A position in a chamber to be filled by a participant
Run | One participant's complete pass through the experiment
Chatroom | The live instantiation of a chamber for a matched group
Variable | An attribute attached to a participant (from a survey, an assignment, or the system)
Trigger | A condition–response rule that governs a non-human participant

These mappings are not strict — see the glossary in Appendix A for nuance — but they will get you most of the way.

A note on non-human participants

Carrier distinguishes three kinds of non-human participant. The distinction is important because they differ in what they can do, how reproducible they are, and which research designs they serve best.

  • LLM chatbots are language-model-driven participants (OpenAI, Anthropic, Google, or any compatible provider) that converse. On each turn the model is given the conversation so far and a system prompt, and it produces a chat message. That message is the entirety of its output. LLM chatbots are open-ended and naturalistic, but they vary from session to session. Use them when you want behaviour that reads and reacts to conversation in an unconstrained way.
  • Scripted chatbots are rule-driven participants whose responses are pre-written. A scripted chatbot is configured with a set of triggers (§4) — rules of the form “when keyword X is uttered, send one of these three sentences” — and produces nothing else. Scripted chatbots are deterministic and replayable: the same input sequence produces the same outputs across sessions. Use them when you want behaviour that is reproducible, auditable, and identical across participants.
  • Agents are autonomous participants built on Anthropic's Claude Agent API. Unlike a plain LLM chatbot, an agent does not just answer from the conversation history — it has access to a vocabulary of built-in tools it can invoke on its own initiative to gather information or take action: reading files from a configured document area, running commands, browsing the web. The agent chains these tool calls across multiple steps without being asked between each step, and produces its eventual chat message grounded in what it read or computed. Each agent is typically scoped to a particular document area — the study materials, a cited corpus, a specific dataset — and behaves as the experiment's resident expert on that material. Use an agent when you want a non-human participant that retrieves and reasons over a body of material during the conversation, not just one that talks from training-time knowledge.
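The reproducibility claim for scripted chatbots can be made concrete with a small sketch. The Python below is not Carrier's implementation or configuration format. `KeywordTrigger`, `ScriptedChatbot`, and `replay` are hypothetical names. The sketch also assumes that when a rule offers several sentences, the chatbot picks among them deterministically (here, by cycling through them in order) rather than at random. Under that assumption, replaying the same input sequence yields the same outputs.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class KeywordTrigger:
    """Hypothetical rule: when `keyword` is uttered, send one of `responses`."""
    keyword: str
    responses: list[str]
    fired: int = 0  # how many times this rule has fired in the current session

    def respond(self, message: str) -> str | None:
        # Case-insensitive substring match; a real matcher may tokenise differently.
        if self.keyword.lower() not in message.lower():
            return None
        # Rotate through the pre-written sentences, so the choice depends only on history.
        reply = self.responses[self.fired % len(self.responses)]
        self.fired += 1
        return reply


@dataclass
class ScriptedChatbot:
    triggers: list[KeywordTrigger] = field(default_factory=list)

    def on_message(self, message: str) -> str | None:
        # First matching rule wins; with no match the chatbot stays silent.
        for trigger in self.triggers:
            reply = trigger.respond(message)
            if reply is not None:
                return reply
        return None


def replay(messages: list[str]) -> list[str | None]:
    """Run a freshly configured chatbot over a message sequence."""
    bot = ScriptedChatbot([
        KeywordTrigger("tax", ["Taxes fund schools.", "Who should pay more?", "Let's compare options."]),
    ])
    return [bot.on_message(m) for m in messages]


session = ["I think the tax is unfair", "hello", "tax again", "TAX!"]
assert replay(session) == replay(session)  # same inputs, same outputs
```

An LLM chatbot offers no such guarantee, because each reply comes from a fresh model call.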

All three kinds can take any of the three roles described in §2, with one exception: scripted chatbots cannot serve as processors (see §2.5). The relationship between kinds and roles is summarised in the type × role matrix in §2.1 and again in Appendix B.

When this guide says “chatbot” without qualification it means either an LLM chatbot or a scripted chatbot. “Agent” — capitalised or not — means specifically a Claude Agent. “Non-human participant” is the umbrella term that covers all three.

A note on terminology overlap: any LLM-driven participant (an LLM chatbot or an agent acting in a mediator role) can also be configured to emit Carrier intervention actions — disable a participant's input, prompt someone, highlight a message — alongside its chat message. These intervention actions are a Carrier-specific structured-output mechanism, not the same thing as a Claude Agent's built-in tools. §4.5 separates the two carefully; for now, it is enough to know that “tools the agent uses to read files and browse” and “intervention actions a mediator chooses to fire” are different channels.

Part I · §1. Chamberlines, Chambers, and Segments

The three-level shape of a Carrier experiment: chamberlines isolate the condition, chambers isolate the matched group, and segments isolate the activity.

1.1 The shape of a Carrier experiment

Every Carrier experiment has the same nested shape. A participant who opens the experiment URL is assigned to a chamberline — the condition they will experience. Their chamberline is an ordered sequence of chambers, each of which is a small group of participants (human, AI, or both) who are matched once and remain together for the duration of that chamber. Inside each chamber, participants progress through an ordered list of segments — the activities that constitute the chamber, of which a real-time conversation is the most common.

This three-level structure is the backbone of Carrier:

Participant → Chamberline (condition / arm) → Chamber (matched group, stable across segments) → Segment (activity)
The three-level nested shape of a Carrier experiment.

The three levels exist to separate three concerns that are easily entangled when designing an interactive study:

  • Chamberlines isolate the condition. Different chamberlines represent different experimental arms; a participant sees exactly one.
  • Chambers isolate the matched group. Within a chamber, who you are with does not change.
  • Segments isolate the activity. Within a segment, what you are doing does not change.

The remainder of this chapter introduces each level in turn.

1.2 Chamberlines: the unit of condition

1.2.1 Why chamberlines

In a typical lab study you might compare two or three experimental conditions — say, high-anonymity versus low-anonymity discussions of a controversial topic. A chamberline is Carrier's representation of exactly that: a complete journey that one group of participants will take through the experiment. Multiple chamberlines in the same experiment correspond to multiple between-subjects conditions.

A participant is assigned to a single chamberline at the start of their session and does not leave it. Within-subjects comparisons (the same person experiencing two manipulations) are typically built inside a chamberline, by sequencing chambers that differ along the manipulated dimension; between-subjects comparisons (different people in different manipulations) are built across chamberlines.

1.2.2 Assigning participants to chamberlines

Carrier offers four assignment methods, configured at the experiment level:

Method | What it does | When to use
Random | Each new participant is allocated to a chamberline uniformly at random. | The default for simple between-subjects studies.
Counterbalance | Carrier maintains running counts and assigns each new participant to the chamberline with the fewest participants so far. | When you want equal n per condition and cannot wait for the law of large numbers.
Survey-based | A field from the global pre-survey is read at assignment time and used to choose the chamberline. | When the condition depends on a participant attribute that they declare themselves (e.g. native language, political identification).
Fixed | All participants are placed in the same named chamberline regardless of anything else. | Pilots and demonstrations; reproducing exactly one condition.

Survey-based assignment is the most flexible. The global pre-survey runs before chamberline assignment, so any response collected there is available as a routing variable; this is the same mechanism described in §3.
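The four methods can each be sketched in a few lines of Python. This illustrates the logic in the table; it is not Carrier's code. The function names, the `routes` mapping used for survey-based assignment, and the counterbalancing tie-break (first chamberline in list order) are assumptions.

```python
import random
from collections import Counter


def assign_random(chamberlines: list[str], rng: random.Random) -> str:
    # Uniform draw: group sizes are balanced only in expectation.
    return rng.choice(chamberlines)


def assign_counterbalance(chamberlines: list[str], counts: Counter) -> str:
    # Chamberline with the fewest participants so far; min() keeps the first on ties.
    return min(chamberlines, key=lambda name: counts[name])


def assign_survey_based(pre_survey: dict[str, str], field: str, routes: dict[str, str]) -> str:
    # Read one global pre-survey answer and map it to a chamberline name.
    return routes[pre_survey[field]]


def assign_fixed(chamberline: str) -> str:
    # Pilots and demonstrations: everyone gets the same named chamberline.
    return chamberline


# Usage: counterbalancing two arms, recording each assignment as it is made.
arms = ["High Anonymity", "Low Anonymity"]
counts: Counter = Counter()
for _ in range(5):
    counts[assign_counterbalance(arms, counts)] += 1
```

Counterbalancing needs shared running counts, which is why the sketch passes `counts` in explicitly. Random assignment needs only a random source.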

1.2.3 What lives on a chamberline

A chamberline is, formally, a name plus an ordered list of chambers plus optional assignment criteria. There is no further configuration at this level — chamberlines are intentionally thin, so that a researcher can read the shape of an experiment by scanning the chamberline names and their chamber sequences in the builder.
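Because a chamberline is so thin, its whole definition fits in a few lines. The sketch below is a hypothetical Python rendering of that definition: a name, an ordered list of chambers, and optional assignment criteria. Chambers are reduced to their names. `Chamberline` and `outline` are illustrative names, not part of Carrier.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Chamberline:
    name: str
    chambers: list[str] = field(default_factory=list)  # ordered: participants move front to back
    criteria: dict[str, str] | None = None               # optional, e.g. for survey-based routing


def outline(chamberlines: list[Chamberline]) -> str:
    """One line per condition: the view a researcher scans to read an experiment's shape."""
    return "\n".join(f"{cl.name}: {' -> '.join(cl.chambers)}" for cl in chamberlines)


study = [
    Chamberline("High Anonymity", ["Briefing", "Deliberation", "Debrief"]),
    Chamberline("Low Anonymity", ["Briefing", "Deliberation", "Debrief"]),
]
print(outline(study))
```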

🎞 Builder walkthrough — Creating chamberlines for two conditions

The researcher opens the Builder, clicks “Add chamberline” twice, names the new chamberlines “High Anonymity” and “Low Anonymity”, and sets the experiment-level assignment method to “Random”. The two chamberlines appear side by side in the experiment-level outline pane, each ready to receive chambers.


1.3 Chambers: the unit of matched group

1.3.1 Why chambers

A chamber is the basic unit of togetherness in Carrier. Once a participant enters a chamber, they are matched with the other occupants of that chamber and they stay together until the chamber ends. Within a chamber, the cast does not change.

This is a deliberate constraint. Many interactive studies depend on participants having a stable conversational partner across multiple tasks — a discussion followed by a joint ranking, for example, or a chat followed by a rating of the other person. In Carrier, those sequential activities belong in the same chamber and share its participants. Crossing a chamber boundary, by contrast, dissolves the group: the next chamber re-matches its occupants from the pool of participants who have reached that point.

1.3.2 Matching at the chamber boundary

Matching happens once, at the start of each chamber. Participants who finish the previous chamber (or, for the first chamber in a chamberline, who have completed the global pre-survey) enter a waiting pool. As soon as enough participants are present to fill the chamber's required slots, the chamber begins.

A slot is a description of the kind of participant the chamber needs. Each slot has a type (human, LLM chatbot, scripted chatbot, or agent) and a role (communicator, mediator, or processor; see §2). For human slots, matching can additionally require certain variable values — for example, that the chamber contain one self-identified novice and one self-identified expert. Variable-based matching is the topic of §3.

When a matching attempt does not assemble enough participants within a configured interval, Carrier applies a chamber-level fallback policy. The fallback is part of the chamber's configuration; common choices are to keep the participant waiting, to fill the missing slot with a default agent, or to end the run gracefully with a completion code. The choice is the researcher's, not the participant's.
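A minimal sketch of one matching attempt and its fallback policies follows. Everything here is an assumption made for illustration:

  • the slot and pool representations and the policy names;
  • the greedy first-come fill (a production matcher might solve a bipartite assignment instead, since greedy filling can miss a valid grouping);
  • the premise that non-human slots can always be filled immediately.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class SlotSpec:
    kind: str                                               # "human", "llm_chatbot", "scripted_chatbot", "agent"
    role: str                                               # "communicator", "mediator", "processor"
    requires: dict[str, str] = field(default_factory=dict)  # variable constraints for human slots


@dataclass
class Waiting:
    participant_id: str
    variables: dict[str, str]


def fill(slots: list[SlotSpec], pool: list[Waiting], substitute_agent: bool = False) -> dict[int, str] | None:
    """Greedy fill in pool (arrival) order; returns slot index -> occupant, or None if incomplete."""
    taken: set[str] = set()
    filled: dict[int, str] = {}
    for i, slot in enumerate(slots):
        if slot.kind != "human":
            filled[i] = f"{slot.kind}#{i}"  # assumed always available
            continue
        match = next(
            (p for p in pool
             if p.participant_id not in taken
             and all(p.variables.get(k) == v for k, v in slot.requires.items())),
            None,
        )
        if match is not None:
            filled[i] = match.participant_id
            taken.add(match.participant_id)
        elif substitute_agent:
            filled[i] = f"default_agent#{i}"  # fallback: a default agent takes the empty slot
        else:
            return None  # keep waiting for more participants
    return filled


def on_timeout(policy: str, slots: list[SlotSpec], pool: list[Waiting]) -> tuple[str, dict[int, str] | None]:
    """Apply the chamber's fallback policy once the matching interval has elapsed."""
    if policy == "keep_waiting":
        return "waiting", None
    if policy == "fill_with_agent":
        return "started", fill(slots, pool, substitute_agent=True)
    if policy == "end_run":
        return "ended_with_completion_code", None
    raise ValueError(f"unknown fallback policy: {policy!r}")
```

Take a chamber with one novice slot, one expert slot, and an LLM mediator, and a pool holding only two novices. `fill` returns `None` until an expert arrives. The "fill_with_agent" policy instead starts the chamber with a default agent in the expert slot.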

1.3.3 What lives on a chamber

A chamber carries several pieces of configuration:

  • A name and identifier, used in the dashboard and in exported data.
  • A communication channel: text, audio, or video. The channel determines what the chat segment looks like and what data is recorded (transcripts, audio files, recorded video, or any combination).
  • A slot definition, listing the roles to be filled and the type each slot expects.
  • An ordered list of segments, described in §1.4.
  • An optional pre-survey shown to each participant before they enter the chamber, and an optional post-survey shown after they leave.
  • A maximum participant count — the total number of slots.

Chamber pre- and post-surveys are distinct from the experiment's global pre- and post-surveys. The global surveys run once per participant, at the very beginning and end of the run; the chamber surveys run once per chamber. Researchers typically use the global surveys for demographics and consent, and the chamber surveys for state measures that need to be taken before and after each manipulation.
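As a compact summary of the list above, here is a hypothetical Python record for a chamber, filled in with the configuration from the builder walkthrough below. The field names, the string codes for types and roles, and the `validate` checks are illustrative assumptions, not Carrier's schema. The one rule taken directly from the text is that the maximum participant count equals the number of slots.

```python
from __future__ import annotations

from dataclasses import dataclass, field

KINDS = {"human", "llm_chatbot", "scripted_chatbot", "agent"}
ROLES = {"communicator", "mediator", "processor"}
CHANNELS = {"text", "audio", "video"}


@dataclass
class Slot:
    kind: str
    role: str


@dataclass
class Chamber:
    chamber_id: str
    name: str
    channel: str
    slots: list[Slot]
    segments: list[str] = field(default_factory=list)     # segment types in timeline order
    pre_survey: list[str] = field(default_factory=list)   # chamber-level: runs on every entry
    post_survey: list[str] = field(default_factory=list)

    @property
    def max_participants(self) -> int:
        # The maximum participant count is the total number of slots.
        return len(self.slots)

    def validate(self) -> None:
        if self.channel not in CHANNELS:
            raise ValueError(f"unknown channel: {self.channel!r}")
        for slot in self.slots:
            if slot.kind not in KINDS or slot.role not in ROLES:
                raise ValueError(f"unknown slot type or role: {slot}")


deliberation = Chamber(
    chamber_id="A2",
    name="Deliberation",
    channel="text",
    slots=[Slot("human", "communicator"), Slot("human", "communicator"), Slot("llm_chatbot", "mediator")],
    segments=["chat"],
    pre_survey=["trust (7-point)"],
)
deliberation.validate()
```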

🎞 Builder walkthrough — Configuring a chamber

The researcher selects a chamber inside the “High Anonymity” chamberline, renames it to “Deliberation”, sets the communication channel to “text”, adds two human communicator slots and one LLM mediator slot, and attaches a brief chamber pre-survey containing a single 7-point trust item. The chamber summary updates to show three slots and one survey.


1.4 Segments: the unit of activity

1.4.1 Why segments

A chamber's segments are its inner timeline: the participants are already matched, and now they move together through an ordered series of activities. Each segment is a single, self-contained activity — showing a slide, holding a conversation, voting on options, ranking items, watching a video, or completing a short embedded survey — with its own timing and transition rules.

Segments are deliberately fine-grained. A “thirty-minute deliberation” study in Carrier is usually not a single thirty-minute chat segment but a sequence: a slide introducing the topic, a timer giving participants a moment to think, a chat segment for the deliberation itself, a ranking segment to record the group's collective answer, and a short survey at the end. Each piece is configured separately, recorded separately, and can be skipped, repeated, or replaced without touching the others.

1.4.2 The catalogue of segment types

Type | What participants do | Compatible with AI
instruction | Read formatted instructions and click Continue. | No
slide | View a static or dynamic content slide. | No
media | Watch a video or listen to an audio clip. | No
timer | Wait for a countdown to elapse (often used between activities). | No
survey | Complete a short embedded survey (Survey.js form). |
input | Type a free-text response into a single-question prompt. |
selection | Choose one or more options from a list (multiple choice / voting). | Yes
ranking | Drag items into preferred order. | Yes
chat | Hold a real-time conversation with the other chamber occupants. | Yes
task | Complete a custom interactive task defined by the experiment. |
attention-check | Pass a survey-based or camera-based attention check. | No

The “Compatible with AI” column indicates whether the segment can include contributions from any of the three non-human kinds — LLM chatbots, scripted chatbots, or agents. The conversational and choice-based types can; the read/listen/wait/attention types cannot, because there is nothing for a non-human participant to do.

1.4.3 The chat segment as a special case

Of the eleven segment types, the chat segment is the most elaborate. It is the only segment that:

  • Hosts a live, multi-party exchange among all participants in the chamber simultaneously.
  • Supports the full type-by-role matrix: any participant in the chamber — human, LLM chatbot, scripted chatbot, or agent — can be a communicator, a mediator, or a processor inside a chat segment (subject to the one exclusion in §2.1).
  • Can carry embedded child segments (see §1.4.4).
  • Drives the bulk of the trigger system: most of the trigger types described in §4 (keyword, message-count, sequence, after-bot-message, and so on) are evaluated against a chat segment's message stream.

For these reasons, the chat segment carries the most configuration. Its parameters include the communication channel inherited from the chamber, a per-message length cap, optional reaction support, participant-level send permissions, and the duration and transition mode of the chat itself.

When this guide refers to “the conversation” without further qualification, it means the messages exchanged inside a chat segment.

1.4.4 Embedded vs. standalone segments

Most segment types occupy the participant's entire screen for the duration of the segment. We call these standalone segments: they take their turn in the chamber's timeline, run to completion (or timeout), and then yield to the next segment.

Most non-chat segment types can additionally be configured to run embedded inside a chat segment. An embedded segment is rendered as an overlay on top of an ongoing chat, so that participants can act on it — vote, rank, write, watch a clip, read an instruction — without leaving the conversation. The chat continues to record messages in the background, and the embedded child appears only when its start trigger fires.

Embedded display is supported for the selection, ranking, input, task, slide, instruction, timer, media, and survey segment types. Two segment types are excluded: chat (it is the parent container; you cannot embed a chat inside a chat) and attention-check (no embedded renderer — attention checks always run standalone). On screens narrower than 900 px the runtime additionally falls back to standalone rendering for every segment, so the layout never breaks on mobile.
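The rules in the paragraph above reduce to a small decision function. This Python sketch is hypothetical and is not Carrier's runtime. It encodes the list of embeddable types and applies the 900 px fallback. It also treats a request to embed chat or attention-check as a configuration error, which is our assumption.

```python
EMBEDDABLE = {"selection", "ranking", "input", "task", "slide", "instruction", "timer", "media", "survey"}
NARROW_SCREEN_PX = 900  # below this width every segment renders standalone


def display_mode(segment_type: str, requested: str, screen_width_px: int) -> str:
    """Return "embedded" or "standalone" for one participant's screen."""
    if requested != "embedded":
        return "standalone"
    if segment_type not in EMBEDDABLE:
        # chat is the parent container; attention-check has no embedded renderer.
        raise ValueError(f"{segment_type!r} segments cannot be embedded")
    if screen_width_px < NARROW_SCREEN_PX:
        return "standalone"  # mobile fallback keeps the layout intact
    return "embedded"
```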

Embedded segments are useful when the activity is part of the conversation rather than an interruption to it. Two examples:

  • Periodic polling during a deliberation. Ask participants to vote at three points (after one minute, after three minutes, after five minutes) while they continue talking. Each vote is a separate embedded selection segment with a different embedded-start trigger.
  • Pacing a ranking activity to chat progress. Show a ranking overlay only once the conversation has produced enough material to rank — for example after a fixed number of chat messages, or after a fixed time offset from the start of the chat.

An embedded segment adds three configuration parameters beyond its standalone equivalent:

Parameter | What it does
Embedded start | When the overlay first appears, relative to the parent chat: immediately, after N seconds, after N chat messages, or after the previous embedded sibling ends (chained, with an optional delay).
Embedded stop | When the overlay closes: as soon as the participant submits / clicks Next (the default), or after a hard N-second timeout.
Embedded completion behaviour | What happens when the embedded child finishes: dismiss it (continue chatting), end the parent chat, lock the overlay so it cannot be reopened, or minimise it as a badge.

A chat segment with one or more embedded children may use an additional transition mode, embedded-complete, which ends the chat when every embedded child has completed. This is the canonical way to build a chat segment whose end is gated on the group having voted (or ranked, or read the instruction), rather than on a fixed duration. An optional fallback timeout on the chat itself is a safety net for chains whose start triggers might never fire: for instance, an “after N messages” trigger when the participants never reach that message count.
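The embedded start options and the embedded-complete transition can be sketched as two predicates evaluated against the state of the parent chat. The Python below is illustrative only. `EmbeddedStart`, its `kind` strings, and the state passed in are assumed representations, not Carrier's. That state is the elapsed seconds, the message count, and the time the previous sibling ended.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class EmbeddedStart:
    kind: str     # "immediate" | "after_seconds" | "after_messages" | "chained"
    n: float = 0  # seconds, message count, or (for "chained") delay in seconds


def has_started(start: EmbeddedStart, elapsed_s: float, messages: int,
                prev_sibling_ended_at: float | None) -> bool:
    """Should the overlay be showing yet? Times are seconds since the chat began."""
    if start.kind == "immediate":
        return True
    if start.kind == "after_seconds":
        return elapsed_s >= start.n
    if start.kind == "after_messages":
        return messages >= start.n
    if start.kind == "chained":
        return prev_sibling_ended_at is not None and elapsed_s >= prev_sibling_ended_at + start.n
    raise ValueError(f"unknown embedded start: {start.kind!r}")


def chat_should_end(children_done: list[bool], elapsed_s: float,
                    fallback_timeout_s: float | None = None) -> bool:
    """Embedded-complete: end once every child is done, or at the optional safety-net timeout."""
    if children_done and all(children_done):
        return True
    return fallback_timeout_s is not None and elapsed_s >= fallback_timeout_s


# Periodic polling: three votes at 1, 3, and 5 minutes into the chat.
polls = [EmbeddedStart("after_seconds", s) for s in (60, 180, 300)]
print([has_started(p, elapsed_s=200, messages=14, prev_sibling_ended_at=None) for p in polls])  # [True, True, False]
```

The fallback timeout in `chat_should_end` is what rescues a chat whose "after N messages" child never appears.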

Attention-check segments always run standalone. For any other activity that needs the participant's complete attention, keep the display mode at standalone as well, so that the chamber timeline yields to it fully instead of overlaying it on the chat.

🎞 Builder walkthrough — Embedding a vote in a chat

The researcher selects a chat segment in a chamber, adds a selection segment immediately after it in the segment timeline, and changes the selection segment's display mode to “Embedded (overlay on chat)”. They set the embedded start to “after 60 seconds” and the completion behaviour to “minimise”. The selection segment now appears in the timeline as an indented child of the chat, with a small “embedded” badge.


1.5 Timing, transitions, and participant pacing

Every segment has its own timing and transition rules, which together determine how long participants spend on it and how they advance. These are the four parameters that matter most:

  • Duration — the maximum time the segment may run. If left unset, the segment has no automatic deadline.
  • Minimum duration — the earliest moment at which a participant may advance. This is the standard way to enforce a floor on engagement (a “read for at least 30 seconds before continuing” instruction slide, for example).
  • Warning time — how long before an auto-advance the participant is warned. Useful to prevent surprise transitions in long segments.
  • Transition mode — the rule for moving on:
Mode | Description
Auto | The segment advances on its own when the duration elapses.
Manual | Each participant advances when they click Continue.
Sync | The segment advances only when every participant in the chamber is ready. Keeps the group in lock-step.
Host | The experimenter advances the segment from the dashboard.
Embedded-complete | (Chat segments only) Advances when every embedded child has completed.
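One way to read the four parameters together is as a single advancement check. This Python sketch is an interpretation, not Carrier's code, and rests on these assumptions:

  • the duration acts as a hard cap in every mode;
  • the minimum duration applies to every advance that is not the deadline;
  • readiness is modelled as sets of participant IDs.

`Timing`, `advances`, and `should_warn` are hypothetical names.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Timing:
    mode: str                        # "auto" | "manual" | "sync" | "host" | "embedded_complete"
    duration_s: float | None = None  # None: no automatic deadline
    min_duration_s: float = 0.0
    warning_s: float = 0.0


def should_warn(t: Timing, elapsed_s: float) -> bool:
    # Warn during the last `warning_s` seconds before an automatic deadline.
    return t.duration_s is not None and t.duration_s - t.warning_s <= elapsed_s < t.duration_s


def advances(t: Timing, elapsed_s: float, participant: str, ready: set[str],
             everyone: set[str], host_advanced: bool = False,
             children_done: list[bool] | None = None) -> bool:
    """Does `participant` move to the next segment at this moment?"""
    if t.duration_s is not None and elapsed_s >= t.duration_s:
        return True                      # the deadline applies in every mode (assumption)
    if elapsed_s < t.min_duration_s:
        return False                     # floor on engagement
    if t.mode == "auto":
        return False                     # only the deadline moves an auto segment
    if t.mode == "manual":
        return participant in ready      # each participant clicks Continue individually
    if t.mode == "sync":
        return everyone <= ready         # lock-step: every chamber member must be ready
    if t.mode == "host":
        return host_advanced             # experimenter advances from the dashboard
    if t.mode == "embedded_complete":
        return bool(children_done) and all(children_done)
    raise ValueError(f"unknown transition mode: {t.mode!r}")
```

The sync branch shows the cost discussed next: a single participant missing from `ready` holds back the whole chamber.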

The pacing choice has substantive consequences. A sync transition gives participants the experience of a shared rhythm, but it also means that the slowest participant determines the group's pace, which can be frustrating in long studies. A manual transition lets each participant move at their own speed, but it can break the group character of a chamber if used for the chat segment itself. A host transition is most useful during pilot testing — the experimenter can step the group through the timeline by hand to debug pacing.

1.6 Worked example: a two-condition deliberation study

To make the structure concrete, here is a complete shape for a small study comparing deliberation under high versus low anonymity.

Experiment

  • Global pre-survey: demographics, consent, a political-identification scale.
  • Chamberline assignment: Random.

Chamberline A — “High anonymity”

  1. Chamber A1 — Briefing. One human slot. Segments: an instruction segment with the study brief; a survey segment measuring baseline opinion on the discussion topic.
  2. Chamber A2 — Deliberation. Three human slots. Segments: a 30-second timer (a “settle in” pause); a 10-minute chat segment using anonymous display names; an embedded ranking child of the chat segment, triggered after 5 minutes, in which the group ranks five policy options.
  3. Chamber A3 — Debrief. One human slot. Segments: a survey segment measuring post-deliberation opinion and group satisfaction.

Chamberline B — “Low anonymity”

Identical to A, except that the chat segment in chamber B2 displays each participant's first name and a chosen avatar.

Global post-survey

A short reflection on the discussion, plus a payment code.

In this shape, the chamberlines isolate the manipulation (anonymity), the chambers isolate the matched groups (a single participant in A1 and A3, a trio in A2), and the segments isolate the activities (instruction, survey, timer, chat, ranking). The same three participants move together through chamber A2's segments because matching happens once at the start of A2 and not again until A3 begins.
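For readers who think in configuration files, the same study can be written down as plain data. The dictionary layout below is a hypothetical schema invented for this illustration; Carrier's builder and exports may represent it differently. It leaves slot roles and survey contents unspecified because the example does not state them. Deriving chamberline B from a deep copy of A makes the one-field difference between the conditions explicit.

```python
import copy

chamberline_a = {
    "name": "High anonymity",
    "chambers": [
        {"name": "Briefing", "slots": ["human"],
         "segments": [{"type": "instruction"}, {"type": "survey"}]},
        {"name": "Deliberation", "slots": ["human"] * 3,
         "segments": [
             {"type": "timer", "duration_s": 30},
             {"type": "chat", "duration_s": 600, "display_names": "anonymous"},
             # Embedded child of the chat: the group ranks five policy options after 5 minutes.
             {"type": "ranking", "display": "embedded", "embedded_start_s": 300, "n_items": 5},
         ]},
        {"name": "Debrief", "slots": ["human"],
         "segments": [{"type": "survey"}]},
    ],
}

# Chamberline B differs from A only in how the chat displays participants.
chamberline_b = copy.deepcopy(chamberline_a)
chamberline_b["name"] = "Low anonymity"
chamberline_b["chambers"][1]["segments"][1]["display_names"] = "first_name_and_avatar"

experiment = {
    "global_pre_survey": ["demographics", "consent", "political_identification"],
    "assignment": "random",
    "chamberlines": [chamberline_a, chamberline_b],
    "global_post_survey": ["reflection", "payment_code"],
}
```

The deep copy matters: a shallow copy would share the nested chamber dictionaries, and editing B's chat would silently change A's as well.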

Researcher: this is a fictional placeholder. We will substitute one of your real studies here when you provide it.

1.7 Common pitfalls

A handful of design mistakes recur often enough to be worth flagging.

  • Putting two unrelated activities in the same chamber. Because participants are matched once per chamber, a chamber should contain only activities that benefit from sharing the same cast. If two activities do not need to share participants, they belong in separate chambers — possibly in the same chamberline, possibly not.
  • Confusing chamber surveys with global surveys. Chamber pre/post-surveys run every time the chamber is entered; experiment-level global surveys run once per participant. State measures (mood, trust, fatigue) typically belong in chamber surveys; trait measures (demographics, personality) in the global pre-survey.
  • Long, structureless chat segments. A common reflex is to make the chat segment thirty minutes long with no embedded structure. This makes participant pacing harder to control and complicates the analysis. Breaking the deliberation into a short framing slide, a chat with one or two embedded voting children, and a closing reflection survey gives you both finer-grained timing control and richer data.
  • Choosing sync for solo activities. A sync transition only makes sense if there is more than one participant in the chamber. For solo activities (instruction reading, individual surveys), prefer auto or manual.
  • Random chamberline assignment when n is small. With fewer than roughly thirty participants per condition, random assignment can produce noticeable imbalance. Prefer counterbalance for small-n studies.
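The small-n warning is easy to check by simulation. This Python sketch is ours, not Carrier's. It assigns n participants in total across two arms, either uniformly at random or to the currently smallest arm. It then reports the gap between the largest and smallest arm as a share of n; for random assignment that share is averaged over many simulated studies.

```python
import random


def relative_imbalance(n: int, method: str, rng: random.Random, arms: int = 2) -> float:
    """(largest arm - smallest arm) / n after assigning n participants."""
    counts = [0] * arms
    for _ in range(n):
        if method == "random":
            arm = rng.randrange(arms)
        else:  # counterbalance: currently smallest arm, lowest index on ties
            arm = counts.index(min(counts))
        counts[arm] += 1
    return (max(counts) - min(counts)) / n


rng = random.Random(1)
trials = 2000
for n in (20, 60, 200):
    mean_random = sum(relative_imbalance(n, "random", rng) for _ in range(trials)) / trials
    print(f"n={n}: random {mean_random:.3f}, counterbalance {relative_imbalance(n, 'counterbalance', rng):.3f}")
```

Counterbalancing keeps the gap at zero or one participant whatever n is. With random assignment, the expected gap as a share of n shrinks only roughly in proportion to 1/√n, so small studies feel the imbalance most.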

Roles — who occupies the slots defined here and what they can do inside a chamber — are the subject of §2. The variables that drive chamberline assignment and slot matching are the subject of §3.

Part I · §2. Roles: Communicator, Mediator, Processor

Three orthogonal roles — communicate, facilitate, assist composition — each occupying a distinct zone of the participant's screen. Any type (human, LLM chatbot, scripted chatbot, agent) can fill any role, with one exception.

2.1 Why roles exist

Chapter 1 defined the shape of an experiment but not its cast. A chamber declares a list of slots, each of which awaits a participant; the question this chapter answers is what a participant who fills a slot can actually do.

Carrier separates that question into two orthogonal axes:

  • The type of a participant — what they are. Four types: a real human, an LLM chatbot (language-model-driven, chat only), a scripted chatbot (rule-driven, chat only), or an agent (an autonomous Claude Agent with built-in tools for reading documents, running code, and browsing the web — see the note on non-human participants).
  • The role of a participant — how they take part. Three roles: a communicator, a mediator, or a processor.

The four kinds of participant differ in two practical respects — how they produce what they say, and how reproducible they are across sessions:

Kind | How it produces output | Reproducibility
Human | The person types | Whatever the person does
LLM chatbot | An LLM is called once per turn with the conversation history; output is a chat message or silence | Variable across sessions
Scripted chatbot | Pre-written rules fire when their trigger conditions match | Identical across sessions
Agent | The Claude Agent loops between LLM calls and built-in tool calls (read files, run code, browse) until it decides it is ready to speak, then produces a chat message grounded in what it found | Variable across sessions
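The difference between the LLM chatbot and agent rows is easiest to see as control flow: an LLM chatbot makes one model call per turn, while an agent loops. The Python below is a hypothetical sketch with a stubbed model and a single stub tool. It does not use or imitate the actual Claude Agent API. `Step`, `run_agent_turn`, the way tool results are appended to the context, and the step budget are all our assumptions.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable


@dataclass
class Step:
    """One (stubbed) model decision: call a tool, or speak."""
    tool: str | None = None
    argument: str = ""
    message: str | None = None


def run_agent_turn(model: Callable[[list[str]], Step],
                   tools: dict[str, Callable[[str], str]],
                   history: list[str], max_steps: int = 8) -> str | None:
    """Alternate model calls and tool calls until the model produces a chat message."""
    context = list(history)  # the agent's working context starts from the conversation
    for _ in range(max_steps):
        step = model(context)
        if step.message is not None:
            return step.message  # ready to speak, grounded in what it gathered
        result = tools[step.tool](step.argument)  # e.g. read a file from the document area
        context.append(f"[{step.tool}({step.argument})] {result}")
    return None  # step budget exhausted: stay silent (our assumption)


# Stub model: read the study brief once, then answer from it.
documents = {"brief.md": "The study compares two levels of anonymity."}


def stub_model(context: list[str]) -> Step:
    reads = [entry for entry in context if entry.startswith("[read_file")]
    if not reads:
        return Step(tool="read_file", argument="brief.md")
    return Step(message="According to the brief: " + reads[-1].split("] ", 1)[1])


reply = run_agent_turn(stub_model, {"read_file": documents.__getitem__},
                       ["Participant: what is this study about?"])
```

An LLM chatbot corresponds to calling `model` exactly once and returning its message. The loop, and the tool results it accumulates, is what distinguishes an agent.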

The type × role matrix is the most important table in this chapter, because almost every design decision below either depends on it or is constrained by it:

Type \ Role | Communicator | Mediator | Processor
Human | Yes | Yes | Yes
LLM chatbot | Yes | Yes | Yes
Scripted chatbot | Yes | Yes | No
Agent | Yes | Yes | Yes

The only forbidden combination is scripted chatbot as processor. The reason is worth knowing. Processors operate by reading drafts, generating suggestions, or interrupting composition, and these activities demand the kind of open-ended language understanding only a human or a language model can provide. A pre-scripted rule set cannot supply that understanding, so it cannot meaningfully act as a processor.
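Because the matrix has a single forbidden cell, a builder-side check is short. This Python sketch is illustrative only; the type and role codes and the error-message format are our assumptions, not Carrier's validation code.

```python
TYPES = ("human", "llm_chatbot", "scripted_chatbot", "agent")
ROLES = ("communicator", "mediator", "processor")
FORBIDDEN = {("scripted_chatbot", "processor")}  # the one excluded type-role pair


def is_allowed(kind: str, role: str) -> bool:
    if kind not in TYPES or role not in ROLES:
        raise ValueError(f"unknown type or role: {kind!r}, {role!r}")
    return (kind, role) not in FORBIDDEN


def slot_errors(slots: list[tuple[str, str]]) -> list[str]:
    """Describe every slot whose type cannot fill its role."""
    return [f"slot {i}: a {kind} cannot be a {role}"
            for i, (kind, role) in enumerate(slots) if not is_allowed(kind, role)]
```

Of the twelve type-role pairs, eleven pass `is_allowed`.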

Everything else is supported. That makes it possible to write one experimental design and instantiate the same role with a human in one condition and a language model in another — which is the single most important affordance Carrier offers for studies that compare human and machine behaviour. It also means that the choice between an LLM chatbot, a scripted chatbot, and an agent can itself be the manipulation: same role, same instructions, three different kinds of non-human partner — one talking from training-time knowledge, one talking from a written rulebook, one talking from documents it has just read.

2.2 The spatial model: three zones in the interface

A useful way to keep the roles separate in your mind is to remember that each occupies a distinct zone of the participant's screen during a chat segment:

  • Mediator zone: top of chat / overlay banners. Styled announcements, broadcasts, facilitation cues.
  • Communicator zone: the main message area. Messages exchanged among the chamber's communicators (human, AI, or scripted) appear here side by side.
  • Processor zone: input area / composition sidebar. Draft review, generation, in-progress suggestions.
The three role zones in the chat-segment interface.

This is more than visual hygiene. Mediators broadcast; communicators converse; processors assist composition before words enter the conversation. The separation of channel is what makes it possible to study facilitation, conversation, and composition assistance independently of one another — or to combine them deliberately, knowing that the layers do not bleed into each other.

2.3 Communicator

2.3.1 Framing

The communicator is the primary interactive participant. Whatever the experiment ultimately studies, communicators are the ones doing the studied behaviour. All communicators — whether a human, an LLM chatbot, a scripted chatbot, or an agent — operate in the same message space: their messages appear in the main chat area alongside each other in the order they were sent.

The animating design principle is interaction parity. A human communicator and any non-human communicator send messages through the same mechanism, appear in the same UI, and are indistinguishable to other participants unless the researcher explicitly marks them otherwise. This is what makes it possible to run human–human, human–machine, and machine–machine conditions of the same design without rebuilding the experiment.

2.3.2 The communicator design surface

Three dimensions of configuration matter for any communicator. They are independent: settings on one dimension do not constrain settings on the others.

Identity. Who appears in the chat, by what name, and with what disclosure?

Aspect | Choices | What you decide
Source of identity | User-provided · Configured · Auto-generated | Whether the participant chooses their own display name, you pre-set it, or the platform invents one.
Visibility | Visible · Hidden | Whether the communicator appears in the participant list at all.
Type disclosure | Disclosed · Blinded | Whether other participants are told that this communicator is human or AI.

Human communicators typically go through an identity-setup flow (choose a display name, pick an avatar) before entering the experiment. Non-human communicators — LLM chatbots, scripted chatbots, and agents alike — carry pre-configured identities. The separation makes blinding possible: a participant cannot tell from the interface alone whether a fellow communicator is human or a machine.

Input control. When can the communicator speak, and who decides?

Aspect | Choices | What you decide
Initial state | Enabled · Delayed · Conditional | Whether the communicator can send messages from the moment the chat begins.
Enable conditions | Time-based · Message-count · Bot-trigger · Participant-message | What event lifts a delay or unlocks input.
External control | None · Mediator-controlled | Whether a mediator can disable or enable this communicator's input dynamically during the chat.

Carrier's chat input is not simply on or off. A communicator might begin with input disabled, wait for three messages from other participants, and then become enabled. Or a mediator (see §2.4) might disable and re-enable input on the fly to enforce turn-taking. Combining these primitives is how you build contrasts such as simultaneous vs. sequential discussion.
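The delayed-input example above can be sketched as a small state object. This is a minimal illustration that assumes a message-count enable condition plus a mediator override flag. The class `InputControl` and its fields are hypothetical and are not Carrier's configuration schema.

```python
from dataclasses import dataclass


@dataclass
class InputControl:
    """Hypothetical model of one communicator's input gate (not Carrier's schema)."""

    owner: str
    enable_after_messages_from_others: int = 0  # 0 = enabled from the start
    others_seen: int = 0
    mediator_disabled: bool = False  # set by a mediator's external control

    @property
    def enabled(self) -> bool:
        # Input opens once the message-count delay is met, unless a mediator has closed it.
        delay_met = self.others_seen >= self.enable_after_messages_from_others
        return delay_met and not self.mediator_disabled

    def on_message(self, sender: str) -> None:
        # Only messages from other participants count toward the delay.
        if sender != self.owner:
            self.others_seen += 1


gate = InputControl(owner="p1", enable_after_messages_from_others=3)
gate.on_message("p2")
gate.on_message("p3")
print(gate.enabled)  # False: only two messages from others so far
gate.on_message("p2")
print(gate.enabled)  # True
```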

Message capabilities. What kinds of messages can be sent and acted on?

Aspect | Choices | What you decide
Content types | Text · Media (audio/video) | What the communicator can attach to a message.
Reactions | Enabled · Disabled | Whether emoji reactions are available to the communicators.
Reporting | Enabled · Disabled | Whether a participant can flag a message for the experimenter.

2.3.3 Communicator subtypes at a glance

The four communicator subtypes correspond to the four types in the matrix:

  • Human communicator. A real participant joining via browser. They enter the matching queue, are matched into a chamber, and join its chatroom. The platform tracks their socket connection with a heartbeat; on disconnect they can be reconnected within the session. All messages, survey responses, timestamps, and activity events are recorded.
  • LLM-chatbot communicator. Configured by provider (OpenAI / Anthropic / Google / compatible), model, system prompt, temperature, and response logic (when to speak, when to stay silent — see §4.3). The model is given the conversation so far on each turn and produces a chat message (or stays silent). LLM chatbots never enter the matching queue: once the human slots in a chamber are filled, they are spawned into the chatroom automatically. The platform supports multi-step LLM chains for advanced configurations (a generation step, then a critique step, then a rewrite step).
  • Scripted-chatbot communicator. Configured by a set of triggers (§4). Like LLM chatbots, scripted chatbots do not enter the matching queue — they are spawned into the chatroom after human slots are filled. Their behaviour is deterministic and replayable: the same input sequence produces the same outputs across sessions, which makes them the right choice for confederate roles and any design in which conversational reproducibility matters.
  • Agent communicator. A Claude Agent (Anthropic) scoped to a particular document area: typically the study materials, a reference corpus, or a configured dataset. Before producing each message, the agent's underlying model calls its built-in tools (file reading, code execution, web browsing) in a loop to look things up, run small computations, or check a citation. The message it eventually sends is grounded in what it has retrieved. Agent communicators are the natural choice when the experiment needs a conversational partner that can answer with evidence, such as a study-materials expert that quotes the brief verbatim, a fact-checker that can browse during the discussion, or a domain assistant that can re-read the dataset before stating a number.

2.3.4 Research uses of the communicator role

A short, indicative list of designs that map cleanly onto different communicator configurations:

Design | Communicator configuration | What it studies
Group discussion | 2+ human communicators | Opinion formation, group dynamics, polarisation
Human–AI dyad | 1 human + 1 LLM communicator | Trust, persuasion, perception of machine partners
Confederated AI | 1 human + N LLM communicators, blinded | Conformity, majority influence
Turn-taking study | Humans with delayed input control | Sequential vs. simultaneous discussion
LLM-to-LLM comparison | 2 LLM communicators with different system prompts | Model behaviour under controlled prompting

2.4 Mediator

2.4.1 Framing

The mediator is a facilitator. Unlike a communicator, a mediator does not converse on equal footing with the others; it orchestrates the conversation. It differs from a communicator in four qualitative ways:

  1. Universal visibility. A mediator sees every message in the chat, regardless of who it was addressed to.
  2. Distinct delivery. A mediator's output appears as styled announcements at the top of the chat, not as chat bubbles.
  3. Control capabilities. A mediator can act on the chat — disable a communicator's input, prompt a specific participant, highlight a message — not only speak into it.
  4. Event awareness. A mediator reacts to aggregate patterns (number of messages, time elapsed, idle participants) at least as readily as to individual messages.

These four capabilities together describe the moderator / facilitator / researcher dynamic that has no analogue in plain group chat.

2.4.2 The broadcast system

Mediator messages are called broadcasts. A broadcast is styled along three independent axes:

Axis | Choices | Effect
Style | Facilitator · Announcement · System | Icon and tone of voice in the rendered banner.
Priority | Normal · Important · Urgent | Visual emphasis and how long the banner stays before auto-dismissing.
Persistence | Dismissible · Persistent | Whether the participant can dismiss the banner.

Broadcasts can also be targeted: at every participant in the chamber, at communicators only, at a specific named participant, or at all participants in a specific role. This is the mechanism by which a mediator can deliver a private prompt to one communicator without the rest of the group seeing it.
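Targeting amounts to filtering the chamber's participants before a broadcast is delivered. The sketch below assumes that each participant carries a name and a role. The function `broadcast_recipients` and its target labels are illustrative names, not part of Carrier.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Participant:
    name: str
    role: str  # "communicator", "mediator", or "processor"


def broadcast_recipients(participants: List[Participant], target: str,
                         value: Optional[str] = None) -> List[str]:
    """Return the names of participants who should see a broadcast.

    target is one of the hypothetical labels "all", "communicators",
    "participant" (value = a name), or "role" (value = a role name).
    """
    if target == "all":
        chosen = participants
    elif target == "communicators":
        chosen = [p for p in participants if p.role == "communicator"]
    elif target == "participant":
        chosen = [p for p in participants if p.name == value]
    elif target == "role":
        chosen = [p for p in participants if p.role == value]
    else:
        raise ValueError(f"unknown target: {target!r}")
    return [p.name for p in chosen]


room = [Participant("Ana", "communicator"), Participant("Ben", "communicator"),
        Participant("Fac", "mediator")]
print(broadcast_recipients(room, "participant", "Ben"))  # ['Ben']: a private prompt
```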

2.4.3 Facilitation actions (LLM-driven mediators)

When the mediator is driven by a language model (either an LLM chatbot or a Claude Agent), it has access to an action vocabulary that goes beyond plain broadcasting. The model emits these intervention actions, five in all and listed in the table below, as part of its structured response on each turn. The most consequential is disable_chat.

A mediator implemented as a scripted chatbot can fire the same actions, but their parameters must be baked into the trigger configuration ahead of time (§4.5); the scripted bot cannot choose the action contextually based on what was just said.

Note that this intervention-action vocabulary (disable_chat, enable_chat, prompt_participant, highlight_message, request_attention) is distinct from the built-in tools of a Claude Agent (file reading, code execution, web browsing). They live on different channels:

  • The intervention actions are how a mediator acts on the chat — they affect what participants see and what they are allowed to do.
  • An agent's built-in tools are how it gathers information for itself — they affect what the agent knows when it speaks, but the participants see only the eventual message.

A Claude Agent acting as mediator has access to both: it can read its configured document area before responding, and fire intervention actions alongside its broadcast. An LLM chatbot acting as mediator has only the intervention actions; a scripted chatbot acting as mediator has only the pre-baked variants.

Action | Target | Description
disable_chat | A specific communicator, or all | Temporarily prevents the target from sending messages. The release is governed by a set of conditions described below.
enable_chat | A specific communicator, or all | Explicitly lifts a disable; takes effect immediately.
prompt_participant | A specific communicator | Sends a private encouragement or prompt visible only to that participant.
highlight_message | A specific message | Marks a past message as highlighted in the chat for a configurable duration.
request_attention | A specific communicator | Triggers a visual or audio cue to draw the participant's attention.

The disable_chat action supports composite release conditions — a list of conditions combined with an any / all connector that determines when the disable lifts. The available conditions are:

  • Timeout — after a fixed duration.
  • All others responded — when every other communicator has sent a message of at least a configured minimum length.
  • Message count — after a fixed total number of messages have been sent in the chat.
  • Keyword mentioned — when a designated keyword (or any keyword from a list) is used by any participant, or by one specified participant.
  • Participant message — when a designated participant has sent a configured number of messages.
  • Mediator release — released only by an explicit subsequent enable_chat.
  • Segment change — released when the chamber transitions to its next segment.

Composite release conditions are the building blocks for richly specified turn-taking protocols. “Wait until everyone else has responded, or sixty seconds, whichever comes first” is a single disable_chat action with two conditions and an any connector.
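The sixty-second example can be expressed as two predicates joined by a connector. This sketch makes two simplifying assumptions: elapsed time is measured in seconds, and reply lengths are tracked per communicator from the moment of the disable. `ChatState`, `timeout`, `all_others_responded`, and `released` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class ChatState:
    """Hypothetical snapshot of a chat after a disable_chat has been issued."""

    elapsed_s: float = 0.0
    others: List[str] = field(default_factory=list)  # the other communicators
    # Length of the longest message each other communicator has sent since the disable.
    reply_lengths: Dict[str, int] = field(default_factory=dict)


Condition = Callable[[ChatState], bool]


def timeout(seconds: float) -> Condition:
    return lambda s: s.elapsed_s >= seconds


def all_others_responded(min_length: int = 1) -> Condition:
    return lambda s: all(s.reply_lengths.get(p, 0) >= min_length for p in s.others)


def released(conditions: List[Condition], connector: str, state: ChatState) -> bool:
    """Combine release conditions with an 'any' or 'all' connector."""
    results = [condition(state) for condition in conditions]
    if connector == "any":
        return any(results)
    if connector == "all":
        return all(results)
    raise ValueError(f"unknown connector: {connector!r}")


# "Wait until everyone else has responded, or sixty seconds, whichever comes first."
rule = [all_others_responded(min_length=10), timeout(60)]
state = ChatState(elapsed_s=45, others=["Ana", "Ben"], reply_lengths={"Ana": 24})
print(released(rule, "any", state))  # False: Ben has not replied and 45 s < 60 s
state.elapsed_s = 61
print(released(rule, "any", state))  # True: the timeout has passed
```

Changing the connector to `all` in the same example would keep the chat disabled until both conditions hold. This is the over-specification risk noted under common pitfalls.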

2.4.4 Mediator-specific triggers

Mediators inherit the standard trigger system (§4), but five extra trigger types are particularly suited to facilitation:

Trigger | When it fires | Typical use
Periodic | At regular intervals after the chat begins | Recurring summaries, scheduled check-ins.
Aggregate | After N messages have accumulated within a time window | Batched synthesis or pattern detection.
Topic-detected | When a designated keyword pattern appears | Topic steering, off-topic detection.
Activity-timeout | When no messages have been sent for N milliseconds | Idle prompts, participation encouragement.
Participant-count | When the number of active participants crosses a threshold | Reacting to departures, waiting for arrivals.

These trigger types, combined with the action vocabulary above, are what make automated AI facilitation in Carrier expressive: a mediator can be configured to periodically summarise the discussion every two minutes, prompt any participant who has been silent for ninety seconds, or steer the conversation back on topic when a specified keyword has not appeared in the last thirty messages.
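Two of these behaviours reduce to simple checks: how many times a periodic trigger has fired, and whether a keyword has been absent from a trailing window of messages. Both are sketched below. Several details are assumptions made for illustration: the function names, whole-word case-insensitive matching, and waiting until the window is full before reporting absence.

```python
import re
from typing import List


def periodic_fire_count(interval_ms: int, elapsed_ms: int) -> int:
    """Number of times a periodic trigger has fired after elapsed_ms of chat."""
    return elapsed_ms // interval_ms


def keyword_absent(messages: List[str], keyword: str, window: int = 30) -> bool:
    """True once `window` messages exist and none of the last `window` mention keyword."""
    if len(messages) < window:
        return False  # too early to call the discussion off topic
    pattern = re.compile(rf"\b{re.escape(keyword)}\b", re.IGNORECASE)
    return not any(pattern.search(m) for m in messages[-window:])


print(periodic_fire_count(120_000, 430_000))  # 3 summaries in just over seven minutes
```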

2.4.5 Activity monitoring

A mediator can optionally maintain an activity model of each communicator. The activity-monitor settings are:

  • An idle threshold (milliseconds of inactivity before a participant is considered idle), and a list of idle prompts to send when it is reached.
  • An active threshold (messages-per-minute rate at which a participant is considered to be dominating), and a list of active prompts to send when it is reached.

This is the building block for participation-equity interventions: a mediator that automatically prompts quiet members and gently invites dominant members to “make space for others”.
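The two thresholds can be read as a classification run over each communicator's message timestamps. This minimal sketch rests on three assumptions: timestamps are in milliseconds, the messages-per-minute rate uses a one-minute trailing window, and a participant who has never spoken is measured from chat start. `classify_activity` is a hypothetical helper, and choosing which idle or active prompt to send is left out.

```python
from typing import Dict, List, Tuple


def classify_activity(
    message_times_ms: Dict[str, List[int]],
    now_ms: int,
    idle_threshold_ms: int,
    active_threshold_per_min: float,
    window_ms: int = 60_000,
    chat_start_ms: int = 0,
) -> Tuple[List[str], List[str]]:
    """Split communicators into idle and dominating lists (hypothetical helper)."""
    idle, dominating = [], []
    for name, times in message_times_ms.items():
        # Someone who has never spoken is measured from the start of the chat.
        last = max(times, default=chat_start_ms)
        if now_ms - last >= idle_threshold_ms:
            idle.append(name)
        # Messages-per-minute over the trailing window.
        recent = [t for t in times if now_ms - t <= window_ms]
        if len(recent) * 60_000 / window_ms >= active_threshold_per_min:
            dominating.append(name)
    return idle, dominating


times = {
    "Ana": [10_000],
    "Ben": [100_000, 110_000, 120_000, 130_000, 140_000, 150_000],
    "Cal": [140_000],
}
idle, dominating = classify_activity(times, now_ms=160_000,
                                     idle_threshold_ms=90_000,
                                     active_threshold_per_min=5)
print(idle, dominating)  # ['Ana'] ['Ben']
```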

2.4.6 Research uses of the mediator role

Design | Mediator configuration | What it studies
Automated facilitator | LLM mediator with periodic + topic triggers | Effectiveness of automated facilitation.
Turn-taking enforcement | LLM mediator using disable_chat with all_others_responded | Effects of structured discussion on quality.
Participation equity | LLM mediator with activity monitoring and idle prompts | Interventions on balanced participation.
Discussion steering | LLM mediator with topic-detected triggers | Topic-management strategies.
Human facilitator | Human in mediator role, broadcast capability | Expert facilitation patterns.
Timed interventions | Scripted mediator with periodic broadcasts | Information-injection effects.
Real-time summarisation | LLM mediator with aggregate triggers and a synthesis prompt | Impact of real-time summaries on deliberation.

2.5 Processor

2.5.1 Framing

The processor is the most novel of the three roles. It operates in the input composition space rather than the message exchange space — that is, it acts before a communicator's text becomes a message in the chat. Where a mediator sits over the conversation and a communicator sits inside it, a processor sits alongside the input box, reviewing what the communicator is about to send, generating drafts on request, or offering live suggestions as the communicator types.

The role exists because the act of composing a message is a distinct site of intervention — distinct from facilitating the conversation, and distinct from participating in it. A study that wants to ask “what happens when an AI helps people write what they say?” needs a place to put that AI, and that place is not the chat.

2.5.2 The design space, in three dimensions

It is tempting to think of processors in terms of “review” and “generate” alone, but the design space is richer. Three independent dimensions structure it.

Initiation. Who starts an interaction?

  • Communicator-initiated — the communicator explicitly asks (submits a draft for review, clicks Generate).
  • Processor-initiated — the processor offers help without being asked (sends a suggestion).
  • System-initiated — the platform triggers an interaction based on an event (a pause is detected, a timer elapses).

Control. Who controls the final output?

  • Communicator retains control — the communicator always decides what is actually sent (accept / reject / edit the processor's output).
  • Processor retains control — the processor decides what the communicator sees (filtering, rewriting).
  • Shared control — both can edit; the final version is negotiated.

Timing. When does the interaction happen?

  • Pre-send — before the message enters the chat (review, approval).
  • During composition — while the communicator is typing (real-time suggestions).
  • On-demand — when explicitly requested (a Generate button).

Not every combination of these dimensions is feasible. LLM API latency makes true simultaneous co-editing impractical for LLM processors; processor-controlled rewriting risks undermining the validity of self-report studies by replacing the communicator's voice with the processor's. Carrier's design therefore commits to one principle and offers three concrete modes.

The committed principle: communicator agency. Whatever the processor does, the communicator retains final control over what enters the chat. The processor never bypasses the communicator's agency.

2.5.3 The three modes

Review

Communicator writes draft  →  submits for review  →  processor gives feedback
                                                     ↓
Communicator accepts / edits / rejects  →  message sent to chat
Dimension | Setting
Initiation | Communicator-initiated (on-submit) or system-initiated (pause-triggered)
Control | Communicator
Timing | Pre-send

Configuration parameters:

  • Trigger: on-submit (the communicator clicks Send and the processor steps in) or pause-triggered (the processor steps in automatically after the communicator stops typing for a configured period).
  • Pause timeout: the inactivity window, in milliseconds, used only when the trigger is pause-triggered.
  • Feedback format: freeform (the processor writes a free-text response), inline-edit (the processor proposes an edited version of the draft), or approve-reject (the processor returns a binary verdict).
  • Mandatory: whether the communicator must address the feedback before they can send the message.
  • Max rounds: a cap on the number of review iterations on a single draft.
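One way to keep these parameters consistent is to validate them together. The sketch assumes string values that match the option names above. It treats a pause timeout on an on-submit trigger as an error, which is a design choice; silently ignoring it would also be reasonable. `ReviewConfig` and its defaults are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

TRIGGERS = {"on-submit", "pause-triggered"}
FEEDBACK_FORMATS = {"freeform", "inline-edit", "approve-reject"}


@dataclass
class ReviewConfig:
    """Hypothetical container for the review-mode parameters listed above."""

    trigger: str = "on-submit"
    pause_timeout_ms: Optional[int] = None
    feedback_format: str = "freeform"
    mandatory: bool = False
    max_rounds: int = 1

    def validate(self) -> None:
        if self.trigger not in TRIGGERS:
            raise ValueError(f"unknown trigger: {self.trigger!r}")
        if self.feedback_format not in FEEDBACK_FORMATS:
            raise ValueError(f"unknown feedback format: {self.feedback_format!r}")
        if self.trigger == "pause-triggered":
            if self.pause_timeout_ms is None or self.pause_timeout_ms <= 0:
                raise ValueError("pause-triggered review needs a positive pause timeout")
        elif self.pause_timeout_ms is not None:
            # The pause timeout is only meaningful for pause-triggered review.
            raise ValueError("pause timeout set but trigger is on-submit")
        if self.max_rounds < 1:
            raise ValueError("max_rounds must be at least 1")

    def another_round_allowed(self, rounds_used: int) -> bool:
        # Enforces the cap on review iterations for a single draft.
        return rounds_used < self.max_rounds


cfg = ReviewConfig(trigger="pause-triggered", pause_timeout_ms=4_000,
                   feedback_format="inline-edit", max_rounds=2)
cfg.validate()
print(cfg.another_round_allowed(1), cfg.another_round_allowed(2))  # True False
```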

Research applications include writing-quality improvement, self-reflection, metacognitive scaffolding, and peer-review dynamics.

Generate

Communicator clicks Generate  →  processor creates a draft
                                  ↓
Communicator edits  →  message sent to chat
Dimension | Setting
Initiation | Communicator-initiated (explicit request)
Control | Communicator (edits before sending)
Timing | On-demand

Research applications include AI-ghostwriting perception, co-authoring dynamics, and the trade-off between generation quality and editing effort.

Real-time assist (human processors only)

Communicator types  →  the draft streams to a human processor
                       ↓
Processor sends ephemeral suggestions  →  communicator accepts or dismisses
Dimension | Setting
Initiation | System-initiated (continuous streaming)
Control | Communicator (suggestions are ephemeral)
Timing | During composition

This mode is restricted to human processors because LLM round-trip latency makes truly live suggestions impractical. The use cases — peer coaching, expertise-based assistance, live mentoring during composition — all assume a human partner at the other end.

2.5.4 Phase scripts: dynamic processor behaviour

A processor does not have to behave the same way throughout a chamber. Carrier supports phase scripts — an ordered list of phases, each with a mode and a transition trigger that advances to the next phase.

phases: [
    { id: "warmup",     mode: "disabled", transition: { on-start } },
    { id: "review",     mode: "review",   transition: { message-count: 5 } },
    { id: "generate",   mode: "generate", transition: { time-elapsed: 180000 } }
]

Each transition is the condition that ends its phase. In this example the processor is disabled only until the segment begins. It then runs in review mode until five messages have been sent in the chatroom, and finally in generate mode until three minutes have elapsed since segment start. The supported transition trigger types are:

Trigger | When the next phase activates
On-start | Immediately when the segment begins.
Message-count | After N messages have been sent in the chatroom.
Time-elapsed | After N milliseconds from segment start.
Keyword | When a designated keyword appears in the chat.
Participant-event | When a participant joins or leaves.
Manual | When the experimenter advances from the dashboard.
On-end | When the segment ends.
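The example script can be evaluated with a short loop. This sketch assumes, as in the definition above, that each phase's transition ends that phase and activates the next. It also assumes that message counts and elapsed time are measured from segment start. Only four trigger types are modelled, and the names `PHASES`, `transition_fired`, and `active_phase` are illustrative.

```python
from typing import List, Optional

# The phase script from the text, transcribed into Python literals.
PHASES = [
    {"id": "warmup", "mode": "disabled", "transition": ("on-start", None)},
    {"id": "review", "mode": "review", "transition": ("message-count", 5)},
    {"id": "generate", "mode": "generate", "transition": ("time-elapsed", 180_000)},
]


def transition_fired(transition, started: bool, messages: List[str], elapsed_ms: int) -> bool:
    kind, arg = transition
    if kind == "on-start":
        return started
    if kind == "message-count":
        return len(messages) >= arg
    if kind == "time-elapsed":
        return elapsed_ms >= arg
    if kind == "keyword":
        return any(arg.lower() in m.lower() for m in messages)
    raise ValueError(f"trigger not modelled in this sketch: {kind!r}")


def active_phase(phases, started: bool, messages: List[str], elapsed_ms: int) -> Optional[str]:
    """Return the id of the first phase whose transition has not yet fired."""
    for phase in phases:
        if not transition_fired(phase["transition"], started, messages, elapsed_ms):
            return phase["id"]
    return None  # assumption: the script is finished once every transition has fired


print(active_phase(PHASES, True, ["hi"] * 2, 30_000))  # review
print(active_phase(PHASES, True, ["hi"] * 5, 90_000))  # generate
```

Under these assumptions, a phase is skipped entirely if its own transition has already fired by the time the preceding phase ends.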

Phase scripts are how Carrier supports designs like progressive scaffolding (start with heavy review, then withdraw assistance over time) or adaptive intervention (switch modes once a quality threshold has been crossed).

2.5.5 Context configuration

The amount of conversation a processor can see is configurable:

Level | What the processor sees
None | Only the current draft; no chat history.
Partial | The last N messages (configurable).
Full | The complete chat history of the chamber so far.

For LLM processors, the context level controls what is sent to the model with each call. For human processors, it controls the contents of a read-only chat panel beside the suggestion interface. Context level interacts with privacy considerations: a research design that does not want the processor influenced by — or able to see — earlier conversation should set context to none.
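The three levels amount to slicing the chat history before it is handed to the processor. This is a minimal sketch that assumes the history is a list of message strings; `processor_context` is a hypothetical helper.

```python
from typing import Dict, List, Optional


def processor_context(history: List[str], draft: str, level: str,
                      n: Optional[int] = None) -> Dict[str, object]:
    """Build what a processor is shown under the none / partial / full levels."""
    if level == "none":
        visible: List[str] = []
    elif level == "partial":
        if n is None or n < 0:
            raise ValueError("partial context needs a non-negative n")
        # Guard n == 0: history[-0:] would return the whole list.
        visible = history[-n:] if n > 0 else []
    elif level == "full":
        visible = list(history)
    else:
        raise ValueError(f"unknown context level: {level!r}")
    return {"draft": draft, "history": visible}


chat = ["m1", "m2", "m3", "m4"]
print(processor_context(chat, "my draft", "partial", n=2))
# {'draft': 'my draft', 'history': ['m3', 'm4']}
```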

2.5.6 Research uses of the processor role

Design | Processor configuration | What it studies
AI writing coach | LLM processor, review mode, freeform feedback | Improvement of writing quality through AI feedback.
Mandatory review | LLM processor, review mode, mandatory: true | Effects of forced reflection on output.
AI ghostwriter | LLM processor, generate mode | Authorship perception, AI-assisted communication.
Peer review | Human processor, review mode | Peer-feedback dynamics.
Live coaching | Human processor, real-time assist mode | Expert scaffolding during composition.
Progressive scaffolding | Phase script: disabled → review → generate | Effects of withdrawing assistance over time.
Mode comparison | Two chamberlines: one review, one generate | Trade-off between reviewing and generating.

2.6 How the roles interact

The three roles are designed to be orthogonal — to operate in different spatial zones, on different inputs, with different output channels — but they can be present in the same chamber and combined in principled ways.

Communicator ↔ mediator. Asymmetric. The mediator sees everything; the communicator does not see the mediator's view. The mediator can broadcast to communicators, disable or enable their input, prompt them privately, and highlight their messages. This asymmetry is intentional: it models the moderator dynamic.

Communicator ↔ processor. Collaborative but communicator-controlled. The processor sees drafts; it sends feedback, suggestions, or generated text; the communicator decides what actually gets sent. The processor never bypasses the communicator.

Mediator ↔ processor. Orthogonal. Mediators act in the broadcast/control layer; processors act in the composition layer. They do not directly interact, but their effects can be coordinated — a mediator might broadcast that “the next message will be reviewed” at the same trigger point at which a processor switches from disabled to review mode.

Four representative configurations to keep in mind:

Configuration | Roles present | Research scenario
Communicators only | 2+ communicators | Standard group discussion.
Communicators + mediator | N communicators + 1 mediator | Facilitated discussion.
Communicators + processor | N communicators + 1 processor | Assisted composition.
Full setup | N communicators + mediator + processor | Facilitated discussion with composition assistance.

2.7 Design principles, distilled

Six principles run through the role system. They are summarised here for reference.

  1. Separation of concerns. Each role occupies a distinct spatial zone (chat, broadcast, input) and serves a distinct function (communicate, facilitate, assist). The separation is what enables clean experimental contrasts.
  2. Communicator agency. The communicator always retains final control over what they say. No mediator or processor can speak in their name.
  3. Type orthogonality. Any participant type that is compatible with a role can fill it. This is what makes human–AI comparisons trivial to set up.
  4. Phase-based dynamism. Both mediators (via triggers) and processors (via phase scripts) can change behaviour during a session, enabling within-session manipulations and progressive designs.
  5. Comprehensive logging. All messages, broadcasts, mediator actions, and processor interactions are recorded with metadata. This is what makes the resulting data analysable.
  6. Configuration hierarchy. Global → chamber → segment overrides → event-based control. Defaults at the top, specificity at the bottom.
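The configuration hierarchy in principle 6 behaves like a layered merge in which more specific layers override defaults. The sketch below assumes that configurations are nested dictionaries and that nested settings merge key by key. The layer contents and the `resolve` function are hypothetical.

```python
from typing import Any, Dict


def resolve(*layers: Dict[str, Any]) -> Dict[str, Any]:
    """Merge configuration layers from least to most specific; later layers win.

    Nested dictionaries are merged key by key rather than replaced wholesale,
    and the input layers are not modified.
    """
    merged: Dict[str, Any] = {}
    for layer in layers:
        for key, value in layer.items():
            if isinstance(value, dict) and isinstance(merged.get(key), dict):
                merged[key] = resolve(merged[key], value)
            else:
                merged[key] = value
    return merged


global_cfg = {"reactions": True, "processor": {"mode": "disabled", "context": "full"}}
chamber_cfg = {"processor": {"mode": "review"}}
segment_cfg = {"reactions": False}
event_cfg = {"processor": {"mode": "generate"}}  # e.g. a phase transition at run time

print(resolve(global_cfg, chamber_cfg, segment_cfg, event_cfg))
# {'reactions': False, 'processor': {'mode': 'generate', 'context': 'full'}}
```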

2.8 Builder walkthroughs

🎞 Builder walkthrough — Adding an LLM mediator to a chamber

The researcher opens a chamber, clicks “Add slot”, picks the mediator role and the LLM chatbot type, and selects a provider, model, and system prompt from the configuration panel. They enable one periodic trigger (“summarise every 2 minutes”) and one topic-detected trigger (“steer back on topic when X has not appeared in 30 messages”). The chamber's slot list now shows a mediator with two triggers attached.

🎞 Builder walkthrough — Configuring a processor with a phase script

The researcher adds a processor slot to a chamber, opens the phase-script editor, and creates three phases: disabled until segment start, review until five messages, generate until three minutes. They set the review feedback format to inline-edit and mark it non-mandatory. The slot now displays a small timeline summarising the three phases.


2.9 Worked example: a human–AI co-writing study

A small study comparing two ways an AI can help people write persuasive messages.

Experiment

  • Global pre-survey: demographics, writing-confidence scale, target-topic baseline opinion.
  • Chamberline assignment: random.

Chamberline A — “AI as reviewer”

  1. Chamber A1 — Briefing. One human communicator slot. Segments: instruction, baseline survey.
  2. Chamber A2 — Persuasive writing. One human communicator + one LLM processor, configured in review mode with inline-edit feedback and a maximum of two review rounds per message. Segments: a 12-minute chat segment in which the human composes three messages directed at a fictional audience.
  3. Chamber A3 — Debrief. One human slot. Segments: post-task survey.

Chamberline B — “AI as generator”

Identical to A, except that in chamber B2 the LLM processor is configured in generate mode. The communicator clicks Generate to receive an initial draft, edits it, and sends.

Global post-survey

Reflection on the experience, ownership/authorship attributions, payment code.

In this design, the role system is doing the lion's share of the work. The chamberlines, chambers, and segments are simple; the manipulation lives entirely in the configuration of the processor.

Note: this study is a fictional placeholder, not a published design.

2.10 Common pitfalls

  • Conflating type and role. A common reflex is to say “we want a bot in this chamber” without specifying what role it plays. “Bot” is a type label; the chamber still needs to know whether it is a communicator, a mediator, or — for LLM chatbots and agents, but not scripted chatbots — a processor. The three roles cannot be substituted for one another.
  • Putting a scripted chatbot in the processor slot. The matrix forbids it, and the builder will refuse to save the configuration. If your design calls for deterministic assistance during composition, use a human processor with a strict script of allowed feedback responses, or an LLM-chatbot processor with a temperature of zero (which reduces, but does not eliminate, run-to-run variation). Do not use a scripted chatbot.
  • Mediators that participate. It is tempting to use the mediator as a “knowledgeable participant” that joins the conversation as a peer. Resist this. If the entity is meant to participate on equal footing, make it a communicator. The mediator's affordances — universal visibility, broadcasts, control actions — only make sense for a non-peer facilitator.
  • Real-time-assist with a non-human processor. This combination is unsupported for both LLM chatbots and agents, and for good reason: LLM round-trip latency is too high for live token-by-token suggestion (and an agent's tool-loop makes it slower still). If you need real-time assistance, place a human in the processor slot.
  • Over-configuring the disable_chat action. Composite release conditions are powerful but easy to over-specify; a disable_chat with five conditions combined with an all connector may never release at all. Pilot every novel turn-taking protocol with a small group and watch the action log before scaling up.
  • Forgetting that non-human communicators do not enter the matching queue. A chamber that requires three communicators, of which one is an LLM chatbot, only needs to match two humans before it begins; the chatbot is spawned automatically. Researchers who set maxParticipants equal to the human count alone are sometimes surprised by who shows up.

Variables — the attributes that travel with each participant and decide which slots they can fill, which chambers they see, and what each activity says to them — are the subject of §3.

Part I · §3. Variables

Variables are how a Carrier experiment becomes personal. They constrain matching, gate visibility, inject context into prompts, and drive triggers.

3.1 Why variables exist

By the end of Chapter 2 we have a design that can place a researcher's chosen mixture of humans, LLM chatbots, scripted chatbots, and agents inside a sequence of chambers, with each role doing what the design expects. That design is, however, still impersonal: it treats every participant identically. The same chamberline, the same slots, the same instructions, the same prompts — applied to every person who walks into the experiment.

Variables are how a Carrier experiment becomes personal. A variable is an attribute attached to a participant — their self-identified expertise, the political party they support, the score they obtained on the pre-survey, the option they ranked first in the last chamber — that is then available to the rest of the experiment to consult. Variables let you:

  • Constrain matching. Require that the matched group contain (say) one self-identified novice and one self-identified expert.
  • Gate visibility. Show a remedial chamber only to participants whose pre-test score was below threshold; skip the chamber for everyone else.
  • Inject context into instructions and prompts. Address each participant by their chosen name; brief an LLM mediator on the political identification of each person in the room.
  • Drive triggers (§4). Fire a bot's response only when the variable matches a condition.

In conventional methodological language, variables are how Carrier represents the experimentally relevant measured and manipulated characteristics of each participant. Once captured, they travel with the participant for the rest of the run.

3.2 The anatomy of a variable

A variable is defined once, at the experiment level, and then referenced from anywhere in the experiment. Each definition has the following parts:

Part | What it specifies
Key | The short identifier used to reference the variable (e.g. expertise, party_id, pretest_score).
Type | The kind of value the variable can take: string, number, boolean, single-choice, multi-choice.
Source | Where the value comes from: a survey response, the result of a segment, a system-assigned value, or an aggregate over other participants.
Options (for choice types) | The list of allowable values, optionally each with a display label and a numeric value (see §3.2.2).
Default | The value used when the source produces nothing.

Variables are addressed in the rest of the experiment by their key, prefixed with var. — for example, var.expertise or var.party_id. The prefix exists so that the system can tell, when reading a configuration, that a string is referring to a variable rather than to a literal.
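A definition and a `var.`-prefixed lookup can be sketched as follows. The sketch assumes that a participant's captured values are held in a dictionary keyed by variable key, and that a missing or empty value falls back to the default. `VariableDef` and `resolve_token` are hypothetical names, and the source specification is omitted for brevity.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List

VAR_PREFIX = "var."


@dataclass
class VariableDef:
    """Hypothetical variable definition mirroring parts of the table above."""

    key: str
    type: str  # "string", "number", "boolean", "single-choice", "multi-choice"
    default: Any = None
    options: List[str] = field(default_factory=list)


def resolve_token(token: str, definitions: Dict[str, VariableDef],
                  values: Dict[str, Any]) -> Any:
    """Return the participant's value for 'var.<key>', or the token itself if it is a literal."""
    if not token.startswith(VAR_PREFIX):
        return token  # no prefix: treat as a literal string
    key = token[len(VAR_PREFIX):]
    definition = definitions[key]  # raises KeyError for an undefined variable
    value = values.get(key)
    return definition.default if value is None else value


defs = {"expertise": VariableDef("expertise", "single-choice", default="novice",
                                 options=["novice", "expert"])}
print(resolve_token("var.expertise", defs, {}))                       # novice
print(resolve_token("var.expertise", defs, {"expertise": "expert"}))  # expert
print(resolve_token("expertise", defs, {}))                           # expertise
```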

3.2.1 Where variable values come from

A variable's value is computed from a source specification. The five common sources are:

  • Survey response. The most common source. Read the value of a named question from one of the participant's completed surveys — the global pre-survey, a chamber pre- or post-survey, or any embedded survey segment. Survey-based variables are evaluated as soon as the survey is submitted, which means they are available for chamberline assignment, slot matching, and chamber visibility from that point forward.
  • Segment submission. Read what the participant submitted during a particular segment — the option they chose in a selection segment, the order they produced in a ranking segment, the text they entered in an input segment.
  • Segment data. Read data captured about a segment — how long the participant spent, how many messages they sent, whether they passed an attention check.
  • System-assigned. Read a value the platform sets automatically — the chamberline the participant was assigned to, their participant ID, the run identifier.
  • Aggregate. Compute a value across multiple participants — the mean of all communicators' pre-test scores in the same chamber, for example, or the modal political identification of the group. Aggregates are how a configuration can talk about the group, not just the individual.

A source can be qualified by a participant reference: by default a variable is read from the participant who is being configured (referred to internally as self), but it can also be read from another slot in the chamber (slot:1, slot:2, or slot:mediator). This is what makes it possible to address a fellow participant in an instruction or in a system prompt — for example: “you are about to chat with {slot:1:var.first_name}, who identifies as a {slot:1:var.party_id} voter.”
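Placeholder substitution of this kind can be sketched with a regular expression. The sketch follows the `{slot:<ref>:var.<key>}` form shown above. It treats an unqualified `{var.<key>}` as an assumed shorthand for self and uses hypothetical slot identifiers such as `"self"` and `"1"`. An unknown slot or key raises `KeyError`.

```python
import re
from typing import Dict

# Matches {var.key} (self) and {slot:<ref>:var.key} (another slot).
PLACEHOLDER = re.compile(
    r"\{(?:slot:(?P<slot>[^:{}]+):)?var\.(?P<key>[A-Za-z_][A-Za-z0-9_]*)\}"
)


def render(template: str, values_by_slot: Dict[str, Dict[str, object]], self_slot: str) -> str:
    """Substitute variable placeholders using each slot's variable values."""

    def substitute(match: "re.Match") -> str:
        slot = match.group("slot") or self_slot  # no slot qualifier means self
        return str(values_by_slot[slot][match.group("key")])

    return PLACEHOLDER.sub(substitute, template)


values = {"self": {"first_name": "Ana"},
          "1": {"first_name": "Ben", "party_id": "Democrat"}}
msg = render("you are about to chat with {slot:1:var.first_name}, "
             "who identifies as a {slot:1:var.party_id} voter.", values, self_slot="self")
print(msg)  # you are about to chat with Ben, who identifies as a Democrat voter.
```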

3.2.2 Per-option numeric values

A choice-type variable is, by default, a categorical label — “novice” or “expert”, “Democrat” or “Republican”. When a numerical contrast is also useful, each option in the variable's definition can carry an associated numeric value.

A typical case is a five-point self-rating scale. The variable's options might be labelled Very low / Low / Moderate / High / Very high, with numeric values 1 / 2 / 3 / 4 / 5. The label is what the participant sees and what is logged in raw form; the numeric value is what the configuration arithmetic uses when computing aggregates or threshold conditions. A condition like “this chamber only appears if the participant's expertise is ≥ 3” is expressed against the numeric value, not the label.

Numeric values are optional. A variable without them behaves as a pure categorical attribute.
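To make the label/number split concrete, here is a sketch of the five-point expertise variable and a threshold check evaluated on the numeric value. The `ChoiceOption`/`ChoiceVariable` shapes, the option keys, and `atLeast` are hypothetical; Carrier's stored schema may differ.

```typescript
// Hypothetical shape of a choice-type variable whose options carry numeric values.
interface ChoiceOption { key: string; label: string; value?: number }
interface ChoiceVariable { key: string; options: ChoiceOption[] }

const expertise: ChoiceVariable = {
  key: "expertise",
  options: [
    { key: "very_low", label: "Very low", value: 1 },
    { key: "low", label: "Low", value: 2 },
    { key: "moderate", label: "Moderate", value: 3 },
    { key: "high", label: "High", value: 4 },
    { key: "very_high", label: "Very high", value: 5 },
  ],
};

// Numeric value behind the chosen option, or undefined for a purely categorical option.
function numericValue(v: ChoiceVariable, chosenKey: string): number | undefined {
  return v.options.find((o) => o.key === chosenKey)?.value;
}

// A threshold condition such as "expertise >= 3" compares the number, never the label.
function atLeast(v: ChoiceVariable, chosenKey: string, min: number): boolean {
  const n = numericValue(v, chosenKey);
  return n !== undefined && n >= min;
}

console.log(atLeast(expertise, "moderate", 3)); // true
console.log(atLeast(expertise, "low", 3)); // false
```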

3.2.3 Aggregate variables across non-human participants

When an aggregate variable is computed across the participants of a chamber, the researcher can choose whether non-human participants (LLM chatbots, scripted chatbots, and agents) are included in or excluded from the aggregate. This setting matters in two common situations:

  • Confederated designs (one human, several non-human communicators) — usually the researcher wants aggregates to reflect humans only, since the non-humans are confederates.
  • Group-property analyses (e.g. mean expertise of the room) — the researcher chooses whether non-human participants carry an expertise value at all, and whether it contributes to the mean.

The default is to include only humans in aggregates; the alternative is configured on each aggregate variable definition. The same humans-only / include-all toggle applies regardless of which of the three non-human kinds is present in the chamber.
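The humans-only toggle amounts to a filter applied before the aggregate is computed. The sketch below assumes a hypothetical `Member` record tagged with its participant kind and shows a mean aggregate; `aggregateMean` and its `includeNonHuman` flag are illustrative names, not Carrier settings.

```typescript
// Hypothetical participant record: the kind of participant and its numeric variables.
type ParticipantKind = "human" | "llm-chatbot" | "scripted-chatbot" | "agent";
interface Member { kind: ParticipantKind; vars: Record<string, number | undefined> }

// Mean of a numeric variable over a chamber. With includeNonHuman = false (mirroring
// the humans-only default) chatbots and agents are ignored; members that carry no
// value for the variable never contribute either way.
function aggregateMean(members: Member[], key: string, includeNonHuman = false): number | undefined {
  const values = members
    .filter((m) => includeNonHuman || m.kind === "human")
    .map((m) => m.vars[key])
    .filter((x): x is number => typeof x === "number");
  if (values.length === 0) return undefined;
  return values.reduce((sum, x) => sum + x, 0) / values.length;
}

const room: Member[] = [
  { kind: "human", vars: { expertise: 2 } },
  { kind: "human", vars: { expertise: 4 } },
  { kind: "llm-chatbot", vars: { expertise: 5 } }, // a confederate with an assigned value
];
console.log(aggregateMean(room, "expertise")); // 3
console.log(aggregateMean(room, "expertise", true)); // (2 + 4 + 5) / 3
```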

3.3 Variables as matching constraints

The first place a variable does work is during matching. There are three places in the configuration where a variable can constrain who is matched into what:

3.3.1 Chamberline eligibility

A chamberline can declare a filter — a condition expressed against variables that a participant must satisfy in order to be eligible for that chamberline. When the experiment's chamberline-assignment method is survey-based (see §1.2.2), the filter is the mechanism by which the assignment is made:

Chamberline "Pro-disclosure"   filter:  var.party_id == "Democrat"
Chamberline "Anti-disclosure"  filter:  var.party_id == "Republican"

A participant whose var.party_id is Democrat will be eligible for the first chamberline and not the second; the inverse holds for Republican participants. Participants who do not satisfy any chamberline filter are routed to a default chamberline if one exists, or terminated gracefully otherwise.
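The routing rule can be sketched as follows, assuming filters are predicates over a participant's variables. `Chamberline`, `isDefault`, and `eligibleChamberlines` are hypothetical names; how Carrier chooses among several eligible chamberlines is not covered here, so the function simply returns all of them.

```typescript
// Hypothetical sketch of survey-based chamberline routing; names are illustrative.
type Vars = Record<string, string | number | boolean | undefined>;
interface Chamberline {
  name: string;
  filter?: (v: Vars) => boolean; // eligibility condition over variables
  isDefault?: boolean; // fallback for participants matching no filter
}

// Names of the chamberlines whose filter the participant satisfies. If none match,
// fall back to the default chamberline; an empty result means "terminate gracefully".
function eligibleChamberlines(lines: Chamberline[], v: Vars): string[] {
  const eligible = lines
    .filter((l) => !l.isDefault && l.filter !== undefined && l.filter(v))
    .map((l) => l.name);
  if (eligible.length > 0) return eligible;
  const fallback = lines.find((l) => l.isDefault);
  return fallback ? [fallback.name] : [];
}

const lines: Chamberline[] = [
  { name: "Pro-disclosure", filter: (v) => v.party_id === "Democrat" },
  { name: "Anti-disclosure", filter: (v) => v.party_id === "Republican" },
];
console.log(eligibleChamberlines(lines, { party_id: "Republican" }).join(", ")); // Anti-disclosure
```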

3.3.2 Slot requirements

Inside a chamber, each slot can declare a list of required properties — conditions that the participant filling the slot must satisfy. A two-slot chamber with the slot constraints

Slot 1   human communicator   requires:  var.expertise_value <= 2
Slot 2   human communicator   requires:  var.expertise_value >= 4

will seat one novice and one expert in each session. Matching does not begin for this chamber until two waiting participants jointly satisfy the slot constraints: a queued participant whose expertise is 2 can be matched into Slot 1, and a queued participant whose expertise is 5 can be matched into Slot 2, but two novices cannot fill the chamber on their own.

Slot requirements can reference any variable that has been resolved by the time the chamber begins. In practice this means anything captured in the global pre-survey, anything assigned by the system, and anything from earlier chambers in the participant's run.
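One way to picture “jointly satisfy the slot constraints” is a small search that seats distinct queued participants into every slot. This is a hypothetical sketch (first-come order with backtracking), not Carrier's documented matching algorithm; `Queued`, `SlotRule`, and `matchSlots` are illustrative names.

```typescript
// Hypothetical sketch: each slot carries a `requires` predicate over a queued participant.
interface Queued { id: string; expertise: number }
interface SlotRule { name: string; requires: (p: Queued) => boolean }

// Seat distinct queued participants into every slot, trying them in queue order and
// backtracking when a choice leaves a later slot unfillable. Returns slot -> id, or
// null when the current queue cannot yet fill the chamber.
function matchSlots(slots: SlotRule[], queue: Queued[]): Map<string, string> | null {
  const seating = new Map<string, string>();
  const used = new Set<string>();
  const fill = (i: number): boolean => {
    if (i === slots.length) return true;
    const slot = slots[i];
    for (const p of queue) {
      if (used.has(p.id) || !slot.requires(p)) continue;
      used.add(p.id);
      seating.set(slot.name, p.id);
      if (fill(i + 1)) return true;
      used.delete(p.id); // undo and try the next candidate
      seating.delete(slot.name);
    }
    return false;
  };
  return fill(0) ? seating : null;
}

const pairSlots: SlotRule[] = [
  { name: "Slot 1", requires: (p) => p.expertise <= 2 }, // novice seat
  { name: "Slot 2", requires: (p) => p.expertise >= 4 }, // expert seat
];
console.log(matchSlots(pairSlots, [{ id: "a", expertise: 2 }, { id: "b", expertise: 1 }])); // null: two novices
```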

3.3.3 Group-composition constraints

Slot-by-slot constraints are sometimes too weak to express a desired group property. Where the slot constraints care only about who fills each seat, a group-composition constraint cares about a property of the matched set as a whole. Typical examples:

  • The chamber requires a gender-balanced trio — at least one male, at least one female, no constraint on the third seat.
  • The chamber requires that the modal political identification of the trio is Democrat.
  • The chamber requires that no two members share the same self-reported expertise level.

Group-composition constraints are expressed against aggregate variables (3.2.3) and are evaluated against the candidate set during matching. A candidate matching is admitted only if the corresponding aggregate, computed over the candidates, satisfies the constraint.
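A group-composition constraint is a predicate over the whole candidate set rather than over one seat. The sketch below assumes a hypothetical `Candidate` record and enumerates candidate groups in queue order; the exhaustive search is for illustration only and says nothing about how Carrier's matcher is actually implemented.

```typescript
// Hypothetical candidate record and group-level constraints; names are illustrative.
interface Candidate { id: string; gender: string; expertise: number }
type GroupConstraint = (group: Candidate[]) => boolean;

const genderBalanced: GroupConstraint = (g) =>
  g.some((c) => c.gender === "male") && g.some((c) => c.gender === "female");
const distinctExpertise: GroupConstraint = (g) =>
  new Set(g.map((c) => c.expertise)).size === g.length;

// Try candidate groups of `size` in queue order; admit the first whose members,
// taken together, satisfy every group-composition constraint.
function findGroup(queue: Candidate[], size: number, constraints: GroupConstraint[]): Candidate[] | null {
  const pick: Candidate[] = [];
  const search = (start: number): Candidate[] | null => {
    if (pick.length === size) return constraints.every((f) => f(pick)) ? [...pick] : null;
    for (let i = start; i < queue.length; i++) {
      pick.push(queue[i]);
      const found = search(i + 1);
      if (found !== null) return found;
      pick.pop();
    }
    return null;
  };
  return search(0);
}

const trioQueue: Candidate[] = [
  { id: "a", gender: "male", expertise: 1 },
  { id: "b", gender: "male", expertise: 2 },
  { id: "c", gender: "male", expertise: 3 },
  { id: "d", gender: "female", expertise: 3 },
];
console.log(findGroup(trioQueue, 3, [genderBalanced, distinctExpertise])?.map((c) => c.id).join(",")); // a,b,d
```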

3.4 Variables as visibility gates

A second use of variables is to gate what a participant sees, not just who they see it with. A chamber can declare a visibility condition against one or more variables. The chamber is included in the participant's run only if the condition is true at the moment the chamberline would otherwise enter that chamber.

This is the canonical mechanism for branching designs:

  • A practice chamber that is shown only to participants whose pre-test score was below threshold.
  • A debriefing chamber for the high-anonymity condition that does not exist in the low-anonymity condition (an alternative to two parallel chamberlines, useful when most of the experiment is identical between conditions).
  • A post-task survey that is shown only to participants who actually completed the preceding chat (skipped for those whose chamber was terminated early).

Visibility conditions take the same form as slot constraints: a boolean expression against variable values, with the standard comparison operators and any / all connectors. Crucially, visibility is evaluated just before the chamber would begin, not once at the start of the run. This means that variables produced during the run — by an earlier chamber's segment submission, by a chamber post-survey — are usable as gating conditions for later chambers.

When a chamber is gated out for a participant, the participant skips directly to the next chamber in their chamberline. The chamberline itself does not change.
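The key property, evaluation just before each chamber rather than once at the start, can be shown with a short walk over a chamberline. `ChamberSpec`, `visibleIf`, and `walkChamberline` are hypothetical names, and each chamber's `run` stub stands in for whatever variables the real chamber would produce.

```typescript
// Hypothetical chamber shape: an optional visibility condition, and a `run` stub that
// returns whatever variables the chamber produces (e.g. from a post-survey).
type RunVars = Record<string, unknown>;
interface ChamberSpec {
  name: string;
  visibleIf?: (v: RunVars) => boolean; // omitted = always visible
  run: (v: RunVars) => RunVars;
}

// Walk a chamberline, evaluating each visibility condition just before its chamber
// would begin, so variables produced earlier in the run can gate later chambers.
function walkChamberline(chambers: ChamberSpec[], initial: RunVars): string[] {
  const vars: RunVars = { ...initial };
  const entered: string[] = [];
  for (const c of chambers) {
    if (c.visibleIf !== undefined && !c.visibleIf(vars)) continue; // gated out: skip ahead
    entered.push(c.name);
    Object.assign(vars, c.run(vars));
  }
  return entered;
}

const debriefLine: ChamberSpec[] = [
  { name: "B", run: () => ({ chose_collaborative: true }) },
  { name: "C1", visibleIf: (v) => v.chose_collaborative === true, run: () => ({}) },
  { name: "C2", visibleIf: (v) => v.chose_collaborative === false, run: () => ({}) },
];
console.log(walkChamberline(debriefLine, {}).join(" -> ")); // B -> C1
```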

3.4.1 Slot requirements vs visibility conditions

A natural question at this point: a chamber whose every slot has a requires constraint already excludes participants who cannot fill any seat — they will never be matched. Why, then, does Carrier also offer a separate visibility condition at the chamber level? Don't they do the same job?

They overlap in one specific case and diverge in the others. The rule that governs the overlap is:

Implicit skip rule. If every human slot in a chamber has a non-empty requires constraint, and the participant satisfies no slot's constraint, then the chamber is silently skipped for that participant — exactly as if it had a failing visibility condition.

The skip is implicit: there is no separate visibility expression to write. The chamber is treated as not part of the participant's run, and they advance to the next chamber.
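Written as code, the implicit skip rule is a two-part test. This is a sketch under stated assumptions: `SeatSpec` and `implicitlySkipped` are hypothetical names, and treating a chamber with no human slots as not skipped is an assumption of this sketch, since the rule above does not address that case.

```typescript
// Hypothetical slot description: human slots may carry a `requires` predicate.
type SeatVars = Record<string, unknown>;
interface SeatSpec { human: boolean; requires?: (v: SeatVars) => boolean }

// The implicit skip rule: skip when every human slot is constrained and the
// participant satisfies none of those constraints. One open human seat disables it.
function implicitlySkipped(slots: SeatSpec[], v: SeatVars): boolean {
  const humanSlots = slots.filter((s) => s.human);
  if (humanSlots.length === 0) return false; // assumption: no human seats, rule not applied
  const constraints: ((v: SeatVars) => boolean)[] = [];
  for (const s of humanSlots) {
    if (s.requires === undefined) return false; // an open seat keeps the chamber visible
    constraints.push(s.requires);
  }
  return !constraints.some((req) => req(v));
}

// Scenario B: one constrained seat plus one open seat, so a novice is not skipped.
const scenarioB: SeatSpec[] = [
  { human: true, requires: (v) => v.role === "expert" },
  { human: true },
];
console.log(implicitlySkipped(scenarioB, { role: "novice" })); // false
```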

Three scenarios make the distinction concrete.

Scenario A — Single fully-constrained slot. A solo-participant chamber whose one human slot requires var.condition == "treatment". A control-condition participant cannot fill it; the implicit skip rule fires; the participant moves to the next chamber. Here, slot requirements alone are sufficient — adding a chamber-level visibility condition would be redundant.

Scenario B — Mixed open and constrained slots. A two-slot chamber where one slot requires var.role == "expert" and the other slot has no constraint:

Slot 1   human communicator   requires:  var.role == "expert"
Slot 2   human communicator   requires:  (none)

A novice participant can fill Slot 2, so the implicit skip rule does not fire: the chamber is visible to them, and they will join it paired with an expert. If that is what you intended (“novices and experts meet here, with the expert always seated in Slot 1”), the behaviour is correct and you should add nothing further. But if you intended the chamber to exist only for experts (with Slot 2 meant for a second expert but left without a constraint), the implicit skip rule will not save you: a novice will silently enter. To get a true gate here, add a chamber-level visibility condition var.role == "expert". Slot requirements cannot express this on their own.

Scenario C — Chamber-level gate plus slot-level role assignment. A debriefing chamber that should exist only for participants who completed a treatment chamber upstream, and within that chamber should pair a “writer” participant with a “reviewer” participant:

Chamber "Debrief"
   visible if:  var.completed_treatment == true
   Slot 1   human communicator   requires:  var.debrief_role == "writer"
   Slot 2   human communicator   requires:  var.debrief_role == "reviewer"

Both layers are necessary. The chamber-level visibility expresses who belongs in this chamber at all; the slot requirements express which seat each participant takes once they are here. Trying to collapse the gate into a slot constraint (for example, by adding var.completed_treatment == true to both slots' requires) works only by coincidence — the implicit skip rule fires because every slot is constrained — and it conflates two distinct intents. If a future edit relaxes Slot 2's requires to allow walk-ins, the chamber suddenly becomes visible to participants who never completed the treatment, and the gate is silently gone.

This last point is the deeper reason the two systems are kept separate. Visibility conditions express intent explicitly; slot requirements express it emergently. When the only way a chamber is skipped is “all slots happen to be constrained and the participant happens to satisfy none,” that skip is a side effect of a configuration that was written for a different reason. Edits to the slots — adding a slot, opening one up, retitling roles — can quietly remove the gate. An explicit visibility condition survives those edits and is auditable as a routing rule on its own.

A compact way to choose between them:

You want… | Reach for
A chamber that exists only when seat-level requirements alone suffice to exclude the wrong participants, and you do not anticipate slot edits relaxing this. | Slot requirements alone.
A chamber that should be hidden from some participants even though they could still find an open seat in it. | A chamber visibility condition.
A chamber that gates a population and role-assigns within that population. | Both: visibility condition for the gate, slot requirements for the seating.
A routing rule that should be self-documenting and resilient to slot edits. | A chamber visibility condition (in addition to whatever slot requirements you also want).

In short: when slot requirements happen to act as a gate, treat that as a convenient side effect, not as the gate itself. If the chamber is meant to be conditional on a participant property, say so with a visibility condition.

3.5 Variables in instructions and prompts

The third use of variables is the most pervasive: interpolation into the natural-language content shown to or used by participants. Any place in the experiment where text is shown to a participant or sent to an AI is also a place where variables can be injected.

The interpolation syntax is {{var.<key>}} for a variable read from the current participant, and {{slot:<n>:var.<key>}} or {{slot:<role>:var.<key>}} for a variable read from a fellow slot. A few worked examples:

Instruction segment text

Welcome, {{var.first_name}}. In the next ten minutes you will discuss
climate policy with two other participants. The participant on your
left has expertise level "{{slot:1:var.expertise_label}}"; the
participant on your right has expertise level
"{{slot:2:var.expertise_label}}".

System prompt for an LLM mediator

You are facilitating a discussion among three participants. Their
self-identified political positions are: {{slot:1:var.party_label}},
{{slot:2:var.party_label}}, and {{slot:3:var.party_label}}. Adjust
your tone to be welcoming to all three positions; do not take sides.

Survey question stem

Earlier you said your most important consideration was
"{{var.top_value}}". On the slider below, indicate how strongly you
still feel that this is your most important consideration.

Two things to note. First, interpolation happens at the moment the text is needed — the same instruction segment used in a chamber that appears twice will be re-interpolated each time, with the latest variable values. Second, an undefined variable interpolates to the empty string by default, but the variable definition's default field can be used to specify a fallback. Where it matters, configure a default.
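A minimal interpolation sketch, assuming values are already resolved per participant reference and that defaults are keyed by variable name only (a simplification). The placeholder pattern mirrors the {{var.<key>}} and {{slot:<n>:var.<key>}} syntax above; `interpolate` and `SlotValues` are hypothetical names.

```typescript
// Resolved values per participant reference ("self", "slot:1", "slot:mediator", ...).
type SlotValues = Record<string, Record<string, string | number | undefined>>;

// Matches {{var.key}} and {{slot:<n|role>:var.key}}.
const PLACEHOLDER = /\{\{(?:(slot:[^:}]+):)?var\.([A-Za-z0-9_]+)\}\}/g;

// Interpolate at the moment the text is needed. An unresolved variable falls back to
// its configured default (looked up here by key only), else to the empty string.
function interpolate(text: string, values: SlotValues, defaults: Record<string, string> = {}): string {
  return text.replace(PLACEHOLDER, (_match: string, who: string | undefined, key: string) => {
    const value = values[who ?? "self"]?.[key];
    return value !== undefined ? String(value) : defaults[key] ?? "";
  });
}

const welcome = "Welcome, {{var.first_name}}. Your partner rated themselves \"{{slot:1:var.expertise_label}}\".";
console.log(interpolate(welcome, { self: { first_name: "Ana" }, "slot:1": { expertise_label: "High" } }));
// Welcome, Ana. Your partner rated themselves "High".
```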

3.5.1 Per-option text vs per-option value in interpolation

Recall that a choice-type variable can carry both a label and a numeric value for each option (3.2.2). When interpolating into text, three forms are available:

  • {{var.expertise}} interpolates the key of the chosen option (“low”, “moderate”, “high”).
  • {{var.expertise_label}} interpolates the display label of the chosen option (“Low”, “Moderate”, “High”).
  • {{var.expertise_value}} interpolates the numeric value of the chosen option (e.g. 2, 3, 4).

For a five-point Likert variable, the three forms give you the same information at three different levels of formality. Choose by context: instructions to participants benefit from the display label; an LLM system prompt typically benefits from the numeric value or the key.
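The three aliases can be derived mechanically from the chosen option. This sketch assumes a hypothetical option shape and returns every alias as a string, as it would appear after interpolation; `AliasOption` and `aliasesFor` are illustrative names.

```typescript
// Hypothetical sketch: derive the three aliases of a choice variable for the chosen option.
interface AliasOption { key: string; label: string; value?: number }

function aliasesFor(varKey: string, options: AliasOption[], chosenKey: string): Record<string, string> {
  const opt = options.find((o) => o.key === chosenKey);
  if (opt === undefined) return {}; // nothing chosen yet: every alias is unresolved
  const aliases: Record<string, string> = {
    [varKey]: opt.key, // {{var.expertise}} gives the option key
    [`${varKey}_label`]: opt.label, // {{var.expertise_label}} gives the display label
  };
  if (opt.value !== undefined) aliases[`${varKey}_value`] = String(opt.value); // numeric value
  return aliases;
}

const levels: AliasOption[] = [
  { key: "low", label: "Low", value: 2 },
  { key: "moderate", label: "Moderate", value: 3 },
  { key: "high", label: "High", value: 4 },
];
console.log(aliasesFor("expertise", levels, "high").expertise_label); // High
```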

3.6 Variables in trigger conditions

A fourth use of variables is in trigger conditions (§4): a bot's keyword trigger, time trigger, or message-count trigger can be qualified by a check on a variable. This makes triggers conditional on participant attributes; a bot might only respond to keyword “X” if slot:1:var.condition is “treatment”, for example. The full mechanics of trigger conditions are covered in §4.3 (the variable filter specifically in §4.3.2); here it is enough to know that the same var.<key> references work inside trigger conditions as work elsewhere.

3.7 Builder walkthroughs

🎞 Builder walkthrough — Defining a variable

The researcher opens the Variables tab at the experiment level, clicks “Add variable”, and creates an expertise variable of type single-choice with five options. They label the options “Very low”, “Low”, “Moderate”, “High”, “Very high” and assign numeric values 1 through 5. They set the source to a question on the global pre-survey, then save. The new variable now appears in the variable list with three computed aliases (var.expertise, var.expertise_label, var.expertise_value) available throughout the experiment.

var-define.gif
🎞 Builder walkthrough — Using variables for slot matching

In a chamber configuration, the researcher opens Slot 1's settings, adds a required-property rule “var.expertise_value <= 2”, and saves. They do the same for Slot 2 with “var.expertise_value >= 4”. The chamber summary panel now displays a small badge — “novice + expert pair” — generated from the constraint pattern.

var-slot-filter.gif
🎞 Builder walkthrough — Gating a chamber on a variable

The researcher selects a remedial chamber in a chamberline, opens its visibility settings, and adds the rule “var.pretest_score < 0.5”. They save. The chamber is now visually marked as conditional in the chamberline outline, with a small icon indicating the visibility condition.

var-visibility.gif

3.8 Worked example: matched-pair deliberation with branching debrief

A study that pairs each novice with an expert for a short discussion, with a different debriefing depending on the discussion outcome.

Variables defined at the experiment level

Key | Type | Source
first_name | string | Global pre-survey: “What name would you like to be called?”
expertise | single-choice (5 options, numeric 1–5) | Global pre-survey: “How would you rate your own expertise on this topic?”
chose_collaborative | boolean | Segment submission from Chamber B's selection segment

Chamberline (only one — between-subjects manipulation is unused here)

  1. Chamber A — Briefing. One human slot, no constraints. Segments: instruction, pre-survey.
  2. Chamber B — Discussion. Two human slots. Slot 1: var.expertise_value <= 2 (novice). Slot 2: var.expertise_value >= 4 (expert). Segments: an instruction segment, a 10-minute chat segment, then a selection segment in which both participants choose between “collaborative” and “competitive” approaches.
  3. Chamber C1 — Debrief: collaborative. One human slot. Visibility: var.chose_collaborative == true. Segments: a tailored debrief survey.
  4. Chamber C2 — Debrief: competitive. One human slot. Visibility: var.chose_collaborative == false. Segments: a different tailored debrief survey.

The instruction segment in Chamber B reads:

Welcome, {{var.first_name}}. You're about to discuss with one other
participant. They have rated their own expertise as
"{{slot:other:var.expertise_label}}".

In this design, variables are doing three different kinds of work: matching constraints (Chamber B's slots), instruction interpolation (Chamber B's instruction text), and visibility branching (Chambers C1 vs. C2). All three are expressed against the same variable system.

Researcher: fictional placeholder; this example does not describe a real study.

3.9 Common pitfalls

  • Variables that haven't been collected yet. A slot constraint or visibility condition that references a variable whose source has not yet fired is unsatisfiable. Plan the timeline: a constraint on var.expertise can be evaluated after the pre-survey has been submitted, not before. If you find a chamber timing out unexpectedly, check whether the variable it depends on has been produced.
  • Numeric versus label interpolation confusion. It is easy to write {{var.expertise}} when you meant {{var.expertise_label}}. The former interpolates the option key (“high”), which may not be what you want to show participants. When in doubt, prefer …_label for participant-facing text and …_value for arithmetic conditions and aggregates.
  • Aggregates including bots when you wanted humans only. The default is humans-only, but it is overrideable. If a group-composition constraint behaves strangely in a chamber that contains non-human confederates, check whether the aggregate is computing across all participants or only over humans.
  • Visibility conditions evaluated against stale variables. A variable produced in a chamber is available after that chamber completes. A visibility condition on a chamber B that depends on a variable produced in chamber A only works because A is earlier than B in the chamberline. Reordering the chambers — moving A after B — silently breaks the gate.
  • Default values that hide problems. Setting a default on every variable will keep your experiment from terminating on a missing value, but it can also mask a genuine bug (a survey question whose response was never recorded). For variables that drive matching or visibility, prefer not to set a default, and let the run terminate noisily, until you are confident the source is reliable.
  • Mixing identifier capitalisation. Variable keys are case-sensitive. var.partyID, var.partyId, and var.party_id are three different references. Pick one casing convention at the start of the experiment and apply it consistently.

Triggers — the rule system that drives the behaviour of every non-human participant in the experiment, the final piece of the four-system picture — are the subject of §4.

Part I · §4. Triggers

The rule system that drives every non-human participant's behaviour. A trigger combines a condition, a response, and optionally an action.

4.1 Why triggers exist

The previous three chapters answered the questions what shape is the experiment, who is in it, and what do we know about them. They left one question open: how do the non-human participants behave? A scripted chatbot, an LLM chatbot, and an agent are all just slots in a chamber until something tells them when to speak and what to say.

That “something” is the trigger system. A trigger is a rule that combines:

  • A condition: what has to be true before the trigger fires.
  • A response: what the bot says if the trigger fires.
  • Optionally, an action: what else the bot does if the trigger fires (disable a participant's input, prompt a specific person, transition the chamber, and so on).

You can think of a non-human participant's configuration as a book of rules: an ordered list of triggers, each with a condition–response–action triple. When something happens in the chamber, Carrier consults the book in order, evaluates the conditions against the current state of the chat, and fires the rules that match. This applies equally to deterministic scripted chatbots (whose responses are pre-written sentences), to LLM chatbots (whose responses are generated by a language model on the fly), and to agents (whose responses are generated by a language model that has consulted its tools first).
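The rule-book idea reduces to an ordered loop. This sketch is deliberately stripped down: the `Rule` shape and `consult` are hypothetical names, responses are plain functions rather than scripted or LLM responses, and modifiers such as priority, cooldown, and max-fires are omitted.

```typescript
// Hypothetical, stripped-down trigger: a condition, a response, an optional action.
interface ChatEvent { sender: string; text: string }
interface Rule {
  id: string;
  condition: (e: ChatEvent) => boolean; // when does it fire?
  response: (e: ChatEvent) => string; // what does it say?
  action?: (e: ChatEvent) => void; // what else does it do? (optional)
}

// Consult the rule book in order and fire every rule whose condition matches the event.
function consult(book: Rule[], e: ChatEvent): string[] {
  const replies: string[] = [];
  for (const rule of book) {
    if (!rule.condition(e)) continue;
    replies.push(rule.response(e));
    rule.action?.(e);
  }
  return replies;
}

const book: Rule[] = [
  { id: "greet", condition: (e) => /\bhello\b/i.test(e.text), response: () => "Hi, welcome!" },
  { id: "climate", condition: (e) => /climate/i.test(e.text), response: () => "Let's focus on climate policy." },
];
console.log(consult(book, { sender: "p1", text: "Hello everyone, shall we talk climate?" }));
```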

A useful analogy from the methods literature: a trigger is to an interactive bot what a confederate script is to a confederate participant — a list of contingent rules that say “when X happens, do Y”. Carrier's contribution is to make the list expressive enough to capture the contingencies real conversations contain, and machine-readable enough to be replayed identically across sessions.

4.2 The trigger model: condition → response → action

A single trigger has the structure:

Trigger
   Condition     When does it fire?
   Response      What does it say? (scripted or LLM)
   Action(s)     What else does it do? (optional)
   Modifiers     priority · cooldown · max-fires · probability · chain target · variable filters

The structure of a single trigger: a condition, a response, optional actions, plus shared modifiers.

The next three sections expand each of these three parts. The modifiers are covered together in §4.6.

4.3 Conditions: when a trigger fires

The condition of a trigger is the part most worth understanding well, because it determines whether the rule ever runs. Carrier supports a catalogue of condition types organised by what they listen for: message content, time, message counts, sequences, participant events, and aggregate states.

4.3.1 The catalogue of condition types

Type | Listens for | Typical use
keyword | A configurable word or phrase in a chat message | Respond when “climate” is mentioned; greet on “hello”.
regex | A regular expression match in a chat message | Recognise URLs, profanity, structured statements like “I disagree with X”.
time | A delay (ms) from chamber or segment start | Send an opening prompt 5 s in; remind participants of the time after 8 min.
message-count | A total count of messages in the chatroom | Intervene every N messages; introduce a summarisation at message 20.
participant-message-count | A count of messages from a specific participant | Detect that one person has dominated; reward an under-contributing participant.
sequence | An ordered series of keyword matches | “First X is said, then Y”; useful for staged conversation steering.
participant-action | A participant event such as join, leave, idle | Greet on join; flag a drop-out; chain into a backup-bot prompt.
after-bot-message | A specified bot has just sent a message | Cross-bot chaining; staged multi-bot interactions.
event-monitor | An arbitrary chatroom event | Catch dashboard interventions, segment transitions, embedded-child completions.
chain-only | (Passive) Only fires when another trigger chains to it | Building multi-step responses.
llm-driven | The trigger asks an LLM to decide whether it should fire and what to say | Open-ended judgement triggers; see §4.4.2.
periodic | (Mediator only) Fires at a fixed interval | Regular check-ins, repeated summaries.
aggregate | (Mediator only) Fires after N messages have accumulated within a time window | Batched synthesis.
topic-detected | (Mediator only) Fires when a topic pattern is detected | Topic steering.
activity-timeout | (Mediator only) Fires after a period of inactivity | Idle prompting.
participant-count | (Mediator only) Fires when the active participant count crosses a threshold | Reacting to departures or arrivals.
discussion-phase | (Mediator only) Fires at the start, middle, or end of a chamber | Phase-appropriate facilitation.

Two practical notes:

  • Most of these conditions presuppose a chat segment. Keyword, regex, message-count, sequence, and the mediator-only message conditions all listen to the conversation; if the chamber is currently in a slide or a survey segment, they are silent. Time and participant-action conditions, by contrast, can fire from any segment.
  • The mediator-only types (periodic, aggregate, topic-detected, activity-timeout, participant-count, discussion-phase) are enforced as mediator-only by both the builder and the runtime. The builder hides them from the trigger pickers of non-mediator bots, and the runtime only initialises and evaluates them for bots with role === 'mediator'. A trigger of one of these types attached to a communicator or processor (e.g., by hand-editing the JSON) will silently never fire.
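Both notes can be expressed as a single eligibility check. The type names come from the catalogue above; the sets, `BotRole`, and `canEvaluate` are a hypothetical sketch, and only the four conversation-listening types the first note names explicitly are included.

```typescript
// Type names from the catalogue above; the gating logic is a hypothetical sketch.
const MEDIATOR_ONLY = new Set([
  "periodic", "aggregate", "topic-detected",
  "activity-timeout", "participant-count", "discussion-phase",
]);
// The conversation-listening types named explicitly in the first note; other
// message-based types would be added to this set in the same way.
const CHAT_ONLY = new Set(["keyword", "regex", "message-count", "sequence"]);

type BotRole = "communicator" | "processor" | "mediator";

// Can a trigger of `type`, attached to a bot with `role`, be evaluated while the
// chamber is in a segment of kind `segmentKind`?
function canEvaluate(type: string, role: BotRole, segmentKind: string): boolean {
  if (MEDIATOR_ONLY.has(type) && role !== "mediator") return false; // never initialised
  if (CHAT_ONLY.has(type) && segmentKind !== "chat") return false; // silent outside chat
  return true;
}

console.log(canEvaluate("periodic", "communicator", "chat")); // false
```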

4.3.2 Modifiers shared by all condition types

Every trigger, whatever its condition type, can be qualified by a small set of modifiers:

Modifier | Meaning
Case sensitivity | For keyword and regex conditions, whether matching is case-sensitive (default: insensitive).
Match mode | For multi-value conditions, whether any of the values is sufficient (the default) or all are required.
Sender filter | Restrict matching to messages sent by humans, by a specific participant, or by a participant of a specific role. The default is to consider every sender.
Variable filter | A boolean expression over var.* references; the trigger does not fire unless the expression is satisfied. This is how a trigger becomes dependent on the experimental condition: for example, “only fire this prompt if the current speaker's var.condition == 'treatment'”.

The variable filter is especially powerful in combination with §3: a single set of triggers, attached to a single bot template, can produce qualitatively different behaviours in different chamberlines purely on the basis of the variables that are true of the participants in front of it.
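A sketch of how the four modifiers might combine on a keyword trigger, using the defaults from the table. `KeywordTrigger`, `keywordFires`, and the field names are hypothetical, and plain substring matching is a simplification: Carrier's exact keyword-matching rules (word boundaries, punctuation) are not specified here.

```typescript
// Hypothetical keyword trigger carrying the shared modifiers; defaults follow the
// table above (case-insensitive, match any, every sender, no variable filter).
type TriggerVars = Record<string, unknown>;
interface IncomingMessage { senderId: string; senderIsHuman: boolean; text: string }
interface KeywordTrigger {
  keywords: string[];
  caseSensitive?: boolean;
  matchMode?: "any" | "all";
  senderFilter?: (m: IncomingMessage) => boolean;
  variableFilter?: (v: TriggerVars) => boolean;
}

function keywordFires(t: KeywordTrigger, m: IncomingMessage, vars: TriggerVars): boolean {
  if (t.senderFilter !== undefined && !t.senderFilter(m)) return false;
  if (t.variableFilter !== undefined && !t.variableFilter(vars)) return false;
  const norm = (s: string): string => (t.caseSensitive ? s : s.toLowerCase());
  const text = norm(m.text);
  const hit = (k: string): boolean => text.includes(norm(k)); // naive substring match
  return t.matchMode === "all" ? t.keywords.every(hit) : t.keywords.some(hit);
}

const treatmentOnly: KeywordTrigger = {
  keywords: ["climate"],
  senderFilter: (m) => m.senderIsHuman,
  variableFilter: (v) => v.condition === "treatment",
};
const sample: IncomingMessage = { senderId: "p1", senderIsHuman: true, text: "What about CLIMATE?" };
console.log(keywordFires(treatmentOnly, sample, { condition: "treatment" })); // true
console.log(keywordFires(treatmentOnly, sample, { condition: "control" })); // false
```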

4.3.3 An aside: triggers, segments, and the “active segment” filter

Most trigger configurations include an active-segments filter — a list of segment IDs (within the chamber) during which the trigger is eligible to fire. Leaving the list empty means fire in any segment. Restricting it to a specific segment — typically a chat segment — is the canonical way of writing “this rule only applies during the deliberation, not during the survey at the end”. The filter also accepts embedded child segments, which is how a trigger can be made to fire only while a particular embedded vote overlay is open.

This is a small detail, but it is the source of a very common pitfall: a trigger that “doesn't seem to fire” is often a trigger whose active-segment filter excludes the segment the chamber is actually in.
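The filter itself is a short rule plus the embedded-child case. `segmentEligible` and the segment IDs below are hypothetical, chosen to reproduce the common pitfall.

```typescript
// Hypothetical sketch of the active-segments filter. Segment IDs are illustrative.
function segmentEligible(activeSegments: string[], currentSegment: string, openEmbeddedChild?: string): boolean {
  if (activeSegments.length === 0) return true; // empty list: eligible in any segment
  if (activeSegments.includes(currentSegment)) return true;
  // An embedded child ID in the list scopes the trigger to "while that overlay is open".
  return openEmbeddedChild !== undefined && activeSegments.includes(openEmbeddedChild);
}

// The classic "doesn't seem to fire" case: restricted to the chat, but the chamber is in a survey.
console.log(segmentEligible(["chat-main"], "post-survey")); // false
```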

4.4 Responses: what the bot says

When a trigger's condition fires, the response tells Carrier what message to send. There are two fundamentally different kinds of response.

4.4.1 Scripted responses

A scripted response is a pre-written sentence (or a list of sentences from which Carrier picks one at random). It is the only kind of response a scripted chatbot can produce, and it is also available to LLM chatbots and agents for cases in which the researcher wants exact-text control.

Configuration is minimal:

  • Message — a single fixed string, or a list from which the platform draws (uniformly, by default).
  • Delay — an interval before the message is actually sent, simulating typing or thought.
  • Probability — the chance that the trigger fires at all when its condition is satisfied (default 1.0). Use it when the researcher wants a stochastic intervention.

Scripted responses are deterministic once the randomness above is removed: with a single fixed message and a probability of 1.0, identical inputs produce identical outputs. Two participants who encounter the same conversation in the same condition will then receive the same scripted response, with the same delay, in the same order. This makes them the right choice for studies in which experimental control matters more than naturalism: confederated-AI conformity studies, attention-check probes, scripted “noise” injections, and so on.
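The three settings above can be sketched as a function with an injectable random-number source. Injecting `rng` is one way to make the stochastic parts reproducible in testing; whether or how Carrier seeds its own draws is not stated in this guide. `ScriptedResponse` and `scriptedReply` are hypothetical names.

```typescript
// Hypothetical scripted-response settings mirroring the three fields above.
interface ScriptedResponse { messages: string[]; delayMs: number; probability?: number }
interface Outgoing { text: string; delayMs: number }

// `rng` must return numbers in [0, 1), like Math.random.
function scriptedReply(r: ScriptedResponse, rng: () => number): Outgoing | null {
  if (r.messages.length === 0) return null;
  if (rng() >= (r.probability ?? 1)) return null; // probability gate: trigger does not fire
  const i = Math.floor(rng() * r.messages.length); // uniform draw from the list
  return { text: r.messages[i], delayMs: r.delayMs };
}

// A single message with probability 1.0 is fully deterministic, whatever the rng returns.
const attentionCheck: ScriptedResponse = { messages: ["Please type 'blue' to continue."], delayMs: 1500 };
console.log(scriptedReply(attentionCheck, Math.random)?.text); // Please type 'blue' to continue.
```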

4.4.2 LLM-generated responses

The alternative is to let an LLM generate the response on the fly. A language-model response is produced by sending a request to a language-model provider (OpenAI, Anthropic, Google, or a compatible service) with a prompt assembled from:

  • The system prompt of the bot (its personality, role, instructions).
  • A configurable amount of chat history as context (none, the last N messages, or the full history — see §2.5.5 for the analogous setting on processors).
  • Variable interpolations (§3.5) — the participant's name, condition assignment, current state.
  • Optional chain steps — a sequence of LLM calls in which each step's output becomes part of the next step's input. Chains are how a single trigger can implement multi-step behaviour such as plan, then critique, then rewrite.
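A sketch of the context-window choice and prompt assembly. The message format is hypothetical and provider-neutral (OpenAI, Anthropic, and Google each define their own request formats), chain steps are omitted, and `selectHistory`/`buildPrompt` are illustrative names.

```typescript
// Hypothetical, provider-neutral request shape.
interface ChatLine { sender: string; text: string }
type HistoryMode = { kind: "none" } | { kind: "last"; n: number } | { kind: "full" };
interface PromptMessage { role: "system" | "user"; content: string }

function selectHistory(history: ChatLine[], mode: HistoryMode): ChatLine[] {
  if (mode.kind === "none") return [];
  if (mode.kind === "last") return mode.n > 0 ? history.slice(-mode.n) : []; // slice(-0) would copy everything
  return [...history];
}

// The system prompt is assumed to be already interpolated with variables.
function buildPrompt(systemPrompt: string, history: ChatLine[], mode: HistoryMode): PromptMessage[] {
  return [
    { role: "system", content: systemPrompt },
    ...selectHistory(history, mode).map((m): PromptMessage => ({ role: "user", content: `${m.sender}: ${m.text}` })),
  ];
}

const transcript: ChatLine[] = [
  { sender: "p1", text: "a" },
  { sender: "p2", text: "b" },
  { sender: "p1", text: "c" },
];
console.log(buildPrompt("You are a neutral facilitator.", transcript, { kind: "last", n: 2 }).length); // 3
```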

The response of an LLM-driven trigger is therefore open-ended: the model can produce any text that satisfies its prompt, modified each time by the current state of the conversation. Where scripted responses trade variation for control, LLM responses trade control for naturalism. They are the right choice for studies in which the realism of the bot's behaviour is itself part of what is being tested.

A note on the response format: Carrier's LLM responses follow a structured JSON shape with three fields — content (the message text, or null to remain silent), rationale (a brief justification, logged but not shown), and an optional actions list (the topic of §4.5). The structure exists so that the AI can decide “remain silent” without producing a blank message, and so that researchers can audit why the AI chose to speak after the fact.
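A consumer of this format has to accept `null` content as a deliberate silence rather than an error. The parser below is a hypothetical sketch of that validation; the three field names come from the text above, while `LlmReply` and `parseReply` are illustrative names.

```typescript
// Hypothetical parser for the three-field reply described above.
interface LlmReply { content: string | null; rationale: string; actions?: unknown[] }

function parseReply(raw: string): LlmReply {
  const data: unknown = JSON.parse(raw);
  if (typeof data !== "object" || data === null || Array.isArray(data)) {
    throw new Error("reply must be a JSON object");
  }
  const { content, rationale, actions } = data as Record<string, unknown>;
  if (content !== null && typeof content !== "string") throw new Error("content must be a string or null");
  if (typeof rationale !== "string") throw new Error("rationale must be a string");
  if (actions !== undefined && !Array.isArray(actions)) throw new Error("actions must be a list");
  return { content: content as string | null, rationale, actions: actions as unknown[] | undefined };
}

const reply = parseReply('{"content": null, "rationale": "The discussion is on track; no need to interject."}');
if (reply.content === null) console.log("stay silent; logged rationale:", reply.rationale);
```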

4.4.3 Choosing between scripted and LLM responses

A short heuristic that holds in most studies:

If you want… | Prefer… | Because…
Exact replicability across sessions | Scripted | Same input → same output (given a single fixed message and probability 1.0).
Naturalistic, varied bot behaviour | LLM | The model adapts to what the participant actually said.
A confederate that does not break character | Scripted | LLMs can drift; scripted text cannot.
A facilitator that responds to topics in real conversation | LLM | The researcher cannot pre-write every possible response.
Strict auditing of bot speech | Scripted | The set of possible utterances is finite and visible.

The two are not mutually exclusive on a single bot. A bot template can mix scripted triggers (for greetings, attention checks, and exit messages) with LLM triggers (for the bulk of the discussion), routing each contingency through whichever response type fits best.

4.4.4 Response logic and silence

For LLM chatbots and agents, five additional configuration parameters govern when to speak and when to stay silent. They sit alongside the trigger list and are sometimes more important than it:

  • triggerOnFirstMessage — whether the bot is allowed to make the first move (greet, open the discussion) before any human has spoken.
  • respondToEveryMessage — whether the bot should attempt to respond after every incoming message, or only when one of its triggers explicitly says so.
  • respondOnMention / mentionKeywords — whether the bot only responds when its name (or a configured list of keywords) appears in the chat.
  • initialSalute — a configured opening message sent on chamber start, regardless of triggers.
  • timeoutTrigger — a “fail-safe” prompt the bot sends after a configured silence interval if no other trigger has fired.

These five settings, together with the trigger list, are how a researcher tells an LLM chatbot or agent the difference between a talkative configuration (“respond whenever spoken to, and also if no one has spoken for a minute”) and a reserved one (“only respond when explicitly addressed by name”).
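A sketch of the decision for a message that no trigger has claimed, plus the silence fail-safe. Only a subset of the settings is modelled, and several details are assumptions for illustration: `timeoutMs` stands in for the timeoutTrigger interval, mention-gating is assumed to take precedence over respondToEveryMessage, and the bot name “Ada” is invented.

```typescript
// Hypothetical subset of the response-logic settings.
interface ResponseLogic {
  respondToEveryMessage: boolean;
  respondOnMention: boolean;
  mentionKeywords: string[];
  timeoutMs?: number; // stand-in for the timeoutTrigger silence interval
}

// Should the bot attempt a reply to a message that none of its triggers claimed?
function attemptsReply(cfg: ResponseLogic, text: string): boolean {
  if (cfg.respondOnMention) {
    const lower = text.toLowerCase();
    return cfg.mentionKeywords.some((k) => lower.includes(k.toLowerCase())); // naive substring check
  }
  return cfg.respondToEveryMessage;
}

// Fail-safe: has the chat been silent for at least the configured interval?
function timeoutDue(cfg: ResponseLogic, lastMessageAtMs: number, nowMs: number): boolean {
  return cfg.timeoutMs !== undefined && nowMs - lastMessageAtMs >= cfg.timeoutMs;
}

const talkative: ResponseLogic = { respondToEveryMessage: true, respondOnMention: false, mentionKeywords: [], timeoutMs: 60_000 };
const reserved: ResponseLogic = { respondToEveryMessage: false, respondOnMention: true, mentionKeywords: ["Ada"] };
console.log(attemptsReply(reserved, "What do you think, Ada?")); // true
console.log(attemptsReply(reserved, "What do you all think?")); // false
```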

4.5 Actions: what else the bot does

A trigger's action field — present on every trigger, optional in most cases, central for mediators — is the side effect that fires alongside (or instead of) a response message. Actions are what allow a bot to do something to the chat rather than just in the chat.

Carrier exposes actions through four distinct mechanisms, which are easy to confuse but do different things. Read this table once before the subsections below:

Mechanism | Available to | What it does | Decided when?
Scripted actions | Scripted chatbots, and any LLM bot whose trigger has a baked-in action | Apply a Carrier intervention (disable chat, prompt, highlight, …) with parameters fixed in the trigger configuration | At configuration time
LLM-chosen Carrier actions | LLM chatbots and agents acting in a mediator role | The model emits a structured response whose actions field selects which interventions to fire and with what parameters | At each turn, by the model
Segment-submission actions | All non-human participants (scripted or LLM) configured as communicators in an interactive segment | Submit a vote, a ranking, or a free-text answer alongside humans, optionally counting toward the segment's completion | At configuration time, with the submitted value optionally resolved per session
Agent built-in tools | Agents only (Claude Agent API) | The Claude Agent autonomously reads files, runs commands, or browses the web to gather information for itself before producing its message | Across multiple internal steps within a single turn, by the agent

The first two are about the bot acting on the chamber — they affect what participants see and what they are allowed to do. The third is about the bot acting as a participant within an interactive segment — its submission joins the humans' in the segment's results. The fourth is about the agent informing itself — it affects what the agent knows when it speaks, but its only externally visible output is still the eventual chat message. Sections 4.5.1 to 4.5.4 cover each in turn; 4.5.5 helps choose among them.

4.5.1 Scripted actions

A scripted action is an unambiguous instruction baked into the trigger configuration: “when this rule fires, disable participant X's input until either 60 s have passed or all other participants have responded, whichever comes first.” The action is part of the rule. The bot's job is to fire the trigger; the action's effect on the chat is deterministic, predictable, and visible in the configuration before the experiment runs.
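The composite release condition in the quoted rule can be made concrete with a short sketch. The class and field names below are hypothetical (the guide does not specify Carrier's configuration schema), and the "slot:N" target format is borrowed from the structured-response example in 4.5.2; the point is only that the participant is released by whichever condition is met first.

```python
from dataclasses import dataclass
from typing import Set


@dataclass
class DisableChatAction:
    """Hypothetical encoding of the scripted rule quoted above."""
    target: str                                  # e.g. "slot:1"
    max_seconds: float = 60.0                    # release after this long...
    release_when_others_responded: bool = True   # ...or once all others have responded


def is_released(action: DisableChatAction, elapsed: float,
                responded: Set[str], others: Set[str]) -> bool:
    """Composite release: whichever condition is satisfied first frees the input."""
    timed_out = elapsed >= action.max_seconds
    everyone_else_done = action.release_when_others_responded and others <= responded
    return timed_out or everyone_else_done
```

Because the action is plain data, it can be read and audited before the experiment runs, which is the property that makes scripted actions deterministic.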

Carrier's primary catalogue of scripted-action types — drawn from §2.4.3 — is:

  • disable_chat — temporarily prevent a participant from sending messages, with a composite release condition.
  • enable_chat — explicitly re-enable input.
  • prompt_participant — send a private prompt visible only to a specific participant.
  • highlight_message — visually highlight a past message for a configurable duration.
  • request_attention — trigger a visual or auditory cue at a specific participant.

There are also a small number of chamber-level scripted actions used less frequently:

  • advance_segment — force the chamber to move to its next segment immediately.
  • terminate_chamber — end the chamber early.
  • set_variable — set a variable on a participant or on the chamber, useful for cascading state into later chambers' visibility conditions (§3.4).

Scripted actions are how a confederate-chatbot conformity study can be made literally identical across participants: the rules say “disable the human's input for 30 s after the first confederate's message” or “highlight the confederate's response in green for 5 s”, and the chat then unfolds with the same scaffolding in every session.

4.5.2 LLM-chosen Carrier actions

The alternative — and the more powerful in open-ended designs — is to let the language model itself decide which Carrier interventions to fire. This is the mechanism by which an LLM-driven mediator can act contextually: if the model judges that one participant has been quiet for too long, it can choose to issue a prompt_participant; if it judges that the conversation has drifted off-topic, it can choose to send a styled broadcast.

The rule does not pre-specify the action's parameters. Instead, the trigger fires an LLM call (with the bot's system prompt and chat context), and the model returns a structured response of the form:

{
  "content":   "<broadcast text, or null>",
  "rationale": "<one-sentence justification, logged>",
  "actions":   [ { "type": "prompt_participant",
                   "target": "slot:2",
                   "message": "What do you think?" },
                 { "type": "disable_chat",
                   "target": "slot:1",
                   "release_conditions": { ... } } ]
}

The model's selection of actions is constrained by the bot's configured action vocabulary (researchers can choose to expose only a subset of action types to the model) and by the bot's role: a communicator's action vocabulary is small (no disable_chat); a mediator's is large. This mechanism is available to any LLM-driven mediator — both LLM chatbots and agents acting in a mediator role.
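One way to picture the two constraints (the researcher's exposed subset and the role's vocabulary) is as a filter applied to the model's reply before any action fires. The sketch assumes the reply is the JSON object shown above. The exact per-role sets are illustrative assumptions: the guide states only that a communicator's vocabulary excludes disable_chat and a mediator's is larger. Silently dropping out-of-vocabulary actions is also a design assumption; a real system might log or reject them instead.

```python
import json
from typing import Dict, List, Set

# Assumed vocabularies. The guide says only that a communicator's vocabulary is
# small (no disable_chat) and a mediator's is large; these exact sets are illustrative.
ROLE_VOCABULARY: Dict[str, Set[str]] = {
    "communicator": {"prompt_participant", "highlight_message"},
    "mediator": {"disable_chat", "enable_chat", "prompt_participant",
                 "highlight_message", "request_attention"},
}


def permitted_actions(raw_reply: str, role: str, exposed: Set[str]) -> List[dict]:
    """Parse the model's structured reply and drop actions the bot may not fire.

    `exposed` is the subset of action types the researcher chose to show the model;
    an action survives only if it is both exposed and allowed for the bot's role.
    """
    reply = json.loads(raw_reply)
    allowed = ROLE_VOCABULARY[role] & exposed
    return [a for a in (reply.get("actions") or []) if a.get("type") in allowed]
```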

This is the most expressive — and also the least replicable — corner of Carrier intervention. The trade-off is real: an LLM-chosen action set gives you a facilitator that adapts, but the cost is that two sessions of the same condition may diverge in their facilitation. The right choice depends on what the experiment is testing.

4.5.3 Segment-submission actions

The interactive segments introduced in §1.4.2 (selection, ranking, and input) collect an answer from every participant in the chamber. A non-human participant configured as a communicator can be set up to submit alongside the humans, the way a confederate in a behavioural lab study would. The submission is attached to the bot's participant identity, surfaces in the segment's results, and appears in the exported data with the same shape as a human's answer.

What the bot submits depends on the segment type. For a selection, it is one or more option indices (or, in slider mode, a numeric value within the slider range). For a ranking, it is a permutation of the item indices. For an input, it is a string — or a number, for the numeric input subtype. In every case the submission shape mirrors the human's, so cross-participant aggregates and downstream variable expressions (§3) read bot and human submissions uniformly.

Where the value itself comes from is configured per trigger. Carrier supports four data modes:

  • Static — the researcher hardcodes the value. The bot submits exactly that, every session. Useful when the value is the manipulation.
  • Random — the value is drawn from a configured pool, optionally weighted. For selection and ranking the pool is the segment's own options; for input it is a researcher-provided list of candidate strings, since “random text” without an anchor is not meaningful.
  • Referenced — the value is derived from what humans have already submitted in the same segment. Strategies include match the first human's answer, match the majority option, oppose the majority, and pick a different option at random. For selection and ranking all four strategies translate naturally; for input, only verbatim copy of a target human's text is well-defined.
  • LLM-generated — the bot calls the language model with the segment's prompt, the chamber's chat context, and a JSON-shaped schema instruction. The model returns a structured submission (an index, a permutation, or a string). This is the most flexible mode and the one most often appropriate for input.
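The three non-LLM data modes can be sketched for a single-choice selection segment, where a submission is one option index. The function and strategy names are hypothetical, and two readings are assumptions: "oppose the majority" is taken to mean any option other than the majority's, and "pick a different option at random" to mean any option other than the first human's. LLM-generated mode is left out because it depends on a model call.

```python
import random
from collections import Counter
from typing import List, Optional, Sequence


def static_value(value: int) -> int:
    """Static mode: the same hardcoded value every session."""
    return value


def random_value(pool: Sequence, weights: Optional[Sequence[float]] = None, rng=random):
    """Random mode: draw one value from a pool, optionally weighted."""
    return rng.choices(pool, weights=weights, k=1)[0]


def referenced_value(strategy: str, human_answers: List[int],
                     options: List[int], rng=random) -> int:
    """Referenced mode: derive the bot's index from humans' earlier submissions."""
    if not human_answers:
        raise ValueError("referenced mode needs at least one human answer")
    first = human_answers[0]
    majority = Counter(human_answers).most_common(1)[0][0]
    if strategy == "match_first":
        return first
    if strategy == "match_majority":
        return majority
    if strategy == "oppose_majority":
        return rng.choice([o for o in options if o != majority])
    if strategy == "different_random":
        return rng.choice([o for o in options if o != first])
    raise ValueError(f"unknown strategy: {strategy}")
```

Passing a seeded `random.Random` as `rng` makes the random and referenced modes replayable, which matters when reproducibility of the confederate's answers is part of the design.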

Three submission-metadata flags refine how the bot's answer is treated by the rest of the chamber. countTowardTotal decides whether the bot's submission contributes to the “everyone has answered” check that releases the chamber forward — set to false if the bot is a passive confederate that shouldn't gate progression. showInResults decides whether the submission appears in any aggregated results display participants see at the segment's end. tagAsBot decides whether the submission is visually marked as bot-origin in the UI; the default is false, so the bot is indistinguishable from the humans, which is usually what a confederacy design requires.
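The effect of the three flags can be sketched as two functions: the "everyone has answered" check and the end-of-segment results display. The data structures and the "(bot)" label are hypothetical; only the flag semantics come from the paragraph above (humans always gate progression, while a bot gates it only when countTowardTotal is true).

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Participant:
    pid: str
    is_bot: bool = False
    # The flags below matter only for bots; names mirror the guide's flags.
    count_toward_total: bool = True   # countTowardTotal
    show_in_results: bool = True      # showInResults
    tag_as_bot: bool = False          # tagAsBot (default false: indistinguishable)


def gates_progression(p: Participant) -> bool:
    """Humans always gate; a bot gates only when countTowardTotal is true."""
    return not p.is_bot or p.count_toward_total


def everyone_has_answered(participants: List[Participant], answers: Dict[str, object]) -> bool:
    """The check that releases the chamber forward."""
    return all(p.pid in answers for p in participants if gates_progression(p))


def results_display(participants: List[Participant],
                    answers: Dict[str, object]) -> List[Tuple[str, object]]:
    """Rows of the aggregated results participants see at the segment's end."""
    rows = []
    for p in participants:
        if p.pid not in answers or (p.is_bot and not p.show_in_results):
            continue
        label = f"{p.pid} (bot)" if p.is_bot and p.tag_as_bot else p.pid
        rows.append((label, answers[p.pid]))
    return rows
```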

The input segment type deserves a brief separate note. Free-text submissions raise sharper measurement-validity questions than categorical or ordinal ones: a bot's prose is harder to compare across sessions than a chosen option, and small differences in wording can have outsized effects on the humans who read it. The recommended pattern when using input with LLM-generated mode is a tightly scoped system prompt, the raw model output logged in the export for audit, and tagAsBot: true whenever participants will read the submission and the design should be transparent about its bot origin.

A worked example: a confederacy study runs a selection segment (“Which option do you find more compelling?”) followed by an input segment (“In one sentence, why?”). Two configured bots use static mode in the selection segment, picking option A in every session; in the input segment they switch to LLM-generated mode under a system prompt instructed to elaborate on option A in plain, peer-like language. The participant sees four answers in each segment — two human, two bot — and across sessions the design holds: the bots' selections are reproducible to the index, and the bots' free-text answers vary in surface form while cohering around the same content.

4.5.4 Agent built-in tools (Claude Agent API)

The fourth mechanism is internal to the agent itself, and only applies to participants of the agent type (§ note on non-human participants). A Claude Agent has access to a vocabulary of built-in tools provided by the Anthropic Agent API — tools for reading files in a configured document area, executing small commands, and browsing the web. Before producing the message it will eventually send to the chat, the agent's underlying model can autonomously decide to invoke one or more of these tools, examine the results, and iterate.

A typical pattern, from the chamber's point of view:

  1. A trigger fires that asks the agent to respond.
  2. The agent reads (silently) the section of the configured study brief that is relevant to the conversation so far.
  3. The agent runs (silently) a short check against a dataset to confirm a number it is about to cite.
  4. The agent produces a single chat message that quotes the relevant passage and reports the number.

Steps 2 and 3 are internal to the agent. Participants in the chat see only step 4 — a single grounded reply. The agent's internal trace (which tools it called, with what arguments, and what results it got back) is preserved in the exported data for the researcher to audit, but it is not shown to other participants.

The two design decisions a researcher makes for an agent are therefore:

  • Tool scope. Which of Claude's built-in tools to enable, and — for the file-reading tool — what document area to expose. The narrower the scope, the more focused the agent's contributions; the wider the scope, the more open-ended.
  • Step budget / latency. Agents take longer per response than LLM chatbots, because they loop. Configure a maximum step count (or wall-clock budget) so the agent does not hold up the chamber. The platform shows a “thinking” indicator while the agent is in its loop.
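The role of a step budget can be illustrated with a generic tool-use loop. This is not the Claude Agent API: `decide` is a hypothetical stand-in for the model, returning either a tool request or a final reply, and the tool names are invented. The sketch shows two things from the list above: disabled tools never run (tool scope), and the loop stops after a maximum step count or wall-clock budget. It also records a trace of tool calls, analogous to the agent trace kept for export.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class AgentTurn:
    message: Optional[str] = None
    trace: List[Tuple[str, str, str]] = field(default_factory=list)  # (tool, argument, result)
    stopped_early: bool = False


def run_agent_turn(decide: Callable, tools: Dict[str, Callable[[str], str]],
                   max_steps: int = 5, wall_clock_s: float = 20.0) -> AgentTurn:
    """Loop until the stand-in model replies or the budget runs out.

    `decide(trace)` returns ("tool", name, argument) or ("reply", text).
    """
    turn = AgentTurn()
    deadline = time.monotonic() + wall_clock_s
    for _ in range(max_steps):
        if time.monotonic() > deadline:
            break
        step = decide(turn.trace)
        if step[0] == "reply":
            turn.message = step[1]
            return turn
        _, name, argument = step
        if name not in tools:  # tool scope: disabled tools are never executed
            turn.trace.append((name, argument, "tool not enabled"))
            continue
        turn.trace.append((name, argument, tools[name](argument)))
    turn.stopped_early = True  # budget exhausted without a chat message
    return turn
```

Only `message` would reach the chat; `trace` stays researcher-facing.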

Two notes on how 4.5.4 relates to 4.5.2:

  • An agent acting as mediator can fire LLM-chosen Carrier actions (4.5.2) and use its built-in tools (4.5.4) in the same turn. The former affects the chat; the latter informs the model.
  • The two channels are logged separately in the exported data. The Carrier action log records intervention actions; the agent trace records tool invocations.

4.5.5 Choosing among the four mechanisms

If you want… | Prefer…
Identical turn-taking enforcement across sessions | Scripted actions (4.5.1) with fixed release conditions.
Facilitator interventions that respond to what was actually said | LLM-chosen Carrier actions (4.5.2), constrained to a small vocabulary.
To study the effect of a particular intervention pattern | Scripted: the intervention is the manipulation, so it must be uniform across participants.
To study whether an automated facilitator helps at all | LLM-chosen: the manipulation is the model's judgement, so it must vary contextually.
A non-human participant that votes, ranks, or writes alongside humans | Segment-submission actions (4.5.3); pick a data mode according to how reproducible the submission needs to be.
A confederacy condition where the non-humans submit identical answers in every session | Segment-submission actions in static mode (4.5.3).
A non-human participant that cites the study material accurately | An agent with file-reading enabled over the materials (4.5.4); the agent quotes what it reads.
A non-human participant that fact-checks live during conversation | An agent with web-browsing enabled (4.5.4).
A non-human participant that runs computations over a dataset before answering | An agent with code execution enabled (4.5.4).

4.6 Composing triggers

Real bots rarely consist of a single trigger. The composition surface lets you make the rules interact:

  • Priority. Each trigger has a numeric priority. When multiple triggers' conditions are satisfied by the same event, Carrier evaluates them in descending priority order and fires the highest-priority match. Use priority to handle exceptions: an attention-check trigger with high priority can override a greeting trigger that would otherwise also fire.
  • Cooldown. A trigger can specify a minimum interval that must elapse between successive firings. Used to prevent a bot from spamming when a condition stays true for a while.
  • Max fires. A trigger can cap how many times it ever fires per chamber (e.g. an introduction trigger that only fires once).
  • Probability. A trigger can fire with a configured probability less than 1.0 when its condition is satisfied, producing stochastic interventions.
  • Chain target. A trigger can specify another trigger's ID to fire after it completes. This is how multi-step behaviours are built: trigger A says something, then chains to trigger B which fires a follow-up question after a delay, which chains to trigger C which records the response. The chained trigger's condition can be chain-only, which means it can only ever fire by being chained — useful for keeping cascading sequences out of the normal trigger queue.

Together, these modifiers turn a flat list of triggers into a directed graph of contingent behaviour. Most experiments need only flat lists; the modifiers are there for designs that demand them.
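The interplay of priority, cooldown, max fires, and probability can be sketched as one selection function applied to the triggers whose conditions match an event. Field names are hypothetical. Two behaviours are assumptions, not documented Carrier semantics: a failed probability roll lets the next eligible candidate try, and ties in priority are broken by list order (in Carrier the order for equal priorities is undefined). Chain targets are left out for brevity.

```python
import random
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Trigger:
    trigger_id: str
    priority: int
    cooldown_s: float = 0.0
    max_fires: Optional[int] = None
    probability: float = 1.0
    fire_count: int = 0
    last_fired: Optional[float] = None


def eligible(t: Trigger, now: float) -> bool:
    """A trigger is blocked once it hits max fires or while it is cooling down."""
    if t.max_fires is not None and t.fire_count >= t.max_fires:
        return False
    if t.last_fired is not None and now - t.last_fired < t.cooldown_s:
        return False
    return True


def fire_for_event(matching: List[Trigger], now: float, rng=random) -> Optional[str]:
    """Fire at most one trigger: the highest-priority eligible one whose roll succeeds."""
    candidates = sorted((t for t in matching if eligible(t, now)),
                        key=lambda t: t.priority, reverse=True)
    for t in candidates:
        if rng.random() < t.probability:
            t.fire_count += 1
            t.last_fired = now
            return t.trigger_id
    return None
```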

4.6.1 Variable conditions in triggers (again)

The variable-filter modifier from §4.3.2 deserves a second mention here because of its compositional consequences. A single trigger list with variable filters

TRIGGER 1   if var.condition == "treatment", respond with X
TRIGGER 2   if var.condition == "control",   respond with Y

is functionally equivalent to two bots, one per condition, with one trigger each. Whether to write the contrast as “one bot with two filtered triggers” or “two bots, one per condition” is a design decision: the former keeps the experiment shorter and easier to read; the latter is sometimes clearer when the two conditions differ in many small ways.
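The equivalence can be checked mechanically: filtering one trigger list by the chamber's variables yields, for each condition, exactly the trigger list the corresponding single-condition bot would carry. The record shape below is hypothetical and handles only equality filters of the form var.name == value.

```python
from typing import Dict, List

# Hypothetical trigger records; "filter" is (variable name, required value) or absent.
TRIGGERS = [
    {"id": "T1", "filter": ("condition", "treatment"), "response": "X"},
    {"id": "T2", "filter": ("condition", "control"), "response": "Y"},
]


def active_triggers(triggers: List[dict], variables: Dict[str, str]) -> List[dict]:
    """Keep triggers with no filter or whose filter matches this chamber's variables."""
    kept = []
    for t in triggers:
        flt = t.get("filter")
        if flt is None or variables.get(flt[0]) == flt[1]:
            kept.append(t)
    return kept


def split_by_condition(triggers: List[dict], values: List[str]) -> Dict[str, List[str]]:
    """The 'two bots' view: the trigger ids each per-condition bot would carry."""
    return {v: [t["id"] for t in active_triggers(triggers, {"condition": v})] for v in values}
```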

4.7 Builder walkthroughs

🎞 Builder walkthrough — Adding a keyword trigger to a scripted communicator

The researcher opens a scripted-chatbot template, clicks “Add trigger”, picks the keyword condition type, enters “climate” as the keyword, and writes three response variants. They set the response delay to 1500 ms and the probability to 0.7, leaving the cooldown at the default of 30 s. The trigger appears in the bot's script list with a small chip marking its condition type.

🎞 Builder walkthrough — Chaining two triggers

The researcher creates a time trigger (“at 5 minutes, ask 'how is everyone feeling?'”) and a chain-only trigger (“respond to the first new message with 'thanks for sharing'”). They open the first trigger and set its chain target to the second trigger's ID. A small arrow now appears between the two triggers in the bot's script outline.

🎞 Builder walkthrough — An LLM-driven trigger with action vocabulary

The researcher creates a mediator with a single LLM-driven trigger that fires every 90 s. They set the system prompt to “facilitate a balanced discussion” and enable two actions in the bot's action vocabulary, prompt_participant and highlight_message, while leaving disable_chat switched off. The mediator can now choose, on each firing, to broadcast a message, to prompt a quiet participant, or to highlight an important contribution, but not to disable anyone's input.

4.8 Worked example: a probing mediator that escalates on disagreement

A mediator designed to keep a deliberation balanced and to escalate its intervention when the discussion becomes adversarial.

Bot template — “Probing mediator”

  • System prompt (used for any LLM-driven response): “You are facilitating a small-group discussion. Your goal is to keep the discussion balanced and respectful. You do not take sides on the topic.”
  • Triggers:
ID | Type | Condition | Response | Action | Priority
T1 | time | 5 s after chamber start | scripted: “Welcome — please introduce yourselves briefly.” | none | 10
T2 | periodic | every 120 s | LLM-driven: a one-sentence neutral summary | none | 5
T3 | keyword | matches ["disagree", "wrong", "no, you're"] (any) | LLM-driven: an empathetic acknowledgement | LLM-decided: optionally prompt_participant for the speaker on slot 1 if they have not spoken in 60 s | 7
T4 | activity-timeout | 90 s of silence | scripted: “Anyone want to add to that?” | none | 6
T5 | message-count | every 30 messages | LLM-driven: longer synthesis | none | 4

This bot, attached to the mediator slot of a deliberation chamber, will: open with a scripted welcome (T1, deterministic); provide periodic neutral summaries (T2, LLM-generated, varied); intervene with empathetic acknowledgements when disagreement words appear (T3, with an optional prompt action); break long silences with a fixed nudge (T4); and produce longer summaries periodically (T5). Because T3's priority (7) is higher than T4's (6), T3's acknowledgement takes precedence whenever the two triggers match at the same moment.
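Writing the template down as data makes the priority ordering easy to check before piloting. The record type and the abbreviated response labels are hypothetical; the IDs, condition types, and priorities are copied from the table. The helper applies the rule from §4.6 that the highest-priority match takes precedence.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass(frozen=True)
class MediatorTrigger:
    tid: str
    kind: str                          # condition type, as in the table
    condition: str                     # human-readable condition
    response: str                      # "scripted: ..." or "llm: ..."
    priority: int
    llm_actions: Tuple[str, ...] = ()  # actions the model may choose to fire


PROBING_MEDIATOR = (
    MediatorTrigger("T1", "time", "5 s after chamber start", "scripted: welcome (fixed text)", 10),
    MediatorTrigger("T2", "periodic", "every 120 s", "llm: one-sentence neutral summary", 5),
    MediatorTrigger("T3", "keyword", "any disagreement keyword",
                    "llm: empathetic acknowledgement", 7, ("prompt_participant",)),
    MediatorTrigger("T4", "activity-timeout", "90 s of silence", "scripted: fixed nudge", 6),
    MediatorTrigger("T5", "message-count", "every 30 messages", "llm: longer synthesis", 4),
)


def winning_trigger(matched_ids: Iterable[str]) -> str:
    """Among triggers matching the same moment, the highest priority takes precedence."""
    matched = set(matched_ids)
    return max((t for t in PROBING_MEDIATOR if t.tid in matched),
               key=lambda t: t.priority).tid
```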

Researcher: fictional placeholder. This example does not describe a real study.

4.9 Common pitfalls

  • Triggers that never fire because the active-segment filter is wrong. This is easily the most commonly reported bug. If a trigger fails to fire, first check whether its active-segment list includes the segment the chamber is actually in. An empty list (meaning all segments) is the safest default while developing.
  • Cooldowns that swallow the trigger. A keyword trigger with a 60-second cooldown will respond only once per minute, even if the keyword is uttered repeatedly. For “single intervention per chamber” semantics, prefer maxTriggers: 1 over a long cooldown.
  • Priority collisions on simultaneous events. When two triggers with the same priority match the same event, which one takes precedence is undefined. If it matters, give one trigger a strictly higher priority. The builder shows a warning when two triggers share the same priority.
  • LLM triggers fired too often. An LLM trigger fired on every message in a long deliberation can become expensive (in API tokens and in latency) and can also drown out human conversation. Use response-logic settings (respondOnMention, respondToEveryMessage: false) or message-count cooldowns to keep LLM bots from speaking on every turn.
  • Mixing scripted and LLM responses on the same trigger. A trigger has one response type. If a researcher wants the bot to say either a fixed sentence or an LLM-generated reply depending on context, they should write two triggers, one scripted and one LLM-driven, and use variable filters or priority to disambiguate.
  • LLM-chosen actions during attention checks. Attention checks are typically where researchers most want determinism; allowing the bot to choose its own actions during an attention check undermines the check. Scope LLM-chosen Carrier actions to discussion segments via the active-segment filter.
  • Conflating an agent's built-in tools with its Carrier actions. A Claude Agent's file/code/web tools are internal to the agent — they affect what it knows, not what participants see. The Carrier intervention actions (disable_chat, prompt_participant, …) are external — they affect the chat. When an agent acts as mediator, both channels are available; researchers occasionally configure one expecting the effect of the other. The action log and the agent trace are logged separately in the export for exactly this reason.
  • Forgetting that responses can be null. An LLM-driven response that returns content: null is a deliberate silence — the bot has been asked and has chosen not to speak. This is not a bug; it is one of the things LLM-driven bots are best at. Inspect the rationale field in exported data to understand why the bot stayed silent.

This concludes Part I. The four systems — chamberlines/chambers/segments, roles, variables, and triggers — together define everything a Carrier experiment can express. Part II turns to running an experiment built with them.

Part II · §5. Running and monitoring experiments

Part II is short by design. Most of what makes Carrier worth using is in Part I; what follows is the day-to-day mechanics of running a study built with the four systems above.

5.1 The experiment lifecycle

Every experiment moves through a small lifecycle. Its status field — visible at the top of the builder and on the dashboard — takes one of five values:

Status | Meaning
Draft | The experiment is being edited. Participants cannot enter it.
Active | The experiment is open. New participants who visit the URL begin a run.
Paused | New participants are blocked, but existing runs continue. Use during pilots when you want to freeze enrolment without disrupting in-progress sessions.
Completed | The experiment is closed. No new runs; existing data remains exportable.
Archived | The experiment is hidden from the main dashboard listing. Data remains exportable.

The transition from Draft to Active is the activation step. The builder will refuse to activate an experiment that has obvious gaps — no chamberlines, an unfilled bot template, an invalid variable reference — but it will not catch every error. Pilot every experiment against yourself (and ideally a colleague) before opening it to real participants.
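A pre-activation check of the kind described above can be sketched as a small lint over an experiment description. Everything about the data shape is hypothetical (Carrier's internal format is not documented here), and the three checks are only the examples the paragraph names. As the paragraph warns, an empty problem list does not mean the experiment is correct.

```python
from typing import List


def admits_new_runs(status: str) -> bool:
    """Per the status table, only an Active experiment lets new participants begin a run."""
    return status == "active"


def activation_problems(experiment: dict) -> List[str]:
    """Hypothetical lint covering the gaps named above; the dict shape is invented."""
    problems = []
    if not experiment.get("chamberlines"):
        problems.append("no chamberlines")
    for bot in experiment.get("bots", []):
        if not bot.get("template"):
            problems.append(f"bot {bot.get('name', '?')} has no template")
    defined = set(experiment.get("variables", []))
    for ref in experiment.get("variable_references", []):
        if ref not in defined:
            problems.append(f"unknown variable: {ref}")
    return problems


def can_activate(experiment: dict) -> bool:
    """Draft to Active is allowed only when the lint finds nothing."""
    return experiment.get("status") == "draft" and not activation_problems(experiment)
```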

5.2 The dashboard at a glance

The dashboard is the experimenter's command surface during a live experiment. It has four panels, of which the first three are tightly coupled.

Active sessions. A list of every participant currently in a run, with their current phase (initialisation / identity setup / global pre-survey / chamber line execution / global post-survey / completed), the chamberline they were assigned to, and the index of the chamber they are currently in. Clicking a participant opens a per-participant detail view.

Matching queue. A list of every participant currently waiting to be matched into a chamber. Each entry shows the chamber the participant is waiting for, how long they have been waiting, and what slot constraints (§3.3) need to be satisfied for them to be admitted. A queue that grows steadily during an experiment is the symptom of a constraint that is too tight.

Alerts. A rolling list of events that warrant attention: disconnects, long waits, drop-outs, and idle participants. The dashboard surfaces these in priority order; experimenters typically watch this panel rather than the others.

Chatrooms. A list of every active chatroom (matched chamber). Each entry can be opened to show the live transcript, the broadcast log, the action log, and the processor-interaction log. This is the place to watch a chamber unfold in real time.

The dashboard auto-refreshes; no manual refresh is required.

5.3 Live monitoring and intervention

Three kinds of intervention are available from the dashboard during a live session.

Pause a participant's run. Halts the run at its current phase. The participant sees a paused indicator; segments and chamber timers do not advance. Resume with a single click. Use when a participant has hit a problem you want to debug before they continue.

End a participant's run. Terminates the run with a configurable completion message. The participant is shown the message and the global post-survey is skipped (unless explicitly forced). Used for participants who cannot continue — disconnections that will not heal, attention-check failures, withdrawal requests.

Host-advance a segment. Forces a segment whose transition mode is host (see §1.5) to advance for the chamber. Used during pilots to step a chamber through its timeline without waiting for timers or for participant clicks.

A fourth, lighter intervention — broadcast a message into a chatroom — is available from the chatroom detail view. The message appears in the chat as an experimenter announcement. Use sparingly: every dashboard broadcast is recorded in the chat transcript, so it becomes part of the dataset.

5.4 Exporting data

Data export is the final step of an experiment. It is available from the experiment detail page and the dashboard.

Two parameters control what comes out:

Parameter | Choices | Effect
Format | JSON · CSV | The serialisation of the export. JSON preserves nesting; CSV flattens.
Type | All · Participants · Chatrooms · Responses | What subset of the experiment to include.

The four export types correspond to four levels of granularity:

  • Participants. One row per participant per chamberline assignment, with their identity, demographics, status, and variable values.
  • Chatrooms. One row per chatroom (matched chamber), with the chamber's participants, settings, and timestamps.
  • Responses. All survey responses across all surveys (global pre, global post, chamber pre, chamber post, embedded segment surveys), keyed by participant and survey ID.
  • All. Every preceding type, plus the full chat transcripts and the broadcast / action / processor-interaction logs.

For a quantitative analysis pipeline, Responses and Participants in CSV are usually the right starting point; for a qualitative pass over conversation, All in JSON gives you the structure to operate on.

Exports are produced on demand; there is no waiting queue. For very large experiments, the export endpoint accepts a participant filter, so you can export a single chamberline or a single date range without downloading the entire experiment.

For a category-by-category description of what each export actually contains, see §5.5.

5.5 What's in your data

An export is not a single thing. It is a layered snapshot of a run viewed from several angles — the survey angle, the conversation angle, the timing angle, and so on. Most analysis questions touch two or three of these layers at once. This section walks through what Carrier captures for every run, what it does not capture, and which export type each kind of data lands in.

The map below shows, at a glance, which export type carries which category. All is a superset; researchers who plan to do anything beyond the simplest summary should default to it.

Data category Participants Chatrooms Responses All
Identity, assignment, status
Survey responses
Chat transcripts
Timing and pacing
AI / processor interactions
Behavioural events
Attention checks and face monitoring
Assignment and reproducibility

The sub-sections that follow describe these categories in turn; §5.5.8 collects the cases in which a category is empty by design.

5.5.1 Survey responses

For most studies, this is the primary data. Carrier captures survey responses in four places:

  • The global pre-survey, completed before any chamber, once per run.
  • The global post-survey, completed after the final chamber, once per run.
  • Chamber pre- and post-surveys, completed at the boundaries of each chamber.
  • Embedded segment surveys — the survey segment type — completed within a chamber as part of its segment timeline.

Survey responses are exported in two shapes. The raw SurveyJS JSON preserves nested question structures (matrices, panels, conditional branching) and is appropriate when the response shape itself matters. The flattened response rows give one row per question per participant, with questionId, questionText, response, responseType, and a stage indicator pointing at the survey instance the answer belongs to.
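A common first step with the flattened rows is pivoting them to one record per participant. The sketch below uses only Python's standard csv module. The column names participantId and stage are assumptions (the guide names the stage indicator but not its column header, nor the participant key). Keying by (stage, questionId) keeps a question asked in both the pre- and post-survey from overwriting itself.

```python
import csv
import io
from collections import defaultdict
from typing import Dict, Tuple


def pivot_responses(csv_text: str) -> Dict[str, Dict[Tuple[str, str], str]]:
    """Turn one-row-per-question rows into {participant: {(stage, questionId): response}}.

    Assumes columns named participantId, stage, questionId, and response.
    """
    wide: Dict[str, Dict[Tuple[str, str], str]] = defaultdict(dict)
    for row in csv.DictReader(io.StringIO(csv_text)):
        wide[row["participantId"]][(row["stage"], row["questionId"])] = row["response"]
    return dict(wide)
```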

For most quantitative pipelines, Responses in CSV is the right starting point. For qualitative analyses or for questions where the survey was deliberately non-trivial, All in JSON preserves the structure you need to operate on.

5.5.2 Chat transcripts

Every message exchanged in every chatroom is preserved verbatim. Each message carries a sender (human participant, LLM chatbot, scripted chatbot, agent, mediator bot, or system), an ISO timestamp, and a message type that distinguishes ordinary text from system notifications, joins and leaves, mediator broadcasts, bot and AI responses, and processor suggestions.

System messages are interleaved with the conversation rather than stored on the side, which means a researcher reading the transcript chronologically sees joins, disconnects, broadcasts, and attention-check events in situ. Chatrooms and All exports include the full chat history; Participants does not.

5.5.3 Timing and pacing

The export carries timestamps at several levels:

  • Run-level. When the run started, when each phase transitioned, when the run completed or was terminated, and the reason for termination.
  • Chamber-level. When matching happened, when the chatroom began, when each segment within the chamber started, when the chamber ended, and the actual elapsed duration.
  • Per-message. Every chat message carries an ISO timestamp.
  • Connection-level. Heartbeats, reconnection counts, and the participant's total time in the experiment.

These together let researchers reconstruct any per-participant duration of interest — time-to-first-message, time between segments, time spent re-reading instructions — without custom instrumentation.
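As an example, time-to-first-message can be computed from the chatroom start time and the per-message ISO timestamps. The message field names (sender, type, timestamp) and the "text" type value are assumptions about the export's JSON; the timestamp handling uses the standard `datetime.fromisoformat`, with a trailing "Z" normalised because older Python versions do not accept it.

```python
from datetime import datetime
from typing import Dict, List


def parse_iso(ts: str) -> datetime:
    # fromisoformat accepts a trailing "Z" only from Python 3.11; normalise it.
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


def time_to_first_message(chat_start: str, messages: List[dict]) -> Dict[str, float]:
    """Seconds from chatroom start to each sender's first ordinary text message."""
    start = parse_iso(chat_start)
    first: Dict[str, float] = {}
    for m in sorted(messages, key=lambda m: parse_iso(m["timestamp"])):
        if m["type"] != "text":  # skip joins, system notices, broadcasts, etc.
            continue
        first.setdefault(m["sender"], (parse_iso(m["timestamp"]) - start).total_seconds())
    return first
```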

5.5.4 AI and processor interactions

When chambers use the processor role, every assist event is logged with its full text. Review interactions carry the draft text that was submitted, the feedback that came back, and whether the communicator accepted, rejected, or edited the suggestion. Generate interactions carry the request and the generated response. Real-time assist suggestions carry their content and outcome.
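The review outcomes lend themselves to a simple acceptance analysis. In the sketch, the field names kind and outcome, and their string values, are assumptions about the shape of the processor-interaction log; the computation is just the share of review suggestions that were accepted, rejected, or edited.

```python
from collections import Counter
from typing import Dict, List


def review_outcomes(interactions: List[dict]) -> Dict[str, float]:
    """Share of review suggestions the communicator accepted, rejected, or edited."""
    outcomes = Counter(i["outcome"] for i in interactions if i["kind"] == "review")
    total = sum(outcomes.values())
    if not total:
        return {}
    return {k: outcomes[k] / total for k in ("accepted", "rejected", "edited")}
```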

For chambers that use an LLM chatbot, mediator, or agent, the model's reply is stored in the chat history alongside human messages, with sender metadata identifying the role and, where set, the provider. For agents on the Claude Agent path — where memory is provider-managed — the provider's session handle is preserved on the chatroom so that Carrier-side and provider-side timelines can be aligned after the fact.

5.5.5 Behavioural events

When client-side instrumentation is active for a segment, Carrier captures a stream of low-level events: tab visibility changes, focus changes, pointer activity, clicks, and a small set of custom events raised by specific segment types. Per-segment summaries are produced automatically — most commonly tab-away count and total tab-away time — and the raw event stream is preserved for replay or fine-grained sequence analysis.
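If you need a summary the platform does not produce, the raw stream can be re-aggregated. The sketch recomputes tab-away count and total tab-away time from visibility events, assuming each event is reduced to a (seconds, visible) pair sorted by time. An absence still open at the end of the segment is closed at the segment end, which is a design choice you may want to change.

```python
from typing import List, Optional, Tuple


def tab_away_summary(events: List[Tuple[float, bool]], segment_end: float) -> Tuple[int, float]:
    """Count tab-aways and total seconds away from time-sorted visibility events."""
    count, total = 0, 0.0
    left_at: Optional[float] = None
    for t, visible in events:
        if not visible and left_at is None:
            count += 1          # a new absence begins
            left_at = t
        elif visible and left_at is not None:
            total += t - left_at
            left_at = None      # the participant came back
    if left_at is not None:     # still away when the segment ended
        total += segment_end - left_at
    return count, total
```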

This data is opt-in by segment. Researchers who want it should confirm that the relevant segments have behavioural-events instrumentation enabled in the builder before piloting.

5.5.6 Attention checks and face monitoring

The attention-check segment captures a result record per attempt: the mode (face-based or survey-based), whether it passed, the retry count, and any mode-specific details. The record appears in two places — a structured array attached to the run, and a corresponding system message interleaved into the chat transcript at the moment of the check.

Face monitoring, when enabled on a chat segment, emits its own event stream: warning shown, face returned, grace expired, paused, resumed, terminated. It is stored the same way: a structured array on the run plus interleaved system messages in the transcript.

Both categories are present only when the experiment was configured to produce them. Their absence in an export is not a missing value; it means the experiment did not ask for them.

5.5.7 Assignment and reproducibility

For anyone who needs to reconstruct, after the fact, why a given participant saw what they saw, the export carries:

  • The chamberline each participant was assigned to, and the reason (random, counterbalance, survey-based, or fixed).
  • A frozen snapshot of the participant's run plan — the chambers in their assigned order, each with its role and slot assignment for that participant.
  • The experiment's version at the moment the run was created, so that a later configuration change does not corrupt the interpretation of earlier runs.
  • A condition seed where randomisation was involved.

Combined with the admin-side activity log (see §6), this is sufficient to reproduce a participant's path through the experiment exactly.
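The sketch below shows why a recorded seed makes randomisation replayable. A private `random.Random` generator is seeded with a stored value and used to pick a chamberline, and replaying the same seed reproduces the same pick. The condition names and seed value are hypothetical, and this is not Carrier's assignment code.

```python
import random


def assign_chamberline(seed, chamberlines):
    """Pick a chamberline deterministically from a recorded seed."""
    rng = random.Random(seed)  # private generator; global random state untouched
    return rng.choice(chamberlines)


conditions = ["control", "llm-mediator", "scripted-mediator"]  # hypothetical names
first = assign_chamberline(seed=48213, chamberlines=conditions)
replayed = assign_chamberline(seed=48213, chamberlines=conditions)
assert first == replayed  # same seed and same list order give the same assignment
print(first)
```

Replay only works if the list of options is also the same and in the same order. This is one reason the frozen run plan and experiment version matter alongside the seed.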

5.5.8 What's conditional

Several categories appear only when the experiment is configured to produce them. They are worth flagging up front so that an absent column is not mistaken for a bug:

  • Behavioural events require client-side instrumentation enabled on the relevant segments.
  • Attention-check results require an attention-check segment in the chamberline.
  • Face-monitoring events require face monitoring enabled on a chat segment.
  • Processor interaction logs require at least one chamber to use a processor role.
  • Non-human sender metadata (role, provider) is populated when the message originates from a chatbot, mediator, or agent; for human messages those fields are empty by design.
  • Variable values appear only for variables the experiment defined; there are no system-provided demographic variables.

If a researcher expects one of these and finds it missing, the place to check is the experiment configuration, not the export.

5.6 Pilot first, ramp second

A short note that does not fit anywhere else in this guide but matters in practice. Every Carrier experiment benefits enormously from a small pilot — three to five participants, ideally including the researcher themselves — before being opened to a larger sample. Pilots are the only reliable way to catch the kinds of issues that the builder cannot validate: a slot constraint that is unsatisfiable in practice, a chamber timing that is too short to read the instructions, an LLM mediator whose system prompt produces unexpected behaviour on real conversations, a survey question that is ambiguous to actual participants.

Pilot with the experiment status set to active and the dashboard open. Watch the matching queue, watch the chat transcripts, and watch the action log. Most experiments end up requiring at least one round of revision after the first pilot. This is normal; budget time for it.

Part II · §6. Administration

Accounts, collaborators, and the admin portal — the parts of Carrier that exist to keep multiple researchers working on the same platform.

6.1 Accounts and collaboration

Every researcher account in Carrier has a role: either researcher or admin. Researchers can create, edit, run, and export their own experiments; admins additionally manage the user list and the activity log.

An experiment has one owner and any number of collaborators:

  • The owner can edit everything, transfer ownership, add and remove collaborators, and delete the experiment.
  • A collaborator can edit the experiment's configuration and view its data, but cannot transfer ownership, add other collaborators, or delete the experiment.

This separation is the simplest model that supports the common pattern of one PI owning each study and several lab members helping to configure and run it.

6.2 The admin portal

The admin portal is available only to users with the admin role. It exposes three sub-areas.

User management. Create, update, enable, and disable user accounts. Disabling an account preserves all of the user's experiments and data but prevents them from logging in. This is the right action when a lab member leaves; deletion is rarely necessary.

Registration approval. When self-registration is enabled, new sign-ups arrive in a pending state. The admin reviews each request — typically by checking the requester's institutional email and the project they intend to use Carrier for — and approves or rejects.

Activity logs. A chronological log of meaningful actions across the platform — logins, experiment creations, role changes, exports. Useful both for accountability and for understanding usage patterns when scaling the platform across multiple labs.

Appendices

Glossary, type × role matrix, and quick-reference indexes for segment types and trigger types.

Appendix A · Glossary

Term | Definition
Active-segment filter | A list of segment IDs during which a trigger is eligible to fire. Empty list = fire in any segment.
Agent | An autonomous non-human participant built on Anthropic's Claude Agent API. Has built-in tools for reading files (in a configured document area), running code, and browsing the web; uses them on its own initiative to inform its messages. Distinct from an LLM chatbot.
Agent built-in tools | The file-reading, code-execution, and web-browsing tools available to an agent via the Claude Agent API. Used for information gathering; distinct from Carrier intervention actions.
Aggregate variable | A variable computed over multiple participants in a chamber. Configurable to include or exclude bot/agent participants.
Chain target | The ID of another trigger that fires after this one completes. Used to compose multi-step bot behaviour.
Chamber | A timed grouping of matched participants who share the same segments and remain together until the chamber ends.
Chamberline | An ordered sequence of chambers, representing one experimental condition. A participant is assigned to exactly one.
Chamberline filter | A condition under which a participant is eligible for a given chamberline; used by survey-based assignment.
Chatroom | The live, runtime instantiation of a chamber for a particular matched group.
Communicator | The role of a primary conversational participant. The “default” role in any chamber.
Embedded segment | A selection or ranking segment displayed as an overlay on a chat segment, so participants can vote or rank without leaving the conversation.
Global pre-survey / post-survey | Surveys at the very start and very end of a run. Distinct from chamber-level surveys.
LLM chatbot | A non-human participant that produces chat messages from a language model, with no tools and no scripted rules. Open-ended, varies across sessions.
LLM-chosen Carrier action | An intervention action (disable_chat, prompt_participant, …) selected at runtime by an LLM-driven participant via its structured response. Available to any LLM chatbot or agent acting as mediator.
LLM-driven response | A response produced by a language model on the fly, rather than from a pre-written script.
Match | The event of assembling enough participants of the right kinds to fill a chamber's slots.
Mediator | The role of a facilitator participant: sees everything, broadcasts, controls turn-taking.
Non-human participant | Umbrella term for the three kinds of non-human entity Carrier supports: LLM chatbots, scripted chatbots, and agents.
Phase script | An ordered list of phases for a processor, each with a mode and a transition trigger.
Priority | A numeric ranking among triggers; higher priority fires first when multiple triggers match.
Processor | The role that assists composition before a communicator's text becomes a message. Three modes: review, generate, real-time assist.
Response | The message a trigger sends when it fires. Either scripted or LLM-driven.
Run | One participant's complete pass through the experiment, from arrival to completion.
Scripted chatbot | A rule-driven, deterministic non-human participant. Configured by triggers; produces pre-written messages. Can fill communicator and mediator roles, but not processor.
Scripted response | A pre-written message (or random pick from a list) sent when a trigger fires.
Segment | An activity within a chamber: a chat, a slide, a survey, a timer, a vote, etc.
Slot | A position in a chamber, with a type (human / LLM chatbot / scripted chatbot / agent) and a role (communicator / mediator / processor).
Standalone segment | A segment that occupies the participant's entire screen, as opposed to embedded.
Trigger | A condition–response–action rule that governs when a non-human participant speaks or acts.
Variable | An attribute attached to a participant, used for matching, visibility, interpolation, or trigger conditions.
Visibility condition | A condition on a chamber that, if false, causes the participant to skip the chamber.

Appendix B · Type × Role compatibility matrix

Type \ Role | Communicator | Mediator | Processor
Human | Yes | Yes | Yes
LLM chatbot | Yes | Yes | Yes
Scripted chatbot | Yes | Yes | No
Agent (Claude Agent API) | Yes | Yes | Yes

Reproduced from §2.1 for quick reference. The only forbidden combination is scripted chatbot as processor.

Appendix C · Segment types — quick index

Type | What participants do | Embeddable | AI-compatible
instruction | Read formatted instructions, click Continue
slide | View a content slide
media | Watch audio or video
timer | Wait for countdown
survey | Complete a Survey.js form
input | Type a free-text response
selection | Choose one or more options
ranking | Drag items into order
chat | Live multi-party conversation
task | Custom interactive task
attention-check | Survey- or camera-based check

Appendix D · Trigger types — quick index

Type | Listens for | Notes
keyword | Configurable word / phrase | Most common.
regex | Regular expression match | Use for structured patterns.
time | Delay from chamber / segment start | Fires regardless of chat activity.
message-count | Total messages in chatroom | Fires once per matching count.
participant-message-count | Messages from a specific participant | Supports total / consecutive / since-reset.
sequence | Ordered series of matches | For staged steering.
participant-action | Join, leave, idle, etc. | Fires across segments.
after-bot-message | Another bot's message | Cross-bot chaining.
event-monitor | Arbitrary chatroom event | Catches segment transitions, dashboard interventions.
chain-only | (Passive) Only fires from chain | For multi-step bot behaviour.
llm-driven | An LLM judges the condition | Most expressive; least replicable.
periodic | Fixed interval | Mediator-specific.
aggregate | N messages in a window | Mediator-specific.
topic-detected | Topic / keyword pattern | Mediator-specific.
activity-timeout | Inactivity duration | Mediator-specific.
participant-count | Active participant threshold | Mediator-specific.
discussion-phase | Chamber start / middle / end | Mediator-specific.

Annotator Documentation

The Annotator is a batch LLM annotation engine for processing text data at scale. Upload a CSV, configure LLM annotators, and download structured results.

What is the Annotator?

The Annotator is a batch LLM annotation engine. Upload a CSV, configure one or more LLM annotators with prompt templates, run the task at scale, and download structured results.

Common use cases include text classification, sentiment analysis, content coding, and replicating published annotation schemes from peer-reviewed research.

Key Concepts

Concept | Description
Task | Top-level container holding CSV data, LLM configs, and processing settings
Row | One CSV record, processed independently
LLM Config | A provider + model + prompt template combination
Repetition | Running each row through each config multiple times for reliability
Template | Reusable annotation configuration that can be shared
Work Unit | One row × one config × one repetition = one API call

Your First Annotation Task

Get started in four steps:

1
Upload a CSV with a text column

Your CSV should contain the text you want annotated. Column names become template variables.

2
Add an LLM config with a classification prompt

Choose a provider and model, then write a prompt template using {{columnName}} syntax to reference your data.

3
Run the task

Start processing. The engine sends each row through your LLM config and stores the results.

4
Download results

Export your annotated data as CSV, Excel, or JSON.

Providing API Keys

The Annotator requires API keys for the LLM providers you use: OpenAI, Anthropic, and/or Google.

User-level keys are set in your account settings and reused across all your tasks. Per-task keys can be provided when creating or editing a task and override user-level keys for that task only.

API keys are never visible to collaborators. Each user must provide their own keys.

Upload & Preview CSV Data

Upload a CSV file (max 10 MB). After upload you can preview the headers and the first rows of data. Column names become {{columnName}} template variables for use in your prompt templates.
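A quick local check before uploading can confirm the file is under the size limit and list the column names that will become {{columnName}} variables. This is a sketch only. The function name is hypothetical, and treating "10 MB" as 10 × 1024 × 1024 bytes is an assumption, because the Annotator's exact threshold may be defined differently.

```python
import csv
import os

MAX_BYTES = 10 * 1024 * 1024  # assumed reading of "max 10 MB"


def precheck_csv(path, preview_rows=3):
    """Return the template variables a CSV would expose, plus a short preview."""
    size = os.path.getsize(path)
    if size > MAX_BYTES:
        raise ValueError(f"{path} is {size} bytes; the upload limit is 10 MB")
    with open(path, newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f)
        headers = reader.fieldnames or []
        # Read at most preview_rows data rows, mirroring the upload preview.
        preview = [row for _, row in zip(range(preview_rows), reader)]
    variables = ["{{" + h + "}}" for h in headers]
    return variables, preview
```

For a file whose header line is `id,text`, the returned variables are `{{id}}` and `{{text}}`.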

Configure LLM Annotators

Add one or more LLM configurations to a task. Each configuration specifies a provider (OpenAI, Anthropic, or Google), a model, and prompt templates. You can add multiple configs to compare models or prompt strategies side by side.

Each config supports temperature and maxTokens settings to control response variability and length.

Write Prompt Templates

Each LLM config has a system prompt and a user prompt. Use {{columnName}} syntax to insert values from each CSV row into the prompt.

Tips for effective prompts: request structured output (e.g., JSON or a single label), define clear categories with descriptions, and provide examples of expected classifications in the system prompt.
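The sketch below makes the {{columnName}} substitution concrete by filling a user prompt from one CSV row with a regular expression. It is written for illustration and is not the Annotator's renderer. How the real engine treats a missing column, or a column name containing spaces, is not documented here. This sketch raises an error on an unknown name and only matches names made of word characters. The sentiment categories and JSON reply format are a hypothetical prompt that follows the tips above.

```python
import re

PLACEHOLDER = re.compile(r"\{\{(\w+)\}\}")  # matches {{columnName}}


def render(template, row):
    """Replace each {{columnName}} with the value from a CSV row (a dict)."""
    def substitute(match):
        name = match.group(1)
        if name not in row:
            raise KeyError(f"template references unknown column {name!r}")
        return str(row[name])
    return PLACEHOLDER.sub(substitute, template)


SYSTEM_PROMPT = (
    "You label the sentiment of customer reviews.\n"
    "Categories: positive (praise), negative (complaint), neutral (neither).\n"
    'Example: "The parcel arrived on Tuesday." -> {"label": "neutral"}\n'
    'Reply with JSON only, e.g. {"label": "positive"}.'
)
USER_PROMPT = "Review: {{text}}"

row = {"id": "17", "text": "Great service, will come back."}
print(render(USER_PROMPT, row))  # Review: Great service, will come back.
```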

Set Repetitions

Set between 1 and 20 repetitions per row per config. Multiple repetitions let you measure reliability and use majority voting to determine final labels.

The total number of work units (API calls) is: rows × configs × repetitions.
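As a worked example of the formula, 200 rows run through 3 configs at 5 repetitions each gives 200 × 3 × 5 = 3,000 work units. The sketch below computes that total and shows one way to collapse a row's repetitions into a majority label with an agreement rate. The function names are hypothetical. Reporting no label when the top two counts tie is a choice made for this illustration, not documented Annotator behaviour.

```python
from collections import Counter


def work_units(rows, configs, repetitions):
    """Total API calls: one per row x config x repetition."""
    if not 1 <= repetitions <= 20:
        raise ValueError("repetitions must be between 1 and 20")
    return rows * configs * repetitions


def majority_label(labels):
    """Return (label, agreement) for one row and config; label is None on a tie."""
    if not labels:
        raise ValueError("need at least one repetition")
    counts = Counter(labels).most_common()
    top_label, top_count = counts[0]
    agreement = top_count / len(labels)
    if len(counts) > 1 and counts[1][1] == top_count:
        return None, agreement  # tie between the two most frequent labels
    return top_label, agreement


print(work_units(rows=200, configs=3, repetitions=5))       # 3000
print(majority_label(["pos", "pos", "neg", "pos", "neu"]))  # ('pos', 0.6)
```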

Estimate Costs

Before running a full task, use the cost estimator. It runs a sample of up to 10 rows, measures the tokens consumed, and extrapolates to give you an estimated cost for the complete task.
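The extrapolation step can be sketched as follows. Average the measured input and output tokens over the sampled calls, price one call, then scale by rows and repetitions for that config. Prices are per million tokens and must come from your provider's current price list; the figures in the test are placeholders. The 0.5 batch factor mirrors the approximate 50% saving described under Batch Processing. All names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class SampleCall:
    input_tokens: int
    output_tokens: int


def estimate_cost(sample, total_rows, repetitions,
                  usd_per_m_input, usd_per_m_output, batch=False):
    """Extrapolate one config's cost from a sample of measured calls.

    Prices are USD per million tokens, supplied by the caller.
    Sum the result over configs for a whole task.
    """
    if not sample:
        raise ValueError("need at least one sampled call")
    avg_in = sum(c.input_tokens for c in sample) / len(sample)
    avg_out = sum(c.output_tokens for c in sample) / len(sample)
    per_call = (avg_in * usd_per_m_input + avg_out * usd_per_m_output) / 1_000_000
    total = per_call * total_rows * repetitions
    return total * 0.5 if batch else total  # batch: approximate 50% saving
```

With ten sampled calls of 400 input and 20 output tokens, 5,000 rows, 3 repetitions, and placeholder prices of $1 and $4 per million tokens, the estimate is about $7.20, or about $3.60 in batch mode.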

Standard Processing

Standard mode streams results in real time using 1–20 parallel workers. Failed requests are retried automatically with exponential backoff. Processing is crash-safe: results are saved per row, so completed rows are not lost if processing is interrupted.
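Parallel workers with exponential-backoff retries are a standard pattern, sketched here with Python's standard library. A thread pool of up to 20 workers runs the calls, and each call is retried with doubling delays plus random jitter. `call_model` is a hypothetical stand-in for a provider request. The attempt limit and delays are illustrative values, not the Annotator's settings.

```python
import random
import time
from concurrent.futures import ThreadPoolExecutor


def with_backoff(fn, max_attempts=5, base_delay=0.5, max_delay=30.0):
    """Call fn(); on failure wait base_delay * 2**attempt (+ jitter) and retry."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception:  # real code would retry only transient errors
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            delay = min(max_delay, base_delay * 2 ** attempt)
            time.sleep(delay + random.uniform(0, delay / 2))  # jitter spreads retries


def process_rows(rows, call_model, workers=4):
    """Run call_model on every row in parallel, each call wrapped in retries."""
    if not 1 <= workers <= 20:
        raise ValueError("workers must be between 1 and 20")
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda row: with_backoff(lambda: call_model(row)), rows))
```

Jitter keeps many workers that failed at the same moment from retrying in lockstep against a rate-limited provider.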

Batch Processing

Batch mode uses the OpenAI and Anthropic batch APIs for approximately 50% cost savings with a 24-hour turnaround. Google requests fall back to standard processing automatically.

Batch jobs cannot be paused or resumed. Use standard mode if you need fine-grained control over execution.

Pause, Resume & Cancel

In standard mode, you can pause processing at any time. All completed results are preserved. Resume picks up where you left off. Cancel stops the task permanently but keeps all results that were completed before cancellation.

Pause and resume are only available in standard processing mode. Batch jobs either run to completion or are cancelled.

Use Research Templates

The Annotator includes 25+ peer-reviewed annotation presets from published research. Select a template to pre-fill your LLM configs with validated prompt designs.

Authors | Configs | Domain
Gilardi et al. (2023) | 7 annotators | Text classification
Rathje et al. (2024) | 6 annotators | Psychological text analysis
Bhatia et al. (2025) | 3 annotators | Choice dilemma annotation
Bojic et al. (2025) | 5 annotators | Latent content analysis
Kumar et al. (2026) | 4 annotators | Empathic communication evaluation

Create Custom Templates

Save any task configuration as a reusable template. Custom templates are private by default and available only to you. They capture the full LLM config including prompts, model settings, and repetition count.

Share Templates

Submit a custom template for public review. An administrator reviews and approves or rejects the submission. Approved templates become available to all users. Usage is tracked so you can see how often your shared templates are being used.

Monitor Progress

A progress bar shows real-time completion status. Each task follows a status lifecycle: pending → processing → completed or cancelled. In standard mode, a paused state is also available.
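A lab script that polls task status can treat the lifecycle as an explicit transition table. The transitions below are read from this page, with pause and resume available only in standard mode. Whether a pending or paused task can be cancelled directly is not stated, so those transitions are left out rather than guessed. The names are hypothetical.

```python
from enum import Enum


class Status(Enum):
    PENDING = "pending"
    PROCESSING = "processing"
    PAUSED = "paused"        # standard mode only
    COMPLETED = "completed"
    CANCELLED = "cancelled"


# Transitions described on this page; anything not listed is treated as invalid.
TRANSITIONS = {
    Status.PENDING: {Status.PROCESSING},
    Status.PROCESSING: {Status.PAUSED, Status.COMPLETED, Status.CANCELLED},
    Status.PAUSED: {Status.PROCESSING},  # resume picks up where it left off
    Status.COMPLETED: set(),             # terminal
    Status.CANCELLED: set(),             # terminal; completed results are kept
}


def can_transition(current, target, batch_mode=False):
    """Check a status change, forbidding pause in batch mode."""
    if batch_mode and target is Status.PAUSED:
        return False
    return target in TRANSITIONS[current]
```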

Download Results

Export results in CSV, Excel, or JSON format. You can download partial results while the task is still running — useful for spot-checking quality before the full run completes.

Understanding Output Format

Results use a flattened format with one row per input record. Columns include all original input data, the rendered prompts, and response columns for each config and repetition combination.

For analysis in R, read the CSV directly with read.csv(). In Python, use pandas.read_csv(). In Excel, open the Excel export for automatic column formatting. Response columns follow the naming pattern [configName]_rep[N].
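Because response columns follow the [configName]_rep[N] pattern, an analysis script can group them by config without hard-coding column names. The sketch uses only the standard library's csv module, and the function names are hypothetical. It matches only the trailing _rep[N], so config names that contain underscores still group correctly. An input column that happens to end in _rep plus a number would also be picked up.

```python
import csv
import re
from collections import defaultdict

REP_COLUMN = re.compile(r"^(?P<config>.+)_rep(?P<rep>\d+)$")


def response_columns(fieldnames):
    """Map each config name to its response columns, ordered by repetition."""
    by_config = defaultdict(list)
    for name in fieldnames:
        match = REP_COLUMN.match(name)
        if match:
            by_config[match["config"]].append((int(match["rep"]), name))
    return {cfg: [col for _, col in sorted(cols)] for cfg, cols in by_config.items()}


def load_responses(path):
    """Yield (row, {config: [responses in repetition order]}) for each record."""
    with open(path, newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f)
        groups = response_columns(reader.fieldnames or [])
        for row in reader:
            yield row, {cfg: [row[c] for c in cols] for cfg, cols in groups.items()}
```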

Carrier Workspace

Carrier Workspace brings your research team's Claude Code activity — session transcripts and shared memory — into one place inside Carrier, so the way your team used AI assistance to build and analyse a study is searchable, reviewable, and preserved alongside the study itself.

What is Carrier Workspace?

When a team uses Claude Code while building an experiment, writing analysis scripts, or preparing materials, each developer accumulates a local history of sessions (the back-and-forth transcripts of their work) and memory (durable notes the assistant keeps about the project). That history normally lives buried in each person's local ~/.claude directory, invisible to the rest of the team.

A workspace collects that data for a single repository and shows it on one page in Carrier. Carrier Workspace is powered by the team-claude-view skill, which provides the small client scripts that package and upload a machine's data, plus a /private command for marking sessions you don't want shared.

A workspace is tied to one repository, and you choose how its data arrives when you create it. There are two modes:

Mode | How data arrives
Carrier Workspace mode (default) | Each developer's machine packages its local Claude Code cache and uploads it to Carrier with a small Python client. Works with no GitHub repository involved.
GitHub-linked mode | Carrier connects to a GitHub repository and pulls the shared data automatically on a schedule. Lowest-effort once set up; nobody runs anything by hand.

Both modes end up in the same place: a workspace page showing sessions and memory.

When a research team needs it

This is a team-tooling feature, separate from running experiments. It does not touch participant data or your experiment configuration — it concerns how your team worked, not what your participants did.

The deeper reason to keep this record is delegation. Empirical research now routinely hands real methodological work to agentic AI: cleaning a dataset, deciding which records to exclude, choosing a transformation, drafting an analysis script, selecting a model specification. Those are not neutral chores — they are methods decisions, and when an agent makes them they tend to vanish the moment the session closes. A workspace turns that delegated work into a durable, shareable record of what was asked, what the assistant decided, and why. Making AI use visible in this way is squarely in the spirit of open science: the same disclosure norms that ask us to share data, code, and pre-registrations extend naturally to disclosing how AI shaped the work.

Transparency also guards against a subtle integrity risk that agentic workflows can introduce without anyone intending it. An assistant pointed at a loosely specified goal — “find the effect,” “get the model to fit,” “clean this up so the result holds” — can quietly explore many exclusion rules, covariate sets, and specifications, then surface only the one that reaches significance. That is the garden-of-forking-paths / researcher-degrees-of-freedom problem, arrived at as unintentional p-hacking rather than deliberate fishing. Because the workspace preserves the full transcript — every fork the agent tried, not just the final answer — you, your collaborators, reviewers, and your future self can tell whether a reported result survived a single principled analysis or emerged after dozens of silent attempts. The record makes the exploration auditable, which is precisely what keeps delegation honest.

Concretely, reach for it when:

  • Reproducibility & provenance. You want a durable record of how AI assistance produced study materials, analysis code, or stimuli — and which analytic decisions were delegated — the kind of provenance a methods section or a replication package benefits from.
  • Research integrity. You want the agent's exploration to be auditable, so a reported effect can be traced back to a principled analysis rather than an opaque search.
  • Onboarding. A new RA or collaborator can read how the project was built rather than starting cold.
  • Coordination. Several people on the team use Claude Code on the same repository and you want a shared, searchable view instead of scattered local histories.

Carrier Workspace mode (default)

This is the default mode. There is no GitHub connection: each developer runs a small Python client (provided by the team-claude-view skill) that bundles their local Claude Code cache and uploads it to Carrier with an API key. The data travels straight from your team's machines to Carrier.

1
Save the API key (shown once)

When you create a Carrier Workspace, Carrier displays an API key exactly once, right after creation. Copy and save it now — it is never recoverable. The creation screen also shows the exact upload URL and a ready-to-paste configure command. If you lose the key, you'll need to recreate the workspace to get a new one.

2
Configure each machine once

On every machine that should contribute data, run the configure command once. It saves the upload URL and key locally so later syncs don't need them. Use the exact --url and --key shown in the create modal:

python3 scripts/team-claude-client/carrier_configure.py --url <upload-url> --key <api-key>
3
Sync

After configuring, push the machine's Claude Code data. The client packages your local sessions/ and memory/ directories into a gzip tarball and uploads it, then prints how many sessions and memory entries were sent:

python3 scripts/team-claude-client/carrier_sync.py
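The packaging step can be pictured with Python's tarfile module. Only the sessions/ and memory/ directories go into a gzip-compressed archive, and the result is checked against the 50 MB upload cap listed under Troubleshooting. This is an illustrative sketch, not the contents of carrier_sync.py. The cache root passed in and treating 50 MB as 50 × 1024 × 1024 bytes are assumptions.

```python
import io
import tarfile
from pathlib import Path

ALLOWED_PREFIXES = ("sessions", "memory")
MAX_UPLOAD_BYTES = 50 * 1024 * 1024  # assumed reading of the 50 MB cap


def package_cache(cache_root):
    """Bundle cache_root/sessions and cache_root/memory into an in-memory .tar.gz."""
    root = Path(cache_root)
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w:gz") as tar:
        for prefix in ALLOWED_PREFIXES:
            directory = root / prefix
            if directory.is_dir():
                # arcname keeps archive paths relative, e.g. "sessions/abc.jsonl"
                tar.add(directory, arcname=prefix)
    data = buffer.getvalue()
    if len(data) > MAX_UPLOAD_BYTES:
        raise ValueError(f"archive is {len(data)} bytes; trim older sessions first")
    return data
```

Every path in the archive stays under one of the two allowed prefixes. The 400 Bad Request entry under Troubleshooting describes uploads that break this rule.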
4
Automate with a SessionEnd hook (recommended)

Running the sync by hand is easy to forget. Claude Code's SessionEnd hook in .claude/settings.json fires when a session ends — wire the sync script in so every finished session uploads automatically:

{
  "hooks": {
    "SessionEnd": [
      {
        "hooks": [
          {
            "type": "command",
            "command": "python3 scripts/team-claude-client/carrier_sync.py"
          }
        ]
      }
    ]
  }
}

Adjust the path if your repo lays the script out differently.

GitHub-linked mode

In GitHub-linked mode, Carrier holds a personal access token (PAT) for your repository and uses it to keep a private mirror of the shared Claude Code data up to date — nobody has to run anything by hand. Choose this if your team already publishes shared data to GitHub.

1
Create a fine-grained personal access token

Carrier needs read access to one repository's contents. Create a token at github.com/settings/personal-access-tokens/new:

  • Repository access — scope to the single repository you're linking; don't grant access to all repositories.
  • Repository permissions — set Contents: Read-only. That is the only permission Carrier requires.
  • Expiration — a 90-day expiry balances safety against re-linking too often.

Copy the token when GitHub shows it — you won't be able to see it again. A classic PAT (with the repo scope) also works, but Carrier will show an advisory banner recommending you switch to a fine-grained, single-repo, read-only token.

2
Link the repo in Carrier

From the Workspaces page, click Link a repo and choose the GitHub tab. Fill in a name and the repository URL (e.g. https://github.com/your-org/your-repo), paste the PAT, then click Create.

3
What the first sync does

Carrier clones the repository and fetches its claude-team-share branch — the branch your team uses to publish shared data — and reads its sessions/ and memory/ directories into a private server-side mirror. If that branch doesn't exist yet, the workspace simply shows an empty state until it appears; nothing is broken.

Choosing a mode

Aspect | Carrier Workspace mode (default) | GitHub-linked mode
Auth | API key, shown once, bcrypt-hashed | Fine-grained PAT (Contents: Read), encrypted at rest
Cold start | Create workspace, save key, run carrier_configure.py per machine | Create token, paste into the GitHub tab
Updates | Run carrier_sync.py (or a SessionEnd hook) after each session | Automatic: Carrier polls roughly every 30 s; use Re-sync now for an immediate refresh
Works without GitHub | Yes, no GitHub repo needed | No, requires a repo and the claude-team-share branch
Data path | Straight from your machines to Carrier | From GitHub to Carrier

Pick Carrier Workspace mode (the default) when you want to keep data flowing through your own machines or have no GitHub repo in the loop. Pick GitHub-linked mode when your team already publishes a claude-team-share branch and you'd rather not run a client by hand.

Browsing sessions & memory

However the data arrives, the workspace page presents it the same way. Open a workspace from the Workspaces page to see two things:

  • Sessions — the Claude Code transcripts contributed to this repository. Open one to read it as a linear transcript of the work.
  • Memory — the durable notes the assistant kept about the project.

In Carrier Workspace mode, every machine that runs the client contributes to the same workspace: Carrier derives the workspace from the repository's root folder name, so all checkouts of the same repository map to one workspace and each machine's contribution is additive.
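The mapping can be sketched in a few lines: walk up from the current directory to the folder containing .git and take that folder's name. This follows the description above rather than the client's code, and the walk-up approach and function names are assumptions.

```python
from pathlib import Path


def repository_root(start="."):
    """Walk up from start until a directory containing .git is found."""
    current = Path(start).resolve()
    for candidate in (current, *current.parents):
        # .git is a directory in normal clones and a file in worktrees/submodules
        if (candidate / ".git").exists():
            return candidate
    raise FileNotFoundError(f"no .git found above {current}")


def workspace_name(start="."):
    """Derive a workspace name from the repository root's folder name."""
    return repository_root(start).name
```

Run from any subfolder of a checkout at `.../my-study`, this returns `my-study`.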

Keeping data current

How a workspace stays fresh depends on its mode:

  • Carrier Workspace mode — data updates whenever a machine runs carrier_sync.py. The recommended setup is the SessionEnd hook (see setup, step 4), so every finished session uploads on its own.
  • GitHub-linked mode — Carrier polls the repository roughly every 30 seconds and re-syncs stale workspaces automatically as your team pushes new data. Use the Re-sync now button on the workspace page for an immediate refresh.

Privacy & security

  • GitHub PATs are encrypted at rest using AES-256-GCM. Carrier stores the encrypted token, not the plaintext.
  • API keys are bcrypt-hashed. They are shown once at creation and never recoverable — Carrier cannot display or email them again.
  • Marking a session private. The team-claude-view skill provides a /private slash command that marks a session as non-shareable, so it won't be included when your data is shared.
  • Removing a workspace deletes its data. Carrier deletes the server-side mirror and, for GitHub-mode workspaces, the encrypted token along with it.
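Storing only a salted hash is what makes an API key unrecoverable. The server can check a presented key, but it cannot reproduce the key from what it stores. Carrier uses bcrypt for this. The sketch below substitutes PBKDF2-HMAC-SHA256 from Python's standard library so it runs without extra packages. The iteration count is an illustrative value, and the function names are hypothetical.

```python
import hashlib
import hmac
import secrets

ITERATIONS = 600_000  # illustrative work factor


def issue_key():
    """Create a key to show once, and the salted hash to store in its place."""
    key = secrets.token_urlsafe(32)
    salt = secrets.token_bytes(16)
    digest = hashlib.pbkdf2_hmac("sha256", key.encode(), salt, ITERATIONS)
    return key, (salt, digest)  # only (salt, digest) would be persisted


def verify_key(presented, stored):
    """Recompute the hash for a presented key and compare in constant time."""
    salt, digest = stored
    candidate = hashlib.pbkdf2_hmac("sha256", presented.encode(), salt, ITERATIONS)
    return hmac.compare_digest(candidate, digest)
```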
The scrubber is a convenience, not a guarantee
Before storing data, Carrier runs a scrubber that removes known credential patterns only. It is not a general PII or secrets detector and cannot catch everything. Do not rely on it to sanitise sensitive transcripts — if a session contains something you wouldn't want your team to see, don't share it.
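A minimal regex scrubber shows why pattern-based removal has limits. The patterns here are illustrative shapes of well-known tokens: GitHub's ghp_ and github_pat_ prefixes and AWS access key IDs beginning with AKIA. They are not Carrier's actual list. In the example, the password written in plain prose passes straight through.

```python
import re

# Illustrative credential shapes only; not Carrier's real pattern list.
CREDENTIAL_PATTERNS = [
    re.compile(r"ghp_[A-Za-z0-9]{36}"),           # GitHub classic token
    re.compile(r"github_pat_[A-Za-z0-9_]{20,}"),  # GitHub fine-grained token
    re.compile(r"AKIA[0-9A-Z]{16}"),              # AWS access key ID
]


def scrub(text, placeholder="[REDACTED]"):
    """Replace every match of a known credential pattern with a placeholder."""
    for pattern in CREDENTIAL_PATTERNS:
        text = pattern.sub(placeholder, text)
    return text


transcript = "token ghp_" + "a" * 36 + " and my password is hunter2"
print(scrub(transcript))
# token [REDACTED] and my password is hunter2
```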

Troubleshooting & FAQ

Carrier Workspace mode (manual upload)

Symptom | What it means
401 Unauthorized | The API key is wrong or has been rotated. Re-run carrier_configure.py with the correct key for this workspace.
413 Payload Too Large | The tarball exceeded the 50 MB upload cap. Trim older sessions from your local cache before syncing again.
400 Bad Request | The upload was rejected as malformed, usually because of a corrupt tarball or one containing paths outside the allowed sessions/ and memory/ prefixes.

GitHub-linked mode

Symptom | What it means
Auth-error banner | The PAT expired or was revoked. Create a fresh token and re-link the repo (or update the token on the existing workspace).
Empty state persists | The claude-team-share branch doesn't exist on the remote yet. The workspace stays empty until the branch is created and pushed.
Can't read a private repo | Make sure the PAT actually has access to that specific repository.

Common questions

How do multiple machines work together? In Carrier Workspace mode, every machine that runs the client contributes to the same workspace — Carrier derives the workspace from the repository root's folder name, so all checkouts of the same repository map to one workspace. Each contribution is additive.

Can I share privately, without GitHub? Yes — that's exactly what Carrier Workspace mode is for. The data travels straight from your machines to Carrier with no GitHub repository in the loop.

How do I rotate a GitHub PAT? Create a new fine-grained token and either re-link the repo or update the token on the existing workspace, then revoke the old token on GitHub.

How do I regenerate an API key? There is no in-place regenerate. Delete the workspace and recreate it to issue a fresh key (shown once), then reconfigure each machine with carrier_configure.py using the new key.

What gets deleted when I remove a workspace? Its server-side mirror of sessions and memory. For GitHub-mode workspaces, the encrypted PAT is deleted as well. After removal there is nothing left server-side for that workspace.