Carrier — A Researcher's Guide
A guide for researchers — psychologists, social scientists, HCI researchers, and study designers — who want to use Carrier to run interactive experiments involving any combination of human participants, LLM chatbots, scripted chatbots, and Claude Agents. Part I introduces the four systems out of which every Carrier experiment is built; Part II covers the day-to-day mechanics of running a study.
Overview
Who this guide is for
This guide is written for researchers — psychologists, social scientists, HCI researchers, and study designers — who want to use Carrier to run interactive experiments involving any combination of human participants, LLM chatbots, scripted chatbots, and Claude Agents.
It assumes you are comfortable thinking about experiments in the usual methodological terms — conditions, manipulations, between- vs. within-subjects designs, counterbalancing, blinding, attention checks — but not that you have any prior experience with the platform or with the engineering vocabulary used internally to describe it.
Whenever Carrier uses a term that has a recognisable counterpart in research methodology, we name that counterpart explicitly the first time the term appears.
How to read this guide
The guide has two parts.
Part I — Designing an Experiment is conceptual. It introduces the four systems out of which every Carrier experiment is built. Each chapter begins with the research problem the system solves, then names the building blocks Carrier offers, and finally describes the design decisions you make as a researcher. Short builder walkthroughs (still images and short GIFs) and worked examples accompany each chapter.
Part II — Operating the Platform is operational. It covers the day-to-day mechanics of running a study: activating an experiment, monitoring participants, exporting data, managing accounts. It is short and assumes you have read at least the relevant chapters of Part I.
Appendices at the end provide a glossary, the type-by-role compatibility matrix, and quick-reference indexes for segment types and trigger types.
The four systems
A Carrier experiment is built out of four interlocking systems. Each answers one of four design questions:
| System | Question | What it gives you |
|---|---|---|
| Chamberlines, chambers, segments | What is the shape of a participant's journey? | A way to specify conditions, group participants, and lay out the activities they move through. |
| Roles | Who takes part, and in what capacity? | Three roles (communicator, mediator, processor) that any human, LLM chatbot, scripted chatbot, or agent can occupy. |
| Variables | What do we know about each participant, and how should that change their journey? | A way of carrying participant attributes through the experiment and using them to filter matches, gate visibility, and personalise instructions. |
| Triggers | How should non-human participants behave? | A rule-based system that defines, for every non-human participant, the conditions under which they speak or act. |
Schematically, they fit together like this:
[Diagram: how the four systems fit together. The only surviving label reads “slots / roles”.]
The systems are introduced in this order — shape, then occupants, then information, then behaviour — because each layer presupposes the one before it. Once you have read all four chapters you can return to each independently.
A note on terminology
| Carrier term | Closest research counterpart |
|---|---|
| Chamberline | Condition / experimental arm |
| Chamber | A timed grouping of matched participants |
| Segment | An activity / phase within a chamber |
| Slot | A position in a chamber to be filled by a participant |
| Run | One participant's complete pass through the experiment |
| Chatroom | The live instantiation of a chamber for a matched group |
| Variable | An attribute attached to a participant (from a survey, an assignment, or the system) |
| Trigger | A condition–response rule that governs a non-human participant |
These mappings are not strict — see the glossary in Appendix A for nuance — but they will get you most of the way.
A note on non-human participants
Carrier distinguishes three kinds of non-human participant. The distinction is important because they differ in what they can do, how reproducible they are, and which research designs they serve best.
- LLM chatbots are language-model-driven participants (OpenAI, Anthropic, Google, or any compatible provider) that converse. On each turn the model is given the conversation so far and a system prompt, and it produces a chat message. That message is the entirety of its output. LLM chatbots are open-ended and naturalistic, but they vary from session to session. Use them when you want behaviour that reads and reacts to conversation in an unconstrained way.
- Scripted chatbots are rule-driven participants whose responses are pre-written. A scripted chatbot is configured with a set of triggers (§4) — rules of the form “when keyword X is uttered, send one of these three sentences” — and produces nothing else. Scripted chatbots are deterministic and replayable: the same input sequence produces the same outputs across sessions. Use them when you want behaviour that is reproducible, auditable, and identical across participants.
- Agents are autonomous participants built on Anthropic's Claude Agent API. Unlike a plain LLM chatbot, an agent does not just answer from the conversation history — it has access to a vocabulary of built-in tools it can invoke on its own initiative to gather information or take action: reading files from a configured document area, running commands, browsing the web. The agent chains these tool calls across multiple steps without being asked between each step, and produces its eventual chat message grounded in what it read or computed. Each agent is typically scoped to a particular document area — the study materials, a cited corpus, a specific dataset — and behaves as the experiment's resident expert on that material. Use an agent when you want a non-human participant that retrieves and reasons over a body of material during the conversation, not just one that talks from training-time knowledge.
All three kinds can take any of the three roles described in §2, with one exception: scripted chatbots cannot serve as processors (see §2.5). The relationship between kinds and roles is summarised in the type × role matrix in §2.1 and again in Appendix B.
When this guide says “chatbot” without qualification it means either an LLM chatbot or a scripted chatbot. “Agent” — capitalised or not — means specifically a Claude Agent. “Non-human participant” is the umbrella term that covers all three.
A note on terminology overlap: any LLM-driven participant (an LLM chatbot or an agent acting in a mediator role) can also be configured to emit Carrier intervention actions — disable a participant's input, prompt someone, highlight a message — alongside its chat message. These intervention actions are a Carrier-specific structured-output mechanism, not the same thing as a Claude Agent's built-in tools. §4.5 separates the two carefully; for now, it is enough to know that “tools the agent uses to read files and browse” and “intervention actions a mediator chooses to fire” are different channels.
Part I · §1. Chamberlines, Chambers, and Segments
The three-level shape of a Carrier experiment: chamberlines isolate the condition, chambers isolate the matched group, and segments isolate the activity.
§1 · Chambers
1.1 The shape of a Carrier experiment
Every Carrier experiment has the same nested shape. A participant who opens the experiment URL is assigned to a chamberline — the condition they will experience. Their chamberline is an ordered sequence of chambers, each of which is a small group of participants (human, AI, or both) who are matched once and remain together for the duration of that chamber. Inside each chamber, participants progress through an ordered list of segments — the activities that constitute the chamber, of which a real-time conversation is the most common.
This three-level structure of chamberline, chamber, and segment is the backbone of Carrier.
The three levels exist to separate three concerns that are easily entangled when designing an interactive study:
- Chamberlines isolate the condition. Different chamberlines represent different experimental arms; a participant sees exactly one.
- Chambers isolate the matched group. Within a chamber, who you are with does not change.
- Segments isolate the activity. Within a segment, what you are doing does not change.
The remainder of this chapter introduces each level in turn.
1.2 Chamberlines: the unit of condition
1.2.1 Why chamberlines
In a typical lab study you might compare two or three experimental conditions — say, high-anonymity versus low-anonymity discussions of a controversial topic. A chamberline is Carrier's representation of exactly that: a complete journey that one group of participants will take through the experiment. Multiple chamberlines in the same experiment correspond to multiple between-subjects conditions.
A participant is assigned to a single chamberline at the start of their session and does not leave it. Within-subjects comparisons (the same person experiencing two manipulations) are typically built inside a chamberline, by sequencing chambers that differ along the manipulated dimension; between-subjects comparisons (different people in different manipulations) are built across chamberlines.
1.2.2 Assigning participants to chamberlines
Carrier offers four assignment methods, configured at the experiment level:
| Method | What it does | When to use |
|---|---|---|
| Random | Each new participant is allocated to a chamberline uniformly at random. | The default for simple between-subjects studies. |
| Counterbalance | Carrier maintains running counts and assigns each new participant to the chamberline that currently has the fewest participants. | When you want equal n per condition and your sample is too small for random assignment to balance out on its own. |
| Survey-based | A field from the global pre-survey is read at assignment time and used to choose the chamberline. | When the condition depends on a participant attribute that they declare themselves (e.g. native language, political identification). |
| Fixed | All participants are placed in the same named chamberline regardless of anything else. | Pilots and demonstrations; reproducing exactly one condition. |
Survey-based assignment is the most flexible. The global pre-survey runs before chamberline assignment, so any response collected there is available as a routing variable; this is the same mechanism described in §3.
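The four methods in the table can be sketched in a few lines of Python. This is an illustration of the logic only, not Carrier's implementation: the function name `assign_chamberline`, its parameters, the method strings, and the tie-breaking rule (ties go to the chamberline listed first) are all assumptions made for the example.

```python
import random
from collections import Counter
from typing import Mapping, Optional, Sequence


def assign_chamberline(
    method: str,
    chamberlines: Sequence[str],
    counts: Mapping[str, int],
    *,
    survey_answer: Optional[str] = None,
    survey_routes: Optional[Mapping[str, str]] = None,
    fixed_name: Optional[str] = None,
    rng: Optional[random.Random] = None,
) -> str:
    """Choose a chamberline for one new participant (hypothetical helper)."""
    if method == "random":
        # Uniform draw over all chamberlines.
        return (rng or random).choice(list(chamberlines))
    if method == "counterbalance":
        # Smallest running count wins; min() keeps the first of any tied arms.
        return min(chamberlines, key=lambda name: counts.get(name, 0))
    if method == "survey":
        # Route on a global pre-survey answer using a researcher-supplied map.
        if not survey_routes or survey_answer not in survey_routes:
            raise ValueError(f"no route for survey answer {survey_answer!r}")
        return survey_routes[survey_answer]
    if method == "fixed":
        if fixed_name not in chamberlines:
            raise ValueError(f"unknown chamberline {fixed_name!r}")
        return fixed_name
    raise ValueError(f"unknown assignment method {method!r}")


# Seven arrivals under counterbalancing: the arms never differ by more than one.
arms = ["High Anonymity", "Low Anonymity"]
counts = Counter()
for _ in range(7):
    counts[assign_chamberline("counterbalance", arms, counts)] += 1
print(dict(counts))  # {'High Anonymity': 4, 'Low Anonymity': 3}
```

With random assignment the same seven arrivals could split 6 to 1, which is why counterbalancing is the safer choice when n per condition is small.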
1.2.3 What lives on a chamberline
A chamberline is, formally, a name plus an ordered list of chambers plus optional assignment criteria. There is no further configuration at this level — chamberlines are intentionally thin, so that a researcher can read the shape of an experiment by scanning the chamberline names and their chamber sequences in the builder.
The researcher opens the Builder, clicks “Add chamberline” twice, names the new chamberlines “High Anonymity” and “Low Anonymity”, and sets the experiment-level assignment method to “Random”. The two chamberlines appear side by side in the experiment-level outline pane, each ready to receive chambers.
1.3 Chambers: the unit of matched group
1.3.1 Why chambers
A chamber is the basic unit of togetherness in Carrier. Once a participant enters a chamber, they are matched with the other occupants of that chamber and they stay together until the chamber ends. Within a chamber, the cast does not change.
This is a deliberate constraint. Many interactive studies depend on participants having a stable conversational partner across multiple tasks — a discussion followed by a joint ranking, for example, or a chat followed by a rating of the other person. In Carrier, those sequential activities belong in the same chamber and share its participants. Crossing a chamber boundary, by contrast, dissolves the group: the next chamber re-matches its occupants from the pool of participants who have reached that point.
1.3.2 Matching at the chamber boundary
Matching happens once, at the start of each chamber. Participants who finish the previous chamber (or, for the first chamber in a chamberline, who have completed the global pre-survey) enter a waiting pool. As soon as enough participants are present to fill the chamber's required slots, the chamber begins.
A slot is a description of the kind of participant the chamber needs. Each slot has a type (human, LLM chatbot, scripted chatbot, or agent) and a role (communicator, mediator, or processor; see §2). For human slots, matching can additionally require certain variable values — for example, that the chamber contain one self-identified novice and one self-identified expert. Variable-based matching is the topic of §3.
When a matching attempt does not assemble enough participants within a configured interval, Carrier applies a chamber-level fallback policy. The fallback is part of the chamber's configuration; common choices are to keep the participant waiting, to fill the missing slot with a default agent, or to end the run gracefully with a completion code. The choice is the researcher's, not the participant's.
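To make the matching step concrete, here is a minimal sketch under stated assumptions. The classes `Slot` and `Waiting`, the functions `try_match` and `resolve`, and the policy strings are hypothetical names chosen for the example. The matcher is a simple greedy first-fit pass in arrival order; it is not a description of how Carrier matches internally, and a greedy pass can miss a valid grouping that a more careful matcher would find.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Slot:
    type: str                       # "human", "llm_chatbot", "scripted_chatbot", "agent"
    role: str                       # "communicator", "mediator", "processor"
    requires: Dict[str, str] = field(default_factory=dict)  # variable constraints


@dataclass
class Waiting:
    participant_id: str
    variables: Dict[str, str]


def try_match(slots: List[Slot], pool: List[Waiting]) -> Optional[Dict[int, str]]:
    """Return {slot index: participant id} if every human slot can be filled.

    Non-human slots are assumed to be filled by the platform and are skipped.
    """
    assignment: Dict[int, str] = {}
    taken = set()
    for index, slot in enumerate(slots):
        if slot.type != "human":
            continue
        for person in pool:  # pool is in arrival order
            if person.participant_id in taken:
                continue
            if all(person.variables.get(k) == v for k, v in slot.requires.items()):
                assignment[index] = person.participant_id
                taken.add(person.participant_id)
                break
        else:
            return None  # no waiting participant fits this slot yet
    return assignment


FALLBACKS = {"keep_waiting", "fill_with_default_agent", "end_with_completion_code"}


def resolve(slots, pool, waited_s: float, timeout_s: float, fallback: str) -> str:
    """Decide what happens to a waiting participant at this moment."""
    if try_match(slots, pool) is not None:
        return "start_chamber"
    if waited_s < timeout_s:
        return "keep_waiting"
    if fallback not in FALLBACKS:
        raise ValueError(f"unknown fallback policy {fallback!r}")
    return fallback  # the researcher-configured chamber-level policy applies
```

For the novice and expert example above, a pool containing only an expert yields no match; once a novice arrives, both human slots fill and the chamber can begin.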
1.3.3 What lives on a chamber
A chamber carries several pieces of configuration:
- A name and identifier, used in the dashboard and in exported data.
- A communication channel: text, audio, or video. The channel determines what the chat segment looks like and what data is recorded (transcripts, audio files, recorded video, or any combination).
- A slot definition, listing the roles to be filled and the type each slot expects.
- An ordered list of segments, described in §1.4.
- An optional pre-survey shown to each participant before they enter the chamber, and an optional post-survey shown after they leave.
- A maximum participant count — the total number of slots.
Chamber pre- and post-surveys are distinct from the experiment's global pre- and post-surveys. The global surveys run once per participant, at the very beginning and end of the run; the chamber surveys run once per chamber. Researchers typically use the global surveys for demographics and consent, and the chamber surveys for state measures that need to be taken before and after each manipulation.
The researcher selects a chamber inside the “High Anonymity” chamberline, renames it to “Deliberation”, sets the communication channel to “text”, adds two human communicator slots and one LLM mediator slot, and attaches a brief chamber pre-survey containing a single 7-point trust item. The chamber summary updates to show three slots and one survey.
1.4 Segments: the unit of activity
1.4.1 Why segments
A chamber's segments are its inner timeline: the participants are already matched, and now they move together through an ordered series of activities. Each segment is a single, self-contained activity — showing a slide, holding a conversation, voting on options, ranking items, watching a video, or completing a short embedded survey — with its own timing and transition rules.
Segments are deliberately fine-grained. A “thirty-minute deliberation” study in Carrier is usually not a single thirty-minute chat segment but a sequence: a slide introducing the topic, a timer giving participants a moment to think, a chat segment for the deliberation itself, a ranking segment to record the group's collective answer, and a short survey at the end. Each piece is configured separately, recorded separately, and can be skipped, repeated, or replaced without touching the others.
1.4.2 The catalogue of segment types
| Type | What participants do | Compatible with AI |
|---|---|---|
| instruction | Read formatted instructions and click Continue. | — |
| slide | View a static or dynamic content slide. | — |
| media | Watch a video or listen to an audio clip. | — |
| timer | Wait for a countdown to elapse (often used between activities). | — |
| survey | Complete a short embedded survey (Survey.js form). | — |
| input | Type a free-text response into a single-question prompt. | — |
| selection | Choose one or more options from a list (multiple choice / voting). | ✓ |
| ranking | Drag items into preferred order. | ✓ |
| chat | Hold a real-time conversation with the other chamber occupants. | ✓ |
| task | Complete a custom interactive task defined by the experiment. | ✓ |
| attention-check | Pass a survey-based or camera-based attention check. | — |
The “Compatible with AI” column indicates whether the segment can include contributions from any of the three non-human kinds — LLM chatbots, scripted chatbots, or agents. The conversational and choice-based types can; the read/listen/wait/attention types cannot, because there is nothing for a non-human participant to do.
1.4.3 The chat segment as a special case
Of the eleven segment types, the chat segment is the most elaborate. It is the only segment that:
- Hosts a live, multi-party exchange among all participants in the chamber simultaneously.
- Supports the full type-by-role matrix: any participant in the chamber — human, LLM chatbot, scripted chatbot, or agent — can be a communicator, a mediator, or a processor inside a chat segment (subject to the one exclusion in §2.1).
- Can carry embedded child segments (see §1.4.4).
- Drives the bulk of the trigger system: most of the trigger types described in §4 (keyword, message-count, sequence, after-bot-message, and so on) are evaluated against a chat segment's message stream.
For these reasons, the chat segment carries the most configuration. Its parameters include the communication channel inherited from the chamber, a per-message length cap, optional reaction support, participant-level send permissions, and the duration and transition mode of the chat itself.
When this guide refers to “the conversation” without further qualification, it means the messages exchanged inside a chat segment.
1.4.4 Embedded vs. standalone segments
Most segment types occupy the participant's entire screen for the duration of the segment. We call these standalone segments: they take their turn in the chamber's timeline, run to completion (or timeout), and then yield to the next segment.
Most non-chat segment types can additionally be configured to run embedded inside a chat segment. An embedded segment is rendered as an overlay on top of an ongoing chat, so that participants can act on it — vote, rank, write, watch a clip, read an instruction — without leaving the conversation. The chat continues to record messages in the background, and the embedded child appears only when its start trigger fires.
Embedded display is supported for the selection, ranking, input, task, slide, instruction, timer, media, and survey segment types. Two segment types are excluded: chat (it is the parent container; you cannot embed a chat inside a chat) and attention-check (no embedded renderer; attention checks always run standalone). On screens narrower than 900 px the runtime additionally falls back to standalone rendering for every segment, so the layout never breaks on mobile.
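The embedding rules above reduce to a small lookup plus one screen-width check. The sketch below restates them in Python; the function name `effective_display_mode`, the mode strings, and the choice to reject an invalid configuration with an error are assumptions for illustration, not Carrier's API.

```python
# Segment types that the guide lists as supporting embedded display.
EMBEDDABLE = frozenset(
    {"selection", "ranking", "input", "task", "slide",
     "instruction", "timer", "media", "survey"}
)
NARROW_SCREEN_PX = 900  # below this width every segment renders standalone


def effective_display_mode(segment_type: str, requested: str, screen_width_px: int) -> str:
    """Return the display mode a participant would actually see (hypothetical helper)."""
    if requested not in ("standalone", "embedded"):
        raise ValueError(f"unknown display mode {requested!r}")
    if requested == "embedded" and segment_type not in EMBEDDABLE:
        # chat is the parent container; attention-check has no embedded renderer.
        raise ValueError(f"{segment_type!r} cannot be embedded")
    if screen_width_px < NARROW_SCREEN_PX:
        return "standalone"  # narrow-screen fallback
    return requested
```

The practical consequence for design is that an embedded vote cannot be assumed to overlap with the chat for participants on phones, so any analysis that depends on that overlap should record screen width.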
Embedded segments are useful when the activity is part of the conversation rather than an interruption to it. Two examples:
- Periodic polling during a deliberation. Ask participants to vote at three points (after one minute, after three minutes, after five minutes) while they continue talking. Each vote is a separate embedded selection segment with a different embedded-start trigger.
- Pacing a ranking activity to chat progress. Show a ranking overlay only once the conversation has produced enough material to rank — for example after a fixed number of chat messages, or after a fixed time offset from the start of the chat.
An embedded segment adds three configuration parameters beyond its standalone equivalent:
| Parameter | What it does |
|---|---|
| Embedded start | When the overlay first appears, relative to the parent chat: immediately, after N seconds, after N chat messages, or after the previous embedded sibling ends (chained, with an optional delay). |
| Embedded stop | When the overlay closes: as soon as the participant submits / clicks Next (the default), or after a hard N-second timeout. |
| Embedded completion behaviour | What happens when the embedded child finishes: dismiss it (continue chatting), end the parent chat, lock the overlay so it cannot be reopened, or minimise it as a badge. |
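The four embedded-start options in the table can be read as a single predicate evaluated against the state of the parent chat. The following is a sketch of that reading; `EmbeddedStart`, `should_open`, and the kind strings are hypothetical names, and the chained option is assumed to count its delay from the moment the previous sibling ended.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class EmbeddedStart:
    kind: str        # "immediately", "after_seconds", "after_messages", "after_previous"
    n: float = 0.0   # seconds, message count, or chained delay in seconds


def should_open(
    start: EmbeddedStart,
    *,
    chat_elapsed_s: float,
    chat_messages: int,
    previous_ended_at_s: Optional[float] = None,
) -> bool:
    """True once the overlay's start condition is met (illustrative only)."""
    if start.kind == "immediately":
        return True
    if start.kind == "after_seconds":
        return chat_elapsed_s >= start.n
    if start.kind == "after_messages":
        return chat_messages >= start.n
    if start.kind == "after_previous":
        # Chained: wait for the previous sibling to end, then for the delay.
        return (previous_ended_at_s is not None
                and chat_elapsed_s >= previous_ended_at_s + start.n)
    raise ValueError(f"unknown start kind {start.kind!r}")


# The periodic-polling example: votes at one, three, and five minutes.
polls = [EmbeddedStart("after_seconds", s) for s in (60, 180, 300)]
opened = [should_open(p, chat_elapsed_s=200, chat_messages=0) for p in polls]
print(opened)  # [True, True, False]
```

Note that a message-count start never becomes true in a silent chat, whatever the elapsed time.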
A chat segment with one or more embedded children may use an additional transition mode, embedded-complete, which ends the chat when every embedded child has completed. This is the canonical way to build a chat segment whose end is gated on the group having voted (or ranked, or read the instruction), rather than on a fixed duration. An optional fallback timeout on the chat itself is a safety net for chains whose start triggers might never fire — for instance an after N messages trigger if the participants never reach that message count.
For an attention-check segment (or any activity that needs the participant's complete attention), keep the display mode at standalone: the chamber timeline yields to it the same way it would for a survey or instruction.
The researcher selects a chat segment in a chamber, adds a selection segment immediately after it in the segment timeline, and changes the selection segment's display mode to “Embedded (overlay on chat)”. They set the embedded start to “after 60 seconds” and the completion behaviour to “minimise”. The selection segment now appears in the timeline as an indented child of the chat, with a small “embedded” badge.
1.5 Timing, transitions, and participant pacing
Every segment has its own timing and transition rules, which together determine how long participants spend on it and how they advance. These are the four parameters that matter most:
- Duration — the maximum time the segment may run. If left unset, the segment has no automatic deadline.
- Minimum duration — the earliest moment at which a participant may advance. This is the standard way to enforce a floor on engagement (a “read for at least 30 seconds before continuing” instruction slide, for example).
- Warning time — how long before an auto-advance the participant is warned. Useful to prevent surprise transitions in long segments.
- Transition mode — the rule for moving on:
| Mode | Description |
|---|---|
| Auto | The segment advances on its own when the duration elapses. |
| Manual | Each participant advances when they click Continue. |
| Sync | The segment advances only when every participant in the chamber is ready. Keeps the group in lock-step. |
| Host | The experimenter advances the segment from the dashboard. |
| Embedded-complete | (Chat segments only) Advances when every embedded child has completed. |
The pacing choice has substantive consequences. A sync transition gives participants the experience of a shared rhythm, but it also means that the slowest participant determines the group's pace, which can be frustrating in long studies. A manual transition lets each participant move at their own speed, but it can break the group character of a chamber if used for the chat segment itself. A host transition is most useful during pilot testing — the experimenter can step the group through the timeline by hand to debug pacing.
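One way to see how the four parameters interact is to write the advance decision as a single function. The sketch below is an interpretation, not Carrier's runtime: the name `should_advance` and its arguments are hypothetical, and two behaviours are assumed rather than documented, namely that the minimum duration is a floor in every mode and that a set duration is a hard cap in every mode.

```python
from typing import Optional


def should_advance(
    mode: str,
    *,
    elapsed_s: float,
    duration_s: Optional[float] = None,
    min_duration_s: float = 0.0,
    clicked_continue: bool = False,
    all_ready: bool = False,
    host_advanced: bool = False,
    children_complete: bool = False,
) -> bool:
    """Decide whether a segment moves on (illustrative only)."""
    if elapsed_s < min_duration_s:
        return False  # assumed: the engagement floor applies in every mode
    if duration_s is not None and elapsed_s >= duration_s:
        return True   # assumed: a set duration is a hard cap in every mode
    if mode == "auto":
        return False  # only the deadline advances an auto segment
    if mode == "manual":
        return clicked_continue
    if mode == "sync":
        return all_ready
    if mode == "host":
        return host_advanced
    if mode == "embedded-complete":
        return children_complete
    raise ValueError(f"unknown transition mode {mode!r}")


def warning_due(elapsed_s: float, duration_s: float, warning_s: float) -> bool:
    """True during the final warning_s seconds before an auto-advance."""
    return duration_s - warning_s <= elapsed_s < duration_s
```

Under these assumptions a sync segment with a duration still ends when the clock runs out, which bounds how long the slowest participant can hold the group.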
1.6 Worked example: a two-condition deliberation study
To make the structure concrete, here is a complete shape for a small study comparing deliberation under high versus low anonymity.
Experiment
- Global pre-survey: demographics, consent, a political-identification scale.
- Chamberline assignment: Random.
Chamberline A — “High anonymity”
- Chamber A1 — Briefing. One human slot. Segments: an instruction segment with the study brief; a survey segment measuring baseline opinion on the discussion topic.
- Chamber A2 — Deliberation. Three human slots. Segments: a 30-second timer (a “settle in” pause); a 10-minute chat segment using anonymous display names; an embedded ranking child of the chat segment, triggered after 5 minutes, in which the group ranks five policy options.
- Chamber A3 — Debrief. One human slot. Segments: a survey segment measuring post-deliberation opinion and group satisfaction.
Chamberline B — “Low anonymity”
Identical to A, except that the chat segment in chamber B2 displays each participant's first name and a chosen avatar.
Global post-survey
A short reflection on the discussion, plus a payment code.
In this shape, the chamberlines isolate the manipulation (anonymity), the chambers isolate the matched groups (a trio in the deliberation chamber; the briefing and debrief chambers are solo), and the segments isolate the activities (instruction, survey, timer, chat, ranking). The same group of three participants moves together through chamber A2's segments because matching happens once at the start of A2 and not again until A3 begins.
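For readers who find a data structure easier to scan than an outline, the same shape can be written down as nested records. The classes and field names below (`Segment`, `Chamber`, `Chamberline`, `display`, `embedded_start_s`) are invented for this sketch and do not reflect Carrier's actual configuration or export format; the experiment is built in the Builder, not in code.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Segment:
    type: str
    duration_s: int = 0  # 0 means no automatic deadline in this sketch
    options: Dict[str, object] = field(default_factory=dict)
    children: List["Segment"] = field(default_factory=list)  # embedded children


@dataclass
class Chamber:
    name: str
    human_slots: int
    segments: List[Segment]


@dataclass
class Chamberline:
    name: str
    chambers: List[Chamber]


def build_chamberline(name: str, display: str) -> Chamberline:
    """Build one arm of the study; only the chat display option varies."""
    ranking = Segment("ranking", options={"embedded_start_s": 300, "items": 5})
    chat = Segment("chat", duration_s=600, options={"display": display},
                   children=[ranking])
    return Chamberline(name, [
        Chamber("Briefing", 1, [Segment("instruction"), Segment("survey")]),
        Chamber("Deliberation", 3, [Segment("timer", duration_s=30), chat]),
        Chamber("Debrief", 1, [Segment("survey")]),
    ])


def differing_chambers(a: Chamberline, b: Chamberline) -> List[str]:
    """Names of chambers whose configuration differs between two arms."""
    return [x.name for x, y in zip(a.chambers, b.chambers) if x != y]


high = build_chamberline("High anonymity", "anonymous")
low = build_chamberline("Low anonymity", "first_name_and_avatar")
print(differing_chambers(high, low))  # ['Deliberation']
```

Checking that two arms differ only where the manipulation lives is a useful habit before activating a study: any other difference is a potential confound.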
Researcher: this is a fictional placeholder. We will substitute one of your real studies here when you provide it.
1.7 Common pitfalls
A handful of design mistakes recur often enough to be worth flagging.
- Putting two unrelated activities in the same chamber. Because participants are matched once per chamber, a chamber should contain only activities that benefit from sharing the same cast. If two activities do not need to share participants, they belong in separate chambers — possibly in the same chamberline, possibly not.
- Confusing chamber surveys with global surveys. Chamber pre/post-surveys run every time the chamber is entered; experiment-level global surveys run once per participant. State measures (mood, trust, fatigue) typically belong in chamber surveys; trait measures (demographics, personality) in the global pre-survey.
- Long, structureless chat segments. A common reflex is to make the chat segment thirty minutes long with no embedded structure. This makes participant pacing harder to control and complicates the analysis. Breaking the deliberation into a short framing slide, a chat with one or two embedded voting children, and a closing reflection survey gives you both finer-grained timing control and richer data.
- Choosing sync for solo activities. A sync transition only makes sense if there is more than one participant in the chamber. For solo activities (instruction reading, individual surveys), prefer auto or manual.
- Random chamberline assignment when n is small. With fewer than roughly thirty participants per condition, random assignment can produce noticeable imbalance. Prefer counterbalance for small-n studies.
Roles — who occupies the slots defined here and what they can do inside a chamber — are the subject of §2. The variables that drive chamberline assignment and slot matching are the subject of §3.
Part I · §2. Roles: Communicator, Mediator, Processor
Three orthogonal roles — communicate, facilitate, assist composition — each occupying a distinct zone of the participant's screen. Any type (human, LLM chatbot, scripted chatbot, agent) can fill any role, with one exception.
§2 · Roles
2.1 Why roles exist
Chapter 1 defined the shape of an experiment but not its cast. A chamber declares a list of slots, each of which awaits a participant; the question this chapter answers is what a participant who fills a slot can actually do.
Carrier separates that question into two orthogonal axes:
- The type of a participant — what they are. Four types: a real human, an LLM chatbot (language-model-driven, chat only), a scripted chatbot (rule-driven, chat only), or an agent (an autonomous Claude Agent with built-in tools for reading documents, running code, and browsing the web — see the note on non-human participants).
- The role of a participant — how they take part. Three roles: a communicator, a mediator, or a processor.
The four kinds of participant differ in two practical respects — how they produce what they say, and how reproducible they are across sessions:
| Kind | How it produces output | Reproducibility |
|---|---|---|
| Human | The person types | Whatever the person does |
| LLM chatbot | An LLM is called once per turn with the conversation history; output is a chat message or silence | Variable across sessions |
| Scripted chatbot | Pre-written rules fire when their trigger conditions match | Identical across sessions |
| Agent | The Claude Agent loops between LLM calls and built-in tool calls (read files, run code, browse) until it decides it is ready to speak, then produces a chat message grounded in what it found | Variable across sessions |
The type × role matrix is the most important table in this chapter, because almost every design decision below either depends on it or is constrained by it:
| Type \ Role | Communicator | Mediator | Processor |
|---|---|---|---|
| Human | ✓ | ✓ | ✓ |
| LLM chatbot | ✓ | ✓ | ✓ |
| Scripted chatbot | ✓ | ✓ | — |
| Agent | ✓ | ✓ | ✓ |
The only forbidden combination is scripted chatbot as processor. The reason is technical but worth knowing: processors operate by reading drafts, generating suggestions, or interrupting composition, and these activities demand the kind of open-ended language understanding only a human or a language model can provide. A pre-scripted rule set cannot supply it.
Everything else is supported. That makes it possible to write one experimental design and instantiate the same role with a human in one condition and a language model in another — which is the single most important affordance Carrier offers for studies that compare human and machine behaviour. It also means that the choice between an LLM chatbot, a scripted chatbot, and an agent can itself be the manipulation: same role, same instructions, three different kinds of non-human partner — one talking from training-time knowledge, one talking from a written rulebook, one talking from documents it has just read.
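The matrix is small enough to encode directly, which can be handy when checking a planned design on paper before building it. This is a sketch of the rule as stated in the table; the identifiers `TYPES`, `ROLES`, `FORBIDDEN`, and `is_allowed` are hypothetical and not part of Carrier.

```python
TYPES = ("human", "llm_chatbot", "scripted_chatbot", "agent")
ROLES = ("communicator", "mediator", "processor")

# The single exclusion in the type x role matrix.
FORBIDDEN = {("scripted_chatbot", "processor")}


def is_allowed(participant_type: str, role: str) -> bool:
    """True if the matrix permits this type in this role."""
    if participant_type not in TYPES:
        raise ValueError(f"unknown type {participant_type!r}")
    if role not in ROLES:
        raise ValueError(f"unknown role {role!r}")
    return (participant_type, role) not in FORBIDDEN


allowed = [(t, r) for t in TYPES for r in ROLES if is_allowed(t, r)]
print(len(allowed))  # 11 of the 12 combinations
```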
2.2 The spatial model: three zones in the interface
A useful way to keep the roles separate in your mind is to remember that each occupies a distinct zone of the participant's screen during a chat segment:
This is more than visual hygiene. Mediators broadcast; communicators converse; processors assist composition before words enter the conversation. The separation of channel is what makes it possible to study facilitation, conversation, and composition assistance independently of one another — or to combine them deliberately, knowing that the layers do not bleed into each other.
2.3 Communicator
2.3.1 Framing
The communicator is the primary interactive participant. Whatever the experiment ultimately studies, communicators are the ones doing the studied behaviour. All communicators — whether a human, an LLM chatbot, a scripted chatbot, or an agent — operate in the same message space: their messages appear in the main chat area alongside each other in the order they were sent.
The animating design principle is interaction parity. A human communicator and any non-human communicator send messages through the same mechanism, appear in the same UI, and are indistinguishable to other participants unless the researcher explicitly marks them otherwise. This is what makes it possible to run human–human, human–machine, and machine–machine conditions of the same design without rebuilding the experiment.
2.3.2 The communicator design surface
Three dimensions of configuration matter for any communicator. They are independent: settings on one dimension do not constrain settings on the others.
Identity. Who appears in the chat, by what name, and with what disclosure?
| Aspect | Choices | What you decide |
|---|---|---|
| Source of identity | User-provided · Configured · Auto-generated | Whether the participant chooses their own display name, you pre-set it, or the platform invents one. |
| Visibility | Visible · Hidden | Whether the communicator appears in the participant list at all. |
| Type disclosure | Disclosed · Blinded | Whether other participants are told that this communicator is human or AI. |
Human communicators typically go through an identity-setup flow (choose a display name, pick an avatar) before entering the experiment. Non-human communicators — LLM chatbots, scripted chatbots, and agents alike — carry pre-configured identities. The separation makes blinding possible: a participant cannot tell from the interface alone whether a fellow communicator is human or a machine.
Input control. When can the communicator speak, and who decides?
| Aspect | Choices | What you decide |
|---|---|---|
| Initial state | Enabled · Delayed · Conditional | Whether the communicator can send messages from the moment the chat begins. |
| Enable conditions | Time-based · Message-count · Bot-trigger · Participant-message | What event lifts a delay or unlocks input. |
| External control | None · Mediator-controlled | Whether a mediator can disable or enable this communicator's input dynamically during the chat. |
Carrier's chat input is not simply on or off. A communicator might begin with input disabled, wait for three messages from other participants, and then become enabled. Or a mediator (see §2.4) might disable and re-enable input on the fly to enforce turn-taking. Building experimental conditions out of these primitives is how you produce interventions like simultaneous discussion vs. sequential discussion.
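The interplay of initial state, enable condition, and mediator control can be sketched in a few lines of Python. This is an illustration only, not Carrier's implementation: the `InputGate` class, its field names, and the rule that a mediator disable overrides an unlocked input are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class InputGate:
    """Hypothetical model of one communicator's chat input."""
    initial_state: str = "enabled"    # "enabled", "delayed", or "conditional"
    unlock_after_messages: int = 0    # a message-count enable condition
    unlocked: bool = False            # becomes True once the condition is met
    mediator_disabled: bool = False   # toggled by a mediator during the chat

    def on_other_messages(self, messages_from_others: int) -> None:
        # Lift the lock once enough messages from other participants exist.
        if messages_from_others >= self.unlock_after_messages:
            self.unlocked = True

    def can_send(self) -> bool:
        # Assumption: a mediator disable always wins over an unlocked input.
        if self.mediator_disabled:
            return False
        return self.initial_state == "enabled" or self.unlocked


gate = InputGate(initial_state="conditional", unlock_after_messages=3)
before = gate.can_send()              # input starts disabled
gate.on_other_messages(3)
after = gate.can_send()               # unlocked after three messages
gate.mediator_disabled = True
during_turn_taking = gate.can_send()  # a mediator has paused this input
```

Under these assumptions, a sequential-discussion condition is simply a set of gates that a mediator opens one at a time, while a simultaneous-discussion condition leaves every gate enabled from the start.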
Message capabilities. What kinds of messages can be sent and acted on?
| Aspect | Choices | What you decide |
|---|---|---|
| Content types | Text · Media (audio/video) | What the communicator can attach to a message. |
| Reactions | Enabled · Disabled | Whether emoji reactions are available to the communicators. |
| Reporting | Enabled · Disabled | Whether a participant can flag a message for the experimenter. |
2.3.3 Communicator subtypes at a glance
The four communicator subtypes correspond to the four types in the matrix:
- Human communicator. A real participant joining via browser. They enter the matching queue, are matched into a chamber, and join its chatroom. The platform tracks their socket connection with a heartbeat; on disconnect they can be reconnected within the session. All messages, survey responses, timestamps, and activity events are recorded.
- LLM-chatbot communicator. Configured by provider (OpenAI / Anthropic / Google / compatible), model, system prompt, temperature, and response logic (when to speak, when to stay silent — see §4.3). The model is given the conversation so far on each turn and produces a chat message (or stays silent). LLM chatbots never enter the matching queue: once the human slots in a chamber are filled, they are spawned into the chatroom automatically. The platform supports multi-step LLM chains for advanced configurations (a generation step, then a critique step, then a rewrite step).
- Scripted-chatbot communicator. Configured by a set of triggers (§4). Like LLM chatbots, scripted chatbots do not enter the matching queue — they are spawned into the chatroom after human slots are filled. Their behaviour is deterministic and replayable: the same input sequence produces the same outputs across sessions, which makes them the right choice for confederate roles and any design in which conversational reproducibility matters.
- Agent communicator. A Claude Agent (Anthropic) scoped to a particular document area — typically the study materials, a reference corpus, or a configured dataset. Before producing each message, the agent's underlying model loops over its built-in tools (file reading, code execution, web browsing) to look things up, run small computations, or check a citation. The message it eventually sends is grounded in what it has retrieved. Agent communicators are the natural choice when the experiment wants a conversational partner that can answer with evidence — a study-materials expert that quotes the brief verbatim, a fact-checker that can browse during the discussion, a domain assistant that can re-read the dataset before stating a number.
2.3.4 Research uses of the communicator role
A short, indicative list of designs that map cleanly onto different communicator configurations:
| Design | Communicator configuration | What it studies |
|---|---|---|
| Group discussion | 2+ human communicators | Opinion formation, group dynamics, polarisation |
| Human–AI dyad | 1 human + 1 LLM communicator | Trust, persuasion, perception of machine partners |
| Confederated AI | 1 human + N LLM communicators, blinded | Conformity, majority influence |
| Turn-taking study | Humans with delayed input control | Sequential vs. simultaneous discussion |
| Machine–machine comparison | 2 LLM communicators with different system prompts | Model behaviour under controlled prompting |
2.4 Mediator
2.4.1 Framing
The mediator is a facilitator. Unlike a communicator, a mediator does not converse on equal footing with the others — they orchestrate the conversation. The qualitative differences are four:
- Universal visibility. A mediator sees every message in the chat, regardless of who it was addressed to.
- Distinct delivery. A mediator's output appears as styled announcements at the top of the chat, not as chat bubbles.
- Control capabilities. A mediator can act on the chat — disable a communicator's input, prompt a specific participant, highlight a message — not only speak into it.
- Event awareness. A mediator reacts to aggregate patterns (number of messages, time elapsed, idle participants) at least as readily as to individual messages.
These four capabilities together describe the moderator / facilitator / researcher dynamic that has no analogue in plain group chat.
2.4.2 The broadcast system
Mediator messages are called broadcasts. A broadcast is styled along three independent axes:
| Axis | Choices | Effect |
|---|---|---|
| Style | Facilitator · Announcement · System | Icon and tone of voice in the rendered banner. |
| Priority | Normal · Important · Urgent | Visual emphasis and how long the banner stays before auto-dismissing. |
| Persistence | Dismissible · Persistent | Whether the participant can dismiss the banner. |
Broadcasts can also be targeted: at every participant in the chamber, at communicators only, at a specific named participant, or at all participants in a specific role. This is the mechanism by which a mediator can deliver a private prompt to one communicator without the rest of the group seeing it.
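The four targeting options reduce to a small recipient-selection rule. The sketch below is hypothetical: the `Participant` record and the shape of the `target` dictionary are invented for illustration and do not reflect Carrier's actual configuration schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Participant:
    name: str
    role: str  # "communicator", "mediator", or "processor"


def recipients(target: dict, chamber: list[Participant]) -> list[str]:
    """Return the names of the participants who should see a broadcast."""
    kind = target["kind"]
    if kind == "all":            # every participant in the chamber
        return [p.name for p in chamber]
    if kind == "role":           # everyone in one role, e.g. communicators only
        return [p.name for p in chamber if p.role == target["role"]]
    if kind == "participant":    # one named participant: a private prompt
        return [p.name for p in chamber if p.name == target["name"]]
    raise ValueError(f"unknown target kind: {kind!r}")


chamber = [
    Participant("Ana", "communicator"),
    Participant("Ben", "communicator"),
    Participant("Coach", "processor"),
]
everyone = recipients({"kind": "all"}, chamber)
communicators_only = recipients({"kind": "role", "role": "communicator"}, chamber)
private = recipients({"kind": "participant", "name": "Ana"}, chamber)
```

In this sketch "communicators only" is treated as a special case of role targeting, which is one plausible reading of the list above.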
2.4.3 Facilitation actions (LLM-driven mediators)
When the mediator is driven by a language model (either an LLM chatbot or a Claude Agent), it has access to an action vocabulary that goes beyond plain broadcasting. The five intervention actions, tabulated below, are produced as part of the model's structured response on each turn; the most consequential is the first, disable_chat.
A mediator implemented as a scripted chatbot can fire the same actions, but their parameters must be baked into the trigger configuration ahead of time (§4.5); the scripted bot cannot choose the action contextually based on what was just said.
Note that this intervention-action vocabulary (disable_chat, enable_chat, prompt_participant, highlight_message, request_attention) is distinct from the built-in tools of a Claude Agent (file reading, code execution, web browsing). They live on different channels:
- The intervention actions are how a mediator acts on the chat — they affect what participants see and what they are allowed to do.
- An agent's built-in tools are how it gathers information for itself — they affect what the agent knows when it speaks, but the participants see only the eventual message.
A Claude Agent acting as mediator has access to both: it can read its configured document area before responding, and fire intervention actions alongside its broadcast. An LLM chatbot acting as mediator has only the intervention actions; a scripted chatbot acting as mediator has only the pre-baked variants.
| Action | Target | Description |
|---|---|---|
| disable_chat | A specific communicator, or all | Temporarily prevents the target from sending messages. The release is governed by a set of conditions described below. |
| enable_chat | A specific communicator, or all | Explicitly lifts a disable. Immediate. |
| prompt_participant | A specific communicator | Sends a private encouragement or prompt visible only to that participant. |
| highlight_message | A specific message | Marks a past message as highlighted in the chat for a configurable duration. |
| request_attention | A specific communicator | Triggers a visual or audio cue to draw the participant's attention. |
The disable_chat action supports composite release conditions: a list of conditions combined with an any / all connector that determines when the disable lifts. The available conditions are:
- Timeout — after a fixed duration.
- All others responded — when every other communicator has sent a message of at least a configured minimum length.
- Message count — after a fixed total number of messages have been sent in the chat.
- Keyword mentioned — when a designated keyword (or any from a list) is uttered by any, or a specific, participant.
- Participant message — when a designated participant has sent a configured number of messages.
- Mediator release — released only by an explicit subsequent enable_chat.
- Segment change — released when the chamber transitions to its next segment.
Composite release conditions are the building blocks for richly specified turn-taking protocols. “Wait until everyone else has responded, or sixty seconds, whichever comes first” is a single disable_chat action with two conditions and an any connector.
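The any / all logic can be made concrete with a short Python sketch. Only the condition kinds and the connector come from this guide; the function name, the dictionary fields, and the choice to measure the timeout from the moment the disable began are assumptions made for illustration, and only three of the seven conditions are modelled.

```python
def should_release(conditions, connector, state):
    """Decide whether a disable_chat should lift, given the chat state."""
    checks = {
        # Timeout: a fixed duration has passed since the disable began.
        "timeout": lambda c: state["elapsed_ms"] >= c["ms"],
        # Message count: a fixed total number of messages has been sent.
        "message_count": lambda c: state["total_messages"] >= c["count"],
        # All others responded: every other communicator has sent a
        # message of at least the configured minimum length.
        "all_others_responded": lambda c: all(
            length >= c["min_length"]
            for length in state["longest_reply_by_other"].values()
        ),
    }
    results = [checks[c["type"]](c) for c in conditions]
    return any(results) if connector == "any" else all(results)


# "Wait until everyone else has responded, or sixty seconds."
protocol = [
    {"type": "all_others_responded", "min_length": 10},
    {"type": "timeout", "ms": 60_000},
]
early = {"elapsed_ms": 20_000, "total_messages": 1,
         "longest_reply_by_other": {"Ben": 42, "Cy": 0}}
late = {"elapsed_ms": 61_000, "total_messages": 1,
        "longest_reply_by_other": {"Ben": 42, "Cy": 0}}
replied = {"elapsed_ms": 20_000, "total_messages": 3,
           "longest_reply_by_other": {"Ben": 42, "Cy": 15}}

still_waiting = should_release(protocol, "any", early)
timed_out = should_release(protocol, "any", late)
everyone_replied = should_release(protocol, "any", replied)
strict = should_release(protocol, "all", replied)
```

The last call shows why the connector matters: with all, the same two conditions keep the input locked until everyone has replied and sixty seconds have also passed.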
2.4.4 Mediator-specific triggers
Mediators inherit the standard trigger system (§4), but five extra trigger types are particularly suited to facilitation:
| Trigger | When it fires | Typical use |
|---|---|---|
| Periodic | At regular intervals after the chat begins | Recurring summaries, scheduled check-ins. |
| Aggregate | After N messages have accumulated within a time window | Batched synthesis or pattern detection. |
| Topic-detected | When a designated keyword pattern appears | Topic steering, off-topic detection. |
| Activity-timeout | When no messages have been sent for N milliseconds | Idle prompts, participation encouragement. |
| Participant-count | When the number of active participants crosses a threshold | Reacting to departures, waiting for arrivals. |
These trigger types, combined with the action vocabulary above, are what make automated AI facilitation in Carrier expressive: a mediator can be configured to summarise the discussion every two minutes, prompt any participant who has been silent for ninety seconds, or steer the conversation back on topic when a specified keyword has not appeared in the last thirty messages.
2.4.5 Activity monitoring
A mediator can optionally maintain an activity model of each communicator. The activity-monitor settings are:
- An idle threshold (milliseconds of inactivity before a participant is considered idle), and a list of idle prompts to send when it is reached.
- An active threshold (messages-per-minute rate at which a participant is considered to be dominating), and a list of active prompts to send when it is reached.
This is the building block for participation-equity interventions: a mediator that automatically prompts quiet members and gently invites dominant members to “make space for others”.
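As a rough illustration of how the two thresholds might classify a communicator, consider the sketch below. The function, its argument names, and two modelling choices (silence is measured from the last message or from time zero, and the message rate is counted over the trailing sixty seconds) are assumptions, not documented Carrier behaviour.

```python
def classify_activity(
    now_ms: int,
    message_times_ms: list[int],
    idle_threshold_ms: int,
    active_threshold_per_min: int,
) -> str:
    """Label one communicator as "idle", "dominating", or "normal"."""
    # Idle check: how long since this communicator last sent anything?
    last_message = max(message_times_ms, default=0)
    if now_ms - last_message >= idle_threshold_ms:
        return "idle"
    # Dominance check: messages sent within the trailing minute.
    recent = [t for t in message_times_ms if now_ms - t < 60_000]
    if len(recent) >= active_threshold_per_min:
        return "dominating"
    return "normal"


now = 200_000
quiet = classify_activity(now, [50_000], 90_000, 6)
dominant = classify_activity(
    now, [150_000, 160_000, 170_000, 180_000, 190_000, 195_000], 90_000, 6
)
typical = classify_activity(now, [180_000], 90_000, 6)
```

A mediator built on this idea would send one of the idle prompts to the first communicator and one of the active prompts to the second.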
2.4.6 Research uses of the mediator role
| Design | Mediator configuration | What it studies |
|---|---|---|
| Automated facilitator | LLM mediator with periodic + topic triggers | Effectiveness of automated facilitation. |
| Turn-taking enforcement | LLM mediator using disable_chat with all_others_responded | Effects of structured discussion on quality. |
| Participation equity | LLM mediator with activity monitoring and idle prompts | Interventions on balanced participation. |
| Discussion steering | LLM mediator with topic-detected triggers | Topic-management strategies. |
| Human facilitator | Human in mediator role, broadcast capability | Expert facilitation patterns. |
| Timed interventions | Scripted mediator with periodic broadcasts | Information-injection effects. |
| Real-time summarisation | LLM mediator with aggregate triggers and a synthesis prompt | Impact of real-time summaries on deliberation. |
2.5 Processor
2.5.1 Framing
The processor is the most novel of the three roles. It operates in the input composition space rather than the message exchange space — that is, it acts before a communicator's text becomes a message in the chat. Where a mediator sits over the conversation and a communicator sits inside it, a processor sits alongside the input box, reviewing what the communicator is about to send, generating drafts on request, or offering live suggestions as the communicator types.
The role exists because the act of composing a message is a distinct site of intervention — distinct from facilitating the conversation, and distinct from participating in it. A study that wants to ask “what happens when an AI helps people write what they say?” needs a place to put that AI, and that place is not the chat.
2.5.2 The design space, in three dimensions
It is tempting to think of processors in terms of “review” and “generate” alone, but the design space is richer. Three independent dimensions structure it.
Initiation. Who starts an interaction?
- Communicator-initiated — the communicator explicitly asks (submits a draft for review, clicks Generate).
- Processor-initiated — the processor offers help without being asked (sends a suggestion).
- System-initiated — the platform triggers an interaction based on an event (a pause is detected, a timer elapses).
Control. Who controls the final output?
- Communicator retains control — the communicator always decides what is actually sent (accept / reject / edit the processor's output).
- Processor retains control — the processor decides what the communicator sees (filtering, rewriting).
- Shared control — both can edit; the final version is negotiated.
Timing. When does the interaction happen?
- Pre-send — before the message enters the chat (review, approval).
- During composition — while the communicator is typing (real-time suggestions).
- On-demand — when explicitly requested (a Generate button).
Not every combination of these dimensions is feasible. LLM API latency makes true simultaneous co-editing impractical for LLM processors; processor-controlled rewriting risks undermining the validity of self-report studies by replacing the communicator's voice with the processor's. Carrier's design therefore commits to one principle and offers three concrete modes.
The committed principle: communicator agency. Whatever the processor does, the communicator retains final control over what enters the chat. The processor never bypasses the communicator's agency.
2.5.3 The three modes
Review
Communicator writes draft → submits for review → processor gives feedback
↓
Communicator accepts / edits / rejects → message sent to chat
| Dimension | Setting |
|---|---|
| Initiation | Communicator-initiated (on-submit) or system-initiated (pause-triggered) |
| Control | Communicator |
| Timing | Pre-send |
Configuration parameters:
- Trigger — on-submit (the communicator clicks Send and the processor steps in) or pause-triggered (the processor steps in automatically after the communicator stops typing for a configured period).
- Pause timeout — the inactivity window, in milliseconds, used only when the trigger is pause-triggered.
- Feedback format — freeform (the processor writes a free-text response), inline-edit (the processor proposes an edited version of the draft), or approve-reject (the processor returns a binary verdict).
- Mandatory — whether the communicator must address the feedback before they can send the message.
- Max rounds — a cap on the number of review iterations on a single draft.
Research applications include writing-quality improvement, self-reflection, metacognitive scaffolding, and peer-review dynamics.
Generate
Communicator clicks Generate → processor creates a draft
↓
Communicator edits → message sent to chat
| Dimension | Setting |
|---|---|
| Initiation | Communicator-initiated (explicit request) |
| Control | Communicator (edits before sending) |
| Timing | On-demand |
Research applications include AI-ghostwriting perception, co-authoring dynamics, and the trade-off between generation quality and editing effort.
Real-time assist (human processors only)
Communicator types → the draft streams to a human processor
↓
Processor sends ephemeral suggestions → communicator accepts or dismisses
| Dimension | Setting |
|---|---|
| Initiation | System-initiated (continuous streaming) |
| Control | Communicator (suggestions are ephemeral) |
| Timing | During composition |
This mode is restricted to human processors because LLM round-trip latency makes truly live suggestions impractical. The use cases — peer coaching, expertise-based assistance, live mentoring during composition — all assume a human partner at the other end.
2.5.4 Phase scripts: dynamic processor behaviour
A processor does not have to behave the same way throughout a chamber. Carrier supports phase scripts — an ordered list of phases, each with a mode and a transition trigger that advances to the next phase.
phases: [
{ id: "warmup", mode: "disabled", transition: { on-start } }
{ id: "review", mode: "review", transition: { message-count: 5 } }
{ id: "generate", mode: "generate", transition: { time-elapsed: 180000 } }
]
The processor in this example starts disabled, switches into review mode once five messages have been exchanged, and switches into generate mode three minutes after the segment starts. The supported transition trigger types are:
| Trigger | When the next phase activates |
|---|---|
| On-start | Immediately when the segment begins. |
| Message-count | After N messages have been sent in the chatroom. |
| Time-elapsed | After N milliseconds from segment start. |
| Keyword | When a designated keyword appears in the chat. |
| Participant-event | When a participant joins or leaves. |
| Manual | When the experimenter advances from the dashboard. |
| On-end | When the segment ends. |
Phase scripts are how Carrier supports designs like progressive scaffolding (start with heavy review, then withdraw assistance over time) or adaptive intervention (switch modes once a quality threshold has been crossed).
2.5.5 Context configuration
The amount of conversation a processor can see is configurable:
| Level | What the processor sees |
|---|---|
| None | Only the current draft; no chat history. |
| Partial | The last N messages (configurable). |
| Full | The complete chat history of the chamber so far. |
For LLM processors, the context level controls what is sent to the model with each call. For human processors, it controls the contents of a read-only chat panel beside the suggestion interface. Context level interacts with privacy considerations: a research design that does not want the processor influenced by — or able to see — earlier conversation should set context to none.
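The three context levels amount to a simple slice over the chat history. The sketch below is a hypothetical helper (the function name and the lowercase level strings are assumptions) that shows what an LLM processor would receive, or a human processor would see, at each level.

```python
def visible_history(history: list[str], level: str, last_n: int = 0) -> list[str]:
    """Return the part of the chat history a processor may see."""
    if level == "none":
        return []                      # only the current draft is shown
    if level == "partial":
        # Guard against last_n == 0, since history[-0:] is the whole list.
        return history[-last_n:] if last_n > 0 else []
    if level == "full":
        return list(history)           # complete history of the chamber
    raise ValueError(f"unknown context level: {level!r}")


history = ["m1", "m2", "m3", "m4"]
nothing = visible_history(history, "none")
last_two = visible_history(history, "partial", last_n=2)
everything = visible_history(history, "full")
```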
2.5.6 Research uses of the processor role
| Design | Processor configuration | What it studies |
|---|---|---|
| AI writing coach | LLM processor, review mode, freeform feedback | Improvement of writing quality through AI feedback. |
| Mandatory review | LLM processor, review mode, mandatory: true | Effects of forced reflection on output. |
| AI ghostwriter | LLM processor, generate mode | Authorship perception, AI-assisted communication. |
| Peer review | Human processor, review mode | Peer-feedback dynamics. |
| Live coaching | Human processor, real-time assist mode | Expert scaffolding during composition. |
| Progressive scaffolding | Phase script: disabled → review → generate | Effects of withdrawing assistance over time. |
| Mode comparison | Two chamberlines: one review, one generate | Trade-off between reviewing and generating. |
2.6 How the roles interact
The three roles are designed to be orthogonal — to operate in different spatial zones, on different inputs, with different output channels — but they can be present in the same chamber and combined in principled ways.
Communicator ↔ mediator. Asymmetric. The mediator sees everything; the communicator does not see the mediator's view. The mediator can broadcast to communicators, disable or enable their input, prompt them privately, and highlight their messages. This asymmetry is intentional: it models the moderator dynamic.
Communicator ↔ processor. Collaborative but communicator-controlled. The processor sees drafts; it sends feedback, suggestions, or generated text; the communicator decides what actually gets sent. The processor never bypasses the communicator.
Mediator ↔ processor. Orthogonal. Mediators act in the broadcast/control layer; processors act in the composition layer. They do not directly interact, but their effects can be coordinated — a mediator might broadcast that “the next message will be reviewed” at the same trigger point at which a processor switches from disabled to review mode.
Four representative configurations to keep in mind:
| Configuration | Roles present | Research scenario |
|---|---|---|
| Communicators only | 2+ communicators | Standard group discussion. |
| Communicators + mediator | N communicators + 1 mediator | Facilitated discussion. |
| Communicators + processor | N communicators + 1 processor | Assisted composition. |
| Full setup | N communicators + mediator + processor | Facilitated discussion with composition assistance. |
2.7 Design principles, distilled
Six principles run through the role system. They are summarised here for reference.
- Separation of concerns. Each role occupies a distinct spatial zone (chat, broadcast, input) and serves a distinct function (communicate, facilitate, assist). The separation is what enables clean experimental contrasts.
- Communicator agency. The communicator always retains final control over what they say. No mediator or processor can speak in their name.
- Type orthogonality. Any participant type that is compatible with a role can fill it. This is what makes human–AI comparisons trivial to set up.
- Phase-based dynamism. Both mediators (via triggers) and processors (via phase scripts) can change behaviour during a session, enabling within-session manipulations and progressive designs.
- Comprehensive logging. All messages, broadcasts, mediator actions, and processor interactions are recorded with metadata. This is what makes the resulting data analysable.
- Configuration hierarchy. Global → chamber → segment overrides → event-based control. Defaults at the top, specificity at the bottom.
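The configuration hierarchy in the last principle behaves like a layered lookup in which the most specific layer that defines a setting wins. A minimal sketch using Python's standard `collections.ChainMap` follows; the setting names and the four dictionaries are invented for illustration, and the exact override rules in Carrier may differ.

```python
from collections import ChainMap

# Hypothetical settings at each level of the hierarchy.
global_cfg = {"reactions": "enabled", "reporting": "enabled"}
chamber_cfg = {"reactions": "disabled"}   # chamber overrides the global default
segment_cfg = {}                          # this segment overrides nothing
event_cfg = {"input": "disabled"}         # e.g. set by a mediator's disable_chat

# ChainMap searches its mappings left to right, so the most specific
# layer is listed first and the global defaults last.
effective = ChainMap(event_cfg, segment_cfg, chamber_cfg, global_cfg)

reactions = effective["reactions"]   # taken from the chamber layer
reporting = effective["reporting"]   # falls through to the global default
chat_input = effective["input"]      # taken from the event layer
```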
2.8 Builder walkthroughs
The researcher opens a chamber, clicks “Add slot”, picks the mediator role and the LLM chatbot type, and selects a provider, model, and system prompt from the configuration panel. They enable one periodic trigger (“summarise every 2 minutes”) and one topic-detected trigger (“steer back on topic when X has not appeared in 30 messages”). The chamber's slot list now shows a mediator with two triggers attached.
The researcher adds a processor slot to a chamber, opens the phase-script editor, and creates three phases: disabled until segment start, review until five messages, generate until three minutes. They set the review feedback format to inline-edit and mark it non-mandatory. The slot now displays a small timeline summarising the three phases.
2.9 Worked example: a human–AI co-writing study
A small study comparing two ways an AI can help people write persuasive messages.
Experiment
- Global pre-survey: demographics, writing-confidence scale, target-topic baseline opinion.
- Chamberline assignment: random.
Chamberline A — “AI as reviewer”
- Chamber A1 — Briefing. One human communicator slot. Segments: instruction, baseline survey.
- Chamber A2 — Persuasive writing. One human communicator + one LLM processor, configured in review mode with inline-edit feedback and a maximum of two review rounds per message. Segments: a 12-minute chat segment in which the human composes three messages directed at a fictional audience.
- Chamber A3 — Debrief. One human slot. Segments: post-task survey.
Chamberline B — “AI as generator”
Identical to A, except that in chamber B2 the LLM processor is configured in generate mode. The communicator clicks Generate to receive an initial draft, edits it, and sends.
Global post-survey
Reflection on the experience, ownership/authorship attributions, payment code.
In this design, the role system is doing the lion's share of the work. The chamberlines, chambers, and segments are simple; the manipulation lives entirely in the configuration of the processor.
Researcher: fictional placeholder. We will substitute a real study when you provide one.
2.10 Common pitfalls
- Conflating type and role. A common reflex is to say “we want a bot in this chamber” without specifying what role it plays. “Bot” is a type label; the chamber still needs to know whether it is a communicator, a mediator, or — for LLM chatbots and agents, but not scripted chatbots — a processor. The three roles cannot be substituted for one another.
- Putting a scripted chatbot in the processor slot. The matrix forbids it, and the builder will refuse to save the configuration. If your design calls for deterministic assistance during composition, use a human processor with a strict script of allowed feedback responses, or an LLM-chatbot processor with a temperature of zero (which reduces run-to-run variation but does not guarantee identical outputs), but not a scripted chatbot.
- Mediators that participate. It is tempting to use the mediator as a “knowledgeable participant” that joins the conversation as a peer. Resist this. If the entity is meant to participate on equal footing, make it a communicator. The mediator's affordances — universal visibility, broadcasts, control actions — only make sense for a non-peer facilitator.
- Real-time-assist with a non-human processor. This combination is unsupported for both LLM chatbots and agents, and for good reason: LLM round-trip latency is too high for live token-by-token suggestion (and an agent's tool-loop makes it slower still). If you need real-time assistance, place a human in the processor slot.
- Over-configuring the disable_chat action. Composite release conditions are powerful but easy to over-specify; a disable_chat with five conditions tied together with all may never release at all. Pilot every novel turn-taking protocol with a small group and watch the action log before scaling up.
- Forgetting that non-human communicators do not enter the matching queue. A chamber that requires three communicators, of which one is an LLM chatbot, only needs to match two humans before it begins; the chatbot is spawned automatically. Researchers who set maxParticipants equal to the human count alone are sometimes surprised by who shows up.
Variables — the attributes that travel with each participant and decide which slots they can fill, which chambers they see, and what each activity says to them — are the subject of §3.
Part I · §3. Variables
Variables are how a Carrier experiment becomes personal. They constrain matching, gate visibility, inject context into prompts, and drive triggers.
§3 · Variables
3.1 Why variables exist
By the end of Chapter 2 we have a design that can place a researcher's chosen mixture of humans, LLM chatbots, scripted chatbots, and agents inside a sequence of chambers, with each role doing what the design expects. That design is, however, still impersonal: it treats every participant identically. The same chamberline, the same slots, the same instructions, the same prompts — applied to every person who walks into the experiment.
Variables are how a Carrier experiment becomes personal. A variable is an attribute attached to a participant — their self-identified expertise, the political party they support, the score they obtained on the pre-survey, the option they ranked first in the last chamber — that is then available to the rest of the experiment to consult. Variables let you:
- Constrain matching. Require that the matched group contain (say) one self-identified novice and one self-identified expert.
- Gate visibility. Show a remedial chamber only to participants whose pre-test score was below threshold; skip the chamber for everyone else.
- Inject context into instructions and prompts. Address each participant by their chosen name; brief an LLM mediator on the political identification of each person in the room.
- Drive triggers (§4). Fire a bot's response only when the variable matches a condition.
In conventional methodological language, variables are how Carrier represents the experimentally-relevant measured and manipulated characteristics of each participant. Once captured, they travel with the participant for the rest of the run.
3.2 The anatomy of a variable
A variable is defined once, at the experiment level, and then referenced from anywhere in the experiment. Each definition has the following parts:
| Part | What it specifies |
|---|---|
| Key | The short identifier used to reference the variable (e.g. expertise, party_id, pretest_score). |
| Type | The kind of value the variable can take: string, number, boolean, single-choice, multi-choice. |
| Source | Where the value comes from — a survey response, the result of a segment, a system-assigned value, or an aggregate over other participants. |
| Options (for choice types) | The list of allowable values, optionally each with a display label and a numeric value (see 3.2.2). |
| Default | The value used when the source produces nothing. |
Variables are addressed in the rest of the experiment by their key, prefixed with var. (for example, var.expertise or var.party_id). The prefix exists so that the system can tell, when reading a configuration, that a string is referring to a variable rather than to a literal.
3.2.1 Where variable values come from
A variable's value is computed from a source specification. The five common sources are:
- Survey response. The most common source. Read the value of a named question from one of the participant's completed surveys: the global pre-survey, a chamber pre- or post-survey, or any embedded survey segment. Survey-based variables are evaluated as soon as the survey is submitted, which means they are available for chamberline assignment, slot matching, and chamber visibility from that point forward.
- Segment submission. Read what the participant submitted during a particular segment: the option they chose in a selection segment, the order they produced in a ranking segment, the text they entered in an input segment.
- Segment data. Read data captured about a segment: how long the participant spent, how many messages they sent, whether they passed an attention check.
- System-assigned. Read a value the platform sets automatically — the chamberline the participant was assigned to, their participant ID, the run identifier.
- Aggregate. Compute a value across multiple participants — the mean of all communicators' pre-test scores in the same chamber, for example, or the modal political identification of the group. Aggregates are how a configuration can talk about the group, not just the individual.
A source can be qualified by a participant reference: by default a variable is read from the participant who is being configured (referred to internally as self), but it can also be read from another slot in the chamber (slot:1, slot:2, or slot:mediator). This is what makes it possible to address a fellow participant in an instruction or in a system prompt, for example: “you are about to chat with {slot:1:var.first_name}, who identifies as a {slot:1:var.party_id} voter.”
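To see how such placeholders could be resolved, here is a small Python sketch built on the standard `re` module. The `render` function and the nested-dictionary data layout are hypothetical, and the unqualified `{var.key}` form for the participant's own variables is an assumption, since the guide only shows the slot-qualified form inside braces.

```python
import re

# Matches {var.key} and {slot:<id>:var.key}; the slot part is optional.
PLACEHOLDER = re.compile(r"\{(?:slot:(?P<slot>\w+):)?var\.(?P<key>\w+)\}")


def render(template: str, variables_by_slot: dict, self_slot: str) -> str:
    """Replace each placeholder with the referenced participant's value."""
    def substitute(match: re.Match) -> str:
        # Without a slot qualifier, read from the participant being configured.
        slot = match.group("slot") or self_slot
        return str(variables_by_slot[slot][match.group("key")])
    return PLACEHOLDER.sub(substitute, template)


variables_by_slot = {
    "1": {"first_name": "Ana", "party_id": "Green"},
    "2": {"first_name": "Ben", "party_id": "Liberal"},
}
briefing = render(
    "You are about to chat with {slot:1:var.first_name}, "
    "who identifies as a {slot:1:var.party_id} voter.",
    variables_by_slot,
    self_slot="2",
)
greeting = render("Welcome, {var.first_name}.", variables_by_slot, self_slot="2")
```

Here the participant in slot 2 receives a briefing about the participant in slot 1, while the unqualified greeting falls back to slot 2's own values.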
3.2.2 Per-option numeric values
A choice-type variable is, by default, a categorical label — “novice” or “expert”, “Democrat” or “Republican”. When a numerical contrast is also useful, each option in the variable's definition can carry an associated numeric value.
A typical case is a five-point self-rating scale. The variable's options might be labelled Very low / Low / Moderate / High / Very high, with numeric values 1 / 2 / 3 / 4 / 5. The label is what the participant sees and what is logged in raw form; the numeric value is what the configuration arithmetic uses when computing aggregates or threshold conditions. A condition like “this chamber only appears if the participant's expertise is ≥ 3” is expressed against the numeric value, not the label.
Numeric values are optional. A variable without them behaves as a pure categorical attribute.
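As a rough illustration of the label/value split, here is a minimal Python sketch. The `ChoiceOption` record, its field names, and the option keys are hypothetical and chosen only for this example; the guide does not show Carrier's actual configuration schema.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class ChoiceOption:
    key: str                        # internal option key (hypothetical field name)
    label: str                      # display label shown to the participant
    value: Optional[float] = None   # optional numeric value


# The five-point self-rating scale described above.
EXPERTISE: List[ChoiceOption] = [
    ChoiceOption("very_low", "Very low", 1),
    ChoiceOption("low", "Low", 2),
    ChoiceOption("moderate", "Moderate", 3),
    ChoiceOption("high", "High", 4),
    ChoiceOption("very_high", "Very high", 5),
]


def meets_threshold(chosen_key: str, options: List[ChoiceOption], minimum: float) -> bool:
    """Compare the chosen option's numeric value, not its label, with a threshold."""
    option = next(o for o in options if o.key == chosen_key)
    if option.value is None:
        raise ValueError(f"option {chosen_key!r} is purely categorical")
    return option.value >= minimum
```

A condition such as "expertise ≥ 3" then reads `meets_threshold("moderate", EXPERTISE, 3)`, while a variable without numeric values cannot be used in a threshold at all.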
3.2.3 Aggregate variables across non-human participants
When an aggregate variable is computed across the participants of a chamber, the researcher can choose whether non-human participants (LLM chatbots, scripted chatbots, and agents) are included or excluded from the aggregate. This setting matters in two common situations:
- Confederated designs (one human, several non-human communicators) — usually the researcher wants aggregates to reflect humans only, since the non-humans are confederates.
- Group-property analyses (e.g. mean expertise of the room) — the researcher chooses whether non-human participants carry an expertise value at all, and whether it contributes to the mean.
The default is to include only humans in aggregates; the alternative is configured on each aggregate variable definition. The same humans-only / include-all toggle applies regardless of which of the three non-human kinds is present in the chamber.
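The effect of the toggle can be sketched in a few lines of Python. The `Participant` record, the `kind` labels, and the `include_non_humans` parameter name are assumptions made for illustration; only the humans-only default comes from the text above.

```python
from dataclasses import dataclass
from statistics import mean
from typing import List, Optional


@dataclass
class Participant:
    kind: str                    # "human", "llm", "scripted", or "agent" (hypothetical labels)
    expertise: Optional[float]   # None if this participant carries no expertise value


def mean_expertise(participants: List[Participant], include_non_humans: bool = False) -> Optional[float]:
    """Mean expertise over the chamber; humans only unless the toggle is switched on."""
    values = [
        p.expertise
        for p in participants
        if p.expertise is not None and (include_non_humans or p.kind == "human")
    ]
    return mean(values) if values else None


room = [Participant("human", 2), Participant("human", 4), Participant("llm", 5)]
```

With this room, the humans-only mean is 3, whereas including the LLM confederate pulls it up to about 3.67, which is exactly the kind of shift a group-composition constraint can trip over.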
3.3 Variables as matching constraints
The first place a variable does work is during matching. There are three places in the configuration where a variable can constrain who is matched into what:
3.3.1 Chamberline eligibility
A chamberline can declare a filter — a condition expressed against variables that a participant must satisfy in order to be eligible for that chamberline. When the experiment's chamberline-assignment method is survey-based (see §1.2.2), the filter is the mechanism by which the assignment is made:
Chamberline "Pro-disclosure" filter: var.party_id == "Democrat"
Chamberline "Anti-disclosure" filter: var.party_id == "Republican"
A participant whose var.party_id is Democrat will be eligible
for the first chamberline and not the second; the inverse holds for
Republican participants. Participants who do not satisfy any chamberline
filter are routed to a default chamberline if one exists, or terminated gracefully
otherwise.
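The routing just described can be sketched as follows. This is a simplified Python illustration, not Carrier's implementation: filters are modelled as plain functions, and picking the first chamberline whose filter passes is an assumption, since the text does not say how overlapping filters are resolved.

```python
from typing import Callable, Dict, List, Optional, Tuple

Variables = Dict[str, object]
Filter = Callable[[Variables], bool]


def assign_chamberline(
    variables: Variables,
    chamberlines: List[Tuple[str, Filter]],
    default: Optional[str] = None,
) -> Optional[str]:
    """Return the first eligible chamberline, else the default (None means terminate)."""
    for name, passes in chamberlines:
        if passes(variables):
            return name
    return default


CHAMBERLINES = [
    ("Pro-disclosure", lambda v: v.get("party_id") == "Democrat"),
    ("Anti-disclosure", lambda v: v.get("party_id") == "Republican"),
]
```

A participant with `party_id` set to "Independent" matches neither filter, so the function returns the default chamberline if one was supplied and `None` (graceful termination) otherwise.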
3.3.2 Slot requirements
Inside a chamber, each slot can declare a list of required properties — conditions that the participant filling the slot must satisfy. A two-slot chamber with the slot constraints
Slot 1 human communicator requires: var.expertise <= 2
Slot 2 human communicator requires: var.expertise >= 4
will pair each session into one novice and one expert. Matching does not begin for this chamber until two participants are waiting who jointly satisfy the slot constraints — a queued participant whose expertise is 2 can be matched into Slot 1, and a queued participant whose expertise is 5 can be matched into Slot 2, but two novices cannot fill the chamber on their own.
Slot requirements can reference any variable that has been resolved by the time the chamber begins. In practice this means anything captured in the global pre-survey, anything assigned by the system, and anything from earlier chambers in the participant's run.
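To make the "jointly satisfy" idea concrete, here is a brute-force Python sketch that searches the waiting queue for an assignment of participants to slots. It is only an illustration under simple assumptions (participants as dictionaries, requirements as functions); the platform's real matching algorithm is not described in this guide.

```python
from itertools import permutations
from typing import Callable, Dict, List, Optional, Tuple

Participant = Dict[str, float]
Requirement = Callable[[Participant], bool]


def find_match(
    queue: List[Participant], slot_requirements: List[Requirement]
) -> Optional[Tuple[Participant, ...]]:
    """Return one participant per slot (in slot order) or None if no assignment fits."""
    for candidate in permutations(queue, len(slot_requirements)):
        if all(req(p) for req, p in zip(slot_requirements, candidate)):
            return candidate
    return None


# Slot 1 requires a novice, Slot 2 requires an expert.
SLOTS = [lambda p: p["expertise"] <= 2, lambda p: p["expertise"] >= 4]
```

Two queued novices yield `None`, so the chamber keeps waiting; a novice and an expert are seated in Slot 1 and Slot 2 respectively, whatever order they arrived in. Trying every permutation is fine for small chambers but grows quickly with queue length.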
3.3.3 Group-composition constraints
Slot-by-slot constraints are sometimes too weak to express a desired group property. Where the slot constraints care only about who fills each seat, a group-composition constraint cares about a property of the matched set as a whole. Typical examples:
- The chamber requires a gender-balanced trio — at least one male, at least one female, no constraint on the third seat.
- The chamber requires that the modal political identification of the trio is Democrat.
- The chamber requires that no two members share the same self-reported expertise level.
Group-composition constraints are expressed against aggregate variables (3.2.3) and are evaluated against the candidate set during matching. A candidate matching is admitted only if the corresponding aggregate, computed over the candidates, satisfies the constraint.
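A candidate-set check of this kind might look like the following Python sketch. The constraint functions and the participant dictionaries are hypothetical; they mirror two of the examples above (modal political identification and distinct expertise levels).

```python
from collections import Counter
from typing import Callable, Dict, List

Participant = Dict[str, object]
GroupConstraint = Callable[[List[Participant]], bool]


def modal(values: List[object]) -> object:
    """Most frequent value in the list (ties resolved by first occurrence)."""
    return Counter(values).most_common(1)[0][0]


def modal_party_is_democrat(group: List[Participant]) -> bool:
    return modal([p["party_id"] for p in group]) == "Democrat"


def distinct_expertise(group: List[Participant]) -> bool:
    levels = [p["expertise"] for p in group]
    return len(set(levels)) == len(levels)


def admit(candidates: List[Participant], constraints: List[GroupConstraint]) -> bool:
    """A candidate set is admitted only if every group-level constraint holds."""
    return all(constraint(candidates) for constraint in constraints)


trio = [
    {"party_id": "Democrat", "expertise": 1},
    {"party_id": "Democrat", "expertise": 3},
    {"party_id": "Republican", "expertise": 5},
]
```

The point of the sketch is that each constraint receives the whole candidate set, not one seat at a time, which is what distinguishes it from a slot requirement.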
3.4 Variables as visibility gates
A second use of variables is to gate what a participant sees, not just who they see it with. A chamber can declare a visibility condition against one or more variables. The chamber is included in the participant's run only if the condition is true at the moment the chamberline would otherwise enter that chamber.
This is the canonical mechanism for branching designs:
- A practice chamber that is shown only to participants whose pre-test score was below threshold.
- A debriefing chamber for the high-anonymity condition that does not exist in the low-anonymity condition (an alternative to two parallel chamberlines, useful when most of the experiment is identical between conditions).
- A post-task survey that is shown only to participants who actually completed the preceding chat (skipped for those whose chamber was terminated early).
Visibility conditions take the same form as slot constraints: a boolean expression
against variable values, with the standard comparison operators and
any / all connectors. Crucially, visibility is evaluated
just before the chamber would begin, not once at the start of the run.
This means that variables produced during the run — by an earlier
chamber's segment submission, by a chamber post-survey — are usable as
gating conditions for later chambers.
When a chamber is gated out for a participant, the participant skips directly to the next chamber in their chamberline. The chamberline itself does not change.
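The timing rule (visibility is checked just before each chamber, against the variables known at that moment) can be sketched as a simple loop. In this Python illustration the `visible_if` and `produces` keys are hypothetical stand-ins for a visibility condition and for the variables a chamber's segments write; they are not Carrier configuration fields.

```python
from typing import Callable, Dict, List


def run_chamberline(chambers: List[dict], variables: Dict[str, object]) -> List[str]:
    """Walk the chamberline, evaluating each gate only when its chamber is reached."""
    variables = dict(variables)  # work on a copy of the participant's variables
    visited = []
    for chamber in chambers:
        gate: Callable = chamber.get("visible_if")
        if gate is not None and not gate(variables):
            continue  # gated out: skip straight to the next chamber
        visited.append(chamber["name"])
        variables.update(chamber.get("produces", {}))  # values produced during this chamber
    return visited


CHAMBERS = [
    {"name": "A"},
    {"name": "B", "produces": {"chose_collaborative": True}},
    {"name": "C1", "visible_if": lambda v: v.get("chose_collaborative") is True},
    {"name": "C2", "visible_if": lambda v: v.get("chose_collaborative") is False},
]
```

Running this with an empty variable set visits A, B, and C1. Moving C1 ahead of B makes its gate fail, because the variable it depends on has not been produced yet.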
3.4.1 Slot requirements vs visibility conditions
A natural question at this point: a chamber whose every slot has a
requires constraint already excludes participants who cannot fill any
seat — they will never be matched. Why, then, does Carrier also offer a
separate visibility condition at the chamber level? Don't they do
the same job?
They overlap in one specific case and diverge in the others. The rule that governs the overlap is:
Implicit skip rule. If every human slot in a chamber has a non-empty
requires constraint, and the participant satisfies no slot's constraint, then the chamber is silently skipped for that participant — exactly as if it had a failing visibility condition.
The skip is implicit: there is no separate visibility expression to write. The chamber is treated as not part of the participant's run, and they advance to the next chamber.
Three scenarios make the distinction concrete.
Scenario A — Single fully-constrained slot. A
solo-participant chamber whose one human slot requires
var.condition == "treatment". A control-condition participant cannot
fill it; the implicit skip rule fires; the participant moves to the next chamber.
Here, slot requirements alone are sufficient — adding a chamber-level
visibility condition would be redundant.
Scenario B — Mixed open and constrained slots. A two-slot
chamber where one slot requires var.role == "expert" and the other
slot has no constraint:
Slot 1 human communicator requires: var.role == "expert"
Slot 2 human communicator requires: (none)
A novice participant can fill Slot 2, so the implicit skip rule does
not fire — the chamber is visible to them, and they will
join it paired with an expert. If you intended this — “novices and
experts meet here, with the expert always seated in Slot 1” — that
is the correct behaviour and you should add nothing further. But if you intended
the chamber to exist only for experts (with Slot 2 reserved for another expert
who happens to be unconstrained because of how you defined the slot), the implicit
skip rule will not save you. A novice will silently enter. To get a true gate here,
add a chamber-level visibility condition var.role == "expert". Slot
requirements cannot express this on their own.
Scenario C — Chamber-level gate plus slot-level role assignment. A debriefing chamber that should exist only for participants who completed a treatment chamber upstream, and within that chamber should pair a “writer” participant with a “reviewer” participant:
Chamber "Debrief"
visible if: var.completed_treatment == true
Slot 1 human communicator requires: var.debrief_role == "writer"
Slot 2 human communicator requires: var.debrief_role == "reviewer"
Both layers are necessary. The chamber-level visibility expresses who belongs
in this chamber at all; the slot requirements express which seat each
participant takes once they are here. Trying to collapse the gate into a slot
constraint (for example, by adding var.completed_treatment == true to
both slots' requires) works only by coincidence — the implicit
skip rule fires because every slot is constrained — and it conflates
two distinct intents. If a future edit relaxes Slot 2's requires
to allow walk-ins, the chamber suddenly becomes visible to participants who never
completed the treatment, and the gate is silently gone.
This last point is the deeper reason the two systems are kept separate. Visibility conditions express intent explicitly; slot requirements express it emergently. When the only way a chamber is skipped is “all slots happen to be constrained and the participant happens to satisfy none,” that skip is a side effect of a configuration that was written for a different reason. Edits to the slots — adding a slot, opening one up, retitling roles — can quietly remove the gate. An explicit visibility condition survives those edits and is auditable as a routing rule on its own.
A compact way to choose between them:
| You want… | Reach for |
|---|---|
| A chamber that exists only when seat-level requirements alone suffice to exclude the wrong participants, and you do not anticipate slot edits relaxing this. | Slot requirements alone. |
| A chamber that should be hidden for some participants even though other participants would still find an open seat in it. | A chamber visibility condition. |
| A chamber that gates a population and role-assigns within that population. | Both: visibility condition for the gate, slot requirements for the seating. |
| A routing rule that should be self-documenting and resilient to slot edits. | A chamber visibility condition (in addition to whatever slot requirements you also want). |
In short: when slot requirements happen to act as a gate, treat that as a convenient side effect, not as the gate itself. If the chamber is meant to be conditional on a participant property, say so with a visibility condition.
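The interaction between the explicit gate and the implicit skip rule can be summarised in a short Python sketch. It assumes, for illustration only, that each human slot is represented either by a requirement function or by `None` for an unconstrained slot.

```python
from typing import Callable, Dict, List, Optional

Participant = Dict[str, object]
Requirement = Optional[Callable[[Participant], bool]]


def is_skipped(
    participant: Participant,
    human_slots: List[Requirement],
    visible_if: Optional[Callable[[Participant], bool]] = None,
) -> bool:
    """True if the chamber is left out of this participant's run."""
    # Explicit gate: a failing visibility condition always hides the chamber.
    if visible_if is not None and not visible_if(participant):
        return True
    # Implicit skip rule: applies only when every human slot is constrained.
    if not human_slots or any(req is None for req in human_slots):
        return False
    return not any(req(participant) for req in human_slots)


is_expert = lambda p: p["role"] == "expert"
scenario_a = [lambda p: p["condition"] == "treatment"]   # single constrained slot
scenario_b = [is_expert, None]                            # one open slot
```

In Scenario A a control participant is skipped by the slots alone. In Scenario B a novice is not skipped unless `visible_if=is_expert` is supplied, which is the case the explicit visibility condition exists for.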
3.5 Variables in instructions and prompts
The third use of variables is the most pervasive: interpolation into the natural-language content shown to or used by participants. Any place in the experiment where text is shown to a participant or sent to an AI is also a place where variables can be injected.
The interpolation syntax is {{var.<key>}} for a variable read
from the current participant, and {{slot:<n>:var.<key>}} or
{{slot:<role>:var.<key>}} for a variable read from a fellow
slot. A few worked examples:
Instruction segment text
Welcome, {{var.first_name}}. In the next ten minutes you will discuss
climate policy with two other participants. The participant on your
left has expertise level "{{slot:1:var.expertise_label}}"; the
participant on your right has expertise level
"{{slot:2:var.expertise_label}}".
System prompt for an LLM mediator
You are facilitating a discussion among three participants. Their
self-identified political positions are: {{slot:1:var.party_label}},
{{slot:2:var.party_label}}, and {{slot:3:var.party_label}}. Adjust
your tone to be welcoming to all three positions; do not take sides.
Survey question stem
Earlier you said your most important consideration was
"{{var.top_value}}". On the slider below, indicate how strongly you
still feel that this is your most important consideration.
Two things to note. First, interpolation happens at the moment the text is
needed — the same instruction segment used in a chamber that appears
twice will be re-interpolated each time, with the latest variable values. Second,
an undefined variable interpolates to the empty string by default, but the
variable definition's default field can be used to specify a fallback.
Where it matters, configure a default.
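For readers who find it easier to reason from code, here is a minimal Python sketch of placeholder substitution with the empty-string fallback and per-variable defaults. The regular expression and the function signature are assumptions made for this example; Carrier's own interpolation engine may differ in details such as whitespace handling.

```python
import re
from typing import Dict, Optional

# Matches {{var.key}} and {{slot:<n or role>:var.key}}.
PLACEHOLDER = re.compile(r"\{\{\s*(?:slot:(\w+):)?var\.(\w+)\s*\}\}")


def interpolate(
    text: str,
    own: Dict[str, object],
    slots: Optional[Dict[str, Dict[str, object]]] = None,
    defaults: Optional[Dict[str, object]] = None,
) -> str:
    """Replace placeholders using the current participant's or a fellow slot's variables."""
    slots = slots or {}
    defaults = defaults or {}

    def resolve(match):
        slot, key = match.group(1), match.group(2)
        source = own if slot is None else slots.get(slot, {})
        # Undefined variables fall back to the configured default, else the empty string.
        return str(source.get(key, defaults.get(key, "")))

    return PLACEHOLDER.sub(resolve, text)
```

Calling `interpolate` again later with updated dictionaries yields updated text, which mirrors the re-interpolation behaviour described above.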
3.5.1 Per-option text vs per-option value in interpolation
Recall that a choice-type variable can carry both a label and a numeric value for each option (3.2.2). When interpolating into text, three forms are available:
- {{var.expertise}} interpolates the key of the chosen option (“low”, “moderate”, “high”).
- {{var.expertise_label}} interpolates the display label of the chosen option (“Low”, “Moderate”, “High”).
- {{var.expertise_value}} interpolates the numeric value of the chosen option (e.g. 2, 3, 4).
For a five-point Likert variable, the three forms give you the same information at three different levels of formality. Choose by context: instructions to participants benefit from the display label; an LLM system prompt typically benefits from the numeric value or the key.
3.6 Variables in trigger conditions
A fourth use of variables is in trigger conditions
(§4): a bot's keyword trigger, time
trigger, or message-count trigger can be qualified by a check on a variable. This
makes triggers conditional on participant attributes — a bot might only
respond to keyword “X” if slot:1:var.condition is
“treatment”, for example. The full mechanics of trigger conditions are
covered in §4.3; here it is enough to know
that the same var.<key> references work inside trigger
conditions as work elsewhere.
3.7 Builder walkthroughs
The researcher opens the Variables tab at the experiment level, clicks “Add variable”, and creates an expertise variable of type single-choice with five options. They label the options “Very low”, “Low”, “Moderate”, “High”, “Very high” and assign numeric values 1 through 5. They set the source to a question on the global pre-survey, then save. The new variable now appears in the variable list with three computed aliases — var.expertise, var.expertise_label, var.expertise_value — available throughout the experiment.
In a chamber configuration, the researcher opens Slot 1's settings, adds a required-property rule “var.expertise_value <= 2”, and saves. They do the same for Slot 2 with “var.expertise_value >= 4”. The chamber summary panel now displays a small badge — “novice + expert pair” — generated from the constraint pattern.
The researcher selects a remedial chamber in a chamberline, opens its visibility settings, and adds the rule “var.pretest_score < 0.5”. They save. The chamber is now visually marked as conditional in the chamberline outline, with a small icon indicating the visibility condition.
3.8 Worked example: matched-pair deliberation with branching debrief
A study that pairs each novice with an expert for a short discussion, with a different debriefing depending on the discussion outcome.
Variables defined at the experiment level
| Key | Type | Source |
|---|---|---|
| first_name | string | Global pre-survey: “What name would you like to be called?” |
| expertise | single-choice (5 options, numeric 1–5) | Global pre-survey: “How would you rate your own expertise on this topic?” |
| chose_collaborative | boolean | Segment submission from chamber B's selection segment |
Chamberline (only one — between-subjects manipulation is unused here)
- Chamber A — Briefing. One human slot, no constraints. Segments: instruction, pre-survey.
- Chamber B — Discussion. Two human slots. Slot 1: var.expertise <= 2 (novice). Slot 2: var.expertise >= 4 (expert). Segments: a 10-minute chat segment, then a selection segment in which both participants choose between “collaborative” and “competitive” approaches.
- Chamber C1 — Debrief: collaborative. One human slot. Visibility: var.chose_collaborative == true. Segments: a tailored debrief survey.
- Chamber C2 — Debrief: competitive. One human slot. Visibility: var.chose_collaborative == false. Segments: a different tailored debrief survey.
The instruction segment in Chamber B reads:
Welcome, {{var.first_name}}. You're about to discuss with one other
participant. They have rated their own expertise as
"{{slot:other:var.expertise_label}}".
In this design, variables are doing three different kinds of work: matching constraints (Chamber B), instruction interpolation (the briefing text), and visibility branching (Chambers C1 vs. C2). All three are expressed against the same variable system.
Note: this worked example is a fictional placeholder rather than a real study.
3.9 Common pitfalls
- Variables that haven't been collected yet. A slot constraint or visibility condition that references a variable whose source has not yet fired is unsatisfiable. Plan the timeline: a constraint on var.expertise can be evaluated after the pre-survey has been submitted, not before. If you find a chamber timing out unexpectedly, check whether the variable it depends on has been produced.
- Numeric versus label interpolation confusion. It is easy to write {{var.expertise}} when you meant {{var.expertise_label}}. The former interpolates the option key (“high”), which may not be what you want to show participants. When in doubt, prefer …_label for participant-facing text and …_value for arithmetic conditions and aggregates.
- Aggregates including bots when you wanted humans only. The default is humans-only, but it is overrideable. If a group-composition constraint behaves strangely in a chamber that contains non-human confederates, check whether the aggregate is computing across all participants or only over humans.
- Visibility conditions evaluated against stale variables. A variable produced in a chamber is available after that chamber completes. A visibility condition on a chamber B that depends on a variable produced in chamber A only works because A is earlier than B in the chamberline. Reordering the chambers — moving A after B — silently breaks the gate.
- Default values that hide problems. Setting a default on every variable will keep your experiment from terminating on a missing value, but it can also mask a genuine bug (a survey question whose response was never recorded). For variables that drive matching or visibility, prefer to not set a default, and let the run terminate noisily, until you are confident the source is reliable.
- Mixing identifier capitalisation. Variable keys are case-sensitive. var.partyID, var.partyId, and var.party_id are three different references. Pick one casing convention at the start of the experiment and apply it consistently.
Triggers — the rule system that drives the behaviour of every non-human participant in the experiment, the final piece of the four-system picture — are the subject of §4.
Part I · §4. Triggers
The rule system that drives every non-human participant's behaviour. A trigger combines a condition, a response, and optionally an action.
4.1 Why triggers exist
The previous three chapters answered the questions what shape is the experiment, who is in it, and what do we know about them. They left one question open: how do the non-human participants behave? A scripted chatbot, an LLM chatbot, and an agent are all just slots in a chamber until something tells them when to speak and what to say.
That “something” is the trigger system. A trigger is a rule that combines:
- A condition — what has to be true before the trigger fires.
- A response — what the bot says if the trigger fires.
- Optionally, an action — what else the bot does if the trigger fires (disable a participant's input, prompt a specific person, transition the chamber, and so on).
You can think of a non-human participant's configuration as a book of rules: an ordered list of triggers, each with a condition–response–action triple. When something happens in the chamber, Carrier consults the book in order, evaluates the conditions against the current state of the chat, and fires the rules that match. This applies equally to deterministic scripted chatbots (whose responses are pre-written sentences), to LLM chatbots (whose responses are generated by a language model on the fly), and to agents (whose responses are generated by a language model that has consulted its tools first).
A useful analogy from the methods literature: a trigger is to an interactive bot what a confederate script is to a confederate participant — a list of contingent rules that say “when X happens, do Y”. Carrier's contribution is to make the list expressive enough to capture the contingencies real conversations contain, and machine-readable enough to be replayed identically across sessions.
4.2 The trigger model: condition → response → action
A single trigger has the structure:
The next three sections expand each of these three parts. The modifiers are covered together in §4.6.
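As a rough sketch, the three parts plus their modifiers could be modelled like this in Python. The class and field names are hypothetical and are not Carrier's configuration schema; the sketch only records that a condition is always present, while the response and the action may each be absent.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional


@dataclass
class Trigger:
    condition: Dict[str, Any]                  # what has to be true before the trigger fires
    response: Optional[Dict[str, Any]] = None  # what the bot says, if anything
    action: Optional[Dict[str, Any]] = None    # optional side effect on the chamber
    modifiers: Dict[str, Any] = field(default_factory=dict)


# A greeting rule: keyword condition, scripted response, no action.
greet = Trigger(
    condition={"type": "keyword", "keywords": ["hello"]},
    response={"kind": "scripted", "message": "Hi there!"},
)

# A bot's "book of rules" is then simply an ordered list of triggers.
rule_book = [greet]
```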
4.3 Conditions: when a trigger fires
The condition of a trigger is the part most worth understanding well, because it determines whether the rule ever runs. Carrier supports a catalogue of condition types organised by what they listen for: message content, time, message counts, sequences, participant events, and aggregate states.
4.3.1 The catalogue of condition types
| Type | Listens for | Typical use |
|---|---|---|
| keyword | A configurable word or phrase in a chat message | Respond when “climate” is mentioned; greet on “hello”. |
| regex | A regular expression match in a chat message | Recognise URLs, profanity, structured statements like “I disagree with X”. |
| time | A delay (ms) from chamber or segment start | Send an opening prompt 5 s in; remind participants of the time after 8 min. |
| message-count | A total count of messages in the chatroom | Intervene every N messages; introduce a summarisation at message 20. |
| participant-message-count | A count of messages from a specific participant | Detect that one person has dominated; reward an under-contributing participant. |
| sequence | An ordered series of keyword matches | “First X is said, then Y” — useful for staged conversation steering. |
| participant-action | A participant event such as join, leave, idle | Greet on join; flag a drop-out; chain into a backup-bot prompt. |
| after-bot-message | A specified bot has just sent a message | Cross-bot chaining; staged multi-bot interactions. |
| event-monitor | An arbitrary chatroom event | Catch dashboard interventions, segment transitions, embedded-child completions. |
| chain-only | (Passive) Only fires when another trigger chains to it | Building multi-step responses. |
| llm-driven | The trigger asks an LLM to decide whether it should fire and what to say | Open-ended judgement triggers — see §4.4.2. |
| periodic | (Mediator only) Fires at a fixed interval | Regular check-ins, repeated summaries. |
| aggregate | (Mediator only) Fires after N messages have accumulated within a time window | Batched synthesis. |
| topic-detected | (Mediator only) Fires when a topic pattern is detected | Topic steering. |
| activity-timeout | (Mediator only) Fires after a period of inactivity | Idle prompting. |
| participant-count | (Mediator only) Fires when the active participant count crosses a threshold | Reacting to departures or arrivals. |
| discussion-phase | (Mediator only) Fires at the start, middle, or end of a chamber | Phase-appropriate facilitation. |
Two practical notes:
- Most of these conditions presuppose a chat segment. Keyword, regex, message-count, sequence, and the mediator-only message conditions all listen to the conversation; if the chamber is currently in a slide or a survey segment, they are silent. Time and participant-action conditions, by contrast, can fire from any segment.
- The mediator-only types (periodic, aggregate, topic-detected, activity-timeout, participant-count, discussion-phase) are enforced as mediator-only by both the builder and the runtime. The builder hides them from non-mediator agents' trigger pickers, and the runtime only initialises and evaluates them for bots with role === 'mediator'. A trigger of one of these types attached to a communicator or processor (e.g., by hand-editing the JSON) will silently never fire.
4.3.2 Modifiers shared by all condition types
Every trigger, whatever its condition type, can be qualified by a small set of modifiers:
| Modifier | Meaning |
|---|---|
| Case sensitivity | For keyword and regex conditions, whether matching is case-sensitive (default: insensitive). |
| Match mode | For multi-value conditions, whether any of the values is sufficient (the default) or all are required. |
| Sender filter | Restrict matching to messages sent by humans, by a specific participant, or by a participant of a specific role. The default is to consider every sender. |
| Variable filter | A boolean expression over var.* references; the trigger does not fire unless the expression is satisfied. This is how a trigger becomes condition-dependent — for example, “only fire this prompt if the current speaker's var.condition == 'treatment'”. |
The variable filter is especially powerful in combination with §3: a single set of triggers, attached to a single bot template, can produce qualitatively different behaviours in different chamberlines purely on the basis of the variables that are true of the participants in front of it.
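To show how the modifiers combine on a keyword condition, here is a small Python sketch. Everything in it is an assumption for illustration: the function name, the parameter names, and the choice of simple substring matching are not taken from Carrier.

```python
from typing import Callable, Dict, List, Optional


def keyword_trigger_fires(
    text: str,
    sender: Dict[str, str],
    keywords: List[str],
    case_sensitive: bool = False,       # default: insensitive
    match_mode: str = "any",            # "any" (default) or "all"
    sender_filter: Optional[Callable[[Dict[str, str]], bool]] = None,
    variable_filter: Optional[Callable[[Dict[str, object]], bool]] = None,
    variables: Optional[Dict[str, object]] = None,
) -> bool:
    """Apply sender and variable filters first, then match the keywords."""
    if sender_filter is not None and not sender_filter(sender):
        return False
    if variable_filter is not None and not variable_filter(variables or {}):
        return False
    haystack = text if case_sensitive else text.lower()
    needles = keywords if case_sensitive else [k.lower() for k in keywords]
    hits = [needle in haystack for needle in needles]
    return all(hits) if match_mode == "all" else any(hits)
```

The same keyword list thus behaves differently across chamberlines simply because the variable filter sees different participant variables.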
4.3.3 An aside: triggers, segments, and the “active segment” filter
Most trigger configurations include an active-segments filter — a list of segment IDs (within the chamber) during which the trigger is eligible to fire. Leaving the list empty means fire in any segment. Restricting it to a specific segment — typically a chat segment — is the canonical way of writing “this rule only applies during the deliberation, not during the survey at the end”. The filter also accepts embedded child segments, which is how a trigger can be made to fire only while a particular embedded vote overlay is open.
This is a small detail, but it is the source of a very common pitfall: a trigger that “doesn't seem to fire” is often a trigger whose active-segment filter excludes the segment the chamber is actually in.
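The eligibility rule is simple enough to state in one line of Python, which can be handy as a mental check when a trigger appears not to fire. The function name and the segment IDs below are hypothetical.

```python
from typing import List


def eligible_in_segment(active_segments: List[str], current_segment_id: str) -> bool:
    """An empty filter means 'any segment'; otherwise the current segment must be listed."""
    return not active_segments or current_segment_id in active_segments


# Hypothetical segment IDs for a chamber with a chat followed by a survey.
deliberation_only = ["chat-1"]
```

With this filter, a trigger is eligible while the chamber is in `chat-1` and ineligible once it has moved on to the closing survey.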
4.4 Responses: what the bot says
When a trigger's condition fires, the response tells Carrier what message to send. There are two fundamentally different kinds of response.
4.4.1 Scripted responses
A scripted response is a pre-written sentence (or a list of sentences from which Carrier picks one at random). It is the only kind of response a scripted chatbot can produce, and it is also available to LLM chatbots and agents for cases in which the researcher wants exact-text control.
Configuration is minimal:
- Message — a single fixed string, or a list from which the platform draws (uniformly, by default).
- Delay — an interval before the message is actually sent, simulating typing or thought.
- Probability — the chance that the trigger fires at all when its condition is satisfied (default 1.0). Used when the researcher wants stochastic intervention.
With a single fixed message and the default probability of 1.0, scripted responses are deterministic: identical inputs produce identical outputs. Two participants who encounter the same conversation in the same condition will receive the same scripted response, with the same delay, in the same order. This makes them the right choice for studies in which experimental control matters more than naturalism — confederated-AI conformity studies, attention-check probes, scripted “noise” injections, and so on.
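A minimal Python sketch of how a message list, a delay, and a firing probability might combine is given below. The function and its return shape are hypothetical; the injected random generator is there only so that the example is reproducible.

```python
import random
from typing import Dict, List, Optional, Union


def scripted_response(
    messages: List[str],
    probability: float = 1.0,
    delay_ms: int = 0,
    rng: Optional[random.Random] = None,
) -> Optional[Dict[str, Union[str, int]]]:
    """Return the message to send and its delay, or None if the trigger does not fire."""
    rng = rng or random.Random()
    # rng.random() lies in [0, 1), so probability 1.0 always fires and 0.0 never does.
    if rng.random() >= probability:
        return None
    return {"message": rng.choice(messages), "delay_ms": delay_ms}
```

With a one-element list and the default probability, the output is fully determined by the configuration. A longer list or a probability below 1.0 introduces randomness that is repeatable only if the random source is seeded.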
4.4.2 LLM-generated responses
The alternative is to let an LLM generate the response on the fly. A language-model response is produced by sending a request to a language-model provider (OpenAI, Anthropic, Google, or a compatible service) with a prompt assembled from:
- The system prompt of the bot (its personality, role, instructions).
- A configurable amount of chat history as context (none, the last N messages, or the full history — see §2.5.5 for the analogous setting on processors).
- Variable interpolations (§3.5) — the participant's name, condition assignment, current state.
- Optional chain steps — a sequence of LLM calls in which each step's output becomes part of the next step's input. Chains are how a single trigger can implement plan, then critique, then rewrite behaviour.
The response of an LLM-driven trigger is therefore open-ended: the model can produce any text that satisfies its prompt, modified each time by the current state of the conversation. Where scripted responses trade variation for control, LLM responses trade control for naturalism. They are the right choice for studies in which the realism of the bot's behaviour is itself part of what is being tested.
A note on the response format: Carrier's LLM responses follow a structured JSON
shape with three fields — content (the message text, or
null to remain silent), rationale (a brief justification,
logged but not shown), and an optional actions list (the topic of
§4.5). The structure exists so that the AI
can decide “remain silent” without producing a blank message, and so
that researchers can audit why the AI chose to speak after the fact.
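Reading such a response could look like the following Python sketch. Only the three field names (content, rationale, actions) come from the description above; the parsing function and the shape of its return value are assumptions for illustration.

```python
import json
from typing import Any, Dict


def parse_llm_response(raw: str) -> Dict[str, Any]:
    """Parse the structured response; a null content means the bot stays silent."""
    data = json.loads(raw)
    content = data.get("content")
    return {
        "speak": content is not None,           # False when the model chose silence
        "content": content,
        "rationale": data.get("rationale", ""),  # logged, never shown to participants
        "actions": data.get("actions") or [],    # optional list; empty if absent
    }


silent = parse_llm_response('{"content": null, "rationale": "Nothing to add."}')
```

Here `silent["speak"]` is `False`, yet the rationale is still available for the audit log.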
4.4.3 Choosing between scripted and LLM responses
A short heuristic that holds in most studies:
| If you want… | Prefer… | Because… |
|---|---|---|
| Exact replicability across sessions | Scripted | Same input → same output, always. |
| Naturalistic, varied bot behaviour | LLM | The model adapts to what the participant actually said. |
| A confederate that does not break character | Scripted | LLMs can drift; scripted text cannot. |
| A facilitator that responds to topics in real conversation | LLM | The researcher cannot pre-write every possible response. |
| Strict auditing of bot speech | Scripted | The set of possible utterances is finite and visible. |
The two are not mutually exclusive on a single bot. A bot template can mix scripted triggers (for greetings, attention checks, and exit messages) with LLM triggers (for the bulk of the discussion), routing each contingency through whichever response type fits best.
4.4.4 Response logic and silence
For LLM chatbots and agents, five additional configuration parameters govern when not to speak. They sit alongside the trigger list and are sometimes more important than it:
- triggerOnFirstMessage — whether the bot is allowed to make the first move (greet, open the discussion) before any human has spoken.
- respondToEveryMessage — whether the bot should attempt to respond after every incoming message, or only when one of its triggers explicitly says so.
- respondOnMention / mentionKeywords — whether the bot only responds when its name (or a configured list of keywords) appears in the chat.
- initialSalute — a configured opening message sent on chamber start, regardless of triggers.
- timeoutTrigger — a “fail-safe” prompt the bot sends after a configured silence interval if no other trigger has fired.
These five toggles, together with the trigger list, are how a researcher tells an LLM chatbot or agent the difference between a talkative configuration (“respond whenever spoken to, and also if no one has spoken for a minute”) and a reserved one (“only respond when explicitly addressed by name”).
4.5 Actions: what else the bot does
A trigger's action field — present on every trigger, optional in
most cases, central for mediators — is the side effect that fires alongside
(or instead of) a response message. Actions are what allow a bot to do something
to the chat rather than just in the chat.
Carrier exposes actions through four distinct mechanisms, which are easy to confuse but do different things. Read this table once before the subsections below:
| Mechanism | Available to | What it does | Decided when? |
|---|---|---|---|
| Scripted actions | Scripted chatbots, and any LLM bot whose trigger has a baked-in action | Apply a Carrier intervention (disable chat, prompt, highlight, …) with parameters fixed in the trigger configuration | At configuration time |
| LLM-chosen Carrier actions | LLM chatbots and agents acting in a mediator role | The model emits a structured response whose actions field selects which interventions to fire and with what parameters | At each turn, by the model |
| Segment-submission actions | All non-human participants (scripted or LLM) configured as communicators in an interactive segment | Submit a vote, a ranking, or a free-text answer alongside humans — optionally counting toward the segment's completion | At configuration time, with the submitted value optionally resolved per session |
| Agent built-in tools | Agents only (Claude Agent API) | The Claude Agent autonomously reads files, runs commands, or browses the web to gather information for itself before producing its message | Across multiple internal steps within a single turn, by the agent |
The first two are about the bot acting on the chamber — they affect what participants see and what they are allowed to do. The third is about the bot acting as a participant within an interactive segment — its submission joins the humans' in the segment's results. The fourth is about the agent informing itself — it affects what the agent knows when it speaks, but its only externally visible output is still the eventual chat message. Sections 4.5.1 to 4.5.4 cover each in turn; 4.5.5 helps choose among them.
4.5.1 Scripted actions
A scripted action is an unambiguous instruction baked into the trigger configuration: “when this rule fires, disable participant X's input until either 60 s have passed or all other participants have responded, whichever comes first.” The action is part of the rule. The bot's job is to fire the trigger; the action's effect on the chat is deterministic, predictable, and visible in the configuration before the experiment runs.
Carrier's primary catalogue of scripted-action types — drawn from §2.4.3 — is:
- disable_chat: temporarily prevent a participant from sending messages, with a composite release condition.
- enable_chat: explicitly re-enable input.
- prompt_participant: send a private prompt visible only to a specific participant.
- highlight_message: visually highlight a past message for a configurable duration.
- request_attention: trigger a visual or auditory cue at a specific participant.
There are also a small number of chamber-level scripted actions used less frequently:
- advance_segment: force the chamber to move to its next segment immediately.
- terminate_chamber: end the chamber early.
- set_variable: set a variable on a participant or on the chamber, useful for cascading state into later chambers' visibility conditions (§3.4).
Scripted actions are how a confederated-chatbot conformity study can be made literally identical across participants: the rules say “disable the human's input for 30 s after the first confederate's message” or “highlight the confederate's response in green for 5 s”, and the chat then unfolds with the same scaffolding in every session.
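The composite release condition quoted in 4.5.1 (“until either 60 s have passed or all other participants have responded, whichever comes first”) can be sketched as below. The `ReleaseCondition` class, its field names, and the shape of `RULE` are hypothetical; only the action type and the `slot:N` target notation appear elsewhere in this guide.

```python
from dataclasses import dataclass


@dataclass
class ReleaseCondition:
    # Hypothetical field names for a composite release condition.
    max_seconds: float
    all_others_responded: bool = True


def is_released(cond, elapsed_seconds, others_responded, others_total):
    """Return True once either half of the composite condition holds."""
    timed_out = elapsed_seconds >= cond.max_seconds
    everyone_spoke = cond.all_others_responded and others_responded >= others_total
    return timed_out or everyone_spoke


# Disable input until 60 s have passed or all other participants have
# responded, whichever comes first. The dictionary layout is an assumption.
RULE = {"type": "disable_chat", "target": "slot:1", "release": ReleaseCondition(60.0)}
```

Because every parameter is fixed in the rule, the same inputs always give the same release decision, which is the property that makes scripted actions replicable across sessions.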
4.5.2 LLM-chosen Carrier actions
The alternative — and the more powerful in open-ended designs — is to
let the language model itself decide which Carrier interventions to fire. This is
the mechanism by which an LLM-driven mediator can act contextually: if the
model judges that one participant has been quiet for too long, it can choose to
issue a prompt_participant; if it judges that the conversation has
drifted off-topic, it can choose to send a styled broadcast.
The rule does not pre-specify the action's parameters. Instead, the trigger fires an LLM call (with the bot's system prompt and chat context), and the model returns a structured response of the form
{
  "content": "<broadcast text, or null>",
  "rationale": "<one-sentence justification, logged>",
  "actions": [
    { "type": "prompt_participant",
      "target": "slot:2",
      "message": "What do you think?" },
    { "type": "disable_chat",
      "target": "slot:1",
      "release_conditions": { ... } }
  ]
}
The model's selection of actions is constrained by the bot's configured action
vocabulary (researchers can choose to expose only a subset of action types to the
model) and by the bot's role: a communicator's action vocabulary is small (no
disable_chat); a mediator's is large. This mechanism is available to
any LLM-driven mediator — both LLM chatbots and agents acting in a mediator
role.
This is the most expressive — and also the least replicable — corner of Carrier intervention. The trade-off is real: an LLM-chosen action set gives you a facilitator that adapts, but the cost is that two sessions of the same condition may diverge in their facilitation. The right choice depends on what the experiment is testing.
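To make the idea of a constrained action vocabulary concrete, the sketch below parses a structured response of the shape shown above and separates the actions the bot is allowed to fire from those it is not. How Carrier itself enforces the vocabulary is not documented here; `filter_actions` is a hypothetical helper that a researcher might use when auditing exported responses.

```python
import json


def filter_actions(raw_response, allowed_types):
    """Parse a model response and split its actions by the allowed vocabulary."""
    parsed = json.loads(raw_response)
    kept, dropped = [], []
    for action in parsed.get("actions") or []:
        (kept if action.get("type") in allowed_types else dropped).append(action)
    return parsed.get("content"), kept, dropped


# A response in the structured form shown in this section (values invented).
raw = json.dumps({
    "content": None,
    "rationale": "Slot 2 has been quiet.",
    "actions": [
        {"type": "prompt_participant", "target": "slot:2", "message": "What do you think?"},
        {"type": "disable_chat", "target": "slot:1"},
    ],
})

# A vocabulary that exposes two action types and withholds disable_chat.
content, kept, dropped = filter_actions(raw, {"prompt_participant", "highlight_message"})
```

Here `kept` holds the `prompt_participant` action and `dropped` holds the `disable_chat` action, mirroring a bot whose vocabulary excludes input locking.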
4.5.3 Segment-submission actions
The interactive segments introduced in §1.4.2
— selection, ranking, and input —
collect an answer from every participant in the chamber. A non-human participant
configured as a communicator can be set up to submit alongside the humans,
the way a confederate in a behavioural lab study would. The submission is attached
to the bot's participant identity, surfaces in the segment's results, and appears
in the exported data with the same shape as a human's answer.
What the bot submits depends on the segment type. For a selection, it
is one or more option indices (or, in slider mode, a numeric value within the
slider range). For a ranking, it is a permutation of the item indices.
For an input, it is a string — or a number, for the numeric
input subtype. In every case the submission shape mirrors the human's, so
cross-participant aggregates and downstream variable expressions
(§3) read bot and human submissions
uniformly.
Where the value itself comes from is configured per trigger. Carrier supports four data modes:
- Static — the researcher hardcodes the value. The bot submits exactly that, every session. Useful when the value is the manipulation.
- Random: the value is drawn from a configured pool, optionally weighted. For selection and ranking the pool is the segment's own options; for input it is a researcher-provided list of candidate strings, since “random text” without an anchor is not meaningful.
- Referenced: the value is derived from what humans have already submitted in the same segment. Strategies include match the first human's answer, match the majority option, oppose the majority, and pick a different option at random. For selection and ranking all four strategies translate naturally; for input, only verbatim copy of a target human's text is well-defined.
- LLM-generated: the bot calls the language model with the segment's prompt, the chamber's chat context, and a JSON-shaped schema instruction. The model returns a structured submission (an index, a permutation, or a string). This is the most flexible mode and the one most often appropriate for input.
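The first three data modes can be sketched for a selection segment as follows. The function name, the mode and strategy strings, and the reading of “oppose the majority” as “pick at random among the non-majority options” are all assumptions for illustration; the LLM-generated mode and the fourth referenced strategy are left out because they depend on details this guide does not specify.

```python
import random
from collections import Counter


def resolve_selection(mode, n_options, *, static_value=None, human_answers=(),
                      strategy="match_majority", rng=None):
    """Return one option index for a bot's selection submission."""
    if rng is None:
        rng = random.Random()
    if mode == "static":
        return static_value  # the researcher's hardcoded value
    if mode == "random":
        return rng.randrange(n_options)  # unweighted draw from the options
    if mode == "referenced":
        if not human_answers:
            raise ValueError("referenced mode needs at least one human answer")
        if strategy == "match_first":
            return human_answers[0]
        majority = Counter(human_answers).most_common(1)[0][0]
        if strategy == "match_majority":
            return majority
        if strategy == "oppose_majority":
            # Assumption: any option other than the majority one qualifies.
            return rng.choice([i for i in range(n_options) if i != majority])
        raise ValueError(f"unknown strategy: {strategy}")
    raise ValueError(f"unsupported mode: {mode}")
```

With two options and human answers `[1, 1, 0]`, the majority strategy returns 1 and the opposing strategy returns 0. Static mode returns the configured value regardless of what humans submit.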
Three submission-metadata flags refine how the bot's answer is treated by the rest
of the chamber. countTowardTotal decides whether the bot's submission
contributes to the “everyone has answered” check that releases the
chamber forward — set to false if the bot is a passive
confederate that shouldn't gate progression. showInResults decides
whether the submission appears in any aggregated results display participants see
at the segment's end. tagAsBot decides whether the submission is
visually marked as bot-origin in the UI; the default is false, so the
bot is indistinguishable from the humans, which is usually what a confederacy
design requires.
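The effect of countTowardTotal and showInResults can be illustrated with a small sketch. The participant records (`id`, `isBot`, `submitted`) and the fallback defaults passed to `.get()` are assumptions; only the two flag names come from the text above.

```python
# Hypothetical sketch: flag names are from the text; the participant records
# and the defaults used by .get() are assumptions.

def everyone_has_answered(participants):
    """True once every participant who gates progression has submitted."""
    gating = [
        p for p in participants
        if not p["isBot"] or p.get("countTowardTotal", True)
    ]
    return all(p["submitted"] for p in gating)


def results_shown_to_participants(participants):
    """IDs whose submissions appear in the aggregated results display."""
    return [
        p["id"] for p in participants
        if p["submitted"] and (not p["isBot"] or p.get("showInResults", True))
    ]


room = [
    {"id": "h1", "isBot": False, "submitted": True},
    {"id": "h2", "isBot": False, "submitted": True},
    # A passive confederate that has not yet submitted and does not gate progress.
    {"id": "b1", "isBot": True, "submitted": False,
     "countTowardTotal": False, "showInResults": True},
]
```

In this room the chamber can move forward even though the bot has not answered. Setting the bot's countTowardTotal to true would hold the segment until it submits.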
input deserves a brief separate note. Free-text submissions raise
sharper measurement-validity questions than categorical or ordinal ones: a bot's
prose is harder to compare across sessions than a chosen option, and small
differences in wording can have outsized effects on the humans who read it. The
recommended pattern when using input with LLM-generated mode is a
tightly scoped system prompt, the raw model output logged in the export for audit,
and tagAsBot: true whenever participants will read the submission and
the design should be transparent about its bot origin.
A worked example: a confederacy study runs a selection segment
(“Which option do you find more compelling?”) followed by an
input segment (“In one sentence, why?”). Two configured
bots use static mode in the selection segment, picking option A
in every session; in the input segment they switch to LLM-generated
mode under a system prompt instructed to elaborate on option A in plain, peer-like
language. The participant sees four answers in each segment — two human, two
bot — and across sessions the design holds: the bots' selections are
reproducible to the index, and the bots' free-text answers vary in surface form
while cohering around the same content.
4.5.4 Agent built-in tools (Claude Agent API)
The fourth mechanism is internal to the agent itself, and only applies to participants of the agent type (see the note on non-human participants). A Claude Agent has access to a vocabulary of built-in tools provided by the Anthropic Agent API: tools for reading files in a configured document area, executing small commands, and browsing the web. Before producing the message it will eventually send to the chat, the agent's underlying model can autonomously decide to invoke one or more of these tools, examine the results, and iterate.
A typical pattern, from the chamber's point of view:
1. A trigger fires that asks the agent to respond.
2. The agent reads (silently) the section of the configured study brief that is relevant to the conversation so far.
3. The agent runs (silently) a short check against a dataset to confirm a number it is about to cite.
4. The agent produces a single chat message that quotes the relevant passage and reports the number.
Steps 2 and 3 are internal to the agent. Participants in the chat see only step 4 — a single grounded reply. The agent's internal trace (which tools it called, with what arguments, and what results it got back) is preserved in the exported data for the researcher to audit, but it is not shown to other participants.
The two design decisions a researcher makes for an agent are therefore:
- Tool scope. Which of Claude's built-in tools to enable, and — for the file-reading tool — what document area to expose. The narrower the scope, the more focused the agent's contributions; the wider the scope, the more open-ended.
- Step budget / latency. Agents take longer per response than LLM chatbots, because they loop. Configure a maximum step count (or wall-clock budget) so the agent does not hold up the chamber. The platform shows a “thinking” indicator while the agent is in its loop.
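The step budget can be pictured as a bounded loop. The sketch below is generic and does not call any real agent API: `step` stands in for one internal tool-use step, and the parameter names `max_steps` and `budget_s` are hypothetical. What Carrier does when the budget runs out is not specified in this guide, so the sketch simply returns no message.

```python
import time


def run_agent_turn(step, max_steps=5, budget_s=20.0, clock=time.monotonic):
    """Call step() until it yields a message or the budget is exhausted.

    Returns (message_or_None, steps_used).
    """
    started = clock()
    used = 0
    for used in range(1, max_steps + 1):
        message = step(used)
        if message is not None:
            return message, used
        if clock() - started >= budget_s:
            break  # wall-clock budget spent
    return None, used


def fake_step(n):
    # Stand-in for an agent that needs two silent tool steps before replying.
    return "Here is the figure you asked about." if n >= 3 else None
```

With the defaults, `run_agent_turn(fake_step)` returns the message after three steps; with `max_steps=2` the loop ends without a message, which is the situation a tight budget trades for lower latency.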
Two notes on relating 4.5.4 to 4.5.2:
- An agent acting as mediator can fire LLM-chosen Carrier actions (4.5.2) and use its built-in tools (4.5.4) in the same turn. The former affects the chat; the latter informs the model.
- The two channels are logged separately in the exported data. The Carrier action log records intervention actions; the agent trace records tool invocations.
4.5.5 Choosing among the four mechanisms
| If you want… | Prefer… |
|---|---|
| Identical turn-taking enforcement across sessions | Scripted actions (4.5.1) with fixed release conditions. |
| Facilitator interventions that respond to what was actually said | LLM-chosen Carrier actions (4.5.2), constrained to a small vocabulary. |
| To study the effect of a particular intervention pattern | Scripted — the intervention is the manipulation, so it must be uniform across participants. |
| To study whether an automated facilitator helps at all | LLM-chosen — the manipulation is the model's judgement, so it must vary contextually. |
| A non-human participant that votes, ranks, or writes alongside humans | Segment-submission actions (4.5.3) — pick a data mode according to how reproducible the submission needs to be. |
| A confederacy condition where the non-humans submit identical answers in every session | Segment-submission actions in static mode (4.5.3). |
| A non-human participant that cites the study material accurately | An agent with file-reading enabled over the materials (4.5.4); the agent quotes what it reads. |
| A non-human participant that fact-checks live during conversation | An agent with web-browsing enabled (4.5.4). |
| A non-human participant that runs computations over a dataset before answering | An agent with code execution enabled (4.5.4). |
4.6 Composing triggers
Real bots rarely consist of a single trigger. The composition surface lets you make the rules interact:
- Priority. Each trigger has a numeric priority. When multiple triggers' conditions are satisfied by the same event, Carrier evaluates them in descending priority order and fires the highest-priority match. Use priority to handle exceptions: an attention-check trigger with high priority can override a greeting trigger that would otherwise also fire.
- Cooldown. A trigger can specify a minimum interval that must elapse between successive firings. Used to prevent a bot from spamming when a condition stays true for a while.
- Max fires. A trigger can cap how many times it ever fires per chamber (e.g. an introduction trigger that only fires once).
- Probability. A trigger can fire with a configured probability less than 1.0 when its condition is satisfied, producing stochastic interventions.
- Chain target. A trigger can specify another trigger's ID to fire after it completes. This is how multi-step behaviours are built: trigger A says something, then chains to trigger B which fires a follow-up question after a delay, which chains to trigger C which records the response. The chained trigger's condition can be chain-only, which means it can only ever fire by being chained. This is useful for keeping cascading sequences out of the normal trigger queue.
Together, these modifiers turn a flat list of triggers into a directed graph of contingent behaviour. Most experiments need only flat lists; the modifiers are there for designs that demand them.
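The interaction of priority, cooldown, max fires, and probability can be sketched as a selection function. The `Trigger` class and its field names are hypothetical, chaining is not modelled, and one behaviour is an outright assumption: when a trigger fails its probability roll, the sketch falls through to the next-highest-priority match rather than firing nothing.

```python
import random
from dataclasses import dataclass
from typing import Optional


@dataclass
class Trigger:
    id: str
    priority: int
    cooldown_s: float = 0.0
    max_fires: Optional[int] = None
    probability: float = 1.0
    fires: int = 0
    last_fired: Optional[float] = None


def eligible(trigger, now):
    """Apply the max-fires cap and the cooldown interval."""
    if trigger.max_fires is not None and trigger.fires >= trigger.max_fires:
        return False
    if trigger.last_fired is not None and now - trigger.last_fired < trigger.cooldown_s:
        return False
    return True


def pick_trigger(matched, now, rng):
    """Fire the highest-priority eligible trigger among those that matched."""
    for trigger in sorted(matched, key=lambda t: t.priority, reverse=True):
        if not eligible(trigger, now):
            continue
        if rng.random() >= trigger.probability:
            continue  # assumption: a failed roll falls through to the next match
        trigger.fires += 1
        trigger.last_fired = now
        return trigger
    return None
```

For example, an attention-check trigger with priority 10 and a cap of one firing wins over a greeting trigger with priority 1 the first time both match; on the next matching event the cap has been reached, so the greeting fires instead.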
4.6.1 Variable conditions in triggers (again)
The variable-filter modifier from §4.3.2 deserves a second mention here because of its compositional consequences. A single trigger list with variable filters
TRIGGER 1 if var.condition == "treatment", respond with X
TRIGGER 2 if var.condition == "control", respond with Y
is functionally equivalent to two bots, one per condition, with one trigger each. Whether to write the contrast as “one bot with two filtered triggers” or “two bots, one per condition” is a design decision: the former keeps the experiment shorter and easier to read; the latter is sometimes clearer when the two conditions differ in many small ways.
4.7 Builder walkthroughs
The researcher opens a scripted-agent template, clicks “Add trigger”, picks the keyword condition type, enters “climate” as the keyword, and writes three response variants. They set the response delay to 1500 ms and the probability to 0.7, leaving the cooldown at the default of 30 s. The trigger appears in the agent's script list with a small chip marking its condition type.
The researcher creates a time trigger (“at 5 minutes, ask 'how is everyone feeling?'”) and a chain-only trigger (“respond to the first new message with 'thanks for sharing'”). They open the first trigger and set its chain target to the second trigger's ID. A small arrow now appears between the two triggers in the agent's script outline.
The researcher creates a mediator with a single LLM-driven trigger that fires every 90 s. They configure the system prompt to “facilitate a balanced discussion” and enable two actions on the bot's action vocabulary: prompt_participant and highlight_message. They leave disable_chat switched off. The mediator can now choose, on each firing, to broadcast a message, to prompt a quiet participant, or to highlight an important contribution, but not to disable anyone's input.
4.8 Worked example: a probing mediator that escalates on disagreement
A mediator designed to keep a deliberation balanced and to escalate its intervention when the discussion becomes adversarial.
Bot template — “Probing mediator”
- System prompt (used for any LLM-driven response): “You are facilitating a small-group discussion. Your goal is to keep the discussion balanced and respectful. You do not take sides on the topic.”
- Triggers:
| ID | Type | Condition | Response | Action | Priority |
|---|---|---|---|---|---|
| T1 | time | 5 s after chamber start | scripted: “Welcome. Please introduce yourselves briefly.” | none | 10 |
| T2 | periodic | every 120 s | LLM-driven: a one-sentence neutral summary | none | 5 |
| T3 | keyword | matches ["disagree", "wrong", "no, you're"] (any) | LLM-driven: an empathetic acknowledgement | LLM-decided: optionally prompt_participant for the speaker on slot 1 if they have not spoken in 60 s | 7 |
| T4 | activity-timeout | 90 s of silence | scripted: “Anyone want to add to that?” | none | 6 |
| T5 | message-count | every 30 messages | LLM-driven: longer synthesis | none | 4 |
This bot, attached to the mediator slot of a deliberation chamber, will: open with a scripted welcome (T1, deterministic); provide periodic neutral summaries (T2, LLM-generated, varied); intervene with empathetic acknowledgements when disagreement words appear (T3, with an optional prompt action); break long silences with a fixed nudge (T4); and produce longer summaries periodically (T5). The priority ordering means that if the conditions for T3 and T4 are ever satisfied by the same event, T3 (priority 7) takes precedence over T4 (priority 6).
Note: this worked example is a fictional placeholder; it will be replaced with a real study when one is available.
4.9 Common pitfalls
- Triggers that never fire because the active-segment filter is wrong. Easily the most commonly reported bug. If a trigger is failing to fire, the first place to check is whether its active-segment list includes the segment the chamber is actually in. An empty list, meaning all segments, is the safest default while developing.
- Cooldowns that swallow the trigger. A keyword trigger with a 60-second cooldown will respond only once per minute, even if the keyword is uttered repeatedly. For “single intervention per chamber” semantics, prefer maxTriggers: 1 over a long cooldown.
- Priority collisions on simultaneous events. When two triggers fire at exactly the same event with the same priority, the order is undefined. If the order matters, give one trigger a strictly higher priority. The builder shows a warning when two triggers share the same priority.
- LLM triggers fired too often. An LLM trigger fired on every message in a long deliberation can become expensive (in API tokens and in latency) and can also drown out human conversation. Use response-logic settings (respondOnMention, respondToEveryMessage: false) or message-count cooldowns to keep LLM bots from speaking on every turn.
- Mixing scripted and LLM responses on the same trigger. A trigger has one response type. If a researcher wants the bot to either say a fixed sentence or an LLM-generated reply depending on context, they should write two triggers, one scripted and one LLM-driven, and use variable filters or priority to disambiguate.
- LLM-chosen actions on attention-check chambers. Attention checks are typically the place researchers most want determinism; allowing the bot to choose its own actions during an attention check undermines the check. Scope LLM-chosen Carrier actions to discussion segments via the active-segment filter.
- Conflating an agent's built-in tools with its Carrier actions. A Claude Agent's file/code/web tools are internal to the agent: they affect what it knows, not what participants see. The Carrier intervention actions (disable_chat, prompt_participant, …) are external: they affect the chat. When an agent acts as mediator, both channels are available; researchers occasionally configure one expecting the effect of the other. The action log and the agent trace are logged separately in the export for exactly this reason.
- Forgetting that responses can be null. An LLM-driven response that returns content: null is a deliberate silence: the bot has been asked and has chosen not to speak. This is not a bug; it is one of the things LLM-driven bots are best at. Inspect the rationale field in exported data to understand why the bot stayed silent.
This concludes Part I. The four systems — chamberlines/chambers/segments, roles, variables, and triggers — together define everything a Carrier experiment can express. Part II turns to running an experiment built with them.
Part II · §5. Running and monitoring experiments
Part II is short by design. Most of what makes Carrier worth using is in Part I; what follows is the day-to-day mechanics of running a study built with the four systems above.
§5 · Running
5.1 The experiment lifecycle
Every experiment moves through a small lifecycle. Its status field — visible at the top of the builder and on the dashboard — takes one of five values:
| Status | Meaning |
|---|---|
| Draft | The experiment is being edited. Participants cannot enter it. |
| Active | The experiment is open. New participants who visit the URL begin a run. |
| Paused | New participants are blocked, but existing runs continue. Use during pilots when you want to freeze enrolment without disrupting in-progress sessions. |
| Completed | The experiment is closed. No new runs; existing data remains exportable. |
| Archived | The experiment is hidden from the main dashboard listing. Data remains exportable. |
The transition from Draft to Active is the activation step. The builder will refuse to activate an experiment that has obvious gaps — no chamberlines, an unfilled bot template, an invalid variable reference — but it will not catch every error. Pilot every experiment against yourself (and ideally a colleague) before opening it to real participants.
5.2 The dashboard at a glance
The dashboard is the experimenter's command surface during a live experiment. It has four panels, of which the first three are tightly coupled.
Active sessions. A list of every participant currently in a run, with their current phase (initialisation / identity setup / global pre-survey / chamberline execution / global post-survey / completed), the chamberline they were assigned to, and the index of the chamber they are currently in. Clicking a participant opens a per-participant detail view.
Matching queue. A list of every participant currently waiting to be matched into a chamber. Each entry shows the chamber the participant is waiting for, how long they have been waiting, and what slot constraints (§3.3) need to be satisfied for them to be admitted. A queue that grows steadily during an experiment is the symptom of a constraint that is too tight.
Alerts. A rolling list of events that warrant attention: disconnects, long waits, drop-outs, and idle participants. The dashboard surfaces these in priority order; experimenters typically watch this panel rather than the others.
Chatrooms. A list of every active chatroom (matched chamber). Each entry can be opened to show the live transcript, the broadcast log, the action log, and the processor-interaction log. This is the place to watch a chamber unfold in real time.
The dashboard auto-refreshes; no manual refresh is required.
5.3 Live monitoring and intervention
Three kinds of intervention are available from the dashboard during a live session.
Pause a participant's run. Halts the run at its current phase. The participant sees a paused indicator; segments and chamber timers do not advance. Resume with a single click. Use when a participant has hit a problem you want to debug before they continue.
End a participant's run. Terminates the run with a configurable completion message. The participant is shown the message and the global post-survey is skipped (unless explicitly forced). Used for participants who cannot continue — disconnections that will not heal, attention-check failures, withdrawal requests.
Host-advance a segment. Forces a segment whose transition mode is host (see §1.5) to advance for the chamber. Used during pilots to step a chamber through its timeline without waiting for timers or for participant clicks.
A fourth, lighter intervention — broadcast a message into a chatroom — is available from the chatroom detail view. The message appears in the chat as an experimenter announcement. Use sparingly: every dashboard broadcast is recorded in the chat transcript, so it becomes part of the dataset.
5.4 Exporting data
Data export is the final step of an experiment. It is available from the experiment detail page and the dashboard.
Two parameters control what comes out:
| Parameter | Choices | Effect |
|---|---|---|
| Format | JSON · CSV | The serialisation of the export. JSON preserves nesting; CSV flattens. |
| Type | All · Participants · Chatrooms · Responses | What subset of the experiment to include. |
The four export types correspond to four levels of granularity:
- Participants. One row per participant per chamberline assignment, with their identity, demographics, status, and variable values.
- Chatrooms. One row per chatroom (matched chamber), with the chamber's participants, settings, and timestamps.
- Responses. All survey responses across all surveys (global pre, global post, chamber pre, chamber post, embedded segment surveys), keyed by participant and survey ID.
- All. Every preceding type, plus the full chat transcripts and the broadcast / action / processor-interaction logs.
For a quantitative analysis pipeline, Responses and Participants in CSV are usually the right starting point; for a qualitative pass over conversation, All in JSON gives you the structure to operate on.
Exports are produced on demand; there is no waiting queue. For very large experiments, the export endpoint accepts a participant filter, so you can export a single chamberline or a single date range without downloading the entire experiment.
For a category-by-category description of what each export actually contains, see §5.5.
5.5 What's in your data
An export is not a single thing. It is a layered snapshot of a run viewed from several angles — the survey angle, the conversation angle, the timing angle, and so on. Most analysis questions touch two or three of these layers at once. This section walks through what Carrier captures for every run, what it does not capture, and which export type each kind of data lands in.
The map below shows, at a glance, which export type carries which category. All is a superset; researchers who plan to do anything beyond the simplest summary should default to it.
| Data category | Participants | Chatrooms | Responses | All |
|---|---|---|---|---|
| Identity, assignment, status | • | | | • |
| Survey responses | | | • | • |
| Chat transcripts | | • | | • |
| Timing and pacing | • | • | | • |
| AI / processor interactions | | • | | • |
| Behavioural events | | | | • |
| Attention checks and face monitoring | | • | | • |
| Assignment and reproducibility | • | | | • |
The eight sub-sections that follow describe each category in turn, including the cases in which a category is empty by design.
5.5.1 Survey responses
For most studies, this is the primary data. Carrier captures survey responses in four places:
- The global pre-survey, completed before any chamber, once per run.
- The global post-survey, completed after the final chamber, once per run.
- Chamber pre- and post-surveys, completed at the boundaries of each chamber.
- Embedded segment surveys (the survey segment type), completed within a chamber as part of its segment timeline.
Two shapes come out together. The raw Survey.js JSON preserves nested question
structures (matrices, panels, conditional branching) and is appropriate when the
response shape itself matters. The flattened response rows give one row per
question per participant, with questionId, questionText,
response, responseType, and a stage indicator
pointing at the survey instance the answer belongs to.
For most quantitative pipelines, Responses in CSV is the right starting point. For qualitative analyses or for questions where the survey was deliberately non-trivial, All in JSON preserves the structure you need to operate on.
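As a starting point for a quantitative pipeline, the flattened response rows can be pivoted into one record per participant. The sketch below uses only the standard library. The column names questionId, questionText, response, responseType, and stage are taken from the description above; the `participantId` column name, the stage values, and the sample rows are assumptions, so check them against a real export before relying on this.

```python
import csv
import io
from collections import defaultdict


def to_wide(rows, id_column="participantId"):
    """Pivot one-row-per-answer data into one dict of answers per participant."""
    wide = defaultdict(dict)
    for row in rows:
        # Prefix with the stage so the same questionId in two surveys stays distinct.
        column = f'{row["stage"]}.{row["questionId"]}'
        wide[row[id_column]][column] = row["response"]
    return dict(wide)


# Invented sample in the assumed CSV layout.
sample = io.StringIO(
    "participantId,stage,questionId,questionText,response,responseType\n"
    "p1,global_pre,q1,Age?,34,number\n"
    "p1,global_post,q1,Enjoyed?,yes,boolean\n"
    "p2,global_pre,q1,Age?,29,number\n"
)
wide = to_wide(csv.DictReader(sample))
```

Values arrive as strings when read from CSV, so numeric answers need converting (for instance by consulting responseType) before analysis.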
5.5.2 Chat transcripts
Every message exchanged in every chatroom is preserved verbatim. Each message carries a sender (human participant, LLM chatbot, scripted chatbot, agent, mediator bot, or system), an ISO timestamp, and a message type that distinguishes ordinary text from system notifications, joins and leaves, mediator broadcasts, bot and AI responses, and processor suggestions.
System messages are interleaved with the conversation rather than stored on the side, which means a researcher reading the transcript chronologically sees joins, disconnects, broadcasts, and attention-check events in situ. Chatrooms and All exports include the full chat history; Participants does not.
5.5.3 Timing and pacing
Several layers of timestamps come out together.
- Run-level. When the run started, when each phase transitioned, when the run completed or was terminated, and the reason for termination.
- Chamber-level. When matching happened, when the chatroom began, when each segment within the chamber started, when the chamber ended, and the actual elapsed duration.
- Per-message. Every chat message carries an ISO timestamp.
- Connection-level. Heartbeats, reconnection counts, and the participant's total time in the experiment.
These together let researchers reconstruct any per-participant duration of interest — time-to-first-message, time between segments, time spent re-reading instructions — without custom instrumentation.
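A duration such as time-to-first-message can be derived from these timestamps with a few lines. In this sketch the message fields `sender` and `timestamp` and the sample values are assumptions about the export layout; the only thing taken from the text is that each message carries an ISO timestamp.

```python
from datetime import datetime


def parse_iso(stamp):
    # Older Python versions do not accept a trailing "Z" in fromisoformat.
    return datetime.fromisoformat(stamp.replace("Z", "+00:00"))


def time_to_first_message(chat_started_at, messages, sender):
    """Seconds from chatroom start to the sender's first message, or None."""
    start = parse_iso(chat_started_at)
    stamps = [parse_iso(m["timestamp"]) for m in messages if m["sender"] == sender]
    if not stamps:
        return None
    return (min(stamps) - start).total_seconds()


# Invented sample messages.
messages = [
    {"sender": "p1", "timestamp": "2024-01-01T10:00:12Z"},
    {"sender": "p2", "timestamp": "2024-01-01T10:00:05Z"},
    {"sender": "p1", "timestamp": "2024-01-01T10:00:30Z"},
]
```

Here `time_to_first_message("2024-01-01T10:00:00Z", messages, "p1")` gives 12.0 seconds, and a participant who never spoke yields `None` rather than a misleading zero.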
5.5.4 AI and processor interactions
When chambers use the processor role, every assist event is logged with its full text. Review interactions carry the draft text that was submitted, the feedback that came back, and whether the communicator accepted, rejected, or edited the suggestion. Generate interactions carry the request and the generated response. Real-time assist suggestions carry their content and outcome.
For chambers that use an LLM chatbot, mediator, or agent, the model's reply is stored in the chat history alongside human messages, with sender metadata identifying the role and, where set, the provider. For agents on the Claude Agent path — where memory is provider-managed — the provider's session handle is preserved on the chatroom so that Carrier-side and provider-side timelines can be aligned after the fact.
5.5.5 Behavioural events
When client-side instrumentation is active for a segment, Carrier captures a stream of low-level events: tab visibility changes, focus changes, pointer activity, clicks, and a small set of custom events raised by specific segment types. Per-segment summaries are produced automatically — most commonly tab-away count and total tab-away time — and the raw event stream is preserved for replay or fine-grained sequence analysis.
This data is opt-in by segment. Researchers who want it should confirm that the relevant segments have behavioural-events instrumentation enabled in the builder before piloting.
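If the automatic summaries are not enough, tab-away count and total tab-away time can be recomputed from the raw event stream. The sketch below assumes a simplified stream of `(seconds, kind)` pairs with the kinds `hidden` and `visible`; the real event names and fields in a Carrier export may differ, and the output key names are illustrative.

```python
def tab_away_summary(events):
    """Summarise tab-visibility events given as (seconds, kind) pairs."""
    count, total, hidden_since = 0, 0.0, None
    for t, kind in sorted(events):
        if kind == "hidden" and hidden_since is None:
            hidden_since = t  # participant left the tab
            count += 1
        elif kind == "visible" and hidden_since is not None:
            total += t - hidden_since  # participant came back
            hidden_since = None
    return {"tabAwayCount": count, "tabAwaySeconds": total}


# Invented event stream: two tab-aways lasting 3 s and 1.5 s.
events = [(0, "visible"), (5, "hidden"), (8, "visible"), (20, "hidden"), (21.5, "visible")]
summary = tab_away_summary(events)
```

An interval that is still open when the stream ends is counted as a tab-away but adds no time, which is one of several defensible conventions; state the chosen convention in the analysis plan.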
5.5.6 Attention checks and face monitoring
The attention-check segment captures a result record per attempt: the
mode (face-based or survey-based), whether it passed, the retry count, and any
mode-specific details. The record appears in two places — a structured array
attached to the run, and a corresponding system message interleaved into the chat
transcript at the moment of the check.
Face monitoring, when enabled on a chat segment, emits its own event stream: warning shown, face returned, grace expired, paused, resumed, terminated. It is stored the same way: a structured array on the run plus interleaved system messages in the transcript.
Both categories are present only when the experiment was configured to produce them. Their absence in an export is not a missing value; it means the experiment did not ask for them.
5.5.7 Assignment and reproducibility
For anyone who needs to reconstruct, after the fact, why a given participant saw what they saw, the export carries:
- The chamberline each participant was assigned to, and the reason (random, counterbalance, survey-based, or fixed).
- A frozen snapshot of the participant's run plan — the chambers in their assigned order, each with its role and slot assignment for that participant.
- The experiment's version at the moment the run was created, so that a later configuration change does not corrupt the interpretation of earlier runs.
- A condition seed where randomisation was involved.
Combined with the admin-side activity log (see §6), this is sufficient to reproduce a participant's path through the experiment exactly.
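To see why a stored seed makes a random assignment reconstructible, consider the minimal sketch below. It is not Carrier's actual randomisation procedure; the function name, the seed value, and the chamberline labels are all hypothetical. It only shows the general principle that a seeded generator yields the same draw every time.

```python
import random


def assign_chamberline(seed: int, chamberlines: list[str]) -> str:
    """Pick one chamberline deterministically from a stored seed (illustrative only)."""
    rng = random.Random(seed)  # private generator, so global random state is untouched
    return rng.choice(chamberlines)


chamberlines = ["control", "treatment-a", "treatment-b"]  # hypothetical labels
first = assign_chamberline(20240117, chamberlines)
replayed = assign_chamberline(20240117, chamberlines)  # same seed, same result
```

The replay only matches if the list of chamberlines is also the same, which is one reason the export freezes the experiment version alongside the seed.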
5.5.8 What's conditional
Several categories appear only when the experiment is configured to produce them. Worth flagging up front, so that an absent column is not mistaken for a bug:
- Behavioural events require client-side instrumentation enabled on the relevant segments.
- Attention-check results require an attention-check segment in the chamberline.
- Face-monitoring events require face monitoring enabled on a chat segment.
- Processor interaction logs require at least one chamber to use a processor role.
- Non-human sender metadata (role, provider) is populated when the message originates from a chatbot, mediator, or agent; for human messages those fields are empty by design.
- Variable values appear only for variables the experiment defined; there are no system-provided demographic variables.
If a researcher expects one of these and finds it missing, the place to check is the experiment configuration, not the export.
§5 · Running
5.6 Pilot first, ramp second
A short note that does not fit anywhere else in this guide but matters in practice. Every Carrier experiment benefits enormously from a small pilot (three to five participants, ideally including the researcher) before it is opened to a larger sample. Pilots are the only reliable way to catch the kinds of issues that the builder cannot validate: a slot constraint that is unsatisfiable in practice, a chamber duration too short for participants to read the instructions, an LLM mediator whose system prompt produces unexpected behaviour on real conversations, a survey question that is ambiguous to actual participants.
Pilot with the experiment status set to active and the dashboard open. Watch the matching queue, watch the chat transcripts, and watch the action log. Most experiments end up requiring at least one round of revision after the first pilot. This is normal; budget time for it.
Part II · §6. Administration
Accounts, collaborators, and the admin portal — the parts of Carrier that exist to keep multiple researchers working on the same platform.
§6 · Administration
6.1 Accounts and collaboration
Every researcher account in Carrier has a role: either researcher or admin. Researchers can create, edit, run, and export their own experiments; admins additionally manage the user list and the activity log.
An experiment has one owner and any number of collaborators:
- The owner can edit everything, transfer ownership, add and remove collaborators, and delete the experiment.
- A collaborator can edit the experiment's configuration and view its data, but cannot transfer ownership, add other collaborators, or delete the experiment.
This separation is the simplest model that supports the common pattern of one PI owning each study and several lab members helping to configure and run it.
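The owner/collaborator split can be summarised as a small permission check. The sketch below is illustrative only: the action names are hypothetical labels, not Carrier identifiers, and it assumes that removing collaborators is owner-only, which the list above implies but does not state outright.

```python
# Hypothetical action labels; Carrier's internal names may differ.
SHARED_ACTIONS = {"edit_configuration", "view_data"}
OWNER_ONLY_ACTIONS = {
    "transfer_ownership",
    "add_collaborator",
    "remove_collaborator",  # assumed owner-only
    "delete_experiment",
}


def can(experiment_role: str, action: str) -> bool:
    """Return True if the given experiment-level role may perform the action."""
    if experiment_role == "owner":
        return action in SHARED_ACTIONS | OWNER_ONLY_ACTIONS
    if experiment_role == "collaborator":
        return action in SHARED_ACTIONS
    return False  # anyone else has no access to the experiment
```

For example, `can("collaborator", "view_data")` is true while `can("collaborator", "delete_experiment")` is false.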
6.2 The admin portal
The admin portal is available only to users with the admin role. It exposes three sub-areas.
User management. Create, update, enable, and disable user accounts. Disabling an account preserves all of the user's experiments and data but prevents them from logging in. This is the right action when a lab member leaves; deletion is rarely necessary.
Registration approval. When self-registration is enabled, new sign-ups arrive in a pending state. The admin reviews each request — typically by checking the requester's institutional email and the project they intend to use Carrier for — and approves or rejects.
Activity logs. A chronological log of meaningful actions across the platform — logins, experiment creations, role changes, exports. Useful both for accountability and for understanding usage patterns when scaling the platform across multiple labs.
Appendices
Glossary, type × role matrix, and quick-reference indexes for segment types and trigger types.
Appendix A · Glossary
| Term | Definition |
|---|---|
| Active-segment filter | A list of segment IDs during which a trigger is eligible to fire. Empty list = fire in any segment. |
| Agent | An autonomous non-human participant built on Anthropic's Claude Agent API. Has built-in tools for reading files (in a configured document area), running code, and browsing the web; uses them on its own initiative to inform its messages. Distinct from an LLM chatbot. |
| Agent built-in tools | The file-reading, code-execution, and web-browsing tools available to an agent via the Claude Agent API. Used for information gathering; distinct from Carrier intervention actions. |
| Aggregate variable | A variable computed over multiple participants in a chamber. Configurable to include or exclude bot/agent participants. |
| Chain target | Another trigger ID that fires after this one completes. Used to compose multi-step bot behaviour. |
| Chamber | A timed grouping of matched participants who share the same segments and remain together until the chamber ends. |
| Chamberline | An ordered sequence of chambers, representing one experimental condition. A participant is assigned to exactly one. |
| Chamberline filter | A condition under which a participant is eligible for a given chamberline; used by survey-based assignment. |
| Chatroom | The live, runtime instantiation of a chamber for a particular matched group. |
| Communicator | The role of a primary conversational participant. The “default” role in any chamber. |
| Embedded segment | A selection or ranking segment displayed as an overlay on a chat segment, so participants can vote or rank without leaving the conversation. |
| Global pre-survey / post-survey | Surveys at the very start and very end of a run. Distinct from chamber-level surveys. |
| LLM chatbot | A non-human participant that produces chat messages from a language model, with no tools and no scripted rules. Open-ended, varies across sessions. |
| LLM-chosen Carrier action | An intervention action (disable_chat, prompt_participant, …) selected at runtime by an LLM-driven participant via its structured response. Available to any LLM chatbot or agent acting as mediator. |
| LLM-driven response | A response produced by a language model on the fly, rather than from a pre-written script. |
| Match | The event of assembling enough participants of the right kinds to fill a chamber's slots. |
| Mediator | The role of a facilitator participant — sees everything, broadcasts, controls turn-taking. |
| Non-human participant | Umbrella term for the three kinds of non-human entity Carrier supports: LLM chatbots, scripted chatbots, and agents. |
| Phase script | An ordered list of phases for a processor, each with a mode and a transition trigger. |
| Priority | A numeric ranking among triggers; higher priority fires first when multiple triggers match. |
| Processor | The role that assists composition before a communicator's text becomes a message. Three modes: review, generate, real-time assist. |
| Response | The message a trigger sends when it fires. Either scripted or LLM-driven. |
| Run | One participant's complete pass through the experiment, from arrival to completion. |
| Scripted chatbot | A rule-driven, deterministic non-human participant. Configured by triggers; produces pre-written messages. Can fill communicator and mediator roles, but not processor. |
| Scripted response | A pre-written message (or random pick from a list) sent when a trigger fires. |
| Segment | An activity within a chamber: a chat, a slide, a survey, a timer, a vote, etc. |
| Slot | A position in a chamber, with a type (human / LLM chatbot / scripted chatbot / agent) and a role (communicator / mediator / processor). |
| Standalone segment | A segment that occupies the participant's entire screen, as opposed to embedded. |
| Trigger | A condition–response–action rule that governs when a non-human participant speaks or acts. |
| Variable | An attribute attached to a participant, used for matching, visibility, interpolation, or trigger conditions. |
| Visibility condition | A condition on a chamber that, if false, causes the participant to skip the chamber. |
Appendix B · Type × Role compatibility matrix
| Type \ Role | Communicator | Mediator | Processor |
|---|---|---|---|
| Human | ✓ | ✓ | ✓ |
| LLM chatbot | ✓ | ✓ | ✓ |
| Scripted chatbot | ✓ | ✓ | — |
| Agent (Claude Agent API) | ✓ | ✓ | ✓ |
Reproduced from §2.1 for quick reference. The only forbidden combination is scripted chatbot as processor.
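For scripts that generate or validate slot configurations, the matrix reduces to a single forbidden pair. The sketch below encodes it; the lowercase identifiers are hypothetical spellings chosen for this example, not Carrier's configuration keys.

```python
# Hypothetical identifiers for the slot types and roles in the matrix above.
SLOT_TYPES = ("human", "llm_chatbot", "scripted_chatbot", "agent")
ROLES = ("communicator", "mediator", "processor")

# The only forbidden combination: a scripted chatbot cannot be a processor.
FORBIDDEN = {("scripted_chatbot", "processor")}


def is_compatible(slot_type: str, role: str) -> bool:
    """Return True if a slot of this type may take this role."""
    if slot_type not in SLOT_TYPES:
        raise ValueError(f"unknown slot type: {slot_type}")
    if role not in ROLES:
        raise ValueError(f"unknown role: {role}")
    return (slot_type, role) not in FORBIDDEN


allowed = [(t, r) for t in SLOT_TYPES for r in ROLES if is_compatible(t, r)]
```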
Appendix C · Segment types — quick index
| Type | What participants do | Embeddable | AI-compatible |
|---|---|---|---|
| instruction | Read formatted instructions, click Continue | — | — |
| slide | View a content slide | — | — |
| media | Play audio or video | — | — |
| timer | Wait for countdown | — | — |
| survey | Complete a Survey.js form | — | — |
| input | Type a free-text response | — | — |
| selection | Choose one or more options | ✓ | ✓ |
| ranking | Drag items into order | ✓ | ✓ |
| chat | Live multi-party conversation | — | ✓ |
| task | Custom interactive task | — | ✓ |
| attention-check | Survey- or camera-based check | — | — |
Appendix D · Trigger types — quick index
| Type | Listens for | Notes |
|---|---|---|
| keyword | Configurable word / phrase | Most common. |
| regex | Regular expression match | Use for structured patterns. |
| time | Delay from chamber / segment start | Fires regardless of chat activity. |
| message-count | Total messages in chatroom | Fires once per matching count. |
| participant-message-count | Messages from a specific participant | Supports total / consecutive / since-reset. |
| sequence | Ordered series of matches | For staged steering. |
| participant-action | Join, leave, idle, etc. | Fires across segments. |
| after-bot-message | Another bot's message | Cross-bot chaining. |
| event-monitor | Arbitrary chatroom event | Catches segment transitions, dashboard interventions. |
| chain-only | (Passive) Only fires from chain | For multi-step bot behaviour. |
| llm-driven | An LLM judges the condition | Most expressive; least replicable. |
| periodic | Fixed interval | Mediator-specific. |
| aggregate | N messages in a window | Mediator-specific. |
| topic-detected | Topic / keyword pattern | Mediator-specific. |
| activity-timeout | Inactivity duration | Mediator-specific. |
| participant-count | Active participant threshold | Mediator-specific. |
| discussion-phase | Chamber start / middle / end | Mediator-specific. |
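To make the two most common rows concrete, here is a minimal sketch of keyword and regex matching combined with the glossary's priority rule (higher priority fires first). The `Trigger` class and its fields are hypothetical stand-ins rather than Carrier's data model, and case-insensitive keyword matching is an assumption made for this example.

```python
import re
from dataclasses import dataclass


@dataclass
class Trigger:
    """Hypothetical, simplified trigger: only keyword and regex kinds."""
    trigger_id: str
    kind: str      # "keyword" or "regex"
    pattern: str
    priority: int = 0

    def matches(self, message: str) -> bool:
        if self.kind == "keyword":
            # Assumption: keywords match case-insensitively as substrings.
            return self.pattern.lower() in message.lower()
        if self.kind == "regex":
            return re.search(self.pattern, message) is not None
        return False


def firing_order(triggers: list[Trigger], message: str) -> list[Trigger]:
    """Return the triggers that match, highest priority first."""
    matched = [t for t in triggers if t.matches(message)]
    return sorted(matched, key=lambda t: t.priority, reverse=True)


triggers = [
    Trigger("greet", "keyword", "hello", priority=1),
    Trigger("price", "regex", r"\$\d+", priority=5),
    Trigger("bye", "keyword", "goodbye", priority=3),
]
order = [t.trigger_id for t in firing_order(triggers, "Hello, I can offer $20")]
```

Here `order` lists `price` before `greet`, because both match and `price` has the higher priority; `bye` does not match at all.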
Annotator Documentation
The Annotator is a batch LLM annotation engine for processing text data at scale. Upload a CSV, configure LLM annotators, and download structured results.
Getting Started
What is the Annotator?
The Annotator is a batch LLM annotation engine. Upload a CSV, configure one or more LLM annotators with prompt templates, run the task at scale, and download structured results.
Common use cases include text classification, sentiment analysis, content coding, and replicating published annotation schemes from peer-reviewed research.
Key Concepts
| Concept | Description |
|---|---|
| Task | Top-level container holding CSV data, LLM configs, and processing settings |
| Row | One CSV record, processed independently |
| LLM Config | A provider + model + prompt template combination |
| Repetition | Running each row through each config multiple times for reliability |
| Template | Reusable annotation configuration that can be shared |
| Work Unit | One row × one config × one repetition = one API call |
Your First Annotation Task
Get started in four steps:
1. Upload your CSV. It should contain the text you want annotated; column names become template variables.
2. Choose a provider and model, then write a prompt template using {{columnName}} syntax to reference your data.
3. Start processing. The engine sends each row through your LLM config and stores the results.
4. Export your annotated data as CSV, Excel, or JSON.
Providing API Keys
The Annotator requires API keys for the LLM providers you use: OpenAI, Anthropic, and/or Google.
User-level keys are set in your account settings and reused across all your tasks. Per-task keys can be provided when creating or editing a task and override user-level keys for that task only.
Task Setup
Upload & Preview CSV Data
Upload a CSV file (max 10 MB). After upload you can preview the headers and the first rows of data. Column names become {{columnName}} template variables for use in your prompt templates.
Configure LLM Annotators
Add one or more LLM configurations to a task. Each configuration specifies a provider (OpenAI, Anthropic, or Google), a model, and prompt templates. You can add multiple configs to compare models or prompt strategies side by side.
Each config supports temperature and maxTokens settings to control response variability and length.
Write Prompt Templates
Each LLM config has a system prompt and a user prompt.
Use {{columnName}} syntax to insert values from each CSV row into the prompt.
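The substitution itself can be pictured as a simple placeholder replacement. The sketch below is not the Annotator's implementation; it assumes column names made of letters, digits, and underscores, tolerates spaces inside the braces, and raises an error for unknown columns, all of which are choices made for this example.

```python
import re

# Matches {{columnName}}, allowing optional spaces inside the braces (an assumption).
PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def render(template: str, row: dict) -> str:
    """Fill each {{columnName}} placeholder with the value from one CSV row."""
    def substitute(match: re.Match) -> str:
        name = match.group(1)
        if name not in row:
            raise KeyError(f"template references unknown column: {name}")
        return str(row[name])

    return PLACEHOLDER.sub(substitute, template)


row = {"id": "17", "text": "Great service, will return."}
prompt = render("Classify the sentiment of this review: {{text}}", row)
```

Rendering a handful of rows this way before launching a task is a cheap check that the column names in your CSV match the placeholders in your prompt.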
Set Repetitions
Set between 1 and 20 repetitions per row per config. Multiple repetitions let you measure reliability and use majority voting to determine final labels.
The total number of work units (API calls) is: rows × configs × repetitions.
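A quick worked example of that product, using made-up task sizes; the function name is hypothetical and the range check simply mirrors the 1 to 20 repetition limit stated above.

```python
def work_units(rows: int, configs: int, repetitions: int) -> int:
    """Number of API calls for a task: rows x configs x repetitions."""
    if not 1 <= repetitions <= 20:
        raise ValueError("repetitions must be between 1 and 20")
    return rows * configs * repetitions


# 500 rows, 3 LLM configs, 5 repetitions each
total = work_units(rows=500, configs=3, repetitions=5)  # 7500 API calls
```

Because the count is multiplicative, adding one more config or doubling the repetitions scales cost and run time proportionally.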
Processing
Estimate Costs
Before running a full task, use the cost estimator. It runs a sample of up to 10 rows, measures the tokens consumed, and extrapolates to give you an estimated cost for the complete task.
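The extrapolation can be approximated by hand as mean tokens per sampled call, multiplied by the number of work units and a price per token. The sketch below is a simplification, not the estimator's actual formula: it uses a single blended price for input and output tokens, and the token counts and the price are invented for the example.

```python
def extrapolate_cost(
    sample_tokens: list[int],
    total_work_units: int,
    usd_per_million_tokens: float,
) -> float:
    """Estimate full-task cost in USD from tokens measured on a small sample."""
    if not sample_tokens:
        raise ValueError("need at least one sampled call")
    mean_tokens = sum(sample_tokens) / len(sample_tokens)
    return mean_tokens * total_work_units * usd_per_million_tokens / 1_000_000


# Hypothetical figures: five sampled calls, 7,500 work units, $2 per million tokens.
sample_tokens = [420, 380, 450, 400, 350]
estimate = extrapolate_cost(sample_tokens, total_work_units=7500, usd_per_million_tokens=2.00)
```

With these numbers the mean is 400 tokens per call, so the estimate comes to about 6 USD. A sample of at most 10 rows can be unrepresentative if text length varies widely across the file, so treat the figure as an order-of-magnitude guide.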
Standard Processing
Standard mode streams results in real time using 1–20 parallel workers. Failed requests are retried automatically with exponential backoff. Processing is crash-safe — results are saved per row, so progress is never lost.
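For readers unfamiliar with the term, exponential backoff means waiting progressively longer between retries. The sketch below shows the general pattern only; the attempt limit, base delay, and jitter are illustrative values, not the Annotator's settings, and a production client would retry only on transient errors rather than on every exception.

```python
import random
import time


def call_with_backoff(request, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call request(); on failure wait base_delay * 2**attempt (plus jitter) and retry."""
    for attempt in range(max_attempts):
        try:
            return request()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            delay = base_delay * (2 ** attempt)
            sleep(delay + random.uniform(0, delay * 0.1))  # small random jitter


# Demonstration with a stand-in request that fails twice, then succeeds.
attempts = {"n": 0}


def flaky():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise ConnectionError("transient failure")
    return "ok"


delays = []  # record the waits instead of actually sleeping
result = call_with_backoff(flaky, sleep=delays.append)
```

In the demonstration the two recorded waits are roughly one and two seconds, doubling each time.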
Batch Processing
Batch mode uses the OpenAI and Anthropic batch APIs, which cost approximately 50% less in exchange for asynchronous processing: results arrive within about 24 hours rather than in real time. Google requests fall back to standard processing automatically.
Pause, Resume & Cancel
In standard mode, you can pause processing at any time. All completed results are preserved. Resume picks up where you left off. Cancel stops the task permanently but keeps all results that were completed before cancellation.
Templates
Use Research Templates
The Annotator includes 25+ peer-reviewed annotation presets from published research. Select a template to pre-fill your LLM configs with validated prompt designs.
| Authors | Configs | Domain |
|---|---|---|
| Gilardi et al. (2023) | 7 annotators | Text classification |
| Rathje et al. (2024) | 6 annotators | Psychological text analysis |
| Bhatia et al. (2025) | 3 annotators | Choice dilemma annotation |
| Bojic et al. (2025) | 5 annotators | Latent content analysis |
| Kumar et al. (2026) | 4 annotators | Empathic communication evaluation |
Create Custom Templates
Save any task configuration as a reusable template. Custom templates are private by default and available only to you. They capture the full LLM config including prompts, model settings, and repetition count.
Results
Monitor Progress
A progress bar shows real-time completion status. Each task follows a status lifecycle: pending → processing → completed or cancelled. In standard mode, a paused state is also available.
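If you poll task status from your own scripts, the lifecycle can be modelled as a small transition table. This is a sketch under assumptions: the text confirms pause, resume, and cancel in standard mode, but whether a paused task can be cancelled directly is assumed here, and the table is not taken from the Annotator's code.

```python
# Allowed next states for each status; terminal states map to an empty set.
TRANSITIONS = {
    "pending": {"processing"},
    "processing": {"paused", "completed", "cancelled"},
    "paused": {"processing", "cancelled"},  # cancelling while paused is an assumption
    "completed": set(),
    "cancelled": set(),
}


def can_transition(current: str, new: str) -> bool:
    """Return True if a task in `current` status may move to `new`."""
    return new in TRANSITIONS.get(current, set())


def is_terminal(status: str) -> bool:
    """A status is terminal when no further transitions are possible."""
    return status in TRANSITIONS and not TRANSITIONS[status]
```

A polling loop can stop as soon as `is_terminal(status)` is true.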
Download Results
Export results in CSV, Excel, or JSON format. You can download partial results while the task is still running — useful for spot-checking quality before the full run completes.
Understanding Output Format
Results use a flattened format with one row per input record. Columns include all original input data, the rendered prompts, and response columns for each config and repetition combination.
To load the CSV export in R, use read.csv(). In Python, use pandas.read_csv(). In Excel, open the Excel export for automatic column formatting.
Response columns follow the naming pattern [configName]_rep[N].
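Given that naming pattern, majority voting across repetitions takes only a few lines. The sketch below uses the standard-library csv module so that it runs without extra packages (pandas.read_csv() would serve equally well). It assumes the square brackets in the pattern are placeholders, so that real columns look like `gpt_rep1`; the config name `gpt` and the sample rows are invented.

```python
import csv
import io
import re
from collections import Counter

# Assumed concrete form of the pattern: <configName>_rep<N>, e.g. "gpt_rep1".
REP_COLUMN = re.compile(r"^(?P<config>.+)_rep(?P<rep>\d+)$")


def majority_labels(row: dict) -> dict:
    """Return the most frequent response per config for one exported row."""
    votes = {}
    for column, value in row.items():
        match = REP_COLUMN.match(column)
        if match:
            votes.setdefault(match["config"], []).append(value.strip())
    # On a tie, Counter keeps the label that appeared first.
    return {config: Counter(values).most_common(1)[0][0] for config, values in votes.items()}


# Stand-in for a downloaded CSV export.
export = io.StringIO(
    "text,gpt_rep1,gpt_rep2,gpt_rep3\n"
    "Great service,positive,positive,neutral\n"
    "Never again,negative,negative,negative\n"
)
results = [majority_labels(row) for row in csv.DictReader(export)]
```

Using an odd number of repetitions avoids most ties; for reliability reporting you will usually also want the agreement rate, not only the winning label.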
Carrier Workspace
Carrier Workspace brings your research team's Claude Code activity — session transcripts and shared memory — into one place inside Carrier, so the way your team used AI assistance to build and analyse a study is searchable, reviewable, and preserved alongside the study itself.
Overview
What is Carrier Workspace?
When a team uses Claude Code while building an experiment, writing analysis scripts, or preparing materials, each developer accumulates a local history of sessions (the back-and-forth transcripts of their work) and memory (durable notes the assistant keeps about the project). That history normally lives buried in each person's local ~/.claude directory, invisible to the rest of the team.
A workspace collects that data for a single repository and shows it on one page in Carrier. Carrier Workspace is powered by the team-claude-view skill, which provides the small client scripts that package and upload a machine's data, plus a /private command for marking sessions you don't want shared.
A workspace is tied to one repository, and you choose how its data arrives when you create it. There are two modes:
| Mode | How data arrives |
|---|---|
| Carrier Workspace mode (default) | Each developer's machine packages its local Claude Code cache and uploads it to Carrier with a small Python client. Works with no GitHub repository involved. |
| GitHub-linked mode | Carrier connects to a GitHub repository and pulls the shared data automatically on a schedule. Lowest-effort once set up — nobody runs anything by hand. |
Both modes end up in the same place: a workspace page showing sessions and memory.
When a research team needs it
This is a team-tooling feature, separate from running experiments. It does not touch participant data or your experiment configuration — it concerns how your team worked, not what your participants did.
The deeper reason to keep this record is delegation. Empirical research now routinely hands real methodological work to agentic AI: cleaning a dataset, deciding which records to exclude, choosing a transformation, drafting an analysis script, selecting a model specification. Those are not neutral chores — they are methods decisions, and when an agent makes them they tend to vanish the moment the session closes. A workspace turns that delegated work into a durable, shareable record of what was asked, what the assistant decided, and why. Making AI use visible in this way is squarely in the spirit of open science: the same disclosure norms that ask us to share data, code, and pre-registrations extend naturally to disclosing how AI shaped the work.
Transparency also guards against a subtle integrity risk that agentic workflows can introduce without anyone intending it. An assistant pointed at a loosely specified goal — “find the effect,” “get the model to fit,” “clean this up so the result holds” — can quietly explore many exclusion rules, covariate sets, and specifications, then surface only the one that reaches significance. That is the garden-of-forking-paths / researcher-degrees-of-freedom problem, arrived at as unintentional p-hacking rather than deliberate fishing. Because the workspace preserves the full transcript — every fork the agent tried, not just the final answer — you, your collaborators, reviewers, and your future self can tell whether a reported result survived a single principled analysis or emerged after dozens of silent attempts. The record makes the exploration auditable, which is precisely what keeps delegation honest.
Concretely, reach for it when:
- Reproducibility & provenance. You want a durable record of how AI assistance produced study materials, analysis code, or stimuli — and which analytic decisions were delegated — the kind of provenance a methods section or a replication package benefits from.
- Research integrity. You want the agent's exploration to be auditable, so a reported effect can be traced back to a principled analysis rather than an opaque search.
- Onboarding. A new RA or collaborator can read how the project was built rather than starting cold.
- Coordination. Several people on the team use Claude Code on the same repository and you want a shared, searchable view instead of scattered local histories.
Setup
Carrier Workspace mode (default)
This is the default mode. There is no GitHub connection: each developer runs a small Python client (provided by the team-claude-view skill) that bundles their local Claude Code cache and uploads it to Carrier with an API key. The data travels straight from your team's machines to Carrier.
When you create a Carrier Workspace, Carrier displays an API key exactly once, right after creation. Copy and save it now — it is never recoverable. The creation screen also shows the exact upload URL and a ready-to-paste configure command. If you lose the key, you'll need to recreate the workspace to get a new one.
On every machine that should contribute data, run the configure command once. It saves the upload URL and key locally so later syncs don't need them. Use the exact --url and --key shown in the create modal:
python3 scripts/team-claude-client/carrier_configure.py --url <upload-url> --key <api-key>
After configuring, push the machine's Claude Code data. The client packages your local sessions/ and memory/ directories into a gzip tarball and uploads it, then prints how many sessions and memory entries were sent:
python3 scripts/team-claude-client/carrier_sync.py
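To give a sense of what the packaging step involves, the sketch below bundles a sessions/ and a memory/ directory into an in-memory gzip tarball using Python's tarfile module. It is not the real carrier_sync.py: the function name and the `cache_root` parameter are hypothetical, the demo builds a throwaway directory instead of reading your actual cache, and the upload itself is omitted.

```python
import io
import tarfile
import tempfile
from pathlib import Path


def build_bundle(cache_root: Path) -> bytes:
    """Pack cache_root/sessions and cache_root/memory into a gzip tarball."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w:gz") as archive:
        for name in ("sessions", "memory"):
            directory = cache_root / name
            if directory.is_dir():
                # arcname keeps archive paths relative: sessions/..., memory/...
                archive.add(directory, arcname=name)
    return buffer.getvalue()


# Demo on a temporary directory; anything outside the two folders is left out.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "sessions").mkdir()
    (root / "sessions" / "example.jsonl").write_text("{}\n")
    (root / "memory").mkdir()
    (root / "memory" / "notes.md").write_text("note\n")
    (root / "unrelated").mkdir()
    (root / "unrelated" / "skip.txt").write_text("x\n")
    bundle = build_bundle(root)

with tarfile.open(fileobj=io.BytesIO(bundle), mode="r:gz") as archive:
    names = sorted(archive.getnames())
```

Listing `names` before an upload is a simple way to confirm that only the two expected directories are in the bundle.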
Running the sync by hand is easy to forget. Claude Code's SessionEnd hook in .claude/settings.json fires when a session ends. Wire the sync script in so every finished session uploads automatically:
{
  "hooks": {
    "SessionEnd": [
      {
        "hooks": [
          {
            "type": "command",
            "command": "python3 scripts/team-claude-client/carrier_sync.py"
          }
        ]
      }
    ]
  }
}
Adjust the path if your repo lays the script out differently.
GitHub-linked mode
In GitHub-linked mode, Carrier holds a personal access token (PAT) for your repository and uses it to keep a private mirror of the shared Claude Code data up to date — nobody has to run anything by hand. Choose this if your team already publishes shared data to GitHub.
Carrier needs read access to one repository's contents. Create a token at github.com/settings/personal-access-tokens/new:
- Repository access — scope to the single repository you're linking; don't grant access to all repositories.
- Repository permissions — set Contents: Read-only. That is the only permission Carrier requires.
- Expiration — a 90-day expiry balances safety against re-linking too often.
Copy the token when GitHub shows it; you won't be able to see it again. A classic PAT (with the repo scope) also works, but Carrier will show an advisory banner recommending you switch to a fine-grained, single-repo, read-only token.
From the Workspaces page click Link a repo, choose the GitHub tab, fill in a name, the repository URL (e.g. https://github.com/your-org/your-repo), and paste the PAT, then click Create.
Carrier clones the repository and fetches its claude-team-share branch (the branch your team uses to publish shared data), then reads its sessions/ and memory/ directories into a private server-side mirror. If that branch doesn't exist yet, the workspace simply shows an empty state until it appears; nothing is broken.
Choosing a mode
| | Carrier Workspace mode (default) | GitHub-linked mode |
|---|---|---|
| Auth | API key, shown once, bcrypt-hashed | Fine-grained PAT (Contents: Read), encrypted at rest |
| Cold start | Create workspace, save key, run carrier_configure.py per machine | Create token, paste into the GitHub tab |
| Updates | Run carrier_sync.py (or a SessionEnd hook) after each session | Automatic: Carrier polls roughly every 30 seconds; use Re-sync now for an immediate refresh |
| Works without GitHub | Yes, no GitHub repo needed | No, requires a repo and the claude-team-share branch |
| Data path | Straight from your machines to Carrier | From GitHub to Carrier |
Pick Carrier Workspace mode (the default) when you want to keep data flowing through your own machines or have no GitHub repo in the loop. Pick GitHub-linked mode when your team already publishes a claude-team-share branch and you'd rather not run a client by hand.
Using & maintaining
Browsing sessions & memory
However the data arrives, the workspace page presents it the same way. Open a workspace from the Workspaces page to see two things:
- Sessions — the Claude Code transcripts contributed to this repository. Open one to read it as a linear transcript of the work.
- Memory — the durable notes the assistant kept about the project.
In Carrier Workspace mode, every machine that runs the client contributes to the same workspace: Carrier derives the workspace from the repository's root folder name, so all checkouts of the same repository map to one workspace and each machine's contribution is additive.
Keeping data current
How a workspace stays fresh depends on its mode:
- Carrier Workspace mode: data updates whenever a machine runs carrier_sync.py. The recommended setup is the SessionEnd hook (see setup, step 4), so every finished session uploads on its own.
- GitHub-linked mode: Carrier polls the repository roughly every 30 seconds and re-syncs stale workspaces automatically as your team pushes new data. Use the Re-sync now button on the workspace page for an immediate refresh.
Reference
Privacy & security
- GitHub PATs are encrypted at rest using AES-256-GCM. Carrier stores the encrypted token, not the plaintext.
- API keys are bcrypt-hashed. They are shown once at creation and never recoverable — Carrier cannot display or email them again.
- Marking a session private. The team-claude-view skill provides a /private slash command that marks a session as non-shareable, so it won't be included when your data is shared.
- Removing a workspace deletes its data. Carrier deletes the server-side mirror and, for GitHub-mode workspaces, the encrypted token along with it.
Troubleshooting & FAQ
Carrier Workspace mode (manual upload)
| Symptom | What it means |
|---|---|
| 401 Unauthorized | The API key is wrong, or it belongs to a workspace that has since been recreated with a new key. Re-run carrier_configure.py with the correct key for this workspace. |
| 413 Payload Too Large | The tarball exceeded the 50 MB upload cap. Trim older sessions from your local cache before syncing again. |
| 400 Bad Request | The upload was rejected as malformed — usually a corrupt tarball, or one containing paths outside the allowed sessions/ and memory/ prefixes. |
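The 413 and 400 cases can often be caught locally before uploading. The sketch below is a hypothetical pre-flight check, not part of the Carrier client and not a reproduction of the server's validation: it assumes the 50 MB cap means 50 × 1024 × 1024 bytes and that every archive path must start with sessions/ or memory/.

```python
import io
import tarfile
from pathlib import PurePosixPath

MAX_BYTES = 50 * 1024 * 1024  # assumed interpretation of the 50 MB cap
ALLOWED_PREFIXES = ("sessions", "memory")


def check_bundle(data: bytes) -> list[str]:
    """Return a list of problems found in a gzip tarball (empty list = looks fine)."""
    problems = []
    if len(data) > MAX_BYTES:
        problems.append("bundle exceeds 50 MB")
    try:
        with tarfile.open(fileobj=io.BytesIO(data), mode="r:gz") as archive:
            for member in archive.getmembers():
                parts = PurePosixPath(member.name).parts
                if not parts or parts[0] not in ALLOWED_PREFIXES or ".." in parts:
                    problems.append(f"path outside allowed prefixes: {member.name}")
    except (tarfile.TarError, OSError, EOFError):
        problems.append("not a readable gzip tarball")
    return problems


def _make_bundle(names: list[str]) -> bytes:
    """Build a tiny in-memory tarball for the demonstration below."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w:gz") as archive:
        for name in names:
            payload = b"{}"
            info = tarfile.TarInfo(name=name)
            info.size = len(payload)
            archive.addfile(info, io.BytesIO(payload))
    return buffer.getvalue()


good = check_bundle(_make_bundle(["sessions/a.jsonl", "memory/notes.md"]))
bad = check_bundle(_make_bundle(["sessions/a.jsonl", "../secrets.txt"]))
corrupt = check_bundle(b"not a tarball")
```

An empty list from `check_bundle` does not guarantee the server will accept the upload; it only rules out the causes listed in the table.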
GitHub-linked mode
| Symptom | What it means |
|---|---|
| Auth-error banner | The PAT expired or was revoked. Create a fresh token and re-link the repo (or update the token on the existing workspace). |
| Empty state persists | The claude-team-share branch doesn't exist on the remote yet. The workspace stays empty until the branch is created and pushed. |
| Can't read a private repo | Make sure the PAT actually has access to that specific repository. |
Common questions
How do multiple machines work together? In Carrier Workspace mode, every machine that runs the client contributes to the same workspace — Carrier derives the workspace from the repository root's folder name, so all checkouts of the same repository map to one workspace. Each contribution is additive.
Can I share privately, without GitHub? Yes — that's exactly what Carrier Workspace mode is for. The data travels straight from your machines to Carrier with no GitHub repository in the loop.
How do I rotate a GitHub PAT? Create a new fine-grained token and either re-link the repo or update the token on the existing workspace, then revoke the old token on GitHub.
How do I regenerate an API key? There is no in-place regenerate. Delete the workspace and recreate it to issue a fresh key (shown once), then reconfigure each machine with carrier_configure.py using the new key.
What gets deleted when I remove a workspace? Its server-side mirror of sessions and memory. For GitHub-mode workspaces, the encrypted PAT is deleted as well. After removal there is nothing left server-side for that workspace.