What does 'redacted' mean? Designing a sentinel for computer-use telemetry

Why the obvious redaction markers fail for computer-use data, and what a better one looks like.

The reviewer who couldn’t see the bug

A code reviewer read a Python source file in our codebase and reported a critical bug. The file defined a redaction sentinel as a single Unicode code point inside a string literal. The reviewer’s terminal rendered the code point as nothing visible. They concluded — confidently, with a detailed and well-argued report — that the sentinel was the empty string.

They were wrong. The character was a Unicode Private Use Area code point. It had no glyph in their rendering stack, but it was emphatically not empty. The code worked exactly as designed.

The shape of that mistake is the entire shape of the problem this post is about.

Why the existing literature does not apply

We are building devtee, which captures how engineering and operations teams use their computers and turns the capture into grounding for the agents they ship. The capture inevitably touches things the user did not intend to share — passwords, two-factor codes, email subject lines, document names, banking URLs. We strip those values on-device before they ever reach the data plane. The question this post is about is what to put in their place.

The existing literature on redaction — DLP systems, document redaction, medical-record de-identification — assumes the reader is a human and the document will be reviewed by an analyst before publication. Our reader is a language model. Our document is a stream of billions of events. Nobody is going to review each one.

That changes what makes a marker good. A redaction sentinel in computer-use data has to survive:

  1. Wire serialization without collapsing to a default value.
  2. Storage in a typed schema that does not encode “missing.”
  3. Inspection by a human in a terminal that may not have full Unicode rendering.
  4. Tokenization by language models with idiosyncratic vocabularies.
  5. Pipeline normalization by quality scorers, deduplicators, and prompt builders.
  6. Greppability in JSON, CSV, proto debug output, and database CLIs.

Most of the obvious choices fail at least one of these. The reviewer story is what happens when you optimize for (1) and (2) and underweight (3).
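
Several of these criteria can be checked mechanically. The following is a minimal sketch, not our production test suite: it round-trips a candidate marker through JSON and CSV and then asks whether the literal marker is still greppable in the serialized text. The field name "title" and all helper names are assumptions made for illustration.

```python
import csv
import io
import json

# Candidate marker: U+27EA + "redacted" + U+27EB.
MARKER = "\u27earedacted\u27eb"


def survives_json(value: str) -> bool:
    """True if the value is unchanged after a JSON encode/decode round trip."""
    return json.loads(json.dumps({"title": value}))["title"] == value


def survives_csv(value: str) -> bool:
    """True if the value is unchanged after a CSV write/read round trip."""
    buf = io.StringIO(newline="")
    csv.writer(buf).writerow(["title", value])
    buf.seek(0)
    return next(csv.reader(buf))[1] == value


def greppable_in_json(value: str, ensure_ascii: bool) -> bool:
    """True if the literal marker text appears in the serialized JSON."""
    return value in json.dumps({"title": value}, ensure_ascii=ensure_ascii)
```

With Python's default ensure_ascii=True, json.dumps writes the brackets as \u27ea and \u27eb escapes. The round trip is still lossless, but a grep for the literal marker in that output finds nothing, which is the kind of gap criterion (6) is meant to catch.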

What we tried, and the cost we underestimated

Our first version used a single Unicode Private Use Area code point, U+E088, as the sentinel for any redacted string field. The argument for it was clean. PUA characters are reserved by the Unicode standard for use by private agreement, will never be assigned standard meanings, and are not produced by ordinary keyboard input (vendor exceptions exist, such as Apple's logo character at U+F8FF, but U+E088 is not one of them). A PUA sentinel is impossible-in-domain. Fixed-length. Standards-blessed.

The argument was correct. The cost we underweighted is that PUA characters have no defined glyph. On any rendering surface that does not specifically know about your sentinel — terminals, database clients, debug logs, code review tools — the character renders as a tofu box, or as nothing at all.
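
The gap between "renders as nothing" and "is nothing" is easy to demonstrate. A minimal sketch, using only the standard library; the describe helper is a hypothetical name, not something from our codebase:

```python
import unicodedata

# The PUA code point used as the first-version sentinel.
SENTINEL = "\ue088"


def describe(s: str) -> str:
    """Render each character as U+XXXX/<general category>, so invisible ones show up."""
    return " ".join(f"U+{ord(ch):04X}/{unicodedata.category(ch)}" for ch in s)


# Not empty, even if the terminal draws nothing for it.
is_empty = SENTINEL == ""      # False
length = len(SENTINEL)         # 1

# "Co" is the Unicode general category for private-use characters.
summary = describe(SENTINEL)   # "U+E088/Co"

# ascii() escapes non-ASCII characters, which makes the literal reviewable.
escaped = ascii(SENTINEL)      # "'\\ue088'"
```

Writing the sentinel in source as the escape "\ue088" instead of the raw character would likely have spared the reviewer the confusion, since the escape is visible in any rendering stack.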

Invisibility is fine when the consumer of the data is your own code. It is not fine when the consumer is a human reviewer, a downstream tool you do not control, or — increasingly — a language model. PUA code points tokenize unpredictably across model families. The model that reads your data may not learn “this is a redaction marker” at all. It may learn nothing, or learn a spurious correlation with the unfamiliar token’s neighborhood.

A sentinel that can’t be read by its reader is not a sentinel. It is a hope.

A short tour of the alternatives

Drop the field, or set it to null. Safest in isolation. Destroys structure. Every downstream consumer has to handle “field missing” everywhere. In proto3, a scalar field without explicit presence that holds its default value is not serialized at all, so a redacted-to-default string is bit-for-bit identical on the wire to a never-set one. Downstream, you cannot tell that anything was scrubbed.
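
The indistinguishability can be illustrated without the protobuf library. The sketch below is a toy encoder in plain Python that imitates one behavior only: omitting fields equal to their default. The field names window_title and pid, and the DEFAULTS table, are assumptions for illustration.

```python
# Per-field defaults, imitating proto3 scalar defaults ("" for strings, 0 for ints).
DEFAULTS = {"window_title": "", "pid": 0}


def encode(event: dict) -> dict:
    """Toy encoder: drop any field whose value equals its default."""
    return {k: v for k, v in event.items() if v != DEFAULTS.get(k)}


# A field that was never populated.
never_set = encode({"pid": 42})

# A field that was captured and then scrubbed to the default value.
redacted_to_default = encode({"pid": 42, "window_title": ""})

# A field that was captured and then replaced by a non-default marker.
redacted_with_marker = encode({"pid": 42, "window_title": "\u27earedacted\u27eb"})
```

Here never_set and redacted_to_default encode to the same dictionary, while redacted_with_marker keeps the field and therefore keeps the evidence that something was scrubbed.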

Use the empty string. Same failure mode in different clothes. Real window titles can legitimately be empty (transient modals, system processes). Once "" is your sentinel, you have created a class of legitimately empty values indistinguishable from redacted ones.

Use an English marker like [REDACTED]. Unambiguous to a human, which is real progress. But [REDACTED] is a string a user can plausibly produce: a code reviewer annotating a diff, a bug tracker title, an email subject line with a bracketed label. Collision resistance matters at scale: across billions of events, the rare collision becomes a regular occurrence.

Use a bracketed, self-describing marker. This is the family we have converged on: a pair of rare bracket characters around an ASCII payload that names what was redacted. The serious candidates are:

  • ⟪redacted⟫ — mathematical double angle brackets (U+27EA / U+27EB). Distinctive, vanishingly rare in real input.
  • ❲redacted❳ — light tortoise shell bracket ornaments (U+2772 / U+2773). Similar properties, slightly heavier visual weight.
  • <<redacted>> — pure ASCII for maximum compatibility, at the cost of a higher collision rate.

We chose ⟪redacted⟫. The pattern matches a single regex, the payload tokenizes as common English, the brackets carry the “this is a sentinel” signal at a glance, and the shape extends cleanly: ⟪redacted⟫, ⟪redacted:keypress⟫, ⟪redacted:1password⟫, ⟪redacted:policy=v3⟫. The specific bracket pair is the kind of detail worth settling by convention across teams, and we would happily migrate to a different one if a shared standard emerged.
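
A minimal sketch of what "matches a single regex" and "extends cleanly" could look like. The grammar here (an optional detail after a colon, with the bracket characters forbidden inside the detail) is an assumption for illustration, not a settled wire format, and make_marker and find_markers are hypothetical names.

```python
import re
from typing import List, Optional

OPEN, CLOSE = "\u27ea", "\u27eb"  # U+27EA and U+27EB

# Matches the bare marker and the "redacted:<detail>" extensions.
MARKER_RE = re.compile(
    rf"{OPEN}redacted(?::(?P<detail>[^{OPEN}{CLOSE}]*))?{CLOSE}"
)


def make_marker(detail: Optional[str] = None) -> str:
    """Build a marker, optionally naming what was redacted or why."""
    if detail is None:
        return f"{OPEN}redacted{CLOSE}"
    if OPEN in detail or CLOSE in detail:
        raise ValueError("detail must not contain the bracket characters")
    return f"{OPEN}redacted:{detail}{CLOSE}"


def find_markers(text: str) -> List[Optional[str]]:
    """Return the detail of every marker in text (None for the bare marker)."""
    return [m.group("detail") for m in MARKER_RE.finditer(text)]
```

For example, find_markers applied to a string containing the bare marker followed by the policy=v3 variant returns [None, "policy=v3"]. Forbidding the brackets inside the detail keeps the match unambiguous and avoids nested markers.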

The agent-reader axis

The reason the older redaction literature does not transfer is that its reader is a human. Ours is a model. That changes three things.

Tokenization stability. A marker that tokenizes consistently across model families — into a small, fixed number of common tokens — gets learned as a coherent semantic unit. ASCII strings built from common words tokenize cleanly. PUA code points may receive a single rare token, or fall through to byte-level fallback, depending on the tokenizer. The variance is wide and the published evidence is thin.
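
The sketch below does not run any tokenizer. It only shows the raw UTF-8 bytes that a byte-level fallback would have to work with, which is one rough way to compare candidates; actual token counts depend on each model's vocabulary. The utf8_profile helper is a hypothetical name.

```python
def utf8_profile(s: str) -> dict:
    """Summarize a string's UTF-8 encoding: size and how much of it is ASCII."""
    data = s.encode("utf-8")
    return {
        "code_points": len(s),
        "utf8_bytes": len(data),
        "ascii_bytes": sum(b < 0x80 for b in data),
        "hex": data.hex(" "),
    }


pua = utf8_profile("\ue088")
# {'code_points': 1, 'utf8_bytes': 3, 'ascii_bytes': 0, 'hex': 'ee 82 88'}

bracketed = utf8_profile("\u27earedacted\u27eb")
# 10 code points, 14 bytes, 8 of them ASCII (the word "redacted")
```

The PUA sentinel offers a tokenizer nothing but three non-ASCII bytes. The bracketed marker carries eight bytes of ordinary English between its brackets, which is the part most likely to map onto common tokens.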

Self-description. A model that encounters ⟪redacted⟫ during training has a reasonable chance of associating the surface form with “this region was hidden.” A model that encounters a lone PUA character has no semantic context to attach. If we want the model to learn that redaction exists — and to behave correctly around redacted spans, not autocompleting them, not imagining plausible content — the marker has to carry its own definition.

Pipeline robustness. Training data passes through tokenizers, deduplicators, length filters, quality scorers, and prompt builders. Each stage may normalize, strip, or rewrite characters. ASCII-heavy markers survive these stages most reliably; pure-Unicode markers sometimes do not.
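
One way to probe this is to run each candidate through the kinds of transformations a pipeline stage might apply and record which ones leave it intact. A minimal sketch: the set of steps is an assumption chosen for illustration, not an inventory of any real pipeline.

```python
import unicodedata


def normalization_report(marker: str) -> dict:
    """Map each transformation name to True if it leaves the marker unchanged."""
    steps = {
        "NFC": lambda s: unicodedata.normalize("NFC", s),
        "NFKC": lambda s: unicodedata.normalize("NFKC", s),
        "casefold": str.casefold,
        "strip": str.strip,
        # A harsh stage that silently discards every non-ASCII character.
        "ascii_only": lambda s: s.encode("ascii", "ignore").decode("ascii"),
    }
    return {name: fn(marker) == marker for name, fn in steps.items()}


candidates = ["\ue088", "\u27earedacted\u27eb", "<<redacted>>", "[REDACTED]"]
reports = {c: normalization_report(c) for c in candidates}
```

In this sketch the pure-ASCII <<redacted>> survives every step, [REDACTED] is changed by case folding, and an ASCII-only stage erases the PUA sentinel entirely while reducing the bracketed marker to the bare word "redacted". The last case is worth testing for explicitly, because a degraded marker is harder to notice than a missing one.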

These three points are the argument for moving away from invisible single-codepoint markers. None of them appear in the older redaction literature. They are specific to the kind of dataset computer-use modeling produces.

What this post is not

Five adjacent questions deserve careful treatment and do not fit here:

  • Side channels. Even a fixed-length opaque sentinel still leaks the count and timing of redacted events. For us this is a feature; in other threat models it is a leak.
  • Type preservation across non-string fields. What is the redaction sentinel for an integer? A float? A timestamp?
  • Tier composition. How do you express “redacted because of an app-level drop list” versus “redacted because a content classifier flagged a credit card”?
  • Provenance. When data flows through multiple systems, each with its own redaction policy, can a downstream reader tell which system did what?
  • Reversibility. Most redactors are irreversible. Encrypted redaction has uses too.

Each gets its own post.

Where we are

A PUA code point is still our string sentinel in production, in front of real users. We have updated our position on the agent-reader axis and are migrating to ⟪redacted⟫ across the data plane. A follow-up post will cover the migration and the wire-compatibility protocol we settle on.

If you are building a computer-use dataset and have made redaction choices of your own, we would value comparing notes. The space is small, the work is not yet shared in public, and the conversations that move it forward are mostly happening one team at a time. We would like to change that.


If you are working on computer-use agents and want to compare notes, reach us at hello@devtee.com.