DataDiary

Field Notes: Dashboard Debt

2025-01-20T00:00:00+00:00

We talk about technical debt in code. We don’t talk about it enough in dashboards.

I’ve spent the past few weeks auditing a CJA workspace that grew organically over two years — inherited from two analysts who’ve since moved on, added to by three different stakeholders with different definitions of “conversion,” and now used daily by a team who mostly trusts it implicitly.

What I found wasn’t broken. It was just quietly wrong in a dozen small ways.

The symptoms

Panels with no owner. Metrics with names like Conversion (v2 - FINAL) and Conversion (v2 - FINAL - USE THIS ONE). A date range filter applied to one panel but not the others, invisible unless you know to check. A calculated metric referencing a segment that was deprecated eight months ago — still returning data, just not the data anyone thinks it is.

None of this would fail a QA check. It all renders fine. The numbers look plausible. That’s what makes it dangerous.

How it accumulates

Dashboard debt builds the same way code debt does: through small shortcuts taken under time pressure, each reasonable in isolation.

You duplicate a panel instead of refactoring it because the client meeting is in 20 minutes. You name the metric something temporary because you’ll “clean it up later.” You add a filter for a one-time campaign analysis and forget to remove it. Multiply by six analysts over two years and you get a workspace nobody fully understands.

The difference from code debt is that dashboards have no tests. There’s no CI pipeline that fails when a segment gets deprecated. No lint rule that flags an ambiguous metric name. No type system. The wrongness is silent.

What I’m doing about it

A few things that are actually helping:

Naming conventions enforced at the component level. Calculated metrics and segments now follow a [Scope] Name (version) pattern — e.g. [Global] Sessions — Authenticated (v3). The scope prefix makes it immediately clear what data is included. Old versions get archived, not deleted, for 90 days.
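
Since nothing in the tool lints names for you, a small script over an exported list of component names can stand in. This is a minimal sketch, not a CJA feature: the regular expression is my own reading of the [Scope] Name (version) pattern, and the list of names is assumed to come from whatever export you already have.

```python
import re

# Hypothetical encoding of the "[Scope] Name (vN)" convention described above.
NAME_PATTERN = re.compile(
    r"^\[(?P<scope>[A-Za-z][A-Za-z0-9 ]*)\] (?P<name>.+) \(v(?P<version>\d+)\)$"
)


def check_component_names(names):
    """Return the names that do not follow the [Scope] Name (vN) convention."""
    return [name for name in names if NAME_PATTERN.match(name) is None]


# Sample names; in practice these would come from an export of the workspace.
sample = [
    "[Global] Sessions - Authenticated (v3)",
    "Conversion (v2 - FINAL)",
    "Conversion (v2 - FINAL - USE THIS ONE)",
]
violations = check_component_names(sample)
print(violations)
```

Run on a schedule, something like this gives you the lint rule that dashboards otherwise lack.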

Panel descriptions as contracts. Every panel now has a description that states: what question it answers, what the date range is, and what the primary segment is. Sounds bureaucratic. In practice it takes 90 seconds and has already caught two cases where a panel was answering a slightly different question than its title implied.

Quarterly workspace reviews. Thirty minutes, calendar’d, with whoever owns the workspace. We look at what’s still being used (CJA has usage data), what’s changed in the underlying data that might affect calculations, and which new segments or metrics have been added and whether they duplicate existing ones.

None of this is revolutionary. It’s the analytics equivalent of code comments and PR reviews. The fact that it feels like extra work is the problem — it should feel like the default.

The underlying issue

Dashboards feel finished in a way that code doesn’t. When you ship a feature, everyone knows it’ll need maintenance. When you hand over a workspace, the implicit message is: here, it’s done.

It isn’t done. It’s just quiet.

AEP Schema Design: Getting Identity Map Right

2024-12-03T00:00:00+00:00

The identity map in Adobe Experience Platform is one of those things you configure once, get slightly wrong, and then wonder for months why your profile merge rates are lower than expected. This is what I’ve learned from fixing it in production.

What the identity map actually is

At its core, identityMap is a map of namespace keys to arrays of identity objects. Each identity object has three fields:

{
  "identityMap": {
    "ECID": [
      {
        "id": "12345678901234567890",
        "primary": false,
        "authenticatedState": "ambiguous"
      }
    ],
    "Email": [
      {
        "id": "user@example.com",
        "primary": true,
        "authenticatedState": "authenticated"
      }
    ]
  }
}

The namespace keys (ECID, Email) must match namespaces you’ve created in AEP’s Identity Service. Case matters — email and Email are different namespaces.

The primary flag

You can mark exactly one identity as primary per event. The primary identity is what gets used as the record key when the event lands in a dataset with profile-enabled schemas.

Common mistake: marking ECID as primary. ECID is a device identifier — if you use it as the primary identity and merge policy treats it as highest priority, two different people sharing a device (family computer, kiosk, shared tablet) end up merged into one profile. Almost never what you want.

Best practice: Set authenticated identities (Email, CRMID, loyalty number) as primary when available. Let ECID be a secondary, linking identity.
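
The selection rule can be sketched in a few lines of Python. This is an illustration under assumptions, not Adobe's code: build_identity_map is a hypothetical helper, the CRMID and Email namespaces must already exist in your sandbox, and treating a known email or CRM ID as "authenticated" is a simplification you should check against your own login logic.

```python
def build_identity_map(ecid, email=None, crm_id=None):
    """Build an identityMap payload with exactly one primary identity.

    Prefers CRMID, then Email, and falls back to ECID only when no
    authenticated identity is available.
    """
    identity_map = {
        "ECID": [{"id": ecid, "primary": False, "authenticatedState": "ambiguous"}]
    }
    # Assumption: a known email or CRM ID means the user is logged in.
    if crm_id:
        identity_map["CRMID"] = [
            {"id": crm_id, "primary": False, "authenticatedState": "authenticated"}
        ]
    if email:
        identity_map["Email"] = [
            {"id": email, "primary": False, "authenticatedState": "authenticated"}
        ]
    # Mark the first available namespace in preference order as primary.
    for namespace in ("CRMID", "Email", "ECID"):
        if namespace in identity_map:
            identity_map[namespace][0]["primary"] = True
            break
    return {"identityMap": identity_map}


anonymous = build_identity_map("12345678901234567890")
known = build_identity_map("12345678901234567890", email="user@example.com")
```

For anonymous traffic ECID ends up primary because nothing else is available; as soon as an email is present, ECID drops back to a linking identity.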

authenticatedState values

Three options: authenticated, loggedOut, ambiguous.

authenticated — user has explicitly logged in during this session
loggedOut — user has explicitly logged out
ambiguous — no auth signal (most anonymous/ECID traffic)

This field feeds into probabilistic and deterministic merge policies. If you’re sending everything as ambiguous because it’s the default, you’re leaving merge accuracy on the table for any authenticated traffic.

Namespace priority in merge policies

The merge policy you attach to a schema determines which identity “wins” when two profile fragments conflict. Priority order is set in the UI: Profiles > Merge Policies.

A sensible default priority stack:

CRMID (deterministic, authoritative)
Email (deterministic, usually available post-login)
ECID (device-level, lowest trust)

If ECID is above Email in your merge policy and a user logs in from two devices, you may end up with split profiles instead of a merged one.

Schema field group placement

identityMap is a standard XDM field — you don’t define it yourself. It’s included automatically when you enable a schema for Profile. But it only works correctly if your datastream / source connector is actually populating it.

One thing that catches people: if you’re using the Web SDK, ECID is populated automatically in the identity map. But your authenticated identities (Email, CRMID) need to be set explicitly, in the xdm.identityMap object of the options passed to alloy("sendEvent", ...).

Checking your identity graph

After ingestion, you can inspect the resolved identity graph in the UI: Profiles > Browse > search by namespace and value. The graph view shows which identities have been linked and through which namespace.

If a known user’s email and ECID aren’t linked, the most common causes are:

Namespace name mismatch (capitalisation)
primary flag on the wrong identity
Merge policy not yet applied to historical data (reprocessing required)
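
The first two causes can be caught before ingestion with a simple payload check. The sketch below is a hypothetical helper, not part of any Adobe SDK; known_namespaces is assumed to be a set of namespace names you maintain by hand to mirror what exists in Identity Service.

```python
VALID_AUTH_STATES = {"authenticated", "loggedOut", "ambiguous"}


def check_identity_map(identity_map, known_namespaces):
    """Return a list of human-readable problems found in an identityMap dict."""
    problems = []
    by_lowercase = {ns.lower(): ns for ns in known_namespaces}
    primary_namespaces = []
    for namespace, identities in identity_map.items():
        if namespace not in known_namespaces:
            match = by_lowercase.get(namespace.lower())
            if match:
                problems.append(
                    f"namespace '{namespace}' differs only in case from '{match}'"
                )
            else:
                problems.append(f"unknown namespace '{namespace}'")
        for identity in identities:
            if identity.get("primary"):
                primary_namespaces.append(namespace)
            if identity.get("authenticatedState") not in VALID_AUTH_STATES:
                problems.append(f"invalid authenticatedState in '{namespace}'")
    if len(primary_namespaces) != 1:
        problems.append(
            f"expected exactly one primary identity, found {len(primary_namespaces)}"
        )
    elif primary_namespaces[0] == "ECID" and len(identity_map) > 1:
        problems.append("ECID is primary although another identity is present")
    return problems


payload = {
    "email": [{"id": "user@example.com", "primary": True, "authenticatedState": "authenticated"}],
    "ECID": [{"id": "12345678901234567890", "primary": False, "authenticatedState": "ambiguous"}],
}
problems = check_identity_map(payload, {"ECID", "Email", "CRMID"})
print(problems)
```

The third cause, unprocessed historical data, is not something a payload check can see; that one still needs a look in the UI.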

Getting the identity map right at schema design time saves a lot of backfilling pain later.

CJA Derived Fields: Regex Replace in Practice

2024-11-15T00:00:00+00:00

Derived fields in Customer Journey Analytics are one of those features that sound straightforward until you’re staring at a regex replace rule that refuses to behave. This is a short field note on the patterns that actually work in production.

The problem

You’ve got a page_url dimension full of tracking parameters — UTMs, click IDs, session tokens — and marketing wants a clean page_path that strips everything after the ?. Simple enough in SQL. In CJA’s derived field builder, slightly less obvious.

The regex replace approach

The function you want is Regex Replace. The gotcha is that CJA supports only a subset of Perl-style regex syntax, so some shortcuts you’re used to from JavaScript or Python won’t behave the same way.

To strip query strings:

Pattern:  \?.*$
Replace:  (leave empty)

This matches the ? character and everything that follows, then replaces it with nothing. Works reliably for stripping UTM parameters.
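
It helps to try a pattern against sample URLs before pasting it into the rule builder. Here is a minimal sketch using Python's re module as a stand-in; it is not CJA's engine, so treat a passing check as a sanity test only. The sample URL is made up.

```python
import re


def strip_query(url):
    """Remove the first '?' and everything after it."""
    return re.sub(r"\?.*$", "", url)


cleaned = strip_query("/products/shoes?utm_source=news&gclid=abc")
print(cleaned)  # /products/shoes
```

A URL without a query string passes through unchanged, which is the behaviour you want from the derived field as well.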

Capturing groups

Where it gets interesting is when you want to extract something from the middle of a string rather than just strip a suffix. Say you want the first path segment from /products/shoes/red:

Pattern:  ^\/([^\/]+).*$
Replace:  $1

The $1 backreference returns just the first captured group — in this case products. CJA supports $1 through $9 for group references.
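
The same pattern can be checked locally. In this sketch Python's re module stands in for CJA's engine, with one syntax difference to keep in mind: Python writes the backreference as \1 where the derived field builder uses $1. The forward slashes are left unescaped because Python does not need them escaped.

```python
import re


def first_path_segment(path):
    """Return the first path segment, or the input unchanged if none matches."""
    # Match the whole string and keep only capture group 1.
    return re.sub(r"^/([^/]+).*$", r"\1", path)


segment = first_path_segment("/products/shoes/red")
print(segment)  # products
```

Because the pattern consumes the entire string, the replacement leaves nothing but the captured segment behind.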

Case-insensitive matching

CJA’s regex replace doesn’t have a flag toggle in the UI. To match case-insensitively, use the inline flag at the start of your pattern:

(?i)campaign

This matches Campaign, CAMPAIGN, campaign, etc.
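
Python accepts the same inline flag at the start of a pattern, so the behaviour is easy to confirm outside CJA. As before, this is a stand-in check with made-up sample values, not a test of CJA itself.

```python
import re

# Inline flag at the very start of the pattern, as in the rule above.
pattern = re.compile(r"(?i)campaign")

samples = ["Campaign", "CAMPAIGN", "campaign", "camp"]
matches = [bool(pattern.search(value)) for value in samples]
print(matches)  # [True, True, True, False]
```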

Chaining derived fields

One derived field can reference another. This is useful when you want to build up a clean value in stages — first strip the query string, then extract the path segment, then recode known values with a lookup table. Each step is its own derived field, and they compose cleanly.

The build order matters: CJA evaluates them top-to-bottom in the field list, so make sure a field is defined before anything that references it.
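
The staged approach maps neatly onto function composition. The sketch below mirrors the three steps in Python, with one function per derived field; the SECTION_LABELS lookup and its values are hypothetical, and the regexes are the ones from earlier in this note.

```python
import re

# Hypothetical lookup table for the final recode step.
SECTION_LABELS = {"products": "Catalogue", "checkout": "Checkout"}


def strip_query(url):
    """Step 1: drop the query string."""
    return re.sub(r"\?.*$", "", url)


def first_segment(path):
    """Step 2: keep only the first path segment."""
    return re.sub(r"^/([^/]+).*$", r"\1", path)


def recode(segment):
    """Step 3: map known segments to labels, everything else to 'Other'."""
    return SECTION_LABELS.get(segment, "Other")


def page_section(url):
    """Compose the three steps in dependency order."""
    return recode(first_segment(strip_query(url)))


section = page_section("/products/shoes/red?utm_source=news")
print(section)  # Catalogue
```

Keeping each stage separate also makes it obvious which one is at fault when the final value looks wrong.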

What doesn’t work

Lookaheads ((?=...)): not supported
Named capture groups ((?<name>...)): use positional $1 instead
The \s+ shorthand in character classes inside replace strings: spell it out as a space or use [ \t]
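
Many lookahead patterns can be rewritten as a capture group plus a replace. The sketch below shows one such rewrite in Python, again only as a stand-in for CJA's engine. The /order/...-confirm URL shape and the extract_order_id name are invented for illustration, and Python's \1 corresponds to $1 in the derived field builder.

```python
import re


def extract_order_id(path):
    """Return the digits before '-confirm', or the input unchanged if absent."""
    # Lookahead version (unavailable per the list above): [0-9]+(?=-confirm)
    # Rewrite: match the whole string and keep only capture group 1.
    return re.sub(r"^[^0-9]*([0-9]+)-confirm.*$", r"\1", path)


order_id = extract_order_id("/order/12345-confirm?step=2")
print(order_id)  # 12345
```

The rewrite uses only positional groups and explicit character classes, so it avoids the first two limitations in the list above.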

These limitations catch most people coming from a Python or JS regex background. Once you know them, the derived field builder is actually quite capable for the 80% of use cases that don’t need lookaheads.