<?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom" ><generator uri="https://jekyllrb.com/" version="4.4.1">Jekyll</generator><link href="https://datadiary.dk/feed.xml" rel="self" type="application/atom+xml" /><link href="https://datadiary.dk/" rel="alternate" type="text/html" /><updated>2026-05-09T11:47:59+00:00</updated><id>https://datadiary.dk/feed.xml</id><title type="html">DataDiary</title><subtitle>Field notes from the data trenches — CJA, AEP, and beyond.</subtitle><author><name>martineijrgensen</name></author><entry><title type="html">Field Notes: Dashboard Debt</title><link href="https://datadiary.dk/2025/01/20/field-notes-dashboard-debt/" rel="alternate" type="text/html" title="Field Notes: Dashboard Debt" /><published>2025-01-20T00:00:00+00:00</published><updated>2025-01-20T00:00:00+00:00</updated><id>https://datadiary.dk/2025/01/20/field-notes-dashboard-debt</id><content type="html" xml:base="https://datadiary.dk/2025/01/20/field-notes-dashboard-debt/"><![CDATA[<p>We talk about technical debt in code. We don’t talk about it enough in dashboards.</p>

<p>I’ve spent the past few weeks auditing a CJA workspace that grew organically over two years — inherited from two analysts who’ve since moved on, added to by three different stakeholders with different definitions of “conversion,” and now used daily by a team who mostly trusts it implicitly.</p>

<p>What I found wasn’t broken. It was just quietly wrong in a dozen small ways.</p>

<h2 id="the-symptoms">The symptoms</h2>

<p>Panels with no owner. Metrics with names like <code class="language-plaintext highlighter-rouge">Conversion (v2 - FINAL)</code> and <code class="language-plaintext highlighter-rouge">Conversion (v2 - FINAL - USE THIS ONE)</code>. A date range filter applied to one panel but not the others, invisible unless you know to check. A calculated metric referencing a segment that was deprecated eight months ago — still returning data, just not the data anyone thinks it is.</p>

<p>None of this would fail a QA check. It all renders fine. The numbers look plausible. That’s what makes it dangerous.</p>

<h2 id="how-it-accumulates">How it accumulates</h2>

<p>Dashboard debt builds the same way code debt does: through small shortcuts taken under time pressure, each reasonable in isolation.</p>

<p>You duplicate a panel instead of refactoring it because the client meeting is in 20 minutes. You name the metric something temporary because you’ll “clean it up later.” You add a filter for a one-time campaign analysis and forget to remove it. Multiply by six analysts over two years and you get a workspace nobody fully understands.</p>

<p>The difference from code debt is that dashboards have no tests. There’s no CI pipeline that fails when a segment gets deprecated. No lint rule that flags an ambiguous metric name. No type system. The wrongness is silent.</p>

<h2 id="what-im-doing-about-it">What I’m doing about it</h2>

<p>A few things that are actually helping:</p>

<p><strong>Naming conventions enforced at the component level.</strong> Calculated metrics and segments now follow a <code class="language-plaintext highlighter-rouge">[Scope] Name (version)</code> pattern — e.g. <code class="language-plaintext highlighter-rouge">[Global] Sessions — Authenticated (v3)</code>. The scope prefix makes it immediately clear what data is included. Old versions get archived, not deleted, for 90 days.</p>
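<p>A convention like this can also be checked mechanically. As a rough sketch, assuming a hypothetical lint script that runs over a list of exported component names, a pattern along these lines would flag anything that skips the scope prefix or the version suffix. The first name below is a shortened, made-up example; the example name above matches too.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Pattern:   ^\[[A-Za-z]+\] .+ \(v[0-9]+\)$

Match:     [Global] Sessions (v3)
No match:  Conversion (v2 - FINAL)
No match:  Conversion (v2 - FINAL - USE THIS ONE)
</code></pre></div></div>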

<p><strong>Panel descriptions as contracts.</strong> Every panel now has a description that states: what question it answers, what the date range is, and what the primary segment is. Sounds bureaucratic. In practice it takes 90 seconds and has already caught two cases where a panel was answering a slightly different question than its title implied.</p>
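<p>As an illustration only, a description that follows this contract might look like the sketch below. The question, date range, and segment name are made up for the example.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Question:         How many authenticated sessions end in a conversion?
Date range:       Rolling last 30 days, set at panel level
Primary segment:  [Global] Authenticated Sessions (v3)
</code></pre></div></div>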

<p><strong>Quarterly workspace reviews.</strong> Thirty minutes, calendar’d, with whoever owns the workspace. We look at: what’s still being used (CJA has usage data), what’s changed in the underlying data that might affect calculations, what new segments or metrics have been added and whether they duplicate existing ones.</p>

<p>None of this is revolutionary. It’s the analytics equivalent of code comments and PR reviews. The fact that it feels like extra work is the problem — it should feel like the default.</p>

<h2 id="the-underlying-issue">The underlying issue</h2>

<p>Dashboards feel finished in a way that code doesn’t. When you ship a feature, everyone knows it’ll need maintenance. When you hand over a workspace, the implicit message is: here, it’s done.</p>

<p>It isn’t done. It’s just quiet.</p>]]></content><author><name>martineijrgensen</name></author><category term="Field Notes" /><summary type="html"><![CDATA[We talk about technical debt in code. We don’t talk about it enough in dashboards.]]></summary></entry><entry><title type="html">AEP Schema Design: Getting Identity Map Right</title><link href="https://datadiary.dk/2024/12/03/aep-schema-design-identity-map/" rel="alternate" type="text/html" title="AEP Schema Design: Getting Identity Map Right" /><published>2024-12-03T00:00:00+00:00</published><updated>2024-12-03T00:00:00+00:00</updated><id>https://datadiary.dk/2024/12/03/aep-schema-design-identity-map</id><content type="html" xml:base="https://datadiary.dk/2024/12/03/aep-schema-design-identity-map/"><![CDATA[<p>The identity map in Adobe Experience Platform is one of those things you configure once, get slightly wrong, and then wonder for months why your profile merge rates are lower than expected. This is what I’ve learned from fixing it in production.</p>

<h2 id="what-the-identity-map-actually-is">What the identity map actually is</h2>

<p>At its core, <code class="language-plaintext highlighter-rouge">identityMap</code> is a map of namespace keys to arrays of identity objects. Each identity object has three fields:</p>

<div class="language-json highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">{</span><span class="w">
  </span><span class="nl">"identityMap"</span><span class="p">:</span><span class="w"> </span><span class="p">{</span><span class="w">
    </span><span class="nl">"ECID"</span><span class="p">:</span><span class="w"> </span><span class="p">[</span><span class="w">
      </span><span class="p">{</span><span class="w">
        </span><span class="nl">"id"</span><span class="p">:</span><span class="w"> </span><span class="s2">"12345678901234567890"</span><span class="p">,</span><span class="w">
        </span><span class="nl">"primary"</span><span class="p">:</span><span class="w"> </span><span class="kc">false</span><span class="p">,</span><span class="w">
        </span><span class="nl">"authenticatedState"</span><span class="p">:</span><span class="w"> </span><span class="s2">"ambiguous"</span><span class="w">
      </span><span class="p">}</span><span class="w">
    </span><span class="p">],</span><span class="w">
    </span><span class="nl">"Email"</span><span class="p">:</span><span class="w"> </span><span class="p">[</span><span class="w">
      </span><span class="p">{</span><span class="w">
        </span><span class="nl">"id"</span><span class="p">:</span><span class="w"> </span><span class="s2">"user@example.com"</span><span class="p">,</span><span class="w">
        </span><span class="nl">"primary"</span><span class="p">:</span><span class="w"> </span><span class="kc">true</span><span class="p">,</span><span class="w">
        </span><span class="nl">"authenticatedState"</span><span class="p">:</span><span class="w"> </span><span class="s2">"authenticated"</span><span class="w">
      </span><span class="p">}</span><span class="w">
    </span><span class="p">]</span><span class="w">
  </span><span class="p">}</span><span class="w">
</span><span class="p">}</span><span class="w">
</span></code></pre></div></div>

<p>The namespace keys (ECID, Email) must match namespaces you’ve created in AEP’s Identity Service. Case matters — <code class="language-plaintext highlighter-rouge">email</code> and <code class="language-plaintext highlighter-rouge">Email</code> are different namespaces.</p>

<h2 id="the-primary-flag">The primary flag</h2>

<p>You can mark exactly one identity as primary per event. The primary identity is what gets used as the record key when the event lands in a dataset whose schema is enabled for Profile.</p>

<p>Common mistake: marking ECID as primary. ECID is a device identifier. If you use it as the primary identity and your merge policy treats it as highest priority, two different people sharing a device (family computer, kiosk, shared tablet) end up merged into one profile. Almost never what you want.</p>

<p><strong>Best practice:</strong> Set authenticated identities (Email, CRMID, loyalty number) as primary when available. Let ECID be a secondary, linking identity.</p>
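<p>For contrast with the earlier example, here is a sketch of an anonymous event. It assumes ECID is the only identity available on the event, so it has to carry the primary flag. The ID value is a placeholder.</p>

<div class="language-json highlighter-rouge"><div class="highlight"><pre class="highlight"><code>{
  "identityMap": {
    "ECID": [
      {
        "id": "12345678901234567890",
        "primary": true,
        "authenticatedState": "ambiguous"
      }
    ]
  }
}
</code></pre></div></div>

<p>Once the visitor logs in, the authenticated identity takes over the primary flag and ECID drops back to <code class="language-plaintext highlighter-rouge">"primary": false</code>, as in the first example.</p>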

<h2 id="authenticatedstate-values">authenticatedState values</h2>

<p>Three options: <code class="language-plaintext highlighter-rouge">authenticated</code>, <code class="language-plaintext highlighter-rouge">loggedOut</code>, <code class="language-plaintext highlighter-rouge">ambiguous</code>.</p>

<ul>
  <li><code class="language-plaintext highlighter-rouge">authenticated</code> — user has explicitly logged in during this session</li>
  <li><code class="language-plaintext highlighter-rouge">loggedOut</code> — user has explicitly logged out</li>
  <li><code class="language-plaintext highlighter-rouge">ambiguous</code> — no auth signal (most anonymous/ECID traffic)</li>
</ul>

<p>This field feeds into probabilistic and deterministic merge policies. If you’re sending everything as <code class="language-plaintext highlighter-rouge">ambiguous</code> because it’s the default, you’re leaving merge accuracy on the table for any authenticated traffic.</p>

<h2 id="namespace-priority-in-merge-policies">Namespace priority in merge policies</h2>

<p>The merge policy you attach to a schema determines which identity “wins” when two profile fragments conflict. Priority order is set in the UI: <strong>Profiles &gt; Merge Policies</strong>.</p>

<p>A sensible default priority stack:</p>
<ol>
  <li>CRMID (deterministic, authoritative)</li>
  <li>Email (deterministic, usually available post-login)</li>
  <li>ECID (device-level, lowest trust)</li>
</ol>

<p>If ECID is above Email in your merge policy and a user logs in from two devices, you may end up with split profiles instead of a merged one.</p>

<h2 id="schema-field-group-placement">Schema field group placement</h2>

<p><code class="language-plaintext highlighter-rouge">identityMap</code> is a standard XDM field — you don’t define it yourself. It’s included automatically when you enable a schema for Profile. But it only works correctly if your datastream / source connector is actually populating it.</p>

<p>One thing that catches people: if you’re using the Web SDK, ECID is populated automatically in the identity map. Your authenticated identities (Email, CRMID) are not: you need to set them explicitly, either through the <code class="language-plaintext highlighter-rouge">identityMap</code> option of <code class="language-plaintext highlighter-rouge">alloy("setConsent", ...)</code> or under <code class="language-plaintext highlighter-rouge">xdm.identityMap</code> in the options passed to <code class="language-plaintext highlighter-rouge">alloy("sendEvent", ...)</code>.</p>
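<p>For the <code class="language-plaintext highlighter-rouge">sendEvent</code> route, the options object could look roughly like this, written here as JSON. It is a sketch, not a complete event: the identity map is nested under <code class="language-plaintext highlighter-rouge">xdm</code>, the CRMID value is a placeholder, and the CRMID namespace is assumed to already exist in Identity Service.</p>

<div class="language-json highlighter-rouge"><div class="highlight"><pre class="highlight"><code>{
  "xdm": {
    "identityMap": {
      "CRMID": [
        {
          "id": "crm-000123",
          "primary": true,
          "authenticatedState": "authenticated"
        }
      ]
    }
  }
}
</code></pre></div></div>

<p>ECID does not need to appear in this object, since the Web SDK adds it on its own.</p>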

<h2 id="checking-your-identity-graph">Checking your identity graph</h2>

<p>After ingestion, you can inspect the resolved identity graph in the UI: <strong>Profiles &gt; Browse &gt; search by namespace and value</strong>. The graph view shows which identities have been linked and through which namespace.</p>

<p>If a known user’s email and ECID aren’t linked, the most common causes are:</p>
<ul>
  <li>Namespace name mismatch (capitalisation)</li>
  <li><code class="language-plaintext highlighter-rouge">primary</code> flag on the wrong identity</li>
  <li>Merge policy not yet applied to historical data (reprocessing required)</li>
</ul>

<p>Getting the identity map right at schema design time saves a lot of backfilling pain later.</p>]]></content><author><name>martineijrgensen</name></author><category term="AEP" /><summary type="html"><![CDATA[The identity map in Adobe Experience Platform is one of those things you configure once, get slightly wrong, and then wonder for months why your profile merge rates are lower than expected. This is what I’ve learned from fixing it in production.]]></summary></entry><entry><title type="html">CJA Derived Fields: Regex Replace in Practice</title><link href="https://datadiary.dk/2024/11/15/cja-derived-fields-regex/" rel="alternate" type="text/html" title="CJA Derived Fields: Regex Replace in Practice" /><published>2024-11-15T00:00:00+00:00</published><updated>2024-11-15T00:00:00+00:00</updated><id>https://datadiary.dk/2024/11/15/cja-derived-fields-regex</id><content type="html" xml:base="https://datadiary.dk/2024/11/15/cja-derived-fields-regex/"><![CDATA[<p>Derived fields in Customer Journey Analytics are one of those features that sound straightforward until you’re staring at a regex replace rule that refuses to behave. This is a short field note on the patterns that actually work in production.</p>

<h2 id="the-problem">The problem</h2>

<p>You’ve got a <code class="language-plaintext highlighter-rouge">page_url</code> dimension full of tracking parameters — UTMs, click IDs, session tokens — and marketing wants a clean <code class="language-plaintext highlighter-rouge">page_path</code> that strips everything after the <code class="language-plaintext highlighter-rouge">?</code>. Simple enough in SQL. In CJA’s derived field builder, slightly less obvious.</p>

<h2 id="the-regex-replace-approach">The regex replace approach</h2>

<p>The function you want is <strong>Regex Replace</strong>. The gotcha is that CJA’s regex engine is Java-based, so some shortcuts you’re used to from JavaScript or Python won’t behave the same way.</p>

<p>To strip query strings:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Pattern:  \?.*$
Replace:  (leave empty)
</code></pre></div></div>

<p>This matches the <code class="language-plaintext highlighter-rouge">?</code> character and everything that follows, then replaces it with nothing. Works reliably for stripping UTM parameters.</p>
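<p>A quick worked example, assuming standard Java regex replace semantics and a made-up URL value:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Input:    /products/shoes/red?utm_source=newsletter
Matched:  ?utm_source=newsletter
Output:   /products/shoes/red
</code></pre></div></div>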

<h2 id="capturing-groups">Capturing groups</h2>

<p>Where it gets interesting is when you want to <em>extract</em> something from the middle of a string rather than just strip a suffix. Say you want the first path segment from <code class="language-plaintext highlighter-rouge">/products/shoes/red</code>:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Pattern:  ^\/([^\/]+).*$
Replace:  $1
</code></pre></div></div>

<p>The <code class="language-plaintext highlighter-rouge">$1</code> backreference returns just the first captured group — in this case <code class="language-plaintext highlighter-rouge">products</code>. CJA supports <code class="language-plaintext highlighter-rouge">$1</code> through <code class="language-plaintext highlighter-rouge">$9</code> for group references.</p>
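<p>Traced step by step under standard Java regex semantics, the pattern consumes the whole input, so the replacement swaps the entire string for the first group. Without the trailing <code class="language-plaintext highlighter-rouge">.*$</code>, the rest of the path would be left in place.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Input:    /products/shoes/red
Matched:  /products/shoes/red   (group 1 = products)
Output:   products
</code></pre></div></div>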

<h2 id="case-insensitive-matching">Case-insensitive matching</h2>

<p>CJA’s regex replace doesn’t have a flag toggle in the UI. To match case-insensitively, use the inline flag at the start of your pattern:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>(?i)campaign
</code></pre></div></div>

<p>This matches <code class="language-plaintext highlighter-rouge">Campaign</code>, <code class="language-plaintext highlighter-rouge">CAMPAIGN</code>, <code class="language-plaintext highlighter-rouge">campaign</code>, etc.</p>

<h2 id="chaining-derived-fields">Chaining derived fields</h2>

<p>One derived field can reference another. This is useful when you want to build up a clean value in stages — first strip the query string, then extract the path segment, then recode known values with a lookup table. Each step is its own derived field, and they compose cleanly.</p>

<p>The build order matters: CJA evaluates them top-to-bottom in the field list, so make sure a field is defined before anything that references it.</p>
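<p>As a sketch of the first two stages, reusing the two patterns from earlier in this post: <code class="language-plaintext highlighter-rouge">page_section</code> is a hypothetical field name, the URL value is made up, and the lookup recode step is left out.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>1. page_path     (built from page_url)
   Pattern:  \?.*$
   Replace:  (leave empty)
   /products/shoes/red?utm_source=newsletter  becomes  /products/shoes/red

2. page_section  (built from page_path)
   Pattern:  ^\/([^\/]+).*$
   Replace:  $1
   /products/shoes/red  becomes  products
</code></pre></div></div>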

<h2 id="what-doesnt-work">What doesn’t work</h2>

<ul>
  <li>Lookaheads (<code class="language-plaintext highlighter-rouge">(?=...)</code>) — not supported</li>
  <li>Named capture groups (<code class="language-plaintext highlighter-rouge">(?&lt;name&gt;...)</code>) — use positional <code class="language-plaintext highlighter-rouge">$1</code> instead</li>
  <li>The <code class="language-plaintext highlighter-rouge">\s+</code> shorthand in character classes inside replace strings — spell it out as a space or use <code class="language-plaintext highlighter-rouge">[ \t]</code></li>
</ul>

<p>These limitations catch most people coming from a Python or JS regex background. Once you know them, the derived field builder is actually quite capable for the 80% of use cases that don’t need lookaheads.</p>]]></content><author><name>martineijrgensen</name></author><category term="CJA" /><summary type="html"><![CDATA[Derived fields in Customer Journey Analytics are one of those features that sound straightforward until you’re staring at a regex replace rule that refuses to behave. This is a short field note on the patterns that actually work in production.]]></summary></entry></feed>