A word that shows up in every pitch now

“Data journalism” appears in half the marketing decks a CMO reads this year, usually next to a stock photo of a server rack. Ask five vendors to define it and four will describe a chart. The actual definition is older and narrower than that, and it matters, because the narrow version is the one that still works.

You’re reading a blog post right now. Worth knowing up front: according to Ahrefs’ analysis of ChatGPT’s most-cited pages, blog and article content makes up only 1.9% of the pages ChatGPT cites most often. Wikipedia alone accounts for 29.7%. That’s not a reason to stop publishing. It’s the reason this piece is about the method, not the format.

What data journalism actually is

Paul Bradshaw’s definition, written for The Data Journalism Handbook, is the one that has held up. It combines “the traditional ‘nose for news’ and ability to tell a compelling story, with the sheer scale and range of digital information now available.” That’s a fusion, not a format: the reporter’s instinct for what matters, applied to a dataset large enough to hold a real finding.

The practice is not new. Long before anyone said “AI search,” newsrooms ran this exact play on paper records and court dockets. In 2010, reporters Marshall Allen and Alex Richards at the Las Vegas Sun analyzed 2.9 million Nevada hospital inpatient records. They found 3,689 cases of preventable harm meeting the state’s own definition of a reportable “sentinel event.” Hospitals had self-reported just 402 over that same period. That gap, computed from public records nobody else had assembled, prompted three draft bills in the 2011 Nevada Legislature. Nobody argued with an adjective. They argued with a number, because the number came with its method attached.

That’s the whole definition in one example: a claim only holds if you can see how it was built.

The definition also draws a line around two things people often confuse with it. A quarterly roundup built from other people’s public statistics is aggregation, genuinely useful, but somebody else already ran the analysis. An infographic designed to look shareable without a dataset behind it is a design exercise wearing the vocabulary. Both can be good content. Neither is data journalism, because neither one produces a finding nobody else could have published.

The method: four moves, same shape every time

Strip away the industry and the same four moves repeat. Acquire the underlying records, the ones nobody else has assembled this way: a court docket, an ad platform’s disclosure data, a state agency’s public dataset. Compute a finding: a real number that only exists once someone runs the actual analysis, not a summary of numbers other people already published. Show the method (the sample size, the date, the source) in a linked methodology note, not buried in a PDF nobody opens. Publish the receipt where anyone (a reporter, an editor, an AI system reading the page) can check it, which is what turns a claim into a citation.

Skip any one of those four and what’s left is a blog post with a chart pasted into it. The chart doesn’t make it data journalism. The checkable method does.
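
For readers who want the four moves in concrete form, here is a minimal Python sketch. It is an illustration, not the Las Vegas Sun’s actual pipeline: the `Finding` class and the `reporting_gap` function are hypothetical names invented for this example, and the only real inputs are the figures quoted earlier in this piece. The point is the shape, a number that cannot be created without its sample size, date, source, and method traveling with it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    """A computed number that carries its own receipt (hypothetical structure)."""
    claim: str
    value: float
    sample_size: int
    published: str
    source: str
    method: str


def reporting_gap(found: int, self_reported: int, records: int,
                  published: str, source: str) -> Finding:
    """Compute how many times more cases an analysis found than were self-reported."""
    if self_reported <= 0:
        raise ValueError("self_reported must be positive to form a ratio")
    ratio = found / self_reported
    return Finding(
        claim=f"Analysis found {ratio:.1f}x the self-reported cases",
        value=round(ratio, 1),
        sample_size=records,
        published=published,
        source=source,
        method=f"{found:,} cases identified / {self_reported:,} self-reported",
    )


# Inputs are the figures quoted earlier in this piece; nothing else is assumed.
finding = reporting_gap(
    found=3689,
    self_reported=402,
    records=2_900_000,
    published="2010",
    source="Las Vegas Sun, Do No Harm: Hospital Care in Las Vegas",
)
print(finding.claim)
print(f"n={finding.sample_size:,} | {finding.published} | "
      f"method: {finding.method} | source: {finding.source}")
```

Making the receipt fields required is the design choice that matters here: the claim cannot be built without them, which is the code version of not burying the method in a PDF.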

Most published content earns no links at all. BuzzSumo and Majestic analyzed 1 million posts in 2016 and found 70% of them picked up zero external links, from anyone, ever. The content marketing industry has noticed. In Orbit Media’s 2025 survey of 808 content marketers, 49% of content programs now publish original research, nearly double the 25% that did in 2018.

The reason isn’t complicated. A writer looking for something to cite can link to an opinion, or link to a number with a dataset behind it. Most publishers write the opinion because it’s faster. The number is what survives, because it’s the only thing on the page another writer can’t just rewrite in their own words and claim as their own insight. An original finding is the one asset a competitor can’t copy without doing the work themselves, which is a different foundation from ranking on keyword coverage alone.

For a marketing decision maker in a fragmented field like law firm marketing, that difference compounds fast. A post explaining how AI search works competes against a few hundred nearly identical versions of the same explainer, all published within months of each other. A study measuring what’s actually happening in one market has no competing version, because nobody else ran that analysis on that market.

Why the AI-search era raises the stakes

Here’s the part that changes the math again. Internal Taqtics keyword research shows that search interest in both “data journalism” and its entry-point phrase, “what is data journalism,” is still modest in absolute terms but climbing. That’s a market asking the definitional question before it asks the commercial one, which is usually the earliest signal that a category is about to get more competitive, not less.

Ahrefs analyzed 863,000 keyword SERPs against 4 million AI Overview URLs and found 38% of the pages Google’s AI Overviews cite also rank in the top 10 organically. The other 62% don’t. In fact, 36.7% of cited pages rank beyond position 100 or don’t rank at all. Being citable and being ranked are two different things now, measured separately, and a page can win one without the other. In legal search specifically, where AI Overviews now show up on the large majority of queries, that gap is the whole ballgame.
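
To see what “measured separately” means in practice, here is a small Python sketch of that kind of comparison. It is not Ahrefs’ code or methodology. The function name `citation_rank_split`, the example URLs, and the rank values are all hypothetical, made up purely to show the bucketing logic.

```python
def citation_rank_split(cited_urls, rank_by_url):
    """Share of cited URLs that rank in the top 10, at 11-100, or beyond 100 / not at all."""
    buckets = {"top_10": 0, "11_to_100": 0, "beyond_100_or_unranked": 0}
    for url in cited_urls:
        rank = rank_by_url.get(url)  # None means the URL does not rank at all
        if rank is None or rank > 100:
            buckets["beyond_100_or_unranked"] += 1
        elif rank <= 10:
            buckets["top_10"] += 1
        else:
            buckets["11_to_100"] += 1
    total = len(cited_urls)
    if total == 0:
        return {}
    return {name: count / total for name, count in buckets.items()}


# Hypothetical example data: five cited URLs and their organic positions.
cited = ["a.example/study", "b.example/guide", "c.example/faq",
         "d.example/report", "e.example/notes"]
ranks = {"a.example/study": 3, "b.example/guide": 7,
         "c.example/faq": 45, "d.example/report": 140}  # e.example is unranked

shares = citation_rank_split(cited, ranks)
for bucket, share in shares.items():
    print(f"{bucket}: {share:.0%}")
```

The two inputs stay separate on purpose: one list records what the AI answer cited, the other records where pages rank, and the finding is the overlap between them.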

Legal AI citations, source split

Two out of three legal AI citations point to a firm’s own pages, not a directory and not a listicle.

Share of legal AI citations by source type:
A firm’s own pages: 67%
Directories: 21%
Earned press and listicles: 11%

Taqtics legal AI-citation study, n=236

That split matters because it runs against the assumption most vendors sell on, that a firm needs a directory profile or a press hit to show up in an AI answer. In legal search specifically, the majority of citations already point back to a firm’s own pages, the ones a firm actually controls. The bar for getting cited by an AI system isn’t ownership of a listing. It’s ownership of a finding.
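
A sample size is only useful if a reader can do something with it, so here is a hedged Python sketch of the check a skeptical reader might run on the 67% figure. It assumes the 236 citations behave like a simple random sample of independent observations, which this piece does not establish, and it uses the standard normal approximation for a proportion. The function name `share_margin` is hypothetical, and the result is a back-of-envelope estimate, not a number published by the study.

```python
import math


def share_margin(share: float, n: int, z: float = 1.96) -> float:
    """Normal-approximation margin of error for a sample proportion.

    Assumes a simple random sample of independent observations,
    which is an assumption of this sketch, not a claim about the study.
    """
    if not 0 <= share <= 1:
        raise ValueError("share must be between 0 and 1")
    if n <= 0:
        raise ValueError("n must be positive")
    return z * math.sqrt(share * (1 - share) / n)


# 67% of citations pointing to a firm's own pages, n=236 (figures from the study above).
margin = share_margin(0.67, 236)
print(f"67% plus or minus {margin * 100:.1f} points at roughly 95% confidence")
```

Under those assumptions the margin comes out near six points, so the “majority” reading holds across the whole interval. That is the kind of check a published sample size makes possible.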

The bar is also genuinely high, and rehashed content doesn’t clear it. Originality.AI studied 29,000 health, finance, legal, and politics queries and found 10.4% of AI Overview citations were themselves AI-generated content. Citations pulled from outside the top 100 organic results were more likely to be AI-generated than citations from top-ranked pages: 12.8% against 7.7%. Read plainly: a measurable share of what’s getting cited, even in high-stakes categories, is thin. That’s not a reason to write thinner content faster. It’s the gap a sourced, computed finding walks straight into.

The same method, six markets

The four moves hold regardless of the subject. Our own study of legal AI citations ran the method on a live, high-stakes category: real queries, a real sample, a shown source split, published where the method sits next to the finding.

The same shape built the rest of the desk. One study traces where legal advertising dollars actually flow across markets, not where the industry assumes they flow. Another tests whether mass-tort ad spend predicts new case filings before the filings show up in a court docket. A third measures the real gap between marketing spend and a signed case, the number most agencies never publish because it’s inconvenient. A fourth tracks where connected TV advertising lanes are closing, market by market, using actual inventory data instead of a sales deck. A fifth checks whether a firm’s entity presence predicts whether AI systems convert a query into a citation.

Six different questions, the same four moves behind every one: acquire the records, compute the finding, show the method, publish the receipt. Every one of them sits on the studies index, cataloged the same way, with the same methodology disclosure attached to each.

What a marketing decision maker actually does with this

None of this argues against a normal content calendar. It argues against betting the calendar’s whole job on restating what ten other pages already say. A firm publishing four explainer posts a month and one real study a quarter is running a different portfolio than a firm publishing twenty explainer posts and zero studies, even if the second firm’s word count is higher.

The starting material is usually already sitting inside the business. Intake records, ad spend by channel, and case or client outcomes over time are the kind of data a firm already owns but has never published as a finding. Most firms don’t need a new data collection effort. They need someone willing to run the numbers already on hand and show the method behind them.

The shift in what AI search rewards makes the choice sharper, not different in kind. If citability now runs on a separate track from ranking, the asset that wins both tracks at once is the one with a checkable method attached to it, the kind that earns a link because another writer genuinely can’t get the number anywhere else, and earns a citation because an AI system can trace the claim back to a primary source instead of a paraphrase.

See what a study would look like in your market.

References

  1. Paul Bradshaw, The Data Journalism Handbook (European Journalism Centre).
  2. BuzzSumo and Majestic. "What Makes People Share Content? New Research From BuzzSumo & Majestic." November 1, 2016.
  3. Orbit Media. "Blogging Statistics: 13 Years of Reader Survey Data." August 27, 2025.
  4. Ahrefs. "AI Overviews Cite Pages Outside the Top 10 More Than You'd Think." March 2, 2026.
  5. Ahrefs. "We Analyzed ChatGPT's Most-Cited Pages." October 28, 2025.
  6. Originality.AI. "Are AI Overviews Citing AI-Generated Content?" October 28, 2025.
  7. Marshall Allen and Alex Richards. "Do No Harm: Hospital Care in Las Vegas." Las Vegas Sun, 2010.
  8. Taqtics legal AI-citation study, n=236.