Methodology · Published in the open

How we test newsletter platforms

We publish our methodology because comparison sites that hide their process are usually hiding shortcuts. Here is exactly what we measure, how often, and what triggers a refresh.

Pricing verification

Every vendor in our coverage has pricing verified within the last 90 days. The verification process for each vendor :

Pull pricing directly from the vendor's official pricing page (e.g. beehiiv.com/pricing).
Cross-validate against at least one independent aggregator if the vendor uses slider-based pricing UI (Beehiiv, Kit, Mailerlite).
Record the verification date in our pricing data layer, per vendor.
Re-verify every quarter as a minimum, faster if a vendor announces a pricing change.

An automated weekly scraper re-checks every vendor's pricing page and flags any change. Each data point carries its verification date in our pricing layer, and anything past our refresh window gets re-verified before it ships. The freshness check is automated, not a manual promise.

Practical consequence : you will not see pricing on OwnLetter that was last verified in 2024.

Claims ledger

Every sensitive claim on every page passes through a claims ledger before publish. Sensitive claims include : pricing, fees, commissions, break-even calculations, plan limits, dates, real-world case examples, migration paths, legal compliance statements, and strong recommendations ("best", "wins", "cheaper").

For each sensitive claim, the ledger records :

The verbatim claim text as it appears on the page.
The primary source URL we verified the claim against.
The verification date.
The verification status : verified against a primary source, flagged for re-research, removed from the narrative, or kept as reasoned editorial judgment (noting which verified facts back it).

Pages do not ship until every sensitive claim is either verified or backed by documented editorial judgment. No round numbers pulled from competitor blogs. No "industry experts say" without naming them.

First-hand testing

When we say "we tested" on a page, here is what that means :

We opened a real account on the platform (free tier where available, paid tier when needed to test paid-only features).
We ran the workflow we describe in the article (e.g. import a CSV, set up a paid subscription, send a test broadcast).
We captured the test date and the platform version we tested.
For deliverability, we score the features each platform offers (authentication, custom sending domains, dedicated IPs) and summarise what real users report. First-hand inbox-placement testing runs through an opt-in panel we are building, published with full methodology once the sample is large enough to be honest, never a single-account number.

If we have not first-hand tested something, we will not write "we tested". We will write "according to the vendor's documentation" or "reported by users on Reddit" with a link.

How we score (the OwnLetter Score)

The OwnLetter Score at the top of a review is our editorial verdict, on a 0–10 scale, scored the way a good product review works: we rate eight things that matter to a newsletter creator, then average them. Each score starts from a number our own data computes automatically — our feature-by-feature audit weighted by a sourced quality layer, the live pricing, the experiential layer (G2/Capterra ease and support ratings), our platform-trust audit, and our first-hand test — which an editor can then nudge only within a tight, published band, with the reason shown. The data does the first pass; we sign off, and we publish the one-line basis behind every score so you can check it.

Two things it is not. It is not a lab benchmark: we don't claim to have stress-tested deliverability in a controlled rig; where we haven't measured something, we say so and don't score it on a guess. And it is not the users' own rating: those are the G2 / Capterra / Trustpilot scores shown separately in the “What real users say” block on each review. Our score can differ from the crowd's, and when it does we say why. To keep our own bias in check, every scorecard is cross-checked against independent AI models before it ships.

Commission is never an input. A platform is scored on what it does and how it feels to use, never on what it pays us. Here is the exact weighting, published so you can check our work:

Criterion	Weight	Grounded in
Monetization	17%	Feature audit + first-hand
Value for money	15%	Pricing + free-tier reach
Deliverability & analytics	14%	Feature audit
Platform trust & ownership	13%	Terms-of-service & risk audit
Ease of use & editor	12%	G2/Capterra + first-hand
Growth & audience	11%	Feature audit
Automation & workflows	10%	Feature audit + first-hand
Support	8%	Review themes / G2-Capterra

When we have no defensible basis for a criterion on a given platform, we leave it out and the remaining weights rebalance — a platform is never docked for a gap in our data, only for something it genuinely lacks or does poorly. Value for money is scored here as its own criterion (free-tier reach plus entry price), so a powerful-but-pricey tool and a cheap-but-limited one are both judged on the same honest terms.

Does our ranking just track who pays us?We checked, because we earn an affiliate commission from eight of the eleven platforms — Substack, Mailchimp, and WordPress pay us nothing. Our top picks do correlate with the more generous affiliate programs, but that is because the most capable creator platforms are also the best-funded, not because a commission moves a score (it never enters the math). To prove it is not our weighting either, we re-scored every platform under thousands of alternative weightings: the order barely moves, our published weights land in the middle of that range rather than at a setting that flatters anyone, and the lowest-scoring platforms stay lowest however you weight them — including the one that pays us nothing, which scores low for the exact reason this site exists: on it, you do not own your audience.

Cross-vendor review aggregation

Headline numbers like "10,810+ reviews aggregated" reflect actual review collection across G2, Capterra, Trustpilot, iOS App Store, and Reddit threads. We aggregate, we cite the source for each verbatim quote we surface, and we publish the underlying counts per vendor.

We do not write fake reviews. We do not pay reviewers. We do not cherry-pick negative reviews to make a competing vendor look better when we have affiliate commission on the alternative. If a vendor has 4.5/5 average on G2 across 200 reviews, we will say so even if their commission rate is lower than the alternative we cover.

Refresh cadences

Different data ages at different rates. Our refresh policy :

Data type	Maximum age	Trigger faster refresh
Vendor pricing	90 days	Vendor announces pricing change
Feature availability	180 days	Vendor launches major feature
User-reported deliverability signals	12 months	New inbox-issue reports surface
Affiliate commission rates	90 days	Vendor changes program terms
Methodology pages	365 days	Process change

We record the verification date for every pricing point in our data layer and re-verify on the cadence above, so a number is never older than its stated window.

When we get it wrong

We will get things wrong. Vendors change pricing without notice. Affiliate programs add restrictions mid-quarter. A claims ledger entry slips through review. When that happens :

Email us at contact@ownletter.com — we read everything.
We fix within 48 hours when the correction is verifiable.
We log the correction in the page's changelog (visible in page metadata).

We do not silently rewrite history. If a recommendation flipped because pricing changed, you will see the old recommendation and the new one with the date of change.