How to Backtest an AVM
Backtesting is how you verify the accuracy claims of an automated valuation model (AVM). This guide explains the process, the metrics, and what to look for.
What is backtesting?
Backtesting is the process of comparing an AVM’s valuation estimates against actual transaction prices. You take a set of properties that have actually sold, ask the AVM to value them (typically as at a date just before the sale), and then measure how close the estimates were to the real prices.
It is the most rigorous way to evaluate an AVM’s accuracy, because it tests the model against ground truth — actual market transactions — rather than relying on theoretical claims or self-reported metrics.
The principle is simple: if an AVM claims to be accurate, prove it. Show a large, representative sample of properties where the model’s estimates can be compared against known sale prices. The resulting error statistics show how the model performs in practice, across different property types, price bands, and regions.
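As a minimal sketch of the process described above: each sold property is valued as at the day before its sale, and the estimate is compared with the sale price as a signed percentage error. The `toy_avm` function, the addresses, and the prices are hypothetical stand-ins for illustration; a real backtest would call the AVM under test.

```python
from datetime import date, timedelta

def toy_avm(address: str, as_at: date) -> float:
    """Hypothetical stand-in for a real AVM call (fixed estimate per address)."""
    estimates = {"1 High St": 262_500.0, "2 Low Rd": 171_000.0}
    return estimates[address]

def backtest(sales, value_fn):
    """Value each sold property as at the day before its sale.

    Returns signed percentage errors: positive means the estimate
    was above the sale price, negative means below.
    """
    errors = []
    for address, sale_price, sale_date in sales:
        estimate = value_fn(address, sale_date - timedelta(days=1))
        errors.append((estimate - sale_price) / sale_price * 100)
    return errors

# Illustrative sales: (address, sale price, sale date)
sales = [
    ("1 High St", 250_000.0, date(2024, 3, 15)),
    ("2 Low Rd", 180_000.0, date(2024, 5, 2)),
]
errors = backtest(sales, toy_avm)
print([round(e, 1) for e in errors])
```

Here the first estimate is 5% above the sale price and the second is 5% below it.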
Why backtesting matters
Backtesting is important for three distinct audiences, each with different needs:
For lenders and risk managers
Before adopting an AVM for lending decisions, a lender needs evidence that the model performs adequately on their specific portfolio. A national average PE10 (the share of valuations within ±10% of the sale price) of 70% says little if the lender concentrates on a region or property type where the AVM underperforms. Backtesting against the lender’s own data reveals this.
For regulators
The PRA, EBA, and RICS all require that AVMs used for regulated purposes be subject to ongoing performance monitoring. Backtesting is the primary mechanism for this. Under Basel 3.1 (UK implementation from 1 January 2027), lenders using AVMs for property revaluation will need to evidence model accuracy through regular backtesting.
For AVM providers
Backtesting is how providers calibrate and improve their models. It identifies systematic weaknesses — regions where the model is biased, property types it struggles with, or market conditions that cause performance to deteriorate. No responsible AVM provider operates without continuous backtesting.
In short, backtesting is the bridge between an AVM’s claims and verifiable evidence. Without it, accuracy numbers are marketing material. With it, they are auditable facts.
Preparing your backtest data
A backtest is only as good as the data it uses. The ideal backtest dataset has the following characteristics:
Sufficient volume
At least a few hundred transactions for meaningful aggregate metrics. More is better — thousands allow robust segmentation by region, type, and price band.
Representative of your portfolio
The test set should reflect the property types, regions, and price bands you actually encounter. A lender focused on London flats needs backtest data from London flats, not national averages.
Recent transactions
Ideally from the last 12–24 months. Older transactions test the model against a market that may no longer exist.
Clean data
Each record needs at minimum: a property address (or UPRN), the sale price, and the transaction date. Incomplete or incorrect addresses reduce match rates and distort results.
Arm's-length transactions only
Exclude transfers between related parties, shared ownership, right-to-buy, and other non-market transactions. These distort accuracy measurements because the prices do not reflect open-market value.
For most lenders, the Land Registry Price Paid dataset provides a convenient source of backtest transactions. For details on this dataset and its characteristics, see our article on Land Registry data in property valuation.
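The data-quality checks above can be sketched as a simple filter. The field names (`address`, `uprn`, `sale_price`, `sale_date`, `transaction_type`) and the non-market labels are assumptions made for this example, not the schema of any particular dataset, and the 24-month window follows the recency guidance above.

```python
from datetime import date

# Hypothetical labels for non-market sales; real datasets use their own codes.
NON_MARKET = {"shared_ownership", "right_to_buy", "related_party"}

def months_between(earlier: date, later: date) -> int:
    """Whole calendar months between two dates, ignoring the day of the month."""
    return (later.year - earlier.year) * 12 + (later.month - earlier.month)

def clean_backtest_set(records, as_of, max_age_months=24):
    """Return the records kept for the backtest and a count of drops by reason."""
    kept = []
    dropped = {"incomplete": 0, "non_market": 0, "too_old": 0}
    for r in records:
        has_location = bool(r.get("address") or r.get("uprn"))
        if not (has_location and r.get("sale_price") and r.get("sale_date")):
            dropped["incomplete"] += 1
        elif r.get("transaction_type") in NON_MARKET:
            dropped["non_market"] += 1
        elif months_between(r["sale_date"], as_of) > max_age_months:
            dropped["too_old"] += 1
        else:
            kept.append(r)
    return kept, dropped

records = [
    {"address": "1 High St", "sale_price": 250_000,
     "sale_date": date(2024, 3, 15), "transaction_type": "standard"},
    {"address": "", "sale_price": 180_000,
     "sale_date": date(2024, 5, 2), "transaction_type": "standard"},
    {"address": "3 Mill Ln", "sale_price": 95_000,
     "sale_date": date(2024, 6, 1), "transaction_type": "right_to_buy"},
    {"uprn": "100000000001", "sale_price": 310_000,
     "sale_date": date(2021, 6, 1), "transaction_type": "standard"},
]
kept, dropped = clean_backtest_set(records, as_of=date(2025, 1, 1))
print(len(kept), dropped)
```

Counting the drops by reason, rather than discarding records silently, makes it easier to see whether the cleaned set is still representative of the portfolio.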
Key metrics to evaluate
A backtest produces a set of error statistics that together describe the model’s performance. The most important are:
PE10 — Percentage within ±10%
The proportion of valuations that fall within 10% of the actual sale price. This is the most widely cited headline metric. A PE10 of 70% means that 70 out of 100 valuations were within 10% of the transaction price. PE5, PE15, and PE20 work the same way at different thresholds.
MdAPE — Median Absolute Percentage Error
The median of all absolute percentage errors. If MdAPE is 6%, it means that half of all valuations were within 6% of the sale price and half were further away. MdAPE is more robust than mean error because it is less sensitive to extreme outliers.
Bias — Mean Percentage Error
Whether the model systematically over- or under-values. A positive bias means the model tends to value above the sale price; negative bias means below. Ideally, bias should be close to zero. For lending purposes, a small negative bias is generally preferable to a positive one, because over-valuation understates loan-to-value and overstates the security behind the loan.
FSD — Forecast Standard Deviation
The standard deviation of percentage errors, measured at the portfolio or tier level. FSD indicates the spread of errors — how dispersed the model’s predictions are around the actual values. For a detailed explanation, see our FSD guide.
No single metric tells the full story. PE10 is the most intuitive, MdAPE captures central tendency, bias reveals systematic direction, and FSD measures dispersion. Together, they give a complete picture of how the model performs. For a deeper look at each metric, see our accuracy metrics guide.
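The four metrics can be computed from the same list of signed percentage errors. The sketch below uses only the Python standard library; the function name and the sample figures are illustrative, and the use of the sample standard deviation for FSD is an assumption, since some reports use the population form.

```python
from statistics import mean, median, stdev

def backtest_metrics(estimates, sale_prices, threshold=10.0):
    """Compute headline backtest metrics, all expressed in percent."""
    # Signed percentage error: positive means the estimate was above the price.
    errors = [(e - p) / p * 100 for e, p in zip(estimates, sale_prices)]
    abs_errors = [abs(x) for x in errors]
    return {
        # Share of valuations within +/- threshold of the sale price (PE10 by default)
        "pe10": sum(a <= threshold for a in abs_errors) / len(errors) * 100,
        # Median absolute percentage error
        "mdape": median(abs_errors),
        # Mean signed percentage error
        "bias": mean(errors),
        # Sample standard deviation of signed errors (assumed FSD definition)
        "fsd": stdev(errors),
    }

# Illustrative data: five sales at 200,000 with errors of +2, -5, +8, -12, +15 percent
sale_prices = [200_000] * 5
estimates = [204_000, 190_000, 216_000, 176_000, 230_000]
metrics = backtest_metrics(estimates, sale_prices)
print({name: round(value, 2) for name, value in metrics.items()})
```

On this toy data three of the five valuations fall within ±10%, so PE10 is 60%, MdAPE is 8%, and bias is +1.6%. The bias looks small even though individual errors reach 15%, which is why it needs to be read alongside the dispersion metrics.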
What good looks like
There is no universal pass/fail threshold for AVM backtests — acceptable performance depends on the use case, the property mix, and the risk appetite of the user. However, the following benchmarks represent broadly accepted standards in the UK market:
| Metric | Strong | Acceptable | Weak |
|---|---|---|---|
| PE10 | > 70% | 60–70% | < 60% |
| MdAPE | < 6% | 6–9% | > 9% |
| Bias | within ±2% | ±2–5% | beyond ±5% |
These benchmarks should be applied not just to the aggregate result, but to segments. An AVM with a strong national PE10 might underperform badly in a specific region or for a particular property type. Segmented analysis is essential — the aggregate can mask weaknesses that matter for your specific use case.
Also look for consistency: is the model equally good across different price bands, or does performance degrade sharply for higher-value properties? Does it handle flats and houses equally well? Segmented backtesting answers these questions.
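A segmented report can be sketched as a group-by over the backtest rows. In the example below the row fields (`property_type`, `error_pct`) and the sample errors are hypothetical, and `rate_pe10` simply applies the PE10 bands from the table above.

```python
from collections import defaultdict
from statistics import median

def rate_pe10(pe10):
    """Rate a PE10 figure against the benchmark bands in the table above."""
    if pe10 > 70:
        return "strong"
    if pe10 >= 60:
        return "acceptable"
    return "weak"

def pe10_of(abs_errors):
    """Percentage of absolute errors that are within 10%."""
    return sum(a <= 10 for a in abs_errors) / len(abs_errors) * 100

def segment_report(rows, key):
    """Group rows by `key` and report n, PE10, MdAPE and rating per segment."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(abs(row["error_pct"]))
    return {
        name: {
            "n": len(abs_errors),
            "pe10": pe10_of(abs_errors),
            "mdape": median(abs_errors),
            "rating": rate_pe10(pe10_of(abs_errors)),
        }
        for name, abs_errors in groups.items()
    }

# Illustrative signed percentage errors for two property types
rows = (
    [{"property_type": "house", "error_pct": e} for e in (3, -4, 6, -8, 12)]
    + [{"property_type": "flat", "error_pct": e} for e in (5, -11, 14, -9, 18)]
)
overall_pe10 = pe10_of([abs(r["error_pct"]) for r in rows])
report = segment_report(rows, "property_type")
print(round(overall_pe10, 1), report)
```

In this toy data the overall PE10 of 60% sits in the acceptable band, but it combines houses at 80% with flats at 40%. Five rows per segment is far too few for real conclusions; the point is only to show the mechanics.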
Common backtesting pitfalls
Backtesting is straightforward in principle but easy to get wrong in practice. Watch out for:
Data leakage
The most serious error. If the AVM has already seen the sale prices being tested against — because they were in its training data — the backtest is meaningless. The test data must be out-of-sample: transactions the model was not trained on. Any credible AVM provider will confirm their backtesting methodology uses held-out or time-split data.
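One common way to keep a test set out-of-sample is a time split: the model is built only from sales before a cutoff date and tested only on sales from the cutoff onwards. The sketch below is a generic illustration with hypothetical field names (`id`, `sale_date`), not a description of any provider's methodology.

```python
from datetime import date

def time_split(transactions, cutoff):
    """Training set: sales before the cutoff. Test set: sales on or after it."""
    train = [t for t in transactions if t["sale_date"] < cutoff]
    test = [t for t in transactions if t["sale_date"] >= cutoff]
    return train, test

def leaked_ids(train, test):
    """Transaction IDs that appear in both sets; any overlap indicates leakage."""
    return {t["id"] for t in train} & {t["id"] for t in test}

transactions = [
    {"id": "T1", "sale_date": date(2023, 11, 20)},
    {"id": "T2", "sale_date": date(2024, 2, 5)},
    {"id": "T3", "sale_date": date(2024, 7, 1)},
    {"id": "T4", "sale_date": date(2024, 9, 12)},
]
train, test = time_split(transactions, cutoff=date(2024, 7, 1))
print([t["id"] for t in train], [t["id"] for t in test], leaked_ids(train, test))
```

The overlap check is a cheap safeguard when the two sets come from different sources, for example a provider's training extract and a lender's own sales file.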
Survivorship bias
If the AVM declines to value certain properties (those it has low confidence in) and the backtest only measures properties that received a valuation, the results are flattered. Always report the hit rate (how many properties the AVM was able to value) alongside accuracy metrics. A model with 90% PE10 on 50% of properties is very different from one with 70% PE10 on 95% of properties.
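To make the comparison concrete, one option is to express PE10 as a share of all properties submitted rather than only those valued. The `coverage_adjusted` function below is an illustrative calculation, not a standard industry metric, and it assumes that a declined valuation counts as a miss.

```python
def coverage_adjusted(pe10, hit_rate):
    """Share of all submitted properties valued within +/-10%, in percent.

    Assumes properties the AVM declined to value count as misses.
    """
    return pe10 * hit_rate / 100

model_a = coverage_adjusted(pe10=90, hit_rate=50)  # 90% PE10 on 50% of properties
model_b = coverage_adjusted(pe10=70, hit_rate=95)  # 70% PE10 on 95% of properties
print(model_a, model_b)
```

On this basis the first model places 45% of all submitted properties within ±10%, against 66.5% for the second, despite its higher headline PE10.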
Non-market transactions
Including shared ownership, right-to-buy, transfers between related parties, or other non-arm's-length transactions pollutes the test set. These transactions do not represent open-market value and will distort accuracy metrics, typically making the model appear worse than it actually is.
Aggregation without segmentation
A single national PE10 figure can disguise significant regional or type-specific weaknesses. Always segment results by property type, price band, region, and confidence tier. The aggregate is the starting point, not the conclusion.
Small sample sizes
Backtesting on 50 properties produces noisy, unreliable metrics. Aim for thousands if possible — and be sceptical of segment-level metrics based on fewer than 100 transactions.
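A rough way to see why small samples are noisy is to estimate the sampling margin around a PE10 figure. The sketch below uses the normal approximation to a binomial proportion at 95% confidence, which is a simplifying assumption: it treats the transactions as independent and becomes less reliable for very small samples or proportions near 0% or 100%.

```python
from math import sqrt

def pe10_margin(pe10, n, z=1.96):
    """Approximate 95% margin of error, in percentage points, for a PE10
    measured on n transactions (normal approximation to the binomial)."""
    p = pe10 / 100
    return z * sqrt(p * (1 - p) / n) * 100

for n in (50, 100, 1000, 5000):
    print(n, round(pe10_margin(70, n), 1))
```

For a measured PE10 of 70%, the margin is roughly ±13 percentage points on 50 transactions and about ±3 points on 1,000, so a small-sample result is consistent with anything from weak to strong performance.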
Backtest Meridian yourself
We believe the best way to evaluate an AVM is to test it against your own data. That is why we offer a free, unlimited backtesting service for all registered users.
Upload a CSV of historical transactions (addresses, postcodes, sale prices, and sale dates) and we will run them through Meridian, comparing the model’s estimates against the actual prices. You will receive a full results report with PE10, MdAPE, bias, segmented breakdowns, and individual property-level comparisons.
There is no credit cost for backtesting. It uses the same model and the same methodology as our production valuations, so the results are representative of what you would see in live use.
The process takes a few minutes:
Prepare your CSV
Columns: address, postcode, sale price, sale date. Download our example CSV for the correct format.
Upload
Go to the backtest page and upload your file. The system processes transactions in the background.
Review results
View the full report online or download as CSV/PDF. Results include aggregate metrics, segmented breakdowns, and property-level comparisons.