How AVM Confidence Scoring Works

Not all AVM valuations are created equal. Confidence scoring tells you which estimates to rely on and which to escalate.

Why confidence scoring matters

Every AVM produces a point estimate — a single number that represents its best guess at a property’s market value. But that number alone is incomplete. Without knowing how confident the model is, the recipient has no basis for deciding whether to rely on it.

Consider two valuations from the same model, both estimating £350,000. One is for a standard three-bedroom terrace in a street where six similar houses have sold in the past year. The other is for a converted chapel in a hamlet with no sales for three years. The model might produce the same point estimate, but the confidence behind each is fundamentally different.

Confidence scoring makes this difference explicit. It gives the user, whether a lender, valuer, broker, or investor, a structured way to assess how much weight to place on each individual valuation. Regulators expect this too: under PRA expectations in the UK and the EBA’s guidelines in the EU, an AVM output without a confidence measure is considered incomplete for lending purposes.

What drives confidence in an AVM valuation

Confidence is not a single number derived from a single factor. It reflects a combination of evidence signals that together indicate how well-positioned the model is to value a specific property. The main drivers are:

Comparable transaction volume

This is the number of similar properties that have sold recently in the vicinity. More transactions mean more evidence for the model to learn from. A property in a heavily transacted suburb has a natural advantage over one in a rural area with sporadic sales.

Recency of evidence

A comparable sale from three months ago is more informative than one from three years ago. Markets move, and older transactions carry less weight. The model tracks how recent the available evidence is and adjusts confidence accordingly.

Market homogeneity

In a street of identical 1930s semis, the model can be highly confident because the properties are similar and prices cluster tightly. In an area with a mix of period conversions, new builds, and bungalows, the comparables are less directly relevant and confidence drops.

Property typicality

Properties that are typical of their area — standard size, standard type, standard condition — are easier to value than outliers. A six-bedroom detached house in an area dominated by two-bedroom flats is harder to value accurately, even if transaction volumes are high.

Data completeness

The quality and completeness of the data available for the subject property matters. Properties with full EPC data, clear transaction history, and well-defined characteristics are valued with higher confidence than those with sparse records.

The model synthesises all of these factors into a per-property Forecast Standard Deviation (FSD) — a quantified estimate of its own likely error for that specific valuation. This is what makes confidence scoring property-specific rather than model-level: two properties valued by the same model on the same day will carry different FSDs if the evidence supporting them differs.
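To make the FSD concrete, the sketch below turns a point estimate and an FSD into an approximate value range. It rests on two assumptions that are not stated in this article: that FSD is expressed as a fraction of the estimate, and that valuation errors are roughly normally distributed, so about 68% of sale prices fall within one FSD of the estimate. The function name and the two FSD figures are hypothetical, chosen to mirror the terrace and chapel example, and this is not necessarily how the prediction interval on a Meridian valuation is derived.

```python
from statistics import NormalDist


def value_range(estimate: float, fsd: float, coverage: float = 0.68) -> tuple[float, float]:
    """Approximate range around a point estimate for a given coverage level.

    Assumes FSD is a fraction of the estimate (0.08 means 8%) and that
    valuation errors are roughly normally distributed.
    """
    if estimate <= 0 or fsd < 0 or not 0 < coverage < 1:
        raise ValueError("estimate must be positive, fsd non-negative, coverage in (0, 1)")
    # Two-sided z-score: about 1.0 for 68% coverage, about 1.96 for 95%.
    z = NormalDist().inv_cdf(0.5 + coverage / 2)
    half_width = estimate * fsd * z
    return estimate - half_width, estimate + half_width


# Same £350,000 estimate, different (hypothetical) FSDs.
terrace = value_range(350_000, 0.08)
chapel = value_range(350_000, 0.25)
```

Under these assumptions the two properties share a point estimate, but the chapel’s range is roughly three times as wide as the terrace’s, which is the difference confidence scoring is meant to expose.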

From FSD to confidence bands

FSD is a continuous value, and for policy decisions a continuous value needs structure. Meridian maps each property’s FSD to one of the four classification bands used by Fitch Ratings in RMBS analysis. That band is the confidence measure reported on every valuation.

The bands are not our own grading scheme. They are an external, published standard — the same bands a ratings agency applies when it decides how much to discount an AVM valuation in a structured transaction. Using them means a lender can read our confidence output directly against the framework their securitisation desk already uses.

Band     FSD range
Band A   FSD ≤ 0.05
Band B   0.05 < FSD ≤ 0.10
Band C   0.10 < FSD ≤ 0.20
Band D   FSD > 0.20
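The band thresholds translate directly into a lookup. The following is a minimal sketch rather than Meridian’s implementation: the function name is hypothetical, and it assumes the FSD arrives as a fraction (0.08 rather than 8) and that each upper bound is inclusive, as the table implies.

```python
import math


def fsd_to_band(fsd: float) -> str:
    """Map an FSD (as a fraction, e.g. 0.08) to a band using the thresholds above."""
    if math.isnan(fsd) or fsd < 0:
        raise ValueError("FSD must be a non-negative number")
    if fsd <= 0.05:
        return "A"
    if fsd <= 0.10:
        return "B"
    if fsd <= 0.20:
        return "C"
    return "D"


# Hypothetical FSDs for three valuations.
bands = [fsd_to_band(f) for f in (0.08, 0.14, 0.27)]
```

Here `bands` holds one letter per valuation, in the same order as the input FSDs.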

Alongside the band, every valuation also carries a European AVM Alliance confidence level on the 0–7 scale — a second standards-aligned expression of the same underlying FSD, included in every API response for lenders whose frameworks reference the EAA scale rather than the Fitch bands.

Band performance on the live backtest

The bands are calibrated against realised Land Registry sale prices, not held as theory. The table below shows how each band performed on our latest bulk test of 295,026 transactions across England and Wales.

Back-test performance by FSD band

H2-2025 test period · 295,026 transactions

Band     n         % of test   MdAPE   PE10    Bias
Band A   0         0.0%        n/a     n/a     n/a
Band B   143,546   48.7%       6.0%    70.4%   -0.7%
Band C   123,787   42.0%       7.0%    63.9%   +1.7%
Band D   27,693    9.4%        11.2%   46.4%   +16.5%

No valuations in the current test cohort carry Band A. The current FSD lookup does not produce FSDs at or below 0.05 for any segment; the band is shown for completeness of the Fitch scale.

Band D valuations show a mean bias of +16.5% on the current test, meaning the model overvalues these properties on average.
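For readers who want to reproduce this kind of table on their own data, the sketch below computes the three metrics from paired estimates and sale prices. It uses the usual definitions, which are an assumption here since the article does not spell them out: MdAPE is the median absolute percentage error, PE10 is the share of valuations within 10% of the sale price, and bias is the mean signed percentage error, with positive values indicating overvaluation. The function name and the sample figures are made up for illustration, and the published report may differ in details such as rounding.

```python
from statistics import mean, median


def backtest_metrics(estimates: list[float], sale_prices: list[float]) -> dict[str, float]:
    """Compute MdAPE, PE10 and mean bias, all as fractions (0.07 means 7%)."""
    if not estimates or len(estimates) != len(sale_prices):
        raise ValueError("need two non-empty lists of equal length")
    # Signed error relative to the realised price: positive means overvaluation.
    errors = [(est - price) / price for est, price in zip(estimates, sale_prices)]
    abs_errors = [abs(e) for e in errors]
    return {
        "mdape": median(abs_errors),
        "pe10": sum(e <= 0.10 for e in abs_errors) / len(abs_errors),
        "bias": mean(errors),
    }


# Four made-up valuations against made-up sale prices.
metrics = backtest_metrics(
    estimates=[210_000, 285_000, 324_000, 460_000],
    sale_prices=[200_000, 300_000, 300_000, 400_000],
)
```

Running the same function separately on the valuations in each band would give one row per band, in the shape of the table above.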

For the full breakdown including property type, price band, and regional segmentation, see our accuracy page or download the accuracy report PDF.

How lenders use confidence bands

Confidence bands give lenders a simple, consistent framework for making accept/review/escalate decisions on valuation inputs. Rather than interpreting raw FSD values case by case, a lender can define policy rules tied to the bands — and because the band is reported on every valuation, the policy is auditable.

Example patterns — each lender sets its own thresholds:

Band B: Accept AVM-only

For remortgages and low-LTV lending where the financial exposure to valuation error is limited. The lender accepts Band B output without further verification.

Band C: Desktop review or second opinion

The valuation is useful but warrants additional checking. The lender may commission a desktop review, request a second AVM for comparison, or accept with adjusted LTV caps.

Band D: Escalate to physical valuation

The model’s uncertainty is too wide for the lending context. The lender requires a physical valuation or declines to lend on the basis of the AVM alone.

These rules can be further refined by combining the band with loan-to-value ratio, property type, loan purpose, or other risk factors. The band provides the valuation dimension; the lender’s credit policy provides the rest.
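The example patterns above, combined with a loan-to-value cap, could be written as a small policy function. This is an illustrative sketch only: the names, the 75% LTV cap, and the choice to treat Band A like Band B are all assumptions, and each lender would substitute its own thresholds.

```python
from enum import Enum


class Action(Enum):
    ACCEPT = "accept AVM-only"
    DESKTOP_REVIEW = "desktop review or second opinion"
    PHYSICAL = "escalate to physical valuation"


def valuation_action(band: str, ltv: float, max_avm_ltv: float = 0.75) -> Action:
    """Pick an action from the confidence band and loan-to-value ratio.

    max_avm_ltv is a hypothetical policy cap: above it, a Band A or B
    valuation is sent for review instead of being accepted outright.
    """
    if band not in {"A", "B", "C", "D"}:
        raise ValueError(f"unknown band: {band!r}")
    if ltv <= 0:
        raise ValueError("ltv must be positive")
    if band == "D":
        return Action.PHYSICAL
    if band == "C":
        return Action.DESKTOP_REVIEW
    # Bands A and B: accept only when exposure to valuation error is limited.
    return Action.ACCEPT if ltv <= max_avm_ltv else Action.DESKTOP_REVIEW


# A Band B remortgage at 60% LTV versus the same band at 90% LTV.
low_ltv = valuation_action("B", 0.60)
high_ltv = valuation_action("B", 0.90)
```

Keeping the rule in one function with the band as an explicit input is one way to make the policy auditable: the decision for any valuation can be replayed from the band and LTV recorded at the time.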

This approach reflects PRA expectations and EU best practice (the EBA’s loan-origination guidelines), under which lenders apply confidence-based acceptance criteria when using AVMs rather than treating all AVM outputs as equally reliable.

Confidence is not the same as quality

A common misconception is that a Band D valuation is a “bad” valuation. This is not the case. A Band D valuation from a well-built model is still the best estimate available given the evidence — it simply carries more uncertainty, and the band says so.

The quality of an AVM is measured by aggregate metrics like PE10, MdAPE, and bias. These tell you whether the model is well-calibrated overall. The band tells you whether a specific property happens to be in the model’s sweet spot or at the edge of its capabilities.

An AVM that claims Band A for every property is not better — it is less honest. The whole point of confidence scoring is to differentiate: to flag the valuations where caution is warranted so that users can take appropriate action. Our own current backtest carries an empty Band A — published as such, because that is what the data shows.

Put another way: confidence scoring is the model being transparent about its own limitations on a case-by-case basis. That transparency is a feature, not a weakness.

Confidence you can quantify

Every Meridian valuation includes an FSD, a Fitch confidence band, and a prediction interval. See how it works with a free account.
