Land Registry Data in Property Valuation
The transaction data that underpins every UK residential AVM — what it contains, where it comes from, and what it cannot tell you.
The Price Paid dataset
HM Land Registry’s Price Paid dataset is the foundation of residential property valuation in England and Wales. It records every residential property transaction that is lodged for registration, going back to January 1995. As of early 2026, the dataset contains over 28 million individual transaction records.
The data is published as open data under the Open Government Licence, updated monthly. New transactions typically appear in the dataset 4–8 weeks after completion, once the registration process is finalised. This delay is important: it means the most recent month or two of transactions are underrepresented in any analysis based on this data.
For AVMs, the Price Paid dataset serves two critical functions: it provides the training data from which the model learns the relationship between property characteristics and value, and it provides the test data against which the model’s accuracy is measured through backtesting.
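As a rough illustration of the second function, a backtest compares model estimates with the prices later recorded in the Price Paid data. The sketch below assumes two common summary statistics, the median absolute percentage error and the share of estimates within 10% of the sale price. The function names and the sample figures are made up for illustration and are not taken from any particular AVM.

```python
from statistics import median


def median_abs_pct_error(predictions, actuals):
    """Median of |estimate - sale price| / sale price across the test set."""
    errors = [abs(p - a) / a for p, a in zip(predictions, actuals)]
    return median(errors)


def share_within(predictions, actuals, tolerance=0.10):
    """Fraction of estimates that fall within `tolerance` of the sale price."""
    hits = sum(1 for p, a in zip(predictions, actuals) if abs(p - a) / a <= tolerance)
    return hits / len(actuals)


# Illustrative numbers only: three estimates and the prices later registered.
estimates = [190_000, 310_000, 505_000]
sale_prices = [200_000, 300_000, 500_000]

mdape = median_abs_pct_error(estimates, sale_prices)
within_10 = share_within(estimates, sale_prices)
```

For a fair backtest, the estimates should come from a model trained only on transactions dated before the ones being tested.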
What the data contains
Each record in the Price Paid dataset includes the following fields:
| Field | Description |
|---|---|
| Price | The amount paid for the property, in pounds |
| Date of transfer | The date on which the transaction completed |
| Address | Full address including postcode |
| Property type | Detached, semi-detached, terraced, flat/maisonette, or other |
| Old/new build | Whether the property was newly built at the time of sale |
| Tenure | Freehold or leasehold |
| Transaction category | Standard price paid or additional price paid entry |
Note what is absent: there is no floor area, no number of bedrooms, no property condition, no garden size, no EPC rating. The Price Paid dataset tells you what sold, when, and for how much, but very little about the physical characteristics of the property. This is why AVMs must supplement it with other data sources.
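To make the field list concrete, here is a minimal sketch of reading a Price Paid CSV row with the Python standard library. It assumes the published files have no header row and use the column order and single-letter codes shown in the comments; check both against HM Land Registry's current guidance before relying on them. The sample row is invented.

```python
import csv
import io
from dataclasses import dataclass
from datetime import date, datetime

# Assumed column order of the headerless Price Paid CSV files.
PPD_COLUMNS = [
    "transaction_id", "price", "date_of_transfer", "postcode",
    "property_type", "old_new", "duration", "paon", "saon", "street",
    "locality", "town_city", "district", "county", "ppd_category",
    "record_status",
]

# Assumed single-letter codes for property type and tenure.
PROPERTY_TYPES = {"D": "detached", "S": "semi-detached", "T": "terraced",
                  "F": "flat/maisonette", "O": "other"}
TENURES = {"F": "freehold", "L": "leasehold"}


@dataclass
class Transaction:
    price: int
    transfer_date: date
    postcode: str
    property_type: str
    new_build: bool
    tenure: str
    category: str  # "A" standard entry, "B" additional entry


def parse_price_paid(text: str) -> list:
    """Parse raw CSV text into Transaction records."""
    rows = csv.DictReader(io.StringIO(text), fieldnames=PPD_COLUMNS)
    parsed = []
    for row in rows:
        parsed.append(Transaction(
            price=int(row["price"]),
            # Dates are assumed to look like "2024-03-15 00:00"; keep the date part.
            transfer_date=datetime.strptime(row["date_of_transfer"][:10], "%Y-%m-%d").date(),
            postcode=row["postcode"],
            property_type=PROPERTY_TYPES.get(row["property_type"], "unknown"),
            new_build=row["old_new"] == "Y",
            tenure=TENURES.get(row["duration"], "unknown"),
            category=row["ppd_category"],
        ))
    return parsed


# Invented sample row, for illustration only.
SAMPLE = ('"{ABC-123}","250000","2024-03-15 00:00","AB1 2CD","S","N","F",'
          '"12","","HIGH STREET","","ANYTOWN","ANYDISTRICT","ANYCOUNTY","A","A"\n')
transactions = parse_price_paid(SAMPLE)
```

Nothing in the parsed record describes size, bedrooms, or condition, which is the gap the rest of this article is concerned with.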
Limitations and data quality issues
The Price Paid dataset is authoritative but not perfect. Understanding its limitations is essential for interpreting AVM outputs and backtesting results.
Non-market transactions
The dataset includes some transactions that do not reflect open-market value: transfers between family members at below-market prices, right-to-buy sales at statutory discounts, shared-ownership transactions (where only a percentage of the property is purchased), and repossessions. While the “additional price paid” category captures some of these, it is not a reliable filter. AVM providers must apply their own filtering logic to exclude non-market transactions from training and testing data.
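One possible shape for such filtering logic is sketched below. It is an assumption-laden illustration, not any provider's actual method: it drops "additional price paid" entries (assumed to be coded "B"), then drops sales priced far from the median for their area. The thresholds, the dictionary keys, and the sample data are all hypothetical.

```python
from statistics import median


def filter_market_sales(sales, low_ratio=0.4, high_ratio=3.0):
    """Keep sales that look like open-market transactions.

    `sales` is a list of dicts with 'price', 'category' and 'area' keys.
    The ratio thresholds are arbitrary illustrative values.
    """
    # Step 1: keep only standard price paid entries.
    standard = [s for s in sales if s["category"] == "A"]

    # Step 2: compute the median standard-entry price for each area.
    prices_by_area = {}
    for s in standard:
        prices_by_area.setdefault(s["area"], []).append(s["price"])
    medians = {area: median(prices) for area, prices in prices_by_area.items()}

    # Step 3: drop sales priced implausibly far from their area median.
    kept = []
    for s in standard:
        ratio = s["price"] / medians[s["area"]]
        if low_ratio <= ratio <= high_ratio:
            kept.append(s)
    return kept


# Invented data: one suspiciously cheap sale and one additional entry.
sample = [
    {"price": 200_000, "category": "A", "area": "X"},
    {"price": 210_000, "category": "A", "area": "X"},
    {"price": 220_000, "category": "A", "area": "X"},
    {"price": 60_000, "category": "A", "area": "X"},   # possible family transfer
    {"price": 205_000, "category": "B", "area": "X"},  # additional price paid entry
]
kept = filter_market_sales(sample)
```

A production filter would compare like with like (same property type, similar period) rather than pooling every sale in an area, and a crude price ratio will also discard some genuine sales.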
Registration delay
Transactions appear in the dataset 4–8 weeks after completion, sometimes longer. The most recent data is therefore always incomplete, creating a blind spot for current market conditions. AVMs must account for this latency, for example by treating the latest month or two as provisional rather than reading a thin month as a fall in market activity.
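A simple way to handle the lag is to split transactions into a settled set and a provisional set around a cutoff date. The sketch below assumes an 8-week lag, taken from the upper end of the range quoted above; the function names and sample sales are illustrative.

```python
from datetime import date, timedelta


def complete_data_cutoff(release_date: date, lag_weeks: int = 8) -> date:
    """Latest transfer date assumed to be (mostly) fully registered."""
    return release_date - timedelta(weeks=lag_weeks)


def split_by_completeness(sales, release_date, lag_weeks=8):
    """Split (transfer_date, price) pairs into settled and provisional lists."""
    cutoff = complete_data_cutoff(release_date, lag_weeks)
    settled = [s for s in sales if s[0] <= cutoff]
    provisional = [s for s in sales if s[0] > cutoff]
    return settled, provisional


# Invented example: a release dated 1 March 2026.
sales = [(date(2025, 12, 10), 300_000), (date(2026, 2, 10), 310_000)]
settled, provisional = split_by_completeness(sales, date(2026, 3, 1))
```

Counts and averages computed from the provisional set will shift as later registrations arrive, so they are better treated as early indications than as final figures.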
No property characteristics
As noted above, the dataset contains no information about the physical property beyond its type and tenure. Two semi-detached houses on the same street might differ enormously in size, condition, and specification, but the Price Paid data records only the price and type. AVMs must source property characteristics from elsewhere.
Address matching challenges
Addresses in the Land Registry do not always follow standard formatting. Flat numbering schemes, building names, and rural addresses without postcodes can make it difficult to match transactions to properties in other datasets. Robust address-matching logic is essential for any AVM that uses this data.
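The sketch below shows one way such matching logic might start: normalise postcodes and address strings, require the numeric tokens (house and flat numbers) to agree exactly, then pick the closest remaining candidate by string similarity. It uses `difflib.SequenceMatcher` from the standard library; the normalisation rules and the 0.85 threshold are illustrative assumptions.

```python
import re
from difflib import SequenceMatcher


def normalise_postcode(postcode: str) -> str:
    """Upper-case and re-space a postcode so the inward code is the last 3 characters."""
    compact = re.sub(r"\s+", "", postcode).upper()
    return f"{compact[:-3]} {compact[-3:]}" if len(compact) > 3 else compact


def normalise_address(address: str) -> str:
    """Upper-case, map APT/APARTMENT to FLAT, and strip punctuation."""
    text = address.upper()
    text = re.sub(r"\bAPARTMENT\b|\bAPT\b", "FLAT", text)
    text = re.sub(r"[^A-Z0-9 ]", " ", text)
    return " ".join(text.split())


def best_match(target, candidates, threshold=0.85):
    """Return the most similar candidate whose numbers match exactly, else None."""
    key = normalise_address(target)
    key_numbers = re.findall(r"\d+", key)
    best, best_score = None, 0.0
    for candidate in candidates:
        other = normalise_address(candidate)
        # String similarity alone scores "FLAT 2" and "FLAT 12" as near-identical,
        # so numeric tokens must agree before similarity is considered.
        if re.findall(r"\d+", other) != key_numbers:
            continue
        score = SequenceMatcher(None, key, other).ratio()
        if score > best_score:
            best, best_score = candidate, score
    return best if best_score >= threshold else None


match = best_match(
    "Apt 2, Rose Court, High St",
    ["FLAT 12 ROSE COURT HIGH STREET", "FLAT 2 ROSE COURT HIGH STREET"],
)
```

Where a UPRN is available on both sides, an exact UPRN join is preferable and fuzzy matching becomes a fallback.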
England and Wales only
HM Land Registry covers England and Wales. Scotland has its own Registers of Scotland with a separate dataset, and Northern Ireland has the Land & Property Services. AVMs built on the Price Paid dataset are applicable to England and Wales only.
EPC data: filling the characteristics gap
Energy Performance Certificates (EPCs) are the single most important supplement to the Price Paid dataset for UK AVMs. Required for most property sales and rentals since 2008, EPCs contain detailed property characteristics that the Land Registry data lacks.
An EPC record typically includes:
| Field | Description |
|---|---|
| Total floor area (m²) | The single most predictive physical characteristic |
| Property type and form | More granular than Land Registry categories |
| Number of habitable rooms | Bedroom/reception count indicators |
| Construction age band | e.g. 1930–1949, 1967–1975, post-2012 |
| Wall type and insulation | Cavity, solid, insulated, uninsulated |
| Energy efficiency rating | A–G band, numeric score |
| Heating system | Boiler type, fuel source |
| Glazing type | Single, double, or triple glazed |
The EPC register contains over 25 million certificates, covering the majority of residential properties in England and Wales. By linking EPC records to Land Registry transactions (using address matching and UPRN linkage), AVMs can associate physical characteristics with sale prices — the essential combination for accurate valuation.
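Once both datasets carry a UPRN, the link itself can be a simple keyed lookup. The sketch below assumes one plausible rule for choosing among several certificates: take the most recent EPC lodged on or before the sale date, so the characteristics describe the property as it was when sold. That rule, the dictionary keys, and the sample data are illustrative assumptions.

```python
from datetime import date
from typing import Optional


def link_epc(sale: dict, epcs_by_uprn: dict) -> Optional[dict]:
    """Return the latest EPC lodged on or before the sale date, or None."""
    candidates = [
        epc for epc in epcs_by_uprn.get(sale["uprn"], [])
        if epc["lodged"] <= sale["date"]
    ]
    if not candidates:
        return None
    return max(candidates, key=lambda epc: epc["lodged"])


# Invented data: one property with two certificates.
epcs_by_uprn = {
    100: [
        {"lodged": date(2012, 1, 1), "floor_area_m2": 80.0},
        {"lodged": date(2021, 6, 1), "floor_area_m2": 95.0},  # e.g. after an extension
    ],
}
sale = {"uprn": 100, "date": date(2015, 5, 1), "price": 250_000}
linked = link_epc(sale, epcs_by_uprn)
```

Here the 2015 sale is paired with the 80 m² certificate rather than the later 95 m² one, which would otherwise attribute the old price to the enlarged property.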
EPC data has its own limitations: certificates are valid for 10 years, so some records describe a property as it was a decade ago. Properties that have not been sold or rented since 2008 may have no EPC at all. And the quality of individual assessments varies, particularly for floor area measurements. Despite these limitations, EPC data represents the most comprehensive source of property characteristics available in England and Wales.
Other data sources used by AVMs
Beyond the Land Registry and EPC data, modern AVMs draw on additional sources to improve accuracy and coverage:
Ordnance Survey and geographic data
Location matters enormously in property valuation. Geographic data provides coordinates, postcodes, ward and parish boundaries, and UPRNs (Unique Property Reference Numbers) that enable precise address matching. It also allows calculation of proximity features: distance to stations, schools, town centres, and other amenities that influence value.
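A proximity feature can be as simple as the straight-line distance from a property to the nearest amenity of a given kind. The sketch below uses the haversine formula with a mean Earth radius of 6,371 km; the function names and coordinates are illustrative, and a real pipeline might prefer walking or travel-time distances.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/lon points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearest_distance_km(prop, amenities):
    """Distance from a (lat, lon) property to the closest (lat, lon) amenity."""
    return min(haversine_km(prop[0], prop[1], a[0], a[1]) for a in amenities)


# Invented coordinates: a property and two stations.
property_location = (51.50, -0.10)
stations = [(51.60, -0.10), (51.51, -0.10)]
distance_to_station = nearest_distance_km(property_location, stations)
```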
Census and demographic data
ONS census data provides neighbourhood-level information about housing stock, tenure mix, population density, and socioeconomic characteristics. While not property-specific, these area-level features help the model understand local market context.
Planning and building data
Council tax bands, planning applications, listed building status, conservation area designations, and other regulatory data provide additional property and location features. These can be particularly important for unusual properties where standard comparables are scarce.
The art of building a good AVM lies in combining these disparate sources into a coherent feature set that the model can learn from. Each source has coverage gaps, quality issues, and matching challenges. The model’s performance depends as much on the quality of data engineering as on the choice of algorithm.
How Meridian uses these data sources
Meridian — the AVM powering Gadsden Valuations — ingests and links data from all the sources described above. The result is a property database covering 147,188+ transactions, with 49 features per property derived from Land Registry, EPC, geographic, and contextual data.
The linking process uses a combination of UPRN matching, address parsing, and fuzzy matching algorithms to associate transaction records with property characteristics. Where multiple data sources disagree (for example, on floor area), the model uses the most reliable source available for each property.
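The article does not specify how Meridian ranks sources, so the following is only a generic sketch of a priority-based fallback, with a hypothetical ordering and hypothetical source names: take the value from the highest-ranked source that has a usable figure.

```python
from typing import Optional, Tuple

# Hypothetical reliability ranking, most trusted first. Not Meridian's actual order.
SOURCE_PRIORITY = ["surveyed", "epc", "estimated"]


def resolve_floor_area(candidates: dict) -> Optional[Tuple[str, float]]:
    """Pick the floor area from the highest-priority source with a positive value."""
    for source in SOURCE_PRIORITY:
        value = candidates.get(source)
        if value is not None and value > 0:
            return source, value
    return None


# Invented example: the EPC and a derived estimate disagree; no survey exists.
chosen = resolve_floor_area({"epc": 92.0, "estimated": 105.0})
```

Recording which source supplied each value, as the returned tuple does, also makes it possible to reflect that choice in a confidence score later.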
Data freshness is maintained through automated pipelines that ingest new Land Registry releases monthly and EPC updates as they become available. The model is retrained on the updated data to ensure it reflects current market conditions.
The quality of this data pipeline (matching accuracy, feature coverage, freshness) directly affects the model’s accuracy and confidence levels. Properties with complete, recent data receive higher confidence scores; those with gaps or stale records receive lower ones, so that each score reflects the limits of the evidence behind it.
See the data in action
Every Meridian valuation shows the comparable transactions and data sources that informed the estimate. Try it with a free account.