NHSD Usage Data #102

em-baggie · 2025-07-21T20:13:15Z · Collaborator

NHSD Usage Data

The NHSD SNOMED code usage data is a dataset published by NHS Digital that shows how often each SNOMED code is used in English primary care (GP) records.

Why should we incorporate the usage data into the Codelist Builder?

This would allow the builder to take into account which codes are actually used in practice, and how frequently. For example, users could filter, prioritise, or exclude codes based on their real-world usage.
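As a rough illustration of the "filter and prioritise" idea, here is a minimal Python sketch. The function name `prioritise`, its `min_usage` parameter, and the example IDs and counts are all hypothetical. It also assumes the Usage column has already been turned into integers (suppressed `*` values resolved somehow), and it treats codes missing from the usage data as zero, since the files omit codes with no recorded usage.

```python
from typing import Dict, List


def prioritise(codelist: List[str], usage: Dict[str, int], min_usage: int = 0) -> List[str]:
    """Order a codelist by recorded usage, dropping codes below min_usage.

    `usage` maps SNOMED concept IDs (as strings) to usage counts. Codes absent
    from it are treated as 0, because the NHSD files omit codes with no usage.
    """
    counts = {code: usage.get(code, 0) for code in codelist}
    kept = [code for code in codelist if counts[code] >= min_usage]
    # sorted() stays stable with reverse=True, so ties keep their codelist order.
    return sorted(kept, key=lambda code: counts[code], reverse=True)


# Illustrative, made-up IDs and counts.
example_usage = {"code_a": 120, "code_b": 5000, "code_c": 40}
print(prioritise(["code_a", "code_b", "code_c", "code_d"], example_usage, min_usage=50))
# ['code_b', 'code_a']
```

Keeping the threshold optional (`min_usage=0`) means the same helper can be used purely for ordering, without excluding anything.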

Available years and reporting periods

At the time of writing, files are available for the reporting years 2011-12 to 2023-24. Each file covers usage over one reporting year, from 1 August to 31 July.
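Because the reporting year does not follow the calendar year, it helps to have one small helper that maps a date to its reporting year. The sketch below is hypothetical; it produces labels in the "2011-12" style used in this post and makes no assumption about how the published files are actually named.

```python
from datetime import date


def reporting_year(day: date) -> str:
    """Return the label of the NHSD reporting year containing `day`, e.g. "2023-24".

    Reporting years run from 1 August to 31 July inclusive.
    """
    start = day.year if day.month >= 8 else day.year - 1
    return f"{start}-{(start + 1) % 100:02d}"


print(reporting_year(date(2024, 7, 31)))  # 2023-24
print(reporting_year(date(2024, 8, 1)))   # 2024-25
```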

File formats

Data is available in .txt and .xlsx formats. The structure is consistent across both data formats.
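Since the column structure is shared across formats and years, a single reader should work for every .txt file. This Python sketch ASSUMES the .txt files are tab-separated with a header row whose names match the metadata exactly; the function name `load_usage_txt` is hypothetical. Every value is kept as a string so that long concept IDs are never converted to numbers.

```python
import csv
import io
from typing import Dict, List, TextIO

# Column names as given in the NHSD metadata files.
EXPECTED_COLUMNS = [
    "SNOMED_Concept_ID",
    "Description",
    "Usage",
    "Active_at_Start",
    "Active_at_End",
]


def load_usage_txt(handle: TextIO) -> List[Dict[str, str]]:
    """Read an NHSD usage .txt file, ASSUMED to be tab-separated with a header row."""
    # QUOTE_NONE: treat any quote characters in descriptions as ordinary text.
    reader = csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    missing = [col for col in EXPECTED_COLUMNS if col not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    # Values stay strings; short rows yield None, which is normalised to "".
    return [{col: (row[col] or "").strip() for col in EXPECTED_COLUMNS} for row in reader]


# An in-memory file standing in for a downloaded .txt file (hypothetical row).
sample = io.StringIO(
    "SNOMED_Concept_ID\tDescription\tUsage\tActive_at_Start\tActive_at_End\n"
    "123456\tExample concept (hypothetical)\t*\t1\t0\n"
)
rows = load_usage_txt(sample)
print(rows[0]["Usage"])  # *
```

For a real file, something like `with open(path, newline="", encoding="utf-8") as f: rows = load_usage_txt(f)` should work; the encoding is a guess and may need adjusting.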

File contents

The column structure is consistent across all years. Two metadata files are available, which describe the contents of the files:

One describing data from 2011-12 to 2017-18
The other describing data from 2018-19 onwards

Below are descriptions of each column, summarised from the metadata files. Most descriptions are identical in the two files; I've highlighted where they differ.

SNOMED_Concept_ID

"SNOMED concepts which have been added to a patient record in a general practice system during the reporting period. Text string of digits up to 18 characters long."
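Keeping the concept ID as a text string matters: an 18-digit number cannot be held exactly as a double-precision float, which is what spreadsheets and many dataframe tools use by default. A string ID can also be sanity-checked on load. The sketch below uses the SNOMED CT identifier rules (6 to 18 digits, with a final Verhoeff check digit); those rules come from the SNOMED CT specification, not from the NHSD metadata, and the function name `is_valid_sctid` is hypothetical.

```python
import re

# Verhoeff check-digit tables: _D is the multiplication table of the
# dihedral group D5, _P the position-dependent permutation table.
_D = [
    [0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
    [1, 2, 3, 4, 0, 6, 7, 8, 9, 5],
    [2, 3, 4, 0, 1, 7, 8, 9, 5, 6],
    [3, 4, 0, 1, 2, 8, 9, 5, 6, 7],
    [4, 0, 1, 2, 3, 9, 5, 6, 7, 8],
    [5, 9, 8, 7, 6, 0, 4, 3, 2, 1],
    [6, 5, 9, 8, 7, 1, 0, 4, 3, 2],
    [7, 6, 5, 9, 8, 2, 1, 0, 4, 3],
    [8, 7, 6, 5, 9, 3, 2, 1, 0, 4],
    [9, 8, 7, 6, 5, 4, 3, 2, 1, 0],
]
_P = [
    [0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
    [1, 5, 7, 6, 2, 8, 3, 0, 9, 4],
    [5, 8, 0, 3, 7, 9, 6, 1, 4, 2],
    [8, 9, 1, 6, 0, 4, 3, 5, 2, 7],
    [9, 4, 5, 3, 1, 2, 6, 8, 7, 0],
    [4, 2, 8, 6, 5, 7, 3, 9, 0, 1],
    [2, 7, 9, 3, 8, 0, 6, 4, 1, 5],
    [7, 0, 4, 6, 9, 1, 3, 2, 5, 8],
]


def is_valid_sctid(value: str) -> bool:
    """Check that a concept ID string is 6-18 ASCII digits with a valid Verhoeff check digit."""
    if not re.fullmatch(r"[0-9]{6,18}", value):
        return False
    check = 0
    # Walk the digits right to left, including the check digit itself.
    for position, digit in enumerate(reversed(value)):
        check = _D[check][_P[position % 8][int(digit)]]
    return check == 0


print(is_valid_sctid("22298006"))  # True
print(is_valid_sctid("22298007"))  # False
```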

Description

"The fully specified name associated with the SNOMED_Concept_ID on the final day of the reporting period (31 July)."

Usage

"The number of times that the SNOMED_Concept_ID was added into any patient record within the reporting period, rounded to the nearest 10. Usage of 1 to 4 is displayed as *. SNOMED concepts with no code usage are not included within the dataset."

IMPORTANT TO NOTE:

Data prior to 2019 was originally submitted mostly in READ V2 or CTV3, but in the usage files these codes have been mapped to the corresponding SNOMED codes using the final 2020 version of the mapping tables published by NHS England. Therefore all of the available files, including those from 2019 or earlier, contain only valid SNOMED codes.
The usage does not show how many patients had each code added to their record: every addition increments the count by 1, regardless of whether the code was already added to the same patient's record. Therefore it is not possible to infer the number of individual patients with a particular code.
For the 2011-12 to 2017-18 data, it is stated that "Current maximum value is approximately 250,000,000" - no such maximum is stated for the 2018-19 onwards data.
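Because of the rounding and the `*` suppression, a displayed Usage value really stands for a range of true counts. A minimal Python sketch of recovering that range is below. The metadata says "rounded to the nearest 10" without naming a tie-breaking rule, so this sketch ASSUMES round-half-up; the function name `usage_bounds` and the comma handling are also assumptions.

```python
from typing import Tuple


def usage_bounds(shown: str) -> Tuple[int, int]:
    """Return (low, high) bounds on the true count behind a displayed Usage value.

    "*" means 1 to 4 additions. Other values are rounded to the nearest 10;
    this sketch ASSUMES round-half-up, so a shown value v covers v - 5 to v + 4.
    """
    shown = shown.strip().replace(",", "")  # tolerate thousands separators, if present
    if shown == "*":
        return (1, 4)
    value = int(shown)
    # Counts of 1 to 4 are shown as "*", so any rounded value implies at least 5.
    return (max(value - 5, 5), value + 4)


print(usage_bounds("*"))   # (1, 4)
print(usage_bounds("10"))  # (5, 14)
```

These bounds are still counts of additions to records, not of distinct patients, so summing them across codes or years gives a range of additions only.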

Active_at_Start

"Active status of the SNOMED_Concept_ID on the first day of the reporting period. This is taken from the most recent UK clinical extension, or associated International extension, which was published up to the start of the reporting year (1 August).

1 = SNOMED concept was published and was active (active = 1).

0 = SNOMED concept was either not yet available or was inactive (active = 0)."

Active_at_End

"Active status of the SNOMED_Concept_ID on the last day of the reporting period. This is taken from the most recent UK clinical extension, or associated International extension, which was published up to the end of the reporting year (31 July).

1 = SNOMED concept was published and was active (active = 1).

0 = SNOMED concept was either not yet available or was inactive (active = 0)."
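Read together, the two flags say how a concept's status changed over the reporting year, which the builder could use to flag, say, codes that were inactivated during the year. The Python sketch below is hypothetical (`status_change` and its labels are made up) and takes the flags as integers, so values read from a file as strings would need `int()` first.

```python
from typing import Tuple

# Meaning of each (Active_at_Start, Active_at_End) combination.
_STATUS = {
    (1, 1): "active throughout the reporting year",
    (1, 0): "active at start, inactive by end",
    (0, 1): "not yet available or inactive at start, active at end",
    (0, 0): "not yet available or inactive at both start and end",
}


def status_change(active_at_start: int, active_at_end: int) -> str:
    """Describe how a concept's status changed between 1 August and 31 July."""
    key: Tuple[int, int] = (active_at_start, active_at_end)
    if key not in _STATUS:
        raise ValueError(f"flags must be 0 or 1, got {key}")
    return _STATUS[key]


print(status_change(1, 0))  # active at start, inactive by end
```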

Limitations

Data is obtained only from English primary care records and may not reflect secondary care or other countries
Not all primary care practices are included
READ V2 or CTV3 data from before 2019 has all been converted to SNOMED, so the usage of the original codes is not shown
SNOMED codes used in a given reporting year are still listed even if they later became inactive
Usage figures are rounded to the nearest 10, counts of 1 to 4 are shown as *, and codes with no usage are not included in the dataset

Loading data into the builder

See the PR for the proposed method of loading the usage data into the Codelist Builder.

(Relates to #99)

CarolineMorton · 2025-08-04T10:27:20Z · Maintainer

This is really great, thanks @em-baggie!

I was recently sent this R package and R Shiny app from Bennett that does something similar: https://github.com/bennettoxford/opencodecounts/tree/main

We should have a look at this as well and find out if we want to use any of their approaches or even collaborate with them.


em-baggie Aug 4, 2025
Collaborator Author

Oh cool - I'll take a look.
