
Core field descriptions

The core fields included in the DQMI, along with descriptions of how the validity and default measures are calculated, are available in the ‘Core Field Descriptions’ tab of the DQMI publication, accessible via the NHS Digital website data quality page.

The information provided includes the following:

‘Data Item’ – the name of the core field as defined in the NHS Data Dictionary. This is also a hyperlink that opens the field’s definition in the NHS Data Dictionary.

‘Plain English Description’ – a description of the core field in layman’s terms.

‘Definition of Validity’ – a pseudo-code description of the validation rules applied to the core fields in the DQMI.

‘Threshold for proportion of defaults’ – the percentage of defaults accepted as valid and included in the Percentage Valid, Complete scores:

  • 100% - all default values are meaningful and considered valid 
  • 0% - all defaults are meaningless and considered invalid
  • '-' - default values are not applicable for this data item
  • any other percentage value - a mix of meaningful and meaningless default values

Any other percentage value gives the percentage of default values considered valid; see ‘Calculation of Default variable thresholds’ below.

Further information on the validity definitions used in the DQMI can be found within the 'Data Item Help' page on the DQMI Power BI report.


Calculations of the DQMI

  

Calculations of Coverage

The monthly coverage is calculated as:

\(C = {Number\ of\ periods\ for\ which\ a\ provider\ submitted\ data \over Number\ of\ periods\ a\ provider\ was\ expected\ to\ submit\ data}\)

Examples:

For a month  "✓" indicates data submitted, "x" indicates data expected but not submitted and "-" indicates where data is not expected.

Oct | Coverage factor | C    | Plain English
x   | 0               | 0.00 | Data expected, but no submissions
✓   | 1/1             | 1.00 | Data expected, and all data submitted

If a Provider has closed, it is not expected to submit data after the month of closure. If a Provider is new, it is expected to submit data after the month it opened, despite having no previous submission history.
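
As an illustration of the coverage formula above, here is a minimal Python sketch; the function name monthly_coverage and the period labels are illustrative, not part of the publication.

```python
def monthly_coverage(submitted_periods, expected_periods):
    """Coverage C = periods with a submission / periods a submission was expected."""
    expected = set(expected_periods)
    if not expected:
        return 0.0  # nothing expected (for example, a closed provider)
    submitted = set(submitted_periods) & expected
    return len(submitted) / len(expected)


# Mirroring the examples in the table above:
print(monthly_coverage([], ["2018-10"]))           # 0.00 - data expected, none submitted
print(monthly_coverage(["2018-10"], ["2018-10"]))  # 1.00 - data expected and submitted
```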

Coverage expectancy

APC, OP and ECDS

Expected coverage for APC, OP and ECDS looks back over a three-month window (including the current processing month), as a master list of providers who should or shouldn’t be submitting these datasets does not exist. A provider is expected to submit when it has activity within this three-month period; beyond this timeframe the provider is no longer expected to submit. This logic is applied to each of these datasets individually.

CSDS, DID, MHSDS and MSDS

Expected coverage uses a rolling six-month window, as there is not a master list of providers. For example, a provider is expected to submit in October 2018 if the provider has submitted at least one month during the period April 2018 to September 2018.

IAPT

Expected coverage looks back over the previous two months (three periods including the current month), as there is not a master list of IAPT providers. For example, a provider is expected to submit in October 2018 if the provider has submitted at least one month during the period August 2018 to October 2018.
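
The rolling-window rules above could be sketched roughly as follows. The dataset groupings and window lengths come from the descriptions above, but the helper names and the exact implementation are assumptions for illustration only.

```python
from datetime import date

# Look-back windows described above: (months in window, include current processing month).
EXPECTANCY_WINDOWS = {
    "APC": (3, True), "OP": (3, True), "ECDS": (3, True),   # three months incl. current
    "CSDS": (6, False), "DID": (6, False),
    "MHSDS": (6, False), "MSDS": (6, False),                 # previous six months
    "IAPT": (3, True),                                       # three periods incl. current
}


def months_back(month: date, n: int) -> date:
    """Return the first day of the month n months before `month`."""
    total = month.year * 12 + (month.month - 1) - n
    return date(total // 12, total % 12 + 1, 1)


def is_expected_to_submit(dataset: str, processing_month: date, submission_months) -> bool:
    """True if the provider submitted at least once inside the dataset's look-back window.

    `submission_months` is a collection of first-of-month dates for past submissions.
    """
    window, include_current = EXPECTANCY_WINDOWS[dataset]
    end = processing_month if include_current else months_back(processing_month, 1)
    start = months_back(processing_month, window - 1 if include_current else window)
    return any(start <= m <= end for m in submission_months)


# Example from the text: a provider is expected to submit CSDS in October 2018 if it
# submitted at least one month between April 2018 and September 2018.
print(is_expected_to_submit("CSDS", date(2018, 10, 1), [date(2018, 6, 1)]))  # True
```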

Consistency

The calculation of consistency is:

The average number of records submitted, plus or minus two standard deviations, creates an upper and a lower limit. A percentage is then calculated of how close the number of records submitted is to that range.

This dimension is experimental and will remain so for the remainder of the 2019/20 financial year.
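
Because the published description does not state exactly how the percentage is derived, the sketch below should be read as one possible interpretation only: it computes the ±2 standard deviation band from historical submission counts, treats a month inside the band as 100%, and scores other months by their ratio to the nearest limit.

```python
from statistics import mean, stdev


def consistency_score(historic_counts, current_count):
    """Band = mean(historic_counts) +/- 2 * stdev(historic_counts).

    The % 'closeness' rule below is an illustrative assumption, not the published formula:
    inside the band scores 100%; otherwise the score is the ratio to the nearest limit.
    """
    avg = mean(historic_counts)
    sd = stdev(historic_counts)
    lower, upper = avg - 2 * sd, avg + 2 * sd
    if lower <= current_count <= upper:
        return 100.0
    nearest = upper if current_count > upper else lower
    return max(0.0, min(current_count, nearest) / max(current_count, nearest)) * 100


print(consistency_score([1000, 1100, 950, 1050], 1020))  # within the band -> 100.0
```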


Methodology for Excluding Default Values from Percentage Valid, Complete

Default values for each data item are identified and defined in the NHS Data Dictionary. Loshin (Reference: Loshin, D. The Practitioner's Guide to Data Quality Improvement. 2011. USA. Morgan Kaufmann) identifies two kinds of default values: meaningless and meaningful. A meaningless default is equivalent to a true “null” value that represents the absence of a value. A meaningful default value is one used to represent some concept without specifying a value. For the purposes of the DQMI, we have made the following distinction between these:

Meaningless: When the actual value is not available or is defined as ‘Not Known’. These values are eliminated from the valid attribute during the calculation of the DQMI score. For example, Ethnic Category ‘99’ (Not Known), as there is a separate (valid) code of ‘Not Stated’ for these cases.

Meaningful: When the actual value offers some valuable information. For example, using a default to specify non-consultant led activity such as nurses or midwives on the Consultant Code field.

Calculation of defaults in excess

Defaults in Excess include all meaningless default values and a proportion of records with a mix of meaningful and meaningless defaults. These values are eliminated from the valid attribute during the calculation of the DQMI score.

For example, Administrative Category Code is a mixed default as ‘98’ (Not Applicable) is considered meaningful, and ‘99’ (Not Known) is considered meaningless. This field is given a threshold of 4% for Outpatients and 1% for Admitted Patient Care; see below for the calculation of thresholds.

\(Defaults\ in\ excess = Total\ defaults - (Threshold \times Number\ of\ records)\)

If the result is less than zero, it is counted as 0.
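
A minimal sketch of the defaults in excess formula above, using the Administrative Category Code threshold of 4% for Outpatients; the record counts are illustrative only.

```python
def defaults_in_excess(total_defaults: int, threshold: float, number_of_records: int) -> float:
    """Defaults in excess = total defaults - (threshold * number of records), floored at zero."""
    return max(0.0, total_defaults - threshold * number_of_records)


# Administrative Category Code example with the 4% Outpatients threshold:
# 600 defaulted records out of 10,000 -> 600 - 0.04 * 10,000 = 200 defaults in excess.
print(defaults_in_excess(600, 0.04, 10_000))  # 200.0
```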


The validity of the field will be affected depending on the types of default a data item has:

Default type defined for the data item        | Threshold                               | Notes
Only meaningful defaults                      | 100%                                    | All records with default values will be treated as valid
Only meaningless defaults                     | 0%                                      | All records with default values will be treated as invalid (defaults in excess)
A mix of meaningful and meaningless defaults  | Variable, see details below this table  | A proportion of records with default values will be treated as invalid (defaults in excess)
No defaults defined                           | -                                       | No records with default values will be present on this data item

Calculation of Default variable thresholds

Defaults

1. Obtain the proportion of defaults per data item per dataset for each provider.

\(proportion\ of\ defaults = {Number\ of\ defaults \over Number\ Valid, Complete}\)

2. Identify outliers (biggest offenders).

3. The upper limit (threshold) is calculated excluding outliers as:

\(UL = average(proportion\ of\ defaults) + 2 \times stdev(proportion\ of\ defaults)\)

That is, the average of the proportion of defaults for the data item, plus two times the standard deviation of the proportion of defaults.

The calculated threshold is presented as a percentage by multiplying the proportion by 100 and adding the percentage (%) symbol. It is available in the ‘Core Field Descriptions’ tab of the DQMI publication.
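
The threshold steps above can be sketched as follows. The publication does not state how outliers (the ‘biggest offenders’) are identified, so the exclusion rule used here (dropping proportions above mean + 2 standard deviations of the raw values) is an assumption, as are the example figures.

```python
from statistics import mean, stdev


def variable_threshold(proportions):
    """Upper limit = mean + 2 * stdev of per-provider proportions, after excluding outliers.

    The outlier rule below (values above mean + 2 * stdev of the raw proportions)
    is an illustrative assumption; the publication only says outliers are excluded.
    """
    cutoff = mean(proportions) + 2 * stdev(proportions)
    kept = [p for p in proportions if p <= cutoff]  # exclude the biggest offenders
    return mean(kept) + 2 * stdev(kept)


# Hypothetical per-provider proportions of defaults for one data item:
proportions = [0.01, 0.02, 0.015, 0.03, 0.02, 0.01, 0.025, 0.02, 0.015, 0.60]
print(f"{variable_threshold(proportions) * 100:.1f}%")  # ~3.2% for this illustrative data
```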


Data item score

Where defaults are used

Data Item Score = Percentage Valid, Complete for the applicable data item (such as an NHS Number).

The Data Item Score may be formally expressed as:

\(Data\ item \ score = ({Number\ of\ valid\ and\ complete\ records - defaults\ in\ excess \over\ Number\ of\ records}) * 100\)

For the applicable data item.

See calculations of defaults in excess above for an explanation.

See Appendix 1 for a worked example of a DQMI calculation.
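
A minimal sketch of the data item score where defaults are used; the figures are illustrative only.

```python
def data_item_score(valid_and_complete: int, defaults_in_excess: float, number_of_records: int) -> float:
    """Percentage valid and complete for a data item, net of defaults in excess."""
    return (valid_and_complete - defaults_in_excess) / number_of_records * 100


# Illustrative figures: 9,500 valid and complete records, 200 defaults in excess,
# 10,000 records submitted -> (9,500 - 200) / 10,000 * 100 = 93.0
print(data_item_score(9_500, 200, 10_000))  # 93.0
```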

Where on the hour is used

Data Item Score = Proportion of times not recorded on the hour (for example, Onward Referral Time (Hour))

The Data Item Score may be formally expressed as:

\(Data\ Item\ Score = ({Number\ of\ records - On\ the\ hour\ in\ excess\ \over Number\ of\ records}) * 100\)

For the applicable data item.

See the calculation of on the hour in excess for an explanation; it mirrors the calculation of defaults in excess above, using the on the hour threshold.
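
A corresponding sketch for data items measured on the hour; the figures are illustrative only.

```python
def on_the_hour_score(number_of_records: int, on_the_hour_in_excess: float) -> float:
    """Percentage of records not recorded on the hour beyond the accepted threshold."""
    return (number_of_records - on_the_hour_in_excess) / number_of_records * 100


# Illustrative figures: 10,000 records with 1,500 on-the-hour records in excess of
# the threshold -> (10,000 - 1,500) / 10,000 * 100 = 85.0
print(on_the_hour_score(10_000, 1_500))  # 85.0
```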

MHSDS (hour) measures variable thresholds

1. Obtain the proportion of on the hour per data item per dataset for each provider.

\(proportion\ of\ on\ the\ hour = {Number\ on\ the\ hour \over Number\ of\ Records}\)

2. Identify outliers (biggest offenders).

3. The upper limit (threshold) is calculated excluding outliers as:

\(UL = average(proportion\ of\ on\ the\ hour) + 2 \times stdev(proportion\ of\ on\ the\ hour)\)

That is, the average of the proportion of on the hour for the data item, plus two times the standard deviation of the proportion of on the hour.

The calculated threshold is presented as a percentage by multiplying the proportion by 100 and adding the percentage (%) symbol. It is available in the ‘Core Field Descriptions’ tab of the DQMI publication.


Dataset score

Dataset Score (e.g. APC Score) = Mean of all the Data Item Scores for Percentage Valid & Complete for the applicable dataset.

The Dataset Score may be formally expressed as:

\(Dataset\ score = \frac{1}{n} \Sigma_{i=1}^n({Number\ of\ valid\ and\ complete\ records - defaults\ in\ excess \over\ Number\ of\ records})_i \ *\ 100\)

Where:

n is the number of fields for which data was submitted for the applicable dataset and

i is the index number of each of those fields

See calculation of defaults in excess above for an explanation.
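
A minimal sketch of the dataset score; the data item scores shown are illustrative only.

```python
from statistics import mean


def dataset_score(data_item_scores):
    """Dataset score = mean of the data item scores for the fields submitted."""
    return mean(data_item_scores)


# Illustrative data item scores for one dataset (already expressed as percentages):
print(dataset_score([93.0, 85.0, 99.5, 70.0]))  # 86.875
```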


DQMI

The DQMI is an overall score calculated for each provider. It is defined as the average of the percentage of valid and complete entries in each field of each dataset and is proportional to the coverage. Excessive use of default values is penalised by deducting it from the valid values.

‘Number Valid, Complete’ – a count of valid and complete records for that data item (e.g. ethnic category) for each Provider. This count includes default values.

Coverage is the degree to which data have been received from all expected data suppliers (see Calculation of Coverage).

Where data items are not expected for a field, the percentage is treated as a null value and is not included in the calculation of the mean.

DQMI = Mean of all the Data Item Scores (for Percentage Valid & Complete) multiplied by the coverage score.

The DQMI may be formally expressed as:

\(DQMI = \frac{1}{n} \Sigma_{i=1}^n({Number\ of\ valid\ and\ complete\ records - defaults\ in\ excess \over\ Number\ of\ records})_i \ *\ \frac{1}{n}\Sigma_{i=1}^n(C)_i \ *\ 100\)

Where:

n is the number of fields for which data was submitted for the applicable dataset and

i is the index number of each of those fields

\(C\) is the coverage, calculated as \(\frac{Number\ of\ datasets\ a\ provider\ submitted\ data}{Number\ of\ datasets\ a\ provider\ was\ expected\ to\ submit}\)

See calculation of defaults in excess above for an explanation.

See Appendix 1 for a worked example of a DQMI calculation.

The DQMI is NOT equal to the mean of all the Dataset Scores. The DQMI gives equal weighting to each data item, whereas calculating the mean of all the Dataset Scores would give a lower weighting to fields within datasets reporting on a higher number of fields than to those reporting on a lower number of fields.
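
Putting the pieces together, the sketch below pools data item scores across datasets and multiplies their mean by the coverage score, as described above; the figures and the simplified coverage argument (datasets submitted over datasets expected) are illustrative only.

```python
from statistics import mean


def dqmi(data_item_scores, datasets_submitted: int, datasets_expected: int) -> float:
    """DQMI = mean of all data item scores (pooled across every submitted dataset)
    multiplied by the coverage score.

    Pooling the scores before averaging gives every field equal weight,
    regardless of which dataset it sits in.
    """
    coverage = datasets_submitted / datasets_expected
    return mean(data_item_scores) * coverage


# Illustrative figures: eight field scores pooled across two datasets, with the
# provider submitting two of the three datasets it was expected to submit.
scores = [93.0, 85.0, 99.5, 70.0, 88.0, 91.0, 76.5, 97.0]
print(round(dqmi(scores, datasets_submitted=2, datasets_expected=3), 2))  # 58.33
```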


Last edited: 30 January 2025 4:09 pm