Statistical disclosure control

Before we publish census data, we go through processes to protect the privacy of individuals and households in published census outputs.

We call this process statistical disclosure control (SDC).

SDC refers to the methods that we use. These methods can be applied to the underlying census datasets, or at the point where census outputs are created.

This process helps us make sure we are following the rules and laws that protect the confidentiality of census data.

Read more about the rules and laws that protect census confidentiality.

SDC measures must achieve an appropriate balance between:

  • protecting the privacy of data subjects
  • preserving utility for data users by minimising the impact on data quality

With this in mind, we have worked closely with UK census colleagues at ONS and NISRA to develop our census SDC strategy for Scotland's Census.

There are three main methods that we have used to protect confidentiality in 2022 census outputs. These methods are:

  • targeted record swapping
  • cell key perturbation
  • flexible table builder rules

Targeted record swapping

This involves swapping the geographic information of a proportion of households in the underlying census datasets. We start by assessing every individual and household for uniqueness or rarity on several characteristics. Households which contain individuals who are unique or rare on one or more of these characteristics are flagged as “high risk” records.
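The flagging step can be illustrated with a minimal Python sketch. It is not the production method: the record layout, the two characteristics, the choice to count rarity within an area, and the threshold of one are all assumptions made for illustration.

```python
from collections import Counter

# Hypothetical person records; the field names are illustrative only.
people = [
    {"household": "H1", "area": "A", "age_band": "30-39", "occupation": "teacher"},
    {"household": "H2", "area": "A", "age_band": "30-39", "occupation": "teacher"},
    {"household": "H3", "area": "A", "age_band": "40-49", "occupation": "lighthouse keeper"},
    {"household": "H4", "area": "A", "age_band": "40-49", "occupation": "teacher"},
]

def flag_high_risk(people, characteristics, threshold=1):
    """Return ids of households containing a person who is rare on any characteristic.

    Assumption: "rare" means at most `threshold` people in the same area
    share that value of the characteristic.
    """
    high_risk = set()
    for name in characteristics:
        counts = Counter((p["area"], p[name]) for p in people)
        for p in people:
            if counts[(p["area"], p[name])] <= threshold:
                high_risk.add(p["household"])
    return high_risk

print(flag_high_risk(people, ["age_band", "occupation"]))  # {'H3'}
```

Here household H3 is flagged because it contains the only lighthouse keeper in area A.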

We then select a sample of households for swapping. Every household in Scotland has a chance of being selected for the swapping sample, but households that have been flagged as “high risk” are much more likely to be included.
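One way to picture this selection is as a weighted draw. The sketch below is a simplification under stated assumptions: the two selection rates are invented for illustration, and each household is drawn independently, which may not match the actual sampling design.

```python
import random

def select_swap_sample(household_ids, high_risk, base_rate=0.02,
                       high_risk_rate=0.6, seed=0):
    """Select households for swapping.

    Every household can be selected, but high-risk households use a much
    higher selection probability. Both rates are hypothetical values.
    """
    rng = random.Random(seed)  # seeded so the draw is reproducible
    sample = []
    for hh in household_ids:
        rate = high_risk_rate if hh in high_risk else base_rate
        if rng.random() < rate:
            sample.append(hh)
    return sample

households = [f"H{i}" for i in range(1000)]
risky = set(households[:100])  # pretend the first 100 were flagged
sample = select_swap_sample(households, risky)
selected_risky = sum(1 for hh in sample if hh in risky)
print(selected_risky, len(sample) - selected_risky)
```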

For the households selected for swapping, we find demographically similar households in nearby geographical areas to swap them with.
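The matching and swapping step might look like the following sketch. The `profile` tuple standing in for "demographically similar", the `neighbours` map standing in for "nearby", and the rule of taking the first match are all hypothetical simplifications.

```python
def find_swap_partner(target, candidates, neighbours):
    """Return the first candidate in a neighbouring area with the same profile."""
    for other in candidates:
        if (other["id"] != target["id"]
                and other["area"] in neighbours[target["area"]]
                and other["profile"] == target["profile"]):
            return other
    return None

def swap_geography(a, b):
    """Exchange only the geographic information of two households."""
    a["area"], b["area"] = b["area"], a["area"]

# Hypothetical households; profile = (household size, tenure).
h1 = {"id": "H1", "area": "A", "profile": (2, "owner")}
h2 = {"id": "H2", "area": "B", "profile": (2, "owner")}
h3 = {"id": "H3", "area": "C", "profile": (2, "owner")}   # similar, but not nearby
h4 = {"id": "H4", "area": "B", "profile": (5, "renter")}  # nearby, but not similar
neighbours = {"A": {"B"}, "B": {"A", "C"}, "C": {"B"}}

partner = find_swap_partner(h1, [h3, h4, h2], neighbours)
if partner is not None:
    swap_geography(h1, partner)
print(h1["area"], h2["area"])  # B A
```

Only the area is exchanged; every other attribute of each household stays as recorded.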

We take care not to change the underlying census data so much that its usefulness is affected. We carry out extensive post-swapping checks to provide assurance that distributions have not been significantly affected at different levels of geography.

A similar method of swapping is also applied to individuals residing in communal establishments. However, in this instance we swap individuals between communal establishments in nearby areas.

More information on record swapping is available in our Household Record Swapping methodology paper.

Cell key perturbation

Our Flexible Table Builder tool is an innovation for Scotland’s Census 2022. This allows users to create their own tables from census data.

The table builder will use a method called cell key perturbation. This helps to protect the confidentiality of data within tables.

Perturbation is a technique which is applied at the point where census tables are created. Perturbation adds statistical “noise” to protect confidentiality in published tables. We use an algorithm that applies a pre-defined level of perturbation to cells in each dataset. The same perturbation is applied to every instance of that cell to maintain consistency between different tables.

For the practical application of perturbation we assign a key to every record in the census dataset. The record key is a random number within a pre-defined range. It is assigned once and once only, so a record's key never changes.
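A minimal sketch of "assigned once and once only" is shown below. The key range of 256 and the fixed seed are assumptions chosen for illustration; the actual range is not stated here.

```python
import random

KEY_RANGE = 256  # assumed size of the pre-defined range

def assign_missing_keys(record_keys, record_ids, rng):
    """Give a random key to any record that lacks one; never overwrite a key."""
    for rid in record_ids:
        if rid not in record_keys:
            record_keys[rid] = rng.randrange(KEY_RANGE)
    return record_keys

rng = random.Random(2022)  # hypothetical seed, for reproducibility only
record_keys = {}
assign_missing_keys(record_keys, ["r1", "r2"], rng)
first_pass = dict(record_keys)

# Running the assignment again leaves existing keys untouched.
assign_missing_keys(record_keys, ["r1", "r2", "r3"], rng)
print(record_keys["r1"] == first_pass["r1"])  # True
```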

When census tables are constructed, each cell is a count of the number of respondents, and the cell key is calculated by summing their record keys. The combination of cell value and cell key is then read from a previously constructed look-up table to decide the amount of perturbation that should be used.

Where the same cell, or same combination of respondents, appears in different tables, all instances will have the same cell value and cell key, and so receive the same perturbation. This also ensures that repeated requests of the same dataset will have the same perturbation applied consistently.
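The cell key calculation and look-up can be sketched as follows. This is a toy illustration only: the key range, the wrap-around of the summed keys with a modulo, the cap on the cell value used for the look-up, and the noise values in the table are all assumptions, not the published parameters.

```python
KEY_RANGE = 256  # assumed range of record keys
MAX_ROW = 3      # assumed: counts above this share one look-up row

def build_lookup():
    """Build a toy look-up table mapping (cell value, cell key) to noise."""
    band = KEY_RANGE // 8
    table = {}
    for value in range(MAX_ROW + 1):
        for key in range(KEY_RANGE):
            if value == 0:
                noise = 0    # toy choice: leave empty cells unchanged
            elif key < band:
                noise = -1
            elif key >= KEY_RANGE - band:
                noise = 1
            else:
                noise = 0
            table[(value, key)] = noise
    return table

def cell_key(record_ids, record_keys):
    """Sum the record keys in the cell; the modulo wrap is an assumption."""
    return sum(record_keys[r] for r in record_ids) % KEY_RANGE

def perturbed_count(record_ids, record_keys, lookup):
    value = len(record_ids)
    noise = lookup[(min(value, MAX_ROW), cell_key(record_ids, record_keys))]
    return value + noise

lookup = build_lookup()
keys = {"r1": 10, "r2": 200, "r3": 60, "r4": 250}  # hypothetical record keys

# The same respondents appearing in two different tables get the same result.
print(perturbed_count(["r1", "r2", "r3"], keys, lookup))  # 2
print(perturbed_count(["r3", "r1", "r2"], keys, lookup))  # 2
```

Because the perturbation depends only on the cell value and the cell key, it does not matter which table the cell appears in or how many times it is requested.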

More information on the development and application of cell key perturbation is available in our Cell Key Perturbation methodology paper.

Flexible Table Builder rules

We have a number of rules which are applied when tables are created in the Flexible Table Builder. They ensure that tables do not pose a disclosure risk. These rules broadly fall under the following categories:

 

  • Restrictions on the total number of variables which can be added to a table

Users will be limited in the total number of variables that can be added to a table. This helps ensure that tables do not become too large and, as a result, more disclosive.

This limit on variables will vary depending on the required level of geography. For example, we may allow the inclusion of a larger number of variables at higher levels of geography such as Scotland and Local Authority level.
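A rule of this kind could be expressed as a simple look-up by geography. The limits below are invented for illustration, and "Output Area" is included only as an example of a lower level of geography.

```python
# Hypothetical limits: higher levels of geography allow more variables.
MAX_VARIABLES = {
    "Scotland": 6,
    "Local Authority": 5,
    "Output Area": 3,
}

def within_variable_limit(variables, geography):
    """Check whether a requested table stays within the limit for its geography."""
    return len(variables) <= MAX_VARIABLES[geography]

request = ["age", "sex", "tenure", "occupation"]
print(within_variable_limit(request, "Scotland"))     # True
print(within_variable_limit(request, "Output Area"))  # False
```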

 

  • Restrictions on certain combinations of variables

Certain combinations of variables will be restricted to ensure that the specific characteristics of individuals and households are not at risk of disclosure in tables. This will be particularly relevant for variables which are considered special category data due to their sensitivity (for example religion, ethnicity, sexual orientation).

 

  • Checks on the proportion of small cells (1s and 2s) in a table and sparsity

We will also carry out automatic checks to ensure that tables do not have a high proportion of small cell counts (1s and 2s) or empty cells (0s).
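An automatic check along these lines might look like the sketch below. The two thresholds are hypothetical, and treating the small-cell and empty-cell shares as separate tests is an assumption about how the checks are combined.

```python
def table_passes_checks(cells, max_small_share=0.5, max_empty_share=0.5):
    """Reject tables with too many small counts (1s and 2s) or empty cells (0s).

    Both thresholds are illustrative values, not published parameters.
    """
    total = len(cells)
    if total == 0:
        return False
    small = sum(1 for c in cells if c in (1, 2))
    empty = sum(1 for c in cells if c == 0)
    return small / total <= max_small_share and empty / total <= max_empty_share

print(table_passes_checks([12, 40, 7, 1]))        # True
print(table_passes_checks([1, 2, 1, 2, 0, 15]))   # False: too many 1s and 2s
print(table_passes_checks([0, 0, 0, 9]))          # False: too sparse
```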

Small counts (zero, one, and two) will be included in publicly released tables for 2022 if there is sufficient uncertainty as to whether the small cell count is a true value, and if that uncertainty has been systematically created.

UK harmonisation

We are working with the other UK censuses to make sure our SDC methods are harmonised, where appropriate.

We do this through a working group that we attend with the Office for National Statistics and the Northern Ireland Statistics and Research Agency.

Find out more about how we work with other UK censuses.

Stakeholder events

In June 2017, we outlined our strategy for SDC to stakeholders, along with some early plans. See the slides from the event.

In February 2020 we outlined our SDC methodology, as part of our broader statistical methodology approach. See the slides from the event.

We also provided a summary of our SDC approach in our May 2024 webinars.