Our Privacy Act rights and Stats Integrated Data Infrastructure

The Council has longstanding concerns about the Integrated Data Infrastructure (IDI), a tool developed by Statistics New Zealand to combine data about each of us from many databases held by other government agencies. That data might relate to paying income tax, claiming a benefit, or completing the census. The IDI has grown, and continues to grow, in importance to government activity. This is both because it will be central to the work of the current government’s new Social Investment Agency, and because Stats NZ intends the IDI to replace at least part of the traditional census as the main way of gathering information about us.

The Council is concerned about the risks the IDI poses to our privacy. A large centralised store of personal data is attractive to hackers, and the flimsy measures in the Data and Statistics Act 2022 fail to prevent authorised uses of this data from harming people.

One of the key ways that people can exercise some control over the data about them is to use their right under the Privacy Act 2020 to obtain a copy of that data. Unfortunately, Stats NZ refuses people’s requests to see their information in the IDI. We think this is based on a misinterpretation of Stats NZ’s obligations under the Privacy Act, so we have written to the Government Statistician setting out why we think its approach is incorrect, and how to fix the problem. We have also copied our letter to the Privacy Commissioner, since we think the Commissioner should take action to uphold people’s rights.


Mr Mark Sowden
Government Statistician
Statistics New Zealand
8 Willis Street
PO Box 2922
Wellington 6011

16 May 2024

Dear Mr Sowden,

The New Zealand Council for Civil Liberties (the Council) has held several meetings with officials from Statistics New Zealand (Stats NZ) over the last year. These have been informative for us, and we hope useful for Stats NZ.

One of the key concerns of the Council for a number of years has been Stats NZ’s shift to using administrative data gathered by other government agencies, the linking of that data, and its effects on people’s privacy. A particular aspect of the Council’s concerns is the Integrated Data Infrastructure (the IDI). Although we have broader concerns about the tool, this letter is focussed on people’s ability to access the data held about them in the IDI. The right to access data about oneself is a critical aspect of New Zealand’s privacy regime, and of a person’s control over how information about them is used. Privacy may not be listed in the New Zealand Bill of Rights Act, but it is a fundamental human right, at the core of human dignity.[1]

[1] Privacy Concepts and Issues, Law Commission, January 2008. NZLC SP19, chapter 4. https://www.lawcom.govt.nz/our-work/review-of-the-law-of-privacy/tab/study-paper

People’s right of access to information about themselves held by agencies is governed by Information Privacy Principle 6 (IPP6) in section 22 of the Privacy Act 2020. Part 4 of the Act sets out the details of making IPP6 requests for information, how they must be processed, and the reasons for which a request may be wholly or partially refused.

Stats NZ is acutely aware of the privacy issues around the amount of personal data it collects, stores and uses. It aims to reassure the public with multiple web pages and publications. For example, on its page about ‘Information, privacy, security, and confidentiality policy’, Stats NZ says “Stats NZ aspires to model best practice in these areas, rather than mere compliance. Through transparency about our behaviour and demonstrating our stewardship, we foster public trust and confidence in our work and government more generally.”[2]

[2] Information, privacy, security, and confidentiality policy, Stats NZ, 19 October 2022. https://stats.govt.nz/about-us/legislation-policies-and-guidelines/information-privacy-security-and-confidentiality-policy/

This page refers to several items of internal guidance for employees of Stats NZ, which include:

  1. Understanding and Applying the Information, Privacy, Security, and Confidentiality Policy
  2. IPSaC Roles and Responsibilities
  3. Privacy Policy

Unfortunately, these are not linked from the page and do not appear to be published on the Stats NZ website.

Thanks in part to information published on its website, but also to a helpful meeting in August 2023 with the Stats NZ officials who manage and maintain the IDI, the Council now has a fairly good understanding of this tool.

Before ‘de-identified’ data can be made available to researchers in government departments and other third parties, the data from various sources has to be linked together in what is known as ‘IDI Raw’. This itself has two components: IDI Raw and IDI Raw (Cleaned). The former is produced when the datasets are first linked, and the latter when the data is de-duplicated and the linkages reviewed. This second step is necessary to improve the quality of the data, and therefore the probability that the personal data about a person in dataset A is correctly connected to the data about the same person in dataset B, and so on.

The core of the linking is done around the IDI’s ‘spine’, which consists of people’s Inland Revenue Department numbers, birth information from the Department of Internal Affairs, and visa data from the Ministry of Business, Innovation and Employment. After this, personal information contained in other datasets from different departments is linked to the spine using two methods: deterministic linking and probabilistic linking.

Deterministic linking is about exact matches, using data fields such as a person’s IRD number. Probabilistic linking connects records based on matches between the datasets in fields such as first name, last name, sex, date of birth and address. It considers two records at a time and compares the values of the variables they have in common. It then assigns a probability that the two records belong to the same person, given the quality of the variables and how common the values are. The record pair with the highest probability of belonging to the same person ‘wins’, and the link between the spine and the dataset is established.

At our meeting in August 2023, Stats NZ told us that probabilistic matching obviously comes with the risk of errors – linking records incorrectly (false positives) or missing legitimate links (false negatives). The officials’ presentation also told us that Stats NZ “aim[s] to keep the false positive rate under 2% – this strikes a balance between quality and quantity of links”. Officials also told us that Stats NZ has a process for measuring false positives and false negatives. Stats NZ’s Data Integration Manual suggests that false positives and false negatives are resolved manually, with officials reviewing the potential linkages that fall into a ‘grey area’ between predetermined link ‘thresholds’ after the automated linking process.[3]

[3] Paragraph 6.4.3 Clerical review. “If the upper and lower thresholds are equal, this divides the set of record pairs cleanly into two sets. However, if they are not, then the record pairs with weights in between the two limits are in the ‘clerical review’ area. In this area, the analyst decides which record pairs are links and which are non-links.” Data Integration Manual: 2nd edition, March 2015. https://www.stats.govt.nz/assets/Uploads/Integrated-data-infrastructure/Your-information-in-the-IDI/Data-Integration-manual-2nd-edition.pdf
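To make the linking and threshold process described above more concrete, here is a minimal Python sketch of a Fellegi-Sunter-style match weight, with an upper and a lower threshold and a ‘clerical review’ band between them. It is an illustration under stated assumptions, not Stats NZ’s implementation. The field list, the m- and u-probabilities, the threshold values and all function names are hypothetical.

```python
import math

# HYPOTHETICAL field probabilities, for illustration only (not Stats NZ values).
# m = P(field agrees | both records are about the same person)
# u = P(field agrees | the records are about different people)
FIELD_PROBS = {
    "first_name": (0.95, 0.01),
    "last_name": (0.95, 0.005),
    "sex": (0.98, 0.5),
    "date_of_birth": (0.97, 0.0003),
    "address": (0.80, 0.001),
}


def match_weight(rec_a: dict, rec_b: dict) -> float:
    """Sum log2 agreement/disagreement weights over the compared fields.

    Fields where chance agreement is unlikely (low u), such as date of birth,
    add more weight than fields like sex. A field missing from either record
    contributes nothing.
    """
    total = 0.0
    for field, (m, u) in FIELD_PROBS.items():
        a, b = rec_a.get(field), rec_b.get(field)
        if a is None or b is None:
            continue
        if a == b:
            total += math.log2(m / u)
        else:
            total += math.log2((1 - m) / (1 - u))
    return total


def classify(weight: float, lower: float = 5.0, upper: float = 20.0) -> str:
    """Apply HYPOTHETICAL link thresholds; weights in between go to a human."""
    if weight >= upper:
        return "link"
    if weight <= lower:
        return "non-link"
    return "clerical review"


def best_candidate(spine_rec: dict, candidates: list[dict]) -> tuple[float, dict]:
    """Return (weight, record) for the highest-weighted candidate: it 'wins'."""
    return max(((match_weight(spine_rec, c), c) for c in candidates),
               key=lambda pair: pair[0])


if __name__ == "__main__":
    spine = {"first_name": "John", "last_name": "Smith", "sex": "M",
             "date_of_birth": "1980-01-01", "address": "1 Main St"}
    exact = dict(spine)
    j_smith = dict(spine, first_name="J", address="9 High St")
    for rec in (exact, j_smith):
        w = match_weight(spine, rec)
        print(f"{rec['first_name']:>4}: weight {w:5.1f} -> {classify(w)}")
```

With these made-up values, a record that agrees on all five fields scores about 36 and is linked automatically. A ‘J Smith’ record with a different address scores about 13.6, which falls between the thresholds and would go to clerical review. Raising the upper threshold reduces false positives at the cost of missing more true matches, which is the trade-off the Data Integration Manual describes.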

It appears to follow – from what Stats NZ publishes and has told us – that with a false positive rate of 2% or under, the department can be sure that 98% of the data from other sources connected to a person’s ‘spine’ record in IDI Raw is actually a match for them, and that the data linked together is about that particular person.

Stats NZ officials told us in the August 2023 meeting that they already sample linked records and manually examine whether the linking is accurate, in order to give an overall percentage estimate. This implies that they could carry out the same checks when someone makes a Privacy Act request for the data held on them in the IDI.

It therefore also appears that Stats NZ does have processes for interrogating the data linked to a particular person’s ‘spine’ record in IDI Raw. Indeed, it needs these in order to produce the IDI Raw (Cleaned) version of the tool.

After the links between the datasets and the IDI spine have been established and the data cleaned, the personally identifying information is removed and identifiers encrypted, to produce ‘IDI Clean’. It is this that provides the foundation for third party researchers, who gain access to whatever subset of IDI Clean they need for their research.

People have a right to request information held about themselves under IPP6 of the Privacy Act 2020. This right is qualified, with various grounds set out in sections 49-53 of the Act for refusing to disclose some or all of the data that an agency holds.

The right of access is critical to the scheme of the Privacy Act, since without access people cannot exercise other rights (for example, to have inaccurate information corrected under IPP7), or seek redress for a breach of the Act (for example, to argue that the information should be deleted as no longer relevant to the purpose for which it was collected under IPP9).

When Stats NZ receives a request under IPP6 from someone seeking the data about them held in the IDI, it refuses the request.

The reason for refusal cited by Stats NZ is “the information you seek cannot be readily retrieved” so Stats NZ “cannot” fulfil the request. Stats NZ claims that this basis for refusal “complies with section 44(2)(a) of the Privacy Act 2020”, which states:

the agency does not hold personal information in a way that enables the information to be readily retrieved;

Stats NZ seeks to underpin its refusal to disclose information held in the IDI about a person under s 44(2)(a) in the following manner:

While these linking techniques are accurate enough to produce a dataset for statistical or research purposes, we can never be sure that the individual we have identified in one dataset is in fact the same person in another dataset. For example, some names are more common than others, and we are often working with incomplete information. We cannot be certain that J Smith in one dataset is the same as John Smith in another.

Since we cannot be 100 percent confident that the data belongs to you after it has been linked, we would need to refer to the raw source data as received from other organisations. We would need to go through these raw files manually, which would be a very resource-intensive task. For example, from one government agency, we receive 23 raw data files. We’d need to manually search through these 23 files to see if they contain information relating to you, using a range of identifying information from you such as current and previous addresses, date of birth, full name, aliases, and agency-specific unique identifiers. We would also need to account for multiple identity numbers within an agency relating to the same person. This can occur because of administrative processes and system changes.

As of August 2022, there are more than 15 organisations that supply data to the IDI. A similar process would need to be repeated for the more than 1000 raw data files that we receive annually from source organisations. Because of this, the information you seek cannot be readily retrieved and we cannot fulfil your request. This decision complies with section 44(2)(a) of the Privacy Act (2020).

While the passages above are quoted from a specific Stats NZ response to an IPP6 request, it is clear from Stats NZ’s website that they reflect the department’s approved and authorised approach towards all such requests. On the page Your information in the IDI, Stats NZ says the following:[4]

We cannot readily retrieve your personal information from the IDI

The IDI has not been designed to allow the removal of information relating to individuals by request.

Since we cannot be sure of the data belonging to you after it has been linked, we would need to refer back to the raw-source data received from other organisations. We would need to go through all the raw files manually, which would be an extremely resource intensive task. Because of this, we are usually unable to retrieve information about individuals when we are asked to do so.

If you wish to have a copy of your information from the organisations who supply data to the IDI, we recommend contacting them. Data in the IDI has a list of organisations to contact who can consider your request.

[4] Your information in the IDI, Stats NZ, 22 March 2023. https://www.stats.govt.nz/integrated-data/integrated-data-infrastructure/your-information-in-the-idi/

The Council does not believe that Stats NZ’s claim that it “cannot” disclose the information held in the IDI and requested by a person making an IPP6 application is a lawful response that is compliant with the Privacy Act.

Stats NZ’s reliance on this response is even more problematic when it is applied in blanket fashion to all requests for data held in the IDI about an identifiable person.

As quoted above, Stats NZ says the reason underpinning its refusals is because it “cannot be 100 percent confident that the data belongs to you after it has been linked” and that therefore it would need to do resource-intensive checking.

However, this is not a lawful basis for refusal of an IPP6 request. No part of sections 49 to 53 of the Privacy Act states that an agency can refuse to disclose information to an applicant on the basis that it is not ‘100 percent confident’ that the data relates to that person.

Only two aspects of those sections would appear to offer any ability to refuse disclosure because doing so might provide the applicant with information about another person:

  1. Section 49(1)(a)(iii) states that an agency may refuse access to any personal information requested if the disclosure of the information would include disclosure of information about another person who (A) is the victim of an offence or alleged offence; and (B) would be caused significant distress, loss of dignity, or injury to feelings by the disclosure of the information; and
  2. Section 53(b) states that an agency may refuse access to any personal information if the disclosure of the information would involve the unwarranted disclosure of the affairs of another individual or a deceased person.

These can be discounted as reasons for refusing to disclose information held in IDI Raw (Cleaned) on the basis of not being “100 percent confident that the data belongs to you after it has been linked”.

This is because by the time the data is in the ‘Raw (Cleaned)’ stage of the IDI production process, the data is only connected to one identifiable individual via the spine.

For incorrect information to be linked to the wrong person, Stats NZ would, during the linking process, have to have made errors in relation to at least the person’s name, sex, date of birth, and possibly their address as well. Stats NZ officials told us that it aims to keep the false positive rate for links to “under 2%”, and section 7.4.1 of Stats NZ’s Data Integration Manual explains that “if it is critical to avoid false links” officials doing the linking should “set the cut-off threshold higher, being mindful that some true matches will be missed.” The Council would expect that datasets relating to the victims of offences would be particularly sensitive and involve especially close scrutiny by Stats NZ officials prior to linking. Therefore the possibility of a response to an IPP6 request meeting the section 49(1)(a)(iii) tests would be vanishingly small.

Even if information disclosed in response to an IPP6 request contains data that falls within this 2 percent of false positives, it is unlikely to meet the tests in sections 49(1)(a)(iii) and 53(b). Even if the data did relate to another person, the requester is highly unlikely to be able to connect it to that person. Such matching errors are most likely to occur when many people share the same name, sex, and date of birth. In such circumstances, the probability of the requester being able to attribute the data to the particular other person who shares those details is also vanishingly small.

In any event, neither of the tests for these withholding provisions requires the agency that holds the information to be “100 percent confident” that it is not inadvertently disclosing information about another individual.

Since the obligation to be “100 percent confident” before making a disclosure in response to an IPP6 request does not exist in the Privacy Act, it cannot be relied upon by Stats NZ to claim that prior to disclosure it would “need to refer to the raw source data as received from other organisations. We would need to go through these raw files manually, which would be a very resource-intensive task.”

Thus Stats NZ cannot rely on section 44(2)(a) of the Privacy Act to assert that it “does not hold personal information in a way that enables the information to be readily retrieved.” The personal information in the IDI Raw (Cleaned) version of the tool is all linked to a verified individual who has a unique ID within Stats NZ, and can therefore be ‘readily retrieved’.

Furthermore, even if a disclosure were to inadvertently include information about another person, section 205 of the Privacy Act protects agencies against actions for wrongful disclosure if the disclosure was made in good faith in response to an IPP6 request. It states that:

  1. If any personal information is made available in good faith under IPP 6,—
    • no proceedings, civil or criminal, may be brought against the Crown or any other person in respect of the making available of that information, or in respect of any consequences that follow from the making available of that information; and
    • no proceedings, civil or criminal, in respect of any publication involved in, or resulting from, the making available of that information may be brought against the author of the information or any other person by reason of that author or other person having supplied the information to an agency.

Consider the highly unlikely event that Stats NZ disclosed to an IPP6 requester information incorrectly linked to them, and that the person the information actually concerned was not only identifiable but suffered “significant distress, loss of dignity, or injury”, or that the disclosure was “unwarranted”. Stats NZ would still be protected from legal action by that third party if it had disclosed the information in good faith. This is very similar to the protection in section 48 of the Official Information Act, which government departments have relied upon for decades to protect them against legal action when responding to requests for information under that law. Aside from anything else, the Ombudsman has consistently told agencies responding to OIA requests that if they are concerned the requester will misunderstand or misinterpret the information disclosed, the answer is to provide an explanation, not to refuse the request.

The other problem with Stats NZ’s approach to IPP6 requests for information held about the requester in the IDI is that it takes a ‘one size fits all’ approach.

This presumes that the possibility of inadvertently disclosing information about the wrong person to the applicant is the same regardless of whether the applicant’s name is very common, e.g. ‘John Smith’, or very likely to be unique, e.g. ‘Zaphod Beeblebrox’.

Even if the requester’s name is common, given the normal steps required by all agencies to verify the identity of the requester before responding to their IPP6 request, incorrect disclosure would have to mean there is at least one other person with the same full name, sex, date of birth, and IRD number – since that is a ‘spine’ attribute for distinguishing between people in the IDI.

The Council doubts that, in these circumstances, a ‘one size fits all’ approach is lawful in relation to the Privacy Act. At the very least, the department would seem to be improperly fettering its discretion.

As we have noted, and the department is acutely aware, public trust and confidence in the collection, storage, management, sharing and disposal of people’s personal data is essential to Stats NZ’s work. We do not need to cite all the statements made by Stats NZ on this subject, nor repeat its concerns about the effects of a loss of trust on the public’s willingness to respond to the government’s surveys. However, high standards of compliance with the Privacy Act are essential not only to trust in the department and its social licence to operate the IDI, but also to trust in government overall. For this reason, it is essential that Stats NZ changes how it responds to IPP6 requests.

The Council understands that a change of approach by Stats NZ to IPP6 requests for information held by Stats NZ, particularly in the IDI, will have implications for the workload on the agency and the resources required. The Council has a great deal of sympathy for the department given the ridiculous requirement from ministers that it make 7.5% cuts to its budget. 

Nonetheless, legal obligations apply to the department irrespective of the funding it is provided with. In creating the IDI, the department has gathered masses of information on the public. Information is power, and with power come responsibilities.

There are several aspects to the changes Stats NZ must make:

First, Stats NZ needs to recognise and publicly acknowledge its responsibility, as the government’s centralised ‘hub’ for such a massive accumulation of administrative data, to all the people living in Aotearoa on behalf of the government as a whole. That responsibility is not just to ensure that data is accurately linked to facilitate research and analysis by government departments and non-government researchers. It is also to make it easier for people to make practical use of their right to information and to ascertain what information the state holds on them. Acknowledging this and taking ownership of it is an essential first step in making a bid for the funding needed to meet the department’s legal obligations.

Second, as we noted earlier, IPP6 is the keystone for people exercising their other rights under the Privacy Act.

Section 29(2) of the Privacy Act 2020 disapplies principle 7 – correction of personal data – to personal information collected by the Government Statistician under the Data and Statistics Act 2022. This means that while Stats NZ must comply with its IPP6 obligations in relation to data stored in the IDI, it is not obliged to correct that data if the requester shows that the information is inaccurate, out of date, incomplete or misleading. This relieves Stats NZ of a significant obligation. It also presents an opportunity to improve both the quality of the data that third parties supply to Stats NZ for use in the IDI, and public trust and confidence in how the state manages the data held on people.

Since Stats NZ is a key conduit for people to find out what information the state holds on them, many people are likely to use this route to exercise their IPP6 rights. Stats NZ can help requesters and itself by designing query reports from the IDI Raw (Cleaned) database that specify, in columns alongside the data linked to that person, the source agency for that data and the name of the dataset it was contained in when the agency provided it to Stats NZ. If necessary, its responses to requests could also state the unique ID used by the supplying agency for each item of data. For datasets that have been probabilistically matched, they could also state Stats NZ’s percentage confidence that the data in question is about the requester.

This will enable requesters who believe that any information about them held by Stats NZ is inaccurate, out of date, incomplete or misleading to approach the supplying agency to exercise their IPP7 rights to have it corrected. If the erroneous data is corrected or deleted, this will improve the quality of the data Stats NZ receives from that agency in future.
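The report layout suggested above could be sketched along the following lines. This is a minimal Python illustration under stated assumptions, not a proposal for Stats NZ’s actual systems. The column names, the `ReportRow` type, the `render_report` function and the sample values are all hypothetical, and the sketch assumes a link confidence exists only for probabilistically matched data.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields
from typing import Optional


@dataclass
class ReportRow:
    """One item of data linked to the requester (HYPOTHETICAL layout)."""
    data_item: str                           # the linked value itself
    source_agency: str                       # agency that supplied the data
    source_dataset: str                      # dataset name as supplied to Stats NZ
    agency_record_id: Optional[str] = None   # supplying agency's unique ID, if given
    link_confidence: Optional[float] = None  # None for exact (deterministic) links


def render_report(rows: list[ReportRow]) -> str:
    """Render rows as CSV so a requester can see where each item came from."""
    buf = io.StringIO()
    names = [f.name for f in fields(ReportRow)]
    writer = csv.DictWriter(buf, fieldnames=names, lineterminator="\n")
    writer.writeheader()
    for row in rows:
        record = asdict(row)
        # Show probabilistic confidence as a percentage; the csv module
        # writes None (no confidence or no ID) as an empty cell.
        if row.link_confidence is not None:
            record["link_confidence"] = f"{row.link_confidence:.0%}"
        writer.writerow(record)
    return buf.getvalue()


if __name__ == "__main__":
    print(render_report([
        ReportRow("Benefit start date 2019-03-01", "Agency A", "dataset_x",
                  link_confidence=0.97),
        ReportRow("Income year 2021", "Agency B", "dataset_y",
                  agency_record_id="B-123"),
    ]))
```

Each row carries enough provenance for a requester to approach the supplying agency under IPP7. In this sketch, a blank confidence cell marks an exact (deterministic) link.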

Furthermore, since IPP8 specifies that “An agency that holds personal information must not use or disclose that information without taking any steps that are, in the circumstances, reasonable to ensure that the information is accurate, up to date, complete, relevant, and not misleading”, enabling people to exercise their IPP7 rights will help the source agencies comply with their IPP8 obligations. In turn this would assist Stats NZ with its own compliance with this principle.

Third, Stats NZ needs to be ready to explain to IPP6 requesters how IPP9 and paragraph (1)(b)(ii) of IPP10 interact in relation to the data that Stats NZ holds in the IDI.

IPP9 specifies that “An agency that holds personal information must not keep that information for longer than is required for the purposes for which the information may lawfully be used.” However, paragraph (1)(b)(ii) of IPP10 says that:

An agency that holds personal information that was obtained in connection with one purpose may not use the information for any other purpose unless the agency believes, on reasonable grounds… that the information… is to be used for statistical or research purposes and will not be published in a form that could reasonably be expected to identify the individual concerned.

The cumulative effect of these two provisions appears to be that there is no time limit on how long Stats NZ can hold any amount of personal data on all the people in Aotearoa for use in the IDI.

Since there is no statutory time limit on Stats NZ’s retention of the data, over time ever more data will be accumulated and held on each person. This has to be explained in the department’s responses to IPP6 requests, so that people understand the situation and can make informed choices about how long to wait until they make a fresh request to Stats NZ.

The Council would add that the absence of a statutory time limit on Stats NZ’s data retention in the IDI inherently increases the risks to people’s privacy as more data is accumulated, and seems contrary to the data minimisation intent of IPP1. This is in addition to the lack of adequate safeguards against misuse that we pointed out during the passage of the 2022 Act. The present government’s Fast-track Approvals Bill overturns decades-long public expectations of access to information and participation in the planning system. Given this, we can have little confidence that the government will abide by the norms on which the 2022 Act’s ‘safeguards’ rest.

In the course of writing this letter, it has become clear that the public would be better served by some unpublished information held by Stats NZ being placed in the public domain. The Council therefore requests under the Official Information Act to be provided with:

  1. The following documents that are referred to on the web page entitled Information, privacy, security, and confidentiality policy:
    • Understanding and Applying the Information, Privacy, Security, and Confidentiality Policy
    • IPSaC Roles and Responsibilities
    • Privacy Policy
  2. A copy of all legal advice generated within, or provided to, Stats NZ about its approach to IPP6 requests for information held in the IDI.
  3. A copy of information showing the outcome of Stats NZ’s communications with the Chief Archivist in relation to defining ‘historical data’ for the purposes of section 41 of the Data and Statistics Act 2022.

Under section 16 of the OIA, the Council requests (a) that this information is provided to it in the file format in which the information was created, and (b) that if the information is contained within a larger document, the whole of that document is provided. Under section 19(a)(ii) of the OIA, if Stats NZ believes there is ‘good reason’ in the OIA not to provide the information requested, the Council requests that Stats NZ provides it with an explanation of the grounds in support of its reasons for withholding, and the public interest factors it considered when reaching its decision.

We look forward to Stats NZ addressing the concerns raised in this letter regarding its compliance with the Privacy Act 2020. At a time when the government is relying on the IDI not only to inform policy but also to drive a ‘social investment’ approach, the importance of Stats NZ properly meeting its IPP6 obligations cannot be overstated. Stats NZ will benefit from wholehearted compliance with the Privacy Act through a feedback loop of corrected data being supplied to it by the source agencies. The government will benefit from improved public trust and confidence in how it is collecting, storing, managing and sharing people’s personal data. The public will benefit from being able to more effectively exercise those rights that are at the heart of human dignity.

In view of the public interest in this issue, the Council is copying this letter to the Privacy Commissioner, and will be publishing it, and your reply, on its website.

Yours sincerely,

Thomas Beagle
Chair