How NSW released COVID data by postcode

NSW is using an innovative data privacy check to allow the quick release of daily local COVID-19 data without people being re-identified afterwards.

The NSW Data Analytics Centre has used a mathematical index to assess the safety of releasing infection data in a postcode heat map where people can check on infections in their local area.

NSW Health has revealed that Waverley, home to Bondi Beach, has clusters of the coronavirus. Nigel Gladstone / Steven Siewert

NSW is the only Australian jurisdiction to release granular postcode-level data about case infection and causes in near real-time.

Known as the Personal Information Factor, the index paves the way for far more sharing of anonymised public data between governments and agencies.

Integrating data lets policymakers better understand problems and the underlying drivers of major social issues such as family violence, children at risk and mental health.

It also means operational data can be shared to better manage systems such as transport, health, energy and public safety.

Data integration also helps streamline complex multi-jurisdictional life events into one service. For example, when someone dies, their relatives wouldn't have to navigate the three tiers of government to wind up their affairs.

Data integration and sharing also enable simple improvements such as "tell us once" sign-in, where people only have to sign in once and forms can be pre-filled.

To date, state and federal agencies have applied a subjective "five-safes test" to determine if releasing data is appropriate. This has seen long delays as risk-averse agencies procrastinated.

NSW Chief Data Scientist Ian Oppermann said the Data Analytics Centre had been working for the past three years on an index to measure the safety of releasing anonymous data.

"Along came COVID, so we very rapidly accelerated our processes of thinking through how we would map that Personal Information Factor," Dr Oppermann told a CEDA seminar on Tuesday.

The index measures the sensitivity of the data and the context and use of the data, then generates a score that assesses the risk of someone being re-identified from an anonymised data set. If the score is below one, the data can't be shared.

The test has been developed collaboratively under the auspices of the Australian Computer Society, together with other Australian jurisdictions and standards bodies, and is applicable to both government and business.

Dr Oppermann said the emphasis in NSW was to put out as much useful data in the public interest as possible.

"Over the course of the last three or four years the Data Analytics Centre has been looking at the challenge of linking data sets to address wicked policy challenges.

"The concern always is, even if de-identified, even by using the gold standards, how do you ensure that you have data which doesn't cross that threshold of being identifiable. It turns out to be an extraordinarily difficult challenge."

Dr Oppermann said the concern was that if you link enough data sets together, at some point you will get people-centric data.

"You will get to the point where someone is reasonably identifiable by hair colour or shoe size.

"And so we very rapidly accelerated our processes of thinking through how we would map that Personal Information Factor."

Context was important, he said. "Thinking through in a crisis, where do we put those sensitivity thresholds so that we would really start off with a certain level of personal information under normal circumstances versus what would we do under crisis circumstances and the public interests.

"So we simplified all the complexity down into three broad categories: closed environment, environment where it's restricted who gets to see the data, who gets to see the insights. And then when you release to the outside world, as open data."

The use of the Personal Information Factor to integrate infection data comes as the federal government is proposing a data-sharing bill to override various privacy regulations that ban Commonwealth agencies from sharing data.

The federal government lags the states in sharing data and is proposing to continue using the five-safes test to determine if it is OK to integrate and release data.

NSW Customer Service Minister Victor Dominello is pushing for the Commonwealth to embrace the new technique to accelerate data sharing between governments and agencies.

The Australian Data and Digital Council – made up of digital ministers across the federation – is meeting monthly with an agenda to accelerate the use of data to build better life-cycle services for government.

A dedicated group of officials from all jurisdictions has been looking at how to improve data sharing to support COVID-19 response and recovery.