Personal identifiable information (PII) in the deployment subsystem #464

Whathecode · 2024-03-02T20:11:29Z

Overall, the deployments subsystem does a good job of not storing PII. The link to an actual account, which allows for re-identification, is made in the studies subsystem. A potential aim could be to be able to claim that the deployments subsystem does not store PII, so that hosting can be outsourced to a third party under less stringent legal requirements.

However, there is data stored in the deployment subsystem which can carry PII. Concretely:

The data requested from participants through the study protocol's "expected participant data" is stored in the deployment subsystem (ParticipantData). By design, this data is not meant to be PII (when considering all potential users of the platform). But the researcher may request PII, or a combination of requested data may inadvertently make re-identification possible.
Device registrations, and their full history over the course of a study, are stored in study deployments. They serve at least two purposes: (1) they contain the details the client needs to connect to the device, and (2) they store device specifications which may be relevant to the researcher when interpreting collected data. The former is the most problematic in the context of PII. For example, a MACAddressDeviceRegistration is used to store connection information for BLE devices. This is useful, as researchers can pre-register devices before lending them out to study participants. But when participants use their own devices, the registration is uniquely linked to a specific participant.
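To make the first point concrete, here is a minimal sketch of how a combination of individually harmless fields can single out a participant. It is not part of CARP; the field names and records are made up for illustration, and the check is a plain k-anonymity style group count.

```python
from collections import Counter
from typing import Iterable, Mapping, Sequence


def smallest_group_size(
    records: Iterable[Mapping[str, object]],
    quasi_identifiers: Sequence[str],
) -> int:
    """Return the size of the smallest group of records that share the same
    values for all given fields (the "k" in k-anonymity). 0 if no records."""
    groups = Counter(
        tuple(record.get(name) for name in quasi_identifiers)
        for record in records
    )
    return min(groups.values()) if groups else 0


# Hypothetical participant data; field names are illustrative only.
participants = [
    {"sex": "F", "birth_year": 1990, "postal_code": "2800"},
    {"sex": "F", "birth_year": 1990, "postal_code": "2100"},
    {"sex": "M", "birth_year": 1985, "postal_code": "2800"},
    {"sex": "M", "birth_year": 1985, "postal_code": "2800"},
]

# Each field alone hides a participant in a group; the combination may not.
k_single = smallest_group_size(participants, ["sex"])
k_combined = smallest_group_size(participants, ["sex", "birth_year", "postal_code"])
print(k_single, k_combined)
```

A smallest group of size 1 means at least one participant is unique on that combination of fields, which is the inadvertent re-identification risk described above.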

The latter issue (DeviceRegistration) may be less serious once you consider how many people an attacker would need to cover for re-identification. It may be easy to get the MAC address of a specific individual you are targeting (simple BLE scanning), but it's not trivial to scan every potential person who may have data in the deployments subsystem. More risky could be when DeviceRegistration would be used to connect to third-party services, such as Google Fit, to retrieve sensor data.

This may make the idea of hosting the studies and deployments subsystems by separate organizations without a legal binding contract to address these issues hard to achieve, unless some further design or infrastructure work is done. E.g., encrypting DeviceRegistration and ParticipantData.
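As a rough sketch of what "encrypting DeviceRegistration and ParticipantData" could look like: the application layer encrypts before handing data over, and the deployment storage only ever sees opaque bytes. Everything here is hypothetical and not CARP API (`Cipher`, `DeploymentStore`, `save_registration`, `load_registration`). The `PlaceholderCipher` is deliberately NOT encryption; it only stands in so the example runs with the standard library, and a vetted authenticated cipher would have to replace it.

```python
import base64
import json
from dataclasses import dataclass, field
from typing import Dict, Protocol


class Cipher(Protocol):
    """Supplied by the application layer; the key never reaches the store."""

    def encrypt(self, plaintext: bytes) -> bytes: ...
    def decrypt(self, ciphertext: bytes) -> bytes: ...


class PlaceholderCipher:
    """NOT encryption: a reversible stand-in so the sketch runs as-is.
    Replace with a vetted authenticated cipher in any real setup."""

    def encrypt(self, plaintext: bytes) -> bytes:
        return base64.b64encode(plaintext[::-1])

    def decrypt(self, ciphertext: bytes) -> bytes:
        return base64.b64decode(ciphertext)[::-1]


@dataclass
class DeploymentStore:
    """Hypothetical stand-in for deployment storage: holds opaque blobs only."""

    blobs: Dict[str, bytes] = field(default_factory=dict)

    def put(self, device_role: str, blob: bytes) -> None:
        self.blobs[device_role] = blob

    def get(self, device_role: str) -> bytes:
        return self.blobs[device_role]


def save_registration(
    store: DeploymentStore, cipher: Cipher, device_role: str, registration: dict
) -> None:
    """Serialize and encrypt in the application layer, then store the blob."""
    plaintext = json.dumps(registration, sort_keys=True).encode("utf-8")
    store.put(device_role, cipher.encrypt(plaintext))


def load_registration(
    store: DeploymentStore, cipher: Cipher, device_role: str
) -> dict:
    """Fetch the blob and decrypt it in the application layer."""
    return json.loads(cipher.decrypt(store.get(device_role)).decode("utf-8"))
```

The point of the split is that whoever hosts the store cannot read the registration without the key held by the application layer; whether that is enough to change the legal picture is a separate question.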


Whathecode · 2024-10-09T21:51:22Z

> More risky could be when DeviceRegistration would be used to connect to third-party services, such as Google Fit, to retrieve sensor data.

From this, I currently conclude that the device registration for such a device shouldn't contain the username of the account. Instead, it should rely on a UUID (so probably just DefaultDeviceRegistration), with the linking of that id to a setup enabling authentication handled in the application/infrastructure layer (outside of core). Or, it could store a token instead.
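A minimal sketch of that split, under the assumption that the link lives in a separate application-layer store. `OpaqueDeviceRegistration` and `AccountLinkStore` are hypothetical names for illustration, not CARP types.

```python
import uuid
from dataclasses import dataclass, field
from typing import Dict


@dataclass(frozen=True)
class OpaqueDeviceRegistration:
    """What the deployment subsystem would store: a random id, nothing else."""

    device_id: str


@dataclass
class AccountLinkStore:
    """Hypothetical application-layer store, kept outside the deployment
    subsystem, which maps the random id to whatever enables authentication."""

    links: Dict[str, str] = field(default_factory=dict)

    def register(self, account_reference: str) -> OpaqueDeviceRegistration:
        # A random (version 4) UUID reveals nothing about the account.
        device_id = str(uuid.uuid4())
        self.links[device_id] = account_reference
        return OpaqueDeviceRegistration(device_id)

    def resolve(self, registration: OpaqueDeviceRegistration) -> str:
        # Only code with access to this store can get back to the account.
        return self.links[registration.device_id]


link_store = AccountLinkStore()
stored_in_deployment = link_store.register("participant@example.com")
print(stored_in_deployment)  # contains only the random id
```

Here the deployment side only ever sees `stored_in_deployment`; the account reference (or a token) stays in the application/infrastructure layer.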

Either way, the general point is that care needs to be taken when designing new DeviceRegistration types to adhere to the data minimization principle, and to reduce to a minimum the risk of direct identification of individuals from any data stored in the deployment subsystem.

But, @bardram, I believe the following conclusion is still spot on:

> This may make the idea of hosting the studies and deployments subsystems by separate organizations without a legal binding contract to address these issues hard to achieve, unless some further design or infrastructure work is done. E.g., encrypting DeviceRegistration and ParticipantData.

It seems likely enough that some data stored in the deployment subsystem would be classified as PII, so regulations like GDPR kick in. Unless the application layer fully handles encryption of said data, that would make the deployments subsystem in CARP core a data processor.

But none of this is a concern for the current release, and this subsystem isn't deployed separately yet either way (other subsystems, like studies, will obviously always have PII), so we can consider this a theoretical exercise until it becomes an actual requirement. Therefore, I'll remove resolving this from the next milestone.

Whathecode added the "needs discussion" label Mar 2, 2024

Whathecode mentioned this issue Mar 2, 2024

Should DeviceRegistration include freeform specification data? #431

Closed

bardram assigned Whathecode Sep 3, 2024

bardram added this to the 1.3.0 milestone Sep 3, 2024

yuanchen233 mentioned this issue Sep 19, 2024

Endpoints to edit staged participant groups #319

Open

Whathecode removed this from the 1.3.0 milestone Oct 9, 2024

Whathecode mentioned this issue Oct 9, 2024

Include DeviceRegistration in DeviceDeploymentStatus #488

Open
