
Personally identifiable information (PII) in the deployment subsystem #464

Open
Whathecode opened this issue Mar 2, 2024 · 1 comment
Labels: needs discussion (This cannot yet be implemented since it requires further conversation.)

Comments

@Whathecode
Member

Whathecode commented Mar 2, 2024

Overall, the deployments subsystem does a good job of keeping PII out. The link to an actual account, which allows for re-identification, happens in the study subsystem. A potential aim could be to claim that the deployments subsystem does not store PII, so that its hosting could be outsourced to a third party with fewer legally binding requirements.

However, there is data stored in the deployment subsystem which can carry PII. Concretely:

  • The data requested from participants through the study protocol's "expected participant data" is stored in the deployment subsystem (ParticipantData). By design, this data is meant not to be PII (when considering all potential users of the platform). However, a researcher may request PII, or a combination of requested data may inadvertently make re-identification possible.
  • Device registrations, and their full history over the course of a study, are stored in study deployments. They serve at least two purposes: (1) they contain the details the client needs to connect to the device, and (2) they store device specifications which may be relevant to the researcher when interpreting collected data. The former is the most problematic in the context of PII. For example, a MACAddressDeviceRegistration is used to store connection information for BLE devices. This is useful, as researchers can pre-register devices before lending them to study participants. However, when participants use their own devices, the registration is uniquely linked to a specific participant.
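One way to act on the ParticipantData concern is to review a protocol's expected participant data before it is deployed, flagging both direct identifiers and combinations of attributes that jointly single out people. The sketch below is a hypothetical illustration, not something CARP core currently does: the attribute names, `review_expected_participant_data`, and `PiiWarning` are all made up for this example. Birth date, sex, and postal code are used as a well-known example of a quasi-identifier combination.

```python
from dataclasses import dataclass

# Hypothetical attribute identifiers; these do not correspond to CARP's
# actual ParticipantData attribute types.
DIRECT_IDENTIFIERS = frozenset({"full_name", "email", "phone_number", "home_address"})

# Attributes that are harmless on their own but can jointly single out
# individuals (quasi-identifiers). Illustrative only, not exhaustive.
QUASI_IDENTIFIER_SETS = (
    frozenset({"birth_date", "sex", "postal_code"}),
)


@dataclass(frozen=True)
class PiiWarning:
    attributes: frozenset[str]
    reason: str


def review_expected_participant_data(requested: set[str]) -> list[PiiWarning]:
    """Flag requested participant data that is, or jointly forms, PII."""
    warnings: list[PiiWarning] = []
    # Any direct identifier is PII on its own.
    for attribute in sorted(requested & DIRECT_IDENTIFIERS):
        warnings.append(PiiWarning(frozenset({attribute}), "direct identifier"))
    # A quasi-identifier set is only flagged when every member is requested.
    for combination in QUASI_IDENTIFIER_SETS:
        if combination <= requested:
            warnings.append(PiiWarning(combination, "quasi-identifier combination"))
    return warnings


if __name__ == "__main__":
    for w in review_expected_participant_data({"sex", "birth_date", "postal_code", "email"}):
        print(w.reason, sorted(w.attributes))
```

A static list like this can only catch known patterns; whether a combination is identifying also depends on who else uses the platform, as noted above.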

The latter concern (DeviceRegistration) may be less serious when considering the scope of people an attacker would need to cover during re-identification. It may be easy to get the MAC address of a specific individual you are targeting (simple BLE scanning), but it's not trivial to scan every potential person who may have data in the deployments subsystem. More risky could be when DeviceRegistration would be used to connect to third-party services, such as Google Fit, to retrieve sensor data.

This may make the idea of hosting the studies and deployments subsystems by separate organizations, without a legally binding contract to address these issues, hard to achieve unless further design or infrastructure work is done, e.g., encrypting DeviceRegistration and ParticipantData.

Whathecode added the "needs discussion" label (This cannot yet be implemented since it requires further conversation.) on Mar 2, 2024
bardram added this to the 1.3.0 milestone on Sep 3, 2024
@Whathecode
Member Author

Whathecode commented Oct 9, 2024

"More risky could be when DeviceRegistration would be used to connect to third-party services, such as Google Fit, to retrieve sensor data."

From this, I currently conclude that the device registration for such a device shouldn't contain the account's username. Instead, it should rely on a UUID (so probably just DefaultDeviceRegistration), with the linking of that ID to the authentication setup handled in the application/infrastructure layer (outside of core). Alternatively, it could store a token.
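The UUID-based approach can be sketched as follows: the deployments subsystem only ever sees an opaque, randomly generated ID, while a separate application-layer store resolves that ID to whatever is needed to authenticate with the third-party service. This is a hypothetical illustration; `OpaqueDeviceRegistration` and `ThirdPartyAccountLinks` are made-up names standing in for something like DefaultDeviceRegistration and an infrastructure-layer component, not CARP APIs.

```python
import uuid
from dataclasses import dataclass, field


@dataclass(frozen=True)
class OpaqueDeviceRegistration:
    """What the deployments subsystem would store: an opaque ID only.

    Hypothetical stand-in for a UUID-based registration; no account
    username or other account detail is included.
    """
    device_id: str


@dataclass
class ThirdPartyAccountLinks:
    """Hypothetical application-layer store (outside of core) mapping
    registration IDs to third-party authentication material."""
    _credentials_by_device_id: dict[str, str] = field(default_factory=dict)

    def register(self, account_credentials: str) -> OpaqueDeviceRegistration:
        # A random UUID carries no information about the linked account.
        device_id = str(uuid.uuid4())
        self._credentials_by_device_id[device_id] = account_credentials
        return OpaqueDeviceRegistration(device_id)

    def credentials_for(self, registration: OpaqueDeviceRegistration) -> str:
        # Only the application layer can resolve the opaque ID to the account.
        return self._credentials_by_device_id[registration.device_id]


if __name__ == "__main__":
    links = ThirdPartyAccountLinks()
    registration = links.register("refresh-token-for-participant")
    print(registration.device_id)  # this is all the deployments subsystem sees
```

Storing a token in the registration instead would avoid the separate store, but the token would then itself be data held by the deployment subsystem, so its scope and lifetime would need the same scrutiny.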

Either way, the general point is that care needs to be taken when designing new DeviceRegistration types to adhere to the data minimization principle, and to minimize the risk of directly identifying individuals from any data stored in the deployment subsystem.

But, @bardram, I believe the following conclusion is still spot on:

"This may make the idea of hosting the studies and deployments subsystems by separate organizations, without a legally binding contract to address these issues, hard to achieve unless further design or infrastructure work is done, e.g., encrypting DeviceRegistration and ParticipantData."

It seems likely enough that some data stored in the deployment subsystem would be classified as PII, so regulations like the GDPR apply. Unless the application layer fully handles encryption of that data, this would make the deployments subsystem in CARP core a data processor.

However, none of this is a concern for the current release, and this subsystem isn't deployed separately yet anyway (other subsystems, like studies, obviously will always contain PII). So we can consider this a theoretical exercise until it becomes an actual requirement. Therefore, I'll remove this issue from the next milestone.
