Understanding Data Subject Requests (DSRs) and Implementing a Scalable Solution
A Data Subject Request (DSR) is a formal request made by an individual (the "data subject") to an organization, asking to exercise their rights under various privacy regulations. These regulations, such as the General Data Protection Regulation (GDPR) in the European Union and the California Consumer Privacy Act (CCPA) in the United States, grant individuals specific rights over their personal data.
Types of DSRs
Common types of DSRs include:
- Right to Access: Request to access personal data held by an organization.
- Right to Rectification: Request to correct inaccurate or incomplete personal data.
- Right to Erasure (Right to Be Forgotten): Request to delete personal data under certain conditions.
- Right to Data Portability: Request to transfer personal data to another service provider.
- Right to Restrict Processing: Request to limit how personal data is processed.
- Right to Object: Request to stop processing personal data for specific purposes, such as marketing.
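The request types above can be modelled as a small typed structure before any processing starts. The following is a minimal sketch, not a reference implementation: the names `DsrType`, `DsrRequest`, `parse_dsr_request` and the payload fields `dsr_id`, `subject_id` and `type` are assumptions made for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class DsrType(Enum):
    """The request types listed above (string values are illustrative)."""
    ACCESS = "access"
    RECTIFICATION = "rectification"
    ERASURE = "erasure"
    PORTABILITY = "portability"
    RESTRICT_PROCESSING = "restrict_processing"
    OBJECT = "object"


@dataclass(frozen=True)
class DsrRequest:
    dsr_id: str       # hypothetical unique identifier of the request
    subject_id: str   # hypothetical identifier of the data subject
    dsr_type: DsrType


def parse_dsr_request(payload: dict) -> DsrRequest:
    """Validate a raw payload and convert it into a typed request."""
    missing = [k for k in ("dsr_id", "subject_id", "type") if not payload.get(k)]
    if missing:
        raise ValueError(f"missing fields: {', '.join(missing)}")
    try:
        dsr_type = DsrType(payload["type"])
    except ValueError:
        raise ValueError(f"unsupported DSR type: {payload['type']!r}") from None
    return DsrRequest(payload["dsr_id"], payload["subject_id"], dsr_type)
```

Rejecting unknown types at the boundary keeps malformed requests out of the downstream services.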
Challenges in Handling DSRs in Distributed Systems
Organizations often face challenges in handling DSRs in systems built with multiple microservices. Some key challenges include:
- Data Distribution: Personal data is scattered across various microservices.
- Inter-Service Dependencies: Fulfilling a DSR may require a specific order of operations among services.
- Scalability: Ensuring the system can handle multiple DSRs simultaneously without bottlenecks.
Architecture
In most cases, support for DSRs is not a primary consideration when designing a microservices architecture, because it is not a core business value. Instead, it becomes a requirement that must be addressed once the system is ready to process user data.
In our case, the microservices architecture was already well established when we needed to add support for DSR. The proposed architecture leverages AWS Lambda for processing logic, Kinesis Streams for communication, and microservices for handling specific data operations, ensuring a scalable, modular, and auditable system for DSR handling.
Key Components
- API Gateway:
- Acts as the entry point for receiving DSR requests.
- It can expose an HTTP endpoint for your Lambda function (see the AWS documentation).
- Orchestrator Lambda:
- Validates the request and translates it into a task that can be consumed by task executors.
- Kinesis Stream for Task Propagation:
- Acts as a queue for tasks created by the Orchestrator Lambda.
- Ensures reliable and scalable delivery of tasks to the Task Executor Lambdas.
- Task Executor Lambdas:
- Multiple Task Executor Lambdas handle different sets of microservices based on the type of task or area of interest, e.g., User Services, Commerce, Content.
- Each Task Executor Lambda consumes tasks from the Kinesis Stream and calls the assigned microservices in the correct order, respecting dependencies.
- Microservices:
- Handle specific parts of the DSR, such as retrieving, deleting, or updating data.
- Each of the microservices emits events representing changes to the state which can be used to monitor the progress of DSR.
- Kinesis Events Streams:
- All our services were already emitting events for data changes (similar to CDC), which allowed us to use those events to monitor the progress of a DSR.
- Used for tracking the status of tasks and enabling asynchronous communication between components.
- Completion Lambda:
- Filters events emitted by microservices to monitor the status of the DSR tasks.
- Aggregates results and sends a final response.
- Notifier (SNS/SES):
- Sends notifications to the requester about the status or completion of the DSR.
- Any other notification mechanism can be used instead, yet this is the simplest solution to support application-to-person messaging.
- Database/Storage:
- Optionally stores audit logs and tracks the progress of DSRs.
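To make the hand-off between the Orchestrator Lambda and the task stream more concrete, here is a minimal sketch under stated assumptions. The `EXECUTOR_GROUPS` mapping, the task fields and the stream name passed by the caller are hypothetical. `kinesis_client` is expected to be a boto3 Kinesis client, and partitioning by DSR ID is an assumption rather than something the architecture prescribes.

```python
import json

# Hypothetical mapping: which executor groups take part in which DSR type.
EXECUTOR_GROUPS = {
    "erasure": ["user-services", "commerce", "content"],
    "access": ["user-services", "commerce"],
}


def build_tasks(request: dict) -> list[dict]:
    """Validate a DSR request and translate it into one task per executor group."""
    for field in ("dsr_id", "subject_id", "type"):
        if not request.get(field):
            raise ValueError(f"missing field: {field}")
    groups = EXECUTOR_GROUPS.get(request["type"])
    if groups is None:
        raise ValueError(f"unsupported DSR type: {request['type']}")
    return [
        {
            "dsr_id": request["dsr_id"],
            "subject_id": request["subject_id"],
            "type": request["type"],
            "group": group,  # lets each executor pick out its own tasks
        }
        for group in groups
    ]


def publish_tasks(kinesis_client, stream_name: str, tasks: list[dict]) -> None:
    """Send each task to the task stream as a separate record."""
    for task in tasks:
        kinesis_client.put_record(
            StreamName=stream_name,
            Data=json.dumps(task).encode("utf-8"),
            PartitionKey=task["dsr_id"],  # assumption: partition by DSR ID
        )
```

Passing the client in as a parameter keeps the validation logic testable without AWS credentials.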
Diagram of the Architecture
The diagram below looks complex, yet it merely represents the scale of challenges involved in implementing a DSR solution for a system with dozens of microservices. It leverages existing communication architecture using Kinesis streams to propagate events. It can be called a vertical architecture, where execution flow proceeds from top to bottom.
Workflow Description
- Request Initialization:
- The API Gateway receives the DSR request (e.g., a request to delete data for a specific user).
- The request is forwarded to the Orchestrator Lambda, which validates the request and creates tasks for relevant executors.
- Task Broadcasting:
- The Orchestrator Lambda sends tasks as events to a Kinesis Stream.
- Parallel Execution by Multiple Executors:
- Multiple Task Executor Lambdas consume events from the Kinesis Stream. Each Task Executor Lambda is responsible for a specific group of microservices.
- Within each Task Executor Lambda, tasks are executed sequentially if dependencies exist between microservices.
- If any of the microservices fails to process the request, the whole task is restarted from scratch, which means each microservice must handle repeated requests idempotently.
- Status Tracking:
- Each microservice sends an event of data change back to a Kinesis Stream. This is a part of normal service behaviour not related to handling DSR.
- The Completion Lambda monitors the stream and aggregates results, but filtering the appropriate events can be challenging. One option is to introduce a dedicated field that specifies the reason for deletion, which the Lambda can use to process only the relevant events.
- Final Response:
- Once all tasks are completed, the Completion Lambda updates the status in the database and triggers a notification via SNS/SES.
- The final response is sent to the requester via the API Gateway.
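The executor and filtering steps of this workflow can be sketched as follows. This is an illustration under assumptions, not the production code: the `DEPENDENCIES` map, the `group` and `reason` fields, the `"dsr"` reason value and the `call_service` callback are hypothetical. The event shape (`Records[].kinesis.data`, base64-encoded) is the one Lambda uses for Kinesis triggers.

```python
import base64
import json
from graphlib import TopologicalSorter

# Hypothetical dependency map for one executor group:
# each service maps to the services that must finish before it.
DEPENDENCIES = {
    "payments": set(),
    "orders": {"payments"},
    "profiles": {"orders"},
}


def execution_order(dependencies: dict) -> list:
    """Return the services in an order that respects their dependencies."""
    return list(TopologicalSorter(dependencies).static_order())


def decode_tasks(event: dict, group: str) -> list:
    """Decode Kinesis records from a Lambda event and keep this group's tasks."""
    tasks = []
    for record in event.get("Records", []):
        task = json.loads(base64.b64decode(record["kinesis"]["data"]))
        if task.get("group") == group:
            tasks.append(task)
    return tasks


def run_task(task: dict, call_service, max_attempts: int = 3) -> None:
    """Call every service in order; on any failure restart the whole task.

    Restarting from the first service is only safe if each service
    treats a repeated request idempotently.
    """
    order = execution_order(DEPENDENCIES)
    for attempt in range(1, max_attempts + 1):
        try:
            for service in order:
                call_service(service, task)
            return
        except Exception:
            if attempt == max_attempts:
                raise


def is_dsr_event(change_event: dict) -> bool:
    """Completion-side filter: keep only data changes caused by a DSR.

    The `reason` field and its "dsr" value are hypothetical.
    """
    return change_event.get("reason") == "dsr"
```

Using a topological sort means the order is derived from the declared dependencies rather than hard-coded, so adding a service only requires a new entry in the map.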
Alternative approach
A similar alternative is possible, especially if your microservices architecture has no existing Kinesis streams for propagating events. In that case, the Executor Lambdas can publish the results for each group of microservices to a dedicated Kinesis stream. We can call this a horizontal architecture.
The workflow description is very similar to the previous one, except the progress of executions is tracked using one dedicated Kinesis stream instead of collecting data from many streams.
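In the horizontal variant, each Executor Lambda reports the outcome of its group itself. A minimal sketch of such a report is shown below. The event fields, the `"dsr-status"` stream name and the function names are hypothetical, and `kinesis_client` is assumed to be a boto3 Kinesis client.

```python
import json
from datetime import datetime, timezone


def build_group_result(dsr_id: str, group: str, service_results: dict) -> dict:
    """Summarise one executor group's outcome as a single status event.

    `service_results` maps a service name to True (succeeded) or False (failed).
    """
    failed = sorted(name for name, ok in service_results.items() if not ok)
    return {
        "dsr_id": dsr_id,
        "task_name": group,
        "status": "failed" if failed else "completed",
        "failed_services": failed,
        "timestamp": datetime.now(timezone.utc).isoformat(),
    }


def publish_result(kinesis_client, result: dict, stream_name: str = "dsr-status") -> None:
    """Write the group result to the dedicated status stream."""
    kinesis_client.put_record(
        StreamName=stream_name,
        Data=json.dumps(result).encode("utf-8"),
        PartitionKey=result["dsr_id"],
    )
```

Because records with the same partition key land on the same shard, keying by DSR ID keeps the updates for one request in order for the consumer.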
Tracking DSR Progress
Key Principles
Tracking the progress of Data Subject Requests (DSRs) is crucial for maintaining visibility, ensuring compliance with deadlines, and providing status updates to requesters. Each step in the DSR process must emit a status update indicating its progress, recorded in a centralized and queryable location.
Components for Progress Tracking
- Kinesis Streams for Status Updates:
- Each microservice sends status updates (e.g., in-progress, completed, failed) to a dedicated Kinesis Stream.
- Updates include metadata such as the DSR ID, task name, timestamp, and current status.
- Completion Lambda for Aggregation:
- Monitors the status stream and aggregates statuses for the DSR.
- Tracks which tasks are pending, completed, or failed.
- Updates the centralized database with real-time progress.
- Centralized Database (DynamoDB/RDS):
- Stores the progress and current state of each DSR.
- Schema Example:
- DSR ID: Unique identifier for the DSR.
- Tasks: List of all tasks and their statuses.
- Overall Status: Pending, In-Progress, or Completed.
- Last Updated: Timestamp of the latest update.
- Allows querying and auditing of the DSR process.
- Notification Service (SNS/SES):
- Sends updates to requesters if enabled, such as:
- When the DSR begins processing.
- If there are delays or issues.
- Upon completion.
- Dashboard (Optional):
- A web-based or console interface for administrators to display the progress of all active DSRs.
- Provides search and filtering capabilities.
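The schema listed under the centralized database can be expressed as a pair of small data classes. This is only a sketch: the class names, the snake_case attribute names and the `to_item` helper are assumptions, and the actual table layout depends on whether DynamoDB or RDS is used.

```python
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone


def _utc_now() -> str:
    """Current time as an ISO 8601 string in UTC."""
    return datetime.now(timezone.utc).isoformat()


@dataclass
class TaskProgress:
    task_name: str
    status: str = "Pending"  # task-level status, e.g. "Pending" or "Completed"


@dataclass
class DsrProgress:
    dsr_id: str                                 # unique identifier for the DSR
    tasks: list = field(default_factory=list)   # list of TaskProgress entries
    overall_status: str = "Pending"             # Pending, In-Progress or Completed
    last_updated: str = field(default_factory=_utc_now)

    def to_item(self) -> dict:
        """Convert to a plain dict, e.g. as input for a database write."""
        return asdict(self)
```

A usage example: `DsrProgress("12345", [TaskProgress("Delete Data from Service A")]).to_item()` yields a nested dictionary that can be serialized to JSON or stored as a single item.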
Workflow for Progress Tracking
- Microservices Emit Status Updates:
- Each microservice processes its task and sends a status update to the Kinesis Data Stream upon completion, failure, or intermediate progress.
- Data Stream Feeds Completion Lambda:
- The Completion Lambda consumes updates from the Data Stream and updates the Centralized Database.
- It determines if all tasks are completed or if errors need to be escalated.
- Real-Time Updates in Database:
- The centralized database is updated with task-level progress for the DSR.
- Example Record:
{
  "DSR_ID": "12345",
  "Tasks": [
    {"TaskName": "Delete Data from Service A", "Status": "Completed"},
    {"TaskName": "Delete Data from Service B", "Status": "In-Progress"}
  ],
  "OverallStatus": "In-Progress",
  "LastUpdated": "2024-12-23T10:15:30Z"
}
- Notifications and Dashboard:
- Notifications are triggered based on specific conditions (e.g., DSR completion or delays).
- Administrators use a dashboard to view DSR progress and intervene if necessary.
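The aggregation step performed by the Completion Lambda can be sketched as below. It operates on a record with the same keys as the example record (`Tasks`, `TaskName`, `Status`, `OverallStatus`, `LastUpdated`). The shape of the incoming update and the rule that a failed task keeps the DSR in progress while flagging it for escalation are assumptions made for this illustration.

```python
from datetime import datetime, timezone


def overall_status(tasks: list) -> str:
    """Derive the DSR-level status from the task-level statuses."""
    statuses = {t["Status"] for t in tasks}
    if statuses == {"Completed"}:
        return "Completed"
    if statuses <= {"Pending"}:  # no tasks yet, or nothing has started
        return "Pending"
    return "In-Progress"


def apply_update(record: dict, update: dict) -> bool:
    """Apply one status update to a DSR record in place.

    Returns True when the record contains a failed task that should be
    escalated (assumption: failures do not change the overall status).
    """
    for task in record["Tasks"]:
        if task["TaskName"] == update["TaskName"]:
            task["Status"] = update["Status"]
            break
    else:
        # First update for this task: add it to the record.
        record["Tasks"].append(
            {"TaskName": update["TaskName"], "Status": update["Status"]}
        )
    record["OverallStatus"] = overall_status(record["Tasks"])
    record["LastUpdated"] = update.get("Timestamp") or datetime.now(
        timezone.utc
    ).isoformat()
    return any(t["Status"] == "Failed" for t in record["Tasks"])
```

Keeping the function free of database calls means the same logic can be reused regardless of whether the record lives in DynamoDB or RDS.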
Best Practices for Progress Tracking
- Consistent Metadata: Standardize the format of status updates to ensure uniform tracking.
- Retry Logic: Implement retry mechanisms for failed tasks and log errors for auditing.
- Audit Logs: Maintain a history of all status updates for compliance and debugging.
- Timeouts and Alerts: Set timeouts for tasks and trigger alerts if deadlines are not met.
- Privacy and Security: Ensure that all tracking data is secure and adheres to privacy regulations.
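Two of the practices above, retry logic and timeouts, can be illustrated with a short sketch. The function names, the default attempt count and the backoff schedule are assumptions chosen for the example, not values taken from the system described here.

```python
import logging
import time
from datetime import datetime, timedelta, timezone

logger = logging.getLogger("dsr")


def call_with_retry(operation, max_attempts=3, base_delay=0.5, sleep=time.sleep):
    """Run `operation`, retrying with exponential backoff.

    Every failure is logged so that it can later be audited; the last
    failure is re-raised to the caller.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except Exception as exc:
            logger.warning("attempt %d/%d failed: %s", attempt, max_attempts, exc)
            if attempt == max_attempts:
                raise
            sleep(base_delay * 2 ** (attempt - 1))


def is_overdue(started_at: datetime, timeout: timedelta, now: datetime = None) -> bool:
    """Return True when a task has been running longer than its timeout."""
    now = now or datetime.now(timezone.utc)
    return now - started_at > timeout
```

The `sleep` parameter exists so that the backoff can be replaced in tests. In a Lambda context the total wait also has to fit within the function's configured timeout.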
Advantages of the Mechanism
- Real-Time Visibility: Administrators and requesters can see the progress of DSRs at any time.
- Error Handling: Centralized tracking makes it easier to identify and resolve issues.
- Compliance Assurance: Ensures deadlines are met and creates a clear audit trail for regulatory purposes.
- Scalability: The system can handle multiple DSRs simultaneously without bottlenecks.
Advantages of the Architecture
- Scalability:
- The use of AWS Lambda ensures that the system scales automatically to handle multiple DSRs.
- Modularity:
- Microservices handle specific tasks, making the system easier to maintain and extend.
- Dependency Management:
- The Task Executor Lambdas ensure tasks are executed in the correct order, even when there are inter-service dependencies.
- Auditability:
- Logs and status tracking provide a clear audit trail, ensuring compliance with privacy regulations.
- Asynchronous Communication:
- Kinesis Streams allow components to communicate asynchronously, reducing the risk of bottlenecks.
Conclusion
The proposed architectures provide a scalable and efficient way to handle Data Subject Requests in a distributed system with multiple microservices. Leveraging AWS services such as Lambda and Kinesis Streams supports compliance with privacy regulations while maintaining high performance and reliability. Real-time progress tracking enhances visibility and helps organizations meet regulatory deadlines.
Reviewed by: Paweł Stawicki