Delete sensitive data
Implement comprehensive data deletion procedures to protect user privacy and maintain regulatory compliance. Secure deletion processes ensure sensitive information is permanently removed while maintaining audit trails for legal requirements.
Data deletion is a critical component of privacy protection and regulatory compliance, particularly under regulations like GDPR, CCPA, and HIPAA. Effective deletion procedures balance thorough data removal with operational efficiency and audit requirements.
Modern AI systems generate extensive data trails including training data, evaluation results, and user interactions. Comprehensive deletion procedures must address all data repositories, backups, and derived datasets to ensure complete privacy protection.

Data Identification and Discovery
Systematically identify all instances of sensitive data across your AI evaluation infrastructure. Comprehensive discovery ensures no data remnants are overlooked during deletion procedures.
1. Search across data stores: Use identifiers, tags, and metadata to locate all instances of sensitive data.
2. Map data dependencies: Identify derived datasets, cached results, and system logs containing the data.
3. Check backup systems: Ensure backup and archive systems are included in deletion scope.
4. Validate discovery completeness: Use multiple search methods to confirm all data instances are identified.
import evaligo
from evaligo.privacy import DataDeletionManager
from typing import List, Dict, Set
import re


class SensitiveDataDiscovery:
    """Discovers all instances of sensitive data across the platform"""

    def __init__(self, client: evaligo.Client):
        self.client = client
        self.deletion_manager = DataDeletionManager(client)

    def discover_user_data(self, user_identifiers: List[str]) -> Dict:
        """Discover all data associated with specific user identifiers"""
        discovered_data = {
            'traces': [],
            'datasets': [],
            'experiments': [],
            'evaluations': [],
            'logs': [],
            'backups': []
        }
        for identifier in user_identifiers:
            # Search traces
            traces = self._search_traces_by_identifier(identifier)
            discovered_data['traces'].extend(traces)
            # Search datasets
            datasets = self._search_datasets_by_identifier(identifier)
            discovered_data['datasets'].extend(datasets)
            # Search experiments
            experiments = self._search_experiments_by_identifier(identifier)
            discovered_data['experiments'].extend(experiments)
            # Search evaluation results
            evaluations = self._search_evaluations_by_identifier(identifier)
            discovered_data['evaluations'].extend(evaluations)
            # Search system logs
            logs = self._search_logs_by_identifier(identifier)
            discovered_data['logs'].extend(logs)
            # Search backup systems
            backups = self._search_backups_by_identifier(identifier)
            discovered_data['backups'].extend(backups)
        # Remove duplicates and validate
        discovered_data = self._deduplicate_results(discovered_data)
        discovery_report = self._generate_discovery_report(discovered_data, user_identifiers)
        return {
            'discovered_data': discovered_data,
            'discovery_report': discovery_report,
            'total_items': sum(len(items) for items in discovered_data.values())
        }

    def _search_traces_by_identifier(self, identifier: str) -> List[Dict]:
        """Search traces containing the identifier"""
        search_queries = [
            f"user_id:{identifier}",
            f"email:{identifier}",
            f"session_id:{identifier}",
            f"metadata.user_identifier:{identifier}"
        ]
        found_traces = []
        for query in search_queries:
            traces = self.client.traces.search(
                query=query,
                include_metadata=True,
                include_spans=True
            )
            for trace in traces:
                # Confirm the match against span content before recording it
                if self._contains_identifier_in_content(trace, identifier):
                    found_traces.append({
                        'trace_id': trace.id,
                        'timestamp': trace.timestamp,
                        'project_id': trace.project_id,
                        'match_method': 'content_search',
                        'sensitive_spans': self._identify_sensitive_spans(trace, identifier)
                    })
        return found_traces

    def _contains_identifier_in_content(self, trace, identifier: str) -> bool:
        """Check if identifier appears in trace content"""
        for span in trace.spans:
            # Check inputs and outputs
            if identifier in str(span.input) or identifier in str(span.output):
                return True
            # Check metadata
            if span.metadata and identifier in str(span.metadata):
                return True
        return False

    def validate_discovery_completeness(self, identifiers: List[str]) -> Dict:
        """Validate that data discovery is complete using multiple methods"""
        validation_results = {
            'method_agreement': {},
            'potential_gaps': [],
            'confidence_score': 0.0
        }
        # Run discovery using different methods
        primary_results = self.discover_user_data(identifiers)
        # Cross-validation with alternative search methods
        secondary_results = self._alternative_discovery_method(identifiers)
        # Compare results
        agreement_score = self._calculate_method_agreement(
            primary_results, secondary_results
        )
        validation_results['method_agreement'] = agreement_score
        validation_results['confidence_score'] = min(agreement_score['overall'], 1.0)
        # Identify potential gaps
        if agreement_score['overall'] < 0.95:
            validation_results['potential_gaps'] = self._identify_discovery_gaps(
                primary_results, secondary_results
            )
        return validation_results

# Usage example
# `client` is assumed to be an already configured evaligo.Client instance
discovery = SensitiveDataDiscovery(client)

# Discover all data for a user deletion request
user_identifiers = [
    "user123@example.com",
    "session_abc123",
    "customer_id_456"
]
discovery_results = discovery.discover_user_data(user_identifiers)
print(f"Discovered {discovery_results['total_items']} data items")

# Validate discovery completeness
validation = discovery.validate_discovery_completeness(user_identifiers)
print(f"Discovery confidence: {validation['confidence_score']:.1%}")
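
The completeness check above compares two discovery passes but does not show how agreement is computed. One possible approach, sketched here under assumptions, is a Jaccard overlap of item IDs per category plus an overall score. The function names (`item_ids`, `method_agreement`, `discovery_gaps`) and the `"id"` key on each item are hypothetical and are not part of the evaligo SDK; the inputs are plain category-to-items dictionaries.

```python
from typing import Dict, List, Set

Results = Dict[str, List[dict]]


def item_ids(results: Results, id_key: str = "id") -> Dict[str, Set[str]]:
    """Collapse each category's items into a set of IDs (id_key is an assumption)."""
    return {category: {item[id_key] for item in items}
            for category, items in results.items()}


def method_agreement(primary: Results, secondary: Results) -> dict:
    """Jaccard overlap per category, plus an overall score across all items."""
    a, b = item_ids(primary), item_ids(secondary)
    per_category = {}
    shared_total = 0
    union_total = 0
    for category in sorted(set(a) | set(b)):
        ids_a, ids_b = a.get(category, set()), b.get(category, set())
        shared, union = ids_a & ids_b, ids_a | ids_b
        # Two empty result sets agree trivially
        per_category[category] = len(shared) / len(union) if union else 1.0
        shared_total += len(shared)
        union_total += len(union)
    overall = shared_total / union_total if union_total else 1.0
    return {"overall": overall, "per_category": per_category}


def discovery_gaps(primary: Results, secondary: Results) -> Dict[str, List[str]]:
    """IDs found by only one method; each one is a candidate for manual review."""
    a, b = item_ids(primary), item_ids(secondary)
    gaps = {}
    for category in sorted(set(a) | set(b)):
        only_one = a.get(category, set()) ^ b.get(category, set())
        if only_one:
            gaps[category] = sorted(only_one)
    return gaps


primary = {"traces": [{"id": "t1"}, {"id": "t2"}], "logs": [{"id": "l1"}]}
secondary = {"traces": [{"id": "t1"}, {"id": "t2"}, {"id": "t3"}], "logs": [{"id": "l1"}]}

report = method_agreement(primary, secondary)
print(report["overall"])                   # 0.75
print(discovery_gaps(primary, secondary))  # {'traces': ['t3']}
```

A score below the 0.95 threshold used above would then point directly at the items in `discovery_gaps` that one method missed.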
Secure Deletion Execution
Execute secure deletion procedures that permanently remove sensitive data while maintaining integrity of remaining systems. Implement verification mechanisms to ensure deletion completeness.
Irreversible Process: Deleted data cannot be recovered. Verify that each deletion request is legitimate and necessary before executing it, and consider offering a data export first where there is a legitimate business need.
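
As an illustration of deletion with verification, the sketch below refuses to run without explicit approval, deletes each item, and then re-reads the store to confirm the item is gone. It is a minimal sketch, not the platform's implementation: `DataStore`, `InMemoryStore`, `DeletionResult`, and `delete_and_verify` are hypothetical names, and the in-memory store stands in for whatever storage backends are actually in scope.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Protocol


class DataStore(Protocol):
    """Hypothetical minimal interface a store must expose for this sketch."""

    def delete(self, item_id: str) -> None: ...
    def exists(self, item_id: str) -> bool: ...


class InMemoryStore:
    """Stand-in store used only to make the example runnable."""

    def __init__(self, items: Dict[str, object]):
        self._items = dict(items)

    def delete(self, item_id: str) -> None:
        self._items.pop(item_id, None)

    def exists(self, item_id: str) -> bool:
        return item_id in self._items


@dataclass
class DeletionResult:
    deleted: List[str] = field(default_factory=list)
    failed: List[str] = field(default_factory=list)

    @property
    def complete(self) -> bool:
        return not self.failed


def delete_and_verify(stores: Dict[str, DataStore],
                      targets: Dict[str, List[str]],
                      confirmed: bool) -> DeletionResult:
    """Delete every target item, then verify each deletion by re-reading."""
    if not confirmed:
        # Deletion is irreversible, so require explicit approval up front
        raise PermissionError("Deletion request has not been approved")
    result = DeletionResult()
    for store_name, ids in targets.items():
        store = stores[store_name]
        for item_id in ids:
            store.delete(item_id)
            label = f"{store_name}/{item_id}"
            # Check the store itself rather than trusting the delete call
            if store.exists(item_id):
                result.failed.append(label)
            else:
                result.deleted.append(label)
    return result


stores = {"traces": InMemoryStore({"t1": "...", "t2": "..."})}
outcome = delete_and_verify(stores, {"traces": ["t1"]}, confirmed=True)
print(outcome.complete, outcome.deleted)
```

Items listed in `failed` would need a retry or manual follow-up before the request can be reported as complete.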

Audit Trail and Compliance Documentation
Maintain comprehensive audit trails of all deletion activities to demonstrate compliance with privacy regulations and organizational policies. Proper documentation protects against legal challenges and regulatory inquiries.

Retention Requirements: While user data is deleted, maintain audit logs of deletion activities as required by regulations. These logs should not contain the actual sensitive data that was deleted.
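
One way to keep an audit record without retaining the deleted values is to store a keyed hash of each identifier alongside counts and timestamps. The sketch below uses HMAC-SHA256 from the Python standard library; `pseudonymize`, `build_audit_record`, and the record fields are hypothetical names chosen for illustration. Whether a keyed hash is acceptable in a retained log is a policy and legal decision, so this shows only the mechanism.

```python
import hashlib
import hmac
import json
from datetime import datetime, timezone
from typing import Dict, List


def pseudonymize(identifier: str, secret_key: bytes) -> str:
    """Keyed hash of an identifier, so the log never stores the raw value."""
    return hmac.new(secret_key, identifier.encode("utf-8"), hashlib.sha256).hexdigest()


def build_audit_record(request_id: str,
                       identifiers: List[str],
                       deleted_counts: Dict[str, int],
                       secret_key: bytes,
                       regulation: str) -> dict:
    """Summarize a completed deletion without including the deleted data."""
    return {
        "request_id": request_id,
        "regulation": regulation,
        "completed_at": datetime.now(timezone.utc).isoformat(),
        # Only keyed hashes of the identifiers are retained
        "subject_refs": sorted(pseudonymize(i, secret_key) for i in identifiers),
        "deleted_counts": dict(deleted_counts),
        "total_deleted": sum(deleted_counts.values()),
    }


record = build_audit_record(
    request_id="req-001",                       # hypothetical request ID
    identifiers=["user123@example.com"],
    deleted_counts={"traces": 3, "logs": 2},
    secret_key=b"replace-with-a-managed-secret",
    regulation="GDPR",
)
print(json.dumps(record, indent=2))
```

Because the hash is keyed, someone holding only the log cannot test guessed identifiers against it without the secret, while the organization can still confirm that a given identifier was covered by a past request.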