Manage datasets
Effective dataset management is crucial for maintaining evaluation consistency as your AI system evolves. Learn how to version datasets, organize them with metadata, and curate specialized subsets for targeted testing.
Dataset management in Evaligo helps teams maintain consistency and traceability across experiments. As your prompts and models evolve, having well-organized datasets ensures that comparisons remain meaningful and improvements can be measured reliably.
Version control for datasets works similarly to code versioning: you can create snapshots at key milestones, track changes over time, and easily revert to previous versions when needed. This is especially important when collaborating with team members or when you need to reproduce specific experimental results.

Dataset Versioning
Create dataset versions to capture snapshots of your data at different points in time. Each version preserves the exact state of your dataset, including all rows, metadata, and configuration settings, ensuring reproducible experiments.
1. Create baseline version: Establish your initial dataset as version 1.0 with comprehensive test cases covering your core use cases.
2. Track incremental changes: Add new test cases or modify existing ones, creating versions 1.1, 1.2, and so on for minor updates.
3. Major version releases: Create version 2.0 when making significant structural changes or adding new evaluation dimensions.
4. Branch for experiments: Create experimental branches to test new data types without affecting the main dataset lineage.
Best Practice: Use semantic versioning (major.minor.patch) for your datasets. Increment the major version for breaking changes, minor for new test cases, and patch for corrections to existing data.
# Create a new version from the current dataset
# (`client` is an authenticated Evaligo SDK client)
dataset = client.datasets.get("customer-support-qa")
new_version = dataset.create_version(
    version="1.3.0",
    description="Added edge cases for multilingual support",
    changes_summary="15 new test cases in Spanish and French",
)

# Tag specific rows for easy identification
new_version.tag_rows(
    row_ids=[101, 102, 103],
    tags=["multilingual", "edge-case"],
)
Metadata and Organization
Use tags, labels, and custom metadata to organize your dataset rows into meaningful groups. This enables segmented evaluation where you can run experiments on specific subsets and analyze performance across different dimensions.
Metadata-driven organization helps teams understand dataset composition at a glance and enables powerful filtering and analysis capabilities. You can segment by user persona, product feature, difficulty level, or any custom dimension relevant to your domain.

1. Define tag taxonomy: Create consistent tagging conventions across your team (e.g., "difficulty:easy", "persona:enterprise-user").
2. Add contextual metadata: Include relevant context such as user locale, product tier, or business criticality for each test case.
3. Create filtered views: Save commonly used filter combinations as named views for quick access during experiments.
4. Bulk operations: Use bulk tagging and metadata updates to efficiently organize large datasets (see the sketch after the callout below).
Metadata Schema: Define a consistent metadata schema across datasets to enable cross-dataset analysis and standardized reporting.
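As a rough sketch of what bulk tagging and a saved view might look like with the SDK used above: the filter_rows, bulk_update_metadata, and save_view helpers are assumed names for illustration, so treat the pattern rather than the exact signatures as the takeaway.

# Hypothetical sketch: bulk-tag rows and save a reusable filtered view.
# filter_rows(), bulk_update_metadata(), and save_view() are assumed
# names, not confirmed Evaligo SDK API.
dataset = client.datasets.get("customer-support-qa")

# Tag every Spanish-locale row using a consistent taxonomy key
spanish_rows = dataset.filter_rows(metadata={"locale": "es"})
dataset.bulk_update_metadata(
    row_ids=[row.id for row in spanish_rows],
    tags=["locale:es", "persona:enterprise-user"],
)

# Save the filter combination as a named view for quick reuse
dataset.save_view(
    name="enterprise-spanish",
    filters={"tags": ["locale:es", "persona:enterprise-user"]},
)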
Challenge Sets and Curation
Curate specialized subsets of your dataset to focus on specific challenges or edge cases. Challenge sets help you identify and address model weaknesses before they impact production systems.
Effective curation involves identifying patterns in model failures and creating targeted test cases that expose these weaknesses. This proactive approach helps catch regressions early and guides focused improvement efforts.
# Identify challenging cases from earlier experiment results
# (`experiment` is a previously completed experiment run)
failed_cases = experiment.get_failures(
    min_confidence_threshold=0.7,
    evaluator_types=["groundedness", "relevance"],
)

# Create a challenge set from these failures
challenge_set = dataset.create_subset(
    name="hallucination-edge-cases",
    row_ids=[case.row_id for case in failed_cases],
    description="Cases where model showed hallucination patterns",
)

# Add the challenge set to the automated testing pipeline
challenge_set.enable_regression_testing(
    alert_threshold=0.85,  # Alert if success rate drops below 85%
    notification_channels=["#ai-alerts"],
)
Archive and Cleanup
Maintain a clean, organized workspace by archiving outdated datasets and removing obsolete versions. Pruning keeps dataset lists fast to browse and helps team members focus on relevant, current data.
Data Retention: Before archiving datasets, ensure you have proper backups and that no active experiments depend on the data you're removing.
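As a rough sketch of what a cleanup pass might look like, assuming the SDK exposes archive(), list_versions(), and delete_version() helpers (illustrative names, not confirmed API):

# Hypothetical sketch: archive a stale dataset and prune obsolete patch
# versions. archive(), list_versions(), and delete_version() are assumed
# names, not confirmed Evaligo SDK API. Confirm backups and check for
# active experiments before deleting anything (see the note above).
stale = client.datasets.get("legacy-support-qa")  # hypothetical dataset name
stale.archive(reason="Superseded by customer-support-qa v2.0")

# Keep major/minor milestones; drop superseded patch releases
dataset = client.datasets.get("customer-support-qa")
for version in dataset.list_versions():
    major, minor, patch = version.number.split(".")  # assumed version fields
    if patch != "0" and version.number != dataset.latest_version:
        dataset.delete_version(version.number)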
