Import a dataset
High-quality datasets are the foundation of reliable AI evaluation. They represent the real-world scenarios your AI will encounter, providing consistent test cases that enable objective comparison across experiments and models.
Effective dataset import goes beyond just uploading files. It involves careful field mapping, metadata organization, quality validation, and strategic structuring that supports both immediate experimentation and long-term evaluation needs.
This guide walks you through the complete dataset import process, from preparing your data files to configuring field mappings and organizing metadata for maximum evaluation effectiveness.
Whether you're importing customer support conversations, code examples, creative writing prompts, or domain-specific queries, these practices ensure your datasets provide reliable foundations for AI quality assessment.
Dataset Preparation Best Practices
Before importing, organize your data to maximize its value for evaluation. Well-structured datasets with clear field definitions and comprehensive metadata enable more sophisticated analysis and comparison.
Focus on diversity, quality, and representativeness. Your dataset should cover the range of inputs your AI will encounter in production, including edge cases, common scenarios, and challenging examples that reveal model limitations.
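A quick way to gauge coverage before importing is to count how your examples distribute across key attributes. The sketch below is a minimal local check, assuming a CSV file with a category column like the example later in this guide; it uses only the Python standard library:
# Sketch: check category coverage in a CSV dataset before import
import csv
from collections import Counter

with open("dataset.csv", newline="", encoding="utf-8") as f:
    rows = list(csv.DictReader(f))

# Count examples per category and flag thin coverage
counts = Counter(row["category"] for row in rows)
for category, n in counts.most_common():
    share = n / len(rows)
    flag = "  <-- consider adding examples" if share < 0.05 else ""
    print(f"{category}: {n} ({share:.0%}){flag}")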


Supported File Formats
Evaligo supports the most common data formats used for AI evaluation, with intelligent parsing and validation to ensure your data imports correctly.
CSV Format
CSV files with header rows are ideal for tabular data with consistent field structures. Each row represents one test case, with columns for inputs, expected outputs, and metadata.
// Example CSV structure
input,expected_output,category,difficulty,language
"How do I reset my password?","Visit settings > security > reset password","account","easy","en"
"My payment failed, what should I do?","Check your payment method and try again...","billing","medium","en"
"Can you explain quantum computing?","Quantum computing uses quantum mechanics...","technical","hard","en"
JSONL Format
JSON Lines format provides flexibility for complex, nested data structures. Each line contains a complete JSON object representing one test case.
// Example JSONL structure
{"input": "How do I reset my password?", "expected": "Visit settings > security > reset password", "metadata": {"category": "account", "difficulty": "easy"}}
{"input": "My payment failed, what should I do?", "expected": "Check your payment method and try again", "metadata": {"category": "billing", "difficulty": "medium"}}
{"input": "Can you explain quantum computing?", "expected": "Quantum computing uses quantum mechanics", "metadata": {"category": "technical", "difficulty": "hard"}}
Field Mapping and Configuration
Proper field mapping ensures Evaligo understands your data structure and can use it effectively for experiments and evaluation. Take time to map fields correctly during import to avoid issues later.
1. Input Fields: Map the columns containing the prompts, questions, or inputs that will be sent to your AI model.
2. Expected Outputs: Identify reference answers, expected responses, or ground truth data for comparison.
3. Metadata Fields: Configure additional context like categories, difficulty levels, or business metrics.
4. Validation Rules: Set up data quality checks to ensure imported data meets your standards.
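Conceptually, the mapping ties each column or key in your file to one of these roles. The sketch below expresses that idea as a plain Python dict for illustration only; it is not Evaligo's actual configuration format, which you set up interactively during import:
# Hypothetical field mapping, for illustration only
mapping = {
    "input_fields": ["input"],             # sent to the model
    "expected_output": "expected_output",  # reference answer for comparison
    "metadata_fields": ["category", "difficulty", "language"],
    "validation": {
        "required": ["input"],             # reject rows missing an input
        "allowed_values": {"difficulty": ["easy", "medium", "hard"]},
    },
}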
Input Field Configuration
Input fields contain the prompts, questions, or data that will be processed by your AI model. These should be clean, well-formatted, and representative of real user interactions.
For complex inputs involving multiple fields (like system prompts + user queries), you can configure field concatenation or use template variables to combine multiple columns into the final input.
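For instance, if your file has separate system_prompt and user_query columns, a template can combine them into the final input. A minimal sketch of the idea using Python's string.Template; the exact template syntax Evaligo supports may differ:
# Sketch: combine multiple columns into one input via a template
from string import Template

template = Template("$system_prompt\n\nUser: $user_query")

row = {
    "system_prompt": "You are a helpful billing assistant.",
    "user_query": "My payment failed, what should I do?",
}

final_input = template.substitute(row)
print(final_input)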
Expected Output Configuration
Expected outputs provide reference points for evaluation. They can be exact answers (for factual queries), example responses (for creative tasks), or structured data (for classification or extraction tasks).
Not all datasets need expected outputs. For exploratory evaluation or creative tasks, you might rely entirely on LLM-based judges or human evaluation rather than reference comparisons.


Metadata Organization
Metadata enables sophisticated analysis by allowing you to segment results, filter experiments, and understand performance patterns across different categories or conditions.
Plan your metadata schema thoughtfully. Common metadata includes difficulty levels, content categories, user types, languages, business priority, and any domain-specific attributes relevant to your evaluation goals.
// Example comprehensive metadata schema
{
  "input": "How do I cancel my subscription?",
  "expected_output": "Visit account settings > billing > cancel subscription",
  "metadata": {
    // Categorical
    "category": "billing",
    "subcategory": "cancellation",
    "user_type": "premium",
    "difficulty": "easy",
    // Numerical
    "priority_score": 8,
    "input_length": 34,
    "complexity_rating": 2,
    // Contextual
    "language": "en",
    "locale": "US",
    "product_version": "v2.1",
    "date_created": "2024-01-15"
  }
}
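Once metadata like this is attached, segmentation becomes a simple group-by. The sketch below groups hypothetical per-example scores by category to surface weak areas; the results structure is illustrative, not an Evaligo export format:
# Sketch: segment evaluation scores by a metadata field
from collections import defaultdict
from statistics import mean

# Illustrative (category, score) pairs from an evaluation run
results = [("billing", 0.92), ("billing", 0.85),
           ("technical", 0.61), ("account", 0.88)]

by_category = defaultdict(list)
for category, score in results:
    by_category[category].append(score)

for category, scores in sorted(by_category.items()):
    print(f"{category}: mean {mean(scores):.2f} over {len(scores)} examples")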
Data Quality and Validation
Import validation helps catch data quality issues early, before they affect your experiments. Evaligo provides both automatic validation and configurable quality checks.
Common validation includes checking for missing required fields, validating data formats, detecting duplicates, and ensuring metadata values fall within expected ranges.
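You can also run the same checks locally before uploading. A minimal sketch covering the checks listed above, assuming the JSONL structure from earlier in this guide:
# Sketch: pre-import validation for a JSONL dataset
import json

REQUIRED = {"input", "expected"}
ALLOWED_DIFFICULTY = {"easy", "medium", "hard"}

seen_inputs, errors = set(), []
with open("dataset.jsonl", encoding="utf-8") as f:
    for lineno, line in enumerate(f, start=1):
        case = json.loads(line)  # raises on malformed JSON
        if missing := REQUIRED - case.keys():
            errors.append(f"line {lineno}: missing fields {sorted(missing)}")
        if case.get("input") in seen_inputs:
            errors.append(f"line {lineno}: duplicate input")
        seen_inputs.add(case.get("input"))
        difficulty = case.get("metadata", {}).get("difficulty")
        if difficulty is not None and difficulty not in ALLOWED_DIFFICULTY:
            errors.append(f"line {lineno}: unexpected difficulty {difficulty!r}")

print("\n".join(errors) or "all checks passed")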

Large Dataset Handling
For datasets with thousands of examples, Evaligo provides chunked processing that handles large files reliably without timeouts or memory issues.
Large datasets are processed asynchronously with progress tracking. You can continue working in Evaligo while your dataset processes, and you'll receive notifications when import completes.
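On your side, you can keep memory flat when preparing or post-processing very large files by streaming them in batches instead of loading everything at once. A minimal generator sketch:
# Sketch: stream a large JSONL file in fixed-size batches
import json

def iter_batches(path, batch_size=500):
    """Yield lists of parsed test cases without loading the whole file."""
    batch = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            batch.append(json.loads(line))
            if len(batch) == batch_size:
                yield batch
                batch = []
    if batch:  # final partial batch
        yield batch

for i, batch in enumerate(iter_batches("large_dataset.jsonl")):
    print(f"batch {i}: {len(batch)} examples")  # validate or upload per chunk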


Dataset Organization Strategies
As your evaluation needs grow, organize datasets strategically to support different types of analysis and experimentation.
Challenge Sets
Create small, focused datasets (10-20 examples) for rapid iteration. These should include your most challenging or representative examples for quick quality checks.
Comprehensive Sets
Maintain larger datasets (100-1000+ examples) for thorough evaluation before major releases or when making significant model or prompt changes.
Domain-Specific Sets
Organize datasets by domain, user type, or use case to enable targeted evaluation and performance analysis across different scenarios.
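These strategies compose well: a challenge set can often be carved out of a larger comprehensive set automatically, for example by stratified sampling so every category stays represented. A sketch, assuming the JSONL metadata schema shown earlier:
# Sketch: draw a small stratified challenge set from a larger dataset
import json
import random
from collections import defaultdict

with open("dataset.jsonl", encoding="utf-8") as f:
    cases = [json.loads(line) for line in f]

by_category = defaultdict(list)
for case in cases:
    by_category[case["metadata"]["category"]].append(case)

random.seed(42)  # reproducible sampling
challenge_set = []
for category, group in by_category.items():
    challenge_set.extend(random.sample(group, k=min(5, len(group))))

with open("challenge_set.jsonl", "w", encoding="utf-8") as f:
    for case in challenge_set:
        f.write(json.dumps(case, ensure_ascii=False) + "\n")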
Next Steps
With your datasets imported and properly configured, you're ready to run experiments that test different approaches against your real-world scenarios.