Import a dataset
High-quality datasets are the foundation of reliable AI evaluation. They represent the real-world scenarios your AI will encounter, providing consistent test cases that enable objective comparison across experiments and models.
Effective dataset import goes beyond just uploading files. It involves careful field mapping, metadata organization, quality validation, and strategic structuring that supports both immediate experimentation and long-term evaluation needs.
This guide walks you through the complete dataset import process, from preparing your data files to configuring field mappings and organizing metadata for maximum evaluation effectiveness.
Whether you're importing customer support conversations, code examples, creative writing prompts, or domain-specific queries, these practices ensure your datasets provide reliable foundations for AI quality assessment.
Dataset Preparation Best Practices
Before importing, organize your data to maximize its value for evaluation. Well-structured datasets with clear field definitions and comprehensive metadata enable more sophisticated analysis and comparison.
Focus on diversity, quality, and representativeness. Your dataset should cover the range of inputs your AI will encounter in production, including edge cases, common scenarios, and challenging examples that reveal model limitations.
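A quick way to gauge coverage before importing is to count how your examples distribute across key attributes. The sketch below is a minimal local check, assuming a CSV file with a category column like the example later in this guide; it uses only the Python standard library:
# Sketch: check category coverage in a CSV dataset before import
import csv
from collections import Counter

with open("dataset.csv", newline="", encoding="utf-8") as f:
    rows = list(csv.DictReader(f))

# Count examples per category and flag thin coverage
counts = Counter(row["category"] for row in rows)
for category, n in counts.most_common():
    share = n / len(rows)
    flag = "  <-- consider adding examples" if share < 0.05 else ""
    print(f"{category}: {n} ({share:.0%}){flag}")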


Supported File Formats
Evaligo supports the most common data formats used for AI evaluation, with intelligent parsing and validation to ensure your data imports correctly.
CSV Format
CSV files with header rows are ideal for tabular data with consistent field structures. Each row represents one test case, with columns for inputs, expected outputs, and metadata.
// Example CSV structure
input,expected_output,category,difficulty,language
"How do I reset my password?","Visit settings > security > reset password","account","easy","en"
"My payment failed, what should I do?","Check your payment method and try again...","billing","medium","en"
"Can you explain quantum computing?","Quantum computing uses quantum mechanics...","technical","hard","en"
JSONL Format
JSON Lines format provides flexibility for complex, nested data structures. Each line contains a complete JSON object representing one test case.
// Example JSONL structure
{"input": "How do I reset my password?", "expected": "Visit settings > security > reset password", "metadata": {"category": "account", "difficulty": "easy"}}
{"input": "My payment failed, what should I do?", "expected": "Check your payment method and try again", "metadata": {"category": "billing", "difficulty": "medium"}}
{"input": "Can you explain quantum computing?", "expected": "Quantum computing uses quantum mechanics", "metadata": {"category": "technical", "difficulty": "hard"}}
Field Mapping and Configuration
Proper field mapping ensures Evaligo understands your data structure and can use it effectively for experiments and evaluation. Take time to map fields correctly during import to avoid issues later.
1. Input Fields: Map the columns containing the prompts, questions, or inputs that will be sent to your AI model.
2. Expected Outputs: Identify reference answers, expected responses, or ground truth data for comparison.
3. Metadata Fields: Configure additional context like categories, difficulty levels, or business metrics.
4. Validation Rules: Set up data quality checks to ensure imported data meets your standards.
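Conceptually, the mapping ties each column or key in your file to one of these roles. The sketch below expresses that idea as a plain Python dict for illustration only; it is not Evaligo's actual configuration format, which you set up interactively during import:
# Hypothetical field mapping, for illustration only
mapping = {
    "input_fields": ["input"],             # sent to the model
    "expected_output": "expected_output",  # reference answer for comparison
    "metadata_fields": ["category", "difficulty", "language"],
    "validation": {
        "required": ["input"],             # reject rows missing an input
        "allowed_values": {"difficulty": ["easy", "medium", "hard"]},
    },
}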
Input Field Configuration
Input fields contain the prompts, questions, or data that will be processed by your AI model. These should be clean, well-formatted, and representative of real user interactions.
For complex inputs involving multiple fields (like system prompts + user queries), you can configure field concatenation or use template variables to combine multiple columns into the final input.
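For instance, if your file has separate system_prompt and user_query columns, a template can combine them into the final input. A minimal sketch of the idea using Python's string.Template; the exact template syntax Evaligo supports may differ:
# Sketch: combine multiple columns into one input via a template
from string import Template

template = Template("$system_prompt\n\nUser: $user_query")

row = {
    "system_prompt": "You are a helpful billing assistant.",
    "user_query": "My payment failed, what should I do?",
}

final_input = template.substitute(row)
print(final_input)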
Expected Output Configuration
Expected outputs provide reference points for evaluation. They can be exact answers (for factual queries), example responses (for creative tasks), or structured data (for classification or extraction tasks).
Not all datasets need expected outputs. For exploratory evaluation or creative tasks, you might rely entirely on LLM-based judges or human evaluation rather than reference comparisons.


Metadata Organization
Metadata enables sophisticated analysis by allowing you to segment results, filter experiments, and understand performance patterns across different categories or conditions.
Plan your metadata schema thoughtfully. Common metadata includes difficulty levels, content categories, user types, languages, business priority, and any domain-specific attributes relevant to your evaluation goals.
// Example comprehensive metadata schema
{
  "input": "How do I cancel my subscription?",
  "expected_output": "Visit account settings > billing > cancel subscription",
  "metadata": {
    // Categorical
    "category": "billing",
    "subcategory": "cancellation",
    "user_type": "premium",
    "difficulty": "easy",
    // Numerical
    "priority_score": 8,
    "input_length": 34,
    "complexity_rating": 2,
    // Contextual
    "language": "en",
    "locale": "US",
    "product_version": "v2.1",
    "date_created": "2024-01-15"
  }
}
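Once metadata like this is attached, segmentation becomes a simple group-by. The sketch below groups hypothetical per-example scores by category to surface weak areas; the results structure is illustrative, not an Evaligo export format:
# Sketch: segment evaluation scores by a metadata field
from collections import defaultdict
from statistics import mean

# Illustrative (category, score) pairs from an evaluation run
results = [("billing", 0.92), ("billing", 0.85),
           ("technical", 0.61), ("account", 0.88)]

by_category = defaultdict(list)
for category, score in results:
    by_category[category].append(score)

for category, scores in sorted(by_category.items()):
    print(f"{category}: mean {mean(scores):.2f} over {len(scores)} examples")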
Data Quality and Validation
Import validation helps catch data quality issues early, before they affect your experiments. Evaligo provides both automatic validation and configurable quality checks.
Common validation includes checking for missing required fields, validating data formats, detecting duplicates, and ensuring metadata values fall within expected ranges.
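You can also run the same checks locally before uploading. A minimal sketch covering the checks listed above, assuming the JSONL structure from earlier in this guide:
# Sketch: pre-import validation for a JSONL dataset
import json

REQUIRED = {"input", "expected"}
ALLOWED_DIFFICULTY = {"easy", "medium", "hard"}

seen_inputs, errors = set(), []
with open("dataset.jsonl", encoding="utf-8") as f:
    for lineno, line in enumerate(f, start=1):
        case = json.loads(line)  # raises on malformed JSON
        if missing := REQUIRED - case.keys():
            errors.append(f"line {lineno}: missing fields {sorted(missing)}")
        if case.get("input") in seen_inputs:
            errors.append(f"line {lineno}: duplicate input")
        seen_inputs.add(case.get("input"))
        difficulty = case.get("metadata", {}).get("difficulty")
        if difficulty is not None and difficulty not in ALLOWED_DIFFICULTY:
            errors.append(f"line {lineno}: unexpected difficulty {difficulty!r}")

print("\n".join(errors) or "all checks passed")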

Large Dataset Handling
For datasets with thousands of examples, Evaligo provides chunked processing that handles large files reliably without timeouts or memory issues.
Large datasets are processed asynchronously with progress tracking. You can continue working in Evaligo while your dataset processes, and you'll receive notifications when import completes.
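On your side, you can keep memory flat when preparing or post-processing very large files by streaming them in batches instead of loading everything at once. A minimal generator sketch:
# Sketch: stream a large JSONL file in fixed-size batches
import json

def iter_batches(path, batch_size=500):
    """Yield lists of parsed test cases without loading the whole file."""
    batch = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            batch.append(json.loads(line))
            if len(batch) == batch_size:
                yield batch
                batch = []
    if batch:  # final partial batch
        yield batch

for i, batch in enumerate(iter_batches("large_dataset.jsonl")):
    print(f"batch {i}: {len(batch)} examples")  # validate or upload per chunk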


Dataset Organization Strategies
As your evaluation needs grow, organize datasets strategically to support different types of analysis and experimentation.
Challenge Sets
Create small, focused datasets (10-20 examples) for rapid iteration. These should include your most challenging or representative examples for quick quality checks.
Comprehensive Sets
Maintain larger datasets (100-1000+ examples) for thorough evaluation before major releases or when making significant model or prompt changes.
Domain-Specific Sets
Organize datasets by domain, user type, or use case to enable targeted evaluation and performance analysis across different scenarios.
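These strategies compose well: a challenge set can often be carved out of a larger comprehensive set automatically, for example by stratified sampling so every category stays represented. A sketch, assuming the JSONL metadata schema shown earlier:
# Sketch: draw a small stratified challenge set from a larger dataset
import json
import random
from collections import defaultdict

with open("dataset.jsonl", encoding="utf-8") as f:
    cases = [json.loads(line) for line in f]

by_category = defaultdict(list)
for case in cases:
    by_category[case["metadata"]["category"]].append(case)

random.seed(42)  # reproducible sampling
challenge_set = []
for category, group in by_category.items():
    challenge_set.extend(random.sample(group, k=min(5, len(group))))

with open("challenge_set.jsonl", "w", encoding="utf-8") as f:
    for case in challenge_set:
        f.write(json.dumps(case, ensure_ascii=False) + "\n")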
Next Steps
With your datasets imported and properly configured, you're ready to run experiments that test different approaches against your real-world scenarios.