Dataset nodes let you integrate your curated data with AI workflows. Use Dataset Source to read inputs and Dataset Sink to save results.

Dataset Source Node

The Dataset Source node reads rows from a selected dataset and provides them as inputs to your flow.

Configuration

  • Select dataset: Choose from your workspace datasets
  • Select samples: Pick specific rows to process
  • Add inline samples: Create test data directly in the node

Output Variables

Dataset Source exposes an output for every field in your dataset schema:

Dataset Fields: name, description, category, price

Available Outputs:
  - out (entire row object)
  - out.name
  - out.description
  - out.category
  - out.price
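
Conceptually, each row behaves like a flat record keyed by the dataset fields, and the field-level outputs simply select one key. A minimal sketch in Python with hypothetical values:

  row = {
      "name": "Trail Runner X",
      "description": "Lightweight shoe for rough terrain",
      "category": "footwear",
      "price": 129.0,
  }

  out = row                    # out: the entire row object
  product_name = out["name"]   # out.name
  base_price = out["price"]    # out.price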

This allows you to map specific fields to different downstream nodes:

Dataset Source:
  out.name → Prompt A.productName
  out.description → Prompt B.content
  out.price → Calculator.basePrice

Tip
Use field-specific outputs for clearer, more maintainable flows. Avoid mapping the entire out object unless necessary.

Dataset Sink Node

The Dataset Sink node writes processed results back to a dataset for storage and analysis.

Configuration

  • Select target dataset: Choose or create a dataset to write to
  • Map input fields: Configure which data goes into which dataset column
  • Auto-create rows: Automatically add a new row for each processed item

Input Mapping

Dataset Sink has an input handle for each field in the target dataset schema:

Target Dataset Fields: input_text, generated_summary, sentiment

Map Inputs:
  input_text ← _input.originalText
  generated_summary ← out.summary
  sentiment ← out.sentiment
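
With this mapping, each processed item becomes one new row in the target dataset. A sketch of the resulting row, using hypothetical upstream values:

  sink_row = {
      "input_text": "The battery lasts two full days.",   # from _input.originalText
      "generated_summary": "Long battery life.",          # from out.summary
      "sentiment": "positive",                            # from out.sentiment
  }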

Common Patterns

Read-Process-Write

The most common pattern: read from a source dataset, process with prompts, write results to a sink dataset.

Dataset Source (products)
  → Prompt (generate descriptions)
  → Dataset Sink (generated_content)
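
Outside the flow editor, the same shape reduces to a simple loop. A minimal sketch, with generate_description standing in for the Prompt node (hypothetical data and helper):

  def generate_description(row):
      # Stand-in for the Prompt node; a real flow calls the model here.
      return f"{row['name']} is a {row['category']} item priced at ${row['price']:.2f}."

  products = [                                   # Dataset Source (products)
      {"name": "Trail Runner X", "category": "footwear", "price": 129.0},
      {"name": "Summit Pack 30", "category": "bags", "price": 89.0},
  ]

  generated_content = []                         # Dataset Sink (generated_content)
  for row in products:
      generated_content.append({
          "name": row["name"],
          "description": generate_description(row),
      })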

Test with Samples

Select a small sample of rows from your dataset to test and refine your flow before running on the full dataset.
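
In code terms this is just slicing or sampling a handful of rows before committing to a full run; reusing the hypothetical products list from the sketch above:

  import random

  sample = random.sample(products, k=min(10, len(products)))   # small, varied test set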

Multiple Sources

Use multiple Dataset Source nodes to combine data from different datasets in a single flow.
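
For example, one source might supply products keyed by id while another supplies reviews that reference those ids, joined before prompting. A sketch with hypothetical field names:

  products_by_id = {"p1": {"name": "Trail Runner X"}}                  # Dataset Source A
  reviews = [{"product_id": "p1", "text": "Great grip on wet rock."}]  # Dataset Source B

  combined = [
      {"product": products_by_id[r["product_id"]], "review": r["text"]}
      for r in reviews
      if r["product_id"] in products_by_id
  ]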

Parallel Sinks

Write different outputs to multiple Dataset Sink nodes for organized result storage.
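
For instance, clean summaries and failed items could be routed to separate sinks. A sketch, assuming a hypothetical results list where failures carry an error field:

  results = [
      {"summary": "Long battery life.", "error": None},
      {"summary": None, "error": "model timeout"},
  ]

  summaries = [r for r in results if not r["error"]]   # Dataset Sink: summaries
  failures = [r for r in results if r["error"]]        # Dataset Sink: failures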

Warning
Dataset Sink appends rows by default. If you need to update existing rows, use the API directly or manually manage the dataset.
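
The difference in behaviour, illustrated with plain Python lists (the real dataset API will differ):

  rows = [{"id": 1, "sentiment": "neutral"}]
  new_row = {"id": 1, "sentiment": "positive"}

  # Append (default Dataset Sink behaviour): each run adds another row.
  appended = rows + [new_row]

  # Update (not done by the sink node): replace the matching row instead.
  updated = [new_row if r["id"] == new_row["id"] else r for r in rows]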

Working with Arrays

When your dataset contains array fields, use Array Splitter to process items individually:

Dataset Source (out.urls is an array)
  → Array Splitter (split urls)
  → Prompt (process each URL)
  → Array Flatten
  → Dataset Sink (save results)
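
The same split-process-flatten shape in plain Python, with fetch_title standing in for the Prompt node (hypothetical helper and data):

  def fetch_title(url):
      # Stand-in for the Prompt node that processes a single URL.
      return f"Title for {url}"

  row = {"id": "r1", "urls": ["https://example.com/a", "https://example.com/b"]}

  titles = [fetch_title(u) for u in row["urls"]]   # Array Splitter + per-item Prompt
  result = {"id": row["id"], "titles": titles}     # Array Flatten: recombine into one row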

Integration with Experiments

Dataset nodes integrate seamlessly with your prompt engineering workflow:

  • Use the same datasets for prompt evaluation and flow execution
  • Test prompts in the Playground before using them in flows
  • Write flow results to evaluation datasets for quality analysis
  • Compare flow outputs against expected results

Best Practices

Schema Design

  • Keep field names clear and consistent
  • Use descriptive names that explain the data
  • Avoid overly nested structures
  • Document any special field formats or constraints
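
A small example of a schema that follows these guidelines (field names and types are illustrative):

  schema = {
      "product_name": "string",     # descriptive, no abbreviations
      "review_text": "string",
      "sentiment": "string",        # constrained to: positive | neutral | negative
      "review_date": "string",      # ISO 8601 date, e.g. 2024-05-01
  }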

Sample Selection

  • Start with 5-10 samples for initial testing
  • Include edge cases and diverse examples
  • Gradually increase sample size as you refine
  • Use the full dataset only when confident in your flow

Result Storage

  • Create separate sink datasets for different flow runs
  • Include metadata fields (timestamp, flow version, etc.)
  • Store both inputs and outputs for debugging
  • Keep intermediate results for multi-stage flows
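
An example result row that keeps inputs, outputs, and run metadata together for debugging (field names are illustrative):

  result_row = {
      "input_text": "The battery lasts two full days.",   # original input, kept for debugging
      "generated_summary": "Long battery life.",          # flow output
      "flow_version": "v3",                               # which flow produced this row
      "run_timestamp": "2024-05-01T12:00:00Z",            # when it ran
  }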

Related Documentation

  • Import a Dataset: Learn how to upload data
  • Variable Mapping: Map data between nodes
  • Using Datasets in Flows: Best practices guide