Dataset nodes let you integrate your curated data with AI workflows. Use Dataset Source to read inputs and Dataset Sink to save results.

Dataset Source Node

The Dataset Source node reads rows from a selected dataset and provides them as inputs to your flow.

Configuration

  • Select dataset: Choose from your workspace datasets
  • Select samples: Pick specific rows to process
  • Add inline samples: Create test data directly in the node

Output Variables

Dataset Source exposes an output for every field in your dataset schema:

Dataset Fields: name, description, category, price

Available Outputs:
  - out (entire row object)
  - out.name
  - out.description
  - out.category
  - out.price
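
Conceptually, each row behaves like a flat record keyed by the dataset fields, and the field-level outputs simply select one key. A minimal sketch in Python with hypothetical values:

  row = {
      "name": "Trail Runner X",
      "description": "Lightweight shoe for rough terrain",
      "category": "footwear",
      "price": 129.0,
  }

  out = row                    # out: the entire row object
  product_name = out["name"]   # out.name
  base_price = out["price"]    # out.price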

This allows you to map specific fields to different downstream nodes:

Dataset Source:
  out.name → Prompt A.productName
  out.description → Prompt B.content
  out.price → Calculator.basePrice

Tip
Use field-specific outputs for clearer, more maintainable flows. Avoid mapping the entire out object unless necessary.

Dataset Sink Node

The Dataset Sink node writes processed results back to a dataset for storage and analysis.

Configuration

  • Select target dataset: Choose or create a dataset to write to
  • Map input fields: Configure which data goes into which dataset column
  • Auto-create rows: Automatically add a new row for each processed item

Input Mapping

Dataset Sink has an input handle for each field in the target dataset schema:

Target Dataset Fields: input_text, generated_summary, sentiment

Map Inputs:
  input_text ← _input.originalText
  generated_summary ← out.summary
  sentiment ← out.sentiment
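
With this mapping, each processed item becomes one new row in the target dataset. A sketch of the resulting row, using hypothetical upstream values:

  sink_row = {
      "input_text": "The battery lasts two full days.",   # from _input.originalText
      "generated_summary": "Long battery life.",          # from out.summary
      "sentiment": "positive",                            # from out.sentiment
  }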

Common Patterns

Read-Process-Write

The most common pattern: read from a source dataset, process with prompts, write results to a sink dataset.

Dataset Source (products)
  → Prompt (generate descriptions)
  → Dataset Sink (generated_content)
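
Outside the flow editor, the same shape reduces to a simple loop. A minimal sketch, with generate_description standing in for the Prompt node (hypothetical data and helper):

  def generate_description(row):
      # Stand-in for the Prompt node; a real flow calls the model here.
      return f"{row['name']} is a {row['category']} item priced at ${row['price']:.2f}."

  products = [                                   # Dataset Source (products)
      {"name": "Trail Runner X", "category": "footwear", "price": 129.0},
      {"name": "Summit Pack 30", "category": "bags", "price": 89.0},
  ]

  generated_content = []                         # Dataset Sink (generated_content)
  for row in products:
      generated_content.append({
          "name": row["name"],
          "description": generate_description(row),
      })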

Test with Samples

Select a small sample of rows from your dataset to test and refine your flow before running on the full dataset.
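
In code terms this is just slicing or sampling a handful of rows before committing to a full run; reusing the hypothetical products list from the sketch above:

  import random

  sample = random.sample(products, k=min(10, len(products)))   # small, varied test set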

Multiple Sources

Use multiple Dataset Source nodes to combine data from different datasets in a single flow.
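
For example, one source might supply products keyed by id while another supplies reviews that reference those ids, joined before prompting. A sketch with hypothetical field names:

  products_by_id = {"p1": {"name": "Trail Runner X"}}                  # Dataset Source A
  reviews = [{"product_id": "p1", "text": "Great grip on wet rock."}]  # Dataset Source B

  combined = [
      {"product": products_by_id[r["product_id"]], "review": r["text"]}
      for r in reviews
      if r["product_id"] in products_by_id
  ]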

Parallel Sinks

Write different outputs to multiple Dataset Sink nodes for organized result storage.
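
For instance, clean summaries and failed items could be routed to separate sinks. A sketch, assuming a hypothetical results list where failures carry an error field:

  results = [
      {"summary": "Long battery life.", "error": None},
      {"summary": None, "error": "model timeout"},
  ]

  summaries = [r for r in results if not r["error"]]   # Dataset Sink: summaries
  failures = [r for r in results if r["error"]]        # Dataset Sink: failures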

Warning
Dataset Sink appends rows by default. If you need to update existing rows, use the API directly or manually manage the dataset.
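
The difference in behaviour, illustrated with plain Python lists (the real dataset API will differ):

  rows = [{"id": 1, "sentiment": "neutral"}]
  new_row = {"id": 1, "sentiment": "positive"}

  # Append (default Dataset Sink behaviour): each run adds another row.
  appended = rows + [new_row]

  # Update (not done by the sink node): replace the matching row instead.
  updated = [new_row if r["id"] == new_row["id"] else r for r in rows]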

Working with Arrays

When your dataset contains array fields, use Array Splitter to process items individually:

Dataset Source (out.urls is an array)
  → Array Splitter (split urls)
  → Prompt (process each URL)
  → Array Flatten
  → Dataset Sink (save results)
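
The same split-process-flatten shape in plain Python, with fetch_title standing in for the Prompt node (hypothetical helper and data):

  def fetch_title(url):
      # Stand-in for the Prompt node that processes a single URL.
      return f"Title for {url}"

  row = {"id": "r1", "urls": ["https://example.com/a", "https://example.com/b"]}

  titles = [fetch_title(u) for u in row["urls"]]   # Array Splitter + per-item Prompt
  result = {"id": row["id"], "titles": titles}     # Array Flatten: recombine into one row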

Integration with Experiments

Dataset nodes integrate seamlessly with your prompt engineering workflow:

  • Use the same datasets for prompt evaluation and flow execution
  • Test prompts in the Playground before using them in flows
  • Write flow results to evaluation datasets for quality analysis
  • Compare flow outputs against expected results

Best Practices

Schema Design

  • Keep field names clear and consistent
  • Use descriptive names that explain the data
  • Avoid overly nested structures
  • Document any special field formats or constraints
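
A small example of a schema that follows these guidelines (field names and types are illustrative):

  schema = {
      "product_name": "string",     # descriptive, no abbreviations
      "review_text": "string",
      "sentiment": "string",        # constrained to: positive | neutral | negative
      "review_date": "string",      # ISO 8601 date, e.g. 2024-05-01
  }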

Sample Selection

  • Start with 5-10 samples for initial testing
  • Include edge cases and diverse examples
  • Gradually increase sample size as you refine
  • Use the full dataset only when confident in your flow

Result Storage

  • Create separate sink datasets for different flow runs
  • Include metadata fields (timestamp, flow version, etc.)
  • Store both inputs and outputs for debugging
  • Keep intermediate results for multi-stage flows
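
An example result row that keeps inputs, outputs, and run metadata together for debugging (field names are illustrative):

  result_row = {
      "input_text": "The battery lasts two full days.",   # original input, kept for debugging
      "generated_summary": "Long battery life.",          # flow output
      "flow_version": "v3",                               # which flow produced this row
      "run_timestamp": "2024-05-01T12:00:00Z",            # when it ran
  }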

Related Documentation

  • Import a Dataset: Learn how to upload data
  • Variable Mapping: Map data between nodes
  • Using Datasets in Flows: Best practices guide