Dataset nodes let you integrate your curated data with AI workflows. Use Dataset Source to read inputs and Dataset Sink to save results.
Dataset Source Node
The Dataset Source node reads rows from a selected dataset and provides them as inputs to your flow.
Configuration
- Select dataset: Choose from your workspace datasets
- Select samples: Pick specific rows to process
- Add inline samples: Create test data directly in the node
Output Variables
Dataset Source exposes outputs for every field in your dataset schema:
Dataset Fields: name, description, category, price
Available Outputs:
- out (entire row object)
- out.name
- out.description
- out.category
- out.price
This allows you to map specific fields to different downstream nodes:
Dataset Source:
out.name → Prompt A.productName
out.description → Prompt B.content
out.price → Calculator.basePrice
Map only the fields each node needs rather than passing the entire out object unless necessary.
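Conceptually, each selected row becomes a single object, and each field feeds whichever downstream input it is mapped to. A minimal Python sketch of that wiring, using made-up sample values and the input names from the mapping above:

```python
# One dataset row, as exposed by Dataset Source (sample values are made up).
row = {
    "name": "Trail Mug",
    "description": "Insulated 12 oz camping mug",
    "category": "outdoor",
    "price": 18.50,
}

# out.name, out.description, and out.price each feed a different node input.
prompt_a_inputs = {"productName": row["name"]}
prompt_b_inputs = {"content": row["description"]}
calculator_inputs = {"basePrice": row["price"]}
```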
Dataset Sink Node
The Dataset Sink node writes processed results back to a dataset for storage and analysis.
Configuration
- Select target dataset: Choose or create a dataset to write to
- Map input fields: Configure which data goes into which dataset column
- Auto-create rows: Automatically add a new row for each processed item
Input Mapping
Dataset Sink has an input handle for each field in the target dataset schema:
Target Dataset Fields: input_text, generated_summary, sentiment
Map Inputs:
input_text ← _input.originalText
generated_summary ← out.summary
sentiment ← out.sentiment
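Put differently, the sink assembles one new dataset row per processed item by reading each mapped value from the upstream outputs. A rough Python equivalent, with hypothetical flow_inputs and node_outputs dictionaries standing in for the _input and out references:

```python
# Hypothetical upstream values (stand-ins for _input and out).
flow_inputs = {"originalText": "The battery lasts two full days."}
node_outputs = {"summary": "Long battery life.", "sentiment": "positive"}

# The sink's input mapping, expressed as one new dataset row.
new_row = {
    "input_text": flow_inputs["originalText"],     # ← _input.originalText
    "generated_summary": node_outputs["summary"],  # ← out.summary
    "sentiment": node_outputs["sentiment"],        # ← out.sentiment
}
rows = [new_row]  # auto-create rows: one appended row per processed item
```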
Common Patterns
Read-Process-Write
The most common pattern: read from a source dataset, process with prompts, write results to a sink dataset.
Dataset Source (products)
→ Prompt (generate descriptions)
→ Dataset Sink (generated_content)
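Stripped of the visual flow, this is an ordinary read-transform-write loop. A sketch under that reading, with hypothetical read_dataset, generate_description, and append_row stand-ins rather than any real API:

```python
def read_dataset(name):
    # Hypothetical stand-in: in the real flow, Dataset Source provides the rows.
    return [{"name": "Trail Mug"}, {"name": "Canvas Tote"}]

def generate_description(product_name):
    # Hypothetical stand-in for the Prompt node's generation step.
    return f"A short description of {product_name}."

def append_row(dataset_name, row):
    # Hypothetical stand-in: in the real flow, Dataset Sink writes the row.
    print(dataset_name, row)

# Read-Process-Write: source rows in, prompt in the middle, sink rows out.
for row in read_dataset("products"):
    description = generate_description(row["name"])
    append_row("generated_content", {
        "product_name": row["name"],
        "generated_description": description,
    })
```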
Test with Samples
Select a small sample of rows from your dataset to test and refine your flow before running on the full dataset.
Multiple Sources
Use multiple Dataset Source nodes to combine data from different datasets in a single flow.
Parallel Sinks
Write different outputs to multiple Dataset Sink nodes for organized result storage.
Working with Arrays
When your dataset contains array fields, use Array Splitter to process items individually:
Dataset Source (out.urls is an array)
→ Array Splitter (split urls)
→ Prompt (process each URL)
→ Array Flatten
→ Dataset Sink (save results)
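The split/flatten pair behaves like a per-item loop whose results are gathered back into a single value before the sink writes them. A self-contained Python sketch with a made-up row and an inline summary step standing in for the Prompt node:

```python
row = {"product_id": "p-42", "urls": ["https://a.example", "https://b.example"]}

# Array Splitter: handle one URL at a time.
per_url_summaries = []
for url in row["urls"]:
    per_url_summaries.append(f"Summary of {url}")  # Prompt runs once per item

# Array Flatten: the per-item outputs are gathered back into a single list.
# Dataset Sink: save the combined results alongside the source row id.
sink_row = {"product_id": row["product_id"], "summaries": per_url_summaries}
```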
Integration with Experiments
Dataset nodes integrate seamlessly with your prompt engineering workflow:
- Use the same datasets for prompt evaluation and flow execution
- Test prompts in the Playground before using them in flows
- Write flow results to evaluation datasets for quality analysis
- Compare flow outputs against expected results
Best Practices
Schema Design
- Keep field names clear and consistent
- Use descriptive names that explain the data
- Avoid overly nested structures (see the example after this list)
- Document any special field formats or constraints
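As a made-up illustration of these points, a flat schema with descriptive names maps more cleanly onto node handles than a nested one:

```python
# Harder to work with: nesting hides fields behind a single output handle.
nested_row = {"product": {"meta": {"name": "Trail Mug", "cat": "outdoor"}}}

# Easier to work with: flat, descriptive fields, one clear output per field.
flat_row = {
    "product_name": "Trail Mug",
    "product_category": "outdoor",
    "price_usd": 18.50,  # unit documented in the field name
}
```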
Sample Selection
- Start with 5-10 samples for initial testing
- Include edge cases and diverse examples
- Gradually increase sample size as you refine
- Use the full dataset only when confident in your flow
Result Storage
- Create separate sink datasets for different flow runs
- Include metadata fields (timestamp, flow version, etc.); see the example after this list
- Store both inputs and outputs for debugging
- Keep intermediate results for multi-stage flows
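For example, a sink row that stores the input, the output, and run metadata side by side makes later debugging and comparison straightforward; the field names here are illustrative only:

```python
from datetime import datetime, timezone

result_row = {
    "input_text": "The battery lasts two full days.",  # input kept for debugging
    "generated_summary": "Long battery life.",         # flow output
    "flow_version": "v3",                               # which flow produced it
    "run_timestamp": datetime.now(timezone.utc).isoformat(),
}
```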