Batch processing allows you to efficiently process hundreds or thousands of items through your flow using parallel execution, smart batching strategies, and progress tracking.
What is Batch Processing?
Batch processing is the technique of processing multiple data items together as a group rather than one at a time. This is essential for:
- Large dataset processing (100s to 1000s of items)
- Periodic data synchronization
- Bulk content generation
- Mass data enrichment
Processing Strategies
Sequential Processing
Process items one after another:
Item 1 → Complete → Item 2 → Complete → Item 3 → ...
Advantages:
- Minimal concurrent load (only one API call in flight at a time)
- Easier debugging
- Predictable resource usage
Disadvantages:
- Slowest option
- Time = N × per-item-time
Best for:
- Small batches (< 50 items)
- Rate-limited APIs
- Resource-constrained environments
Parallel Processing
Process multiple items simultaneously:
Item 1 ──┐
Item 2 ──┼─→ Process simultaneously
Item 3 ──┘
Advantages:
- Much faster (10x+ speedup)
- Better resource utilization
- Ideal for I/O-bound operations
Disadvantages:
- Higher concurrent API usage
- More complex error handling
- May hit rate limits
Best for:
- Large batches (100+ items)
- Independent items
- Production workflows
Hybrid (Chunked) Processing
Process in parallel batches:
Chunk 1 (10 items) → Process in parallel → Complete
Chunk 2 (10 items) → Process in parallel → Complete
Chunk 3 (10 items) → Process in parallel → Complete
Advantages:
- Balance speed and control
- Manage rate limits
- Progressive results
Best for:
- Very large datasets (1000+ items)
- APIs with rate limits
- When you need progress updates
Tip
Start with parallel processing of 10 items at a time. Adjust based on performance and rate limits.
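As a rough sketch (not an Evaligo API), the chunked strategy can be expressed in a few lines of Python. `process_item` is a hypothetical stand-in for whatever your flow does per item:

```python
import asyncio

async def process_item(item):
    """Hypothetical per-item work (an API call, a prompt run, etc.)."""
    await asyncio.sleep(0)  # simulate I/O
    return item * 2

async def process_chunked(items, chunk_size=10):
    """Process items in sequential chunks; each chunk runs in parallel."""
    results = []
    for start in range(0, len(items), chunk_size):
        chunk = items[start:start + chunk_size]
        # Run the whole chunk concurrently, then move on to the next chunk.
        results.extend(await asyncio.gather(*(process_item(i) for i in chunk)))
    return results

results = asyncio.run(process_chunked(list(range(25)), chunk_size=10))
```

With `chunk_size=10`, at most 10 items are in flight at once, which is the recommended starting point above.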
Configuring Batch Processing
Array Splitter Settings
Execution Mode:
- Sequential: One at a time
- Parallel: All items simultaneously
- Chunked: Process N items at a time
Parallel Settings:
Max Concurrency: 10 (default)
Chunk Size: 10 items per batch
Error Handling:
On Error: Skip and continue
Max Error Rate: 10%
Fail After: 50 consecutive errors
Performance Tuning
Small batch (< 50 items):
Mode: Parallel
Concurrency: 10
Medium batch (50-500 items):
Mode: Chunked
Chunk Size: 20
Concurrency: 20
Large batch (500+ items):
Mode: Chunked
Chunk Size: 50
Concurrency: 25
+ Enable checkpointing
Handling Rate Limits
Automatic Rate Limiting
Evaligo automatically manages API rate limits:
- Detects 429 (Too Many Requests) errors
- Implements exponential backoff
- Queues requests when limit reached
- Resumes automatically when available
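A minimal sketch of the backoff behavior described above, assuming a `request_fn` callable and a hypothetical `RateLimitError` standing in for an HTTP 429 response:

```python
import random
import time

class RateLimitError(Exception):
    """Stand-in for a 429 Too Many Requests response."""

def call_with_backoff(request_fn, max_attempts=5, base_delay=1.0):
    """Retry request_fn with exponential backoff when rate limited."""
    for attempt in range(max_attempts):
        try:
            return request_fn()
        except RateLimitError:
            if attempt == max_attempts - 1:
                raise  # out of retries: surface the error to the caller
            # Back off 1x, 2x, 4x ... the base delay, plus a little jitter.
            time.sleep(base_delay * (2 ** attempt) + random.uniform(0, 0.1))
```

Jitter spreads retries out so that many parallel items do not all hit the provider again at the same instant.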
Manual Rate Control
Rate Limit Settings:
Requests per minute: 60
Delay between requests: 1000ms
Burst allowance: 10
Example:
Process 100 items with 60 req/min limit
→ Takes ~2 minutes
→ Automatically paced to stay under the limit
Provider-Specific Limits
OpenAI GPT-4:
Tier 1: 500 RPM
Tier 2: 3,500 RPM
Tier 3: 10,000 RPM
Claude:
Free: 50 RPM
Pro: 1,000 RPM
Strategy:
- Know your tier limits
- Set concurrency accordingly
- Monitor usage in the dashboard
Warning
Exceeding rate limits triggers 429 errors and backoff delays that slow down your batch processing. Monitor the execution logs for rate-limit warnings.
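The manual rate-control settings above amount to a simple client-side throttle. `RequestPacer` below is an illustrative helper, not part of Evaligo:

```python
import time

class RequestPacer:
    """Space calls evenly to stay under a requests-per-minute budget."""

    def __init__(self, requests_per_minute=60):
        self.min_interval = 60.0 / requests_per_minute  # 1 s at 60 RPM
        self.last_call = 0.0

    def wait(self):
        """Block just long enough to keep the configured spacing."""
        elapsed = time.monotonic() - self.last_call
        if elapsed < self.min_interval:
            time.sleep(self.min_interval - elapsed)
        self.last_call = time.monotonic()

# 100 items at 60 requests/minute → one call per second, ~100 s total.
pacer = RequestPacer(requests_per_minute=60)
```

Calling `pacer.wait()` before each request keeps the batch paced under the limit instead of bursting into 429s and waiting out the backoff.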
Progress Tracking
Real-Time Progress
Monitor batch execution:
{
"totalItems": 500,
"processed": 237,
"successful": 231,
"failed": 6,
"remaining": 263,
"progress": 47.4,
"elapsedTime": "3m 42s",
"estimatedRemaining": "4m 15s",
"currentRate": "1.2 items/sec"
}
Checkpointing
Save progress for large batches:
Every 100 items processed:
→ Save checkpoint
→ Mark completed item IDs
If flow fails:
→ Resume from last checkpoint
→ Skip already processed items
→ Continue with remaining items
Best Practices
1. Test with Small Samples
Step 1: Test with 5 items (validate logic)
Step 2: Test with 50 items (check performance)
Step 3: Test with 500 items (verify scale)
Step 4: Run full batch (production)
2. Set Appropriate Timeouts
Fast operations (text processing): 10s per item
API calls (OpenAI): 30s per item
Web scraping: 60s per item
Complex chains: 120s per item
3. Handle Partial Failures
Strategy: Skip and Continue
→ 95/100 items succeed
→ 5 items fail (logged)
→ Flow completes
→ Review failed items
→ Reprocess if needed
4. Monitor Costs
Before running:
Cost per item: $0.05
Total items: 1,000
Estimated cost: $50
During execution:
Current spend: $23.50
Items processed: 470
Projected total: $50.00 ✓
Tip
Always estimate costs before running large batches. Use the cost calculator in the flow settings.
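The cost arithmetic above can be captured in two small helpers (illustrative names, not a built-in cost calculator):

```python
def estimate_batch_cost(cost_per_item, total_items):
    """Project total spend before launching a batch."""
    return cost_per_item * total_items

def projected_total(current_spend, items_processed, total_items):
    """Extrapolate the final cost from spend so far."""
    return current_spend / items_processed * total_items

# The example above: $0.05 per item across 1,000 items.
upfront = estimate_batch_cost(0.05, 1000)   # 50.0
# Mid-run check: $23.50 after 470 items projects to ~$50 total.
midrun = projected_total(23.50, 470, 1000)
```

Comparing the mid-run projection against the upfront estimate is a quick sanity check that per-item costs are tracking expectations.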
Common Patterns
Dataset Processing
Dataset Source (1000 items)
→ Array Splitter (parallel: 20)
→ Process each item
→ Array Flatten
→ Dataset Sink (save results)
Incremental Processing
Dataset Source (filter: unprocessed)
→ Array Splitter (chunked: 50)
→ Process each
→ Mark as processed
→ Dataset Sink
Run daily to process new items only
Multi-Stage Batching
Stage 1: Fetch data (parallel: 50)
→ Array Flatten
Stage 2: Process data (parallel: 20)
→ Array Flatten
Stage 3: Save results (sequential)
→ Dataset Sink
Optimization Techniques
Caching
Reduce redundant API calls:
- Cache website scraping results
- Reuse identical prompt outputs
- Store intermediate results
- Can reduce costs by 30-70%
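A minimal caching sketch using Python's standard `functools.lru_cache`; the URL scheme and the 300-distinct-pages figure are invented for illustration:

```python
import functools

fetch_count = {"n": 0}

@functools.lru_cache(maxsize=None)
def scrape_site(url):
    """Hypothetical fetch; real code would hit the network here."""
    fetch_count["n"] += 1
    return f"content of {url}"

# 1,000 items that reference only 300 distinct URLs → 300 real fetches.
urls = [f"https://example.com/page/{i % 300}" for i in range(1000)]
pages = [scrape_site(u) for u in urls]
```

Repeat lookups are served from memory, so the batch pays the expensive operation once per distinct input rather than once per item.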
Deduplication
Remove duplicate items before processing:
Dataset Source (1000 items)
→ Deduplicate (750 unique items)
→ Array Splitter
→ Process 750 instead of 1000
→ 25% cost savings
Smart Ordering
Process items in optimal order:
- Prioritize high-value items
- Group similar items together
- Process fast items first for quick wins
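Deduplication and smart ordering combine naturally into one pre-processing step. The helpers below are an illustrative sketch, with `key` and `priority` supplied by the caller:

```python
def dedupe_keep_order(items, key=lambda item: item):
    """Drop duplicates while preserving first-seen order."""
    seen = set()
    unique = []
    for item in items:
        k = key(item)
        if k not in seen:
            seen.add(k)
            unique.append(item)
    return unique

def order_for_processing(items, priority=lambda item: 0):
    """Put high-priority items first so valuable results land early."""
    return sorted(items, key=priority, reverse=True)

records = [{"id": 1, "value": 9}, {"id": 2, "value": 3}, {"id": 1, "value": 9}]
queue = order_for_processing(
    dedupe_keep_order(records, key=lambda r: r["id"]),
    priority=lambda r: r["value"],
)
```

Running both before the Array Splitter means the batch processes fewer items, in a more useful order, without changing any downstream nodes.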
Error Recovery
Automatic Retry
Item fails due to timeout:
Attempt 1: Failed (timeout)
Wait 2s
Attempt 2: Failed (timeout)
Wait 5s
Attempt 3: Success ✓
Manual Reprocessing
After batch completes:
1. Export failed items list
2. Fix underlying issues
3. Create new dataset with failed items
4. Reprocess just those items
Partial Results
Flow processes 800/1000 items then crashes
→ 800 results saved to dataset
→ Resume from item 801
→ Process remaining 200
→ Merge results
Monitoring and Alerts
Set Up Alerts
- Error rate exceeds 5%
- Execution time exceeds estimate by 50%
- Cost exceeds budget
- Flow fails or times out
Review Metrics
After each batch:
- Success rate
- Average time per item
- Cost per item
- Error patterns
- Bottleneck nodes
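The review metrics above can be computed from a per-item result log. The tuple shape here (success flag, seconds, cost) is an assumption for illustration:

```python
def batch_metrics(results):
    """Summarize a finished batch from (succeeded, seconds, cost) tuples."""
    total = len(results)
    successes = sum(1 for ok, _, _ in results if ok)
    return {
        "success_rate": successes / total,
        "avg_seconds_per_item": sum(t for _, t, _ in results) / total,
        "cost_per_item": sum(c for _, _, c in results) / total,
    }

metrics = batch_metrics([
    (True, 2.0, 0.05),
    (True, 4.0, 0.05),
    (False, 6.0, 0.02),  # failed items often cost less but take longer
])
```

Tracking these numbers batch over batch makes regressions (a slow node, a price change, a new error pattern) visible early.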