Lead Enrichment Pipeline - Evaligo AI Workflow Automation Docs

In this tutorial, you'll build a practical lead enrichment flow that takes a simple list of company names and websites, then automatically enriches each lead with detailed information and an AI-generated quality score.

What You'll Build

A complete lead enrichment pipeline that:

Starts with basic lead data (company name + website)
Analyzes each company's website
Extracts key business information
Generates an AI-powered lead score
Saves enriched data back to your dataset
Deploys as an API for CRM integration

Info

This tutorial takes about 20 minutes and demonstrates a real-world B2B use case.

Step 1: Prepare Your Dataset

Create Input Dataset

1. Go to Datasets: Navigate to the Datasets section
2. Create new dataset: Name it "Leads - Raw"
3. Add columns: company, website, source, created_at
4. Import or add sample data: Add 5-10 test leads

Sample Data

company,website,source,created_at
Acme Corp,https://acme.com,webform,2024-01-15
TechStart Inc,https://techstart.io,linkedin,2024-01-15
Global Solutions,https://globalsolutions.com,referral,2024-01-16
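
Before importing, it can help to sanity-check the CSV locally. The sketch below uses only the Python standard library; the function name validate_leads_csv and the specific checks (required columns, non-empty company, http/https URL) are assumptions made for illustration, not part of the Evaligo product.

```python
import csv
import io
from urllib.parse import urlparse

REQUIRED_COLUMNS = ["company", "website", "source", "created_at"]

SAMPLE_CSV = """company,website,source,created_at
Acme Corp,https://acme.com,webform,2024-01-15
TechStart Inc,https://techstart.io,linkedin,2024-01-15
Global Solutions,https://globalsolutions.com,referral,2024-01-16
"""

def validate_leads_csv(text):
    """Return (valid_rows, problems) for a CSV string of raw leads."""
    reader = csv.DictReader(io.StringIO(text))
    missing = [c for c in REQUIRED_COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        return [], [f"missing columns: {', '.join(missing)}"]
    valid, problems = [], []
    for line_no, row in enumerate(reader, start=2):  # line 1 is the header
        url = urlparse(row["website"] or "")
        if not (row["company"] or "").strip():
            problems.append(f"line {line_no}: empty company")
        elif url.scheme not in ("http", "https") or not url.netloc:
            problems.append(f"line {line_no}: invalid website {row['website']!r}")
        else:
            valid.append(row)
    return valid, problems

rows, problems = validate_leads_csv(SAMPLE_CSV)
print(len(rows), problems)
```

Rows that fail a check are reported rather than silently dropped, so you can fix them before they reach the flow.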

Step 2: Create Enrichment Prompts

Prompt 1: Business Analyzer

In the Playground, create this prompt:

Analyze this company's website and provide structured information:

Company: {{companyName}}
Website Content: {{websiteContent}}

Provide a JSON response with:
{
  "industry": "Primary industry",
  "companySize": "estimated employee count range",
  "products": ["list", "of", "main products/services"],
  "targetMarket": "B2B, B2C, or Both",
  "technologyStack": ["detected", "technologies"],
  "description": "2-sentence company overview"
}

Be factual and concise.

Test it, then save as "Business Analyzer".
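
Model replies sometimes wrap the JSON in extra prose, so downstream code usually benefits from a defensive parse. The following is a minimal sketch, assuming you handle the raw reply text yourself; parse_analyzer_response is a hypothetical helper, and the key list simply mirrors the prompt above.

```python
import json

EXPECTED_KEYS = {
    "industry", "companySize", "products",
    "targetMarket", "technologyStack", "description",
}

def parse_analyzer_response(raw):
    """Extract the outermost JSON object from a model reply and check its keys."""
    start, end = raw.find("{"), raw.rfind("}")
    if start == -1 or end <= start:
        raise ValueError("no JSON object found in response")
    # json.JSONDecodeError is a subclass of ValueError, so callers can
    # catch ValueError for every failure mode of this function.
    data = json.loads(raw[start:end + 1])
    missing = EXPECTED_KEYS - set(data)
    if missing:
        raise ValueError(f"missing keys: {sorted(missing)}")
    if data["targetMarket"] not in {"B2B", "B2C", "Both"}:
        raise ValueError(f"unexpected targetMarket: {data['targetMarket']!r}")
    return data

reply = (
    "Here is the analysis:\n"
    '{"industry": "Manufacturing", "companySize": "200-500", '
    '"products": ["Industrial equipment"], "targetMarket": "B2B", '
    '"technologyStack": ["React"], "description": "Acme builds equipment."}'
)
info = parse_analyzer_response(reply)
print(info["industry"], info["targetMarket"])
```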

Prompt 2: Lead Scorer

Score this lead from 1-10 based on these factors:

Company: {{companyName}}
Industry: {{industry}}
Company Size: {{companySize}}
Products: {{products}}
Target Market: {{targetMarket}}
Source: {{leadSource}}

Scoring criteria:
- Relevance to our ideal customer profile
- Company size (prefer 50-500 employees)
- Technology adoption (modern stack is better)
- B2B focus (higher score)
- Website quality and professionalism

Provide:
{
  "score": 1-10,
  "reasoning": "brief explanation",
  "nextSteps": "recommended action",
  "priority": "high/medium/low"
}

Test and save as "Lead Scorer".

Tip

Customize the scoring criteria to match your ideal customer profile for best results.
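
To try prompt variations against edge cases outside the Playground, you can fill the {{placeholders}} locally. This is only a sketch of the substitution idea: render_prompt is a hypothetical helper, and how Evaligo itself resolves variables (for example, how it serializes lists) is not specified here.

```python
import re

def render_prompt(template, variables):
    """Replace each {{name}} placeholder; fail loudly on missing variables."""
    def substitute(match):
        name = match.group(1)
        if name not in variables:
            raise KeyError(f"no value supplied for {{{{{name}}}}}")
        value = variables[name]
        # Assumption: lists (e.g. products) are joined into a readable string.
        return ", ".join(value) if isinstance(value, list) else str(value)
    return re.sub(r"\{\{\s*(\w+)\s*\}\}", substitute, template)

template = "Company: {{companyName}}\nProducts: {{products}}\nSource: {{leadSource}}"
prompt = render_prompt(template, {
    "companyName": "Acme Corp",
    "products": ["Industrial equipment", "Automation solutions"],
    "leadSource": "webform",
})
print(prompt)
```

Raising on a missing variable catches mapping mistakes early, before an incomplete prompt reaches the model.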

Step 3: Build the Flow

Flow Structure

Dataset Source ("Leads - Raw")
  ↓
Website Mapper (discover pages)
  ↓
Page Scraper (get homepage)
  ↓
HTML Text Extractor (clean content)
  ↓
Prompt: Business Analyzer (extract info)
  ↓
Prompt: Lead Scorer (score & prioritize)
  ↓
Dataset Sink (UPDATE "Leads - Raw")

Detailed Configuration

1. Dataset Source
   Select "Leads - Raw" dataset
   Expose fields: company, website, source
   Filter: enriched_at IS NULL (only process new leads)
2. Website Mapper
   Map: out.website → url
   Max pages: 1 (we only need homepage)
3. Page Scraper
   Map: out.urls[0] → url
   Selector: "body" (get full page)
4. HTML Text Extractor
   Map: out.html → html
   Mode: Standard
5. Prompt: Business Analyzer
   Select "Business Analyzer" prompt
   Map: out.company → companyName
   Map: HTMLExtractor.out.text → websiteContent
6. Prompt: Lead Scorer
   Select "Lead Scorer" prompt
   Map: out.company → companyName
   Map: BusinessAnalyzer.out.industry → industry
   Map: BusinessAnalyzer.out.companySize → companySize
   Map: BusinessAnalyzer.out.products → products
   Map: BusinessAnalyzer.out.targetMarket → targetMarket
   Map: out.source → leadSource
7. Dataset Sink
   Target: "Leads - Raw"
   Mode: UPDATE
   Match on: id
   Map fields: industry, company_size, products, target_market, lead_score, priority, next_steps, enriched_at
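
The prompts return camelCase keys (companySize, nextSteps, score) while the dataset columns are snake_case (company_size, next_steps, lead_score). The sketch below shows that mapping in plain Python, assuming both prompt outputs are already parsed into dictionaries; build_update_row is a hypothetical name, and the timestamp format follows the Expected Output example in Step 4.

```python
from datetime import datetime, timezone

def build_update_row(lead_id, analysis, scoring, now=None):
    """Merge Business Analyzer and Lead Scorer output into dataset columns."""
    now = now or datetime.now(timezone.utc)
    return {
        "id": lead_id,  # the Dataset Sink matches on this field
        "industry": analysis["industry"],
        "company_size": analysis["companySize"],
        "products": analysis["products"],
        "target_market": analysis["targetMarket"],
        "lead_score": scoring["score"],
        "priority": scoring["priority"],
        "next_steps": scoring["nextSteps"],
        "enriched_at": now.strftime("%Y-%m-%dT%H:%M:%SZ"),
    }

row = build_update_row(
    1,
    {"industry": "Manufacturing", "companySize": "200-500 employees",
     "products": ["Industrial equipment"], "targetMarket": "B2B"},
    {"score": 8, "priority": "high", "nextSteps": "Schedule demo"},
    now=datetime(2024, 1, 15, 10, 30, tzinfo=timezone.utc),
)
print(row["lead_score"], row["enriched_at"])
```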

Step 4: Test Your Flow

Run with Sample Data

1. Select 2-3 leads: Don't run all at once initially
2. Click "Run Flow": Watch the execution
3. Monitor progress: Check each node's output
4. Verify results: Check the updated dataset

Expected Output

Your dataset should now have enriched data:

{
  "company": "Acme Corp",
  "website": "https://acme.com",
  "source": "webform",
  "industry": "Manufacturing",
  "company_size": "200-500 employees",
  "products": ["Industrial equipment", "Automation solutions"],
  "target_market": "B2B",
  "lead_score": 8,
  "priority": "high",
  "next_steps": "Schedule demo, emphasize automation ROI",
  "enriched_at": "2024-01-15T10:30:00Z"
}

If results look good, process the remaining leads. If not, refine your prompts and try again.

Step 5: Handle Array Processing

Add Batch Capabilities

To process multiple leads efficiently:

Dataset Source
  ↓
Array Splitter (parallel: 5)
  ↓
[Individual processing per lead]
  ↓
Array Flatten
  ↓
Dataset Sink (batch update)

Parallel Configuration

Concurrency: 5 (safe for most APIs)
Error handling: Skip and continue
Timeout: 60s per lead
Retry: 2 attempts on failure
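
For intuition, the settings above behave roughly like the following thread-pool sketch. It is not how Evaligo implements the Array Splitter. The names process_leads and enrich are hypothetical, "Retry: 2" is read here as two retries after the first failure, and the 60s timeout is assumed to be enforced inside enrich itself (for example, as an HTTP client timeout).

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def process_leads(leads, enrich, concurrency=5, retries=2):
    """Enrich leads in parallel; return (results, failures)."""
    def attempt(lead):
        last_error = None
        for _ in range(1 + retries):  # first try plus `retries` retries
            try:
                return enrich(lead)
            except Exception as exc:
                last_error = exc
        raise last_error

    results, failures = [], []
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        futures = {pool.submit(attempt, lead): lead for lead in leads}
        for future in as_completed(futures):
            lead = futures[future]
            try:
                results.append(future.result())
            except Exception as exc:
                # "Skip and continue": record the failure and keep going.
                failures.append((lead["company"], str(exc)))
    return results, failures

calls = {}

def fake_enrich(lead):
    """Stand-in for the real pipeline: one lead is flaky, one always fails."""
    name = lead["company"]
    calls[name] = calls.get(name, 0) + 1
    if name == "Bad":
        raise RuntimeError("unreachable")
    if name == "Flaky" and calls[name] < 2:
        raise RuntimeError("temporary")
    return {"company": name, "ok": True}

results, failures = process_leads(
    [{"company": c} for c in ["Good", "Flaky", "Bad"]], fake_enrich
)
print(sorted(r["company"] for r in results), failures)
```

Results arrive in completion order, not input order, so match them back to leads by a key such as id or company.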

Step 6: Add Quality Control

Validation Node

Add a validation step before Dataset Sink:

Lead Scorer output
  ↓
Validation: Check required fields
  - lead_score exists and is between 1 and 10
  - priority is high/medium/low
  - industry is not "Unknown"
  ↓
If valid: → Dataset Sink
If invalid: → Error log + skip
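
The three checks above can be expressed as a small function. This is a sketch of the rules only, assuming the row has already been mapped to dataset column names; validate_enriched_lead is a hypothetical helper, not a built-in node.

```python
VALID_PRIORITIES = {"high", "medium", "low"}

def validate_enriched_lead(row):
    """Return a list of problems; an empty list means the row is valid."""
    problems = []
    score = row.get("lead_score")
    # bool is a subclass of int, so exclude it explicitly.
    if not isinstance(score, int) or isinstance(score, bool) or not 1 <= score <= 10:
        problems.append("lead_score must be an integer from 1 to 10")
    if row.get("priority") not in VALID_PRIORITIES:
        problems.append("priority must be high, medium, or low")
    if row.get("industry") in (None, "", "Unknown"):
        problems.append("industry is missing or Unknown")
    return problems

good = {"lead_score": 8, "priority": "high", "industry": "Manufacturing"}
bad = {"lead_score": 11, "priority": "urgent", "industry": "Unknown"}
print(validate_enriched_lead(good))
print(validate_enriched_lead(bad))
```

Returning every problem at once, rather than stopping at the first, makes the error log more useful when you refine the prompts.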

Fallback Strategy

If website scraping fails:
  → Try alternative data source
  → Or set default values
  → Mark as "needs_manual_review"
  → Still save to dataset
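
One way to read the fallback chain above in code: try each source in order and, if all fail, return a row marked for manual review so it is still saved. The function name, the status field, and the shape of the defaults are assumptions for illustration.

```python
def enrich_with_fallback(lead, primary, alternative=None, defaults=None):
    """Try each source in order; always return a row that can be saved."""
    for source in (primary, alternative):
        if source is None:
            continue
        try:
            return {**lead, **source(lead), "status": "enriched"}
        except Exception:
            continue  # fall through to the next source
    # Every source failed: apply defaults and flag the row for a human.
    return {**lead, **(defaults or {}), "status": "needs_manual_review"}

def failing_scraper(lead):
    raise TimeoutError("site did not respond")

def directory_lookup(lead):
    return {"industry": "Manufacturing"}

lead = {"company": "Acme Corp", "website": "https://acme.com"}
print(enrich_with_fallback(lead, failing_scraper, directory_lookup))
print(enrich_with_fallback(lead, failing_scraper, defaults={"priority": "low"}))
```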

Step 7: Deploy as API

Add API Nodes

API Input (single lead)
  fields: company, website, source
  ↓
[Processing pipeline]
  ↓
API Output
  return: enriched lead data

API Integration

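# CRM integration
import os

import requests
from flask import Flask, jsonify, request

app = Flask(__name__)
api_key = os.environ["EVALIGO_API_KEY"]  # assumed environment variable name

def enrich_new_lead(company, website, source):
    response = requests.post(
        "https://api.evaligo.com/flows/lead-enrichment/execute",
        headers={"Authorization": f"Bearer {api_key}"},
        json={
            "company": company,
            "website": website,
            "source": source
        },
        timeout=60,  # avoid hanging the caller on a slow enrichment
    )
    response.raise_for_status()  # surface HTTP errors instead of parsing an error body

    enriched_data = response.json()

    # Update CRM with enriched data (crm is a placeholder for your CRM client)
    crm.update_lead(enriched_data)

    # Route high-priority leads (notify_sales_team is a placeholder for your alerting hook)
    if enriched_data.get("priority") == "high":
        notify_sales_team(enriched_data)

    # Return the result so callers such as the webhook handler can use it
    return enriched_data

# Webhook from web form
@app.route('/webhook/new-lead', methods=['POST'])
def new_lead_webhook():
    lead = request.json
    enriched = enrich_new_lead(
        lead['company'],
        lead['website'],
        'webform'
    )
    return jsonify(enriched)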

Info

API deployment enables real-time lead enrichment as soon as leads enter your system.

Step 8: Monitor and Optimize

Key Metrics to Track

Enrichment success rate
Average processing time
Cost per lead
Lead score distribution
Conversion rate by score

Optimization Strategies

Week 1 Baseline:
  Success rate: 85%
  Avg time: 15s per lead
  Cost: $0.25 per lead
  
Optimizations:
  1. Cache website data (same domain)
     → Cost: $0.18 per lead (-28%)
  
  2. Parallel processing (5 concurrent)
     → Time: 3s per lead (-80%)
  
  3. Refined scoring prompt
     → Better prioritization
  
Week 4 Results:
  Success rate: 92%
  Avg time: 3s per lead
  Cost: $0.18 per lead
  Conversion rate (high priority): +35%
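
The caching optimization (same domain) can be pictured as a lookup keyed by hostname. The sketch below is an in-memory illustration under stated assumptions: DomainCache and fetch are hypothetical names, and because the flow only scrapes the homepage, any URL on a domain is treated as the same cache entry.

```python
from urllib.parse import urlparse

class DomainCache:
    """Memoize scraped content per domain so repeat leads skip the fetch."""

    def __init__(self, fetch):
        self._fetch = fetch  # callable: url -> page content
        self._pages = {}
        self.hits = 0
        self.misses = 0

    @staticmethod
    def _key(url):
        host = urlparse(url).netloc.lower()
        return host[4:] if host.startswith("www.") else host

    def get(self, url):
        key = self._key(url)
        if key in self._pages:
            self.hits += 1
        else:
            self.misses += 1
            self._pages[key] = self._fetch(url)
        return self._pages[key]

fetched = []

def fake_fetch(url):
    """Stand-in for the real scraper; records which URLs were fetched."""
    fetched.append(url)
    return f"content of {url}"

cache = DomainCache(fake_fetch)
cache.get("https://acme.com")
cache.get("https://www.acme.com/about")  # same domain, served from cache
cache.get("https://techstart.io")
print(cache.hits, cache.misses, fetched)
```

A production cache would also need an expiry policy so that re-enrichment sees fresh content.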

Advanced Features

Re-enrichment Schedule

Run flow weekly:
  Filter: enriched_at older than 30 days
  → Update company data
  → Recalculate scores
  → Detect changes (funding, growth, etc.)
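
The staleness filter amounts to a timestamp comparison. Here is a minimal sketch, assuming enriched_at is stored as a UTC string in the format shown in the Expected Output example; is_stale is a hypothetical helper. Rows that were never enriched return False because the main flow's enriched_at IS NULL filter already picks them up.

```python
from datetime import datetime, timedelta, timezone

def is_stale(row, now, max_age_days=30):
    """True when the row was last enriched more than max_age_days ago."""
    enriched_at = row.get("enriched_at")
    if not enriched_at:
        return False  # never enriched: handled by the main flow instead
    last = datetime.strptime(enriched_at, "%Y-%m-%dT%H:%M:%SZ")
    last = last.replace(tzinfo=timezone.utc)
    return now - last > timedelta(days=max_age_days)

now = datetime(2024, 3, 1, tzinfo=timezone.utc)
rows = [
    {"company": "Acme Corp", "enriched_at": "2024-01-15T10:30:00Z"},
    {"company": "TechStart Inc", "enriched_at": "2024-02-20T09:00:00Z"},
    {"company": "Global Solutions", "enriched_at": None},
]
stale = [r["company"] for r in rows if is_stale(r, now)]
print(stale)
```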

Multi-Source Enrichment

Source 1: Website analysis
Source 2: LinkedIn company data
Source 3: News mentions
Source 4: Tech stack detection
  → Combine all sources
  → Generate comprehensive profile

Smart Routing

High-score leads (8-10):
  → Immediate sales notification
  → Priority in CRM
  
Mid-score leads (5-7):
  → Marketing nurture campaign
  → Weekly follow-up
  
Low-score leads (1-4):
  → Generic nurture sequence
  → Monthly check-in
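
The routing tiers above translate directly into a threshold function. This sketch only encodes the score bands and actions listed; route_lead is a hypothetical name, and wiring the actions to your CRM or marketing tools is left out.

```python
def route_lead(score):
    """Map a 1-10 lead score to its routing tier and the actions listed above."""
    if not isinstance(score, int) or not 1 <= score <= 10:
        raise ValueError(f"score must be an integer from 1 to 10, got {score!r}")
    if score >= 8:
        return "high", ["Immediate sales notification", "Priority in CRM"]
    if score >= 5:
        return "mid", ["Marketing nurture campaign", "Weekly follow-up"]
    return "low", ["Generic nurture sequence", "Monthly check-in"]

for score in (9, 6, 2):
    tier, actions = route_lead(score)
    print(score, tier, actions)
```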

Troubleshooting

Common Issues

Low Enrichment Success Rate

Problem: Many leads fail to enrich

Solutions:

Verify websites are accessible
Increase timeout for slow sites
Add retry logic
Implement fallback data sources

Inconsistent Scores

Problem: Lead scores seem random

Solutions:

Use structured output schema
Add more specific criteria to prompt
Lower temperature (0.3-0.5)
Test prompt with edge cases

Slow Processing

Problem: Takes too long

Solutions:

Enable parallel processing
Cache website data
Use async execution
Optimize prompt length

Next Steps

Enhance Your Flow

Add competitor analysis
Integrate with LinkedIn API
Extract contact information
Detect recent funding/news
Build custom ICP matching

Scale to Production

Process full lead database
Set up automated scheduling
Integrate with CRM webhooks
Build enrichment dashboard
Track ROI and conversion rates
