In this tutorial, you'll build a practical lead enrichment flow that takes a simple list of company names and websites, then automatically enriches each lead with detailed information and an AI-generated quality score.

What You'll Build

A complete lead enrichment pipeline that:

  • Starts with basic lead data (company name + website)
  • Analyzes each company's website
  • Extracts key business information
  • Generates an AI-powered lead score
  • Saves enriched data back to your dataset
  • Deploys as an API for CRM integration
Info
This tutorial takes about 20 minutes and demonstrates a real-world B2B use case.

Step 1: Prepare Your Dataset

Create Input Dataset

  1. Go to Datasets: navigate to the Datasets section

  2. Create new dataset: name it "Leads - Raw"

  3. Add columns: company, website, source, created_at

  4. Import or add sample data: add 5-10 test leads

Sample Data

company,website,source,created_at
Acme Corp,https://acme.com,webform,2024-01-15
TechStart Inc,https://techstart.io,linkedin,2024-01-15
Global Solutions,https://globalsolutions.com,referral,2024-01-16
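
Before importing, it can help to sanity-check the file locally. The following is a minimal sketch, not part of Evaligo: the function name load_leads and the validation rules (non-empty company, http/https URL) are assumptions chosen for illustration.

```python
import csv
import io
from urllib.parse import urlparse

REQUIRED_COLUMNS = ["company", "website", "source", "created_at"]

def load_leads(csv_text):
    """Parse CSV text and return (valid_rows, problems)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    missing = [c for c in REQUIRED_COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    valid, problems = [], []
    for line_no, row in enumerate(reader, start=2):  # header is line 1
        url = urlparse(row["website"] or "")
        if not (row["company"] or "").strip():
            problems.append((line_no, "empty company"))
        elif url.scheme not in ("http", "https") or not url.netloc:
            problems.append((line_no, "invalid website URL"))
        else:
            valid.append(row)
    return valid, problems

SAMPLE = """company,website,source,created_at
Acme Corp,https://acme.com,webform,2024-01-15
TechStart Inc,https://techstart.io,linkedin,2024-01-15
Global Solutions,https://globalsolutions.com,referral,2024-01-16
"""

valid, problems = load_leads(SAMPLE)
print(f"{len(valid)} valid leads, {len(problems)} problems")
```

Rows listed in problems are worth fixing before the flow runs, since a bad URL would otherwise fail later at the scraping step.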

Step 2: Create Enrichment Prompts

Prompt 1: Business Analyzer

In the Playground, create this prompt:

Analyze this company's website and provide structured information:

Company: {{companyName}}
Website Content: {{websiteContent}}

Provide a JSON response with:
{
  "industry": "Primary industry",
  "companySize": "estimated employee count range",
  "products": ["list", "of", "main products/services"],
  "targetMarket": "B2B, B2C, or Both",
  "technologyStack": ["detected", "technologies"],
  "description": "2-sentence company overview"
}

Be factual and concise.

Test it, then save as "Business Analyzer".
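
The {{companyName}} and {{websiteContent}} placeholders are filled in by the platform when the flow runs. If you want to preview a rendered prompt outside the Playground, a rough local equivalent might look like this; render_prompt is a hypothetical helper, and the exact substitution rules Evaligo uses are not specified here.

```python
import re

# Matches {{name}} with optional inner whitespace; single braces are left alone,
# so the JSON example inside the prompt is not touched.
PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")

def render_prompt(template, variables):
    """Replace each {{name}} with variables[name]; fail loudly on missing names."""
    def substitute(match):
        name = match.group(1)
        if name not in variables:
            raise KeyError(f"no value for placeholder {name!r}")
        return str(variables[name])
    return PLACEHOLDER.sub(substitute, template)

template = "Company: {{companyName}}\nWebsite Content: {{websiteContent}}"
print(render_prompt(template, {
    "companyName": "Acme Corp",
    "websiteContent": "We build robots.",
}))
```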

Prompt 2: Lead Scorer

Score this lead from 1-10 based on these factors:

Company: {{companyName}}
Industry: {{industry}}
Company Size: {{companySize}}
Products: {{products}}
Target Market: {{targetMarket}}
Source: {{leadSource}}

Scoring criteria:
- Relevance to our ideal customer profile
- Company size (prefer 50-500 employees)
- Technology adoption (modern stack is better)
- B2B focus (higher score)
- Website quality and professionalism

Provide:
{
  "score": 1-10,
  "reasoning": "brief explanation",
  "nextSteps": "recommended action",
  "priority": "high/medium/low"
}

Test and save as "Lead Scorer".

Tip
Customize the scoring criteria to match your ideal customer profile for best results.
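
Both prompts ask for JSON, but models sometimes wrap the object in extra text. If you consume the replies in your own code, a tolerant parser along these lines can help. This is a sketch; parse_json_reply is a hypothetical name, and it simply returns the first JSON object it can decode.

```python
import json

def parse_json_reply(reply):
    """Return the first JSON object found in a model reply, or raise ValueError."""
    decoder = json.JSONDecoder()
    start = reply.find("{")
    while start != -1:
        try:
            obj, _ = decoder.raw_decode(reply, start)
            if isinstance(obj, dict):
                return obj
        except json.JSONDecodeError:
            pass  # not valid JSON from this brace; try the next one
        start = reply.find("{", start + 1)
    raise ValueError("no JSON object found in reply")

reply = 'Here is the result: {"score": 8, "priority": "high"} Hope that helps.'
print(parse_json_reply(reply))
```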

Step 3: Build the Flow

Flow Structure

Dataset Source ("Leads - Raw")
  ↓
Website Mapper (discover pages)
  ↓
Page Scraper (get homepage)
  ↓
HTML Text Extractor (clean content)
  ↓
Prompt: Business Analyzer (extract info)
  ↓
Prompt: Lead Scorer (score & prioritize)
  ↓
Dataset Sink (UPDATE "Leads - Raw")
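
The HTML Text Extractor node does the content cleaning for you. To make that step concrete, here is a rough standard-library sketch of what such cleaning might involve. It is not Evaligo's implementation; the class name, the skipped tags, and the max_chars limit are all assumptions.

```python
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collect visible text, skipping script and style contents."""
    SKIP = {"script", "style", "noscript"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is outside skipped elements
        if self._skip_depth == 0 and data.strip():
            self._chunks.append(data.strip())

    def text(self):
        return " ".join(self._chunks)

def html_to_text(html, max_chars=8000):
    """Return cleaned text, truncated so the prompt stays a manageable size."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()[:max_chars]

page = "<html><head><style>p{}</style></head><body><h1>Acme</h1><p>We build robots.</p></body></html>"
print(html_to_text(page))
```

Truncating the text is one simple way to keep the Business Analyzer prompt short, which also bears on the cost and speed points later in this tutorial.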

Detailed Configuration

  1. Dataset Source
     Select "Leads - Raw" dataset
     Expose fields: company, website, source
     Filter: enriched_at IS NULL (only process new leads)

  2. Website Mapper
     Map: out.website → url
     Max pages: 1 (we only need the homepage)

  3. Page Scraper
     Map: out.urls[0] → url
     Selector: "body" (get full page)

  4. HTML Text Extractor
     Map: out.html → html
     Mode: Standard

  5. Prompt: Business Analyzer
     Select "Business Analyzer" prompt
     Map: out.company → companyName
     Map: HTMLExtractor.out.text → websiteContent

  6. Prompt: Lead Scorer
     Select "Lead Scorer" prompt
     Map: out.company → companyName
     Map: BusinessAnalyzer.out.industry → industry
     Map: BusinessAnalyzer.out.companySize → companySize
     Map: BusinessAnalyzer.out.products → products
     Map: BusinessAnalyzer.out.targetMarket → targetMarket
     Map: out.source → leadSource

  7. Dataset Sink
     Target: "Leads - Raw"
     Mode: UPDATE
     Match on: id
     Map fields: industry, company_size, products, target_market, lead_score, priority, next_steps, enriched_at
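
Note that the prompts return camelCase keys (companySize, nextSteps, score) while the dataset columns are snake_case (company_size, next_steps, lead_score). The Dataset Sink mapping handles this in the UI; the sketch below shows the same mapping in code for readers who post-process results themselves. to_dataset_row is a hypothetical helper, and the timestamp format is taken from the expected output shown in Step 4.

```python
from datetime import datetime, timezone

def to_dataset_row(analysis, scoring, now=None):
    """Combine Business Analyzer and Lead Scorer outputs into dataset columns."""
    now = (now or datetime.now(timezone.utc)).astimezone(timezone.utc)
    return {
        "industry": analysis["industry"],
        "company_size": analysis["companySize"],
        "products": analysis["products"],
        "target_market": analysis["targetMarket"],
        "lead_score": scoring["score"],
        "priority": scoring["priority"],
        "next_steps": scoring["nextSteps"],
        "enriched_at": now.strftime("%Y-%m-%dT%H:%M:%SZ"),
    }

analysis = {
    "industry": "Manufacturing",
    "companySize": "200-500 employees",
    "products": ["Industrial equipment", "Automation solutions"],
    "targetMarket": "B2B",
}
scoring = {"score": 8, "priority": "high", "nextSteps": "Schedule demo"}
print(to_dataset_row(analysis, scoring))
```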

Step 4: Test Your Flow

Run with Sample Data

  1. Select 2-3 leads: don't run all at once initially

  2. Click "Run Flow": watch the execution

  3. Monitor progress: check each node's output

  4. Verify results: check the updated dataset

Expected Output

Your dataset should now have enriched data:

{
  "company": "Acme Corp",
  "website": "https://acme.com",
  "source": "webform",
  "industry": "Manufacturing",
  "company_size": "200-500 employees",
  "products": ["Industrial equipment", "Automation solutions"],
  "target_market": "B2B",
  "lead_score": 8,
  "priority": "high",
  "next_steps": "Schedule demo, emphasize automation ROI",
  "enriched_at": "2024-01-15T10:30:00Z"
}

If results look good, process the remaining leads. If not, refine your prompts and try again.

Step 5: Handle Array Processing

Add Batch Capabilities

To process multiple leads efficiently:

Dataset Source
  ↓
Array Splitter (parallel: 5)
  ↓
[Individual processing per lead]
  ↓
Array Flatten
  ↓
Dataset Sink (batch update)

Parallel Configuration

  • Concurrency: 5 (safe for most APIs)
  • Error handling: Skip and continue
  • Timeout: 60s per lead
  • Retry: 2 attempts on failure
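
The settings above are configured on the Array Splitter, so no code is needed inside the flow. For intuition, here is a sketch of the same behavior in plain Python. Assumptions: "Retry: 2" is read as one initial attempt plus two retries, enrich_one is a placeholder for your per-lead processing, and the 60s timeout is expected to be enforced inside enrich_one (for example as an HTTP timeout), since a running thread cannot be cancelled from outside.

```python
from concurrent.futures import ThreadPoolExecutor

def with_retries(func, lead, attempts):
    """Call func(lead) up to `attempts` times and re-raise the last error."""
    last_error = None
    for _ in range(attempts):
        try:
            return func(lead)
        except Exception as exc:
            last_error = exc
    raise last_error

def enrich_all(leads, enrich_one, concurrency=5, retries=2):
    """Return (results, failures); a failed lead is skipped, not fatal."""
    results, failures = [], []
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        futures = [
            (lead, pool.submit(with_retries, enrich_one, lead, retries + 1))
            for lead in leads
        ]
        for lead, future in futures:  # iterate in input order
            try:
                results.append(future.result())
            except Exception as exc:
                failures.append((lead, str(exc)))  # skip and continue
    return results, failures
```

Collecting failures separately, rather than raising, mirrors the "Skip and continue" setting: one unreachable website does not stop the batch.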

Step 6: Add Quality Control

Validation Node

Add a validation step before Dataset Sink:

Lead Scorer output
  ↓
Validation: Check required fields
  - lead_score exists and is between 1 and 10
  - priority is high/medium/low
  - industry is not "Unknown"
  ↓
If valid: → Dataset Sink
If invalid: → Error log + skip
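
The three checks above can be expressed as a small function. This is an illustrative sketch rather than the validation node's actual logic; validate_enrichment is a hypothetical name, and treating the score as an integer is an assumption.

```python
def validate_enrichment(row):
    """Return a list of problems; an empty list means the row is valid."""
    problems = []
    score = row.get("lead_score")
    # bool is a subclass of int in Python, so exclude it explicitly
    if isinstance(score, bool) or not isinstance(score, int) or not 1 <= score <= 10:
        problems.append("lead_score must be an integer from 1 to 10")
    if row.get("priority") not in {"high", "medium", "low"}:
        problems.append("priority must be high, medium, or low")
    industry = row.get("industry")
    if not isinstance(industry, str) or industry.strip().lower() in {"", "unknown"}:
        problems.append("industry is missing or Unknown")
    return problems

row = {"lead_score": 8, "priority": "high", "industry": "Manufacturing"}
print(validate_enrichment(row) or "valid")
```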

Fallback Strategy

If website scraping fails:
  → Try alternative data source
  → Or set default values
  → Mark as "needs_manual_review"
  → Still save to dataset
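
One way to picture the fallback chain: try each content source in turn, and if none works, return the lead with default values and a review flag so it is still saved. The sketch below assumes scrape and alternative are callables you supply; the field names website_content and status are hypothetical.

```python
def enrich_with_fallback(lead, scrape, alternative=None):
    """Try each content source in order; flag the lead if all of them fail."""
    for fetch in (scrape, alternative):
        if fetch is None:
            continue
        try:
            content = fetch(lead["website"])
        except Exception:
            continue  # this source failed; try the next one
        if content:
            return {**lead, "website_content": content, "status": "ok"}
    # Nothing worked: keep the lead, use defaults, and flag it for a human
    return {**lead, "website_content": "", "status": "needs_manual_review"}
```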

Step 7: Deploy as API

Add API Nodes

API Input (single lead)
  fields: company, website, source
  ↓
[Processing pipeline]
  ↓
API Output
  return: enriched lead data

API Integration

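# CRM integration
import os

import requests
from flask import Flask, jsonify, request

app = Flask(__name__)
api_key = os.environ["EVALIGO_API_KEY"]  # assumed variable name; load your key however you prefer

def enrich_new_lead(company, website, source):
    """Send one lead to the deployed flow and return the enriched record."""
    response = requests.post(
        "https://api.evaligo.com/flows/lead-enrichment/execute",
        headers={"Authorization": f"Bearer {api_key}"},
        json={
            "company": company,
            "website": website,
            "source": source
        },
        timeout=60,  # avoid hanging forever on a slow enrichment
    )
    response.raise_for_status()  # fail on HTTP errors instead of parsing an error body

    enriched_data = response.json()

    # Update CRM with enriched data (crm stands in for your own CRM client)
    crm.update_lead(enriched_data)

    # Route high-priority leads (notify_sales_team stands in for your own helper)
    if enriched_data.get("priority") == "high":
        notify_sales_team(enriched_data)

    # Return the record so callers such as the webhook below can use it
    return enriched_data

# Webhook from web form
@app.route('/webhook/new-lead', methods=['POST'])
def new_lead_webhook():
    lead = request.get_json()
    enriched = enrich_new_lead(
        lead['company'],
        lead['website'],
        'webform'
    )
    return jsonify(enriched)

Info
API deployment enables real-time lead enrichment as soon as leads enter your system.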

Step 8: Monitor and Optimize

Key Metrics to Track

  • Enrichment success rate
  • Average processing time
  • Cost per lead
  • Lead score distribution
  • Conversion rate by score
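
If you export run records, most of these metrics are a few lines of arithmetic. The sketch below assumes each record is a dict with success, seconds, cost_usd, and lead_score keys; those names are hypothetical and should be adapted to whatever your run logs contain. Conversion rate by score is omitted because it needs outcome data from your CRM.

```python
from collections import Counter
from statistics import mean

def summarize(runs):
    """Compute basic enrichment metrics from a list of run records."""
    if not runs:
        raise ValueError("no runs to summarize")
    succeeded = [r for r in runs if r["success"]]
    return {
        "success_rate": len(succeeded) / len(runs),
        "avg_seconds": mean(r["seconds"] for r in runs),
        "cost_per_lead": sum(r["cost_usd"] for r in runs) / len(runs),
        "score_distribution": dict(Counter(r["lead_score"] for r in succeeded)),
    }

runs = [
    {"success": True, "seconds": 10, "cost_usd": 0.25, "lead_score": 8},
    {"success": True, "seconds": 20, "cost_usd": 0.25, "lead_score": 5},
    {"success": False, "seconds": 30, "cost_usd": 0.25, "lead_score": None},
    {"success": True, "seconds": 20, "cost_usd": 0.25, "lead_score": 8},
]
print(summarize(runs))
```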

Optimization Strategies

Week 1 Baseline:
  Success rate: 85%
  Avg time: 15s per lead
  Cost: $0.25 per lead
  
Optimizations:
  1. Cache website data (same domain)
     → Cost: $0.18 per lead (-28%)
  
  2. Parallel processing (5 concurrent)
     → Time: 3s per lead (-80%)
  
  3. Refined scoring prompt
     → Better prioritization
  
Week 4 Results:
  Success rate: 92%
  Avg time: 3s per lead
  Cost: $0.18 per lead
  Conversion rate (high priority): +35%
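
The percentages in this example follow from a simple relative-change calculation, which you can reuse when comparing your own baseline and results. The helper name percent_change is just for illustration.

```python
def percent_change(before, after):
    """Relative change from before to after, in percent."""
    return (after - before) / before * 100

print(round(percent_change(0.25, 0.18)))  # cost per lead: -28
print(round(percent_change(15, 3)))       # time per lead: -80
```

Note that the success rate moving from 85% to 92% is a gain of 7 percentage points, which is a different measure from a relative change.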

Advanced Features

Re-enrichment Schedule

Run flow weekly:
  Filter: enriched_at is more than 30 days old
  → Update company data
  → Recalculate scores
  → Detect changes (funding, growth, etc.)
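
The staleness filter amounts to a date comparison. As a sketch, assuming enriched_at is stored in the UTC format shown in the expected output (for example 2024-01-15T10:30:00Z) and that rows with no timestamp should also be picked up:

```python
from datetime import datetime, timedelta, timezone

def needs_reenrichment(row, now, max_age_days=30):
    """True if the row was never enriched or its data is older than max_age_days."""
    enriched_at = row.get("enriched_at")
    if not enriched_at:
        return True  # assumption: never-enriched rows are treated as stale
    timestamp = datetime.strptime(enriched_at, "%Y-%m-%dT%H:%M:%SZ")
    timestamp = timestamp.replace(tzinfo=timezone.utc)
    return now - timestamp > timedelta(days=max_age_days)

now = datetime(2024, 3, 1, tzinfo=timezone.utc)
rows = [
    {"company": "Acme Corp", "enriched_at": "2024-01-15T10:30:00Z"},
    {"company": "TechStart Inc", "enriched_at": "2024-02-20T00:00:00Z"},
]
print([r["company"] for r in rows if needs_reenrichment(r, now)])
```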

Multi-Source Enrichment

Source 1: Website analysis
Source 2: LinkedIn company data
Source 3: News mentions
Source 4: Tech stack detection
  → Combine all sources
  → Generate comprehensive profile
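
How the sources are combined is a design choice. One simple policy, sketched below, is to list sources in priority order, let the first non-empty value win for scalar fields, and union list fields such as the technology stack. merge_profiles is a hypothetical helper and this policy is an assumption, not a built-in behavior.

```python
def merge_profiles(*profiles):
    """Merge dicts in priority order; earlier sources win, lists are unioned."""
    merged = {}
    for profile in profiles:
        for key, value in profile.items():
            if value is None or value == "" or value == []:
                continue  # empty values never override anything
            if isinstance(value, list):
                existing = merged.setdefault(key, [])
                if isinstance(existing, list):
                    for item in value:
                        if item not in existing:
                            existing.append(item)
            elif key not in merged:
                merged[key] = value
    return merged

website = {"industry": "Manufacturing", "technologyStack": ["React"]}
tech_scan = {"industry": "", "technologyStack": ["React", "AWS"]}
linkedin = {"industry": "Industrial", "companySize": "200-500 employees"}
print(merge_profiles(website, tech_scan, linkedin))
```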

Smart Routing

High-score leads (8-10):
  → Immediate sales notification
  → Priority in CRM
  
Mid-score leads (5-7):
  → Marketing nurture campaign
  → Weekly follow-up
  
Low-score leads (1-4):
  → Generic nurture sequence
  → Monthly check-in
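
In code, the routing rules above reduce to two thresholds. The returned labels are placeholders for whatever actions your CRM or marketing tool exposes.

```python
def route_lead(score):
    """Map a 1-10 lead score to a follow-up track."""
    if not 1 <= score <= 10:
        raise ValueError(f"score out of range: {score}")
    if score >= 8:
        return "sales_notification"   # high-score leads (8-10)
    if score >= 5:
        return "marketing_nurture"    # mid-score leads (5-7)
    return "generic_nurture"          # low-score leads (1-4)

for score in (9, 6, 2):
    print(score, route_lead(score))
```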

Troubleshooting

Common Issues

Low Enrichment Success Rate

Problem: Many leads fail to enrich

Solutions:

  • Verify websites are accessible
  • Increase timeout for slow sites
  • Add retry logic
  • Implement fallback data sources

Inconsistent Scores

Problem: Lead scores seem random

Solutions:

  • Use structured output schema
  • Add more specific criteria to prompt
  • Lower temperature (0.3-0.5)
  • Test prompt with edge cases

Slow Processing

Problem: Takes too long

Solutions:

  • Enable parallel processing
  • Cache website data
  • Use async execution
  • Optimize prompt length
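
As an illustration of the caching idea, the sketch below keys cached content by domain so that two leads on the same site trigger only one fetch. DomainCache is a hypothetical in-memory helper (it requires Python 3.9+ for str.removeprefix); a production setup would likely add expiry and persistent storage.

```python
from urllib.parse import urlparse

class DomainCache:
    """Cache fetched website content per domain, ignoring a leading 'www.'."""

    def __init__(self, fetch):
        self._fetch = fetch   # callable that takes a URL and returns content
        self._store = {}

    def get(self, url):
        domain = urlparse(url).netloc.lower().removeprefix("www.")
        if domain not in self._store:
            self._store[domain] = self._fetch(url)  # only fetched on a miss
        return self._store[domain]

fetched = []

def fake_fetch(url):
    fetched.append(url)
    return f"content of {url}"

cache = DomainCache(fake_fetch)
cache.get("https://www.acme.com/")
cache.get("https://acme.com/about")
print(len(fetched))  # one fetch for both URLs
```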

Next Steps

Enhance Your Flow

  • Add competitor analysis
  • Integrate with LinkedIn API
  • Extract contact information
  • Detect recent funding/news
  • Build custom ICP matching

Scale to Production

  • Process full lead database
  • Set up automated scheduling
  • Integrate with CRM webhooks
  • Build enrichment dashboard
  • Track ROI and conversion rates

Related Documentation

  • Using Datasets: Work with lead data
  • Batch Processing: Process many leads efficiently
  • Deploying as APIs: Real-time enrichment