The Website Mapper node crawls a website and discovers all accessible pages, creating a sitemap that can be used to systematically extract content.

How It Works

Website Mapper takes a starting URL and:

  • Fetches the page content
  • Extracts all internal links
  • Follows links to discover additional pages
  • Returns a structured list of all discovered URLs
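
A rough picture of one crawl step can be sketched in Python with the requests library (LinkExtractor and discover_links are illustrative names; this is a sketch of the idea, not the node's actual implementation):

import requests
from urllib.parse import urljoin, urlparse
from html.parser import HTMLParser

class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> tag on a page."""
    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)

def discover_links(page_url):
    """Fetch one page and return the absolute URLs of its internal links."""
    response = requests.get(page_url, timeout=10)
    parser = LinkExtractor()
    parser.feed(response.text)
    base_domain = urlparse(page_url).netloc
    internal = set()
    for href in parser.hrefs:
        absolute = urljoin(page_url, href)        # resolve relative links
        if urlparse(absolute).netloc == base_domain:
            internal.add(absolute.split("#")[0])  # drop fragments
    return sorted(internal)

The same-domain check near the end is what keeps the crawl on the starting site (see Domain Restriction below).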

Configuration

Input Variable

The Website Mapper requires a single input:

urlVar: The starting URL to begin mapping from
Example: https://example.com

Output Structure

The node outputs an array of discovered URLs along with crawl metadata (total page count and crawl depth):

{
  "out": {
    "urls": [
      "https://example.com",
      "https://example.com/about",
      "https://example.com/products",
      "https://example.com/contact"
    ],
    "totalPages": 4,
    "crawlDepth": 2
  },
  "_input": {
    "urlVar": "https://example.com"
  }
}

Common Usage Patterns

Map and Scrape Workflow

Dataset Source (websites)
  → Website Mapper (discover all pages)
  → Array Splitter (split urls array)
  → Page Scraper (extract content from each page)
  → Prompt (analyze content)
  → Dataset Sink (save results)

Selective Crawling

After mapping, you can filter URLs before scraping:

Website Mapper
  → (manual filtering or conditional logic)
  → Only scrape pages matching certain patterns
  → Example: only /blog/ or /docs/ pages
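
The filtering itself happens outside the Website Mapper; if your workflow has a code or conditional step, the idea is a simple path-prefix filter, sketched here in Python (keep_relevant and the prefix list are illustrative):

from urllib.parse import urlparse

def keep_relevant(urls, allowed_prefixes=("/blog/", "/docs/")):
    """Keep only URLs whose path starts with one of the allowed prefixes."""
    return [u for u in urls if urlparse(u).path.startswith(allowed_prefixes)]

urls = [
    "https://example.com/blog/launch",
    "https://example.com/docs/setup",
    "https://example.com/careers",
]
print(keep_relevant(urls))   # keeps only the /blog/ and /docs/ pages
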
Tip
Website Mapper respects robots.txt by default. It will not crawl pages disallowed for bots.

Rate Limiting

The Website Mapper automatically implements polite crawling:

  • Respects robots.txt directives
  • Adds delays between requests (default: 1 second)
  • Limits concurrent requests
  • Sets appropriate User-Agent headers
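
The exact crawler settings are internal to the node, but the behaviour corresponds roughly to this Python sketch built on urllib.robotparser and a fixed delay (the User-Agent string and the 1-second delay are illustrative values, not the node's actual configuration):

import time
import requests
from urllib import robotparser

USER_AGENT = "MyWorkflowMapper/1.0"          # illustrative User-Agent

robots = robotparser.RobotFileParser("https://example.com/robots.txt")
robots.read()

def polite_fetch(url, delay_seconds=1.0):
    """Fetch a URL only if robots.txt allows it, then pause before returning."""
    if not robots.can_fetch(USER_AGENT, url):
        return None                          # disallowed for bots: skip it
    response = requests.get(url, headers={"User-Agent": USER_AGENT}, timeout=10)
    time.sleep(delay_seconds)                # roughly one request per second
    return response
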
Warning
Be respectful when crawling websites. Excessive requests can strain servers. Consider using the depth limit to avoid crawling entire large sites.

Configuration Options

Max Depth

Control how many levels deep to crawl:

  • Depth 1: Only the starting page
  • Depth 2: Starting page + directly linked pages
  • Depth 3+: Continue following links from pages found at the previous depth
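
Continuing the sketch from How It Works, a depth limit amounts to a bounded breadth-first loop. This is an illustration of the idea rather than the node's implementation; map_site reuses the discover_links helper from that earlier sketch:

def map_site(start_url, max_depth=2):
    """Breadth-first crawl: depth 1 is the start page, depth 2 adds its direct links."""
    discovered = {start_url}
    frontier = [start_url]
    for _ in range(max_depth - 1):             # the start page itself counts as depth 1
        next_frontier = []
        for url in frontier:
            for link in discover_links(url):   # helper from the How It Works sketch
                if link not in discovered:
                    discovered.add(link)
                    next_frontier.append(link)
        frontier = next_frontier
    return sorted(discovered)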

Domain Restriction

By default, Website Mapper follows only links within the same domain as the starting URL; it does not follow external links.

Best Practices

Start Small

Test with a small site or limited depth before mapping large websites:

  • Use depth limit of 2-3 for testing
  • Verify the results before processing all pages
  • Check the totalPages count in the output before processing everything, to avoid unexpectedly long execution times

Combine with Array Processing

Use Array Splitter to process discovered URLs individually:

Website Mapper output: out.urls (array)
  → Array Splitter: split on 'urls'
  → Each URL processed individually by downstream nodes
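
Conceptually, the split turns the single urls array into one item per downstream execution, roughly like this Python loop (purely illustrative; the Array Splitter itself is configured in the workflow, not written as code):

mapper_output = {
    "urls": [
        "https://example.com",
        "https://example.com/about",
    ],
    "totalPages": 2,
    "crawlDepth": 1,
}

for url in mapper_output["urls"]:
    # Each iteration stands in for one downstream run of Page Scraper.
    print("would scrape:", url)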

Filter Before Scraping

Not every discovered page will be relevant. Consider adding filtering logic so that only specific page types are processed (see Selective Crawling above).

Limitations

  • JavaScript-heavy sites: May not discover pages whose links are rendered dynamically on the client
  • Authentication: Cannot access pages behind login walls
  • Large sites: May take significant time to map thousands of pages
  • Rate limits: Some sites may block or throttle crawlers

Error Handling

Website Mapper handles common errors gracefully:

  • Invalid URLs are skipped with a warning
  • On connection timeouts, the crawl continues with the pages already discovered
  • HTTP errors (404, 500) are logged but don't stop the crawl
  • Inaccessible pages are reported in the output
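
You can mirror the same pattern in your own pre- or post-processing steps; the Python sketch below does so with the requests library (safe_fetch is an illustrative helper, not the node's code):

import logging
import requests

logging.basicConfig(level=logging.WARNING)

def safe_fetch(url):
    """Fetch one page; log problems and return None so the crawl keeps going."""
    try:
        response = requests.get(url, timeout=10)
    except requests.exceptions.Timeout:
        logging.warning("timeout fetching %s; continuing with pages found so far", url)
        return None
    except requests.exceptions.RequestException as exc:
        logging.warning("skipping %s (%s)", url, exc)   # invalid URLs, DNS failures, ...
        return None
    if response.status_code >= 400:
        logging.warning("HTTP %s for %s; logged, crawl continues", response.status_code, url)
        return None
    return response.text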

Related Documentation

  • Page Scraper: Extract content from discovered pages
  • Array Splitter: Process URLs individually
  • Variable Mapping: Map discovered URLs to other nodes