What this does
Receives a URL via webhook, uses Firecrawl to scrape the page into clean markdown, and stores it as vector embeddings in Pinecone. A visual, self-hosted ingestion pipeline for RAG knowledge bases. Adding a new source is as simple as sending a URL.
The second part of the workflow exposes a chat interface where an AI Agent queries the stored knowledge base to answer questions, with Cohere reranking for better retrieval quality.
How it works
Part 1: Ingestion Pipeline
Webhook receives a POST request containing a url field
Firecrawl /scrape fetches the page and converts it to clean markdown
OpenAI embeds the markdown and the vectors are upserted into Pinecone

Part 2: RAG Chat Agent
🔥 Firecrawl
🌲 Pinecone
🧠 OpenAI Embeddings
🤖 OpenRouter (Claude Sonnet)
🎯 Cohere Reranker
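The retrieval path in Part 2 can be sketched in Python. Everything below is illustrative: embed, pinecone_query, and cohere_rerank are hypothetical stand-ins for the n8n nodes, not real SDK calls — the point is the ordering (broad vector recall, then a reranked, narrower context for the agent).

```python
# Illustrative sketch of the Part 2 retrieval path.
# embed(), pinecone_query(), and cohere_rerank() are stand-ins for the
# n8n nodes so the sketch runs offline; they are NOT real SDK calls.

def embed(text: str) -> list[float]:
    # Stand-in for the OpenAI embedding node (text-embedding-3-small
    # really returns 1536 dims; this toy vector just makes the code run).
    return [float(ord(c) % 7) for c in text[:8]]

def pinecone_query(vector: list[float], top_k: int) -> list[dict]:
    # Stand-in for the Pinecone vector-store node: return top_k candidates.
    corpus = [
        {"id": "doc-1", "text": "Firecrawl converts pages to markdown"},
        {"id": "doc-2", "text": "Pinecone stores 1536-dim embeddings"},
        {"id": "doc-3", "text": "Unrelated note about billing"},
    ]
    return corpus[:top_k]

def cohere_rerank(query: str, candidates: list[dict], top_n: int) -> list[dict]:
    # Stand-in for the Cohere reranker node: re-order candidates by a
    # crude token-overlap score, then keep only the best top_n.
    def score(c: dict) -> int:
        return len(set(query.lower().split()) & set(c["text"].lower().split()))
    return sorted(candidates, key=score, reverse=True)[:top_n]

def retrieve(query: str) -> list[dict]:
    # Broad vector recall first, then a reranked, narrower context window
    # that is handed to the chat agent (OpenRouter / Claude Sonnet).
    candidates = pinecone_query(embed(query), top_k=3)
    return cohere_rerank(query, candidates, top_n=1)

print(retrieve("how are embeddings stored in pinecone"))
```

The two-stage shape is the reason the reranker is in the workflow at all: vector search is fast but approximate, so it over-fetches, and the reranker picks the few chunks that actually answer the question.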
Webhook usage
Send a POST request to the webhook URL:
curl -X POST https://your-n8n-instance/webhook/your-id \
-H "Content-Type: application/json" \
-d '{"url": "firecrawl.dev"}'
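The same request from Python's standard library, if you are triggering ingestion from a script. The webhook URL is the same placeholder as in the curl example — substitute your own n8n instance and webhook id:

```python
import json
import urllib.request

# Build the same POST the curl example sends. The webhook URL default is
# a placeholder, not a real endpoint.
def build_ingest_request(url_to_scrape: str,
                         webhook: str = "https://your-n8n-instance/webhook/your-id"):
    body = json.dumps({"url": url_to_scrape}).encode("utf-8")
    return urllib.request.Request(
        webhook,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_ingest_request("firecrawl.dev")
# urllib.request.urlopen(req)  # uncomment to actually fire the webhook
```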
Pinecone setup
Your Pinecone index must be configured with 1536 dimensions to match the OpenAI text-embedding-3-small model output. See the sticky note inside the workflow for the exact index settings.
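A dimension mismatch between the index and the embedding model is the most common setup failure, and it only surfaces at upsert time. A small sanity check like the one below catches it up front; the model-to-dimension table lists the default OpenAI embedding sizes, and index_dimension is whatever you configured in Pinecone:

```python
# Default output sizes for OpenAI embedding models
# (text-embedding-3-large can be truncated via its `dimensions`
# parameter; the values here are the defaults).
EMBEDDING_DIMS = {
    "text-embedding-3-small": 1536,
    "text-embedding-3-large": 3072,
    "text-embedding-ada-002": 1536,
}

def check_index_compat(model: str, index_dimension: int) -> None:
    # Fail early with a clear message instead of erroring mid-workflow.
    expected = EMBEDDING_DIMS[model]
    if index_dimension != expected:
        raise ValueError(
            f"{model} outputs {expected}-dim vectors, "
            f"but the Pinecone index is {index_dimension}-dim"
        )

check_index_compat("text-embedding-3-small", 1536)  # matches; no error
```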
Requirements