Business Contact Capture Flow
Search for businesses by niche, filter official sites, extract key data via scraping, and automatically add verified contacts to Google Sheets.
graph TD
%%{init: {'theme': 'mc','layout': 'elk'}}%%
SearXng-ntpo0[Web Search SearXng]
style SearXng-ntpo0 stroke:#a170ff
DeepseekModel-yd7iq[Deepseek]
style DeepseekModel-yd7iq stroke:#a170ff
CreateData-dfb3f[Create Data]
style CreateData-dfb3f stroke:#a170ff
Switch-56w06[Switch]
style Switch-56w06 stroke:#a170ff
WebScraper-p0rr5[Web Scraper]
style WebScraper-p0rr5 stroke:#a170ff
DeepseekModel-4gnpf[Deepseek2]
style DeepseekModel-4gnpf stroke:#a170ff
CreateData-p25ng[Create Data2]
style CreateData-p25ng stroke:#a170ff
TextInput-3uo52[Number of sites]
style TextInput-3uo52 stroke:#a170ff
TextInput-wcn04[Query]
style TextInput-wcn04 stroke:#a170ff
CreateData-taieq[Create Data3]
style CreateData-taieq stroke:#a170ff
Switch-44sfm[Switch2]
style Switch-44sfm stroke:#a170ff
AdvancedAgent-plvkg[Agent]
style AdvancedAgent-plvkg stroke:#a170ff
GSheetCellComponent-usi3o[Sheet Cells ]
style GSheetCellComponent-usi3o stroke:#a170ff
DeepseekModel-c77dx[Deepseek3]
style DeepseekModel-c77dx stroke:#a170ff
SearXng-ntpo0 -.- DeepseekModel-yd7iq
linkStyle 0 stroke:#a170ff
DeepseekModel-yd7iq -.- CreateData-dfb3f
linkStyle 1 stroke:#a170ff
CreateData-dfb3f -.- Switch-56w06
linkStyle 2 stroke:#a170ff
Switch-56w06 -.- WebScraper-p0rr5
linkStyle 3 stroke:#a170ff
WebScraper-p0rr5 -.- DeepseekModel-4gnpf
linkStyle 4 stroke:#a170ff
CreateData-p25ng -.- SearXng-ntpo0
linkStyle 5 stroke:#a170ff
TextInput-3uo52 -.- CreateData-p25ng
linkStyle 6 stroke:#a170ff
TextInput-wcn04 -.- CreateData-p25ng
linkStyle 7 stroke:#a170ff
DeepseekModel-4gnpf -.- CreateData-taieq
linkStyle 8 stroke:#a170ff
CreateData-taieq -.- Switch-44sfm
linkStyle 9 stroke:#a170ff
Switch-44sfm -.- AdvancedAgent-plvkg
linkStyle 10 stroke:#a170ff
GSheetCellComponent-usi3o -.- AdvancedAgent-plvkg
linkStyle 11 stroke:#a170ff
DeepseekModel-c77dx -.- AdvancedAgent-plvkg
linkStyle 12 stroke:#a170ff
Business Contact Capture Flow Documentation
🧩 Overview
The workflow automates the discovery and collection of contact information for businesses in a specified niche. It performs a web search, filters official sites using an LLM classifier, scrapes the chosen pages, extracts structured contact details, and writes verified records to a Google Sheets spreadsheet. This end‑to‑end process streamlines lead generation and reduces manual data gathering.
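The stages described above can be sketched as a plain Python loop. This is an illustration under stated assumptions, not the platform's API: `run_pipeline` and its callable parameters (`search`, `classify`, `scrape`, `extract`, `append_row`) are hypothetical stand-ins for the workflow's components, and the real flow hands the final write to an agent rather than calling a function directly.

```python
from typing import Callable, Iterable


def run_pipeline(
    query: str,
    limit: int,
    search: Callable[[str, int], Iterable[dict]],   # hypothetical: yields {"title", "url"}
    classify: Callable[[str], str],                 # hypothetical: returns "Sí" or "No"
    scrape: Callable[[str], str],                   # hypothetical: URL -> Markdown
    extract: Callable[[str], dict],                 # hypothetical: Markdown -> contact fields
    append_row: Callable[[dict], None],             # hypothetical: writes one spreadsheet row
) -> int:
    """Run the stages in order and return the number of rows written."""
    written = 0
    for result in search(query, limit):
        # Only sites labeled exactly "Sí" continue past the first switch.
        if classify(result["title"]) != "Sí":
            continue
        markdown = scrape(result["url"])
        record = extract(markdown)
        # Keep the source URL even if the extractor did not return one.
        record.setdefault("url", result["url"])
        append_row(record)
        written += 1
    return written
```

Passing the stages in as callables keeps the sketch independent of any particular search engine, model, or spreadsheet client.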
⚙️ Main Features
- Accepts a niche query and desired number of results from the user.
- Executes a web search to retrieve candidate business sites.
- Classifies each site title as official or not with an LLM.
- Scrapes the content of sites classified as official and converts it to Markdown.
- Extracts business name, contact e‑mail, phone, description, and URL using a structured‑output LLM.
- Routes only valid records through a conditional switch.
- Automatically appends the extracted information to a Google Sheets spreadsheet.
- Uses an advanced agent to orchestrate tool usage and manage memory and iteration limits.
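As a rough illustration of the structured extraction feature listed above, the record can be modeled as a small dataclass. The field names `nombre`, `correo`, `telefono`, `descripcion`, and `url` come from the workflow; the `BusinessContact` class, the `parse_contact` helper, and the assumption that the model replies with a JSON object are hypothetical.

```python
import json
from dataclasses import dataclass, fields


@dataclass
class BusinessContact:
    nombre: str
    correo: str
    telefono: str
    descripcion: str
    url: str


def parse_contact(raw_json: str) -> BusinessContact:
    """Build a contact from a JSON object (assumed LLM output format)."""
    data = json.loads(raw_json)
    # Missing or null fields become empty strings; unknown keys are ignored.
    values = {f.name: str(data.get(f.name) or "") for f in fields(BusinessContact)}
    return BusinessContact(**values)
```

Filling gaps with empty strings is one possible design choice; a stricter variant could reject records that lack both `correo` and `telefono`.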
🔄 Workflow Steps
| Component Name | Role in the Workflow | Key Inputs | Key Outputs |
|---|---|---|---|
| Text Input (Number of sites) | Collects the desired number of search results. | None (default value “10”) | Text value |
| Text Input (Query) | Collects the niche query (e.g., “restaurant in Madrid Spain”). | None | Text value |
| Create Data | Builds a data record combining the number of sites and the query. | Number of sites, Query | Data object with fields for search |
| Web Search (SearXng) | Performs a web search based on the query. | Data record with query | List of search results (titles, URLs) |
| Deepseek LLM (Site Filter) | Classifies each title as official (“Sí”) or not (“No”). | List of search results | Data with classification label and URL |
| Create Data | Constructs records containing the label, site name, and URL. | Classification results | Data items with fields etiqueta, nombre_del_citio, url |
| Switch (Filter Official Sites) | Routes only items labeled “Sí” to the next step. | Data items | Filtered data for official sites |
| Web Scraper | Fetches each official site and converts its HTML to Markdown. | URL | Scraped Markdown text |
| Deepseek LLM (Data Extractor) | Parses Markdown to extract structured business details. | Scraped Markdown | Data with nombre, correo, telefono, descripcion, url |
| Create Data (Prepare for Google Sheets) | Builds a row ready for the spreadsheet. | Extracted business fields | Data with etiqueta, nombre_del_citio, url |
| Switch (Exclude “No” Entries) | Drops any record labeled “No” and passes the rest on. | Data rows | Valid rows for spreadsheet |
| Advanced Agent | Orchestrates tool usage and final insertion into Google Sheets. | Valid rows | Agent response |
| Google Sheets Cells | Adds a new row to the specified spreadsheet. | Row data | Confirmation of insertion |
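The Create Data and Switch rows in the table can be pictured with a minimal sketch. The field names `etiqueta`, `nombre_del_citio`, and `url` are taken from the workflow; the helpers `build_record` and `route_official` are hypothetical and only mimic what the components appear to do.

```python
def build_record(label: str, site_name: str, url: str) -> dict:
    """Mimic the Create Data step: one record per classified search result."""
    return {"etiqueta": label, "nombre_del_citio": site_name, "url": url}


def route_official(records: list) -> list:
    """Mimic the first Switch: keep records whose label is exactly 'Sí'."""
    # The comparison is exact and case-sensitive, so 'sí' or 'Si' is dropped.
    return [r for r in records if r["etiqueta"] == "Sí"]
```

For example, of the labels `"Sí"`, `"No"`, and `"sí"`, only the first would be routed onward to the scraper.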
🧠 Notes
- The LLM components used for classification and extraction depend on their exact system messages; if a message is altered, the component may output “No”, which halts the pipeline.
- The switch components rely on the exact words “Sí” and “No”; any change in case or punctuation will prevent routing.
- The web scraper enforces a 10‑second timeout per request; sites that take longer fail silently.
- Google Sheets integration requires valid credentials and write access to the target spreadsheet; failures will surface as an exception in the sheet component.
- All batch components run up to five parallel executions by default, which can be tuned via the parallel executions setting.
- If no sites pass the official‑site filter, the workflow stops before the agent stage.
- The advanced agent is capped at 35 iterations and uses a trim memory strategy to keep its context within resource limits.
- The final agent response includes the agent’s state only when the Include State in response flag is set to true; otherwise only the result of the sheet operation is returned.
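The notes on parallel executions and silent scraper failures can be illustrated with a bounded thread pool that records failures instead of discarding them. This is a sketch under stated assumptions: `scrape_batch` and `scrape_one` are hypothetical names, the limit of five mirrors the default mentioned above, and the 10‑second timeout is assumed to be enforced inside `scrape_one` (for instance by raising `TimeoutError`).

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, Tuple


def scrape_batch(
    urls: Iterable[str],
    scrape_one: Callable[[str], str],  # hypothetical: raises on timeout or error
    max_parallel: int = 5,             # mirrors the default parallel executions
) -> Tuple[dict, dict]:
    """Scrape URLs with bounded parallelism; return (results, failures)."""
    results = {}
    failures = {}

    def attempt(url: str):
        # Capture the exception so one slow site does not abort the batch.
        try:
            return url, scrape_one(url), None
        except Exception as exc:
            return url, None, repr(exc)

    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        for url, text, error in pool.map(attempt, urls):
            if error is None:
                results[url] = text
            else:
                failures[url] = error
    return results, failures
```

Returning the failures alongside the results makes it possible to log or retry sites that would otherwise drop out unnoticed.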