How do you keep scrapers running without breaking Indeed/LinkedIn TOS?

We use compliant scraping via public endpoints and official APIs where available. Rotating proxies reduce request density per IP. We avoid login-gated data and review each scraper against current platform terms before deployment. The client owns the risk decision.

Why Claude for sentiment and not a $0.001 classification model?

The founder needed "reply now / this week / never," not "positive/negative." A cheap classifier gives you a label. Claude gives you a priority. At Haiku prices the difference is fractions of a cent per reply, and the output is actually usable.

How do you handle scraper breakage at 3am?

n8n error branches write a failure event to a Supabase log and trigger a Slack alert. The founder sees a message before they open the inbox. We also built a manual trigger button in n8n so a re-run takes 30 seconds, not a support ticket.

What does firmographic enrichment actually add to a recruiting outreach?

Google Maps data gives you employee count range, industry vertical, and location before the first message. That lets the outreach reference real context instead of generic copy, and lets the founder skip outreach entirely on companies that are out of scope.

Can this run on top of any CRM, or only Zoho?

The scraping and enrichment layers are CRM-agnostic. We used Zoho because the client was already there. The n8n workflow that writes to Zoho can be repointed to HubSpot, Pipedrive, or Salesforce by swapping the CRM node. Data mapping takes an afternoon.

How a Recruiting Firm Automated Its Entire Job-Board to CRM Pipeline

Before	After
90 minutes/day in the inbox, triaged by gut	15 minutes/day in the inbox, triaged by reply urgency
Manual Indeed + LinkedIn pulls into Zoho	Automated scraping + enrichment + Zoho sync
Leads lost to inbox noise weekly	Every relevant posting captured
Candidates reached without firmographic context	~10 hours/week saved

Why a recruiting firm with a CRM still loses leads daily

Most recruiting agencies have a CRM. Almost none of them trust it.

The data is stale, the records are duplicates, and half the contacts have no context attached. So the founder ignores the CRM and works the inbox instead, triaging by gut, following up on whatever they remember, and losing the rest to noise.

That was the situation here. A recruiting firm with solid revenue was running Zoho CRM alongside four other tools, none of them talking to each other. Job postings came from manual Indeed and LinkedIn pulls. Data entry happened when someone had time. The inbox was the real workflow, and it cost 90 minutes a day to process.

The problem wasn't missing tools. They had Zoho. They had SmartLead. They had a process. The problem was that every step required a human to transfer data between systems that should have been connected from the start.

Why we used Claude for sentiment instead of a cheap classifier

The inbox triage problem sounds like a classification problem. You get a reply, you label it positive or negative, you act accordingly.

But that's not how a recruiter thinks. A recruiter wants to know: is this person ready to talk today, should I follow up next week, or is this a dead end? "Positive" doesn't answer that. "Enthusiastic, referenced a specific role, asked about timeline" answers that.

We evaluated a standard classification model first. It would have cost less than a cent per call. But the output was a label with no nuance, and a label with no nuance doesn't change what the founder does next.

Claude Haiku costs fractions of a cent per reply at current pricing. At that price point, you get a structured output that includes urgency tier, a one-line reason, and a suggested follow-up action. The founder reads one line per reply instead of the full thread. Triage dropped from 90 minutes to 15.

The decision rule for when to use a language model vs. a classifier: if the useful output is a label, use a classifier. If the useful output is a judgment call, use a language model. This was a judgment call.

See the Anthropic API docs for how to structure these calls in a production workflow.

The pipeline: Indeed + LinkedIn scrapers through n8n into Zoho

The core of the build is an n8n workflow with four stages.

Stage 1: Scraping. Custom scrapers pull job postings from Indeed and LinkedIn on a schedule, operating within platform terms using compliant scraping via public endpoints and official APIs where available. Rotating proxies reduce request frequency per IP and keep the scraper running without triggering rate limits. The scrapers output structured JSON: company name, job title, location, posting date, contact signals where available.
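As a rough illustration of that structured output, here is a minimal sketch of one posting record. The class name, field names, and sample values are assumptions for illustration, not the client's actual schema.

```python
import json
from dataclasses import dataclass, field, asdict


@dataclass
class JobPosting:
    """One scraped posting. Field names are illustrative, not the production schema."""
    company_name: str
    job_title: str
    location: str
    posting_date: str  # ISO 8601 date string, e.g. "2024-05-01"
    contact_signals: list[str] = field(default_factory=list)  # empty when none found


def to_json(posting: JobPosting) -> str:
    """Serialize a posting to the JSON shape handed to the enrichment stage."""
    return json.dumps(asdict(posting), sort_keys=True)


# Hypothetical sample record.
example = JobPosting(
    company_name="Acme Logistics",
    job_title="Warehouse Manager",
    location="Austin, TX",
    posting_date="2024-05-01",
)
```

Keeping contact signals as a list that defaults to empty means downstream stages never have to special-case a missing key.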

Stage 2: Enrichment. Each company name and location runs through the Google Maps Platform Places API. This returns employee count range, industry category, address, and website. That firmographic layer is what turns a raw job posting into a qualified prospect. The founder can now filter outreach to companies in a specific size band or geography without doing any research by hand.

Stage 3: Deduplication and Zoho sync. Before writing a record, n8n checks the Zoho CRM API for an existing contact by email or company name. Duplicates get updated with fresh data instead of creating new rows. New contacts get created with the full enrichment attached. The CRM is no longer a graveyard of stale records; it reflects what the scraper found in the last 24 hours.
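The update-or-create rule can be sketched in a few lines. This is a simplified stand-in, not the production n8n logic: an in-memory list plays the role of Zoho, and the suffix list and helper names are assumptions.

```python
from typing import Optional

# Hypothetical list of legal suffixes to ignore when comparing company names.
COMPANY_SUFFIXES = {"inc", "llc", "ltd", "corp", "co"}


def normalize_company(name: str) -> str:
    """Lowercase, drop punctuation and legal suffixes so 'Acme, Inc.' matches 'ACME'."""
    cleaned = "".join(ch for ch in name.lower() if ch.isalnum() or ch.isspace())
    return " ".join(w for w in cleaned.split() if w not in COMPANY_SUFFIXES)


def find_existing(crm: list[dict], record: dict) -> Optional[dict]:
    """Return the first stored contact that matches by email or by company name."""
    email = (record.get("email") or "").strip().lower()
    company = normalize_company(record.get("company_name") or "")
    for existing in crm:
        if email and email == (existing.get("email") or "").strip().lower():
            return existing
        if company and company == normalize_company(existing.get("company_name") or ""):
            return existing
    return None


def upsert(crm: list[dict], record: dict) -> str:
    """Refresh a matching contact or append a new one. Returns the action taken."""
    existing = find_existing(crm, record)
    if existing is not None:
        # Only overwrite with values the scraper actually found.
        existing.update({k: v for k, v in record.items() if v is not None})
        return "updated"
    crm.append(dict(record))
    return "created"
```

Normalizing the company name before comparing is what keeps "Acme, Inc." and "ACME" from becoming two rows.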

Stage 4: Outbound routing. Qualified new contacts get routed to SmartLead sequences based on job category and company size. The routing logic lives in an n8n switch node, not hardcoded in the sequence tool. Changing the rules means editing one node, not rebuilding a campaign.
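To make the "edit one node" point concrete, here is a sketch of the routing rules as a single ordered table. The categories, size bands, and sequence names are hypothetical, and the real logic lives in an n8n switch node rather than Python.

```python
from typing import Optional

# Ordered rules: the first match wins. All values here are hypothetical examples.
ROUTING_RULES = [
    {"category": "engineering", "min_size": 50, "max_size": None, "sequence": "eng-midmarket"},
    {"category": "engineering", "min_size": 0, "max_size": 49, "sequence": "eng-small"},
    {"category": "sales", "min_size": 0, "max_size": None, "sequence": "sales-general"},
]


def route(job_category: str, company_size: int) -> Optional[str]:
    """Return the outbound sequence for a contact, or None if no rule applies."""
    for rule in ROUTING_RULES:
        if rule["category"] != job_category:
            continue
        if company_size < rule["min_size"]:
            continue
        if rule["max_size"] is not None and company_size > rule["max_size"]:
            continue
        return rule["sequence"]
    return None
```

Because the rules are data, changing a size band or adding a category is a one-line edit that leaves the sequences themselves untouched.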

The n8n consultant pillar covers how we architect multi-step n8n workflows for recruiting and operations use cases. This build follows the same pattern: sourcing and outbound are separate flows connected by a staging table, not a single chain where one failure kills everything.

The inbox triage layer that turned 90 minutes into 15

When a reply comes into SmartLead, a webhook fires to n8n. n8n passes the reply text to Claude Haiku with a system prompt that outputs three fields: urgency tier (now / this week / archive), reason (one sentence), and next action (specific, not generic).
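A sketch of how the three-field output might be validated before it is written anywhere. The prompt text and JSON key names are assumptions (the production prompt is not reproduced here), and the API call itself is left out so the example stays self-contained.

```python
import json

URGENCY_TIERS = ("now", "this week", "archive")

# Hypothetical system prompt, written for this example only.
SYSTEM_PROMPT = (
    "You triage replies to recruiting outreach. Respond with JSON only, using the keys "
    '"urgency" (one of: now, this week, archive), "reason" (one sentence), '
    'and "next_action" (a specific step).'
)


def parse_triage(raw: str) -> dict:
    """Validate the model's JSON reply and return the three triage fields."""
    data = json.loads(raw)
    urgency = str(data.get("urgency", "")).strip().lower()
    if urgency not in URGENCY_TIERS:
        raise ValueError(f"unexpected urgency tier: {urgency!r}")
    reason = str(data.get("reason", "")).strip()
    next_action = str(data.get("next_action", "")).strip()
    if not reason or not next_action:
        raise ValueError("reason and next_action must be non-empty")
    return {"urgency": urgency, "reason": reason, "next_action": next_action}


def sort_by_urgency(replies: list[dict]) -> list[dict]:
    """Order parsed replies so 'now' comes first and 'archive' last."""
    return sorted(replies, key=lambda r: URGENCY_TIERS.index(r["urgency"]))
```

Rejecting anything outside the three tiers keeps a malformed model response from silently landing in the CRM as a valid priority.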

That structured output writes back to the contact record in Zoho and tags the thread in SmartLead. The founder opens the inbox and sees every reply sorted by urgency, with a one-line reason and a suggested next step already written.

The cost per reply at current Haiku pricing is well under a cent. For a firm processing 50 to 150 replies a week, the monthly Claude API spend is a rounding error compared to the time saved.

The triage layer also catches something manual review misses: replies that look negative on the surface but contain a timing signal. "Not looking right now but check back in Q3" is an archive label with a reactivation date, not a discard. The prompt is written to extract that nuance explicitly.
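In the build, the prompt does this extraction. Purely as an illustration of what "archive with a reactivation date" means, here is a deterministic fallback that turns a quarter mention into a date. The regex approach and the choice of the quarter's first day are assumptions, not the production method.

```python
import re
from datetime import date
from typing import Optional


def reactivation_date(reply_text: str, today: date) -> Optional[date]:
    """Return the first day of the next mentioned quarter (e.g. 'Q3'), or None."""
    match = re.search(r"\bQ([1-4])\b", reply_text, flags=re.IGNORECASE)
    if match is None:
        return None
    quarter = int(match.group(1))
    start = date(today.year, 3 * (quarter - 1) + 1, 1)
    # If that quarter has already begun this year, roll over to next year.
    if start <= today:
        start = date(today.year + 1, start.month, 1)
    return start
```

A regex like this only catches explicit quarter mentions; phrasing such as "after the summer" is exactly the kind of signal that still needs the language model.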

The cost math

Monthly run costs for this stack:

Scraper proxy infrastructure: $60 to $80/month depending on request volume
n8n self-hosted on a VPS: approximately $10/month
Google Maps Places API: $0 to $20/month (this volume usually stays within the free tier)
Claude Haiku API for inbox triage: $5 to $15/month at 50 to 150 replies/week
SmartLead: already in the client's existing stack

Total new monthly spend: under $130. Including the client's existing SmartLead seat, the blended total stays under $200.

Time saved: 10 hours per week. At a conservative $75/hour opportunity cost for founder time, that's roughly $3,000/month in reclaimed capacity. The payback math depends on your deal size, but the reclaimed capacity compounds fast.
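The arithmetic behind those totals, as a quick check. The ranges come from the cost list above; the 4-week month is an assumption implied by the $3,000 figure.

```python
# Monthly cost ranges (low, high) in USD, taken from the list above.
COSTS = {
    "proxies": (60, 80),
    "n8n_vps": (10, 10),
    "places_api": (0, 20),
    "claude_haiku": (5, 15),
}

low = sum(lo for lo, _ in COSTS.values())
high = sum(hi for _, hi in COSTS.values())

HOURS_SAVED_PER_WEEK = 10
FOUNDER_RATE_USD = 75
WEEKS_PER_MONTH = 4  # assumption: the $3,000 figure implies a 4-week month

reclaimed = HOURS_SAVED_PER_WEEK * FOUNDER_RATE_USD * WEEKS_PER_MONTH

print(f"New monthly spend: ${low} to ${high}")
print(f"Reclaimed capacity: ${reclaimed}/month")
```

Even at the top of the cost range, new spend is about 4% of the reclaimed capacity.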

The scraper build, enrichment pipeline, CRM deduplication logic, and inbox triage layer together represent meaningful engineering work. This is not a weekend project. The build is priced per scope, fixed during the AI Operations X-Ray. The payback curve is steep.

Who this applies to

This build fits 5-15 person recruiting and staffing agencies where a founder or SDR is still pulling postings by hand. If your CRM data is stale, your inbox is your real workflow, and your outreach goes out without firmographic context, this pipeline closes all three gaps. The AI Operations X-Ray will tell you where your specific stack has the highest-value automation opportunities.

Also worth reading: how a 5-person agency replaced its SDR with a sourcing engine at $0.08 per lead.

What I'd revisit

Two things I'd do differently.

First, build the scraper health monitor before the client goes live, not after the first breakage. When Indeed changes its page structure, the scraper can return zero rows without raising an error, so the workflow fails silently. A row-count check in n8n that fires a Slack alert when no new records are created in 24 hours takes 30 minutes to build and prevents a week of lost data.
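A minimal sketch of that row-count check, assuming each record carries a creation timestamp. The function names and alert wording are hypothetical; the only external fact relied on is that Slack incoming webhooks accept a JSON body with a "text" field.

```python
from datetime import datetime, timedelta


def needs_alert(created_at: list[datetime], now: datetime, window_hours: int = 24) -> bool:
    """True when no record was created inside the trailing window."""
    cutoff = now - timedelta(hours=window_hours)
    return not any(ts >= cutoff for ts in created_at)


def alert_payload(scraper: str, window_hours: int = 24) -> dict:
    """Build the JSON body for a Slack incoming webhook (sending is left out here)."""
    return {"text": f"{scraper}: no new records in the last {window_hours}h. Check the scraper."}
```

Running this on a schedule catches the zero-row case that an error branch never sees, because nothing actually errored.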

Second, I'd check Indeed's legal terms and robots.txt, and LinkedIn's official Jobs API, more carefully before reaching for a custom scraper. Platform terms evolve. An official API endpoint is always lower-risk than a scraper, even if it's less flexible. Where official APIs exist and cover the needed data, start there.

Want the same for your pipeline? Run the AI Operations X-Ray.