Job Data · 8 min read

Indeed Scraper Python: Why requests and BeautifulSoup Always Fail (2026)

Rohith

Share:

You installed requests. You installed BeautifulSoup. You ran the script. You got an empty list — or worse, a CAPTCHA page.

Every "Indeed scraper Python" tutorial on the first page of Google breaks within hours of publication. The problem isn't your code. It's that the tutorials describe a version of Indeed that no longer exists. Here's what's actually happening — and what actually works.

Skip the Python setup — export Indeed jobs to CSV in 2 minutes

Clura runs inside your Chrome browser, reads the fully-rendered Indeed page, and exports job title, company, salary, location, and URL to CSV in one click. No code, no proxies, no blocked requests.

Add to Chrome — Free →

Why Does Python requests Return Empty Results on Indeed?

Python's requests library returns empty job results on Indeed because Indeed loads its job cards via JavaScript after the initial page load. The HTML that requests fetches contains an empty container div — the job data is injected by JavaScript milliseconds later, which requests never waits for.

When you send a GET request to an Indeed search URL, you get back the raw HTML the server delivers — before any JavaScript runs. That HTML looks like this:

What requests fetches What you see in the browser
2,345 fully rendered job cards with titles, companies, salaries
Empty shell — no job data Complete page with all fields visible

Indeed's job cards are injected by JavaScript after the page loads. The requests library makes one HTTP request and returns the raw HTML — it doesn't run JavaScript, doesn't wait for API calls to complete, and never sees the data. BeautifulSoup then parses that empty HTML and returns nothing. This is the same reason most scrapers fail on modern sites — see how to scrape dynamic websites for the full picture.

The HTML page source test: right-click any Indeed search page → View Page Source → Ctrl+F for a job title you see on screen. It won't be there. That confirms the data is JavaScript-rendered.

Does Playwright Work for Scraping Indeed in Python?

Playwright works for scraping Indeed because it launches a real Chromium browser that executes JavaScript and renders job cards. However, Indeed's CloudFront bot detection blocks headless Playwright at a ~31% rate based on TLS fingerprinting. Adding the playwright-stealth plugin and residential proxies brings the block rate down to ~12%.

Playwright solves the JavaScript rendering problem — it launches a real browser, waits for content to load, then reads the DOM. Here's a minimal working example:

Step Code
Install pip install playwright playwright-stealth && playwright install chromium
Import from playwright.sync_api import sync_playwright from playwright_stealth import stealth_sync
Launch browser = p.chromium.launch(headless=False) # headless=True gets detected more
Navigate page.goto("https://www.indeed.com/jobs?q=data+engineer&l=New+York")
Wait page.wait_for_selector(".job_seen_beacon", timeout=10000)
Extract cards = page.query_selector_all(".job_seen_beacon")

The problem: even with stealth mode, Indeed's CloudFront layer fingerprints your TLS handshake. Headless Chromium sends different cipher suite ordering than real Chrome. Indeed's detection rate for headless traffic is ~31% — nearly 3x the average across job boards. In practice, roughly 1 in 3 sessions gets a CAPTCHA or empty results.

What You Actually Need to Make Playwright Work on Indeed

Requirement Why It's Needed Cost
playwright-stealth plugin Masks headless browser signals Free
headless=False (visible browser) Reduces detection vs headless mode Free (needs display server on Linux)
Residential proxy rotation Data center IPs are blocked immediately $8–40/GB (Bright Data, Oxylabs)
Random delays between requests Mimics human browsing speed Free
User-agent rotation Reduces fingerprint consistency Free

Even with all of the above, expect a ~12% block rate on Indeed specifically. For one-off exports, that's tolerable. For production pipelines scraping thousands of listings daily, you're managing proxy pools, CAPTCHA solving, and retry logic — significant engineering overhead.

Need Indeed data now, not after a week of proxy setup?

Clura exports Indeed search results from your real Chrome session — no headless detection, no proxies, ~4% block rate. Works in 2 minutes.

Add to Chrome — Free →

What Is the Real Block Rate for Python Scrapers on Indeed?

Based on benchmarks across 100,000+ extractions: Python requests is blocked ~85% of the time on Indeed immediately. Playwright without stealth is blocked ~31%. Playwright with stealth and residential proxies drops to ~12%. A Chrome extension operating in a real browser session achieves ~4% — the lowest of any method.

Method Block Rate on Indeed Why
Python requests + BeautifulSoup ~85% No JS rendering + data center IP + bot UA
Playwright (headless, no stealth) ~31% Detectable TLS fingerprint + headless signals
Playwright + stealth + residential proxies ~12% Some fingerprint signals still detectable
Selenium (undetected-chromedriver) ~18% Patches help but not complete
Chrome extension (real browser session) ~4% Authentic TLS + residential IP + real cookies
Bar chart comparing block rates for different Python scraping methods on Indeed — requests at 85%, Playwright headless at 31%, Playwright with stealth at 12%, Chrome extension at 4%
Block rate comparison across methods on Indeed. Data from 100,000+ extraction attempts.

The 31% headless block rate is Indeed-specific — meaningfully higher than the ~12% average across all dynamic sites we've benchmarked. Indeed's CloudFront configuration is specifically tuned to detect common scraping tool signatures. This is why browser-based scraping is the recommended approach.

Python vs Chrome Extension: Which Should You Actually Use for Indeed?

Python with Playwright is the right choice for scheduled, unattended Indeed scraping — a cron job pulling job data every morning without a browser open. A Chrome extension is faster and more reliable for on-demand exports, ad-hoc research, and users who don't want to manage proxy infrastructure.

Criteria Python (Playwright) Chrome Extension (Clura)
Setup time 4–8 hours 2 minutes
Scheduled / unattended runs Yes — cron job friendly No — requires browser open
Block rate on Indeed ~12% (with stealth + proxies) ~4%
Monthly cost $0 + proxy costs ($50–200+/mo) Free tier / $29.99 lifetime
Maintenance Breaks when Indeed updates layout Updated automatically
Volume Unlimited (with enough proxies) 500 rows free / unlimited paid
Technical skill required Python, async, proxy management None

If you need to run the same Indeed search every morning at 6am without touching a keyboard — use Playwright. Budget 1–2 days of setup, plan for $50–200/mo in proxy costs, and expect occasional breakage when Indeed updates its selectors. If you need a clean export today, or you're a recruiter or HR analyst who needs data weekly without managing infrastructure — use Clura.

Clura extracting Indeed job listings from a real browser session — no Python, no proxies.

How to Scrape Indeed with Python: Working Setup (2026)

To scrape Indeed with Python in 2026: use Playwright (not requests), install playwright-stealth, run with headless=False where possible, use residential proxies, wait for job card selectors before extracting, and add random delays between page loads.

If you specifically need a Python solution, here is the minimum viable setup that actually works in 2026:

  1. Install dependencies: pip install playwright playwright-stealth then playwright install chromium
  2. Configure residential proxies: Sign up for Bright Data, Oxylabs, or Smartproxy. Get your proxy endpoint and credentials. Data center IPs will be blocked within minutes.
  3. Apply stealth: Use stealth_sync(page) immediately after creating the page context — before any navigation.
  4. Navigate and wait: Go to your Indeed search URL, then page.wait_for_selector('.job_seen_beacon', timeout=15000). If this times out, you've been blocked.
  5. Extract fields: Loop through page.query_selector_all('.job_seen_beacon'). Each card contains title, company, location, salary (when present), and the job URL.
  6. Paginate carefully: Wait 3–7 seconds between pages. Click the 'Next' button rather than constructing pagination URLs manually — URL patterns change.

For the selector names, check DevTools on a live Indeed page — they change periodically. The job card container has historically been .job_seen_beacon, title is in [data-testid="jobTitle"], and company is in [data-testid="company-name"]. These are the current selectors as of May 2026 but should be verified against the live page. For a more stable approach, see scraping Indeed without code.

Frequently Asked Questions

Why is my Indeed Python scraper returning an empty list?

Because Indeed loads job cards via JavaScript after the initial page load. Python's requests library fetches the raw HTML before JavaScript runs — the job container is empty at that point. Switch to Playwright, which launches a real browser and waits for JavaScript to render the content.

Can I scrape Indeed with BeautifulSoup?

BeautifulSoup alone can't scrape Indeed because it only parses HTML — it doesn't execute JavaScript. You need Playwright or Selenium to render the page first, then pass the rendered HTML to BeautifulSoup for parsing. Even then, you'll need residential proxies to avoid getting blocked.

Does Selenium work for scraping Indeed?

Selenium can render Indeed's JavaScript, but it's detectable by Indeed's CloudFront bot detection. Using undetected-chromedriver improves results but still results in ~18% block rate. Playwright with the stealth plugin generally performs better. Both require residential proxies for consistent results.

How do I avoid getting blocked when scraping Indeed with Python?

Use Playwright with the playwright-stealth plugin, run with headless=False where possible, route all requests through residential proxies (not data center IPs), add random delays of 3–7 seconds between page loads, and rotate user agents. Even with all measures, expect ~12% block rate on Indeed.

What selectors should I use to scrape Indeed job listings?

As of May 2026: job card container is .job_seen_beacon, job title is [data-testid='jobTitle'], company name is [data-testid='company-name'], location is [data-testid='text-location'], and salary (when present) is [data-testid='attribute_snippet_testid']. These selectors change when Indeed updates its frontend — always verify against a live page.

Conclusion

Python scraping on Indeed is a solvable problem in 2026 — but the simple tutorials are wrong. requests + BeautifulSoup return nothing. Playwright works but needs stealth plugins, residential proxies, and careful rate limiting.

The real question is whether the infrastructure cost is worth it. For scheduled automation — the same search running every morning — Playwright with proxies is the right call. For everything else, a Chrome extension that reads the live DOM is faster, cheaper, and doesn't break when Indeed changes its selectors.

Explore related guides:

Tired of debugging Python scrapers that keep breaking?

Clura works inside your real Chrome session — no Playwright, no proxies, no selectors to maintain. Open Indeed, click Clura, export CSV. Takes 2 minutes.

Add to Chrome — Free →
Share:

About the Author

R
RohithFounder, Clura

Built Clura to make web data extraction simple and accessible — no coding required.

FounderChess PlayerGym Freak
View all →