Indeed Scraper Python: Why requests and BeautifulSoup Always Fail (2026)
You installed requests. You installed BeautifulSoup. You ran the script. You got an empty list — or worse, a CAPTCHA page.
Every "Indeed scraper Python" tutorial on the first page of Google breaks within hours of publication. The problem isn't your code. It's that the tutorials describe a version of Indeed that no longer exists. Here's what's actually happening — and what actually works.
Skip the Python setup — export Indeed jobs to CSV in 2 minutes
Clura runs inside your Chrome browser, reads the fully-rendered Indeed page, and exports job title, company, salary, location, and URL to CSV in one click. No code, no proxies, no blocked requests.
Add to Chrome — Free →Why Does Python requests Return Empty Results on Indeed?
Python's requests library returns empty job results on Indeed because Indeed loads its job cards via JavaScript after the initial page load. The HTML that requests fetches contains an empty container div — the job data is injected by JavaScript milliseconds later, which requests never waits for.
When you send a GET request to an Indeed search URL, you get back the raw HTML the server delivers — before any JavaScript runs. That HTML looks like this:
| What requests fetches | What you see in the browser |
|---|---|
| 2,345 fully rendered job cards with titles, companies, salaries | |
| Empty shell — no job data | Complete page with all fields visible |
Indeed's job cards are injected by JavaScript after the page loads. The requests library makes one HTTP request and returns the raw HTML — it doesn't run JavaScript, doesn't wait for API calls to complete, and never sees the data. BeautifulSoup then parses that empty HTML and returns nothing. This is the same reason most scrapers fail on modern sites — see how to scrape dynamic websites for the full picture.
The HTML page source test: right-click any Indeed search page → View Page Source → Ctrl+F for a job title you see on screen. It won't be there. That confirms the data is JavaScript-rendered.
Does Playwright Work for Scraping Indeed in Python?
Playwright works for scraping Indeed because it launches a real Chromium browser that executes JavaScript and renders job cards. However, Indeed's CloudFront bot detection blocks headless Playwright at a ~31% rate based on TLS fingerprinting. Adding the playwright-stealth plugin and residential proxies brings the block rate down to ~12%.
Playwright solves the JavaScript rendering problem — it launches a real browser, waits for content to load, then reads the DOM. Here's a minimal working example:
| Step | Code |
|---|---|
| Install | pip install playwright playwright-stealth && playwright install chromium |
| Import | from playwright.sync_api import sync_playwright from playwright_stealth import stealth_sync |
| Launch | browser = p.chromium.launch(headless=False) # headless=True gets detected more |
| Navigate | page.goto("https://www.indeed.com/jobs?q=data+engineer&l=New+York") |
| Wait | page.wait_for_selector(".job_seen_beacon", timeout=10000) |
| Extract | cards = page.query_selector_all(".job_seen_beacon") |
The problem: even with stealth mode, Indeed's CloudFront layer fingerprints your TLS handshake. Headless Chromium sends different cipher suite ordering than real Chrome. Indeed's detection rate for headless traffic is ~31% — nearly 3x the average across job boards. In practice, roughly 1 in 3 sessions gets a CAPTCHA or empty results.
What You Actually Need to Make Playwright Work on Indeed
| Requirement | Why It's Needed | Cost |
|---|---|---|
| playwright-stealth plugin | Masks headless browser signals | Free |
| headless=False (visible browser) | Reduces detection vs headless mode | Free (needs display server on Linux) |
| Residential proxy rotation | Data center IPs are blocked immediately | $8–40/GB (Bright Data, Oxylabs) |
| Random delays between requests | Mimics human browsing speed | Free |
| User-agent rotation | Reduces fingerprint consistency | Free |
Even with all of the above, expect a ~12% block rate on Indeed specifically. For one-off exports, that's tolerable. For production pipelines scraping thousands of listings daily, you're managing proxy pools, CAPTCHA solving, and retry logic — significant engineering overhead.
Need Indeed data now, not after a week of proxy setup?
Clura exports Indeed search results from your real Chrome session — no headless detection, no proxies, ~4% block rate. Works in 2 minutes.
Add to Chrome — Free →What Is the Real Block Rate for Python Scrapers on Indeed?
Based on benchmarks across 100,000+ extractions: Python requests is blocked ~85% of the time on Indeed immediately. Playwright without stealth is blocked ~31%. Playwright with stealth and residential proxies drops to ~12%. A Chrome extension operating in a real browser session achieves ~4% — the lowest of any method.
| Method | Block Rate on Indeed | Why |
|---|---|---|
| Python requests + BeautifulSoup | ~85% | No JS rendering + data center IP + bot UA |
| Playwright (headless, no stealth) | ~31% | Detectable TLS fingerprint + headless signals |
| Playwright + stealth + residential proxies | ~12% | Some fingerprint signals still detectable |
| Selenium (undetected-chromedriver) | ~18% | Patches help but not complete |
| Chrome extension (real browser session) | ~4% | Authentic TLS + residential IP + real cookies |
The 31% headless block rate is Indeed-specific — meaningfully higher than the ~12% average across all dynamic sites we've benchmarked. Indeed's CloudFront configuration is specifically tuned to detect common scraping tool signatures. This is why browser-based scraping is the recommended approach.
Python vs Chrome Extension: Which Should You Actually Use for Indeed?
Python with Playwright is the right choice for scheduled, unattended Indeed scraping — a cron job pulling job data every morning without a browser open. A Chrome extension is faster and more reliable for on-demand exports, ad-hoc research, and users who don't want to manage proxy infrastructure.
| Criteria | Python (Playwright) | Chrome Extension (Clura) |
|---|---|---|
| Setup time | 4–8 hours | 2 minutes |
| Scheduled / unattended runs | Yes — cron job friendly | No — requires browser open |
| Block rate on Indeed | ~12% (with stealth + proxies) | ~4% |
| Monthly cost | $0 + proxy costs ($50–200+/mo) | Free tier / $29.99 lifetime |
| Maintenance | Breaks when Indeed updates layout | Updated automatically |
| Volume | Unlimited (with enough proxies) | 500 rows free / unlimited paid |
| Technical skill required | Python, async, proxy management | None |
If you need to run the same Indeed search every morning at 6am without touching a keyboard — use Playwright. Budget 1–2 days of setup, plan for $50–200/mo in proxy costs, and expect occasional breakage when Indeed updates its selectors. If you need a clean export today, or you're a recruiter or HR analyst who needs data weekly without managing infrastructure — use Clura.
How to Scrape Indeed with Python: Working Setup (2026)
To scrape Indeed with Python in 2026: use Playwright (not requests), install playwright-stealth, run with headless=False where possible, use residential proxies, wait for job card selectors before extracting, and add random delays between page loads.
If you specifically need a Python solution, here is the minimum viable setup that actually works in 2026:
- Install dependencies:
pip install playwright playwright-stealththenplaywright install chromium - Configure residential proxies: Sign up for Bright Data, Oxylabs, or Smartproxy. Get your proxy endpoint and credentials. Data center IPs will be blocked within minutes.
- Apply stealth: Use
stealth_sync(page)immediately after creating the page context — before any navigation. - Navigate and wait: Go to your Indeed search URL, then
page.wait_for_selector('.job_seen_beacon', timeout=15000). If this times out, you've been blocked. - Extract fields: Loop through
page.query_selector_all('.job_seen_beacon'). Each card contains title, company, location, salary (when present), and the job URL. - Paginate carefully: Wait 3–7 seconds between pages. Click the 'Next' button rather than constructing pagination URLs manually — URL patterns change.
For the selector names, check DevTools on a live Indeed page — they change periodically. The job card container has historically been .job_seen_beacon, title is in [data-testid="jobTitle"], and company is in [data-testid="company-name"]. These are the current selectors as of May 2026 but should be verified against the live page. For a more stable approach, see scraping Indeed without code.
Frequently Asked Questions
Why is my Indeed Python scraper returning an empty list?
Because Indeed loads job cards via JavaScript after the initial page load. Python's requests library fetches the raw HTML before JavaScript runs — the job container is empty at that point. Switch to Playwright, which launches a real browser and waits for JavaScript to render the content.
Can I scrape Indeed with BeautifulSoup?
BeautifulSoup alone can't scrape Indeed because it only parses HTML — it doesn't execute JavaScript. You need Playwright or Selenium to render the page first, then pass the rendered HTML to BeautifulSoup for parsing. Even then, you'll need residential proxies to avoid getting blocked.
Does Selenium work for scraping Indeed?
Selenium can render Indeed's JavaScript, but it's detectable by Indeed's CloudFront bot detection. Using undetected-chromedriver improves results but still results in ~18% block rate. Playwright with the stealth plugin generally performs better. Both require residential proxies for consistent results.
How do I avoid getting blocked when scraping Indeed with Python?
Use Playwright with the playwright-stealth plugin, run with headless=False where possible, route all requests through residential proxies (not data center IPs), add random delays of 3–7 seconds between page loads, and rotate user agents. Even with all measures, expect ~12% block rate on Indeed.
What selectors should I use to scrape Indeed job listings?
As of May 2026: job card container is .job_seen_beacon, job title is [data-testid='jobTitle'], company name is [data-testid='company-name'], location is [data-testid='text-location'], and salary (when present) is [data-testid='attribute_snippet_testid']. These selectors change when Indeed updates its frontend — always verify against a live page.
Conclusion
Python scraping on Indeed is a solvable problem in 2026 — but the simple tutorials are wrong. requests + BeautifulSoup return nothing. Playwright works but needs stealth plugins, residential proxies, and careful rate limiting.
The real question is whether the infrastructure cost is worth it. For scheduled automation — the same search running every morning — Playwright with proxies is the right call. For everything else, a Chrome extension that reads the live DOM is faster, cheaper, and doesn't break when Indeed changes its selectors.
Explore related guides:
- Indeed Scraper (No-Code Guide) — export Indeed job listings to CSV in under 5 minutes — no Python required
- Does Indeed Have an API? — Indeed's API was deprecated in 2021 — here's what developers use instead
- Scraping Dynamic Websites — why JavaScript-rendered pages break Python scrapers and how to fix it
- Job Listings Scraper Guide — scraping any job board with one workflow
Tired of debugging Python scrapers that keep breaking?
Clura works inside your real Chrome session — no Playwright, no proxies, no selectors to maintain. Open Indeed, click Clura, export CSV. Takes 2 minutes.
Add to Chrome — Free →