Facebook Scraper: What's Still Possible After the API Lockdown
After the Cambridge Analytica scandal broke in March 2018, Facebook locked down the Graph API and suspended 200+ apps. By 2022, most third-party data endpoints had been deprecated or moved behind strict business verification. A Facebook scraper that worked in 2017 (pulling user data, page posts, and group members via the Graph API) does almost nothing useful today. But publicly visible Facebook data is still extractable, and the method determines whether you hit an 8% or a 78% block rate.
This guide covers what the 2018 lockdown actually removed, what's still accessible on public Pages, Groups, and Marketplace in 2026, why Python scrapers fail on Facebook's behavioral ML, and the full extraction workflow. For the broader social media scraping picture including TikTok, Reddit, and X, read the social media scrapers hub.
Scrape Facebook Pages and Groups without hitting Meta's bot detection
Clura runs in your real Chrome browser — your session cookies, your residential IP, your real TLS fingerprint. Meta's behavioral ML sees normal browsing. ~8% block rate across Facebook extractions.
What Did the 2018 Graph API Lockdown Actually Remove?
Facebook's 2018 response to Cambridge Analytica deprecated the Graph API endpoints that let third-party apps read user friend lists, profile data, and private group content without explicit user consent. By Graph API v14.0 (2022), most page and user data endpoints required business verification plus specific permissions reviewed by Meta. What remains publicly accessible: Page posts, page follower counts, public group post titles, and business contact info.
Before 2018, the Facebook Graph API was one of the most permissive data APIs on the internet. Third-party apps could read friend lists, browse post history, access group membership lists, and pull detailed profile data with a single API token. Cambridge Analytica exploited this to harvest data from 87 million Facebook users without their direct consent, and Facebook's response was to lock down the API itself, not just ban the app.
| Endpoint/Data Type | Status Pre-2018 | Status 2026 |
|---|---|---|
| User profile data (name, bio, location) | Accessible via Graph API with friend permissions | Blocked — requires direct user OAuth + strict app review |
| Friend list data | Accessible to any app user connected | Removed entirely in 2018 |
| Page posts + engagement | Open API access | Accessible from public page HTML — Graph API restricted |
| Group member lists | Accessible via Groups API | Removed — groups API shut down 2019 |
| Public group post content | Open API | Accessible from public group HTML with real browser |
| Business page info (phone, address, hours) | Graph API | Still visible in public page HTML |
| Instagram cross-data (pre-merger) | Graph API via Instagram Basic Display | Separated, heavily rate-limited |
| Ad targeting data / dark posts | Research tools via API | Blocked entirely — Ad Library has limited replacement |
The practical effect: the Graph API is now useful mainly for apps that need their own users' data — your own page analytics, your own ad performance, your own post reach. For third-party data collection on public pages and groups, direct HTML scraping of facebook.com from a real browser is the only viable approach. Meta hasn't removed publicly visible content — it's removed the programmatic API path. The data is still on the screen; the question is which scraping method can read it without triggering Meta's bot detection.
What Facebook Data Is Still Scrapable in 2026?
Facebook public Pages, public Groups, and Marketplace listings are all scrapable from a real browser in 2026. Public Pages expose: page name, follower/like count, category, business description, address, phone, website, hours, and recent post text with engagement counts. Public Groups expose: post title, author username, reaction count, comment count, post date. Personal profiles and private groups are inaccessible regardless of method.
| Data Source | Scrapable Fields | Blocked | Use Case |
|---|---|---|---|
| Facebook Pages | Page name, category, follower count, like count, bio, address, phone, website, hours, recent posts (text + engagement) | Admin analytics, ad spend, page insights, private messages | Competitor monitoring, local business leads, brand tracking |
| Facebook Groups (public) | Post title, post body, author username, reaction count, comment count, share count, post date, group member count | Member lists, private groups, DMs, admin data | Community research, voice-of-customer, niche monitoring |
| Facebook Marketplace | Listing title, price, location, condition, description, seller username, listing date, listing URL | Seller contact info (hidden behind Messenger), private listings | Comp pricing, inventory research, deal sourcing |
| Facebook Events (public) | Event name, date/time, location, organizer, RSVP count, description | Private events, attendee lists | Local marketing research, event intelligence |
| Personal Profiles | Public name, profile picture (if public) | Almost everything else — posts, friends, contact info | Not viable for data collection |
Marketplace is a special case — it has its own frontend architecture separate from Facebook proper, with location-based filtering, category browsing, and infinite scroll. It's the highest-value Facebook data source for resellers, real estate investors, and pricing researchers. The dedicated Facebook Marketplace scraper guide covers the full Marketplace workflow — listing extraction, price monitoring, and location-based searches.
Why Does Python Fail on Facebook 78% of the Time?
Facebook runs behavioral ML on every session — not just TLS fingerprinting like TikTok, but mouse movement patterns, scroll velocity, time-on-page, and click pattern analysis. Python's requests library gets blocked before the first page loads (~78% rate). Playwright does better (~31%) but leaks headless signals that Meta's detection catches within 10–20 requests. The behavioral layer is why block rates degrade the longer a Python session runs.
Facebook's bot detection has two layers, and TikTok does not run the second at the same intensity. The first is TLS/device fingerprinting: as on TikTok, the JA3 hash of a Python HTTP client is identifiable immediately. The second is behavioral biometrics: Meta's ML system analyzes the pattern of requests, not just their origin. A human browsing Facebook moves the mouse before clicking, scrolls at variable speed, pauses on content, and takes 3–8 seconds between page navigations. A Python script makes requests in millisecond bursts with zero mouse movement and perfectly regular timing.
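To make the timing point concrete, here is a minimal Python sketch that measures how regular a session's request gaps are. It is an illustration only: `gap_variation` and `timeline` are hypothetical helper names, the coefficient of variation is just one simple regularity measure, and nothing here is claimed to be what Meta's system actually computes.

```python
import random
import statistics


def gap_variation(timestamps):
    """Coefficient of variation (stdev / mean) of the gaps between requests."""
    gaps = [b - a for a, b in zip(timestamps, timestamps[1:])]
    return statistics.pstdev(gaps) / statistics.mean(gaps)


def timeline(gaps):
    """Turn a list of gaps (in seconds) into cumulative request timestamps."""
    current, stamps = 0.0, [0.0]
    for gap in gaps:
        current += gap
        stamps.append(current)
    return stamps


rng = random.Random(7)  # seeded only so the illustration is repeatable

# A script firing every 50 ms versus pauses drawn from the 3-8 second range.
script_times = timeline([0.05] * 20)
human_times = timeline([rng.uniform(3.0, 8.0) for _ in range(20)])

print(f"script:     {gap_variation(script_times):.3f}")
print(f"human-like: {gap_variation(human_times):.3f}")
```

The fixed-interval script scores essentially zero, while the randomized 3–8 second gaps score well above it. A perfectly regular delay is as easy to separate from human browsing as no delay at all.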
| Method | Block Rate | Why | Cost |
|---|---|---|---|
| Chrome extension (Clura) | ~8% | Real session, real behavioral patterns, real TLS | Free / $29.99 lifetime |
| Playwright + residential proxies | ~31% | Better TLS but no real mouse/scroll behavior | $50–200/mo proxies |
| Playwright + stealth + behavioral simulation | ~19% | Simulated mouse movement helps but still detectable | $50–200/mo + dev time |
| Apify Facebook actors | ~28% | Managed headless, better than raw Playwright | $49/mo+ |
| Python requests + headers | ~78% | Wrong TLS + zero behavioral signals | Free (unreliable) |
| Python + Selenium | ~71% | Real browser binary but webdriver flag leaks | Free + proxy costs |
The behavioral degradation pattern is what makes Facebook harder than TikTok for Playwright scrapers. In our tests, a fresh Playwright session started at ~31% block rate. After 50 requests, it climbed to ~44%. After 200 requests in a session, ~61%. Meta's ML accumulates signal: a session that has made 200 requests in 12 minutes without a single mouse movement or back-navigation is statistically impossible for a human. A Chrome extension session doesn't have this degradation because real browsing behavior — your actual scroll pauses, your real mouse movements, your actual tab-switching — is what it inherits. See the full comparison in our social media scraper block rate analysis.
Facebook's bot detection isn't just about what you are — it's about what you do. Python scrapers fail not only because they look wrong at the TLS layer, but because they behave nothing like a human over the course of a session. A Chrome extension inherits 100% of your real browsing behavior for free.
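The degradation figures above can be turned into a rough planning estimate. The sketch below assumes straight-line interpolation between the three measured points (31% fresh, 44% at 50 requests, 61% at 200) and a flat rate after 200. Both are assumptions made for illustration, and `block_rate` and `expected_successes` are hypothetical names.

```python
# Measured points from the text: (requests into the session, block rate).
MEASURED = [(0, 0.31), (50, 0.44), (200, 0.61)]


def block_rate(n, points=MEASURED):
    """Piecewise-linear estimate of the block rate at request number n.

    Assumption: the rate stays flat after the last measured point.
    """
    if n >= points[-1][0]:
        return points[-1][1]
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if x0 <= n <= x1:
            return y0 + (y1 - y0) * (n - x0) / (x1 - x0)
    return points[0][1]  # n below the first measured point


def expected_successes(session_length):
    """Expected number of unblocked requests in one session of this length."""
    return sum(1 - block_rate(i) for i in range(session_length))


one_long = expected_successes(200)
four_short = 4 * expected_successes(50)
print(f"one 200-request session:  {one_long:.1f} expected successes")
print(f"four 50-request sessions: {four_short:.1f} expected successes")
```

Under these assumptions, four fresh 50-request sessions yield more successful page loads than one 200-request session. That is the arithmetic behind restarting sessions early, though it ignores the setup cost of each new session.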
Can You Scrape Facebook With Python in 2026?
Python can scrape public Facebook data at a ~19–22% block rate, but only with Playwright plus stealth plugins and behavioral simulation, not raw requests. The practical ceiling for a Python Facebook scraper is ~500–800 requests per session before block rate exceeds 50%. For repeatable workflows, a Chrome extension is more reliable. For large-scale automated pipelines running 24/7, Python + Playwright with session rotation is viable with the right setup.
The best Python approach for Facebook in 2026 uses Playwright in headed mode (not headless) with three additions: the playwright-stealth plugin to patch webdriver signals, a behavioral simulation layer that adds realistic mouse movements and scroll timing, and session rotation to prevent behavioral fingerprint accumulation.
Python + Playwright setup for Facebook (best available approach)
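- Use Playwright headed, not headless: headed mode passes the WebGL GPU check (real GPU renderer string) and avoids headless-only signals such as a `HeadlessChrome` user-agent token. `navigator.webdriver` is still set in headed mode, which is what the stealth patch in the next item addresses.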
- Add playwright-stealth: patches the 8 most-detected headless signals including `navigator.webdriver`, `navigator.plugins`, and `window.chrome`.
- Simulate human scroll timing: between each request, add a random pause (2.1–5.8 seconds) and programmatic mouse movement to a random viewport coordinate. Fixed delays are themselves a bot signal.
- Rotate sessions every 100–150 requests: clear cookies, restart the browser process, load a new residential proxy IP. This resets the behavioral accumulation counter.
- Log in as a real account: Facebook shows dramatically less data to logged-out visitors since 2021. Use a real Facebook account in your Playwright session — the same session cookies a real user would have.
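The pacing rule in the list above (a random 2.1–5.8 second pause plus a mouse move to a random viewport coordinate) could be sketched as follows. The helper names `human_pause`, `random_viewport_point`, and `pace` are hypothetical, and the 1280x720 viewport and the 8–25 step range are assumptions. The only Playwright call used is `page.mouse.move(x, y, steps=...)` from the sync API, so the helper can be exercised with a stand-in object.

```python
import random
import time


def human_pause(rng, low=2.1, high=5.8):
    """Random pause length in seconds; bounds taken from the list above."""
    return rng.uniform(low, high)


def random_viewport_point(rng, width, height):
    """Random pixel coordinate inside a width x height viewport."""
    return rng.randrange(width), rng.randrange(height)


def pace(page, rng, width=1280, height=720, sleep=time.sleep):
    """Move the mouse to a random point, then wait a random interval.

    `page` is expected to be a Playwright sync-API Page. The viewport
    size and the step range are assumptions for illustration.
    """
    x, y = random_viewport_point(rng, width, height)
    # Several intermediate steps, so the pointer travels instead of jumping.
    page.mouse.move(x, y, steps=rng.randint(8, 25))
    delay = human_pause(rng)
    sleep(delay)
    return x, y, delay
```

Call `pace(page, random.Random())` between navigations. Drawing a fresh delay each time avoids the fixed-interval pattern described above.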
With this setup: ~19–22% block rate for the first 150 requests per session, degrading to ~35–45% by request 300. For a weekly competitive intelligence run (100–200 page posts), this is workable. For a daily pipeline hitting 1,000+ pages, the session rotation overhead becomes significant. Compare this against the TikTok Python scraper approach — same general architecture, TikTok adds the TTWID device binding problem on top. Facebook's behavioral ML is the harder challenge here.
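The session rotation rule (a fresh browser, cookies, and proxy every 100–150 requests) can be planned before any browser is launched. This is a minimal sketch: `plan_sessions` and `run` are hypothetical names, and `fetch_batch` is a placeholder callback you would implement yourself to launch a clean browser on a new proxy, visit the batch, and close it.

```python
import random


def plan_sessions(urls, rng, low=100, high=150):
    """Split urls into consecutive batches of low..high items, one per session.

    Batch sizes are randomized so every session does not end on the
    same request count. The last batch may be shorter than `low`.
    """
    batches, start = [], 0
    while start < len(urls):
        size = rng.randint(low, high)
        batches.append(urls[start:start + size])
        start += size
    return batches


def run(urls, fetch_batch, rng=None):
    """Run each batch through fetch_batch(session_id, batch) and collect results.

    fetch_batch is hypothetical: it should start a fresh browser process
    with empty cookies and a new proxy, then return one result per URL.
    """
    rng = rng or random.Random()
    results = []
    for session_id, batch in enumerate(plan_sessions(urls, rng)):
        results.extend(fetch_batch(session_id, batch))
    return results
```

For a 1,000-page daily run this produces roughly seven to ten sessions, which makes the rotation overhead mentioned above easy to estimate up front.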
How to Scrape Facebook Pages With a Chrome Extension
Open the Facebook Page in Chrome, click Clura, describe the fields you want (page name, follower count, bio, recent posts, contact info), and export to CSV. Facebook Pages load their full content within 2–3 seconds of page open with no lazy-loading for the primary fields. For post history requiring scroll-down pagination, enable Auto-paginate — Clura scrolls at human speed (~1.8s between scroll events).
Facebook Page scraping workflow
- Open facebook.com and navigate to the Page you want to scrape. Make sure you're logged into Facebook in your browser.
- For bulk page scraping: first run a Facebook search for pages in your category (e.g. 'pizza restaurants New York'), then scrape the search results list to get page URLs. Alternatively, import a list of URLs you already have.
- Open each page URL in Chrome. Click the Clura extension. Describe your fields: 'page name, follower count, category, phone number, address, website URL, recent post text.'
- For post history: scroll down to the post feed section, then toggle Auto-paginate. Clura scrolls through the feed with ~1.8s between scroll events.
- Export to CSV. For bulk page scraping across 50+ pages, use Clura's batch URL mode — paste the list of page URLs and it runs through each sequentially.
**What to expect from Facebook Page data quality:** Phone numbers and addresses are present on ~65% of local business pages — businesses that actively maintain their pages fill these fields consistently. Email addresses are rare on Facebook Pages (Meta removed direct email display in 2019) — you typically get a website URL which you can then use to find the email. For structured local business data with phone + address + rating in one place, Google Maps scraping and Yelp scraping return cleaner, more complete records.
How to Scrape Facebook Groups for Research Data
Public Facebook Groups are scrapable from a real browser — post titles, body text, author usernames, reaction counts, comment counts, and post dates are all visible without joining the group. Private groups require membership. Group post data is one of the best sources for unsolicited consumer opinion — people describe problems and compare products without a research context, producing qualitative data surveys can't replicate.
Facebook Groups are underutilized as a scraping source. Unlike subreddits (which Reddit has monetized via API), public Facebook Groups are still directly accessible from the HTML without any API key or rate limit. A public group with 50,000 members posting daily about a product category gives you a continuous stream of real customer language: how they describe problems, which brands they compare, what they're willing to pay.
Facebook Group scraping workflow
- Navigate to the public Facebook Group URL (groups/[group-id]). If the group requires joining, it's private — skip it.
- Click Clura. It detects the repeating post card structure: post author, post text, reaction count, comment count, post date.
- Describe your fields: 'post author username, post text, reaction count, comment count, post date.'
- Enable Auto-paginate — Group feeds use infinite scroll. Clura scrolls with natural timing.
- Export to CSV. Filter by reaction count in the spreadsheet to surface high-engagement posts — these represent content the community responded to most strongly.
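The last step above, filtering by reaction count, can also be done in a few lines of Python instead of a spreadsheet. This is a sketch under assumptions: the column names mirror the field descriptions in the workflow but the real export headers may differ, the abbreviated "1.2K" format is a guess at how counts might appear, and the sample rows are made up.

```python
import csv
import io


def parse_count(raw):
    """Parse '340', '1,234' or abbreviated '1.2K' / '3M' into an int (0 if unreadable)."""
    text = (raw or "").replace(",", "").strip().upper()
    multiplier = 1
    if text.endswith("K"):
        multiplier, text = 1_000, text[:-1]
    elif text.endswith("M"):
        multiplier, text = 1_000_000, text[:-1]
    try:
        return int(round(float(text) * multiplier))
    except (ValueError, OverflowError):
        return 0


def top_posts(csv_text, min_reactions=50, column="reaction count"):
    """Return rows with at least min_reactions, highest first."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    kept = [row for row in rows if parse_count(row.get(column)) >= min_reactions]
    return sorted(kept, key=lambda row: parse_count(row.get(column)), reverse=True)


# Made-up sample rows, only to show the shape of the input.
sample = """post author username,post text,reaction count,comment count,post date
alice,Which blender lasts longest?,1.2K,88,2026-01-04
bob,Selling my old one,12,1,2026-01-05
carol,Brand X broke after a month,340,51,2026-01-06
"""

for row in top_posts(sample, min_reactions=100):
    print(row["reaction count"], "|", row["post text"])
```

Parsing the count before sorting matters because a plain text sort would rank "12" above "1.2K".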
**Research use cases for Facebook Group data:** product feedback (what are buyers complaining about in competitor product groups?), community sizing (active post frequency as a proxy for category engagement), influencer identification (who are the most-liked commenters in a niche group?), and competitor intelligence (what topics generate the most engagement on a competitor's brand group?). Combine with lead generation scraping workflows to turn research into outreach lists.
Facebook vs Google Maps vs Yelp: Which Is Best for Local Business Leads?
For local business lead generation, Google Maps returns the most complete data (phone + address + rating + category in a structured format, 8–12M listings in the US). Yelp has better data completeness on small businesses but fewer total listings. Facebook Pages have the most inconsistent data quality — field completion depends entirely on how actively a business manages their page. Use Facebook as a verification and enrichment layer, not a primary source.
| Platform | US Business Listings | Phone Number Coverage | Address Coverage | Block Rate (Chrome ext) | Best For |
|---|---|---|---|---|---|
| Google Maps | 8–12M | ~85% | ~90% | ~3% | Primary lead source — most complete |
| Yelp | ~5M | ~78% | ~82% | ~4% | SMB focus, review data, category filtering |
| Facebook Pages | ~80M pages (all types) | ~65% | ~70% | ~8% | Enrichment layer — social proof, post activity |
| LinkedIn | ~5M company pages | Low (B2B contacts, not phone) | ~40% | ~12% | B2B decision-maker identification |
The practical workflow for local B2B lead generation: build the list from Google Maps (primary source, best structured data), enrich with Facebook Pages (recent activity, follower count as a signal of business health), verify via Yelp (review count and rating as a quality signal). Three sources, one CRM-ready lead list. See the full pipeline in our web scraping for lead generation guide.
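The three-source workflow above amounts to a keyed merge: Google Maps rows form the base list, and Facebook and Yelp rows only add fields to businesses already in it. The sketch below joins on a normalized 10-digit US phone number. That join key is an assumption, as are every field name and the helper names `phone_key` and `build_leads`. Rows without a usable phone number are dropped in this simplified version.

```python
import re


def phone_key(phone):
    """Keep the last 10 digits so '(212) 555-0101' and '+1 212-555-0101' match."""
    digits = re.sub(r"\D", "", phone or "")
    return digits[-10:] if len(digits) >= 10 else None


def build_leads(maps_rows, facebook_rows, yelp_rows):
    """Build the list from Maps, enrich from Facebook, verify from Yelp."""
    leads = {}
    for row in maps_rows:
        key = phone_key(row.get("phone"))
        if key:
            leads[key] = dict(row)  # copy so the input rows are not mutated
    for row in facebook_rows:
        key = phone_key(row.get("phone"))
        if key in leads:
            leads[key]["fb_followers"] = row.get("follower_count")
            leads[key]["fb_last_post"] = row.get("last_post_date")
    for row in yelp_rows:
        key = phone_key(row.get("phone"))
        if key in leads:
            leads[key]["yelp_rating"] = row.get("rating")
            leads[key]["yelp_reviews"] = row.get("review_count")
    return list(leads.values())
```

Because Facebook phone coverage is lower than Maps coverage in the table above, many leads will come through without the Facebook fields. In practice a fallback key such as normalized name plus postal code would recover some of those matches.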
Is Scraping Facebook Legal in 2026?
Scraping publicly visible Facebook data — public page posts, page follower counts, public group content — is generally legal in the US under the hiQ v. LinkedIn (2022) precedent. Facebook's Terms of Service prohibit automated scraping but ToS violations are civil matters. Meta has pursued legal action against scraping operations at commercial scale (harvesting personal data from private profiles), not against businesses extracting public page data for research.
The legal framework for Facebook scraping is the same as other social platforms: publicly visible data is accessible under US law (hiQ v. LinkedIn, 2022). The distinction that matters is public vs. authenticated content. Public Page posts, Group posts in public groups, and Marketplace listings are all visible to any logged-out visitor — the same legal category as public website data. Personal profile data, private group content, and anything behind Facebook login that you don't own is a different matter.
Meta has filed lawsuits against scraping operations, but the targets were: a company harvesting personal data from private profiles at scale (2023), a data broker scraping personal relationship data (2022), and automated ad targeting data collection (2021). None involved businesses extracting public page data for competitive research or lead generation. GDPR applies if you collect EU user data — "legitimate interest" covers most B2B outreach on publicly visible business data. The social media scraper legal section covers the full hiQ precedent and GDPR framework.
What Are Facebook Scrapers Actually Used For?
The five primary Facebook scraping use cases: local business lead lists (Pages), competitive intelligence on competitor pages (post frequency, engagement, campaigns), voice-of-customer research (public Groups), Marketplace pricing research (covered in the dedicated Marketplace guide), and Facebook Ads Library monitoring (competitor ad creative and targeting). Each uses a different Facebook surface and produces a different data structure.
| Use Case | Facebook Surface | Key Data Fields | Output |
|---|---|---|---|
| Local business leads | Pages | Business name, phone, address, website, category, follower count | Lead list CSV → CRM |
| Competitor page monitoring | Pages | Post text, post date, reaction count, comment count, share count | Content performance spreadsheet |
| Voice-of-customer research | Public Groups | Post text, comment text, reaction count, post date | Sentiment dataset → analysis |
| Marketplace pricing | Marketplace | Listing title, price, condition, location, seller, listing date | Comp pricing spreadsheet |
| Ads Library intelligence | Facebook Ads Library | Ad creative text, CTA, run duration, estimated reach | Competitor ad tracking sheet |
The Facebook Ads Library (facebook.com/ads/library) deserves a special mention — it's a Meta-provided transparency tool showing all active ads across Facebook and Instagram. It's publicly accessible without login and has no API rate limits on the web interface. For competitive intelligence on what ads your competitors are running (creative copy, CTA language, how long they've been running a campaign), it's the single best free source of competitive ad data on the internet. Clura can extract it like any other webpage. For a broader competitive intelligence workflow, see the lead scraper guide which covers multi-source enrichment.
Frequently Asked Questions
Can you still scrape Facebook in 2026 after the API changes?
Yes. Public Facebook data is still scrapable. The 2018 Graph API lockdown removed programmatic API access to most endpoints, but publicly visible content (Page posts, Page business info, public Group posts, Marketplace listings) is still accessible directly from facebook.com in a real browser. A Chrome extension like Clura reads this data at ~8% block rate. What you can't scrape: personal profile data, private groups, DMs, and anything behind a login you don't own.
How do I scrape Facebook without getting blocked?
Use a real Chrome browser session, not Python or headless browsers. Facebook's bot detection has two layers: TLS fingerprinting (identifies Python immediately) and behavioral ML (identifies non-human request patterns over time). A Chrome extension running in your real browser passes both checks automatically — your real TLS fingerprint, your real browsing behavior. Python with Playwright + stealth plugins reaches ~19–22% block rate at best; it degrades to ~45%+ after 200+ requests in a session.
What happened to the Facebook Graph API?
After Cambridge Analytica (March 2018), Facebook deprecated most Graph API v3 endpoints that allowed third-party apps to read user data, friend lists, and page/group content without explicit user consent. By Graph API v14.0 (2022), most data access requires a Facebook Login flow, specific app permissions, and often Meta's business verification review. The API is now primarily useful for apps managing their own pages/ad accounts — not for collecting data from third-party pages.
Is there a free Facebook scraper?
Clura's Chrome extension is free to install and works on Facebook without an API key. For Pages and Marketplace listings, the free tier covers standard field extraction. For Python-based options: most GitHub Facebook scraper repos (facebook-scraper on PyPI, Selenium-based scrapers) break frequently when Facebook updates its frontend. Expect 2–6 weeks of downtime after each Facebook frontend update before repos are patched. Browser-native tools don't have this fragility.
Can I scrape Facebook business pages for leads?
Yes. Facebook business Pages publicly show page name, follower count, category, phone number (on ~65% of pages), address, website URL, and business hours. Open the Page in Chrome, use Clura to extract the fields you need, export to CSV. For bulk page scraping across a category, first scrape Facebook search results to get page URLs, then batch-scrape those pages. For more complete local business data, Google Maps (~85% phone coverage) and Yelp (~78% phone coverage) return cleaner records.
How do I scrape Facebook groups?
Public Facebook groups are scrapable without joining — navigate to the group URL, open Clura, describe the fields you want (post text, reaction count, comment count, author, post date), and export. Private groups require membership. Group post data is particularly valuable for voice-of-customer research: people describe product problems and compare alternatives in a natural context that surveys can't replicate.
What's the difference between Facebook Pages and Marketplace scraping?
Facebook Pages scraping is for business data — finding and monitoring businesses, extracting contact info, tracking competitor post activity. Facebook Marketplace scraping is for listing data — prices, conditions, descriptions, seller info for resale research, comp pricing, inventory sourcing. They use different Facebook frontends and serve completely different use cases. The dedicated Facebook Marketplace scraper guide covers the Marketplace workflow in detail.
Does Facebook scraping violate GDPR?
Scraping publicly visible Facebook business page data (page name, address, phone, posts) does not automatically violate GDPR. You need a lawful basis for processing personal data of EU residents — for B2B outreach on public business data, 'legitimate interest' typically applies. The risk areas: scraping personal profile data (even public), combining datasets to re-identify individuals, or using scraped data for automated decision-making without disclosure. Consult legal counsel before large-scale EU personal data collection.
Conclusion
Facebook's 2018 lockdown didn't make scraping impossible; it closed off the API path. Publicly visible data on Pages, Groups, and Marketplace is still there, still accessible, still valuable. The method that reaches it reliably is a real Chrome browser session, which passes the TLS and behavioral ML checks that block raw Python requests ~78% of the time and push Playwright sessions to a ~61% block rate by request 200.
For local business leads, use Facebook Pages as an enrichment layer on top of Google Maps and Yelp — not as a primary source. For competitive intelligence and voice-of-customer research, Facebook Groups are underrated. For pricing research and inventory sourcing, Marketplace is its own category — covered in the dedicated Marketplace guide.
Explore related guides:
- Facebook Marketplace Scraper — pricing research, comp analysis, listing monitoring workflow
- Social Media Scrapers: Full Platform Comparison — Block rates across TikTok, Facebook, Reddit, X, and Instagram in one place
- Scrape Google Maps — Better phone/address coverage than Facebook for local business leads — start here
- Yelp Scraper — SMB-focused listing data with review counts and ratings — enrichment layer for Facebook leads
- Web Scraping for Lead Generation — Full pipeline: Maps → Facebook → Yelp → CRM-ready lead list
- Lead Scraper: Any Site to CSV — Universal extraction workflow covering social platforms and directories
- Twitter/X Scraper — $100/mo API workarounds — twscrape, Playwright, and Chrome extension block rate comparison
- TikTok Scraper — TTWID fingerprinting, msToken rotation, and the only method with <5% block rate
Scrape Facebook Pages, Groups, and Marketplace without getting flagged
Install Clura, open Facebook in Chrome, describe what you want in plain English. Your real session handles Meta's behavioral ML automatically. ~8% block rate, no Python setup, no API key.