ValidEmailChecker

Free Email Extractor

Paste a document, webpage, or any text — pull out every email address inside. Dedup, sort, filter by domain.

Smart Parser · Dedup + Sort · CSV Export

How it works

1. Paste text or drop a file

Paste any text into the input box, or click Upload to load a .txt, .html, .csv, .json, .md, or .eml file. You can also drag-drop a file directly onto the textarea.

2. Adjust options if needed

HTML stripping auto-enables when we detect HTML. Gmail dot-equivalence dedup is on by default. Optionally filter to specific domains with a comma-separated list (use `!` prefix to exclude).

3. Extract and export

Click Extract. See the unique emails, per-domain breakdown, and stats. Click Verify on any row to confirm the mailbox actually exists. Export the full list as CSV, JSON, XLSX, or TXT.
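The text-based export formats can be sketched in a few lines. This is a minimal illustration, not the tool's actual exporter: the function names `to_csv` and `to_json` and the `email` / `domain` column headers are assumptions made for the example.

```python
import csv
import io
import json


def to_csv(emails):
    """Serialize addresses as CSV with hypothetical 'email' and 'domain' columns."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["email", "domain"])
    for email in emails:
        # Everything after the last "@" is treated as the domain.
        writer.writerow([email, email.rsplit("@", 1)[-1]])
    return buf.getvalue()


def to_json(emails):
    """Serialize addresses as a plain JSON array of strings."""
    return json.dumps(list(emails), indent=2)
```

Writing to an in-memory buffer keeps the sketch independent of any file system, which matches a tool that runs without uploading anything.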

What this tool does in plain English

Paste in any text (a webpage's source, a long PDF copy-paste, a CSV column, a Slack thread) and we pull every email address out into a clean list. The tool deduplicates them, groups them by domain, lets you filter, and exports as CSV / JSON / XLSX / TXT. Extraction runs in your browser, so your text never leaves your machine.

Think of it as a fast pre-cleanup before whatever you're doing next — building a contact list, auditing who's emailed your support inbox, or feeding the results into our email verifier to confirm which addresses are still live.
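The core extract, dedupe, and group-by-domain flow described above might look roughly like this. It is a sketch under stated assumptions: the pattern `EMAIL_RE` is a simplified, Latin-only regex chosen for illustration and is not the tool's real pattern, and the function names are hypothetical.

```python
import re
from collections import defaultdict

# Simplified Latin-only pattern (an assumption, not the tool's actual regex).
EMAIL_RE = re.compile(
    r"[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)*\.[A-Za-z]{2,}"
)


def extract_emails(text):
    """Return unique addresses, lowercased and sorted alphabetically."""
    return sorted({match.lower() for match in EMAIL_RE.findall(text)})


def group_by_domain(emails):
    """Map each domain to the list of addresses found at that domain."""
    groups = defaultdict(list)
    for email in emails:
        groups[email.rsplit("@", 1)[1]].append(email)
    return dict(groups)
```

For example, `extract_emails("Contact Kevin@Example.com or kevin@example.com, sales@acme.io.")` yields two addresses, because the two spellings of Kevin's address differ only in case and the trailing period is not part of the match.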

Where users actually paste content from

  • Long PDFs and Word documents — contract addenda, RFPs, vendor lists. Copy the text, paste here, get a clean list. Way faster than ctrl-F'ing for `@` symbols.
  • Webpage source — paste the HTML of a contact directory or team page. Our HTML strip option pulls emails out of `<a href="mailto:...">` links AND visible text without duplicating either.
  • Support inbox export — paste a few weeks of conversation history, get the list of everyone who's emailed you. Useful for building re-engagement segments.
  • Forum or Slack threads — extract every email mentioned in a discussion. Sometimes people drop addresses inline without using a proper `mailto:` link.
  • CSV column with mixed data — when a CSV has emails buried in a freeform `notes` field. Paste the column, we extract the emails.
  • Old email backups or .eml files — drop a `.eml` file directly into the upload box. We pull sender, recipient, and any addresses mentioned in the body.

How the smart parts work

HTML stripping (auto-detected)

Paste raw webpage HTML and a naive extractor would pull the same email twice: once from the visible text `Email: kevin@x.com` and once from the `<a href="mailto:kevin@x.com">` link. The strip-HTML option removes tags before extraction so you get one clean address per real occurrence. We auto-detect HTML in your paste and turn the option on by default; you can switch it off if you want literal tag attributes treated as text.
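One way to get this behavior is to collect only visible text and `mailto:` targets while parsing, then extract from that. The sketch below uses Python's standard `html.parser`; the class and function names and the simplified `EMAIL_RE` pattern are assumptions for illustration, not the tool's implementation.

```python
import re
from html.parser import HTMLParser

# Simplified Latin-only pattern (an assumption, not the tool's actual regex).
EMAIL_RE = re.compile(
    r"[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)*\.[A-Za-z]{2,}"
)


class TextAndMailtoCollector(HTMLParser):
    """Keeps visible text plus the targets of mailto: links; drops all other markup."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name == "href" and value and value.lower().startswith("mailto:"):
                self.chunks.append(value[len("mailto:"):])

    def handle_data(self, data):
        self.chunks.append(data)


def extract_from_html(html):
    """Return unique lowercased addresses in first-seen order."""
    parser = TextAndMailtoCollector()
    parser.feed(html)
    parser.close()
    unique = {}
    for match in EMAIL_RE.findall(" ".join(parser.chunks)):
        unique.setdefault(match.lower(), None)
    return list(unique)
```

Because the link target and the visible text both end up in the same text stream, an address that appears in both places is reported once. Query strings such as `?subject=Hi` fall outside the pattern and are dropped.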

Gmail dot-equivalence dedup

Gmail and Proton treat dots as cosmetic — `kevin.smith@gmail.com`, `kevinsmith@gmail.com`, and `kev.in.smith@gmail.com` all deliver to the same inbox. Our extractor groups these together by default so your final list has one entry per real mailbox, not three duplicates. Toggle off if you specifically want the original dot placement preserved.

The same logic applies to Gmail's `+tag` addressing. `kevin+sales@gmail.com` and `kevin+support@gmail.com` deliver to the same inbox as `kevin@gmail.com`, so we dedupe those too. This catches a common pattern where someone has used plus-addressing to track which form they signed up through: useful info, but the same person.
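Dot and plus-tag dedup comes down to computing a canonical key per address and keeping the first address seen for each key. A minimal sketch, assuming the rule is applied only to the domains listed in `DOT_INSENSITIVE_DOMAINS` (a hypothetical name; the set contains only `gmail.com` here and can be extended):

```python
# Domains where dots and +tags in the local part are ignored.
# The contents of this set are an assumption for the example.
DOT_INSENSITIVE_DOMAINS = {"gmail.com"}


def canonical_key(email):
    """Return a comparison key; addresses with equal keys share a mailbox."""
    local, _, domain = email.strip().lower().rpartition("@")
    if domain in DOT_INSENSITIVE_DOMAINS:
        local = local.split("+", 1)[0].replace(".", "")
    return f"{local}@{domain}"


def dedupe(emails):
    """Keep the first-seen spelling for each canonical mailbox."""
    first_seen = {}
    for email in emails:
        first_seen.setdefault(canonical_key(email), email)
    return list(first_seen.values())
```

Keeping the first-seen spelling rather than the canonical key means the output still shows an address the person actually used. Addresses at other domains are compared literally, so `kevin.smith@acme.com` and `kevinsmith@acme.com` stay separate.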

Domain include/exclude filter

Type a comma-separated list to filter results. `gmail.com, yahoo.com` keeps only emails at those domains. Prefix with `!` to exclude: `!gmail.com, !yahoo.com` removes those. Mix and match: `acme.com, !spam-domain.io`. Common use cases:

  • B2B prospecting — exclude free-mail domains with `!gmail.com, !yahoo.com, !hotmail.com, !outlook.com` to keep only business domains.
  • Internal audit — include only `yourcompany.com` to see who from inside is in the list.
  • Vendor cleanup — include only domains you've worked with: `vendor-a.com, vendor-b.com`.
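The include/exclude syntax can be modeled as two sets parsed from the comma-separated string. This sketch assumes exact domain matching (no subdomain handling) and that an exclusion always wins over an inclusion; both are assumptions, and `parse_filter` / `apply_filter` are hypothetical names.

```python
def parse_filter(spec):
    """Split a filter string into (include, exclude) sets of lowercase domains."""
    include, exclude = set(), set()
    for raw in spec.split(","):
        entry = raw.strip().lower()
        if not entry:
            continue
        if entry.startswith("!"):
            domain = entry[1:].strip()
            if domain:
                exclude.add(domain)
        else:
            include.add(entry)
    return include, exclude


def apply_filter(emails, spec):
    """Keep addresses allowed by the filter; an empty filter keeps everything."""
    include, exclude = parse_filter(spec)
    kept = []
    for email in emails:
        domain = email.rsplit("@", 1)[-1].lower()
        if domain in exclude:
            continue
        if include and domain not in include:
            continue
        kept.append(email)
    return kept
```

With only `!` entries the include set stays empty, so every domain not explicitly excluded passes through. That is what makes the B2B prospecting case above work.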

After extraction — what to do with the list

Pulling emails out is the easy part. Making them actually useful takes one more step.

Verify before sending anywhere

Extracted emails are pre-verification — we don't know if any of them still work. Run the list through our email verifier before importing into a CRM or hitting send on a campaign. The verifier opens an SMTP conversation with each recipient mail server and tells you which mailboxes still exist.

A typical extracted list from a 6-month-old document has 15-30% dead addresses. Importing those into your email tool without verification means sending campaigns to bouncing mailboxes, which tanks your sender reputation in the eyes of Gmail and Outlook.

Check syntax for the rough cases

If your source had typos (`kevin@gmail.con`) or unusual characters that slipped past our regex, run the list through our email syntax checker. It catches structural problems plus provider-specific rules (Gmail and Yahoo both reject dashes, for example) that the verifier might mark as bouncing without explaining why.

Find missing addresses

Sometimes your extraction misses people you know should be in the list. If you have their name + company domain, our email permutator generates every common email format for that combination so you can verify the most likely ones.
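As a rough illustration of what generating common formats from a name and domain means, the sketch below builds a handful of widely used local-part patterns. The pattern list and the function name `email_permutations` are assumptions for the example; the actual permutator may cover different or additional formats.

```python
def email_permutations(first, last, domain):
    """Return candidate addresses for a first name, last name, and domain."""
    f, l = first.strip().lower(), last.strip().lower()
    if not f or not l:
        raise ValueError("first and last name are both required")
    local_parts = [
        f"{f}.{l}",     # kevin.smith
        f"{f}{l}",      # kevinsmith
        f"{f[0]}{l}",   # ksmith
        f"{f[0]}.{l}",  # k.smith
        f,              # kevin
        l,              # smith
        f"{f}{l[0]}",   # kevins
        f"{f}_{l}",     # kevin_smith
        f"{l}.{f}",     # smith.kevin
    ]
    # dict.fromkeys removes duplicates while preserving order.
    return list(dict.fromkeys(f"{lp}@{domain.strip().lower()}" for lp in local_parts))
```

The candidates are guesses, not confirmed mailboxes, so each one still needs to go through verification before use.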

Is it legal to extract emails like this?

Extraction itself is just text processing — nobody can object to copying text and running a regex on it. The legal question is what you DO with the extracted list:

  • Cleaning your own data (your CRM, support inbox, contracts you've signed) — always legal. You already have the relationship.
  • Sending personal one-to-one messages to addresses you found in public business contexts — generally allowed under CAN-SPAM (US) and GDPR's legitimate-interest provisions, provided your message respects opt-out and identifies you clearly.
  • Bulk email campaigns to unverified extracted lists — risky. CAN-SPAM requires sender ID + physical address + unsubscribe; GDPR (EU/UK) wants explicit consent or strong legitimate-interest documentation; CASL (Canada) requires express consent before the first message.
  • Scraping public-facing pages to build sales lists — operates in a gray zone. Legal in the US for truly publicly listed business contacts; restricted in the EU under GDPR even when the data is technically public.

The safest rule

Use extraction to clean up data you already own or have a legitimate context for. Bulk-blasting unverified scraped lists isn't a tactic that works long-term — your domain gets blacklisted before the campaign finishes.

What the tool doesn't catch (and what to do)

A few edge cases our regex misses:

  • Obfuscated emails — `kevin (at) example (dot) com`. People write these on public pages to dodge harvesters. If you're extracting from a page with obfuscated emails, you'll need to manually fix those. Our email obfuscator explains the formats this typically takes.
  • Internationalized addresses — `用户@example.com`. Our regex matches Latin characters only. International support is on the roadmap.
  • Quoted local parts — `"Kevin Smith"@example.com`. RFC-valid but virtually never used in real mail.
  • Images with email text — `<img>` tags showing email addresses as pictures. No way to extract without OCR.

For obfuscated or image-based emails, you'll need eyes-on review. The regex pulls the easy 95%. The remaining 5% are designed not to be extracted by automated tools — which is exactly why people obfuscate them in the first place.
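For the simplest bracketed obfuscations, a small pre-processing pass can rewrite the text before extraction. The sketch below is not part of the tool; it is an assumed helper that handles only `(at)` / `(dot)` style tokens in round, square, or curly brackets, and it will also rewrite any unrelated text that happens to contain those tokens, so the output still needs review.

```python
import re

# Match "at" or "dot" wrapped in (), [] or {}, plus any surrounding whitespace.
AT_RE = re.compile(r"\s*[\(\[\{]\s*at\s*[\)\]\}]\s*", re.IGNORECASE)
DOT_RE = re.compile(r"\s*[\(\[\{]\s*dot\s*[\)\]\}]\s*", re.IGNORECASE)


def deobfuscate(text):
    """Rewrite bracketed at/dot tokens into '@' and '.' characters."""
    text = AT_RE.sub("@", text)
    return DOT_RE.sub(".", text)
```

Unbracketed forms such as `kevin at example dot com` are deliberately left alone, because ordinary sentences contain the words "at" and "dot" far too often for a blind replacement to be safe.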

Frequently Asked Questions

Common questions about email extraction, file support, and dedup behavior.

It pulls every email address out of any text or file you paste in. It deduplicates them (with smart handling for Gmail's dot-equivalence), groups them by domain, lets you filter to specific domains, and exports the clean list as CSV, JSON, XLSX, or TXT. Extraction is all client-side, so your text never leaves your browser.

Not directly — we accept text-based file formats only (.txt, .md, .html, .csv, .json, .eml). For PDFs, copy the text content from the PDF and paste it into the tool. Most PDF readers let you do this with ctrl-A then ctrl-C. For PDFs where the text is actually images (scanned documents), you'd need OCR first.

Yes — paste the HTML directly and switch on the "Strip HTML tags before extracting" option. We auto-detect HTML in your paste and enable the option by default. This prevents the same email showing up twice (once from a `<a href="mailto:...">` link, once from the visible text next to it).

Gmail and Proton treat dots in the local part as cosmetic. `kevin.smith@gmail.com`, `kevinsmith@gmail.com`, and `kev.in.smith@gmail.com` all deliver to the same inbox, so we count them as one. Plus-addressed Gmail variants (`kevin+tag@gmail.com`) are also collapsed to the base address. Toggle off the dedup option if you want the literal addresses preserved as separate rows.

Type a comma-separated list of domains. Plain entries are an allow-list: `gmail.com, yahoo.com` keeps only those. Prefix with `!` to exclude: `!gmail.com` drops Gmail addresses. Mix freely: `acme.com, !disposable.com` keeps Acme addresses and drops anything at the disposable domain.

Extraction itself is just text processing — running a regex on text you already have is fine everywhere. What you DO with the list is where laws come in. Sending unsolicited bulk email to scraped lists is restricted by CAN-SPAM (US, requires opt-out), GDPR (EU/UK, requires consent or documented legitimate interest), and CASL (Canada, strictest). Cleaning your own data (CRM, support inbox, signed contracts) has no restrictions.

Captured. When you enable HTML stripping, we still extract email addresses from the `href` attributes of `<a href="mailto:...">` links — they're treated as text once the surrounding tags are removed. So pasting a webpage's source captures both visible-text emails and `mailto:`-link emails.

No hard cap on pasted text, but file uploads are capped at 5 MB. For very large source files (millions of lines), processing happens in the browser so performance depends on your machine. Our typical user pastes between 1 KB and 1 MB of text and gets instant results.
