How ScamVerify Document Analysis Catches Fake PDFs

What ScamVerify Document Analysis Does

ScamVerify™ document analysis is an AI-powered tool that examines uploaded PDFs, images, and other documents for signs of fraud, manipulation, and connections to known scam operations. The system extracts the text, identifies embedded entities (phone numbers, URLs, email addresses), and checks each entity against 8 million+ threat records spanning FTC complaints, URLhaus malicious domains, ThreatFox indicators of compromise, and community reports.

Upload any document at scamverify.ai/document-checker. The analysis runs in seconds and delivers a plain-English risk assessment.

How It Works: The Analysis Pipeline

Step 1: Document Ingestion

Upload a PDF, image (JPG, PNG), or other document file. The system accepts any kind of document you want to verify, including invoices, contracts, offer letters, insurance documents, and government correspondence.

For image-based documents, the system uses optical character recognition (OCR) powered by GPT-4o to extract the text. This means photographed documents, scanned paper records, and image-only PDFs can all be analyzed.

Step 2: Text Extraction

The AI extracts all readable text from the document, including:

Body text from paragraphs, clauses, and descriptions
Headers and footers including letterhead information
Tables with structured data
Fine print and footnotes
Watermarks and stamps (when text-based)

The extracted text forms the foundation for all subsequent analysis. All text the extractor can read is captured and passed on to the later steps.

Step 3: Entity Identification

The AI identifies specific entities embedded in the document:

Entity Type	What Is Extracted	What Is Checked
Phone numbers	All formats (with/without country code, dashes, dots, parentheses)	FTC complaint database (2.9M+ phone summaries)
URLs and domains	Links, website references, QR code destinations	URLhaus (74,032 domains), ThreatFox (60,758 IOCs)
Email addresses	All email addresses in the document	Domain reputation, known fraud domains
Company names	Identified organizations	Cross-reference with reported entities
Financial details	Bank routing numbers, account numbers, payment instructions	Pattern analysis for known fraud schemes
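
As a rough illustration of the entity identification step, the sketch below pulls phone numbers, URLs, and email addresses out of extracted text with regular expressions. This is not ScamVerify's implementation, which the article describes as AI-driven; the patterns, the US-style phone format, and the function names are all assumptions made for the example.

```python
import re

# Illustrative patterns only (assumptions, not ScamVerify's rules).
# Phone: optional +1 prefix, then 3-3-4 digits with optional separators.
PHONE_RE = re.compile(r"(?:\+?1[\s.-]?)?\(?\d{3}\)?[\s.-]?\d{3}[\s.-]?\d{4}")
URL_RE = re.compile(r"https?://[^\s<>()]+", re.IGNORECASE)
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def normalize_phone(raw):
    """Reduce any formatting variant to its last 10 digits."""
    return re.sub(r"\D", "", raw)[-10:]


def extract_entities(text):
    """Return de-duplicated, sorted entities found in the text."""
    phones = {normalize_phone(m) for m in PHONE_RE.findall(text)}
    # Strip sentence punctuation that the URL pattern may have swallowed.
    urls = {m.rstrip(".,;:") for m in URL_RE.findall(text)}
    emails = {m.lower() for m in EMAIL_RE.findall(text)}
    return {
        "phones": sorted(phones),
        "urls": sorted(urls),
        "emails": sorted(emails),
    }
```

Normalizing phone numbers before lookup means "(555) 123-4567" and "555.123.4567" resolve to the same record, which is why the table lists "all formats" as a single entity type.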

Step 4: Threat Database Cross-Reference

Each extracted entity is checked against ScamVerify's threat intelligence databases:

FTC Phone Complaints. Phone numbers found in the document are checked against 2.9 million+ FTC phone complaint summaries. If a phone number in an invoice has hundreds of "Do Not Call" complaints, that is a significant red flag.

URLhaus Malicious Domains. URLs and domains in the document are checked against 74,032 known malicious domains tracked for malware distribution. A document linking to a URLhaus-flagged domain indicates a clear threat.

ThreatFox IOCs. Domains and URLs are also checked against 60,758 indicators of compromise from ThreatFox, which tracks malware command-and-control infrastructure, credential harvesting endpoints, and botnet infrastructure.

Community Reports. Entities are checked against ScamVerify's community-reported data, capturing threats that have been flagged by other users but may not yet appear in federal databases.
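
The cross-reference step can be pictured as a series of set and dictionary lookups. The sketch below is a minimal stand-in under stated assumptions: the four feeds are tiny hypothetical in-memory collections, and the function and field names are invented for illustration. The real service queries its own databases in ways this article does not describe.

```python
from urllib.parse import urlparse

# Hypothetical stand-ins for the threat feeds named above.
FTC_COMPLAINT_COUNTS = {"5551234567": 212}   # phone -> complaint count
URLHAUS_DOMAINS = {"malware-example.test"}
THREATFOX_DOMAINS = {"c2-example.test"}
COMMUNITY_REPORTS = {"5559876543", "phish-example.test"}


def domain_of(url):
    """Extract a lowercase hostname without a leading 'www.'."""
    host = urlparse(url).hostname or ""
    return host.lower().removeprefix("www.")


def cross_reference(phones, urls):
    """Return one match record per (entity, feed) hit."""
    matches = []
    for phone in phones:
        count = FTC_COMPLAINT_COUNTS.get(phone, 0)
        if count:
            matches.append({"entity": phone, "source": "FTC",
                            "detail": f"{count} complaints"})
        if phone in COMMUNITY_REPORTS:
            matches.append({"entity": phone, "source": "community",
                            "detail": "user-reported"})
    domain_feeds = (
        ("URLhaus", URLHAUS_DOMAINS),
        ("ThreatFox", THREATFOX_DOMAINS),
        ("community", COMMUNITY_REPORTS),
    )
    for url in urls:
        domain = domain_of(url)
        for source, feed in domain_feeds:
            if domain in feed:
                matches.append({"entity": domain, "source": source,
                                "detail": "listed domain"})
    return matches
```

Checking every entity against every feed, rather than stopping at the first hit, keeps the evidence from each source separate so the final report can list them individually.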

Step 5: Content Pattern Analysis

Beyond entity checking, the AI analyzes the document's content for patterns associated with fraud:

Urgency language such as "immediate action required," "your account will be suspended," or "deadline expires today"
Payment pressure including demands for wire transfers, gift cards, cryptocurrency, or other non-reversible payment methods
Authority impersonation where the document claims to be from a government agency, bank, or well-known company
Information harvesting where the document requests sensitive data (SSN, bank account, passwords) in ways that legitimate organizations do not
Inconsistencies between the claimed sender and the document's actual characteristics
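
To make the pattern categories concrete, here is a deliberately simple keyword-based sketch. It swaps in regular expressions for the AI inference the article describes, so it would miss paraphrases that a language model could catch. The category names and phrase lists are assumptions drawn from the examples above.

```python
import re

# Hypothetical phrase lists, seeded from the examples in this section.
FRAUD_PATTERNS = {
    "urgency": [
        r"immediate action required",
        r"account will be suspended",
        r"deadline expires today",
    ],
    "payment_pressure": [
        r"wire transfers?",
        r"gift cards?",
        r"cryptocurrency",
    ],
    "information_harvesting": [
        r"\bssn\b",
        r"social security number",
        r"passwords?",
    ],
}


def content_flags(text):
    """Map each triggered category to the phrases that triggered it."""
    lowered = text.lower()
    flags = {}
    for category, patterns in FRAUD_PATTERNS.items():
        hits = [m.group(0) for p in patterns if (m := re.search(p, lowered))]
        if hits:
            flags[category] = hits
    return flags
```

Returning the matched phrases, not just the category names, lets a report quote the exact wording that raised the flag.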

Step 6: Risk Assessment

The analysis produces a risk assessment that combines all findings into a clear, actionable evaluation:

Entity matches listing any phone numbers, URLs, or domains that appear in threat databases
Content flags highlighting language patterns associated with fraud
Overall risk level synthesizing all signals into a single assessment
Specific recommendations for next steps based on the findings
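
One way to picture how the findings might be combined is a weighted score mapped to a risk level. The weights, thresholds, and recommendation wording below are invented for illustration; the article does not say how ScamVerify actually synthesizes its signals.

```python
from dataclasses import dataclass


@dataclass
class RiskAssessment:
    level: str
    score: int
    entity_matches: list
    content_flags: list
    recommendations: list


def assess(entity_matches, content_flags):
    """Combine findings into one assessment (hypothetical weights)."""
    # Assumption: a threat-database hit weighs more than a language flag.
    score = min(100, 40 * len(entity_matches) + 15 * len(content_flags))
    if score >= 70:
        level = "high"
    elif score >= 30:
        level = "medium"
    else:
        level = "low"

    recommendations = []
    if entity_matches:
        recommendations.append(
            "Do not use the contact details or links in this document.")
    if content_flags:
        recommendations.append(
            "Confirm the request with the sender through a known channel.")
    if not recommendations:
        recommendations.append(
            "No indicators found; still verify through independent channels.")
    return RiskAssessment(level, score, list(entity_matches),
                          list(content_flags), recommendations)
```

For example, one database match plus two content flags scores 40 + 15 + 15 = 70 and lands in the "high" band under these assumed thresholds.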

What Documents Should You Check?

High Priority

Document Type	Why Check It	Common Fraud
Invoices from new vendors	71% of organizations face payments fraud	Altered bank details
Wire transfer instructions	$446M in real estate wire fraud annually	Modified routing numbers
Job offer letters	118% increase in fake postings since 2020	Identity theft via onboarding
Government notices	684,045 impersonation complaints	Fake IRS, SSA, court documents
Insurance documents	$308B+ annual insurance fraud	Fake policies and cards

Worth Checking

Contracts and agreements before signing
Tax documents from unfamiliar preparers
Shipping and customs notifications with payment demands
Loan and mortgage documents from non-traditional lenders
Any document requesting sensitive personal information

When to Be Especially Cautious

The document arrived unexpectedly
It demands immediate action or payment
It requests information via non-standard channels
The formatting or quality differs from previous correspondence
Contact information in the document does not match known, verified information

What Document Analysis Does NOT Do

Transparency matters. ScamVerify document analysis has specific capabilities and limitations:

It does not detect all types of manipulation. Visual alterations to documents (changed logos, modified images, recolored elements) require forensic analysis beyond text extraction. The system focuses on identifying connections to known threat infrastructure and fraudulent content patterns.

It does not replace legal review. For contracts, agreements, and legal documents, a qualified attorney should review the terms. AI analysis identifies fraud indicators, not unfavorable legal terms.

It does not validate document authenticity. The system cannot confirm that a document is genuinely from the claimed sender. It can identify indicators of fraud, but a clean result does not guarantee authenticity. Always verify through independent channels.

It does not scan for malware. Document analysis extracts and evaluates content. For malware detection (embedded executables, malicious macros), use dedicated antivirus software. For more on PDF malware, read our guide on PDF malware in 2026.

How It Compares to Manual Verification

Check	Manual	ScamVerify AI
Read full document content	5-15 minutes per document	Seconds
Look up every phone number in FTC database	Impractical for multiple numbers	Automatic, 2.9M+ records
Check every URL against threat databases	Requires multiple tools	Automatic, 134K+ threat indicators
Identify urgency and manipulation language	Subjective, easy to miss	Pattern matching against known fraud
Cross-reference entities across databases	Requires specialized tools	Simultaneous multi-database check

Manual verification remains valuable for context that AI cannot assess: Does this invoice match previous invoices from this vendor? Do you actually have an account with this company? Was this document expected? AI analysis and human judgment work best together.

Getting Started

1. Go to scamverify.ai/document-checker
2. Upload a PDF, image, or document file
3. Wait a few seconds for the AI analysis to complete
4. Review the risk assessment and entity check results
5. Take action based on the findings

Free registered accounts include document analysis as part of the standard lookup allocation. Paid subscribers receive higher limits. For details on plans and pricing, visit scamverify.ai/pricing.

FAQ

What file types can I upload?

ScamVerify document analysis accepts PDFs, JPG/JPEG images, PNG images, and other common document formats. Scanned documents and photographs of documents are supported through OCR text extraction.

Is my uploaded document stored?

ScamVerify processes your document for analysis and stores the results for your account history. The analysis results are accessible from your dashboard. Review the privacy policy for full details on data handling.

Can I check a document without creating an account?

Document analysis is available to registered users. Creating an account is free and takes under a minute. Free accounts include document checks as part of the 5 free lookups.

How accurate is the analysis?

The accuracy of entity matching depends on the coverage of the underlying databases: 2.9M+ FTC phone summaries, 74K+ URLhaus domains, and 60K+ ThreatFox IOCs. If a phone number or URL in the document appears in these databases, the match is an exact lookup rather than an estimate. Content pattern analysis uses AI inference and provides probabilistic assessments rather than binary results. A clean analysis result reduces risk but does not eliminate it. Always combine AI analysis with independent verification.

Can I use this for business document verification?

Yes. The B2B API includes document analysis endpoints for businesses that need to verify documents at scale. See the API documentation at docs.scamverify.ai for integration details.