
PII Scrubber for Plan Uploads

Automatic removal of owner names, addresses, and phone numbers from uploaded plans

Overview

When you upload a plan PDF to the marketplace, BlueClerk automatically scans every page for personally identifiable information (PII), such as owner names, property addresses, builder company information, phone numbers, and email addresses, and removes it before storing the file. This protects homeowner privacy and lets you sell plans without exposing sensitive customer data.

How It Works

Automatic Scanning

Every uploaded plan PDF is processed through Claude Vision AI:

  1. Every page is scanned - All pages in the PDF, not just the first
  2. PII locations are identified with precise coordinates
  3. White rectangles are drawn over detected PII using pdf-lib
  4. The cleaned PDF is stored in place of the original
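
Steps 2 and 3 can be pictured as a coordinate conversion. The sketch below assumes the detector reports each region as fractions of the page (0 to 1) measured from the top-left corner. That format, the PiiBox and PdfRect types, and the pad parameter are illustrative assumptions, not BlueClerk's actual schema. PDF pages place their origin at the bottom-left, so the vertical position has to be flipped before a rectangle is drawn.

```typescript
// Hypothetical shape of one detected PII region (an assumption for this sketch).
// All values are fractions of the page size, with the origin at the top-left.
interface PiiBox {
  x: number;      // left edge, 0..1
  y: number;      // top edge, 0..1
  width: number;  // 0..1
  height: number; // 0..1
}

// Rectangle in PDF points, with the origin at the bottom-left of the page.
interface PdfRect {
  x: number;
  y: number;
  width: number;
  height: number;
}

// Convert a fractional, top-left-origin box into a PDF rectangle.
// `pad` (in points) grows the box on every side so the cover is generous;
// the result is clamped so it never extends past the page edges.
function toPdfRect(box: PiiBox, pageWidth: number, pageHeight: number, pad = 0): PdfRect {
  const left = Math.max(0, box.x * pageWidth - pad);
  const right = Math.min(pageWidth, (box.x + box.width) * pageWidth + pad);
  const top = Math.max(0, box.y * pageHeight - pad);
  const bottom = Math.min(pageHeight, (box.y + box.height) * pageHeight + pad);
  return {
    x: left,
    y: pageHeight - bottom, // flip: PDF y is measured from the bottom edge
    width: right - left,
    height: bottom - top,
  };
}
```

With pdf-lib, a rectangle like this could be passed to page.drawRectangle({ ...rect, color: rgb(1, 1, 1) }), using page.getSize() for the page dimensions.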

What Gets Flagged

The scrubber is designed to catch:

  • Property/build addresses - Like "606 W. 18th Street Georgetown, TX" that appear in title blocks on every page
  • Property owner/client names - Listed in title blocks or headers
  • Builder company name AND office address - Your company info that appears in title blocks
  • Phone numbers and email addresses - Contact information in title blocks
  • "Residence" or "Spec Home" labels that include street addresses

What Doesn't Get Flagged

The scrubber is designed to preserve:

  • Plan type names - Like "The Sabine" or "Model 2400"
  • Lot numbers and subdivision names - Development information
  • Legal descriptions - Parcel and survey data
  • Dimensions and measurements - All technical specs
  • Room labels and building codes - Functional plan information
  • Sheet numbers - Like C1.0, A1.0, etc.
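
One way to think about the two lists above is as a category filter applied to whatever the detector finds. The sketch below is only an illustration: the category names, the Detection type, and the selectForRedaction function are hypothetical, and the real decision is made by the AI detection step rather than by a lookup table.

```typescript
// Hypothetical category labels, mirroring the two lists above.
type RedactedCategory = "address" | "owner_name" | "builder_info" | "phone" | "email";
type PreservedCategory =
  | "plan_name"
  | "lot_or_subdivision"
  | "legal_description"
  | "dimension"
  | "room_label"
  | "sheet_number";

// Hypothetical record for one piece of text found on a page.
interface Detection {
  text: string;
  category: RedactedCategory | PreservedCategory;
}

// Categories that should be covered before the file is stored.
const REDACTED: ReadonlySet<string> = new Set<string>([
  "address",
  "owner_name",
  "builder_info",
  "phone",
  "email",
]);

// Keep only the detections that belong to a redacted category.
function selectForRedaction(detections: Detection[]): Detection[] {
  return detections.filter((d) => REDACTED.has(d.category));
}
```

Given an address, a plan name such as "The Sabine", and a sheet number such as "A1.0", only the address would be selected for redaction.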

Title Block Detection

The scrubber pays special attention to title blocks, which typically appear in the right margin of every page as a vertical strip. This is where most PII is concentrated on architectural plans.
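
As a rough illustration of what "the right margin as a vertical strip" means geometrically, the sketch below checks whether a region's horizontal center falls inside a strip along the right edge of the page. The 15% strip width, the FractionalBox type, and the function name are assumptions made for the example; the article does not say how BlueClerk locates title blocks.

```typescript
// Hypothetical region expressed as fractions of the page width (0..1).
interface FractionalBox {
  x: number;     // left edge, 0..1
  width: number; // 0..1
}

// Returns true when the horizontal center of the box lies inside a strip
// along the right edge of the page. `stripFraction` is the strip's width
// as a fraction of the page width (0.15 is an assumed default).
function isInRightTitleStrip(box: FractionalBox, stripFraction = 0.15): boolean {
  const centerX = box.x + box.width / 2;
  return centerX >= 1 - stripFraction;
}
```

A check like this could be used to give regions in the strip extra scrutiny, for example by applying more padding to them.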

Blob Path Improvements

Your uploaded plan files now use readable filenames instead of random UUIDs:

  • Preview links show real plan names - Like "sabine-2400-sqft.pdf" instead of "abc123.pdf"
  • Downloads use original filename - When buyers download your plan, they get a meaningful filename
  • Better organization - Easier to identify plans in your blob storage
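
A readable filename like the one above can be derived from a plan name with a small slug function. This is a minimal sketch, not BlueClerk's implementation: the function name, the "plan" fallback, and the exact character rules are assumptions.

```typescript
// Turn a plan name into a lowercase, hyphen-separated filename.
// Accents are stripped, runs of other characters become a single hyphen,
// and an empty result falls back to "plan" (an assumed default).
function toReadableFilename(planName: string, extension = "pdf"): string {
  const slug = planName
    .normalize("NFKD")                // split accented letters into base + mark
    .replace(/[\u0300-\u036f]/g, "")  // drop the combining marks
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-")      // collapse anything else into one hyphen
    .replace(/^-+|-+$/g, "");         // trim leading and trailing hyphens
  return `${slug || "plan"}.${extension}`;
}
```

For example, toReadableFilename("Sabine 2400 SqFt") gives "sabine-2400-sqft.pdf". A real storage path would likely also need a unique prefix so that two plans with the same name don't collide.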

After Upload

Once PII scrubbing completes:

  • You see a PII count - The upload confirmation shows how many items were redacted
  • Your plan is ready - The cleaned PDF can be published to the marketplace
  • Original data is gone - The scrubbed version permanently replaces the original

Why This Matters

Privacy Protection

  • Homeowner safety - Published plans can't be traced back to real people or their addresses
  • Compliance - Helps you meet data privacy regulations
  • Professional risk - Protects you from accidentally exposing customer information

Marketplace Trust

  • Buyers feel safe - They know plans don't contain real addresses
  • Sellers protected - You don't have to manually redact every plan
  • Platform quality - BlueClerk maintains high standards for all marketplace content

Questions

Q: What if the scrubber misses something?
A: The detection prompt asks for generous bounding boxes, but if you notice PII that wasn't caught, contact support. We continuously improve the prompt based on feedback.

Q: Can I see what was removed?
A: The upload confirmation shows how many PII items were detected and redacted. The specific text isn't logged, for privacy reasons.

Q: Does this slow down my upload?
A: PII scrubbing adds a few seconds per page, but it runs automatically in the background. You'll be notified when it's complete.

Q: What if I want to include the builder company name?
A: The scrubber removes builder company info by default to protect business information. If you want your company name visible, you can add it to the plan description or preview images after upload.
