Automated Lead Generation System

Scrapes real estate listings, extracts agent data, and syncs directly into a CRM.

RLA Studios Dashboard
RoleFounder & Lead Engineer
Timeline2024 - Present
StackNode, FastAPI, React, PostgreSQL
TypeBusiness Platform
01

The Big Picture

A dual-runtime platform that automates lead scraping, client management, invoice generation, file delivery, and commission tracking. It replaces spreadsheets, manual Airtable entry, and disconnected tools.

RLA Studios is a full-stack business operations platform built for a real estate videography company. It's a dual-runtime monolith: a Node/Express frontend server and a Python/FastAPI backend sidecar, deployed as three Docker containers (Express, FastAPI, PostgreSQL) behind Traefik on a VPS via Dokploy.

The platform connects lead intake, CRM management, invoice generation, file delivery, commission tracking, and admin operations through a single interface, replacing what was previously a tangle of 4-5 disconnected tools.

0Leads managed
0API endpoints
0Lines of source code
RLA Studios Website

1.0 - rlastudios.com, the public-facing marketing site with lead capture, built alongside the platform.

02

The Problem

Every step in the business workflow was a context-switch between disconnected tools.

Before this platform, the workflow was entirely manual: browse Redfin and Compass listings one-by-one, copy-paste agent contact info into Airtable, generate invoices in Canva or Google Docs, manually upload PDFs to Dropbox, then remember who sold which client and who's owed commission.

Scaling past a handful of clients meant drowning in admin work instead of shooting videos. Every new client added more manual steps, not less.

Manual Lead Research

Browsing listing sites one-by-one, copy-pasting agent contact info into spreadsheets and Airtable by hand.

Disconnected Invoicing

Creating invoices in Canva/Google Docs, manually uploading to Dropbox, then tracking payment status separately.

No Commission Tracking

Remembering who sold which client and who's owed what. Tracked in someone's head, not a system.

Tool Sprawl

Every step required context-switching between Redfin, Airtable, Canva, Dropbox, and spreadsheets.

03

System Architecture

A dual-runtime monolith: Node gateway + Python sidecar, deployed as three Docker containers behind Traefik.

Express acts as the gateway and proxy. It handles auth, sessions, audit logging, and commission interception, then forwards business-logic requests to FastAPI via internal HTTP. The Python sidecar is never publicly exposed; an internal API key guard enforces this. WebSocket connections for the scraper are upgraded and piped through raw TCP.

PostgreSQL handles platform state (users, leads, settings, services, audit logs, commissions) across 8 tables with Drizzle ORM for type-safe queries. Airtable serves as the external CRM for client records, and Dropbox handles file storage for generated invoices.

Admin Dashboard

2.0 - Admin dashboard showing integration status, quick actions, and lead management.

01

Express Gateway

Handles auth, sessions, audit logging, WebSocket proxy, and commission interception. Forwards business logic to FastAPI via internal HTTP.

02

FastAPI Sidecar

Scraping engine, invoice generator, Airtable CRUD, Dropbox uploads, and address geocoding. Never publicly exposed, guarded by internal API key.

03

Dual Database

PostgreSQL for platform state (8 tables via Drizzle ORM). Airtable as external CRM, preserving the existing workflow while adding automation on top.

04

Scraper Pipeline

Paste listing URLs, get structured agent data in Airtable with real-time WebSocket progress.

Paste one or more Redfin or Compass listing URLs into the scraper UI. The Python backend auto-detects the platform and scrapes each page using BeautifulSoup with a multi-strategy extraction pipeline: first tries regex on the “Listed by” section, then falls back to JSON-LD structured data extraction, then uses DuckDuckGo HTML search as a last resort to find listings by address.

Results stream back via WebSocket in real-time with structured message types (log, result, complete, error), so each URL's progress is visible as it happens. Deduplication checks against existing Airtable records prevent duplicate entries. New leads are written directly to Airtable with agent name, phone, email, listing address, listed date, and days on market.

Partial street addresses extracted from listing URLs are auto-geocoded via Nominatim (OpenStreetMap) to full street, city, state, ZIP format. An in-memory cache prevents redundant lookups.

Lead Scraper

3.0 - Paste Redfin/Compass URLs, watch real-time WebSocket progress as each listing is scraped and written to Airtable.

05

Invoicing Engine

Generate branded PDF invoices, auto-upload to Dropbox, sync to Airtable, and track commissions. All in one click.

Select a client, pick services from a configurable catalog, and generate a branded PDF invoice using ReportLab with custom Montserrat typography and background template overlays. The output is a two-page PDF: page one is the invoice with pixel-precise coordinate-based layout, page two is an auto-populated Video Ownership Agreement with the client's name injected into legal text.

The PDF is auto-uploaded to Dropbox and the share link is synced back to the client's Airtable record. Marking an invoice as paid overlays a PAID stamp via PyMuPDF without regenerating the original, re-uploads the stamped version, and creates a commission ledger entry. The Express proxy layer transparently captures payment events and writes audit + commission entries without the Python backend needing to know.

Invoice Generator

4.0 - Select services from the catalog, preview the branded PDF in real-time, then generate and auto-upload to Dropbox.

06

CRM & Admin Panel

Role-based admin panel with lead management, client CRM, commission tracking, and a sandboxed demo mode.

The CRM manages nearly 2,000 real estate agent leads with full outreach pipeline tracking. Each record stores contact info, listing details (address, URL), and tracks the entire outreach lifecycle: when they were texted, days since last contact, follow-up dates, and response status.

Quick-action buttons let the team send templated messages with one tap: opener texts, follow-ups, pricing info, questions, and emails. Each button generates a pre-filled SMS deep link with the agent's first name, listing address, and a customizable message template pulled directly from the CRM. No copy-pasting, no app switching.

Lead status flows through a defined pipeline: New Lead, Interested, Need to Follow Up, Followed Up, Not Interested, Ghosted, or Hired. Video production status is tracked per lead (Not Started, In Progress, Completed), and invoices attach directly to completed work. Team attribution tracks who scraped each lead for commission purposes.

The admin panel layers on top with role-based access control (superadmin, admin, viewer, demo), rate-limited login (5 attempts per 15 minutes), bcrypt hashing, session-based auth, and audit logging. A sandboxed demo mode lets prospects explore the full panel. All mutations are intercepted and short-circuited so demo users can click every button without touching real data.

Client CRM

5.0 - Nearly 2,000 agent leads with outreach status, follow-up tracking, and one-tap templated messaging.

5.1 - One-tap outreach: click a button, get a pre-filled SMS with the agent's name and listing address pulled from the CRM.

Commission Tracking

5.2 - Commission ledger with invoice links, sold-by attribution, and payment status toggling.

07

Automation

If a human doesn't need to make a judgment call, a system handles it.

01

Scraper to Airtable

Paste URLs, auto-detect platform, scrape agent info, deduplicate, write to Airtable. Real-time WebSocket streaming shows each URL's progress.

02

Invoice Lifecycle

Generate PDF, upload to Dropbox, sync link to Airtable, mark paid, overlay PAID stamp, re-upload, create commission entry. One click.

03

Address Completion

Partial addresses from listing URLs are auto-geocoded via Nominatim to full street, city, state, ZIP format. In-memory cache prevents redundant lookups.

On first startup, the system auto-creates a default superadmin user and seeds a 7-item service catalog. Zero manual setup needed post-deploy. The configurable service catalog lets admins add, edit, and reorder line items that populate invoice generation, so pricing changes don't require code changes.

08

Design Decisions

Intentional architectural choices and the reasoning behind them.

01

Dual Runtime

Python has better scraping and PDF libraries (BeautifulSoup, ReportLab, PyMuPDF). Node has better session middleware and Drizzle ORM. Each runtime handles what it's best at.

02

Airtable as CRM

The business was already running on Airtable. Rather than forcing a migration, the platform wraps Airtable's API with retry/backoff and uses it as the source of truth for client data.

03

Commission Interceptor

Rather than modifying the Python backend, the Express proxy intercepts successful mark-paid responses and writes commission entries as a side effect. Zero cross-service coupling.

WebSockets were chosen over polling for scraper progress because scraping multiple URLs takes 5-30 seconds each. Polling would add latency and complexity. WebSockets give instant per-URL feedback with structured message types, making the scraper feel interactive rather than batch-oriented.

Multi-stage Docker builds with Alpine cut the final image size significantly. The builder stage installs all dev dependencies and runs Vite + esbuild, then the runtime stage copies only build output and production deps.

09

Challenges

The hardest problems encountered and how they were solved.

Scraping reliability across platforms.Redfin and Compass have completely different HTML structures and no consistent API. The scraper uses a multi-strategy pipeline: regex on the “Listed by” section first, then JSON-LD structured data, then DuckDuckGo HTML search as a fallback. Property type detection filters out land/lots to avoid irrelevant leads.

Invoice PDF pixel alignment.Overlaying dynamic text on branded background templates (Canva-exported PNGs) required a manual coordinate system with per-field (x, y) tuning. A debug grid mode draws 10/50/100-point gridlines to help align text visually. Text wrapping for long service descriptions uses ReportLab's simpleSplit with column-width constraints to prevent overlap.

Airtable rate limits. Airtable enforces 5 requests/second. The service layer implements exponential backoff with 3 retries on 429 responses and connection errors. Pagination handles bases with 100+ records via offset tokens.

WebSocket proxy through Express.The scraper's WebSocket connection originates from the browser, hits Express, and needs to reach FastAPI. The httpServer upgrade handler manually pipes the TCP connection through, forwarding the internal API key.

10

Outcomes

0Source files
0PostgreSQL tables
0Docker services

The platform replaced a workflow that required constant context-switching between 4-5 disconnected tools with a single interface that handles the entire business lifecycle. Lead research that used to take hours of manual browsing now happens in seconds with the scraper. Invoicing that required Canva, manual uploads, and spreadsheet tracking is now a one-click pipeline.

The dual-database architecture means the business can continue using Airtable as a familiar CRM interface while the platform adds automation, commission tracking, and audit logging on top, serving as a pragmatic bridge between existing workflow and full automation.

11

Retrospective

What worked, what I'd change, and what I learned.

01

Use the Right Runtime

The dual-runtime approach was the fastest path to production. Python for scraping/PDFs, Node for auth/sessions/ORM. Pragmatism over purity.

02

Wrap, Don't Replace

Wrapping Airtable instead of migrating away from it preserved the existing workflow and reduced adoption friction. Meet the business where it is.

03

Automate the Full Loop

Half-automating a workflow just moves the bottleneck. The invoice lifecycle works because it's automated end-to-end, from generation through commission tracking.

If rebuilding from scratch: replace Airtable with PostgreSQL entirely to eliminate the dual-database complexity, add a task queue like BullMQ for durable scraping jobs, unify to a single Node runtime with Puppeteer and pdf-lib, and invest in E2E tests. The platform works, but refactoring is riskier than it needs to be without automated test coverage.