Personal Project · OCR App
Textify
A full-stack OCR app that turns images and PDFs into clean, copyable text — using Google Drive's built-in document conversion as a zero-cost OCR engine instead of a paid Vision API.
- TypeScript
- React
- Vite
- NestJS
- Tailwind CSS
- Zod
- Axios
- Google Drive API
- Docker
Overview
Textify is a small full-stack OCR app: drag in a scan, screenshot, or PDF and get clean, copyable text back. I built it to scratch a real itch — piles of paper records I wanted indexed and searchable — but the part I’m proudest of is the engine. Instead of paying for a hosted OCR/Vision API, Textify performs OCR for free by leaning on a service that already does it well: Google Drive.
How it works
The clever bit is that uploading an image to Google Drive as a Google Doc makes Drive run OCR on it automatically. The NestJS backend turns that side effect into a proper OCR pipeline:
- Validate the upload: ≤ 5 MB, and one of png/jpg/jpeg/pdf.
- Rasterize: if it’s a PDF, split it into one PNG per page (pdf-to-png-converter).
- OCR each page by round-tripping through Drive:
  - upload the image with MIME type application/vnd.google-apps.document, which triggers Drive’s OCR,
  - export the resulting Doc as text/plain,
  - delete the temporary Drive file so nothing accumulates.
- Return one text string per page; the frontend joins them with newlines into a single document.
The result is a genuinely zero-cost OCR engine — no paid Vision API, no model to host — at the price of one network round trip per page.
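The upload/export/delete round trip can be sketched roughly as follows. This is a minimal illustration, not the app's actual code: the DriveLike interface and the names ocrPage and ocrPages are hypothetical stand-ins for the googleapis client calls, and the try/finally cleanup is my assumption about how to guarantee the temporary Doc is always removed.

```typescript
// Hypothetical stand-in for the three Drive operations the pipeline needs.
// In the real app these would wrap the googleapis Drive client; uploadAsDoc is
// assumed to request the target type application/vnd.google-apps.document.
interface DriveLike {
  uploadAsDoc(image: Uint8Array, imageMime: string): Promise<string>; // resolves to the new file id
  exportText(fileId: string): Promise<string>; // exports the Doc as text/plain
  remove(fileId: string): Promise<void>; // deletes the temporary file
}

// OCR one page: upload, export, and always delete the temporary Doc.
async function ocrPage(drive: DriveLike, image: Uint8Array, imageMime: string): Promise<string> {
  const fileId = await drive.uploadAsDoc(image, imageMime);
  try {
    return await drive.exportText(fileId);
  } finally {
    // Runs on success and on failure, so nothing accumulates in Drive.
    await drive.remove(fileId);
  }
}

// OCR every page in order: one full round trip per page.
async function ocrPages(
  drive: DriveLike,
  pages: Uint8Array[],
  imageMime: string = "image/png",
): Promise<string[]> {
  const texts: string[] = [];
  for (const page of pages) {
    texts.push(await ocrPage(drive, page, imageMime));
  }
  return texts;
}
```

With the per-page strings in hand, the join described in the last step is just texts.join("\n"). Processing pages sequentially keeps the output order trivially correct, at the cost of the one-round-trip-per-page latency noted above.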
Features
- Drag-and-drop upload with a click-to-browse fallback and a live file preview.
- Multi-format input: PNG, JPG/JPEG, and multi-page PDF.
- Validation on both ends — the 5 MB cap and allowed MIME types are enforced client- and server-side.
- Copy the extracted text to the clipboard or download it as a .txt file.
- Light / dark theme.
- Swagger / OpenAPI docs served at /api.
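The size and type check enforced on both ends could look something like the sketch below. It assumes the 5 MB cap means 5 × 1024 × 1024 bytes and that jpg and jpeg both arrive as image/jpeg; the names validateUpload, UploadMeta, and ValidationResult are illustrative, and the real app may well express the same rule as a Zod schema on the client and a Multer filter on the server.

```typescript
// Assumption: "5 MB" is taken as binary megabytes.
const MAX_BYTES = 5 * 1024 * 1024;

// png, jpg/jpeg, and pdf expressed as MIME types (jpg and jpeg share image/jpeg).
const ALLOWED_MIME: readonly string[] = ["image/png", "image/jpeg", "application/pdf"];

// Hypothetical minimal shape shared by a browser File and a server-side upload.
interface UploadMeta {
  size: number; // bytes
  mimeType: string;
}

type ValidationResult = { ok: true } | { ok: false; reason: string };

// One pure function, so client and server can apply the identical rule.
function validateUpload(file: UploadMeta): ValidationResult {
  if (ALLOWED_MIME.indexOf(file.mimeType) === -1) {
    return { ok: false, reason: "unsupported type: " + file.mimeType };
  }
  if (file.size > MAX_BYTES) {
    return { ok: false, reason: "file exceeds the 5 MB limit" };
  }
  return { ok: true };
}
```

Keeping the rule in one dependency-free function is a design choice for the sketch: the client check gives fast feedback, while the server check remains the one that actually protects the pipeline.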
Tech stack
- Frontend: React 18, TypeScript, Vite, Tailwind CSS, shadcn/ui (Radix), react-dropzone, react-hook-form + Zod, Axios, Sonner, next-themes.
- Backend: NestJS 10, TypeScript, Multer for uploads, @nestjs/swagger, @nestjs/serve-static.
- OCR engine: Google Drive API (googleapis) over OAuth2 (@google-cloud/local-auth).
- PDF handling: pdf-to-png-converter.
- Packaging: a single Docker image bundling frontend + backend.
Architecture and deployment
Frontend and backend are separate packages in development, but production ships as a single-origin container. The Docker build compiles the SPA, copies it into the NestJS app, and serves both the API and the static frontend from the same process, so the Axios client just posts to a relative files/upload path with no CORS to manage. OAuth credentials (credentials.json plus a saved token.json refresh token) are read from /etc/secrets/, which maps cleanly onto a host like Render, where they’re mounted as secret files.
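To see why the relative path sidesteps CORS, here is a small sketch using the standard URL class. The helper resolveUpload and the example hostnames are hypothetical; the only detail taken from the text is the files/upload path.

```typescript
// Resolve the upload path the way a browser would, relative to the page URL,
// and report whether the request stays on the page's own origin.
function resolveUpload(
  pageUrl: string,
  apiPath: string = "files/upload",
): { url: string; sameOrigin: boolean } {
  const page = new URL(pageUrl);
  const target = new URL(apiPath, page);
  // Origin = scheme + host + port; CORS only applies when it differs.
  return { url: target.href, sameOrigin: target.origin === page.origin };
}

// Single container: SPA and API share one origin.
const prod = resolveUpload("https://textify.example/");

// Split dev servers: an absolute API URL on another port is cross-origin.
const dev = resolveUpload("http://localhost:5173/", "http://localhost:3000/files/upload");
```

Here prod.sameOrigin is true and dev.sameOrigin is false, since the two dev servers differ by port. One caveat of a path without a leading slash: it resolves against the page's directory, so this setup relies on the SPA being served from the site root.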
Limitations
I kept the project honest about the tradeoffs, because they’re inherent to the approach:
- OCR quality is entirely Google Drive’s — it varies with scan quality, handwriting, and layout complexity.
- Every page is a full Drive round trip (upload → export → delete), so processing time for large multi-page PDFs grows linearly with the page count.
- It needs a Google account with Drive API access; the temporary docs count against that account’s Drive until they are deleted.
- A hard 5 MB per-file cap.
What I took away
Textify is my favorite kind of hack: solving a problem by noticing that the capability already exists somewhere for free, then wiring it up cleanly. The engineering wasn’t a model — it was the orchestration: validation, PDF rasterization, the upload/export/delete dance, and keeping the whole thing a single deployable artifact. The limitations are real, but for “make my paper searchable,” it’s exactly enough.