Hunt and extract text patterns from PDF documents using powerful regex tools.
TextHunter is available as both a web application and a native Windows desktop app. The desktop version runs entirely offline with all processing happening locally on your machine.
Download the latest Windows installer from the Releases page.
Desktop Features:
- ✅ Works completely offline (no internet required)
- ✅ All processing happens locally on your machine
- ✅ Native Windows application with system tray integration
- ✅ Automatic updates (coming soon)
See DESKTOP-README.md for installation and development instructions.
TextHunter is also available as a full-stack web application that allows you to upload PDF files, extract text content, and search for patterns using regular expressions. It features an intuitive Vue.js frontend with IndexedDB storage and a FastAPI backend for high-performance text processing.
- 📄 PDF Text Extraction - Upload and process PDF files with automatic text extraction
- 🔍 Regex Pattern Matching - Search for text patterns using custom regex or AI-generated patterns
- 🤖 Smart Regex Generation - Generate regex patterns from example strings
- 📊 Excel Export - Export extraction results to Excel with context
- 💾 Local Storage - Store PDFs and extracted text locally in your browser
- 🚀 Fast Processing - High-performance backend with async processing
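To give a feel for what "generate regex patterns from example strings" can mean, here is a minimal sketch of one simple approach: generalizing each character to its class and merging run lengths. This is an illustration only, not TextHunter's actual generator; the function names `char_class`, `tokenize`, and `guess_regex` are hypothetical.

```python
import re
from itertools import groupby


def char_class(ch: str) -> str:
    """Map a character to a regex token describing its class."""
    if ch.isascii() and ch.isdigit():
        return r"\d"
    if ch.isascii() and ch.isalpha():
        return "[A-Za-z]"
    return re.escape(ch)  # keep punctuation and other characters literal


def tokenize(example: str) -> list[tuple[str, int]]:
    """Collapse runs of the same class into (token, run_length) pairs."""
    classes = [char_class(ch) for ch in example]
    return [(tok, len(list(run))) for tok, run in groupby(classes)]


def guess_regex(examples: list[str]) -> str:
    """Build one pattern covering all examples; raise if their shapes differ."""
    if not examples:
        raise ValueError("at least one example is required")
    shapes = [tokenize(e) for e in examples]
    first = [tok for tok, _ in shapes[0]]
    if any([tok for tok, _ in s] != first for s in shapes):
        raise ValueError("examples do not share a common structure")
    parts = []
    for i, tok in enumerate(first):
        lengths = [s[i][1] for s in shapes]
        lo, hi = min(lengths), max(lengths)
        if lo == hi == 1:
            parts.append(tok)
        elif lo == hi:
            parts.append(f"{tok}{{{lo}}}")
        else:
            parts.append(f"{tok}{{{lo},{hi}}}")
    return "".join(parts)
```

For example, `guess_regex(["P-101", "P-2045"])` yields a pattern that accepts one letter, a hyphen, and three to four digits. A production generator would likely need to handle examples with differing structure, which this sketch simply rejects.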
Web Application:
- Frontend: Vue 3 + Vite + TypeScript + Tailwind CSS
- Backend: FastAPI + Python 3.12+
- Storage: IndexedDB (frontend) + in-memory processing (backend)
- Deployment: Ready for containerization and cloud deployment
Desktop Application:
- Framework: Tauri v2 (Rust-based)
- Frontend: Vue 3 (same as web, bundled)
- Backend: Python FastAPI sidecar (PyInstaller bundled)
- Platform: Windows (NSIS installer)
Prerequisites:
- Node.js 18+
- Python 3.12+
- uv (recommended Python package manager)
- Clone the repository
git clone https://github.com/Unmask06/text-hunter.git
cd text-hunter
- Start the application
# Windows PowerShell
.\launch.ps1
This will start both backend (port 8000) and frontend (port 5173) servers.
Backend:
cd backend
uv sync
uv run python -m texthunter
Frontend:
cd frontend
npm install
npm run dev
- Open your browser to http://localhost:5173/products/text-hunter/
- Upload PDF files using the file upload area
- Wait for text extraction to complete
- Configure your regex pattern or use the AI regex generator
- Extract matches and export results to Excel
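The matching step in the workflow above can be pictured with plain Python. The sketch below finds every match of a pattern in already-extracted text and keeps a few characters of context on each side, similar in spirit to exporting results "with context". It is an illustration under assumptions, not the backend's code; `find_with_context` and its `window` parameter are hypothetical names.

```python
import re


def find_with_context(text: str, pattern: str, window: int = 20) -> list[dict]:
    """Return each match with up to `window` characters of surrounding text."""
    results = []
    for m in re.finditer(pattern, text):
        start, end = m.span()
        results.append({
            "match": m.group(0),
            "start": start,
            "before": text[max(0, start - window):start],
            "after": text[end:end + window],
        })
    return results


if __name__ == "__main__":
    sample = "Pump P-101 feeds tank T-200."
    for row in find_with_context(sample, r"[A-Z]-\d{3}", window=5):
        print(row)
```

Each row is a plain dictionary, so it can be handed to any tabular writer when exporting.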
📚 Comprehensive documentation is available for TextHunter, including:
- Overview and target users
- How-to guides with examples
- Oil & Gas industry use cases (line lists, instrument lists, equipment lists)
Development:
cd frontend
npm run docs:dev
Visit http://localhost:5173/products/text-hunter/docs/ (or check console for actual port)
Production Build:
cd frontend
npm run docs:build
From the Application: Click the "Docs" button in the top-right corner of the TextHunter application.
The backend provides a REST API with the following endpoints:
- GET / - API information
- GET /health - Health check
- POST /extract - Extract matches (preview)
- POST /extract-all - Extract all matches
- POST /guess-regex - Generate regex from examples
- POST /export - Export to Excel
Full API documentation available at http://localhost:8000/docs when running.
cd backend
uv sync --group dev
uv run pytest # Run tests
cd frontend
npm run update-api # Regenerate TypeScript types from backend
text-hunter/
├── backend/ # FastAPI backend
│ ├── texthunter/
│ │ ├── main.py # FastAPI app
│ │ ├── api/ # API layer
│ │ ├── core/ # Business logic layer
│ │ ├── config/ # Runtime settings
│ │ └── utils/ # Shared utilities
│ └── tests/ # Backend tests
├── frontend/ # Vue.js frontend
│ ├── src/
│ │ ├── components/ # Vue components
│ │ ├── services/ # API and DB services
│ │ └── types/ # TypeScript types
│ └── public/ # Static assets
├── launch.ps1 # Development launcher
└── README.md
- Fork the repository
- Create a feature branch
- Make your changes
- Add tests if applicable
- Submit a pull request
MIT License - see individual component licenses for details.