ITfiers (Internal)

WhatsApp AI Bot

Built an internally deployed WhatsApp AI assistant for ITfiers that combines GPT-4o-mini conversational AI with web browsing, web search, live screenshots, reminders, task management, and image OCR — all accessible through natural WhatsApp commands.

Industry: AI & Automation
Duration: < 1 day
Year: 2026

8

AI Features

AI chat, web browsing, web search, screenshots, reminders, task management, OCR, and messaging

English

OCR Support

Local, in-process OCR via Tesseract.js with no external service required

20 exchanges

Conversation History

Per-user in-memory chat history preserved across the session for coherent AI responses

Zero

Database Dependencies

No external database: all state is managed in-process for minimal deployment overhead

The Challenge

ITfiers needed a single, accessible productivity tool that team members and clients could interact with through WhatsApp — the messaging platform already in daily use — without requiring any additional apps or logins. The solution had to cover a broad set of use cases: AI-powered answers, real-time web lookups, visual page captures, scheduled reminders, a simple task list, and the ability to extract text from images. Routing all of these capabilities through one conversational interface, reliably and without a persistent database, required a carefully structured command detection and feature dispatch architecture.

Our Solution

We built a Node.js bot on the Baileys library, which connects to WhatsApp through the WhatsApp Web WebSocket protocol instead of depending on the official WhatsApp Business API. Incoming messages are routed through a keyword-based dispatcher in handler.js that maps natural-language triggers to eight dedicated feature modules. GPT-4o-mini powers open-ended AI chat, with each user's last 20 exchanges kept in memory as conversation history. Web pages are fetched with Axios and parsed with Cheerio, web search uses the Brave Search API, and Puppeteer captures full-page screenshots on demand. Tesseract.js runs OCR on any image sent to the bot, extracting English text without any external service. Reaction emojis (⏳ / ✅ / ❌) give users immediate processing feedback on every request.
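The dispatch step can be pictured as an ordered table of patterns that is checked before a chat fallback. This is a minimal sketch only: the trigger words, the `route` and `dispatch` names, and the message shape `{ text, hasImage }` are hypothetical, since the actual keywords used in handler.js are not documented here.

```javascript
// Hypothetical trigger table; the real keywords used in handler.js are not documented.
// Order expresses priority: the first matching pattern wins.
const ROUTES = [
  { name: "screenshot", pattern: /^(?:screenshot|ss)\s+(\S+)/i },
  { name: "browse", pattern: /^browse\s+(\S+)/i },
  { name: "search", pattern: /^search\s+(.+)/i },
  { name: "remind", pattern: /^remind(?:\s+me)?\s+(.+)/i },
  { name: "task", pattern: /^tasks?\b\s*(.*)/i },
];

// Map an incoming message (hypothetical shape: { text, hasImage }) to a module name and arguments.
function route(message) {
  // Images go straight to OCR regardless of any caption text.
  if (message.hasImage) return { name: "ocr", args: [] };
  const text = (message.text || "").trim();
  for (const { name, pattern } of ROUTES) {
    const match = text.match(pattern);
    if (match) return { name, args: match.slice(1) };
  }
  // Anything unmatched falls through to open-ended AI chat.
  return { name: "chat", args: [text] };
}

// Invoke the handler for the chosen module; `handlers` maps module names to async functions.
async function dispatch(message, handlers) {
  const { name, args } = route(message);
  const handler = handlers[name] ?? handlers.chat;
  return handler(message, ...args);
}
```

With this layout, adding a module means adding one row to the table and one entry to the handlers object.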

How We Built It

A detailed look at each layer of the bot's architecture.

1

AI Conversation Engine

Any message that does not match a specific command keyword is routed to the AI chat handler, which calls GPT-4o-mini via the OpenAI SDK. Each user's last 20 message–response pairs are stored in memory and included as context on every API call, so the bot can follow up on earlier topics within a session. A system prompt establishes the bot's identity as an ITfiers assistant, keeping tone and branding consistent across all conversations.
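A minimal sketch of the per-user history window follows, assuming each exchange is stored as one user message plus one assistant message in the Chat Completions message format. The system prompt wording and the function names are hypothetical, and the OpenAI call is injected as `createCompletion` so the windowing logic runs without network access.

```javascript
const MAX_EXCHANGES = 20; // keep the last 20 user/assistant pairs per user
// Hypothetical wording; the bot's actual system prompt is not published.
const SYSTEM_PROMPT = "You are the ITfiers assistant. Answer helpfully and concisely.";

const histories = new Map(); // userId -> [{ role, content }, ...]

// Assemble the message list: system prompt, prior turns, then the new user message.
function buildMessages(userId, userText) {
  const history = histories.get(userId) ?? [];
  return [{ role: "system", content: SYSTEM_PROMPT }, ...history, { role: "user", content: userText }];
}

// Append one exchange and drop the oldest messages beyond the window (2 messages per exchange).
function recordExchange(userId, userText, replyText) {
  const history = histories.get(userId) ?? [];
  history.push({ role: "user", content: userText }, { role: "assistant", content: replyText });
  const excess = history.length - MAX_EXCHANGES * 2;
  if (excess > 0) history.splice(0, excess);
  histories.set(userId, history);
}

// createCompletion(params) -> Promise<string>; injected so the OpenAI client can be swapped or mocked.
async function chat(userId, userText, createCompletion) {
  const reply = await createCompletion({ model: "gpt-4o-mini", messages: buildMessages(userId, userText) });
  recordExchange(userId, userText, reply);
  return reply;
}
```

With the official `openai` npm package (v4 or later), `createCompletion` could be `async (params) => (await openai.chat.completions.create(params)).choices[0].message.content`, where `const openai = new OpenAI()` reads `OPENAI_API_KEY` from the environment.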

2

Web Intelligence — Browse, Search, and Screenshots

Three separate handlers cover real-time web access. The browse command uses Axios to fetch a page and Cheerio to parse its HTML, then summarizes the content via GPT-4o-mini. The search command queries the Brave Search API and returns ranked results, falling back to AI-generated answers if no API key is configured. The screenshot command launches a headless Puppeteer browser, navigates to the target URL, captures a full-page screenshot, and sends the image directly into the WhatsApp chat — giving users a visual snapshot without leaving the app.
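The search handler's key-or-fallback behavior can be sketched as below. It assumes Brave's web search endpoint (`https://api.search.brave.com/res/v1/web/search`, authenticated with an `X-Subscription-Token` header) and a JSON response with results under `web.results`, each carrying `title`, `url`, and `description`; these details should be checked against the current Brave Search API documentation. `aiFallback` is a hypothetical stand-in for the GPT-4o-mini answer path, and `fetchImpl` defaults to the global `fetch` available in Node.js 18 and later.

```javascript
// Assumed Brave Search web endpoint; verify against the current API docs.
const BRAVE_ENDPOINT = "https://api.search.brave.com/res/v1/web/search";

// Return formatted top results, or defer to an AI answer when no API key is configured.
async function search(query, { apiKey, aiFallback, fetchImpl = globalThis.fetch, limit = 5 } = {}) {
  if (!apiKey) return aiFallback(query);
  const res = await fetchImpl(`${BRAVE_ENDPOINT}?q=${encodeURIComponent(query)}`, {
    headers: { Accept: "application/json", "X-Subscription-Token": apiKey },
  });
  if (!res.ok) throw new Error(`Brave Search returned HTTP ${res.status}`);
  const data = await res.json();
  const results = (data.web?.results ?? []).slice(0, limit);
  if (results.length === 0) return `No results found for "${query}".`;
  // One numbered entry per result: title, URL, then the snippet if present.
  return results
    .map((r, i) => `${i + 1}. ${r.title}\n${r.url}\n${r.description ?? ""}`.trim())
    .join("\n\n");
}
```

Keeping the fallback inside the handler means the bot still answers search requests in deployments that never configure a Brave key.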

3

Productivity — Reminders and Task Management

The reminders feature parses natural-language time expressions (e.g., 'in 10 minutes', 'in 2 hours') from the command, schedules a setTimeout, and sends the user a message when the timer fires. The task manager supports adding, listing, completing, and deleting tasks, with each user's task list stored in memory for the session. Both features operate without any external database, keeping the deployment footprint minimal while covering the most common day-to-day productivity needs.
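The reminder parsing and the per-user task list might look like the sketch below. The accepted phrasing (`in <number> <unit>`, optionally followed by `to <note>`), the function names, and the task fields are hypothetical. Because both the timers and the task lists live only in process memory, they are lost whenever the bot restarts.

```javascript
// Milliseconds per supported unit (hypothetical set of units).
const UNIT_MS = { second: 1_000, minute: 60_000, hour: 3_600_000, day: 86_400_000 };
const MAX_TIMEOUT_MS = 2 ** 31 - 1; // setTimeout cannot wait longer than this (about 24.8 days)

// Parse "in <n> <unit>[s] [to] <note>", e.g. "in 10 minutes to call Ali" or "in 2 hours".
// Returns null when the text does not match, so the caller can ask the user to rephrase.
function parseReminder(text) {
  const match = text.trim().match(/^in\s+(\d+)\s*(second|minute|hour|day)s?\b\s*(?:to\s+)?(.*)$/i);
  if (!match) return null;
  const [, amount, unit, note] = match;
  return { delayMs: Number(amount) * UNIT_MS[unit.toLowerCase()], note: note.trim() || "Reminder" };
}

// Schedule an in-memory timer; it does not survive a process restart.
function scheduleReminder(text, send) {
  const parsed = parseReminder(text);
  if (!parsed || parsed.delayMs > MAX_TIMEOUT_MS) return null;
  return setTimeout(() => send(`Reminder: ${parsed.note}`), parsed.delayMs);
}

// Per-user task list kept in process memory.
class TaskList {
  constructor() {
    this.tasks = [];
    this.nextId = 1;
  }
  add(title) {
    const task = { id: this.nextId++, title, done: false };
    this.tasks.push(task);
    return task;
  }
  complete(id) {
    const task = this.tasks.find((t) => t.id === id);
    if (task) task.done = true;
    return Boolean(task);
  }
  remove(id) {
    const before = this.tasks.length;
    this.tasks = this.tasks.filter((t) => t.id !== id);
    return this.tasks.length < before;
  }
  list() {
    if (this.tasks.length === 0) return "No tasks yet.";
    return this.tasks.map((t) => `${t.done ? "[x]" : "[ ]"} ${t.id}. ${t.title}`).join("\n");
  }
}

const taskLists = new Map(); // userId -> TaskList

function tasksFor(userId) {
  if (!taskLists.has(userId)) taskLists.set(userId, new TaskList());
  return taskLists.get(userId);
}
```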

4

Image OCR

The OCR handler activates whenever a user sends an image: Tesseract.js processes the image entirely in-process without calling any external service, then replies with the extracted English text. This makes it straightforward to pull text from screenshots, photos of documents, or any image shared through WhatsApp without additional infrastructure.
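A sketch of the OCR reply path follows, assuming the Tesseract.js v5 worker API (`createWorker("eng")` resolving to a worker whose `recognize(image)` returns `{ data: { text } }`). The worker factory is injected so the example runs without the package installed; `handleImage` and `makeRecognizer` are hypothetical names, and downloading the image buffer from WhatsApp (for example with Baileys' `downloadMediaMessage`) is left out.

```javascript
// Lazily create one OCR worker and reuse it for every image.
// createWorker is injected; with Tesseract.js v5 it would be require("tesseract.js").createWorker.
function makeRecognizer(createWorker) {
  let workerPromise = null;
  return async (image) => {
    workerPromise ??= createWorker("eng"); // English only, matching the bot's OCR language
    const worker = await workerPromise;
    return worker.recognize(image);
  };
}

// Run OCR on an image buffer and build the chat reply.
async function handleImage(imageBuffer, recognize) {
  const { data } = await recognize(imageBuffer);
  // Collapse long runs of blank lines that OCR output can contain.
  const text = (data.text ?? "").replace(/\n{3,}/g, "\n\n").trim();
  return text ? `Extracted text:\n\n${text}` : "I couldn't find any readable text in that image.";
}
```

Reusing a single worker avoids reloading the English language data for every image the bot receives.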

5

Message Routing and Group Awareness

The central handler.js module inspects every incoming message for a prioritized set of keyword triggers before falling through to AI chat. In private chats the bot always responds; in group chats it responds only when explicitly mentioned by name or trigger keyword, avoiding noise in busy group conversations. A GROUPS_ONLY environment flag lets operators restrict the bot to group chats entirely. Reaction emojis sent at the start and end of each operation give users a clear, instant signal of processing state — ⏳ when work begins, ✅ on success, and ❌ on error — without requiring a text acknowledgment for every request.
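The response rules and the reaction lifecycle can be sketched as two small functions. The `GROUPS_ONLY` value format, the bot name, the trigger list, and the function names are assumptions; the only WhatsApp-specific fact relied on is that group chat JIDs end in `@g.us`. With Baileys, the injected `react` callback would typically wrap `sock.sendMessage(jid, { react: { text: emoji, key: msg.key } })`.

```javascript
// Group chat JIDs end in "@g.us".
const isGroupJid = (jid) => jid.endsWith("@g.us");

// Hypothetical defaults; the real GROUPS_ONLY value format and trigger list are not documented.
const DEFAULT_CONFIG = {
  groupsOnly: process.env.GROUPS_ONLY === "true",
  botName: "itfiers",
  triggers: ["search ", "browse ", "screenshot ", "remind ", "task"],
};

// Private chats: always respond unless groupsOnly is set. Groups: only when addressed.
function shouldRespond({ isGroup, text = "", mentionedBot = false }, config = DEFAULT_CONFIG) {
  const { groupsOnly = false, botName = "itfiers", triggers = [] } = config;
  if (groupsOnly && !isGroup) return false;
  if (!isGroup) return true;
  const lower = text.toLowerCase();
  return mentionedBot || lower.includes(botName.toLowerCase()) || triggers.some((t) => lower.startsWith(t));
}

// Wrap a feature call with status reactions: ⏳ at start, ✅ on success, ❌ on failure.
async function withReactions(react, task) {
  await react("⏳");
  try {
    const result = await task();
    await react("✅");
    return result;
  } catch (err) {
    await react("❌");
    throw err; // let the caller decide whether to send an error message
  }
}
```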

Technology Stack

The tools and technologies powering this solution

Node.js, Baileys, OpenAI GPT-4o-mini, Puppeteer, Tesseract.js, Cheerio, Brave Search API, Axios