LangChain WhatsApp Tutorial

How to Build a WhatsApp AI Agent with LangChain and Whapi.Cloud: A Python Tutorial

Updated on June 1, 2026

This Whapi.Cloud tutorial shows Python developers how to build a WhatsApp AI agent with LangChain and LangGraph on a single hosted webhook. You wire the receive, reason and reply loop, then give each contact its own memory by setting the LangGraph thread_id to the sender's phone number. Replies go back as free-form text through the Whapi.Cloud API, with no local tunnel and no Meta business verification to set up first. It is written for backend developers comfortable with FastAPI and pip.

Build a WhatsApp AI agent with LangChain, LangGraph and Whapi.Cloud in Python

TL;DR: Run the receive, reason and reply loop on one Whapi.Cloud webhook. Set the LangGraph thread_id to the sender's phone number with a checkpointer, so every contact keeps their own memory. Return HTTP 200 in under a second and run the agent in a background task. No ngrok, no Meta verification, no message templates. Start with MemorySaver, swap to a Postgres checkpointer for production.

The three-step loop every WhatsApp agent runs

A WhatsApp AI agent is one loop: receive the message, reason with a tool-using agent, send the reply. Everything else is wiring around those three steps.

Receive, reason, reply loop for a WhatsApp AI agent on a single Whapi.Cloud webhook

In this build a FastAPI route receives the inbound message from Whapi.Cloud, a LangGraph agent powered by ChatOpenAI decides what to do, and a single REST call sends the answer back. The phone number that messaged you is the only identifier you need to carry through all three steps.

Connect the number and point one webhook at your app

Scan a QR code to connect the number, then paste your public URL into the channel's webhook settings. Inbound messages arrive as JSON POSTs the moment a contact writes to you.

In the official WhatsApp Business API you would register an app, pass Meta business verification, and run a verification handshake before a single message reaches your code. With Whapi.Cloud you connect a regular WhatsApp number by scanning a QR code, the same pairing flow WhatsApp Web uses, and the API is live in about two minutes. There is no Meta review queue between you and your first inbound payload.

Set the webhook URL to your deployed app's /webhook route and subscribe to the messages event in the channel settings. Whapi.Cloud then POSTs every incoming message to that route. The FastAPI handler below reads the sender's phone number and the text body out of the payload.

Setting the webhook URL and messages event in Whapi.Cloud channel settings — The channel settings screen where you paste the webhook URL and subscribe to the messages event.


# webhook.py -- receives inbound WhatsApp messages from Whapi.Cloud
from fastapi import FastAPI, Request

app = FastAPI()

@app.post("/webhook")
async def webhook(request: Request):
    data = await request.json()
    # Whapi delivers inbound messages in a "messages" array.
    for msg in data.get("messages", []):
        if msg.get("from_me"):
            continue  # skip your own outgoing messages echoed back
        sender = msg["from"]              # the contact's phone number, e.g. "14155551234"
        text = msg.get("text", {}).get("body", "")
        print(f"{sender}: {text}")
    return {"status": "ok"}

That sender value is the spine of the whole agent. It tells you who to reply to, and in a moment it becomes the key that keeps each conversation separate. See the Whapi.Cloud API documentation for the full inbound message schema.

Build the ReAct agent that picks tools, acts, then observes

A LangGraph ReAct agent is an LLM that picks a tool, runs it, reads the result, and repeats until it can answer. LangGraph supplies the loop; you supply the model and the tools.

LangChain gives you the model wrapper and the tool abstractions. LangGraph's create_react_agent wires them into a stateful graph so the agent can call a tool, observe the output, and decide its next move. You define each capability as a plain function decorated with @tool, then hand the list to the agent.


# agent.py -- a tool-using ReAct agent
from langchain_openai import ChatOpenAI
from langchain_core.tools import tool
from langgraph.prebuilt import create_react_agent

# Define tools as standalone functions.
# Decorating a bound method (def check_slots(self, ...)) raises a
# duplicate "self" argument error at agent-build time -- keep tools module-level.
@tool
def check_appointment_slots(day: str) -> str:
    """Return free appointment slots for a given day."""
    return "09:00, 11:30, 16:00"

model = ChatOpenAI(model="gpt-4o", temperature=0)
agent = create_react_agent(
    model,
    tools=[check_appointment_slots],
    prompt="You are a clinic's WhatsApp assistant. Keep replies short.",
)

The prompt argument sets the agent's standing instructions and is reapplied on every turn, so the assistant's role stays stable even as the conversation grows. Keep tools small and single-purpose: one to read appointment slots, one to look up an order, one to escalate to a human. The model decides which to call from the tool's name and docstring, so write both as if they were API documentation.

This agent can already reason and call a tool. What it cannot do yet is remember anything. Invoke it twice and the second message starts from a blank slate, because nothing connects one call to the next. In practice that gap is where most first builds feel broken.

Set thread_id to the phone number, and every user keeps their own memory

Attach a checkpointer and pass thread_id equal to the sender's phone number on every call. That single line is the difference between one shared brain and one memory per contact.

Per-user memory pattern: phone number as LangGraph thread_id with a checkpointer

Without a checkpointer, every user shares one conversation state, so the second person to message inherits the first person's context. With a checkpointer plus a per-user thread_id, each contact gets an isolated thread. Use the phone number from the webhook as that thread_id and the routing solves itself. We call this the phone-as-thread_id rule, and it is the load-bearing decision in the whole build.


# memory.py -- one isolated conversation per phone number
from langgraph.checkpoint.memory import MemorySaver
from langgraph.prebuilt import create_react_agent

checkpointer = MemorySaver()  # in-memory; resets on restart
# For production, swap one line:
# from langgraph.checkpoint.postgres import PostgresSaver
# checkpointer = PostgresSaver.from_conn_string("postgresql://...")

agent = create_react_agent(model, tools=tools, checkpointer=checkpointer)

def reply_for(sender: str, text: str) -> str:
    # thread_id = phone number -> each contact keeps a separate conversation
    config = {"configurable": {"thread_id": sender}}
    result = agent.invoke({"messages": [("user", text)]}, config=config)
    return result["messages"][-1].content

MemorySaver keeps each thread in memory and is perfect for the prototype. It forgets everything on restart, which is fine until you deploy. The phone-as-thread_id rule does not change when you move to production; only the storage behind the checkpointer does. Open-source WhatsApp agents running in production use exactly this pattern, keyed by phone number.

Why "just use ngrok and the Business API" breaks first

The common tutorial path drags in a local tunnel, message templates, and a 24-hour reply window before any AI logic runs. The single-webhook path skips all three.

You will likely reach for the standard setup first: an ngrok tunnel so Meta can reach your laptop, plus the official Business API for sending. Here is exactly where it breaks. The tunnel URL changes on every restart and drops without warning, so your webhook silently stops receiving while your code looks fine. Then the sending side adds its own friction.

In the official WhatsApp Business API, any message you start outside a 24-hour window must be a pre-approved template, and roughly one in three templates is rejected on first review for format or category issues. Since July 1, 2025 Meta also bills per delivered template message, priced by category and country, a model with several hidden costs for developers. With Whapi.Cloud the agent answers with a free-form text reply through one API call, so there is no template approval queue and no per-template billing to police. That is the cost-predictability argument in one sentence: a reply you can send freely cannot be rejected or surcharged.

What the build needs	Single Whapi.Cloud webhook	ngrok + official Business API
Receiving messages locally	Hosted webhook URL, no tunnel	ngrok tunnel that rotates URLs and drops
Account setup	QR scan, live in ~2 minutes	Meta business verification, days to weeks
Sending a reply	Free-form text, one REST call	Pre-approved template, ~1 in 3 rejected
Reply timing	No 24-hour service window	Free-form only inside a 24-hour window
Send cost model	Subscription, no per-message template fee	Per-delivered-template billing since July 2025

Return 200 fast, run the agent in the background

Acknowledge the webhook with HTTP 200 immediately, then run the slow LLM call in a background task. A slow reply must never hold the webhook connection open.

Fast HTTP 200 acknowledgement with background agent processing for a WhatsApp webhook

Subscribe only to the messages event in the channel settings so your route is not woken by status updates and read receipts. Each inbound webhook payload carries the sender's number, the message type, and the text body; ignore anything where from_me is true so the bot does not answer its own messages.

An LLM call takes a few seconds; a webhook acknowledgement should take milliseconds. If your handler waits for the agent before returning, the delivery can time out and the same message gets re-delivered, so the user receives the reply twice. FastAPI's BackgroundTasks lets you return at once and process after.


# webhook_async.py -- fast 200, then reason and reply in the background
import os, requests
from fastapi import FastAPI, Request, BackgroundTasks

app = FastAPI()

def handle(sender: str, text: str):
    answer = reply_for(sender, text)  # the slow part: agent + LLM
    # POST https://gate.whapi.cloud/messages/text
    # If you block the webhook waiting for this, Whapi retries the
    # delivery and the contact gets the same answer twice.
    requests.post(
        "https://gate.whapi.cloud/messages/text",
        headers={"Authorization": f"Bearer {os.environ['WHAPI_TOKEN']}"},
        json={"to": sender, "body": answer},
        timeout=30,
    )

@app.post("/webhook")
async def webhook(request: Request, background: BackgroundTasks):
    data = await request.json()
    for msg in data.get("messages", []):
        if msg.get("from_me"):
            continue
        background.add_task(handle, msg["from"], msg.get("text", {}).get("body", ""))
    return {"status": "ok"}  # returned in milliseconds

The reply goes back through POST /messages/text with the sender's number in the to field and the agent's answer in body. Because the route returns before the agent finishes, deliveries stay fast and duplicate-reply bugs disappear. The pattern we encounter most often in broken first builds is a synchronous handler that blocks on the model and quietly trains the gateway to retry.

From prototype to production: swap MemorySaver for Postgres

MemorySaver remembers until the next restart; a Postgres checkpointer remembers across deploys and crashes. The swap is one line because the phone-as-thread_id rule stays identical.

Deploy the FastAPI app to any public host so the webhook URL is reachable, set WHAPI_TOKEN and your model key as environment variables, and replace MemorySaver with PostgresSaver. Conversation state then survives restarts, and the same per-user threads keep working without a code change in the agent itself.

Common errors first-time builders hit (and the fix)

Two errors trip up nearly every first build: the wrong OpenAI wrapper, and a tool defined as a class method. Both fail at startup with confusing messages.

If you pass a chat model name like gpt-4o to the legacy completion wrapper, OpenAI returns This is a chat model and not supported in the v1/completions endpoint. The fix is to use ChatOpenAI, not the completion-style OpenAI class.


# Wrong: completion wrapper rejects chat models
# from langchain_openai import OpenAI
# model = OpenAI(model="gpt-4o")  # -> v1/completions endpoint error

# Right: chat model wrapper
from langchain_openai import ChatOpenAI
model = ChatOpenAI(model="gpt-4o")

The second error appears when you decorate a bound method with @tool. LangChain reads self as a required tool argument and the agent build fails. Define tools as module-level functions and pass any shared state through a closure or a global client instead.

If you hit unexpected behavior on the WhatsApp side rather than in your Python, reach out to the Whapi.Cloud support team via the chat widget on whapi.cloud -- the team actively helps customers resolve production issues. We won't cover voice-note transcription or vector retrieval here; both sit on top of this same loop and deserve their own guide.

That is the whole build: one Whapi.Cloud webhook receives, a LangGraph agent reasons with memory keyed by each caller's phone number, and a single REST call sends the reply. Skipping tunnels, Meta verification, and template approvals is what keeps it this short. Teams automating appointment booking on this loop report no-shows falling by a quarter or more, which is why the per-user memory pattern is worth getting right. Wire the three steps as shown and the agent holds a real conversation across messages.

Get Your Free Whapi.Cloud Sandbox

About the Author

Jason Mitchell

Product Owner at Whapi.Cloud

Building WhatsApp integrations since 2019. Always happy to connect — whether you want to discuss an API use case, share feedback, or just talk shop. Find me on LinkedIn.

The three-step loop every WhatsApp agent runs
Connect the number and point one webhook at your app
Build the ReAct agent that picks tools, acts, then observes
Set thread_id to the phone number, and every user keeps thei...
Why "just use ngrok and the Business API" breaks first
Return 200 fast, run the agent in the background
From prototype to production: swap MemorySaver for Postgres
Common errors first-time builders hit (and the fix)

WhatsApp Call Webhooks: initiated, rin...

Subscribe to calls.post, ACK within 2s, and drive CRM from flat JSON statuses. C...

n8n vs Make vs Zapier WhatsApp API Automation Comparison

n8n vs Make vs Zapier: Which is Best f...

An in-depth comparison of n8n, Make, and Zapier for WhatsApp API automation in 2...

Zero-Token WhatsApp AI Assistant Architecture

How to Build a Free WhatsApp AI Assist...

Learn how to build a zero-token WhatsApp AI assistant with Meta AI and Whapi.Clo...

Programmatic WhatsApp Username Claim Hub

Don't Wait for the Rollout: How to Pro...

Meta's global username release introduces 128-character BSUIDs. Learn how to pro...

Frequently Asked Questions

LangChain WhatsApp Agent Questions

No. ngrok is only needed when a tunnel must expose a local machine. With Whapi.Cloud you connect a number by QR code and set a hosted webhook URL in the channel settings, so inbound messages POST straight to your deployed FastAPI app. The tunnel step that most tutorials require disappears entirely.

On the official Business API, messages outside a 24-hour window must be pre-approved templates, and Meta bills per delivered template. With Whapi.Cloud the agent sends a free-form text reply through <code>POST /messages/text</code>, so there is no template approval step and no per-template fee to manage.

Deploy the FastAPI app to a public host, set your token and model key as environment variables, and replace MemorySaver with a Postgres checkpointer. The phone-as-thread_id logic stays the same, but conversation state now survives restarts and deploys instead of resetting in memory.

Attach a checkpointer to the agent and pass <code>thread_id</code> equal to the sender's phone number on every invoke call. LangGraph then stores and restores a separate conversation per thread, so each contact keeps their own context. Without a checkpointer, all users share one state and conversations bleed together.

The webhook handler is blocking on the LLM call before returning. When the response is slow, the delivery times out and the message is re-sent, so the agent answers twice. Return HTTP 200 immediately and run the agent in a background task, then send the reply with a separate API call.

You decorated a class method with <code>@tool</code>, so LangChain reads <code>self</code> as a required tool argument and the agent build fails. Define each tool as a module-level function and pass any shared state through a closure or a global client instead of <code>self</code>.

See What Our Clients Built
with Whapi.Cloud

"Cart reminders with a 5% follow-up coupon lifted our recovery rate from 4% to 11%. Customers reply directly in WhatsApp — our team closes the sale right there."

Abandoned Cart Recovery

Hans M., Germany

"Managing 40+ segment groups became trivial — auto welcome messages, pinned updates, inactive member cleanup. Lead gen from WhatsApp groups grew 3x in two months."

Automated Group Management at Scale

Carlos S., Brazil

"Guests receive door codes, WiFi credentials, and a local guide automatically on arrival. Checkout is confirmed via a photo on WhatsApp. Front desk load dropped 40% in the first month."

Contactless Hotel Operations

Ana M., Romania

"Our deals channel has 12,000 subscribers. Whapi.Cloud scrapes competitors, filters duplicates, and auto-posts the top 5 daily. Channel growth tripled after switching to automated posting."

Automated Deal Channel Publishing

Katrin S., Germany

"We verified 93,000 active WhatsApp numbers from 180,000 contacts in 48 hours. Campaign open rates improved significantly by stopping spend on inactive numbers."

Large-Scale Audience Filtering

Sergio N., Spain

"Patients book appointments and check lab results on WhatsApp. The bot handles 200+ daily queries without staff. Appointment no-shows dropped 30% after automated 24h reminders."

Healthcare Bot — Scheduling & Results

Dr. Fernanda O., Brazil

"Post-purchase WhatsApp messages with a tailored discount at day 14. Birthday coupons see 45% redemption — far above our email rate. Repeat purchases via WhatsApp: 18% of total revenue."

WhatsApp Retention Campaigns

Lukas W., Germany

"Customers get a WhatsApp tracking link the moment their parcel ships. Support tickets dropped 35% in 3 months — mostly 'where is my order?' queries simply disappeared."

Automated Shipping Notifications

Matei P., Romania

Inhouse Developed & Managed

What is Whapi.Cloud?

Whapi.Cloud is an intuitive API that connects your business with WhatsApp -- directly and without complexity. Build support bots, schedule appointments, send notifications, manage groups and channels, automate order confirmations, and track everything with webhooks. Focus on growing your business while the API handles the messaging layer.

Our service provides full control and management of WhatsApp groups, communities and channels.

Add dynamics and new features: media, buttons, reactions, stories, orders and products. All of these are available to you for customer interaction.

Our care team will respond quickly and help you with any questions you may have!

Explore WhatsApp Automation View
demo

How to Build a WhatsApp AI Agent with LangChain and Whapi.Cloud: A Python Tutorial

The three-step loop every WhatsApp agent runs

Connect the number and point one webhook at your app

Build the ReAct agent that picks tools, acts, then observes

Set thread_id to the phone number, and every user keeps their own memory

Why "just use ngrok and the Business API" breaks first

Return 200 fast, run the agent in the background

From prototype to production: swap MemorySaver for Postgres

Common errors first-time builders hit (and the fix)

About the Author

Jason Mitchell

contents

recent posts

WhatsApp Call Webhooks: initiated, rin...

n8n vs Make vs Zapier: Which is Best f...

How to Build a Free WhatsApp AI Assist...

Don't Wait for the Rollout: How to Pro...

LangChain WhatsApp Agent Questions

See What Our Clients Built
with Whapi.Cloud

Hans M., Germany

Carlos S., Brazil

Ana M., Romania

Katrin S., Germany

Sergio N., Spain

Dr. Fernanda O., Brazil

Lukas W., Germany

Matei P., Romania

What is Whapi.Cloud?

How to Build a WhatsApp AI Agent with LangChain and Whapi.Cloud: A Python Tutorial

The three-step loop every WhatsApp agent runs

Connect the number and point one webhook at your app

Build the ReAct agent that picks tools, acts, then observes

Set thread_id to the phone number, and every user keeps their own memory

Why "just use ngrok and the Business API" breaks first

Return 200 fast, run the agent in the background

From prototype to production: swap MemorySaver for Postgres

Common errors first-time builders hit (and the fix)

About the Author

Jason Mitchell

contents

recent posts

WhatsApp Call Webhooks: initiated, rin...

n8n vs Make vs Zapier: Which is Best f...

How to Build a Free WhatsApp AI Assist...

Don't Wait for the Rollout: How to Pro...

LangChain WhatsApp Agent Questions

Do I need ngrok to build a WhatsApp AI agent with LangChain?

Do I have to use message templates to reply on WhatsApp?

How do I move the prototype to production?

How does per-user memory work in a LangGraph WhatsApp bot?

Why does my WhatsApp bot reply twice to the same message?

Why does my LangChain @tool decorator throw a duplicate "self" error?

See What Our Clients Builtwith Whapi.Cloud

Hans M., Germany

Carlos S., Brazil

Ana M., Romania

Katrin S., Germany

Sergio N., Spain

Dr. Fernanda O., Brazil

Lukas W., Germany

Matei P., Romania

What is Whapi.Cloud?

Control Groups and Channels

Use interactive messaging

Enjoy fast live support

See What Our Clients Built
with Whapi.Cloud