"Just call ChatGPT" isn't the answer

A raw general-purpose language model (a generic ChatGPT, Claude, or similar) knows nothing specific about your business. Ask it "do you ship the ZT-500 to Germany" or "what are your hours on Sundays" and it will produce a confident, entirely fabricated answer. Not because it's malicious, but because generating plausible-sounding text is what language models do.

The fix is to constrain the model to your actual business content and to put a thin rule layer on top that handles the things language models are unreliable at: deciding when to ask the visitor for their name, knowing when to stop pitching, recognizing a request for a live person, and routing that request to you quickly. None of this is exotic, but doing it well is the difference between "a chatbot" and "a chat assistant I'd actually put on my site."

How grounded retrieval actually works

Grounded means the bot's reply has to come from content you gave it: your website text, an FAQ you maintain, policy documents, maybe an inventory feed. When a visitor asks a question, the system first looks up the relevant chunks of your content, then hands those to the language model with the instruction "answer only from this."
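As a rough illustration of the "answer only from this" step, here is a minimal Python sketch. The function name `build_grounded_prompt` and the exact wording of the instruction are assumptions made for this example, not any vendor's actual prompt.

```python
def build_grounded_prompt(question: str, chunks: list[str]) -> str:
    """Assemble a prompt that restricts the model to the retrieved chunks."""
    if chunks:
        # Number each chunk so a reviewer can see what the model was given.
        context = "\n\n".join(f"[{i}] {c}" for i, c in enumerate(chunks, start=1))
    else:
        # Nothing was retrieved: say so explicitly rather than leaving a blank.
        context = "(no matching business content found)"
    return (
        "Answer only from the business content below. "
        "If the answer is not there, say you don't know and offer a handoff.\n\n"
        f"Business content:\n{context}\n\n"
        f"Visitor question: {question}"
    )
```

The point of the sketch is the shape: retrieved content goes in, the instruction fences the model to it, and the "nothing found" case is stated outright so the model has no reason to improvise.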

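The retrieval step is where most of the quality lives. Two different techniques each catch things the other misses: dense (meaning-based) retrieval matches what a question means, so it handles paraphrase, while lexical (keyword) retrieval matches exact terms, so it pins down product names and model numbers.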

A well-built assistant runs both in parallel and fuses their rankings, a technique the industry usually calls reciprocal rank fusion, or RRF. That way a question like "I need something durable for daily commuting" gets help from the dense side (it matches "durable" to a product description that says "reinforced construction"), and a question like "ZT-500 price" gets help from the lexical side (it pins the exact product to the #1 result). Neither approach alone is enough for a business with both narrative answers and branded products.
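For the curious, rank fusion fits in a few lines. The sketch below is a generic version of reciprocal rank fusion in Python: each document scores 1 / (k + rank) in every ranking it appears in, and the scores are summed. The constant k = 60 is a commonly used default, and the document IDs are made up for the example.

```python
from collections import defaultdict

def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several ranked lists of document IDs into one ranked list."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            # A document near the top of any list earns a larger share.
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID so the order is stable.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))

# Hypothetical rankings for the query "ZT-500 price".
dense = ["commuter-bag", "zt-500", "faq-hours"]
lexical = ["zt-500", "zt-500-price"]
fused = reciprocal_rank_fusion([dense, lexical])
```

Here `zt-500` appears in both lists, so it ends up first in `fused` even though the dense side alone ranked it second.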

Retrieval is also cost-aware. Every chunk of your content the system pulls into the prompt is a chunk the language model has to read, and prompt size is a real cost, both in money and in answer quality (more context can dilute focus). A well-built assistant doesn't stuff every possibly-relevant FAQ entry and product chunk into every turn; it picks a number of chunks appropriate to the question shape. A two-word product lookup gets a tight inventory slice; a comparative recommendation gets a wider slice of the catalog; an hours-and-location question gets just the location FAQs. The system also skips retrieval entirely when the rule layer already knows the next step. For example, on a Fit Check qualifying-answer turn where the bot just needs to record "$8,000" as the budget answer, there's nothing for the FAQ to add. That's how a chat assistant stays affordable at scale without quietly degrading the answers that need full context.
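One way to picture a chunk budget is as a small lookup table keyed by question shape. In the Python sketch below, the shape names, source names, and chunk counts are all illustrative assumptions; a real system would tune them against its own content.

```python
from enum import Enum
from typing import Optional

class QuestionShape(Enum):
    PRODUCT_LOOKUP = "product_lookup"
    COMPARISON = "comparison"
    HOURS_LOCATION = "hours_location"
    QUALIFYING_ANSWER = "qualifying_answer"

# Hypothetical budgets: (content source to search, maximum chunks to pull).
CHUNK_BUDGET = {
    QuestionShape.PRODUCT_LOOKUP: ("inventory", 3),
    QuestionShape.COMPARISON: ("catalog", 8),
    QuestionShape.HOURS_LOCATION: ("location_faq", 2),
}

def retrieval_plan(shape: QuestionShape) -> Optional[tuple]:
    """Return (source, max_chunks), or None when retrieval should be skipped."""
    # Shapes with no entry, such as a qualifying answer, skip retrieval.
    return CHUNK_BUDGET.get(shape)
```

A qualifying-answer turn has no entry in the table, so `retrieval_plan` returns `None` and the turn costs no retrieval at all.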

What you should not need to know as an operator: vector dimensions, embedding models, chunk sizes, rank fusion formulas. You should need to know: the bot answers from your content, it handles paraphrase, and it catches your product names exactly. If a vendor can't explain how the second of those works, that's a red flag.

A rule-based layer on top of the LLM

Here's a thing that sounds boring but matters a lot: pure prompt engineering is unreliable. If you just tell a language model "always ask for the visitor's name before closing the lead," it will comply often enough to look fine in a demo, and not reliably enough to run a real lead-capture flow on. Missing the name on some leads is a real business cost, and the failures are hard to catch during testing because they look identical to the successes until you review a week of transcripts.

The fix is a small deterministic policy layer: a rule-based decision per turn about what shape the next reply should take. Should the bot just answer? Should it answer and then offer to connect the visitor with a real person? Is it time to ask for a name? A contact method? Is the visitor winding down, in which case the bot should stop pitching? Is the visitor frustrated, in which case the bot should acknowledge that before anything else?

These decisions are made by code, not by asking the model. The model still writes the words, but the instructions it receives are different depending on which rule fired. Same vocabulary, different guardrails.
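To make "decided by code" concrete, here is a small Python sketch of a per-turn policy function. The state flags, the reply shapes, and above all the order of the rules are assumptions for illustration; the flags would be set by upstream detectors that are not shown.

```python
from dataclasses import dataclass
from enum import Enum, auto

class ReplyShape(Enum):
    ACKNOWLEDGE_FRUSTRATION = auto()
    WIND_DOWN = auto()
    ASK_NAME = auto()
    ASK_CONTACT = auto()
    HANDOFF = auto()
    ANSWER_AND_OFFER_CONNECT = auto()
    ANSWER = auto()

@dataclass
class TurnState:
    # Hypothetical flags, filled in by detectors before the policy runs.
    frustrated: bool = False
    winding_down: bool = False
    wants_human: bool = False
    has_name: bool = False
    has_contact: bool = False
    connect_offered: bool = False
    shows_buying_intent: bool = False

def decide_reply_shape(state: TurnState) -> ReplyShape:
    """Pick the shape of the next reply. First matching rule wins."""
    if state.frustrated:
        return ReplyShape.ACKNOWLEDGE_FRUSTRATION  # before anything else
    if state.winding_down:
        return ReplyShape.WIND_DOWN  # stop pitching
    if state.wants_human:
        if not state.has_name:
            return ReplyShape.ASK_NAME
        if not state.has_contact:
            return ReplyShape.ASK_CONTACT
        return ReplyShape.HANDOFF
    if state.shows_buying_intent and not state.connect_offered:
        return ReplyShape.ANSWER_AND_OFFER_CONNECT  # offered at most once
    return ReplyShape.ANSWER
```

The returned shape then selects which set of instructions the model receives for that turn. Because the function is ordinary code, the same state always yields the same shape, which is what makes it testable.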

Why this matters for you: it's how the bot stops asking for the visitor's email three turns in a row, how it refuses to offer to connect twice to the same visitor, and how it knows to wind down the conversation instead of keeping the pitch going when the visitor has clearly decided not to buy. None of that is a prompt trick. It's a policy layer.

The clearest example is qualifying before booking. Should the bot share your calendar link with every visitor, or only with those who fit your engagement criteria? Should it ask one qualifying question at a time so the conversation doesn't feel like an interrogation? When a visitor doesn't fit, should they get a polite decline plus a helpful resource (a DIY guide, a partner referral) instead of a calendar slot? Those are policy decisions, not prompt decisions. A pure-prompt bot can be told "ask qualifying questions before sharing the booking link" and will mostly comply, but the moments where it doesn't are the moments your sales calendar fills with bad-fit meetings. A real policy layer makes the rule deterministic: the link literally cannot be shared until the criteria check completes, and the not-a-fit branch always routes to whatever next-best resource you configured.
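A deterministic gate of this kind can be quite small. The Python sketch below is one possible shape, with a hypothetical `FitCheck` class and made-up criteria and thresholds; what matters is that the booking link is only reachable through `outcome()`.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional

@dataclass
class FitCheck:
    # Hypothetical criteria: question key -> test applied to the visitor's answer.
    criteria: dict[str, Callable[[float], bool]]
    answers: dict[str, float] = field(default_factory=dict)

    def next_question(self) -> Optional[str]:
        """Return the first unanswered criterion, so questions come one at a time."""
        for key in self.criteria:
            if key not in self.answers:
                return key
        return None

    def outcome(self) -> str:
        """Return 'ask', 'share_booking_link', or 'decline_with_resource'."""
        if self.next_question() is not None:
            return "ask"  # the check is incomplete, so the link stays locked
        fits = all(test(self.answers[key]) for key, test in self.criteria.items())
        return "share_booking_link" if fits else "decline_with_resource"

# Example thresholds are invented for illustration.
check = FitCheck(criteria={
    "budget": lambda dollars: dollars >= 5000,
    "timeline_weeks": lambda weeks: weeks <= 12,
})
check.answers["budget"] = 8000
```

After the budget answer is recorded, `check.outcome()` is still `"ask"` and `check.next_question()` is `"timeline_weeks"`. No wording in the prompt can change that, because the model never holds the link in the first place.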

Session state and the "snapshot" lead

A chat is a conversation. The bot needs to remember things across turns. When a visitor says their name on turn 2 and their phone number on turn 5, the bot has to hold onto "name given" the whole time, even if the intervening turns talk about something else. That's session state.

The important constraint: state is per-session and time-bounded. It's not a running memory of "everything this visitor has ever told us." It's scoped to the conversation, it expires, and it doesn't train the underlying model. That matters for privacy and for predictability.
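Here is a minimal Python sketch of per-session, time-bounded state. It assumes an in-memory store and a 30-minute idle timeout, both chosen only for illustration; a production system would more likely use a shared store with its own expiry settings.

```python
import time
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Session:
    last_seen: float
    facts: dict = field(default_factory=dict)

class SessionStore:
    def __init__(self, ttl_seconds: float = 30 * 60):
        self.ttl_seconds = ttl_seconds  # assumed idle timeout
        self._sessions: dict[str, Session] = {}

    def get(self, session_id: str, now: Optional[float] = None) -> Session:
        """Return the live session, or a fresh one if it is missing or expired."""
        now = time.time() if now is None else now
        session = self._sessions.get(session_id)
        if session is None or now - session.last_seen > self.ttl_seconds:
            # Expired state is dropped, not carried into a new conversation.
            session = Session(last_seen=now)
            self._sessions[session_id] = session
        session.last_seen = now
        return session
```

A name recorded on turn 2 is still there on turn 5, but once the session has sat idle past the timeout the same ID comes back empty.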

Lead capture is a related but separate event. Many chat platforms get this wrong by firing a notification every time the bot picks up a new fact about the visitor. A well-built system treats the lead as a single snapshot event: the bot collects what it can across the conversation, and when the visitor shares contact info, the system fires one notification with everything. No duplicate lead notifications, no CRM pollution, no "is this the same person" guessing game downstream. Intentional follow-up events (a returning visitor leaving a message for the owner, a calendar booking confirmation) are tracked separately so they reach you on purpose, but never as silent re-fires of the same lead.
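The snapshot idea can be sketched in Python as follows. `LeadTracker` and its `notify` callback are hypothetical names, and treating "email" or "phone" as the contact-info trigger is an assumption for the example.

```python
from typing import Callable

class LeadTracker:
    def __init__(self, notify: Callable[[dict], None]):
        self.notify = notify  # hypothetical hook, e.g. an email or CRM call
        self.facts: dict = {}
        self.lead_sent = False

    def record(self, key: str, value: str) -> None:
        """Store a fact; fire the lead notification once, when contact info arrives."""
        self.facts[key] = value
        has_contact = "email" in self.facts or "phone" in self.facts
        if has_contact and not self.lead_sent:
            self.lead_sent = True
            # One snapshot carrying everything collected so far.
            self.notify({"event": "lead", **self.facts})

    def follow_up(self, kind: str, payload: dict) -> None:
        """Intentional follow-ups are separate events, never a lead re-fire."""
        self.notify({"event": kind, **payload})

sent: list = []
tracker = LeadTracker(sent.append)
tracker.record("name", "Ana")            # no notification yet
tracker.record("phone", "555-0100")      # fires the single lead snapshot
tracker.record("email", "ana@example.com")  # stored, but no second lead
```

After these three calls `sent` holds exactly one lead event. In this sketch, facts learned after the snapshot stay in `tracker.facts` without triggering anything; how a real system syncs them downstream is a separate design choice.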

What a well-built chat assistant deliberately doesn't do

Some of the most useful architectural choices are about what the bot won't do: it won't learn from customer conversations, and it won't search the open internet.

That list is a feature set, not a limitation. A bot that "learns from customer conversations" is a bot you cannot audit. A bot that "searches the web" introduces a moving source of truth you cannot review or sign off on. Neither is acceptable on a small business site.

"No open-internet search" doesn't mean the bot can't use anything beyond your FAQ. It means it can only use systems you've explicitly connected: an inventory feed (so it can check whether the ZT-500 is in stock), a Google Calendar (so a qualified visitor sees real available time slots in the chat instead of a "click here to book" link out to another page), a CRM (so the lead lands in the same pipeline your team already uses). These are bounded, auditable connections. Each one is opt-in, configured once, and revocable at any time. The bot never reaches anywhere your operator wouldn't expect — but it can do useful things within the systems you've authorized, like presenting a list of bookable slots that you actually have free, instead of guessing at availability.

Per-industry tuning, briefly

A message like "severe pain and bleeding" means something different to a dental office, a legal intake form, and an e-commerce support queue. A well-tuned chat assistant recognizes that and routes accordingly. This is usually done with industry-specific keyword patterns and verticalized behavior rules: a dental bot knows what an emergency looks like; a legal bot doesn't try to diagnose an injury; an e-commerce bot cares about SKUs and order numbers.
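A stripped-down Python sketch of keyword-pattern routing might look like this. The patterns and route names are invented for the example and are far thinner than anything you would deploy; they only show how the same message can land in different places per industry.

```python
import re

# Hypothetical per-industry rules: (pattern, route). First match wins.
INDUSTRY_RULES = {
    "dental": [
        (re.compile(r"\b(severe pain|bleeding)\b", re.I), "emergency_callback"),
    ],
    "legal": [
        # Collect the facts of an injury; never attempt a diagnosis.
        (re.compile(r"\b(pain|bleeding|injur\w*)\b", re.I), "injury_intake"),
    ],
    "ecommerce": [
        (re.compile(r"\border\s*#?\s*\d+", re.I), "order_lookup"),
    ],
}

def route(industry: str, message: str) -> str:
    """Return the route for a message, or 'standard_reply' if no rule matches."""
    for pattern, destination in INDUSTRY_RULES.get(industry, []):
        if pattern.search(message):
            return destination
    return "standard_reply"
```

With these rules, "severe pain and bleeding" routes to an emergency callback for the dental office, to injury intake for the law firm, and to a standard reply for the online store.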

None of this is glamorous. It's the difference between a generic "AI for any business" that kind of works for everyone and a per-industry assistant that actually belongs on your site.

How Simple Business Bots handles each of these

Questions to ask any chat vendor

If you want a short checklist for evaluating any AI chat vendor, these five questions tend to separate the careful systems from the thin wrappers:

  1. "Can your bot look things up on the open internet, or only from content and systems I've explicitly provided or connected?" The right answer is "only content and systems you've connected" (your FAQ, website, inventory feed, calendar, CRM, etc.), not the open internet.
  2. "How does the bot decide when to hand off to me?" The right answer is some version of "a rule decides," not "the model decides on its own."
  3. "What does the bot do when it can't answer?" The right answer is that it admits it, captures contact info, and offers a handoff. It should not guess, and it should not leave the visitor at a dead end.
  4. "Do customer conversations train the underlying model?" The right answer is no.
  5. "Is there behavior specific to my industry, or is it the same bot for everyone?" Industry tuning isn't always essential, but it's a meaningful signal about how much care went into the product.