AI Outbound Calling: Feasibility, Cost, Law, and the Business Case

Feasibility: can AI hold a real phone conversation?

Yes, with real caveats. AI handles a scripted, goal-directed outbound call at a quality that gets work done in favorable conditions. It cannot yet replicate natural, unstructured human conversation reliably at scale. The gap between a polished vendor demo and a production line handling thousands of calls is measured in 15 to 25 percentage points of task completion.

~200ms

Human turn-taking gap. Above 900ms callers start talking over the agent.

1.4-1.7s

Median latency in real production, versus the sub-300ms figures vendors advertise. p99 runs 3 to 5 seconds.

15-25pp

Task-completion drop from clean demo to real PSTN, accents, and noise.
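A minimal Python sketch of how a team might check its own call logs against these numbers. It computes the median and p99 of measured response latencies and the share of turns above the ~900ms threshold. The function name `latency_report`, the sample values, and the measurement point (end of caller speech to first agent audio) are assumptions for illustration.

```python
import statistics

TURN_GAP_LIMIT_MS = 900  # from the text: above ~900ms callers start talking over the agent


def latency_report(samples_ms: list[float]) -> dict[str, float]:
    """Summarize per-turn response latencies in milliseconds.

    Assumes each sample runs from the end of caller speech to the first agent
    audio; statistics.quantiles needs at least two samples.
    """
    # n=100 yields the 99 cut points p1..p99; index 98 is p99.
    cuts = statistics.quantiles(samples_ms, n=100, method="inclusive")
    over = sum(1 for s in samples_ms if s > TURN_GAP_LIMIT_MS)
    return {
        "p50_ms": statistics.median(samples_ms),
        "p99_ms": cuts[98],
        "share_over_limit": over / len(samples_ms),
    }


report = latency_report([1400, 1500, 1600, 1700, 3200])
print(report)
```

With these illustrative samples the median is 1600ms and every turn exceeds the 900ms limit, which is the production picture the figures above describe.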

Where it works in production

Appointment scheduling, reminders, and confirmation calls (bounded, low emotion). Medical practices have automated up to 91% of reservation and appointment calls.
Outbound surveys and notifications (no negotiation, predictable script).
B2B qualification follow-up with phone-savvy recipients in quiet environments.
Debt and payment reminders with clear resolution paths.

Where it still breaks

Emotional attunement: upset customers, sensitive news, awkward pauses.
Accents and background noise (55-65 dB) cut transcription accuracy 15-30%, and ASR produces plausible near-misses ("cancel my order" heard as "schedule my order").
Multi-turn degradation: instruction-following falls sharply after 3-5 turns; GPT-4o drops to 50% on multi-turn function calling.
Adversarial callers: under price or complaint pressure, an LLM "eventually concedes" unless a function layer enforces the position.
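The last point argues for keeping hard business rules out of the prompt. A minimal sketch, assuming the LLM can only quote a price by calling a tool (here a hypothetical `offer_price` tool handled by `apply_price_tool`), so the floor lives in code the model cannot be talked around:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OfferPolicy:
    list_price: float
    floor_price: float  # hypothetical: the lowest price the business has approved


def apply_price_tool(policy: OfferPolicy, requested_price: float) -> dict:
    """Handler for a hypothetical `offer_price` tool the LLM must call to quote a price.

    The model may request any number; only this code decides what is offered,
    so persistent caller pressure cannot move the price below the floor.
    """
    if requested_price < policy.floor_price:
        return {
            "approved": False,
            "price": policy.floor_price,
            "say": f"The best I can offer is ${policy.floor_price:.2f}.",
        }
    # Never quote above list price either.
    return {"approved": True, "price": round(min(requested_price, policy.list_price), 2)}


policy = OfferPolicy(list_price=100.0, floor_price=80.0)
print(apply_price_tool(policy, 90.0))   # within bounds: approved as requested
print(apply_price_tool(policy, 50.0))   # pushed below the floor: refused, floor returned
```

The model still chooses the wording of the reply; it simply never gets a code path that commits to a price below the floor.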

Architecture note: cascaded speech-to-text then LLM then text-to-speech dominates production in 2026 for debuggability, compliance, and cost. End-to-end speech-to-speech (OpenAI Realtime, Gemini Live) preserves tone but costs roughly 10x more and follows instructions worse.
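To illustrate the cascaded shape, here is a non-streaming sketch of one conversational turn with each stage timed separately. The three stages are passed in as plain functions, stand-ins for real speech-to-text, LLM, and text-to-speech clients (none are called here). Production pipelines stream partial results between stages rather than waiting for each to finish.

```python
import time
from dataclasses import dataclass
from typing import Callable


@dataclass
class TurnLog:
    transcript: str       # what speech-to-text heard
    reply_text: str       # what the LLM decided to say
    reply_audio: bytes    # what text-to-speech produced
    timings_ms: dict[str, float]


def run_turn(audio: bytes,
             stt: Callable[[bytes], str],
             llm: Callable[[str], str],
             tts: Callable[[str], bytes]) -> TurnLog:
    """Run one non-streaming cascaded turn: speech-to-text, text LLM, text-to-speech.

    Keeping the intermediate text and per-stage timings is what makes the cascade
    easy to debug and audit: a bad reply traces to a mishearing or to the LLM.
    """
    t0 = time.perf_counter()
    transcript = stt(audio)
    t1 = time.perf_counter()
    reply_text = llm(transcript)
    t2 = time.perf_counter()
    reply_audio = tts(reply_text)
    t3 = time.perf_counter()
    timings = {"stt": (t1 - t0) * 1000, "llm": (t2 - t1) * 1000, "tts": (t3 - t2) * 1000}
    return TurnLog(transcript, reply_text, reply_audio, timings)


# Stub stages stand in for real provider clients.
log = run_turn(
    b"cancel my order",
    stt=lambda audio: audio.decode(),
    llm=lambda text: f"You asked to: {text}. Is that right?",
    tts=lambda text: text.encode(),
)
print(log.transcript, log.timings_ms)
```

Because every stage is a swappable function, changing the speech-to-text or text-to-speech vendor does not touch the LLM logic.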

The stack: build, buy, or hybrid

Three shapes: a managed platform (Vapi, Bland, Retell), a self-assembled pipeline (telephony plus your own speech-to-text, LLM, and text-to-speech), or an open framework like Pipecat that gives you the pipeline plumbing without writing raw audio-socket code.

Option	Shape	Barge-in	Best for
Retell AI	Managed, custom-LLM mode	Built in (~800ms)	Fastest credible MVP. Your server returns each line over a websocket.
Vapi	Managed, swappable parts	Built in (sub-600ms)	Quick ship, tool-calling via webhooks.
Bland.ai	Managed, visual pathways	On by default	No-code flows, bring-your-own Twilio.
Twilio / Telnyx	Telephony + media-stream socket	Your code (VAD)	The transport layer under any self-build. Telnyx is ~30-50% cheaper than Twilio.
Pipecat	Open framework (MIT)	Built-in VAD processor	Self-host at scale; swap any provider; native tool-call handlers.

Recommended path

MVP: Retell AI in custom-LLM mode. It owns all telephony and audio, fires a websocket event on every turn, and your thin server returns the next line. You get barge-in, warm transfer, analytics, and number provisioning without operating audio infrastructure.

Scale: migrate to Telnyx (telephony) + Pipecat (framework) + Deepgram Nova-3 (speech-to-text) + a text LLM like Claude Haiku + Cartesia Sonic (text-to-speech). All-in around $0.05/min, which is what makes resale at $0.15 to $0.25/min profitable. Keep your LLM logic identical across the move.

What a call actually costs

Modeled at 150 words per minute of agent speech and a blended 3-minute connected call. Speech-to-text and text-LLM are rounding errors; text-to-speech and the realtime-audio LLM are where the money goes.

Per-minute cost by layer, from current published vendor pricing.
Layer	Vendor	$/min
Telephony	Telnyx (US outbound)	$0.008
Telephony	Twilio (US outbound)	$0.014
Speech-to-text	Deepgram Nova-3	$0.005
LLM (text path)	GPT-4o-mini text	$0.002
LLM (realtime audio)	OpenAI gpt-realtime	$0.10-0.30
Text-to-speech	Cartesia Sonic	$0.034
Text-to-speech	ElevenLabs Flash	$0.045

All-in, by stack. A phone number adds ~$1/month each; a local-presence pool of 50-100 numbers is $50-115/month.
Stack	$/min	3-min call
Self-assembled budget (Telnyx + Deepgram + GPT-4o-mini + ElevenLabs Flash)	$0.060	$0.18
Self-assembled premium	$0.111	$0.33
Managed (Bland Build plan)	$0.120	$0.36
Managed (Retell, mid)	$0.165	$0.50
Native realtime audio (OpenAI gpt-realtime, mid)	$0.165	$0.50
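The self-assembled rows are sums of the per-layer prices times call length. A small sketch that reproduces the budget row and also prices the Cartesia-based scale stack from the recommended path. It uses the table's GPT-4o-mini figure as a stand-in for the text LLM because no Claude Haiku price is listed (an assumption), and it ignores hosting compute for a self-hosted framework. The dictionary keys are illustrative labels.

```python
# $/min per layer, copied from the per-layer table; keys are illustrative labels.
LAYER_PRICE_PER_MIN = {
    "telnyx": 0.008,
    "twilio": 0.014,
    "deepgram_nova3": 0.005,
    "gpt4o_mini_text": 0.002,
    "cartesia_sonic": 0.034,
    "elevenlabs_flash": 0.045,
}


def stack_cost(layers: list[str], call_minutes: float = 3.0) -> tuple[float, float]:
    """Return ($/min, $/call), assuming every layer bills per connected minute."""
    per_min = sum(LAYER_PRICE_PER_MIN[name] for name in layers)
    return round(per_min, 4), round(per_min * call_minutes, 4)


def number_pool_monthly(count: int, price_per_number: float) -> float:
    """Monthly rental for a pool of phone numbers."""
    return round(count * price_per_number, 2)


budget = stack_cost(["telnyx", "deepgram_nova3", "gpt4o_mini_text", "elevenlabs_flash"])
# Scale stack: Cartesia for text-to-speech; GPT-4o-mini price stands in for the text LLM.
scale = stack_cost(["telnyx", "deepgram_nova3", "gpt4o_mini_text", "cartesia_sonic"])
print(budget, scale)
print(number_pool_monthly(50, 1.00), number_pool_monthly(100, 1.15))
```

The budget stack comes to $0.060/min and $0.18 per 3-minute call, matching the table; the scale stack sums to about $0.049/min. The pool figures bracket the $50-115/month range in the caption.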

Getting a number is instant via Twilio (local $1.15/mo, toll-free $2.15/mo) or Telnyx (from $1.00/mo). For cold outreach, local area-code numbers get 30-60% higher answer rates than toll-free numbers, but carriers scrutinize high call volume from any single local number.

What you can charge, and the margin

Per-minute pricing is commoditizing fast (managed rates fell from $0.25/min in 2023 to $0.11-0.15/min in 2026). The leverage is in outcome pricing, where the customer buys a booked meeting, not a minute.

Service pricing models versus underlying cost.
Model	Market price	Your cost	Gross margin
Per-minute markup	$0.20-0.50/min	$0.06-0.12/min	60-75%
Per-call	$0.75-2.00/call	$0.18-0.50/call	40-75%
Per-seat / month	$99-499/mo	by usage	30-75%
Per-appointment booked	$50-300/appt	$15-40/appt	80-95%+

The per-appointment math

At a realistic 5% connect rate and 10% book rate, one booked meeting takes ~200 dial attempts, costing roughly $16 on the budget stack (or $30-40 premium). Sell that meeting at $100-200 and gross margin before overhead is 75-90%. The anchor: a fully-loaded human SDR costs $300-500 per booked meeting, so AI undercuts by 3x to 10x. The catch is conversion risk: if carrier spam filtering drags connect rates down or your list is weak, the margin evaporates fast.
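The funnel math fits in a small function. The text does not state a cost per dial attempt; $16 over 200 dials implies about $0.08 per attempt on the budget stack, and that implied figure is used below as an assumption. Dropping the connect rate shows how fast the margin erodes.

```python
def meeting_economics(connect_rate: float, book_rate: float,
                      cost_per_dial: float, price_per_meeting: float) -> dict[str, float]:
    """Cost and gross margin per booked meeting for an outbound dialing funnel.

    cost_per_dial is averaged over every attempt, connected or not; the $0.08
    used below is implied by the text's $16 per 200 dials, not a quoted price.
    """
    dials = 1 / (connect_rate * book_rate)  # attempts needed per booked meeting
    cost = dials * cost_per_dial
    margin = (price_per_meeting - cost) / price_per_meeting
    return {"dials": round(dials), "cost": round(cost, 2), "gross_margin": round(margin, 3)}


base = meeting_economics(0.05, 0.10, cost_per_dial=0.08, price_per_meeting=100)
# Same list after spam flagging pushes the connect rate down to 2%.
weak = meeting_economics(0.02, 0.10, cost_per_dial=0.08, price_per_meeting=100)
print(base)  # 200 dials, $16 cost, 0.84 margin
print(weak)  # 500 dials, $40 cost, 0.6 margin
```

At a $100 sale price, falling from a 5% to a 2% connect rate takes gross margin from 84% to 60% with nothing else changed.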

The legal landmine (read this first)

This is the part that decides whether the business survives. It is not optional hygiene; it is the current law, with a private right of action that lets any individual sue.

Every AI voice call is a regulated "artificial voice" call

On February 8, 2024 the FCC ruled (FCC 24-17) that AI-generated voice, including realtime synthesis and voice cloning, is an "artificial or prerecorded voice" under the TCPA. There is no "it sounds natural" exception and no B2B carve-out. Calling a cell phone with an AI voice without prior express consent is a violation worth $500 per call, or up to $1,500 if willful or knowing, with no cap. A 100,000-call campaign is $50M to $150M of exposure. TCPA class actions routinely settle for tens of millions.

The compliance floor, before the first call

Consent. Prior express written consent for any marketing call; prior express consent for informational calls. Pre-checked boxes and buried fine print do not count.
Calling hours. 8am to 9pm local time at the recipient's location (several states are stricter). Use the called party's physical location, not their area code.
Do Not Call. Scrub every list against the National DNC Registry, keep an internal DNC for 5 years, honor opt-outs promptly.
Per-call identity + opt-out. State the caller and company, give a callback, and offer an automated "stop calling" the recipient can trigger mid-call.
AI self-disclosure. Open every call with "You are speaking with an AI assistant." It costs nothing and pre-empts the wave of state bot-disclosure laws (California SB 1001, Florida, Texas SB 140 with treble damages).
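A pre-call gate can enforce the mechanical parts of this floor before any dial. A minimal sketch using Python's standard `zoneinfo`: the consent labels, the `Contact` fields, and the in-memory DNC sets are hypothetical stand-ins for a real consent store and a scrubbed registry download. The window encodes only the federal 8am-9pm rule; stricter state windows would need their own table.

```python
from dataclasses import dataclass
from datetime import datetime, time, timezone
from typing import Optional
from zoneinfo import ZoneInfo

# Federal window only; several states are stricter and would need their own table.
EARLIEST, LATEST = time(8, 0), time(21, 0)


@dataclass
class Contact:
    phone: str              # E.164, e.g. "+12125550100"
    tz: str                 # IANA zone of the recipient's physical location
    consent: Optional[str]  # hypothetical labels: "written", "express", or None


def may_call(contact: Contact, purpose: str, national_dnc: set[str],
             internal_dnc: set[str], now_utc: datetime) -> tuple[bool, str]:
    """Return (allowed, reason) for one planned AI-voice call."""
    if contact.phone in national_dnc or contact.phone in internal_dnc:
        return False, "on a do-not-call list"
    if purpose == "marketing" and contact.consent != "written":
        return False, "marketing call without prior express written consent"
    if contact.consent not in ("written", "express"):
        return False, "no prior express consent"
    # Convert to the recipient's local clock, not the caller's and not the area code's.
    local = now_utc.astimezone(ZoneInfo(contact.tz)).time()
    if not (EARLIEST <= local < LATEST):
        return False, "outside 8am-9pm at the recipient's location"
    return True, "ok"


now = datetime(2025, 3, 4, 15, 0, tzinfo=timezone.utc)  # 10:00 in New York
c = Contact("+12125550100", "America/New_York", "express")
print(may_call(c, "informational", set(), set(), now))
print(may_call(c, "marketing", set(), set(), now))
```

The same contact passes for an informational call but is blocked for marketing, because express consent is not written consent.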

Deliverability: staying off "Spam Likely" the legitimate way

Three engines drive spam labels: Hiya (AT&T), TNS (Verizon), First Orion (T-Mobile). They flag high volume from one number, low answer rates, fast hang-ups, missing CNAM, and low STIR/SHAKEN attestation.

Aim for STIR/SHAKEN Level A attestation: get numbers from a carrier under a verified business relationship and call only from numbers registered to you.
Keep CNAM accurate and enroll in Branded Caller ID / Rich Call Data so your name and reason-for-call show on the recipient's screen.
Register at FreeCallerRegistry.com, keep volume human-paced, and do not churn through many numbers (that pattern is itself a spam signal).
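Human-paced volume can be enforced in the dialer itself. A sketch of a per-number hourly cap over a small, fixed pool of owned numbers. `NumberPacer` and the cap value are hypothetical; the cap is a placeholder to tune, not a published carrier threshold. When every number is at its cap, the pacer returns `None` so the dialer waits rather than adding numbers.

```python
from collections import deque
from typing import Optional

WINDOW_S = 3600  # one-hour sliding window


class NumberPacer:
    """Spread dials over a small, fixed pool of numbers you own, with a per-number hourly cap."""

    def __init__(self, numbers: list[str], max_per_hour: int) -> None:
        self.max_per_hour = max_per_hour
        self.history: dict[str, deque] = {n: deque() for n in numbers}

    def next_number(self, now_s: float) -> Optional[str]:
        # Forget dials that have aged out of the window.
        for stamps in self.history.values():
            while stamps and now_s - stamps[0] >= WINDOW_S:
                stamps.popleft()
        # Pick the least-used number that still has headroom.
        open_numbers = [(len(s), n) for n, s in self.history.items() if len(s) < self.max_per_hour]
        if not open_numbers:
            return None  # every number is at its cap: wait, do not add numbers
        _, number = min(open_numbers)
        self.history[number].append(now_s)
        return number


pacer = NumberPacer(["+12125550101", "+12125550102"], max_per_hour=2)
print([pacer.next_number(0) for _ in range(5)])  # two dials per number, then None
print(pacer.next_number(3600))                   # the window has rolled over
```

Waiting on a fixed pool, instead of rotating in fresh numbers, avoids the number-churn pattern that is itself a spam signal.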

The honest framing: these tools exist to route identifiable, consented calls. They work against you precisely when you are calling people who did not ask to be called, which is exactly what they are designed to stop.

B2B reality: the "businesses are exempt" myth

The federal B2B exemption is narrow: a live human, manually dialing a business landline. The moment you use an autodialer, an AI voice, or call a mobile number (which is nearly every business contact today), TCPA applies in full and prior express written consent is required.

2.3-2.5%

Median dial-to-meeting conversion for B2B cold calling. ~40 dials per meeting.

15% vs 25%

AI SDR versus human SDR meeting-to-opportunity conversion: a 40% deficit.

50-70%

Annual AI-SDR tool churn, roughly double human SDR turnover. The most honest market signal.

Where AI genuinely fits B2B: list-qualification sweeps, appointment reminders and confirmations, after-hours callback capture, later-touch follow-ups, and automatic CRM logging. Where it fails: replacing the human on a genuinely cold conversation, complex enterprise accounts, and regulated verticals. The teams that win run a hybrid: AI does research, prioritization, dialing, coaching, and logging; humans hold the conversation that matters.

Verdict and the MVP path

AI outbound calling is real, commercially deployed, and cheap to run. It is not human-equivalent, and for the hardest conversations it will not be for another 2 to 3 years. The unit economics are genuinely strong; the business risk is almost entirely legal and reputational, not technical.

Build a narrow, consented, structured product

Appointment confirmations, reminders, inbound overflow, after-hours callback, and qualification of opted-in leads. Compliance baked in from call one: consent records, AI self-disclosure, automated opt-out, DNC scrubbing.

Avoid

Cold AI dialing of strangers

Maximum legal exposure, fastest path to spam-flagging and burned lists, and the worst conversion. This is where the $500-per-call math turns lethal.

The concrete first build

Stand up Retell AI in custom-LLM mode with your reasoning server returning each line.
Pick one structured, consented use case (appointment confirmation or inbound-overflow answering) where success is "task done," not "stranger persuaded."
Bake compliance in: opening AI disclosure, consent capture, automated opt-out, DNC scrub, local calling hours.
Price per outcome (per confirmed appointment or per booked meeting), not per minute, to capture the 80%+ margin and dodge the commoditizing per-minute race.
When volume justifies it, migrate to Telnyx + Pipecat + Deepgram + Cartesia at ~$0.05/min and keep the same logic.
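The first three steps can be sketched as the turn handler behind a custom-LLM server for appointment confirmation. The handler is protocol-agnostic: it takes the caller's latest utterance and returns the next line plus an action. Retell's websocket framing and message schema are deliberately left out rather than guessed. The keyword patterns, `CallState`, and `MAX_UNCLEAR` are hypothetical; a production system would likely let the LLM interpret replies and keep rules like these as a backstop for opt-out.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical keyword patterns. Opt-out is checked first so "no, stop calling" is honored.
OPT_OUT = re.compile(r"\b(stop calling|do not call|don't call|remove me)\b", re.IGNORECASE)
DECLINE = re.compile(r"\b(no|nope|can't|cannot|reschedule)\b", re.IGNORECASE)
CONFIRM = re.compile(r"\b(yes|yeah|yep|correct|confirm|that works)\b", re.IGNORECASE)
MAX_UNCLEAR = 2  # hypothetical: unclear answers tolerated before a human takes over


@dataclass
class CallState:
    phone: str
    business: str
    callback: str
    appointment: str               # e.g. "Tuesday at 3pm"
    started: bool = False
    unclear: int = 0
    outcome: Optional[str] = None  # "confirmed", "transfer", or "opted_out"


def next_line(state: CallState, caller_said: str, internal_dnc: set[str]) -> tuple[str, str]:
    """Return (agent_line, action); action is "continue", "end", or "transfer"."""
    if not state.started:
        # Opening: AI disclosure, who is calling, a callback number, purpose, and opt-out.
        state.started = True
        return (f"Hi, you are speaking with an AI assistant calling for {state.business}, "
                f"reachable at {state.callback}. I'm confirming your appointment on "
                f"{state.appointment}. Does that still work? You can say 'stop calling' at any time.",
                "continue")
    if OPT_OUT.search(caller_said):
        internal_dnc.add(state.phone)  # record the opt-out before anything else
        state.outcome = "opted_out"
        return "Understood, we won't call this number again. Goodbye.", "end"
    if DECLINE.search(caller_said):
        state.outcome = "transfer"     # warm transfer to a human for rescheduling
        return "No problem, let me connect you with someone who can reschedule.", "transfer"
    if CONFIRM.search(caller_said):
        state.outcome = "confirmed"
        return f"Great, you're confirmed for {state.appointment}. Goodbye.", "end"
    state.unclear += 1
    if state.unclear >= MAX_UNCLEAR:
        state.outcome = "transfer"
        return "Let me connect you with a member of our team.", "transfer"
    return f"Sorry, just yes or no: does {state.appointment} still work?", "continue"


dnc: set[str] = set()
call = CallState("+12125550100", "Acme Dental", "+12125550199", "Tuesday at 3pm")
print(next_line(call, "", dnc)[0])
print(next_line(call, "Yes, that works.", dnc))
```

Success is measured as a task outcome (`confirmed`, `transfer`, `opted_out`) per call, which is also the unit a per-outcome price can be billed against.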