Three Seconds of Silence Is an Eternity

The phone call is the oldest real-time interface we have. It is also the one AI handles worst. Chat is forgiving — latency is invisible, and a typing indicator buys you time. On a phone call, three seconds of silence is the gap where the caller decides the system is broken, or stupid, or both.

This episode explores TEL3SIS, a voice-first agentic platform built around that constraint. The entire architecture — speech-to-text, LLM reasoning, text-to-speech — is optimised to complete the full loop in under three seconds. Not as a benchmark number on a slide deck, but as a survival threshold for the user experience.

The system integrates calendars, weather, SMS, and email, and handles context-aware call forwarding to humans when the conversation exceeds what an agent should attempt — a safety boundary that too many voice AI systems omit because it feels like admitting defeat. Tri-layer memory across Redis, SQLite, and vector storage gives the agent conversational persistence without the hallucination risk of stuffing everything into a single context window. A safety oracle filters every action before execution. If you are going to put an AI on a phone line, it needs to be fast enough to feel present and honest enough to hand off when it should not be.

View the full project →