We investigated the challenges of mitigating response delays in free-form conversations with virtual agents powered by Large Language Models (LLMs) within Virtual Reality (VR). To do so, we used conversational fillers, such as gestures and verbal cues, to bridge delays between user input and system responses, and evaluated their effectiveness across various latency levels and interaction scenarios. We found that latency above 4 seconds degrades the quality of experience, while natural conversational fillers improve perceived response time, especially in high-delay conditions. Our findings provide insights for practitioners and researchers seeking to maintain user engagement whenever conversational systems’ responses are delayed by network limitations or slow hardware. We also contribute an open-source pipeline that streamlines deploying conversational agents in virtual environments.
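Here is a minimal sketch of the filler idea. It assumes an asyncio-based agent loop and is not the authors' implementation. The agent starts generating its reply, and if the reply is not ready within a hypothetical `filler_after` threshold, it plays a filler before the actual answer. The function `respond_with_filler`, the filler text, and the 1-second default are illustrative assumptions, not values from the study.

```python
import asyncio
from typing import Awaitable, Callable, List


async def respond_with_filler(
    generate: Callable[[], Awaitable[str]],
    play: Callable[[str], None],
    filler: str = "Hmm, let me think...",
    filler_after: float = 1.0,  # hypothetical threshold in seconds, not from the study
) -> str:
    """Wait for the agent's reply; play a filler first if the reply is late."""
    task = asyncio.create_task(generate())
    try:
        # shield() keeps the timeout from cancelling the generation itself
        reply = await asyncio.wait_for(asyncio.shield(task), timeout=filler_after)
    except asyncio.TimeoutError:
        play(filler)  # bridge the delay with a conversational filler
        reply = await task
    play(reply)
    return reply


async def slow_llm() -> str:
    await asyncio.sleep(0.3)  # simulated high-latency response
    return "Here is my answer."


async def fast_llm() -> str:
    return "Quick answer."


if __name__ == "__main__":
    log: List[str] = []
    asyncio.run(respond_with_filler(slow_llm, log.append, filler_after=0.1))
    print(log)  # ['Hmm, let me think...', 'Here is my answer.']
```

Here `play` stands in for whatever drives the avatar's speech or animation. A gesture filler could be triggered from the same timeout branch.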
This paper, published at ACM CUI 2025, explored how response latency and conversational fillers affect user perception of embodied conversational virtual agents in virtual reality. The latency levels were:
The conversational fillers included:
The conversational pipeline chained real-time Speech-to-Text (STT), an LLM, and Text-to-Speech (TTS).
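The stage order can be sketched as a simple synchronous chain that records per-stage timing, which shows where the end-to-end latency comes from. `Pipeline`, its stub stages, and `run_with_latency` are hypothetical. `run_with_latency` pads each reply to a fixed target delay, which is one possible way to emulate a latency condition and is not necessarily how the study set its levels. Real STT, LLM, and TTS calls would replace the lambdas.

```python
import time
from dataclasses import dataclass, field
from typing import Any, Callable, Dict


@dataclass
class Pipeline:
    """Hypothetical STT -> LLM -> TTS chain that records how long each stage takes."""
    stt: Callable[[bytes], str]   # audio in, transcript out
    llm: Callable[[str], str]     # transcript in, reply text out
    tts: Callable[[str], bytes]   # reply text in, audio out
    timings: Dict[str, float] = field(default_factory=dict)

    def _timed(self, name: str, fn: Callable[[Any], Any], arg: Any) -> Any:
        start = time.perf_counter()
        out = fn(arg)
        self.timings[name] = time.perf_counter() - start
        return out

    def run(self, audio: bytes) -> bytes:
        text = self._timed("stt", self.stt, audio)
        reply = self._timed("llm", self.llm, text)
        return self._timed("tts", self.tts, reply)

    def total_latency(self) -> float:
        """Sum of the per-stage times from the most recent run."""
        return sum(self.timings.values())

    def run_with_latency(self, audio: bytes, target_s: float) -> bytes:
        """Run the chain, then wait so the reply arrives no earlier than target_s seconds."""
        start = time.perf_counter()
        out = self.run(audio)
        remaining = target_s - (time.perf_counter() - start)
        if remaining > 0:
            time.sleep(remaining)  # pad fast responses up to the target delay
        return out


if __name__ == "__main__":
    pipe = Pipeline(
        stt=lambda audio: audio.decode("utf-8"),  # stub recognizer
        llm=lambda text: f"You said: {text}",     # stub language model
        tts=lambda text: text.encode("utf-8"),    # stub synthesizer
    )
    print(pipe.run(b"hello"))    # b'You said: hello'
    print(sorted(pipe.timings))  # ['llm', 'stt', 'tts']
```

Because padding can only add delay, a target latency holds only if it exceeds the pipeline's natural processing time.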
Latency significantly degraded user experience:
Natural fillers improved the perceived response time but did not fully mitigate the other negative effects of high latency:
Resources: