Mitigating Response Delays in Free-Form Conversations with LLM-powered Intelligent Virtual Agents

Abstract

We investigated the challenges of mitigating response delays in free-form conversations with virtual agents powered by Large Language Models (LLMs) within Virtual Reality (VR). For this, we used conversational fillers, such as gestures and verbal cues, to bridge delays between user input and system responses, and evaluated their effectiveness across various latency levels and interaction scenarios. We found that latency above 4 seconds degrades quality of experience, while natural conversational fillers improve perceived response time, especially in high-delay conditions. Our findings provide insights for practitioners and researchers to optimize user engagement whenever conversational systems’ responses are delayed by network limitations or slow hardware. We also contribute an open-source pipeline that streamlines deploying conversational agents in virtual environments.

Publication
Proceedings of the 7th ACM Conference on Conversational User Interfaces (CUI ’25)

Short Summary

This paper, published at ACM CUI 2025, explored how response latency and conversational fillers affect user perception of embodied conversational virtual agents in virtual reality. The latency levels were:

  • 1.5 seconds
  • 4.0 seconds
  • 6.5 seconds

The conversational fillers included:

  • No fillers (control condition)
  • Artificial fillers (processing icon + sound effect)
  • Natural fillers (thinking gesture + voice line)
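
The two factors listed above can be laid out as a condition grid. The following is a minimal sketch, assuming the factors are fully crossed, which this summary does not state explicitly. The names `LATENCIES_S`, `FILLERS`, and `conditions` are hypothetical and not taken from the paper's code.

```python
from itertools import product

# Latency levels (seconds) and filler types as listed in the summary.
LATENCIES_S = [1.5, 4.0, 6.5]
FILLERS = ["none", "artificial", "natural"]

# Assumption: every latency level is paired with every filler type.
conditions = list(product(LATENCIES_S, FILLERS))

for latency_s, filler in conditions:
    print(f"latency={latency_s:.1f}s filler={filler}")
```

Under that assumption the grid contains nine latency and filler combinations.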


The conversational pipeline used real-time Speech-to-Text, LLM and Text-to-Speech.
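
One turn through such a pipeline, with a filler bridging a slow reply, might look like the sketch below. It is not the authors' implementation: the three stage functions are hypothetical stubs, the delay is simulated with `asyncio.sleep`, and the `filler_after_s` threshold is an assumed parameter rather than a value from the study.

```python
import asyncio

# Hypothetical stand-ins for the three stages; a real system would call
# actual speech recognition, language model, and speech synthesis services.
async def speech_to_text(audio: bytes) -> str:
    return audio.decode("utf-8")

async def generate_reply(text: str, delay_s: float) -> str:
    await asyncio.sleep(delay_s)  # simulated model and network latency
    return f"Reply to: {text}"

async def text_to_speech(text: str) -> bytes:
    return text.encode("utf-8")

async def run_turn(audio: bytes, delay_s: float, filler_after_s: float,
                   events: list) -> bytes:
    """Run one turn and record a filler event if the reply is slow."""
    text = await speech_to_text(audio)
    reply_task = asyncio.create_task(generate_reply(text, delay_s))
    # Wait for the reply, but only up to the (assumed) filler threshold.
    done, _pending = await asyncio.wait({reply_task}, timeout=filler_after_s)
    if not done:
        # Placeholder for a thinking gesture plus a voice line.
        events.append("filler")
    reply = await reply_task
    events.append("reply")
    return await text_to_speech(reply)

slow_events = []
slow_speech = asyncio.run(
    run_turn(b"hello", delay_s=0.2, filler_after_s=0.05, events=slow_events)
)

fast_events = []
fast_speech = asyncio.run(
    run_turn(b"hello", delay_s=0.0, filler_after_s=0.5, events=fast_events)
)
```

In the slow turn the filler is recorded before the reply, while the fast turn skips it. Running the language model call as a task lets the agent act while it waits instead of blocking on the response.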

[Figure: Conversational architecture]

Latency significantly degraded user experience:

[Figure: Effects of latency]

Natural fillers improved the perceived response time, but did not fully mitigate other negative effects of high latency:

[Figure: Effects of fillers]


Mykola Maslych
Computer Science PhD Candidate

My research interests include machine learning applied to 3D user interfaces and HCI in general.