Speaker Entrainment

Speaker entrainment is a phenomenon observed in human-human conversation in which one interlocutor adapts the acoustic, lexical and semantic features of their speech to match those of the other interlocutor.

This project aims to create a bot that entrains on the acoustic features of the user’s speech. Incorporating such behavior into bots is known to increase users’ trust and the bot’s perceived naturalness and likeability, which is likely to improve customer satisfaction and call resolution rates.


Baseline Module


The following audio samples were generated by the baseline entrainment module, which entrains on pitch (fundamental frequency), intensity (loudness) and rate of articulation.
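This page does not describe how the baseline module computes its targets. One common way to frame entrainment, assumed here purely for illustration, is to move each of the bot's prosodic features part of the way toward the value measured in the user's speech. The `Prosody` type, the `entrain` function and the convergence weight `alpha` are hypothetical names, not this project's API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Prosody:
    """Hypothetical container for the three features the baseline module entrains on."""
    pitch_hz: float         # mean fundamental frequency (F0)
    intensity_db: float     # mean loudness
    rate_syll_per_s: float  # articulation rate (syllables per second of speech)


def entrain(bot: Prosody, user: Prosody, alpha: float = 0.5) -> Prosody:
    """Move each bot feature a fraction `alpha` of the way toward the user's value.

    alpha = 0 leaves the bot unchanged (no entrainment); alpha = 1 copies the user.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")

    def toward(b: float, u: float) -> float:
        # Linear interpolation from the bot's value toward the user's value.
        return b + alpha * (u - b)

    return Prosody(
        pitch_hz=toward(bot.pitch_hz, user.pitch_hz),
        intensity_db=toward(bot.intensity_db, user.intensity_db),
        rate_syll_per_s=toward(bot.rate_syll_per_s, user.rate_syll_per_s),
    )
```

For example, with the bot at 120 Hz, 60 dB and 4.0 syllables/s and the user at 180 Hz, 66 dB and 5.0 syllables/s, `entrain(bot, user, 0.5)` yields 150 Hz, 63 dB and 4.5 syllables/s. Since pitch is perceived roughly logarithmically, interpolating pitch in semitones rather than hertz is a reasonable alternative.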

Demo Audio Samples


Script-1: Entraining on pitch (fundamental frequency). In this sample, the entrained version performs better: the user’s pitch rises over the course of the script, and the bot adjusts its own pitch to match.

Not Entrained Entrained
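To follow a rising pitch as in Script-1, the bot needs a running F0 estimate of the user's speech. The sketch below assumes a simple autocorrelation pitch tracker over mono frames of float samples, combined with the same partial-convergence idea as above; `estimate_f0`, `pitch_target`, `sine_frame`, the 75 to 400 Hz search range and the three-frame window are illustrative assumptions, and a real system would more likely use a dedicated pitch tracker.

```python
import math


def sine_frame(freq_hz, sample_rate=16000, n=640):
    """Synthetic test tone for trying the tracker (one 40 ms frame at 16 kHz by default)."""
    return [math.sin(2 * math.pi * freq_hz * i / sample_rate) for i in range(n)]


def estimate_f0(frame, sample_rate, fmin=75.0, fmax=400.0):
    """Rough F0 (Hz) of one frame: the lag with the largest autocorrelation.

    Only lags corresponding to fmin..fmax are searched. Returns None for an empty
    or silent frame, or when no lag has positive correlation (treated as unvoiced).
    """
    n = len(frame)
    if n == 0:
        return None
    mean = sum(frame) / n
    x = [s - mean for s in frame]          # remove DC offset
    if sum(s * s for s in x) == 0.0:
        return None                         # silence
    min_lag = int(sample_rate / fmax)
    max_lag = min(int(sample_rate / fmin), n - 1)
    best_lag, best_corr = None, 0.0
    for lag in range(min_lag, max_lag + 1):
        corr = sum(x[i] * x[i + lag] for i in range(n - lag))
        if corr > best_corr:
            best_lag, best_corr = lag, corr
    return None if best_lag is None else sample_rate / best_lag


def pitch_target(user_f0s, bot_base_hz, alpha=0.5, window=3):
    """Bot pitch target: move from its base pitch toward the user's recent mean F0."""
    voiced = [f for f in user_f0s if f is not None][-window:]
    if not voiced:
        return bot_base_hz                  # nothing to entrain on yet
    recent = sum(voiced) / len(voiced)
    return bot_base_hz + alpha * (recent - bot_base_hz)
```

Feeding in three frames whose pitch rises from 150 to 200 Hz gives a rising sequence of estimates, and `pitch_target` then moves a 120 Hz bot voice about halfway toward their mean. Using only the last few voiced frames lets the target track a rise rather than the whole call's average.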

Script-2: Entraining on rate of articulation. In this sample, the entrained version performs better. It is an excerpt from a user-bot interaction in which the entrained bot increases its rate of articulation to match the user’s.

Not Entrained Entrained
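Articulation rate is usually measured in syllables per second of actual speech, i.e. with pauses excluded (unlike overall speaking rate). The sketch below assumes word-level timestamps with syllable counts are available, for example from an ASR system plus a syllabifier; the `Word` type, the 0.25 s pause threshold and the [0.8, 1.3] clamp on the bot's speed factor are hypothetical choices, not details from this project.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Word:
    """Hypothetical ASR output: a word with start/end times (s) and a syllable count."""
    text: str
    start: float
    end: float
    syllables: int


def articulation_rate(words: Sequence[Word], min_pause: float = 0.25) -> Optional[float]:
    """Syllables per second of speech, excluding pauses between words.

    Gaps between consecutive words longer than `min_pause` seconds are treated as
    pauses and removed from the denominator; shorter gaps count as speaking time.
    """
    if not words:
        return None
    total_syll = sum(w.syllables for w in words)
    speaking_time = words[-1].end - words[0].start
    for prev, cur in zip(words, words[1:]):
        gap = cur.start - prev.end
        if gap > min_pause:
            speaking_time -= gap
    return total_syll / speaking_time if speaking_time > 0 else None


def bot_rate_multiplier(user_rate: float, bot_default_rate: float,
                        alpha: float = 0.5, lo: float = 0.8, hi: float = 1.3) -> float:
    """Relative TTS speed factor (1.0 = bot's default rate), clamped to [lo, hi]."""
    target = bot_default_rate + alpha * (user_rate - bot_default_rate)
    return max(lo, min(hi, target / bot_default_rate))
```

For instance, the words "I need" (0.0 to 0.5 s) followed by a 1 s pause and "a refund" (1.5 to 2.1 s) contain 5 syllables over 1.1 s of speech, about 4.5 syllables/s; a bot whose default rate is 4.0 syllables/s would then get a speed factor of roughly 1.07 with `alpha=0.5`. That factor could be passed to whatever rate control the TTS engine exposes.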

Script-3: Entraining on intensity. In this sample, the non-entrained version performs better. In this excerpt, the user’s pitch and loudness rise, but this is a result of the user being angry, partly because the bot does not understand him. Having the bot become louder in response is very detrimental to call quality, which is why the entrained bot performs worse in this case.

Not Entrained Entrained
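Intensity is commonly summarized as an RMS level in decibels. Script-3 shows that blindly copying a rise in the user's loudness can backfire, so the sketch below, which assumes float samples in [-1, 1] and measures level in dBFS, adds a hypothetical `max_boost_db` cap: the bot may become quieter to match a quieter user but, by default, never louder. `rms_dbfs`, `bot_gain_db` and the cap are illustrative assumptions, not the behavior of the baseline module, whose entrained sample in Script-3 does get louder.

```python
import math


def rms_dbfs(samples):
    """RMS level of a frame in dB relative to full scale (samples in [-1, 1])."""
    if not samples:
        raise ValueError("empty frame")
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return 20.0 * math.log10(rms) if rms > 0 else float("-inf")


def bot_gain_db(user_db, bot_db, alpha=0.5, max_boost_db=0.0):
    """Gain in dB to apply to the bot's TTS output so its level moves toward the user's.

    Hypothetical guard: the gain is capped at max_boost_db, so a user who gets louder
    (for example out of anger, as in Script-3) cannot push the bot louder than that
    cap. Negative gains (bot getting quieter) are passed through unchanged.
    """
    if math.isinf(user_db):
        return 0.0                          # silent user frame: leave the bot as is
    delta = alpha * (user_db - bot_db)
    return min(delta, max_boost_db)
```

With these defaults, a user 6 dB louder than the bot yields a gain of 0 dB, while a user 6 dB quieter yields -3 dB.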