Categories

Machine Learning

Speech LLMs for Conversations

With LLMs, building conversational systems has become easier. You no longer need to focus on the low-level details of categorizing seman...

In Machine Learning, May 09, 2024

Improving consumer verification using confidence calibration and thresholding

Over the past year, our team’s focus has shifted to building robust and scalable voice-bots for US companies. In particular, we...

In Machine Learning, Jan 09, 2024

Speech-First Conversational AI Revisited

About a year ago, we shared our views on how the nuances of spoken conversations make voicebots different from chatbots. With the recent ...

In Machine Learning, May 11, 2023

Incorporating context to improve SLU

In task-oriented dialogue systems, spoken language understanding, or SLU, refers to the task of parsing the natural ...

In Machine Learning, Aug 04, 2022

Theory of Mind and Implications for Conversational AI

When a diplomat says yes, he means ‘perhaps’; when he says perhaps, he means ‘no’; when he says no, he is not a diplomat.

In Machine Learning, Theory of Mind, May 19, 2022

End of Utterance Detection

This blog post is based on the work done by Anirudh Thatipelli as an ML research fellow at Skit.ai

In Machine Learning, Apr 24, 2022

TTS Enhancement

Problem Statement

In Machine Learning, Mar 09, 2022

Turn Taking Dynamics in Voice Bots

One of the challenges in building interactive voice bots is accounting for turn-taking behaviour. Turn-taking is a difficult probl...

In Machine Learning, Mar 07, 2022

Feature Disentanglement - I

The main advantage of deep learning is the ability to learn from the data in an end-to-end manner. The core of deep learning is repre...

In Machine Learning, Feb 22, 2022

Google Summer of Code, 2022

Google Summer of Code - 2022

In Machine Learning, Feb 18, 2022

Speaker Entrainment

In this post, we will discuss the phenomenon of speaker entrainment and the insights we gained when designing a voice-bot that entrai...

In Machine Learning, Feb 04, 2022

Speech-First Conversational AI

We often get asked about the differences between voice and chat bots. The most common perception is that the voice bot problem can be...

In Machine Learning, Feb 02, 2022

Evaluating an ASR in a Spoken Dialogue System

An ASR (automatic speech recognition) system is an integral component of any voice bot. The most popular metric used to evaluate the...

In Machine Learning, Jan 21, 2022

Complexity of Conversations - I

Consider a restaurant booking voice bot built using a frames and slots approach. While this can easily solve the problem of booking w...

In Machine Learning, Jan 18, 2022

On using ASR Alternatives for a Better SLU

This blog discusses some concepts from the recently published paper by members of the ML team at Skit (formerly Vernacular.ai). The p...

In Machine Learning, Nov 29, 2021

Seminar - Code Mixing in NLP and Speech

Below are some pointers and insights from the papers that we covered in the recently concluded seminar on Code-mixing in NLP and Spee...

In Machine Learning, Aug 24, 2021

Code Mixing Metrics

We at Skit recently concluded a seminar series on code-mixing, where we covered research papers that looked at approaches to deal wi...

In Machine Learning, Aug 09, 2021

Normalizing Flows - Part 2

In Part 1, we introduced the concept of normalizing flows. Here, we discuss the different types of normalizing flows. In most blogs t...

In Machine Learning, May 08, 2021

What's New in Kaldi-Serve 1.0

Kaldi-Serve is our open-source, high-performance speech recognition server framework capable of serving Kaldi ASR models in production...

In Machine Learning, Mar 25, 2021

Our new Tech blog

We are merging the past webpages of our Engineering and ML teams into this new, central Skit Tech page. From here on, this is going to be t...

In Engineering, Machine Learning, Feb 28, 2021

EMNLP 2020

Individual summary notes from EMNLP 2020.

In Machine Learning, Dec 21, 2020

Normalizing Flows - Part 1

Normalizing flows, popularized by Rezende & Mohamed (2015), are techniques used in machine learning to transform simple probabi...

In Machine Learning, Dec 19, 2020

Interspeech 2020

We recently attended the all-remote Interspeech 2020. Each of us made notes on what we did overall. But instead of posting those or...

In Machine Learning, Dec 01, 2020

Reading Sessions

Studying research and building on top of it is an important part of what a team of ML engineers does on a regular basis. Usually, t...

In Machine Learning, Nov 30, 2020

Bad Audio Detection

This blog will be a short one, where we’ll talk about our approach to filtering out inscrutable audio from VASR.

In Machine Learning, Jul 29, 2020

Speaker Diarization

This blog post is based on the work done by Anirudh Dagar as an intern at Skit.ai

In Machine Learning, Jul 21, 2020

A REPL for Conversations

A REPL, in programming, is an interactive environment where a programmer can go through the cycle of writing code, getting it Read, E...

In Machine Learning, Jan 30, 2020

Engineering

Authentication in gRPC

In gRPC, there are a number of ways to add authentication between client and server. It is handled via credentials objects.

In Engineering, Oct 31, 2021

Our new Tech blog

We are merging the past webpages of our Engineering and ML teams into this new, central Skit Tech page. From here on, this is going to be t...

In Engineering, Machine Learning, Feb 28, 2021

Building Fast and Efficient Microservices with gRPC

Skit.ai processes millions of speech recognition requests every day, and to handle such a load we have focused on building a highly s...

In Engineering, Feb 05, 2020

Theory of Mind

Theory of Mind and Implications for Conversational AI

When a diplomat says yes, he means ‘perhaps’; when he says perhaps, he means ‘no’; when he says no, he is not a diplomat.

In Machine Learning, Theory of Mind, May 19, 2022