Machine learning is a process of generalizing from examples. Those examples are the data to learn from. The more of them you have — and the closer they are to what your production system will face — the better.
Unfortunately, machine learning faces a chicken-and-egg problem. The best data would come from actual users of the machine learning system, but those users do not exist until the system is built.
Thus, the name of the game is reaching deployment. The moment you put your system in front of real users, you can start collecting the data you wished you’d had to begin with. And with today’s systems running in the cloud and operating at internet scale, the volumes of data are often enormous, which is exactly what machine learning needs.
Therefore, in building conversational AI systems for our customers, Voicebox must figure out how to build something good enough to deploy, so it can then become amazing.
Voicebox does this through three main processes: bootstrapping, crowdsourcing, and diversification.
In bootstrapping, Voicebox’s data team and language experts generate as many plausible utterances as they can, covering everything users of the system might say. For example, in building a conversational AI for online banking, they would think of as many ways as possible of asking about balances, transferring money, and doing anything else customers might need with their bank accounts.
Voicebox uses a proprietary crowdsourcing process to augment this data. Our process elicits novel phrasings that correspond to the different actions the conversational AI supports. This process is very effective in expanding the breadth of utterance structures the conversational AI can understand. Although the crowd is not the same as the eventual system’s users, it is a good substitute and in practice produces utterances very similar to those of real users.
Finally, to amplify the effect of bootstrapping and crowdsourcing data, Voicebox engages in a process called “diversification.” This process generates additional training data by taking each utterance collected so far and replacing parts that look like variables with different values. Each of these replacements yields an additional training sample.
For example, in the utterance “What’s the balance in my savings account,” the word “savings” is a variable that could be replaced with other values such as “checking,” “escrow,” “home equity,” and many others. By multiplying each utterance by every value possible for each of its variables, Voicebox often generates several tens of thousands of training utterances.
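A minimal sketch of the diversification idea in Python, assuming each collected utterance has already had its variable spans marked as `{slot}` placeholders. The slot names and value lists (`account`, `amount`, `SLOT_VALUES`) and the helpers `slot_names` and `diversify` are hypothetical illustrations, not Voicebox’s actual tooling. Each template is expanded with `itertools.product` over the value lists of its slots, so the number of generated utterances is the product of the list sizes.

```python
import itertools
import string

# Hypothetical slot values; a real system would draw these from its domain.
SLOT_VALUES = {
    "account": ["savings", "checking", "escrow", "home equity"],
    "amount": ["$50", "$200", "$1,000"],
}


def slot_names(template: str) -> list[str]:
    """Return the {placeholder} names used in a template, in order, without duplicates."""
    names = []
    for _, field, _, _ in string.Formatter().parse(template):
        if field and field not in names:
            names.append(field)
    return names


def diversify(template: str, slot_values: dict[str, list[str]]) -> list[str]:
    """Expand one template into one utterance per combination of its slot values."""
    names = slot_names(template)
    combos = itertools.product(*(slot_values[n] for n in names))
    return [template.format(**dict(zip(names, combo))) for combo in combos]


# Hypothetical seed utterances with their variable spans already marked.
templates = [
    "What's the balance in my {account} account",
    "Transfer {amount} from my {account} account",
]
corpus = [u for t in templates for u in diversify(t, SLOT_VALUES)]
print(len(corpus))  # 4 + 3 * 4 = 16
print(corpus[0])    # What's the balance in my savings account
```

Because the count grows multiplicatively, a single template with three slots of 20 values each would yield 20 × 20 × 20 = 8,000 utterances, which illustrates how a modest seed set can reach tens of thousands of examples.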
The result of all this effort is a training corpus that allows Voicebox to build a robust conversational AI using machine learning. Combined with many other pre-release testing, data logging, and analysis steps, it lets us build systems that are good enough both to serve real user queries and to collect the data you wished you’d had to begin with.
Machine learning isn’t about the algorithms. Those are easy. It’s the data collection that’s hard. Thus, Voicebox’s success in machine learning comes from the tools and processes we’ve created for building training corpora, whatever the source of the data.