The Blog of Ian Mercer.

Natural Language Processing


Natural language processing, understanding, and generation have been a passion of mine for over a decade. I first started implementing it for my home automation system, but it grew into a major project of its own. At the time there was nothing that could do what I wanted, and there still isn't anything that has all the functionality my engine offers.

For starters, I wanted to be able to build natural language understanding (NLU) features directly into my programs. In a typical system offered today for integrating NLU into your app, recognition is totally separate from the intents that you implement in your application, and often all you have is a web API calling your code with some 'intent' data. By contrast, in my system your code can interact with the natural language engine directly: updating the vocabulary on the fly, interpreting new tokens that you add to your code, and adding new intents in one step along with the sentence structures that represent them. Most importantly of all, it delivers strongly-typed tokens that you can interact with directly, including units of measure (distance, time, temperature, ...) and temporal expressions (Monday, last week, three weeks ago). You can also take the tokens passed to your rules and ask the engine to conjugate them, so when you ask "Create a task ..." it can reply "I created a task ...". Strong integration with your own code (rather than a web API call) means you can do database lookups to add new words or phrases.
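To make the idea of strongly-typed tokens concrete, here is a minimal sketch in Python. It is not the engine's API (the engine itself targets .NET). Every name in it (`Word`, `Distance`, `Duration`, `rule`, `interpret`) is hypothetical, and the matching is deliberately naive. It only illustrates how a rule can declare a sentence structure and receive typed values such as a distance or a duration instead of raw strings.

```python
import re
from dataclasses import dataclass


# Hypothetical token types; a real engine would offer many more.
@dataclass(frozen=True)
class Word:
    text: str


@dataclass(frozen=True)
class Distance:
    meters: float


@dataclass(frozen=True)
class Duration:
    seconds: float


# Assumed toy unit table: unit word -> (token type, factor to the base unit).
_UNITS = {
    "km": (Distance, 1000.0),
    "miles": (Distance, 1609.344),
    "minutes": (Duration, 60.0),
    "hours": (Duration, 3600.0),
}

_NUMBER = re.compile(r"\d+(\.\d+)?")


def tokenize(sentence):
    """Split on spaces and fold '<number> <unit>' pairs into one typed token."""
    words = sentence.lower().split()
    tokens, i = [], 0
    while i < len(words):
        if i + 1 < len(words) and _NUMBER.fullmatch(words[i]) and words[i + 1] in _UNITS:
            token_type, factor = _UNITS[words[i + 1]]
            tokens.append(token_type(float(words[i]) * factor))
            i += 2
        else:
            tokens.append(Word(words[i]))
            i += 1
    return tokens


RULES = []


def rule(*pattern):
    """Register a handler together with the sentence structure it handles."""
    def register(handler):
        RULES.append((pattern, handler))
        return handler
    return register


@rule("run", Distance)
def log_run(distance):
    return f"I logged a run of {distance.meters / 1000:g} km"


@rule("remind", "me", "in", Duration)
def remind(duration):
    return f"I will remind you in {duration.seconds / 60:g} minutes"


def interpret(sentence):
    """Call the first rule whose pattern matches the typed tokens, else return None."""
    tokens = tokenize(sentence)
    for pattern, handler in RULES:
        if len(pattern) != len(tokens):
            continue
        args = []
        for expected, token in zip(pattern, tokens):
            if isinstance(expected, str):
                if token != Word(expected):
                    break
            elif isinstance(token, expected):
                args.append(token)  # typed tokens become handler arguments
            else:
                break
        else:
            return handler(*args)
    return None
```

With these toy rules, `interpret("Run 5 km")` hands `log_run` a `Distance` of 5000 meters, so the handler never has to parse the number or the unit itself. Adding an intent means adding one decorated function, which keeps the sentence structure and the code that handles it in the same place.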

Along the way I realized that the engine I had built really didn't need spaces to do a good job and that I could extend it to allow 'Twitter-speak' where peoplerunthewordstogether.
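One common way to recover words from text with no spaces is to search for a split in which every piece is a known word. The sketch below shows that general technique in Python with a tiny, made-up vocabulary. It is an assumption about how such segmentation can work, not a description of how the engine does it, and `VOCABULARY` and `segment` are hypothetical names.

```python
from functools import lru_cache

# Hypothetical toy vocabulary; a real system would use a full lexicon.
VOCABULARY = frozenset(
    {"people", "run", "the", "words", "word", "together", "to", "get", "her"}
)


def segment(text, vocabulary=VOCABULARY):
    """Split `text` into vocabulary words, preferring the fewest words.

    Returns a list of words, or None when no complete split exists.
    """

    @lru_cache(maxsize=None)
    def best(start):
        # Best split of text[start:], memoized so each suffix is solved once.
        if start == len(text):
            return ()
        candidates = []
        for end in range(start + 1, len(text) + 1):
            word = text[start:end]
            if word in vocabulary:
                rest = best(end)
                if rest is not None:
                    candidates.append((word,) + rest)
        if not candidates:
            return None
        return min(candidates, key=len)

    result = best(0)
    return None if result is None else list(result)
```

For example, `segment("peoplerunthewordstogether")` considers both "together" and "to get her" for the tail of the string and keeps the split with fewer words. Preferring fewer words is a crude heuristic. A practical system would more likely score candidate splits by word frequency or by which split produces a sentence that a rule can actually match.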

Integration with WordNet from Princeton University provides a rich vocabulary with synonyms, so now you can define sentences using a meaning instead of a word and the engine will recognize all the words that could have that meaning. I compile all of WordNet into two NuGet packages: one with the more common words, one with the less common words. Compiling it into code means that my engine can use Visual Studio's IntelliSense to show meanings when you hover over a word. This helps you find the right sense for a word when there are multiple possible meanings.

You can read more about my Natural Language Engine for .NET here. You'll also find more blog posts about #nlp here.
