A strongly-typed natural language engine (C# NLP)
Here is an explanation of the natural language engine that powers my home automation system. It is a strongly-typed natural language engine in which tokens and sentences are defined in code. It currently understands sentences that control lights, heating, music, sprinklers, … You can ask it who called or tell it to play music in a particular room, … and it tells you when a car comes down the drive, when the traffic is bad on I-90, when there’s fresh snow in the mountains, when it finds new podcasts from NPR, … and much more.
The natural language engine itself is a separate component that I hope one day to use in other applications.
Existing Natural Language Engines
- Have a large, STATIC dictionary data file
- Can parse complex sentence structure
- Hand back a tree of tokens (strings)
- Don’t handle conversations
C# NLP Engine
- Defines strongly-typed tokens in code
- Uses type inheritance to model ‘is a’
- Defines sentences in code
- Rules engine executes sentences
- Understands context (conversation history)
Sample conversation
Goals
- Make it easy to define tokens and sentences (not XML)
- Safe, compile-time checked definition of the syntax and grammar (not XML)
- Model real-world inheritance with C# class inheritance:
- ‘a labrador’ is ‘a dog’ is ‘an animal’ is ‘a thing’
- Handle ambiguity, e.g.
  - play something in the air tonight in the kitchen
  - remind me at 4pm to call john at 5pm
C# NLP Engine Structure
Tokens – Token Definition
- A hierarchy of Token-derived classes
- Uses inheritance, e.g. TokenOn is a TokenOnOff is a TokenState is a Token. This allows a single sentence rule to handle multiple cases, e.g. both On and Off.
- Every token derives from the base Token class
- Simple tokens are defined by a set of words, e.g. « is | are »
- Complex tokens have a parser, e.g. TokenDouble
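A minimal sketch of the ‘is a’ hierarchy described above, showing how one rule typed against a base class covers several concrete tokens. TokenOn, TokenOnOff, TokenState and Token come from the post; TokenOff, the Text property and the OnOffRule helper are hypothetical, and IsTrueState is modelled on the property used in the light rule later on.

```csharp
using System;

// Hypothetical stand-ins for the engine's token classes.
public abstract class Token
{
    public abstract string Text { get; }
}

public abstract class TokenState : Token { }

public abstract class TokenOnOff : TokenState
{
    // true for "on", false for "off"
    public abstract bool IsTrueState { get; }
}

public sealed class TokenOn : TokenOnOff
{
    public override string Text => "on";
    public override bool IsTrueState => true;
}

public sealed class TokenOff : TokenOnOff
{
    public override string Text => "off";
    public override bool IsTrueState => false;
}

public static class OnOffRule
{
    // A single rule typed against TokenOnOff accepts both TokenOn and TokenOff,
    // because both derive from it.
    public static string Describe(TokenOnOff state) =>
        state.IsTrueState ? "switching on" : "switching off";
}
```

Because matching is done on types rather than strings, adding a new state such as a hypothetical TokenDim under TokenState would not require touching rules that only care about TokenState.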
A Simple Token Definition
public class TokenPersonalPronoun : TokenGenericNoun
{
    internal static string wordz { get { return "he,him,she,her,them"; } }
}
- Recognizes any of the words specified
- Can use inheritance (as in this example)
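One way a simple token like this could be recognized is to read its static `wordz` list via reflection and compare it against an input word. This is a sketch under assumptions: the SimpleTokenMatcher class and the case-insensitive, comma-separated matching are illustrative, not the engine's actual code.

```csharp
using System;
using System.Linq;
using System.Reflection;

// Hypothetical base classes standing in for the engine's own types.
public abstract class Token { }
public abstract class TokenGenericNoun : Token { }

public class TokenPersonalPronoun : TokenGenericNoun
{
    internal static string wordz { get { return "he,him,she,her,them"; } }
}

public static class SimpleTokenMatcher
{
    // Reads the static (internal, hence non-public) "wordz" property of a token type
    // and reports whether the word is one of its comma-separated entries.
    public static bool Matches(Type tokenType, string word)
    {
        PropertyInfo prop = tokenType.GetProperty(
            "wordz", BindingFlags.Static | BindingFlags.NonPublic | BindingFlags.Public);
        if (prop == null) return false; // not a simple word-list token
        string words = (string)prop.GetValue(null);
        return words.Split(',')
                    .Select(w => w.Trim())
                    .Any(w => string.Equals(w, word, StringComparison.OrdinalIgnoreCase));
    }
}
```

A real engine would more likely build a word-to-type lookup once at startup rather than reflecting per word.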
A Complex Token
public abstract class TokenNumber : Token
{
    public static IEnumerable<TokenResult> Initialize(string input)
    {
        // …
    }
}
- The Initialize method parses the input and returns one or more possible parses.
TokenNumber is a good example:
- Parses any numeric value and returns one or more of TokenInt, TokenLong, TokenIntOrdinal, TokenDouble, or TokenPercentage results.
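A sketch of an Initialize-style parser that yields every interpretation of its input, so that "42" comes back as an int, a long and a double. TokenResult and the result classes here are simplified, hypothetical stand-ins, and only a subset of the result types listed above is covered (no TokenIntOrdinal).

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

// Hypothetical simplified token types.
public abstract class Token { }
public sealed class TokenInt : Token { public int Value; }
public sealed class TokenLong : Token { public long Value; }
public sealed class TokenDouble : Token { public double Value; }
public sealed class TokenPercentage : Token { public double Value; }

public sealed class TokenResult
{
    public Token Token { get; }
    public TokenResult(Token token) { Token = token; }
}

public static class NumberParser
{
    // Yields every possible interpretation of the input string.
    public static IEnumerable<TokenResult> Initialize(string input)
    {
        var inv = CultureInfo.InvariantCulture;
        string s = input.Trim();

        // "50%" is only a percentage (stored as a fraction, 0.5).
        if (s.EndsWith("%") &&
            double.TryParse(s.Substring(0, s.Length - 1), NumberStyles.Float, inv, out double pct))
        {
            yield return new TokenResult(new TokenPercentage { Value = pct / 100.0 });
            yield break;
        }
        if (int.TryParse(s, NumberStyles.Integer, inv, out int i))
            yield return new TokenResult(new TokenInt { Value = i });
        if (long.TryParse(s, NumberStyles.Integer, inv, out long l))
            yield return new TokenResult(new TokenLong { Value = l });
        if (double.TryParse(s, NumberStyles.Float, inv, out double d))
            yield return new TokenResult(new TokenDouble { Value = d });
    }
}
```

Returning all parses, rather than picking one, lets the sentence-matching stage choose whichever type the rule signature asks for.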
The catch-all TokenPhrase
public class TokenPhrase : Token
TokenPhrase matches anything, and in particular anything in quotation marks,
e.g. add a reminder "call Bruno at 4pm"
The sentence signature to recognize this could be
(…, TokenAdd, TokenReminder, TokenPhrase, TokenExactTime)
This sentence would match the rule too, even though the phrase itself contains a time:
add a reminder discuss 6pm conference call with Bruno at 4pm
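To illustrate how a catch-all phrase can contain a time and still leave a trailing time for TokenExactTime, here is a deliberately simple, hypothetical splitter for the text after "add a reminder": it takes a trailing "at &lt;time&gt;" as the exact time and everything before it as the phrase. The regex-based time check is an assumption standing in for the engine's much richer temporal parsing.

```csharp
using System;
using System.Text.RegularExpressions;

public static class ReminderSplitter
{
    // Simple clock times such as "4pm" or "10:30am" (an illustrative assumption).
    static readonly Regex TimeRx =
        new Regex(@"^\d{1,2}(:\d{2})?(am|pm)$", RegexOptions.IgnoreCase);

    // Splits "<phrase> at <time>" into its two parts; other times inside the
    // phrase (e.g. "6pm") are left alone because only the final "at <time>" counts.
    public static bool TrySplit(string text, out string phrase, out string time)
    {
        phrase = null;
        time = null;
        string[] words = text.Trim().Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
        if (words.Length < 3) return false; // need at least: <word> at <time>

        string last = words[words.Length - 1];
        bool endsWithAtTime =
            words[words.Length - 2].Equals("at", StringComparison.OrdinalIgnoreCase) &&
            TimeRx.IsMatch(last);
        if (!endsWithAtTime) return false;

        phrase = string.Join(" ", words, 0, words.Length - 2);
        time = last;
        return true;
    }
}
```

With "discuss 6pm conference call with Bruno at 4pm", this yields the phrase "discuss 6pm conference call with Bruno" and the time "4pm".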
TemporalTokens
A complete set of tokens and related classes for representing time
- Point in time, e.g. today at 5pm
- Approximate time, e.g. who called at 5pm today
- Finite sequence, e.g. every Thursday in May 2009
- Infinite sequence, e.g. every Thursday
- Ambiguous time with context, e.g. remind me on Tuesday (context implies the next Tuesday)
- Null time
- Unknowable/incomprehensible time
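As a small example of resolving an ambiguous time with context, the sketch below turns a bare weekday ("on Tuesday") into a concrete date relative to the current date. The rule that it means the next occurrence strictly after today is an assumption; the engine may apply other context.

```csharp
using System;

public static class TemporalContext
{
    // Resolve a bare weekday against "now", assuming the next occurrence
    // strictly after today is meant.
    public static DateTime NextWeekday(DateTime from, DayOfWeek day)
    {
        int daysAhead = ((int)day - (int)from.DayOfWeek + 7) % 7;
        if (daysAhead == 0) daysAhead = 7; // said on a Tuesday, "on Tuesday" means next week
        return from.Date.AddDays(daysAhead);
    }
}
```

For example, said on Monday 4 May 2009, "remind me on Tuesday" resolves to 5 May 2009.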
TemporalTokens (Cont.)
Code merges any sequence of temporal tokens into the smallest canonical representation, e.g.
the first thursday in may 2009
->
{TIMETHEFIRST the first} + {THURSDAY thursday} + {MAY in may} + {INT 2009 -> 2009}
->
[TEMPORALSETFINITESINGLEINTERVAL [Thursday 5/7/2009] ]
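The last step above, collapsing "the first" + "thursday" + "may" + "2009" into a single date, comes down to an nth-weekday-of-month calculation. A minimal sketch of that arithmetic follows; the method name is hypothetical and it ignores the richer temporal set classes the engine uses.

```csharp
using System;

public static class TemporalSets
{
    // Returns the n-th (1-based) given weekday of a month,
    // e.g. n = 1, Thursday, May 2009.
    public static DateTime NthWeekdayOfMonth(int year, int month, DayOfWeek day, int n)
    {
        if (n < 1) throw new ArgumentOutOfRangeException(nameof(n));
        var first = new DateTime(year, month, 1);
        int offset = ((int)day - (int)first.DayOfWeek + 7) % 7;
        DateTime result = first.AddDays(offset + 7 * (n - 1));
        if (result.Month != month)
            throw new ArgumentOutOfRangeException(nameof(n), "Month has fewer occurrences of that weekday.");
        return result;
    }
}
```

May 1, 2009 was a Friday, so the offset to the first Thursday is 6 days, giving Thursday 5/7/2009 as in the example.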
TemporalTokens (Cont.)
Finite TemporalClasses provide
All TemporalClasses provide
Existing Token Types
- Numbers (double, long, int, percentage, phone, temperature)
- File names, Directories
- URLs, Domain names
- Names, Companies, Addresses
- Rooms, Lights, Sensors, Sprinklers, …
- States (On, Off, Dim, Bright, Loud, Quiet, …)
- Units of Time, Weight, Distance
- Songs, albums, artists, genres, tags
- Temporal expressions
- Commands, verbs, nouns, pronouns, …
Rules – A simple rule
/// <summary>
/// Set a light to a given state
/// </summary>
private static void LightState(NLPState st, TokenLight tlight, TokenStateOnOff ts)
{
    if (ts.IsTrueState == true) tlight.ForceOn(st.Actor);
    if (ts.IsTrueState == false) tlight.ForceOff(st.Actor);
    st.Say("I turned it " + ts.LowerCased);
}
Any method with the signature (NLPState, Token*) is a sentence rule.
Rule matching respects inheritance and supports a variable number of repeated tokens through array parameters, e.g. (NLPState st, TokenThing tt, TokenState tokenState, TokenTimeConstraint[] constraints)
Rules are discovered at startup using reflection, and an efficient parse graph is built that allows rapid detection and rejection of incoming sentences.
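A sketch of the discovery and matching ideas, assuming minimal stand-in types: reflection finds static methods whose first parameter is NLPState and whose other parameters are Token types (or arrays of them), and a small backtracking matcher checks a token sequence using `IsAssignableFrom`, letting an array parameter absorb zero or more tokens. This does not build the parse graph the engine uses; it only shows the signature and inheritance rules. All class and method names other than NLPState and the Token* names from the post are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Reflection;

// Minimal stand-ins for the engine's types.
public class NLPState { }
public abstract class Token { }
public abstract class TokenThing : Token { }
public class TokenLight : TokenThing { }
public abstract class TokenState : Token { }
public class TokenOn : TokenState { }
public class TokenTimeConstraint : Token { }

public static class Rules
{
    public static void LightState(NLPState st, TokenLight light, TokenState state) { }
    public static void ThingState(NLPState st, TokenThing tt, TokenState s, TokenTimeConstraint[] constraints) { }
    public static void Helper(string notARule) { } // ignored: wrong signature
}

public static class RuleDiscovery
{
    // Finds every static method shaped like (NLPState, Token*).
    public static List<MethodInfo> Discover(Type container) =>
        container.GetMethods(BindingFlags.Static | BindingFlags.Public | BindingFlags.NonPublic)
                 .Where(IsRule)
                 .ToList();

    static bool IsRule(MethodInfo m)
    {
        var ps = m.GetParameters();
        return ps.Length >= 1 && ps[0].ParameterType == typeof(NLPState)
            && ps.Skip(1).All(p => typeof(Token).IsAssignableFrom(
                   p.ParameterType.IsArray ? p.ParameterType.GetElementType() : p.ParameterType));
    }

    // Each token must be assignable to its parameter type (so inheritance is respected);
    // an array parameter absorbs a run, possibly empty, of matching tokens.
    public static bool Matches(MethodInfo rule, IReadOnlyList<Token> tokens)
    {
        Type[] ps = rule.GetParameters().Skip(1).Select(p => p.ParameterType).ToArray();
        return Match(ps, 0, tokens, 0);
    }

    static bool Match(Type[] ps, int pi, IReadOnlyList<Token> tokens, int ti)
    {
        if (pi == ps.Length) return ti == tokens.Count;
        Type p = ps[pi];
        if (p.IsArray)
        {
            Type elem = p.GetElementType();
            for (int k = ti; ; k++) // try absorbing 0, 1, 2, ... tokens
            {
                if (Match(ps, pi + 1, tokens, k)) return true;
                if (k == tokens.Count || !elem.IsAssignableFrom(tokens[k].GetType())) return false;
            }
        }
        return ti < tokens.Count && p.IsAssignableFrom(tokens[ti].GetType())
            && Match(ps, pi + 1, tokens, ti + 1);
    }
}
```

Here a TokenLight satisfies a TokenThing parameter, and ThingState matches with any number of trailing TokenTimeConstraint tokens, including none.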
State – NLPState
- Every sentence method takes an NLPState as its first parameter
- State includes RememberedObject(s), allowing sentences to react to anything that happened earlier in a conversation
- Non-interactive uses can pass a dummy state
- State can be per-user, or per-conversation for non-realtime conversations such as email
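A hypothetical sketch of how remembered objects might let a later sentence ("turn it off") refer back to something mentioned earlier. The Remember/Recall methods, the fixed capacity, and the Light and Song classes are assumptions for illustration only; the post does not show the NLPState API.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Illustrative conversation state that keeps the most recent objects.
public class NLPState
{
    private readonly List<object> remembered = new List<object>();
    private readonly int capacity;

    public NLPState(int capacity = 20) { this.capacity = capacity; }

    public void Remember(object o)
    {
        remembered.Add(o);
        if (remembered.Count > capacity) remembered.RemoveAt(0); // forget the oldest
    }

    // Most recently remembered object of type T, or null if there is none.
    public T Recall<T>() where T : class =>
        remembered.OfType<T>().LastOrDefault();
}

public class Light { public string Name; public Light(string name) { Name = name; } }
public class Song { public string Title; public Song(string title) { Title = title; } }
```

Looking up by type fits the strongly-typed design: a rule that needs a light asks for the most recent light, even if a song was mentioned after it.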
User Interface
Works with a variety of user interfaces:
- Chat (e.g. Jabber/Gtalk)
- Web chat
- Calendar (do X at time Y)
- Rich client application
Summary
- Strongly-typed natural language engine
  - Compile-time checking, inheritance, …
- Define tokens and sentences (rules) in C#
- Strongly-typed tokens: numbers, percentages, times, dates, file names, urls, people, business objects, …
- Builds an efficient parse graph
- Tracks conversation history
Future plans
- Expanded corpus of knowledge
  - Company names, locations, documents, …
- Generate iCal/Gdata Recurrence
  - From TimeExpressions


about 1 year ago
Hello sir,
I am doing my final-year project on speech recognition in C# and .NET, so please help me by giving me the code for this natural language processing engine.
about 1 year ago
@Rashid. There’s no code download for it at the moment, sorry. There are some open source Natural Language Engines like OpenNLP that are more suited to academic sentence structure analysis. This one is focused on command / control and query operations with a conversational approach.
about 1 year ago
Fascinating! Thanks for the post.
I had a similar problem in trying to parse Twitter posts as marketplace ‘offers’ or ‘wants’. Like you, I was OK constraining it to a limited grammar (the general problem being, of course, unsolvable). Unfortunately, for expediency, we ended up solving it with a boring/inadequate RegExp-type approach.
At one point, however, we tried a much more token-based approach, specifically in Haskell:
http://git.metasoft.co.nz/?p=parseoffr.git;a=blob;f=Offer.hs;h=f8be60c55c6d3514dc39badee1d20ae55cbd0720;hb=4f5e4c2855e2ce1e7f8271911342785085207c90
I’ve contemplated doing a similar thing in F# (since .NET is my native language, so to speak). Have you considered whether the tokenizer could leverage some of the ‘pattern recognition’ stuff that languages like Haskell provide?
For what it’s worth, you can see the domain of our (open source) attempt at language processing here:
http://bit.ly/cWMLfl, but we would love to kick it up a notch in terms of the strictness of our parsing.
about 6 months ago
Congratulations on the success of your smart home. I listened to your interview on .NET Rocks and what you have accomplished is amazing.
I found your site while searching for C# Home Automation. Further searching has turned up very little on NLP for .NET. In the interview you mentioned that you may monetize your C# NLP Engine, has there been any progress on that front? Are there any timelines for release? If not, is it possible to get a copy to play with for purely hobby purposes, nothing commercial.
Also, you mentioned in the interview that you tried X10 and the others and they were not what you were looking for. Are you using all custom-built hardware? Can you recommend any hardware?
Thanks.