Posts tagged NLP
An ontology triple (quad) store for RDF/OWL using Entity Framework 4
May 12th
This week’s side project was the creation of an ontology store using Entity Framework 4. An ontology store holds axioms consisting of a Subject, Predicate and Object, which are usually serialized as RDF, OWL, N3, … While there is plenty of detail available about these serialization formats, the actual mechanics of how to store and manipulate the axioms were somewhat harder to come by. Nevertheless, after much experimentation I came up with an Entity Model that can store Quads (Subject, Predicate, Object and Meta) or Quins (Subject, Predicate, Object, Meta and Graph). The addition of Meta allows one axiom to reference another. The addition of Graph allows the store to be segmented, making it easy to import some N3 or RDF into a graph and then flush that graph if it is no longer needed or if a newer version becomes available.
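To make the quad/quin shape a little more concrete, here is a minimal in-memory sketch. It is not the Entity Framework model itself: the `Axiom` and `AxiomStore` names, their members and the string-based terms are all assumptions made purely for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical axiom shape; names are illustrative, not the actual EDMX entities.
public class Axiom
{
    public int Id { get; set; }
    public string Subject { get; set; }
    public string Predicate { get; set; }
    public string Object { get; set; }
    public int? MetaId { get; set; }   // Meta: optional reference to another axiom
    public string Graph { get; set; }  // Graph: the segment this axiom was imported into
}

public class AxiomStore
{
    private readonly List<Axiom> axioms = new List<Axiom>();
    private int nextId = 1;

    public Axiom Add(string subject, string predicate, string obj, string graph, int? metaId = null)
    {
        var axiom = new Axiom
        {
            Id = nextId++, Subject = subject, Predicate = predicate,
            Object = obj, Graph = graph, MetaId = metaId
        };
        axioms.Add(axiom);
        return axiom;
    }

    // Flush a whole graph, e.g. before importing a newer version of it.
    // Returns the number of axioms removed.
    public int FlushGraph(string graph)
    {
        return axioms.RemoveAll(a => a.Graph == graph);
    }

    // Pattern match: a null argument acts as a wildcard.
    public List<Axiom> Match(string subject = null, string predicate = null, string obj = null)
    {
        return axioms.Where(a =>
            (subject == null || a.Subject == subject) &&
            (predicate == null || a.Predicate == predicate) &&
            (obj == null || a.Object == obj)).ToList();
    }
}
```

In this sketch, flushing a graph is a single `RemoveAll`; in a database-backed store the equivalent would presumably be a delete filtered on the graph column.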
The store is currently hooked up to an Euler reasoner that can reason against it, lazily fetching just the necessary records from the SQL database that backs the Entity Model.
Here’s the EDMX showing how I modeled the Ontology Store:
Applying the Semantic Web to Home Automation
Apr 26th
Recently I’ve been considering how the Semantic Web will impact home automation.
Technologies like the Web Ontology Language (OWL) and RDF allow for the construction of complex ontologies that define what things are and how they relate. Using these ontologies, automated reasoning can be applied to generate new facts or to prove or disprove assertions.
This sounds like the ideal companion to the Natural Language Processing (NLP) Engine that I have already created for my home automation system. With reasoning powers added to the natural language engine, and the ability to augment the knowledge base by adding new assertions, the whole system will be much more powerful. One day it might even be possible to create the entire home definition using a natural language text file and to query the system using rich natural language queries.
So, the first step is to find an existing ontology store and reasoning engine. A quick web search reveals that most are built in Java. There were a couple of links I came across later for .NET: http://razor.occams.info/code/semweb/ and http://www.intellidimension.com/products/semantics-server/. There’s also an interesting Q&A site at http://semanticoverflow.com which has lots of useful information on it.
But rather than starting with an existing library, I really wanted to understand more deeply how an ontology store works and how a reasoning engine functions, so over the course of a couple of evenings I created my own. I now have a triple store and a simple reasoning engine. Here’s an actual conversation so you can see what it’s capable of so far, and perhaps get a glimpse of how powerful this concept could be:
house is a class
contains is a property
contain is a property
contains is the same as contain
contain is the same as contains
contains is transitive
contain is transitive
first floor is a class
room is a class
kitchen is a room
first floor contains kitchen
house contains first floor
does house contain kitchen
House: Yes, house contain kitchen because [house contains first floor] -> [first floor contains kitchen]
As you can see, my semantic store can already represent classes, relationships between classes, new relationships (‘contains’) and relationships between relationships (‘same as’). For such a small amount of code it’s quite surprising what this system can now handle in terms of knowledge representation and simple reasoning.
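For readers who want to see the idea in code, here is a minimal sketch of how a transitive property and a ‘same as’ alias could be resolved into a proof chain like the one in the conversation. It is not the engine described in this post: `TinyTripleStore` and its methods are hypothetical, and it assumes each alias is declared once, in one direction, before it is used.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class TinyTripleStore
{
    private readonly HashSet<Tuple<string, string, string>> triples =
        new HashSet<Tuple<string, string, string>>();
    private readonly HashSet<string> transitive = new HashSet<string>();
    private readonly Dictionary<string, string> canonical = new Dictionary<string, string>();

    // Assumption: aliases map one way onto a canonical predicate name.
    public void SameAs(string alias, string predicate) { canonical[alias] = predicate; }
    public void MarkTransitive(string predicate) { transitive.Add(Canon(predicate)); }
    public void Assert(string s, string p, string o) { triples.Add(Tuple.Create(s, Canon(p), o)); }

    private string Canon(string p)
    {
        string c;
        return canonical.TryGetValue(p, out c) ? c : p;
    }

    // Returns the chain of stored facts that proves (s p o), or null if no proof is found.
    public List<string> Prove(string s, string p, string o)
    {
        return Search(s, Canon(p), o, new HashSet<string>());
    }

    private List<string> Search(string s, string p, string o, HashSet<string> visited)
    {
        if (triples.Contains(Tuple.Create(s, p, o)))
            return new List<string> { "[" + s + " " + p + " " + o + "]" };

        // Only transitive predicates may be chained; 'visited' guards against cycles.
        if (!transitive.Contains(p) || !visited.Add(s)) return null;

        foreach (var t in triples.Where(t => t.Item1 == s && t.Item2 == p))
        {
            var rest = Search(t.Item3, p, o, visited);
            if (rest != null)
            {
                rest.Insert(0, "[" + s + " " + p + " " + t.Item3 + "]");
                return rest;
            }
        }
        return null;
    }
}
```

After declaring `contain` as an alias of `contains`, marking `contains` transitive and asserting the two facts from the conversation, `Prove("house", "contain", "kitchen")` returns the two-step chain from house to first floor to kitchen.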
Next time I get some spare time I’ll hook it up to the actual home model so you can start to query that in much more powerful ways than before.
Stay tuned!
A strongly-typed natural language engine (C# NLP)
Feb 28th
Here is an explanation of the natural language engine that powers my home automation system. It’s a strongly-typed natural language engine with tokens and sentences being defined in code. It currently understands sentences to control lights, heating, music, sprinklers, … You can ask it who called, you can tell it to play music in a particular room, … it tells you when a car comes down the drive, when the traffic is bad on I-90, when there’s fresh snow in the mountains, when it finds new podcasts from NPR, … and much more.
The natural language engine itself is a separate component that I hope one day to use in other applications.
Existing Natural Language Engines
- Have a large, STATIC dictionary data file
- Can parse complex sentence structure
- Hand back a tree of tokens (strings)
- Don’t handle conversations
C# NLP Engine
- Defines strongly-typed tokens in code
- Uses type inheritance to model ‘is a’
- Defines sentences in code
- Rules engine executes sentences
- Understands context (conversation history)
Sample conversation
Goals
- Make it easy to define tokens and sentences (not XML)
- Safe, compile-time checked definition of the syntax and grammar (not XML)
- Model real-world inheritance with C# class inheritance, e.g. ‘a labrador’ is ‘a dog’ is ‘an animal’ is ‘a thing’
- Handle ambiguity, e.g.
play something in the air tonight in the kitchen
remind me at 4pm to call john at 5pm
C# NLP Engine Structure
Tokens – Token Definition
- A hierarchy of Token-derived classes
- Uses inheritance, e.g. TokenOn is a TokenOnOff is a TokenState is a Token. This allows a single sentence rule to handle multiple cases, e.g. On and Off
- Derived from base Token class
- Simple tokens are a set of words, e.g. « is | are »
- Complex tokens have a parser, e.g. TokenDouble
A Simple Token Definition
public class TokenPersonalPronoun : TokenGenericNoun
{
    internal static string wordz { get { return "he,him,she,her,them"; } }
}
- Recognizes any of the words specified
- Can use inheritance (as in this example)
A Complex Token
public abstract class TokenNumber : Token
{
    public static IEnumerable<TokenResult> Initialize(string input)
    {
        …
    }
}
- Initialize method parses input and returns one or more possible parses.
TokenNumber is a good example:
- Parses any numeric value and returns one or more of TokenInt, TokenLong, TokenIntOrdinal, TokenDouble, or TokenPercentage results.
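As a rough illustration of a parser that hands back more than one possible parse, here is a small self-contained sketch. It does not reproduce the real `TokenNumber.Initialize`; the `NumberParse` and `NumberParser` names, the string `Kind` labels and the handful of formats covered are all assumptions.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;

// Hypothetical result type; the real TokenResult is not shown in the post.
public class NumberParse
{
    public string Kind;   // e.g. "Int", "Double", "Percentage", "IntOrdinal"
    public double Value;
}

public static class NumberParser
{
    private static readonly string[] OrdinalSuffixes = { "st", "nd", "rd", "th" };

    // Yields every interpretation of the input that this sketch knows about.
    public static IEnumerable<NumberParse> Parse(string input)
    {
        string text = input.Trim().ToLowerInvariant();
        var inv = CultureInfo.InvariantCulture;
        int i;
        double d;

        // "50%" is treated as a percentage only.
        if (text.EndsWith("%", StringComparison.Ordinal) &&
            double.TryParse(text.Substring(0, text.Length - 1), NumberStyles.Float, inv, out d))
        {
            yield return new NumberParse { Kind = "Percentage", Value = d };
            yield break;
        }

        // "3rd" is treated as an ordinal only.
        foreach (var suffix in OrdinalSuffixes)
        {
            if (text.EndsWith(suffix, StringComparison.Ordinal) &&
                int.TryParse(text.Substring(0, text.Length - 2), NumberStyles.None, inv, out i))
            {
                yield return new NumberParse { Kind = "IntOrdinal", Value = i };
                yield break;
            }
        }

        // A plain "5" is ambiguous: it is both an int and a double.
        if (int.TryParse(text, NumberStyles.AllowLeadingSign, inv, out i))
            yield return new NumberParse { Kind = "Int", Value = i };
        if (double.TryParse(text, NumberStyles.Float, inv, out d))
            yield return new NumberParse { Kind = "Double", Value = d };
    }
}
```

Returning all candidates, rather than picking one, leaves it to whichever sentence rule matches to decide which interpretation it wants.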
The catch-all TokenPhrase
public class TokenPhrase : Token
TokenPhrase matches anything, especially anything in quote marks
e.g. add a reminder "call Bruno at 4pm"
The sentence signature to recognize this could be
(…, TokenAdd, TokenReminder, TokenPhrase, TokenExactTime)
This would match the rule too …
add a reminder discuss 6pm conference call with Bruno at 4pm
TemporalTokens
A complete set of tokens and related classes for representing time
- Point in time, e.g. today at 5pm
- Approximate time, e.g. who called at 5pm today
- Finite sequence, e.g. every Thursday in May 2009
- Infinite sequence, e.g. every Thursday
- Ambiguous time with context, e.g. remind me on Tuesday (context means it is next Tuesday)
- Null time
- Unknowable/incomprehensible time
TemporalTokens (Cont.)
Code to merge any sequence of temporal tokens to the smallest canonical representation,
e.g.
the first thursday in may 2009
->
{TIMETHEFIRST the first} + {THURSDAY thursday} + {MAY in may} + {INT 2009 -> 2009}
->
[TEMPORALSETFINITESINGLEINTERVAL [Thursday 5/7/2009] ]
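The last step of that reduction, resolving ‘the first thursday in may 2009’ to a concrete date, can be sketched with plain `DateTime` arithmetic. The `TemporalMath` class and `NthWeekdayOfMonth` method are hypothetical names for this example only and say nothing about how the temporal classes are actually implemented.

```csharp
using System;

public static class TemporalMath
{
    // Returns the n-th (1-based) occurrence of a weekday within a given month.
    public static DateTime NthWeekdayOfMonth(int year, int month, DayOfWeek day, int n)
    {
        var first = new DateTime(year, month, 1);

        // Days from the 1st of the month to the first occurrence of 'day'.
        int offset = ((int)day - (int)first.DayOfWeek + 7) % 7;

        var result = first.AddDays(offset + 7 * (n - 1));
        if (result.Month != month)
            throw new ArgumentOutOfRangeException("n", "The month has no such occurrence.");
        return result;
    }
}
```

Calling `TemporalMath.NthWeekdayOfMonth(2009, 5, DayOfWeek.Thursday, 1)` gives 7 May 2009, matching the `[Thursday 5/7/2009]` interval above.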
TemporalTokens (Cont.)
Finite TemporalClasses provide
All TemporalClasses provide
Existing Token Types
- Numbers (double, long, int, percentage, phone, temperature)
- File names, Directories
- URLs, Domain names
- Names, Companies, Addresses
- Rooms, Lights, Sensors, Sprinklers, …
- States (On, Off, Dim, Bright, Loud, Quiet, …)
- Units of Time, Weight, Distance
- Songs, albums, artists, genres, tags
- Temporal expressions
- Commands, verbs, nouns, pronouns, …
Rules – A simple rule
/// <summary>
/// Set a light to a given state
/// </summary>
private static void LightState(NLPState st, TokenLight tlight, TokenStateOnOff ts)
{
    if (ts.IsTrueState == true) tlight.ForceOn(st.Actor);
    if (ts.IsTrueState == false) tlight.ForceOff(st.Actor);
    st.Say("I turned it " + ts.LowerCased);
}
Any method matching this signature is a sentence rule: an NLPState parameter followed by one or more Token-derived parameters.
Rule matching respects inheritance and supports variable repeats through array parameters, e.g. (NLPState st, TokenThing tt, TokenState tokenState, TokenTimeConstraint[] constraints)
Rules are discovered on startup using Reflection and an efficient parse graph is built allowing rapid detection and rejection of incoming sentences.
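The discovery step could look something like the sketch below, which uses standard .NET reflection to find static methods with a rule-shaped signature. The stand-in `NLPState` and `Token` classes, `SampleRules` and `RuleFinder` are assumptions for the example, and building the parse graph from the discovered methods is not shown.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Reflection;

// Minimal stand-ins for the engine's types; the real classes are much richer.
public class NLPState { }
public abstract class Token { }
public class TokenLight : Token { }
public class TokenStateOnOff : Token { }

public static class SampleRules
{
    private static void LightState(NLPState st, TokenLight light, TokenStateOnOff state) { }
    private static void NotARule(string text) { }
}

public static class RuleFinder
{
    // Finds static methods (public or not) declared on 'container' that look like rules.
    public static List<MethodInfo> FindRules(Type container)
    {
        var flags = BindingFlags.Static | BindingFlags.Public |
                    BindingFlags.NonPublic | BindingFlags.DeclaredOnly;
        return container.GetMethods(flags).Where(IsRule).ToList();
    }

    // A rule takes NLPState first, then only Token-derived types or arrays of them.
    private static bool IsRule(MethodInfo method)
    {
        var parameters = method.GetParameters();
        if (parameters.Length < 2 || parameters[0].ParameterType != typeof(NLPState))
            return false;

        return parameters.Skip(1).All(p =>
        {
            var type = p.ParameterType.IsArray
                ? p.ParameterType.GetElementType()
                : p.ParameterType;
            return typeof(Token).IsAssignableFrom(type);
        });
    }
}
```

Here `RuleFinder.FindRules(typeof(SampleRules))` picks up `LightState` and skips `NotARule`, because the latter does not start with an `NLPState` parameter.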
State – NLPState
- Every sentence method takes an NLPState first parameter
- State includes RememberedObject(s) allowing sentences to react to anything that happened earlier in a conversation
- Non-interactive uses can pass a dummy state
- State can be per-user or per-conversation for non-realtime conversations like email
User Interface
Works with a variety of user interfaces
- Chat (e.g. Jabber/Gtalk)
- Web chat
- Calendar (do X at time Y)
- Rich client application
Summary
- Strongly-typed natural language engine
- Compile time checking, inheritance, …
- Define tokens and sentences (rules) in C#
- Strongly-typed tokens: numbers, percentages, times, dates, file names, urls, people, business objects, …
- Builds an efficient parse graph
- Tracks conversation history
Future plans
- Expanded corpus of knowledge: company names, locations, documents, …
- Generate iCal/Gdata Recurrence from TimeExpressions


