News
A Quantified House – My Talk to the Seattle Quantified Self Meetup
Jul 31st
I delivered the following presentation to a meetup of the Quantified Self group in Seattle tonight. The evening was a fascinating fusion of medicine, technology and personal improvement. My talk fell between a session on personal genome sequencing and another on measuring the effects of coffee on blood pressure.
Integrating an Android phone into my home automation system
Jun 18th
My home automation system has long had the capability to communicate with a Panasonic PBX. From the PBX it gets a flow of information about every phone call in or out of the house. It also has a couple of caller-ID-to-serial-port devices that give it an earlier notification of incoming calls with Caller ID information. Using these various inputs it builds a database of every caller and every call (time of day, duration, person).
If it sees a call from someone not already in the database it will ask (via chat) for you to enter their full name. It then updates its database, replacing the caller ID name (which is often useless, especially for mobile numbers). You can also query it using the natural language interface to ask about any calls you might have missed, or to look up a number by time of day or by a fragment of the caller’s name. You can even ask complex queries like “who called last year on a friday after 5pm” and it will construct an efficient SQL query to get the results.
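As a rough illustration of that last step, here is a minimal sketch of turning an already-parsed query into parameterized SQL. It is not the engine's actual code: the `calls` table, its `caller_name` and `started_at` columns, the `buildCallQuery` name and the filter shape are all assumptions, and the SQL uses SQLite's `strftime` purely as an example dialect.

```javascript
// Hypothetical schema: calls(caller_name TEXT, started_at TEXT in ISO-8601 local time).
// The filter is assumed to be what a natural language parser would hand over.
function buildCallQuery(filter) {
  const where = [];
  const params = [];
  if (filter.year !== undefined) {
    where.push("strftime('%Y', started_at) = ?");
    params.push(String(filter.year));
  }
  if (filter.dayOfWeek !== undefined) {
    // SQLite's %w yields 0 (Sunday) through 6 (Saturday).
    where.push("strftime('%w', started_at) = ?");
    params.push(String(filter.dayOfWeek));
  }
  if (filter.afterHour !== undefined) {
    where.push("CAST(strftime('%H', started_at) AS INTEGER) >= ?");
    params.push(filter.afterHour);
  }
  const sql = "SELECT DISTINCT caller_name FROM calls" +
    (where.length ? " WHERE " + where.join(" AND ") : "");
  return { sql: sql, params: params };
}

// "who called last year on a friday after 5pm", asked during 2012:
const query = buildCallQuery({ year: 2011, dayOfWeek: 5, afterHour: 17 });
console.log(query.sql, query.params);
```

Keeping the values in `params` rather than splicing them into the SQL text means the parsed words never become part of the statement itself.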
It also synchronizes all these contact records with Google Contacts.
But until recently my mobile phone hasn’t been part of the home automation system. Yes, I can use it as an input device for Google Talk, and yes, the house still notices when it comes and goes because the house tracks every device that ever gets an IP address on the local network, but other than that it really doesn’t ‘understand’ much about my cell phone.

But that’s about to change. Recently I installed Tasker on my Android phone and using that app I can now set up a whole variety of triggers that can report back to the home automation system information such as phone calls made or received, GPS location, wifi-located position, phone unlocks, shakes and more.
So I’ve extended the web interface on the Home Automation web server to accept POSTs from Tasker with updates from my cell phone. These are placed into a PubSubHub implementation that uses SignalR to distribute messages to any connected clients. The home automation server is itself a client of this service (it publishes information about every device change in the house and the PubSubHub shares those updates with any connected Web client) so you get real-time updates for what’s happening in the house on the house’s web page. Extending that architecture to include messages from remote devices like the Android phone was easy and I plan to use it in the future for other remote devices, such as a Netduino with a collection of environment and HVAC sensors on it (more about that later).
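The publish/subscribe flow described above can be sketched in a few lines. This is only an in-process stand-in, assuming a topic-based hub: the real system uses SignalR to push messages to web clients, and the `PubSubHub` methods, the `handlePhonePost` helper, the `"phone"` topic and the message fields here are all hypothetical.

```javascript
// Minimal in-process hub: topic name -> set of subscriber callbacks.
class PubSubHub {
  constructor() {
    this.subscribers = new Map();
  }

  // Registers a callback and returns a function that removes it again.
  subscribe(topic, callback) {
    if (!this.subscribers.has(topic)) this.subscribers.set(topic, new Set());
    this.subscribers.get(topic).add(callback);
    return () => this.subscribers.get(topic).delete(callback);
  }

  // Delivers the message to every subscriber; returns how many received it.
  publish(topic, message) {
    const callbacks = this.subscribers.get(topic);
    if (!callbacks) return 0;
    for (const callback of callbacks) callback(message);
    return callbacks.size;
  }
}

// Turns the body of a POST from the phone into a hub message.
// The JSON shape ({event: ..., ...}) is an assumption for this sketch.
function handlePhonePost(hub, body) {
  const update = JSON.parse(body);
  if (typeof update.event !== "string") throw new Error("missing event name");
  return hub.publish("phone", Object.assign({ receivedAt: Date.now() }, update));
}

const demoHub = new PubSubHub();
demoHub.subscribe("phone", (message) => console.log("phone update:", message.event));
handlePhonePost(demoHub, JSON.stringify({ event: "call_received", number: "555-0100" }));
```

Because the house and the phone both publish into the same hub, a web page only needs one subscription mechanism for either source.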
As to precisely what I’ll do with this new capability I have a long list of features to implement now:
1) Logging all cell phone calls to the same database, automatically building my contacts list
2) Tracking how long it takes to get to work by each of the various ways I can go, correlating that with the traffic flow information and automatically figuring out which route I should take for future trips
3) Shake cellphone to change music in the house
4) Adjusting the heating at home based on how far away we are (and thus the soonest we could get back)
5) Finishing up my semantic, location-aware shopping list (knows which store you are in and what you need there and presents it in order by aisle)
6) Automatically delivering notifications by the best possible means (talking on the speakers at home, XMPP, or by email if I’m in a different time zone)
etc.
Before there was the web there was BeebTel
Jun 6th
Sorting through the stuff from my attic in the UK I came across a manual written in 1985. Here are some verbatim quotes from the manual:
… allows us to gain instant access to information which may actually be stored on the other side of the world.
It allows you to prepare ‘pages’ and to connect these, one with another, in any way you like.
… you don’t have to be a computer expert to set up your own information system.
You can link one system to another, so that you can, in effect, create one large system.
You can set up a massive information system accessible from anywhere on the network!
It has an extremely powerful facility called an ‘execution frame’ … which allows you to ‘call up’ a computer program …
This came from a manual that I wrote in 1985 for a program called BEEBTel which was a networked, page-based viewer and editor for linked pages and executable scripts. Of course TimBL’s version was somewhat more successful, despite coming four years later.
My first programme [sic]
Apr 21st
At the risk of looking seriously old, here’s something found on a paper tape bearing the title “Ian’s First Programme” …
BEGIN INTEGER A,B,C,D,E,F,G,ANS'
READ A,B,C,D,E,F,G'
ANS:=(A+B+C+D+E+F+G)/7'
PRINT ANS'
END'
Can you identify the language and the computer it ran on?
The Internet of Dogs
Mar 25th
In my previous post about GreenGoose I described my initial experiences with this “Internet Of Things in a box” product. Recently I’ve been trying their API and have integrated it into my Home Automation System.
The initial integration was easy: I used the new ASP.NET WebApi Core Libraries (from Nuget) together with Newtonsoft Json.Net. GreenGoose’s datetime format is somewhat quirky but hopefully they’ll move to a more standard one soon. They are, however, also about to switch to OAuth so it’s going to require some more work when that happens.
Aside from a few simple WebAPI calls and some Json parsing the rest was just a matter of connecting up the appropriate TimeSeries classes that I use to track values that vary over time, declaring a few graphs, and deciding what to log. With that in place I can now spin up a home automation ‘sensor’ corresponding to any GreenGoose sensor Id and my home automation system will add all of the relevant graphs and charts, triggers and more for that device.
What’s interesting is that a single sensor can serve more than one purpose. The dog collar sensor, for example, polls back to the base station regularly, so it can be used to sense both how much exercise the dog has had and simply whether the dog is at home or not, which could be really handy for anyone with a dog that’s learned to ignore the invisible fence! Each sensor can, through the TimeSeries objects, also offer additional data and triggers that can be used elsewhere in the home, for example an alert if the dog was walked for less than half an hour in a day.
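A trigger like that walk alert boils down to summing a time series per day and comparing it with a target. The sketch below assumes the sensor data has already been reduced to `{day, minutes}` samples. That shape and the `daysUnderTarget` name are made up for illustration and are not the TimeSeries classes themselves.

```javascript
// Returns the days whose total activity falls below the target.
// Note: a day with no samples at all never appears in the totals, so a
// real trigger would also need to treat "no data" as zero minutes.
function daysUnderTarget(samples, targetMinutes) {
  const totals = new Map(); // day string -> total minutes
  for (const sample of samples) {
    totals.set(sample.day, (totals.get(sample.day) || 0) + sample.minutes);
  }
  return Array.from(totals)
    .filter(([, minutes]) => minutes < targetMinutes)
    .map(([day]) => day)
    .sort();
}

const walks = [
  { day: "2012-03-24", minutes: 20 },
  { day: "2012-03-24", minutes: 15 },
  { day: "2012-03-25", minutes: 10 },
];
console.log(daysUnderTarget(walks, 30)); // only the 25th is short of 30 minutes
```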
Understanding Dates and Times in Natural Language
Dec 27th
One of the more challenging aspects of understanding natural language is dealing with date and time expressions. There are many different ways a user could refer to a specific date and time. They might say “Next Tuesday at 4pm”, they might give a specific date in any of several different forms, or they might refer to other time ranges such as “First Tuesday in January 2012 at 4pm”.
Whilst my natural language engine can’t understand every possible date time expression (e.g. the second Wednesday after the first Friday in May 2010) it does handle a huge variety.
Clearly the .NET provided DateTime class is wholly inadequate to express the kinds of date/time expressions your users might enter. To deal with that I’ve created my own classes that represent items like a specific Time of day, a DateTimeRange, a DateTimeRangeCollection, …
TemporalSets are the most general result of parsing a datetime expression since they can represent any date time expression. Broadly they split into two categories: finite and infinite expressions. “Tuesday at 5pm” is an infinite time expression. “Tuesday at 5pm January 2012” is a finite time expression. Sometimes you will want to accept an infinite expression and interpret it as a future or past finite occurrence. For that I have a MergePreferPast and MergePreferFuture method that operate on a TemporalSetCollection. The demonstration code on BitBucket shows this in action.
TemporalSets also have unique capabilities around both providing query expressions (for database searches) and generative expressions (for adding dates to a calendar).
If you’d like to try out the latest date / time expression parsing code in my Natural Language Engine you can visit the demo and try typing “define” followed by a date / time expression.
If you find any expressions that ought to work, please feel free to email or Tweet them to me.
Here’s a sample session:
define june 23rd 2010
Absolute:[DATETIMERANGE 6/23/2010 at 12:00 AM to 6/23/2010 at 11:59 PM]
define January 19th
Future:[Thursday 1/19/2012], [Saturday 1/19/2013], [Sunday 1/19/2014], [Monday 1/19/2015], [Tuesday 1/19/2016], [Thursday 1/19/2017], [Friday 1/19/2018], [Saturday 1/19/2019], [Sunday 1/19/2020], [Tuesday 1/19/2021]
Past: [Wednesday 1/19/2011] ...
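To make the finite/infinite distinction concrete, here is a small sketch of resolving an infinite expression such as “January 19th” to a single occurrence. It is not the engine's MergePreferFuture or MergePreferPast; the `resolveMonthDay` function and its parameters are hypothetical, and it ignores awkward cases such as February 29th, which JavaScript's `Date` rolls over to March 1st in non-leap years.

```javascript
// Resolves a recurring month/day to the nearest occurrence strictly after
// (prefer === "future") or strictly before (prefer === "past") `now`.
function resolveMonthDay(month, day, now, prefer) {
  const year = now.getFullYear();
  const thisYear = new Date(year, month - 1, day); // local midnight
  if (prefer === "future") {
    return thisYear > now ? thisYear : new Date(year + 1, month - 1, day);
  }
  return thisYear < now ? thisYear : new Date(year - 1, month - 1, day);
}

// Evaluated on the date of this post, December 27th 2011:
const today = new Date(2011, 11, 27);
console.log(resolveMonthDay(1, 19, today, "future").toDateString());
console.log(resolveMonthDay(1, 19, today, "past").toDateString());
```

The two results correspond to the first “Future” entry and the “Past” entry in the sample session above.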
A traffic service that answers “which way should I go?”
Oct 17th

Most traffic reports (on the radio or in text message alerts) are fairly useless. Like weather reports they contain lots of irrelevant information that could be eliminated with just a bit of extra context. In fact, most of the information they deliver is completely irrelevant to you as an individual located in one spot and hoping to get to another spot. Furthermore they aren’t actionable – telling me the traffic is slow on SR-520 and on I-90 isn’t interesting unless you can tell me which is the best way to go given where I am now and where I want to be.
So this weekend I added a new feature to the home automation that uses the WSDOT’s excellent traffic feed API to calculate a traffic report just for me. Recently I’ve started driving from the north end of Bellevue to the south end of Sammamish during rush hour. There are two very different paths I can take: SR-520 or I-405 to I-90. If either route has a problem I should take the other. So now I get an XMPP (chat) message from 4PM to 6PM whenever the optimal path changes from one route to the other. It’s the absolute minimum information I need and it’s 100% actionable.
For the moment the calculation is fairly simple: I maintain a list of the FlowDataID values along each route and then calculate a total ‘slowness’ factor based on the sum of those segments. If one way is much better than the other it generates an alert. If it goes back to being roughly equal the alert is cleared.
Since the calculation is purely relative (route A vs route B) it’s also fairly immune to day-of-week / school-holidays and other factors that have a significant impact on traffic but no impact on the only actionable decision I need to consider.
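The comparison can be sketched as follows. This is a simplified illustration rather than the code running in the house: it assumes each FlowDataID has already been mapped to a numeric slowness score (higher is slower), and the function names, the threshold and the message strings are invented for the example.

```javascript
// Sums the slowness score of every segment along one route.
function routeSlowness(segmentIds, slownessById) {
  return segmentIds.reduce((sum, id) => sum + (slownessById[id] || 0), 0);
}

// Returns "A" or "B" when one route is better by more than the threshold,
// or null when the two are roughly equal.
function preferredRoute(slowA, slowB, threshold) {
  if (slowB - slowA > threshold) return "A";
  if (slowA - slowB > threshold) return "B";
  return null;
}

// Remembers the last recommendation so a message is only produced
// when the recommendation actually changes.
function makeAlerter(threshold) {
  let current = null;
  return function update(slowA, slowB) {
    const next = preferredRoute(slowA, slowB, threshold);
    if (next === current) return null; // nothing new to say
    current = next;
    return next === null ? "clear" : "take " + next;
  };
}

const update = makeAlerter(5);
console.log(update(10, 20)); // route B is much slower, so an alert for A
console.log(update(11, 21)); // still the same advice, so no message
console.log(update(12, 13)); // roughly equal again, so the alert clears
```

Only the difference between the two sums matters here, which is what makes the result insensitive to factors that slow both routes equally.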
One other interesting point from the graph is just how spiked the traffic is on SR-520 compared to I-90.
“Remember Everything” … a long-term project
Sep 15th
“Remember Everything” connects nearly all of my projects into one giant solution that, well, remembers everything and has a natural language interface over it.
As inputs it will take information from my home automation system, my whole-network storage crawler, Google calendar, email, Twitter, blog, web crawler, an address-monitoring browser add-on I plan to write, the weather and traffic feeds, and, of course my natural language engine.
All this data will be put into MongoDB and can then be queried. Relationships between entities will be created using a semantic-web triple store and reasoner.
Together these capabilities will allow queries like:
* Copy all the photos I took last week onto c:\vacationPhotos
* Send img_0938.jpg to mum.
* Who called last Monday?
* Show pictures from last month taken on sunny days.
* What was happening two weeks ago when X called?
* Who called yesterday when I was in a meeting?
* What song was playing around 9pm last night?
* How long did I spend on the phone to my accountant last week?
* What web pages did I read last week about the Semantic Web?
* Send the web page I tweeted about last night to my Kindle.
* We need butter and olives.
* What do I need to buy from QFC? (a semantic shopping list concept, more on that later …)
In addition to the shopping lists concept (that’s already in my home automation system but lacks the semantic reasoning) the system will take any subject-verb-object phrase and remember it and then allow you to query it back later, e.g.
* My son read 20 pages tonight (making the weekly reading report easier)
* How many pages did he read this week?
* I took the red pill at 10AM
* I walked 2 miles this morning
* I ran 4 miles
* How much exercise did I do this week when it wasn’t raining? (summarizing values semantically and mathematically)
* The Audi was serviced this week (remembering schedules so you can check if an item is overdue)
* My BA frequent flyer number is #### (remembering numbers you need to look up often)
* I took the day off on friday (vacation reporting)
* I spent $12.95 on lunch (expense reporting)
…
Whenever you have anything you need to remember the system will be able to remember it, recall it, and where possible aggregate or summarize it using math and/or semantic reasoning (e.g. running subClassOf exercise, butter subClassOf dairy product, dairy products areSoldAt QFC, …).
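The subClassOf examples hint at how the aggregation could work. Below is a toy sketch, assuming facts are held as plain subject-predicate-object arrays. A real triple store and reasoner would replace all of it, and the `isA` and `totalMiles` helpers and the data shapes are hypothetical.

```javascript
// A handful of triples in [subject, predicate, object] form.
const triples = [
  ["running", "subClassOf", "exercise"],
  ["walking", "subClassOf", "exercise"],
  ["butter", "subClassOf", "dairy product"],
  ["dairy product", "areSoldAt", "QFC"],
];

// True if `thing` equals `cls` or reaches it through a chain of subClassOf
// triples. `seen` guards against cycles in the data.
function isA(store, thing, cls, seen = new Set()) {
  if (thing === cls) return true;
  if (seen.has(thing)) return false;
  seen.add(thing);
  return store.some(([s, p, o]) =>
    s === thing && p === "subClassOf" && isA(store, o, cls, seen));
}

// Sums the miles of every remembered fact whose activity is a kind of `cls`.
function totalMiles(store, facts, cls) {
  return facts
    .filter((fact) => isA(store, fact.activity, cls))
    .reduce((sum, fact) => sum + fact.miles, 0);
}

// "I walked 2 miles this morning", "I ran 4 miles"
const remembered = [
  { activity: "walking", miles: 2 },
  { activity: "running", miles: 4 },
];
console.log(totalMiles(triples, remembered, "exercise"));
```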
By linking my natural language engine to a triple store I can even allow users to teach it new concepts.
By silently monitoring your email, Twitter stream, calendar, activity in the house, … it will be able to answer questions based on the context not just on the content in ways that we take for granted as humans but which are not possible for computers today.
Home network crawler – cataloging every file on the home LAN with C# and MongoDB
Aug 22nd

Map-Reduce in action: The glaciers in Greenland 'map' the canyon walls into streams of rocks called lateral moraine. As the glaciers merge these rocks are 'reduced' into streams in the middle called 'medial' moraine. (A photo I took over Greenland this summer.)
I’m not a huge fan of RAID arrays – they mostly mean there’s another component to go wrong (the controller card) and when they do go wrong you can lose all your data just as easily as if it were all on one drive. I prefer a multiple copy strategy, an “Amazon S3 for the home” if you like. The downside of this is that there are multiple copies of each file across the home network and as I have several generations of hard drives the mapping from primary to secondary to tertiary is complex and hard to manage! It’s also really hard to find a single file when there are so many places to look and it’s nigh on impossible to be sure that I have the necessary three copies of every important file in the right places at all times.
So this weekend I embarked on a small project to catalog every file, directory and storage volume on the entire home network including drives that are only sometimes connected. The software has been running all weekend and is close to cataloging everything. It’s found 5 million files so far representing over 6TB of data!
The architecture I chose for this software was an agent that runs on each PC to catalog all of the attached volumes. This client uploads all the directories and files that it finds to a MongoDB database running on the same Atom server as the main storage array. The poor little Atom server’s 4GB of RAM has been in constant use but the server has remained responsive, in part because it boots from an SSD drive.
Each volume, directory and file is represented by a document in MongoDB in a single collection. The agent calculates an MD5 hash for each file and extracts metadata from MP3, WMA and JPG files. It also stores all of the key file dates (created, updated, accessed) and references to parent directories, volume identifiers and the currently connected PC. It does not assume that a volume is always connected to the same computer – you can unplug an external drive from one and put it somewhere else and it will all work just fine.
I implemented a re-startable tree scan that uses a couple of DateTime stamps to be able to determine which directories need to be scanned during the current pass and which ones have already been scanned. Any agent can be killed at any time and restarted and it will carry on walking the directory tree right where it left off. It will even continue correctly in the case where you move a volume from one PC to another.
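One way to picture the restartable scan is with a single timestamp per pass. The sketch below is an assumption about how such a scheme could work, not the agent's code: it uses an in-memory object as a stand-in for the file system, a `lastScanned` map in place of the database, and a `budget` parameter purely to simulate the agent being killed part way through.

```javascript
// tree maps each directory path to its child directories.
// lastScanned maps a directory path to the start time of the pass that last scanned it.
function scanPass(tree, root, passStart, lastScanned, budget) {
  const visited = [];
  const stack = [root];
  while (stack.length > 0) {
    const dir = stack.pop();
    for (const child of tree[dir] || []) stack.push(child);
    // Already scanned during this pass (before an interruption): skip the work.
    if ((lastScanned[dir] || 0) >= passStart) continue;
    // Simulate the agent being killed once the budget is used up.
    if (visited.length >= budget) return { visited: visited, finished: false };
    lastScanned[dir] = passStart;
    visited.push(dir);
  }
  return { visited: visited, finished: true };
}

const tree = { "/": ["/a", "/b"], "/a": ["/a/x"], "/b": [], "/a/x": [] };
const lastScanned = {};
const passStart = 1000;
console.log(scanPass(tree, "/", passStart, lastScanned, 2));  // interrupted after two directories
console.log(scanPass(tree, "/", passStart, lastScanned, 10)); // resumes and scans only the rest
```

The walk itself is cheap to repeat. What the timestamp saves is the expensive per-directory work (hashing, metadata extraction) that has already been done in the current pass.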
Each agent uses the Task Parallel Library’s Parallel.ForEach to crawl each volume in parallel and to parse multiple files from each directory simultaneously.
By storing all of the file metadata in MongoDB it’s easy to use Map-Reduce to calculate some interesting statistics for the files on the network.
For example, to create a summary of file sizes I can use a Map function:
function Map() {
    // Only count file documents with a non-zero Size
    // (a Size of 0 is falsy, so empty files are skipped).
    if (this.Size && this._t == "FileInformation")
    {
        var size = this.Size;
        // Bucket each file by the first unit its size falls below:
        // "kb" means under 1 KB, "mb" under 1 MB, and so on.
        if (size < 1024)
            emit("kb", {count: 1, size: size});
        else if (size < 1024*1024)
            emit("mb", {count: 1, size: size});
        else if (size < 1024*1024*1024)
            emit("gb", {count: 1, size: size});
        else if (size < 1024*1024*1024*1024)
            emit("tb", {count: 1, size: size});
        else
            emit("tb+", {count: 1, size: size});
    }
}
and a reduce function:
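function Reduce(key, arr_values) {
    // Sum the counts and sizes emitted for one bucket. The result has the
    // same shape as the emitted values, so it can safely be re-reduced.
    var count = 0;
    var size = 0;
    for (var i = 0; i < arr_values.length; i++)
    {
        count = count + arr_values[i].count;
        size = size + arr_values[i].size;
    }
    return {count: count, size: size};
}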
Map-Reduce operations like this take about 20 minutes to run (on the Atom server with just 4GB of RAM) whereas any query serviced by one of the indexes on the MongoDB collection is almost instantaneous.
I’ve been using the excellent MongoVue to run simple map-reduce scripts like this and to keep track of how quickly the database is growing.
Map-reduce can also be used to find duplicate files – by emitting the MD5 hash as the key and some information about the file as the value I can find every copy of every file across every computer on the home network.
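A duplicate finder along those lines might look like the sketch below. The `MD5` and `Path` field names are assumptions (only `_t` and `Size` appear in the functions above), and `runMapReduce` is a tiny local stand-in for the server so the example runs without a database. Inside MongoDB the map function would call the global `emit` instead of receiving it as an argument.

```javascript
// Map: key each file document by its content hash.
function mapByHash(emit) {
  if (this._t === "FileInformation" && this.MD5) {
    emit(this.MD5, { count: 1, paths: [this.Path] });
  }
}

// Reduce: merge the values for one hash. The output has the same shape
// as the emitted values, so it can be re-reduced.
function reduceCopies(key, values) {
  const merged = { count: 0, paths: [] };
  for (const value of values) {
    merged.count += value.count;
    merged.paths = merged.paths.concat(value.paths);
  }
  return merged;
}

// Local stand-in for the server: groups emitted values by key, then reduces.
function runMapReduce(docs, map, reduce) {
  const groups = new Map();
  const emit = (key, value) => {
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(value);
  };
  for (const doc of docs) map.call(doc, emit);
  const results = [];
  for (const [key, values] of groups) {
    // Like MongoDB, skip the reduce step when a key has only one value.
    const value = values.length === 1 ? values[0] : reduce(key, values);
    results.push({ _id: key, value: value });
  }
  return results;
}

function findDuplicates(docs) {
  return runMapReduce(docs, mapByHash, reduceCopies)
    .filter((result) => result.value.count > 1);
}

const sampleDocs = [
  { _t: "FileInformation", MD5: "aaa", Path: "\\\\pc1\\photos\\img1.jpg" },
  { _t: "FileInformation", MD5: "aaa", Path: "\\\\pc2\\backup\\img1.jpg" },
  { _t: "FileInformation", MD5: "bbb", Path: "\\\\pc1\\docs\\notes.txt" },
];
console.log(JSON.stringify(findDuplicates(sampleDocs), null, 2));
```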
Since I have the file name and metadata for every file on the home network I can also easily find any file using MongoDB’s regex matching feature against the path.
The Hard Parts
For starters you’ll need a library that can handle long file names. Then you’ll need to fix it to provide at least the functionality that FileInfo and DirectoryInfo give you in .NET.
Next you’ll need to learn about reparse-points and hard-links and you’ll need to skip over them because with them in place the file system is not a tree; it’s a cyclical graph in which a simple crawler will quickly get confused or stuck.
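The agents are written in C#, but the idea can be illustrated with Node's `fs` module. In this sketch anything that `lstatSync` reports as a symbolic link is skipped rather than followed, and each directory's device and inode numbers are remembered as a second guard against visiting the same directory twice. The `crawl` function is hypothetical, and how faithfully link and inode information is reported depends on the platform and file system, so treat it as an outline only.

```javascript
const fs = require("fs");
const path = require("path");

// Walks a directory tree iteratively and returns the regular files found.
function crawl(root) {
  const files = [];
  const seenDirs = new Set(); // "dev:ino" of every directory already listed
  const stack = [root];
  while (stack.length > 0) {
    const dir = stack.pop();
    const dirStat = fs.lstatSync(dir);
    const id = dirStat.dev + ":" + dirStat.ino;
    if (seenDirs.has(id)) continue; // reached the same directory another way
    seenDirs.add(id);
    for (const name of fs.readdirSync(dir)) {
      const full = path.join(dir, name);
      const entry = fs.lstatSync(full); // lstat describes the link, not its target
      if (entry.isSymbolicLink()) continue; // skip links instead of following them
      if (entry.isDirectory()) stack.push(full);
      else if (entry.isFile()) files.push(full);
    }
  }
  return files.sort();
}
```

A call such as `crawl("/some/volume")` (path made up) returns the sorted list of regular files under that root without looping, even if a link inside the tree points back at one of its ancestors.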
You’ll also want to store the NTFS file Id and the unique Volume ID for every file so you can track it when the file is moved or the removable drive is connected to a different computer.
So how well does it work?
This all seems to work really well. Nearly every volume has now been cataloged. It’s located about 5M files occupying over 6TB of space. The worst case offender for the number of copies of the same file is 100+. I’ve used the find feature in MongoDB to find a file I was missing and I’m better able to plan how to arrange directories and file generations across the various hard drives I have.
What’s next
Well, of course this needs to be connected to the home automation system and my Natural Language engine so you can ask “send a copy of IMG_0228 from last week to X” or “where are all the spreadsheets I created last year?” That will be fairly easy.
After that I hope to incorporate backup features into the agents too so they can automatically keep the required number of copies of each file according to its importance. I’d also like to set up a rotating set of external drives that go in the fire safe when not connected and when they are connected they get updated with the latest copies of all the important files.
I’d also like to be able to get the agents to move whole groups of directories around between drives as juggling the directory layout each time a new hard drive is added to the system is always a time consuming process.
Comments or Questions?
Does everyone else have a hard time managing multiple computers, hard drives, directories and multiple copies of files? What tools do you use to do this? Is there anything commercially available that I could have used instead? Would a tool like this be useful to you? Should I publish the code somewhere? Comments and questions are always welcome here or on twitter.



Home power meters revisited
Jul 1st
Posted by Ian Mercer in Commentary
In an earlier post I discussed the utility (or otherwise) of the 24 hour power consumption graph and questioned why Google and Microsoft were both investing in this approach to home energy efficiency. Since then both Google and Microsoft have stopped their efforts in this area.
Interestingly, in Europe I’m seeing more and more homes with devices like the one shown here that provide real-time power consumption information. One of the more interesting uses for devices like these is as a check that everything has been turned off when a homeowner is about to leave the house. A quick glance at the meter can reveal if a heater has been left on in a bedroom. Of course the main water heater has the largest impact on the reading but homeowners learn what numbers represent ‘normal’ and can see at a glance when something else has been left on. Clearly a true smart home that can turn devices off when they are no longer in use is still a better long-term solution for this scenario but it’s interesting to see how a fairly simple device can at least provide an indication that everything is off without a significant investment in replacing light switches and device controllers. What would be nicer however would be if the meter included some kind of machine learning so it could show at a glance if the home is in a minimal power state or not.