<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Ian Mercer &#187; C#</title>
	<atom:link href="http://blog.abodit.com/tag/csharp/feed/" rel="self" type="application/rss+xml" />
	<link>http://blog.abodit.com</link>
	<description>Living in the World&#039;s Smartest House</description>
	<lastBuildDate>Sat, 07 Jan 2012 19:50:56 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.2.1</generator>
		<item>
		<title>Dynamic persistence with MongoDB &#8211; look, no classes! Multiple inheritance in C#!</title>
		<link>http://blog.abodit.com/2011/09/dynamic-persistence-with-mongodb-look-no-classes-polymorphism-in-c/</link>
		<comments>http://blog.abodit.com/2011/09/dynamic-persistence-with-mongodb-look-no-classes-polymorphism-in-c/#comments</comments>
		<pubDate>Tue, 06 Sep 2011 23:58:45 +0000</pubDate>
		<dc:creator>Ian Mercer</dc:creator>
				<category><![CDATA[.NET]]></category>
		<category><![CDATA[MongoDB]]></category>
		<category><![CDATA[Programming]]></category>
		<category><![CDATA[C#]]></category>
		<category><![CDATA[ImpromptuInterface]]></category>

		<guid isPermaLink="false">http://blog.abodit.com/?p=1587</guid>
		<description><![CDATA[In an earlier post I explained a technique to create a class-free persistence layer using MongoDB. [Read that post first, then come back here.] Since then I&#8217;ve refined the techniques involved and created a cleaner implementation that does away with the `.props` collection on each object. Now when you add an interface to an object <a href="http://blog.abodit.com/2011/09/dynamic-persistence-with-mongodb-look-no-classes-polymorphism-in-c/" class="more-link">More &#62;</a>]]></description>
			<content:encoded><![CDATA[<p>In an <a href="http://blog.abodit.com/2011/05/class-free-persistence-multiple-inheritance-in-c-sharp-mongodb/" title="Class free persistence with MongoDB">earlier post</a> I explained a technique to create a class-free persistence layer using MongoDB.  [Read that post first, then come back here.]</p>
<p>Since then I&#8217;ve refined the techniques involved and created a cleaner implementation that does away with the `.props` collection on each object.  Now when you add an interface to an object you get exactly what you expected in the persisted data.</p>
<p>To use it you first need to register the serialization code somewhere in your startup code&#8230;</p>
<pre class="brush: csharp; title: ; notranslate">
            BsonSerializer.RegisterSerializationProvider(new MongoDynamicSerializationProvider());
</pre>
<p>The Serialization provider is quite simple:</p>
<pre class="brush: csharp; title: ; notranslate">
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using MongoDB.Bson.Serialization;

namespace MongoData.Dynamic
{
    public class MongoDynamicSerializationProvider : IBsonSerializationProvider
    {

        public IBsonSerializer GetSerializer(Type type)
        {
            if (typeof(MongoDynamic).IsAssignableFrom(type))
                return MongoDynamicBsonSerializer.Instance;
            return null;
        }
    }
}
</pre>
<p>The serializer is a bit more involved.  It uses an <strong>interface map</strong> to decide what type to return for each serialized object.  This is critical because many different .NET types can map onto the same BSON serialized value and only by maintaining this map can we get back to the original type.  It&#8217;s also critical for handling nested object graphs containing different types.</p>
<pre class="brush: csharp; title: ; notranslate">
using System;
using System.Collections.Concurrent;
using System.Dynamic;
using System.Linq;
using System.Linq.Expressions;
using System.Runtime.CompilerServices;
using Microsoft.CSharp.RuntimeBinder;
using MongoDB.Bson.IO;
using MongoDB.Bson.Serialization;
using MongoDB.Bson.Serialization.Serializers;
using MongoDB.Bson;
using MongoDB.Bson.Serialization.IdGenerators;
using System.Collections.Generic;
using ImpromptuInterface;

namespace MongoData.Dynamic
{
    public class MongoDynamicBsonSerializer : BsonBaseSerializer
    {
        private static MongoDynamicBsonSerializer instance = new MongoDynamicBsonSerializer();

        public static MongoDynamicBsonSerializer Instance
        {
            get { return instance; }
        }

        public override object Deserialize(BsonReader bsonReader, Type nominalType, IBsonSerializationOptions options)
        {
            var bsonType = bsonReader.CurrentBsonType;
            if (bsonType == BsonType.Null)
            {
                bsonReader.ReadNull();
                return null;
            }
            else if (bsonType == BsonType.Document)
            {
                var os = new ObjectSerializer();
                MongoDynamic md = new MongoDynamic();
                bsonReader.ReadStartDocument();

                Dictionary&lt;string, Type&gt; typeMap = null;

                // scan document first to find interfaces
                {
                    var bookMark = bsonReader.GetBookmark();
                    if (bsonReader.FindElement(MongoDynamic.InterfacesField))
                    {
                        md[MongoDynamic.InterfacesField] = BsonValue.ReadFrom(bsonReader).AsBsonArray.Select(x =&gt; x.AsString);
                        typeMap = md.GetTypeMap();
                    }
                    else
                    {
                        throw new FormatException(&quot;No interfaces defined for this dynamic object - can't deserialize it&quot;);
                    }
                    bsonReader.ReturnToBookmark(bookMark);
                }

                while (bsonReader.ReadBsonType() != BsonType.EndOfDocument)
                {
                    var name = bsonReader.ReadName();

                    if (name == &quot;_id&quot;)
                    {
                        md[name] = BsonValue.ReadFrom(bsonReader).AsObjectId;
                    }
                    else if (name == MongoDynamic.InterfacesField)
                    {
                        // Read it and ignore it, we already have it
                        BsonValue.ReadFrom(bsonReader);
                    }
                    else
                    {
                        if (typeMap == null) throw new FormatException(&quot;No interfaces defined for this dynamic object - can't deserialize&quot;);
                        // lookup the type for this element according to the interfaces
                        Type elementType;
                        if (typeMap.TryGetValue(name, out elementType))
                        {
                            var value = BsonSerializer.Deserialize(bsonReader, elementType);
                            md[name] = value;
                        }
                        else
                        {
                            // This is a value that is no longer in the interface, maybe a column you removed
                            // not really much we can do with it ... but we need to read it and move on
                            var value = BsonSerializer.Deserialize(bsonReader, typeof(object));
                            md[name] = value;

                            // As with all databases, removing elements from the schema is always going to cause problems ...
                        }
                    }
                }
                bsonReader.ReadEndDocument();
                return md;
            }
            else
            {
                var message = string.Format(&quot;Can't deserialize a {0} from BsonType {1}.&quot;, nominalType.FullName, bsonType);
                throw new FormatException(message);
            }
        } 

        public override bool GetDocumentId(object document, out object id, out Type idNominalType, out IIdGenerator idGenerator)
        {
            MongoDynamic x = (MongoDynamic)document;
            id = x._id;
            idNominalType = typeof(ObjectId);
            idGenerator = new ObjectIdGenerator();
            return true;
        }

        public override void SetDocumentId(object document, object id)
        {
            MongoDynamic x = (MongoDynamic)document;
            x._id = (ObjectId)id;
        }

        public override void Serialize(BsonWriter bsonWriter, Type nominalType, object value, IBsonSerializationOptions options)
        {
            if (value == null)
            {
                bsonWriter.WriteNull();
                return;
            }
            var metaObject = ((IDynamicMetaObjectProvider)value).GetMetaObject(Expression.Constant(value));
            var memberNames = metaObject.GetDynamicMemberNames().ToList();
            if (memberNames.Count == 0)
            {
                bsonWriter.WriteNull();
                return;
            }

            bsonWriter.WriteStartDocument();
            foreach (var memberName in memberNames)
            {
                // ToDo: handle all those _id Id id variants?
                bsonWriter.WriteName(memberName);

                object memberValue;
                if (memberName == &quot;_id&quot;) memberValue = ((MongoDynamic)value)._id;
                else if (memberName == &quot;int&quot;) memberValue = ((MongoDynamic)value).@int;
                else memberValue = Impromptu.InvokeGet(value, memberName);

                if (memberValue == null)
                    bsonWriter.WriteNull();
                else
                {
                    var memberType = memberValue.GetType();
                    var serializer = BsonSerializer.LookupSerializer(memberType);
                    serializer.Serialize(bsonWriter, memberType, memberValue, null);
                }
            }
            bsonWriter.WriteEndDocument();
        }
    }
}
</pre>
<p>And finally, the actual <strong>MongoDynamic</strong> class:</p>
<pre class="brush: csharp; title: ; notranslate">
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Dynamic;
using MongoDB.Bson;
using MongoDB.Bson.Serialization.Attributes;
using ImpromptuInterface;

namespace MongoData.Dynamic
{
    /// &lt;summary&gt;
    /// All MongoDynamic objects support this interface because every object needs an _id in MongoDB
    /// &lt;/summary&gt;
    public interface IId
    {
        ObjectId _id { get; set; }
    }

    /// &lt;summary&gt;
    /// MongoDynamic is like an ExpandoObject that also understands document Ids and uses ImpromptuInterface
    /// to act like any other collection of interfaces ...
    /// It can be serialized and deserialized from BSON and thus stored in a MongoDB database.
    /// &lt;/summary&gt;
    /// &lt;remarks&gt;
    /// This simple class gives you the ability to define database objects using only .NET interfaces - no classes!
    /// Those objects can be dynamically extended to support any interface you want to add to them - polymorphism!
    /// When loaded back from the database the object will support all of the interfaces that were ever applied to it.
    /// Adding a new field is easy.  Removing one works too.
    /// All fields must be nullable since they may not be present on earlier instances of an object type.
    /// &lt;/remarks&gt;
    public class MongoDynamic : DynamicObject, IId
    {
        [BsonId(Order=1)]
        public ObjectId _id { get; set; }

        // Dumb name for a property - which is why I chose it - very unlikely it will ever conflict with a real property name
        public const string InterfacesField = &quot;int&quot;;

        /// &lt;summary&gt;
        /// Interfaces that have been added to this object
        /// &lt;/summary&gt;
        /// &lt;remarks&gt;
        /// We always begin by supporting the _id interface
        /// Order is important, we need to see this field before we can deserialize any others
        /// &lt;/remarks&gt;
        [BsonElement(InterfacesField, Order=2)]
        internal HashSet&lt;string&gt; @int = new HashSet&lt;string&gt;(){ typeof(IId).FullName };

        /// &lt;summary&gt;
        /// A text version of all interfaces - mostly for debugging purposes, stored in alphabetical order
        /// &lt;/summary&gt;
        [BsonIgnore]
        public string InterfacesAsText
        {
            get { return string.Join(&quot;,&quot;, this.@int.OrderBy(i =&gt; i)); }
        }

        /// &lt;summary&gt;
        /// Add support for an interface to this document if it doesn't already have it
        /// &lt;/summary&gt;
        public T AddLike&lt;T&gt;()
            where T : class
        {
            @int.Add(typeof(T).FullName);
            // And also act like any interfaces that interface implements (which will include ones they represent too)
            foreach (var @interface in typeof(T).GetInterfaces())
                @int.Add(@interface.FullName);
            return Impromptu.ActLike&lt;T&gt;(this, this.GetAllInterfaces());
        }

        /// &lt;summary&gt;
        /// Add support for multiple interfaces
        /// &lt;/summary&gt;
        public T AddLike&lt;T&gt;(Type[] otherInterfaces)
            where T : class
        {
            var allInterfaces = otherInterfaces.Concat(new[] { typeof(T) });
            var allInterfacesAndDescendants = allInterfaces.Concat(allInterfaces.SelectMany(x =&gt; x.GetInterfaces()));
            foreach (var @interface in allInterfacesAndDescendants)
                @int.Add(@interface.FullName);
            return Impromptu.ActLike&lt;T&gt;(this, this.GetAllInterfaces());
        }

        /// &lt;summary&gt;
        /// Cast this object to an interface only if it has previously been created as one of that kind
        /// &lt;/summary&gt;
        public T AsLike&lt;T&gt;()
            where T : class
        {
            if (!this.@int.Contains(typeof(T).FullName)) return null;
            else return Impromptu.ActLike&lt;T&gt;(this, this.GetAllInterfaces());
        }

        // A rather large cache of all interface types loaded into the App Domain
        private static List&lt;Type&gt; cacheOfTypes = null;

        // A cache of the interface types corresponding to a given 'key' of interface names
        private static Dictionary&lt;string, Type[]&gt; cacheOfInterfaces = new Dictionary&lt;string, Type[]&gt;();

        public Type[] GetAllInterfaces()
        {
            // We always behave like an object with an Id plus any other interfaces we have
            var key = string.Join(&quot;,&quot;, this.@int.OrderBy(i =&gt; i));
            if (!cacheOfInterfaces.ContainsKey(key))
            {
                if (cacheOfTypes == null)
                {
                    var assemblies = AppDomain.CurrentDomain.GetAssemblies();
                    cacheOfTypes = assemblies.SelectMany(ass =&gt; ass.GetTypes()).Where(t =&gt; t.IsInterface).ToList();
                }
                var interfaces = cacheOfTypes.Where(t =&gt; this.@int.Any(i =&gt; i == t.FullName));

                // Could trim the interfaces to remove any that are inherited from others ...
                cacheOfInterfaces.Add(key, interfaces.ToArray());
            }
            return cacheOfInterfaces[key];
        }

        /// &lt;summary&gt;
        /// Get a mapping from a field name to a type according to the interfaces on this object
        /// &lt;/summary&gt;
        /// &lt;returns&gt;&lt;/returns&gt;
        public Dictionary&lt;string, Type&gt; GetTypeMap()
        {
            Dictionary&lt;string, Type&gt; typeMap = new Dictionary&lt;string, Type&gt;();
            var interfaces = this.GetAllInterfaces();
            foreach (var mi in interfaces.SelectMany(intf =&gt; intf.GetProperties()))
            {
                typeMap[mi.Name] = mi.PropertyType;
            }
            return typeMap;
        }

        /// &lt;summary&gt;
        /// Becomes a Proxy object that acts like it implements all of the interfaces listed as being supported by this Entity
        /// &lt;/summary&gt;
        /// &lt;remarks&gt;
        /// Because the returned object supports ALL of the interfaces that have ever been added to this object
        /// you can cast it to any of them.  This enables a type of polymorphism.
        /// &lt;/remarks&gt;
        public object ActLikeAllInterfacesPresent()
        {
            return Impromptu.DynamicActLike(this, this.GetAllInterfaces());
        }

        [BsonIgnore]
        // BsonIgnore because Bson serialization will happen on the dynamic interface this class exposes not on this dictionary
        private Dictionary&lt;string, object&gt; children = new Dictionary&lt;string, object&gt;();

        /// &lt;summary&gt;
        /// Fetch a property by name
        /// &lt;/summary&gt;
        public override bool TryGetMember(GetMemberBinder binder, out object result)
        {
            if (binder.Name == &quot;_id&quot;) { result = this._id; return true; }
            else if (binder.Name == InterfacesField) { result = this.@int; return true; }
            else
            {
               // Return the stored value if there is one; otherwise fall back to null.
               if (!children.TryGetValue(binder.Name, out result))
                   result = null;                     // we hope that it's nullable!  If not you have an issue
               return true;                           // when you do a database migration or query a nullable field it won't be in 'children'
            }
        }

        /// &lt;summary&gt;
        /// Set a property (e.g. person1.Name = &quot;Smith&quot;)
        /// &lt;/summary&gt;
        public override bool TrySetMember(SetMemberBinder binder, object value)
        {
            if (binder.Name == &quot;_id&quot;) { this._id = (ObjectId)value; return true; }      // you shouldn't need to use this
            if (binder.Name == InterfacesField) throw new AccessViolationException(&quot;You cannot set the interfaces directly, use AddLike() instead&quot;);
            if (!this.GetTypeMap().ContainsKey(binder.Name)) throw new ArgumentException(&quot;Property '&quot; + binder.Name + &quot;' not found.  You need to call AddLike to specify the interfaces you want to support.&quot;);
            children[binder.Name] = value;
            return true;
        }

        public override IEnumerable&lt;string&gt; GetDynamicMemberNames()
        {
            return new[]{&quot;_id&quot;, InterfacesField}.Concat(children.Keys);
        }

        /// &lt;summary&gt;
        /// An indexer for use by serialization code
        /// &lt;/summary&gt;
        internal object this[string key]
        {
            get
            {
                if (key == &quot;_id&quot;) return this._id;
                else if (key == InterfacesField) return this.@int;
                else return children[key];
            }

            set
            {
                if (key == &quot;_id&quot; &amp;&amp; value is BsonObjectId) this._id = ((BsonObjectId)value).Value;
                else if (key == &quot;_id&quot;) this._id = (ObjectId)value;
                else if (key == InterfacesField) this.@int = new HashSet&lt;string&gt;((IEnumerable&lt;string&gt;)value);
                else children[key] = value;
            }
        }
    }
}
</pre>
<p>You&#8217;ll need ImpromptuInterface (from NuGet) to build this.  To use it, you write code like this to save to MongoDB:</p>
<pre class="brush: csharp; title: ; notranslate">
            MongoDynamic entity = new MongoDynamic();
            var user = entity.AddLike&lt;IUser&gt;();         // *** Add the IUser fields to it ...
            user.Name = name;                           // Use it as if it were an IUser
            // save it to the database as normal
</pre>
<p>And to retrieve an object you create a query as normal and then query for MongoDynamic objects like so &#8230;</p>
<pre class="brush: csharp; title: ; notranslate">
            var user = database.GetCollection&lt;MongoDynamic&gt;(&quot;***collectionName***&quot;).FindOne(query);
            if (user == null) return null;
            return user.AsLike&lt;IUser&gt;();
</pre>
<p>Typically you will want your query to reference the field called <strong>int</strong> (where all the interfaces are stored) so you can query for objects that support a specific type (if you do, you&#8217;ll want to add an index on that field).  [NB the name was chosen to be one you were unlikely to ever use in .NET]</p>
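<p>As a rough sketch of what such a query looks like from the MongoDB side (shell-style JavaScript rather than C#): the serializer above writes the <strong>int</strong> field as an array of interface names, and MongoDB matches an array field when any one of its elements equals the queried value.  The MyApp.IUser name and the entities collection are made up for illustration, and hasInterface is only a local stand-in for the server-side match so the sketch can run without a database.</p>
<pre class="brush: jscript; title: ; notranslate">
// Query document that selects every object to which the given interface was added.
// Pass the FullName of your own interface; 'MyApp.IUser' below is hypothetical.
function buildInterfaceQuery(interfaceFullName) {
	return { 'int': interfaceFullName };
}

// Index specification (ascending) for the same field.
var interfaceIndex = { 'int': 1 };

// Local stand-in for the server-side array match (assumption, not driver code).
function hasInterface(doc, interfaceFullName) {
	return doc['int'].indexOf(interfaceFullName) !== -1;
}

// In the mongo shell (collection name 'entities' is an assumption):
//   db.entities.ensureIndex(interfaceIndex);      // createIndex in newer shells
//   db.entities.find(buildInterfaceQuery('MyApp.IUser'));
</pre>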
<p>MongoDynamic objects are <strong>polymorphic</strong> &#8211; you can morph them to support any other interface at any time like so &#8230;  </p>
<pre class="brush: csharp; title: ; notranslate">
            user.AddLike&lt;ISomeOtherInterface&gt;();
</pre>
]]></content:encoded>
			<wfw:commentRss>http://blog.abodit.com/2011/09/dynamic-persistence-with-mongodb-look-no-classes-polymorphism-in-c/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Home network crawler &#8211; cataloging every file on the home LAN with C# and MongoDB</title>
		<link>http://blog.abodit.com/2011/08/home-network-crawler-storage-file-mongodb/</link>
		<comments>http://blog.abodit.com/2011/08/home-network-crawler-storage-file-mongodb/#comments</comments>
		<pubDate>Tue, 23 Aug 2011 07:50:13 +0000</pubDate>
		<dc:creator>Ian Mercer</dc:creator>
				<category><![CDATA[.NET]]></category>
		<category><![CDATA[MongoDB]]></category>
		<category><![CDATA[My News]]></category>
		<category><![CDATA[Programming]]></category>
		<category><![CDATA[C#]]></category>
		<category><![CDATA[LAN]]></category>
		<category><![CDATA[storage]]></category>

		<guid isPermaLink="false">http://blog.abodit.com/?p=1556</guid>
		<description><![CDATA[With the addition of two more 3TB drives to the home network it&#8217;s becoming impossible to track files and to remember where each one is and whether it&#8217;s a backup of some other disk or not. There are 8 computers on the home network and over 10TB of storage distributed between them. Much of the <a href="http://blog.abodit.com/2011/08/home-network-crawler-storage-file-mongodb/" class="more-link">More &#62;</a>]]></description>
			<content:encoded><![CDATA[<p><div id="attachment_1560" class="wp-caption alignright" style="width: 310px"><a href="http://blog.abodit.com/wp-content/uploads/2011/08/279553_10150227932347951_755432950_7379212_7641191_o.jpg"><img src="http://blog.abodit.com/wp-content/uploads/2011/08/279553_10150227932347951_755432950_7379212_7641191_o-300x199.jpg" alt="Map-Reduce in operation in Greenland" title="Map-Reduce in operation in Greenland" width="300" height="199" class="size-medium wp-image-1560" /></a><p class="wp-caption-text">Map-Reduce in action: The glaciers in Greenland &#039;map&#039; the canyon walls into streams of rocks called lateral moraine.  As the glaciers merge these rocks are &#039;reduced&#039; into streams in the middle called &#039;medial&#039; moraine.  (A photo I took over Greenland this summer.)</p></div>With the addition of two more 3TB drives to the home network it&#8217;s becoming impossible to track files and to remember where each one is and whether it&#8217;s a backup of some other disk or not.  There are 8 computers on the home network and over 10TB of storage distributed between them.  Much of the storage is concentrated on a single machine running Windows Server 2008.  It&#8217;s a <a href="/2011/04/finally-got-the-1u-atom-server-racked-up/">low-powered Atom server</a> connected to a Sans Digital <a href="http://www.amazon.com/gp/product/B00365DWBK/ref=as_li_ss_tl?ie=UTF8&#038;tag=abodit-20&#038;linkCode=as2&#038;camp=217145&#038;creative=399373&#038;creativeASIN=B00365DWBK">1U Rackmount Sans Digital disk array</a><img src="http://www.assoc-amazon.com/e/ir?t=&#038;l=as2&#038;o=1&#038;a=B00365DWBK&#038;camp=217145&#038;creative=399373" width="1" height="1" border="0" alt="" style="border:none !important; margin:0px !important;" /> running in JBOD mode (just a bunch of disks).</p>
<p>I&#8217;m not a huge fan of RAID arrays &#8211; they mostly mean there&#8217;s another component to go wrong (the controller card) and when they do go wrong you can lose all your data just as easily as if it were all on one drive.  I prefer a multiple copy strategy, an &#8220;Amazon S3 for the home&#8221; if you like.  The downside of this is that there are multiple copies of each file across the home network and as I have several generations of hard drives the mapping from primary to secondary to tertiary is complex and hard to manage!  It&#8217;s also really hard to find a single file when there are so many places to look and it&#8217;s nigh on impossible to be sure that I have the necessary three copies of every important file in the right places at all times.</p>
<p>So this weekend I embarked on a small project to catalog every file, directory and storage volume on the entire home network including drives that are only sometimes connected.  The software has been running all weekend and is close to cataloging everything.  It&#8217;s found 5 million files so far representing over 6TB of data!</p>
<p>The architecture I chose for this software was an agent that runs on each PC to catalog all of the attached volumes.  This client uploads all the directories and files that it finds to a MongoDB database running on the same Atom server as the main storage array.  The poor little Atom server&#8217;s 4GB of RAM has been in constant use but the server has remained responsive, in part because it boots from an SSD drive.</p>
<p>Each volume, directory and file is represented by a document in MongoDB in a single collection.  The agent calculates an MD5 hash for each file and extracts metadata from MP3, WMA and JPG files.  It also stores all of the key file dates (created, updated, accessed) and references to parent directories, volume identifiers and the currently connected PC.  It does not assume that a volume is always connected to the same computer &#8211; you can unplug an external drive from one and put it somewhere else and it will all work just fine.</p>
<p>I implemented a re-startable tree scan that uses a couple of DateTime stamps to be able to determine which directories need to be scanned during the current pass and which ones have already been scanned.  Any agent can be killed at any time and restarted and it will carry on walking the directory tree right where it left off.  It will even continue correctly in the case where you move a volume from one PC to another.</p>
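<p>One possible shape for such a scheme, sketched in JavaScript over an in-memory tree (the real agent is C# and its exact bookkeeping is not shown in this post, so treat every name here as an assumption): a pass records when it started, each directory records when it was last scanned, and a directory is visited only if its stamp is missing or older than the start of the pass.  If the pass start time is persisted, a restarted agent simply skips everything already stamped.</p>
<pre class="brush: jscript; title: ; notranslate">
// A directory still needs scanning in this pass if it has never been scanned
// or was last scanned before the pass began.
function needsScan(dir, passStarted) {
	if (!dir.lastScanned) return true;
	return !(dir.lastScanned.getTime() >= passStarted.getTime());
}

// Walk the tree depth-first.  'now' is injected so the sketch is deterministic.
// Returns the names of the directories scanned by this call.
function scanPass(root, passStarted, now) {
	var scanned = [];
	var stack = [root];
	while (stack.length > 0) {
		var dir = stack.pop();
		(dir.children || []).forEach(function (child) { stack.push(child); });
		if (needsScan(dir, passStarted)) {
			// ... catalog the files in this directory here ...
			dir.lastScanned = now();
			scanned.push(dir.name);
		}
	}
	return scanned;
}
</pre>
<p>Calling scanPass a second time with the same pass start time scans nothing, which is the property that makes a kill-and-restart safe in this sketch.</p>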
<p>Each agent uses the Task Parallel Library&#8217;s Parallel.ForEach to crawl each volume in parallel and to parse multiple files from each directory simultaneously.</p>
<p>By storing all of the file metadata in MongoDB it&#8217;s easy to use Map-Reduce to calculate some interesting statistics for the files on the network.</p>
<p>For example, to create a summary of file sizes I can use a Map function:</p>
<pre class="brush: jscript; title: ; notranslate">
function Map() {
	if (this.Size &amp;&amp; this._t == &quot;FileInformation&quot;)
	{
		var size = this.Size;

		if (size &lt; 1024)
			emit (&quot;kb&quot;, {count:1, size:this.Size});
		else if (size &lt; 1024*1024)
			emit (&quot;mb&quot;, {count:1, size:this.Size});
		else if (size &lt; 1024*1024*1024)
			emit (&quot;gb&quot;, {count:1, size:this.Size});
		else if (size &lt; 1024*1024*1024*1024)
			emit (&quot;tb&quot;, {count:1, size:this.Size});
		else
			emit (&quot;tb+&quot;, {count:1, size:this.Size});
	}
}
</pre>
<p>and a reduce function:</p>
<pre class="brush: jscript; title: ; notranslate">
function Reduce(key, arr_values) {

	var count = 0;
	var size = 0;

	for(var i in arr_values)
	{
		count = count + arr_values[i].count;
		size = size + arr_values[i].size;
	}

	return {count:count, size:size};
}
</pre>
<p>Map-Reduce operations like this take about 20 minutes to run (on the Atom server with just 4GB of RAM) whereas any query serviced by one of the indexes on the MongoDB collection is almost instantaneous.</p>
<p>I&#8217;ve been using the excellent <a href="http://www.mongovue.com/" target="_blank">MongoVue</a> to run simple map-reduce scripts like this and to keep track of how quickly the database is growing.</p>
<p>Map-reduce can also be used to find duplicate files &#8211; by emitting the MD5 hash as the key and some information about the file as the value I can find every copy of every file across every computer on the home network.</p>
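<p>A minimal sketch of that duplicate finder, in the same style as the functions above.  The MD5 and Path field names are assumptions (only Size and _t appear in the earlier example), so adjust them to whatever the documents actually contain:</p>
<pre class="brush: jscript; title: ; notranslate">
// Emit one entry per file, keyed by its hash (field names MD5 and Path are assumed).
function MapDuplicates() {
	if (this._t != 'FileInformation' || !this.MD5) return;
	emit(this.MD5, {count: 1, paths: [this.Path]});
}

// Merge the entries for one hash.  The result has the same shape as the emitted
// values, so MongoDB can safely re-reduce partial results.
function ReduceDuplicates(key, arr_values) {
	var result = {count: 0, paths: []};
	arr_values.forEach(function (v) {
		result.count = result.count + v.count;
		result.paths = result.paths.concat(v.paths);
	});
	return result;
}
</pre>
<p>Any key in the output whose count is greater than 1 is a file that exists in more than one place, and its paths array lists every copy.  For a file with a very large number of copies the paths array could grow large, so a real version might emit only a count and look the paths up afterwards.</p>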
<p>Since I have the file name and metadata for every file on the home network I can also easily find any file using MongoDB&#8217;s regex matching feature against the path.</p>
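<p>For illustration, a query like that could be built as follows.  This is a sketch only: Path is an assumed field name and files a hypothetical collection name, neither of which is shown in this post.  The helper escapes regex metacharacters so that a file name such as IMG_0228.jpg is matched literally.</p>
<pre class="brush: jscript; title: ; notranslate">
// Escape regex metacharacters so the text is matched literally.
function escapeRegex(text) {
	return text.replace(/[.*+?^${}()|[\]\\]/g, function (ch) { return '\\' + ch; });
}

// Build a query document that finds files whose path contains the given text,
// ignoring case.  'Path' is an assumed field name.
function buildPathQuery(fragment) {
	return { Path: { $regex: escapeRegex(fragment), $options: 'i' } };
}

// In the mongo shell ('files' is a hypothetical collection name):
//   db.files.find(buildPathQuery('IMG_0228'));
</pre>
<p>An unanchored, case-insensitive pattern like this generally cannot make efficient use of an index on the path, so expect it to scan rather than seek.</p>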
<h3>The Hard Parts</h3>
<p>For starters you&#8217;ll need a <a href="http://bcl.codeplex.com/wikipage?title=Long%20Path&#038;referringTitle=Home" target="_blank">library that can handle long file names</a>.  Then you&#8217;ll need to fix it to provide at least the functionality that FileInfo and DirectoryInfo give you in .NET.  </p>
<p>Next you&#8217;ll need to learn about reparse-points and hard-links and you&#8217;ll need to skip over them because with them in place the file system is not a tree; it&#8217;s a cyclical graph in which a simple crawler will quickly get confused or stuck.</p>
<p>You&#8217;ll also want to store the NTFS file Id and the unique Volume ID for every file so you can track it when the file is moved or the removable drive is connected to a different computer.</p>
<h3>So how well does it work?</h3>
<p>This all seems to work really well.  Nearly every volume has now been cataloged.  It&#8217;s located about 5M files occupying over 6TB of space.  The worst case offender for the number of copies of the same file is 100+.  I&#8217;ve used the find feature in MongoDB to find a file I was missing and I&#8217;m better able to plan how to arrange directories and file generations across the various hard drives I have.</p>
<h3>What&#8217;s next</h3>
<p>Well, of course this needs to be connected to the home automation system and my Natural Language engine so you can ask &#8220;send a copy of IMG_0228 from last week to X&#8221; or &#8220;where are all the spreadsheets I created last year?&#8221;  That will be fairly easy.</p>
<p>After that I hope to incorporate backup features into the agents too so they can automatically keep the required number of copies of each file according to its importance.  I&#8217;d also like to set up a rotating set of external drives that go in the fire safe when not connected and when they are connected they get updated with the latest copies of all the important files.</p>
<p>I&#8217;d also like to be able to get the agents to move whole groups of directories around between drives as juggling the directory layout each time a new hard drive is added to the system is always a time consuming process.</p>
<h3>Comments or Questions?</h3>
<p>Does everyone else have a hard time managing multiple computers, hard drives, directories and multiple copies of files?  What tools do you use to do this?  Is there anything commercially available that I could have used instead?  Would a tool like this be useful to you?  Should I publish the code somewhere?  Comments and questions are always welcome here or on Twitter.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.abodit.com/2011/08/home-network-crawler-storage-file-mongodb/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>C# Natural Language Engine connected to Microsoft Dynamics CRM 2011 Online</title>
		<link>http://blog.abodit.com/2011/06/c-natural-language-engine-connected-to-microsoft-crm-2011-online/</link>
		<comments>http://blog.abodit.com/2011/06/c-natural-language-engine-connected-to-microsoft-crm-2011-online/#comments</comments>
		<pubDate>Mon, 06 Jun 2011 06:39:46 +0000</pubDate>
		<dc:creator>Ian Mercer</dc:creator>
				<category><![CDATA[.NET]]></category>
		<category><![CDATA[My News]]></category>
		<category><![CDATA[Natural Language Processing]]></category>
		<category><![CDATA[NLP]]></category>
		<category><![CDATA[C#]]></category>
		<category><![CDATA[CRM]]></category>

		<guid isPermaLink="false">http://blog.abodit.com/?p=1527</guid>
		<description><![CDATA[In an earlier post I discussed some ideas around a Semantic CRM. Recently I&#8217;ve been doing some clean up work on my C# Natural Language Engine and decided to do a quick test connecting it to a real CRM. As you may know from reading my blog, this natural language engine is already heavily used <a href="http://blog.abodit.com/2011/06/c-natural-language-engine-connected-to-microsoft-crm-2011-online/" class="more-link">More &#62;</a>]]></description>
			<content:encoded><![CDATA[<p>In an <a href="http://blog.abodit.com/2011/03/a-semantic-web-ontology-driven-approach-to-crm/">earlier post</a> I discussed some ideas around a <a href="http://blog.abodit.com/2011/03/a-semantic-web-ontology-driven-approach-to-crm/">Semantic CRM</a>.</p>
<p>Recently I&#8217;ve been doing some clean up work on my C# Natural Language Engine and decided to do a quick test connecting it to a real CRM.  As you may know from reading my blog, this natural language engine is already heavily used in my home automation system to control lights, sprinklers, HVAC, music and more and to query caller ID logs and other information.</p>
<p>I recently refactored it to use the Autofac dependency injection framework and in the process realized just how close my NLP engine is to ASP.NET MVC 3 in its basic structure and philosophy!  To use it you create Controller classes and put action methods in them.  Those controller classes use Autofac to get all of the dependencies they may need (services like an email service, a repository, a user service, an HTML email formatting service, &#8230;) and then each method in them represents a specific sentence parse using the various token types that the NLP engine supports.  Unlike ASP.NET MVC 3 there is no Route registration; the method itself represents the route (i.e. sentence structure) that is used to decide which method to call.  Internally my NLP engine has its own code to match incoming words and phrases to tokens and then on to the action methods.  In a sense the engine itself is one big dependency injection framework working against the action methods.  I sometimes wish ASP.NET MVC 3 had the same route-registration-free approach to designing web applications (but also appreciate all the reasons why it doesn&#8217;t).</p>
<p>Another improvement I made recently to the NLP Engine was to develop a connector for the <a href="http://twilio.com">Twilio</a> SMS service.  This means that my home automation system can now accept SMS messages as well as all the other communication formats it supports: email, web chat, XMPP chat and direct URL commands.  My Twilio connector to NLP supports message splitting and batching so it will buffer up outgoing messages to reach the limit of a single SMS and will send that.  This lowers SMS charges and also allows responses that are longer than a single SMS message.</p>
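<p>The batching idea can be sketched roughly as follows.  This is a minimal illustration rather than the actual connector: the <code>SmsBatcher</code> class, its <code>Batch</code> method and the 160-character limit are assumptions made for the example, and no Twilio API calls are shown.</p>
<pre class="brush: csharp; title: ; notranslate">
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical helper, not part of the real connector
public static class SmsBatcher
{
    // Assumed limit for a single plain-text SMS
    public const int MaxSmsLength = 160;

    // Packs short replies together and splits long ones so every batch fits in one SMS
    public static IEnumerable&lt;string&gt; Batch(IEnumerable&lt;string&gt; messages)
    {
        var buffer = new StringBuilder();
        foreach (string message in messages)
        {
            string remaining = message;
            while (remaining.Length &gt; 0)
            {
                // A separator is only needed when the buffer already holds text
                int separator = buffer.Length &gt; 0 ? 1 : 0;
                int room = MaxSmsLength - buffer.Length - separator;

                if (remaining.Length &lt;= room)
                {
                    if (separator == 1) buffer.Append(' ');
                    buffer.Append(remaining);
                    remaining = &quot;&quot;;
                }
                else if (buffer.Length &gt; 0)
                {
                    // Does not fit alongside what is buffered: hand off the buffer first
                    yield return buffer.ToString();
                    buffer.Length = 0;
                }
                else
                {
                    // Longer than one SMS on its own: split at the limit
                    yield return remaining.Substring(0, MaxSmsLength);
                    remaining = remaining.Substring(MaxSmsLength);
                }
            }
        }
        if (buffer.Length &gt; 0) yield return buffer.ToString();
    }
}

public static class Program
{
    public static void Main()
    {
        var replies = new[] { &quot;Lights off.&quot;, &quot;Sprinklers start at 6am.&quot;, new string('x', 200) };
        foreach (string sms in SmsBatcher.Batch(replies))
        {
            Console.WriteLine(sms.Length + &quot;: &quot; + sms);
        }
    }
}
</pre>
<p>This sketch splits over-long text at a fixed character position; a real connector would probably prefer to break on word boundaries.</p>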
<p>Using this new, improved version of my Natural Language Engine I decided to try connecting it to a CRM.  I chose Microsoft Dynamics CRM 2011 and elected to use the strongly-typed, early-bound objects that you can generate for any instance of the CRM service.  I added some simple sentences in an NLPRules project that allow you to tell it who you met, and to input some of their details.  Unlike a traditional forms-based approach the user can decide what information to enter and what order to enter it in.  The Natural Language Engine supports the concept of a conversation and can remember what you were discussing, allowing a much more natural style of conversation than some simple rule-based engines and even allowing it to ask questions and get answers from the user.</p>
<p>Here&#8217;s a screenshot showing a sample conversation using Google Talk (XMPP/Jabber) and the resulting CRM record in Microsoft CRM 2011 Online.   You could have the same conversation over SMS or email.   Click to enlarge.</p>
<p><a href="http://blog.abodit.com/wp-content/uploads/2011/06/NLPCRM.png"><img src="http://blog.abodit.com/wp-content/uploads/2011/06/NLPCRM-300x171.png" alt="A natural language interface to CRM" title="A natural language interface to CRM" width="300" height="171" class="aligncenter size-medium wp-image-1529" /></a></p>
<p>Based on my limited testing this looks like another promising area where a truly fluent, conversational-style natural language engine could play a significant role.  Note how it understands email addresses, phone numbers and such like and in code these all become strongly typed objects.  Where it really excels is in temporal expressions where it can understand things like &#8220;who called on a Saturday in May last year?&#8221; and can construct an efficient SQL query from that.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.abodit.com/2011/06/c-natural-language-engine-connected-to-microsoft-crm-2011-online/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>A simple redirect route handler for ASP.NET 3.5 routing</title>
		<link>http://blog.abodit.com/2010/04/a-simple-redirect-route-handler-for-asp-net-3-5-routing/</link>
		<comments>http://blog.abodit.com/2010/04/a-simple-redirect-route-handler-for-asp-net-3-5-routing/#comments</comments>
		<pubDate>Wed, 21 Apr 2010 05:22:43 +0000</pubDate>
		<dc:creator>Ian Mercer</dc:creator>
				<category><![CDATA[.NET]]></category>
		<category><![CDATA[MVC]]></category>
		<category><![CDATA[ASP.NET]]></category>
		<category><![CDATA[C#]]></category>
		<category><![CDATA[Routing]]></category>

		<guid isPermaLink="false">http://blog.abodit.com/?p=739</guid>
		<description><![CDATA[ASP.NET 3.5 Routing is a very powerful tool not just for registering routes for newer ASP.NET MVC applications but also for adding SEO friendly routes to older Webforms (ASPX) applications, or for routing multiple URLs to a single page. But that&#8217;s not all it can do. You can create your own IRouteHandler and then have <a href="http://blog.abodit.com/2010/04/a-simple-redirect-route-handler-for-asp-net-3-5-routing/" class="more-link">More &#62;</a>]]></description>
			<content:encoded><![CDATA[<p>ASP.NET 3.5 Routing is a very powerful tool not just for registering routes for newer ASP.NET MVC applications but also for adding SEO friendly routes to older Webforms (ASPX) applications, or for routing multiple URLs to a single page.  But that&#8217;s not all it can do.  You can create your own IRouteHandler and then have complete control over what to do with any incoming HttpRequest.</p>
<p>Here for example is a way to do a permanent redirect when a given route is matched.  To use it you might, for example, do:-</p>
<pre class="brush: csharp; title: ; notranslate">
            routes.Add(new Route(&quot;sample.aspx&quot;, new RedirectRouteHandler(&quot;/home/start&quot;)));
</pre>
<p>Here is the RedirectRouteHandler that can turn any request into a 301 redirect for you:-</p>
<pre class="brush: csharp; title: ; notranslate">
    /// &lt;summary&gt;
    /// Redirect Route Handler
    /// &lt;/summary&gt;
    public class RedirectRouteHandler : IRouteHandler
    {
        private string newUrl;

        public RedirectRouteHandler(string newUrl)
        {
            this.newUrl = newUrl;
        }

        public IHttpHandler GetHttpHandler(RequestContext requestContext)
        {
            return new RedirectHandler(newUrl);
        }
    }

    /// &lt;summary&gt;
    /// &lt;para&gt;Redirecting MVC handler&lt;/para&gt;
    /// &lt;/summary&gt;
    public class RedirectHandler : IHttpHandler
    {
        private string newUrl;

        public RedirectHandler(string newUrl)
        {
            this.newUrl = newUrl;
        }

        public bool IsReusable
        {
            get { return true; }
        }

        public void ProcessRequest(HttpContext httpContext)
        {
            httpContext.Response.Status = &quot;301 Moved Permanently&quot;;
            httpContext.Response.StatusCode = 301;
            httpContext.Response.AppendHeader(&quot;Location&quot;, newUrl);
            return;
        }
    }
</pre>
<p><strong>Note:</strong> I&#8217;m not saying this is the best or only way to handle this.  You&#8217;ll want to look at Url Rewriting and the Application and Request Routing module for IIS7 in particular.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.abodit.com/2010/04/a-simple-redirect-route-handler-for-asp-net-3-5-routing/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Why functional programming and LINQ is often better than procedural code</title>
		<link>http://blog.abodit.com/2010/04/why-functional-programming-is-better-linq-c-sharp-than-procedural-code/</link>
		<comments>http://blog.abodit.com/2010/04/why-functional-programming-is-better-linq-c-sharp-than-procedural-code/#comments</comments>
		<pubDate>Thu, 15 Apr 2010 18:09:54 +0000</pubDate>
		<dc:creator>Ian Mercer</dc:creator>
				<category><![CDATA[.NET]]></category>
		<category><![CDATA[Programming]]></category>
		<category><![CDATA[C#]]></category>
		<category><![CDATA[functional]]></category>

		<guid isPermaLink="false">http://blog.abodit.com/?p=726</guid>
		<description><![CDATA[Functional programming is a relatively new component in the C# language.  It can potentially replace for-loops in many situations with simpler code, but the question remains &#8216;what&#8217;s wrong with a good old for loop?&#8217; Here are some of the reasons I think functional programming is important and in particular how LINQ can improve the readability, <a href="http://blog.abodit.com/2010/04/why-functional-programming-is-better-linq-c-sharp-than-procedural-code/" class="more-link">More &#62;</a>]]></description>
			<content:encoded><![CDATA[<p>Functional programming is a relatively new component in the C# language.  It can potentially replace for-loops in many situations with simpler code, but the question remains &#8216;what&#8217;s wrong with a good old for loop?&#8217;</p>
<p>Here are some of the reasons I think functional programming is important and in particular how LINQ can improve the readability, maintainability, and parallelizability (if there were such a word) of your code:</p>
<ol>
<li>Functional approaches are potentially easier to parallelize either manually using PLINQ or by the compiler. As CPUs move to even more cores this may become more important.</li>
<li>Functional approaches make it easier to achieve lazy evaluation in multi-step processes because you can pass the intermediate results to the next step as a simple variable which hasn&#8217;t been evaluated fully yet rather than evaluating the first step entirely and then passing a collection to the next step (or without using a separate method and a yield statement to achieve the same procedurally).</li>
<li>Functional approaches are often shorter and easier to read.</li>
<li>Functional approaches often eliminate complex conditional bodies within for loops (e.g. if statements and &#8216;continue&#8217; statements) because you can break the for loop down into logical steps &#8211; selecting all the elements that match, doing an operation on them, &#8230;</li>
</ol>
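<p>As a small illustration of points 2 and 4, here is the same task written both ways.  The order data and the method names are made up for the example; it is only a sketch, not a benchmark.</p>
<pre class="brush: csharp; title: ; notranslate">
using System;
using System.Collections.Generic;
using System.Linq;

public static class Program
{
    // Procedural version: filtering, transformation and collection are tangled together
    public static List&lt;string&gt; LargeOrdersLoop(IEnumerable&lt;KeyValuePair&lt;string, decimal&gt;&gt; orders)
    {
        var result = new List&lt;string&gt;();
        foreach (var order in orders)
        {
            if (order.Value &lt; 100m) continue;               // skip small orders
            if (string.IsNullOrEmpty(order.Key)) continue;  // skip unnamed customers
            result.Add(order.Key.ToUpperInvariant());
        }
        return result;
    }

    // Functional version: each logical step is a separate, lazily evaluated stage
    public static IEnumerable&lt;string&gt; LargeOrdersLinq(IEnumerable&lt;KeyValuePair&lt;string, decimal&gt;&gt; orders)
    {
        return orders
            .Where(o =&gt; o.Value &gt;= 100m)
            .Where(o =&gt; !string.IsNullOrEmpty(o.Key))
            .Select(o =&gt; o.Key.ToUpperInvariant());
    }

    public static void Main()
    {
        var orders = new[]
        {
            new KeyValuePair&lt;string, decimal&gt;(&quot;alice&quot;, 250m),
            new KeyValuePair&lt;string, decimal&gt;(&quot;bob&quot;, 40m),
            new KeyValuePair&lt;string, decimal&gt;(&quot;&quot;, 900m),
            new KeyValuePair&lt;string, decimal&gt;(&quot;carol&quot;, 100m)
        };

        // Nothing is evaluated here; the query only runs when it is enumerated
        IEnumerable&lt;string&gt; query = LargeOrdersLinq(orders);

        // Both lines print: ALICE, CAROL
        Console.WriteLine(string.Join(&quot;, &quot;, LargeOrdersLoop(orders).ToArray()));
        Console.WriteLine(string.Join(&quot;, &quot;, query.ToArray()));
    }
}
</pre>
<p>For point 1, PLINQ&#8217;s <code>AsParallel()</code> can in principle be added at the start of a query like this one, although whether that helps depends entirely on the workload.</p>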
<p>These days I opt for the functional syntax more often than not and fall back to for-loops when:-</p>
<p>A. The body of the loop contains complex logic that cannot be disentangled into a cleaner sequential application of functions and it is simply easier to just write a for-loop with the complex conditional code in it.</p>
<p>B. The task is inherently not functional, i.e. has side effects.</p>
<p>C. The task needs exception handling in it. Sure you can write big lambda blocks with try catch in them but at some point it becomes easier and cleaner just to use a for-loop.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.abodit.com/2010/04/why-functional-programming-is-better-linq-c-sharp-than-procedural-code/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Development Tools and Libraries I use</title>
		<link>http://blog.abodit.com/net/development-tools-and-libraries-i-use/</link>
		<comments>http://blog.abodit.com/net/development-tools-and-libraries-i-use/#comments</comments>
		<pubDate>Mon, 29 Mar 2010 23:36:40 +0000</pubDate>
		<dc:creator>Ian Mercer</dc:creator>
				<category><![CDATA[Home Automation]]></category>
		<category><![CDATA[.NET]]></category>
		<category><![CDATA[C#]]></category>
		<category><![CDATA[development]]></category>
		<category><![CDATA[tools]]></category>

		<guid isPermaLink="false">http://blog.abodit.com/?page_id=300</guid>
		<description><![CDATA[Libraries and Code Snippets TweetSharp Predicate Builder for LINQ query building: http://www.albahari.com/nutshell/predicatebuilder.aspx Useful tools LinqPad  http://www.linqpad.net/ Source Control Subversion with TortoiseSVN Continuous Integration JetBrains TeamCity, recently moved off CruiseControl.NET Deployment Subversion as a repository for binary images, custom deployment code Web Server IIS7 Useful articles PHP: http://devzone.zend.com/article/627]]></description>
			<content:encoded><![CDATA[<p><strong>Libraries and Code Snippets</strong></p>
<p>TweetSharp</p>
<p>Predicate Builder for LINQ query building: <a href="http://www.albahari.com/nutshell/predicatebuilder.aspx">http://www.albahari.com/nutshell/predicatebuilder.aspx</a></p>
<p><strong>Useful tools</strong></p>
<p>LinqPad  <a href="http://www.linqpad.net/">http://www.linqpad.net/</a></p>
<p><strong>Source Control</strong></p>
<p>Subversion with TortoiseSVN</p>
<p><strong>Continuous Integration</strong></p>
<p>JetBrains TeamCity, recently moved off CruiseControl.NET</p>
<p><strong>Deployment</strong></p>
<p>Subversion as a repository for binary images, custom deployment code</p>
<p><strong>Web Server</strong></p>
<p>IIS7</p>
<p><strong>Useful articles</strong></p>
<p>PHP: <a href="http://devzone.zend.com/article/627">http://devzone.zend.com/article/627</a></p>
]]></content:encoded>
			<wfw:commentRss>http://blog.abodit.com/net/development-tools-and-libraries-i-use/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Putting a feedback button on every page with ASP.NET MVC and JQuery</title>
		<link>http://blog.abodit.com/2010/03/feedback-button-asp-net-mvc-jquery/</link>
		<comments>http://blog.abodit.com/2010/03/feedback-button-asp-net-mvc-jquery/#comments</comments>
		<pubDate>Sat, 13 Mar 2010 08:57:52 +0000</pubDate>
		<dc:creator>Ian Mercer</dc:creator>
				<category><![CDATA[.NET]]></category>
		<category><![CDATA[CSS]]></category>
		<category><![CDATA[Javascript]]></category>
		<category><![CDATA[C#]]></category>
		<category><![CDATA[jQuery]]></category>
		<category><![CDATA[MVC]]></category>
		<category><![CDATA[UI]]></category>

		<guid isPermaLink="false">http://blog.abodit.com/?p=613</guid>
		<description><![CDATA[You&#8217;ve probably seen many web sites with the floating &#8216;feedback&#8217; button down the side. Here&#8217;s how to add one to your site using jQuery, jQuery UI and ASP.NET MVC. First make sure you have jQuery and jQuery UI referenced in your master page view together with the CSS file for whichever jQuery UI theme you <a href="http://blog.abodit.com/2010/03/feedback-button-asp-net-mvc-jquery/" class="more-link">More &#62;</a>]]></description>
			<content:encoded><![CDATA[<p><a href="http://blog.abodit.com/wp-content/uploads/2010/03/FeedbackButton.png"><img src="http://blog.abodit.com/wp-content/uploads/2010/03/FeedbackButton.png" alt="Feedback button" title="Feedback Button" width="233" height="629" class="alignright size-full wp-image-616" /></a></p>
<p>You&#8217;ve probably seen many web sites with the floating &#8216;feedback&#8217; button down the side.  Here&#8217;s how to add one to your site using jQuery, jQuery UI and ASP.NET MVC.</p>
<p>First make sure you have jQuery and jQuery UI referenced in your master page view together with the CSS file for whichever jQuery UI theme you have chosen.</p>
<p>We&#8217;ll make a few changes to the master page view to add the pop-up feedback form, we&#8217;ll add an action on a controller to accept the feedback that is posted, and we&#8217;ll need a small amount of CSS.</p>
<p>So, after referencing those javascript files and the theme CSS, the first thing to do is to add the following HTML to the bottom of your master page view:</p>
<pre class="brush: xml; title: ; notranslate">
            &lt;div id=&quot;feedbackdialog&quot; style=&quot;width:300px; height:300px;text-align:left;&quot;&gt;
                &lt;p&gt;Your name and/or email: &lt;br /&gt;
                &lt;input type=&quot;text&quot; id=&quot;feedbackEmail&quot; name=&quot;feedbackEmail&quot; size=&quot;34&quot; value=&quot;&lt;%: this.Model.AccountEmailOrEmpty %&gt;&quot; /&gt;
                &lt;/p&gt;
                &lt;p&gt;Comment:&lt;br /&gt;
                &lt;textarea id=&quot;feedbackComment&quot; name=&quot;comment&quot; cols=&quot;35&quot; rows=&quot;5&quot;&gt;&lt;/textarea&gt;&lt;/p&gt;
                &lt;br /&gt;
                &lt;div id=&quot;feedbackResult&quot;&gt;&lt;/div&gt;
            &lt;/div&gt;
</pre>
<p>Now add this code to your global javascript file that is also referenced from your master page view &#8230; don&#8217;t embed it in the page, go ahead and do the right thing and put it in a .js file so it&#8217;s not a burden on every page.</p>
<pre class="brush: jscript; title: ; notranslate">
//function for the feedback form
$(document).ready(
    function () {
        /* Create the feedback dialog */

        $(&quot;#feedbackdialog&quot;).dialog(
        {
            closeOnEscape: true,
            modal: true,
            autoOpen: false,
            resizable: false,
            title: 'Feedback',
            width: 400,
            buttons: { &quot;Send&quot;: function () {
                var dlg = $(this);
                $.post(&quot;/corporate/suggest&quot;,
                        {
                            email: dlg.find(&quot;input[name='feedbackEmail']&quot;).val(),
                            comment: dlg.find(&quot;#feedbackComment&quot;).val(),
                            url: document.location.href
                         },
                        function (data) {
                            dlg.dialog('close');
                        }
                );
                $(this).html(&quot;&lt;p id='feedBackSending'&gt;Sending&lt;/p&gt;&quot;).dialog({ buttons: {} });
            }
            }
        });

        $('.contact_us').click(function () {
            $(&quot;#feedbackdialog&quot;).dialog(&quot;open&quot;);
            return false;
        });
    });
</pre>
<p>Next we&#8217;ll add the action referenced here.  In the example we used the url &#8216;/corporate/suggest&#8217; so, assuming you have a controller called CorporateController, add the following action to it &#8230;</p>
<pre class="brush: csharp; title: ; notranslate">
        public ActionResult Suggest (string email, string comment, string url)
        {
            if (!string.IsNullOrWhiteSpace(comment))
            {
                // here we will log the feedback to the database and/or send it in email
            }
            return View();
        }
</pre>
<p>Create a view for &#8216;Suggest&#8217;; it doesn&#8217;t matter what&#8217;s in it as we don&#8217;t use the result currently.</p>
<p>And, finally we need a bit of CSS for the feedback icon itself:</p>
<pre class="brush: css; title: ; notranslate">
/* Feedback tab */
#feedbackTab
{
	right:0;
    position:fixed;
    width:32px;
    height:150px;
    top: 150px;
    z-index:1;
}
</pre>
<p>The feedback button now floats on every page, 150px from the top and it&#8217;s glued to the right hand side.</p>
<p>Of course you&#8217;ll need your own feedback image, or feel free to borrow the one here:- <a href="http://www.signswift.com/images/feedback.png">http://www.signswift.com/images/feedback.png</a></p>
<p>So with that all in place, click the feedback button and a form like this should appear.  Fill the information in and send it to the server.  Note how we silently grab the url of the page too so we can see which page they were on when they submitted the feedback.</p>
<p><a href="http://blog.abodit.com/wp-content/uploads/2010/03/FeedbackForm.png"><img src="http://blog.abodit.com/wp-content/uploads/2010/03/FeedbackForm.png" alt="Feedback Form" title="Feedback Form" width="646" height="488" class="aligncenter size-full wp-image-623" /></a></p>
]]></content:encoded>
			<wfw:commentRss>http://blog.abodit.com/2010/03/feedback-button-asp-net-mvc-jquery/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>A simple web crawler in C# using HtmlAgilityPack</title>
		<link>http://blog.abodit.com/2010/03/a-simple-web-crawler-in-c-using-htmlagilitypack/</link>
		<comments>http://blog.abodit.com/2010/03/a-simple-web-crawler-in-c-using-htmlagilitypack/#comments</comments>
		<pubDate>Wed, 10 Mar 2010 17:35:15 +0000</pubDate>
		<dc:creator>Ian Mercer</dc:creator>
				<category><![CDATA[.NET]]></category>
		<category><![CDATA[IIS]]></category>
		<category><![CDATA[C#]]></category>
		<category><![CDATA[crawler]]></category>

		<guid isPermaLink="false">http://blog.abodit.com/?p=595</guid>
		<description><![CDATA[]]></description>
			<content:encoded><![CDATA[<pre class="brush: csharp; title: ; notranslate">
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using HtmlAgilityPack;
using System.Net;

namespace LinkChecker.WebSpider
{
    /// &lt;summary&gt;
    /// A result encapsulating the Url and the HtmlDocument
    /// &lt;/summary&gt;
    public abstract class WebPage
    {
        public Uri Url { get; set; }

        /// &lt;summary&gt;
        /// Get every WebPage.Internal on a web site (or part of a web site) visiting all internal links just once
        /// plus every external page (or other Url) linked to the web site as a WebPage.External
        /// &lt;/summary&gt;
        /// &lt;remarks&gt;
        /// Use .OfType WebPage.Internal to get just the internal ones if that's what you want
        /// &lt;/remarks&gt;
        public static IEnumerable&lt;WebPage&gt; GetAllPagesUnder(Uri urlRoot)
        {
            var queue = new Queue&lt;Uri&gt;();
            var allSiteUrls = new HashSet&lt;Uri&gt;();

            queue.Enqueue(urlRoot);
            allSiteUrls.Add(urlRoot);

            while (queue.Count &gt; 0)
            {
                Uri url = queue.Dequeue();

                HttpWebRequest oReq = (HttpWebRequest)WebRequest.Create(url);
                oReq.UserAgent = @&quot;Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.1.5) Gecko/20091102 Firefox/3.5.5&quot;;

                HttpWebResponse resp = (HttpWebResponse)oReq.GetResponse();

                WebPage result;

                if (resp.ContentType.StartsWith(&quot;text/html&quot;, StringComparison.InvariantCultureIgnoreCase))
                {
                    HtmlDocument doc = new HtmlDocument();
                    try
                    {
                        var resultStream = resp.GetResponseStream();
                        doc.Load(resultStream); // The HtmlAgilityPack
                        result = new Internal() { Url = url, HtmlDocument = doc };
                    }
                    catch (System.Net.WebException ex)
                    {
                        result = new WebPage.Error() { Url = url, Exception = ex };
                    }
                    catch (Exception ex)
                    {
                        ex.Data.Add(&quot;Url&quot;, url);    // Annotate the exception with the Url
                        throw;
                    }

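                    // Hand off whatever we got: the page on success, or the error
                    yield return result;

                    // Nothing to follow if the page failed to load
                    if (!(result is Internal)) continue;

                    // And now queue up all the links on this page (SelectNodes returns null when there are none)
                    var links = doc.DocumentNode.SelectNodes(@&quot;//a[@href]&quot;);
                    if (links == null) continue;
                    foreach (HtmlNode link in links)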
                    {
                        HtmlAttribute att = link.Attributes[&quot;href&quot;];
                        if (att == null) continue;
                        string href = att.Value;
                        if (href.StartsWith(&quot;javascript&quot;, StringComparison.InvariantCultureIgnoreCase)) continue;      // ignore javascript on buttons using a tags

                        Uri urlNext = new Uri(href, UriKind.RelativeOrAbsolute);

                        // Make it absolute if it's relative
                        if (!urlNext.IsAbsoluteUri)
                        {
                            urlNext = new Uri(url, urlNext);    // resolve against the page the link was found on
                        }

                        if (!allSiteUrls.Contains(urlNext))
                        {
                            allSiteUrls.Add(urlNext);               // keep track of every page we've handed off

                            if (urlRoot.IsBaseOf(urlNext))
                            {
                                queue.Enqueue(urlNext);
                            }
                            else
                            {
                                yield return new WebPage.External() { Url = urlNext };
                            }
                        }
                    }
                }
            }
        }

        ///// &lt;summary&gt;
        ///// In the future might provide all the images too??
        ///// &lt;/summary&gt;
        //public class Image : WebPage
        //{
        //}

        /// &lt;summary&gt;
        /// Error loading page
        /// &lt;/summary&gt;
        public class Error : WebPage
        {
            public int HttpResult { get; set; }
            public Exception Exception { get; set; }
        }

        /// &lt;summary&gt;
        /// External page - not followed
        /// &lt;/summary&gt;
        /// &lt;remarks&gt;
        /// No body - go load it yourself
        /// &lt;/remarks&gt;
        public class External : WebPage
        {
        }

        /// &lt;summary&gt;
        /// Internal page
        /// &lt;/summary&gt;
        public class Internal : WebPage
        {
            /// &lt;summary&gt;
            /// For internal pages we load the document for you
            /// &lt;/summary&gt;
            public virtual HtmlDocument HtmlDocument { get; internal set; }
        }
    }
}
</pre>
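<p>A possible way to call it, as a sketch only: the start URL is a placeholder, and the loop relies on nothing beyond the <code>WebPage</code> types defined above.</p>
<pre class="brush: csharp; title: ; notranslate">
using System;
using LinkChecker.WebSpider;

public static class Program
{
    public static void Main()
    {
        // Placeholder address: substitute the site you want to check
        var root = new Uri(&quot;http://example.com/&quot;);

        // The sequence is lazy, so pages are fetched one at a time as the loop advances
        foreach (WebPage page in WebPage.GetAllPagesUnder(root))
        {
            if (page is WebPage.Internal)
            {
                Console.WriteLine(&quot;Internal: &quot; + page.Url);
            }
            else if (page is WebPage.External)
            {
                Console.WriteLine(&quot;External (not followed): &quot; + page.Url);
            }
            else if (page is WebPage.Error)
            {
                Console.WriteLine(&quot;Error: &quot; + page.Url);
            }
        }
    }
}
</pre>
<p>Because the method is an iterator, you can stop early (for example with <code>.Take(50)</code> from LINQ) and no further pages will be requested.</p>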
]]></content:encoded>
			<wfw:commentRss>http://blog.abodit.com/2010/03/a-simple-web-crawler-in-c-using-htmlagilitypack/feed/</wfw:commentRss>
		<slash:comments>7</slash:comments>
		</item>
		<item>
		<title>Using Exception.Data to add additional information to an Exception</title>
		<link>http://blog.abodit.com/2010/03/using-exception-data-to-add-additional-information-to-an-exception/</link>
		<comments>http://blog.abodit.com/2010/03/using-exception-data-to-add-additional-information-to-an-exception/#comments</comments>
		<pubDate>Mon, 08 Mar 2010 07:15:05 +0000</pubDate>
		<dc:creator>Ian Mercer</dc:creator>
				<category><![CDATA[.NET]]></category>
		<category><![CDATA[C#]]></category>
		<category><![CDATA[Exception]]></category>

		<guid isPermaLink="false">http://blog.abodit.com/?p=507</guid>
		<description><![CDATA[Introduction Whether you are writing a WinForms application or a complex .NET web site, you will invariably be catching exceptions, logging them and reporting them somewhere. (In this post, I&#8217;m not going to explain how to log exceptions). Simply reporting the exception as-thrown rarely captures enough information to be able to diagnose what happened. A FileNotFoundException for <a href="http://blog.abodit.com/2010/03/using-exception-data-to-add-additional-information-to-an-exception/" class="more-link">More &#62;</a>]]></description>
			<content:encoded><![CDATA[<h2>Introduction</h2>
<p>Whether you are writing a WinForms application or a complex .NET web  site, you will invariably be catching exceptions, logging them and  reporting them somewhere. (In this post, I&#8217;m not going to explain how  to log exceptions). Simply reporting the exception as-thrown rarely  captures enough information to be able to diagnose what happened. A <code>FileNotFoundException </code>for instance isn&#8217;t much use unless you know which file it was.</p>
<p>One way to deal with this issue is to wrap an exception up in a more  explicit exception that includes the extra information, e.g.</p>
<pre class="brush: csharp; title: ; notranslate">
string filename ...
try
{
   //... do something with the file
}
catch (FileNotFoundException ex)
{
   CustomException ex2 = new CustomException(&quot;Missing cache: &quot; + filename, ex);
   throw ex2;
}
</pre>
<p>This approach works but it leads to a lot of custom exceptions that  are just extra work to create and maintain.  Sometimes you&#8217;ll want a  custom exception because you are going to handle it in a different way  in some outer scope, but often you just want to log  the error and redirect the user to an error page as there is nothing else you can do to fix the problem.</p>
<p>In cases like this, you can simplify things greatly by using the little-known <code>.Data</code> property on an Exception. This is an <code>IDictionary</code> for a &#8220;collection of key/value pairs that provide additional user-defined information about the exception&#8221; [MSDN].</p>
<p>Using this approach, you can write:</p>
<pre class="brush: csharp; title: ; notranslate">
try
{
   ...
}
catch (FileNotFoundException ex)
{
   ex.Data.Add(&quot;cache filename&quot;, filename);
   throw;
}
</pre>
<p>Each surrounding scope can include a similar <code>try</code>-<code>catch</code> that adds more information to <code>.Data</code> so by the time you get to the top-most scope you have built up a complete picture of what might have caused the exception.  And in doing so you haven&#8217;t lost any of the StackTrace information, nor have you wrapped the exception up needlessly in another exception.</p>
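<p>To make the layering concrete, here is a minimal, self-contained sketch.  The method names (<code>LoadCache</code>, <code>RenderPage</code>), the keys and the file name are invented for the example.</p>
<pre class="brush: csharp; title: ; notranslate">
using System;
using System.Collections;
using System.IO;

public static class Program
{
    // Innermost scope: knows which file was involved
    public static string LoadCache(string filename)
    {
        try
        {
            return File.ReadAllText(filename);
        }
        catch (FileNotFoundException ex)
        {
            ex.Data[&quot;cache filename&quot;] = filename;
            throw;  // rethrow the same exception, stack trace intact
        }
    }

    // Outer scope: knows which page was being rendered
    public static string RenderPage(string pageName)
    {
        try
        {
            return LoadCache(pageName + &quot;.cache&quot;);
        }
        catch (Exception ex)
        {
            ex.Data[&quot;page&quot;] = pageName;
            throw;
        }
    }

    public static void Main()
    {
        try
        {
            RenderPage(&quot;no-such-page&quot;);
        }
        catch (Exception ex)
        {
            // By now the exception carries what both scopes knew
            Console.WriteLine(ex.GetType().Name);
            foreach (DictionaryEntry entry in ex.Data)
            {
                Console.WriteLine(entry.Key + &quot; = &quot; + entry.Value);
            }
        }
    }
}
</pre>
<p>The sketch uses the indexer (<code>ex.Data[key] = value</code>) rather than <code>Add</code>, because <code>Add</code> throws if the key is already present, which can happen when two scopes pick the same key.</p>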
<p>At a higher level in your <em>Global.asax</em> file where you catch all unhandled exceptions, you can add even more to the <code>.Data</code> collection and perhaps include all the interesting parameters on <code>HttpContext</code> like <code>RawUrl</code>, cookies, &#8230;</p>
<pre class="brush: csharp; title: ; notranslate">

ex.Data.Add(&quot;RawUrl&quot;, request.RawUrl);
try
{
   foreach (string cookieName in request.Cookies)
   {
      try
      {
         HttpCookie cookie = request.Cookies[cookieName];
         string key = &quot;Cookie &quot; + cookie.Path + &quot; &quot; + cookieName;
         if (!ex.Data.Contains(key))
         {
             ex.Data.Add(key, cookie.Value.ToString());
         }
      }
      catch
      {
         // deliberately nothing in here, should
         // never happen, just being cautious
      }
   }
   // An extension method I use to spot bots - write your own ...
   if (request.IsABot())
   {
      ex.Data.Add(&quot;BOT&quot;, &quot;************* BOT *****************&quot;);
   }
   ex.Data.Add(&quot;UserAgent&quot;, request.UserAgent);
   ex.Data.Add(&quot;Referrer&quot;, request.UrlReferrer);
   ex.Data.Add(&quot;User Host&quot;, request.UserHostName);
}
catch
{
   // deliberately nothing in here, should
   // never happen, just being cautious
   // but we definitely don't want to cause
   // an exception while handling one!
}
</pre>
<h2>Exception Reporting Code</h2>
<p>Now in your exception reporting code, you can write out the exception  message and stack trace followed by a dump of all the key value pairs  in <code>.Data</code>. I tend to use log4net on each server writing to a  rolling log file and SQL server to capture the exception data  centrally. For SQL, you&#8217;ll probably want one table for the Exception  itself and another table with a row for each key/value pair in <code>.Data</code>.</p>
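<p>A dump of that kind might look like the sketch below.  The <code>ExceptionReport.Format</code> helper is a made-up name, and writing the result to log4net or SQL Server is left out.</p>
<pre class="brush: csharp; title: ; notranslate">
using System;
using System.Collections;
using System.Text;

// Hypothetical helper for illustration
public static class ExceptionReport
{
    // Builds a plain-text report: type, message, stack trace, then every .Data entry,
    // repeated for each inner exception
    public static string Format(Exception ex)
    {
        var report = new StringBuilder();
        for (Exception current = ex; current != null; current = current.InnerException)
        {
            report.AppendLine(current.GetType().FullName + &quot;: &quot; + current.Message);
            report.AppendLine(current.StackTrace);
            foreach (DictionaryEntry entry in current.Data)
            {
                report.AppendLine(&quot;  &quot; + entry.Key + &quot; = &quot; + entry.Value);
            }
        }
        return report.ToString();
    }
}

public static class Program
{
    public static void Main()
    {
        try
        {
            throw new InvalidOperationException(&quot;Something failed&quot;);
        }
        catch (Exception ex)
        {
            ex.Data[&quot;RawUrl&quot;] = &quot;/home/start&quot;;
            Console.WriteLine(ExceptionReport.Format(ex));
        }
    }
}
</pre>
<p>For the two-table SQL layout, the same loop over <code>.Data</code> would produce one row per key/value pair instead of one line of text.</p>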
<h2>Comments</h2>
<p>One cause of exceptions on web servers is bots and client-side &#8216;web accelerators&#8217;. Both can hit pages with incorrect or outdated parameters that you simply didn&#8217;t expect to receive. That&#8217;s why I add a BOT warning to exceptions raised by bot requests: the exception itself may seem severe, but in reality it&#8217;s benign and no user has ever seen it. I even found one antivirus product that takes each request you make and sends the URL to Japan, where another server makes a second request back to check the page for viruses! It even pretends not to be a bot in the UserAgent and, of course, all your &#8216;security-through-obscurity&#8217; URLs are now sitting on a server in Japan. But you know security through obscurity is no security at all, right?</p>
<p>Another browser add-on called FunWebProducts would routinely corrupt ViewState information, so if you see that in your exceptions log, you know who to blame.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.abodit.com/2010/03/using-exception-data-to-add-additional-information-to-an-exception/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>A strongly-typed natural language engine (C# NLP)</title>
		<link>http://blog.abodit.com/2010/02/a-strongly-typed-natural-language-engine-c-nlp/</link>
		<comments>http://blog.abodit.com/2010/02/a-strongly-typed-natural-language-engine-c-nlp/#comments</comments>
		<pubDate>Mon, 01 Mar 2010 07:05:53 +0000</pubDate>
		<dc:creator>Ian Mercer</dc:creator>
				<category><![CDATA[.NET]]></category>
		<category><![CDATA[Home Automation]]></category>
		<category><![CDATA[Natural Language Processing]]></category>
		<category><![CDATA[NLP]]></category>
		<category><![CDATA[C#]]></category>

		<guid isPermaLink="false">http://blog.abodit.com/?p=491</guid>
		<description><![CDATA[Here is an explanation of the natural language engine that powers my home automation system. It&#8217;s a strongly-typed natural language engine with tokens and sentences being defined in code. It currently understands sentences to control lights, heating, music, sprinklers, &#8230; You can ask it who called, you can tell it to play music in a <a href="http://blog.abodit.com/2010/02/a-strongly-typed-natural-language-engine-c-nlp/" class="more-link">More &#62;</a>]]></description>
			<content:encoded><![CDATA[<p>Here is an explanation of the natural language engine that powers my home automation system.  It&#8217;s a strongly-typed natural language engine with tokens and sentences being defined in code.  It currently understands sentences to control lights, heating, music, sprinklers, &#8230; You can ask it who called, you can tell it to play music in a particular room, &#8230;  it tells you when a car comes down the drive, when the traffic is bad on I-90, when there&#8217;s fresh snow in the mountains, when it finds new podcasts from NPR, &#8230; and much more.</p>
<p>The natural language engine itself is a separate component that I hope one day to use in other applications.</p>
<h2>Existing Natural Language Engines</h2>
<ul>
<li>Have a large, STATIC dictionary data file</li>
<li>Can parse complex sentence structure</li>
<li>Hand back a tree of tokens (strings)</li>
<li>Don’t handle conversations</li>
</ul>
<h2>C# NLP Engine</h2>
<ul>
<li>Defines strongly-typed tokens in code</li>
<li>Uses type inheritance to model ‘is a’</li>
<li>Defines sentences in code</li>
<li>Rules engine executes sentences</li>
<li>Understands context (conversation history)</li>
</ul>
<h2>Sample conversation</h2>
<p><a href="http://blog.abodit.com/wp-content/uploads/2010/02/SampleConversation.png"><img class="alignnone size-full wp-image-497" title="Sample Conversation with Natural Language Engine (NLP)" src="http://blog.abodit.com/wp-content/uploads/2010/02/SampleConversation.png" alt="" width="1008" height="635" /></a></p>
<h2>Goals</h2>
<ul>
<li>Make it easy to define tokens and sentences (not XML)</li>
<li>Safe, compile-time checked definition of the syntax and grammar (not XML)</li>
<li>Model real-world inheritance with C# class inheritance: ‘a labrador’ is ‘a dog’ is ‘an animal’ is ‘a thing’</li>
<li>Handle ambiguity, e.g.</li>
</ul>
<pre>play something <span style="text-decoration: underline;">in</span> the air tonight <span style="text-decoration: underline;">in</span> the kitchen
remind me <span style="text-decoration: underline;">at 4pm</span> to call john <span style="text-decoration: underline;">at 5pm</span></pre>
<h2>C# NLP Engine Structure</h2>
<p><a href="http://blog.abodit.com/wp-content/uploads/2010/02/NLPStructure.png"><img class="alignnone size-full wp-image-500" title="Natural Language Engine Structure" src="http://blog.abodit.com/wp-content/uploads/2010/02/NLPStructure.png" alt="" width="1008" height="630" /></a></p>
<h2>Tokens &#8211; Token Definition</h2>
<ul>
<li> A hierarchy of classes derived from the base <strong>Token</strong> class</li>
<li> Uses inheritance, e.g. <strong>TokenOn</strong> is a <strong>TokenOnOff</strong> is a <strong>TokenState</strong> is a <strong>Token</strong>.  This allows a single sentence rule to handle multiple cases, e.g. On and Off</li>
<li> Simple tokens are a set of words, e.g. « is | are »</li>
<li> Complex tokens have a parser, e.g. TokenDouble</li>
</ul>
<h2>A Simple Token Definition</h2>
<pre class="brush: csharp; title: ; notranslate">
public class TokenPersonalPronoun : TokenGenericNoun
{
   internal static string wordz { get { return &quot;he,him,she,her,them&quot;; } }
}
</pre>
<ul>
<li> Recognizes any of the words specified</li>
<li> Can use inheritance (as in this example)</li>
</ul>
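<p>The post does not show how the engine picks up the word list, so the following is only a guess at one way it could work: read the static <code>wordz</code> property through reflection and split it on commas. <code>SimpleTokenReader</code> and <code>WordsFor</code> are names invented for this sketch, and the empty <code>Token</code> and <code>TokenGenericNoun</code> classes are stand-ins for the real ones.</p>
<pre class="brush: csharp; title: ; notranslate">
using System;
using System.Collections.Generic;
using System.Reflection;

// Stand-ins for the engine's base classes
public class Token { }
public class TokenGenericNoun : Token { }

public class TokenPersonalPronoun : TokenGenericNoun
{
   internal static string wordz { get { return &quot;he,him,she,her,them&quot;; } }
}

// Hypothetical reader, not the engine's actual code
public static class SimpleTokenReader
{
   // Returns the words declared directly on a token type,
   // or an empty set if the type declares no word list
   public static HashSet&lt;string&gt; WordsFor(Type tokenType)
   {
      var words = new HashSet&lt;string&gt;(StringComparer.OrdinalIgnoreCase);
      PropertyInfo prop = tokenType.GetProperty(&quot;wordz&quot;,
         BindingFlags.Static | BindingFlags.Public |
         BindingFlags.NonPublic | BindingFlags.DeclaredOnly);
      if (prop == null) return words;

      // Static property, so there is no instance to pass
      string list = (string)prop.GetValue(null, null);
      foreach (string word in list.Split(','))
      {
         words.Add(word.Trim());
      }
      return words;
   }
}
</pre>
<p>With the set in hand, recognizing a simple token is a case-insensitive lookup, e.g. <code>SimpleTokenReader.WordsFor(typeof(TokenPersonalPronoun)).Contains(&quot;She&quot;)</code>.</p>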
<h2>A Complex Token</h2>
<pre class="brush: csharp; title: ; notranslate">
public abstract class TokenNumber : Token
{
  public static IEnumerable&lt;TokenResult&gt; Initialize(string input)
  {
  …
</pre>
<ul>
<li>Initialize method parses input and returns one or more possible parses.</li>
</ul>
<p><strong>TokenNumber</strong> is a good example:</p>
<ul>
<li>Parses any numeric value and returns one or more of TokenInt, TokenLong, TokenIntOrdinal, TokenDouble, or TokenPercentage results.</li>
</ul>
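<p>To make the idea of &#8216;one or more possible parses&#8217; concrete, here is a minimal sketch. It is not the engine&#8217;s <code>Initialize</code> method: <code>NumberParser</code> and the simplified <code>TokenResult</code> below are assumptions made for the example, and the result kinds are carried as plain strings rather than real token classes.</p>
<pre class="brush: csharp; title: ; notranslate">
using System;
using System.Collections.Generic;
using System.Globalization;

// Simplified stand-in; the engine's TokenResult is not shown in this post
public class TokenResult
{
   public string Kind;
   public object Value;
   public TokenResult(string kind, object value) { Kind = kind; Value = value; }
}

// Hypothetical parser for illustration only
public static class NumberParser
{
   // Yields every numeric interpretation of the input that parses cleanly
   public static IEnumerable&lt;TokenResult&gt; Parse(string input)
   {
      string text = input.Trim();
      CultureInfo inv = CultureInfo.InvariantCulture;

      // A trailing percent sign means percentage or nothing
      if (text.EndsWith(&quot;%&quot;, StringComparison.Ordinal))
      {
         double pct;
         if (double.TryParse(text.TrimEnd('%'), NumberStyles.Float, inv, out pct))
            yield return new TokenResult(&quot;TokenPercentage&quot;, pct);
         yield break;
      }

      int i;
      if (int.TryParse(text, NumberStyles.Integer, inv, out i))
         yield return new TokenResult(&quot;TokenInt&quot;, i);

      long l;
      if (long.TryParse(text, NumberStyles.Integer, inv, out l))
         yield return new TokenResult(&quot;TokenLong&quot;, l);

      double d;
      if (double.TryParse(text, NumberStyles.Float, inv, out d))
         yield return new TokenResult(&quot;TokenDouble&quot;, d);
   }
}
</pre>
<p>Returning all candidates lets the sentence rules decide which one fits: a rule that wants a double and a rule that wants an int can both match the same word.</p>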
<h2>The catch-all <strong>TokenPhrase</strong></h2>
<pre class="brush: csharp; title: ; notranslate">public class TokenPhrase : Token</pre>
<p>TokenPhrase matches anything, especially anything in quote marks.</p>
<pre>   e.g. add a reminder "call Bruno at 4pm"</pre>
<p>The sentence signature to recognize this could be</p>
<pre>(…, TokenAdd, TokenReminder, TokenPhrase, TokenExactTime)</pre>
<p>The same rule would also match an unquoted phrase …</p>
<pre>add a reminder <em>discuss 6pm conference call with Bruno</em> at 4pm</pre>
<h2>TemporalTokens</h2>
<p>A complete set of tokens and related classes for representing time</p>
<ul>
<li>Point in time, e.g. today at 5pm</li>
<li>Approximate time, e.g. who called at 5pm today</li>
<li>Finite sequence, e.g. every Thursday in May 2009</li>
<li>Infinite sequence, e.g. every Thursday</li>
<li>Ambiguous time with context, e.g. remind me on Tuesday (context means it is next Tuesday)</li>
<li>Null time</li>
<li>Unknowable/incomprehensible time</li>
</ul>
<h2>TemporalTokens (Cont.)</h2>
<p>Code to merge any sequence of temporal tokens to the smallest canonical representation,</p>
<p>e.g.</p>
<p>the first thursday in may 2009</p>
<p>-&gt;<br />
{TIMETHEFIRST the first} + {THURSDAY thursday} + {MAY in may} + {INT 2009 -&gt; 2009}</p>
<p>-&gt;<br />
[TEMPORALSETFINITESINGLEINTERVAL [Thursday 5/7/2009] ]</p>
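<p>The merging code itself is not shown here, but the date arithmetic behind the last step of this example is easy to sketch. <code>TemporalSketch</code> and <code>NthWeekdayOfMonth</code> are names invented for the illustration; only <code>DateTime</code> and <code>DayOfWeek</code> come from the framework.</p>
<pre class="brush: csharp; title: ; notranslate">
using System;

// Hypothetical helper, not part of the engine
public static class TemporalSketch
{
   // Returns the n-th occurrence (1-based) of a weekday in a month
   public static DateTime NthWeekdayOfMonth(int year, int month, DayOfWeek day, int n)
   {
      var first = new DateTime(year, month, 1);

      // Days to move forward from the 1st to the first matching weekday (0..6)
      int offset = ((int)day - (int)first.DayOfWeek + 7) % 7;

      DateTime result = first.AddDays(offset + 7 * (n - 1));
      if (result.Month != month)
         throw new ArgumentOutOfRangeException(&quot;n&quot;, &quot;The month has no such occurrence.&quot;);
      return result;
   }
}
</pre>
<p>Calling <code>TemporalSketch.NthWeekdayOfMonth(2009, 5, DayOfWeek.Thursday, 1)</code> gives 7 May 2009, the same date as the canonical form above.</p>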
<h2>TemporalTokens (Cont.)</h2>
<p>Finite <strong>TemporalClasses</strong> provide</p>
<ul>
<li>A way to enumerate the DateTimeRanges they cover</li>
</ul>
<p>All <strong>TemporalClasses</strong> provide</p>
<ul>
<li>A LINQ expression generator and Entity-SQL expression generator allowing them to be used to query a database</li>
</ul>
<h2>Existing Token Types</h2>
<ul>
<li>Numbers (double, long, int, percentage, phone, temperature)</li>
<li>File names, Directories</li>
<li>URLs, Domain names</li>
<li>Names, Companies, Addresses</li>
<li>Rooms, Lights, Sensors, Sprinklers, …</li>
<li>States (On, Off, Dim, Bright, Loud, Quiet, …)</li>
<li>Units of Time, Weight, Distance</li>
<li>Songs, albums, artists, genres, tags</li>
<li>Temporal expressions</li>
<li>Commands, verbs, nouns, pronouns, …</li>
</ul>
<h2>Rules &#8211; A simple rule</h2>
<pre class="brush: csharp; title: ; notranslate">
   /// &lt;summary&gt;
   /// Set a light to a given state
   /// &lt;/summary&gt;
   private static void LightState(NLPState st, TokenLight tlight, TokenStateOnOff ts)
   {
      if (ts.IsTrueState == true) tlight.ForceOn(st.Actor);
      if (ts.IsTrueState == false) tlight.ForceOff(st.Actor);
      st.Say(&quot;I turned it &quot; + ts.LowerCased);
   }
</pre>
<p>Any method matching this signature is a sentence rule: an <code>NLPState</code> parameter followed by any number of <code>Token</code> parameters.</p>
<p>Rule matching respects inheritance and allows variable repeats, e.g. <code>(NLPState st, TokenThing tt, TokenState tokenState, TokenTimeConstraint[] constraints)</code></p>
<p>Rules are discovered on startup using Reflection and an efficient parse graph is built allowing rapid detection and rejection of incoming sentences.</p>
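<p>The discovery step can be sketched with plain reflection. This is not the engine&#8217;s code and it leaves out the parse graph entirely; it only shows how methods with the right shape might be found. <code>RuleFinder</code> and <code>SampleRules</code> are invented for the example, and the empty token classes are stand-ins.</p>
<pre class="brush: csharp; title: ; notranslate">
using System;
using System.Collections.Generic;
using System.Reflection;

// Stand-in types; only the names NLPState and Token come from the engine
public class NLPState { }
public class Token { }
public class TokenState : Token { }
public class TokenLight : Token { }

public static class SampleRules
{
   private static void LightState(NLPState st, TokenLight light, TokenState state) { }
   private static void ManyStates(NLPState st, TokenLight light, TokenState[] states) { }
   private static void NotARule(string text) { }
}

// Hypothetical finder for illustration only
public static class RuleFinder
{
   // A parameter qualifies if it is a Token type or an array of one
   private static bool IsTokenParameter(Type t)
   {
      if (t.IsArray) t = t.GetElementType();
      return typeof(Token).IsAssignableFrom(t);
   }

   // Collects static methods shaped like (NLPState, Token*)
   public static List&lt;MethodInfo&gt; FindRules(Type container)
   {
      var rules = new List&lt;MethodInfo&gt;();
      BindingFlags flags = BindingFlags.Static | BindingFlags.Public |
                           BindingFlags.NonPublic | BindingFlags.DeclaredOnly;
      foreach (MethodInfo method in container.GetMethods(flags))
      {
         ParameterInfo[] ps = method.GetParameters();
         if (ps.Length == 0 || ps[0].ParameterType != typeof(NLPState)) continue;

         bool allTokens = true;
         for (int i = 1; i &lt; ps.Length; i++)
         {
            if (!IsTokenParameter(ps[i].ParameterType)) { allTokens = false; break; }
         }
         if (allTokens) rules.Add(method);
      }
      return rules;
   }
}
</pre>
<p>The <code>IsAssignableFrom</code> check is what lets a rule declared against a base token type accept any derived token, which is the inheritance-aware matching described above.</p>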
<h2>State &#8211; NLPState</h2>
<ul>
<li>Every sentence method takes an NLPState first parameter</li>
<li>State includes RememberedObject(s) allowing sentences to react to anything that happened earlier in a conversation</li>
<li>Non-interactive uses can pass a dummy state</li>
<li>State can be per-user or per-conversation for non-realtime conversations like email</li>
</ul>
<h2>User Interface</h2>
<p>Works with a variety of user interfaces</p>
<ul>
<li>Chat (e.g. Jabber/Gtalk)</li>
<li>Web chat</li>
<li>Email</li>
<li>Calendar (do X at time Y)</li>
<li>Rich client application</li>
</ul>
<h2>Summary</h2>
<ul>
<li>Strongly-typed natural language engine</li>
<li>Compile time checking, inheritance, …</li>
<li>Define tokens and sentences (rules) in C#</li>
<li>Strongly-typed tokens: numbers, percentages, times, dates, file names, urls, people, business objects, …</li>
<li>Builds an efficient parse graph</li>
<li>Tracks conversation history</li>
</ul>
<h2>Future plans</h2>
<p><strong>Expanded corpus of knowledge</strong></p>
<ul>
<li>Company names, locations, documents, …</li>
</ul>
<p><strong>Generate iCal/Gdata Recurrence</strong></p>
<ul>
<li>From TimeExpressions</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://blog.abodit.com/2010/02/a-strongly-typed-natural-language-engine-c-nlp/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
	</channel>
</rss>

