Posts tagged C#

Updated Release of the Abodit State Machine

I published a new version of the Abodit State Machine to NuGet this evening. You can find it here.

One breaking change in this version is that the state machine is now specified using three type parameters instead of two:

public class OccupancyStateMachine : 
          StateMachine<OccupancyStateMachine, Event, BuildingArea>

The third type parameter, TContext, is a context object that can be passed in with every event occurrence or tick. This means that you don’t need to store any extraneous data in the state machine itself and can keep it as a pure representation of the state of the system.

In the example above I have an OccupancyStateMachine and the context is a BuildingArea. Each call to EventHappens now takes the event that happened and a BuildingArea object.
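To make the idea of a context parameter concrete, here is a minimal, self-contained sketch. It is not the Abodit API: TinyMachine, Room and the string-based states and events are hypothetical names used only for illustration. What it tries to show is that the machine holds nothing but its current state, while per-call data arrives through the context argument of EventHappens.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical context type, standing in for something like BuildingArea.
public sealed class Room
{
    public string Name { get; set; }
    public TimeSpan OccupancyTimeout { get; set; }
}

// Hypothetical miniature state machine: NOT the Abodit.StateMachine API.
// It holds only the current state; everything else comes in via TContext.
public sealed class TinyMachine<TContext>
{
    // Key is "state|event"; the handler receives the context and returns the next state.
    private readonly Dictionary<string, Func<TContext, string>> handlers =
        new Dictionary<string, Func<TContext, string>>();

    public string Current { get; private set; }

    public TinyMachine(string initialState)
    {
        Current = initialState;
    }

    public TinyMachine<TContext> When(string state, string evt, Func<TContext, string> handler)
    {
        handlers[state + "|" + evt] = handler;
        return this;
    }

    // The context is supplied with every event rather than stored in the machine.
    public void EventHappens(string evt, TContext context)
    {
        Func<TContext, string> handler;
        if (handlers.TryGetValue(Current + "|" + evt, out handler))
            Current = handler(context);
    }
}

public static class Program
{
    public static void Main()
    {
        var machine = new TinyMachine<Room>("Not occupied")
            .When("Not occupied", "User activity", room =>
            {
                Console.WriteLine("Timeout for " + room.Name + " is " + room.OccupancyTimeout);
                return "Occupied";
            })
            .When("Occupied", "Timeout", room => "Not occupied");

        var kitchen = new Room { Name = "Kitchen", OccupancyTimeout = TimeSpan.FromMinutes(10) };
        machine.EventHappens("User activity", kitchen);
        Console.WriteLine(machine.Current);   // Occupied
    }
}
```

Because the Room is passed in on each call, one machine definition could in principle serve any number of rooms without holding a reference to any of them.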

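When you define your state machine you will need to include four parameters in each lambda expression. In the listing below they are named m (the state machine instance), s (the state), e (the event) and c (the context object).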

Here, for example, is the current state machine for a BuildingArea in my home automation. It uses a hierarchy of states with two base states: Not Occupied and Occupied. It has timers for activity within a room or for occupancy within rooms that are contained by a floor. Note how it also exposes an IObservable<State> so that other objects can subscribe to state machine changes. I didn’t want to take the Rx dependency in the state machine class itself but you can see how easy it is to hook it up.

Also of interest is the way I represent occupancy as three distinct states. The extra one, ‘Asleep’, represents a room that is not occupied in the sense that there is no motion there now, but there was at some point during the evening before.

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using Abodit.StateMachine;
using log4net;
using Abodit.Units;
using AboditUnits.Units;
using System.Reactive.Subjects;
using System.Reactive.Linq;

namespace Abodit
{
    /// <summary>
    /// An Occupancy State machine handles not occupied, occupied, asleep
    /// </summary>
    [Serializable]
    public class OccupancyStateMachine : StateMachine<OccupancyStateMachine, Event, BuildingArea>
    {
        private readonly Subject<State> watch = new Subject<State>();
        public IObservable<State> Watch { get { return watch.AsObservable(); } }

        public override void OnStateChanging(StateMachine<OccupancyStateMachine, Event, BuildingArea>.State newState, BuildingArea context)
        {
            watch.OnNext(newState);
        }

        public static readonly State Starting = AddState("Starting");

        public static readonly State NotOccupied = AddState("Not occupied",
                (m, e, s, c) => { 
                                m.CancelScheduledEvent(eTick);          // Stop the clock
                                m.IsTimerRunning = false;
                                m.IsRecentlyOccupied = false;
                                m.IsHeavilyOccupied = false;
                                m.After(new TimeSpan(hours:0, minutes:5, seconds:0), e5MinutesSinceOccupied);
                                m.After(new TimeSpan(hours:24, minutes:0, seconds:0), e24hoursSinceOccupied);
                                m.After(new TimeSpan(hours:48, minutes:0, seconds:0), e48hoursSinceOccupied);
                             },
                (m, e, s, c) => { });

        public static readonly State NotOccupiedIn5Minutes = AddState("Not occupied in over 5 minutes",
                (m, e, s, c) => { },
                (m, e, s, c) => { }, NotOccupied);

        public static readonly State NotOccupiedInOver24Hours = AddState("Not occupied in over 24 hours",
                (m, e, s, c) => { },
                (m, e, s, c) => { }, NotOccupiedIn5Minutes);

        public static readonly State NotOccupiedInOver48Hours = AddState("Not occupied in over 48 hours",
                (m, e, s, c) => { },
                (m, e, s, c) => { }, NotOccupiedInOver24Hours);

        public static readonly State NotOccupiedInOver1Week = AddState("Not occupied in over 1 week",
                (m, e, s, c) => { },
                (m, e, s, c) => { }, NotOccupiedInOver48Hours);

        public static readonly State Asleep = AddState("Asleep",
                (m, e, s, c) =>
                {
                    // Set a timer going for morning
                    var now = TimeProvider.Current.Now.LocalDateTime;
                    var morning = now.Hour < 8 ? now.AddHours(-now.Hour + 8) : now.AddHours(24 - now.Hour + 8);
                    m.At(morning.ToUniversalTime(), eMorning);
                },
                (m, e, s, c) => { },
                parent:NotOccupied);

        public static readonly State Occupied = AddState("Occupied",
                (m, e, s, c) =>
                {
                    m.IsRecentlyOccupied = true;
                    // Add a timer that runs while we are occupied
                    m.Every(new TimeSpan(hours:0, minutes:0, seconds:10), eTick);
                    // And set a timer going to mark 5 minutes since occupied
                    m.After(new TimeSpan(hours:0, minutes:5, seconds:0), e5MinutesAfterBecomingOccupied);
                    m.CancelScheduledEvent(e5MinutesSinceOccupied);
                    m.CancelScheduledEvent(e24hoursSinceOccupied);
                    m.CancelScheduledEvent(e48hoursSinceOccupied);
                },
                (m, e, s, c) => { });

        public static readonly State HeavilyOccupied = AddState("Heavily occupied",
                (m, e, s, c) => { },
                (m, e, s, c) => { },
                parent:Occupied);

        private static readonly Event eStart = new Event("Starts");
        private static readonly Event eUserActivity = new Event("User activity");
        private static readonly Event eTick = new Event("Tick");
        private static readonly Event eTimeout = new Event("Timeout");
        private static readonly Event eMorning = new Event("Morning");
        private static readonly Event e5MinutesAfterBecomingOccupied = new Event("5 minutes after becoming occupied");
        private static readonly Event e5MinutesSinceOccupied = new Event("5 minutes since occupied");
        private static readonly Event e24hoursSinceOccupied = new Event("24 hours since occupied");
        private static readonly Event e48hoursSinceOccupied = new Event("48 hours since occupied");

        private static readonly Event eAllChildrenNotOccupied = new Event("No child occupied");
        private static readonly Event eAtLeastOneChildOccupied = new Event("At least one child occupied");

        private double decliningActivity = 0.0;         // Up 1000 on every user activity, multiplied by rateOfDecline on every eTick (every 10 seconds)
        private const int ActivityPerUserInput = 1000;
        private const double rateOfDecline = 0.92;

        public bool IsTimerRunning { get; set; }
        public bool IsRecentlyOccupied { get; set; }
        public bool IsHeavilyOccupied { get; set; }

        static OccupancyStateMachine()
        {
            // On startup we transition immediately to starting
            // but we want an event call to do this so we aren't doing any work
            // in the constructor, and so the initialization only happens when it's
            // a true 'cold start' not a 'warm start' from some database state
            Starting
                .When(eStart, (m, s, e, c) => { return NotOccupied; });

            // Note: This is a hierarchical state machine so NotOccupied includes Asleep
            NotOccupied
                .When(eAtLeastOneChildOccupied, (m, s, e, c) => 
                {
                    return Occupied;
                })
                .When(e5MinutesSinceOccupied, (m, s, e, c) =>
                {
                    // Could signal something??
                    return s;
                })
                .When(e24hoursSinceOccupied, (m, s, e, c) =>
                {
                    // Could signal something??
                    return s;
                })
                .When(e48hoursSinceOccupied, (m, s, e, c) =>
                {
                    // Could signal something??
                    return s;
                })
                .When(eUserActivity, (m, s, e, c) =>
                {
                    m.After(c.OccupancyTimeout, eTimeout);                // start a new timeout
                    m.IsTimerRunning = true;
                    return Occupied;
                });

            // Asleep is a substate of not occupied so no need for more logic on becoming occupied ...
            Asleep
                .When(eMorning, (m, s, e, c) =>
                {
                    // Eliminate Asleep if appropriate
                    return NotOccupied;
                });

            // Occupied includes recently occupied and heavily occupied ...
            Occupied
                .When(e5MinutesAfterBecomingOccupied, (m, s, e, c) => 
                {
                    m.IsRecentlyOccupied = false;
                    return s;
                })
                .When(eUserActivity, (m, s, e, c) =>
                {
                    // Accumulate activity ...
                    m.decliningActivity += ActivityPerUserInput;

                    m.CancelScheduledEvent(eTimeout);               // cancel the old timeout

                    m.After(c.OccupancyTimeout, eTimeout);                // start a new timeout
                    m.IsTimerRunning = true;

                    if (m.decliningActivity > 20 * ActivityPerUserInput)
                        return HeavilyOccupied;
                    else
                        return s;
                })
                .When(eAllChildrenNotOccupied, (m, s, e, c) =>
                    {
                        if (m.IsTimerRunning)
                        {
                            // If the timer is running ... wait until it runs out
                            return s;
                        }
                        else
                        {
                            DateTime nowLocal = TimeProvider.Current.Now.LocalDateTime;
                            if (nowLocal.Hour > 17)
                                return Asleep;
                            else
                                return NotOccupied;
                        }
                    })
                .When(eTick, (m, s, e, c) =>
                    {
                        m.decliningActivity *= rateOfDecline;
                        return s;
                    })
                .When(eTimeout, (m, s, e, c) =>
                    {
                        DateTime nowLocal = TimeProvider.Current.Now.LocalDateTime;
                        if (nowLocal.Hour > 17)
                            return Asleep;
                        else
                            return NotOccupied;
                    });

            HeavilyOccupied.When(eTick, (m, s, e, c) =>
            {
                // Same code as Occupied but this one will override if we are in HeavilyOccupied mode
                m.decliningActivity *= rateOfDecline;
                // Fall back to just occupied when ...
                if (m.decliningActivity < 0.2 * ActivityPerUserInput)
                    return Occupied;
                else
                    return s;

            });


        }

        public OccupancyStateMachine()
            : base(Starting)
        {
        }

        public OccupancyStateMachine(State initialState)
            : base(initialState)
        {
        }

        public override void Start()
        {
            this.EventHappens(eStart, null);
        }

        public void UserActivity(BuildingArea ba)
        {
            this.EventHappens(eUserActivity, ba);
        }

        public void AllChildrenNotOccupied(BuildingArea ba)
        {
            this.EventHappens(eAllChildrenNotOccupied, ba);
        }

        public void AtLeastOneChildOccupied(BuildingArea ba)
        {
            this.EventHappens(eAtLeastOneChildOccupied, ba);
        }
    }
}
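The interplay between the activity counter and the two thresholds in the listing is easier to see with numbers. The sketch below copies the constants from the code above (1000 per user activity, a factor of 0.92 per 10 second tick, fall back to Occupied below 0.2 x 1000) and counts how many quiet ticks it takes a room that has just become heavily occupied to drop back. ActivityDecay and TicksUntilBelow are hypothetical names, and the starting value of 21 inputs assumes no tick happened in between them.

```csharp
using System;

public static class ActivityDecay
{
    // Constants copied from the listing above.
    public const int ActivityPerUserInput = 1000;
    public const double RateOfDecline = 0.92;
    public static readonly TimeSpan TickInterval = TimeSpan.FromSeconds(10);

    // Hypothetical helper: counts ticks until 'activity' falls below 'threshold',
    // assuming no further user activity arrives in the meantime.
    public static int TicksUntilBelow(double activity, double threshold)
    {
        int ticks = 0;
        while (activity >= threshold)
        {
            activity *= RateOfDecline;
            ticks++;
        }
        return ticks;
    }

    public static void Main()
    {
        // Assumption: 21 inputs with no tick in between, i.e. just over the 20x threshold.
        double justHeavy = 21 * ActivityPerUserInput;
        // HeavilyOccupied falls back to Occupied below this value.
        double fallBack = 0.2 * ActivityPerUserInput;

        int ticks = TicksUntilBelow(justHeavy, fallBack);
        Console.WriteLine(ticks + " ticks, about " +
            TimeSpan.FromTicks(TickInterval.Ticks * ticks) + " of quiet");
        // prints "56 ticks, about 00:09:20 of quiet"
    }
}
```

So with these constants a burst of activity keeps a room in the heavily occupied state for a little over nine minutes after the last input, provided the occupancy timeout has not fired first.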

Dynamic persistence with MongoDB – look, no classes! Multiple inheritance in C#!

In an earlier post I explained a technique to create a class-free persistence layer using MongoDB. [Read that post first, then come back here.]

Since then I’ve refined the techniques involved and created a cleaner implementation that does away with the `.props` collection on each object. Now when you add an interface to an object you get exactly what you expected in the persisted data.

To use it you first need to register the serialization code somewhere in your startup code…

            BsonSerializer.RegisterSerializationProvider(new MongoDynamicSerializationProvider());

The Serialization provider is quite simple:

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using MongoDB.Bson.Serialization;

namespace MongoData.Dynamic
{
    public class MongoDynamicSerializationProvider : IBsonSerializationProvider
    {

        public IBsonSerializer GetSerializer(Type type)
        {
            if (typeof(MongoDynamic).IsAssignableFrom(type))
                return MongoDynamicBsonSerializer.Instance;
            return null;
        }
    }
}

The serializer is a bit more involved. It uses an interface map to decide what type to return for each serialized object. This is critical because many different .NET types can map onto the same BSON serialized value and only by maintaining this map can we get back to the original type. It’s also critical for handling nested object graphs containing different types.
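As a rough illustration of what such a map looks like, the following sketch builds a property-name to type dictionary from a set of interfaces using plain reflection. INamed, IUser and BuildTypeMap are hypothetical names invented for this example. One detail worth seeing in isolation: GetProperties() on an interface does not include properties of the interfaces it inherits from, which is presumably why the MongoDynamic class further down records inherited interfaces as well.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical interfaces used only for this illustration.
public interface INamed { string Name { get; set; } }
public interface IUser : INamed { DateTime? LastLogin { get; set; } }

public static class TypeMapDemo
{
    // Build a property-name -> declared-type map from a set of interfaces.
    // GetProperties() on an interface does not return members of the interfaces
    // it inherits from, so those are added explicitly via GetInterfaces().
    public static Dictionary<string, Type> BuildTypeMap(params Type[] interfaces)
    {
        var all = interfaces
            .Concat(interfaces.SelectMany(i => i.GetInterfaces()))
            .Distinct();

        var map = new Dictionary<string, Type>();
        foreach (var property in all.SelectMany(i => i.GetProperties()))
            map[property.Name] = property.PropertyType;
        return map;
    }

    public static void Main()
    {
        // Lists both LastLogin (declared on IUser) and Name (inherited from INamed).
        foreach (var pair in BuildTypeMap(typeof(IUser)))
            Console.WriteLine(pair.Key + " -> " + pair.Value);
    }
}
```

With a map like this, a deserializer that meets an element called LastLogin knows to ask for a DateTime? rather than guessing from the BSON value alone.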

using System;
using System.Collections.Concurrent;
using System.Dynamic;
using System.Linq;
using System.Linq.Expressions;
using System.Runtime.CompilerServices;
using Microsoft.CSharp.RuntimeBinder;
using MongoDB.Bson.IO;
using MongoDB.Bson.Serialization;
using MongoDB.Bson.Serialization.Serializers;
using MongoDB.Bson;
using MongoDB.Bson.Serialization.IdGenerators;
using System.Collections.Generic;
using ImpromptuInterface;

namespace MongoData.Dynamic
{
    public class MongoDynamicBsonSerializer : BsonBaseSerializer
    {
        private static MongoDynamicBsonSerializer instance = new MongoDynamicBsonSerializer();

        public static MongoDynamicBsonSerializer Instance
        {
            get { return instance; }
        }

        public override object Deserialize(BsonReader bsonReader, Type nominalType, IBsonSerializationOptions options)
        {
            var bsonType = bsonReader.CurrentBsonType;
            if (bsonType == BsonType.Null)
            {
                bsonReader.ReadNull();
                return null;
            }
            else if (bsonType == BsonType.Document)
            {
                var os = new ObjectSerializer();
                MongoDynamic md = new MongoDynamic();
                bsonReader.ReadStartDocument();

                Dictionary<string, Type> typeMap = null;

                // scan document first to find interfaces
                {
                    var bookMark = bsonReader.GetBookmark();
                    if (bsonReader.FindElement(MongoDynamic.InterfacesField))
                    {
                        md[MongoDynamic.InterfacesField] = BsonValue.ReadFrom(bsonReader).AsBsonArray.Select(x => x.AsString);
                        typeMap = md.GetTypeMap();
                    }
                    else
                    {
                        throw new FormatException("No interfaces defined for this dynamic object - can't deserialize it");
                    }
                    bsonReader.ReturnToBookmark(bookMark);
                }

                while (bsonReader.ReadBsonType() != BsonType.EndOfDocument)
                {
                    var name = bsonReader.ReadName();

                    if (name == "_id")
                    {
                        md[name] = BsonValue.ReadFrom(bsonReader).AsObjectId;
                    }
                    else if (name == MongoDynamic.InterfacesField)
                    {
                        // Read it and ignore it, we already have it
                        BsonValue.ReadFrom(bsonReader);
                    }
                    else
                    {
                        if (typeMap == null) throw new FormatException("No interfaces defined for this dynamic object - can't deserialize");
                        // lookup the type for this element according to the interfaces
                        Type elementType;
                        if (typeMap.TryGetValue(name, out elementType))
                        {
                            var value = BsonSerializer.Deserialize(bsonReader, elementType);
                            md[name] = value;
                        }
                        else
                        {
                            // This is a value that is no longer in the interface, maybe a column you removed
                            // not really much we can do with it ... but we need to read it and move on
                            var value = BsonSerializer.Deserialize(bsonReader, typeof(object));
                            md[name] = value;

                            // As with all databases, removing elements from the schema is always going to cause problems ... 
                        }
                    }
                }
                bsonReader.ReadEndDocument();
                return md;
            }
            else
            {
                var message = string.Format("Can't deserialize a {0} from BsonType {1}.", nominalType.FullName, bsonType);
                throw new FormatException(message);
            }
        } 
    

        public override bool GetDocumentId(object document, out object id, out Type idNominalType, out IIdGenerator idGenerator)
        {
            MongoDynamic x = (MongoDynamic)document;
            id = x._id;
            idNominalType = typeof(ObjectId);
            idGenerator = new ObjectIdGenerator();
            return true;
        }

        public override void SetDocumentId(object document, object id)
        {
            MongoDynamic x = (MongoDynamic)document;
            x._id = (ObjectId)id;
        }

        public override void Serialize(BsonWriter bsonWriter, Type nominalType, object value, IBsonSerializationOptions options)
        {
            if (value == null)
            {
                bsonWriter.WriteNull();
                return;
            }
            var metaObject = ((IDynamicMetaObjectProvider)value).GetMetaObject(Expression.Constant(value));
            var memberNames = metaObject.GetDynamicMemberNames().ToList();
            if (memberNames.Count == 0)
            {
                bsonWriter.WriteNull();
                return;
            }

            bsonWriter.WriteStartDocument();
            foreach (var memberName in memberNames)
            {
                // ToDo: handle all those _id Id id variants?
                bsonWriter.WriteName(memberName);

                object memberValue;
                if (memberName == "_id") memberValue = ((MongoDynamic)value)._id;
                else if (memberName == "int") memberValue = ((MongoDynamic)value).@int;
                else memberValue = Impromptu.InvokeGet(value, memberName);

                if (memberValue == null)
                    bsonWriter.WriteNull();
                else
                {
                    var memberType = memberValue.GetType();
                    var serializer = BsonSerializer.LookupSerializer(memberType);
                    serializer.Serialize(bsonWriter, memberType, memberValue, null);
                }
            }
            bsonWriter.WriteEndDocument();
        }
    }
}

And finally, the actual MongoDynamic class:

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Dynamic;
using MongoDB.Bson;
using MongoDB.Bson.Serialization.Attributes;
using ImpromptuInterface;

namespace MongoData.Dynamic
{
    /// <summary>
    /// All MongoDynamic objects support this interface because every object needs an _id in MongoDB
    /// </summary>
    public interface IId
    {
        ObjectId _id { get; set; }
    }

    /// <summary>
    /// MongoDynamic is like an ExpandoObject that also understands document Ids and uses ImpromptuInterface
    /// to act like any other collection of interfaces ...
    /// It can be serialized and deserialized from BSon and thus stored in a MongoDB database.
    /// </summary>
    /// <remarks>
    /// This simple class gives you the ability to define database objects using only .NET interfaces - no classes!
    /// Those objects can be dynamically extended to support any interface you want to add to them - polymorphism!
    /// When loaded back from the database the object will support all of the interfaces that were ever applied to it.
    /// Adding a new field is easy.  Removing one works too.
    /// All fields must be nullable since they may not be present on earlier instances of an object type.
    /// </remarks>
    public class MongoDynamic : DynamicObject, IId
    {
        [BsonId(Order=1)]
        public ObjectId _id { get; set; }

        // Dumb name for a property - which is why I chose it - very unlikely it will ever conflict with a real property name
        public const string InterfacesField = "int";

        /// <summary>
        /// Interfaces that have been added to this object
        /// </summary>
        /// <remarks>
        /// We always begin by supporting the _id interface
        /// Order is important, we need to see this field before we can deserialize any others
        /// </remarks>
        [BsonElement(InterfacesField, Order=2)]
        internal HashSet<string> @int = new HashSet<string>(){ typeof(IId).FullName };

        /// <summary>
        /// A text version of all interfaces - mostly for debugging purposes, stored in alphabetical order
        /// </summary>
        [BsonIgnore]
        public string InterfacesAsText
        {
            get { return string.Join(",", this.@int.OrderBy(i => i)); }
        }

        /// <summary>
        /// Add support for an interface to this document if it doesn't already have it
        /// </summary>
        public T AddLike<T>()
            where T : class
        {
            @int.Add(typeof(T).FullName);
            // And also act like any interfaces that interface implements (which will include ones they represent too)
            foreach (var @interface in typeof(T).GetInterfaces())
                @int.Add(@interface.FullName);
            return Impromptu.ActLike<T>(this, this.GetAllInterfaces());
        }

        /// <summary>
        /// Add support for multiple interfaces
        /// </summary>
        public T AddLike<T>(Type[] otherInterfaces)
            where T : class
        {
            var allInterfaces = otherInterfaces.Concat(new[] { typeof(T) });
            var allInterfacesAndDescendants = allInterfaces.Concat(allInterfaces.SelectMany(x => x.GetInterfaces()));
            foreach (var @interface in allInterfacesAndDescendants)
                @int.Add(@interface.FullName);
            return Impromptu.ActLike<T>(this, this.GetAllInterfaces());
        }

        /// <summary>
        /// Cast this object to an interface only if it has previously been created as one of that kind
        /// </summary>
        public T AsLike<T>()
            where T : class
        {
            if (!this.@int.Contains(typeof(T).FullName)) return null;
            else return Impromptu.ActLike<T>(this, this.GetAllInterfaces());
        }

        // A rather large cache of all interface types loaded into the App Domain
        private static List<Type> cacheOfTypes = null;

        // A cache of the interface types corresponding to a given 'key' of interface names
        private static Dictionary<string, Type[]> cacheOfInterfaces = new Dictionary<string, Type[]>();

        public Type[] GetAllInterfaces()
        {
            // We always behave like an object with an Id plus any other interfaces we have
            var key = string.Join(",", this.@int.OrderBy(i => i));
            if (!cacheOfInterfaces.ContainsKey(key))
            {
                if (cacheOfTypes == null)
                {
                    var assemblies = AppDomain.CurrentDomain.GetAssemblies();
                    cacheOfTypes = assemblies.SelectMany(ass => ass.GetTypes()).Where(t => t.IsInterface).ToList();
                }
                var interfaces = cacheOfTypes.Where(t => this.@int.Any(i => i == t.FullName));

                // Could trim the interfaces to remove any that are inherited from others ...
                cacheOfInterfaces.Add(key, interfaces.ToArray());
            }
            return cacheOfInterfaces[key];
        }

        /// <summary>
        /// Get a mapping from a field name to a type according to the interfaces on this object
        /// </summary>
        /// <returns></returns>
        public Dictionary<string, Type> GetTypeMap()
        {
            Dictionary<string, Type> typeMap = new Dictionary<string, Type>();
            var interfaces = this.GetAllInterfaces();
            foreach (var mi in interfaces.SelectMany(intf => intf.GetProperties()))
            {
                typeMap[mi.Name] = mi.PropertyType;
            }
            return typeMap;
        }


        /// <summary>
        /// Becomes a Proxy object that acts like it implements all of the interfaces listed as being supported by this Entity
        /// </summary>
        /// <remarks>
        /// Because the returned object supports ALL of the interfaces that have ever been added to this object
        /// you can cast it to any of them.  This enables a type of polymorphism.
        /// </remarks>
        public object ActLikeAllInterfacesPresent()
        {
            return Impromptu.DynamicActLike(this, this.GetAllInterfaces());
        }

        [BsonIgnore]
        // BsonIgnore because Bson serialization will happen on the dynamic interface this class exposes not on this dictionary
        private Dictionary<string, object> children = new Dictionary<string, object>();

        /// <summary>
        /// Fetch a property by name
        /// </summary>
        public override bool TryGetMember(GetMemberBinder binder, out object result)
        {
            if (binder.Name == "_id") { result = this._id; return true; }
            else if (binder.Name == InterfacesField) { result = this.@int; return true; }
            else 
            {
               // TryGetValue sets result to null when the member has never been set, so a missing
               // (nullable) field reads as null; e.g. after a database migration it won't be in 'children'
               children.TryGetValue(binder.Name, out result);
               return true;
            }
        }

        /// <summary>
        /// Set a property (e.g. person1.Name = "Smith")
        /// </summary>     
        public override bool TrySetMember(SetMemberBinder binder, object value)
        {
            if (binder.Name == "_id") { this._id = (ObjectId)value; return true; }      // you shouldn't need to use this
            if (binder.Name == InterfacesField) throw new AccessViolationException("You cannot set the interfaces directly, use AddLike() instead");
            if (!this.GetTypeMap().ContainsKey(binder.Name)) throw new ArgumentException("Property '" + binder.Name + "' not found.  You need to call AddLike to specify the interfaces you want to support."); 
            children[binder.Name] = value;
            return true;
        }

        public override IEnumerable<string> GetDynamicMemberNames()
        {
            return new[]{"_id", InterfacesField}.Concat(children.Keys);
        }

        /// <summary>
        /// An indexer for use by serialization code
        /// </summary>
        internal object this[string key]
        {
            get
            {
                if (key == "_id") return this._id;
                else if (key == InterfacesField) return this.@int;
                else return children[key];
            }

            set
            {
                if (key == "_id" && value is BsonObjectId) this._id = ((BsonObjectId)value).Value;
                else if (key == "_id") this._id = (ObjectId)value;
                else if (key == InterfacesField) this.@int = new HashSet<string>((IEnumerable<string>)value);
                else children[key] = value;
            }
        }
    }
}

You’ll need ImpromptuInterface (from NuGet) to build this. To use it, you write code like this to save to MongoDB:

            MongoDynamic entity = new MongoDynamic();
            var user = entity.AddLike<IUser>();         // *** Add the IUser fields to it ...
            user.Name = name;                           // Use it as if it were an IUser
            // save it to the database as normal

And to retrieve an object you create a query as normal and then query for MongoDynamic objects like so …

            var user = database.GetCollection<MongoDynamic>("***collectionName***").FindOne(query);
            if (user == null) return null;
            return user.AsLike<IUser>();

Typically you will want your query to reference the field called int (where all the interfaces are stored) so you can query for objects that support a specific type (if you do, you’ll want to add an index on that field). [NB the name was chosen to be one you were unlikely to ever use in .NET]

MongoDynamic objects are polymorphic – you can morph them to support any other interface at any time like so …

            user.AddLike<ISomeOtherInterface>();
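The mechanism underneath all of this, a DynamicObject whose members live in a dictionary, can be seen in isolation in the following sketch. Bag is a hypothetical, stripped-down class written for this example; it has no interfaces, no _id handling and no MongoDB involvement, and only shows how member reads and writes on a dynamic reference get routed to TryGetMember and TrySetMember.

```csharp
using System;
using System.Collections.Generic;
using System.Dynamic;

// Hypothetical, stripped-down cousin of MongoDynamic: a dictionary behind a dynamic facade.
public sealed class Bag : DynamicObject
{
    private readonly Dictionary<string, object> children = new Dictionary<string, object>();

    public override bool TrySetMember(SetMemberBinder binder, object value)
    {
        children[binder.Name] = value;
        return true;
    }

    public override bool TryGetMember(GetMemberBinder binder, out object result)
    {
        // TryGetValue leaves result as null when the key is absent,
        // so unset members read as null instead of throwing.
        children.TryGetValue(binder.Name, out result);
        return true;
    }
}

public static class Program
{
    public static void Main()
    {
        dynamic person = new Bag();
        person.Name = "Smith";                       // routed to TrySetMember
        Console.WriteLine(person.Name);              // routed to TryGetMember, prints Smith
        Console.WriteLine(person.Age == null);       // never set, prints True
    }
}
```

MongoDynamic layers the interface list and the typed proxy from ImpromptuInterface on top of this, which is what turns the loosely typed bag into something you can use through IUser.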

Home network crawler – cataloging every file on the home LAN with C# and MongoDB

Map-Reduce in action: The glaciers in Greenland 'map' the canyon walls into streams of rocks called lateral moraine. As the glaciers merge these rocks are 'reduced' into streams in the middle called 'medial' moraine. (A photo I took over Greenland this summer.)

With the addition of two more 3TB drives to the home network it’s becoming impossible to track files and to remember where each one is and whether it’s a backup of some other disk or not. There are 8 computers on the home network and over 10TB of storage distributed between them. Much of the storage is concentrated on a single machine running Windows Server 2008. It’s a low-powered Atom server connected to a Sans Digital 1U rackmount disk array running in JBOD mode (just a bunch of disks).

I’m not a huge fan of RAID arrays – they mostly mean there’s another component to go wrong (the controller card) and when they do go wrong you can lose all your data just as easily as if it were all on one drive. I prefer a multiple copy strategy, an “Amazon S3 for the home” if you like. The downside of this is that there are multiple copies of each file across the home network and as I have several generations of hard drives the mapping from primary to secondary to tertiary is complex and hard to manage! It’s also really hard to find a single file when there are so many places to look and it’s nigh on impossible to be sure that I have the necessary three copies of every important file in the right places at all times.

So this weekend I embarked on a small project to catalog every file, directory and storage volume on the entire home network including drives that are only sometimes connected. The software has been running all weekend and is close to cataloging everything. It’s found 5 million files so far representing over 6TB of data!

The architecture I chose for this software was an agent that runs on each PC to catalog all of the attached volumes. This client uploads all the directories and files that it finds to a MongoDB database running on the same Atom server as the main storage array. The poor little Atom server’s 4GB of RAM has been in constant use but the server has remained responsive, in part because it boots from an SSD drive.

Each volume, directory and file is represented by a document in MongoDB in a single collection. The agent calculates an MD5 hash for each file and extracts metadata from MP3, WMA and JPG files. It also stores all of the key file dates (created, updated, accessed) and references to parent directories, volume identifiers and the currently connected PC. It does not assume that a volume is always connected to the same computer – you can unplug an external drive from one and put it somewhere else and it will all work just fine.

I implemented a re-startable tree scan that uses a couple of DateTime stamps to be able to determine which directories need to be scanned during the current pass and which ones have already been scanned. Any agent can be killed at any time and restarted and it will carry on walking the directory tree right where it left off. It will even continue correctly in the case where you move a volume from one PC to another.
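As a rough illustration of how timestamps can make a scan restartable, here is a minimal in-memory sketch. It is not the agent's actual code: the names `DirectoryRecord`, `LastScanned` and `passStarted` are hypothetical, and the real records would live in MongoDB rather than in a list. The assumption is that a directory still needs scanning in the current pass if it has not been stamped since the pass began.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical record for one directory (an assumption, not the agent's schema).
public class DirectoryRecord
{
    public string Path { get; set; }
    public DateTime LastScanned { get; set; }   // when this directory was last fully scanned
}

public static class RestartableScan
{
    // A directory is still pending if it has not been scanned since the pass began.
    // Restarting with the same passStarted value skips everything already completed.
    public static IEnumerable<DirectoryRecord> Pending(
        IEnumerable<DirectoryRecord> directories, DateTime passStarted)
    {
        return directories.Where(d => d.LastScanned < passStarted);
    }

    // Mark a directory as done by stamping it with the completion time.
    public static void MarkScanned(DirectoryRecord directory, DateTime now)
    {
        directory.LastScanned = now;
    }
}
```

Because the stamps are stored with the directory records rather than in the agent's memory, killing and restarting the agent loses nothing except the directory that was in progress.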

Each agent uses the Task Parallel Library’s Parallel.ForEach to crawl each volume in parallel and to parse multiple files from each directory simultaneously.
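A minimal sketch of that two-level parallelism follows. The class name `ParallelCrawler` and the `processFile` callback are assumptions for illustration; the real agent also hashes files, extracts metadata and uploads to MongoDB, none of which is shown here.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Threading;
using System.Threading.Tasks;

public static class ParallelCrawler
{
    // Walks each root in parallel and processes the files in each
    // directory in parallel; returns the number of files visited.
    public static long CountFiles(IEnumerable<string> roots, Action<FileInfo> processFile)
    {
        long count = 0;

        Parallel.ForEach(roots, root =>
        {
            // Each root gets its own explicit stack, so no state is shared between volumes
            var pending = new Stack<DirectoryInfo>();
            pending.Push(new DirectoryInfo(root));

            while (pending.Count > 0)
            {
                DirectoryInfo dir = pending.Pop();
                foreach (DirectoryInfo sub in dir.GetDirectories())
                {
                    pending.Push(sub);
                }

                Parallel.ForEach(dir.GetFiles(), file =>
                {
                    processFile(file);                  // e.g. hash it and extract metadata
                    Interlocked.Increment(ref count);   // the shared counter must be updated atomically
                });
            }
        });

        return count;
    }
}
```

Calling `ParallelCrawler.CountFiles(new[] { @"D:\", @"E:\" }, f => { })` would walk both volumes at once. A production crawler would also need to handle access-denied errors, which this sketch lets propagate.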

By storing all of the file metadata in MongoDB it’s easy to use Map-Reduce to calculate some interesting statistics for the files on the network.

For example, to create a summary of file sizes I can use a Map function:

function Map() {
	// Only consider file documents with a non-zero Size (zero-byte files fail this test)
	if (this.Size && this._t == "FileInformation")
	{
		var size = this.Size;
		
		// The key names the unit the file is smaller than, e.g. "kb" means under 1KB
		if (size < 1024)
			emit ("kb", {count:1, size:size});
		else if (size < 1024*1024)
			emit ("mb", {count:1, size:size});
		else if (size < 1024*1024*1024)
			emit ("gb", {count:1, size:size});
		else if (size < 1024*1024*1024*1024)
			emit ("tb", {count:1, size:size});
		else 
			emit ("tb+", {count:1, size:size});
	}
}

and a reduce function:

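function Reduce(key, arr_values) {

	var count = 0;
	var size = 0;
	
	// Sum the partial counts and sizes. The result has the same shape as the
	// emitted values because MongoDB may feed reduce output back into reduce.
	for (var i = 0; i < arr_values.length; i++) 
	{
		count += arr_values[i].count;
		size += arr_values[i].size;
	}
	
	return {count:count, size:size};
}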

Map-Reduce operations like this take about 20 minutes to run (on the Atom server with just 4GB of RAM) whereas any query serviced by one of the indexes on the MongoDB collection is almost instantaneous.

I’ve been using the excellent MongoVue to run simple map-reduce scripts like this and to keep track of how quickly the database is growing.

Map-reduce can also be used to find duplicate files – by emitting the MD5 hash as the key and some information about the file as the value I can find every copy of every file across every computer on the home network.
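The same grouping idea can be sketched in memory with LINQ. This is not the MongoDB job itself: `DuplicateFinder` and its method names are made up for illustration, and it takes file contents as byte arrays to stay self-contained, where a real agent would hash a file stream.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Security.Cryptography;
using System.Text;

public static class DuplicateFinder
{
    // Hex-encoded MD5 of a byte array (for a real file, pass a stream to ComputeHash instead).
    public static string Md5Hex(byte[] content)
    {
        using (MD5 md5 = MD5.Create())
        {
            byte[] hash = md5.ComputeHash(content);
            return BitConverter.ToString(hash).Replace("-", "").ToLowerInvariant();
        }
    }

    // "Map" each file to its hash, then "reduce" by collecting the paths that share a hash.
    // Only hashes that occur more than once are returned.
    public static Dictionary<string, List<string>> FindDuplicates(
        IEnumerable<KeyValuePair<string, byte[]>> files)
    {
        return files
            .GroupBy(f => Md5Hex(f.Value), f => f.Key)
            .Where(g => g.Count() > 1)
            .ToDictionary(g => g.Key, g => g.ToList());
    }
}
```

The key of each entry is the hash and the value is the list of paths holding identical content, which is the same shape of answer the map-reduce version produces.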

Since I have the file name and metadata for every file on the home network I can also easily find any file using MongoDB’s regex matching feature against the path.

The Hard Parts

For starters you’ll need a library that can handle long file names. Then you’ll need to fix it to provide at least the functionality that FileInfo and DirectoryInfo give you in .NET.

Next you’ll need to learn about reparse-points and hard-links and you’ll need to skip over them because with them in place the file system is not a tree; it’s a cyclical graph in which a simple crawler will quickly get confused or stuck.
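Skipping reparse points can be sketched with the standard `FileAttributes.ReparsePoint` flag. The `SafeWalker` class is a made-up name for illustration, and this sketch covers reparse points only: hard links do not show up in `FileAttributes`, so they need a different check.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class SafeWalker
{
    // True when the attributes mark a reparse point (junction, symbolic link, mount point).
    public static bool IsReparsePoint(FileAttributes attributes)
    {
        return (attributes & FileAttributes.ReparsePoint) != 0;
    }

    // Depth-first walk that never descends into a reparse point,
    // so a junction that points back up the tree cannot trap the crawler.
    public static IEnumerable<DirectoryInfo> Walk(DirectoryInfo root)
    {
        var pending = new Stack<DirectoryInfo>();
        pending.Push(root);

        while (pending.Count > 0)
        {
            DirectoryInfo dir = pending.Pop();
            yield return dir;

            foreach (DirectoryInfo sub in dir.GetDirectories())
            {
                if (!IsReparsePoint(sub.Attributes))
                {
                    pending.Push(sub);
                }
            }
        }
    }
}
```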

You’ll also want to store the NTFS file Id and the unique Volume ID for every file so you can track it when the file is moved or the removable drive is connected to a different computer.

So how well does it work?

This all seems to work really well. Nearly every volume has now been cataloged. It’s located about 5M files occupying over 6TB of space. The worst case offender for the number of copies of the same file is 100+. I’ve used the find feature in MongoDB to find a file I was missing and I’m better able to plan how to arrange directories and file generations across the various hard drives I have.

What’s next

Well, of course this needs to be connected to the home automation system and my Natural Language engine so you can ask “send a copy of IMG_0228 from last week to X” or “where are all the spreadsheets I created last year?” That will be fairly easy.

After that I hope to incorporate backup features into the agents too so they can automatically keep the required number of copies of each file according to its importance. I’d also like to set up a rotating set of external drives that go in the fire safe when not connected and when they are connected they get updated with the latest copies of all the important files.

I’d also like to be able to get the agents to move whole groups of directories around between drives as juggling the directory layout each time a new hard drive is added to the system is always a time consuming process.

Comments or Questions?

Does everyone else have a hard time managing multiple computers, hard drives, directories and multiple copies of files? What tools do you use to do this? Is there anything commercially available that I could have used instead? Would a tool like this be useful to you? Should I publish the code somewhere? Comments and questions are always welcome here or on Twitter.

C# Natural Language Engine connected to Microsoft Dynamics CRM 2011 Online

In an earlier post I discussed some ideas around a Semantic CRM.

Recently I’ve been doing some clean up work on my C# Natural Language Engine and decided to do a quick test connecting it to a real CRM. As you may know from reading my blog, this natural language engine is already heavily used in my home automation system to control lights, sprinklers, HVAC, music and more and to query caller ID logs and other information.

I recently refactored it to use the Autofac dependency injection framework and in the process realized just how close my NLP engine is to ASP.NET MVC 3 in its basic structure and philosophy! To use it you create Controller classes and put action methods in them. Those controller classes use Autofac to get all of the dependencies they may need (services like an email service, a repository, a user service, an HTML email formatting service, …) and then the methods in them represent a specific sentence parse using the various token types that the NLP engine supports. Unlike ASP.NET MVC 3 there is no Route registration; the method itself represents the route (i.e. sentence structure) that is used to decide which method to call. Internally my NLP engine has its own code to match incoming words and phrases to tokens and then on to the action methods. In a sense the engine itself is one big dependency injection framework working against the action methods. I sometimes wish ASP.NET MVC 3 had the same route-registration-free approach to designing web applications (but also appreciate all the reasons why it doesn’t).

Another improvement I made recently to the NLP Engine was to develop a connector for the Twilio SMS service. This means that my home automation system can now accept SMS messages as well as all the other communication formats it supports: email, web chat, XMPP chat and direct URL commands. My Twilio connector to NLP supports message splitting and batching so it will buffer up outgoing messages to reach the limit of a single SMS and will send that. This lowers SMS charges and also allows responses that are longer than a single SMS message.
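The batching and splitting idea can be sketched roughly like this. It is not the actual connector and makes no Twilio calls: `SmsBatcher` is a hypothetical name, the 160-character limit is an assumption about a single SMS, and joining buffered messages with a space is just one possible choice.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class SmsBatcher
{
    public const int SmsLimit = 160;   // assumed size of a single SMS

    // Packs short messages together and splits long ones so that
    // every returned string fits within the limit.
    public static List<string> Batch(IEnumerable<string> messages, int limit = SmsLimit)
    {
        var result = new List<string>();
        var buffer = new StringBuilder();

        foreach (string message in messages)
        {
            string remaining = message;

            // If appending (plus a separating space) would overflow, send what is buffered so far
            if (buffer.Length > 0 && buffer.Length + 1 + remaining.Length > limit)
            {
                result.Add(buffer.ToString());
                buffer.Clear();
            }

            // Split a message that is too long for one SMS on its own
            while (remaining.Length > limit)
            {
                result.Add(remaining.Substring(0, limit));
                remaining = remaining.Substring(limit);
            }

            if (buffer.Length > 0) buffer.Append(' ');
            buffer.Append(remaining);
        }

        if (buffer.Length > 0) result.Add(buffer.ToString());
        return result;
    }
}
```

Two short replies such as "lights on" and "heating set to 70" go out as one SMS, while a 200-character reply goes out as two.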

Using this new, improved version of my Natural Language Engine I decided to try connecting it to a CRM. I chose Microsoft Dynamics CRM 2011 and elected to use the strongly-typed, early-bound objects that you can generate for any instance of the CRM service. I added some simple sentences in an NLPRules project that allow you to tell it who you met, and to input some of their details. Unlike a traditional forms-based approach the user can decide what information to enter and what order to enter it in. The Natural Language Engine supports the concept of a conversation and can remember what you were discussing, allowing a much more natural style of conversation than some simple rule-based engines and even allowing it to ask questions and get answers from the user.

Here’s a screenshot showing a sample conversation using Google Talk (XMPP/Jabber) and the resulting CRM record in Microsoft CRM 2011 Online. You could have the same conversation over SMS or email.

A natural language interface to CRM

Based on my limited testing this looks like another promising area where a truly fluent, conversational-style natural language engine could play a significant role. Note how it understands email addresses, phone numbers and such like and in code these all become strongly typed objects. Where it really excels is in temporal expressions where it can understand things like “who called on a Saturday in May last year?” and can construct an efficient SQL query from that.

A simple redirect route handler for ASP.NET 3.5 routing

ASP.NET 3.5 Routing is a very powerful tool not just for registering routes for newer ASP.NET MVC applications but also for adding SEO friendly routes to older Webforms (ASPX) applications, or for routing multiple URLs to a single page. But that’s not all it can do. You can create your own IRouteHandler and then have complete control over what to do with any incoming HttpRequest.

Here for example is a way to do a permanent redirect when a given route is matched. To use it you might, for example, do:-

            routes.Add(new Route("sample.aspx", new RedirectRouteHandler("/home/start")));

Here is the RedirectRouteHandler that can turn any request into a 301 redirect for you:-

    /// <summary>
    /// Redirect Route Handler
    /// </summary>
    public class RedirectRouteHandler : IRouteHandler
    {
        private string newUrl;

        public RedirectRouteHandler(string newUrl)
        {
            this.newUrl = newUrl;
        }

        public IHttpHandler GetHttpHandler(RequestContext requestContext)
        {
            return new RedirectHandler(newUrl);
        }
    }

    /// <summary>
    /// <para>Redirecting MVC handler</para>
    /// </summary>
    public class RedirectHandler : IHttpHandler
    {
        private string newUrl;

        public RedirectHandler(string newUrl)
        {
            this.newUrl = newUrl;
        }

        public bool IsReusable
        {
            get { return true; }
        }

        public void ProcessRequest(HttpContext httpContext)
        {
            httpContext.Response.Status = "301 Moved Permanently";
            httpContext.Response.StatusCode = 301;
            httpContext.Response.AppendHeader("Location", newUrl);
            return;
        }
    }

Note: I’m not saying this is the best or only way to handle this. You’ll want to look at Url Rewriting and the Application and Request Routing module for IIS7 in particular.

Why functional programming and LINQ is often better than procedural code

Functional programming is a relatively new component in the C# language.  It can potentially replace for-loops in many situations with simpler code, but the question remains ‘what’s wrong with a good old for loop?’

Here are some of the reasons I think functional programming is important and in particular how LINQ can improve the readability, maintainability, and parallelizability (if there were such a word) of your code:

  1. Functional approaches are potentially easier to parallelize either manually using PLINQ or by the compiler. As CPUs move to even more cores this may become more important.
  2. Functional approaches make it easier to achieve lazy evaluation in multi-step processes because you can pass the intermediate results to the next step as a simple variable which hasn’t been evaluated fully yet rather than evaluating the first step entirely and then passing a collection to the next step (or without using a separate method and a yield statement to achieve the same procedurally).
  3. Functional approaches are often shorter and easier to read.
  4. Functional approaches often eliminate complex conditional bodies within for loops (e.g. if statements and ‘continue’ statements) because you can break the for loop down into logical steps – selecting all the elements that match, doing an operation on them, …
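To make the comparison concrete, here is a small sketch of the same calculation written three ways. The task (summing the squares of the even numbers) and the class name `LoopVersusLinq` are invented for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class LoopVersusLinq
{
    // Procedural: filtering, transformation and accumulation are tangled in one loop body.
    public static int SumOfEvenSquaresLoop(IEnumerable<int> numbers)
    {
        int total = 0;
        foreach (int n in numbers)
        {
            if (n % 2 != 0) continue;
            total += n * n;
        }
        return total;
    }

    // Functional: each step is a separate stage. Where and Select are lazy,
    // so nothing is evaluated until Sum() pulls values through the pipeline.
    public static int SumOfEvenSquaresLinq(IEnumerable<int> numbers)
    {
        return numbers
            .Where(n => n % 2 == 0)
            .Select(n => n * n)
            .Sum();
    }

    // The same pipeline handed to PLINQ by adding AsParallel().
    public static int SumOfEvenSquaresParallel(IEnumerable<int> numbers)
    {
        return numbers
            .AsParallel()
            .Where(n => n % 2 == 0)
            .Select(n => n * n)
            .Sum();
    }
}
```

For the numbers 1 to 10 all three return 220. For a tiny input like this the parallel version is likely slower because of its setup overhead; it is only there to show how little the code changes.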

These days I opt for the functional syntax more often than not and fall back to for-loops when:-

A. The body of the loop contains complex logic that cannot be disentangled into a cleaner sequential application of functions and it is simply easier to just write a for-loop with the complex conditional code in it.

B. The task is inherently not functional, i.e. has side effects.

C. The task needs exception handling in it. Sure you can write big lambda blocks with try catch in them but at some point it becomes easier and cleaner just to use a for-loop.

Development Tools and Libraries I use

Libraries and Code Snippets

TweetSharp

Predicate Builder for LINQ query building: http://www.albahari.com/nutshell/predicatebuilder.aspx

Useful tools

LinqPad  http://www.linqpad.net/

Source Control

Subversion with TortoiseSVN

Continuous Integration

JetBrains TeamCity, recently moved off CruiseControl.NET

Deployment

Subversion as a repository for binary images, custom deployment code

Web Server

IIS7

Useful articles

PHP: http://devzone.zend.com/article/627

Putting a feedback button on every page with ASP.NET MVC and JQuery

Feedback button

You’ve probably seen many web sites with the floating ‘feedback’ button down the side. Here’s how to add one to your site using jQuery, jQuery UI and ASP.NET MVC.

First make sure you have jQuery and jQuery UI referenced in your master page view together with the CSS file for whichever jQuery UI theme you have chosen.

We’ll make a few changes to the master page view to add the pop-up feedback form, we’ll add an action on a controller to accept the feedback that is posted, and we’ll need a small amount of CSS.

So, after referencing those javascript files and the theme CSS, the first thing to do is to add the following HTML to the bottom of your master page view:

            <div id="feedbackdialog" style="width:300px; height:300px;text-align:left;">
                <p>Your name and/or email: <br />
                <input type="text" id="feedbackEmail" name="feedbackEmail" size="34" value="<%: this.Model.AccountEmailOrEmpty %>" />
                </p>
                <p>Comment:<br />
                <textarea id="feedbackComment" name="comment" cols="35" rows="5"></textarea></p>
                <br />
                <div id="feedbackResult"></div>
            </div>

Now add this code to your global javascript file that is also referenced from your master page view … don’t embed it in the page, go ahead and do the right thing and put it in a .js file so it’s not a burden on every page.

//function for the feedback form
$(document).ready(
    function () {
        /* Create the feedback dialog */

        $("#feedbackdialog").dialog(
        {
            closeOnEscape: true,
            modal: true,
            autoOpen: false,
            resizable: false,
            title: 'Feedback',
            width: 400,
            buttons: { "Send": function () {
                var dlg = $(this);
                $.post("/corporate/suggest",
                        {
                            email: dlg.find("input[name='feedbackEmail']").val(),
                            comment: dlg.find("#feedbackComment").val(),
                            url: document.location.href 
                         },
                        function (data) {
                            dlg.dialog('close');
                        }
                );
                $(this).html("<p id='feedBackSending'>Sending</p>").dialog({ buttons: {} });
            }
            }
        });

        $('.contact_us').click(function () {
            $("#feedbackdialog").dialog("open");
            return false;
        });
    });

Next we’ll add the action referenced here, in the example we used the url ‘/corporate/suggest’ so, assuming you have a controller called CorporateController, add the following action to it …

        public ActionResult Suggest (string email, string comment, string url)
        {
            if (!string.IsNullOrWhiteSpace(comment))
            {
                // here we will log the feedback to the database and/or send it in email
            }
            return View();
        }

Create a view for ‘Suggest’, it doesn’t matter what’s in it as we don’t use the result currently.

And, finally we need a bit of CSS for the feedback icon itself:

/* Feedback tab */
#feedbackTab 
{
	right:0;
    position:fixed;
    width:32px;
    height:150px;
    top: 150px;
    z-index:1;
}

The feedback button now floats on every page, 150px from the top and it’s glued to the right hand side.

Of course you’ll need your own feedback image, or feel free to borrow the one here:- http://www.signswift.com/images/feedback.png

So with that all in place, click the feedback button and a form like this should appear. Fill the information in and send it to the server. Note how we silently grab the url of the page too so we can see which page they were on when they submitted the feedback.

Feedback Form

A simple web crawler in C# using HtmlAgilityPack

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using HtmlAgilityPack;
using System.Net;

namespace LinkChecker.WebSpider
{
    /// <summary>
    /// A result encapsulating the Url and the HtmlDocument
    /// </summary>
    public abstract class WebPage
    {
        public Uri Url { get; set; }

        /// <summary>
        /// Get every WebPage.Internal on a web site (or part of a web site) visiting all internal links just once
        /// plus every external page (or other Url) linked to the web site as a WebPage.External
        /// </summary>
        /// <remarks>
        /// Use .OfType WebPage.Internal to get just the internal ones if that's what you want
        /// </remarks>
        public static IEnumerable<WebPage> GetAllPagesUnder(Uri urlRoot)
        {
            var queue = new Queue<Uri>();
            var allSiteUrls = new HashSet<Uri>();

            queue.Enqueue(urlRoot);
            allSiteUrls.Add(urlRoot);

            while (queue.Count > 0)
            {
                Uri url = queue.Dequeue();

                HttpWebRequest oReq = (HttpWebRequest)WebRequest.Create(url);
                oReq.UserAgent = @"Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.1.5) Gecko/20091102 Firefox/3.5.5";

                WebPage result = null;      // stays null for non-HTML content, which is skipped
                HtmlDocument doc = null;

                // C# does not allow 'yield return' inside a try block that has a catch clause,
                // so build the result here and hand it off after the try/catch
                try
                {
                    using (HttpWebResponse resp = (HttpWebResponse)oReq.GetResponse())
                    {
                        if (resp.ContentType.StartsWith("text/html", StringComparison.InvariantCultureIgnoreCase))
                        {
                            doc = new HtmlDocument();
                            doc.Load(resp.GetResponseStream()); // The HtmlAgilityPack
                            result = new Internal() { Url = url, HtmlDocument = doc };
                        }
                    }
                }
                catch (System.Net.WebException ex)
                {
                    // e.g. a 404 or a timeout: report it rather than abandoning the whole crawl
                    doc = null;
                    result = new WebPage.Error() { Url = url, Exception = ex };
                }
                catch (Exception ex)
                {
                    ex.Data.Add("Url", url);    // Annotate the exception with the Url
                    throw;
                }

                if (result == null) continue;

                // Hand off the page (or the error)
                yield return result;

                if (doc == null) continue;  // nothing to follow after an error

                // And now queue up all the links on this page
                // (SelectNodes returns null, not an empty list, when nothing matches)
                var links = doc.DocumentNode.SelectNodes(@"//a[@href]");
                if (links == null) continue;

                foreach (HtmlNode link in links)
                {
                    HtmlAttribute att = link.Attributes["href"];
                    if (att == null) continue;
                    string href = att.Value;
                    if (href.StartsWith("javascript", StringComparison.InvariantCultureIgnoreCase)) continue;      // ignore javascript on buttons using a tags

                    // Resolve relative links against the page they were found on
                    Uri urlNext;
                    if (!Uri.TryCreate(url, href, out urlNext)) continue;   // skip malformed hrefs

                    if (!allSiteUrls.Contains(urlNext))
                    {
                        allSiteUrls.Add(urlNext);               // keep track of every page we've handed off

                        if (urlRoot.IsBaseOf(urlNext))
                        {
                            queue.Enqueue(urlNext);
                        }
                        else
                        {
                            yield return new WebPage.External() { Url = urlNext };
                        }
                    }
                }
            }
        }

        ///// <summary>
        ///// In the future might provide all the images too??
        ///// </summary>
        //public class Image : WebPage
        //{
        //}

        /// <summary>
        /// Error loading page
        /// </summary>
        public class Error : WebPage
        {
            public int HttpResult { get; set; }
            public Exception Exception { get; set; }
        }

        /// <summary>
        /// External page - not followed
        /// </summary>
        /// <remarks>
        /// No body - go load it yourself
        /// </remarks>
        public class External : WebPage
        {
        }

        /// <summary>
        /// Internal page
        /// </summary>
        public class Internal : WebPage
        {
            /// <summary>
            /// For internal pages we load the document for you
            /// </summary>
            public virtual HtmlDocument HtmlDocument { get; internal set; }
        }
    }
}

Using Exception.Data to add additional information to an Exception

Introduction

Whether you are writing a WinForms application or a complex .NET web site, you will invariably be catching exceptions, logging them and reporting them somewhere. (In this post, I’m not going to explain how to log exceptions). Simply reporting the exception as-thrown rarely captures enough information to be able to diagnose what happened. A FileNotFoundException for instance isn’t much use unless you know which file it was.

One way to deal with this issue is to wrap an exception up in a more explicit exception that includes the extra information, e.g.

string filename ...
try
{
   //... do something with the file
}
catch (FileNotFoundException ex)
{
   CustomException ex2 = new CustomException("Missing cache: " + filename, ex);
   throw ex2;
}

This approach works but it leads to a lot of custom exceptions that are just extra work to create and maintain.  Sometimes you’ll want a custom exception because you are going to handle it in a different way in some outer scope, but often you just want to log the error and redirect the user to an error page as there is nothing else you can do to fix the problem.

In cases like this, you can simplify things greatly by using the little-known .Data property on an Exception. This is an IDictionary for a “collection of key/value pairs that provide additional user-defined information about the exception” [MSDN].

Using this approach, you can write:

try
{
   ...
}
catch (FileNotFoundException ex)
{
   ex.Data.Add("cache filename", filename);
   throw;
}

Each surrounding scope can include a similar Try-Catch that adds more information to .Data so by the time you get to the top-most scope you have added a complete picture as to what might have caused the exception.  And in doing so you haven’t lost any of the StackTrace information, nor have you wrapped the exception up needlessly in another exception.

At a higher level in your Global.asax file where you catch all unhandled exceptions, you can add even more to the .Data collection and perhaps include all the interesting parameters on HttpContext like RawUrl, cookies, …


ex.Data.Add("RawUrl", request.RawUrl);
try
{
   foreach (string cookieName in request.Cookies)
   {
      try
      {
         HttpCookie cookie = request.Cookies[cookieName];
         string key = "Cookie " + cookie.Path + " " + cookieName;
         if (!ex.Data.Contains(key))
         {
             ex.Data.Add(key, cookie.Value.ToString());
         }
      }
      catch
      {
         // deliberately nothing in here, should
         // never happen, just being cautious
      }
   }
   // An extension method I use to spot bots - write your own ...
   if (request.IsABot())
   {
      ex.Data.Add("BOT", "************* BOT *****************");
   }
   ex.Data.Add("UserAgent", request.UserAgent);
   ex.Data.Add("Referrer", request.UrlReferrer);
   ex.Data.Add("User Host", request.UserHostName);
}
catch
{
   // deliberately nothing in here, should
   // never happen, just being cautious
   // but we definitely don't want to cause
   // an exception while handling one!
}

Exception Reporting Code

Now in your exception reporting code, you can write out the exception message and stack trace followed by a dump of all the key value pairs in .Data. I tend to use log4net on each server writing to a rolling log file and SQL server to capture the exception data centrally. For SQL, you’ll probably want one table for the Exception itself and another table with a row for each key/value pair in .Data.
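A minimal sketch of such a dump might look like this. The `ExceptionReport` class, its `Describe` method and the `LoadCache` helper are names invented for illustration; swap the string building for whatever your logger expects.

```csharp
using System;
using System.Collections;
using System.IO;
using System.Text;

public static class ExceptionReport
{
    // Formats the exception type, message and stack trace,
    // followed by every key/value pair that was added to Data.
    public static string Describe(Exception ex)
    {
        var sb = new StringBuilder();
        sb.AppendLine(ex.GetType().FullName + ": " + ex.Message);
        sb.AppendLine(ex.StackTrace);

        // Exception.Data is a non-generic IDictionary, so it enumerates as DictionaryEntry
        foreach (DictionaryEntry entry in ex.Data)
        {
            sb.AppendLine("  " + entry.Key + " = " + entry.Value);
        }
        return sb.ToString();
    }

    // Hypothetical inner operation that annotates the exception and rethrows it.
    public static void LoadCache(string filename)
    {
        try
        {
            throw new FileNotFoundException("Could not find file.");
        }
        catch (FileNotFoundException ex)
        {
            ex.Data.Add("cache filename", filename);
            throw;   // rethrow the same exception object rather than wrapping it
        }
    }
}
```

An outer scope can catch the exception from `LoadCache`, add its own entries such as the RawUrl, and pass it to `Describe`; both sets of entries then appear in the report.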

Comments

One cause of Exceptions on web servers is bots and client-side ‘web accelerators’.  Both of these can hit pages with incorrect or outdated parameters that you simply didn’t expect to receive. That’s why I add a BOT warning on every exception as the exception itself may seem severe but in reality it’s benign and no user has ever seen it.  I even found one antivirus product that takes each request you make and sends the URL to Japan where another server makes a second request back to check the page for viruses! It even pretends not to be a Bot in the UserAgent and of course, all your ‘security-through-obscurity’ URLs are now sitting on a server in Japan, but you know security through obscurity is no security at all, right?

Another browser add-on called FunWebProducts would routinely corrupt Viewstate information so if you see that in your exceptions log, you know who to blame.