Programming
The specified password is not correct
Aug 23rd
I don’t know which recent Windows Update did this but both my Windows Server 2008 AND my Windows 7 64-bit computers suddenly stopped being able to connect to any shares on my Windows Server 2003. Attempts to connect were met with “The specified password is not correct”.
After much searching and trying different options I discovered that the Windows security policy “Network security: LAN Manager authentication level” was not set. Changing this to “Send LM & NTLM – use NTLMv2 session security if negotiated” fixed the problem. To change it, run secpol.msc (Start, Run) and look under Local Policies, Security Options.
Yet another case where Microsoft error messages are just plain wrong. Someone clearly decided to throw the same message whether the password is incorrect or whether one of many other problems occurred during the negotiation between two computers.
Home network crawler – cataloging every file on the home LAN with C# and MongoDB
Aug 22nd

Map-Reduce in action: The glaciers in Greenland 'map' the canyon walls into streams of rocks called lateral moraine. As the glaciers merge these rocks are 'reduced' into streams in the middle called 'medial' moraine. (A photo I took over Greenland this summer.)
I’m not a huge fan of RAID arrays – they mostly mean there’s another component to go wrong (the controller card) and when they do go wrong you can lose all your data just as easily as if it were all on one drive. I prefer a multiple copy strategy, an “Amazon S3 for the home” if you like. The downside of this is that there are multiple copies of each file across the home network and as I have several generations of hard drives the mapping from primary to secondary to tertiary is complex and hard to manage! It’s also really hard to find a single file when there are so many places to look and it’s nigh on impossible to be sure that I have the necessary three copies of every important file in the right places at all times.
So this weekend I embarked on a small project to catalog every file, directory and storage volume on the entire home network including drives that are only sometimes connected. The software has been running all weekend and is close to cataloging everything. It’s found 5 million files so far representing over 6TB of data!
The architecture I chose for this software was an agent that runs on each PC to catalog all of the attached volumes. This client uploads all the directories and files that it finds to a MongoDB database running on the same Atom server as the main storage array. The poor little Atom server’s 4GB of RAM has been in constant use but the server has remained responsive, in part because it boots from an SSD drive.
Each volume, directory and file is represented by a document in MongoDB in a single collection. The agent calculates an MD5 hash for each file and extracts metadata from MP3, WMA and JPG files. It also stores all of the key file dates (created, updated, accessed) and references to parent directories, volume identifiers and the currently connected PC. It does not assume that a volume is always connected to the same computer – you can unplug an external drive from one and put it somewhere else and it will all work just fine.
I implemented a re-startable tree scan that uses a couple of DateTime stamps to be able to determine which directories need to be scanned during the current pass and which ones have already been scanned. Any agent can be killed at any time and restarted and it will carry on walking the directory tree right where it left off. It will even continue correctly in the case where you move a volume from one PC to another.
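Here is a minimal sketch of how a restartable pass like this could work. It is written in JavaScript (the language MongoDB's Map-Reduce functions use) rather than the agent's C#, and the field names `lastScanned` and `passStarted` and both helper functions are assumptions for illustration, not the actual agent code. The idea: a directory is due for scanning when it has never been scanned or was last scanned before the current pass began, and it is only stamped once its scan has completed.

```javascript
// Hypothetical catalog records: one per directory, with the time it was last scanned.
// passStarted would be persisted with the catalog so a restarted agent can reuse it.

function needsScan(dir, passStarted) {
  // Due if never scanned, or last scanned before the current pass began.
  return dir.lastScanned === null || dir.lastScanned < passStarted;
}

function runPass(directories, passStarted, scanDirectory, now) {
  const scanned = [];
  for (const dir of directories) {
    if (!needsScan(dir, passStarted)) continue; // already done earlier in this pass
    scanDirectory(dir);       // the process may be killed in here
    dir.lastScanned = now();  // stamp only after the scan completed
    scanned.push(dir.path);
  }
  return scanned;
}

// Usage: simulate an agent that is killed part-way through a pass, then restarted.
const dirs = [
  { path: "C:\\Photos", lastScanned: 100 },
  { path: "C:\\Music", lastScanned: 100 },
  { path: "C:\\Docs", lastScanned: null },
];
const passStarted = 200;
let clock = 200;
const now = () => ++clock;

let budget = 1; // the first run "dies" while scanning its second directory
try {
  runPass(dirs, passStarted, () => {
    if (budget-- === 0) throw new Error("killed");
  }, now);
} catch (e) {
  // agent restarted below
}

// The restarted agent skips C:\Photos and picks up with the unfinished directories.
const resumed = runPass(dirs, passStarted, () => {}, now);
console.log(resumed);
```

Because the stamp is written after the scan, a directory that was interrupted mid-scan is simply scanned again on restart, which trades a little repeated work for never missing anything.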
Each agent uses the Task Parallel Library’s Parallel.ForEach to crawl each volume in parallel and to parse multiple files from each directory simultaneously.
By storing all of the file metadata in MongoDB it’s easy to use Map-Reduce to calculate some interesting statistics for the files on the network.
For example, to create a summary of file sizes I can use a Map function:
function Map() {
    // Only file documents with a non-zero size are counted
    if (this.Size && this._t == "FileInformation")
    {
        var size = this.Size;
        // Bucket names are upper bounds: "kb" means smaller than 1 KB, and so on
        if (size < 1024)
            emit("kb", {count:1, size:size});
        else if (size < 1024*1024)
            emit("mb", {count:1, size:size});
        else if (size < 1024*1024*1024)
            emit("gb", {count:1, size:size});
        else if (size < 1024*1024*1024*1024)
            emit("tb", {count:1, size:size});
        else
            emit("tb+", {count:1, size:size});
    }
}
and a reduce function:
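function Reduce(key, arr_values) {
    var count = 0;
    var size = 0;
    // Index loop rather than for-in, which also visits any extra enumerable properties
    for (var i = 0; i < arr_values.length; i++)
    {
        count = count + arr_values[i].count;
        size = size + arr_values[i].size;
    }
    // Same shape as the emitted values, so MongoDB can re-reduce partial results
    return {count:count, size:size};
}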
Map-Reduce operations like this take about 20 minutes to run (on the Atom server with just 4GB of RAM) whereas any query serviced by one of the indexes on the MongoDB collection is almost instantaneous.
I’ve been using the excellent MongoVue to run simple map-reduce scripts like this and to keep track of how quickly the database is growing.
Map-reduce can also be used to find duplicate files – by emitting the MD5 hash as the key and some information about the file as the value I can find every copy of every file across every computer on the home network.
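A sketch of what that duplicate finder could look like, in the same style as the Map and Reduce functions above. The field names `MD5` and `FullPath` and the `DirectoryInformation` type name are assumptions (only `Size` and `_t == "FileInformation"` appear in the real schema shown earlier), and `runMapReduce` is a tiny in-memory stand-in for MongoDB so the example runs under Node.

```javascript
// Map: key each file document by its MD5 hash (field names are assumed).
function mapByHash() {
  if (this._t == "FileInformation" && this.MD5) {
    emit(this.MD5, { count: 1, paths: [this.FullPath] });
  }
}

// Reduce: merge partial results; the output has the same shape as the emitted values.
function reduceByHash(key, values) {
  var result = { count: 0, paths: [] };
  for (var i = 0; i < values.length; i++) {
    result.count += values[i].count;
    result.paths = result.paths.concat(values[i].paths);
  }
  return result;
}

// In-memory stand-in for MongoDB's mapReduce (hypothetical helper, not a driver API).
function runMapReduce(docs, map, reduce) {
  var groups = {};
  globalThis.emit = function (key, value) {
    (groups[key] = groups[key] || []).push(value);
  };
  docs.forEach(function (doc) { map.call(doc); });
  delete globalThis.emit;
  return Object.keys(groups).map(function (key) {
    return { _id: key, value: reduce(key, groups[key]) };
  });
}

// Usage: any hash with a count above 1 has more than one copy on the network.
var files = [
  { _t: "FileInformation", MD5: "aaa", FullPath: "D:/photos/a.jpg" },
  { _t: "FileInformation", MD5: "bbb", FullPath: "D:/docs/b.txt" },
  { _t: "FileInformation", MD5: "aaa", FullPath: "E:/backup/a.jpg" },
  { _t: "DirectoryInformation", FullPath: "D:/photos" },
];
var duplicates = runMapReduce(files, mapByHash, reduceByHash)
  .filter(function (r) { return r.value.count > 1; });
console.log(JSON.stringify(duplicates, null, 2));
```

The same output could also be filtered the other way, for hashes with fewer than three copies, to list files that are under-replicated.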
Since I have the file name and metadata for every file on the home network I can also easily find any file using MongoDB’s regex matching feature against the path.
The Hard Parts
For starters you’ll need a library that can handle long file names. Then you’ll need to fix it to provide at least the functionality that FileInfo and DirectoryInfo give you in .NET.
Next you’ll need to learn about reparse-points and hard-links and you’ll need to skip over them because with them in place the file system is not a tree; it’s a cyclical graph in which a simple crawler will quickly get confused or stuck.
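To illustrate the "skip the links" rule, here is a small sketch using Node's `fs` module. It treats symbolic links as the stand-in for reparse points, which is an assumption: the real agent is C# and checks NTFS attributes directly. The function name `listFiles` is made up for this example.

```javascript
const fs = require("fs");
const path = require("path");

// Walk a directory tree without recursion, skipping symbolic links
// so that a link pointing back up the tree cannot make the walk loop.
function listFiles(root) {
  const files = [];
  const pending = [root];
  while (pending.length > 0) {
    const dir = pending.pop();
    for (const entry of fs.readdirSync(dir, { withFileTypes: true })) {
      const full = path.join(dir, entry.name);
      if (entry.isSymbolicLink()) continue; // do not follow links
      if (entry.isDirectory()) pending.push(full);
      else if (entry.isFile()) files.push(full);
    }
  }
  return files;
}

// Usage: node crawl.js <directory>
if (process.argv[2]) {
  console.log(listFiles(process.argv[2]));
}
```

Note that this only deals with links to directories and files. Hard-linked files look like ordinary files and would be listed once per name, which is one reason to record a file identifier as well as the path.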
You’ll also want to store the NTFS file Id and the unique Volume ID for every file so you can track it when the file is moved or the removable drive is connected to a different computer.
So how well does it work?
This all seems to work really well. Nearly every volume has now been cataloged. It’s located about 5M files occupying over 6TB of space. The worst case offender for the number of copies of the same file is 100+. I’ve used the find feature in MongoDB to find a file I was missing and I’m better able to plan how to arrange directories and file generations across the various hard drives I have.
What’s next
Well, of course this needs to be connected to the home automation system and my Natural Language engine so you can ask “send a copy of IMG_0228 from last week to X” or “where are all the spreadsheets I created last year?” That will be fairly easy.
After that I hope to incorporate backup features into the agents too so they can automatically keep the required number of copies of each file according to its importance. I’d also like to set up a rotating set of external drives that go in the fire safe when not connected and when they are connected they get updated with the latest copies of all the important files.
I’d also like to be able to get the agents to move whole groups of directories around between drives as juggling the directory layout each time a new hard drive is added to the system is always a time consuming process.
Comments or Questions?
Does everyone else have a hard time managing multiple computers, hard drives, directories and multiple copies of files? What tools do you use to do this? Is there anything commercially available that I could have used instead? Would a tool like this be useful to you? Should I publish the code somewhere? Comments and questions are always welcome here or on Twitter.
Stop writing rude software! Use LASTINPUTINFO instead.
Aug 19th
Can you imagine what life would be like if people behaved like software programs do?
You’d be working away on something when someone would interrupt, steal your attention, and demand a response. You’d be interrupted in the middle of sentences all the time and while you were dealing with one interruption someone else could come up and interrupt you again.
You wouldn’t put up with people like that so why do you put up with software that behaves that way?
Windows itself is one of the worst offenders. The dreaded dialog announcing that updates have been installed and that it wants to reboot right this instant has caused me significant inconvenience in the past: it steals focus, grabs the next Return keypress and assumes I really did want to reboot right now, right in the middle of a blog post!
There really is no excuse for writing rude software. Windows includes an API function called GetLastInputInfo, which fills in a LASTINPUTINFO structure with the time of the last keyboard or mouse input. With it you can tell whether the user is busy typing or moving the mouse and delay your annoying toast pop-up, or worse, that focus-stealing modal dialog, until you think the user is ready for it. The C# code below shows how to use this API call to get the number of seconds since the last user input. Simply delay your notification or dialog until an appropriate time has passed (e.g. 5 seconds) and only then interrupt the user.
Background processing
Similarly, if your background processing is hammering the disk drive you can make it more polite and throttle it back when the user is active on their computer. (You did, of course, do all that background processing on a lower priority thread, didn’t you?)
One other area you might want to consider is using BITS (Background Intelligent Transfer Service) to download files instead of hammering the user’s internet connection to fetch files in the background.
The Code
So here’s the code you should use from today to make your software polite:
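using System;
using System.Runtime.InteropServices;

public static class Input
{
    [DllImport("User32.dll")]
    private static extern bool GetLastInputInfo(ref LASTINPUTINFO plii);

    [StructLayout(LayoutKind.Sequential)]
    private struct LASTINPUTINFO
    {
        public uint cbSize;   // size of this structure in bytes
        public uint dwTime;   // tick count (ms) at the time of the last input
    }

    /// <summary>
    /// How many seconds since last user input
    /// </summary>
    public static double SecondsSinceLastInput()
    {
        LASTINPUTINFO lastInput = new LASTINPUTINFO();
        lastInput.cbSize = (uint)Marshal.SizeOf(lastInput);
        if (!GetLastInputInfo(ref lastInput))
            return 0; // call failed: assume the user is active
        // Unsigned subtraction stays correct when the tick count wraps around
        uint idle = unchecked((uint)Environment.TickCount - lastInput.dwTime);
        return idle / 1000.0;
    }
}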
C# Natural Language Engine connected to Microsoft Dynamics CRM 2011 Online
Jun 5th
In an earlier post I discussed some ideas around a Semantic CRM.
Recently I’ve been doing some clean up work on my C# Natural Language Engine and decided to do a quick test connecting it to a real CRM. As you may know from reading my blog, this natural language engine is already heavily used in my home automation system to control lights, sprinklers, HVAC, music and more and to query caller ID logs and other information.
I recently refactored it to use the Autofac dependency injection framework and in the process realized just how close my NLP engine is to ASP.NET MVC 3 in its basic structure and philosophy! To use it you create Controller classes and put action methods in them. Those controller classes use Autofac to get all of the dependencies they may need (services like an email service, a repository, a user service, an HTML email formatting service, …) and then the methods in them represent a specific sentence parse using the various token types that the NLP engine supports. Unlike ASP.NET MVC 3 there is no Route registration; the method itself represents the route (i.e. sentence structure) that is used to decide which method to call. Internally my NLP engine has its own code to match incoming words and phrases to tokens and then on to the action methods. In a sense the engine itself is one big dependency injection framework working against the action methods. I sometimes wish ASP.NET MVC 3 had the same route-registration-free approach to designing web applications (but also appreciate all the reasons why it doesn’t).
Another improvement I made recently to the NLP Engine was to develop a connector for the Twilio SMS service. This means that my home automation system can now accept SMS messages as well as all the other communication formats it supports: email, web chat, XMPP chat and direct URL commands. My Twilio connector to NLP supports message splitting and batching so it will buffer up outgoing messages to reach the limit of a single SMS and will send that. This lowers SMS charges and also allows responses that are longer than a single SMS message.
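The splitting and batching described above can be sketched roughly as follows. This is an illustration in JavaScript, not the connector's actual C# code: the 160-character limit assumes plain single-part SMS text, and the function names and the newline separator are invented for the example. It makes no Twilio calls; it only decides what text would go into each outgoing SMS.

```javascript
const SMS_LIMIT = 160; // assumed single-SMS length for plain text

// Split one long message into parts that each fit in a single SMS,
// breaking on a space where possible.
function splitMessage(text, limit = SMS_LIMIT) {
  const parts = [];
  let rest = text.trim();
  while (rest.length > limit) {
    let cut = rest.lastIndexOf(" ", limit); // last space at or before the limit
    if (cut <= 0) cut = limit;              // no space found: hard split
    parts.push(rest.slice(0, cut).trimEnd());
    rest = rest.slice(cut).trimStart();
  }
  if (rest !== "") parts.push(rest);
  return parts;
}

// Batch queued messages: keep appending to the current SMS until the next
// part would push it over the limit, then start a new one.
function batchMessages(messages, limit = SMS_LIMIT, separator = "\n") {
  const batches = [];
  let current = "";
  for (const message of messages) {
    for (const part of splitMessage(message, limit)) {
      const candidate = current === "" ? part : current + separator + part;
      if (candidate.length <= limit) {
        current = candidate;
      } else {
        batches.push(current);
        current = part;
      }
    }
  }
  if (current !== "") batches.push(current);
  return batches;
}

// Usage with a tiny limit so the behavior is easy to see.
const outgoing = batchMessages(
  ["Lights on", "Door shut", "Sprinklers will start at six"],
  20
);
console.log(outgoing);
```

With the 20-character limit the two short replies share one SMS and the long one is split in two, so three messages go out instead of four.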
Using this new, improved version of my Natural Language Engine I decided to try connecting it to a CRM. I chose Microsoft Dynamics CRM 2011 and elected to use the strongly-typed, early-bound objects that you can generate for any instance of the CRM service. I added some simple sentences in an NLPRules project that allow you to tell it who you met, and to input some of their details. Unlike a traditional forms-based approach the user can decide what information to enter and what order to enter it in. The Natural Language Engine supports the concept of a conversation and can remember what you were discussing, allowing a much more natural style of conversation than some simple rule-based engines and even allowing it to ask questions and get answers from the user.
Here’s a screenshot showing a sample conversation using Google Talk (XMPP/Jabber) and the resulting CRM record in Microsoft CRM 2011 Online. You could have the same conversation over SMS or email. Click to enlarge.
Based on my limited testing this looks like another promising area where a truly fluent, conversational-style natural language engine could play a significant role. Note how it understands email addresses, phone numbers and such like and in code these all become strongly typed objects. Where it really excels is in temporal expressions where it can understand things like “who called on a Saturday in May last year?” and can construct an efficient SQL query from that.
Class-free persistence and multiple inheritance in C# with MongoDB
May 4th
Much as I appreciate Object Relational Mappers and the C# type system there’s a lot of work to do if you just want to create and persist a few objects. MongoDB alleviates a lot of that work with its BSON serialization code that converts almost any object into BSON (binary JSON) and provides easy round tripping with JSON.
But there’s no getting around the limitations of C# when it comes to multiple inheritance. You can use interfaces to get most of the benefits of multiple inheritance but implementing a tangled set of classes with multiple interfaces on them can lead to a lot of duplicate code.
What if there was a way to do multiple inheritance without ever having to write a class? What if we could simply declare a few interfaces and then ask for an object that implements all of them and a way to persist it to disk and get it back? What if we could later take one of those objects and add another interface to it? “Crazy talk” I hear you say!
Well, maybe not so crazy … take a look at the open source project impromptu-interface and you’ll see some of what you’ll need to make this reality. It can take a .NET dynamic object and turn it into an object that implements a specific interface.
Combine that with a simple MongoDB document store and some cunning logic to link the two together and voila, we have persistent objects that can implement any interface dynamically and there are absolutely no classes in sight anywhere!
Let’s take a look at it in use and then I’ll explain how it works. First, let’s define a few interfaces:
public interface ILegs
{
int Legs { get; set; }
}
public interface IMammal
{
double BodyTemperatureCelcius { get; set; }
}
// Interfaces can use multiple inheritance:
public interface IHuman: IMammal, ILegs
{
string Name { get; set; }
}
// We can have interfaces that apply to specific instances of a class: not all humans are carnivores
public interface ICarnivore
{
string Prey { get; set; }
}
Now let’s take a look at some code to create a few of these new dynamic documents and treat them as implementors of those interfaces. First we need a MongoDB connection:
MongoServer mongoServer = MongoServer.Create(ConnectionString);
MongoDatabase mongoDatabase = mongoServer.GetDatabase("Remember", credentials);
Next we grab a collection where we will persist our objects.
var sampleCollection = mongoDatabase.GetCollection<SimpleDocument>("Sample");
Now we can create some objects adding interfaces to them dynamically and we get to use those strongly typed interfaces to set properties on them.
var person1 = new SimpleDocument();
person1.AddLike<IHuman>().Name = "John";
person1.AddLike<ILegs>().Legs = 2;
person1.AddLike<ICarnivore>().Prey = "Cattle";
sampleCollection.Save(person1);
var monkey1 = new SimpleDocument();
monkey1.AddLike<IMammal>(); // mark as a mammal
monkey1.AddLike<ILegs>().Legs = 2;
monkey1.AddLike<ICarnivore>().Prey = "Bugs";
sampleCollection.Save(monkey1);
Yes, that’s it! That’s all we needed to do to create persisted objects that implement any collection of interfaces. Note how the IHuman is also an IMammal because our code will also support inheritance amongst interfaces. We can load them back in from MongoDB and get the strongly typed versions of them by using .AsLike<T>().
So next, let’s take a look at how we can query for objects that support a given interface and how we can get strongly typed objects back from MongoDB:
var query = Query.EQ("int", typeof(IHuman).Name);
var humans = sampleCollection.Find(query);
Console.WriteLine("Examine the raw documents");
foreach (var doc in humans)
{
Console.WriteLine(doc.ToJson());
}
Console.WriteLine("Use query results strongly typed");
foreach (IHuman human in humans.Select(m => m.AsLike<IHuman>()))
{
Console.WriteLine(human.Name);
}
Console.ReadKey();
So how does this ‘magic’ work? First we need a simple Document class. It can be any old object class, no special requirements. At the moment it does wrap these interface properties up in a document inside it called ‘prop’ making it just a little bit harder to query and index but still fairly easy.
/// <summary>
/// A very simple document object
/// </summary>
public class SimpleDocument : DynamicObject
{
public ObjectId Id { get; set; }
// All other properties are added dynamically and stored wrapped in another Document
[BsonElement("prop")]
protected BsonDocument properties = new BsonDocument();
/// <summary>
/// Interfaces that have been added to this object
/// </summary>
[BsonElement("int")]
protected HashSet<string> interfaces = new HashSet<string>();
/// <summary>
/// Add support for an interface to this document if it doesn't already have it
/// </summary>
public T AddLike<T>()
where T:class
{
interfaces.Add(typeof(T).Name);
foreach (var @interface in typeof(T).GetInterfaces())
interfaces.Add(@interface.Name);
return Impromptu.ActLike<T>(new Proxy(this.properties));
}
/// <summary>
/// Cast this object to an interface only if it has previously been created as one of that kind
/// </summary>
public T AsLike<T>()
where T : class
{
if (!this.interfaces.Contains(typeof(T).Name)) return null;
else return Impromptu.ActLike<T>(new Proxy(this.properties));
}
}
Then we need a simple proxy object to wrap up the properties as a dynamic object that we can feed to Impromptu:
public class Proxy : DynamicObject
{
public BsonDocument document { get; set; }
public Proxy(BsonDocument document)
{
this.document = document;
}
public override bool TryGetMember(GetMemberBinder binder, out object result)
{
BsonValue res;
// A property that has not been set reads as null instead of throwing
result = this.document.TryGetValue(binder.Name, out res) ? res.RawValue : null;
return true; // We always support a member even if we don't have it in the dictionary
}
/// <summary>
/// Set a property (e.g. person1.Name = "Smith")
/// </summary>
public override bool TrySetMember(SetMemberBinder binder, object value)
{
// The indexer overwrites an existing value; Add would fail on a duplicate name
this.document[binder.Name] = BsonValue.Create(value);
return true;
}
}
And that’s it! There is no other code required. Multiple-inheritance and code-free persistent objects are now a reality! All you need to do is design some interfaces and objects spring magically to life and get persisted easily.
[NOTE: This is experimental code: it's a prototype of an idea that's been bugging me for some time as I look at how to meld Semantic Web classes which have multiple inheritance relationships with C# classes (that don't) and with MongoDB's document-centric storage format. Does everything really have to be stored in a triple-store or is there some hybrid where objects can be stored with their properties and triple-store statements can be reserved for more complex relationships? Can we get semantic web objects back as meaningful C# objects with strongly typed properties on them? It's an interesting challenge and this approach appears to have some merit as a way to solve it.]
Random names in C# using LINQ and a touch of functional programming
Mar 21st
Today I needed some random names for testing. I wanted them to look like real names but to also look obviously made up so that nobody would ever treat them as a real record in the database. Below is a quick C# snippet to generate random names. It uses LINQ and a couple of anonymous functions to accomplish a task that would have taken many more lines in a traditional procedural style. Enumerable.Range() in particular is a really handy method for generating stuff.
Random r = new Random();
string alphabet = "abcdefghijklmnopqrstuvwyxzeeeiouea";
Func<char> randomLetter = () => alphabet[r.Next(alphabet.Length)];
Func<int, string> makeName =
(length) => new string(Enumerable.Range(0, length)
.Select(x => x==0 ? char.ToUpper(randomLetter()) : randomLetter())
.ToArray());
string first = makeName(r.Next(5) + 5);
string last = makeName(r.Next(7) + 7);
string company = makeName(r.Next(7) + 7) + " Inc.";
How to get CrmSvcUtil.exe to work with Microsoft Dynamics CRM 2011 online
Mar 10th
You’d think this would be easy – just download the SDK and run the utility, right? Sadly that’s not the case and the information to make it work is scattered around the web.
Here are the steps I’ve pieced together from various forum answers and some trial and error.
1. Install Microsoft WIF SDK
You can get it here: http://www.microsoft.com/downloads/en/details.aspx?FamilyID=eb9c345f-e830-40b8-a5fe-ae7a864c4d76&displaylang=en
Hint: For Windows 7 and Windows Server 2008 R2, select the msu file with name starting Windows6.1. [Naming it Windows7and2008 would have been too easy I guess.]
2. Install Microsoft Dynamics CRM 2011 SDK
You can get it here: http://www.microsoft.com/downloads/en/confirmation.aspx?FamilyID=420f0f05-c226-4194-b7e1-f23ceaa83b69
3. Run the DeviceRegistration.exe utility to generate a device id and password
You can find it in the SDK Tools / DeviceRegistration directory. Run it with command line /Operation:Register
Copy the values for device ID and device password; you’ll need them later.
4. Now run CRMSVCUTIL from the MSCRM SDK under the bin directory (not the tools directory)
If you want to copy it to, say, your Utilities directory you’ll need to take all the DLLs with it. [Someone was apparently too lazy to run ILMerge on it.]
The parameters you’ll need are:
crmsvcutil /url:https://<<<Organization>>>.crm4.dynamics.com/XRMServices/2011/Organization.svc /o:crm.cs /n:<<<Desired namespace name>>> /u:<<< your windows live id >>> /p:<<< your windows live password >>> /serviceContextName:<<<Desired service context name>>> /di:<<< Device ID >>> /dp:<<< Device Password >>>
5. That’s it! You should have a file crm.cs that you can use in your Visual Studio Project to interface to MS-CRM.
I just wish it was one step!
6. To add this to your Visual Studio C# Project
Copy the crm.cs file to your solution, add a reference to System.Runtime.Serialization.
Add a /lib folder to your solution and copy the SDK /bin directory into it
Add a reference to the DLLs in there as necessary: Microsoft.Xrm.Sdk in particular.
Add a reference to System.ServiceModel.
Extending C# to understand the language of the semantic web
Feb 5th
I was inspired by a question on semanticoverflow.com which asked if there was a language in which the concepts of the Semantic Web could be expressed directly, i.e. you could write statements and perform reasoning directly in the code without lots of parentheses, strings and function calls.
Of course the big issue with putting the semantic web into .NET is the lack of multiple inheritance. In the semantic web the class ‘lion’ can inherit from the ‘big cat’ class and also from the ‘carnivorous animals’ class and also from the ‘furry creatures’ class etc. In C# you have to pick one and implement the rest as interfaces. But, since C# 4.0 we have the dynamic type. Could that be used to simulate multiple inheritance and to build objects that behave like their semantic web counterparts?
The DynamicObject in C# allows us to perform late binding and essentially to add methods and properties at runtime. Could I use that so you can write a statement like “canine.subClassOf.mammal();” which would be a complete Semantic Web statement like you might find in a normal triple store but written in C# without any ‘mess’ around it? Could I use that same syntax to query the triple store to ask questions like “if (lion.subClassOf.animal) …” where a statement without a method invocation would be a query against the triple store using a reasoner capable of at least simple transitive closure? Could I also create a syntax for properties so you could say “lion.Color(“yellow”)” to set a property called Color on a lion?
Well, after one evening of experimenting I have found a way to do just that. Without any other declarations you can write code like this:
dynamic g = new Graph("graph");
// this line declares both a mammal and an animal
g.mammal.subClassOf.animal();
// we can add properties to a class
g.mammal.Label("Mammal");
// add a subclass below that
g.carnivore.subClassOf.mammal();
// create the cat family
g.felidae.subClassOf.carnivore();
// define what the wild things are - a separate hierarchy of things
g.wild.subClassOf.domesticity();
// back to the cat family tree
g.pantherinae.subClassOf.felidae();
// these ones are all wild (multiple inheritance at work!)
g.pantherinae.subClassOf.wild();
g.lion.subClassOf.pantherinae();
// experiment with properties
// these are stored directly on the object not in the triple store
g.lion.Color("Yellow");
// complete the family tree for this branch of the cat family
g.tiger.subClassOf.pantherinae();
g.jaguar.subClassOf.pantherinae();
g.leopard.subClassOf.pantherinae();
g.snowLeopard.subClassOf.leopard();
Behind the scenes dynamic objects are used to construct partial statements and then full statements and those full statements are added to the graph. Note that I’m not using full Uri’s here because they wouldn’t work syntactically, but there’s no reason each entity couldn’t be given a Uri property behind the scenes that is local to the graph that’s being used to contain it.
Querying works as expected: just write the semantic statement you want to test. One slight catch is that I’ve made the query return an enumeration of the proof steps used to prove it rather than just a simple bool value. So use `.Any()` on it to see if there is any proof.
// Note that we never said that cheeta is a mammal directly.
// We need to use inference to get the answer.
// The result is an enumeration of all the ways to prove that
// a cheeta is a mammal
var isCheetaAMammal = g.cheeta.subClassOf.mammal;
// we use .Any() just to see if there's a way to prove it
Console.WriteLine("Cheeta is a mammal : " + isCheetaAMammal.Any());
Behind the scenes the simple statement “g.cheeta.subClassOf.mammal” will take each statement made and expand the subject and object using a logical argument process known as simple entailment. The explanation it might give for this query might be:
because [cheeta.subClassOf.felinae], [felinae.subClassOf.felidae], [felidae.subClassOf.mammal]
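To make the inference step concrete, here is a small sketch of a transitive subClassOf search that returns its proof steps. It is written in JavaScript for brevity and is not the code from this experiment: the function name and the array-of-pairs representation are assumptions, and unlike the real query it returns only the first proof it finds rather than every one.

```javascript
// Statements are [subject, object] pairs meaning "subject subClassOf object".
// Returns the list of statements used to prove the claim, or null if none exists.
function proveSubClassOf(statements, subject, target, seen = new Set()) {
  if (seen.has(subject)) return null; // guard against cycles in the graph
  seen.add(subject);
  for (const [s, o] of statements) {
    if (s !== subject) continue;
    const step = `${s}.subClassOf.${o}`;
    if (o === target) return [step];           // direct statement
    const rest = proveSubClassOf(statements, o, target, seen);
    if (rest) return [step, ...rest];          // chain through a parent class
  }
  return null;
}

// Usage: the three statements from the explanation above.
const statements = [
  ["cheeta", "felinae"],
  ["felinae", "felidae"],
  ["felidae", "mammal"],
];
const proof = proveSubClassOf(statements, "cheeta", "mammal");
console.log(proof === null ? "no proof" : "because " + proof.join(", "));
```

Multiple inheritance needs nothing extra here: a class with several parents simply has several statements with the same subject, and the loop tries each in turn.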
As you can see, integrating Semantic Web concepts [almost] directly into the programming language is a pretty powerful idea. We are still nowhere close to the syntactic power of Prolog or F# but I was surprised how far vanilla C# could get with dynamic types and a fluent builder. I hope to explore this further and to publish the code sometime. It may well be “the world’s smallest triple store and reasoner”!
This code will hopefully also allow folks wanting to experiment with core semantic web concepts to do so without the ‘overhead’ of a full-blown triple store, reasoner and lots of RDF and angle brackets! When I first came to the Semantic Web I was amazed how much emphasis there was on serialization formats (which are boring to most software folks) and how little there was on language features and algorithms for manipulating graphs (the interesting stuff). With this experiment I hope to create code that focuses on the interesting bits.
The same concept could be applied to other in-memory graphs allowing a fluent, dynamic way to represent graph structures in code. There’s also no reason it has to be limited to in-memory graphs, the code could equally well store all statements in some external triple store.
The code for this experiment is available on bitbucket: https://bitbucket.org/ianmercer/semantic-fluent-dynamic-csharp
MongoDB – Map-Reduce coming from C#
Jan 20th
People coming from traditional relational database thinking and LINQ sometimes struggle to understand map-reduce. One way to understand it is to realize that it’s actually the simple composition of some LINQ operators with which you may already be familiar.
Map reduce is in effect a SelectMany() followed by a GroupBy() followed by an Aggregate() operation.
In a SelectMany() you are projecting a sequence but each element can become multiple elements. This is equivalent to using multiple emit statements in your map operation. The map operation can also choose not to call emit which is like having a Where() clause inside your SelectMany() operation.
In a GroupBy() you are collecting elements with the same key which is what Map-Reduce does with the key value that you emit from the map operation.
In the Aggregate() or reduce step you are taking the collections associated with each group key and combining them in some way to produce one result for each key. Often this combination is simply adding up a single '1' value output with each key from the map step but sometimes it’s more complicated.
One thing you should be aware of with map-reduce in MongoDB is that the reduce operation must accept and output the same data type because it may be applied repeatedly to partial sets of the grouped data. The C# equivalent would be an Aggregate() operation that is applied repeatedly to partial sequences, with each partial result fed back in as input, until a single final value remains for each key.
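The composition can be sketched in plain JavaScript, where `flatMap` plays the role of SelectMany(), a `Map` of arrays plays GroupBy() and a reduce function plays Aggregate(). The word-count data and all function names are made up for illustration; nothing here calls MongoDB. The last few lines show why the reduce output must have the same shape as its input: reducing a partial batch and then re-reducing gives the same answer as reducing everything at once.

```javascript
// SelectMany: each document yields zero or more [key, value] pairs.
// An empty document yields nothing, which is the built-in Where().
function mapWords(doc) {
  return doc.text
    .split(/\s+/)
    .filter(Boolean)
    .map((word) => [word.toLowerCase(), { count: 1 }]);
}

// GroupBy: collect the emitted values under their key.
function groupByKey(pairs) {
  const groups = new Map();
  for (const [key, value] of pairs) {
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(value);
  }
  return groups;
}

// Aggregate: input values and output share one shape, so it can be re-applied.
function reduceCounts(key, values) {
  return { count: values.reduce((sum, v) => sum + v.count, 0) };
}

const docs = [{ text: "red green red" }, { text: "green red" }, { text: "" }];
const groups = groupByKey(docs.flatMap(mapWords));

// Reduce all values for one key at once...
const allAtOnce = reduceCounts("red", groups.get("red"));

// ...or reduce a partial batch first and feed its output back in with the rest.
const [first, ...rest] = groups.get("red");
const partial = reduceCounts("red", rest);
const reReduced = reduceCounts("red", [first, partial]);

console.log(allAtOnce, reReduced);
```

A reduce that returned, say, a bare number while map emitted `{ count: 1 }` objects would work in the first case and silently break in the second.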
Web site crawler and link checker (free)
Jan 13th
In a previous post I provided a utility called LinkChecker that is a web site crawler and link checker. The idea behind LinkChecker is that you can include it in your continuous integration scripts and thus check your web site either regularly or after every deployment and unlike a simple ping check this one will fail if you’ve broken any links within your site or have SEO issues. It will also break just once for every site change and then be fixed the next time you run it. This feature means that in a continuous integration system like TeamCity you can get an email or other alert each time your site (or perhaps your competitor’s site) changes.
As promised in that post, a new version is now available. There are many improvements under the covers but one obvious new feature is the ability to dump all the text content of a site into a text file. Simply append -dump filename.txt to the command line and you’ll get a complete text dump of any site. The dump includes page titles and all visible text on the page (it excludes embedded script and css automatically). It also excludes any element with an ID or CLASS that includes one of the words “footer”, “header”, “sidebar”, “feedback” so you don’t get lots of duplicate header and footer information in the dump. I plan to make this more extensible in future to allow other words to be added to the ignore list.
One technique you can use with this new ‘dump’ option is to dump a copy of your site after each deployment and then check it into source control. Now if there’s ever any need to go back to see when a particular word or paragraph was changed on your site you have a complete record. You could for example use this to maintain a text copy of your WordPress blog, or perhaps to keep an eye on someone else’s blog or Facebook page to see when they added or removed a particular story.
Download the new version here: LinkCheck (requires Windows XP or later with .NET 4 installed; unzip and run).
Please consult the original article for more information.
LinkCheck is free, it doesn’t make any call backs, doesn’t use any personal data, use at your own risk. If you like it please make a link to this blog from your own blog or post a link to Twitter, thanks!

