MongoDB Map-Reduce - Hints and Tips
For anyone getting started with Map-Reduce on MongoDB, here are a few pointers.
1. Guids are not a good choice for MongoDB identifiers: use the provided ObjectId instead.
Guids in JavaScript compare as binary objects and thus don't work well as keys for Map-Reduce operations.
2. You can't use an Array as the return type for the reduce operation.
This is clearly documented on the site, but if you tend to try things before reading the documentation it's going to cause some frustration.
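One way around the restriction, sketched below, is to wrap the array in an object. The function name reduceItems, the field name items and the sample data are all made up for illustration, and the calls at the bottom only simulate what the server would do.

```javascript
// Hypothetical reduce function that collects several items per key.
// Returning a bare array is not allowed, so the array lives inside an object.
// The map function is assumed to emit values of the same shape,
// e.g. emit(this.author, { items: [this.title] }).
function reduceItems(key, values) {
  var result = { items: [] };
  values.forEach(function (value) {
    // Each value is either a mapped value or an earlier reduce result;
    // both carry an `items` array, so they merge the same way.
    result.items = result.items.concat(value.items);
  });
  return result;
}

// Simulated calls, runnable outside MongoDB:
var partial = reduceItems("alice", [{ items: ["A"] }, { items: ["B"] }]);
var merged = reduceItems("alice", [partial, { items: ["C"] }]);
console.log(JSON.stringify(merged)); // {"items":["A","B","C"]}
```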
3. The output value emitted by the map function MUST be the same format as the value returned by reduce.
The documentation on this one says it 'should' be the same, but in practice anything but the same format is bound to cause problems. What you need to understand is that 'map-reduce' is somewhat of a misnomer: the reduce function may be called iteratively, and that only works if the result of each reduce operation can be fed back into another reduce operation.
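As a rough illustration, here is a reduce function whose return value has exactly the shape that the map function is assumed to emit. The names (reduceTotals, count, sum, customerId, total) and the sample values are hypothetical. The two calls at the end imitate a single reduce pass and an iterative one, and both should give the same answer.

```javascript
// The map function is assumed to emit one value per document, shaped like:
//   emit(this.customerId, { count: 1, sum: this.total });

// The reduce function returns that same shape, so its output can safely be
// passed back in as one of the values of a later reduce call.
function reduceTotals(key, values) {
  var result = { count: 0, sum: 0 };
  values.forEach(function (v) {
    result.count += v.count;
    result.sum += v.sum;
  });
  return result;
}

// Simulated input: three emitted values for the same key.
var mapped = [
  { count: 1, sum: 10 },
  { count: 1, sum: 20 },
  { count: 1, sum: 30 }
];

// One pass over everything...
var onePass = reduceTotals("c1", mapped);
// ...versus reducing the first two, then feeding that result back in.
var firstTwo = reduceTotals("c1", mapped.slice(0, 2));
var twoPass = reduceTotals("c1", [firstTwo, mapped[2]]);

console.log(JSON.stringify(onePass)); // {"count":3,"sum":60}
console.log(JSON.stringify(twoPass)); // {"count":3,"sum":60}
```

Carrying count and sum separately, rather than a running average, is what keeps the value re-reducible; an average could be derived from them afterwards.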
4. Using .length on the values array passed into the reduce function is never the right thing to do.
In your map operation you often output a value of '1' for each key. In the reduce operation you want to add up these '1's. It looks like you could use values.length to get the result. But here too the iterative nature of the reduce operation means that some entries in the array may be partial counts from an earlier reduce call rather than single '1's, so you need to examine the values in the array and accumulate them.
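The difference shows up as soon as a reduce result is fed back in. The sketch below compares the two approaches. The function names are made up, and the split into a batch of four followed by a fifth value is just an assumed example of how the server might group the calls.

```javascript
// Wrong: assumes every entry in `values` is a single emitted 1.
function reduceByLength(key, values) {
  return values.length;
}

// Right: adds up the entries, whether they are 1s or partial counts.
function reduceBySum(key, values) {
  var total = 0;
  values.forEach(function (v) {
    total += v;
  });
  return total;
}

// Suppose five documents emitted 1 for the same key, and the first four
// are reduced before the result is combined with the fifth.
var firstBatch = [1, 1, 1, 1];
var byLength = reduceByLength("k", [reduceByLength("k", firstBatch), 1]);
var bySum = reduceBySum("k", [reduceBySum("k", firstBatch), 1]);

console.log(byLength); // 2 (wrong: the partial count 4 was treated as one item)
console.log(bySum);    // 5 (right)
```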
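5. The print() function provides some limited debugging assistance.
When you need to see some intermediate results in your map or reduce functions, the print() function can help. The functions run on the server, so look for the output in the server log rather than in your client.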
6. The Map-Reduce output automatically includes both the key and the reduced value.
Each result document has an '_id' field holding the key and a 'value' field holding whatever reduce returned. When deserializing the results in C#, you'll want to deserialize them into a type that has an '_id' property and a 'value' property. The following generic type can help:
/// <summary>
/// This is a useful type for dealing with MapReduce results.
/// It takes two type parameters, one for the key and one for the value.
/// The simplest possible result would use type parameters ObjectId and int
/// </summary>
public class MapReduceResult<Tid, Tvalue>
{
    public Tid _id { get; set; }
    public Tvalue value { get; set; }
    public override string ToString() { return $"{_id} {value}"; }
}
7. Chaining Map-Reduce operations to get the result you need is quite normal.
If you can't see how to get to what you want in a single Map-Reduce cycle, don't worry: it's quite easy and normal to pass the results of one Map-Reduce operation to the next.
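To give a feel for chaining, here is a sketch that runs two stages in memory so it can be tried without a server. The helper runMapReduce and the emit stub are hypothetical stand-ins written for this example, not MongoDB APIs, and the sample data is invented. On a real server the first stage would typically write to an output collection and the second stage would run against that collection. The point to notice is that the second map function reads this.value, because the first stage's output documents have the '_id' and 'value' shape.

```javascript
// Hypothetical in-memory stand-in for a Map-Reduce run. It mimics two things
// MongoDB does: map is called with `this` set to each document, and the
// output is a list of { _id, value } documents.
var emitted; // Map from key to the array of values emitted for it

function emit(key, value) { // stub; on the server MongoDB supplies emit()
  if (!emitted.has(key)) {
    emitted.set(key, []);
  }
  emitted.get(key).push(value);
}

function runMapReduce(docs, map, reduce) {
  emitted = new Map();
  docs.forEach(function (doc) {
    map.call(doc);
  });
  var results = [];
  emitted.forEach(function (values, key) {
    // Like MongoDB, skip reduce for a key that has only a single value.
    var value = values.length === 1 ? values[0] : reduce(key, values);
    results.push({ _id: key, value: value });
  });
  return results;
}

function sumValues(key, values) {
  var total = 0;
  values.forEach(function (v) {
    total += v;
  });
  return total;
}

// Invented sample data: one document per visit.
var visits = [
  { user: "ann" }, { user: "bob" }, { user: "ann" },
  { user: "cy" }, { user: "bob" }
];

// Stage 1: number of visits per user.
var perUser = runMapReduce(visits, function () {
  emit(this.user, 1);
}, sumValues);

// Stage 2: number of users per visit count. The input documents are the
// { _id, value } results of stage 1, so the key comes from this.value.
var histogram = runMapReduce(perUser, function () {
  emit(this.value, 1);
}, sumValues);

console.log(JSON.stringify(perUser));
// [{"_id":"ann","value":2},{"_id":"bob","value":2},{"_id":"cy","value":1}]
console.log(JSON.stringify(histogram));
// [{"_id":2,"value":2},{"_id":1,"value":1}]
```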