The Blog of Ian Mercer.

Neo4j Meetup in Seattle - some observations

I attended the Neo4j Meetup in Seattle this evening. It was an interesting tour around the internals of Neo4j and some of the design decisions behind how they store graphs in a database.

The most interesting thing about Neo4j is Cypher, the query language used to construct graph queries that follow relationships and evaluate conditions on the properties of relationships and nodes. Neo4j shows much promise: it can represent data in a very natural way and query it using Cypher in ways that would bring SQL to its knees with join-upon-join-upon-join.
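To see why multi-hop queries favour a graph store, here is a minimal Python sketch of what a variable-length Cypher pattern such as `(a)-[:KNOWS*1..3]->(b)` has to compute. The graph, the node names and the `within_hops` helper are hypothetical, and this is not how Neo4j actually executes Cypher. It simply shows that each extra hop is one more adjacency lookup, whereas in SQL each extra hop typically means one more self-join on a relationships table.

```python
from collections import deque


def within_hops(adjacency, start, max_hops):
    """Breadth-first traversal: every node reachable from start in 1..max_hops hops,
    mapped to the number of hops needed to reach it."""
    depth = {start: 0}
    frontier = deque([start])
    while frontier:
        node = frontier.popleft()
        if depth[node] == max_hops:
            continue  # don't expand beyond the hop limit
        for neighbour in adjacency.get(node, ()):
            if neighbour not in depth:
                depth[neighbour] = depth[node] + 1
                frontier.append(neighbour)
    del depth[start]
    return depth


# Hypothetical "knows" graph stored as adjacency lists
knows = {
    "alice": ["bob"],
    "bob": ["carol"],
    "carol": ["dave"],
    "dave": [],
}

print(within_hops(knows, "alice", 3))  # {'bob': 1, 'carol': 2, 'dave': 3}
```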

In an earlier blog post I lamented the lack of a single database solution that was the best of all worlds: relational + document + graph + semantic web. Tonight that feeling was compounded: Neo4j is a graph database but it's missing several key features that could make it much more.

We were privileged to get a first-hand explanation of how Neo4j works internally, but what we saw looked like a work in progress: an unfinished implementation of something that could be so much better. Here are some of the things Neo4j needs to fix before I'll give it a go:

1) Stealing bits from one value to give to another to create odd word lengths like 23 bits is so 1980s. I cannot believe this is a worthwhile optimization to make in 2012. Neo should bite the bullet, upgrade their few existing customers and move to a more modern, byte-aligned, 64-bit address space. I was equally amazed to see compression schemes implemented for text on disk while other obvious space-saving opportunities were omitted, such as declaring some relationships to be one-way only (no reverse queries, thus no need to store the back link). It's 2012: disk space is essentially limitless, and I should never hit a file-size limit because someone decided to use 23, 28 or some other arbitrary number of bits instead of 64.
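The arithmetic behind the file-size complaint is easy to check. The Python sketch below computes how many ids an unsigned field of a given width can address, and shows the kind of shift-and-mask packing that squeezes two values into one word. The 23- and 28-bit widths come from the post; the `pack`/`unpack` layout is a hypothetical illustration, not Neo4j's actual record format.

```python
def max_id(bits):
    """Largest value an unsigned field of this width can hold."""
    return (1 << bits) - 1


def pack(high, low, low_bits):
    """Pack two unsigned values into one integer, reserving low_bits for `low`."""
    if low > max_id(low_bits):
        raise OverflowError(f"{low} does not fit in {low_bits} bits")
    return (high << low_bits) | low


def unpack(word, low_bits):
    """Reverse pack(): split the word back into (high, low)."""
    return word >> low_bits, word & max_id(low_bits)


for bits in (23, 28, 32, 64):
    print(f"{bits:2d} bits -> max id {max_id(bits):,}")
```

With 23 bits the ceiling is 8,388,607 ids and with 28 bits it is 268,435,455, while a 64-bit field allows roughly 1.8 × 10^19, which is why byte-aligned 64-bit ids make such limits effectively disappear.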

2) The extremely limited set of data types. If you want to store JSON you'd better support at least all the common JavaScript types, including Dates. Frankly, I don't care if your database is written in Java; it exposes a web API using JSON, so that's what it should support. Also odd was the choice of a linked list, meandering its way through the file, as the way to store a node's properties. IMHO Neo4j should just switch to BSON and put a document size limit on nodes, as MongoDB does, instead of carrying on down this bit-packing, linked-list approach to properties with a partial implementation of types.
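The cost of a property chain versus a single document can be sketched in a few lines of Python. Here each hypothetical `PropertyRecord` holds one key/value pair and the id of the next record, so reading all of a node's properties means following pointers one record at a time, whereas the equivalent BSON-style document is a single lookup. The record layout, ids and values are illustrative assumptions, not Neo4j's on-disk format.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PropertyRecord:
    """One property in a chain: a key/value pair plus the id of the next record."""
    key: str
    value: object
    next_id: Optional[int]


def read_property_chain(store, first_id):
    """Follow next_id pointers; each hop may be another random read from disk."""
    properties, reads = {}, 0
    current = first_id
    while current is not None:
        record = store[current]
        reads += 1
        properties[record.key] = record.value
        current = record.next_id
    return properties, reads


# Hypothetical property records scattered through a store, keyed by record id
store = {
    7: PropertyRecord("title", "Neo4j Meetup", 42),
    42: PropertyRecord("city", "Seattle", 3),
    3: PropertyRecord("year", 2012, None),
}

# The same properties held as one document: fetched in a single read
document = {"title": "Neo4j Meetup", "city": "Seattle", "year": 2012}

properties, reads = read_property_chain(store, 7)
print(properties == document, reads)  # True 3
```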

3) The lack of file splitting at 2GB/4GB boundaries.

4) Putting nodes and relationships into separate files. Sure, this simplifies the access pattern, but it won't give good locality of data on disk. An alignment based on disk block sizes, with nodes and relationships packed together into blocks, seems likely to be a much better way to minimize disk seeks and reads.
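A rough way to see the locality argument is to count how many distinct disk blocks must be read to load one node plus its relationships. The Python sketch below compares a node file plus a separate relationship file (with the relationships scattered through it) against a layout that packs the node and its relationships into the same block. The 4 KB block size, the 16- and 32-byte record sizes and the relationship ids are all assumptions chosen for illustration.

```python
BLOCK_SIZE = 4096  # assumed disk block size in bytes
NODE_SIZE = 16     # hypothetical node record size
REL_SIZE = 32      # hypothetical relationship record size


def blocks_touched(extents):
    """Count distinct blocks covered by (offset, size) extents within one file."""
    blocks = set()
    for offset, size in extents:
        first = offset // BLOCK_SIZE
        last = (offset + size - 1) // BLOCK_SIZE
        blocks.update(range(first, last + 1))
    return len(blocks)


def separate_files(node_id, rel_ids):
    """Node in a node file, relationships in their own file, located by record id."""
    node_blocks = blocks_touched([(node_id * NODE_SIZE, NODE_SIZE)])
    rel_blocks = blocks_touched([(r * REL_SIZE, REL_SIZE) for r in rel_ids])
    return node_blocks + rel_blocks


def packed_together(rel_count):
    """Node record followed immediately by its relationship records in one file."""
    extents = [(0, NODE_SIZE)]
    extents += [(NODE_SIZE + i * REL_SIZE, REL_SIZE) for i in range(rel_count)]
    return blocks_touched(extents)


print(separate_files(10, [5, 900, 40000, 123456]))  # 5
print(packed_together(4))                            # 1
```

Real stores cache blocks and may allocate a node's relationships close together, so in practice the gap is often smaller than this scattered case.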

5) Reliance on Lucene to provide indexing. Much as I appreciate Lucene, Neo4j needs built-in indexes; without them it's impossible to optimize query plans across both the graph and the indexes. MongoDB has a good selection of indexing options, including 2D geospatial indexing; IMHO Neo4j should adopt the same set of options and offer queries that are both good relational database queries and good graph queries, rather than forcing its users to pick one or the other whilst handling the interop between two different systems.

In fact, in my ideal world Neo4j and MongoDB would just become one database: a document database that also has great graph-querying capabilities!

I'll keep monitoring Neo4j, but in the meantime it's full speed ahead with my own implementation of a graph database in MongoDB, with the added twist that all relationships are modeled as triples (just like in a semantic web triple-store). My graph-query language isn't likely to be as powerful as Cypher any time soon, but I have indexes, the ability to query by relationships easily, and a robust implementation of properties on each node with support for all common data types. And through my interface-based approach to storing objects with multiple inheritance, I get strongly-typed result sets in C#.
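The triple-based approach can be sketched as a small in-memory Python class: every relationship is a (subject, predicate, object) triple, and two dictionaries act as indexes so that both forward and reverse relationship queries are direct lookups. This is not the MongoDB/C# implementation described above; the class, method names and sample data are hypothetical. In MongoDB the equivalent would presumably be one document per triple with compound indexes on (subject, predicate) and (predicate, object).

```python
class TripleStore:
    """Minimal in-memory triple store with two indexes for forward and reverse queries."""

    def __init__(self):
        self.spo = {}  # subject -> predicate -> set of objects
        self.pos = {}  # predicate -> object -> set of subjects

    def add(self, subject, predicate, obj):
        self.spo.setdefault(subject, {}).setdefault(predicate, set()).add(obj)
        self.pos.setdefault(predicate, {}).setdefault(obj, set()).add(subject)

    def objects(self, subject, predicate):
        """Forward query: what does `subject` relate to via `predicate`?"""
        return set(self.spo.get(subject, {}).get(predicate, ()))

    def subjects(self, predicate, obj):
        """Reverse query: which subjects relate to `obj` via `predicate`?"""
        return set(self.pos.get(predicate, {}).get(obj, ()))


store = TripleStore()
store.add("kitchen", "partOf", "house")
store.add("lounge", "partOf", "house")
store.add("house", "locatedIn", "Seattle")

print(store.objects("kitchen", "partOf"))          # {'house'}
print(sorted(store.subjects("partOf", "house")))   # ['kitchen', 'lounge']
```

Dropping the `pos` index is exactly the saving a one-way relationship would allow: forward queries still work, but reverse queries no longer can.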
