At the YOW conference in Melbourne last December, there was a discussion on NoSQL databases. One of the biggest problems in OO in recent decades has been how to map object-oriented data into a relational database. Now we have O/R mappers (Hibernate, JPA, Turbine) which have largely solved the problem for us. Or have they?
Until recently, I hadn’t heard much about alternative databases. MongoDB, CouchDB and Neo4J weren’t in my repertoire of known tools for software solutions. The first two can be considered close cousins (both are document databases), while Neo4J sits on the shelf with other graph databases.
Neo4J is still all about relationships, even though it isn’t a relational database in the SQL sense. Thinking back to my university days, I remember doing database design diagrams, with weak entities and denormalised data which we had to fit into BCNF or 3NF. Fun. The hardest part of mapping objects to an SQL database was always the many:many relationship, especially when there was data attached to the relationship between the two.
Many-to-many mappings need a join table (which can leak into your domain model, depending on the implementation), and that basically makes them awkward to work with. Where possible, I would introduce a valid middle domain object to avoid a many:many mapping, which takes most of the pain away.
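To show what I mean by a middle domain object, here is a minimal sketch in Python. The Student, Course and Enrolment names are made up for illustration and don't come from any particular O/R mapper. The point is only that the join table becomes a real object which can hold the data attached to the relationship.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Student:
    name: str
    enrolments: List["Enrolment"] = field(default_factory=list)


@dataclass
class Course:
    title: str
    enrolments: List["Enrolment"] = field(default_factory=list)


@dataclass
class Enrolment:
    """Hypothetical 'middle' object: stands in for the bare join table
    and carries the data that belongs to the relationship itself."""
    student: Student
    course: Course
    grade: str


def enrol(student: Student, course: Course, grade: str) -> Enrolment:
    # Two one-to-many links replace the single many-to-many link.
    enrolment = Enrolment(student, course, grade)
    student.enrolments.append(enrolment)
    course.enrolments.append(enrolment)
    return enrolment


alice = Student("Alice")
databases = Course("Databases")
enrol(alice, databases, grade="A")

# Walk from the student through the middle object to the course.
print([(e.course.title, e.grade) for e in alice.enrolments])
```

Each side now has a plain one-to-many collection, which most mappers handle without fuss.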
Anyway, getting back to graph databases: Neo4J has the concept of a node, which stores data like a hashmap, with keys and values. This node just floats in the database until you link it to another node via a relationship. In this case, however, a relationship is not necessarily has many or owns or rents, although it could be. It is more likely to be something looser, e.g. knows or likes. These relationships are much more elastic because they, too, can have key-value pairs.
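A rough way to picture this model is the sketch below. It is a toy in-memory version in plain Python, not the Neo4J API. The Node and Relationship classes and their attribute names are my own assumptions for illustration.

```python
class Node:
    """Toy node: a bag of key-value properties, like a hashmap."""

    def __init__(self, **properties):
        self.properties = dict(properties)
        self.relationships = []  # stays empty until the node is linked


class Relationship:
    """Toy relationship: a typed link that carries its own properties."""

    def __init__(self, start, rel_type, end, **properties):
        self.start = start
        self.type = rel_type
        self.end = end
        self.properties = dict(properties)
        # Register the link on both nodes so either end can find it.
        start.relationships.append(self)
        end.relationships.append(self)


alice = Node(name="Alice")
bob = Node(name="Bob")

# "knows" is a loose relationship, and it has key-value data of its own.
knows = Relationship(alice, "KNOWS", bob, since=2008)

print(knows.start.properties["name"], knows.type,
      knows.end.properties["name"], knows.properties)
```

Note that there is no schema anywhere: any node or relationship can carry whatever keys it likes.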
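One limitation of Neo4J, as I understand it, is that you can’t have circular references, in the sense of a node with a relationship pointing back to itself. Cycles that pass through other nodes are fine. I for one don’t think this is so terrible, as it probably allows a lot more assumptions to be made, which could speed up the code. You can work around it, anyway.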
Why would you bother with Neo4J? Well, good question. I think it models the real world much better than SQL databases. It has more elastic and dynamic semantics and doesn’t focus so doggedly on foreign key relationships; instead it focuses on traversals.
Traversals are the key to getting power out of a graph database. Let’s say you want to write a complex relational query across many different tables in SQL. You would have to write a series of joins or nested subqueries, one for each table, each linked to the one below it.
Neo4J allows you to code a traverser with a series of rules based on context, and then it simply executes the traversal, much like an AI agent, following the rules you have defined. It returns the nodes it matches, including the path found and other metadata.
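The idea of a rule-driven traverser can be sketched in a few lines of plain Python. This is not Neo4J's traverser API. The graph, the traverse function and the two rule callbacks (should_follow and should_return) are hypothetical names I've picked to illustrate the concept.

```python
from collections import deque

# Hypothetical graph: node -> list of (relationship type, neighbour).
GRAPH = {
    "alice": [("KNOWS", "bob"), ("LIKES", "pizza")],
    "bob": [("KNOWS", "carol"), ("KNOWS", "alice")],
    "carol": [("KNOWS", "dave")],
    "dave": [],
    "pizza": [],
}


def traverse(graph, start, should_follow, should_return):
    """Breadth-first traversal driven by two caller-supplied rules.

    should_follow(path, rel_type) decides whether to cross a relationship.
    should_return(path) decides whether the node at the end of the path
    belongs in the result. Each node is visited at most once, so cycles
    in the graph cannot cause an endless loop.
    """
    results = []
    visited = {start}
    queue = deque([[start]])
    while queue:
        path = queue.popleft()
        if should_return(path):
            results.append(path)
        for rel_type, neighbour in graph[path[-1]]:
            if neighbour not in visited and should_follow(path, rel_type):
                visited.add(neighbour)
                queue.append(path + [neighbour])
    return results


# Rules: follow only KNOWS, at most two hops out; return everyone but the start.
friends = traverse(
    GRAPH,
    "alice",
    should_follow=lambda path, rel_type: rel_type == "KNOWS" and len(path) <= 2,
    should_return=lambda path: len(path) > 1,
)

for path in friends:
    print(" -> ".join(path))
```

The rules see the whole path so far, which is what lets them depend on context (depth, where you came from) rather than on the current node alone.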
Relationships are also first-class citizens in graph databases, and in some cases they are your focus. Let’s say you are writing a GPS navigator. You have a series of nodes (intersections), and to calculate the quickest route you need the lowest-cost path from the start to the end, which is what Dijkstra’s algorithm finds. Your goal is not to find a specific node but to find the route whose total cost is lowest, where the cost of each leg of the journey is the weight of the relationship between two nodes.
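For the curious, here is a small sketch of Dijkstra's algorithm over weighted relationships, in plain Python rather than anything Neo4J-specific. The road network, the travel times and the lowest_cost_path function name are all invented for the example, and the weights are assumed to be non-negative.

```python
import heapq

# Hypothetical road network: intersection -> {neighbour: travel minutes}.
ROADS = {
    "A": {"B": 4, "C": 2},
    "B": {"D": 5},
    "C": {"B": 1, "D": 8},
    "D": {},
}


def lowest_cost_path(roads, start, goal):
    """Dijkstra's algorithm, assuming non-negative weights.

    Returns (total cost, list of nodes), or None if goal is unreachable.
    """
    queue = [(0, start, [start])]  # priority queue ordered by cost so far
    settled = set()                # nodes whose cheapest cost is final
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == goal:
            return cost, path
        if node in settled:
            continue  # a cheaper route to this node was already handled
        settled.add(node)
        for neighbour, weight in roads[node].items():
            if neighbour not in settled:
                heapq.heappush(
                    queue, (cost + weight, neighbour, path + [neighbour])
                )
    return None


print(lowest_cost_path(ROADS, "A", "D"))  # (8, ['A', 'C', 'B', 'D'])
```

The direct-looking route A, B, D costs 9, so the detour through C wins. All the interesting information lives on the relationships, not the nodes.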
There is simply so much happening at the moment in software development. We are breaking down the walls of generations of limitations and putting an end to the enterprise workarounds which have dogged software developers for decades. Can’t wait to see what’s around the corner.