An introduction to Neo4j


Neo4j is a graph database that has been rapidly accumulating success stories in areas such as social applications, recommendation engines, fraud detection, resource authorization, and network and data center management. Here's an interesting introductory lecture by Ian Robinson at JavaZone 2013.

Tip: Databasetube offers various other interesting articles about Neo4j.

Key Takeaways from the Presentation

The Fundamental Premises

  • Data today is more connected than ever before. The relationships between data points are often as important as the data itself.
  • Complexity = f(size, semi-structure, connectedness). Modern data challenges come not just from volume, but from the intricate relationships and varying structures.
  • Graphs are the best abstractions we have to model connectedness. When your data is highly interconnected, graph databases provide a more natural and efficient way to represent and query it.

The Property Graph Model

Neo4j uses the "property graph model" for representing data:

  • Nodes have properties stored as key-value pairs (e.g., name: "John", age: 30)
  • Relationships have a type and a direction, and can have properties too (e.g., KNOWS {since: 2010}); a numeric property can serve as the weight of a weighted association

This model provides flexibility while maintaining the structure needed for efficient querying.
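To make the model concrete, here is a minimal in-memory sketch in Python. It is not how Neo4j stores data; the class names (Node, Relationship), the connect helper and the sample person "Mary" are assumptions made up for illustration.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass(eq=False)  # compare by identity, two nodes may share the same properties
class Node:
    """A node carrying arbitrary key-value properties."""
    properties: Dict[str, Any] = field(default_factory=dict)
    outgoing: List["Relationship"] = field(default_factory=list)


@dataclass(eq=False)
class Relationship:
    """A directed, typed connection that may carry its own properties."""
    rel_type: str
    start: Node
    end: Node
    properties: Dict[str, Any] = field(default_factory=dict)


def connect(start: Node, rel_type: str, end: Node, **properties: Any) -> Relationship:
    """Create a relationship from start to end and register it on the start node."""
    rel = Relationship(rel_type, start, end, dict(properties))
    start.outgoing.append(rel)
    return rel


# Sample data (hypothetical): John KNOWS Mary since 2010.
john = Node({"name": "John", "age": 30})
mary = Node({"name": "Mary"})
knows = connect(john, "KNOWS", mary, since=2010)
```

Note that the two nodes do not need the same set of properties: "Mary" has no age, and nothing in the model forces her to have one.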

Neo4j Features

Built-in Web UI: Neo4j server includes a web-based interface for visualizing and querying your graph data, making it easier to explore and understand your data structure.

When to Use a Graph Database

Consider using Neo4j when you encounter:

  • Lots of join tables (indicating high connectedness in your data model)
  • Lots of sparse tables (indicating semi-structured data with many optional fields)

If your relational database schema is becoming unwieldy with numerous many-to-many relationships, a graph database might be a better fit.
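As a rough illustration of the two symptoms, the Python sketch below turns a sparse row into a node that stores only the properties that exist, and a join-table row into a relationship. The column names, ids and the WORKS_FOR relationship type are invented for the example.

```python
# A sparse relational row: every column exists, most are empty.
columns = ["name", "age", "twitter", "phone", "company"]
row = ("John", 30, None, None, None)

# As a node, only the properties that actually have a value are kept.
node_properties = {col: val for col, val in zip(columns, row) if val is not None}

# A join-table row (person_id, company_id) becomes one relationship.
person_company = [(1, 7)]  # hypothetical ids
relationships = [
    (f"person:{p}", "WORKS_FOR", f"company:{c}") for p, c in person_company
]
```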

ACID Transactions

Neo4j fully supports ACID transactions, providing:

  • Durable, consistent data
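  • A try/success idiom for transaction management: work is done inside a try block and explicitly marked as successful before the transaction is closed; otherwise it is rolled back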

This makes Neo4j suitable for production applications that require data integrity guarantees.
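The try/success idiom can be imitated in a few lines of Python. This is a toy stand-in, not the Neo4j API: ToyStore and ToyTransaction are hypothetical names, and the sketch only shows the all-or-nothing behaviour, not durability or isolation.

```python
import copy


class ToyTransaction:
    """Commits staged changes only if success() was called before the block ends."""

    def __init__(self, store):
        self._store = store
        self._staged = copy.deepcopy(store.data)  # work on a private copy
        self._ok = False

    def set(self, key, value):
        self._staged[key] = value

    def success(self):
        self._ok = True

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        # Commit only when the block finished cleanly AND was marked successful.
        if exc_type is None and self._ok:
            self._store.data = self._staged
        return False  # never swallow exceptions


class ToyStore:
    def __init__(self):
        self.data = {}

    def begin(self):
        return ToyTransaction(self)


store = ToyStore()

with store.begin() as tx:
    tx.set("john", {"age": 30})
    tx.success()  # reached: the change is committed

try:
    with store.begin() as tx:
        tx.set("mary", {"age": 28})
        raise RuntimeError("something went wrong")
        tx.success()  # never reached: the change is discarded
except RuntimeError:
    pass
```

After both blocks have run, the store contains "john" but not "mary": the second transaction never reached success(), so none of its changes became visible.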

Performance Characteristics

  • Millions of 'joins' per second – This is possible because relationships are stored as direct connections between nodes at insert time, so a traversal follows them instead of computing joins at query time.
  • Consistent query times as the dataset grows – The cost of a traversal depends mainly on the portion of the graph it touches rather than on the overall size of the dataset, whereas join-heavy relational queries tend to slow down as tables grow.
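The difference can be sketched in Python with a deliberately simplified comparison. The join table is scanned without an index (a real relational database would use one, which is faster but still tied to the size of the index), and the dictionary of neighbour lists is only a stand-in for the way a graph store links nodes directly. All names and sample data are made up for the example.

```python
# Relational style: a join table of (person, friend) pairs.
knows_table = [("John", "Mary"), ("John", "Ann"), ("Mary", "Ann"), ("Bob", "John")]


def friends_via_join(name):
    # Without an index, every row is inspected: work grows with the table size.
    return [friend for person, friend in knows_table if person == name]


# Graph style: each node keeps direct references to its neighbours,
# built once when the relationships are written.
adjacency = {}
for person, friend in knows_table:
    adjacency.setdefault(person, []).append(friend)
    adjacency.setdefault(friend, [])


def friends_via_traversal(name):
    # Work grows only with the number of relationships on this node.
    return adjacency[name]


def friends_of_friends(name):
    # A two-hop traversal: follow references twice, no table scans.
    seen = []
    for friend in adjacency[name]:
        for fof in adjacency[friend]:
            if fof != name and fof not in seen:
                seen.append(fof)
    return seen
```

Both lookups return the same friends; what differs is how much data each one has to look at when the table grows.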

Cypher Query Language

Neo4j's Cypher query language has some unique characteristics:

  • Syntax mirrors the graphic representation of a graph – This makes queries intuitive and visual.
  • One-dimensional, left-to-right flow – Queries read naturally, following the path through the graph.

For example, a simple Cypher query might look like:

MATCH (person:Person)-[:KNOWS]->(friend)
WHERE person.name = "John"
RETURN friend.name

This query visually represents the pattern you're searching for in the graph.
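For readers who think more easily in imperative code, the same pattern can be spelled out in Python over a tiny in-memory graph. This only mirrors the meaning of the query, not how Neo4j executes it; the node ids and the extra people are hypothetical sample data.

```python
# Hypothetical sample data: nodes with labels and properties,
# relationships as (start, type, end) triples.
nodes = {
    "n1": {"labels": {"Person"}, "name": "John"},
    "n2": {"labels": {"Person"}, "name": "Mary"},
    "n3": {"labels": {"Person"}, "name": "Ann"},
}
relationships = [("n1", "KNOWS", "n2"), ("n2", "KNOWS", "n3"), ("n3", "KNOWS", "n1")]


def friend_names(person_name):
    results = []
    for start, rel_type, end in relationships:
        person, friend = nodes[start], nodes[end]
        if (
            "Person" in person["labels"]       # (person:Person)
            and rel_type == "KNOWS"            # -[:KNOWS]->
            and person["name"] == person_name  # WHERE person.name = "John"
        ):
            results.append(friend["name"])     # RETURN friend.name
    return results
```

Because the relationship is directed, friend_names("John") finds Mary (John KNOWS Mary) but not Ann, who knows John rather than the other way round.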

Further Resources

For a comparison of various graph databases (including Neo4j), check out this tutorial from the ESWC'13 conference.

Cite this blog post:


Michele Pasin. An introduction to Neo4j. Blog post on www.michelepasin.org. Published on April 10, 2013.
