Amazon Neptune in practice

Amazon Neptune is a managed graph database on AWS whose compute and storage are decoupled, as in Amazon Aurora. Neptune supports popular open graph APIs such as Gremlin and SPARQL, which makes it easier to migrate existing applications.

After a few months of using Neptune in a solution, here are a few things I have learned.

Bulk loading

I kept running into ConcurrentModificationException errors when loading vertices and edges into Neptune concurrently. Using neptune-python-utils with retry and backoff helps, but it still requires a large, expensive Neptune instance.
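The retry-with-backoff idea can be sketched roughly as follows. This is a generic illustration, not the neptune-python-utils API: `ConcurrentModificationError` and `retry_with_backoff` are hypothetical names, and a real client would surface the conflict through its own exception type.

```python
import random
import time


class ConcurrentModificationError(Exception):
    """Hypothetical stand-in for the write-conflict error raised by a Gremlin client."""


def retry_with_backoff(operation, max_attempts=5, base_delay=0.1,
                       max_delay=5.0, sleep=time.sleep):
    """Call operation() until it succeeds or max_attempts is exhausted."""
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except ConcurrentModificationError:
            if attempt == max_attempts:
                raise  # give up and surface the last conflict
            # Exponential backoff capped at max_delay, with full jitter
            # so that concurrent writers do not retry in lockstep.
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(random.uniform(0, delay))
```

The jitter matters here: if every writer waits the same amount of time, they tend to collide again on the next attempt.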

The best way to batch load a large number of vertices and edges into Neptune is the bulk load feature. It works fine even on a small Neptune instance, although the loading time depends on the instance size.
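A bulk load is started by posting a JSON request to the cluster's loader endpoint. The sketch below only builds that request; the function name `build_load_request` and all argument values are placeholders, and it assumes the data files already sit in S3 and that IAM database authentication is disabled (otherwise the request would also need SigV4 signing, which is not shown).

```python
import json
import urllib.request


def build_load_request(endpoint, source, iam_role_arn, region, fmt="csv"):
    """Build (but do not send) a POST request for the Neptune bulk loader."""
    payload = {
        "source": source,            # S3 URI of a file or folder to load
        "format": fmt,               # "csv" is the Gremlin CSV format
        "iamRoleArn": iam_role_arn,  # role that lets Neptune read the bucket
        "region": region,
        "failOnError": "FALSE",      # keep loading past bad records
        "parallelism": "MEDIUM",
    }
    return urllib.request.Request(
        url=f"https://{endpoint}:8182/loader",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` from inside the cluster's VPC would submit the job. The response should contain a load id that can be polled at the same `/loader` path to follow progress.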

Properties of vertices

In my use case, I stored the embedding as properties of vertices, much like columns in a relational database. Every vertex had almost 400 properties, and query performance was poor with that many properties. Because the embedding properties are never queried, I consolidated the 400 properties into a single property, which improved query performance.
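A minimal sketch of that consolidation, assuming the embedding was originally spread over properties named `emb_0`, `emb_1`, and so on (a hypothetical naming scheme) and that a JSON string is an acceptable encoding for the combined value:

```python
import json


def consolidate_embedding(properties, prefix="emb_", target="embedding"):
    """Fold properties like emb_0, emb_1, ... into a single JSON string property."""
    embedding = {}
    result = {}
    for key, value in properties.items():
        suffix = key[len(prefix):]
        if key.startswith(prefix) and suffix.isdigit():
            embedding[int(suffix)] = value  # remember the position in the vector
        else:
            result[key] = value             # ordinary property, keep as-is
    if embedding:
        # Sort numerically so emb_10 comes after emb_2, not before it.
        ordered = [embedding[i] for i in sorted(embedding)]
        result[target] = json.dumps(ordered)
    return result
```

The vertex then carries one opaque property that the application decodes with `json.loads` after reading it, instead of hundreds of individual properties.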

Streams

Neptune Streams logs every change to the graph. It was a Lab feature in 2019 and became generally available in 2020. However, there is no Lambda integration yet, which means you cannot process Neptune Streams with Lambda functions.
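Without a trigger, a consumer has to poll the stream's REST endpoint itself. The sketch below shows one polling step for a property graph. The helper names `stream_url` and `poll_once` are made up for illustration, it assumes IAM database authentication is disabled, and it leaves out error handling such as the case where no new records are available.

```python
import json
import urllib.parse
import urllib.request


def stream_url(endpoint, last_event_id=None, limit=100):
    """Build the stream URL, resuming after last_event_id when one is given."""
    params = {"limit": limit}
    if last_event_id is None:
        # Start from the oldest record still retained in the stream.
        params["iteratorType"] = "TRIM_HORIZON"
    else:
        params["iteratorType"] = "AFTER_SEQUENCE_NUMBER"
        params["commitNum"] = last_event_id["commitNum"]
        params["opNum"] = last_event_id["opNum"]
    query = urllib.parse.urlencode(params)
    return f"https://{endpoint}:8182/propertygraph/stream?{query}"


def poll_once(endpoint, last_event_id=None, fetch=None):
    """Fetch one batch of change records and return (records, new_last_event_id)."""
    fetch = fetch or (lambda url: urllib.request.urlopen(url).read())
    body = json.loads(fetch(stream_url(endpoint, last_event_id)))
    return body["records"], body["lastEventId"]
```

A consumer would call `poll_once` in a loop, persist the returned event id after each batch, and pass it back in on the next call so that it resumes where it left off.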

Tools

Neptune Tools

Amazon Neptune Tools is a toolkit maintained by the Neptune service team.

Neptune sigv4

The script connects to Neptune and calls its control plane APIs, with aswauthsigv4 and proxy support.