Amazon Neptune in Practice
Amazon Neptune is a managed graph database on AWS whose compute and storage are decoupled, as in Amazon Aurora. Neptune supports popular open APIs such as Gremlin and SPARQL, which makes it easier to migrate existing applications.
After a few months of using Neptune in a solution, here are a few things I learned.
Loading vertices and edges into Neptune concurrently almost always runs into ConcurrentModificationException errors. Using neptune-python-utils with retry and backoff helps, but it needs a large, expensive Neptune instance.
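The retry-with-backoff idea can be sketched in plain Python. This is a minimal illustration, not the neptune-python-utils implementation: `ConcurrentModificationError` and `with_backoff` are hypothetical names, and in a real Gremlin client you would typically have to detect the conflict from the server error message instead of a dedicated exception class.

```python
import random
import time


class ConcurrentModificationError(Exception):
    """Hypothetical stand-in for the conflict error Neptune reports on writes."""


def with_backoff(operation, max_attempts=5, base_delay=0.1, max_delay=5.0,
                 sleep=time.sleep):
    """Call operation(), retrying on conflicts with exponential backoff and jitter."""
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except ConcurrentModificationError:
            if attempt == max_attempts:
                raise  # give up after the last attempt
            # The delay ceiling doubles on every attempt, capped at max_delay.
            cap = min(max_delay, base_delay * 2 ** (attempt - 1))
            # Full jitter spreads out writers that collided at the same moment.
            sleep(random.uniform(0, cap))
```

Wrapping each write as `with_backoff(lambda: add_vertex(...))`, where `add_vertex` is your own write function, keeps the retry policy in one place. Retrying only reduces failures; it does not remove the underlying write contention.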
The best way to batch load a large number of vertices and edges into Neptune is the bulk load feature. It works fine even on a small Neptune instance, although the loading time depends on the instance size.
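As a rough sketch, a bulk load is started by posting a JSON job description to the cluster's `/loader` endpoint. The helper below only builds the request and sends nothing. The function name and every argument value are placeholders, and it assumes CSV files already staged in S3 and a cluster without IAM database authentication (otherwise the request would also need SigV4 signing).

```python
import json
import urllib.request


def build_load_request(endpoint, source, iam_role_arn, region, port=8182):
    """Build (but do not send) a POST request that starts a Neptune bulk load."""
    payload = {
        "source": source,            # S3 URI of a file or prefix to load
        "format": "csv",             # Gremlin CSV format for vertices and edges
        "iamRoleArn": iam_role_arn,  # role Neptune assumes to read from S3
        "region": region,            # region of the S3 bucket
        "failOnError": "FALSE",      # keep loading when individual rows fail
        "parallelism": "MEDIUM",     # how many loader threads Neptune may use
    }
    return urllib.request.Request(
        url=f"https://{endpoint}:{port}/loader",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Sending the request with `urllib.request.urlopen(request)` from inside the cluster's VPC should return a load ID, which can then be polled at `/loader/<loadId>` to follow progress.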
Properties of vertices
In my use case, I stored embeddings as vertex properties, much like columns in a relational database. Each vertex had almost 400 properties, and query performance was poor with that many properties. Because the embedding properties are never queried, consolidating the 400 properties into a single property improves query performance.
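One way to do that consolidation is to serialize the whole embedding into a single JSON string before writing the vertex. The sketch below assumes the embedding values live in numbered properties such as `emb_0`, `emb_1`, and so on. That naming scheme, the `embedding` target name, and both function names are hypothetical.

```python
import json


def consolidate_embedding(properties, prefix="emb_", target="embedding"):
    """Fold numbered properties (emb_0, emb_1, ...) into one JSON string property."""
    indexed = {}
    kept = {}
    for key, value in properties.items():
        suffix = key[len(prefix):]
        if key.startswith(prefix) and suffix.isdigit():
            indexed[int(suffix)] = value  # part of the embedding
        else:
            kept[key] = value             # ordinary property, left untouched
    if indexed:
        # Sort numerically so emb_10 comes after emb_2, not before it.
        kept[target] = json.dumps([indexed[i] for i in sorted(indexed)])
    return kept


def restore_embedding(properties, target="embedding"):
    """Decode the consolidated property back into a list of numbers."""
    return json.loads(properties[target])
```

The trade-off is that the individual values can no longer be filtered on in a query, which is acceptable here because the embedding is only ever read back as a whole.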
Neptune Streams logs every change to the graph. It was a Lab feature in 2019 and became generally available in 2020. However, there is no Lambda integration yet, which means you cannot process Neptune streams in Lambda functions!
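Without a Lambda trigger, the change log has to be polled over the Streams REST endpoint. The sketch below only builds the polling URL and extracts the next checkpoint from a response. The function names are hypothetical, and the endpoint path, query parameters, and `lastEventId` field are written from my reading of the Streams documentation, so check them against the current API before relying on them.

```python
from urllib.parse import urlencode


def stream_url(endpoint, commit_num=None, op_num=1, limit=100, port=8182):
    """Build the URL for the next poll of the property-graph change stream."""
    params = {"limit": limit}
    if commit_num is None:
        # No checkpoint yet: start from the oldest record still retained.
        params["iteratorType"] = "TRIM_HORIZON"
    else:
        # Resume right after the last record that was already processed.
        params.update({
            "iteratorType": "AFTER_SEQUENCE_NUMBER",
            "commitNum": commit_num,
            "opNum": op_num,
        })
    return f"https://{endpoint}:{port}/propertygraph/stream?{urlencode(params)}"


def next_checkpoint(response):
    """Return (commitNum, opNum) of the last record in a parsed stream response."""
    last = response["lastEventId"]
    return last["commitNum"], last["opNum"]
```

A poller would fetch `stream_url(...)`, process the returned records, persist the result of `next_checkpoint`, and pass it to the next call so that no change is read twice.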
Amazon Neptune Tools is a toolkit maintained by the Neptune service team.
The script can connect to Neptune and call control plane APIs, with AWS SigV4 authentication and proxy support.