Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

neo4j - Issues loading data patterns from CSV

I am trying to load data patterns into neo4j from CSV files using Cypher command lines. I have two data files, one containing objects, the other containing object parts.

Object file:

ID
ABC-DE
DEF

Part file:

ID  ParentID    Level   Size
ABC ABC-DE  1   3
DE  ABC-DE  1   2
AB  ABC 2   2
BC  ABC 2   2
DE  DEF 1   2
F   DEF 2   1
A   AB  3   1
B   AB  3   1
B   BC  3   1
C   BC  3   1
D   DE  3   1
E   DE  3   1

Cypher command lines used to load data:

LOAD CSV WITH HEADERS FROM 'file:///path_to_file/object.csv' as csvLine FIELDTERMINATOR '	' CREATE (:Object { Name: csvLine.ID})  RETURN count(*);
LOAD CSV WITH HEADERS FROM 'file:///path_to_file/part.csv' as csvLine FIELDTERMINATOR '	' MATCH (o:Object {Name: csvLine.ParentID}) MERGE (p:Part {Name: csvLine.ID}) ON CREATE SET p.Size = csvLine.Size CREATE (o) -[:hasPart {Level: csvLine.Level}]-> (p) RETURN count(*);
LOAD CSV WITH HEADERS FROM 'file:///path_to_file/part.csv' as csvLine FIELDTERMINATOR '	' MATCH (o:Part {Name: csvLine.ParentID}) MERGE (p:Part {Name: csvLine.ID}) ON CREATE SET p.Size = csvLine.Size CREATE (o) -[:hasPart {Level: csvLine.Level}]-> (p) return count(*);

The first two command lines execute properly, creating 2 and 3 nodes respectively and corresponding links. The third command line only creates 4 nodes: AB, BC, D and E. Apparently, only nodes linked to existing nodes are created and linked.

In the CSV file, parent nodes are listed before their child nodes, so I expected nodes A, B and C to be created and linked to AB and BC accordingly.

Is this the expected behavior of CSV loading, which would prevent loading such patterns, or is there an issue in my code, or a bug?

This issue exists with both neo4j 2.1.7 and neo4j 2.2.0-M04.

1 Answer

So I think your issue here is that Cypher batches up many updates into a single transaction and then commits them. How many rows go into a transaction is configurable with USING PERIODIC COMMIT. I could be wrong, but I think that by default it treats the entire data load as one big transaction.

For you, this is problematic because when you're on, say, row #6, you might need to refer back to a node that would have been created on row #3. That won't work: if the transaction hasn't committed yet (because Cypher is batching up a bunch of results), the not-yet-committed result may not be available to a subsequent query.
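To make that concrete, here is a toy model in plain Python. It is not Neo4j and says nothing about how Cypher works internally; it only assumes that the parent lookup either can or cannot see parts created earlier in the same statement. The names `PART_ROWS`, `load_children` and `see_own_writes` are made up for this sketch. The starting set is the three parts the question says exist after the second command line.

```python
# Toy model only: Python sets stand in for the graph. This is not Neo4j.
# (ID, ParentID) pairs in the same order as the part file in the question.
PART_ROWS = [
    ("ABC", "ABC-DE"), ("DE", "ABC-DE"), ("AB", "ABC"), ("BC", "ABC"),
    ("DE", "DEF"), ("F", "DEF"), ("A", "AB"), ("B", "AB"),
    ("B", "BC"), ("C", "BC"), ("D", "DE"), ("E", "DE"),
]


def load_children(rows, existing_parts, see_own_writes):
    """Mimic 'MATCH parent part, MERGE child part' and list newly created parts."""
    snapshot = set(existing_parts)  # parts present before the statement starts
    parts = set(existing_parts)     # parts including those created along the way
    created = []
    for child, parent in rows:
        visible = parts if see_own_writes else snapshot
        if parent not in visible:
            continue  # the MATCH finds no parent, so the row produces nothing
        if child not in parts:
            parts.add(child)
            created.append(child)
    return created


# Parts that exist after the second command line in the question.
after_second_statement = {"ABC", "DE", "F"}

blind = load_children(PART_ROWS, after_second_statement, see_own_writes=False)
sighted = load_children(PART_ROWS, after_second_statement, see_own_writes=True)

print(blind)    # ['AB', 'BC', 'D', 'E']
print(sighted)  # ['AB', 'BC', 'A', 'B', 'C', 'D', 'E']
```

The "blind" run gives exactly the four nodes reported in the question, which is at least consistent with the lookup not seeing nodes created earlier in the same load.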

So you have a few options; one would be to do a two-pass import on that part file. On your first pass you might do this:

LOAD CSV WITH HEADERS FROM 'file:///path_to_file/part.csv' as csvLine FIELDTERMINATOR '	' 
MERGE (p:Part {Name: csvLine.ID}) ON CREATE SET p.Size = csvLine.Size RETURN count(*);

(This first pass ensures that all parts exist in the DB. It creates no relationships, since no parent node is matched here.)

Then on the second pass you might do this:

LOAD CSV WITH HEADERS FROM 'file:///path_to_file/part.csv' as csvLine FIELDTERMINATOR '	' MATCH (o:Part {Name: csvLine.ParentID}) MATCH (p:Part {Name: csvLine.ID}) CREATE (o) -[:hasPart {Level: csvLine.Level}]-> (p) return count(*);

(This would just link them with relationships)
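The two passes can be sketched in plain Python to see what they should produce on the sample data. Again this is only a model with made-up names (`PART_ROWS`, `two_pass_import`), not Neo4j: pass 1 collects every part ID, pass 2 links a row only when its ParentID is itself a part.

```python
import random

# (ID, ParentID) pairs in the same order as the part file in the question.
PART_ROWS = [
    ("ABC", "ABC-DE"), ("DE", "ABC-DE"), ("AB", "ABC"), ("BC", "ABC"),
    ("DE", "DEF"), ("F", "DEF"), ("A", "AB"), ("B", "AB"),
    ("B", "BC"), ("C", "BC"), ("D", "DE"), ("E", "DE"),
]


def two_pass_import(rows):
    """Model of the two passes: create all parts first, then link part to part."""
    # Pass 1: like MERGE (p:Part {Name: ID}), one part per distinct ID.
    parts = {child for child, _parent in rows}
    # Pass 2: like MATCH parent, MATCH child, CREATE relationship.
    # Rows whose ParentID is not a part (the object IDs) match nothing here.
    links = [(parent, child) for child, parent in rows if parent in parts]
    return parts, links


parts, links = two_pass_import(PART_ROWS)
print(len(parts), len(links))  # 10 8

# Row order no longer matters: a shuffled file gives the same parts and links.
shuffled = PART_ROWS[:]
random.shuffle(shuffled)
shuffled_parts, shuffled_links = two_pass_import(shuffled)
print(shuffled_parts == parts and sorted(shuffled_links) == sorted(links))  # True
```

In this model A, B and C are all present and linked to AB and BC, and the result does not depend on the order of the rows.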

Another option would be to do something like USING PERIODIC COMMIT 1 but I don't think that would be a good idea; there's transaction overhead, and that would drastically slow your data load if you have a fair amount of data.

EDIT: If I were you, I'd do the two-pass import. Also, in general it's not a good idea to rely on record ordering in flat files like this; they tend to be messy. Finally, if you do a two-pass import on your part file, note that you may not need your object file at all: any object that's implicated is already referenced in the part file, and the object file doesn't add any properties beyond what the part file references.
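If you want to know whether a given file actually is ordered parent-first, a quick check outside the database is enough. This is a small sketch in plain Python; `rows_listed_before_parent` is a made-up helper name, and the rows are the sample data from the question.

```python
# (ID, ParentID) pairs in the same order as the part file in the question.
PART_ROWS = [
    ("ABC", "ABC-DE"), ("DE", "ABC-DE"), ("AB", "ABC"), ("BC", "ABC"),
    ("DE", "DEF"), ("F", "DEF"), ("A", "AB"), ("B", "AB"),
    ("B", "BC"), ("C", "BC"), ("D", "DE"), ("E", "DE"),
]


def rows_listed_before_parent(rows):
    """Return rows whose parent part first appears later in the file."""
    first_seen = {}
    for index, (child, _parent) in enumerate(rows):
        first_seen.setdefault(child, index)  # remember the first line of each ID
    return [
        (child, parent)
        for index, (child, parent) in enumerate(rows)
        if parent in first_seen and first_seen[parent] > index
    ]


in_order = rows_listed_before_parent(PART_ROWS)
reversed_file = rows_listed_before_parent(list(reversed(PART_ROWS)))

print(in_order)            # []
print(len(reversed_file))  # 8
```

The sample file passes, but the same rows in reverse order give eight offending rows, which is why the two-pass import is the safer choice.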
