Error importing 40M+ records using the neo4j-import tool - shell

I am using neo4j-import to import 40M nodes; below is my shell command:
[luning#pinnacle bin]$ ./neo4j-import --into ../data/weibo.db --nodes:User "/data/weibo/user-header.csv,/data/weibo/users/000000_0.csv,/data/weibo/users/000001_0.csv,/data/weibo/users/000002_0.csv,/data/weibo/users/000003_0.csv,/data/weibo/users/000004_0.csv,/data/weibo/users/000005_0.csv,/data/weibo/users/000006_0.csv,/data/weibo/users/000007_0.csv,/data/weibo/users/000008_0.csv,/data/weibo/users/000009_0.csv,/data/weibo/users/000010_0.csv,/data/weibo/users/000011_0.csv,/data/weibo/users/000012_0.csv,/data/weibo/users/000013_0.csv,/data/weibo/users/000014_0.csv,/data/weibo/users/000015_0.csv,/data/weibo/users/000016_0.csv,/data/weibo/users/000017_0.csv,/data/weibo/users/000018_0.csv,/data/weibo/users/000019_0.csv,/data/weibo/users/000020_0.csv,/data/weibo/users/000021_0.csv,/data/weibo/users/000022_0.csv,/data/weibo/users/000023_1.csv,/data/weibo/users/000024_0.csv,/data/weibo/users/000025_0.csv" --delimiter "TAB"
Nodes
[*>:87.20 MB/s---------------------------|PROPERTIES(2)===============|NOD|v:227.03 MB/s(2)====] 48M
Import error: Panic called, so exiting
Neo4j Import Tool
neo4j-import is used to create a new Neo4j database from data in CSV files. See
the chapter "Import Tool" in the Neo4j Manual for details on the CSV file format
- a special kind of header is required.
Usage:
--into <store-dir>
Database directory to import into. Must not contain existing database.
--nodes [:Label1:Label2] "<file1>,<file2>,..."
Node CSV header and data. Multiple files will be logically seen as one big file
from the perspective of the importer. The first line must contain the header.
Multiple data sources like these can be specified in one import, where each data
source has its own header. Note that file groups must be enclosed in quotation
marks.
--relationships [:RELATIONSHIP_TYPE] "<file1>,<file2>,..."
Relationship CSV header and data. Multiple files will be logically seen as one
big file from the perspective of the importer. The first line must contain the
header. Multiple data sources like these can be specified in one import, where
each data source has its own header. Note that file groups must be enclosed in
quotation marks.
--delimiter <delimiter-character>
Delimiter character, or 'TAB', between values in CSV data. The default option is ','.
--array-delimiter <array-delimiter-character>
Delimiter character, or 'TAB', between array elements within a value in CSV data.
I have checked the schemas of all the files and they are consistent. The import still fails with:
Import error: Panic called, so exiting
Does anybody know how to solve this?
Below is the stack trace:
java.lang.RuntimeException: Panic called, so exiting
at org.neo4j.unsafe.impl.batchimport.staging.StageExecution.stillExecuting(StageExecution.java:63)
at org.neo4j.unsafe.impl.batchimport.staging.ExecutionSupervisor.anyStillExecuting(ExecutionSupervisor.java:79)
at org.neo4j.unsafe.impl.batchimport.staging.ExecutionSupervisor.finishAwareSleep(ExecutionSupervisor.java:102)
at org.neo4j.unsafe.impl.batchimport.staging.ExecutionSupervisor.supervise(ExecutionSupervisor.java:64)
at org.neo4j.unsafe.impl.batchimport.staging.ExecutionSupervisors.superviseDynamicExecution(ExecutionSupervisors.java:65)
at org.neo4j.unsafe.impl.batchimport.ParallelBatchImporter.executeStages(ParallelBatchImporter.java:226)
at org.neo4j.unsafe.impl.batchimport.ParallelBatchImporter.doImport(ParallelBatchImporter.java:151)
at org.neo4j.tooling.ImportTool.main(ImportTool.java:263)
Caused by: java.lang.RuntimeException: Panic called, so exiting
at org.neo4j.unsafe.impl.batchimport.staging.AbstractStep.assertHealthy(AbstractStep.java:189)
at org.neo4j.unsafe.impl.batchimport.staging.AbstractStep.await(AbstractStep.java:180)
at org.neo4j.unsafe.impl.batchimport.staging.ExecutorServiceStep.receive(ExecutorServiceStep.java:82)
at org.neo4j.unsafe.impl.batchimport.staging.AbstractStep.sendDownstream(AbstractStep.java:226)
at org.neo4j.unsafe.impl.batchimport.staging.ExecutorServiceStep$2.call(ExecutorServiceStep.java:103)
at org.neo4j.unsafe.impl.batchimport.staging.ExecutorServiceStep$2.call(ExecutorServiceStep.java:87)
at org.neo4j.unsafe.impl.batchimport.executor.DynamicTaskExecutor$Processor.run(DynamicTaskExecutor.java:217)
Caused by: java.lang.RuntimeException: Panic called, so exiting
... 7 more
Caused by: java.lang.RuntimeException: Panic called, so exiting
... 7 more
Caused by: org.neo4j.unsafe.impl.batchimport.input.InputException: ERROR in input
data source: BufferedCharSeeker[buffer:org.neo4j.csv.reader.SectionedCharBuffer#4ac5af5c, seekPos:2764030, line:2882236]
in field: descriptions:string:4
for header: [id:string, screenname:string, locations:string, descriptions:string, :IGNORE, profileimageurl:string, gender:string, followerscount:string, friendscount:string, statusescount:string, favouritescount:string, verified:string, verifiedreason:string, :IGNORE, :IGNORE, :IGNORE, :IGNORE, :IGNORE, :IGNORE, :IGNORE, darenint:string, :IGNORE, :IGNORE, updateddate:string]
raw field value: 6:19:
original error: Tried to read in a value larger than effective buffer size 8388608
at org.neo4j.unsafe.impl.batchimport.input.csv.InputEntityDeserializer.fetchNextOrNull(InputEntityDeserializer.java:152)
at org.neo4j.unsafe.impl.batchimport.input.csv.InputEntityDeserializer.fetchNextOrNull(InputEntityDeserializer.java:42)
at org.neo4j.helpers.collection.PrefetchingIterator.peek(PrefetchingIterator.java:60)
at org.neo4j.helpers.collection.PrefetchingIterator.hasNext(PrefetchingIterator.java:46)
at org.neo4j.helpers.collection.NestingIterator.fetchNextOrNull(NestingIterator.java:61)
at org.neo4j.helpers.collection.PrefetchingIterator.peek(PrefetchingIterator.java:60)
at org.neo4j.helpers.collection.PrefetchingIterator.hasNext(PrefetchingIterator.java:46)
at org.neo4j.unsafe.impl.batchimport.staging.IteratorBatcherStep.nextBatchOrNull(IteratorBatcherStep.java:54)
at org.neo4j.unsafe.impl.batchimport.staging.InputIteratorBatcherStep.nextBatchOrNull(InputIteratorBatcherStep.java:42)
at org.neo4j.unsafe.impl.batchimport.staging.ProducerStep.process(ProducerStep.java:73)
at org.neo4j.unsafe.impl.batchimport.staging.ProducerStep$1.run(ProducerStep.java:54)
Caused by: java.lang.IllegalStateException: Tried to read in a value larger than effective buffer size 8388608
at org.neo4j.csv.reader.BufferedCharSeeker.fillBufferIfWeHaveExhaustedIt(BufferedCharSeeker.java:258)
at org.neo4j.csv.reader.BufferedCharSeeker.nextChar(BufferedCharSeeker.java:231)
at org.neo4j.csv.reader.BufferedCharSeeker.seek(BufferedCharSeeker.java:109)
at org.neo4j.unsafe.impl.batchimport.input.csv.InputEntityDeserializer.fetchNextOrNull(InputEntityDeserializer.java:81)
... 10 more

One of the fields probably has a quote character that never gets closed, so the CSV parser keeps reading until it finds the next quote. It's unlikely that you actually have a single field that is 8 MB big, so that's what I'm thinking.
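
One way to confirm that theory is to scan the input for lines with an unbalanced number of double-quote characters. A rough Python sketch (the path is just a placeholder for one of your user files):

# Scan a tab-delimited file for lines containing an odd number of '"' characters,
# which is the usual cause of "value larger than effective buffer size" panics.
path = "/data/weibo/users/000000_0.csv"  # placeholder: point this at each input file
with open(path, encoding="utf-8", errors="replace") as f:
    for lineno, line in enumerate(f, start=1):
        if line.count('"') % 2 != 0:
            print(lineno, line[:120])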

I had the same error, and removing special characters such as "*", "&", and "/" (but keeping the single quotes) was enough to get rid of the error.

I also got "Import error: Executor has been shut down" and "Import error: Panic called, so exiting" errors when I tried to import data into my graph using this method.
My data was free of quote characters (" and ') when I was getting these errors.
What solved my problem was getting rid of all other special characters.
I might have missed something in the documentation, because I thought all the text in my node attributes would be read in as strings. It turns out neo4j-import doesn't like characters like "&" and "/"!
When I edited my data (yay sed!) to contain only alphanumeric characters, the import tool worked perfectly.
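
If sed isn't to hand, a rough Python equivalent of that cleanup is sketched below (input and output paths are placeholders); it keeps only alphanumerics, spaces, and the tab delimiter:

import re

# Drop everything except alphanumerics, spaces, tabs and newlines,
# mirroring the sed-based cleanup described above. Paths are placeholders.
pattern = re.compile(r"[^0-9A-Za-z \t\n]")
with open("users_raw.csv", encoding="utf-8", errors="replace") as src, \
     open("users_clean.csv", "w", encoding="utf-8") as dst:
    for line in src:
        dst.write(pattern.sub("", line))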

Related

Error importing JSONL dataset into Vertex AI

I tried importing a JSONL dataset into Google's Vertex AI and got a weird, seemingly unrelated error:
Error: Could not parse the line, json is invalid or the format does not match the input schema: Cannot find field: classificationAnnotation in message google.cloud.aiplatform.master.schema.ImageBoundingBoxIoFormat. for: gs://[bucketname]/set.jsonl line 10
It happens every 4 lines of the file. All of my lines are identical except that the image name changes.
Line 10:
{"imageGcsUri":"gs://[mybucket]/path/to/image.png","classificationAnnotation":{"displayName":"MyLabel","annotationResourceLabels":{"aiplatform.googleapis.com/annotation_set_name":"MyLabel"}},"dataItemResourceLabels":{"aiplatform.googleapis.com/ml_use":"training"}}
Why am I getting this error?
From the line you are sharing, it seems the image you are trying to access doesn't exist in the bucket you are using, so you should check whether the image is stored under the same name and format that you are referencing.
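
If you want to verify that quickly, a small sketch using the google-cloud-storage client (the bucket name and object path are placeholders taken from your JSONL line, and application-default credentials are assumed):

from google.cloud import storage

# Placeholders: substitute your bucket name and the object path from the failing line.
client = storage.Client()
blob = client.bucket("bucketname").blob("path/to/image.png")
print("exists:", blob.exists())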

Multiaddress utility in front-end-template

I am extending the front-end-template to do some new things and running into issues with the multiaddress lookup functionality specified in our chain here: https://github.com/Greenetwork/BLX_chain/blob/f14ad8705debcc8033069b4fdda046271e1b61f1/pallets/allocator/src/lib.rs#L140
It seems the issue is that we are missing one piece of data that the extrinsic (tradeTokens) wants: the piece that defines which "type" of Source is used for fromapn and toapn.
You can see below that using the dropdown menus to select Address32 in Polkadot-JS Apps yields a 3 in the encoded call data (Address32 is at index 3 in the Source enum),
and further that the 3 is shown as ‘snip< Address32: >snip’.
With the code as it currently stands in our front-end-template:
https://github.com/Greenetwork/BLX_frontend_new/blob/afe6f9f256e6c9c2a3164f72cf80d4a9057bb893/src/Transfer.js#L71-L72
the error is:
Unhandled Rejection (Error): createType(Call):: Call: failed decoding allocator.tradeTokens:: Struct: failed on args: {"asset_id":"AssetId","fromapn":"Source","toapn":"Source","amt":"Balance1"}:: Struct: failed on fromapn: {"_enum":{"Id":"AccountId","Index":"AccountIndex","Raw":"Bytes","Address32":"[u8;32]","Address20":"[u8;20]"}}:: Invalid AccountId provided, expected 32 bytes, found 31
I think the issue is that the extrinsic is reading the first (or second, see the call data in the first picture) value of the data submitted to fromapn as the index of Source; in the current case that is 0, so it proceeds to treat the remainder of the data as an AccountId.
It is failing because one of the values is consumed as the index of Source, leaving only 31 bytes instead of 32 (an AccountId is also 32 bytes, I think).
Do we need another helper function to prepend that extra piece of data which specifies the variant? Or is there another path?
Update: I prepended the Address32 value with an index byte to make it 33 bytes:
https://github.com/Greenetwork/BLX_frontend_new/blob/71a0fbee21c08b2374bddfcdca0fe204d3e36aa7/src/helpers.js#L39-L51
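
For illustration only: a SCALE-encoded enum value is the one-byte variant index followed by the variant's payload, which is why 33 bytes are needed for an Address32. A minimal Python sketch (the address bytes are a placeholder):

# Source enum: Id=0, Index=1, Raw=2, Address32=3, Address20=4.
address32 = bytes(32)             # placeholder 32-byte address
encoded = bytes([3]) + address32  # variant index 3, then the payload: 33 bytes total
assert len(encoded) == 33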

Using field reference in UDP in IIB

I am setting a reference to a field as the value of a UDP (see image). I want to reference the value at this path at runtime. I tried fetching the value using '{}', but it seems '{}' can't resolve a path given in dots.
I then tried fetching the value using the EVAL function. I got stuck here too, as EVAL throws an exception if my input has an odd number of characters (see error message):
SET chrValue = EVAL(LocalTxnID);
Please advise how to read an input that is a reference in a UDP at runtime. Why are the above methods not working?
Looks as if these UDP values are for logging code. I see 'Global transaction id', 'Parent transaction id' and 'Local transaction id'. But IIB has built-in facilities for publishing messages with these fields (https://www.ibm.com/support/knowledgecenter/en/SSMKHH_10.0.0/com.ibm.etools.mft.doc/ac37860_.html). Is there a reason why you are not using those features?
Tried fetching the value using '{}' but seems like '{}' can't resolve path given in dots.
Correct. The {} syntax applies to a single NameExpression. You cannot use it to navigate multiple path segments.
Eval is throwing exception if my input has odd number of characters
The exception is reporting an invalid BLOB literal. That seems like a very strange error for an EVAL. Please check that you have quoted the correct error, and please also supply the exact string that EVAL received (if necessary take a user trace using mqsichangetrace, mqsireadlog, mqsiformatlog).

Automate downloading of multiple xml files from web service with power query

I want to download multiple XML files from a web service API. I have a query that gets a JSON document:
= Json.Document(Web.Contents("http://reports.sem-o.com/api/v1/documents/static-reports?DPuG_ID=BM-086&page_size=100"))
and manipulates it to get a list of file names such as PUB_DailyMeterDataD1_201812041627.xml in a column of an Excel spreadsheet.
I hoped to get a function to run against this list of names to get all the data, so first I worked on one file: PUB_DailyMeterDataD1_201812041627
= Xml.Tables(Web.Contents("https://reports.sem-o.com/documents/PUB_DailyMeterDataD1_201812041627.xml"))
This gets an XML table, which I manipulate to get the data I want (the half-hourly metered MWh for generator GU_401970).
Now I want to change the query into a function to automate the process across all XML files available from the service. The function requires a variable to be substituted for the filename. I tried this as preparation for the function:
let
Filename="PUB_DailyMeterDataD1_201812041627.xml",
Source = (Web.Contents("https://reports.sem-o.com/documents/Filename")),
(followed by the manipulating M code)
This doesn't work.
Then I tried this:
let
Filename="PUB_DailyMeterDataD1_201812041627.xml",
Source = Xml.Tables(Web.Contents("https://reports.sem-o.com/documents/[Filename]")),
I get:
DataFormat.Error: Xml processing failed. Either the input is invalid or it isn't supported. (Internal error: Data at the root level is invalid. Line 1, position 1.)
Details:
Binary
So I'm stuck here. Can you help?
Thanks,
Conor
You concatenate strings with the "&" symbol in Power Query. [Somename] is the format for referencing a field within a table; a normal variable is just referenced with its name. So in your example
let
    Filename = "PUB_DailyMeterDataD1_201812041627.xml",
    Source = Xml.Tables(Web.Contents("https://reports.sem-o.com/documents/" & Filename)),
Would work.
It sounds like you have an existing query that drills down to a list of filenames and you are trying to use that to import them from the URL, though. Assuming the column you got the filenames from is called "Filename", you could add a custom column with this in it:
Xml.Tables(Web.Contents("https://reports.sem-o.com/documents/" & [Filename]))
and it will load the table onto the row for each of the filenames.

SSIS: Data conversion failed from a Flat File Source

Good day.
The following are the errors that occurred while processing the flat file:
Error: 0xC02020A1 at Task, File [1]: Data conversion failed. The data conversion for column "Column 0" returned status value 4 and status text "Text was truncated or one or more characters had no match in the target code page.".
Error: 0xC020902A at Task, File [1]: The "output column "Column 0" (14)" failed because truncation occurred, and the truncation row disposition on "output column "Column 0" (14)" specifies failure on truncation. A truncation error occurred on the specified object of the specified component.
Error: 0xC0202092 at Task, File [1]: An error occurred while processing file "filepath" on data row 1.
Error: 0xC0047038 at Task, SSIS.Pipeline: SSIS Error Code DTS_E_PRIMEOUTPUTFAILED. The PrimeOutput method on component "Retrieve Input Batch File" (1) returned error code 0xC0202092. The component returned a failure code when the pipeline engine called PrimeOutput(). The meaning of the failure code is defined by the component, but the error is fatal and the pipeline stopped executing. There may be error messages posted before this with more information about the failure.
The source file is a flat file.
The Data Type properties for the External Column and Output Column are identical:
Data Type: String [DT_STR]
Length is 1143
I've tried experimenting with the values in the properties, but had no luck. What could be the reason for the error?
In addition, I tested two files. The first file succeeded, while the second did not. The difference between them is that the first one uses DOS\Windows line endings, while the other uses UNIX. Does this affect the behavior of the flat file?
Thank you so much for your input :)
Go to your Flat File connection manager >>> Flat File Source Editor >>> click on Error Output >>> go to the column in question and select Ignore Failure. This worked for me (that is, if the column size is right).
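
Regarding the DOS\Windows vs UNIX question: it can matter. If the connection manager's row delimiter is {CR}{LF} but the file only contains {LF}, the rows are never split and the whole file can land in "Column 0", overflowing its 1143-character length. A quick way to check which delimiter a file actually uses (the path is a placeholder):

# Count CRLF vs bare LF line endings to see whether the file is DOS/Windows or UNIX.
path = "input_batch.txt"  # placeholder for the flat file being imported
with open(path, "rb") as f:
    data = f.read()
crlf = data.count(b"\r\n")
lf_only = data.count(b"\n") - crlf
print("CRLF line endings:", crlf, "LF-only line endings:", lf_only)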
