neo4j csv import sign of life - shell

I am importing data with the new Neo4j version (2.1.1) that allows for CSV import. The import in question is dealing with bigrams.
The csv file looks like this:
$ head ~/filepath/w2.csv
value,w1,w2,
275,a,a
31,a,aaa
29,a,all
45,a,an
I am pasting this into the neo4j-shell client to load the CSV:
neo4j-sh (?)$ USING PERIODIC COMMIT
> LOAD CSV WITH HEADERS FROM "file:/Users/code/Downloads/w2.csv" AS line
> MERGE (w1:Word {value: line.w1})
> MERGE (w2:Word {value: line.w2})
> MERGE (w1)-[:LINK {value: line.value}]->(w2);
The problem is that the shell now hangs and I have no idea what it is doing. I have checked in the interactive online environment and it does not seem to have loaded any data. It seems unlikely that I have not hit a periodic commit point yet, as the shell has been running for half an hour now.
Is there any way for me to get a sign of life from the CSV loader? I would like to see some intermediate results to help me debug what is going on. A solution to my current situation is welcome, but I am specifically interested in a way to debug the CSV loader.

I don't have an answer for the debugging/logging part.
But here is a hint that might make your query a bit faster.
Do you have an index on :Word(value)?
You can try adding one:
CREATE INDEX ON :Word(value);
UPDATE:
If you want to follow the import process, you can watch the disk size of the graph.db directory. It gives you a rough idea of the progress.
On a unix machine:
du -s ~/neo4j-community-2.1.2/data/graph.db/

How to read a SQL table on NiFi?

I am trying to create a basic flow on NiFi:
read a table from SQL
process it in Python
write back to another table in SQL
It is as simple as that.
But I am facing issues when I try to read the data in Python.
As far as I can tell, I need to use sys.stdin/stdout.
For now it only reads and writes, as below.
import sys
import pandas as pd

# Read the records NiFi passes in on stdin and write them back out on stdout
file = pd.read_csv(sys.stdin)
file.to_csv(sys.stdout, index=False)
Below you can find the processor properties and the error message (screenshots of QueryDatabaseTableRecord, ExecuteStreamCommand, and PutDatabaseRecord), but I don't think they are the issue.
There's a much easier way to do this if you're running 1.12.0 or newer: ScriptedTransformRecord. It's like ExecuteScript except it works on a per-record basis. This is what a simple Groovy script for it looks like:
// Split the record's FullName field into FirstName and LastName
def fullName = record.getValue("FullName")
def nameParts = fullName.split(/[\s]{1,}/)
record.setValue("FirstName", nameParts[0])
record.setValue("LastName", nameParts[1])
record
It's a new processor, so there's not much documentation on it yet aside from the (very good) documentation bundled with it, and samples might be sparse at the moment. If you want to use it and run into issues, feel free to join the nifi-users mailing list and ask for more detailed help.

Misinformation in DataStage XML Export

As the title suggests, I am just trying to do a simple export of a DataStage job. The issue occurs when we export the XML and begin examining it. For some reason, the wrong information is being pulled from the job and placed in the XML.
As an example the SQL in a transform of the job may be:
SELECT V1,V2,V3 FROM TABLE_1;
Whereas the XML for the same transform may produce:
SELECT V1,Y6,Y9 FROM TABLE_1,TABLE_2;
It makes no sense to me how the export of a job could be different from the actual architecture.
The parameters I am using to export are:
Exclude Read Only Items: No
Include Dependent Items: Yes
Include Source Code with Routines: Yes
Include Source Code with Job Executable: Yes
Include Source Content with Data Quality Specifications: No
What tool are you using to view the XML? Try using something less smart, such as Notepad or WordPad. This will determine whether the problem is with your XML viewer, or rule that out.
You might also try exporting in DSX format and examining that output, to see whether the same symptoms are visible there.
Thank you all for the feedback. I realized that the issue wasn't necessarily with the XML. It had to do with numerous factors within our DataStage environment. As mentioned above, the data connections were old and unreliable. For some reason this does not impact our current production refresh, so it's a non-issue.
The other issue was the way that the generated SQL and custom SQL options work when creating the XML. In my case, there were times when old code was kept in the system, but the option was switched from custom code to SQL generated from the columns. This led to inconsistent output from my script, so the mini project was scrapped.

Is there a way to change the projection in a topojson file?

I am trying to create a topojson file projected using geoAlbersUsa, originating from the US Census's ZCTA (Zip Codes, essentially) shapefile. I was able to successfully get through the examples in the excellent https://medium.com/@mbostock/command-line-cartography-part-1-897aa8f8ca2c using the specified maps, and now I'm trying to get the same result using the Zip Code-level shapefiles.
I keep running into various issues due to the size of the file and the length of the strings within the file. While I have been able to create a geojson file and a topojson file, I haven't been able to give it the geoAlbersUsa projection I want. I was hoping to find something to convert the current topojson file into a topojson file with a geoAlbersUsa projection but I haven't been able to find any way.
I know this can be done programmatically in the browser, but everything I've read indicates that performance will be significantly better if as much as possible can be done in the files themselves first.
Attempt 1: I was able to convert the ZCTA-level shapefile to a geojson file successfully using shp2json (as in Mike Bostock's example) but when I try to run geoproject (from d3-geo-projection) I get errors related to excessive string length. In node (using npm) I installed d3-geo-projection (npm install -g d3-geo-projection) then ran the following:
geoproject "d3.geoAlbersUsa()" < us_zips.geojson > us_zips_albersUsa.json
I get errors stating "Error: Cannot create a string longer than 0x3fffffe7 characters"
Attempt 2: I used ogr2ogr (https://gdal.org/programs/ogr2ogr.html) to create the geojson file (instead of shp2json), then tried to run the same geoproject code as above and got the same error.
Attempt 3: I used ogr2ogr to create a geojson sequence file (instead of a regular geojson file), then ran geo2topo to create the topojson file from the geojson sequence. While this succeeded in creating the topojson file, it still doesn't include the geoAlbersUsa projection in the resulting file.
I get from the rather obtuse documentation of ogr2ogr that an output projection can be specified using -a_srs but I can't for the life of me figure out how to specify something that would get me the geoAlbersUsa projection. I found this reference https://spatialreference.org/ref/sr-org/44/ but I think that would get me the Albers and it may chop off Alaska and Hawaii, which is not what I want.
Any suggestions here? I was hoping I'd find a way to change the projection in the topojson file itself since that would avoid the excessively-long-string issue I seem to run into whenever I try to do anything in node that requires the use of the geojson file. It seems like possibly that was something that could be done in earlier versions of topojson (see Ways to project topojson?) but I don't see any way to do it now.
Not quite an answer, but more than a comment...
So, I Googled just "0x3fffffe7" and found this comment on a random GitHub/NodeJS project, and based on reading it, my gut feeling is that the Node and/or D3 tooling you're using is reducing your entire ZCTA-level shapefile down to... a single string stored in memory! That's not good for a continent-scale map with such granular detail.
Moreover, the person who left that comment suggested that the OP in that case would need a different approach to introduce their dataset to the client (which I suppose is a browser?). In your case, might it work if you query out each state's collection of zips into its own shapefile (ogr2ogr can do this using OGR-SQL)? That would give you 50 different shapefiles. Then for each of these, run them through your conversions to get json/geoalbers. To test this concept, try exporting just one state and see if everything else works as expected.
That being said, I'm concerned that your approach to this project has an unworkable UI/architectural expectation: I just don't think you can put that much geodata in a browser DIV! How big is the DIV, full screen I hope?!?
My advice would be to think of a different way to present the data. For example, an inset DIV to "select your state"; clicking a state zooms the main DIV to a larger view of that state and simultaneously pulls down and renders that state-specific ZCTA-level data, using the 50 files you prepped with the strategy I mentioned above. Does that make sense?
Here's a quick example for how I expect you can apply the OGR_SQL to your scenario, adapt to fit:
ogr2ogr idaho_zcta.shp USA_zcta.shp -sql "SELECT * FROM USA_zcta WHERE STATE_NAME = 'ID'"
Parameters as follows:
idaho_zcta.shp < this is your new file
USA_zcta.shp < this is your source shapefile
-sql < this signals the OGR_SQL query expression
As for the query itself, a couple of tips. First, wrap the whole query string in double quotes. If something weird happens, try adding leading and trailing spaces to the start and end of your query, like:
" SELECT ... 'ID' "
It's odd I know, but I once had a situation where it only worked that way.
Second, relative to the query, the table name is the same as the shapefile name, only without the ".shp" file extension. I can't remember whether there is case-sensitivity between the shapefile name and the query string's table name. If you run into a problem, give the shapefile an all-lowercase name and use lowercase in the SQL, too.
As for your projection conversion, you're on your own there. That geoAlbersUsa looks like it's not an industry standard (i.e. EPSG-coded) projection; it's D3-specific, intended exclusively for a browser. So ogr2ogr isn't going to handle it. But I agree with the strategy of converting the data in advance. Hopefully the conversion pipeline you already researched will work if you just have much smaller (i.e. state-scale) datasets to put through it.
Good luck.

read a .fit file on Linux

How can I read a Garmin .fit file on Linux? I'd like to use it for some data analysis, but the file is binary.
I have visited http://garmin.kiesewetter.nl/ but the website does not seem to work.
Thanks
You can use GPSbabel to do this. It's a command-line tool, so you end up with something like:
gpsbabel -i garmin_fit -f {filename}.fit -o csv -F {output filename}.csv
and you'll get a text file with all the lat/long coordinates.
What's trickier is getting out other data, i.e. if you want speed, time, or other information from the .fit file. You can easily get those into a .gpx, where they're in XML and human-readable, but I haven't yet found a single-line solution for getting that data into a csv.
The company that created ANT made an SDK package available here:
https://www.thisisant.com/resources/fit
When you unzip this, there is a java/FitCSVTool.jar file. Then:
java -jar java/FitCSVTool.jar -b input.fit output.csv
I tested with a couple of files and it seems to work really well. Then of course the format of the csv can be a little bit complex.
For example, latitude and longitude are stored in semicircles, so a raw value should be multiplied by 180/(2^31) to give coordinates in degrees.
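For illustration, here is a minimal sketch of that conversion (written in Go just to show the arithmetic; the helper name and the sample value are made up for the example, and the actual column names in the tool's CSV output may differ):
package main

import "fmt"

// semicirclesToDegrees converts a FIT semicircle value (a signed 32-bit
// integer where 2^31 semicircles equal 180 degrees) to decimal degrees.
func semicirclesToDegrees(s int32) float64 {
    return float64(s) * (180.0 / (1 << 31))
}

func main() {
    // Hypothetical raw position_lat value taken from the CSV output.
    fmt.Println(semicirclesToDegrees(622585920)) // roughly 52.18 degrees
}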
You need to convert the file to a .csv; the Garmin repair tool at http://garmin.kiesewetter.nl/ will do this for you. I've just loaded the site fine, so try again; it may have been temporarily down.
To add a little more detail:
"FIT or Flexible and Interoperable Data Transfer is a file format used for GPS tracks and routes. It is used by newer Garmin fitness GPS devices, including the Edge and Forerunner." From the OpenStreetMap Wiki http://wiki.openstreetmap.org/wiki/FIT
There are many tools to convert these files to other formats for different uses; which one you choose depends on the use. GPSBabel is another converter tool that may help: gpsbabel.org (I can't post two links yet :)
This page parses the file and lets you download it as tables: https://www.fitfileviewer.com/ The fun bit is converting the timestamps from numbers to readable timestamps (see: Garmin .fit file timestamp).

How to split a large csv file into multiple files in GO lang?

I am a novice Go programmer, trying to learn the language's features. I want to split a large csv file into multiple files in Go, each file containing the header. How do I do this? I have searched everywhere but couldn't find the right solution. Any help in this regard will be greatly appreciated.
Also, please suggest a good book for reference.
Thank you.
Depending on your shell fu, this problem might be better suited to common shell utilities, but you specifically mentioned Go.
Let's think through the problem.
How big is this csv file? Are we talking 100 lines or is it 5 GB?
If it's smallish I typically use this:
http://golang.org/pkg/io/ioutil/#ReadFile
However, this package also exists:
http://golang.org/pkg/encoding/csv/
Regardless - let's return to the abstraction of the problem. You have a header (which is the first line) and then the rest of the document.
So what we probably want to do (if ignoring csv for the moment) is to read in our file.
Then we want to split the file body by all the newlines in it.
You can use this to do so:
http://golang.org/pkg/strings/#Split
You didn't mention it, but do you know how many files you want to split into, or would you rather split by line count or byte count? What's the actual limitation here?
Generally it's not going to be file count, but if we pretend it is, we simply want to divide our line count by our expected file count to give lines per file.
Now we can take slices of the appropriate size and write the file back out via:
http://golang.org/pkg/io/ioutil/#WriteFile
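Putting those pieces together, here is a rough sketch of what that could look like for a smallish file (just an illustration of the ReadFile / Split / WriteFile approach described above; the input path, output names, and part count are placeholders):
package main

import (
    "fmt"
    "io/ioutil"
    "strings"
)

// splitCSV reads the whole file, keeps the first line as the header, slices
// the remaining lines into roughly equal chunks, and writes each chunk back
// out with the header prepended.
func splitCSV(path string, parts int) error {
    data, err := ioutil.ReadFile(path)
    if err != nil {
        return err
    }
    lines := strings.Split(strings.TrimRight(string(data), "\n"), "\n")
    header, body := lines[0], lines[1:]

    perFile := (len(body) + parts - 1) / parts // lines per output file, rounded up
    for i := 0; i < parts; i++ {
        start := i * perFile
        if start >= len(body) {
            break
        }
        end := start + perFile
        if end > len(body) {
            end = len(body)
        }
        out := header + "\n" + strings.Join(body[start:end], "\n") + "\n"
        name := fmt.Sprintf("part_%d.csv", i+1) // placeholder naming scheme
        if err := ioutil.WriteFile(name, []byte(out), 0644); err != nil {
            return err
        }
    }
    return nil
}

func main() {
    // Split large.csv into 5 pieces, each starting with the original header line.
    if err := splitCSV("large.csv", 5); err != nil {
        fmt.Println(err)
    }
}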
A trick I sometimes use to help think me through these things is to write down our mission statement.
"I want to split a large csv file into multiple files in go"
Then I start breaking that up into pieces but take the divide/conquer approach - don't try to solve the entire problem in one go - just break it up to where you can think about it.
Also - make gratuitous use of pseudo-code until you can comfortably write the real code itself. Sometimes it helps to just write a short comment inline with how you think the code should flow, and then get it down to the smallest portion that you can code and work from there.
By the way - many of the golang.org packages have example links where you can literally run in your browser the example code and cut/paste that to your own local environment.
Also, I know I'll catch some haters with this - but as for books - imo - you are going to learn a lot faster just by trying to get things working rather than reading. Action trumps passivity always. Don't be afraid to fail.
Here is a package that might help. You can set the necessary chunk size in bytes and the file will be split into the appropriate number of chunks.
