Say I have a custom file format with some defined logical structure, and I'd like to "unmarshall" the "objects" it contains. How can I use Java 8 Streams to unmarshall those objects from the file in parallel?
Is this unreasonable? If so, can you explain a more reasonable approach?
Is this not possible? If not, is it possible outside of Java 8, for example in Java 9 or Scala? Can you provide an example?
[
abc:123,
xy:"yz",
s12:13,
],
...
[
abc:1
s:133,
]
It seems that "Parallel message unmarshalling from a token delimited input stream with Java8 stream API" is asking something similar, though not necessarily from a file perspective. It wasn't entirely clear to me, but I think the conclusion there was that it isn't possible in Java 8.
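One approach that often comes up for this kind of problem is to split the file into per-object text blocks sequentially (the record boundaries have to be discovered by scanning the text anyway) and then parse the blocks with a parallel stream. The sketch below is only an illustration under those assumptions: the file name objects.dat, the split regex, and MyObject/unmarshall() are hypothetical stand-ins for the real format and parser.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class ParallelUnmarshaller {

    public static void main(String[] args) throws IOException {
        String content = new String(Files.readAllBytes(Paths.get("objects.dat")));

        // Splitting has to happen sequentially, because record boundaries are only
        // known after scanning the text. The regex here is a naive stand-in for
        // whatever actually delimits objects in the real format.
        List<String> blocks = Arrays.stream(content.split("\\],"))
                .map(String::trim)
                .filter(s -> !s.isEmpty())
                .collect(Collectors.toList());

        // Parsing each block is independent of the others, so it parallelises cleanly.
        List<MyObject> objects = blocks.parallelStream()
                .map(ParallelUnmarshaller::unmarshall)
                .collect(Collectors.toList());

        System.out.println(objects.size() + " objects unmarshalled");
    }

    private static MyObject unmarshall(String block) {
        // Parse the key:value pairs of one object here.
        return new MyObject();
    }

    static class MyObject { }
}
```

The important point is that only the per-record parsing runs in parallel; discovering where each record starts and ends stays sequential unless the format has fixed-size or indexable records.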
This is my first time using Ruby. I'm writing an application that parses data and performs some calculations based on it; the source of the data is a JSON file. I'm aware I can use JSON.parse() here, but I'm trying to write my program so that it will work with other sources of data. Is there a clear-cut way of doing this? Thank you.
When your source file is JSON, use JSON.parse. Do not implement a JSON parser on your own. If the source file is a CSV, use the CSV class.
When your application should be able to read multiple different formats, add one reader class for each data type, like JSONReader, CSVReader, etc., and then decide based on the file extension which reader to use to read the file (see the sketch below).
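A minimal sketch of that reader-per-format idea (shown in Java here purely to illustrate the structure; the class names are hypothetical, and the same shape translates directly to Ruby classes):

```java
import java.util.Locale;

// One reader per data format, chosen by file extension. All names are illustrative.
interface DataReader {
    Object read(String path) throws Exception;
}

class JsonReader implements DataReader {
    public Object read(String path) throws Exception {
        // Delegate to an existing JSON library here rather than hand-rolling a parser.
        return null;
    }
}

class CsvReader implements DataReader {
    public Object read(String path) throws Exception {
        // Delegate to an existing CSV library here.
        return null;
    }
}

class ReaderFactory {
    static DataReader forFile(String path) {
        String lower = path.toLowerCase(Locale.ROOT);
        if (lower.endsWith(".json")) return new JsonReader();
        if (lower.endsWith(".csv")) return new CsvReader();
        throw new IllegalArgumentException("Unsupported format: " + path);
    }
}
```

The calculation code then depends only on the reader interface, so adding a new source format means adding one class rather than touching the rest of the program.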
I am trying to read a CSV from the local file system, convert the content into JSON format using Apache NiFi, and put the resulting JSON file back on the local system. I have succeeded in converting the first row of the CSV file but not the other rows. What am I missing?
Input:
1,aaa,loc1
2,bbb,loc2
3,ccc,loc3
My NiFi workflow is here:
http://www.filedropper.com/mycsvtojson
My output is below, which is the desired format, but I want that to happen for all of the rows.
{ "id" : "1", "name" : "aaa",
"location" : "loc1" }
There are a few different ways this could be done...
A custom Java processor that reads in a CSV and converts to JSON
Using the ExecuteScript processor to do something similar in a Groovy/Jython script
Use SplitText to split your original CSV into single lines, then use your current approach with ExtractText and ReplaceText, and then a MergeContent to merge back together
Use ConvertCsvToAvro and then ConvertAvroToJson
Although the last option makes an extra conversion through Avro, it might be the easiest solution, requiring almost no work.
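For reference, this is roughly the core conversion the first two options would perform, written here as a standalone Java sketch rather than against the NiFi processor API. The input/output file names and the id/name/location field names are assumptions based on the sample data, and Jackson's ObjectMapper is just one way to emit the JSON:

```java
import com.fasterxml.jackson.databind.ObjectMapper;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Turn each CSV line into a JSON object with id/name/location fields,
// then write all rows out as a single JSON array.
public class CsvToJson {
    public static void main(String[] args) throws IOException {
        ObjectMapper mapper = new ObjectMapper();
        List<String> lines = Files.readAllLines(Paths.get("input.csv"));

        List<Map<String, String>> records = lines.stream()
                .map(line -> line.split(","))
                .map(parts -> {
                    Map<String, String> record = new LinkedHashMap<>();
                    record.put("id", parts[0]);
                    record.put("name", parts[1]);
                    record.put("location", parts[2]);
                    return record;
                })
                .collect(Collectors.toList());

        Files.write(Paths.get("output.json"),
                mapper.writeValueAsString(records).getBytes());
    }
}
```

Inside NiFi itself, the same logic would live in a custom processor's onTrigger() method or in the script body of ExecuteScript.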
This question is a bit older, but there is now a ConvertRecord processor in NiFi 1.3 and newer, which should be able to handle this conversion directly for you. It also avoids having to split up the data, since it can create a single JSON array with all of the values, if that is desirable.
As the name suggests, I'm looking for some tool which will convert the existing data from hadoop sequence file to json format.
My initial googling has only turned up results related to jaql, which I'm desperately trying to get to work.
Is there any tool from Apache available for this very purpose?
NOTE:
I have a Hadoop sequence file sitting on my local machine and would like to get the data in the corresponding JSON format.
So, in effect, I'm looking for a tool/utility that takes a Hadoop sequence file as input and produces output in JSON format.
Thanks
Apache Hadoop might be a good tool for reading sequence files.
All kidding aside, though, why not write the simplest possible Java Mapper program that uses, say, Jackson to serialize each key and value pair it sees? That would be a pretty easy program to write.
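A rough sketch of that Mapper idea, assuming the sequence file holds Text keys and Text values (a real file may use other Writable types, so the generics and toString() calls would need adjusting):

```java
import java.io.IOException;
import java.util.LinkedHashMap;
import java.util.Map;

import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

import com.fasterxml.jackson.databind.ObjectMapper;

// Map-only job: each key/value pair from the sequence file becomes one line of JSON.
public class SeqToJsonMapper extends Mapper<Text, Text, Text, NullWritable> {

    private static final ObjectMapper JSON = new ObjectMapper();

    @Override
    protected void map(Text key, Text value, Context context)
            throws IOException, InterruptedException {
        Map<String, String> record = new LinkedHashMap<>();
        record.put("key", key.toString());
        record.put("value", value.toString());
        context.write(new Text(JSON.writeValueAsString(record)), NullWritable.get());
    }
}
```

The driver would use SequenceFileInputFormat as the input format and zero reducers, so each record is written straight out as a line of JSON text.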
I thought there must be some tool that does this, given that it's such a common requirement. Yes, it should be pretty easy to code, but then again, why do that if something already exists that does exactly the same thing?
Anyway, I figured out how to do it using jaql. Here is a sample query that worked for me:
read({type: 'hdfs', location: 'some_hdfs_file', inoptions: {converter: 'com.ibm.jaql.io.hadoop.converter.FromJsonTextConverter'}});
I'm following the http://antlr3.org/api/C/buildrec.html tutorial.
It's my understanding that in order to remove/alter tokens before they are consumed by the parser, I have to use the non-buffered stream COMMON_TREE_NODE_STREAM.
Given that, how should I feed the parser?
Currently I use tstream = antlr3CommonTokenStreamSourceNew(ANTLR3_SIZE_HINT, TOKENSOURCE(lxr));
to "feed" the parser.
I'd appreciate any advice.
No, the COMMON_TREE_NODE_STREAM is the source for a tree parser, not the normal parser. The ANTLR_TOKEN_STREAM is the input stream for the normal parser, and it has a default implementation in the C runtime known as ANTLR3_COMMON_TOKEN_STREAM_struct. Look up its implementation to learn how to create your own token stream.
We need to do some queries of a MongoDB database from BASH shell scripts. Using eval and Mongo's printjson() gives me text output, but it needs to be parsed. Using other scripting languages (Python, Ruby, Erlang, etc.) is not an option.
I looked at JSON.sh (a JSON parser written as a BASH script library: https://github.com/rcrowley/json.sh), and it appears to be close to a solution, except that it does not recognize data types that are BSON but not JSON. Before I try to modify it to recognize the BSON data types, is anyone aware of an existing solution?
Thanks.
Update 10/11: Below, Stennie notes that I have received an answer in the MongoDB User Group and provides a URL. The answer is very nice and complete, and begins, "MongoDB actually uses what we call Mongo Extended JSON which differs a bit from the vanilla JSON standard...", so I will have to modify the parser. Thanks to all.
Do you perhaps want to use tojson() rather than printjson(), and loop through the result to parse the fields?