Nested JSON to Parquet

Consider a nested JSON document with the following format:
{
  "f1": "v1",
  "f2": 1,
  "f3": [{
    "a": "b",
    "c": "d"
  }, {
    "e": "f",
    "g": "h"
  }],
  "f4": {
    "f5": "v5",
    "f6": {
      "f7": [1, 2, 3, 4, 5, 6],
      "f8": "v8"
    }
  }
}
If I don't flatten it and instead use Glue ETL to convert it to Parquet directly, is that the correct approach? Would storing the nested maps/arrays in Parquet columns this way be ideal?
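For comparison, here is a minimal Python sketch of what flattening this record could look like. The `flatten` helper and its dotted-key naming are hypothetical, chosen only for illustration; this is not how Glue names columns. It shows one practical cost of flattening arrays by position: every array element turns into its own column.

```python
def flatten(value, prefix=""):
    """Flatten nested dicts/lists into one level with dotted keys.

    List elements use their position as a key segment (e.g. "f3.0.a").
    This is a hypothetical scheme for illustration, not Glue's behaviour.
    """
    flat = {}
    if isinstance(value, dict):
        for key, child in value.items():
            flat.update(flatten(child, f"{prefix}{key}."))
    elif isinstance(value, list):
        for index, child in enumerate(value):
            flat.update(flatten(child, f"{prefix}{index}."))
    else:
        # Leaf value: drop the trailing "." from the accumulated prefix.
        flat[prefix[:-1]] = value
    return flat


# The record from the question.
record = {
    "f1": "v1",
    "f2": 1,
    "f3": [{"a": "b", "c": "d"}, {"e": "f", "g": "h"}],
    "f4": {"f5": "v5", "f6": {"f7": [1, 2, 3, 4, 5, 6], "f8": "v8"}},
}

flat = flatten(record)
for column in flat:
    print(column)
```

Here f7 alone becomes six columns (f4.f6.f7.0 through f4.f6.f7.5), and a record with a longer array would add more, so the flat schema shifts with the data. Parquet can store structs, lists and maps natively, which avoids this; whether the nested schema Glue infers suits your queries is still worth checking.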

Related

How to filter data on whether a field is present in an object in GraphQL

Below is my sample data, which is stored in Dgraph:
{
  "data": {
    "comp": [
      {
        "topologytemplate": {
          "namespace": "a",
          "nodetemplates": [
            {
              "name": "a"
            },
            {
              "name": "b"
            }
          ]
        }
      },
      {
        "topologytemplate": {
          "namespace": "c",
          "nodetemplates": [
            {
              "name": "a"
            },
            {
              "name": "b",
              "directives": [
                "a"
              ]
            }
          ]
        }
      }
    ]
  }
}
I want to filter the data so that the result only contains entries that do not have a "directives" field. How can I do this with a GraphQL query?
Currently, I am trying to filter data as follows:
query {
  comp(func: eq(dgraph.type, "QQQ")) {
    name
    topologytemplate {
      nodetemplates @filter(eq(nodetypename, "a")) {
        name
        directives
      }
    }
  }
}
How can I query whether the directives field is present or not in a nodetemplate?
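To make the intended result concrete, here is a minimal Python sketch that applies the filter client-side to the sample data above, keeping only the nodetemplates without a directives field. It is plain Python, not a Dgraph client, and `templates_without` is a hypothetical helper. On the server side, Dgraph's query language has a `has()` function that can be negated inside a filter (for example `nodetemplates @filter(NOT has(directives))`), which is worth checking against your Dgraph version and schema.

```python
# Sample data from the question, as a Python dict (trailing commas removed).
data = {
    "data": {
        "comp": [
            {"topologytemplate": {"namespace": "a", "nodetemplates": [
                {"name": "a"}, {"name": "b"}]}},
            {"topologytemplate": {"namespace": "c", "nodetemplates": [
                {"name": "a"}, {"name": "b", "directives": ["a"]}]}},
        ]
    }
}


def templates_without(data, field):
    """Return (namespace, name) for each nodetemplate that lacks `field`."""
    result = []
    for comp in data["data"]["comp"]:
        topology = comp.get("topologytemplate", {})
        for node in topology.get("nodetemplates", []):
            if field not in node:
                result.append((topology.get("namespace"), node.get("name")))
    return result


print(templates_without(data, "directives"))
# [('a', 'a'), ('a', 'b'), ('c', 'a')]
```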

How to access json fields with Jolt Transform?

How do I access JSON fields with a Jolt transform?
For example, I have this JSON:
{
  "a": 110,
  "b": 10
}
I would like to have:
{
  "a": 110,
  "b": 10,
  "c": 100 // 110 - 10 (subtraction)
}
The following transformation adds a field c that is set to a - b:
[
  {
    "operation": "shift",
    "spec": {
      "a": "a",
      "b": "b"
    }
  },
  {
    "operation": "modify-default-beta",
    "spec": {
      "c": "=intSubtract(@(1,a), @(1,b))"
    }
  }
]
If you wish to test it, the Jolt demo website is an excellent resource. Put your original JSON into the "JSON Input" box:
{
  "a": 110,
  "b": 10
}
Then place the transformation spec from the top of this answer into the "JOLT Spec" box and hit the Transform button. The result should be as you desired:
{
  "a" : 110,
  "b" : 10,
  "c" : 100
}
You can just use a single modify-overwrite-beta transformation with the intSubtract function to add an extra element to the current JSON value:
[
  {
    "operation": "modify-overwrite-beta",
    "spec": {
      "c": "=intSubtract(@(1,a), @(1,b))"
    }
  }
]
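Both specs compute c from its siblings: in a modify spec, `@(1,a)` walks one level up from the c key to the enclosing object and reads a from it. As a rough model of what the two answers do, here is a minimal Python sketch. `apply_modify` and the lambda-based spec are hypothetical stand-ins for Jolt's string syntax, not Jolt's API, and the `overwrite` flag only approximates the difference between modify-overwrite-beta and modify-default-beta.

```python
from typing import Any, Callable, Dict

# Hypothetical stand-in for a Jolt modify spec: each output key maps to a
# function of the enclosing object, mirroring "=intSubtract(@(1,a),@(1,b))".
Spec = Dict[str, Callable[[Dict[str, Any]], Any]]


def apply_modify(doc: Dict[str, Any], spec: Spec, overwrite: bool = True) -> Dict[str, Any]:
    """Apply a flat, top-level modify spec to `doc` without mutating it.

    overwrite=True  approximates modify-overwrite-beta (always write the key).
    overwrite=False approximates modify-default-beta (only fill keys that are
    not already set; treated here as missing or None).
    """
    result = dict(doc)
    for key, fn in spec.items():
        if overwrite or result.get(key) is None:
            result[key] = fn(doc)
    return result


# c = a - b, with both operands read from the same (parent) object.
subtract_spec: Spec = {"c": lambda parent: int(parent["a"]) - int(parent["b"])}

print(apply_modify({"a": 110, "b": 10}, subtract_spec))
# {'a': 110, 'b': 10, 'c': 100}
```

With overwrite=False an existing c is left alone; for this input c is not present beforehand, which is why both answers produce the same output.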

How to extract more than one field from json in Nifi?

I have a JSON payload like this:
{
  "id": "",
  "name": "",
  "A": {...},
  "B": {...},
  "C": {...}
}
I want to extract the A, B and C fields, each together with the id and name fields, as separate records, like this:
{
  "id": "",
  "name": "",
  "A": {...}
}
{
  "id": "",
  "name": "",
  "B": {...}
}
{
  "id": "",
  "name": "",
  "C": {...}
}
I'm using record-based processors, but I don't know how to do this in NiFi with them.
EvaluateJsonPath is probably what you're looking for. You can add JSONPath expressions whose results are written either to FlowFile attributes or to the FlowFile content.
http://jsonpath.com/ is a handy web tool to test your expressions.
If you want to use record-based processors, JoltTransformRecord will do the trick. Set Jolt Transformation DSL to Chain and Jolt Specification to:
[
  {
    "operation": "shift",
    "spec": {
      "id": "id",
      "name": "name",
      "*": {
        "@": "array.&"
      }
    }
  },
  {
    "operation": "shift",
    "spec": {
      "array": {
        "*": {
          "@(2,id)": "[#2].id",
          "@(2,name)": "[#2].name",
          "@": "[#2].&"
        }
      }
    }
  }
]
The first shift moves the unique elements (A, B and C) into an object named array, separating them from the common keys; the second shift copies the common keys into each of those elements while turning array into a top-level array.
Then, if you also want them as separate FlowFiles, you can split the array with SplitRecord.
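The reshaping performed by the chained shift can be modelled in a few lines of Python, which may help when checking the output of JoltTransformRecord. This is a minimal sketch, not NiFi code: `split_record` is a hypothetical helper, id and name are assumed to be the only common keys, and the payload values are placeholders because the question elides A, B and C.

```python
from typing import Any, Dict, List, Sequence

COMMON_KEYS = ("id", "name")


def split_record(record: Dict[str, Any],
                 common_keys: Sequence[str] = COMMON_KEYS) -> List[Dict[str, Any]]:
    """Emit one record per non-common key, each carrying the common keys.

    Mirrors the chained shift: collect the unique keys, then copy the common
    keys into every element of the resulting top-level array.
    """
    common = {key: record[key] for key in common_keys if key in record}
    return [
        {**common, key: value}
        for key, value in record.items()
        if key not in common_keys
    ]


# Hypothetical placeholder values; the question elides A, B and C as {...}.
payload = {"id": "1", "name": "example", "A": {"x": 1}, "B": {"y": 2}, "C": {"z": 3}}

for rec in split_record(payload):
    print(rec)
```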

JMESPath current array index

In JMESPath, with this query:
people[].{"index":@.index,"name":name, "state":state.name}
On this example data:
{
  "people": [
    {
      "name": "a",
      "state": {"name": "up"}
    },
    {
      "name": "b",
      "state": {"name": "down"}
    },
    {
      "name": "c",
      "state": {"name": "up"}
    }
  ]
}
I get:
[
  {
    "index": null,
    "name": "a",
    "state": "up"
  },
  {
    "index": null,
    "name": "b",
    "state": "down"
  },
  {
    "index": null,
    "name": "c",
    "state": "up"
  }
]
How do I get the index property to actually contain the element's index in the array? I realize that @.index is not the correct syntax, but I have not been able to find a function that returns the index. Is there a way to include the current array index?
Use-case
Use JMESPath query syntax to extract the numeric index of the current element from a series of array elements.
Pitfalls
As of this writing (2019-03-22), this feature is not part of the standard JMESPath specification.
Workaround
This is possible when JMESPath is run from within a programming language, but the index has to be added by the host language, outside of JMESPath itself.
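As an example of that workaround, here is a minimal Python sketch that performs the same projection as the query and takes the index from `enumerate`, i.e. from the host language rather than from JMESPath. `project_with_index` is a hypothetical helper; missing keys become None, roughly as JMESPath yields null.

```python
from typing import Any, Dict, List

# Example data from the question.
data = {
    "people": [
        {"name": "a", "state": {"name": "up"}},
        {"name": "b", "state": {"name": "down"}},
        {"name": "c", "state": {"name": "up"}},
    ]
}


def project_with_index(people: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Same projection as people[].{name: name, state: state.name}, plus the
    element's position, which the host language knows but JMESPath does not."""
    return [
        {
            "index": i,
            "name": person.get("name"),
            "state": (person.get("state") or {}).get("name"),
        }
        for i, person in enumerate(people)
    ]


for row in project_with_index(data["people"]):
    print(row)
```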
This is not exactly the form you requested, but here is a possible answer:
people[].{"name":name, "state":state.name} | merge({count: length(@)}, @[*])
This query gives the following result:
{
  "0": {
    "name": "a",
    "state": "up"
  },
  "1": {
    "name": "b",
    "state": "down"
  },
  "2": {
    "name": "c",
    "state": "up"
  },
  "count": 3
}
Each attribute of this object has an index as its key, except the last one, count, which just holds the number of indexed attributes. So if you want to loop over the attributes of the object, you can, because count tells you how many attributes there are to visit.

Elasticsearch: search by sub-collection value

I need help with a specific ES query.
I have objects in an Elasticsearch index. Here is an example of one of them (Participant):
{
  "_id": null,
  "ObjectID": 6008,
  "EventID": null,
  "IndexName": "crmws",
  "version_id": 66244,
  "ObjectData": {
    "PARTICIPANTTYPE": "2",
    "STATE": "ACTIVE",
    "EXTERNALID": "01010111",
    "CREATORID": 1006,
    "partAttributeList": [
      {
        "SYSNAME": "A",
        "VALUE": "V1"
      },
      {
        "SYSNAME": "B",
        "VALUE": "V2"
      },
      {
        "SYSNAME": "C",
        "VALUE": "V2"
      }
    ],
    ....
I need to find only the entities that match on a single partAttributeList element. For example, the whole Participant entity where SYSNAME=A and VALUE=V1 occur in the same element of partAttributeList.
If I use the usual matches:
{"match": {"ObjectData.partAttributeList.SYSNAME": "A"}},
{"match": {"ObjectData.partAttributeList.VALUE": "V1"}}
then of course I will find more objects than I really need. Here is an example of an unwanted object that would also be found:
...
  {
    "SYSNAME": "A",
    "VALUE": "X"
  },
  {
    "SYSNAME": "B",
    "VALUE": "V1"
  }
...
As I understand it, you are trying to search multiple fields of the same object for exact matches of a piece of text, so please try this:
https://www.elastic.co/guide/en/elasticsearch/guide/current/multi-query-strings.html
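The reason the two match clauses above also hit the unwanted object is that Elasticsearch, unless a field is mapped with the `nested` type, flattens an array of objects so that each field becomes its own list of values, and the link between a SYSNAME and its VALUE is lost. The minimal Python sketch below models the two behaviours over partAttributeList. It is only a model: it uses exact equality instead of full-text `match` analysis, and the function names are hypothetical, not Elasticsearch APIs. In Elasticsearch itself, per-element matching is what the `nested` field type and the `nested` query are for.

```python
from typing import Dict, List

Item = Dict[str, str]


def flattened_match(items: List[Item], criteria: Item) -> bool:
    """Models an object array without `nested` mapping: each field is matched
    against all values of that field, independently of the other fields."""
    return all(
        any(item.get(field) == value for item in items)
        for field, value in criteria.items()
    )


def nested_match(items: List[Item], criteria: Item) -> bool:
    """Models a `nested` query: every criterion must hold in one element."""
    return any(
        all(item.get(field) == value for field, value in criteria.items())
        for item in items
    )


criteria = {"SYSNAME": "A", "VALUE": "V1"}
wanted = [{"SYSNAME": "A", "VALUE": "V1"}, {"SYSNAME": "B", "VALUE": "V2"},
          {"SYSNAME": "C", "VALUE": "V2"}]
redundant = [{"SYSNAME": "A", "VALUE": "X"}, {"SYSNAME": "B", "VALUE": "V1"}]

print(flattened_match(wanted, criteria), nested_match(wanted, criteria))
# True True
print(flattened_match(redundant, criteria), nested_match(redundant, criteria))
# True False
```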
