I'm downloading a ton of Jira issues to generate a report. Currently the 'full data' file has a ton of individual records like this:
{
"key": "645",
"type": "Bug",
"typeid": "1",
"status": "Closed",
"summary": "Crash when saving document",
"closedDate": "2014-10-03T09:01:23.000+0200",
"flag": null,
"fixVersionID": "123",
"fixVersionName": "2.7"
}
However, because I'm downloading multiple versions and appending to the same file, I end up with this kind of structure.
[
{
"key": "645",
"type": "Bug",
"typeid": "1",
"status": "Closed",
"summary": "Crash when saving document",
"closedDate": "2014-10-03T09:01:23.000+0200",
"flag": null,
"fixVersionID": "123",
"fixVersionName": "2.7"
}
]
[
{
"key": "552",
"type": "Bug",
"typeid": "1",
"status": "Closed",
"summary": "Graphical Issue",
"closedDate": "2014-10-13T09:01:23.000+0200",
"flag": null,
"fixVersionID": "456",
"fixVersionName": "2.8"
}
]
What I want to do is count the number of records with a specific date, and then do the same while looping from a start date to an end date, using jq.
But I can't figure out how to:
1. Flatten the records so that they form one array, not two
2. Strip the T09:01:23.000+0200 from the closedDate value
3. Count the number of objects with a specific date value such as 2014-10-13
You have multiple independent inputs. To combine them in any meaningful way, you'll have to slurp them (jq -s), which wraps all the inputs in a single array. You can then merge them into one flat array by adding them with add.
Since the dates are all in a certain fixed format, you can take substrings of the dates.
"2014-10-13T09:01:23.000+0200"[:10] -> "2014-10-13"
Given that, you can then filter by the date you want and count using the length filter.
add | map(select(.closedDate[:10]=="2014-10-13")) | length
e.g.,
$ cat input.json
[
{
"key": "645",
"type": "Bug",
"typeid": "1",
"status": "Closed",
"summary": "Crash when saving document",
"closedDate": "2014-10-03T09:01:23.000+0200",
"flag": null,
"fixVersionID": "123",
"fixVersionName": "2.7"
}
]
[
{
"key": "552",
"type": "Bug",
"typeid": "1",
"status": "Closed",
"summary": "Graphical Issue",
"closedDate": "2014-10-13T09:01:23.000+0200",
"flag": null,
"fixVersionID": "456",
"fixVersionName": "2.8"
}
]
$ jq -s 'add | map(select(.closedDate[:10]=="2014-10-13")) | length' input.json
1
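To cover the second half of the question (counting for every day between a start and an end date), grouping on the date prefix gives all the counts in one pass, so no explicit loop is needed. This is a sketch under the assumption that every closedDate begins with a YYYY-MM-DD prefix; the file name issues.json is made up for the example:

```shell
# Sample input: several independent arrays appended to one file,
# as in the question.
cat > issues.json <<'EOF'
[{"key":"645","closedDate":"2014-10-03T09:01:23.000+0200"}]
[{"key":"552","closedDate":"2014-10-13T09:01:23.000+0200"}]
[{"key":"553","closedDate":"2014-10-13T18:45:00.000+0200"}]
EOF
# Slurp all inputs, flatten with add, then count records per calendar day.
jq -sc 'add | group_by(.closedDate[:10])
            | map({(.[0].closedDate[:10]): length}) | add' issues.json
# => {"2014-10-03":1,"2014-10-13":2}
```

From that per-day object you can read off the count for any date in your range without re-running jq per date.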
For questions 1 and 2:
$ echo -e "[\n$(sed '/^[][]$/d;/closedDate/s/\(T[^"]*\)//g' json)\n]" > flat-json
To count the number for a specific day:
$ grep "closedDate" flat-json | grep "2014-10-13" | wc -l
I had a task where I needed to compare and filter two JSON arrays based on the values of one column in each array, and I used the answer to an earlier question for that.
Now, however, I need to compare the two arrays by matching two, or even three, column values.
I already tried nesting one map inside another, but it isn't working.
The examples can be the ones from the answer I used: compare db.code = file.code, db.name = file.nm and db.id = file.identity.
var db = [
{
"CODE": "A11",
"NAME": "Alpha",
"ID": "C10000"
},
{
"CODE": "B12",
"NAME": "Bravo",
"ID": "B20000"
},
{
"CODE": "C11",
"NAME": "Charlie",
"ID": "C30000"
},
{
"CODE": "D12",
"NAME": "Delta",
"ID": "D40000"
},
{
"CODE": "E12",
"NAME": "Echo",
"ID": "E50000"
}
]
var file = [
{
"IDENTITY": "D40000",
"NM": "Delta",
"CODE": "D12"
},
{
"IDENTITY": "C30000",
"NM": "Charlie",
"CODE": "C11"
}
]
See if this works for you:
%dw 2.0
output application/json
var file = [
{
"IDENTITY": "D40000",
"NM": "Delta",
"CODE": "D12"
},
{
"IDENTITY": "C30000",
"NM": "Charlie",
"CODE": "C11"
}
]
var db = [
{
"CODE": "A11",
"NAME": "Alpha",
"ID": "C10000"
},
{
"CODE": "B12",
"NAME": "Bravo",
"ID": "B20000"
},
{
"CODE": "C11",
"NAME": "Charlie",
"ID": "C30000"
},
{
"CODE": "D12",
"NAME": "Delta",
"ID": "D40000"
},
{
"CODE": "E12",
"NAME": "Echo",
"ID": "E50000"
}
]
---
file flatMap(v) -> (
db filter (v.IDENTITY == $.ID and v.NM == $.NAME and v.CODE == $.CODE)
)
I used flatMap instead of map to flatten the result; plain map would give you an array of arrays in the output. The flattened form is cleaner, unless you expect the possibility of multiple matches per file entry, in which case I'd stick with map.
You can compare objects in DW directly, so the solution you linked can be modified to the following:
%dw 2.0
import * from dw::core::Arrays
output application/json
var db = [
{
"CODE": "A11",
"NAME": "Alpha",
"ID": "C10000"
},
{
"CODE": "B12",
"NAME": "Bravo",
"ID": "B20000"
},
{
"CODE": "C11",
"NAME": "Charlie",
"ID": "C30000"
},
{
"CODE": "D12",
"NAME": "Delta",
"ID": "D40000"
},
{
"CODE": "E12",
"NAME": "Echo",
"ID": "E50000"
}
]
var file = [
{
"IDENTITY": "D40000",
"NM": "Delta",
"CODE": "D12"
},
{
"IDENTITY": "C30000",
"NM": "Charlie",
"CODE": "C11"
}
]
---
db partition (e) -> file contains {IDENTITY:e.ID,NM:e.NAME,CODE:e.CODE}
You can make use of filter directly, combined with contains:
db filter(value) -> file contains {IDENTITY: value.ID, NM: value.NAME, CODE: value.CODE}
This filters the db array based on whether file contains the object {IDENTITY: value.ID, NM: value.NAME, CODE: value.CODE}. However, this will not work if the objects in the file array have other fields that are not part of the comparison. In that case, you can change the filter condition to check whether an object exists in the file array (using a filter selector) for which the condition holds:
db filter(value) -> file[?($.IDENTITY==value.ID and $.NM == value.NAME and $.CODE == value.CODE)] != null
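For quick experiments outside a Mule runtime, the same multi-column intersection can be sketched with jq on the command line. This is my own translation, not part of the DataWeave answers, and the file names db.json and file.json are assumptions:

```shell
# db.json holds the full set; file.json holds the rows to match against.
cat > db.json <<'EOF'
[{"CODE":"A11","NAME":"Alpha","ID":"C10000"},
 {"CODE":"C11","NAME":"Charlie","ID":"C30000"},
 {"CODE":"D12","NAME":"Delta","ID":"D40000"}]
EOF
cat > file.json <<'EOF'
[{"IDENTITY":"D40000","NM":"Delta","CODE":"D12"},
 {"IDENTITY":"C30000","NM":"Charlie","CODE":"C11"}]
EOF
# Keep db rows for which some file row matches on all three columns.
jq -c --slurpfile f file.json '
  [ .[] | . as $e
        | select(any($f[0][]; .IDENTITY == $e.ID
                          and .NM      == $e.NAME
                          and .CODE    == $e.CODE)) ]' db.json
```

The extra-field caveat above does not arise here, since the comparison names each column explicitly instead of comparing whole objects.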
I have a JSON file that I want to convert into a CSV file using jq in a shell script. I want to create a single row from this entire JSON file, extracting the value from each values array. The row output should be something like
null,642,642,412,0,null,null
Here is my JSON file
{
"data": [
{
"name": "exits",
"period": "lifetime",
"values": [
{
"value": {}
}
],
"title": "Exits",
"description": "Number of times someone exited the carousel"
},
{
"name": "impressions",
"period": "lifetime",
"values": [
{
"value": 642
}
],
"title": "Impressions",
"description": "Total number of times the media object has been seen"
},
{
"name": "reach",
"period": "lifetime",
"values": [
{
"value": 412
}
],
"title": "Reach",
"description": "Total number of unique accounts that have seen the media object"
},
{
"name": "replies",
"period": "lifetime",
"values": [
{
"value": 0
}
],
"title": "Replies",
"description": "Total number of replies to the carousel"
},
{
"name": "taps_forward",
"period": "lifetime",
"values": [
{
"value": {}
}
],
"title": "Taps Forward",
"description": "Total number of taps to see this story's next photo or video"
},
{
"name": "taps_back",
"period": "lifetime",
"values": [
{
"value": {}
}
],
"title": "Taps Back",
"description": "Total number of taps to see this story's previous photo or video"
}
]
}
I tried using this jq command:
.data | map(.values[].value) | @csv
This is giving the following output:
jq: error (at :70): object ({}) is not valid in a csv row
exit status 5
So when an empty JSON object comes up, it triggers that error. Please help!
The row output should be something like
null,642,642,412,0,null,null
Using length==0 here is dubious at best. To check for {} one could write:
jq '.data | map(.values[].value | if . == {} then "null" else . end) | @csv'
Similarly for [].
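Building on that comment, here is a sketch that maps both empty objects and empty arrays to "null" before joining. The tostring step is my own addition to keep join happy across versions, since join over raw numbers only works from jq 1.6 (as noted in the answer below); the file name stats.json is made up:

```shell
# Minimal input with an empty object, a number, and an empty array.
cat > stats.json <<'EOF'
{"data":[{"values":[{"value":{}}]},
         {"values":[{"value":642}]},
         {"values":[{"value":[]}]}]}
EOF
# Normalize {} and [] to "null", stringify everything, then join.
jq -r '.data | map(.values[].value
              | if . == {} or . == [] then "null" else . end
              | tostring)
             | join(",")' stats.json
# => null,642,null
```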
If you run the command without the @csv part, you will see that the output is:
[
{},
642,
412,
0,
{},
{}
]
By replacing the empty objects (type == "object" and length == 0) with "null":
jq '.data | map(.values[].value) | map(if (type == "object" and length == 0 ) then "null" else . end) | #csv'
Output:
"\"null\",642,412,0,\"null\",\"null\""
Per a suggestion from @aaron (see comment), the following can produce the requested output without extra post-processing. Disclaimer: this does not work with my jq 1.5, but it works on jqplay with jq 1.6.
jq --raw-output '.data | map(.values[].value) | map(if (type == "object" and length == 0 ) then "null" else . end) | join(",")'
Output:
null,642,412,0,null,null
I've got two Apache Avro schemas (essentially JSON): one being a "common" part shared across many schemas, and another one being an extension of it. I'm looking for a way to merge them in a shell script.
base.avsc
{
"type": "record",
"fields": [
{
"name": "id",
"type": "string"
}
]
}
schema1.avsc
{
"name": "schema1",
"namespace": "test",
"doc": "Test schema",
"fields": [
{
"name": "property1",
"type": [
"null",
"string"
],
"default": null,
"doc": "Schema 1 specific field"
}
]
}
jq -s '.[0] * .[1]' base.avsc schema1.avsc doesn't merge the fields arrays for me:
{
"type": "record",
"fields": [
{
"name": "property1",
"type": [
"null",
"string"
],
"default": null,
"doc": "Schema 1 specific field"
}
],
"name": "schema1",
"namespace": "test",
"doc": "Test schema"
}
I don't expect the same keys to occur in both "fields" arrays. And "type": "record", could be moved into schema1.avsc if that makes it easier.
An expected result should be something like this (the order of the keys doesn't make a difference)
{
"name": "schema1",
"namespace": "test",
"doc": "Test schema",
"type": "record",
"fields": [
{
"name": "property1",
"type": [
"null",
"string"
],
"default": null,
"doc": "Schema 1 specific field"
},
{
"name": "id",
"type": "string"
}
]
}
Can't figure out how to write an expression in jq for what I want.
You need the addition (+) operator to merge the two objects, and then combine the fields arrays from both files:
jq -s '.[0] as $o1 | .[1] as $o2 | ($o1 + $o2) |.fields = ($o2.fields + $o1.fields) ' base.avsc schema1.avsc
Answer adopted from pkoppstein's comment on this GitHub post Merge arrays in two json files.
The jq manual says this under the addition operator +
Objects are added by merging, that is, inserting all the key-value pairs from both objects into a single combined object. If both objects contain a value for the same key, the object on the right of the + wins. (For recursive merge use the * operator.)
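A minimal illustration of that difference (my own example, not from the post):

```shell
# Shallow merge: the right-hand value for "a" replaces the left one.
jq -nc '{"a":{"x":1}} + {"a":{"y":2}}'   # => {"a":{"y":2}}
# Recursive merge: the inner objects are merged key by key.
jq -nc '{"a":{"x":1}} * {"a":{"y":2}}'   # => {"a":{"x":1,"y":2}}
```

This is why the accepted filter uses + for the top level but splices the fields arrays explicitly: * alone would merge array elements positionally rather than concatenating them.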
Here's a concise solution that avoids "slurping":
jq --argfile base base.avsc '
$base + .
| .fields += ($base|.fields)
' schema1.avsc
Or you could go with brevity:
jq -s '
.[0].fields as $f | add | .fields += $f
' base.avsc schema1.avsc
As an alternative solution, you may consider handling hierarchical JSON with jtc, a walk-path based Unix utility.
The ask here is merely a recursive merge, which with jtc looks like this:
bash $ <schema1.avsc jtc -mi base.avsc
{
"doc": "Test schema",
"fields": [
{
"default": null,
"doc": "Schema 1 specific field",
"name": "property1",
"type": [
"null",
"string"
]
},
{
"name": "id",
"type": "string"
}
],
"name": "schema1",
"namespace": "test",
"type": "record"
}
bash $
PS. Disclosure: I'm the creator of jtc, a shell CLI tool for JSON operations.
In JMESPath with this query:
people[].{"index":#.index,"name":name, "state":state.name}
On this example data:
{
"people": [
{
"name": "a",
"state": {"name": "up"}
},
{
"name": "b",
"state": {"name": "down"}
},
{
"name": "c",
"state": {"name": "up"}
}
]
}
I get:
[
{
"index": null,
"name": "a",
"state": "up"
},
{
"index": null,
"name": "b",
"state": "down"
},
{
"index": null,
"name": "c",
"state": "up"
}
]
How do I get the index property to actually have the index of the array? I realize that #.index is not the correct syntax but have not been able to find a function that would return the index. Is there a way to include the current array index?
Use case
Use JMESPath query syntax to extract the numeric index of the current array element from a series of array elements.
Pitfalls
As of this writing (2019-03-22), this feature is not part of the standard JMESPath specification.
Workaround
This is possible when running JMESPath from within any of various programming languages; however, it must be done outside of JMESPath itself.
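If switching tools is an option, jq does expose array indices, so the projection from the question can be approximated like this (a sketch for comparison only; the file name people.json is an assumption):

```shell
cat > people.json <<'EOF'
{"people":[{"name":"a","state":{"name":"up"}},
           {"name":"b","state":{"name":"down"}},
           {"name":"c","state":{"name":"up"}}]}
EOF
# to_entries on an array yields {key: <index>, value: <element>} pairs,
# which makes the position of each element addressable.
jq -c '[.people | to_entries[]
        | {index: .key, name: .value.name, state: .value.state.name}]' people.json
```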
This is not exactly the form you requested, but I have a possible answer for you:
people[].{"name":name, "state":state.name} | merge({count: length(#)}, #[*])
This request gives this result:
{
"0": {
"name": "a",
"state": "up"
},
"1": {
"name": "b",
"state": "down"
},
"2": {
"name": "c",
"state": "up"
},
"count": 3
}
So each attribute of this object has an index, except the last one, count, which just gives the number of attributes. If you want to iterate over the attributes of the object with a loop, for example, you can, because the count attribute tells you how many attributes there are to visit.
I would like to get the values of the name fields of the following text, using sed, awk, grep or similar.
{
"cast": [
{
"character": "",
"credit_id": "52532e3119c29579400012b5",
"gender": null,
"id": 23629,
"name": "Brian O'Halloran",
"order": 0,
"profile_path": "/eJsLxovTdcm6QK9PDB2pCe5FMqK.jpg"
},
{
"character": "",
"credit_id": "52532e3119c2957940001315",
"gender": null,
"id": 19302,
"name": "Jason Mewes",
"order": 1,
"profile_path": "/so3nT2vgSCZMO2QXDVHF3ubxaFX.jpg"
},
{
"character": "",
"credit_id": "52532e3119c295794000133d",
"gender": null,
"id": 23630,
"name": "Jeff Anderson",
"order": 2,
"profile_path": "/vjt5WhpJAx0jxvmiGc5PAOBzzb7.jpg"
},
{
"character": "Silent Bob",
"credit_id": "52532e3219c2957940001359",
"gender": null,
"id": 19303,
"name": "Kevin Smith",
"order": 4,
"profile_path": "/3XXThSMqHQgQFjM4bMJ25U1EJTj.jpg"
}
],
"crew": [
{
"credit_id": "55425dbe9251410efa000094",
"department": "Visual Effects",
"gender": null,
"id": 1419667,
"job": "Animation Manager",
"name": "Richard J. Gasparian",
"profile_path": null
},
{
"credit_id": "5544521dc3a3680ce60037e8",
"department": "Art",
"gender": null,
"id": 1450356,
"job": "Background Designer",
"name": "Tristin Cole",
"profile_path": null
},
{
"credit_id": "554a142dc3a3683c84001851",
"department": "Art",
"gender": null,
"id": 1447432,
"job": "Background Designer",
"name": "Nadia Vurbenova",
"profile_path": null
},
{
"credit_id": "554bcd2b9251414692002c9b",
"department": "Production",
"gender": null,
"id": 1447493,
"job": "Location Manager",
"name": "Simon Rodgers",
"profile_path": null
},
{
"credit_id": "52532e3219c29579400013cd",
"department": "Production",
"gender": null,
"id": 19303,
"job": "Executive Producer",
"name": "Kevin Smith",
"profile_path": "/3XXThSMqHQgQFjM4bMJ25U1EJTj.jpg"
},
{
"credit_id": "52532e3319c2957940001405",
"department": "Production",
"gender": null,
"id": 59839,
"job": "Producer",
"name": "Harvey Weinstein",
"profile_path": "/k4UCnh7n0r5CEjq30gAl6QCfF9g.jpg"
},
{
"credit_id": "52532e3319c29579400014a7",
"department": "Production",
"gender": null,
"id": 1307,
"job": "Producer",
"name": "Bob Weinstein",
"profile_path": "/oe5Oxp034UOubnvZqqhurp6a1EP.jpg"
}
],
"id": 2
}
jq is the right tool for processing JSON data.
To get all name key values:
jq '[.cast[], .crew[] | .name]' file
The output:
[
"Brian O'Halloran",
"Jason Mewes",
"Jeff Anderson",
"Kevin Smith",
"Richard J. Gasparian",
"Tristin Cole",
"Nadia Vurbenova",
"Simon Rodgers",
"Kevin Smith",
"Harvey Weinstein",
"Bob Weinstein"
]
To get just a list of strings, use the following:
jq '.cast[], .crew[] | .name' file
The output:
"Brian O'Halloran"
"Jason Mewes"
"Jeff Anderson"
"Kevin Smith"
"Richard J. Gasparian"
"Tristin Cole"
"Nadia Vurbenova"
"Simon Rodgers"
"Kevin Smith"
"Harvey Weinstein"
"Bob Weinstein"
To get a list of unquoted strings, add the -r (--raw-output) option:
jq -r '.cast[], .crew[] | .name' file
Another jq approach:
jq '.[]?|.[]?|.name' file
"Brian O'Halloran"
"Jason Mewes"
"Jeff Anderson"
"Kevin Smith"
"Richard J. Gasparian"
"Tristin Cole"
"Nadia Vurbenova"
"Simon Rodgers"
"Kevin Smith"
"Harvey Weinstein"
"Bob Weinstein"
Note: The .foo? form (see the manual) outputs no error when . is not an array or an object.
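A quick demonstration of that suppression (my own illustration, not from the original answer):

```shell
# With ?, indexing a scalar yields no output and no error (exit 0).
echo '123' | jq '.name?'
# Without ?, the same lookup fails with a non-zero exit status.
echo '123' | jq '.name' || echo "jq reported an error"
```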
Hello buddy, with awk just do:
awk '/name/{gsub("[\",]*", "");print $2}' yourFile.txt
Best regards!! ;)
Try the following awk solutions as well.
1st approach: in case you need to keep the double quotes around the name values.
awk -F'[:,]' '/name/{sub(/^ +/,"",$2);print $2}' Input_file
2nd approach: in case you need only the name values themselves.
awk -F'[":,]' '/name/{print $5}' Input_file
Explanation of 1st approach: simply make : and , the field separators, then look for the string name in a line; if it is present, strip the leading spaces from the 2nd field and print the 2nd field's value.
Explanation of 2nd approach: make ", :, and , the field separators, then search for the string name in a line; if it is present, print that line's 5th field.
You should parse the JSON file rather than use a regex.
You can use Ruby to do:
$ ruby -0777 -r json -lane '
d=JSON.parse($_)
(d["cast"]+d["crew"]).each { |x| p x["name"] }' json
"Brian O'Halloran"
"Jason Mewes"
"Jeff Anderson"
"Kevin Smith"
"Richard J. Gasparian"
"Tristin Cole"
"Nadia Vurbenova"
"Simon Rodgers"
"Kevin Smith"
"Harvey Weinstein"
"Bob Weinstein"
Or if you want to separate cast from crew:
$ ruby -0777 -r json -lane '
d=JSON.parse($_)
%w(cast crew).each {|w|
puts "#{w.capitalize}:"
(d[w]).each { |x| puts "\t#{x["name"]}" }}' json
Cast:
Brian O'Halloran
Jason Mewes
Jeff Anderson
Kevin Smith
Crew:
Richard J. Gasparian
Tristin Cole
Nadia Vurbenova
Simon Rodgers
Kevin Smith
Harvey Weinstein
Bob Weinstein