I have a jq command which I am trying to parallelise using GNU parallel but for some reason I am not able to get it to work.
The vanilla jq query is:
jq --raw-output '._id as $id | ._source.CitationTextHeader.Article.AuthorList[]? | .Affiliation.Affiliation | [ $id, .[0:rindex(" Electronic address:")] ] | @csv' results.json > test.out
I have tried to use it with parallel like so:
parallel -j0 --keep-order --spreadstdin "jq --raw-output '._id as $id | ._source.CitationTextHeader.Article.AuthorList[]? | .Affiliation.Affiliation | [ $id, .[0:rindex(" Electronic address:")] ] | @csv'" < results.json > test.json
but I get some bizarre compile error:
jq: error: syntax error, unexpected '|', expecting '$' or '[' or '{' (Unix shell quoting issues?) at <top-level>, line 1:
._id as | ._source.CitationTextHeader.Article.AuthorList[]? | .Affiliation.Affiliation | [ , .[0:rindex( Electronic address:)] ] | @csv
jq: 1 compile error
I think it does not like something about the quoting inside the string, but the error is a bit unhelpful.
UPDATE
Looking at other threads, I managed to construct this:
parallel -a results.json --results test.json -q jq -r '._id as $id | ._source.CitationTextHeader.Article.AuthorList[]? | .Affiliation.Affiliation | [ $id, .[0:rindex(" Electronic address:")] ] | @csv'
but now it complains:
parallel: Error: Command line too long (76224 >= 63664) at input 0:
:(
An example (the first line) of the JSON file:
{
"_index": "corpuspm",
"_type": "_doc",
"_id": "6786777",
"_score": 1,
"_source": {
"CitationTextHeader": {
"Article": {
"AuthorList": [
{
"Affiliation": {
"Affiliation": "title, society, American Pediatric Society. address#hotmail.com."
}
}
]
}
}
}
}
results.json is a large file containing one JSON object per line
You could use --spreadstdin and -n1 to spread the input line by line into your jq filter. Without knowing the structure of your input JSONs, I have just copied over your "vanilla" filter:
< results.json > test.out parallel -j0 -n1 -k --spreadstdin 'jq -r '\''
._id as $id | ._source.CitationTextHeader.Article.AuthorList[]?
| .Affiliation.Affiliation | [$id, .[0:rindex(" Electronic address:")]]
| @csv
'\'
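As an aside, the compile error in the question is a shell-quoting effect: inside the outer double quotes the shell expands the (unset) `$id` and strips the inner double quotes before jq ever sees the program. A minimal illustration of what jq actually received:

```shell
# $id is unset in the shell, so it expands to nothing,
# and the inner double quotes are consumed by the outer ones
unset id
printf '%s\n' "._id as $id | @csv"
# → ._id as  | @csv
```

This is exactly the mangled program echoed back in the error message, which is also why jq hints at "Unix shell quoting issues?".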
Without more info this will be a guess:
doit() {
jq --raw-output '._id as $id | ._source.CitationTextHeader.Article.AuthorList[]? | .Affiliation.Affiliation | [ $id, .[0:rindex(" Electronic address:")] ] | @csv'
}
export -f doit
cat results.json | parallel --pipe doit > test.out
It reads blocks of ~1 MB from results.json which it passes to doit.
If that works, you may be able to speed up the processing with:
parallel --block -1 -a results.json --pipepart doit > test.out
It will split results.json on the fly into n parts (where n = number of CPU threads). Each part is piped into doit. The overhead of this is quite small.
Add --keep-order if you need the output to be in the same order as input.
If your disks are slow and your CPU is fast, this may be even faster:
parallel --lb --block -1 -a results.json --pipepart doit > test.out
It will buffer in RAM instead of in tempfiles. --keep-order will, however, not be useful here because the output from job 2 will only be read after job 1 is done.
I am new to Bash and I am currently trying to get the content of a JSON file without showing the key names.
This is what the JSON looks like (part of it):
[
{
"V1": 65,
"V2": "Female",
"V3": 0.7,
"V4": 0.1,
"V5": 187,
"V6": 16,
"V7": 18,
"V8": 6.8,
"V9": 3.3,
"V10": 0.9,
"Class": 1
},
{
"V1": 62,
"V2": "Male",
"V3": 10.9,
"V4": 5.5,
"V5": 699,
"V6": 64,
"V7": 100,
"V8": 7.5,
"V9": 3.2,
"V10": 0.74,
"Class": 1
},
{
"V1": 62,
"V2": "Male",
"V3": 7.3,
"V4": 4.1,
"V5": 490,
"V6": 60,
"V7": 68,
"V8": 7,
"V9": 3.3,
"V10": 0.89,
"Class": 1
}
]
This is my script
#!/bin/bash
echo "Albumin =3";
echo "Age Sex Albumin Proteins";
echo "******";
echo " "
echo "Women";
echo "--------------";
cat csvjson.json | jq -c '.[] | {V1, V2, V8, V9} | select(.V9 ==3) | select(.V2 =="Female")';
echo " "
echo "Men";
echo "-------------";
cat csvjson.json | jq -c '.[] | {V1, V2, V8, V9} | select(.V9 ==3) | select(.V2 =="Male")';
This is what the script shows
Women
--------------
{"V1":38,"V2":"Female","V8":5.6,"V9":3}
{"V1":38,"V2":"Female","V8":5.6,"V9":3}
{"V1":32,"V2":"Female","V8":6,"V9":3}
{"V1":31,"V2":"Female","V8":6,"V9":3}
{"V1":19,"V2":"Female","V8":5.5,"V9":3}
{"V1":38,"V2":"Female","V8":7,"V9":3}
{"V1":20,"V2":"Female","V8":6.1,"V9":3}
{"V1":32,"V2":"Female","V8":7,"V9":3}
{"V1":42,"V2":"Female","V8":6.7,"V9":3}
Men
-------------
{"V1":72,"V2":"Male","V8":7.4,"V9":3}
{"V1":60,"V2":"Male","V8":6.3,"V9":3}
{"V1":33,"V2":"Male","V8":5.4,"V9":3}
{"V1":60,"V2":"Male","V8":6.8,"V9":3}
{"V1":60,"V2":"Male","V8":7.4,"V9":3}
{"V1":60,"V2":"Male","V8":7,"V9":3}
{"V1":72,"V2":"Male","V8":6.2,"V9":3}
And this is what I want to show
Women
--------------
38,Female,3, 5.6
38,Female,3, 5.6
32,Female,3, 6
31,Female,3, 6
19,Female,3, 5.5
38,Female,3, 7
20,Female,3, 6.1
32,Female,3, 7
42,Female,3, 6.7
Men
--------------
72,Male,3, 7.4
60,Male,3, 6.3
33,Male,3, 5.4
60,Male,3, 6.8
60,Male,3, 7.4
60,Male,3, 7
72,Male,3, 6.2
So, how can I hide the keys and only show the values after the filters I applied?
This can be accomplished entirely within jq (although some constraints are not all clear, so please comment and I will update the code):
jq --raw-output '
group_by(.V2)[]
| if first.V2 == "Male" then "Men" else "Women" end,
"--------------",
(
.[]
| select(.V9 == 3.3) # this filters to matching records
| [.V1, .V2, .V9, .V8]
| join(",")
)
' csvjson.json
A stand-alone jq script version of pmf's answer, with a code-block language highlight for use here on Stack sites:
#!/usr/bin/env -S jq --raw-output --from-file
group_by(.V2)[]
| if first.V2 == "Male" then "Men" else "Women" end,
"--------------",
(
.[]
| select(.V9 == 3.3) # this filters to matching records
| [.V1, .V2, .V9, .V8]
| join(",")
)
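Note that `join(",")` emits the values bare, whereas the `@csv` builtin would additionally quote the strings. A quick comparison on a hand-made array (values taken from the sample data; jq 1.6+ casts the numbers for `join`):

```shell
echo '[38,"Female",3,5.6]' | jq -r 'join(",")'
# → 38,Female,3,5.6
echo '[38,"Female",3,5.6]' | jq -r '@csv'
# → 38,"Female",3,5.6
```

Since the desired output here has no quotes around the strings, `join(",")` is the better fit.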
I am unable to pass a variable in the tag-user cli command.
A=$(aws iam list-user-tags --user-name user --query 'Tags[].{Key:Key,Value:Value}' | grep -B2 "Description" | grep Value | awk -F ":" '{print $2}' | tr -d '",'| awk '$1=$1')
aws iam list-user-tags --user-name user --query 'Tags[].{Key:Key,Value:Value}' | grep -B2 "Description" | grep Value
"Value": "Used for SSO",
A="Used for SSO"
Passing the value of A to the CLI below:
aws iam tag-user --user-name azure-sso-user --tags "[{"Key": "own:team","Value": "test@test.com"},{"Key": "security","Value": "Service"},{"Key": "comment","Value": "$A"}]"
This is the error I get:
Error parsing parameter '--tags': Invalid JSON:
[{Key: own:team,Value: test@test.com},{Key: security,Value: Service},{Key: own:comment,Value: Used
This worked:
aws iam tag-user --user-name user --tags '[{"Key": "own:team","Value": "test@test.com"},{"Key": "security","Value": "Service"},{"Key": "own:comment","Value": "'"$A"'"}]'
That is, using the following:
[
{
"Key": "own:team",
"Value": "test#test.com"
},
{
"Key": "security",
"Value": "Service"
},
{
"Key": "own:comment",
"Value": "'"
$A
"'"
}
]
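An alternative worth considering (a sketch, reusing the tag values from the question) is to let jq build the JSON itself via `--arg`, which sidesteps the quote splicing entirely:

```shell
A="Used for SSO"
# jq --arg passes $A in safely, escaping it as needed
tags=$(jq -cn --arg comment "$A" '[
  {Key: "own:team",    Value: "test@test.com"},
  {Key: "security",    Value: "Service"},
  {Key: "own:comment", Value: $comment}
]')
# then hand the generated JSON to the CLI:
# aws iam tag-user --user-name user --tags "$tags"
```

This way the tag value can contain spaces, quotes, or any other character without breaking the command line.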
I am running the AWS CLI command:
aws cloudwatch get-metric-statistics --metric-name CPUUtilization --start-time 2010-02-20T12:00:00 --end-time 2010-02-20T15:00:00 --period 60 --namespace AWS/EC2 --extended-statistics p80 --dimensions Name=InstanceId,Value=i-0b123423423
the output comes as
{
"Label": "CPUUtilization",
"Datapoints": [
{
"Timestamp": "2020-02-20T12:15:00Z",
"Unit": "Percent",
"ExtendedStatistics": {
"p80": 0.16587132264856133
}
},
How do I get the output in the below formats? First, two columns:
19.514049550078127 2020-02-13T20:15:00Z
12.721997782508938 2020-02-13T19:15:00Z
13.318820949213313 2020-02-13T18:15:00Z
15.994192991030545 2020-02-13T17:15:00Z
18.13096421299414 2020-02-13T16:15:00Z
With a heading of CPUUtilization (2 columns):
CPUUtilization
19.514049550078127 2020-02-13T20:15:00Z
12.721997782508938 2020-02-13T19:15:00Z
13.318820949213313 2020-02-13T18:15:00Z
15.994192991030545 2020-02-13T17:15:00Z
18.13096421299414 2020-02-13T16:15:00Z
And in a single column:
19.514049550078127
12.721997782508938
13.318820949213313
15.994192991030545
18.13096421299414
How can I achieve this?
Assuming the input file is input.json, then:
To output in the 2 columns format:
jq -r '.Datapoints[] | [.ExtendedStatistics.p80, .Timestamp] | #tsv' input.json | sort -nr
With Heading as CPUUtilization (2 columns):
echo CPUUtilization; jq -r '.Datapoints[] | [.ExtendedStatistics.p80, .Timestamp] | #tsv' input.json | sort -nr
And in single column:
jq -r '.Datapoints[] | [.ExtendedStatistics.p80] | #tsv' input.json | sort -nr
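For the single-column case the array/`@tsv` wrapping is not strictly required; extracting the value directly gives the same result (shown here with a couple of hand-made datapoints standing in for input.json):

```shell
# two sample datapoints standing in for the real CloudWatch output
cat > input.json <<'EOF'
{"Datapoints":[{"Timestamp":"2020-02-13T20:15:00Z","ExtendedStatistics":{"p80":19.5}},
               {"Timestamp":"2020-02-13T19:15:00Z","ExtendedStatistics":{"p80":12.7}}]}
EOF
jq -r '.Datapoints[].ExtendedStatistics.p80' input.json | sort -nr
# → 19.5
# → 12.7
```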
This question already has answers here:
jq not working on tag name with dashes and numbers
(2 answers)
Closed 4 years ago.
Whole file: https://1drv.ms/u/s!AizscpxS0QM4hJpEkp12VPHiKO_gBg
Using this command I get the part below (the latest job):
jq '.|[ .executions[] | select(.job.name != null) | select(.job.name) ]
| sort_by(.id)
| reverse
| .[0] ' 1.json
{
"argstring": null,
"date-ended": {
"date": "2018-04-03T17:43:38Z",
"unixtime": 1522777418397
},
"date-started": {
"date": "2018-04-03T17:43:34Z",
"unixtime": 1522777414646
},
"description": "",
"executionType": "user",
"failedNodes": [
"172.30.61.88"
],
"href": "http://172.30.61.88:4440/api/21/execution/126",
"id": 126,
"job": {
"averageDuration": 4197,
"description": "",
"group": "",
"href": "http://172.30.61.88:4440/api/21/job/271cbcec-5042-4d52-b794-ede2056b2ab8",
"id": "271cbcec-5042-4d52-b794-ede2056b2ab8",
"name": "aa",
"permalink": "http://172.30.61.88:4440/project/demo/job/show/271cbcec-5042-4d52-b794-ede2056b2ab8",
"project": "demo"
},
"permalink": "http://172.30.61.88:4440/project/demo/execution/show/126",
"project": "demo",
"status": "failed",
"user": "administrator"
I managed to extract the job name and status; now I want to get date-ended.date:
jq '.|[ .executions[] |select(.job.name != null) | select(.job.name) ]
| sort_by(.id)
| reverse
| .[0]
| "\(.status), \(.job.name)"' 1.json
With the "-r" command-line option, the following filter:
[.executions[] | select(.job.name != null)]
| sort_by(.id)
| reverse
| .[0]
| [.status, .job.name, ."date-ended".date]
| #csv
produces:
"failed","aa","2018-04-03T17:43:38Z"
An important point that you might have missed is that "-" is a "special" character in that it can signify negation or subtraction.
If your jq does not support the syntax ."date-ended".date, then you could fall back to the basic syntax: (.["date-ended"] | .date)
I guess you have troubles extracting .date-ended.date because the name contains a dash that is interpreted by jq as subtraction.
The solution is listed in the documentation:
If the key contains special characters, you need to surround it with double quotes like this: ."foo$", or else .["foo$"].
This means the last filter of your jq program should be:
"\(.status), \(.job.name), \(."date-ended".date)"