grep multiple results randomly in bash - bash

I'm making a query to a REST API, and this is the result I got:
{ "meta": { "query_time": 0.004266858, "pagination": { "offset": 0, "limit": 00, "total": 4 }, "powered_by": "device-api", "trace_id": "foo" }, "resources": [ "foo/bar", "foo/bar/2", "foo/bar/3", "foo/bar/4" ], "errors": [] }
I want to take results only from the resources key, like this:
"resources": [
"foo/bar",
"foo/bar/2",
"foo/bar/3",
"foo/bar/4"
],
Can we share some knowledge? Thanks a lot!
PS: these results from resources are random

Don't use grep or other regular expression tools to parse JSON. JSON is structured data and should be processed by a tool designed to read JSON. On the command line jq is a great tool for this purpose. There are many powerful JSON libraries written in other languages if jq isn't what you need.
Once you've extracted the data you care about, you can use the shuf utility to select random lines, e.g. shuf -n 5 would sample five random lines from the input.
With the JSON you've provided this appears to do what I think you want:
jq --raw-output '.resources[]' | shuf -n 2
You may need to tweak the jq syntax slightly if the real JSON has a different structure.
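For example, a minimal end-to-end sketch, assuming the response shown above has been saved to a file named response.json (a hypothetical name):
# extract the resources array as plain lines, then sample two at random
jq --raw-output '.resources[]' response.json | shuf -n 2
Each run prints a different random pair of lines from the resources array, e.g. foo/bar/3 and foo/bar.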

Related

Custom JSON output formatting with JQ

I'd like to use jq to format the output of some known keys in my objects more succinctly.
Sample object:
// test.json
[
{
"target": "some-string",
"datapoints": [
[
123,
456
],
[
789,
101112
]
]
}
]
I'd like to use JQ (with some incantation) to change this to put all the datapoints objects on a single line. E.g.
[
{
"target": "some-string",
"datapoints": [[ 123, 456 ], [ 789, 101112 ]]
}
]
I don't really know if JQ allows this. I searched around - and found custom formatters like https://www.npmjs.com/package/perfect-json which seem to do what I want. I'd prefer to have a portable incantation for this using jq alone (and/or with standard *nix tools).
Use a two-pass approach. In the first, stringify the field using special markers so that in the second pass, they can be removed.
Depending on your level of paranoia, this second pass could be very simple or quite complex. On the simple end of the spectrum, choose markers that simply will not occur elsewhere, perhaps "<q>…</q>", or use some combination of non-ASCII characters. On the complex end of the spectrum, only remove the markers if they occur in the fields in which they are known to be markers.
Both passes could be accomplished with jq, along the lines of:
jq '.[].datapoints |= "<q>\(tojson)</q>"' test.json |
jq -Rr 'sub("\"<q>(?<s>.*)</q>\""; .s)'
Using jq and perl:
jq 'map(.datapoints |= "\u001b\(tojson)\u001b")' test.json |
perl -pe 's/"\\u001b(.*?)\\u001b"/$1/g'

Select highest version value from JSON array

I have a JSON result and I need to search for a specific value and get it from the array.
For example, here is my JSON. I need to find the 1.15 version with the highest patch release inside the validNodeVersions array. So here I want to retrieve the value 1.15.12-gke.20, which is the highest 1.15 version in the array. Can somebody please help with this?
Basically I am always looking to pick the highest patch release for any given version; for 1.15 it is 1.15.12-gke.20.
gcloud container get-server-config --format json
{
"channels": [
{
"channel": "REGULAR",
"defaultVersion": "1.17.9-gke.1504",
"validVersions": [
"1.17.9-gke.6300",
"1.17.9-gke.1504"
]
},
{
"channel": "STABLE",
"defaultVersion": "1.16.13-gke.401",
"validVersions": [
"1.16.13-gke.401",
"1.15.12-gke.20"
]
}
],
"defaultClusterVersion": "1.16.13-gke.401",
"defaultImageType": "COS",
"validImageTypes": [
"UBUNTU",
"UBUNTU_CONTAINERD"
],
"validMasterVersions": [
"1.17.12-gke.500",
"1.14.10-gke.50"
],
"validNodeVersions": [
"1.17.12-gke.500",
"1.16.8-gke.12",
"1.15.12-gke.20",
"1.15.12-gke.17",
"1.15.12-gke.16",
"1.15.12-gke.13",
"1.15.12-gke.9",
"1.15.12-gke.6",
"1.15.12-gke.3",
"1.15.12-gke.2",
"1.15.11-gke.17",
"1.15.11-gke.15",
"1.15.11-gke.13",
"1.15.11-gke.12",
"1.15.11-gke.11",
"1.15.11-gke.9",
"1.15.11-gke.5",
"1.15.11-gke.3",
"1.15.11-gke.1",
"1.15.9-gke.26",
"1.15.8-gke.3",
"1.15.7-gke.23",
"1.15.4-gke.22",
"1.14.10-gke.0",
"1.14.9-gke.0"
]
}
It is trickier to do the regex matching and version sorting inside jq. The GNU sort command has a nice parameter, -V, that stands for version sorting, so here is a simple way to do this without any awk or splitting of fields for sort or similar:
jq -r '.validNodeVersions[]' file.json | grep "^1\.15" | sort -V | tail -1
1.15.12-gke.20
jq is doing a simple selection of values here, grep filters those values down to the 1.15 versions, and after sorting by version we get the highest one.
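For completeness, the same selection can be done entirely inside jq; this is just a sketch, assuming every entry follows the MAJOR.MINOR.PATCH-gke.N pattern:
# keep only 1.15 releases, sort numerically on each version component, take the last
jq -r '[.validNodeVersions[] | select(startswith("1.15."))]
       | sort_by(split("-gke.") | [(.[0] | split(".") | map(tonumber)), (.[1] | tonumber)])
       | last' file.json
The split/tonumber step sorts each component numerically, which is roughly what sort -V does for you in the pipeline above.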

Issue when converting JSON files to NDJSON with jq not sure how to fix this

I tried to convert a large number of JSON files in the same directory into NDJSON in order to load them into an analytics tool.
I used jq in order to convert them into one file using the command below.
for file in *; do cat $file |jq -c '.[]' >> testNDJSON.json; done
The original JSON structure is following
{"user_id":"user_id_value",
"user_properties": {
"key": "value",
"key": "value",
"key": "value"
}
}
Using my command, data is written into the file, but in this format:
user_id_value, {"key": "value", "key: "value", "key": "value"}
So essentially both the user id and user_properties are losing their keys, plus the outer JSON brackets. I'm not sure how to fix this in jq.
What I would like to get is the same JSON structure I had, line by line in the same file. I don't understand why my command above drops the outermost keys and brackets from the JSON.
{"user_id":"user_id_value", "user_properties": { "key": "value", "key": "value", "key": "value"} }
This question will be closed. My code above is correct; an environment configuration issue in my bash caused the errors I was seeing.
I'd like to thank @Cyrus and @Antoine Pietri for their helpful comments. I learned more about the jq tool through this.
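For anyone landing here with the same goal, a minimal sketch of one way to do the conversion when each file holds a single top-level object (the filenames are hypothetical):
# compact-print each file's top-level object as one line of NDJSON
for file in ./*.json; do
  jq -c '.' "$file"
done > combined.ndjson
Writing to a .ndjson name keeps the output file from matching the ./*.json glob on a second run.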

Trying to iterate operations on a file with awk and sed

I've got a one-liner that pulls out the number of times the word severity appears after the word vulnerabilities in a file.
please don't laugh too hard:
cat <file> | sed '1,/vulnerabilities/d' | grep -c '"severity": 4'
This will come back with a count of the "severity" : 4 matches in the file. I can't seem to iterate this across other files.
I have 100 or so files named in the form bleeblah-082017, where bleeblah can be words of different lengths. I'm having trouble working out how to easily iterate the line above over each file to get a result per file.
I would usually have used an awk line to iterate through the list, but I can't seem to find any examples that meld awk and sed.
Would anyone have any ideas on how to perform the task above over many files and return a results per file?
Thanks
Davey
I have a file that has a bunch of entries such as:
{
"count": 6,
"plugin_family": "Misc.",
"plugin_id": 7467253,
"plugin_name": "Blah",
"severity": 4,
"severity_index": 1,
"vuln_index": 13
I'd like to extract the number of times "severity": 4 appears after the word vulnerabilities in each file. The output here would be 10.
Some more of the input file.
"notes": null,
"remediations": {
"num_cves": 20,
"num_hosts": 6,
"num_impacted_hosts": 2,
"num_remediated_cves": 6,
"remediations": [
{
"hosts": 2,
"remediation": "Apache HTTP Server httpOnly Cookie Information Disclosure: Upgrade to Apache version 2.0.65 / 2.2.22 or later.",
"value": "f950f3ddf554d7ea2bda868d54e2b639",
"vulns": 4
},
{
"hosts": 2,
"remediation": "Oracle Application Express (Apex) CVE-2012-1708: Upgrade Application Express to at least version 4.1.1.",
"value": "2c07a93fee3b201a9c380e59fa102ccc",
"vulns": 2
}
]
},
"vulnerabilities": [
{
"count": 6,
"plugin_family": "Misc.",
"plugin_id": 71049,
"plugin_name": "SSH Weak MAC Algorithms Enabled",
"severity": 1,
"severity_index": 0,
"vuln_index": 15
},
{
"count": 6,
"plugin_family": "Misc.",
"plugin_id": 70658,
"plugin_name": "SSH Server CBC Mode Ciphers Enabled",
"severity": 1,
"severity_index": 1,
"vuln_index": 13
},
{
"count": 2,
"plugin_family": "Web Servers",
"plugin_id": 64713,
"plugin_name": "Oracle Application Express (Apex) CVE-2012-1708",
"severity": 2,
"severity_index": 2,
"vuln_index": 12
},
Each of these files is from a vulnerability scan extracted from my scanner's API. Essentially the word severity appears all over the place in different contexts (hosts, vulns, etc.). I want to extract from each scan file the number of times the pattern appears after the word vulnerabilities (which only appears once in each file). I'm open to using Perl, Python, or whatever to achieve this; I was just more familiar with shell scripting for manipulating these text-type files in the past.
Parsing .json data with sed or awk is fraught with potential pitfalls. I recommend using a format-aware tool like jq to query the data you want. In this case, you can do something like
jq '{(input_filename): [.vulnerabilities[].severity]|add}' *.json
This should produce output something like
{
"bleeblah-201708.json": 4
}
{
"bleeblah-201709.json": 11
}
Use jq for parsing JSON on the command line. It is the standard tool. Working with text-based tools like sed to parse JSON is very fragile, since it relies on the order of elements and the formatting of the JSON documents, which is not guaranteed or part of the JSON standard.
What you are looking for is the following command:
jq '[.vulnerabilities[]|select(.severity==4)]|length' file.json
If you want to run it for multiple files, use find:
find FOLDER -name 'PATTERN.json' -print \
-exec jq '[.vulnerabilities[]|select(.severity==4)]|length' {} +
I have made the following two example files, assuming that they can represent what you have. Note the occurrences of the search text both before "vulnerabilities" and after, with a different number of occurrences after it in each file.
From your code I assume that the search string appears at most once per line, so it is lines that get counted.
blableh-082017:
"severity" : 4
"severity" : 4
vulnerabilities
"severity" : 4
"severity" : 4
bleeblah-082017:
"severity" : 4
"severity" : 4
vulnerabilities
"severity" : 4
"severity" : 4
"severity" : 4
Here is my proposal, using find in addition to sed and grep, also using sh to achieve the desired piping inside -exec.
find . -iname "*-082017" -print -exec sh -c "sed 1,/vulnerabilities/d {} | grep -c '\"severity\" : 4'" \;
Output (hoping a name line and a count line are OK, otherwise another sed could reformat it for you):
./blableh-082017
2
./bleeblah-082017
3
Details:
use find to process multiple files and get each file name into the output,
in spite of sed's lack of support for that
use basically your code to do the cutting via sed and the counting via grep
give the filename to sed as a parameter, instead of via a pipe from cat
use sh within -exec to achieve the piping
(answer by devnull to How to use pipe within -exec in find)
Environment:
GNU sed version 4.2.1
GNU bash, version 3.1.23(1)-release (i686-pc-msys)
GNU grep 2.5.4
find (GNU findutils) 4.4.2

Parsing Json data columnwise in shell

When I run a command, I get a response like this:
{
"status": "available",
"managed": true,
"name":vdisk7,
"support":{
"status": "supported"
},
"storage_pool": "pfm9253_pfm9254_new",
"id": "ff10abad"-2bf-4ef3-9038-9ae7f18ea77c",
"size":100
},
and hundreds more of this type of list or dictionary.
I want a command that does something like this:
if name == "something",
get the id
Any links that would help me learn this sort of command would be highly appreciated.
I have tried
awk '{if ($2 == "something") print $0;}'
But I think the response is JSON, so column-wise awk formatting is not working.
Also, it's just a single command that I need to run, so I would prefer not to use any external library.
A JSON parser is better for this task
awk and sed are utilities for parsing line-oriented text, not JSON. What if your JSON formatting changes (e.g. several values end up on one line)?
You should use any standard JSON parser out there, or some powerful scripting language such as PHP, Python, Ruby, etc.
I can provide you with an example of how to do it with Python.
What if I can't use a powerful scripting language?
If you are totally unable to use Python, then there is the jq utility out there: link
If you have a recent distro, jq may already be in the repositories (example: Ubuntu 13.10 has it in its repos).
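For example, once jq is installed, the lookups asked about above become one-liners; a sketch, assuming the response is valid JSON and that vdisk7 is the name being searched for:
# single object: print the name field
some_command | jq -r '.name'
# list of objects: print every name
some_command | jq -r '.[].name'
# the original goal: find the object whose name matches, and print its id
some_command | jq -r '.[] | select(.name == "vdisk7") | .id'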
I can use Python!
I would do that using a simple inline Python script.
For example, say we have some some_command that returns JSON as a result,
and we have to get the value of data["name"].
Here we go:
some_command | python -c "import json, sys; print(json.load(sys.stdin)['name'])"
It will output vdisk7 in your case.
For this to work you need to be sure the JSON is fully valid.
If you have a list of JSON objects:
[
{
...
"name": "vdisk17"
...
},
{
...
"name": "vdisk18"
...
},
{
...
"name": "vdisk19"
...
},
...
]
You could use a list comprehension:
some_command | python -c "import json, sys; [sys.stdout.write(x['name'] + '\n') for x in json.load(sys.stdin)]"
It will output:
vdisk17
vdisk18
vdisk19
