How to retrieve a single value with grep from JSON? - shell

How to extract a single value from the given JSON?
{
  "Vpc": {
    "InstanceTenancy": "default",
    "State": "pending",
    "VpcId": "vpc-123",
    "CidrBlock": "10.0.0.0/16",
    "DhcpOptionsId": "dopt-123"
  }
}
Tried this but with no luck:
grep -e '(?<="VpcId": ")[^"]*'

You probably wanted -Po, which works with your regex:
$ grep -oP '(?<="VpcId": ")[^"]*' infile
vpc-123
If GNU grep with its -P option isn't available, we can't use look-arounds and have to resort to, for example, running grep twice:
$ grep -o '"VpcId": "[^"]*' infile | grep -o '[^"]*$'
vpc-123
The first grep extracts everything up to, but excluding, the closing quote; the second searches from the end of the line for a run of non-quote characters.
But, as mentioned, you'd be better off properly parsing your JSON. Apart from jq mentioned in another answer, I know of
Jshon
JSON.sh
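For example, JSON.sh flattens a document into one [path] value pair per line, which can then be filtered with plain grep (a sketch from memory of that tool, assuming the JSON.sh script is on your PATH):
$ JSON.sh < infile | grep '\["Vpc","VpcId"\]'
["Vpc","VpcId"]	"vpc-123"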
A jq solution would be as simple as this:
$ jq '.Vpc.VpcId' infile
"vpc-123"
Or, to get raw output instead of JSON:
$ jq -r '.Vpc.VpcId' infile
vpc-123
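Either way, capturing the result into a shell variable is then trivial, e.g.:
$ vpcid=$(jq -r '.Vpc.VpcId' infile)
$ echo "$vpcid"
vpc-123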

Something like this, using grep to select the line, awk to take the second field, and sed to strip the trailing comma and the surrounding quotes:
grep '^ *"VpcId":' json.file \
| awk '{ print $2 }' \
| sed -e 's/,$//' -e 's/^"//' -e 's/"$//'
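The same could be done in a single awk call by splitting on double quotes, which makes the value the fourth field (a sketch against the sample input above):
awk -F'"' '/"VpcId":/ { print $4 }' json.file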

You can do:
sed -r -n -e '/^[[:space:]]*"VpcId":/s/^[^:]*: *"(.*)", *$/\1/p'
But really, using shell tools to run regexes over JSON content is a bad idea. You should consider a much saner language like Python.
python -c 'import json, sys; print(json.loads(sys.stdin.read())["Vpc"]["VpcId"]);'
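For instance, with the sample above saved as infile:
$ python -c 'import json, sys; print(json.loads(sys.stdin.read())["Vpc"]["VpcId"])' < infile
vpc-123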

Try this regex pattern (note that the capturing group keeps the surrounding quotes):
\"VpcId\":\s?(\"\S+\")

If you can install a tool, I would suggest using jq. It makes this kind of extraction very simple, with great support for piping too.
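For example, if the JSON above comes straight out of a command, jq slots right into the pipe (the aws invocation here is only an assumed source for that sample output):
$ aws ec2 create-vpc --cidr-block 10.0.0.0/16 | jq -r '.Vpc.VpcId'
vpc-123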

The OP asks for solutions using grep. If what is really meant is "from the terminal", the Node CLI is an alternative, since its JSON support is complete. One option is the command node --eval "script":
echo '{"key": 42}' \
| node -e 'console.log(JSON.parse(require("fs").readFileSync(0).toString()).key)'   # prints 42

Related

Extract data from curl output using sed, awk, cut or python3

I am trying to extract one url from a curl command's output in shell.
The curl command I am running is:
curl -s "http://HOSTNAME.com/api/v4/projects/18/merge_requests?source_branch=samm_6819"
which gives output something like this:
[{"id":244,"iid":69,"project_id":18,"title":"bug 6819","description":"","state":"merged","created_at":"2021-09-04T06:51:05.988Z","updated_at":"2021-09-04T06:52:03.869Z","merged_by":{"id":4,"name":"SOME NAME ","username":"SOMEUSERNAME","state":"active","avatar_url":"https://www.gravatar.com/avatar/baa4538f891a621a8e5480aa9ac404a6?s=80\u0026d=identicon","web_url":"http://HOSTNAME/SOMEUSERNAME"},"merged_at":"2021-09-04T06:52:03.997Z","closed_by":null,"closed_at":null,"target_branch":"master","source_branch":"samm_6819","user_notes_count":0,"upvotes":0,"downvotes":0,"author":{"id":1,"name":"Administrator","username":"root","state":"active","avatar_url":"https://www.gravatar.com/avatar/81fasf149c17eba1d66803dc0877828900?s=80\u0026d=identicon","web_url":"http://HOSTNAME/root"},"assignees":[],"assignee":null,"reviewers":[],"source_project_id":18,"target_project_id":18,"labels":[],"draft":false,"work_in_progress":false,"milestone":null,"merge_when_pipeline_succeeds":true,"merge_status":"can_be_merged","sha":"1e14427dd70862265b55fsa0e38f4e980d5f65524","merge_commit_sha":"34240a857d7d1a852f9c3d4safa3f031ef3bd35225","squash_commit_sha":null,"discussion_locked":null,"should_remove_source_branch":false,"force_remove_source_branch":false,"reference":"!69","references":{"short":"!69","relative":"!69","full":"SOMEPROJECTNAME-af/test!69"},"web_url":"http://HOSTNAME.com/GROUPNAME/PROJECTNAME/-/merge_requests/69","time_stats":{"time_estimate":0,"total_time_spent":0,"human_time_estimate":null,"human_total_time_spent":null},"squash":false,"task_completion_status":{"count":0,"completed_count":0},"has_conflicts":false,"blocking_discussions_resolved":true,"approvals_before_merge":null}]
Out of all this data, I need the output to be:
http://HOSTNAME.com/GROUPNAME/PROJECTNAME/-/merge_requests/69
I need to print this url somewhere. I am not sure what can be used here, whether awk, sed, cut, or anything else piped onto the curl command, to get this url as output.
Can someone help me please?
I never wanted to ask this here but I am running out of options on what could be the best way to achieve this.
Thanks
The best option here is to use jq:
json='[{"id":244,"iid":69,"project_id":18,"title":"bug 6819","description":"","state":"merged","created_at":"2021-09-04T06:51:05.988Z","updated_at":"2021-09-04T06:52:03.869Z","merged_by":{"id":4,"name":"SOME NAME ","username":"SOMEUSERNAME","state":"active","avatar_url":"https://www.gravatar.com/avatar/baa4538f891a621a8e5480aa9ac404a6?s=80\u0026d=identicon","web_url":"http://HOSTNAME/SOMEUSERNAME"},"merged_at":"2021-09-04T06:52:03.997Z","closed_by":null,"closed_at":null,"target_branch":"master","source_branch":"samm_6819","user_notes_count":0,"upvotes":0,"downvotes":0,"author":{"id":1,"name":"Administrator","username":"root","state":"active","avatar_url":"https://www.gravatar.com/avatar/81fasf149c17eba1d66803dc0877828900?s=80\u0026d=identicon","web_url":"http://HOSTNAME/root"},"assignees":[],"assignee":null,"reviewers":[],"source_project_id":18,"target_project_id":18,"labels":[],"draft":false,"work_in_progress":false,"milestone":null,"merge_when_pipeline_succeeds":true,"merge_status":"can_be_merged","sha":"1e14427dd70862265b55fsa0e38f4e980d5f65524","merge_commit_sha":"34240a857d7d1a852f9c3d4safa3f031ef3bd35225","squash_commit_sha":null,"discussion_locked":null,"should_remove_source_branch":false,"force_remove_source_branch":false,"reference":"!69","references":{"short":"!69","relative":"!69","full":"SOMEPROJECTNAME-af/test!69"},"web_url":"http://HOSTNAME.com/GROUPNAME/PROJECTNAME/-/merge_requests/69","time_stats":{"time_estimate":0,"total_time_spent":0,"human_time_estimate":null,"human_total_time_spent":null},"squash":false,"task_completion_status":{"count":0,"completed_count":0},"has_conflicts":false,"blocking_discussions_resolved":true,"approvals_before_merge":null}]'
echo "$json" | jq -r '.[].web_url'
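If the array could ever hold more than one merge request, the right entry can also be picked explicitly rather than taken blindly (a sketch based on the sample data, keying on the iid field):
url=$(curl -s "http://HOSTNAME.com/api/v4/projects/18/merge_requests?source_branch=samm_6819" | jq -r '.[] | select(.iid == 69) | .web_url')
echo "$url"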
You can try this sed:
sed 's/.*:.\(http.[^"]*\).*/\1/'
It will match from the last occurrence of http through to the first occurrence of " after it.
Using cat file in place of curl... for demonstration, this will work on the input you show using any sed in any shell on every Unix box:
$ cat file | sed 's/.*"web_url":"\([^"]*\).*/\1/'
http://HOSTNAME.com/GROUPNAME/PROJECTNAME/-/merge_requests/69
Try Perl
$ perl -ne ' /.*"web_url":"([^"]+)"/ and print "$1\n" ' sameer.txt
http://HOSTNAME.com/GROUPNAME/PROJECTNAME/-/merge_requests/69
Another method, which prints only the last web_url match on each line in case there are several:
perl -ne ' while( /"web_url":"([^"]+)"/g ) { $x=1; $t=$1 } print "$t\n" if $x; $x=0 ' sameer.txt

Getting a variable from a website which is then stored

I need to retrieve the "message" variable from the website and store it in a variable.
I'm new to shell scripting so I'm not sure how to do it. I've been trying various ways for a while but they don't seem to work.
This is the output of the website example.web:8080/rest/message
[{"id":33,"message":"Dash","lastUpdated":1569922857154,"userName":null}]
#!/bin/bash
message=$( curl -# -L "http://example.web:8080/rest/message")
username=$(
<<< "${message}" \
grep -P -o -e '(?<=<li>message: <strong>)(.*?)(?=<\/strong><\/li>)' |
head -n 1
)
I need the message value "Dash" to be stored so it can be printed later on.
Though I am not sure if the output you are getting is pure JSON or not, if you are OK with awk, could you please try the following (if you have proper JSON then please use jq or another parser that is specially made for JSON parsing):
Your_command | awk -v s1="\"" '
# find "message":"<value>" in the line; RSTART/RLENGTH mark the match
match($0,s1 "message" s1 ":" s1 "[^\"]*"){
  # split the matched text on double quotes; the value is the last piece
  num=split(substr($0,RSTART,RLENGTH),array,"\"")
  print array[num]
}'
EDIT: If you have the jq tool with you, could you please try the following (not tested):
curl -# -L "http://example.web:8080/rest/message" | jq '.[] | .message'

Grep multiple strings from text file

Okay so I have a text file containing multiple strings, an example of this:
Hello123
Halo123
Gracias
Thank you
...
I want grep to use these strings to find lines with matching strings/keywords from other files within a directory
An example of the text files being grepped:
123-example-Halo123
321-example-Gracias-com-no
321-example-match
So in this instance the output should be:
123-example-Halo123
321-example-Gracias-com-no
With GNU grep:
grep -f file1 file2
-f FILE: Obtain patterns from FILE, one per line.
Output:
123-example-Halo123
321-example-Gracias-com-no
You should probably look at the man page for grep to get a better understanding of what options the grep utility supports. However, there are a number of ways to achieve what you're trying to accomplish. Here's one approach:
grep -e "Hello123" -e "Halo123" -e "Gracias" -e "Thank you" list_of_files_to_search
However, since your search strings are already in a separate file, you would probably want to use this approach:
grep -f patternFile list_of_files_to_search
I can think of two possible solutions for your question:
Use multiple regular expressions - a regular expression for each word you want to find, for example:
grep -e Hello123 -e Halo123 file_to_search.txt
Use a single regular expression with an "or" operator. Using Perl regular expressions, it will look like the following:
grep -P "Hello123|Halo123" file_to_search.txt
EDIT:
As you mentioned in your comment, you want to take the list of words to find from a file and search a whole directory.
You can manipulate the words-to-find file to look like -e flags concatenation:
cat words_to_find.txt | sed 's/^/-e "/;s/$/"/' | tr '\n' ' '
This will return something like -e "Hello123" -e "Halo123" -e "Gracias" -e "Thank you", which you can then pass to grep using xargs:
cat words_to_find.txt | sed 's/^/-e "/;s/$/"/' | tr '\n' ' ' | xargs grep dir_to_search/*
As you can see, the last command also searches in all of the files in the directory.
SECOND EDIT: as PesaThe mentioned, the following command would do this in a much simpler and more elegant way:
grep -f words_to_find.txt dir_to_search/*
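One caveat that applies to all of these: grep treats each listed word as a regular expression by default, so a pattern file containing dots or brackets can over-match. If the words should match literally, add -F:
grep -Ff words_to_find.txt dir_to_search/*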

parse json using a bash script without external libraries

I have a fresh Ubuntu installation and I'm using a command that returns a JSON string. I would like to send this JSON string to an external API using curl. How do I turn something like {"foo":"bar"} into a url like xxx.com?foo=bar using just the standard Ubuntu tools?
Try this
curl -s 'http://twitter.com/users/username.json' | sed -e 's/[{}]/''/g' | awk -v RS=',"' -F: '/^text/ {print $2}'
You could use tr -d '{}' instead of sed, though skipping the brace removal entirely seems to have the desired effect as well.
If you want to strip off the outer quotes, pipe the result of the above through sed 's/^"\|"$//g' (GNU sed), or the portable sed 's/^"//;s/"$//'.
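Since the question asks for standard tools only and Ubuntu ships python3, a more robust route is to let Python do both the parsing and the URL encoding (a sketch; xxx.com and the JSON are the placeholders from the question):
json='{"foo":"bar"}'
qs=$(printf '%s' "$json" | python3 -c 'import json, sys, urllib.parse; print(urllib.parse.urlencode(json.load(sys.stdin)))')
curl "http://xxx.com?$qs"    # requests http://xxx.com?foo=bar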

Curl and xargs in piped commands

I want to process an old database where passwords are plain text (comma separated; passwd is the 5th field in the csv file to which the database has been exported) to crypt them for further use by dokuwiki. Here is my bash command (grep and sed are there to extract the crypted passwd from the curl output):
cat users.csv | awk 'FS="," { print $4 }' | xargs -l bash -c 'curl -s --data-binary "pass1=$0&pass2=$0" "https://sprhost.com/tools/SMD5.php" -o - ' | xargs | grep -o '<tt.*tt>' | sed -e 's/tt//g' | sed -e 's/<[^>]*>//g'
I get the following complaint from xargs:
xargs: unmatched single quote; by default quotes are special to xargs unless you use the -0 option
Only the first line of the file is processed, and then nothing else happens.
Using the -0 option, and playing around with quotes, doesn't solve anything. Where am I going wrong in the command line? Maybe a more advanced language would be more suitable for this.
Thanks for the help, LM
In general, if you have such a long pipe of commands, it is better to split them if things go wrong. Going through your pipe:
cat users.csv |
Nothing unexpected there.
awk 'FS="," { print $4 }' |
You probably wanted to do awk 'BEGIN {FS=","} { print $4 }' or simply awk -F, '{ print $4 }'; with FS assigned inside the main rule, the first line is still split with the default separator. Try the first two commands in the pipe and see if they produce the correct answer.
xargs -l bash -c 'curl -s --data-binary "pass1=$0&pass2=$0" "https://sprhost.com/tools/SMD5.php" -o - ' |
Nothing wrong there, although there might be better ways to do an MD5 hash.
xargs |
What is this xargs doing in the pipe? It should be removed.
grep -o '<tt.*tt>' |
Note that this will produce two lines:
<tt>$1$17ab075e$0VQMuM3cr5CtElvMxrPcE0</tt>
<tt><your_docuwiki_root>/conf/users.auth.php</tt>
which is probably not what you expected.
sed -e 's/tt//g' |
sed -e 's/<[^>]*>//g'
which will remove the html-tags, though
sed 's/<tt>//;s/<.tt>//'
will do the same.
So I'd say a wrong awk and an xargs too many.
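Putting those fixes together, a corrected sketch of the whole pipeline might look like this (untested against the real SMD5.php endpoint; it assumes the password really is in the 5th CSV field as the question states, and uses GNU xargs' -d '\n' so quote characters in passwords no longer trigger the unmatched-quote error):
awk -F, '{ print $5 }' users.csv |
xargs -d '\n' -n 1 bash -c 'curl -s --data-binary "pass1=$0&pass2=$0" "https://sprhost.com/tools/SMD5.php" -o -' |
grep -o '<tt>\$1\$[^<]*</tt>' |
sed 's/<tt>//;s/<.tt>//'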
