Extract data from curl output using sed, awk, cut or python3 - bash

I am trying to extract one url from a curl command's output in shell.
The curl command which I am running is :
curl -s "http://HOSTNAME.com/api/v4/projects/18/merge_requests?source_branch=samm_6819"
which gives output something like this :
[{"id":244,"iid":69,"project_id":18,"title":"bug 6819","description":"","state":"merged","created_at":"2021-09-04T06:51:05.988Z","updated_at":"2021-09-04T06:52:03.869Z","merged_by":{"id":4,"name":"SOME NAME ","username":"SOMEUSERNAME","state":"active","avatar_url":"https://www.gravatar.com/avatar/baa4538f891a621a8e5480aa9ac404a6?s=80\u0026d=identicon","web_url":"http://HOSTNAME/SOMEUSERNAME"},"merged_at":"2021-09-04T06:52:03.997Z","closed_by":null,"closed_at":null,"target_branch":"master","source_branch":"samm_6819","user_notes_count":0,"upvotes":0,"downvotes":0,"author":{"id":1,"name":"Administrator","username":"root","state":"active","avatar_url":"https://www.gravatar.com/avatar/81fasf149c17eba1d66803dc0877828900?s=80\u0026d=identicon","web_url":"http://HOSTNAME/root"},"assignees":[],"assignee":null,"reviewers":[],"source_project_id":18,"target_project_id":18,"labels":[],"draft":false,"work_in_progress":false,"milestone":null,"merge_when_pipeline_succeeds":true,"merge_status":"can_be_merged","sha":"1e14427dd70862265b55fsa0e38f4e980d5f65524","merge_commit_sha":"34240a857d7d1a852f9c3d4safa3f031ef3bd35225","squash_commit_sha":null,"discussion_locked":null,"should_remove_source_branch":false,"force_remove_source_branch":false,"reference":"!69","references":{"short":"!69","relative":"!69","full":"SOMEPROJECTNAME-af/test!69"},"web_url":"http://HOSTNAME.com/GROUPNAME/PROJECTNAME/-/merge_requests/69","time_stats":{"time_estimate":0,"total_time_spent":0,"human_time_estimate":null,"human_total_time_spent":null},"squash":false,"task_completion_status":{"count":0,"completed_count":0},"has_conflicts":false,"blocking_discussions_resolved":true,"approvals_before_merge":null}]
Out of all this data, I need output to be :
http://HOSTNAME.com/GROUPNAME/PROJECTNAME/-/merge_requests/69
I need to print this URL somewhere. I am not sure what can be used here, whether awk, sed, cut, or something else piped from the curl command, to get this URL as output.
Can someone help me please?
I never wanted to ask this here but I am running out of options on what could be the best way to achieve this.
Thanks

Best option here is to use jq:
json='[{"id":244,"iid":69,"project_id":18,"title":"bug 6819","description":"","state":"merged","created_at":"2021-09-04T06:51:05.988Z","updated_at":"2021-09-04T06:52:03.869Z","merged_by":{"id":4,"name":"SOME NAME ","username":"SOMEUSERNAME","state":"active","avatar_url":"https://www.gravatar.com/avatar/baa4538f891a621a8e5480aa9ac404a6?s=80\u0026d=identicon","web_url":"http://HOSTNAME/SOMEUSERNAME"},"merged_at":"2021-09-04T06:52:03.997Z","closed_by":null,"closed_at":null,"target_branch":"master","source_branch":"samm_6819","user_notes_count":0,"upvotes":0,"downvotes":0,"author":{"id":1,"name":"Administrator","username":"root","state":"active","avatar_url":"https://www.gravatar.com/avatar/81fasf149c17eba1d66803dc0877828900?s=80\u0026d=identicon","web_url":"http://HOSTNAME/root"},"assignees":[],"assignee":null,"reviewers":[],"source_project_id":18,"target_project_id":18,"labels":[],"draft":false,"work_in_progress":false,"milestone":null,"merge_when_pipeline_succeeds":true,"merge_status":"can_be_merged","sha":"1e14427dd70862265b55fsa0e38f4e980d5f65524","merge_commit_sha":"34240a857d7d1a852f9c3d4safa3f031ef3bd35225","squash_commit_sha":null,"discussion_locked":null,"should_remove_source_branch":false,"force_remove_source_branch":false,"reference":"!69","references":{"short":"!69","relative":"!69","full":"SOMEPROJECTNAME-af/test!69"},"web_url":"http://HOSTNAME.com/GROUPNAME/PROJECTNAME/-/merge_requests/69","time_stats":{"time_estimate":0,"total_time_spent":0,"human_time_estimate":null,"human_total_time_spent":null},"squash":false,"task_completion_status":{"count":0,"completed_count":0},"has_conflicts":false,"blocking_discussions_resolved":true,"approvals_before_merge":null}]'
echo "$json" | jq -r '.[].web_url'
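If jq is not installed, python3 (which the question title already allows) can do the same extraction with the standard json module. A minimal sketch, with the payload trimmed to the relevant fields:

```python
import json

# Payload trimmed to the relevant fields; in practice this is the
# full body returned by curl.
payload = '[{"iid": 69, "web_url": "http://HOSTNAME.com/GROUPNAME/PROJECTNAME/-/merge_requests/69"}]'

for mr in json.loads(payload):  # the API returns a list of merge requests
    print(mr["web_url"])
```

Piped directly from curl, it becomes a one-liner: curl -s "$url" | python3 -c 'import json, sys; [print(mr["web_url"]) for mr in json.load(sys.stdin)]'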

You can try this sed
sed 's/.*:.\(http.[^"]*\).*/\1/'
It will match from the last occurrence of http through to the first occurrence of " after it.

Using cat file in place of curl... for demonstration, this will work on the input you show using any sed in any shell on every Unix box:
$ cat file | sed 's/.*"web_url":"\([^"]*\).*/\1/'
http://HOSTNAME.com/GROUPNAME/PROJECTNAME/-/merge_requests/69

Try Perl
$ perl -ne ' /.*"web_url":"([^"]+)"/ and print "$1\n" ' sameer.txt
http://HOSTNAME.com/GROUPNAME/PROJECTNAME/-/merge_requests/69
Another method:
perl -ne ' while( /"web_url":"([^"]+)"/g ) { $x=1; $t=$1 } print "$t\n" if $x; $x=0 ' sameer.txt

Related

parse json using a bash script without external libraries

I have a fresh Ubuntu installation and I'm using a command that returns a JSON string. I would like to send this JSON string to an external API using curl. How do I convert something like {"foo":"bar"} to a URL like xxx.com?foo=bar using just the standard Ubuntu tools?
Try this
curl -s 'http://twitter.com/users/username.json' | sed -e 's/[{}]/''/g' | awk -v RS=',"' -F: '/^text/ {print $2}'
You could use tr -d '{}' instead of sed. But leaving them out completely seems to have the desired effect as well.
if you want to strip off the outer quotes, pipe the result of the above through sed -e 's/^"//' -e 's/"$//'
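For the original goal of turning {"foo":"bar"} into xxx.com?foo=bar, python3's standard library avoids regexes entirely; a minimal sketch:

```python
import json
from urllib.parse import urlencode

data = json.loads('{"foo": "bar"}')
query = urlencode(data)            # flattens the dict into "foo=bar"
print("http://xxx.com?" + query)   # http://xxx.com?foo=bar
```

urlencode also handles percent-escaping, which the sed/awk approach does not.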

How to retrieve single value with grep from Json?

How to extract a single value from a given json?
{
"Vpc": {
"InstanceTenancy": "default",
"State": "pending",
"VpcId": "vpc-123",
"CidrBlock": "10.0.0.0/16",
"DhcpOptionsId": "dopt-123"
}
}
Tried this but with no luck:
grep -e '(?<="VpcId": ")[^"]*'
You probably wanted -Po, which works with your regex:
$ grep -oP '(?<="VpcId": ")[^"]*' infile
vpc-123
If GNU grep with its -P option isn't available, we can't use look-arounds and have to resort to for example using grep twice:
$ grep -o '"VpcId": "[^"]*' infile | grep -o '[^"]*$'
vpc-123
The first one extracts up to and excluding the closing quotes, the second one searches from the end of the line for non-quotes.
But, as mentioned, you'd be better off properly parsing your JSON. Apart from jq mentioned in another answer, I know of
Jshon
JSON.sh
A jq solution would be as simple as this:
$ jq '.Vpc.VpcId' infile
"vpc-123"
Or, to get raw output instead of JSON:
$ jq -r '.Vpc.VpcId' infile
vpc-123
Something like
grep '^ *"VpcId":' json.file \
| awk '{ print $2 }' \
| sed -e 's/,$//' -e 's/^"//' -e 's/"$//'
you can do:
sed -r -n -e '/^[[:space:]]*"VpcId":/s/^[^:]*: *"(.*)", *$/\1/p'
but really, using any shell tools to run regexes over JSON content is a bad idea. you should consider a much saner language like python.
python -c 'import json, sys; print(json.loads(sys.stdin.read())["Vpc"]["VpcId"]);'
Try this regex pattern:
\"VpcId\":\s?(\"\S+\")
If you can install a tool I would suggest using jq jq. It allows very simple grep, with great support for piping too.
The OP asks for solutions using grep. If the terminal in general is acceptable, the node CLI is an alternative, since its JSON support is complete. One option is the command node --eval "script":
echo '{"key": 42}' \
| node -e 'console.log(JSON.parse(require("fs").readFileSync(0).toString()).key)'   # prints 42

Using Sed with regular expression to save results into a variable

What I'm trying to do is take user input as a string and parse a section of the string. The results from my regex I want to save into a new variable. Here is what I have so far.
#!/bin/bash
downloadUrl="$1"
pythonFile=echo $downloadUrl | sed '/Python-[\d+.]+tgz/'
echo "$downloadUrl"
echo "$pythonFile"
And here is my result.
sed: 1: "/Python-[\d+.]+tgz/": command expected
You forgot to run the commands in $() to get command substitution. Use:
pythonFile=$(echo $downloadUrl | sed '/Python-[\d+.]+tgz/')
See the manual for more details.
sed '/Python-[\d+.]+tgz/' is incorrect: you need a sed command such as p, and in any case it only searches; it does not extract part of the input text.
You can use grep -oP
pythonFile=$(grep -oP 'Python-[\d.]+tgz' <<< "$downloadUrl")
Or without -P
pythonFile=$(grep -o 'Python-[0-9.]\+tgz' <<< "$downloadUrl")
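If python3 is acceptable, the same extraction is a short script with the re module. A sketch, with a hypothetical download URL standing in for $1:

```python
import re

# Hypothetical URL of the shape the question implies.
download_url = "https://www.python.org/ftp/python/3.9.7/Python-3.9.7.tgz"

match = re.search(r"Python-[\d.]+tgz", download_url)
if match:
    print(match.group(0))  # Python-3.9.7.tgz
```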

Extract value via OSX Terminal from .html for "curl" submission within a single script

How do I extract the variable value of the following line of an html page via Terminal to submit it afterwards via "curl -d" in the same script?
<input type="hidden" name="au_pxytimetag" value="1234567890">
Edit: how do I transfer the extracted value to the "curl -d" command within a single script? might be a silly question, but I'm total noob. =0)
EDITED:
I cannot tell from your question what you are actually trying to do. I originally thought you were trying to extract a variable from a file, but it seems you actually want to firstly, get that file, secondly extract a variable, and thirdly, use variable for something else... so let's address each of those steps:
Firstly you want to grab a page using curl, so you will do
curl www.some.where.com
and the page will be output on your terminal. But actually you want to search for something on that page, so you need to do
curl www.some.where.com | awk something
or
curl www.some.where.com | grep something
But you want to put that into a variable, so you need to do
var=$(curl www.some.where.com | awk something)
or
var=$(curl www.some.where.com | grep something)
The actual command I think you want is
var=$(curl www.some.where.com | awk -F\" '/au_pxytimetag/{print $(NF-1)}')
Then you want to use the variable var for another curl operation, so you will need to do
curl -d "param1=$var" http://some.url.com/somewhere
Original answer
I'd use awk like this:
var=$(awk -F\" '/au_pxytimetag/{print $(NF-1)}' yourfile)
to take second to last field on line containing au_pxytimetag using " as field separator.
Then you can use it like this
curl -d "param1=$var&param2=SomethingElse" http://some.url.com/somewhere
You can use xmllint:
value=$(xmllint --html --xpath "string(//input[@name='au_pxytimetag']/@value)" index.html)
You can do it with my Xidel:
xidel http://webpage -e "//input[@name='au_pxytimetag']/@value"
But you do not need to.
With
xidel http://webpage -f "(//form)[1]" -e "//what-you-need-from-the-next-page"
you can send all values from the first form on the webpage to the form action and then you can query something from the next page
You can try:
grep au_pxytimetag input.html | sed "s/.* value=\"\(.*\)\".*/\1/"
EDIT:
If you need this on a script:
#!/bin/bash
DATA=$(grep au_pxytimetag input.html | sed "s/.* value=\"\(.*\)\".*/\1/")
curl http://example.com -d "$DATA"
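Since regexes over HTML break easily on attribute order and quoting, python3's standard html.parser is a safer fallback; a sketch using the input line from the question:

```python
from html.parser import HTMLParser

class HiddenValueParser(HTMLParser):
    """Collects the value attribute of the named input element."""
    def __init__(self, field_name):
        super().__init__()
        self.field_name = field_name
        self.value = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "input" and attrs.get("name") == self.field_name:
            self.value = attrs.get("value")

parser = HiddenValueParser("au_pxytimetag")
parser.feed('<input type="hidden" name="au_pxytimetag" value="1234567890">')
print(parser.value)  # 1234567890
```

The printed value can then be passed to curl -d as shown above.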

I'm trying to use sed as a replacement for grep

The command I'm trying to use is
sed -n 's/'$LASTNAME'/pgIw '$TEMP_FILE2'' < "$TEMP_FILE"
My goal is to search TEMP_FILE for the value in LASTNAME and write the line containing the match, if there is one, to TEMP_FILE2. I keep getting an error that the sed command is garbled. The code above returns
sed: command garbled: s/smith/pgIw /tmp/tmp.aKaGFH
Any help is appreciated! I've been trying to figure this out for hours! This is supposed to be done in the Korn shell on UNIX, and I can't use awk or python; those are the stipulations of the homework.
Thank you!
It looks like you have an abundance of quotes that you don't need. Try:
sed -n "/$LASTNAME/p" >$TEMP_FILE2 <$TEMP_FILE
Also, your use of the s sed command seems to be out of place, since you don't actually want to substitute anything.
If you just want to find something and pipe to output, then simply use grep
grep -i "$LASTNAME" "$TEMP_FILE" > "$TEMP_FILE2" # -i case-insensitive
Or Perl
perl -ne "print if /$LASTNAME/" "$TEMP_FILE" > "$TEMP_FILE2"
