I am trying to extract one url from a curl command's output in shell.
The curl command which I am running is :
curl -s "http://HOSTNAME.com/api/v4/projects/18/merge_requests?source_branch=samm_6819"
which gives output something like this :
[{"id":244,"iid":69,"project_id":18,"title":"bug 6819","description":"","state":"merged","created_at":"2021-09-04T06:51:05.988Z","updated_at":"2021-09-04T06:52:03.869Z","merged_by":{"id":4,"name":"SOME NAME ","username":"SOMEUSERNAME","state":"active","avatar_url":"https://www.gravatar.com/avatar/baa4538f891a621a8e5480aa9ac404a6?s=80\u0026d=identicon","web_url":"http://HOSTNAME/SOMEUSERNAME"},"merged_at":"2021-09-04T06:52:03.997Z","closed_by":null,"closed_at":null,"target_branch":"master","source_branch":"samm_6819","user_notes_count":0,"upvotes":0,"downvotes":0,"author":{"id":1,"name":"Administrator","username":"root","state":"active","avatar_url":"https://www.gravatar.com/avatar/81fasf149c17eba1d66803dc0877828900?s=80\u0026d=identicon","web_url":"http://HOSTNAME/root"},"assignees":[],"assignee":null,"reviewers":[],"source_project_id":18,"target_project_id":18,"labels":[],"draft":false,"work_in_progress":false,"milestone":null,"merge_when_pipeline_succeeds":true,"merge_status":"can_be_merged","sha":"1e14427dd70862265b55fsa0e38f4e980d5f65524","merge_commit_sha":"34240a857d7d1a852f9c3d4safa3f031ef3bd35225","squash_commit_sha":null,"discussion_locked":null,"should_remove_source_branch":false,"force_remove_source_branch":false,"reference":"!69","references":{"short":"!69","relative":"!69","full":"SOMEPROJECTNAME-af/test!69"},"web_url":"http://HOSTNAME.com/GROUPNAME/PROJECTNAME/-/merge_requests/69","time_stats":{"time_estimate":0,"total_time_spent":0,"human_time_estimate":null,"human_total_time_spent":null},"squash":false,"task_completion_status":{"count":0,"completed_count":0},"has_conflicts":false,"blocking_discussions_resolved":true,"approvals_before_merge":null}]
Out of all this data, I need output to be :
http://HOSTNAME.com/GROUPNAME/PROJECTNAME/-/merge_requests/69
I need to print this URL somewhere else. I am not sure what to use here: awk, sed, cut, or anything else that can be piped from the curl command to output just this URL.
Can someone help me please?
I never wanted to ask this here but I am running out of options on what could be the best way to achieve this.
Thanks
Best option here is to use jq:
json='[{"id":244,"iid":69,"project_id":18,"title":"bug 6819","description":"","state":"merged","created_at":"2021-09-04T06:51:05.988Z","updated_at":"2021-09-04T06:52:03.869Z","merged_by":{"id":4,"name":"SOME NAME ","username":"SOMEUSERNAME","state":"active","avatar_url":"https://www.gravatar.com/avatar/baa4538f891a621a8e5480aa9ac404a6?s=80\u0026d=identicon","web_url":"http://HOSTNAME/SOMEUSERNAME"},"merged_at":"2021-09-04T06:52:03.997Z","closed_by":null,"closed_at":null,"target_branch":"master","source_branch":"samm_6819","user_notes_count":0,"upvotes":0,"downvotes":0,"author":{"id":1,"name":"Administrator","username":"root","state":"active","avatar_url":"https://www.gravatar.com/avatar/81fasf149c17eba1d66803dc0877828900?s=80\u0026d=identicon","web_url":"http://HOSTNAME/root"},"assignees":[],"assignee":null,"reviewers":[],"source_project_id":18,"target_project_id":18,"labels":[],"draft":false,"work_in_progress":false,"milestone":null,"merge_when_pipeline_succeeds":true,"merge_status":"can_be_merged","sha":"1e14427dd70862265b55fsa0e38f4e980d5f65524","merge_commit_sha":"34240a857d7d1a852f9c3d4safa3f031ef3bd35225","squash_commit_sha":null,"discussion_locked":null,"should_remove_source_branch":false,"force_remove_source_branch":false,"reference":"!69","references":{"short":"!69","relative":"!69","full":"SOMEPROJECTNAME-af/test!69"},"web_url":"http://HOSTNAME.com/GROUPNAME/PROJECTNAME/-/merge_requests/69","time_stats":{"time_estimate":0,"total_time_spent":0,"human_time_estimate":null,"human_total_time_spent":null},"squash":false,"task_completion_status":{"count":0,"completed_count":0},"has_conflicts":false,"blocking_discussions_resolved":true,"approvals_before_merge":null}]'
printf '%s\n' "$json" | jq -r '.[].web_url'
You can try this sed command:
sed 's/.*:.\(http.[^"]*\).*/\1/'
It matches from the last occurrence of http (preceded by a colon and one more character) through to the first occurrence of " after it.
Using cat file in place of curl... for demonstration, this will work on the input you show using any sed in any shell on every Unix box:
$ cat file | sed 's/.*"web_url":"\([^"]*\).*/\1/'
http://HOSTNAME.com/GROUPNAME/PROJECTNAME/-/merge_requests/69
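The reason this picks the merge request URL and not one of the nested web_url values is that the leading .* is greedy, so the match starts at the last "web_url" on the line. A minimal sketch on a shortened, made-up sample with the same shape as the real response (last_web_url is just an illustrative helper name):

```shell
# Shortened sample: a nested web_url first, then the top-level one.
sample='[{"id":244,"author":{"web_url":"http://HOSTNAME/root"},"web_url":"http://HOSTNAME.com/GROUPNAME/PROJECTNAME/-/merge_requests/69","squash":false}]'

# Print the value of the last "web_url" key on each input line.
last_web_url() {
    sed 's/.*"web_url":"\([^"]*\).*/\1/'
}

printf '%s\n' "$sample" | last_web_url
```

Note that this depends on the key order in the response. If a line contains no web_url at all, sed prints it unchanged, so sed -n with a trailing p flag may be safer in a script.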
Try Perl
$ perl -ne ' /.*"web_url":"([^"]+)"/ and print "$1\n" ' sameer.txt
http://HOSTNAME.com/GROUPNAME/PROJECTNAME/-/merge_requests/69
Another method:
perl -ne ' while( /"web_url":"([^"]+)"/g ) { $x=1; $t=$1 } print "$t\n" if $x; $x=0 ' sameer.txt
I have a fresh Ubuntu installation and I'm using a command that returns a JSON string. I would like to send this JSON string to an external API using curl. How do I convert something like {"foo":"bar"} into a URL like xxx.com?foo=bar using just the standard Ubuntu tools?
Try this
curl -s 'http://twitter.com/users/username.json' | sed -e 's/[{}]/''/g' | awk -v RS=',"' -F: '/^text/ {print $2}'
You could use tr -d '{}' instead of sed. But leaving them out completely seems to have the desired effect as well.
If you want to strip off the outer quotes, pipe the result of the above through sed 's/^"//; s/"$//'
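For the flat {"foo":"bar"} case in the question, a rough sketch using only tr and sed (both part of a default Ubuntu install) might look like this. json_to_query is a made-up helper name, and the approach assumes a single flat object whose keys and values contain no spaces, commas, colons, quotes, or characters that would need URL-encoding:

```shell
# Turn a flat JSON object on stdin into key=value pairs joined by "&".
# Assumption: no nesting and no special characters inside keys or values.
json_to_query() {
    tr -d '{}" ' | sed 's/:/=/g; s/,/\&/g'
}

printf '%s\n' '{"foo":"bar","baz":"qux"}' | json_to_query
```

The result can then be appended after the ? of the target URL. Anything beyond this simple shape is better handled by a real JSON parser.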
How to extract a single value from a given json?
{
"Vpc": {
"InstanceTenancy": "default",
"State": "pending",
"VpcId": "vpc-123",
"CidrBlock": "10.0.0.0/16",
"DhcpOptionsId": "dopt-123"
}
}
Tried this but with no luck:
grep -e '(?<="VpcId": ")[^"]*'
You probably wanted -Po, which works with your regex:
$ grep -oP '(?<="VpcId": ")[^"]*' infile
vpc-123
If GNU grep with its -P option isn't available, we can't use look-arounds and have to resort to for example using grep twice:
$ grep -o '"VpcId": "[^"]*' infile | grep -o '[^"]*$'
vpc-123
The first one extracts up to and excluding the closing quotes, the second one searches from the end of the line for non-quotes.
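Another portable option, offered as a sketch and not part of the approach above, is to let awk split each line on the double quote and print the fourth field. Like the grep versions, it assumes the key and its value sit on one line, formatted as in the question:

```shell
# The JSON from the question, held in a variable for demonstration.
json='{
    "Vpc": {
        "InstanceTenancy": "default",
        "State": "pending",
        "VpcId": "vpc-123",
        "CidrBlock": "10.0.0.0/16",
        "DhcpOptionsId": "dopt-123"
    }
}'

# Split on double quotes: field 2 is the key, field 4 is the value.
vpc_id=$(printf '%s\n' "$json" | awk -F'"' '/"VpcId"/ { print $4; exit }')
echo "$vpc_id"
```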
But, as mentioned, you'd be better off properly parsing your JSON. Apart from jq mentioned in another answer, I know of
Jshon
JSON.sh
A jq solution would be as simple as this:
$ jq '.Vpc.VpcId' infile
"vpc-123"
Or, to get raw output instead of JSON:
$ jq -r '.Vpc.VpcId' infile
vpc-123
Something like
grep '^ *"VpcId":' json.file \
| awk '{ print $2 }' \
| sed -e 's/,$//' -e 's/^"//' -e 's/"$//'
you can do:
sed -r -n -e '/^[[:space:]]*"VpcId":/s/^[^:]*: *"(.*)", *$/\1/p'
But really, using shell tools to run regexes over JSON content is a bad idea. You should consider a much saner language like Python:
python -c 'import json, sys; print(json.loads(sys.stdin.read())["Vpc"]["VpcId"]);'
Try this regex pattern:
\"VpcId\":\s?(\"\S+\")
If you can install a tool, I would suggest using jq. It makes grep-like queries on JSON very simple, with great support for piping too.
The OP asks for solutions using grep. In case that just means "from the terminal", the Node.js CLI is an alternative, since it has full JSON support. One option is the command node --eval "script":
echo '{"key": 42}' \
| node -e 'console.log(JSON.parse(require("fs").readFileSync(0).toString()).key)' # prints 42
I am trying to write a script that should take values from a xml file.
Here is the xml file :-
<manifestFile>
<productInformation>
<publicationInfo>
<pubID pcsi-selector="P.S.">PACODE</pubID>
<pubNumber/>
</publicationInfo>
</productInformation>
</manifestFile>
and my code is:
#!/bin/sh
Manifest=""
Manifest= `/bin/grep 'pcsi-selector="' /LDCManifest.xml | cut -f 2 -d '"'`
echo $Manifest
I expect my result to be P.S., but it keeps throwing this error:
./abc.sh: P.S.: not found
I am new to shell and I am not able to figure out what the error is here.
You can't have a space after the =.
When you run this command:
Manifest= `/bin/grep 'pcsi-selector="' /LDCManifest.xml | cut -f 2 -d '"'`
It's the same as this:
Manifest='' `/bin/grep 'pcsi-selector="' /LDCManifest.xml | cut -f 2 -d '"'`
That tells the shell to
Run the grep command.
Take its output
Run that output as a command, with the environment variable Manifest set to the empty string for the duration of the command.
Get rid of the space after the = and you'll get the result you want.
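The behaviour described above can be seen with a harmless command in place of the grep output. A minimal sketch, with sh -c standing in for whatever command ends up being run:

```shell
Manifest="before"

# With a space after "=", Manifest is set to the empty string only in the
# environment of the command that follows (sh -c stands in for that command).
inside=$(Manifest= sh -c 'echo "[$Manifest]"')
echo "inside the command: $inside"
echo "afterwards: [$Manifest]"

# Without the space, the output of the command substitution is assigned.
Manifest=$(printf 'P.S.\n')
echo "assigned: [$Manifest]"
```

The first echo shows empty brackets, the second still shows the old value, and only the last form actually stores P.S. in the variable.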
However, you should also avoid using backticks for command substitution, because they interfere with quoting. Use $(...) instead:
Manifest=$(grep 'pcsi-selector="' /LDCManifest.xml | cut -f2 -d'"')
Also, using text/regex-based tools like grep and cut to manipulate XML is clunky and error-prone. You'd be better off installing something like XMLStarlet:
Manifest=$(xmlstarlet sel -t \
-v '/manifestFile/productInformation/publicationInfo/pubID/@pcsi-selector' -n \
/LDCManifest.xml)
Or simpler:
grep -oP 'pcsi-selector="\K[^"]+' /LDCManifest.xml
would print
P.S.
To assign it to a variable:
Manifest=$(grep -oP 'pcsi-selector="\K[^"]+' /LDCManifest.xml)
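The -P option is specific to GNU grep. Where it is not available, a sed substitution can do the same extraction. This is a sketch of that swapped-in technique, run against the single line from the question and not the real /LDCManifest.xml:

```shell
# The relevant line of the manifest, held in a variable for demonstration.
xml='<pubID pcsi-selector="P.S.">PACODE</pubID>'

# -n plus the p flag prints only lines where the substitution succeeded.
Manifest=$(printf '%s\n' "$xml" | sed -n 's/.*pcsi-selector="\([^"]*\)".*/\1/p')
echo "$Manifest"
```

In the real script, the printf would be replaced by reading the file directly, i.e. passing /LDCManifest.xml as the last argument to sed.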
How do I extract the variable value of the following line of an html page via Terminal to submit it afterwards via "curl -d" in the same script?
<input type="hidden" name="au_pxytimetag" value="1234567890">
Edit: how do I transfer the extracted value to the "curl -d" command within a single script? It might be a silly question, but I'm a total noob. =0)
EDITED:
I cannot tell from your question what you are actually trying to do. I originally thought you were trying to extract a variable from a file, but it seems you actually want to firstly, get that file, secondly extract a variable, and thirdly, use variable for something else... so let's address each of those steps:
Firstly you want to grab a page using curl, so you will do
curl www.some.where.com
and the page will be output on your terminal. But actually you want to search for something on that page, so you need to do
curl www.some.where.com | awk something
or
curl www.some.where.com | grep something
But you want to put that into a variable, so you need to do
var=$(curl www.some.where.com | awk something)
or
var=$(curl www.some.where.com | grep something)
The actual command I think you want is
var=$(curl www.some.where.com | awk -F\" '/au_pxytimetag/{print $(NF-1)}')
Then you want to use the variable var for another curl operation, so you will need to do
curl -d "param1=$var" http://some.url.com/somewhere
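Put together, the steps above might look like the following script. This is only a sketch: the two URLs are the placeholders used in this answer, the function names are made up, and posting the value under the name au_pxytimetag is an assumption about what the receiving form expects.

```shell
#!/bin/sh
# Placeholder URLs; replace with the real page and form target.
page_url="http://www.some.where.com"
post_url="http://some.url.com/somewhere"

# Print the value="..." of the au_pxytimetag input read from stdin.
extract_timetag() {
    awk -F\" '/au_pxytimetag/ { print $(NF-1); exit }'
}

# Fetch the page, extract the value, and post it (not called here).
submit_timetag() {
    var=$(curl -s "$page_url" | extract_timetag) || return 1
    [ -n "$var" ] || { echo "au_pxytimetag not found" >&2; return 1; }
    curl -d "au_pxytimetag=$var" "$post_url"
}

# Offline check of the extraction step on the line from the question:
printf '%s\n' '<input type="hidden" name="au_pxytimetag" value="1234567890">' | extract_timetag
```

Calling submit_timetag at the end of the script would run the whole fetch, extract, and post sequence.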
Original answer
I'd use awk like this:
var=$(awk -F\" '/au_pxytimetag/{print $(NF-1)}' yourfile)
This takes the second-to-last field on the line containing au_pxytimetag, using " as the field separator.
Then you can use it like this
curl -d "param1=$var&param2=SomethingElse" http://some.url.com/somewhere
You can use xmllint:
value=$(xmllint --html --xpath "string(//input[@name='au_pxytimetag']/@value)" index.html)
You can do it with my Xidel:
xidel http://webpage -e "//input[@name='au_pxytimetag']/@value"
But you do not need to.
With
xidel http://webpage -f "(//form)[1]" -e "//what-you-need-from-the-next-page"
you can send all values from the first form on the webpage to the form action and then you can query something from the next page
You can try:
grep au_pxytimetag input.html | sed "s/.* value=\"\(.*\)\".*/\1/"
EDIT:
If you need this in a script:
#!/bin/bash
DATA=$(grep au_pxytimetag input.html | sed "s/.* value=\"\(.*\)\".*/\1/")
curl http://example.com -d "au_pxytimetag=$DATA"
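One caveat with the pattern above: \(.*\) is greedy, so if the tag has further attributes after value, they end up in the result too. A slightly tighter sketch that stops at the closing quote, shown on a made-up variant of the input line with an extra class attribute:

```shell
# Hypothetical input line with an extra attribute after value.
line='<input type="hidden" name="au_pxytimetag" value="1234567890" class="x">'

# [^"]* cannot run past the closing quote of the value attribute.
DATA=$(printf '%s\n' "$line" | sed -n '/au_pxytimetag/s/.* value="\([^"]*\)".*/\1/p')
echo "$DATA"
```

Because sed selects the line itself here, the separate grep is not needed in this form.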