Extract value via OSX Terminal from .html for "curl" submission within a single script - macos

How do I extract the variable value of the following line of an html page via Terminal to submit it afterwards via "curl -d" in the same script?
<input type="hidden" name="au_pxytimetag" value="1234567890">
Edit: how do I transfer the extracted value to the "curl -d" command within a single script? It might be a silly question, but I'm a total noob. =0)

EDITED:
I cannot tell from your question what you are actually trying to do. I originally thought you were trying to extract a variable from a file, but it seems you actually want to, firstly, fetch that file; secondly, extract a variable from it; and thirdly, use that variable for something else. So let's address each of those steps:
Firstly you want to grab a page using curl, so you will do
curl www.some.where.com
and the page will be output on your terminal. But actually you want to search for something on that page, so you need to do
curl www.some.where.com | awk something
or
curl www.some.where.com | grep something
But you want to put that into a variable, so you need to do
var=$(curl www.some.where.com | awk something)
or
var=$(curl www.some.where.com | grep something)
The actual command I think you want is
var=$(curl www.some.where.com | awk -F\" '/au_pxytimetag/{print $(NF-1)}')
Then you want to use the variable var for another curl operation, so you will need to do
curl -d "param1=$var" http://some.url.com/somewhere
Original answer
I'd use awk like this:
var=$(awk -F\" '/au_pxytimetag/{print $(NF-1)}' yourfile)
to take the second-to-last field on the line containing au_pxytimetag, using " as the field separator.
Then you can use it like this
curl -d "param1=$var&param2=SomethingElse" http://some.url.com/somewhere
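Putting both steps together, here is a minimal single-script sketch; www.some.where.com, some.url.com and the param1/param2 names are placeholders, and a literal sample line stands in for the fetched page:

```shell
#!/bin/bash
# Sketch of the whole flow in one script. The URLs and parameter
# names are placeholders -- substitute your own.

# Stand-in for the real page so the extraction step can be shown;
# in practice this would be: page=$(curl -s www.some.where.com)
page='<input type="hidden" name="au_pxytimetag" value="1234567890">'

# Second-to-last "-delimited field on the matching line:
var=$(printf '%s\n' "$page" | awk -F\" '/au_pxytimetag/{print $(NF-1)}')

echo "$var"    # prints: 1234567890

# Then reuse it in the follow-up request:
# curl -d "param1=$var&param2=SomethingElse" http://some.url.com/somewhere
```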

You can use xmllint:
value=$(xmllint --html --xpath "string(//input[@name='au_pxytimetag']/@value)" index.html)

You can do it with my Xidel:
xidel http://webpage -e "//input[@name='au_pxytimetag']/@value"
But you do not need to.
With
xidel http://webpage -f "(//form)[1]" -e "//what-you-need-from-the-next-page"
you can send all the values from the first form on the webpage to the form action, and then query something from the next page.

You can try:
grep au_pxytimetag input.html | sed "s/.* value=\"\(.*\)\".*/\1/"
EDIT:
If you need this in a script:
#!/bin/bash
DATA=$(grep au_pxytimetag input.html | sed "s/.* value=\"\(.*\)\".*/\1/")
curl http://example.com -d "au_pxytimetag=$DATA"

Related

Extract a substring on Mac command line

After performing a request using curl and passing the response through jq I have this output:
projects/123456789/locations/europe-west2/featurestores/p013600
projects/123456789/locations/europe-west2/featurestores/p013601
I want to tokenise those strings and get the last part, i.e. I want to return:
p013600
p013601
How can I do that in a one-liner (i.e. via piping)?
I figured it out, piping to cut does the job
echo projects/123456789/locations/europe-west2/featurestores/p013600 | cut -d'/' -f6
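An alternative that does not hard-code the field number, in case the number of path components ever changes, is awk's $NF (last field):

```shell
# $NF is always the last /-delimited field, whatever the depth:
echo projects/123456789/locations/europe-west2/featurestores/p013600 | awk -F/ '{print $NF}'
# prints: p013600
```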

Extract data from curl output using sed, awk, cut or python3

I am trying to extract one url from a curl command's output in shell.
The curl command which I am running is :
curl -s "http://HOSTNAME.com/api/v4/projects/18/merge_requests?source_branch=samm_6819"
which gives output something like this :
[{"id":244,"iid":69,"project_id":18,"title":"bug 6819","description":"","state":"merged","created_at":"2021-09-04T06:51:05.988Z","updated_at":"2021-09-04T06:52:03.869Z","merged_by":{"id":4,"name":"SOME NAME ","username":"SOMEUSERNAME","state":"active","avatar_url":"https://www.gravatar.com/avatar/baa4538f891a621a8e5480aa9ac404a6?s=80\u0026d=identicon","web_url":"http://HOSTNAME/SOMEUSERNAME"},"merged_at":"2021-09-04T06:52:03.997Z","closed_by":null,"closed_at":null,"target_branch":"master","source_branch":"samm_6819","user_notes_count":0,"upvotes":0,"downvotes":0,"author":{"id":1,"name":"Administrator","username":"root","state":"active","avatar_url":"https://www.gravatar.com/avatar/81fasf149c17eba1d66803dc0877828900?s=80\u0026d=identicon","web_url":"http://HOSTNAME/root"},"assignees":[],"assignee":null,"reviewers":[],"source_project_id":18,"target_project_id":18,"labels":[],"draft":false,"work_in_progress":false,"milestone":null,"merge_when_pipeline_succeeds":true,"merge_status":"can_be_merged","sha":"1e14427dd70862265b55fsa0e38f4e980d5f65524","merge_commit_sha":"34240a857d7d1a852f9c3d4safa3f031ef3bd35225","squash_commit_sha":null,"discussion_locked":null,"should_remove_source_branch":false,"force_remove_source_branch":false,"reference":"!69","references":{"short":"!69","relative":"!69","full":"SOMEPROJECTNAME-af/test!69"},"web_url":"http://HOSTNAME.com/GROUPNAME/PROJECTNAME/-/merge_requests/69","time_stats":{"time_estimate":0,"total_time_spent":0,"human_time_estimate":null,"human_total_time_spent":null},"squash":false,"task_completion_status":{"count":0,"completed_count":0},"has_conflicts":false,"blocking_discussions_resolved":true,"approvals_before_merge":null}]
Out of all this data, I need output to be :
http://HOSTNAME.com/GROUPNAME/PROJECTNAME/-/merge_requests/69
I need to print this url somewhere. I am not sure what to use here, whether awk, sed, cut, or anything piped from the curl command, to get this url as output.
Can someone help me please?
I never wanted to ask this here but I am running out of options on what could be the best way to achieve this.
Thanks
Best option here is to use jq:
json='[{"id":244,"iid":69,"project_id":18,"title":"bug 6819","description":"","state":"merged","created_at":"2021-09-04T06:51:05.988Z","updated_at":"2021-09-04T06:52:03.869Z","merged_by":{"id":4,"name":"SOME NAME ","username":"SOMEUSERNAME","state":"active","avatar_url":"https://www.gravatar.com/avatar/baa4538f891a621a8e5480aa9ac404a6?s=80\u0026d=identicon","web_url":"http://HOSTNAME/SOMEUSERNAME"},"merged_at":"2021-09-04T06:52:03.997Z","closed_by":null,"closed_at":null,"target_branch":"master","source_branch":"samm_6819","user_notes_count":0,"upvotes":0,"downvotes":0,"author":{"id":1,"name":"Administrator","username":"root","state":"active","avatar_url":"https://www.gravatar.com/avatar/81fasf149c17eba1d66803dc0877828900?s=80\u0026d=identicon","web_url":"http://HOSTNAME/root"},"assignees":[],"assignee":null,"reviewers":[],"source_project_id":18,"target_project_id":18,"labels":[],"draft":false,"work_in_progress":false,"milestone":null,"merge_when_pipeline_succeeds":true,"merge_status":"can_be_merged","sha":"1e14427dd70862265b55fsa0e38f4e980d5f65524","merge_commit_sha":"34240a857d7d1a852f9c3d4safa3f031ef3bd35225","squash_commit_sha":null,"discussion_locked":null,"should_remove_source_branch":false,"force_remove_source_branch":false,"reference":"!69","references":{"short":"!69","relative":"!69","full":"SOMEPROJECTNAME-af/test!69"},"web_url":"http://HOSTNAME.com/GROUPNAME/PROJECTNAME/-/merge_requests/69","time_stats":{"time_estimate":0,"total_time_spent":0,"human_time_estimate":null,"human_total_time_spent":null},"squash":false,"task_completion_status":{"count":0,"completed_count":0},"has_conflicts":false,"blocking_discussions_resolved":true,"approvals_before_merge":null}]'
echo "$json" | jq -r '.[].web_url'
You can try this sed
sed 's/.*:.\(http.[^"]*\).*/\1/'
It will match from the last occurrence of http through to the first occurrence of " after it.
Using cat file in place of curl... for demonstration, this will work on the input you show using any sed in any shell on every Unix box:
$ cat file | sed 's/.*"web_url":"\([^"]*\).*/\1/'
http://HOSTNAME.com/GROUPNAME/PROJECTNAME/-/merge_requests/69
Try Perl
$ perl -ne ' /.*"web_url":"([^"]+)"/ and print "$1\n" ' sameer.txt
http://HOSTNAME.com/GROUPNAME/PROJECTNAME/-/merge_requests/69
Another method:
perl -ne ' while( /"web_url":"([^"]+)"/g ) { $x=1; $t=$1 } print "$t\n" if $x; $x=0 ' sameer.txt

Getting a variable from a website which is then stored

I need to retrieve the "message" variable from the website and store it in a variable.
I'm new to shell scripting so I'm not sure how to do it. I've been trying various ways for a while but they don't seem to work.
This is the output of the website example.web:8080/rest/message
[{"id":33,"message":"Dash","lastUpdated":1569922857154,"userName":null}]
#!/bin/bash
message=$( curl -# -L "http://example.web:8080/rest/message" )
username=$(
<<< "${message}" \
grep -P -o -e '(?<=<li>message: <strong>)(.*?)(?=<\/strong><\/li>)' |
head -n 1
)
I need the message variable "Dash" to be stored so that it can be printed later on.
I am not sure whether the output you are getting is pure JSON or not; if you are OK with awk, could you please try the following. (If you do have proper JSON, then please use jq or another parser that is specially made for JSON parsing.)
Your_command | awk -v s1="\"" '
match($0,s1 "message" s1 ":" s1 "[^\"]*"){
num=split(substr($0,RSTART,RLENGTH),array,"\"")
print array[num]
}'
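For illustration, the same awk program run against the sample response from the question (standing in for Your_command):

```shell
# Sample response from the question, piped in place of Your_command:
json='[{"id":33,"message":"Dash","lastUpdated":1569922857154,"userName":null}]'
printf '%s\n' "$json" | awk -v s1="\"" '
match($0,s1 "message" s1 ":" s1 "[^\"]*"){
  num=split(substr($0,RSTART,RLENGTH),array,"\"")
  print array[num]
}'
# prints: Dash
```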
EDIT: If you have the jq tool with you, could you please try the following (not tested). The -r flag strips the JSON quoting so the bare value is stored:
curl -# -L "http://example.web:8080/rest/message" | jq -r '.[] | .message'

parse json using a bash script without external libraries

I have a fresh Ubuntu installation and I'm using a command that returns a JSON string. I would like to send this JSON string to an external API using curl. How do I parse something like {"foo":"bar"} into a URL like xxx.com?foo=bar using just the standard Ubuntu libraries?
Try this
curl -s 'http://twitter.com/users/username.json' | sed -e 's/[{}]/''/g' | awk -v RS=',"' -F: '/^text/ {print $2}'
You could use tr -d '{}' instead of sed, though leaving that step out entirely seems to have the desired effect as well.
If you want to strip off the outer quotes, pipe the result of the above through sed 's/^"//; s/"$//'
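Since the question allows anything in a standard Ubuntu install, another option worth sketching: python3 ships with Ubuntu, and its json module parses the string properly with no external libraries:

```shell
# Parse the JSON with python3's standard json module:
echo '{"foo":"bar"}' | python3 -c 'import json, sys; print(json.load(sys.stdin)["foo"])'
# prints: bar
```

The extracted value can then be interpolated into the curl URL, e.g. curl "xxx.com?foo=$value".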

AWK print field by variable value

so I have a GET command retrieving data from a server, and only need specific parts of this info.
I have a working script but the awk part is very long and I was wondering if I could get some help shortening it.
Current script:
curl --insecure -u $HMCUSER:$HMCPASS -H "Accept: */*" -H "Content-type: application/json" -s --header "X-API-Session:$COOKIE" -X GET https://$HMCIP:6794$BCID1/blades | awk -F\" '{print $50" M1:"$42"\n"$114" M1:"$106"\n"$18" M1:"$10"\n"$98" M1:"$90"\n"$34" M1:"$26"\n"$82" M1:"$74"\n"$66" M1:"$58"\n"$130" M1:"$122}' > ~walkers/blade-info-new
echo -e "\n`cat blade-info-new`\n"
and the output is:
/api/blades/394a7ea8-02d4-11e1-b71a-5cf3fcad1a40 M1:B.1.01
/api/blades/749f35cc-02d7-11e1-946a-5cf3fcad1ef8 M1:B.1.02
/api/blades/eeae9670-02d5-11e1-a5ee-5cf3fcad21e0 M1:B.1.03
/api/blades/3949f5a0-02d4-11e1-85df-5cf3fcad1dc8 M1:B.1.04
/api/blades/d25df328-02d3-11e1-a1e9-5cf3fcad2158 M1:B.1.05
/api/blades/bbecebd8-02d0-11e1-aca7-5cf3fcacf4a0 M1:B.1.06
/api/blades/3016b5d8-02d7-11e1-a66f-5cf3fcad1dd0 M1:B.1.07
/api/blades/75796586-02ea-11e1-8ab0-5cf3fcacf040 M1:B.1.08
(there are two columns: /api/blades/... and M1:B.1.0#)
So I tried this:
for i in {10..130..8}
do
try=$(curl --insecure -u $HMCUSER:$HMCPASS -H "Accept: */*" -H "Content-type: application/json" -s --header "X-API-Session:$COOKIE" -X GET https://$HMCIP:6794$BCID1/blades | awk -v i=$i -F\" '{print $i}')
echo "$try"
done
hoping to get the same output as above and instead I just get the complete JSON object:
{"blades":[{"status":"operating","name":"B.1.03","type":"system-x","object-uri":"/api/blades/eeae9670-02d5-11e1-a5ee-5cf3fcad21e0"},{"status":"operating","name":"B.1.05","type":"system-x","object-uri":"/api/blades/d25df328-02d3-11e1-a1e9-5cf3fcad2158"},{"status":"operating","name":"B.1.01","type":"system-x","object-uri":"/api/blades/394a7ea8-02d4-11e1-b71a-5cf3fcad1a40"},{"status":"operating","name":"B.1.07","type":"system-x","object-uri":"/api/blades/3016b5d8-02d7-11e1-a66f-5cf3fcad1dd0"},{"status":"operating","name":"B.1.06","type":"system-x","object-uri":"/api/blades/bbecebd8-02d0-11e1-aca7-5cf3fcacf4a0"},{"status":"operating","name":"B.1.04","type":"system-x","object-uri":"/api/blades/3949f5a0-02d4-11e1-85df-5cf3fcad1dc8"},{"status":"operating","name":"B.1.02","type":"system-x","object-uri":"/api/blades/749f35cc-02d7-11e1-946a-5cf3fcad1ef8"},{"status":"operating","name":"B.1.08","type":"system-x","object-uri":"/api/blades/75796586-02ea-11e1-8ab0-5cf3fcacf040"}]}
So I was wondering how to get the variable to work? I've been on many websites and everyone seems to say awk -v i=$i should work...
EDIT: The sequence I want to print is the object uri (i.e. /api/blades/...) followed by the blade name (i.e. B.1.01). These infos are all in the JSON object returned by the curl command starting with the tenth field and every 8th field after that (using " as a delimiter):
{"blades":[{"status":"operating","name":"B.1.03","type":"system-x","object-uri":"/api/blades/eeae9670-02d5-11e1-a5ee-5cf3fcad21e0"},{"status":"operating","name":"B.1.05","type":"system-x","object-uri":"/api/blades/d25df328-02d3-11e1-a1e9-5cf3fcad2158"},{"status":"operating","name":"B.1.01","type":"system-x","object-uri":"/api/blades/394a7ea8-02d4-11e1-b71a-5cf3fcad1a40"},{"status":"operating","name":"B.1.07","type":"system-x","object-uri":"/api/blades/3016b5d8-02d7-11e1-a66f-5cf3fcad1dd0"},{"status":"operating","name":"B.1.06","type":"system-x","object-uri":"/api/blades/bbecebd8-02d0-11e1-aca7-5cf3fcacf4a0"},{"status":"operating","name":"B.1.04","type":"system-x","object-uri":"/api/blades/3949f5a0-02d4-11e1-85df-5cf3fcad1dc8"},{"status":"operating","name":"B.1.02","type":"system-x","object-uri":"/api/blades/749f35cc-02d7-11e1-946a-5cf3fcad1ef8"},{"status":"operating","name":"B.1.08","type":"system-x","object-uri":"/api/blades/75796586-02ea-11e1-8ab0-5cf3fcacf040"}]}
The blade names don't have to be in numerical order (B.1.01 to B.1.08), only on the same line as the corresponding ID
EDIT 2: Found a workaround: used a C-style for loop, for (( i=10; i<=130; i+=8 )), instead of the normal bash for i in {10..130..8}
The proper answer to this question is to ditch awk (even though I love awk) and use a real JSON parser, e.g. the very handy jq tool.
If I understand correctly you're wanting {10..130..8} to expand to give the required series of $i values.
In my version of bash (it's ooooold: 3.2.25) the string {10..130..8} doesn't expand to anything and so the loop is entered with i="{10..130..8}" and so awk uses ${10..130..8} which appears to simplify to $0 (i.e. the whole curl return string). Hence your problem. You can test if this is the case by putting echo $i inside your loop.
You need a better way of getting the series of values you want. You can use "seq" for this (man seq for more info). $( seq 10 8 130 ) should do it.
Further, you can make it so that curl is only called once with something messier like
# Construct the string of fields
for i in $( seq 10 8 130 ); do
fields="$fields,\$$i"
done
fields=$( echo "$fields" | sed 's/^,//' ) # Remove the leading comma
...curl command... | awk '{print '$fields'}'
I think you want awk to access the updated value of the i variable. Because the awk program is between single quotes (''), the shell does not expand $i inside it, but you can work around that by closing the quotes around the piece of the program that should be replaced by the actual value of i. It is explained in this online AWK manual, in the section Dynamic Variables.
So for your particular case, you could try
awk -F\" '{print $'$i'}'
instead of
awk -v i=$i -F\" '{print $i}'
at the end of your pipeline command.
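A minimal illustration of the quoting trick, picking field 3 of a test line:

```shell
# The shell expands $i before awk runs, so awk sees '{print $3}':
i=3
printf 'a b c d\n' | awk '{print $'$i'}'
# prints: c
```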
How about changing the record separator (RS) to a comma and defining the field separator as a double quote? Then save the name and print it together with the object-uri. The command below should give you a starting point:
curl [...] | awk -v RS=, -F\" '{
if ($2 ~ /^name$/) {name=$4}
if ($2 ~ /^object-uri$/) {print name, $4}
}'
p.s. Remove the ifs and print $0 if you want to see how RS=, helps you.
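For illustration, here is the same approach run against one record of the sample JSON from the question, with the print reordered to give the uri-then-name layout the desired output shows:

```shell
# One blade from the sample response, standing in for the curl output:
json='{"blades":[{"status":"operating","name":"B.1.01","type":"system-x","object-uri":"/api/blades/394a7ea8-02d4-11e1-b71a-5cf3fcad1a40"}]}'
printf '%s\n' "$json" | awk -v RS=, -F\" '{
  if ($2 ~ /^name$/) {name=$4}
  if ($2 ~ /^object-uri$/) {print $4, "M1:" name}
}'
# prints: /api/blades/394a7ea8-02d4-11e1-b71a-5cf3fcad1a40 M1:B.1.01
```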
