Bash: using the output of one command in another

I have the following requirement: fetch all the commits from our SVN from the last two years and list the titles of all the JIRA issues that had code committed. Our commit rules are pretty strict, so a commit must start with the JIRA code, like: "COR-3123 Fixed the bug, introduced a new one"
So, I wrote the following shell script to get this working:
svn log -r{2012-04-01}:{2014-04-01} | grep "COR-" | cut -f1 -d" " | sort -u
This gets me all the JIRA codes.
But now I want to use these in the following command:
wget --quiet --load-cookies cookies.txt -O - http://jira.example.com/browse/{HERE} | sed -n -e 's!.*<title>\(.*\)</title>.*!\1!p'
I.e.: get the JIRA page via wget and parse out the title... (I have already cached my login credentials for wget in cookies.txt.)
Obviously, at the location {HERE} I want to insert the codes obtained from the first command. Doing this via a two-step script (step 1: get the list, step 2: iterate over the list) in Python, Perl, ... is not a problem, but I'd like to know if it's possible to do it in ONE step, using bash :)
(Yes, I know there is a JIRA REST API.)

You can use xargs to pass the parameter to wget:
xargs -I {} wget http://jira.example.com/browse/{}
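Gluing the two halves of the question together, the whole thing can then run as one pipeline. A sketch reusing the exact commands from the question (the sh -c wrapper is only there so the per-issue wget | sed pipeline runs once for each code):
svn log -r{2012-04-01}:{2014-04-01} | grep "COR-" | cut -f1 -d" " | sort -u \
    | xargs -I {} sh -c 'wget --quiet --load-cookies cookies.txt -O - "http://jira.example.com/browse/{}" | sed -n -e "s!.*<title>\(.*\)</title>.*!\1!p"'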

Related

how can I scrape data from reddit (in bash)

I want to scrape titles and dates from http://www.reddit.com/r/movies.json in bash
wget -q -O - "http://www.reddit.com/r/movies.json" | grep -Po '(?<="title": ").*?(?=",)' | sed 's/\"/"/'
I get the titles, but I don't know how to add dates; can someone help?
wget -q -O - "http://www.reddit.com/r/movies.json" | grep -Po
'(?<="title": ").*?(?=",)' | sed 's/"/"/'
As the extension suggests, it is a JSON (application/json) file, so grep and sed are poorly suited for working with it, since they operate on regular expressions rather than structured data. If you are allowed to install tools, jq should be handy here. Install it with your system package manager; if that succeeds, you should get a pretty-printed version of movies.json by doing
wget -q -O - "http://www.reddit.com/r/movies.json" | jq
and then find where the interesting values are placed, which should allow you to grab them. See the jq Cheat Sheet for examples of jq usage. If you are limited to already-installed tools, I suggest taking a look at Python's json module.
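For example, assuming the usual reddit listing layout (each post sits under .data.children[].data, with title and created_utc fields), something along these lines should print both; the field paths are worth double-checking against the pretty-printed output first:
wget -q -O - "http://www.reddit.com/r/movies.json" \
    | jq -r '.data.children[].data | "\(.title)\t\(.created_utc)"'
Here created_utc is a Unix timestamp; it can be converted to a readable date, e.g. with jq's todate filter.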

cURL to delete specific branches from GitLab

I am a little confused about how to use the cURL command to delete unnecessary branches in our GitLab instance.
I want to delete the branches named "tobedeleted_*", i.e. any branch whose name starts with "tobedeleted".
One additional thing I want to check: I only want to delete branches whose created_at attribute in the API is more than 1 month (30 days) old.
This way, I want to make sure that I don't delete any branch which is newly created.
I have a set of commands that can be executed manually to do this, which require the project id and the name of the branch to be deleted as input, but I want to automate it with a script or some curl commands that I will schedule with Jenkins or GitLab's schedule feature.
Can you guys help me on how to automate it?
I can tell you the details I have; I am basically stuck on writing the if conditions, which I believe would make this easier.
I want to do this for all the projects in group number 6. So, to get all the project ids, I can use this curl command:
curl -s --location --request GET "$CI_API_V4_URL/groups/6/projects" --header 'PRIVATE-TOKEN:<my_pvt_token>' | sed 's/,/\n/g' | grep -w "id" | awk -F ':' '{print $2}' | sed -s 's/{"id"//g'
To get all the branches of a project that are supposed to be deleted, I need the project id as input; in the curl command below I am using project id 11:
curl -s --location --request GET "$CI_API_V4_URL/projects/11/repository/branches" --header 'PRIVATE-TOKEN: <my_pvt_token>' | sed 's/,/\n/g' | grep -w "name" | awk -F ':' '{print $2}' | sed 's/"//g' | grep '^tobedeleted'
Once I have extracted the branch names for all the projects, I need to feed each project id and branch name into the following cURL command:
curl --request DELETE --header "PRIVATE-TOKEN:<my_pvt_token>" "$CI_API_V4_URL/projects/11/repository/branches/tobedeleted_username_6819"
I am very confused about how to iterate through the projects while deleting the multiple matching branches in each of them (a rough loop is sketched after the references below).
Any help is really appreciated.
Also, this would be a little easier if I were simply deleting merged branches with the GitLab API, but I want to delete only the specifically named branches.
Ref:
https://docs.gitlab.com/ee/api/branches.html#delete-repository-branch
https://docs.gitlab.com/ee/api/branches.html#list-repository-branches
https://docs.gitlab.com/ee/api/projects.html#list-all-projects
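A rough way to wire those three calls together is sketched below. This is only a sketch: it assumes jq is available for parsing the JSON, uses $GITLAB_TOKEN as a stand-in for the private token, relies on per_page=100 for pagination, and leaves out the 30-day age check (which would need the branch's commit date):
#!/bin/bash
# List projects in group 6, find branches starting with "tobedeleted", delete them.
# Assumes jq is installed and $GITLAB_TOKEN / $CI_API_V4_URL are set.
for project_id in $(curl -s --header "PRIVATE-TOKEN: $GITLAB_TOKEN" \
        "$CI_API_V4_URL/groups/6/projects?per_page=100" | jq -r '.[].id'); do
    for branch in $(curl -s --header "PRIVATE-TOKEN: $GITLAB_TOKEN" \
            "$CI_API_V4_URL/projects/$project_id/repository/branches?per_page=100" \
            | jq -r '.[].name | select(startswith("tobedeleted"))'); do
        # an age check on the branch's latest commit date could be added here
        # before deleting, to protect newly created branches
        curl -s --request DELETE --header "PRIVATE-TOKEN: $GITLAB_TOKEN" \
            "$CI_API_V4_URL/projects/$project_id/repository/branches/$branch"
    done
done
Branch names containing special characters would need URL-encoding, and groups or projects with more than 100 entries would need proper pagination handling.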

how to copy all the URLs of a certain column of a web page?

I want to import a number of files into my server using wget; the 492 files are here:
https://trace.ncbi.nlm.nih.gov/Traces/sra/?study=ERP001736
So I want to copy the URLs of all the files in the "File Name" column, save them into a file, and import them with wget.
How can I copy all those URLs from that column?
Thanks for reading :)
Since you've tagged bash, this should work.
wget -O- is used to output the data to the standard output, where it's greppable. (curl would do that by default.)
grep -oE is used to capture the URLs (which happily are in a regular enough format that a simple regexp works).
Then, wget -i is used to read URLs from the file generated. You might wish to add -nc or other suitable partial-fetch flags; those files are pretty hefty.
wget -O- https://trace.ncbi.nlm.nih.gov/Traces/sra/?study=ERP001736 | grep -oE 'http://ftp.sra.ebi.ac.uk/[^"]+' > urls.txt
wget -i urls.txt
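For example, to skip files that were already fetched on a previous run:
wget -nc -i urls.txt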
First, I recommend using a more specific and robust implementation...
but, in case you are against a wall and in a hurry -
$: curl -s https://trace.ncbi.nlm.nih.gov/Traces/sra/?study=ERP001736 |
sed -En '/href="http:\/\/.*clean.fastq.gz"/{s/^.*href="([^"]+)".*/\1/;p;}' |
while read url; do wget "$url"; done
This is a quick and dirty rough first pass, but it will give you something to work with.
If you aren't in a screaming hurry, try writing something more robust and step-wise in perl or python.

Shell script not working while writing the same on the Terminal works [duplicate]

I have a simple one-liner that works perfectly in the terminal:
history | sort -k2 | uniq -c --skip-fields=1 | sort -r -g | head
What it does: Gives out the 10 most frequently used commands by the user recently. (Don't ask me why I would want to achieve such a thing)
I fire up an editor and type the same thing with a #!/bin/bash at the beginning:
#!/bin/bash
history | sort -k2 | uniq -c --skip-fields=1 | sort -r -g | head
And say I save it as script.sh. Then when I go to the same terminal, type bash script.sh and hit Enter, nothing happens.
What I have tried so far: Googling. Many people have similar pains but they got resolved by a sudo su or adding/removing spaces. None of this worked for me. Any idea where I might be going wrong?
Edit:
I would want to do this from the terminal itself. The system on which this script would run may or may not provide permissions to change files in the home folder.
Another question, as suggested by BryceAtNetwork23: what is so special about the history command that prevents us from executing it from a script?
Looking at your history only makes sense in an interactive shell. Make that command a function instead of a standalone script. In your ~/.bashrc, put
popular_history() {
    history | sort -k2 | uniq -c --skip-fields=1 | sort -r -g | head
}
To use history from a non-interactive shell, you need to enable it; it is only on by default for interactive shells. You can add the following line to the shell script:
set -o history
It still appears that only interactive shells will read the default history file by, well, default, so you'll need to populate the history list explicitly with the next line:
history -r ~/.bash_history
(Read the bash man page for more information on using a file other than the default .bash_history.)
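Putting those pieces together, a non-interactive version of the original script might look like this (a sketch assuming the default ~/.bash_history location):
#!/bin/bash
# enable the history mechanism (off by default in non-interactive shells)
set -o history
# load the user's existing history into the history list
history -r ~/.bash_history
# the original one-liner from the question
history | sort -k2 | uniq -c --skip-fields=1 | sort -r -g | head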
The history mechanism is disabled by default in bash scripts, which is why even the history command won't work in a .sh file. You need to point the script at the bash_history file explicitly.
The history mechanism can also be enabled by setting the history file and changing the run-time option, as shown below:
#!/bin/bash
HISTFILE=~/.bash_history
set -o history
history -r    # load the existing history file into the history list
Note: put the lines above at the top of the script file; the history command will then work inside the script.

How to get the highest numbered link from curl result?

I have created a small program consisting of a couple of shell scripts that work together. It is almost finished and everything seems to work fine, except for one thing that I'm not really sure how to do, and which I need in order to finish this project. There seem to be many routes that can be taken, but I just can't get there.
I have some curl results with lots of unused data, including different links, and among all that data there is a bunch of similar links. I only need to get (into a variable) the link with the highest number (without the always-same text).
The links are all similar and have this structure:
<a href="https://always/same/link/same-name_17.html">always same text</a>
<a href="https://always/same/link/same-name_18.html">always same text</a>
<a href="https://always/same/link/same-name_19.html">always same text</a>
I was thinking about something like:
content="$(curl -s "$url/$param")"
linksArray= get from $content all links that are in the href section of the links
that contain "always same text"
declare highestnumber;
for file in $linksArray
do
href=${1##*/}
fullname=${href%.html}
OIFS="$IFS"
IFS='_'
read -a nameparts <<< "${fullname}"
IFS="$OIFS"
if ${nameparts[1]} > $highestnumber;
then
highestnumber=${nameparts[1]}
fi
done
echo ${nameparts[1]}_${highestnumber}.html
result:
https://always/same/link/unique-name_19.html
This was just my guess; any working code that can be run from a bash script is OK...
Thanks...
Update
I found this nice program; it is easily installed by:
# 64bit version
wget -O xidel/xidel_0.9-1_amd64.deb https://sourceforge.net/projects/videlibri/files/Xidel/Xidel%200.9/xidel_0.9-1_amd64.deb/download
apt-get -y install libopenssl
apt-get -y install libssl-dev
apt-get -y install libcrypto++9
dpkg -i xidel/xidel_0.9-1_amd64.deb
It looks awesome, but I'm not really sure how to tweak it to my needs.
Based on that link and the answer below, I guess a possible solution would be:
use xidel, or use $ sed -n 's/.*href="\([^"]*\)".*/\1/p' file as suggested in that link, but tweaked so it keeps the links together with their HTML tags, like:
<a href="https://always/same/link/same-name_17.html">always same text</a>
then filter out everything that doesn't end with >always same text</a>,
and then use the grep/sort approach mentioned below.
Continuing from the comment, you can use grep, sort and tail to isolate the highest-numbered link in your list of similar links without too much trouble. For example, if your list of links is as you have described (I've saved them in a file dat/links.txt for the purpose of the example), you can easily isolate the highest number in a variable:
Example List
$ cat dat/links.txt
<a href="https://always/same/link/same-name_17.html">always same text</a>
<a href="https://always/same/link/same-name_18.html">always same text</a>
<a href="https://always/same/link/same-name_19.html">always same text</a>
Parsing the Highest Numbered Link
$ myvar=$(grep -o 'https:.*[.]html' dat/links.txt | sort | tail -n1); \
echo "myvar : '$myvar'"
myvar : 'https://always/same/link/same-name_19.html'
(note: the command above is effectively one line, split with the line-continuation '\')
Applying Directly to Results of curl
Whether your list is in a file, or returned by curl -s, you can apply the same approach to isolate the highest number link in the returned list. You can use process substitution with the curl command alone, or you can pipe the results to grep. E.g. as noted in my original comment,
$ myvar=$(grep -o 'https:.*[.]html' < <(curl -s "$url/$param") | sort | tail -n1); \
echo "myvar : '$myvar'"
or pipe the result of curl to grep,
$ myvar=$(curl -s "$url/$param" | grep -o 'https:.*[.]html' | sort | tail -n1); \
echo "myvar : '$myvar'"
(same line continuation note.)
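One caveat with the plain sort above: it compares the URLs as text, so it only picks the right link while the trailing numbers all have the same number of digits (_9 would sort after _19, for example). If the numbers can grow past that, a version sort should be safer (assuming GNU sort):
myvar=$(grep -o 'https:.*[.]html' dat/links.txt | sort -V | tail -n1)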
Why not use Xidel with xquery to sort the links and return the last?
xidel -q links.txt --xquery '(for $i in //@href order by $i return $i)[last()]' --input-format xml
The input-format parameter makes sure you don't need any HTML tags at the start and end of your txt file.
If I'm not mistaken, in the latest Xidel the -q (quiet) param is replaced by -s (silent).
