What I'm trying to do here is determine the latest stable version of TuxOnIce from http://tuxonice.net/downloads/all/ (currently tuxonice-for-linux-3.8.0-2013-02-24.patch.bz2).
What complicates things is that there's no "current" link, so we have to follow the versioning, which goes something like this (these examples don't exist):
tuxonice-for-linux-3.8.0-2013-4-2.patch.bz2
tuxonice-for-linux-3.8-4-2013-4-16.patch.bz2
tuxonice-for-linux-3.8-11-2013-5-23.patch.bz2
The problem is that they'll sort in this order:
tuxonice-for-linux-3.8-11-2013-5-23.patch.bz2
tuxonice-for-linux-3.8-4-2013-4-16.patch.bz2
tuxonice-for-linux-3.8.0-2013-4-2.patch.bz2
My current implementation (which is garbage) is below. I thought about using the dates instead but couldn't figure out how to do that either (/tmp/tuxonice is the downloaded index file):
_major=3.8 # Auto-generated
_TOI=$(grep "${_major}-1[0-9]" /tmp/tuxonice | cut -d '"' -f2 | tail -1)
[ -z "$_TOI" ] && _TOI=$(grep "${_major}-" /tmp/tuxonice | cut -d '"' -f2 | tail -1)
[ -z "$_TOI" ] && _TOI=$(grep "${_major}.0-2" /tmp/tuxonice | cut -d '"' -f2 | tail -1)
Thanks.
Use the webserver's feature to sort the index page by modification date in reverse order, grab the page using lynx -dump, get the first line matching the filename you are interested in, and print the respective column. This gives you the absolute URL to the file; from there you can tweak the command to give you the exact output you want (filename, just the version string, ...).
$ lynx -dump 'http://tuxonice.net/downloads/all/?C=M&O=D'|awk '/^[[:space:]]*[[:digit:]]+\..+\/tuxonice-for-linux/ { print $2; exit }'
http://tuxonice.net/downloads/all/tuxonice-for-linux-3.8.0-2013-02-24.patch.bz2
Still not super-robust: it will obviously break if the modification dates are not as expected, and you probably also want to tweak the regex a bit to be more specific.
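For instance, a tighter pattern (just a sketch; it assumes the links keep the tuxonice-for-linux-*.patch.bz2 naming) could anchor on the full filename:
$ lynx -dump 'http://tuxonice.net/downloads/all/?C=M&O=D'|awk '/^[[:space:]]*[[:digit:]]+\.[[:space:]]+http.*\/tuxonice-for-linux-.*\.patch\.bz2$/ { print $2; exit }'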
This isn't a real answer, but I thought this "one-liner"[1] was pretty cool:
HTML=$(wget -qO- http://tuxonice.net/downloads/all/ | grep tuxonice); TIMESTAMP=$(echo "$HTML" | sed 's/.*\([0-9]\{2\}-[A-Za-z]\{3\}-[0-9]\{4\} [0-9]\{2\}:[0-9]\{2\}\).*/\1/' | while read line; do echo $(date --date "$line" +%s) $line; done | sort -n | tail -n 1 | cut -d' ' -f2-3); LINK=$(echo "$HTML" | grep "$TIMESTAMP" | sed 's/.*href=\"\(.*\)\".*/\1/'); echo "http://tuxonice.net/downloads/all/${LINK}"
Prints:
http://tuxonice.net/downloads/all/tuxonice-for-linux-3.8.0-2013-02-24.patch.bz2
This approach is really just a joke though. Obviously, there are better ways to do this, perhaps using a scripting language that supports XML parsing.
At the very least, maybe this will give you some insight into how you can use the date/time values of the files to select the "newest". But I'd caution against relying on this (because upload dates may not coincide with version numbers), and suggest that your version-number idea was probably better, if you can somehow handle all of the various naming and version-numbering schemes it looks like they've used.
[1] It's not a real one liner
Related
Let's say I have these files in folder Test1
AAAA-12_21_2020.txt
AAAA-12_20_2020.txt
AAAA-12_19_2020.txt
BBB-12_21_2020.txt
BBB-12_20_2020.txt
BBB-12_19_2020.txt
I want to move the latest file for each prefix to folder Test2:
AAAA-12_21_2020.txt
BBB-12_21_2020.txt
This code would work:
ls "$1" -U | sort | cut -f 1 -d "-" | uniq | while read -r prefix; do
    ls "$1/$prefix"-* | sort -t '_' -k3,3V -k1,1V -k2,2V | tail -n 1
done
We first iterate over every prefix in the directory given as the first argument; the prefixes are obtained by sorting the file list, extracting everything before the -, and deleting duplicates. Then we sort each prefix's filenames on the three fields separated by _ using sort's -k option: primarily by the year in the third field, then by the first field (which ends in the month), and lastly by the day in the second field; tail takes the last, i.e. newest, entry. We use version sort (the V modifier) so the numbers are compared as numbers despite the surrounding text (as opposed to a plain lexicographical sort).
I'm not sure whether this is the best way to do this, as I used only basic bash functions. Because of the date format and the fact that you have to differentiate prefixes, you have to parse the string fully, which is a job better suited for AWK or Perl.
Nonetheless, I would suggest using year-month-day format for machine-readable filenames, since those sort correctly with a plain lexicographic sort.
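To illustrate with a quick (made-up) run: with year-month-day names, a plain sort already puts the newest file last:
$ printf '%s\n' AAAA-2020_12_19.txt AAAA-2020_12_21.txt AAAA-2020_12_20.txt | sort | tail -n 1
AAAA-2020_12_21.txt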
Using awk:
ls -1 Test1/ | awk -v src_dir="Test1" -v target_dir="Test2" -F '(-|_)' '{p=$4""$2""$3; if(!($1 in b) || b[$1] < p){a[$1]=$0}} END {for (i in a) {system ("mv "src_dir"/"a[i]" "target_dir"/")}}'
Find the next link if the Link header contains rel=next.
Getting the Link header can result in different strings; I need to find the next link.
e.g.
Link: <http://mygithub.com/api/v3/organizations/20/repos?page=1>; rel=prev, <http://mygithub.com/api/v3/organizations/20/repos?page=3>; rel=next, <http://mygithub.com/api/v3/organizations/20/repos?page=4>; rel=last, <http://mygithub.com/api/v3/organizations/20/repos?page=1>;
would be http://mygithub.com/api/v3/organizations/20/repos?page=3
Link: <http://mygithub.com/api/v3/organizations/4/repos?page=2>; rel="next", <http://mygithub.com/api/v3/organizations/4/repos?page=2>; rel="last"
would be http://mygithub.com/api/v3/organizations/4/repos?page=2
I played with sed and parameter expansion, but I'm not that experienced, so I got stuck :)
Please be aware that parsing HTML with non-HTML-aware tools is fraught with peril; you will see that this works and assume you can always get away with it. Then you'll spend hours trying to get the next level of complexity to work, when you should be studying how to use HTML-aware tools. Don't say we didn't warn you (-;, but
printf "<http://mygithub.com/api/v3/organizations/20/repos?page=1>; rel=prev, <http://mygithub.com/api/v3/organizations/20/repos?page=3>; rel=next, <http://mygithub.com/api/v3/organizations/20/repos?page=4>; rel=last, <http://mygithub.com/api/v3/organizations/20/repos?page=1>;\n" \
| awk -F" " '{
for(i=1;i<=NF;i++){
if ($i == "rel=next,") {
gsub(/[<>]/,"",$(i-1)); sub(/;$/,"",$(i-1))
print $(i-1)
}
}
}'
produces the required output:
http://mygithub.com/api/v3/organizations/20/repos?page=3
To save the output of a section of script into a variable, you wrap the code in a command substitution, in this case
nextReposLink=$( printf .... | awk '....' )
#-------------^^--------------------------^
The ^ pointed items are the modern syntax for command substitution. The code inside $( ... ) is executed and its standard output is substituted into the invoking command line. (The original syntax for command substitution is/was `cmds` and works the same in the simple case var=`cmds`.) You can nest the modern form easily, whereas the old version requires a lot of escape-character fiddling; avoid it if you can.
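For example (the commands here are just illustrative):
outer=$(dirname "$(command -v awk)")   # modern form, nests cleanly
outer=`dirname \`command -v awk\``     # legacy form needs escaped backticks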
Note that just about any s/str/repl/ that sed can do, awk can do too, using the sub(/regex/, "repl", target) or gsub(sameArgs) functions. In this particular case the angle brackets are stripped with the bracket expression /[<>]/.
Be sure to always double-quote your uses of variables, i.e. echo "$nextReposLink".
IHTH
Well - I put one of your URL strings in a text file and was able to pull out the first URL with two cuts.
[root@oelinux2 ~]# cat test
Link: <http://mygithub.com/api/v3/organizations/20/repos?page=1>; rel=prev, <http://mygithub.com/api/v3/organizations/20/repos?page=3>; rel=next, <http://mygithub.com/api/v3/organizations/20/repos?page=4>; rel=last, <http://mygithub.com/api/v3/organizations/20/repos?page=1>;
Then, using cut:
cat test | cut -d "<" -f2 | cut -d ">" -f1
[root@oelinux2 ~]# cat test | cut -d "<" -f2 | cut -d ">" -f1
http://mygithub.com/api/v3/organizations/20/repos?page=1
That's one option, if you are just looking to get the first URL in the string. Basically, it's just grabbing what's between the first pair of delimiters "<" and ">".
With cut:
-d is the delimiter
-f is the field you want to get.
If you wanted to get a later URL in that string, you could change the fields (-f #) and see what you get :)
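For instance, in your first sample the rel=next URL happens to sit between the second pair of angle brackets, so field 3 picks it out (note this relies on position, not on the rel= labels):
[root@oelinux2 ~]# cat test | cut -d "<" -f3 | cut -d ">" -f1
http://mygithub.com/api/v3/organizations/20/repos?page=3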
I am trying to match a pattern and extract the values that come after it. I used the regex pattern matching below, but it didn't help me: no values got extracted, and I got a blank value when I echoed it.
Could someone let me know what mistake I made?
Sample input:
class="remove_link_style">Site Issue - Please check</a></td><td>
Working</td><td>
<ahref="/0051043899"class="remove_link_style">
Pattern used:
text=$(echo "class="remove_link_style">Site Issue - Please check</a></td><td>Working</td><td><ahref="/0051043899"class="remove_link_style">" | grep -o --perl-regexp "(?class="remove_link_style")[a-zA-Z0-9_]+"")
I also want to extract the string that comes after class="remove_link_style" but before </a></td><td>.
I think you will find a lot of references and advice not to parse XML/HTML with bash tools like grep/sed/awk. Given that, I would advise using a proper parsing tool such as xmllint (http://xmlsoft.org/xmllint.html) or xmlstarlet (http://xmlstar.sourceforge.net/doc/xmlstarlet.txt). But if you'd like to quickly extract the contents, you can combine grep and cut as below.
echo 'class="remove_link_style">GB|Trekkinn-UK|Manualcrawlrequest|1</a></td><td>WorkInProgress</td><td><ahref="/0051043899"class="remove_link_style">' | grep -Eo '(style"|td)>[^<>]+' | cut -f2 -d">"
This prints out:
GB|Trekkinn-UK|Manualcrawlrequest|1
WorkInProgress
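For comparison, here is a minimal xmllint sketch; it assumes the fragment is first wrapped into something reasonably well-formed (xmllint's HTML parser is tolerant, and the 2>/dev/null just hides its parser warnings):
$ echo '<table><tr><td>GB|Trekkinn-UK|Manualcrawlrequest|1</td><td>WorkInProgress</td></tr></table>' | xmllint --html --xpath '//td[2]/text()' - 2>/dev/null
WorkInProgress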
EDIT: As per the OP's request, storing the output in an array.
If you need the output stored in an array, you have to set IFS, since the elements contain whitespace.
IFS=$'\n'
result=($(echo 'class="remove_link_style">Site Issue - Please check</a></td><td>Working</td><td><ahref="/0051043899"class="remove_link_style">' | grep -Eo '(style"|td)>[^<>]+' | cut -f2 -d">"))
unset IFS
for i in "${result[@]}"; do echo "$i"; done
Site Issue - Please check
Working
I'm a complete noob at awk/sed so forgive me if I'm missing something obvious here.
Basically I'm trying to do a nested grep, i.e. something akin to:
grep $value `exim -Mvh $(`exim -bpru | grep $eximID | more`)`
Breakdown:
grep $value IN COMMAND
--> exim -Mvh (print exim mail headers) FROM RESULTS OF
---> exim -bpru | grep $eximID | more
$value is the string I'm looking for
$eximID is the string I'm looking for within exim -bpru (list all exim thingies)
No idea if what I'm trying to accomplish would be easier with awk/sed, hence the question really.
I tried to make that as legible as possible but nested nesting is hard yo
Edit
Tada! My script is now working thanks to you guys! Here it is, unfinished but working:
#!/usr/bin/bash
echo "Enter the email address you want to search for + compare sender info via exim IDs."
read searchTarget
echo "Enter the target domain the email is coming from."
read searchDomain
#domainList is an array of the exim IDs needed
domainList=($(exim -bpru | grep "$searchDomain" | awk '{ print $3 }'))
for i in "${domainList[@]}"
do
echo "$(exim -Mvh $i | grep $searchTarget)"
#echo "$(grep $searchTarget $(exim -Mvh $i))"
done
grep $value `exim -Mvh $(`exim -bpru | grep $eximID | more`)`
This isn't right. The backticks (`command`) and $(command) do the same thing; it's just alternative syntax. The advantage of using $() is that it nests better, so it's a good habit to always use it.
So, let's fix this, we now end up with:
grep "$value" "$(exim -Mvh "$(exim -bpru | grep "$eximID")")" | more
I relocated the more command, for what I think will be obvious reasons: more just paginates data for the user, and feeding its output to something else almost never makes sense.
I've also quoted the variables; this is also a good habit, because otherwise things will break when there are certain characters in your variable (the most common being a space).
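A quick illustration of why the quoting matters:
f="two words.txt"
grep pattern $f      # word-splits: grep looks for files "two" and "words.txt"
grep pattern "$f"    # one argument: the single file "two words.txt"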
I can't test whether this gives you the output you want; if it doesn't, update your question with a few lines of example data and the expected output.
If you're going to do it with back-quotes (not recommended; it is hard work), then you have to write:
grep $value `exim -Mvh $(\`exim -bpru | grep $eximID\`)`
(where I've removed the more, since when used like that it behaves like cat, and there's no point in using cat at the end of a command like that either).
It would be more sane to use the $(…) notation throughout:
grep $value $(exim -Mvh $( $(exim -bpru | grep $eximID)))
And it seems more plausible that you don't need quite that many levels of indirection, and that this is what you're really after:
grep $value $(exim -Mvh $(exim -bpru | grep $eximID))
You should look at:
Why didn't back quotes in a shell script help me cd to a directory?
What is the benefit of using $(…) instead of back ticks in shell scripts?
Why does \$ reduce to $ inside backquotes [though not inside $(…)]?
and no doubt there are other related questions too.
I want to write a script to find the latest version of the rpm of a given package available from a mirror, e.g. http://mirror.centos.org/centos/5/updates/x86_64/RPMS/.
The script should be able to run on the majority of Linux flavors (e.g. CentOS, Red Hat, Ubuntu), so a yum-based solution is not an option. Is there any existing script that does this? Or can someone give me a general idea of how to go about it?
Thanks to levislevis85 for the wget command line. Try this:
ARCH="i386"
PKG="pidgin-devel"
URL=http://mirror.centos.org/centos/5/updates/x86_64/RPMS
DL=$(wget -O- -q "$URL" | sed -n 's/.*rpm.>\('$PKG'.*'$ARCH'.rpm\).*/\1/p' | sort | tail -1)
wget "$URL/$DL"
I will put my comment here as an answer, otherwise the code would not be readable. Try this:
Try this:
ARCH="i386"
PKG="pidgin-devel"
URL=http://mirror.centos.org/centos/5/updates/x86_64/RPMS
DL=$(wget -O- -q "$URL" | sed -n 's/.*rpm.>\('$PKG'.*'$ARCH'.rpm\).*<td align="right">\(.*\)-\(.*\)-\(.*\) \(..\):\(..\) <\/td><td.*/\4 \3 \2 \5 \6 \1/p' | sort -k1n -k2M -k3n -k4n -k5n | cut -d ' ' -f 6 | tail -1)
wget "$URL/$DL"
What it does is:
wget - gets the index file
sed - cuts out some parts and reassembles them in a different order; the result is Year Month Day Hour Minute and Package, like:
2009 Oct 27 01 14 pidgin-devel-2.6.2-2.el5.i386.rpm
2009 Oct 30 10 49 pidgin-devel-2.6.3-2.el5.i386.rpm
sort - orders the columns; n stands for numerical and M for month
cut - cuts out field 6
tail - shows only the last entry
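As a quick illustration of those sort flags on the intermediate data:
$ printf '2009 Oct 30 10 49 b.rpm\n2009 Oct 27 01 14 a.rpm\n' | sort -k1n -k2M -k3n -k4n -k5n
2009 Oct 27 01 14 a.rpm
2009 Oct 30 10 49 b.rpm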
The problem with this: if an older package release is uploaded after a newer one, the script will pick the wrong file, and if the layout of the site changes, the script will break. There are always a lot of points where a script could fail.
Using wget and gawk:
#!/bin/bash
pkg="kernel-headers"
wget -O- -q http://mirror.centos.org/centos/5/updates/x86_64/RPMS | awk -v pkg="$pkg" 'BEGIN{
  RS="\n"; FS="</a>"
  # map month names to month numbers, e.g. date["Oct"]="10"
  z=split("Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec",D,"|")
  for(i=1;i<=z;i++){
    date[D[i]]=sprintf("%02d",i)
  }
  temp=0
}
$1~pkg{
  p=$1                           # the part containing the href
  t=$2                           # the part containing the timestamp
  gsub(/.*href=\042/,"",p)       # strip everything up to the opening quote
  gsub(/\042>.*/,"",p)           # strip the closing quote and the rest
  m=split(t,timestamp," ")
  n=split(timestamp[1],d,"-")    # d[1]=day, d[2]=month name, d[3]=year
  q=split(timestamp[2],hm,":")   # hm[1]=hour, hm[2]=minute
  # build a sortable YYYYMMDDhhmm value and remember the newest
  datetime=d[3]date[d[2]]d[1]hm[1]hm[2]
  if ( datetime >= temp ){
    temp=datetime
    filepkg=p
  }
}
END{
  print "Latest package: "filepkg", date: ",temp
}'
An example run of the above:
linux$ ./findlatest.sh
Latest package: kernel-headers-2.6.18-164.6.1.el5.x86_64.rpm, date: 200911041457
Try this (which requires lynx):
lynx -dump -listonly -nonumbers http://mirror.centos.org/centos/5/updates/x86_64/RPMS/ |
grep -E '^.*xen-libs.*i386.rpm$' |
sort --version-sort |
tail -n 1
If your sort doesn't have --version-sort, then you'll have to parse the version out of the filename or hope that a regular sort will do the right thing.
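Here is a sketch of the parsing route (decorate-sort-undecorate; it assumes the versions are plain dotted numbers, e.g. xen-libs-3.0.3-...):
lynx -dump -listonly -nonumbers http://mirror.centos.org/centos/5/updates/x86_64/RPMS/ |
grep -E '^.*xen-libs.*i386.rpm$' |
sed -E 's/^(.*xen-libs-)([0-9.]+)-/\2 \1\2-/' |   # prepend the version as a sort key
sort -t . -k1,1n -k2,2n -k3,3n |                  # numeric sort on each version component
tail -n 1 | cut -d ' ' -f2-                       # newest line, minus the key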
You may be able to do something similar with wget or curl or even a Bash script using redirections with /dev/tcp/HOST/PORT. The problem with these is that you would then have to parse HTML.
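For the curious, the /dev/tcp route looks like this (a bash-only sketch: plain HTTP, no error handling, and you still have to parse the HTML that comes back):
exec 3<>/dev/tcp/mirror.centos.org/80                 # open fd 3 to the web server
printf 'GET /centos/5/updates/x86_64/RPMS/ HTTP/1.0\r\nHost: mirror.centos.org\r\n\r\n' >&3
cat <&3                                               # read the response (headers + HTML index)
exec 3<&-                                             # close the connection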