Read RSS feed using a shell script - bash

Edit: Translated
I have an RSS feed that I want to parse. It's a podcast, and I want just the MP3 URLs so I can download them with wget.
This is the podcast: http://feeds.feedburner.com/Film-UndKino-trailerVideopodcast
The title should include a (de) to get just the German episodes.
The publish-date should be today.
Would be great if someone could help me – I came this far:
wget -q -O- view-source:http://feeds.feedburner.com/Film-UndKino-trailerVideopodcast?format=xml| awk 'BEGIN{RS=""}
/(date +'%d %M %Y')/{
gsub(/.*|.*/,"")
print
}
But it doesn't work.
Thanks in advance,
arneb3rt

You need to drop the "view-source:" from the wget command and execute the date command (with %b to print the abbreviated month instead of %M) outside of the awk command. The following bash script uses grep instead of awk to produce the URLs from which wget can fetch the podcasts.
Note that, probably due to the holidays, there have been no podcasts since 24 Dec 2011 at the feed, so I hard-coded the date of the last podcast for testing:
url='http://feeds.feedburner.com/Film-UndKino-trailerVideopodcast?format=xml'
d=$(date +'%d %b %Y')
# Hard-coded for testing; remove this line to check today's date
d="24 Dec 2011"
echo "Checking podcasts for date: ${d}"
wget -q -O- "${url}" |\
grep -A6 "(de)" |\
grep -A1 "${d}" |\
grep -Eo 'http[^ ]*de\.mp4' |\
sort -u
The output of the above bash script lists two URLs (one feedburner and the other iTunes):
Checking podcasts for date: 24 Dec 2011
http://feedproxy.google.com/~r/Film-UndKino-trailerVideopodcast/~5/pzeSvkVK-3A/trailer01_de.mp4
http://www.moviemaze-trailer.de/ipod/6841/trailer01_de.mp4
Therefore, you could wget the 24 Dec 2011 podcast from either of the above URLs.
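To test the filtering without touching the network, the grep stage of the pipeline above can be wrapped in a function that reads the feed from standard input. This is only a minimal sketch: the function name extract_de_urls is hypothetical, and it assumes (as the script above does) that each item's title, date and enclosure URL sit on separate, nearby lines.

```shell
#!/bin/bash
# extract_de_urls DATE
# Reads feed XML on stdin and prints the unique "(de)" MP4 URLs that appear
# next to DATE (format: "24 Dec 2011"). Hypothetical helper for illustration.
extract_de_urls() {
    grep -A6 '(de)' |
    grep -A1 "$1" |
    grep -Eo 'http[^ "<]*de\.mp4' |
    sort -u
}

# Example (needs network access), for today's episodes:
#   wget -q -O- "$url" | extract_de_urls "$(date +'%d %b %Y')"
```

The output can then be handed to wget, for example with wget -i - to read the URL list from standard input.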

Related

Is it possible to send the result of a cut command to a pre defined directory?

I'm currently learning how to code shell scripts and as part of the assignment I need to sort the files I've been given into different directories based on the date inside the file.
The date is on the first line of the file and all of the functionalities have to be inside the same script.
My current idea is to translate the date into the required format and create the directories with mkdir -p. I would then use cut to select each part of the date. Ideally, I want to take the output of the SelectYear, SelectMonth and SelectDay functions and put the files into the corresponding directories that I set up with the CreateAllDirectories function.
Is this possible?
Here is the end result I need: a directory for each year that appears in the files, inside each year a directory for each month, and inside each month a directory for each day, which holds all files containing that exact date, like this:
[~/filesToSort] $ ls -R
.:
2000 2001 2002 2003 2004 2005 2006 2007 2008 2010 2011 2012 2013 2014 2015 2016 2017 2018 2019
./2000:
02 03 04 09 10 11 12
./2000/02:
09
./2000/02/09:
ff_1646921307 ….
Currently this is the script I have:
#!/bin/bash
#Changes the date format from YYYY-MM-DD to YYYY/MM/DD
function ChangeSeperater {
head -n 1 ~/filesToSort/ff_* | tr '-' '/'
}
#Makes multiple directories
function CreateAllDirectories {
mkdir -p /year/month/day
}
#Cuts year from file
function SelectYear {
head -n 1 ~/filesToSort/ff_* | cut -c1-4
}
#Cuts month from file
function SelectMonth {
head -n 1 ~/filesToSort/ff_* | cut -c6-7
}
#Cuts day from file
function SelectDay {
head -n 1 ~/filesToSort/ff_* | cut -c9-10
}
EDIT: Thanks for all the help!
Here is the finished script in case anyone is interested:
#!/bin/bash
# ChangeSeperator takes a date as its parameter and converts it from YYYY-MM-DD to YYYY/MM/DD
function ChangeSeperator() {
echo "$1" | tr '-' '/'
}
# For each file: read the date from the first 10 characters, turn it into a directory path
# with ChangeSeperator, create that directory (and its parents), and move the file into it
for file in ~/filesToSort/ff_*
do
directory=$(ChangeSeperator "$(head -c 10 "$file")")
mkdir -p "$directory"
mv "$file" "$directory"
done
You don't need all those functions, just the one to translate the date from yyyy-mm-dd to the pathname yyyy/mm/dd should be enough.
for file in ~/filesToSort/ff_*
do
directory=$(ChangeSeperator $(head -c 10 "$file"))
mkdir -p "$directory"
cp "$file" "$directory"
done
The ChangeSeperator function needs to get the date from its parameter:
ChangeSeperator() {
echo "$1" | tr '-' '/'
}
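Putting the loop and the function together, a slightly more defensive variant might look like the sketch below. It is only an illustration: the function names date_to_path and sort_files_by_date are made up here, the target directories are created inside the directory being sorted (an assumption based on the ls -R listing in the question), and files whose first 10 characters do not look like a YYYY-MM-DD date are left where they are.

```shell
#!/bin/bash
# Convert a date from YYYY-MM-DD to the path form YYYY/MM/DD.
date_to_path() {
    printf '%s\n' "$1" | tr '-' '/'
}

# sort_files_by_date DIR
# Moves every DIR/ff_* file into DIR/YYYY/MM/DD, using the date found in the
# first 10 characters of the file.
sort_files_by_date() {
    for file in "$1"/ff_*; do
        [ -f "$file" ] || continue      # glob matched nothing, or not a regular file
        file_date=$(head -c 10 "$file")
        case $file_date in
            [0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]) ;;   # looks like a date
            *) continue ;;                                     # skip anything else
        esac
        target="$1/$(date_to_path "$file_date")"
        mkdir -p "$target" && mv "$file" "$target/"
    done
}
```

It would be called as sort_files_by_date ~/filesToSort. Swapping mv for cp keeps the originals in place while you test.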
First, you'll probably need a loop to go through all the files, processing them one by one.
As for the date, look at the documentation for the date command, which can do most of the conversion for you.
With GNU date, for example:
date -d 2018-07-01 +"%Y/%m/%d"
2018/07/01
By the way, you can always do something like:
d=$(date -d 2018-07-01 +"%Y/%m/%d")
echo "d=$d"
d=2018/07/01
mkdir -p "$d"
Hope that's enough of a pointer... I'm not here to do your assignment :)

Grep a time stamp in the H:MM:SS format

I'm working on a file and need to grep the lines that contain a timestamp in the H:MM:SS format. I tried the following: egrep '[0-9]\:[0-9]\:[0-9]'. It didn't work for me. What am I doing wrong in the regex?
$ date -u | grep -E '[0-9]{1,2}:[0-9]{1,2}:[0-9]{1,2}'
Fri May 2 00:59:47 UTC 2014
Try a site like http://regexpal.com/
Here is the fix:
grep '[0-9]:[0-9][0-9]:[0-9][0-9]'
If you need only the timestamp and your grep is GNU grep:
grep -o '[0-9]:[0-9][0-9]:[0-9][0-9]'
With a little more work, you can restrict the match to plausible time values (note that this expects a two-digit hour):
grep '[0-2][0-9]:[0-5][0-9]:[0-5][0-9]'
Simplest way that I know of:
grep -E '([0-9]{2}:){2}[0-9]{2}' file
If you need month and day also:
grep -E '.{3,4} .{,2} ([0-9]{2}:){2}[0-9]{2}' file
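To match the H:MM:SS format from the question (a one- or two-digit hour) while rejecting impossible values, the patterns above can be combined. The sketch below is one way to do it; the variable ts_re and the helper has_timestamp are hypothetical names, and a 24-hour clock is assumed.

```shell
#!/bin/bash
# Hour 0-23 (one or two digits), minutes and seconds 00-59. The groups at both
# ends stop the match from starting or ending in the middle of a longer number.
ts_re='(^|[^0-9])([01]?[0-9]|2[0-3]):[0-5][0-9]:[0-5][0-9]([^0-9]|$)'

# has_timestamp STRING: succeeds if STRING contains such a timestamp.
has_timestamp() {
    printf '%s\n' "$1" | grep -Eq "$ts_re"
}
```

To filter a file directly, the same pattern can be used as grep -E "$ts_re" file.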

Shell Script to get exception from logs for last one hour

I am developing a script for the Solaris platform that will grep the logs from the last hour, check for any exceptions, and send an email.
I took the following steps:
grep -n -h "`date +'%Y-%m-%d %H:%M'`" test.logs
The above command gives me a line number, and then I do the following:
tail +6183313 test.logs | grep 'exception'
Sample logs:
2014-02-17 10:15:02,625 | WARN | m://mEndpoint | oSccMod | 262 - com.sm.sp-client - 0.0.0.R2D03-SNAPSHOT | 1201 or 101 is returned as exception code from SP, but it is ignored
2014-02-17 10:15:02,625 | WARN | m://mEndpoint | oSccMod | 262 - com.sm.sp-client - 0.0.0.R2D03-SNAPSHOT | SP error ignored and mock success returned
2014-02-17 10:15:02,626 | INFO | 354466740-102951 | ServiceFulfill | 183 - org.apache.cxf | Outbound Message
Please suggest any better alternative to perform above task.
With GNU date, one can use:
grep "^$(date -d -1hour +'%Y-%m-%d %H')" test.logs | grep 'exception'| mail -s "exceptions in last hour of test.logs" ImranRazaKhan
The first step above is to select all log entries from the last hour. This is done with grep by looking for all lines beginning with the year-month-day and hour that matches one hour ago:
grep "^$(date -d -1hour +'%Y-%m-%d %H')" test.logs
The next step in the pipeline is to select from those lines the ones that have exceptions:
grep 'exception'
The last step in the pipeline is to send out the mail:
mail -s "exceptions in last hour of test.logs" ImranRazaKhan
The above sends mail to ImranRazaKhan (or whatever email address you chose) with the subject line of "exceptions in last hour of test.logs".
The convenience of having the -d option to date should not be underestimated. It might seem simple to subtract 1 from the current hour but, if the current hour is 12am, then we need to adjust both the day and the hour. If the hour was 12am on the first of the month, we would also have to change the month. And likewise for year. And, of course, February requires special consideration during leap years.
Adapting the above to Solaris:
Consider three cases:
Under Solaris 11 or better, the GNU date utility is available at /usr/gnu/bin/date. Thus, we need simply to specify a path for date:
grep "^$(/usr/gnu/bin/date -d -1hour +'%Y-%m-%d %H')" test.logs | grep 'exception'| mail -s "exceptions in last hour of test.logs" ImranRazaKhan
Under Solaris 10 or earlier, one can download & install GNU date
If GNU date is still not available, we need another way to find the date and time for one hour ago. The simplest workaround is likely to select a timezone that is one hour behind your own. If that timezone were, say, Hong Kong, then use:
grep "^$(TZ=Asia/Hong_Kong date +'%Y-%m-%d %H')" test.logs | grep 'exception'| mail -s "exceptions in last hour of test.logs" ImranRazaKhan
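The filtering part of the pipeline can also be wrapped in a function, so that it can be tried on sample lines and so that mail is only sent when something was found. This is a sketch under assumptions: the function name exceptions_in_hour and the address admin@example.com are placeholders, and the hour prefix is passed in as an argument so that it can come from whichever date variant works on your system.

```shell
#!/bin/bash
# exceptions_in_hour PREFIX
# Reads log lines on stdin and prints those that start with PREFIX
# (e.g. "2014-02-17 10") and contain the string "exception".
exceptions_in_hour() {
    grep "^$1" | grep 'exception'
}

# Example use (GNU date assumed; the address is a placeholder):
#   hits=$(exceptions_in_hour "$(date -d '1 hour ago' +'%Y-%m-%d %H')" < test.logs)
#   [ -n "$hits" ] && printf '%s\n' "$hits" |
#       mail -s "exceptions in last hour of test.logs" admin@example.com
```

Checking that the result is non-empty before calling mail avoids sending an empty message every hour.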
You can do it like this (the date format has to match the one used in the log):
dt="$(date -d '1 hour ago' '+%Y-%m-%d %H')"
awk -v dt="$dt" 'index($0, dt) == 1 && /exception/' test.logs
Scanning through millions of lines of log output sounds terribly inefficient. I would suggest changing the log4j configuration of your application (which is what it looks like you are using) to start a new log file every hour. This way, tailing the most recent file becomes a breeze.

Trim text and add timestamp?

So basically I have my output as the following:
<span id="PlayerCount">134,015 people currently online</span>
What I want is a way to trim it to show:
134,015 - 3:24:20AM - Oct 24
Can anyone help? Also note that the number may change, so is it possible to output everything between ">" and the "c" in "currently", and add a timestamp somehow?
I'm using commands from a terminal in Linux, so that's called bash, right?
Do you perhaps mean something like:
$ echo '<span id="PlayerCount">134,015 people currently online</span>' | sed \
-e 's/^[^>]*>//' \
-e "s/currently.*$/$(date '+%r %b %d %Y')/"
which generates:
134,015 people 03:36:30 PM Oct 24 2011
The echo is just for the test data. The first sed command will change everything up to the first > character into nothing (ie, delete it).
The second one will change everything from the currently to the end of the line with the current date in your desired format (although I have added the year since I'm a bit of a stickler for detail).
The relevant arguments for date here are:
%r locale's 12-hour clock time (e.g., 11:11:04 PM)
%b locale's abbreviated month name (e.g., Jan)
%d day of month (e.g., 01)
%Y year
A full list of format specifiers can be obtained from the date man page (execute man date from a shell).
A small script which will give you the desired information from the page you mentioned in the comments is:
#!/usr/bin/bash
wget --output-document=- http://runescape.com/title.ws 2>/dev/null \
| grep PlayerCount \
| head -n 1 \
| sed 's/^[^>]*>//' \
| sed "s/currently.*$/$(date '+%r %b %d %Y')/"
Running this gives me:
pax$ ./online.sh
132,682 people 04:09:17 PM Oct 24 2011
In detail:
The wget bit pulls down the web page and writes it on standard output. The standard error (progress bar) is thrown away.
The grep extracts only lines with the word PlayerCount in them.
The head throws away all but the first of those.
The first sed strips up to the first > character.
The second sed changes the trailing text to the current date and time.
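If you want the exact layout from the question (count, then time, then month and day), the same idea can be sketched as two small functions. The names extract_count and stamp_count are invented for this example, and LC_ALL=C is an assumption made so that the AM/PM marker and the month abbreviation come out in English.

```shell
#!/bin/bash
# Read HTML on stdin; print the number that precedes " people" on the
# first line that contains PlayerCount.
extract_count() {
    sed -n '/PlayerCount/s/^[^>]*>\([0-9,]*\) people.*/\1/p' | head -n 1
}

# stamp_count COUNT: print "COUNT - H:MM:SSAM - Mon DD" using the current time.
stamp_count() {
    # %I is zero-padded, so strip one leading zero to get e.g. 3:24:20AM
    now=$(LC_ALL=C date '+%I:%M:%S%p - %b %d' | sed 's/^0//')
    printf '%s - %s\n' "$1" "$now"
}
```

With the page from the script above, this could be used as stamp_count "$(wget -q -O- http://runescape.com/title.ws | extract_count)".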
Quickhack(tm):
$ people=$(echo '<span id="PlayerCount">134,015 people currently online</span>' | \
sed -e 's/^.*>\(.*\) people.*$/\1/')
$ echo $people - $(date)
134,015 - Mon Oct 24 09:36:23 CEST 2011
produce_OUTPUT | grep -o '[0-9,]\+' | while read count; do
printf "%s - %s\n" "$count" "$(date +'%l:%M:%S %p - %b %e')"
done

find latest version of rpms from a mirror

I want to write a script that finds the latest version of the RPM for a given package available from a mirror, for example: http://mirror.centos.org/centos/5/updates/x86_64/RPMS/
The script should be able to run on the majority of Linux flavors (e.g. CentOS, Red Hat, Ubuntu), so a yum-based solution is not an option. Is there any existing script that does this? Or can someone give me a general idea of how to go about this?
Thanks to levislevis85 for the wget command line. Try this:
ARCH="i386"
PKG="pidgin-devel"
URL=http://mirror.centos.org/centos/5/updates/x86_64/RPMS
DL=`wget -O- -q $URL | sed -n 's/.*rpm.>\('$PKG'.*'$ARCH'.rpm\).*/\1/p' | sort | tail -1`
wget $URL/$DL
I will put my comment here; otherwise the code would not be readable.
Try this:
ARCH="i386"
PKG="pidgin-devel"
URL=http://mirror.centos.org/centos/5/updates/x86_64/RPMS
DL=`wget -O- -q $URL | sed -n 's/.*rpm.>\('$PKG'.*'$ARCH'.rpm\).*<td align="right">\(.*\)-\(.*\)-\(.*\) \(..\):\(..\) <\/td><td.*/\4 \3 \2 \5 \6 \1/p' | sort -k1n -k2M -k3n -k4n -k5n | cut -d ' ' -f 6 | tail -1`
wget $URL/$DL
What it does:
wget - gets the index file
sed - cuts out some parts and puts them together in a different order. This should result in year, month, day, hour, minute and package, like:
2009 Oct 27 01 14 pidgin-devel-2.6.2-2.el5.i386.rpm
2009 Oct 30 10 49 pidgin-devel-2.6.3-2.el5.i386.rpm
sort - orders by the columns; n stands for numeric and M for month
cut - cuts out field 6
tail - shows only the last entry
One problem with this: if an older package release is uploaded after a newer one, the script will pick the wrong file. The script will also fail if the format of the site's index page changes. There are always many points where a script like this can fail.
Using wget and gawk:
#!/bin/bash
pkg="kernel-headers"
wget -O- -q http://mirror.centos.org/centos/5/updates/x86_64/RPMS | awk -vpkg="$pkg" 'BEGIN{
RS="\n";FS="</a>"
z=split("Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec",D,"|")
for(i=1;i<=z;i++){
date[D[i]]=sprintf("%02d",i)
}
temp=0
}
$1~pkg{
p=$1
t=$2
gsub(/.*href=\042/,"",p)
gsub(/\042>.*/,"",p)
m=split(t,timestamp," ")
n=split(timestamp[1],d,"-")
q=split(timestamp[2],hm,":")
datetime=d[3]date[d[2]]d[1]hm[1]hm[2]
if ( datetime >= temp ){
temp=datetime
filepkg = p
}
}
END{
print "Latest package: "filepkg", date: ",temp
}'
An example run of the above:
linux$ ./findlatest.sh
Latest package: kernel-headers-2.6.18-164.6.1.el5.x86_64.rpm, date: 200911041457
Try this (which requires lynx):
lynx -dump -listonly -nonumbers http://mirror.centos.org/centos/5/updates/x86_64/RPMS/ |
grep -E '^.*xen-libs.*i386.rpm$' |
sort --version-sort |
tail -n 1
If your sort doesn't have --version-sort, then you'll have to parse the version out of the filename or hope that a regular sort will do the right thing.
You may be able to do something similar with wget or curl or even a Bash script using redirections with /dev/tcp/HOST/PORT. The problem with these is that you would then have to parse HTML.
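As a rough illustration of the wget/curl route, the sketch below pulls the href values out of an index page on standard input and picks the highest version with sort -V. The function name latest_rpm and the sample usage are assumptions; it relies on a sort that supports -V (the short form of --version-sort) and on the index listing each file as a plain href="name.rpm" link.

```shell
#!/bin/bash
# latest_rpm PKG ARCH
# Reads an HTML directory index on stdin and prints the file name of the
# highest version of PKG for ARCH. PKG is used inside a regular expression,
# so names containing characters such as "+" would need escaping.
latest_rpm() {
    grep -o 'href="[^"]*\.rpm"' |
    sed 's/^href="//; s/"$//' |
    grep -E "^$1-[0-9].*\.$2\.rpm\$" |
    sort -V |
    tail -n 1
}

# Example (needs network access):
#   wget -q -O- "$URL/" | latest_rpm pidgin-devel i386
```

Requiring a digit right after the package name keeps pidgin from also matching pidgin-devel. Note that sort -V only approximates rpm's own version comparison, so unusual version strings may still need a manual check.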
