Extract picasaweb album names - bash

I'd just like to get the album names. Here's an example page:
http://picasaweb.google.com/sunnchoi
But when I wget it and grep for a title pattern, I only get 100 results. I understand that I have to emulate clicking the 'Show More Albums' link. How do I do that (using bash utilities/Perl)?

Try the Picasa Web Albums API.
There are examples in Python, Java and other languages. Here's how to request a list of albums (that example uses Python).
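If you'd rather stay in the shell, the album list can also be requested straight from the data feed. A minimal sketch (the feed path follows the Picasa Web Albums data API as I remember it, so treat it as an assumption and adjust if the API has changed):
# request the public album feed for user sunnchoi; each <entry>'s <title> is an album name
curl -s 'http://picasaweb.google.com/data/feed/api/user/sunnchoi?kind=album&access=public'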

If you have xmlstarlet available, you can directly parse the corresponding RSS URL of the given website:
xmlstarlet sel --net -T -t -m '//item' -v 'title' -n \
'http://picasaweb.google.com/data/feed/base/user/sunnchoi?alt=rss&kind=album&hl=en_US&access=public' |
nl

Related

Get title of an RSS feed with bash

How can I get the title of an RSS feed with Bash? Say I want to get the most recent article from MacRumors. Their RSS feed link is http://feeds.macrumors.com/MacRumors-All. How can I get the most recent article title with Bash?
An alternative to xmllint is xmlstarlet:
curl -s http://feeds.macrumors.com/MacRumors-All | xmlstarlet sel -t -m "/rss/channel/item[1]" -v "title"
Use the xmlstarlet sel command to select the XPath we are looking for, and then use -v to display a specific element.
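If you also want the article's link, the same template can emit several values. A small sketch along those lines (same feed as above; -o just inserts a literal separator):
curl -s http://feeds.macrumors.com/MacRumors-All | xmlstarlet sel -t -m "/rss/channel/item[1]" -v "title" -o " - " -v "link" -n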
You can combine curl and an XPath expression (here, using xmllint), and rely on the fact that the feed is in reverse chronological order:
curl http://feeds.macrumors.com/MacRumors-All | xmllint --xpath '/rss/channel/item[1]/title/text()'
See How to execute XPath one-liners from shell? for other ways to evaluate XPath.
In particular, if you have an older xmllint without --xpath, you may be able to use the technique suggested by this wrapper:
echo 'cat /rss/channel/item[1]/title/text()' | xmllint --shell <(curl http://feeds.macrumors.com/MacRumors-All)

Bash wget filter specific word

I want to filter a specific word from a website using wget. The word I want to filter out is hPa, together with its value.
See: https://www.foreca.de/Deutschland/Berlin/Berlin
I can't find useful information on how to filter out a specific string.
This is what I've tried so far:
#!/bin/bash
LAST=$(wget -l1 https://www.foreca.de/Deutschland/Berlin/Berlin -O - | sed -e 'hPa')
echo $LAST
Thanks for helping me out.
A fully fledged solution using XPath:
Command:
$ saxon-lint --html --xpath '//div[contains(text(), "hPa")]/text()' \
'https://www.foreca.de/Deutschland/Berlin/Berlin'
Output:
1026 hPa
Notes:
Don't parse HTML with regex; use a proper XML/HTML parser like we do here. Check: Using regular expressions with HTML tags
Check https://github.com/sputnick-dev/saxon-lint (my own project)
If what I wrote bores you and you just want a quick and dirty command, even if it's evil, then use: curl -s https://www.foreca.de/Deutschland/Berlin/Berlin | grep -oP '\d+\s+hPa'

How to get the highest numbered link from curl result?

I have created a small program consisting of a couple of shell scripts that work together. It is almost finished and everything seems to work fine, except for one thing that I'm not really sure how to do, and which I need in order to finish this project. There seem to be many routes that can be taken, but I just can't get there.
I have some curl results with lots of unused data, including different links, and among all that data there is a bunch of similar links.
I only need to get (into a variable) the link with the highest number (without the always-same link text).
The links are all similar and have this structure:
<a href="https://always/same/link/same-name_17.html">always same text</a>
<a href="https://always/same/link/same-name_18.html">always same text</a>
<a href="https://always/same/link/same-name_19.html">always same text</a>
I was thinking about something like:
content="$(curl -s "$url/$param")"
linksArray=  # (pseudocode) get from $content all links whose anchor text is "always same text"
declare highestnumber
for file in $linksArray
do
    href=${file##*/}
    fullname=${href%.html}
    OIFS="$IFS"
    IFS='_'
    read -a nameparts <<< "${fullname}"
    IFS="$OIFS"
    if (( nameparts[1] > highestnumber ))
    then
        highestnumber=${nameparts[1]}
    fi
done
echo ${nameparts[0]}_${highestnumber}.html
Result:
https://always/same/link/unique-name_19.html
This was just my guess; any working code that can be run from a bash script is OK.
Thanks...
Update
I found this nice program; it is easily installed by:
# 64bit version
wget -O xidel/xidel_0.9-1_amd64.deb https://sourceforge.net/projects/videlibri/files/Xidel/Xidel%200.9/xidel_0.9-1_amd64.deb/download
apt-get -y install libopenssl
apt-get -y install libssl-dev
apt-get -y install libcrypto++9
dpkg -i xidel/xidel_0.9-1_amd64.deb
It looks awesome, but I'm not really sure how to tweak it to my needs.
Based on that link and the answer below, I guess a possible solution would be:
use xidel, or use sed -n 's/.*href="\([^"]*\)".*/\1/p' file as suggested in that link, but then tweak it to keep the links together with their HTML tags, like:
<a href="https://always/same/link/same-name_17.html">always same text</a>
then filter out everything that doesn't end with ">always same text</a>",
and then use the grep/sort approach mentioned below.
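A sketch of that pipeline, for what it's worth (it assumes the curl output is saved in links.txt, that each anchor sits on its own line, and that the only underscore in a link is the one before the number; the file name and anchor text are taken from the examples in this question):
# keep only anchors whose text is "always same text", pull out the href,
# then sort numerically on the part after the underscore and take the last one
grep -o '<a href="[^"]*"[^>]*>always same text</a>' links.txt \
  | sed 's/.*href="\([^"]*\)".*/\1/' \
  | sort -t_ -k2 -n \
  | tail -n1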
Continuing from the comment, you can use grep, sort and tail to isolate the highest number in your list of similar links without too much trouble. For example, if your list of links is as you have described (I've saved them in a file dat/links.txt for the purpose of the example), you can easily isolate the highest-numbered link in a variable:
Example List
$ cat dat/links.txt
<a href="https://always/same/link/same-name_17.html">always same text</a>
<a href="https://always/same/link/same-name_18.html">always same text</a>
<a href="https://always/same/link/same-name_19.html">always same text</a>
Parsing the Highest Numbered Link
$ myvar=$(grep -o 'https:.*[.]html' dat/links.txt | sort | tail -n1); \
echo "myvar : '$myvar'"
myvar : 'https://always/same/link/same-name_19.html'
(note: the command above is all one line, split by the line-continuation '\')
Applying Directly to Results of curl
Whether your list is in a file, or returned by curl -s, you can apply the same approach to isolate the highest number link in the returned list. You can use process substitution with the curl command alone, or you can pipe the results to grep. E.g. as noted in my original comment,
$ myvar=$(grep -o 'https:.*[.]html' < <(curl -s "$url/$param") | sort | tail -n1); \
echo "myvar : '$myvar'"
or pipe the result of curl to grep,
$ myvar=$(curl -s "$url/$param" | grep -o 'https:.*[.]html' | sort | tail -n1); \
echo "myvar : '$myvar'"
(same line continuation note.)
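One caveat: a plain sort compares the URLs as text, which works while all the numbers have the same width (17, 18, 19), but it would place same-name_9.html after same-name_19.html. If your sort supports version sort (GNU sort -V), that handles mixed widths:
$ myvar=$(grep -o 'https:.*[.]html' dat/links.txt | sort -V | tail -n1); \
echo "myvar : '$myvar'"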
Why not use Xidel with XQuery to sort the links and return the last one?
xidel -q links.txt --xquery '(for $i in //@href order by $i return $i)[last()]' --input-format xml
The --input-format parameter makes sure you don't need any html tags at the start and end of your txt file.
If I'm not mistaken, in the latest Xidel the -q (quiet) param is replaced by -s (silent).
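With a recent Xidel the same call might look like this (a sketch; -s and -e are the current names of the silent and extract flags as far as I know, so verify against your version):
xidel -s links.txt --input-format xml -e '(for $i in //@href order by $i return $i)[last()]'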

reverse geocoding in bash

I have a GPS unit which extracts longitude and latitude and outputs them as a Google Maps link:
http://maps.googleapis.com/maps/api/geocode/xml?latlng=51.601154,-0.404765&sensor=false
From this I'd like to call it via curl and display the "short_name" on line 20:
"short_name" : "Northwood",
So I'd just like to be left with
Northwood
so, something like:
curl -s http://maps.googleapis.com/maps/api/geocode/xml?latlng=51.601154,-0.404765&sensor=false sed short_name
Mmmm, this is kind of quick and dirty:
curl -s "http://maps.googleapis.com/maps/api/geocode/json?latlng=40.714224,-73.961452&sensor=false" | grep -B 1 "route" | awk -F'"' '/short_name/ {print $4}'
Bedford Avenue
It looks for the line before the line with "route" in it, then the word "short_name" and then prints the 4th field as detected by using " as the field separator. Really you should use a JSON parser though!
Notes:
This doesn't require you to install anything.
I look for the word "route" in the JSON because you seem to want the road name - you could equally look for anything else you choose.
This isn't a very robust solution as Google may not always give you a route, but I guess other programs/solutions won't work then either!
You can play with my solution by successively removing parts from the right hand end of the pipeline to see what each phase produces.
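If a JSON parser is an option after all, a minimal jq sketch for the same lookup might look like this (jq is an extra install, and the field names follow the Geocoding API's usual JSON layout, so double-check them against the actual response):
curl -s "http://maps.googleapis.com/maps/api/geocode/json?latlng=40.714224,-73.961452&sensor=false" | jq -r '.results[0].address_components[] | select(.types[] == "route") | .short_name'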
EDITED
Mmm, you have changed from JSON to XML, I see... well, this parses out what you want, but I note you are now looking for a locality whereas before you were looking for a route or road name? Which do you want?
curl -s "http://maps.googleapis.com/maps/api/geocode/xml?latlng=51.601154,-0.404765&sensor=false" | grep -B1 locality | grep short_name| head -1|sed -e 's/<\/.*//' -e 's/.*>//'
The "grep -B1" looks for the line before the line containing "locality". The "grep short_name" then gets the locality's short name. The "head -1" discards all but the first locality if there are more than one. The "sed" stuff removes the <> XML delimiters.
This isn't text, it's structured JSON. You don't want the value after the colon on line 12, you want the value of short name in the address_component with type 'route' from the result.
You could do this with jsawk or python, but it's easier to get it from XML output with xmlstarlet, which is lighter than python and more available than jsawk. Install xmlstarlet and try:
curl -s 'http://maps.googleapis.com/maps/api/geocode/xml?latlng=40.714224,-73.961452&sensor=false' \
| xmlstarlet sel -t -v '/GeocodeResponse/result/address_component[type="route"]/short_name'
This is much more robust than trying to parse JSON as plaintext.
The following seems to work, assuming you always want the short_name at line 12:
curl -s 'http://maps.googleapis.com/maps/api/geocode/json?latlng=40.714224,-73.961452&sensor=false' | sed -n -e '12s/^.*: "\([a-zA-Z ]*\)",/\1/p'
or, if you are using the XML API and want to grab the short_name on line 20:
curl -s 'http://maps.googleapis.com/maps/api/geocode/xml?latlng=51.601154,-0.404765&sensor=false' | sed -n -e '19s/<short_name>\([a-zA-Z ]*\)<\/short_name>/\1/p'

Retrieve latest twitter tweet with a particular hashtag via CLI

I'm looking for a way to retrieve the latest tweet that contains a particular twitter hashtag via the cli (bash). Something that would run like: "./get-tweet.sh blah" and return "Dude I'm feeling so #blah" Thanks!
Looks like I can get the rss feed by doing this:
curl -s 'http://search.twitter.com/search.rss?q=%23blah&rpp=1'
I would then just need to cut out the correct xml
I'd look at TTYtter for doing that. In particular it allows for scripting like so:
ttytter -runcommand="/search #haiku"
You'll need to do the initial setup interactively though, for OAuth.
OK, I hacked together a solution:
#!/bin/bash
curl -s "http://search.twitter.com/search.rss?q=%23$1&rpp=1" > /tmp/hashtag.xml
xmllint --xpath '/rss/channel/item/title/text()' /tmp/hashtag.xml | sed 's|http://t\.co/[^ ]*||g' | sed 's/#//g' | xargs -i -0 echo -n "{}"
echo
rm /tmp/hashtag.xml
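Saved as get-tweet.sh (the name used in the question) and made executable, it is invoked as described above:
chmod +x get-tweet.sh
./get-tweet.sh blah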
