Using wget to download file to directory and email link/attachment to address - shell

I am using the following cron task:
wget -O ~/files/csv_backups/products_csv/products-`date +%Y-%m-%d-%H%M%S`.csv http://[site_URL]/product-report/csv
to grab a daily CSV of product sales. It currently places the file in a files directory on the server and names it with a date stamp.
What I would like is for it to also send an email to a specific address, with either the file attached or a link to the file in the body. I can't seem to find a way to do this. Any thoughts?
UPDATE:
I've tried the suggested code, but it just output the trailing path to the file, not a proper download link. So I tried the following:
echo "Download File" | mail -s "Daily Products CSV Report" username@mail.com
But the echo is delivered as plain text. Closer...
UPDATE:
1) OK... my first problem is that the timestamp I am trying to add to the filename doesn't work and the download fails, so I removed the timestamp for now (even though I would like to keep it so the files serve as backups; see the note after this update). The following is the new line I am using to download the file. It should always download a fresh copy regardless of server timestamps, because the file is not actually generated until you visit that URL, so there is no existing file to compare times with before wget tries to download it.
wget --timestamping -r -O files/csv_backups/products_csv/products.csv http://www.example.com/reports/product-report/csv
2) For the email, I removed the HTML and just left the following, because for some reason the echoed HTML was delivered as plain text, which doesn't work. Whatever causes that may also be why I couldn't append $(ls -1t * | head -1) to the end of the URL to grab the most recent file...
echo "http://www.example.com/files/csv_backups/products_csv/products.csv" | mail...

If you don't have mutt installed then you can do something like:
echo "$(ls -1t ~/files/csv_backups/products_csv/* | head -1)" |
mail -s "subject line" email#address.com
This will put the path to the file in the message body.
The whole echo $(ls -1t ~/files/csv_backups/products_csv/* | head -1) is there because you will not know the exact file name, since you are adding a timestamp rather than just a datestamp: the %H%M%S adds hours, minutes and seconds to the filename as well.
The above command assumes the most recently modified file in that directory is the one that was just downloaded.
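Putting the pieces together, a rough, untested sketch of a single cron-driven script (the recipient address is the one from the question, and it assumes mail is set up on the server):
#!/bin/bash
# Hypothetical sketch: download a date-stamped copy, then email a link to it.
DIR=~/files/csv_backups/products_csv
FILE="products-$(date +%Y-%m-%d-%H%M%S).csv"
wget -q -O "$DIR/$FILE" http://www.example.com/reports/product-report/csv
echo "http://www.example.com/files/csv_backups/products_csv/$FILE" |
mail -s "Daily Products CSV Report" username@mail.com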
Alternatively, you can use uuencode, which is part of GNU sharutils.
uuencode $(ls -1t ~/files/csv_backups/products_csv/* | head -1) $(ls -1t ~/files/csv_backups/products_csv/* | head -1) |
mail -s "subject line" email#address.com

Heiner Steven wrote a good article about [writing scripts for sending files via email](http://www.shelldorado.com/articles/mailattachments.html). He also wrote sendfile, which uses the metamail package (available at the link above).
The other approach is to use perl ;-) (man perlfaq9 covers e-mail) or to dust off uuencode.
uuencode Sales.csv Sales.csv | mail -s "Kaching" listofmanagers@widgetsinc.com
NB: I think all versions of uuencode require that you give the file name twice: once for the file to encode, and once for the name the file will have when it is extracted from the message. uuencode may also appear as b64encode on some BSD systems. You can Base64-encode with uuencode using the -m switch.
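For example, the same pipeline with Base64 encoding (untested sketch; the file and address are the ones used above):
uuencode -m Sales.csv Sales.csv | mail -s "Kaching" listofmanagers@widgetsinc.com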

Related

how to copy all the URLs of a certain column of a web page?

I want to import a number of files into my server using wget; the 492 files are here:
https://trace.ncbi.nlm.nih.gov/Traces/sra/?study=ERP001736
so I want to copy the URLs of all the files in the "File Name" column, save them into a file, and import them with wget.
So how can I copy all those URLs from that column?
Thanks for reading :)
Since you've tagged bash, this should work.
wget -O- is used to output the data to the standard output, where it's greppable. (curl would do that by default.)
grep -oE is used to capture the URLs (which happily are in a regular enough format that a simple regexp works).
Then, wget -i is used to read URLs from the file generated. You might wish to add -nc or other suitable partial-fetch flags; those files are pretty hefty.
wget -O- https://trace.ncbi.nlm.nih.gov/Traces/sra/?study=ERP001736 | grep -oE 'http://ftp.sra.ebi.ac.uk/[^"]+' > urls.txt
wget -i urls.txt
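For example (optional variations, not required):
wget -nc -i urls.txt   # skip files that were already downloaded completely
wget -c -i urls.txt    # resume partially downloaded files instead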
First, I recommend using a more specific and robust implementation...
but, in case you are against a wall and in a hurry:
$: curl -s https://trace.ncbi.nlm.nih.gov/Traces/sra/?study=ERP001736 |
sed -En '/href="http:\/\/.*clean.fastq.gz"/{s/^.*href="([^"]+)".*/\1/;p;}' |
while read url; do wget "$url"; done
This is a quick and dirty rough first pass, but it will give you something to work with.
If you aren't in a screaming hurry, try writing something more robust and step-wise in perl or python.
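Staying in bash (since that is what the question is tagged with), a slightly more defensive, untested sketch of the same idea:
#!/bin/bash
# Sketch only: fetch the study page, extract the FTP URLs, and download each,
# stopping on the first failure instead of silently continuing.
set -u
page_url='https://trace.ncbi.nlm.nih.gov/Traces/sra/?study=ERP001736'
list_file='urls.txt'
curl -fsS "$page_url" | grep -oE 'http://ftp\.sra\.ebi\.ac\.uk/[^"]+' > "$list_file"
[ -s "$list_file" ] || { echo "no URLs found" >&2; exit 1; }
while IFS= read -r url; do
    wget -c "$url" || { echo "failed: $url" >&2; exit 1; }
done < "$list_file"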

Pull information from wget data

I want to pull and display the twitter username of various accounts that I specify via the ID. I figured I could do this, in part, with wget.
echo what id would you like to search
read ID
wget https://twitter.com/intent/user?user_id=$ID > ~/temp/$ID
This is really as far as I got, as I can't figure out how to pull the data from it. I have tried this:
read ID
source ~/temp/$ID
echo $value
To echo anything that was labeled as "value" (the username is labeled as "value" several times).
Examples:
Stack Overflow's Twitter account is @stackoverflow, and their Twitter ID is 128700677, so I can run
wget https://twitter.com/intent/user?user_id=128700677
and the result will be a nice 248-line HTML document; you can try it and see. So basically, is there a way to have the script either go through and find the most common value="", or just display <title>Stack Overflow (@StackOverflow) on Twitter</title> without the <title></title> tags and the " on Twitter" part?
PS: Would this count as bootstrapping?
EDIT-----------------------------
This needs to be able to work with bash because I already have a system set up in bash. This will just help confirm #s
As that-other-guy said, it would be better to use the Twitter API to find that out. However, you can try to push your method a little bit further, like
wget -O - "https://twitter.com/intent/user?user_id=${ID}" | grep -Po "(?<=screen_name=).*(?=')" | head -n 1
to filter out strings like href='/intent/user?screen_name=StackOverflow' and extract what comes after the screen_name= part of the first match.
P.S. I didn't notice a lot of value= in the HTML, to be honest, and sourcing something like HTML in your script is not the best thing to do, as you may end up executing something destructive that way.
screen_name could be fetched with:
read -r ID ;\
screen_name=$(wget -q -O - http://twitter.com/intent/user?user_id="$ID" | sed -n 's/^.*button follow".*screen_name=\([^"]*\)".*$/\1/p')
printf "%s\n" "$screen_name"
nickname could be fetched with:
read -r ID ;\
nickname=$(wget -q -O - https://twitter.com/intent/user?user_id="$ID" | sed -n 's/^.*"nickname">\([^<]*\)<.*$/\1/p')
printf "%s\n" "$nickname"
title could be fetched with:
read -r ID ;\
title=$(wget -q -O - https://twitter.com/intent/user?user_id="$ID" | sed -n 's/^.*<title>\(.*\) on Twitter<.title>.*$/\1/p')
printf "%s\n" "$title"
Using the REST API sounds like a better idea.
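If that route is an option, a minimal sketch (the v2 endpoint, the bearer token, and the availability of jq are all assumptions on my part, and the API has changed over the years):
read -r ID
curl -s -H "Authorization: Bearer $TWITTER_BEARER_TOKEN" \
    "https://api.twitter.com/2/users/$ID" | jq -r '.data.username'   # assumed response shape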

Need the name for a URL that contains lots of garbage except the name. (Advanced BASH)

http://romhustler.net/file/54654/RFloRzkzYjBxeUpmSXhmczJndVZvVXViV3d2bjExMUcwRmdhQzltaU5UUTJOVFE2TVRrM0xqZzNMakV4TXk0eU16WTZNVE01TXpnME1UZ3pPRHBtYVc1aGJGOWtiM2R1Ykc5aFpGOXNhVzVy <-- Url that needs to be identified
http://romhustler.net/rom/ps2/final-fantasy-x-usa <-- Parent url
If you copy and paste this URL, you will see the browser identify the file's name. How can I get a bash script to do the same?
I need to wget the first URL, but because there will be 100 more items I can't copy and paste each URL.
I currently have the menu set up for all the files. I just don't know how to mass-download each file individually, as the URLs for the files have no matching patterns.
Bits of my working menu:
#Raw gamelist grabber
w3m http://romhustler.net/roms/ps2 |cat|egrep "/5" > rawmenu.txt
#splits initial file into a files(games01) that contain 10 lines.
#-d puts lists files with 01
split -l 10 -d rawmenu.txt games
#s/ /_/g - replaces spaces with underscore
#s/__.*//g - removes anything after two underscores
select opt in\
$(cat games0$num|sed -e 's/ /_/g' -e 's/__.*//g')\
"Next"\
"Quit" ;
if [[ "$opt" =~ "${lines[0]}" ]];
then
### Here the URL needs to be grabbed ###
This has to be done in bash. Is this possible?
It appears that romhustler.net uses some JavaScript on their full download pages to hide the final download link for a few seconds after the page loads, possibly to prevent this kind of web scraping.
However, if they were using direct links to ZIP files for example, we could do this:
# Use curl to get the HTML of the page and egrep to match the hyperlinks to each ROM
curl -s http://romhustler.net/roms/ps2 | egrep -o "rom/ps2/[a-zA-Z0-9_-]+" > rawmenu.txt
# Loop through each of those links and extract the full download link
while read LINK
do
# Extract full download link
FULLDOWNLOAD=`curl -s "http://romhustler.net$LINK" | egrep -o "/download/[0-9]+/[a-zA-Z0-9]+"`
# Download the file
wget "http://romhustler.net$FULLDOWNLOAD"
done < "rawmenu.txt"

Grab just the first filename from a zip file stream?

I want to extract just the first filename from a remote zip archive without downloading the entire zip. In particular, I'm trying to get the build number of dartium (link to zip file). Since the file is quite large, I don't want to download the entire thing.
If I download the entire thing, unzip -l reports the first file as being: 0 2013-04-07 12:18 dartium-lucid64-inc-21033.0/. I want to get just this filename so I can parse out the 21033 portion as the build number.
I was doing this (total hack):
_url="https://storage.googleapis.com/dartium-archive/continuous/dartium-lucid64.zip"
curl -s $_url | head -c 256 | sed -n "s:.*dartium-lucid64-inc-\([0-9]\+\).*:\1:p"
It was working when I had my shell in ASCII mode, but I recently switched it to UTF-8 and it seems sed is now honoring that, which breaks my script.
I thought about hacking it by doing:
export LANG=
curl -s ...
But that seemed like an even bigger hack.
Is there a better way?
First, you can request a byte range with curl.
Next, use strings to extract the printable strings from the binary stream.
Add "q" after "p" to quit after the first occurrence is found.
curl -s $_url -r0-256 | strings | sed -n "s:.*dartium-lucid64-inc-\([0-9]\+\).*:\1:p;q"
Or this:
curl -s $_url -r0-256 | strings | sed -n "/dartium-lucid64/{s:.*-\([^-]\+\)\/.*:\1:p;q}"
This should be a bit faster and more reliable. It also extracts the full version, including the sub-version (if you need it).
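If you would rather not rely on strings at all, the first filename can also be read straight out of the ZIP local file header (untested sketch; it assumes the archive starts with an ordinary local file header at byte 0, where the filename length is the little-endian 16-bit value at offset 26 and the name itself starts at offset 30):
_url="https://storage.googleapis.com/dartium-archive/continuous/dartium-lucid64.zip"
curl -s -r0-511 "$_url" -o header.bin
# filename length: two bytes at offsets 26 and 27, little-endian
namelen=$(( $(od -An -tu1 -j26 -N1 header.bin) + 256 * $(od -An -tu1 -j27 -N1 header.bin) ))
# the filename itself starts at offset 30
dd if=header.bin bs=1 skip=30 count="$namelen" 2>/dev/null; echo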

Uuencode not attaching file to email and grep not listing file names

I've been trying to figure this out, but no matter what I try it doesn't seem to work the way I want. Basically, the things that are missing are that grep is not listing the file names when it finds a match (which is what the -H flag is supposed to do, I think?), and uuencode doesn't seem to want to attach the file to the email. I've tried both uuencode and cat and I'm getting nowhere.
Does anybody have any idea what might be the problem here?
for i in `ls SystemOut_*[0-9].log`; do
grep -inEH '^\[.*(error|exception)' $i >> scannedErrors.txt;
mv "$i" "${i%.log}"_scanned.log;
done
if [[ -s scannedErrors.txt ]]; then
uuencode scannedErrors.txt | mailx -s "Scanned Logfile Errors" someone@somewhere.com < Message.txt;
fi
/bin/rm scannedErrors.txt;
uuencode scannedErrors.txt is expecting input on stdin - when called with only one argument, it treats that argument as the name to give the file inside the output, not the name of the file to read. So either do cat scannedErrors.txt | uuencode scannedErrors.txt or uuencode scannedErrors.txt scannedErrors.txt.
I don't necessarily see anything wrong with the grep bit, but it might be useful to add a cat scannedErrors.txt to make sure you're finding what you want. You could also do grep ... SystemOut_*[0-9].log rather than the for loop, which would make -H the default behaviour, but you would probably still need the loop to rename the files, unless you happen to have the rename script available (a Perl script that allows creative renaming via regular expressions).
try :
{ cat scannedErrors.txt | uuencode scannedErrors.txt; cat Message.txt; } | mailx -s "Scanned Logfile Errors" someone@somewhere.com
because there is a conflict between | and <: mailx can only take its message body from one of them.
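Put together, an untested sketch of the whole script with both fixes applied (filenames and address as in the question):
#!/bin/bash
# Sketch: grep prints filenames via -H, the scanned logs get renamed, and the
# uuencoded attachment plus the message body are concatenated into mailx's stdin.
for i in SystemOut_*[0-9].log; do
    grep -inEH '^\[.*(error|exception)' "$i" >> scannedErrors.txt
    mv "$i" "${i%.log}_scanned.log"
done
if [[ -s scannedErrors.txt ]]; then
    { uuencode scannedErrors.txt scannedErrors.txt; cat Message.txt; } |
        mailx -s "Scanned Logfile Errors" someone@somewhere.com
fi
rm -f scannedErrors.txt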
