Issue with wget trying to get images from certain websites - bash

I am trying to download all the images from http://www.samsung.com/sg/consumer/mobile-devices/smartphones/ using the command below:
wget -e robots=off -nd -nc -np --recursive -r -p --level=5 --accept jpg,jpeg,png,gif --convert-links -N --limit-rate=200k --wait 1.0 -U 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:14.0) Gecko/20100101 Firefox/14.0.1' -P testing_folder www.samsung.com/sg/consumer/mobile-devices/smartphones
I would expect the images of the phones to be downloaded to my testing_folder, but all I see are some global images like the logo. I don't seem to be able to get the phone images downloaded. The command above does seem to work on some other websites, though.
I have gone through all the wget questions on this forum, but this particular issue doesn't seem to have an answer. Can someone help? I am sure there is an easy way out. What am I doing wrong?
UPDATE:
It looks like the product listings are generated by JavaScript, and since wget apparently can't handle JavaScript-generated pages, this seems like the end of the road. If anyone can still help, I will be delighted.

Steps:
1. Configure a proxy server, for example Apache httpd with mod_proxy and mod_proxy_http.
2. Visit the page with a web browser that supports JavaScript and is configured to use your proxy server.
3. Harvest the URLs from the proxy server's log file and put them in a file.
Or:
1. Start Firefox and open the web page.
2. F10 - Tools - Page Info - Media - right click - Select All - right click - Copy.
3. Paste into a file with your favourite editor.
Then:
4. Optionally (if you don't want to find out how to get wget to read a list of URLs from a file), add minimal HTML tags (html, body and img) to the file.
5. Use wget to download the images, specifying the file created in step 3 or 4 as the starting point (see the sketch below).
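A minimal sketch of step 5, assuming the harvested URLs are absolute; urls.txt, gallery.html and the images folder are placeholder names:
# step 5 with a plain URL list, one URL per line
wget -nd -P images -i urls.txt
# or, if you wrapped the URLs in minimal HTML (step 4), have wget parse the
# file as HTML and download the img links it finds
wget -nd -P images --force-html -i gallery.html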

Related

URL-forwarding to download a file: wget only downloads the index.html

From time to time I have to download a specific file from a website with wget. The URL is very long, so I created a free .tk domain that forwards to the file. If I use my new .tk URL in my browser, it downloads the file as I want, but on my Ubuntu VPS, wget only downloads an index.html file. I have two forwarding options on Dot.TK:
Frame (Cloaking)
Redirect (HTTP 301 Forwarding)
Which option should I use and is there a way to get the file instead of the index.html?
If you use the 301 redirect, wget should be able to download the file, since it follows HTTP redirects by default. You can also use curl -LO <URL> with the 301; the -L flag makes curl follow redirects.
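A quick sketch, assuming the .tk domain is configured with Redirect (HTTP 301 Forwarding); example.tk and file.ext are placeholders:
# wget follows the 301 automatically; --trust-server-names saves the file
# under the name from the redirect target instead of the short URL
wget --trust-server-names http://example.tk
# curl needs -L to follow redirects; -o names the output file explicitly
curl -L -o file.ext http://example.tk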

How to download a file using command line

I need to download the following file using the command line to my remote computer:
download link
The point is that if I use wget or curl, I just get an HTML document, but if I enter this address in my browser (on my laptop), it simply starts downloading.
Now, my question is that since the only way to access my remote machine is through command line, how I can download it directly on that machine using the command line?
Thanks
Assuming that you are using a Linux terminal, you can use a command-line browser like Lynx to follow links and download files.
The link you provided isn't a normal file link: it sends the filename as a GET variable, and the server responds with another page containing a form, so wget and cURL will not work.
That website is likely tracking the session and checking that you've submitted the form data and confirmed you're not a robot.
Try a different approach: copy the file from your local machine to the remote one via scp:
scp /localpath/to/file username@remotehost.com:/path/to/destination
Alternatively, you may export cookies from your local machine, copy them to the remote machine, and pass them to wget with the '--load-cookies file' option, but I can't guarantee it will work if the site also ties the session ID to the IP address.
Here's a Firefox extension for exporting cookies:
https://addons.mozilla.org/en-US/firefox/addon/export-cookies/
Once you have the cookies.txt file, just scp it to the remote machine and run wget with the '--load-cookies file' option.
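A rough sketch of that cookie workflow; the host name and download URL are placeholders, since the original link isn't shown here:
# copy the exported browser cookies to the remote host
scp cookies.txt user@remotehost.com:~/
# on the remote machine, reuse the exported browser session for the download
wget --load-cookies ~/cookies.txt 'http://example.com/download?file=archive.zip'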
One of the authors of the corpus here.
As pointed out by a friend of mine, this tool solves all the problems.
https://addons.mozilla.org/en-GB/firefox/addon/cliget/
After installation you just click the download link and copy the generated command to the remote machine. Just tried it, works perfectly. We should put that info on the download page.

cURL makes invalid bencoding when downloading torrents from torcache

The title says it all. I realize that a similar question has been asked at https://askubuntu.com/questions/307566/wget-and-curl-somehow-modifying-bencode-file-when-downloading/310507#310507 but I don't think the same solution works, because I have tried to unzip the file using 7-Zip and also Gzip for Windows (http://gnuwin32.sourceforge.net/packages/gzip.htm). Both claim the file is in the wrong format. Renaming its extension to .gz or .zip doesn't help either. The --compressed option is no help as well. So my guess is that something has changed on the torcache site. I've tried setting the user agent as well, to no avail.
In a related issue, I guess, when I try downloading from the https site, I receive "curl: (52) Empty reply from server". Only http works, and that gives me invalid bencoding. When I enter the URL in my browser, the torrent file downloads all by itself.
The command I'm entering is as follows:
curl -O http://torcache.net/torrent/006DDC8C407ACCDAF810BCFF41E77299A373296A.torrent
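For anyone debugging the same symptom, a small sketch that only inspects what the server actually returns (not a fix):
# -I sends a HEAD request and prints the response headers, so you can check
# Content-Type and Content-Encoding before worrying about the payload
curl -sSI http://torcache.net/torrent/006DDC8C407ACCDAF810BCFF41E77299A373296A.torrent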

Does anyone know how to download a project from nitrous.io?

I made a Ruby web application on nitrous.io. The tool is very nice and it helped a lot, but now I want to download the project to my computer and I didn't find any option to do that...
You can download and upload projects by any of the following options:
Utilize Nitrous Desktop to Sync your files locally.
Upload your project to GitHub, and pull the project from there. Here is a guide on adding an SSH key to GitHub if needed.
Upload the content via SCP. To do this, you will need to add an SSH Key to your account.
Next, run this command on your local machine, replacing {PORT} with the port number assigned to your Nitrous.IO box, and replacing usw1 with the proper region found in the SSH URI on your boxes page.
To Upload:
scp -P{PORT} -r path/to/yourFolder action@usw1-2.nitrousbox.com:~/workspace
To Download:
scp -P{PORT} -r action@usw1-2.nitrousbox.com:~/workspace path/to/yourLocalFolder
I do not know the service, but apparently they offer ssh access. Then you can use scp to copy the files to your machine. Anyway, probably you should ask their support...
...post a summary of their answer here and close the question :)
The easiest way is to store your project in a Git repository and then push this repository to an external host. You will then be able to clone your project from the external repository to any machine you want, as sketched below.
Personally, I use Bitbucket, as it is free and very easy to set up. Have a look at the tutorials there.
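A minimal sketch of that workflow, run from the project directory inside the box; the Bitbucket user and repository names are placeholders:
git init
git add .
git commit -m "Initial commit"
# youruser/myproject is a placeholder -- create the empty repo on Bitbucket first
git remote add origin git@bitbucket.org:youruser/myproject.git
git push -u origin master
# then, on your local machine:
git clone git@bitbucket.org:youruser/myproject.git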
OK, replying really late, but I hope this will help anyone still looking for this. Here is how I download things from Nitrous: no desktop utility needed, and no ssh/scp or key setup.
Simply make an archive of the folder you want to download:
tar -zcvf myarchive.tar.gz mydir/
Now you have a *.tar.gz file. cd into whichever folder the archive is in and type:
python3.3 -m http.server 8080
You have just started a little HTTP server ready to serve your download. Now, from the Preview menu, click "Port 8080"; this opens a new browser tab showing your .gz file in the file listing (sample URL http://yourboxes.apse1.nitrousbox.com:8080/). Click the .gz file and it will start downloading. Once the download is done, press Ctrl+C in the terminal to stop the HTTP server.
This is not limited to Nitrous; you can make this work on many online VMs, like Cloud9, etc.

See webserver response

When debugging a website (e.g. one developed using Wicket), what client-side tool can one use to see the webserver's exact response (e.g. a 301 or 302)?
Use apt-get or similar to install wget and then run:
wget -S http://localhost:8090
where "localhost:8090" is the URL of the site.
This shows all headers from the server's response to the GET request; wget also has options for POST requests (see the sketch below).
Alternatively, use a browser with Firebug to see the response codes.
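A small sketch of both wget variants; writing the body to /dev/null discards it, so only the headers remain visible:
# print the server's response headers for a GET request
wget -S -O /dev/null http://localhost:8090
# the same for a POST request; --post-data sends form-encoded data in the body
wget -S -O /dev/null --post-data='name=value' http://localhost:8090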
