cURL makes invalid bencoding when downloading torrents from torcache - windows

The title says it all. I realize that a similar question has been asked at https://askubuntu.com/questions/307566/wget-and-curl-somehow-modifying-bencode-file-when-downloading/310507#310507, but I don't think the same solution works here: I have tried to decompress the file with 7-Zip and with Gzip for Windows (http://gnuwin32.sourceforge.net/packages/gzip.htm), and both report that the file is in the wrong format. Renaming its extension to .gz or .zip doesn't help either, and the --compressed option is no help as well. So my guess is that something has changed on the torcache site. I've also tried setting the user agent, to no avail.
In what I guess is a related issue, when I try downloading from the https site I receive "curl: (52) Empty reply from server". Only http works, and that gives me invalid bencoding. When I enter the URL in my browser, the torrent file downloads just fine.
The command I'm entering is as follows:
curl -O http://torcache.net/torrent/006DDC8C407ACCDAF810BCFF41E77299A373296A.torrent
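A quick diagnostic sketch (the URL is the one from the question; the -A value is just an example browser user agent): first inspect the response headers to see what Content-Type and Content-Encoding the server actually reports, then retry the download while letting curl decompress a gzipped response.
# check what the server claims to be sending back
curl -sI -A "Mozilla/5.0" http://torcache.net/torrent/006DDC8C407ACCDAF810BCFF41E77299A373296A.torrent
# retry, asking curl to decompress any gzip/deflate response automatically
curl --compressed -A "Mozilla/5.0" -o 006DDC8C407ACCDAF810BCFF41E77299A373296A.torrent http://torcache.net/torrent/006DDC8C407ACCDAF810BCFF41E77299A373296A.torrent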

Related

How to download a file using command line

I need to download the following file using the command line to my remote computer:
download link
The point is that if I use wget or curl, I just get an HTML document. But if I enter this address in my browser (on my laptop), it simply starts downloading.
Now, my question is: since the only way to access my remote machine is through the command line, how can I download the file directly on that machine?
Thanks
Assuming that you are using a Linux terminal:
You can use a command-line browser like Lynx to follow links and download files.
The link you provided isn't a normal file link: it passes the filename as a GET variable, and the server responds with another page containing a form, so wget and cURL will not work on it directly.
That website is likely tracking your session and checking whether you've submitted the form and confirmed you're not a robot.
Try a different approach: copy the file from your local machine to the remote one via scp:
scp /localpath/to/file username@remotehost.com:/path/to/destination
Alternatively, you can export cookies from your local machine to the remote one and pass them to wget with the '--load-cookies file' option, but I can't guarantee it will work 100% if the site also ties the session ID to your IP.
Here's a Firefox extension for exporting cookies:
https://addons.mozilla.org/en-US/firefox/addon/export-cookies/
Once you have the cookies.txt file, just scp it to the remote machine and run wget with the '--load-cookies file' option.
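A minimal sketch of that workflow (remotehost.com and the download URL are placeholders; cookies.txt is the file produced by the export extension):
# copy the exported cookies to the remote machine
scp cookies.txt username@remotehost.com:~/
# then, on the remote machine, pass them to wget
wget --load-cookies cookies.txt "http://example.com/download?file=archive.zip"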
One of the authors of the corpus here.
As pointed out by a friend of mine, this tool solves all the problems.
https://addons.mozilla.org/en-GB/firefox/addon/cliget/
After installation, you just click the download link and copy the generated command to the remote machine. I just tried it and it works perfectly. We should put that info on the download page.

BASH: Get HTTP link of file

Is there a way to get an HTTP link to a file from a script?
For example:
I have a file at:
/home/User/video.mp4
Next, I would like to get the HTTP link to that file. For example:
http://192.168.1.5/video.mp4
I currently have nginx installed on the remote server, with a specific directory as the web server's root.
On the server I have, you can get the server link using this:
echo "http://$(whoami).$(hostname -f)/path/to/file"
I could get the file link using the command above, but that would be an issue for files with spaces in their names.
I'm doing this so that I can send the link to Internet Download Manager under Windows, so using wget to download the files will not work for me.
I'm currently using cygwin to create the script.
To solve the spaces problem, you can replace them with %20:
path="http://$(whoami).$(hostname -f)/path/to/file"
path=${path// /%20}    # replace every space with %20
echo "$path"
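The same substitution trick can be extended to other characters that are unsafe in URLs; a rough sketch (the extra replacements are only examples, not a full percent-encoder):
path="http://$(whoami).$(hostname -f)/path/to/file"
path=${path// /%20}     # space -> %20
path=${path//\(/%28}    # ( -> %28
path=${path//\)/%29}    # ) -> %29
echo "$path"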
Regards.

Issue with wget trying to get images from certain websites

I am trying to download all images off this website path http://www.samsung.com/sg/consumer/mobile-devices/smartphones/ using the code below:
wget -e robots=off -nd -nc -np --recursive -r -p --level=5 --accept jpg,jpeg,png,gif --convert-links -N --limit-rate=200k --wait 1.0 -U 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:14.0) Gecko/20100101 Firefox/14.0.1' -P testing_folder www.samsung.com/sg/consumer/mobile-devices/smartphones
I would expect to see the images of the phones downloaded to my testing_folder, but all I see are some global images like the logo. I don't seem to be able to get the phone images downloaded. The command above does seem to work on some other websites, though.
I have gone through all the wget questions on this forum, but this particular issue doesn't seem to have an answer. Can someone help? I am sure there is an easy way out. What am I doing wrong?
UPDATE:
It looks like the issue is that the pages are generated with JavaScript, which seems like the end of the road, since apparently wget can't handle JavaScript-generated pages well. If anyone can still help, I will be delighted.
Steps:
configure a proxy server, for example Apache httpd with mod_proxy and mod_proxy_http
visit the page with a web browser that supports JavaScript and is configured to use your proxy server
harvest the URLs from the proxy server log file and put them in a file
Or:
Start Firefox and open web page
F10 - Tools - Page Info - Media - right click - select all - right click - copy
Paste it into a file with your favourite editor
Then:
optionally (if you don't want to find out how to get wget to read a list of URLs from a file; a sketch of that is given after these steps), add minimal HTML tags (html, body and img) to the file
use wget to download the images, specifying the file created in the previous step as the starting point
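If you would rather have wget read the list directly, a minimal sketch (urls.txt is assumed to hold the harvested image URLs, one per line, and testing_folder matches the question's target directory):
# download every URL listed in urls.txt into testing_folder, without recreating directory trees
wget -nd -P testing_folder -i urls.txt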

Can't curl then unzip zip file

I'm just trying to curl this zip file, then unzip it
curl -sS https://www.kaggle.com/c/word2vec-nlp-tutorial/download/labeledTrainData.tsv.zip > labeledTrainData.tsv.zip
unzip labeledTrainData.tsv.zip labeledTrainData.tsv
but I keep getting the error:
Archive: labeledTrainData.tsv.zip
End-of-central-directory signature not found. Either this file is not
a zipfile, or it constitutes one disk of a multi-part archive. In the
latter case the central directory and zipfile comment will be found on
the last disk(s) of this archive.
I'm using the same syntax as found in this response, I think. Is there something wrong with the file I'm downloading? I feel like I'm making a noob mistake. I run these two commands in a shell script.
I am able to replicate your error. This sort of error generally indicates one of two things:
The file was not packaged properly
You aren't downloading what you think you're downloading.
In this case, your problem is the latter. It looks like you're downloading the file from the wrong URL. I'm seeing this when I open up the alleged zip file for reading:
<html><head><title>Object moved</title></head><body>
<h2>Object moved to here.</h2>
</body></html>
Long story short, you need to download from the alternate URL that the redirect points to. Additionally, Kaggle usually requires login credentials when downloading, so you'll need to supply those as well.
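A rough sketch of what that can look like with curl (-L follows the redirect shown in the HTML above; cookies.txt is assumed to hold a logged-in Kaggle session exported from a browser, since the exact login mechanism may differ):
# follow redirects and reuse browser cookies for authentication
curl -sSL -b cookies.txt -o labeledTrainData.tsv.zip "https://www.kaggle.com/c/word2vec-nlp-tutorial/download/labeledTrainData.tsv.zip"
unzip labeledTrainData.tsv.zip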

CMake ExternalProject_Add proxy settings

I have been quite successfully using CMake to perform builds with the ExternalProject_Add function, but my company recently put in a proxy server, which has broken the aforementioned build scripts.
The download step fails during the extract phase because the tarball that was downloaded contains only the redirect response from the proxy server (at least I think that is what is in the tiny tarball it acquires).
I found this post on the CMake mailing list. I thought that if it worked for the file() command it might work for the ExternalProject_Add() command as well. I set both the http_proxy and HTTP_PROXY environment variables, but still received the same error. I have thought about overriding the DOWNLOAD_COMMAND argument with a wget call, since wget seems to behave correctly with the proxy settings, but I wanted to know if there is a better way.
UPDATE 1: I checked the contents of the small tarball, and it does contain HTML; however, it is a notification that Authentication is required. I'm not sure why it is requiring authentication because I haven't had to enter any login information for wget. wget shows the following output:
Resolving webproxy... 10.0.1.50
Connecting to webproxy|10.0.1.50|:80... connected.
Proxy request sent, awaiting response... 200 OK
Download begins here...
UPDATE 2: I have also noticed that both apt-get and svn fail with this new proxy setup, but git does not... svn complains about "Server sent unexpected return value (307 Proxy Redirect)..." Very confusing...
Thanks!
What version of CMake are you using? The file(DOWNLOAD) command started using curl's follow-redirects flag in version 2.8.2, introduced by the following commit:
http://cmake.org/gitweb?p=cmake.git;a=commitdiff;h=ef491f78218e255339278656bf6dc26073fef264
Using a custom DOWNLOAD_COMMAND is certainly a reasonable workaround.
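For reference, a minimal sketch of the environment-variable route (the proxy host and port are taken from the question's wget output; user:password are placeholders in case the proxy demands authentication, and whether file(DOWNLOAD) picks them up depends on your CMake/curl build):
export http_proxy=http://user:password@webproxy:80
export https_proxy=$http_proxy
mkdir build && cd build
cmake ..    # ExternalProject_Add's download step should inherit the proxy settings from the environment
make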

Resources