downloading a single large file with aria2c - download

I want to download a file that is around 60GB in size.
My internet speed is 100 Mbps, but the download is not utilizing my entire bandwidth.
If I use aria2c to download this single file, can I take advantage of an increased "connections per server" setting? It seems aria2c allows a maximum of 16 connections. Would this option even work for downloading a single file?
The way I'm visualizing the download is that one connection downloads from one sector of the file while another connection downloads from a different sector, and the optimal number of concurrent connections is whatever it takes to reach the host bandwidth limit (mine being 100 Mbps). When two connections collide on the sector they are downloading, aria2c would immediately see that that sector is already downloaded and skip to a different one. Is this how it plays out when using multiple connections for a single file?

Is this how it plays out when using multiple connections for a single file?
The HTTP standard provides the Range request header, which lets a client say, for example: I want part of the file, starting at byte X and ending at byte Y. If the server supports this, it responds with 206 Partial Content. So, knowing the length (size) of the file (see Content-Length), it is possible to lay out the parts so that they are disjoint and cover the whole file.
Beware that not all servers support this. You need to check whether the server hosting the file you want to download does. This can be done with a HEAD request; the HTTP range requests documentation provides an example using curl:
curl -I http://i.imgur.com/z4d4kWk.jpg
HTTP/1.1 200 OK
...
Accept-Ranges: bytes
Content-Length: 146515
If Accept-Ranges contains bytes, the server supports range requests. If you wish, you can use any other tool that is able to send a HEAD request and show you the response headers.
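To make the mechanics concrete, here is a minimal Go sketch of what one worker of a segmented downloader such as aria2c effectively does: check the headers with a HEAD request, then fetch a single disjoint byte range and look for 206 Partial Content. The URL and segment size are placeholders; a real downloader runs several such workers in parallel, each on its own range.
package main

import (
    "fmt"
    "io"
    "net/http"
    "os"
)

func main() {
    url := "http://example.com/big.iso" // placeholder URL

    // 1. HEAD request: does the server advertise range support, and how big is the file?
    head, err := http.Head(url)
    if err != nil {
        panic(err)
    }
    head.Body.Close()
    fmt.Println("Accept-Ranges: ", head.Header.Get("Accept-Ranges"))
    fmt.Println("Content-Length:", head.ContentLength)

    // 2. GET one disjoint segment, here the first 1 MiB (bytes 0..1048575).
    req, err := http.NewRequest("GET", url, nil)
    if err != nil {
        panic(err)
    }
    req.Header.Set("Range", "bytes=0-1048575")
    resp, err := http.DefaultClient.Do(req)
    if err != nil {
        panic(err)
    }
    defer resp.Body.Close()

    // 206 Partial Content means the server honoured the range;
    // a plain 200 OK means it ignored it and sent the whole file.
    fmt.Println("Status:", resp.Status)

    out, err := os.Create("segment-0")
    if err != nil {
        panic(err)
    }
    defer out.Close()
    io.Copy(out, resp.Body)
}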

Related

Unable to send data in chunks to server in Golang

I'm completely new to Golang. I am trying to send a file from the client to the server. The client should split it into smaller chunks and send them to the REST endpoint exposed by the server. The server should combine those chunks and save the file.
This is the client and server code I have written so far. When I run it to copy a file of 39 bytes, the client sends two requests to the server, but the server displays the following errors.
2017/05/30 20:19:28 Was not able to access the uploaded file: unexpected EOF
2017/05/30 20:19:28 Was not able to access the uploaded file: multipart: NextPart: EOF
You are dividing the buffer containing the file into separate chunks and sending each of them as a separate HTTP message. This is not how multipart is intended to be used.
multipart MIME means that a single HTTP message may contain one or more entities; quoting the HTTP RFC:
MIME provides for a number of "multipart" types -- encapsulations of one or more entities within a single message-body. All multipart types share a common syntax, as defined in section 5.1.1 of RFC 2046
You should send the whole file in a single HTTP message (the file contents should be a single entity). The HTTP protocol will take care of the rest, but you may consider using FTP if the files you plan to transfer are large (like > 2 GB).
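As a rough illustration of that point (this is not the poster's code; the endpoint, field name, and file path are made up), here is a minimal Go sketch of a client that streams one file as a single multipart/form-data HTTP message using io.Pipe, so the file never has to be buffered in memory:
package main

import (
    "io"
    "log"
    "mime/multipart"
    "net/http"
    "os"
)

func main() {
    pr, pw := io.Pipe()
    mw := multipart.NewWriter(pw)

    // Write the multipart body in a goroutine; http.Post reads it from the pipe.
    go func() {
        defer pw.Close()
        defer mw.Close() // runs before pw.Close, writing the closing boundary
        part, err := mw.CreateFormFile("file", "big.bin")
        if err != nil {
            pw.CloseWithError(err)
            return
        }
        f, err := os.Open("big.bin") // illustrative local file
        if err != nil {
            pw.CloseWithError(err)
            return
        }
        defer f.Close()
        if _, err := io.Copy(part, f); err != nil {
            pw.CloseWithError(err)
        }
    }()

    resp, err := http.Post("http://localhost:8080/upload", mw.FormDataContentType(), pr)
    if err != nil {
        log.Fatal(err)
    }
    defer resp.Body.Close()
    log.Println(resp.Status)
}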
If you are using multipart/form-data, then it is expected that you take the entire file and send it up as a single byte stream. Go can handle multi-gigabyte files easily this way, but your code needs to be smart about it.
ioutil.ReadAll(r.Body) is out of the question unless you know for sure that the file will be very small. Please don't do this.
multipartReader, err := r.MultipartReader(): use a multipart reader. This will iterate over the uploaded files, in the order they are included in the encoding. This is important, because it lets you keep the file entirely out of memory and do a Copy from one file handle to another. This is how large files are handled easily.
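A minimal sketch of that streaming approach on the server side, assuming a hypothetical /upload route and a /tmp destination directory (error handling kept short); it pairs with the client sketch shown earlier:
package main

import (
    "io"
    "net/http"
    "os"
    "path/filepath"
)

func uploadHandler(w http.ResponseWriter, r *http.Request) {
    mr, err := r.MultipartReader()
    if err != nil {
        http.Error(w, err.Error(), http.StatusBadRequest)
        return
    }
    for {
        part, err := mr.NextPart()
        if err == io.EOF {
            break // no more parts in the message
        }
        if err != nil {
            http.Error(w, err.Error(), http.StatusBadRequest)
            return
        }
        if part.FileName() == "" {
            continue // skip ordinary form fields
        }
        dst, err := os.Create(filepath.Join("/tmp", filepath.Base(part.FileName())))
        if err != nil {
            http.Error(w, err.Error(), http.StatusInternalServerError)
            return
        }
        // Stream the part straight to disk; the whole file is never in memory.
        _, err = io.Copy(dst, part)
        dst.Close()
        if err != nil {
            http.Error(w, err.Error(), http.StatusInternalServerError)
            return
        }
    }
    w.WriteHeader(http.StatusCreated)
}

func main() {
    http.HandleFunc("/upload", uploadHandler)
    http.ListenAndServe(":8080", nil)
}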
You will have issues with middle-boxes and reverse proxies. We have to change defaults in Nginx so that it will not cut off large files. Nginx (or whatever reverse-proxy you might use) will need to cooperate, as they often are going to default to some really tiny file size max like 300MB.
Even if you think you dealt with this issue on upload with some file part trick, you will then need to deal with large files on download. Go can do single large files very efficiently by doing a Copy from filehandle to filehandle. You will also end up needing to support partial content (http 206) and not modified (304) if you want great performance for downloading files that you uploaded. Some browsers will ignore your pleas to not ask for partial content when things like large video is involved. So, if you don't support this, then some content will fail to download.
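For the download side, a short sketch: the standard library's http.ServeFile (built on http.ServeContent) already implements Range/206 and If-Modified-Since/304 handling and streams from the file handle rather than loading the file into memory. The /files/ route and /tmp directory are illustrative.
package main

import (
    "net/http"
    "path/filepath"
)

func downloadHandler(w http.ResponseWriter, r *http.Request) {
    // After StripPrefix, r.URL.Path is just the requested file name.
    name := filepath.Base(r.URL.Path) // crude sanitisation, enough for a sketch
    // ServeFile sets Content-Length, honours Range requests with 206,
    // and answers conditional requests with 304 based on the file's mtime.
    http.ServeFile(w, r, filepath.Join("/tmp", name))
}

func main() {
    http.Handle("/files/", http.StripPrefix("/files/", http.HandlerFunc(downloadHandler)))
    http.ListenAndServe(":8080", nil)
}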
If you want to use some tricks to cut up files and send them in parts, then you will end up needing to use a particular Javascript library. This is going to be quite harmful to interoperability if you are going for programmatic access from any client to your Go server. But maybe you can't fix middle-boxes that impose size limits, and you really want to cut files up into chunks. You will have a lot of work to handle downloading the files that you managed to upload in chunks.
What you are trying to do is the kind of code that is typically written against a raw TCP connection in most other languages. In Go you can use TCP too, with net.Listen and then Accept on the listener object; that should work fine.
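A bare-bones sketch of that raw TCP alternative, handling a single client and using an arbitrary port and output file name:
package main

import (
    "io"
    "log"
    "net"
    "os"
)

func main() {
    ln, err := net.Listen("tcp", ":9000") // port chosen arbitrarily for the sketch
    if err != nil {
        log.Fatal(err)
    }
    defer ln.Close()

    conn, err := ln.Accept() // accept one client for simplicity
    if err != nil {
        log.Fatal(err)
    }
    defer conn.Close()

    out, err := os.Create("received.bin") // illustrative output path
    if err != nil {
        log.Fatal(err)
    }
    defer out.Close()

    // Stream bytes from the socket to disk until the client closes the connection.
    n, err := io.Copy(out, conn)
    if err != nil {
        log.Fatal(err)
    }
    log.Printf("received %d bytes", n)
}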

File uploading using chunking : Jmeter

Can anyone please let me know whether JMeter supports performance testing of file uploads larger than 10 GB? The files are uploaded through chunking in Java. I cannot do the file upload for more than 10 GB because int allows a max size of 2^31. In the HTTP sampler I am declaring the file as one chunk;
for example, if the file size is 444,641,856 bytes, I declare the whole thing as one chunk instead of dividing it into chunks of 5 MB each.
The developers are not willing to change the code, and if I give results using a single chunk it is not a valid performance test.
Can anyone suggest whether JMeter allows a chunking mechanism, and also whether there is a solution for uploading files larger than 10 GB?
Theoretically JMeter doesn't have a 2 GB limitation (especially the HTTPClient implementations), so given that you configure it properly you shouldn't face errors.
However, if you don't have as much RAM as 10 GB x the number of virtual users, you might want to try the HTTP Raw Request sampler available via JMeter Plugins.
References:
https://groups.google.com/forum/#!topic/jmeter-plugins/VDqXDNDCr6w%5B1-25%5D
http://jmeter.512774.n5.nabble.com/fileupload-test-with-JMeter-td4267154.html

Loading a remote file into ffmpeg efficiently

My use case requires transcoding a remote MOV file that can’t be stored locally. I was hoping to use http protocol to stream the file into ffmpeg. This works, but I’m observing this to be a very expensive operation with (seemingly) redundant network traffic, so am looking for suggestions.
What I see is that ffmpeg starts out with a Range request “0-“ (which brings in the entire file), followed by a number of open-ended requests (no ending offset) at different positions, each of which makes the http server return large chunks of the file again and again, from the starting position to the very end.
For example, http range requests for a short 10MB file look like this:
bytes=0-
bytes=10947419-
bytes=36-
bytes=3153008-
bytes=5876422-
Is there another input method that would be more network-efficient for my use case? I control the server where the video file resides, so I’m flexible in what code runs there.
Any help is greatly appreciated

Retrieving dimensions of image without download whole image

I'm using open-uri to download remote images and then the imagesize gem to get the dimensions. The problem is this gets painfully slow when more than a handful of images needs to be processed.
How can I download enough information to know the dimensions for various image formats?
Are there any more ways to optimize this?
I believe if you go raw socket (issue a bare-bones HTTP request), there's no need to download more than a few bytes (and then abort the connection) to determine the dimensions of images.
require 'uri'
require 'socket'
raise "Usage: url [bytes-to-read [output-filename]]" if ARGV.length < 1
uri = URI.parse(ARGV.shift)
bytes = (ARGV.shift || 50).to_i
file = ARGV.shift
$stderr.puts "Downloading #{bytes} bytes from #{uri.to_s}"
Socket.tcp(uri.host, uri.port) do |sock|
  # http request
  sock.print "GET #{uri.path} HTTP/1.0\r\nHost: #{uri.host}\r\n\r\n"
  sock.close_write
  # http response headers
  while sock.readline.chomp != ""; end
  # http response body, we need first N bytes
  if file
    open(file, "wb") { |f| f.write(sock.read(bytes)) }
  else
    puts sock.read(bytes)
  end
end
e.g. if I pipe the first 33 bytes of a PNG file (13 bytes for a GIF) into exiftool, it will give me the image size:
$ ruby download_partial.rb http://yardoc.org/images/ss5.png 33 | exiftool - | grep ^Image
Downloading 33 bytes from http://yardoc.org/images/ss5.png
Image Width : 1000
Image Height : 300
Image Size : 1000x300
I'm not aware of any way to specify how many bytes to download with a normal HTTPd request. It's an all or nothing situation.
Some file types do allow sections of the files, but, you would have to have control of the server in order to enable that.
It's been a long time since I've played at this level, but, theoretically you could use a block with Net::HTTP or Open-URI, and count bytes until you've received the appropriate number to get to the image size block, then close the connection. Your TCP stack would probably not be too happy with you, especially if you were doing that a lot. If I remember right, it wouldn't dispose of the memory until the connection had timed out and would eat up available connections, either on your side or the server's. And, if I ran a site and found my server's performance being compromised by your app prematurely closing connections I'd ban you.
Ultimately, your best solution is to talk to whoever owns the site you are pillaging, and see if they have an API to tell you what the file sizes are. Their side of the connection can find that out a lot faster than your side since you have to retrieve the entire file. If nothing else, offer to write them something that can accomplish that. Maybe they'll understand that, by enabling it, you won't be consuming all their bandwidth retrieving images.

Read header data from files on remote server

I'm working on a project right now where I need to read header data from files on remote servers. I'm talking about many large files, so I can't read whole files, just the header data I need.
The only solution I have is to mount the remote server with FUSE and then read the headers from the files as if they were on my local computer. I've tried it and it works. But it has some drawbacks, especially with FTP:
Really slow (FTP with curlftpfs compared to SSH). From the same server, 90 files were read in 18 seconds over SSH, but only 10 files in 39 seconds over FTP.
Not dependable. Sometimes the mountpoint will not be unmounted.
If the server is active and a passive mount is done, the mountpoint and its parent folder get locked for about 3 minutes.
Times out, even when data transfer is going on (I guess this is the FTP protocol and not curlftpfs).
FUSE is a solution, but I don't like it very much because I don't feel I can trust it. So my question is basically whether there are other solutions to the problem. The language is preferably Ruby, but any other will work if Ruby does not support the solution.
Thanks!
What type of information are you looking for?
You could try using ruby's open-uri module.
The following example is from http://www.ruby-doc.org/stdlib/libdoc/open-uri/rdoc/index.html
require 'open-uri'
open("http://www.ruby-lang.org/en") {|f|
p f.base_uri # <URI::HTTP:0x40e6ef2 URL:http://www.ruby-lang.org/en/>
p f.content_type # "text/html"
p f.charset # "iso-8859-1"
p f.content_encoding # []
p f.last_modified # Thu Dec 05 02:45:02 UTC 2002
}
EDIT: It seems that the OP wanted to retrieve ID3 tag information from the remote files. This is more complex, and appears to be a difficult problem. From the Wikipedia article on ID3:
Tag location within file
Only with the ID3v2.4 standard has it been possible to place the tag data at the end of the file, in common with ID3v1. ID3v2.2 and 2.3 require that the tag data precede the file. Whilst for streaming data this is absolutely required, for static data it means that the entire audio file must be updated to insert data at the front of the file. For initial tagging this incurs a large penalty as every file must be re-written. Tag writers are encouraged to introduce padding after the tag data in order to allow for edits to the tag data without requiring the entire audio file to be re-written, but these are not standard and the tag requirements may vary greatly, especially if APIC (associated pictures) are also embedded.
This means that depending on the ID3 tag version of the file, you may have to read different parts of the file.
Here's an article that outlines the basics of reading ID3 tags using Ruby for ID3 v1.1; it should serve as a good starting point: http://rubyquiz.com/quiz136.html
You could also look into using an ID3 parsing library, such as id3.rb or id3lib-ruby; however, I'm not sure whether either supports parsing a remote file (most likely they could, with some modifications).
A "best-as-nothing" solution would be to start the transfer, and stop it when dowloaded file has more than bytes. Since not many (if any) libraries will allow interruption of the connection, it is more complex and will probably require you to manually code a specific ftp client, with two threads, one doing the FTP connection and transfer, and the other monitoring the size of the downloaded file and killing the first thread.
Or, at least, you could parallelize the file transfers. So that you don't wait for all the files being fully transferred to analyze the start of the file. The transfer will then continue
There has been a proposal of a RANG command, allowing to retrieve only a part of the files (here, the first bytes).
I didn't find any reference of inclusion of this proposal, nor implementation, however.
So, for a specific server it could be useful to test (or check the docs of the FTP server) - and use it if available.
