How to make a downloader in Java - java-io

I am making a downloader in Java to download small to large files.
My logic to download files is as follows:
URL url = new URL(urlToGetFile);
int count = -1; // this is for counter
int offset = 0;
BufferedInputStream bufferedInputStream = new BufferedInputStream(url.openStream());
FileOutputStream fileOutputStream = new FileOutputStream(FinalFilePath);
byte data[] = new byte[1024];
while ((count = bufferedInputStream.read(data, 0, 1024)) != -1)
{
    fileOutputStream.write(data, 0, 1024);
}
bufferedInputStream.close();
fileOutputStream.close();
PrintLine("File has download");
It works for small files, but when I download large files they finish downloading yet end up corrupted.
After reading many questions I am also a little confused about why everyone codes fileOutputStream.write(data, 0, 1024); with the offset set to 0, and the same for the offset passed to bufferedInputStream.
I also want to know how to change that offset for BufferedInputStream and for FileOutputStream while reading bytes in the loop.

You need to write the amount that was read.
When you read into the buffer you can read fewer than 1024 bytes. For example a 1200-byte file would be read as 1024 + 176. Your count variable stores how much was actually read, which would be 176 the second time around your loop.
The reason for corruption is that you would be writing 176 'good' bytes plus (1024 - 176 = 848) additional bytes that were still in the data array from the previous read.
So try:
while ((count = bufferedInputStream.read(data, 0, 1024)) != -1)
{
    fileOutputStream.write(data, 0, count);
}
The zero offset in that write call is an offset into data, which you really do want to be zero. See the Javadoc for details; the offset has the same meaning for other stream types.
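Putting it together, here is a minimal sketch of the corrected copy loop, assuming the same urlToGetFile and FinalFilePath variables as in the question and a surrounding method that declares or handles IOException; try-with-resources closes both streams even if the copy fails, and System.out.println stands in for the question's PrintLine helper.
URL url = new URL(urlToGetFile);
try (BufferedInputStream in = new BufferedInputStream(url.openStream());
     FileOutputStream out = new FileOutputStream(FinalFilePath)) {
    byte[] data = new byte[1024];
    int count;
    while ((count = in.read(data, 0, 1024)) != -1) {
        // Write only the bytes that were actually read in this iteration.
        out.write(data, 0, count);
    }
}
System.out.println("File has downloaded");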

Related

Memory limitation of JMeter Beanshell sampler holding many variables

I have a CSV file with 450K rows and 2 columns. Using the CSV Data Set Config results in a SocketException: Too many open files error on some load generators. To work around it, I used a Beanshell sampler to read the contents of the large CSV into memory just once; however, when it tries to save variable # 22,770 it throws java.lang.ArrayIndexOutOfBoundsException: null
Here is my simple code -
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.lang.*;

BufferedReader lineReader = null;
try {
    lineReader = new BufferedReader(new FileReader("${skufile}"));
    String line = null;
    int count = 0;
    while ((line = lineReader.readLine()) != null) {
        String[] values = line.split(",");
        vars.put("sku_" + count, values[0]);
        vars.put("optionid_" + count, values[1]);
        log.info("Sku# " + count + " : " + vars.get("sku_" + count));
        count++;
    }
} catch (Throwable e) {
    log.error("Error in Beanshell", e);
    throw e;
}
I have tried using both props and vars.
The error is not connected with any form of limit. Take a look at line 22771 of your CSV file: it might, for example, not contain a comma, in which case values[1] does not exist and the lookup throws the ArrayIndexOutOfBoundsException.
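If skipping such lines is acceptable, here is a minimal defensive sketch for the body of the loop (Beanshell/Java syntax, reusing the vars and log bindings from the question; the warning text is just illustrative):
String[] values = line.split(",");
if (values.length < 2) {
    // Skip lines that do not contain both columns (e.g. no comma on the line).
    log.warn("Skipping malformed CSV line # " + count + ": " + line);
    continue;
}
vars.put("sku_" + count, values[0]);
vars.put("optionid_" + count, values[1]);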
Holding the whole file in memory is not the best option anyway. I would rather recommend going back to the CSV Data Set Config and increasing the maximum number of open files, which may be as low as 1024 for a normal user on the majority of Linux distributions. The steps are:
add the following lines to the /etc/security/limits.conf file:
your_user_name soft nofile 4096
your_user_name hard nofile 65536
you can also run the following command to raise the limit for the current shell session:
ulimit -n 8192
Be aware that since JMeter 3.1 it is recommended to use JSR223 Test Elements and the Groovy language for scripting. Groovy is compatible with the latest Java language features, offers syntactic sugar on top of them, and has much better performance than Beanshell.

AWS multipart upload from InputStream has bad offset

I am using the Java Amazon AWS SDK to perform some multipart uploads from HDFS to S3. My code is the following:
for (int i = startingPart; currentFilePosition < contentLength; i++)
{
    FSDataInputStream inputStream = fs.open(new Path(hdfsFullPath));

    // Last part can be less than 5 MB. Adjust part size.
    partSize = Math.min(partSize, (contentLength - currentFilePosition));

    // Create request to upload a part.
    UploadPartRequest uploadRequest = new UploadPartRequest()
            .withBucketName(bucket).withKey(s3Name)
            .withUploadId(currentUploadId)
            .withPartNumber(i)
            .withFileOffset(currentFilePosition)
            .withInputStream(inputStream)
            .withPartSize(partSize);

    // Upload part and add response to our list.
    partETags.add(s3Client.uploadPart(uploadRequest).getPartETag());

    currentFilePosition += partSize;
    inputStream.close();
    lastFilePosition = currentFilePosition;
}
However, the uploaded file is not the same as the original one. More specifically, I am testing with a file of about 20 MB, uploading it in parts of 5 MB each. At the end of each 5 MB part I see some extra text, which is always 96 characters long.
Even stranger, if I pass something stupid to .withFileOffset(), for example
.withFileOffset(currentFilePosition - 34)
the error stays the same. I was expecting to get different characters, but I am getting the EXACT same 96 extra characters, as if I hadn't modified the line.
Any ideas what might be wrong?
Thanks,
Serban
I figured it out. This came from a stupid assumption on my part. It turns out that the file offset in .withFileOffset(...) tells you the offset at which to write in the destination file; it says nothing about the source. By opening and closing the stream repeatedly, I was always reading from the beginning of the source file, just writing it to a different offset. The solution is to add a seek statement after opening the stream:
FSDataInputStream inputStream = fs.open(new Path(hdfsFullPath));
inputStream.seek(currentFilePosition);
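For context, here is a hedged sketch of how the fix slots into the loop from the question (same variables as above, error handling omitted; the rest of the request is unchanged):
for (int i = startingPart; currentFilePosition < contentLength; i++)
{
    // Open the source stream and position it at the current offset before uploading this part.
    FSDataInputStream inputStream = fs.open(new Path(hdfsFullPath));
    inputStream.seek(currentFilePosition);

    // Last part can be less than 5 MB. Adjust part size.
    partSize = Math.min(partSize, (contentLength - currentFilePosition));

    UploadPartRequest uploadRequest = new UploadPartRequest()
            .withBucketName(bucket).withKey(s3Name)
            .withUploadId(currentUploadId)
            .withPartNumber(i)
            .withFileOffset(currentFilePosition)
            .withInputStream(inputStream)
            .withPartSize(partSize);

    partETags.add(s3Client.uploadPart(uploadRequest).getPartETag());

    currentFilePosition += partSize;
    inputStream.close();
}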

EOF on zeromq file transfer

I am using the below Python code to transfer large files between a server and a client using zeromq.
Implementation to send the file (server):
CHUNK_SIZE = 250000

message = pair.recv()  # message is the path to the file
filename = open(message, 'rb')
filesize = os.path.getsize(message)

offsets = (int(ceil(filesize / CHUNK_SIZE)), 0)[filesize <= CHUNK_SIZE]
for offset in range(offsets + 1):
    filename.seek(offset)
    chunksize = CHUNK_SIZE
    if offset == offsets:
        chunksize = filesize - (CHUNK_SIZE * (offset - 1))  # calculate the size of the last chunk
    data = filename.read(chunksize)
    pair.send(data)
pair.send(b'')
Implementation to receive the file (client):
while True:
    data = pairs.recv()
    if data is not '':
        target.write(data)
    else:
        break
However, after transferring a large file with this implementation, for some reason some extra data ends up appended at the end of the file:
File server side
$ stat file.zip
File: `file.zip'
Size: 1503656416 Blocks: 2936840 IO Block: 4096 regular file
Client side
$ stat file.zip
File: `file.zip'
Size: 1503906416 Blocks: 2937328 IO Block: 4096 regular file
The size and blocks are different between them.
With that in mind, do you have any suggestions on how to calculate/send the end of the file properly?
Thanks
Just found the solution: the seek() call was not positioning the file correctly between chunks.
-filename.seek(offset)
+filename.seek(0, 1)
Thus it always seeks to offset 0 relative to the current position, i.e. it simply continues from where the previous read left off.
Now everything is working as expected :)

Stream an HTTP response in Java

I want to write the response of an HTTP request to a file, but I want to stream the response to a physical file without waiting for the entire response to be loaded.
I will actually be making a request to a JHAT server to return all the Strings from the dump. My browser hangs before the response completes, as there are 70k such objects, so I want to write them to a file that I can then scan through.
Thanks in advance,
Read a limited amount of data from the HTTP stream and write it to a file stream. Do this until all data has been handled.
Here is example code demonstrating the principle. In this example I do not deal with any I/O errors. I chose an 8 KB buffer to be faster than processing one byte at a time, yet still limiting the amount of data pulled into RAM during each iteration.
final URL url = new URL("http://example.com/");
final InputStream istream = url.openStream();
final OutputStream ostream = new FileOutputStream("/tmp/data.txt");

final byte[] buffer = new byte[1024 * 8];
while (true) {
    final int len = istream.read(buffer);
    if (len <= 0) {
        break;
    }
    ostream.write(buffer, 0, len);
}
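Since the question mentions the browser hanging on a slow JHAT server, it may also help to set timeouts so the copy fails instead of blocking forever. Here is a hedged variation of the first two lines, using the standard java.net.URLConnection API (the timeout values are just illustrative):
final URLConnection connection = url.openConnection();
connection.setConnectTimeout(10_000); // give up if the connection is not established within 10 s
connection.setReadTimeout(60_000);    // give up if no data arrives for 60 s
final InputStream istream = connection.getInputStream();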

Uploading large files to S3 with Ruby (aws:s3) - connection reset by peer on Ubuntu

I am trying to store some large files on S3 using the Ruby aws:s3 library:
S3Object.store("video.mp4", open(file), 'bucket', :access => :public_read)
For files of 100 MB or so everything is great, but with files of over 200 MB I get a "Connection reset by peer" error in the log.
Has anyone come across this weirdness? From the web, it seems to be an issue with large files, but I have not yet come across a definitive solution.
I am using Ubuntu.
EDIT:
This seems to be a Linux issue as suggested here.
No idea where the original problem might be, but as a workaround you could try a multipart upload.
filename = "video.mp4"
min_chunk_size = 5 * 1024 * 1024 # S3 minimum chunk size (5Mb)
#object.multipart_upload do |upload|
io = File.open(filename)
parts = []
bufsize = (io.size > 2 * min_chunk_size) ? min_chunk_size : io.size
while buf = io.read(bufsize)
md5 = Digest::MD5.base64digest(buf)
part = upload.add_part(buf)
parts << part
if (io.size - (io.pos + bufsize)) < bufsize
bufsize = (io.size - io.pos) if (io.size - io.pos) > 0
end
end
upload.complete(parts)
end
S3 multipart upload is a little tricky, as each part (except the last) must be at least 5 MB, but the code above takes care of that.
