I have a strange problem with my ultra-simple method. It sends a file in 4 MB chunks to a foreign API. The thing is, it always crashes at the 10th chunk.
I can't debug the API error itself, but it says: The specified blob or block content is invalid (the API is the Azure Storage API, but that's not important right now; the problem clearly lies on my side).
Because it crashes at the 10th chunk (i.e. around the 40th megabyte), testing is a pain and debugging it "by hand" takes a lot of time (partly because of my slow internet connection), so I decided to share my method:
def upload_chunk
  file_to_send = File.open('file.mp4', 'rb')
  until file_to_send.eof?
    content = file_to_send.read(4194304) # Get 4 MB chunk
    upload_to_api(content)               # Line that produces the error
  end
end
Can you see anything that could be wrong with this code? Please keep in mind that it ALWAYS crashes on the 10th chunk and works perfectly for files smaller than 40 MB.
I did a search for ruby "The specified blob or block content is invalid" and found this as the second link (first was this page):
http://cloud.dzone.com/articles/azure-blob-storage-specified
This contains:
If you’re uploading blobs by splitting blobs into blocks and you get the above mentioned error, ensure that your block ids of your blocks are of same length. If the block ids of your blocks are of different length, you’ll get this error.
So my first guess is that the call to upload_to_api is assigning block ids 1-9, and when it gets to 10 the id length increases, causing the problem.
If you don't have control over how the ids are generated, then perhaps you can set the number of bytes read on each iteration to be no more than 1/9 of the total file size.
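If you do control the ids, the usual fix is to pad them to a fixed width before Base64-encoding, so the 10th block's id is exactly as long as the 1st. A minimal sketch, assuming the put_blob_block / commit_blob_blocks calls from the azure-storage-blob gem (the commit-list entry format and client setup are assumptions to verify against that gem; everything else is illustrative):

require 'base64'

# Fixed-width, Base64-encoded block ids: "000000", "000001", ... all encode
# to the same length, so the id length never jumps when the counter hits 10.
def block_id_for(index)
  Base64.strict_encode64(format('%06d', index))
end

def upload_in_blocks(blob_client, container, blob_name, path, chunk_size = 4194304)
  block_ids = []
  File.open(path, 'rb') do |file|
    index = 0
    until file.eof?
      content = file.read(chunk_size)
      id = block_id_for(index)
      blob_client.put_blob_block(container, blob_name, id, content) # assumed gem API
      block_ids << [id] # commit-list entry format is an assumption; check the gem docs
      index += 1
    end
  end
  blob_client.commit_blob_blocks(container, blob_name, block_ids) # assumed gem API
end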
Related
KB built on GeneXus 16 U9, using the .NET 4.0 generator.
The system generates a report when a client requests it via a web service, passing the invoice's ID. Generally it's requested simultaneously for many different documents, but every report generates a unique filename (to avoid locking the filename), converts it to base64, and deletes the file.
Most requests succeed, but sometimes it starts throwing the exception below for many requests in a short period of time. After recycling the IIS pool, it stops occurring for a while.
Report procedure: rnuc006.
Source array was not long enough. Check srcIndex and length, and the array's lower bounds. at GeneXus.Procedure.GxReportUtils.GetPrinter(Int32 outputType, String path, Stream reportOutputStream)
at GeneXus.Procedure.GXProcedure.getPrinter()
at GeneXus.Programs.rnuc006.executePrivate()
at GeneXus.Programs.rnuc006.execute(SdtSDTDadosEmissao& aP0_SDTDadosEmissao, SdtSDTDadosEnvio& aP1_SDTDadosEnvio, Int16 aP2_indiceLote, Int16 aP3_indiceRPS, String aP4_Filename)
at GeneXus.Programs.pnfs216.S121()
at GeneXus.Programs.pnfs216.executePrivate()
I'm trying to debug it, but it's difficult to find out why it suddenly starts happening.
There's a fix for this error in v16 U10; maybe you can try that version if you hit this problem again.
I was getting a segment error while uploading a large file.
I read the file data in chunks of bytes using the Read method of io.Reader. Now I need to upload those bytes continuously to Storj.
Storj, architected as an S3-compatible distributed object storage system, does not allow changing objects once uploaded. Basically, you can delete or overwrite, but you can't append.
You could make something that seemed like it supported append, however, using Storj as the backend. For example, by appending an ordinal number to your object's path, and incrementing it each time you want to add to it. When you want to download the whole thing, you would iterate over all the parts and fetch them all. Or if you only want to seek to a particular offset, you could calculate which part that offset would be in, and download from there.
sj://bucket/my/object.name/000
sj://bucket/my/object.name/001
sj://bucket/my/object.name/002
sj://bucket/my/object.name/003
sj://bucket/my/object.name/004
sj://bucket/my/object.name/005
Of course, this leaves unsolved the problem of what to do when multiple clients are trying to append to your "file" at the same time. Without some sort of extra coordination layer, they would sometimes end up overwriting each other's objects.
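A minimal sketch of the arithmetic behind that scheme, assuming every part except possibly the last is written at a fixed size (the 64 MB figure and the helper names are made up for illustration; the actual Storj upload/download calls are left out):

PART_SIZE = 64 * 1024 * 1024 # assumed fixed size of every appended part

# Path of the n-th part, matching the zero-padded layout shown above.
def part_path(base, index)
  format('%s/%03d', base, index) # e.g. "sj://bucket/my/object.name/004"
end

# Which part a byte offset falls in, and how far into that part it is.
def locate(offset, part_size = PART_SIZE)
  [offset / part_size, offset % part_size]
end

part, offset_in_part = locate(300 * 1024 * 1024)
puts part_path('sj://bucket/my/object.name', part) # part to download first
puts offset_in_part                                # bytes to skip inside that part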
I'm receiving errors from the blob extractor that files are too large for the current tier, which is Basic. I will be upgrading to a higher tier, but I notice that the max size is currently 256 MB.
When I have PPTX files that are mostly video and audio, but have text I'm interested in, is there a way to index those? What does the blob extractor max file size actually mean?
Can I tell the extractor to only take the first X MB or chars and just stop?
There are two related limits in the blob indexer:
The max file size limit, which you are hitting. If the file size exceeds that limit, the indexer doesn't attempt to download it and produces an error to make sure you are aware of the issue. The reason we don't just take the first N bytes is that, for many formats, the entire file is needed to parse it correctly. You can mark blobs as skippable or configure the indexer to ignore a number of errors if you want it to make forward progress when it encounters blobs that are too large (see the sketch after this list).
The max size of extracted text. If the file contains more text than that, the indexer takes N characters up to the limit and includes a warning so you can be aware of the issue. Content that doesn't get extracted (such as video, at least today) doesn't contribute to this limit, of course.
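A rough sketch of the "ignore a number of errors" configuration, written here as a Ruby hash mirroring an indexer definition body; maxFailedItems / maxFailedItemsPerBatch and the AzureSearch_Skip blob metadata are Azure Search features, but treat the exact names and values as assumptions to verify against the service documentation:

# Assumed shape of an indexer definition that tolerates failed blobs, so an
# oversized file produces an error without stopping the whole indexing run.
indexer_definition = {
  'name'            => 'blob-indexer',     # hypothetical names
  'dataSourceName'  => 'blob-datasource',
  'targetIndexName' => 'docs-index',
  'parameters'      => {
    'maxFailedItems'         => -1, # -1: keep going regardless of how many blobs fail
    'maxFailedItemsPerBatch' => -1
  }
}

# Individual blobs can also be excluded up front by setting the blob metadata
# property AzureSearch_Skip to "true" on the blob itself.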
How large are the PPTX you need indexed? I'll add my contact info in a comment.
I'm extracting data from a binary file and see that the length of a binary data block comes after the block itself (the character chunks within the block have the length first, then 00, and then the information).
What is the purpose of putting the length after the block? Is it for error checking?
A couple of examples:
The length of the block was unknown when the write operation began. Consider an audio stream from a microphone that we want to write as a single block. It is not feasible to buffer it in RAM because it may be huge. That's why, after we receive EOF, we append the effective size of the block to the file. (An alternative would be to reserve a couple of bytes for a length field at the beginning of the block and then, after EOF, seek back and write the length there, but this requires more I/O.)
Database WALs (write-ahead logs) may use such a scheme. Consider a user who starts a transaction and makes lots of changes. Every change is appended as a single record (block) to the WAL. If the user decides to roll back the transaction, it is now easy to walk backwards and chop off all records that were added as part of the transaction being rolled back.
It is common for binary files to carry two blocks of meta-information: one at the beginning (e.g. creation date, hostname) and another at the end (e.g. statistics and a checksum). When an application opens an existing binary file, it first wants to load these two blocks to make decisions about memory allocation and the like. Loading the last block is much easier if its length is stored at the very end of the file rather than having to scan the file from the beginning.
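A minimal sketch of the first example above, assuming the block's length is stored as a 4-byte little-endian integer right after the block (the field width and endianness are assumptions, not details taken from the asker's format):

# Write a block of unknown length, then append its size so a reader can
# locate the block again by looking at the end of the file.
def append_block(io, data)
  io.write(data)
  io.write([data.bytesize].pack('V')) # 'V' = 32-bit little-endian length
end

# Read the last block by first reading its length from the final 4 bytes.
def read_last_block(io)
  io.seek(-4, IO::SEEK_END)
  length = io.read(4).unpack1('V')
  io.seek(-(4 + length), IO::SEEK_END)
  io.read(length)
end

File.open('blocks.bin', 'wb') { |f| append_block(f, 'stream of samples...') }
File.open('blocks.bin', 'rb') { |f| puts read_last_block(f) }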
I maintain a program that is responsible for collecting data from a data acquisition system and appending that data to a very large (size > 4GB) binary file. Before appending data, the program must validate the header of this file in order to ensure that the meta-data in the file matches that which has been collected. In order to do this, I open the file as follows:
data_file = fopen(file_name, "rb+");
I then seek to the beginning of the file in order to validate the header. When this is done, I seek to the end of the file as follows:
_fseeki64(data_file, _filelengthi64(data_file), SEEK_SET);
At this point, I write the data that has been collected using fwrite(). I am careful to check the return values from all I/O functions.
One of the computers (Windows 7, 64-bit) on which we have been testing this program intermittently shows a condition where the data appears to have been written to the file yet neither the file's last-changed time nor its size changes. If any of the calls to fopen(), fseek(), or fwrite() fail, my program throws an exception, which results in aborting the data collection process and logging the error. On this machine, none of these failures seem to be occurring. Something that makes the matter even more mysterious is that, if a restore point is set on the host file system, the problem goes away only to re-appear intermittently at some future time.
We have tried to reproduce this problem on other machines (a Vista 32-bit operating system) but have had no success in replicating the issue (this doesn't necessarily mean anything, since the problem is so intermittent in the first place).
Has anyone else encountered anything similar to this? Is there a potential remedy?
Further Information
I have now found that the failure occurs when fflush() is called on the file, and that the Win32 error returned by GetLastError() is 665 (ERROR_FILE_SYSTEM_LIMITATION). Searching Google for this error leads to a bunch of reports related to "extents" for SQL Server files. I suspect that some journaling resource the file system tracks is being exhausted because we are growing a large file by repeatedly opening it, appending a chunk of data, and closing it. I am now looking for an understanding of this particular error in the hope of coming up with a valid remedy.
The file append is failing because of a file system fragmentation limit. The question was answered in What factors can lead to Win32 error 665 (file system limitation)?