I have a "stringstream" variable that stores some compressed binary data in gzip format.
I want to decompress this stringstream variable in memory.
First of all, for in-memory decompression of binary data in gzip format, what third party library do you suggest to use ?
I noticed zlib library for compression/decompression of gzip and deflate formats.
However, the two functions handling decompression that zlip provides do not seem to meet my needs exactly:
int uncompress (Bytef *dest, uLongf *destLen, const Bytef *source, uLong sourceLen);
int gzread (gzFile file, voidp buf, unsigned len);
The first one (uncompress) requires me to know the length of the decompressed data in advance to properly allocate enough memory for storage. In my case, it is unknown.
On the other hand, the second one (gzread) takes a file as input, not a memory buffer.
What do you suggest for an "efficient" in-memory decompression using zlip or some other library ?
Thanks.
There appears to be some decompression filters for gzip in the Boost library, this might be worth looking into:
http://www.boost.org/doc/libs/1_48_0/libs/iostreams/doc/classes/gzip.html
Related
I am trying to read an 8K 32bit OpenEXR HDR file with Rust.
Using the Image crate to read the file:
use image::io::Reader as ImageReader;
let img = ImageReader::open(r"C:\Users\Marko\Desktop\HDR_Big.exr")
.expect("File Error")
.decode()
.expect("Decode ERROR");
This results in an Decode ERROR: Limits(LimitError { kind: InsufficientMemory })
Reading a 4K file or smaller works fine.
I thought buffering would help so I tried:
use image::io::Reader as ImageReader;
use std::io::BufReader;
use std::fs::File;
let f = File::open(r"C:\Users\Marko\Desktop\HDR_Big.exr").expect("File Error");
let reader = BufReader::new(f);
let img_reader = ImageReader::new(reader)
.with_guessed_format()
.expect("Reader Error");
let img = img_reader.decode().expect("Decode ERROR");
But the same error results.
Is this a problem with the image crate itself? Can it be avoided?
If it makes any difference for the solution after decoding the image I use the raw data like this:
let data: Vec<f32> = img.to_rgb32f().into_raw();
Thanks!
But the same error results. Is this a problem with the image crate itself? Can it be avoided?
No because it's not a problem and yes it can be avoided.
When an image library faces the open web it's relatively easy to DOS the entire service or exhaust its library cheaply as it's usually possible to request huge images at a very low cost (for instance a 44KB PNG can decompress to a 1GB full-color buffer, and a megabyte-scale jpeg can reach GB-scale buffer sizes).
As a result modern image libraries tend to set limits by default in order to limit the "default" liability of users.
That is the case of image-rs, by default it does not set any width or height limits but it does request that the allocator limits itself to 512MB.
If you wish for higher or no limitations, you can configure the decoder to match.
All of this is surfaced by simply searching for the error name and the library (both "InsufficientMemory image-rs" and "LimitError image-rs" surfaced the information)
By default, image::io::Reader asks the decoder to fit the decoding process in 512 MiB of memory, according to the documentation. It's possible to disable this limitation, using, e.g., Reader::no_limits.
I'm trying to send a relatively big string (6Kb) through libnl and generic netlink, however, I'm receiving the error -5 (NL_ENOMEM) from the function nla_put_string in this process. I've made a lot of research but I didn't find any information about these two questions:
What's the maximum string size supported by generic netlink and libnl nla_put_string function?
How to use the multipart mechanism of generic netlink to broke this string in smaller parts to send and reassemble it on the Kernel side?
If there is a place to study such subject I appreciate that.
How to use the multipart mechanism of generic netlink to broke this string in smaller parts to send and reassemble it on the Kernel side?
Netlink's Multipart feature "might" help you transmit an already fragmented string, but it won't help you with the actual string fragmentation operation. That's your job. Multipart is a means to transmit several small correlated objects through several packets, not one big object. In general, Netlink as a whole is designed with the assumption that any atomic piece of data you want to send will fit in a single packet. I would agree with the notion that 6Kbs worth of string is a bit of an oddball.
In actuality, Multipart is a rather ill-defined gimmic in my opinion. The problem is that the kernel doesn't actually handle it in any generic capacity; if you look at all the NLMSG_DONE usage instances, you will notice not only that it is very rarely read (most of them are writes), but also, it's not the Netlink code but rather some specific protocol doing it for some static (ie. private) operation. In other words, the semantics of NLMSG_DONE are given by you, not by the kernel. Linux will not save you any work if you choose to use it.
On the other hand, libnl-genl-3 does appear to perform some automatic juggling with the Multipart flags (NLMSG_DONE and NLM_F_MULTI), but that only applies when you're sending something from Kernelspace to Userspace, and on top of that, even the library itself admits that it doesn't really work.
Also, NLMSG_DONE is supposed to be placed in the "type" Netlink header field, not in the "flags" field. This is baffling to me, because Generic Netlink stores the family identifier in type, so it doesn't look like there's a way to simultaneously tell Netlink that the message belongs to you, AND that it's supposed to end some data stream. Unless I'm missing something important, Multipart and Generic Netlink are incompatible with each other.
I would therefore recommend implementing your own message control if necessary and forget about Multipart.
What's the maximum string size supported by generic netlink and libnl nla_put_string function?
It's not a constant. nlmsg_alloc() reserves
getpagesize() bytes per packet by default. You can tweak this default with nlmsg_set_default_size(), or more to the point you can override it with nlmsg_alloc_size().
Then you'd have to query the actual allocated size (because it's not guaranteed to be what you requested) and build from there. To get the available payload you'd have to subtract the Netlink header length, the Generic Header length and the Attribute Header lengths for any attributes you want to add. Also the user header length, if you have one. You would also have to align all these components because their sizeof is not necessarily their actual size (example).
All that said, the kernel will still reject packets which exceed the page size, so even if you specify a custom size you will still need to fragment your string.
So really, just forget all of the above. Just fragment the string to something like getpagesize() / 2 or whatever, and send it in separate chunks.
This is the general idea:
static void do_request(struct nl_sock *sk, int fam, char const *string)
{
struct nl_msg *msg;
msg = nlmsg_alloc();
genlmsg_put(msg, NL_AUTO_PORT, NL_AUTO_SEQ, fam,
0, 0, DOC_EXMPL_C_ECHO, 1);
nla_put_string(msg, DOC_EXMPL_A_MSG, string);
nl_send_auto(sk, msg);
nlmsg_free(msg);
}
int main(int argc, char **argv)
{
struct nl_sock *sk;
int fam;
sk = nl_socket_alloc();
genl_connect(sk);
fam = genl_ctrl_resolve(sk, FAMILY_NAME);
do_request(sk, fam, "I'm sending a string.");
do_request(sk, fam, "Let's pretend I'm biiiiiig.");
do_request(sk, fam, "Look at me, I'm so big.");
do_request(sk, fam, "But I'm already fragmented, so it's ok.");
nl_close(sk);
nl_socket_free(sk);
return 0;
}
I left a full sandbox in my Dropbox. See the README. (Tested in kernel 5.4.0-37-generic.)
The documentation of OSData says that "...You can add bytes to them and overwrite portions of the byte array.". I can see a method to append bytes, but I don't understand how I am able to overwrite a portion of the buffer.
Another option would be to use IONewZero to allocate a number of elements of the type I need. I my case I just need this for ints.
Example:
T* dataBuffer = IONewZero(T, SIZE);
And then deallocate with:
IOSafeDeleteNULL(dataBuffer_, T, SIZE);
What are the advantages of using an OSData object compared to the solution with IONewZero / IOSafeDeleteNULL?
I think the documentation might just be copy-pasted from the kernel variant of OSData. I've seen that in a bunch of places, especially USBDriverKit.
OSData is mostly useful for dealing with plist-like data structures (i.e. setting and getting properties on service objects) in conjunction with the other OSTypes: OSArray, OSDictionary, OSNumber, etc. It's also used for in-band (<= 4096 byte) "struct" arguments of user client external methods.
The only use I can see outside of those scenarios is when you absolutely have to reference-count a blob of data. But it's certainly not a particularly convenient or efficient container for data-in-progress. If you subsequently need to send the data to a device or map it to user space, IOBufferMemoryDescriptor is probably a better choice (and also reference counted) though it's even more heavyweight.
When writing lots of sequential data to disk I found that having an internal 4MB buffer and when opening the file for writing I specify [FILE_FLAG_NO_BUFFERING][1], so that my internal buffer is used.
But that also creates a requirement to write in full sector blocks (512 bytes on my machine).
How do I write the last N<512 bytes to disk?
Is there some flag to WriteFile to allow this?
Do I pad them with extra NUL characters and then truncate the file size down to the correct value?
(With SetFileValidData or similar?)
For those wondering the reason for trying this approach. Our application logs a lot. To handle this a dedicated log-thread exists, which formats and writes logs to disk. Also if we log with fullest detail we might log more per second than the disk-system can handle. (Usually noticed for customers with SAN systems that are not well tweaked.)
So, the goal is log write as much as possible, but also notice when we start to overload the system, and then hold back a bit, like reducing the details of the logs.
Hence the idea to have a fill a big memory-block and give that to the OS, hoping to reduce the overheads.
As the comments suggest, doing file writing this way is probably not the best solution for real world situations. But if writing with FILE_FLAG_NO_BUFFERING is used,
SetFileInformationByHandle is the way to mark the file shorter than whole blocks.
int data_len = len(str);
int len_last_block = BLOCKSIZE%datalen;
int padding_to_fill_block = (data_last_block == BLOCKSIZE ? 0 : (BLOCKSIZE-len_last_block);
str.append('\0', padding_to_fill_block);
ULONG bytes_written = 0;
::WriteFile(hFile, data, data_len+padding_to_fill_block, &bytes_written, NULL));
m_filesize += bytes_written;;
LARGE_INTEGER end_of_file_pos;
end_of_file_pos.QuadPart = m_filesize - padding_to_fill_block;
if (!::SetFileInformationByHandle(hFile, FileEndOfFileInfo, &end_of_file_pos, sizeof(end_of_file_pos)))
{
HRESULT hr = ::GetLastErrorMessage();
}
Well, I'm working on a project, in which I'm handling potentially big files, that I can't load into ram all at once, so I'm going to treat them like a CHS hard drive, and grab the data one 0x800 byte chunk at a time.
My problem is, I cannot find any functions in the WINAPI that allow me to read the data from a file I've opened with CreateFile, starting at an offset.
And yes, it must be a WINAPI function, and no, I do not want to map the whole file into memory.
Thanks much, Bradley.
Use ReadFile with SetFilePointer