What is a good data type for representing arbitrary binary data? - mercury

I want to read binary data from disk and store it in a Mercury variable. According to the string library, strings don't allow embedded null bytes and are stored in UTF-8 encoding, so I don't think that will work. The best I've found so far is a line in the bitmap library that says, "Accessing bitmaps as if they are an array of eight bit bytes is especially efficient".
Are bitmaps a good way to store arbitrary binary data? Is there something better?

Yes, bitmaps are the recommended way to read/write/store binary data.

Related

How to read coordinates data from a BLOB?

There is coordinate data stored in a BLOB, similar to the ArcGIS ST_GEOMETRY type for points. The storage contains the byte stream of the point coordinates that define the geometry, like this:
How can I get the data from the BLOB in Oracle?
BLOBs are binary data. They could contain literally anything. Oracle has no built-in mechanism for extracting data from a BLOB. Your options are:
Whatever part of your application wrote the binary data should be responsible for unpacking and displaying the data.
Write some PL/SQL to retrieve the data, using UTL_RAW functions to work with the binary content. Doing this requires you to understand how the program that wrote the binary structured it.
This is why storing data in binary is usually a bad idea: sure, you save space, but it obfuscates the data and imposes a toll on every use. If storage is that much of an issue, consider compression instead.
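To make the first option concrete, here is a minimal sketch of client-side unpacking. The layout is purely an assumption for illustration (consecutive little-endian IEEE-754 double pairs), not the actual ST_GEOMETRY format; you would need to confirm the real layout with whoever wrote the data. Python's struct module is used here:

```python
import struct

def unpack_points(blob: bytes):
    """Decode a BLOB assumed to hold consecutive little-endian
    (x, y) double pairs. This layout is hypothetical; consult the
    writer of the data for the real format."""
    if len(blob) % 16 != 0:
        raise ValueError("blob length is not a multiple of 16 bytes")
    # '<' = little-endian, 'd' = 8-byte IEEE-754 double
    count = len(blob) // 16
    flat = struct.unpack("<" + "dd" * count, blob)
    return [(flat[i], flat[i + 1]) for i in range(0, len(flat), 2)]

# Example: two points packed the same way
blob = struct.pack("<dddd", 1.5, 2.5, 3.0, 4.0)
print(unpack_points(blob))  # [(1.5, 2.5), (3.0, 4.0)]
```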

What is the best way to read and write purely binary information in Go?

I want to implement Huffman coding by hand for a personal project. The part I am stuck on is how to store the coding. Say my input can be encoded using 65 bits. Do I make a class which wraps a byte slice of 9 bytes, and treats the elements as one continuous piece of memory? Or is there a way to do what I want more directly?
You could probably use a bit-array data structure for this. Check out https://godoc.org/github.com/golang-collections/go-datastructures/bitarray
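If you'd rather not pull in a dependency, packing bits into a byte buffer yourself is only a few lines. A minimal sketch (in Python for brevity; a Go version wrapping a `[]byte` is structurally identical), showing how 65 bits land in the 9 bytes mentioned in the question:

```python
class BitArray:
    """Fixed-size bit array backed by a bytearray.
    65 bits occupy ceil(65/8) = 9 bytes, as in the question."""
    def __init__(self, nbits: int):
        self.nbits = nbits
        self.data = bytearray((nbits + 7) // 8)

    def set(self, i: int, value: bool) -> None:
        byte, bit = divmod(i, 8)
        if value:
            self.data[byte] |= 1 << bit
        else:
            self.data[byte] &= ~(1 << bit)

    def get(self, i: int) -> bool:
        byte, bit = divmod(i, 8)
        return bool(self.data[byte] >> bit & 1)

bits = BitArray(65)
bits.set(0, True)
bits.set(64, True)
print(len(bits.data), bits.get(64))  # 9 True
```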

Algorithm to compress a lot of small strings?

I am looking for an algorithm to compress small ASCII strings. They contain lots of letters, but they can also contain numbers and, rarely, special characters. They will be small, about 50-100 bytes on average, 250 max.
Examples:
Android show EditText.setError() above the EditText and not below it
ImageView CENTER_CROP dont work
Prevent an app to show on recent application list on android kitkat 4.4.2
Image can't save validable in android
Android 4.4 SMS - Not receiving sentIntents
Imported android-map-extensions version 2.0 now my R.java file is missing
GCM registering but not receiving messages on pre 4.0.4. devices
I want to compress the titles one by one, not many titles together and I don't care much about CPU and memory usage.
You can use Huffman coding with a shared Huffman tree among all texts you want to compress.
You would typically construct a separate Huffman tree for each string to be compressed, but that would add a lot of storage overhead, which should be avoided here. That is also the major problem with using a standard compression scheme in your case: most of them have some overhead that kills your compression efficiency for very short strings, and those that don't have a (big) overhead are typically less efficient in general.
When constructing a Huffman tree that is later used for compression and decompression, you would normally use the texts to be compressed to decide which character is encoded with which bits. Since in your case the texts to be compressed seem to be unknown in advance, you need some "pseudo" texts to build the tree, perhaps from a dictionary of the target language or from previous user data.
Then construct the Huffman tree and store it once in your application; either hardcode it into the binary or provide it in the form of a file. Then you can compress and decompress any texts using this tree. Whenever you decide to change the tree since you gain better experience on which texts are compressed, the compressed string representation also changes. It might be a good idea to introduce versioning and store the tree version together with each string you compress.
Another improvement to consider is multi-character Huffman encoding: instead of compressing the texts character by character, find frequent syllables or words and put them into the tree too, so they require even fewer bits in the compressed string. This requires a somewhat more complicated compression algorithm, but it may well be worth the effort.
To process a string of bits in the compression and decompression routine in C++(*), I recommend either boost::dynamic_bitset or std::vector<bool>. Both internally pack multiple bits into bytes.
(*) The question once had the c++ tag, so the OP evidently wanted to implement it in C++. As the general problem is not specific to a programming language, the tag was removed, but I have kept the C++-specific part of the answer.
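To make the shared-tree scheme concrete, here is a minimal sketch in Python. The sample corpus is a placeholder; in practice you would train the table on a representative set of titles, ship it with the application, and reuse it for every string:

```python
import heapq
from collections import Counter

def build_code_table(sample_text: str) -> dict:
    """Build one Huffman code table from representative 'pseudo' texts;
    this single table is then shared by all compressed strings."""
    freq = Counter(sample_text)
    # Heap entries: (weight, tiebreaker, {char: code_so_far}).
    heap = [(w, i, {ch: ""}) for i, (ch, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        w1, _, t1 = heapq.heappop(heap)
        w2, _, t2 = heapq.heappop(heap)
        # Merging two subtrees prepends one more bit to every code.
        merged = {c: "0" + code for c, code in t1.items()}
        merged.update({c: "1" + code for c, code in t2.items()})
        heapq.heappush(heap, (w1 + w2, tie, merged))
        tie += 1
    return heap[0][2]

def compress(text: str, table: dict) -> str:
    return "".join(table[ch] for ch in text)  # bit-string for clarity

def decompress(bits: str, table: dict) -> str:
    rev = {code: ch for ch, code in table.items()}
    out, cur = [], ""
    for b in bits:          # Huffman codes are prefix-free, so greedy
        cur += b            # matching decodes unambiguously
        if cur in rev:
            out.append(rev[cur])
            cur = ""
    return "".join(out)

table = build_code_table("Android ImageView EditText not receiving messages")
bits = compress("not receiving", table)
assert decompress(bits, table) == "not receiving"
```

A production version would pack the bit-string into bytes (see the boost::dynamic_bitset note above) and store a tree version alongside each compressed string.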

Text Compression - What algorithm to use

I need to compress some text data of the form
[70,165,531,0|70,166,562|"hi",167,578|70,171,593|71,179,593|73,188,609|"a",1,3|
The data contains a few thousand characters(10000 - 50000 approx).
I read upon the various compression algorithms, but cannot decide which one to use here.
The important thing here is: the compressed string should contain only alphanumeric characters (or a few special characters like +-/&%#$). Most algorithms produce arbitrary binary bytes as compressed data, right? That must be avoided.
Can someone guide me on how to proceed here?
P.S. The text contains numbers, ' and the | character predominantly. Other characters occur very rarely.
Actually, your requirement to limit the output character set to printable characters automatically costs you about 25% of your compression gain, as out of 8 bits per byte you'll end up using roughly 6.
But if that's what you really want, you can always Base64-encode (or, more space-efficiently, Base85-encode) the output to convert the raw byte stream back to printable characters.
Regarding the compression algorithm itself, stick to one of the better known ones like gzip or bzip2, for both well tested open source code exists.
Selecting "the best" algorithm is actually not that easy; here's an excerpt of the list of questions you have to ask yourself:
do I need the best speed on the encoding or decoding side (e.g. bzip2 is quite asymmetric)?
how important is memory efficiency, both for the encoder and the decoder? This can matter for embedded applications.
is the size of the code important (again, for embedded use)?
do I want pre-existing, well-tested code for the encoder, the decoder, or both, in C only or also in other languages?
and so on
The bottom line here is probably, take a representative sample of your data and run some tests with a couple of existing algorithms, and benchmark them on the criteria that are important for your use case.
Just one thought: You can solve your two problems independently. Use whatever algorithm gives you the best compression (just try out a few on your kind of data. bz2, zip, rar -- whatever you like, and check the size), and then to get rid of the "gibberish ascii" (that's actually just bytes there...), you can encode your compressed data with Base64.
If you really put much thought into it, you might find a better algorithm for your specific problem, since you only use a few different chars, but if you stumble upon one, I think it's worth a try.
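A quick sketch of that two-step approach (Python here; zlib stands in for whichever compressor wins your benchmark):

```python
import base64
import zlib

def pack(text: str) -> str:
    """Compress, then Base64-encode so the result stays printable."""
    return base64.b64encode(zlib.compress(text.encode("utf-8"))).decode("ascii")

def unpack(blob: str) -> str:
    return zlib.decompress(base64.b64decode(blob)).decode("utf-8")

# Repetitive pipe-delimited data like the question's compresses well.
data = '[70,165,531,0|70,166,562|"hi",167,578|70,171,593|' * 100
packed = pack(data)
assert unpack(packed) == data
print(len(data), len(packed))  # Base64 adds ~33% on top of the compressed size
```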

self-encoded QR barcode?

I was wondering: is it possible to create a QR code in some file format, say PNG, then encode that PNG as a QR code, such that the resulting QR code is the same one you started with?
I don't think so. Each QR code needs to encode the original data along with variable amounts of redundancy.
So to encode the original QR code, you would need to encode the same amount of information plus the additional redundancy, which means the result can't be the same, since it encodes more information.
There are different sizes of QR codes, ranging from 21x21 to 177x177 modules. They can hold anywhere from 152 to 23,648 data bits. Unfortunately, even using 1 bit per "pixel", the amount of data a code can hold never reaches the number of bits required to store it.
There are sizes, though, for which it is not far off. I imagine some simple compression algorithm, or maybe even ignoring common parts like the calibration areas, could get to a point where you can store some representation of a code in itself. It seems feasible that you could find a way to store a QR code of some size as a QR code of the same size.
The problem then is constructing a code which encodes itself. With different error-correction options there is room to fudge a few pixels, which improves the odds that such a thing is possible, but it would still take a fair bit of magic. Perhaps some sort of genetic algorithm could do better than brute force, but you may need to read the full spec and build one cleverly by hand. The search space is pretty big.
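The size comparison above can be checked with a little arithmetic: a version-v code is (17 + 4v) modules on a side, while the two capacity figures (152 data bits for version 1, 23,648 for version 40, both at the lowest error-correction level) come from the QR spec's capacity tables:

```python
# Modules per side for QR version v (v = 1..40): 17 + 4*v.
def modules(version: int) -> int:
    side = 17 + 4 * version
    return side * side

# Data capacities at the lowest error-correction level (QR spec):
# version 1 holds 152 data bits, version 40 holds 23,648.
print(modules(1), modules(40))   # 441 31329
assert modules(1) > 152          # 441 pixels vs 152 storable bits
assert modules(40) > 23648       # 31329 pixels vs 23648 storable bits
# Even at 1 bit per module, capacity never reaches the pixel count.
```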
As freespace mentioned, it's not possible to encode an image in that same image itself, for several reasons.
I have created a QR code which contains a URL which (again) points to the original image:
http://qr.ai/qqq
I really think that's the closest you can get.
A QR code can contain at most 7,089 numeric characters, 4,296 alphanumeric characters, or 2,953 bytes of binary data (version 40, lowest error-correction level). 2,953 bytes is enough to store a small image (like a small QR code).
The only issue here is that most QR readers expect QR codes to contain text, not image data.
