error when accessing data from boost mapped_region

I am having some trouble accessing mapped_region data.
First, I define a struct (for stock quotes...):
struct bar {
    double open, high, low, close;
    size_t volume;
    bar(double _open, double _high, double _low, double _close, size_t _volume)
        : open(_open), high(_high), low(_low), close(_close), volume(_volume) {}
};
Here is the sample.txt file (I've also tried a binary format) that I want to walk through as an array of bar:
89.26 89.47 89.25 89.47 563
89.47 89.56 89.27 89.47 284
89.46 89.56 89.26 89.33 264
Using the following code, I can read the file character by character:
file_mapping m_file(filename, read_only);
mapped_region region(m_file, read_only);
const char* add = static_cast<const char*>(region.get_address());
That is, for the first value I get the characters 8 9 . 2 6 one by one via add[i], which is a terrible amount of work.
So I want to convert the pointer instead:
bar* myaddr = (bar*)(region.get_address());
where bar is defined as above, so that I can access the data with:
myaddr->open (plus an offset).
For instance, to read the 3rd number in the second line, I would just use:
(myaddr+1)->high
However, the results are really weird:
e.g. 1.50656e-189, or sometimes 825303072 for (myaddr+2)->volume.
In fact, casting to any type other than char gives this kind of garbage...
Question: how can I access the mapped data through myaddr-> without these errors?
Thanks

That looks like a text file. If you read it as a memory mapped region you get text, not doubles. That is your problem.
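One way to deal with the text format is to parse the mapped bytes with a stream instead of casting them. A minimal sketch, not part of the original answer: it assumes the bar struct above (with the five-argument constructor), and load_bars is a made-up name.

#include <boost/interprocess/file_mapping.hpp>
#include <boost/interprocess/mapped_region.hpp>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

using namespace boost::interprocess;

std::vector<bar> load_bars(const char* filename)
{
    file_mapping m_file(filename, read_only);
    mapped_region region(m_file, read_only);

    // Wrap the mapped bytes in a stream and parse the five columns per row.
    const char* text = static_cast<const char*>(region.get_address());
    std::istringstream in(std::string(text, region.get_size()));

    std::vector<bar> bars;
    double o, h, l, c;
    std::size_t v;
    while (in >> o >> h >> l >> c >> v)
        bars.push_back(bar(o, h, l, c, v));
    return bars;
}

The pointer cast from the question can only work if the file really contains raw bar records written as binary (e.g. with ostream::write on a bar), and even then the struct's padding and alignment have to match exactly between writer and reader.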

Related

Gdb printing long values in hex without having to guess the length

How do I print long values in gdb?
When using just x, i.e. x $rdi, the value (in hex) is cut off after 8 bytes.
If I use x/32bx (or whatever other length), the bytes are separated by spaces, which is not nice, but okay. The problem is that when there's some long value I want to print, I have to guess its size and pass it to x/. If that value is 256 bytes long the output looks messy because it's separated by spaces, and it also means I have to make a lot of guesses and then look through a long, ugly string of bytes to find the place where the value ends and the 0x00s begin (and obviously the value can have 0x00s in between, which makes working this out even more confusing) just to know how long it is.
If I try to print it as an integer, it gets cut off as well. I'd like to be able to easily tell how long a value is and not have it be cut off.
A way to display long (as well as long long and unsigned long long) values is with the g (giant word, 8 bytes) size modifier. For example if we have a program which just stores
unsigned long long int a = 1234567891234567898;
int b = 23;
unsigned long long int c = 1111111111111111111;
by typing x/20xg $rsp (after the values have been moved to the stack) we get
0x7fffffffdd90: 0x0000000000000000 0x0000001755555040
0x7fffffffdda0: 0x112210f4c023b6da 0x0f6b75ab2bc471c7
0x7fffffffddb0: 0x0000000000000000 0x00007ffff7e08b25
...
With the long numbers in [rsp+0x10] & [rsp+0x18] being a & c respectively, and that 0x17 in [rsp+0xc] being b.
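If the binary was built with debug info, named variables can also be printed in hex directly, without guessing stack offsets. A small usage note, not part of the original answer; the values are the ones from the dump above:
p/x a
x/1gx &c
These print 0x112210f4c023b6da and 0x0f6b75ab2bc471c7 respectively.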

How can I see the actual binary content of a VB6 Double variable?

I have hunted about quite a bit but can't find a way to get at the Hexadecimal or Binary representation of the content of a Double variable in VB6. (Are Double variables held in IEEE754 format?)
The provided Hex(x) function is no good because it integerizes its input first.
So if I want to see the exact bit pattern produced by Atn(1), Hex(Atn(1)) does NOT produce it.
I'm trying to build a mathematical function containing If clauses. I want to be able to see that the values returned on either side of these boundaries are, as closely as possible, in line.
Any suggestions?
Yes, VB6 uses standard IEEE format for Double. One way to get what you want without resorting to memcpy() tricks is to use two UDTs. The first would contain one Double, the second a static array of 8 Byte. LSet the one containing the Double into the one containing the Byte array. Then you can examine each Byte from the Double one by one.
If you need to see code let us know.
[edit]
At the module level:
Private byte_result() As Byte

Private Type double_t
    dbl As Double
End Type

Private Type bytes_t
    byts(1 To 8) As Byte
End Type
Then:
Function DoubleToBytes(aDouble As Double) As Byte()
    Dim d As double_t
    Dim b As bytes_t

    d.dbl = aDouble
    LSet b = d    'copy the Double's raw bytes into the Byte array UDT
    DoubleToBytes = b.byts
End Function
To use it:
Dim Indx As Long

byte_result = DoubleToBytes(12345.6789#)
For Indx = 1 To 8
    'bytes come out least-significant first (little-endian)
    Debug.Print Hex$(byte_result(Indx)),
Next
This is air code but it should give you the idea.

how bytes are used to store information in protobuf

I am trying to understand protocol buffers. Here is the sample; what I am not able to understand is how bytes are used in the following messages. I don't know what the numbers 1, 2, 3 are used for.
message Point {
  required int32 x = 1;
  required int32 y = 2;
  optional string label = 3;
}

message Line {
  required Point start = 1;
  required Point end = 2;
  optional string label = 3;
}

message Polyline {
  repeated Point point = 1;
  optional string label = 2;
}
I read the following paragraph in the Google protobuf documentation but am not able to understand what is being said here. Can anyone help me understand how bytes are used to store the info?
The " = 1", " = 2" markers on each element identify the unique "tag" that field uses in the binary encoding. Tag numbers 1-15 require one less byte to encode than higher numbers, so as an optimization you can decide to use those tags for the commonly used or repeated elements, leaving tags 16 and higher for less-commonly used optional element.
The general form of a protobuf message is that it is a sequence of pairs of the form:
field header
payload
For your question, we can largely forget about the payload - that isn't the bit that relates to the 1/2/3 and the <=16 restriction - all of that is in the field header. The field header is a "varint" encoded integer; "varint" uses the most-significant-bit as an optional continuation bit, so small values (<=127, assuming unsigned and not zig-zag) require one byte to encode - larger values require multiple bytes. Or in other words, you get 7 useful bits to play with before you need to set the continuation bit, requiring at least 2 bytes.
However! The field header itself is composed of two things:
the wire-type
the field-number / "tag"
The wire-type is the low 3 bits of the header, and indicates the fundamental format of the payload - "length-delimited", "64-bit", "32-bit", "varint", "start-group", "end-group". That means that of the 7 useful bits we had, only 4 are left; 4 bits is enough to encode the field numbers 1 through 15. This is why low field-numbers (1-15) are suggested (as an optimisation) for your most common elements.
In your question, the 1 / 2 / 3 is the field-number; at the time of encoding this is left-shifted by 3 and composed with the payload's wire-type; then this composed value is varint-encoded.
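As an illustration of that rule, here is a minimal sketch (not protobuf library code; make_key is a made-up name):

#include <cstdint>
#include <cstdio>

// Wire types as defined by the protobuf encoding.
enum WireType { VARINT = 0, FIXED64 = 1, LENGTH_DELIMITED = 2, FIXED32 = 5 };

// The field header ("key") is (field_number << 3) | wire_type, then varint-encoded.
std::uint32_t make_key(std::uint32_t field_number, WireType wt)
{
    return (field_number << 3) | wt;
}

int main()
{
    // required int32 x = 1      -> key (1 << 3) | 0 = 0x08: one varint byte
    // optional string label = 3 -> key (3 << 3) | 2 = 0x1a: one varint byte
    // a field number of 16      -> key (16 << 3) | 0 = 0x80: no longer fits in
    //                              7 bits, so the varint needs two bytes (0x80 0x01)
    std::printf("%#x %#x %#x\n",
                make_key(1, VARINT),
                make_key(3, LENGTH_DELIMITED),
                make_key(16, VARINT));
}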
Protobuf stores the messages like a map from an id (the =1, =2, which they call tags) to the actual value. This makes it easier to extend than if it transferred data like a struct with fixed offsets. So a message Point, for instance, would look something like this at a high level:
1 -> 100,
2 -> 500
Which is then interpreted as x=100, y=500 and label not set. At a lower level, protobuf serializes this tag-value mapping in a highly compact format which, among other things, stores integers with variable-length encoding. The paragraph you quoted highlights exactly this in the case of tags, which can be stored more compactly if they are < 16, but the same holds, for instance, for integer values in your protobuf definition.
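For concreteness (my own worked example of the standard wire format, not taken from the answer), a Point with x=100, y=500 and no label serializes to just five bytes:
08 64        field 1 (x), wire type 0 (varint); varint(100) = 0x64
10 f4 03     field 2 (y), wire type 0 (varint); varint(500) = 0xf4 0x03, low 7 bits first with the continuation bit set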

Boost Array - conversion to BYTE

So I have this: boost::array data_;
How do I convert it to a normal BYTE/char buffer, or how do I print the data inside without converting it, using printf?
How can I compare it with another normal character buffer, for example "hello"?
It would also be very helpful to know how boost::array works (I am creating a Boost async TCP server).
I have tried some things but I was unable to print the characters inside the buffer; I'm new to Boost.
I could not find much documentation about Boost.
Thank you.
The boost::array class is a parameterized type, meaning that the full type name of a variable of this type is something like boost::array<char,10> for an array containing 10 elements of type char, or boost::array<float,100> for an array containing 100 elements of type float.
If you happen to have a variable data_ of some type boost::array<T,N> where T is char, then printing out the characters in it is easy:
std::cout.write(data_.data(), data_.size());
If T is wchar_t, you could do
std::wcout.write(data_.data(), data_.size());
If your particular boost::array type contains some other element type T, you need to consider how you would want to print out the elements. For example, if you're happy with the default stream representation of the type, you may do something like
for (auto element : data_) {
    std::cout << element << "\n";
}
to print out one element per line.
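The question also asks about printf and about comparing with a plain character buffer such as "hello". A minimal sketch of my own (not from the original answer), assuming element type char, an assumed size of 128, and that len holds the number of valid bytes (e.g. bytes_transferred from a completed async read):

#include <boost/array.hpp>
#include <cstddef>
#include <cstdio>
#include <cstring>
#include <iostream>

int main()
{
    boost::array<char, 128> data_ = {};       // assumed size; element type char
    std::size_t len = 5;                      // e.g. bytes_transferred from an async read
    std::memcpy(data_.data(), "hello", len);  // pretend these bytes arrived over the socket

    // Compare only the bytes actually received against a plain C string.
    if (len == 5 && std::memcmp(data_.data(), "hello", len) == 0)
        std::cout << "got hello\n";

    // Print the received bytes with printf, as asked; %.*s limits the length.
    std::printf("%.*s\n", static_cast<int>(len), data_.data());
}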
You can find the documentation of the boost::array class at http://www.boost.org/doc/libs/1_53_0/doc/html/boost/array.html

Last byte in Huffman compression

I am wondering what the best way is to handle the last byte in Huffman compression. I have some nice code in C++ that can compress text files very well, but currently I must also write the number of coded chars into the coded file (it equals the input file size), because I have no idea how to handle the last byte better.
For example, the last char to compress is 'a', whose code is 011, and I am just starting a new byte, so the last byte will look like:
011 + 5 bits of trash, which I set to zeros, for example.
And when I am decoding this coded file, it may happen that 00000 (or fewer zeros) is the code for some char, so I will get a trash char at the end of my decoded file.
As I wrote in the first paragraph, I avoid this by saving the number of chars of the input file in the coded file, and while decoding I read the coded file only up to that count (not to end-of-file, so I never reach those example 5 zeros).
It's not really efficient: the size of the coded file grows by that extra number.
How can I handle this in better way?
Your approach (writing the number of encoded bytes to the file) is a perfectly reasonable one. If you want to try a different avenue, you could consider inventing a new "pseudo-EOF" character that marks the end of the input (I'll denote it as □). Whenever you want to compress a string s, you instead compress the string s□. This means that when you build up your encoding tree, you would include one copy of the □ character so that you have a unique encoding for □. Then, when you write out the string to the file, you would write out the bit patterns for the characters of the string as normal, then write out the bit pattern for □. If there are leftover bits, you can just leave them set arbitrarily.
The advantage to this approach is that as you decode the file, if at any point you find the □ character, you can immediately stop decoding bits because you know that you have hit the end of the file. This does not require you to store the number of bytes that were written out anywhere - the encoding implicitly marks its own endpoint.
The disadvantage to this setup is that it might increase the length of the bit patterns used by certain characters, since you will need to assign a bit pattern to □ in addition to all the other characters.
I teach an introductory programming course and we use Huffman encoding as one of our assignments. We have students use the above approach, since it's a bit easier than having to write out the number of bits or bytes before the file contents. For more details, you could take a look at this handout or these lecture slides from the course.
Hope this helps!
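A decoding loop for the pseudo-EOF approach might look roughly like this. It is a minimal sketch under my own assumptions (a BitReader helper, a Node tree and EOF_SYMBOL are all made up here, not from the answer or the linked course material):

#include <iostream>

// Hypothetical MSB-first bit source over an input stream.
struct BitReader {
    explicit BitReader(std::istream& s) : in(s), byte(0), pos(-1) {}
    // Returns the next bit (0 or 1), or -1 when the stream is exhausted.
    int next() {
        if (pos < 0) {
            byte = in.get();
            if (byte == std::istream::traits_type::eof())
                return -1;
            pos = 7;
        }
        return (byte >> pos--) & 1;
    }
private:
    std::istream& in;
    int byte, pos;
};

// A leaf carries a symbol: 0-255 for real bytes, EOF_SYMBOL for the pseudo-EOF.
const int EOF_SYMBOL = 256;
struct Node {
    Node* left;    // followed on a 0 bit
    Node* right;   // followed on a 1 bit
    int symbol;
};

// Walk the tree bit by bit and stop as soon as the pseudo-EOF is decoded,
// so any padding bits in the final byte are simply never looked at.
void decode(BitReader& bits, const Node* root, std::ostream& out)
{
    const Node* cur = root;
    int bit;
    while ((bit = bits.next()) != -1) {
        cur = (bit == 0) ? cur->left : cur->right;
        if (!cur->left && !cur->right) {            // reached a leaf
            if (cur->symbol == EOF_SYMBOL)
                return;                             // end of the encoded data
            out.put(static_cast<char>(cur->symbol));
            cur = root;                             // restart for the next code
        }
    }
}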
I know this is an old question, but still, there's an alternative, so it might help someone.
When you're writing your compressed file to output, you probably have some integer keeping track of where you are in the current byte (for bit shifting).
// infile/outfile are the input and output streams; GetTraversal(c) returns
// the Huffman code of c as a string of '0'/'1' characters.
char c, p;
p = '\0';
int curr = 7;                       // bit position in the byte being built (7 = MSB)
while (infile.get(c))
{
    std::string trav = GetTraversal(c);
    for (std::size_t i = 0; i < trav.size(); i++)
    {
        if (trav[i] == '1')
            p |= (1 << curr);       // set the bit at the current position
        if (--curr < 0)             // byte full: flush it and start a new one
        {
            outfile.put(p);
            p = '\0';
            curr = 7;
        }
    }
}
if (curr < 7)                       // flush the final, partially filled byte
    outfile.put(p);
At the end of this block, (curr+1)%8 equals the number of trash bits in the last data byte. You can then store it at the end as a single extra byte, and just keep it in mind when you're decompressing.
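Putting that together, a minimal sketch of the bookkeeping described above, using the same made-up infile/outfile names (not from the original answer):

// After the loop above, append one byte holding the number of padding bits.
char padding = static_cast<char>((curr + 1) % 8);
outfile.put(padding);

// When decompressing, read that last byte first to learn how many bits of
// the preceding byte are trash, then stop that many bits early.
infile.seekg(-1, std::ios::end);
char pad_byte;
infile.get(pad_byte);
std::streamoff data_bytes = static_cast<std::streamoff>(infile.tellg()) - 1;
long long total_bits = data_bytes * 8LL - pad_byte;
infile.seekg(0, std::ios::beg);
// ...now decode exactly total_bits bits from the first data_bytes bytes...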
