XChangeProperty for an atom property on a system where Atom is 64 bits - x11

The X11 protocol defines an atom as a 32-bit integer, but on my system, the Atom type is a typedef for unsigned long, which is a 64-bit integer. The manual for Xlib says that property types have a maximum size of 32 bits. There seems to be some conflict here. I can think of three possible solutions.
1. If Xlib treats properties of type XA_ATOM as a special case, then you can simply pass 32 for 'format' and an array of atoms for 'data'. This seems unclean and hackish, and I highly doubt that this is correct.
2. The manual for Xlib appears to be ancient. Since Atom is 64 bits long on my system, should I pass 64 for the 'format' parameter even though 64 is not listed as an allowed value?
3. Rather than an array of Atoms, should I pass an array of uint32_t values for the 'data' parameter? This seems like it would most likely be the correct solution to me, but this is not what they did in some sources I've looked up that use XChangeProperty, such as SDL.
SDL appears to use solution 1 when setting the _NET_WM_WINDOW_TYPE property, but I suspect that this may be a bug. On systems with little endian byte order (LSB first), this would appear to work if the property has only one element.
Has anyone else encountered this problem? Any help is appreciated.

For the property routines you always want to pass an array of 'long', 'short' or 'char'. This is always true independent of the actual bit width. So, even if your long or atom is 64 bits, it will be translated to 32 bits behind the scenes.
The format is the number of server side bits used, not client side. So, for format 8, you must pass a char array, for format 16, you always use a short array and for format 32 you always use a long array. This is completely independent of the actual lengths of short or long on a given machine. 32 bit values such as Atom or Window always are in a 'long'.
This may seem odd, but there is a good reason for it: the C standard does not guarantee that types exist with exactly the same widths as on the server. For instance, a machine may have no native 16-bit type. However, a 'short' is guaranteed to have at least 16 bits and a 'long' is guaranteed to have at least 32 bits. So by defining the client API in terms of 'short' and 'long', you can both write portable code and always have room for the full X id in the C type.
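To make the rule concrete, here is a minimal sketch in C. The helper name client_element_size is made up for illustration and is not part of Xlib; it only restates the mapping described above. The XChangeProperty call is shown in a comment so that the snippet compiles without the X11 headers; dpy, win, prop and the atom values in it are placeholders.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not an Xlib function): size in bytes of one
 * client-side array element for a given property format.
 * 8 -> char, 16 -> short, 32 -> long, however wide those types
 * happen to be on this machine. */
size_t client_element_size(int format)
{
    assert(format == 8 || format == 16 || format == 32);
    switch (format) {
    case 8:  return sizeof(char);
    case 16: return sizeof(short);
    default: return sizeof(long);
    }
}

/* With Xlib available, an atom-list property would then be set roughly
 * like this (all names are placeholders):
 *
 *     Atom atoms[2] = { first_atom, second_atom };   // Atom is unsigned long
 *     XChangeProperty(dpy, win, prop, XA_ATOM, 32, PropModeReplace,
 *                     (unsigned char *)atoms, 2);
 *
 * The array holds longs even where long is 64 bits wide; format 32
 * describes the server-side width only.
 */
```

In other words, this corresponds to solution 1 from the question: pass 32 for 'format' and an array of Atom (long-sized) values for 'data'.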


Why can't ruby use most of the 2^X numbers as object ids?

ObjectSpace._id2ref gives us the object with a given ID from Ruby's object space. I assumed it has an object for every ID in sequence, starting from 0. However, if we try to look up the object with ID 4, it raises an error:
2.6.3 :121 > ObjectSpace._id2ref(4)
Traceback (most recent call last):
2: from (irb):121
1: from (irb):121:in `_id2ref'
RangeError (0x0000000000000004 is not id value)
I also found that the behaviour is the same for the other 2^x values (except 1, 2 and 8).
(0..10).each do |exp|
  object_id = 2**exp
  begin
    puts "Number: #{object_id} : #{ObjectSpace._id2ref(object_id)}"
  rescue Exception => e
    puts "Number: #{object_id} : #{e.message}"
  end
end
Number: 1 : 0
Number: 2 : 2.0
Number: 4 : 0x0000000000000004 is not id value
Number: 8 : nil
Number: 16 : 0x0000000000000010 is not id value
Number: 32 : 0x0000000000000020 is not id value
Number: 64 : 0x0000000000000040 is not id value
Number: 128 : 0x0000000000000080 is not symbol id value
Number: 256 : 0x0000000000000100 is not id value
Number: 512 : 0x0000000000000200 is not id value
Number: 1024 : 0x0000000000000400 is not id value
Why can't Ruby use these specific numbers as object IDs?
Also, what's different about 1, 2 and 8? And why is the error different for 128?
First, it is very important to make a couple of things crystal clear:
There are exactly two guarantees Ruby makes about object IDs. These two guarantees are the only thing you are allowed to rely on. You must not make any assumptions about object IDs other than these two guarantees:
An object has the same ID for its entire lifetime.
No two objects have the same ID at the same time.
[Note: this means in particular that different objects can have the same ID at different times, i.e. that IDs can be recycled.]
An object ID is an opaque identifier. You must not make any assumptions about its structure or about any particular value.
Any particular implementation of object IDs is a private internal implementation detail of a specific version of a specific implementation running in a specific environment at a specific moment. There is no guarantee that the results will be the same with a different implementation. There is no guarantee that the results will be the same with a different version of the same implementation. There is no guarantee that the results will be the same with the same version of the same implementation running in a different environment. In fact, there is not even a guarantee that the results will be the same between two runs of the same code on the same version of the same implementation in the same environment.
ObjectSpace::_id2ref is an abomination. It should not even exist. It most certainly should not be used. It breaks object-orientation, it breaks encapsulation, it breaks safety.
Just as an example: unfortunately, you don't say which version of which implementation you are running in which environment. However, it looks like you are running YARV 2.6.3 in a 64-bit environment.
If you were to run that exact same code on the exact same version of YARV in a 32-bit environment, you would get different results. If you were to run that exact same code on an older version of YARV (pre-2.0) in the exact same environment, you would get different results.
Let's address the first, implicit, assumption which I think I see in your question. You seem to think that any ID should resolve to an object. It's easy to see that this cannot be true: there are infinitely many IDs, but for every run of a program, there are only finitely many objects, so there will always be infinitely many IDs which don't resolve to an object.
This already explains most of your results, namely the ones for 4, 16, 32, 64, 256, 512, and 1024.
So, with that out of the way, here's a high-level explanation of why there seems to be some sort of structure to the IDs, and what that structure is. (But let me remind you again that this explanation only applies to 64-bit systems, not to 32-bit ones, it only applies to YARV, it only applies to versions of YARV 2.0 or newer, and it is quite possible that it will no longer apply to YARV 3.0.)
In YARV, the developers made the decision that the object ID is the same thing as the memory address of the object header. This makes it easy to ensure the "rules" of object IDs: you can't have multiple objects at the same memory address at the same time, and an object will not change its memory address.
(Actually, it turns out that the second one is already a quite severe restriction: many modern high-performance garbage collectors depend on being able to move objects around in memory. This is not possible if you assume that object ID == memory address. Which means you will not be able to use any of those high-performance algorithms.)
On pretty much all modern machines, memory access is word-aligned. While it is possible to address individual bytes, that is generally slower or more awkward. So, we can basically assume that if we allocate memory, it will be aligned on a word-boundary. Which means that all memory addresses will be divisible by 8 on 64-bit systems and 4 on 32-bit systems, or in other words, that all memory addresses will end in 3 (64-bit) or 2 (32-bit) zero bits. Or, in other words: 87.5% (75%) of the address space are unused.
On the other hand, it would be quite a waste to represent Integers as a full-blown Ruby object:
They are immutable, which means we don't have to store any state.
They can't have instance variables, which means we don't have to store an instance variable table.
They can't have a singleton class, which means we don't have to store a __klass__ pointer.
They can't be extended.
And so on …
What this means, is that we can optimize the representation of Integers by not storing them as objects at all. All we need is some special case in the engine, so that if someone asks for the class of, say, 42, instead of trying to look at 42's __klass__ pointer, the engine "magically" knows to just return the Integer class.
Once we have that in place, we can do a really cool trick, which is actually as old as the very first LISP and Smalltalk VMs, and it is called a tagged pointer representation. Normally, the value of a variable is a pointer to the object (header), but with a tagged pointer representation we can store the value of the object inside the pointer to the object itself!
All we need to do is to have some sort of tag on the pointer that tells the engine that this is actually not a pointer but a value disguised as a pointer. In some older machines, especially those specifically designed for running high-level languages, pointers did have a tag field specifically for holding, e.g. type information or access control. Modern machines don't have that, but we have those unused bits we can (ab)use as tag bits.
And that is what YARV is doing: When the last bit of a pointer is 1, then it's not actually a pointer, it's an Integer. In particular, an Integer is encoded in YARV by shifting it one bit to the left and setting the last bit to 1. This allows us to encode a 63-bit Integer in a 64-bit pointer, and do native integer arithmetic on it with no object overhead and only a little bit of bit shifting overhead.
And if you think about what this encoding means:
shifting one bit to the left is equivalent to multiplying by two
setting the last bit to 1 is equivalent to incrementing by 1
Then you can explain the first pattern: a small Integer with value n is encoded as the "quasi-pointer" 2n + 1, and since "memory address" and object ID are the same in YARV (even though this is not actually a memory address, because there is no object which could have an address), it will have the object ID 2n + 1.
Integers that don't fit into 63 bits (31 bits on 32-bit systems) are allocated as objects like any other object. The small, pointer-encoded integers have different names in different engines: in the Smalltalk-80 VM they are called SmallIntegers, and in YARV they are called Fixnums (the ones that don't fit into a Fixnum are called Bignums). In older versions of YARV, Fixnum and Bignum were separate subclasses of a fully abstract Integer class, but this was considered a mistake. (It's really an internal optimization and should not be visible to the programmer.) In current versions of YARV, Fixnum and Bignum are aliases for Integer and using them gives a deprecation warning.
This explains your result for 1. If you had tried out ObjectSpace._id2ref(3), the result would have been 1, then ObjectSpace._id2ref(5) would be 2, and so on.
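The 2n + 1 encoding described above can be sketched in a few lines of C. This is only an illustration of the arithmetic, not YARV's actual source; the function names are made up, and a 64-bit word is assumed.

```c
#include <assert.h>
#include <stdint.h>

/* Encode a small integer n as the tagged word 2n + 1.
 * n must fit into 63 bits. */
uint64_t fixnum_encode(int64_t n)
{
    assert(n >= -(INT64_C(1) << 62) && n < (INT64_C(1) << 62));
    return ((uint64_t)n << 1) | 1u;
}

/* A word whose lowest bit is 1 is an encoded integer, not a pointer. */
int is_fixnum(uint64_t word)
{
    return (word & 1u) != 0;
}

/* Decode by undoing the encoding: (word - 1) / 2.
 * The conversion to int64_t is implementation-defined for words with
 * the top bit set; mainstream compilers reinterpret the bits as
 * two's complement, which is what this sketch assumes. */
int64_t fixnum_decode(uint64_t word)
{
    assert(is_fixnum(word));
    return ((int64_t)word - 1) / 2;
}
```

With these definitions, fixnum_decode(3) is 1 and fixnum_decode(5) is 2, matching the _id2ref results mentioned above, while a word such as 8 is not an encoded integer at all.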
And we still are using only 62.5% of the address space (on a 64-bit system)!
So, let's think about what else we might want to represent in this way.
YARV has a very similar optimization for Floats. Floating point numbers that fit into 62 bits are called flonums and are represented similarly, with a tag of 10 at the end. (YARV does not use flonums on 32-bit platforms.)
This explains your result for ObjectSpace._id2ref(2). If you had tried ObjectSpace._id2ref(6), the result would have been -2.0.
And a similar trick is also played for Symbols. I won't explain it here in detail, because a) I don't actually fully know how it works, and b) it is slightly more complex, because the value being encoded isn't directly the Symbol value, rather it is an index into the Symbol table. However, that explains your result for 128.
Now, lastly, there is a completely different part of the address space that is also unused: the low addresses. On most modern Operating Systems, the low addresses are reserved for mapping the kernel memory directly into the user process in order to speed up the user space ↔︎ kernel space transition. Plus, there is another reason the very low addresses are kept free: in C, it is illegal to dereference a NULL pointer. Now, one way of implementing this, would be for the runtime to track all pointer dereferences and check whether they are dereferencing the NULL pointer. But there is an easier way: just give the NULL pointer an actual memory address, but one that is never allocated. That way, you don't have to do anything: if the code tries to dereference the pointer, the address doesn't exist, and the MMU will take care of raising an error. So, most C compilers compile the NULL pointer to the actual memory address 0, and in order to make sure that there is never any real data allocated at that address, they keep a whole area around address 0 free.
This means that the low addresses are never used, and we can (ab)use them to represent even more "interesting" objects. Now, YARV uses the very low addresses to represent the following objects:
false at address 0, which has the additional advantage that 0 is considered false in C.
nil at address 8 (4 in 32-bit).
true at address 20 (2 in 32-bit).
Qundef (a special internal value inside the engine that denotes an undefined value) at address 52 (6 in 32-bit).
And that explains your number 8.
This also means that your 4, 16, 32, 64, 256, 512, and 1024 will probably never resolve to an object, because they are in the low address range where the C library will simply never allocate memory.
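Putting the pieces together, here is a rough C sketch of how a 64-bit word could be classified using only the rules given in this answer. It is not YARV's code: the enum and function names are invented, the constants 0, 8, 20 and 52 are the ones listed above, and Symbols are left out because their encoding is not described here.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical categories for this sketch only. */
enum word_kind {
    KIND_FALSE, KIND_NIL, KIND_TRUE, KIND_UNDEF,
    KIND_INTEGER, KIND_FLONUM, KIND_OTHER
};

enum word_kind classify_word(uint64_t word)
{
    if (word & 1u)         return KIND_INTEGER; /* ...1: encoded Integer */
    if ((word & 3u) == 2u) return KIND_FLONUM;  /* ..10: encoded Float   */
    switch (word) {
    case 0:  return KIND_FALSE;
    case 8:  return KIND_NIL;
    case 20: return KIND_TRUE;
    case 52: return KIND_UNDEF;
    default: return KIND_OTHER; /* a real object address, a Symbol, or nothing */
    }
}

/* A few of the IDs from the question, checked against the sketch. */
void classify_word_examples(void)
{
    assert(classify_word(1) == KIND_INTEGER);
    assert(classify_word(2) == KIND_FLONUM);
    assert(classify_word(8) == KIND_NIL);
    assert(classify_word(4) == KIND_OTHER);
}
```

Under these assumptions, 1, 2 and 8 fall into the special cases, while 4, 16, 32 and the other powers of two end up in the catch-all branch, where nothing happens to live.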
As a closing remark, I want to repeat one last time that all of this is a private internal implementation detail of a specific version of YARV running in a specific environment. You must not rely on any of this, ever.
When flonums were introduced in YARV, and on some platforms nil no longer had object ID 4, this did break some code, and it did cause some confusion (as evidenced e.g. by questions on Stack Overflow), even though the YARV developers are allowed to change object IDs at will, because there are no guarantees being made about any particular ID values or the structure of IDs. Please, do not make the same mistake.

maximum field number in protobuf message

The official documentation for protocol buffers, https://developers.google.com/protocol-buffers/docs/proto3, says the maximum field number for fields in a protobuf message is 2^29-1. But why does this limit exist?
Can anyone explain this in some detail? I am a newbie to this.
I read the answers to the question "why 2^29-1 is the biggest key in protocol buffers",
but they did not clear it up for me.
Each field in an encoded protocol buffer has a header (called key or tag) prefixed to the actual encoded value. The encoding spec defines this key:
Each key in the streamed message is a varint with the value (field_number << 3) | wire_type – in other words, the last three bits of the number store the wire type.
Here the spec says the tag is a varint whose lowest 3 bits encode the wire type. A varint can encode a 64-bit value, so going by this definition alone the limit would be 2^61-1.
In addition to this, the Language Guide narrows this down to a 32 bit value at max.
The smallest field number you can specify is 1, and the largest is 2^29 - 1, or 536,870,911.
The reasons for this are not given. I can only speculate for the reasons behind this:
Artificial limit as no one is expecting a message to have that many fields. Just think about fitting a message with that many fields into memory.
As the key is a varint, it isn't simply the next 4 bytes in the raw buffer, but rather a variable number of bytes (Java code reading a varint32). Each byte has 7 bits of actual data and 1 bit indicating whether the end has been reached. It could be that for performance reasons it was deemed better to limit the range.
Since proto3 is the 3rd version of protocol buffers, it could be that either proto1 or proto2 defined the tag to be a varint32. To keep backwards compatibility this limit is still true in proto3 today.
Because of this line:
#define GOOGLE_PROTOBUF_WIRE_FORMAT_MAKE_TAG(FIELD_NUMBER, TYPE) \
  static_cast<uint32>((static_cast<uint32>(FIELD_NUMBER) << 3) | (TYPE))
This line creates a "tag", which leaves only 29 (32 - 3) bits to store the field number.
I don't know why Google uses uint32 instead of uint64 though, since the field number is a varint. Maybe they think 2^29-1 fields is large enough for a single message declaration.
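The arithmetic behind that macro can be checked with a small C sketch. The names make_tag, tag_field_number, tag_wire_type and MAX_FIELD_NUMBER are made up for this example and are not protobuf's API; the only thing taken from above is the (field_number << 3) | wire_type layout in a 32-bit value.

```c
#include <assert.h>
#include <stdint.h>

/* 32 bits minus 3 wire-type bits leaves 29 bits: 2^29 - 1 = 536,870,911. */
#define MAX_FIELD_NUMBER ((UINT32_C(1) << 29) - 1)

/* Pack a field number and a 3-bit wire type into one 32-bit tag. */
uint32_t make_tag(uint32_t field_number, uint32_t wire_type)
{
    assert(field_number >= 1 && field_number <= MAX_FIELD_NUMBER);
    assert(wire_type <= 7);
    return (field_number << 3) | wire_type;
}

/* Recover the two parts again. */
uint32_t tag_field_number(uint32_t tag) { return tag >> 3; }
uint32_t tag_wire_type(uint32_t tag)    { return tag & 7u; }
```

The largest field number combined with the largest 3-bit wire type fills all 32 bits exactly, so any larger field number would no longer fit into a uint32 tag.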
I suspect this is simply so that a field-header (wire-type and tag-number) can be decoded and handled as a 32-bit value. The wire-type is always the 3 least significant bits, leaving 29 bits for the tag number. Technically "varint" should support 64 bits, but it makes sense to limit it to reasonable numbers, not least because "varint" encoding means that larger numbers take more bytes to encode.
Edit: I realise now that this is similar to the linked post, but... it remains true! Each field in protobuf is prefixed by a "varint" that expresses what field (tag-number) follows, and what data type it is (wire-type). The latter is important especially so that unexpected fields (version differences) can be stored or skipped correctly. It is convenient for that field-header to be trivially processed by most frameworks, and most frameworks are fine with 32-bit integers.
This is another question rather than a comment. The document says:
Field numbers in the range 16 through 2047 take two bytes. So you
should reserve the numbers 1 through 15 for very frequently occurring
message elements. Remember to leave some room for frequently occurring
elements that might be added in the future.
Because for the first byte, top 5 bits are used for field number, and bottom 3 bits for field type, isn't it that field number from 31 (because zero is not used) to 2047 take two bytes? (and I also guess the second bytes' lower 3 bits are used also for field type.. I'm in the middle of reading it, so I'll fix it when I know it)
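One way to check the byte counts is to encode the key as a varint and count the bytes. The sketch below is in C and follows the description given earlier on this page (7 data bits per byte plus 1 continuation bit, key = (field_number << 3) | wire_type); the function names are made up and this is not code from the protobuf library.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Write value as a base-128 varint into out (room for 10 bytes is
 * enough for any 64-bit value) and return the number of bytes used. */
size_t varint_encode(uint64_t value, uint8_t *out)
{
    size_t n = 0;
    assert(out != NULL);
    while (value >= 0x80) {
        out[n++] = (uint8_t)((value & 0x7F) | 0x80); /* 7 data bits + "more follows" */
        value >>= 7;
    }
    out[n++] = (uint8_t)value;                       /* last byte: top bit clear */
    return n;
}

/* Number of bytes the key for a given field number and wire type takes. */
size_t key_length(uint32_t field_number, uint32_t wire_type)
{
    uint8_t buf[10];
    return varint_encode(((uint64_t)field_number << 3) | wire_type, buf);
}
```

Under this scheme the first byte has only 7 usable bits, of which 3 go to the wire type, leaving 4 bits for the field number. That would be why the one-byte range ends at 15 rather than 31, and the two-byte range (14 usable bits, 11 for the field number) ends at 2047.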

Bit Shift Operator '<<' creates Extra 0xffff?

I am currently stuck with this simple bit-shifting problem. The problem is that when I assign a value to a short variable and shift it with << 8, I get an extra 0xffff (2 extra bytes) when I save the result to a 'short' variable. However, for 'long', it is OK. So I am wondering why this happens.
I mean, a short isn't supposed to hold more than 2 bytes, but it clearly shows that my short values contain 2 extra bytes with the value 0xffff.
I'm seeking your wisdom. :)
This image describes the problem. Clearly, when the sign bit (bit 15) of the 'short' is set to 1 after the bit shift operation, the 2 bytes ahead of it turn into 0xffff. This is demonstrated by 127 (0x7f) passing the test but 0x81 not passing it: when 0x81 is shifted, its top bit lands in bit 15 (the sign bit) and sets it to 1. Also, because 257 (0x101) doesn't set bit 15 after shifting, it turns out to be OK.
There are several problems with your code.
First, you are doing bit shift operations with signed variables, which may have unexpected results. Use unsigned short instead of short to do bit shifting, unless you are sure of what you are doing.
You are explicitly casting a short to unsigned short and then storing the result back to a variable of type short. Not sure what you are expecting to happen here, this is pointless and will prevent nothing.
The issue is related to that. 129 << 8 is 33024, a value too big to fit in a signed short. You are accidentally setting the sign bit, causing the number to become negative. You would see that if you printed it as %d instead of %x.
Because short is implicitly promoted to int when passed as a parameter to printf(), you see the 32-bit version of this negative number, which has its 16 most significant bits set accordingly. This is where the leading ffff comes from.
You don't have this problem with long because, even though it is a signed long, it is still large enough to store 33024 without touching the sign bit.
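Since the original code is only available as an image, here is a small C sketch that reproduces the effect under the usual assumptions (16-bit short, 32-bit int, two's complement). The function names are made up. Fixed-width types are used so that the widths are explicit; the conversion of an out-of-range value to int16_t is implementation-defined, and the sketch assumes the common wrap-around behaviour.

```c
#include <assert.h>
#include <stdint.h>

/* Shift, store in a signed 16-bit variable, then widen to 32 bits the
 * way a short is promoted to int when passed to printf(). */
uint32_t shifted_via_signed(unsigned value)
{
    int16_t s = (int16_t)(uint16_t)(value << 8); /* may set the sign bit */
    int32_t promoted = s;                        /* sign extension happens here */
    return (uint32_t)promoted;
}

/* Same shift, but stored in an unsigned 16-bit variable:
 * widening fills the upper bits with zeros instead. */
uint32_t shifted_via_unsigned(unsigned value)
{
    uint16_t u = (uint16_t)(value << 8);
    return (uint32_t)u;
}

/* The three values from the question. */
void shift_examples(void)
{
    assert(shifted_via_signed(0x7f) == 0x00007f00u);  /* sign bit clear */
    assert(shifted_via_signed(0x81) == 0xffff8100u);  /* sign bit set   */
    assert(shifted_via_signed(0x101) == 0x00000100u); /* top byte lost  */
}
```

With the unsigned variant, 0x81 comes out as 0x8100 with no leading ffff, which is the behaviour the question was expecting.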

How does Ruby differentiate VALUE with value and pointer?

For values such as true, nil, or small integers, Ruby does optimization. Instead of using VALUE pointer as a pointer, it directly uses VALUE to store data.
I wonder how Ruby makes a difference between these uses:
def foo(x)
...
with x that will be associated to VALUE. In low level terms, they are just a number. How can I tell whether or not a certain number is a pointer to an object? All that comes to my mind is to limit pointers to have the MSB set to 0, and direct values with MSB equal to 1. But this is just my guess. How is it done in Ruby?
There are many different implementations of Ruby. The Ruby Language Specification doesn't prescribe any particular internal representation for objects – why should it? It's an internal representation, after all!
For example, JRuby doesn't represent objects as C pointers at all, it represents them as Java objects. IronRuby represents them as .NET objects. Opal represents them as ECMAScript objects. MagLev represents them as Smalltalk objects.
However, there are indeed some implementations that use the strategy you describe. The now abandoned MRI did it that way, YARV and Rubinius also do it.
This is actually a very old trick, dating back to at least the 1960s. It's called a tagged pointer representation, and like the name suggests, you need to tag the pointer with some additional metadata in order to know whether or not it is actually a pointer to an object or an encoding of some other datatype.
Some CPUs have special tag bits specifically for that purpose. (For example, on the AS/400, the CPU doesn't even have pointers, it has 128bit object references, even though the original CPU was only 48bit wide, and the newer POWER-based CPUs 64 bit; the extra bits are used to encode all sorts of metadata like type, owner, access restrictions, etc.) Some CPUs have tag bits for other purposes that can be "abused" for this purpose. However, most modern mainstream CPUs don't have tag bits.
But, you can use a trick! On many modern CPUs, unaligned memory accesses (accessing an address that does not start at a word boundary) are really slow (on some, they aren't even possible at all), which means that on a 32bit CPU, all pointers that are realistically being used, end with two 00 bits and on 64 bit CPUs with three 000 bits. You can use these bits as tag bits: pointers that end with 00 are indeed pointers, pointers that end with 01, 10, or 11 are an encoding of some other data type.
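The trick can be illustrated with a short C sketch. The names are invented for this example, two tag bits are used as in the 32-bit case described above, and the check on malloc's result relies on the usual (but strictly speaking implementation-defined) mapping between pointers and uintptr_t.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define TAG_MASK ((uintptr_t)3) /* the two lowest bits */

/* A word ending in 00 is treated as a genuine pointer. */
int looks_like_pointer(uintptr_t word)
{
    return (word & TAG_MASK) == 0;
}

/* Hide a small payload in a word by attaching a non-zero tag (01, 10 or 11). */
uintptr_t tag_payload(uintptr_t payload, unsigned tag)
{
    assert(tag >= 1 && tag <= 3);
    return (payload << 2) | tag;
}

/* Strip the tag again to get the payload back. */
uintptr_t untag_payload(uintptr_t word)
{
    assert(!looks_like_pointer(word));
    return word >> 2;
}

/* Memory from malloc is aligned for any ordinary type, so on mainstream
 * platforms its address ends in (at least) two zero bits. */
int malloc_result_is_untagged(void)
{
    void *p = malloc(64);
    int ok = looks_like_pointer((uintptr_t)p);
    free(p);
    return ok;
}
```

A real VM would pick specific meanings for each tag; the point here is only that the low bits of an aligned address are free to carry them.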
In MRI, the pointers ending in 1 were used to encode 31/63 bit Fixnums. In YARV, they are used to encode 31/63 bit Fixnums, i.e. integers that are encoded as actual machine integers according to the formula 2n+1 (arithmetically speaking) or (n << 1) | 1 (as a bit pattern). On 64 bit platforms, YARV also uses pointers that end in 10 to encode 62 bit flonums using a similar scheme. (If you ever wondered why the object_id of a Fixnum in YARV is 2n+1, now you know: YARV uses the memory address for the object ID, and 2n+1 is the "memory address" of n.)
Now, what about nil, false and true? Well, there is no space for them in our current scheme. However, the very low memory addresses are usually reserved for the operating system kernel, which means that a pointer like 0 or 2 or 4 cannot realistically occur in a program. YARV uses that space to encode nil, false and true: false is encoded as 0 (which is convenient because that's also the encoding of false in C), nil is encoded as 0b1000 and true is encoded as 0b10100 (in older versions, before the introduction of flonums, they were 0 for false, 0b100 for nil and 0b10 for true).
Theoretically, there is a lot of space there to encode other objects as well, but YARV doesn't do that. Some Smalltalk or Lisp VMs, for example, encode ASCII or BMP Unicode character objects there, or some often used objects such as the empty list, empty array, or empty string.
There is still some piece missing, though: without an object header, with just the bare bit pattern, how can the VM access the class, the methods, the instance variables, etc.? Well, it can't. Those have to be special-cased and hardcoded into the VM. The VM simply has to know that a pointer ending in 1 is an encoded Fixnum and has to know that the class is Fixnum and the methods can be found there. And as for instance variables? Well, you could store them separately from the objects in a dictionary on the side. Or you go the Ruby route and simply disallow them altogether.
This answer is merely a distillation of @Jörg's always-excellent treatise.
In MRI, true, false, nil and Fixnums are mapped to fixed object_id's; all other objects are assigned dynamically-generated values. The object_id for false is 0. For true and nil they are 20 and 8 (2 and 4 prior to v2.0), respectively. The integer i has object_id i*2+1. Dynamically-generated object_id's cannot be any of these values. Therefore, (in MRI) one can merely check to see if the object_id is one of these values to determine if the associated object has a fixed object_id.
Incidentally, objects can be obtained from their object_id's with the method ObjectSpace::_id2ref.
For more on this, see @sepp2k's answer here.

How is data stored in a bit vector?

I'm a bit confused how a fixed size bit vector stores its data.
Let's assume that we have a bit vector bv that I want to store hello in as ASCII.
So we do bv[0]=104, bv[1]=101, bv[2]=108, bv[3]=108, bv[4]=111.
How is the ASCII of hello represented in the bit vector?
Is it as binary like this: [01101000][01100101][01101100][01101100][01101111]
or as ASCII like this: [104][101][108][108][111]
In the HAMPI paper, at section 3.5 step 2, the author assigns ASCII codes to a bit vector, but I'm confused about how a char is represented in the bit vector.
Firstly, you should probably read up on what a bit vector is, just to make sure we're on the same page.
Bit vectors don't represent ASCII characters, they represent bits. Trying to do bv[0]=104 on a bit vector will probably not compile / run, or, if it does, it's very unlikely to do what you expect.
The operations that you would expect to be supported are along the lines of: set the 5th bit to 1, set the 10th bit to 0, set all these bits to this value, OR the bits of these two vectors, and probably some others.
How these are actually stored in memory is completely up to the programming language, and, on top of that, it may even be completely up to a given implementation of that language.
The general consensus (not a rule) is that each bit should take up roughly 1 bit in memory (maybe, on average, slightly more, since there could be overhead related to storing these).
As one example (how Java does it), you could have an array of 64-bit numbers and store 64 bits in each position. The translation to ASCII won't make sense in this case.
Another thing you should know - even ASCII gets stored as bits in memory, so those 2 arrays are essentially the same, unless you meant something else.
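To illustrate the array-of-64-bit-words layout mentioned above, here is a minimal fixed-size bit vector in C. Everything in it (the bitvec type, the function names, the size of 128 bits, and the choice to store each character least significant bit first) is an assumption made for this sketch, not how any particular library or the HAMPI paper does it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BV_BITS 128 /* fixed size for this sketch */

typedef struct { uint64_t words[BV_BITS / 64]; } bitvec;

/* Set bit i to 0 or 1. */
void bv_set(bitvec *bv, size_t i, int bit)
{
    assert(i < BV_BITS);
    uint64_t mask = UINT64_C(1) << (i % 64);
    if (bit) bv->words[i / 64] |= mask;
    else     bv->words[i / 64] &= ~mask;
}

/* Read bit i. */
int bv_get(const bitvec *bv, size_t i)
{
    assert(i < BV_BITS);
    return (int)((bv->words[i / 64] >> (i % 64)) & 1u);
}

/* Store an 8-bit character code in bits pos*8 .. pos*8+7. */
void bv_store_byte(bitvec *bv, size_t pos, unsigned char c)
{
    for (size_t k = 0; k < 8; k++)
        bv_set(bv, pos * 8 + k, (c >> k) & 1);
}

/* Read the 8 bits at character position pos back as a number. */
unsigned char bv_load_byte(const bitvec *bv, size_t pos)
{
    unsigned char c = 0;
    for (size_t k = 0; k < 8; k++)
        c |= (unsigned char)(bv_get(bv, pos * 8 + k) << k);
    return c;
}
```

Storing 'h' with bv_store_byte and reading it back with bv_load_byte gives 104 again: the "binary" and "ASCII" views in the question are the same bits, just written down differently.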
