X11 XClientMessageEvent union porting to 64-bit... Bug?

I'm working with a legacy application on Linux that uses ClientMessage to do messaging between cooperating processes. The XClientMessageEvent structure provides a union as a convenience to use for custom data:
union {
    char b[20];
    short s[10];
    long l[5];
} data;
And the man page claims: "The b, s, and l members represent data of twenty 8-bit values, ten 16-bit values, and five 32-bit values."
This is all well and good if compiling on a 32-bit system. Just different access to the 20 bytes of the union, right? Well, on a 64-bit system, the "long l[5]" member takes up 40 bytes.
My testing shows that if the sender considers these as 64-bit longs and uses them, the receiver only gets the first 20 bytes of the "data" structure. The last 20 bytes are lost (appear as zeros to the receiver).
Since the application relies pretty heavily on packing this union with both shorts and longs, I'm stuck.
I'm using: xorg-x11-server-Xorg-1.15.0-22.el6.centos.x86_64
Has anyone had any experience in solving this? Am I just not understanding this correctly?
It sure seems like a bug in X to me. 64-bit X11 has been around for a long time. I've only seen one other mention of this issue on the web.

I've used an XClientMessageEvent on 64-bit; this works great to go fullscreen:
XClientMessageEvent xevent;
memset (&xevent, 0, sizeof (xevent));
xevent.type = ClientMessage;
xevent.window = mw;
xevent.message_type = XInternAtom (m_display, "_NET_WM_FULLSCREEN_MONITORS", False);
xevent.format = 32;
xevent.data.l[0] = m_fullScreen;
xevent.data.l[1] = m_fullScreen;
xevent.data.l[2] = m_fullScreen;
xevent.data.l[3] = m_fullScreen;
xevent.data.l[4] = 1; // source indication
XSendEvent (m_display, DefaultRootWindow (m_display), False, SubstructureRedirectMask | SubstructureNotifyMask, (XEvent *) & xevent);
Did you set xevent.format as well? It indicates which of the three members (b, s, or l) carries the data.
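For what it's worth, the behaviour described in the question is expected rather than a bug: the wire protocol carries ClientMessage data as exactly 20 bytes (five 32-bit words when format is 32), and on LP64 platforms Xlib merely widens each word into a 64-bit long. Only the low 32 bits of each l[] slot survive the round trip. A minimal sketch of a portable workaround, splitting a 64-bit value across two 32-bit slots (dpy, win, and msg_type are assumed to already exist; send_u64/recv_u64 are made-up names):
#include <X11/Xlib.h>
#include <stdint.h>
#include <string.h>

void send_u64(Display *dpy, Window win, Atom msg_type, uint64_t value)
{
    XClientMessageEvent ev;
    memset(&ev, 0, sizeof(ev));
    ev.type = ClientMessage;
    ev.window = win;
    ev.message_type = msg_type;
    ev.format = 32;                               /* five 32-bit words on the wire */
    ev.data.l[0] = (long)(value & 0xFFFFFFFFUL);  /* low 32 bits */
    ev.data.l[1] = (long)(value >> 32);           /* high 32 bits */
    XSendEvent(dpy, win, False, NoEventMask, (XEvent *)&ev);
}

uint64_t recv_u64(const XClientMessageEvent *ev)
{
    /* Mask each slot back down to 32 bits before recombining, in case
     * the upper bits of the 64-bit longs are not zero on arrival. */
    uint64_t lo = (uint64_t)ev->data.l[0] & 0xFFFFFFFFUL;
    uint64_t hi = (uint64_t)ev->data.l[1] & 0xFFFFFFFFUL;
    return (hi << 32) | lo;
}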

Related

How to change a boost::multiprecision::cpp_int from big endian to little endian

I have a boost::multiprecision::cpp_int in big endian and have to change it to little endian. How can I do that? I tried with boost::endian::conversion but that did not work.
boost::multiprecision::cpp_int bigEndianInt("0xe35fa931a0000");
boost::multiprecision::cpp_int littleEndianInt;
littleEndianInt = boost::endian::endian_reverse(bigEndianInt);
The memory layout of Boost Multiprecision types is an implementation detail, so you cannot assume much about it anyway (the types are not supposed to be bitwise serializable).
Just read a random section of the docs:
MinBits
Determines the number of Bits to store directly within the object before resorting to dynamic memory allocation. When zero, this field is determined automatically based on how many bits can be stored in union with the dynamic storage header: setting a larger value may improve performance as larger integer values will be stored internally before memory allocation is required.
It's not immediately clear that you have any chance at some level of "normal int behaviour" in memory layout. The only exception would be when MinBits==MaxBits.
Indeed, we can static_assert that the sizes of cpp_int with such backend configs match the corresponding byte sizes.
It turns out that there's even a tag in the backend base class to indicate triviality (this is truly promising): trivial_tag, so let's use it:
Live On Coliru
#include <boost/multiprecision/cpp_int.hpp>
namespace mp = boost::multiprecision;

template <int bits> using simple_be =
    mp::cpp_int_backend<bits, bits, mp::unsigned_magnitude>;
template <int bits> using my_int =
    mp::number<simple_be<bits>, mp::et_off>;

using my_int8_t   = my_int<8>;
using my_int16_t  = my_int<16>;
using my_int32_t  = my_int<32>;
using my_int64_t  = my_int<64>;
using my_int128_t = my_int<128>;
using my_int192_t = my_int<192>;
using my_int256_t = my_int<256>;

template <typename Num>
constexpr bool is_trivial_v = Num::backend_type::trivial_tag::value;

int main() {
    static_assert(sizeof(my_int8_t)   == 1);
    static_assert(sizeof(my_int16_t)  == 2);
    static_assert(sizeof(my_int32_t)  == 4);
    static_assert(sizeof(my_int64_t)  == 8);
    static_assert(sizeof(my_int128_t) == 16);

    static_assert(is_trivial_v<my_int8_t>);
    static_assert(is_trivial_v<my_int16_t>);
    static_assert(is_trivial_v<my_int32_t>);
    static_assert(is_trivial_v<my_int64_t>);
    static_assert(is_trivial_v<my_int128_t>);

    // however it doesn't scale
    static_assert(sizeof(my_int192_t) != 24);
    static_assert(sizeof(my_int256_t) != 32);
    static_assert(not is_trivial_v<my_int192_t>);
    static_assert(not is_trivial_v<my_int256_t>);
}
Concluding: you can have a trivial int representation up to a certain point, after which you get the allocator-based dynamic-limb implementation no matter what.
Note that using unsigned_packed instead of unsigned_magnitude representation never leads to a trivial backend implementation.
Note that triviality might depend on compiler/platform choices (it's likely that the 128-bit cpp_int uses some builtin compiler/standard library support on GCC, e.g.).
Given this, you MIGHT be able to pull off what you wanted to do with hacks IF your backend configuration supports triviality. Sadly I think it requires you to manually overload endian_reverse for the 128-bit case, because the GCC builtins do not include __builtin_bswap128, nor does Boost Endian define one.
I'd suggest working off the information here: How to make GCC generate bswap instruction for big endian store without builtins?
Final Demo (not complete)
#include <boost/multiprecision/cpp_int.hpp>
#include <boost/endian/buffers.hpp>
#include <iostream>

namespace mp = boost::multiprecision;
namespace be = boost::endian;

template <int bits> void check() {
    using T = mp::number<mp::cpp_int_backend<bits, bits, mp::unsigned_magnitude>, mp::et_off>;
    static_assert(sizeof(T) == bits/8);
    static_assert(T::backend_type::trivial_tag::value);

    be::endian_buffer<be::order::big, T, bits, be::align::no> buf;
    buf = T("0x0102030405060708090a0b0c0d0e0f00");
    std::cout << std::hex << buf.value() << "\n";
}

int main() {
    check<128>();
}
(Changing be::order::big to be::order::native obviously makes it compile. The other way to complete it would be to have an ADL-accessible overload of endian_reverse for your int type.)
This is both trivial and, in the general case, unanswerable. Let me explain:
For a general N-bit integer, where N is a large number, there is unlikely to be any well defined byte order, indeed even for 64 and 128 bit integers there are more than 2 possible orders in use: https://en.wikipedia.org/wiki/Endianness#Middle-endian.
On any platform, with any native endianness, you can always extract the bytes of a cpp_int; the first example here: https://www.boost.org/doc/libs/1_73_0/libs/multiprecision/doc/html/boost_multiprecision/tut/import_export.html#boost_multiprecision.tut.import_export.examples shows you how. When exporting bytes like this, they are always most significant byte first, so you can subsequently rearrange them however you wish. You should not, however, rearrange them and load them back into a cpp_int, as the class won't know what to do with the result!
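Following that docs example, a small sketch of the portable extraction (export_bits is the documented API; the reversal is only to show rearrangement, and as noted the reversed bytes should not be loaded back into a cpp_int):
#include <boost/multiprecision/cpp_int.hpp>
#include <algorithm>
#include <iterator>
#include <vector>
#include <cstdint>

int main() {
    boost::multiprecision::cpp_int v("0xe35fa931a0000");
    std::vector<std::uint8_t> bytes;
    // export_bits emits 8-bit chunks, most significant byte first
    export_bits(v, std::back_inserter(bytes), 8);
    std::reverse(bytes.begin(), bytes.end()); // now least-significant first
}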
If you know that the value is small enough to fit into a native integer type, then you can simply cast to the native integer and use a system API on the result, as in endian_reverse(static_cast<int64_t>(my_cpp_int)). Again, don't assign the result back into a cpp_int, as it requires native byte order.
If you wish to check whether a value is small enough to fit in an N-bit integer for the approach above, you can use the msb function, which returns the index of the most significant bit in the cpp_int. Add one to that to obtain the number of bits used, filter out the zero case, and the code looks like:
unsigned bits_used = my_cpp_int.is_zero() ? 0 : msb(my_cpp_int) + 1;
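Putting the last two points together, a hedged sketch (the helper name is made up):
#include <boost/multiprecision/cpp_int.hpp>
#include <boost/endian/conversion.hpp>
#include <cstdint>
#include <stdexcept>

// Byte-swap a cpp_int's value via a native integer, when it fits.
std::uint64_t swapped_if_it_fits(const boost::multiprecision::cpp_int& v) {
    unsigned bits_used = v.is_zero() ? 0 : msb(v) + 1;
    if (bits_used > 64)
        throw std::range_error("value too wide for a native byte swap");
    return boost::endian::endian_reverse(static_cast<std::uint64_t>(v));
}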
Note that all of the above use completely portable code - no hacking of the underlying implementation is required.

Keeping UINT64 values in V8

I'm looking to integrate a scripting engine in my C/C++ program. Currently, I am looking at Google V8.
How do I efficiently handle 64-bit values in V8? My C/C++ program uses 64-bit values extensively for keeping handles/pointers. I don't want them separately allocated on the heap. There appears to be a V8::External value type. Can I assign it to a JavaScript variable and use it as a value type?
function foo() {
    var a = MyNativeFunctionReturningAnUnsigned64BitValue();
    var b = a; // Hopefully, b is a stack allocated value capable of
               // keeping a 64 bit pointer or some other uint64 structure.
    MyNativeFunctionThatAcceptsAnUnsigned64BitValue(b);
}
If it is not possible in V8, how about SpiderMonkey? I know that Duktape (a JavaScript engine) has a non-ECMAScript-standard 64-bit value type (stack allocated) to host pointers, but I would assume that other engines also want to keep track of external pointers from within their objects.
No, it's not possible, and I'm afraid Duktape could be violating the spec unless it took great pains to ensure it's not observable.
You can store pointers in objects, so to store 64-bit ints directly on an object you need pointers to have the same size:
Local<FunctionTemplate> function_template = FunctionTemplate::New(isolate);
// Instances of this function have room for 1 internal field
function_template->InstanceTemplate()->SetInternalFieldCount(1);
Local<Object> object = function_template->GetFunction()->NewInstance();
static_assert(sizeof(void*) == sizeof(uint64_t));
uint64_t integer = 1;
object->SetAlignedPointerInInternalField(0, reinterpret_cast<void*>(integer));
uint64_t result = reinterpret_cast<uint64_t>(object->GetAlignedPointerInInternalField(0));
This is of course far from being efficient.

C-API: Allocating "PyTypeObject-extension"

I have found some code in PyCXX that may be buggy.
Is it indeed a bug, and if so, what is the right way to fix it?
Here is the problem:
struct PythonClassInstance
{
    PyObject_HEAD
    ExtObjBase* m_pycxx_object;
};
:
{
    :
    table->tp_new = extension_object_new; // PyTypeObject
    :
}
:
static PyObject* extension_object_new(
    PyTypeObject* subtype, PyObject* args, PyObject* kwds )
{
    PythonClassInstance* o = reinterpret_cast<PythonClassInstance*>(
        subtype->tp_alloc(subtype, 0) );
    if( !o )
        return nullptr;
    o->m_pycxx_object = nullptr;
    PyObject* self = reinterpret_cast<PyObject*>( o );
    return self;
}
Now PyObject_HEAD expands to "PyObject ob_base;", so clearly PythonClassInstance trivially extends PyObject to contain an extra pointer (which will point to PyCXX's representation for this PyObject)
tp_alloc allocates memory for storing a PyObject
The code then typecasts this pointer to a PythonClassInstance, laying claim to an extra 4 (or 8?) bytes that it does not own!
And then it sets this extra memory to 0.
This looks very dangerous, and I'm surprised the bug has gone unnoticed. The risk is that some future object will get placed in this location (that is meant to be storing the ExtObjBase*).
How to fix it?
PythonClassInstance foo{};
PyObject* tmp = subtype->tp_alloc(subtype,0);
// !!! memcpy sizeof(PyObject) bytes starting from location tmp into location (void*)foo
But I think now maybe I need to release tmp, and I don't think I should be playing with memory directly like this. I feel like it could be jeopardising Python's memory management/garbage collection inbuilt machinery.
The other option is that maybe I can persuade tp_alloc to allocate 4 extra bytes (or is it 8 now, enough for a pointer?) by passing in 1 instead of 0.
Documentation says this second parameter is "Py_ssize_t nitems" and:
If the type's tp_itemsize is non-zero, the object's ob_size field
should be initialized to nitems and the length of the allocated memory
block should be tp_basicsize + nitems*tp_itemsize, rounded up to a
multiple of sizeof(void*); otherwise, nitems is not used and the
length of the block should be tp_basicsize.
So it looks like I should be setting:
table->tp_itemsize = sizeof(void*);
:
PyObject* tmp = subtype->tp_alloc(subtype,1);
EDIT: just tried this and it causes a crash
But then the documentation goes on to say:
Do not use this function to do any other instance initialization, not
even to allocate additional memory; that should be done by tp_new.
Now I'm not sure whether this code belongs in tp_new or tp_init.
Related:
Passing arguments to tp_new and tp_init from subtypes in Python C API
Python C-API Object Allocation‏
The code is correct.
As long as the PyTypeObject for the extension object is properly initialized it should work.
The base class tp_alloc receives subtype, so it knows how much memory to allocate by checking the tp_basicsize member.
This is a common Python C API pattern, as demonstrated in the tutorial.
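For illustration only (type and module names are hypothetical, not PyCXX's actual code), the slots that matter here would look like:
static PyTypeObject PythonClassType = {
    PyVarObject_HEAD_INIT(NULL, 0)
    "mymodule.PythonClass",          /* tp_name (hypothetical) */
    sizeof(PythonClassInstance),     /* tp_basicsize: the PyObject header
                                        plus the m_pycxx_object pointer */
    0,                               /* tp_itemsize: no variable part, so
                                        nitems stays 0 in tp_alloc */
    /* ... remaining slots, with tp_new = extension_object_new assigned
       during type setup, as in the question ... */
};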
Actually this is a (minor/harmless) bug in PyCXX

Best way to maintain an RNG state in multiple devices in openCL

So I'm trying to make use of this custom RNG library for openCL:
http://cas.ee.ic.ac.uk/people/dt10/research/rngs-gpu-mwc64x.html
The library defines a state struct:
//! Represents the state of a particular generator
typedef struct{ uint x; uint c; } mwc64x_state_t;
And in order to generate a random uint, you pass in the state into the following function:
uint MWC64X_NextUint(mwc64x_state_t *s)
which updates the state, so that when you pass it into the function again, the next "random" number in the sequence will be generated.
For the project I am creating I need to be able to generate random numbers not just in different work groups/items but also across multiple devices simultaneously and I'm having trouble figuring out the best way to design this. Like should I create 1 mwc64x_state_t object per device/commandqueue and pass that state in as a global variable? Or is it possible to create 1 state object for all devices at once?
Or do I not even pass it in as a global variable and declare a new state locally within each kernel function?
The library also comes with this function:
void MWC64X_SeedStreams(mwc64x_state_t *s, ulong baseOffset, ulong perStreamOffset)
which is supposed to split the RNG into multiple "streams", but including it in my kernel makes it incredibly slow. For instance, if I do something very simple like the following:
__kernel void myKernel()
{
    mwc64x_state_t rng;
    MWC64X_SeedStreams(&rng, 0, 10000);
}
Then the kernel call becomes around 40x slower.
The library does come with some source code that serves as example usages but the example code is kind of limited and doesn't seem to be that helpful.
So if anyone is familiar with RNGs in openCL or if you've used this particular library before I'd very much appreciate your advice.
The MWC64X_SeedStreams function is indeed relatively slow, at least in comparison
to the MWC64X_NextUint call, but this is true of most parallel RNGs that try
to split a large global stream into many sub-streams that can be used in
parallel. The assumption is that you'll be calling NextUint many times
within the kernel (e.g. a hundred or more times), while SeedStreams is called only once at the top.
This is an annotated version of the EstimatePi example that comes with the library (mwc64x/test/estimate_pi.cpp and mwc64x/test/test_mwc64x.cl).
__kernel void EstimatePi(ulong n, ulong baseOffset, __global ulong *acc)
{
    // One RNG state per work-item
    mwc64x_state_t rng;

    // This calculates the number of samples that each work-item uses
    ulong samplesPerStream = n / get_global_size(0);

    // And then skip each work-item ahead to its part of the stream, which
    // will run from stream offset:
    //   baseOffset + 2*samplesPerStream*get_global_id(0)
    // up to (but not including):
    //   baseOffset + 2*samplesPerStream*(get_global_id(0)+1)
    MWC64X_SeedStreams(&rng, baseOffset, 2*samplesPerStream);

    // Now use the numbers
    uint count = 0;
    for(uint i = 0; i < samplesPerStream; i++){
        ulong x = MWC64X_NextUint(&rng);
        ulong y = MWC64X_NextUint(&rng);
        ulong x2 = x*x;
        ulong y2 = y*y;
        // x2+y2 wraps modulo 2^64; if the sum is still >= x2, no wrap
        // occurred, i.e. x*x + y*y < 2^64, so the point falls inside
        // the quarter circle of radius 2^32.
        if(x2 + y2 >= x2)
            count++;
    }
    acc[get_global_id(0)] = count;
}
So the intent is that n should be large and should grow as the number of work-items grows, so that samplesPerStream remains around a hundred or more.
If you want multiple kernels on multiple devices, then you
need to add another level of hierarchy to the stream splitting,
so for example if you have:
K : Number of devices (possibly on parallel machines)
W : Number of work-items per device
C : Number of calls to NextUint per work-item
You end up with N = K*W*C total calls to NextUint across all
work-items. If your devices are identified as k = 0..(K-1),
then within each kernel you would do:
MWC64X_SeedStreams(&rng, W*C*k, C);
Then the indices within the stream would be:
[ 0 .. N )                          : parts of the stream used across all devices
[ k*(W*C) .. (k+1)*(W*C) )          : used within device k
[ k*(W*C)+i*C .. k*(W*C)+(i+1)*C )  : used by work-item i in device k
It is fine if each kernel uses fewer than C samples; you can over-estimate C if necessary.
(I'm the author of the library).
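For concreteness, a host-side sketch (plain C++, names made up) of the device-level split described above, producing the two arguments each kernel passes to MWC64X_SeedStreams:
#include <cstdint>

struct StreamSplit {
    std::uint64_t baseOffset;       // SeedStreams baseOffset argument
    std::uint64_t perStreamOffset;  // SeedStreams perStreamOffset argument
};

// K devices, W work-items per device, C calls to NextUint per work-item:
// device k owns substream indices [k*W*C, (k+1)*W*C).
StreamSplit splitForDevice(std::uint64_t k, std::uint64_t W, std::uint64_t C)
{
    StreamSplit s;
    s.baseOffset = k * W * C;
    s.perStreamOffset = C;
    return s;
}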

Crash when casting the result of arc4random() to Int

I've written a simple Bag class. A Bag is filled with a fixed ratio of Temperature enums. It allows you to grab one at random and automatically refills itself when empty. It looks like this:
class Bag {
    var items = Temperature[]()
    init() {
        refill()
    }
    func grab() -> Temperature {
        if items.isEmpty {
            refill()
        }
        var i = Int(arc4random()) % items.count
        return items.removeAtIndex(i)
    }
    func refill() {
        items.append(.Normal)
        items.append(.Hot)
        items.append(.Hot)
        items.append(.Cold)
        items.append(.Cold)
    }
}
The Temperature enum looks like this:
enum Temperature: Int {
    case Normal, Hot, Cold
}
My GameScene:SKScene has a constant instance property bag:Bag. (I've tried with a variable as well.) When I need a new temperature I call bag.grab(), once in didMoveToView and when appropriate in touchesEnded.
Randomly this call crashes on the if items.isEmpty line in Bag.grab(). The error is EXC_BAD_INSTRUCTION. Checking the debugger shows items is size=1 and [0] = (AppName.Temperature) <invalid> (0x10).
Edit Looks like I don't understand the debugger info. Even valid arrays show size=1 and unrelated values for [0] =. So no help there.
I can't get it to crash isolated in a Playground. It's probably something obvious but I'm stumped.
The arc4random function returns a UInt32. If you get a value higher than Int.max, the Int(...) cast will crash.
Using
Int(arc4random_uniform(UInt32(items.count)))
should be a better solution.
(Blame the strange crash messages in the Alpha version...)
I found that the best way to solve this is by using rand() instead of arc4random(). The code, in your case, could be:
var i = Int(rand()) % items.count
This method will generate a random Int value between the given minimum and maximum:
func randomInt(min: Int, max: Int) -> Int {
    return min + Int(arc4random_uniform(UInt32(max - min + 1)))
}
The crash that you were experiencing is due to the fact that Swift detected an overflow at runtime.
Since Int != UInt32, you first have to type-cast the input argument of arc4random_uniform before you can compute the random number.
Swift doesn't allow casting from one integer type to another if the result of the cast doesn't fit. E.g. the following code will work okay:
let x = 32
let y = UInt8(x)
Why? Because 32 is a possible value for an int of type UInt8. But the following code will fail:
let x = 332
let y = UInt8(x)
That's because you cannot assign 332 to an unsigned 8 bit int type, it can only take values 0 to 255 and nothing else.
When you do casts in C, the int is simply truncated, which may be unexpected or undesired, as the programmer may not be aware that truncation may take place. So Swift handles things a bit differently here: it will allow such casts as long as no truncation takes place, but if there is truncation, you get a runtime exception. If you think truncation is okay, then you must do the truncation yourself to let Swift know that this is intended behavior; otherwise Swift must assume that it is accidental.
This is even documented (documentation of UnsignedInteger):
Convert from Swift's widest unsigned integer type,
trapping on overflow.
And what you see is the "overflow trapping", which is poorly done as, of course, one could have made that trap actually explain what's going on.
Assuming that items never has more than 2^32 elements (a bit more than 4 billion), the following code is safe:
var i = Int(arc4random() % UInt32(items.count))
If it can have more than 2^32 elements, you get another problem anyway as then you need a different random number function that produces random numbers beyond 2^32.
This crash is only possible on 32-bit systems. Int is 32 bits (Int32) or 64 bits (Int64) depending on the device architecture (see the docs).
UInt32's max is 2^32 − 1. Int64's max is 2^63 − 1, so Int64 can easily handle UInt32.max. However, Int32's max is 2^31 − 1, which means UInt32 can handle numbers greater than Int32 can, and trying to create an Int32 from a number greater than 2^31-1 will create an overflow.
I confirmed this by trying to compile the line Int(UInt32.max). On the simulators and newer devices, this compiles just fine. But I connected my old iPod Touch (32-bit device) and got this compiler error:
Integer overflows when converted from UInt32 to Int
Xcode won't even compile this line for 32-bit devices, which is likely the crash that is happening at runtime. Many of the other answers in this post are good solutions, so I won't add or copy those. I just felt that this question was missing a detailed explanation of what was going on.
This will automatically create a random Int for you:
var i = random() % items.count
i is of Int type, so no conversion necessary!
You can use Int(rand()). To prevent getting the same random numbers every time the app starts, call srand() first:
srand(UInt32(NSDate().timeIntervalSinceReferenceDate))
let randomNumber: Int = Int(rand()) % items.count
