C++11 char16_t strlen-equivalent function

I have a simple question:
Is there a way to do a strlen()-like count of characters in a zero-terminated char16_t array?

Use
char_traits<char16_t>::length(your_pointer)
See §21.2.3.2 (struct char_traits<char16_t>) and Table 62 of the C++11 standard.
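For illustration, here is a minimal self-contained example (the u"" literal and char16_t are both standard C++11):

#include <cstddef>
#include <iostream>
#include <string>

int main() {
    const char16_t* s = u"hello";
    // length() counts char16_t code units up to (not including) the terminator.
    std::size_t len = std::char_traits<char16_t>::length(s);
    std::cout << len << "\n"; // prints 5
}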

Use pointers :) Create a duplicate pointer to the start, then loop through it with while (*endptr++);, and the length is given by endptr - startptr. You can actually template this; however, it's possible the compiler won't generate the same intrinsic code it does for strlen (for different sizes, of course).
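A minimal sketch of that templated version (the name zlen is illustrative; note that after while (*endptr++); the pointer ends up one past the terminator, which is exactly the off-by-one the next answer points out):

#include <cstddef>

template <typename CharT>
std::size_t zlen(const CharT* startptr) {
    const CharT* endptr = startptr;
    while (*endptr++)
        ;
    // endptr is now one past the terminator, hence the -1.
    return static_cast<std::size_t>(endptr - startptr) - 1;
}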

Necrolis's answer includes the NUL terminator in the length, which is probably not what you want; strlen() does not include the terminator in its count.
Adapting their answer:
#include <stddef.h>
#include <stdint.h>

static size_t char16len(uint16_t *startptr)
{
    uint16_t *endptr = startptr;
    while (*endptr) {
        endptr++;
    }
    return endptr - startptr;
}
I realize there is an accepted C++ answer, but this answer is useful for anyone who stumbles on this post using C (like I did).

Related

How to change a boost::multiprecision::cpp_int from big endian to little endian

I have a boost::multiprecision::cpp_int in big endian and have to change it to little endian. How can I do that? I tried boost::endian::conversion, but that did not work:
boost::multiprecision::cpp_int bigEndianInt("0xe35fa931a0000");
boost::multiprecision::cpp_int littleEndianInt;
littleEndianInt = boost::endian::endian_reverse(bigEndianInt);
The memory layout of Boost Multiprecision types is an implementation detail, so you cannot assume much about it anyway (they're not supposed to be bitwise serializable).
Just read a random section of the docs:
MinBits
Determines the number of Bits to store directly within the object before resorting to dynamic memory allocation. When zero, this field is determined automatically based on how many bits can be stored in union with the dynamic storage header: setting a larger value may improve performance as larger integer values will be stored internally before memory allocation is required.
It's not immediately clear that you have any chance at some level of "normal int behaviour" in the memory layout. The only exception would be when MinBits == MaxBits.
Indeed, we can static_assert that the sizes of cpp_int with such backend configs match the corresponding byte sizes.
It turns out there's even a tag in the backend base class to indicate "triviality" (this is truly promising): trivial_tag, so let's use it:
Live On Coliru
#include <boost/multiprecision/cpp_int.hpp>

namespace mp = boost::multiprecision;

template <int bits>
using simple_be = mp::cpp_int_backend<bits, bits, mp::unsigned_magnitude>;

template <int bits>
using my_int = mp::number<simple_be<bits>, mp::et_off>;

using my_int8_t   = my_int<8>;
using my_int16_t  = my_int<16>;
using my_int32_t  = my_int<32>;
using my_int64_t  = my_int<64>;
using my_int128_t = my_int<128>;
using my_int192_t = my_int<192>;
using my_int256_t = my_int<256>;

template <typename Num>
constexpr bool is_trivial_v = Num::backend_type::trivial_tag::value;

int main() {
    static_assert(sizeof(my_int8_t) == 1);
    static_assert(sizeof(my_int16_t) == 2);
    static_assert(sizeof(my_int32_t) == 4);
    static_assert(sizeof(my_int64_t) == 8);
    static_assert(sizeof(my_int128_t) == 16);

    static_assert(is_trivial_v<my_int8_t>);
    static_assert(is_trivial_v<my_int16_t>);
    static_assert(is_trivial_v<my_int32_t>);
    static_assert(is_trivial_v<my_int64_t>);
    static_assert(is_trivial_v<my_int128_t>);

    // however it doesn't scale
    static_assert(sizeof(my_int192_t) != 24);
    static_assert(sizeof(my_int256_t) != 32);
    static_assert(not is_trivial_v<my_int192_t>);
    static_assert(not is_trivial_v<my_int256_t>);
}
Concluding: you can have a trivial int representation up to a certain point, after which you get the allocator-based dynamic-limb implementation no matter what.
Note that using the unsigned_packed instead of the unsigned_magnitude representation never leads to a trivial backend implementation.
Note that triviality might depend on compiler/platform choices (it's likely that the 128-bit case uses some builtin compiler/standard-library support on GCC, e.g.).
Given this, you MIGHT be able to pull off what you wanted to do with hacks IF your backend configuration supports triviality. Sadly, I think it requires you to manually overload endian_reverse for the 128-bit case, because the GCC builtins do not include a __builtin_bswap128, nor does Boost Endian define one.
I'd suggest working off the information here: How to make GCC generate bswap instruction for big endian store without builtins?
Final Demo (not complete)
#include <boost/multiprecision/cpp_int.hpp>
#include <boost/endian/buffers.hpp>
#include <iostream>

namespace mp = boost::multiprecision;
namespace be = boost::endian;

template <int bits>
void check() {
    using T = mp::number<mp::cpp_int_backend<bits, bits, mp::unsigned_magnitude>, mp::et_off>;
    static_assert(sizeof(T) == bits / 8);
    static_assert(T::backend_type::trivial_tag::value);

    be::endian_buffer<be::order::big, T, bits, be::align::no> buf;
    buf = T("0x0102030405060708090a0b0c0d0e0f00");
    std::cout << std::hex << buf.value() << "\n";
}

int main() {
    check<128>();
}
(Changing be::order::big to be::order::native obviously makes it compile. The other way to complete it would be to have an ADL accessible overload for endian_reverse for your int type.)
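For completeness, here is a hedged sketch of what such an ADL-visible overload might look like; injecting into the boost::multiprecision namespace and the memcpy-based byte swap are my illustration of the "hack" described above, and are only valid because the backend configuration is trivial:

#include <boost/multiprecision/cpp_int.hpp>
#include <algorithm>
#include <cstring>
#include <iterator>

namespace boost { namespace multiprecision {
    using my_int128_t = number<cpp_int_backend<128, 128, unsigned_magnitude>, et_off>;

    // ADL finds this when boost::endian needs to reverse a my_int128_t.
    inline my_int128_t endian_reverse(my_int128_t v) noexcept {
        unsigned char bytes[sizeof(my_int128_t)];
        std::memcpy(bytes, &v, sizeof bytes);   // OK only for a trivial backend
        std::reverse(std::begin(bytes), std::end(bytes));
        std::memcpy(&v, bytes, sizeof bytes);
        return v;
    }
} }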
This is both trivial and, in the general case, unanswerable. Let me explain:
For a general N-bit integer, where N is a large number, there is unlikely to be any well defined byte order, indeed even for 64 and 128 bit integers there are more than 2 possible orders in use: https://en.wikipedia.org/wiki/Endianness#Middle-endian.
On any platform, with any native endianness you can always extract the bytes of a cpp_int, the first example here: https://www.boost.org/doc/libs/1_73_0/libs/multiprecision/doc/html/boost_multiprecision/tut/import_export.html#boost_multiprecision.tut.import_export.examples shows you how. When exporting bytes like this, they are always most significant byte first, so you can subsequently rearrange them how you wish. You should not however, rearrange them and load them back into a cpp_int as the class won't know what to do with the result!
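As a minimal sketch of that byte-export approach (export_bits is the documented import/export API; reversing into a byte vector afterwards is just an illustration):

#include <boost/multiprecision/cpp_int.hpp>
#include <algorithm>
#include <iterator>
#include <vector>

int main() {
    boost::multiprecision::cpp_int x("0xe35fa931a0000");

    // export_bits emits the most significant chunk first; 8 bits per chunk here.
    std::vector<unsigned char> bytes;
    export_bits(x, std::back_inserter(bytes), 8);

    // Reversing yields the bytes in little-endian order.
    std::reverse(bytes.begin(), bytes.end());
}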
If you know that the value is small enough to fit into a native integer type, then you can simply cast to the native integer and use a system API on the result. As in endian_reverse(static_cast<int64_t>(my_cpp_int)). Again, don't assign the result back into a cpp_int as it requires native byte order.
If you wish to check whether a value is small enough to fit in an N-bit integer for the approach above, you can use the msb function, which returns the index of the most significant bit in the cpp_int; add one to that to obtain the number of bits used, and filter out the zero case. The code looks like:
unsigned bits_used = my_cpp_int.is_zero() ? 0 : msb(my_cpp_int) + 1;
Note that all of the above use completely portable code - no hacking of the underlying implementation is required.

Why does the following code print garbage values for input strings longer than 128 bytes?

This is a problem from CodeChef that I recently came across. The answer seems to be right for every test case where the input string is shorter than 128 bytes, as it passes a couple of test cases. For any input longer than 128 bytes it prints a large value that appears to be garbage.
std::string str;
std::cin >> str;
vector<pair<char,int>> v;
v.push_back(make_pair('C',0));
v.push_back(make_pair('H',0));
v.push_back(make_pair('E',0));
v.push_back(make_pair('F',0));
int i = 0;
while (1)
{
    if (str[i] == 'C')
        v['C'].second++;
    else if (str[i] == 'H')
    {
        v['H'].second++;
        v['C'].second--;
    }
    else if (str[i] == 'E')
    {
        v['E'].second++;
        v['C'].second--;
    }
    else if (str[i] == 'F')
        v['F'].second++;
    else
        break;
    i++;
}
Even enclosing the same code within
/* reading the string values from a file instead of the console */
std::string input;
std::ifstream infile("input.txt");
while (getline(infile, input))
{
    istringstream in(input);
    string str;
    in >> str;
    /* above code goes here */
}
generates the same result. I am not looking for any solution(s) or hint(s) to get to the right answer, as I want to test the correctness of my algorithm. But I want to know why this happens, as I am new to vector containers.
if (str[i] == 'C')
    v['C'].second++;
You're modifying v[67]
... which is not contained in your vector, and thus either invalid memory or uninitialized
You seem to be trying to use a vector as an associative array. There is already such a structure in C++: a std::map. Use that instead.
By using v['C'] you actually access element 67 ('C' is 67 in ASCII) of a container holding only 4 items. Depending on compiler and mode (debug vs release) you get undefined behavior from this code.
What you probably wanted was a map, i.e. map<char,int> v; instead of vector<pair<char,int>> v;, and a simple v['C']++; instead of v['C'].second++;.
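A minimal sketch of the counting loop rewritten around std::map (the counting logic from the question is abbreviated; only the container usage is the point here):

#include <iostream>
#include <map>
#include <string>

int main() {
    std::string str;
    std::cin >> str;

    // operator[] value-initializes missing entries, so every count starts at 0.
    std::map<char, int> counts;
    for (char c : str) {
        if (c != 'C' && c != 'H' && c != 'E' && c != 'F')
            break;
        counts[c]++; // in-bounds by construction, unlike v['C'] on a 4-element vector
    }

    std::cout << counts['C'] << "\n";
}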

Reordering members in a template by alignment

Assume I write the following code:
template<typename T1, typename T2>
struct dummy {
    T1 first;
    T2 second;
};
I would like to know in general how I can order members in a template class by descending size. In other words, I would like the above class to be
struct dummy {
    int first;
    char second;
};
when instantiated as dummy<int, char>. However, I would like to obtain
struct dummy {
    int second;
    char first;
};
in the case dummy<char, int>.
On most platforms, padding for std::pair occurs only at "natural" alignment. This sort of padding will end up the same for either order.
For std::tuple, some arrangements can be more efficient than others, but the library can choose any memory layout it likes, so any TMP you add on top is only second-guessing.
In general, yes, you can define a sorting algorithm using templates, but it would be a fair bit of work.
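For just the two-member case in the question, though, a minimal sketch is possible with std::conditional (the names by_size, big, and small are illustrative; note that the members are now named by position in the layout rather than by template-parameter order, which is exactly the naming problem the next answer raises):

#include <type_traits>

// Illustrative holder that always stores the larger type first.
template <typename A, typename B>
struct by_size {
    A big;   // the larger (or equal-sized) of the two types
    B small;
};

// Pick the member order based on size.
template <typename T1, typename T2>
using ordered_dummy = typename std::conditional<
    (sizeof(T1) >= sizeof(T2)),
    by_size<T1, T2>,
    by_size<T2, T1>>::type;

static_assert(sizeof(ordered_dummy<char, int>) == sizeof(ordered_dummy<int, char>),
              "both argument orders produce the same layout");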
This can be done; the only issue is the naming: how would you name your fields?
I did what you are asking for not long ago. I used std::tuple and some meta-programming, and wrote a merge sort to reorder the template arguments. It is really fun to do (if you like functional programming).
For the naming I used some macros to access the fields.
I really encourage you to do it yourself; it is really interesting intellectually. However, if you'd like to see some code, please tell me!

C-API: Allocating "PyTypeObject-extension"

I have found some code in PyCXX that may be buggy.
Is it indeed a bug, and if so, what is the right way to fix it?
Here is the problem:
struct PythonClassInstance
{
    PyObject_HEAD
    ExtObjBase* m_pycxx_object;
};
:
{
    :
    table->tp_new = extension_object_new; // PyTypeObject
    :
}
:
static PyObject* extension_object_new(
    PyTypeObject* subtype, PyObject* args, PyObject* kwds )
{
    PythonClassInstance* o = reinterpret_cast<PythonClassInstance*>
        ( subtype->tp_alloc(subtype, 0) );
    if( !o )
        return nullptr;

    o->m_pycxx_object = nullptr;

    PyObject* self = reinterpret_cast<PyObject*>( o );
    return self;
}
Now, PyObject_HEAD expands to "PyObject ob_base;", so clearly PythonClassInstance trivially extends PyObject to contain an extra pointer (which will point to PyCXX's representation of this PyObject).
tp_alloc allocates memory for storing a PyObject.
The code then typecasts this pointer to a PythonClassInstance, laying claim to an extra 4 (or 8?) bytes that it does not own!
And then it sets this extra memory to 0.
This looks very dangerous, and I'm surprised the bug has gone unnoticed. The risk is that some future object will get placed in this location (that is meant to be storing the ExtObjBase*).
How to fix it?
PythonClassInstance foo{};
PyObject* tmp = subtype->tp_alloc(subtype,0);
// !!! memcpy sizeof(PyObject) bytes starting from location tmp into location (void*)foo
But I think now maybe I need to release tmp, and I don't think I should be playing with memory directly like this; I feel like it could jeopardize Python's built-in memory-management/garbage-collection machinery.
The other option is that maybe I can persuade tp_alloc to allocate 4 extra bytes (or is it 8 now; enough for a pointer) by passing in 1 instead of 0.
Documentation says this second parameter is "Py_ssize_t nitems" and:
If the type's tp_itemsize is non-zero, the object's ob_size field should be initialized to nitems and the length of the allocated memory block should be tp_basicsize + nitems*tp_itemsize, rounded up to a multiple of sizeof(void*); otherwise, nitems is not used and the length of the block should be tp_basicsize.
So it looks like I should be setting:
table->tp_itemsize = sizeof(void*);
:
PyObject* tmp = subtype->tp_alloc(subtype,1);
EDIT: just tried this and it causes a crash
But then the documentation goes on to say:
Do not use this function to do any other instance initialization, not
even to allocate additional memory; that should be done by tp_new.
Now I'm not sure whether this code belongs in tp_new or tp_init.
Related:
Passing arguments to tp_new and tp_init from subtypes in Python C API
Python C-API Object Allocation‏
The code is correct.
As long as the PyTypeObject for the extension object is properly initialized, it should work.
The base class tp_alloc receives subtype, so it knows how much memory to allocate by checking the tp_basicsize member.
This is a common Python C/API pattern, as demonstrated in the tutorial.
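To make the tp_basicsize point concrete, here is a minimal hedged sketch, assuming the struct from the question (ExtObjBase* is replaced by void* so the sketch stands alone; the type name and configure_type helper are illustrative):

#include <Python.h>

struct PythonClassInstance {
    PyObject_HEAD
    void* m_pycxx_object;
};

static PyTypeObject table = {
    PyVarObject_HEAD_INIT(nullptr, 0)
    "example.PythonClass",   /* tp_name */
};

static void configure_type() {
    // tp_basicsize covers the whole derived struct, so tp_alloc(subtype, 0)
    // already allocates room for m_pycxx_object; no extra bytes are claimed.
    table.tp_basicsize = sizeof(PythonClassInstance);
    table.tp_itemsize = 0;   // fixed-size instances: nitems is unused
}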
Actually, this is a (minor/harmless) bug in PyCXX.

Passing C++ structure pointer from Perl to arbitrary DLL function call

I am using Win32::API to call an arbitrary function exported from a DLL which accepts a C++ structure pointer.
struct PluginInfo {
    int nStructSize;
    int nType;
    int nVersion;
    int nIDCode;
    char szName[ 64 ];
    char szVendor[ 64 ];
    int nCertificate;
    int nMinAmiVersion;
};
Since we need to use the "pack" function to construct the structure, I pass the arguments like this:
my $name = " " x 64;
my $vendor = " " x 64;
my $pluginInfo = pack('IIIIC64C64II', 0, 0, 0, 0, $name, $vendor, 0, 0);
It's not constructing the structure properly. It seems that a length argument applied to C gobbles that many arguments.
Can someone please suggest the best way to construct this structure from Perl and pass it on to the DLL call?
Use Z (NUL-padded string) in your template, as in
my $pluginInfo = pack('IIIIZ64Z64II',0,0,0,0,$name,$vendor,0,0);
Also, take a look at Win32::API::Struct, which is part of the Win32::API module.
For anything complicated, check out Convert::Binary::C. It may seem daunting at first, but once you realize its power, it's an eye-opener.
Update: Let me add a bit of information. You need to look at a specific section of the module's manpage for the prime reason to use it. I'll quote it for convenience:
Why use Convert::Binary::C?

Say you want to pack (or unpack) data according to the following C structure:

struct foo {
    char ary[3];
    unsigned short baz;
    int bar;
};

You could of course use Perl's pack and unpack functions:

@ary = (1, 2, 3);
$baz = 40000;
$bar = -4711;
$binary = pack 'c3 Si', @ary, $baz, $bar;

But this implies that the struct members are byte aligned. If they were long aligned (which is the default for most compilers), you'd have to write

$binary = pack 'c3 x S x2 i', @ary, $baz, $bar;

which doesn't really increase readability.

Now imagine that you need to pack the data for a completely different architecture with different byte order. You would look into the pack manpage again and perhaps come up with this:

$binary = pack 'c3 x n x2 N', @ary, $baz, $bar;

However, if you try to unpack $foo again, your signed values have turned into unsigned ones.

All this can still be managed with Perl. But imagine your structures get more complex? Imagine you need to support different platforms? Imagine you need to make changes to the structures? You'll not only have to change the C source but also dozens of pack strings in your Perl code. This is no fun. And Perl should be fun.

Now, wouldn't it be great if you could just read in the C source you've already written and use all the types defined there for packing and unpacking? That's what Convert::Binary::C does.
