UTF8 Conversion

I need to generate a UTF8 string to pass to a 3rd-party library and I'm having trouble figuring out the right gymnastics... Also, to make matters worse, I'm stuck using C++ Builder 6, and every example I've found uses std::string, which C++ Builder 6 evidently has no support for. I'd like to accomplish this without using the STL whatsoever.
Here is my code so far, which I can't seem to make work.
const wchar_t *SS1;
char *SS2;
SS1 = L"select * from mnemonics;";
// First call with a NULL buffer just asks for the required output size in bytes:
int strsize = WideCharToMultiByte(CP_UTF8, 0, SS1, wcslen(SS1), NULL, 0, NULL, NULL);
SS2 = new char[strsize+1];
// Note: with an explicit input length of wcslen(SS1) (terminator excluded),
// the output is NOT null-terminated; SS2[strsize] is never written here.
WideCharToMultiByte(CP_UTF8, 0, SS1, wcslen(SS1), SS2, strsize, NULL, NULL);
The 3rd-party library chokes when I pass it SS2 as a parameter. Obviously I'm on a Windows platform using Microsoft's WideCharToMultiByte, but eventually I'd like to get rid of this function call, since the code must also compile on an embedded Linux platform; I'll cross that bridge when I get to it.
For now, I just need to be able to convert a wchar_t or char string to a UTF8-encoded string, preferably without using any STL. I won't have the STL on the embedded platform.
Thanks!

Something like this:
extern void someFunctionThatAcceptsUTF8(const char* utf8);

const char* ss1 = "string in system default multibyte encoding";
someFunctionThatAcceptsUTF8( w2u( a2w(ss1) ) ); // the conversion you need:
// a2w: "ansi" -> wide-character string
// w2u: wide-character string -> utf8 string
You just need to grab and include this file:
http://code.google.com/p/tiscript/source/browse/trunk/sdk/include/aux-cvt.h
It should work on Builder just fine.
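If pulling in that header isn't an option, the same two hops can be hand-rolled against the raw Win32 API. A minimal sketch, with error handling omitted (this is not the aux-cvt.h implementation, just the same idea, and AnsiToUtf8 is an illustrative name):
#include <windows.h>

// ansi -> wide -> utf8, done by hand. Passing -1 as the source length
// makes both conversions measure and copy the terminating null, so the
// result is properly terminated. Caller frees with delete[].
char* AnsiToUtf8(const char* ansi)
{
    int wlen = MultiByteToWideChar(CP_ACP, 0, ansi, -1, NULL, 0);
    wchar_t* wide = new wchar_t[wlen];
    MultiByteToWideChar(CP_ACP, 0, ansi, -1, wide, wlen);

    int ulen = WideCharToMultiByte(CP_UTF8, 0, wide, -1, NULL, 0, NULL, NULL);
    char* utf8 = new char[ulen];
    WideCharToMultiByte(CP_UTF8, 0, wide, -1, utf8, ulen, NULL, NULL);

    delete[] wide;
    return utf8;
}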

If you're still looking for an answer, here's a simple implementation of a UTF8 converter in C:
/*
** Encodes a single wchar_t as UTF-8, writing the encoded bytes
** (followed by a terminating '\0') into buffer
*/
void ft_to_utf8(wchar_t c, unsigned char *buffer)
{
    if (c < (1 << 7))                /* 1 byte:  U+0000..U+007F    */
        *buffer++ = (unsigned char)(c);
    else if (c < (1 << 11))          /* 2 bytes: U+0080..U+07FF    */
    {
        *buffer++ = (unsigned char)((c >> 6) | 0xC0);
        *buffer++ = (unsigned char)((c & 0x3F) | 0x80);
    }
    else if (c < (1 << 16))          /* 3 bytes: U+0800..U+FFFF    */
    {
        *buffer++ = (unsigned char)((c >> 12) | 0xE0);
        *buffer++ = (unsigned char)(((c >> 6) & 0x3F) | 0x80);
        *buffer++ = (unsigned char)((c & 0x3F) | 0x80);
    }
    else if (c < (1 << 21))          /* 4 bytes: U+10000..U+1FFFFF */
    {
        *buffer++ = (unsigned char)((c >> 18) | 0xF0);
        *buffer++ = (unsigned char)(((c >> 12) & 0x3F) | 0x80);
        *buffer++ = (unsigned char)(((c >> 6) & 0x3F) | 0x80);
        *buffer++ = (unsigned char)((c & 0x3F) | 0x80);
    }
    *buffer = '\0';
}
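To convert a whole wide string with it, you can call it in a loop and advance the output pointer past whatever was just written. A minimal sketch (wide_to_utf8 is an illustrative name; it assumes the caller sized out for the worst case of four bytes per input character, plus one):
#include <string.h>

void wide_to_utf8(const wchar_t *src, unsigned char *out)
{
    while (*src)
    {
        ft_to_utf8(*src++, out);           /* writes the bytes plus '\0' */
        out += strlen((const char *)out);  /* step past them; the '\0' slot
                                              is overwritten next round */
    }
    *out = '\0';
}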

Related

Fast bit shift of a byte array - CMAC subkeys

I need to implement a left bit shift of a 16-byte array in JavaCard, as fast as possible.
I tried this code:
private static final void rotateLeft(final byte[] output, final byte[] input) {
    short carry = 0;
    short i = (short) 16;
    do {
        --i;
        carry = (short) ((input[i] << 1) | carry);
        output[i] = (byte) carry;
        carry = (short) ((carry >> 8) & 1);
    } while (i > 0);
}
Any ideas how to improve the performance? I was thinking about some Util.getShort(...) and Util.setShort(...) magic, but I did not manage to make it work faster than the implementation above.
This is one part of the CMAC subkey computation, and unfortunately it is done quite often. In case you know a faster way to compute CMAC subkeys (both subkeys in one loop or something like that), please let me know.
When it comes to speed, a known-length, hard-coded version is the fastest (but ugly). If you need to shift by more than one bit, be sure to update the code accordingly.
output[0] = (byte)((byte)(input[0] << 1) | (byte)((input[1] >> 7) & 1));
output[1] = (byte)((byte)(input[1] << 1) | (byte)((input[2] >> 7) & 1));
output[2] = (byte)((byte)(input[2] << 1) | (byte)((input[3] >> 7) & 1));
output[3] = (byte)((byte)(input[3] << 1) | (byte)((input[4] >> 7) & 1));
output[4] = (byte)((byte)(input[4] << 1) | (byte)((input[5] >> 7) & 1));
output[5] = (byte)((byte)(input[5] << 1) | (byte)((input[6] >> 7) & 1));
output[6] = (byte)((byte)(input[6] << 1) | (byte)((input[7] >> 7) & 1));
output[7] = (byte)((byte)(input[7] << 1) | (byte)((input[8] >> 7) & 1));
output[8] = (byte)((byte)(input[8] << 1) | (byte)((input[9] >> 7) & 1));
output[9] = (byte)((byte)(input[9] << 1) | (byte)((input[10] >> 7) & 1));
output[10] = (byte)((byte)(input[10] << 1) | (byte)((input[11] >> 7) & 1));
output[11] = (byte)((byte)(input[11] << 1) | (byte)((input[12] >> 7) & 1));
output[12] = (byte)((byte)(input[12] << 1) | (byte)((input[13] >> 7) & 1));
output[13] = (byte)((byte)(input[13] << 1) | (byte)((input[14] >> 7) & 1));
output[14] = (byte)((byte)(input[14] << 1) | (byte)((input[15] >> 7) & 1));
output[15] = (byte)(input[15] << 1);
And use a RAM byte array!
This is the fastest algorithm I could come up with for rotating by an arbitrary number of bits (I rotate an array of 8 bytes; you can easily adapt it to shift 16 instead):
Use EEPROM to hold a masking table for your shifts. Each mask is just an increasing run of 1s from the right:
final static byte[] ROTL_MASK = {
    (byte) 0x00, //shift 0: 00000000 (never used; we don't do shift 0)
    (byte) 0x01, //shift 1: 00000001
    (byte) 0x03, //shift 2: 00000011
    (byte) 0x07, //shift 3: 00000111
    (byte) 0x0F, //shift 4: 00001111
    (byte) 0x1F, //shift 5: 00011111
    (byte) 0x3F, //shift 6: 00111111
    (byte) 0x7F  //shift 7: 01111111
};
Then you first use Util.arrayCopyNonAtomic for a quick swap of whole bytes if the shift is larger than 8:
final static byte BITS = 8;

//swap whole bytes:
Util.arrayCopyNonAtomic(in, (short) (shift/BITS), out, (short) 0, (short) (8-(shift/BITS)));
Util.arrayCopyNonAtomic(in, (short) 0, out, (short) (8-(shift/BITS)), (short) (shift/BITS));
shift %= BITS; //now we need to shift only up to 8 remaining bits

if (shift > 0) {
    //apply masks
    byte mask = ROTL_MASK[shift];
    byte comp = (byte) (8 - shift);

    //rotate using masks
    out[8] = in[0]; // out[8] is an auxiliary slot, careful with bounds!
    out[0] = (byte) ((byte) (in[0] << shift) | (byte) ((in[1] >> comp) & mask));
    out[1] = (byte) ((byte) (in[1] << shift) | (byte) ((in[2] >> comp) & mask));
    out[2] = (byte) ((byte) (in[2] << shift) | (byte) ((in[3] >> comp) & mask));
    out[3] = (byte) ((byte) (in[3] << shift) | (byte) ((in[4] >> comp) & mask));
    out[4] = (byte) ((byte) (in[4] << shift) | (byte) ((in[5] >> comp) & mask));
    out[5] = (byte) ((byte) (in[5] << shift) | (byte) ((in[6] >> comp) & mask));
    out[6] = (byte) ((byte) (in[6] << shift) | (byte) ((in[7] >> comp) & mask));
    out[7] = (byte) ((byte) (in[7] << shift) | (byte) ((in[8] >> comp) & mask));
}
You can additionally remove the mask variable and use a direct reference to the table instead.
Using this rather than a naive implementation of bit-wise rotation proved to be about 450%-500% faster.
It might help to cache CMAC subkeys when signing repeatedly using the same key (i.e. the same DESFire EV1 session key). The subkeys are always the same for the given key.
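For reference, the subkey derivation being cached there is small: per NIST SP 800-38B / RFC 4493 it is one block encryption plus the doubling step applied twice. A sketch in plain C rather than JavaCard code, with aes_encrypt_block as a hypothetical stand-in for whatever cipher call the platform provides:
#include <stdint.h>

/* hypothetical stand-in for the card's AES primitive */
extern void aes_encrypt_block(const uint8_t key[16],
                              const uint8_t in[16], uint8_t out[16]);

/* GF(2^128) doubling: shift left by one bit, and XOR 0x87 into the last
   byte if the bit shifted out was set (Rb from SP 800-38B). */
static void cmac_double(uint8_t out[16], const uint8_t in[16])
{
    uint8_t msb = (uint8_t)(in[0] >> 7);
    for (int i = 0; i < 15; ++i)
        out[i] = (uint8_t)((in[i] << 1) | ((in[i + 1] >> 7) & 1));
    out[15] = (uint8_t)(in[15] << 1);
    if (msb)
        out[15] ^= 0x87;
}

void cmac_subkeys(const uint8_t key[16], uint8_t k1[16], uint8_t k2[16])
{
    uint8_t L[16] = {0};
    aes_encrypt_block(key, L, L);  /* L = CIPH_K(0^128) */
    cmac_double(k1, L);            /* K1 = double(L)    */
    cmac_double(k2, k1);           /* K2 = double(K1)   */
}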
I think David's answer could be even faster if it used two local variables to cache the values read twice from the same offset of the input array (from my observations on JCOP, the array access is quite expensive even for transient arrays).
EDIT: I can provide the following implementation, which does a 4-bit right shift using shorts (a 32-bit int variant, on cards supporting it, would be even faster):
short pom = 0;  // X000 to be stored next
short pom2;     // loaded value
short pom3;     // 0XXX to be stored next
short curOffset = PARAMS_TRACK2_OFFSET;
while (curOffset < 16) {
    pom2 = Util.getShort(mem_PARAMS, curOffset);
    pom3 = (short) (pom2 >>> 4);
    curOffset = Util.setShort(mem_RAM, curOffset, (short) (pom | pom3));
    pom = (short) (pom2 << 12);
}
Beware: this code assumes the same offsets in the source and destination.
You can unroll this loop and use constant parameters if desired.

OCR algorithm (GOCR) to 32F429IDISCOVERY board

I'm trying to implement an OCR algorithm (specifically the GOCR algorithm) on a 32F429IDISCOVERY board, and I'm still getting nothing back...
I'm recording an image from an OV7670 camera in RGB565 format into the board's SDRAM; it is then converted to greyscale and passed to the algorithm itself.
From this and other forums I got the impression that GOCR is a very good algorithm, and it seemed to work very well on a PC, but I just can't get it to work on the board.
Does anyone have experience with implementing OCR or GOCR? I am not sure where the problem is, because it behaves in a very weird way: the code stops in a different part of the algorithm almost every time...
Calling the OCR algorithm:
void ocr_algorithm(char *output_str) {
    job_t job1, *job; /* fixme, don't want global variables for lib */
    job = OCR_JOB = &job1;
    int linecounter;
    const char *line;
    uint8_t r, g, b;
    uint32_t n, i, buffer;
    char *p_pic;
    uint32_t *image = (uint32_t *) SDRAM_START_ADR;

    setvbuf(stdout, (char *) NULL, _IONBF, 0); /* not buffered */
    job_init(job);       /* init cfg and db */
    job_init_image(job); /* single image */

    /* note: the malloc result is unchecked; with a small heap this can
       return NULL, which would explain crashes in seemingly random places */
    p_pic = malloc(IMG_ROWS * IMG_COLUMNS);

    /* Converting RGB565 to greyscale; two 16-bit pixels are packed per
       32-bit SDRAM word (even pixel in the low half, odd in the high) */
    i = 0;
    for (n = 0; n < IMG_ROWS * IMG_COLUMNS; n++) {
        if (n % 2 == 0) {
            buffer = image[i] & 0xFFFF;
        }
        else {
            buffer = (image[i] >> 16) & 0xFFFF;
            i++;
        }
        r = (uint8_t) ((buffer >> 11) & 0x1F);
        g = (uint8_t) ((buffer >> 5) & 0x3F);
        b = (uint8_t) (buffer & 0x1F);
        // Expand to RGB888
        r = ((r * 527) + 23) >> 6;
        g = ((g * 259) + 33) >> 6;
        b = ((b * 527) + 23) >> 6;
        // Greyscale (luma weights)
        p_pic[n] = 0.299*r + 0.587*g + 0.114*b;
    }

    //read_picture;
    job->src.p.p = p_pic;
    job->src.p.x = IMG_ROWS;    /* note: GOCR expects x = width and y = height;
                                   double-check that rows/columns aren't swapped */
    job->src.p.y = IMG_COLUMNS;
    job->src.p.bpp = 1;

    /* call main loop */
    pgm2asc(job);

    //print output
    strcpy(output_str, "");
    linecounter = 0;
    line = getTextLine(&(job->res.linelist), linecounter++);
    while (line) {
        strcat(output_str, line);
        strcat(output_str, "\n");
        line = getTextLine(&(job->res.linelist), linecounter++);
    }
    free_textlines(&(job->res.linelist));
    job_free_image(job);
    free(p_pic);
}
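One incidental note on the conversion loop above: the per-pixel floating-point luma math can be replaced by a common fixed-point approximation of the same 0.299/0.587/0.114 weights (my suggestion, not part of the original code):
// 77/256 = 0.301, 150/256 = 0.586, 29/256 = 0.113 -- close enough to the
// usual luma weights, and keeps float arithmetic out of the inner loop
p_pic[n] = (char)((77 * r + 150 * g + 29 * b) >> 8);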

What is the most efficient way to subtract signed integral data in binary (bits)?

I'm working in C on a PC, trying to leverage as little C++ as possible, with binary data stored in unsigned char format, although other formats are certainly possible if worthwhile. The goal is subtracting two signed integer values (which can be ints, signed ints, longs, signed longs, signed shorts, etc.) in binary without converting to other data formats. The raw data is prepackaged as unsigned char, with the user knowing which of the signed integer formats should be used for reading it (i.e., we know how many bytes to read at once). Even though the data is stored as an unsigned char array, it is meant to be read as signed two's-complement integers.
One common way we're taught in school is adding the negative. Negation, in turn, is often taught as flipping the bits and adding 1 (0x1), which costs two additions (perhaps a bad thing?); or, as other posts point out, as copying the bits up from the LSB through the first 1 and flipping everything above it. I'm wondering if there is a more efficient way that may not be easily described as a pen-and-paper operation, but works because of the way data is stored in bit format. Here are some prototypes I've written, which may not be the most efficient way, but which summarize my progress so far based on textbook methodology.
The addends are passed by reference in case I have to manually extend them to balance their lengths. Any and all feedback will be appreciated! Thanks in advance for considering.
void SubtractByte(unsigned char* & a, unsigned int & aBytes,
                  unsigned char* & b, unsigned int & bBytes,
                  unsigned char* & diff, unsigned int & nBytes)
{
    NegateByte(b, bBytes);
    // a - b == a + (-b)
    AddByte(a, aBytes, b, bBytes, diff, nBytes);
    // Restore b to its original state so input remains intact
    NegateByte(b, bBytes);
}

void AddByte(unsigned char* & a, unsigned int & aBytes,
             unsigned char* & b, unsigned int & bBytes,
             unsigned char* & sum, unsigned int & nBytes)
{
    // Ensure that both of our addends have the same length in memory:
    BalanceNumBytes(a, aBytes, b, bBytes, nBytes);
    bool aSign = !((a[aBytes-1] >> 7) & 0x1);
    bool bSign = !((b[bBytes-1] >> 7) & 0x1);
    // Add bit-by-bit to keep track of carry bit:
    unsigned int nBits = nBytes * BITS_PER_BYTE;
    unsigned char carry = 0x0;
    unsigned char result = 0x0;
    unsigned char a1, b1;
    // init sum
    for (unsigned int j = 0; j < nBytes; ++j) {
        for (unsigned int i = 0; i < BITS_PER_BYTE; ++i) {
            a1 = ((a[j] >> i) & 0x1);
            b1 = ((b[j] >> i) & 0x1);
            AddBit(&a1, &b1, &carry, &result);
            SetBit(sum, j, i, result == 0x1);
        }
    }
    // MSB and carry determine if we need to extend:
    if (((aSign && bSign) && (carry != 0x0 || result != 0x0)) ||
        ((!aSign && !bSign) && (result == 0x0))) {
        ++nBytes;
        sum = (unsigned char*)realloc(sum, nBytes);
        sum[nBytes-1] = (carry == 0x0 ? 0x0 : 0xFF); //init
    }
}

void FlipByte(unsigned char* n, unsigned int nBytes)
{
    for (unsigned int i = 0; i < nBytes; ++i) {
        n[i] = ~n[i];
    }
}

void NegateByte(unsigned char* n, unsigned int nBytes)
{
    // Flip each bit:
    FlipByte(n, nBytes);
    unsigned char* one = (unsigned char*)malloc(nBytes);
    unsigned char* orig = (unsigned char*)malloc(nBytes);
    one[0] = 0x1;
    orig[0] = n[0];
    for (unsigned int i = 1; i < nBytes; ++i) {
        one[i] = 0x0;
        orig[i] = n[i];
    }
    // Add binary representation of 1
    AddByte(orig, nBytes, one, nBytes, n, nBytes);
    free(one);
    free(orig);
}

void AddBit(unsigned char* a, unsigned char* b, unsigned char* c,
            unsigned char* result)
{
    *result = ((*a + *b + *c) & 0x1);
    *c = (((*a + *b + *c) >> 1) & 0x1);
}

void SetBit(unsigned char* bytes, unsigned int byte, unsigned int bit,
            bool val)
{
    if (val) {
        // OR with a mask that has only the desired bit set
        bytes[byte] |= (0x01 << bit);
    }
    else { // (!val), meaning we want to set to 0
        // AND with a mask that has only the desired bit clear
        bytes[byte] &= ~(0x01 << bit);
    }
}

void BalanceNumBytes(unsigned char* & a, unsigned int & aBytes,
                     unsigned char* & b, unsigned int & bBytes,
                     unsigned int & nBytes)
{
    if (aBytes > bBytes) {
        nBytes = aBytes;
        b = (unsigned char*)realloc(b, nBytes);
        bBytes = nBytes;
        // note: this reads the sign from b[0], the lowest-order byte;
        // the sign bit actually lives in the old top byte
        b[nBytes-1] = ((b[0] >> 7) & 0x1) ? 0xFF : 0x00;
    } else if (bBytes > aBytes) {
        nBytes = bBytes;
        a = (unsigned char*)realloc(a, nBytes);
        aBytes = nBytes;
        a[nBytes-1] = ((a[0] >> 7) & 0x1) ? 0xFF : 0x00;
    } else {
        nBytes = aBytes;
    }
}
The first thing to notice is that signed vs. unsigned doesn't matter to the generated bit pattern in two's complement. All that changes is the interpretation of the result.
The second thing to notice is that an addition has carried if the result is less than either input when done with unsigned arithmetic.
void AddByte(unsigned char* & a, unsigned int & aBytes,
             unsigned char* & b, unsigned int & bBytes,
             unsigned char* & sum, unsigned int & nBytes)
{
    // Ensure that both of our addends have the same length in memory:
    BalanceNumBytes(a, aBytes, b, bBytes, nBytes);
    unsigned char carry = 0;
    for (unsigned int j = 0; j < nBytes; ++j) { // reverse the loop for big-endian
        sum[j] = a[j] + b[j];
        // carry out if a+b wrapped, or if adding the carry-in wraps again:
        unsigned char newcarry = (sum[j] < a[j] ||
                                  (unsigned char)(sum[j] + carry) < sum[j]);
        sum[j] += carry;
        carry = newcarry;
    }
}
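Equivalently, and arguably harder to get wrong, the running total can be kept in a type wider than a byte so the carry simply falls out of the high bits; a sketch of the same loop under that approach:
unsigned int t = 0;                 // accumulator wider than one byte
for (unsigned int j = 0; j < nBytes; ++j) {
    t += a[j] + b[j];               // add both bytes plus the carried-in bits
    sum[j] = (unsigned char)t;      // low 8 bits are this byte of the result
    t >>= 8;                        // keep the carry for the next byte
}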

Fastest ways to set and get a bit

I'm just trying to develop ultra-fast functions for setting and getting bits in uint32 arrays. For example, you can say "set bit 1035 to 1". Then the uint32 at index 1035 / 32 is used with bit position 1035 % 32. I especially don't like the branching in the SetBit function.
Here is my approach:
void SetBit(uint32* data, const uint32 bitpos, const bool newval)
{
    if (newval)
    {
        //Set On
        data[bitpos >> 5u] |= (1u << (31u - (bitpos & 31u)));
        return;
    }
    else
    {
        //Set Off
        data[bitpos >> 5u] &= ~(1u << (31u - (bitpos & 31u)));
        return;
    }
}
and
bool GetBit(const uint32* data, const uint32 bitpos)
{
    return (data[bitpos >> 5u] >> (31u - (bitpos & 31u))) & 1u;
}
Thank you!
First, I would drop the 31u - ... from all expressions: all it does is reorder the bits in your private representation of the bit set, so you can flip this order without anyone noticing.
Second, you can get rid of the branch by using a clever bit hack:
void SetBit(uint32* data, const uint32 bitpos, const bool f)
{
    uint32 &w = data[bitpos >> 5u];   // word holding the target bit
    uint32 m = 1u << (bitpos & 31u);  // mask for the target bit
    w = (w & ~m) | (-f & m);          // -f is all-ones when f is true
}
Third, you can simplify your getter by letting the compiler do the conversion:
bool GetBit(const uint32* data, const uint32 bitpos)
{
    return data[bitpos >> 5u] & (1u << (bitpos & 31u));
}

Looking for more details about "Group varint encoding/decoding" presented in Jeff's slides

I noticed that in Jeff's slides "Challenges in Building Large-Scale Information Retrieval Systems", which can also be downloaded here: http://research.google.com/people/jeff/WSDM09-keynote.pdf, a method of integer compression called "group varint encoding" was mentioned. It was said to be much faster than the 7-bits-per-byte integer encoding (about 2X). I am very interested in this and am looking for an implementation, or any more details that could help me implement it myself.
I am not a pro and am new to this; any help is welcome!
That's referring to "variable integer encoding", where the number of bytes used to store an integer when serialized is not fixed at 4. There is a good description of varint in the protocol buffer documentation.
It is used in encoding Google's protocol buffers, and you can browse the protocol buffer source code.
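As a concrete worked example of the scheme (my own arithmetic, not from the slides): the value 300 serializes to two bytes.
/* varint encoding of 300 (0x12C = binary 1 0010 1100):
   low 7 bits : 0101100 -> 0xAC  (0x2C with the continuation bit 0x80 set)
   next 7 bits: 0000010 -> 0x02  (high bit clear, so this is the last byte)
   300 therefore serializes as the byte pair { 0xAC, 0x02 }. */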
The CodedOutputStream contains the exact encoding function WriteVarint32FallbackToArrayInline:
inline uint8* CodedOutputStream::WriteVarint32FallbackToArrayInline(
    uint32 value, uint8* target) {
  target[0] = static_cast<uint8>(value | 0x80);
  if (value >= (1 << 7)) {
    target[1] = static_cast<uint8>((value >> 7) | 0x80);
    if (value >= (1 << 14)) {
      target[2] = static_cast<uint8>((value >> 14) | 0x80);
      if (value >= (1 << 21)) {
        target[3] = static_cast<uint8>((value >> 21) | 0x80);
        if (value >= (1 << 28)) {
          target[4] = static_cast<uint8>(value >> 28);
          return target + 5;
        } else {
          target[3] &= 0x7F;
          return target + 4;
        }
      } else {
        target[2] &= 0x7F;
        return target + 3;
      }
    } else {
      target[1] &= 0x7F;
      return target + 2;
    }
  } else {
    target[0] &= 0x7F;
    return target + 1;
  }
}
The cascading ifs only append additional bytes to the target array if the magnitude of value warrants them. OR'ing 0x80 sets the high (continuation) bit of each byte as it is written, while the value is shifted down 7 bits at a time. The final 0x7F mask then clears the continuation bit on whichever byte turned out to be last, marking the end of the encoding. So when reading varints, you read until you get a byte with a zero in the highest bit.
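For the reading side, here is a minimal sketch of that rule (my own, not the actual protobuf CodedInputStream code): accumulate 7 bits per byte, lowest group first, until a byte arrives with the high bit clear.
#include <stdint.h>

const uint8_t* ReadVarint32(const uint8_t* p, uint32_t* value) {
    uint32_t result = 0;
    for (int shift = 0; shift <= 28; shift += 7) {
        uint8_t b = *p++;
        result |= (uint32_t)(b & 0x7F) << shift;
        if ((b & 0x80) == 0)  /* continuation bit clear: last byte */
            break;
    }
    *value = result;
    return p;  /* points just past the encoded varint */
}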
I just realized you asked about "Group varint encoding" specifically. Sorry, that code is basic varint encoding, i.e., the 7-bits-per-byte scheme itself. The basic idea of the group variant looks similar. Unfortunately, it's not what's being used to store 64-bit numbers in protocol buffers, so you won't find group varint there. I wouldn't be surprised if that code were open-sourced somewhere, though.
Using the ideas from varint and the diagrams of "Group varint" from the slides, it shouldn't be too hard to cook up your own :)
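To illustrate, here is what a sketch of the encoding side could look like (my own, based on the slides' diagrams, not a reference implementation): four values packed behind one selector byte whose 2-bit fields each hold length-1. It assumes a little-endian host, and note that the decoder shown further down additionally accumulates deltas, so values would be delta-encoded before this step.
#include <stdint.h>
#include <string.h>

/* Returns the number of bytes written (5..17). 'out' must have room for
   the worst case: 1 selector byte + 4 * 4 value bytes. */
size_t EncodeGroupVarInt(const uint32_t values[4], uint8_t* out) {
    uint8_t* start = out++;           /* reserve the selector byte */
    uint8_t selector = 0;
    for (int i = 0; i < 4; ++i) {
        uint32_t v = values[i];
        int len = (v < (1u << 8))  ? 1
                : (v < (1u << 16)) ? 2
                : (v < (1u << 24)) ? 3 : 4;
        selector |= (uint8_t)((len - 1) << (2 * i));
        memcpy(out, &v, (size_t)len); /* low-order bytes, little-endian */
        out += len;
    }
    *start = selector;
    return (size_t)(out - start);
}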
Here is another page describing Group VarInt compression, which contains decoding code. Unfortunately they allude to publicly available implementations, but they don't provide references.
// Note: the unaligned uint32_t loads below assume a little-endian machine
// that tolerates unaligned access (e.g. x86).
void DecodeGroupVarInt(const byte* compressed, int size, uint32_t* uncompressed) {
    const uint32_t MASK[4] = { 0xFF, 0xFFFF, 0xFFFFFF, 0xFFFFFFFF };
    const byte* limit = compressed + size;
    uint32_t current_value = 0;
    while (compressed != limit) {
        const uint32_t selector = *compressed++;

        const uint32_t selector1 = (selector & 3);
        current_value += *((uint32_t*)(compressed)) & MASK[selector1];
        *uncompressed++ = current_value;
        compressed += selector1 + 1;

        const uint32_t selector2 = ((selector >> 2) & 3);
        current_value += *((uint32_t*)(compressed)) & MASK[selector2];
        *uncompressed++ = current_value;
        compressed += selector2 + 1;

        const uint32_t selector3 = ((selector >> 4) & 3);
        current_value += *((uint32_t*)(compressed)) & MASK[selector3];
        *uncompressed++ = current_value;
        compressed += selector3 + 1;

        const uint32_t selector4 = (selector >> 6);
        current_value += *((uint32_t*)(compressed)) & MASK[selector4];
        *uncompressed++ = current_value;
        compressed += selector4 + 1;
    }
}
I was looking for the same thing and found this GitHub project in Java:
https://github.com/stuhood/gvi/
Looks promising!
Instead of decoding with bitmasks, in C/C++ you could use predefined structures that correspond to the value in the first byte. A complete example that uses this: http://www.oschina.net/code/snippet_12_5083
Another Java implementation of group varint: https://github.com/catenamatteo/groupvarint
But I suspect the very large switch has some drawbacks in Java.
