ASCII 7x5 side-feeding characters for LED modules

I am looking at the code for the font file here:
http://www.openobject.org/opensourceurbanism/Bike_POV_Beta_4
The code starts like this:
const byte font[][5] = {
{0x00,0x00,0x00,0x00,0x00}, // 0x20 32
{0x00,0x00,0x6f,0x00,0x00}, // ! 0x21 33
{0x00,0x07,0x00,0x07,0x00}, // " 0x22 34
{0x14,0x7f,0x14,0x7f,0x14}, // # 0x23 35
{0x00,0x07,0x04,0x1e,0x00}, // $ 0x24 36
{0x23,0x13,0x08,0x64,0x62}, // % 0x25 37
{0x36,0x49,0x56,0x20,0x50}, // & 0x26 38
{0x00,0x00,0x07,0x00,0x00}, // ' 0x27 39
{0x00,0x1c,0x22,0x41,0x00}, // ( 0x28 40
{0x00,0x41,0x22,0x1c,0x00}, // ) 0x29 41
{0x14,0x08,0x3e,0x08,0x14}, // * 0x2a 42
{0x08,0x08,0x3e,0x08,0x08}, // + 0x2b 43
and so on...
I am very confused as to how this code works - can someone explain it to me please?
Thanks,
Majd

Each array of 5 bytes = 40 bits, which map onto the 7x5 = 35 pixels of the character grid (the remaining 5 bits, one per byte, are unused). Each byte is one column of the character, with the low 7 bits holding that column's pixels; the '"' and ''' entries only set the lowest bits, so bit 0 appears to be the top row.
When you want to display a character you copy the corresponding 5-byte bitmap to the appropriate memory location (or clock it out to the LEDs one column at a time). Note that the table starts at the space character (0x20), so to display the character X you would copy the data from font['X' - 0x20].
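A minimal sketch of how such a table is typically used for a side-feeding / POV display (write_column() and show_char() here are illustrative names, not the project's actual code):

#include <stdint.h>

// The table from the question ("byte" is uint8_t on Arduino); only declared here.
extern const uint8_t font[][5];

// Hypothetical output routine: lights one 7-pixel column of the LED module.
void write_column(uint8_t bits);

// Feed one ASCII character out sideways, one column at a time.
// The table starts at ' ' (0x20), so subtract that offset first.
void show_char(char c)
{
    const uint8_t *glyph = font[c - 0x20];
    for (int col = 0; col < 5; col++) {
        write_column(glyph[col] & 0x7F);   // low 7 bits = the 7 rows
    }
    write_column(0x00);                    // blank column between characters
}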

Related

How to strip out non-human-readable characters at the start of each line using Xcode

I am trying to set up Xcode to get rid of non-human-readable characters in legacy text files recovered from 8” floppy disks created in 1986. The files were created under QDOS, a proprietary disk operating system, using a text-based Music Composition Language application (aka MCL).
I aim to write a C program that reads the ASCII file character by character, filters out the non-printable characters, and saves the result to a destination file, thereby making it possible to view the file contents in exactly the same format a composer would have seen in 1986.
When Xcode reads the legacy text file, the unwanted character appears as the first human readable character of every line except the first line.
!B=24:Af
* BAR 1
G2,6
* BAR 2 & 3
!G2,1/4:Bf2,1/4:C2,1/4:Ef2,1/4:F3,1/4:G3,35/4:D3:A4
"* BAR 4
#Bf4:G4,2:D3:A4:Bf4
$* BAR 5
%D4,2:C4,3:F5
&* BAR 6
'D4:Bf4:A4,2:G4:D3:?
(* BAR 7 &
A hex dump of the above text file shows the two ASCII bytes $0D (Carriage Return) followed by $1C (File Separator). These two bytes, plus the byte that follows immediately after them, are the characters I am trying to remove.
0000: 1C 1D 21 42 3D 32 34 3A 41 66 0A 1C 1E 2A 20 20 ¿¿!B=24:Af¬¿¿*
0010: 20 20 20 20 20 20 20 20 20 42 41 52 20 31 0A 1C BAR 1¬¿
0020: 1F 47 32 2C 36 0A 1C 20 2A 20 20 20 20 20 20 20 ¿G2,6¬¿ *
0030: 20 20 20 20 42 41 52 20 32 20 26 20 33 0A 1C 21 BAR 2 & 3¬¿!
0040: 47 32 2C 31 2F 34 3A 42 66 32 2C 31 2F 34 3A 43 G2,1/4:Bf2,1/4:C
0050: 32 2C 31 2F 34 3A 45 66 32 2C 31 2F 34 3A 46 33 2,1/4:Ef2,1/4:F3
0060: 2C 31 2F 34 3A 47 33 2C 33 35 2F 34 3A 44 33 3A ,1/4:G3,35/4:D3:
0070: 41 34 0A 1C 22 2A 20 20 20 20 20 20 20 20 20 20 A4¬¿"*
0080: 20 42 41 52 20 34 20 0A 1C 23 42 66 34 3A 47 34 BAR 4 ¬¿#Bf4:G4
0090: 2C 32 3A 44 33 3A 41 34 3A 42 66 34 0A 1C 24 2A ,2:D3:A4:Bf4¬¿$*
00A0: 20 20 20 20 20 20 20 20 20 20 20 42 41 52 20 35 BAR 5
00B0: 0A 1C 25 44 34 2C 32 3A 43 34 2C 33 3A 46 35 0A ¬¿%D4,2:C4,3:F5¬
00C0: 1C 26 2A 20 20 20 20 20 20 20 20 20 20 20 42 41 ¿&* BA
00D0: 52 20 36 0A 1C 27 44 34 3A 42 66 34 3A 41 34 2C R 6¬¿'D4:Bf4:A4,
00E0: 32 3A 47 34 3A 44 33 3A 3F 0A 1C 28 2A 20 20 20 2:G4:D3:?¬¿(*
00F0: 20 20 20 20 20 20 20 20 42 41 52 20 37 20 26 20 BAR 7 &
I created an Xcode Command Line Tool project. When I select Type: Plain Text and Text Encoding: Unicode (UTF-8) in the Xcode Inspector window, the same single printable character is visible. I chose those settings because my macOS locale is en_AU.UTF-8.
The C code that follows makes an identical copy of the text file without filtering out individual characters. Essentially it reads the old file contents and writes a new file successfully. The hex dump of the output file is identical to the hex dump above.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(int argc, const char * argv[])
{
    // Note: fopen() does not expand '~'; a full path is needed when run from Xcode.
    char filename[] = "~/Desktop/MCLRead/bell1.ss";
    printf("MCLRead\n\t%s\n", filename);

    FILE* fin = fopen(filename, "r");
    if (!fin) { perror("input error"); return 0; }
    FILE* fout = fopen("output.txt", "w");
    if (!fout) { perror("fout error"); return 0; }

    fseek(fin, 0, SEEK_END);            // go to the end of file
    size_t filesize = ftell(fin);       // get file size
    fseek(fin, 0, SEEK_SET);            // go back to the beginning

    // allocate enough memory
    char* buffer = malloc(filesize * sizeof(char));

    // read one character at a time (or `fread` the whole file)
    size_t i = 0;
    while (1)
    {
        int c = fgetc(fin);
        if (c == EOF) break;
        buffer[i++] = (char)c;          // save to buffer
    }

    // write the buffer back out unchanged
    fwrite(buffer, 1, i, fout);

    free(buffer);
    fclose(fin);
    fclose(fout);
    return 0;
}
However, when I compile, build and run this in Xcode, the characters are unrecognisable regardless of the Type or Text Encoding settings in the Xcode Inspector window. The following error message appears in the Console window:
error: No such file or directory
Program ended with exit code: 0
When I run the same code in a Terminal window it generates an output text file, but the characters are unrecognisable:
Desktop % gcc main.c
Desktop % ./a.out output.txt
Desktop % cat output.txt
cat prints a string of 128 ? characters on the Terminal command line, even though the file contains more than a thousand characters in total.
Can someone give me any clues for making this text file readable in a format that allows the non-human-readable characters to be stripped from the start of each line?
Please note, I am not asking for help to write the C code but rather what text format will make the unwanted 8-bit characters readable so I can remove them (a slight refinement of the question I asked initially). Any further help would be most appreciated. Thanks in advance.
Note
This post has been revised in response to comments.
The hex dump has been done as text rather than as an image. This offers the most reliable way to share the text file for anyone who wants to test what I have done.
The problem can be solved easily by reading each byte as a 7-bit value into an int rather than a char: the source file is read as raw bytes, the wanted values are saved, and the output can then be read as text.
Note: there is no EOF character; MCL used the word 'END' at the end of the file. Because the file has been salvaged from a floppy-disk image, it sometimes has a trailing string of hex E5 bytes written to the disk when it was formatted. At other times, where the format track has already been overwritten, the file has a trailing string of zeros.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define CR        0x0D    // ASCII Carriage Return
#define FS        0x1C    // ASCII File Separator
#define FD_FORMAT 0xE5    // floppy disk format-track filler

int main(int argc, const char * argv[])
{
    char fname[20];
    printf("\n Enter MCL file name : ");
    scanf("%19s", fname);
    printf("\n\t%s\n", fname);

    int a = 0;                                  // byte before last (checked against CR)
    int b = a;                                  // previous byte (checked against FS)

    FILE* fin = fopen(fname, "r");              // init read
    if (!fin) {
        perror("input error");
        return 0;
    }
    FILE* fout = fopen("output.txt", "w");      // init write
    if (!fout) {
        perror("fout error");
        return 0;
    }

    fseek(fin, 0, SEEK_END);                    // look for end of file
    size_t fsize = ftell(fin);                  // get file size
    fseek(fin, 0, SEEK_SET);                    // go back to the start

    int* buffer = malloc(fsize * sizeof(int));  // allocate buffer
    size_t i = 0;
    while (1)
    {
        int c = fgetc(fin);                     // read one byte at a time
        if (c < CR) break;                      // stop at EOF or any byte below CR (trailing zeros etc.)
        if (c == FD_FORMAT) break;              // stop at floppy-format filler bytes
        printf("\t%X", a);
        printf("\t%X", b);
        if ((a != CR) && (b != FS))             // skip saving c when the previous byte was FS or the one before was CR
        {
            printf("\t%X\n", c);
            buffer[i++] = c;                    // save to buffer
        }
        a = b;
        b = c;
    }
    size_t n = i;                               // number of values actually saved
    for (i = 0; i < n; i++)                     // write out int by int
        fputc(buffer[i], fout);
    free(buffer);
    fclose(fin);
    fclose(fout);
    return 0;
}

How to explain embedded message binary wire format of protocol buffer?

I'm trying to understand the protocol buffer encoding method. When translating a message to binary (or hexadecimal) format, I can't understand how the embedded message is encoded.
I guess maybe it's related to memory addresses, but I can't find the exact relationship.
Here is what I've done.
Step 1: I defined two messages in test.proto file,
syntax = "proto3";
package proto_test;
message Education {
string college = 1;
}
message Person {
int32 age = 1;
string name = 2;
Education edu = 3;
}
Step 2: And then I generated some go code,
protoc --go_out=. test.proto
Step 3: Then I checked the encoded format of the message:
p := proto_test.Person{
    Age:  666,
    Name: "Tom",
    Edu: &proto_test.Education{
        College: "SOMEWHERE",
    },
}
var b []byte
out, err := p.XXX_Marshal(b, true)
if err != nil {
    log.Fatalln("fail to marshal with error: ", err)
}
fmt.Printf("hexadecimal format:% x \n", out)
fmt.Printf("binary format:% b \n", out)
which outputs,
hexadecimal format:08 9a 05 12 03 54 6f 6d 1a fd 96 d1 08 0a 09 53 4f 4d 45 57 48 45 52 45
binary format:[ 1000 10011010 101 10010 11 1010100 1101111 1101101 11010 11111101 10010110 11010001 1000 1010 1001 1010011 1001111 1001101 1000101 1010111 1001000 1000101 1010010 1000101]
What I understand is:
08 - int32 wire type with tag number 1
9a 05 - varint for 666
12 - string wire type with tag number 2
03 - length delimited, which is 3 bytes
54 6f 6d - ASCII for "Tom"
1a - embedded message wire type with tag number 3
fd 96 d1 08 - ? (here is what I don't understand)
0a - string wire type with tag number 1
09 - length delimited, which is 9 bytes
53 4f 4d 45 57 48 45 52 45 - ASCII for "SOMEWHERE"
What does fd 96 d1 08 stand for?
It seems that d1 08 is always there, but fd 96 sometimes changes; I don't know why. Thanks for answering :)
Update
I debugged the marshal process and reported a bug here.
At that location one would expect the number of bytes in the embedded message.
I have repeated your experiment in Python.
msg = Person()
msg.age = 666
msg.name = "Tom"
msg.edu.college = "SOMEWHERE"
I got a different result, the one I would expect: a varint stating the size of the embedded message.
0x08
0x9A, 0x05
0x12
0x03
0x54 0x6F 0x6D
0x1A
0x0B <- equals 11 decimal: the length of the embedded message (the 0x0A tag byte, the 0x09 length byte, and the 9 bytes of "SOMEWHERE").
0x0A
0x09
0x53 0x4F 0x4D 0x45 0x57 0x48 0x45 0x52 0x45
Next I deserialized your bytes:
msg2 = Person()
str = bytearray(b'\x08\x9a\x05\x12\x03\x54\x6f\x6d\x1a\xfd\x96\xd1\x08\x0a\x09\x53\x4f\x4d\x45\x57\x48\x45\x52\x45')
msg2.ParseFromString(str)
print(msg2)
The result of this is perfect:
age: 666
name: "Tom"
edu {
college: "SOMEWHERE"
}
The conclusion I come to is that there are some different ways of encoding in Protobuf. I do not know what was done in this case, but I do know the example of a negative 32-bit varint: a positive int32 is encoded as a varint of up to five bytes, while a negative value is sign-extended to 64 bits and encoded in ten bytes.
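For reference, here is a minimal C sketch of base-128 varint decoding (the general algorithm from the encoding documentation, not the library's actual code); it shows both why 666 becomes 9a 05 and how the expected 0x0b length prefix reads back as 11:

#include <stdint.h>
#include <stddef.h>
#include <stdio.h>

/* Decode one base-128 varint starting at p; returns the number of bytes consumed. */
static size_t decode_varint(const uint8_t *p, uint64_t *out)
{
    uint64_t value = 0;
    size_t i = 0;
    do {
        value |= (uint64_t)(p[i] & 0x7F) << (7 * i);  /* low 7 bits are payload        */
    } while (p[i++] & 0x80);                          /* high bit set means more bytes */
    *out = value;
    return i;
}

int main(void)
{
    const uint8_t age[] = { 0x9A, 0x05 };  /* the encoded 666 from the question           */
    const uint8_t len[] = { 0x0B };        /* the expected embedded-message length prefix */
    uint64_t v;

    decode_varint(age, &v);
    printf("%llu\n", (unsigned long long)v);  /* 666 = 0x1A + (0x05 << 7)          */

    decode_varint(len, &v);
    printf("%llu\n", (unsigned long long)v);  /* 11 = tag + length + "SOMEWHERE"   */
    return 0;
}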

What's wrong with my PNG IDAT chunk?

I'm trying to manually construct a simple, 4x1, uncompressed PNG.
So far, I have:
89504E47 // PNG Header
0D0A1A0A
0000000D // byte length of IHDR chunk contents, 4 bytes, value 13
49484452 // IHDR start - 4 bytes
00000004 // Width 4 bytes }
00000001 // Height 4 bytes }
08 // bit depth 8 = 24/32 bit 1 byte }
06 // color type, 6 - RGBa 1 byte }
00 // compression, 0 = Deflate 1 byte }
00 // filter, 0 = no filter 1 byte }
00 // interlace, 0 = no interlace 1 byte } Total, 13 Bytes
F93C0FCD // CRC of IHDR chunk, 4 bytes
00000013 // byte length of IDAT chunk contents, 4 bytes, value 19
49444154 // IDAT start - 4 bytes
0000 // ZLib 0 compression, 2 bytes }
00 // Filter = 0, 1 bytes }
CC0000FF // Pixel 1, Red-ish, 4 bytes }
00CC00FF // Pixel 2, Green-ish, 4 bytes }
0000CCFF // Pixel 3, Blue-ish, 4 bytes }
CCCCCCCC // Pixel 4, translucent grey, 4 bytes } Total, 19 Bytes
6464C2B0 // CRC of IHDR chunk, 4 bytes
00000000 // byte length of IEND chunk, 4 bytes (value: 0)
49454E44 // IEND start - 4 bytes
AE426082 // CRC of IEND chunk, 4 bytes
Update
I think the issue I'm having is down to the ZLib/Deflate ordering.
I think that I have to include the "Non-compressed blocks format" details from RFC 1951, sec. 3.2.4, but I'm a little unsure as to the interactions. The only examples I can find are for Compressed blocks (understandably!)
So I've now tried:
49444154 // IDAT start - 4 bytes
01 // BFINAL = 1, BTYPE = 00 1 byte }
11EE // LEN & NLEN of data 2 bytes }
00 // Filter = 0, 1 byte }
CC0000FF // Pixel 1, Red-ish, 4 bytes }
00CC00FF // Pixel 2, Green-ish, 4 bytes }
0000CCFF // Pixel 3, Blue-ish, 4 bytes }
CCCCCCCC // Pixel 4, translucent grey, 4 bytes } Total, 19 Bytes
6464C2B0 // CRC of IHDR chunk, 4 bytes
So the whole PNG file is:
89504E47 // PNG Block
0d0a1a0A
0000000D // IHDR Block
49484452
00000004
00000001
08060000
00
F93C0FCD
00000014 // IDAT Block
49444154
0111EE00
CC0000FF
00CC00FF
0000CCFF
CCCCCCCC
6464C2B0
00000000 // IEND Block
49454E44
AE426082
I'd be really grateful for some pointers as to where the issue lies... or even the PNG data for a working file so that I can reverse-engineer it?
Update 2
Thanks to Mark Adler, I've corrected my newbie errors, and now have functional code that can reproduce the result shown in his answer below, i.e. 4x1 pixel image. From this I can now happily produce a 100x1 image!
However, as a last step, I'd hoped, by tweaking the height field in the IHDR and adding additional non-terminal IDATs, to extend this to say a 4 x 2 image. Unfortunately this doesn't appear to work the way I'd expected.
I now have something like...
89504E47 // PNG Header
0D0A1A0A
0000000D // re calc'ed IHDR with 2 rows
49484452
00000004
00000002 // changed - now 2 rows
08
06
00
00
00
7FA87D63 // CRC of IHDR updated
0000001C // row 1 IDAT, non-terminal
49444154
7801
00 // BFINAL = 0, BTYPE = 00
1100EEFF
00
CC0000FF
00CC00FF
0000CCFF
CCCCCCCC
3D3A0892
5D19A623
0000001C // row 2, terminal IDAT, as Mark Adler's answer
49444154
7801
01 // BFINAL = 1, BTYPE = 00
1100EEFF
00
CC0000FF
00CC00FF
0000CCFF
CCCCCCCC
3D3A0892
BA0400B4
00000000
49454E44
AE426082
This:
11EE // LEN & NLEN of data 2 bytes }
is wrong. LEN and NLEN are both 16 bits, not 8 bits. So that needs to be:
1100EEFF // LEN & NLEN of data 4 bytes }
You also need a zlib wrapper around the deflate data. See RFC 1950.
Lastly you will need to update the CRC of the chunk. (Which has the wrong comment by the way -- it should say CRC of IDAT chunk.)
Thusly repaired:
89504E47 // PNG Header
0D0A1A0A
0000000D // byte length of IHDR chunk contents, 4 bytes, value 13
49484452 // IHDR start - 4 bytes
00000004 // Width 4 bytes }
00000001 // Height 4 bytes }
08 // bit depth 8 = 24/32 bit 1 byte }
06 // color type, 6 - RGBa 1 byte }
00 // compression, 0 = Deflate 1 byte }
00 // filter, 0 = no filter 1 byte }
00 // interlace, 0 = no interlace 1 byte } Total, 13 Bytes
F93C0FCD // CRC of IHDR chunk, 4 bytes
0000001C // byte length of IDAT chunk contents, 4 bytes, value 28
49444154 // IDAT start - 4 bytes
7801 // zlib Header 2 bytes }
01 // BFINAL = 1, BTYPE = 00 1 byte }
1100EEFF // LEN & NLEN of data 4 bytes }
00 // Filter = 0, 1 byte }
CC0000FF // Pixel 1, Red-ish, 4 bytes }
00CC00FF // Pixel 2, Green-ish, 4 bytes }
0000CCFF // Pixel 3, Blue-ish, 4 bytes }
CCCCCCCC // Pixel 4, translucent grey, 4 bytes }
3d3a0892 // Adler-32 check 4 bytes }
ba0400b4 // CRC of IDAT chunk, 4 bytes
00000000 // byte length of IEND chunk, 4 bytes (value: 0)
49454E44 // IEND start - 4 bytes
AE426082 // CRC of IEND chunk, 4 bytes
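For cross-checking the numbers above, here is a small C sketch (assuming zlib is available for its crc32() and adler32() helpers; compile with -lz) that assembles the same stored-block IDAT payload for the 4x1 image and prints the chunk length, Adler-32 and CRC:

#include <stdio.h>
#include <string.h>
#include <zlib.h>   /* crc32() and adler32() */

int main(void)
{
    /* One scanline: filter byte 0 followed by four RGBA pixels. */
    unsigned char raw[] = {
        0x00,
        0xCC, 0x00, 0x00, 0xFF,   /* red-ish          */
        0x00, 0xCC, 0x00, 0xFF,   /* green-ish        */
        0x00, 0x00, 0xCC, 0xFF,   /* blue-ish         */
        0xCC, 0xCC, 0xCC, 0xCC,   /* translucent grey */
    };
    unsigned len = sizeof raw;                /* 17 = 0x0011 */

    /* Chunk type + zlib header + stored deflate block + Adler-32. */
    unsigned char chunk[4 + 2 + 5 + sizeof raw + 4];
    size_t n = 0;
    memcpy(chunk + n, "IDAT", 4);             n += 4;
    chunk[n++] = 0x78; chunk[n++] = 0x01;     /* zlib header (RFC 1950)  */
    chunk[n++] = 0x01;                        /* BFINAL = 1, BTYPE = 00  */
    chunk[n++] = len & 0xFF;                  /* LEN, little-endian      */
    chunk[n++] = (len >> 8) & 0xFF;
    chunk[n++] = ~len & 0xFF;                 /* NLEN = one's complement */
    chunk[n++] = (~len >> 8) & 0xFF;
    memcpy(chunk + n, raw, sizeof raw);       n += sizeof raw;

    unsigned long ad = adler32(adler32(0, NULL, 0), raw, sizeof raw);
    chunk[n++] = ad >> 24; chunk[n++] = ad >> 16; chunk[n++] = ad >> 8; chunk[n++] = ad;

    /* The chunk CRC covers the type and data, but not the length field. */
    unsigned long crc = crc32(crc32(0, NULL, 0), chunk, n);

    printf("IDAT data length: %08lX\n", (unsigned long)(n - 4)); /* expect 0000001C */
    printf("Adler-32:         %08lX\n", ad);                     /* expect 3D3A0892 */
    printf("chunk CRC:        %08lX\n", crc);                    /* expect BA0400B4 */
    return 0;
}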

Measuring memory access time x86

I am trying to measure cached / non-cached memory access time and the results are confusing me.
Here is the code:
#include <stdio.h>
#include <string.h>     // for memset
#include <x86intrin.h>
#include <stdint.h>

#define SIZE 32*1024

char arr[SIZE];

int main()
{
    char *addr;
    unsigned int dummy;
    uint64_t tsc1, tsc2;
    unsigned i;
    volatile char val;

    memset(arr, 0x0, SIZE);
    for (addr = arr; addr < arr + SIZE; addr += 64) {   // flush every cache line of arr
        _mm_clflush((void *) addr);
    }
    asm volatile("sfence\n\t"
                 :
                 :
                 : "memory");

    tsc1 = __rdtscp(&dummy);
    for (i = 0; i < SIZE; i++) {
        asm volatile (
            "mov %0, %%al\n\t" // load data
            :
            : "m" (arr[i])
        );
    }
    tsc2 = __rdtscp(&dummy);
    printf("(1) tsc: %llu\n", tsc2 - tsc1);

    tsc1 = __rdtscp(&dummy);
    for (i = 0; i < SIZE; i++) {
        asm volatile (
            "mov %0, %%al\n\t" // load data
            :
            : "m" (arr[i])
        );
    }
    tsc2 = __rdtscp(&dummy);
    printf("(2) tsc: %llu\n", tsc2 - tsc1);

    return 0;
}
the output:
(1) tsc: 451248
(2) tsc: 449568
I expected that the first value would be much larger, because the caches were invalidated by clflush in case (1).
Info about my CPU (Intel(R) Core(TM) i7 CPU Q 720 @ 1.60GHz) caches:
Cache ID 0:
- Level: 1
- Type: Data Cache
- Sets: 64
- System Coherency Line Size: 64 bytes
- Physical Line partitions: 1
- Ways of associativity: 8
- Total Size: 32768 bytes (32 kb)
- Is fully associative: false
- Is Self Initializing: true
Cache ID 1:
- Level: 1
- Type: Instruction Cache
- Sets: 128
- System Coherency Line Size: 64 bytes
- Physical Line partitions: 1
- Ways of associativity: 4
- Total Size: 32768 bytes (32 kb)
- Is fully associative: false
- Is Self Initializing: true
Cache ID 2:
- Level: 2
- Type: Unified Cache
- Sets: 512
- System Coherency Line Size: 64 bytes
- Physical Line partitions: 1
- Ways of associativity: 8
- Total Size: 262144 bytes (256 kb)
- Is fully associative: false
- Is Self Initializing: true
Cache ID 3:
- Level: 3
- Type: Unified Cache
- Sets: 8192
- System Coherency Line Size: 64 bytes
- Physical Line partitions: 1
- Ways of associativity: 12
- Total Size: 6291456 bytes (6144 kb)
- Is fully associative: false
- Is Self Initializing: true
Code disassembly between two rdtscp instructions
400614: 0f 01 f9 rdtscp
400617: 89 ce mov %ecx,%esi
400619: 48 8b 4d d8 mov -0x28(%rbp),%rcx
40061d: 89 31 mov %esi,(%rcx)
40061f: 48 c1 e2 20 shl $0x20,%rdx
400623: 48 09 d0 or %rdx,%rax
400626: 48 89 45 c0 mov %rax,-0x40(%rbp)
40062a: c7 45 b4 00 00 00 00 movl $0x0,-0x4c(%rbp)
400631: eb 0d jmp 400640 <main+0x8a>
400633: 8b 45 b4 mov -0x4c(%rbp),%eax
400636: 8a 80 80 10 60 00 mov 0x601080(%rax),%al
40063c: 83 45 b4 01 addl $0x1,-0x4c(%rbp)
400640: 81 7d b4 ff 7f 00 00 cmpl $0x7fff,-0x4c(%rbp)
400647: 76 ea jbe 400633 <main+0x7d>
400649: 48 8d 45 b0 lea -0x50(%rbp),%rax
40064d: 48 89 45 e0 mov %rax,-0x20(%rbp)
400651: 0f 01 f9 rdtscp
Looks like I'm missing / misunderstanding something. Could you offer any suggestions?
mov %0, %%al is so slow (one cache line per 64 clocks, or per 32 clocks on Sandybridge specifically (not Haswell or later)) that you might bottleneck on that whether or not your loads are ultimately coming from DRAM or L1D.
Only every 64-th load will miss in cache, because you're taking full advantage of spatial locality with your tiny byte-load loop. If you actually wanted to test how fast the cache can refill after flushing an L1D-sized block, you should use a SIMD movdqa loop, or just byte loads with a stride of 64. (You only need to touch one byte per cache line).
To avoid the false dependency on the old value of RAX, you should use movzbl %0, %eax. This will let Sandybridge and later (or AMD since K8) use their full load throughput of 2 loads per clock to keep the memory pipeline closer to full. Multiple cache misses can be in flight at once: Intel CPU cores have 10 LFBs (line fill buffers) for lines to/from L1D, or 16 Superqueue entries for lines from L2 to off-core. See also Why is Skylake so much better than Broadwell-E for single-threaded memory throughput?. (Many-core Xeon chips have worse single-thread memory bandwidth than desktops/laptops.)
But your bottleneck is far worse than that!
You compiled with optimizations disabled, so your loop uses addl $0x1,-0x4c(%rbp) for the loop counter, which gives you at least a 6-cycle loop-carried dependency chain. (Store/reload store-forwarding latency + 1 cycle for the ALU add.) http://agner.org/optimize/
(Maybe even higher because of resource conflicts for the load port. i7-720 is a Nehalem microarchitecture, so there's only one load port.)
This definitely means your loop doesn't bottleneck on cache misses, and will probably run about the same speed whether you used clflush or not.
Also note that rdtsc counts reference cycles, not core clock cycles, i.e. it always ticks at the CPU's fixed reference frequency (about 1.6GHz for your 1.60GHz CPU), regardless of the CPU running slower (powersave) or faster (Turbo). Control for this with a warm-up loop.
You also didn't declare a clobber on eax, so the compiler isn't expecting your code to modify rax. You end up with mov 0x601080(%rax),%al. But gcc reloads rax from memory every iteration, and doesn't use the rax that you modify, so you aren't actually skipping around in memory like you might be if you'd compiled with optimizations.
Hint: use volatile char * if you want to get the compiler to actually load, and not optimize it to fewer wider loads. You don't need inline asm for this.
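A sketch of the kind of measurement loop the answer is pointing towards (volatile pointer so the compiler emits real loads, one access per 64-byte line, built with optimization enabled, e.g. gcc -O2); the exact cycle counts will still depend on the CPU:

#include <stdio.h>
#include <string.h>
#include <stdint.h>
#include <x86intrin.h>

#define SIZE (32 * 1024)
#define LINE 64

static char arr[SIZE];

/* Touch one byte per cache line; the volatile pointer forces a real load each time. */
static uint64_t time_pass(void)
{
    unsigned int aux;
    volatile char *p = arr;
    char sink = 0;

    uint64_t t0 = __rdtscp(&aux);
    for (size_t i = 0; i < SIZE; i += LINE)
        sink ^= p[i];
    uint64_t t1 = __rdtscp(&aux);
    (void)sink;
    return t1 - t0;
}

int main(void)
{
    memset(arr, 0, SIZE);

    /* Flush every line of arr, then fence so the flushes are done before timing. */
    for (size_t i = 0; i < SIZE; i += LINE)
        _mm_clflush((void *)&arr[i]);
    _mm_mfence();

    printf("cold: %llu reference cycles\n", (unsigned long long)time_pass());
    printf("warm: %llu reference cycles\n", (unsigned long long)time_pass());
    return 0;
}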

What's the memory layout of UTF-16 encoded strings with Visual Studio 2015?

WinAPI uses wchar_t buffers. As I understand it, we need to encode all our arguments to WinAPI as UTF-16.
We have two versions of UTF-16: UTF-16BE and UTF-16LE. Let's encode the string "Example" (0x45 0x78 0x61 0x6d 0x70 0x6c 0x65). With UTF-16BE the bytes are placed like this: 00 45 00 78 00 61 00 6d 00 70 00 6c 00 65. With UTF-16LE they are: 45 00 78 00 61 00 6d 00 70 00 6c 00 65 00. (We are omitting the BOM.) The byte representations of the same string are different.
According to the docs, Windows uses UTF-16LE. This means we should encode all strings as UTF-16LE or they would not work.
At the same time, my compiler (VS2015) seems to use UTF-16BE for the strings that I hard-code into my source (something like L"my test string"). But WinAPI works fine with these strings. Why does it work? What am I missing?
Update 1:
To test the byte representation of hard-coded strings I used the following code:
std::string charToHex(wchar_t ch)
{
    const char alphabet[] = "0123456789ABCDEF";
    std::string result(4, ' ');
    result[0] = alphabet[static_cast<unsigned int>((ch & 0xf000) >> 12)];
    result[1] = alphabet[static_cast<unsigned int>((ch & 0xf00) >> 8)];
    result[2] = alphabet[static_cast<unsigned int>((ch & 0xf0) >> 4)];
    result[3] = alphabet[static_cast<unsigned int>(ch & 0xf)];
    return std::move(result);
}
Little-endian or big-endian describes the way that variables of more than 8 bits are stored in memory. The test you have devised doesn't test memory layout; it works with wchar_t values directly, and the upper bits of an integer type are always the upper bits, no matter whether the CPU is big-endian or little-endian!
This modification to your code will show how it really works.
std::string charToHex(wchar_t * pch)
{
    const char alphabet[] = "0123456789ABCDEF";
    std::string result;
    unsigned char * pbytes = reinterpret_cast<unsigned char *>(pch);
    for (size_t i = 0; i < sizeof(wchar_t); ++i)
    {
        result.push_back(alphabet[(pbytes[i] & 0xf0) >> 4]);
        result.push_back(alphabet[pbytes[i] & 0x0f]);
    }
    return result;
}
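To see the actual memory layout, a small sketch like this (plain C, assuming a Windows toolchain where wchar_t is 16 bits) dumps the raw bytes of a wide literal; on a little-endian Windows build they come out as 45 00 78 00 ..., i.e. UTF-16LE:

#include <stdio.h>
#include <wchar.h>

int main(void)
{
    const wchar_t *s = L"Example";
    const unsigned char *bytes = (const unsigned char *)s;

    /* Print the bytes exactly as they sit in memory. */
    for (size_t i = 0; i < wcslen(s) * sizeof(wchar_t); i++)
        printf("%02X ", bytes[i]);
    printf("\n");
    return 0;
}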
