How is memory allocated to array of const char* - char

Memory is allocated in a contiguous manner in an integer array. But then how is it allocated for an array of strings?
Suppose I have array of string like this:
const char *A[] = {"abcx" , "dbba" , "cccc"};
Does every character of the string get its own address? Something like this:
1000 -> a
1001 -> b
1002 -> c
1003 -> x
1004 -> \0
1005 -> d
1006 -> b
1007 -> b
1008 -> a
1009 -> \0
.
.
When I perform this operation:
char var = *(A+1) - *A + 1;
cout<<(*A+var)<<endl;
Output: 'bba'
So what is happening under the hood?

const char *A[] = {"abcx" , "dbba" , "cccc"}; is not an array of strings. It is an array of pointers to strings.
For each of "abcx", "dbba", and "cccc", the compiler creates and initializes an array of char, and it assigns memory for those arrays, commonly in a read-only data section of the program. They might be next to each other in that order, they might be in a different order, or they might be scattered and mingled with other data.
Then const char *A[] defines A to be an array of pointers to char. The compiler will figure out how many elements are in that array when it counts the initializers, and it will allocate space for that array. Since there are three initializers, it will provide space for three elements of type char *.
Then = {"abcx" , "dbba" , "cccc"} will cause the compiler to initialize each array element to point to the first character of the corresponding string.
The total space for A, which you might evaluate as (char *) &A[3] - (char *) &A[0], will be three times the space for a char *, which would be 12 or 24 bytes in C++ implementations that use four or eight bytes for pointers to char, respectively.
In *(A+1) - *A + 1, *(A+1) is element 1 of the array, A[1]. This element is a pointer to the first byte of "dbba". And *A is element 0 of the array, A[0], which is a pointer to the first byte of "abcx". As the compiler has discretion about where to put those strings, we generally cannot expect any particular relationship between them, and C++ does not define the results of subtracting these two pointers.
In C++ implementations that use a flat (simple) address space, *(A+1) - *A + 1 may operate on the memory addresses in the natural way. (This cannot be relied upon, because a compiler does not necessarily implement this undefined expression by performing the nominally indicated arithmetic. If it recognizes the undefined behavior, optimization by the compiler may transform it into any other code.) If the arithmetic is performed in that way, and the later evaluation of (*A+var) is performed similarly, then the result may be effectively *(A+1) - *A + 1 + *A = *(A+1) + 1, which is a pointer to character 1 of "dbba" (starting from “d” as character 0). So it is unsurprising that the output is “bba”.
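To see what is and is not reliable here, a minimal C++ sketch (the addresses printed are whatever your implementation happens to choose):

#include <iostream>

int main() {
    const char *A[] = {"abcx", "dbba", "cccc"};
    // The array holds three pointers, not the characters themselves.
    std::cout << sizeof A / sizeof A[0] << '\n';   // 3
    // Well-defined pointer arithmetic stays within one string literal:
    std::cout << A[1] + 1 << '\n';                 // bba
    // Where the literals actually live is up to the implementation:
    std::cout << static_cast<const void *>(A[0]) << ' '
              << static_cast<const void *>(A[1]) << '\n';
}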

Go multiple assignment order of pointers and slices

I need your help with a question. The Go docs say:
"The assignment proceeds in two phases. First, the operands of index expressions and pointer indirections (including implicit pointer indirections in selectors) on the left and the expressions on the right are all evaluated in the usual order. Second, the assignments are carried out in left-to-right order." (Assignment statements)
From the text above I can assume that pointers and index expressions should be carried out together in the standard order, but it looks like Go carries out indexes first, then pointers, then everything else.
x := []int{1}
var a *[]int
a = &x
x[0], *a, x[0] = 1, []int{1, 2}, (*a)[1]
//result: index out of range [1] with length 1 (*a)[1]
However, I expected that *a would by then hold the new slice with capacity 2, but it does not.
Another example, to test the order of pointers and slices:
x[0], *a, x[0] = 1, []int{1, 2}, 999 //result: [1,2]
I expected that, during the left-to-right order, *a (and therefore x) would already hold the new slice, so the expected result is [999,2].
To be more sure, we can modify the previous example to:
*a, x[0] = nil, 666 //result: [] - but not a panic
It looks like Go has three phases:
Carry out all indexes
Carry out all pointers
Carry out everything else
Am I understanding it right? What is the real order of pointers and slices?
Thanks in advance!
what is the real order of pointers and slices?
Left to right, just as the docs say.
You've quoted the right section of the spec to answer your question, but it seems you misunderstand the language used. Read it plainly:
First, the operands of index expressions and pointer indirections (including implicit pointer indirections in selectors) on the left and the expressions on the right are all evaluated in the usual order. Second, the assignments are carried out in left-to-right order.
Now look at the first example:
x := []int{1}
var a *[]int
a = &x
x[0], *a, x[0] = 1, []int{1, 2}, (*a)[1]
When (*a)[1] is evaluated, none of the assignments on that line are carried out yet. Hence, the words "First" and "Second" in the quoted section. So, it tries to index []int{1}[1], which is invalid.
For the second example, all you must understand is that the expression x[0] corresponds to slot 0 of the slice that x denotes when the expression is evaluated. It doesn't matter if x gets reassigned after x[0] is evaluated; the already-evaluated x[0] still refers to slot 0 of the original slice.
The third example uses the same knowledge as the second.
The subtlety you may have not understood before is that index expressions and pointer indirections do not yield values, they yield variables. Slice/array elements are also considered to be variables for this purpose, so you can imagine a slice's underlying data as a series of distinct variables stored back-to-back. Thus, an index expression of x[0] resolves to some specific variable in memory that no longer depends on the value of x whatsoever. Remember, x is not a slice per se. x is just a variable that can denote some slice, or no slice at all, and that can change throughout the lifetime of x.
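A minimal sketch making the two phases visible (same setup as the second example above):

package main

import "fmt"

func main() {
	x := []int{1}
	a := &x
	// Phase 1: both x[0] operands resolve to element 0 of the original
	// backing array, *a resolves to the variable x, and the right-hand
	// expressions 1, []int{1, 2}, and 999 are evaluated.
	// Phase 2: the assignments run left to right into those fixed targets.
	x[0], *a, x[0] = 1, []int{1, 2}, 999
	fmt.Println(x) // [1 2]: x now denotes the new slice; the 999 went into
	// element 0 of the old backing array, which nothing denotes anymore.
}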

Hash function for strings without respecting char's position

The title of my question is self-descriptive.
I need to hash a structure of three 64-bit variables (I'll convert them to a string of chars), each one containing a hand of cards (this is a card game app), so swapping some chars within these variables should produce the same hash.
One approach would be to sort the resulting string. Is there any better solution?
If the representation of a hand is similar to a bit set, it is already unordered. For example, if you use a combination of bit masks to represent a combination of cards, say, like this
A♠ - 0x00000001
2♠ - 0x00000002
3♠ - 0x00000004
4♠ - 0x00000008
...
K♠ - 0x00001000
A♥ - 0x00002000
2♥ - 0x00004000
...
then you can represent hands using bit combinations, like this:
A♠ 4♠ 2♥ - 0x00004009
This representation is position-independent, i.e. hands 4♠ A♠ 2♥ and 2♥ 4♠ A♠ would have exactly the same representation as A♠ 4♠ 2♥. You can convert this representation to a string as necessary by iterating the individual bits, and adding a card to the string representation each time that you discover a bit that is set to 1.
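A minimal C sketch of that iteration (card_name() is a hypothetical helper that maps a bit index to a printable card name):

uint64_t hand = 0x00004009; // A♠ 4♠ 2♥, as above
for (int bit = 0; bit < 52; bit++) {
    if (hand & (1ULL << bit)) {
        printf("%s ", card_name(bit)); // hypothetical bit-index-to-card mapping
    }
}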
A representation like this can be used to compute a 32-bit hash code by XOR-ing the upper 32 bits of the representation with the lower 32 bits:
uint64_t hand = ... // A representation of hand similar to what's described above
uint32_t hash = (uint32_t)(hand ^ (hand >> 32));
Currently my cards are represented as bytes, but the bit patterns of two cards can overlap: A♣ = 0x11; 10♣ = 0x12; K♣ = 0x13 ... and so on.
You can convert this representation to the one described above when computing the hash code, and avoid sorting that way:
// Each card is a number from 1 to 53, inclusive
uint8_t hand[HAND_SIZE] = ...; // The hand
uint64_t set = 0;
for (int i = 0 ; i != HAND_SIZE ; i++) {
    set |= (1ULL << hand[i]); // 1ULL so the shift is done on an unsigned 64-bit value
}
uint32_t hash = (uint32_t)(set ^ (set >> 32));
The other way to do this is to count the number of occurrences of each character and then hash the resulting vector (a vector count, where count[c] is the number of occurrences of character c). I would not say that it is better than sorting (the number of distinct characters is fixed, and probably pretty low, so you can use radix sort), but I cannot say that it is worse either. Both approaches run in linear time (moreover, radix sort and counting characters are pretty much the same thing), so there shouldn't be much difference between the two.
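A minimal sketch of the counting approach, reusing the 1..53 card codes from the snippet above, with FNV-1a as an arbitrary choice of byte hash:

uint8_t counts[54] = {0};              // cards are numbered 1 to 53
for (int i = 0; i != HAND_SIZE; i++) {
    counts[hand[i]]++;
}
uint32_t hash = 2166136261u;           // FNV-1a offset basis
for (int c = 1; c != 54; c++) {
    hash = (hash ^ counts[c]) * 16777619u; // FNV-1a prime
}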

Strange pointer arithmetic

I came across some very strange behaviour with pointer arithmetic. I am developing a program to access an SD card from an LPC2148, using the ARM GNU toolchain (on Linux). A sector of my SD card contains data (in hex) like this (checked with the Linux "xxd" command):
fe 2a 01 34 21 45 aa 35 90 75 52 78
While printing individual bytes, it prints perfectly:
char *ch = buffer; /* char buffer[512]; */
for (int i = 0; i < 12; i++)
debug("%x ", *ch++);
Here the debug function sends output over the UART.
However, pointer arithmetic, especially adding a number that is not a multiple of 4, gives very strange results:
uint32_t *p; // uint32_t is a typedef for unsigned long.
p = (uint32_t*)((char*)buffer + 0);
debug("%x ", *p); // prints 34012afe // correct
p = (uint32_t*)((char*)buffer + 4);
debug("%x ", *p); // prints 35aa4521 // correct
p = (uint32_t*)((char*)buffer + 2);
debug("%x ", *p); // prints 0134fe2a // TOO STRANGE??
Am I choosing a wrong compiler option? Please help.
I tried the optimization options -0 and -s, but no change.
I could think of little/big endian, but here I am getting unexpected data (from previous bytes) and no byte-order reversal.
For your code to work, your CPU architecture would have to support unaligned load and store operations.
To the best of my knowledge, it doesn't (I've been using the STM32, which is an ARM-based Cortex).
If you try to read a uint32_t value from an address which is not divisible by the size of uint32_t (i.e. not divisible by 4), then in the "good" case you will just get the wrong output.
I'm not sure what the address of your buffer is, but at least one of the three uint32_t reads that you describe in your question requires the processor to perform an unaligned load operation.
On STM32, you would get a memory-access violation (resulting in a hard-fault exception).
The data-sheet should provide a description of your processor's expected behavior.
UPDATE:
Even if your processor does support unaligned load and store operations, you should try to avoid using them, as it might affect the overall running time (in comparison with "normal" load and store operations).
So in either case, you should make sure that whenever you perform a memory access (read or write) operation of size N, the target address is divisible by N. For example:
uint8_t  x = *(uint8_t  *)y; // 'y' must point to a memory address divisible by 1
uint16_t x = *(uint16_t *)y; // 'y' must point to a memory address divisible by 2
uint32_t x = *(uint32_t *)y; // 'y' must point to a memory address divisible by 4
uint64_t x = *(uint64_t *)y; // 'y' must point to a memory address divisible by 8
In order to ensure this with your data structures, always define them so that every field x is located at an offset which is divisible by sizeof(x). For example:
struct
{
    uint16_t a; // offset 0, divisible by sizeof(uint16_t), which is 2
    uint8_t  b; // offset 2, divisible by sizeof(uint8_t), which is 1
    uint8_t  c; // offset 3, divisible by sizeof(uint8_t), which is 1
    uint32_t d; // offset 4, divisible by sizeof(uint32_t), which is 4
    uint64_t e; // offset 8, divisible by sizeof(uint64_t), which is 8
}
Please note, that this does not guarantee that your data-structure is "safe", and you still have to make sure that every myStruct_t* variable that you are using, is pointing to a memory address divisible by the size of the largest field (in the example above, 8).
SUMMARY:
There are two basic rules that you need to follow:
Every instance of your structure must be located at a memory address which is divisible by the size of the largest field in the structure.
Each field in your structure must be located at an offset (within the structure) which is divisible by the size of that field itself.
Exceptions:
Rule #1 may be violated if the CPU architecture supports unaligned load and store operations. Nevertheless, such operations are usually less efficient (requiring the compiler to add NOPs "in between"). Ideally, one should strive to follow rule #1 even if the compiler does support unaligned operations, and let the compiler know that the data is well aligned (using a dedicated #pragma), in order to allow the compiler to use aligned operations where possible.
Rule #2 may be violated if the compiler automatically generates the required padding. This, of course, changes the size of each instance of the structure. It is advisable to always use explicit padding (instead of relying on the current compiler, which may be replaced at some later point in time).
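For example, a minimal sketch of rule #2 with explicit padding (the field names are made up for illustration):

struct
{
    uint8_t  flags;  // offset 0, divisible by 1
    uint8_t  pad0;   // explicit padding, so that 'len' lands at offset 2
    uint16_t len;    // offset 2, divisible by 2
    uint32_t crc;    // offset 4, divisible by 4
}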
LDR is the ARM instruction to load data. You have lied to the compiler by telling it that the pointer points to a properly aligned 32-bit value. It does not. You pay the price. Here is the LDR documentation:
If the address is not word-aligned, the loaded value is rotated right by 8 times the value of bits [1:0].
See: 4.2.1. LDR and STR, words and unsigned bytes, especially the section Address alignment for word transfers.
Basically, your code behaves like
uint32_t v = *(uint32_t *)((char *)buffer + 0); // the aligned load the CPU actually performs
v = (v >> 16) | (v << 16);                      // rotated by 8 * (address bits [1:0]) = 16 bits
debug("%x ", v);                                // prints 0134fe2a
but it is all encoded in a single instruction on the ARM. This behaviour depends on the ARM CPU type and possibly on co-processor settings. It is also highly non-portable code.
It's called "undefined behavior". Your code casts a value which is not a valid unsigned long * into an unsigned long *. The semantics of that operation are undefined, which means pretty much anything can happen.
In this case, the reason two of your examples behaved as you expected is that you got lucky and buffer happened to be word-aligned. Your third example was not as lucky (if it had been, the other two would not have been), so you ended up with a pointer with extra garbage in the 2 least significant bits. Depending on the version of ARM you are using, that could result in an unaligned read (which it appears is what you were hoping for), or it could result in an aligned read (using the most significant 30 bits of the address) and a rotation (the word rotated by the number of bytes indicated in the least significant 2 bits). It looks pretty clear that the latter is what happened in your third example.
Anyway, technically, all 3 of your example outputs are correct. It would also be correct for the program to crash on all 3 of them.
Basically, don't do that.
A safer alternative is to write the bytes into a uint32_t. Something like:
uint32_t w;
memcpy(&w, buffer, 4);
debug("%x ", w);
memcpy(&w, buffer+4, 4);
debug("%x ", w);
memcpy(&w, buffer+2, 4);
debug("%x ", w);
Of course, that's still assuming sizeof(uint32_t) == 4 && CHAR_BIT == 8, but that's a much safer assumption. (That is, it should work on pretty much any machine with 8-bit bytes.)
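If you also want the result to be independent of the host's byte order, a common sketch is to assemble the word from individual bytes; here, reading the four bytes at offset 2 as a little-endian value:

const unsigned char *b = (const unsigned char *)buffer + 2;
uint32_t w = (uint32_t)b[0]
           | ((uint32_t)b[1] << 8)
           | ((uint32_t)b[2] << 16)
           | ((uint32_t)b[3] << 24);
debug("%x ", w); // 45213401 for the sector data above, on any host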

OpenCL - GPU Vector Math (Instruction Level Parallelism)

This article talks about code optimization and discusses instruction-level parallelism. It gives an example of GPU vector math, where float4 math can be performed on the whole vector rather than on the individual scalars. Example given:
float4 x_neighbor = center.xyxy + float4(-1.0f, 0.0f, 1.0f, 0.0f);
Now my question is can it be used for comparison purposes as well? So in the reduction example, can I do this:
accumulator.xyz = (accumulator.xyz < element.xyz) ? accumulator.xyz : element.xyz;
Thank you.
As already stated by Austin, comparison operators apply to vectors as well.
Point d. in section 6.3 of the standard is the relevant part for you. It says:
The relational operators greater than (>), less than (<), greater than or equal (>=), and less than or equal (<=) operate on scalar and vector types.
It also explains the valid cases:
The two operands are scalars. (...)
One operand is a scalar, and the other is a vector. (...) The scalar type is then widened to a vector that has the same number of components as the vector operand. The operation is done component-wise, resulting in the same size vector.
The two operands are vectors of the same type. In this case, the operation is done component-wise, resulting in the same size vector.
And finally, what these comparison operators return:
The result is a scalar signed integer of type int if the source operands are scalar, and a vector signed integer type of the same size as the source operands if the source operands are vector types.
For scalar types, the relational operators shall return 0 if the specified relation is false and 1 if the specified relation is true.
For vector types, the relational operators shall return 0 if the specified relation is false and –1 (i.e. all bits set) if the specified relation is true. The relational operators always return 0 if either argument is not a number (NaN).
EDIT:
To complete the return-value part a bit, especially after #redrum's comment: it seems odd at first that the true value is -1 for vector types. However, since OpenCL behaves as much like C as possible, it doesn't make a big difference, since everything that is different from 0 is true.
As an example, if you have the vector:
int2 vect = (int2)(0, -1);
This statement will evaluate to true and do something:
if(vect.y){
//Do something
}
Now, note that this isn't valid (not related to the value returned, but only to the fact it is a vector):
if(vect){
//do something
}
This won't compile. However, you can use the functions all and any to evaluate all the elements of a vector in an "if statement":
if(any(vect)){
    //this will evaluate to true in our example
}
Note that the returned value is (from the quick reference card):
int any (Ti x): returns 1 if the MSB in any component of x is set; else 0
So any negative number will do.
But still, why not keep 1 as the returned value for true?
I think the important part is the fact that all bits are set. My guess is that this makes it easy to do bitwise operations on vectors: say you want to keep only the elements smaller than a given value. Thanks to the fact that the "true" value is -1, i.e. 111111...111, you can do something like this:
int4 vect = (int4)(75, 3, 42, 105);
int ref = 50;
int4 result = (vect < ref) & vect;
and result's elements will be: 0, 3, 42, 0
On the other hand, if the returned value were 1 for true, the result would be: 0, 1, 0, 0
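Back to the original question, here is a minimal OpenCL C sketch (values chosen arbitrarily) of the component-wise selection; the ternary operator, like the relational operators, works per component on vectors:

float4 acc  = (float4)(3.0f, 8.0f, 1.0f, 7.0f);
float4 elem = (float4)(5.0f, 2.0f, 4.0f, 7.0f);
// Per component: where acc < elem keep acc, otherwise take elem.
acc.xyz = (acc.xyz < elem.xyz) ? acc.xyz : elem.xyz; // acc is now (3, 2, 1, 7)
// Equivalent built-ins: select(elem.xyz, acc.xyz, acc.xyz < elem.xyz), or simply fmin.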
The OpenCL 1.2 Reference Card from Khronos says of the operators:
Operators [6.3]
These operators behave similarly as in C99, except that operands may include vector types when possible:
+ - * % / -- ++ == != & ~ ^ > < >= <= | ! && || ?: >> << = , op= sizeof

How to translate Text to Binary with Cocoa?

I'm making a simple Cocoa program that can encode text to binary and decode it back to text. I tried to write this myself and was not even close to accomplishing it. Can anyone help me? This has to include two text boxes and two buttons, or whatever is best. Thanks!
There are two parts to this.
The first is to encode the characters of the string into bytes. You do this by sending the string a dataUsingEncoding: message. Which encoding you choose will determine which bytes it gives you for each character. Start with NSUTF8StringEncoding, and then experiment with other encodings, such as NSUnicodeStringEncoding, once you get it working.
The second part is to convert every bit of every byte into either a '0' character or a '1' character, so that, for example, the letter A, encoded in UTF-8 to a single byte, will be represented as 01000001.
So, converting characters to bytes, and converting bytes to characters representing bits. These two are completely separate tasks; the second part should work correctly for any stream of bytes, including any valid stream of encoded characters, any invalid stream of encoded characters, and indeed anything that isn't text at all.
The first part is easy enough:
- (NSString *) stringOfBitsFromEncoding:(NSStringEncoding)encoding
    ofString:(NSString *)inputString
{
    //Encode the characters to bytes using the requested encoding. The bytes are contained in an NSData object, which we receive.
    NSData *data = [inputString dataUsingEncoding:encoding];
    //I did say these were two separate jobs.
    return [self stringOfBitsFromData:data];
}
For the second part, you'll need to loop through the bytes of the data. A C for loop will do the job, and it will look like this:
//This is the method we're using above. I'll leave out the method signature and let you fill that in.
- …
{
    //Find out how many bytes the data object contains.
    NSUInteger length = [data length];
    //Get the pointer to those bytes. “const” here means that we promise not to change the values of any of the bytes. (The compiler may give a warning if we don't include this, since we're not allowed to change these bytes anyway.)
    const char *bytes = [data bytes];
    //We'll store the output here. There are 8 bits per byte, and we'll be putting in one character per bit, so we'll tell NSMutableString that it should make room for (the number of bytes times 8) characters.
    NSMutableString *outputString = [NSMutableString stringWithCapacity:length * 8];
    //The loop. We start by initializing i to 0, then increment it (add 1 to it) after each pass. We keep looping as long as i < length; when i >= length, the loop ends.
    for (NSUInteger i = 0; i < length; ++i) {
        char thisByte = bytes[i];
        for (NSUInteger bitNum = 0; bitNum < 8; ++bitNum) {
            //Call a function, which I'll show the definition of in a moment, that will get the value of a bit at a given index within a given character.
            bool bit = getBitAtIndex(thisByte, bitNum);
            //If this bit is a 1, append a '1' character; if it is a 0, append a '0' character.
            [outputString appendFormat:@"%c", bit ? '1' : '0'];
        }
    }
    return outputString;
}
Bits 101 (or, 1100101)
Bits are literally just digits in base 2. Humans in the Western world usually write out numbers in base 10, but a number is a number no matter what base it's written in, and every character, and every byte, and indeed every bit, is just a number.
Digits—including bits—are counted up from the lowest place, according to the exponent to which the base is raised to find the magnitude of that place. We want bits, so that base is 2, so our place values are:
2^0 = 1: The ones place (the lowest bit)
2^1 = 2: The twos place (the next higher bit)
2^2 = 4: The fours place
2^3 = 8: The eights place
And so on, up to 2^7. (Note that the highest exponent is exactly one lower than the number of digits we're after; in this case, 7 vs. 8.)
If that all reminds you of reading about “the ones place”, “the tens place”, “the hundreds place”, etc. when you were a kid, it should: it's the exact same principle.
So a byte such as 65, which (in UTF-8) completely represents the character 'A', is the sum of:
2^7 × 0 = 0
+ 2^6 × 1 = 64
+ 2^5 × 0 = 0
+ 2^4 × 0 = 0
+ 2^3 × 0 = 0
+ 2^2 × 0 = 0
+ 2^1 × 0 = 0
+ 2^0 × 1 = 1
= 0 + 64 + 0 + 0 + 0 + 0 + 0 + 1
= 64 + 1
= 65
Back when you learned base 10 numbers as a kid, you probably noticed that ten is “10”, one hundred is “100”, etc. This is true in base 2 as well: as 10^x is “1” followed by x “0”s in base 10, so 2^x is “1” followed by x “0”s in base 2. So, for example, sixty-four in base 2 is “1000000” (count the zeroes and compare to the table above).
We are going to use these exact-power-of-two numbers to test each bit in each input byte.
Finding the bit
C has a pair of “shift” operators that will insert zeroes or remove digits at the low end of a number. The former is called “shift left”, and is written as <<, and you can guess the opposite.
We want shift left. We want to shift 1 left by the number of the bit we're after. That is exactly equivalent to raising 2 (our base) to the power of that number; for example, 1 << 6 = 2^6 = “1000000”.
Testing the bit
C has an operator for bit testing, too; it's &, the bitwise AND operator. (Do not confuse this with &&, which is the logical AND operator. && is for using whole true/false values in making decisions; & is one of your tools for working with bits within values.)
Strictly speaking, & does not test single bits; it goes through the bits of both input values, and returns a new value whose bits are the bitwise AND of each input pair. So, for example,
01100101
& 00101011
----------
00100001
Each bit in the output is 1 if and only if both of the corresponding input bits were also 1.
Putting these two things together
We're going to use the shift left operator to give us a number where one bit, the nth bit, is set—i.e., 2^n—and then use the bitwise AND operator to test whether the same bit is also set in our input byte.
//This is a C function that takes a char and an int, promising not to change either one, and returns a bool.
bool getBitAtIndex(const char byte, const int bitNum)
//It could also be a method, which would look like this:
//- (bool) bitAtIndex:(const int)bitNum inByte:(const char)byte
//but you would have to change the code above. (Feel free to try it both ways.)
{
    //Find 2^bitNum, which will be a number with exactly 1 bit set. For example, when bitNum is 6, this number is “1000000” (a single 1 followed by six 0s) in binary.
    const int powerOfTwo = 1 << bitNum;
    //Test whether the same bit is also set in the input byte.
    bool bitIsSet = byte & powerOfTwo;
    return bitIsSet;
}
A bit of magic I should acknowledge
The bitwise AND operator does not evaluate to a single bit; it does not evaluate to only 1 or 0. Remember the above example, in which the & operator returned 33 (binary 00100001).
The bool type is a bit magic: Any time you convert any value to bool, it automatically becomes either 1 or 0. Anything that is not 0 becomes 1; anything that is 0 becomes 0.
The Objective-C BOOL type does not do this, which is why I used bool in the code above. You are free to use whichever you prefer, except that you generally should use BOOL whenever you deal with anything that expects a BOOL, particularly when overriding methods in subclasses or implementing protocols. You can convert back and forth freely, though not losslessly (since bool will change non-zero values as described above).
Oh yeah, you said something about text boxes too
When the user clicks on your button, get the stringValue of your input field, call stringOfBitsFromEncoding:ofString: using a reasonable encoding (such as UTF-8) and that string, and set the resulting string as the new stringValue of your output field.
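A minimal sketch of that action method (the outlet names inputField and outputField are hypothetical; wire them up to match your own controls):

- (IBAction) encodeText:(id)sender {
    NSString *input = [self.inputField stringValue];
    NSString *bits = [self stringOfBitsFromEncoding:NSUTF8StringEncoding
                                           ofString:input];
    [self.outputField setStringValue:bits];
}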
Extra credit: Add a pop-up button with which the user can choose an encoding.
Extra extra credit: Populate the pop-up button with all of the available encodings, without hard-coding or hard-nibbing a list.
