What's the best way to handle an "all combinations" project? - performance

I've been assigned a school project in which I need to come up with as many integers as possible using only the integers 2 3 4 and the operators + - * / %. I then have to output the integers with cout along with how I got that answer. For example:
cout << "2 + 3 - 4 = " << 2 + 3 - 4;
I can only use each integer once per cout statement, and there can be no duplicate answers.
Everyone else seems to be doing the "brute force" method (i.e., copying and pasting the same statements and changing the numbers and operators), but that hardly seems efficient. I figured I would try cycling through each number and operator one-by-one and checking to see if the answer has already been found, but I'm unsure of what the easiest way to do this would be.
I suppose I could use nested loops, but there's still the problem of checking to see if the answer has already been found. I tried storing the answers in a vector, but I couldn't pass the vector to a user-defined function that checked to see if a value existed in the vector.

You could use a std::map or std::unordered_map from the C++ standard library (older toolchains shipped the nonstandard hash_map). These containers store key-value pairs efficiently. Read up on them before you use them, but they might give you a good starting point. Hint: the integers you compute would probably make good keys.
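For instance, a minimal sketch of just the duplicate check, assuming a plain std::set is enough (only the keys matter here):

#include <set>

std::set<int> seen;   // results found so far

// returns true exactly the first time a given result appears
bool isNew(int result) {
    return seen.insert(result).second;   // .second is false if already present
}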

Assuming you can use each of the numbers in the set {2, 3, 4} only once, there are 3! ways of arranging the 3 numbers. Then there are 2 places for an operator, and you have 5 operators (+ - * / %), so there are 5 * 5 = 25 ways to fill them. In total that gives 3! * 25 = 150 expressions.
Then you can create a hash map where the key is the computed number and the value is the expression. If the hash map already contains a key, you skip that expression.
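A sketch of that idea (names are mine; std::map keeps only the first expression found for each value, and the little evaluator mimics C++ operator precedence so the printed expressions match what cout would compute):

#include <algorithm>
#include <iostream>
#include <map>
#include <string>

// Evaluate "a op b" for one operator character.
int apply(int a, char op, int b) {
    switch (op) {
        case '+': return a + b;
        case '-': return a - b;
        case '*': return a * b;
        case '/': return a / b;
        default:  return a % b;
    }
}

int main() {
    int nums[] = {2, 3, 4};
    const char ops[] = {'+', '-', '*', '/', '%'};
    std::map<int, std::string> found;   // result -> first expression producing it

    std::sort(nums, nums + 3);
    do {
        for (char op1 : ops)
            for (char op2 : ops) {
                // respect C++ precedence: * / % bind tighter than + -
                bool op2Tighter = (op1 == '+' || op1 == '-') &&
                                  (op2 != '+' && op2 != '-');
                int result = op2Tighter
                    ? apply(nums[0], op1, apply(nums[1], op2, nums[2]))
                    : apply(apply(nums[0], op1, nums[1]), op2, nums[2]);
                std::string expr = std::to_string(nums[0]) + ' ' + op1 + ' '
                                 + std::to_string(nums[1]) + ' ' + op2 + ' '
                                 + std::to_string(nums[2]);
                found.emplace(result, expr);   // keeps only the first expression
            }
    } while (std::next_permutation(nums, nums + 3));

    for (auto& kv : found)
        std::cout << kv.second << " = " << kv.first << '\n';
}

Division by zero can't happen here: the right-hand operand of / and % is always one of 2, 3, 4.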

You could try a bit of meta-programming, as follows. It has the advantage of using the compiler itself to calculate the expressions rather than you writing your own evaluator (and possibly getting it wrong):
#include <stdlib.h>
#include <iostream>
#include <fstream>
using namespace std;
int main (void) {
    int n1, n2, n3;
    const char *ops[] = {" + ", " - ", " * ", " / ", " % ", 0};
    const char **op1, **op2;
    ofstream of;

    of.open ("prog2.cpp", ios::out);
    of << "#include <iostream>\n";
    of << "using namespace std;\n";
    of << "#define IXCOUNT 49\n\n";
    of << "static int mkIdx (int tot) {\n";
    of << " int ix = (IXCOUNT / 2) + tot;\n";
    of << " if ((ix >= 0) && (ix < IXCOUNT)) return ix;\n";
    of << " cout << \"Need more index space, "
       << "try \" << IXCOUNT + 1 + (ix - IXCOUNT) * 2 << \"\\n\";\n";
    of << " return -1;\n";
    of << "}\n\n";
    of << "int main (void) {\n";
    of << " int tot, ix, used[IXCOUNT];\n\n";
    of << " for (ix = 0; ix < sizeof(used)/sizeof(*used); ix++)\n";
    of << " used[ix] = 0;\n\n";

    for (n1 = 2; n1 <= 4; n1++) {
        for (n2 = 2; n2 <= 4; n2++) {
            if (n2 != n1) {
                for (n3 = 2; n3 <= 4; n3++) {
                    if ((n3 != n1) && (n3 != n2)) {
                        for (op1 = ops; *op1 != 0; op1++) {
                            for (op2 = ops; *op2 != 0; op2++) {
                                of << " tot = " << n1 << *op1 << n2 << *op2 << n3 << ";\n";
                                of << " if ((ix = mkIdx (tot)) < 0) return ix;\n";
                                of << " if (!used[ix])\n";
                                of << " cout << " << n1 << " << \"" << *op1 << "\" << "
                                   << n2 << " << \"" << *op2 << "\" << " << n3
                                   << " << \" = \" << tot << \"\\n\";\n";
                                of << " used[ix] = 1;\n\n";
                            }
                        }
                    }
                }
            }
        }
    }

    of << " return 0;\n";
    of << "}\n";
    of.close();

    system ("g++ -o prog2 prog2.cpp ; ./prog2");
    return 0;
}
This gives you:
2 + 3 + 4 = 9
2 + 3 - 4 = 1
2 + 3 * 4 = 14
2 + 3 / 4 = 2
2 + 3 % 4 = 5
2 - 3 + 4 = 3
2 - 3 - 4 = -5
2 - 3 * 4 = -10
2 - 3 % 4 = -1
2 * 3 + 4 = 10
2 * 3 * 4 = 24
2 / 3 + 4 = 4
2 / 3 - 4 = -4
2 / 3 * 4 = 0
2 % 3 + 4 = 6
2 % 3 - 4 = -2
2 % 3 * 4 = 8
2 * 4 + 3 = 11
2 / 4 - 3 = -3
I'm not entirely certain of the wisdom of handing this in as an assignment however :-)


OpenCL (JOCL) - 2D calculus over two arrays in Kernel

I'm asking this here because I thought I understood how OpenCL works, but... I think there are several things I don't get.
What I want to do is to get the difference between all the values of two arrays, then calculate the hypot of each pair, and finally get the maximum hypot value. So if I have:
double[] arrA = new double[]{1,2,3}
double[] arrB = new double[]{6,7,8}
Calculate
dx1 = 1 - 1; dx2 = 2 - 1; dx3 = 3 - 1, dx4 = 1 - 2;... dxLast = 3 - 3
dy1 = 6 - 6; dy2 = 7 - 6; dy3 = 8 - 6, dy4 = 6 - 7;... dyLast = 8 - 8
(The extreme dx and dy values will be 0, but I don't care about ignoring those cases for now.)
Then calculate each hypot based on hypot(dx(i), dy(i)).
And once all these values are obtained, get the maximum hypot value.
So, I have the following kernel defined:
String programSource =
"#ifdef cl_khr_fp64 \n"
+ " #pragma OPENCL EXTENSION cl_khr_fp64 : enable \n"
+ "#elif defined(cl_amd_fp64) \n"
+ " #pragma OPENCL EXTENSION cl_amd_fp64 : enable \n"
+ "#else "
+ " #error Double precision floating point not supported by OpenCL implementation.\n"
+ "#endif \n"
+ "__kernel void "
+ "sampleKernel(__global const double *bufferX,"
+ " __global const double *bufferY,"
+ " __local double* scratch,"
+ " __global double* result,"
+ " __const int lengthX,"
+ " __const int lengthY){"
+ " const int index_a = get_global_id(0);"//Get the global indexes for 2D reference
+ " const int index_b = get_global_id(1);"
+ " const int local_index = get_local_id(0);"//Current thread id -> Should be the same as index_a * index_b + index_b;
+ " if (local_index < (lengthX * lengthY)) {"// Load data into local memory
+ " if(index_a < lengthX && index_b < lengthY)"
+ " {"
+ " double dx = (bufferX[index_b] - bufferX[index_a]);"
+ " double dy = (bufferY[index_b] - bufferY[index_a]);"
+ " scratch[local_index] = hypot(dx, dy);"
+ " }"
+ " } "
+ " else {"
+ " scratch[local_index] = 0;"// Infinity is the identity element for the min operation
+ " }"
//Make a Barrier to make sure all values were set into the local array
+ " barrier(CLK_LOCAL_MEM_FENCE);"
//If someone can explain to me the offset thing I'll really apreciate that...
//I just know there is alway a division by 2
+ " for(int offset = get_local_size(0) / 2; offset > 0; offset >>= 1) {"
+ " if (local_index < offset) {"
+ " float other = scratch[local_index + offset];"
+ " float mine = scratch[local_index];"
+ " scratch[local_index] = (mine > other) ? mine : other;"
+ " }"
+ " barrier(CLK_LOCAL_MEM_FENCE);"
//A barrier to make sure that all values where checked
+ " }"
+ " if (local_index == 0) {"
+ " result[get_group_id(0)] = scratch[0];"
+ " }"
+ "}";
For this case, the defined GWG size is (100, 100, 0) and the LWI size is (10, 10, 0).
So, for this example, both arrays have size 10, and the GWG and LWI sizes are obtained as follows:
//clGetKernelWorkGroupInfo(kernel, device, CL.CL_KERNEL_WORK_GROUP_SIZE, Sizeof.size_t, Pointer.to(buffer), null);
long kernel_work_group_size = OpenClUtil.getKernelWorkGroupSize(kernel, device.getCl_device_id(), 3);
//clGetDeviceInfo(device, CL_DEVICE_MAX_WORK_ITEM_SIZES, Sizeof.size_t * numValues, Pointer.to(buffer), null);
long[] maxSize = device.getMaximumSizes();
maxSize[0] = ( kernel_work_group_size > maxSize[0] ? maxSize[0] : kernel_work_group_size);
maxSize[1] = ( kernel_work_group_size > maxSize[1] ? maxSize[1] : kernel_work_group_size);
maxSize[2] = ( kernel_work_group_size > maxSize[2] ? maxSize[2] : kernel_work_group_size);
// maxSize[2] =
long xMaxSize = (x > maxSize[0] ? maxSize[0] : x);
long yMaxSize = (y > maxSize[1] ? maxSize[1] : y);
long zMaxSize = (z > maxSize[2] ? maxSize[2] : z);
long local_work_size[] = new long[] { xMaxSize, yMaxSize, zMaxSize };
int numWorkGroupsX = 0;
int numWorkGroupsY = 0;
int numWorkGroupsZ = 0;
if(local_work_size[0] != 0)
numWorkGroupsX = (int) ((total + local_work_size[0] - 1) / local_work_size[0]);
if(local_work_size[1] != 0)
numWorkGroupsY = (int) ((total + local_work_size[1] - 1) / local_work_size[1]);
if(local_work_size[2] != 0)
numWorkGroupsZ = (int) ((total + local_work_size[2] - 1) / local_work_size[2]);
long global_work_size[] = new long[] { numWorkGroupsX * local_work_size[0],
numWorkGroupsY * local_work_size[1], numWorkGroupsZ * local_work_size[2]};
The thing is I'm not getting the expected values, so I decided to make some tests based on a smaller kernel, returning the [VARIABLE TO TEST VALUES] object in a result array:
/**
* The source code of the OpenCL program to execute
*/
private static String programSourceA =
"#ifdef cl_khr_fp64 \n"
+ " #pragma OPENCL EXTENSION cl_khr_fp64 : enable \n"
+ "#elif defined(cl_amd_fp64) \n"
+ " #pragma OPENCL EXTENSION cl_amd_fp64 : enable \n"
+ "#else "
+ " #error Double precision floating point not supported by OpenCL implementation.\n"
+ "#endif \n"
+ "__kernel void "
+ "sampleKernel(__global const double *bufferX,"
+ " __global const double *bufferY,"
+ " __local double* scratch,"
+ " __global double* result,"
+ " __const int lengthX,"
+ " __const int lengthY){"
//Get the global indexes for 2D reference
+ " const int index_a = get_global_id(0);"
+ " const int index_b = get_global_id(1);"
//Current thread id -> Should be the same as index_a * index_b + index_b;
+ " const int local_index = get_local_id(0);"
// Load data into local memory
//Only print values if index_a < ArrayA length
//Only print values if index_b < ArrayB length
//Only print values if local_index < (lengthX * lengthY)
//Only print values if this is the first work group.
+ " if (local_index < (lengthX * lengthY)) {"
+ " if(index_a < lengthX && index_b < lengthY)"
+ " {"
+ " double dx = (bufferX[index_b] - bufferX[index_a]);"
+ " double dy = (bufferY[index_b] - bufferY[index_a]);"
+ " result[local_index] = hypot(dx, dy);"
+ " }"
+ " } "
+ " else {"
// Infinity is the identity element for the min operation
+ " result[local_index] = 0;"
+ " }"
The returned values are far from the expected ones but, if the [VARIABLE TO TEST VALUES] is (index_a * index_b) + index_a, almost every value of the returned array has the correct (index_a * index_b) + index_a value, I mean:
result[0] -> 0
result[1] -> 1
result[2] -> 2
....
result[97] -> 97
result[98] -> 98
result[99] -> 99
but some values are: -3.350700319577517E-308...
What am I not doing correctly?
I hope this is well explained and not so long that it makes you angry with me...
Thank you so much!
TomRacer
You have many problems in your code, and some of them are conceptual. I think you should read the standard or an OpenCL guide completely before starting to code, because some of the calls you are using behave differently from what you expect.
1. Work-groups and work-items are NOT like CUDA. If you want 100x100 work-items separated into 10x10 work-groups, you use (100x100) as the global size and (10x10) as the local size. Unlike CUDA, the global size is not multiplied by the local size internally.
1.1. In your test code, if you are using 10x10 with 10x10, then you are not filling the whole space; the unfilled area will still contain garbage like -X.xxxxxE-308.
2. You should not use lengthX and lengthY and put a lot of ifs in your code. OpenCL has a way to enqueue kernels with offsets and with a specific number of items, so you can control this from the host side. BTW, doing it with ifs is a performance loss and is never good practice, since the code is less readable.
3. get_local_size(0) gives you the local size of axis 0 (10 in your case). What is it that you do not understand about this call? Why do you always divide it by 2?
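For point 1, a minimal host-side sketch (using the plain C OpenCL API; JOCL's clEnqueueNDRangeKernel takes the same global/local size arrays, and queue and kernel are assumed to already exist):

// 100x100 work-items arranged in 10x10 work-groups.
// global_work_size is the TOTAL number of work-items,
// NOT the number of work-groups as in CUDA.
size_t global_work_size[2] = {100, 100};
size_t local_work_size[2]  = {10, 10};

cl_int err = clEnqueueNDRangeKernel(queue, kernel,
                                    2,                // work_dim: a 2D range
                                    NULL,             // global offset (see point 2)
                                    global_work_size,
                                    local_work_size,
                                    0, NULL, NULL);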
I hope this can help you in your debugging process.
Cheers
Thank you for your answer. First of all, this kernel code is based on the commutative reduction code explained here: http://developer.amd.com/resources/documentation-articles/articles-whitepapers/opencl-optimization-case-study-simple-reductions/.
So I'm using that code, but I added some things like the 2D operations.
Regarding the points you mentioned before:
1.1- Actually the global work group size is (100, 100, 0)... That 100 is the result of multiplying 10 x 10, where 10 is the current array size, so my global work group size is based on this rule... then the local work item size is (10, 10, 0).
Global work group size must be a multiple of local work item size; I have read this in many examples and I think this is ok.
1.2- In my test code I'm using the same arrays; in fact, if I change the array size, the GWG size and LWI size will change dynamically.
2.1- There are not so many "if"s there, there are just 3. The first one checks whether I must compute the hypot() based on the array objects or fill that object with zero.
The second and third "if"s are just part of the reduction algorithm, which seems to be fine.
2.2- Regarding lengthX and lengthY, yeah, you are right, but I haven't got that yet. How should I use that?
3.1- Yeah, I know that, but I realized that I'm not using the Y axis id, so maybe there is another problem here.
3.2- The reduction algorithm iterates over each pair of elements stored in the scratch variable, checking for the maximum value between them, so on each pass of the "for" it reduces the number of elements to be computed to half of the previous quantity.
Also, I'm going to post some changes to the main kernel code and the test kernel code, because there were some errors.
Greetings...!!!

Total probability of a given answer of a given number of additions [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
Questions asking for code must demonstrate a minimal understanding of the problem being solved. Include attempted solutions, why they didn't work, and the expected results. See also: Stack Overflow question checklist
Closed 9 years ago.
I'm doing C++, and I want to find out the simplest way to find the total probability of a given answer of a given number of additions.
For example, the given answer is 5, and the given number of additions is 4 (x+x+x+x). The total probability that I want to find is 4:
1) 1 + 1 + 1 + 2 = 5
2) 1 + 1 + 2 + 1 = 5
3) 1 + 2 + 1 + 1 = 5
4) 2 + 1 + 1 + 1 = 5
Another example, the given answer is 6, and the given number of additions is 4 (x+x+x+x). The total probability is 10:
1) 1 + 1 + 1 + 3 = 6
2) 1 + 1 + 3 + 1 = 6
3) 1 + 3 + 1 + 1 = 6
4) 3 + 1 + 1 + 1 = 6
5) 1 + 1 + 2 + 2 = 6
6) 1 + 2 + 2 + 1 = 6
7) 2 + 2 + 1 + 1 = 6
8) 2 + 1 + 1 + 2 = 6
9) 2 + 1 + 2 + 1 = 6
10) 1 + 2 + 1 + 2 = 6
I have absolutely no idea where to start
Here's a start for you.
Have a look at this table
1 2 3 4 5
+------------------
1 | 1 0 0 0 0
2 | 1 1 0 0 0
3 | 1 2 1 0 0
4 | 1 3 3 1 0
5 | 1 4 6 4 1
The number of summands is increasing from left to right, the total increases in rows, so e.g. there are 3 ways to sum 3 integers (greater than 0) for a total of 4 (namely 1+1+2, 1+2+1, 2+1+1).
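A small sketch of how that table could be computed (hypothetical code; the recurrence is just Pascal's rule, since the entry for a total of t with k summands is C(t-1, k-1)):

#include <iostream>

int main() {
    const int N = 6;
    // ways[t][k]: ordered sums ("compositions") of k positive integers totalling t
    int ways[N + 1][N + 1] = {};
    ways[0][0] = 1;
    for (int t = 1; t <= N; t++)
        for (int k = 1; k <= t; k++)
            // the last summand is exactly 1 (drop it), or greater (decrement it)
            ways[t][k] = ways[t - 1][k - 1] + ways[t - 1][k];

    std::cout << ways[5][4] << '\n';   // 4,  the first example above
    std::cout << ways[6][4] << '\n';   // 10, the second example above
    return 0;
}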
With 4 additions and a result Y, if all numbers are positive, nonzero and small enough (< 100), you can at least brute-force this easily... just cycle through all the numbers with four nested for loops and, if they sum to Y, increment the number of permutations. The disadvantage is the O(N^4) complexity, which will be very slow.
#include <iostream>
using namespace std;

int main()
{
    int y = 6;
    int perm = 0;
    for(int a = 1; a < y; a++)
        for(int b = 1; b < y; b++)
            for(int c = 1; c < y; c++)
                for(int d = 1; d < y; d++)
                {
                    if((a+b+c+d)==y)
                    {
                        cout << a << " + " << b << " + " << c << " + " << d << " = " << y << endl;
                        perm++;
                    }
                }
    cout << "number of permutations: " << perm << endl;
}
What you are trying to find is not a probability, it's a number of combinations.
Looking at your examples, I assume that the number of numbers you are adding is fixed (i.e. 4) and that every number is greater than or equal to 1. We can do simple math here then - let's subtract this number from both sides of the equation:
Original: 1) 1 + 1 + 1 + 2 = 5
Result of subtracting: 1) 0 + 0 + 0 + 1 = 1
Once the subtraction is done, your problem is the combination-with-repetition problem.
The formulas you can find in the link I provided are quite simple. The problem can be solved using the following code:
#include <iostream>

unsigned factorial(int n)
{
    if (n <= 1) return 1;   // n == 0 (e.g. factorial(k) with k == 0) must also terminate
    return n * factorial(n - 1);
}

unsigned combinationsWithRepetition(int n, int k)
{
    return factorial(n + k - 1) / (factorial(k) * factorial(n - 1));
}

unsigned yourProblem(unsigned numberOfNumbers, unsigned result)
{
    return combinationsWithRepetition(numberOfNumbers, result - numberOfNumbers);
}

int main()
{
    std::cout << yourProblem(4, 5) << std::endl;
    std::cout << yourProblem(4, 6) << std::endl;
    return 0;
}
Also, you can check this code out in an online compiler.
Note that this code covers only the problem solving and could be improved if you choose to use it (i.e. it is not protected against invalid values, and the factorials overflow quickly for larger inputs).

How to convert a decimal base (10) to a negabinary base (-2)?

I want to write a program to convert from decimal to negabinary.
I cannot figure out how to convert from decimal to negabinary.
I have no idea about how to find the rule and how it works.
Example: 7(base10)-->11011(base-2)
I just know it is 7 = (-2)^0*1 + (-2)^1*1 + (-2)^2*0 + (-2)^3*1 + (-2)^4*1.
The algorithm is described at http://en.wikipedia.org/wiki/Negative_base#Calculation. Basically, you divide as in the positive-base case, but make sure the remainder is nonnegative and minimal.
7 = -3*-2 + 1 (least significant digit)
-3 = 2*-2 + 1
2 = -1*-2 + 0
-1 = 1*-2 + 1
1 = 0*-2 + 1 (most significant digit)
def neg2dec(arr):
    n = 0
    for i, num in enumerate(arr[::-1]):
        n += ((-2)**i) * num
    return n

def dec2neg(num):
    if num == 0:
        digits = ['0']
    else:
        digits = []
        while num != 0:
            num, remainder = divmod(num, -2)
            if remainder < 0:
                num, remainder = num + 1, remainder + 2
            digits.append(str(remainder))
    return ''.join(digits[::-1])
Just my two cents (C#):
public static int[] negaBynary(int value)
{
    List<int> result = new List<int> ();
    while (value != 0)
    {
        int remainder = value % -2;
        value = value / -2;
        if (remainder < 0)
        {
            remainder += 2;
            value += 1;
        }
        Console.WriteLine (remainder);
        result.Add(remainder);
    }
    // note: the digits are produced least-significant first
    return result.ToArray();
}
There is a method (attributed to Librik/Szudzik/Schröppel) that is much more efficient:
uint64_t negabinary(int64_t num) {
    const uint64_t mask = 0xAAAAAAAAAAAAAAAA;
    return (mask + num) ^ mask;
}
The conversion method and its reverse are described in more detail in this answer.
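A quick sanity check of the mask trick (my test harness, C++14 for the 0b literals; the negabinary digits come back packed as a plain binary value):

#include <cassert>
#include <cstdint>

uint64_t negabinary(int64_t num) {
    const uint64_t mask = 0xAAAAAAAAAAAAAAAAull;   // 1 at every odd bit position
    return (mask + num) ^ mask;
}

int main() {
    // 7 in base -2 is 11011 (16 - 8 + 0 - 2 + 1), i.e. binary 0b11011 == 27
    assert(negabinary(7) == 0b11011);
    assert(negabinary(0) == 0);
    assert(negabinary(-2) == 0b10);                // "10" in base -2 is -2
    return 0;
}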
Here is some code that solves it and displays the math behind it (note that this version uses base -10, i.e. negadecimal, in the divmod). Some code taken from "Birender Singh":
#https://onlinegdb.com/xR1E5Cj7L
def neg2dec(arr):
    n = 0
    for i, num in enumerate(arr[::-1]):
        n += ((-2)**i) * num
    return n

def dec2neg(num):
    oldNum = num
    if num == 0:
        digits = ['0']
    else:
        digits = []
        while num != 0:
            num, remainder = divmod(num, -10)
            if remainder < 0:
                num, remainder = num + 1, remainder + 10
            print(str(oldNum) + " = " + str(num) + " * -10 + " + str(remainder))
            oldNum = num
            digits.append(str(remainder))
    return ''.join(digits[::-1])

print(dec2neg(-8374932))
print(dec2neg(-8374932))
Output:
-8374932 = 837494 * -10 + 8
837494 = -83749 * -10 + 4
-83749 = 8375 * -10 + 1
8375 = -837 * -10 + 5
-837 = 84 * -10 + 3
84 = -8 * -10 + 4
-8 = 1 * -10 + 2
1 = 0 * -10 + 1
12435148

Go << and >> operators

Could someone please explain to me the usage of << and >> in Go? I guess it is similar to some other languages.
The super (possibly over) simplified definition is just that << is used for "times 2" and >> is for "divided by 2" - and the number after it is how many times.
So n << x is "n times 2, x times". And y >> z is "y divided by 2, z times".
For example, 1 << 5 is "1 times 2, 5 times" or 32. And 32 >> 5 is "32 divided by 2, 5 times" or 1.
From the spec at http://golang.org/doc/go_spec.html, it seems that at least with integers, it's a binary shift. For example, binary 0b00001000 >> 1 would be 0b00000100, and 0b00001000 << 1 would be 0b00010000.
Go didn't accept the 0b notation for binary integers when this was written (binary literals were added in Go 1.13); I was just using it for the example. In decimal, 8 >> 1 is 4, and 8 << 1 is 16. Shifting left by one is the same as multiplying by 2, and shifting right by one is the same as dividing by two, discarding any remainder.
The << and >> operators are Go arithmetic operators.
<<   left shift    integer << unsigned integer
>>   right shift   integer >> unsigned integer
The shift operators shift the left operand by the shift count specified by the right operand. They implement arithmetic shifts if the left operand is a signed integer and logical shifts if it is an unsigned integer. The shift count must be an unsigned integer. There is no upper limit on the shift count. Shifts behave as if the left operand is shifted n times by 1 for a shift count of n. As a result, x << 1 is the same as x*2 and x >> 1 is the same as x/2 but truncated towards negative infinity.
They are basically arithmetic operators, and it's the same in other languages. Here is a basic Go, C, and PHP example:
GO
package main

import (
    "fmt"
)

func main() {
    var t, i uint
    t, i = 1, 1
    for i = 1; i < 10; i++ {
        fmt.Printf("%d << %d = %d \n", t, i, t<<i)
    }
    fmt.Println()
    t = 512
    for i = 1; i < 10; i++ {
        fmt.Printf("%d >> %d = %d \n", t, i, t>>i)
    }
}
GO Demo
C
#include <stdio.h>

int main()
{
    int t = 1;
    int i = 1;
    for (i = 1; i < 10; i++) {
        printf("%d << %d = %d \n", t, i, t << i);
    }
    printf("\n");
    t = 512;
    for (i = 1; i < 10; i++) {
        printf("%d >> %d = %d \n", t, i, t >> i);
    }
    return 0;
}
C Demo
PHP
$t = $i = 1;
for ($i = 1; $i < 10; $i++) {
    printf("%d << %d = %d \n", $t, $i, $t << $i);
}
print PHP_EOL;
$t = 512;
for ($i = 1; $i < 10; $i++) {
    printf("%d >> %d = %d \n", $t, $i, $t >> $i);
}
PHP Demo
They would all output
1 << 1 = 2
1 << 2 = 4
1 << 3 = 8
1 << 4 = 16
1 << 5 = 32
1 << 6 = 64
1 << 7 = 128
1 << 8 = 256
1 << 9 = 512
512 >> 1 = 256
512 >> 2 = 128
512 >> 3 = 64
512 >> 4 = 32
512 >> 5 = 16
512 >> 6 = 8
512 >> 7 = 4
512 >> 8 = 2
512 >> 9 = 1
n << x = n * 2^x   Example: 3 << 5 = 3 * 2^5 = 96
y >> z = y / 2^z   Example: 512 >> 4 = 512 / 2^4 = 32
<< is left shift. >> is sign-extending right shift when the left operand is a signed integer, and is zero-extending right shift when the left operand is an unsigned integer.
To better understand >> think of
var u uint32 = 0x80000000;
var i int32 = -2;
u >> 1; // Is 0x40000000 similar to >>> in Java
i >> 1; // Is -1 similar to >> in Java
So when applied to an unsigned integer, the bits at the left are filled with zero, whereas when applied to a signed integer, the bits at the left are filled with the leftmost bit (which is 1 when the signed integer is negative as per 2's complement).
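The same distinction can be seen from C++ (a sketch of mine; note that right-shifting a negative signed value was implementation-defined before C++20, although arithmetic shift is the universal behavior in practice):

#include <cstdint>
#include <cstdio>

int main() {
    uint32_t u = 0x80000000u;
    int32_t  i = -2;
    // unsigned: logical shift, zeros fill in from the left
    std::printf("0x%X\n", (unsigned)(u >> 1));   // 0x40000000
    // signed: arithmetic shift, the sign bit is replicated
    std::printf("%d\n", i >> 1);                 // -1
    return 0;
}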
Go's << and >> are similar to shifts (that is: division or multiplication by a power of 2) in other languages, but because Go is a safer language than C/C++, it does some extra work when the shift count is a variable.
Shift instructions in x86 CPUs consider only 5 bits (6 bits on 64-bit x86 CPUs) of the shift count. In languages like C/C++, the shift operator translates into a single CPU instruction.
The following Go code
x := 10
y := uint(1025) // A big shift count
println(x >> y)
println(x << y)
prints
0
0
while a C/C++ program would print
5
20
In decimal math, when we multiply or divide by 10, we affect the zeros at the end of the number.
In binary, 2 has the same effect: we are adding a zero to the end, or removing the last digit.
<< is the bitwise left shift operator, which shifts the bits of the corresponding integer to the left... the rightmost bit being 0 after the shift.
For example:
In gcc we have 4-byte integers, which means 32 bits.
The binary representation of 3 is
00000000 00000000 00000000 00000011
3 << 1 would give
00000000 00000000 00000000 00000110, which is 6.
In general, 1 << x gives you 2^x.
In gcc, 1 << 20 gives 2^20, that is 1048576;
but in tcc it would give you 0 as the result, because integers are 2 bytes in tcc.

How do I find next bit to change in a Gray code in constant time?

I have a small 8-bit processor which has a N-to-M decoder on some output lines - eg, for the 5 to 32 bit case, I write 00101 and bit 5 changes state. The only interface to the output is change-state, there is no read-back.
The device counts rapidly (but randomly) occurring events, and should provide this count as a 'single bit changes' code to another device. The output pins are read in parallel by another device, and may be read as rapidly or as sparingly as the other device decides, so the count is necessary.
I do NOT need to use the standard Binary Reflective Gray code - I can use any single-bit changing code.
However, I want to be able to track the next bit to change efficiently.
I do not have a "LowestBitSet" instruction, and finding lowest bit set across four 8 bit registers is cycle consuming - so I cannot use this "common" approach:
Keep binary counter A
Find B as A XOR (A+1)
Bit to change is LowestBitSet in B
I wish to calculate this in as little memory and registers as possible, and memory is definitely too restricted for any large lookup table. Cycle time is the more important factor.
Any suggestions on algorithms?
"Algorithm L" on page 10 of Knuth, Donald E. "Generating all n-tuples." The Art of Computer Programming, Volume 4A: Enumeration and Backtracking, pre-fascicle 2a, October 15, 2004 seems ideal. Step L4 would be "change_state(j)" for your device.
You don't need to calculate the Gray codes and xor them, you can just use the counter itself, and then use a 256-element lookup table to count the number of trailing zeros. Like this:
unsigned char bit_change(unsigned char counter[4]) {
    static const unsigned char ones[] = {
        0,0,0,1,0,1,1,2,0,1,1,2,1,2,2,3,0,1,1,2,1,2,2,3,1,2,2,3,2,3,3,4,
        0,1,1,2,1,2,2,3,1,2,2,3,2,3,3,4,1,2,2,3,2,3,3,4,2,3,3,4,3,4,4,5,
        0,1,1,2,1,2,2,3,1,2,2,3,2,3,3,4,1,2,2,3,2,3,3,4,2,3,3,4,3,4,4,5,
        1,2,2,3,2,3,3,4,2,3,3,4,3,4,4,5,2,3,3,4,3,4,4,5,3,4,4,5,4,5,5,6,
        0,1,1,2,1,2,2,3,1,2,2,3,2,3,3,4,1,2,2,3,2,3,3,4,2,3,3,4,3,4,4,5,
        1,2,2,3,2,3,3,4,2,3,3,4,3,4,4,5,2,3,3,4,3,4,4,5,3,4,4,5,4,5,5,6,
        1,2,2,3,2,3,3,4,2,3,3,4,3,4,4,5,2,3,3,4,3,4,4,5,3,4,4,5,4,5,5,6,
        2,3,3,4,3,4,4,5,3,4,4,5,4,5,5,6,3,4,4,5,4,5,5,6,4,5,5,6,5,6,6,7,
    };
    unsigned char i;
    for (i = 0; i < 4; i++) {
        unsigned char x = counter[i];
        if (x) {
            x ^= x - 1;
            return 8 * i + ones[x];
        }
    }
}
If you unroll the loop, this is at most 2 adds, 1 xor, and 5 loads (but almost always fewer). If you don't have 256 bytes for the table, you could use the same strategy on nibbles.
LowestBitSet(A ^ (A+1)) is always 0, unless you work for IBM. I think you mean HighestBitSet(), which is roughly the same as log_2().
The bit-twiddling hack immediately preceding the one linked by AShelly will be much more feasible on an 8-bit micro.
This should make your original algorithm fairly practical, generating { 0, 1, 0, 2, 0, 1, 0, 3, 0, 1, ... }.
As for the possibility of changing to a different sequence which would also generate a Gray code, for the purpose of making it easier to compute, that's very interesting but I haven't come up with anything.
For Binary Reflective Gray Code, see this answer for an efficient way to calculate the code N.
XOR with the previous code to get a value where only the bit to change is set.
Then you can use this Bit Twiddling Hack (the case where "v is a power of 2") to find the bit index with only 3 operations and a 32-entry table.
The pseudo-code is something like this:
n = lastCode = 0
increment:
    n += 1
    newCode = GrayCode(n)
    delta = newCode XOR lastCode
    bitToToggle = BitIndex(delta)
    lastCode = newCode
    GOTO increment
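A sketch of the BitIndex step using the De Bruijn multiply variant of that hack (0x077CB531 is the usual constant from the bithacks page; to avoid transcription mistakes, the 32-entry table is generated at startup here rather than hard-coded):

#include <cstdint>
#include <cstdio>

static int bitIndex[32];

// Fill the table: each power of two maps to a unique 5-bit key.
void initTable() {
    for (uint32_t i = 0; i < 32; i++)
        bitIndex[(uint32_t)((1u << i) * 0x077CB531u) >> 27] = i;
}

// v must have exactly one bit set (as the XOR delta of two Gray codes does).
int indexOfPow2(uint32_t v) {
    return bitIndex[(uint32_t)(v * 0x077CB531u) >> 27];   // 3 ops + 1 table lookup
}

int main() {
    initTable();
    std::printf("%d\n", indexOfPow2(1u << 20));   // prints 20
    return 0;
}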
The algorithm posited by the OP does not generate any Gray code.
The algorithm in this answer, https://stackoverflow.com/a/4657711/7728918, is not constant time, since the conditional test if (x) can execute from 1 to 4 times depending on the value of counter[i]. This changes the amount of time required to calculate the bit number: there are 4 different possible execution times that any single calculation may have.
See the reference for the rationale of the following, which meets the constant-time requirement without even a table:
byte log2toN(byte b){ return 7 - (byte)( 0x10310200A0018380uLL >> ( (byte)(0x1D*b) >> 2 ) ); }
byte BRGC(byte n){ return n ^ n>>1; }
byte bit2flip(byte n){ return log2toN( BRGC(n) ^ BRGC(n+1) ); }
However, there is a much better, more succinct and expedient method that meets the OP's criteria.
For cargo cult coding purposes, the following conveniently satisfies the OP's conditions minimally (maximally? ;).
The bit number to change is found each time with only two operations: an increment and a modulus (which, if done modulo 2^n, can be as simple as a bitwise AND with the constant 2^n - 1).
The actual Johnson Gray code (JGC) sequence is generated incrementally by XORing the previous code with the desired bit, selected by a left shift of 1 to the bit-number position. This calculation is NOT needed per the OP's requirements.
The Johnson Code
-------------------------
The actual Gray coding is irrelevant, so using a Johnson counter's Gray code is exceptionally trivial.
Note that the Johnson Gray code (JGC) density is linear, not logarithmic like base 2 or the binary reflected Gray code (BRGC).
With 32 bits in 4 bytes, the sequence can count from 0 to 63 before resetting.
byte bitCtr = -1;   // for 4 x 8 bits use 32 instead of 5
int JohnsonCode(){ static byte GCbits = 0;
    return GCbits ^= 1u << ( bitCtr = ++bitCtr % 5 ); }
Test output:
Johnson counter Bit
Gray code: Flipped:
....1 0
...11 1
..111 2
.1111 3
11111 4
1111. 0
111.. 1
11... 2
1.... 3
..... 4
....1 0
...11 1 etc.
partially generated with this code:
void testJohnson(){
    Serial.println("\n\tJohnson counter\t Bit\n\t Gray code:\t Flipped:");
    for( int intifiny = 31; --intifiny; )
        Serial.println( "\t " + cut( 5, "....." +
            // String::replace(...) returns nothing, so invoke it via a lambda
            [](String s){ s.replace("0","."); return s; } ( String( JohnsonCode(), BIN ) )
            ) + "\t " + bitCtr
        );
}
Unfortunately, this answer does not explain "How do you (I, did I, ...) find ...". For details on the methodology of finding such solutions, and on using a BRGC similarly, see the previous ref.: https://stackoverflow.com/a/42846062/7728918
I have been trying to understand Algorithm L, to that end, I think I have found some potentially useful intuition I would like to share.
It starts with noticing the pattern of which bit to flip is recursive and symmetric.
0 1 0 2 0 1 0 3 0 1 0 2 0 1 0
Now it makes sense to think of them as a tree.
3
2 2
1 1 1 1
0 0 0 0 0 0 0 0
and therefore generated by the following algorithm:
def gen(arg):
    if arg == 0:
        print(arg)
    else:
        gen(arg - 1)
        print(arg)
        gen(arg - 1)
The tree above can be interpreted as the tree of activation frames of this algorithm.
If we were printing a non-zero number, the next number is obvious, it has to be 0. Therefore the problem of predicting the next element reduces to only predicting what will happen after 0.
Here is the interesting observation, that next thing after 0 must be the closest ancestor in the tree such that it is on the right of the current position. This suggest the following algorithm that propagate the right parent and therefore predicts the next element:
def gen(arg, right_parent):
    if arg == 0:
        print("%s %s" % (0, right_parent))
    else:
        gen(arg - 1, arg)           # The right parent of my left child is me
        print("%s %s" % (arg, 0))
        gen(arg - 1, right_parent)  # The right parent of my right child is my right parent
Here is an annotated tree with right parents written in brackets:
3(4)
2(3) 2(4)
1(2) 1(3) 1(2) 1(4)
0(1) 0(2) 0(1) 0(3) 0(1) 0(2) 0(1) 0(4)
The problem with this approach is that when we execute it, the code might go through multiple steps of calling or returning so that the time spent between successive prints is not constant. We could argue that the time is amortized constant, after all, each pair of push and pop is associated with printing exactly one number.
Here is another idea. By the time we print the number, we know the stack frame is going away before the next time we print the same number, is it possible for us to front-load the work of returning and calling the same frame?
By the time the first 0 is printed, we knew its right parent is 1, so it will pass its own right_parent when it makes the recursive call again.
We summarize this observation in this rule:
If the right_parent value of a frame is exactly 1 larger than the current frame, then the right_parent value for the next call will be the right_parent value of the right parent frame.
By the time the second 0 is printed, we knew its right parent is 2, so the next call will be done through multiple steps from the second recursive call of the right parent. Any multiple steps call will lead to the fact that it is a left child, and a left child's right parent is always 1 larger than the current frame!
We summarize this observation in this rule:
If the right_parent value of a frame is larger than the current frame by more than 1, then the right_parent value for the next call will be exactly one larger than the current frame's value.
With that two rules, I come up with this algorithm:
def gen():
    right_parent = [1,2,3,4]
    cur = 0
    for i in range(0, 15):
        print(cur)
        j = right_parent[cur]
        if j == cur + 1:
            if j != 4:  # Avoid index outside of the list
                right_parent[cur] = right_parent[j]
        else:
            right_parent[cur] = cur + 1
        if cur == 0:
            cur = j
        else:
            cur = 0
This is O(1), but it is not Algorithm L, which does not involve the comparisons. To explore, these comments will probably shed some lights:
def gen():
    right_parent = [1,2,3,4]
    cur = 0
    for i in range(0, 15):
        print(cur)
        next = right_parent[cur]
        if next == cur + 1:
            if next != 4:
                right_parent[cur] = right_parent[cur + 1]  # f[j] = f[j + 1]
        else:
            right_parent[cur] = cur + 1                    # f[j + 1] = j + 1
        if cur == 0:
            cur = next                                     # j = f[0]
        else:
            cur = 0                                        # f[0] = 0

gen()
It feels like Algorithm L somehow deals with both a to-be-left and a to-be-right in the same iteration. It might have something to do with the "to-be-active" and "to-be-passive" notion as in Knuth's presentation, but I decide to stop here. I think it is good enough for an intuition into how the algorithm might be developed.
/*
As previously posted in this answer, https://stackoverflow.com/questions/4648716#42865348, using a Johnson counter Gray code is very simple:
Number_of_Bit_To_Flip = ++Number_of_Bit_To_Flip % Total_Number_of_Counter_Bits
which is executed on every event occurrence.
Otherwise, using a Binary Reflected Gray Code and a 4-byte base-2 counter n, ...
Method 1 - using a table
*/
static const byte twos[ ] = { // note pattern V V V V V V
0,0,1,0,2,0,1,0,3,0,1,0,2,0,1,0,4,0,1,0,2,0,1,0,3,0,1,0,2,0,1,0,5,
0,1,0,2,0,1,0,3,0,1,0,2,0,1,0,4,0,1,0,2,0,1,0,3,0,1,0,2,0,1,0, 6,
0,1,0,2,0,1,0,3,0,1,0,2,0,1,0,4,0,1,0,2,0,1,0,3,0,1,0,2,0,1,0,5,
0,1,0,2,0,1,0,3,0,1,0,2,0,1,0,4,0,1,0,2,0,1,0,3,0,1,0,2,0,1,0, 7,
0,1,0,2,0,1,0,3,0,1,0,2,0,1,0,4,0,1,0,2,0,1,0,3,0,1,0,2,0,1,0,5,
0,1,0,2,0,1,0,3,0,1,0,2,0,1,0,4,0,1,0,2,0,1,0,3,0,1,0,2,0,1,0, 6,
0,1,0,2,0,1,0,3,0,1,0,2,0,1,0,4,0,1,0,2,0,1,0,3,0,1,0,2,0,1,0,5,
0,1,0,2,0,1,0,3,0,1,0,2,0,1,0,4,0,1,0,2,0,1,0,3,0,1,0,2,0,1,0, 8,
};
// operation count worst case 3 logic 4 index 1 addition
// for 7/8 of calls 2 3 1
byte bit_change(byte ctr[4]) {
return
(byte[]){
(byte[]){ 16 + twos[ctr[2]] ,
(byte[]){24 + twos[ctr[3]] ,
31 } [ !ctr[3] ] } [ !ctr[2] ] ,
(byte[]){ 0 + twos[ctr[0]] ,
8 + twos[ctr[1]] } [ !ctr[0] ] }
[ctr[0] || ctr[1]];
// -------- NB. orphaned, included for pedagogic purposes --------
return (byte[]){ 0 + twos[ctr[0]] , // this IS totally time constant
8 + twos[ctr[1]] , // NB. 31 + time "constantator"
16 + twos[ctr[2]] , // case ctr[i]==0 for all i
24 + twos[ctr[3]] ,
31 + twos[ctr[0]] } [ !ctr[0]*( 1+
!ctr[1]*( 1+
!ctr[2]*( 1+
!ctr[3] ) ) ) ];
}
/* Method 2 - no tables */
byte bin2toN(byte b){
return
(byte []) {(byte []) {(byte []) {7,6} [b < 128 ] ,
(byte []) {5,4} [b < 32 ] } [b < 64 ] ,
(byte []) {(byte []) {3,2} [b < 8 ] ,
(byte []) {1,0} [b < 2 ] } [b < 4 ] } [b < 16 ] ;
}
byte flip_bit(byte n[4]){
return
(byte []){
(byte []){ 16 + bin2toN( n[2] & -n[2] ) ,
(byte []){ 24 + bin2toN( n[3] & -n[3] ),
31 } [ !n[3] ] } [ !n[2] ] ,
(byte []){ 0 + bin2toN( n[0] & -n[0] ) ,
8 + bin2toN( n[1] & -n[1] ) } [ !n[0] ] }
[ n[0] || n[1] ] ;
// - - - - - - - - - - - - ORPHANED, fails on zero - - - - - - - - - - - -
return // included for pedagogic purposes
(byte []) {
(byte []) { bin2toN( n[2] & -n[2] ) + 16 ,
bin2toN( n[3] & -n[3] ) + 24 } [ !n[2] ] ,
(byte []) { bin2toN( n[0] & -n[0] ) + 0 ,
bin2toN( n[1] & -n[1] ) + 8 } [ !n[0] ] } [ n[0] || n[1] ] ;
}
/*
Bit Bunnies and Cargo Cult Coders have no need to read further.

The efficiency of execution of the above code depends on the fact that n[0], n[1], ... are computed at compile time as fixed addresses, which is quite conventional. Also, using a call-by-need optimizing compiler will expedite the array contents so that only one indexed value needs to be calculated. This compiler sophistication is likely missing, but it is easy to manually assemble the raw machine code to do it (basically a switch, computed goto, etc.).

Analysis of the above algorithms, using the orphaned code, shows that every function call will execute exactly the same instruction sequence, optimized or not.

In both methods, the non-orphaned returns require handling the case when the counter rolls over on 0, consequently using extra index and logical (!) operations. This extra work happens for 1/2 of 1/2 of 1/2, or 1/8, of the total counts, and for one count in this 1/8 there is nothing to do but return 31.

The first method requires 2 logical operations (! ||), 1 addition and 3 index calculations for 7/8 of the total counts. On a single count of zero, 3 logical and 3 index operations are needed, and the rest of the other 1/8 need 3 logical, 1 addition and 4 index operations.

The final code on execution of method 2 (compiled optimally), for 7/8 of the calculations, uses 7 logical operations (|| & ! < - the last is two's complement), 1 arithmetic (+) and 5 calculated index operations. The other 1/8, but one instance, need 8 logical, 1 addition and 6 calculated index operations.

Unfortunately, no flash of divine inspiration manifested any cargo code. This is an abridged tale of how this composition's authorship happened.

How this was done involved a crucial preliminary investigation as documented in https://stackoverflow.com/a/42846062. Code was then derived using a successive refinement process, commencing with an assessment of the algorithms in this post, specifically this answer: https://stackoverflow.com/a/4657711. That algorithm's time-variant execution, caused by the loop overhead, is pointedly and prominently accentuated by reducing the return calculation to a single addition and two index operations.
*/
byte bit_change(byte ctr[4]) {
static byte ones[256]; // this sparse RA is precomputed once
for (byte i = 255; --i; ones[i]=0) ;
ones[ 0] = ones[ 1] = 0; ones[ 3] = 1; ones[ 7] = 2;
ones[15] = 3; ones[31] = 4; ones[63] = 5; ones[127] = 6; ones[255] = 7;
// { ie. this very sparse array is completely adequate for original code
// 0,0, ,1, , , ,2, , , , , , , ,3, , , , , , , , , , , , , , , ,4,
// , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , ,5,
// , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , ,
// , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , ,6,
// , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , ,
// , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , ,
// , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , ,
// , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , ,7, }
// 1/2 of count uses 2 index 2 conditionals 0 increment 1 logic 2 +/- 1 x
// 1/4 3 4 1 1 2 1
// 1/8 4 6 2 1 2 1
// 1/16 5 8 3 1 2 1
// average 14 = 3.5 5 1.5 1 2 1
unsigned char i; for (i = 0; i < 4; i++) { // original code
unsigned char x = counter[i]; // "works" but
if (x) { // still fails on
x ^= x - 1; // count 0 rollover
return 8 * i + ones[x];
} }
// ............................. refinement .............................
byte x; for (byte i = 0; i < 4; i++) //
if (x = counter[i])
return i<<3 + ones[x ^ x - 1];
}
//----------------------------------------------------------------------------
// for (byte i = 255; --i; twos[i] == ones[i ^ i-1] ) ;
// ones[ ] uses only 9 of 1/4K inefficient, twos[ ] uses all 1/4K
static const byte twos[ ] = {
0,0,1,0,2,0,1,0,3,0,1,0,2,0,1,0,4,0,1,0,2,0,1,0,3,0,1,0,2,0,1,0,5,
0,1,0,2,0,1,0,3,0,1,0,2,0,1,0,4,0,1,0,2,0,1,0,3,0,1,0,2,0,1,0, 6,
0,1,0,2,0,1,0,3,0,1,0,2,0,1,0,4,0,1,0,2,0,1,0,3,0,1,0,2,0,1,0,5,
0,1,0,2,0,1,0,3,0,1,0,2,0,1,0,4,0,1,0,2,0,1,0,3,0,1,0,2,0,1,0, 7,
0,1,0,2,0,1,0,3,0,1,0,2,0,1,0,4,0,1,0,2,0,1,0,3,0,1,0,2,0,1,0,5,
0,1,0,2,0,1,0,3,0,1,0,2,0,1,0,4,0,1,0,2,0,1,0,3,0,1,0,2,0,1,0, 6,
0,1,0,2,0,1,0,3,0,1,0,2,0,1,0,4,0,1,0,2,0,1,0,3,0,1,0,2,0,1,0,5,
0,1,0,2,0,1,0,3,0,1,0,2,0,1,0,4,0,1,0,2,0,1,0,3,0,1,0,2,0,1,0, 8,
};
// fix count 0 rollover failure, make time absolutely constant as per OP
byte f0(byte ctr[4]) {
    byte ansr = 31;
    byte i;                               // declared outside so the second loop can reuse it
    for (i = 0; i < 4; i++)
        if (ctr[i]) {
            ansr = (byte[]){0, 8, 16, 24}[i] + twos[ctr[i]];  // i<<3 faster?
            break;
        }
    for (; i < 4; i++) if (ctr[i]) ;      // time "constantator"
    return ansr;
}
//..................................................
// loop ops (average): 1.5 ++ 2.5 [] 5 if
// result calculation: 1 + 2 [] significant loop overhead
byte f1(byte counter[4]) {
for (byte i = 0; i < 4; i++)
if (counter[i])
return (byte[]){ 0 + twos[counter[0]],
8 + twos[counter[1]],
16 + twos[counter[2]],
24 + twos[counter[3]] } [i];
return 31;
}
//..................................................
// 5 +/++ 6 [] 10 if
byte f2(byte counter[4]){
byte i, ansr=31;
for (i = 0; i < 4; i++) { // definite loop overhead
if (counter[i]) {
ansr= (byte[]){ 0 + twos[counter[0]],
8 + twos[counter[1]],
16 + twos[counter[2]],
24 + twos[counter[3]] } [i];
break;
} }
for (; i < 4; i++) if (counter[i]); // time "constantator"
return ansr;
}
//..................................................
// 4 + 4 ! 3 x 1 [] 1 computed goto/switch
byte f3(byte counter[4]){ // default: time "constantator"
switch (!counter[0]*( 1 + // counter[0]==0 !!
!counter[1]*( 1 +
!counter[2]*( 1 +
!counter[3] ) ) ) ){
case 0: return 0 + twos[ counter[0] ] ;
case 1: return 8 + twos[ counter[1] ] ;
case 2: return 16 + twos[ counter[2] ] ;
case 3: return 24 + twos[ counter[3] ] ;
default: return 31 + twos[ counter[0] ] ;
} }
/*
There is a comparable chronology for method 2.
This sequence has been radically attenuated and abbreviated to an intermediate example:
Inadvertently, the code posted in https://stackoverflow.com/a/42865348 was
not the exclusively byte sized one as intended. The intended code is in this post.
*/
byte log2toN(byte b){ return ( b & 0xF0 ? 4:0 ) + // 4444....
( b & 0xCC ? 2:0 ) + // 22..22..
( b & 0xAA ? 1:0 ) ; // 1.1.1.1.
}
// . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
byte BRGC(byte n){ return n ^ n>>1; }
byte bitNbyte(byte n){ return log2toN( BRGC(n) ^ BRGC(n+1) ); }
byte bit2flip(byte n[4]){
boolean n3=n[3]<255, n2=n[2]<255, n1=n[1]<255, n0=n[0]<255;
return n0*( 0 + bitNbyte( n[0] ) ) + !n0*(
n1*( 8 + bitNbyte( n[1] ) ) + !n1*(
n2*(16 + bitNbyte( n[2] ) ) + !n2*(
n3*(24 + bitNbyte( n[3] ) ) + !n3*( 0 ) ) ) );
}
// . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
byte bit_flip(byte n[4]){
//switch( ( !!n[0] << 3 ) | ( !!n[1] << 2 ) | ( !!n[2] << 1 ) | !!n[3] )
switch( 15 ^ ( !n[0] << 3 ) ^ ( !n[1] << 2 ) ^ ( !n[2] << 1 ) ^ !n[3] ) {
case 15: case 14: case 13: case 12:
case 11: case 10: case 9: case 8: return 0 + log2toN( n[0] & -n[0] );
case 7: case 6: case 5: case 4: return 8 + log2toN( n[1] & -n[1] );
case 3: case 2: return 16 + log2toN( n[2] & -n[2] );
case 1: return 24 + log2toN( n[3] & -n[3] );
default: return 31 + log2toN( n[0] & -n[0] );
} }
/*
Rhetorically, the answer to How do I find ... can only be answered explicitly
in the personal sense (see this answer: https://stackoverflow.com/a/42846062) as
it is not possible to speak for other individuals' cognitive abilities.
The content of https://stackoverflow.com/a/42846062 is crucial for background
information and reflects the very personal
pedantic mechanism required for the mental gymnastics to solve this problem.
Undoubtedly, the milieu and plethora of material is daunting but this is exactly
the personal approach taken in garnering sufficient insight, repertoire, anecdotal
precedents, etc. to extrapolate and interpolate an answer to answer specifically,
what program meets the criteria, exactly. Further, it is this very "chase" that
excites and motivates the (perhaps pathological) psyche to invest time and effort
to satiate curiosity with an inquisitive quest.
*/
void setup() { }
void loop() { }
/*
Can not edit previous answer, as per commentary, so posting rewrite:
Too impatient?
For immediate gratification and minimal edification, cut to the chase and chase this link where
only the final result has been posted:
C code for generating next bit to flip in a gray code
REFs:
C code for generating next bit to flip in a gray code
How do I find next bit to change in a Gray code in constant time?
Deriving nth Gray code from the (n-1)th Gray Code
Gray code increment function
Efficient way to iterate over Gray code change positions
Generating gray codes.
https://en.wikipedia.org/wiki/The_C_Programming_Language
https://en.wikipedia.org/wiki/The_Elements_of_Programming_Style
WARNING:
For purposes of coding expediency and demonstrable functional execution,
some non-trivial programming techniques have been used. However, this is
hopefully mitigated for the exposition of the concept rudiments
by presenting the essence as trivially and minimally as possible with
highlighting by / / / /. Careful reading, study and experiment are encouraged to
avoid cargo cult coding, oversights, and perpetrating mistakes.
This answer is manifested in the Arduino IDE ESP8266 core coding environment.
The algorithm as posited by the OP is not exactly correct (as if this;).
The Johnson Code
-------------------------
Since the actual Gray coding is irrelevant, using a Johnson counter's Gray code is an exceptionally easy
and poignant way to cognitively and computationally count both the bit to change and the next code.
Note that the Johnson counter Gray code density is linear and not logarithmic.
With 32 bits in 4 bytes, the sequence can count from 0 to 63 before resetting.
It is necessary to verify carefully the functional suitability of the code that follows,
modifying it as appropriate.
HINT: Verification is a MUST, especially for the "binary reflected" Gray code (BRGC)!
byte bitCtr = -1;   // for 4 x 8 bits use 32 instead of 5
byte JohnsonCode(){ static byte GCbits = 0;
    return GCbits ^= 1u << ( bitCtr = ++bitCtr % 5 ); }
void testJohnson(){
    Serial.println("\n\tJohnson counter\t Bit\n\t Gray code:\t Flipped:");
    for( int intifiny = 31; --intifiny; )
        Serial.println( "\t " + cut( 5, "....." +
            // String::replace(...) returns nothing, so invoke it via a lambda
            [](String s){ s.replace("0","."); return s; } ( String( JohnsonCode(), BIN ) )
            ) + "\t " + bitCtr
        );
}
/*
Outputs:
Johnson counter Bit
Gray code: Flipped:
....1 0
...11 1
..111 2
.1111 3
11111 4
1111. 0
111.. 1
11... 2
1.... 3
..... 4
....1 0
...11 1 etc.
Some background material on the Binary Reflected Gray Code (BRGC)
-----------------------------------------------------------------------------------------------
CONVERSIONS:
---------------------
REF: Code Golf: Gray Code
// These javascript scriptlets can be run by copying and pasting them into the browser URL bar.
// convert base 2 to BRGC: n^=n>>1
// get base 2 from BRGC: n^=n>>1 n^=n>>2 n^=n>>4 ...
javascript: n=16; s="<pre>";
function f(i,n){ return i?f(i>>1,n^n>>i):n}
while(n--) s += f(4,n^n>>1) .toString(2)+"\n";
javascript: n=16; s="<pre>"; while(n--) s += (n^n>>1) .toString(2)+"\n";
javascript: c=0; n=16; s="<pre>"; while(n--) s +=(c^(c=n^n>>1)).toString(2)+"\n";
COUNTING:
-----------------
The following (as per ref. above) arbitrarily gets both the preceding and following BRGC's for a code.
NB! The order of n1 and n2 is parity determined and non-corresponding otherwise.
The ordering might be n1, gray, n2 OR it could be n2, gray, n1, so, eo (parity) discriminates.
unsigned n1 = gray ^ 1;
unsigned n2 = gray ^ ((gray & -gray) << 1);
gray = eo=!eo ? n1 : n2; // eo (or parity) gets Every Other
ie.
bitToFlip = eo=!eo ? 1 : (gray & -gray) << 1; gray ^= bitToFlip;
hence
gray ^= eo=!eo ? 1 : (gray & -gray) << 1;
byte tooGray( byte (*f)(byte) ){
    static byte BRGC = 0, base2 = 0;
    static boolean eo = false;
    return
        (*f)( BRGC ^= (eo=!eo) ? (BRGC & -BRGC) <<1 : 1 ) & 0x3 |
        // count ^---------------------------------------^ in raw BRGC
        (*f)( base2 ^ base2++ >>1 ) & 0xC ; }
        // count in base 2 ^---------------------^ and convert to BRGC
REF:
The neighbors in Gray code
http://www.graphics.stanford.edu/~seander/bithacks.html
http://www.inwap.com/pdp10/hbaker/hakmem/hakmem.html
https://en.wikipedia.org/wiki/Ring_counter#Johnson_counter
Oh yeah, ... count set bits in A ^ A+1 which will have a bit pattern like 000...0111..1 Prove.
How to get bit position for a power of 2 - the n parameters must have a single bit set.
Method 1
*/
byte naive1(byte n){ return bitNumber(n-1); }
byte bitNumber(byte m){ // can use A ^ A+1 ... BUT >> 1 first OR -1 after
return ( m & 1 ?1:0 ) + ( m & 2 ?1:0 ) + ( m & 4 ?1:0 ) + ( m & 8 ?1:0 ) +
( m & 16 ?1:0 ) + ( m & 32 ?1:0 ) + ( m & 64 ?1:0 ) + ( m & 128 ?1:0 );}
// 256 512 1024 2048 ...
/*
Method 2
*/
byte naively2(byte n){
return ( n & 1 ?0:0 ) + ( n & 2 ?1:0 ) + ( n & 4 ?2:0 ) + ( n & 8 ?3:0 ) +
( n & 16 ?4:0 ) + ( n & 32 ?5:0 ) + ( n & 64 ?6:0 ) + ( n & 128 ?7:0 );}
/*
Method 3
*/
byte powerOf2(byte n){ return ( n & 0xF0 ? 4:0 ) + // 4444....
( n & 0xCC ? 2:0 ) + // 22..22..
( n & 0xAA ? 1:0 ) ; } // 1.1.1.1.
/*
Method 4
*/
// and the penultimate,
// http://www.graphics.stanford.edu/~seander/bithacks.html#IntegerLogDeBruijn
byte fastNof2up(byte b){ return (byte []){0,1,6,2,7,5,4,3}[(byte)(b*0x1Du)>>5];}
// 7,0,5,1,6,4,3,2 0x3Au
// b==2^N 0,1,2,4,7,3,6,5 0x17u
// 7,0,1,3,6,2,5,4 0x2Eu
// |------| |------|
// .....00011101 ........1101....
// ......0011101. .........101.....
// .......011101.. ..........01......
// ........11101... ...........1.......
// |------| |------|
/*
Method 5
Details are at the end.
Can a judicious choice of constants reduce this to only 2 operations?
*/
byte log2toN(byte b){ return 7 - (byte) ( 0x10310200A0018380uLL >> ( (byte)(0x1D*b) >>2 ) ) ; }
/*
Testing Environment
*/
void setup() {Serial.begin(115200); testJohnson(); test(); fastLog2upN(0); }
void loop() { delay(250); // return ;
[](byte GrayX2){ Serial.print( GrayX2 ^ 0x0F ? "" : baq(GrayX2)+"\n");
analogWrite( D5, (int []) { 0, 1200, 0, 300 } [ GrayX2 & 0x3 ] );
analogWrite( D6, (int []) { 0, 0, 1200, 300 } [ GrayX2 & 0x3 ] );
analogWrite( D7, (int []) { 0, 1200, 0, 300 } [ GrayX2 >> 2 & 0x3 ] );
analogWrite( D8, (int []) { 0, 0, 1200, 300 } [ GrayX2 >> 2 & 0x3 ] ); }
// ( tooGray( b ) );
( tooGray( [](byte n) { return n; } ) );
}
/*======================================================================
Caveat:
-----------
The OP's algorithm is not effective.
Keep binary counter A
Find B as A XOR (A+1)
Bit to change is LowestBitSet in B
as seen when coded as:
*/
void test(){
static byte C=0, bits=0;
Serial.println((String)"\n "+(3^2+1)+" "+(3^(2+1))+" "+((3^2)+1) );
Serial.println(
"\n manifested by an actual "
"\n obstinate perverse bit to "
"\n psyche flip check"
"\n A A+1 A ^ A+1 B^=A^A+1 (A^A+1)+1>>1 ");
for(int intifiny=32, A=0; intifiny--; A=++A%15) // sort a go infinite ... an epiphany!
Serial.println( (String) pad( b( bits ^= b( b(A) ^ b(A+1) ) ) ) +" "+
baq( (A^A+1)+1>>1 ) +" "+ baq( C^=(A^A+1)+1>>1 ) +" "
// "| "+ pad(A)+" "+ pad(bits)
);
Serial.println(
" | "
"\n BITS: "
"\n Bit Isn't This Silly "
"\n Bit Is Totally Set (A ^ A+1) & -(A ^ A+1) == 1 ALWAYS "
"\n\n non-binary Gray codes? "
"\n {-1,0,1} balanced ternary, factoroid (factoradic), {0,-1} negated binary \n");
} // https://en.wikipedia.org/wiki/Steinhaus%E2%80%93Johnson%E2%80%93Trotter_algorithm
// some service routines ...
String cut(byte sz, String str) { return str.substring(str.length()-sz) ; }
String pad(byte n ) { return cut( 4, " " + String(n,DEC) ); }
String baq(byte n ) { return cut( 9, "........." + String(n,BIN) ); }
void q ( ) { /* PDQ QED PQ's */ }
void p ( String str) { Serial.print( " " + str + " " ) ; }
byte d (byte n ) { p(pad(n)); return n ; }
byte b (byte n ) { p(baq(n)); return n ; }
/*
Sample output:
flip bit
"correctly" confirm?
A A+1 A ^ A+1 B^=A^A+1 (A^A+1)+1>>1
........0 ........1 ........1 ........1 1 ........1 ........1 | 0 1
........1 .......10 .......11 .......10 2 .......10 .......11 | 1 2
.......10 .......11 ........1 .......11 3 ........1 .......10 | 2 3
.......11 ......100 ......111 ......100 4 ......100 ......110 | 3 4
......100 ......101 ........1 ......101 5 ........1 ......111 | 4 5
......101 ......110 .......11 ......110 6 .......10 ......101 | 5 6
......110 ......111 ........1 ......111 7 ........1 ......100 | 6 7
......111 .....1000 .....1111 .....1000 8 .....1000 .....1100 | 7 8
.....1000 .....1001 ........1 .....1001 9 ........1 .....1101 | 8 9
.....1001 .....1010 .......11 .....1010 10 .......10 .....1111 | 9 10
.....1010 .....1011 ........1 .....1011 11 ........1 .....1110 | 10 11
etc. |
BITS:
Bit Isn't This Silly Houston; We have a (an-other) problem
Bit Is Totally Set
(A ^ A+1) & -(A ^ A+1) == 1 ALWAYS
Curious?
-----------
The following code is a
very very crude method that can expedite a hunt for suitable bit packed counting candidates
to compute log 2^n (in base 2 ie. n).
*/
byte fastLog2upN(byte b){ // b==2^N
for(long long int i=8, b=1; --i; )
Serial.println((int)(0x0706050403020100uLL / (b*=0x100)),HEX) ;
for( int i=9, b=1; --i;b*=2) // 3A = 1D*2
Serial.println(
(String) ( (b>>4 | b>>2 | b>>1) & 7 ) + " \t" +
( (0xB8*b) >>8 & 7 ) + " \t" +
( (0xB8*b) >>7 & 7 ) + " \t" +
( (0x1D*b) >>4 & 7 ) + " \t" +
( (0x0D*b) >>3 & 7 ) + " |\t" +
( ((byte)(0x9E*b) >>2 ) ) + " \t" +
(byte) ( 0x07070301C0038007uLL >> ((byte)(0x9E*b) >>2 ) ) + " \t" +
( ((byte)(0x1E*b) >>2 ) ) + " \t" +
(byte) ( 0x7070001C0038307uLL >> ((byte)(0x1E*b) >>2 ) ) + " \t" +
( ((byte)(0x5E*b) >>2 ) ) + " \t" +
(byte) ( 0x703800183838307uLL >> ((byte)(0x5E*b) >>2 ) ) + " \t| " +
( ((byte)(0x3A*b))>>5 ) + " \t" +
( ((byte)(0x3A*b))>>4 ) + " \t" +
( ((byte)(0x3A*b))>>3 ) + " \t" +
( ((byte)(0x3A*b))>>2 ) + " \t" +
( ((byte)(0x3A*b))>>1 ) + " \t" +
( ((byte)(0x3A*b))>>0 ) + " \t| " +
(byte) ( 0x0203040601050007uLL >> 8*((byte)(0x3A*b) >>5 ) ) + " \t" +
String((byte) ( 0x0706050403020100uLL >> ((byte)(0x3A*b) >>4 ) ),HEX ) + "\t" +
String((byte) ( 0x0020010307uLL >> ((byte)(0x3A*b) >>3 ) ),HEX ) + "\t" +
String((byte) ( 0x10300200A0018007uLL >> ((byte)(0x3A*b) >>2 ) ),HEX ) + "\t|" +
( ((byte)(0x1D*b))>>2 ) + " \t" +
(byte) ( 0x10710700E0018380uLL >> ((byte)(0x1D*b) >>2 ) ) + " \t" +
(byte) ( 0x10310200A0018380uLL >> ((byte)(0x1D*b) >>2 ) ) + " \t| " +
"") ;
}
/*
javascript: x=6; y=4; n=511; ra=[]; s="<pre>x";
while(n--)for(i=5;i;i=i==3?2:--i){
j=0;
for(b=1;b<129;b*=2) ra[j++]=((n*b)&0xFF)>>i;
ra.sort(function(a, b){return a-b});
if ( tf=ra[--j] < 64 && ra[1]>ra[0]+x )
while( --j && ( tf = (ra[j]>ra[j-1]+x) || ( ra[j]<ra[j-1]+y && ra[j+1]>ra[j]+x) ) );
if(tf) s+="\n "+n.toString(16)+" >> "+i+" \t"+ra.join(" ");
};
s;
// many >>2 's to do: sledge hammer creation of acceptable bit pattern solutions
// only want btf's - Bits That Fit (in 8 bytes): (tf=ra[j-1] < 64)
// program proximity control so no BS (Brain Strain!): tf = (ra[j]<ra[j-1]+x) || (ra[j]<ra[j-2]+y)
// for debug: s+="\n "+n.toString(16)+" >> "+i;
// for debug: s+="\n" +tf+"\t"+ra[j]+"\t"+ra[j-1]+"\t"+ra[j-2];
*/
