Manually Converting rgba8 to rgba5551 - opengl-es

I need to convert rgba8 to rgba5551 manually. I found some helpful code from another post and want to modify it to convert from rgba8 to rgba5551. I don't really have experience with bitwise operations and haven't had any luck modifying the code myself.
void* rgba8888_to_rgba4444( void* src, int src_bytes)
{
// compute the actual number of pixel elements in the buffer.
int num_pixels = src_bytes / 4;
unsigned int* psrc = (unsigned int*)src; // read 32 bits per pixel; unsigned long is 64 bits on many platforms
unsigned short* pdst = (unsigned short*)src;
// convert every pixel
for(int i = 0; i < num_pixels; i++){
// read a source pixel
unsigned px = psrc[i];
// take the top 4 bits of each 8-bit channel and shift them into place (assumes R is the lowest byte of px, i.e. little-endian)
unsigned r = (px << 8) & 0xf000;
unsigned g = (px >> 4) & 0x0f00;
unsigned b = (px >> 16) & 0x00f0;
unsigned a = (px >> 28) & 0x000f;
// and store
pdst[i] = r | g | b | a;
}
return pdst;
}

The value of RGBA5551 is that it has color info condensed into 16 bits - or two bytes, with only one bit for the alpha channel (on or off). RGBA8888, on the other hand, uses a byte for each channel. (If you don't need an alpha channel, I hear RGB565 is better - as humans are more sensitive to green). Now, with 5 bits, you get the numbers 0 through 31, so r, g, and b each need to be converted to some number between 0 and 31, and since they are originally a byte each (0-255), we multiply each by 31/255. Here is a function that takes RGBA bytes as input and outputs RGBA5551 as a short:
unsigned short RGBA8888_to_RGBA5551(unsigned char r, unsigned char g, unsigned char b, unsigned char a){
    unsigned char r5 = r*31/255; // All arithmetic is integer arithmetic, so the fractional part is truncated. If you want to round to the nearest integer, adjust this code accordingly.
    unsigned char g5 = g*31/255;
    unsigned char b5 = b*31/255;
    unsigned char a1 = (a > 0) ? 1 : 0; // 1 if a is positive, 0 else. You must decide what is sensible.
    // Now that we have our 5 bit r, g, and b and our 1 bit a, we need to shift them into place before combining.
    // r5 looks like 00000000000vwxyz (11 zeroes) as a 16-bit value. Unsigned types are used so that
    // the top bit of the result is never treated as a sign bit.
    unsigned short rShift = (unsigned short)(r5 << 11);
    unsigned short gShift = (unsigned short)(g5 << 6);
    unsigned short bShift = (unsigned short)(b5 << 1);
    // Combine and return
    return rShift | gShift | bShift | a1;
}
You can, of course, condense this code.
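To connect the per-pixel function back to the buffer conversion asked about in the question, here is a minimal sketch that converts a whole buffer. The names pack_rgba5551 and rgba8888_to_rgba5551 are hypothetical, the source is assumed to be tightly packed bytes in R, G, B, A order, and two choices are assumptions rather than part of the answer above: channels are rounded to nearest instead of truncated, and alpha is thresholded at half opacity.

```c
#include <stddef.h>
#include <stdint.h>

/* Pack one pixel: R in bits 15..11, G in 10..6, B in 5..1, alpha in bit 0. */
static uint16_t pack_rgba5551(uint8_t r, uint8_t g, uint8_t b, uint8_t a)
{
    /* Scale 0..255 to 0..31, rounding to nearest (assumption: rounding wanted). */
    uint16_t r5 = (uint16_t)((r * 31 + 127) / 255);
    uint16_t g5 = (uint16_t)((g * 31 + 127) / 255);
    uint16_t b5 = (uint16_t)((b * 31 + 127) / 255);
    /* Assumption: treat alpha >= 128 as opaque. */
    uint16_t a1 = (a >= 128) ? 1 : 0;
    return (uint16_t)((r5 << 11) | (g5 << 6) | (b5 << 1) | a1);
}

/* Convert num_pixels pixels from src (4 bytes each) into dst (2 bytes each). */
void rgba8888_to_rgba5551(const uint8_t *src, uint16_t *dst, size_t num_pixels)
{
    for (size_t i = 0; i < num_pixels; i++) {
        const uint8_t *p = src + 4 * i;
        dst[i] = pack_rgba5551(p[0], p[1], p[2], p[3]);
    }
}
```

Reading the source byte by byte avoids any dependence on the host's endianness, and writing to a separate destination buffer avoids the aliasing questions raised by converting in place.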

Related

How to generate uniform single precision floating point random number between 0 and 1 in FPGA?

I am trying to generate single-precision floating-point random numbers on an FPGA by generating a number between 0 and 0x3f800000 (the IEEE format for 1). But since there are more discrete points near zero than near 1, I am not getting a uniform distribution. Is there any transformation I can apply to mimic uniform generation? I am using a 32-bit LFSR and Xoshiro random number generation.
A standard way to generate uniformly distributed floats in [0,1) from uniformly distributed 32-bit unsigned integers is to multiply the integers by 2^-32. Obviously we wouldn't instantiate a floating-point multiplier on the FPGA just for this purpose, and we do not have to, since the multiplier is a power of two. In essence what is needed is a conversion of the integer to a floating-point number, then decrementing the exponent of the floating-point number by 32. This does not work for a zero input which has to be handled as a special case. In the ISO-C99 code below I am assuming that float is mapped to IEEE-754 binary32 type.
Other than for certain special cases, the significand of an IEEE-754 binary floating-point number is normalized to [1,2). To convert an integer into the significand, we need to normalize it, so the most significant bit is set. We can do this by counting the number of leading zero bits, then left shifting the number by that amount. The count of leading zeros is also needed to adjust the exponent.
The significand of a binary32 number comprises 24 bits, of which only 23 bits are stored; the most significant bit (the integer bit) is always one and therefore implicit. This means not all of the 32 bits of the integer can be incorporated into the binary32, so in converting a 32-bit unsigned integer one usually rounds to 24-bit precision. To simplify the implementation, in the code below I simply truncate by cutting off the least significant eight bits, which should have no noticeable effect on the uniform distribution. For the exponent part, we can combine the adjustments due to the normalization step with the subtraction due to the scale factor of 2^-32.
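As a cross-check for the normalize, truncate, and rescale steps just described, the following is a minimal software reference model. The name uniform_ref is hypothetical, and the sketch uses ordinary floating-point operations instead of hardware-style primitives, so it is only meant for comparing results in simulation. It assumes float is IEEE-754 binary32.

```c
#include <stdint.h>

/* Reference model: i * 2^-32, truncated to 24 significant bits. */
float uniform_ref(uint32_t i)
{
    unsigned lz = 0;
    float r;

    if (i == 0) {
        return 0.0f;                      /* zero is the special case */
    }
    /* Normalize: shift left until the most significant bit is set. */
    while ((i & 0x80000000u) == 0) {
        i <<= 1;
        lz++;
    }
    /* Keep the top 24 bits; the conversion is exact because the value < 2^24. */
    r = (float)(i >> 8) * 0x1.0p-24f;     /* now in [0.5, 1) */
    /* Undo the normalization shift; multiplying by 0.5 is exact here. */
    for (unsigned k = 0; k < lz; k++) {
        r *= 0.5f;
    }
    return r;
}
```

Every intermediate value is exactly representable, so this model is intended to return the same bit pattern as a truncating hardware conversion, which makes it convenient for exhaustive or randomized comparison.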
The code below is written using hardware-centric primitives. Extracting a bit is just a question of grabbing the correct wire, and shifts by fixed amounts are likewise simply wire shifts. The circuit needed to count the number of leading zeros is typically called a priority encoder.
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <string.h>
#define USE_FP_MULTIPLY (0)
uint32_t bit (uint32_t, uint32_t);
uint32_t mux (uint32_t, uint32_t, uint32_t);
uint32_t clz (uint32_t);
float uint32_as_float (uint32_t);
/* uniform float in [0, 1) from uniformly distributed random integers */
float uniform_rand_01 (uint32_t i)
{
const uint32_t FP32_EXPO_BIAS = 127;
const uint32_t FP32_MANT_BITS = 24;
const uint32_t FP32_STORED_MANT_BITS = FP32_MANT_BITS - 1;
uint32_t lz, r;
// compute shift amount needed for normalization
lz = clz (i);
// normalize so that msb is set, except when input is zero
i = mux (bit (lz, 4), i << 16, i);
i = mux (bit (lz, 3), i << 8, i);
i = mux (bit (lz, 2), i << 4, i);
i = mux (bit (lz, 1), i << 2, i);
i = mux (bit (lz, 0), i << 1, i);
// build bit pattern for IEEE-754 binary32 floating-point number
r = (((FP32_EXPO_BIAS - 2 - lz) << FP32_STORED_MANT_BITS) +
(i >> (32 - FP32_MANT_BITS)));
// handle special case of zero input
r = mux (i == 0, i, r);
// treat bit-pattern as 'float'
return uint32_as_float (r);
}
// extract bit i from x
uint32_t bit (uint32_t x, uint32_t i)
{
return (x >> i) & 1;
}
// simulate 2-to-1 multiplexer: c ? a : b ; c must be in {0,1}
uint32_t mux (uint32_t c, uint32_t a, uint32_t b)
{
uint32_t m = c * 0xffffffff;
return (a & m) | (b & ~m);
}
// count leading zeros. A priority encoder in hardware.
uint32_t clz (uint32_t x)
{
uint32_t m, c, y, n = 32;
y = x >> 16; m = n - 16; c = (y != 0); n = mux (c, m, n); x = mux (c, y, x);
y = x >> 8; m = n - 8; c = (y != 0); n = mux (c, m, n); x = mux (c, y, x);
y = x >> 4; m = n - 4; c = (y != 0); n = mux (c, m, n); x = mux (c, y, x);
y = x >> 2; m = n - 2; c = (y != 0); n = mux (c, m, n); x = mux (c, y, x);
y = x >> 1; m = n - 2; c = (y != 0); n = mux (c, m, n - x);
return n;
}
// re-interpret bit pattern of a 32-bit integer as an IEEE-754 binary32
float uint32_as_float (uint32_t a)
{
float r;
memcpy (&r, &a, sizeof r);
return r;
}
// George Marsaglia's KISS PRNG, period 2**123. Newsgroup sci.math, 21 Jan 1999
// Bug fix: Greg Rose, "KISS: A Bit Too Simple" http://eprint.iacr.org/2011/007
static uint32_t kiss_z=362436069, kiss_w=521288629;
static uint32_t kiss_jsr=123456789, kiss_jcong=380116160;
#define znew (kiss_z=36969*(kiss_z&65535)+(kiss_z>>16))
#define wnew (kiss_w=18000*(kiss_w&65535)+(kiss_w>>16))
#define MWC ((znew<<16)+wnew )
#define SHR3 (kiss_jsr^=(kiss_jsr<<13),kiss_jsr^=(kiss_jsr>>17), \
kiss_jsr^=(kiss_jsr<<5))
#define CONG (kiss_jcong=69069*kiss_jcong+1234567)
#define KISS ((MWC^CONG)+SHR3)
#define N 100
uint32_t bucket [N];
int main (void)
{
for (int k = 0; k < 100000; k++) { // k counts samples; i below holds the random integer
uint32_t i = KISS;
#if USE_FP_MULTIPLY
float r = i * 0x1.0p-32f;
#else // USE_FP_MULTIPLY
float r = uniform_rand_01 (i);
#endif // USE_FP_MULTIPLY
bucket [(int)(r * N)]++;
}
for (int i = 0; i < N; i++) {
printf ("bucket [%2d]: [%.5f,%.5f): %u\n",
i, 1.0f*i/N, (i+1.0f)/N, bucket[i]);
}
return EXIT_SUCCESS;
}
Please check the xoshiro128+ here https://prng.di.unimi.it/xoshiro128plus.c
The VHDL code written by someone can be found here:
https://github.com/jorisvr/vhdl_prng/tree/master/rtl
The seed value is generated from another random number generation algorithm so don't get confused by this.
Depending on the seed value used it should give a uniform distribution.

C++ sizeof(struct)

code like this:
#include <stdio.h>
int main(){
struct{
unsigned char a:4;
unsigned char b:4;
}i;
struct{
unsigned char a:4;
unsigned char b:4;
unsigned char c:4;
}j;
i.a = 1;
i.b = 1;
j.a = 1;
j.b = 1;
j.c = 1;
printf("size of i is: %zu, size of j is: %zu", sizeof(i), sizeof(j)); // sizeof yields size_t, which needs %zu rather than %d
return 0;
}
Why is the output 1 2? That means i occupies 1 byte and j occupies 2 bytes. We know an unsigned char takes 1 byte, so why isn't the size of i equal to 2? I am sorry for my English.
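The size of every object in C++ is a whole number of bytes, and adjacent bit-fields are typically packed into the same byte (the exact layout is implementation-defined).
In struct i, both a and b are 4 bits wide, summing up to 1 byte.
In j, the fields sum up to 12 bits, but the size is 2 bytes due to padding.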
Reference: http://www.cplusplus.com/forum/general/51911/
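The packing described above can be observed with a small sketch. The struct and function names are hypothetical, and the sizes in the comments are what common compilers such as GCC and Clang produce; bit-field layout is implementation-defined, so other compilers may differ.

```c
#include <stddef.h>

/* Two 4-bit fields fit into a single byte. */
struct two_nibbles {
    unsigned char a : 4;
    unsigned char b : 4;
};

/* Three 4-bit fields need 12 bits, which is rounded up to 2 bytes. */
struct three_nibbles {
    unsigned char a : 4;
    unsigned char b : 4;
    unsigned char c : 4;
};

size_t size_two_nibbles(void)   { return sizeof(struct two_nibbles); }   /* typically 1 */
size_t size_three_nibbles(void) { return sizeof(struct three_nibbles); } /* typically 2 */
```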

Algorithm Challenge: Arbitrary in-place base conversion for lossless string compression

It might help to start out with a real world example. Say I'm writing a web app that's backed by MongoDB, so my records have a long hex primary key, making my url to view a record look like /widget/55c460d8e2d6e59da89d08d0. That seems excessively long. Urls can use many more characters than that. While there are just under 8 x 10^28 (16^24) possible values in a 24 digit hex number, just limiting yourself to the characters matched by a [a-zA-Z0-9] regex class (a YouTube video id uses more), 62 characters, you can get past 8 x 10^28 in only 17 characters.
I want an algorithm that will convert any string that is limited to a specific alphabet of characters to any other string with another alphabet of characters, where the value of each character c could be thought of as alphabet.indexOf(c).
Something of the form:
convert(value, sourceAlphabet, destinationAlphabet)
Assumptions
all parameters are strings
every character in value exists in sourceAlphabet
every character in sourceAlphabet and destinationAlphabet is unique
Simplest example
var hex = "0123456789abcdef";
var base10 = "0123456789";
var result = convert("12245589", base10, hex); // result is "bada55";
But I also want it to work to convert War & Peace from the Russian alphabet plus some punctuation to the entire unicode charset and back again losslessly.
Is this possible?
The only way I was ever taught to do base conversions in Comp Sci 101 was to first convert to a base ten integer by summing digit * base^position and then doing the reverse to convert to the target base. Such a method is insufficient for the conversion of very long strings, because the integers get too big.
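For reference, the textbook method described above can be sketched as follows. The function name convert_small and its error convention are hypothetical. It accumulates the value in an unsigned long long and reports failure on overflow, which makes the limitation concrete: it only works for short inputs.

```c
#include <limits.h>
#include <stddef.h>
#include <string.h>

/* Returns 0 on success, -1 on bad input, overflow, or a too-small buffer. */
int convert_small(const char *value, const char *src_alpha,
                  const char *dst_alpha, char *out, size_t out_size)
{
    unsigned long long sbase = strlen(src_alpha);
    unsigned long long dbase = strlen(dst_alpha);
    unsigned long long n = 0;
    char tmp[sizeof(unsigned long long) * CHAR_BIT]; /* enough even for base 2 */
    size_t len = 0;

    if (sbase < 2 || dbase < 2) return -1;

    /* Step 1: sum digit * base^position using Horner's rule. */
    for (const char *p = value; *p; p++) {
        const char *hit = strchr(src_alpha, *p);
        if (hit == NULL) return -1;                 /* not in source alphabet */
        unsigned long long digit = (unsigned long long)(hit - src_alpha);
        if (n > (ULLONG_MAX - digit) / sbase) return -1;  /* would overflow */
        n = n * sbase + digit;
    }

    /* Step 2: peel off destination digits, least significant first. */
    do {
        tmp[len++] = dst_alpha[n % dbase];
        n /= dbase;
    } while (n != 0);

    if (len + 1 > out_size) return -1;
    for (size_t i = 0; i < len; i++) out[i] = tmp[len - 1 - i];
    out[len] = '\0';
    return 0;
}
```

With the alphabets from the example, convert_small("12245589", base10, hex, buf, sizeof buf) fills buf with "bada55", while a 24-digit hex key already overflows a 64-bit accumulator and is rejected.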
It certainly feels intuitively that a base conversion could be done in place, as you step through the string (probably backwards to maintain standard significant digit order), keeping track of a remainder somehow, but I'm not smart enough to work out how.
That's where you come in, StackOverflow. Are you smart enough?
Perhaps this is a solved problem, done on paper by some 18th century mathematician, implemented in LISP on punch cards in 1970 and the first homework assignment in Cryptography 101, but my searches have borne no fruit.
I'd prefer a solution in javascript with a functional style, but any language or style will do, as long as you're not cheating with some big integer library. Bonus points for efficiency, of course.
Please refrain from criticizing the original example. The general nerd cred of solving the problem is more important than any application of the solution.
Here is a solution in C that is very fast, using bit shift operations. It assumes that you know what the length of the decoded string should be. The strings are vectors of integers in the range 0..maximum for each alphabet. It is up to the user to convert to and from strings with restricted ranges of characters. As for the "in-place" in the question title, the source and destination vectors can overlap, but only if the source alphabet is not larger than the destination alphabet.
/*
recode version 1.0, 22 August 2015
Copyright (C) 2015 Mark Adler
This software is provided 'as-is', without any express or implied
warranty. In no event will the authors be held liable for any damages
arising from the use of this software.
Permission is granted to anyone to use this software for any purpose,
including commercial applications, and to alter it and redistribute it
freely, subject to the following restrictions:
1. The origin of this software must not be misrepresented; you must not
claim that you wrote the original software. If you use this software
in a product, an acknowledgment in the product documentation would be
appreciated but is not required.
2. Altered source versions must be plainly marked as such, and must not be
misrepresented as being the original software.
3. This notice may not be removed or altered from any source distribution.
Mark Adler
madler#alumni.caltech.edu
*/
/* Recode a vector from one alphabet to another using intermediate
variable-length bit codes. */
/* The approach is to use a Huffman code over equiprobable alphabets in two
directions. First to encode the source alphabet to a string of bits, and
second to encode the string of bits to the destination alphabet. This will
be reasonably close to the efficiency of base-encoding with arbitrary
precision arithmetic. */
#include <stddef.h> // size_t
#include <limits.h> // UINT_MAX, ULLONG_MAX
#if UINT_MAX == ULLONG_MAX
# error recode() assumes that long long has more bits than int
#endif
/* Take a list of integers source[0..slen-1], all in the range 0..smax, and
code them into dest[0..*dlen-1], where each value is in the range 0..dmax.
*dlen returns the length of the result, which will not exceed the value of
*dlen when called. If the original *dlen is not large enough to hold the
full result, then recode() will return non-zero to indicate failure.
Otherwise recode() will return 0. recode() will also return non-zero if
either of the smax or dmax parameters are less than one. The non-zero
return codes are 1 if *dlen is not long enough, 2 for invalid parameters,
and 3 if any of the elements of source are greater than smax.
Using this same operation on the result with smax and dmax reversed reverses
the operation, restoring the original vector. However there may be more
symbols returned than the original, so the number of symbols expected needs
to be known for decoding. (An end symbol could be appended to the source
alphabet to include the length in the coding, but then encoding and decoding
would no longer be symmetric, and the coding efficiency would be reduced.
This is left as an exercise for the reader if that is desired.) */
int recode(unsigned *dest, size_t *dlen, unsigned dmax,
const unsigned *source, size_t slen, unsigned smax)
{
// compute sbits and scut, with which we will recode the source with
// sbits-1 bits for symbols < scut, otherwise with sbits bits (adding scut)
if (smax < 1)
return 2;
unsigned sbits = 0;
unsigned scut = 1; // 2**sbits
while (scut && scut <= smax) {
scut <<= 1;
sbits++;
}
scut -= smax + 1;
// same thing for dbits and dcut
if (dmax < 1)
return 2;
unsigned dbits = 0;
unsigned dcut = 1; // 2**dbits
while (dcut && dcut <= dmax) {
dcut <<= 1;
dbits++;
}
dcut -= dmax + 1;
// recode a base smax+1 vector to a base dmax+1 vector using an
// intermediate bit vector (a sliding window of that bit vector is kept in
// a bit buffer)
unsigned long long buf = 0; // bit buffer
unsigned have = 0; // number of bits in bit buffer
size_t i = 0, n = 0; // source and dest indices
unsigned sym; // symbol being encoded
for (;;) {
// encode enough of source into bits to encode that to dest
while (have < dbits && i < slen) {
sym = source[i++];
if (sym > smax) {
*dlen = n;
return 3;
}
if (sym < scut) {
buf = (buf << (sbits - 1)) + sym;
have += sbits - 1;
}
else {
buf = (buf << sbits) + sym + scut;
have += sbits;
}
}
// if not enough bits to assure one symbol, then break out to a special
// case for coding the final symbol
if (have < dbits)
break;
// encode one symbol to dest
if (n == *dlen)
return 1;
sym = buf >> (have - dbits + 1);
if (sym < dcut) {
dest[n++] = sym;
have -= dbits - 1;
}
else {
sym = buf >> (have - dbits);
dest[n++] = sym - dcut;
have -= dbits;
}
buf &= ((unsigned long long)1 << have) - 1;
}
// if any bits are left in the bit buffer, encode one last symbol to dest
if (have) {
if (n == *dlen)
return 1;
sym = buf;
sym <<= dbits - 1 - have;
if (sym >= dcut)
sym = (sym << 1) - dcut;
dest[n++] = sym;
}
// return recoded vector
*dlen = n;
return 0;
}
/* Test recode(). */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <math.h>
#include <assert.h>
// Return a random vector of len unsigned values in the range 0..max.
static void ranvec(unsigned *vec, size_t len, unsigned max) {
unsigned bits = 0;
unsigned long long mask = 1;
while (mask <= max) {
mask <<= 1;
bits++;
}
mask--;
unsigned long long ran = 0;
unsigned have = 0;
size_t n = 0;
while (n < len) {
while (have < bits) {
ran = (ran << 31) + random();
have += 31;
}
if ((ran & mask) <= max)
vec[n++] = ran & mask;
ran >>= bits;
have -= bits;
}
}
// Get a valid number from str and assign it to var
#define NUM(var, str) \
do { \
char *end; \
unsigned long val = strtoul(str, &end, 0); \
var = val; \
if (*end || var != val) { \
fprintf(stderr, \
"invalid or out of range numeric argument: %s\n", str); \
return 1; \
} \
} while (0)
/* "bet n m len count" generates count test vectors of length len, where each
entry is in the range 0..n. Each vector is recoded to another vector using
only symbols in the range 0..m. That vector is recoded back to a vector
using only symbols in 0..n, and that result is compared with the original
random vector. Report on the average ratio of input and output symbols, as
compared to the optimal ratio for arbitrary precision base encoding. */
int main(int argc, char **argv)
{
// get sizes of alphabets and length of test vector, compute maximum sizes
// of recoded vectors
unsigned smax, dmax, runs;
size_t slen, dsize, bsize;
if (argc != 5) { fputs("need four arguments\n", stderr); return 1; }
NUM(smax, argv[1]);
NUM(dmax, argv[2]);
NUM(slen, argv[3]);
NUM(runs, argv[4]);
dsize = ceil(slen * ceil(log2(smax + 1.)) / floor(log2(dmax + 1.)));
bsize = ceil(dsize * ceil(log2(dmax + 1.)) / floor(log2(smax + 1.)));
// generate random test vectors, encode, decode, and compare
srandomdev();
unsigned source[slen], dest[dsize], back[bsize];
unsigned mis = 0, i;
unsigned long long dtot = 0;
int ret;
for (i = 0; i < runs; i++) {
ranvec(source, slen, smax);
size_t dlen = dsize;
ret = recode(dest, &dlen, dmax, source, slen, smax);
if (ret) {
fprintf(stderr, "encode error %d\n", ret);
break;
}
dtot += dlen;
size_t blen = bsize;
ret = recode(back, &blen, smax, dest, dlen, dmax);
if (ret) {
fprintf(stderr, "decode error %d\n", ret);
break;
}
if (blen < slen || memcmp(source, back, slen * sizeof(unsigned))) // memcmp counts bytes, so scale by the element size; blen > slen is ok
mis++;
}
if (mis)
fprintf(stderr, "%u/%u mismatches!\n", mis, i);
if (ret == 0)
printf("mean dest/source symbols = %.4f (optimal = %.4f)\n",
dtot / (i * (double)slen), log(smax + 1.) / log(dmax + 1.));
return 0;
}
As has been pointed out in other StackOverflow answers, try not to think of summing digit * base^position as converting it to base ten; rather, think of it as directing the computer to generate a representation of the quantity represented by the number in its own terms (for most computers probably closer to our concept of base 2). Once the computer has its own representation of the quantity, we can direct it to output the number in any way we like.
By rejecting "big integer" implementations and asking for letter-by-letter conversion you are at the same time arguing that the numerical/alphabetical representation of quantity is not actually what it is, namely that each position represents a quantity of digit * base^position. If the nine-millionth character of War and Peace does represent what you are asking to convert it from, then the computer at some point will need to generate a representation for Д * 33^9000000.
I don't think any solution can work in general: if n^e != m for every integer e (where n is the source base and m the destination base), there is no way to calculate the value of the target base at a given place p once n^p > MAX_INT.
You can get away with this for the case where n^e == m for some e because the problem is recursively doable: the first e digits of n can be summed and converted into the first digit of m, then chopped off, and the process repeated.
If you don't have this useful property, then eventually you're going to have to take some part of the original base and perform a modulus in n^p, and n^p is going to be greater than MAX_INT, which means it's impossible.
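The easy case mentioned above, where the destination base is an exact power of the source base, can be sketched like this. The name group_digits is hypothetical; digits are stored most significant first as integers, and the caller is assumed to pick n and e so that n^e fits in an unsigned.

```c
#include <stddef.h>

/* Regroup base-n digits into base-(n^e) digits, most significant first.
   dst must have room for (slen + e - 1) / e entries. Returns the count. */
size_t group_digits(unsigned *dst, const unsigned *src, size_t slen,
                    unsigned n, unsigned e)
{
    size_t i = 0, out = 0;
    /* The leading group may be short if slen is not a multiple of e. */
    size_t take = (slen % e) ? (slen % e) : e;

    while (i < slen) {
        unsigned d = 0;
        for (size_t k = 0; k < take; k++) {
            d = d * n + src[i++];   /* fold e small digits into one large digit */
        }
        dst[out++] = d;
        take = e;
    }
    return out;
}
```

For example, the six binary digits 1 0 1 1 0 1 with n = 2 and e = 4 regroup into the two hexadecimal digits 2 and 13, and no intermediate value ever exceeds n^e - 1.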

How can I check if there is only one value changed in a (bitwise?) value?

How can I check if there is only 1 bit change between a value and another (next) value?
the output is for example
001
101
110
in the second output there is a 0 changed into a 1
in the third output there is a 0 changed into a 1 AND also the last 1 changed into a 0
the program may only continue if there is only 1 change.
First, XOR the two numbers. XOR will return a 1 for every bit that changed.
Example:
0101110110100100
XOR
0100110110100100
would give you
0001000000000000
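Now what you need is a quick way to check if there is only a single bit set in your resulting number, or in other words, if the resulting number is a power of two.
A quick test for that is: x != 0 && (x & (x - 1)) == 0. The x != 0 part rejects the case where the two values are identical and nothing changed.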
No for loops needed.
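Put together, the XOR and power-of-two steps above might look like the sketch below. The function name exactly_one_bit_changed is hypothetical.

```c
#include <stdbool.h>

/* True when prev and next differ in exactly one bit position. */
bool exactly_one_bit_changed(unsigned prev, unsigned next)
{
    unsigned diff = prev ^ next;            /* 1 in every position that changed */
    /* A nonzero value with a single set bit is a power of two;
       diff & (diff - 1) clears the lowest set bit, leaving zero in that case. */
    return diff != 0 && (diff & (diff - 1)) == 0;
}
```

With the values from the question, going from 001 to 101 passes the check, while going from 101 to 110 does not because two bits changed.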
You can compute the bitwise XOR and then just count the bits that are 1's. This is known as the Hamming distance. For example:
unsigned int a = 0b001;
unsigned int b = 0b100;
unsigned int res;
/* Stores the number of different bits */
unsigned int acc;
res = a ^ b;
/* from https://graphics.stanford.edu/~seander/bithacks.html */
for (acc = 0; res; res >>= 1)
{
acc += res & 1;
}
In Java
public static void main(String[] args){
boolean value = moreThanOneChanged("101", "001");
}
static boolean moreThanOneChanged(String input, String current){
if(input.length() != current.length()) return false;
char[] first = input.toCharArray();
char[] second = current.toCharArray();
for(int i = 0, j = 0; i < input.length(); i++){
if(first[i] != second[i]) // count the positions that differ
j++;
if(j > 1)
return true;
}
return false;
}
You can prove it to yourself fairly easily by ANDing the exclusive or of the two values with that exclusive or minus 1: the result is zero exactly when the exclusive or has at most one bit set. It is easier to visualize what takes place by looking at the binary representation of the values and results. Below, the function onebitoff performs the test. The other functions just provide a way to output the results:
#include <stdio.h>
#include <limits.h> /* for CHAR_BIT */
#define WDSZ 64
/** returns pointer to binary representation of 'n' zero padded to 'sz'.
* returns pointer to string containing binary representation of
* unsigned 64-bit (or less) value zero padded to 'sz' digits.
*/
char *cpbin (unsigned long n, int sz)
{
static char s[WDSZ + 1] = {0};
char *p = s + WDSZ;
int i;
for (i=0; i<sz; i++) {
p--;
*p = (n>>i & 1) ? '1' : '0';
}
return p;
}
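/* return true if exactly one bit differs between a and b */
int onebitoff (unsigned int a, unsigned int b)
{
    unsigned int x = a ^ b;
    /* x == 0 means the values are identical, which is not a one-bit change */
    return (x != 0 && (x & (x - 1)) == 0) ? 1 : 0;
}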
/* quick output of binary difference for 2 values */
void showdiff (unsigned int a, unsigned int b)
{
if (onebitoff (a, b))
printf ( " values %u, %u - vary by one-bit (bitwise)\n\n", a, b);
else
printf ( " values %u, %u - vary by other than one-bit (bitwise)\n\n", a, b);
printf (" %3u : %s\n", a, cpbin (a, sizeof (char) * CHAR_BIT));
printf (" %3u : %s\n", b, cpbin (b, sizeof (char) * CHAR_BIT));
printf (" xor : %s\n\n", cpbin ((a ^ b), sizeof (char) * CHAR_BIT));
}
int main () {
printf ("\nTest whether the following numbers vary by a single bit (bitwise)\n\n");
showdiff (1, 5);
showdiff (5, 6);
showdiff (6, 1);
showdiff (97, 105); /* just as a further test */
return 0;
}
output:
$ ./bin/bitsvary
Test whether the following numbers vary by a single bit (bitwise)
values 1, 5 - vary by one-bit (bitwise)
1 : 00000001
5 : 00000101
xor : 00000100
values 5, 6 - vary by other than one-bit (bitwise)
5 : 00000101
6 : 00000110
xor : 00000011
values 6, 1 - vary by other than one-bit (bitwise)
6 : 00000110
1 : 00000001
xor : 00000111
values 97, 105 - vary by one-bit (bitwise)
97 : 01100001
105 : 01101001
xor : 00001000

What is the most efficient way to subtract signed integral data in binary (bits)?

I'm working in C on a PC, trying to leverage as little C++ as possible, working with binary data stored in unsigned char format, although other formats are certainly possible if worthwhile. The goal is subtracting two signed integer values (which can be ints, signed ints, longs, signed longs, signed shorts, etc.) in binary without converting to other data formats. The raw data is just prepackaged as unsigned char, though, with the user basically knowing which of the signed integer formats should be used for reading (i.e. we know how many bytes to read at once). Even though data is stored as an unsigned char array, data are meant to be read signed as two's-complement integers.
One common way we're often taught in school is adding the negative. Negation, in turn, is often taught to be performed as flipping bits and adding 1 (0x1), resulting in two additions (perhaps a bad thing?); or, as other posts point out, flipping bits past the first zero starting from the MSB. I'm wondering if there is a more efficient way, that may not be easily described as a pen-and-paper operation, but works because of the way data is stored in bit format. Here are some prototypes I've written, which may not be the most efficient way, but which summarizes my progress so far based on textbook methodology.
The addends are passed by reference in case I have to manually extend them to balance their length. Any and all feedback will be appreciated! Thanks in advance for considering.
void SubtractByte(unsigned char* & a, unsigned int & aBytes,
unsigned char* & b, unsigned int & bBytes,
unsigned char* & diff, unsigned int & nBytes)
{
NegateByte(b, bBytes);
// a - b == a + (-b)
AddByte(a, aBytes, b, bBytes, diff, nBytes);
// Restore b to its original state so input remains intact
NegateByte(b, bBytes);
}
void AddByte(unsigned char* & a, unsigned int & aBytes,
unsigned char* & b, unsigned int & bBytes,
unsigned char* & sum, unsigned int & nBytes)
{
// Ensure that both of our addends have the same length in memory:
BalanceNumBytes(a, aBytes, b, bBytes, nBytes);
bool aSign = !((a[aBytes-1] >> 7) & 0x1);
bool bSign = !((b[bBytes-1] >> 7) & 0x1);
// Add bit-by-bit to keep track of carry bit:
unsigned int nBits = nBytes * BITS_PER_BYTE;
unsigned char carry = 0x0;
unsigned char result = 0x0;
unsigned char a1, b1;
// init sum
for (unsigned int j = 0; j < nBytes; ++j) {
for (unsigned int i = 0; i < BITS_PER_BYTE; ++i) {
a1 = ((a[j] >> i) & 0x1);
b1 = ((b[j] >> i) & 0x1);
AddBit(&a1, &b1, &carry, &result);
SetBit(sum, j, i, result==0x1);
}
}
// MSB and carry determine if we need to extend:
if (((aSign && bSign) && (carry != 0x0 || result != 0x0)) ||
((!aSign && !bSign) && (result == 0x0))) {
++nBytes;
sum = (unsigned char*)realloc(sum, nBytes);
sum[nBytes-1] = (carry == 0x0 ? 0x0 : 0xFF); //init
}
}
void FlipByte (unsigned char* n, unsigned int nBytes)
{
for (unsigned int i = 0; i < nBytes; ++i) {
n[i] = ~n[i];
}
}
void NegateByte (unsigned char* n, unsigned int nBytes)
{
// Flip each bit:
FlipByte(n, nBytes);
unsigned char* one = (unsigned char*)malloc(nBytes);
unsigned char* orig = (unsigned char*)malloc(nBytes);
one[0] = 0x1;
orig[0] = n[0];
for (unsigned int i = 1; i < nBytes; ++i) {
one[i] = 0x0;
orig[i] = n[i];
}
// Add binary representation of 1
AddByte(orig, nBytes, one, nBytes, n, nBytes);
free(one);
free(orig);
}
void AddBit(unsigned char* a, unsigned char* b, unsigned char* c,
unsigned char* result) {
*result = ((*a + *b + *c) & 0x1);
*c = (((*a + *b + *c) >> 1) & 0x1);
}
void SetBit(unsigned char* bytes, unsigned int byte, unsigned int bit,
bool val)
{
// shift desired bit into LSB position, and AND with 00000001
if (val) {
// OR with 00001000
bytes[byte] |= (0x01 << bit);
}
else{ // (!val), meaning we want to set to 0
// AND with 11110111
bytes[byte] &= ~(0x01 << bit);
}
}
void BalanceNumBytes (unsigned char* & a, unsigned int & aBytes,
unsigned char* & b, unsigned int & bBytes,
unsigned int & nBytes)
{
if (aBytes > bBytes) {
nBytes = aBytes;
b = (unsigned char*)realloc(b, nBytes);
bBytes = nBytes;
b[nBytes-1] = ((b[0] >> 7) & 0x1) ? 0xFF : 0x00;
} else if (bBytes > aBytes) {
nBytes = bBytes;
a = (unsigned char*)realloc(a, nBytes);
aBytes = nBytes;
a[nBytes-1] = ((a[0] >> 7) & 0x1) ? 0xFF : 0x00;
} else {
nBytes = aBytes;
}
}
The first thing to notice is that signed vs. unsigned doesn't matter to the generated bit pattern in two's complement. All that changes is the interpretation of the result.
The second thing to notice is that an addition has carried if the result is less than either input when done with unsigned arithmetic.
void AddByte(unsigned char* & a, unsigned int & aBytes,
             unsigned char* & b, unsigned int & bBytes,
             unsigned char* & sum, unsigned int & nBytes)
{
    // Ensure that both of our addends have the same length in memory:
    BalanceNumBytes(a, aBytes, b, bBytes, nBytes);
    // sum must already point to at least nBytes bytes
    unsigned char carry = 0;
    for (unsigned int j = 0; j < nBytes; ++j) { // need to reverse the loop for big-endian
        sum[j] = a[j] + b[j];
        // the byte addition carried if the truncated result is smaller than an input
        unsigned char newcarry = (sum[j] < a[j]);
        sum[j] += carry;
        // adding the incoming carry can itself wrap around (0xFF + 1 == 0x00)
        newcarry |= (sum[j] < carry);
        carry = newcarry;
    }
}
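Because signedness does not affect the bit pattern, subtraction can also be done directly with a borrow, without negating b first. The following is a minimal sketch in plain C; the name sub_bytes is hypothetical, and it assumes both operands are already the same length, stored least significant byte first, with 8-bit bytes.

```c
#include <stddef.h>

/* diff = a - b over n bytes, little-endian. Returns the final borrow
   (1 if the unsigned value of a is smaller than that of b). */
unsigned sub_bytes(unsigned char *diff, const unsigned char *a,
                   const unsigned char *b, size_t n)
{
    unsigned borrow = 0;
    for (size_t j = 0; j < n; j++) {
        /* Unsigned arithmetic wraps on underflow, which sets bit 8 of t. */
        unsigned t = (unsigned)a[j] - b[j] - borrow;
        diff[j] = (unsigned char)(t & 0xFFu);
        borrow = (t >> 8) & 1u;
    }
    return borrow;
}
```

For two's-complement data the resulting bytes are the signed difference as well; for example, subtracting 1 from 0 over two bytes yields 0xFF 0xFF, which reads as -1. Detecting signed overflow would still require comparing the sign bits of the inputs and the result.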
