I am working an a very basic operating system for a learning experience, and I am trying to start with key presses. I am making a freestanding executable, so no standard library. How would I go about taking input from a keyboard? I have figured out how to print to the screen through video memory.
/*
* kernel.c
* */
void cls(int line) { // clear the screen
char *vidptr = (char*) 0xb8000;
/* 25 lines each of 80 columns; each element takes 2 bytes */
unsigned int x = 0;
while (x < 80 * 25 * 2) {
// blank character
vidptr[x] = ' ';
// attribute-byte - light grey on black screen
x += 1;
vidptr[x] = 0x07;
x += 1;
}
line = 0;
}
void printf(const char *str, int line, char attr) { // write a string to video memory
char *vidptr = (char*) 0xb8000;
unsigned int y =0, x = 0;
while (str[y] != '\0') {
// the character's ascii
vidptr[x] = str[y];
x++;
// attribute byte - give character black bg and light gray fg
vidptr[x+1] = attr;
x++;
y++;
}
}
void kmain(void) {
unsigned int line = 0;
cls(line);
printf("Testing the Kernel", line, 0x0a);
}
and my assembly:
;; entry point
bits 32 ; nasm directive - 32 bit
global entry
extern _kmain ; kmain is defined in the c file
section .text
entry: jmp start
;multiboot spec
align 4
dd 0x1BADB002 ; black magic
dd 0x00 ; flags
dd -(0x1BADB002 + 0x00) ; checksum. m+f+c should be zero
start:
cli ; block interrupts
mov esp, stack_space ; set stack pointer
call _kmain
hlt ; halt the CPU
section .bss
resb 8192 ; 8KB for stack
stack_space:
The kernel below in C language changes a cell of an array:
__global__ void test(int *mt[matrix_size])
{
mt[0][0]=12;
}
The code below copies kernel results to host but it doesn't send the array to host correctly:
int *matrix[matrix_size],*d_matrix[matrix_size];
for(int i=0;i<matrix_size;i++)
matrix[i] = (int *)malloc(n*n*sizeof(int));
for(int i=0;i<matrix_size;i++)
cudaMalloc((void**)&d_matrix[i],sizeof(int));
test<<<1,1>>>(d_matrix);
cudaMemcpy(*matrix,*d_matrix,n*n*sizeof(int),cudaMemcpyDeviceToHost);
printf("\n\n %d \n\n",matrix[0][0]); //the result is zero instead of 12
How can I fix the problem?
You have gotten a lot wrong here.
The root cause is that d_matrix is in host memory and can't be passed directly to a kernel. If you check runtime errors you will see that firstly the first cudaMemcpy call fails because of the wrong direction argument, then when you fix that, the kernel fails with a invalid address error.
To fix this, you need to allocate a copy of d_matrix on the GPU and copy d_matrix to that copy. This is because the array you are passing to the kernel decays to a pointer and is not passed by value.
Something like this:
#include <cstdio>
const int n = 9;
const int matrix_size = 16;
__global__
void test(int *mt[matrix_size])
{
mt[threadIdx.x][0] = 12 + threadIdx.x;
}
int main()
{
int *matrix[matrix_size],*d_matrix[matrix_size];
for(int i=0;i<matrix_size;i++) {
matrix[i] = (int *)malloc(n * n * sizeof(int));
cudaMalloc((void**)&d_matrix[i], n * n * sizeof(int));
}
int **dd_matrix;
cudaMalloc(&dd_matrix, matrix_size * sizeof(int*));
cudaMemcpy(dd_matrix, d_matrix, matrix_size * sizeof(int *), cudaMemcpyHostToDevice);
test<<<1,matrix_size>>>(dd_matrix);
for(int i=0;i<matrix_size;i++) {
cudaMemcpy(matrix[i], d_matrix[i], n*n*sizeof(int), cudaMemcpyDeviceToHost);
printf("%d = %d \n", i, matrix[i][0]);
}
return 0;
}
Which when run gives this:
$ nvcc -g -G -arch=sm_52 -o bozocu bozocu.cu
$ ./bozocu
0 = 12
1 = 13
2 = 14
3 = 15
4 = 16
5 = 17
6 = 18
7 = 19
8 = 20
9 = 21
10 = 22
11 = 23
12 = 24
13 = 25
14 = 26
15 = 27
is, I believe, more in line with what you were expecting.
I'm searching UART code for pic18f65k40, but i haven't found yet.
Could anyone please provide a sample code to transmit a single character using pic18f65k40 uart.
This is the driver from my neolight theatre lighting controller. It is for a PIC18F14K22, but should be close enough to give you a start. This board also bit bangs the extremely fast protocol for the World Semi LEDs, hence the comment about timing.
/* Neolight board optional UART support. The UART is used as an
alternative to contact closures to trigger effects, and can be used
to download completely arbitrary patterns to the LEDS.
NOTE that during a write to the LEDs there are NO cycles left over
to operate the UART, so it does not work during updates. The
protocol reflects this.
*/
#include <stdio.h>
#include <stdlib.h>
#include <xc.h>
#include "neolight.h"
/* Baud rate calculation */
#define brgh 1 /* Use high rate mode */
#if brgh
#define divisor 4
#else
#define divisor 64
#endif
/* Macros to compute baud rate setup at compile time */
#define spbrg ( ( (long) Fosc ) / ( (long) divisor * (long) ( baud_rate ) ) - 1 )
#define spbrgl ( spbrg & 0xFF )
#define spbrgh ( ( spbrg >> 8 ) & 0xFF )
/* Turn on the UART and set baud rate */
void uart_setup ()
{
RCSTAbits . SPEN = 1 ;
BAUDCONbits . BRG16 = 1 ;
TXSTAbits . TXEN = 1 ;
TXSTAbits . BRGH = brgh ;
RCSTAbits . CREN = 1 ;
SPBRG = spbrgl ;
SPBRGH = spbrgh ;
}
/* Send data to UART */
void uart_send ( unsigned char * buffer, int count )
{
int n ;
for ( n = 0; n < count; n++ )
{
while ( !TXSTAbits . TRMT ) ;
TXREG = buffer [ n ] ;
}
}
/* Receive from UART, return -1 if no char */
int uart_receive ()
{
if ( RCSTAbits . OERR )
{
RCSTAbits . CREN = 0 ;
RCSTAbits . CREN = 1 ;
}
if ( PIR1bits . RCIF )
return RCREG ;
else
return -1 ;
}
void Setup_UART(void)
{
TX1STA = 0b00100100;
RC1STA = 0b00010000;
BAUDCON1 = 0b00001000;
SP1BRGL = xxx; //Baudrate value, look at datasheet
SP1BRGH = xxx;
RCSTA1bits.SPEN = 1;
}
void Send_Byte(uint8_t Data)
{
while (!PIR3bits.TXIF);
TX1REG = Data;
}
This example works for the PIC1827k40. Please have a look at the datasheet for your controller.
1, It is a pity that memset(void* dst, int value, size_t size) fools a lot of people when they first use this function! 2nd parameter "int value" should be "uchar value" to describe the real operation inside.
Don't misunderstand me, I am asking a memset-like function!
2, I know there are some c++ candy function like std::fill_n(my_array, array_length, constant_value);
even a pure c function in OS X: memset_pattern4(grid, &pattern, sizeof grid);
mentioned in a perfect thread Why is memset() incorrectly initializing int?.
So, is there a similar c function in runtime library of visual studio like memset_pattern4()?
3, for somebody asked why i wouldn't use a for-loop to set integer by integer. here is my answer: memset turns to a better performance when setting big trunk(10K?) of memory at least in x86.
http://www.gamedev.net/topic/472631-performance-of-memset/page-2 gives more discussion, although without a conclusion(I doubt there will be).
4, said function can be used to simplify counting sort by avoiding useless Fibonacci accumulation.
Original:
for (int i = 0; i < SRC_ARRY_SIZE; i++)
counter_arry[src_arry[i]]++;
for (int i = SRC_LOW_BOUND; i < SRC_HI_BOUND; i++)//forward fabnacci??
counter_arry[i+1] += counter_arry[i];
for (int i = 0; i < SRC_ARRY_SIZE; i++)
{
value = src_arry[i];
map = --counter_arry[value];//then counter down!
temp[map] = value;
}
Expected:
for (int i = 0; i < SRC_ARRY_SIZE; i++)
counter_arry[src_arry[i]]++;
for (int i = SRC_LOW_BOUND; i < SRC_HI_BOUND+1; i++)//forward fabnacci??
{
memset_4(cur_ptr,i, counter_arry[i]);
cur_ptr += counter_arry[i];
}
Thanks for your kindly review and reply!
Here's an implementation of memset_pattern4() that you might find useful. It's nothing like Darwin's SSE assembly language version, except that it has the same interface.
#include <string.h>
#include <stdint.h>
/*
A portable implementation of the Darwin memset_patternX() family of functions:
These are analogous to memset(), except that they fill memory with a replicated
pattern either 4, 8, or 16 bytes long. b points to a buffer of size len bytes
which is to be filled. The second parameter points to the pattern. If the
buffer length is not an even multiple of the pattern length, the last instance
of the pattern will be truncated. Neither the buffer nor the pattern pointer
need be aligned.
*/
/*
alignment utility macros stolen from Linux
see https://lkml.org/lkml/2006/11/25/2 for a discussion of why typeof() is used
*/
#if !_MSC_VER
#define __ALIGN_KERNEL(x, a) __ALIGN_KERNEL_MASK(x, (typeof(x))(a) - 1)
#define __ALIGN_KERNEL_MASK(x, mask) (((x) + (mask)) & ~(mask))
#define ALIGN(x, a) __ALIGN_KERNEL((x), (a))
#define __ALIGN_MASK(x, mask) __ALIGN_KERNEL_MASK((x), (mask))
#define PTR_ALIGN(p, a) ((typeof(p))ALIGN((uintptr_t)(p), (a)))
#define IS_ALIGNED(x, a) (((x) & ((typeof(x))(a) - 1)) == 0)
#define IS_PTR_ALIGNED(p, a) (IS_ALIGNED((uintptr_t)(p), (a)))
#else
/* MS friendly versions */
/* taken from the DDK's fltKernel.h header */
#define IS_ALIGNED(_pointer, _alignment) \
((((uintptr_t) (_pointer)) & ((_alignment) - 1)) == 0)
#define ROUND_TO_SIZE(_length, _alignment) \
((((uintptr_t)(_length)) + ((_alignment)-1)) & ~(uintptr_t) ((_alignment) - 1))
#define __ALIGN_KERNEL(x, a) ROUND_TO_SIZE( (x), (a))
#define ALIGN(x, a) __ALIGN_KERNEL((x), (a))
#define PTR_ALIGN(p, a) ALIGN((p), (a))
#define IS_PTR_ALIGNED(p, a) (IS_ALIGNED((uintptr_t)(p), (a)))
#endif
void nx_memset_pattern4(void *b, const void *pattern4, size_t len)
{
enum { pattern_len = 4 };
unsigned char* dst = (unsigned char*) b;
unsigned const char* src = (unsigned const char*) pattern4;
if (IS_PTR_ALIGNED( dst, pattern_len) && IS_PTR_ALIGNED( src, pattern_len)) {
/* handle aligned moves */
uint32_t val = *((uint32_t*)src);
uint32_t* word_dst = (uint32_t*) dst;
size_t word_count = len / pattern_len;
dst += word_count * pattern_len;
len -= word_count * pattern_len;
for (; word_count != 0; --word_count) {
*word_dst++ = val;
}
}
else {
while (pattern_len <= len) {
memcpy(dst, src, pattern_len);
dst += pattern_len;
len -= pattern_len;
}
}
memcpy( dst, src, len);
}
I have a list of N 64-bit integers whose bits represent small sets. Each integer has at most k bits set to 1. Given a bit mask, I would like to find the first element in the list that matches the mask, i.e. element & mask == element.
Example:
If my list is:
index abcdef
0 001100
1 001010
2 001000
3 000100
4 000010
5 000001
6 010000
7 100000
8 000000
and my mask is 111000, the first element matching the mask is at index 2.
Method 1:
Linear search through the entire list. This takes O(N) time and O(1) space.
Method 2:
Precompute a tree of all possible masks, and at each node keep the answer for that mask. This takes O(1) time for the query, but takes O(2^64) space.
Question:
How can I find the first element matching the mask faster than O(N), while still using a reasonable amount of space? I can afford to spend polynomial time in precomputation, because there will be a lot of queries. The key is that k is small. In my application, k <= 5 and N is in the thousands. The mask has many 1s; you can assume that it is drawn uniformly from the space of 64-bit integers.
Update:
Here is an example data set and a simple benchmark program that runs on Linux: http://up.thirld.com/binmask.tar.gz. For large.in, N=3779 and k=3. The first line is N, followed by N unsigned 64-bit ints representing the elements. Compile with make. Run with ./benchmark.e >large.out to create the true output, which you can then diff against. (Masks are generated randomly, but the random seed is fixed.) Then replace the find_first() function with your implementation.
The simple linear search is much faster than I expected. This is because k is small, and so for a random mask, a match is found very quickly on average.
A suffix tree (on bits) will do the trick, with the original priority at the leaf nodes:
000000 -> 8
1 -> 5
10 -> 4
100 -> 3
1000 -> 2
10 -> 1
100 -> 0
10000 -> 6
100000 -> 7
where if the bit is set in the mask, you search both arms, and if not, you search only the 0 arm; your answer is the minimum number you encounter at a leaf node.
You can improve this (marginally) by traversing the bits not in order but by maximum discriminability; in your example, note that 3 elements have bit 2 set, so you would create
2:0 0:0 1:0 3:0 4:0 5:0 -> 8
5:1 -> 5
4:1 5:0 -> 4
3:1 4:0 5:0 -> 3
1:1 3:0 4:0 5:0 -> 6
0:1 1:0 3:0 4:0 5:0 -> 7
2:1 0:0 1:0 3:0 4:0 5:0 -> 2
4:1 5:0 -> 1
3:1 4:0 5:0 -> 0
In your example mask this doesn't help (since you have to traverse both the bit2==0 and bit2==1 sides since your mask is set in bit 2), but on average it will improve the results (but at a cost of setup and more complex data structure). If some bits are much more likely to be set than others, this could be a huge win. If they're pretty close to random within the element list, then this doesn't help at all.
If you're stuck with essentially random bits set, you should get about (1-5/64)^32 benefit from the suffix tree approach on average (13x speedup), which might be better than the difference in efficiency due to using more complex operations (but don't count on it--bit masks are fast). If you have a nonrandom distribution of bits in your list, then you could do almost arbitrarily well.
This is the bitwise Kd-tree. It typically needs less than 64 visits per lookup operation. Currently, the selection of the bit (dimension) to pivot on is random.
#include <limits.h>
#include <time.h>
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
typedef unsigned long long Thing;
typedef unsigned long Number;
unsigned thing_ffs(Thing mask);
Thing rand_mask(unsigned bitcnt);
#define WANT_RANDOM 31
#define WANT_BITS 3
#define BITSPERTHING (CHAR_BIT*sizeof(Thing))
#define NONUMBER ((Number)-1)
struct node {
Thing value;
Number num;
Number nul;
Number one;
char pivot;
} *nodes = NULL;
unsigned nodecount=0;
unsigned itercount=0;
struct node * nodes_read( unsigned *sizp, char *filename);
Number *find_ptr_to_insert(Number *ptr, Thing value, Thing mask);
unsigned grab_matches(Number *result, Number num, Thing mask);
void initialise_stuff(void);
int main (int argc, char **argv)
{
Thing mask;
Number num;
unsigned idx;
srand (time(NULL));
nodes = nodes_read( &nodecount, argv[1]);
fprintf( stdout, "Nodecount=%u\n", nodecount );
initialise_stuff();
#if WANT_RANDOM
mask = nodes[nodecount/2].value | nodes[nodecount/3].value ;
#else
mask = 0x38;
#endif
fprintf( stdout, "\n#### Search mask=%llx\n", (unsigned long long) mask );
itercount = 0;
num = NONUMBER;
idx = grab_matches(&num,0, mask);
fprintf( stdout, "Itercount=%u\n", itercount );
fprintf(stdout, "KdTree search %16llx\n", (unsigned long long) mask );
fprintf(stdout, "Count=%u Result:\n", idx);
idx = num;
if (idx >= nodecount) idx = nodecount-1;
fprintf( stdout, "num=%4u Value=%16llx\n"
,(unsigned) nodes[idx].num
,(unsigned long long) nodes[idx].value
);
fprintf( stdout, "\nLinear search %16llx\n", (unsigned long long) mask );
for (idx = 0; idx < nodecount; idx++) {
if ((nodes[idx].value & mask) == nodes[idx].value) break;
}
fprintf(stdout, "Cnt=%u\n", idx);
if (idx >= nodecount) idx = nodecount-1;
fprintf(stdout, "Num=%4u Value=%16llx\n"
, (unsigned) nodes[idx].num
, (unsigned long long) nodes[idx].value );
return 0;
}
void initialise_stuff(void)
{
unsigned num;
Number root, *ptr;
root = 0;
for (num=0; num < nodecount; num++) {
nodes[num].num = num;
nodes[num].one = NONUMBER;
nodes[num].nul = NONUMBER;
nodes[num].pivot = -1;
}
nodes[num-1].value = 0; /* last node is guaranteed to match anything */
root = 0;
for (num=1; num < nodecount; num++) {
ptr = find_ptr_to_insert (&root, nodes[num].value, 0ull );
if (*ptr == NONUMBER) *ptr = num;
else fprintf(stderr, "Found %u for %u\n"
, (unsigned)*ptr, (unsigned) num );
}
}
Thing rand_mask(unsigned bitcnt)
{struct node * nodes_read( unsigned *sizp, char *filename)
{
struct node *ptr;
unsigned size,used;
FILE *fp;
if (!filename) {
size = (WANT_RANDOM+0) ? WANT_RANDOM : 9;
ptr = malloc (size * sizeof *ptr);
#if (!WANT_RANDOM)
ptr[0].value = 0x0c;
ptr[1].value = 0x0a;
ptr[2].value = 0x08;
ptr[3].value = 0x04;
ptr[4].value = 0x02;
ptr[5].value = 0x01;
ptr[6].value = 0x10;
ptr[7].value = 0x20;
ptr[8].value = 0x00;
#else
for (used=0; used < size; used++) {
ptr[used].value = rand_mask(WANT_BITS);
}
#endif /* WANT_RANDOM */
*sizp = size;
return ptr;
}
fp = fopen( filename, "r" );
if (!fp) return NULL;
fscanf(fp,"%u\n", &size );
fprintf(stderr, "Size=%u\n", size);
ptr = malloc (size * sizeof *ptr);
for (used = 0; used < size; used++) {
fscanf(fp,"%llu\n", &ptr[used].value );
}
fclose( fp );
*sizp = used;
return ptr;
}
Thing value = 0;
unsigned bit, cnt;
for (cnt=0; cnt < bitcnt; cnt++) {
bit = 54321*rand();
bit %= BITSPERTHING;
value |= 1ull << bit;
}
return value;
}
Number *find_ptr_to_insert(Number *ptr, Thing value, Thing done)
{
Number num=NONUMBER;
while ( *ptr != NONUMBER) {
Thing wrong;
num = *ptr;
wrong = (nodes[num].value ^ value) & ~done;
if (nodes[num].pivot < 0) { /* This node is terminal */
/* choose one of the wrong bits for a pivot .
** For this bit (nodevalue==1 && searchmask==0 )
*/
if (!wrong) wrong = ~done ;
nodes[num].pivot = thing_ffs( wrong );
}
ptr = (wrong & 1ull << nodes[num].pivot) ? &nodes[num].nul : &nodes[num].one;
/* Once this bit has been tested, it can be masked off. */
done |= 1ull << nodes[num].pivot ;
}
return ptr;
}
unsigned grab_matches(Number *result, Number num, Thing mask)
{
Thing wrong;
unsigned count;
for (count=0; num < *result; ) {
itercount++;
wrong = nodes[num].value & ~mask;
if (!wrong) { /* we have a match */
if (num < *result) { *result = num; count++; }
/* This is cheap pruning: the break will omit both subtrees from the results.
** But because we already have a result, and the subtrees have higher numbers
** than our current num, we can ignore them. */
break;
}
if (nodes[num].pivot < 0) { /* This node is terminal */
break;
}
if (mask & 1ull << nodes[num].pivot) {
/* avoid recursion if there is only one non-empty subtree */
if (nodes[num].nul >= *result) { num = nodes[num].one; continue; }
if (nodes[num].one >= *result) { num = nodes[num].nul; continue; }
count += grab_matches(result, nodes[num].nul, mask);
count += grab_matches(result, nodes[num].one, mask);
break;
}
mask |= 1ull << nodes[num].pivot;
num = (wrong & 1ull << nodes[num].pivot) ? nodes[num].nul : nodes[num].one;
}
return count;
}
unsigned thing_ffs(Thing mask)
{
unsigned bit;
#if 1
if (!mask) return (unsigned)-1;
for ( bit=random() % BITSPERTHING; 1 ; bit += 5, bit %= BITSPERTHING) {
if (mask & 1ull << bit ) return bit;
}
#elif 0
for (bit =0; bit < BITSPERTHING; bit++ ) {
if (mask & 1ull <<bit) return bit;
}
#else
mask &= (mask-1); // Kernighan-trick
for (bit =0; bit < BITSPERTHING; bit++ ) {
mask >>=1;
if (!mask) return bit;
}
#endif
return 0xffffffff;
}
struct node * nodes_read( unsigned *sizp, char *filename)
{
struct node *ptr;
unsigned size,used;
FILE *fp;
if (!filename) {
size = (WANT_RANDOM+0) ? WANT_RANDOM : 9;
ptr = malloc (size * sizeof *ptr);
#if (!WANT_RANDOM)
ptr[0].value = 0x0c;
ptr[1].value = 0x0a;
ptr[2].value = 0x08;
ptr[3].value = 0x04;
ptr[4].value = 0x02;
ptr[5].value = 0x01;
ptr[6].value = 0x10;
ptr[7].value = 0x20;
ptr[8].value = 0x00;
#else
for (used=0; used < size; used++) {
ptr[used].value = rand_mask(WANT_BITS);
}
#endif /* WANT_RANDOM */
*sizp = size;
return ptr;
}
fp = fopen( filename, "r" );
if (!fp) return NULL;
fscanf(fp,"%u\n", &size );
fprintf(stderr, "Size=%u\n", size);
ptr = malloc (size * sizeof *ptr);
for (used = 0; used < size; used++) {
fscanf(fp,"%llu\n", &ptr[used].value );
}
fclose( fp );
*sizp = used;
return ptr;
}
UPDATE:
I experimented a bit with the pivot-selection, favouring bits with the highest discriminatory value ("information content"). This involves:
making a histogram of the usage of bits (can be done while initialising)
while building the tree: choosing the one with frequency closest to 1/2 in the remaining subtrees.
The result: the random pivot selection performed better.
Construct a a binary tree as follows:
Every level corresponds to a bit
It corresponding bit is on go right, otherwise left
This way insert every number in the database.
Now, for searching: if the corresponding bit in the mask is 1, traverse both children. If it is 0, traverse only the left node. Essentially keep traversing the tree until you hit the leaf node (BTW, 0 is a hit for every mask!).
This tree will have O(N) space requirements.
Eg of tree for 1 (001), 2(010) and 5 (101)
root
/ \
0 1
/ \ |
0 1 0
| | |
1 0 1
(1) (2) (5)
With precomputed bitmasks. Formally is is still O(N), since the and-mask operations are O(N). The final pass is also O(N), because it needs to find the lowest bit set, but that could be sped up, too.
#include <limits.h>
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
/* For demonstration purposes.
** In reality, this should be an unsigned long long */
typedef unsigned char Thing;
#define BITSPERTHING (CHAR_BIT*sizeof (Thing))
#define COUNTOF(a) (sizeof a / sizeof a[0])
Thing data[] =
/****** index abcdef */
{ 0x0c /* 0 001100 */
, 0x0a /* 1 001010 */
, 0x08 /* 2 001000 */
, 0x04 /* 3 000100 */
, 0x02 /* 4 000010 */
, 0x01 /* 5 000001 */
, 0x10 /* 6 010000 */
, 0x20 /* 7 100000 */
, 0x00 /* 8 000000 */
};
/* Note: this is for demonstration purposes.
** Normally, one should choose a machine wide unsigned int
** for bitmask arrays.
*/
struct bitmap {
char data[ 1+COUNTOF (data)/ CHAR_BIT ];
} nulmaps [ BITSPERTHING ];
#define BITSET(a,i) (a)[(i) / CHAR_BIT ] |= (1u << ((i)%CHAR_BIT) )
#define BITTEST(a,i) ((a)[(i) / CHAR_BIT ] & (1u << ((i)%CHAR_BIT) ))
void init_tabs(void);
void map_empty(struct bitmap *dst);
void map_full(struct bitmap *dst);
void map_and2(struct bitmap *dst, struct bitmap *src);
int main (void)
{
Thing mask;
struct bitmap result;
unsigned ibit;
mask = 0x38;
init_tabs();
map_full(&result);
for (ibit = 0; ibit < BITSPERTHING; ibit++) {
/* bit in mask is 1, so bit at this position is in fact a don't care */
if (mask & (1u <<ibit)) continue;
/* bit in mask is 0, so we can only select items with a 0 at this bitpos */
map_and2(&result, &nulmaps[ibit] );
}
/* This is not the fastest way to find the lowest 1 bit */
for (ibit = 0; ibit < COUNTOF (data); ibit++) {
if (!BITTEST(result.data, ibit) ) continue;
fprintf(stdout, " %u", ibit);
}
fprintf( stdout, "\n" );
return 0;
}
void init_tabs(void)
{
unsigned ibit, ithing;
/* 1 bits in data that dont overlap with 1 bits in the searchmask are showstoppers.
** So, for each bitpos, we precompute a bitmask of all *entrynumbers* from data[], that contain 0 in bitpos.
*/
memset(nulmaps, 0 , sizeof nulmaps);
for (ithing=0; ithing < COUNTOF(data); ithing++) {
for (ibit=0; ibit < BITSPERTHING; ibit++) {
if ( data[ithing] & (1u << ibit) ) continue;
BITSET(nulmaps[ibit].data, ithing);
}
}
}
/* Logical And of two bitmask arrays; simular to dst &= src */
void map_and2(struct bitmap *dst, struct bitmap *src)
{
unsigned idx;
for (idx = 0; idx < COUNTOF(dst->data); idx++) {
dst->data[idx] &= src->data[idx] ;
}
}
void map_empty(struct bitmap *dst)
{
memset(dst->data, 0 , sizeof dst->data);
}
void map_full(struct bitmap *dst)
{
unsigned idx;
/* NOTE this loop sets too many bits to the left of COUNTOF(data) */
for (idx = 0; idx < COUNTOF(dst->data); idx++) {
dst->data[idx] = ~0;
}
}