Is there an efficient implementation of a hash table, which maps key (integer) to values (string) and vice versa, for some compiled language?
Of course, one could always have two tables, one for key=>value mapping and other for value=>key. However that would not be very efficient, at least not memory-wise. Possibly both mappings can be in a single table, if the type system and intended usage allow it.
One name for this is a BiMap (as in bidirectional). The obvious limitation is that keys must be distinct (as in a normal dictionary/map), and so must the values.
For Java, there's a StackOverflow question on it, but the general recommendation is the Guava BiMap.
For C++, Boost has a Bimap.
Internally, it's the "inefficient" implementation you mention where it keeps two hashtables. Here's the thing: it is efficient, and using twice as much memory for a secondary lookup structure is expected, and rarely a big deal.
This is the data structure I use for a bihash:
The overhead is four ints (for the indexes) per entry.
In the example below, with typedef unsigned char Index; , the overhead will be four characters, and the maximal table capacity will be 255.
#include <stdio.h>
#include <stdlib.h>
/* For demonstration purposes these types are VERY SMALL.
** Normally one would use unsigned short, or unsigned int.
*/
typedef unsigned char Index;
typedef unsigned short Hashval;
/* Use the maximal representable value as sentinel value */
#define NIL ((Index)-1)
struct entry {
Index head_key /* The head-of-the-chain pointers */
, head_val; /* ... and for the values */
Index next_key /* linked list for chaining the keys */
, next_val; /* ... and for the values */
};
struct table {
unsigned totlen, keylen;
/* free points to the head of the freelist */
Index size, free;
/* The complete payload, for both keys and values.
* layout = [key0|val0|key1|val1|..] (without padding/alignment)
*/
char *data;
struct entry *entries; /* All the entries. Not pointers, but the actual entries. */
};
/* Macros for accessing the pool of payload */
#define NODE_KEY(p,n) ((p)->data + (n) * (p)->totlen)
#define NODE_VAL(p,n) ((p)->data + (n) * (p)->totlen + (p)->keylen)
#define TH_OK 0
#define TH_KEY_NOT_FOUND 1
#define TH_VAL_NOT_FOUND 2
#define TH_BOTH_NOT_FOUND 3
#define TH_TABLE_FULL 4
#define TH_KEY_DUPLICATE 5
#define TH_VAL_DUPLICATE 6
#define TH_BOTH_DUPLICATE 7
#define TH_TOTAL_ECLIPSE 8
/********************************************/
/* Allocate and initialise the hash table.
** Note: given fixed size, the table and the payload could be statically allocated,
** (but we'd still need to do the initialisation)
*/
struct table * table_new( unsigned keylen, unsigned vallen, unsigned totcount )
{
Index idx;
struct table *this;
if (totcount > NIL) {
fprintf(stderr, "Table_new(%zu,%zu,%zu): totcount(%zu) larger than largest Index(%zu)\n"
, (size_t) keylen, (size_t) vallen, (size_t) totcount
, (size_t) totcount, ((size_t)NIL) -1 );
return NULL;
}
this = malloc (sizeof *this);
this->size = totcount;
this->keylen = keylen;
this->totlen = keylen+vallen;
this->data = malloc (totcount * this->totlen );
this->entries = malloc (totcount * sizeof *this->entries );
this->free = 0; /* start of freelist */
for( idx=0; idx < this->size; idx++ ) {
this->entries[idx].head_key = NIL;
this->entries[idx].head_val = NIL;
this->entries[idx].next_key = NIL;
this->entries[idx].next_val = idx+1; /* unused next pointer reused as freelist */
};
this-> entries[idx-1].next_val = NIL; /* end of freelist */
fprintf(stderr, "Table_new(%zu,%zu,%zu) size = %zu+%zu+%zu\n"
, (size_t) keylen, (size_t) vallen, (size_t) totcount
, sizeof *this, (size_t)totcount * this->totlen, totcount * sizeof *this->entries
);
return this;
}
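The payload layout described in the struct comment, [key0|val0|key1|val1|..], can be addressed as in the following minimal sketch. The names payload_key and payload_val are hypothetical helpers, not part of the code above; they assume fixed-size keys and values stored back to back without padding.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helpers: locate record n in a packed [key|val] pool.
 * keylen and vallen are the fixed sizes, in bytes, of one key and one value. */
static char *payload_key(char *data, size_t keylen, size_t vallen, size_t n)
{
    assert(data != NULL);
    return data + n * (keylen + vallen);   /* start of record n */
}

static char *payload_val(char *data, size_t keylen, size_t vallen, size_t n)
{
    /* the value of record n follows directly after its key */
    return payload_key(data, keylen, vallen, n) + keylen;
}
```

With 4-byte keys and 8-byte values, record 2 starts at offset 24 and its value at offset 28.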
I noticed that on Windows every time I issue an unbuffered fread() request with an odd length, it's split into 2 requests (as observed through procmon):
a) fread for my requested length-1
b) 2-byte fread for the last byte
This has an obvious performance overhead: two kernel requests instead of one.
Sample code ran on Windows 10:
#include <iostream>
#include <stdio.h>
#include <stdlib.h>
int main(int argc, char* argv[]) {
FILE* pFile;
char* buffer;
pFile = fopen(argv[0], "rb");
setbuf(pFile, nullptr);
size_t len = 3;
buffer = (char*)malloc(sizeof(char)*len);
if (len != fread(buffer, 1, len, pFile)) { fputs("Reading error", stderr); exit(3); }
free(buffer);
fclose(pFile);
return 0;
}
This results in the following procmon reported calls:
ReadFile c:\work\cpptry\Debug\cpptry.exe SUCCESS Offset: 0, Length: 2, Priority: Normal
ReadFile c:\work\cpptry\Debug\cpptry.exe SUCCESS Offset: 2, Length: 2
It seems as if Windows is incapable of issuing odd-sized requests to the file system.
What's up with that?
This is an implementation artifact.
The MS CRT keeps every FILE buffered even if you tell it not to. In that case the stream's buffer is set to an internal buffer with room for two bytes. This keeps a single code path instead of two and simplifies the fast path in fgetc and fputc.
#define fgetc(_stream) (--(_stream)->_cnt >= 0 ? 0xff & *(_stream)->_ptr++ : _filbuf(_stream))
Some of you are probably bothered by the size of the buffer (2 bytes when quasi-unbuffered), but the _fread_nolock_s function contains an optimization which tries to read multiples of the buffer size directly into the destination, bypassing the file buffer.
See fread.c in CRT sources:
/* calc chars to read -- (count/streambufsize) * streambufsize */
nbytes = (unsigned)(count - count % streambufsize);
...
nread = _read_nolock(_fileno(stream), data, nbytes);
Because the file buffer's size is 2, an even number of bytes is read directly into the destination, and a possible remaining byte goes through the file buffer. Sometimes there are bytes already in the buffer that need to be transferred to the destination before the optimized read can take place.
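The arithmetic behind the split can be reproduced in isolation. The sketch below is not CRT code: split_read is a hypothetical helper that only mirrors the count - count % streambufsize calculation quoted above, assuming the stream buffer is empty beforehand.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper mirroring the quoted calculation: how many bytes are
 * read straight into the caller's buffer, and how many are left over to be
 * served through the stream buffer (assumed empty beforehand). */
static void split_read(size_t count, size_t bufsize,
                       size_t *direct, size_t *leftover)
{
    assert(bufsize != 0);
    *direct   = count - count % bufsize;  /* largest multiple of bufsize */
    *leftover = count - *direct;          /* served via a buffer refill */
}
```

For count = 3 and bufsize = 2 this gives direct = 2 and leftover = 1. The leftover byte is served by refilling the 2-byte buffer, which is consistent with the two 2-byte ReadFile calls in the procmon trace.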
Bonus: buffer size is always forced to multiple of 2.
See setvbuf.c:
/*
* force size to be even by masking down to the nearest multiple
* of 2
*/
size &= (size_t)~1;
...
/*
* CASE 1: No Buffering.
*/
if (type & _IONBF) {
stream->_flag |= _IONBF;
buffer = (char *)&(stream->_charbuf);
size = 2;
}
Code snippets above are from VC 2013 CRT.
For comparison, snippets from Universal CRT 10.0.17134:
read.cpp
unsigned const bytes_to_read = stream_buffer_size != 0
? static_cast<unsigned>(maximum_bytes_to_read - maximum_bytes_to_read % stream_buffer_size)
: maximum_bytes_to_read;
...
int const bytes_read = _read_nolock(_fileno(stream.public_stream()), data, bytes_to_read);
setvbuf.cpp
// Force the buffer size to be even by masking the low order bit:
size_t const usable_buffer_size = buffer_size_in_bytes & ~static_cast<size_t>(1);
...
// Case 1: No buffering:
if (type & _IONBF)
{
return set_buffer(stream, reinterpret_cast<char*>(&stream->_charbuf), 2, _IOBUFFER_NONE);
}
And snippets from VC 6.0 (1998)
read.c
/* calc chars to read -- (count/bufsize) * bufsize */
nbytes = ( bufsize ? (count - count % bufsize) : count );
nread = _read(_fileno(stream), data, nbytes);
setvbuf.c
/*
* force size to be even by masking down to the nearest multiple
* of 2
*/
size &= (size_t)~1;
...
/*
* CASE 1: No Buffering.
*/
if (type & _IONBF) {
stream->_flag |= _IONBF;
buffer = (char *)&(stream->_charbuf);
size = 2;
}
typedef struct
{
long nIndex; // object index
TCHAR path[3 * MAX_TEXT_FIELD_SIZE];
}structItems;
void method1(LPCTSTR pInput, LPTSTR pOutput, size_t iSizeOfOutput)
{
size_t iLength = 0;
iLength = _tcslen(pInput);
if (iLength > iSizeOfOutput + sizeof(TCHAR))
iLength = iSizeOfOutput - sizeof(TCHAR);
memset(pOutput, 0, iSizeOfOutput); // Access violation error
}
void main()
{
CString csSysPath = _T("fghjjjjjjjjjjjjjjjj");
structItems *pIndexSyspath = nullptr;
pIndexSyspath = (structItems *)calloc(1, sizeof(structItems) * 15555555); //If i put size as 1555555 then it works well
method1(csSysPath, pIndexSyspath[0].path, (sizeof(TCHAR) * (3 * MAX_TEXT_FIELD_SIZE)));
}
This is sample code which causes the crash.
In the above code, if we make the size 1555555, it works well (I arbitrarily dropped one digit from the size).
This is a 32-bit application running on a 64-bit Windows OS with 16 GB of RAM.
I kindly request someone to help me understand the reason for the failure and the relation between calloc, the size, and memset.
typedef struct
{
long nIndex; // 4 bytes on Windows
TCHAR path[3 * MAX_TEXT_FIELD_SIZE]; // 1 * 3 * 255 bytes for non-unicode
} structItems;
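Supposing a non-Unicode build, TCHAR is 1 byte and MAX_TEXT_FIELD_SIZE is 255, so sizeof(structItems) is at least 255*3 + 4 = 769 bytes (the compiler may pad it further for alignment). You ask for sizeof(structItems) * 15555555 bytes, which is more than 11 GiB; that product does not even fit in a 32-bit size_t. It cannot fit into the 2 GiB of address space available to a 32-bit process, so calloc fails and returns NULL, and method1 then calls memset on an address derived from that null pointer, which raises the access violation. With 1555555 elements the request is about 1.1 GiB, which can still succeed.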
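A defensive pattern for this situation is to check both the requested size and the pointer returned by calloc before writing through it. The sketch below is one way to do that in plain C; alloc_items and clear_buffer are hypothetical helper names, not part of the code in the question.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical wrapper: allocate count zeroed items of item_size bytes.
 * Returns NULL if count * item_size would overflow size_t or if the
 * allocation fails; the caller must check before touching the memory. */
static void *alloc_items(size_t count, size_t item_size)
{
    if (item_size != 0 && count > SIZE_MAX / item_size)
        return NULL;                   /* product does not fit in size_t */
    return calloc(count, item_size);   /* may still return NULL */
}

/* Hypothetical helper: fail loudly in debug builds instead of writing
 * through a null pointer. */
static void clear_buffer(void *out, size_t size)
{
    assert(out != NULL);
    memset(out, 0, size);
}
```

Passing the element count and the element size separately, instead of multiplying them first, keeps the overflow check in one place.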
How do I initialize a device array that was allocated using cudaMalloc()?
I tried cudaMemset, but it fails to initialize the array to any value except 0. The cudaMemset call looks like the one below, where value is initialized to 5.
cudaMemset(devPtr,value,number_bytes)
As you are discovering, cudaMemset works like the C standard library memset. Quoting from the documentation:
cudaError_t cudaMemset ( void * devPtr,
int value,
size_t count
)
Fills the first count bytes of the memory area pointed to by devPtr
with the constant byte value value.
So value is a byte value. If you do something like:
int *devPtr;
cudaMalloc((void **)&devPtr,number_bytes);
const int value = 5;
cudaMemset(devPtr,value,number_bytes);
what you are asking to happen is that each byte of devPtr will be set to 5. If devPtr is an array of integers, each 32-bit integer word will end up with the value 84215045 (0x05050505). This is probably not what you had in mind.
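The byte-wise behaviour can be checked on the host with the standard memset, which the answer says cudaMemset mirrors. This is a minimal host-side sketch in plain C; byte_fill_u32 is a hypothetical helper name.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Fill one 32-bit word the way memset does: every byte gets the same value. */
static uint32_t byte_fill_u32(int value)
{
    uint32_t word;
    assert(value >= 0 && value <= 255);  /* memset only uses the low byte */
    memset(&word, value, sizeof word);
    return word;
}
```

byte_fill_u32(5) yields 0x05050505, which is 84215045 in decimal. Because all four bytes are identical, the result does not depend on endianness.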
Using the runtime API, what you could do is write your own generic kernel to do this. It could be as simple as
template<typename T>
__global__ void initKernel(T * devPtr, const T val, const size_t nwords)
{
int tidx = threadIdx.x + blockDim.x * blockIdx.x;
int stride = blockDim.x * gridDim.x;
for(; tidx < nwords; tidx += stride)
devPtr[tidx] = val;
}
(standard disclaimer: written in browser, never compiled, never tested, use at own risk).
Just instantiate the template for the types you need and call it with a suitable grid and block size, paying attention to the last argument now being a word count, not a byte count as in cudaMemset. This isn't really any different from what cudaMemset does anyway; using that API call results in a kernel launch which is not too different from what I posted above.
Alternatively, if you can use the driver API, there are cuMemsetD16 and cuMemsetD32, which do the same thing, but for 16-bit and 32-bit word types. If you need to set 64-bit or larger types (so doubles or vector types), your best option is to use your own kernel.
I also needed a solution to this question and I didn't really understand the other proposed solution. Particularly I didn't understand why it iterates over the grid blocks for(; tidx < nwords; tidx += stride) and for that matter, the kernel invocation and why using the counter-intuitive word sizes.
Therefore I created a much simpler monolithic generic kernel and customized it with strides i.e. you may use it to initialize a matrix in multiple ways e.g. set rows or columns to any value:
template <typename T>
__global__ void kernelInitializeArray(T* __restrict__ a, const T value,
const size_t n, const size_t incx) {
int tid = threadIdx.x + blockDim.x * blockIdx.x;
if (tid*incx < n) {
a[tid*incx] = value;
}
}
Then you may invoke the kernel like this:
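/* BLOCK_SIZE (threads per block) is assumed to be defined elsewhere, e.g. 256. */
template <typename T>
void deviceInitializeArray(T* a, const T value, const size_t n, const size_t incx) {
/* one thread per written element: ceil(n / incx) threads */
size_t number_of_threads = (n + incx - 1) / incx;
int number_of_blocks = (int)((number_of_threads + BLOCK_SIZE - 1) / BLOCK_SIZE);
dim3 gridDim(number_of_blocks, 1);
dim3 blockDim(BLOCK_SIZE, 1);
kernelInitializeArray<T> <<<gridDim, blockDim>>>(a, value, n, incx);
}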
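The launch configuration has to cover every thread index tid with tid*incx < n, which takes ceil(n / incx) threads. A plain-C sketch of that host-side arithmetic follows; strided_count and blocks_needed are hypothetical helper names, and no CUDA call is involved.

```c
#include <assert.h>
#include <stddef.h>

/* Number of array elements touched when writing indexes 0, incx, 2*incx, ...
 * below n, i.e. ceil(n / incx). */
static size_t strided_count(size_t n, size_t incx)
{
    assert(incx != 0);
    return (n + incx - 1) / incx;
}

/* Number of thread blocks needed so that blocks * block_size >= threads. */
static size_t blocks_needed(size_t threads, size_t block_size)
{
    assert(block_size != 0);
    return (threads + block_size - 1) / block_size;
}
```

For n = 5 and incx = 2 the indexes 0, 2 and 4 are written, so three threads are needed. A result of 0 blocks means there is nothing to launch.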
I have a list of N 64-bit integers whose bits represent small sets. Each integer has at most k bits set to 1. Given a bit mask, I would like to find the first element in the list that matches the mask, i.e. element & mask == element.
Example:
If my list is:
index abcdef
0 001100
1 001010
2 001000
3 000100
4 000010
5 000001
6 010000
7 100000
8 000000
and my mask is 111000, the first element matching the mask is at index 2.
Method 1:
Linear search through the entire list. This takes O(N) time and O(1) space.
Method 2:
Precompute a tree of all possible masks, and at each node keep the answer for that mask. This takes O(1) time for the query, but takes O(2^64) space.
Question:
How can I find the first element matching the mask faster than O(N), while still using a reasonable amount of space? I can afford to spend polynomial time in precomputation, because there will be a lot of queries. The key is that k is small. In my application, k <= 5 and N is in the thousands. The mask has many 1s; you can assume that it is drawn uniformly from the space of 64-bit integers.
Update:
Here is an example data set and a simple benchmark program that runs on Linux: http://up.thirld.com/binmask.tar.gz. For large.in, N=3779 and k=3. The first line is N, followed by N unsigned 64-bit ints representing the elements. Compile with make. Run with ./benchmark.e >large.out to create the true output, which you can then diff against. (Masks are generated randomly, but the random seed is fixed.) Then replace the find_first() function with your implementation.
The simple linear search is much faster than I expected. This is because k is small, and so for a random mask, a match is found very quickly on average.
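For reference, the linear-search baseline (Method 1) can be sketched as below. The function name find_first_linear and the NOT_FOUND sentinel are assumptions made for this sketch; the signature of find_first() in the benchmark archive may differ.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NOT_FOUND ((size_t)-1)

/* Return the index of the first element that has no bit set outside mask,
 * or NOT_FOUND if there is none. */
static size_t find_first_linear(const uint64_t *list, size_t n, uint64_t mask)
{
    assert(list != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if ((list[i] & mask) == list[i])
            return i;
    }
    return NOT_FOUND;
}
```

On the nine-element example list with mask 111000 (0x38) this returns index 2.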
A suffix tree (on bits) will do the trick, with the original priority at the leaf nodes:
000000 -> 8
     1 -> 5
    10 -> 4
   100 -> 3
  1000 -> 2
    10 -> 1
   100 -> 0
 10000 -> 6
100000 -> 7
where if the bit is set in the mask, you search both arms, and if not, you search only the 0 arm; your answer is the minimum number you encounter at a leaf node.
You can improve this (marginally) by traversing the bits not in order but by maximum discriminability; in your example, note that 3 elements have bit 2 set, so you would create
2:0 0:0 1:0 3:0 4:0 5:0 -> 8
                    5:1 -> 5
                4:1 5:0 -> 4
            3:1 4:0 5:0 -> 3
        1:1 3:0 4:0 5:0 -> 6
    0:1 1:0 3:0 4:0 5:0 -> 7
2:1 0:0 1:0 3:0 4:0 5:0 -> 2
                4:1 5:0 -> 1
            3:1 4:0 5:0 -> 0
In your example mask this doesn't help (since you have to traverse both the bit2==0 and bit2==1 sides since your mask is set in bit 2), but on average it will improve the results (but at a cost of setup and more complex data structure). If some bits are much more likely to be set than others, this could be a huge win. If they're pretty close to random within the element list, then this doesn't help at all.
If you're stuck with essentially random bits set, you should get about (1-5/64)^32 benefit from the suffix tree approach on average (13x speedup), which might be better than the difference in efficiency due to using more complex operations (but don't count on it--bit masks are fast). If you have a nonrandom distribution of bits in your list, then you could do almost arbitrarily well.
This is the bitwise Kd-tree. It typically needs less than 64 visits per lookup operation. Currently, the selection of the bit (dimension) to pivot on is random.
#include <limits.h>
#include <time.h>
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
typedef unsigned long long Thing;
typedef unsigned long Number;
unsigned thing_ffs(Thing mask);
Thing rand_mask(unsigned bitcnt);
#define WANT_RANDOM 31
#define WANT_BITS 3
#define BITSPERTHING (CHAR_BIT*sizeof(Thing))
#define NONUMBER ((Number)-1)
struct node {
Thing value;
Number num;
Number nul;
Number one;
signed char pivot; /* -1 marks a terminal node, so the type must be signed */
} *nodes = NULL;
unsigned nodecount=0;
unsigned itercount=0;
struct node * nodes_read( unsigned *sizp, char *filename);
Number *find_ptr_to_insert(Number *ptr, Thing value, Thing mask);
unsigned grab_matches(Number *result, Number num, Thing mask);
void initialise_stuff(void);
int main (int argc, char **argv)
{
Thing mask;
Number num;
unsigned idx;
srand (time(NULL));
nodes = nodes_read( &nodecount, argv[1]);
fprintf( stdout, "Nodecount=%u\n", nodecount );
initialise_stuff();
#if WANT_RANDOM
mask = nodes[nodecount/2].value | nodes[nodecount/3].value ;
#else
mask = 0x38;
#endif
fprintf( stdout, "\n#### Search mask=%llx\n", (unsigned long long) mask );
itercount = 0;
num = NONUMBER;
idx = grab_matches(&num,0, mask);
fprintf( stdout, "Itercount=%u\n", itercount );
fprintf(stdout, "KdTree search %16llx\n", (unsigned long long) mask );
fprintf(stdout, "Count=%u Result:\n", idx);
idx = num;
if (idx >= nodecount) idx = nodecount-1;
fprintf( stdout, "num=%4u Value=%16llx\n"
,(unsigned) nodes[idx].num
,(unsigned long long) nodes[idx].value
);
fprintf( stdout, "\nLinear search %16llx\n", (unsigned long long) mask );
for (idx = 0; idx < nodecount; idx++) {
if ((nodes[idx].value & mask) == nodes[idx].value) break;
}
fprintf(stdout, "Cnt=%u\n", idx);
if (idx >= nodecount) idx = nodecount-1;
fprintf(stdout, "Num=%4u Value=%16llx\n"
, (unsigned) nodes[idx].num
, (unsigned long long) nodes[idx].value );
return 0;
}
void initialise_stuff(void)
{
unsigned num;
Number root, *ptr;
root = 0;
for (num=0; num < nodecount; num++) {
nodes[num].num = num;
nodes[num].one = NONUMBER;
nodes[num].nul = NONUMBER;
nodes[num].pivot = -1;
}
nodes[num-1].value = 0; /* last node is guaranteed to match anything */
root = 0;
for (num=1; num < nodecount; num++) {
ptr = find_ptr_to_insert (&root, nodes[num].value, 0ull );
if (*ptr == NONUMBER) *ptr = num;
else fprintf(stderr, "Found %u for %u\n"
, (unsigned)*ptr, (unsigned) num );
}
}
Thing rand_mask(unsigned bitcnt)
{
Thing value = 0;
unsigned bit, cnt;
for (cnt=0; cnt < bitcnt; cnt++) {
bit = 54321u * (unsigned)rand(); /* unsigned arithmetic avoids signed overflow */
bit %= BITSPERTHING;
value |= 1ull << bit;
}
return value;
}
Number *find_ptr_to_insert(Number *ptr, Thing value, Thing done)
{
Number num=NONUMBER;
while ( *ptr != NONUMBER) {
Thing wrong;
num = *ptr;
wrong = (nodes[num].value ^ value) & ~done;
if (nodes[num].pivot < 0) { /* This node is terminal */
/* choose one of the wrong bits for a pivot .
** For this bit (nodevalue==1 && searchmask==0 )
*/
if (!wrong) wrong = ~done ;
nodes[num].pivot = thing_ffs( wrong );
}
ptr = (wrong & 1ull << nodes[num].pivot) ? &nodes[num].nul : &nodes[num].one;
/* Once this bit has been tested, it can be masked off. */
done |= 1ull << nodes[num].pivot ;
}
return ptr;
}
unsigned grab_matches(Number *result, Number num, Thing mask)
{
Thing wrong;
unsigned count;
for (count=0; num < *result; ) {
itercount++;
wrong = nodes[num].value & ~mask;
if (!wrong) { /* we have a match */
if (num < *result) { *result = num; count++; }
/* This is cheap pruning: the break will omit both subtrees from the results.
** But because we already have a result, and the subtrees have higher numbers
** than our current num, we can ignore them. */
break;
}
if (nodes[num].pivot < 0) { /* This node is terminal */
break;
}
if (mask & 1ull << nodes[num].pivot) {
/* avoid recursion if there is only one non-empty subtree */
if (nodes[num].nul >= *result) { num = nodes[num].one; continue; }
if (nodes[num].one >= *result) { num = nodes[num].nul; continue; }
count += grab_matches(result, nodes[num].nul, mask);
count += grab_matches(result, nodes[num].one, mask);
break;
}
mask |= 1ull << nodes[num].pivot;
num = (wrong & 1ull << nodes[num].pivot) ? nodes[num].nul : nodes[num].one;
}
return count;
}
unsigned thing_ffs(Thing mask)
{
unsigned bit;
#if 1
if (!mask) return (unsigned)-1;
for ( bit=rand() % BITSPERTHING; 1 ; bit += 5, bit %= BITSPERTHING) {
if (mask & 1ull << bit ) return bit;
}
#elif 0
for (bit =0; bit < BITSPERTHING; bit++ ) {
if (mask & 1ull <<bit) return bit;
}
#else
mask &= (mask-1); // Kernighan-trick
for (bit =0; bit < BITSPERTHING; bit++ ) {
mask >>=1;
if (!mask) return bit;
}
#endif
return 0xffffffff;
}
struct node * nodes_read( unsigned *sizp, char *filename)
{
struct node *ptr;
unsigned size,used;
FILE *fp;
if (!filename) {
size = (WANT_RANDOM+0) ? WANT_RANDOM : 9;
ptr = malloc (size * sizeof *ptr);
#if (!WANT_RANDOM)
ptr[0].value = 0x0c;
ptr[1].value = 0x0a;
ptr[2].value = 0x08;
ptr[3].value = 0x04;
ptr[4].value = 0x02;
ptr[5].value = 0x01;
ptr[6].value = 0x10;
ptr[7].value = 0x20;
ptr[8].value = 0x00;
#else
for (used=0; used < size; used++) {
ptr[used].value = rand_mask(WANT_BITS);
}
#endif /* WANT_RANDOM */
*sizp = size;
return ptr;
}
fp = fopen( filename, "r" );
if (!fp) return NULL;
fscanf(fp,"%u\n", &size );
fprintf(stderr, "Size=%u\n", size);
ptr = malloc (size * sizeof *ptr);
for (used = 0; used < size; used++) {
fscanf(fp,"%llu\n", &ptr[used].value );
}
fclose( fp );
*sizp = used;
return ptr;
}
UPDATE:
I experimented a bit with the pivot-selection, favouring bits with the highest discriminatory value ("information content"). This involves:
making a histogram of the usage of bits (can be done while initialising)
while building the tree: choosing the one with frequency closest to 1/2 in the remaining subtrees.
The result: the random pivot selection performed better.
Construct a binary tree as follows:
Every level corresponds to a bit.
If the corresponding bit is on, go right; otherwise go left.
Insert every number in the database this way.
Now, for searching: if the corresponding bit in the mask is 1, traverse both children. If it is 0, traverse only the left node. Essentially keep traversing the tree until you hit the leaf node (BTW, 0 is a hit for every mask!).
This tree will have O(N) space requirements.
Example tree for 1 (001), 2 (010) and 5 (101):
       root
      /    \
     0      1
    / \     |
   0   1    0
   |   |    |
   1   0    1
  (1) (2)  (5)
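A minimal C sketch of this kind of bit tree is given below. It is one possible reading of the description, not the answerer's code: the names are hypothetical, level d of the tree tests bit d (lowest bit first), and each node additionally stores the lowest list index in its subtree so that the search can skip subtrees that cannot improve on the current answer.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define NO_MATCH ((size_t)-1)

/* child[0]: bit is 0 (left), child[1]: bit is 1 (right).
 * best: lowest list index stored in this subtree (an addition for pruning). */
struct trie_node {
    struct trie_node *child[2];
    size_t best;
};

static struct trie_node *trie_node_new(void)
{
    struct trie_node *n = malloc(sizeof *n);
    if (!n) abort();
    n->child[0] = n->child[1] = NULL;
    n->best = NO_MATCH;
    return n;
}

/* Insert value (at list position index); level d of the tree tests bit d. */
static void trie_insert(struct trie_node *root, uint64_t value,
                        size_t index, unsigned nbits)
{
    struct trie_node *n = root;
    assert(nbits <= 64);
    if (index < n->best) n->best = index;
    for (unsigned d = 0; d < nbits; d++) {
        unsigned side = (unsigned)((value >> d) & 1u);
        if (!n->child[side]) n->child[side] = trie_node_new();
        n = n->child[side];
        if (index < n->best) n->best = index;
    }
}

/* Return the lowest index whose value has no bit outside mask, or `best`
 * (pass NO_MATCH initially) if this subtree cannot improve on it. */
static size_t trie_find_first(const struct trie_node *n, uint64_t mask,
                              unsigned depth, unsigned nbits, size_t best)
{
    if (!n || n->best >= best) return best;   /* empty or cannot improve */
    if (depth == nbits) return n->best;       /* leaf: this value matches */
    best = trie_find_first(n->child[0], mask, depth + 1, nbits, best);
    if ((mask >> depth) & 1u)                 /* mask bit is 1: go right too */
        best = trie_find_first(n->child[1], mask, depth + 1, nbits, best);
    return best;
}

static void trie_free(struct trie_node *n)
{
    if (!n) return;
    trie_free(n->child[0]);
    trie_free(n->child[1]);
    free(n);
}
```

A search starts with trie_find_first(root, mask, 0, nbits, NO_MATCH). On the question's nine-element example with nbits = 6 and mask 0x38 it returns index 2.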
With precomputed bitmasks. Formally it is still O(N), since the and-mask operations are O(N). The final pass is also O(N), because it needs to find the lowest bit set, but that could be sped up, too.
#include <limits.h>
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
/* For demonstration purposes.
** In reality, this should be an unsigned long long */
typedef unsigned char Thing;
#define BITSPERTHING (CHAR_BIT*sizeof (Thing))
#define COUNTOF(a) (sizeof a / sizeof a[0])
Thing data[] =
/****** index abcdef */
{ 0x0c /* 0 001100 */
, 0x0a /* 1 001010 */
, 0x08 /* 2 001000 */
, 0x04 /* 3 000100 */
, 0x02 /* 4 000010 */
, 0x01 /* 5 000001 */
, 0x10 /* 6 010000 */
, 0x20 /* 7 100000 */
, 0x00 /* 8 000000 */
};
/* Note: this is for demonstration purposes.
** Normally, one should choose a machine wide unsigned int
** for bitmask arrays.
*/
struct bitmap {
char data[ 1+COUNTOF (data)/ CHAR_BIT ];
} nulmaps [ BITSPERTHING ];
#define BITSET(a,i) (a)[(i) / CHAR_BIT ] |= (1u << ((i)%CHAR_BIT) )
#define BITTEST(a,i) ((a)[(i) / CHAR_BIT ] & (1u << ((i)%CHAR_BIT) ))
void init_tabs(void);
void map_empty(struct bitmap *dst);
void map_full(struct bitmap *dst);
void map_and2(struct bitmap *dst, struct bitmap *src);
int main (void)
{
Thing mask;
struct bitmap result;
unsigned ibit;
mask = 0x38;
init_tabs();
map_full(&result);
for (ibit = 0; ibit < BITSPERTHING; ibit++) {
/* bit in mask is 1, so bit at this position is in fact a don't care */
if (mask & (1u <<ibit)) continue;
/* bit in mask is 0, so we can only select items with a 0 at this bitpos */
map_and2(&result, &nulmaps[ibit] );
}
/* This is not the fastest way to find the lowest 1 bit */
for (ibit = 0; ibit < COUNTOF (data); ibit++) {
if (!BITTEST(result.data, ibit) ) continue;
fprintf(stdout, " %u", ibit);
}
fprintf( stdout, "\n" );
return 0;
}
void init_tabs(void)
{
unsigned ibit, ithing;
/* 1 bits in data that dont overlap with 1 bits in the searchmask are showstoppers.
** So, for each bitpos, we precompute a bitmask of all *entrynumbers* from data[], that contain 0 in bitpos.
*/
memset(nulmaps, 0 , sizeof nulmaps);
for (ithing=0; ithing < COUNTOF(data); ithing++) {
for (ibit=0; ibit < BITSPERTHING; ibit++) {
if ( data[ithing] & (1u << ibit) ) continue;
BITSET(nulmaps[ibit].data, ithing);
}
}
}
/* Logical And of two bitmask arrays; similar to dst &= src */
void map_and2(struct bitmap *dst, struct bitmap *src)
{
unsigned idx;
for (idx = 0; idx < COUNTOF(dst->data); idx++) {
dst->data[idx] &= src->data[idx] ;
}
}
void map_empty(struct bitmap *dst)
{
memset(dst->data, 0 , sizeof dst->data);
}
void map_full(struct bitmap *dst)
{
unsigned idx;
/* NOTE this loop sets too many bits to the left of COUNTOF(data) */
for (idx = 0; idx < COUNTOF(dst->data); idx++) {
dst->data[idx] = ~0;
}
}
I have data like (1,2,3,4,5,6,7,8). I want to rearrange it as (1,3,5,7,2,4,6,8) using n/2-2 swaps, without any extra array, and with at most one loop.
Note that I have to do the swaps in the existing array of numbers. If there is another way, without swaps and without an extra array,
please give me some advice.
Maintain two pointers, p1 and p2: p1 goes from start to end, p2 goes from end to start, and elements that are on the wrong side get swapped.
Pseudo code:
specialSort(array):
    p1 <- array.start()
    p2 <- array.end()
    while (p1 != p2):
        if (*p1 %2 == 0):
            p1 <- p1 + 1;
            continue;
        if (*p2 %2 == 1):
            p2 <- p2 -1;
            continue;
        //when here, both p1 and p2 need a swap
        swap(p1,p2);
Note that the complexity is O(n): at least one of p1 or p2 moves in every second iteration, so the loop cannot repeat more than 2*n = O(n) times [a tighter bound exists, but it is not needed]. Space complexity is trivially O(1): we allocate a constant amount of space, just 2 pointers.
Note 2: if your language does not support pointers [e.g. Java, ML, ...], they can be replaced with indexes: i1 going from start to end, i2 going from end to start, with the same algorithm.
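The index-based variant could look like the following C sketch. It is an illustration under stated assumptions rather than a transcription of the pseudo code: the function name partition_odd_first is hypothetical, and it places odd values first (as in the question's example) where the pseudo code places even values first. Like the pseudo code, it does not preserve the relative order within each group.

```c
#include <assert.h>
#include <stddef.h>

/* Two-index partition: odd values are moved to the front, even values to
 * the back. Relative order inside each group is NOT preserved. */
static void partition_odd_first(int *a, size_t n)
{
    assert(a != NULL || n == 0);
    if (n == 0) return;
    size_t i1 = 0, i2 = n - 1;
    while (i1 < i2) {
        if (a[i1] % 2 != 0) { i1++; continue; }  /* odd: already in place */
        if (a[i2] % 2 == 0) { i2--; continue; }  /* even: already in place */
        /* a[i1] is even and a[i2] is odd: exchange them */
        int tmp = a[i1];
        a[i1] = a[i2];
        a[i2] = tmp;
    }
}
```

It uses a single loop and no extra array, but the number of swaps depends on the input.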
#include <stdio.h>
#include <string.h>
char array[26] = "ABcdEfGiHjklMNOPqrsTUVWxyZ" ;
#define COUNTOF(a_) (sizeof(a_)/sizeof(a_)[0])
#define IS_ODD(e) ((e)&0x20)
#define IS_EVEN(e) (!IS_ODD(e))
void doswap (char *ptr, unsigned sizl, unsigned sizr);
int main(void)
{
unsigned bot,limit,cut,top,size;
size = COUNTOF(array);
printf("Before:%26.26s\n", array);
/* pass 1 count the number of EVEN chars */
for (limit=top=0; top < size; top++) {
if ( IS_EVEN( array[top] ) ) limit++;
}
/* skip initial segment of EVEN */
for (bot=0; bot < limit;bot++ ) {
if ( IS_ODD(array[bot])) break;
}
/* Find leading stretch of misplaced ODD + trailing stretch of EVEN */
for (cut=bot;bot < limit; cut = top) {
/* count misplaced items */
for ( ;cut < size && IS_ODD(array[cut]); cut++) {;}
/* count shiftable items */
for (top=cut;top < size && IS_EVEN(array[top]); top++) {;}
/* Now, [bot...cut) and [cut...top) are two blocks
** that need to be swapped: swap them */
doswap(array+bot, cut-bot, top-cut);
bot += top-cut;
}
printf("Result:%26.26s\n", array);
return 0;
}
void doswap (char *ptr, unsigned sizl, unsigned sizr)
{
if (!sizl || !sizr) return;
if (sizl >= sizr) {
char tmp[sizr];
memcpy(tmp, ptr+sizl, sizr);
memmove(ptr+sizr, ptr, sizl);
memcpy(ptr, tmp, sizr);
}
else {
char tmp[sizl];
memcpy(tmp, ptr, sizl);
memmove(ptr, ptr+sizl, sizr);
memcpy(ptr+sizl, tmp, sizl);
}
}