How to feed a random source with uint32s - go

I'm trying to implement several 32-bit random sources in Go (MT19937-32, LFSR113 and LFSR88, among others), but the Source interface in math/rand requires an Int63() method.
How do I turn uint32 outputs into a non-negative int64 (that is, a 63-bit value)?
Here's the LFSR88 code (some methods and constants omitted):
type LFSR88 struct {
	s1, s2, s3, b uint32
}
.
.
.
func (lfsr *LFSR88) Uint32() uint32 {
	lfsr.b = (((lfsr.s1 << 13) ^ lfsr.s1) >> 19)
	lfsr.s1 = (((lfsr.s1 & 4294967294) << 12) ^ lfsr.b)
	lfsr.b = (((lfsr.s2 << 2) ^ lfsr.s2) >> 25)
	lfsr.s2 = (((lfsr.s2 & 4294967288) << 4) ^ lfsr.b)
	lfsr.b = (((lfsr.s3 << 3) ^ lfsr.s3) >> 11)
	lfsr.s3 = (((lfsr.s3 & 4294967280) << 17) ^ lfsr.b)
	return (lfsr.s1 ^ lfsr.s2 ^ lfsr.s3)
}

Converting a uint32 to an int64 is quite simple:
var u32 uint32 = /* some number */
var i64 int64 = int64(u32)
The problem with this alone is that the upper half of the int64 is always zero, so you probably want to combine two outputs. Int63() must return a non-negative value, so build the 64-bit result as unsigned and drop one bit:
var u1, u2 uint32 = /* two numbers */
var i63 int64 = int64((uint64(u1)<<32 | uint64(u2)) >> 1)
See a complete example here.
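The bit arithmetic of combining two 32-bit outputs into one 63-bit value can be checked outside Go. The following is a minimal sketch in Python, used here only because its unbounded integers make the arithmetic easy to verify; the function name `int63_from_u32_pair` is hypothetical and not part of any library. It concatenates two 32-bit values into 64 bits and discards the lowest bit, so the result is always in the range [0, 2^63).

```python
MASK32 = 0xFFFFFFFF  # largest value a uint32 can hold


def int63_from_u32_pair(hi, lo):
    """Combine two 32-bit values into one non-negative 63-bit value.

    Hypothetical helper for illustration; hi and lo stand for two
    consecutive outputs of a 32-bit generator.
    """
    if not (0 <= hi <= MASK32 and 0 <= lo <= MASK32):
        raise ValueError("both inputs must fit in 32 bits")
    # Concatenate to 64 bits, then drop the lowest bit to leave 63 bits.
    return ((hi << 32) | lo) >> 1


if __name__ == "__main__":
    print(hex(int63_from_u32_pair(0xFFFFFFFF, 0xFFFFFFFF)))
```

Dropping the lowest bit rather than masking the highest one is an arbitrary choice in this sketch; either leaves 63 bits taken from the generator.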

Related

Getting “./a.out” terminated by signal SIGSEGV (Address boundary error)

I'm writing a program that splits any two numbers. The problem is whenever I run the program I get an error that says:
“./a.out” terminated by signal SIGSEGV (Address boundary error)
And that error occurs at the lines:
a = std::stoi(temp_vec.front());
b = std::stoi(temp_vec.back());
and
c = std::stoi(temp_vec.front());
d = std::stoi(temp_vec.back());
Here's my program:
#include <iostream>
#include <string>
#include <vector>
void split_number(std::vector<std::string> vect, int x);
int main()
{
int x = 0, y = 0, a = 0, b = 0, c = 0, d = 0;
std::vector<std::string> temp_vec;
std::cout << "Enter x: ";
std::cin >> x;
std::cout << "Enter y: ";
std::cin >> y;
split_number(temp_vec, x);
a = std::stoi(temp_vec.front());
b = std::stoi(temp_vec.back());
split_number(temp_vec, y);
c = std::stoi(temp_vec.front());
d = std::stoi(temp_vec.back());
return 0;
}
void split_number(std::vector<std::string> vect, int x)
{
vect.clear();
//1. convert x to string
std::string temp_str = std::to_string(x);
//2. calculate length
std::size_t len = temp_str.length();
std::size_t delm = 0;
if(len % 2 == 0) {
delm = len / 2;
} else {
delm = (len + 1) / 2;
}
//3. populate vector
vect.push_back(temp_str.substr(0, delm));
vect.push_back(temp_str.substr(delm + 1));
}
Any help would be appreciated.
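You get the segmentation fault because your vector is empty: calling front() or back() on an empty std::vector is undefined behavior. The vector is empty because split_number() takes its parameter by value, so it fills a copy and temp_vec in main() is never modified. Change the signature (both the declaration and the definition) to:
void split_number(std::vector<std::string> & vect, int x)
The ampersand makes vect a reference parameter, so modifications made inside the function are visible in the calling code. Separately, note that temp_str.substr(delm + 1) skips the character at index delm; temp_str.substr(delm) returns the whole second half.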

Combine three 32-bit identifiers into one 32-bit identifier?

Given three identifiers, combine them into a single 32-bit value.
It is known that the first identifier may take (2^8)-1 different values. Likewise, the second may take (2^8)-1 and the third (2^10)-1. Therefore the number of distinct combinations will not exceed (2^32)-1 (it is in fact below 2^26).
Example solution could be to have a map:
key: 32 bits,
value: 8 (or 10) bits.
The value would begin at 0 and be incremented every time a new identifier is provided.
Can it be done better? (instead of 3 maps) Do you see a problem with this solution?
To clarify, each identifier can hold ANY value in the range [0, 2^32). The only information given is that the number of distinct values will not exceed (2^8)-1 (or (2^10)-1 for the third).
The identifiers can have the same values (it's completely random). As a source of such values, consider memory addresses returned by the OS for heap-allocated memory (e.g. using a pointer as an identifier). I realize this might work differently on x64 systems; however, I hope the solution to the general problem is similar to this specific one.
This means that simple bit shifting is out of the question.
You could try something like this:-
#include <map>
#include <iostream>
class CombinedIdentifier
{
public:
CombinedIdentifier (unsigned id1, unsigned id2, unsigned id3)
{
m_id [0] = id1;
m_id [1] = id2;
m_id [2] = id3;
}
// version to throw exception on ID not found
static CombinedIdentifier GetIdentifier (unsigned int id)
{
// search m_store for a value = id
// if found, get key and return it
// else....throw an exception->id not found
}
// version to return found/not found instead of throwing an exception
static bool GetIdentifier (unsigned int id, CombinedIdentifier &out)
{
// search m_store for a value = id
// if found, get key and save it to 'out' and return true
// else....return false
}
int operator [] (int index) { return m_id [index]; }
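bool operator < (const CombinedIdentifier &rhs) const
{
// lexicographic comparison, so that std::map gets a strict weak ordering
if (m_id [0] != rhs.m_id [0]) return m_id [0] < rhs.m_id [0];
if (m_id [1] != rhs.m_id [1]) return m_id [1] < rhs.m_id [1];
return m_id [2] < rhs.m_id [2];
}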
bool operator == (const CombinedIdentifier &rhs) const
{
return m_id [0] == rhs.m_id [0] &&
m_id [1] == rhs.m_id [1] &&
m_id [2] == rhs.m_id [2];
}
bool operator != (const CombinedIdentifier &rhs) const
{
return !operator == (rhs);
}
int GetID ()
{
int
id;
std::map <CombinedIdentifier, int>::iterator
item = m_store.find (*this);
if (item == m_store.end ())
{
id = m_store.size () + 1;
m_store [*this] = id;
}
else
{
id = item->second;
}
return id;
}
private:
int
m_id [3];
static std::map <CombinedIdentifier, int>
m_store;
};
std::map <CombinedIdentifier, int>
CombinedIdentifier::m_store;
int main ()
{
CombinedIdentifier
id1 (2, 4, 10),
id2 (9, 14, 1230),
id3 (4, 1, 14560),
id4 (9, 14, 1230);
std::cout << "id1 = " << id1.GetID () << std::endl;
std::cout << "id2 = " << id2.GetID () << std::endl;
std::cout << "id3 = " << id3.GetID () << std::endl;
std::cout << "id4 = " << id4.GetID () << std::endl;
}
You can get this with bit shifting and unsafe code.
There is an article on SO: What are bitwise shift (bit-shift) operators and how do they work?
Then you can use the whole 32bit range for your three values
---- 8 bits ---- | ---- 8 bits ---- | ---- 10 bits ---- | ---- unused 6 bits ----
int result = firstValue << (8 + 10 + 6);
result += secondValue << (10 + 6);
result += thirdValue << 6;
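The layout above only works when each value already fits its field, for example when the values are the small indices handed out by a map as described in the question, not the raw 32-bit identifiers. Under that assumption, packing and unpacking can be sketched as follows. This is an illustrative Python sketch; the names `pack`, `unpack` and `WIDTHS` are hypothetical.

```python
WIDTHS = (8, 8, 10)   # field widths in bits: first, second, third value
UNUSED_BITS = 6       # low bits left unused, as in the layout above


def pack(first, second, third):
    """Pack three small values into one 32-bit integer."""
    result = 0
    for value, width in zip((first, second, third), WIDTHS):
        if not 0 <= value < (1 << width):
            raise ValueError("value %d does not fit in %d bits" % (value, width))
        result = (result << width) | value
    return result << UNUSED_BITS


def unpack(packed):
    """Recover the three values from a packed 32-bit integer."""
    packed >>= UNUSED_BITS
    third = packed & 0x3FF          # low 10 bits
    second = (packed >> 10) & 0xFF  # next 8 bits
    first = (packed >> 18) & 0xFF   # top 8 bits
    return first, second, third


if __name__ == "__main__":
    p = pack(2, 4, 10)
    print(hex(p), unpack(p))
```

Using `|` instead of `+` makes no difference for in-range values, and the range check makes an overflow into a neighbouring field an error instead of silent corruption.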
I think you could make use of a Perfect Hash Function. In particular, the link provided in that article to Pearson Hashing seems to be appropriate. You might even be able to cut-and-paste the C program included in the 2nd article, except that its output is a 64-bit number, not a 32-bit one. But if you modify it slightly from
for (j=0; j<8; j++) {
// standard Pearson hash (output is h)
to
for (j=0; j<4; j++) {
// standard Pearson hash (output is h)
You'll have what you need.

Filter only digit sequences containing a given set of digits

I have a large list of digit strings like this one. The individual strings are relatively short (say less than 50 digits).
data = [
'300303334',
'53210234',
'123456789',
'5374576807063874'
]
I need to find an efficient data structure (speed first, memory second) and algorithm that returns only those strings that are composed of a given set of digits.
Example results:
filter(data, [0,3,4]) = ['300303334']
filter(data, [0,1,2,3,4,5]) = ['300303334', '53210234']
The data list will usually fit into memory.
For each digit, precompute a postings list of the strings that don't contain the digit.
postings = [[] for _ in range(10)]
for i, d in enumerate(data):
    for j in range(10):
        digit = str(j)
        if digit not in d:
            postings[j].append(i)
Now, to find all strings that contain, for example, just the digits [1, 3, 5] you can intersect the postings lists for the other digits (i.e. 0, 2, 4, 6, 7, 8, 9).
def intersect_postings(p0, p1):
    # Merge-intersect two sorted iterators of string indices.
    try:
        i0, i1 = next(p0), next(p1)
        while True:
            if i0 == i1:
                yield i0
                i0, i1 = next(p0), next(p1)
            elif i0 < i1:
                i0 = next(p0)
            else:
                i1 = next(p1)
    except StopIteration:
        return  # one list is exhausted, so the intersection is complete
def find_all(digits):
    p = None
    for d in range(10):
        if d not in digits:
            if p is None:
                p = iter(postings[d])
            else:
                p = intersect_postings(p, iter(postings[d]))
    return (data[i] for i in p) if p is not None else iter(data)
print(list(find_all([0, 3, 4])))
print(list(find_all([0, 1, 2, 3, 4, 5])))
The set of digits a string contains can be encoded as a 10-bit number, one bit per digit. There are 2^10, or 1,024, possible values.
So create a dictionary that uses an integer for a key and a list of strings for the value.
Calculate the value for each string and add that string to the list of strings for that value.
General idea:
Dictionary Lookup;
for each (string in list)
    value = 0;
    for each character in string
        set bit N in value, where N is the character (0-9)
    Lookup[value] += string // adds string to list for this value in dictionary
Then, to get a list of the strings that match your criteria, just compute the value and do a direct dictionary lookup.
So if the user asks for strings that contain only 3, 5, and 7:
value = (1 << 3) | (1 << 5) | (1 << 7);
list = Lookup[value];
Note that, as Matt pointed out in comment below, this will only return strings that contain all three digits. So, for example, it wouldn't return 37. That seems like a fatal flaw to me.
Edit
If the number of symbols you have to deal with is very large, then the number of possible combinations becomes too large for this solution to be practical.
With a large number of symbols, I'd recommend an inverted index as suggested in the comments, combined with a secondary filter that removes the strings that contain extraneous digits.
Consider a function f which constructs a bitmask for each string with bit i set if digit i is in the string.
For example,
f('0') = 0b0000000001
f('00') = 0b0000000001
f('1') = 0b0000000010
f('1100') = 0b0000000011
Then I suggest storing a list of strings for each bitmask.
For example,
Bitmask 0b0000000001 -> ['0','00']
Once you have prepared this data structure (which is the same size as your original list), you can then easily access all the strings for a particular filter by accessing all lists where the bitmask is a subset of the digits in your filter.
So for your example of filter [0,3,4] you would return the lists from:
Strings containing just 0
Strings containing just 3
Strings containing just 4
Strings containing 0 and 3
Strings containing 0 and 4
Strings containing 3 and 4
Strings containing 0 and 3 and 4
Example Python Code
from collections import defaultdict
import itertools
raw_data = [
    '300303334',
    '53210234',
    '123456789',
    '5374576807063874'
]
def preprocess(raw_data):
    # Group the strings by the bitmask of the digits they contain.
    data = defaultdict(list)
    for s in raw_data:
        bitmask = 0
        for digit in s:
            bitmask |= 1<<int(digit)
        data[bitmask].append(s)
    return data
def filter(data,mask):
    # Visit every non-empty subset of the permitted digits.
    for r in range(len(mask)):
        for m in itertools.combinations(mask,r+1):
            bitmask = sum(1<<digit for digit in m)
            for s in data[bitmask]:
                yield s
data = preprocess(raw_data)
for a in filter(data, [0,1,2,3,4,5]):
    print(a)
Just for kicks, I have coded up Jim's lovely algorithm and the Perl is here if anyone wants to play with it. Please do not accept this as an answer or anything, pass all credit to Jim:
#!/usr/bin/perl
use strict;
use warnings;
my $Debug=1;
my $Nwords=1000;
my ($word,$N,$value,$i,$j,$k);
my (@dictionary,%Lookup);
################################################################################
# Generate "words" with random number of characters 5-30
################################################################################
print "DEBUG: Generating $Nwords word dictionary\n" if $Debug;
for($i=0;$i<$Nwords;$i++){
$j = rand(25) + 5; # length of this word
$word="";
for($k=0;$k<$j;$k++){
$word = $word . int(rand(10));
}
$dictionary[$i]=$word;
print "$word\n" if $Debug;
}
# Add some obvious test cases
$dictionary[$i++]="0" x 50;
$dictionary[$i++]="1" x 50;
$dictionary[$i++]="2" x 50;
$dictionary[$i++]="3" x 50;
$dictionary[$i++]="4" x 50;
$dictionary[$i++]="5" x 50;
$dictionary[$i++]="6" x 50;
$dictionary[$i++]="7" x 50;
$dictionary[$i++]="8" x 50;
$dictionary[$i++]="9" x 50;
$dictionary[$i++]="0123456789";
################################################################################
# Encode words
################################################################################
for $word (@dictionary){
$value=0;
for($i=0;$i<length($word);$i++){
$N=substr($word,$i,1);
$value |= 1 << $N;
}
push(@{$Lookup{$value}},$word);
print "DEBUG: $word encoded as $value\n" if $Debug;
}
################################################################################
# Do lookups
################################################################################
while(1){
print "Enter permitted digits, separated with commas: ";
my $line=<STDIN>;
my @digits=split(",",$line);
$value=0;
for my $d (@digits){
$value |= 1<<$d;
}
print "Value: $value\n";
print join(", ",@{$Lookup{$value}}),"\n\n" if defined $Lookup{$value};
}
I like Jim Mischel's approach. It has a pretty efficient lookup and bounded memory usage. Code in C follows:
#include <stdlib.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <readline/readline.h>
#include <readline/history.h>
enum {
zero = '0',
nine = '9',
numbers = nine - zero + 1,
masks = 1 << numbers,
};
typedef uint16_t mask;
struct list {
char *s;
struct list *next;
};
typedef struct list list_cell;
typedef struct list *list;
static inline int is_digit(char c) { return c >= zero && c <= nine; }
static inline mask char2mask(char c) { return 1 << (c - zero); }
static inline mask add_char2mask(mask m, char c) {
return m | (is_digit(c) ? char2mask(c) : 0);
}
static inline int is_set(mask m, mask n) { return (m & n) != 0; }
static inline int is_set_char(mask m, char c) { return is_set(m, char2mask(c)); }
static inline int is_submask(mask sub, mask m) { return (sub & m) == sub; }
static inline char *sprint_mask(char buf[11], mask m) {
char *s = buf;
char i;
for(i = zero; i <= nine; i++)
if(is_set_char(m, i)) *s++ = i;
*s = 0;
return buf;
}
static inline mask get_mask(char *s) {
mask m=0;
for(; *s; s++)
m = add_char2mask(m, *s);
return m;
}
static inline int is_empty(list l) { return !l; }
static inline list insert(list *l, char *s) {
list cell = (list)malloc(sizeof(list_cell));
cell->s = s;
cell->next = *l;
return *l = cell;
}
static void *foreach(void *f(char *, void *), list l, void *init) {
for(; !is_empty(l); l = l->next)
init = f(l->s, init);
return init;
}
struct printer_state {
int first;
FILE *f;
};
static void *prin_list_member(char *s, void *data) {
struct printer_state *st = (struct printer_state *)data;
if(st->first) {
fputs(", ", st->f);
} else
st->first = 1;
fputs(s, st->f);
return data;
}
static void print_list(list l) {
struct printer_state st = {.first = 0, .f = stdout};
foreach(prin_list_member, l, (void *)&st);
putchar('\n');
}
static list *init_lu(void) { return (list *)calloc(sizeof(list), masks); }
static list *insert2lu(list lu[masks], char *s) {
mask i, m = get_mask(s);
if(m) // skip string without any number
for(i = m; i < masks; i++)
if(is_submask(m, i))
insert(lu+i, s);
return lu;
}
int usage(const char *name) {
fprintf(stderr, "Usage: %s filename\n", name);
return EXIT_FAILURE;
}
#define handle_error(msg) \
do { perror(msg); exit(EXIT_FAILURE); } while (0)
static inline void chomp(char *s) { if( (s = strchr(s, '\n')) ) *s = '\0'; }
list *load_file(FILE *f) {
char *line = NULL;
size_t len = 0;
ssize_t read;
list *lu = init_lu();
for(; (read = getline(&line, &len, f)) != -1; line = NULL) {
chomp(line);
insert2lu(lu, line);
}
return lu;
}
void read_reqs(list *lu) {
char *line;
char buf[11];
for(; (line = readline("> ")); free(line))
if(*line) {
add_history(line);
mask m = get_mask(line);
printf("mask: %s\nstrings: ", sprint_mask(buf, m));
print_list(lu[m]);
};
putchar('\n');
}
int main(int argc, const char* argv[] ) {
const char *name = argv[0];
FILE *f;
list *lu;
if(argc != 2) return usage(name);
f = fopen(argv[1], "r");
if(!f) handle_error("open");
lu = load_file(f);
fclose(f);
read_reqs(lu);
return EXIT_SUCCESS;
}
To compile, use (the library goes after the source file so that the linker resolves its symbols):
gcc -o digitfilter digitfilter.c -lreadline
And test run:
$ cat data.txt
300303334
53210234
123456789
5374576807063874
$ ./digitfilter data.txt
> 034
mask: 034
strings: 300303334
> 0,1,2,3,4,5
mask: 012345
strings: 53210234, 300303334
> 0345678
mask: 0345678
strings: 5374576807063874, 300303334
Put the digits of each value into a set, e.g. '300303334' = {3, 0, 4}.
Since the length of each data item is bounded by a constant (50), you can build each set in O(1) time using a Java HashSet, so this phase is O(n) overall.
For each filter, build a set of the permitted digits and call its containsAll() with each item's set to check whether the item's digits are a subset of the filter. This takes O(n) per filter.
Overall this takes O(m*n), where n is the number of data items and m is the number of filters.
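The set-based approach can be sketched briefly. The sketch below uses Python rather than Java, with `frozenset` standing in for HashSet and the subset operator `<=` standing in for containsAll(); the function names `digit_sets` and `filter_by_digits` are hypothetical.

```python
def digit_sets(data):
    """Precompute, once, the set of digits used by each string."""
    return [(s, frozenset(s)) for s in data]


def filter_by_digits(indexed, allowed):
    """Return the strings whose digits all belong to `allowed`."""
    allowed_set = frozenset(str(d) for d in allowed)
    # digits <= allowed_set is the subset test.
    return [s for s, digits in indexed if digits <= allowed_set]


if __name__ == "__main__":
    data = ['300303334', '53210234', '123456789', '5374576807063874']
    indexed = digit_sets(data)
    print(filter_by_digits(indexed, [0, 3, 4]))
    print(filter_by_digits(indexed, [0, 1, 2, 3, 4, 5]))
```

Each query still scans every item, which is the O(n) per filter noted above; the precomputation only avoids rescanning the characters of each string.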

Manually Converting rgba8 to rgba5551

I need to convert rgba8 to rgba5551 manually. I found some helpful code from another post and want to modify it to convert from rgba8 to rgba5551. I don't really have experience with bitwise operations and haven't had any luck modifying the code myself.
void* rgba8888_to_rgba4444( void* src, int src_bytes)
{
// compute the actual number of pixel elements in the buffer.
int num_pixels = src_bytes / 4;
unsigned long* psrc = (unsigned long*)src;
unsigned short* pdst = (unsigned short*)src;
// convert every pixel
for(int i = 0; i < num_pixels; i++){
// read a source pixel
unsigned px = psrc[i];
// unpack the source data as 8 bit values
unsigned r = (px << 8) & 0xf000;
unsigned g = (px >> 4) & 0x0f00;
unsigned b = (px >> 16) & 0x00f0;
unsigned a = (px >> 28) & 0x000f;
// and store
pdst[i] = r | g | b | a;
}
return pdst;
}
The value of RGBA5551 is that it has color info condensed into 16 bits - or two bytes, with only one bit for the alpha channel (on or off). RGBA8888, on the other hand, uses a byte for each channel. (If you don't need an alpha channel, I hear RGB565 is better - as humans are more sensitive to green). Now, with 5 bits, you get the numbers 0 through 31, so r, g, and b each need to be converted to some number between 0 and 31, and since they are originally a byte each (0-255), we multiply each by 31/255. Here is a function that takes RGBA bytes as input and outputs RGBA5551 as a short:
unsigned short RGBA8888_to_RGBA5551(unsigned char r, unsigned char g, unsigned char b, unsigned char a){
    // Integer arithmetic truncates the fractional part. If you want to round
    // to the nearest integer, adjust this code accordingly.
    unsigned char r5 = r*31/255;
    unsigned char g5 = g*31/255;
    unsigned char b5 = b*31/255;
    unsigned char a1 = (a > 0) ? 1 : 0; // 1 if a is positive, 0 else. You must decide what is sensible.
    // Now that we have our 5 bit r, g, and b and our 1 bit a, we need to shift them into place before combining.
    // (unsigned short)r5 looks like 00000000000vwxyz (11 leading zeroes). Unsigned types are used
    // because r5 << 11 can set bit 15, which does not fit in a signed 16-bit short.
    // The cast is not strictly needed (integer promotion happens before the shift), but it is harmless.
    unsigned short rShift = (unsigned short)r5 << 11;
    unsigned short gShift = (unsigned short)g5 << 6;
    unsigned short bShift = (unsigned short)b5 << 1;
    // Combine and return
    return rShift | gShift | bShift | a1;
}
You can, of course, condense this code.
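One condensed form is sketched below in Python, chosen only so the arithmetic is easy to check interactively. It uses the same truncating 31/255 scaling and the same "any non-zero alpha is opaque" rule as the function above; the function name is hypothetical and the bit layout (red in the top five bits, alpha in bit 0) is taken from the shifts above.

```python
def rgba8888_to_rgba5551(r, g, b, a):
    """Pack four 8-bit channels into one 16-bit RGBA5551 value."""
    for channel in (r, g, b, a):
        if not 0 <= channel <= 255:
            raise ValueError("channel values must be in 0..255")
    # Scale 0..255 down to 0..31, truncating toward zero.
    r5, g5, b5 = (c * 31 // 255 for c in (r, g, b))
    a1 = 1 if a > 0 else 0
    return (r5 << 11) | (g5 << 6) | (b5 << 1) | a1


if __name__ == "__main__":
    print(hex(rgba8888_to_rgba5551(255, 0, 0, 255)))
```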

Fastest ways to set and get a bit

I'm just trying to develop ultra-fast functions for setting and getting bits in uint32 arrays. For example, you can say "set bit 1035 to 1". Then, the uint32 indexed with 1035 / 32 is used with the bitposition 1035 % 32. I especially don't like the branching in the setbit function.
Here is my approach:
void SetBit(uint32* data, const uint32 bitpos, const bool newval)
{
    if (newval)
    {
        //Set On
        data[bitpos >> 5u] |= (1u << (31u - (bitpos & 31u)));
        return;
    }
    else
    {
        //Set Off
        data[bitpos >> 5u] &= ~(1u << (31u - (bitpos & 31u)));
        return;
    }
}
and
bool GetBit(const uint32* data, const uint32 bitpos)
{
    return (data[bitpos >> 5u] >> (31u - (bitpos & 31u))) & 1u;
}
Thank you!
First, I would drop the 31u - ... from all expressions: all it does is reorder the bits in your private representation of the bit set, so you can flip this order without anyone noticing.
Second, you can get rid of the branch by using a clever bit hack:
void SetBit(uint32* data, const uint32 bitpos, const bool f)
{
    uint32 &w = data[bitpos >> 5u];
    uint32 m = 1u << (bitpos & 31u);
    // -f is 0 or all ones, so (-f & m) is either 0 or the mask itself
    w = (w & ~m) | (-f & m);
}
Third, you can simplify your getter by letting the compiler do the conversion to bool:
bool GetBit(const uint32* data, const uint32 bitpos)
{
    return data[bitpos >> 5u] & (1u << (bitpos & 31u));
}
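To see why the branchless formula works, it can help to model it on plain integers. The sketch below does this in Python, masking to 32 bits to imitate unsigned wraparound; `set_bit` and `get_bit` are hypothetical names mirroring the functions above, and the sketch says nothing about speed, only about correctness of the bit arithmetic.

```python
MASK32 = 0xFFFFFFFF  # emulate 32-bit unsigned words


def set_bit(words, bitpos, value):
    """Set or clear one bit without branching on `value`."""
    index = bitpos >> 5        # which 32-bit word
    m = 1 << (bitpos & 31)     # mask for the bit inside that word
    # -int(...) is 0 or -1; masked to 32 bits it is 0 or all ones.
    fill = -int(bool(value)) & MASK32
    words[index] = (words[index] & ~m & MASK32) | (fill & m)


def get_bit(words, bitpos):
    """Return True if the bit at `bitpos` is set."""
    return bool(words[bitpos >> 5] & (1 << (bitpos & 31)))


if __name__ == "__main__":
    words = [0] * 40
    set_bit(words, 1035, True)
    print(get_bit(words, 1035), get_bit(words, 1034))
```

The first term clears the target bit and the second term writes the new value into it, so the same expression handles both setting and clearing.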

Resources