Begin read from arbitary position in array - algorithm

I wonder if there's a way to begin reading from an arbitrary position in a array. E.g. if I have a array of size 10 and it begins reading from position 4. Then it should continue on reading from position 5, 6, 7, 8, 9, 0, 1, 2, 3
I was uncertain with tag, so if have picked wrong tag please do change it for me.

Yes, you can index using the modulo operation which is written as % in most languages:
x = list[i % list.length]
This will give you the desired effect of wrapping around when you reach the end of the list instead of attempting to index out of bounds.
This assumes 0-based indexing. If you use 1-based indexing you have to add one to the result of the modulo operation.

offset = 4;
for(i=0; i<n; i++)
cout << x[(i+offset)%n] << ' ';

It depends on what you mean by "list". Traditionally in computer science, "list" usually means "linked list". In this case, you have to traverse the list in order to get to a particular element.
If you need/want to be able to start reading from arbitrary positions efficiently, you probably want to avoid linked lists, but the exact alternatives you have (easily) available will vary with the programming language, libraries, etc., you're using.
Edit: for an array, it's generally trivial -- just specify the starting position directly, and take the remainder after dividing by the array size.

One option is to make the list circular (like they do it in the Linux kernel).

Related

Pattern Recognition Algorithm/Technique

Background
I apologize for the music-based question, but the details don't really mean all that much. I'm sequentially going through a midi file and I'm looking for an efficient way to find a pattern in the data to find something called a tuplet. See image below:
The tuplets have the numbers (3 or 6) over top of them. I need to know at which position they begin in the data file. The numbers below the notes are the values you would see sequentially in the data file. Just in case you can't decipher the data below, here it is:
1, 2, 2.3333, 2.6666, 3, 3.5, 3.6666, 3.83333, 4, 4.1666, 4.3333, 4.5, 4.6666, 4.8333,
5, 6.3333, 6.6666, 7.1666, 7.3333, 7.5, 7.6666, 7.8333, 8, 8.1666, 8.333, 8.5, 8.6666.
The first tuplet begins at position 2 and the difference between the position of notes is 0.3333 (repeating)
The second tuplet begins at position 3.5 and the difference between the position of notes is 0.1666 (repeating)
The main issue is that in the note, unlike the image below, position 7 will not be noted in the data file because the data only file only lists note locations. The icon that you see in that location is called a rest, which is not notated in the data file.
Question
How can I find an efficient method to find the start of each tuplet? Is there some sort of recursive method?
I don't think you need any recursion for this.
The normal note values can only be represented by fractions of the beat of the type a / 2^b. The tuplets can be arbitrary fractions, but mostly I've seen something like triplets, quintuplets or (in your case sextuplets).
So the simplest way would be to compute the length of every note (maybe the time difference between two MIDI events? Or the length is stored explicitly in MIDI? I'm not that familiar with the format) and compute the rational representation of this length.
Every group of notes with a denominator that is not a power of two belongs to such a tuplet. To group the notes together, I would recommend the following approach (assuming that all notes of a tuplet have the same value):
Factorize the denominator into a power of two a and the rest b (e.g. a * b = 4 * 5)
Initialize an empty tuplet of size b
For every note compute the distance to the beginning of the tuplet and store the note at the corresponding position, inserting rests if necessary. The length of the tuplet can be computed by taking the minimum length l of all notes in the tuplet, so greedily adding them until the end of these notes exceeds a distance of l * b from the beginning of the tuplet
This way, you base the tuplet on the minimum note length and add all notes that fit into it.

Making a custom Distribution of Random Numbers

I am trying to make sense of the different distribution objects in c++11 and I am finding it overwhelming. I hope some of you can and will help.
This is why I am looking into all this:
I need a random number generator that I can adjust every time it is used so that it is more likely to produce the same number again. The second requirement I need to fill is that I need the random numbers generated to only be these numbers:
{1, 2, 4, 8, 16, ..., 128}
Third and last requirement is that on certain occasions I need to skip one or more numbers from the above set.
My problem is that I don't understand the descriptions of various distribution objects. I, thus, cannot determine what tools I need to use to meet my above needs.
Can somebody tell me what tools I need and how I need to use them? The more clear, concise and detailed the response the better.
Your range can be generated with a random number j in the range [0, 7], then you compute:
1 << j
to get your number. std::uniform_int_distribution<> would be handy for generating the value in [0, 7].
Additionally you could use a std::bernoulli_distribution (which returns a random bool) to decide if the next number is going to be the same as the last one, or if you should generate a new number. The std::bernoulli_distribution defaults to a 50/50 chance of true/false, but you can customize that distribution in the bernoulli_distribution constructor to anything you like (e.g. 80/20 or whatever).
If this isn't clear enough, just jump in with some code. Try coding it up, and if it isn't working, post what you have, and I'm sure somebody will help.
Oh, forgot about your 3rd requirement: For that just put your [0, 7] generation in a loop, and if you come up with a number you're supposed to skip, then iterate the loop, else break out of it.
For skipping numbers I completely agree with Howard that manual checking is probably the way to go, but there might be a better way altering the probability of a given number being generated.
Another way to do this would be to use a discrete_distribution object, which allows you to specify the probability of generating any given value, so for your example it would be something like
std::default_random_engine entropy;
std::array<double, 128> probs;
probs.fill(1.0);
std::discrete_distribution<int> choose(probs.begin(), probs.end());
then when you're in your loop, in addition to deciding whether or not to skip, you can increment one of those values by some amount to increase the odds of it coming up again, making sure to reinitialize the discrete distribution, like this:
int x;
double myValue = 0.2;//or whatever increment you want
for (something; something else; something else else)
{
x = choose(entropy);
if (skip(x))
continue;//alternately you could set probs.at(x) = 0
//only if you never want to generate it again
probs.at(x) += myValue;
choose = std::discrete_distribution<int>(probs.begin(), probs.end());
output(x);
}
where skip and output are your functions to decide if x should be skipped and do whatever you want with the generated value respectively

Represent 10000 booleans using only 10000 bits

I want to represent 10000 bits of information.(Each can be either one or zero). Is there any way I can do this?
Wikipedia explains a bit hack to achieve this. But then it asks me to have a number that's as large as 2^10000 for storing 10000 bits.
Is there some way that's tractable even for storing large number of bits?
As wikipedia explains, a bit field is an appropriate choice here. a bit field that can hold 10,000 bits has 2^10000 states.
A good choice for doing this (given that integers are 32/64 bits) is a bit vector, which is asked about and explained in excruciating detail here:
bit vector implementation of set in Programming Pearls, 2nd Edition
The general idea is that you use an array of integers which are used as bit fields.
You can make bool take 1 bit for example if you have a bunch of them eg. in a struct, like this:
struct A
{
bool a:1, b:1, c:1, d:1, e:1;
};
Above method won't be useful if the number of variables are large. So instead create an array of integers of size 10000/4*8. It will create exactly 10000 bits. Now you can access each bit by using offset and << or >>(like for accessing 55th bit, use floor(55/4*8) and >>55%32. you can reach that bit).
In C++ you can do this very simply, using one of two standard library containers:
std::vector<bool>
This specialization of a standard vector acts (almost) like any other vector, but compresses its contents to one bit per element. Aside from enjoying that fact, you can just treat it like a vector:
// Create a vector of 10000 booleans
std::vector<bool> lots_of_bits(10000);
// Set all the odd ones to true
for (int i = 1; i < lots_of_bits.size(); i += 2) {
lots_of_bits[i] = true;
}
// Add another 100 trues at the end
for (int j = 0; j < 100; ++j) {
lots_of_bits.push_back(true);
}
// etc.
std::bitset<N>
The "new, improved" bit vector which does not pretend to be a standard container. In particular, it's of fixed size and you need to know the size at compile time. That can be a bit restrictive, but it's otherwise a pretty useful class. Like std::vector<bool>, it implements the [] operator for getting and setting individual bits. It also supports the bitwise logical operators &, |, '^' and ~ (and, or, xor and not), as well as left and right bitshifts, and some other utilities.
Is your concern that accessing bit number n requires shifting n times? If so, you can make the problem tractable by dividing your 10,000 bits into 10,000 / 8 buckets using an array of characters (assuming C or C++ here). Now you can access bit number n by figuring out what bucket that bit is in (n / 8) and then what position within the bucket (n % 8). Then you just do the masking. No extra storage required (except the padding at the end, so a few extra bits if you don't have a perfect multiple of 32 bits).

Sum reduction of binary sequence

Consider a binary sequence:
11000111
I have to find sum of this series (actually in parallel)
Sum =1+1+0+0+0+1+1+1= 5
This is a waste of resource as why invest time in adding 0s?
Is there any clever way to sum this sequence so I can avoid unnecessary additions?
Operate at the byte level rather than the bit level. Use a small LUT to convert a byte to a population count. That way you're only doing one lookup and one add per 8 bits. Unless your data is likely to be very sparse this should be quite efficient.
Well it depends on how you store your bitset.
If it's an array, then you can't do more than a plain for. If you want to do this in parallel, just split the array in chunks and process them concurrently.
If we are talking about a bitset (storing the bits in a native (32/64-bit) integer type), then the simplest way to count bits would be this one:
int bitset;
int s = 0;
for (; bitset; s++)
bitset &= bitset-1;
This removes the last bit of 1 at every step, so you have O(s).
Of course, you can combine these two methods if you need more than 32/64 bits
I dunno why people are answering, not even looking into link from the 1st comment to the question. You can easily make it under O(size_of_bitset). At lewast when it comes to constant factor.
You could use this method (found in link by J.F. Sebastian):
inline int count_bits(int num){
int sum = 0;
for (; bitset; sum++) bitset &= bitset-1;
return sum;
}
int main (void){
int array[N];
int total_sum = 0;
#pragma omp parallel for reduction(+:total_sum)
for (size_t i = 0; i < N, i++){
total_sum += count_bits(array[i]);
}
}
This will count number of bits in memory range of array in parallel. The inline is important to avoid unnecessary copying, also the compiler should optimize it much better.
You can swap the count_bits with anything better that counts bits in an integer to get faster if you find anything. This version has complexity of O(bits_set) (not size of the bit set!).
Invoking the parallel construct will introduce quite a lot of overhead compared to a single summation that it does need to be quite large to compensate.
The parallelism is done via OpenMP. The partial sum of each thread is summed at the end of the parallel loop and stored in total_sum. Note the total_sum will be private inside the loop for each thread reduction due to reduction clause.
You could alter the code to make it count bits set in arbitrary memory region but it is quite important for it to be memory aligned when you perform operations on such low level.
As far as I can see, it would be wasteful to try to handle the zeros specially. As #bdares said, addition is really cheap. At a minimum, you'll need to execute N instructions to sum up the an N-bit sequence, that would be if you unconditionally sum ever bit. If you add a test to see whether the bit is a 0 or 1, that's another instruction that needs to be executed for each bit. Even if there's no branch penalty, you're executing minimum 1 instruction for every bit (the conditional test), and then you're also executing the original instruction (the add) for any bits that are equal to 1. So even without branch penalty, this takes more time to execute.
#bdares mentions that the compiler will optimize out the branches, but that's only if the value of each bit is known at compile time, and if you know the values of the bits at compile time, you should just add them up yourself in advance.
There might be some cute things you can do with bit twiddling. For instance, if you take the bits two at a time you're adding up values of 0, 1, 2, or 3, and only have half as many additions to do. There may by something you can then do with the result to convert it into the value you want, but I haven't actually thought about how to do that.

What are the differences between an array and a range in ruby?

just wondering what the subtle difference between an array and a range is. I came across an example where I have x = *(1..10) output x as an array and *(1..10) == (1..10).to_a throws an error. This means to me there is a subtle difference between the two and I'm just curious what it is.
Firstly, when you're not in the middle of an assignment or parameter-passing, *(1..10) is a syntax error because the splat operator doesn't parse that way. That's not really related to arrays or ranges per se, but I thought I'd clear up why that's an error.
Secondly, arrays and ranges are really apples and oranges. An array is an object that's a collection of arbitrary elements. A range is an object that has a "start" and an "end", and knows how to move from the start to the end without having to enumerate all the elements in between.
Finally, when you convert a range to an array with to_a, you're not really "converting" it so much as you're saying, "start at the beginning of this range and keep giving me elements until you reach the end". In the case of "(1..10)", the range is giving you 1, then 2, then 3, and so on, until you get to 10.
One difference is that ranges do not separately store every element in itself, unlike an array.
r = (1..1000000) # very fast
r.to_a # sloooooow
You lose the ability to index to an arbitrary point, however.

Resources