Binary Search with multiple midpoints confusion - algorithm

I'm reviewing for my midterm and this specific question is causing me some issues.
This is the following array to perform the binary search:
the value I want to search for is 150.
To start off, I take the first element which is 0, and the last element which is 15.
(start + end) / 2,
(0 + 15) / 2 = 7
The value at index 7 of the array is 90.
90 < 150, so the value is contained in the right side of the array.
The array now looks like this:
Continuing with the same logic
(start + end) / 2
(8 + 15) / 2 = 11.
However, according to the professor I should be at the value 12 here. I'm not sure what I am doing wrong. Any help would be appreciated.

Algorithms were being written long before computers were invented.
A computer is simply a tool that carries out an algorithm efficiently, which is why it is fast.
The binary search you are performing follows the computer's convention, where arrays are indexed from 0 (counting usually starts from 0 in computers). That is why you get 11, which is correct from the computer's point of view.
People, however, usually count from 1. With 1-based positions the first midpoint is (1 + 16) / 2 = 8 (the same element, 90), the bounds then become 9 and 16, and (9 + 16) / 2 = 12, which is the professor's result. Index 11 and position 12 refer to the same element.
When writing algorithms we usually describe them from the human point of view and then adjust them slightly to implement them on a machine.
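To see the two conventions side by side, here is a minimal Python sketch. The array is hypothetical (it only assumes 16 sorted values with 90 at index 7, as in the question), and `probe_sequence` with its `base` parameter is an illustrative name, not something from the course material.

```python
def probe_sequence(values, target, base=0):
    """Return the midpoints visited by binary search, numbered from `base`."""
    lo, hi = base, base + len(values) - 1
    probes = []
    while lo <= hi:
        mid = (lo + hi) // 2          # integer division, as in the question
        probes.append(mid)
        item = values[mid - base]     # translate back to Python's 0-based index
        if item == target:
            break
        if item < target:
            lo = mid + 1              # target is in the right half
        else:
            hi = mid - 1              # target is in the left half
    return probes


# Hypothetical data: 16 sorted values, values[7] == 90, 150 is present.
values = list(range(20, 180, 10))

zero_based = probe_sequence(values, 150, base=0)
one_based = probe_sequence(values, 150, base=1)
print(zero_based)  # midpoints as a computer counts them
print(one_based)   # midpoints as positions counted from 1
```

Both runs visit the same elements; only the labels differ by one.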

Related

How can I make a complex ifelse algorithm which comprehend dates and time?

I have a data-management problem. I have a database where "EDSS.1", "EDSS.2", ... represent a numeric variable, scaled from 0 to 10 (in steps of 0.5), where a higher number stands for higher disability. For each EDSS variable, I have a "VISITDATE.1", "VISITDATE.2", ...
EDSS
VISITDATE
Now I am interested in assessing CONFIRMED DISABILITY PROGRESSION (CDP), which is an increase of 1 point on the EDSS. To make things more difficult, this increase needs to be confirmed at a subsequent visit (e.g. EDSS.3), which has to take place at least 6 months later (that is, VISITDATE.3 - VISITDATE.2 >= 6 months).
To do so, I am creating a nested ifelse statement, as shown below.
prova <- prova %>% mutate(
CDP = ifelse(EDSS.2 > EDSS.1 & EDSS.3>=EDSS.2 & difftime(VISITDATE.3,VISITDATE.2,
units = "weeks") > 48,
print(ymd(VISITDATE.2)),0))
However, I am facing the following main problems:
How can I print the VISIT.DATE of my interest instead of 1 or 0?
How can I shift my code to the EDSS.2,EDSS.3, and so on? I am interested in finding all the confirmed disability progressions (CDPs).
Many thanks to everyone who finds the time to answer me.

Pattern Recognition Algorithm/Technique

Background
I apologize for the music-based question, but the details don't really mean all that much. I'm sequentially going through a midi file and I'm looking for an efficient way to find a pattern in the data to find something called a tuplet. See image below:
The tuplets have the numbers (3 or 6) over top of them. I need to know at which position they begin in the data file. The numbers below the notes are the values you would see sequentially in the data file. Just in case you can't decipher the data below, here it is:
1, 2, 2.3333, 2.6666, 3, 3.5, 3.6666, 3.83333, 4, 4.1666, 4.3333, 4.5, 4.6666, 4.8333,
5, 6.3333, 6.6666, 7.1666, 7.3333, 7.5, 7.6666, 7.8333, 8, 8.1666, 8.333, 8.5, 8.6666.
The first tuplet begins at position 2 and the difference between the position of notes is 0.3333 (repeating)
The second tuplet begins at position 3.5 and the difference between the position of notes is 0.1666 (repeating)
The main issue is that, unlike in the image, position 7 does not appear in the data file, because the data file only lists note locations. The icon that you see at that position is called a rest, and rests are not notated in the data file.
Question
How can I find an efficient method to find the start of each tuplet? Is there some sort of recursive method?
I don't think you need any recursion for this.
The normal note values can only be represented by fractions of the beat of the type a / 2^b. The tuplets can be arbitrary fractions, but mostly I've seen something like triplets, quintuplets or (in your case) sextuplets.
So the simplest way would be to compute the length of every note (maybe the time difference between two MIDI events? Or the length is stored explicitly in MIDI? I'm not that familiar with the format) and compute the rational representation of this length.
Every group of notes with a denominator that is not a power of two belongs to such a tuplet. To group the notes together, I would recommend the following approach (assuming that all notes of a tuplet have the same value):
Factorize the denominator into a power of two a and the rest b (e.g. a * b = 4 * 5)
Initialize an empty tuplet of size b
For every note, compute its distance from the beginning of the tuplet and store the note at the corresponding position, inserting rests if necessary. The length of the tuplet follows from the minimum length l of all notes in the tuplet: greedily add notes until the end of a note would exceed a distance of l * b from the beginning of the tuplet
This way, you base the tuplet on the minimum note length and add all notes that fit into it.
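The denominator test described above can be sketched in Python as follows. This is only a rough illustration under stated assumptions: note length is taken to be the gap between consecutive onsets (so a rest would be counted as part of the preceding note), the decimal values are snapped to the nearest fraction with a denominator of at most 24, and the names `split_denominator` and `tuplet_starts` are made up for this example.

```python
from fractions import Fraction


def split_denominator(length, max_denominator=24):
    """Approximate `length` by a fraction and split its denominator into a * b,
    where a is a power of two and b is the remaining odd factor."""
    frac = Fraction(length).limit_denominator(max_denominator)
    a, b = 1, frac.denominator
    while b % 2 == 0:
        a, b = a * 2, b // 2
    return frac, a, b


def tuplet_starts(onsets, max_denominator=24):
    """Return the onsets where a run of equal, non-power-of-two gaps begins."""
    starts = []
    previous = None
    for onset, nxt in zip(onsets, onsets[1:]):
        frac, _, b = split_denominator(nxt - onset, max_denominator)
        if b != 1 and frac != previous:   # odd factor present and a new run
            starts.append(onset)
        previous = frac
    return starts


# First part of the data from the question (up to position 5, before the rest).
onsets = [1, 2, 2.3333, 2.6666, 3, 3.5, 3.6666, 3.83333, 4,
          4.1666, 4.3333, 4.5, 4.6666, 4.8333, 5]
print(tuplet_starts(onsets))
```

Handling rests and grouping notes into tuplets of size b, as the steps above describe, would need the extra bookkeeping that this sketch leaves out.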

How to write Analysis function for Min-Max Algorithm?

I'm trying to code AI for a game somewhat similar to Tic-Tac-Toe. You can see its rules here.
The min-max algorithm and analysis function I'm using can be found here
The way I've tried so far:
I've built some patterns which will be good for the current player. (in Python)
e.g. my_pattern = " ".join(str(x) for x in [piece, None, piece, piece, None])
I'm matching such patterns with all the 6 possible orientations on the hexagonal gameboard for every piece (not for blank spaces). To be precise, matching my_pattern with 6 different arrays (each array represents one of 6 different orientations).
Now, what should this analysis function actually calculate?
The score of entire state of board?
The score of the last move made on board?
If someone can accurately describe the purpose of Analysis function, that would be great.
The analysis function evaluates the current state of the board. It may or may not take into account the last move, any of the previous moves, or the order of moves used to reach a board position. It should also consider whose turn it is to play.
What I mean is the same board can be good/bad for white/black depending on whose turn it is. (Called the situation of zugzwang in chess).
Also, the same board can be reached in a variety of move sequences, hence, it depends on the type of game whether you want to include that in the analysis or not. (High level chess engines surely include order of moves, though not for calculating current board, but for further analysis on a possibility of reaching that position). In this game however, I don't think there is any need of including last or any of the previous moves (order) for your analysis function.
EDIT:
An example of analysis function:
value = 10000*W(4) - 10000*W(3) + 200*W(2.1) + 200*W(1.2) + 100*W(2) + 100*W(1.1) + 2*W(1e) + 10*W(1m) + 30*W(1c) - (10000*B(4) - 10000*B(3) + 200*B(2.1) + 200*B(1.2) + 100*B(2) + 100*B(1.1) + 2*B(1e) + 10*B(1m) + 30*B(1c))
where:
W = white
B = black pieces
4 = made line of 4 pieces
3 = made line of 3 pieces
2 = made line of 2 pieces having possibility of getting extended to 4 from at least one side
. = blank (ie, 1.2 = W.WW on the board)
1.1 = Piece|Blank|Piece and possibility of extending to 4 from at least one side
e|m|c = edge|middle|center of board, and possibility of extending to 4 from either sides
A positive result from this analysis function means white is better, 0 indicates a balanced board, and a negative value means black has the advantageous position. You can change the weights based on the results of the tests you run. Finding all possible combinations is an exhaustive task, but the game is such :)
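As a rough Python sketch, the weighted sum above could be written like this. The weights are copied from the formula; the pattern counts are assumed to come from your own pattern matcher, and the names `WEIGHTS` and `evaluate` are only illustrative.

```python
# Weights taken from the example formula; keys follow its pattern notation.
WEIGHTS = {
    "4": 10000, "3": -10000,
    "2.1": 200, "1.2": 200,
    "2": 100, "1.1": 100,
    "1e": 2, "1m": 10, "1c": 30,
}


def evaluate(white_counts, black_counts, weights=WEIGHTS):
    """Score a board from pattern counts: positive favours white,
    negative favours black, 0 is balanced."""
    def side_score(counts):
        return sum(w * counts.get(pattern, 0) for pattern, w in weights.items())
    return side_score(white_counts) - side_score(black_counts)


# Hypothetical counts: white has one open pair and a centre piece,
# black has one Piece|Blank|Piece shape.
print(evaluate({"2": 1, "1c": 1}, {"1.1": 1}))
```

Keeping the weights in one table makes it easy to adjust them after test games without touching the scoring code.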

Direct way of converting BASE-14 to BASE-7

Given (3AC) in base-14. Convert it into BASE-7.
A simple approach is to convert first 3AC into BASE-10 & then to BASE-7 which results in 2105.
I was just wondering whether there is any direct way of converting from BASE-14 to BASE-7?
As others have said, there is no straightforward technique, because 14 is not a power of 7.
However, you don't need to go through base-10. One approach is to write routines that perform base-7 arithmetic (specifically addition and multiplication), and then use them to process each base-14 digit in turn: multiply it by the relevant power of 14, and then add it to an accumulator.
I have found one approach.
There is no need to calculate for base 10 and then base 7. It can be done using this formula!
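If a number X is represented in base 14 as
X = a(n) a(n-1) a(n-2) .... a(0)
then in base 7 we can write it as
X=.....rqp
where
t(0)=(2^0)a(0);            p=t(0)%7
t(1)=(2^1)a(1) + t(0)/7;   q=t(1)%7
t(2)=(2^2)a(2) + t(1)/7;   r=t(2)%7
..........
t(n)=(2^n)a(n) + t(n-1)/7; nth digit=t(n)%7
(here / is integer division, so t(k-1)/7 is the carry into the next digit, taken before reducing modulo 7; the process goes further while the carry is non-zero, because a number in base 14 will require more digits in base 7).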
The logic is simple, just based on properties of bases, and taking into account the fact that 7 is half of 14. Else it would have been a tedious task.
Eg. here it is given 3AC.
C =12;
so last digit is (2^0 * 12)%7 = 5
A=10
next digit is (2^1 * 10 + 12/7)%7 = (20+1)%7=21%7=0
next is 3;
next digit is (2^2 * 3 + 21/7)%7 = (12+3)%7=15%7=1
next is nothing(0);
next digit is (2^3 * 0 + 15/7)%7 = (0+2)%7=2%7=2
Hence, in base 7 the number is 2105. This method may seem confusing and difficult, but with a little practice it can come in very handy for similar problems! Also, even if the number is very long, like 287AC23B362, we don't have to go through base 10 first, which would take at least some time, and can compute base 7 directly!
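The digit-by-digit procedure above can be sketched in Python as follows. It relies on 14^k = 2^k * 7^k and carries the integer quotient of each unreduced term into the next digit; `base14_to_base7` is just an illustrative name.

```python
def base14_to_base7(digits14):
    """Convert a base-14 numeral (digits 0-9, A-D) to a base-7 numeral
    using the 2^k * a(k) + carry rule, without a base-10 detour."""
    out = []
    carry = 0
    for k, ch in enumerate(reversed(digits14)):
        term = (2 ** k) * int(ch, 14) + carry   # int(ch, 14) reads one digit
        out.append(term % 7)                    # base-7 digit at position k
        carry = term // 7                       # carried into position k + 1
    while carry:                                # leftover carry gives extra digits
        out.append(carry % 7)
        carry //= 7
    return "".join(str(d) for d in reversed(out)).lstrip("0") or "0"


print(base14_to_base7("3AC"))
```

For 3AC this reproduces the intermediate terms 12, 21, 15 and 2 from the worked example.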
No, there's not really an easy way to do as you wish because 14 is not a power of 7.
The only tricks that I know of for something like this (ex easily going from hex to binary) require that one base be a power of the other.
Link gives a reasonably clear answer. In short, it's a bit of a pain with the methods I know.

Data Compression : Arithmetic coding unclear

Can anyone please explain arithmetic encoding for data compression, with implementation details? I have searched the internet and found Mark Nelson's post, but the implementation technique is still unclear to me after trying for many hours.
Mark Nelson's explanation of arithmetic coding can be found at
http://marknelson.us/1991/02/01/arithmetic-coding-statistical-modeling-data-compression/
The main idea of arithmetic compression is its capability to code a probability using exactly the amount of data required.
This amount of data is known (it was proven by Shannon) and can be calculated simply by using the following formula: -log2(p)
For example, if p=50%, then you need 1 bit.
And if p=25%, you need 2 bits.
That's simple enough for probabilities which are powers of 2 (and in this special case, Huffman coding could be enough). But what if the probability is 63%? Then you need -log2(0.63) = 0.67 bits. Sounds tricky...
This property is especially important if your probability is high. If you can predict something with a 95% accuracy, then you only need 0.074 bits to represent a good guess. Which means you are going to compress a lot.
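The figures quoted above can be checked with a couple of lines of Python; `bits_needed` is just an illustrative name for the formula -log2(p).

```python
import math


def bits_needed(p):
    """Ideal code length, in bits, for an event of probability p."""
    return -math.log2(p)


for p in (0.5, 0.25, 0.63, 0.95):
    print(p, round(bits_needed(p), 3))
```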
Now, how to do that ?
Well, it's simpler than it sounds. You will divide your range depending on probabilities. For example, if you have a range of 100, 2 possible events, and a probability of 95% for the 1st one, then the first 95 values will say "Event 1", and the last 5 remaining values will say "Event 2".
OK, but on computers, we are accustomed to use powers of 2. For example, with 16 bits, you have a range of 65536 possible values. Just do the same : take the 1st 95% of the range (which is 62259) to say "Event 1", and the rest to say "Event 2". You obviously have a problem of "rounding" (precision), but as long as you have enough values to distribute, it does not matter too much. Furthermore, you are not constrained to 2 events, you could have a myriad of events. All that matters is that values are allocated depending on the probabilities of each event.
OK, but now I have 62259 possible values to say "Event 1", and 3277 to say "Event 2". Which one should I choose?
Well, any of them will do. Whether it is 1, 30, 5500 or 62256, it still means "Event 1".
In fact, deciding which value to select will not depend on the current guess, but on the next ones.
Suppose I get "Event 1". So now I have to choose any value between 0 and 62258. On the next guess, I have the same distribution (95% Event 1, 5% Event 2). I will simply allocate the distribution map with these probabilities, except that this time it is distributed over 62259 values. And we continue like this, reducing the range of values with each guess.
So in fact, we are defining "ranges", which narrow with each guess. At some point, however, there is a problem of accuracy, because very few values remain.
The idea is to simply "inflate" the range again. For example, each time the range goes below 32768 (2^15), you output the highest bit and multiply the rest by 2 (effectively shifting the values one bit to the left). By continuously doing this, you output bits one by one, as they are settled by the series of guesses.
Now the relation with compression becomes obvious: when the range narrows swiftly (e.g. 5%), you output a lot of bits to get the range back above the limit. On the other hand, when the probability is very high, the range narrows very slowly. You can even have a lot of guesses before outputting your first bits. That's how it is possible to compress an event to "a fraction of a bit".
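The range narrowing can be sketched in Python as below. This is a simplified illustration, not a working coder: it uses the two-event 95%/5% split on a 16-bit range from the example, it leaves out the "inflate" (renormalisation) step entirely, and the names `narrow` and `bits_spent` are made up for this sketch.

```python
import math


def narrow(low, size, event, p_first=0.95):
    """Shrink the integer range [low, low + size) to the slice for `event`."""
    first = int(size * p_first)        # values reserved for "Event 1"
    if event == 1:
        return low, first
    return low + first, size - first   # the remaining values mean "Event 2"


def bits_spent(events, total=65536):
    """How much the range has shrunk, measured in bits, after `events`."""
    low, size = 0, total
    for event in events:
        low, size = narrow(low, size, event)
    return math.log2(total / size)


print(narrow(0, 65536, 1))   # slice for a likely event
print(narrow(0, 65536, 2))   # slice for an unlikely event
print(bits_spent([1]), bits_spent([2]))
```

A likely event barely shrinks the range (well under one bit), while an unlikely one shrinks it by more than four bits, which is the link to compression described above.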
I've intentionally used the terms "probability", "guess", "events" to keep this article generic. But for data compression, you just need to replace them with the way you want to model your data. For example, the next event can be the next byte; in this case, there are 256 possible events.
Maybe this script could be useful to build a better mental model of arithmetic coder: gen_map.py. Originally it was created to facilitate debugging of arithmetic coder library and simplify generation of unit tests for it. However it creates nice ASCII visualizations that also could be useful in understanding arithmetic coding.
A small example. Imagine we have an alphabet of 3 symbols: 0, 1 and 2 with probabilities 1/10, 2/10 and 7/10 correspondingly. And we want to encode sequence [1, 2]. Script will give the following output (ignore -b N option for now):
$ ./gen_map.py -b 6 -m "1,2,7" -e "1,2"
000000111111|1111|111222222222222222222222222222222222222222222222
------011222|2222|222000011111111122222222222222222222222222222222
---------011|2222|222-------------00011111122222222222222222222222
------------|----|-------------------------00111122222222222222222
------------|----|-------------------------------01111222222222222
------------|----|------------------------------------011222222222
==================================================================
000000000000|0000|000000000000000011111111111111111111111111111111
000000000000|0000|111111111111111100000000000000001111111111111111
000000001111|1111|000000001111111100000000111111110000000011111111
000011110000|1111|000011110000111100001111000011110000111100001111
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
001100110011|0011|001100110011001100110011001100110011001100110011
010101010101|0101|010101010101010101010101010101010101010101010101
First 6 lines (before ==== line) represent a range from 0.0 to 1.0 which is recursively subdivided on intervals proportional to symbol probabilities. Annotated first line:
[1/10][ 2/10 ][ 7/10 ]
000000111111|1111|111222222222222222222222222222222222222222222222
Then we subdivide each interval again:
[ 0.1][ 0.2 ][ 0.7 ]
000000111111|1111|111222222222222222222222222222222222222222222222
[ 0.7 ][.1][ 0.2 ][ 0.7 ]
------011222|2222|222000011111111122222222222222222222222222222222
[.1][ .2][ 0.7 ]
---------011|2222|222-------------00011111122222222222222222222222
Note, that some intervals are not subdivided. That happens when there is not enough space to represent every subinterval within given precision (which is specified by -b option).
Each line corresponds to a symbol from the input (in our case - sequence [1, 2]). By following subintervals for each input symbol we'll get a final interval that we want to encode with minimal amount of bits. In our case it's a first 2 subinterval on a second line:
[ This one ]
------011222|2222|222000011111111122222222222222222222222222222222
The following 7 lines (after ====) represent the same interval 0.0 to 1.0, but subdivided according to binary notation. Each line is a bit of output, and by choosing between 0 and 1 you choose the left or right half-subinterval. For example, bits 01 correspond to the subinterval [0.25, 0.5) on the second line:
[ This one ]
000000000000|0000|111111111111111100000000000000001111111111111111
The idea of arithmetic coder is to output bits (0 or 1) until the corresponding interval will be entirely inside (or equal to) the interval determined by the input sequence. In our case it's 0011. The ~~~~ line shows where we have enough bits to unambiguously identify the interval we want.
Vertical lines formed by | symbol show the range of bit sequences (rows) that could be used to encode the input sequence.
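The idea can be reproduced in a few lines of Python. This sketch is not gen_map.py: it uses exact fractions instead of the fixed precision set by -b, and the function names are made up for illustration. It first narrows [0, 1) according to the input symbols, then looks for the shortest bit string whose binary interval fits inside the result.

```python
import math
from fractions import Fraction


def encode_interval(weights, symbols):
    """Narrow [0, 1) symbol by symbol; weights are relative frequencies."""
    total = sum(weights)
    low, width = Fraction(0), Fraction(1)
    for s in symbols:
        low += width * Fraction(sum(weights[:s]), total)  # skip earlier symbols
        width *= Fraction(weights[s], total)
    return low, low + width


def shortest_code(low, high):
    """Shortest bit string whose interval [k/2^n, (k+1)/2^n) lies in [low, high)."""
    n = 0
    while True:
        n += 1
        k = math.ceil(low * 2 ** n)          # first candidate not below `low`
        if Fraction(k + 1, 2 ** n) <= high:  # does it end before `high`?
            return format(k, "0{}b".format(n))


low, high = encode_interval([1, 2, 7], [1, 2])
print(low, high, shortest_code(low, high))
```

For the sequence [1, 2] the interval is [0.16, 0.3), and the shortest fitting bit string agrees with the 0011 shown in the map.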
First of all thanks for introducing me to the concept of arithmetic compression!
I can see that this method has the following steps:
Creating mapping: Calculate the fraction of occurrence for each letter which gives a range size for each alphabet. Then order them and assign actual ranges from 0 to 1
Given a message calculate the range (pretty straightforward IMHO)
Find the optimal code
The third part is a bit tricky. Use the following algorithm.
Let b be the optimal representation. Initialize it to the empty string (''). Let x be the minimum value and y the maximum value of the range.
1. Double x and y: x=2*x, y=2*y
2. If both of them are greater than 1, append 1 to b, subtract 1 from both, and go to step 1.
3. If both of them are less than 1, append 0 to b and go to step 1.
4. If x<1 but y>1, append 1 to b and stop.
b essentially contains the fractional part of the number you are transmitting. Eg. If b=011, then the fraction corresponds to 0.011 in binary.
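A minimal Python sketch of this loop is given below, under a few assumptions that the steps leave open: 1 is subtracted from both bounds after appending a 1 (so they stay below 2), the upper bound is treated as exclusive, and boundary cases where a bound equals exactly 1 are folded into the first two branches. `pick_code` is an illustrative name.

```python
from fractions import Fraction


def pick_code(x, y):
    """Bits b such that the binary fraction 0.b lies in [x, y), 0 <= x < y <= 1."""
    x, y = Fraction(x), Fraction(y)
    bits = ""
    while True:
        x, y = 2 * x, 2 * y
        if x >= 1:                 # whole range in the upper half
            bits += "1"
            x, y = x - 1, y - 1
        elif y <= 1:               # whole range in the lower half
            bits += "0"
        else:                      # range straddles the midpoint: finish with 1
            return bits + "1"


print(pick_code(Fraction(4, 25), Fraction(3, 10)))
```

Note that this picks a single number inside the range rather than a bit string whose whole interval is contained in it, so the result can be shorter than a code that is safe to follow with arbitrary further bits.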
What part of implementation do you not understand?
