How to test a new random sequence in NIST Test Suite? - random

I have to test a random sequence using the NIST Test Suite. I downloaded it and ran the tests on the files provided in the data directory, and that works fine. But when I run it on a new random sequence, I get an igamc: UNDERFLOW error. The random sequence is generated in MATLAB using
bs=fix(randi([0 1],1,k));
and then saved as .dat file using
dlmwrite('bs.dat', bs);
I copied bs.dat into the data folder and ran the test as follows. Can someone tell me what's wrong here?
ash#computer:~/Documents/NIST_Test_Original/sts-2.1.2$ ./assess 1000000
G E N E R A T O R S E L E C T I O N
______________________________________
[0] Input File [1] Linear Congruential
[2] Quadratic Congruential I [3] Quadratic Congruential II
[4] Cubic Congruential [5] XOR
[6] Modular Exponentiation [7] Blum-Blum-Shub
[8] Micali-Schnorr [9] G Using SHA-1
Enter Choice: 0
User Prescribed Input File: data/bs.txt
S T A T I S T I C A L T E S T S
_________________________________
[01] Frequency [02] Block Frequency
[03] Cumulative Sums [04] Runs
[05] Longest Run of Ones [06] Rank
[07] Discrete Fourier Transform [08] Nonperiodic Template Matchings
[09] Overlapping Template Matchings [10] Universal Statistical
[11] Approximate Entropy [12] Random Excursions
[13] Random Excursions Variant [14] Serial
[15] Linear Complexity
INSTRUCTIONS
Enter 0 if you DO NOT want to apply all of the
statistical tests to each sequence and 1 if you DO.
Enter Choice: 0
INSTRUCTIONS
Enter a 0 or 1 to indicate whether or not the numbered statistical
test should be applied to each sequence.
123456789111111
         012345
110000000000000
P a r a m e t e r A d j u s t m e n t s
-----------------------------------------
[1] Block Frequency Test - block length(M): 128
Select Test (0 to continue): 0
How many bitstreams? 5
Input File Format:
[0] ASCII - A sequence of ASCII 0's and 1's
[1] Binary - Each byte in data file contains 8 bits of data
Select input mode: 0
Statistical Testing In Progress.........
igamc: UNDERFLOW
igamc: UNDERFLOW
igamc: UNDERFLOW
igamc: UNDERFLOW
igamc: UNDERFLOW
Statistical Testing Complete!!!!!!!!!!!!
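The ASCII input mode chosen above ([0]) expects a plain run of ASCII 0's and 1's, but MATLAB's dlmwrite separates values with commas by default, and `./assess 1000000` with 5 bitstreams asks for 5 × 1,000,000 bits. Whether either of these is what triggers the igamc underflow here is an assumption, not something the post confirms. A minimal Python sketch (clean_bits and convert are hypothetical helper names) that strips the delimiters and checks the bit count before the file is handed to assess:

```python
from pathlib import Path


def clean_bits(text: str, stream_len: int, num_streams: int) -> str:
    """Remove delimiters from dlmwrite-style output and check it holds enough 0/1 bits."""
    # dlmwrite's default delimiter is a comma; also drop spaces, tabs and line breaks
    bits = "".join(ch for ch in text if ch not in ", \t\r\n")
    bad = sorted(set(bits) - {"0", "1"})
    if bad:
        raise ValueError(f"unexpected characters in input: {bad}")
    needed = stream_len * num_streams
    if len(bits) < needed:
        raise ValueError(
            f"need {needed} bits ({num_streams} streams x {stream_len}), found {len(bits)}"
        )
    return bits


def convert(src: str, dst: str, stream_len: int, num_streams: int) -> None:
    """Hypothetical helper: rewrite src as one unbroken run of ASCII '0'/'1' characters."""
    bits = clean_bits(Path(src).read_text(encoding="ascii"), stream_len, num_streams)
    Path(dst).write_text(bits, encoding="ascii")
```

For the run shown above this would be something like convert("bs.dat", "data/bs.txt", 1000000, 5), with k in the MATLAB snippet set to at least 5,000,000.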


Palindrome partitioning with interval scheduling

I was looking at the various algorithms for solving the palindrome partitioning problem.
For example, for the string "banana" the minimum number of cuts so that each substring is a palindrome is 1, i.e. "b|anana".
Now I tried solving this problem using interval scheduling, like this:
Input: banana
Transformed string: # b # a # n # a # n # a #
P[] = length of the longest palindrome centered at each position of the transformed string.
I[] = index of each position in the transformed string.
String: # b # a # n # a # n # a #
P[i]: 0 1 0 1 0 3 0 5 0 3 0 1 0
I[i]: 0 1 2 3 4 5 6 7 8 9 10 11 12
Example: the palindrome centered at 'a' (index 7) has length 5: "anana".
Now I construct an interval (i - P[i], i + P[i]) for each character based on P[i]:
b = (0,2)
a = (2,4)
n = (2,8)
a = (2,12)
n = (6,12)
a = (10,12)
Now, if I have to schedule these intervals on the time line 0 to 12 so that the minimum number of intervals is used and no time slot remains empty, I would choose (0,2) and (2,12). Hence the answer is 1, as I have broken the given string into two palindromes.
Another test case:
String: # E # A # B # A # E # A # B #
P[i]: 0 1 0 1 0 5 0 1 0 5 0 1 0 1 0
I[i]: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14
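Constructing the intervals the same way (the original post plotted them on a graph): 1 = (0,2), 2 = (2,4), 3 = (0,10), 4 = (6,8), 5 = (4,14), 6 = (10,12), 7 = (12,14).
Now, the minimum number of intervals that can be scheduled is reached by either: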
1(0,2), 2(2,4), 5(4,14) OR
3(0,10), 6(10,12), 7(12,14)
Hence we have 3 parts, so the number of cuts is 2, giving either
E|A|BAEAB
EABAE|A|B
These are just examples. I would like to know if this algorithm will work for all cases or there are some cases where it would definitely fail.
Please help me achieve a proof that it will work in every scenario.
Note: please don't discourage me if this post makes no sense, as I have put a lot of time and effort into this problem; just state a reason or provide a link from which I can move forward with this solution. Thank you.
As long as you can get a partition of the string, your algorithm will work.
Recall that a partition P of a set S is a set of non-empty subsets A1, ..., An such that:
The union of every set A1, ... An gives the set S
The intersection between any Ai, Aj (with i != j) is empty
Even though palindrome partitioning deals with strings (which are a bit different from sets), the properties of a partition still hold.
Hence, if you have a partition, you consequently have a set of time intervals without "holes" to schedule.
Choosing the partition with the minimum number of subsets gives you the minimum number of time intervals and therefore the minimum number of cuts.
Furthermore, you always have at least one palindrome partition of a string: in the worst case, you get a palindrome partition made of single characters.
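The scheduling idea can be sketched in Python. This is a minimal sketch, not the poster's code: radii and min_cuts are hypothetical names, P[] is computed by plain center expansion on the #-interleaved string (O(n²), not Manacher's algorithm), and, as an added assumption, every shrunken palindrome around a center, (i - r, i + r) for r = P[i], P[i] - 2, ..., also counts as an interval. Without those, a string like "abaca" has no interval covering just the "c", so the single-character fallback would not be available. The fewest intervals that tile 0..2n end to end are then found with a left-to-right DP.

```python
def radii(s: str) -> list[int]:
    """P[i] for the '#'-interleaved string, by expanding around every position."""
    t = "#" + "#".join(s) + "#"
    p = []
    for i in range(len(t)):
        r = 0
        while i - r - 1 >= 0 and i + r + 1 < len(t) and t[i - r - 1] == t[i + r + 1]:
            r += 1
        p.append(r)
    return p


def min_cuts(s: str) -> int:
    """Fewest intervals that tile 0..2n with no gaps or overlaps, minus one."""
    if not s:
        return 0
    p = radii(s)
    end = 2 * len(s)
    # ends_from[lo] lists every hi such that (lo, hi) is a palindrome interval
    ends_from = [[] for _ in range(end + 1)]
    for i, r_max in enumerate(p):
        # r keeps the parity of r_max, so i - r always lands on a '#' (even index)
        for r in range(r_max, 0, -2):
            ends_from[i - r].append(i + r)
    best = [None] * (end + 1)  # best[x] = fewest intervals tiling 0..x
    best[0] = 0
    for lo in range(0, end + 1, 2):
        if best[lo] is None:
            continue
        for hi in ends_from[lo]:
            if best[hi] is None or best[lo] + 1 < best[hi]:
                best[hi] = best[lo] + 1
    return best[end] - 1
```

With this sketch, radii("banana") reproduces the P[i] row from the question, and min_cuts("banana") and min_cuts("EABAEAB") give 1 and 2, matching the worked examples.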

Canonical huffman encoding algo

Hello, I am trying to implement canonical Huffman encoding, but I don't understand the Wikipedia and Google guides.
I need a more abstract explanation...
I tried this:
1. Get the list of code lengths from regular Huffman encoding, like this:
A - code: 110, length: 3.
B - code: 111, length: 3.
C - code: 10, length: 2.
D - code: 01, length: 2.
E - code: 00, length: 2.
2. I sorted the table by length and then by symbol, like this:
C - code: 10, length: 2.
D - code: 01, length: 2.
E - code: 00, length: 2.
A - code: 110, length: 3.
B - code: 111, length: 3.
Now I don't know how to proceed...
Thanks a lot
Throw out the codes you get from the Huffman algorithm. You don't need those. Just keep the lengths.
Now assign the codes based on the lengths and the symbols. Sort by length, from shortest to longest, and within each length, sort the symbols in ascending order. (How you do that exactly doesn't matter, so long as every symbol is strictly less than or greater than any other symbol, and the encoder and decoder agree on how to do it.)
So we do the ordering:
C - 2
D - 2
E - 2
A - 3
B - 3
The 2s come before the 3s, and within the 2s, C, D, E are in order, and within the 3s, A, B are in order.
Now we assign the code in integer order within each length, adding a zero bit at the end each time we go up a length:
C - 2 - 00
D - 2 - 01
E - 2 - 10
A - 3 - 110 <- after incrementing to 11, a zero was added to make 110
B - 3 - 111
That is a canonical code.
You could do it other ways if you like and still be canonical, e.g. counting backwards from 11, so long as the encoder and decoder agree on the approach. The whole point is to only have to transmit the lengths for each symbol from the encoder to the decoder, so as to not have to transmit the codes themselves which take more space.
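A minimal Python sketch of that assignment, assuming symbols are ordered by Python's string comparison (canonical_codes is a hypothetical name): sort by (length, symbol), count up in binary, and shift left by one bit for every step up in length.

```python
def canonical_codes(lengths: dict[str, int]) -> dict[str, str]:
    """Assign canonical Huffman codes from code lengths alone."""
    if not lengths:
        return {}
    # shortest length first; ties broken by symbol order (Python string comparison here)
    order = sorted(lengths, key=lambda sym: (lengths[sym], sym))
    codes = {}
    code = 0
    prev_len = lengths[order[0]]
    for sym in order:
        length = lengths[sym]
        code <<= length - prev_len  # append one 0 bit per step up in length
        codes[sym] = format(code, f"0{length}b")
        code += 1  # the next code at this length is the next integer
        prev_len = length
    return codes
```

For the lengths in the question, canonical_codes({"A": 3, "B": 3, "C": 2, "D": 2, "E": 2}) gives C 00, D 01, E 10, A 110, B 111, the same table as above. Only the lengths need to be sent; the decoder runs the same function to rebuild the codes.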
Sort the symbols by their frequency, most frequent on top and least frequent at the bottom (the frequencies sum to 1):
A (0.5)
B (0.2)
C (0.15)
D (0.15)
Then mark the two least frequent symbols with 0 and 1, sum their frequencies, insert the combined entry at the proper position in the list, and again mark the two least frequent entries with 0 and 1:
Step 1: A (0.5), B (0.2), C (0.15) marked 0, D (0.15) marked 1
Step 2: A (0.5), C&D (0.3) marked 0, B (0.2) marked 1
And again...
Step 3: A (0.5) marked 0, B&C&D (0.5) marked 1
Repeat until only the last pair is left.
The 0s and 1s marked along the path from the last pair back to each symbol give that symbol's Huffman code:
A 0
B 11
C 100
D 101
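The same merge-the-two-least-frequent procedure can be sketched with Python's heapq (huffman_codes is a hypothetical name; a counter breaks ties so the heap never compares the dicts, and at least one symbol is assumed). Which branch gets 0 and which gets 1 is arbitrary, so the exact bit patterns may differ from the table above, but the code lengths come out the same.

```python
import heapq
from itertools import count


def huffman_codes(freqs: dict[str, float]) -> dict[str, str]:
    """Repeatedly merge the two least frequent entries, prefixing 0 and 1."""
    tiebreak = count()  # keeps heap entries comparable without comparing dicts
    heap = [(w, next(tiebreak), {sym: ""}) for sym, w in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        w0, _, group0 = heapq.heappop(heap)  # least frequent: marked 0
        w1, _, group1 = heapq.heappop(heap)  # next least frequent: marked 1
        merged = {sym: "0" + code for sym, code in group0.items()}
        merged.update({sym: "1" + code for sym, code in group1.items()})
        heapq.heappush(heap, (w0 + w1, next(tiebreak), merged))
    return heap[0][2]
```

For A 0.5, B 0.2, C 0.15, D 0.15 this gives A a 1-bit code, B a 2-bit code and C and D 3-bit codes, as in the hand-worked example.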

Non-recursive Gray code algorithm understanding

This is a task from an algorithms book.
The thing is that I don't know where to start at all!
Trace the following non-recursive algorithm to generate the binary reflexive
Gray code of order 4. Start with the n-bit string of all 0’s.
For i = 1, 2, ... 2^n-1, generate the i-th bit string by flipping bit b in the
previous bit string, where b is the position of the least significant 1 in the
binary representation of i.
So I know the Gray code for 1 bit should be 0 1, for 2 bits 00 01 11 10, etc.
I have several questions:
1) Can I assume that for n = 1 I start off with 0 1?
2) How should I understand "start with the n-bit string of all 0's"?
3) "Previous bit string"? Which string is the "previous"? Previous means from lower n-bit? (for instance for n=2, previous is the one from n=1)?
4) How do I even convert 1-bit strings to 2-bit strings if the only operation there is to flip?
This confuses me a lot. The only "human" method I understand so far is: take the list for n - 1 bits, duplicate it, reverse the order of the 2nd copy, prefix every element of the 1st copy with 0 and every element of the 2nd copy with 1. Done (example: 0 1 -> 0 1 | 0 1 -> 0 1 | 1 0 -> 00 01 | 11 10 -> 00 01 11 10).
Thanks for any help
The answer to all four of your questions is that this algorithm does not start with lower values of n. All strings it generates have the same length, and the i-th string (for i = 1, ..., 2^n - 1) is generated from the (i-1)-th one.
Here are the first few steps for n = 4:
Start with G0 = 0000
To generate G1, flip 0-th bit in G0, as 0 is the position of the least significant 1 in the binary representation of 1 = 0001b. G1 = 0001.
To generate G2, flip 1-st bit in G1, as 1 is the position of the least significant 1 in the binary representation of 2 = 0010b. G2 = 0011.
To generate G3, flip 0-th bit in G2, as 0 is the position of the least significant 1 in the binary representation of 3 = 0011b. G3 = 0010.
To generate G4, flip 2-nd bit in G3, as 2 is the position of the least significant 1 in the binary representation of 4 = 0100b. G4 = 0110.
To generate G5, flip 0-th bit in G4, as 0 is the position of the least significant 1 in the binary representation of 5 = 0101b. G5 = 0111.
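The same procedure as a short Python sketch (gray_sequence is a hypothetical name). The expression i & -i isolates the least significant 1 bit of i, so its bit_length() minus one is the position b to flip; no shorter Gray codes are ever built.

```python
def gray_sequence(n: int) -> list[str]:
    """Non-recursive binary reflected Gray code, as described in the exercise."""
    g = 0  # start with the n-bit string of all 0's
    out = [format(g, f"0{n}b")]
    for i in range(1, 2 ** n):
        # position b of the least significant 1 in the binary representation of i
        b = (i & -i).bit_length() - 1
        g ^= 1 << b  # flip bit b of the previous string
        out.append(format(g, f"0{n}b"))
    return out
```

gray_sequence(4) starts 0000, 0001, 0011, 0010, 0110, 0111, matching G0 to G5 above, and gray_sequence(2) is 00, 01, 11, 10.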

Algorithm in hardware to find out if number is divisible by five

I am trying to think of an algorithm to implement this for a given n bit binary number. I tried out many examples, but am unable to find out any pattern. So how shall I proceed?
How about this:
Convert the number to base 4 (this is trivial: simply combine pairs of bits). 5 in base 4 is 11. The base-4 values that are divisible by 11 look somewhat familiar: 11, 22, 33, 110, 121, 132, 203, ...
The rule for divisibility by 11 is that you add the digits in odd positions and the digits in even positions separately and subtract one sum from the other. If the result is divisible by 11 (which, remember, is 5), then the original number is divisible by 11 (which, remember, is 5).
For example:
123456d = 1 1110 0010 0100 0000b = 132021000_4
The even digits are 1 2 2 0 0 : sum = 5d
The odd digits are 3 0 1 0 : sum = 4d
Difference is 1, which is not divisible by 5
Or another one:
123455d = 1 1110 0010 0011 1111b = 132020333_4
The even digits are 1 2 2 3 3 : sum = 11d
The odd digits are 3 0 0 3 : sum = 6d
Difference is 5, which is divisible by 5, so 123455 is too
This should have a fairly efficient HW implementation because it's mostly bit-slicing, followed by N/2 adders, where N is the number of bits in the number you're interested in.
Note that after adding the digits and subtracting, the maximum value is 3/4 * N, so if you have 16-bit numbers max, you can get at most 12 as a result, so you only need to check for 0, ±5 and ±10 explicitly. If you're using 32-bit numbers then you can get at most 24 as a result, so you need to also check if the result is ±15 or ±20.
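A software model of the base-4 rule (not an HDL design; divisible_by_5_base4 is a hypothetical name, and x is assumed non-negative). It takes the base-4 digits from the least significant end and alternates their signs, which works because 4 ≡ -1 (mod 5); counting positions from the left instead can only flip the sign of the difference, which does not affect divisibility.

```python
def divisible_by_5_base4(x: int) -> bool:
    """Alternating sum of base-4 digits of a non-negative x has the same remainder mod 5 as x."""
    diff = 0
    sign = 1
    while x:
        diff += sign * (x & 3)  # x & 3 is the lowest base-4 digit (a pair of bits)
        sign = -sign
        x >>= 2
    return diff % 5 == 0
```

On the two examples above, 123456 gives a difference of 1 (not divisible) and 123455 gives 5 (divisible).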
Make a Deterministic Finite Automaton (DFA) to implement the divisibility check and implement the DFA in hardware.
Creating a DFA for divisibility by 5 is easy. You just need to track the remainder r so far and check what 2r (mod 5) and 2r + 1 (mod 5) map to. There are many websites that discuss this. For example this one.
There are well-known ways to convert a DFA to a hardware representation as well.
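Well, I just figured this out:
number mod 5 = (a0 * 2^0 + a1 * 2^1 + a2 * 2^2 + a3 * 2^3 + a4 * 2^4 + ...) mod 5
= (a0 (1) + a1 (2) + a2 (-1) + a3 (-2) + a4 (1) + ...) mod 5, and the pattern 1, 2, -1, -2 repeats ...
Hence the number is divisible by 5 exactly when (difference of odd digits) + 2 * (difference of even digits) is divisible by 5, where the odd digits are a0, a2, a4, ... (positions 1, 3, 5, ... counting from the right), the even digits are a1, a3, a5, ..., and each difference is the alternating sum, e.g. a0 - a2 + a4 - ...
For example, consider 110010 (= 50):
odd digits difference = 0 - 0 + 1 = 1, or 01
even digits difference = 1 - 0 + 1 = 2, or 10
difference of odd digits + 2 * difference of even digits = 01 + 2 * (10) = 01 + 100 = 101 (= 5), which is divisible by 5.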
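A small Python sketch of this rule (alternating_bit_sums and divisible_by_5_bits are hypothetical names; x is assumed non-negative). It returns the two alternating sums separately so they can be compared with the worked example.

```python
def alternating_bit_sums(x: int) -> tuple[int, int]:
    """Return (a0 - a2 + a4 - ..., a1 - a3 + a5 - ...) for bits a0, a1, ... of x, a0 least significant."""
    sums = [0, 0]
    k = 0
    while x:
        if x & 1:
            # a0, a1 get +; a2, a3 get -; a4, a5 get +; ...
            sums[k % 2] += 1 if (k // 2) % 2 == 0 else -1
        x >>= 1
        k += 1
    return sums[0], sums[1]


def divisible_by_5_bits(x: int) -> bool:
    odd_diff, even_diff = alternating_bit_sums(x)
    # weights 1, 2, -1, -2 repeat because 2**4 = 16 ≡ 1 (mod 5)
    return (odd_diff + 2 * even_diff) % 5 == 0
```

For 110010, alternating_bit_sums returns (1, 2), the same two differences as in the example.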
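The contribution of each bit toward divisibility by five follows a four-bit pattern, 3421 (the values of 8, 4, 2 and 1 mod 5).
You could shift through any binary number 4 bits at a time, adding the corresponding value for each 1 bit; the number is divisible by 5 exactly when the total is.
Example:
100011 (= 35)
take the low four bits: 0011
apply the pattern: 0021
sum = 3
next four bits: 0010
apply the pattern: 0020
sum = 3 + 2 = 5, which is divisible by 5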
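A Python sketch of the nibble walk (nibble_weight_sum is a hypothetical name; x is assumed non-negative). Because 16 ≡ 1 (mod 5), every nibble reuses the same weights, so the total has the same remainder mod 5 as the number.

```python
# 2**k mod 5 for bit k = 0..3 of a nibble; read from bit 3 down to bit 0 this is "3421"
NIBBLE_WEIGHTS = (1, 2, 4, 3)


def nibble_weight_sum(x: int) -> int:
    """Sum the weight of every 1 bit, four bits at a time."""
    total = 0
    while x:
        nibble = x & 0xF
        for k in range(4):
            if (nibble >> k) & 1:
                total += NIBBLE_WEIGHTS[k]
        x >>= 4
    return total
```

nibble_weight_sum(0b100011) returns 5, as in the example, and a number is divisible by 5 when the returned total is.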
We can design a Deterministic Finite Automaton (DFA) for the same check; the DFA can then be implemented in hardware. This is similar to this answer.
We will simulate a Deterministic Finite Automaton (DFA) that accepts the binary representations of integers that are divisible by 5.
By accept, we mean that when we are done scanning the string, we should be in one of the final states.
Approach to Design DFA : Essentially, we need to divide the Binary Representation of Integer by 5, and track the remainder. If after consuming/scanning [From Left to Right] the entire string, remainder is Zero, then we should end up in Final State, and if remainder isn't zero we should be in Non-Final States.
Now, DFA is defined by Quintuple/5-Tuple (Q,q₀,F,Σ,δ). We will obtain these five components step-by-step.
Q : Finite Set of States
We need to track the remainder. On dividing any integer by 5, we can get remainder 0, 1, 2, 3 or 4. Hence, we will have five states, Z, O, T, Th and F, one for each possible remainder.
Q={Z, O, T, Th, F}
If after scanning a certain part of the binary string we are in state Z, the integer defined by the bits from the left up to that point gives remainder zero when divided by 5. Similarly, O means remainder one, and so on.
Now, we can write these five states using the Euclidean division algorithm as
Z : 5m
O : 5m+1
T : 5m+2
Th : 5m+3
F : 5m+4
where m is Integer.
q₀ : an initial/start state from set Q
Now, the start state can be thought of in terms of the empty string (ɛ): ɛ leads directly to q₀.
What remainder does ɛ give when divided by 5?
We can prepend any number of 0s to a binary number without changing its value. In the same way, we can prepend ɛ to a binary string. Thus, ɛ on the left can be thought of as 0, and 0 divided by 5 gives remainder 0. Hence, ɛ should end in state Z. But ɛ ends up in q₀.
Thus, q₀=Z
F : a set of accept states
Now we want all strings which are divisible by 5, or which gives remainder 0 when divided by 5, or which after complete scanning should end up in state Z, and gets accepted.
Hence,
F={Z}
Σ : Alphabet (a finite set of input symbols)
Since we are scanning/reading a binary string,
Σ={0,1}
δ : Transition Function (δ : Q × Σ → Q)
Now this δ tells us that if we are in state x (in Q) and next input to be scanned is y (in Σ), then at which state z (in Q) should we go.
For example, if the string up to this point gives remainder 3 (state Th) when divided by 5 and we append 1, what remainder will the resulting string give?
Now, this can be analyzed by observing how magnitude of a binary string changes on appending 0 and 1.
a.
In decimal (base 10), if we append 0, the magnitude gets multiplied by 10: 53, on appending 0, becomes 530.
Also, if we append 8 to a decimal number, the magnitude gets multiplied by 10 and then 8 is added to the result.
b.
In Binary (Base-2), if we add/append 0, then magnitude gets multiplied by 2 (The Positional Weight of each Bit get multiplied by 2)
Example : (1010)2 [which is (10)10], on appending 0 it becomes (10100)2 [which is (20)10]
Similarly, In Binary, if we append 1, then Magnitude gets multiplied by 2, and then we add 1.
Example : (10)2 [which is (2)10], on appending 1 it becomes (101)2 [which is (5)10]
Thus, for a binary string x with magnitude |x|:
|x0| = 2|x|
|x1| = 2|x| + 1
We will use these relations to analyze the five states.
Any string in Z can be written as 5m
- On 0, it becomes 2(5m), which is 5(2m), nothing but state Z.
- On 1, it becomes 2(5m)+1, which is 5(2m)+1, that is O. [This can be read as if a Binary String is presently divisible by 5, and we append 1, then resultant string will give remainder as 1]
Any string in O can be written as 5m+1
- On 0, it becomes 2(5m+1) = 10m+2, which is 5(2m)+2, state T.
- On 1, it becomes 2(5m+1)+1 = 10m+3, which is 5(2m)+3, that is state Th.
Any string in T can be written as 5m+2
- On 0, it becomes 2(5m+2) = 10m+4, which is 5(2m)+4, state F.
- On 1, it becomes 2(5m+2)+1 = 10m+5, which is 5(2m+1), state Z. [If m is integer, so is (2m+1)]
Any string in Th can be written as 5m+3
- On 0, it becomes 2(5m+3) = 10m+6, which is 5(2m+1)+1, state O.
- On 1, it becomes 2(5m+3)+1 = 10m+7, which is 5(2m+1)+2, that is state T.
Any string in F can be written as 5m+4
- On 0, it becomes 2(5m+4) = 10m+8, which is 5(2m+1)+3, state Th.
- On 1, it becomes 2(5m+4)+1 = 10m+9, which is 5(2m+1)+4, that is state F.
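Hence, combining everything gives the final DFA (drawn with a tool in the original post). Its transitions are: Z: 0 -> Z, 1 -> O; O: 0 -> T, 1 -> Th; T: 0 -> F, 1 -> Z; Th: 0 -> O, 1 -> T; F: 0 -> Th, 1 -> F.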
We can even write code [in High Level Language] for the same. But it would go beyond main aim of this question. If readers wish to see the same, they can check here.
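A minimal Python simulation of the five-state DFA derived above (TRANSITIONS and accepts are hypothetical names). It reads the bits from left to right, starts in q₀ = Z and accepts when it ends in Z, the only final state.

```python
# delta: state -> {input bit: next state}, one state per remainder mod 5
TRANSITIONS = {
    "Z": {"0": "Z", "1": "O"},    # remainder 0
    "O": {"0": "T", "1": "Th"},   # remainder 1
    "T": {"0": "F", "1": "Z"},    # remainder 2
    "Th": {"0": "O", "1": "T"},   # remainder 3
    "F": {"0": "Th", "1": "F"},   # remainder 4
}


def accepts(bits: str) -> bool:
    """True if the binary string (most significant bit first) is divisible by 5."""
    state = "Z"  # q0 = Z: the empty string behaves like 0
    for b in bits:
        state = TRANSITIONS[state][b]
    return state == "Z"  # F = {Z}
```

For example, accepts("101") is True (5) and accepts("111") is False (7). In hardware the same table becomes a 3-bit state register plus next-state logic.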
As any assignment this would have been an answer for is bound to be way overdue a year later:
in the binary representation of a natural divisible by five the parities of bits 4n and 4n+2 equal, as well as those for bits 4n+1 and 4n+3.
(This is entirely equivalent to the answers of JoshG79, notsogeek, or james: 4≡-1(mod 5), 3≡-2(mod 5) (with reduced hand-waving about recursion in argumentation, and no dispensable handling of carries in circuitry))

How is the priority chosen while constructing the Huffman Tree?

Suppose my characters and their frequencies are as follows:
Char Freq.
a 1
b 2
c 3
d 4
e 5
f 6
g 7
h 8
When constructing a tree, at step 2 we have this:
[3](a:1 + b:2)  [3] c  [4] d  [5] e  [6] f  [7] g  [8] h
Now, since we have two 3s, how do we decide which of them has priority?
In Huffman coding this is treated as:
[3] c  [3](a:1 + b:2)  [4] d  [5] e  [6] f  [7] g  [8] h
Why?
What's the difference? Ignoring d through h for the moment, in the first case you'd get
a = 00
b = 01
c = 1
and in the second case,
a = 10
b = 11
c = 0
As long as c is at the same height in the final tree, its code will have the same length.
I would take c with bigger priority (shorter code). This would be in line with the basic principle of Huffman trees: priority/shorter code for immediate results and lower priority for more parsing.
Your case is not interesting. The assignment of 0s and 1s to the branches is arbitrary, so the choice you outline results in the same code, i.e. the same code lengths, either way.
There are however interesting cases where you face a choice of three or more groups with the same total frequency and different shapes. Any choice will result in the same overall optimality, i.e. exactly the same number of total bits to encode the provided symbols at the provided frequencies. However the choices can result in different shape trees with different combinations of bit lengths. Then such a choice can be made to arrive at deeper or shallower trees, depending on what is desired.
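To check the optimality claim on the a-h frequencies from the question, here is a Python sketch that builds the tree twice, once breaking weight ties in favor of single characters and once in favor of merged nodes. The leaves_first flag and the rank field in the heap entries are assumptions of this sketch, not part of any standard description of the algorithm. Both runs should give the same total number of encoded bits (sum of frequency × code length).

```python
import heapq
from itertools import count


def huffman_lengths(freqs: dict[str, int], leaves_first: bool) -> dict[str, int]:
    """Code length per symbol; rank decides which of two equal weights is popped first."""
    seq = count()  # final tiebreak so the dicts are never compared
    leaf_rank, node_rank = (0, 1) if leaves_first else (1, 0)
    heap = [(w, leaf_rank, next(seq), {s: 0}) for s, w in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        w0, _, _, d0 = heapq.heappop(heap)
        w1, _, _, d1 = heapq.heappop(heap)
        # every symbol under the new node moves one level deeper
        merged = {s: d + 1 for s, d in {**d0, **d1}.items()}
        heapq.heappush(heap, (w0 + w1, node_rank, next(seq), merged))
    return heap[0][3]


freqs = dict(zip("abcdefgh", range(1, 9)))
for leaves_first in (True, False):
    lengths = huffman_lengths(freqs, leaves_first)
    print(leaves_first, sum(freqs[s] * n for s, n in lengths.items()))
```

For this input both tie-breaking choices print the same total, 102 bits; inputs with three or more tied groups of different shapes are where the resulting code lengths themselves can differ.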
