I'm trying to solve a Huffman coding problem, but I'm not sure I fully understand the topic. I am trying to figure out whether the following is a valid Huffman code:
A: 0
B: 01
C: 11
D: 110
E: 111
What I'm thinking is that it is not valid, because A, or 0, would infringe on B, or 01. I'm not positive, though. Could someone enlighten me on this?
Edit: I'm sorry I meant to type A as 0 and not 1.
No. A Huffman code is a prefix code, which means that no code can be a prefix of any other code. In your example, A is a prefix of B, and C is a prefix of both D and E.
A valid prefix code would be:
A: 0
B: 10
C: 11
That's as far as you can go with codes of length 1, 2, and 2. Any other code would have one of those as a prefix. It is not possible to have a prefix code with lengths 1, 2, 2, 3, and 3.
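The claim about lengths can be checked with Kraft's inequality: a binary prefix code with codeword lengths l_1, ..., l_n exists only if the sum of 2^(-l_i) is at most 1. The sketch below is only an illustration; the function name `kraft_sum` is made up for this example.

```python
from fractions import Fraction

def kraft_sum(lengths):
    """Sum of 2**(-length) over all codeword lengths (exact arithmetic)."""
    return sum(Fraction(1, 2 ** n) for n in lengths)

# Lengths 1, 2, 2 already use up the whole budget of 1.
print(kraft_sum([1, 2, 2]))        # 1
# Adding two codes of length 3 overshoots, so no prefix code exists.
print(kraft_sum([1, 2, 2, 3, 3]))  # 5/4
```

A sum of exactly 1 means the code is complete: no further codeword can be added without breaking the prefix property.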
This is a valid prefix code for five symbols:
A: 0
B: 10
C: 110
D: 1110
E: 1111
as is this:
A: 00
B: 01
C: 10
D: 110
E: 111
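To check a candidate code mechanically, one possible sketch is shown below. It relies on the fact that, after sorting, any codeword that is a prefix of another is immediately followed by a word that starts with it. The name `is_prefix_free` is an assumption for illustration.

```python
def is_prefix_free(codes):
    """Return True if no codeword is a prefix of another codeword."""
    words = sorted(codes)
    # In sorted order, a prefix is always directly followed by a word
    # that starts with it, so comparing neighbours is enough.
    return not any(b.startswith(a) for a, b in zip(words, words[1:]))

print(is_prefix_free(["0", "01", "11", "110", "111"]))    # False (the code from the question)
print(is_prefix_free(["0", "10", "110", "1110", "1111"]))  # True
print(is_prefix_free(["00", "01", "10", "110", "111"]))    # True
```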
Consider Microsoft Excel's column-numbering system. Columns are "numbered" A, B, C, ... , Y, Z, AA, AB, AC, ... where A is 1.
The column system is similar to the base-10 numbering system that we're familiar with in that when any digit has its maximum value and is incremented, its value is set to the lowest possible digit value and the digit to its left is incremented, or a new digit is added at the minimum value. The difference is that there isn't a digit that represents zero in the letter numbering system. So if the "digit alphabet" contained ABC or 123, we could count like this:
(base 3 with zeros added for comparison)

base 3 no 0    base 3 with 0    base 10 with 0
-----------    -------------    --------------
  -     -            0                 0
  A     1            1                 1
  B     2            2                 2
  C     3           10                 3
 AA    11           11                 4
 AB    12           12                 5
 AC    13           20                 6
 BA    21           21                 7
 BB    22           22                 8
 BC    23          100                 9
 CA    31          101                10
 CB    32          102                11
 CC    33          110                12
AAA   111          111                13
Converting from the zeroless system to our base 10 system is fairly simple; it's still a matter of multiplying the power of that space by the value in that space and adding it to the total. So in the case of AAA with the alphabet ABC, it's equivalent to (1*3^2) + (1*3^1) + (1*3^0) = 9 + 3 + 1 = 13.
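A minimal sketch of that direction (zeroless letters to an ordinary integer), assuming the digit alphabet is passed as a string whose first character has value 1. The function name `from_zeroless` is made up for this example.

```python
import string

def from_zeroless(text, alphabet="ABC"):
    """Convert a zeroless (bijective) numeral to an int; alphabet[0] is worth 1."""
    base = len(alphabet)
    value = 0
    for ch in text:
        # Horner's rule: shift what we have by one position, then add the digit.
        value = value * base + alphabet.index(ch) + 1
    return value

print(from_zeroless("AAA"))                          # 13
print(from_zeroless("CA"))                           # 10
print(from_zeroless("AA", string.ascii_uppercase))   # 27 (Excel column AA)
```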
I'm having trouble converting inversely, though. With a zero-based system, you can use a greedy algorithm moving from largest to smallest digit and grabbing whatever fits. This will not work for a zeroless system, however. For example, converting the base-10 number 10 to the base-3 zeroless system: Though 9 (the third digit slot: 3^2) would fit into 10, this would leave no possible configuration of the final two digits since their minimum values are 1*3^1 = 3 and 1*3^0 = 1 respectively.
Realistically, my digit alphabet will contain A-Z, so I'm looking for a quick, generalized conversion method that can do this without trial and error or counting up from zero.
Edit
The accepted answer by n.m. is primarily a string-manipulation-based solution.
For a purely mathematical solution see kennytm's links:
What is the algorithm to convert an Excel Column Letter into its Number?
How to convert a column number (eg. 127) into an excel column (eg. AA)
Convert to base-3-with-zeroes first (digits 0AB), and from there, convert to base-3-without-zeroes (ABC), using these string substitutions:
A0 => 0C
B0 => AC
C0 => BC
Each substitution either removes a zero, or pushes one to the left. In the end, discard leading zeroes.
It is also possible, as an optimisation, to process longer strings of zeros at once:
A000...000 = 0BBB...BBC
B000...000 = ABBB...BBC
C000...000 = BBBB...BBC
Generalizable to any base.
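The substitution rules amount to "borrowing" from the digit on the left: a zero becomes the top digit and its left neighbour loses one. A sketch of that idea on lists of digit values rather than strings is shown below; `to_zeroless` and its parameters are names chosen for illustration.

```python
import string

def to_zeroless(n, alphabet="ABC"):
    """Convert a positive int to a zeroless (bijective) numeral."""
    if n <= 0:
        raise ValueError("zeroless numerals cannot represent values below 1")
    base = len(alphabet)
    # Step 1: ordinary base-b digits, most significant first.
    digits = []
    while n:
        n, r = divmod(n, base)
        digits.append(r)
    digits.reverse()
    # Step 2: scan right to left; a digit of 0 (or -1 after lending)
    # borrows one unit from its left neighbour, i.e. "X0 => (X-1)(top digit)".
    for i in range(len(digits) - 1, 0, -1):
        if digits[i] <= 0:
            digits[i] += base
            digits[i - 1] -= 1
    # Step 3: discard a leading zero left over from borrowing.
    while digits and digits[0] == 0:
        digits.pop(0)
    return "".join(alphabet[d - 1] for d in digits)

print(to_zeroless(10))                            # CA
print(to_zeroless(13))                            # AAA
print(to_zeroless(702, string.ascii_uppercase))   # ZZ
```

Each digit is visited once, so the conversion needs no trial and error and no counting up from zero.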
Hello, I am trying to implement canonical Huffman encoding, but I don't understand the wiki and Google guides.
I need a more abstract explanation...
I tried this:
1. Get the list of code lengths from the regular Huffman encoding, like this:
A - code: 110, length: 3.
B - code: 111, length: 3.
C - code: 10, length 2.
D - code: 01, length 2.
E - code: 00, length 2.
I sorted the table by symbol and length, like this:
C - code: 10, length 2.
D - code: 01, length 2.
E - code: 00, length 2.
A - code: 110, length: 3.
B - code: 111, length: 3.
Now I don't know how to proceed...
Thanks a lot.
Throw out the codes you get from the Huffman algorithm. You don't need those. Just keep the lengths.
Now assign the codes based on the lengths and the symbols. Sort by length, from shortest to longest, and within each length, sort the symbols in ascending order. (How you do that exactly doesn't matter, so long as every symbol is strictly less than or greater than any other symbol, and the encoder and decoder agree on how to do it.)
So we do the ordering:
C - 2
D - 2
E - 2
A - 3
B - 3
Two's come before three's, and within the 2's, C, D, E are in order, and within the 3's, A, B are in order.
Now we assign the code in integer order within each length, adding a zero bit at the end each time we go up a length:
C - 2 - 00
D - 2 - 01
E - 2 - 10
A - 3 - 110 <- after incrementing to 11, a zero was added to make 110
B - 3 - 111
That is a canonical code.
You could do it other ways if you like and still be canonical, e.g. counting backwards from 11, so long as the encoder and decoder agree on the approach. The whole point is to only have to transmit the lengths for each symbol from the encoder to the decoder, so as to not have to transmit the codes themselves which take more space.
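The assignment described above can be sketched as follows, assuming the lengths arrive as a dict from symbol to code length (at least 1) and that symbols compare with the ordinary `<` ordering. The name `canonical_codes` is made up for this example.

```python
def canonical_codes(lengths):
    """Assign canonical codes from a {symbol: code length} mapping."""
    codes = {}
    code = 0
    previous_length = 0
    # Sort by length first, then by symbol within each length.
    for symbol, length in sorted(lengths.items(), key=lambda kv: (kv[1], kv[0])):
        # Going up a length appends zero bits to the running code value.
        code <<= length - previous_length
        codes[symbol] = format(code, "0{}b".format(length))
        code += 1
        previous_length = length
    return codes

print(canonical_codes({"A": 3, "B": 3, "C": 2, "D": 2, "E": 2}))
# {'C': '00', 'D': '01', 'E': '10', 'A': '110', 'B': '111'}
```

The decoder can run the same function on the transmitted lengths and arrive at the same table.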
You should sort the symbols by their frequency, so that the most frequent is on top and the least frequent is on the bottom. (Overall frequency = 1):
A (0.5)
B (0.2)
C (0.15)
D (0.15)
Then mark one of the two least frequent symbols with 0 and the other with 1, sum their frequencies, insert the combined entry into its proper position in the list, and again mark the two least frequent entries with 0 and 1:
A (0.5)           A (0.5)
B (0.2)           C&D (0.3) 0
C (0.15) 0        B (0.2)   1
D (0.15) 1
And again...
A (0.5)           A (0.5)           A (0.5)     0
B (0.2)           C&D (0.3) 0       B&C&D (0.5) 1
C (0.15) 0        B (0.2)   1
D (0.15) 1
Until you obtain the last pair.
The path marked by 0s and 1s from the tail to a symbol is the corresponding Huffman code:
A 0
B 11
C 100
D 101
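A sketch of the same merging procedure using a priority queue instead of a re-sorted list. It returns only the code lengths, because the exact bits depend on how ties and the 0/1 labels are chosen. The name `huffman_code_lengths` and the integer weights are assumptions for this example.

```python
import heapq
from itertools import count

def huffman_code_lengths(weights):
    """Return {symbol: code length} for a {symbol: weight} mapping."""
    order = count()  # tie-breaker so the heap never has to compare symbol lists
    heap = [(w, next(order), [sym]) for sym, w in weights.items()]
    heapq.heapify(heap)
    lengths = {sym: 0 for sym in weights}
    while len(heap) > 1:
        # Take the two least frequent entries and merge them.
        w1, _, group1 = heapq.heappop(heap)
        w2, _, group2 = heapq.heappop(heap)
        for sym in group1 + group2:
            lengths[sym] += 1  # every merge adds one bit to each member's code
        heapq.heappush(heap, (w1 + w2, next(order), group1 + group2))
    return lengths

# Weights in percent, matching the frequencies 0.5, 0.2, 0.15, 0.15 above.
print(huffman_code_lengths({"A": 50, "B": 20, "C": 15, "D": 15}))
# {'A': 1, 'B': 2, 'C': 3, 'D': 3}
```

The lengths 1, 2, 3, 3 agree with the codes 0, 11, 100, 101 obtained by hand. With a single symbol the sketch returns length 0, which a real encoder would need to treat as a special case.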
We have a generated list:
1. 003
2. 012
3. 021
4. 030
5. 102
6. 111
7. 120
8. 201
9. 210
10. 300
(numbers are from 0 to 3 and their sum is 3)
How can we find the position of a combination in the list without counting them?
Ex. 201 -> index=8
Thanks in advance.
If the digits of your number are ABC, then the index is:
ndx = A * (8 - A + 1) / 2 + B + 1;
For example, for the value ABC=201, we will have:
ndx = 2 * (8 - 2 + 1) / 2 + 0 + 1 = 8;
Indeed, the value 201 has index 8.
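The formula can be checked against the full list by brute force. In the sketch below, the parameter `k` (the digit sum, 3 in the question) is a generalization not stated in the answer: the number of entries whose first digit is smaller than A is A*(2k + 3 - A)/2, which reduces to A*(9 - A)/2 for k = 3. The name `index_of` is made up for this example.

```python
from itertools import product

def index_of(a, b, k=3):
    """1-based position of the three-digit entry (a, b, k - a - b) in the sorted list."""
    # a * (2k + 3 - a) is always even, so integer division is exact.
    return a * (2 * k + 3 - a) // 2 + b + 1

# Brute-force check against the list from the question.
entries = [d for d in product(range(4), repeat=3) if sum(d) == 3]
for position, (a, b, c) in enumerate(entries, start=1):
    assert index_of(a, b) == position

print(index_of(2, 0))  # 8
```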
Not a complete answer, but I think this is a good start.
If you view each digit as two binary digits, you get:
1. 003 00 00 11
2. 012 00 01 10
3. 021 00 10 01
4. 030 00 11 00
5. 102 01 00 10
6. 111 01 01 01
7. 120 01 10 00
8. 201 10 00 01
9. 210 10 01 00
10. 300 11 00 00
If you ignore the right hand column of digits, then the first seven items (values 003 through 120) are the binary representations of the numbers 0 through 6.
The next two items have values 8 and 9, and the last is 12.
So, we can convert the number to a rough index with:
ix = 4*first_digit + second_digit
And then adjust:
if (first_digit < 2)
    ix = ix + 1
else if (first_digit == 3)
    ix = ix - 2
I'm not happy with the conditional there. Is there a mathematical way to make this translation:
0 => 1
1 => 1
2 => 0
3 => -2
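One closed form that reproduces this table is 1 - d(d - 1)/2, where d is the first digit. The sketch below only covers the three-digit list with digit sum 3; the function names are made up for this example.

```python
def adjustment(first_digit):
    """Map 0, 1, 2, 3 to 1, 1, 0, -2 without a conditional."""
    return 1 - first_digit * (first_digit - 1) // 2

def index_from_digits(first_digit, second_digit):
    """Rough index 4*first + second, corrected by the adjustment."""
    return 4 * first_digit + second_digit + adjustment(first_digit)

print([adjustment(d) for d in range(4)])  # [1, 1, 0, -2]
print(index_from_digits(2, 0))            # 8
```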
Right, the comments I have been making under the question assume you want to go directly from the current value to the index, without performing a search. That is to say, making some inspection of the digits of the entry and translating that to a 1-indexed number.
Note: this answer is directional and incomplete; it just shows the way I would approach the problem.
Looking at your example, if we treat each entry as composed of 3 digits, (z_i, y_i, x_i), then you get the following sequences:
003; z=0, y=0, x=3
012; z=0, y=1, x=2
021; z=0, y=2, x=1
030; z=0, y=3, x=0
102; z=1, y=0, x=2
111; z=1, y=1, x=1
120; z=1, y=2, x=0
201; z=2, y=0, x=1
210; z=2, y=1, x=0
300; z=3, y=0, x=0
If the max digit is k (=3), then:
x_i = 3, 2, 1, 0, 2, 1, 0, 1, 0, 0 = k, k-1, ..., 0, k-1, ... 0, ......, 0
y_i = 0, 1, 2, 3, 0, 1, 2, 0, 1, 0 = 0, 1, ..., k, 0, ..., k-1, ......, 0
z_i = 0, 0, 0, 0, 1, 1, 1, 2, 2, 3 = k+1 x 0, k x 1, ......., 1 x k
As you can see, the y_i digit goes up in sequence repetitively, knocking the z_i up at the end of each completion.
If you had more digits, the pattern gets more complicated, but still follows a similar pattern.
For k=4:
0004
0013
0022
0031
0040
0103
0112
0121
0130
0202
0211
0220
0301
0310
0400
1003
1012
1021
1030
1102
1111
1120
1201
1210
1300
2002
2011
2020
2101
2110
2200
3001
3010
3100
4000
The total number of entries can be seen from the first or last column: in the case of k=4, it is the sum of the first k+1 triangular numbers (15 + 10 + 6 + 3 + 1 = 35). For k=3, it's just the triangular number of k+1 (10).
I haven't worked it out, but that pattern might indicate successive summations as the number of digits increases.
There is a pattern still, for k=3, k=4, k=5, and in general for the total number of entries in the sequence of length k. (The formulas themselves are missing here.)
This knowledge helps give us a hand in finding the scalars for the first digit, and the rest of the problem is effectively a sub problem for k-1. Defeating me at the moment...
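One way to finish this line of thought, offered as a sketch rather than as the answerer's intended solution: for each position, count how many entries have a smaller digit there, using the stars-and-bars count for the remaining positions. The names `count_fillings` and `rank` are made up for this example, and `math.comb` needs Python 3.8 or later.

```python
from math import comb

def count_fillings(total, slots):
    """Ways to write `total` as an ordered sum of `slots` non-negative integers."""
    if slots == 0:
        return 1 if total == 0 else 0
    return comb(total + slots - 1, slots - 1)

def rank(digits):
    """1-based position of `digits` among all digit tuples of the same
    length and the same digit sum, listed in ascending order."""
    remaining = sum(digits)
    index = 1
    for position, digit in enumerate(digits):
        slots_left = len(digits) - position - 1
        # Every entry with a smaller digit here (same prefix) comes earlier.
        for smaller in range(digit):
            index += count_fillings(remaining - smaller, slots_left)
        remaining -= digit
    return index

print(rank([2, 0, 1]))     # 8
print(rank([1, 0, 0, 3]))  # 16
print(rank([4, 0, 0, 0]))  # 35
```

The first digit contributes the "scalars" mentioned above, and each later digit is handled as the same problem with one digit fewer.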
Use Collections.binarySearch.
Check http://docs.oracle.com/javase/7/docs/api/java/util/Collections.html#binarySearch(java.util.List, T)
Is the code C = {00, 11, 0101, 111, 1010, 100100, 0110} uniquely decodeable?
My answer is no, because according to Sardinas–Patterson algorithm:
C1 = {1}
C2 = {1, 11, 010, 00100}
So C2 AND C = {11}, so C is not a uniquely decodable code.
Am I right about this?
You are correct that this code is not uniquely decodable.
Consider the string 111111: it can be parsed as 11 11 11 or as 111 111.
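For checking other codes, here is a compact sketch of the Sardinas–Patterson test. It repeatedly computes the sets of dangling suffixes and stops as soon as one of them contains a codeword, or when a set repeats. The function names are made up for this example.

```python
def dangling_suffixes(prefixes, words):
    """Non-empty remainders w[len(p):] for every p that is a proper prefix of w."""
    return {w[len(p):] for p in prefixes for w in words
            if w != p and w.startswith(p)}

def is_uniquely_decodable(code):
    """Sardinas-Patterson test for a finite set of codewords."""
    code = set(code)
    current = dangling_suffixes(code, code)  # this is C1
    seen = set()
    while current:
        if current & code:
            return False  # a dangling suffix is itself a codeword
        if frozenset(current) in seen:
            return True   # the sets started repeating without a hit
        seen.add(frozenset(current))
        current = dangling_suffixes(code, current) | dangling_suffixes(current, code)
    return True

C = ["00", "11", "0101", "111", "1010", "100100", "0110"]
print(is_uniquely_decodable(C))                  # False
print(is_uniquely_decodable(["0", "01", "11"]))  # True, although it is not a prefix code
```

The loop terminates because every dangling suffix is a suffix of some codeword, so only finitely many distinct sets can occur.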
I'm looking for a hint to an algorithm or pseudo code which helps me calculate sequences.
It's kind of permutations, but not exactly as it's not fixed length.
The output sequence should look something like this:
A
B
C
D
AA
BA
CA
DA
AB
BB
CB
DB
AC
BC
CC
DC
AD
BD
CD
DD
AAA
BAA
CAA
DAA
...
Every character above represents actually an integer, which gets incremented from a minimum to a maximum.
I do not know the depth when I start, so just using multiple nested for loops won't work.
It's late here in Germany and I just can't wrap my head around this. Pretty sure that it can be done with for loops and recursion, but I have currently no clue on how to get started.
Any ideas?
EDIT: B-typo corrected.
It looks like you're taking all combinations of four distinct digits of length 1, 2, 3, etc., allowing repeats.
So start with length 1: { A, B, C, D }
To get length 2, prepend A, B, C, D in turn to every member of length 1. (16 elements)
To get length 3, prepend A, B, C, D in turn to every member of length 2. (64 elements)
To get length 4, prepend A, B, C, D in turn to every member of length 3. (256 elements)
And so on.
If you have more or fewer digits, the same method will work. It gets a little trickier if you allow, say, A to equal B, but that doesn't look like what you're doing now.
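The prepending steps can be sketched as a generator that keeps only the previous length in memory. The name `sequences` and the `max_length` cut-off are assumptions for this example.

```python
def sequences(symbols, max_length):
    """Yield all strings of length 1..max_length, shortest first."""
    level = [""]
    for _ in range(max_length):
        # Prepend every symbol in turn to every member of the previous length.
        level = [symbol + tail for tail in level for symbol in symbols]
        yield from level

print(list(sequences("ABCD", 2))[:8])
# ['A', 'B', 'C', 'D', 'AA', 'BA', 'CA', 'DA']
```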
Based on the comments from the OP, here's a way to do the sequence without storing the list.
Use an odometer analogy. This only requires keeping track of indices. Each time the first member of the sequence cycles around, increment the one to the right. If this is the first time that that member of the sequence has cycled around, then add a member to the sequence.
The increments will need to be cascaded. This is the equivalent of going from 99,999 to 100,000 miles (the comma is the thousands marker).
If you have a thousand integers that you need to cycle through, then pretend you're looking at an odometer in base 1000 rather than base 10 as above.
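A sketch of the odometer idea, with the leftmost wheel turning fastest as in the question's output. Only the current wheel positions are stored. The name `odometer` and the `max_depth` stop condition are assumptions for this example.

```python
def odometer(min_val, max_val, max_depth):
    """Yield tuples of integers in odometer order, growing up to max_depth wheels."""
    if max_val < min_val or max_depth <= 0:
        raise ValueError("need min_val <= max_val and max_depth >= 1")
    wheels = [min_val]
    while True:
        yield tuple(wheels)
        # Cascade: reset every wheel that is at its maximum.
        i = 0
        while i < len(wheels) and wheels[i] == max_val:
            wheels[i] = min_val
            i += 1
        if i == len(wheels):
            # Every wheel rolled over: add a new wheel, or stop at the depth limit.
            if len(wheels) == max_depth:
                return
            wheels.append(min_val)
        else:
            wheels[i] += 1

print(list(odometer(0, 3, 2))[:6])
# [(0,), (1,), (2,), (3,), (0, 0), (1, 0)]
```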
Your sequence looks more like (A^(n-1) × A^T), where A is a matrix and A^T is its transpose.
A = [A, B, C, D]
A^T × A^(n-1) ∀ (n=0)
sequence = A, B, C, D
A^T × A^(n-1) ∀ (n=2)
sequence= AA,BA,CA,DA,AB,BB,CB,DB,AC,BC,CC,DC,AD,BD,CD,DD
You can go for any matrix multiplication code like this and implement what you wish.
You have 4 elements, so you are simply looping over the numbers in reversed base-4 notation. Say A=0, B=1, C=2, D=3:
first loop from 0 to 3 on 1 digit
second loop from 00 to 33 on 2 digits
and so on
I reversed the digits and output them using the A, B, C, D digits:
loop on 1 digit
0 0 A
1 1 B
2 2 C
3 3 D
loop on 2 digits
00 00 AA
01 10 BA
02 20 CA
03 30 DA
10 01 AB
11 11 BB
12 21 CB
13 31 DB
20 02 AC
21 12 BC
22 22 CC
...
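The table can be reproduced by counting in base 4 and emitting the least significant digit first. This is a sketch; the name `reversed_base_sequences` and the `max_length` limit are assumptions for this example.

```python
def reversed_base_sequences(symbols, max_length):
    """Count in base len(symbols), writing each number least significant digit first."""
    base = len(symbols)
    for length in range(1, max_length + 1):
        for number in range(base ** length):
            letters = []
            for _ in range(length):
                number, remainder = divmod(number, base)
                letters.append(symbols[remainder])  # lowest digit comes out first
            yield "".join(letters)

print(list(reversed_base_sequences("ABCD", 2))[4:8])
# ['AA', 'BA', 'CA', 'DA']
```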
The algorithm is pretty obvious. You could take a look at Algorithm L (lexicographic t-combination generation) in fascicle 3a of TAOCP by D. Knuth.
How about:
Private Sub DoIt(minVal As Integer, maxVal As Integer, maxDepth As Integer)
    If maxVal < minVal OrElse maxDepth <= 0 Then
        Debug.WriteLine("no results!")
        Return
    End If
    Debug.WriteLine("results:")
    Dim resultList As New List(Of Integer)(maxDepth)
    ' initialize with the 1st result: this makes processing the remainder easy to write.
    resultList.Add(minVal)
    Dim depthIndex As Integer = 0
    Debug.WriteLine(CStr(minVal))
    Do
        ' find the term to be increased
        Dim indexOfTermToIncrease As Integer = 0
        While resultList(indexOfTermToIncrease) = maxVal
            resultList(indexOfTermToIncrease) = minVal
            indexOfTermToIncrease += 1
            If indexOfTermToIncrease > depthIndex Then
                depthIndex += 1
                If depthIndex = maxDepth Then
                    Return
                End If
                resultList.Add(minVal - 1)
                Exit While
            End If
        End While
        ' increase the term that was identified
        resultList(indexOfTermToIncrease) += 1
        ' output
        For d As Integer = 0 To depthIndex
            Debug.Write(CStr(resultList(d)) + " ")
        Next
        Debug.WriteLine("")
    Loop
End Sub
Would that be adequate? It doesn't take much memory and is relatively fast (apart from the writing to output...).