The Set of All Turing Machines is Countable vs the set of all infinite binary sequences is uncountable - computation-theory

Trying to study for the final and got so confused with countability.
I understand that any Turing machine can be described as a finite string over a finite alphabet (Σ). We can count the possible strings of each length.
Say the alphabet has 256 different symbols.
For the string length of 1: 256 combinations.
For the string length of 2: we have 256^2 combinations.
For the string length of k, we have 256^k combinations.
Then we number all these combinations.
1, 2 ... 256,
257, 258 ... 256 + 256^2 ...
This numbering maps every string, and so every Turing machine description, to a distinct natural number. So the set of all Turing machines is countable.
My question is why couldn't I do the same for all infinite binary sequences? I find all the combinations for each length, number them, then I will get a bijective mapping.
Many thanks!

It sounds like you are asking about Cantor's diagonal argument. Given any list of infinite sequences, you can construct a sequence that is not in the list.
This is very similar to the argument that you cannot count the irrational numbers: given any list of numbers whose expansions are infinitely long, you can always construct a number that is not in the list.
I think the biggest flaw in your argument is where you say "I find all the combinations for each length". Every string produced that way has some finite length k, so an infinite binary sequence never appears at any stage of the numbering.
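A minimal Python sketch of the contrast, assuming a binary alphabet instead of 256 symbols. The names `finite_strings` and `diagonal` are chosen here for illustration. The first function numbers finite strings by length, as in the question; the second takes any listing of sequence prefixes and builds a sequence that differs from the i-th one at position i.

```python
from itertools import count, islice, product

def finite_strings(alphabet="01"):
    """Yield every finite string over `alphabet`, shortest first."""
    for length in count(1):
        for letters in product(alphabet, repeat=length):
            yield "".join(letters)

def diagonal(rows):
    """Flip the i-th bit of the i-th row, so the result differs from every row."""
    return [1 - rows[i][i] for i in range(len(rows))]

# Every finite string shows up at some finite position in the numbering.
first_six = list(islice(finite_strings(), 6))

# Prefixes of three listed sequences; the diagonal disagrees with each of them.
rows = [[0, 0, 0],
        [1, 1, 1],
        [0, 1, 0]]
missing = diagonal(rows)
```

The generator never yields an infinite sequence, because each `length` it visits is a finite number, while `diagonal` works against any listing it is given.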

Related

Algorithm to represent a number

I want to store the number x in more bits than its standard binary representation uses. The bitstring representation I am looking for must be unique for x, so that I can map x to this representation and back. Also, both 1 and 0 must be allowed at every bit position.
Does such a bitstring representation of x exist, or is it impossible to create one?
For example, the Zeckendorf representation is unique but doesn't allow two consecutive 1s. If I cut out the 0 after each 1, the length of the resulting bitstring is more or less equal to the length of the standard binary representation, but not longer.
Add a single bit that is the parity of the original number: XOR all of the bits together. The mapping is deterministic, unique, and trivially reversible.
In general, any error detection/correction addition will satisfy your posted requirements.
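A short Python sketch of the parity-bit idea. Placing the parity bit in the lowest position is an assumption made here; the answer does not say where the extra bit goes. The function names are illustrative.

```python
def encode(x):
    """Append one parity bit (XOR of all bits of x) as the lowest bit."""
    parity = bin(x).count("1") % 2
    return (x << 1) | parity

def decode(y):
    """Drop the parity bit to recover the original number."""
    return y >> 1

def is_valid(y):
    """Check that the lowest bit really is the parity of the remaining bits."""
    return bin(y >> 1).count("1") % 2 == (y & 1)
```

For example, `encode(5)` turns binary 101 into 1010, and `decode` reverses it by shifting the parity bit back out.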

Divide binary array into three parts such that each part represent same decimal

Given a binary array, divide array into three parts such that each part represent same decimal.
Eg arr[] = {1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1}
Above array can be divided in the following way:
{1},{0,0,0,0,0,0,0,0,0,0,0,0,0,0,1}, {1}. Now each part represent same decimal.
One simple approach would be to iterate over decimal values starting from 1 and check whether each one divides the array into three equal parts.
Is there an efficient algorithm for this?
Count the number of set bits. There's no solution if this number isn't divisible by three. Say it is, and that there are 3k set bits. Find the positions of the 1st, (k+1)-th, and (2k+1)-th set bits. Keep incrementing all three positions in step and comparing the bits until the third position reaches the end of the array or you get a disagreement. Going to the end of the array, rather than stopping at the last set bit, matters because trailing zeros after the last set bit are significant and must be repeated in the first two parts, whereas leading zeros are not. If all three agree the whole way, you have a solution.
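The counting approach can be sketched in Python as follows. The function name and the return convention (two slice boundaries `i`, `j` such that the parts are `bits[:i]`, `bits[i:j]`, `bits[j:]`) are assumptions for this example, as is the handling of the all-zero array.

```python
def three_equal_parts(bits):
    """Return (i, j) so bits[:i], bits[i:j], bits[j:] encode the same number,
    or None if no such split exists. Leading zeros in a part are ignored."""
    n = len(bits)
    ones = sum(bits)
    if ones % 3 != 0:
        return None
    if ones == 0:
        # All zeros: any split into three non-empty parts works.
        return (1, 2) if n >= 3 else None

    k = ones // 3
    set_positions = [idx for idx, b in enumerate(bits) if b == 1]
    a, b, c = set_positions[0], set_positions[k], set_positions[2 * k]

    # Walk all three pointers in step until the third reaches the array end,
    # so trailing zeros of the last part are matched in the first two parts.
    while c < n:
        if not (bits[a] == bits[b] == bits[c]):
            return None
        a += 1
        b += 1
        c += 1
    return (a, b)

example = [1] + [0] * 14 + [1, 1]
split = three_equal_parts(example)
```

On the array from the question this gives the split {1}, {0,...,0,1}, {1}. The scan is linear in the array length.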

Using a set of integers to generate unique key

Now I have some sets of integers, say:
set1 = {int1, int2, int3};
set2 = {int2, int3, int1};
set3 = {int1, int4, int2};
The order of the numbers is not taken into consideration, so set1 and set2 are the same, while set3 differs from the other two.
Now I want to generate a unique key for each set to distinguish them, such that set1 and set2 generate the same key.
I have thought about this for a while. Summing the integers came to mind, but that is easily shown to be wrong. Sorting the set and computing
key = n1 + n2*2^16 + n3*2^32
may be a possible way, but I wonder if this can be solved more elegantly.
The key can be either an integer or a string.
Does anyone have an idea for solving this as fast as possible? Any reading material is also welcome.
More info:
The numbers are in fact colors so each integer is less than 0xffffff
If these were small integers (all within the range 0 to 63, for example), then you could represent each set as a bitstring (1 for any integer that's present in the set; 0 for any that's absent). For sparse sets of large integers this would be horrendously expensive in terms of storage/memory.
One other method that comes to mind would be to sort the set and form the key as the concatenation of each number's digit-string representation (separated by some delimiter). So the set {2,1,3} -> "1/2/3" (using "/" as the delimiter) and {30,1,2,4} -> "1/2/4/30".
I suppose you could also use a hybrid approach. All elements <= 63 are encoded into a hex string and all others are encoded into a string as described. Then your final key is formed as HEXxA/B/C ... (with the "x" separating the small-int hex string from the larger ints in the set).
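A minimal Python sketch of the first two ideas in this answer. The function names are made up for the example, and the 0 to 63 limit for the bitmask is taken from the range mentioned above.

```python
def bitmask_key(values):
    """Key for sets of small integers: bit v is set when v is in the set."""
    key = 0
    for v in values:
        if not 0 <= v <= 63:
            raise ValueError("bitmask_key only handles integers in 0..63")
        key |= 1 << v
    return key

def canonical_key(values, sep="/"):
    """Key for arbitrary integers: sort, then join the decimal digits."""
    return sep.join(str(v) for v in sorted(set(values)))
```

Both keys ignore the order in which the elements are supplied, so `canonical_key([2, 1, 3])` and `canonical_key([3, 2, 1])` give the same string.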
If the numbers in your set are not too large, I think hashing each set into one string could be a suitable solution.
If they are larger, you can make them smaller with a mod function or similar, and then handle them in the same way.
I hope this helps if there is no better idea.
I think a key of practical size can only be a hash value - there will always be a few pairs of inputs that hash to the same key, but you can make this unlikely.
I think the idea of sorting and then applying a standard hash function is good, but I don't like your hash multipliers. If arithmetic is mod 2^32, then multiplying by 2^32 is multiplying by zero. If it is mod 2^64, then multiplying by 2^32 will lose the top 32 bits of the input.
I would use a hash function like the one described in "Why chose 31 to do the multiplication in the hashcode() implementation ?", where you keep a running total, multiplying the hash value by some odd number before you add the next item into it. Multiplying by an odd number mod 2^n will at least not lose information immediately. I would suggest 131, but Java has a tradition of using 31.
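A small Python sketch of the sort-then-hash idea, assuming 32-bit arithmetic and the multiplier 31. Both are parameters, so 131 can be substituted. The name `set_hash` is illustrative. As noted above, distinct sets can still collide.

```python
def set_hash(values, multiplier=31, bits=32):
    """Order-independent hash: sort first, then fold with an odd multiplier."""
    mask = (1 << bits) - 1
    h = 0
    for v in sorted(values):
        # Multiply the running total by an odd number, add the next item,
        # and keep only the low `bits` bits (arithmetic mod 2^bits).
        h = (h * multiplier + v) & mask
    return h
```

Sorting before folding is what makes `set_hash([3, 1, 2])` equal to `set_hash([1, 2, 3])`.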

Is it possible to create an algorithm which generates an autogram?

An autogram is a sentence which describes the characters it contains, usually enumerating each letter of the alphabet, but possibly also the punctuation it contains. Here is the example given in the wiki page.
This sentence employs two a’s, two c’s, two d’s, twenty-eight e’s, five f’s, three g’s, eight h’s, eleven i’s, three l’s, two m’s, thirteen n’s, nine o’s, two p’s, five r’s, twenty-five s’s, twenty-three t’s, six v’s, ten w’s, two x’s, five y’s, and one z.
Coming up with one is hard, because you don't know how many letters it contains until you finish the sentence. Which is what prompts me to ask: is it possible to write an algorithm which could create an autogram? For example, a given parameter would be the start of the sentence as an input e.g. "This sentence employs", and assuming that it uses the same format as the above "x a's, ... y z's".
I'm not asking for you to actually write an algorithm, although by all means I'd love to see if you know one to exist or want to try and write one; rather I'm curious as to whether the problem is computable in the first place.
You are asking two different questions.
"is it possible to write an algorithm which could create an autogram?"
There are algorithms to find autograms. As far as I know, they use randomization, which means that such an algorithm might find a solution for a given start text, but if it doesn't find one, then this doesn't mean that there isn't one. This takes us to the second question.
"I'm curious as to whether the problem is computable in the first place."
Computable would mean that there is an algorithm which for a given start text either outputs a solution, or states that there isn't one. The above-mentioned algorithms can't do that, and an exhaustive search is not workable. Therefore I'd say that this problem is not computable. However, this is rather of academic interest. In practice, the randomized algorithms work well enough.
Let's assume for the moment that all counts are less than or equal to some maximum M, with M < 100. As mentioned in the OP's link, this means that we only need to decide counts for the 16 letters that appear in these number words, as counts for the other 10 letters are already determined by the specified prefix text and can't change.
One property that I think is worth exploiting is the fact that, if we take some (possibly incorrect) solution and rearrange the number-words in it, then the total letter counts don't change. IOW, if we ignore the letters spent "naming themselves" (e.g. the c in two c's) then the total letter counts only depend on the multiset of number-words that are actually present in the sentence. What that means is that instead of having to consider all possible ways of assigning one of M number-words to each of the 16 letters, we can enumerate just the (much smaller) set of all multisets of number-words of size 16 or less, having elements taken from the ground set of number-words of size M, and for each multiset, look to see whether we can fit the 16 letters to its elements in a way that uses each multiset element exactly once.
Note that a multiset of numbers can be uniquely represented as a nondecreasing list of numbers, and this makes them easy to enumerate.
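One way to enumerate these nondecreasing lists in Python is `itertools.combinations_with_replacement`, which yields exactly the nondecreasing tuples when its input is sorted. The wrapper name `multisets_up_to` is an assumption for this sketch.

```python
from itertools import combinations_with_replacement

def multisets_up_to(max_value, max_size):
    """Yield every multiset of numbers in 1..max_value with at most
    max_size elements, each as a nondecreasing tuple."""
    for size in range(1, max_size + 1):
        yield from combinations_with_replacement(range(1, max_value + 1), size)

pairs = list(combinations_with_replacement(range(1, 4), 2))
```

Each multiset appears once, whereas enumerating ordered assignments would visit every permutation of it separately.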
What does it mean for a letter to "fit" a multiset? Suppose we have a multiset W of number-words; this determines total letter counts for each of the 16 letters (for each letter, just sum the counts of that letter across all the number-words in W; also add a count of 1 for the letter "S" for each number-word besides "one", to account for the pluralisation). Call these letter counts f["A"] for the frequency of "A", etc. Pretend we have a function etoi() that operates like C's atoi(), but returns the numeric value of a number-word. (This is just conceptual; of course in practice we would always generate the number-word from the integer value (which we would keep around), and never the other way around.) Then a letter x fits a particular number-word w in W if and only if f[x] + 1 = etoi(w), since writing the letter x itself into the sentence will increase its frequency by 1, thereby making the two sides of the equation equal.
This does not yet address the fact that if more than one letter fits a number-word, only one of them can be assigned it. But it turns out that it is easy to determine whether a given multiset W of number-words, represented as a nondecreasing list of integers, simultaneously fits any set of letters:
1. Calculate the total letter frequencies f[] that W implies.
2. Sort these frequencies.
3. Skip past any zero-frequency letters. Suppose there were k of these.
4. For each remaining letter, check whether its frequency is equal to one less than the numeric value of the number-word in the corresponding position. I.e. check that f[k] + 1 == etoi(W[0]), f[k+1] + 1 == etoi(W[1]), etc.
If and only if all these frequencies agree, we have a winner!
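A rough Python sketch of this fit check, under loud assumptions: it ignores the fixed prefix text and the letters it contributes, it handles only counts from 1 to 99, and it additionally requires that the number of letters with nonzero frequency equals the size of W. All names here are hypothetical.

```python
from collections import Counter

ONES = ["", "one", "two", "three", "four", "five", "six", "seven", "eight",
        "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
        "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
        "eighty", "ninety"]

def number_word(n):
    """English number-word for 1 <= n <= 99, e.g. 28 -> 'twenty-eight'."""
    if n < 20:
        return ONES[n]
    tens, ones = divmod(n, 10)
    return TENS[tens] + ("-" + ONES[ones] if ones else "")

def implied_counts(W):
    """Letter frequencies implied by the number-words in W (prefix ignored)."""
    f = Counter()
    for n in W:
        f.update(ch for ch in number_word(n) if ch.isalpha())
        if n != 1:
            f["s"] += 1  # plural "s" as in "two c's"; none after "one"
    return f

def fits(W):
    """True if the sorted nonzero frequencies, each plus one, match sorted W."""
    freqs = sorted(implied_counts(W).values())  # Counter holds nonzero only
    return len(freqs) == len(W) and all(
        fr + 1 == n for fr, n in zip(freqs, sorted(W)))
```

A fuller version would start `implied_counts` from the letter counts of the prefix text instead of an empty counter.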
The above approach is naive in that it assumes that we choose words to put in the multiset from a size M ground set. For M > 20 there is a lot of structure in this set that can be exploited, at the cost of slightly complicating the algorithm. In particular, instead of enumerating straight multisets of this ground set of all allowed numbers, it would be much better to enumerate multisets of {"one", "two", ..., "nineteen", "twenty", "thirty", "forty", "fifty", "sixty", "seventy", "eighty", "ninety"}, and then allow the "fit detection" step to combine the number-words for multiples of 10 with the single-digit number-words.

Anagram generation - Isn't it a kind of subset sum?

Anagram:
An anagram is a type of word play, the result of rearranging the
letters of a word or phrase to produce a new word or phrase, using
all the original letters exactly once;
Subset Sum problem:
The problem is this: given a set of integers, is there a non-empty
subset whose sum is zero?
For example, given the set { −7, −3, −2, 5, 8}, the answer is yes
because the subset { −3, −2, 5} sums to zero. The problem is
NP-complete.
Now say we have a dictionary of n words. Now Anagram Generation problem can be stated as to find a set of words in dictionary(of n words) which use up all letters of the input. So doesn't it become a kind of subset sum problem?
Am I wrong?
The two problems are similar but are not isomorphic.
In an anagram the order of the letters matters. In a subset sum, the order does not matter.
In an anagram, all the letters must be used. In a subset sum, any subset will do.
In an anagram, the subgroups must form words taken from a comparatively small dictionary of allowable words (the dictionary). In a subset sum, the groups are unrestricted (no dictionary of allowable groupings).
If you could prove that solving anagram finding (no more than a polynomial number of times) solves the subset sum problem, it would be a revolution in computer science (you'd have proved P=NP).
Clearly, finding anagrams is a polynomial-time problem:
Checking whether two records are anagrams of each other is as simple as sorting their letters and comparing the resulting strings (that takes C*s*log(s) time, where s is the number of letters in a record). You'll have at most n such checks, where n is the number of records in the dictionary. So the running time ~ C*s*log(s)*n is bounded by a polynomial in the input size, i.e. your input record and dictionary combined.
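The sort-and-compare check can be sketched in Python like this. Lowercasing and dropping non-letters are assumptions added for the example, and the function names are illustrative.

```python
def signature(text):
    """Sorted letters of `text`; two records are anagrams iff these match."""
    return "".join(sorted(ch for ch in text.lower() if ch.isalpha()))

def find_anagrams(phrase, dictionary):
    """Return the dictionary entries that are anagrams of `phrase`."""
    target = signature(phrase)
    return [entry for entry in dictionary if signature(entry) == target]

matches = find_anagrams("listen", ["silent", "enlist", "google", "tinsel", "listens"])
```

Each dictionary entry is sorted once and compared once, which matches the s*log(s)*n estimate above.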
EDIT:
All the above is valid only if the anagram finding problem is defined as finding anagram of the input phrase in a dictionary of possible complete phrases.
While the wording of the anagram finding problem in the original question above...
Now say we have a dictionary of n words. Now Anagram Generation problem can be stated as to find a set of words in dictionary(of n words) which use up all letters of the input.
...seems to imply something different, e.g. the possibility that a composition of more than one dictionary entry is also a valid anagram of the input.
This, however, seems immediately problematic and unclear because (1) a phrase is usually not just a sequence of random words (it should make sense as a whole), and (2) words in a phrase usually require separators, which are also symbols, so it is not clear whether separators (whitespace characters) are required in the input to allow for separate dictionary entries, or whether separators are allowed within a single dictionary entry.
So in my initial answer above I applied a "semantic razor", interpreting the problem definition in the only way that is unambiguous and makes sense as "anagram finding".
But we might also interpret the author's definition like this:
Given the dictionary of n letter sequences (separate dictionary entries may contain same sequences) and one target letter sequence - find any subset of the dictionary entries that if concatenated together would be exact rearrangement of the target letter sequence OR determine that such subset does not exist.
^^^- Even though this problem no longer really makes perfect sense as an "anagram finding problem", it is still interesting. It is a very different problem from the one I considered above.
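To make this second interpretation concrete, here is a small exhaustive backtracking sketch in Python. It is an illustration of the problem statement, not an efficient algorithm: in the worst case it tries exponentially many subsets. The function name is hypothetical.

```python
from collections import Counter

def find_word_subset(dictionary, target):
    """Return a list of dictionary entries (each used at most once) whose
    letters together are exactly the letters of `target`, or None."""
    words = [(w, Counter(w)) for w in dictionary]

    def search(start, remaining):
        if not remaining:          # every target letter has been used up
            return []
        for idx in range(start, len(words)):
            word, letters = words[idx]
            # The entry is usable only if it needs no letter we lack.
            if all(remaining[ch] >= k for ch, k in letters.items()):
                # Counter subtraction drops letters whose count reaches zero.
                rest = search(idx + 1, remaining - letters)
                if rest is not None:
                    return [word] + rest
        return None

    return search(0, Counter(target))

found = find_word_subset(["ten", "is", "net", "sin", "it"], "tennis")
```

Here "ten" followed by "is" leaves a stray "n", so the search backtracks and settles on "ten" plus "sin".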
One more thing remains unclear: the flexibility of the alphabet. To be specific, the problem definition must also state whether the set of letters is fixed, or whether it may be redefined for each new instance of the problem when specifying the dictionary and the target sequence. That's important, because the capabilities and the complexity depend on it.
The variant of this problem in which the alphabet (the available number of letters) can be defined for each instance individually is actually equivalent to a subset sum problem. That makes it NP-complete.
I can prove the equivalence of our problem to a natural number variant of subset sum problem defined as
Given the collection (multiset) of natural numbers (repeated numbers allowed) and the target natural number - find any sub-collection that sums exactly to the target number OR determine that such sub-collection does not exist.
It is not hard to see that a mostly linear number of steps is enough to translate the input of one problem into the input of the other, and vice versa. So a solution of one problem translates to exactly one solution of the other, plus mostly linear overhead.
This positive-only variant of subset sum is equivalent to the zero-sum variant given by the author (see e.g. the Subset Sum Wikipedia article).
I think you are wrong.
Anagram Generation must be simpler than Subset Sum, because I can devise a trivial O(n) algorithm to solve it (as defined):
    initialize the list of anagrams to an empty list
    iterate the dictionary word by word
        if all the input letters are used in the ith word
            add the word to the list of anagrams
    return the list of anagrams
Also, anagrams consist of valid words that are permutations of the input word (i.e. rearrangements), whereas subsets have no concept of order. They may actually include fewer elements than the input set (hence "subset"), but an anagram must always be the same length as the input word.
It isn't NP-Complete because given a single set of letters, the set of anagrams remains identical regardless.
There is always a single mapping that transforms the letters of the input L to a set of anagrams A, so we can say that f(L) = A for any execution of f. I believe, if I understand correctly, that this makes the function deterministic. The order of a set is irrelevant, so treating a differently ordered solution as non-deterministic is invalid. It is also invalid because all entries in a dictionary are unique, and thus can be deterministically ordered.
