Given original string and encoded string, how to induce encoding?

Suppose I have an original string and an encoded string, like the following:
"abcd" -> "0010111111001010"; one possible solution would be that "a" maps to "0010", "b" to "1111", "c" to "1100", and "d" to "1010".
How can I write a program that, given these two strings, figures out the possible encoding rules?
My first draft looks like this:
fun partition(orgl, encode) =
  let
    val part = size(orgl)
    fun porpt(str, i, len) =
      if i = len - 1 then
        [substring(str, len * (len - 1), size(str) - (len - 1) * len)]
      else
        substring(str, len * i, len) :: porpt(str, i + 1, len)
  in
    porpt(encode, 0, part)
  end;
But it obviously cannot check whether two substrings that correspond to the same character are identical, and there are many possible partitions other than splitting the encoded string proportionally.
What would be an appropriate algorithm for this problem?
P.S. Only prefix codes are allowed.
I haven't studied serious algorithms yet, but I read about backtracking and wrote a second version of the code:
fun partition(orgl, encode) =
  let
    val part = size(orgl)
    fun backtrack(str, s, len, count, code) =
      let
        val current =
          if count = 1 then
            code @ [substring(str, s, size(str) - s)]
          else
            code @ [substring(str, s, len)]
      in
        if len > size(str) - s then []
        else
          if proper_prefix(0, orgl, code) then
            if count = 1 then current
            else
              backtrack(str, s + len, len, count - 1, current)
          else
            backtrack(str, s, len + 1, count, code)
      end
  in
    backtrack(encode, 0, 1, part, [])
  end;
The function proper_prefix checks the prefix-code property and the uniqueness of the mapping. However, the code does not work correctly.
For example, when I call:
partition("abcd", "001111110101101");
I get:
uncaught exception Subscript
FYI, the body of proper_prefix looks like this:
fun proper_prefix(i, orgl, nil) = true
  | proper_prefix(i, orgl, x::xs) =
      let
        fun check(j, str, nil) = true
          | check(j, str, x::xs) =
              if String.isPrefix str x then
                if str = x andalso substring(orgl, i, 1) = substring(orgl, i + j + 1, 1) then
                  check(j + 1, str, xs)
                else
                  false
              else
                check(j + 1, str, xs)
      in
        if check(0, x, xs) then proper_prefix(i + 1, orgl, xs)
        else false
      end;

I'd try a back-tracking approach:
Start with an empty hypothesis (i.e. set all encodings to unknown). Then process the encoded string character by character.
At every new code character, you have two options: Either append the code character to the encoding of the current source character or go to the next source character. If you encounter a source character that you already have an encoding for, check if it matches and go on. Or if it doesn't match, go back and try another option. You can also check the prefix-property during this traversal.
Your example input could be processed as follows:
Assume 'a' == '0'
Go to next source character
Assume 'b' == '0'
Violation of prefix property, go back
Assume 'a' == '00'
Go to next source character
Assume 'b' == '1'
...
This explores the range of all possible encodings. You can either return the first encoding found or all possible encodings.
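Here is a minimal Python sketch of that backtracking search (Python only for brevity; induce_codes, search and prefix_free are my own hypothetical names). Instead of appending one code bit at a time, it tries every possible code length for a character it has not seen yet, which explores the same choices. Treating two characters with an identical code as a prefix violation also enforces a one-to-one mapping.

```python
def induce_codes(source, encoded):
    """Return every prefix-free mapping {char: bit string} that turns source into encoded."""
    solutions = []

    def prefix_free(codes):
        # No code may be a prefix of (or equal to) another character's code.
        values = list(codes.values())
        return not any(
            i != j and v.startswith(u)
            for i, u in enumerate(values)
            for j, v in enumerate(values)
        )

    def search(i, pos, codes):
        if i == len(source):
            if pos == len(encoded):  # every bit must be consumed
                solutions.append(dict(codes))
            return
        ch = source[i]
        if ch in codes:
            # Known character: its code must appear at the current position.
            if encoded.startswith(codes[ch], pos):
                search(i + 1, pos + len(codes[ch]), codes)
            return
        # Unknown character: try each code length, undoing the choice afterwards.
        for end in range(pos + 1, len(encoded) + 1):
            codes[ch] = encoded[pos:end]
            if prefix_free(codes):
                search(i + 1, end, codes)
            del codes[ch]

    search(0, 0, {})
    return solutions


print(induce_codes("ab", "01"))    # [{'a': '0', 'b': '1'}]
print(induce_codes("aa", "0101"))  # [{'a': '01'}]
```

Returning only the first solution instead of all of them would just mean stopping the search as soon as `solutions` becomes non-empty.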

If one were to naively iterate over all possible translations of abcd → 0010111111001010, the number of candidates could blow up. Simple iteration also appears to produce a lot of invalid translations that one would have to skip:
(a, b, c, d) → (0, 0, 1, 0111111001010) is invalid because a = b
(a, b, c, d) → (0, 0, 10, 111111001010) is invalid because a = b
(a, b, c, d) → (0, 01, 0, 111111001010) is invalid because a = c
(a, b, c, d) → (00, 1, 0, 111111001010) is one possibility
(a, b, c, d) → (0, 0, 101, 11111001010) is invalid because a = b
(a, b, c, d) → (0, 010, 1, 11111001010) is another possibility
(a, b, c, d) → (001, 0, 1, 11111001010) is another possibility
(a, b, c, d) → (0, 01, 01, 11111001010) is invalid because b = c
(a, b, c, d) → (00, 1, 01, 11111001010) is another possibility
(a, b, c, d) → (00, 10, 1, 11111001010) is another possibility
...
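To put a number on the blow-up: cutting an m-bit string into k non-empty pieces can be done in C(m-1, k-1) ways, so the 16-bit example with four source characters already has C(15, 3) = 455 candidate translations before any filtering. The Python sketch below (naive_splits and consistent are hypothetical names) enumerates them naively and keeps only those in which a repeated character gets the same bit string and distinct characters get distinct bit strings. Like the list above, it does not check the prefix property.

```python
from itertools import combinations
from math import comb


def naive_splits(encoded, k):
    """Yield every way to cut `encoded` into k non-empty consecutive pieces."""
    for cuts in combinations(range(1, len(encoded)), k - 1):
        bounds = (0, *cuts, len(encoded))
        yield [encoded[i:j] for i, j in zip(bounds, bounds[1:])]


def consistent(source, pieces):
    """Same character -> same piece; different characters -> different pieces."""
    mapping = {}
    for ch, piece in zip(source, pieces):
        if mapping.setdefault(ch, piece) != piece:
            return False
    return len(set(mapping.values())) == len(mapping)


encoded = "0010111111001010"
candidates = list(naive_splits(encoded, 4))
print(len(candidates), comb(len(encoded) - 1, 3))  # 455 455
survivors = [p for p in candidates if consistent("abcd", p)]
```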
If each character occurs exactly once in the original string, then this blow-up of results is the answer. If the same character occurs more than once, this further constrains the solution. E.g. matching abca → 111011 could generate
(a, b, c, a) → (1, 1, 1, 011) is invalid because a = b = c, a ≠ a
(a, b, c, a) → (1, 1, 10, 11) is invalid because a = b, a ≠ a
(a, b, c, a) → (1, 11, 0, 11) is invalid because a = b, a ≠ a
(a, b, c, a) → (11, 1, 0, 11) is one possibility
... (all remaining combinations would eventually prove invalid)
For a given hypothesis, you can choose the order in which to verify your constraints, for example:
See if any mappings overlap. (I think this is what Nico calls the prefix property.)
See if any character that occurs more than once actually occurs in both places in the bit string.
An algorithm using this search strategy will have to find an order of checking constraints that rules out an invalid hypothesis as soon as possible. My intuition tells me that a constraint a → β is worth investigating sooner if the bit string β is long and if it occurs many times.
Another strategy is ruling out that a particular character can map to any bit string of/above/below a certain length. For example, aaab → 1111110 rules out a mapping to any bit string of length above 2, and abcab → 1011101 rules out a mapping to any bit string of length different than 2.
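The length argument can be mechanized: each character c contributes count(c) × len(code(c)) bits, so the code lengths must satisfy the sum of count(c) × len(code(c)) = len(encoded). A brute-force Python sketch (feasible_lengths is a hypothetical helper). It uses only this counting constraint and ignores the actual bits, so it can rule lengths out but never confirm them.

```python
from collections import Counter
from itertools import product


def feasible_lengths(source, total_bits):
    """Map each character to the code lengths allowed by the length equation alone."""
    counts = Counter(source)
    chars = list(counts)
    # Upper bound for one character: every other occurrence needs at least 1 bit.
    upper = {c: (total_bits - (len(source) - counts[c])) // counts[c] for c in chars}
    possible = {c: set() for c in chars}
    # Brute force over all length tuples; exponential in the number of
    # distinct characters, which is acceptable for small alphabets.
    for lengths in product(*(range(1, upper[c] + 1) for c in chars)):
        if sum(counts[c] * n for c, n in zip(chars, lengths)) == total_bits:
            for c, n in zip(chars, lengths):
                possible[c].add(n)
    return possible


print(feasible_lengths("aaab", 7))  # {'a': {1, 2}, 'b': {1, 4}}
```

For aaab → 1111110 this confirms that a cannot map to a bit string longer than 2.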
For the programming part, try and think of ways to represent hypotheses. E.g.
(* For the hypothesis (a, b, c, a) → (11, 1, 0, 11) *)
(* Order signifies first occurrence *)
val someHyp1 = ([(#"a", 2), (#"b", 1), (#"c", 1)], "abca", "111011")
(* Somehow recurse over hypothesis and accumulate offsets for each character, e.g. *)
val someHyp2 = ([(#"a", 2), (#"b", 1), (#"c", 1)],
[(#"a", 0), (#"b", 2), (#"c", 3), (#"a", 4)])
And make a function that generates new hypotheses in some order, and a function that finds if a hypothesis is valid.
fun nextHypothesis (hyp, origStr, encStr) = ... (* should probably return SOME/NONE *)
fun validHypothesis (hyp, origStr, encStr) =
    allStr (fn (i, c) => (* is bit string for c at its
                            accumulated offset in encStr? *)) origStr
(* Helper function that checks whether a predicate is true for each
   character in a string. The predicate function takes both the index
   and the character as argument. *)
and allStr p s =
    let val len = size s
        fun loop i = i >= len orelse p (i, String.sub (s, i)) andalso loop (i+1)
    in loop 0 end
An improvement over this framework would be to change the order in which to explore hypotheses, since some search paths can rule out larger amounts of invalid mappings than others.
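For comparison, here is one way validHypothesis might look in Python. This is a sketch under the assumption that, as in someHyp1, a hypothesis is just a code length per character. It walks the original string, accumulating offsets the way someHyp2 does, and checks that repeated characters read identical bit strings and that distinct characters get distinct codes. The overlap (prefix) check would be a separate test.

```python
def valid_hypothesis(lengths, orig, enc):
    """lengths: hypothetical dict char -> code length, e.g. {"a": 2, "b": 1, "c": 1}."""
    if sum(lengths[c] for c in orig) != len(enc):
        return False  # the lengths must cover the encoded string exactly
    codes = {}
    offset = 0  # accumulated offset of the current character in enc
    for c in orig:
        bits = enc[offset:offset + lengths[c]]
        if codes.setdefault(c, bits) != bits:
            return False  # a repeated character reads a different bit string
        offset += lengths[c]
    # Distinct characters must not share a code.
    return len(set(codes.values())) == len(codes)


print(valid_hypothesis({"a": 2, "b": 1, "c": 1}, "abca", "111011"))  # True
```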

Related

Finding all subsets of specified size

I've been scratching my head about this for two days now and I cannot come up with a solution. What I'm looking for is a function f(s, n) such that it returns a set containing all subsets of s where the length of each subset is n.
Demo:
s={a, b, c, d}
f(s, 4)
{{a, b, c, d}}
f(s, 3)
{{a, b, c}, {a, b, d}, {a, c, d}, {b, c, d}}
f(s, 2)
{{a, b}, {a, c}, {a, d}, {b, c}, {b, d}, {c, d}}
f(s, 1)
{{a}, {b}, {c}, {d}}
I have a feeling that recursion is the way to go here. I've been fiddling with something like
f(S, n):
    for s in S:
        t = f( S-{s}, n-1 )
        ...
But this does not seem to do the trick. I did notice that len(f(s,n)) seems to be the binomial coefficient bin(len(s), n). I guess this could be utilized somehow.
Can you help me please?

Let us call n the size of the array and k the number of elements to be put in a subset.
Let us consider the first element A[0] of the array A.
If this element is put in the subset, the problem becomes a similar (n-1, k-1) problem.
If not, it becomes an (n-1, k) problem.
This can be implemented simply as a recursive function.
We just have to handle the edge cases k == 0 and k > n.
During the process, we also have to keep track of:
n: the number of remaining elements of A to consider
k: the number of elements that remain to be put in the current subset
index: the index of the next element of A to consider
current_subset: the elements already selected.
Here is simple C++ code to illustrate the algorithm:
#include <iostream>
#include <vector>

void print(const std::vector<std::vector<int>>& subsets) {
    for (const auto& v : subsets) {
        for (int x : v) {
            std::cout << x << " ";
        }
        std::cout << "\n";
    }
}

// n: number of remaining elements of A to consider
// k: number of elements that remain to be put in the current subset
// index: index of next element of A to consider
void Get_subset_rec(std::vector<std::vector<int>>& subsets, int n, int k, int index,
                    const std::vector<int>& A, std::vector<int>& current_subset) {
    if (n < k) return;                  // not enough elements left
    if (k == 0) {                       // current subset is complete
        subsets.push_back(current_subset);
        return;
    }
    Get_subset_rec(subsets, n - 1, k, index + 1, A, current_subset);      // skip A[index]
    current_subset.push_back(A[index]);
    Get_subset_rec(subsets, n - 1, k - 1, index + 1, A, current_subset);  // take A[index]
    current_subset.pop_back();          // remove last element
}

void Get_subset(std::vector<std::vector<int>>& subsets, int subset_length, const std::vector<int>& A) {
    std::vector<int> current_subset;
    Get_subset_rec(subsets, static_cast<int>(A.size()), subset_length, 0, A, current_subset);
}

int main() {
    int subset_length = 3;  // subset size
    std::vector<int> A = {1, 2, 3, 4, 5};
    std::vector<std::vector<int>> subsets;
    Get_subset(subsets, subset_length, A);
    std::cout << subsets.size() << "\n";
    print(subsets);
}
Output for 5 elements and subsets of size 3 (the first line is the number of subsets):
10
3 4 5
2 4 5
2 3 5
2 3 4
1 4 5
1 3 5
1 3 4
1 2 5
1 2 4
1 2 3

One way to solve this is by backtracking. Here's a possible algorithm in Python:
def backtrack(input_set, idx, partial_res, res, n):
    if len(partial_res) == n:
        res.append(partial_res[:])
        return
    if idx == len(input_set):  # no elements left to decide on
        return
    partial_res.append(input_set[idx])
    backtrack(input_set, idx + 1, partial_res, res, n)  # path with input_set[idx]
    partial_res.pop()
    backtrack(input_set, idx + 1, partial_res, res, n)  # path without input_set[idx]
Call it as res = []; backtrack(input_set, 0, [], res, n).
Time complexity of this approach is O(2^len(input_set)) since we make 2 branches at each element of input_set, regardless of whether the path leads to a valid result or not. The space complexity is O(len(input_set) choose n) since this is the number of valid subsets you get, as you correctly pointed out in your question.
Now, there is a way to optimize the above algorithm to reduce the time complexity to O(len(input_set) choose n) by pruning the recursive tree to paths that can lead to valid results only.
If n - len(partial_res) > len(input_set) - idx, we are sure that even if we took every remaining element in input_set[idx:] we would still be short of n. So we can employ this as a base case, return, and prune.
Also, if n - len(partial_res) == len(input_set) - idx, this means that we need each and every element in input_set[idx:] to get the required n-length result. Thus, we can't skip any elements, and so the second branch of our recursive call becomes redundant:
backtrack(input_set, idx + 1, partial_res, res, n)  # path without input_set[idx]
We can skip this branch with a conditional check.
Implementing these base cases correctly reduces the time complexity of the algorithm to O(len(input_set) choose n) (ignoring the cost of copying each result), which is a hard limit because that's the number of subsets there are.
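Putting both prunings together, a Python sketch might look like this. subsets_of_size is an illustrative name, and the recursion is written as a closure so that only the index has to be passed around; "need" and "remaining" correspond to n - len(partial_res) and len(input_set) - idx above.

```python
def subsets_of_size(items, n):
    """All n-element subsets of `items`, as lists in their original order."""
    if n < 0:
        return []
    result, partial = [], []

    def backtrack(idx):
        need = n - len(partial)        # elements still to choose
        remaining = len(items) - idx   # elements still available
        if need == 0:
            result.append(partial[:])
            return
        if need > remaining:           # prune: n can no longer be reached
            return
        partial.append(items[idx])     # path with items[idx]
        backtrack(idx + 1)
        partial.pop()
        if need < remaining:           # skipping only helps if enough elements remain
            backtrack(idx + 1)         # path without items[idx]

    backtrack(0)
    return result


print(subsets_of_size(["a", "b", "c", "d"], 2))
# [['a', 'b'], ['a', 'c'], ['a', 'd'], ['b', 'c'], ['b', 'd'], ['c', 'd']]
```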

subseqs 0 _ = [[]]
subseqs k [] = []
subseqs k (x:xs) = map (x:) (subseqs (k-1) xs) ++ subseqs k xs
The function looks for subsequences of (non-negative) length k in a given sequence. There are three cases:
If the length is 0: there is a single empty subsequence in any sequence.
Otherwise, if the sequence is empty: there are no subsequences of any (positive) length k.
Otherwise, there is a non-empty sequence that starts with x and continues with xs, and a positive length k. All our subsequences are of two kinds: those that contain x (they are subsequences of xs of length k-1, with x stuck at the front of each one), and those that do not contain x (they are just subsequences of xs of length k).
The algorithm is a more or less literal translation of these notes to Haskell. Notation cheat sheet:
[] an empty list
[w] a list with a single element w
x:xs a list with a head of x and a tail of xs
(x:) a function that sticks an x in front of any list
++ list concatenation
f a b c a function f applied to arguments a b and c

Here is a non-recursive Python function that takes a list superset and returns a generator that produces all subsets of size k.
def subsets_k(superset, k):
    if k > len(superset):
        return
    if k == 0:
        yield []
        return
    indices = list(range(k))
    while True:
        yield [superset[i] for i in indices]
        i = k - 1
        while indices[i] == len(superset) - k + i:
            i -= 1
            if i == -1:
                return
        indices[i] += 1
        for j in range(i + 1, k):
            indices[j] = indices[i] + j - i
Testing it:
for s in subsets_k(['a', 'b', 'c', 'd', 'e'], 3):
    print(s)
Output:
['a', 'b', 'c']
['a', 'b', 'd']
['a', 'b', 'e']
['a', 'c', 'd']
['a', 'c', 'e']
['a', 'd', 'e']
['b', 'c', 'd']
['b', 'c', 'e']
['b', 'd', 'e']
['c', 'd', 'e']

Prolog - Multiplication by recursive addition

I am trying to multiply two numbers in SWI-Prolog by recursive addition. I am currently learning Prolog and I do not want to use any library like clpfd.
mult(A, B, C) :- A < B, mult(B, A, C). % always make the first number bigger
mult(A, B, C) :- B > 0, B1 is B - 1, mult(A, B1, A + C). % keep adding
mult(A, B, C) :- B == 0, C is 0. % base case
'C' is supposed to be the result.
This is trying to replicate the following algorithm in Java:
int product(int x, int y)
{
    // first prolog line
    if (x < y)
        return product(y, x);
    // second prolog line
    else if (y != 0)
        return (x + product(x, y - 1));
    // third prolog line
    else
        return 0;
}
However, no matter how I vary the input, the result will always be 'false'. I was able to step through my instructions with :- trace., but I cannot find out how to fix this.

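The problem is the last literal of your second clause: mult(A, B1, A + C).
It passes the unevaluated term A + C as the result argument instead of a number. When the recursion reaches the base case, C is 0 tries to unify 0 with a compound term such as 3 + _, which fails, so the whole query fails.
What you really want is to compute the result of A*B1 first and then add A to it.
So try replacing this line with: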
mult(A, B, C) :- B > 0, B1 is B - 1, mult(A, B1, C1), C is A + C1.

In writing a purely relational prolog program, is it ok to use a carefully placed cut?

I am trying my hand at writing a relational Prolog program that manages a key-value store. The initial code is taken from some lecture slides I found on the internet (http://people.eng.unimelb.edu.au/pstuckey/book/course.html -- see: using data structure slides).
newdic([]).
addkey(D0,K,I,D) :- D = [p(K,I)|D0].
delkey([],_,[]).
delkey([p(K,_)|D],K,D).
delkey([p(K0,I)|D0],K,[p(K0,I)|D]) :-
dif(K, K0), delkey(D0,K,D).
This code allows adding more than one value with the same key, which is fine with me. However, it also adds the same key-value pair twice, e.g.
?- newdic(D), addkey(D, a, 1, D2), addkey(D2, a, 1, D3), lookup(D3, a, X).
D = [],
D2 = [p(a, 1)],
D3 = [p(a, 1), p(a, 1)],
X = 1
D3 includes p(a,1) twice.
To ensure that this doesn't happen, I added the following code; and to ensure that backtracking doesn't find the alternative addkey clause, I added a cut at the end of the first clause.
Is this fair game for a purely relational program, or are there better ways to ensure that no duplicate key-value pairs are added without the use of a cut?
newdic([]).
addkey(D0,K,I,D0) :- lookup(D0, K, I), !. % if the key-value pair already exists, do nothing
addkey(D0,K,I,D) :- D = [p(K,I)|D0].
delkey([],_,[]).
delkey([p(K,_)|D],K,D).
delkey([p(K0,I)|D0],K,[p(K0,I)|D]) :-
dif(K, K0), delkey(D0,K,D).
This leads to the following:
?- newdic(D), addkey(D, a, 1, D2), addkey(D2, a, 1, D3), lookup(D3, a, X).
D = [],
D2 = D3, D3 = [p(a, 1)],
X = 1.
No more solutions are available; the program returns immediately.
Any suggestions are much appreciated,
Daniel
Note, as an aside: if I add different values for the same key, the cut still allows backtracking to find the second value for that key:
?- newdic(D), addkey(D, a, 1, D2), addkey(D2, a, 1, D3), addkey(D3, a, 2, D4), lookup(D4, a, X).
D = [],
D2 = D3, D3 = [p(a, 1)],
D4 = [p(a, 2), p(a, 1)],
X = 2 ;
D = [],
D2 = D3, D3 = [p(a, 1)],
D4 = [p(a, 2), p(a, 1)],
X = 1.

SWI Prolog has library predicates for dealing with key-value pairs and associations. I haven't looked at them closely to see what might match your situation, but something to consider.
If you want to roll your own solution, you could write addkey/4 out recursively and maintain the relational behavior:
addkey([], K, V, [p(K,V)]). % Add to empty dict
addkey([p(K,V)|T], K, _, [p(K,V)|T]). % Don't add
addkey([p(K,V)|T], Kadd, Vadd, [p(K,V)|TK]) :-
dif(Kadd, K),
addkey(T, Kadd, Vadd, TK).
This implementation adds the pair only if the key is unique. It ignores the addition and yields the same dictionary back if you try the same key, even with a different value (which is usually how a dictionary of key-value pairs behaves). You can easily adapt it to require uniqueness of the key-value pair instead. Of course, the Prolog you're using would need to include dif/2, or you'd need to roll your own dif/2. :)

You can use if-then-else instead of a cut:
addkey(D0,K,I,D) :-
    (   lookup(D0, K, I) ->
        D = D0            % if the key-value pair already exists, do nothing
    ;   D = [p(K,I)|D0]
    ).

Why can't I compare two atoms like this?

So basically here is some Prolog code I wrote, using GNU-Prolog 1.4.4.
A is 1,
B = (A == 2),
B == no.
A is 2,
B = (A == 2),
B == no.
What I am expecting is when A is 2, then B == no returns no, when A is 1, then B == no returns yes.
However, to my surprise, both snippets return no, which gives me the impression that B == no works in an unexpected way.
So basically how can I write the code in the way I want?
Could anyone give me some help?

The line
B = (A == 2)
does not compute A==2 in any way and assign the result to B. It just unifies the term B (a variable) with the term (A==2). The result of the unification is that B is now A==2. You can check yourself by omitting B==no:
?- A is 1, B=(A==2).
A = 1,
B = (1==2) ?
yes
If you really want B to be unified with the atom yes or no, you can use an if-then-else construct:
( A == 2 -> B = yes
; B = no )

Prolog - List of sequence from f0 to fN

The question requires me to write a predicate seqList(N, L), which is satisfied when L is the list [f0, ..., fN],
where fN = fN-1 + fN-2 + fN-3.
My code checks the head of the given list against seq/2 and returns true or false accordingly:
seqList(_,[]).
seqList(N,[H|T]) :-
N1 is N - 1,
seq(N,H),
seqList(N1,T).
However, it only works when the list is reversed:
e.g. seqList(3,[1,1,0,0]) returns true, but it should return true for
seqList(3,[0,0,1,1]) instead. Is there any way to reverse the list and verify it correctly?

It seems that you want to generate N elements of a sequence f such that f(N) = f(N-1) + f(N-2) + f(N-3) where f(X) is the X-th element of the sequence list, 0-based. The three starting elements must be pre-set as part of the specification as well. You seem to be starting with [0,0,1, ...].
Using the approach from Lazy lists in Prolog?:
seqList(N,L):- N >= 3, !,
L=[0,0,1|X], N3 is N-3, take(N3, seq(0,0,1), X-[], _).
next( seq(A,B,C), D, seq(B,C,D) ):- D is A+B+C.
Now all these functions can be fused and inlined, to arrive at one recursive definition.
But you can do it directly. You just need to write down the question, to get the solution back.
question(N,L):-
Since you start with 0,0,1, ... write it down:
L = [0, 0, 1 | X],
since the three elements are given, we only need to find out N-3 more. Write it down:
N3 is N-3,
you've now reduced the problem somewhat. You now need to find N-3 elements and put them into the X list. Use a worker predicate for that. It also must know the three preceding numbers at each step:
worker( N3, 0, 0, 1, X).
So just write down what the worker must know:
worker(N, A, B, C, X):-
if N is 0, we must stop. X then is an empty list. Write it down.
N = 0, X = [] .
Add another clause, for when N is greater than 0.
worker(N, A, B, C, X):-
N > 0,
We know that the next element is the sum of the three preceding numbers. Write that down.
D is A + B + C,
the next element in the list is the top element of our argument list (the last parameter). Write it down:
X = [D | X2 ],
Now there is one less element to add. Write it down:
N2 is N - 1,
To find the rest of the list, the three last numbers are B, C, and D. Then the rest is found by worker in exactly the same way:
worker( N2, B, C, D, X2).
That's it. The question predicate is your solution. Rename it to your liking.
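The worker recursion translates almost line for line into a loop. Here is a Python sketch (seq_list is a hypothetical name) that follows the answer's convention that N counts the elements, so the question's [f0, ..., fN] corresponds to seq_list(N + 1). The three values carried along play the role of A, B and C in worker/5.

```python
def seq_list(n, start=(0, 0, 1)):
    """First n elements of f(k) = f(k-1) + f(k-2) + f(k-3), seeded with `start`."""
    if n < 3:
        # question/2 also fails here: worker/5 has no clause for a negative count.
        raise ValueError("this sketch needs n >= 3")
    a, b, c = start             # the three preceding numbers the worker carries
    result = [a, b, c]
    for _ in range(n - 3):      # N3 more elements to find
        a, b, c = b, c, a + b + c   # D is A + B + C, then shift the window
        result.append(c)
    return result


print(seq_list(6))  # [0, 0, 1, 1, 2, 4]
```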
