Force CompareString to compare accented characters in the first pass? - winapi

For some languages CompareString(Ex) first compares the characters while ignoring any accents. The accents are compared in a second pass, and only if the first pass considers the strings equal.
This leads to these sort orders with German umlauts:
1. u
2. ü
First pass: u == u
Second pass: u < ü
------
1. üa
2. uz
First pass: u == u, but a < z
Second pass: Skipped
In my use case this is not desired, and I wonder if it is possible to somehow force CompareString to compare accented characters in the first pass, so that this sort order is achieved:
1. uz
2. üa
The available flags seem to be able to skip the second pass entirely but that would only worsen the problem. I hope there's something I missed. Perhaps (mis-)using one of the sort orders to be used with MAKELCID.
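To make the two orderings concrete, here is a small Python model of the behaviour described above. It is only an illustration: it does not call CompareString, it says nothing about which Windows flags exist, and the helper names (split_char, two_pass_key, single_pass_key) are made up for this sketch. It assumes that an accented letter can be split into a base letter plus combining marks via Unicode NFD normalization.

```python
import unicodedata


def split_char(ch):
    # NFD decomposition separates a base letter from its combining marks,
    # so "\u00fc" (u with diaeresis) becomes ("u", "\u0308").
    decomposed = unicodedata.normalize("NFD", ch)
    return decomposed[0], decomposed[1:]


def two_pass_key(s):
    # Model of the behaviour in the question: all base letters are compared
    # first, the accents only act as a tie-breaker afterwards.
    parts = [split_char(ch) for ch in s]
    return [base for base, _ in parts], [marks for _, marks in parts]


def single_pass_key(s):
    # Desired behaviour: the accent takes part in the comparison at the
    # position where it occurs.
    return [split_char(ch) for ch in s]


words = ["uz", "\u00fca", "u", "\u00fc"]
print(sorted(words, key=two_pass_key))     # accents as tie-breaker
print(sorted(words, key=single_pass_key))  # accents compared in place
```

With the two-pass key, üa sorts before uz; with the single-pass key, uz sorts before üa, which is the order asked for.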

Related

Turing Machine Element Distinctness Problem

So the language is as follows:
E = { #x1#x2...#xn | each xi is a string over {0,1} and no two of the strings are equal }
I am trying to create the state diagram for this, but first I wanted to come up with the algorithm. The issue I ran into is that whenever I compare the first two strings, I have to mark each character with an 'x', so how would I restore the first string? For example, I first compare x1 and x2. By the time I'm done, all characters in x1 and x2 are marked with 'x', so when I move on to x3, x1 has nothing left to compare.
Instead of marking considered symbols with an x, mark them with special symbols corresponding to the symbols being marked. So, instead of writing x for 0 and x for 1, write a for 0 and b for 1. In fact, go ahead and use symbols c and d also to replace values in "the earliest thing I need to check" so you can check all pairs. A high-level description of a Turing machine using this strategy is the following:
Begin reading the first input, replacing 0 with c and 1 with d.
Go to the second input and, if the second input is a match so far, write a for 0 and b for 1, then continue. If the whole second input matches, the two strings are equal and the machine rejects. If it's not a match, we know that these inputs don't match and we can begin comparing other pairs. Change the input you're checking to a and b only and reset the first input to 0 and 1 only.
Repeat this process, skipping over all a and b already there, to check all pairs involving the first term.
Once you've checked all pairs involving the first term, cross it out (using x maybe) and then repeat the whole process on the remaining input.
This will check all pairs and work as expected. The key is, as you correctly surmised, being able to reconstruct parts of the input, meaning you need extra symbols in your tape alphabet. Never hesitate to introduce tape symbols - they're free and can never hurt.
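When building the state diagram it can help to have a reference decider to test against. The sketch below is not a Turing machine; it only mirrors the "check all pairs" structure in ordinary Python. The function name in_E is made up here, and treating the empty input as accepted (zero strings, so nothing can be duplicated) is an assumption.

```python
def in_E(w):
    # Assumption: the empty input is accepted (no strings, so no duplicates).
    if w == "":
        return True
    # Every string must be introduced by a '#'.
    if not w.startswith("#"):
        return False
    parts = w[1:].split("#")
    # Only 0 and 1 may appear between the '#' separators.
    if any(ch not in "01" for part in parts for ch in part):
        return False
    # Compare every pair (xi, xj) with i < j, like the machine does.
    for i in range(len(parts)):
        for j in range(i + 1, len(parts)):
            if parts[i] == parts[j]:
                return False
    return True


print(in_E("#01#10#0"))   # all three strings differ
print(in_E("#01#10#01"))  # x1 and x3 are equal
```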

How to get the smallest in lexicographical order?

I am doing a leetcode exercise
https://leetcode.com/problems/remove-duplicate-letters/
The question is:
# Given a string which contains only lowercase letters, remove duplicate
# letters so that every letter appear once and only once. You must make
# sure your result is the smallest in lexicographical order among all possible results.
#
# Example:
# Given "bcabc"
# Return "abc"
#
# Given "cbacdcbc"
# Return "acdb"
I am not quite sure about what is the smallest in lexicographical order and why Given "cbacdcbc" then the answer would be "acdb"
Thanks for the answer in advance :)
Lexicographical order is an order relation in which string s is smaller than t if the first character of s (s1) is smaller than the first character of t (t1) or, if those are equal, if the second character of s is smaller than the second character of t, and so on.
So aaabbb is smaller than aaac because although the first three characters are equal, the fourth character b is smaller than the fourth character c.
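As a rough sketch of that rule in Python (the function name lex_less is made up; the rule that a proper prefix counts as smaller is the usual convention and an assumption here, since the text above does not cover strings of different length):

```python
def lex_less(s, t):
    # Walk both strings in parallel and decide at the first position
    # where they differ.
    for a, b in zip(s, t):
        if a != b:
            return a < b
    # No difference found: by the usual convention (an assumption here)
    # the shorter string, being a prefix of the other, is the smaller one.
    return len(s) < len(t)


print(lex_less("aaabbb", "aaac"))
print(lex_less("adbc", "adcb"))
```

Python's built-in `<` on strings follows the same rule, so `"aaabbb" < "aaac"` gives the same answer.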
For cbacdcbc there are several options: since b and c are duplicated, you can decide which occurrences to remove. This results in:
cbacdcbc = adbc
cbacdcbc = adcb
cbacdcbc = badc
cbacdcbc = badc
...
Since adbc < adcb, you cannot simply answer with the first result that comes to mind.
You cannot reorder characters. You can only choose which occurrence to remove in case of duplicated characters.
bcabc
We can remove either the first b or the second b, and either the first c or the second c. Altogether that gives four outputs:
..abc
.cab.
b.a.c
bca..
Sort these four outputs lexicographically (alphabetically):
abc
bac
bca
cab
And take the first one:
abc
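The enumeration above can be reproduced with a short brute-force sketch. It is only meant for tiny inputs, since the number of combinations grows quickly, and the function name all_results is made up for illustration.

```python
from itertools import product


def all_results(s):
    # Record every position at which each letter occurs.
    positions = {}
    for i, ch in enumerate(s):
        positions.setdefault(ch, []).append(i)
    results = set()
    # Choose exactly one occurrence to keep for every distinct letter.
    for kept in product(*positions.values()):
        # Keeping the original order of the chosen positions means
        # letters are only removed, never reordered.
        results.add("".join(s[i] for i in sorted(kept)))
    return sorted(results)


print(all_results("bcabc"))        # the four outputs, sorted
print(all_results("cbacdcbc")[0])  # smallest result for the second example
```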
Clearly, the wanted output must contain each letter only once.
Now, from what I understand, you must pick the letters so that the result has the best order, with the smallest letters as far left as possible (alphabetical, or ASCII, order).
Now you'd ask why "acdb" rather than "abcd". I think you don't take the first "cb", since more c's and b's come later, but you have to take the "a", since there is only one. After that, the only d comes before the next b, so b cannot be placed in front of d. A c can still go in front of d, so you take c, then d because there are no more d's later, and b comes last.
In short, you want the best lexicographical order from low to high, but you must make sure you take all the letters while iterating over the input string.
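One common way to turn this intuition into code is a greedy scan with a stack. This is a standard technique and not something spelled out in the answers here, so treat it as a sketch; the function name is made up. A letter already placed is dropped again only if it is larger than the current letter and still occurs later in the input.

```python
def remove_duplicate_letters(s):
    # Index of the last occurrence of every letter.
    last = {ch: i for i, ch in enumerate(s)}
    stack = []    # letters chosen so far, in order
    used = set()  # letters currently on the stack
    for i, ch in enumerate(s):
        if ch in used:
            continue
        # Drop a larger letter only if it can be picked up again later.
        while stack and stack[-1] > ch and last[stack[-1]] > i:
            used.remove(stack.pop())
        stack.append(ch)
        used.add(ch)
    return "".join(stack)


print(remove_duplicate_letters("bcabc"))
print(remove_duplicate_letters("cbacdcbc"))
```

For cbacdcbc the d cannot be dropped when b arrives, because there is no later d, which is why b ends up last.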
String comparison can usually be done in two ways:
Compare at the first unmatched letter (called lexicographical). For example, aacccccc is less than ab because the strings first differ at the second position, where a < b.
Compare the string lengths first and treat the shorter string as less. If the lengths are equal, apply lexicographical comparison.
The second one may be faster if the lengths of the strings are known.
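A quick way to see the difference between the two is to sort the same list both ways. This is a small Python sketch; the word list is made up for illustration.

```python
words = ["ab", "aacccccc", "b", "aa"]

# Way 1: plain lexicographical order (Python's default for strings).
lexicographic = sorted(words)

# Way 2: length first, lexicographical only between equally long strings.
length_first = sorted(words, key=lambda w: (len(w), w))

print(lexicographic)
print(length_first)
```

Under the first ordering aacccccc comes before ab; under the second it comes last because it is the longest string.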
Your question contains a small error:
why Given "bcabc" then the answer would be "acdb"
The original was: "Given "bcabc" Return "abc"". It makes sense that abc should be returned instead of bca.
There seems to be some misunderstanding; the example states that for the input bcabc, the expected output should be abc, not acdb, which refers to the input cbacdcbc.
the smallest in lexicographical order - your answer should be a subsequence of the initial string, containing one instance of every char.
If many such subsequences are possible (bca, bac, cab, abc for the first example), return the smallest one, comparing them as strings (think of the order of words in a dictionary).
why Given "bcabc" then the answer would be "acdb"
You confused two different examples

Minimum number of char substitutions to get a palindrome

I would like to solve this problem from TopCoder, in which a String is given and in each step you have to replace all occurrences of a character (of your choice) with another character (of your choice), so that after all steps you get a palindrome. The problem is to identify the minimum total number of replacements.
Ideas so far:
I can identify that the string after every step is simply a node/vertex in a graph and that the cost of every edge is the number of replacements made in the step, but I don't see how to use greedy for that (it is definitely not the Minimum Spanning Tree problem). I don't think it makes sense to identify all possible nodes & edge costs and to convert the problem in the Shortest Path problem. On the other side, I think in every step it makes sense to replace the character X with the biggest number of conflicts, with the character Y in conflict with X that occurs most in the string.
However, I can't prove that this works either, and I can't map it onto any known problem. Any ideas?
You need to identify disjoint sets of characters. Each set consists of characters that will all have to become the same character for the string to become a palindrome.
Example:
Let's say we have the string abcdefgfmdebac
It has 3 disjoint sets: abc, de and fgm
Algorithm:
Pick the first character and check all occurrences of it, picking up the other characters in its set.
In the example string we start with a and pick up b and c (because they sit opposite the two a's in our string). We repeat the process for b and c, but no new characters are added to the set. So abc is our first disjoint set.
Continue doing this with the remaining characters.
A disjoint set whose characters occur n times in total needs n-m replacements, where m is the number of occurrences of its most frequent character.
So simply sum over the sets.
In our example it takes 4 + 2 + 2 = 8 replacements.
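The procedure can be sketched in Python with a small union-find structure that groups characters facing each other across the middle of the string. Union-find is my choice for building the sets and is not prescribed by the answer; the function name min_replacements is made up as well.

```python
from collections import Counter


def min_replacements(s):
    parent = {ch: ch for ch in set(s)}

    def find(ch):
        # Follow parent links to the representative of ch's set,
        # shortening the path along the way.
        while parent[ch] != ch:
            parent[ch] = parent[parent[ch]]
            ch = parent[ch]
        return ch

    n = len(s)
    # Characters at mirrored positions must end up equal,
    # so they belong to the same set.
    for i in range(n // 2):
        a, b = find(s[i]), find(s[n - 1 - i])
        if a != b:
            parent[a] = b

    total = {}  # occurrences per set (the n of each set)
    best = {}   # count of the most frequent character per set (the m)
    for ch, count in Counter(s).items():
        root = find(ch)
        total[root] = total.get(root, 0) + count
        best[root] = max(best.get(root, 0), count)
    # Each set needs n - m replacements; sum over all sets.
    return sum(total[root] - best[root] for root in total)


print(min_replacements("abcdefgfmdebac"))  # 4 + 2 + 2 = 8
```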

Fewest toggles to create an alternating chain

I'm trying to solve this problem on SPOJ : http://www.spoj.pl/problems/EDIT/
I'm trying to get a decent recursive description of the algorithm, but I'm failing as my thoughts keep spinning in circles! Can you guys help me out with this one? I'll try to describe the approach I'm taking.
Basically I want to solve a problem of size j-i, where i is the starting index and j is the ending index. There should be two cases: if j-i is even, then the starting and ending letters have to be the same case, and if j-i is odd, they have to be the opposite case. I also want to reduce the problem to a smaller size (j-i-1 or j-i-2), but I feel that if I know a solution to a smaller problem, then constructing a solution to a slightly bigger problem should also take into account the starting and ending letter cases of the smaller problem. This is exactly where I'm getting confused. Can you guys put my thoughts on the right track?
I think recursion is not the best way to go with this problem. It can be solved quite fast if we take a different approach!
Let us consider binary strings. Say an uppercase char is 1 and a lowercase one is 0. For example
AaAaB -> 10101
ABaa -> 1100
a -> 0
A "correct" alternating chain is either 10101010.. or 010101010..
We call the minimum number of substitutions required to change one string into the other the Hamming distance between the strings. What we have to find is the minimum Hamming distance between the input binary string and one of the two alternating chains of the same length.
It's not difficult: we XOR the two strings and then count the number of 1s in the result. For example, let's consider the following string: ABaa.
We convert it to binary:
ABaa -> 1100
We generate the only two alternating chains of length 4:
1010
0101
We XOR them with the input:
1100 XOR 1010 = 0110
1100 XOR 0101 = 1001
We count the 1s in the results and take the minimum. In this case, it's 2.
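Here is a minimal Python sketch of the XOR idea. This is not the Java submission mentioned in this answer, and the function name is made up. It builds the input and both alternating chains as integers, XORs them and counts the 1 bits.

```python
def min_toggles_xor(s):
    n = len(s)
    if n == 0:
        return 0
    # Uppercase -> 1, lowercase -> 0, read as one binary number.
    bits = int("".join("1" if ch.isupper() else "0" for ch in s), 2)
    # The only two alternating chains of length n.
    chain_1 = int(("10" * n)[:n], 2)  # 1010...
    chain_0 = int(("01" * n)[:n], 2)  # 0101...
    # Hamming distance = number of 1 bits in the XOR.
    dist_1 = bin(bits ^ chain_1).count("1")
    dist_0 = bin(bits ^ chain_0).count("1")
    return min(dist_1, dist_0)


print(min_toggles_xor("ABaa"))
```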
I coded this procedure in Java with some minor optimizations (buffered I/O, no real need to generate the alternating chains) and it got accepted (0.60 seconds).
Given any string s of length n, there are only two possible "alternating chains".
These 2 variants are fully determined by the case of the first letter (if the first is upper, then the second is lower, the third is upper...).
A simple linear algorithm would be to make 2 simple assumptions about the first letter:
First letter is UpperCase
First letter is LowerCase
For each assumption, run a simple edit distance algorithm (here just a count of the letters with the wrong case) and you are done.
You can do it recursively, but you'll need to pass and return a lot of state information between functions, which I think is not worthwhile when this problem can be solved by a simple loop.
As the others say, there are two possible "desired result" strings: one starts with an uppercase letter (let's call it result_U) and one starts with a lowercase letter (result_L). We want the smaller of EditDistance(input, result_U) and EditDistance(input, result_L).
Also observe that, to calculate EditDistance(input, result_U), we do not need to generate result_U, we just need to scan input 1 character at a time, and each character that is not the expected case will need 1 edit to make it the correct case, i.e. adds 1 to the edit distance. Ditto for EditDistance(input, result_L).
Also, we can combine the two loops so that we scan input only once. In fact, this can be done while reading each input string.
A naive approach would look like this:
Pseudocode:
EditDistance_U = 0
EditDistance_L = 0
Repeat:
    Read a character
    To arrive at result_U, does this character need editing?
        Yes => EditDistance_U += 1
        No => Do nothing
    To arrive at result_L, does this character need editing?
        Yes => EditDistance_L += 1
        No => Do nothing
Loop until end of string
EditDistance = min(EditDistance_U, EditDistance_L)
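For reference, the naive version translates to Python roughly as follows. This is a sketch: the function name is made up, and it deliberately keeps both counters and both conditionals.

```python
def min_edits(s):
    edits_u = 0  # edits needed to reach the chain starting with uppercase
    edits_l = 0  # edits needed to reach the chain starting with lowercase
    for i, ch in enumerate(s):
        # In result_U, even positions are uppercase and odd ones lowercase.
        upper_expected_u = (i % 2 == 0)
        if ch.isupper() != upper_expected_u:
            edits_u += 1
        # result_L expects exactly the opposite case at every position.
        if ch.isupper() != (not upper_expected_u):
            edits_l += 1
    return min(edits_u, edits_l)


print(min_edits("ABaa"))
```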
There are obvious optimizations that can be done to the above also, but I'll leave it to you.
Hint 1: Do we really need 2 conditionals in the loop? How are they related to each other?
Hint 2: What is EditDistance_U + EditDistance_L?

Make palindrome from given word

I am given a word such as abca. I want to know how many letters I need to add to make it a palindrome.
In this case it's 1, because if I add b, I get abcba.
First, let's consider an inefficient recursive solution:
Suppose the string is of the form aSb, where a and b are letters and S is a substring.
If a==b, then f(aSb) = f(S).
If a!=b, then you need to add a letter: either add an a at the end, or add a b in the front. We need to try both and see which is better. So in this case, f(aSb) = 1 + min(f(aS), f(Sb)).
This can be implemented with a recursive function which will take exponential time to run.
To improve performance, note that this function will only be called with substrings of the original string. There are only O(n^2) such substrings. So by memoizing the results of this function, we reduce the time taken to O(n^2), at the cost of O(n^2) space.
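A minimal memoized sketch of this recursion in Python follows. The function names are made up, and substrings are represented by index pairs (i, j) instead of copies, which is an implementation choice of this sketch. For long inputs the recursion depth may need attention; a bottom-up table would avoid that.

```python
from functools import lru_cache


def min_insertions(s):
    @lru_cache(maxsize=None)
    def f(i, j):
        # f(i, j) = letters to add so that s[i..j] becomes a palindrome.
        if i >= j:
            return 0
        if s[i] == s[j]:
            # a == b: f(aSb) = f(S)
            return f(i + 1, j - 1)
        # a != b: f(aSb) = 1 + min(f(aS), f(Sb))
        return 1 + min(f(i, j - 1), f(i + 1, j))

    return f(0, len(s) - 1)


print(min_insertions("abca"))
```

Because f only ever sees O(n^2) different index pairs, the cache gives the O(n^2) time and space bound mentioned above.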
The basic algorithm would look like this:
Iterate over half the string and check whether the same character exists at the mirrored position at the other end (i.e., if you have abca, then the first character is an a and the string also ends with a).
If they match, then proceed to the next character.
If they don't match, then note that a character needs to be added.
Note that you can only move backwards from the end when the characters match. For example, if the string is abcdeffeda, then the outer characters match. We then need to consider bcdeffed. The outer characters don't match, so a b needs to be added. But we don't want to continue with cdeffe (i.e., removing/ignoring both outer characters); we simply remove b and continue with cdeffed. Similarly for c, which means our algorithm returns 2 string modifications and not more. Be aware that always dropping the left character does not guarantee the minimum: for aab it counts 2, although adding a single b in front (baab) is enough, so both sides have to be tried to get the true minimum.
