Given a string, say "OhMy", keep the uppercase letters fixed (unchanged), but allow the lowercase letters to change position. Output all possible permutations.
E.g., given "OhMy" it should output ["OhMy", "OyMh"].
Here is what I did:
public static List<String> Permutation(String s) {
    List<String> res = new ArrayList<String>();
    if (s == null || s.length() == 0) {
        return res;
    }
    StringBuilder path = new StringBuilder(s);
    List<Character> candidates = new ArrayList<Character>();
    List<Integer> position = new ArrayList<Integer>();
    for (int i = 0; i < s.length(); i++) {
        char c = s.charAt(i);
        if (Character.isAlphabetic(c) && Character.isLowerCase(c)) {
            candidates.add(c);
            position.add(i);
        }
    }
    boolean[] occurred = new boolean[candidates.size()];
    helper(res, path, candidates, position, 0);
    return res;
}

public static void helper(List<String> res, StringBuilder path, List<Character> candidates, List<Integer> position, int index) {
    if (index == position.size()) {
        res.add(path.toString());
        return;
    }
    for (int i = index; i < position.size(); i++) {
        for (int j = 0; j < candidates.size(); j++) {
            path.setCharAt(position.get(i), candidates.get(j));
            char c = candidates.remove(j);
            helper(res, path, candidates, position, index + 1);
            candidates.add(j, c);
        }
    }
}
for input "Abc"
it will have result [Abc, Acb, Acc, Acb]
Essentially, the outer loop is iterating every possible position, inner loop tries every possible candidates at each possible position.
I don't know why it has duplicates li "Acc, Acb"
It seems like the main point of your implicit question is how to efficiently enumerate all permutations of a given set, which you can read about online (there are several methods). If you can enumerate all permutations of the indices of lowercase letters, then it's pretty straightforward to do the bookkeeping and merge each permutation of lowercase letters with the original, unchanged set of uppercase letters, respecting the positions of the uppercase letters, so you can output your strings. If you're having difficulty with that part, update your question and someone should be able to help you out.
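(Incidentally, the duplicates in your output appear to come from the extra outer loop over i in helper: each recursive call should fill only position.get(index), and since path is never restored after setCharAt, stale characters leak into later results.)

A minimal Java sketch of the enumerate-and-merge bookkeeping (the class and method names are mine, not a reference implementation): it collects the lowercase letters and their positions, sorts the letters, and skips equal letters whose earlier copy is still unused, so inputs with repeated lowercase letters don't produce duplicate strings.

import java.util.*;

public class LowercasePermutations {
    // All strings obtained by permuting only the lowercase letters of s;
    // every other character keeps its original position.
    public static List<String> permute(String s) {
        List<Integer> positions = new ArrayList<>();
        StringBuilder lowerChars = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            if (Character.isLowerCase(s.charAt(i))) {
                positions.add(i);
                lowerChars.append(s.charAt(i));
            }
        }
        char[] lowers = lowerChars.toString().toCharArray();
        Arrays.sort(lowers); // sort so duplicate letters sit next to each other
        List<String> res = new ArrayList<>();
        backtrack(new StringBuilder(s), positions, lowers, new boolean[lowers.length], 0, res);
        return res;
    }

    private static void backtrack(StringBuilder path, List<Integer> positions,
                                  char[] lowers, boolean[] used, int idx, List<String> res) {
        if (idx == positions.size()) {
            res.add(path.toString());
            return;
        }
        for (int j = 0; j < lowers.length; j++) {
            if (used[j]) continue;
            // skip duplicates: only the first unused copy of equal letters is tried
            if (j > 0 && lowers[j] == lowers[j - 1] && !used[j - 1]) continue;
            used[j] = true;
            path.setCharAt(positions.get(idx), lowers[j]);
            backtrack(path, positions, lowers, used, idx + 1, res);
            used[j] = false;
        }
    }
}

For "OhMy" this yields ["OhMy", "OyMh"], as in your example.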
I have a long list of lines in (possibly) random order. So basically:
struct Line
{
    Vector StartPos;
    Vector EndPos;
};
Now I'm looking for an efficient way to sort these lines so that they are chained into spans, i.e. if line A's StartPos matches line B's EndPos, A gets moved into the list immediately after B. If nothing matches, it just goes to the end of the list to start a new span.
Right now I'm doing it brute force: setting a flag variable if anything was changed, and if anything changed, sorting it again. This takes a huge number of passes over the list. Is there any faster way to optimize this so that the number of iterations stays manageable?
If you do not have lines that start or end at the same point, maybe you can use dictionaries to reduce the lookups. Something like:
public class Line
{
    public Point StartPos;
    public Point EndPos;
    public bool isUsed = false;
}
and then 1) create a dictionary whose key is the StartPos and whose value is the index of that element in your list, 2) for each element of the list, follow the chain using the dictionary. Something like:
List<List<Line>> result = new List<List<Line>>();
// Key each line by its start point so we can follow EndPos -> StartPos links.
Dictionary<Point, int> dic = new Dictionary<Point, int>();
for (int kk = 0; kk < mylines.Count; kk++)
{
    dic[mylines[kk].StartPos] = kk;
}
for (int kk = 0; kk < mylines.Count; kk++)
{
    if (mylines[kk].isUsed == false)
    {
        var orderline = new List<Line>();
        mylines[kk].isUsed = true;
        orderline.Add(mylines[kk]);
        int mm = kk;
        // The next line in the span is the one that starts where this one ends.
        while (dic.ContainsKey(mylines[mm].EndPos) && !mylines[dic[mylines[mm].EndPos]].isUsed)
        {
            mm = dic[mylines[mm].EndPos];
            mylines[mm].isUsed = true;
            orderline.Add(mylines[mm]);
        }
        result.Add(orderline);
    }
}
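Building the dictionary is a single O(n) pass, and the chain-following loop visits each line exactly once, so the whole thing runs in expected O(n) rather than repeated full passes over the list.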
I'd like to make sure that I am doing the time complexity analysis correctly; there seem to be many different analyses of this.
Just in case people don't know the problem, this is the problem description.
Given two words (beginWord and endWord), and a dictionary's word list, find the length of shortest transformation sequence from beginWord to endWord, such that:
Only one letter can be changed at a time.
Each transformed word must exist in the word list. Note that beginWord is not a transformed word.
For example,
Given:
beginWord = "hit"
endWord = "cog"
wordList = ["hot","dot","dog","lot","log","cog"]
As one shortest transformation is "hit" -> "hot" -> "dot" -> "dog" -> "cog",
return its length 5.
And this is a simple BFS algorithm.
static int ladderLength(String beginWord, String endWord, List<String> wordList) {
    int level = 1;
    Deque<String> queue = new LinkedList<>();
    queue.add(beginWord);
    queue.add(null); // null is a sentinel marking the end of a BFS level
    Set<String> visited = new HashSet<>();
    // worst case we can add all dictionary thus N (len(dict)) computation
    while (!queue.isEmpty()) {
        String word = queue.removeFirst();
        if (word != null) {
            if (word.equals(endWord)) {
                return level;
            }
            // m * 26 * log N
            for (int i = 0; i < word.length(); i++) {
                char[] chars = word.toCharArray();
                for (char c = 'a'; c <= 'z'; c++) {
                    chars[i] = c;
                    String newStr = new String(chars);
                    if (!visited.contains(newStr) && wordList.contains(newStr)) {
                        queue.add(newStr);
                        visited.add(newStr);
                    }
                }
            }
        } else {
            level++;
            if (!queue.isEmpty()) {
                queue.add(null);
            }
        }
    }
    return 0;
}
wordList (the dictionary) contains N elements, and the length of beginWord is m.
In the worst case, the queue would contain all the elements of the word list, so the outer while loop runs O(N) times.
For each word (length m), it tries 26 characters ('a' to 'z'), so the nested for loops cost O(26*m); inside them, it calls wordList.contains, which I assume is O(log N).
So overall it's O(N * m * 26 * log N) = O(N * m * log N).
Is this correct?
The List<T> type does not automatically sort its elements, but instead "faithfully" keeps all elements in the order they were added. So wordList.contains is in fact O(N), not O(log N). However, for a HashSet such as visited, this operation is O(1) (amortized), so consider switching to that for the dictionary lookups as well.
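For example, a sketch of the change against the method above (dict is my name for the copied set):

// one O(N) pass up front, at the top of ladderLength:
Set<String> dict = new HashSet<>(wordList);
// ... then in the inner loop, test against the set instead of the list:
if (!visited.contains(newStr) && dict.contains(newStr)) { // O(1) amortized
    queue.add(newStr);
    visited.add(newStr);
}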
I've been trying for a few days to think of a simple case where my solution to the word-ladder problem breaks down. I tried to implement a DP solution with memoization. I would greatly appreciate an explanation of why DP doesn't work here. Here is how I implemented my (incorrect) DP solution.
public class Solution {
    public int ladderLength(String beginWord, String endWord, List<String> wordList) {
        int[] visited = new int[wordList.size()];
        HashMap<String, Integer> map = new HashMap<>();
        int res = ladderHelper(beginWord, endWord, wordList, visited, map);
        return res;
    }

    private int ladderHelper(String beginWord, String endWord, List<String> wordList, int[] visited, HashMap<String, Integer> map) {
        if (beginWord.equals(endWord)) return 1;
        int bestSeen = 0;
        for (int i = 0; i < wordList.size(); i++) {
            if (visited[i] == 1) continue;
            if (!validJump(beginWord, wordList.get(i))) continue;
            if (map.containsKey(wordList.get(i))) {
                int val = map.get(wordList.get(i));
                if (val != 0 && val + 1 < bestSeen) bestSeen = map.get(wordList.get(i)) + 1;
            } else {
                visited[i] = 1;
                int distance = ladderHelper(wordList.get(i), endWord, wordList, visited, map);
                visited[i] = 0;
                if (distance != 0 && (bestSeen == 0 || distance + 1 < bestSeen)) bestSeen = distance + 1;
            }
        }
        map.put(beginWord, bestSeen);
        return bestSeen;
    }

    private boolean validJump(String a, String b) {
        int mistakes = 0;
        for (int i = 0; i < a.length(); i++) {
            if (a.charAt(i) != b.charAt(i) && ++mistakes > 1) return false;
        }
        return true;
    }
}
The question is given in more detail here.
I think this code has one trivial and one interesting problem.
Trivial bug
In the line:
if (val != 0 && val + 1 < bestSeen) bestSeen = map.get(wordList.get(i)) + 1;
if bestSeen is equal to 0 (which is the case when all values so far have come from the cache), then this condition will never fire. You need something more like:
if (val != 0 && (bestSeen == 0 || val + 1 < bestSeen)) bestSeen = map.get(wordList.get(i)) + 1;
The effect of this is that sometimes a shorter route will be ignored.
Interesting bug
You are using DFS to try and find the shortest path. If you switch to using BFS I would expect your solution to pass.
The reason DFS fails is due to the visited array. The visited array is used to keep track of the words on the current path to prevent infinite recursion. The problem is that we ignore all paths that go through these visited nodes.
At first sight this seems fine, after all our shortest path will never need to loop round on itself!
However, consider a pattern of words represented by a graph (the original answer illustrated this with a diagram).
Imagine that your DFS code has visited A, B, C, D.
When it visits D, it looks at D's neighbours, sees that they are all visited, and concludes that there cannot be a route from D to the end!
When the algorithm backtracks, it will eventually try the route start -> D, but the cache will report that this route is impossible, so it will not find the shortest path.
For example, string "AAABBB" will have permutations:
"ABAABB",
"BBAABA",
"ABABAB",
etc.
What's a good algorithm for generating the permutations? (And what's its time complexity?)
For a multiset, you can solve recursively by position (JavaScript code):
function f(multiset, counters, result){
    if (counters.every(x => x === 0)){
        console.log(result);
        return;
    }
    for (var i = 0; i < counters.length; i++){
        if (counters[i] > 0){
            var _counters = counters.slice(); // copy, so sibling branches are unaffected
            _counters[i]--;
            f(multiset, _counters, result + multiset[i]);
        }
    }
}

f(['A','B'], [3,3], '');
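Since each position only tries distinct letters whose counters are still positive, every generated string is a distinct permutation; the total work is proportional to the output size, i.e. O(n * n!/(c1! * c2! * ...)) for a string of length n with letter counts c1, c2, and so on.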
This is not a full answer, just an idea.
If your strings have a fixed number of just two letters, I'd go with a binary tree and a suitable recursive function.
Each node is an object whose name is its parent's name plus the suffix A or B; it also stores the counts of A and B letters in its name.
The node constructor receives the parent's name and its counts of A and B, so it only needs to add 1 to the count of A or B and one letter to the name.
It doesn't construct the next node if there would be more than three A's (in the case of an A node), or B's respectively, or if the name's length already equals the length of the starting string.
Now you can collect the leaves of the two trees (their names) and you have all the permutations you need (see the sketch below).
Scala or some functional language (with object-like features) would be a good fit for implementing this algorithm. Hope this helps or at least sparks some ideas.
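A rough Java sketch of that idea (the method name and explicit counts are mine, and I use plain recursion instead of node objects):

import java.util.ArrayList;
import java.util.List;

public class TwoLetterPermutations {
    // Grow a name one letter at a time; never let 'A' or 'B' exceed its
    // target count, so each completed name is a distinct permutation.
    static void expand(String name, int countA, int countB, int maxA, int maxB, List<String> leaves) {
        if (name.length() == maxA + maxB) { // full length reached: this is a leaf
            leaves.add(name);
            return;
        }
        if (countA < maxA) expand(name + 'A', countA + 1, countB, maxA, maxB, leaves);
        if (countB < maxB) expand(name + 'B', countA, countB + 1, maxA, maxB, leaves);
    }

    public static void main(String[] args) {
        List<String> out = new ArrayList<>();
        expand("", 0, 0, 3, 3, out); // the "AAABBB" case: 20 distinct permutations
        System.out.println(out);
    }
}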
Since you actually want to generate the permutations instead of just counting them, the best complexity you can hope for is O(size_of_output).
Here's a good solution in Java that meets that bound and runs very quickly, while consuming negligible space. It first sorts the letters to find the lexicographically smallest permutation, and then generates all permutations in lexicographic order.
It's known as the Pandita algorithm: https://en.wikipedia.org/wiki/Permutation#Generation_in_lexicographic_order
import java.util.Arrays;
import java.util.function.Consumer;

public class UniquePermutations
{
    static void generateUniquePermutations(String s, Consumer<String> consumer)
    {
        char[] array = s.toCharArray();
        Arrays.sort(array);
        for (;;)
        {
            consumer.accept(String.valueOf(array));
            // Find the rightmost position whose character can still be increased.
            int changePos = array.length - 2;
            while (changePos >= 0 && array[changePos] >= array[changePos + 1])
                --changePos;
            if (changePos < 0)
                break; // all done
            // Find the rightmost suffix character greater than array[changePos].
            int swapPos = changePos + 1;
            while (swapPos + 1 < array.length && array[swapPos + 1] > array[changePos])
                ++swapPos;
            char t = array[changePos];
            array[changePos] = array[swapPos];
            array[swapPos] = t;
            // Reverse the (descending) suffix so it becomes the smallest ordering.
            for (int i = changePos + 1, j = array.length - 1; i < j; ++i, --j)
            {
                t = array[i];
                array[i] = array[j];
                array[j] = t;
            }
        }
    }

    public static void main(String[] args) throws java.lang.Exception
    {
        StringBuilder line = new StringBuilder();
        generateUniquePermutations("banana", s -> {
            if (line.length() > 0)
            {
                if (line.length() + s.length() >= 75)
                {
                    System.out.println(line.toString());
                    line.setLength(0);
                }
                else
                    line.append(" ");
            }
            line.append(s);
        });
        System.out.println(line);
    }
}
Here is the output:
aaabnn aaanbn aaannb aabann aabnan aabnna aanabn aananb aanban aanbna
aannab aannba abaann abanan abanna abnaan abnana abnnaa anaabn anaanb
anaban anabna ananab ananba anbaan anbana anbnaa annaab annaba annbaa
baaann baanan baanna banaan banana bannaa bnaaan bnaana bnanaa bnnaaa
naaabn naaanb naaban naabna naanab naanba nabaan nabana nabnaa nanaab
nanaba nanbaa nbaaan nbaana nbanaa nbnaaa nnaaab nnaaba nnabaa nnbaaa
What is the complexity of the algorithm used to find the smallest snippet that contains all the search keywords?
As stated, the problem is solved by a rather simple algorithm:
Just look through the input text sequentially from the very beginning and check each word: whether it is in the search key or not. If the word is in the key, add it to the end of the structure that we will call The Current Block. The Current Block is just a linear sequence of words, each word accompanied by the position at which it was found in the text.

The Current Block must maintain the following Property: the very first word in The Current Block must be present in The Current Block once and only once. If you add a new word to the end of The Current Block and the above Property becomes violated, you have to remove the very first word from the block. This process is called normalization of The Current Block. Normalization is a potentially iterative process, since once you remove the very first word from the block, the new first word might also violate The Property, so you'll have to remove it as well. And so on.
So, basically The Current Block is a FIFO sequence: the new words arrive at the right end, and get removed by normalization process from the left end.
All you have to do to solve the problem is look through the text, maintain The Current Block, normalizing it when necessary so that it satisfies The Property. The shortest block with all the keywords in it you ever build is the answer to the problem.
For example, consider the text
CxxxAxxxBxxAxxCxBAxxxC
with keywords A, B and C. Looking through the text you'll build the following sequence of blocks
C
CA
CAB - all words, length 9 (CxxxAxxxB...)
CABA - all words, length 12 (CxxxAxxxBxxA...)
CABAC - violates The Property, remove first C
ABAC - violates The Property, remove first A
BAC - all words, length 7 (...BxxAxxC...)
BACB - violates The Property, remove first B
ACB - all words, length 6 (...AxxCxB...)
ACBA - violates The Property, remove first A
CBA - all words, length 4 (...CxBA...)
CBAC - violates The Property, remove first C
BAC - all words, length 6 (...BAxxxC)
The best block we built has length 4, which is the answer in this case
CxxxAxxxBxxAxx CxBA xxxC
The exact complexity of this algorithm depends on the input, since it dictates how many iterations the normalization process will make, but ignoring the normalization the complexity would trivially be O(N * log M), where N is the number of words in the text and M is the number of keywords, and O(log M) is the complexity of checking whether the current word belongs to the keyword set.
Now, having said that, I have to admit that I suspect that this might not be what you need. Since you mentioned Google in the caption, it might be that the statement of the problem you gave in your post is not complete. Maybe in your case the text is indexed? (With indexing the above algorithm is still applicable, just becomes more efficient). Maybe there's some tricky database that describes the text and allows for a more efficient solution (like without looking through the entire text)? I can only guess and you are not saying...
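For what it's worth, here is a compact Java sketch of the Current Block bookkeeping described above (the names are mine; positions of keyword occurrences go in a deque, and a count map enforces The Property):

import java.util.*;

public class SmallestSnippet {
    // Returns {firstPos, lastPos} of the smallest window of text containing
    // every keyword, or null if some keyword never occurs.
    static int[] smallestSnippet(List<String> text, Set<String> keys) {
        Deque<Integer> block = new ArrayDeque<>();     // positions of keyword occurrences, in order
        Map<String, Integer> counts = new HashMap<>(); // occurrences of each keyword inside the block
        int[] best = null;
        for (int i = 0; i < text.size(); i++) {
            String w = text.get(i);
            if (!keys.contains(w)) continue;
            block.addLast(i);
            counts.merge(w, 1, Integer::sum);
            // Normalize: the first word of the block must occur exactly once.
            while (counts.get(text.get(block.peekFirst())) > 1) {
                counts.merge(text.get(block.pollFirst()), -1, Integer::sum);
            }
            if (counts.size() == keys.size()) { // the block now covers all keywords
                int lo = block.peekFirst(), hi = block.peekLast();
                if (best == null || hi - lo < best[1] - best[0]) best = new int[]{lo, hi};
            }
        }
        return best;
    }
}

On the CxxxAxxxBxxAxxCxBAxxxC example (one token per character) this returns {14, 17}, i.e. the CxBA window of length 4.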
I think the solution proposed by AndreyT assumes no duplicates exist in the keywords/search terms. Also, the current block can get as big as the text itself if the text contains a lot of duplicate keywords.
For example:
Text: 'ABBBBBBBBBB'
Keyword text: 'AB'
Current Block: 'ABBBBBBBBBB'
Anyway, I have implemented it in C# and did some basic testing; it would be nice to get some feedback on whether it works or not :)
static string FindMinWindow(string text, string searchTerms)
{
    Dictionary<char, bool> searchIndex = new Dictionary<char, bool>();
    foreach (var item in searchTerms)
    {
        searchIndex.Add(item, false);
    }
    Queue<Tuple<char, int>> currentBlock = new Queue<Tuple<char, int>>();
    int noOfMatches = 0;
    int minLength = Int32.MaxValue;
    int startIndex = 0;
    for (int i = 0; i < text.Length; i++)
    {
        char item = text[i];
        if (searchIndex.ContainsKey(item))
        {
            if (!searchIndex[item])
            {
                noOfMatches++;
            }
            searchIndex[item] = true;
            var newEntry = new Tuple<char, int>(item, i);
            currentBlock.Enqueue(newEntry);
            // Normalization step.
            while (currentBlock.Count(o => o.Item1.Equals(currentBlock.First().Item1)) > 1)
            {
                currentBlock.Dequeue();
            }
            // Figuring out minimum length.
            if (noOfMatches == searchTerms.Length)
            {
                var length = currentBlock.Last().Item2 - currentBlock.First().Item2 + 1;
                if (length < minLength)
                {
                    startIndex = currentBlock.First().Item2;
                    minLength = length;
                }
            }
        }
    }
    return noOfMatches == searchTerms.Length ? text.Substring(startIndex, minLength) : String.Empty;
}
This is an interesting question.
To restate it more formally:
Given a list L (the web page) of length n and a set S (the query) of size k, find the smallest sublist of L that contains all the elements of S.
I'll start with a brute-force solution in hopes of inspiring others to beat it.
Note that set membership can be done in constant time, after one pass through the set. See this question.
Also note that this assumes all the elements of S are in fact in L, otherwise it will just return the sublist from 1 to n.
best = (1, n)
For i from 1 to n-k:
    Create/reset a hash found[] mapping each element of S to False, and set counter = 0.
    For j from i to n, stopping early once counter == k:
        If L[j] is in S and not found[L[j]], then counter++ and let found[L[j]] = True.
    If counter == k and j-i < best[2]-best[1], then let best = (i, j).
Time complexity is O((n+k)(n-k)), i.e. n^2-ish.
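A direct Java translation of that brute force might look like this (a sketch; the class, method, and variable names are mine):

import java.util.*;

public class BruteForceWindow {
    // For each start index i, scan right until all k query terms have been
    // seen, and remember the shortest such window.
    static int[] smallestWindow(List<String> doc, Set<String> query) {
        int n = doc.size(), k = query.size();
        int[] best = {0, n - 1}; // falls back to the whole list, as noted above
        for (int i = 0; i < n; i++) {
            Set<String> found = new HashSet<>();
            for (int j = i; j < n; j++) {
                if (query.contains(doc.get(j)) && found.add(doc.get(j)) && found.size() == k) {
                    if (j - i < best[1] - best[0]) best = new int[]{i, j};
                    break; // extending j further can only lengthen the window
                }
            }
        }
        return best;
    }

    public static void main(String[] args) {
        List<String> doc = Arrays.asList("C", "x", "A", "x", "B", "A", "C");
        Set<String> query = new HashSet<>(Arrays.asList("A", "B", "C"));
        System.out.println(Arrays.toString(smallestWindow(doc, query))); // prints [4, 6]
    }
}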
Here's a solution using Java 8.
static Map.Entry<Integer, Integer> documentSearch(Collection<String> document, Collection<String> query) {
    Queue<KeywordIndexPair> queue = new ArrayDeque<>(query.size());
    HashSet<String> words = new HashSet<>();
    query.stream()
            .forEach(words::add);
    AtomicInteger idx = new AtomicInteger();
    IndexPair interval = new IndexPair(0, Integer.MAX_VALUE);
    AtomicInteger size = new AtomicInteger();
    document.stream()
            .map(w -> new KeywordIndexPair(w, idx.getAndIncrement()))
            .filter(pair -> words.contains(pair.word)) // Queue.contains is O(n) so we trade space for efficiency
            .forEach(pair -> {
                // only the first and last elements are useful to the algorithm, so we don't bother removing
                // an element from any other index. note that removing an element using equality
                // from an ArrayDeque is O(n)
                KeywordIndexPair first = queue.peek();
                if (pair.equals(first)) {
                    queue.remove();
                }
                queue.add(pair);
                first = queue.peek();
                int diff = pair.index - first.index;
                if (size.incrementAndGet() == words.size() && diff < interval.interval()) {
                    interval.begin = first.index;
                    interval.end = pair.index;
                    size.set(0);
                }
            });
    return new AbstractMap.SimpleImmutableEntry<>(interval.begin, interval.end);
}
There are two static nested classes, KeywordIndexPair and IndexPair, whose implementations should be apparent from the names. In a smarter programming language that supports tuples, those classes wouldn't be necessary.
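For completeness, here is one plausible sketch of those two helpers, inferred from how the method above uses them (note that equality is on the word only, since the queue comparison is meant to match keywords, not positions):

static class KeywordIndexPair {
    final String word;
    final int index;

    KeywordIndexPair(String word, int index) {
        this.word = word;
        this.index = index;
    }

    @Override
    public boolean equals(Object o) {
        // compare by keyword only: used to detect a stale head of the queue
        return o instanceof KeywordIndexPair && ((KeywordIndexPair) o).word.equals(word);
    }

    @Override
    public int hashCode() {
        return word.hashCode();
    }
}

static class IndexPair {
    int begin, end;

    IndexPair(int begin, int end) {
        this.begin = begin;
        this.end = end;
    }

    int interval() {
        return end - begin;
    }
}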
Test:
Document: apple, banana, apple, apple, dog, cat, apple, dog, banana, apple, cat, dog
Query: banana, cat
Interval: 8, 10
For all the words, maintain the min and max index, in case a word has more than one occurrence; if not, min and max will be the same.
import edu.princeton.cs.algs4.ST;

public class DicMN {
    ST<String, Words> st = new ST<>();

    public class Words {
        int min;
        int max;

        public Words(int index) {
            min = index;
            max = index;
        }
    }

    public int findMinInterval(String[] sw) {
        int begin = Integer.MAX_VALUE;
        int end = Integer.MIN_VALUE;
        for (int i = 0; i < sw.length; i++) {
            if (st.contains(sw[i])) {
                Words w = st.get(sw[i]);
                begin = Math.min(begin, w.min);
                end = Math.max(end, w.max);
            }
        }
        if (begin != Integer.MAX_VALUE) {
            return (end - begin) + 1;
        }
        return 0;
    }

    public void put(String[] dw) {
        for (int i = 0; i < dw.length; i++) {
            if (!st.contains(dw[i])) {
                st.put(dw[i], new Words(i));
            } else {
                Words w = st.get(dw[i]);
                w.min = Math.min(w.min, i);
                w.max = Math.max(w.max, i);
            }
        }
    }

    public static void main(String[] args) {
        DicMN dic = new DicMN();
        String[] arr1 = { "one", "two", "three", "four", "five", "six", "seven", "eight" };
        dic.put(arr1);
        String[] arr2 = { "two", "five" };
        System.out.print("Interval:" + dic.findMinInterval(arr2));
    }
}
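For the arrays in main, this prints Interval:4, since "two" is at index 1 and "five" is at index 4, so the interval spans indices 1 through 4, inclusive.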