Two sum data structure problems - algorithm

I built a data structure for the two-sum problem. It has two methods, add and find.
add - Add the number to an internal data structure.
find - Find if there exists any pair of numbers whose sum is equal to the value.
For example:
add(1); add(3); add(5);
find(4) // return true
find(7) // return false
The following is my code. What is wrong with it?
http://www.lintcode.com/en/problem/two-sum-data-structure-design/
This is the test website; some test cases do not pass.
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
public class TwoSum {
private List<Integer> sets;
TwoSum() {
this.sets = new ArrayList<Integer>();
}
// Add the number to an internal data structure.
public void add(int number) {
// Write your code here
this.sets.add(number);
}
// Find if there exists any pair of numbers which sum is equal to the value.
public boolean find(int value) {
// Write your code here
Collections.sort(sets);
for (int i = 0; i < sets.size(); i++) {
if (sets.get(i) > value) break;
for (int j = i + 1; j < sets.size(); j++) {
if (sets.get(i) + sets.get(j) == value) {
return true;
}
}
}
return false;
}
}

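The pairwise check itself is fine, but the early exit `if (sets.get(i) > value) break;` is only valid when all numbers are non-negative. With negative numbers it stops too soon: after add(-3) and add(-1), find(-4) breaks on the first element (because -3 > -4) and returns false even though -3 + -1 == -4.
A coding challenge may also require a more performant solution. You check every item against every item, which takes O(N^2), and you re-sort the list on every call to find.
The best way to implement find is with a HashMap from each number to its count, which takes O(N): for every stored number x, look up value - x in the map.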
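The HashMap approach can be sketched as follows. This is a minimal illustration rather than the judge's reference solution; the class name TwoSumCounts is made up for the example, and the count per number is kept so that a pair of equal values (for example 2 + 2) is only reported when the number was added at least twice.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch; TwoSumCounts is a hypothetical name, not part of the original code.
class TwoSumCounts {
    // Maps each added number to how many times it was added.
    private final Map<Integer, Integer> counts = new HashMap<>();

    // Expected O(1): record one more occurrence of number.
    public void add(int number) {
        counts.merge(number, 1, Integer::sum);
    }

    // Expected O(n): for each distinct x, check whether value - x is also stored.
    public boolean find(int value) {
        for (Map.Entry<Integer, Integer> entry : counts.entrySet()) {
            long x = entry.getKey();
            long complement = (long) value - x; // long arithmetic avoids int overflow
            if (complement < Integer.MIN_VALUE || complement > Integer.MAX_VALUE) {
                continue; // no int can complete this pair
            }
            if (complement == x) {
                // A pair of equal numbers needs at least two occurrences.
                if (entry.getValue() >= 2) {
                    return true;
                }
            } else if (counts.containsKey((int) complement)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        TwoSumCounts ts = new TwoSumCounts();
        ts.add(1);
        ts.add(3);
        ts.add(5);
        System.out.println(ts.find(4)); // true  (1 + 3)
        System.out.println(ts.find(7)); // false
    }
}
```

Unlike the sorted-list version, this does not rely on any ordering, so negative numbers need no special handling.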

Related

Best way to refactor current method

I tried really hard to refactor this code but was unsuccessful. Please tell me how to go about it. I have spent hours trying to find a solution. I have read some excerpts from the book Clean Code; however, being a beginner, I find it really hard to refactor. Sorry, this is my first honest attempt, but I cannot figure out how to bring this function down to about 4 lines or fewer.
public boolean[] validateTrueFalse(String[] checkBoxValues) {
boolean[] answer = new boolean[checkBoxValues.length];
for (int i = 0; i < checkBoxValues.length; i++) {
// values are like 1_true
String[] values = checkBoxValues[i].split("_"); // split each value
// from my array
int configId = Integer.parseInt(values[0]);
boolean isAns = Boolean.parseBoolean(values[1]);
for (TrueFalseConfigurationModel tm : dt.getTfModelList()) {
if (tm.getConfiguration_id() == configId) {
if (tm.isAnswer() == isAns) { // are values from both true
answer[i] = true;
} else {
answer[i] = false;
}
}
}
}
return answer;
}
Remember that shorter doesn't necessarily mean better. A longer method is often more readable and easier to understand and maintain. You will sometimes need to look at your code a year or two after you wrote it, and it isn't worth much if you made it so short that you can no longer tell what the method was meant to do. Of course, the other extreme should also be avoided: a method that is too long is not modular and can be difficult to understand when you want to change only a specific part of it.
In my opinion, the method you wrote is a good length and doesn't need to be shortened.
But to answer your question, you can always shorten a method by dividing it into several methods. For example, in your case:
public boolean[] validateTrueFalse(String[] checkBoxValues) {
    boolean[] answer = new boolean[checkBoxValues.length];
    for (int i = 0; i < checkBoxValues.length; i++) {
        answer[i] = getAnswer(checkBoxValues[i]);
    }
    return answer;
}
// Returns true when a checkbox value such as "1_true" matches the configured answer.
public boolean getAnswer(String aCheckBoxValue) {
    String[] values = aCheckBoxValue.split("_");
    int configId = Integer.parseInt(values[0]);
    boolean isAns = Boolean.parseBoolean(values[1]);
    for (TrueFalseConfigurationModel tm : dt.getTfModelList()) {
        if (tm.getConfiguration_id() == configId) {
            return tm.isAnswer() == isAns;
        }
    }
    // No configuration with this id was found.
    return false;
}
Notice how I divided the one big action in the method into smaller actions, which produced shorter methods. You can continue in the same manner and divide the getAnswer method itself into two methods if you can find a logical way to split it.
You can reduce
if (tm.isAnswer() == isAns) { // are values from both true
answer[i] = true;
} else {
answer[i] = false;
}
to
answer[i] = tm.isAnswer() == isAns;

Comparing each string in a DataTable with a list takes a long time (poor performance)

I have a DataTable of 200,000 rows and want to validate each row against a list of codes (codesList).
It is taking a very long time. I want to improve the performance.
for (int i = 0; i < dataTable.Rows.Count; i++)
{
    bool isCodeValid = CheckIfValidCode(codevar, codesList, out CodesCount);
}
private bool CheckIfValidCode(string codevar, List<Codes> codesList, out int count)
{
    bool bRetVal;
    List<Codes> tempcodes = codesList.Where(code => code.StdCode.Equals(codevar)).ToList();
    count = tempcodes.Count;
    if (tempcodes.Count == 0)
    {
        bRetVal = false;
    }
    else
    {
        bRetVal = true;
    }
    return bRetVal;
}
codesList is a list that also contains 200,000 records. Please suggest an improvement. I tried FindAll, which takes the same time, and a LINQ query, which also takes the same time.
A few optimizations come to mind:
You could start by removing the ToList() call altogether.
Replace the Count check with .Any(), which returns true as soon as one item matches.
It's probably also a lot faster to replace the List with a HashSet<Codes> (this requires your Codes class to implement GetHashCode and Equals properly). Alternatively, you could populate a HashSet<string> with the StdCode of each item.
It looks like you're not using the out count at all. Removing it would make this method a lot faster, because computing a count requires checking all codes.
You could also split the List into a Dictionary<char, List<Codes>> keyed by the first character of the code. That would drastically reduce the number of codes to check, since you can exclude about 95% of the codes by their first character.
Tell string.Equals to use a StringComparison of type Ordinal or OrdinalIgnoreCase to speed up the comparison.
It looks like you can stop processing a lot earlier as well. The use of .Any() takes care of that in the second method. A similar construct can be used in the first: instead of using for and looping through each row, you could short-circuit after the first failure is found (unless this code is incomplete and you mark each row as invalid individually).
Something like:
private bool CheckIfValidCode(string codevar, List<Codes> codesList)
{
    HashSet<string> codes = new HashSet<string>(codesList.Select(c => c.StdCode));
    return codes.Contains(codevar);
    // or: return codes.Any(c => string.Equals(codevar, c, StringComparison.Ordinal));
}
If you're adamant about the count:
private bool CheckIfValidCode(string codevar, List<Codes> codesList, out int count)
{
    // A HashSet stores each code only once, so count the matches on the list itself.
    count = codesList.Count(c => string.Equals(codevar, c.StdCode, StringComparison.Ordinal));
    return count > 0;
}
You can optimize further by creating the HashSet outside of the call and reusing the instance:
InCallingCode
{
    ...
    HashSet<string> codes = new HashSet<string>(codesList.Select(c => c.StdCode));
    for (/*loop*/) {
        bool isValid = CheckIfValidCode(codevar, codes, out int count);
    }
    ....
}
private bool CheckIfValidCode(string codevar, HashSet<string> codes, out int count)
{
    // A set holds each code at most once, so count is either 0 or 1 here.
    count = codes.Contains(codevar) ? 1 : 0;
    return count > 0;
}

How to efficiently add the entire English dictionary to a trie data structure

Simply put I want to check if a specified word exists or not.
The lookup needs to be very fast which is why I decided to store the dictionary in a trie. So far so good! My trie works without issues. The problem is filling the trie with a dictionary. What I'm currently doing is looping through every line of a plain text file that is the dictionary and adding each word to my trie.
This is, understandably, an extremely slow process. The file contains just about 120 000 lines. If anyone could point me in the right direction, it would be much appreciated!
This is how I add words to the trie (in Boo):
trie = Trie()
saol = Resources.Load("saol") as TextAsset
text = saol.text.Split(char('\n'))
for new_word in text:
trie.Add(new_word)
And this is my trie (in C#):
using System.Collections.Generic;
public class TrieNode {
public char letter;
public bool word;
public Dictionary<char, TrieNode> child;
public TrieNode(char letter) {
this.letter = letter;
this.word = false;
this.child = new Dictionary<char, TrieNode>();
}
}
public class Trie {
private TrieNode root;
public Trie() {
root = new TrieNode(' ');
}
public void Add(string word) {
TrieNode node = root;
bool found_letter;
int c = 1;
foreach (char letter in word) {
found_letter = false;
// if current letter is in child list, set current node and break loop
foreach (var child in node.child) {
if (letter == child.Key) {
node = child.Value;
found_letter = true;
break;
}
}
// if current letter is not in child list, add child node and set it as current node
if (!found_letter) {
TrieNode new_node = new TrieNode(letter);
if (c == word.Length) new_node.word = true;
node.child.Add(letter, new_node);
node = node.child[letter];
}
c ++;
}
}
public bool Find(string word) {
TrieNode node = root;
bool found_letter;
int c = 1;
foreach (char letter in word) {
found_letter = false;
// check if current letter is in child list
foreach (var child in node.child) {
if (letter == child.Key) {
node = child.Value;
found_letter = true;
break;
}
}
if (found_letter && node.word && c == word.Length) return true;
else if (!found_letter) return false;
c ++;
}
return false;
}
}
Assuming that you don't have any serious implementation problems, pay the price for populating the trie once. After you've populated the trie, serialize it to a file. For future runs, just load the serialized version. That should be faster than reconstructing the trie.
-- ADDED --
Looking closely at your TrieNode class, you may want to replace the Dictionary you used for child with an array. You may consume more space, but you get a faster lookup time.
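The array suggestion can be sketched as below, in Java for illustration. It assumes words contain only the lowercase letters a-z, which is an assumption made for the example and would not hold for a dictionary with other characters; ArrayTrie and its members are hypothetical names. Each child lookup is a single array index instead of a scan over the child entries, and the end-of-word flag is set after the loop, so a word that is a prefix of an earlier word (adding "cat" after "cats") is still recorded.

```java
// Illustrative sketch; assumes lowercase a-z only. All names here are hypothetical.
class ArrayTrie {
    private static final int ALPHABET = 26;

    private static final class Node {
        final Node[] children = new Node[ALPHABET]; // slot i holds the child for 'a' + i
        boolean isWord;
    }

    private final Node root = new Node();

    // Walks or creates one node per letter, then marks the last node as a word.
    public void add(String word) {
        Node node = root;
        for (int i = 0; i < word.length(); i++) {
            int index = word.charAt(i) - 'a';
            if (index < 0 || index >= ALPHABET) {
                throw new IllegalArgumentException("unsupported character in: " + word);
            }
            if (node.children[index] == null) {
                node.children[index] = new Node();
            }
            node = node.children[index];
        }
        node.isWord = true;
    }

    // Returns true only if the exact word was added, not merely a prefix of one.
    public boolean find(String word) {
        Node node = root;
        for (int i = 0; i < word.length(); i++) {
            int index = word.charAt(i) - 'a';
            if (index < 0 || index >= ALPHABET) {
                return false;
            }
            node = node.children[index];
            if (node == null) {
                return false;
            }
        }
        return node.isWord;
    }
}
```

The trade-off is the one named above: every node carries 26 slots whether or not they are used, in exchange for constant-time child lookup.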
Anything you do with CLI yourself will be slower than using the built-in functions.
120k is not that much for a dictionary.
The first thing I would do is fire up a code performance tool (a profiler).
But just some wild guesses: you have a lot of function calls, starting with the Boo-to-C# binding inside a for loop. Try passing the whole text block and tearing it apart in C#.
Second, do not use a Dictionary. You waste just about as much resources with your code now as you would just using a Dictionary.
Third, sort the text before you go inserting - you can probably make some optimizations that way. Maybe just construct a suffix table.

How to find a word from arrays of characters?

What is the best way to solve this:
I have a group of arrays with 3-4 characters inside each like so:
{p, q, r, s}   {a, b, c}   {t, u, v}   {m, n, o}
I also have an array of dictionary words.
What is the best/fastest way to find if the array of characters can combine to form one of the dictionary words? For example, the above arrays could make the words:
"pat", "rat", "at", "to", "bum" (lol), but not "nub" or "mat".
Should I loop through the dictionary to see if words can be made, or get all the combinations from the letters and then compare those to the dictionary?
I had some Scrabble code laying around, so I was able to throw this together. The dictionary I used is sowpods (267751 words). The code below reads the dictionary as a text file with one uppercase word on each line.
The code is C#:
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.IO;
using System.Diagnostics;
namespace SO_6022848
{
public struct Letter
{
public const string Chars = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";
public static implicit operator Letter(char c)
{
return new Letter() { Index = Chars.IndexOf(c) };
}
public int Index;
public char ToChar()
{
return Chars[Index];
}
public override string ToString()
{
return Chars[Index].ToString();
}
}
public class Trie
{
public class Node
{
public string Word;
public bool IsTerminal { get { return Word != null; } }
public Dictionary<Letter, Node> Edges = new Dictionary<Letter, Node>();
}
public Node Root = new Node();
public Trie(string[] words)
{
for (int w = 0; w < words.Length; w++)
{
var word = words[w];
var node = Root;
for (int len = 1; len <= word.Length; len++)
{
var letter = word[len - 1];
Node next;
if (!node.Edges.TryGetValue(letter, out next))
{
next = new Node();
if (len == word.Length)
{
next.Word = word;
}
node.Edges.Add(letter, next);
}
node = next;
}
}
}
}
class Program
{
static void GenWords(Trie.Node n, HashSet<Letter>[] sets, int currentArrayIndex, List<string> wordsFound)
{
if (currentArrayIndex < sets.Length)
{
foreach (var edge in n.Edges)
{
if (sets[currentArrayIndex].Contains(edge.Key))
{
if (edge.Value.IsTerminal)
{
wordsFound.Add(edge.Value.Word);
}
GenWords(edge.Value, sets, currentArrayIndex + 1, wordsFound);
}
}
}
}
static void Main(string[] args)
{
const int minArraySize = 3;
const int maxArraySize = 4;
const int setCount = 10;
const bool generateRandomInput = true;
var trie = new Trie(File.ReadAllLines("sowpods.txt"));
var watch = new Stopwatch();
var trials = 10000;
var wordCountSum = 0;
var rand = new Random(37);
for (int t = 0; t < trials; t++)
{
HashSet<Letter>[] sets;
if (generateRandomInput)
{
sets = new HashSet<Letter>[setCount];
for (int i = 0; i < setCount; i++)
{
sets[i] = new HashSet<Letter>();
var size = minArraySize + rand.Next(maxArraySize - minArraySize + 1);
while (sets[i].Count < size)
{
sets[i].Add(Letter.Chars[rand.Next(Letter.Chars.Length)]);
}
}
}
else
{
sets = new HashSet<Letter>[] {
new HashSet<Letter>(new Letter[] { 'P', 'Q', 'R', 'S' }),
new HashSet<Letter>(new Letter[] { 'A', 'B', 'C' }),
new HashSet<Letter>(new Letter[] { 'T', 'U', 'V' }),
new HashSet<Letter>(new Letter[] { 'M', 'N', 'O' }) };
}
watch.Start();
var wordsFound = new List<string>();
for (int i = 0; i < sets.Length - 1; i++)
{
GenWords(trie.Root, sets, i, wordsFound);
}
watch.Stop();
wordCountSum += wordsFound.Count;
if (!generateRandomInput && t == 0)
{
foreach (var word in wordsFound)
{
Console.WriteLine(word);
}
}
}
Console.WriteLine("Elapsed per trial = {0}", new TimeSpan(watch.Elapsed.Ticks / trials));
Console.WriteLine("Average word count per trial = {0:0.0}", (float)wordCountSum / trials);
}
}
}
Here is the output when using your test data:
PA
PAT
PAV
QAT
RAT
RATO
RAUN
SAT
SAU
SAV
SCUM
AT
AVO
BUM
BUN
CUM
TO
UM
UN
Elapsed per trial = 00:00:00.0000725
Average word count per trial = 19.0
And the output when using random data (does not print each word):
Elapsed per trial = 00:00:00.0002910
Average word count per trial = 62.2
EDIT: I made it much faster with two changes: Storing the word at each terminal node of the trie, so that it doesn't have to be rebuilt. And storing the input letters as an array of hash sets instead of an array of arrays, so that the Contains() call is fast.
There are probably many ways of solving this.
What you are interested in is the number of each character you have available to form a word, and how many of each character is required for each dictionary word. The trick is how to efficiently look up this information in the dictionary.
Perhaps you can use a prefix tree (a trie), some kind of smart hash table, or similar.
Anyway, you will probably have to try out all your possibilities and check them against the dictionary. I.e., if you have three arrays of three values each, there will be 3^3+3^2+3^1=39 combinations to check out. If this process is too slow, then perhaps you could stick a Bloom filter in front of the dictionary, to quickly check if a word is definitely not in the dictionary.
EDIT: Anyway, isn't this essentially the same as Scrabble? Perhaps Googling for "scrabble algorithm" will give you some good clues.
The reformulated question can be answered just by generating and testing. Since you have 4 letters and 10 arrays, you've only got about 1 million possible combinations (4^10; about 10 million, 5^10, if you allow a blank character). You'll need an efficient way to look them up; use a BDB or some sort of disk-based hash.
The trie solution previously posted should work as well, you are just restricted more by what characters you can choose at each step of the search. It should be faster as well.
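The generate-and-test idea can be sketched like this, in Java for illustration. It is a simplified version that takes exactly one letter from every array, in order, and checks each full-length combination against an in-memory HashSet rather than a disk-based store; GenerateAndTest, findWords and extend are hypothetical names. Shorter words that start at a later array, as in the original question, would need an extra loop over start positions and lengths.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Illustrative sketch; all names are hypothetical.
class GenerateAndTest {
    // Returns every dictionary word formed by one letter from each array, in array order.
    static List<String> findWords(char[][] letterArrays, Set<String> dictionary) {
        List<String> found = new ArrayList<>();
        extend(letterArrays, 0, new StringBuilder(), dictionary, found);
        return found;
    }

    // Appends each letter of the array at position, recursing until every array is used.
    private static void extend(char[][] letterArrays, int position, StringBuilder current,
                               Set<String> dictionary, List<String> found) {
        if (position == letterArrays.length) {
            String candidate = current.toString();
            if (dictionary.contains(candidate)) {
                found.add(candidate);
            }
            return;
        }
        for (char letter : letterArrays[position]) {
            current.append(letter);
            extend(letterArrays, position + 1, current, dictionary, found);
            current.setLength(current.length() - 1); // backtrack before the next letter
        }
    }
}
```

With k letters per array and n arrays this tests k^n candidates, which is why the trie search, which abandons a prefix as soon as no word continues it, should be faster.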
I just made a very large nested for loop like this:
for (NSString *s1 in [letterList objectAtIndex:0]) {
    for (NSString *s2 in [letterList objectAtIndex:1]) {
        // 8 more times...
    }
}
Then I do a binary search on each combination to see if it is in the dictionary, and add it to an array if it is.

What does ExpressionVisitor.Visit<T> Do?

Before someone shouts out the answer, please read the question through.
What is the purpose of the method in .NET 4.0's ExpressionVisitor:
public static ReadOnlyCollection<T> Visit<T>(ReadOnlyCollection<T> nodes, Func<T, T> elementVisitor)
My first guess as to the purpose of this method was that it would visit each node in each tree specified by the nodes parameter and rewrite the tree using the result of the elementVisitor function.
This does not appear to be the case. Actually this method appears to do a little more than nothing, unless I'm missing something here, which I strongly suspect I am...
I tried to use this method in my code and when things didn't work out as expected, I reflectored the method and found:
public static ReadOnlyCollection<T> Visit<T>(ReadOnlyCollection<T> nodes, Func<T, T> elementVisitor)
{
T[] list = null;
int index = 0;
int count = nodes.Count;
while (index < count)
{
T objA = elementVisitor(nodes[index]);
if (list != null)
{
list[index] = objA;
}
else if (!object.ReferenceEquals(objA, nodes[index]))
{
list = new T[count];
for (int i = 0; i < index; i++)
{
list[i] = nodes[i];
}
list[index] = objA;
}
index++;
}
if (list == null)
{
return nodes;
}
return new TrueReadOnlyCollection<T>(list);
}
So where would someone actually go about using this method? What am I missing here?
Thanks.
It looks to me like a convenience method to apply an arbitrary transform function to a collection of expression nodes and return the resulting transformed collection, or the original collection if nothing changed.
I can't see how this pattern differs from a standard expression visitor, except that it uses a function instead of a visitor type.
As for usage:
Expression<Func<int, int, int>> addLambdaExpression= (a, b) => a + b;
// Change add to subtract
Func<Expression, Expression> changeToSubtract = e =>
{
if (e is BinaryExpression)
{
return Expression.Subtract((e as BinaryExpression).Left,
(e as BinaryExpression).Right);
}
else
{
return e;
}
};
var nodes = new Expression[] { addLambdaExpression.Body }.ToList().AsReadOnly();
var subtractExpression = ExpressionVisitor.Visit(nodes, changeToSubtract);
You don't explain how you expected it to behave, and therefore why you think it does little more than nothing.
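The reflected method body in the question follows a copy-on-write pattern, which the Java sketch below mirrors for illustration. This is not .NET code, and CopyOnWriteVisit.visit is a hypothetical name. It calls the function once per top-level element, allocates a new list only when some element comes back as a different reference, and otherwise returns the original list unchanged. It does not descend into the elements, so any recursion into child nodes has to happen inside the function that is passed in.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.function.UnaryOperator;

// Illustrative sketch; all names are hypothetical.
class CopyOnWriteVisit {
    // Applies elementVisitor to each element; returns nodes itself if nothing was replaced.
    static <T> List<T> visit(List<T> nodes, UnaryOperator<T> elementVisitor) {
        List<T> changed = null; // allocated lazily, on the first replaced element
        for (int i = 0; i < nodes.size(); i++) {
            T original = nodes.get(i);
            T visited = elementVisitor.apply(original);
            if (changed != null) {
                changed.add(visited);
            } else if (visited != original) { // reference comparison, like ReferenceEquals
                changed = new ArrayList<>(nodes.size());
                changed.addAll(nodes.subList(0, i)); // copy the untouched prefix
                changed.add(visited);
            }
        }
        return changed == null ? nodes : Collections.unmodifiableList(changed);
    }
}
```

Returning the same instance when nothing changed lets a caller detect "no change" with a cheap reference check and avoid rebuilding the parent node.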
