Datastructure ambiguity

Datastructure ambiguity - algorithm

I can't figure this interview question.
You have an array of integers. You need to provide another Data structure that will have these functions:
int get(int index)
void set (int index, int value)
void setall(int value)
They all do what you guess they're suppose to do.
The limitation is that every function is in O(1).
How can you design it so that setAll will be O(1).
I thought about adding another field to each integer, that will point to an integer that will be changed every time setAll is called. the problem comes when someone call setAll and then set then get.
Edit: I changed the names of the variables so it would be clearer. Also, since you asked, get is suppose to return array[i], set(index, value) suppose to put the value value in array[index].
After setall(index, value) you should get (get(i) == get(j) == value) for every i,j in the array.

How about storing a "version number" with each variable, i.e.
int globalValue, globalVersion;
int nextVersion;
int[] localValue, localVersion;
int get(int i) {
if (localVersion[i] > globalVersion)
return localValue[i];
else
return globalValue;
}
void set(int i, int value) {
localValue[i] = value;
localVersion[i] = nextVersion++;
}
void setAll(int value) {
globalValue = value;
globalVersion = nextVersion++;
}

Keep a DateTime field ( or simply a counter ) with each element in the array, a setAllValue variable and setAllDateTime variable. With each set, update the DateTime/counter of the element. With SetAll, update the value and DateTime of setAllDateTime.
In get, compare the DateTime of SetAll with DateTime of the element, whichever is newer, return that.

Related

CPLEX OPL Returning a set index with a condition

I have a tuple as below and I want to return the first element's TimePeriodID whose ActivePeriod value is "YES". Tried to use first, and minl & ord functions, but it did not work. Any help would be much appreciated. Thanks.
tuple timeperiod {
string TimePeriodID;
string TimePeriodName;
float CurrencyRate;
float Rank;
string ActivePeriod;
}
{timeperiod} TimePeriods = ...;

tuple timeperiod {
string TimePeriodID;
string TimePeriodName;
float CurrencyRate;
float Rank;
string ActivePeriod;
}
{timeperiod} TimePeriods = {<"A","B",1.0,2.3,"NO">,<"V","B",4.0,2.3,"YES">,<"D","B",4.0,2.3,"YES">};
string answer=first({i | i in TimePeriods : i.ActivePeriod=="YES"}).TimePeriodID;
execute
{
writeln(answer);
}
gives
V

Methods, more than one return?

I have the following method:
From what I learned methods which are not voids need a return. For the following examples I can see two returns, once after if(), and one at the end.
For this example if String s is not a digit it will return the boolean as false. Which makes sense. If it is a digit then it will check whether it is in the interval. I guess I am confused regarding whether we can have multiple returns in such cases and what the limitations are, if there are any. thank you.
private boolean ElementBienFormat(String s) {
for (int i = 0; i < s.length(); i++) {
if (!Character.isDigit(s.charAt(i))) {
return false;
}
}
int n = Integer.valueOf(s);
return (n>=0 && n <=255);

A method will "quit" (return) when control reaches a return. So in this case as soon as a character is not a digit in the input String control will go back to the caller (with the appropriate value).
boolean success = ElementBienFormat( "a" ); // <-- control would go back here with the value of false.
Another quick note is that a void method can have multiple return statements as well
private void Method( int n )
{
if( n < 0 )
return;
//...
//implicit
//return;
}

Build trie faster

I'm making an mobile app which needs thousands of fast string lookups and prefix checks. To speed this up, I made a Trie out of my word list, which has about 180,000 words.
Everything's great, but the only problem is that building this huge trie (it has about 400,000 nodes) takes about 10 seconds currently on my phone, which is really slow.
Here's the code that builds the trie.
public SimpleTrie makeTrie(String file) throws Exception {
String line;
SimpleTrie trie = new SimpleTrie();
BufferedReader br = new BufferedReader(new FileReader(file));
while( (line = br.readLine()) != null) {
trie.insert(line);
}
br.close();
return trie;
}
The insert method which runs on O(length of key)
public void insert(String key) {
TrieNode crawler = root;
for(int level=0 ; level < key.length() ; level++) {
int index = key.charAt(level) - 'A';
if(crawler.children[index] == null) {
crawler.children[index] = getNode();
}
crawler = crawler.children[index];
}
crawler.valid = true;
}
I'm looking for intuitive methods to build the trie faster. Maybe I build the trie just once on my laptop, store it somehow to the disk, and load it from a file in the phone? But I don't know how to implement this.
Or are there any other prefix data structures which will take less time to build, but have similar lookup time complexity?
Any suggestions are appreciated. Thanks in advance.
EDIT
Someone suggested using Java Serialization. I tried it, but it was very slow with this code:
public void serializeTrie(SimpleTrie trie, String file) {
try {
ObjectOutput out = new ObjectOutputStream(new BufferedOutputStream(new FileOutputStream(file)));
out.writeObject(trie);
out.close();
} catch (IOException e) {
e.printStackTrace();
}
}
public SimpleTrie deserializeTrie(String file) {
try {
ObjectInput in = new ObjectInputStream(new BufferedInputStream(new FileInputStream(file)));
SimpleTrie trie = (SimpleTrie)in.readObject();
in.close();
return trie;
} catch (IOException | ClassNotFoundException e) {
e.printStackTrace();
return null;
}
}
Can this above code be made faster?
My trie: http://pastebin.com/QkFisi09
Word list: http://www.isc.ro/lists/twl06.zip
Android IDE used to run code: http://play.google.com/store/apps/details?id=com.jimmychen.app.sand

Double-Array tries are very fast to save/load because all data is stored in linear arrays. They are also very fast to lookup, but the insertions can be costly. I bet there is a Java implementation somewhere.
Also, if your data is static (i.e. you don't update it on phone) consider DAFSA for your task. It is one of the most efficient data structures for storing words (must be better than "standard" tries and radix tries both for size and for speed, better than succinct tries for speed, often better than succinct tries for size). There is a good C++ implementation: dawgdic - you can use it to build DAFSA from command line and then use a Java reader for the resulting data structure (example implementation is here).

You could store your trie as an array of nodes, with references to child nodes replaced with array indices. Your root node would be the first element. That way, you could easily store/load your trie from simple binary or text format.
public class SimpleTrie {
public class TrieNode {
boolean valid;
int[] children;
}
private TrieNode[] nodes;
private int numberOfNodes;
private TrieNode getNode() {
TrieNode t = nodes[++numberOnNodes];
return t;
}
}

Just build a large String[] and sort it. Then you can use binary search to find the location of a String. You can also do a query based on prefixes without too much work.
Prefix look-up example:
Compare method:
private static int compare(String string, String prefix) {
if (prefix.length()>string.length()) return Integer.MIN_VALUE;
for (int i=0; i<prefix.length(); i++) {
char s = string.charAt(i);
char p = prefix.charAt(i);
if (s!=p) {
if (p<s) {
// prefix is before string
return -1;
}
// prefix is after string
return 1;
}
}
return 0;
}
Finds an occurrence of the prefix in the array and returns it's location (MIN or MAX are mean not found)
private static int recursiveFind(String[] strings, String prefix, int start, int end) {
if (start == end) {
String lastValue = strings[start]; // start==end
if (compare(lastValue,prefix)==0)
return start; // start==end
return Integer.MAX_VALUE;
}
int low = start;
int high = end + 1; // zero indexed, so add one.
int middle = low + ((high - low) / 2);
String middleValue = strings[middle];
int comp = compare(middleValue,prefix);
if (comp == Integer.MIN_VALUE) return comp;
if (comp==0)
return middle;
if (comp>0)
return recursiveFind(strings, prefix, middle + 1, end);
return recursiveFind(strings, prefix, start, middle - 1);
}
Gets a String array and prefix, prints out occurrences of prefix in array
private static boolean testPrefix(String[] strings, String prefix) {
int i = recursiveFind(strings, prefix, 0, strings.length-1);
if (i==Integer.MAX_VALUE || i==Integer.MIN_VALUE) {
// not found
return false;
}
// Found an occurrence, now search up and down for other occurrences
int up = i+1;
int down = i;
while (down>=0) {
String string = strings[down];
if (compare(string,prefix)==0) {
System.out.println(string);
} else {
break;
}
down--;
}
while (up<strings.length) {
String string = strings[up];
if (compare(string,prefix)==0) {
System.out.println(string);
} else {
break;
}
up++;
}
return true;
}

Here's a reasonably compact format for storing a trie on disk. I'll specify it by its (efficient) deserialization algorithm. Initialize a stack whose initial contents are the root node of the trie. Read characters one by one and interpret them as follows. The meaning of a letter A-Z is "allocate a new node, make it a child of the current top of stack, and push the newly allocated node onto the stack". The letter indicates which position the child is in. The meaning of a space is "set the valid flag of the node on top of the stack to true". The meaning of a backspace (\b) is "pop the stack".
For example, the input
TREE \b\bIE \b\b\bOO \b\b\b
gives the word list
TREE
TRIE
TOO
. On your desktop, construct the trie using whichever method and then serialize by the following recursive algorithm (pseudocode).
serialize(node):
if node is valid: put(' ')
for letter in A-Z:
if node has a child under letter:
put(letter)
serialize(child)
put('\b')

This isn't a magic bullet, but you can probably reduce your runtime slightly by doing one big memory allocation instead of a bunch of little ones.
I saw a ~10% speedup in the test code below (C++, not Java, sorry) when I used a "node pool" instead of relying on individual allocations:
#include <string>
#include <fstream>
#define USE_NODE_POOL
#ifdef USE_NODE_POOL
struct Node;
Node *node_pool;
int node_pool_idx = 0;
#endif
struct Node {
void insert(const std::string &s) { insert_helper(s, 0); }
void insert_helper(const std::string &s, int idx) {
if (idx >= s.length()) return;
int char_idx = s[idx] - 'A';
if (children[char_idx] == nullptr) {
#ifdef USE_NODE_POOL
children[char_idx] = &node_pool[node_pool_idx++];
#else
children[char_idx] = new Node();
#endif
}
children[char_idx]->insert_helper(s, idx + 1);
}
Node *children[26] = {};
};
int main() {
#ifdef USE_NODE_POOL
node_pool = new Node[400000];
#endif
Node n;
std::ifstream fin("TWL06.txt");
std::string word;
while (fin >> word) n.insert(word);
}

Tries that prealloate space all possible children (256) have a huge amount of wasted space. You are making your cache cry. Store those pointers to children in a resizable data structure.
Some tries will optimize by having one node to represent a long string, and break that string up only when needed.

Instead of a simple file you can use a database like sqlite and a nested set or celko tree to store the trie and you can also build a faster and shorter (less nodes) trie with a ternary search trie.

I don't like the idea of addressing nodes by index in array, but only because it requires one more addition (index to the pointer). But with array of preallocated nodes you will maybe save some time on allocation and initialization. And you can also save a lot of space by reserving first 26 indices for leaf nodes. Thus you'll not need to allocate and initialize 180000 leaf nodes.
Also with indices you will be able to read the prepared nodes array from disk in binary format. This has to be several times faster. But I'm not sure how to do this on your language. Is this Java?
If you checked that your source vocabulary is sorted, you may also save some time by comparing some prefix of the current string with the previous one. E.g. first 4 characters. If they are equal you can start your
for(int level=0 ; level < key.length() ; level++) {
loop from the 5-th level.

Is it space inefficient or time inefficient? If you are rolling a plain trie then space may be part of the problem when dealing with a mobil device. Check out patricia/radix tries, especially if you are using it as a prefix look-up tool.
Trie:
http://en.wikipedia.org/wiki/Trie
Patricia/Radix trie:
http://en.wikipedia.org/wiki/Radix_tree
You didn't mention a language but here are two implementations of prefix tries in Java.
Regular trie:
http://github.com/phishman3579/java-algorithms-implementation/blob/master/src/com/jwetherell/algorithms/data_structures/Trie.java
Patricia/Radix (space-effecient) trie:
http://github.com/phishman3579/java-algorithms-implementation/blob/master/src/com/jwetherell/algorithms/data_structures/PatriciaTrie.java

Generally speaking, avoid using a lot of object creations from scratch in Java, which is both slow and it also has a massive overhead. Better implement your own pooling class for memory management that allocates e.g. half a million entries at a time in one go.
Also, serialization is too slow for large lexicons. Use a binary read to populate array-based representations proposed above quickly.

Exiting a non void method? (Java)

I was wonder how you would go about exiting a method early. If this was a void type, I would just do "return", however since this is an int type, it wants me to return an integer. How do I return to main without returning any integers. Thanks.
public static int binarysearch(String[] myArray, String target, int first, int last)
{
int index;
if (first > last)
{
index = -1;
System.out.println("That is not in the array");
// Return to main here
}

You can't return from a method without a return value, in the traditional sense.
You can either return -1; and declare in your documentation that -1 represents a failed search, or you can throw an exception. If you throw an exception, though, you'll need to catch it. You can read more about that in the linked article.

A couple of options....
1. break;
2. return a value that you know would not be returned from a valid result. Eg 99999999 or -34
Those would be simple choices....
Edit
Of course break would only exit the loop. So youd still need to return a known value.

Refactoring many nested ifs or chained if statements

I have an object with large number of similar fields (like more than 10 of them) and I have to assign them values from an array of variable length. The solution would be either a huge nested bunch of ifs based on checking length of array each time and assigning each field
OR
a chain of ifs checking on whether the length is out of bounds and assigning each time after that check.
Both seem to be repetitive. Is there a better solution ?

If you language has switch/case with fallthrough, you could do it like this:
switch(array.length){
case 15: field14 = array[14];
case 14: field13 = array[13];
case 13: field12 = array[12];
// etc.
case 1: field0 = array[0];
case 0: break;
default: throw Exception("array too long!");
}

for (int i = 0; i < fieldCount; i++)
fields[i].value = array[i];
That is to say, maintain an array of fields that corresponds to your array of values.

If your language supports delegates, anonymous functions, that sort of thing, you can use those to clean it up. For example, in C# you could write this:
string[] values = GetValues();
SomeObject result = new SomeObject();
Apply(values, 0, v => result.ID = v);
Apply(values, 1, v => result.FirstName = v);
Apply(values, 2, v => result.LastName = v);
// etc.
The apply method would look like:
void Apply(string[] values, int index, Action<string> action)
{
if (index < values.Length)
action(values[index]);
}
This is obviously language-dependent, but something to think about regardless.
Another very simple option that we might be overlooking is, if you are actually trying to initialize an object from this value array (as opposed to update an existing object), to just accept the default values if the array isn't large enough.
C# example:
void CreateMyObject(object[] values)
{
MyObject o = new MyObject();
o.ID = GetValueOrDefault<int>(values, 0);
o.FirstName = GetValueOrDefault<string>(values, 0);
o.LastName = GetValueOrDefault<string>(values, 0);
// etc.
}
void GetValueOrDefault<T>(object[] values, int index)
{
if (index < values.Length)
return (T)values[index];
return default(T);
}
Sometimes the dumb solution is the smartest choice.

If your fields are declared in the same order of the array's elements, you could use reflection (if available in your language) to set these values. Here is an example of how you could do it in Java:
// obj is your object, values is the array of values
Field[] fields = obj.getClass().getFields();
for (int i = 0; i < fields.length && i < values.length; ++i) {
fields[i].set(obj, values[i]);
}

Develop Reference

ruby bash windows laravel spring algorithm oracle macos go visual-studio

Datastructure ambiguity - algorithm

Related

CPLEX OPL Returning a set index with a condition

Methods, more than one return?

Build trie faster

Exiting a non void method? (Java)

Refactoring many nested ifs or chained if statements

Categories

Resources