String set implementation - algorithm

I have to implement a set ADT for pairs of strings. The interface I want is (in Java):
public interface StringSet {
    void add(String a, String b);
    boolean contains(String a, String b);
    void remove(String a, String b);
}
The data access pattern has the following properties:
The contains operation is far more frequent than the add and remove ones.
More often than not, contains returns true, i.e. the search is successful.
A simple implementation I can think of is a two-level hashtable, i.e. HashMap<String, HashMap<String, Boolean>>. But this data structure makes no use of the two peculiarities of the access pattern. I am wondering if there is something more efficient than the hashtable, maybe by leveraging these peculiarities of the access pattern.

Personally, I would design this in terms of a standard Set<> interface:
public class StringPair {
    public StringPair(String a, String b) {
        a_ = a;
        b_ = b;
        hash_ = (a_ + b_).hashCode();
    }
    public boolean equals(StringPair pair) {
        return (a_.equals(pair.a_) && b_.equals(pair.b_));
    }
    @Override
    public boolean equals(Object obj) {
        if (obj instanceof StringPair) {
            return equals((StringPair) obj);
        }
        return false;
    }
    @Override
    public int hashCode() {
        return hash_;
    }
    private String a_;
    private String b_;
    private int hash_;
}
public class StringSetImpl implements StringSet {
    public StringSetImpl(SetFactory factory) {
        pair_set_ = factory.<StringPair>createSet();
    }
    // ...
    private Set<StringPair> pair_set_ = null;
}
Then you could leave it up to the user of StringSetImpl to use the preferred Set type. If you are attempting to optimize access, though, it's hard to do better than a HashSet<> (at least with respect to runtime complexity), given that access is O(1), whereas tree-based sets have O(log N) access times.
That contains() usually returns true may make it worth considering a Bloom filter, although this would require that some number of false positives for contains() are allowed (don't know if that is the case).
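For illustration, here is a rough sketch of the Bloom-filter variant (assuming Guava's BloomFilter; any implementation would do, and the '\u0000' separator is assumed never to occur in the data). Note that a plain Bloom filter cannot implement remove() (a counting filter would be needed) and contains() may report false positives:
import com.google.common.hash.BloomFilter;
import com.google.common.hash.Funnels;
import java.nio.charset.StandardCharsets;

public class BloomStringSet {
    // Expected insertions and false-positive rate are placeholder guesses.
    private final BloomFilter<CharSequence> filter =
            BloomFilter.create(Funnels.stringFunnel(StandardCharsets.UTF_8), 1_000_000, 0.01);

    // Join the two strings with a separator assumed not to occur in the data.
    private static String key(String a, String b) {
        return a + '\u0000' + b;
    }

    public void add(String a, String b) {
        filter.put(key(a, b));
    }

    public boolean contains(String a, String b) {
        return filter.mightContain(key(a, b)); // may be a false positive
    }
}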
Edit
To avoid the extra allocation, you can do something like this, which is similar to your two-level approach, except using a set rather than a map for the second level:
public class StringSetImpl implements StringSet {
    public StringSetImpl() {
        elements_ = new HashMap<String, Set<String>>();
    }
    public boolean contains(String a, String b) {
        if (!elements_.containsKey(a)) {
            return false;
        }
        Set<String> set = elements_.get(a);
        if (set == null) {
            return false;
        }
        return set.contains(b);
    }
    public void add(String a, String b) {
        if (!elements_.containsKey(a) || elements_.get(a) == null) {
            elements_.put(a, new HashSet<String>());
        }
        elements_.get(a).add(b);
    }
    public void remove(String a, String b) {
        if (!elements_.containsKey(a)) {
            return;
        }
        Set<String> set = elements_.get(a);
        if (set == null) {
            elements_.remove(a);
            return;
        }
        set.remove(b);
        if (set.isEmpty()) {
            elements_.remove(a);
        }
    }
    private Map<String, Set<String>> elements_ = null;
}
Since it's 4:20 AM where I'm located, the above is definitely not my best work (too tired to refresh myself on the treatment of null by these different collection types), but it sketches the approach.

Do not use normal trees (the tree-based structures in most standard libraries) for this. There is one hidden assumption that will hurt you in this case:
The usual O(log n) bound for tree operations assumes that comparisons are O(1). That is true for integers and most other keys, but not for strings: each string comparison is O(k), where k is the length of the string. This makes every operation depend on the key length, which is easily overlooked and will most likely hurt you if you need to be fast.
Especially since contains usually returns true, each comparison has to examine up to k characters at every level of the tree, so with this access pattern you pay the full cost of string comparisons.
Your access pattern is easily handled by a trie. Testing whether a string is contained is O(k) in the worst case (not just the average case, as in a hash map), and adding a string is also O(k). Since you are storing two strings, I would suggest not indexing the trie by characters but by some slightly larger type, so you can add two special index values: one for the end of the first string, and one for the end of both strings.
Using these two extra symbols also allows for simple removal: just delete the final node containing the end symbol and the pair will no longer be found. You will waste some memory, because the characters of deleted strings stay in the structure; if that becomes a problem, you could keep track of the number of deleted entries and rebuild the trie when it gets too bad.
P.S. A trie can be thought of as a combination of a tree and several hashtables, so it gives you the best of both data structures.
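For what it's worth, here is a minimal sketch of that idea (my own illustration, not production code): the pair (a, b) is stored as the symbol sequence a, SEP, b, END, where SEP and END are sentinel values outside the char range so they can never collide with real characters, and removal is done lazily as described above.
import java.util.HashMap;
import java.util.Map;

public class PairTrie {
    private static final int SEP = Character.MAX_VALUE + 1; // marks the end of the first string
    private static final int END = Character.MAX_VALUE + 2; // marks the end of both strings

    private static final class Node {
        final Map<Integer, Node> children = new HashMap<>();
    }

    private final Node root = new Node();

    public void add(String a, String b) {
        Node node = root;
        for (int symbol : symbols(a, b)) {
            node = node.children.computeIfAbsent(symbol, k -> new Node());
        }
    }

    public boolean contains(String a, String b) {
        Node node = root;
        for (int symbol : symbols(a, b)) {
            node = node.children.get(symbol);
            if (node == null) {
                return false;
            }
        }
        return true;
    }

    public void remove(String a, String b) {
        // Lazy removal: walk to the node just before END and drop the END child;
        // the rest of the path stays in the structure until a rebuild.
        Node node = root;
        int[] seq = symbols(a, b);
        for (int i = 0; i < seq.length - 1; i++) {
            node = node.children.get(seq[i]);
            if (node == null) {
                return; // the pair was never stored
            }
        }
        node.children.remove(END);
    }

    // Flatten (a, b) into a single symbol sequence: a, SEP, b, END.
    private static int[] symbols(String a, String b) {
        int[] seq = new int[a.length() + b.length() + 2];
        int i = 0;
        for (int k = 0; k < a.length(); k++) seq[i++] = a.charAt(k);
        seq[i++] = SEP;
        for (int k = 0; k < b.length(); k++) seq[i++] = b.charAt(k);
        seq[i++] = END;
        return seq;
    }
}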

I'd second Michael Aaron Safyan's approach of using a StringPair type, perhaps with a more specific name, or as a generic tuple type: Tuple<A,B> instantiated as Tuple<String,String>. But I would strongly suggest using one of the provided set implementations, either a HashSet or a TreeSet.
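A minimal sketch of such a generic tuple type (names are my own choice, not from any particular library):
import java.util.Objects;

public final class Tuple<A, B> {
    private final A a;
    private final B b;

    public Tuple(A a, B b) {
        this.a = a;
        this.b = b;
    }

    @Override
    public boolean equals(Object obj) {
        if (!(obj instanceof Tuple)) {
            return false;
        }
        Tuple<?, ?> other = (Tuple<?, ?>) obj;
        return Objects.equals(a, other.a) && Objects.equals(b, other.b);
    }

    @Override
    public int hashCode() {
        return Objects.hash(a, b);
    }
}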

A red-black tree implementation of the set would be a good option; std::set in the C++ standard library is typically implemented as a red-black tree.

Related

An algorithm to track the status of a number

I want to design an API:
get(): returns a random number that has not been handed out before, i.e. every number returned is unique.
put(randomvalue): puts a number generated by get() back into the pool, so get() can reuse it as output.
It has to be efficient and should not use excessive resources.
Is there any way to implement this? It is not recommended to use a HashMap, because if this API serves billions of requests, saving every generated random number would still use too much space.
I could not work out this algorithm; please give me a clue. Thanks in advance!
I cannot think of any solution without extra space. With extra space, one option could be a TreeMap: first add all the elements to the map with the value false. When an element is handed out, mark it true; similarly, for put, change the value back to false.
Code snippet below...
public class RandomNumber {
    public static final int SIZE = 100000;
    public static Random rand;
    public static TreeMap<Integer, Boolean> treeMap;

    public RandomNumber() {
        rand = new Random();
        treeMap = new TreeMap<>();
        // Pre-fill every candidate number as "not handed out yet".
        for (int i = 0; i < SIZE; i++) {
            treeMap.put(i, false);
        }
    }

    public static int getRandom() {
        while (true) {
            int random = rand.nextInt(SIZE);
            if (!treeMap.get(random)) {
                treeMap.put(random, true);
                return random;
            }
        }
    }

    public static void putRandom(int number) {
        treeMap.put(number, false);
    }
}

The Cyclomatic Complexity of this method "setHeader" is 139 which is greater than 10 authorized

I have 139 switch cases in setHeader
private static void setHeader(String headertableField, String headerValue) {
    switch (headertableField) {
        case AUS:
            headerDTO.setAudval(StringUtils.getTrimValueAfterNullCheck(headerValue));
            break;
        case AXL:
            headerDTO.setAxlfieldl(StringUtils.getTrimValueAfterNullCheck(headerValue));
            break;
        // ... the remaining cases ...
        default:
            break;
    }
}
Sonar flags this as an issue. Can you please suggest a solution to reduce the complexity?
Eugene's answer is pretty good, but you can go a step further and move the same logic inside an enum:
enum HeaderField {
    AUS(HeaderDTO::setAudval),
    AXL(HeaderDTO::setAxlfieldl);

    private final BiConsumer<HeaderDTO, String> fieldSetter;

    HeaderField(BiConsumer<HeaderDTO, String> setter) {
        fieldSetter = setter;
    }

    public void setField(HeaderDTO headerDTO, String value) {
        fieldSetter.accept(headerDTO, value);
    }
}
Then you can use it:
HeaderField.AUS.setField(headerDTO, "value");
HeaderField.AXL.setField(headerDTO, "axl");
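The dispatch from the incoming field name would then presumably go through valueOf, for example (a sketch; it assumes headertableField matches the enum constant names exactly, reuses StringUtils.getTrimValueAfterNullCheck from the question, and, unlike the original default branch, throws IllegalArgumentException for unknown names):
private static void setHeader(String headertableField, String headerValue) {
    HeaderField.valueOf(headertableField)
            .setField(headerDTO, StringUtils.getTrimValueAfterNullCheck(headerValue));
}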
A little bit of background first:
When you use switch/case with a String, it's not a plain if/else/equals check. Internally (unlike for int types, for example), a hashCode is first computed on the String you are switching on and a lookupswitch is invoked on that.
If this hashCode is equal to one of the values present in the case statements (meaning this String is potentially equal to the one you are looking for), another lookupswitch is done on a predefined value that depends on where you are "coming from" (the previous lookupswitch tells you where to jump).
Anyway, a String switch is effectively a lookupswitch and is O(1) (even though it does two of them).
What you can do is hide this complexity away, at the same O(1) price, behind a Map.
private static final Map<String, BiConsumer<HeaderDTO, String>> MAP = Map.of(
        "AUS", (x, y) -> x.setAudval(StringUtils.getTrimValueAfterNullCheck(y))
        // all other cases (note: Map.of supports at most 10 pairs; with 139 entries,
        // use Map.ofEntries or fill a HashMap in a static initializer instead)
);
and then setHeader simply delegates to the map:
private static void setHeader(String headertableField, String headerValue) {
    MAP.get(headertableField).accept(headerDTO, headerValue);
}
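One detail: MAP.get returns null for a field name that has no entry, whereas the original switch silently fell through to default. If that behaviour matters, a no-op default is one option (a sketch):
private static void setHeader(String headertableField, String headerValue) {
    MAP.getOrDefault(headertableField, (dto, value) -> { }).accept(headerDTO, headerValue);
}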

Use hashCode() for sorting objects in Java, not in a Hashtable etc.

I need your help.
If I want to order a PriorityQueue in Java without referring to its elements' attributes, could I use the objects' hashCodes to compare them?
This is how I did it:
comp = new Comparator<Person>() {
    @Override
    public int compare(Person p1, Person p2) {
        if (p1.hashCode() < p2.hashCode()) return 1;
        if (p1.hashCode() == p2.hashCode()) return 0;
        return -1;
    }
};
collector = new PriorityQueue<Person>(comp);
It doesn't sound like a good approach.
The default hashCode() is typically implemented by converting the internal address of the object into an integer, so the order of objects will differ between runs of the application.
Also, two objects with the same attribute values will not return the same hashCode unless you override the implementation, so the resulting ordering is not consistent with equals, which goes against the consistency-with-equals expectation for comparators.
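If Person does expose some stable attribute, a deterministic comparator is straightforward to build instead (a sketch; getId() is a hypothetical accessor):
Comparator<Person> comp = Comparator.comparingInt(Person::getId);
PriorityQueue<Person> collector = new PriorityQueue<>(comp);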

Traversing and filtering a Set, comparing its objects' getters to an array, using streams

I've got some working, inelegant code here:
The custom object is:
public class Person {
    private int id;
    public int getId() { return this.id; }
}
And I have a class containing a Set<Person> allPersons holding all available subjects. I want to extract a new Set<Person> based on one or more IDs of my choosing. I've written something that works using a nested enhanced for loop, but it strikes me as inefficient and makes a lot of unnecessary comparisons. I am getting used to working with Java 8, but can't quite figure out how to compare the Set against an array. Here is my working, but verbose, code:
public class MyProgram {
    private Set<Person> allPersons; // contains 100 people with ids 1-100

    public Set<Person> getPersonById(int[] ids) {
        Set<Person> personSet = new HashSet<>(); // or any type of set
        for (int i : ids) {
            for (Person p : allPersons) {
                if (p.getId() == i) {
                    personSet.add(p);
                }
            }
        }
        return personSet;
    }
}
And to get my result, I'd call something along the lines of:
Set<Person> resultSet = getPersonById(new int[] {2, 56, 66});
//resultSet would then contain 3 people with the corresponding ID
My question is: how would I convert the getPersonById method into something that streams allPersons and matches each person's ID against any of the ints in the parameter array? I thought of a filter operation, but since the parameter is an array, I can't work out how to match against just the ids I want.
The working answer to this is:
return allPersons.stream()
        .filter(p -> Arrays.stream(ids).anyMatch(i -> i == p.getId()))
        .collect(Collectors.toSet());
However, using the bottom half of @Flown's suggestion, if the program were designed to have a Map it would also work (and much more efficiently).
As you said, you can introduce a Stream::filter step using a Stream::anyMatch operation.
public Set<Person> getPersonById(int[] ids) {
    Objects.requireNonNull(ids);
    if (ids.length == 0) {
        return Collections.emptySet();
    }
    return allPersons.stream()
            .filter(p -> IntStream.of(ids).anyMatch(i -> i == p.getId()))
            .collect(Collectors.toSet());
}
If the method is called more often, it would be a good idea to map each Person to its id in a Map<Integer, Person>. The advantage is that the lookup is much faster than iterating over the whole set of Person. Then your algorithm may look like this:
private Map<Integer, Person> idMapping;

public Set<Person> getPersonById(int[] ids) {
    Objects.requireNonNull(ids);
    return IntStream.of(ids)
            .filter(idMapping::containsKey)
            .mapToObj(idMapping::get)
            .collect(Collectors.toSet());
}
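The idMapping itself could be built up front from allPersons, for example (a sketch; it assumes ids are unique, otherwise Collectors.toMap throws):
idMapping = allPersons.stream()
        .collect(Collectors.toMap(Person::getId, p -> p));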

Complex conditional filter design

I'm stuck implementing some conditional rules for a form in the backend. Basically I need to come up with an efficient and scalable way of doing this. I was looking into binary trees and decision trees for this, but I'm still not sure what's the best way to implement it.
As you can see, there is one statement with possibly more than one condition, separated by logical AND/OR. Basically what I need to know is the data structure to store this information in the database. It will act as a filter when a form is submitted by the user: based on the form values that pass through the filter, some action will take place as a result.
Your question is a bit generic, but let me see if I can help you get started. In Java, you can set up a class structure as follows:
interface ConditionTree {
    boolean evaluate(...);
}

class OperatorNode implements ConditionTree {
    List<ConditionTree> subTrees = ...;

    @Override
    boolean evaluate(...) {
        if (operator == AND) {
            // loop through and evaluate each sub tree and make sure they are all true,
            // or return false
        } else if (operator == OR) {
            // loop through and evaluate each sub tree, and make sure at least one is
            // true, or return false
        }
    }
}

class LeafNode implements ConditionTree {
    @Override
    boolean evaluate(...) {
        // get the LHS, operator, and RHS, and evaluate them
    }
}
Notice it's an N-ary tree, not a binary tree.
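To make this a bit more concrete, here is one self-contained (hypothetical) rendering in which the tree is evaluated against the submitted form values; the class names, the Operator enum, and the equality-only leaf are my own choices, not a prescribed design:
import java.util.List;
import java.util.Map;

interface ConditionTree {
    boolean evaluate(Map<String, String> formValues);
}

enum Operator { AND, OR }

class OperatorNode implements ConditionTree {
    private final Operator operator;
    private final List<ConditionTree> subTrees;

    OperatorNode(Operator operator, List<ConditionTree> subTrees) {
        this.operator = operator;
        this.subTrees = subTrees;
    }

    @Override
    public boolean evaluate(Map<String, String> formValues) {
        // AND requires every subtree to hold; OR requires at least one.
        return operator == Operator.AND
                ? subTrees.stream().allMatch(t -> t.evaluate(formValues))
                : subTrees.stream().anyMatch(t -> t.evaluate(formValues));
    }
}

class EqualsLeaf implements ConditionTree {
    private final String field;
    private final String expected;

    EqualsLeaf(String field, String expected) {
        this.field = field;
        this.expected = expected;
    }

    @Override
    public boolean evaluate(Map<String, String> formValues) {
        return expected.equals(formValues.get(field));
    }
}
For example, the rule country == "US" AND (plan == "pro" OR plan == "enterprise") could be built and evaluated like this:
ConditionTree rule = new OperatorNode(Operator.AND, List.of(
        new EqualsLeaf("country", "US"),
        new OperatorNode(Operator.OR, List.of(
                new EqualsLeaf("plan", "pro"),
                new EqualsLeaf("plan", "enterprise")))));
boolean matches = rule.evaluate(Map.of("country", "US", "plan", "pro"));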
