Efficient tuple search algorithm - algorithm

Given a store of 3-tuples where:
All elements are numeric, e.g. (1, 3, 4) (1300, 3, 15) (1300, 3, 15) …
Tuples are removed and added frequently
At any time the store is typically under 100,000 elements
All Tuples are available in memory
The application is interactive requiring 100s of searches per second.
What are the most efficient algorithms/data structures to perform wild card (*) searches such as:
(1, *, 6) (3601, *, *) (*, 1935, *)
The aim is to have a Linda-like tuple space, but at the application level.

Well, there are only 8 possible arrangements of wildcards, so you can easily construct 6 multi-maps and a set to serve as indices: one for each arrangement of wildcards in the query. You don't need an 8th index because the query (*,*,*) trivially returns all tuples. The set is for tuples with no wildcards; only a membership test is needed in this case.
A multimap takes a key to a set. In your example, e.g., the query (1,*,6) would consult the multimap for queries of the form (X,*,Y), which takes key <X,Y> to the set of all tuples with X in the first position and Y in third. In this case, X=1 and Y=6.
With any reasonable hash-based multimap implementation, lookups ought to be very fast. Several hundred a second ought to be easy, and several thousand per second doable (with e.g a contemporary x86 CPU).
Insertions and deletions require updating the maps and set. Again this ought to be reasonably fast, though not as fast as lookups of course. Again several hundred per second ought to be doable.
With only ~10^5 tuples, this approach ought to be fine for memory as well. You can save a bit of space with tricks, e.g. keeping a single copy of each tuple in an array and storing indices in the map/set to represent both key and value. Manage array slots with a free list.
To make this concrete, here is pseudocode. I'm going to use angle brackets <a,b,c> for tuples to avoid too many parens:
# Definitions
For a query Q <k2,k1,k0> where each of k_i is either * or an integer,
Let I(Q) be a 3-digit binary number b2|b1|b0 where
b_i=0 if k_i is * and 1 if k_i is an integer.
Let N(i) be the number of 1's in the binary representation of i
Let M(i) be a multimap taking a tuple with N(i) elements to a set
of tuples with 3 elements.
Let t be a 3 element tuple. Then T(t,i) returns a new tuple with
only the elements of t in positions where i has a 1. For example
T(<1,2,3>,0) = <> and T(<1,2,3>,6) = <1,2>
Note that function T works fine on query tuples with wildcards.
# Algorithm to insert tuple t into the database:
fun insert(t)
  for i = 0 to 7
    add the entry T(t,i)->t to M(i)
# Algorithm to delete tuple t from the database:
fun delete(t)
  for i = 0 to 7
    delete the entry T(t,i)->t from M(i)
# Query algorithm
fun query(Q)
  let i = I(Q)
  return M(i).lookup(T(Q, i)) # lookup failure returns empty set
Note that for simplicity, I've not shown the "optimizations" for M(0) and M(7). For M(0), the algorithm above would create a multimap taking the empty tuple to the set of all 3-tuples in the database. You can avoid this merely by treating i=0 as a special case. Similarly M(7) would take each tuple to a set containing only itself.
An "optimized" version:
fun insert(t)
  for i = 1 to 6
    add the entry T(t,i)->t to M(i)
  add t to set S
fun delete(t)
  for i = 1 to 6
    delete the entry T(t,i)->t from M(i)
  remove t from set S
fun query(Q)
  let i = I(Q)
  if i = 0, return S
  else if i = 7, return { Q } if Q ∈ S, otherwise {}
  else return M(i).lookup(T(Q, i))
Addition
For fun, a Java implementation:
package hacking;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Random;
import java.util.Scanner;
import java.util.Set;
public class Hacking {
public static void main(String [] args) {
TupleDatabase db = new TupleDatabase();
int n = 200000;
long start = System.nanoTime();
for (int i = 0; i < n; ++i) {
db.insert(db.randomTriple());
}
long stop = System.nanoTime();
double elapsedSec = (stop - start) * 1e-9;
System.out.println("Inserted " + n + " tuples in " + elapsedSec
+ " seconds (" + (elapsedSec / n * 1000.0) + "ms per insert).");
Scanner in = new Scanner(System.in);
for (;;) {
System.out.print("Query: ");
int a = in.nextInt();
int b = in.nextInt();
int c = in.nextInt();
System.out.println(db.query(new Tuple(a, b, c)));
}
}
}
class Tuple {
static final int [] N_ONES = new int[] { 0, 1, 1, 2, 1, 2, 2, 3 };
static final int STAR = -1;
final int [] vals;
Tuple(int a, int b, int c) {
vals = new int[] { a, b, c };
}
Tuple(Tuple t, int code) {
vals = new int[N_ONES[code]];
int m = 0;
for (int k = 0; k < 3; ++k) {
if (((1 << k) & code) > 0) {
vals[m++] = t.vals[k];
}
}
}
@Override
public boolean equals(Object other) {
if (other instanceof Tuple) {
Tuple triple = (Tuple) other;
return Arrays.equals(this.vals, triple.vals);
}
return false;
}
@Override
public int hashCode() {
return Arrays.hashCode(this.vals);
}
@Override
public String toString() {
return Arrays.toString(vals);
}
int code() {
int c = 0;
for (int k = 0; k < 3; k++) {
if (vals[k] != STAR) {
c |= (1 << k);
}
}
return c;
}
Set<Tuple> setOf() {
Set<Tuple> s = new HashSet<>();
s.add(this);
return s;
}
}
class Multimap extends HashMap<Tuple, Set<Tuple>> {
@Override
public Set<Tuple> get(Object key) {
Set<Tuple> r = super.get(key);
return r == null ? Collections.<Tuple>emptySet() : r;
}
void put(Tuple key, Tuple value) {
if (containsKey(key)) {
super.get(key).add(value);
} else {
super.put(key, value.setOf());
}
}
void remove(Tuple key, Tuple value) {
Set<Tuple> set = super.get(key);
if (set == null) return; // key absent: nothing to remove
set.remove(value);
if (set.isEmpty()) {
super.remove(key);
}
}
}
class TupleDatabase {
final Set<Tuple> set;
final Multimap [] maps;
TupleDatabase() {
set = new HashSet<>();
maps = new Multimap[7];
for (int i = 1; i < 7; i++) {
maps[i] = new Multimap();
}
}
void insert(Tuple t) {
set.add(t);
for (int i = 1; i < 7; i++) {
maps[i].put(new Tuple(t, i), t);
}
}
void delete(Tuple t) {
set.remove(t);
for (int i = 1; i < 7; i++) {
maps[i].remove(new Tuple(t, i), t);
}
}
Set<Tuple> query(Tuple q) {
int c = q.code();
switch (c) {
case 0: return set;
case 7: return set.contains(q) ? q.setOf() : Collections.<Tuple>emptySet();
default: return maps[c].get(new Tuple(q, c));
}
}
Random gen = new Random();
int randPositive() {
return gen.nextInt(1000);
}
Tuple randomTriple() {
return new Tuple(randPositive(), randPositive(), randPositive());
}
}
Some output:
Inserted 200000 tuples in 2.981607358 seconds (0.014908036790000002ms per insert).
Query: -1 -1 -1
[[504, 296, 987], [500, 446, 184], [499, 482, 16], [488, 823, 40], ...
Query: 500 446 -1
[[500, 446, 184], [500, 446, 762]]
Query: -1 -1 500
[[297, 56, 500], [848, 185, 500], [556, 351, 500], [779, 986, 500], [935, 279, 500], ...

If you think of the tuples like an IP address, then a radix tree (trie) type structure might work. Radix trees are used for IP address lookup in routing tables.
Another way may be to use bit operations: calculate a bit hash for each tuple, and in your search apply bitwise operations (OR, AND) for quick discovery.
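One possible reading of the bit-operation suggestion, as a minimal sketch: pack each tuple into a single 64-bit key and turn a query into a (mask, value) pair, so that a match is one AND plus one comparison. This assumes every element fits in 21 bits (0 to 2,097,151); the class and method names (PackedTuples, pack, queryMask, queryValue, search) are hypothetical. It still scans every stored key, so it is a compact linear filter rather than an index.

```java
import java.util.ArrayList;
import java.util.List;

final class PackedTuples {
    static final int BITS = 21;                 // assumption: each element is in [0, 2^21)
    static final long FIELD = (1L << BITS) - 1; // mask covering one element
    static final int STAR = -1;                 // wildcard marker, as in the Java code above

    // Pack <a,b,c> into one long: a in the high field, c in the low field.
    static long pack(int a, int b, int c) {
        return ((long) a << (2 * BITS)) | ((long) b << BITS) | (long) c;
    }

    // Mask with all bits set in every non-wildcard field of the query.
    static long queryMask(int a, int b, int c) {
        long m = 0;
        if (a != STAR) m |= FIELD << (2 * BITS);
        if (b != STAR) m |= FIELD << BITS;
        if (c != STAR) m |= FIELD;
        return m;
    }

    // Packed query value with wildcard fields zeroed.
    static long queryValue(int a, int b, int c) {
        return pack(a == STAR ? 0 : a, b == STAR ? 0 : b, c == STAR ? 0 : c);
    }

    // Linear scan: a key matches when its non-wildcard fields equal the query's.
    static List<Long> search(List<Long> keys, int a, int b, int c) {
        long mask = queryMask(a, b, c);
        long value = queryValue(a, b, c);
        List<Long> hits = new ArrayList<>();
        for (long k : keys) {
            if ((k & mask) == value) hits.add(k);
        }
        return hits;
    }
}
```

With around 100,000 keys in a primitive array, a scan like this may well be fast enough for a few hundred queries per second, but the multimap indices described earlier avoid the scan entirely.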

Related

Generate ordered list of sum between elements in large lists

I'm not sure whether this question should be posted on math or overflow, but here we go.
I have an arbitrary amount of ordered lists (say 3 for example) with numerical values. These lists can be long enough that trying all combinations of values becomes too computationally heavy.
What I need is to get an ordered list of possible sums when picking one value from each of the lists. Since the lists can be large, I only want the N smallest sums.
What I've considered is to step down one of the lists for each iteration. This however misses many cases that would have been possible if another list would have been chosen for that step.
An alternative would be a recursive solution, but that would generate many duplicate cases instead.
Are there any known methods that could solve such a problem?
Suppose we have K lists.
Make a min-heap.
a) Push a structure containing the sum of the first elements of every list, together with the list of indexes: key = Sum(L[i][0]), [ix0=0, ix1=0, ix2=0]
b) Pop the smallest element from the heap and output its key (the sum)
c) Construct K new elements from the popped one: for each list, increment the corresponding index and update the sum
key - L[0][ix0] + L[0][ix0 + 1], [ix0 + 1, ix1, ix2]
key - L[1][ix1] + L[1][ix1 + 1], [ix0, ix1 + 1, ix2]
and likewise for ix2
d) Push them into the heap (skipping any index that runs past the end of its list, and any index combination that has already been pushed)
e) Repeat from b) until the N smallest sums are extracted
A Java implementation of the min heap algorithm with a simple test case:
The algorithm itself is just as described by @MBo.
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;
class MinHeapElement {
int sum;
List<Integer> idx;
}
public class SumFromKLists {
public static List<Integer> sumFromKLists(List<List<Integer>> lists, int N) {
List<Integer> ans = new ArrayList<>();
if(N == 0) {
return ans;
}
PriorityQueue<MinHeapElement> minPq = new PriorityQueue<>(new Comparator<MinHeapElement>() {
@Override
public int compare(MinHeapElement e1, MinHeapElement e2) {
return Integer.compare(e1.sum, e2.sum); // avoids overflow of e1.sum - e2.sum
}
});
MinHeapElement smallest = new MinHeapElement();
smallest.idx = new ArrayList<>();
for(int i = 0; i < lists.size(); i++) {
smallest.sum += lists.get(i).get(0);
smallest.idx.add(0);
}
minPq.add(smallest);
ans.add(smallest.sum);
while(ans.size() < N) {
MinHeapElement curr = minPq.poll();
if(ans.get(ans.size() - 1) != curr.sum) {
ans.add(curr.sum);
}
List<MinHeapElement> candidates = nextPossibleCandidates(lists, curr);
if(candidates.size() == 0) {
break;
}
minPq.addAll(candidates);
}
return ans;
}
private static List<MinHeapElement> nextPossibleCandidates(List<List<Integer>> lists, MinHeapElement minHeapElement) {
List<MinHeapElement> candidates = new ArrayList<>();
for(int i = 0; i < lists.size(); i++) {
List<Integer> currList = lists.get(i);
int newIdx = minHeapElement.idx.get(i) + 1;
while(newIdx < currList.size() && currList.get(newIdx).equals(currList.get(newIdx - 1))) { // equals, not ==, for boxed Integers
newIdx++;
}
if(newIdx < currList.size()) {
MinHeapElement nextElement = new MinHeapElement();
nextElement.sum = minHeapElement.sum + currList.get(newIdx) - currList.get(minHeapElement.idx.get(i));
nextElement.idx = new ArrayList<>(minHeapElement.idx);
nextElement.idx.set(i, newIdx);
candidates.add(nextElement);
}
}
return candidates;
}
public static void main(String[] args) {
List<Integer> list1 = new ArrayList<>();
list1.add(2); list1.add(4); list1.add(7); list1.add(8);
List<Integer> list2 = new ArrayList<>();
list2.add(1); list2.add(3); list2.add(5); list2.add(8);
List<List<Integer>> lists = new ArrayList<>();
lists.add(list1); lists.add(list2);
System.out.println(sumFromKLists(lists, 11));
}
}
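The implementation above returns distinct sums and can push the same index combination onto the heap more than once. As a variant sketch, the version below emits the sum of every combination in order (duplicates included) and tracks which index combinations have already been pushed. The names (KSmallestSums, smallestSums) are hypothetical, and it assumes every list is non-empty and sorted ascending.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.PriorityQueue;
import java.util.Set;

final class KSmallestSums {
    // Heap entry: the sum and the index chosen in each list.
    private static final class Node {
        final long sum;
        final int[] idx;
        Node(long sum, int[] idx) { this.sum = sum; this.idx = idx; }
    }

    static List<Long> smallestSums(List<int[]> lists, int n) {
        List<Long> out = new ArrayList<>();
        if (n <= 0 || lists.isEmpty()) return out;
        PriorityQueue<Node> heap = new PriorityQueue<>((x, y) -> Long.compare(x.sum, y.sum));
        Set<List<Integer>> seen = new HashSet<>();

        // Start from the first element of every list.
        int[] start = new int[lists.size()];
        long first = 0;
        for (int[] l : lists) first += l[0];
        heap.add(new Node(first, start));
        seen.add(key(start));

        while (out.size() < n && !heap.isEmpty()) {
            Node cur = heap.poll();
            out.add(cur.sum);
            // Successors: advance exactly one index by one position.
            for (int i = 0; i < lists.size(); i++) {
                int[] l = lists.get(i);
                int next = cur.idx[i] + 1;
                if (next >= l.length) continue;
                int[] idx = cur.idx.clone();
                idx[i] = next;
                if (seen.add(key(idx))) { // push each combination only once
                    heap.add(new Node(cur.sum - l[next - 1] + l[next], idx));
                }
            }
        }
        return out;
    }

    // Boxed copy of the index array, usable as a hash-set key.
    private static List<Integer> key(int[] idx) {
        List<Integer> k = new ArrayList<>(idx.length);
        for (int v : idx) k.add(v);
        return k;
    }
}
```

For the two test lists used above, asking for the 6 smallest sums gives 3, 5, 5, 7, 7, 8.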

Dynamic programming: Algorithm to solve the following?

I have recently completed the following interview exercise:
'A robot can be programmed to run "a", "b", "c"... "n" kilometers and it takes ta, tb, tc... tn minutes, respectively. Once it runs to programmed kilometers, it must be turned off for "m" minutes.
After "m" minutes it can again be programmed to run for a further "a", "b", "c"... "n" kilometers.
How would you program this robot to go an exact number of kilometers in the minimum amount of time?'
I thought it was a variation of the unbounded knapsack problem, in which the size would be the number of kilometers and the value, the time needed to complete each stretch. The main difference is that we need to minimise, rather than maximise, the value. So I used the equivalent of the following solution: http://en.wikipedia.org/wiki/Knapsack_problem#Unbounded_knapsack_problem
in which I select the minimum.
Finally, because we need an exact solution (if there is one), over the map constructed by the algorithm for all the different distances, I iterated through each and through each robot's programmed distance to find the exact distance and minimum time among those.
I think the pause the robot takes between runs is a bit of a red herring and you just need to include it in your calculations, but it does not affect the approach taken.
I am probably wrong, because I failed the test. I don't have any other feedback as to the expected solution.
Edit: maybe I wasn't wrong after all and I failed for different reasons. I just wanted to validate my approach to this problem.
import static com.google.common.collect.Sets.*;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import org.apache.log4j.Logger;
import com.google.common.base.Objects;
import com.google.common.base.Preconditions;
import com.google.common.collect.Lists;
import com.google.common.collect.Maps;
public final class Robot {
static final Logger logger = Logger.getLogger (Robot.class);
private Set<ProgrammedRun> programmedRuns;
private int pause;
private int totalDistance;
private Robot () {
//don't expose default constructor & prevent subclassing
}
private Robot (int[] programmedDistances, int[] timesPerDistance, int pause, int totalDistance) {
this.programmedRuns = newHashSet ();
for (int i = 0; i < programmedDistances.length; i++) {
this.programmedRuns.add (new ProgrammedRun (programmedDistances [i], timesPerDistance [i] ) );
}
this.pause = pause;
this.totalDistance = totalDistance;
}
public static Robot create (int[] programmedDistances, int[] timesPerDistance, int pause, int totalDistance) {
Preconditions.checkArgument (programmedDistances.length == timesPerDistance.length);
Preconditions.checkArgument (pause >= 0);
Preconditions.checkArgument (totalDistance >= 0);
return new Robot (programmedDistances, timesPerDistance, pause, totalDistance);
}
/**
* @return null if no strategy was found. An empty map if distance is zero. A
* map with the programmed runs as keys and number of time they need to be run
* as value.
*
*/
Map<ProgrammedRun, Integer> calculateOptimalStrategy () {
//for efficiency, consider this case first
if (this.totalDistance == 0) {
return Maps.newHashMap ();
}
//list of solutions for different distances. Element "i" of the list is the best set of runs that cover at least "i" kilometers
List <Map<ProgrammedRun, Integer>> runsForDistances = Lists.newArrayList();
//special case i = 0 -> empty map (no runs needed)
runsForDistances.add (new HashMap<ProgrammedRun, Integer> () );
for (int i = 1; i <= totalDistance; i++) {
Map<ProgrammedRun, Integer> map = new HashMap<ProgrammedRun, Integer> ();
int minimumTime = -1;
for (ProgrammedRun pr : programmedRuns) {
int distance = Math.max (0, i - pr.getDistance ());
int time = getTotalTime (runsForDistances.get (distance) ) + pause + pr.getTime();
if (minimumTime < 0 || time < minimumTime) {
minimumTime = time;
//new minimum found
map = new HashMap<ProgrammedRun, Integer> ();
map.putAll(runsForDistances.get (distance) );
//increase count
Integer num = map.get (pr);
if (num == null) num = Integer.valueOf (1);
else num++;
//update map
map.put (pr, num);
}
}
runsForDistances.add (map );
}
//last step: calculate the combination with exact distance
int minimumTime2 = -1;
int bestIndex = -1;
for (int i = 0; i <= totalDistance; i++) {
if (getTotalDistance (runsForDistances.get (i) ) == this.totalDistance ) {
int time = getTotalTime (runsForDistances.get (i) );
if (time > 0) time -= pause;
if (minimumTime2 < 0 || time < minimumTime2 ) {
minimumTime2 = time;
bestIndex = i;
}
}
}
//if solution found
if (bestIndex != -1) {
return runsForDistances.get (bestIndex);
}
//try all combinations, since none of the existing maps run for the exact distance
List <Map<ProgrammedRun, Integer>> exactRuns = Lists.newArrayList();
for (int i = 0; i <= totalDistance; i++) {
int distance = getTotalDistance (runsForDistances.get (i) );
for (ProgrammedRun pr : programmedRuns) {
//solution found
if (distance + pr.getDistance() == this.totalDistance ) {
Map<ProgrammedRun, Integer> map = new HashMap<ProgrammedRun, Integer> ();
map.putAll (runsForDistances.get (i));
//increase count
Integer num = map.get (pr);
if (num == null) num = Integer.valueOf (1);
else num++;
//update map
map.put (pr, num);
exactRuns.add (map);
}
}
}
if (exactRuns.isEmpty()) return null;
//finally return the map with the best time
minimumTime2 = -1;
Map<ProgrammedRun, Integer> bestMap = null;
for (Map<ProgrammedRun, Integer> m : exactRuns) {
int time = getTotalTime (m);
if (time > 0) time -= pause; //remove last pause
if (minimumTime2 < 0 || time < minimumTime2 ) {
minimumTime2 = time;
bestMap = m;
}
}
return bestMap;
}
private int getTotalTime (Map<ProgrammedRun, Integer> runs) {
int time = 0;
for (Map.Entry<ProgrammedRun, Integer> runEntry : runs.entrySet()) {
time += runEntry.getValue () * runEntry.getKey().getTime ();
//add pauses
time += this.pause * runEntry.getValue ();
}
return time;
}
private int getTotalDistance (Map<ProgrammedRun, Integer> runs) {
int distance = 0;
for (Map.Entry<ProgrammedRun, Integer> runEntry : runs.entrySet()) {
distance += runEntry.getValue() * runEntry.getKey().getDistance ();
}
return distance;
}
class ProgrammedRun {
private int distance;
private int time;
private transient float speed;
ProgrammedRun (int distance, int time) {
this.distance = distance;
this.time = time;
this.speed = (float) distance / time;
}
@Override public String toString () {
return "(distance =" + distance + "; time=" + time + ")";
}
@Override public boolean equals (Object other) {
return other instanceof ProgrammedRun
&& this.distance == ((ProgrammedRun)other).distance
&& this.time == ((ProgrammedRun)other).time;
}
@Override public int hashCode () {
return Objects.hashCode (Integer.valueOf (this.distance), Integer.valueOf (this.time));
}
int getDistance() {
return distance;
}
int getTime() {
return time;
}
float getSpeed() {
return speed;
}
}
}
public class Main {
/* Input variables for the robot */
private static int [] programmedDistances = {1, 2, 3, 5, 10}; //in kilometers
private static int [] timesPerDistance = {10, 5, 3, 2, 1}; //in minutes
private static int pause = 2; //in minutes
private static int totalDistance = 41; //in kilometers
/**
* @param args
*/
public static void main(String[] args) {
Robot r = Robot.create (programmedDistances, timesPerDistance, pause, totalDistance);
Map<ProgrammedRun, Integer> strategy = r.calculateOptimalStrategy ();
if (strategy == null) {
System.out.println ("No strategy that matches the conditions was found");
} else if (strategy.isEmpty ()) {
System.out.println ("No need to run; distance is zero");
} else {
System.out.println ("Strategy found:");
System.out.println (strategy);
}
}
}
Simplifying slightly, let ti be the time (including downtime) that it takes the robot to run distance di. Assume that t1/d1 ≤ … ≤ tn/dn. If t1/d1 is significantly smaller than t2/d2 and d1 and the total distance D to be run are large, then branch and bound likely outperforms dynamic programming. Branch and bound solves the integer programming formulation
minimize ∑i ti xi
subject to
∑i di xi = D
∀i xi ∈ N
by using the value of the relaxation where xi can be any nonnegative real as a guide. The latter is easily verified to be at most (t1/d1)D, by setting x1 to D/d1 and ∀i ≠ 1 xi = 0, and at least (t1/d1)D, by setting the sole variable of the dual program to t1/d1. Solving the relaxation is the bound step; every integer solution is a fractional solution, so the best integer solution requires time at least (t1/d1)D.
The branch step takes one integer program and splits it in two whose solutions, taken together, cover the entire solution space of the original. In this case, one piece could have the extra constraint x1 = 0 and the other could have the extra constraint x1 ≥ 1. It might look as though this would create subproblems with side constraints, but in fact, we can just delete the first move, or decrease D by d1 and add the constant t1 to the objective. Another option for branching is to add either the constraint xi = ⌊D/di⌋ or xi ≤ ⌊D/di⌋ - 1, which requires generalizing to upper bounds on the number of repetitions of each move.
The main loop of branch and bound selects one of a collection of subproblems, branches, computes bounds for the two subproblems, and puts them back into the collection. The efficiency over brute force comes from the fact that, when we have a solution with a particular value, every subproblem whose relaxed value is at least that much can be thrown away. Once the collection is emptied this way, we have the optimal solution.
Hybrids of branch and bound and dynamic programming are possible, for example, computing optimal solutions for small D via DP and using those values instead of branching on subproblems that have been solved.
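A minimal depth-first sketch of the branch-and-bound idea, under some assumptions: each move is given as {distance, time including downtime} with a positive distance, moves are sorted by time/distance, and branching fixes the repeat count of one move at a time rather than using the x1 = 0 versus x1 ≥ 1 split described above. The class and method names (RobotBranchAndBound, solve, search) are hypothetical.

```java
import java.util.Arrays;

final class RobotBranchAndBound {
    private final int[][] moves; // {distance, time incl. downtime}, sorted by time/distance
    private long best = Long.MAX_VALUE;

    RobotBranchAndBound(int[][] moves) {
        this.moves = moves.clone();
        // Sort by ratio t/d ascending; cross-multiplication avoids floating point.
        Arrays.sort(this.moves, (x, y) -> Long.compare((long) x[1] * y[0], (long) y[1] * x[0]));
    }

    // Minimum total time to cover exactly `distance`, or -1 if it cannot be done.
    long solve(int distance) {
        best = Long.MAX_VALUE;
        search(0, distance, 0);
        return best == Long.MAX_VALUE ? -1 : best;
    }

    private void search(int i, int remaining, long cost) {
        if (remaining == 0) {          // feasible integer solution found
            best = Math.min(best, cost);
            return;
        }
        if (i == moves.length) return; // no moves left to use
        int d = moves[i][0], t = moves[i][1];
        // Bound step: the relaxation of what is left costs (t/d) * remaining,
        // because moves[i] has the best ratio among the remaining moves.
        // Prune when cost + t * remaining / d >= best (written without division).
        if (best != Long.MAX_VALUE && cost * d + (long) t * remaining >= best * d) return;
        // Branch step: fix x_i, trying the largest repeat count first.
        for (int x = remaining / d; x >= 0; x--) {
            search(i + 1, remaining - x * d, cost + (long) x * t);
        }
    }
}
```

Because the times include downtime, the result counts one pause after the final run; subtracting that pause gives the actual elapsed time.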
Create an array a of size D + 1, where D is the target distance (m is the pause from the question), and for i from 0 to D do:
a[i] = infinite;
a[0] = 0;
a[i] = min{ min{a[i-j] + tj + m for every j among the robot's possible distances with j < i}, ti if i is itself one of the robot's possible distances }
a[D] is the lowest possible time. You can also keep a second array b that records the choice made for each a[i], so the runs can be reconstructed. If a[D] == infinite, the distance cannot be covered exactly.
Edit: we can solve it another way by building a digraph, which again depends on the path length D. The graph has nodes labeled {0..D}. Start from node 0 and connect it to every node it can reach: if i is one of the robot's possible distances, connect v0 to vi with weight ti. For every other node i > 0, connect i->j with weight t(j-i) + m whenever j > i and j - i is one of the possible distances. Then find the shortest path from v0 to vD. This algorithm is still O(nD), where n is the number of possible distances.
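A minimal Java sketch of the array recurrence, assuming the programmable distances and their times arrive as two parallel arrays and that a pause is charged only between runs (never after the last one). The names RobotDp and minTime are hypothetical.

```java
import java.util.Arrays;

final class RobotDp {
    static final long INF = Long.MAX_VALUE / 4; // stands in for "infinite": distance not reachable

    // dist[k] kilometres take time[k] minutes; pause is the mandatory rest between runs.
    // Returns the minimum total time to run exactly `target` km, or -1 if impossible.
    static long minTime(int[] dist, int[] time, int pause, int target) {
        long[] a = new long[target + 1];
        Arrays.fill(a, INF);
        a[0] = 0;
        for (int i = 1; i <= target; i++) {
            for (int k = 0; k < dist.length; k++) {
                int prev = i - dist[k];
                if (prev < 0 || a[prev] >= INF) continue;
                // A pause is only needed if some run came before this one.
                long candidate = a[prev] + time[k] + (prev > 0 ? pause : 0);
                a[i] = Math.min(a[i], candidate);
            }
        }
        return a[target] >= INF ? -1 : a[target];
    }
}
```

With the data from the question's Main class (distances {1, 2, 3, 5, 10}, times {10, 5, 3, 2, 1}, pause 2, target 41) this yields 21 minutes: three 10 km runs, one 5 km run and two 3 km runs, with five pauses in between.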
Let G be the desired distance run.
Let n be the longest possible distance run without pause.
Let L = G / n (integer arithmetic, discarding the fractional part)
Let R = G mod n (i.e. the remainder from the above division)
Make the robot run its longest distance (i.e. n) L times, and then whichever distance (a, b, c, etc.) is greater than R by the least amount (i.e. the smallest available distance that is equal to or greater than R)
Either I understood the problem wrong, or you're all overthinking it.
I am a big believer in showing instead of telling. Here is a program that may be doing what you are looking for. Let me know if it satisfies your question. Simply copy, paste, and run the program. You should of course test with your own data set.
import java.util.Arrays;
public class Speed {
/***
*
* @param distance
* @param sprints ={{A,Ta},{B,Tb},{C,Tc}, ..., {N,Tn}}
*/
public static int getFastestTime(int distance, int[][] sprints){
long[] minTime = new long[distance+1];//distance from 0 to distance
Arrays.fill(minTime,Integer.MAX_VALUE);
minTime[0]=0;//key=distance; value=time
for(int[] speed: sprints)
for(int d=1; d<minTime.length; d++)
if(d>=speed[0] && minTime[d] > minTime[d-speed[0]]+speed[1])
minTime[d]=minTime[d-speed[0]]+speed[1];
return (int)minTime[distance];
}//
public static void main(String... args){
//sprints ={{A,Ta},{B,Tb},{C,Tc}, ..., {N,Tn}}
int[][] sprints={{3,2},{5,3},{7,5}};
int distance = 21;
System.out.println(getFastestTime(distance,sprints));
}
}

What algorithm can I use to produce 'Random' value?

Say I have 4 possible results and the probabilities of each result appearing are
1 = 10%
2 = 20%
3 = 30%
4 = 40%
I'd like to write a method like GetRandomValue which if called 1000 times would return
1 x 100 times
2 x 200 times
3 x 300 times
4 x 400 times
What's the name of an algorithm which would produce such results?
In your case you can generate a random integer within 1..10: if it's 1, select 1; if it's in 2..3, select 2; if it's in 4..6, select 3; and if it's in 7..10, select 4.
In general, if you have probabilities which sum to 1, you can generate a random number within [0,1) and map the interval it falls in to the related value (I simplified this in your case to 1..10).
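The general case can be sketched in Java as a lookup against cumulative probabilities, a technique often called roulette-wheel selection. This is a minimal sketch that assumes the probabilities are non-negative and sum to approximately 1; the names WeightedPicker and next are hypothetical.

```java
import java.util.Random;

final class WeightedPicker {
    private final int[] values;
    private final double[] cumulative; // running totals of the probabilities
    private final Random random;

    // Assumes probabilities are non-negative and sum to (approximately) 1.
    WeightedPicker(int[] values, double[] probabilities, Random random) {
        this.values = values.clone();
        this.cumulative = new double[probabilities.length];
        double total = 0;
        for (int i = 0; i < probabilities.length; i++) {
            total += probabilities[i];
            cumulative[i] = total;
        }
        this.random = random;
    }

    int next() {
        double u = random.nextDouble(); // uniform in [0, 1)
        for (int i = 0; i < cumulative.length; i++) {
            if (u < cumulative[i]) return values[i];
        }
        // Guards against rounding when the running total falls just short of 1.
        return values[values.length - 1];
    }
}
```

Note that this only matches the requested frequencies in the long run; 1000 calls will give roughly, not exactly, 100/200/300/400.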
To get a random number you would use the Random class of .Net.
Something like the following would accomplish what you requested:
public class MyRandom
{
private Random m_rand = new Random();
public int GetNextValue()
{
// Gets a random value between 0-9 with equal probability
// and converts it to a number between 1-4 with the probabilities requested.
switch (m_rand.Next(0, 10)) // the upper bound of Random.Next is exclusive
{
case 0:
return 1;
case 1: case 2:
return 2;
case 3: case 4: case 5:
return 3;
default:
return 4;
}
}
}
If you just want those probabilities in the long run, you can just get values by randomly selecting one element from the array {1,2,2,3,3,3,4,4,4,4}.
If you however need to retrieve exactly 1000 elements, in those specific quantities, you can try something like this (not C#, but shouldn't be a problem):
import java.util.Random;
import java.util.*;
class Thing{
Random r = new Random();
ArrayList<Integer> numbers=new ArrayList<Integer>();
ArrayList<Integer> counts=new ArrayList<Integer>();
int totalCount;
public void set(int i, int count){
numbers.add(i);
counts.add(count);
totalCount+=count;
}
public int getValue(){
if (totalCount==0)
throw new IllegalStateException();
double pos = r.nextDouble();
double z = 0;
int index = 0;
//we select elements using their remaining counts for probabilities
for (; index<counts.size(); index++){
z += counts.get(index) / ((double)totalCount);
if (pos<z)
break;
}
int result = numbers.get(index);
counts.set( index , counts.get(index)-1);
if (counts.get(index)==0){
counts.remove(index);
numbers.remove(index);
}
totalCount--;
return result;
}
}
class Test{
public static void main(String []args){
Thing t = new Thing(){{
set(1,100);
set(2,200);
set(3,300);
set(4,400);
}};
int[]hist=new int[4];
for (int i=0;i<1000;i++){
int value = t.getValue();
System.out.print(value);
hist[value-1]++;
}
System.out.println();
double sum=0;
for (int i=0;i<4;i++) sum+=hist[i];
for (int i=0;i<4;i++)
System.out.printf("%d: %d values, %f%%\n",i+1,hist[i], (100*hist[i]/sum));
}
}

search tree in scala

I'm trying to take my first steps in Scala, and to practice I took a look at the Google Code Jam Store Credit exercise. I tried it in Java first, which went well enough, and now I'm trying to port it to Scala. Now with the Java collections framework, I could try to do a straight syntax conversion, but I'd end up writing Java in Scala, and that kind of defeats the purpose. In my Java implementation, I have a PriorityQueue that I empty into a Deque, and pop the ends off until we have bingo. This all uses mutable collections, which gives me the feeling it is very 'un-Scala'. What I think would be a more functional approach is to construct a data structure that can be traversed both from highest to lowest, and from lowest to highest. Am I on the right path? Are there any suitable data structures supplied in the Scala libraries, or should I roll my own here?
EDIT: full code of the much simpler version in Java. It should run in O(max(credit,inputchars)) and has become:
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.Arrays;
public class StoreCredit {
private static BufferedReader in;
public static void main(String[] args) {
in = new BufferedReader(new InputStreamReader(System.in));
try {
int numCases = Integer.parseInt(in.readLine());
for (int i = 0; i < numCases; i++) {
solveCase(i);
}
} catch (IOException e) {
e.printStackTrace();
}
}
private static void solveCase(int casenum) throws NumberFormatException,
IOException {
int credit = Integer.parseInt(in.readLine());
int numItems = Integer.parseInt(in.readLine());
int itemnumber = 0;
int[] item_numbers_by_price = new int[credit + 1]; // prices range from 0 to credit inclusive
Arrays.fill(item_numbers_by_price, -1); // makes this O(max(credit,
// items)) instead of O(items)
int[] read_prices = readItems();
while (itemnumber < numItems) {
int next_price = read_prices[itemnumber];
if (next_price <= credit) {
if (item_numbers_by_price[credit - next_price] >= 0) {
// Bingo! DinoDNA!
printResult(new int[] {
item_numbers_by_price[credit - next_price],
itemnumber }, casenum);
break;
}
item_numbers_by_price[next_price] = itemnumber;
}
itemnumber++;
}
}
private static int[] readItems() throws IOException {
String line = in.readLine();
String[] items = line.split(" "); // uh-oh, now it's O(max(credit,
// inputchars))
int[] result = new int[items.length];
for (int i = 0; i < items.length; i++) {
result[i] = Integer.parseInt(items[i]);
}
return result;
}
private static void printResult(int[] result, int casenum) {
int one;
int two;
if (result[0] > result[1]) {
one = result[1];
two = result[0];
} else {
one = result[0];
two = result[1];
}
one++;
two++;
System.out.println(String.format("Case #%d: %d %d", casenum + 1, one,
two));
}
}
I'm wondering what you are trying to accomplish using sophisticated data structures such as PriorityQueue and Deque for a problem such as this. It can be solved with a pair of nested loops:
for {
i <- 2 to I
j <- 1 until i
if i != j && P(i-1) + P(j - 1) == C
} println("Case #%d: %d %d" format (n, j, i))
This does about n(n-1)/2 comparisons, roughly half of a full n×n scan, though it is still quadratic. Since the items are not sorted, and sorting them would require O(n log n), you can't do much better than this -- as far as I can see.
Actually, having said all that, I now have figured a way to do it in linear time. The trick is that, for every number p you find, you know what its complement is: C - p. I expect there are a few ways to explore that -- I have so far thought of two.
One way is to build a map with O(n) characteristics, such as a bitmap or a hash map. For each element, make it point to its index. One then only has to find an element whose complement also has an entry in the map. Trivially, this could be as easy as this:
val PM = P.zipWithIndex.toMap
val Some((p, i)) = PM find { case (p, i) => PM isDefinedAt C - p } // find returns an Option
val j = PM(C - p)
However, that won't work if the number is equal to its complement. In other words, if there are two p such that p + p == C. There are quite a few such cases in the examples. One could then test for that condition, and then just use indexOf and lastIndexOf -- except that it is possible that there is only one p such that p + p == C, in which case that wouldn't be the answer either.
So I ended with something more complex, that tests the existence of the complement at the same time the map is being built. Here's the full solution:
import scala.io.Source
object StoreCredit3 extends App {
val source = if (args.size > 0) Source fromFile args(0) else Source.stdin
val input = source getLines ()
val N = input.next.toInt
1 to N foreach { n =>
val C = input.next.toInt
val I = input.next.toInt
val Ps = input.next split ' ' map (_.toInt)
val (_, Some((p1, p2))) = Ps.zipWithIndex.foldLeft((Map[Int, Int](), None: Option[(Int, Int)])) {
case ((map, None), (p, i)) =>
if (map isDefinedAt C - p) map -> Some(map(C - p) -> (i + 1))
else (map updated (p, i + 1), None)
case (answer, _) => answer
}
println("Case #%d: %d %d" format (n, p1, p2))
}
}

finding if two words are anagrams of each other

I am looking for a method to find if two strings are anagrams of one another.
Ex: string1 - abcde
string2 - abced
Ans = true
Ex: string1 - abcde
string2 - abcfed
Ans = false
The solution I came up with so far is to sort both strings and compare them character by character until the end of either string. It would be O(n log n). I am looking for some other efficient method which doesn't change the 2 strings being compared.
Count the frequency of each character in the two strings. Check if the two histograms match. O(n) time, O(1) space (assuming ASCII) (Of course it is still O(1) space for Unicode but the table will become very large).
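A minimal Java sketch of the histogram check, assuming both strings contain only ASCII characters (code points 0..127); the names AnagramCheck and isAnagram are hypothetical. It uses a single table, incrementing for the first string and decrementing for the second, so the two histograms match exactly when every entry ends at zero.

```java
final class AnagramCheck {
    // Assumes both strings contain only ASCII characters (code points 0..127).
    static boolean isAnagram(String a, String b) {
        if (a.length() != b.length()) return false;
        int[] counts = new int[128];
        for (int i = 0; i < a.length(); i++) {
            counts[a.charAt(i)]++; // count characters of the first string
            counts[b.charAt(i)]--; // cancel them with the second string
        }
        for (int c : counts) {
            if (c != 0) return false; // some character occurs a different number of times
        }
        return true;
    }
}
```

Neither input string is modified, which addresses the constraint in the question.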
Get a table of prime numbers, large enough to map a distinct prime to every character. Start from 1 and go through the string, multiplying the number by the prime representing the current character. The number you get depends only on the characters in the string and not on their order, and every unique multiset of characters corresponds to a unique number, since any number can be factored into primes in only one way. So you can just compare the two numbers to tell whether the strings are anagrams of each other.
Unfortunately you have to use multiple precision (arbitrary-precision) integer arithmetic to do this, or you will get overflow or rounding exceptions when using this method.
For this you may use libraries like BigInteger, GMP, MPIR or IntX.
Pseudocode:
prime[] = {2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97, 101}
primehash(string)
Y = 1;
foreach character in string
Y = Y * prime[character-'a']
return Y
isanagram(str1, str2)
return primehash(str1)==primehash(str2)
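A runnable version of the pseudocode above might look like the following Python sketch. Python integers are arbitrary precision, so no extra big-integer library is needed here. The names `PRIMES`, `prime_hash` and `is_anagram` are illustrative, and the code assumes the input consists only of lowercase letters 'a' to 'z'.

```python
# The first 26 primes, one per lowercase letter (same table as the pseudocode).
PRIMES = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41,
          43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97, 101]


def prime_hash(word: str) -> int:
    """Multiply together the primes assigned to each character of the word."""
    product = 1
    for ch in word:
        if not "a" <= ch <= "z":
            # Assumption of this sketch: only lowercase ASCII letters are supported.
            raise ValueError(f"unsupported character: {ch!r}")
        product *= PRIMES[ord(ch) - ord("a")]
    return product


def is_anagram(str1: str, str2: str) -> bool:
    # Equal products mean equal multisets of letters (unique factorization).
    return prime_hash(str1) == prime_hash(str2)


print(is_anagram("abcde", "abced"))   # True
print(is_anagram("abcde", "abcfed"))  # False
```

The product grows with the length of the string, so for long inputs the multiplications are no longer constant-time operations.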
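Create a hash map where the key is a letter and the value is the frequency of that letter.
For the first string, populate the hash map (O(n)).
For the second string, decrement the count of each letter and remove the element from the hash map once its count reaches zero (O(n)); if a letter is not in the map, the strings are not anagrams.
If the hash map is empty at the end, the strings are anagrams; otherwise they are not.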
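The steps above can be sketched in Python with a plain dictionary as the hash map. The function name `is_anagram` is an assumption made for this example.

```python
def is_anagram(first: str, second: str) -> bool:
    """Populate counts from the first string, then consume them with the second."""
    counts = {}
    for ch in first:
        counts[ch] = counts.get(ch, 0) + 1

    for ch in second:
        if ch not in counts:
            # The second string has a letter (or one more copy of a letter)
            # that the first string cannot supply.
            return False
        counts[ch] -= 1
        if counts[ch] == 0:
            del counts[ch]  # remove exhausted letters so the map ends up empty

    # Anything left over is a letter of the first string that was never matched.
    return not counts


print(is_anagram("abcde", "abced"))   # True
print(is_anagram("abcde", "abcfed"))  # False
```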
The steps are:
Check the lengths of both words/strings; proceed to the anagram check only if they are equal, otherwise stop.
Sort both words/strings and then compare them.
Java code for the same:
/*
* To change this template, choose Tools | Templates
* and open the template in the editor.
*/
package anagram;
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.Arrays;
/**
*
* @author Sunshine
*/
public class Anagram {
/**
* @param args the command line arguments
*/
public static void main(String[] args) throws IOException {
// TODO code application logic here
System.out.println("Enter the first string");
BufferedReader br = new BufferedReader(new InputStreamReader(System.in));
String s1 = br.readLine().toLowerCase();
System.out.println("Enter the Second string");
// Reuse the same reader: a second BufferedReader on System.in can miss input already buffered by the first
String s2 = br.readLine().toLowerCase();
char c1[] = null;
char c2[] = null;
if (s1.length() == s2.length()) {
c1 = s1.toCharArray();
c2 = s2.toCharArray();
Arrays.sort(c1);
Arrays.sort(c2);
if (Arrays.equals(c1, c2)) {
System.out.println("The strings are anagrams of each other");
} else {
System.out.println("Sorry, the strings entered are not anagrams");
}
} else {
System.out.println("Sorry, the strings are not anagrams");
}
}
}
C#
public static bool AreAnagrams(string s1, string s2)
{
if (s1 == null) throw new ArgumentNullException("s1");
if (s2 == null) throw new ArgumentNullException("s2");
var chars = new Dictionary<char, int>();
foreach (char c in s1)
{
if (!chars.ContainsKey(c))
chars[c] = 0;
chars[c]++;
}
foreach (char c in s2)
{
if (!chars.ContainsKey(c))
return false;
chars[c]--;
}
return chars.Values.All(i => i == 0);
}
Some tests:
[TestMethod]
public void TestAnagrams()
{
Assert.IsTrue(StringUtil.AreAnagrams("anagramm", "nagaramm"));
Assert.IsTrue(StringUtil.AreAnagrams("anzagramm", "nagarzamm"));
Assert.IsTrue(StringUtil.AreAnagrams("anz121agramm", "nag12arz1amm"));
Assert.IsFalse(StringUtil.AreAnagrams("anagram", "nagaramm"));
Assert.IsFalse(StringUtil.AreAnagrams("nzagramm", "nagarzamm"));
Assert.IsFalse(StringUtil.AreAnagrams("anzagramm", "nag12arz1amm"));
}
Code to find whether two words are anagrams:
The logic has already been explained in a few answers, and a few people asked for the code. This solution produces the result in O(n) time.
This approach counts the number of occurrences of each character in each string and stores each count at the character's ASCII position in an array. It then compares the two count arrays; if they are not equal, the given strings are not anagrams.
public boolean isAnagram(String str1, String str2)
{
//To get the no of occurrences of each character and store it in their ASCII location
int[] strCountArr1=getASCIICountArr(str1);
int[] strCountArr2=getASCIICountArr(str2);
//To test whether the two arrays have the same count of characters. Array size 256 covers the 8-bit (extended ASCII) range; plain ASCII has 128 values
for(int i=0;i<256;i++)
{
if(strCountArr1[i]!=strCountArr2[i])
return false;
}
return true;
}
public int[] getASCIICountArr(String str)
{
char c;
//Array size 256 for ASCII
int[] strCountArr=new int[256];
for(int i=0;i<str.length();i++)
{
c=str.charAt(i);
c=Character.toUpperCase(c);// If both the cases are considered to be the same
strCountArr[(int)c]++; //To increment the count in the character's ASCII location
}
return strCountArr;
}
Using an ASCII hash-map that allows O(1) look-up for each char.
The Java example listed above converts to lower case, which seems incomplete. I have an example in C that simply initializes a hash-map array for ASCII values to -1.
If string2 differs in length from string1, they are not anagrams.
Otherwise, we set the hash-map value to 0 for each char that occurs in string1 or string2.
Then, for each char in string1, we increment its count in the hash-map. Similarly, we decrement the count for each char in string2.
The result should have the value 0 for each char if the strings are anagrams. If not, some positive value set by string1 remains.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#define ARRAYMAX 128
#define True 1
#define False 0
int isAnagram(const char *string1,
const char *string2) {
int str1len = strlen(string1);
int str2len = strlen(string2);
if (str1len != str2len) /* Simple string length test */
return False;
int * ascii_hashtbl = (int * ) malloc((sizeof(int) * ARRAYMAX));
if (ascii_hashtbl == NULL) {
fprintf(stderr, "Memory allocation failed\n");
return -1;
}
memset((void *)ascii_hashtbl, -1, sizeof(int) * ARRAYMAX);
int index = 0;
while (index < str1len) { /* Populate hash_table for each ASCII value
in string1*/
ascii_hashtbl[(int)string1[index]] = 0;
ascii_hashtbl[(int)string2[index]] = 0;
index++;
}
index = index - 1;
while (index >= 0) {
ascii_hashtbl[(int)string1[index]]++; /* Increment something */
ascii_hashtbl[(int)string2[index]]--; /* Decrement something */
index--;
}
/* Use hash_table to compare string2 */
index = 0;
while (index < str1len) {
if (ascii_hashtbl[(int)string1[index]] != 0) {
/* some char is missing in string2 from string1 */
free(ascii_hashtbl);
ascii_hashtbl = NULL;
return False;
}
index++;
}
free(ascii_hashtbl);
ascii_hashtbl = NULL;
return True;
}
int main () {
char array1[ARRAYMAX], array2[ARRAYMAX];
int flag;
printf("Enter the string\n");
fgets(array1, ARRAYMAX, stdin);
printf("Enter another string\n");
fgets(array2, ARRAYMAX, stdin);
array1[strcspn(array1, "\r\n")] = 0;
array2[strcspn(array2, "\r\n")] = 0;
flag = isAnagram(array1, array2);
if (flag == 1)
printf("%s and %s are anagrams.\n", array1, array2);
else if (flag == 0)
printf("%s and %s are not anagrams.\n", array1, array2);
return 0;
}
let's take a question: Given two strings s and t, write a function to determine if t is an anagram of s.
For example,
s = "anagram", t = "nagaram", return true.
s = "rat", t = "car", return false.
Method 1(Using HashMap ):
public class Method1 {
public static void main(String[] args) {
String a = "protijayi";
String b = "jayiproti";
System.out.println(isAnagram(a, b ));// output => true
}
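private static boolean isAnagram(String a, String b) {
// Without this check, a longer first string (e.g. "aab" vs "ab") would wrongly pass
if (a.length() != b.length()) { return false; }
Map<Character ,Integer> map = new HashMap<>();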
for( char c : a.toCharArray()) {
map.put(c, map.getOrDefault(c, 0 ) + 1 );
}
for(char c : b.toCharArray()) {
int count = map.getOrDefault(c, 0);
if(count == 0 ) {return false ; }
else {map.put(c, count - 1 ) ; }
}
return true;
}
}
Method 2 :
public class Method2 {
public static void main(String[] args) {
String a = "protijayi";
String b = "jayiproti";
System.out.println(isAnagram(a, b));// output=> true
}
private static boolean isAnagram(String a, String b) {
int[] alphabet = new int[26];
for(int i = 0 ; i < a.length() ;i++) {
alphabet[a.charAt(i) - 'a']++ ;
}
for (int i = 0; i < b.length(); i++) {
alphabet[b.charAt(i) - 'a']-- ;
}
for( int w : alphabet ) {
if(w != 0 ) {return false;}
}
return true;
}
}
Method 3 :
public class Method3 {
public static void main(String[] args) {
String a = "protijayi";
String b = "jayiproti";
System.out.println(isAnagram(a, b ));// output => true
}
private static boolean isAnagram(String a, String b) {
char[] ca = a.toCharArray() ;
char[] cb = b.toCharArray();
Arrays.sort( ca );
Arrays.sort( cb );
return Arrays.equals(ca , cb );
}
}
Method 4 :
public class AnagramsOrNot {
public static void main(String[] args) {
String a = "Protijayi";
String b = "jayiProti";
isAnagram(a, b);
}
private static void isAnagram(String a, String b) {
Map<Integer, Integer> map = new LinkedHashMap<>();
a.codePoints().forEach(code -> map.put(code, map.getOrDefault(code, 0) + 1));
System.out.println(map);
b.codePoints().forEach(code -> map.put(code, map.getOrDefault(code, 0) - 1));
System.out.println(map);
if (map.values().stream().allMatch(count -> count == 0)) { // anagrams only when every count cancels out
System.out.println("Anagrams");
} else {
System.out.println("Not Anagrams");
}
}
}
In Python:
def areAnagram(a, b):
    if len(a) != len(b): return False
    count1 = [0] * 256
    count2 = [0] * 256
    for i in a: count1[ord(i)] += 1
    for i in b: count2[ord(i)] += 1
    for i in range(256):
        if count1[i] != count2[i]: return False
    return True

str1 = "Giniiii"
str2 = "Protijayi"
print(areAnagram(str1, str2))
Let's take another famous Interview Question: Group the Anagrams from a given String:
public class GroupAnagrams {
public static void main(String[] args) {
String a = "Gini Gina Protijayi iGin aGin jayiProti Soudipta";
Map<String, List<String>> map = Arrays.stream(a.split(" ")).collect(Collectors.groupingBy(GroupAnagrams::sortedString));
System.out.println("MAP => " + map);
map.forEach((k,v) -> System.out.println(k +" and the anagrams are =>" + v ));
/*
Look at the Map output:
MAP => {Giin=[Gini, iGin], Paiijorty=[Protijayi, jayiProti], Sadioptu=[Soudipta], Gain=[Gina, aGin]}
As we can see, there are multiple Lists. Hence, we have to use a flatMap(List::stream)
Now, Look at the output:
Paiijorty and the anagrams are =>[Protijayi, jayiProti]
Now, look at this output:
Sadioptu and the anagrams are =>[Soudipta]
List contains only word. No anagrams.
That means we have to work with map.values(). List contains all the anagrams.
*/
String stringFromMapHavingListofLists = map.values().stream().flatMap(List::stream).collect(Collectors.joining(" "));
System.out.println(stringFromMapHavingListofLists);
}
public static String sortedString(String a) {
String sortedString = a.chars().sorted()
.collect(StringBuilder::new, StringBuilder::appendCodePoint, StringBuilder::append).toString();
return sortedString;
}
/*
* The output : Gini iGin Protijayi jayiProti Soudipta Gina aGin
* All the anagrams are side by side.
*/
}
Grouping anagrams in Python is again easy. We have to:
Sort the letters of each word. Then create a dictionary keyed by the sorted word. The dictionary tells us where the anagrams are: each value is the list of indices (into the original word list) of the words that share that key.
def groupAnagrams(words):
    # sort the letters of each word in the list
    A = [''.join(sorted(word)) for word in words]
    groups = {}  # named 'groups' to avoid shadowing the built-in dict
    for indexofsamewords, names in enumerate(A):
        groups.setdefault(names, []).append(indexofsamewords)
    print(groups)
    #{'AOOPR': [0, 2, 5, 9, 11], 'ABTU': [1, 3, 4], 'Sadioptu': [6, 14], ' KPaaehiklry': [7], 'Taeggllnouy': [8], 'Leov': [10], 'Paiijorty': [12, 16], 'Paaaikpr': [13], 'Saaaabhmryz': [15], ' CNaachlortttu': [17], 'Saaaaborvz': [18]}
    for index in groups.values():
        print([words[i] for i in index])

if __name__ == '__main__':
    # list of words
    words = ["ROOPA","TABU","OOPAR","BUTA","BUAT" , "PAROO","Soudipta",
             "Kheyali Park", "Tollygaunge", "AROOP","Love","AOORP", "Protijayi","Paikpara","dipSouta","Shyambazaar",
             "jayiProti", "North Calcutta", "Sovabazaar"]
    groupAnagrams(words)
The Output :
['ROOPA', 'OOPAR', 'PAROO', 'AROOP', 'AOORP']
['TABU', 'BUTA', 'BUAT']
['Soudipta', 'dipSouta']
['Kheyali Park']
['Tollygaunge']
['Love']
['Protijayi', 'jayiProti']
['Paikpara']
['Shyambazaar']
['North Calcutta']
['Sovabazaar']
Another important anagram question: find the anagram group occurring the maximum number of times.
In the example, ROOPA is the word whose anagrams occur the maximum number of times.
Hence, ['ROOPA' 'OOPAR' 'PAROO' 'AROOP' 'AOORP'] will be the final output.
from statistics import mode
import numpy as np
# list of words
words = ["ROOPA","TABU","OOPAR","BUTA","BUAT" , "PAROO","Soudipta",
"Kheyali Park", "Tollygaunge", "AROOP","Love","AOORP",
"Protijayi","Paikpara","dipSouta","Shyambazaar",
"jayiProti", "North Calcutta", "Sovabazaar"]
print(".....Method 1....... ")
sortedwords = [''.join(sorted(word)) for word in words]
print(sortedwords)
print("...........")
LongestAnagram = np.array(words)[np.array(sortedwords) == mode(sortedwords)]
# Longest anagram
print("Longest anagram by Method 1:")
print(LongestAnagram)
print(".....................................................")
print(".....Method 2....... ")
A = [''.join(sorted(word)) for word in words]
groups = {}
for index, key in enumerate(A):
    # store the original word, not its sorted key, so the group lists the real anagrams
    groups.setdefault(key, []).append(words[index])
#print(groups)
aa = max(groups.items(), key=lambda x: len(x[1]))
print("aa =>", aa)
word, anagrams = aa
print("Longest anagram by Method 2:")
print(" ".join(anagrams))
The Output :
.....Method 1.......
['AOOPR', 'ABTU', 'AOOPR', 'ABTU', 'ABTU', 'AOOPR', 'Sadioptu', ' KPaaehiklry', 'Taeggllnouy', 'AOOPR', 'Leov', 'AOOPR', 'Paiijorty', 'Paaaikpr', 'Sadioptu', 'Saaaabhmryz', 'Paiijorty', ' CNaachlortttu', 'Saaaaborvz']
...........
Longest anagram by Method 1:
['ROOPA' 'OOPAR' 'PAROO' 'AROOP' 'AOORP']
.....................................................
.....Method 2.......
aa => ('AOOPR', ['ROOPA', 'OOPAR', 'PAROO', 'AROOP', 'AOORP'])
Longest anagram by Method 2:
ROOPA OOPAR PAROO AROOP AOORP
Well, you can probably improve the best case and average case substantially just by checking the length first, then a quick checksum on the characters (nothing complex, as that will probably be of worse order than the sort; just a summation of ordinal values), then sort, then compare.
If the strings are very short, the checksum expense will not be greatly dissimilar to that of the sort in many languages.
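As a rough sketch of that ordering of checks (length, then checksum, then sort and compare), assuming Python and a hypothetical function name `are_anagrams`:

```python
def are_anagrams(a: str, b: str) -> bool:
    """Cheap rejections first; the sort only runs when both filters pass."""
    # 1. Different lengths can never be anagrams.
    if len(a) != len(b):
        return False
    # 2. Quick checksum: the sum of ordinal values must match.
    #    A mismatch proves "not anagrams"; a match proves nothing yet.
    if sum(map(ord, a)) != sum(map(ord, b)):
        return False
    # 3. Definitive check: compare sorted copies (the inputs stay unchanged).
    return sorted(a) == sorted(b)


print(are_anagrams("listen", "silent"))  # True
print(are_anagrams("ad", "bc"))          # False: same length and checksum, caught by the sort
```

The checksum only filters out obvious mismatches early. It cannot replace the final comparison, because different strings such as "ad" and "bc" share the same sum.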
How about this?
a = "lai d"
b = "di al"
sorteda = []
sortedb = []
# collect the non-space characters, lower-cased so the comparison ignores case
for i in a:
    if i != " ":
        sorteda.append(i.lower())
for x in b:
    if x != " ":
        sortedb.append(x.lower())
sorteda.sort()
sortedb.sort()
print(sortedb)
print(sorteda)
print(sortedb == sorteda)
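How about XOR-ing both strings? This will definitely be O(n). Note, however, that a zero result is only a necessary condition: "aa" and "bb" also XOR to zero without being anagrams.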
char* arr1="ab cde";
int n1=strlen(arr1);
char* arr2="edcb a";
int n2=strlen(arr2);
// to check for anagram;
int c=0;
int i=0, j=0;
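if(n1!=n2)
printf("\nNot anagram");
else {
/* lengths are equal here, so one index walks both strings */
while(i<n1)
{
c^= ((int)arr1[i] ^ (int)arr2[i]);
i++;
}
/* c==0 is necessary but not sufficient: "aa" and "bb" also give 0 */
if(c==0)
printf("\nAnagram");
else
printf("\nNot anagram");
}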
}
static bool IsAnagram(string s1, string s2)
{
if (s1.Length != s2.Length)
return false;
else
{
int sum1 = 0;
for (int i = 0; i < s1.Length; i++)
sum1 += (int)s1[i]-(int)s2[i];
if (sum1 == 0)
return true;
else
return false;
}
}
For known (and small) sets of valid letters (e.g. ASCII) use a table with counts associated with each valid letter. First string increments counts, second string decrements counts. Finally iterate through the table to see if all counts are zero (strings are anagrams) or there are non-zero values (strings are not anagrams). Make sure to convert all characters to uppercase (or lowercase, all the same) and to ignore white space.
For a large set of valid letters, such as Unicode, do not use a table but rather a hash table. It has O(1) time to add, query and remove, and O(n) space. Letters from the first string increment a count; letters from the second string decrement it. A count that becomes zero is removed from the hash table. The strings are anagrams if the hash table is empty at the end. Alternatively, the search terminates with a negative result as soon as any count becomes negative.
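A minimal Python sketch of the hash-table variant, including the case folding, the whitespace skipping and the early exit on a negative count mentioned above. The function name `are_anagrams` and the use of `str.casefold` for case normalization are assumptions of this example.

```python
def are_anagrams(first: str, second: str) -> bool:
    """Hash-table count that ignores case and whitespace and stops early."""
    counts = {}
    for ch in first:
        if ch.isspace():
            continue  # white space is ignored
        key = ch.casefold()  # treat upper and lower case as the same letter
        counts[key] = counts.get(key, 0) + 1

    for ch in second:
        if ch.isspace():
            continue
        key = ch.casefold()
        remaining = counts.get(key, 0) - 1
        if remaining < 0:
            return False  # the count would go negative: terminate early
        if remaining == 0:
            del counts[key]  # a count that reaches zero is removed
        else:
            counts[key] = remaining

    # Anagrams exactly when nothing is left in the table.
    return not counts


print(are_anagrams("Twelve plus one", "Eleven plus two"))  # True
print(are_anagrams("abc", "abd"))                          # False
```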
Here is the detailed explanation and implementation in C#: Testing If Two Strings are Anagrams
If strings have only ASCII characters:
create an array of 256 length
traverse the first string and increment counter in the array at index = ascii value of the character. also keep counting characters to find length when you reach end of string
traverse the second string and decrement counter in the array at index = ascii value of the character. If the value is ever 0 before decrementing, return false since the strings are not anagrams. also, keep track of the length of this second string.
at the end of the string traversal, if lengths of the two are equal, return true, else, return false.
If string can have unicode characters, then use a hash map instead of an array to keep track of the frequency. Rest of the algorithm remains same.
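The array-based steps above might be sketched as follows in Python. The function name `are_anagrams_ascii` is hypothetical, and the code assumes every character has a code point below 256; anything larger would raise an IndexError in this sketch.

```python
def are_anagrams_ascii(s1: str, s2: str) -> bool:
    """Single pass over each string with a 256-slot counter array."""
    counts = [0] * 256

    len1 = 0
    for ch in s1:
        counts[ord(ch)] += 1  # increment at the character's code
        len1 += 1             # length is tracked during the same traversal

    len2 = 0
    for ch in s2:
        code = ord(ch)
        if counts[code] == 0:
            # Nothing left to decrement: s2 has a character s1 cannot supply.
            return False
        counts[code] -= 1
        len2 += 1

    # No counter ever went below zero, so equal lengths imply all counters are zero.
    return len1 == len2


print(are_anagrams_ascii("stackoverflow", "overflowstack"))  # True
print(are_anagrams_ascii("abc", "ab"))                       # False
```

In Python `len()` is already O(1), so tracking the lengths by hand is kept here only to mirror the description, which targets languages where finding the length requires a traversal.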
Notes:
calculating length while adding characters to array ensures that we traverse each string only once.
Using array in case of an ASCII only string optimizes space based on the requirement.
I guess your sorting algorithm is not really O(log n), is it?
The best you can get is O(n) for your algorithm, because you have to check every character.
You might use two tables to store the counts of each letter in every word, fill it with O(n) and compare it with O(1).
It seems that the following implementation works too, can you check?
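int histo[256] = {0};
/* Assumes str1 and str2 have the same length; compare strlen() of both first */
size_t len = strlen(str1);
for (size_t i = 0; i < len; ++i) {
/* Just inc and dec every char count and
* check the histogram against 0 in the 2nd loop */
++histo[(unsigned char)str1[i]];
--histo[(unsigned char)str2[i]];
}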
for (int i = 0; i < 256; ++i) {
if (histo[i] != 0)
return 0; /* not an anagram */
}
return 1; /* an anagram */
/* Program to find the strings are anagram or not*/
/* Author Senthilkumar M*/
Eg.
Anagram:
str1 = stackoverflow
str2 = overflowstack
Not anagram:
str1 = stackforflow
str2 = stacknotflow
int is_anagram(char *str1, char *str2)
{
int l1 = strlen(str1);
int l2 = strlen(str2);
int s1 = 0, s2 = 0;
int i = 0;
/* if both the string are not equal it is not anagram*/
if(l1 != l2) {
return 0;
}
/* sum up the character in the strings
if the total sum of the two strings is not equal
it is not anagram */
for( i = 0; i < l1; i++) {
s1 += str1[i];
s2 += str2[i];
}
if(s1 != s2) {
return 0;
}
return 1;
}
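If both strings are of equal length, proceed; if not, the strings are not anagrams.
Iterate over each string while summing the ordinals of its characters. If the sums differ, the strings are not anagrams. Equal sums are only a necessary condition, though: "ad" and "bc" have the same length and the same sum without being anagrams, so a match still needs to be confirmed with a character count or a sort.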
Example:
public Boolean AreAnagrams(String inOne, String inTwo) {
bool result = false;
if(inOne.Length == inTwo.Length) {
int sumOne = 0;
int sumTwo = 0;
for(int i = 0; i < inOne.Length; i++) {
sumOne += (int)inOne[i];
sumTwo += (int)inTwo[i];
}
result = sumOne == sumTwo;
}
return result;
}
implementation in Swift 3:
func areAnagrams(_ str1: String, _ str2: String) -> Bool {
return dictionaryMap(forString: str1) == dictionaryMap(forString: str2)
}
func dictionaryMap(forString str: String) -> [String : Int] {
var dict : [String : Int] = [:]
for var i in 0..<str.characters.count {
if let count = dict[str[i]] {
dict[str[i]] = count + 1
}else {
dict[str[i]] = 1
}
}
return dict
}
//To easily subscript characters
extension String {
subscript(i: Int) -> String {
return String(self[index(startIndex, offsetBy: i)])
}
}
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Scanner;
/**
* --------------------------------------------------------------------------
* Finding Anagrams in the given dictionary. Anagrams are words that can be
* formed from other words Ex :The word "words" can be formed using the word
* "sword"
* --------------------------------------------------------------------------
* Input : if choose option 2 first enter no of word want to compare second
* enter word ex:
*
* Enter choice : 1:To use Test Cases 2: To give input 2 Enter the number of
* words in dictionary
* 6
* viq
* khan
* zee
* khan
* am
*
* Dictionary : [ viq khan zee khan am]
* Anagrams 1:[khan, khan]
*
*/
public class Anagrams {
public static void main(String args[]) {
// User Input or just use the testCases
int choice;
@SuppressWarnings("resource")
Scanner scan = new Scanner(System.in);
System.out.println("Enter choice : \n1:To use Test Cases 2: To give input");
choice = scan.nextInt();
switch (choice) {
case 1:
testCaseRunner();
break;
case 2:
userInput();
default:
break;
}
}
private static void userInput() {
@SuppressWarnings("resource")
Scanner scan = new Scanner(System.in);
System.out.println("Enter the number of words in dictionary");
int number = scan.nextInt();
String dictionary[] = new String[number];
//
for (int i = 0; i < number; i++) {
dictionary[i] = scan.nextLine();
}
printAnagramsIn(dictionary);
}
/**
* provides a some number of dictionary of words
*/
private static void testCaseRunner() {
String dictionary[][] = { { "abc", "cde", "asfs", "cba", "edcs", "name" },
{ "name", "mane", "string", "trings", "embe" } };
for (int i = 0; i < dictionary.length; i++) {
printAnagramsIn(dictionary[i]);
}
}
/**
* Prints the set of anagrams found the give dictionary
*
* logic is sorting the characters in the given word and hashing them to the
* word. Data Structure: Hash[sortedChars] = word
*/
private static void printAnagramsIn(String[] dictionary) {
System.out.print("Dictionary : [");// + dictionary);
for (String each : dictionary) {
System.out.print(each + " ");
}
System.out.println("]");
//
Map<String, ArrayList<String>> map = new LinkedHashMap<String, ArrayList<String>>();
// review comment: naming convention: dictionary contains 'word' not
// 'each'
for (String each : dictionary) {
char[] sortedWord = each.toCharArray();
// sort dic value
Arrays.sort(sortedWord);
//input word
String sortedString = new String(sortedWord);
//
ArrayList<String> list = new ArrayList<String>();
if (map.keySet().contains(sortedString)) {
list = map.get(sortedString);
}
list.add(each);
map.put(sortedString, list);
}
// print anagram
int i = 1;
for (String each : map.keySet()) {
if (map.get(each).size() != 1) {
System.out.println("Anagrams " + i + ":" + map.get(each));
i++;
}
}
}
}
I just had an interview and 'SolutionA' was basically my solution.
Seems to hold.
It might also work to sum all characters, or the hashCodes of each character, but it would still be at least O(n).
/**
* Using HashMap
*
* O(a + b + b + b) = O(a + 3*b) = O( 4n ) if a and b are equal. Meaning O(n) in total.
*/
public static final class SolutionA {
//
private static boolean isAnagram(String a, String b) {
if ( a.length() != b.length() ) return false;
HashMap<Character, Integer> aa = toHistogram(a);
HashMap<Character, Integer> bb = toHistogram(b);
return isHistogramsEqual(aa, bb);
}
private static HashMap<Character, Integer> toHistogram(String characters) {
HashMap<Character, Integer> histogram = new HashMap<>();
int i = -1; while ( ++i < characters.length() ) {
histogram.compute(characters.charAt(i), (k, v) -> {
if ( v == null ) v = 0;
return v+1;
});
}
return histogram;
}
private static boolean isHistogramsEqual(HashMap<Character, Integer> a, HashMap<Character, Integer> b) {
for ( Map.Entry<Character, Integer> entry : b.entrySet() ) {
Integer aa = a.get(entry.getKey());
Integer bb = entry.getValue();
if ( !Objects.equals(aa, bb) ) {
return false;
}
}
return true;
}
public static void main(String[] args) {
System.out.println(isAnagram("abc", "cba"));
System.out.println(isAnagram("abc", "cbaa"));
System.out.println(isAnagram("abcc", "cba"));
System.out.println(isAnagram("abcd", "cba"));
System.out.println(isAnagram("twelve plus one", "eleven plus two"));
}
}
I've provided a hashCode() based implementation as well. Seems to hold as well.
/**
* Using hashCode()
*
* O(a + b) minimum + character.hashCode() calculation, the latter might be cheap though. Native implementation.
*
* Risk for collision albeit small.
*/
public static final class SolutionB {
public static void main(String[] args) {
System.out.println(isAnagram("abc", "cba"));
System.out.println(isAnagram("abc", "cbaa"));
System.out.println(isAnagram("abcc", "cba"));
System.out.println(isAnagram("abcd", "cba"));
System.out.println(isAnagram("twelve plus one", "eleven plus two"));
}
private static boolean isAnagram(String a, String b) {
if ( a.length() != b.length() ) return false;
return toHashcode(a) == toHashcode(b);
}
private static long toHashcode(String str) {
long sum = 0; int i = -1; while ( ++i < str.length() ) {
sum += Objects.hashCode( str.charAt(i) );
}
return sum;
}
}
in java we can also do it like this and its very simple logic
import java.util.*;
class Anagram
{
public static void main(String args[]) throws Exception
{
Boolean FLAG=true;
Scanner sc= new Scanner(System.in);
System.out.println("Enter 1st string");
String s1=sc.nextLine();
System.out.println("Enter 2nd string");
String s2=sc.nextLine();
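int i,j;
i=s1.length();
j=s2.length();
if(i==j)
{
boolean[] used=new boolean[j]; // marks characters of s2 that are already matched
for(int k=0;k<i&&FLAG;k++)
{
FLAG=false; // stays false if s1.charAt(k) has no unmatched partner in s2
for(int l=0;l<j;l++)
{
if(!used[l]&&s1.charAt(k)==s2.charAt(l))
{
used[l]=true;
FLAG=true;
break;
}
}
}
}
else
FLAG=false;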
if(FLAG)
System.out.println("Given Strings are anagrams");
else
System.out.println("Given Strings are not anagrams");
}
}
How about converting each character to its int value and summing them up?
If the sums are equal, the strings may be anagrams of each other; if they differ, they are definitely not.
def are_anagram1(s1, s2):
    return sum(ord(x) for x in s1) == sum(ord(x) for x in s2)
s1 = 'james'
s2 = 'amesj'
print(are_anagram1(s1, s2))
Note that equal sums are only a necessary condition: 'ad' and 'bc' give the same sum without being anagrams, so this check can report false positives.

Resources