Sorting two Dictionaries and finding the differences in index position of items in the sorted list - algorithm

I have a challenge in front of me. Let me present the challenge which is perplexing me -
There are two dictionaries say - D1 and D2.
These dictionaries have same keys most of the time but there is no guarantee that it will be always the same.
The two Dictionaries could be represented as follows -
D1 = {["R1", 0.7], ["R2",0.73], ["R3", 1.5], ["R4", 2.5], ["R5", 0.12], ["R6", 1.9], ["R7", 9.8], ["R8", 6.5], ["R9", 7.2], ["R10", 5.6]};
D2 = {["R1", 0.7], ["R2",0.8], ["R3", 1.5], ["R4", 3.1], ["R5", 0.10], ["R6", 2.0], ["R7", 8.0], ["R8", 1.0], ["R9", 0.0], ["R10", 5.6], ["R11", 6.23]};
Here in these dictionaries, the keys are of string data type and value are of float data type.
Physically they are snapshot of a system in two different times. D1 being older than D2.
I need to sort these dictionaries independently based on the values in ascending orders. Which when done changes these dictionaries to -
D1 = {["R5", 0.12], ["R1", 0.7], ["R2",0.73], ["R3", 1.5], ["R6", 1.9], ["R4", 2.5], ["R10", 5.6], ["R8", 6.5], ["R9", 7.2], ["R7", 9.8]};
and
D2 = {["R9", 0.0], ["R5", 0.10], ["R1", 0.7], ["R2",0.8], ["R8", 1.0], ["R3", 1.5], ["R6", 2.0], ["R4", 3.1], ["R10", 5.6], ["R11", 6.23], ["R7", 8.0]};
Here the sorting of elements in the dictionary D1 is taken as reference point. Each element of the D1 is connected with the immediate next one in D1. It is expected to identify the elements in D2 which have broken the sequence as it appears in the reference dictionary D1 after sorting. While determining this addition of elements (i.e. the key not being present in D1 but is present in D2) to D2 and removal of elements (i.e. the key is present in D1 but not in D2) from D1 are ignored. i.e They should not highlighted in the result.
For example, in continuing with the example listed above, the elements which break a sequence in D2 with reference to D1 (ignoring addition and removal) are -
Breakers = {["R9", 0.0],["R8", 1.0]} since, R9 has jumped the sequence from 8th index in D1 sorted dictionary to 0th index in D2 sorted dictionary. Similarly R8 has jumped the sequence from 7th index in D1 sorted dictionary to 4th index in D2 sorted dictionary (all indexes are started from 0).
Note - ["R11", 6.23] is not expected to be in the list of Breakers since it is addition to D2.
Please suggest an algorithm to achieve this optimally, since this operation needs to be performed on data fetched from a database with 3,256,190 records.
Programming language is not a worry, if guided with logic I could take up the task of implementing it in any language.

I came up with this algorithm in C#. It works perfect for you example data. I also did a test with 3000000 totally random values (so a lot of breakers are detected) and it completes in 3.2seconds on my notebook (Intel Core i3 2.1GHz, 64bit).
I first put your data into temporary dictionaries, so I could copy-paste your values, before I put them into the Lists. Of course your application will put them directly in the lists.
class Program
{
struct SingleValue
{
public string Key;
public float Value;
public SingleValue(string key, float value)
{
Key = key;
Value = value;
}
public override string ToString()
{
return string.Format("{0}={1}", Key, Value);
}
}
static void Main(string[] args)
{
List<SingleValue> D1 = new List<SingleValue>();
HashSet<string> D1keys = new HashSet<string>();
List<SingleValue> D2 = new List<SingleValue>();
#if !LARGETEST
Dictionary<string, double> D1input = new Dictionary<string, double>() { { "R1", 0.7 }, { "R2", 0.73 }, { "R3", 1.5 }, { "R4", 2.5 }, { "R5", 0.12 }, { "R6", 1.9 }, { "R7", 9.8 }, { "R8", 6.5 }, { "R9", 7.2 }, { "R10", 5.6 } };
Dictionary<string, double> D2input = new Dictionary<string, double>() { { "R1", 0.7 }, { "R2", 0.8 }, { "R3", 1.5 }, { "R4", 3.1 }, { "R5", 0.10 }, { "R6", 2.0 }, { "R7", 8.0 }, { "R8", 1.0 }, { "R9", 0.0 }, { "R10", 5.6 }, { "R11", 6.23 } };
// You should directly put you values into this list... I converted them from a Dictionary so I didn't have to type over your input values :)
foreach (KeyValuePair<string, double> kvp in D1input)
{
D1.Add(new SingleValue(kvp.Key, (float)kvp.Value));
D1keys.Add(kvp.Key);
}
foreach (KeyValuePair<string, double> kvp in D2input)
D2.Add(new SingleValue(kvp.Key, (float)kvp.Value));
#else
Random ran = new Random();
for (int i = 0; i < 3000000; i++)
{
D1.Add(new SingleValue(i.ToString(), (float)ran.NextDouble()));
D1keys.Add(i.ToString());
D2.Add(new SingleValue(i.ToString(), (float)ran.NextDouble()));
}
#endif
// Sort the lists
D1.Sort(delegate(SingleValue x, SingleValue y)
{
if (y.Value > x.Value)
return -1;
else if (y.Value < x.Value)
return 1;
return 0;
});
D2.Sort(delegate(SingleValue x, SingleValue y)
{
if (y.Value > x.Value)
return -1;
else if (y.Value < x.Value)
return 1;
return 0;
});
int start = Environment.TickCount;
Dictionary<string, float> breakers = new Dictionary<string, float>();
List<SingleValue> additions = new List<SingleValue>();
// Walk through D1
IEnumerator<SingleValue> i1 = D1.GetEnumerator();
IEnumerator<SingleValue> i2 = D2.GetEnumerator();
while (i1.MoveNext() && i2.MoveNext())
{
while (breakers.ContainsKey(i1.Current.Key))
{
if (!i1.MoveNext())
break;
}
while (i1.Current.Key != i2.Current.Key)
{
if (D1keys.Contains(i2.Current.Key))
breakers.Add(i2.Current.Key, i2.Current.Value);
else
additions.Add(i2.Current);
if (!i2.MoveNext())
break;
}
}
int duration = Environment.TickCount - start;
Console.WriteLine("Lookup took {0}ms", duration);
Console.ReadKey();
}
}

If you could delete stuff from D2 before sorting it would be easy, right? You say you can't delete the data. However, you can make an additional data structure that simulates such deletions (e.g., add a "deleted" bit to the item, or if you can't change its type then make a set of "deleted" items). Then run the simple algorithm, but make sure to ignore items that are "deleted".

I have been thinking about this. As you mentioned Levenshtein distance, I am assuming you want to get those elements by moving whom from their position in D2 to some other position in D2 you will get D1 from D2 in the least number of moves (ignoring elements which don't exist in both sequences).
I wrote a greedy algorithm which may be sufficient for your needs, but it may not necessarily give the optimal result in all cases. I am honestly not sure and may come back to it later (weekend at the earliest) to check the correctness. However, if you really need to do this on sequences of 3 million elements, I believe that no algorithm which does any kind of good job at this will be fast enough, because I can't see an O(n) algorithm which does a good job and can't fail even on some trivial inputs.
This algorithm tries moving each element to its intended position and calculates the sum of errors (how far each element is from its original position) after the move. The element which results in the lowest sum of errors is proclaimed a breaker and moved. This is repeated until the sequence reverts back to D1.
I think it has O(n^3) complexity, although elements sometimes need to be moved multiple times so it could possibly be O(n^4) worst case, I'm not sure, but in 1 million random examples with 50 elements the max number of outer loop runs was 51 (n^4 would mean it can be 2500 and somehow I was lucky in all my million tests). There are just keys, no values. This is because the values are irrelevant in this step so there is no point in storing them.
edit: I wrote a counterexample generator for this and indeed it is not always optimal. The more breakers there are, the less of a chance of an optimal solution. For example, in 1000 elements with 50 randomly moved ones it will usually find a set of 55-60 breakers, when the optimal solution is at most 50.
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
namespace Breakers
{
class Program
{
static void Main(string[] args)
{
//test case 1
//List<string> L1 = new List<string> { "R5", "R1", "R2", "R3", "R6", "R4", "R10", "R8", "R9", "R7" };
//List<string> L2 = new List<string> { "R9", "R5", "R1", "R2", "R8", "R3", "R6", "R4", "R10", "R11", "R7" };
//GetBreakers<string>(L1, L2);
//test case 2
//List<string> L1 = new List<string> { "R5", "R1", "R2", "R3", "R6", "R4", "R10", "R8", "R9", "R7" };
//List<string> L2 = new List<string> { "R5", "R9", "R1", "R6", "R2", "R3", "R4", "R10", "R8", "R7" };
//GetBreakers<string>(L1, L2);
//test case 3
List<int> L1 = new List<int>();
List<int> L2 = new List<int>();
Random r = new Random();
int n = 100;
for (int i = 0; i < n; i++)
{
L1.Add(i);
L2.Add(i);
}
for (int i = 0; i < 5; i++) // number of random moves, this is the upper bound of the optimal solution
{
int a = r.Next() % n;
int b = r.Next() % n;
if (a == b)
{
i--;
continue;
}
int x = L2[a];
Console.WriteLine(x);
L2.RemoveAt(a);
L2.Insert(b, x);
}
for (int i = 0; i < L2.Count; i++) Console.Write(L2[i]);
Console.WriteLine();
GetBreakers<int>(L1, L2);
}
static void GetBreakers<T>(List<T> L1, List<T> L2)
{
Dictionary<T, int> Appearances = new Dictionary<T, int>();
for (int i = 0; i < L1.Count; i++) Appearances[L1[i]] = 1;
for (int i = 0; i < L2.Count; i++) if (Appearances.ContainsKey(L2[i])) Appearances[L2[i]] = 2;
for (int i = L1.Count - 1; i >= 0; i--) if (!(Appearances.ContainsKey(L1[i]) && Appearances[L1[i]] == 2)) L1.RemoveAt(i);
for (int i = L2.Count - 1; i >= 0; i--) if (!(Appearances.ContainsKey(L2[i]) && Appearances[L2[i]] == 2)) L2.RemoveAt(i);
Dictionary<T, int> IndInL1 = new Dictionary<T, int>();
for (int i = 0; i < L1.Count; i++) IndInL1[L1[i]] = i;
Dictionary<T, int> Breakers = new Dictionary<T, int>();
int steps = 0;
int me = 0;
while (true)
{
steps++;
int minError = int.MaxValue;
int minErrorIndex = -1;
for (int from = 0; from < L2.Count; from++)
{
T x = L2[from];
int to = IndInL1[x];
if (from == to) continue;
L2.RemoveAt(from);
L2.Insert(to, x);
int error = 0;
for (int i = 0; i < L2.Count; i++)
error += Math.Abs((i - IndInL1[L2[i]]));
L2.RemoveAt(to);
L2.Insert(from, x);
if (error < minError)
{
minError = error;
minErrorIndex = from;
}
}
if (minErrorIndex == -1) break;
T breaker = L2[minErrorIndex];
int breakerOriginalPosition = IndInL1[breaker];
L2.RemoveAt(minErrorIndex);
L2.Insert(breakerOriginalPosition, breaker);
Breakers[breaker] = 1;
me = minError;
}
Console.WriteLine("Breakers: " + Breakers.Count + " Steps: " + steps);
foreach (KeyValuePair<T, int> p in Breakers)
Console.WriteLine(p.Key);
Console.ReadLine();
}
}
}

Related

As3 Position on top based on Y without lag?

I was creating a position script with my own made tag system to position all the objects in order of the y position they have. This script makes my game lag so I wanted to ask if there is a better way of doing this.
PS: This code is used every frame.
private function positionGameObjectToLayer():void
{
var objectOnScreen : Array = [];
var parentObj : Sprite;
var l : int;
l = gameObjects.length;
for (var i : int = 0; i < l; i++) {
/*checks if the object has the position tag*/
if (gameObjects[i].checkTag(Tags.POSITION_ON_Y_TAG)) {
objectOnScreen.push(gameObjects[i]); //if it does it goes into this array
}
}
objectOnScreen.sortOn("y", Array.NUMERIC); /* sorts the array on y position*/
l = objectOnScreen.length;
for (i = 0; i < l; i++) {
/*this sets the layer of the object in order of the array*/
parentObj = objectOnScreen[i].parent;
parentObj.setChildIndex(objectOnScreen[i],parentObj.numChildren - 1);
}
l = gameObjects.length;
for (i = 0; i < l; i++) {
//if it has the always on top tag
if (gameObjects[i].checkTag(Tags.POSITION_ON_TOP_TAG)) {
/*then this code will grab that object and place it over the other layers*/
parentObj = gameObjects[i].parent;
parentObj.setChildIndex(gameObjects[i], parentObj.numChildren - 1);
}
}
}

Algorithm that discovers all the fields on a map with as least turns as possible

Let's say I have such map:
#####
..###
W.###
. is a discovered cell.
# is an undiscovered cell.
W is a worker. There can be many workers. Each of them can move once per turn. In one turn he can move by one cell in 4 directions (up, right, down or left). He discovers all 8 cells around him - turns # into .. In one turn, there can be maximum one worker on the same cell.
Maps are not always rectangular. In the beginning all cells are undiscovered, except the neighbours of W.
The goal is to make all the cells discovered, in as least turns as possible.
First approach
Find the nearest # and go towards it. Repeat.
To find the nearest # I start BFS from W and finish it when first # is found.
On exemplary map it can give such solution:
##### ##### ##### ##### ##... #.... .....
..### ...## ....# ..... ...W. ..W.. .W...
W.### .W.## ..W.# ...W. ..... ..... .....
6 turns. Pretty far from optimal:
##### ..### ...## ....# .....
..### W.### .W.## ..W.# ...W.
W.### ..### ...## ....# .....
4 turns.
Question
What is the algorithm that discovers all the cells with as least turns as possible?
Here is a basic idea that uses A*. It is probably quite time- and memory-consuming, but it is guaranteed to return an optimal solution and is definitely better than brute force.
The nodes for A* will be the various states, i.e. where the workers are positioned and the discovery state of all cells. Each unique state represents a different node.
Edges will be all possible transitions. One worker has four possible transitions. For more workers, you will need every possible combination (about 4^n edges). This is the part where you can constrain the workers to remain within the grid and not to overlap.
The cost will be the number of turns. The heuristic to approximate the distance to the goal (all cells discovered) can be developed as follows:
A single worker can discover at most three cells per turn. Thus, n workers can discover at most 3*n cells. The minimum number of remaining turns is therefore "number of undiscovered cells / (3 * worker count)". This is the heuristic to use. This could even be improved by determining the maximum number of cells that each worker can discover in the next turn (will be max. 3 per worker). So overall heuristic would be "(undiscorvered cells - discoverable cells) / (3 * workers) + 1".
In each step you examine the node with the least overall cost (turns so far + heuristic). For the examined node, you calculate the costs for each surrounding node (possible movements of all workers) and go on.
Strictly speaking, the main part of this answer may be considered as "Not An Answer". So to first cover the actual question:
What is the algorithm that discovers all the cells with as least turns as possible?
Answer: In each step, you can compute all possible successors of the current state. Then the successors of these successors. This can be repeated recursively, until one of the successors contains no more #-fields. The sequence of states through which this successor was reached is optimal regarding the number of moves that have been necessary to reach this state.
So far, this is trivial. But of course, this is not feasible for a "large" map and/or a "large" number of workers.
As mentioned in the comments: I think that finding the optimal solution may be an NP-complete problem. In any case, it's most likely at least a tremendously complicated optimization problem where you may employ some rather sophisticated techniques to find the optimal solution in optimal time.
So, IMHO, the only feasible approach for tackling this are heuristics.
Several approaches can be imagined here. However, I wanted to give it a try, with a very simple approach. The following MCVE accepts the definition of the map as a rectangular string (empty spaces represent "invalid" regions, so it's possible to represent non-rectangular maps with that). The workers are simply enumerated, from 0 to 9 (limited to this number, at the moment). The string is converted into a MapState that consists of the actual map, as well as the paths that the workers have gone through until then.
The actual search here is a "greedy" version of the exhaustive search that I described in the first paragraph: Given an initial state, it computes all successor states. These are the states where each worker has moved in either direction (e.g. 64 states for 3 workers - of course these are "filtered" to make sure that workers don't leave the map or move to the same field).
These successor states are stored in a list. Then it searches the list for the "best" state, and again computes all successors of this "best" state and stores them in the list. Sooner or later, the list contains a state where no fields are missing.
The definition of the "best" state is where the heuristics come into play: A state is "better" than another when there are fewer fields missing (unvisited). When two states have an equal number of missing fields, then the average distance of the workers to the next unvisited fields serves as the criterion to decide which one is "better".
This finds and a solution for the example that is contained in the code below rather quickly, and prints it as the lists of positions that each worker has to visit in each turn.
Of course, this will also not be applicable to "really large" maps or "many" workers, because the list of states will grow rather quickly (one could consider dropping the "worst" solutions to speed this up a little, but this may have caveats, like being stuck in local optima). Additionally, one can easily think of cases where the "greedy" strategy does not give optimal results. But until someone posts an MVCE that always computes the optimal solution in polynomial time, maybe someone finds this interesting or helpful.
import java.awt.Point;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
public class MapExplorerTest
{
public static void main(String[] args)
{
String mapString =
" ### ######"+"\n"+
" ### ###1##"+"\n"+
"###############"+"\n"+
"#0#############"+"\n"+
"###############"+"\n"+
"###############"+"\n"+
"###############"+"\n"+
"###############"+"\n"+
"###############"+"\n"+
"###############"+"\n"+
"##### #######"+"\n"+
"##### #######"+"\n"+
"##### #######"+"\n"+
"###############"+"\n"+
"###############"+"\n"+
"###############"+"\n"+
"### ######2##"+"\n"+
"### #########"+"\n";
MapExplorer m = new MapExplorer(mapString);
MapState solution = m.computeSolutionGreedy();
System.out.println(solution.createString());
}
}
class MapState
{
private int rows;
private int cols;
private char map[][];
List<List<Point>> workerPaths;
private int missingFields = -1;
MapState(String mapString)
{
workerPaths = new ArrayList<List<Point>>();
rows = countLines(mapString);
cols = mapString.indexOf("\n");
map = new char[rows][cols];
String s = mapString.replaceAll("\\n", "");
for (int r=0; r<rows; r++)
{
for (int c=0; c<cols; c++)
{
int i = c+r*cols;
char ch = s.charAt(i);
map[r][c] = ch;
if (Character.isDigit(ch))
{
int workerIndex = ch - '0';
while (workerPaths.size() <= workerIndex)
{
workerPaths.add(new ArrayList<Point>());
}
Point p = new Point(r, c);
workerPaths.get(workerIndex).add(p);
}
}
}
}
MapState(MapState other)
{
this.rows = other.rows;
this.cols = other.cols;
this.map = new char[other.map.length][];
for (int i=0; i<other.map.length; i++)
{
this.map[i] = other.map[i].clone();
}
this.workerPaths = new ArrayList<List<Point>>();
for (List<Point> otherWorkerPath : other.workerPaths)
{
this.workerPaths.add(MapExplorer.copy(otherWorkerPath));
}
}
int distanceToMissing(Point p0)
{
if (getMissingFields() == 0)
{
return -1;
}
List<Point> points = new ArrayList<Point>();
Map<Point, Integer> distances = new HashMap<Point, Integer>();
distances.put(p0, 0);
points.add(p0);
while (!points.isEmpty())
{
Point p = points.remove(0);
List<Point> successors = MapExplorer.computeSuccessors(p);
for (Point s : successors)
{
if (!isValid(p))
{
continue;
}
if (map[p.x][p.y] == '#')
{
return distances.get(p)+1;
}
if (!distances.containsKey(s))
{
distances.put(s, distances.get(p)+1);
points.add(s);
}
}
}
return -1;
}
double averageDistanceToMissing()
{
double d = 0;
for (List<Point> workerPath : workerPaths)
{
Point p = workerPath.get(workerPath.size()-1);
d += distanceToMissing(p);
}
return d / workerPaths.size();
}
int getMissingFields()
{
if (missingFields == -1)
{
missingFields = countMissingFields();
}
return missingFields;
}
private int countMissingFields()
{
int count = 0;
for (int r=0; r<rows; r++)
{
for (int c=0; c<cols; c++)
{
if (map[r][c] == '#')
{
count++;
}
}
}
return count;
}
void update()
{
for (List<Point> workerPath : workerPaths)
{
Point p = workerPath.get(workerPath.size()-1);
for (int dr=-1; dr<=1; dr++)
{
for (int dc=-1; dc<=1; dc++)
{
if (dr == 0 && dc == 0)
{
continue;
}
int nr = p.x + dr;
int nc = p.y + dc;
if (!isValid(nr, nc))
{
continue;
}
if (map[nr][nc] != '#')
{
continue;
}
map[nr][nc] = '.';
}
}
}
}
public void updateWorkerPosition(int w, Point p)
{
List<Point> workerPath = workerPaths.get(w);
Point old = workerPath.get(workerPath.size()-1);
char oc = map[old.x][old.y];
char nc = map[p.x][p.y];
map[old.x][old.y] = nc;
map[p.x][p.y] = oc;
}
boolean isValid(int r, int c)
{
if (r < 0) return false;
if (r >= rows) return false;
if (c < 0) return false;
if (c >= cols) return false;
if (map[r][c] == ' ')
{
return false;
}
return true;
}
boolean isValid(Point p)
{
return isValid(p.x, p.y);
}
private static int countLines(String s)
{
int count = 0;
while (s.contains("\n"))
{
s = s.replaceFirst("\\\n", "");
count++;
}
return count;
}
public String createMapString()
{
StringBuilder sb = new StringBuilder();
for (int r=0; r<rows; r++)
{
for (int c=0; c<cols; c++)
{
sb.append(map[r][c]);
}
sb.append("\n");
}
return sb.toString();
}
public String createString()
{
StringBuilder sb = new StringBuilder();
for (List<Point> workerPath : workerPaths)
{
Point p = workerPath.get(workerPath.size()-1);
int d = distanceToMissing(p);
sb.append(workerPath).append(", distance: "+d+"\n");
}
sb.append(createMapString());
sb.append("Missing "+getMissingFields());
return sb.toString();
}
}
class MapExplorer
{
MapState mapState;
public MapExplorer(String mapString)
{
mapState = new MapState(mapString);
mapState.update();
computeSuccessors(mapState);
}
static List<Point> copy(List<Point> list)
{
List<Point> result = new ArrayList<Point>();
for (Point p : list)
{
result.add(new Point(p));
}
return result;
}
public MapState computeSolutionGreedy()
{
Comparator<MapState> comparator = new Comparator<MapState>()
{
#Override
public int compare(MapState ms0, MapState ms1)
{
int m0 = ms0.getMissingFields();
int m1 = ms1.getMissingFields();
if (m0 != m1)
{
return m0-m1;
}
double d0 = ms0.averageDistanceToMissing();
double d1 = ms1.averageDistanceToMissing();
return Double.compare(d0, d1);
}
};
Set<MapState> handled = new HashSet<MapState>();
List<MapState> list = new ArrayList<MapState>();
list.add(mapState);
while (true)
{
MapState best = list.get(0);
for (MapState mapState : list)
{
if (!handled.contains(mapState))
{
if (comparator.compare(mapState, best) < 0)
{
best = mapState;
}
}
}
if (best.getMissingFields() == 0)
{
return best;
}
handled.add(best);
list.addAll(computeSuccessors(best));
System.out.println("List size "+list.size()+", handled "+handled.size()+", best\n"+best.createString());
}
}
List<MapState> computeSuccessors(MapState mapState)
{
int numWorkers = mapState.workerPaths.size();
List<Point> oldWorkerPositions = new ArrayList<Point>();
for (int i=0; i<numWorkers; i++)
{
List<Point> workerPath = mapState.workerPaths.get(i);
Point p = workerPath.get(workerPath.size()-1);
oldWorkerPositions.add(p);
}
List<List<Point>> successorPositionsForWorkers = new ArrayList<List<Point>>();
for (int w=0; w<oldWorkerPositions.size(); w++)
{
Point p = oldWorkerPositions.get(w);
List<Point> ps = computeSuccessors(p);
successorPositionsForWorkers.add(ps);
}
List<List<Point>> newWorkerPositionsList = new ArrayList<List<Point>>();
int numSuccessors = (int)Math.pow(4, numWorkers);
for (int i=0; i<numSuccessors; i++)
{
String s = Integer.toString(i, 4);
while (s.length() < numWorkers)
{
s = "0"+s;
}
List<Point> newWorkerPositions = copy(oldWorkerPositions);
for (int w=0; w<numWorkers; w++)
{
int index = s.charAt(w) - '0';
Point newPosition = successorPositionsForWorkers.get(w).get(index);
newWorkerPositions.set(w, newPosition);
}
newWorkerPositionsList.add(newWorkerPositions);
}
List<MapState> successors = new ArrayList<MapState>();
for (int i=0; i<newWorkerPositionsList.size(); i++)
{
List<Point> newWorkerPositions = newWorkerPositionsList.get(i);
if (workerPositionsValid(newWorkerPositions))
{
MapState successor = new MapState(mapState);
for (int w=0; w<numWorkers; w++)
{
Point p = newWorkerPositions.get(w);
successor.updateWorkerPosition(w, p);
successor.workerPaths.get(w).add(p);
}
successor.update();
successors.add(successor);
}
}
return successors;
}
private boolean workerPositionsValid(List<Point> workerPositions)
{
Set<Point> set = new HashSet<Point>();
for (Point p : workerPositions)
{
if (!mapState.isValid(p.x, p.y))
{
return false;
}
set.add(p);
}
return set.size() == workerPositions.size();
}
static List<Point> computeSuccessors(Point p)
{
List<Point> result = new ArrayList<Point>();
result.add(new Point(p.x+0, p.y+1));
result.add(new Point(p.x+0, p.y-1));
result.add(new Point(p.x+1, p.y+0));
result.add(new Point(p.x-1, p.y+0));
return result;
}
}

Algorithms find shortest path to all cells on grid

I have a grid [40 x 15] with 2 to 16 units on it, and unknown amount of obstacles.
How to find the shortest path to all the units from my unit location.
I have two helper methods that we can consider as O(1)
getMyLocation() - return the (x, y) coordinates of my location on the grid
investigateCell(x, y) - return information about cell at (x,y) coordinates
I implemented A* search algorithm, that search simultaneously to all the directions. At the end it output a grid where each cell have a number representing the distance from my location, and collection of all the units on the grid. It performs with O(N) where N is the number of cells - 600 in my case.
I implement this using AS3, unfortunately it takes my machine 30 - 50 milliseconds to calculate.
Here is my source code. Can you suggest me a better way?
package com.gazman.strategy_of_battle_package.map
{
import flash.geom.Point;
/**
* Implementing a path finding algorithm(Similar to A* search only there is no known target) to calculate the shortest path to each cell on the map.
* Once calculation is complete the information will be available at cellsMap. Each cell is a number representing the
* number of steps required to get to that location. Enemies and Allies will be represented with negative distance. Also the enemy and Allys
* coordinations collections are provided. Blocked cells will have the value 0.<br><br>
* Worth case and best case efficiency is O(N) where N is the number of cells.
*/
public class MapFilter
{
private static const PULL:Vector.<MapFilter> = new Vector.<MapFilter>();
public var cellsMap:Vector.<Vector.<int>>;
public var allys:Vector.<Point>;
public var enemies:Vector.<Point>;
private var stack:Vector.<MapFilter>;
private var map:Map;
private var x:int;
private var y:int;
private var count:int;
private var commander:String;
private var hash:Object;
private var filtered:Boolean;
public function filter(map:Map, myLocation:Point, commander:String):void{
filtered = true;
this.commander = commander;
this.map = map;
this.x = myLocation.x;
this.y = myLocation.y;
init();
cellsMap[x][y] = 1;
excecute();
while(stack.length > 0){
var length:int = stack.length;
for(var i:int = 0; i < length; i++){
var mapFilter:MapFilter = stack.shift();
mapFilter.excecute();
PULL.push(mapFilter);
}
}
}
public function navigateTo(location:Point):Point{
if(!filtered){
throw new Error("Must filter before navigating");
}
var position:int = Math.abs(cellsMap[location.x][location.y]);
if(position == 0){
throw new Error("Target unreachable");
}
while(position > 2){
if(canNavigateTo(position, location.x + 1, location.y)){
location.x++;
}
else if(canNavigateTo(position, location.x - 1, location.y)){
location.x--;
}
else if(canNavigateTo(position, location.x, location.y + 1)){
location.y++;
}
else if(canNavigateTo(position, location.x, location.y - 1)){
location.y--;
}
position = cellsMap[location.x][location.y];
}
return location;
throw new Error("Unexpected filtering error");
}
private function canNavigateTo(position:int, targetX:int, targetY:int):Boolean
{
return isInMapRange(targetX, targetY) && cellsMap[targetX][targetY] < position && cellsMap[targetX][targetY] > 0;
}
private function excecute():void
{
papulate(x + 1, y);
papulate(x - 1, y);
papulate(x, y + 1);
papulate(x, y - 1);
}
private function isInMapRange(x:int, y:int):Boolean{
return x < cellsMap.length &&
x >= 0 &&
y < cellsMap[0].length &&
y >= 0;
}
private function papulate(x:int, y:int):void
{
if(!isInMapRange(x,y) ||
cellsMap[x][y] != 0 ||
hash[x + "," + y] != null ||
map.isBlocked(x,y)){
return;
}
// we already checked that is not block
// checking if there units
if(map.isEmpty(x,y)){
cellsMap[x][y] = count;
addTask(x,y);
}
else{
cellsMap[x][y] = -count;
if(map.isAlly(x,y, commander)){
hash[x + "," + y] = true;
allys.push(new Point(x,y));
}
else {
hash[x + "," + y] = true;
enemies.push(new Point(x,y));
}
}
}
private function addTask(x:int, y:int):void
{
var mapFilter:MapFilter = PULL.pop();
if(mapFilter == null){
mapFilter = new MapFilter();
}
mapFilter.commander = commander;
mapFilter.hash = hash;
mapFilter.map = map;
mapFilter.cellsMap = cellsMap;
mapFilter.allys = allys;
mapFilter..enemies = enemies;
mapFilter.stack = stack;
mapFilter.count = count + 1;
mapFilter.x = x;
mapFilter.y = y;
stack.push(mapFilter);
}
private function init():void
{
hash = new Object();
cellsMap = new Vector.<Vector.<int>>();
for(var i:int = 0; i < map.width;i++){
cellsMap.push(new Vector.<int>);
for(var j:int = 0; j < map.height;j++){
cellsMap[i].push(0);
}
}
allys = new Vector.<Point>();
enemies = new Vector.<Point>();
stack = new Vector.<MapFilter>();
count = 2;
}
}
}
You can use Floyd Warshall to find the shortest path between every pair of points. This would be O(|V|^3) and you would not have to run it for each unit, just once on each turn. It's such a simple algorithm I suspect it might be faster in practice than running something like BFS / Bellman Ford for each unit.

Find anagram of input on set of strings..?

Given a set of strings (large set), and an input string, you need to find all the anagrams of the input string efficiently. What data structure will you use. And using that, how will you find the anagrams?
Things that I have thought of are these:
Using maps
a) eliminate all words with more/less letters than the input.
b) put the input characters in map
c) Traverse the map for each string and see if all letters are present with their count.
Using Tries
a) Put all strings which have the right number of characters into a trie.
b) traverse each branch and go deeper if the letter is contained in the input.
c) if leaf reached the word is an anagram
Can anyone find a better solution?
Are there any problems that you find in the above approaches?
Build a frequency-map from each word and compare these maps.
Pseudo code:
class Word
string word
map<char, int> frequency
Word(string w)
word = w
for char in word
int count = frequency.get(char)
if count == null
count = 0
count++
frequency.put(char, count)
boolean is_anagram_of(that)
return this.frequency == that.frequency
You could build an hashmap where the key is sorted(word), and the value is a list of all the words that, sorted, give the corresponding key:
private Map<String, List<String>> anagrams = new HashMap<String, List<String>>();
void buildIndex(){
for(String word : words){
String sortedWord = sortWord(word);
if(!anagrams.containsKey(sortedWord)){
anagrams.put(sortedWord, new ArrayList<String>());
}
anagrams.get(sortedWord).add(word);
}
}
Then you just do a lookup for the sorted word in the hashmap you just built, and you'll have the list of all the anagrams.
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
/*
*Program for Find Anagrams from Given A string of Arrays.
*
*Program's Maximum Time Complexity is O(n) + O(klogk), here k is the length of word.
*
* By removal of Sorting, Program's Complexity is O(n)
* **/
public class FindAnagramsOptimized {
public static void main(String[] args) {
String[] words = { "gOd", "doG", "doll", "llod", "lold", "life",
"sandesh", "101", "011", "110" };
System.out.println(getAnaGram(words));
}
// Space Complexity O(n)
// Time Complexity O(nLogn)
static Set<String> getAnaGram(String[] allWords) {
// Internal Data Structure for Keeping the Values
class OriginalOccurence {
int occurence;
int index;
}
Map<String, OriginalOccurence> mapOfOccurence = new HashMap<>();
int count = 0;
// Loop Time Complexity is O(n)
// Space Complexity O(K+2K), here K is unique words after sorting on a
for (String word : allWords) {
String key = sortedWord(word);
if (key == null) {
continue;
}
if (!mapOfOccurence.containsKey(key)) {
OriginalOccurence original = new OriginalOccurence();
original.index = count;
original.occurence = 1;
mapOfOccurence.put(key, original);
} else {
OriginalOccurence tempVar = mapOfOccurence.get(key);
tempVar.occurence += 1;
mapOfOccurence.put(key, tempVar);
}
count++;
}
Set<String> finalAnagrams = new HashSet<>();
// Loop works in O(K), here K is unique words after sorting on
// characters
for (Map.Entry<String, OriginalOccurence> anaGramedWordList : mapOfOccurence.entrySet()) {
if (anaGramedWordList.getValue().occurence > 1) {
finalAnagrams.add(allWords[anaGramedWordList.getValue().index]);
}
}
return finalAnagrams;
}
// Array Sort works in O(nLogn)
// Customized Sorting for only chracter's works in O(n) time.
private static String sortedWord(String word) {
// int[] asciiArray = new int[word.length()];
int[] asciiArrayOf26 = new int[26];
// char[] lowerCaseCharacterArray = new char[word.length()];
// int characterSequence = 0;
// Ignore Case Logic written in lower level
for (char character : word.toCharArray()) {
if (character >= 97 && character <= 122) {
// asciiArray[characterSequence] = character;
if (asciiArrayOf26[character - 97] != 0) {
asciiArrayOf26[character - 97] += 1;
} else {
asciiArrayOf26[character - 97] = 1;
}
} else if (character >= 65 && character <= 90) {
// asciiArray[characterSequence] = character + 32;
if (asciiArrayOf26[character + 32 - 97] != 0) {
asciiArrayOf26[character + 32 - 97] += 1;
} else {
asciiArrayOf26[character + 32 - 97] = 1;
}
} else {
return null;
}
// lowerCaseCharacterArray[characterSequence] = (char)
// asciiArray[characterSequence];
// characterSequence++;
}
// Arrays.sort(lowerCaseCharacterArray);
StringBuilder sortedWord = new StringBuilder();
int asciiToIndex = 0;
// This Logic uses for reading the occurrences from array and copying
// back into the character array
for (int asciiValueOfCharacter : asciiArrayOf26) {
if (asciiValueOfCharacter != 0) {
if (asciiValueOfCharacter == 1) {
sortedWord.append((char) (asciiToIndex + 97));
} else {
for (int i = 0; i < asciiValueOfCharacter; i++) {
sortedWord.append((char) (asciiToIndex + 97));
}
}
}
asciiToIndex++;
}
// return new String(lowerCaseCharacterArray);
return sortedWord.toString();
}
}

How to find a word from arrays of characters?

What is the best way to solve this:
I have a group of arrays with 3-4 characters inside each like so:
{p, {a, {t, {m,
q, b, u, n,
r, c v o
s } } }
}
I also have an array of dictionary words.
What is the best/fastest way to find if the array of characters can combine to form one of the dictionary words? For example, the above arrays could make the words:
"pat","rat","at","to","bum"(lol)but not "nub" or "mat"Should i loop through the dictionary to see if words can be made or get all the combinations from the letters then compare those to the dictionary
I had some Scrabble code laying around, so I was able to throw this together. The dictionary I used is sowpods (267751 words). The code below reads the dictionary as a text file with one uppercase word on each line.
The code is C#:
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.IO;
using System.Diagnostics;
namespace SO_6022848
{
public struct Letter
{
public const string Chars = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";
public static implicit operator Letter(char c)
{
return new Letter() { Index = Chars.IndexOf(c) };
}
public int Index;
public char ToChar()
{
return Chars[Index];
}
public override string ToString()
{
return Chars[Index].ToString();
}
}
public class Trie
{
public class Node
{
public string Word;
public bool IsTerminal { get { return Word != null; } }
public Dictionary<Letter, Node> Edges = new Dictionary<Letter, Node>();
}
public Node Root = new Node();
public Trie(string[] words)
{
for (int w = 0; w < words.Length; w++)
{
var word = words[w];
var node = Root;
for (int len = 1; len <= word.Length; len++)
{
var letter = word[len - 1];
Node next;
if (!node.Edges.TryGetValue(letter, out next))
{
next = new Node();
if (len == word.Length)
{
next.Word = word;
}
node.Edges.Add(letter, next);
}
node = next;
}
}
}
}
class Program
{
static void GenWords(Trie.Node n, HashSet<Letter>[] sets, int currentArrayIndex, List<string> wordsFound)
{
if (currentArrayIndex < sets.Length)
{
foreach (var edge in n.Edges)
{
if (sets[currentArrayIndex].Contains(edge.Key))
{
if (edge.Value.IsTerminal)
{
wordsFound.Add(edge.Value.Word);
}
GenWords(edge.Value, sets, currentArrayIndex + 1, wordsFound);
}
}
}
}
static void Main(string[] args)
{
const int minArraySize = 3;
const int maxArraySize = 4;
const int setCount = 10;
const bool generateRandomInput = true;
var trie = new Trie(File.ReadAllLines("sowpods.txt"));
var watch = new Stopwatch();
var trials = 10000;
var wordCountSum = 0;
var rand = new Random(37);
for (int t = 0; t < trials; t++)
{
HashSet<Letter>[] sets;
if (generateRandomInput)
{
sets = new HashSet<Letter>[setCount];
for (int i = 0; i < setCount; i++)
{
sets[i] = new HashSet<Letter>();
var size = minArraySize + rand.Next(maxArraySize - minArraySize + 1);
while (sets[i].Count < size)
{
sets[i].Add(Letter.Chars[rand.Next(Letter.Chars.Length)]);
}
}
}
else
{
sets = new HashSet<Letter>[] {
new HashSet<Letter>(new Letter[] { 'P', 'Q', 'R', 'S' }),
new HashSet<Letter>(new Letter[] { 'A', 'B', 'C' }),
new HashSet<Letter>(new Letter[] { 'T', 'U', 'V' }),
new HashSet<Letter>(new Letter[] { 'M', 'N', 'O' }) };
}
watch.Start();
var wordsFound = new List<string>();
for (int i = 0; i < sets.Length - 1; i++)
{
GenWords(trie.Root, sets, i, wordsFound);
}
watch.Stop();
wordCountSum += wordsFound.Count;
if (!generateRandomInput && t == 0)
{
foreach (var word in wordsFound)
{
Console.WriteLine(word);
}
}
}
Console.WriteLine("Elapsed per trial = {0}", new TimeSpan(watch.Elapsed.Ticks / trials));
Console.WriteLine("Average word count per trial = {0:0.0}", (float)wordCountSum / trials);
}
}
}
Here is the output when using your test data:
PA
PAT
PAV
QAT
RAT
RATO
RAUN
SAT
SAU
SAV
SCUM
AT
AVO
BUM
BUN
CUM
TO
UM
UN
Elapsed per trial = 00:00:00.0000725
Average word count per trial = 19.0
And the output when using random data (does not print each word):
Elapsed per trial = 00:00:00.0002910
Average word count per trial = 62.2
EDIT: I made it much faster with two changes: Storing the word at each terminal node of the trie, so that it doesn't have to be rebuilt. And storing the input letters as an array of hash sets instead of an array of arrays, so that the Contains() call is fast.
There are probably many way of solving this.
What you are interested in is the number of each character you have available to form a word, and how many of each character is required for each dictionary word. The trick is how to efficiently look up this information in the dictionary.
Perhaps you can use a prefix tree (a trie), some kind of smart hash table, or similar.
Anyway, you will probably have to try out all your possibilities and check them against the dictionary. I.e., if you have three arrays of three values each, there will be 3^3+3^2+3^1=39 combinations to check out. If this process is too slow, then perhaps you could stick a Bloom filter in front of the dictionary, to quickly check if a word is definitely not in the dictionary.
EDIT: Anyway, isn't this essentially the same as Scrabble? Perhaps try Googling for "scrabble algorithm" will give you some good clues.
The reformulated question can be answered just by generating and testing. Since you have 4 letters and 10 arrays, you've only got about 1 million possible combinations (10 million if you allow a blank character). You'll need an efficient way to look them up, use a BDB or some sort of disk based hash.
The trie solution previously posted should work as well, you are just restricted more by what characters you can choose at each step of the search. It should be faster as well.
I just made a very large nested for loop like this:
for(NSString*s1 in [letterList objectAtIndex:0]{
for(NSString*s2 in [letterList objectAtIndex:1]{
8 more times...
}
}
Then I do a binary search on the combination to see if it is in the dictionary and add it to an array if it is

Resources