How can I generate 100 random nodes from the whole graph with the Gephi Toolkit?

I'm working on a project using the Gephi Toolkit. I need to select 100 random nodes from the whole graph.
public void script() {
    //Init a project - and therefore a workspace
    ProjectController pc = Lookup.getDefault().lookup(ProjectController.class);
    pc.newProject();
    Workspace workspace = pc.getCurrentWorkspace();
    GraphModel graphModel = Lookup.getDefault().lookup(GraphController.class).getGraphModel();
    PreviewModel model = Lookup.getDefault().lookup(PreviewController.class).getModel();
    ImportController importController = Lookup.getDefault().lookup(ImportController.class);
    FilterController filterController = Lookup.getDefault().lookup(FilterController.class);
    AppearanceController appearanceController = Lookup.getDefault().lookup(AppearanceController.class);
    AppearanceModel appearanceModel = appearanceController.getModel();
    //Import file
    Container container;
    try {
        // File file = new File(getClass().getResource("/org/gephi/toolkit/demos/polblogs.gml").toURI());
        File file = new File(getClass().getResource("/org/gephi/toolkit/demos/Book3.csv").toURI());
        container = importController.importFile(file);
        container.getLoader().setEdgeDefault(EdgeDirectionDefault.DIRECTED); //Force DIRECTED
    } catch (Exception ex) {
        ex.printStackTrace();
        return;
    }
    //Append imported data to GraphAPI
    importController.process(container, new DefaultProcessor(), workspace);
    //See if graph is well imported
    DirectedGraph graph = graphModel.getDirectedGraph();
    System.out.println("Nodes: " + graph.getNodeCount());
    System.out.println("Edges: " + graph.getEdgeCount());
}
This code prints the number of nodes and edges, but I can't find a function to extract a random subset of nodes. I need a limited number of nodes, not all of them, because I'm working on a genetic algorithm and need to generate an initial population. Any ideas?

There is probably a nicer way, but you could use the NodeIterator returned from graph.getNodes(),
or use graph.getNodes().toArray(), which will return an array of nodes.
You could then extract 100 random nodes from the array, making use of Math.random.
If you add your results to a Set, you can be sure that you aren't getting the same node more than once.
Not tested, but something like this:
static final int POPULATION_SIZE = 100;

public Set<Node> getInitialPopulation(Graph graph) {
    Node[] nodes = graph.getNodes().toArray();
    Set<Node> initialPopulation = new HashSet<>();
    // fewer nodes than the population size: just return them all
    if (nodes.length < POPULATION_SIZE) {
        for (Node node : nodes) {
            initialPopulation.add(node);
        }
        return initialPopulation;
    }
    // the Set guarantees uniqueness, so keep drawing until we have 100 distinct nodes
    while (initialPopulation.size() < POPULATION_SIZE) {
        initialPopulation.add(nodes[(int) (Math.random() * nodes.length)]);
    }
    return initialPopulation;
}
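If re-drawing on duplicates bothers you, a shuffle-based variant samples without collisions. A minimal sketch, assuming the same Graph/Node types and POPULATION_SIZE constant as above:

public Set<Node> getInitialPopulation(Graph graph) {
    // copy the nodes into a list and put them in a uniformly random order
    List<Node> nodes = new ArrayList<>(Arrays.asList(graph.getNodes().toArray()));
    Collections.shuffle(nodes);
    // the first POPULATION_SIZE entries are then a random sample without duplicates
    return new HashSet<>(nodes.subList(0, Math.min(POPULATION_SIZE, nodes.size())));
}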

Related

Binary Tree Step by Step Directions from One Node to Another

I am trying to solve LeetCode question 2096. Step-By-Step Directions From a Binary Tree Node to Another:
You are given the root of a binary tree with n nodes. Each node is uniquely assigned a value from 1 to n. You are also given an integer startValue representing the value of the start node s, and a different integer destValue representing the value of the destination node t.
Find the shortest path starting from node s and ending at node t. Generate step-by-step directions of such path as a string consisting of only the uppercase letters 'L', 'R', and 'U'. Each letter indicates a specific direction:
'L' means to go from a node to its left child node.
'R' means to go from a node to its right child node.
'U' means to go from a node to its parent node.
Return the step-by-step directions of the shortest path from node s to node t.
I have converted the tree to a graph using an adjacency list. For each node, I store the adjacent nodes as well as the direction. For example, for the tree [1,2,3], the traversal produces a HashMap that looks like {1:[(2,'L'), (3,'R')], 2:[(1,'U')], 3:[(1,'U')]}.
I assumed that performing a BFS from startNode to endNode would let me trace the path. But I end up getting an incorrect answer, or an extra step, whenever the endNode is on one side but I tried the node on the other side first.
I found How to trace the path in a Breadth-First Search? on Stack Overflow, and my approach appears to be correct (I don't know what I am missing). I don't understand the purpose of, or the need for, backtracing either.
My code is below:
public class StepByStep {
    HashMap<TreeNode, HashMap<TreeNode, String>> graph = new HashMap<TreeNode, HashMap<TreeNode, String>>();

    public static void main(String argv[]) {
        TreeNode root = new TreeNode(5);
        root.left = new TreeNode(1);
        root.right = new TreeNode(2);
        root.left.left = new TreeNode(3);
        root.right.left = new TreeNode(6);
        root.right.right = new TreeNode(4);
        StepByStep sbs = new StepByStep();
        System.out.println(sbs.getDirections(root, 3, 6));
        Set<TreeNode> keys = sbs.graph.keySet();
        for (TreeNode key : keys) {
            System.out.print(key.val + " ");
            HashMap<TreeNode, String> map = sbs.graph.get(key);
            Set<TreeNode> nodes = map.keySet();
            for (TreeNode node : nodes) {
                System.out.print(node.val + map.get(node) + " ");
            }
            System.out.println();
        }
    }

    public String getDirections(TreeNode root, int startValue, int destValue) {
        // we do an inorder traversal
        inorder(root, null);
        // now we perform a breadth first search using the graph
        Set<TreeNode> keys = graph.keySet();
        TreeNode start = null;
        for (TreeNode key : keys) {
            if (key.val == startValue) {
                start = key;
                break;
            }
        }
        return bfs(start, destValue);
    }

    public String bfs(TreeNode root, int destValue) {
        Queue<TreeNode> queue = new LinkedList<TreeNode>();
        HashSet<TreeNode> visited = new HashSet<TreeNode>();
        queue.add(root);
        StringBuilder sb = new StringBuilder("");
        while (!queue.isEmpty()) {
            int size = queue.size();
            while (size > 0) {
                TreeNode current = queue.poll();
                if (current.val == destValue) {
                    return sb.toString();
                }
                visited.add(current);
                HashMap<TreeNode, String> map = graph.get(current);
                Set<TreeNode> keys = map.keySet();
                for (TreeNode key : keys) {
                    if (!visited.contains(key)) {
                        sb.append(map.get(key));
                        queue.add(key);
                    }
                }
                --size;
            }
        }
        return "";
    }

    public void inorder(TreeNode root, TreeNode parent) {
        if (root == null)
            return;
        inorder(root.left, root);
        inorder(root.right, root);
        if (root.left != null) {
            if (!graph.containsKey(root)) {
                graph.put(root, new HashMap<TreeNode, String>());
            }
            HashMap<TreeNode, String> map = graph.get(root);
            map.put(root.left, "L");
            graph.put(root, map);
        }
        if (root.right != null) {
            if (!graph.containsKey(root)) {
                graph.put(root, new HashMap<TreeNode, String>());
            }
            HashMap<TreeNode, String> map = graph.get(root);
            map.put(root.right, "R");
            graph.put(root, map);
        }
        if (parent != null) {
            if (!graph.containsKey(root)) {
                graph.put(root, new HashMap<TreeNode, String>());
            }
            HashMap<TreeNode, String> map = graph.get(root);
            map.put(parent, "U");
            graph.put(root, map);
        }
    }
}
What am I missing?
The problem is in bfs: there you append every visited node's move to sb, but that is wrong, since the visited nodes are not all on the path from the start to the current node. Instead, consider that every visited node represents its own unique path from the start, which includes only some of the previously visited nodes.
One solution is to store in the queue not only the node, but also its own move-string (its private sb), i.e. store a pair in the queue: a node and a string.
Once you find the destination, you can then return that particular string.
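A minimal sketch of that fix, assuming the same graph field and TreeNode class as in the question (AbstractMap.SimpleEntry serves as the pair):

public String bfs(TreeNode start, int destValue) {
    Queue<Map.Entry<TreeNode, String>> queue = new LinkedList<>();
    Set<TreeNode> visited = new HashSet<>();
    visited.add(start);
    queue.add(new AbstractMap.SimpleEntry<>(start, ""));
    while (!queue.isEmpty()) {
        Map.Entry<TreeNode, String> entry = queue.poll();
        TreeNode node = entry.getKey();
        String path = entry.getValue();
        if (node.val == destValue) {
            return path; // this node's private path, not a shared buffer
        }
        for (Map.Entry<TreeNode, String> next : graph.getOrDefault(node, new HashMap<>()).entrySet()) {
            if (visited.add(next.getKey())) { // add() returns false if already seen
                queue.add(new AbstractMap.SimpleEntry<>(next.getKey(), path + next.getValue()));
            }
        }
    }
    return "";
}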
Alternatively, you'd make your life easier by performing a depth-first search (with recursion), building the string while backtracking. This costs less memory. Breadth-first search is interesting when you need to find a shortest path, but in a tree there is only one path between two nodes, so finding any path is good enough, and that is what a depth-first search will give you.
Finally, I would solve this problem as follows:
Perform a depth first traversal, and collect the steps ("L" or "R") from the root to the start node and the steps from the root to the destination node. These paths only have the letters "L" and "R" as there is no upwards movement.
Remove the common prefix from these paths, so that both paths now start from their lowest common ancestor node.
Replace all letters of the first path (if any) with "U".
Done.
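A sketch of that approach, assuming the usual TreeNode with val, left and right fields:

public String getDirections(TreeNode root, int startValue, int destValue) {
    StringBuilder toStart = new StringBuilder();
    StringBuilder toDest = new StringBuilder();
    find(root, startValue, toStart); // 'L'/'R' steps from the root to the start node
    find(root, destValue, toDest);   // 'L'/'R' steps from the root to the destination
    int i = 0; // length of the common prefix = path to the lowest common ancestor
    while (i < toStart.length() && i < toDest.length()
            && toStart.charAt(i) == toDest.charAt(i)) {
        i++;
    }
    StringBuilder result = new StringBuilder();
    for (int k = i; k < toStart.length(); k++) {
        result.append('U');                    // climb from the start node up to the LCA
    }
    result.append(toDest, i, toDest.length()); // then descend to the destination
    return result.toString();
}

// Collects the steps from node to the target value; appends while descending,
// removes on backtracking. Returns true once the target is found.
private boolean find(TreeNode node, int value, StringBuilder path) {
    if (node == null) return false;
    if (node.val == value) return true;
    path.append('L');
    if (find(node.left, value, path)) return true;
    path.setLength(path.length() - 1);
    path.append('R');
    if (find(node.right, value, path)) return true;
    path.setLength(path.length() - 1);
    return false;
}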

Java8 calculate average of list of objects in the map

Initial data:
public class Stats {
    int passesNumber;
    int tacklesNumber;

    public Stats(int passesNumber, int tacklesNumber) {
        this.passesNumber = passesNumber;
        this.tacklesNumber = tacklesNumber;
    }

    public int getPassesNumber() {
        return passesNumber;
    }

    public void setPassesNumber(int passesNumber) {
        this.passesNumber = passesNumber;
    }

    public int getTacklesNumber() {
        return tacklesNumber;
    }

    public void setTacklesNumber(int tacklesNumber) {
        this.tacklesNumber = tacklesNumber;
    }
}
Map<String, List<Stats>> statsByPosition = new HashMap<>();
statsByPosition.put("Defender", Arrays.asList(new Stats(10, 50), new Stats(15, 60), new Stats(12, 100)));
statsByPosition.put("Attacker", Arrays.asList(new Stats(80, 5), new Stats(90, 10)));
I need to calculate an average of Stats by position. So the result should be a map with the same keys, but with each value aggregated into a single Stats object (each List reduced to one Stats):
{
"Defender" => Stats((10 + 15 + 12) / 3, (50 + 60 + 100) / 3),
"Attacker" => Stats((80 + 90) / 2, (5 + 10) / 2)
}
I don't think there's anything new in Java 8 that could really help in solving this problem, at least not efficiently.
If you look carefully at all the new APIs, you will see that the majority of them are aimed at providing more powerful primitives for working on single values and their sequences - that is, on sequences of double, int, ? extends Object, etc.
For example, to compute an average over a sequence of doubles, the JDK introduces a new class - DoubleSummaryStatistics - which does an obvious thing: it collects a summary over an arbitrary sequence of double values.
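For instance, a quick illustration of that JDK class (using java.util.stream.DoubleStream):

// collects count, sum, min, average and max in one pass
DoubleSummaryStatistics s = DoubleStream.of(1.0, 2.0, 3.0).summaryStatistics();
System.out.println(s.getAverage()); // 2.0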
I would actually suggest that you go for a similar approach yourself: make your own StatsSummary class that would look along the lines of this:
// assuming this is what your Stats class looks like:
class Stats {
    public final double a, b; // the two stats

    public Stats(double a, double b) {
        this.a = a; this.b = b;
    }
}

// summary will go along the lines of:
class StatsSummary implements Consumer<Stats> {
    DoubleSummaryStatistics a, b; // summary of stats collected so far

    StatsSummary() {
        a = new DoubleSummaryStatistics();
        b = new DoubleSummaryStatistics();
    }

    // this is how we collect it:
    @Override public void accept(Stats stat) {
        a.accept(stat.a); b.accept(stat.b);
    }

    public void combine(StatsSummary other) {
        a.combine(other.a); b.combine(other.b);
    }

    // now for actual methods that return stuff. I will implement only average and min,
    // but the rest of them are not hard
    public Stats average() {
        return new Stats(a.getAverage(), b.getAverage());
    }

    public Stats min() {
        return new Stats(a.getMin(), b.getMin());
    }
}
Now, the above implementation allows you to express your intent properly when using Streams and the like: by building a rigid API out of classes available in the JDK as building blocks, you get fewer errors overall.
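For example, a hypothetical usage sketch with Collector.of, mirroring the Stats/StatsSummary classes above:

Map<String, Stats> averages = statsByPosition.entrySet().stream()
    .collect(Collectors.toMap(
        Map.Entry::getKey,
        e -> e.getValue().stream().collect(Collector.of(
            StatsSummary::new,                     // supplier
            StatsSummary::accept,                  // accumulator
            (l, r) -> { l.combine(r); return l; }, // combiner
            StatsSummary::average))));             // finisher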
However, if you only want to compute the average once somewhere and don't need anything else, coding this class is a little overkill, so here's a quick-and-dirty solution:
Map<String, Stats> computeAverage(Map<String, List<Stats>> statsByPosition) {
    Map<String, Stats> averaged = new HashMap<>();
    statsByPosition.forEach((position, statsList) -> {
        averaged.put(position, averageStats(statsList));
    });
    return averaged;
}

Stats averageStats(Collection<Stats> stats) {
    double a = 0, b = 0; // must be initialized before accumulating
    int len = stats.size();
    for (Stats stat : stats) {
        a += stat.a;
        b += stat.b;
    }
    return len == 0 ? new Stats(0, 0) : new Stats(a/len, b/len);
}
There is probably a cleaner solution with Java 8, but this works well and isn't too complex:
Map<String, Stats> newMap = new HashMap<>();
statsByPosition.forEach((key, statsList) -> {
    newMap.put(key, new Stats(
        (int) statsList.stream().mapToInt(Stats::getPassesNumber).average().orElse(0),
        (int) statsList.stream().mapToInt(Stats::getTacklesNumber).average().orElse(0))
    );
});
The functional forEach method lets you iterate over every key-value pair of your given map.
You just put a new entry into the result map for the averaged values, reusing the key from the given map. The new value is a new Stats, whose constructor arguments are calculated directly.
Just take the value of your old map (the statsList in the forEach), map each Stats to an int with mapToInt, and use the average function.
That function returns an OptionalDouble, which is nearly the same as Optional<Double>. In case something didn't work (e.g. an empty list), you use its orElse() method and pass a default value (like 0). Since the averages are double, you have to cast the value to int.
As mentioned, there could probably be an even shorter version, using reduce.
You might as well use a custom collector. Let's add the following methods to the Stats class:
public Stats() {
}

public void accumulate(Stats stats) {
    passesNumber += stats.passesNumber;
    tacklesNumber += stats.tacklesNumber;
}

public Stats combine(Stats acc) {
    passesNumber += acc.passesNumber;
    tacklesNumber += acc.tacklesNumber;
    return this;
}

@Override
public String toString() {
    return "Stats{" +
            "passesNumber=" + passesNumber +
            ", tacklesNumber=" + tacklesNumber +
            '}';
}
Now we can use Stats in the collect method:
System.out.println(statsByPosition.entrySet().stream().collect(
    Collectors.toMap(
        entity -> entity.getKey(),
        entity -> {
            Stats entryStats = entity.getValue().stream().collect(
                Collector.of(Stats::new, Stats::accumulate, Stats::combine)
            ); // sum the stats for each map key
            // divide by the list size to get the averages
            entryStats.setPassesNumber(entryStats.getPassesNumber() / entity.getValue().size());
            entryStats.setTacklesNumber(entryStats.getTacklesNumber() / entity.getValue().size());
            return entryStats;
        }
))); // {Attacker=Stats{passesNumber=85, tacklesNumber=7}, Defender=Stats{passesNumber=12, tacklesNumber=70}}
If Java 9 and the StreamEx library are available, you could do:
public static Map<String, Stats> third(Map<String, List<Stats>> statsByPosition) {
    return statsByPosition.entrySet().stream()
        .collect(Collectors.groupingBy(e -> e.getKey(),
            Collectors.flatMapping(e -> e.getValue().stream(),
                MoreCollectors.pairing(
                    Collectors.averagingDouble(Stats::getPassesNumber),
                    Collectors.averagingDouble(Stats::getTacklesNumber),
                    (a, b) -> new Stats(a, b)))));
}

Two sum data structure problems

I built a data structure for the two-sum question, with an add and a find method.
add - Add the number to an internal data structure.
find - Find if there exists any pair of numbers whose sum is equal to the value.
For example:
add(1); add(3); add(5);
find(4) // return true
find(7) // return false
The following is my code; what is wrong with it?
This is the test site: http://www.lintcode.com/en/problem/two-sum-data-structure-design/
Some cases do not pass.
public class TwoSum {
    private List<Integer> sets;

    TwoSum() {
        this.sets = new ArrayList<Integer>();
    }

    // Add the number to an internal data structure.
    public void add(int number) {
        this.sets.add(number);
    }

    // Find if there exists any pair of numbers whose sum is equal to the value.
    public boolean find(int value) {
        Collections.sort(sets);
        for (int i = 0; i < sets.size(); i++) {
            if (sets.get(i) > value) break;
            for (int j = i + 1; j < sets.size(); j++) {
                if (sets.get(i) + sets.get(j) == value) {
                    return true;
                }
            }
        }
        return false;
    }
}
Your logic itself is workable, but the early break is a problem: if (sets.get(i) > value) break; only holds when all numbers are non-negative. With negative numbers (e.g. add(-3); add(-2); find(-5)) the loop exits before a valid pair is checked, which would explain the failing cases.
Apart from that, a coding challenge could require a more performant solution: you check every item against every other item, which takes O(N^2) per find, plus a sort on every call.
The best way to implement find is using a HashMap, which takes O(N); a sketch follows.
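A minimal sketch of that design, keeping the question's add/find contract (a count map handles the x + x == value case):

public class TwoSum {
    // value -> how many times it was added
    private final Map<Integer, Integer> counts = new HashMap<>();

    public void add(int number) {
        counts.merge(number, 1, Integer::sum);
    }

    public boolean find(int value) {
        for (Map.Entry<Integer, Integer> e : counts.entrySet()) {
            int complement = value - e.getKey();
            if (complement == e.getKey()) {
                if (e.getValue() >= 2) return true; // need the same value twice
            } else if (counts.containsKey(complement)) {
                return true;
            }
        }
        return false;
    }
}

This makes add O(1) and find O(N) over the distinct values, with no sorting.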

Algorithm that discovers all the fields on a map with as least turns as possible

Let's say I have a map like this:
#####
..###
W.###
. is a discovered cell.
# is an undiscovered cell.
W is a worker. There can be many workers. Each of them can move once per turn, by one cell in 4 directions (up, right, down or left). A worker discovers all 8 cells around him, turning # into '.'. At most one worker can stand on a cell at a time.
Maps are not always rectangular. In the beginning, all cells are undiscovered except the neighbours of each W.
The goal is to get all the cells discovered, in as few turns as possible.
First approach
Find the nearest # and go towards it. Repeat.
To find the nearest #, I start a BFS from W and finish as soon as the first # is found.
On the example map this can give the following solution:
##### ##### ##### ##### ##... #.... .....
..### ...## ....# ..... ...W. ..W.. .W...
W.### .W.## ..W.# ...W. ..... ..... .....
6 turns. Pretty far from optimal:
##### ..### ...## ....# .....
..### W.### .W.## ..W.# ...W.
W.### ..### ...## ....# .....
4 turns.
Question
What algorithm discovers all the cells in as few turns as possible?
Here is a basic idea that uses A*. It is probably quite time- and memory-consuming, but it is guaranteed to return an optimal solution and is definitely better than brute force.
The nodes for A* will be the various states, i.e. where the workers are positioned and the discovery state of all cells. Each unique state represents a different node.
Edges will be all possible transitions. One worker has four possible transitions. For more workers, you will need every possible combination (about 4^n edges). This is the part where you can constrain the workers to remain within the grid and not to overlap.
The cost will be the number of turns. The heuristic to approximate the distance to the goal (all cells discovered) can be developed as follows:
A single worker can discover at most three new cells per turn. Thus, n workers can discover at most 3*n cells, so the minimum number of remaining turns is "number of undiscovered cells / (3 * worker count)", rounded up. This is the heuristic to use. It could even be improved by determining the maximum number of cells that each worker can actually discover in the next turn (at most 3 per worker), giving "(undiscovered cells - discoverable cells) / (3 * workers) + 1" overall.
In each step you examine the node with the least overall cost (turns so far + heuristic). For the examined node, you calculate the costs for each surrounding node (possible movements of all workers) and go on.
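As a small illustration, here is the base heuristic from above as code (a sketch; the method and parameter names are mine):

// Lower bound on the remaining turns: n workers uncover at most 3*n new
// cells per turn, so at least ceil(undiscovered / (3*n)) turns remain.
static int heuristic(int undiscoveredCells, int workerCount) {
    int maxPerTurn = 3 * workerCount;
    return (undiscoveredCells + maxPerTurn - 1) / maxPerTurn; // integer ceiling
}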
Strictly speaking, the main part of this answer may be considered as "Not An Answer". So to first cover the actual question:
What is the algorithm that discovers all the cells with as least turns as possible?
Answer: In each step, you can compute all possible successors of the current state. Then the successors of these successors. This can be repeated recursively, until one of the successors contains no more #-fields. The sequence of states through which this successor was reached is optimal regarding the number of moves that have been necessary to reach this state.
So far, this is trivial. But of course, this is not feasible for a "large" map and/or a "large" number of workers.
As mentioned in the comments: I think that finding the optimal solution may be an NP-complete problem. In any case, it's most likely at least a tremendously complicated optimization problem where you may employ some rather sophisticated techniques to find the optimal solution in optimal time.
So, IMHO, the only feasible approach for tackling this is heuristics.
Several approaches can be imagined here. However, I wanted to give it a try, with a very simple approach. The following MCVE accepts the definition of the map as a rectangular string (empty spaces represent "invalid" regions, so it's possible to represent non-rectangular maps with that). The workers are simply enumerated, from 0 to 9 (limited to this number, at the moment). The string is converted into a MapState that consists of the actual map, as well as the paths that the workers have gone through until then.
The actual search here is a "greedy" version of the exhaustive search that I described in the first paragraph: Given an initial state, it computes all successor states. These are the states where each worker has moved in one of the four directions (e.g. 4^3 = 64 states for 3 workers); of course, these are "filtered" to make sure that workers don't leave the map or move to the same field.
These successor states are stored in a list. Then it searches the list for the "best" state, and again computes all successors of this "best" state and stores them in the list. Sooner or later, the list contains a state where no fields are missing.
The definition of the "best" state is where the heuristics come into play: A state is "better" than another when there are fewer fields missing (unvisited). When two states have an equal number of missing fields, then the average distance of the workers to the next unvisited fields serves as the criterion to decide which one is "better".
This finds a solution for the example contained in the code below rather quickly, and prints it as the list of positions that each worker has to visit in each turn.
Of course, this will also not be applicable to "really large" maps or "many" workers, because the list of states grows rather quickly (one could consider dropping the "worst" solutions to speed this up a little, but that may have caveats, like getting stuck in local optima). Additionally, one can easily think of cases where the "greedy" strategy does not give optimal results. But until someone posts an MCVE that always computes the optimal solution in polynomial time, maybe someone finds this interesting or helpful.
import java.awt.Point;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class MapExplorerTest
{
    public static void main(String[] args)
    {
        // Note: all rows must have the same width, because MapState indexes
        // the string as a rectangular grid (spaces mark invalid regions)
        String mapString =
            "  ###    ######"+"\n"+
            "  ###    ###1##"+"\n"+
            "###############"+"\n"+
            "#0#############"+"\n"+
            "###############"+"\n"+
            "###############"+"\n"+
            "###############"+"\n"+
            "###############"+"\n"+
            "###############"+"\n"+
            "###############"+"\n"+
            "#####   #######"+"\n"+
            "#####   #######"+"\n"+
            "#####   #######"+"\n"+
            "###############"+"\n"+
            "###############"+"\n"+
            "###############"+"\n"+
            "###   ######2##"+"\n"+
            "###   #########"+"\n";
        MapExplorer m = new MapExplorer(mapString);
        MapState solution = m.computeSolutionGreedy();
        System.out.println(solution.createString());
    }
}

class MapState
{
    private int rows;
    private int cols;
    private char map[][];
    List<List<Point>> workerPaths;
    private int missingFields = -1;

    MapState(String mapString)
    {
        workerPaths = new ArrayList<List<Point>>();
        rows = countLines(mapString);
        cols = mapString.indexOf("\n");
        map = new char[rows][cols];
        String s = mapString.replaceAll("\\n", "");
        for (int r=0; r<rows; r++)
        {
            for (int c=0; c<cols; c++)
            {
                int i = c+r*cols;
                char ch = s.charAt(i);
                map[r][c] = ch;
                if (Character.isDigit(ch))
                {
                    int workerIndex = ch - '0';
                    while (workerPaths.size() <= workerIndex)
                    {
                        workerPaths.add(new ArrayList<Point>());
                    }
                    Point p = new Point(r, c);
                    workerPaths.get(workerIndex).add(p);
                }
            }
        }
    }

    MapState(MapState other)
    {
        this.rows = other.rows;
        this.cols = other.cols;
        this.map = new char[other.map.length][];
        for (int i=0; i<other.map.length; i++)
        {
            this.map[i] = other.map[i].clone();
        }
        this.workerPaths = new ArrayList<List<Point>>();
        for (List<Point> otherWorkerPath : other.workerPaths)
        {
            this.workerPaths.add(MapExplorer.copy(otherWorkerPath));
        }
    }

    // Breadth-first search for the closest undiscovered ('#') field,
    // returning the number of steps needed to reach it
    int distanceToMissing(Point p0)
    {
        if (getMissingFields() == 0)
        {
            return -1;
        }
        List<Point> points = new ArrayList<Point>();
        Map<Point, Integer> distances = new HashMap<Point, Integer>();
        distances.put(p0, 0);
        points.add(p0);
        while (!points.isEmpty())
        {
            Point p = points.remove(0);
            List<Point> successors = MapExplorer.computeSuccessors(p);
            for (Point s : successors)
            {
                if (!isValid(s))
                {
                    continue;
                }
                if (map[s.x][s.y] == '#')
                {
                    return distances.get(p)+1;
                }
                if (!distances.containsKey(s))
                {
                    distances.put(s, distances.get(p)+1);
                    points.add(s);
                }
            }
        }
        return -1;
    }

    double averageDistanceToMissing()
    {
        double d = 0;
        for (List<Point> workerPath : workerPaths)
        {
            Point p = workerPath.get(workerPath.size()-1);
            d += distanceToMissing(p);
        }
        return d / workerPaths.size();
    }

    int getMissingFields()
    {
        if (missingFields == -1)
        {
            missingFields = countMissingFields();
        }
        return missingFields;
    }

    private int countMissingFields()
    {
        int count = 0;
        for (int r=0; r<rows; r++)
        {
            for (int c=0; c<cols; c++)
            {
                if (map[r][c] == '#')
                {
                    count++;
                }
            }
        }
        return count;
    }

    // Mark the 8 neighbors of each worker's current position as discovered
    void update()
    {
        for (List<Point> workerPath : workerPaths)
        {
            Point p = workerPath.get(workerPath.size()-1);
            for (int dr=-1; dr<=1; dr++)
            {
                for (int dc=-1; dc<=1; dc++)
                {
                    if (dr == 0 && dc == 0)
                    {
                        continue;
                    }
                    int nr = p.x + dr;
                    int nc = p.y + dc;
                    if (!isValid(nr, nc))
                    {
                        continue;
                    }
                    if (map[nr][nc] != '#')
                    {
                        continue;
                    }
                    map[nr][nc] = '.';
                }
            }
        }
    }

    public void updateWorkerPosition(int w, Point p)
    {
        List<Point> workerPath = workerPaths.get(w);
        Point old = workerPath.get(workerPath.size()-1);
        char oc = map[old.x][old.y];
        char nc = map[p.x][p.y];
        map[old.x][old.y] = nc;
        map[p.x][p.y] = oc;
    }

    boolean isValid(int r, int c)
    {
        if (r < 0) return false;
        if (r >= rows) return false;
        if (c < 0) return false;
        if (c >= cols) return false;
        if (map[r][c] == ' ')
        {
            return false;
        }
        return true;
    }

    boolean isValid(Point p)
    {
        return isValid(p.x, p.y);
    }

    private static int countLines(String s)
    {
        int count = 0;
        while (s.contains("\n"))
        {
            s = s.replaceFirst("\\\n", "");
            count++;
        }
        return count;
    }

    public String createMapString()
    {
        StringBuilder sb = new StringBuilder();
        for (int r=0; r<rows; r++)
        {
            for (int c=0; c<cols; c++)
            {
                sb.append(map[r][c]);
            }
            sb.append("\n");
        }
        return sb.toString();
    }

    public String createString()
    {
        StringBuilder sb = new StringBuilder();
        for (List<Point> workerPath : workerPaths)
        {
            Point p = workerPath.get(workerPath.size()-1);
            int d = distanceToMissing(p);
            sb.append(workerPath).append(", distance: "+d+"\n");
        }
        sb.append(createMapString());
        sb.append("Missing "+getMissingFields());
        return sb.toString();
    }
}

class MapExplorer
{
    MapState mapState;

    public MapExplorer(String mapString)
    {
        mapState = new MapState(mapString);
        mapState.update();
        computeSuccessors(mapState);
    }

    static List<Point> copy(List<Point> list)
    {
        List<Point> result = new ArrayList<Point>();
        for (Point p : list)
        {
            result.add(new Point(p));
        }
        return result;
    }

    public MapState computeSolutionGreedy()
    {
        // "Better" = fewer missing fields; ties are broken by the average
        // distance of the workers to the next missing field
        Comparator<MapState> comparator = new Comparator<MapState>()
        {
            @Override
            public int compare(MapState ms0, MapState ms1)
            {
                int m0 = ms0.getMissingFields();
                int m1 = ms1.getMissingFields();
                if (m0 != m1)
                {
                    return m0-m1;
                }
                double d0 = ms0.averageDistanceToMissing();
                double d1 = ms1.averageDistanceToMissing();
                return Double.compare(d0, d1);
            }
        };
        Set<MapState> handled = new HashSet<MapState>();
        List<MapState> list = new ArrayList<MapState>();
        list.add(mapState);
        while (true)
        {
            MapState best = list.get(0);
            for (MapState mapState : list)
            {
                if (!handled.contains(mapState))
                {
                    if (comparator.compare(mapState, best) < 0)
                    {
                        best = mapState;
                    }
                }
            }
            if (best.getMissingFields() == 0)
            {
                return best;
            }
            handled.add(best);
            list.addAll(computeSuccessors(best));
            System.out.println("List size "+list.size()+", handled "+handled.size()+", best\n"+best.createString());
        }
    }

    // All states that can be reached from the given state by moving
    // each worker in one of the four directions
    List<MapState> computeSuccessors(MapState mapState)
    {
        int numWorkers = mapState.workerPaths.size();
        List<Point> oldWorkerPositions = new ArrayList<Point>();
        for (int i=0; i<numWorkers; i++)
        {
            List<Point> workerPath = mapState.workerPaths.get(i);
            Point p = workerPath.get(workerPath.size()-1);
            oldWorkerPositions.add(p);
        }
        List<List<Point>> successorPositionsForWorkers = new ArrayList<List<Point>>();
        for (int w=0; w<oldWorkerPositions.size(); w++)
        {
            Point p = oldWorkerPositions.get(w);
            List<Point> ps = computeSuccessors(p);
            successorPositionsForWorkers.add(ps);
        }
        // Enumerate all 4^numWorkers combinations of worker movements by
        // interpreting the combination index as a base-4 number
        List<List<Point>> newWorkerPositionsList = new ArrayList<List<Point>>();
        int numSuccessors = (int)Math.pow(4, numWorkers);
        for (int i=0; i<numSuccessors; i++)
        {
            String s = Integer.toString(i, 4);
            while (s.length() < numWorkers)
            {
                s = "0"+s;
            }
            List<Point> newWorkerPositions = copy(oldWorkerPositions);
            for (int w=0; w<numWorkers; w++)
            {
                int index = s.charAt(w) - '0';
                Point newPosition = successorPositionsForWorkers.get(w).get(index);
                newWorkerPositions.set(w, newPosition);
            }
            newWorkerPositionsList.add(newWorkerPositions);
        }
        List<MapState> successors = new ArrayList<MapState>();
        for (int i=0; i<newWorkerPositionsList.size(); i++)
        {
            List<Point> newWorkerPositions = newWorkerPositionsList.get(i);
            if (workerPositionsValid(newWorkerPositions))
            {
                MapState successor = new MapState(mapState);
                for (int w=0; w<numWorkers; w++)
                {
                    Point p = newWorkerPositions.get(w);
                    successor.updateWorkerPosition(w, p);
                    successor.workerPaths.get(w).add(p);
                }
                successor.update();
                successors.add(successor);
            }
        }
        return successors;
    }

    // Workers must stay on valid fields and must not share a field
    private boolean workerPositionsValid(List<Point> workerPositions)
    {
        Set<Point> set = new HashSet<Point>();
        for (Point p : workerPositions)
        {
            if (!mapState.isValid(p.x, p.y))
            {
                return false;
            }
            set.add(p);
        }
        return set.size() == workerPositions.size();
    }

    static List<Point> computeSuccessors(Point p)
    {
        List<Point> result = new ArrayList<Point>();
        result.add(new Point(p.x+0, p.y+1));
        result.add(new Point(p.x+0, p.y-1));
        result.add(new Point(p.x+1, p.y+0));
        result.add(new Point(p.x-1, p.y+0));
        return result;
    }
}

How to efficiently add the entire English dictionary to a trie data structure

Simply put, I want to check whether a specified word exists or not.
The lookup needs to be very fast, which is why I decided to store the dictionary in a trie. So far so good! My trie works without issues. The problem is filling the trie with the dictionary. What I'm currently doing is looping through every line of a plain-text file that contains the dictionary and adding each word to my trie.
This is, understandably, an extremely slow process. The file contains just about 120,000 lines. If anyone could point me in the right direction for what I could do, it would be much appreciated!
This is how I add words to the trie (in Boo):
trie = Trie()
saol = Resources.Load("saol") as TextAsset
text = saol.text.Split(char('\n'))
for new_word in text:
    trie.Add(new_word)
And this is my trie (in C#):
using System.Collections.Generic;

public class TrieNode {
    public char letter;
    public bool word;
    public Dictionary<char, TrieNode> child;

    public TrieNode(char letter) {
        this.letter = letter;
        this.word = false;
        this.child = new Dictionary<char, TrieNode>();
    }
}

public class Trie {
    private TrieNode root;

    public Trie() {
        root = new TrieNode(' ');
    }

    public void Add(string word) {
        TrieNode node = root;
        bool found_letter;
        int c = 1;
        foreach (char letter in word) {
            found_letter = false;
            // if current letter is in child list, set current node and break loop
            foreach (var child in node.child) {
                if (letter == child.Key) {
                    node = child.Value;
                    found_letter = true;
                    break;
                }
            }
            if (found_letter) {
                // also mark an existing node when the word ends here
                // (otherwise adding "car" after "cars" would be lost)
                if (c == word.Length) node.word = true;
            } else {
                // if current letter is not in child list, add child node and set it as current node
                TrieNode new_node = new TrieNode(letter);
                if (c == word.Length) new_node.word = true;
                node.child.Add(letter, new_node);
                node = node.child[letter];
            }
            c++;
        }
    }

    public bool Find(string word) {
        TrieNode node = root;
        bool found_letter;
        int c = 1;
        foreach (char letter in word) {
            found_letter = false;
            // check if current letter is in child list
            foreach (var child in node.child) {
                if (letter == child.Key) {
                    node = child.Value;
                    found_letter = true;
                    break;
                }
            }
            if (found_letter && node.word && c == word.Length) return true;
            else if (!found_letter) return false;
            c++;
        }
        return false;
    }
}
Assuming that you don't have any serious implementation problems, pay the price for populating the trie once. After you've populated the trie, serialize it to a file. For future needs, just load the serialized version; that should be faster than reconstructing the trie.
-- ADDED --
Looking closely at your TrieNode class, you may want to replace the Dictionary you use for child with an array. You may consume more space, but get a faster lookup time.
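A sketch of that idea (in Java here, but the shape is identical in C#). It assumes lowercase a-z input; note that the letter field becomes unnecessary, since a child's index in the array already encodes it:

class TrieNode {
    boolean word;                              // true if a word ends at this node
    final TrieNode[] child = new TrieNode[26]; // one slot per letter 'a'..'z'
}

class Trie {
    private final TrieNode root = new TrieNode();

    void add(String word) {
        TrieNode node = root;
        for (char ch : word.toCharArray()) {
            int i = ch - 'a';                  // direct index instead of a map scan
            if (node.child[i] == null) node.child[i] = new TrieNode();
            node = node.child[i];
        }
        node.word = true;                      // mark the end of the word
    }

    boolean find(String word) {
        TrieNode node = root;
        for (char ch : word.toCharArray()) {
            node = node.child[ch - 'a'];
            if (node == null) return false;
        }
        return node.word;
    }
}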
Anything you do with the CLI yourself will be slower than using the built-in functions.
120k lines is not that much for a dictionary.
The first thing I would do is fire up the code performance tool (profiler).
But just some wild guesses: you have a lot of function calls, starting with the Boo-to-C# binding in a for loop. Try passing the whole text block to C# and tearing it apart there.
Second, reconsider the Dictionary: by scanning its entries one by one you waste just about as many resources as the Dictionary would otherwise save you - look the key up directly instead.
Third, sort the text before you go inserting - you can probably make some optimizations that way (see the sketch below). Maybe just construct a suffix table.
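To illustrate the third point: with sorted input, consecutive words share prefixes, so a loader can keep the previous word's node path and only descend from the first differing character. A hypothetical sketch (in Java, with its own minimal node type, not the classes above):

import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical minimal node type for this sketch
class Node {
    boolean word;
    final Map<Character, Node> child = new HashMap<>();
}

class SortedLoader {
    final Node root = new Node();

    // Bulk-load a lexicographically sorted word list, reusing the previous
    // word's node path for the shared prefix instead of re-walking it
    void loadSorted(List<String> sorted) {
        Deque<Node> path = new ArrayDeque<>(); // path.peek() = node of prev word's last char
        String prev = "";
        for (String w : sorted) {
            int common = commonPrefixLength(prev, w);
            while (path.size() > common) path.pop();   // backtrack to the shared prefix
            Node node = path.isEmpty() ? root : path.peek();
            for (int i = common; i < w.length(); i++) {
                Node next = node.child.computeIfAbsent(w.charAt(i), k -> new Node());
                path.push(next);
                node = next;
            }
            node.word = true;
            prev = w;
        }
    }

    private static int commonPrefixLength(String a, String b) {
        int n = Math.min(a.length(), b.length()), i = 0;
        while (i < n && a.charAt(i) == b.charAt(i)) i++;
        return i;
    }
}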
