I have two sequences of items:
S1 = [ A B C D E F ]
S2 = [ 1 2 3 4 5 6 7 8 ]
And I can determine "similarity" for each pair of items (s1, s2) as a number (for example on scale 0 to 10).
I want to find a mapping between S1/S2 items, such that ordering of each sequence is preserved and sum of "similarity" values between mapped items is maximum. It is not required that all S1/S2 items are part of mapping.
Example:
[ A B C D E F ]
[ 1 2 3 4 5 6 7 8 ]
In the example above, mapping 'A on 3', 'D on 4' and 'F on 6' gives the overall maximum "similarity".
Are there any existing problems (/algorithms) this could be turned into?
Looks like the Smith–Waterman algorithm, which is traditionally used for determining similar regions between two strings of nucleic acid or protein sequences, should be perfect:
Smith–Waterman algorithm aligns two sequences by matches/mismatches (also known as substitutions), insertions, and deletions. Both insertions and deletions are the operations that introduce gaps, which are represented by dashes. The Smith–Waterman algorithm has several steps:
Determine the substitution matrix and the gap penalty scheme. A substitution matrix assigns each pair of items (s1, s2) a score for match or mismatch. Usually matches get positive scores, whereas mismatches get relatively lower scores. A gap penalty function determines the score cost for opening or extending gaps. It is suggested that users choose the appropriate scoring system based on the goals. In addition, it is also a good practice to try different combinations of substitution matrices and gap penalties.
Initialize the scoring matrix. The dimensions of the scoring matrix are 1+length of each sequence respectively. All the elements of the first row and the first column are set to 0. The extra first row and first column make it possible to align one sequence to another at any position, and setting them to 0 makes the terminal gap free from penalty.
Scoring. Score each element from left to right, top to bottom in the matrix, considering the outcomes of substitutions (diagonal scores) or adding gaps (horizontal and vertical scores). If none of the scores are positive, this element gets a 0. Otherwise the highest score is used and the source of that score is recorded.
Traceback. Starting at the element with the highest score, trace back based on the source of each score recursively, until 0 is encountered. The segments that have the highest similarity score based on the given scoring system are generated in this process. To obtain the second-best local alignment, apply the traceback process starting at the second highest score outside the trace of the best alignment.
Just choose the substitution matrix to match your similarity scores:
And I can determine "similarity" for each pair of items (s1, s2) as a number (for example on scale 0 to 10).
and set the gap and no-match penalties to zero:
I want to find a mapping between S1/S2 items, such that ordering of each sequence is preserved and sum of "similarity" values between mapped items is maximum. It is not required that all S1/S2 items are part of mapping.
More information can be found at: https://en.wikipedia.org/wiki/Smith%E2%80%93Waterman_algorithm#Scoring_matrix
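To make this concrete, here is a minimal sketch of the scoring step in Java (the language is an arbitrary choice here, and similarity() is only a placeholder for your own 0-10 scoring function), with the similarity values acting as the substitution matrix and a gap penalty of zero:

// Sketch of the Smith-Waterman scoring step with a custom substitution
// matrix (a similarity() callback, assumed to return 0..10) and zero gap penalty.
static int[][] score(char[] s1, char[] s2,
                     java.util.function.ToIntBiFunction<Character, Character> similarity) {
    int n = s1.length, m = s2.length;
    int[][] h = new int[n + 1][m + 1];            // extra first row/column stays 0
    for (int i = 1; i <= n; i++) {
        for (int j = 1; j <= m; j++) {
            int diag = h[i - 1][j - 1] + similarity.applyAsInt(s1[i - 1], s2[j - 1]);
            int up   = h[i - 1][j];               // vertical gap, penalty 0
            int left = h[i][j - 1];               // horizontal gap, penalty 0
            h[i][j]  = Math.max(0, Math.max(diag, Math.max(up, left)));
        }
    }
    return h;                                     // traceback starts at the largest entry
}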
The problem you described looks like a variation of the Longest Common Subsequence problem.
Use this recurrence relation instead of the original:
ans[i][j] = max(
    ans[i-1][j],
    ans[i][j-1],
    ans[i-1][j-1] + similarity(S1[i], S2[j])
)
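A hedged sketch of that recurrence in Java (the language and the names sim and bestMapping are arbitrary), with a traceback added to read off which items end up mapped:

// ans[i][j] = best total similarity using the first i items of S1 and the
// first j items of S2. sim is an n x m matrix assumed to hold
// sim[i][j] = similarity(S1[i], S2[j]) with 0-based indices.
static int bestMapping(int[][] sim, int n, int m) {
    int[][] ans = new int[n + 1][m + 1];
    for (int i = 1; i <= n; i++)
        for (int j = 1; j <= m; j++)
            ans[i][j] = Math.max(Math.max(ans[i - 1][j], ans[i][j - 1]),
                                 ans[i - 1][j - 1] + sim[i - 1][j - 1]);
    // Traceback from ans[n][m] to recover one optimal set of mapped pairs.
    int i = n, j = m;
    while (i > 0 && j > 0) {
        if (ans[i][j] == ans[i - 1][j]) i--;
        else if (ans[i][j] == ans[i][j - 1]) j--;
        else {
            System.out.println("S1[" + (i - 1) + "] <-> S2[" + (j - 1) + "]");
            i--; j--;
        }
    }
    return ans[n][m];
}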
In looking through the dynamic programming algorithm for computing the minimum edit distance between two strings, I am having a hard time grasping one thing. To me it seems like, given the two strings s and t, inserting a character into s would be the same as deleting a character from t. Why then do we need to consider these operations separately when computing the edit distance? I always have a hard time computing the indices in the recurrence relation because I can't intuitively understand this part.
I've read through Skiena and some other sources, but none of them explains this part well. This SO link explains the insert and delete operations better than elsewhere in terms of understanding which string is being inserted into or deleted from, but I still can't figure out why they aren't one and the same.
Edit: Ok, I didn't do a very good job of detailing the source of my confusion.
The way Skiena explains computing the minimum edit distance m(i,j) of the first i characters of a string s and the first j characters of a string t based on already having computed solutions to the subproblems is as follows. m(i,j) will be the minimum of the following 3 possibilities:
opt[MATCH] = m[i-1][j-1].cost + match(s[i],t[j]);
opt[INSERT] = m[i][j-1].cost + indel(t[j]);
opt[DELETE] = m[i-1][j].cost + indel(s[i]);
The way I understand it the 3 operations are all operations on the string s. An INSERT means you have to insert a character at the end of string s to get the minimum edit distance. A DELETE means you have to delete the character at the end of string s to get the minimum edit distance.
Given s = "SU" and t = "SATU" INSERT and DELETE would be as follows:
Insert:
SU_
SATU
Delete:
SU
SATU_
My confusion was that an INSERT into s is the same as a DELETION from t. I'm probably confused on something basic but it's not intuitive to me yet.
Edit 2: I think this link kind of clarifies my confusion but I'd love an explanation given my specific questions above.
They aren't the same thing any more than < and > are the same thing. There is of course a sort of duality and you are correct to point it out. a < b if and only if b > a so if you have a good algorithm to test for b > a then it makes sense to use it when you need to test if a < b.
It is much easier to directly test if s can be obtained from t by deletion rather than to directly test if t can be obtained from s by insertion. It would be silly to randomly insert letters to s and see if you get t. I can't imagine that any implementation of edit-distance actually does that. Still, it doesn't mean that you can't distinguish between insertion and deletion.
More abstractly. There is a relation, R on any set of strings defined by
s R t <=> t can be obtained from s by insertion
deletion is the inverse relation. Closely related, but not the same.
The problem of edit distance can be restated as a problem of converting the source string into target string with minimum number of operations (including insertion, deletion and replacement of a single character).
Thus, in the process of converting a source string into a target string, if inserting a character from target string or deleting a character from the source string or replacing a character in the source string with a character from the target string yields the same (minimum) edit distance, then, well, all the operations can be said to be equivalent. In other words, it does not matter how you arrive at the target string as long as you have done minimum number of edits.
This is realized by looking at how the cost matrix is calculated. Consider a simpler problem where source = AT (represented vertically) and target = TA (represented horizontally). The matrix is then constructed as (coming from west, northwest, north in that order):
  |  ε  |        T         |        A         |
ε |  0  |        1         |        2         |
A |  1  | min(2, 1, 2) = 1 | min(2, 1, 3) = 1 |
T |  2  | min(3, 1, 2) = 1 | min(2, 2, 2) = 2 |
The idea of filling this matrix is:
If we moved east, we insert the current target string character.
If we moved south, we delete the current source string character.
If we moved southeast, we replace the current source character with current target character.
If all or any two of these impart the same cost in terms of editing, then they can be said to be equivalent and you can break the ties arbitrarily.
One of the first places this comes up is when we compute c(2, 2) in the cost matrix. (The first row c(0, 0) through c(0, 2) -- the minimum costs of converting an empty string to "", "T", "TA" respectively -- and the first column c(0, 0) through c(2, 0) -- the costs of converting "", "A", "AT" to an empty string -- are clear.)
The value of c(2, 2) can be realized by:
inserting the current character in target, 'A' (we move east from c(2, 1)) -- cost is 1 + 1 = 2, or
replacing the current character in source, 'T', with the current character in target, 'A' (we move southeast from c(1, 1)) -- cost is 1 + 1 = 2, or
deleting the current character in source, 'T' (we move south from c(1, 2)) -- cost is 1 + 1 = 2.
Since all values are the same, which one are you going to choose?
If you choose to move from west, your alignment could be:
A T -
- T A
(one deletion, one 0-cost replacement, one insertion)
If you choose to move from north, your alignment could be:
- A T
T A -
(one insertion, one 0-cost replacement, one deletion)
If you choose to move from northwest, your alignment could be:
A T
T A
(Two 1-cost replacements).
All these edit graphs are equivalent in terms of the resulting edit distance (under the given cost function).
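For concreteness, a minimal Java sketch (unit costs and illustrative names assumed) that fills such a cost matrix using exactly the west/northwest/north choices described above; costMatrix("AT", "TA") reproduces the table:

// c[i][j] = minimum cost of converting the first i characters of source
// into the first j characters of target, with unit costs.
static int[][] costMatrix(String source, String target) {
    int n = source.length(), m = target.length();
    int[][] c = new int[n + 1][m + 1];
    for (int i = 0; i <= n; i++) c[i][0] = i;     // delete everything
    for (int j = 0; j <= m; j++) c[0][j] = j;     // insert everything
    for (int i = 1; i <= n; i++) {
        for (int j = 1; j <= m; j++) {
            int fromWest      = c[i][j - 1] + 1;  // move east: insert target character
            int fromNorth     = c[i - 1][j] + 1;  // move south: delete source character
            int fromNorthwest = c[i - 1][j - 1]
                + (source.charAt(i - 1) == target.charAt(j - 1) ? 0 : 1); // replace
            c[i][j] = Math.min(fromWest, Math.min(fromNorthwest, fromNorth));
        }
    }
    return c;                                     // costMatrix("AT", "TA")[2][2] == 2
}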
Edit distance is only interested in the minimum number of operations required to transform one sequence into another; it is not interested in the uniqueness of the transformation. In practice, there are often multiple ways to transform one string into another, that all have the minimum number of operations.
According to wikipedia, the definition of the recursive formula which calculates the Levenshtein distance between two strings a and b is the following:
lev(i, j) = max(i, j)                                          if min(i, j) = 0, otherwise
lev(i, j) = min( lev(i-1, j) + 1,                              (deletion from a)
                 lev(i, j-1) + 1,                              (insertion into a)
                 lev(i-1, j-1) + (1 if a[i] != b[j] else 0) )  (substitution)
I don't understand why we don't take into consideration the cases in which we delete a[j], or we insert b[i]. Also, correct me if I am wrong, isn't the case of insertion the same as the case of the deletion? I mean, instead of deleting a character from one string, we could insert the same character in the second string, and the opposite. So why not merge the insert/delete operations into one operation with cost equal to min{cost_insert, cost_delete}?
This is not done, because you are not allowed to edit both strings. The definition of the edit distance (from wikipedia) is:
the minimum-weight series of edit operations that transforms a into b.
So you are specifically looking for (the weight of) a sequence of operations to execute on the string a to transform it into string b.
Also, the edit distance is not necessarily symmetric. If your costs for insertions and deletions are identical, the distance is symmetric: d(a,b) = d(b,a).
Consider the wikipedia example but with different costs:
costs for insertions: w_ins = 1
costs for deletions: w_del = 2
costs for substitutions: w_sub = 1
The distance of kitten and sitting still is 3,
kitten -> sitten (substitution k->s, cost 1)
sitten -> sittin (substitution e->i, cost 1)
sittin -> sitting (insertion of g, cost 1)
=> d(kitten, sitting) = 3
but the distance of sitting and kitten is not:
sitting -> kitting (substitution s->k, cost 1)
kitting -> kitteng (substitution i->e, cost 1)
kitteng -> kitten (deletion of g, cost 2)
=> d(sitting, kitten) = 4
You see that d(kitten, sitting) != d(sitting, kitten).
On the other hand, if you do use symmetric costs, as the Levenshtein distance (which is an edit distance with unit costs) does, you can assume that d(a,b) = d(b,a) holds. Then you do not gain anything by also considering the inverse cases. What you lose is the information about which character has been replaced in which string, which makes it harder to extract the sequence of operations afterwards.
The Wagner-Fischer algorithm which you are showing in your question can extract this from the DP matrix by backtracking along the path with minimal costs. Consider these two edit matrices between to and foo with unit costs:
   t  o          f  o  o
f  1  2       t  1  2  3
o  2  1       o  2  1  2
o  3  2
Note that if you transpose the matrix for d(to, foo) you get the matrix for d(foo, to). Note that by this, an insertion in the first matrix becomes a deletion in the second matrix and vice versa. So this is where this symmetry you are looking for is coming up again.
I hope this helps :)
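To check the asymmetry numerically, here is a small Java sketch using exactly those costs (w_ins = 1, w_del = 2, w_sub = 1); dist("kitten", "sitting") returns 3 while dist("sitting", "kitten") returns 4:

// Weighted edit distance: transform a into b with insertion cost 1,
// deletion cost 2, substitution cost 1.
static int dist(String a, String b) {
    final int W_INS = 1, W_DEL = 2, W_SUB = 1;
    int n = a.length(), m = b.length();
    int[][] d = new int[n + 1][m + 1];
    for (int i = 1; i <= n; i++) d[i][0] = i * W_DEL;   // delete all of a
    for (int j = 1; j <= m; j++) d[0][j] = j * W_INS;   // insert all of b
    for (int i = 1; i <= n; i++)
        for (int j = 1; j <= m; j++)
            d[i][j] = Math.min(
                d[i - 1][j - 1] + (a.charAt(i - 1) == b.charAt(j - 1) ? 0 : W_SUB),
                Math.min(d[i - 1][j] + W_DEL,           // delete a[i]
                         d[i][j - 1] + W_INS));         // insert b[j]
    return d[n][m];
}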
If the costs of insertions and deletions differ, inserting in one string isn't the same as deleting from the other. Even the cost of a substitution could differ from the cost of insertion+deletion and it is wise to keep them separate.
The problem is usually asymmetric: you have a list of valid strings, and you want to match to another that has errors in it.
I edited my question to try to make it as short and precise as possible.
I am developing a prototype of a facial recognition system for my graduation project. I use Eigenfaces and my main source is the paper by Turk and Pentland, available here: http://www.face-rec.org/algorithms/PCA/jcn.pdf.
My doubts focus on steps 4 and 5.
I cannot tell how many thresholds there are: whether there are two types of thresholds or only one (notice that the text speaks of two types but uses the same symbol). My question is also whether this threshold (or these thresholds) is a single global value for every person, or whether each person has their own.
I follow the steps up to the computation of a matrix O() of class weight vectors. This matrix O() has dimension M' x P, where M' is the number of eigenfaces chosen and P is the number of people.
What follows is what confuses me. The paper speaks of two distances: the distance from one class to another, and the distance from one face to another. I call them D1 and D2 respectively. NOTE: the training set contains M images in total, with F = M / P images per person.
I understand that the threshold(s) should be chosen empirically, but there must be a way to approximate them. I was initially designing a matrix of distances D1() of dimension P x P, where the row vector D1(i) holds the distances from the class average vector O(i) to each O(j), j = 1..P, i.e. an "all vs all" comparison.
This is where I got stuck, and what follows depends on whether I should choose a single global threshold for everyone, or one value per individual. I am also not sure whether there are two types: one for class distances and one for face distances.
I have a theory of how to proceed, though it is not fully supported by Turk's concepts:
Pre-test stage:
Generate two distance matrices, D1 and D2:
D1 stores the distances between classes and D2 the distances between faces, based on the matrices W and A respectively.
Then, since the training set contains P people, take the F column vectors of D1 for each person and estimate a threshold T1 in the range [Min, Max]. This gives one T1(i) for each i = 1..P.
Separately, derive a T2 from the range [Min, Max] over the whole matrix D2. This decides whether an image is a face or not.
Test stage:
Build a test set of images with one image for each known person:
Itest = {Itest(1) ... Itest(P)}
For every test image Itest(i):
Compute the difference from the mean face: Atest = Itest - Imean
Compute the weight vector: Otest = U^T * Atest
Compute the distances:
dist1(j) = distance(Otest, O(j)), j = 1..P
Af = project(Otest, U)
dist2 = distance(Atest, Af)
Evaluate recognition:
MinDist = Min(dist1)
For each j = 1..P:
If dist2 > T2 then "is not a face", else:
If MinDist <= T1(j) then "subject identified as j", else "subject unidentified"
Then I take the false acceptance and false rejection rates into account and repeat the test process with different threshold values until I find the ones that work best for each person.
Once the thresholds are defined, the system can be put into operation on unknown images. The algorithm is similar to the test stage.
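Roughly, the test stage I have in mind would look like the following sketch (Java, purely hypothetical names: U is the N x M' matrix whose columns are the eigenfaces, imean the mean face, omega the P class average weight vectors O(j), t1 the per-person thresholds T1(i), and t2 the face threshold T2 from the pre-test stage):

// Rough sketch of the test stage; all names are illustrative.
static String classify(double[] itest, double[][] U, double[] imean,
                       double[][] omega, double[] t1, double t2) {
    int N = itest.length, Mp = U[0].length, P = omega.length;
    double[] atest = new double[N];                 // Atest = Itest - Imean
    for (int k = 0; k < N; k++) atest[k] = itest[k] - imean[k];
    double[] otest = new double[Mp];                // Otest = U^T * Atest
    for (int m = 0; m < Mp; m++)
        for (int k = 0; k < N; k++) otest[m] += U[k][m] * atest[k];
    double[] af = new double[N];                    // Af = U * Otest (back-projection)
    for (int k = 0; k < N; k++)
        for (int m = 0; m < Mp; m++) af[k] += U[k][m] * otest[m];
    double dist2 = distance(atest, af);             // distance from face space
    if (dist2 > t2) return "not a face";
    int best = 0;                                   // nearest class
    double minDist = Double.MAX_VALUE;
    for (int j = 0; j < P; j++) {
        double d = distance(otest, omega[j]);       // dist1(j)
        if (d < minDist) { minDist = d; best = j; }
    }
    return (minDist <= t1[best]) ? "subject identified as " + best : "subject unidentified";
}

static double distance(double[] a, double[] b) {    // Euclidean distance
    double s = 0;
    for (int k = 0; k < a.length; k++) s += (a[k] - b[k]) * (a[k] - b[k]);
    return Math.sqrt(s);
}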
I know I am going off the "script" of the official paper, but this reasoning is the most logical one I could come up with. I would appreciate some guidance on whether it makes sense.
EDIT:
I have nothing more to add that has not already been said and that might help clarify things.
Could anyone tell me whether my "theory" is a reasonable way to tackle this? I am moving forward with my project, and if this is not the right way, I would appreciate some guidance before I go down the wrong path.
This is going to be a long post and just for fun, so if you don't have much time better go help folks with more important questions instead :)
There is a game called "Tower Bloxx" recently released on Xbox. One part of the game is to place different colored towers on a field in the most optimal way in order to maximize the number of the most valuable towers. I wrote an algorithm that determines the most efficient tower placement, but it is not very efficient: it pretty much just brute-forces all possible combinations. For a 4x4 field with 4 tower types it solves the problem in about 1 hour; 5 tower types would take about 40 hours, which is too much.
Here are the rules:
There are 5 types of towers that can be placed on a field. There are several types of fields; the easiest one is just a 4x4 matrix, other fields have some "blanks" where you can't build. Your aim is to put as many of the most valuable towers on the field as possible, maximizing the total tower value on the field (let's assume that all towers are built at once; there are no turns).
Tower types (in order from least to most valuable):
Blue - can be placed anywhere, value = 10
Red - can be placed only beside blue, value = 20
Green - can be placed only beside red and blue, value = 30
Yellow - can be placed only beside green, red and blue, value = 40
White - can be placed only beside yellow, green, red and blue, value = 100
Which means that, for example, a green tower must have at least 1 red and 1 blue tower among its north, south, west or east neighbor cells (diagonals don't count). A white tower must be surrounded by all of the other colors.
Here is my algorithm for 4 tower types on a 4x4 field:
Total number of combinations = 4^16
Loop through [0..4^16 - 1] and convert every number to a base-4 string in order to encode a tower placement, so 4^16 - 1 = "3333 3333 3333 3333", which would represent our tower types (0=blue,...,3=yellow)
Convert the tower placement string into a matrix.
For every tower in the matrix, check its neighbors; if any requirement fails, the whole combination fails.
Put all correct combinations into an array, then sort this array as strings in lexicographic order to find the best possible combination (the characters within each string need to be sorted first).
The only optimization I came up with is to skip combinations that don't contain any of the most valuable towers. It skips some processing but I still loop through all 4^16 combinations.
Any thoughts on how this can be improved? Code samples would be helpful, in Java or PHP.
-------Update--------
After adding more illegal states (yellow cannot be built in the corners, white cannot be built in corners or on the edges, the field should contain at least one tower of each type), realizing that only 1 white tower can possibly be built on a 4x4 field, and optimizing the Java code, the total time was brought down from 40 to ~16 hours. Maybe threading would bring it down to 10 hours, but that's probably the limit of brute forcing.
I found this question intriguing, and since I'm teaching myself Haskell, I decided to try my hand at implementing a solution in that language.
I thought about branch-and-bound, but couldn't come up with a good way to bound the solutions, so I just did some pruning by discarding boards that violate the rules.
My algorithm works by starting with an "empty" board. It places each possible color of tower in the first empty slot and in each case (each color) then recursively calls itself. The recursed calls try each color in the second slot, recursing again, until the board is full.
As each tower is placed, I check the just-placed tower and all of its neighbors to verify that they're obeying the rules, treating any empty neighbors as wild cards. So if a white tower has four empty neighbors, I consider it valid. If a placement is invalid, I do not recurse on that placement, effectively pruning the entire tree of possibilities under it.
The way the code is written, I generate a list of all possible solutions, then look through the list to find the best one. In actuality, thanks to Haskell's lazy evaluation, the list elements are generated as the search function needs them, and since they're never referred to again they become available for garbage collection right away, so even for a 5x5 board memory usage is fairly small (2 MB).
Performance is pretty good. On my 2.1 GHz laptop, the compiled version of the program solves the 4x4 case in ~50 seconds, using one core. I'm running a 5x5 example right now to see how long it will take. Since functional code is quite easy to parallelize, I'm also going to experiment with parallel processing. There's a parallelized Haskell compiler that will not only spread the work across multiple cores, but across multiple machines as well, and this is a very parallelizable problem.
Here's my code so far. I realize that you specified Java or PHP, and Haskell is quite different. If you want to play with it, you can modify the definition of the variable "bnd" just above the bottom to set the board size. Just set it to ((1,1),(x, y)), where x and y are the number of columns and rows, respectively.
import Array
import Data.List
-- Enumeration of Tower types. "Empty" isn't really a tower color,
-- but it allows boards to have empty cells
data Tower = Empty | Blue | Red | Green | Yellow | White
deriving(Eq, Ord, Enum, Show)
type Location = (Int, Int)
type Board = Array Location Tower
-- towerScore computes the score of a single tower
towerScore :: Tower -> Int
towerScore White = 100
towerScore t = (fromEnum t) * 10
-- towerUpper computes the upper bound for a single tower
towerUpper :: Tower -> Int
towerUpper Empty = 100
towerUpper t = towerScore t
-- boardScore computes the score of a board
boardScore :: Board -> Int
boardScore b = sum [ towerScore (b!loc) | loc <- range (bounds b) ]
-- boardUpper computes the upper bound of the score of a board
boardUpper :: Board -> Int
boardUpper b = sum [ bestScore loc | loc <- range (bounds b) ]
where
bestScore l | tower == Empty =
towerScore (head [ t | t <- colors, canPlace b l t ])
| otherwise = towerScore tower
where
tower = b!l
colors = reverse (enumFromTo Empty White)
-- Compute the neighbor locations of the specified location
neighborLoc :: ((Int,Int),(Int,Int)) -> (Int,Int) -> [(Int,Int)]
neighborLoc bounds (col, row) = filter valid neighborLoc'
where
valid loc = inRange bounds loc
neighborLoc' = [(col-1,row),(col+1,row),(col,row-1),(col,row+1)]
-- Array to store all of the neighbors of each location, so we don't
-- have to recalculate them repeatedly.
neighborArr = array bnd [(loc, neighborLoc bnd loc) | loc <- range bnd]
-- Get the contents of neighboring cells
neighborTowers :: Board -> Location -> [Tower]
neighborTowers board loc = [ board!l | l <- (neighborArr!loc) ]
-- The tower placement rule. Yields a list of tower colors that must
-- be adjacent to a tower of the specified color.
requiredTowers :: Tower -> [Tower]
requiredTowers Empty = []
requiredTowers Blue = []
requiredTowers Red = [Blue]
requiredTowers Green = [Red, Blue]
requiredTowers Yellow = [Green, Red, Blue]
requiredTowers White = [Yellow, Green, Red, Blue]
-- cellValid determines if a cell satisfies the rule.
cellValid :: Board -> Location -> Bool
cellValid board loc = null required ||
null needed ||
(length needed <= length empties)
where
neighbors = neighborTowers board loc
required = requiredTowers (board!loc)
needed = required \\ neighbors
empties = filter (==Empty) neighbors
-- canPlace determines if 'tower' can be placed in 'cell' without
-- violating the rule.
canPlace :: Board -> Location -> Tower -> Bool
canPlace board loc tower =
let b' = board // [(loc,tower)]
in cellValid b' loc && and [ cellValid b' l | l <- neighborArr!loc ]
-- Generate a board full of empty cells
cleanBoard :: Array Location Tower
cleanBoard = listArray bnd (replicate 80 Empty)
-- The heart of the algorithm, this function takes a partial board
-- (and a list of empty locations, just to avoid having to search for
-- them) and a score and returns the best board obtainable by filling
-- in the partial board
solutions :: Board -> [Location] -> Int -> Board
solutions b empties best | null empties = b
solutions b empties best =
fst (foldl' f (cleanBoard, best) [ b // [(l,t)] | t <- colors, canPlace b l t ])
where
f :: (Board, Int) -> Board -> (Board, Int)
f (b1, best) b2 | boardUpper b2 <= best = (b1, best)
| otherwise = if newScore > lstScore
then (new, max newScore best)
else (b1, best)
where
lstScore = boardScore b1
new = solutions b2 e' best
newScore = boardScore new
l = head empties
e' = tail empties
colors = reverse (enumFromTo Blue White)
-- showBoard converts a board to a printable string representation
showBoard :: Board -> String
showBoard board = unlines [ printRow row | row <- [minrow..maxrow] ]
where
((mincol, minrow), (maxcol, maxrow)) = bounds board
printRow row = unwords [ printCell col row | col <- [mincol..maxcol] ]
printCell col row = take 1 (show (board!(col,row)))
-- Set 'bnd' to the size of the desired board.
bnd = ((1,1),(4,4))
-- Main function generates the solutions, finds the best and prints
-- it out, along with its score
main = do putStrLn (showBoard best); putStrLn (show (boardScore best))
where
s = solutions cleanBoard (range (bounds cleanBoard)) 0
best = s
Also, please remember this is my first non-trivial Haskell program. I'm sure it can be done much more elegantly and succinctly.
Update: Since it was still very time-consuming to do a 5x5 with 5 colors (I waited 12 hours and it hadn't finished), I took another look at how to use bounding to prune more of the search tree.
My first approach was to estimate the upper bound of a partially-filled board by assuming every empty cell is filled with a white tower. I then modified the 'solutions' function to track the best score seen and to ignore any board whose upper bound is less than that best score.
That helped some, reducing a 4x4x5 board from 23s to 15s. To improve it further, I modified the upper bound function to assume that each Empty is filled with the best tower possible, consistent with the existing non-empty cell contents. That helped a great deal, reducing the 4x4x5 time to 2s.
Running it on 5x5x5 took 2600s, giving the following board:
G B G R B
R B W Y G
Y G R B R
B W Y G Y
G R B R B
with a score of 730.
I may make another modification and have it find all of the maximal-scoring boards, rather than just one.
If you don't want to do A*, use a branch and bound approach. The problem should be relatively easy to code up because your value functions are well defined. I imagine you should be able to prune off huge sections of the search space with relative ease. However because your search space is pretty large it may still take some time. Only one way to find out :)
The wiki article isn't the best in the world. Google can find you a ton of nice examples and trees and stuff to further illustrate the approach.
One easy way to improve the brute force method is to explore only legal states. For example, if you are trying all possible states, you will be testing many states where the top right corner is a white tower. All of these states will be illegal. It doesn't make sense to generate and test all of those states. So you want to generate your states one block at a time, and only go deeper into the tree when you are actually at a potentially valid state. This will cut down your search tree by many orders of magnitude.
There may be further fancy things you can do, but this is an easy to understand (hopefully) improvement to your current solution.
I think you will want to use a branch-and-bound algorithm, because I think coming up with a good heuristic for an A* implementation will be hard (but that's just my intuition).
The pseudo-code for a branch-and-bound implementation is:
board = initial board with nothing on it, probably a 2D array
bestBoard = {}
function findBest(board)
    if no more pieces can be added to board then
        if score(board) > score(bestBoard) then
            bestBoard = board
        return
    else
        for each piece P we can legally add to board
            newBoard = board with piece P added
            // loose upper bound, could be improved
            if score(newBoard) + 100 * number of blanks in newBoard > score(bestBoard)
                findBest(newBoard)
The idea is that we search all possible boards, in order, but we keep track of the best one we have found so far (this is the bound). Then, if we find a partial board that we know can never be better than the best one so far, we stop working on that partial board: we trim that branch of the search tree.
In the code above I am doing the check by assuming that all the blanks would be filled by the white pieces, as we can't do better than that. I am sure that with a little bit of thought you can come up with a tighter bound than that.
Another place where you can try to optimize is the order of the for-each loop. You want to try pieces in the correct order: ideally, you want the first solution found to be the best one, or at least one with a really high score.
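Since you asked for Java, here is a rough sketch of the pseudo-code above as a runnable program; the board encoding, the wild-card treatment of empty neighbors and all names are just illustrative assumptions, and the bound is the same loose "fill every blank with a white tower" estimate:

import java.util.*;

// Colors: 0 = empty, 1 = blue (10), 2 = red (20), 3 = green (30),
// 4 = yellow (40), 5 = white (100). A color c >= 2 requires every color
// 1..c-1 among its neighbors; empty neighbors count as wild cards while
// the board is still being filled.
public class Towers {
    static final int SIZE = 4;
    static final int[] VALUE = {0, 10, 20, 30, 40, 100};
    static final int[][] DIRS = {{-1, 0}, {1, 0}, {0, -1}, {0, 1}};
    static int[][] board = new int[SIZE][SIZE];
    static int[][] bestBoard = new int[SIZE][SIZE];
    static int bestScore = -1;

    // A cell is OK if every color it still needs is on a filled neighbor
    // or could still be supplied by one of its empty neighbors.
    static boolean cellOk(int r, int c) {
        int color = board[r][c];
        if (color <= 1) return true;                       // empty or blue
        Set<Integer> needed = new HashSet<>();
        for (int k = 1; k < color; k++) needed.add(k);
        int empties = 0;
        for (int[] d : DIRS) {
            int nr = r + d[0], nc = c + d[1];
            if (nr < 0 || nr >= SIZE || nc < 0 || nc >= SIZE) continue;
            if (board[nr][nc] == 0) empties++;
            else needed.remove(board[nr][nc]);
        }
        return needed.size() <= empties;
    }

    static boolean placementOk(int r, int c) {             // check the cell and its neighbors
        if (!cellOk(r, c)) return false;
        for (int[] d : DIRS) {
            int nr = r + d[0], nc = c + d[1];
            if (nr >= 0 && nr < SIZE && nc >= 0 && nc < SIZE && !cellOk(nr, nc)) return false;
        }
        return true;
    }

    static int score() {
        int s = 0;
        for (int[] row : board) for (int v : row) s += VALUE[v];
        return s;
    }

    static void search(int cell) {
        if (cell == SIZE * SIZE) {                          // board is full and valid
            if (score() > bestScore) {
                bestScore = score();
                for (int r = 0; r < SIZE; r++) bestBoard[r] = board[r].clone();
            }
            return;
        }
        int r = cell / SIZE, c = cell % SIZE, blanksLeft = SIZE * SIZE - cell - 1;
        for (int color = 5; color >= 1; color--) {          // most valuable colors first
            board[r][c] = color;
            // loose bound: assume every remaining blank becomes a white tower
            if (placementOk(r, c) && score() + 100 * blanksLeft > bestScore) search(cell + 1);
        }
        board[r][c] = 0;                                    // undo before backtracking
    }

    public static void main(String[] args) {
        search(0);
        System.out.println("best score: " + bestScore);
        for (int[] row : bestBoard) System.out.println(Arrays.toString(row));
    }
}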
It seems like a good approach would be to start with a white tower and then build a set of towers around it based on the requirements, trying to find the smallest possible colored set of shapes which can act as interlocking tiles.
I wanted to advocate linear programming with integer unknowns, but it turns out that it's NP-hard even in the binary case. However, you can still get great success at optimizing a problem like yours, where there are many valid solutions and you simply want the best one.
Linear programming, for this kind of problem, essentially amounts to having a lot of variables (for example, the number of red towers present in cell (M, N)) and relationships among the variables (for example, the number of white towers in cell (M, N) must be less than or equal to the number of towers of the non-white color that has the smallest such number, among all its neighbors). It's kind of a pain to write up a linear program, but if you want a solution that runs in seconds, it's probably your best bet.
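For example, a 0-1 formulation along these lines might look roughly like the following (a sketch, not a complete model; x[r][c][k] = 1 if a tower of color k stands in cell (r, c), N(r, c) is the set of its four orthogonal neighbors, and value(k) is 10/20/30/40/100):

maximize    sum over r, c, k of  value(k) * x[r][c][k]
subject to  sum over k of  x[r][c][k] <= 1                                  (at most one tower per cell)
            x[r][c][k] <= sum over (r', c') in N(r, c) of  x[r'][c'][j]     (for every color j that color k requires)
            x[r][c][k] in {0, 1}

Handing this to an off-the-shelf integer programming (or constraint) solver should give the optimum quickly for boards of this size, even though integer programming is NP-hard in general.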
You've received a lot of good advice on the algorithmic side of things, so I don't have a lot to add. But, assuming Java as the language, here are a few fairly obvious suggestions for performance improvement.
Make sure you're not instantiating any objects inside that 4^16 loop. It's much, much cheaper for the JVM to re-initialize an existing object than to create a new one. Even cheaper to use arrays of primitives. :)
If you can help it, step away from the collection classes. They'll add a lot of overhead that you probably don't need.
Make sure you're not concatenating any strings. Use StringBuilder (there is a small illustrative sketch at the end of this answer).
And lastly, consider re-writing the whole thing in C.
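The StringBuilder sketch mentioned above (illustrative only; the method name toBase4 is made up), building the base-4 placement string without intermediate String objects:

// Build the base-4 placement string for one combination without creating
// intermediate String objects.
static String toBase4(long combination, int cells) {
    StringBuilder sb = new StringBuilder(cells);
    for (int i = 0; i < cells; i++) {
        sb.append((char) ('0' + (int) (combination & 3)));  // one base-4 digit
        combination >>= 2;
    }
    return sb.reverse().toString();                          // most significant digit first
}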