Arrange words to get max height - algorithm

In my app I have a frame in which the user can form a "sentence" from random source words. I need to calculate the words' font size so that they fit in this frame (in any order, using any number of the words, possibly all of them).
Words are placed one by one starting from the top-left corner. If the next word would exceed the frame width, it is moved to a new line. The total height of the lines must therefore not exceed the frame's height. To simplify the algorithm, let's ignore padding (between words and between lines). All words have the same height and font.
I can't find anything better than NP-completeness... Any ideas?
UPDATE:
The question is: "How can I calculate the maximum height of the words?" The problem is that this height depends on the order of the words.
UPDATE2:
Exact question, as phrased by Teepeemm: "Which arrangement of the words needs the most lines to display?"
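To make the problem concrete, here is a minimal Python sketch of the wrapping rule described above, plus a brute-force search over orderings. It assumes each word is given as a width in the same units as the frame width and that no single word is wider than the frame; `lines_needed` and `max_lines` are hypothetical names. Brute force is only practical for a handful of words, which is exactly why the question asks for something better.

```python
from itertools import permutations


def lines_needed(widths, frame_width):
    """Lines used when words are placed left to right and a word that
    would overflow the current line starts a new one."""
    lines = 0
    used = 0  # width already taken on the current line
    for w in widths:
        if w > frame_width:
            raise ValueError("a single word is wider than the frame")
        if lines == 0 or used + w > frame_width:
            lines += 1  # start a new line with this word
            used = w
        else:
            used += w
    return lines


def max_lines(widths, frame_width):
    """Brute force over all orderings, O(n!): only feasible for small n.

    Appending extra words at the end never removes lines, so it is enough
    to try orderings of *all* the words; a subset cannot need more lines.
    """
    return max((lines_needed(p, frame_width) for p in permutations(widths)),
               default=0)
```

For example, `max_lines([1, 5, 1], 5)` is 3 (the order 1, 5, 1 puts every word on its own line), while the order 1, 1, 5 needs only 2. In the real problem the widths themselves scale with the font size, so the count has to be recomputed for each candidate size.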

Related

Is there any known alg to fill certain area with rectangles of different size with most effective way?

By "effective" I mean "leaving more usable free area for other rectangles".
I'm translating a game and need to rebuild its font. The font area is limited to a certain size, and I need to fill that area efficiently with letters and letter combinations.
My current loop increases X by 1 until the new letter can be placed at those coordinates. If X hits the font's right border, the loop sets X = 0 and increments Y.
This leaves blank areas that could still be used (marked in red):
How can I use the free space more effectively?
Like this (at least):
I'm assuming the number of words in a row isn't very large (you said it's for a game). Create a matrix, say wordInfo[numberOfRowsInDisplay][maximumNumberOfColumnsForAnyRowInDisplay]; you can compute both dimensions with simple arithmetic from how many characters the message contains (with their widths) and how many spaces it contains (with their widths). The matrix is filled line by line and column by column as you go through the series of words/characters.
Each entry wordInfo[row][column] then represents the width and height of that word/character. When populating a row, see where the current word/character ends (as a pixel position, for example), check the previous row for the greatest height in that horizontal region, including partial overlaps, and set the height (vertical offset) of this word/character in wordInfo accordingly; that in turn helps when placing the next row. (It is more efficient to keep only 2 rows in the matrix.)
Note: if the number of rows is large and there are many characters of different heights, the outcome might be completely different in some cases.
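A minimal Python sketch of this idea, assuming glyphs are given as (width, height) pairs in pixels and are placed in input order. Instead of the full wordInfo matrix it keeps one per-pixel-column "skyline" array, which plays the role of the "only 2 rows" suggestion; `place_glyphs` is a hypothetical name. This is a simple skyline-style placement, not an optimal packing.

```python
def place_glyphs(glyphs, atlas_width):
    """Place (width, height) glyphs row by row, left to right.

    skyline[c] holds the bottom edge of the last glyph placed over column c.
    Because rows are filled left to right, the columns a new glyph covers
    still describe the previous row, so y = max(skyline[x:x + w]) is the
    greatest height of the previous row over this region.
    Returns the top-left (x, y) of each glyph and the total height used.
    """
    skyline = [0] * atlas_width
    positions = []
    x = 0
    for w, h in glyphs:
        if not 0 < w <= atlas_width:
            raise ValueError("glyph width must be between 1 and the atlas width")
        if x + w > atlas_width:  # does not fit on this row: wrap to the next
            x = 0
        y = max(skyline[x:x + w])  # rest on the tallest glyph below this span
        positions.append((x, y))
        for c in range(x, x + w):
            skyline[c] = y + h
        x += w
    return positions, max(skyline)
```

For example, with an atlas 4 pixels wide and glyphs (2, 3), (2, 1), (2, 1), (2, 1), the fourth glyph lands at y = 1 in the gap next to the tall first glyph, instead of at y = 3 where a fixed row height would put it.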

How to solve crossword (NP-Hard)?

I am currently doing an assignment and I'm stuck with the approach.
I have a crossword problem that consists of an empty grid (no solid squares, unlike a conventional crossword), with width and height each varying between 4 and 400 (inclusive).
Rules:
The words are part of the input: a list of 10 to 1000 (inclusive) English words of varying lengths.
A horizontal word can only intersect a vertical word.
A vertical word can only intersect a horizontal word.
A word can only intersect 1 or 2 other words.
Each letter is worth one point.
Each word must have a gap of one grid space around it, unless that space is part of an intersecting word.
Example:
X X X X X X
X B O S S X
X X X X X X
Goal:
Get the maximum possible score within a 5 minute time limit.
So far:
After some research I am aware that this is an NP-hard problem, so the optimal solution cannot practically be computed, because not every combination can be examined.
The easiest solution would seem to be to sort the words by length and insert the highest-scoring words first to maximize the score (a greedy algorithm).
I've also been told that a recursive tree, whose nodes are alternative equally scoring word insertions, and the knapsack algorithm apply to this problem (I'm not sure what the implementation would look like).
Questions:
What approach lets me check the maximum number of combinations within a 5-minute time span, and scales to the largest possible word list and grid size?
What heuristics might I apply when inserting words?
Btw the goal here is to get the best possible solution in 5 minutes.
To clarify each letter of a valid word is worth 1 point, thus a 5 letter word is worth 5 points.
Thanks in advance. I have been reading the mathematical notation in crossword research papers all day, and it seems to have led me in circles.
I'd start with a word that has the following characteristics:
It should have the maximum possible number of intersections.
Its length should be the one with the fewest words of that length in the list.
That is, its length should be the least frequent, and it should have the most intersections.
The reason for this selection is that it minimizes the choices left for the remaining words. E.g., a word of length 9 with 2 further intersections is selected, and the intersecting words have lengths 6 and 5 (say). Now the candidates are narrowed down to words of length 6 whose 3rd character is 'a' and words of length 5 whose 2nd character is 's' (say 'a' and 's' are the intersecting letters).
If there are many places with the same configuration, run this selection procedure one or two steps deeper to decide better which part (word) of the grid to fill first.
Now try each word in this first selected position (since it has the minimum frequency, it should be a good place to start) and then go deeper into the crossword to fill it. Whichever word yields the most points before a dead end is reached should be your solution. When you reach a dead end, start over with a new word.
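The two pruning ideas in this answer (start from the least frequent length, and only try words that match letters already fixed by crossing words) can be sketched in Python as follows. The function names are hypothetical, the tie-break toward longer words is an assumption, and the grid geometry (finding slots and counting intersections) is left out.

```python
from collections import Counter


def least_frequent_length(words):
    """Word length with the fewest words in the list.

    Ties go to the longer length (an assumption: longer words score more).
    """
    counts = Counter(len(w) for w in words)
    return min(counts, key=lambda n: (counts[n], -n))


def candidates(words, length, fixed):
    """Words of `length` whose letters agree with `fixed`, a dict mapping a
    0-based position to the letter already placed there by a crossing word."""
    return [w for w in words
            if len(w) == length and all(w[i] == c for i, c in fixed.items())]
```

With the example from the answer, `candidates(words, 6, {2: "a"})` keeps only the length-6 words whose 3rd letter is 'a', and `candidates(words, 5, {1: "s"})` only the length-5 words whose 2nd letter is 's'.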
This seems like a really interesting problem in discrete optimization. You're certainly right; with this number of words and possible placements, there is no way you could explore more than a tiny fraction of the space.
Also given the 5 minute time limit (quite short), I think you're going to have a really hard time with any solid heuristic. I think your best bet might be some sort of random permutation / simulated annealing algorithm.
If I was doing this, I would first calculate clusters of words, completely ignoring the crossword structure itself. Take one word, find a second word that intersects it. Then find another word that can fit onto this structure (obeying the max of 2 intersections per word), and so on. You should end up with many of these clusters, which you can rank by density (points / area used). I think you should be able to do this relatively quickly.
Then for the random permutation / simulated annealing part, for my moves I would place either a cluster or unused word onto the crossword itself, or move an existing cluster / word. Just save the current highest-scoring configuration as you go, and return this after the 5 minutes.
If the 5 min is too short to find anything meaningful using random permutations, another approach might be to use a constraint propagation idea working with those clusters.
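A generic, time-bounded simulated annealing loop in Python, as one possible skeleton for the random permutation / simulated annealing approach described above. The crossword-specific parts (the state representation, the move that places or moves a cluster or word, and the scoring) are assumptions left to the caller as `neighbor` and `score`; the geometric cooling schedule and its constants are arbitrary choices, not tuned values.

```python
import math
import random
import time


def anneal(initial, neighbor, score, time_limit=300.0, max_iters=None,
           t0=1.0, cooling=0.9995, rng=None):
    """Simulated annealing that maximizes `score`.

    neighbor(state, rng) must return a new candidate state without mutating
    `state`. The best state seen is kept and returned when the time limit
    (in seconds) or the optional iteration cap is reached.
    """
    rng = rng or random.Random()
    deadline = time.monotonic() + time_limit
    current, current_score = initial, score(initial)
    best, best_score = current, current_score
    temp = t0
    iters = 0
    while time.monotonic() < deadline and (max_iters is None or iters < max_iters):
        iters += 1
        cand = neighbor(current, rng)
        cand_score = score(cand)
        delta = cand_score - current_score
        # Always accept improvements; accept worse moves with prob e^(delta/T).
        if delta >= 0 or rng.random() < math.exp(delta / temp):
            current, current_score = cand, cand_score
            if current_score > best_score:
                best, best_score = current, current_score
        temp = max(temp * cooling, 1e-9)  # geometric cooling, never exactly 0
    return best, best_score
```

`time_limit=300.0` matches the 5-minute budget. Keeping `best` separately from `current` is what lets you return the highest-scoring configuration seen, even after the search has wandered off to worse ones.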

Break text evenly into certain number of lines

There is a linear-time algorithm (or a quadratic-time algorithm by Knuth & Plass) for breaking text evenly into lines of a given maximum width. The linear-time one uses SMAWK, and "evenly" means:
http://en.wikipedia.org/wiki/Word_wrap#Minimum_raggedness
Is there an algorithm, or a concave cost function for the algorithm above, that takes into account the number of lines I would like the text broken into, instead of the maximum line width?
In other words, I'm looking for a line breaking (or paragraph formation, or word wrapping) algorithm where the input is the desired number of lines, not the desired line width.
Just to describe a practically unusable approach: there are N words and N-1 spaces, one between each pair of adjacent words, and M is the desired number of lines (M <= N). After each space there can be at most one line break. The algorithm would try every possible combination of M-1 break positions among the N-1 spaces, calculate the "raggedness" of each, and return the best one. How can this be done much faster?
You could simply reduce the problem of achieving a given number of lines to the problem of breaking lines after a maximum length by calculating the maximum length as the total length of the string divided by the number of lines you want. As the actual length of a line is going to be less than the maximum length in many cases, you would probably need to subtract 1 from the number of lines you want.
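If you want exactly M lines rather than an approximate width, a plain dynamic program over break positions avoids trying every combination. This Python sketch is a straightforward O(M·N²) DP (not the SMAWK-based linear-time method), and it assumes the cost to minimize is the sum of squared line lengths, measured in characters with single spaces between words. With exactly M lines the total of the line lengths is fixed, so minimizing the sum of squares is the same as minimizing their variance. `break_into_lines` is a hypothetical name.

```python
def break_into_lines(words, m):
    """Split `words` into exactly m non-empty lines, minimizing the sum of
    squared line lengths (characters, single spaces between words)."""
    n = len(words)
    if not 1 <= m <= n:
        raise ValueError("need 1 <= m <= number of words")
    prefix = [0]
    for w in words:
        prefix.append(prefix[-1] + len(w))

    def line_len(i, j):
        # length of words[i:j] joined with single spaces
        return prefix[j] - prefix[i] + (j - i - 1)

    inf = float("inf")
    # best[k][j]: minimal cost of putting the first j words on k lines
    best = [[inf] * (n + 1) for _ in range(m + 1)]
    back = [[0] * (n + 1) for _ in range(m + 1)]
    best[0][0] = 0
    for k in range(1, m + 1):
        for j in range(k, n + 1):
            for i in range(k - 1, j):  # the k-th line is words[i:j]
                cost = best[k - 1][i] + line_len(i, j) ** 2
                if cost < best[k][j]:
                    best[k][j], back[k][j] = cost, i
    # walk the back-pointers to recover the lines
    lines, j = [], n
    for k in range(m, 0, -1):
        i = back[k][j]
        lines.append(" ".join(words[i:j]))
        j = i
    return lines[::-1]
```

For example, `break_into_lines("aaa bb cc dddd".split(), 2)` prefers "aaa bb" / "cc dddd" (lengths 6 and 7) over the other two possible splits.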

Divide grid (2D array) into random shaped parts?

The Problem
I want to divide a grid (2D array) into random shaped parts (think earth's tectonic plates).
Criteria are:
User inputs grid size (program should scale because this could be very large).
User inputs grid division factor (how many parts).
The grid is a rectangular hex grid, capped at the top and bottom and wrapping around left and right.
No fragmentation of the parts.
No parts inside other parts.
No tiny or super-large parts.
Random shaped parts, that are not perfect circles, or strung-out snaking shapes.
My solution:
Create a method that can access/manipulate adjacent cells.
Randomly determine the size of each part (the sum of all the parts equal the size of the whole 2D array).
Fill the entire 2D array with the last part's id number.
For each part except the last:
Seed the current part id number in a random cell of the 2D array.
Iterate over the entire array and store the address of each cell adjacent to any cells already seeded with the current part id number.
Extract one of the stored addresses and fill that cell with the current part id number (and so the part starts to form).
Repeat until the part size is reached.
Note that to avoid parts with long strung-out "arms" or big holes inside them, I created two storage arrays: one for cells adjacent to just one cell with the current part id number, and the other for cells adjacent to more than one, and I exhaust the latter before the former.
Running my solution gives the following:
Grid size: 200
width: 20
height: 10
Parts: 7
66633333111114444466
00033331111114444466
00003331111114444466
00003331111144444660
00000333111164444660
00000336111664422600
00000336615522222200
00006655555522222200
00006655555552222220
00066655555552222220
Part number: 0
Part size: 47
Part number: 1
Part size: 30
Part number: 2
Part size: 26
Part number: 3
Part size: 22
Part number: 4
Part size: 26
Part number: 5
Part size: 22
Part number: 6
Part size: 27
Problems with my solution:
The last part is always fragmented - in the case above there are three separate groups of sixes.
The algorithm will stall when parts form in cul-de-sacs and don't have room to grow to their full size (the algorithm does not allow forming parts over other parts, except the last part, which is laid down over the entire 2D array at the start).
If I don't specify the part sizes before forming the 2D array, and just specify the number of parts and generate the part sizes randomly on the fly, tiny parts may form that might as well not be there at all, especially when the 2D array is very large. My current part-size method limits part sizes to between 10% and 40% of the total size of the 2D array. I may be okay with not specifying the part sizes if there is some super-elegant way to do this; the only control the user will have is the 2D array size and the number of parts.
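A quick way to detect the fragmentation described above (or the "tiny islands" mentioned in the answers) is to count the connected groups of each part with a flood fill. This Python sketch assumes a rectangular grid with 4-neighbour adjacency and left/right wrap-around; on the question's hex grid the neighbour offsets would have to be replaced with hex ones, which is not shown. `count_components` is a hypothetical name.

```python
from collections import deque


def count_components(grid, part_id, wrap_x=True):
    """Number of separate groups of `part_id` cells in the 2D list `grid`.

    Uses 4-neighbour adjacency on a rectangular grid; with wrap_x the left
    and right edges touch (the top and bottom do not).
    """
    h, w = len(grid), len(grid[0])
    seen = set()
    groups = 0
    for y in range(h):
        for x in range(w):
            if grid[y][x] != part_id or (x, y) in seen:
                continue
            groups += 1  # a new group: flood-fill all of it
            seen.add((x, y))
            queue = deque([(x, y)])
            while queue:
                cx, cy = queue.popleft()
                for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    nx, ny = cx + dx, cy + dy
                    if wrap_x:
                        nx %= w
                    if (0 <= nx < w and 0 <= ny < h and (nx, ny) not in seen
                            and grid[ny][nx] == part_id):
                        seen.add((nx, ny))
                        queue.append((nx, ny))
    return groups
```

Any part with more than one group is fragmented; a "patch up" step could then merge or reassign the smaller groups.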
Other ideas:
Form the parts in perfectly aligned squares, then run over the 2D array and randomly allow each part to encroach on other parts, warping them into random shapes.
Draw snaking lines across the grid and fill in the spaces created, maybe using some math like this: http://mathworld.wolfram.com/PlaneDivisionbyLines.html
Conclusion:
So here's the rub: I am a beginner programmer who is unsure if I'm tackling this problem in the right way. I can create some more "patch up" methods, that shift the fragmented parts together, and allow forming parts to "jump out" of the cul-de-sacs if they get stuck in them, but it feels messy.
How would you approach this problem? Is there some sexy math I could use to simplify things perhaps?
Thx
I did something similar for a game a few months back, though it was a rectangular grid rather than a hex grid. Still, the theory is the same, and it came up with nice contiguous areas of roughly equal size -- some were larger, some were smaller, but none were too small or too large. YMMV.
Make an array of pointers to all the spaces in your grid. Shuffle the array.
Assign the first N of them IDs -- 1, 2, 3, etc.
Until every space in the array has an ID:
Iterate through the array looking for spaces that do not have IDs
If the space has neighbors in the grid that DO have IDs, assign the space
the ID from a weighted random selection of the IDs of its neighbors.
If it doesn't have neighbors with IDs, skip to the next.
Once there are no empty spaces left, you have your map with sufficiently blobby areas.
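A minimal Python sketch of these steps, assuming a rectangular grid with 4-neighbour adjacency and left/right wrap-around (the answer itself used a rectangular grid; hex neighbours would need different offsets). "Weighted random selection" is interpreted here, as an assumption, as weighting each neighbouring ID by how many neighbours carry it; `grow_regions` is a hypothetical name.

```python
import random


def grow_regions(width, height, n_parts, rng=None, wrap_x=True):
    """Grow n_parts regions from random seeds on a width x height grid.

    Rectangular grid, 4-neighbour adjacency; with wrap_x the left and right
    edges are joined. Returns grid[y][x] = region id (0 .. n_parts - 1).
    """
    if not 1 <= n_parts <= width * height:
        raise ValueError("n_parts must be between 1 and the number of cells")
    rng = rng or random.Random()
    grid = [[None] * width for _ in range(height)]
    cells = [(x, y) for y in range(height) for x in range(width)]
    rng.shuffle(cells)  # the shuffled "array of pointers"
    for part_id, (x, y) in enumerate(cells[:n_parts]):
        grid[y][x] = part_id  # the first N cells become the seeds

    def neighbours(x, y):
        for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nx, ny = x + dx, y + dy
            if wrap_x:
                nx %= width
            if 0 <= nx < width and 0 <= ny < height:
                yield nx, ny

    unassigned = width * height - n_parts
    while unassigned:
        for x, y in cells:
            if grid[y][x] is not None:
                continue
            ids = [grid[ny][nx] for nx, ny in neighbours(x, y)
                   if grid[ny][nx] is not None]
            if ids:  # otherwise skip it; a later pass will reach it
                # choosing from the list (duplicates included) weights each
                # id by how many neighbours carry it
                grid[y][x] = rng.choice(ids)
                unassigned -= 1
    return grid
```

Because every cell copies the ID of an already assigned neighbour, each region stays connected back to its seed, which avoids the fragmentation problem of the question's approach; region sizes are not controlled directly, though.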
Here's what I'd do: use a Voronoi diagram. First place some random points, then let the Voronoi algorithm generate the parts. To get an idea of what it looks like, see this applet.
As Rekin suggested, a Voronoi diagram plus some random perturbation will generally do a good job, and on a discretized space like you've got, is relatively easy to implement.
I just wanted to give some ideas about how to do the random perturbation. If you do it at the final resolution, it's either going to take a very long time or have a pretty minimal effect. You might try a multi-resolution perturbation instead: start with a rather small grid, place random seeds, and compute the Voronoi diagram. Then randomly perturb the borders; for example, for each pair of adjacent cells in different regions, push the region one way or the other. You might need a post-process to make sure there are no tiny islands; a simple flood fill will work.
Then create a grid that's twice the size (in each direction), and copy your regions over. You can probably use nearest neighbor. Then perturb the borders again, and repeat until you reach your desired resolution.
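A rough Python sketch of the Voronoi-plus-multi-resolution-perturbation idea, assuming a rectangular grid with left/right wrap-around, squared Euclidean distance for the Voronoi step, and one simple perturbation pass per level (a cell adopts a random differing neighbour's ID with probability `p`). All function names and the value of `p` are hypothetical choices, and the flood-fill clean-up of tiny islands mentioned in the answer is not included.

```python
import random


def _dist2(x, y, sx, sy, width, wrap_x):
    dx = abs(x - sx)
    if wrap_x:
        dx = min(dx, width - dx)  # horizontal distance around the wrap
    return dx * dx + (y - sy) ** 2


def voronoi(width, height, seeds, wrap_x=True):
    """Assign each cell the index of its nearest seed (x, y)."""
    return [[min(range(len(seeds)),
                 key=lambda i: _dist2(x, y, *seeds[i], width, wrap_x))
             for x in range(width)] for y in range(height)]


def perturb(grid, rng, p=0.3, wrap_x=True):
    """One pass: with probability p, a cell takes the id of a random
    neighbour that belongs to a different region."""
    h, w = len(grid), len(grid[0])
    out = [row[:] for row in grid]  # read from grid, write to out
    for y in range(h):
        for x in range(w):
            nx, ny = rng.choice(((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)))
            if wrap_x:
                nx %= w
            if (0 <= nx < w and 0 <= ny < h
                    and grid[ny][nx] != grid[y][x] and rng.random() < p):
                out[y][x] = grid[ny][nx]
    return out


def upscale(grid):
    """Double the resolution in each direction (nearest neighbour)."""
    return [[grid[y // 2][x // 2] for x in range(2 * len(grid[0]))]
            for y in range(2 * len(grid))]


def plates(width, height, n_parts, levels, rng=None):
    """Voronoi on a coarse grid, then `levels` rounds of upscale + perturb.
    The result is (height * 2**levels) rows by (width * 2**levels) columns."""
    rng = rng or random.Random()
    cells = [(x, y) for y in range(height) for x in range(width)]
    seeds = rng.sample(cells, n_parts)  # distinct random seed cells
    grid = perturb(voronoi(width, height, seeds), rng)
    for _ in range(levels):
        grid = perturb(upscale(grid), rng)
    return grid
```

Starting coarse means each perturbation pass moves borders by a whole coarse cell, so later, finer passes only add detail on top of large-scale wiggles.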

Algorithm: Determine shape of two sectors delineated by an arbitrary path, and then fill one

NOTE: This is a challenging problem for anybody who likes logic problems, etc.
Consider a rectangular two-dimensional grid of height H and width W. Every space on the grid has a value: 0, 1, or 2. Initially, every space on the grid is a 0, except for the spaces along each of the four edges, which are initially a 2.
Then consider an arbitrary path of adjacent (horizontally or vertically) grid spaces. The path begins on a 2 and ends on a different 2. Every space along the path is a 1.
The path divides the grid into two "sectors" of 0 spaces. There is an object that rests on an unspecified 0 space. The "sector" that does NOT contain the object must be filled completely with 2.
Define an algorithm that determines the spaces that must become 2 from 0, given an array (list) of values (0, 1, or 2) that correspond to the values in the grid, going from top to bottom and then from left to right. In other words, the element at index 0 in the array contains the value of the top-left space in the grid (initially a 2). The element at index 1 contains the value of the space in the grid that is in the left column, second from the top, and so forth. The element at index H contains the value of the space in the grid that is in the top row but second from the left, and so forth.
Once the algorithm finishes and the empty "sector" is filled completely with 2s, the SAME algorithm must be sufficient to do the same process again. The second (and on) time, the path is still drawn from a 2 to a different 2, across spaces of 0, but the "grid" is smaller because the 2s that are surrounded by other 2s cannot be touched by the path (since the path is along spaces of 0).
Many thanks to whoever is able to figure this out for me. This does not have to be in a particular programming language; in fact, pseudo-code or just English is sufficient. Thanks again! If you have any questions, just leave a comment and I'll specify what needs to be specified.
Seems to me a basic flood fill algorithm would get the job done:
Scan your array for the first 0 you find, and then start a flood fill from there, filling the 0 region with some other number, let's say 3 - this will label one of your "sectors".
Once that's done, scan again for a 0, and flood fill from there, filling with a 4 this time.
During both of the fills, you can be checking whether you found your object or not; whichever fill you find it during, keep track of that number.
After both fills are done, check which numbered region had the object in it - flood fill that region again, back with 0 this time.
Flood fill the other numbered region with 2, and you're done.
This'll work for any grid configuration, as long as there are exactly two 0 sectors that are disconnected from each other; so re-applying the same algorithm any number of times is fine.
Edit: Minor tweaks, to save you a flood-fill or two -
If you don't find your object in the first flood-fill, you can assume that the other sector has it, so you just re-fill the current number with 2 and leave the other sector alone (since it's already 0-filled).
Alternatively, if you do find the object in the first flood-fill, you can directly fill the other sector with 2, and then re-fill the first sector with 0.
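A minimal Python sketch of the flood-fill approach, using the question's column-major layout (index = x·H + y, so index 1 is the left column, second row, and index H is the top row, second column). It assumes `object_index` is the array index of the 0 space holding the object. Rather than writing temporary 3/4 labels into the grid, it collects each 0 sector as a set of indices, which amounts to the same flood fills; `fill_other_sector` is a hypothetical name.

```python
from collections import deque


def fill_other_sector(values, h, w, object_index):
    """Return a copy of `values` (column-major, index = x*h + y) in which
    every 0 sector that does not contain the object is filled with 2."""
    if values[object_index] != 0:
        raise ValueError("the object must rest on a 0 space")
    grid = list(values)

    def neighbours(i):
        x, y = divmod(i, h)  # undo index = x*h + y
        if y > 0:
            yield i - 1      # up
        if y < h - 1:
            yield i + 1      # down
        if x > 0:
            yield i - h      # left
        if x < w - 1:
            yield i + h      # right

    def flood(start):
        # breadth-first flood fill over 0 spaces; returns the whole sector
        sector, queue = {start}, deque([start])
        while queue:
            for j in neighbours(queue.popleft()):
                if grid[j] == 0 and j not in sector:
                    sector.add(j)
                    queue.append(j)
        return sector

    seen = set()
    for i, v in enumerate(grid):
        if v == 0 and i not in seen:
            sector = flood(i)
            seen |= sector
            if object_index not in sector:  # the sector without the object
                for j in sector:
                    grid[j] = 2
    return grid
```

Since the function only looks at the current 0/1/2 values, it can be applied again after each new path, as the question requires; if a path ever created more than two sectors, every sector without the object would be filled.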

Resources