Percentage load balance thread requests - algorithm

I have a pool of worker threads to which I send requests based on percentages. For example, worker 1 must process 60% of total requests, worker 2 must process 31%, and worker 3 processes the remaining 9%. I need to know, mathematically, how to scale the numbers down while maintaining the ratio, so that I don't have to send 60 requests to worker 1 before starting to send requests to worker 2. It sounds like a "linear scale" math approach. In any case, all input on this issue is appreciated.

One way to think about this problem makes it quite similar to the problem of drawing a sloped line on a pixel-based display, which can be done with Bresenham's algorithm.
First let's assume for simplicity that there are only 2 workers, and that they should take a fraction p (for worker 1) and (1-p) (for worker 2) of the incoming requests. Imagine that "Requests sent to worker 1" is the horizontal axis and "Requests sent to worker 2" is the vertical axis of a graph: what we want to do is draw a (pixelated) line in this graph that starts at (0, 0) and has slope (1-p)/p (i.e. it advances (1-p) units upwards for every p units it advances rightwards). When a new request comes in, a new pixel gets drawn. This new pixel will always be either immediately to the right of the previous pixel (if we assign the job to worker 1) or immediately above it (if we assign it to worker 2), so it's not quite like Bresenham's algorithm where diagonal movements are possible, but there are similarities.
With each new request that comes in, we have to assign that request to one of the workers, corresponding to drawing the next pixel rightwards or upwards from the previous one. I propose that a good way to pick the right direction is to pick the one that minimises an error function. The easiest thing to do is to take the slope of the line between (0, 0) and the point that would result from each of the 2 possible choices, and compare these slopes to the ideal slope (1-p)/p; then pick whichever one produces the lowest difference. This will cause the drawn pixels to "track" the ideal line as closely as possible.
To generalise this to more than 2 dimensions (workers), we can't use slope directly. If there are W workers, we need to come up with some function error(X, Y), where X and Y are both W-dimensional vectors, one representing the ideal direction (the ratios of requests to assign, analogous to the slope (1-p)/p earlier), the other representing the candidate point, and returning some number representing how different their directions are. Fortunately this is easy: we can take the cosine of the angle between two vectors by dividing their dot product by the product of their magnitudes, which is easy to calculate. This will be 1 if their directions are identical, and less than 1 otherwise, so when a new request arrives, all we need to do is perform this calculation for each worker i (1 <= i <= W) and see which one's error(X, Y[i]) is closest to 1: that's the worker to give the request to.
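To make this concrete, here is a minimal Python sketch of that selection rule (my own illustration with hypothetical names, not production code; ratios plays the role of X and counts the role of the running totals Y):

    import math

    def pick_worker(ratios, counts):
        # Try giving the next request to each worker in turn and keep the
        # choice whose new totals vector points closest to the ideal
        # direction, i.e. whose cosine against `ratios` is largest.
        best_worker, best_cos = 0, -1.0
        r_mag = math.sqrt(sum(r * r for r in ratios))
        for i in range(len(ratios)):
            candidate = list(counts)
            candidate[i] += 1
            dot = sum(r * c for r, c in zip(ratios, candidate))
            c_mag = math.sqrt(sum(c * c for c in candidate))
            cos = dot / (r_mag * c_mag)
            if cos > best_cos:
                best_worker, best_cos = i, cos
        return best_worker

    ratios = [0.60, 0.31, 0.09]
    counts = [0, 0, 0]
    for _ in range(100):
        counts[pick_worker(ratios, counts)] += 1
    print(counts)   # roughly [60, 31, 9]

For the windowed variant described in the edit below, you would subtract the totals from k steps ago when building candidate, before taking the cosine.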
[EDIT]
This procedure will also adapt to changes in the ideal direction. But as it stands, it tries (as hard as it can) to make the overall ratios of every request assigned so far track the ideal direction, so if the procedure has been running a long time, then even a small adjustment in the target direction could result in large "swings" to compensate. In that case, when calling error(X, Y[i]), it might be better to compute the second argument using the difference between the latest pixel (request assignment) and the pixel from some number k (e.g. k=100) steps ago. (In the original algorithm, we are implicitly subtracting the starting point (0, 0), i.e. k is as large as possible.) This only requires you to keep the last k chosen endpoints. Picking k too large will mean you can still get large swings, while picking k too small might mean that the "line" drifts well off-course, with some workers never picked at all, because each assignment alters the direction so drastically. You might need to experiment to find a good k.

To keep the assignments non-clustered, associate merits with each worker's jobs, inversely proportional to the intended share, e.g., 31 * 9 for w1, 60 * 9 for w2, and 31 * 60 for w3. Start with no merits for each worker; the next job goes to the worker with the least merits, with the lesser ordinal winning ties. Accumulate merits for jobs done. (On overflow of an accumulator, subtract MAXVALUE - 31 * 60 from each.)
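A small sketch of that merit scheme (my own illustration; each worker's per-job merit is the product of the other workers' shares):

    from collections import Counter

    shares = [60, 31, 9]
    # merit earned per job, inversely proportional to the share
    costs = [31 * 9, 60 * 9, 31 * 60]    # w1=279, w2=540, w3=1860
    merits = [0, 0, 0]

    def next_worker():
        # least merits wins; min()/index() already prefer the lower ordinal on ties
        w = merits.index(min(merits))
        merits[w] += costs[w]
        return w

    print(Counter(next_worker() for _ in range(1000)))
    # roughly Counter({0: 600, 1: 310, 2: 90})

In steady state each worker's job count times its per-job cost grows at the same rate, so the counts come out proportional to 1/cost, i.e. to the intended 60/31/9 shares.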

Related

Process points spread apart in 2D in parallel

Problem
There are N points in 2D space (with coordinates on the order of 10^9). All these points must be processed (once each).
Processing can use P parallel threads (with typical hardware, P ≈ 6).
The time it takes to process a point is different for each point and unknown beforehand.
All points being processed in parallel must be at least D apart from each other (Euclidean or any other distance measure is fine).
Attempts
I would imagine the algorithm would be an implementation of two parts:
Which points to schedule initially
Which new point to schedule (if possible) when a point finishes being processed
My solutions have not been much better than a naive method, which is simply to keep trying random points until one is at least D away from all points being processed.
(I have thought about making P groups of points such that every element of one group is at least D away from every element of every other group, and then, when a point from a group finishes, taking the next point from that group. This only saves time in some scenarios, though, and I have not determined how to find a good set of groups either.)
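For reference, a minimal sketch of the naive method described above (my own illustration with hypothetical names; active is the set of points currently being processed):

    import math, random

    def pick_next(unprocessed, active, D, max_tries=1000):
        # Naive method: keep trying random unprocessed points until one is
        # at least D away from every point currently being processed.
        for _ in range(max_tries):
            p = random.choice(unprocessed)
            if all(math.dist(p, q) >= D for q in active):
                return p
        return None   # give up for now; retry when another point finishes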

How to find the highest number of changes/permutations inside a group (maybe a graph)

Let's say my company has N workers and M sectors. Each worker is currently assigned to a sector, and each worker is willing to change to another sector.
For example:
Worker A is in sector 1 but wants to go to sector 2
B is in 2 but wants 3
C is in 3 but wants 2
D is in 1 but wants 3
and so on...
But they all must change with each other:
A goes to B's position and B goes to A's position
or
A goes to B's position / B goes to C's position / C goes to A's position
I know that not everyone will change sectors, but I'm wondering if there is a specific algorithm that could find which movements will yield the maximum number of changes.
I thought about naively swapping pairs of workers, but then some of them may be left out; ideally they could all form a "loop" so that no one is left out (if possible).
I could use Monte Carlo to chain the workers and find the longest chain/loop, but that would be too expensive as N and M grow.
I also thought about finding the longest path in a graph using Dijkstra, but longest path looks like an NP-hard problem.
Does anyone know an algorithm, or how I could solve this efficiently? Or am I trying to fly too close to the sun here?
This can be solved as a min-cost circulation problem. Construct a flow network where each sector corresponds to a node, and each worker corresponds to an arc. The capacity of each arc is 1, and the cost is −1 (i.e., we should move workers if we can). The conservation of flow constraint ensures that we can decompose the worker movements into a sum of simple cycles.
Klein's cycle-canceling algorithm is not the most efficient, but it's very simple. Use (e.g.) Bellman-Ford to find a negative-cost cycle in the network, if one exists. If so, reverse the direction of each arc in the cycle, multiply the cost of each arc in the cycle by −1, and loop back to the beginning.
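Here's a rough Python sketch of that procedure (my own illustration, with a made-up worker list; Bellman-Ford with zero-initialised distances finds a negative-cost cycle reachable from any node):

    def find_negative_cycle(nodes, arcs):
        # arcs: list of (u, v, cost). Returns a list of arc indices forming
        # a negative-cost cycle, or None.
        dist = {v: 0 for v in nodes}
        pred = {v: None for v in nodes}
        updated = None
        for _ in range(len(nodes)):
            updated = None
            for i, (u, v, c) in enumerate(arcs):
                if dist[u] + c < dist[v]:
                    dist[v] = dist[u] + c
                    pred[v] = i
                    updated = v
        if updated is None:              # no relaxation in the last pass
            return None
        v = updated                      # step back |nodes| times so we are
        for _ in range(len(nodes)):      # guaranteed to be inside the cycle
            v = arcs[pred[v]][0]
        cycle, start = [], v
        while True:
            i = pred[v]
            cycle.append(i)
            v = arcs[i][0]
            if v == start:
                return cycle

    # one arc per worker: (current sector, wanted sector), capacity 1, cost -1
    workers = [("1", "2"), ("2", "3"), ("3", "2"), ("1", "3")]
    arcs = [(u, v, -1) for (u, v) in workers]
    nodes = {s for arc in workers for s in arc}
    moved = [False] * len(workers)

    while True:
        cycle = find_negative_cycle(nodes, arcs)
        if cycle is None:
            break
        for i in cycle:
            u, v, c = arcs[i]
            arcs[i] = (v, u, -c)         # reverse the arc, negate its cost
            moved[i] = not moved[i]

    print([w for w, m in zip(workers, moved) if m])
    # [('2', '3'), ('3', '2')] — B and C swap; A and D have no cycle to join

Each cancellation strictly reduces the circulation's cost, and the cost is bounded below by minus the number of workers, so the loop terminates.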
You could use the following observations to generate the most attractive sector changes (measured as how many workers get the change they want). In order of falling attractiveness,
1. Identify all circular chains of sector changes. Everybody gets the change they want.
2. Identify all non-circular chains of sector changes. They can be made circular at the expense of one worker not getting the change s/he wants.
3. Revisit 1: combine any two circular chains into one at the expense of two workers not getting what they want.
Instead of one optimal solution, you get a list of more or less attractive options. You will have to put some bounds on steps 1-3 to keep the options down to a tractable number.

How can I take an algorithm that works very well in a 2D space and adapt it for 3D environments?

I liked the technique Spelunky used to generate the levels in the game, and I want to adapt it for a 3D space, so that I can use it to help me design something in that 3D space. I'm just not sure how to approach this kind of problem. I feel like a direct translation for this specific approach isn't possible, but I still want to have something that feels similar, with the same sort of winding path.
This is an interesting question due to some of the challenges that arise when you start moving algorithms into higher dimensions.
Issue 1: Main Paths Occupy Little Volume
Let's start with a simple mathematical observation. The level generation algorithm works by computing a random path from the top row of the world to the bottom row of the world. At each step, that path can either move left, right, or down, but never back the way it came. Assuming these three options are equally likely, the first step on a particular level has a 2/3 chance of staying on the same level, and each subsequent step has at most a 1/2 chance of staying on the same level (it's exactly 1/2 if we ignore walls). That means that the expected number of tiles per level is going to be at most
1 × (1/3) + (2/3) × (1 + 2) (since there's a 1/3 chance to move down immediately, and otherwise a (2/3) chance to get one room plus the number of rooms you get from a random process that halts with 50% probability at each step, which is two on average)
= 7/3 ≈ 2.33
This is important, because it means that if your world were an n × n grid, you'd expect to have only about 2.33 "used" cells per level in the world, for a total of roughly 2.33n "used" cells on the path. Given that the world is n × n, that means that a
2.33n / n^2 = 2.33 / n
fraction of the cells of the world will be on the main path, so n has to be small for the fraction of the world off the main path not to get too huge. The video you linked picks n = 4, giving a fraction of 2.33 / 4 ≈ 58% of the world on the main path. That's a good blend between "I'd like to make progress" and "I'd like my side quest, please." And remember, this number is an overestimate of the number of path cells versus side-quest cells.
Now, let's do this in three dimensions. Same as before, we start in the top "slice," then randomly move forward, backward, left, right, or down at each step. That gives us at most a 4/5 chance of staying on our level with our first step, then from that point forward at most a 3/4 chance of staying on the level. (Again, this is an overestimate.) Doing the math again gives
1 × 1/5 + (4/5) × (1 + 4)
= 1/5 + (4/5) × 5
= 4.2
So that means that an average of 4.2 cells per level are going to be in the main path, for a total path length of 4.2n, on average, if we're overestimating. In an n × n × n world, this means that the fraction of "on-path" sites to "off-path" sites is
4.2n / n^3
= 4.2 / n^2
This means that your world needs to be very small for the main path not to be a trivial fraction of the overall space. For example, picking n = 3 means that just over half of the world would be off the main path and available for exploration. Picking n = 4 would give you roughly 75% of the world off the main path, and picking n = 5 would give you over 80% of the world off the main path.
All of this is to say that, right off the bat, you'd need to reduce the size of your world so that a main-path-based algorithm doesn't leave the world mostly empty. That's not necessarily a bad thing, but it is something to be mindful of.
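If you want to sanity-check these per-level estimates, here's a quick Monte Carlo sketch (my own; it assumes equally likely moves, ignores walls, and never reverses direction):

    import random

    def tiles_per_level(sideways_dirs, trials=100_000):
        # First step on a level: `sideways_dirs` sideways options plus down.
        # Later steps: one sideways option is lost (no reversing), so it's
        # (sideways_dirs - 1) sideways options plus down.
        total = 0
        for _ in range(trials):
            tiles, options = 1, sideways_dirs + 1
            while random.randrange(options) != 0:    # slot 0 = move down
                tiles += 1
                options = sideways_dirs
            total += tiles
        return total / trials

    print(tiles_per_level(2))   # 2D: about 2.33
    print(tiles_per_level(4))   # 3D: about 4.2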
Issue 2: Template Space Increases
The next issue you'll run into is building up your library of "templates" for room configurations. If you're in 2D space, there are four possible entrances and exits for each cell, and any subset of those four entrances (possibly with the exception of a cell with no entrances at all) might require a template. That gives you
2^4 - 1 = 15
possible entrance/exit templates to work with, and that's just to cover the set of all possible options.
In three dimensions, there are six possible entrances and exits from each cell, so there are
2^6 - 1 = 63
possible entrance/exit combinations to consider, so you'd need a lot of templates to account for this. You can likely reduce this by harnessing rotational symmetry, but this is an important point to keep in mind.
Issue 3: Getting Stuck
The video you linked mentions as a virtue of the 2D generation algorithm the fact that
it creates fun and engaging levels that the player can't easily get stuck in.
In 2D space, most cells, with a few exceptions, will be adjacent to the main path. In 3D space, most cells, with a few exceptions, will not be adjacent to the main path. Moreover, in 2D space, if you get lost, it's not hard to find your way back - there are only so many directions you can go, and you can see the whole world at once. In 3D, it's a lot easier to get lost, both because you can take steps that get you further off the main path than in 2D space and because, if you do get lost, there are more options to consider for how to backtrack. (Plus, you probably can't see the whole world at once in a 3D space.)
You could likely address this by just not filling the full 3D space of cells with places to visit. Instead, only allow cells that are one or two steps off of the main path to be filled in with interesting side quests, since that way the player can't get too lost in the weeds.
To Summarize
These three points suggest that, for this approach to work in 3D, you'd likely need to do the following.
Keep the world smaller than you think you might need it to be, since otherwise the fraction of the world on the main path gets vanishingly small.
Alternatively, consider only filling in cells adjacent to the main path, leaving the other cells inaccessible, so that the player can quickly backtrack to where they were before.
Be prepared to create a lot of templates, or to figure out how to use rotations to make your templates applicable in more places.
Good luck!

Is it better to reduce the space complexity or the time complexity for a given program?

Grid Illumination: Given an NxN grid and an array of lamp coordinates. Each lamp illuminates every square in its row, every square in its column, and every square on its diagonals (think of a queen in chess). Given an array of query coordinates, determine whether each query point is illuminated or not. The catch is that when a query is checked, all lamps adjacent to (or on) that query point get turned off. The ranges for the variables/arrays were about: 10^3 < N < 10^9, 10^3 < lamps < 10^9, 10^3 < queries < 10^9
It seems like I can get one but not both. I tried to get this down to logarithmic time but I can't seem to find a solution. I can reduce the space complexity but it's not that fast, exponential in fact. Where should I focus on instead, speed or space? Also, if you have any input as to how you would solve this problem please do comment.
Is it better for a car to go fast or go a long way on a little fuel? It depends on circumstances.
Here's a proposal.
First, note you can number all the diagonals that the input points lie on by using the first point as the "origin" for both nw-se and ne-sw. The two diagonals through this point are both numbered zero. The nw-se diagonal numbers increase per pixel in (e.g.) the northeast direction and decrease (going negative) to the southwest. Similarly, the ne-sw diagonals are numbered increasing in (e.g.) the northwest direction and decreasing (negative) to the southeast.
Given the origin, it's easy to write constant time functions that go from (x,y) coordinates to the respective diagonal numbers.
Now each lamp's coordinates are naturally associated with 4 numbers: (x, y, nw-se diag #, sw-ne diag #). You don't need to store these explicitly. Rather, you want 4 maps xMap, yMap, nwSeMap, and swNeMap such that, for example, xMap[x] produces the list of all lamp coordinates with x-coordinate x, nwSeMap[nwSeDiagonalNumber(x, y)] produces the list of all lamps on that diagonal, and similarly for the other maps.
Given a query point, look up its corresponding 4 lists. From these it's easy to deal with adjacent squares. If any list is longer than 3, removing adjacent lamps can't make it empty, so the query point is lit. If a list has 3 or fewer lamps, it's a constant-time operation to check whether they're all adjacent.
This solution requires the input points to be represented in 4 lists. Since they already need to be represented in one list, you can argue that this algorithm requires only a constant factor of extra space with respect to the input. (I.e., the same sort of cost as mergesort.)
Run time is expected constant per query point for 4 hash table lookups.
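A minimal Python sketch of those four maps and the query test (my reading of the proposal above; following the answer, lamps adjacent to the query are simply ignored for that query):

    from collections import defaultdict

    def build_maps(lamps):
        # x - y is constant along nw-se diagonals and x + y along ne-sw
        # diagonals, so those sums serve directly as the diagonal numbers.
        xMap, yMap = defaultdict(list), defaultdict(list)
        nwSeMap, swNeMap = defaultdict(list), defaultdict(list)
        for (x, y) in lamps:
            xMap[x].append((x, y))
            yMap[y].append((x, y))
            nwSeMap[x - y].append((x, y))
            swNeMap[x + y].append((x, y))
        return xMap, yMap, nwSeMap, swNeMap

    def is_lit(maps, qx, qy):
        xMap, yMap, nwSeMap, swNeMap = maps
        for line in (xMap[qx], yMap[qy], nwSeMap[qx - qy], swNeMap[qx + qy]):
            if len(line) > 3:
                return True   # turning off <= 3 adjacent lamps can't empty it
            if any(abs(x - qx) > 1 or abs(y - qy) > 1 for (x, y) in line):
                return True   # a lamp on this line that isn't adjacent
        return False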
Without much trouble, this algorithm can be split so it can be map-reduced if the number of lampposts is huge.
But it may be sufficient, and easiest, to run it on one big machine. With a billion lampposts and careful data structure choices, it wouldn't be hard to implement with 24 bytes per lamppost in an unboxed-structures language like C. So a ~32 GB RAM machine ought to work just fine. Building the maps with multiple threads requires some synchronization, but that's done only once. The queries can be read-only: no synchronization required. A nice 10-core machine ought to do a billion queries in well under a minute.
There is a very easy answer that works:
Create an NxN grid.
For each lamp, increment the count of every cell that the lamp illuminates.
For each query, check whether the cell at that query has a value > 0.
For each lamp adjacent to (or on) the query, find all the cells it illuminates and reduce their counts by 1.
This worked fine, but failed on the size limit when trying a 10000 x 10000 grid.
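For what it's worth, here is roughly what that looks like in Python (my reconstruction of the description above; the N^2 counter grid is exactly what breaks once N reaches 10^4):

    def illuminate(grid, N, lx, ly, delta):
        # add delta to every cell in the lamp's row, column and diagonals
        # (the lamp's own cell is counted several times, which is harmless:
        # the +1s and -1s cancel exactly when the lamp is turned off)
        for i in range(N):
            grid[i][ly] += delta
            grid[lx][i] += delta
        for t in range(1 - N, N):
            if 0 <= lx + t < N and 0 <= ly + t < N:
                grid[lx + t][ly + t] += delta
            if 0 <= lx + t < N and 0 <= ly - t < N:
                grid[lx + t][ly - t] += delta

    def run(N, lamps, queries):
        grid = [[0] * N for _ in range(N)]
        lamps = set(lamps)
        for (lx, ly) in lamps:
            illuminate(grid, N, lx, ly, +1)
        for (qx, qy) in queries:
            yield grid[qx][qy] > 0
            near = [(x, y) for (x, y) in lamps
                    if abs(x - qx) <= 1 and abs(y - qy) <= 1]
            for lamp in near:
                lamps.discard(lamp)
                illuminate(grid, N, *lamp, -1)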

Generate random sequence of integers differing by 1 bit without repeats

I need to generate a (pseudo) random sequence of N bit integers, where successive integers differ from the previous by only 1 bit, and the sequence never repeats. I know a Gray code will generate non-repeating sequences with only 1 bit difference, and an LFSR will generate non-repeating random-like sequences, but I'm not sure how to combine these ideas to produce what I want.
Practically, N will be very large, say 1000. I want to randomly sample this large space of 2^1000 integers, but I need to generate something like a random walk because the application in mind can only hop from one number to the next by flipping one bit.
Use any random number generator algorithm to generate an integer between 1 and N (or 0 to N-1 depending on the language). Use the result to determine the index of the bit to flip.
In order to satisfy the no-repeats requirement, you will need to store previously generated numbers (thanks ShreevatsaR). Additionally, you may run into a scenario where no non-repeating continuation is possible, so this will require a backtracking algorithm as well.
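A small sketch of that idea, with a visited set and simple backtracking (my own illustration; for N = 1000 the `seen` set is exactly the huge memory cost discussed in the other answers):

    import random

    def one_bit_walk(n_bits, length, seed=None):
        # Randomised depth-first search for a walk that flips one bit per
        # step and never revisits a value; backtracks at dead ends.
        rng = random.Random(seed)
        start = rng.getrandbits(n_bits)
        path, seen = [start], {start}
        untried = [rng.sample(range(n_bits), n_bits)]
        while path and len(path) < length:
            if untried[-1]:
                nxt = path[-1] ^ (1 << untried[-1].pop())   # flip one bit
                if nxt not in seen:
                    seen.add(nxt)
                    path.append(nxt)
                    untried.append(rng.sample(range(n_bits), n_bits))
            else:
                seen.discard(path.pop())    # dead end: backtrack one step
                untried.pop()
        return path

    print(one_bit_walk(8, 20, seed=1))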
This makes me think of fractals - following a boundary in a Julia set or something along those lines.
If N is 1000, use a 2^500 x 2^500 fractal bitmap (obviously don't generate it in advance - you can derive each pixel on demand, and most won't be needed). Each pixel move is one pixel up, down, left or right following the boundary line between pixels, like a simple bitmap tracing algorithm. So long as you start at the edge of the bitmap, you should return to the edge of the bitmap sooner or later - following a specific "colour" boundary should always give a closed curve with no self-crossings, if you look at the unbounded version of that fractal.
The x and y axes of the bitmap will need "Gray coded" co-ordinates, of course - a bit like oversized Karnaugh maps. Each step in the tracing (one pixel up, down, left or right) equates to a single-bit change in one bitmap co-ordinate, and therefore in one bit of the resulting values in the random walk.
EDIT
I just realised there's a problem. The more wrinkly the boundary, the more likely you are in the tracing to hit a point where you have a choice of directions, such as...
* | .
---+---
. | *
Whichever direction you enter this point, you have a choice of three ways out. Choose the wrong one of the other two and you may return back to this point, therefore this is a possible self-crossing point and possible repeat. You can eliminate the continue-in-the-same-direction choice - whichever way you turn should keep the same boundary colours to the left and right of your boundary path as you trace - but this still leaves a choice of two directions.
I think the problem can be eliminated by having at least three colours in the fractal, and by always keeping the same colour on one particular side (relative to the trace direction) of the boundary. There may be an "as long as the fractal isn't too wrinkly" proviso, though.
The last resort fix is to keep a record of points where this choice was available. If you return to the same point, backtrack and take the other alternative.
While an algorithm like this:
seed()
i = random(0, 2^N)
repeat:
    i ^= 1 << random(0, N)   # flip one randomly chosen bit
    yield i
…would return a random sequence of integers each differing from the previous by 1 bit, it would require a huge record of previously generated numbers (a "backtrace") to ensure uniqueness.
Furthermore, your running time would increase (exponentially?) with the growing density of your backtrace, as the chance of hitting a new, non-repeating number decreases with every number added to the sequence.
To reduce time and space one could try to incorporate one of these:
Bloom Filter
Use a Bloom Filter to drastically reduce the space (and time) needed for uniqueness-backtracing.
As Bloom filters come with the drawback of producing false positives from time to time, a certain rate of falsely detected repeats (which would thus be skipped) would occur in your sequence.
While the use of a Bloom filter would reduce the space and time needed for the backtrace, your running time would still increase (exponentially?) over time…
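A toy Bloom filter to make the trade-off concrete (a sketch, not tuned for 2^1000-sized spaces; items are the integers' byte representations):

    import hashlib

    class Bloom:
        def __init__(self, bits=1 << 20, hashes=4):
            self.bits, self.hashes = bits, hashes
            self.data = bytearray(bits // 8)

        def _positions(self, item):
            # derive `hashes` bit positions by salting one hash function
            for k in range(self.hashes):
                digest = hashlib.blake2b(item, salt=bytes([k])).digest()
                yield int.from_bytes(digest[:8], "big") % self.bits

        def add(self, item):
            for p in self._positions(item):
                self.data[p // 8] |= 1 << (p % 8)

        def __contains__(self, item):
            return all(self.data[p // 8] & (1 << (p % 8))
                       for p in self._positions(item))

    bf = Bloom()
    x = (123456789).to_bytes(125, "big")    # a 1000-bit number as bytes
    print(x in bf)   # False
    bf.add(x)
    print(x in bf)   # True (and occasionally True for values never added)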
Hilbert Curve
A Hilbert curve represents a non-repeating (kind of pseudo-random) walk over a square (or cube) of cells, with each step being of length 1.
Using a Hilbert Curve (on an appropriate distribution of values) one might be able to get rid of the need for a backtrace entirely.
To give your sequence a starting point, you'd generate n random numbers between 0 and s (n being the dimension of your plane/cube/hypercube, s being the length of its sides).
Not only would a Hilbert Curve remove the need for a backtrace, it would also make the sequencer run in O(1) per number (in contrast to the use of a backtrace, which would make your running time increase exponentially(?) over time…)
To seed your sequence, you'd wrap-shift your n-dimensional distribution by a random displacement in each of its n dimensions.
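As a proof of concept in two dimensions (my own sketch, not from the answer above): convert an increasing index to Hilbert-curve coordinates using the standard index-to-(x, y) conversion, then Gray-code each coordinate. Every Hilbert step changes exactly one coordinate by ±1, and Gray coding turns a ±1 change into a single bit flip, so the concatenated word changes by exactly one bit per step and never repeats. (For N = 1000 bits you'd need a higher-dimensional Hilbert mapping, which this 2D toy doesn't cover.)

    def hilbert_d2xy(order, d):
        # standard Hilbert index -> (x, y) on a 2^order x 2^order grid
        x = y = 0
        s = 1
        while s < (1 << order):
            rx = 1 & (d // 2)
            ry = 1 & (d ^ rx)
            if ry == 0:                     # rotate the quadrant
                if rx == 1:
                    x, y = s - 1 - x, s - 1 - y
                x, y = y, x
            x, y = x + s * rx, y + s * ry
            d //= 4
            s *= 2
        return x, y

    def gray(v):
        return v ^ (v >> 1)

    order = 4                               # 16 x 16 grid -> 8-bit words
    prev = None
    for d in range(1 << (2 * order)):
        x, y = hilbert_d2xy(order, d)
        word = (gray(x) << order) | gray(y)
        if prev is not None:
            assert bin(prev ^ word).count("1") == 1   # one bit per step
        prev = word
    print("ok: every step flips exactly one bit")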
PS: You might get better answers here: CSTheory @ StackExchange (or not, see comments)
