What is root mean square correlation?

I know root mean square error. What is root mean square correlation? How to use it to simulate data? For example, 1000 test statistics are correlated, and root mean square correlation is 0.1

Related

Need an explanation of Grundy numbers from Competitive Programming Handbook

I am trying to understand the example from the book https://cses.fi/book/book.pdf on page 239.
The example is described as follows (the book shows a figure: a grid with a Grundy number written in each square):
What I don't get is what exactly, say, the number 3 next to the lower-right corner means. We can move 4 steps up and 3 steps left from it, so how is it 3? Same for the 4 just above it; it doesn't correspond to any set of moves I can think of. The book in general makes a lot of leaps of logic they think are obvious, but usually I can infer what they mean after some time; here I am just lost.
The rule for computing these numbers is recursive.
You consider all the values you can reach, and then pick the smallest (non-negative) integer that is not reachable.
For example, the value in the top-left corner is 0 because no moves are possible.
For example, the value next to the lower-right corner is 3 because the reachable values are 0,4,1,0,2,1,4, so 3 is the smallest non-negative integer not in this list.
This explains how to compute the numbers, but to understand them it is probably better to start with understanding the game of Nim. In the game of Nim, the Sprague Grundy number for a pile is simply equal to the size of a pile.
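To make the rule concrete, here is a small sketch of that computation, under my reading of the book's example: a move slides the token any number of squares up or left without passing through a wall. The function name grundyGrid and the wall representation are just illustrative.

#include <vector>

// grundy[r][c] = mex of the values reachable from (r, c) by sliding any number
// of squares up or left, stopping before the first wall. Wall squares keep -1.
std::vector<std::vector<int>> grundyGrid(const std::vector<std::vector<bool>>& wall) {
    int rows = wall.size(), cols = wall[0].size();
    std::vector<std::vector<int>> g(rows, std::vector<int>(cols, -1));
    for (int r = 0; r < rows; ++r)
        for (int c = 0; c < cols; ++c) {
            if (wall[r][c]) continue;
            std::vector<bool> seen(rows + cols, false);                            // reachable Grundy values
            for (int k = r - 1; k >= 0 && !wall[k][c]; --k) seen[g[k][c]] = true;  // moves up
            for (int k = c - 1; k >= 0 && !wall[r][k]; --k) seen[g[r][k]] = true;  // moves left
            int mex = 0;                                                           // smallest value not reachable
            while (seen[mex]) ++mex;
            g[r][c] = mex;
        }
    return g;
}

The top-left corner has no moves, so its reachable set is empty and its value is 0, matching the description above.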

How can I take an algorithm that works very well in a 2D space and adapt it for 3D environments?

I liked the technique Spelunky used to generate the levels in the game, and I want to adapt it for a 3D space, so that I can use it to help me design something in that 3D space. I'm just not sure how to approach this kind of problem. I feel like a direct translation for this specific approach isn't possible, but I still want to have something that feels similar, with the same sort of winding path.
This is an interesting question due to some of the challenges that arise when you start moving algorithms into higher dimensions.
Issue 1: Main Paths Occupy Little Volume
Let's start with a simple mathematical observation. The level generation algorithm works by computing a random path from the top row of the world to the bottom row of the world. At each step, that path can either move left, right, or down, but never steps back onto the cell it just came from. Assuming these options are equally likely, the first step on a particular level has a 2/3 chance of staying on the same level, and each subsequent step has at most a 1/2 chance of staying on the same level. (It's exactly 50%, ignoring walls.) That means that the expected number of tiles per level is going to be at most
1 × (1/3) + (2/3) × (1 + 2) (there's a 1/3 chance of moving down immediately and getting just the starting cell; otherwise, with probability 2/3, we get the starting cell plus the cells from a random process that halts with 50% probability at each step, which contributes 2 cells on average)
≈ 2.33
This is important, because it means that if your world were an n × n grid, you'd expect to have only about 2.33 "used" cells per level, for a total of roughly 2.33n "used" cells on the path. Given that the world is n × n, that means that a
2.33n / n^2 = 2.33 / n
fraction of the cells of the world will be on the main path, so n has to be small to keep most of the world from ending up off the main path. The video you linked picks n = 4, giving a fraction of 2.33 / 4 ≈ 58% of the world on the main path. That's a good blend between "I'd like to make progress" and "I'd like my side quest, please." And remember, this number is an overestimate of the number of path cells versus side-quest cells.
Now, let's do this in three dimensions. Same as before, we start in the top "slice," then randomly move forward, backward, left, right, or down at each step. That gives us at most a 4/5 chance of staying on our level with our first step, then from that point forward at most a 3/4 chance of staying on the level. (Again, this is an overestimate.) Doing the math again gives
1 × 1/5 + (4/5) × (1 + 4)
= 1/5 + (4/5) × 5
= 4.2
So that means that an average of 4.2 cells per level are going to be on the main path, for a total path length of 4.2n on average (again, an overestimate). In an n × n × n world, this means that the fraction of cells on the main path is
4.2n / n^3
= 4.2 / n^2
This means that your world needs to be very small for the main path not to be a trivial fraction of the overall space. For example, picking n = 3 means that just over half of the world would be off the main path and available for exploration. Picking n = 4 would give you about 74% of the world off the main path, and picking n = 5 would give you over 80% of the world off the main path.
All of this is to say that, right off the bat, you'd need to reduce the size of your world so that a main-path-based algorithm doesn't leave the world mostly empty. That's not necessarily a bad thing, but it is something to be mindful of.
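If you want to sanity-check those expectations, here's a quick Monte Carlo sketch of the walk described above. It ignores walls, so it reproduces the same overestimates; all names are illustrative.

#include <cstdio>
#include <random>

// Expected number of cells visited on one level before dropping down.
// In 2D the first step stays on the level with probability 2/3 and later steps
// with probability 1/2; in 3D the probabilities are 4/5 and 3/4.
double expectedCellsPerLevel(double pStayFirst, double pStayLater, int trials,
                             std::mt19937& rng) {
    std::uniform_real_distribution<double> coin(0.0, 1.0);
    long long total = 0;
    for (int t = 0; t < trials; ++t) {
        int cells = 1;                       // the cell we start the level in
        double pStay = pStayFirst;
        while (coin(rng) < pStay) {          // keep moving sideways on this level
            ++cells;
            pStay = pStayLater;
        }
        total += cells;
    }
    return double(total) / trials;
}

int main() {
    std::mt19937 rng(12345);
    std::printf("2D: %.3f (analysis says about 2.33)\n",
                expectedCellsPerLevel(2.0 / 3.0, 0.5, 1000000, rng));
    std::printf("3D: %.3f (analysis says about 4.2)\n",
                expectedCellsPerLevel(4.0 / 5.0, 0.75, 1000000, rng));
    return 0;
}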
Issue 2: Template Space Increases
The next issue you'll run into is building up your library of "templates" for room configurations. If you're in 2D space, there are four possible entrances and exits into each cell, and any subset of those four entrances (except, perhaps, a cell with no entrances at all) might require a template. That gives you
2^4 - 1 = 15
possible entrance/exit templates to work with, and that's just to cover the set of all possible options.
In three dimensions, there are six possible entrances and exits from each cell, so there are
2^6 - 1 = 63
possible entrance/exit combinations to consider, so you'd need a lot of templates to account for this. You can likely reduce this by harnessing rotational symmetry, but this is an important point to keep in mind.
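As a rough illustration of how much rotational symmetry buys you, here's a small sketch that groups the 63 door masks by rotation about the vertical axis (the only rotation that respects gravity). The bit layout is just an assumption for the example.

#include <cstdio>
#include <set>

// Bits 0..3 are the four horizontal doors in clockwise order; bits 4 and 5 are up and down.
int rotate90(int mask) {
    int horiz = mask & 0xF;
    horiz = ((horiz << 1) | (horiz >> 3)) & 0xF;    // cycle the four horizontal doors
    return horiz | (mask & 0x30);                   // up/down are unchanged
}

int main() {
    std::set<int> classes;
    for (int mask = 1; mask < 64; ++mask) {         // skip the doorless cell
        int canonical = mask, m = mask;
        for (int k = 0; k < 3; ++k) {
            m = rotate90(m);
            if (m < canonical) canonical = m;
        }
        classes.insert(canonical);                  // one representative per rotation class
    }
    std::printf("%zu template classes instead of 63\n", classes.size());
    return 0;
}

Under these assumptions that comes out to 23 classes instead of 63; adding mirror reflections would shrink it further.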
Issue 3: Getting Stuck
The video you linked mentions as a virtue of the 2D generation algorithm the fact that
it creates fun and engaging levels that the player can't easily get stuck in.
In 2D space, most cells, with a few exceptions, will be adjacent to the main path. In 3D space, most cells, with a few exceptions, will not be adjacent to the main path. Moreover, in 2D space, if you get lost, it's not hard to find your way back - there are only so many directions you can go, and you can see the whole world at once. In 3D, it's a lot easier to get lost, both because you can take steps that get you further off the main path than in 2D space and because, if you do get lost, there are more options to consider for how to backtrack. (Plus, you probably can't see the whole world at once in a 3D space.)
You could likely address this by just not filling the full 3D space of cells with places to visit. Instead, only allow cells that are one or two steps off of the main path to be filled in with interesting side quests, since that way the player can't get too lost in the weeds.
To Summarize
These three points suggest that, for this approach to work in 3D, you'd likely need to do the following.
Keep the world smaller than you think you might need it to be, since otherwise the main path will occupy a vanishingly small fraction of the cells.
Alternatively, consider only filling in cells adjacent to the main path, leaving the other cells inaccessible, so that the player can quickly backtrack to where they were before.
Be prepared to create a lot of templates, or to figure out how to use rotations to make your templates applicable in more places.
Good luck!

Is it better to reduce the space complexity or the time complexity for a given program?

Grid Illumination: Given an N×N grid with an array of lamp coordinates. Each lamp illuminates every square in its row, every square in its column, and every square on its diagonals (think of a Queen in chess). Given an array of query coordinates, determine whether each point is illuminated or not. The catch is that when checking a query, all lamps adjacent to, or on, that query get turned off. The ranges for the variables/arrays were about: 10^3 < N < 10^9, 10^3 < lamps < 10^9, 10^3 < queries < 10^9.
It seems like I can get one but not both. I tried to get this down to logarithmic time but I can't seem to find a solution. I can reduce the space complexity but it's not that fast; exponential, in fact. What should I focus on, speed or space? Also, if you have any input as to how you would solve this problem, please do comment.
Is it better for a car to go fast or go a long way on a little fuel? It depends on circumstances.
Here's a proposal.
First, note you can number all the diagonals that the inputs lie on by using the first point as the "origin" for both nw-se and ne-sw. The diagonals through this point are both numbered zero. The nw-se diagonal numbers increase by one per pixel toward, e.g., the northeast and decrease (go negative) toward the southwest. Similarly, the ne-sw diagonals are numbered increasing toward, e.g., the northwest and decreasing (negative) toward the southeast.
Given the origin, it's easy to write constant time functions that go from (x,y) coordinates to the respective diagonal numbers.
Now each set of lamp coordinates is naturally associated with 4 numbers: (x, y, nw-se diag #, sw-ne diag #). You don't need to store these explicitly. Rather you want 4 maps xMap, yMap, nwSeMap, and swNeMap such that, for example, xMap[x] produces the list of all lamp coordinates with x-coordinate x, nwSeMap[nwSeDiagonalNumber(x, y)] produces the list of all lamps on that diagonal, and similarly for the other maps.
Given a query point, look up its corresponding 4 lists. From these it's easy to deal with adjacent squares. If any list is longer than 3, removing adjacent squares can't make it empty, so the query point is lit. If a list has 3 or fewer lamps, it's a constant-time operation to check whether each of them is adjacent.
This solution requires the input points to be represented in 4 lists. Since they need to be represented in one list, you can argue that this algorithm requires only a constant factor of space with respect to the input. (I.e. the same sort of cost as mergesort.)
Run time is expected constant per query point for 4 hash table lookups.
Without much trouble, this algorithm can be split so it can be map-reduced if the number of lampposts is huge.
But it may be sufficient and easiest to run it on one big machine. With a billion lampposts and careful data structure choices, it wouldn't be hard to implement with 24 bytes per lamppost in an unboxed-structures language like C. So a ~32 GB RAM machine ought to work just fine. Building the maps with multiple threads requires some synchronization, but that's done only once. The queries can be read-only: no synchronization required. A nice 10-core machine ought to do a billion queries in well less than a minute.
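For concreteness, here's a minimal sketch of that scheme; names like Grid, addLamp, and isLit are illustrative. The diagonals are keyed by x - y and x + y, which groups lamps the same way as numbering them from an origin.

#include <cstdlib>
#include <unordered_map>
#include <utility>
#include <vector>

using Point = std::pair<long long, long long>;
using LampMap = std::unordered_map<long long, std::vector<Point>>;

struct Grid {
    LampMap byX, byY, byNwSe, byNeSw;

    void addLamp(long long x, long long y) {
        byX[x].push_back({x, y});
        byY[y].push_back({x, y});
        byNwSe[x - y].push_back({x, y});
        byNeSw[x + y].push_back({x, y});
    }

    // A query is lit if some lamp on its row, column, or diagonals is NOT
    // adjacent to (or on) the query point. Lists longer than 3 are lit
    // immediately; short lists are scanned in constant time.
    bool isLit(long long qx, long long qy) const {
        const LampMap* maps[4] = {&byX, &byY, &byNwSe, &byNeSw};
        long long keys[4] = {qx, qy, qx - qy, qx + qy};
        for (int i = 0; i < 4; ++i) {
            auto it = maps[i]->find(keys[i]);
            if (it == maps[i]->end()) continue;
            const std::vector<Point>& lamps = it->second;
            if (lamps.size() > 3) return true;
            for (const Point& p : lamps)
                if (std::llabs(p.first - qx) > 1 || std::llabs(p.second - qy) > 1)
                    return true;   // this lamp survives the "turn off adjacent lamps" rule
        }
        return false;
    }
};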
There is a very easy answer that works:
Create a grid of N×N.
For each lamp, increment the count of every cell that lamp illuminates.
For each query, check whether the cell at that query has a value > 0.
For each lamp adjacent to (or on) the query, find all the cells it illuminates and reduce their counts by 1.
This worked fine but failed the size limit when trying a 10,000 × 10,000 grid.

Computing the square root of 1000+ bit word in C

Imagine that we have, e.g., a 1000-bit word in our memory. I'm wondering if there is any way to calculate its square root (not necessarily exact; let's say without the fractional part). Or suppose we've only got a memory location, with the size specified separately and varying.
I assume that our large number is one array (most significant bits at the beginning?). The square root is more or less half the length of the original number. When trying to use the digit-by-digit algorithm, there is a point where an unsigned long long is not enough to hold the partial result (the subtraction with the 01-extended number). How do I solve that? And what about getting a single digit of the large number? Only by bitmasking?
While thinking about the pseudocode I got stuck on these questions. Any ideas?
How would you do it by hand? How would you divide a 1000 digit number by a 500 digit by hand? (Just think about the method, obviously it would be quite time consuming). Now with a square root, the method is very similar to division where you "guess" the first digit, then the second digit and so on and subtract things. It's just that for a square root, you subtract slightly different things (but not that different, calculating a square root can be done in a way very similar to a division except that with each digit added, the divisor changes).
I wouldn't want to tell you exactly how to do it, because that spoils the whole fun of discovering it yourself.
The trick is: Instead of thinking about the square root of x, think about finding a number y such that y*y = x. And as you improve y, recalculate x - y*y with the minimum effort.
Calculating square roots is very easily done with a binary search algorithm.
A pseudo-code algorithm:
Take a guess c: the 1000 bit value divided by 2 (simple bitshift).
Square it:
If the square (almost) equals your 1000 bit number you've got your answer
If the square is smaller than your number, you can be sure the root is between c and your upper bound.
If the square is larger than your number, you know that the root lies between your lower bound and c.
Repeat until you have found your root, while keeping track of your upper and lower bound.
This kind of algorithm should run in O(log N) time, i.e., a number of iterations roughly proportional to the bit length of the number.
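As a concrete illustration, here is that bisection on a plain 64-bit word; for a 1000-bit number the loop is identical, with the squaring and comparison done on your big-number representation. The name isqrt is just illustrative.

#include <cstdint>

// Integer square root by bisection: the largest r with r*r <= n.
uint64_t isqrt(uint64_t n) {
    uint64_t lo = 0, hi = n;                        // the root lies in [lo, hi]
    while (lo < hi) {
        uint64_t mid = lo + (hi - lo + 1) / 2;      // bias upward so the loop terminates
        // The 128-bit product avoids overflow (a GCC/Clang extension;
        // a bignum type would handle this naturally).
        if ((__uint128_t)mid * mid <= n)
            lo = mid;                               // mid^2 <= n: the root is at least mid
        else
            hi = mid - 1;                           // mid^2 > n: the root is below mid
    }
    return lo;
}

For a 1000-bit input that's roughly a thousand iterations, each dominated by one big multiplication.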
Kind of depends on how accurate you want it. Consider that the square root of 2^32 == 2^16. So one thing you could do is shift the 1000-bit number 500 bits to the right, and you have an answer that would be in the ballpark.
How well does this work? Let's see. The number 36 in binary is 100100. If I shift that to the right 3 bits, then I get 4. Hmmm ... should be 6. Pretty big error of 33%. The square root of 1,000,000 is 1,000. In binary, 1,000,000 is 1111 0100 0010 0100 0000. That's 20 bits. Shifted right 10 bits, it's 1111 0100 00, or 976. The error is 24/1000, or 2.4%.
When you get to a 1,000 bit number, the absolute error might be large, and the percentage error depends on the leading bits: in the worst case it's around 40% (a factor of sqrt(2)), but it can be quite small, as in the second example.
Depending on how you're storing the numbers, shifting a 1,000 bit number 500 bits to the right shouldn't be terribly difficult.
Newton's method is probably the way to go. At some point with Newton's method you're going to have to perform a division (in particular, when finding the next point to test), but it might be okay to approximate this to the nearest power of two and just do a bitshift instead.
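For what it's worth, here is a sketch of that idea on a 64-bit word, seeded with a power of two taken from the bit length (close in spirit to the shift estimate above); for a 1000-bit number the additions, shifts, and divisions would be done on your big-number type. The name isqrt_newton is illustrative.

#include <cstdint>

// Integer Newton (Heron) iteration: x <- (x + n/x) / 2, starting from a power of
// two that is guaranteed to be >= sqrt(n), so the iterates decrease monotonically
// down to floor(sqrt(n)).
uint64_t isqrt_newton(uint64_t n) {
    if (n < 2) return n;
    int bits = 0;
    for (uint64_t t = n; t != 0; t >>= 1) ++bits;   // number of significant bits
    uint64_t x = uint64_t(1) << ((bits + 1) / 2);   // 2^ceil(bits/2) >= sqrt(n)
    while (true) {
        uint64_t y = (x + n / x) / 2;               // one Newton step for f(x) = x^2 - n
        if (y >= x) return x;                       // no further progress: x = floor(sqrt(n))
        x = y;
    }
}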

How to quickly count the number of neighboring voxels?

I have got a 3D grid (voxels), where some of the voxels are filled, and some are not. The 3D grid is sparsely filled, so I have got a set filledVoxels with coordinates (x, y, z) of the filled voxels. What I am trying to find out is, for each filled voxel, how many neighboring voxels are filled too.
Here is an example:
filledVoxels contains the voxels (1, 1, 1), (1, 2, 1), and (1, 3, 1).
Therefore, the neighbor counts are:
(1,1,1) has 1 neighbor
(1,2,1) has 2 neighbors
(1,3,1) has 1 neighbor.
Right now I have this algorithm:
voxelCount = new Map<Voxel, Integer>();
for (voxel v in filledVoxels)
    count = checkAllNeighbors(v, filledVoxels);
    voxelCount[v] = count;
end
checkAllNeighbors() looks up all 26 surrounding voxels. So in total I am doing 26*filledVoxels.size() lookups, which is quite slow.
Is there any way to cut down the number of required lookups? When you look at the above example you can see that I am checking the same voxels several times, so it might be possible to get rid of lookups with some clever caching.
If this helps in any way, the voxels represent a voxelized 3D surface (but there might be holes in it). I usually want to get a list of all voxels that have 5 or 6 neighbors.
You can transform your voxel space into an octree in which every node contains a flag that specifies whether it contains filled voxels at all.
When a node does not contain filled voxels, you don't need to check any of its descendants.
I'd say if each of your lookups is slow (O(size)), you should optimize it by binary search in an ordered list (O(log(size))).
As for the constant 26, I wouldn't worry much about it. If you iterate smarter, you could cache something and get 26 down to 10 or so, I think, but unless you have profiled the whole application and found out decisively that this is the bottleneck, I would concentrate on something else.
As ilya states, there's not much you can do to get around the 26 neighbor look-ups. You have to make your biggest gains in efficiently identifying whether a given neighbor is filled or not. Given that the brute force solution is essentially O(N^2), you have a lot of possible ground to gain in that area. Since you have to iterate over all filled voxels at least once, I would take an approach similar to the following:
voxelCount = new Map<Voxel, Integer>();
visitedVoxels = new EfficientSpatialDataType();
for (voxel v in filledVoxels)
    for (voxel n in neighbors(v))
        if (visitedVoxels.contains(n))
            voxelCount[v]++;
            voxelCount[n]++;
        end
    next
    visitedVoxels.add(v);
next
For your efficient spatial data type, a kd-tree, as Zifre suggested, might be a good idea. In any case, you're going to want to reduce your search space by binning visited voxels.
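Here is a rough sketch of that loop, assuming coordinates fit in 21 signed bits so they can be packed into one 64-bit hash key; the packing and the names are just illustrative, and any decent spatial hash would do as the "efficient spatial data type".

#include <array>
#include <cstdint>
#include <unordered_map>
#include <unordered_set>
#include <vector>

using Voxel = std::array<int, 3>;

// Pack (x, y, z) into one hash key; assumes each coordinate fits in 21 signed bits.
static uint64_t key(int x, int y, int z) {
    return (uint64_t(uint32_t(x)) & 0x1FFFFF) |
           ((uint64_t(uint32_t(y)) & 0x1FFFFF) << 21) |
           ((uint64_t(uint32_t(z)) & 0x1FFFFF) << 42);
}

// For each filled voxel, count how many of its 26 neighbors are also filled.
// Each adjacent pair is discovered exactly once, when its second member is visited.
std::unordered_map<uint64_t, int> countNeighbors(const std::vector<Voxel>& filled) {
    std::unordered_map<uint64_t, int> count;
    std::unordered_set<uint64_t> visited;
    for (const Voxel& v : filled) {
        uint64_t kv = key(v[0], v[1], v[2]);
        count.emplace(kv, 0);                          // isolated voxels still get an entry
        for (int dx = -1; dx <= 1; ++dx)
            for (int dy = -1; dy <= 1; ++dy)
                for (int dz = -1; dz <= 1; ++dz) {
                    if (dx == 0 && dy == 0 && dz == 0) continue;
                    uint64_t kn = key(v[0] + dx, v[1] + dy, v[2] + dz);
                    if (visited.count(kn)) {           // neighbor seen earlier: credit both
                        ++count[kv];
                        ++count[kn];
                    }
                }
        visited.insert(kv);
    }
    return count;
}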
If you're marching along the voxels one at a time, you can keep a lookup table corresponding to the grid, so that after you've checked it once using IsFullVoxel() you put the value in this grid. For each voxel you're marching in you can check if its lookup table value is valid, and only call IsFullVoxel() if it isn't.
OTOH it seems like you can't avoid iterating over all neighboring voxels, either using IsFullVoxel() or the LUT. If you had some more a priori information it could help. For instance, if you knew that there were at most x neighboring filled voxels, or you knew that there were at most y neighboring filled voxels in each direction. For instance, if you know you're looking for voxels with 5 to 6 neighbors, you can stop after you've found 7 full neighbors or 22 empty neighbors.
I'm assuming that a function IsFullVoxel() exists that returns true if a voxel is full.
If most of the moves in your iteration were to neighbors, you could reduce your checking by around 25% by not looking back at the ones you just checked before you made the step.
You may find a Z-order curve a useful concept here. It lets you (with certain provisos) keep a sliding window of data around the point you're currently querying, so that when you move to the next point, you don't have to throw away many of the queries you've already performed.
Um, your question is not very clear. I'm assuming you just have a list of the filled points. In that case, this is going to be very slow, because you have to iterate through it (or use some kind of tree structure such as a kd-tree, but this will still be O(log n)).
If you can (i.e. the grid is not too big), just make a 3d array of bools. 26 lookups in a 3d array shouldn't really take that long (and there really is no way to cut down on the number of lookups).
Actually, now that I think of it, you could make it a 3d array of longs (64 bits). Each 64 bit block would hold 64 (4 x 4 x 4) voxels. When you are checking the neighbors of a voxel in the middle of the block, you could do a single 64 bit read (which would be much faster).
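A sketch of that packing, with hypothetical world dimensions, might look like this: voxel (x, y, z) lands in block (x/4, y/4, z/4), at bit (x%4) + 4*(y%4) + 16*(z%4) of that block's 64-bit word.

#include <cstdint>

const int MAXX = 100, MAXY = 100, MAXZ = 100;          // hypothetical grid size
uint64_t blocks[MAXX / 4][MAXY / 4][MAXZ / 4];         // one word per 4 x 4 x 4 block

bool isFull(int x, int y, int z) {
    uint64_t word = blocks[x / 4][y / 4][z / 4];       // fetch the whole 64-voxel block
    int bit = (x % 4) + 4 * (y % 4) + 16 * (z % 4);    // position of the voxel inside it
    return (word >> bit) & 1;
}

void setFull(int x, int y, int z) {
    blocks[x / 4][y / 4][z / 4] |= uint64_t(1) << ((x % 4) + 4 * (y % 4) + 16 * (z % 4));
}

Checking all 26 neighbors of an interior voxel then touches at most 8 of these words, since a 3 x 3 x 3 neighborhood can straddle at most two blocks along each axis.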
Is there any way to cut down the number of required lookups?
You will, at a minimum, have to perform at least one lookup per voxel. Since that's the minimum, any algorithm that performs only one lookup per voxel is optimal in that respect.
One simplistic idea is to initialize an array to hold the count for each voxel, then look at each voxel and increment the neighbors of that voxel in the array.
Pseudo C might look something like this:
#define MAXX 100
#define MAXY 100
#define MAXZ 100

int x, y, z;
char countArray[MAXX][MAXY][MAXZ];

initializeCountArray(MAXX, MAXY, MAXZ); // Set all array elements to 0

for (x = 0; x < MAXX; x++)
    for (y = 0; y < MAXY; y++)
        for (z = 0; z < MAXZ; z++)
            if (VoxelExists(x, y, z))
                incrementNeighbors(x, y, z);
You'll need to write initializeCountArray so it sets all array elements to 0.
More importantly you'll also need to write incrementNeighbors so that it won't increment outside the array. A slight speed increase here is to only perform the above algorithm on the interior voxels, then do a separate run on the outside edge voxels with a modified incrementNeighbors routine that understands there won't be neighbors on one side.
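For illustration, here is one possible incrementNeighbors, with the bounds check folded into the loop rather than split into interior and edge passes; it's just a sketch against the pseudo C above.

void incrementNeighbors(int x, int y, int z)
{
    int dx, dy, dz;
    for (dx = -1; dx <= 1; dx++)
        for (dy = -1; dy <= 1; dy++)
            for (dz = -1; dz <= 1; dz++) {
                if (dx == 0 && dy == 0 && dz == 0)
                    continue;                          // skip the voxel itself
                int nx = x + dx, ny = y + dy, nz = z + dz;
                if (nx < 0 || nx >= MAXX || ny < 0 || ny >= MAXY ||
                    nz < 0 || nz >= MAXZ)
                    continue;                          // stay inside the array
                countArray[nx][ny][nz]++;
            }
}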
This algorithm results in 1 lookup per voxel, and at most 26 byte additions per voxel. If your voxel space is sparse then this will result in very few (relative) additions. If your voxel space is very dense, you might consider reversing the algorithm - initialize the array to the value of 26 for each entry, then decrement the neighbors when a voxel doesn't exist.
The results for a given voxel (ie, how many neighbors do I have?) reside in the array. If you need to know how many neighbors voxel 2,3,5 has, just look at the byte in countArray[2][3][5].
The array will consume 1 byte per voxel. You could use less space, and possibly increase the speed a little bit by packing the bytes.
There are better algorithms if you know details about your data. For instance, a very sparse voxel space will benefit greatly from an octree, where you can skip large blocks of lookups when you already know there are no filled voxels inside. Most of these algorithms, however, would still require at least one lookup per voxel to fill their matrix, but if you are performing several operations then they may benefit more than this one operation.
