Bidirectional selection - dc.js

I have three charts: a choropleth chart, a row chart and a select menu. All three list the states of my country, and the first two show the number of people per state.
I had to create the same dimension and group three times to get interactivity between them. If I click states A, B and C on the map, they are highlighted in the row chart, and vice versa: if I select states in the row chart, they appear on the map.
The problem is that the selection only works in one direction: if I have already selected states A, B and C in the row chart and then try to select state D on the map, everything turns gray with values of 0.
Is it possible to make the selection bidirectional, or even shared among all three charts?
Thank you so much.
Best regards.

Related

Diff and merge operations on two sets of quad-tree cells

Imagine we have a set of quad-tree cells that intersect with a viewport. These cells represent areas and are used for a spatial query.
After the viewport is moved we get a new set of intersecting quad-tree cells.
Since some of the cells are identical and were already queried, we don't need to query all cells from the second set. However, in addition to identical cells, a cell can be contained within an already queried cell, as cells can have different depths.
The viewport can be panned and zoomed in or out.
I would like to calculate two new sets from these two sets: a diff-set that contains only the new cells we need to query, and a combined-set that contains all queried cells after the second set has also been queried. The combined set should also merge 4 lower-level cells into a single higher-level cell in case all 4 of them have been queried.
Are there any well known algorithms for these problems? I feel like I'm reinventing the wheel, but I have no idea what keywords to search for.
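One possible sketch of the two sets, under an assumption that is not part of the question: each cell is identified by a string of quadrant digits ("0" to "3", one digit per level, "" for the root), so that a cell contains another exactly when its key is a prefix of the other key. The function names are hypothetical. A new cell that only partially overlaps already queried, deeper cells (zooming out) is simply reported as new here rather than being split.

```python
def diff_set(queried, new_cells):
    """Cells in new_cells not already covered by a queried cell.

    Assumption: a cell key is a string of quadrant digits, and a cell
    covers every cell whose key starts with its own key.
    """
    queried = set(queried)
    return [c for c in new_cells
            if not any(c[:i] in queried for i in range(len(c) + 1))]


def combined_set(queried, new_cells):
    """Union of both sets, with four complete siblings merged into their parent."""
    cells = set(queried) | set(new_cells)
    # Drop cells that are already covered by a proper ancestor.
    cells = {c for c in cells
             if not any(c[:i] in cells for i in range(len(c)))}
    changed = True
    while changed:
        changed = False
        # Deepest cells first, so merges can cascade upwards.
        for c in sorted(cells, key=len, reverse=True):
            if not c or c not in cells:
                continue
            parent = c[:-1]
            siblings = {parent + d for d in "0123"}
            if siblings <= cells:
                cells -= siblings
                cells.add(parent)
                changed = True
    return cells
```

With queried cells {"0", "10"} and new cells "01", "10", "11", "12", "13", only "11", "12" and "13" need to be queried, and the combined set collapses to {"0", "1"}.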

What is the right scale for highly divergent values in d3 data visualization?

I have two maps that plot values against counties on the United States.
Values being plotted on both are highly divergent.
On the first, around 2.5k of 3.3k counties have values of 0, while the remaining counties have values ranging up to 257,519,000.
This map visualizes just fine using scaleLog like so:
color = d3
  .scaleLog()
  .domain([d3.max(data), d3.min(data)])
  .range(["black", "purple"])
The next map has 0 values for about 3.2k of 3.3k counties, with the remaining c. 100 counties having values that range up to 881,587,418. The values are substantially more divergent.
Using a logarithmic scale to assign color does not work on this second map. All values are black.
What would be the best scale to use here? Or is there another technique for plotting mostly empty, highly divergent values in D3?
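One thing worth checking is how the zeros are handled: a logarithm has no value at 0, so a pure log scale cannot place those counties. A common workaround is a symmetric-log style transform, log(1 + x / c), which is defined at 0 and still compresses large values (D3 offers this idea as d3.scaleSymlog). The sketch below only illustrates the arithmetic in Python; the function names and the constant are assumptions, not the asker's code.

```python
import math


def log_position(value, lo, hi):
    """Position of value in [0, 1] on a pure log scale from lo to hi."""
    return (math.log(value) - math.log(lo)) / (math.log(hi) - math.log(lo))


def symlog_position(value, hi, constant=1.0):
    """Position of value in [0, 1] using log(1 + x / constant).

    'constant' is an assumed tuning parameter: larger values keep
    more of the low range close to linear.
    """
    return math.log1p(value / constant) / math.log1p(hi / constant)


hi = 881_587_418
for v in (0, 1_000, 1_000_000, hi):
    print(v, round(symlog_position(v, hi), 3))
```

The resulting position in [0, 1] could then be fed to a color interpolator. Drawing the zero counties in a separate neutral color is another option when almost all values are 0.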

Minimum number of rectangular regions to fill a grid

Suppose we have a grid and we want to paint rectangular regions on it using the smallest number of colors possible, one for each region.
There are some cells that are already painted black and cannot be painted over:
Is there a polynomial algorithm to solve this problem?
After testing, I found out that the solution for this case is 9 (because we need 9 different colors to paint the minimum number of regions to fill the whole grid):
The greedy approach seems to work well: just search for the rectangle with biggest (white) area and paint it, repeating this until there's nothing else to be painted, but I didn't measure the complexity or the correctness.
Here are a few observations that can simplify this problem in specific cases. First of all, adjacent identical rows and columns can be reduced to one row or column without changing the required number of regions, to form a simplified grid:
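As a rough sketch of this first simplification, assuming the grid is stored as a list of rows in which 1 marks a black cell and 0 an unpainted one (the representation and the function name are assumptions made for illustration):

```python
def collapse_identical(grid):
    """Collapse runs of identical adjacent rows, then identical adjacent columns."""

    def dedupe(lines):
        out = []
        for line in lines:
            # Keep a line only if it differs from the previously kept one.
            if not out or out[-1] != line:
                out.append(line)
        return out

    rows = dedupe([tuple(row) for row in grid])
    cols = dedupe(list(zip(*rows)))  # transpose, dedupe the columns
    return [list(row) for row in zip(*cols)]  # transpose back


grid = [
    [0, 0, 1],
    [0, 0, 1],
    [1, 1, 0],
]
print(collapse_identical(grid))
```

A single pass over rows and then columns is enough here: two adjacent rows that differ still differ after duplicate columns are removed, because one copy of every distinct column is kept.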
A simplified grid where no row or column is divided into more than two uncoloured parts (i.e. has two or more separate black cells) has an optimal solution which can be found by using the rows or columns as regions (depending on whether the width or height of the grid is greater):
The number of regions is then minimum(width, height) + number of black cells.
If a border row or column in a simplified grid contains no black cells, then using it as a region is always the optimal solution; adding some parts of it to other regions would require at least one additional region to be made in the border row or column (depending on the number of black cells in the adjacent row or column):
This means that the grid can be further simplified by removing border rows and columns with no black cells, and adding the number of removed regions to the region count:
Similarly, if one or more border cells are isolated by a black cell in the adjacent row or column, all the connected uncoloured neighbouring cells can be regarded as one region:
At each point you can go back to previous rules; e.g. after the right- and left-most columns have been turned into regions in the example above, we are left with the grid below, which can be simplified with the first rule, because the bottom two rows are identical:
Collapsing identical adjacent rows or columns can also be applied locally to isolated parts of the grid. The example below has no identical adjacent rows, but the center part is isolated, so there rows 3 to 6 can be collapsed:
And on the left, rows 3 and 4 can be collapsed locally, and on the right rows 5 and 6, so we end up with the situation in the third image above. These collapsed cells then act as one.
Once you can't find any further simplifications using the rules above, and you want to check every possible division of (part of) a grid, a first step could be to list the maximum rectangle sizes that can be made with the corresponding cell as their top left corner; for the simplified 6x7 grid in the first example above that would be:
        COL.1            COL.2       COL.3       COL.4       COL.5  COL.6
ROW 1   [6x1, 3x3, 1x7]  [5x1, 2x3]  [4x1, 1x7]  [3x1]       [2x5]  [1x7]
ROW 2   [3x2, 1x6]       [2x2]       [1x6]       []          [2x4]  [1x6]
ROW 3   [6x1, 1x5]       [5x1]       [4x3, 2x5]  [3x3, 1x5]  [2x3]  [1x5]
ROW 4   [1x4]            []          [4x2, 2x4]  [3x2, 1x4]  [2x2]  [1x4]
ROW 5   [6x1, 4x3]       [5x1, 3x3]  [4x1, 2x3]  [3x1, 1x3]  [2x1]  [1x3]
ROW 6   [4x2]            [3x2]       [2x2]       [1x2]       []     [1x2]
ROW 7   [6x1]            [5x1]       [4x1]       [3x1]       [2x1]  [1x1]
You can then use these maximum sizes to generate every option for each cell; e.g. for cell (1,1) they would be:
6x1, 5x1, 4x1, 3x3, 3x2, 3x1, 2x3, 2x2, 2x1, 1x7, 1x6, 1x5, 1x4, 1x3, 1x2, 1x1
(Some rectangle sizes in the list can be skipped; e.g. it never makes sense to use the 3x1-sized region without adding the fourth isolated cell to get 4x1.)
After choosing an option, you would skip the cells which are covered by the rectangle you've chosen and try each option for the next cell, and so on...
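The two steps described above (listing the maximum rectangles per top-left cell, then expanding them into every option) could be sketched as follows. The grid representation (a list of rows, True for a black cell) and the function names are assumptions made for illustration; sizes are written as (width, height), matching the WxH notation used in the table.

```python
def maximal_rectangles(grid, row, col):
    """Maximal (width, height) rectangles of free cells with top-left (row, col)."""
    if grid[row][col]:
        return []
    n_rows, n_cols = len(grid), len(grid[0])
    result = []
    width = n_cols - col  # widest rectangle still possible
    height = 0
    for r in range(row, n_rows):
        # Length of the free run in row r, capped by the current width.
        run = 0
        while run < width and not grid[r][col + run]:
            run += 1
        if run == 0:
            break
        if height > 0 and run < width:
            # The wider rectangle cannot grow further down: record it.
            result.append((width, height))
        width = run
        height += 1
    result.append((width, height))
    return result


def expand_options(maxima):
    """Every (width, height) that fits inside at least one maximal rectangle."""
    options = {(w, h)
               for max_w, max_h in maxima
               for w in range(1, max_w + 1)
               for h in range(1, max_h + 1)}
    return sorted(options, reverse=True)


# The maxima listed for cell (1,1) in the table: 6x1, 3x3 and 1x7.
print(len(expand_options([(6, 1), (3, 3), (1, 7)])))  # 16 options, as listed above
```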
Running this on large grids will lead to huge numbers of options. However, at each point you can go back to checking whether the simplification rules can help.
To see that a greedy algorithm, which selects the largest rectangles first, cannot guarantee an optimal solution, consider the example below. Selecting the 2x2 square in the middle would lead to a solution with 5 regions, while several solutions with only 4 regions exist.

How do I group together points that are in the same curved "row"?

Let's say I have a list of points (x,y) that correspond to the black dots in the image below, which is a rectangular grid. Here, there are four curved "rows", and eight "columns".
How would I group each row of points together? In other words, in the image below, how do I group together the first row of points circled in blue (let's call this Group 1), and group together the second row of points circled in blue (let's call this Group 2), etc.?
My initial intuition says to start with the top-left point, and search for the closest point using a distance metric that would penalize the y-distance between two points. However, the problem I run into is that when I reach the last point in the first row, how do I know that the row is "complete", and I shouldn't add the right-most point of the 2nd row to my group of points?
Is there a better approach to this type of problem?
That highly depends on how the points are distributed.
For this special case a simple solution would be:
sort points by x
split the sorted list into groups of 4 consecutive points (those are your columns)
sort each column by y
pick the first element of each column and put it into row 1
pick the second element of each column and put it into row 2
...
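The steps above can be sketched as follows. This assumes, as the answer does, that every column holds the same number of points and that columns do not overlap in x; the function name and the sample points are made up for illustration.

```python
def group_rows(points, rows_per_column):
    """Group (x, y) points into rows by sorting on x, chunking, then sorting on y."""
    by_x = sorted(points, key=lambda p: p[0])
    # Each chunk of consecutive points is one column, sorted top to bottom.
    columns = [
        sorted(by_x[i:i + rows_per_column], key=lambda p: p[1])
        for i in range(0, len(by_x), rows_per_column)
    ]
    # The k-th point of every column belongs to row k.
    return [[col[k] for col in columns] for k in range(rows_per_column)]


# Two gently curved rows, three columns.
points = [(2.1, 1.1), (0, 0), (1, 0.3), (0.1, 1.0), (2, 0.1), (1.1, 1.3)]
for row in group_rows(points, 2):
    print(row)
```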

Is there an algorithm to determine contiguous colored regions in a grid?

Given a basic grid (like a piece of graph paper), where each cell has been randomly filled in with one of n colors, is there a tried and true algorithm out there that can tell me what contiguous regions (groups of cells of the same color that are joined at the side) there are? Let's say n is something reasonable, like 5.
I have some ideas, but they all feel horribly inefficient.
The best possible algorithm is O(number of cells), and is not related to the number of colors.
This can be achieved by iterating through the cells, and every time you visit one that has not been marked as visited, do a graph traversal to find all the contiguous cells in that region, and then continue iterating.
Edit:
Here's a simple pseudo code example of a depth first search, which is an easy to implement graph traversal:
function visit(cell) {
    if cell.marked return
    cell.marked = true
    foreach neighbor in cell.neighbors {
        if cell.color == neighbor.color {
            visit(neighbor)
        }
    }
}
In addition to recursive's recursive answer, you can use a stack if recursion is too slow:
function visit(cell) {
    stack = new stack
    stack.push cell
    while not stack.empty {
        cell = stack.pop
        if cell.marked continue
        cell.marked = true
        foreach neighbor in cell.neighbors {
            if cell.color == neighbor.color {
                stack.push neighbor
            }
        }
    }
}
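For reference, a runnable sketch of the stack-based traversal combined with the outer loop over all cells. It assumes the grid is a list of rows of color values and that regions are joined only at the sides; the function name and the label representation are choices made for this example.

```python
def label_regions(grid):
    """Return (labels, count): a region id per cell and the number of regions."""
    n_rows, n_cols = len(grid), len(grid[0])
    labels = [[-1] * n_cols for _ in range(n_rows)]  # -1 means not visited yet
    count = 0
    for r0 in range(n_rows):
        for c0 in range(n_cols):
            if labels[r0][c0] != -1:
                continue
            # Start a new region and flood it with an explicit stack.
            labels[r0][c0] = count
            stack = [(r0, c0)]
            while stack:
                r, c = stack.pop()
                for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
                    if (0 <= nr < n_rows and 0 <= nc < n_cols
                            and labels[nr][nc] == -1
                            and grid[nr][nc] == grid[r][c]):
                        labels[nr][nc] = count
                        stack.append((nr, nc))
            count += 1
    return labels, count


grid = [
    [1, 1, 2],
    [3, 1, 2],
    [3, 3, 1],
]
labels, count = label_regions(grid)
print(count)
```

Each cell is labeled once and each neighbor pair is examined a constant number of times, which matches the O(number of cells) bound mentioned above.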
You could try doing a flood fill on each square. As the flood spreads, record the grid squares in an array or something, and colour them in an unused colour, say -1.
The Wikipedia article on flood fill might be useful to you here: http://en.wikipedia.org/wiki/Flood_fill
Union-find would work here as well. Indeed, you can formulate your question as a problem about a graph: the vertices are the grid cells, and two vertices are adjacent if their grid cells are neighbors and have the same color. You're trying to find the connected components.
The way you would use a union-find data structure is as follows: first create a union-find data structure with as many elements as you have cells. Then iterate through the cells, and union two adjacent cells if they have the same color. In the end, run find on each cell and store the response. Cells with the same find are in the same contiguous colored region.
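A minimal sketch of that procedure, assuming the grid is a list of rows of color values and cells are numbered row by row; the function name and the path-halving detail in find are choices made for this example.

```python
def regions_union_find(grid):
    """Return, for each cell, the representative of its same-colored region."""
    n_rows, n_cols = len(grid), len(grid[0])
    parent = list(range(n_rows * n_cols))  # every cell starts as its own set

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    for r in range(n_rows):
        for c in range(n_cols):
            i = r * n_cols + c
            # Checking only right and down visits every adjacent pair once.
            if c + 1 < n_cols and grid[r][c] == grid[r][c + 1]:
                union(i, i + 1)
            if r + 1 < n_rows and grid[r][c] == grid[r + 1][c]:
                union(i, i + n_cols)

    return [[find(r * n_cols + c) for c in range(n_cols)] for r in range(n_rows)]


roots = regions_union_find([[1, 1, 2], [3, 1, 2], [3, 3, 1]])
print(len({root for row in roots for root in row}))  # number of regions
```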
If you want a little more fine grain control, you might think about using the A* algorithm and use the heuristic to include similarly colored tiles.
You iterate through the cells in scanline order, left to right and top to bottom. Each cell holds a reference to a list of cells, and that list object is shared between all the cells it contains. For each cell, you add the current cell to its list (either the one already shared with it or a newly created one). Then, if the cell to the right or below has the same color, you share that list with that cell. If that cell already has a list, you combine the two lists and replace the reference in every cell listed in them with the new merged list.
At the end, each cell holds a reference to a list that contains every cell contiguous with it. This effectively shares the work of the flood fill between all cells rather than repeating it for each cell. Since you have the lists, replacing the data with the merged data is just a matter of iterating through a list. It will be O(n*c), where n is the number of cells and c is a measure of how contiguous the graph is. A completely disjoint grid will take n time. A completely contiguous one-color graph will take n^2/2.
I heard this question in a video and also found it here and I came up with what is the best approach I have seen in my searching. Here are the basic steps of the algorithm:
Loop through the array (assuming the grid of colors is represented as a 2-dimensional array) from top-left to bottom-right.
When you go through the first row just check the color to the left to see if it is the same color. When you go through all subsequent rows, check the cell above and the cell to the left - this is more efficient than checking to the top, bottom, left and right every time. Don't forget to check that the left cell is not out of bounds.
Create a Dictionary of type <int,Dictionary<int,Hashset<cell>>> for storing colors and groups within those colors. The Hashset contains cell locations (cell object with 2 properties: int row, int column).
If the cell is not connected at the top or left to a cell of the same color then create a new Dictionary entry, a new color group within that entry, and add the current cell to that group (Hashset). Else it is connected to another cell of the same color; add the current cell to the color group containing the cell it's connected to.
If at some point you encounter a cell that has the same color at the top and left, if they both belong to the same color group then that's easy, just add the current cell to that color group. Else check the kitty-corner cell to the top-left. If it is a different color than the current cell and the cell to the top and cell to the left belong to different color groups --> merge the 2 color groups together; add the current cell to the group.
Finally, loop through all of the Hashsets to see which one has the highest count - this will be the return value.
Here is a link to a video I made with visual and full explanation:
https://d.tube/#!/v/israelgeeksout77/wm2ax1vpu3y
P.S. I found this post on GeeksForGeeks https://www.geeksforgeeks.org/largest-connected-component-on-a-grid/
They conveniently posted source code to this problem in several languages! But I tried their code vs. mine and mine ran in about 1/3 of the time.

Resources