How can I use Material-UI's TreeView, where all the data lives in a single source-of-truth JavaScript data structure, and avoid rerendering parts of the subtree that have not changed (using an identity check on the inputs) when modifying the data (adding/removing/renaming nodes, changing the selection, changing the open/close state, etc.)? There are no examples of this on https://material-ui.com/components/tree-view/, and with the "rich object" example on large-ish wide trees (~1000 elements, for example a full binary tree), rerendering everything takes about one second on my computer.
By doing it manually, or by using immutable data structures libraries, it is possible for the input tree to reuse subtrees from previous renders when modified. For example, given the following tree,
    A
   / \
  B   C
 / \   \
D   E   F
If you want to add a child X to F, then you would create new nodes for X, F, C, and A, but you could keep the B, D, and E nodes.
How can we take advantage of this to get acceptable performance (<50ms I would say, ideally only a few ms) for large wide trees?
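For what it's worth, here is a minimal sketch of that path-copying idea (in F#, to make the sharing explicit; the TreeNode shape and addChild helper are hypothetical, not Material-UI's API). The same pattern carries over to JavaScript, where the preserved references are exactly what a memoized item component can compare to skip rerendering:

    // Hypothetical node shape, not Material-UI's API.
    type TreeNode = { Id: string; Children: TreeNode list }

    // Return a new tree with `child` added under the node named `targetId`,
    // copying only the nodes on the path to the target and sharing the rest.
    let rec addChild targetId child (node: TreeNode) =
        if node.Id = targetId then
            { node with Children = node.Children @ [child] }
        else
            let children' = node.Children |> List.map (addChild targetId child)
            // If no child actually changed, return the original node itself,
            // so an identity check on this subtree still succeeds.
            if List.forall2 (fun a b -> obj.ReferenceEquals(a, b)) node.Children children'
            then node
            else { node with Children = children' }

After adding X under F in the tree above, the node returned for B is the very same object as before, so a renderer that compares its inputs by identity can skip B's entire subtree.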
Related
I'm looking for the best way to fill requirements depicted by a series of trees, and a map of data. A basic tree might look like this:
  10
 /  \
A    B
And a map of data might look like this:
A: 7
B: 6
In these examples, the 10 represents a requirement, while the data list is what I have to work with. So I could "fill" this tree by giving it 4 As and 6 Bs, or 5 of each, etc. Now, I want to use all the As and Bs available to me, and having a surplus isn't necessarily an issue (so I'm going to give 7 and 6 in this case). But things get more complicated: we can have multiple trees, and trees can have multiple levels, where every node but the leaves is a requirement, potentially giving us something like this:
    40         30
   / | \       / \
 20  C  D     A   C
 / \
A   B
So we would need the A and B on the first tree to add to 20, the C and D on the first tree to add to 20 (since the 20-node itself contributes its 20 toward the 40), and the A and C on the second tree to add to 30. (No tree should have the same letter appearing twice.) We can have any number of levels in a tree, or any number of trees.
Lastly, our data set may not be perfect. It may not be possible to fill both trees up all the way after optimization (we might have both trees falling short of their requirements, we might have one tree surpassing requirements while the other falls short, etc.) What I need is a way to, given these trees and a list of how many As, Bs, Cs, etc. we have available, fill up as many trees as possible. We've been at this for a while, but none of us are good enough at proofs to say "this way will work every time".
Does anyone know of a way to do this?
It's a max-flow problem: https://en.wikipedia.org/wiki/Maximum_flow_problem
But you need to modify the graph.
You have a source. The source is connected to your resources (A, B, C) by edges whose capacities are the amounts of those resources available. Then you connect all your trees to the resources, modifying each tree so that a node's required throughput becomes the capacity of its outgoing edge.
Finally, the outputs of all your trees go to a single sink node.
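To make the construction concrete, here is a sketch of the edge list for the two example trees (the available amounts for C and D are made up, and maxFlow stands in for any off-the-shelf solver such as Edmonds-Karp, which is not shown):

    // Edges as (from, to, capacity).
    let unbounded = 1000000  // effectively infinite for this sketch

    let edges =
        [ // source -> resources; capacity = amount available (C and D amounts are made up)
          "source", "A", 7
          "source", "B", 6
          "source", "C", 9
          "source", "D", 9
          // resources -> the requirement nodes that may consume them
          "A", "t1.20", unbounded; "B", "t1.20", unbounded
          "C", "t1.40", unbounded; "D", "t1.40", unbounded
          "A", "t2.30", unbounded; "C", "t2.30", unbounded
          // each requirement node's value becomes the capacity of its outgoing edge
          "t1.20", "t1.40", 20
          "t1.40", "sink", 40
          "t2.30", "sink", 30 ]

    // let flow = maxFlow edges "source" "sink"   // hypothetical solver call
    // A saturated root edge (t1.40 -> sink or t2.30 -> sink) means that tree is filled.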
Sequential traversal is the main difference between linear and non-linear data structures. Can anyone explain it briefly?
A linear data structure is something like this:
A
B
C
D
E
For instance, lists and arrays. Each element is followed by a single element. Traversal is trivial, as you simply go from one element to the next: if you start at A, you have only one next element, B; from B you have only one next element, C; and so on.
A non-linear data structure is something like this:
    A
   / \
  B   C
 / \ / \
D  E F  G
For instance, a tree. Notice how A is followed by two elements, B and C, and each of them is in turn followed by two elements. Now traversal is more complex, because once you start from A, you have a choice of going to either B or C. What's more, once at B, you have a choice of going further down or going "sideways" to C. In this case (a tree), your traversal options are breadth-first or depth-first.
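To make the two options concrete, here is a small sketch of both traversals over the tree above, using a homegrown rose-tree type (no particular library is assumed):

    // A minimal rose tree; each node holds a label and a list of children.
    type Tree = Node of string * Tree list

    let t =
        Node ("A", [ Node ("B", [ Node ("D", []); Node ("E", []) ])
                     Node ("C", [ Node ("F", []); Node ("G", []) ]) ])

    // Depth-first: go as deep as possible before moving sideways.
    let rec dfs (Node (v, children)) =
        v :: List.collect dfs children

    // Breadth-first: visit a whole level before descending to the next.
    let rec bfs level =
        match level with
        | [] -> []
        | _ ->
            let values = level |> List.map (fun (Node (v, _)) -> v)
            let next = level |> List.collect (fun (Node (_, cs)) -> cs)
            values @ bfs next

    // dfs t   => ["A"; "B"; "D"; "E"; "C"; "F"; "G"]
    // bfs [t] => ["A"; "B"; "C"; "D"; "E"; "F"; "G"]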
We have a social graph that is later broken into clusters of high cohesion, something called a truss by Jonathan Cohen [1].
Now that I have those clusters, I would like to come up with names for them.
Cluster name should allow insignificant changes to the cluster size without changing the name.
For example:
Let's assume we have cluster M:
M : {A, B, C, D, E, F}
and let's assume that the "naming algorithm" generated the name "m" for it.
After some time, vertex A has left the cluster, while vertex J has joined:
M : {B, C, D, E, F, J}
The newly generated name is "m'".
Desired feature:
m' == m for insignificant cluster changes
[1] http://www.cslu.ogi.edu/~zak/cs506-pslc/trusses.pdf
Based on your example, I assume you mean "insignificant changes to the cluster composition", not to the "cluster size".
If your naming function f() cannot use the information about the existing name for the given cluster, you would have to allow that sometimes it does rename despite the change being small. Indeed, suppose that f() never renames a cluster when it changes just a little. Starting with cluster A, you can get to any other cluster B by adding or removing only one element at a time. By construction, the function will return the same name for A and B. Since A, B were arbitrary, f() will return the same name for all possible clusters - clearly useless.
So, you have two alternatives:
(1) the naming function relies on the existing name of a cluster, or
(2) the naming function sometimes (rarely) renames a cluster after a very tiny change.
If you go with alternative (1), it's trivial. You can simply assign names randomly, and then keep them unchanged whenever the cluster is updated as long as it's not too different (however you define different). Given how simple it is, I suppose that's not what you want.
If you go with alternative (2), you'll need to use some information about the underlying objects in the cluster. If all you have are links to various objects with no internal structure, it can't be done, since the function wouldn't have anything to work with apart from cluster size.
So let's say you have some information about the objects. For example, you may have their names. Call the first k letters of each object's name the object's prefix. Count all the different prefixes in your cluster, and find the n most common ones. Order these n prefixes alphabetically, and append them to each other in that order. For a reasonable choice of k, n (which should depend on the number of your clusters and typical object name lengths), you would get the result you seek - as long as you have enough objects in each cluster.
For instance, if objects have human names, try k = 2; and if you have hundreds of clusters, perhaps try n = 2.
This of course, can be greatly improved by remapping names to achieve a more uniform distribution, handling the cases where two prefixes have similar frequencies, etc.
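A tiny sketch of that scheme (the function clusterName and its parameters are made-up names; ties between equally common prefixes are broken arbitrarily here):

    // Name a cluster by the n most common k-letter prefixes of its members' names.
    let clusterName (k: int) (n: int) (members: string list) =
        members
        |> List.map (fun name -> name.Substring(0, min k name.Length))
        |> List.countBy id                // count each distinct prefix
        |> List.sortByDescending snd      // most common prefixes first
        |> List.truncate n
        |> List.map fst
        |> List.sort                      // alphabetical order
        |> String.concat ""

    // clusterName 2 2 ["alice"; "albert"; "alfred"; "bob"; "carol"]  => "albo"
    // Small membership changes rarely disturb the top prefixes, so the name is stable.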
If I have a tree structure whose nodes can have zero to many children, with each node holding some data value along with a boolean switch, how do I minimally represent the state of this tree for nodes with a particular switch value?
For example, say that my tree looks something like:
A[0] -> B[1] -> C[1]
            |-> D[1]
            |-> E[1]
Here we have a state where four nodes are checked. Is there a way to represent this state in a concise manner? The naive approach would be to list the four nodes as being checked, but what if node B had 100 children instead of just three?
My current line of thinking is to store each node's ancestors in the data component and describe the checked state in terms of the set of ancestors that minimizes the data required to represent a state. In the tree below, an ancestor of node N is represented as n'. So the above tree would now look something like:
A[0, {a}] -> B[1, {a', b}] -> C[1, {a', b', c}]
                          |-> D[1, {a', b', d}]
                          |-> E[1, {a', b', e}]
Now you can analyze the tree and see that all of node A's children are checked, and describe the state simply as "the nodes with data element a' are set to 1", or just [a']. If node D's state switched to 0, then you could describe the tree state as [a' not d].
Are there data structures or algorithms that can be used to solve a problem of this type? Any thoughts on a better approach? Any thoughts on the analysis algorithm?
Thanks
Use a preorder tree traversal starting from the root. If a node is checked, don't traverse its children. For each traversed node, store its checked state (boolean 0/1) in a bitmap (8 bits/byte). Finally, compress the result with zip/bzip or any other compression technique.
When you reconstruct the state, first decompress, then use a preorder tree traversal and set each node based on the stored state; if a node's state is checked, set all of its children to checked and skip them.
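A rough sketch of this scheme, before the compression step, assuming a simple rose-tree type that carries the checked flag (the type and function names are my own):

    // Each node: a checked flag plus children.
    type Tree = Node of bool * Tree list

    // Encode: preorder; emit one bit per visited node, and do not descend
    // into the children of a checked node.
    let rec encode (Node (isChecked, children)) =
        if isChecked then [true]
        else false :: List.collect encode children

    // Decode: walk the same tree shape in the same order, consuming bits.
    // A true bit checks the node and its entire subtree without reading further.
    let rec decode bits (Node (_, children)) =
        match bits with
        | true :: rest ->
            let rec fill (Node (_, cs)) = Node (true, List.map fill cs)
            Node (true, List.map fill children), rest
        | false :: rest ->
            let children', rest' =
                children
                |> List.fold (fun (acc, bs) c ->
                       let c', bs' = decode bs c
                       c' :: acc, bs') ([], rest)
            Node (false, List.rev children'), rest'
        | [] -> failwith "ran out of bits"

    // Round trip: decode (encode t) t reconstructs t, with everything below a
    // checked node marked checked, exactly as described above.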
In general there is no technique that will always be able to store the checked elements in fewer than n bits of space, where n is the number of elements in the tree. The rationale behind this is that there are 2^n different possible check states, so you need at least 2^n different encodings, and at least one of those encodings must be n or more bits long, since there are only 2^n - 1 bit strings shorter than n bits.
Given this, if you really want to minimize space usage, I would suggest going with an encoding like the one @yi_H suggests. It uses precisely n bits for each encoding. You might be able to compress most of the encodings by applying a standard compression algorithm to the bits, which for practical sets of checked nodes might do quite well, but which degrades gracefully in the worst case.
Hope this helps!
I'm trying to figure out how non-destructive manipulation of large collections is implemented in functional programming, i.e. how it is possible to alter or remove single elements without having to create a completely new collection where all elements, even the unmodified ones, will be duplicated in memory. (Even if the original collection were garbage-collected, I'd expect the memory footprint and general performance of such a collection to be awful.)
This is how far I've got until now:
Using F#, I came up with a function insert that splits a list into two pieces and introduces a new element in-between, seemingly without cloning all unchanged elements:
// return a list without its first n elements:
// (helper function)
let rec skip list n =
    if n = 0 then
        list
    else
        match list with
        | [] -> []
        | x::xs -> skip xs (n-1)

// return only the first n elements of a list:
// (helper function)
let rec take list n =
    if n = 0 then
        []
    else
        match list with
        | [] -> []
        | x::xs -> x::(take xs (n-1))

// insert a value into a list at the specified zero-based position:
let insert list position value =
    (take list position) @ [value] @ (skip list position)
I then checked whether objects from an original list are "recycled" in new lists by using .NET's Object.ReferenceEquals:
open System
let (===) x y =
    Object.ReferenceEquals(x, y)
let x = Some(42)
let L = [Some(0); x; Some(43)]
let M = Some(1) |> insert L 1
The following three expressions all evaluate to true, indicating that the value referred to by x is re-used both in lists L and M, ie. that there is only 1 copy of this value in memory:
L.[1] === x
M.[2] === x
L.[1] === M.[2]
My question:
Do functional programming languages generally re-use values instead of cloning them to a new memory location, or was I just lucky with F#'s behaviour? Assuming the former, is this how reasonably memory-efficient editing of collections can be implemented in functional programming?
(Btw.: I know about Chris Okasaki's book Purely Functional Data Structures, but haven't yet had the time to read it thoroughly.)
I'm trying to figure out how non-destructive manipulation of large collections is implemented in functional programming, i.e. how it is possible to alter or remove single elements without having to create a completely new collection where all elements, even the unmodified ones, will be duplicated in memory.
This page has a few descriptions and implementations of data structures in F#. Most of them come from Okasaki's Purely Functional Data Structures, although the AVL tree is my own implementation since it wasn't present in the book.
Now, since you asked about reusing unmodified nodes, let's take a simple binary tree:
type 'a tree =
    | Node of 'a tree * 'a * 'a tree
    | Nil

let rec insert v = function
    | Node(l, x, r) as node ->
        if v < x then Node(insert v l, x, r)   // reuses x and r
        elif v > x then Node(l, x, insert v r) // reuses x and l
        else node
    | Nil -> Node(Nil, v, Nil)
Note that we re-use some of our nodes. Let's say we start with this tree: [tree diagram omitted]
When we insert an e into the tree, we get a brand new tree, with some of the nodes pointing back at our original tree: [diagram omitted]
If we don't have a reference to the xs tree above, then .NET will garbage collect any nodes without live references, specifically the d, g, and f nodes.
Notice that we've only modified nodes along the path of our inserted node. This is pretty typical in most immutable data structures, including lists. So, the number of nodes we create is exactly equal to the number of nodes we need to traverse in order to insert into our data structure.
Do functional programming languages generally re-use values instead of cloning them to a new memory location, or was I just lucky with F#'s behaviour? Assuming the former, is this how reasonably memory-efficient editing of collections can be implemented in functional programming?
Yes.
Lists, however, aren't a very good data structure, since most non-trivial operations on them require O(n) time.
Balanced binary trees support O(log n) inserts, meaning we create O(log n) copies on every insert. Since log2(10^15) is ~= 50, the overhead is very very tiny for these particular data structures. Even if you keep around every copy of every object after inserts/deletes, your memory usage will increase at a rate of O(n log n) -- very reasonable, in my opinion.
How it is possible to alter or remove single elements without having to create a completely new collection where all elements, even the unmodified ones, will be duplicated in memory.
This works because no matter what kind of collection, the pointers to the elements are stored separately from the elements themselves. (Exception: some compilers will optimize some of the time, but they know what they are doing.) So for example, you can have two lists that differ only in the first element and share tails:
let shared = ["two"; "three"; "four"]
let l = "one" :: shared
let l' = "1a" :: shared
These two lists have the shared part in common and their first elements different. What's less obvious is that each list also begins with a unique pair, often called a "cons cell":
List l begins with a pair containing a pointer to "one" and a pointer to the shared tail.
List l' begins with a pair containing a pointer to "1a" and a pointer to the shared tail.
If we had only declared l and wanted to alter or remove the first element to get l', we'd do this:
let l' =
    match l with
    | _ :: rest -> "1a" :: rest
    | [] -> raise (Failure "cannot alter 1st elem of empty list")
There is constant cost:
Split l into its head and tail by examining the cons cell.
Allocate a new cons cell pointing to "1a" and the tail.
The new cons cell becomes the value of list l'.
If you're making point-like changes in the middle of a big collection, typically you'll be using some sort of balanced tree which uses logarithmic time and space. Less frequently you may use a more sophisticated data structure:
Gerard Huet's zipper can be defined for just about any tree-like data structure and can be used to traverse and make pointlike modifications at constant cost. Zippers are easy to understand (see the sketch after this list).
Paterson and Hinze's finger trees offer very sophisticated representations of sequences, which among other tricks enable you to change elements in the middle efficiently—but they are hard to understand.
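To give a flavour of the zipper idea, here is a minimal list zipper sketch (the names are my own; tree zippers follow the same divide-around-a-focus pattern):

    // A list zipper: the elements before the focus (stored reversed) and the rest.
    type Zipper<'a> = { Before: 'a list; After: 'a list }

    let fromList xs = { Before = []; After = xs }

    // O(1) movement and edits at the focus:
    let right (z: Zipper<'a>) =
        match z.After with
        | x :: rest -> { Before = x :: z.Before; After = rest }
        | []        -> z

    let setFocus v (z: Zipper<'a>) =
        match z.After with
        | _ :: rest -> { z with After = v :: rest }
        | []        -> z

    // Rebuild the plain list when done (O(n) once, not per edit):
    let toList (z: Zipper<'a>) = List.rev z.Before @ z.After

    // fromList [1; 2; 3; 4] |> right |> right |> setFocus 99 |> toList
    //   => [1; 2; 99; 4]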
While the referenced objects are the same in your code, I believe the storage space for the references themselves, and the structure of the list, is duplicated by take. As a result, while the referenced objects are the same, and the tails are shared between the two lists, the "cells" for the initial portions are duplicated.

I'm not an expert in functional programming, but maybe with some kind of tree you could achieve duplication of only log(n) elements, as you would have to recreate only the path from the root to the inserted element.
It sounds to me like your question is primarily about immutable data, not functional languages per se. Data is indeed necessarily immutable in purely functional code (cf. referential transparency), but I'm not aware of any non-toy languages that enforce absolute purity everywhere (though Haskell comes closest, if you like that sort of thing).
Roughly speaking, referential transparency means that no practical difference exists between a variable/expression and the value it holds/evaluates to. Because a piece of immutable data will (by definition) never change, it can be trivially identified with its value and should behave indistinguishably from any other data with the same value.
Therefore, by electing to draw no semantic distinction between two pieces of data with the same value, we have no reason to ever deliberately construct a duplicate value. So, in cases of obvious equality (e.g., adding something to a list, passing it as a function argument, &c.), languages where immutability guarantees are possible will generally reuse the existing reference, as you say.
Likewise, immutable data structures possess an intrinsic referential transparency of their structure (though not their contents). Assuming all contained values are also immutable, this means that pieces of the structure can safely be reused in new structures as well. For example, the tail of a cons list can often be reused; in your code, I would expect that:
(skip L 1) === (skip M 2)
...would also be true.
Reuse isn't always possible, though; the initial portion of a list (the part your take function has to rebuild) can't really be reused, for instance. For the same reason, appending something to the end of a cons list is an expensive operation, as it must reconstruct a whole new list, similar to the problem with concatenating null-terminated strings.
In such cases, naive approaches quickly get into the realm of awful performance you were concerned about. Often, it's necessary to substantially rethink fundamental algorithms and data structures to adapt them successfully to immutable data. Techniques include breaking structures into layered or hierarchical pieces to isolate changes, inverting parts of the structure to expose cheap updates to frequently-modified parts, or even storing the original structure alongside a collection of updates and combining the updates with the original on the fly only when the data is accessed.
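As a toy illustration of that last technique, you might keep the original structure untouched and overlay point updates, merging only on access (all names here are made up):

    // An unmodified original plus an immutable map of point updates.
    type OverlaidArray<'a> = { Original: 'a[]; Updates: Map<int, 'a> }

    let fromArray xs = { Original = xs; Updates = Map.empty }

    // "Write": record the update without touching the original.
    let set i v (arr: OverlaidArray<'a>) =
        { arr with Updates = Map.add i v arr.Updates }

    // Reads consult the updates first, falling back to the original.
    let get i (arr: OverlaidArray<'a>) =
        match Map.tryFind i arr.Updates with
        | Some v -> v
        | None -> arr.Original.[i]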
Since you're using F# here I'm going to assume you're at least somewhat familiar with C#; the inestimable Eric Lippert has a slew of posts on immutable data structures in C# that will probably enlighten you well beyond what I could provide. Over the course of several posts he demonstrates (reasonably efficient) immutable implementations of a stack, binary tree, and double-ended queue, among others. Delightful reading for any .NET programmer!
You may be interested in reduction strategies of expressions in functional programming languages. A good book on the subject is The Implementation of Functional Programming Languages, by Simon Peyton Jones, one of the creators of Haskell.
Have a look especially at the chapter Graph Reduction of Lambda Expressions, since it describes the sharing of common subexpressions.
Hope it helps, but I'm afraid it applies only to lazy languages.