In my macOS app, I'm using an NSTreeController to keep track of any changes to a document. The tree controller enables versioning by acting as a source control system, which means that documents can create their own branches, etc.
It works fine so far. The problem is that every change to the document adds an NSTreeNode to the tree, which means that after a few hours of use the tree has accumulated many nodes, and therefore tons of objects in memory.
Is there a way I can create an NSTreeController with a capacity (like you'd give to an NSArray) which will automatically trim child nodes? If not, what's the best way to manually flush nodes at an appropriate interval so memory usage doesn't bloat?
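As far as I know, NSTreeController has no capacity option, so trimming has to be done by hand. Below is a minimal sketch of the idea (written in Java for brevity; the VersionNode and BoundedHistory names are made up): keep the change records in a bounded queue and drop the oldest one whenever the limit is exceeded. A real version tree with branches would need a smarter pruning rule, such as dropping the oldest leaf of an inactive branch, but the bookkeeping is the same.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical stand-in for one change record in the version tree.
final class VersionNode {
    final String description;
    VersionNode(String description) { this.description = description; }
}

// Keeps at most 'capacity' versions; the oldest is dropped on overflow,
// which is the manual equivalent of a tree controller with a capacity.
final class BoundedHistory {
    private final Deque<VersionNode> nodes = new ArrayDeque<>();
    private final int capacity;

    BoundedHistory(int capacity) { this.capacity = capacity; }

    void record(VersionNode node) {
        nodes.addLast(node);
        if (nodes.size() > capacity) {
            nodes.removeFirst(); // trim the oldest change record
        }
    }
}
```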
My back end (Java) relies heavily on tree structures with strong inheritance. Conflict resolution is complex, so I am looking for a way to simply block users when the propagation of changes from higher nodes has not yet reached the current element.
Hierarchies are represented through both Materialized Paths and Adjacency Lists for performance reasons. The goals are to:
Reject the update (bad request) when the API requests a change to a node with pending propagation
Inform the user through the DTO (e.g. an isLocked attribute) when they retrieve a node with pending propagation
Propagation is a simple matter of going through all nodes in a top-down fashion. It used to proceed level by level (which would have been easier), but it is no longer orchestrated: each node sends the message to its children.
At the moment I have two ideas, neither of which I like:
Add a locked flag on each node (persisted in the DB), set it to true for all descendants of a modified node, and unlock each node after it has been processed.
Leverage the materialized path and record the currently unprocessed nodes in a new table. If node D with path A.B.C.D is queried, the presence of any of the four path nodes in that table means node D has not been processed yet and should be locked.
I do not like approach 1 because it updates every entity twice, although retrieving the list would be quick thanks to the Materialized Path.
I do not like approach 2 because:
The materialized path is stored as a VARCHAR2, so the comparison cannot be done in the DB; I would first have to unwrap the path into its individual nodes and then query the DB to check whether any element of the hierarchy is present (see the sketch after this list).
Trees can be quite large, with hundreds of children per node and tens of thousands of nodes per tree. Modifying the root would create a huge number of these temporary records holding the current 'fringe' of the propagation. That many independent DB calls is not ideal, especially since a node can often be processed in less than 10 ms, so I would probably hit a bottleneck and bad performance quickly.
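For what it's worth, the unwrapping itself is cheap on the application side; the cost is in the round trips. A minimal sketch of the approach-2 check, assuming a '.'-separated path and a hypothetical pendingNodeIds set standing in for a single IN (...) query against the table of unprocessed nodes:

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;

final class PropagationCheck {
    // Stand-in for "SELECT node_id FROM pending_nodes WHERE node_id IN (...)"
    // against the table of not-yet-processed nodes (hypothetical).
    private final Set<String> pendingNodeIds;

    PropagationCheck(Set<String> pendingNodeIds) {
        this.pendingNodeIds = pendingNodeIds;
    }

    // A node with path "A.B.C.D" is locked if any ancestor (or the
    // node itself) is still awaiting propagation.
    boolean isLocked(String materializedPath) {
        List<String> ancestors = Arrays.asList(materializedPath.split("\\."));
        return ancestors.stream().anyMatch(pendingNodeIds::contains);
    }
}
```

Since all path elements can go into a single IN clause, this is one extra query per node lookup rather than one per ancestor.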
Is there another approach that could be taken to identify whether a propagation has reached a node? Examples, comparisons, anything that could help decide on the best way to approach this problem would be welcome.
Elasticsearch provides different ways of paginating through large amounts of data. The scroll API and search_after/PIT both allow a view of the data at a given point in time. The documentation for scrolling explains:
Normally, the background merge process optimizes the index by merging together smaller segments to create new bigger segments, at which time the smaller segments are deleted. This process continues during scrolling, but an open search context prevents the old segments from being deleted while they are still in use. This is how Elasticsearch is able to return the results of the initial search request, regardless of subsequent changes to documents.
It seems that this is achieved in the same way for both: the old segments are prevented from being deleted. However, search_after/PIT is often referred to as the lightweight alternative. Is this purely because a PIT can be shared between queries, so that fewer PITs have to be created?
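For reference, the usual PIT flow is: open a PIT once, then pass its id to every search_after page (and possibly to several different queries) until you close it. A rough sketch using plain HTTP against a local cluster; the index name my-index, the timestamp sort field, and the placeholder PIT id are all made up:

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class PitExample {
    public static void main(String[] args) throws Exception {
        HttpClient client = HttpClient.newHttpClient();

        // 1. Open a point in time on the index, kept alive for one minute.
        HttpRequest openPit = HttpRequest.newBuilder()
                .uri(URI.create("http://localhost:9200/my-index/_pit?keep_alive=1m"))
                .POST(HttpRequest.BodyPublishers.noBody())
                .build();
        String pitResponse = client.send(openPit, HttpResponse.BodyHandlers.ofString()).body();
        // Extract the "id" field from pitResponse with your JSON library of choice.

        // 2. Every page references the same PIT id; search_after carries the
        //    sort values of the previous page's last hit (omit it on page one).
        String searchBody = """
                {
                  "size": 100,
                  "query": { "match_all": {} },
                  "pit": { "id": "<pit-id-from-step-1>", "keep_alive": "1m" },
                  "sort": [ { "timestamp": "asc" }, { "_shard_doc": "asc" } ],
                  "search_after": [ "2024-01-01T00:00:00Z", 12345 ]
                }
                """;
        HttpRequest search = HttpRequest.newBuilder()
                .uri(URI.create("http://localhost:9200/_search"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(searchBody))
                .build();
        System.out.println(client.send(search, HttpResponse.BodyHandlers.ofString()).body());
    }
}
```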
I need a fixed-size cache of objects that keeps track of how many times each object was requested. When it is full and a new object is added, the object with the lowest usage score gets removed.
So this is different from an LRU cache of size N in that if some object is heavily requested, then even adding N new objects won't push it out of the cache.
Some kind of mix of a cache and a priority queue. Is there a name for that?
Thanks!
Without a time element, this kind of cache clogs up with things that were used a lot in the past, but aren't used currently. Replacement becomes impossible, because everything in the cache has been used more than once, so you won't evict anything in favor of a new item.
You could write some code that degrades the value of the count over time (i.e. taking into account the time since last used), but doing so is just a really complicated way of simulating an LRU cache. I experimented with it at one point, but found that it didn't perform any better than the simple LRU cache. At least not in my application.
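What you're describing is usually called an LFU (least frequently used) cache. A minimal sketch, leaving out the aging discussed above (eviction is O(n) here for clarity; real implementations keep frequency buckets to evict in O(1)):

```java
import java.util.HashMap;
import java.util.Map;

// Minimal LFU cache: evicts the entry with the lowest request count.
final class LfuCache<K, V> {
    private final int capacity;
    private final Map<K, V> values = new HashMap<>();
    private final Map<K, Integer> counts = new HashMap<>();

    LfuCache(int capacity) { this.capacity = capacity; }

    V get(K key) {
        V v = values.get(key);
        if (v != null) counts.merge(key, 1, Integer::sum);
        return v;
    }

    void put(K key, V value) {
        if (!values.containsKey(key) && values.size() >= capacity) {
            // Linear scan for the least-used entry.
            K coldest = null;
            for (K k : counts.keySet()) {
                if (coldest == null || counts.get(k) < counts.get(coldest)) {
                    coldest = k;
                }
            }
            values.remove(coldest);
            counts.remove(coldest);
        }
        values.put(key, value);
        counts.merge(key, 1, Integer::sum);
    }
}
```

Note that a brand-new entry starts with a count of 1, so under pressure it is the first candidate for eviction; that is exactly the clogging effect described above.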
I just want to know what the most suitable data structures are for the following:
1. storing the recently visited web addresses on a web browser?
2. the processes to be scheduled on the CPU of a computer?
3. the undo mechanism in a text editor like Notepad?
storing the recently visited web addresses on a web browser?
If you want to store the k last addresses, you can use a queue.
If the queue holds fewer than k addresses, just add the new one.
If it is already of size k, delete the element at the front of the queue (the one inserted first, the "oldest" entry), and insert the new one.
You might want to combine it with a set (or a map that points to the queue's entries) to make sure the same address does not fill multiple slots in your queue.
If you don't need to ever delete entries (and the number of "visited" elements is unbounded), you can use a set.
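A small sketch of the queue-plus-set combination, assuming we keep the k most recently visited distinct addresses; a LinkedHashSet plays both roles, since it is a set with queue-like insertion order:

```java
import java.util.Iterator;
import java.util.LinkedHashSet;

// Keeps the k most recently visited distinct addresses.
final class RecentAddresses {
    private final int k;
    private final LinkedHashSet<String> addresses = new LinkedHashSet<>();

    RecentAddresses(int k) { this.k = k; }

    void visit(String url) {
        addresses.remove(url);      // drop a duplicate so it isn't stored twice
        addresses.add(url);         // (re)insert at the tail, i.e. most recent
        if (addresses.size() > k) { // over capacity: evict the oldest entry
            Iterator<String> it = addresses.iterator();
            it.next();
            it.remove();
        }
    }
}
```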
the processes to be scheduled on the CPU of a computer?
There are many options for that, but some simple ones are using a queue, or a priority queue (heap).
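For example, a priority queue keyed on a priority number, sketched here with a hypothetical Process type:

```java
import java.util.Comparator;
import java.util.PriorityQueue;

record Process(String name, int priority) {}

public class Scheduler {
    public static void main(String[] args) {
        // Lower number = higher priority; the heap always hands us
        // the most urgent runnable process next.
        PriorityQueue<Process> runQueue =
                new PriorityQueue<>(Comparator.comparingInt(Process::priority));
        runQueue.add(new Process("editor", 2));
        runQueue.add(new Process("kernel-task", 0));
        runQueue.add(new Process("backup", 5));
        System.out.println(runQueue.poll().name()); // prints "kernel-task"
    }
}
```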
the undo mechanism in a text editor like Notepad?
A stack. Each "do" is a push, and to "Undo", you pop the last element and revert its action.
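A tiny sketch of that idea, where each "do" pushes its inverse action onto the stack and "undo" pops and runs it:

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class UndoDemo {
    public static void main(String[] args) {
        StringBuilder document = new StringBuilder();
        Deque<Runnable> undoStack = new ArrayDeque<>();

        // Each "do" pushes the inverse action onto the stack.
        document.append("Hello");
        undoStack.push(() -> document.setLength(document.length() - "Hello".length()));

        // "Undo" pops the most recent action and reverts it.
        undoStack.pop().run();
        System.out.println("[" + document + "]"); // prints "[]"
    }
}
```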
This question is about a Java JTree, a Windows .NET Tree (WinForms), or an Adobe Flex Tree.
In a client-server application (for Flex it's the web, really), I have a tree with hierarchical data (in a Windows Explorer type interface). Right now I lazily load up the tree as the user requests more data from the server. This is fine and works up to about 750K nodes (empirically tested on .NET WinForms and Adobe Flex), but after that it gets sluggish. The databases grow fast (mostly because users can paste in huge numbers of nodes), and a database of 20 million nodes is not at all unlikely.
Should I be releasing data from the tree when a branch is collapsed so the garbage collector can release the memory? That's fine, but what if users are not efficient and don't collapse branches? Should I write a memory-management module that goes around closing branches that haven't been touched in a while?
This all seems like a lot of work just to avoid running out of memory.
Edit: Should I release data on node collapse? If so, when? The weak-object cache idea is good, but should I just continue filling up the UI until it busts (maybe it's not a bad idea)?
If the users don't collapse branches, then I guess they're going to be scrolling through 750K to 20M nodes, right? Seems rather inefficient to me from a user POV. So, the problem may well be self-enforcing.
In most frameworks I've seen, the tree structure itself is fairly efficient, but if you have a non-trivial object at each tree leaf, it quickly adds up.
The easiest approach would be to store nothing at the tree leaves; instead, in the render/draw/update/whatever method, pick your object from a weak-reference cache. If it's not there, load it from the server. The trick is not to keep any other reference to the object, only the weak one in the cache. That way, it will still be available when possible, but collected when necessary.
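A rough sketch of that pattern, with a hypothetical NodeData type and loadFromServer standing in for the real server call:

```java
import java.lang.ref.WeakReference;
import java.util.HashMap;
import java.util.Map;

final class NodeData {
    final long id;
    NodeData(long id) { this.id = id; }
}

final class NodeDataCache {
    // Only weak references are held here, so the GC may reclaim
    // any entry that nothing else is currently using.
    private final Map<Long, WeakReference<NodeData>> cache = new HashMap<>();

    NodeData get(long nodeId) {
        WeakReference<NodeData> ref = cache.get(nodeId);
        NodeData data = (ref != null) ? ref.get() : null;
        if (data == null) {                 // collected, or never loaded
            data = loadFromServer(nodeId);  // hypothetical server round trip
            cache.put(nodeId, new WeakReference<>(data));
        }
        return data;
    }

    private NodeData loadFromServer(long nodeId) {
        return new NodeData(nodeId); // placeholder for the real RPC
    }
}
```

One caveat: the WeakReference wrappers themselves are not collected, only their referents, so a long-running app should occasionally sweep entries whose get() returns null.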