LINQ/Lambda to Binary Tree - linq

I'm looking for a way to select from a binary tree (AVL or B tree) using Lambda without running through all the nodes in the tree.
Any suggestions or interesting links?

If you want LINQ queries against your binary tree to exploit the tree's known structure for better performance, you need to implement an IQueryProvider.
There's an excellent series of blog posts on that here, but this is not something to be undertaken lightly.
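Independent of LINQ, the idea a custom query provider would have to implement is a search that skips subtrees which cannot contain a match. Here is a minimal sketch of that idea in Python (not C#, and not an IQueryProvider). The names Node, insert and range_query are made up for illustration, and the tree is a plain unbalanced BST rather than an AVL tree, to keep it short:

```python
from dataclasses import dataclass
from typing import Iterator, List, Optional


@dataclass
class Node:
    key: int
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def insert(root: Optional[Node], key: int) -> Node:
    """Plain BST insert; an AVL tree would rebalance on the way back up."""
    if root is None:
        return Node(key)
    if key < root.key:
        root.left = insert(root.left, key)
    elif key > root.key:
        root.right = insert(root.right, key)
    return root


def range_query(root: Optional[Node], lo: int, hi: int,
                visited: List[int]) -> Iterator[int]:
    """Yield keys in [lo, hi] in sorted order.

    `visited` records every node touched, only to show how many are skipped.
    """
    if root is None:
        return
    visited.append(root.key)
    if lo < root.key:              # left subtree may still hold keys >= lo
        yield from range_query(root.left, lo, hi, visited)
    if lo <= root.key <= hi:
        yield root.key
    if root.key < hi:              # right subtree may still hold keys <= hi
        yield from range_query(root.right, lo, hi, visited)


root = None
for k in [50, 30, 70, 20, 40, 60, 80, 10, 25, 35, 45]:
    root = insert(root, k)

touched: List[int] = []
matches = list(range_query(root, 33, 47, touched))
print(matches, "touched", len(touched), "of 11 nodes")
```

A query provider would have to recognise a predicate such as "key between 33 and 47" in the lambda's expression tree and turn it into a call like this instead of enumerating every node.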

Related

best data structure to represent a non-binary tree

I have a hierarchical data structure. Little is added to or deleted from this structure; it's mostly used for reading and searching. I'm trying my best to find a good data structure to store this data to enable fast searching. All the examples/tutorials I have seen talk about some form of binary tree. Is there a data structure (tree) that will enable me to model this effectively? An alternative form I can think of is to use a graph, but I'm not sure about that.
A B-tree is a good fit for your description because it performs very well for reading and searching: insertion, deletion and search all take O(log n) time. It is also cache-friendly, since each node holds many keys, so lookups incur few cache misses.

How was the concept of tree (abstract data type) first proposed?

I'm learning about the tree (abstract data type), and I don't just want to know its definition and operations, which can be found without any difficulty.
I want more, especially who proposed the tree and in what situation they worked out the structure. Besides using the tree, I'm interested in how people came to design such a useful data structure.
By the way, when I learned about the stack (another data structure), I found a short history on Wikipedia that told me how the stack was proposed. But when it comes to the tree, there is no such history.
So, could anybody tell me something about the history of the tree? Thanks!

A data structure with certain properties

I want to implement a data structure myself in C++11. I'm planning a data structure with the following properties:
search. O(log(n))
insert. O(log(n))
delete. O(log(n))
iterate. O(n)
What I have been thinking about after research was implementing a balanced binary search tree. Are there other structures that would fulfill my needs? I am completely new to this topic and thought a question here would give me a good jumpstart.
First of all, using the existing standard library data types is definitely the way to go for production code. But since you are asking how to implement such data structures yourself, I assume this is mainly an educational exercise for you.
Binary search trees of some form (https://en.wikipedia.org/wiki/Self-balancing_binary_search_tree#Implementations), B-trees (https://en.wikipedia.org/wiki/B-tree) and hash tables (https://en.wikipedia.org/wiki/Hash_table) are the data structures usually used for efficient insertion and lookup. If you want to go wild you can combine the two by using a tree instead of a linked list to handle hash collisions (although this is likely to make your implementation slower unless you have made massive mistakes in sizing your hash table or in choosing an adequate hash function).
Since I'm assuming you want to learn something, you might want to have a look at minimal perfect hashing in the context of hash tables (https://en.wikipedia.org/wiki/Perfect_hash_function) although this only has uses in special applications (I had the opportunity to use a perfect minimal hash function exactly once). But it sure is fascinating. As you can see from the link above, the botany of search trees is virtually limitless in scope so you can also go wild on that front.
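To make the balanced-tree option concrete, here is a minimal sketch of one such structure, a treap (a randomized balanced BST). It is written in Python rather than C++11 to keep it short, and the class and function names are my own. Note that search, insert and delete are O(log n) in expectation here, not in the worst case as with AVL or red-black trees; iteration is O(n).

```python
import random


class _Node:
    __slots__ = ("key", "prio", "left", "right")

    def __init__(self, key):
        self.key = key
        self.prio = random.random()  # random heap priority keeps depth ~log n in expectation
        self.left = None
        self.right = None


def _split(node, key):
    """Split a treap into (keys < key, keys >= key)."""
    if node is None:
        return None, None
    if node.key < key:
        node.right, right = _split(node.right, key)
        return node, right
    left, node.left = _split(node.left, key)
    return left, node


def _merge(a, b):
    """Merge two treaps where every key in a is smaller than every key in b."""
    if a is None:
        return b
    if b is None:
        return a
    if a.prio > b.prio:
        a.right = _merge(a.right, b)
        return a
    b.left = _merge(a, b.left)
    return b


def _delete(node, key):
    if node is None:
        return None
    if key < node.key:
        node.left = _delete(node.left, key)
    elif key > node.key:
        node.right = _delete(node.right, key)
    else:
        return _merge(node.left, node.right)  # drop this node, join its children
    return node


class Treap:
    def __init__(self):
        self._root = None

    def search(self, key):
        node = self._root
        while node is not None:
            if key == node.key:
                return True
            node = node.left if key < node.key else node.right
        return False

    def insert(self, key):
        if self.search(key):       # ignore duplicates
            return
        left, right = _split(self._root, key)
        self._root = _merge(_merge(left, _Node(key)), right)

    def delete(self, key):
        self._root = _delete(self._root, key)

    def __iter__(self):
        """In-order traversal with an explicit stack: O(n) for the whole tree."""
        stack, node = [], self._root
        while stack or node is not None:
            while node is not None:
                stack.append(node)
                node = node.left
            node = stack.pop()
            yield node.key
            node = node.right


t = Treap()
for k in [5, 3, 8, 1, 4, 7, 9]:
    t.insert(k)
t.delete(4)
print(list(t), t.search(7), t.search(4))
```

In C++ the same shape carries over fairly directly, with std::unique_ptr or raw pointers for the child links.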

Best tree structure for Multi-dimensional data

To organize multi-dimensional data,
what is the most useful and efficient tree data structure
(e.g., K-D-B tree, region quadtree, R-tree)?
I want to know which tree structure offers the best search time and the best space utilization.
It highly depends on how your data is distributed in the space and how you want to search for it (what are the criteria you query for?).
Finding the right quadtree bin for a given location in space is very easy; on the other hand, a quadtree introduces more overhead than a well-shaped kd-tree. There is a reason why all of these techniques are still in use.
Specify the problem you want to solve with the data structure.
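As an illustration of one possible query type, here is a minimal kd-tree sketch in Python that answers axis-aligned box queries. It assumes a static point set built once up front, and the names KDNode, build and range_search are my own. Other query types (nearest neighbour, overlap of extended objects) may favour a different structure.

```python
from typing import List, Optional, Tuple

Point = Tuple[float, ...]


class KDNode:
    def __init__(self, point: Point, axis: int,
                 left: Optional["KDNode"], right: Optional["KDNode"]):
        self.point, self.axis, self.left, self.right = point, axis, left, right


def build(points: List[Point], depth: int = 0) -> Optional[KDNode]:
    """Build a balanced kd-tree by splitting on the median, cycling through the axes."""
    if not points:
        return None
    axis = depth % len(points[0])
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2
    return KDNode(points[mid], axis,
                  build(points[:mid], depth + 1),
                  build(points[mid + 1:], depth + 1))


def range_search(node: Optional[KDNode], lo: Point, hi: Point,
                 out: List[Point]) -> List[Point]:
    """Collect every point p with lo[i] <= p[i] <= hi[i] on all axes."""
    if node is None:
        return out
    p, a = node.point, node.axis
    if all(l <= x <= h for l, x, h in zip(lo, p, hi)):
        out.append(p)
    if lo[a] <= p[a]:      # box reaches into the "smaller or equal" side
        range_search(node.left, lo, hi, out)
    if p[a] <= hi[a]:      # box reaches into the "greater or equal" side
        range_search(node.right, lo, hi, out)
    return out


tree = build([(2, 3), (5, 4), (9, 6), (4, 7), (8, 1), (7, 2)])
print(sorted(range_search(tree, (3, 1), (8, 5), [])))
```

The median split keeps the tree balanced regardless of how the points are distributed, which is the "well-shaped" property mentioned above; a quadtree splits space at fixed midpoints instead, so its shape follows the data distribution.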
Different data structures, including trees, along with information about them and source code for their implementations, can be found at https://ece.uwaterloo.ca/~ece250/Algorithms/
Furthermore, runtime information and asymptotic analysis for different types of tree structures can be found under section 4 at https://ece.uwaterloo.ca/~ece250/Lectures/Slides/
These are useful and reliable, so you can choose the best structure for your specific needs and data.
I hope this helps!

How can I build an incremental directed acyclic word graph to store and search strings?

I am trying to store a large list of strings in a concise manner so that they can be very quickly analyzed/searched through.
A directed acyclic word graph (DAWG) suits this purpose wonderfully. However, I do not have a list of the strings to include in the first place, so it must be incrementally buildable. Additionally, when I search through it for a string, I need to bring back data associated with the result (not just a boolean saying if it was present).
I have found information on a modification of the DAWG for string data tracking here: http://www.pathcom.com/~vadco/adtdawg.html It looks extremely, extremely complex and I am not sure I am capable of writing it.
I have also found a few research papers describing incremental building algorithms, though I've found that research papers in general are not very helpful.
I don't think I am advanced enough to be able to combine both of these algorithms myself. Is there documentation of an algorithm already that features these, or an alternative algorithm with good memory use & speed?
I wrote the ADTDAWG web page. Adding words after construction is not an option. The structure is nothing more than 4 arrays of unsigned integer types. It was designed to be immutable for total CPU cache inclusion, and minimal multi-thread access complexity.
The structure is an automaton that forms a minimal and perfect hash function. It was built for speed while traversing recursively using an explicit stack.
As published, it supports up to 18 characters. Including all 26 English chars will require further augmentation.
My advice is to use a standard Trie, with an array index stored in each node. Ya, it is going to seem infantile, but each END_OF_WORD node represents only one word. The ADTDAWG is a solution to each END_OF_WORD node in a traditional DAWG representing many, many words.
Minimal and perfect hash tables are not the sort of thing that you can just put together on the fly.
I am looking for something else to work on, or a job, so contact me, and I'll do what I can. For now, all I can say is that it is unrealistic to use heavy optimization on a structure that is subject to being changed frequently.
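A minimal sketch of the standard trie suggested above, with an array index stored in each end-of-word node, could look like this in Python. The names TrieNode and WordStore are made up for illustration, and this has none of the ADTDAWG's compression; it only shows incremental insertion and lookup of associated data.

```python
class TrieNode:
    __slots__ = ("children", "index")

    def __init__(self):
        self.children = {}   # maps a character to the child TrieNode
        self.index = -1      # -1 means "no word ends at this node"


class WordStore:
    def __init__(self):
        self._root = TrieNode()
        self._data = []      # payloads, addressed by the index kept in the trie

    def add(self, word, payload):
        """Insert a word at any time; re-adding a word overwrites its payload."""
        node = self._root
        for ch in word:
            child = node.children.get(ch)
            if child is None:
                child = TrieNode()
                node.children[ch] = child
            node = child
        if node.index == -1:
            node.index = len(self._data)
            self._data.append(payload)
        else:
            self._data[node.index] = payload

    def lookup(self, word):
        """Return the payload stored for word, or None if the word is absent."""
        node = self._root
        for ch in word:
            node = node.children.get(ch)
            if node is None:
                return None
        return None if node.index == -1 else self._data[node.index]


store = WordStore()
store.add("car", {"id": 1})
store.add("cart", {"id": 2})
print(store.lookup("car"), store.lookup("ca"))
```

Each end-of-word node maps to exactly one word, so attaching data is trivial; the cost compared with a DAWG is that shared suffixes are not merged, so memory use is higher.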
Java
For graph problems which require persistence, I'd take a look at the Neo4j graph DB project. Neo4j is designed to store large graphs and allow incremental building and modification of the data, which seems to meet the criteria you describe.
They have some good examples, usually with example code, to get you started quickly on most problems.
They have a DAG example with a link at the bottom to the full source code.
C++
If you're using C++, a common solution to graph building/analysis is to use the Boost Graph Library. To persist your graph you could maintain a file-based version of the graph in GraphML (for example) and read and write to that file as your graph changes.
You may also want to look at a trie structure for this (potentially building a radix-tree). It seems like a decent 'simple' alternative structure.
I'm suggesting this for a few reasons:
I really don't have a full understanding of your result.
Definitely incremental to build.
Leaf nodes can contain any data you wish.
Subjectively, a simple algorithm.

Resources