I have read a few books, but I still don't understand why it's considered linear. I'm not sure whether it's because of its appearance, sequential access, or something else. If possible, please explain in logical terms.
Data structures fall into two categories: linear and non-linear. A data structure is said to be linear if its elements form a sequence, for example arrays, linked lists, queues, etc. Elements in a non-linear data structure do not form a sequence, for example trees, hash trees, binary trees, etc.
There are two ways of representing linear data structures in memory. One way is to express the linear relationship between the elements by means of sequential memory locations; such linear structures are called arrays. The other way is to express the linear relationship between the elements by means of links; such linear data structures are called linked lists.
That is because they follow a linear fashion: they move from one element to the next, step by step.
Linked lists, stacks, and queues are linear because their elements are connected in such a way that each node has only one descendant, unlike trees and graphs, where one or more child nodes can be connected to a given node.
Just imagine nature.
When events occur one after the other, you go for linear (time events, age, weather, etc.).
When events branch out, you go for non-linear (for example a food chain, biological nomenclature, the classification of living beings into kingdoms, families, etc.).
Linked list:
A linked list is a collection of nodes.
It has an end-to-end connection: one node holds the address of the next node, i.e. the nodes depend on each other. They are connected in a linear manner, hence a linked list is called a linear data structure.
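A minimal sketch in Java (names are illustrative) of that idea: each node holds a value and a reference to exactly one next node, so the only way to traverse is one step at a time.

    // Minimal singly linked list node (illustrative): each node holds one value
    // and a reference to exactly one next node, so traversal is linear.
    class Node {
        int value;
        Node next;                     // only one successor per node

        Node(int value) { this.value = value; }

        // Traversal can only move one step at a time, from this node to the tail.
        void printFromHere() {
            for (Node n = this; n != null; n = n.next) {
                System.out.println(n.value);
            }
        }
    }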
Linear data structures:
the elements (objects) are sequential and ordered in such a way that:
there is only one first element, and it has only a next element,
there is only one last element, and it has only a previous element, while
all other elements have both a next and a previous element
https://codeandwork.github.io/courses/java/linearDataStructures.html
Non-linear data structures are simply those that are not linear!
A linear data structure traverses the data elements sequentially, in which only one data element can directly be reached. Ex: Arrays, Linked Lists.
But in a doubly linked list we can reach two data elements, using the previous pointer and the next pointer.
So can we say that a doubly linked list is a non-linear data structure?
Correct me if I am wrong.
Thank you.
Non-linear data structures are those in which the elements appear in a non-linear fashion, which requires a two- (or more) dimensional representation. The elements may or (mostly) may not be stored in contiguous memory locations; rather, they can be stored in any order, non-linearly, as if you had skipped the elements in between. Accessing the elements is also done in an out-of-order pattern.
Example: in a tree, one may iterate from the root to its right child, to its right child, ... and so on, thereby skipping all the left nodes.
But in a doubly linked list, you can only move sequentially (linearly), either forward (using the next pointer) or backward (using the previous pointer).
You can't jump from any element in the list to any distant element without traversing the intermediary elements.
Hence, a doubly linked list is a linear data structure. In a linear data structure, the elements are arranged in a linear fashion (that is, a one-dimensional representation).
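A small Java sketch (illustrative) of a doubly linked node: there are two pointers, but movement is still one step at a time, forwards or backwards.

    // Doubly linked node (illustrative): two pointers, but movement is still
    // sequential, forwards via next or backwards via prev.
    class DNode {
        int value;
        DNode prev;
        DNode next;

        // Forward traversal: visits elements one after another.
        void printForward() {
            for (DNode n = this; n != null; n = n.next) System.out.println(n.value);
        }

        // Backward traversal: also sequential, just in the other direction.
        void printBackward() {
            for (DNode n = this; n != null; n = n.prev) System.out.println(n.value);
        }
    }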
You are wrong; here are two justifications:
While you can get to 2 elements from any node, one of them was the one you used to get to this node, so you can only get to one new node from each.
It is still linear in that it has to be traversed sequentially, or in a line.
It is still sequential: you need to go over some elements in the list to get to a particular element, compared to an array where you can randomly access each element.
However, you can go linearly forwards or backwards, which may optimize the search.
A linked list is basically a linear data structure because it stores data in a linear fashion. A linear data structure is one which stores data in a linear format, and traversal is done in a sequential manner, not in a zigzag way.
It depends on where you intend to apply linked lists. If you base it on storage, a linked list is considered non-linear. On the other hand, if you base it on access strategies, then a linked list is considered linear.
I was recently asked a coding question about the problem below.
I have a couple of solutions to this problem, but I am not sure they are the most efficient.
Problem:
Write a program to track a set of text ranges. The start point and end point will be strings.
Text range example : [AbA-Ef]
Aa would fall before this range
AB would fall inside this range
etc.
String comparison would be like 'A' < 'a' < 'B' < 'b' ... 'Z' < 'z'
We need to support the following operations on these ranges:
Add range - this should merge ranges where applicable
Delete range - this deletes a range from the tracked ranges and recomputes the ranges
Query range - given a character, the function should return whether it is part of any of the tracked ranges or not
Note that tracked ranges can be discontinuous.
My solutions:
I came up with two approaches:
Store the ranges as a doubly linked list, or
Store the ranges in some sort of balanced tree, with the leaf nodes holding the actual data and inter-connected as a linked list.
Do you think these solutions are good enough, or can you think of a better way of doing this so that those three APIs give the best performance?
You are probably looking for an interval tree.
Use that data structure with your custom comparator to answer "what's in range", and you will be able to do the required operations efficiently.
Note that an interval tree is actually an efficient way to implement your 2nd idea (store ranges as some sort of balanced tree).
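For the comparisons themselves, one way to express the ordering 'A' < 'a' < 'B' < 'b' ... 'Z' < 'z' from the question as a Java Comparator might look like this (a sketch; class and method names are made up, and it only handles plain letters):

    import java.util.Comparator;

    // Sketch of the ordering 'A' < 'a' < 'B' < 'b' ... 'Z' < 'z':
    // compare letters case-insensitively first, then uppercase before lowercase.
    class RangeOrder {
        static int compareChars(char a, char b) {
            int byLetter = Character.compare(Character.toLowerCase(a), Character.toLowerCase(b));
            if (byLetter != 0) return byLetter;      // different letters
            // Same letter: uppercase sorts before lowercase.
            return Boolean.compare(Character.isLowerCase(a), Character.isLowerCase(b));
        }

        // String comparator built from the character ordering above.
        static final Comparator<String> ORDER = (s, t) -> {
            int n = Math.min(s.length(), t.length());
            for (int i = 0; i < n; i++) {
                int c = compareChars(s.charAt(i), t.charAt(i));
                if (c != 0) return c;
            }
            return Integer.compare(s.length(), t.length());  // shorter string first
        };
    }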
I'm not clear on what the "delete range" operation is supposed to do. Does it:
Delete a previously inserted range, and recompute the merge of the remaining ranges?
Stop tracking the deleted range, regardless of how many times parts of it have been added?
That doesn't make a huge difference algorithmically; it's just bookkeeping. But it's important to clarify. Also, are the ranges closed or half-open? (Another detail which doesn't affect the algorithm but does affect the implementation).
The basic approach to this problem is to merge the tracked set into a sorted list of disjoint (non-overlapping) ranges; either as a vector or a binary search tree, or basically any structure which supports O(log n) searching.
One approach is to put both endpoints of every disjoint range into the data structure. To find out if a target value is in a range, find the index of the smallest endpoint greater than the target. If the index is odd, the target is in some range; even means it's outside.
Alternatively, index all the disjoint ranges by their start points; find the target by searching for the largest start-point not greater than the target, and then compare the target with the associated end-point.
I usually use the first approach with sorted vectors, which are plausible if (a) space utilization is important and (b) insert and merge are relatively rare. With binary search trees, I go for the second approach. But they differ only in details and constants.
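A minimal Java sketch of the first approach, assuming the tracked ranges have already been merged into disjoint, closed ranges and their endpoints stored in one sorted array (ints are used instead of the question's strings for brevity; names are illustrative):

    import java.util.Arrays;

    // Sketch of the "both endpoints in a sorted array" idea: the array holds
    // [start0, end0, start1, end1, ...] for already-merged, disjoint, closed ranges.
    class EndpointQuery {
        static boolean contains(int[] endpoints, int target) {
            int pos = Arrays.binarySearch(endpoints, target);
            if (pos >= 0) return true;          // target equals an endpoint (closed ranges)
            int insertion = -pos - 1;           // index of the smallest endpoint > target
            return insertion % 2 == 1;          // odd index -> between a start and its end
        }
    }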
Merging and deleting are not difficult, but there are an annoying number of cases. You start by finding the ranges corresponding to the endpoints of the range to be inserted/deleted (using the standard find operation), remove all the ranges in between the two, and fiddle with the endpoints to correct the partially overlapping ranges. While the find operation is always O(log n), the tree/vector manipulation is O(n) (if the inserted/deleted range is large, anyway).
Most languages, including Java and C++, have some sort of ordered map or ordered set in which you can both look up a value and find the next value after, or the first value before, a given value. You could use this as a building block: if it contains a set of disjoint ranges, then it will hold the least element of a range followed by the greatest element of that range, followed by the least element of the next range, followed by its greatest element, and so on. When you add a range you can check whether you have preserved this property; if not, you need to merge ranges. Similarly, you want to preserve the property when you delete. Then you can query by just looking to see whether there is a least element just before your query point and a greatest element just after it.
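As a sketch of that building-block idea using Java's TreeMap: here each disjoint range is kept as a start -> end entry rather than as two separate set elements, and only the merge-on-add and query parts are shown (Integer keys for brevity; the real thing would use the custom string comparator, and delete is omitted):

    import java.util.Map;
    import java.util.TreeMap;

    // Sketch: disjoint closed ranges kept in a TreeMap from start -> end.
    // add() merges overlapping ranges; contains() is a floor lookup.
    class RangeSet {
        private final TreeMap<Integer, Integer> ranges = new TreeMap<>();

        void add(int lo, int hi) {
            // If the range starting at or before lo overlaps us, grow lo/hi to cover it.
            Map.Entry<Integer, Integer> left = ranges.floorEntry(lo);
            if (left != null && left.getValue() >= lo) {
                lo = Math.min(lo, left.getKey());
                hi = Math.max(hi, left.getValue());
            }
            // Swallow every existing range that starts inside [lo, hi].
            for (Map.Entry<Integer, Integer> e = ranges.ceilingEntry(lo);
                 e != null && e.getKey() <= hi;
                 e = ranges.ceilingEntry(lo)) {
                hi = Math.max(hi, e.getValue());
                ranges.remove(e.getKey());
            }
            ranges.put(lo, hi);
        }

        boolean contains(int x) {
            Map.Entry<Integer, Integer> e = ranges.floorEntry(x);
            return e != null && e.getValue() >= x;
        }
    }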
If you want to create your own data structure from scratch, I would think about some sort of radix trie, because this avoids doing lots of repeated string comparisons.
I think you would go for a B+ tree; it's the same as what you have mentioned as your second approach.
Here are some properties of a B+ tree:
All data is stored in leaf nodes.
Every leaf is at the same level.
All leaf nodes have links to other leaf nodes.
Here are a few applications of B+ trees:
It reduces the number of I/O operations required to find an element in the tree.
Often used in the implementation of database indexes.
The primary value of a B+ tree is in storing data for efficient retrieval in a block-oriented storage context — in particular, file systems.
NTFS uses B+ trees for directory indexing.
Basically, it helps with range query lookups and minimizes tree traversal.
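A rough Java sketch of why the leaf links matter for range queries (illustrative only, not a full B+ tree; finding the starting leaf is the usual root-to-leaf descent and is omitted):

    import java.util.ArrayList;
    import java.util.List;

    // Illustrative B+ tree leaf: keys live only in leaves, and each leaf links to
    // the next one, so a range query walks the leaf chain without going back up.
    class Leaf {
        int[] keys;        // sorted keys stored in this leaf
        Leaf next;         // link to the next leaf in key order

        Leaf(int[] keys) { this.keys = keys; }

        // Collect all keys in [lo, hi], starting from the leaf that contains lo.
        static List<Integer> rangeScan(Leaf start, int lo, int hi) {
            List<Integer> out = new ArrayList<>();
            for (Leaf leaf = start; leaf != null; leaf = leaf.next) {
                for (int k : leaf.keys) {
                    if (k > hi) return out;    // past the range: done
                    if (k >= lo) out.add(k);
                }
            }
            return out;
        }
    }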
I read this on Wikipedia:
In B-trees, internal (non-leaf) nodes can have a variable number of child nodes within some pre-defined range. When data is inserted or removed from a node, its number of child nodes changes. In order to maintain the pre-defined range, internal nodes may be joined or split. Because a range of child nodes is permitted, B-trees do not need re-balancing as frequently as other self-balancing search trees, but may waste some space, since nodes are not entirely full.
We have to specify this range for B-trees. Even when I looked it up in CLRS (Introduction to Algorithms), it seemed to use arrays for the keys and children. My question is: is there any way to reduce this waste of space by defining the keys and children as lists instead of predetermined arrays? Is this too much of a hassle?
Also, for the life of me I'm not able to find decent pseudocode for btreeDeleteNode. Any help here is appreciated too.
When you say "lists", do you mean linked lists?
An array of some kind of element takes up one element's worth of memory per slot, whether that slot is filled or not. A linked list only takes up memory for elements it actually contains, but for each one, it takes up one element's worth of memory, plus the size of one pointer (two if it's a doubly-linked list, unless you can use the xor trick to overlap them).
If you are storing pointers, and using a singly-linked list, then each list link is twice the size of each array slot. That means that unless the list is less than half full, a linked list will use more memory, not less.
If you're using a language whose runtime has per-object overhead (like Java, and like C unless you are handling memory allocation yourself), then you will also have to pay for that overhead on each list link, but only once on an array, and the ratio is even worse.
I would suggest that your balancing algorithm should keep tree nodes at least half full. If you split a node when it is full, you will create two half-full nodes. You then need to merge adjacent nodes when they are less than half full. You can then use an array, safe in the knowledge that it is more efficient than a linked list.
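A small Java sketch of the split step under that policy (illustrative; it only shows the keys array, and omits the child pointers and parent update a real B-tree split needs):

    import java.util.Arrays;

    // Sketch of "split a full node into two half-full nodes" on a plain key array.
    class BTreeNode {
        int[] keys;        // keys currently in the node, kept sorted
        int count;         // number of keys in use

        BTreeNode(int capacity) { keys = new int[capacity]; }

        boolean isFull() { return count == keys.length; }

        // Move the upper half of this node's keys into a fresh node, leaving both
        // nodes roughly half full, and return the new right sibling.
        BTreeNode split() {
            int mid = count / 2;
            BTreeNode right = new BTreeNode(keys.length);
            right.count = count - mid;
            System.arraycopy(keys, mid, right.keys, 0, right.count);
            Arrays.fill(keys, mid, count, 0);   // clear the moved slots
            count = mid;
            return right;
        }
    }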
No idea about the details of deletion, sorry!
A B-tree node has an important characteristic: all keys in the node are sorted. When looking for a specific key, binary search is used to find the right position. Using binary search keeps the complexity of the search algorithm in a B-tree at O(log n).
If you replace the preallocated array with some kind of linked list, you lose the random access that binary search within a node relies on. You could use some more complex data structure, like a skip list, to keep the search at O(log n), but it's totally unnecessary; a skip list by itself would be better.
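For reference, the in-node search the array makes possible is just an ordinary binary search over the node's sorted keys (a Java sketch with illustrative names):

    class NodeSearch {
        // Within one B-tree node the keys sit in a sorted array, so locating a key
        // (or the child slot to descend into) is an ordinary binary search.
        static int findPosition(int[] keys, int count, int target) {
            int lo = 0, hi = count;              // search keys[0 .. count-1]
            while (lo < hi) {
                int mid = (lo + hi) >>> 1;
                if (keys[mid] < target) lo = mid + 1;
                else hi = mid;                   // keys[mid] >= target
            }
            return lo;   // index of the first key >= target
        }
    }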
I am reading an article on hash tables. Here is the text snippet:
A hash table is useful for any graph theory problem where the nodes have real names instead of numbers. Here, as the input is read, vertices are assigned integers from 1 onwards by order of appearance. Again, the input is likely to have large groups of alphabetized entries. For example, the vertices could be computers. Then if one particular installation lists its computers as ibm1, ibm2, ibm3, . . . , there could be a dramatic effect on efficiency if a search tree is used.
My questions on the above text:
What does the author mean by "as the input is read, vertices are assigned integers from 1 onward"? Don't we have to calculate a hash key for the input we read?
What does the author mean by "there could be a dramatic effect on efficiency if a search tree is used"?
How are hash tables helpful in graph theory problems compared to search trees?
Thanks!
(1) It's a map, not a set; a hash value is calculated of course, but the node name is mapped to an integer, and that's the purpose of the map.
(2) A search tree gives O(log n) search, so using a map based on a search tree multiplies the time complexity of all operations by O(log n). For example, BFS will take O((V + E) * log V) instead of O(V + E), because of the lookup time.
(3) A hash table lookup is O(1), so the time complexity will be better with hash tables, in the average case.
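A small Java sketch of what "assigned integers from 1 onwards by order of appearance" can look like with a hash map (names are illustrative):

    import java.util.HashMap;
    import java.util.Map;

    // Sketch: map each vertex name to a small integer id the first time it is seen.
    // Lookup and insertion are O(1) on average with a hash map; with a balanced
    // search tree they would be O(log n) per name.
    class VertexIds {
        private final Map<String, Integer> ids = new HashMap<>();

        int idOf(String name) {
            Integer id = ids.get(name);
            if (id == null) {
                id = ids.size() + 1;        // ids 1, 2, 3, ... by order of appearance
                ids.put(name, id);
            }
            return id;
        }
    }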
(1) The author could be referencing methods of representing graphs in matrix form, like this example.
(2) I'm not sure about "search trees", though if you can represent hash tables as a graph, then there are some methods to optimize them, like this example.
I'm reading Advanced Data Structures by Peter Brass.
At the beginning of the chapter on search trees, he states that there are two models of search trees: one where the nodes contain the actual objects (the values, if the tree is used as a dictionary), and another where all objects are stored in the leaves and internal nodes are only for comparisons.
What are the advantages of the second model over the first one?
One of the big advantages of a binary tree where data is only in the leaf nodes is that you can partition based on elements that are not in your dataset.
For example, if I have a possible dataset of 0-1 million, but the vast majority of items are either at the high end or the low end and not in the middle, I may still want my first comparison to be against 500,000, even though that number is not in my data set. If every node held data, I could not do this. While not normally needed in theory, I've run into many cases where partitioning based on a value outside my data simplified the implementation.
B+ trees are an example of a case where all keys/values are stored in leaf nodes. The primary advantage here is that since all items are in the leaf nodes, the leaf nodes can be linked together to form a linked list, which allows rapid in-order traversal. If you access a particular element, you can always find the next element in the sequence without visiting any parents, because the leaf nodes are linked together. Filesystems and database storage systems can take advantage of this structure for range searches and the like.
Let's say you are building a tree over some objects based on some complex criterion, for example one calculated from multiple properties. Sometimes you can't change the object to store the calculated value, and calculating the criterion is expensive. So you calculate the criterion only once and store the objects in leaves based on the result. Then, when your tree is complete, you can find the required object much faster, because you don't have to calculate the criterion for each tree node on your path.
Well, storing information objects in the nodes (we are talking in this case about a trie) is useful for fast retrieval of information (faster than storing things in an array/hash table, where the worst case of access is O(n); in a trie it is O(m), where m is the length of the key).
look here:
https://en.wikipedia.org/wiki/Trie
In a search tree these operations can be much more complicated (see AVL trees, O(log n)), and so can be slower and more complex to implement.
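A minimal Java trie sketch (illustrative, HashMap-based children) showing the O(m) lookup mentioned above, where m is the length of the key:

    import java.util.HashMap;
    import java.util.Map;

    // Minimal trie: lookup cost is O(m), where m is the length of the key,
    // independent of how many keys are stored.
    class Trie {
        private final Map<Character, Trie> children = new HashMap<>();
        private boolean isWord;

        void insert(String word) {
            Trie node = this;
            for (char c : word.toCharArray()) {
                node = node.children.computeIfAbsent(c, k -> new Trie());
            }
            node.isWord = true;
        }

        boolean contains(String word) {
            Trie node = this;
            for (char c : word.toCharArray()) {
                node = node.children.get(c);
                if (node == null) return false;
            }
            return node.isWord;
        }
    }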
What data structure to choose?
Well, this depends on what you want to do.