Context:
Hi guys! I need to do my final project for the DS course, and I was thinking about building a File Explorer with basic operations like searching, creating folders or files, and deleting them. The point is to really make use of data structures and algorithms, so I was thinking of using maybe a B-tree or linked list (I know I could use a bit vector, but I would like to implement it using trees), plus maybe a trie for searching the files, and some sorting so they can be searched by name, creation date, or something like that.
Question:
What I was wondering is: what would be a nice OOP way to design the system? Should I have one class for all the algorithms, another one for the tree, and another one for the files? If anybody could help me put together a nice roadmap to follow, that would be great!
PS: I would be doing this in Java; I'm currently in my second year of CS.
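To make the idea concrete, one possible starting point (the class names below are purely illustrative assumptions, not a prescribed design): a composite-style hierarchy where files and folders share a common base class, with the search index (e.g. a trie over names) kept in its own class rather than inside the entries.

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;

// Files and folders share a base class; a folder is just an entry that
// holds child entries. A separate index class (e.g. wrapping a trie over
// names) would own the search/sort logic rather than the entries themselves.
abstract class FileSystemEntry {
    final String name;
    final Instant createdAt = Instant.now();

    FileSystemEntry(String name) { this.name = name; }
}

class FileEntry extends FileSystemEntry {
    long sizeInBytes;

    FileEntry(String name, long sizeInBytes) {
        super(name);
        this.sizeInBytes = sizeInBytes;
    }
}

class FolderEntry extends FileSystemEntry {
    final List<FileSystemEntry> children = new ArrayList<>();

    FolderEntry(String name) { super(name); }

    void add(FileSystemEntry child) { children.add(child); }
    void remove(FileSystemEntry child) { children.remove(child); }
}
```

Searching by name could then live in a trie keyed on entry names, and sorting by name or creation date is just a Comparator applied to a folder's children.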
Related
Context: I'm working on an analyzer for useragent strings (Yauaa) and as part of this analysis I want to make an educated guess what brand of the device should be reported. I have an implementation that I need to rewrite to be a lot more efficient.
Because I do not want to have a complete list of all devices I want to do the detection based on the prefix of the model.
So I have a dataset with prefixes and the brand that is associated:
"GT-" --> "Samsung"
"LLD-" --> "Huawei"
And then I want to do a .get("GT-1234124") which should result in "Samsung" because that is the "longest matching prefix".
I had a look at the Trie structure, but that seems to be for the opposite situation: as I understand it, you start with a set of values and can efficiently get all the values that start with a provided prefix.
If I were to implement this from scratch, I would use a tree similar to the Trie but walk it differently. What I'm looking for is a data structure that does what I need as fast as possible.
What data structure do you recommend for this use case?
Is there an existing (proven) implementation I can use?
I did some digging into data structures and found that the Trie structure is essentially what I need, just with a different way of walking around the structure.
Since this structure is really simple I created my own implementation that works very well.
See:
https://github.com/nielsbasjes/yauaa/blob/master/analyzer/src/main/java/nl/basjes/parse/useragent/utils/PrefixLookup.java
Updates:
I wrote an article about this https://techlab.bol.com/finding-the-longest-matching-string-prefix-fast/
I put my implementation into a separate library which I opensourced and which is already available via maven central. See https://github.com/nielsbasjes/prefixmap
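For illustration only, here is a minimal sketch of the idea described above (a simplified stand-in, not the linked PrefixLookup/PrefixMap code): store each prefix in a character trie with the associated value at the node where the prefix ends, then walk the query string down the trie and remember the last value seen.

```java
import java.util.HashMap;
import java.util.Map;

// Minimal longest-matching-prefix lookup: a character trie where a value
// is stored at the node that ends a registered prefix. (Illustrative sketch.)
class PrefixTrie<V> {
    private static final class Node<V> {
        final Map<Character, Node<V>> children = new HashMap<>();
        V value; // non-null only if a registered prefix ends at this node
    }

    private final Node<V> root = new Node<>();

    public void put(String prefix, V value) {
        Node<V> node = root;
        for (char c : prefix.toCharArray()) {
            node = node.children.computeIfAbsent(c, k -> new Node<>());
        }
        node.value = value;
    }

    /** Returns the value of the longest registered prefix of the input, or null. */
    public V getLongestMatch(String input) {
        Node<V> node = root;
        V best = null;
        for (int i = 0; i < input.length(); i++) {
            node = node.children.get(input.charAt(i));
            if (node == null) break;          // no longer prefix can match
            if (node.value != null) best = node.value;
        }
        return best;
    }
}
```

With the sample data above, put("GT-", "Samsung") followed by getLongestMatch("GT-1234124") returns "Samsung", because "GT-" is the longest registered prefix of the input.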
I have to explain what a data structure is to someone. What would be the easiest way to explain it? Would it be right if I say:
"A data structure is used to organize data (arrange it in some fashion) so that we can perform certain operations quickly with as little resource usage as possible"
It is about how values are laid out in storage locations, and how the addresses and indices of those locations can themselves be stored as values.
Viewed very abstractly as "structures", you have linked lists, arrays, pointers, graphs, and binary trees, and you can do things with them (the algorithms). Each has capabilities such as being sorted, requiring sortedness, fast access, and so on.
This is fundamental and not too complicated, and a good grasp of data structures and their correct usage can solve problems elegantly. For learning data structures, a language like Pascal is more beneficial than C.
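As a tiny illustration of the "fast access" point (my own example, not from the discussion above): an array answers an index query in constant time, while a linked list has to walk to the position.

```java
import java.util.Arrays;
import java.util.LinkedList;
import java.util.List;

public class AccessCost {
    public static void main(String[] args) {
        int[] array = {10, 20, 30, 40, 50};
        // An array gives constant-time access by index: one address computation.
        int third = array[2];

        List<Integer> list = new LinkedList<>(Arrays.asList(10, 20, 30, 40, 50));
        // A linked list has no index arithmetic: get(2) walks node by node, so it is linear time.
        int alsoThird = list.get(2);

        System.out.println(third + " " + alsoThird);
    }
}
```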
In computer science, a data structure is a particular way of organizing data in a computer so that it can be used efficiently.
Source: Wikipedia (https://en.wikipedia.org/wiki/Data_structure)
I would say what you wrote is pretty close. :)
I want to implement a graph data structure and build a graphical view of it in any language, such as Windows Forms or Java. If you know how to do this, please tell me.
Tall order.
When I was learning data structures, I always found this page helpful for understanding them.
https://www.cs.usfca.edu/~galles/visualization/Algorithms.html
This has a bunch of different types of graphs if you scroll down.
The JavaScript version of each visualization is still maintained. Maybe you can use it as a point of departure and try to reverse engineer whichever specific graph algorithm you are trying to construct.
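If you want something even smaller to start from, here is a rough, self-contained sketch (all names are my own choices, not from that site) of an adjacency-list graph plus a Swing panel that lays the vertices out on a circle and draws the edges as lines:

```java
import javax.swing.*;
import java.awt.*;
import java.util.*;
import java.util.List;

// Minimal undirected graph (adjacency list) with a Swing view that places
// the vertices on a circle and draws each edge as a straight line.
public class GraphViewer extends JPanel {
    private final Map<String, List<String>> adjacency = new LinkedHashMap<>();

    public void addEdge(String a, String b) {
        adjacency.computeIfAbsent(a, k -> new ArrayList<>()).add(b);
        adjacency.computeIfAbsent(b, k -> new ArrayList<>()).add(a);
    }

    @Override
    protected void paintComponent(Graphics g) {
        super.paintComponent(g);
        List<String> vertices = new ArrayList<>(adjacency.keySet());
        int n = vertices.size();
        if (n == 0) return;
        int cx = getWidth() / 2, cy = getHeight() / 2, r = Math.min(cx, cy) - 30;
        Map<String, Point> positions = new HashMap<>();
        for (int i = 0; i < n; i++) {
            double angle = 2 * Math.PI * i / n;
            positions.put(vertices.get(i),
                    new Point(cx + (int) (r * Math.cos(angle)), cy + (int) (r * Math.sin(angle))));
        }
        for (String v : vertices) {                 // edges first, so nodes draw on top
            Point p = positions.get(v);
            for (String w : adjacency.get(v)) {
                Point q = positions.get(w);
                g.drawLine(p.x, p.y, q.x, q.y);
            }
        }
        for (String v : vertices) {                 // then the labelled vertices
            Point p = positions.get(v);
            g.fillOval(p.x - 5, p.y - 5, 10, 10);
            g.drawString(v, p.x + 8, p.y);
        }
    }

    public static void main(String[] args) {
        GraphViewer viewer = new GraphViewer();
        viewer.addEdge("A", "B");
        viewer.addEdge("B", "C");
        viewer.addEdge("C", "A");
        viewer.addEdge("C", "D");
        JFrame frame = new JFrame("Graph");
        frame.setDefaultCloseOperation(JFrame.EXIT_ON_CLOSE);
        frame.add(viewer);
        frame.setSize(400, 400);
        frame.setVisible(true);
    }
}
```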
I am trying to store a large list of strings in a concise manner so that they can be very quickly analyzed/searched through.
A directed acyclic word graph (DAWG) suits this purpose wonderfully. However, I do not have a list of the strings to include in the first place, so it must be incrementally buildable. Additionally, when I search through it for a string, I need to bring back data associated with the result (not just a boolean saying if it was present).
I have found information on a modification of the DAWG for string data tracking here: http://www.pathcom.com/~vadco/adtdawg.html It looks extremely, extremely complex and I am not sure I am capable of writing it.
I have also found a few research papers describing incremental building algorithms, though I've found that research papers in general are not very helpful.
I don't think I am advanced enough to be able to combine both of these algorithms myself. Is there documentation of an algorithm already that features these, or an alternative algorithm with good memory use & speed?
I wrote the ADTDAWG web page. Adding words after construction is not an option. The structure is nothing more than 4 arrays of unsigned integer types. It was designed to be immutable for total CPU cache inclusion, and minimal multi-thread access complexity.
The structure is an automaton that forms a minimal and perfect hash function. It was built for speed while traversing recursively using an explicit stack.
As published, it supports up to 18 characters. Including all 26 English chars will require further augmentation.
My advice is to use a standard Trie, with an array index stored in each node. Ya, it is going to seem infantile, but in a Trie each END_OF_WORD node represents only one word. The ADTDAWG exists to solve the problem that, in a traditional DAWG, each END_OF_WORD node represents many, many words.
Minimal and perfect hash tables are not the sort of thing that you can just put together on the fly.
I am looking for something else to work on, or a job, so contact me, and I'll do what I can. For now, all I can say is that it is unrealistic to use heavy optimization on a structure that is subject to being changed frequently.
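A minimal sketch of that advice (my own illustrative code, not the ADTDAWG itself): a plain trie where each END_OF_WORD node stores an index into a side list of per-word payloads, which makes incremental insertion straightforward.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Plain trie where every END_OF_WORD node stores an index into a side list
// of per-word payloads. Words and their data can be added at any time. (Sketch.)
class PayloadTrie<V> {
    private static final class Node {
        final Map<Character, Node> children = new HashMap<>();
        int payloadIndex = -1; // -1 means "no word ends here"
    }

    private final Node root = new Node();
    private final List<V> payloads = new ArrayList<>();

    public void put(String word, V payload) {
        Node node = root;
        for (char c : word.toCharArray()) {
            node = node.children.computeIfAbsent(c, k -> new Node());
        }
        if (node.payloadIndex == -1) {
            node.payloadIndex = payloads.size();
            payloads.add(payload);
        } else {
            payloads.set(node.payloadIndex, payload); // overwrite data for an existing word
        }
    }

    /** Returns the payload stored for the word, or null if the word is absent. */
    public V get(String word) {
        Node node = root;
        for (char c : word.toCharArray()) {
            node = node.children.get(c);
            if (node == null) return null;
        }
        return node.payloadIndex == -1 ? null : payloads.get(node.payloadIndex);
    }
}
```

It trades memory for simplicity compared with a DAWG, but it is incrementally buildable and returns associated data rather than just a boolean.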
Java
For graph problems which require persistence, I'd take a look at the Neo4j graph DB project. Neo4j is designed to store large graphs and allow incremental building and modification of the data, which seems to meet the criteria you describe.
They have some good examples to get you going quickly and there's usually example code to get you started with most problems.
They have a DAG example with a link at the bottom to the full source code.
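As a rough sketch of what incremental building looks like with the embedded Java API (this assumes the Neo4j 3.x embedded API; class and method names differ in newer Neo4j versions, so treat the details as assumptions):

```java
import java.io.File;
import org.neo4j.graphdb.GraphDatabaseService;
import org.neo4j.graphdb.Label;
import org.neo4j.graphdb.Node;
import org.neo4j.graphdb.RelationshipType;
import org.neo4j.graphdb.Transaction;
import org.neo4j.graphdb.factory.GraphDatabaseFactory;

public class Neo4jSketch {
    public static void main(String[] args) {
        // Open (or create) an embedded database on disk; the graph persists between runs,
        // so nodes and relationships can be added incrementally over time.
        GraphDatabaseService db = new GraphDatabaseFactory()
                .newEmbeddedDatabase(new File("data/word-graph"));
        try (Transaction tx = db.beginTx()) {
            Node word = db.createNode(Label.label("Word"));
            word.setProperty("text", "example");
            word.setProperty("payload", 42);          // arbitrary associated data
            Node other = db.createNode(Label.label("Word"));
            other.setProperty("text", "examples");
            word.createRelationshipTo(other, RelationshipType.withName("PREFIX_OF"));
            tx.success();                             // mark the transaction for commit
        }
        db.shutdown();
    }
}
```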
C++
If you're using C++, a common solution to graph building/analysis is to use the Boost graph library. To persist your graph you could maintain a file based version of the graph in GraphML (for example) and read and write to that file as your graph changes.
You may also want to look at a trie structure for this (potentially building a radix-tree). It seems like a decent 'simple' alternative structure.
I'm suggesting this for a few reasons:
I really don't have a full understanding of the result you need.
Definitely incremental to build.
Leaf nodes can contain any data you wish.
Subjectively, a simple algorithm.
I would use a hash table with the ISBN number as the key, since the average look-up time in a hash table is O(1).
We could also use a binary search tree, where look-up time is O(log n) for a balanced tree.
What data structure would you guys use and why?
This sounds like a homework or interview question. If I were asking it, I would be interested in more than just whether you understand a couple of data structures. I would also want to know how you analyze a real-world problem and translate it to the world of computers and data structures.
As such, you should probably think about what operations you need to perform on the data before you pick a data structure. You should also think some about real libraries and some of the "gotchas" that could come up with any data structure you chose.
If all you need to do is translate from an ISBN to the catalog entry for the corresponding book, then a hash table might be a reasonable choice. But you might want to think about how you would deal with popular books, such as best sellers, that a library could have many copies of.
But is ISBN lookup really the important use case? I use my local library all the time, and I never look up books by ISBN. Some of the things that I do are:
Look up a specific book by title. Sometimes there are different books with the same title.
Browse the list of books by an author I like
Find where books on a particular subject are shelved, so I can browse them.
Librarians probably have additional uses for a catalog system:
Add new books to the catalog
Mark books as checked out
Change listing information, such as subject classification, for a book
So I guess my recommendation would be to think more carefully about what problem you want to solve before you decide on the solution.
Apologies for asking more questions instead of providing an answer. I hope this is helpful anyway.
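To make that concrete, here is one hedged sketch (all names are illustrative assumptions) of a catalog that keeps several indexes over the same book records, so ISBN lookup, exact-title lookup, and browsing authors in sorted order are all supported:

```java
import java.util.*;

// Illustrative catalog: one canonical Book record per ISBN, plus secondary
// indexes so the same record can also be found by title or author.
class Catalog {
    // copiesOwned covers the "many copies of a best seller" case without duplicate entries.
    record Book(String isbn, String title, String author, int copiesOwned) {}

    private final Map<String, Book> byIsbn = new HashMap<>();                 // O(1) average lookup
    private final Map<String, List<Book>> byTitle = new TreeMap<>();          // titles can collide
    private final NavigableMap<String, List<Book>> byAuthor = new TreeMap<>(); // sorted browsing

    public void add(Book book) {
        byIsbn.put(book.isbn(), book);
        byTitle.computeIfAbsent(book.title(), k -> new ArrayList<>()).add(book);
        byAuthor.computeIfAbsent(book.author(), k -> new ArrayList<>()).add(book);
    }

    public Book findByIsbn(String isbn) {
        return byIsbn.get(isbn);
    }

    public List<Book> findByTitle(String title) {
        return byTitle.getOrDefault(title, List.of());
    }

    public Collection<List<Book>> browseByAuthorPrefix(String prefix) {
        // The sorted map gives a range view of every author name starting with the prefix.
        return byAuthor.subMap(prefix, true, prefix + Character.MAX_VALUE, false).values();
    }
}
```

The point is less the specific containers than the shape: decide which queries matter, then keep one index per query over a single set of records.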
Well... I don't think look-up speed is the hardest problem to solve when designing a data structure to store information about books.
And I would certainly not settle for a system that only allows searching if you know the ISBN. What if you only remember the author, or a few words from the title? If there is to be any gain in having a computerized system for this, you must support flexible searches, in my opinion.
I would probably look into using Dublin Core, but I'm not at all sure that's the "right" thing to do. It seems people have spent a great deal of time thinking about that one, though.