Examples of Data Structures in real life

In each of the following examples, we need to choose the best data structure(s). Options are: Array, Linked List, Stack, Queue, Tree, Graph, Set, Hash Table. This is not homework; I am really curious about data structures, and I would like answers to these questions so that I can understand how each structure works.
You have to store social network “feeds”. You do not know the size, and things may need to be dynamically added.
You need to store undo/redo operations in a word processor.
You need to evaluate an expression (i.e., parse).
You need to store the friendship information on a social networking site. I.e., who is friends with who.
You need to store an image (1000 by 1000 pixels) as a bitmap.
To implement a printer spooler so that jobs can be printed in the order of their arrival.
To implement back functionality in an internet browser.
To store the possible moves in a chess game.
To store a set of fixed key words which are referenced very frequently.
To store the customer order information in a drive-in burger place. (Customers keep on coming and they have to get their correct food at the payment/food collection window.)
To store the genealogy information of biological species.

Hash table (uniquely identifies each feed while allowing additional feeds to be added, assuming dynamic resizing)
Linked list (doubly linked: from one node, you can go backwards or forwards one at a time)
Tree (integral to compilers and automata theory; grammar rules determine when to branch and how many branches to have. Look up parse trees)
Graph (each person is a vertex, and each friendship is an edge; see the adjacency-set sketch below)
Array (2-dimensional, 1000x1000, storing color values)
Queue (like a line of people waiting to get through a checkpoint)
Stack (you push each site visited and pop as needed to go back, as long as you don't care about going forward. If you do care about going forward, this is the same scenario as the word processor, so a linked list)
Tree (you can follow any game move by move, from the root down to a leaf. Note that this tree is HUGE)
Hash table (if you want to use the keywords as keys and fetch everything related to them, I would suggest a hash table with linked lists as the keys' corresponding values. I might be misunderstanding this scenario; the description is a little unclear about how the keywords are meant to be used)
Queue or hash table (if this is a drive-thru and people aren't cutting in front of one another, it's like the printer question. If customers place orders ahead of time and can arrive in any order, a hash table is much better, with an order number or customer name as the key and the order details as the value)
Tree (look up phylogenetic tree)
If you would like to know more about how each data structure works, here is one of many helpful sites that discusses them in detail.
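As a concrete illustration of the graph answer above, here is a minimal sketch of a friendship graph stored as adjacency sets, i.e. a hash table mapping each person to the set of their friends. The class and method names are mine, not from the question:

    import java.util.*;

    // Minimal undirected friendship graph: each person maps to their set of friends.
    class FriendshipGraph {
        private final Map<String, Set<String>> adjacency = new HashMap<>();

        void addFriendship(String a, String b) {
            // Friendship is mutual, so record the edge in both directions.
            adjacency.computeIfAbsent(a, k -> new HashSet<>()).add(b);
            adjacency.computeIfAbsent(b, k -> new HashSet<>()).add(a);
        }

        boolean areFriends(String a, String b) {
            return adjacency.getOrDefault(a, Collections.emptySet()).contains(b);
        }
    }

Checking who is friends with whom is then a constant-time set lookup, and traversals such as "friends of friends" simply follow the edges.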

You have to store social network “feeds”. You do not know the size, and things may need to be dynamically added. – linked list or hash table
You need to store undo/redo operations in a word processor. – stack (use two stacks to support redo; see the sketch after this list)
You need to evaluate an expression (i.e., parse). – stack or tree
You need to store the friendship information on a social networking site. I.e., who is friends with who. – graph
You need to store an image (1000 by 1000 pixels) as a bitmap. – 2D array (in Java, an ArrayList of rows would also work)
To implement a printer spooler so that jobs can be printed in the order of their arrival. – queue
To implement back functionality in an internet browser. – stack
To store the possible moves in a chess game. – tree
To store a set of fixed keywords which are referenced very frequently. – hash table (dynamic programming is an algorithmic technique, not a data structure)
To store the customer order information in a drive-in burger place. (Customers keep on coming and they have to get their correct food at the payment/food collection window.) – queue, or a min-priority queue if simple orders that take less time should be served first
To store the genealogy information of biological species. – tree
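A minimal sketch of the two-stack undo/redo idea mentioned above. T stands in for whatever operation type the word processor uses; the class and method names are hypothetical:

    import java.util.ArrayDeque;
    import java.util.Deque;

    // Two stacks: undo holds past operations, redo holds undone ones.
    class UndoRedoHistory<T> {
        private final Deque<T> undoStack = new ArrayDeque<>();
        private final Deque<T> redoStack = new ArrayDeque<>();

        void perform(T op) {
            undoStack.push(op);
            redoStack.clear(); // a fresh edit invalidates the redo history
        }

        T undo() {
            T op = undoStack.pop(); // most recent operation comes off first (LIFO)
            redoStack.push(op);
            return op;
        }

        T redo() {
            T op = redoStack.pop();
            undoStack.push(op);
            return op;
        }
    }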

Related

How do I implement a ranking system like Instagram?

I am developing an app in which people can upload a TaggedImage that can be viewed by others.
I want to implement a ranking system that retrieves the data as a function of how well received an item is, how new it is, whether it is shared, etc.
But I couldn't find any info to guide me. I really don't know where to start: in the SQL server, in the app service, or in another backend service?
Where can I get more practical info about this topic?
Ranking means mapping your potential items to numbers, i.e. finding a function r: X -> R, where X is your set of items and R is the set of real numbers.
In practice you use all kinds of information about your items, also known as features, which correlate with some actual goal you are trying to optimise. You need to define your goal: it may be related to engagement, i.e. the amount of time the user spends in your application; it could also be the amount of revenue this user generates, or the number of likes they end up leaving during the current session.
Once you have a goal and your features, you need to build a function. It could be hand-crafted or optimised. Optimisation involves searching for parameters that maximise your objective function in order to pick the best ranking function out of some family of functions. This is in essence what Machine Learning is about.
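For concreteness, here is a minimal hand-crafted ranking function. The features and weights are made up for illustration; in practice you would tune them, or learn them, against your chosen objective:

    // A hand-crafted r(x): combines a few item features into a single score.
    class ItemRanker {
        // Illustrative guesses, not tuned values.
        private static final double LIKE_WEIGHT = 1.0;
        private static final double SHARE_WEIGHT = 2.0;
        private static final double DECAY_PER_HOUR = 0.05;

        double score(int likes, int shares, double hoursSincePosted) {
            double engagement = LIKE_WEIGHT * likes + SHARE_WEIGHT * shares;
            // Newer items rank higher: exponential decay with age.
            return engagement * Math.exp(-DECAY_PER_HOUR * hoursSincePosted);
        }
    }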
Endless books have been written about what features you might want to use for your particular problem, what objective functions you might want to use to achieve your business objectives and what algorithms you can use to increase your chances of finding a good set of parameters.
So I won't go into a lot of details but I will give you a few pointers on each of the three issues.
Features
Number of total likes for this item
Number of likes today
Who posted this item
When it was posted
How many comments were left
Who liked the item
The caption text
Hashtags used
The image itself
Objectives
Engagement with the item (like, comment, dwell time)
Engagement with the application (likes, comments, dwell time, posts)
Item diversity
Algorithms
Boosted Decision Trees
Neural Networks
Image Classification (for binary events, regression otherwise)

Details about Aggregates and where to use Aggregates

I have read many articles and watched YouTube videos about aggregates, but every time I end up more confused by them. So please describe them in detail with a specific example. The little I have understood is that an aggregate is a collection, that it is used in the DDD (Domain-Driven Design) pattern, and that it helps in identifying microservice boundaries. If I am wrong, please correct me, and tell me more about aggregates.
Thanks in advance.
An AGGREGATE is a cluster of associated objects that we treat as a unit for the purpose of data changes. -- Evans, 2003.
Aggregate is a pattern, that Evans describes in the lifecycle management chapter of the blue book.
The motivation is that we often have two or more domain entities that must always agree with each other in some way. That will normally mean that we want to save them together (otherwise, a badly timed failure could leave us in a state where the entities are not consistent with each other), and will normally mean that we want to have both entities available in any case where we might change one (because we'll need to make sure that change is consistent with the other).
See also: Coarse Grained Lock.
A somewhat contrived example:
Imagine a system that keeps track of bids for some commodity. Our entity might include a collection of BUY orders (with a price and an amount for each), and a similar collection of SELL orders.
Our job is to pair off BUY and SELL orders that are close to each other in time and have a common price. So when a new BUY order comes in, either it gets added to the collection of BUY orders, or it is matched against a SELL order, which is removed from the SELL collection.
In effect, these two collections are managed such that they never overlap. To ensure that property holds, we keep both collections in the same "aggregate", so that they are always saved as a unit, and we are protected from data races that might make the two entities inconsistent with each other.
Often, this will constrain our data model - for instance, if we are using a document store to hold our domain information, then both of these entities would be represented within the same "document".
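A minimal sketch of that order book as an aggregate, with hypothetical names. The point is that both collections are modified only through the one root object, which is then saved as a unit:

    import java.util.*;

    // Aggregate root: BUY and SELL orders change only through this object,
    // so the two collections can always be saved together, consistently.
    class OrderBook {
        record Order(long price, long amount) {}

        private final List<Order> buys = new ArrayList<>();
        private final List<Order> sells = new ArrayList<>();

        void submitBuy(Order buy) {
            Iterator<Order> it = sells.iterator();
            while (it.hasNext()) {
                if (it.next().price() == buy.price()) {
                    it.remove(); // matched: removing the SELL keeps both collections consistent
                    return;
                }
            }
            buys.add(buy); // no match: the BUY order waits in its own collection
        }
    }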

How many servers are needed to handle such a large dataset

We have 1 million datasets and each dataset is around 180 MB, so the total size of our data is around 185 TB. Each dataset is a plain DEL file with only three columns. The first two columns are the row key and the last one is the value of the row. For example, the first column is A, the second is B, and the third is C. The value of A is the dataset number, so A is fixed within one dataset and ranges from 1 to 1 million. B is the position number and can range from 1 to 3 million.
What we are planning to do is: given any set of non-overlapping ranges of B, such as 1-1000, 10000-13000, 16030-17000, ..., calculate the sum of the values of each dataset over all these ranges and return the top 200 dataset numbers (A), within seconds.
Does anyone with big data expertise have an idea of how many servers we will need to handle this case? My boss believes 10 servers (16 cores each) can do it with a budget of $50,000. Do you think that's feasible?
I think that services such as Microsoft Azure can be your friend in this case. I think that your budget will go further using "pay as you use" services. You can decide how many servers / instances you would like to use to crunch your data.
I do think that one slight problem might be the way your data is currently formatted. I would most certainly look at using Azure table storage possibly and first work on getting your data in a service such as that. Once that has been complete, you now have a more "queryable" and reliable data store. From there you can use your language of choice to interact with that data. Using table storage, you can create partition keys.
Once you have the partitions you would like to use, you can create a service that is supplied a partition, or more likely a partition range, to process. You will be able to adjust the size of your instances and the hardware that drives them; with something like this in place, you can determine on average how long it takes one instance to process x records. Perhaps you could write some logs recording the performance.
Once you have your logs, it will be simple to determine with reasonable accuracy how long the whole process would take. You can then start adding more instances to your service, working through the data at a faster pace.
Table storage was also designed to work with big datasets, so going through the documentation on this, you will find many key features that you could use.
There are honestly many ways in which this problem could be solved; this is simply one option that I have used in the past, and it worked for me at the time.
If this turns out to be a viable option for you, make sure to place your data and services in the same data centre. Since I assume your files have some form of sequence, you could also persist placeholders storing your sum values for future use; should your data grow, you could simply add the new data and run your services again to update the system.
I would not go on this journey without making sure that you can persist your sum values in some way; otherwise, should you need the values again in the future, you will have to start from the beginning again.
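To make the "persist your sum values" point concrete: if each dataset stores a prefix-sum array over positions B, any range sum becomes two array lookups. This is my own sketch, not something from the question, and the names are hypothetical:

    // Prefix sums over positions: the sum of any range [lo, hi] in O(1).
    class DatasetSums {
        private final long[] prefix; // prefix[i] = sum of values at positions 1..i

        DatasetSums(long[] valuesByPosition) {
            prefix = new long[valuesByPosition.length + 1];
            for (int i = 0; i < valuesByPosition.length; i++) {
                prefix[i + 1] = prefix[i] + valuesByPosition[i];
            }
        }

        long rangeSum(int lo, int hi) { // 1-based, inclusive positions
            return prefix[hi] - prefix[lo - 1];
        }
    }

Each dataset's total over the query ranges is then a handful of subtractions, and the top 200 datasets can be tracked with a bounded min-heap while scanning.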
I managed to find one quick write-up about the services mentioned above working with big data; perhaps it can help you further: http://www.troyhunt.com/2013/12/working-with-154-million-records-on.html

Which data structure to choose for scenarios A and B as described below

I found the below in a question bank and I'm looking for some help with it.
For each of the following situations, select the best data structure and justify your selection.
The data structures should be selected from the following possibilities: unordered list, ordered array, heap, hash table, binary search tree.
(a) (4 points) Your structure needs to store a potentially very large number of records, with the data being added as it arrives. You need to be able to retrieve a record by its primary key, and these keys are random with respect to the order in which the data arrives. Records also may be deleted at random times, and all modifications to the data need to be completed just after they are submitted by the users. You have no idea how large the dataset could be, but the data structure implementation needs to be ready in a few weeks. While you are designing the program, the actual programming is going to be done by a co-op student.
For the answer, I thought a BST would be the best choice.
Since the size is not clear, a hash table is not a good choice.
Since deletions happen at random times, a heap is not acceptable either.
Is my reasoning correct?
(b) (4 points) You are managing data for the inventory of a large warehouse store. New items (with new product keys) are added and deleted from the inventory system every week, but this is done while stores are closed for 12 consecutive hours.
Quantities of items are changed frequently: incremented as they are stocked, and decremented as they are sold. Stocking and selling items requires the item to be retrieved from the system using its product key.
It is also important that the system be robust, well-tested, and have predictable behaviour. Delays in retrieving an item are not acceptable, since it could cause problems for the sales staff. The system will potentially be used for a long time, though largely it is only the front end that is likely to be modified.
For this part I thought heapsort, but I have no idea how to justify my answer.
Could you please help me?
(a) needs fast insertion and deletion, and you need retrieval by key. Thus I'd go with a hash table or a binary search tree. However, since the size is not known in advance and there's that deadline constraint, I'd say the binary search tree is the best alternative.
(b) You have enough time to rebuild the structure after insertions/deletions (the nightly 12-hour closure), but retrieval by product key must be fast and predictable. An ordered array with binary search gives O(log n) lookups with no bad worst cases, so it should do the trick; see the sketch below.
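A minimal sketch of the ordered-array lookup for (b), assuming the arrays are rebuilt during the nightly closure; the names are illustrative:

    import java.util.Arrays;

    class Inventory {
        private final long[] productKeys; // sorted ascending, rebuilt nightly
        private final int[] quantities;   // quantities[i] belongs to productKeys[i]

        Inventory(long[] sortedKeys, int[] quantities) {
            this.productKeys = sortedKeys;
            this.quantities = quantities;
        }

        // Predictable O(log n) retrieval by product key, no hash-collision worst case.
        int quantityOf(long key) {
            int i = Arrays.binarySearch(productKeys, key);
            if (i < 0) throw new IllegalArgumentException("unknown product key: " + key);
            return quantities[i];
        }
    }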

What are the standard data structures that can be used to efficiently represent the world of Minecraft?

I am thinking of something like a 3-dimensional matrix indexed by the x, y, z coordinates. But that would be a waste of memory, since a lot of block spaces are empty. Another solution would be a hashmap ((x, y, z) -> BlockObject), but that doesn't seem too efficient either.
When I say efficient, I do not mean optimal; I simply mean that it should run smoothly on a modern computer. Keep in mind that the worlds generated by Minecraft are quite huge, so efficiency is important regardless. There is also tons of metadata that needs to be stored.
As noted in my comment, I have no idea how Minecraft does this, but a common, efficient way of representing this sort of data is an octree: http://en.wikipedia.org/wiki/Octree. The general idea is that it's like a binary tree, but in three-space. You recursively divide each block of space in each dimension to get eight smaller blocks, and each block contains pointers to the smaller blocks and a pointer to its parent block.
This allows you to be efficient about storing large blocks of the same material (e.g., "empty space"), because you can terminate the recursion whenever you get to a block that is made up of all the same thing, even if you haven't recursed down to the level of individual "cube" units.
Also, this means that you can efficiently find all the cubes in a given region by taking your current block and going up the tree just far enough to get to a block that contains all you can see -- and that way, you can very easily ignore all the cubes that are somewhere else.
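A minimal octree node sketch along those lines; the field and method names are placeholders, not Minecraft's actual internals:

    // Octree node: either a uniform region of one material, or eight children.
    class OctreeNode {
        OctreeNode parent;
        OctreeNode[] children;  // null while this region is uniform
        byte material;          // meaningful only while children == null

        boolean isUniform() { return children == null; }

        // Split a uniform region into eight octants so one part can be changed.
        void subdivide() {
            children = new OctreeNode[8];
            for (int i = 0; i < 8; i++) {
                children[i] = new OctreeNode();
                children[i].parent = this;
                children[i].material = material;
            }
        }
    }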
If you're interested in exploring alternative means to represent Minecraft world (chunk) data, you can also look into the idea of bitstrings. Each 'chunk' comprises a 16*16*128 volume, and each 16*16 layer, at one bit per block, can be represented by 32 bytes and consolidated into a binary string.
As this approach is highly specific to one goal, trading client computation for highly optimised storage and transfer time, it seems imprudent to attempt to explain all the details here, but I have created a specification for just this purpose, if you're interested.
Using this method, the storage cost is drastically different from the current 1 byte per block; it is instead a 'variable bit rate': for each unique blocktype in a chunk, take (1 bit per block in a layer, rounded up to a multiple of 8, i.e. 32 bytes) * (number of unique layers in which that blocktype appears) + 2 bytes, then sum over the unique blocktypes in that chunk.
Pretty much only in deliberate edge cases can this be more expensive than a normally structured chunk; in excess of 99% of Minecraft chunks are naturally generated and would benefit from this variable-bit representation by a ratio of 8:1 or more in many of my tests.
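As a purely illustrative calculation (the layer counts are made up): a naturally generated chunk containing 4 blocktypes, each appearing in 30 layers, would cost 4 * (32 bytes * 30 + 2 bytes) = 3,848 bytes, against 16 * 16 * 128 = 32,768 bytes at one byte per block, which is roughly the 8:1 ratio claimed above.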
Your best bet is to decompile Minecraft and look at the source. Modifying Minecraft: The Source Code is a nice walkthrough on how to do that.
Minecraft is very far from efficient. It just stores "chunks" of data.
Check out the "Map formats" in the Development Resources at Minecraft Wiki. AFAIK, the internal representation is exactly the same.
