I have read many articles and watched YouTube videos about aggregates, but every time I end up more confused by them. So please describe them in detail with a specific example. The little I understood is that an aggregate is a collection, that it can be used in the DDD (Domain-Driven Design) pattern, and that it helps in identifying microservice boundaries. If I am wrong, please correct me, and describe more about aggregates.
Thanks in advance.
An AGGREGATE is a cluster of associated objects that we treat as a unit for the purpose of data changes. -- Evans, 2003.
Aggregate is a pattern that Evans describes in the lifecycle-management chapter of the blue book.
The motivation is that we often have two or more domain entities that must always agree with each other in some way. That normally means we want to save them together (otherwise, a badly timed failure could leave us in a state where the entities are not consistent with each other), and that we want both entities available in any case where we might change one (because we'll need to make sure that change is consistent with the other).
See also: Coarse Grained Lock.
A somewhat contrived example:
Imagine a system that keeps track of bids for some commodity. Our entity might include a collection of BUY orders (with a price and an amount for each), and a similar collection of SELL orders.
Our job is to pair off BUY and SELL orders that are close to each other in time and have a common price. So when a new BUY order comes in, either it gets added to the collection of BUY orders, or it is matched against a SELL order, which is removed from the SELL collection.
In effect, these two collections are managed such that they never overlap. To ensure that property holds, we keep both collections in the same "aggregate", so that they are always saved as a unit, and we are protected from data races that might make the two entities inconsistent with each other.
Often, this will constrain our data model - for instance, if we are using a document store to hold our domain information, then both of these entities would be represented within the same "document".
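To make that concrete, here is a minimal sketch in Python of the order-book example. The names (OrderBook, place_buy) and the exact matching rule are illustrative inventions, not anything prescribed by Evans:

    # Minimal sketch of the order-book aggregate described above.
    # OrderBook is the aggregate root: the only entry point for changing
    # either collection, so the "no overlap" invariant lives in one place.
    from dataclasses import dataclass, field

    @dataclass(frozen=True)
    class Order:
        price: int
        amount: int

    @dataclass
    class OrderBook:
        buys: list = field(default_factory=list)
        sells: list = field(default_factory=list)

        def place_buy(self, order: Order) -> None:
            # A BUY either matches (and removes) a SELL, or joins the
            # BUY collection -- never both, never neither.
            for sell in self.sells:
                if sell.price == order.price and sell.amount == order.amount:
                    self.sells.remove(sell)
                    return
            self.buys.append(order)

The whole OrderBook would then be loaded and saved as one unit (e.g. one document), so no transaction can observe a matched BUY whose SELL still sits in the SELL collection.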
I am developing an app in which people can upload a TaggedImage that can be viewed by others.
The thing is, I want to implement a ranking system that retrieves the data as a function of how well received an item is, how new it is, whether it is shared, etc.
But I couldn't find any info to guide me. I really don't know where to start: in the SQL server, in the app service, or in another backend service?
Where can I get more practical info about this topic?
Ranking means mapping your candidate items to a number, i.e. finding a function r: X -> R, where X is your set of items, x is an arbitrary item from X, and R is the set of real numbers.
In practice you use all kinds of information about your items, also known as features, which correlate with some actual goal you are trying to optimise. You need to define your goal: it may be related to engagement, i.e. the amount of time the user spends using your application; it could also be the amount of revenue this user generates, or the number of likes they end up leaving during the current session.
Once you have a goal and your features, you need to build a function. It could be hand-crafted (a sketch follows the lists below) or optimised. Optimisation involves searching for parameters that maximise your objective function in order to pick the best ranking function out of some family of functions. This is in essence what Machine Learning is about.
Endless books have been written about what features you might want to use for your particular problem, what objective functions you might want to use to achieve your business objectives and what algorithms you can use to increase your chances of finding a good set of parameters.
So I won't go into a lot of detail, but I will give you a few pointers on each of the three issues.
Features
Number of total likes for this item
Number of likes today
Who posted this item
When it was posted
How many comments were left
Who liked the item
The caption text
Hashtags used
The image itself
Objectives
Engagement with the item (like, comment, dwell time)
Engagement with the application (likes, comments, dwell time, posts)
Item diversity
Algorithms
Boosted Decision Trees
Neural Networks
Classification (for binary events, regression otherwise)
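To give a feel for the hand-crafted end of that spectrum, here is a small Python sketch that scores items from a few of the features above, with a recency decay. The field names and weights are made-up illustrations, not tuned values:

    import math
    import time

    # Hand-crafted ranking sketch: combine a few features linearly,
    # then decay the score so newer items rank higher.
    def score(item, now):
        age_hours = (now - item["posted_at"]) / 3600.0
        recency = math.exp(-age_hours / 24.0)  # decays to ~37% per day
        return (1.0 * item["likes_today"]
                + 0.2 * item["total_likes"]
                + 0.5 * item["comments"]
                + 2.0 * item["shares"]) * recency

    items = [
        {"posted_at": time.time() - 3600, "likes_today": 5,
         "total_likes": 5, "comments": 2, "shares": 0},
        {"posted_at": time.time() - 3 * 86400, "likes_today": 1,
         "total_likes": 400, "comments": 50, "shares": 9},
    ]
    ranked = sorted(items, key=lambda it: score(it, time.time()), reverse=True)

Once you start logging outcomes (likes, dwell time), those weights become exactly the parameters you can fit with the algorithms above instead of guessing them.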
In each of the following examples, we need to choose the best data structure(s). Options are: Array, Linked Lists, Stack, Queues, Trees, Graphs, Sets, Hash Tables. This is not homework; I am really curious about data structures, and I would like answers to these questions so that I can understand how each structure works.
You have to store social network “feeds”. You do not know the size, and things may need to be dynamically added.
You need to store undo/redo operations in a word processor.
You need to evaluate an expression (i.e., parse).
You need to store the friendship information on a social networking site. I.e., who is friends with who.
You need to store an image (1000 by 1000 pixels) as a bitmap.
To implement printer spooler so that jobs can be printed in the order of their arrival.
To implement back functionality in the internet browser.
To store the possible moves in a chess game.
To store a set of fixed key words which are referenced very frequently.
To store the customer order information in a drive-in burger place. (Customers keep on coming and they have to get their correct food at the payment/food collection window.)
To store the genealogy information of biological species.
Hash table (uniquely identifies each feed while allowing additional feeds to be added (assuming dynamic resizing))
Linked List (doubly-linked: from one node, you can go backwards/forwards one by one)
Tree (integral to compilers/automata theory; rules determine when to branch and how many branches to have; look up parse trees)
Graph (each person is a point, and connections/friendships are edges; see the sketch below)
Array (2-dimensional, 1000x1000, storing color values)
Queue (like a queue/line of people waiting to get through a checkpoint)
Stack (you can add to the stack with each site visited, and pop off as necessary to go back, as long as you don't care about going forward. If you care about going forward, this is the same scenario as the word processor, so linked list)
Tree (can follow any game move by move, down from the root to the leaf. Note that this tree is HUGE)
Hash table (If you want to use the keywords as keys, and get all things related to them, I would suggest a hash table with linked lists as the keys' corresponding values. I might be misunderstanding this scenario, the description confuses me a little as to how they are intended to be used)
Queue or Hash Table (if this is a drive-thru, assuming people aren't cutting in front of one another, it's like the printer question. If customers are placing orders ahead of time and can arrive in any order, a hash table would be much better, with an order number or customer name as the key and the order details as the value)
Tree (look up phylogenetic tree)
If you would like to know more about how each data structure works, here is one of many helpful sites that discusses them in detail.
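As one concrete illustration, the friendship graph above is commonly stored as an adjacency structure: a hash table mapping each person to the set of their friends. A minimal Python sketch (the names are illustrative):

    from collections import defaultdict

    # Adjacency representation: person -> set of friends.
    # Friendship is mutual, so each edge is recorded in both directions.
    friends = defaultdict(set)

    def befriend(a, b):
        friends[a].add(b)
        friends[b].add(a)

    befriend("alice", "bob")
    befriend("bob", "carol")

    print(friends["bob"])               # {'alice', 'carol'}
    print("carol" in friends["alice"])  # False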
You have to store social network "feeds". You do not know the size, and things may need to be dynamically added. – linked list or hash table
You need to store undo/redo operations in a word processor. – stack (use two stacks to include the redo operation; see the sketch after this list)
You need to evaluate an expression (i.e., parse). – stack or tree
You need to store the friendship information on a social networking site. I.e., who is friends with who – graph
You need to store an image (1000 by 1000 pixels) as a bitmap. – ArrayList (Java) or 2D array
To implement printer spooler so that jobs can be printed in the order of their arrival. – queue
To implement back functionality in the internet browser. – stack
To store the possible moves in a chess game. – tree
To store a set of fixed keywords which are referenced very frequently. – hash table
To store the customer order information in a drive-in burger place. (Customers keep on coming and they have to get their correct food at the payment/food collection window.) – queue / min-priority queue (if the order is simple and takes less time)
To store the genealogy information of biological species. – Tree
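Here is a minimal Python sketch of the two-stack undo/redo approach mentioned above; the strings stand in for whatever command objects a real word processor would store:

    # Two-stack undo/redo: new actions go on the undo stack and clear the
    # redo stack; undo moves an action across so it can be redone.
    undo_stack, redo_stack = [], []

    def do(action):
        undo_stack.append(action)
        redo_stack.clear()  # a fresh action invalidates old redo history

    def undo():
        if undo_stack:
            action = undo_stack.pop()
            redo_stack.append(action)
            return action

    def redo():
        if redo_stack:
            action = redo_stack.pop()
            undo_stack.append(action)
            return action

    do("type 'hello'")
    do("make word bold")
    undo()  # 'make word bold' moves to the redo stack
    redo()  # ...and is reapplied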
We have 1 million datasets and each dataset is around 180 MB, so the total size of our data is around 185 TB. Each dataset is a plain DEL file with only three columns. The first two columns are the row key and the last one is the value of the row. Call the first column A, the second B, and the third C. A is the dataset number, so A is fixed within one dataset, and it ranges from 1 to 1 million. B is the position number and can range from 1 to 3 million.
What we are planning to do is this: given any set of non-overlapping ranges of B, like 1-1000, 10000-13000, 16030-17000, ..., we calculate the sum of the values of each dataset over all these ranges and return the top 200 dataset numbers (A) within seconds.
Does anyone with big-data expertise have an idea of how many servers we would need to handle this case? My boss believes 10 servers (16 cores each) can do it with a budget of $50,000. Do you think that's feasible?
I think that services such as Microsoft Azure can be your friend in this case. I think that your budget will go further using "pay as you use" services. You can decide how many servers / instances you would like to use to crunch your data.
I do think that one slight problem might be the way your data is currently formatted. I would look at using Azure Table Storage and first work on getting your data into a service such as that. Once that is complete, you have a more "queryable" and reliable data store. From there you can use your language of choice to interact with that data. Using Table Storage, you can create partition keys.
Once you have the partitions you would like to use, you can create a service to which you supply a partition, or more likely a partition range, to process. You will be able to adjust the size of your instances and the hardware that drives them; with something like this in place, you can determine an average of how long it takes one instance to process x records. Perhaps you could write some logs recording the performance.
Once you have your logs, it will be simple to determine, with reasonable accuracy, how long the process would take. You can then start adding more instances to your service, working through the data at a faster pace.
Table storage was also designed to work with big datasets, so going through the documentation on this, you will find many key features that you could use.
There are honestly many ways in which this problem could be solved; this is simply one option that I have used in the past, and it worked for me at the time.
If this is a viable option for you, I would make sure to place your data and services in the same data centre. While I assume you have some form of sequence in your files, you could also persist placeholders storing your sum values for future use; should your data grow in the future, you could simply add the new data and run your services again to update the system.
I would not go on this journey without making sure that you can persist your sum values in some way; otherwise, should you need the values again in the future, you will have to start from the beginning.
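To make the persisted-sums idea concrete: if each dataset's values are stored ordered by B, one option is to precompute a prefix-sum array per dataset, after which any range sum costs two lookups and a query becomes a cheap top-k selection. A rough Python sketch under those assumptions (the toy data layout is hypothetical):

    import heapq
    from itertools import accumulate

    # One prefix-sum array per dataset: prefix[i] = sum of the first i values.
    def prefix_sums(values):
        return [0] + list(accumulate(values))

    # Sum over non-overlapping, inclusive, 1-based ranges [lo, hi] of B.
    def dataset_score(prefix, ranges):
        return sum(prefix[hi] - prefix[lo - 1] for lo, hi in ranges)

    datasets = {1: [5, 2, 7, 1], 2: [0, 9, 3, 4]}  # toy data: A -> C values, B = 1..4
    prefixes = {a: prefix_sums(v) for a, v in datasets.items()}

    ranges = [(1, 2), (4, 4)]
    top = heapq.nlargest(2, prefixes,
                         key=lambda a: dataset_score(prefixes[a], ranges))
    print(top)  # dataset numbers (A) ranked by their range sums -> [2, 1]

At your real scale the arrays are large (B runs to 3 million per dataset), so the interesting problem is sharding a million such arrays across servers, not the per-query arithmetic.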
I managed to find one quick write up about the services mentioned above working with big data. Perhaps it might be able to help you further. http://www.troyhunt.com/2013/12/working-with-154-million-records-on.html
We have a few products, and one of them uses flat files for persistence. Other products in the suite can use that data (via API), but only one at a time.
We cannot put the whole files in a DB, as the data is huge (20 GB+), but we have found a solution where some of the data can be put in the DB, e.g. user interpretations, meta info, markups, etc.
So the story is like:
"As a user I can concurrently access product A data from products B, C and D." That is huge, i.e. approx. 6-8 months.
Even if I keep it as "As a user I can concurrently access product A data from product B", it's still huge, i.e. approx. 5-6 months.
Even narrowing it down as follows, it's still huge:
"As a user I can concurrently access feature X of product A data from product B", i.e. approx. 4-5 months.
The problem is that once we can do one thing (one feature, one product), we can quickly do all of them.
How can I break this story into sub-stories? Or should I accept that some stories cannot be broken further into sub-stories that fit in one iteration?
PS: we use Scrum.
Ask yourself (and your team): What makes the story so big? Is there absolutely no benefit that can be shown along the way? Features and products would be the obvious cut, but might not necessarily (as you've shown) be good enough.
How about sub-components of the feature? What are you putting in? Is any of it externally visible or valuable?
Do you have authentication, configuration, or other "standard" aspects of the product? You could cut those out and put them as user stories.
Perhaps the 3-5 month features can be cut down further?
Anyway,
I hope this helps,
Assaf.
What you are describing is what we call an "epic" - it's really a collection of smaller stories that you are describing with a much larger descriptor. I suggest you do some more analysis to determine what parts of the system will be impacted by your request. You might have groupings like Reports, Entry Forms, etc that are individually impacted by the request.
Tackle the impact of the "epic" request on each area as a user story. For example, "Enhance Report X to include data from Product B", "Enhance Report X to include data from Product C", etc. I don't know enough about what you are changing to make the titles more descriptive but hopefully you get the idea. Keep at this deconstruction until the stories get down to the sweet spot of 2, 3, or 5 points each.
The nice thing about this is that it will also allow the PO to make a decision once they see all of the costs for this request. They may decide that access to data from Product B alone is enough to be successful once they see the cost of including Product C as well.
Agile fully supports features that have a longer horizon than a typical sprint period (2-4 weeks). Certainly the story can be broken down into tasks. In this case, I recommend prioritizing the tasks for this story and burning them down using your Scrum methodology. At the end of each sprint, you should still have 'working software' that you can demonstrate / test. You may not have the full feature yet, and that is okay.
It took me some time, but I've finally managed to write down all the tasks that need to go into Version 1.0 of the software product I'm working on.
The list is almost 1000 items long.
We are a 3-person team, and we've somehow managed to get this far using MindMeister, Google Docs, #todos in the code etc. Now, I have everything neatly grouped by feature, but how do I prioritize all this and turn it into a schedule?
Any advice would be greatly appreciated - I'm not looking for software recommendations, however - I'm seeking advice on how to take this enormous bag of tasks - ranging from bug-fixes to application modules - and find out in what order I should do them.
Prioritize ruthlessly. 1000 action items is a lot, and the odds are that as you go you'll modify some, toss others, and add new ones. Your list will not survive the things you learn by actually building the software, and if you don't do the most important stuff first, you'll end up with a mess.
For every item or feature, you have to answer the question: Can the product be at all usable or useful without this? If yes, it can wait; everything else goes to the head of the queue.
After that, I like to group milestones by focus: I'll do a features milestone (or multiple ones if there are natural small clusters of features), a UI milestone where I'll focus on AJAX/rich client interactivity, a performance milestone where I profile and do database & server tuning, etc. Or break them up some other way - but definitely break them up. Work in smaller bites with specific focus for each iteration, and make sure each iteration is solid before moving on.
My recommended approach will be based on Agile methodology best practices...
So, you have what in Agile terms is called a "backlog" defined. That's great, and an important first step.
A good, commonly used Agile pace is a 2-3 week iteration length, with a set of releasable features at the end of each. This establishes the "heartbeat" of your development process. Next, you'll decide how to organize and group the features into Stories and Tasks.
You'll want to grow the underlying architecture and let it naturally emerge based on the ordering of the Stories and Tasks that you select from your backlog.
It's important to mitigate risks early, so you'll want to select early those items that are performance or implementation unknowns, since they pose the largest risk and could result in the largest rework impact. For example, establishing the messaging infrastructure might be an early architectural feature, included if you select a Story that requires a persistent message to be delivered to complete a unit of work.
Can you group the set of features into functional categories that might naturally evolve to describe the 1.0 release as a System of Systems? For example, the Administrative functions, the User Profile Management, Reporting, external integration layers, Database Access Objects, etc.
What are the simplest Story / Use Cases that you can write that will map to some of the ~1,000 features / requirements you've defined? Select a set of Stories (or individual Tasks from a Story, if the Story itself is too large to implement in a single iteration). It will take some additional effort, but recomposing your requirements into a set of Stories/Tasks is important.
You'll find that you will refactor during subsequent iterations, but your steady 2-week heartbeat iteration schedule will keep delivering real functionality.
At various points you may want to schedule an architecture iteration just to focus on some cleaning-up / refactoring - and that's ok too.
Since you are indicating that all these items are required, I will assume that there is not much chance of dropping items off the list (at least for now). Given that, you have 2 large tasks at hand - deciding when to do items, and determining how long it will take to do them.
Since you have already conveniently grouped the items by feature, I would start by prioritizing the features. Hopefully this will significantly reduce your working set, and allow you to actually get through it in a reasonable amount of time.
I would prioritize each feature based on its risk. Some things are easy to implement and others are difficult. Since they are all required, do the riskiest features first, when your schedule is more flexible to meet any unanticipated problems. Wait until the end of your cycle, and Murphy's law will strike you down.
Given your small team, I would just send the list of features around and ask everyone to mark it if they consider it a risky or difficult feature to implement. Add up all the marks and you have your "risk assessment", with the highest scoring items getting assigned first.
Alternatively, if you have easy access to your customer, ask them to rate the "risk" associated with each feature (in this case risk refers to the worst-case scenario of not having the feature - if not having something would be annoying, it is not risky. If not having the feature would result in them not using your product, it is high-risk).
Now that you have a priority queue, it is time to estimate. For the initial estimates, I would simply do an order of magnitude estimate for each of the features. Since it sounds as if you have already broken the features up, you should be able to get a decent feel for whether something is going to take hours, days or weeks. From the sounds of it, you are still early in development, so I don't believe there is much point in trying to get an accurate estimate on something that won't be implemented for another month or so.
As you pull items off your queue, have your team provide more accurate estimates by identifying granular tasks that shouldn't take more than a few hours. If you want to refine your order of magnitude estimates, you can progressively provide quick estimates for the remaining tasks based on your up-to-date knowledge of the system.
This should provide you with a fairly accurate short term schedule, and a fuzzier long term schedule that will progressively get more accurate.
Finally, if you are facing a long development cycle, I would recommend you identify certain target goals or dates, and when you meet those goals, sit down and repeat this whole process. I would never go longer than 2 weeks without revisiting these things. New items will get added, others will get overtaken and become obsolete, and others will become higher risk as you better understand the problem. All of this must be taken into account.