My application needs to sort data alphabetically. I am a little unsure whether the sort should be case-sensitive or not: which is better for the user experience?
The data I want to sort is a list of form names.
I will not let the user change the sort order, so I should choose a user-friendly sort.
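One common user-friendly option is a case-insensitive sort that uses the original string as a tie-breaker. Names that differ only by case then sit together, and capitalised names no longer all come first. A minimal Python sketch follows. `sort_form_names` is a hypothetical helper. `str.casefold` only handles case, so accents and locale-specific rules would need something like `locale.strxfrm` or an ICU collator.

```python
def sort_form_names(names):
    """Case-insensitive sort for display.

    casefold() groups "alpha" and "Alpha" together; the original string is a
    tie-breaker so the order is deterministic when names differ only by case.
    """
    return sorted(names, key=lambda name: (name.casefold(), name))


names = ["beta", "Alpha", "alpha", "Gamma"]
print(sorted(names))           # ['Alpha', 'Gamma', 'alpha', 'beta']  (case-sensitive)
print(sort_form_names(names))  # ['Alpha', 'alpha', 'beta', 'Gamma']
```

The case-sensitive result shows why plain code-point ordering can surprise users: "Gamma" lands before "alpha" only because uppercase letters sort first.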
Imagine someone has a huge website selling, let's say, T-shirts.
We want to show paginated, sorted listings of offers, with options to filter by parameters such as T-shirt colour.
Offers should be sortable by any of 5 properties (creation date, price, etc.).
Important requirement 1: users must be able to browse all 15 million offers, not just the "top N".
Important requirement 2: they must be able to jump to any page at any time, not just flick through the pages sequentially.
We use a traditional data store (MongoDB, to be precise).
The problem is that MongoDB (like other traditional databases) performs poorly with large offsets. Imagine a user who wants to fetch a page of results somewhere in the middle of this huge list, sorted by creation date and with additional filters (for instance, by colour).
There is an article describing this kind of problem:
http://openmymind.net/Paging-And-Ranking-With-Large-Offsets-MongoDB-vs-Redis-vs-Postgresql/
We are told that Redis is a solution for this kind of problem. You "just" need to prepare certain data structures and query them instead of your primary storage.
The question is:
What kind of structures and approaches would you suggest for solving this with Redis?
Sorted Sets, paging through with ZRANGE.
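In Redis you would keep one sorted set per sort order, for example `ZADD offers:by_created <timestamp> <offerId>` (the key name is hypothetical). Page N then becomes a rank range, `ZRANGE offers:by_created <start> <stop>`. Redis documents ZRANGE as O(log(N) + M), so a page deep in the list costs about the same as the first page. The Python below is only an in-memory emulation of those semantics, meant to show the offset-to-rank arithmetic. `SortedSetEmulator` and `page` are hypothetical names, not a Redis client API. Filtering by colour is not shown. One assumed option is an extra sorted set per colour and sort key.

```python
import bisect


class SortedSetEmulator:
    """In-memory stand-in for one Redis sorted set (illustration only).

    Entries are kept ordered by (score, member), mirroring how Redis orders
    members with equal scores lexicographically. A Python list makes inserts
    O(n); Redis itself uses a skiplist.
    """

    def __init__(self):
        self._scores = {}  # member -> current score
        self._order = []   # (score, member) pairs, kept sorted

    def zadd(self, member, score):
        """Add a member or update its score, like ZADD."""
        if member in self._scores:
            # Remove the stale entry before re-inserting with the new score.
            stale = (self._scores[member], member)
            del self._order[bisect.bisect_left(self._order, stale)]
        self._scores[member] = score
        bisect.insort(self._order, (score, member))

    def zrange(self, start, stop):
        """Members at ranks start..stop inclusive (non-negative indexes only)."""
        return [member for _, member in self._order[start:stop + 1]]


def page(zset, page_number, page_size):
    """Translate a 1-based page number into a ZRANGE-style rank range."""
    start = (page_number - 1) * page_size
    return zset.zrange(start, start + page_size - 1)


# Hypothetical data: ten offers scored by a fake creation timestamp.
offers_by_created = SortedSetEmulator()
for i in range(1, 11):
    offers_by_created.zadd(f"offer:{i}", 1_000 + i)

print(page(offers_by_created, 3, 3))  # ['offer:7', 'offer:8', 'offer:9']
```

Because pages are addressed by rank rather than by skipping documents, jumping straight to an arbitrary page (requirement 2) needs no sequential scan.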
I have a database of users with one field describing their interests in the form of an array of strings. I'd like to write an algorithm that takes the list of users as input and returns groups of users with the same interests. The end goal is to suggest to users what they may be interested in, based on what users with similar interests are interested in. Could anybody suggest an algorithm I could implement to achieve this?
Thank you.
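One common starting point is user-based collaborative filtering, offered here as an illustrative assumption rather than a prescribed answer. First group users whose interest sets are identical. Then weight other users by Jaccard similarity (|A ∩ B| / |A ∪ B|) and suggest the interests they have that the target user lacks. The function names, the sample users and the 0.3 similarity threshold are all hypothetical.

```python
from collections import defaultdict


def group_by_identical_interests(users):
    """Return groups (2+ users) whose interest sets are exactly equal."""
    groups = defaultdict(list)
    for name, interests in users.items():
        groups[frozenset(interests)].append(name)
    return [sorted(names) for names in groups.values() if len(names) > 1]


def jaccard(a, b):
    """Size of the intersection divided by size of the union (0.0 if both empty)."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a or b else 0.0


def suggest(users, target, min_similarity=0.3):
    """Rank interests the target lacks, weighted by how similar their holders are."""
    mine = set(users[target])
    scores = defaultdict(float)
    for other, interests in users.items():
        if other == target:
            continue
        similarity = jaccard(mine, interests)
        if similarity < min_similarity:
            continue  # ignore users who are too dissimilar
        for interest in set(interests) - mine:
            scores[interest] += similarity
    # Highest score first; ties broken alphabetically.
    return sorted(scores, key=lambda interest: (-scores[interest], interest))


users = {
    "ann": ["chess", "go", "hiking"],
    "bob": ["chess", "go", "poker"],
    "cat": ["hiking", "chess", "go"],
    "dan": ["cooking"],
}
print(group_by_identical_interests(users))  # [['ann', 'cat']]
print(suggest(users, "bob"))                # ['hiking']
```

Comparing every pair of users is quadratic. For a large user base, an inverted index from interest to users would limit the comparisons to users who share at least one interest.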
I have a variety of data that I've got cached in a standard Redis hashmap, and I've run into a situation where I need to respond to client requests for ordering and filtering. Order rankings for name, average rating, and number of reviews can change regularly (multiple times a minute, possibly). Can anyone advise me on a proper strategy for attacking this problem? Consider the following example to help understand what I'm looking for:
Client makes an API request to /api/v1/cookbooks?orderBy=name&limit=20&offset=0
I should respond with the first 20 entries, ordered by name.
Strategies I've considered thus far:
For each type of hashmap store (cookbooks, recipes, etc.), creating a sorted set for each ordering scheme (alphabetical, average rating, etc.) from a Postgres ORDER BY, then pulling out ZRANGE slices based on limit and offset.
Storing ordering data directly in the JSON string for each key.
Hitting Postgres with a SELECT id FROM table ORDER BY _ query, and using the ids to pull directly from the hashmap store.
Any additional thoughts or advice on how to best address this issue? Thanks in advance.
As mentioned in a comment below, sorted sets are a great way to implement sorting and filtering functionality in a cache. Take the following example as an idea of how one might solve the problem of ordering objects stored in a hash:
Start with a hash called "movies" using the scheme bucket:objectId -> object, where each object is a JSON string (read about "bucketing" your hashes for performance here).
Create a sorted set called "movieRatings", where each member is an objectId from your "movies" hash, and its score is an average of all rating values (computed by the database). Just use a numerical representation of whatever you're trying to sort, and Redis gives you a lot of flexibility on how you can extract the slices you need.
This simple scheme has a lot of flexibility in what can be achieved - you simply ask your sorted set for a set of keys that fit your requirements, and look up those keys with HMGET from your "movies" hash. Two swift Redis calls, problem solved.
Rinse and repeat for whatever type of ordering you need, such as "number of reviews", "alphabetically", "actor count", etc. Filtering can also be done in this manner, but normal sets are probably quite sufficient for that purpose.
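Below is a minimal Python sketch of the two-call pattern. Plain dicts stand in for the "movies" hash and the "movieRatings" sorted set, with no Redis client and no bucketing (both assumptions). `zrevrange` and `hmget` are hypothetical helpers that mimic ZREVRANGE and HMGET, and the movie data is made up. With a real client, those two helpers would be the two round trips.

```python
import json

# Stand-in for the "movies" hash: field (objectId) -> JSON string.
movies = {
    "m1": json.dumps({"title": "Alien"}),
    "m2": json.dumps({"title": "Brazil"}),
    "m3": json.dumps({"title": "Casablanca"}),
}

# Stand-in for the "movieRatings" sorted set: member (objectId) -> score.
movie_ratings = {"m1": 4.6, "m2": 3.9, "m3": 4.8}


def zrevrange(zset, start, stop):
    """Members from highest to lowest score, ranks start..stop inclusive.

    Equal scores are ordered by member in descending order, as ZREVRANGE does.
    """
    ordered = sorted(zset.items(), key=lambda kv: (kv[1], kv[0]), reverse=True)
    return [member for member, _ in ordered[start:stop + 1]]


def hmget(hash_, fields):
    """Like HMGET: one value per field, None for fields that do not exist."""
    return [hash_.get(field) for field in fields]


def top_rated(limit, offset=0):
    ids = zrevrange(movie_ratings, offset, offset + limit - 1)  # call 1
    raw = hmget(movies, ids)                                   # call 2
    # Skip ids whose hash entry vanished between the two calls.
    return [json.loads(value) for value in raw if value is not None]


print([m["title"] for m in top_rated(2)])            # ['Casablanca', 'Alien']
print([m["title"] for m in top_rated(2, offset=2)])  # ['Brazil']
```

The limit/offset pair from a request like /api/v1/cookbooks?limit=20&offset=0 maps directly onto the rank range passed to the first call.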
This depends on your needs. Each of your strategies could work.
Your first approach, storing an auxiliary sorted set for each way you want to order, is the best way to do this if you have a very big hash and/or you run your order queries frequently. This approach will require a lot of RAM if your hash is big, but it will also scale well in terms of time complexity as your hash gets bigger and you start running order queries more frequently. On the other hand, it introduces complexity in your data structures, and feels like you're trying to use Redis for something a typical DB like Postgres, MySQL, or Mongo would be better at.
Storing ordering data directly in your keys means you need to pull your entire hash every time you do an order query. Maybe that's not so bad if your hash is very small or you don't run ordered queries very often, but this won't scale at all.
If you're already hitting Postgres to get keys, why not just store the values in Postgres as well? That would be much cheaper than hitting Postgres and then hitting Redis, and would make your code depend on fewer things. IMO, this is probably your best option and would work most naturally. Do this unless you have a really good reason not to store values in Postgres, or some really big speed concerns, in which case go with your first strategy.
First and foremost: no, I'm not asking you to tell me how Google is built in two sentences. What I am asking is slightly different. I have a database filled with textual data that users input, and we let them search this data later. The problem is that we currently do a simple full-text search and return the results in no particular order. I'd like to order the results by a weight: how often the user has typed in a given term. As an example, a user might type in the following:
"foo"
"bo"
"bob"
"bob"
"bob"
"bo"
"foo2"
Based on the above data, a search on 'b' should return bo and bob, but bob should be listed first because it is the most frequently used.
What algorithm should I research to build this effectively? Are there any books on common web algorithms (I know this isn't web-specific) that explain this?
There are various search algorithms out there.
Here's a little guidepost to some of them:
http://en.wikipedia.org/wiki/Search_algorithm
I'm not an expert in this area myself, so I can't recommend a specific one.
I don't know how you'd do this in the context of a database, but here's one way to go about it:
Use a trie to store each unique word along with a count of how often it was used. When your user starts typing, the trie lets you efficiently grab all the strings with the given prefix, which you can then sort using the words' counts as keys.
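A sketch of that trie in Python, using the inputs from the question. `FrequencyTrie` and its `add`/`complete` methods are illustrative names, not a library API. Breaking ties alphabetically is an arbitrary choice.

```python
class TrieNode:
    __slots__ = ("children", "count")

    def __init__(self):
        self.children = {}  # character -> TrieNode
        self.count = 0      # how many times a word ending here was entered


class FrequencyTrie:
    def __init__(self):
        self.root = TrieNode()

    def add(self, word):
        """Record one more use of `word`."""
        node = self.root
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
        node.count += 1

    def complete(self, prefix):
        """All stored words starting with `prefix`, most frequent first."""
        node = self.root
        for ch in prefix:
            node = node.children.get(ch)
            if node is None:
                return []  # no word has this prefix
        results = []
        stack = [(node, prefix)]  # depth-first walk of the prefix's subtree
        while stack:
            current, text = stack.pop()
            if current.count:
                results.append((text, current.count))
            for ch, child in current.children.items():
                stack.append((child, text + ch))
        # Sort by descending count, then alphabetically for ties.
        results.sort(key=lambda word_count: (-word_count[1], word_count[0]))
        return [word for word, _ in results]


trie = FrequencyTrie()
for typed in ["foo", "bo", "bob", "bob", "bob", "bo", "foo2"]:
    trie.add(typed)

print(trie.complete("b"))  # ['bob', 'bo']
```

Only the subtree under the prefix is visited, so the cost depends on the number of matching words rather than on the whole vocabulary.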
We use Apache Solr for our search.
In Solr, I think, this is normally done via boosting: index your data, then every day or so boost individual documents based on user queries.
I have a basic question. Say you have an NSFetchRequest that you want to perform on an NSManagedObjectContext. If the fetch request doesn't have any sort descriptors set explicitly, will the objects come back in random order every time, or will they be returned in an array in the order they were originally added to the managed object context? I can't find this answer anywhere in the documentation.
No, they're not guaranteed to be ordered. You might happen to see consistent ordering depending on what type of data store you use (I've never tried), but it's not something you should depend on in any way.
It's easy to order by creation date though. Just add a date attribute to your entity, initialize it to the current date in awakeFromInsert, and specify that sort descriptor in your fetch.
The order may not be "random every time" but as far as I know you cannot/should not depend on it. If you need a specific order, then use sort descriptors.
I see two questions here: will it come out in the same ordering every time? And, is that ordering on insertion order?
It comes out in set order, which is some ordering, just not a specified one. Note that NSSet is effectively just an interface: the instances you get back belong to private classes that implement it. So while calling allObjects on a given set might return the objects in a consistent order, that order is almost certainly hash order, since sets are almost universally implemented as hash tables.
Since the hashing algorithm varies widely depending on what is stored and how it's hashed, you might "luck out" and get the same ordering every time, then be caught off guard when something changes very slightly.
So, technically, it's not really random and it could be in some stable ordering.
To the second question, I would say it's almost assuredly NOT in insertion order.
Marc's suggestion for handling awakeFromInsert is a good one, and what you would want.
There is no guarantee on the ordering. For example, I could implement an NSAtomicStore or NSIncrementalStore that returns results in random order and it would be completely correct. I have seen the SQLite store return different ordering on different versions of the operating system as well.