I am writing a very simple social networking app that uses Redis.
Each user has a sorted set that contains ids of items in their feed. If I want to display their feed, I do the following steps:
use ZREVRANGE to get ids of items in their feed
use HMGET to get the feed (each feed item is a string)
But now, I also want to know whether the user has liked a feed item or not. So I have a set associated with each feed item that contains the ids of users who have liked it.
If I get 15 feed items, I now have to execute an additional 15 requests to Redis to find out, for each feed item, whether the current user has liked it (by checking if the user's id exists in that item's like set).
So that will take 15+1 = 16 requests.
Is this type of querying considered 'normal' when using Redis? Are there better ways I can structure the data to avoid this many requests?
I am using redis-rb gem.
You can easily refactor your code to collapse the 15 requests into one by using pipelining (which redis-rb supports).
You get the ids from the sorted set with the first request, then use those results to fetch all the keys you need in a single pipelined batch.
With this approach you have 2 requests in total instead of 16, and your code stays quite simple.
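A minimal sketch with redis-rb (assuming version 4.6+, where the pipelined block yields a pipeline object; the key names feed:<user_id>, items, and item:<id>:likes are invented stand-ins for your actual schema):

```ruby
require "redis"

redis = Redis.new

def feed_with_likes(redis, user_id, count = 15)
  # Request 1: ids of the newest items in the user's feed.
  item_ids = redis.zrevrange("feed:#{user_id}", 0, count - 1)
  return [] if item_ids.empty?

  # Request 2: one pipelined round trip for the item bodies plus a
  # like-set membership check per item.
  bodies, *liked = redis.pipelined do |pipeline|
    pipeline.hmget("items", *item_ids)
    item_ids.each { |id| pipeline.sismember("item:#{id}:likes", user_id) }
  end

  item_ids.zip(bodies, liked) # => [[id, body, liked?], ...]
end
```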
As an alternative, you can use a Lua script and fetch everything in one request.
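A rough sketch of the Lua route (same invented key layout as above; note the like-set keys are built inside the script, which works on a single node but is not cluster-safe):

```ruby
# (redis and user_id as in the previous sketch)
LIKED_SCRIPT = <<~LUA
  local ids = redis.call('ZREVRANGE', KEYS[1], 0, ARGV[2] - 1)
  local out = {}
  for i, id in ipairs(ids) do
    out[i] = { id, redis.call('SISMEMBER', 'item:' .. id .. ':likes', ARGV[1]) }
  end
  return out
LUA

# One request: returns [[item_id, 0 or 1], ...]
rows = redis.eval(LIKED_SCRIPT, keys: ["feed:#{user_id}"], argv: [user_id, 15])
```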
With this kind of database (a non-relational database) you have to make a trade-off between issuing multiple requests and introducing some data redundancy.
You should analyze each case separately and consider some aspects, like:
How frequently will this data be accessed?
How much space will this redundancy consume?
How many requests will I have to make to get all the data without redundancy?
Is performance an issue?
In your case, I would suggest keeping a set/hash, or just JSON-encoded data, for each user with a history of all recent user interactions, such as comments, likes, etc. Every time the user accesses the feed, you just have to read the feed items and the history: only two requests.
One thing to keep in mind: on every user interaction you must update all the redundant data as well.
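For instance, a sketch of that idea where a per-user set user:<id>:liked mirrors the per-item like sets (all key names are assumptions):

```ruby
require "redis"
require "set"

# Every like writes both the canonical per-item set and the
# redundant per-user copy, atomically.
def like!(redis, user_id, item_id)
  redis.multi do |tx|
    tx.sadd("item:#{item_id}:likes", user_id)
    tx.sadd("user:#{user_id}:liked", item_id)
  end
end

# Reading is then one request for the feed ids and one for the
# user's history, intersected locally.
def feed_with_like_flags(redis, user_id)
  item_ids = redis.zrevrange("feed:#{user_id}", 0, 14)
  liked = redis.smembers("user:#{user_id}:liked").to_set
  item_ids.map { |id| [id, liked.include?(id)] }
end
```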
We have a microservice for relationship modeling between objects. A relation is defined between primary and secondary objects with cardinality constraints like 1-1, 1-N, N-N, etc.
The microservice provides API like Create relation, Find relations, Get secondaries, Get primaries, etc.
The query API "Get secondaries" takes a primary object and returns all the related secondary objects. Since the set of related secondary objects could be large, the results are paginated.
We had another microservice which was making good use of this relation microservice to work with relations. This consuming service accepted similar pagination options (page index and page size), passed them through to the relation service, and returned the page results from the relation service back to the calling application. So far so good.
We recently identified that the consuming microservice was a bit chatty with the relation microservice, as it had to call the "Get secondaries" API multiple times when secondary objects had to be fetched for multiple primary objects.
So we thought to make the "Get secondaries" API a bulk API by having it accept multiple primary objects as input. But then we got stuck on how the pagination would work.
The API would return related secondary objects for each primary, but limit the secondary objects to the page size, as before.
This seemed fine for the first call, but we are unsure how this would behave on subsequent calls. If one or more primary objects have fewer secondary objects than the page size, what should the input for the subsequent calls be? Do we need to pass those primary objects again?
This is where we are looking for suggestions on how to design this bulk API. Any input is welcome.
Basically, you should have some way to ensure that the relationship service knows what the original query was when receiving a paginated request.
A simple and maintainable way for your relationship service to handle this is to preprocess the request by sorting the requested primary objects in some deterministic way (i.e. alphabetically by id), and then simply iterate through the primary objects, adding secondary objects to the response until the page is full.
The simplest thing for clients to do is to always use the same batch request and just add an index number or page token to the request.
I'd recommend a page token that records the last seen item, for example lastSeen=primaryId,secondaryId, which you should obfuscate in some way to avoid a leaky abstraction. Then the service can look at the original request and know where to resume iterating through all of the primary objects.
Alternately, you can encode enough information into a page token so that you can reconstruct whatever you need from the original request. This allows you to make some adjustments to the query on subsequent requests. (For example, if the client requests primaries A-Z, and you return secondary objects A1 - J5 in the first response, then you could modify the request to be J-Z; already seen J5, encode it so that you aren't leaking your implementation details, and return it to the client as the page token.) Then, instead of responding with the original request + page number, the client simply responds with the page token.
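Either flavor boils down to encoding some resume state and handing it back as an opaque blob. A minimal sketch (field names are invented, and Base64 merely stands in for real obfuscation; encrypt or sign the token in practice):

```ruby
require "base64"
require "json"

# Encode the resume point (the "last seen" primary/secondary pair) as
# an opaque page token the client echoes back verbatim.
def encode_page_token(last_primary_id, last_secondary_id)
  Base64.urlsafe_encode64(
    JSON.generate(lastPrimary: last_primary_id, lastSecondary: last_secondary_id)
  )
end

def decode_page_token(token)
  JSON.parse(Base64.urlsafe_decode64(token))
end

token = encode_page_token("J", "J5")
decode_page_token(token) # => {"lastPrimary"=>"J", "lastSecondary"=>"J5"}
```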
Either way, clients of the relationship service should never have to "figure out" what the request for the next page should be. The pagination should only require the consumer to increment a number or respond with a page token that was given to it by the relationship service.
Another consideration is the database that you are using. For example, in DynamoDB, getting the 100th item of a query like select * from secondaries where primaryId='ABC' requires reading all items up to the 100th. If you have a NoSQL database, or think you might move to one at some point in the future, you may find that a page token makes it much simpler to track where you are in the result set (as compared to an index number).
I found this article to be very helpful when I was learning about pagination myself, and I'd recommend reading it. It primarily deals with pagination concerns for UIs, but the fundamentals are the same.
TLDR: Don't make the consumer do any work. The consumer should repeat the original request with an added index number or page token, or the consumer should send a request containing only a page token.
I have a GSA that fulfils a number of roles within my organisation. Honestly it's a bit of a frankenmess but it's what I have to work with.
One of the things we have it doing is indexing a number of sites based on a feed we pass it. Each of the items we pass in the feed gets tagged with metadata that allows me to set up a frontend that only queries those items. This is working fine for the most part, except that now I want to remove some metadata from items that are in the index (thereby stopping them from appearing in that particular frontend) and I can't figure out how.
I use a metadata-and-url type feed to push in these urls I want the system to be aware of. But it also finds a number of them through standard indexing patterns.
Here's the issue: the items in the index that were found as part of the standard crawling are the ones I can't remove. I just need the GSA to forget that I ever attached metadata to them.
Is this possible?
You can push a new feed that either updates the metadata or deletes the individual records that you want to remove from your frontend.
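For illustration, a feed along these lines might look roughly like the following (the datasource and meta names are placeholders; check the feed documentation for your GSA version before relying on the exact shape):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE gsafeed PUBLIC "-//Google//DTD GSA Feeds//EN" "">
<gsafeed>
  <header>
    <datasource>frontend_feed</datasource>
    <feedtype>metadata-and-url</feedtype>
  </header>
  <group>
    <!-- Re-push a URL with replacement metadata (placeholder values) -->
    <record url="http://example.com/some/page.html" mimetype="text/html">
      <metadata>
        <meta name="frontend" content="none"/>
      </metadata>
    </record>
    <!-- Or delete a fed record outright -->
    <record url="http://example.com/other/page.html" mimetype="text/html" action="delete"/>
  </group>
</gsafeed>
```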
You can also block specific results from appearing in a specific frontend as a temporary measure while you work it out. See this doco.
It sounds like you would be better off using collections to group the subsets of the index that you want to present in a specific frontend.
We are trying to implement a FHIR REST server for our application. In our current data model (and thus live data), several FHIR resources are represented by multiple tables; e.g. what would all be Observations is stored across tables for vital values, laboratory values, and diagnoses. Each table has an independent, auto-incrementing primary ID, so there are entries with the same ID in different tables. But GET or DELETE calls to the FHIR server need a unique ID. What would be the most sensible way to handle this?
Searching didn't reveal an inherent way of doing this, so I'm considering these two options:
Add a prefix to all (or just the problematic) table IDs, e.g. lab-123 and vit-123
Add a UUID to every table and use that as the logical identifier
Both have drawbacks: the first needs an ID parser, and the second requires multiple database calls to identify the correct record.
Is there a FHIR way that allows to split a resource into several sub-resources, even in the Rest URL? Ideally I'd get something like GET server:port/Observation/laboratory/123
Server systems will have all sorts of different divisions of data in terms of how data is stored internally. What FHIR does is provide an interface that tries to hide those variations. So Observation/laboratory/123 would be going against what we're trying to do - because every system would have different divisions and it would be very difficult to get interoperability happening.
Either of the options you've proposed could work. I have a slight leaning towards the first option because it doesn't involve changing your persistence layer, and the transformation between external/FHIR ids and internal ones is relatively straightforward.
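That mapping layer can stay very thin. A sketch (the prefixes and table names are invented):

```ruby
# Translate between FHIR logical ids like "lab-123" and
# (table, primary key) pairs.
PREFIX_TO_TABLE = {
  "lab" => :laboratory_values,
  "vit" => :vital_values,
  "dia" => :diagnoses
}.freeze

def parse_logical_id(logical_id)
  prefix, raw = logical_id.split("-", 2)
  table = PREFIX_TO_TABLE.fetch(prefix) do
    raise ArgumentError, "unknown resource prefix: #{prefix}"
  end
  [table, Integer(raw)]
end

def to_logical_id(table, primary_key)
  "#{PREFIX_TO_TABLE.key(table)}-#{primary_key}"
end

parse_logical_id("lab-123")       # => [:laboratory_values, 123]
to_logical_id(:vital_values, 123) # => "vit-123"
```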
"Is there a FHIR way that allows to split a resource into several sub-resources, even in the Rest URL? Ideally I'd get something like GET server:port/Observation/laboratory/123"
What would this mean for search? What would /Observation?code=xxx search through? Would that search labs, vitals etc. combined, or would you just allow access on /Observation/laboratory?
If these are truly "silos", maybe you could use http://servername/lab/Observation (so swap the last two path parts), which suggests your server has multiple "endpoints" for the different observations. I think more clients will be able to handle that url than the url you suggested.
Still, I think the best choice is one of your two original options, of which the first is indeed the easiest to implement.
I'm building an internal server which contains a database of customer events. The webpage which allows access to the events is going to use an infinite scroll/dynamic loading scheme, both for display of live events and for browsing the results of queries to the database. So you might query the database and get maybe 200k results. The webpage would display the 'first' 50 and allow you to scroll and scroll and scroll to see more and more results (loading perhaps 50 more at a time).
I'm supposed to be using a REST API for the database access (a C# server). I'm unsure what the API should be so that it remains RESTful. I've come up with 3 options. The question is: are any of them RESTful, and which is the most RESTful, if there is such a thing (if not, I'll pick one of the RESTful ones).
Option 1:
GET /events?query=asdfasdf&first=1&last=50
This simply does the query and specifies the range of results to return. The server, unable to keep state, would have to requery the database each time the infinite scroll triggers (though it could perhaps use the first/last hints to stop early). Seems bad, and there isn't any feedback about how many results are forthcoming.
Option 2:
GET /events/?query=asdfasdf
GET /events/details?id1=asdf&id2=qwer&id3=zxcv&id4=tyui&...&id50=vbnm
This option first does a query which returns the list of event ids but no further details. The webpage holds onto the full list of ids (so at least it knows the count), and as the infinite scroll/dynamic load kicks in, makes another request for the event details of the specified ids. Each id would nominally be a GUID, so about 36 characters per id (plus &id##= for roughly 41 characters each). At 50 ids per request, the URL would be quite long, 2000+ characters, and the URL limit mentioned elsewhere on SO is around 2k. Maybe if I limit it to 40 ids per request this would be fine. It'd be nicer to have a comma-separated list instead of all the query parameters. Can you make a query parameter like ?ids=qwer,asdf,zxcv,wert,sdfg,rtyu,gfhj, ... ,vbnm ?
Option 3:
POST /events/?query=asdfasdf
GET /events/results/{id}?first=1&last=50
This would post the query to the server, causing it to create a results resource. The ID of the results resource would be returned and then used to get blocks of the query results, which in turn contain the event details needed for the webpage. The XML returned from the POST could contain the number of records and other useful information besides the ID. Either the webpage would have to delete the resource later, when the query page closes, or the server would have to clean results up once they expire (days or weeks later).
I am concerned that Option 1, while RESTful, is horrible for the server. I'm not sure requesting so many simultaneous resources, like the second GET in Option 2, is really RESTful or practical (it seems like there has to be a better way). And I'm not sure Option 3 is RESTful at all, or, if it is, whether it's sort of cheating the REST thing by creating state via a POST (or should that be PUT?).
Option 3 worked out fine. It required the server to maintain the query results, and there was a bit of debate about how many queries (from various users) should be kept alive simultaneously, since there was no way to know when a user was actually done with a query.
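For the record, a minimal sketch of the Option 3 flow (in Ruby/Sinatra purely for brevity; the real server was C#, and run_query plus the expiry policy are placeholders):

```ruby
require "sinatra"
require "securerandom"
require "json"

# In-memory stand-in for the server-side results store; a real server
# would add an expiry policy so abandoned queries get cleaned up.
RESULTS = {}

# Placeholder for the real database query.
def run_query(query)
  raise NotImplementedError, "query the events database here"
end

post "/events" do
  event_ids = run_query(params["query"])
  id = SecureRandom.uuid
  RESULTS[id] = event_ids
  JSON.generate(id: id, count: event_ids.size)
end

get "/events/results/:id" do
  event_ids = RESULTS.fetch(params["id"]) { halt 404 }
  first = Integer(params.fetch("first", 1))
  last  = Integer(params.fetch("last", 50))
  JSON.generate(event_ids[(first - 1)...last])
end
```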
I'm trying to create a feed displaying the combined results of 3 queries to the Twitter API, since there seems to be no way to get what I want with one API call (2 user timelines and 1 search result for a hashtag). I want the results sorted by date so that the most recent appears first (just like a result straight from the Twitter API).
How can I combine these 3 JSONs (from the Twitter API), whilst maintaining the date order?
Thanks
There are several ways of doing this. I would probably persist all the stuff in a database.
That comes with several benefits:
First of all, caching is super easy; you can serve the data from your own database until it gets stale.
Secondly, databases are really good at sorting and all this date handling. Doing it manually is hard and usually slow; I tend to mess it up all the time, so I let the database do the job for me.
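That said, if you'd rather merge in memory, a sketch (assuming each response body parses to a JSON array of tweets carrying Twitter's created_at field; the v1.1 search endpoint wraps its tweets in a statuses key, hence the unwrap):

```ruby
require "json"
require "time"

# Merge several Twitter API responses into one list, newest first.
def merge_feeds(*json_bodies)
  tweets = json_bodies.flat_map do |body|
    parsed = JSON.parse(body)
    parsed.is_a?(Hash) ? parsed.fetch("statuses", []) : parsed
  end
  tweets.sort_by { |t| Time.parse(t["created_at"]) }.reverse
end

# merged = merge_feeds(timeline_a_json, timeline_b_json, search_json)
```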