Spring data JPA to filter by multiple fields and return projections


I am working on a PoC of a simple REST API with a MySQL database. I am using Spring Data JPA and Spring Web. I think my use case is very common, and yet I haven't found the best solution. Here are the details.
For the sake of the discussion, let's use an entity Person with say 50 different fields/columns (e.g. id, name, age, etc.)
I want to expose a simple HTTP GET endpoint that would return a JSON with a list of people. I want to offer the user the following request params:
withName to filter by name.
withAge to filter by age.
... some other withXXX, although not as many as the number of fields. Let's say 5 more.
sortBy to sort
page and pageSize to paginate.
fields to decide what fields the user wants to get in the response
An example of a possible call would be:
GET /people?withName=Charly&withAge=30&withCountry=esp&sortBy=name&page=1&pageSize=10&fields=id,surname,genre
[
{
"id": 1,
"surname": "Smith",
"genre": "male"
},
{
"id": 45,
"surname": "Doe",
"genre": "female"
},
...
]
Now, after having read here and there, I think it would be good to have the following:
Use DTOs (Spring projections), because I don't want to fetch all 50 columns from the database when they are not required (see the fields request param).
Perhaps use specifications, because the fields to filter on are chosen by the user in the request.
Use standard sorting and pagination features offered by JPA/Spring data.
I was expecting this use case to be very common, but I am surprised that I haven't found it anywhere else. This makes me think that I am doing something wrong.
Do I need to implement this "manually" using queries or criteria? I expect to have a high number of entities and was expecting not to have to implement this for every one of them.
I saw this interesting article, but it isn't using projections, which I think is important.
I also have read articles in Baeldung, in Vlad Mihalcea's site, in Thorben Janssen's site and, of course, Stack Overflow, all related to the use of DTO, projections, etc., but again, I haven't found anywhere an example mixing Spring data, projections with a dynamic number of fields, filtering using multiple fields and pagination.
I am starting to think that I am asking too much, but I see this scenario and think that people must have faced it many times in the past.
Can anybody help, please? Thank you very much.
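One possible "manual" route, sketched here under stated assumptions rather than as an established Spring Data recipe, is to build the JPQL text from a whitelist of attribute names and bind the filter values as named parameters. The class PersonQuerySketch, its ALLOWED whitelist and the mapping from withName/withAge to the attributes name/age are all hypothetical; only the entity name Person comes from the question.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical helper, not part of Spring Data: builds JPQL that selects only the requested fields. */
final class PersonQuerySketch {

    // Assumed whitelist of Person attributes that may be selected, filtered or sorted on.
    static final Set<String> ALLOWED = Set.of("id", "name", "surname", "age", "country", "genre");

    /** Query text plus the named parameters to bind before executing it. */
    record Built(String jpql, Map<String, Object> params) {}

    static Built build(List<String> fields, Map<String, Object> filters, String sortBy) {
        if (fields.isEmpty()) {
            throw new IllegalArgumentException("at least one field is required");
        }
        List<String> columns = new ArrayList<>();
        for (String field : fields) {
            columns.add("p." + requireAllowed(field));
        }
        List<String> predicates = new ArrayList<>();
        for (String attribute : filters.keySet()) {
            predicates.add("p." + requireAllowed(attribute) + " = :" + attribute);
        }
        StringBuilder jpql = new StringBuilder("SELECT ")
                .append(String.join(", ", columns))
                .append(" FROM Person p");
        if (!predicates.isEmpty()) {
            jpql.append(" WHERE ").append(String.join(" AND ", predicates));
        }
        if (sortBy != null) {
            jpql.append(" ORDER BY p.").append(requireAllowed(sortBy));
        }
        return new Built(jpql.toString(), filters);
    }

    // Only whitelisted names ever reach the query text; values travel as bound parameters.
    private static String requireAllowed(String attribute) {
        if (!ALLOWED.contains(attribute)) {
            throw new IllegalArgumentException("unknown attribute: " + attribute);
        }
        return attribute;
    }

    public static void main(String[] args) {
        Map<String, Object> filters = new LinkedHashMap<>();
        filters.put("name", "Charly");
        filters.put("age", 30);
        System.out.println(build(List.of("id", "surname", "genre"), filters, "name").jpql());
    }
}
```

The resulting string could then be handed to EntityManager.createQuery, with each entry of params bound through setParameter and paging applied through setFirstResult and setMaxResults. Rows come back as Object[] in the order of the requested fields (or as a single value when only one field is selected), so the JSON mapping still has to be written by hand.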

Related

I don't get GraphQL. How do you solve the N+1 issue without preloading?

A neighborhood has many homes. Each home is owned by a person.
Say I have this graphql query:
{
neighborhoods {
homes {
owner {
name
}
}
}
}
I can preload the owners, and that'll make the data request be a single SQL query. Fine.
But if I don't request the owner in the GraphQL query, the data will still be preloaded.
And if I don't preload, the data will either be fetched in every query, or not at all, since I'm not loading the belongs_to association in the resolver.
I'm not sure if this is a solved issue, or just a pain point one must swallow when working with GraphQL.
Using Absinthe, DataLoader and Elixir by the way.

Most GraphQL implementations, including Absinthe, expose some kind of "info" parameter that contains information specific to the field being resolved and the request being executed. You can parse this object to determine which fields were actually requested and build your SQL query appropriately.
See this issue for a more in-depth discussion.
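To make the idea of inspecting the requested fields concrete, here is a minimal, library-independent sketch in Java. The Field record is a hypothetical stand-in for whatever selection tree the GraphQL library exposes through its "info" argument; it is not Absinthe's data structure. The sketch only collects the paths of fields that have sub-selections, which are the candidates for preloading.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for the selection tree a GraphQL library exposes via its "info" argument. */
final class SelectionSketch {

    /** One requested field and the sub-fields requested beneath it. */
    record Field(String name, List<Field> children) {
        static Field of(String name, Field... children) {
            return new Field(name, List.of(children));
        }
    }

    /** Collects dotted paths of requested fields that have sub-selections (candidate associations). */
    static List<String> associationPaths(Field root) {
        List<String> paths = new ArrayList<>();
        collect(root, "", paths);
        return paths;
    }

    private static void collect(Field field, String prefix, List<String> paths) {
        for (Field child : field.children()) {
            if (child.children().isEmpty()) {
                continue; // scalar leaf such as "name": nothing to preload
            }
            String path = prefix.isEmpty() ? child.name() : prefix + "." + child.name();
            paths.add(path);
            collect(child, path, paths);
        }
    }

    public static void main(String[] args) {
        // Mirrors the query in the question: neighborhoods { homes { owner { name } } }
        Field query = Field.of("neighborhoods",
                Field.of("homes", Field.of("owner", Field.of("name"))));
        System.out.println(associationPaths(query));
    }
}
```

If the client drops owner from the query, "homes.owner" no longer appears in the result, so the resolver can skip that preload.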

In order to complement what Daniel Rearden said, you have to use the info.definition to resolve nested includes.
In my application I defined an array of possible values like:
defp relationships do
[
{:person, [tasks: [:items]]}
...
]
end
Then I have logic that iterates over info.definition and uses this function to preload the associations.
You would use a DataLoader to lazy-load your resources, usually when fetching from a third party or performing a complex database query.

Does GraphQL ever redundantly visit fields during execution?

I was reading this article and it used the following query:
{
getAuthor(id: 5){
name
posts {
title
author {
name # this will be the same as the name above
}
}
}
}
Which was parsed and turned into an AST like the one below:
Clearly it is bringing back redundant information (the author's name is asked for twice), so I was wondering how GraphQL handles that. Does it redundantly fetch that information? Is the diagram a proper depiction of the actual AST?
Any insight into the query parsing and execution process relevant to this would be appreciated, thanks.
Edit: I know this may vary depending on the actual implementation of the GraphQL server, but I was wondering what the standard / best practice was.

Yes, GraphQL may fetch the same information multiple times in this scenario. GraphQL does not memoize the resolver function, so even if it is called with the same arguments and the same parent value, it will still run again.
This is a fairly common problem when working with databases in GraphQL. The most common solution is to utilize DataLoader, which not only batches your database requests, but also provides a cache for those requests for the duration of the GraphQL request. This way, even if a particular record is requested multiple times, it will only be fetched from the database once.
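The batching-plus-caching behaviour described above can be illustrated with a deliberately simplified, synchronous sketch. This is not the DataLoader API; the class BatchingLoaderSketch and its request/dispatch/get methods are hypothetical names chosen for illustration.

```java
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;

/** Simplified, synchronous illustration of batching plus per-request caching (not the DataLoader API). */
final class BatchingLoaderSketch<K, V> {

    private final Function<Set<K>, Map<K, V>> batchFetch; // e.g. one "WHERE id IN (...)" query
    private final Map<K, V> cache = new HashMap<>();
    private final Set<K> pending = new LinkedHashSet<>();
    int batchCalls = 0; // exposed only so the demo can show how many fetches happened

    BatchingLoaderSketch(Function<Set<K>, Map<K, V>> batchFetch) {
        this.batchFetch = batchFetch;
    }

    /** Registers interest in a key; nothing is fetched yet. */
    void request(K key) {
        if (!cache.containsKey(key)) {
            pending.add(key);
        }
    }

    /** Fetches all pending keys in one call and stores the results in the cache. */
    void dispatch() {
        if (pending.isEmpty()) {
            return;
        }
        batchCalls++;
        cache.putAll(batchFetch.apply(new LinkedHashSet<>(pending)));
        pending.clear();
    }

    /** Returns the value for a key, fetching it only if it has not been loaded before. */
    V get(K key) {
        request(key);
        dispatch();
        return cache.get(key);
    }

    public static void main(String[] args) {
        BatchingLoaderSketch<Integer, String> authors = new BatchingLoaderSketch<>(ids -> {
            Map<Integer, String> rows = new HashMap<>();
            for (Integer id : ids) {
                rows.put(id, "author-" + id); // stands in for a database row
            }
            return rows;
        });
        authors.request(5);
        authors.request(5); // the same author requested again by a nested field
        authors.request(7);
        authors.dispatch();
        System.out.println(authors.get(5) + " after " + authors.batchCalls + " batch call(s)");
    }
}
```

One loader instance would be created per GraphQL request, so the cache never outlives the request it serves.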
The alternative (albeit more complicated) approach is to compose a single database query based on the requested fields that executes at the root level. For example, our resolver for getAuthor could construct a single query that would return the author, their posts and each post's author. With this approach, we can skip writing resolvers for the posts field on the Author type or the author field on the Post type and just utilize the default resolver behavior. However, in order to do this and avoid overfetching, we have to parse the GraphQL request inside the getAuthor resolver in order to determine which fields were requested and should therefore be included in our database query.

Is there a good way to combine resolvers for multiple GraphQL objects, preferably using graphql-js?

Suppose you have a GraphQL layer, written on node.js using graphql-js, that communicates with a SQL database. Suppose you have the following simple types and fields:
Store
A single brick-and-mortar location for a chain of grocery stores.
Fields:
id: GraphQLID
region: StoreRegion
employees: GraphQLList(Employee)
StoreRegion
A GraphQLEnumType containing the list of regions into which the chain divides its stores.
Values:
NORTHEAST
MIDATLANTIC
SOUTHEAST
...
Employee
Represents a single employee working at a store.
Fields:
id: GraphQLID
name: GraphQLString
salary: GraphQLFloat
Suppose the API exposes a store query that accepts a Region and returns a list of Store objects. Now suppose the client sends this query:
{
store(region: NORTHEAST) {
employees {
name
salary
}
}
}
Hopefully this is all pretty straightforward so far.
So here's my question, and I hope (expect, really) that it's something that has a common solution and I'm just having trouble finding it because my Google-Fu is weak today: is there a good way that I can write the resolvers for these types such that I can wrap up all the requested fields for all the employees from all the returned stores into a single SQL query statement, resulting in one round-trip to the database of the form:
SELECT name,salary FROM employees WHERE id IN (1111, 1133, 2177, ...)
rather than making one request per employee or even one request per store?
This is really a concrete instance of a more general question: is there a good way to combine resolvers to avoid making multiple requests in cases where they could be easily combined?
I'm asking this question in terms of graphql-js because that's what I'm hoping to work with, and since I figure that would allow for more specific answers, but if there's a more implementation-agnostic answer, that would be cool too.

So, basically you are wondering how you can combine multiple resolvers into fewer database queries. You are trying to solve what is known as the N+1 query problem. Here are at least two ways you can solve it.
DataLoader: This is a more general solution and it's created by Facebook. You could use this to batch multiple queries that query a single item of a single type into a single query that queries multiple items of a single type. In your example you would batch all employees into a single DB query and you would still have a separate query for getting the store. Here's a video by Ben Awad that explains DataLoader
JoinMonster: Specifically for SQL. Will do JOINs to make one SQL query per graphql query. Here's a video by Ben Awad explaining JoinMonster
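As a rough illustration of what a batch function for the employees field might do, here is a Java sketch rather than graphql-js code. It assumes, hypothetically, that the employees table has a store_id column; the class EmployeeBatchSketch and its methods are invented for this example. The point is that one query covers every store and the rows are then split back into one list per store.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

/** Hypothetical batch function for the "employees" field; assumes an employees.store_id column. */
final class EmployeeBatchSketch {

    record EmployeeRow(int storeId, String name, double salary) {}

    /** SQL for one round trip covering every requested store. */
    static String sqlFor(int storeCount) {
        if (storeCount < 1) {
            throw new IllegalArgumentException("at least one store id is required");
        }
        String placeholders = String.join(", ", Collections.nCopies(storeCount, "?"));
        return "SELECT store_id, name, salary FROM employees WHERE store_id IN (" + placeholders + ")";
    }

    /** Splits the rows of that single query into one list per requested store, in request order. */
    static List<List<EmployeeRow>> groupByStore(List<Integer> storeIds, List<EmployeeRow> rows) {
        Map<Integer, List<EmployeeRow>> byStore =
                rows.stream().collect(Collectors.groupingBy(EmployeeRow::storeId));
        List<List<EmployeeRow>> result = new ArrayList<>();
        for (Integer storeId : storeIds) {
            result.add(byStore.getOrDefault(storeId, List.of())); // a store may have no employees
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(sqlFor(3));
        List<EmployeeRow> rows = List.of(
                new EmployeeRow(1, "Ann", 100.0),
                new EmployeeRow(2, "Bob", 90.0),
                new EmployeeRow(1, "Cid", 80.0));
        System.out.println(groupByStore(List.of(1, 2, 3), rows));
    }
}
```

Returning the lists in the same order as the requested keys is what lets a batching loader hand each store its own employees.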

Save class field in particular order in Spring couchbase

I am using Couchbase through Spring Data in a Spring Boot application.
I am saving an instance of a class, let's say Employee,
which contains the fields empId, empName, empDesc, etc.
I want to save this object in Couchbase with its fields in a particular order.
Let's say I want to save this JSON in Couchbase in the following order:
{
"empName": "hello",
"empDesc": "helloDesc",
"empId": "hello11"
}
How can I achieve this?
Any help is appreciated.

I think you miss the point. Order of keys is irrelevant in a JSON object, be it in Couchbase or not.
An object is an unordered set of name/value pairs.
Cf http://json.org/
If the order is so important to you, you should use a JSON Array.
Don't use an API or framework in a devious way; it usually ends badly.
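The distinction can be shown with plain Java collections standing in for JSON values (an analogy, not Couchbase SDK code; the helper object(...) is hypothetical). Two maps with the same pairs compare equal regardless of insertion order, whereas lists are compared element by element in order.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Analogy only: a Map plays the role of a JSON object, a List the role of a JSON array. */
final class JsonOrderSketch {

    /** Builds an insertion-ordered map from alternating keys and values. */
    static Map<String, String> object(String... keysAndValues) {
        Map<String, String> map = new LinkedHashMap<>();
        for (int i = 0; i + 1 < keysAndValues.length; i += 2) {
            map.put(keysAndValues[i], keysAndValues[i + 1]);
        }
        return map;
    }

    public static void main(String[] args) {
        Map<String, String> a = object("empName", "hello", "empDesc", "helloDesc", "empId", "hello11");
        Map<String, String> b = object("empId", "hello11", "empName", "hello", "empDesc", "helloDesc");

        // Same name/value pairs, different insertion order: equal as objects.
        System.out.println(a.equals(b)); // prints true

        // The same pairs kept as ordered lists: order is part of the value.
        List<Map.Entry<String, String>> listA = new ArrayList<>(a.entrySet());
        List<Map.Entry<String, String>> listB = new ArrayList<>(b.entrySet());
        System.out.println(listA.equals(listB)); // prints false
    }
}
```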

Is there anything wrong with creating Couch DB views with null values?

I've been doing a fair amount of work with Couch DB in my spare time recently and really enjoy using it. I find it to be much more flexible than using a relational database, but it's not without its disadvantages.
One big disadvantage is the lack of dynamic queries / view generation... So you have to do a fair amount of work in planning and justifying your views, as you can't put that logic into your application code as you might do with SQL.
For example, I wrote a login scheme based on a JSON document template that looked a little bit like this:
{
"_id": "blah",
"type": "user",
"name": "Bob",
"email": "bob#theaquarium.com",
"password": "blah"
}
To prevent the creation of duplicate accounts, I wrote a very basic view to generate a list of user names to lookup as keys:
emit(doc.name, null)
This seemed reasonably efficient to me. I think it's way better than dragging out an entire list of documents (or even just a reduced number of fields for each document). So I did exactly the same thing to generate a list of email addresses:
emit(doc.email, null)
Can you see where I'm going with this question?
In a relational database (with SQL) one would simply make two queries against the same table. Would this technique (of equating a view to the product of an SQL query) be in some way analogous?
Then there's the performance / efficiency issue... Should those two views really be just one? Or is the use of a Couch DB view with keys and no associated value an effective practice? Considering the example above, both of those views would have uses outside of a login scheme... If I ever need to generate a list of user names, I can retrieve them without an additional overhead.
What do you think?

First, you certainly can put the view logic into your application code - all you need is an appropriate build or deploy system that extracts the views from the application and adds them to a design document. What is missing is the ability to generate new queries on the fly.
Your emit(doc.field,null) approach certainly isn't surprising or unusual. In fact, it is the usual pattern for "find document by field" queries, where the document is extracted using include_docs=true. There is also no need to mix the two views into one, the only performance-related decision is whether the two views should be placed in the same design document: all views in a design document are updated when any of them is accessed.
Of course, your approach does not actually guarantee that the e-mails are unique, even if your application tries really hard. Imagine the following circumstances with two client applications A and B:
A: queries view, determines that `test#email.com` does not exist.
B: queries view, determines that `test#email.com` does not exist.
A: creates account with `test#email.com`
B: creates account with `test#email.com`
This is a rare occurrence, but nonetheless possible. A better approach is to keep documents that use the email address as the key, because access to single documents is transactional (it's impossible to create two documents with the same key). Typical example:
{
_id: "test#email.com",
type: "email",
user: "000000001"
}
{
_id: "000000001",
type: "user",
email: "test#email.com",
firstname: "Test",
...
}
EDIT: a reservation pattern only works if two clients attempting to create an account for a given e-mail will reliably try to access the same document. If you randomly generate a new identifier, then client A will create and reserve document XXXX while client B will create and reserve document YYYY, and you will end up with two different documents that have the same e-mail.
Again, the only way to perform a transactional "check if it exists, create if it does not" operation is to have all clients alter a single document.
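The difference between "query the view, then create" and "create a document keyed by the e-mail" can be sketched with an in-memory stand-in for the database. This is a minimal illustration, not CouchDB client code: the class EmailReservationSketch and its methods are hypothetical, and ConcurrentHashMap.putIfAbsent plays the role of a create request that is rejected with a conflict when the _id already exists.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** In-memory stand-in for a document store in which creating an existing _id is rejected. */
final class EmailReservationSketch {

    private final Map<String, Map<String, String>> docs = new ConcurrentHashMap<>();

    /** Creates the document only if the id is free; returns false on conflict. */
    boolean create(String id, Map<String, String> body) {
        return docs.putIfAbsent(id, body) == null; // atomic check-and-insert on a single key
    }

    /** Reserves the e-mail first, then creates the user document. */
    boolean register(String email, String userId) {
        if (!create(email, Map.of("type", "email", "user", userId))) {
            return false; // another client already reserved this e-mail
        }
        return create(userId, Map.of("type", "user", "email", email));
    }

    boolean exists(String id) {
        return docs.containsKey(id);
    }

    public static void main(String[] args) {
        EmailReservationSketch store = new EmailReservationSketch();
        System.out.println(store.register("test#email.com", "000000001")); // first client
        System.out.println(store.register("test#email.com", "000000002")); // second client, same e-mail
    }
}
```

Because both clients target the same key, only one reservation can succeed, however their requests interleave. A fuller version would also have to clean up the reservation if the second create fails.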
