Will separating GraphQL queries decrease performance?

Is it good or bad practice to split up a query?
For example, should I run the single page query below, or split that master query into separate customer and footer queries?
Is there a performance difference between sending one query versus multiple queries like this?
query page {
  customer {
    name
    gender
    age
  }
  footer {
    supportUrl
  }
}

vs.

query customer {
  customer {
    name
    gender
    age
  }
}

query footer {
  footer {
    supportUrl
  }
}

Running a single query will usually be faster; how much faster depends much more on the external network conditions than anything in your code.
Running a single query will result in a single network round trip. If your code would otherwise run the two queries serially in separate HTTP requests, having to wait for one network round trip to complete before starting the second could be observably slower, depending on the network between the server and the end user.
The main downside of running the two queries together appears when some of them are slow: if you bundle fast and slow queries into one request, you won't get any of the responses until everything has completed.
The server is allowed to execute the queries in parallel. If your client code can run the queries in parallel as well, you can mitigate the visible performance penalty from having multiple separate queries, and this would address the problem of having specific queries that ran slowly on the server side.
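As a rough sketch of that client-side parallelism (assuming a fetch-capable client, a /graphql endpoint, and the field names from the question), the two operations can be issued concurrently with Promise.all so that the slower one does not delay the faster one:

async function runQuery<T>(query: string): Promise<T> {
  // One HTTP round trip per call; the endpoint URL is a placeholder.
  const res = await fetch("/graphql", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ query }),
  });
  const { data, errors } = await res.json();
  if (errors) throw new Error(JSON.stringify(errors));
  return data as T;
}

async function loadPage() {
  // Both requests start immediately; the total wait is roughly the slower of the two,
  // not the sum, so a slow footer query does not delay the customer data.
  const [customerData, footerData] = await Promise.all([
    runQuery<{ customer: { name: string; gender: string; age: number } }>(
      "query customer { customer { name gender age } }"
    ),
    runQuery<{ footer: { supportUrl: string } }>(
      "query footer { footer { supportUrl } }"
    ),
  ]);
  return { ...customerData, ...footerData };
}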

Related

Why is Redshift inefficient even for the simplest select queries?

I have a few views in my Redshift database. There are a couple of users who perform simple select statements on these views. When a single select query is run, it executes quickly (typically in a few seconds), but when multiple select queries (the same simple select statement) are run at the same time, all the queries get queued on the Redshift side and take forever to return results. I'm not sure why a query that takes only a few seconds on its own gets queued when triggered in parallel with other select queries.
I am curious to know how this can be resolved, or whether there is a workaround I should consider.
There are a number of reasons why this could be happening. First off, how many queries in parallel are we talking about? 10, 100, 1,000?
The WLM configuration determines the parallelism that a cluster is set up to perform. If the WLM has a queue with only one slot then only one query can run at a time.
Just because a query is simple doesn't mean it is cheap. If the tables are not configured correctly, or if a lot of data is being read (or spilled), a lot of system resources can be needed to perform the query. When many such queries arrive at once, these resources get overloaded and things slow down. You may need to evaluate your cluster / table configurations to address any issues.
I could keep guessing possibilities but the better approach would be to provide a query example, WLM configuration and some cluster performance metrics (console) to help narrow things down.

MonetDB Query Plan

I have a few queries that I am running and I would like to view some sort of query plan for a given query. When I add "explain" before the query, I get a long (~4,000 lines) result that is not possible to interpret.
The MAL plan exposes all parallel activity needed to solve the query. Each line is a relational algebra operator or catalog action.
You might also use PLAN to get an idea of the output of the SQL optimizer.
Each part of the physical execution plan that will be executed in parallel is repeated in the EXPLAIN output once per core on your machine. That's why EXPLAIN can sometimes produce a huge MAL plan.
If you just want to get an idea of how a query is handled, you can force MonetDB to generate a sequential MAL plan; that at least gets rid of the repetitions. To do this, change the default optimizer pipeline to, e.g., 'sequential_pipe'. This can be done either in a client (it then applies only to that client session) or in the server (it then applies to the whole server session). For more information: https://www.monetdb.org/Documentation/Cookbooks/SQLrecipes/OptimizerPipelines

Dumping Azure tables quickly

My task is to dump entire Azure tables with arbitrary unknown schemas. Standard code to do this resembles the following:
TableQuery<DynamicTableEntity> query = new TableQuery<DynamicTableEntity>();
foreach (DynamicTableEntity entity in table.ExecuteQuery(query))
{
    // Write a dump of the entity (row).
}
Depending on the table, this works at a rate of 1000-3000 rows per second on my system. I'm guessing this (lack of) performance has something to do with separate HTTP requests issued to retrieve the data in chunks. Unfortunately, some of the tables are multi-gigabyte in size, so this takes a rather long time.
Is there a good way to parallelize the above or speed it up some other way? It would seem that those HTTP requests could be sent by multiple threads, as in web crawlers and the like. However, I don't see an immediate method to do so.
Unless you know the PartitionKeys of the entities in the table (or some other query criteria that include the PartitionKey), AFAIK you will need to take the top-down approach you're using right now. To fire queries in parallel and have them work efficiently, you have to include the PartitionKey in your queries.
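For illustration, here is a sketch of that per-partition parallelism in TypeScript with the @azure/data-tables package (a different SDK and language than the question's C# code); it assumes the partition keys are known in advance:

import { TableClient, odata } from "@azure/data-tables";

// Dump one partition; each partition scan is its own independent series of HTTP requests.
async function dumpPartition(client: TableClient, partitionKey: string): Promise<number> {
  let count = 0;
  const entities = client.listEntities({
    queryOptions: { filter: odata`PartitionKey eq ${partitionKey}` },
  });
  for await (const entity of entities) {
    // Write a dump of the entity (row).
    count++;
  }
  return count;
}

// Scan the known partitions concurrently instead of walking the whole table top-down.
async function dumpTable(connectionString: string, tableName: string, partitionKeys: string[]) {
  const client = TableClient.fromConnectionString(connectionString, tableName);
  const counts = await Promise.all(partitionKeys.map((pk) => dumpPartition(client, pk)));
  return counts.reduce((total, n) => total + n, 0);
}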

What is the most efficient way to filter a search?

I am working with node.js and mongodb.
I will have a database set up, and I will use socket.io for real-time updates that either re-query the db or push the new update to the client.
I am trying to figure out the best way to filter the database.
Some more information in regards to what is being queried and what the real time updates are:
A document in the database will include information such as an address, city, time, number of packages, name, price.
Filters include city/price/name/time (meaning only to see addresses within the same city, or within the same time period)
Real-time info: includes adding a new document to the database which will essentially update the admin on the website with a notification of a new address added.
Method 1: Query the db with the filters being searched?
Method 2: Query the db for all searches and then filter it on the client side (Javascript)?
Method 3: Query the db for all searches then store it in localStorage then query localStorage for what the filters are?
I'm trying to figure out the fastest way for the user to filter.
Also, if that differs from the most cost-effective way (which I am assuming means fewer db queries), then I'd like to know the most cost-effective way as well...
It's hard to say because we can't see the exact filter conditions, but in general:
Mongo can use only one index per query, so only the fields covered by that index can be filtered efficiently. Otherwise it may fall back to a full collection scan, which is slow. If your query is using an index, you are probably already doing the most efficient thing. (Mongo can still use another index for sorting, though.)
Sometimes you will be forced to do processing on client side because Mongo can't do what you want or it takes too many queries.
The least efficient option is to store the results somewhere else, simply because the extra I/O is slow. That would only benefit you if you used the stored results as a cache and did not recalculate them.
Also consider the overhead and latency of networking: if you have to send lots of data back to the client, it will be slower. In general, Mongo will do a better job of filtering than you would do on the client.
From your description, if you filter addresses by city and time period, an index on those fields could cut out most of the documents. You most likely need a compound index covering multiple fields, as in the sketch below.
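A minimal sketch of Method 1 with the Node.js mongodb driver, assuming a collection named addresses and the city/time fields from the question; the connection string and database name are placeholders:

import { MongoClient } from "mongodb";

// Filter on the server ("Method 1"): only matching documents cross the network.
async function findAddresses(city: string, from: Date, to: Date) {
  const client = await MongoClient.connect("mongodb://localhost:27017");
  try {
    const addresses = client.db("app").collection("addresses");
    // Compound index covering the most common filter combination (city + time).
    // Normally created once at deploy time; shown inline here for completeness.
    await addresses.createIndex({ city: 1, time: 1 });
    return await addresses.find({ city, time: { $gte: from, $lte: to } }).toArray();
  } finally {
    await client.close();
  }
}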

Ajax: many (smaller) calls vs. a single large call

I have a page that will display all available data of a certain kind to all users. The data will be displayed separated by a number of criteria and I'm pondering certain design questions.
To make matters easier to understand, say I have sales data per month, per category and per location. On the page I will create an accordion for each month, within which I will have one table per category and, in each table, a list of locations.
So I'm wondering, which is better:
1) a single controller method that fetches all the data and:
a) does the work of converting the tabular format returned from the database to a hierarchical structure (because this is easier for the front-end to navigate) like:
{ Month, { Category, { Location, Value } } }
b) returns tabular data like
{ Month, Category, Location, Value }
and lets jQuery at the front end loop through it to make it hierarchical
2) many smaller methods that each return distinct data and need to be called by the front end? For example, a method that returns a distinct list of months for which there is data would be called once, but jQuery would then need to loop through the results to query for the categories, which themselves would be looped through to get the locations, sort of like this:
for (var m in GetMonths()) {
  for (var c in GetCategories(m)) {
    GetLocations(m, c);
  }
}
As a final note, by "better" I mean both that the system will perform better under heavy load and that the code is structured in a more maintainable and DRY manner.
Thank you for your consideration.
Without actual performance numbers for the individual queries the answer will have to contain quite a bit of conjecture. But here are some generalizations.
If the data is being presented to "all users", then your most important decision is going to be whether to cache the data. Having said that, the server-side performance of the code becomes much less important. Not unimportant, but if you're going to serve the result hundreds, thousands or millions of times for each time it is generated, you can tolerate a little more server-side work.
For scalability, prefer fewer calls to the server over more calls. That points to the single controller method that returns all the data. Unless fetching all the data is dramatically more time-consuming than fetching just a single month of data, that would be your better choice.
For transforming the data to the hierarchy on the server vs. on the client, I tend to prefer doing it on the client. If there is a lot of data, the client experience may be better if it is done on the server, but doing it on the client has a couple of advantages. First, client-side code is automatically distributed: if you are trying to get the highest throughput, moving logic, especially presentation logic, to the client frees up the servers to do other work.
Also, by doing the transformation on the client, your server method is not as dependent on the view. If you later decide that a different display would be better, it may be just a view change.
But, back to the beginning of the answer, if you are going to create this data once and cache it, doing the transformation at the server eliminates the need for each client to transform it.
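To make option 1b concrete, here is a small sketch of the client-side reshaping; the row shape and field names are assumptions based on the example in the question:

interface Row {
  month: string;
  category: string;
  location: string;
  value: number;
}

type Hierarchy = Record<string, Record<string, Record<string, number>>>;

// Fold flat rows from the single controller method into Month -> Category -> Location.
function toHierarchy(rows: Row[]): Hierarchy {
  const result: Hierarchy = {};
  for (const { month, category, location, value } of rows) {
    // Create the nested levels lazily as rows are encountered.
    result[month] ??= {};
    result[month][category] ??= {};
    result[month][category][location] = value;
  }
  return result;
}

// Example: one tabular response reshaped on the client.
const hierarchy = toHierarchy([
  { month: "2024-01", category: "Hardware", location: "Berlin", value: 1250 },
]);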
