Are aggregate queries more efficient than normal queries? (Parse Platform / MongoDB)

Is using query.aggregate(pipeline) in MongoDB more efficient than using normal queries such as query.equalTo or query.greaterThan?
Aggregate queries definitely require much less code, but that alone doesn't seem to justify the complexity they bring with all the additional parentheses and abbreviations.
Normal queries seem more straightforward, but are they inferior in performance? What is a good use case for aggregate queries vs normal ones?
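For pure filtering, the two generally compile down to the same server-side operation. Below is a minimal sketch, not Parse-specific, written against the MongoDB Java driver with hypothetical database, collection, and field names: a normal query becomes a find() with a filter, while an aggregation pipeline that starts with $match can use the same index, so aggregation mainly earns its keep when you need server-side reshaping such as grouping.

```java
import com.mongodb.client.MongoClient;
import com.mongodb.client.MongoClients;
import com.mongodb.client.MongoCollection;
import org.bson.Document;

import java.util.List;

import static com.mongodb.client.model.Accumulators.sum;
import static com.mongodb.client.model.Aggregates.group;
import static com.mongodb.client.model.Aggregates.match;
import static com.mongodb.client.model.Filters.eq;

public class NormalVsAggregate {
    public static void main(String[] args) {
        try (MongoClient client = MongoClients.create("mongodb://localhost:27017")) {
            // Hypothetical database and collection names.
            MongoCollection<Document> orders =
                    client.getDatabase("app").getCollection("orders");

            // A "normal" query (what Parse's query.equalTo compiles down to):
            // a plain find() with a filter. Uses an index on "status" if one exists.
            orders.find(eq("status", "shipped"))
                  .forEach(doc -> System.out.println(doc));

            // An aggregation pipeline. The leading $match stage can use the same
            // index, so for pure filtering the two perform essentially the same.
            // The pipeline pays off when you need server-side reshaping, e.g.
            // grouping, that a normal query cannot express.
            orders.aggregate(List.of(
                    match(eq("status", "shipped")),
                    group("$customerId", sum("total", "$amount"))
            )).forEach(doc -> System.out.println(doc));
        }
    }
}
```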

Related

Why does query optimization affect performance?

I am a student who has recently been learning Spring and JPA. While developing a GET API with query conditions, I started wondering which approach is better in terms of performance.
When data has to be queried based on conditions, JPQL or QueryDSL is usually used to generate dynamic queries. Can you tell me why generating a dynamic query like this and looking up only the necessary data is better than loading the entire data set and then using Java's Stream filter() function?
Also, can you tell me why generating fewer queries is advantageous in terms of performance?
I know that generating fewer queries has a performance advantage, but I lack an understanding of why that is.
Can you tell me why generating a dynamic query like this and looking up only the necessary data is better than loading the entire data set and then using Java's Stream filter() function?
In general, a round trip to the database or any other external storage is much more expensive than most operations on the Java side because of network latency. If you query all the data and then use e.g. list.stream().filter(), a significant amount of data is transferred over the network. If you instead query only the data you need, filtered on the DB side, the transferred amount is much lower.
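A minimal sketch of the difference, assuming a hypothetical PurchaseOrder entity with a status column:

```java
import jakarta.persistence.Entity;
import jakarta.persistence.EntityManager;
import jakarta.persistence.Id;
import java.util.List;

@Entity
class PurchaseOrder {
    @Id Long id;
    String status;
}

public class FilterComparison {

    // DB-side filtering: only the matching rows cross the network, and the
    // database can use an index on "status" instead of scanning everything.
    static List<PurchaseOrder> shippedViaQuery(EntityManager em) {
        return em.createQuery(
                "SELECT o FROM PurchaseOrder o WHERE o.status = :s",
                PurchaseOrder.class)
                .setParameter("s", "SHIPPED")
                .getResultList();
    }

    // Java-side filtering: every row is transferred and materialized as an
    // entity before most of it is discarded by the stream filter.
    static List<PurchaseOrder> shippedViaStream(EntityManager em) {
        return em.createQuery("SELECT o FROM PurchaseOrder o", PurchaseOrder.class)
                .getResultList()
                .stream()
                .filter(o -> "SHIPPED".equals(o.status))
                .toList();
    }
}
```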
Note that while this is true in general, there can be cases where filtering on the Java side is more effective. This depends heavily on several things:
query complexity
amount of data
database structure (schema, indices, column types etc.)
As for the number of queries, the same considerations apply: query execution costs and data transfer costs, so the fewer queries you have, the better. Again, this is not an axiom: in some cases several lightweight queries with grouping/filtering on the Java side can be faster than one huge, complicated SQL query.

Strategies to compare performance of two Elasticsearch queries?

Since actual query runtime varies, it's not always useful to just check the runtime of two queries to determine which is generally faster. What are some ways to generally test whether one query is more efficient than another?
As an example of what I'm after: in MongoDB I can run explain on a query to get the number of documents iterated vs. returned. If the number of documents iterated is several orders of magnitude higher than the number actually returned, I know I have an inefficient query. I know that since Elasticsearch indexes data much differently than other databases, this may not translate well, but I'm wondering if there's some rough equivalent.
I'm looking at the Profile API which looks like a good starting place. Are fields like next_doc and next_doc_count what I'm after? Are there any others I should look for? Thanks!!
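Profiling is opt-in per request. Here is a minimal sketch using the Elasticsearch low-level Java REST client, with a hypothetical index name, field, and endpoint: setting "profile": true makes the response include the per-component timing breakdown, including the next_doc and next_doc_count counters mentioned above.

```java
import org.apache.http.HttpHost;
import org.apache.http.util.EntityUtils;
import org.elasticsearch.client.Request;
import org.elasticsearch.client.Response;
import org.elasticsearch.client.RestClient;

public class ProfiledSearch {
    public static void main(String[] args) throws Exception {
        try (RestClient client = RestClient.builder(
                new HttpHost("localhost", 9200, "http")).build()) {

            // "profile": true adds a detailed per-query breakdown to the
            // response instead of just the overall took-time.
            Request request = new Request("GET", "/my-index/_search");
            request.setJsonEntity("""
                    {
                      "profile": true,
                      "query": { "match": { "title": "search term" } }
                    }
                    """);

            Response response = client.performRequest(request);
            System.out.println(EntityUtils.toString(response.getEntity()));
        }
    }
}
```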

Which conditional function is more performance-effective in Hive: IF or CASE?

I can speak from experience of optimizing complex queries with experts from Hortonworks. We worked on multi-hundred-line queries that included multiple IF/THEN and CASE expressions. The performance difference is so small as to be unmeasurable.
Worry instead about your joins - i.e. map-side vs. side-data vs. reduce-side joins - and UDFs: those are where the performance improvements are to be found.
We did substantial tuning across a number of areas, including different join types and degrees of join skew, UDFs, and inline views. IF vs. CASE is not an area that ever surfaced.
Unsubstantiated, but it has been reported that IF/THEN is actually faster: http://www.oehive.org/node/985
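For reference, the two forms being compared are interchangeable for a simple two-way branch. A minimal sketch over JDBC, with a hypothetical HiveServer2 endpoint and table:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class IfVsCase {
    public static void main(String[] args) throws Exception {
        // Hypothetical HiveServer2 endpoint; requires the Hive JDBC driver
        // on the classpath.
        try (Connection conn = DriverManager.getConnection(
                "jdbc:hive2://localhost:10000/default");
             Statement stmt = conn.createStatement()) {

            // Logically equivalent conditionals; per the answer above, any
            // difference in execution cost is too small to measure.
            stmt.executeQuery(
                "SELECT IF(amount > 100, 'big', 'small') FROM orders");
            stmt.executeQuery(
                "SELECT CASE WHEN amount > 100 THEN 'big' ELSE 'small' END FROM orders");
        }
    }
}
```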

Can I use LIMIT to speed up a SPARQL query?

I have a large number of results from a query that users can refine by typing a search term.
However, when there are many, many results, I don't need to show all of them.
I notice that when I use LIMIT in my SPARQL query though, the query takes just as long. Is there a way to use LIMIT in an "interrupt" fashion to shorten the processing time?
Thank you.
No; the implementation of LIMIT, like any other part of the query, is up to the underlying query engine.
Some query engines may implement LIMIT in such a way that it performs more quickly than retrieving all the results, but this doesn't necessarily apply to every query (nor to every query engine).
Depending on the framework being used to make queries and process results, you may be able to process results in such a way that you only look at the portion you care about, but that likely doesn't solve your problem.
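As an illustration, here is a minimal sketch using Apache Jena against a hypothetical SPARQL endpoint. LIMIT caps how many rows come back, but whether the engine also stops evaluating early is up to its implementation; on the client side you can at least stop consuming results once you have enough:

```java
import org.apache.jena.query.QueryExecution;
import org.apache.jena.query.QueryExecutionFactory;
import org.apache.jena.query.ResultSet;

public class LimitedQuery {
    public static void main(String[] args) {
        // LIMIT bounds the result set; early termination of evaluation is
        // engine-dependent, as the answer notes.
        String sparql = "SELECT ?s WHERE { ?s ?p ?o } LIMIT 50";

        // Hypothetical endpoint URL.
        try (QueryExecution qexec = QueryExecutionFactory.sparqlService(
                "http://example.org/sparql", sparql)) {
            ResultSet results = qexec.execSelect();
            int shown = 0;
            // Client-side truncation: stop iterating once we have enough,
            // which avoids processing rows we will never display.
            while (results.hasNext() && shown < 10) {
                System.out.println(results.next());
                shown++;
            }
        }
    }
}
```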

One complex query vs Multiple simple queries

Which is actually better: classes with complex queries responsible for loading, for instance, nested objects, or classes with simple queries responsible for loading simple objects?
With complex queries you make fewer trips to the database, but the class takes on more responsibility.
With simple queries you have to go to the database more often; in this case, however, each class is responsible for loading one type of object.
The situation I'm in is that the loaded objects will be sent to a Flex application (DTOs).
The general rule of thumb here is that server round trips are expensive (relative to how long a typical query takes), so the guiding principle is to minimize them. Each one-to-many join can multiply your result set, so my approach is to keep joining until the result set gets too large or the query execution time gets too long (roughly 1-5 seconds, generally).
Depending on your platform you may or may not be able to execute queries in parallel. This is a key determinant in what you should do because if you can only execute one query at a time the barrier to breaking up a query is that much higher.
Sometimes it's worth keeping certain relatively constant data in memory (country information, for example) or fetching it with a separate query, but this is, in my experience, reasonably unusual.
Far more common is having to fix up systems with awful performance due in large part to doing separate queries (particularly correlated queries) instead of joins.
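To make the trade-off concrete, here is a minimal JPA sketch with hypothetical Customer and PurchaseOrder entities: one joined query versus the classic N+1 pattern of many simple queries.

```java
import jakarta.persistence.Entity;
import jakarta.persistence.EntityManager;
import jakarta.persistence.Id;
import jakarta.persistence.ManyToOne;
import jakarta.persistence.OneToMany;
import java.util.List;

@Entity
class Customer {
    @Id Long id;
    @OneToMany(mappedBy = "customer") List<PurchaseOrder> orders;
}

@Entity
class PurchaseOrder {
    @Id Long id;
    @ManyToOne Customer customer;
}

public class RoundTrips {

    // One complex query: a single round trip loads customers together with
    // their orders. The one-to-many join multiplies rows, which is the
    // result-set blowup described above.
    static List<Customer> oneQuery(EntityManager em) {
        return em.createQuery(
                "SELECT DISTINCT c FROM Customer c JOIN FETCH c.orders",
                Customer.class).getResultList();
    }

    // Many simple queries: one query for the customers, then one per
    // customer for its orders (the classic N+1 pattern). Each extra round
    // trip pays the network latency cost again.
    static List<Customer> manyQueries(EntityManager em) {
        List<Customer> customers = em.createQuery(
                "SELECT c FROM Customer c", Customer.class).getResultList();
        for (Customer c : customers) {
            c.orders = em.createQuery(
                    "SELECT o FROM PurchaseOrder o WHERE o.customer = :c",
                    PurchaseOrder.class)
                    .setParameter("c", c)
                    .getResultList();
        }
        return customers;
    }
}
```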
I don't think either option is inherently better. It depends on your application's specifics, its architecture, the DBMS used, and other factors.
For example, we used multiple simple queries in our standalone solution. But when we evolved our product into a lightweight internet-accessible solution, we discovered that our framework made a huge number of requests, and that killed performance because of network latency. So we substantially reworked our framework to use aggregated complex queries. Meanwhile, we still maintained our standalone solution and moved from Oracle Lite to Apache Derby, and once again we found that some of our new complex queries had to be simplified because Derby took too long to execute them.
So look at your actual problem and solve it appropriately. I think simple queries are a good starting point if there are no strong objections to them.
My gut feeling would be:
Go with the simple approach as long as there is no proven reason to optimize for performance; otherwise I would put the "complex objects and queries" approach in the basket of premature optimization.
If you find that there are real performance implications, then as a next step you should optimize the round-tripping between Flex and your backend. But as I said before: this is a gut feeling. You really should start out with a definition of "performant", start simple, and measure the performance.
