SSAS aggregation not being used - performance

So I have a fairly hefty cube that won't be much good without aggregations. I'm still in the dev phase, so I'm manually attempting usage-based aggregation design, aggregating some of the main queries we've designed. However, every time I run these, it looks like the engine reads through every partition the query hits (the biggest measure groups are partitioned monthly).
I decided to narrow it down. After all, it may just be the queries, or a blip, or what have you. So, using SQL Server Profiler and BIDS Helper, I created one and only one aggregation on one of my measure groups. I then ran the matching query and watched the profiler: it again hit every single partition and didn't read a thing out of an aggregation.
My only guess is that this is because the measure being pulled back has a measure expression (currency conversion). Anybody got any ideas?

As pointed out in the Identifying Bottlenecks whitepaper, measure expressions invalidate aggregations. Once I removed all measure expressions from the measure group, the aggregations were again in use. Hoorah!

Related

Strategies to compare performance of two Elasticsearch queries?

Since actual query runtime varies, it's not always useful to just check the runtime of two queries to determine which is generally faster. What are some ways to generally test whether one query is more efficient than another?
As an example of what I'm after, in MongoDB I can run explain on a query to get the number of documents iterated vs. returned. If the documents iterated is several orders of magnitude higher than what it's actually returning, I know I have an inefficient query. I know that since Elasticsearch indexes data much differently than other dbs, this may not translate well, but I'm wondering if there's some rough equivalent.
I'm looking at the Profile API which looks like a good starting place. Are fields like next_doc and next_doc_count what I'm after? Are there any others I should look for? Thanks!!
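For anyone else poking at this, here is a minimal sketch of pulling those fields out of a profiled search with the Python elasticsearch client (the index name and query are made up; the profile output is reported per shard and per search phase):

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

# "profile": true asks Elasticsearch to time every query component.
resp = es.search(
    index="my_index",  # hypothetical index
    body={
        "profile": True,
        "query": {"match": {"title": "chocolate"}},
    },
)

# Nested clauses live under each entry's "children" key (not walked here).
for shard in resp["profile"]["shards"]:
    for search in shard["searches"]:
        for query in search["query"]:
            bd = query["breakdown"]
            print(query["type"], query["description"])
            # next_doc_count: how many documents the iterator stepped
            # through; compare it against the number of hits returned.
            print("  next_doc (ns):  ", bd["next_doc"])
            print("  next_doc_count: ", bd["next_doc_count"])
```

A large next_doc_count relative to the actual hit count plays roughly the same role as MongoDB's "documents iterated vs. returned" ratio.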

MongoDB geospatial search after filtering by a different index

I have a big collection, with data arranged by lat-lng coordinates and time, each with its own index (a spherical geospatial index and a descending index on the times).
There are roughly 10,000 points per time step and around 2,000 time steps at the moment, and I would like to find, say, the values of the 5 points closest to given coordinates within some small time span.
The problem is that after the number of records hit 10 million, the performance of the geospatial queries became unsettlingly unpredictable: what used to take only around 3 seconds to fetch and process can now take anywhere up to 45 seconds, which is unacceptable in my case.
From what I've read in the mongodb docs, it seems that if a query uses a geospatial index, that index by default takes priority over any other part of the query. So in my case, the search first goes through all 2,000 time steps to find the "5 closest points" in each, and only then narrows down to the 50 time steps I requested in the first place. This seems all the more plausible since even .count() queries along the geo index can take just as long.
So basically, if possible, I would like to do it the other way around: first narrow down to some time steps, and only then perform the $near find or $geoNear aggregation on the considerably smaller dataset. It also helps that queries without a geospatial part have kept their performance.
Hinting the other index in the .find() query breaks the geospatial search, and $geoNear accepts no aggregation pipeline stages before it. Ensuring the geo index right before the query for some reason created a second geo index, which made aggregation impossible.
A two-step search or aggregation would have been great, but it seems there is no such thing in mongo, and although I think it could be imitated by aggregating data into a proxy collection, that looks like quite a bad idea.
It is possible, though, that I have some general misunderstanding of how mongodb indexes work and/or live, since the request time is not consistently bad, as it probably would be if what I've just written were correct.
Just in case:
mongodb 3.0.4;
Motor 0.4.1 with pymongo 2.8, although the problem was first discovered on pymongo 3.0.3;
the collection discussed is part of a replica set; the querying script runs locally on the machine hosting a secondary node.
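For what it's worth, the closest built-in thing to the two-step order described above seems to be the query option of the $geoNear aggregation stage, which restricts the candidate documents. A rough pymongo sketch, with made-up collection and field names (ts for the time step, a 2dsphere index on the location field); whether the time filter actually prunes the index walk will depend on the server version:

```python
from pymongo import MongoClient

coll = MongoClient().mydb.points  # hypothetical db/collection

pipeline = [{
    # $geoNear must be the first stage, but its "query" option restricts
    # the candidate documents, e.g. to a small window of time steps.
    "$geoNear": {
        "near": {"type": "Point", "coordinates": [37.61, 55.75]},
        "distanceField": "dist",
        "spherical": True,                             # 2dsphere index
        "query": {"ts": {"$gte": 1800, "$lt": 1850}},  # ~50 time steps
        "num": 5,                                      # 5 closest points
    }
}]

for doc in coll.aggregate(pipeline):
    print(doc["ts"], doc["dist"])
```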

MongoDB text index search slow for common words in large table

I am hosting a mongodb database for a service that supports full text searching on a collection with 6.8 million records.
Its text index includes ten fields with varying weights.
Most searches take less than a second. Some take two to three seconds. However, some searches take 15-60 seconds! Those 15-60 second cases are unacceptable for my application, and I need to find a way to speed them up.
Searching takes 15-60 seconds when words that are very common in the index are used in the search query.
It seems that the text search feature does not support lazy parameters. My first thought was to cache a list of the 50 most common words in my text index and then ask mongodb to evaluate those last (lazily), on top of the filtered results returned by the less common parameters. Hopefully people are still with me. For example, say I have the query "products chocolate", where "products" is common and "chocolate" is uncommon. I would like to be able to ask mongodb to evaluate "chocolate" first, and then filter those results with the "products" term. Does anyone know of a way to achieve this?
I can achieve the above scenario by omitting the most common words (i.e. "products") from the db query and then reapplying the common-term filter on the application side after it has received the records found by the db. It is preferable for all query logic to happen on the database, but I am open to application-side processing for a speed payoff.
There are still some holes in this design. If a user searches only common terms, I have no choice but to hit the database with all of them. From preliminary reading, I gather that MongoDB does not support more than one text index on the same collection. My plan is therefore to create two identical collections, each with my 6.8M records, but with different indexes: one for common words and one for uncommon words. This feels kludgy and clunky, but I am willing to do it for a speed increase.
Does anyone have any insight and/or advice on how to speed up this system? I'd like as much processing as possible to happen on the database to keep things fast. I'm sure my little 6.8M record collection is not the largest mongodb has seen. Thanks!
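To make the "omit the common words, re-filter in the application" plan concrete, a rough pymongo sketch might look like this (the collection, field names, and common-word list are all hypothetical, and only one field is re-checked for brevity):

```python
import re
from pymongo import MongoClient

coll = MongoClient().mydb.products          # hypothetical db/collection
COMMON_WORDS = {"products", "item", "new"}  # precomputed 50 most common terms

def search(raw_query):
    terms = raw_query.lower().split()
    rare = [t for t in terms if t not in COMMON_WORDS]
    common = [t for t in terms if t in COMMON_WORDS]

    if not rare:
        # Every term is common: no choice but to run the full query.
        return list(coll.find({"$text": {"$search": raw_query}}))

    # Hit the text index with the selective terms only...
    cursor = coll.find({"$text": {"$search": " ".join(rare)}})
    # ...then re-apply the common terms application-side (a real filter
    # would have to cover all ten indexed fields, not just one).
    patterns = [re.compile(re.escape(t), re.I) for t in common]
    return [doc for doc in cursor
            if all(p.search(doc.get("description", "")) for p in patterns)]
```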
Well, I worked around these performance issues by letting MongoDB full-text search match in OR fashion. I prioritize my results by fine-tuning the weights on my indexed fields and simply ordering by rank. I do get more results than desired, but that's not a huge problem, because the weighted results that appear at the top will most likely be consumed before my users get to the less relevant results at the bottom.
If anyone is struggling with MongoDB text search performance using AND-only searching, just switch back to OR and control your results using weights. It performs leaps and bounds better.
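For illustration, the OR-style query sorted by relevance looks roughly like this in pymongo (the collection and search terms are just examples):

```python
from pymongo import MongoClient

coll = MongoClient().mydb.products  # hypothetical db/collection

# Unquoted $text terms match in OR fashion by default; sorting on the
# server-computed textScore lets the field weights drive the ranking.
results = coll.find(
    {"$text": {"$search": "products chocolate"}},
    {"score": {"$meta": "textScore"}},
).sort([("score", {"$meta": "textScore"})]).limit(20)

for doc in results:
    print(doc["score"], doc.get("name"))
```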
hth
This is the exact same issue as $all versus $in. $all only uses the index for the first keyword in the array. I believe you're seeing the same issue here, which is why the OR (a.k.a. $in) approach works for you.

MongoDB count queries have poor performance

Counting the number of records in a collection matched by a query, even on an indexed field, takes too much time. For example, say a collection contains 10,000 records and there is an index on its creationDate field. Getting the last ten records from the collection is faster than counting the number of records created in the last day: the count takes more than 5 seconds, sometimes even up to 70 seconds, to return a result. Do you have any idea how to solve this problem? What is the best way to tackle it?
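To make the comparison concrete, the two operations look roughly like this in pymongo (the collection and field names are made up; cursor.count() is the era-appropriate API):

```python
from datetime import datetime, timedelta
from pymongo import MongoClient, DESCENDING

coll = MongoClient().mydb.events  # hypothetical db/collection
yesterday = datetime.utcnow() - timedelta(days=1)

# Fast: walks the creationDate index to the first 10 entries and stops.
last_ten = list(coll.find().sort("creationDate", DESCENDING).limit(10))

# Slow on old servers: even with the index, the server has to walk every
# matching index entry just to produce a number.
n = coll.find({"creationDate": {"$gte": yesterday}}).count()
```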
Btw, we also use morphia, and we saw that getting the count through morphia is even slower, so for count queries we transform the morphia query into a java driver query. Has anyone encountered a similar situation? Why does morphia respond even more slowly? Does this happen only for count queries, or is it slow in general compared to using the java driver alone?
Help, suggestions, or workarounds would be really appreciated; our application relies heavily on count queries, and the slowness of the system is really annoying for us right now.
Thanks in advance.
While this might not be the final answer, let's get started and evolve this further:
Your indexes should always fit into RAM, otherwise you will get really bad performance.
To evaluate how much RAM is used, you can either use 10gen's MMS or check with various tools. For a description, plus possible reasons for low (resident) memory usage, see http://www.kchodorow.com/blog/2012/05/10/thursday-5-diagnosing-high-readahead/. Or you simply haven't accessed enough data yet, in which case you can use MongoDB's touch command (but I doubt that, since you're already having performance issues).
Besides adding RAM and making sure you use all of the available RAM, you could also drop unused indexes or use compound indexes where possible.
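As a quick way to sanity-check the first point, you can compare the collection's total index size against the available RAM, for example via collstats (a pymongo sketch; the database and collection names are made up):

```python
from pymongo import MongoClient

db = MongoClient().mydb                    # hypothetical database
stats = db.command("collstats", "events")  # hypothetical collection

# totalIndexSize is in bytes; ideally it fits comfortably in RAM.
print("total index size: %.1f MB" % (stats["totalIndexSize"] / 1024.0 / 1024.0))
print("per-index sizes:", stats["indexSizes"])
```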
Wait for a fix.
As Asya Kamsky commented, count performance is really bad on 2.2. The only workaround we've found is to avoid counts as much as possible.
(There are other things that are inexplicably slow in mongodb, such as aggregation queries; most of those have associated JIRA issues and are being worked on or scheduled.)

One complex query vs Multiple simple queries

What is actually better: classes with complex queries responsible for loading, for instance, nested objects, or classes with simple queries responsible for loading simple objects?
With complex queries you have to go to the database less often, but the class takes on more responsibility.
With simple queries you need to go to the database more often, but each class is responsible for loading only one type of object.
The situation I'm in is that the loaded objects will be sent to a Flex application (as DTOs).
The general rule of thumb here is that server roundtrips are expensive (relative to how long a typical query takes), so the guiding principle is to minimize them. Each one-to-many join will potentially multiply your result set, so the way I approach this is to keep joining until the result set gets too large or the query execution time gets too long (roughly 1-5 seconds, generally).
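As a toy illustration of the tradeoff (Python with sqlite3 and a made-up schema): the simple-query version pays one roundtrip per parent row, while the joined version fetches everything in a single roundtrip at the cost of repeating parent columns in the multiplied result set.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT);
    CREATE TABLE order_items (id INTEGER PRIMARY KEY,
                              order_id INTEGER REFERENCES orders(id),
                              product TEXT);
    INSERT INTO orders VALUES (1, 'alice'), (2, 'bob');
    INSERT INTO order_items VALUES (1, 1, 'tea'), (2, 1, 'mug'), (3, 2, 'pen');
""")

# N+1 simple queries: one query for the parents, one more per parent.
orders = conn.execute("SELECT id, customer FROM orders").fetchall()
for oid, customer in orders:
    items = conn.execute(
        "SELECT product FROM order_items WHERE order_id = ?", (oid,)
    ).fetchall()

# One complex query: a single roundtrip; each one-to-many join multiplies
# the rows, so the parent columns repeat once per child row.
rows = conn.execute("""
    SELECT o.id, o.customer, i.product
    FROM orders o JOIN order_items i ON i.order_id = o.id
""").fetchall()
```

Which side wins depends on the latency per roundtrip versus the size of the multiplied result set, which is why measuring both matters.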
Depending on your platform you may or may not be able to execute queries in parallel. This is a key determinant in what you should do because if you can only execute one query at a time the barrier to breaking up a query is that much higher.
Sometimes it's worth keeping certain relatively constant data in memory (country information, for example) or fetching it with a separate query, but in my experience this is reasonably unusual.
Far more common is having to fix up systems with awful performance due in large part to doing separate queries (particularly correlated queries) instead of joins.
I don't think either option is inherently better. It depends on your application specifics, architecture, the DBMS used, and other factors.
E.g. we used multiple simple queries in our standalone solution. But when we evolved our product towards a lightweight internet-accessible solution, we discovered that our framework made a huge number of requests, which killed performance because of network latency. So we substantially reworked our framework to use aggregated complex queries. Meanwhile, we still maintained our standalone solution, which moved from Oracle Lite to Apache Derby. And once more we found that some of our new complex queries had to be simplified, as Derby took too long to execute them.
So look at your actual problem and solve it appropriately. I think simple queries are a good starting point if there are no strong objections against them.
From a gut feeling I would say:
Go with the simple way as long as there is no proven reason to optimize for performance. Otherwise I would put the "complex objects and queries" approach in the basket of premature optimization.
If you find that there are real performance implications, then the next step is to optimize the roundtripping between Flex and your backend. But as I said before, this is a gut feeling: you really should start out with a definition of "performant", start simple, and measure the performance.
