I want to run a heavy query and somehow limit the resources it uses, so it never affects other clients' queries.
Is it possible?
This is currently not possible. You could use sharding to make the query not affect other queries that are reading totally different data, but there's no way right now to prioritize between different queries operating on the same data.
I have a few views in my Redshift database. There are a couple of users who perform simple select statements on these views. When a single select query is run, it executes quickly (typically a few seconds), but when multiple select queries (the same simple select statement) are run at the same time, all the queries get queued on the Redshift side and take forever to return results. I'm not sure why the same query that takes a few seconds on its own gets queued when triggered in parallel with other select queries.
I am curious to know how this can be resolved, or if there is any workaround I should consider.
There are a number of reasons why this could be happening. First off, how many queries in parallel are we talking about? 10, 100, 1000?
The WLM configuration determines the parallelism that a cluster is set up to perform. If the WLM has a queue with only one slot then only one query can run at a time.
Just because a query is simple doesn't mean it is easy. If the tables are not configured correctly, or if a lot of data is being read (or spilled to disk), a lot of system resources can be needed to perform the query. When many such queries come along, these resources get overloaded and things slow down. You may need to evaluate your cluster / table configurations to address any issues.
I could keep guessing at possibilities, but the better approach would be to provide a query example, the WLM configuration, and some cluster performance metrics (from the console) to help narrow things down.
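If it helps with the narrowing down, a quick way to see whether the parallel selects are actually sitting in a WLM queue is to look at the queue state while they run. Below is a minimal sketch, assuming the standard stv_wlm_query_state system view and a psycopg2 connection; the endpoint and credentials are placeholders.

# Sketch: check whether queries are queued in WLM while the parallel selects run.
# Assumes the stv_wlm_query_state system view; connection details are placeholders.
import psycopg2

conn = psycopg2.connect(
    host="my-cluster.example.redshift.amazonaws.com",  # placeholder endpoint
    port=5439,
    dbname="dev",
    user="admin",
    password="...",  # placeholder
)

with conn.cursor() as cur:
    cur.execute("""
        SELECT service_class, state, COUNT(*) AS queries,
               AVG(queue_time) / 1000000.0 AS avg_queue_seconds
        FROM stv_wlm_query_state
        GROUP BY service_class, state
        ORDER BY service_class, state;
    """)
    for service_class, state, queries, avg_queue_seconds in cur.fetchall():
        print(service_class, state, queries, avg_queue_seconds)

If most of the concurrent selects show up as queued for a single service class, the slot count configured for that WLM queue is the first thing to look at.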
I use a bulk update with a script in order to update a nested field, but this is very slow:
POST index/type/_bulk
{"update":{"_id":"1"}}
{"script"{"inline":"ctx._source.nestedfield.add(params.nestedfield)","params":{"nestedfield":{"field1":"1","field2":"2"}}}}
{"update":{"_id":"2"}}
{"script"{"inline":"ctx._source.nestedfield.add(params.nestedfield)","params":{"nestedfield":{"field1":"3","field2":"4"}}}}
... [a lot more, split into several batches]
Do you know another way that could be faster?
It seems possible to store the script so that it is not repeated for each update, but I couldn't find a way to keep the params "dynamic".
As often with performance optimization questions, there is no single answer since there are many possible causes of poor performance.
In your case you are making bulk update requests. When an update is performed, the document is actually being re-indexed:
... to update a document is to retrieve it, change it, and then reindex the whole document.
Hence it makes sense to take a look at indexing performance tuning tips. The first few things I would consider in your case are selecting the right bulk size, using several threads for bulk requests, and increasing or temporarily disabling the index refresh interval.
You might also consider using a ready-made client that supports parallel bulk requests, like the Python elasticsearch client does.
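For instance, here is a minimal sketch of the scripted nested-field update from the question using the Python client's parallel_bulk helper, with the refresh interval relaxed during the load. The host, the client version (an older one, to match the "inline"/_type syntax in the question), the thread count, and the chunk size are all assumptions.

# Sketch: scripted bulk updates sent from several threads with the Python client.
# Index/type/field names come from the question; host, client version and tuning values are assumptions.
from elasticsearch import Elasticsearch
from elasticsearch.helpers import parallel_bulk

es = Elasticsearch(["http://localhost:9200"])  # placeholder address

# Relax the refresh interval during the bulk load; restore it afterwards.
es.indices.put_settings(index="index", body={"index": {"refresh_interval": "-1"}})

def update_actions(docs):
    # docs: iterable of (doc_id, nested_value) pairs, e.g. ("1", {"field1": "1", "field2": "2"})
    for doc_id, nested in docs:
        yield {
            "_op_type": "update",
            "_index": "index",
            "_type": "type",
            "_id": doc_id,
            "script": {
                "inline": "ctx._source.nestedfield.add(params.nestedfield)",
                "params": {"nestedfield": nested},
            },
        }

docs = [("1", {"field1": "1", "field2": "2"}), ("2", {"field1": "3", "field2": "4"})]

# parallel_bulk chunks the actions and sends the chunks from several threads.
for ok, result in parallel_bulk(es, update_actions(docs), thread_count=4, chunk_size=500):
    if not ok:
        print("failed:", result)

es.indices.put_settings(index="index", body={"index": {"refresh_interval": "1s"}})

Experiment with thread_count and chunk_size against your own cluster; the right values depend on document size and hardware.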
It would be ideal to monitor ElasticSearch performance metrics to understand where the bottleneck is, and if your performance tweaks are giving actual gain. Here is an overview blog post about ElasticSearch performance metrics.
Just out of curiosity, does anybody know whether Neo4j and OrientDB implement caching of query results, that is, storing in a cache a query together with its result, so that subsequent requests for the same query are served without actually computing the result again?
Notice that this is different from caching part of the DB, since in that case the query would still be executed anyway (possibly using only data taken from memory instead of disk).
Starting from release v2.2 (not yet in SNAPSHOT, but it will be RC in a few days), OrientDB supports caching of command results. Command-result caching has been used by other DBMSs and has proven to dramatically improve performance in the following use cases:
the database is mostly read rather than written
there are a few heavy queries that produce a small result set
you have RAM available to use for caching results
By default, the command cache is disabled. To enable it, set command.cache.enabled=true.
For more information: http://orientdb.com/docs/last/Command-Cache.html.
There are a couple of layers where you can put the caching. You can put it at the highest level, behind Varnish ( https://www.varnish-cache.org ) or some other high-level cache. You can use a KV store like Redis ( http://redis.io ) and store a result with an expiration. You can also cache within Neo4j using extensions, for anything from simple index look-ups to partial traversals or complete results. See http://maxdemarzi.com/2014/03/23/caching-partial-traversals/ or http://maxdemarzi.com/2015/02/27/caching-immutable-id-lookups-in-neo4j/ for some ideas.
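As a rough illustration of the KV-store approach, here is a minimal sketch that caches a Cypher query result in Redis with an expiration, using the redis and neo4j Python packages. The connection details, the cache-key scheme, and the 60-second TTL are all assumptions.

# Sketch: cache a query result in Redis with an expiration (the KV-store approach above).
# Connection details, the cache-key scheme and the TTL are assumptions.
import hashlib
import json

import redis
from neo4j import GraphDatabase

r = redis.Redis(host="localhost", port=6379)  # placeholder
driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))  # placeholder

def cached_query(cypher, ttl=60):
    key = "cypher:" + hashlib.sha1(cypher.encode()).hexdigest()
    hit = r.get(key)
    if hit is not None:
        return json.loads(hit)  # served from the cache; the query is not executed
    with driver.session() as session:
        rows = session.run(cypher).data()
    r.setex(key, ttl, json.dumps(rows))  # expire after ttl seconds
    return rows

print(cached_query("MATCH (n) RETURN count(n) AS nodes"))

The expiration keeps the cache from serving stale results forever; for write-heavy data you would also need explicit invalidation.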
After using a MyISAM table for years now, with 3 indexes and around 500 columns over millions of rows, I wonder how to "force" MongoDB to keep indexes in memory for fast read performance.
In general, it is a simply structured table and all queries are WHERE index1=.. OR index2=.. OR index3=.. (MyISAM), and they are pretty simple in MongoDB as well.
It would be nice if MongoDB managed the indexes and RAM on its own.
However, I am not sure whether it does, or how MongoDB can best speed up these index-only queries.
Thanks
It would be nice if MongoDB managed the indexes and RAM on its own.
MongoDB does not manage the RAM at all. It uses memory-mapped files and basically "pretends" that everything is in RAM all of the time.
Instead, the operating system is responsible for deciding which pages are kept in RAM, typically on an LRU basis.
You may want to check the sizes of your indexes. If you cannot keep all of those indexes in RAM, then MongoDB will likely perform poorly.
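A quick way to sanity-check that is to compare the index sizes against the memory the server reports. Here is a minimal sketch with PyMongo; the database and collection names are placeholders.

# Sketch: compare index sizes with the memory the server reports.
# Database and collection names are placeholders.
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")
db = client["mydb"]

stats = db.command("collstats", "mycollection")
print("total index size (bytes):", stats["totalIndexSize"])
print("per-index sizes (bytes):", stats["indexSizes"])

mem = db.command("serverStatus")["mem"]
print("resident MB:", mem["resident"], "virtual MB:", mem["virtual"])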
However, I am not sure whether it does, or how MongoDB can best speed up these index-only queries.
MongoDB can use covered indexes to answer a query directly from the index. However, you have to be very specific about the fields returned: if you include fields that are not part of the index, the query will not be "index-only".
The default behavior is to return all fields, so you will need to look at the specific queries and make the appropriate changes to allow "index-only" execution. Note that such queries cannot return the _id field, which may cause issues down the line.
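For example, with an index on index1, a covered query needs a projection that returns only indexed fields and explicitly excludes _id. A minimal PyMongo sketch; the collection and field names are placeholders.

# Sketch: an "index-only" (covered) query: project only indexed fields and exclude _id.
# Collection and field names are placeholders.
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")
coll = client["mydb"]["mycollection"]

coll.create_index("index1")

# Covered: the filter and the projection both touch only fields that are in the index.
cursor = coll.find({"index1": "some-value"}, {"_id": 0, "index1": 1})
for doc in cursor:
    print(doc)

# The winning plan of a covered query shows an index scan with no document fetch stage.
plan = coll.find({"index1": "some-value"}, {"_id": 0, "index1": 1}).explain()
print(plan["queryPlanner"]["winningPlan"])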
You don't need to "force" Mongo to store indices in memory. An index is brought into memory when you use it and then stays in memory until the OS kicks it out.
MongoDB will automatically use a covered index when it can.
Is it OK to use those views in production? I mean, are queries against the data dictionary intended to be called frequently, or are they designed just for very rare usage with tools like SQL Navigator, SQL Developer, etc.?
It depends on your definition of "frequently", the size of those objects in your database, and why you need to query them.
In general, it's fine to query data dictionary tables on a regular basis in production-- tons of database monitoring tools, for example, regularly query a bunch of data dictionary tables to gather performance data. At the same time, though, you can easily configure most of these tools to put a tremendous load on your database by gathering too much data too frequently, so that your performance monitoring tool itself becomes the source of performance problems. Normally, you can just dial back the amount of data being captured and the frequency at which it is captured to get 99% of the monitoring benefit without creating a bunch of issues.
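To make the frequency point concrete, here is a minimal sketch of a monitoring poll against a dictionary/performance view at a modest interval, using the python-oracledb driver. The view, the 60-second interval, and the connection details are all assumptions; the point is simply that the polling interval is the knob you dial back.

# Sketch: a monitoring poll against a dictionary/performance view at a modest interval.
# The view, the interval and the connection details are assumptions.
import time

import oracledb

conn = oracledb.connect(user="monitor", password="...", dsn="dbhost/orclpdb")  # placeholders

POLL_SECONDS = 60  # dial this back further if the monitor itself becomes a load problem

while True:
    with conn.cursor() as cur:
        cur.execute("SELECT status, COUNT(*) FROM v$session GROUP BY status")
        for status, sessions in cur:
            print(status, sessions)
    time.sleep(POLL_SECONDS)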
I'm not sure why any tool would frequently need to query user_tables-- since tables aren't getting created or dropped at runtime in a properly designed system, there aren't too many reasons why you'd really need to query that particular view all that frequently.