Why does the same query on CouchDB take different amounts of time? - performance

I have a CouchDB application, and for most of the views I notice that the time the server takes to return a response varies from 10ms to 100ms. I do not have any concurrent write operations on the server, and there are at most 10 concurrent read requests.
How should I diagnose the problem? Where should I look?
I am running it on a Rackspace cloud machine with 1GB RAM.

From the Couchdb Guide:
If you read carefully over the last few paragraphs, one part stands out: “When you query your view, CouchDB takes the source code and runs it for you on every document in the database.” If you have a lot of documents, that takes quite a bit of time and you might wonder if it is not horribly inefficient to do this. Yes, it would be, but CouchDB is designed to avoid any extra costs: it only runs through all documents once, when you first query your view. If a document is changed, the map function is only run once, to recompute the keys and values for that single document.
Most likely you are seeing the views be regenerated and recached.
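The incremental behavior the Guide describes can be sketched with a toy simulation (this is an illustration, not CouchDB's actual code): the map function runs over every document on the first query, and on later queries it re-runs only for documents whose revision changed.

```python
class ToyViewIndex:
    """Toy model of CouchDB's incremental view indexing."""

    def __init__(self, map_fn):
        self.map_fn = map_fn
        self.cache = {}      # doc_id -> (rev, mapped rows)
        self.map_calls = 0   # how many times map_fn actually ran

    def query(self, docs):
        """docs: dict of doc_id -> (rev, doc). Re-map only changed docs."""
        for doc_id, (rev, doc) in docs.items():
            cached = self.cache.get(doc_id)
            if cached is None or cached[0] != rev:
                self.map_calls += 1
                self.cache[doc_id] = (rev, self.map_fn(doc))
        rows = []
        for _rev, mapped in self.cache.values():
            rows.extend(mapped)
        return sorted(rows)

index = ToyViewIndex(lambda doc: [(doc["type"], 1)])
docs = {"a": (1, {"type": "post"}), "b": (1, {"type": "comment"})}
index.query(docs)       # first query: maps both documents
docs["a"] = (2, {"type": "page"})
index.query(docs)       # second query: re-maps only "a"
print(index.map_calls)  # 3 map invocations in total, not 4
```

This is why the first query after a batch of writes is slow (the index catches up on all changed documents) while subsequent queries are fast.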

Why is real time much higher than "user" and "system" CPU time combined?

We have a batch process that executes every day. This week, a job that usually does not take more than 18 minutes of execution time (real time, as you can see) is now taking more than 45 minutes to finish.
The FULLSTIMER option is already active, but we don't know why only the real time increased.
In old documentation there are FULLSTIMER stats that could help identify the problem, but they do not appear in the batch log. (The stats are the ones below: Page Faults, Context Switches, Block Operations, and so on.)
It might be an I/O issue. Does anyone know how we can identify whether it really is an I/O problem, or whether it could be some other issue (the network, for example)?
To be more specific, this is one of the queries whose runtime has increased dramatically. As you can see, it reads from a database (SQL Server, VAULT schema) and from WORK, and writes to the WORK directory.
The number of observations is almost the same.
We asked the customer about any change in network traffic, and they said it is still the same.
Thanks in advance.
For a process to complete, much more needs to be done than the actual calculations on the CPU.
Your data has to be read and your results have to be written.
You might have to wait for other processes to finish first, and if your process includes multiple steps, writing to and reading from disk each time, you will have to wait for the CPU each time too.
In our situation, when real time is much larger than CPU time, we usually see heavy traffic to our Network File System (NFS).
As a programmer, you might notice that storing intermediate results in WORK is more efficient than storing them on remote libraries.
You might save a lot of time by creating intermediate results as views instead of tables, IF you only use them once. That is not only possible in SQL, but also in data steps like this:
data MY_RESULT / view=MY_RESULT;
set MY_DATA;
where transaction_date between '1jan2022'd and '30jun2022'd;
run;

SQLite not utilizing full system resources during slow SELECT queries

I'm performing some select queries with SQLite. The columns are already indexed.
Queries are in the format:
SELECT stuff
FROM table
WHERE haystack LIKE "%needle%"
I'm also loading the db into memory by running RESTORE FROM my_db.db, and this seems to be working (used memory goes up).
The problem is - queries like the above still take a long time (close to 100ms), and I need to run a lot of them (thousands at a time), which means the results take up to several minutes to arrive.
However, while the queries are running, I see very low CPU usage - around 25-35% on average across all cores.
I think SQLite might be throttling its own performance somehow and not using all of the CPU.
Is there a way to get it not to throttle, and to utilize the full system resources so queries are faster?
Thanks.
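One thing to note: a LIKE pattern with a leading wildcard (`%needle%`) cannot use a B-tree index, so each query is a full scan, and a single SQLite query runs on one core. Since the workload is thousands of independent read-only queries, one option is to run them in parallel, each on its own connection. A minimal sketch (table, column, and patterns here are made up for the demo; Python's sqlite3 module releases the GIL while SQLite's C code runs, so threads can use multiple cores):

```python
import os
import sqlite3
import tempfile
from concurrent.futures import ThreadPoolExecutor

# Build a small throwaway database (file-backed so several connections can share it).
path = os.path.join(tempfile.mkdtemp(), "demo.db")
conn = sqlite3.connect(path)
conn.execute("CREATE TABLE t (haystack TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?)",
    [("alpha needle beta",), ("no match here",), ("needle again",)],
)
conn.commit()
conn.close()

def run_query(pattern):
    # One connection per call: sqlite3 connections are bound to the creating thread.
    c = sqlite3.connect(path)
    try:
        return c.execute(
            "SELECT COUNT(*) FROM t WHERE haystack LIKE ?", (pattern,)
        ).fetchone()[0]
    finally:
        c.close()

patterns = ["%needle%", "%match%", "%zzz%"]
with ThreadPoolExecutor(max_workers=4) as pool:
    counts = list(pool.map(run_query, patterns))
print(counts)  # [2, 1, 0]
```

For substring search specifically, SQLite's FTS5 extension is usually a far bigger win than parallelism, since it replaces the scan with an index lookup.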

MongoDB small collection Query very slow

I have a 33MB collection with around 33k items in it. This has been working perfectly for the past month: the queries were responsive and there were no slow queries. The collection has all the required indexes, and normally the response is almost instant (1-2ms).
Today I spotted that there was a major query queue and the requests were just not getting processed. The oplog was filling up and just not clearing. After some searching I found the post below, which suggests compacting and repairDatabase. I ran the repair and it fixed the problem. Ridiculously slow mongoDB query on small collection in simple but big database
My question is: what could have gone wrong with the collection, and how did repairDatabase fix the problem? Is there a way for me to ensure this does not happen again?
There are many things that could be an issue here, but ultimately if a repair/compact solved things for you it suggests storage related issues. Here are a few suggestions to follow up on:
Disk performance: Ensure that your disks are performing properly and that you do not have bad sectors. If part of your disk is damaged it could have spiked access times and you may run into this again. You may want to test your RAM modules as well.
Fragmentation: Without knowing your write profile it's hard to say, but your collections and indexes could have fragmented all over your storage system. Running repair will have rebuilt them and brought them back into a more contiguous form, allowing your disk access times to be much faster, especially if you're using mechanical disks and are going to disk for a lot of data.
If this was the issue then you may want to adjust your paddingFactor to reduce the frequency of this in the future, especially if your updates are growing the size of your documents over time. (Assuming you're using MMAPv1 storage).
Page faults: I'm assuming you may have brought the system down to do the repair, which may have reset your memory/working set. You might want to monitor for hard page faults that indicate that your queries are being bottlenecked by IO rather than being served by your in-memory working set. If this is consistently the case, your application behavior may change unexpectedly as data gets pushed in and out of memory, and you may need to add more RAM.
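A rough way to spot the fragmentation described above is to compare `storageSize` (bytes allocated on disk) against `size` (bytes of actual data) from `db.collection.stats()`. The numbers below are hypothetical, just to show the arithmetic:

```python
# Hypothetical figures in the shape of db.collection.stats() output (MMAPv1).
stats = {
    "size": 33 * 1024 * 1024,         # bytes of document data
    "storageSize": 90 * 1024 * 1024,  # bytes allocated on disk
}

# Ratio of allocated storage to live data; values well above ~1.5-2x
# on MMAPv1 suggest fragmentation that a repair/compact would reclaim.
overhead = stats["storageSize"] / stats["size"]
print(round(overhead, 2))  # 2.73
```

After a successful repair you would expect this ratio to drop back toward 1.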

How to compare performance on neo4j queries without cache?

I've been trying to compare queries performance in neo4j.
In order to make the queries more efficient, I added indexes, analysed the results using PROFILE, and tried doing the same while using USING INDEX.
On most queries, DB hits were much better with the second option (with USING INDEX) and rows were the same or fewer, but the time measurements do not seem reliable: on several queries, adding USING INDEX was slower despite the better performance parameters (DB hits & rows), and times got much better on re-executing a query.
To stop the cache from interfering, I went to the properties file, changed cache_type in neo4j.properties to none, and restarted Neo4j, but the results of the same query still seem to come back faster each time (up to a certain point).
What will be the best way to test it?
Neo4j has (up to 2.2.x) a two-layered cache architecture. With cache_type=none you switch off just the object cache. To disable the page cache, you can use dbms.pagecache.memory=0. However, if all caches are disabled you basically measure the speed of your IO subsystem, since every query goes down to the bare metal and reads from disk.
I recommend a different approach: enable both caches and run the queries you want to compare multiple times to warm up caches. Take measurement on warmed cache since this is much closer to a real production scenario.
On a side note: in Neo4j 2.3 the object cache will go away and we just have the page cache.
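The warm-up-then-measure approach can be sketched as a small timing harness. Here `run_query` is a stand-in for your actual driver call (e.g. a Cypher query through the Neo4j driver), not a real Neo4j API:

```python
import statistics
import time

def benchmark(run_query, warmup=5, runs=20):
    """Run `run_query` a few times to warm the caches, then return the
    median wall-clock time of the measured runs. The median is less
    sensitive to one-off outliers (GC pauses, OS scheduling) than the mean."""
    for _ in range(warmup):
        run_query()          # warm-up runs: populate caches, discard timings
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        run_query()
        timings.append(time.perf_counter() - start)
    return statistics.median(timings)

# Stand-in workload; replace the lambda with your real query call.
median_s = benchmark(lambda: sum(range(1000)))
print(median_s >= 0.0)  # True
```

Comparing the medians of two warmed-up variants gives a fairer verdict than comparing single cold executions.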

Bulk insert performance in MongoDB for large collections

I'm using the BulkWriteOperation (java driver) to store data in large chunks. At first it seems to be working fine, but when the collection grows in size, the inserts can take quite a lot of time.
Currently for a collection of 20M documents, bulk insert of 1000 documents could take about 10 seconds.
Is there a way to make inserts independent of collection size?
I don't have any updates or upserts, it's always new data I'm inserting.
Judging from the log, there doesn't seem to be any issue with locks.
Each document has a time field which is indexed, but it grows linearly, so I don't see any need for Mongo to spend time reorganizing the indexes.
I'd love to hear some ideas for improving the performance
Thanks
You believe that the indexing does not require any document reorganisation, and the way you described the index suggests that a right-handed index is OK. So indexing seems to be ruled out as an issue. You could of course - as suggested above - definitively rule this out by dropping the index and re-running your bulk writes.
Aside from indexing, I would …
Consider whether your disk can keep up with the volume of data you are persisting. More details on this in the Mongo docs
Use profiling to understand what’s happening with your writes
Do you have any indexes on your collection?
If yes, inserts have to spend time updating the index tree.
Is the data time-series?
If yes, use updates more than inserts. Please read this blog, which argues that in-place updates are more efficient than inserts: https://www.mongodb.com/blog/post/schema-design-for-time-series-data-in-mongodb
Do you have the capability to set up sharded collections?
If yes, it would reduce the time (tested on 3 sharded servers with 15 million IP geo records).
Disk utilization & CPU: Check the disk utilization and CPU and see if any of these are maxing out.
Most likely it is the disk that is causing this issue for you.
Mongo log:
Also, if a 1000-document bulk write is taking 10 seconds, check the mongo log to see whether a few inserts within the bulk are taking most of the time. If there are any such queries, you can narrow down your analysis.
Another thing that's not clear is the mix of operations that hit your Mongo instance. Are inserts the only operation, or are there other find queries running too? If so, you should look at scaling up whatever resource is maxing out.
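Keeping each bulk write to a fixed, bounded size makes slow batches easy to spot in the log. A minimal batching helper (generic Python; the `insert_many` call mentioned in the comment is the real pymongo API, shown here only as a comment since this sketch has no server):

```python
def chunked(docs, batch_size=1000):
    """Yield successive fixed-size batches from an iterable of documents,
    so each bulk write stays a bounded, measurable unit."""
    batch = []
    for doc in docs:
        batch.append(doc)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        yield batch  # final partial batch

docs = ({"i": i} for i in range(2500))
sizes = [len(b) for b in chunked(docs, batch_size=1000)]
print(sizes)  # [1000, 1000, 500]

# With pymongo, each batch would then go to:
#   collection.insert_many(batch, ordered=False)
# ordered=False lets the server continue past a slow or failed insert
# instead of serializing the whole batch behind it.
```

Timing each batch individually also tells you whether insert latency grows with collection size or only spikes intermittently (which would point at flushes or disk contention instead).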
