CouchDB query performance

Does querying data get slower in CouchDB as the number of documents grows?
Example Scenario:
I have a combobox in a form for the customer name. When the user types the customer name, I have to autocomplete it.
There will be around 10k customer documents in CouchDB. I understand that I have to create a view for this.
CouchDB database is in the local machine where the application resides.
Question:
Will it take more than 2-3 seconds to query the DB for matching customer names?
Will each query take longer if there are many documents in CouchDB (say around 100,000)?
Any pointers on how to create views/index will be helpful.
Thanks in advance.

The view runs on every document, but only once. After that, the document's view value(s) are stored forever. Fetching a customer by name will be very fast because you would normally have only a few new documents to process in the view at query time.
Query time will not increase noticeably if you have more documents. Technically, access times grow logarithmically with the number of documents. However, in practice fetching documents is basically constant time and very unlikely to be a problem.
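For example, a minimal map function for a by-name view might look like the following; this is a sketch, and the "type" and "name" fields are assumptions about how your customer documents are shaped:

    // map function of a view, e.g. _design/customers/_view/by_name
    // assumes customer docs look like { "type": "customer", "name": "Alice" }
    function (doc) {
      if (doc.type === 'customer' && doc.name) {
        emit(doc.name.toLowerCase(), null);
      }
    }

You can then autocomplete with a prefix query such as /yourdb/_design/customers/_view/by_name?startkey="jo"&endkey="jo\ufff0"&limit=10, where \ufff0 is a high Unicode character commonly used in CouchDB to terminate a prefix range.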

Related

Elasticsearch index design

I am maintaining a year's worth of user activity, including browse and purchase data. Each browse/purchase entry is a JSON object: {item_id: id1, item_name: name1, category: c1, brand: b1, event_time: t1}.
I would like to compose different queries, such as getting all customers who browsed item A and/or purchased item B within the time range t1 to t2. There are tens of millions of customers.
My current design is to use a nested object for each customer:
customer1:
customer_id: id1,
name: name1,
country: US,
browse: [{browse_entry1_json}, {browse_entry2_json}, ...],
purchase: [{purchase_entry1_json}, {purchase_entry2_json}, ...]
With this design, I can easily compose all kinds of queries using nested queries. The only problem is that it is hard to expire older browse/purchase data: I only want to keep, for example, a year of browse/purchase data. With this design, I will have to, at some point, read the entire index out, delete the expired browse/purchase entries, and write everything back.
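(For reference, a sketch of roughly what this nested mapping looks like, written with the 1.x/2.x-era elasticsearch JavaScript client; the index name, type name, and field types here are illustrative assumptions, not something from the original post:)

    var elasticsearch = require('elasticsearch');
    var client = new elasticsearch.Client({ host: 'localhost:9200' });

    // hypothetical "customers" index with browse/purchase stored as nested objects
    client.indices.create({
      index: 'customers',
      body: {
        mappings: {
          customer: {
            properties: {
              customer_id: { type: 'string', index: 'not_analyzed' },
              name: { type: 'string' },
              country: { type: 'string', index: 'not_analyzed' },
              browse: { type: 'nested' },    // entries: item_id, item_name, category, brand, event_time
              purchase: { type: 'nested' }
            }
          }
        }
      }
    }, function (err) {
      if (err) console.error(err);
    });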
Another design is to use a parent/child structure.
The user type is the parent of the browse and purchase types.
The browse type will contain each browse entry.
Although deleting old data seems easier with delete-by-query, for the above query I will have to do multiple and/or has_child queries, and it would be much less performant. In fact, I was initially using the parent/child structure, but the query times seemed really long, so I gave it up and tried to switch to nested objects.
I am also thinking about using nested objects but breaking the data into different indices (like monthly indices) so that I can easily expire old data. The problem with this approach is that I have to query across those multiple indices and do an aggregation on that to get the distinct users, which I assume will be much slower (haven't tried yet). One requirement of this project is to be able to return the counts for these queries within an acceptable time frame (like seconds), and I am afraid this approach may not be acceptable.
The ES cluster has 7 machines, each with 8 cores and 32 GB of memory.
Any suggestions?
Thanks in advance!
Chen
Instead of creating a customers index, I would create "browsing" indices and "purchasing" indices separated by a timespan (e.g. monthly, as you mentioned in your last paragraph).
In each document I would add the customer fields. Now you are facing two different approaches:
1. You can add only a reference to the customer (such as an id) and make another query to get their details.
2. If you don't have storage constraints you can keep all the customer's data in each document.
If this isn't enough for performance, you can combine it with "routing" and store all of a specific user's data on the same shard, so Elasticsearch won't need to fetch data across shards (you can watch this video where Shay Banon explains the "user data flow" approach).
Niv
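A rough sketch of the suggestion, monthly indices with routing on the customer, using the elasticsearch JavaScript client with 1.x/2.x-era syntax; the index names, field names, and values are assumptions for illustration:

    var elasticsearch = require('elasticsearch');
    var client = new elasticsearch.Client({ host: 'localhost:9200' });

    // index a browse event into a monthly index, routed by customer id so all of
    // this customer's events land on the same shard
    client.index({
      index: 'browse-2015-06',      // hypothetical monthly index name
      type: 'browse',
      routing: 'customer_42',
      body: {
        customer_id: 'customer_42',
        item_id: 'id1',
        item_name: 'name1',
        category: 'c1',
        brand: 'b1',
        event_time: '2015-06-12T10:00:00Z'
      }
    }, function (err) { if (err) console.error(err); });

    // "who browsed item A between t1 and t2" across the relevant monthly indices
    client.search({
      index: 'browse-2015-05,browse-2015-06',
      type: 'browse',
      body: {
        query: {
          filtered: {
            filter: {
              bool: {
                must: [
                  { term: { item_id: 'A' } },
                  { range: { event_time: { gte: '2015-05-01', lte: '2015-06-30' } } }
                ]
              }
            }
          }
        }
      }
    }, function (err, resp) { if (!err) console.log(resp.hits.total); });

    // expiring a month of data becomes a cheap index drop instead of a delete-by-query
    client.indices.delete({ index: 'browse-2014-06' }, function (err) { if (err) console.error(err); });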

What would the performance and cost be of storing every GET request made into a "views" table?

I'm thinking about tracking page views for dynamic pages on my website, for URLs like the one below:
example.com/things/12456
I'm currently using Ruby on Rails with PostgreSQL, on Heroku.
If I store every GET request in a table, every time a user views a page, the database could grow extremely large very quickly. Ideally, I'd like to track the timestamp, user ID, and user role of each request as well, so each view would have to be a row in the table, as opposed to having a "count" column for each resource.
I'd also like to run aggregate queries on this large table, for things like the total count per resource over a time period.
In terms of performance and cost, would this make sense to do? Are there better alternatives out there?
EDIT: Let's say I have 1,000 views a day, with each user viewing 10 pages.
Would this be expensive or non-scalable?
(I'd also need to store POST, PUT, and DELETE requests in an actions table, which runs into this very same problem.)
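For scale, 1,000 views a day is roughly 365,000 rows a year. A sketch of the kind of table and aggregate query described, shown here with node-postgres since the exact Rails models aren't given; all table and column names are hypothetical, and CREATE INDEX IF NOT EXISTS assumes a reasonably recent PostgreSQL:

    const { Pool } = require('pg');
    const pool = new Pool({ connectionString: process.env.DATABASE_URL });

    // one row per GET request, as described above
    const createSql = `
      CREATE TABLE IF NOT EXISTS page_views (
        id         bigserial PRIMARY KEY,
        thing_id   bigint NOT NULL,              -- e.g. the 12456 in /things/12456
        user_id    bigint,
        user_role  text,
        created_at timestamptz NOT NULL DEFAULT now()
      );
      CREATE INDEX IF NOT EXISTS page_views_thing_time_idx
        ON page_views (thing_id, created_at);    -- keeps the aggregate below index-friendly
    `;

    // total views per resource over a time period
    const countSql = `
      SELECT thing_id, count(*) AS views
        FROM page_views
       WHERE created_at BETWEEN $1 AND $2
       GROUP BY thing_id
       ORDER BY views DESC;
    `;

    async function report(from, to) {
      await pool.query(createSql);
      const { rows } = await pool.query(countSql, [from, to]);
      console.log(rows);
    }

    report('2016-01-01', '2016-02-01').catch(console.error);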

What is the most efficient way to filter a search?

I am working with Node.js and MongoDB.
I am going to have a database set up and use socket.io for real-time updates, which will either query the DB again or push the new update to the client.
I am trying to figure out the best way to filter the database.
Some more information in regards to what is being queried and what the real time updates are:
A document in the database will include information such as an address, city, time, number of packages, name, price.
Filters include city/price/name/time (meaning only to see addresses within the same city, or within the same time period)
Real-time info: adding a new document to the database, which will essentially update the admin on the website with a notification that a new address was added.
Method 1: Query the db with the filters being searched?
Method 2: Query the db for all searches and then filter it on the client side (Javascript)?
Method 3: Query the db for all searches then store it in localStorage then query localStorage for what the filters are?
I'm trying to figure out the fastest way for the user to filter the results.
Also, if that differs from the most cost-effective way, then the most cost-effective way as well (which I am assuming means fewer DB queries)...
It's hard to say because we don't see exact conditions of the filter, but in general:
Mongo can use only 1 index per query condition. Thus whatever fields are covered by that index can be used for efficient filtering; otherwise it might do a full collection scan, which is slow. If you are using an index then you are probably doing the most efficient query. (Mongo can still use another index for sorting, though.)
Sometimes you will be forced to do processing on client side because Mongo can't do what you want or it takes too many queries.
The least efficient option is to store results somewhere, simply because I/O is slow. This would only benefit you if you use them as a cache and do not recalculate.
Also consider the overhead and latency of networking. If you have to send lots of data back to the client it will be slower. In general, Mongo will do a better job filtering than you would on the client.
From what you describe, if you can filter addresses by time period, then an index on those fields could cut out a lot of documents. You most likely need a compound index covering multiple fields.
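For example, a compound index on the filtered fields plus a server-side filtered query (Method 1) might look like this with a 3.x-or-later Node.js MongoDB driver; the collection name, field names, and values are assumptions based on the description above:

    const { MongoClient } = require('mongodb');

    async function run() {
      const client = await MongoClient.connect('mongodb://localhost:27017');
      const deliveries = client.db('app').collection('deliveries');  // hypothetical names

      // compound index covering the common filters, as suggested above
      await deliveries.createIndex({ city: 1, time: 1, price: 1 });

      // let MongoDB filter and sort, returning only the matching documents
      const results = await deliveries.find({
        city: 'Toronto',
        time: { $gte: new Date('2016-01-01'), $lt: new Date('2016-02-01') },
        price: { $lte: 50 }
      }).sort({ time: -1 }).limit(100).toArray();

      console.log(results.length, 'matching documents');
      await client.close();
    }

    run().catch(console.error);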

Joomla getItems default Pagination

Can anyone tell me if the getItems() function in the model automatically adds the globally set LIMIT before it runs the query (from getListQuery())? Joomla is really struggling, seemingly trying to cache the entire result set (over 1 million records here!).
After looking in /libraries/legacy/model/list.php AND /libraries/legacy/model/legacy.php, it appears that getItems() does add LIMIT to setQuery using $this->getState('list.limit') before it sends the results to the cache, but if this is the case, why is Joomla struggling so much?
So what's going on? How come phpMyAdmin can return the limited results within a second and Joomla just times out?
Many thanks!
If you have one million records, you'll most definitely want to do as Riccardo is suggesting, override and optimize the model.
JModelList runs the query twice, once for the pagination numbers and then for the display query itself. You'll want to carefully inherit from JModelList to avoid the pagination query.
Also, the articles query is notorious for its joins. You can definitely lose some of that slowdown (I doubt you are using the contacts link, for example).
If all articles are visible to public, you can remove the ACL check - that's pretty costly.
There is no DBA from the West or the East who is able to explain why all of those GROUP BYs are needed, either.
Losing those things will help considerably. In fact, building your query from scratch might be best.
It does add the pagination automatically.
Its struggling is most likely due to a large dataset (e.g. 1000+ items returned in the collection) and many lookup fields: the content modules, for example, join as many as 10 tables to get author names etc.
This can be a real killer; I had queries running for over one second with a dedicated server and only 3000 content items. One tag cloud component we found could take as long as 45 seconds to return a keyword list. If this is the situation (a lot of records and many joins), your only way out is to further limit the filters in the options to see if you can get some faster results (for example, limiting to articles from the last 3 months can reduce the time needed dramatically).
But if this is not sufficient or not viable, you're left with writing a new optimized query in a new model, which will ultimately bring a bigger performance gain than any other optimization. When writing the query, consider leveraging database-specific optimizations, i.e. adding indexes and full-text indexes, and only use joins if you really need them.
Also make sure the number of joins does not grow with the number of fields, translations, and so on.
A constant query is easy for the db engine to optimize and cache, whilst a dynamic query will never be as efficient.

How to optimize data fetching in SQL Developer?

I am working in Oracle SQL Developer and have created 7 tables with Wikipedia data, so the volume of data is very large. I have created a search engine which fetches and displays data using JSP, but the problem is that each query has to access 4 tables, making my application very slow.
I have added indexes to all the tables but it still takes a long time, so any suggestions on how to optimize my app and reduce the time it takes to display results would be appreciated.
There are several approaches you can take to tune your application. And it could be either tuning at the database end, front end or a combination of the two.
At the database end you could be looking at, say, a materialized view to summarize the more commonly searched data. This could either be for your search purposes only or to reduce the size and complexity of the result set. You might also look at tuning the query itself, perhaps placing indexes on the columns used in the relevant WHERE clauses of your search, or look at denormalizing your tables.
At the application end, the retrieval of vast recordsets can always cause problems where a single record is large (many columns) and the number of records in the result set is numerous.
What you are probably looking for is a rapid response time from your application so your user doesn't feel they are waiting ... and waiting.
A technique I have seen and used is to retrieve the resultset either as:
1) a recordset of ROWIDs, paging through those ROWIDs on the display, or
2) a simulated "paged" recordset, retrieving the recordset in chunks.
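As an illustration of the second option, retrieving the result set in chunks; this sketch uses the node-oracledb driver since the original JSP code isn't shown, the connection details, table, and columns are made up, and the CONTAINS clause assumes an Oracle Text index on the searched column:

    const oracledb = require('oracledb');

    async function pagedSearch(term) {
      const conn = await oracledb.getConnection({
        user: 'app', password: 'secret', connectString: 'localhost/XEPDB1'  // placeholders
      });
      try {
        // open a result set instead of materializing every matching row at once
        const result = await conn.execute(
          `SELECT title, summary
             FROM wiki_articles                        -- hypothetical table
            WHERE CONTAINS(search_text, :term) > 0     -- assumes an Oracle Text index`,
          { term },
          { resultSet: true }
        );
        const rs = result.resultSet;
        let rows;
        // fetch and display 100 rows at a time, a "paged" retrieval
        while ((rows = await rs.getRows(100)).length > 0) {
          console.log('fetched a page of', rows.length, 'rows');
        }
        await rs.close();
      } finally {
        await conn.close();
      }
    }

    pagedSearch('database').catch(console.error);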
