Is a query by id faster than other queries in Solr (Lucene)? - performance

id is the primary key; another_field is an ordinary string field.
http://localhost:8983/solr/select?q=id:c2c32773-1691-11df-97a5-7038c432aabf
http://localhost:8983/solr/select?q=another_field:c2c32773-1691-11df-97a5-7038c432aabf
Is the first query faster?

Querying on id is not faster than querying on any other field, unless the fields are text fields, in which case the query takes a bit longer depending on the analysis performed.
Also, if you query on id or other fixed-value fields, it is better to use a filter query (fq), which is much faster because of field caching.
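For example, the first request above could be rewritten as a filter query (a sketch using the handler path and UUID from the question; the fq result set is cached in Solr's filterCache, so repeated lookups on the same fixed value skip scoring and reuse the cached set):

```
http://localhost:8983/solr/select?q=*:*&fq=id:c2c32773-1691-11df-97a5-7038c432aabf
```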

Related

Hibernate criteria query with restrictions takes long execution time

In our application we are using Criteria query with about 15 restrictions to filter some results.
The query looks like this:
SELECT column1, column2, column3 ... FROM table WHERE column = ?1 AND column = ?2 ...
The AND clause grows based on the input. I really can't find where the issue lies.
We use criteria.addRestrictions() to build this kind of dynamic query, and we validate each input with a simple if clause (input != null). All the inputs are strings. We also don't have indexes on every column; I know that without indexes a full table scan is performed, but I want to know what is causing the long execution time for this query.
Any help is appreciated. Thanks.
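The dynamic-query pattern described in the question can be sketched outside Hibernate (a Python sketch; the table and column names are made up): a predicate is appended only for non-null inputs, so the WHERE clause grows with the input, exactly as described.

```python
def build_query(filters):
    """Build a SELECT with one positional parameter per non-null filter value.

    `filters` maps column name -> value (or None to skip the column),
    mirroring the `input != null` checks described in the question.
    """
    clauses, params = [], []
    for column, value in filters.items():
        if value is not None:                      # skip absent inputs
            clauses.append(f"{column} = ?{len(params) + 1}")
            params.append(value)
    sql = "SELECT column1, column2, column3 FROM some_table"
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    return sql, params

sql, params = build_query({"name": "x", "city": None, "status": "open"})
# Only the two non-null inputs end up in the WHERE clause.
```

Note that without a covering index the database scans the full table once per query regardless of how many AND terms are present; the number of restrictions mostly affects per-row evaluation cost, so the database's EXPLAIN output for the generated SQL is the right place to look for the real bottleneck.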

Sorting Mongo records on non unique created date time field

I am sorting on the created field, which is stored in the form 2022-03-26T03:56:13.176+00:00 and represents a java.time.LocalDateTime.
Sorting by created alone is not consistent, because the field is not unique: batch operations run quickly enough to produce duplicate timestamps.
I've added a second sort on _id, which is an ObjectId.
The second sort seems to add quite a bit of time to the query, more than the first, which is odd to me.
Why does it more than double the response time, and is there a preferred way to ensure the order?
Using MongoTemplate, I sort like this:
query.with(Sort.by(Sort.Order.desc("createdOn"), Sort.Order.desc("_id")));
If your use case is always to sort both fields in descending order, it is best to create a compound index in the expected sort order, as follows:
db.collection.createIndex({ createdOn:-1,_id:-1 })
In general, though, the default _id field contains the document insertion time and is unique across the mongod process, so you may be able to sort on _id alone; you most probably don't need the additional sort on createdOn.
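The effect of the second sort key can be illustrated outside MongoDB (a Python sketch with made-up documents): sorting on the non-unique createdOn alone leaves tied documents in arbitrary order, while a (createdOn, _id) compound key makes the order deterministic, which is exactly the ordering the compound index above serves.

```python
docs = [
    {"_id": 3, "createdOn": "2022-03-26T03:56:13.176"},
    {"_id": 1, "createdOn": "2022-03-26T03:56:13.176"},  # duplicate timestamp
    {"_id": 2, "createdOn": "2022-03-26T03:55:00.000"},
]

# Descending on both keys, mirroring Sort.by(desc("createdOn"), desc("_id")):
# ties on createdOn are broken by the unique _id, so the order is stable.
ordered = sorted(docs, key=lambda d: (d["createdOn"], d["_id"]), reverse=True)

assert [d["_id"] for d in ordered] == [3, 1, 2]
```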

RethinkDb OrderBy Before Filter, Performance

The data table is the biggest table in my db. I would like to query the db and then order the results by the entries' timestamps. Common sense says to filter first and then manipulate the data.
queryA = r.table('data').filter(filter).filter(r.row('timestamp').minutes().lt(5)).orderBy('timestamp')
But this is not possible, because the filter creates a side table, and the command throws an error (https://github.com/rethinkdb/rethinkdb/issues/4656).
So I was wondering whether putting the orderBy first would hurt performance once the database gets huge over time.
queryB = r.table('data').orderBy('timestamp').filter(filter).filter(r.row('timestamp').minutes().lt(5))
Currently I sort after querying, but databases are usually quicker at these operations.
queryA.run (err, entries)->
...
entries = _.sortBy(entries, 'timestamp').reverse() # this takes ~2000ms on my local machine
Question:
What is the best approach, performance-wise, to query these entries ordered by timestamp?
Edit:
The db is run with one shard.
Using an index is often the best way to improve performance.
For example, an index on the timestamp field can be created:
r.table('data').indexCreate('timestamp')
It can be used to sort documents:
r.table('data').orderBy({index: 'timestamp'})
Or to select a given range, for example the past hour:
r.table('data').between(r.now().sub(60*60), r.now(), {index: 'timestamp'})
The last two operations can be combined into one:
r.table('data').between(r.now().sub(60*60), r.maxval, {index: 'timestamp'}).orderBy({index: 'timestamp'})
Additional filters can also be added. A filter should always be placed after an indexed operation:
r.table('data').orderBy({index: 'timestamp'}).filter({colour: 'red'})
This restriction on filters is only for indexed operations. A regular orderBy can be placed after a filter:
r.table('data').filter({colour: 'red'}).orderBy('timestamp')
For more information, see the RethinkDB documentation: https://www.rethinkdb.com/docs/secondary-indexes/python/

Do mongo find queries perform faster with more criteria?

Does performance improve by limiting the find (or findOne) with more criteria?
An example:
db.users.find({_id : ObjectId("111111111111111111111111")})
db.users.find({_id : ObjectId("111111111111111111111111"), accountId : ObjectId("22222222222222222222222")})
Another example:
db.users.find({full_name: 'Lionel Messi'})
db.users.find({full_name : 'Lionel Messi', first_name : 'Lionel', last_name : 'Messi' })
Typically, no. Because MongoDB returns a cursor over the first N matching values, being more specific means it takes longer to find values matching the criteria.
If you want to see what could be affecting the speed of your query, it's a good idea to use the explain() method.
See here for more details: http://docs.mongodb.org/manual/tutorial/analyze-query-plan/
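To compare the two queries above, explain() can be run on each (a sketch using the collection from the question; the "executionStats" mode reports documents examined and execution time, so the extra criteria's cost becomes visible):

```
db.users.find({full_name: 'Lionel Messi'}).explain("executionStats")
db.users.find({full_name: 'Lionel Messi', first_name: 'Lionel', last_name: 'Messi'}).explain("executionStats")
```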
No, since you are using _id, which is unique.
As for making the query slower: it could be slower by nanoseconds at most if there is no compound index on {_id, accountId}, since once the documents have been found via the _id index they will be loaded into memory to match the accountId field.
MongoDB finds by index before looking at fields that are not within the selected index.
However, since your query is uncovered and will load the document before returning anyway, the only thing slowing it down is that final match, which is basically negligible in speed.
In this case, no. _id is indexed automatically and uniquely identifies documents. The first criterion
{_id : ObjectId("111111111111111111111111")}
will find the document using the index. Checking the value of accountId will actually make the query slower, because MongoDB has to check another value.

Sort on a Ref<?> attribute - Objectify Query

I am stuck on a data operation where I want to sort the results of a query by a Ref field.
Let's say I have the following data objects:
EmployeeDO {Long id, String name, Ref<CompanyDO> refCompany}
CompanyDO {Long id, String name}
Now I want to query employees ordered by company name.
I tried the query
Query<EmployeeDO> query = ofy().load().type(EmployeeDO.class).order("refCompany");
Obviously this did not sort the results by company name, but it did compile successfully.
Please suggest whether such sorting is possible this way, or whether some other workaround can be tried.
You can order by refCompany if you @Index refCompany, but it won't sort by company name - it will sort by the key (if you aren't using @Parent, just id order).
There are two 'usual' choices:
Load the data into RAM and sort there. This is what an RDBMS would do internally. It's not exactly true that GAE doesn't support joins; it's just that you are the query planner.
Denormalize and pre-index the company name: put @Index companyName in the EmployeeDO. This is what you would do with an RDBMS if the magic sorting performed poorly (say, because there are too many employees).
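The first option (you are the query planner) can be sketched generically (a Python sketch, with made-up records standing in for the loaded entities): load both kinds, resolve each employee's company ref through a lookup table, and sort in RAM by company name.

```python
# id -> name, as if the CompanyDO entities had been loaded into memory.
companies = {10: "Acme", 20: "Zenith"}

employees = [
    {"id": 1, "name": "Ann", "refCompany": 20},
    {"id": 2, "name": "Bob", "refCompany": 10},
]

# In-RAM "join": look up each employee's company name, then sort by it.
by_company = sorted(employees, key=lambda e: companies[e["refCompany"]])

assert [e["name"] for e in by_company] == ["Bob", "Ann"]
```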
