MongoDB: Two-field index vs document field index - performance

I need to index a collection by two fields (a unique index), say field1 and field2. What's the better approach in terms of performance:
Create a regular two-field compound index
- or -
Combine those two fields into a single embedded document field {field1: value1, field2: value2} and index that field?
Note: I will always be querying by those two fields together.

You can keep the fields separate and create a single compound index, which will improve performance when querying both fields together (the unique option matches the requirement in the question):
db.things.ensureIndex({field1: 1, field2: 1}, {unique: true});
http://www.mongodb.org/display/DOCS/Indexes#Indexes-CompoundKeysIndexes
Nesting the two fields inside the same embedded document provides no performance increase, because you must index them the same way (note the dotted keys must be quoted):
db.things.ensureIndex({"fields.field1": 1, "fields.field2": 1});
http://www.mongodb.org/display/DOCS/Indexes#Indexes-EmbeddedKeys
Or you can index the entire document
db.things.ensureIndex({fields: 1});
http://www.mongodb.org/display/DOCS/Indexes#Indexes-DocumentsasKeys
There could be a slight performance gain, but probably not much. Use a test database, create test data, and run some benchmarks to find out. We would love to hear your results.

I'd create a compound index over both fields. This'll take up less disk space because you won't need to store the extra combined field, and give you the bonus of an additional index over the first field, i.e. an index over { a:1, b:1 } is also an index over { a:1 }.
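For anyone wanting to try this themselves, here is a minimal sketch using PyMongo; the collection name things matches the examples above, while the database name and sample values are purely illustrative:
from pymongo import MongoClient, ASCENDING

client = MongoClient("mongodb://localhost:27017")
coll = client.test.things

# Unique compound index over both fields, as the question requires (run once).
coll.create_index([("field1", ASCENDING), ("field2", ASCENDING)], unique=True)

coll.insert_one({"field1": "a", "field2": "b"})

# A query on both fields uses the index, and so does a query on field1
# alone, since field1 is a prefix of the compound index.
print(coll.find_one({"field1": "a", "field2": "b"}))
print(coll.find({"field1": "a"}).explain()["queryPlanner"]["winningPlan"])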

Related

Elasticsearch query on string representation of number

Good day:
I have an indexed field called amount, which is of string type. The value of amount can be either the word one or the digit 1. Say we have a document indexed with amount=1; if I search for one, Elasticsearch will not return the document unless I use 1 in the search query. Any thoughts on how I can get this to work? I'm thinking a tokenizer is what's needed.
Thanks.
You probably don't want this for sevenmillionfourhundredfifteenthousandtwohundredfourteen and the like, but only for a small set of values.
At index time I would convert everything to a proper number and store it in a numeric field, which then even allows sorting, if you need it. Apart from this, I would use synonyms at index and at query time and map everything to the digit strings, but in a general text field that is searched by default.
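A minimal sketch of that synonym approach with the Python Elasticsearch client (7.x-style body argument); the index name payments, the field names, and the short synonym list are all hypothetical:
from elasticsearch import Elasticsearch

es = Elasticsearch()

es.indices.create(index="payments", body={
    "settings": {
        "analysis": {
            "filter": {
                # Map the few spelled-out values you expect to their digit forms.
                "number_synonyms": {
                    "type": "synonym",
                    "synonyms": ["one, 1", "two, 2", "three, 3"],
                }
            },
            "analyzer": {
                "amount_analyzer": {
                    "tokenizer": "standard",
                    "filter": ["lowercase", "number_synonyms"],
                }
            },
        }
    },
    "mappings": {
        "properties": {
            # Text field searched by default; the synonym filter runs at
            # both index and query time.
            "amount_text": {"type": "text", "analyzer": "amount_analyzer"},
            # Numeric copy of the same value, for sorting and range queries.
            "amount_num": {"type": "integer"},
        }
    },
})
With this mapping, a match query for either one or 1 against amount_text finds the same documents.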

RethinkDb OrderBy Before Filter, Performance

The data table is the biggest table in my db. I would like to query the db and then order the results by the entries' timestamps. Common sense says to filter first and then manipulate the data.
queryA = r.table('data').filter(filter).filter(r.row('timestamp').minutes().lt(5)).orderBy('timestamp')
But this is not possible, because the filter creates a side table, and the command would throw an error (https://github.com/rethinkdb/rethinkdb/issues/4656).
So I was wondering: if I put the orderBy first, will this hurt performance once the database grows huge over time?
queryB = r.table('data').orderBy('timestamp').filter(filter).filter(r.row('timestamp').minutes().lt(5))
Currently I sort the results after querying, but usually databases are quicker at these operations.
queryA.run (err, entries) ->
    ...
    entries = _.sortBy(entries, 'timestamp').reverse() # this takes ~2000ms on my local machine
Question:
What is the best approach (performance-wise) to query these entries ordered by timestamp?
Edit:
The db is run with one shard.
Using an index is often the best way to improve performance.
For example, an index on the timestamp field can be created:
r.table('data').indexCreate('timestamp')
It can be used to sort documents:
r.table('data').orderBy({index: 'timestamp'})
Or to select a given range, for example the past hour:
r.table('data').between(r.now().sub(60*60), r.now(), {index: 'timestamp'})
The last two operations can be combined into one:
r.table('data').between(r.now().sub(60*60), r.maxval, {index: 'timestamp'}).orderBy({index: 'timestamp'})
Additional filters can also be added. A filter should always be placed after an indexed operation:
r.table('data').orderBy({index: 'timestamp'}).filter({colour: 'red'})
This restriction on filters is only for indexed operations. A regular orderBy can be placed after a filter:
r.table('data').filter({colour: 'red'}).orderBy('timestamp')
For more information, see the RethinkDB documentation: https://www.rethinkdb.com/docs/secondary-indexes/python/
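Putting those pieces together, here is a minimal end-to-end sketch with the RethinkDB Python driver, assuming a local server and the data table from the question; the colour filter is just an example:
from rethinkdb import RethinkDB

r = RethinkDB()
conn = r.connect('localhost', 28015)

# Create the secondary index once, and wait until it is ready.
r.table('data').index_create('timestamp').run(conn)
r.table('data').index_wait('timestamp').run(conn)

# Entries from the past hour, newest first; the non-indexed filter comes
# after the indexed between/orderBy operations.
cursor = (r.table('data')
          .between(r.now().sub(60 * 60), r.maxval, index='timestamp')
          .order_by(index=r.desc('timestamp'))
          .filter({'colour': 'red'})
          .run(conn))

for doc in cursor:
    print(doc)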

Sort by a different index's values

Given two indexes, I'm trying to sort the first based on values of the second.
For example, Index 1 ('Products') has fields id, name. Index 2 ('Prices') has fields id, price.
Struggling to figure out how to sort 'Products' by the 'Prices' price field, assuming the ids match. The reason for this quest is that, hypothetically, the 'Products' index becomes very large (with duplicate ids), and updating all documents becomes expensive.
Elasticsearch is a document-based store, rather than a column-based store. What you're looking for is a way to JOIN the two indices, but this is not supported in Elasticsearch. The 'Elasticsearch way' of storing these documents is to have one index that contains all the relevant data. If you're worried about update procedures taking very long, look into creating an index with an alias: when you need to do a major update, apply it to a new index, and only when you're done switch the alias target to the new index. This allows you to update your data seamlessly.
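A minimal sketch of that alias switch with the Python Elasticsearch client; the index names products_v1/products_v2 and the alias products are hypothetical:
from elasticsearch import Elasticsearch

es = Elasticsearch()

# Build and populate the new index (products_v2) offline, then flip the
# alias in one atomic call so searches never see a half-updated index.
es.indices.update_aliases(body={
    "actions": [
        {"remove": {"index": "products_v1", "alias": "products"}},
        {"add":    {"index": "products_v2", "alias": "products"}},
    ]
})
Queries keep targeting the alias products, so the swap is invisible to clients.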

Oracle 2 Indexes on same columns but different order

I have a table in production environment that has 2 Indexes on a Table with the same columns in the Index but in reversed order.
DDL is
- CREATE INDEX IND_1 ON ORDERS (STORE_ID,DIST_ID)
- CREATE INDEX IND_DL_1 ON ORDERS (DIST_ID,STORE_ID)
Are these two indexes not essentially the same? Why would someone create indexes this way? Does reversing or changing the column positions do something internally?
Indexes are tied to the fields they index, in the order those fields are defined in the index. As long as you use the fields in the index in their left-to-right order, the index is usable for your query. If you skip fields, the index cannot be used. E.g., given the following index:
CREATE INDEX ind1 ON foo (bar, baz, qux)
then these where clauses will be able to use the index:
WHERE bar=1
WHERE bar=1 AND baz=2
WHERE baz=2 AND bar=1 <-- same as before
WHERE bar=1 AND baz=2 AND qux=3
The order in which you use the indexed fields in the query is not relevant, just that you ARE using them. However, the order in which they're defined in the index is critical. The following clauses can NOT use the index:
WHERE baz=2 <-- 'bar' not being used
WHERE baz=2 AND qux=3 <-- 'bar' again not being used
WHERE bar=1 AND qux=3 <-- the index can be partially used to find `bar`, but not qux.
For your two cases, there's nothing really wrong with how they're indexed, but it'd be slightly more efficient to index as follows:
(STORE_ID, DIST_ID)
(DIST_ID)
There's no point in indexing store_id in the second index, because the DBMS can use the first index to handle the store_id lookups. It's not a major gain, but still... maintaining indexes is overhead for the DB, and reducing overhead is always a good thing.
Oracle does not have to touch table segments in cases when all the needed information is found in indexes.
In your case these indexes can serve as a quick lookup/translation table STORE_ID => DIST_ID and vice-versa.
Just look at the execution plan for a query where you select only STORE_ID based on DIST_ID: the query will go through the index alone and will not touch the table itself.
But maybe the reason is different (if any).
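To see this yourself, here is a minimal sketch using the python-oracledb driver; the connection details are placeholders, and the plan output should show an index range scan with no table access step:
import oracledb

conn = oracledb.connect(user="app", password="secret", dsn="localhost/XEPDB1")
cur = conn.cursor()

# Explain a query that selects only STORE_ID based on DIST_ID.
cur.execute("EXPLAIN PLAN FOR SELECT store_id FROM orders WHERE dist_id = :1", [42])

# Print the plan; expect INDEX RANGE SCAN on IND_DL_1 and no TABLE ACCESS.
cur.execute("SELECT plan_table_output FROM TABLE(DBMS_XPLAN.DISPLAY())")
for (line,) in cur:
    print(line)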

Having a string array field as an index, good or bad?

My gut feeling is that setting a string field (with array elements) as an index on a table will be bad for performance, where the bulk of the operations done on the table are inserts and updates; the table holds transactional data and its current size is approximately 20 million records.
The string extends a type with 4 array elements, and not all of them are always populated. I need to justify why this field should not be set as one of the indexes. I've tried searching for answers: reading Kimberly Tripp's blog, going through index best practices on MSDN (which only mentions that indexes are best on numeric fields first, then string fields), etc. But none of these mention indexing a table on a field of an array type. What reasons can I give to justify not indexing on the string-array field? And if my gut feeling is totally wrong and indexes work well on array fields, why is that?
A Memo or Container field cannot be part of an index in AX.
Furthermore, columns consisting of the ntext, text, or image data types cannot be specified as columns for an index in SQL Server.
Let's say you have an extended data type ArrElement with 3 additional array elements: ArrElement2, ArrElement3, and ArrElement4. Creating an index with a field of the ArrElement type in AX will effectively create an index with 4 fields (ArrElement, ArrElement2, ArrElement3, and ArrElement4, in that order) in SQL Server. You cannot change the order of the array elements in the index, but in my opinion there's really nothing wrong with having such an index if it really serves your purpose. Hope that answers your question.
As #10p noted, adding, say, Dimension as the only field will create an index over all the array elements: Dimension, Dimension2_, Dimension3_ (which are the names of the SQL table fields).
The value of such an index depends on the queries performed. If only Dimension[3] is queried, the index is of no value, because Dimension[1] and Dimension[2] are not known.
This could be solved by creating an index for each of the array elements, for example:
Dim1Idx: Dimension[1] (maybe append more fields)
Dim2Idx: Dimension[2] (maybe append more fields)
Dim3Idx: Dimension[3] (maybe append more fields)
Individual array elements can be selected by using the combo-box on the index field.
The value of such indexes should be weighted against the added cost of insertion (and update, if the array values are changed).
