Elasticsearch - Unique values in a field of an index

I have an index of the following type:
{
  company: {
    watchlist: [ {id: 1}, {id: 2}, {id: 1} ]
  }
}
Duplicate values are stored in the watchlist array of the indexed documents. I want the index not to store duplicate values, as this is increasing its size.
I know that I can get unique values by running an aggregation, but what I want to do here is to store unique values in the index.
I am using elasticsearch-rails here, which indexes data according to the JSON returned from the 'as_indexed_json' method. The data for the above index lives in a SQL database, which I cannot change. I can only create indexes from that database, so I need some 'uniqueness' constraint on the 'watchlist' field.
Is there a way to do it?
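
(Elasticsearch itself does not enforce uniqueness inside an array field, so the usual approach is to deduplicate before indexing; with elasticsearch-rails that would typically happen inside as_indexed_json, e.g. with Ruby's Array#uniq. A minimal sketch of that dedupe-by-id transformation, written in JavaScript purely for illustration:)

// Deduplicate watchlist entries by id before handing them to the indexer.
// Field names are taken from the question; the Map keeps one entry per id
// (a later duplicate overwrites an earlier identical one).
const watchlist = [{ id: 1 }, { id: 2 }, { id: 1 }];
const unique = [...new Map(watchlist.map(w => [w.id, w])).values()];
console.log(unique); // [ { id: 1 }, { id: 2 } ]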

Related

ElasticSearch - backward pagination with search_after when sorting value is null

I have an application which has a dashboard, basically a table with hundreds of thousands of records.
This table has up to 50 different columns. These columns have different types in mapping: keyword, text, boolean, integer.
As records in the table might have the same values, I sort by an array of 2 attributes:
The first attribute is whatever the client wants to sort by. It can be a simple sort object or a sort query with a nested filter.
The second attribute is a default sort by id, needed to break ties between documents that have identical values in the column the customer sorts by.
I have checked multiple topics/issues on GitHub and on the Elastic forum to understand how to implement the search_after mechanism for paginating backwards, but it doesn't work for all the cases I need.
Please have a look at the image:
Imagine there is a limit = 3, the customer is currently on the 3rd page of the table, and all the data is sorted by name asc, _id asc.
The names in the image are: A, B, C, D, E.
The ids are the numeric parts of the Doc labels.
When the customer wants to go back to the previous page (page #2 in my picture), I pass the following to Elasticsearch:
sort: [
  { name: 'desc' },
  { _id: 'desc' }
],
search_after: [null, Doc7._id]
As a result, I get only one document, which is the Doc6: null on my image. That seems logical: I am asking Elasticsearch to search descending after [null, 7], and only one document matches, Doc6, but it's not what I need.
I can't work out a solution that gets the data I need.
Could anyone help, please?
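
(One workaround worth trying, not taken from this thread and therefore an assumption: Elasticsearch's sort options include a documented `missing` parameter that substitutes a concrete value for missing fields, so the search_after cursor never has to carry a null. A hypothetical request body, with '~' as an assumed sentinel that sorts after all real names:)

// Hypothetical sketch: replace missing `name` values with a sentinel via
// the documented `missing` sort option. Field names are from the question;
// the sentinel choice and its interaction with search_after are assumptions.
const body = {
  sort: [
    { name: { order: 'desc', missing: '~' } },
    { _id: { order: 'desc' } }
  ],
  // previous page's sort values, with the sentinel standing in for null
  search_after: ['~', 'Doc7_id']   // 'Doc7_id' is a placeholder id
};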

Adding a New field too slow

I want to add a new field to my index, which contains more than 20m documents. I have a dictionary with the template [catalog_id: {keyword: sold count}]:
Sold Counts = {1234: {Apple: 50}, 3242: {Banana: 20}, 3423: {Apple: 23}, ...}
In the index there are many documents that share the same catalog_id. Based on each document's catalog_id, I want to add a new field:
_id: 12323423423, catalog_id: 1234, name: '....', Sold Count: [Apple, 50]  <- the new field
What is the best way to insert a new field in this situation?
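
(One approach, sketched rather than benchmarked: issue one _update_by_query per catalog_id with a script that sets the new field, so every document sharing that catalog_id is updated in a single request. The index name 'products' and the client setup below are assumptions, not from the question:)

// Sketch using the official @elastic/elasticsearch client (v8-style API).
const { Client } = require('@elastic/elasticsearch');
const client = new Client({ node: 'http://localhost:9200' });

const soldCounts = { 1234: { Apple: 50 }, 3242: { Banana: 20 }, 3423: { Apple: 23 } };

async function addSoldCounts() {
  for (const [catalogId, counts] of Object.entries(soldCounts)) {
    // One update-by-query per catalog_id; all documents sharing that id
    // receive the same sold_count value via the script.
    await client.updateByQuery({
      index: 'products',                       // assumed index name
      query: { term: { catalog_id: Number(catalogId) } },
      script: {
        source: 'ctx._source.sold_count = params.counts',
        params: { counts }
      },
      conflicts: 'proceed'                     // tolerate concurrent updates
    });
  }
}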

MongoDB index: object keys vs array of strings

I'm new to MongoDB and have been researching schema designs and indexing. I know you can index a property regardless of its value (ID, array, subdocument, etc...) but what I don't know is whether there is a performance benefit to indexing an array of strings versus a nested object's keys.
Here's an example of both scenarios that I'm contemplating (in Mongoose):
// schema
mongoose.Schema({
  visibility: {
    usa: Boolean,
    europe: Boolean,
    other: Boolean
  }
});
// query
Model.find({"visibility.usa": true});
OR
// schema
mongoose.Schema({
  visibility: [String] // strings could be "usa", "europe", and/or "other"
});
// query
Model.find({visibility: "usa"});
Documents could have one, two, or all three visibility options.
Furthermore, if I went with the Boolean object design, could I simply index the visibility field, or would I need to put an index on usa, europe, and other?
In MongoDB, creating an index on an array of strings results in a multikey index: each string in the array becomes an index key pointing at the same document. So in your case it would work the same way as nested object keys.
If you go with the boolean design, note that your queries use dotted paths ("visibility.usa"), so you would need an index on each dotted field (or a compound of them); an index on the parent visibility field alone only supports equality matches against the entire embedded document. You can read further on MongoDB multikey indexing.
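
(A quick mongo shell sketch of the two options; the collection name 'models' is assumed from the Mongoose example above:)

// Array-of-strings design: one multikey index covers every value.
db.models.createIndex({ visibility: 1 });
db.models.find({ visibility: "usa" });          // uses the multikey index

// Boolean-object design: each dotted path needs its own index.
db.models.createIndex({ "visibility.usa": 1 });
db.models.createIndex({ "visibility.europe": 1 });
db.models.createIndex({ "visibility.other": 1 });
db.models.find({ "visibility.usa": true });     // uses the visibility.usa index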

For 1 billion documents, populate data from one field to other fields in the same collection using MongoDB

I need to populate data from one field into multiple fields in the same collection. For example:
Currently I have a document like this:
{ _id: 1, temp_data: {temp1: [1,2,3], temp2: "foo bar"} }
I want to split it into two different fields in the same collection, like this:
{ _id: 1, temp1: [1,2,3], temp2: "foo bar" }
I have one billion documents to migrate. Can you suggest an efficient way to update all of them?
In your favorite language, write a tool that runs through all documents, migrates them, and stores them in a new database.
Some hints:
When iterating the results, make sure they are sorted (e.g. on _id) so you can implement resume should your migration code crash at 90%...
Do batch inserts: read, say, 1000 items, migrate them, then write those 1000 items in a single batch to the new database (reads are automatically batched); see the sketch below.
Create indexes after the migration, not before. That will be faster and lead to less fragmentation.
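
(A minimal mongo shell sketch of those hints, using ranged queries on _id instead of large offsets; collection names are from the answer below, and it assumes collection_new starts empty:)

// Resumable, batched migration sketch. To resume after a crash,
// initialize lastId from the largest _id already copied.
var lastId = null;
while (true) {
  var query = (lastId === null) ? {} : { _id: { $gt: lastId } };
  var batch = db.collection.find(query).sort({ _id: 1 }).limit(1000).toArray();
  if (batch.length === 0) break;

  // Reshape each document, then write the whole batch in one call.
  db.collection_new.insertMany(batch.map(function (doc) {
    return { _id: doc._id, temp1: doc.temp_data.temp1, temp2: doc.temp_data.temp2 };
  }));

  lastId = batch[batch.length - 1]._id;   // resume point
}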
Here is a query I made for you; use the following to migrate your data:
db.collection.find().forEach(function(myDoc) {
  db.collection_new.update(
    { _id: myDoc._id },
    {
      $unset: { temp_data: 1 },
      $set: {
        temp1: myDoc.temp_data.temp1,
        temp2: myDoc.temp_data.temp2
      }
    },
    { upsert: true }
  );
});
To learn more about the cursor forEach method, please see the MongoDB documentation (link). You will need the $limit and $skip operators to migrate the data in batches. In the update query I used upsert because if the document already exists it will be updated; otherwise a new entry will be inserted.
Thanks

Indexes for MongoDB

I have a MongoDB collection for restaurants, e.g.:
{
  _id: uniquemongoid,
  rank: 3,
  city: 'Berlin'
}
Restaurants are listed by city and ordered by rank (an integer). Should I create separate indexes on city and rank, or a compound city/rank index? (I query by city and sort by rank.)
Furthermore, there are several boolean fields, e.g. { hasParking: true, familyFriendly: true }. Should I create indexes to speed up queries with these filters? Compound indexes? It's not clear to me whether I should create compound indexes, since a query can have only one boolean set, or several.
The best way to figure out whether you need indexes is to benchmark it with "explain()".
As for your suggested indexes:
You will need the city/rank compound index. Indexes in MongoDB can (at the moment) only be used left-to-right, and hence doing an equality search on "city" and then sorting the result by "rank" means that the { city: 1, rank: -1 } index would work best.
Indexes on boolean fields are often not very useful, as on average MongoDB will still need to access half of your documents. After doing a selection by city (and hopefully a limit!) doing an extra filter for hasParking etc will not make MongoDB use both the city/rank and the hasParking index. MongoDB can only use one index per query.
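
(In shell form, with names taken from the question, that index and a quick explain() check would look like this:)

// Compound index: equality on city first, then the sort key rank.
db.restaurants.createIndex({ city: 1, rank: -1 });

// Verify the winning plan uses the index instead of an in-memory sort.
db.restaurants.find({ city: 'Berlin' }).sort({ rank: -1 }).explain();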
1) Create the index { city: 1, rank: 1 }, which will serve your purpose.
You avoid having 2 indexes.
2) Create your documents in the following format, and you can query on any number of fields you want:
{
  info: [ {hasParking: true}, {familyFriendly: true} ],
  _id: ...,
  rank: ...,
  city: ...
}
db.restaurants.ensureIndex({ info: 1 });
db.restaurants.find({ info: { hasParking: true } });
Note that MongoDB won't use two indexes for the same query (except for $or queries). So in case (2), if you want to add an additional filter on top of query (1), option (2) won't work. I am not sure of your exact requirement for (2), so I am posting this solution anyway.
