In Elasticsearch, is there a way to combine multiple completion fields under the same context?

I tried to create a suggest index that filters by context (based on the examples at https://www.elastic.co/guide/en/elasticsearch/reference/6.8/suggester-context.html).
I want to filter by the same context for every document, but I don't want to store a duplicate of the context for each field. Is there a better way to do that?
For example, I have the document:
{
  "Name": "Lupidon",
  "Father": "Zeus",
  "context": {
    "favorite_food": [
      "pizza"
    ]
  }
}
where Name and Father are completion fields, so that I could search for a prefix of Name or Father among the documents that contain a given favorite food.
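Below is a minimal sketch of one way to avoid repeating the context once per completion field. It assumes Elasticsearch 6.8 category contexts, where a context mapping can set `path` so that its values are read from an ordinary field of the document. The code assumes the food values sit in a top-level keyword field `favorite_food`, a hypothetical layout (the question nests them under `context`). The `_doc` type name and the helper names are also assumptions for illustration. The code only builds request bodies as Python dicts, and the suggest body runs one completion suggester per field in a single request.

```python
# Hypothetical helpers: build ES 6.8 request bodies as plain dicts.

def completion_mapping(fields, context_name, path):
    """Every completion field reads its category context from the same
    source field, via the context mapping's "path" option."""
    context = {"name": context_name, "type": "category", "path": path}
    properties = {path: {"type": "keyword"}}
    for field in fields:
        properties[field] = {"type": "completion", "contexts": [context]}
    # ES 6.x mappings still need a (single) type name; "_doc" is assumed.
    return {"mappings": {"_doc": {"properties": properties}}}


def suggest_body(prefix, fields, context_name, values):
    """One completion suggester per field, all filtered by the same context."""
    suggest = {}
    for field in fields:
        suggest[f"{field}_suggest"] = {
            "prefix": prefix,
            "completion": {
                "field": field,
                "contexts": {context_name: list(values)},
            },
        }
    return {"suggest": suggest}


mapping = completion_mapping(["Name", "Father"], "favorite_food", "favorite_food")
body = suggest_body("Lu", ["Name", "Father"], "favorite_food", ["pizza"])
```

With this layout, a document is indexed as `{"Name": "Lupidon", "Father": "Zeus", "favorite_food": ["pizza"]}`. The food values are stored once and both completion fields pick them up through `path`.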

Related

Type of field for prefix search in Elasticsearch

I'm confused about which field type I should use for prefix search. Many answers suggest search_as_you_type, but I don't think autocomplete is what I'm going for.
I have a UUID field:
id: 34y72ca1-3739-41ff-bbec-f6d17479384c
The following terms should return the doc above:
3
34
34y72ca1
34y72ca1-3739
34y72ca1-3739-41ff-bbec-f6d17479384c
Searching for 3739 should not return it, because the ID doesn't start with 3739. Initially partial matching like that is what I was going for, but the wildcard field type is not supported by Amazon AWS, so I settled for prefix search instead of partial search.
I tried a search_as_you_type field, but it doesn't return the result when I search for the whole ID. My use case is that results are shown when the user presses Enter, not live as they type. Some loss of speed is OK; I just hope for something that scales well to many rows of data.
Thanks
If you have not explicitly defined an index mapping, you need to query the id.keyword field instead of the id field for the prefix query to return the expected results. The keyword sub-field indexes the whole UUID as a single term. The id text field goes through the standard analyzer, which splits the UUID into separate terms at each hyphen.
{
  "query": {
    "prefix": {
      "id.keyword": {
        "value": "34y72ca1"
      }
    }
  }
}
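The difference between the two fields can be illustrated with a rough Python approximation. This is not Elasticsearch's real analyzer; the regex split only mimics how the standard analyzer breaks this UUID at hyphens. A prefix query is not analyzed and matches when any indexed term starts with the given prefix.

```python
import re

UUID = "34y72ca1-3739-41ff-bbec-f6d17479384c"


def approx_standard_terms(value):
    # Rough stand-in for the standard analyzer: lowercase, then split on
    # anything that is not a letter or digit (hyphens included).
    return [t for t in re.split(r"[^0-9a-z]+", value.lower()) if t]


def keyword_terms(value):
    # A keyword field indexes the entire value as a single term.
    return [value]


def prefix_query_matches(terms, prefix):
    # A prefix query matches if any indexed term starts with the prefix.
    return any(term.startswith(prefix) for term in terms)


for prefix in ["3", "34y72ca1-3739", UUID, "3739"]:
    text_hit = prefix_query_matches(approx_standard_terms(UUID), prefix)
    keyword_hit = prefix_query_matches(keyword_terms(UUID), prefix)
    print(f"{prefix!r}: text={text_hit} keyword={keyword_hit}")
```

On the text field, `3739` matches (it is a term of its own) but the full ID does not. On the keyword field it is the other way round, which is the behavior the question asks for.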
Otherwise, you can modify your index mapping by adding a multi-field (such as a keyword sub-field) to the id field.
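Here is a minimal sketch of such a mapping plus the matching prefix query, built as Python dicts. It assumes Elasticsearch 7.x-style typeless mappings (on 6.x the properties sit under a type name). The sub-field is named `keyword` to mirror what dynamic mapping creates. The helper names are hypothetical.

```python
def id_mapping():
    # "id" stays a text field; "id.keyword" indexes the whole UUID as one term.
    return {
        "mappings": {
            "properties": {
                "id": {
                    "type": "text",
                    "fields": {"keyword": {"type": "keyword"}},
                }
            }
        }
    }


def id_prefix_query(prefix):
    # Prefix queries are not analyzed: the prefix is compared as-is
    # (case-sensitively) against the single keyword term.
    return {"query": {"prefix": {"id.keyword": {"value": prefix}}}}
```

If the ID never needs full-text matching, mapping `id` directly as `keyword` is a simpler alternative.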

Elasticsearch query based on properties of another document

Is there a way in ES to do a single query that finds documents whose values are "close" (by logic I define) to the values in another document?
For example, I have a document like this:
{
  "myId": 10,
  "price": 200
}
Now I want to run a query that finds documents whose price is within 100 either side of the price of the above document. I don't know that document's price on the client; all I have is its myId.
In other words, I want to write a client method like this:
GetSimilarDocuments(int myId);
Is that possible to do in a single ES query? Or do I need two round trips? (get the document, then do another query based on the values of the document)
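This minimal sketch covers the two-round-trip approach the question describes; it makes no claim about a single-query solution. Step one fetches the document by myId, and step two builds a range query around its price. `fetch_doc` is a hypothetical stand-in for whatever client call retrieves the stored document. Excluding the reference document with `must_not` is an assumption about the desired behavior.

```python
# Hypothetical in-memory stand-in for the index, keyed by myId.
DOCS = {10: {"myId": 10, "price": 200}}


def fetch_doc(doc_id):
    # Round trip 1: stand-in for a real "get document" call.
    return DOCS[doc_id]


def similar_documents_query(my_id, fetch, window=100):
    """Build the round-trip-2 query body (not executed here)."""
    price = fetch(my_id)["price"]
    return {
        "query": {
            "bool": {
                "filter": [
                    {"range": {"price": {"gte": price - window, "lte": price + window}}}
                ],
                # Leave the reference document itself out of the results.
                "must_not": [{"term": {"myId": my_id}}],
            }
        }
    }


body = similar_documents_query(10, fetch_doc)
```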

Elasticsearch 6.0 Removal of mapping types - Alternatives

Background
I'm migrating my ES index to ES version 6. I'm currently stuck because ES6 removed support for multiple mapping types ("_type") per index.
Old Implementation (ES2)
My software has many users (>100K). Each user has at least one document in ES. So, the hierarchy looks like this:
INDEX -> TYPE -> Document
myindex -> user-123 -> document-1
The key point is that with this structure I can easily remove all the documents of a specific user:
DELETE /myindex/user-123
(deletes all the documents of a specific user with a single command)
The problem
ES6 no longer supports multiple "_type" values in one index.
Possible solution
Instead of using _type, use the user ID as the index name. My indices would then look like:
"user-123" -> "static-name" -> document
Deleting a user is then done by deleting its index (instead of deleting the type, as in the previous implementation).
Questions:
My first worry is the number of indices and performance: is having around 1M indices acceptable in terms of performance? Keep in mind that I have to search them frequently.
Most of my users have only a small number of documents stored in ES. Does it make sense to hold a shard, which should be expensive, for fewer than 10 documents?
Does my data architecture sound reasonable to you?
Any other tips are welcome!
Thanks.
I would not have one index per user; it's a waste of resources, especially if there are only 10 docs per user.
What I would do instead is to use filtered aliases, one per user.
So the index would be named users and the type would be a static name, e.g. doc. All documents of user 123 would then be stored under users/doc/ (e.g. users/doc/xyz), and each document needs to contain the user ID, e.g.
PUT users/doc/xyz
{
  ...
  "userId": 123,
  ...
}
Then you can define a filtered alias for all documents of user 123, like this:
POST /_aliases
{
  "actions": [
    {
      "add": {
        "index": "users",
        "alias": "user-123",
        "filter": { "term": { "userId": "123" } }
      }
    }
  ]
}
If you need to delete all documents of user 123, then you can simply do it like this:
POST user-123/_delete_by_query?q=*
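Since this approach needs one alias per user, the alias actions and delete requests can be generated rather than written by hand. This minimal Python sketch only builds the request paths and bodies, and the helper names are hypothetical. The `users` index and `userId` field follow the answer above. The `match_all` body is the request-body equivalent of `q=*`.

```python
def add_user_alias_action(user_id, index="users"):
    # One filtered alias per user, e.g. "user-123" -> docs with userId 123.
    return {
        "add": {
            "index": index,
            "alias": f"user-{user_id}",
            "filter": {"term": {"userId": str(user_id)}},
        }
    }


def aliases_body(user_ids):
    # Several "add" actions can be sent together in one POST /_aliases call.
    return {"actions": [add_user_alias_action(uid) for uid in user_ids]}


def delete_user_docs_request(user_id):
    # POST target and body; the alias filter should limit the delete
    # to that user's documents.
    return f"user-{user_id}/_delete_by_query", {"query": {"match_all": {}}}
```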
Having this many indices is definitely not a good approach. If your only concern is deleting multiple documents with a single command, you can use the Delete by Query API provided by Elasticsearch.
You can add a "subtype" attribute to all your documents, holding a value like "user-123" for each document. In your case, a document would look like this:
{
  "attribute1": "value",
  "subtype": "user-123"
}
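With a subtype attribute like this, deleting one user's documents becomes a single delete-by-query request. The minimal sketch below builds that request and assumes `subtype` is mapped as a keyword field. Under default dynamic mapping you would target `subtype.keyword` instead, because the analyzed text field splits "user-123" into separate terms. The index name `myindex` and the helper are hypothetical.

```python
def delete_user_request(user_id, index="myindex", field="subtype"):
    # Exact-match term query on the keyword subtype field.
    body = {"query": {"term": {field: f"user-{user_id}"}}}
    return f"{index}/_delete_by_query", body
```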

Group by field in found document

The best way to explain what I want to accomplish is by example.
Let's say I have an object with the fields name, color, and transaction_id. I want to search for documents where name and color match specified values, which I can easily do with bool queries.
But I don't want only the documents found by the search query. I also want the transaction they belong to, identified by transaction_id. For example, if a document with transaction_id equal to 123 is found, I want my query to return all documents with transaction_id equal to 123.
Of course, I can do that with two queries: the first fetches all documents that match the criteria, and the second returns all documents that have one of the transaction_id values found by the first query.
But is there any way to do it in a single query?
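The two-query approach from the question (an application-side join) can be sketched as follows. The hits are hard-coded stand-ins for the first query's response. The `term` clauses assume name and color are keyword fields, and the helper names are hypothetical.

```python
def first_query(name, color):
    # Round trip 1: documents matching both name and color
    # (term clauses assume keyword-mapped fields).
    return {"query": {"bool": {"filter": [
        {"term": {"name": name}},
        {"term": {"color": color}},
    ]}}}


def second_query(hits):
    # Round trip 2: every document belonging to the transactions found above.
    ids = sorted({hit["_source"]["transaction_id"] for hit in hits})
    return {"query": {"terms": {"transaction_id": ids}}}


# Stand-in for the "hits.hits" array returned by round trip 1.
hits = [
    {"_source": {"name": "a", "color": "red", "transaction_id": 123}},
    {"_source": {"name": "a", "color": "red", "transaction_id": 456}},
    {"_source": {"name": "a", "color": "red", "transaction_id": 123}},
]
body = second_query(hits)
```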
You can use a parent-child relationship between the transaction and your object, or denormalize your data by nesting the objects inside the transactions. Otherwise you'll have to do an application-side join, meaning two queries.
Try an index mapping similar to the following, and specify each object's parent transaction ID when indexing it (the parent request parameter).
{
  "mappings": {
    "transaction": {},
    "object": {
      "_parent": {
        "type": "transaction"
      }
    }
  }
}
Further reading:
https://www.elastic.co/guide/en/elasticsearch/guide/current/parent-child-mapping.html
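With that parent/child mapping, one possible single-query formulation is a has_child query nested inside has_parent. It asks for objects whose parent transaction has at least one child object matching the criteria. Treat this as a sketch to verify against your version, not a tested recipe. The `_parent` mapping above is the pre-6.0 mechanism, which 6.x replaced with the `join` field. The match clauses on name and color are assumptions, and the helper name is hypothetical.

```python
def objects_in_matching_transactions(name, color):
    # Criteria a child object must meet.
    child_criteria = {"bool": {"must": [
        {"match": {"name": name}},
        {"match": {"color": color}},
    ]}}
    return {
        "query": {
            # Outer query: return every object whose parent transaction...
            "has_parent": {
                "parent_type": "transaction",
                # ...has at least one child object matching the criteria.
                "query": {"has_child": {"type": "object", "query": child_criteria}},
            }
        }
    }
```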

One-to-many relationships in Elasticsearch

Suppose I have 2 tables called "twitter_users" and "twitter_comments".
twitter_users has the fields: username and bio
twitter_comments has the fields: username and comment
Obviously, a user has one entry in twitter_users and potentially many in twitter_comments.
I want to model both twitter_users and twitter_comments in Elasticsearch and have ES search both models when I query, with each comment counting towards the overall relevancy score of a Twitter user.
I know I can mimic this with just 1 model, by creating a single extra field (in addition to username and bio) with all the comments concatenated. But is there another "cleaner" way?
It depends.
If you just want to be able to search a user's comments, full-text and across all fields, simply store all comments within the user object (no need to concatenate anything):
{
  "user": {
    "username": "TestUser",
    "bio": "whatever",
    "comments": [
      {
        "title": "First comment",
        "text": "My 1st comment"
      },
      {
        "title": "Second comment",
        "text": "My 2nd comment"
      }
    ]
  }
}
If you need per-comment queries, you need to map the comments as nested (before indexing any data), so that each comment is treated as a separate item.
For your scoring, simply add another field "comment_count" and use this for your boost/scoring.
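Here is a minimal sketch of both suggestions, built as Python dicts. The mapping makes `comments` a nested field and adds a numeric `comment_count`. The query searches the bio and the nested comments, then scales the score by comment_count through function_score with field_value_factor. Several things are assumptions: the document body is the inner user object from the example, mappings are 7.x-style typeless, and the `log2p` modifier and `missing` value are choices of this sketch.

```python
def user_mapping():
    # Typeless (7.x-style) mapping; field names follow the example document.
    return {
        "mappings": {
            "properties": {
                "username": {"type": "keyword"},
                "bio": {"type": "text"},
                "comment_count": {"type": "integer"},
                # "nested" indexes each comment as its own hidden sub-document.
                "comments": {
                    "type": "nested",
                    "properties": {
                        "title": {"type": "text"},
                        "text": {"type": "text"},
                    },
                },
            }
        }
    }


def user_search(text):
    matches = {
        "bool": {
            "should": [
                {"match": {"bio": text}},
                {
                    "nested": {
                        "path": "comments",
                        "query": {
                            "multi_match": {
                                "query": text,
                                "fields": ["comments.title", "comments.text"],
                            }
                        },
                        # Sum the scores of all matching comments.
                        "score_mode": "sum",
                    }
                },
            ]
        }
    }
    return {
        "query": {
            "function_score": {
                "query": matches,
                # Multiplies the score by log10(2 + comment_count), so users
                # with no comments are not scored 0.
                "field_value_factor": {
                    "field": "comment_count",
                    "modifier": "log2p",
                    "missing": 0,
                },
            }
        }
    }
```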
As Thorsten already suggested, you can use a nested query, and it's a good approach.
Alternatively, you can index comments as children of users. Then you can search users as you do now, and search comments with a top_children query to find all comments relevant to your search. Finally, combine the scores of both using bool or dis_max queries.
The nested approach would be more efficient at search time, but you will have to reindex the user and all of their comments every time a comment is added. With the parent/child approach you only need to index the new comments, but searches will be slower and require more memory.
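The top_children query was deprecated and then removed in Elasticsearch 2.0. On newer versions, a has_child query with `score_mode` is the usual substitute. Below is a minimal sketch of the combined parent/child query as a Python dict, using dis_max as the answer suggests. It assumes comments are indexed as children of type `comment` with a `comment` text field (pre-6.x `_parent`; 6.x+ would use a join field). The `tie_breaker` value and the helper name are hypothetical.

```python
def users_with_relevant_comments(text):
    # Match the user's own fields.
    user_part = {"multi_match": {"query": text, "fields": ["username", "bio"]}}
    # Match child comments and roll their scores up into the parent user.
    comment_part = {
        "has_child": {
            "type": "comment",
            "query": {"match": {"comment": text}},
            "score_mode": "sum",
        }
    }
    # dis_max keeps the best clause score; tie_breaker adds a share of the rest.
    return {
        "query": {
            "dis_max": {
                "queries": [user_part, comment_part],
                "tie_breaker": 0.3,
            }
        }
    }
```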
