Elasticsearch Autocomplete on a Specific Field in a Specific Document

I have Documents that contain many fields which are lists of values.
I would like to be able to autocomplete from one specific such field at a time, in one specific document, without data duplication (like Completion Suggesters).
For example, I would like to be able to autocomplete after 3 characters from the values in the category field of the document with id: '7'.
I tried to implement something based on this, but it doesn't seem to work on a list of values.

To filter the suggestions by a field, you can add the fields to filter on in the context.
"category":{
type: "completion",
payloads: false,
context: {
id: {
type: "category",
path: "id"
}
}
}
You can index the document as:
POST /myindex/myitem/1
{
  "id": 123,
  "category": {
    "input": "my category",
    "context": {
      "id": 123
    }
  }
}
The minimum length check has to be applied on the client side. ES suggesters do not provide anything like that.
Now you can suggest on the category field with a filter on the id field.
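For reference, a minimal sketch of the corresponding suggest request, using the same pre-5.0 context-suggester syntax as the mapping above (index, field, and id values are taken from the example; the suggestion name category_suggest is arbitrary):
POST /myindex/_suggest
{
  "category_suggest": {
    "text": "my c",
    "completion": {
      "field": "category",
      "context": {
        "id": 123
      }
    }
  }
}
As noted above, the client should only send this request once the user has typed at least 3 characters, since the minimum-length check has to happen outside Elasticsearch.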

Related

Atlas Search Index partial match

I have a test collection with these two documents:
{ _id: ObjectId("636ce11889a00c51cac27779"), sku: 'kw-lids-0009' }
{ _id: ObjectId("636ce14b89a00c51cac2777a"), sku: 'kw-fs66-gre' }
I've created a search index with this definition:
{
  "analyzer": "lucene.standard",
  "searchAnalyzer": "lucene.standard",
  "mappings": {
    "dynamic": false,
    "fields": {
      "sku": {
        "type": "string"
      }
    }
  }
}
If I run this aggregation:
[{
  $search: {
    index: 'test',
    text: {
      query: 'kw-fs',
      path: 'sku'
    }
  }
}]
Why do I get 2 results? I only expected the one with sku: 'kw-fs66-gre' 😬
During indexing, the standard analyzer breaks the string "kw-lids-0009" into 3 tokens [kw][lids][0009], and similarly tokenizes "kw-fs66-gre" as [kw][fs66][gre]. When you query for "kw-fs", the same analyzer tokenizes the query as [kw][fs], and so Lucene matches on both documents, as both have the [kw] token in the index.
To get the behavior you're looking for, you should index the sku field as type autocomplete and use the autocomplete operator in your $search stage instead of text.
You're still getting 2 results because of the tokenization, i.e., you're still matching on [kw] in two documents. If you search for "fs66", you'll get a single match only. Results are scored based on relevance; they are not filtered. You can add {$project: {score: { $meta: "searchScore" }}} to your pipeline and see the difference in score between the matching documents.
If you are looking to get exact matches only, you can look at using the keyword analyzer or a custom analyzer that strips the dashes, so you deal with a single token per field and not 3.
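As a rough sketch (not taken from the thread) of what the autocomplete variant could look like, with the index name 'test' and the sku field from the question; the edgeGram tokenization and gram sizes are illustrative assumptions:
{
  "mappings": {
    "dynamic": false,
    "fields": {
      "sku": {
        "type": "autocomplete",
        "tokenization": "edgeGram",
        "minGrams": 2,
        "maxGrams": 15
      }
    }
  }
}
[{
  $search: {
    index: 'test',
    autocomplete: {
      query: 'kw-fs',
      path: 'sku'
    }
  }
}, {
  $project: {
    sku: 1,
    score: { $meta: "searchScore" }
  }
}]
The $project stage surfaces the relevance score mentioned above, which makes it easier to see why one document ranks above another.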

Elasticsearch multiple suggestions with more advanced cases like matching prefix in the middle of a sentence

My use case: I have a search bar where the user can type a query. In addition to a regular query suggestion, I want to show multiple types of search suggestions, for example company sector, company, and school suggestions.
This is currently implemented using completion suggesters and the following suggest request (this is code from our Ruby implementation, but I believe you should be able to understand it easily):
{
  _source: '',
  suggest: {
    text: query_from_the_user, # user query like "sec" to find "security"-related matches
    'school_names': {
      completion: {
        field: 'school_names_suggest',
      },
    },
    'companies': {
      completion: {
        field: 'company_name.suggest',
      },
    },
    'sectors': {
      completion: {
        field: sector_field_based_on_current_language(I18n.locale),
        # uses 'company_sector.french.suggest' when the user browses in French
      },
    },
  },
}
Here are my mappings (again written in Ruby, but it shouldn't be too hard to mentally convert this to the Elasticsearch JSON config):
indexes :company_name, type: 'text' do
  indexes :suggest, type: 'completion'
end
indexes :company_sector, type: 'object' do
  indexes :french, type: 'text' do
    indexes :suggest, type: 'completion'
  end
  indexes :english, type: 'text' do
    indexes :suggest, type: 'completion'
  end
end
indexes :school_names_suggest, type: 'completion'

# sample indexed JSON
{
  company_name: "Christian Dior Couture",
  company_sector: {
    english: 'Milk sector',
    french: 'Secteur laitier'
  },
  school_names_suggest: ['Télécom ParisTech', 'Ecole Centrale Paris']
}
The problem is that completion suggesters are not powerful enough: they cannot autocomplete from a prefix in the middle of a sentence, and they stop providing additional results after an exact match. Here are some scenarios that I need to capture with my ES implementation:
CASE 1 - Matching by prefix in the middle of a sentence
# documents
[{ company_name: "Christian Dior Couture" }]
# => A search term "Dior" should return this document because it matches by prefix on the second word
CASE 2 - Provide results even after a perfect match
# documents
[
{ company_name: "Crédit Agricole" },
{ company_name: "Crédit Agricole Pyrénées Gascogne" },
]
# => A search term "Crédit Agricole" should return both documents (using the current implementation it only returns "Crédit Agricole"
Can I implement this using suggesters in Elasticsearch? Or do I need to fall back to multiple searches that take advantage of the new search-as-you-type data type with a query, as mentioned in the docs?
I am using Elasticsearch 7.1 on AWS and the Ruby driver (gem elasticsearch-7.3.0).
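No answer is recorded here, but since the question already mentions the search-as-you-type data type (available in 7.x), here is a minimal sketch of that route; the index name companies is made up, and a bool_prefix multi_match should cover both cases above because it matches a prefix on any term in the name and ranks all matching documents instead of stopping at an exact match:
PUT /companies
{
  "mappings": {
    "properties": {
      "company_name": { "type": "search_as_you_type" }
    }
  }
}

GET /companies/_search
{
  "query": {
    "multi_match": {
      "query": "Dior",
      "type": "bool_prefix",
      "fields": [
        "company_name",
        "company_name._2gram",
        "company_name._3gram"
      ]
    }
  }
}
The _2gram and _3gram subfields are created automatically by the search_as_you_type mapping.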

Having values as keys VS having them as a nested object array in ElasticSearch

Currently, I have an Elasticsearch index with a field that has subfields, say A, B, C, as below:
"myfield":{
"A":{
"name":"A",
"prop1":{
"sub-prop1":1,
"sub-prop2":2
},
"prop2":{}
},
"B":{
"name":"B",
"prop1":{
"sub-prop1":3,
"sub-prop2":8,
"sub-prop3":4,
"sub-prop4":7,
},
"prop2":{}
},
"C":{}
}
As can be seen, A and B have the same structure, but the sub-props under prop1 can be dynamic, meaning the mapping might change depending on the documents added. That by itself is not an issue, since A and B exist as separate keys. However, it causes another problem: as new documents keep being added, dynamic mapping keeps adding sub-fields like A, B, C, D and so on, which can eventually exceed index.mapping.total_fields.limit. To avoid that, I am planning to make "myfield" an array of objects in the mapping, so that A, B, C... are stored as array elements instead of being added to the mapping as new fields.
The question is: is this a feasible solution, and how would I then search for, say, "myfield.A.prop1.sub-prop1" >= 3?
The new mapping looks something like:
"myfield":[
{
"name":"A",
"prop1":{
"sub-prop1":1,
"sub-prop2":2
},
"prop2":{}
},
{
"name":"B",
"prop1":{
"sub-prop1":3,
"sub-prop2":8,
"sub-prop3":4,
"sub-prop4":7,
},
"prop2":{}
},
{}
]
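There is no answer recorded here, but for the search part a common pattern is to map myfield as nested, so that name and the prop1 values stay paired inside each array element. A rough sketch, assuming an index called myindex and the .keyword subfield that default dynamic mapping would create for name:
PUT /myindex
{
  "mappings": {
    "properties": {
      "myfield": { "type": "nested" }
    }
  }
}

GET /myindex/_search
{
  "query": {
    "nested": {
      "path": "myfield",
      "query": {
        "bool": {
          "must": [
            { "term": { "myfield.name.keyword": "A" } },
            { "range": { "myfield.prop1.sub-prop1": { "gte": 3 } } }
          ]
        }
      }
    }
  }
}
Note that this only removes the A/B/C key level from the mapping; distinct sub-prop names under prop1 are still added dynamically, so the total field count can still grow.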

Elasticsearch & X-Pack: how to get vertices/connections from nested documents

I just started using X-Pack for Elasticsearch and want to connect vertices from a nested document type. However, looking for documentation on this hasn't got me anywhere.
What I have is an index of documents which have person names/ids as nested documents (one document can have many persons, one person can be related to many documents). The desired result is to get a graph data with connections between persons.
Does anyone have a clue or can tell me if this is even possible?
Part of my mappings:
mappings: {
  legend: {
    properties: {
      persons: {
        type: 'nested',
        properties: {
          id: {
            type: 'string',
            index: 'not_analyzed'
          },
          name: {
            type: 'string',
            index: 'not_analyzed'
          }
        }
      }
    }
  }
}
And my Graph API query, which of course doesn't work because I don't know how to handle the "name" field of the nested "persons" field.
POST sagenkarta_v3/_xpack/_graph/_explore
{
  "controls": {
    "use_significance": true,
    "sample_size": 20000,
    "timeout": 2000
  },
  "vertices": [
    {
      "field": "persons.name"
    }
  ],
  "connections": {
    "vertices": [
      {
        "field": "persons.name"
      }
    ]
  }
}
Thanks in advance!
The following question was discussed here:
https://discuss.elastic.co/t/elasticsearch-x-pack-how-to-get-vertices-connections-from-nested-documents/88709
quote from Mark_Harwood - Elastic Team Member:
Unfortunately Graph does not support nested documents, but you can use copy_to in your mappings to put the person data in an indexed field in the containing root document.

I can see that you have the classic problem of "computers-want-IDs-but-people-want-labels" and have both these values. In Graph (and arguably the rest of Kibana too) I suggest you use tokens that combine IDs for uniqueness' sake and names for readability by humans.

The copy_to and IDs-and-labels tips are part of the modelling suggestions in my elasticon talk this year:
https://www.elastic.co/elasticon/conf/2017/sf/getting-your-data-graph-ready
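The quoted advice is not shown as code in the thread, so the following is only a rough sketch of it: keep a root-level (non-nested) field holding combined "id|name" tokens per person, populated either by the indexing application or via copy_to as suggested, and point Graph at that field. The field name person_labels and the sample values are made up, and keyword is the 5.x+ equivalent of the not_analyzed string used in the question's mapping:
# add a flat, root-level field for Graph to use (field name is hypothetical)
PUT sagenkarta_v3/_mapping/legend
{
  "properties": {
    "person_labels": {
      "type": "keyword"
    }
  }
}

# index documents with combined id|name tokens (sample values are made up)
POST sagenkarta_v3/legend/1
{
  "persons": [
    { "id": "p-123", "name": "Anna Andersson" }
  ],
  "person_labels": ["p-123|Anna Andersson"]
}

# explore on the flat field instead of persons.name (endpoint as in the question)
POST sagenkarta_v3/_xpack/_graph/_explore
{
  "controls": {
    "use_significance": true,
    "sample_size": 20000,
    "timeout": 2000
  },
  "vertices": [
    { "field": "person_labels" }
  ],
  "connections": {
    "vertices": [
      { "field": "person_labels" }
    ]
  }
}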

Passing dynamic value to script query in Elastic Search

I have two mappings in my index. One of them stores amounts in different currencies and the other stores the current conversion rates. Records in each look like this:
http://localhost:9200/transactions/amount
[{
  _index: "transactions",
  _type: "amount",
  _id: "AVA3fjawwMA2f8TzMTbM",
  _score: 1,
  _source: {
    balance: 1000,
    currency: "usd"
  }
},
{
  _index: "transactions",
  _type: "amount",
  _id: "AVA3flUWwMA2f8TzMTbN",
  _score: 1,
  _source: {
    balance: 2000,
    currency: "inr"
  }
}]
and
http://localhost:9200/transactions/conversions
{
  _index: "transactions",
  _type: "conversions",
  _id: "rates",
  _score: 1,
  _source: {
    "usd": 1,
    "inr": 62.6
  }
}
I want to query the data from amount, apply the current conversion rates from conversions, and get the result in a single query.
I tried using a scripted query and was able to convert the data based on passed params, like:
GET _search
{
  "query": {
    "match_all": {}
  },
  "script_fields": {
    "test1": {
      "script": "_source.balance * factor",
      "params": {
        "factor": 63.2
      }
    }
  }
}
However, in my case the passed params have to be fetched from the result of another query.
I want to visualize my data in Kibana in a common currency. Kibana supports scripted queries. As far as I know, each visualization in Kibana corresponds to a single Elasticsearch query, so I don't have the option of running multiple queries.
I also tried exploring the possibility of using https://www.elastic.co/blog/terms-filter-lookup and adding some dynamic fields to each document in the result set. However, I don't think the terms filter allows that.
Assuming you're trying to always plot transactions in USD, you could try the approach described in the accepted answer here:
In essence:
Model your data as parent-child, with each conversions document being the parent of all child transactions documents in the same foreign currency (and conversions having a standard field name like "conversion_divisor": 62.6).
Include a has_parent query clause for all relevant currency conversions.
Use a function_score (script_score) query to access the foreign-currency multiple in each parent and generate a _score for each transaction by dividing the transaction amount by the foreign currency's conversion_divisor.
Plot the _score in Kibana.
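The referenced answer is not reproduced here, so the following is only a rough reconstruction of those steps under explicit assumptions: it uses the pre-6.x _parent mapping to match the question's multi-type transactions index, the conversion_divisor field name from the steps above, and the old string-script syntax; newer versions would need the join field type and a has_parent clause with "score": true instead.
PUT /transactions
{
  "mappings": {
    "conversions": {
      "properties": {
        "conversion_divisor": { "type": "double" }
      }
    },
    "amount": {
      "_parent": { "type": "conversions" },
      "properties": {
        "balance": { "type": "double" }
      }
    }
  }
}

# score each transaction as balance / parent's conversion_divisor
GET /transactions/amount/_search
{
  "query": {
    "function_score": {
      "query": {
        "has_parent": {
          "parent_type": "conversions",
          "score_mode": "score",
          "query": {
            "function_score": {
              "query": { "match_all": {} },
              "script_score": { "script": "1.0 / doc['conversion_divisor'].value" },
              "boost_mode": "replace"
            }
          }
        }
      },
      "script_score": { "script": "doc['balance'].value" },
      "boost_mode": "multiply"
    }
  }
}
Each amount document would be indexed with the parent request parameter pointing at its currency's conversions document, and the resulting _score (balance divided by conversion_divisor) is what gets plotted in Kibana.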
