Elasticsearch completion in the middle of a sentence - elasticsearch

Is it possible to perform completion in Elasticsearch and get results even if the text comes from the middle of the input?
For instance:
"TitleSuggest" : {
"type" : "completion",
"index_analyzer" : "simple",
"search_analyzer" : "simple",
"payloads" : true,
"preserve_position_increments" : false,
"preserve_separators" : false
}
That's my current mapping and my query is
{
    "passport": {
        "text": "Industry Overview",
        "completion": {
            "field": "TitleSuggest",
            "fuzzy": {
                "edit_distance": 2
            }
        }
    }
}
But nothing is returned, even though I have documents that contain Industry Overview in their input. For instance, if I search only for Industry:
{
    "text" : "Industry",
    "offset" : 0,
    "length" : 8,
    "options" : [{
            "text" : "Airline Industry Sees Recovery in 2014",
            "score" : 16
        }, {
            "text" : "Alcoholic Drinks Industry Overview",
            "score" : 16
        }, {
            "text" : "Challenges in the Pet Care Industry For 2014",
            "score" : 16
        }
    ]
}
I can achieve that by using nGrams, but I'd like to get this done using completion suggesters
So my initial goal would be to get this when I type in Industry Overview:
{
    "text" : "Industry Overview",
    "offset" : 0,
    "length" : 17,
    "options" : [{
            "text" : "Alcoholic Drinks Industry Overview",
            "score" : 16
        }
    ]
}
I've tried using a shingle analyzer, but that didn't solve the problem, and I couldn't find anything useful on Google.
ES Version : 1.5.1
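One workaround worth sketching (this is an assumption about a fix, not confirmed by the thread): in ES 1.x the completion suggester is FST-based and only matches from the beginning of an input, so a common pattern is to index each title with several input variants, one per token position, all pointing at the same output. The index and type names below are illustrative:

```json
PUT /passport/titles/1
{
    "TitleSuggest" : {
        "input" : [
            "Alcoholic Drinks Industry Overview",
            "Drinks Industry Overview",
            "Industry Overview",
            "Overview"
        ],
        "output" : "Alcoholic Drinks Industry Overview"
    }
}
```

With these inputs, a suggest query for "Industry Overview" matches the third variant and still returns the full title as its output.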

Related

Elasticsearch : how to search multiple words in a copy_to field?

I am currently learning Elasticsearch and am stuck on the issue described below.
On an existing index (I don't know if it matters) I added this new mapping:
PUT user-index
{
    "mappings": {
        "properties": {
            "common_criteria": { -- new property which aggregates other properties by copy_to
                "type": "text"
            },
            "name": { -- already existed before this mapping
                "type": "text",
                "copy_to": "common_criteria"
            },
            "username": { -- already existed before this mapping
                "type": "text",
                "copy_to": "common_criteria"
            },
            "phone": { -- already existed before this mapping
                "type": "text",
                "copy_to": "common_criteria"
            },
            "country": { -- already existed before this mapping
                "type": "text",
                "copy_to": "common_criteria"
            }
        }
    }
}
The goal is to search ONE or MORE values only on common_criteria.
Say that we have:
{
"common_criteria": ["John Smith","johny","USA"]
}
What I would like to achieve is an exact match searching on multiple values of common_criteria:
We should have a result if we search with John Smith, or USA + John Smith, or johny + USA, or USA alone, or johny alone, and finally with John Smith + USA + johny (the word order does not matter).
If we search with multiple words like John Smith + Germany or johny + England, we should not have a result.
I am using Spring Data Elastic to build my query:
NativeSearchQueryBuilder nativeSearchQuery = new NativeSearchQueryBuilder();
BoolQueryBuilder booleanQuery = QueryBuilders.boolQuery();
String valueToSearch = "johny";
nativeSearchQuery.withQuery(booleanQuery.must(QueryBuilders.matchQuery("common_criteria", valueToSearch)
        .fuzziness(Fuzziness.AUTO)
        .operator(Operator.AND)));
Logging the request sent to Elastic I have:
{
    "bool" : {
        "must" : {
            "match" : {
                "common_criteria" : {
                    "query" : "johny",
                    "operator" : "AND",
                    "fuzziness" : "AUTO",
                    "prefix_length" : 0,
                    "max_expansions" : 50,
                    "fuzzy_transpositions" : true,
                    "lenient" : false,
                    "zero_terms_query" : "NONE",
                    "auto_generate_synonyms_phrase_query" : true,
                    "boost" : 1.0
                }
            }
        },
        "adjust_pure_negative" : true,
        "boost" : 1.0
    }
}
With that request I get 0 results. I suspect the request is not correct because of the must.match condition, and maybe the field common_criteria is also not well defined.
Thanks in advance for your help and explanations.
EDIT: after trying the multi_match query.
Following #rabbitbr's suggestion I tried the multi_match query, but it does not seem to work. This is an example of a request sent to Elastic (with 0 results):
{
    "bool" : {
        "must" : {
            "multi_match" : {
                "query" : "John Smith USA",
                "fields" : [
                    "name^1.0",
                    "username^1.0",
                    "phone^1.0",
                    "country^1.0"
                ],
                "type" : "best_fields",
                "operator" : "AND",
                "slop" : 0,
                "fuzziness" : "AUTO",
                "prefix_length" : 0,
                "max_expansions" : 50,
                "zero_terms_query" : "NONE",
                "auto_generate_synonyms_phrase_query" : true,
                "fuzzy_transpositions" : true,
                "boost" : 1.0
            }
        },
        "adjust_pure_negative" : true,
        "boost" : 1.0
    }
}
That request does not return a result.
I would try to use the multi_match query before creating a field to store all the others in one place.
The multi_match query builds on the match query to allow multi-field
queries.
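One thing worth checking here (a guess at the root cause, not confirmed by the thread): copy_to is applied at index time, so documents indexed before the mapping change will have an empty common_criteria until they are reindexed. Once the documents are reindexed, a plain match query with operator and on the copied field should express the "all given words must be present, in any order" requirement, as a sketch:

```json
GET /user-index/_search
{
    "query": {
        "match": {
            "common_criteria": {
                "query": "John Smith USA",
                "operator": "and"
            }
        }
    }
}
```

With operator and, "John Smith Germany" would not match the sample document because "germany" is not among the copied tokens, while any combination of "john", "smith", "usa" and "johny" would.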

Elasticsearch Top 10 Most Frequent Values In Array Across All Records

I have an index "test". The document structure is shown below. Each document has an array of "tags". I am not able to figure out how to query this index to get the top 10 most frequently occurring tags.
Also, what best practices should one follow if we have more than 2 million docs in this index?
{
"_index" : "test",
"_type" : "data",
"_id" : "1412879673545024927_1373991666",
"_score" : 1.0,
"_source" : {
"instagramuserid" : "1373991666",
"likes_count" : 163,
"#timestamp" : "2017-06-08T08:52:41.803Z",
"post" : {
"created_time" : "1482648403",
"comments" : {
"count" : 9
},
"user_has_liked" : true,
"link" : "https://www.instagram.com/p/BObjpPMBWWf/",
"caption" : {
"created_time" : "1482648403",
"from" : {
"full_name" : "PARAMSahib ™",
"profile_picture" : "https://scontent.cdninstagram.com/t51.2885-19/s150x150/12750236_1692144537739696_350427084_a.jpg",
"id" : "1373991666",
"username" : "parambanana"
},
"id" : "17845953787172829",
"text" : "This feature talks about how to work pastels .\n\nDull gold pullover + saffron khadi kurta + baby pink pants + Deep purple patka and white sneakers - Perfect colours for a Happy sunday christmas morning . \n#paramsahib #men #menswear #mensfashion #mensfashionblog #mensfashionblogger #menswearofficial #menstyle #fashion #fashionfashion #fashionblog #blog #blogger #designer #fashiondesigner #streetstyle #streetfashion #sikh #sikhfashion #singhstreetstyle #sikhdesigner #bearded #indian #indianfashionblog #indiandesigner #international #ootd #lookbook #delhistyleblog #delhifashionblog"
},
"type" : "image",
"tags" : [
"men",
"delhifashionblog",
"menswearofficial",
"fashiondesigner",
"singhstreetstyle",
"fashionblog",
"mensfashion",
"fashion",
"sikhfashion",
"delhistyleblog",
"sikhdesigner",
"indianfashionblog",
"lookbook",
"fashionfashion",
"designer",
"streetfashion",
"international",
"paramsahib",
"mensfashionblogger",
"indian",
"blog",
"mensfashionblog",
"menstyle",
"ootd",
"indiandesigner",
"menswear",
"blogger",
"sikh",
"streetstyle",
"bearded"
],
"filter" : "Normal",
"attribution" : null,
"location" : null,
"id" : "1412879673545024927_1373991666",
"likes" : {
"count" : 163
}
}
}
},
If your tags type in the mapping is object (which is the default), you can use an aggregation query like this:
{
    "size": 0,
    "aggs": {
        "frequent_tags": {
            "terms": { "field": "post.tags" }
        }
    }
}
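Note that the terms aggregation returns the top 10 buckets by default, which is exactly the "top 10 most frequent" requirement; the size parameter makes it explicit. If the field is analyzed text, the aggregation needs a not-analyzed variant, e.g. the .keyword sub-field that recent Elasticsearch versions create by default (whether this index has one is an assumption):

```json
{
    "size": 0,
    "aggs": {
        "frequent_tags": {
            "terms": {
                "field": "post.tags.keyword",
                "size": 10
            }
        }
    }
}
```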

elasticsearch completion suggester produce duplicate results

I am using the Elasticsearch completion suggester these days, and have a problem: it always produces near-duplicate results.
Say I search with the following statement:
"my_suggestion": {
> "text": "ni",
> "completion": {
> "field": "my_name_for_sug"
> }
> }
And get the following results:
"my_suggestion" : [ {
"text" : "ni",
"offset" : 0,
"length" : 2,
"options" : [ {
"text" : "Nine West",
"score" : 329.0
}, {
"text" : "Nine West ",
"score" : 329.0
}, {
"text" : "Nike",
"score" : 295.0
}, {
"text" : "NINE WEST",
"score" : 168.0
}, {
"text" : "NINE WEST ",
"score" : 168.0
} ]
} ],
So the question is: how can I merge or aggregate the same results, like "NINE WEST" and "NINE WEST "?
The mapping is:
"my_name_for_sug": {
"type": "completion"
,"analyzer": "ik_max_word"
,"search_analyzer": "ik_max_word"
,"payloads": true
,"preserve_separators": false
}
where ik_max_word is a Chinese-specific analyzer that can also do the standard analyzer's job.
Thanks
Elastic suggesters automatically de-duplicate identical output (at least up to 2.x). I haven't tried out 5.x yet, and there are some changes to suggesters there.
The problem seems to be your index analyzer, which is indexing your documents so that:
"text" : "Nine West",
"text" : "Nine West ",
"text" : "NINE WEST",
"text" : "NINE WEST ",
are not all identical. You need to index them using an analyzer that lowercases the tokens and strips extra whitespace.
Once you do that, you should get de-duplicated output for suggestions, like you want.
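As a sketch of such an analyzer (the analyzer name ik_lowercase is illustrative, and this assumes the IK plugin exposes an ik_max_word tokenizer in addition to the analyzer, which current versions do), the Chinese tokenization can be kept while adding a lowercase filter:

```json
{
    "settings": {
        "analysis": {
            "analyzer": {
                "ik_lowercase": {
                    "type": "custom",
                    "tokenizer": "ik_max_word",
                    "filter": ["lowercase"]
                }
            }
        }
    }
}
```

The completion field's analyzer and search_analyzer would then reference ik_lowercase instead of ik_max_word, so "Nine West", "NINE WEST" and "NINE WEST " produce the same tokens at index time.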

ElasticSearch doesn't seem to support array lookups

I currently have a fairly simple document stored in ElasticSearch that I generated with an integration test:
{
"took" : 1,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"failed" : 0
},
"hits" : {
"total" : 1,
"max_score" : 1.0,
"hits" : [ {
"_index" : "unit-test_project600",
"_type" : "recordDefinition505",
"_id" : "400",
"_score" : 1.0, "_source" : {
"field900": "test string",
"field901": "500",
"field902": "2050-01-01T00:00:00",
"field903": [
"Open"
]
}
} ]
}
}
I would like to filter specifically on field903 with a value of "Open", so I perform the following query:
{
    "query": {
        "filtered": {
            "filter": {
                "term": {
                    "field903": "Open"
                }
            }
        }
    }
}
This returns no results. However, I can use this with other fields and it will return the record:
{
    "query": {
        "filtered": {
            "filter": {
                "term": {
                    "field901": "500"
                }
            }
        }
    }
}
It would appear that I'm unable to search in arrays with ElasticSearch. I have read a few instances of people with a similar problem, but none of them appear to have solved it. Surely this isn't a limitation of ElasticSearch?
I thought that it might be a mapping problem. Here's my mapping:
{
    "unit-test_project600" : {
        "recordDefinition505" : {
            "properties" : {
                "field900" : {
                    "type" : "string"
                },
                "field901" : {
                    "type" : "string"
                },
                "field902" : {
                    "type" : "date",
                    "format" : "dateOptionalTime"
                },
                "field903" : {
                    "type" : "string"
                }
            }
        }
    }
}
However, the ElasticSearch docs indicate that there is no difference between a string or an array mapping, so I don't think I need to make any changes here.
Try searching for "open" rather than "Open." By default, Elasticsearch uses a standard analyzer when indexing fields. The standard analyzer uses a lowercase filter, as described in the example here. From my experience, Elasticsearch does search arrays.
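A sketch of the corrected filter, using the lowercased term (the standard analyzer stored "Open" as the token "open", while a term filter does not analyze its input):

```json
{
    "query": {
        "filtered": {
            "filter": {
                "term": {
                    "field903": "open"
                }
            }
        }
    }
}
```

An alternative would be a match query, which analyzes the query text the same way the field was analyzed, so "Open" and "open" both work.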

ElasticSearch - Statistical facets on length of the list

I have the following sample mapping:
{
    "book" : {
        "properties" : {
            "author" : { "type" : "string" },
            "title" : { "type" : "string" },
            "reviews" : {
                "properties" : {
                    "url" : { "type" : "string" },
                    "score" : { "type" : "integer" }
                }
            },
            "chapters" : {
                "include_in_root" : 1,
                "type" : "nested",
                "properties" : {
                    "name" : { "type" : "string" }
                }
            }
        }
    }
}
I would like to get a facet on the number of reviews, i.e. the length of the "reviews" array.
For instance, in words, the results I need are: "100 documents with 10 reviews, 20 documents with 5 reviews, ..."
I'm trying the following statistical facet:
{
    "query" : {
        "match_all" : {}
    },
    "facets" : {
        "stat1" : {
            "statistical" : {"script" : "doc['reviews.score'].values.size()"}
        }
    }
}
but it keeps failing with:
{
"error" : "SearchPhaseExecutionException[Failed to execute phase [query_fetch], total failure; shardFailures {[mDsNfjLhRIyPObaOcxQo2w][facettest][0]: QueryPhaseExecutionException[[facettest][0]: query[ConstantScore(NotDeleted(cache(org.elasticsearch.index.search.nested.NonNestedDocsFilter#a2a5984b)))],from[0],size[10]: Query Failed [Failed to execute main query]]; nested: PropertyAccessException[[Error: could not access: reviews; in class: org.elasticsearch.search.lookup.DocLookup]
[Near : {... doc[reviews.score].values.size() ....}]
^
[Line: 1, Column: 5]]; }]",
"status" : 500
}
How can I achieve my goal?
ElasticSearch version is 0.19.9.
Here is my sample data:
{
"author" : "Mark Twain",
"title" : "The Adventures of Tom Sawyer",
"reviews" : [
{
"url" : "amazon.com",
"score" : 10
},
{
"url" : "www.barnesandnoble.com",
"score" : 9
}
],
"chapters" : [
{ "name" : "Chapter 1" }, { "name" : "Chapter 2" }
]
}
{
"author" : "Jack London",
"title" : "The Call of the Wild",
"reviews" : [
{
"url" : "amazon.com",
"score" : 8
},
{
"url" : "www.barnesandnoble.com",
"score" : 9
},
{
"url" : "www.books.com",
"score" : 5
}
],
"chapters" : [
{ "name" : "Chapter 1" }, { "name" : "Chapter 2" }
]
}
It looks like you are using curl to execute your query, and your curl statement looks like this:
curl localhost:9200/my-index/book -d '{....}'
The problem here is that because you are wrapping the body of the request in single quotes, you need to escape all single quotes that it contains. So, your script should become:
{"script" : "doc['\''reviews.score'\''].values.size()"}
or
{"script" : "doc[\"reviews.score"].values.size()"}
The second issue is that from your description it looks like you are looking for a histogram facet or a range facet, not for a statistical facet. So, I would suggest trying something like this:
curl "localhost:9200/test-idx/book/_search?search_type=count&pretty" -d '{
"query" : {
"match_all" : {}
},
"facets" : {
"histo1" : {
"histogram" : {
"key_script" : "doc[\"reviews.score\"].values.size()",
"value_script" : "doc[\"reviews.score\"].values.size()",
"interval" : 1
}
}
}
}'
The third problem is that the script in the facet will be called for every single record in the result set, and if you have a lot of results it might take a really long time. So, I would suggest indexing an additional field called number_of_reviews, populated with the number of reviews by your client. Then your query would simply become:
curl "localhost:9200/test-idx/book/_search?search_type=count&pretty" -d '{
"query" : {
"match_all" : {}
},
"facets" : {
"histo1" : {
"histogram" : {
"field" : "number_of_reviews"
"interval" : 1
}
}
}
}'
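With that approach, the first sample document would be indexed with the pre-computed count alongside its other fields, e.g. (a sketch):

```json
{
    "author" : "Mark Twain",
    "title" : "The Adventures of Tom Sawyer",
    "reviews" : [
        { "url" : "amazon.com", "score" : 10 },
        { "url" : "www.barnesandnoble.com", "score" : 9 }
    ],
    "number_of_reviews" : 2
}
```

The histogram facet then buckets on a plain indexed field, which avoids running a script per matching document.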

Resources