I'm wondering how Elasticsearch tokenizes keywords.
Example:
So I'm using a search box for searching keywords in comments.
When I search for "Zelle", only comments in Spanish show up.
But if I search for "Zell", all comments with "Zelle" show up, with "Zell" highlighted.
Can anyone please tell me why, when I search for some keywords, only comments in specific languages show up?
Edit1:
The mapping is like this:
{
"comments" : {
"mappings" : {
"ios" : {
"properties" : {
"content" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"country" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"date" : {
"type" : "date"
},
"language" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"product_id" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"product_version" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"rating" : {
"type" : "long"
},
"title" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"user_language" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
}
}
}
}
}
}
It doesn't contain any information about the tokenizer.
How can I find out which tokenizer Elasticsearch uses for searching?
I recommend reading the Mapping chapter of the official book; it will help you a lot.
To answer your question, we need to know the Mapping of your documents, specifically, the mapping of the field you search in.
By the look of it, you do not use the default analyzer (called "standard"), because "Zell" would not match "Zelle" with it.
In Elasticsearch, analyzers tokenize your content the way you configure them. By the look of it, some analyzer is set up in your mapping, because "Zelle" and "Zell" are matching.
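To make the token-matching point concrete, here is a rough Python approximation of the default standard analyzer. It is only a sketch: the real standard tokenizer follows Unicode text segmentation rules, whereas this just lowercases and keeps runs of word characters, and the function names are hypothetical. Under this approximation, a match query for "Zell" finds nothing in a comment containing "Zelle", because zell and zelle are different terms.

```python
import re


def simple_analyze(text: str) -> list:
    """Rough stand-in for the standard analyzer: lowercase, then keep runs of word characters."""
    return re.findall(r"\w+", text.lower())


def match_query_hits(query: str, document: str) -> bool:
    """A match query with the default OR operator hits if any query token equals a document token."""
    doc_tokens = set(simple_analyze(document))
    return any(token in doc_tokens for token in simple_analyze(query))


if __name__ == "__main__":
    comment = "Pago rápido con Zelle"
    print(simple_analyze(comment))             # ['pago', 'rápido', 'con', 'zelle']
    print(match_query_hits("Zelle", comment))  # True
    print(match_query_hits("Zell", comment))   # False
```

To see what the real content field produces, the _analyze API accepts a field name, e.g. GET comments/_analyze with the body {"field": "content", "text": "Zelle"}, and runs that field's configured analyzer. A text field with no "analyzer" entry in its mapping, as in the mapping above, uses the index's default analyzer, which is standard unless the index settings define a different default.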
Related
Kibana version: 7.4.2
My index: cls-docker-logs*
Existing mapping in the cls-docker-logs-* indices:
GET cls-docker-logs-*/_mapping
"stack_trace" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
}
Updated the mapping to:
"stack_trace" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 2560000
}
}
}
I created a scripted field in which I want to read the value and perform some string manipulation:
return doc['stack_trace.keyword'].value;
I don't see any value after I run this.
What am I doing wrong here?
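One thing worth checking (a hedged guess, not a confirmed diagnosis): ignore_above is applied at index time, so raising it from 256 to 2560000 only affects documents indexed after the mapping change. Documents whose stack_trace was longer than 256 characters when they were indexed have no stack_trace.keyword value at all, so the scripted field has nothing to return for them. The hypothetical helper below models that documented rule in Python.

```python
from typing import Optional


def keyword_doc_value(value: str, ignore_above: int) -> Optional[str]:
    """Models ignore_above: strings longer than the limit get no keyword term or doc value."""
    if len(value) > ignore_above:
        return None
    return value


if __name__ == "__main__":
    long_trace = "java.lang.NullPointerException\n" + "\tat Foo.bar(Foo.java:1)\n" * 20
    # Indexed while the mapping still said ignore_above: 256, so no keyword value exists.
    print(keyword_doc_value(long_trace, 256))  # None
    # Indexed (or re-indexed) after the mapping update, so the keyword value is present.
    print(keyword_doc_value(long_trace, 2560000) == long_trace)  # True
```

If that is the cause, re-indexing the existing documents (for example with _update_by_query, which re-indexes documents in place and picks up the new mapping) should populate the sub-field. In the Painless script, checking doc['stack_trace.keyword'].size() == 0 before reading .value avoids the "A document doesn't have a value for a field!" error on documents without a value. Also note that ignore_above counts characters while Lucene limits a single term to 32766 bytes, so a keyword value larger than that may cause the document to be rejected no matter how high ignore_above is set.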
I'm using Elastic Cloud hosted in Azure, with NEST as the client. I need to change part of an auto-generated mapping from
"Bonus" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
to
"Bonus" : {
"properties" : {
"Amount" : {
"properties" : {
"Value" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
}
}
},
"PayrollSyncDateTime" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
}
}
},
When I try to apply it, I get an illegal_argument_exception error with the message "can't merge a non object mapping [activityData.Bonus] with an object mapping". How can I correct the auto-generated mapping?
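Elasticsearch cannot change an existing field from text to object in place, which is what that error reports. A common way around it (sketched here under assumptions, not as the only fix) is to create a new index with the desired mapping and copy the data across with the _reindex API. The Python below only builds the request bodies; the index names are hypothetical, activityData is assumed to be a top-level field, and the typeless mappings format assumes a 7.x or later cluster.

```python
import json

# Reusable text field with a keyword sub-field, matching the auto-generated style above.
TEXT_WITH_KEYWORD = {
    "type": "text",
    "fields": {"keyword": {"type": "keyword", "ignore_above": 256}},
}


def new_index_body() -> dict:
    """Body for PUT <new-index>: activityData.Bonus is declared as an object up front."""
    return {
        "mappings": {
            "properties": {
                "activityData": {
                    "properties": {
                        "Bonus": {
                            "properties": {
                                "Amount": {"properties": {"Value": TEXT_WITH_KEYWORD}},
                                "PayrollSyncDateTime": TEXT_WITH_KEYWORD,
                            }
                        }
                    }
                }
            }
        }
    }


def reindex_body(source_index: str, dest_index: str) -> dict:
    """Body for POST _reindex: copy every document from the old index into the new one."""
    return {"source": {"index": source_index}, "dest": {"index": dest_index}}


if __name__ == "__main__":
    # "activity-v1" and "activity-v2" are hypothetical index names.
    print(json.dumps(new_index_body(), indent=2))
    print(json.dumps(reindex_body("activity-v1", "activity-v2")))
```

Documents in which Bonus is still a plain string cannot be indexed into an object field, so those would have to be converted or excluded during the copy.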
I am trying to create an elastic query that will show non-common properties between two indexes. Say the first index is:
{
"myFirstIndex" : {
"mappings" : {
"properties" : {
"CAT" : {
"type" : "keyword",
"ignore_above" : 256
},
"DATE_OF_BIRTH" : {
"type" : "date",
"format" : "dateOptionalTime"
},
"ID" : {
"type" : "keyword",
"ignore_above" : 256
},
"NAME" : {
"type" : "text"
},
"timestamp" : {
"type" : "date",
"format" : "dateOptionalTime"
}
}
}
}
}
and the second is:
{
"mySecondIndex" : {
"mappings" : {
"properties" : {
"CAT" : {
"type" : "keyword",
"ignore_above" : 256
},
"DATE_OF_BIRTH" : {
"type" : "date",
"format" : "dateOptionalTime"
},
"ID" : {
"type" : "keyword",
"ignore_above" : 256
}
}
}
}
}
I have never run a query across indices, so I am not sure how to do this. I don't care much whether the properties have nested characteristics; for my purposes, finding the appropriate common properties at a base level is sufficient.
Grateful for any assistance. Thank you
Based on your clarifications, you can't do that natively in Elasticsearch.
You'd need to run the queries from some code and then compare the two indices in that code.
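Following that suggestion, a minimal Python sketch: fetch GET myFirstIndex/_mapping and GET mySecondIndex/_mapping with whatever client you use, then compare the base-level property names in code. The helper names are hypothetical, and the sample dicts are trimmed copies of the two mappings from the question.

```python
from __future__ import annotations


def top_level_properties(mapping_response: dict) -> set[str]:
    """Collect base-level property names from a GET <index>/_mapping response."""
    names: set[str] = set()
    for index_body in mapping_response.values():
        names.update(index_body["mappings"].get("properties", {}))
    return names


def compare_properties(first: dict, second: dict) -> tuple[set[str], set[str]]:
    """Return (common, non_common) property names across the two responses."""
    a, b = top_level_properties(first), top_level_properties(second)
    return a & b, a ^ b


first = {"myFirstIndex": {"mappings": {"properties": {
    "CAT": {"type": "keyword"}, "DATE_OF_BIRTH": {"type": "date"},
    "ID": {"type": "keyword"}, "NAME": {"type": "text"}, "timestamp": {"type": "date"}}}}}
second = {"mySecondIndex": {"mappings": {"properties": {
    "CAT": {"type": "keyword"}, "DATE_OF_BIRTH": {"type": "date"}, "ID": {"type": "keyword"}}}}}

common, non_common = compare_properties(first, second)
print(sorted(common))      # ['CAT', 'DATE_OF_BIRTH', 'ID']
print(sorted(non_common))  # ['NAME', 'timestamp']
```

Nested object fields are deliberately ignored here, since only base-level properties were asked for; a recursive walk over each "properties" dict would be needed to compare deeper fields.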
I'm having a hard time figuring out how to set up Elasticsearch for the typical product model nomenclature. For instance, a product called "Shure SM7B" should appear as a result when searching for SM7B, SM 7B, SM 7, SM-7... and vice versa: searching for SM7B should give results like SM-7, SM7...
Right now, I'm getting these results: if I search for "Roland D 50", I get Roland D 50, Roland D-50, Roland D-550, Roland D-20 and so on, but if I search for "Roland D50", I get only "Roland D50" results.
This is my current mapping/settings:
{
"products" : {
"mappings" : {
"Product" : {
"properties" : {
"article_reviews" : {
"type" : "integer"
},
"brand_id" : {
"type" : "integer"
},
"category" : {
"type" : "text"
},
"category_id" : {
"type" : "integer"
},
"date" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"description" : {
"type" : "text"
},
"has_image" : {
"type" : "integer"
},
"id" : {
"type" : "integer"
},
"last_review_date" : {
"type" : "date",
"format" : "yyyy-MM-dd HH:mm:ss"
},
"min_price" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"name" : {
"type" : "text"
},
"name_order" : {
"type" : "keyword"
},
"price_history" : {
"type" : "integer"
},
"rating" : {
"type" : "float"
},
"reviews" : {
"type" : "integer"
},
"shops" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"widget" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
}
}
}
}
}
}
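The behaviour asked for above is roughly what Elasticsearch's word_delimiter_graph token filter is meant for: it splits tokens at letter/number transitions and punctuation, and with catenate_all enabled it also emits the glued-together form. As a hedged illustration only (plain Python, not the filter's exact output or token positions, ASCII letters only, hypothetical function names), applying the same idea on both the indexing and the search side makes "Roland D50" and "Roland D-50" produce identical tokens.

```python
from __future__ import annotations

import re


def model_tokens(word: str) -> list[str]:
    """Split one word at letter/digit boundaries and punctuation, then add the catenated form."""
    parts = re.findall(r"[a-z]+|[0-9]+", word.lower())
    if len(parts) > 1:
        parts.append("".join(parts))  # e.g. "sm" + "7" + "b" -> "sm7b"
    return parts


def analyze(text: str) -> list[str]:
    """Whitespace-split the text and run every word through model_tokens."""
    tokens: list[str] = []
    for word in text.split():
        tokens.extend(model_tokens(word))
    return tokens


print(analyze("Roland D50"))   # ['roland', 'd', '50', 'd50']
print(analyze("Roland D-50"))  # ['roland', 'd', '50', 'd50']
print(analyze("Shure SM7B"))   # ['shure', 'sm', '7', 'b', 'sm7b']
print(analyze("SM-7"))         # ['sm', '7', 'sm7']
```

Because "SM7B" and "SM 7B" then share the sm, 7 and b tokens, a match query would find one from the other. Which options to enable in the real filter (catenate_all, preserve_original and so on) is a tuning decision best checked with the _analyze API.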
I also need autocomplete for my searches: for instance, "Shure SM" should show results like Shure SM-7, Shure SM7, Shure SM58, Shure SM 57, etc., narrowing them down as I type.
Any clues? Thank you!
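For the type-ahead part, the usual Elasticsearch building block is an edge_ngram tokenizer or token filter at index time (configured with min_gram and max_gram), combined with a plain search_analyzer so the query text itself is not n-grammed. The Python below only imitates that idea to show why "Shure SM" narrows to the SM models; the function names and gram sizes are hypothetical.

```python
from __future__ import annotations


def edge_ngrams(token: str, min_gram: int = 1, max_gram: int = 10) -> list[str]:
    """All prefixes of token whose length lies in [min_gram, max_gram]."""
    upper = min(max_gram, len(token))
    return [token[:n] for n in range(min_gram, upper + 1)]


def index_terms(name: str) -> set[str]:
    """Index side: lowercase, whitespace-split, and store the edge n-grams of each word."""
    terms: set[str] = set()
    for word in name.lower().split():
        terms.update(edge_ngrams(word))
    return terms


def autocomplete(query: str, names: list[str]) -> list[str]:
    """Search side: every query word (not n-grammed) must appear among a product's index terms."""
    words = query.lower().split()
    return [name for name in names if all(word in index_terms(name) for word in words)]


products = ["Shure SM7B", "Shure SM58", "Shure SM 57", "Roland D-50"]
print(autocomplete("Shure SM", products))   # ['Shure SM7B', 'Shure SM58', 'Shure SM 57']
print(autocomplete("Shure SM5", products))  # ['Shure SM58']
```

Keeping the n-grams on the index side only is the design choice that makes results narrow as you type: each longer prefix in the query must still match a stored prefix.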
I've been trying to find a way to do this for a couple of days now. I've looked at the 'bool', 'constant_score', and 'filtered' queries, none of which seems able to produce the result I want.
The one that HAS come close is the 'ids' query (it does exactly what I described in the title of this question); the one problem is that the key I'm trying to search on is not the '_id' value of the Elasticsearch index. Instead, it is 'posterId' in the index below:
"_index": "activity",
"_type": "activity",
"_id": "<unique string id>",
"_score": null,
"_source": {
...
misc keys
...
"posterId": "<QUERY BASED ON THIS VALUE>",
"time": 20171007173623
}
Query that returns based on the _id value:
ids : {
type : "activity",
values : ["<unique string id>", ...]
}
as seen here: https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-ids-query.html
How I want my query to work:
posterId : {
type : "activity",
values : [<list of posterIds>]
}
It should return all documents whose posterId is contained in "<list of posterIds>".
Edit: I'm trying to do this in one query, as opposed to looping through each member of my list of posterIds, because I also need to sort on the time key and be able to page through the results.
So, does anyone know of a built in query that does this or a work around?
Side note: if you feel like you're about to downvote this, please just comment why. I'm about to be banned, and I've read through all the guidelines; I feel like I'm following them, but my questions rarely perform well. :( It would be much appreciated.
Edit:
{
"activity" : {
"aliases" : { },
"mappings" : {
"activity" : {
"properties" : {
"-Kvp7f3epvW_dXSONzKj" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"actionId" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"actionType" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"activityType" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"attachedId" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"attachedType" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"cardType" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"noteTitleDict" : {
"properties" : {
"noun" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"subject" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"verb" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
}
}
},
"posterId" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"segueType" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"time" : {
"type" : "long"
}
}
}
},
"settings" : {
"index" : {
"creation_date" : "1507678305995",
"number_of_shards" : "5",
"number_of_replicas" : "1",
"uuid" : "<id>",
"version" : {
"created" : "5010199"
},
"provided_name" : "activity"
}
}
}
}
I think what you are looking for is a terms query:
{
"query": {
"constant_score" : {
"filter" : {
"terms" : { "user" : ["kimchy", "elasticsearch"]}
}
}
}
}
This finds documents that contain the exact term kimchy or elasticsearch in the user field. You can read more about it here: https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-terms-query.html
In your case, you need to replace:
user with posterId.keyword
kimchy and elasticsearch with all of your posterIds
Keep in mind that a terms query is case-sensitive, and the keyword field does not lowercase its values (by default a keyword field has no normalizer), so each value is indexed in exactly the case in which it was received.
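Applied to the activity index, the body could look like the sketch below, which also covers the sorting and paging requirement from the question's edit. It is a minimal Python sketch that only builds the JSON body for a _search request; the descending sort order, the page size and the function name are assumptions.

```python
from __future__ import annotations

import json


def build_poster_query(poster_ids: list[str], page: int = 0, page_size: int = 20) -> dict:
    """_search body: filter on exact posterId values, newest first, one page at a time."""
    return {
        "query": {
            "constant_score": {
                # posterId.keyword holds the untouched value, so terms matches it exactly.
                "filter": {"terms": {"posterId.keyword": list(poster_ids)}}
            }
        },
        "sort": [{"time": {"order": "desc"}}],  # assumed: newest activity first
        "from": page * page_size,
        "size": page_size,
    }


if __name__ == "__main__":
    print(json.dumps(build_poster_query(["<posterId 1>", "<posterId 2>"], page=1), indent=2))
```

Paging with from/size is capped by the index.max_result_window setting (10,000 hits by default), so very deep pages would need search_after, which reuses the same sort, instead.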