I'm wondering how Elasticsearch tokenizes keywords.
Example:
I'm using a search box to search for keywords in comments.
When I search for "Zelle", only comments in Spanish show up.
But when I search for "Zell", all comments containing "Zelle" show up, with "Zell" highlighted.
Can anyone tell me why, for some keywords, only comments in specific languages show up?
Edit1:
The mapping is like this:
{
"comments" : {
"mappings" : {
"ios" : {
"properties" : {
"content" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"country" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"date" : {
"type" : "date"
},
"language" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"product_id" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"product_version" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"rating" : {
"type" : "long"
},
"title" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"user_language" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
}
}
}
}
}
}
It does not contain any information about the tokenizer.
How can I find out which tokenizer Elasticsearch uses for searching?
I recommend reading the Mapping chapter of the official book; it will help you a lot.
To answer your question, we need to know the mapping of your documents, specifically the mapping of the field you search in.
By the look of it, you are not using the default analyzer (called "standard"), because with it "Zell" would not match "Zelle".
In Elasticsearch, an analyzer tokenizes your content the way you configure it. Some analyzer appears to be set up in your mapping, because "Zelle" and "Zell" are matching.
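One way to see which tokens Elasticsearch actually produces for a field is the _analyze API. The sketch below only builds the request and does not send it. The index name comments and the field content come from the mapping in the question; the helper name build_analyze_request and the localhost:9200 address are assumptions made for illustration.

```python
import json


def build_analyze_request(index, field, text, host="http://localhost:9200"):
    """Return (url, body) for an _analyze call that uses the field's analyzer.

    The host default is an assumption; adjust it to your cluster.
    """
    url = f"{host}/{index}/_analyze"
    # Passing "field" makes Elasticsearch use the analyzer configured for
    # that field in the mapping (the standard analyzer when none is set).
    body = {"field": field, "text": text}
    return url, body


url, body = build_analyze_request("comments", "content", "Zelle")
print("GET", url)
print(json.dumps(body, indent=2))
```

Sending this request with curl or any HTTP client returns the list of tokens. Comparing the output for "Zelle" and "Zell" should show whether the two are indexed as the same term.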
I've been trying to find a way to do this for a couple of days now. I've looked through the 'bool', 'constant_score', and 'filtered' queries, none of which seem to produce the result I want.
One that HAS come close is the 'ids' query (it does exactly what I described in the title of this question). The one problem is that the key I'm trying to search on is not the '_id' value of the Elasticsearch index. Instead, it is 'posterId' in the document below:
"_index": "activity",
"_type": "activity",
"_id": "<unique string id>",
"_score": null,
"_source": {
...
misc keys
...
"posterId": "<QUERY BASED ON THIS VALUE>",
"time": 20171007173623
}
Query that returns based on the _id value:
ids : {
type : "activity",
values : ["<unique string id>", ...]
}
as seen here: https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-ids-query.html
How I want my query to work:
posterId : {
type : "activity",
values : [<list of posterIds>]
}
Returning all documents whose posterId is contained in "<list of posterIds>".
Edit: I'm trying to do this in one query, as opposed to looping through each member of my list of posterIds, because I also need to sort on the time key and be able to page the results.
So, does anyone know of a built in query that does this or a work around?
Side note: if you feel like you're about to downvote this please just comment why, I'm about to be banned and I've read through all the guidelines and I feel like I'm following them but my questions rarely perform well. :( It would be much appreciated
Edit:
{
"activity" : {
"aliases" : { },
"mappings" : {
"activity" : {
"properties" : {
"-Kvp7f3epvW_dXSONzKj" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"actionId" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"actionType" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"activityType" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"attachedId" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"attachedType" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"cardType" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"noteTitleDict" : {
"properties" : {
"noun" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"subject" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"verb" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
}
}
},
"posterId" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"segueType" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"time" : {
"type" : "long"
}
}
}
},
"settings" : {
"index" : {
"creation_date" : "1507678305995",
"number_of_shards" : "5",
"number_of_replicas" : "1",
"uuid" : "<id>",
"version" : {
"created" : "5010199"
},
"provided_name" : "activity"
}
}
}
}
I think what you are looking for is a Terms Query
{
    "query": {
        "constant_score" : {
            "filter" : {
                "terms" : { "user" : ["kimchy", "elasticsearch"]}
            }
        }
    }
}
This finds documents that contain the exact term kimchy or elasticsearch in the inverted index of the user field. You can read more about it here: https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-terms-query.html
In your case you need to replace user with posterId.keyword, and kimchy and elasticsearch with all your posterIds.
Keep in mind that a terms query is case sensitive, and the keyword field is not analyzed or lowercased, which means the value is indexed in the same case in which it was received.
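Applied to the question, the request body could be assembled roughly as follows. This is a minimal sketch: posterId.keyword and time come from the mapping in the question, while the function name build_poster_query, the page and page_size parameters, the sample ids, and the descending sort order are assumptions for illustration.

```python
import json


def build_poster_query(poster_ids, page=0, page_size=10):
    """Build a search body matching any of the given posterIds.

    Results are sorted on the time field and paged with from/size.
    page and page_size are illustrative parameters.
    """
    return {
        "query": {
            "constant_score": {
                "filter": {
                    # posterId.keyword is the non-analyzed sub-field
                    # from the mapping, so values must match exactly.
                    "terms": {"posterId.keyword": list(poster_ids)}
                }
            }
        },
        # Descending order is an assumption; use "asc" for oldest first.
        "sort": [{"time": {"order": "desc"}}],
        "from": page * page_size,
        "size": page_size,
    }


body = build_poster_query(["poster-1", "poster-2"], page=2, page_size=20)
print(json.dumps(body, indent=2))
```

Because the filter, the sort, and the paging all live in one search body, there is no need to loop over the posterIds one by one.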
I'm using the latest version of Elasticsearch on Ubuntu 16.04 and I'm having a little trouble putting data into it.
Here is my JSON document (the relevant part of it):
{ "products" : {
"232CDFDW89ENUXRB" : {
"sku" : "232CDFDW89ENUXRB",
"productFamily" : "Compute Instance",
"attributes" : {
"servicecode" : "AmazonEC2",
"location" : "US East (N. Virginia)",
"locationType" : "AWS Region",
"instanceType" : "d2.8xlarge",
"currentGeneration" : "Yes",
"instanceFamily" : "Storage optimized",
"vcpu" : "36",
"physicalProcessor" : "Intel Xeon E5-2676v3 (Haswell)",
"clockSpeed" : "2.4 GHz",
"memory" : "244 GiB",
"storage" : "24 x 2000 HDD",
"networkPerformance" : "10 Gigabit",
"processorArchitecture" : "64-bit",
"tenancy" : "Host",
"operatingSystem" : "Linux",
"licenseModel" : "No License required",
"usagetype" : "HostBoxUsage:d2.8xlarge",
"operation" : "RunInstances",
"enhancedNetworkingSupported" : "Yes",
"preInstalledSw" : "NA",
"processorFeatures" : "Intel AVX; Intel AVX2; Intel Turbo" }
}
}
}
And here's the response from ES when I try "PUT http://localhost:9200/aws":
{
    "error": {
        "root_cause": [
            {
                "type": "illegal_argument_exception",
                "reason": "unknown setting [index.products.232CDFDW89ENUXRB.attributes.clockSpeed] please check that any required plugins are installed, or check the breaking changes documentation for removed settings"
            }
        ],
        "type": "illegal_argument_exception",
        "reason": "unknown setting [index.products.232CDFDW89ENUXRB.attributes.clockSpeed] please check that any required plugins are installed, or check the breaking changes documentation for removed settings"
    },
    "status": 400
}
It seems to me that ES thinks "clockSpeed" is some sort of setting...?
I was hoping to use dynamic mapping to speed the process up, instead of first mapping the whole document and then importing it into ES.
Any suggestions?
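The issue is that the document type and document id are missing when you index the document through the PUT http://localhost:9200/aws command. A PUT to the bare index name is the create-index request, so Elasticsearch reads the body as index settings, which is why it reports an unknown setting.
The proper way to index a document is: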
POST my-index/my-type/my-id-1
{
"name": "kibana"
}
That is, you have to provide the document type (here my-type) and the document id (here my-id-1). Note that the document id is optional with POST; if you don't provide one, Elasticsearch generates an alphanumeric id for you.
A couple of other ways to index a document:
POST my-index/my-type
{
"name": "kibana"
}
// To index a document through PUT, you must provide the document id
PUT my-index/my-type/my-id-1
{
"name": "kibana"
}
Note: If automatic index creation is disabled, you have to create the index before indexing documents.
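The rule above can be summed up in a small helper. This is only a sketch that follows the index/type/id layout used in this answer; the function name index_request is hypothetical and nothing is sent to a cluster.

```python
def index_request(index, doc_type, doc_id=None):
    """Return (method, path) for indexing one document.

    Without an id, POST lets Elasticsearch generate one;
    with an id, PUT writes the document at that exact id.
    """
    if doc_id is None:
        return "POST", f"/{index}/{doc_type}"
    return "PUT", f"/{index}/{doc_type}/{doc_id}"


print(index_request("my-index", "my-type"))
print(index_request("my-index", "my-type", "my-id-1"))
```

Note that neither path is the bare index name, which is the form that triggered the settings error in the question.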
Given a clean mapping, XPOST works perfectly for me on Elasticsearch 5.1.1:
$ curl -XPOST localhost:9200/productsapp/productdocs -d '
{ "products" : {
"sku1" : {
"sku" : "SKU-Name",
"productFamily" : "Compute Instance",
"attributes" : {
"servicecode" : "AmazonEC2",
"location" : "US East (N. Virginia)",
"locationType" : "AWS Region",
"instanceType" : "d2.8xlarge",
"currentGeneration" : "Yes",
"instanceFamily" : "Storage optimized",
"vcpu" : "36",
"physicalProcessor" : "Intel Xeon E5-2676v3 (Haswell)",
"clockSpeed" : "2.4 GHz",
"memory" : "244 GiB",
"storage" : "24 x 2000 HDD",
"networkPerformance" : "10 Gigabit",
"processorArchitecture" : "64-bit",
"tenancy" : "Host",
"operatingSystem" : "Linux",
"licenseModel" : "No License required",
"usagetype" : "HostBoxUsage:d2.8xlarge",
"operation" : "RunInstances",
"enhancedNetworkingSupported" : "Yes",
"preInstalledSw" : "NA",
"processorFeatures" : "Intel AVX; Intel AVX2; Intel Turbo" }
}
}
}'
{"_index":"productsapp","_type":"productdocs","_id":"AVuhXdYYUiSguAb0FsSX","_version":1,"result":"created","_shards":{"total":2,"successful":1,"failed":0},"created":true}
Get the inserted document:
curl -XGET localhost:9200/productsapp/productdocs/_search
{"took":11,"timed_out":false,"_shards":{"total":5,"successful":5,"failed":0},"hits":{"total":1,"max_score":1.0,"hits":[{"_index":"productsapp","_type":"productdocs","_id":"AVuhXdYYUiSguAb0FsSX","_score":1.0,"_source":{ "products" : {
"sku1" : {
"sku" : "SKU-Name",
"productFamily" : "Compute Instance",
"attributes" : {
"servicecode" : "AmazonEC2",
"location" : "US East (N. Virginia)",
"locationType" : "AWS Region",
"instanceType" : "d2.8xlarge",
"currentGeneration" : "Yes",
"instanceFamily" : "Storage optimized",
"vcpu" : "36",
"physicalProcessor" : "Intel Xeon E5-2676v3 (Haswell)",
"clockSpeed" : "2.4 GHz",
"memory" : "244 GiB",
"storage" : "24 x 2000 HDD",
"networkPerformance" : "10 Gigabit",
"processorArchitecture" : "64-bit",
"tenancy" : "Host",
"operatingSystem" : "Linux",
"licenseModel" : "No License required",
"usagetype" : "HostBoxUsage:d2.8xlarge",
"operation" : "RunInstances",
"enhancedNetworkingSupported" : "Yes",
"preInstalledSw" : "NA",
"processorFeatures" : "Intel AVX; Intel AVX2; Intel Turbo" }
}
}
}}]}}
The mapping it creates is shown below, with clockSpeed as a text type.
curl -XGET localhost:9200/productsapp/productdocs/_mapping?pretty=true
{
"productsapp" : {
"mappings" : {
"productdocs" : {
"properties" : {
"products" : {
"properties" : {
"232CDFDW89ENUXRB" : {
"properties" : {
"attributes" : {
"properties" : {
"clockSpeed" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"currentGeneration" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"enhancedNetworkingSupported" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"instanceFamily" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"instanceType" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"licenseModel" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"location" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"locationType" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"memory" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"networkPerformance" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"operatingSystem" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"operation" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"physicalProcessor" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"preInstalledSw" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"processorArchitecture" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"processorFeatures" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"servicecode" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"storage" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"tenancy" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"usagetype" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"vcpu" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
}
}
},
"productFamily" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"sku" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
}
}
}
}
}
}
}
}
}
}
Can you check your mapping for attributes.clockSpeed and make sure it is not broken?
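If the mapping response has been loaded into a Python dict, the check can be done by walking the nested "properties" entries. The helper name field_type and the trimmed sample mapping are assumptions for illustration; the sample only mirrors the shape of the output above.

```python
def field_type(mapping_properties, dotted_path):
    """Walk a mapping's "properties" tree and return the field's type.

    Returns None when any part of the path is missing.
    """
    node = {"properties": mapping_properties}
    for part in dotted_path.split("."):
        node = node.get("properties", {}).get(part)
        if node is None:
            return None
    return node.get("type")


# Trimmed-down sample shaped like the mapping output above.
sample = {
    "products": {
        "properties": {
            "sku1": {
                "properties": {
                    "attributes": {
                        "properties": {
                            "clockSpeed": {"type": "text"}
                        }
                    }
                }
            }
        }
    }
}

print(field_type(sample, "products.sku1.attributes.clockSpeed"))
```

A result other than the expected type, or None, would suggest the field was mapped differently or is absent.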
If you want to update the document, do an XPUT on the id of the first document (which is AVuhXdYYUiSguAb0FsSX).
In the following example, I am updating the sku field to "sku name updated":
curl -XPUT localhost:9200/productsapp/productdocs/AVuhXdYYUiSguAb0FsSX -d '
{
"products" : {
"sku1" : {
"sku" : "sku name updated",
"productFamily" : "Compute Instance",
"attributes" : {
"servicecode" : "AmazonEC2",
"location" : "US East (N. Virginia)",
"locationType" : "AWS Region",
"instanceType" : "d2.8xlarge",
"currentGeneration" : "Yes",
"instanceFamily" : "Storage optimized",
"vcpu" : "36",
"physicalProcessor" : "Intel Xeon E5-2676v3 (Haswell)",
"clockSpeed" : "2.4 GHz",
"memory" : "244 GiB",
"storage" : "24 x 2000 HDD",
"networkPerformance" : "10 Gigabit",
"processorArchitecture" : "64-bit",
"tenancy" : "Host",
"operatingSystem" : "Linux",
"licenseModel" : "No License required",
"usagetype" : "HostBoxUsage:d2.8xlarge",
"operation" : "RunInstances",
"enhancedNetworkingSupported" : "Yes",
"preInstalledSw" : "NA",
"processorFeatures" : "Intel AVX; Intel AVX2; Intel Turbo"
}
}
}}'
{"_index":"productsapp","_type":"productdocs","_id":"AVu5OLfHPw6Pv_3O38-V","_version":2,"result":"updated","_shards":{"total":2,"successful":1,"failed":0},"created":false}