I am trying to learn Elasticsearch, and I am using Kibana to visualise things. I cannot figure out what is wrong with my mapping and queries, though.
I am trying to store photo metadata (IPTC data), and I have the following mapping for it:
{
"settings": {
"index": {
"analysis": {
"filter": {},
"analyzer": {
"keyword_analyzer": {
"filter": [
"lowercase",
"asciifolding",
"trim"
],
"char_filter": [],
"type": "custom",
"tokenizer": "keyword"
},
"edge_ngram_analyzer": {
"filter": [
"lowercase"
],
"tokenizer": "edge_ngram_tokenizer"
},
"edge_ngram_search_analyzer": {
"tokenizer": "lowercase"
}
},
"tokenizer": {
"edge_ngram_tokenizer": {
"type": "edge_ngram",
"min_gram": 2,
"max_gram": 5,
"token_chars": [
"letter"
]
}
}
}
}
},
"mappings": {
"doc": {
"properties": {
"photo_added": {
"type": "date",
"index": true,
"format": "yyyy-MM-dd' 'H:m:s"
},
"photo_id": {
"type": "long",
"index": true
},
"photo_owner": {
"type": "long",
"index": true
},
"project": {
"type": "long",
"index": true
},
"iptc": {
"type": "nested",
"properties": {
"caption/abstract": {
"type": "text",
"index": true
},
"copyright notice": {
"type": "text",
"index": true
},
"keywords": {
"type": "text",
"index": true,
"fields": {
"keywordstring": {
"type": "text",
"analyzer": "keyword_analyzer"
},
"edgengram": {
"type": "text",
"analyzer": "edge_ngram_analyzer",
"search_analyzer": "edge_ngram_search_analyzer"
},
"completion": {
"type": "completion"
},
"keyword": {
"type": "keyword"
}
}
},
"object name": {
"type": "text",
"index": true
},
"province/state": {
"type": "text",
"index": true
},
"sub-location": {
"type": "text",
"index": true
},
"time created": {
"type": "text",
"index": true
},
"urgency": {
"type": "text",
"index": true
},
"writer/editor": {
"type": "text",
"index": true
}
}
}
}
}
}
}
The thing is: I want a query that searches through the keywords and caption for the search text. Whenever the search text is found within the keywords, the score should be boosted, because that indicates the photo is of higher relevance. So I formulated the following query (where value is the search text):
GET /photos/_search
{
"query": {
"dis_max": {
"queries": [
{
"fuzzy": {
"iptc.keywords": {
"value": "value",
"fuzziness": 1,
"boost": 1
}
}
},
{
"fuzzy": {
"iptc.caption/abstract": {
"value": "value",
"fuzziness": 1
}
}
}
]
}
}
}
However, it does not find any matches even though the value is in the documents. I also cannot construct a simple match query that matches the exact text. For example:
GET /photos/doc/_search?error_trace=true
{
"query": {
"match": {
"iptc.caption/abstract": "exact value from one of the documents"
}
}
}
This returns 0 results, even though the search text is exactly in the document. I don't know what to do now. To make matters worse (I am nearly bald from the frustration this is causing me), Kibana seems to act up. I am almost sure it is something really simple (the document date is within 5 years), but when filtering for the exact copy-pasted value it returns 0 results, as shown in the screenshot.
I am going crazy here. Does someone know how to fix this, or what on earth I am doing wrong?
I found the solution in the Elasticsearch documentation:
Because nested documents are indexed as separate documents, they can only be accessed within the scope of the nested query, the nested/reverse_nested aggregations, or nested inner hits.
So I constructed the following query, which works:
{
"query": {
"nested": {
"path": "iptc",
"query": {
"bool": {
"should": [
{
"dis_max": {
"queries": [
{
"fuzzy": {
"iptc.keywords": {
"value": "Feyenoord",
"boost": 1
}
}
},
{
"fuzzy": {
"iptc.caption/abstract": {
"value": "Feyenoord",
"fuzziness": 1
}
}
}
]
}
}
]
}
}
}
}
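As a rough sketch of the same idea, the request body can be assembled in Python so the search text and the keyword boost are parameters. The helper name nested_fuzzy_query and the boost of 2.0 are assumptions made for illustration (a boost of 1 is the default and does not raise keyword matches above caption matches); the field names come from the mapping above.

```python
import json


def nested_fuzzy_query(text, path="iptc", keyword_boost=2.0, fuzziness=1):
    """Build a nested dis_max query over the keywords and caption fields.

    keyword_boost=2.0 is an illustrative assumption, not a tuned value.
    """
    def fuzzy(field, boost=None):
        params = {"value": text, "fuzziness": fuzziness}
        if boost is not None:
            params["boost"] = boost
        # Fields inside a nested object are addressed by their full path
        return {"fuzzy": {f"{path}.{field}": params}}

    return {
        "query": {
            "nested": {
                "path": path,
                "query": {
                    "dis_max": {
                        "queries": [
                            fuzzy("keywords", boost=keyword_boost),
                            fuzzy("caption/abstract"),
                        ]
                    }
                },
            }
        }
    }


body = nested_fuzzy_query("Feyenoord")
print(json.dumps(body, indent=2))
```

The printed JSON can be pasted into the Kibana console as the body of GET /photos/_search.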
How can I search for documents in Elasticsearch that have numeric field with value having decimal places?
My Mapping is as follows:
POST /itemnew/_doc
{
"mappings": {
"_doc": {
"properties": {
"name": {
"type": "string",
"analyzer": "edge_ngram_analyzer",
},
"purchase_price": {
"type": "double"
},
"sale_price": {
"type": "double"
},
"sku": {
"type": "string",
},
"settings": {
"index": {
"analysis": {
"filter": {
"autocomplete_filter": {
"type": "edge_ngram",
}
},
"analyzer": {
"ngram_analyzer": {
"tokenizer": "standard",
}
Sample document is as follows:
PUT itemnew/_doc/3
{
"company_id":"4510339694428161" ,
"item_type": "goods",
"name":"Apple sam" ,
"purchase_price":"45.50" ,
"sale_price":"50",
"sku": "sku 123"
}
I get a NumberFormatException when I try the following query:
GET itemnew/_search
{
"query": {
"bool": {
"must": [
{
"multi_match": {
"query": "45.5",
"fields": [
"name",
"purchase_price",
"sale_price",
"sku"
],
"type": "most_fields"
}
}
]
}
}
}
How can I search for documents in Elasticsearch that have a numeric field with a value having decimal places? Please help me solve this issue. Thank you.
You can use the top-level lenient parameter of the multi_match query here. Setting lenient to true simply ignores exceptions caused by format failures.
lenient (Optional, Boolean) If true, format-based errors, such as
providing a text query value for a numeric field, are ignored.
Defaults to false.
Adding a working example
Index Mapping:
PUT testidx
{
"settings": {
"analysis": {
"filter": {
"autocomplete_filter": {
"type": "edge_ngram"
}
},
"analyzer": {
"ngram_analyzer": {
"tokenizer": "standard"
}
}
}
},
"mappings": {
"properties": {
"name": {
"type": "text",
"analyzer": "ngram_analyzer"
},
"purchase_price": {
"type": "double"
},
"sale_price": {
"type": "double"
},
"sku": {
"type": "text"
}
}
}
}
Index Data:
PUT testidx/_doc/1
{
"company_id": "4510339694428161",
"item_type": "goods",
"name": "Apple sam",
"purchase_price": "45.50",
"sale_price": "50",
"sku": "sku 123"
}
Search Query:
POST testidx/_search
{
"query": {
"bool": {
"must": [
{
"multi_match": {
"query": "hello",
"fields": [
"name",
"purchase_price",
"sale_price",
"sku"
],
"lenient": true,
"type": "most_fields"
}
}
]
}
}
}
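A minimal sketch of building the same request body in Python, assuming the field names from the mapping above. The helper name lenient_multi_match is hypothetical; it only assembles the JSON and does not talk to a cluster.

```python
import json


def lenient_multi_match(text, fields, match_type="most_fields"):
    """Return a search body whose multi_match tolerates format errors."""
    return {
        "query": {
            "multi_match": {
                "query": text,
                "fields": list(fields),
                "type": match_type,
                # Ignore format-based errors, e.g. text sent to a double field
                "lenient": True,
            }
        }
    }


body = lenient_multi_match(
    "45.5", ["name", "purchase_price", "sale_price", "sku"]
)
print(json.dumps(body, indent=2))
```

The same body works for a non-numeric search text such as "hello", because the numeric fields that cannot parse it are skipped instead of raising an exception.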
I tried to debug my synonym search. It seems that when I use the WordNet format with the wn_s.pl file it doesn't work, but when I use a custom synonym.txt file it works. Please let me know what I am doing wrong. My index is below:
{
"settings": {
"index": {
"analysis": {
"filter": {
"synonym": {
"type": "synonym",
"format": "wordnet",
"synonyms_path": "analysis/wn_s.pl"
}
},
"analyzer": {
"synonym": {
"tokenizer": "standard",
"filter": ["lowercase",
"synonym"
]
}
},
"mappings": {
"properties": {
"firebaseId": {
"type": "text"
},
"name": {
"fielddata": true,
"type": "text",
"analyzer": "standard"
},
"name_auto": {
"type": "text"
},
"category_name": {
"type": "text",
"analyzer": "synonym"
},
"sku": {
"type": "text"
},
"price": {
"type": "text"
},
"magento_id": {
"type": "text"
},
"seller_id": {
"type": "text"
},
"square_item_id": {
"type": "text"
},
"square_variation_id": {
"type": "text"
},
"typeId": {
"type": "text"
}
}
}
}
}
}
}
I am trying to do a synonym search on category_name. I have items like shoes, dresses, etc. When I search for boots, flipflop or slipper, nothing comes back.
Here is my search query:
{
"query": {
"match": {
"category_name": "flipflop"
}
}
}
Your WordNet synonym format is not correct. Please have a look here.
For a fast implementation, please look at the synonyms.json.
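For reference, each fact in a WordNet prolog file has the shape s(synset_id,w_num,'word',ss_type,sense_number,tag_count). and words that share a synset_id are synonyms. The sketch below checks lines against that shape and groups them into comma-separated lines of the kind a custom synonym.txt uses. The function name wordnet_to_solr is hypothetical, and the sample facts are made up for illustration; they are not real WordNet entries.

```python
import re
from collections import defaultdict

# One WordNet prolog fact:
#   s(synset_id,w_num,'word',ss_type,sense_number,tag_count).
S_FACT = re.compile(r"^s\((\d+),\d+,'(.*)',[a-z],\d+,\d+\)\.$")


def wordnet_to_solr(lines):
    """Group words by synset and emit one comma-separated line per group."""
    synsets = defaultdict(list)
    for line in lines:
        match = S_FACT.match(line.strip())
        if not match:
            continue  # skip anything that is not a well-formed s(...) fact
        synset_id, word = match.groups()
        # Prolog escapes a single quote inside a word by doubling it
        word = word.replace("''", "'").lower()
        if word not in synsets[synset_id]:
            synsets[synset_id].append(word)
    # A synset with a single word carries no synonym information
    return [", ".join(words) for words in synsets.values() if len(words) > 1]


# Illustrative facts only, not taken from the real wn_s.pl
sample = [
    "s(100000001,1,'shoe',n,1,0).",
    "s(100000001,2,'boot',n,1,0).",
    "s(100000001,3,'slipper',n,1,0).",
    "s(100000002,1,'dress',n,1,0).",
]
print(wordnet_to_solr(sample))
```

Running the file through a check like this is one way to find lines that do not follow the expected shape before pointing synonyms_path at it.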
I have an index with 3 different types of content: ['media', 'group', 'user'], and I need to search all three at the same time, while requiring some extra conditions that one of them must satisfy before being added to the results list.
Here is my current index data:
{
"settings": {
"analysis": {
"filter": {
"nGram_filter": {
"type": "nGram",
"min_gram": 2,
"max_gram": 20,
"token_chars": [
"letter",
"digit",
"punctuation",
"symbol"
]
}
},
"analyzer": {
"nGram_analyzer": {
"type": "custom",
"tokenizer": "whitespace",
"filter": [
"lowercase",
"asciifolding",
"nGram_filter"
]
},
"whitespace_analyzer": {
"type": "custom",
"tokenizer": "whitespace",
"filter": [
"lowercase",
"asciifolding"
]
}
}
}
},
"mappings": {
"media": {
"_all": {
"analyzer": "nGram_analyzer",
"search_analyzer": "whitespace_analyzer"
},
"properties": {
"UID": {
"type": "integer",
"include_in_all": false
},
"addtime": {
"type": "integer",
"include_in_all": false
},
"title": {
"type": "string",
"index": "not_analyzed"
}
}
},
"group": {
"_all": {
"analyzer": "nGram_analyzer",
"search_analyzer": "whitespace_analyzer"
},
"properties": {
"UID": {
"type": "integer",
"include_in_all": false
},
"name": {
"type": "string",
"index": "not_analyzed"
},
"desc": {
"type": "string",
"include_in_all": false
}
}
},
"user": {
"_all": {
"analyzer": "nGram_analyzer",
"search_analyzer": "whitespace_analyzer"
},
"properties": {
"addtime": {
"type": "integer",
"include_in_all": false
},
"username": {
"type": "string"
}
}
}
}
}
So currently I can search the whole index with
{
"query": {
"match": {
"_all": {
"query": "foo",
"operator": "and"
}
}
}
}
and get the results for media, groups or users containing the word "foo", which is great, but I need to remove from the results all media that the current user does not own. So I guess I need a bool query where I set the "must" clause and match the 'UID' field against the current user's ID.
My problem is how to do this, and how to specify that the filter applies to just one type while leaving the others untouched.
I haven't been able to find an answer in the Elasticsearch documentation.
In the end I was able to accomplish this by following Andrei's comments. I know it is not perfect, since I had to add a should clause with the types "group" and "user", but it fits my design well because I need to put more filters on those too. Be advised that the search will end up being slower.
curl -X GET 'http://localhost:9200/foo/_search' -d '
{
"query": {
"bool" :
{
"must" :
{
"query" : {
"match" :
{
"_all":
{
"query" : "test"
}
}
}
},
"filter":
{
"bool":
{
"should":
[{
"bool" : {
"must":
[{
"type":
{
"value": "media"
}
},
{
"bool":
{
"should" : [
{ "term" : {"UID" : 2}},
{ "term" : {"type" : "public"}}
]
}
}]
}
},
{
"bool" : {
"should" : [
{ "type" : {"value" : "group"}},
{ "type" : {"value" : "user"}}
]
}
}]
}
}
}
}
}'
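The shape of that filter can be sketched in Python so the user ID and type names are parameters. This is a simplified illustration, not the exact query above: the helper names are hypothetical, the "public" alternative is left out, and the match clause is placed directly under must, which is an assumption about the target Elasticsearch version.

```python
import json


def owner_scoped_filter(user_id, restricted_type="media",
                        open_types=("group", "user")):
    """Restrict one type to documents owned by user_id; pass other types."""
    return {
        "bool": {
            "should": [
                # Branch 1: media documents, but only those owned by the user
                {
                    "bool": {
                        "must": [
                            {"type": {"value": restricted_type}},
                            {"term": {"UID": user_id}},
                        ]
                    }
                },
                # Remaining branches: every other type, unrestricted
                *({"type": {"value": t}} for t in open_types),
            ]
        }
    }


def search_body(text, user_id):
    """Full-text match on _all combined with the per-type filter."""
    return {
        "query": {
            "bool": {
                "must": {"match": {"_all": {"query": text}}},
                "filter": owner_scoped_filter(user_id),
            }
        }
    }


print(json.dumps(search_body("test", 2), indent=2))
```

Further per-type conditions for "group" or "user" could be added by turning their branches into bool/must pairs like the first one.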
In my application, users belong to a list of roles and objects have a list of roles associated with them to determine visibility. I'm trying to create a query that ensures the user belongs to at least one of the groups that is required by the object.
Here is my index configuration:
{
"settings": {
"analysis": {
"filter": {
"nGram_filter": {
"type": "nGram",
"min_gram": 2,
"max_gram": 12,
"token_chars": []
}
},
"analyzer": {
"nGram_analyzer": {
"type": "custom",
"tokenizer": "whitespace",
"filter": [
"lowercase",
"asciifolding",
"nGram_filter"
]
},
"whitespace_analyzer": {
"type": "custom",
"tokenizer": "whitespace",
"filter": [
"lowercase",
"asciifolding"
]
}
}
}
},
"mappings": {
"team" : {
"dynamic": "strict",
"properties" : {
"id": {
"type": "string",
"index": "not_analyzed"
},
"object": {
"type": "string",
"index": "not_analyzed"
},
"roles": {
"type": "string",
"index": "not_analyzed"
},
"name": {
"type": "string",
"index_analyzer": "nGram_analyzer",
"search_analyzer": "whitespace_analyzer"
},
"text": {
"type": "string",
"index_analyzer": "nGram_analyzer",
"search_analyzer": "whitespace_analyzer"
}
}
}
}
}
Here is some sample data that I have indexed:
(verified via localhost:9200/index/_search?q=name:employee_1&pretty)
{
"id":"lsJ17K4sgQVfd",
"roles": ["OwnerslsJ17K21px6VX","AdminslsJ17K21px6VX"],
"object":"contact",
"name":"employee_1",
"text":"lsJ17K4sgQVfd employee_1 employee_1 employee_1#lsj17k1nysk75.com"
}
Here is my query that I am trying to execute to find that same contact:
{
"_source": ["id", "object", "name"],
"size": 30,
"query": {
"filtered": {
"query": {
"bool": {
"should": {
"multi_match": {
"query": "employee_1",
"type": "cross_fields",
"operator": "or",
"fields": ["name^2", "text"],
"minimum_should_match": "50%",
"fuzziness": "AUTO"
}
},
...,
"minimum_should_match": 1
}
},
"filter": {
"terms": {
"roles": [ "AdminslsJ17K21px6VX", "lsJ17K3gHCH4P" ]
}
}
}
},
"suggest": {
"text": "employee_1",
"text_suggestion": {
"term": {
"size": 3,
"field": "name",
"sort": "score",
"suggest_mode": "missing",
"prefix_length": 1
}
}
}
}
If I remove the filter clause then I get results, but as soon as I add it back everything gets filtered out again. What is the right way to express that I want the results to have at least one role in common?
The query above works as expected. The only problem was that my test case was executing before the index was fully populated. Adding a short wait before making my first query solved the problem.
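Newly indexed documents only become searchable after the index refreshes, so a fixed sleep can be replaced by polling until the expected document shows up. Below is a minimal, generic sketch of such a polling helper. The name wait_until and the fake check function are assumptions for illustration; in a real test the check would run the search and return whether it has hits. Calling the index's _refresh endpoint before the first query is another option.

```python
import time


def wait_until(check, timeout=5.0, interval=0.1):
    """Call check() repeatedly until it returns True or timeout elapses."""
    deadline = time.monotonic() + timeout
    while True:
        if check():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


# Stand-in for "does the search return the indexed contact yet?"
attempts = {"n": 0}


def fake_search_has_hits():
    attempts["n"] += 1
    return attempts["n"] >= 3  # pretend the document appears on the 3rd try


print(wait_until(fake_search_has_hits, timeout=2.0, interval=0.01))
```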
Consider a very simple model where we have locations and each location can have zero or more events. A location would have properties such as name, description and geo point data (lon/lat). An event should be attached to one location (its parent) and should have a name and description.
{
"location" : {
"properties": {
"name": { "type": "string", "boost": 2.0, "analyzer": "snowball" },
"description": { "type": "string", "analyzer": "snowball" },
"geo": { "type": "geo_point" },
"exhibits": {
"type": "nested",
"properties": {
"name": { "type": "string", "boost": 2.0, "analyzer": "snowball" },
"description": { "type": "string", "analyzer": "snowball" }
}
}
}
}
}
What I want to do is query for the child documents (events), performing a full-text search on their names and descriptions. I would like to get the matching events back and also be able to get their parent location's name. I would also like to narrow down the result set by the location's coordinates. I don't want to get any events that do not match the query. Is that possible in Elasticsearch? What types of queries should I use?
I have tried putting events as an array property under location (see above) and using the nested query but it does not return the kind of results I want (I think it returns the whole location, including all events, even the ones that do not match my query). I have tried putting events into a separate index (mapping?) providing the _parent property and then performing the top_children query on locations, but I don't get any results.
{
"exhibit": {
"_parent": { "type": "locations" },
"properties": {
"name": { "type": "string", "boost": 2.0, "analyzer": "snowball" },
"description": { "type": "string", "analyzer": "snowball" }
}
}
}
Could anyone shed some light? I don't know where to begin...
Here's the working solution to my problem; perhaps it will be useful to somebody.
Location mapping:
{
"location" : {
"properties": {
"name": { "type": "string", "boost": 2.0, "analyzer": "snowball" },
"description": { "type": "string", "analyzer": "snowball" },
"geo": { "type": "geo_point" }
}
}
}
Exhibit mapping:
{
"exhibit": {
"_parent": { "type": "locations" },
"properties": {
"name": { "type": "string", "boost": 2.0, "analyzer": "snowball" },
"description": { "type": "string", "analyzer": "snowball" }
}
}
}
Query:
{
"fields": [ "_parent", "name", "_source" ],
"query": {
"bool": {
"should": [
{ "text": { "name": "candy" } },
{ "text": { "description": "candy" } }
]
}
},
"filter": {
"and": [
{
"terms" : {
"_parent": [ "4e7089a9b97d640b30695b7a", "4e7089eeb97d640b30695b7b" ]
}
},
{ "range": { "start": { "lte": "2011-09-22" } } },
{ "range": { "end": { "gte": "2011-09-22" } } }
]
}
}
You should query using the _parent field and passing it an array of IDs of locations to which you want to limit the exhibits.
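A small sketch of assembling that request in Python, assuming the location IDs were obtained beforehand (for example from a separate geo query on locations, which is not shown). The helper name exhibits_query is hypothetical. The text query and the and filter are kept exactly as in the answer; they are legacy names that later Elasticsearch releases replaced with match and bool.

```python
import json


def exhibits_query(text, location_ids, on_date):
    """Search exhibits by text, limited to given parent locations and a date."""
    return {
        "fields": ["_parent", "name", "_source"],
        "query": {
            "bool": {
                "should": [
                    {"text": {"name": text}},
                    {"text": {"description": text}},
                ]
            }
        },
        "filter": {
            "and": [
                # Only exhibits whose parent is one of the chosen locations
                {"terms": {"_parent": list(location_ids)}},
                # Exhibit must be running on the given date
                {"range": {"start": {"lte": on_date}}},
                {"range": {"end": {"gte": on_date}}},
            ]
        },
    }


body = exhibits_query(
    "candy",
    ["4e7089a9b97d640b30695b7a", "4e7089eeb97d640b30695b7b"],
    "2011-09-22",
)
print(json.dumps(body, indent=2))
```

Because each hit carries its _parent value, the parent location's name can be looked up from the list of locations fetched in the first step.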