Elasticsearch: do exact searches where the query contains special characters like '#' - elasticsearch

Get the results of only those documents which contain '#test' and ignore the documents that contain just 'test' in elasticsearch

People may gripe at you about this question, so I'll note that it was in response to my comment on this post.
You're probably going to want to read up on analysis in Elasticsearch, as well as match queries versus term queries.
Anyway, the convention here is to use a .raw sub-field on a string field. That way, if you want to do searches involving analysis, you can use the base field, but if you want to search for exact (un-analyzed) values, you can use the sub-field.
So here is a simple mapping that accomplishes this:
PUT /test_index
{
"mappings": {
"doc": {
"properties": {
"post_text": {
"type": "string",
"fields": {
"raw": {
"type": "string",
"index": "not_analyzed"
}
}
}
}
}
}
}
Now if I add these two documents:
PUT /test_index/doc/1
{
"post_text": "#test"
}
PUT /test_index/doc/2
{
"post_text": "test"
}
A "match" query against the base field will return both:
POST /test_index/_search
{
"query": {
"match": {
"post_text": "#test"
}
}
}
...
{
"took": 2,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"failed": 0
},
"hits": {
"total": 2,
"max_score": 0.5945348,
"hits": [
{
"_index": "test_index",
"_type": "doc",
"_id": "1",
"_score": 0.5945348,
"_source": {
"post_text": "#test"
}
},
{
"_index": "test_index",
"_type": "doc",
"_id": "2",
"_score": 0.5945348,
"_source": {
"post_text": "test"
}
}
]
}
}
But the "term" query below will only return the one:
POST /test_index/_search
{
"query": {
"term": {
"post_text.raw": "#test"
}
}
}
...
{
"took": 2,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"failed": 0
},
"hits": {
"total": 1,
"max_score": 1,
"hits": [
{
"_index": "test_index",
"_type": "doc",
"_id": "1",
"_score": 1,
"_source": {
"post_text": "#test"
}
}
]
}
}
Here is the code I used to test it:
http://sense.qbox.io/gist/2f0fbb38e2b7608019b5b21ebe05557982212ac7

Related

Normalizing keyword field: ascii should match diacritic, but not vice versa

I have a keyword field that can contain characters with diacritics. Queries without diacritics should return results with those diacritics, but not vice versa. The first part can be resolved by using a normalizer, the configuration for which is also described in a related question. If I use that for e.g. {"title": "Sulgi"} and {"title": "Šulgi"}, searching for "Sulgi" will (correctly) return both documents. However, searching for "Šulgi" also returns both documents, instead of just the one with the diacritic. It seems ES is also normalizing the query input, which is generally good, but is it possible to change that behavior?
PUT _template/test
{
"index_patterns": ["*"],
"settings": {
"analysis": {
"normalizer": {
"exact": {
"type": "custom",
"filter": [
"lowercase",
"asciifolding"
]
}
}
}
},
"mappings": {
"properties": {
"title": {
"type": "keyword",
"normalizer": "exact"
}
}
}
}
POST test/_doc/1
{
"title": "Sulgi"
}
POST test/_doc/2
{
"title": "Šulgi"
}
Example search query:
POST test/_search
{
"query": {
"term": {
"title":"Šulgi"
}
}
}
{
"took": 294,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 2,
"relation": "eq"
},
"max_score": 0.18232156,
"hits": [
{
"_index": "test",
"_type": "_doc",
"_id": "1",
"_score": 0.18232156,
"_source": {
"title": "Šulgi"
}
},
{
"_index": "test",
"_type": "_doc",
"_id": "2",
"_score": 0.18232156,
"_source": {
"title": "Sulgi"
}
}
]
}
}

sorting on aggregate of value in a given field in elasticsearch

I have the following field in my index
field1:{key:value}
Is it possible to sort my query on sum of values in field1.
Thanks
Here's one way you could do this, assuming you know the fields ahead of time. It should be possible with some minor refinements if you need to wildcard the fields. This assumes the sibling fields on the nested type are numeric.
Example mapping:
"test": {
"mappings": {
"type1": {
"properties": {
"field1": {
"properties": {
"key1": {
"type": "integer"
},
"key2": {
"type": "integer"
}
}
}
}
}
}
}
Default results:
"hits": {
"total": 2,
"max_score": 1,
"hits": [
{
"_index": "test",
"_type": "type1",
"_id": "AV8O7956gIcGI2d5A_5g",
"_score": 1,
"_source": {
"field1": {
"key1": 11,
"key2": 17
}
}
},
{
"_index": "test",
"_type": "type1",
"_id": "AV8O78FqgIcGI2d5A_5f",
"_score": 1,
"_source": {
"field1": {
"key1": 5,
"key2": 6
}
}
}
]
}
Query with script:
GET /test/_search
{
"query": {
"function_score": {
"query": {
"match_all": {}
},
"functions": [
{
"script_score": {
"script": "return (doc['field1.key1'].value + doc['field1.key2'].value) * -1"
}
}
]
}
}
}
Logic taking the lowest score as the best score (least negative in this case):
{
"took": 18,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"skipped": 0,
"failed": 0
},
"hits": {
"total": 2,
"max_score": -11,
"hits": [
{
"_index": "test",
"_type": "type1",
"_id": "AV8O78FqgIcGI2d5A_5f",
"_score": -11,
"_source": {
"field1": {
"key1": 5,
"key2": 6
}
}
},
{
"_index": "test",
"_type": "type1",
"_id": "AV8O7956gIcGI2d5A_5g",
"_score": -28,
"_source": {
"field1": {
"key1": 11,
"key2": 17
}
}
}
]
}
}
Hopefully this gives you the gist of whatever specific scoring logic you need

elasticsearch exact matching string include dot

I have the following indexed document:
curl -XGET "http://127.0.0.1:8200/logstash-test/1/_search"
{
"took": 1,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 4,
"max_score": 1,
"hits": [
{
"_index": "logstash-test",
"_type": "1",
"_id": "AVthzksHqNe69jLmmCEp",
"_score": 1,
"_source": {
"foo": "bar2"
}
},
{
"_index": "logstash-test",
"_type": "1",
"_id": "AVthzlbfqNe69jLmmCSr",
"_score": 1,
"_source": {
"foo": "bar3"
}
},
{
"_index": "logstash-test",
"_type": "1",
"_id": "AVthwg4_qNe69jLmlStd",
"_score": 1,
"_source": {
"foo": "bar"
}
},
{
"_index": "logstash-test",
"_type": "1",
"_id": "AVth0IS1qNe69jLmmMpZ",
"_score": 1,
"_source": {
"foo": "bar4.foo_bar.foo"
}
}
]
}
}
I want to search foo=bar2 or foo=ba3 or foo=bar4.foo_bar.foo
curl -XPOST "http://127.0.0.1:8200/logstash-test/1/_search" -d
'{"query":{"bool":{"filter":[{"terms":{"foo":["bar3","bar2","bar4.foo_bar.foo"]}}]}}}'
But bar4.foo_bar.foo do not match.
Thank you.
As you are searching on exact terms use keyword field available on foo field as shown below:
curl -XPOST "http://127.0.0.1:8200/logstash-test/1/_search" -d
'{
"query": {
"bool": {
"filter": [
{
"terms": {
"foo.keyword": [
"bar3",
"bar2",
"bar4.foo_bar.foo"
]
}
}
]
}
}
}'
You can read more about multi-fields here
Method #2
You can solve it by using different analyzer( e.g whitespace analyzer) for foo field while defining mapping for it.
PUT logstash-test
{
"mappings": {
"1": {
"properties": {
"foo": {
"type": "text",
"analyzer": "whitespace"
}
}
}
}
}
But as your are searching on exact terms method #1 is preferred over method #2

What is the query required for fetching full-text with delimiter in elasticsearch

Assuming I have a document like this in elasticSearch :
{
"videoName": "taylor.mp4",
"type": "long"
}
I tried full-text search using the DSL query:
{
"query": {
"match":{
"videoName": "taylor"
}
}
}
I need to get the above document, but I don't get it .If I specify taylor.mp4, it returns the document.
So, I would like to know, how to make full-text search with delimiters.
Edit after KARTHEEK answer:
The regexp fetches the taylor.mp4 document. Take the situation, where the document in video index are:
{
"videoName": "Akon - smack that.mp4",
"type": "long"
}
So, the query for retrieving this document can be ,
{
"query": {
"match":{
"videoName": "smack that"
}
}
}
In this case, the document will be retrieved, since we use smack in the query string. match does the full-text search and gets us the document. But, say I only know the that keyword and the match, doesn't get the document. I need to use regexp for that.
{
"query": {
"regexp":{
"videoName": "smack.* that.*"
}
}
}
On the Other hand, if i take up regexp and make all my query strings to smack.* that.*, this will also not retrieve any documents. And, we dont know which word will have its suffix .mp4. So, my question is we need to do the full-text search with match, and it should also detect the delimiters. Is there any other way ?
Edit after Richa asked the mapping of index
for http://localhost:9200/example/videos/_mapping
{
"example": {
"mappings": {
"videos": {
"properties": {
"query": {
"properties": {
"match": {
"properties": {
"videoName": {
"type": "string"
}
}
}
}
},
"type": {
"type": "string"
},
"videoName": {
"type": "string"
}
}
}
}
}
}
Depending upon above query you mentioned right we can use regular expression in order get the result.Please find attached result for your perusal and let me know if there are anything else you want.
curl -XGET "http://localhost:9200/test/sample/_search" -d'
{
"query": {
"regexp":{
"videoName": "taylor.*"
}
}
}'
Result:
{
"took": 22,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 1,
"max_score": 1,
"hits": [
{
"_index": "test",
"_type": "sample",
"_id": "1",
"_score": 1,
"_source": {
"videoName": "taylor.mp4",
"type": "long"
}
}
]
}
}
Please use this mapping
PUT /test_index
{
"settings": {
"number_of_shards": 1
},
"mappings": {
"doc": {
"properties": {
"videoName": {
"type": "string",
"term_vector": "yes"
}
}
}
}
}
After that you need to index a document that you mentioned earlier:
PUT test_index/doc/1
{
"videoName": "Akon - smack that.mp4",
"type": "long"
}
Output:
{
"took": 1,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"failed": 0
},
"hits": {
"total": 1,
"max_score": 0.15342641,
"hits": [
{
"_index": "test_index",
"_type": "doc",
"_id": "1",
"_score": 0.15342641,
"_source": {
"videoName": "Akon - smack that.mp4",
"type": "long"
}
}
]
}
}
Query to get results:
GET /test_index/doc/1/_termvector?fields=videoName
Results:
{
"_index": "test_index",
"_type": "doc",
"_id": "1",
"_version": 1,
"found": true,
"took": 1,
"term_vectors": {
"videoName": {
"field_statistics": {
"sum_doc_freq": 3,
"doc_count": 1,
"sum_ttf": 3
},
"terms": {
"akon": {
"term_freq": 1
},
"smack": {
"term_freq": 1
},
"that.mp4": {
"term_freq": 1
}
}
}
}
}
By using this we will search based on "smack"
POST /test_index/_search
{
"query": {
"match": {
"_all": "smack"
}
}
}
Result:
{
"took": 1,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"failed": 0
},
"hits": {
"total": 1,
"max_score": 0.15342641,
"hits": [
{
"_index": "test_index",
"_type": "doc",
"_id": "1",
"_score": 0.15342641,
"_source": {
"videoName": "Akon - smack that.mp4",
"type": "long"
}
}
]
}
}

how to index for specific fields of documents using elasticsearch

My requirement is to store specific fields of document to index in elasticsearch.
Example:
My document is
{
"name":"stev",
"age":26,
"salary":25000
}
This is my document but i don't want indexing total document.I want store only name field.
I created one index emp and write mapping like below
"person" : {
"_all" : {"enabled" : false},
"properties" : {
"name" : {
"type" : "string", "store" : "yes"
}
}
}
When see the index document
{
"took": 1,
"timed_out": false,
"_shards": {
"total": 2,
"successful": 2,
"failed": 0
},
"hits": {
"total": 2,
"max_score": 1,
"hits": [
{
"_index": "test",
"_type": "test",
"_id": "AU1_p0xAq8r9iH00jFB_",
"_score": 1,
"_source": { }
}
,
{
"_index": "test",
"_type": "test",
"_id": "AU1_lMDCq8r9iH00jFB-",
"_score": 1,
"_source": { }
}
]
}
}
name fields is not generated,Why?
any one help to me
It's hard to tell what you're doing wrong from what you posted, but I can give you an example that works.
Elasticsearch will, by default, index whatever source documents you give it. Every time it sees a new document field, it will create a mapping field with sensible defaults, and it will index them by default as well. If you want to exclude fields, you can set "index": "no" and "store": "no" in the mapping for each field you want to exclude. If you want that behavior to be the default for every field, you can use the "_default_" property for specifying that fields not be stored (though I couldn't get it to work for not indexing).
You probably also will want to disable "_source", and use the "fields" parameter in your search queries.
Here is an example. The index definition looks like this:
PUT /test_index
{
"mappings": {
"person": {
"_all": {
"enabled": false
},
"_source": {
"enabled": false
},
"properties": {
"name": {
"type": "string",
"index": "analyzed",
"store": "yes"
},
"age": {
"type": "integer",
"index": "no",
"store": "no"
},
"salary": {
"type": "integer",
"index": "no",
"store": "no"
}
}
}
}
}
Then I can add a few documents with the bulk api:
POST /test_index/person/_bulk
{"index":{"_id":1}}
{"name":"stev","age":26,"salary":25000}
{"index":{"_id":2}}
{"name":"bob","age":30,"salary":28000}
{"index":{"_id":3}}
{"name":"joe","age":27,"salary":35000}
Since I disabled "_source", a simple query will return only ids:
POST /test_index/_search
...
{
"took": 1,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"failed": 0
},
"hits": {
"total": 3,
"max_score": 1,
"hits": [
{
"_index": "test_index",
"_type": "person",
"_id": "1",
"_score": 1
},
{
"_index": "test_index",
"_type": "person",
"_id": "2",
"_score": 1
},
{
"_index": "test_index",
"_type": "person",
"_id": "3",
"_score": 1
}
]
}
}
But if I specify that I want the "name" field, I'll get it:
POST /test_index/_search
{
"fields": [
"name"
]
}
...
{
"took": 1,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"failed": 0
},
"hits": {
"total": 3,
"max_score": 1,
"hits": [
{
"_index": "test_index",
"_type": "person",
"_id": "1",
"_score": 1,
"fields": {
"name": [
"stev"
]
}
},
{
"_index": "test_index",
"_type": "person",
"_id": "2",
"_score": 1,
"fields": {
"name": [
"bob"
]
}
},
{
"_index": "test_index",
"_type": "person",
"_id": "3",
"_score": 1,
"fields": {
"name": [
"joe"
]
}
}
]
}
}
You can prove to yourself that the other fields were not stored by running:
POST /test_index/_search
{
"fields": [
"name", "age", "salary"
]
}
which will return the same result. I can also prove that the "age" field wasn't indexed by running this query, which would return a document if "age" had been indexed:
POST /test_index/_search
{
"fields": [
"name", "age"
],
"query": {
"term": {
"age": {
"value": 27
}
}
}
}
...
{
"took": 1,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"failed": 0
},
"hits": {
"total": 0,
"max_score": null,
"hits": []
}
}
Here is a bunch of code I used for playing around with this. I wanted to use a _default mapping and/or field to handle this without having to specify the settings for each field. I was able to make it work in terms of not storing data, but each field was still indexed.
http://sense.qbox.io/gist/d84967923d6c0757dba5f44240f47257ba2fbe50

Resources