ElasticSearch search for part of url - elasticsearch

I'm working with ElasticSearch 5 and can't find a solution for the following:
I want to search for a string containing slashes (part of a URL) in a document, but the search returns no matching documents.
I've read that ES splits strings containing slashes into separate tokens, and that's not what I want for this field. I've tried to set "not_analyzed" on the field in the mapping, but I can't get it to work.
"Create index":
PUT http://localhost:9200/test
{
"settings" : {
"number_of_shards" : 1
},
"mappings" : {
"type1" : {
"properties" : {
"field1" : { "type" : "text","index": "not_analyzed" }
}
}
}
}
"Add document":POST http://localhost:9200/test/type1/
{
"field1" : "this/is/a/url/test"
}
"Search document" POST http://localhost:9200/test/type1/_search
{
"size" : 1000,
"query" : {
"bool" : {
"must" : [{
"term" : {
"field1" : {
"value" : "this/is/a/url/test"
}
}
}
]
}
}
}
Response:
{
"took": 1,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"failed": 0
},
"hits": {
"total": 0,
"max_score": null,
"hits": []
}
}
"The mapping response": GET http://localhost:9200/test/_mapping?pretty
{
"test": {
"mappings": {
"type1": {
"properties": {
"field1": {
"type": "text"
}
}
}
}
}
}

Using a term query to get an exact match is correct. However, your initial mapping is wrong: as the mapping response shows, field1 ended up as a plain analyzed text field.
"type" : "text", "index": "not_analyzed"
should be this instead:
"type": "keyword"
(Note: the keyword type in ES 5 is equivalent to a not_analyzed string in ES 2.x.)
You need to delete your index and re-create it with the corrected mapping. Then your term query will work.
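To see why the term query finds nothing on a text field, here is a minimal Python sketch. It is not Elasticsearch code: analyze_like_standard is a hypothetical, rough stand-in for the standard analyzer (split on non-alphanumeric characters, lowercase), and term_query_matches only models the idea that a term query compares the unanalyzed search value against the indexed terms.

```python
import re


def analyze_like_standard(text):
    """Rough stand-in for the standard analyzer (an approximation, not the real one):
    split on non-alphanumeric characters and lowercase each token."""
    return [tok.lower() for tok in re.findall(r"[A-Za-z0-9]+", text)]


def term_query_matches(indexed_terms, value):
    """Model of a term query: the search value is NOT analyzed,
    it is looked up as-is among the indexed terms."""
    return value in indexed_terms


url = "this/is/a/url/test"

# "text" field: the value is analyzed, so the index holds several short terms.
text_terms = analyze_like_standard(url)

# "keyword" field: the whole value is indexed as one single term.
keyword_terms = [url]

print(text_terms)
print(term_query_matches(text_terms, url))     # the full URL is not one of the terms
print(term_query_matches(keyword_terms, url))  # the full URL is the only term
```

With the text mapping, the index never contains the full string as a single term, so an exact lookup cannot succeed; with keyword it does.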

I suspect what you need is a match query, not a term query. A term query looks for a single exact term and does not run your search string through an analyzer, whereas a match query does.
{
"size" : 1000,
"query" : {
"bool" : {
"must" : [{
"match" : {
"field1" : "this/is/a/url/test"
}
}
]
}
}
}
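One thing to keep in mind with the match approach: the search string is analyzed too, and the default operator is OR, so a document only has to share one token with the query. The Python sketch below illustrates this under simplifying assumptions; analyze is a hypothetical approximation of the standard analyzer and match_query is a toy model, not the real scoring logic.

```python
import re


def analyze(text):
    """Approximation of the standard analyzer: split on non-alphanumerics, lowercase."""
    return [tok.lower() for tok in re.findall(r"[A-Za-z0-9]+", text)]


def match_query(indexed_terms, query_text, operator="or"):
    """Toy model of a match query: analyze the query text, then require
    any ("or") or all ("and") of its tokens to be present in the document."""
    hits = [term in indexed_terms for term in analyze(query_text)]
    return all(hits) if operator == "and" else any(hits)


doc_a = analyze("this/is/a/url/test")   # the document from the question
doc_b = analyze("another/test/path")    # hypothetical second document

query = "this/is/a/url/test"
print(match_query(doc_a, query))                  # matches: all tokens present
print(match_query(doc_b, query))                  # also matches: shares "test"
print(match_query(doc_b, query, operator="and"))  # no match when all tokens are required
```

So a match query on an analyzed field gives you "contains some of these words", not an exact match on the whole URL.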


Can't Highlight Dynamic Template Field Value in Elasticsearch

Follow up to this question.
I have a dynamic template which copies the text of a JSON blob to a single text field, and I'd like to search on that field and highlight matches. Here is my full code for ES 6.5
DELETE /test
PUT /test?include_type_name=true
{
"settings": {"number_of_shards": 1,"number_of_replicas": 1},
"mappings": {
"_doc": {
"dynamic_templates": [
{
"full_name": {
"match_mapping_type": "string",
"path_match": "content.*",
"mapping": {
"type": "text",
"copy_to": "content_text"
}
}
}
],
"properties": {
"content_text": {
"type": "text"
},
"content": {
"type": "object",
"enabled": "true"
}
}
}
}
}
PUT /test/_doc/1?refresh=true
{
"content": {
"a": {
"b": {
"text": "42"
}
}
}
}
GET /test/_search
{
"query": {
"match": {
"content_text": "42"
}
},
"highlight": {
"fields": {
"content_text": {}
}
}
}
The response does not show the highlighted content_text:
{
"took" : 0,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : 1,
"max_score" : 0.2876821,
"hits" : [
{
"_index" : "test",
"_type" : "_doc",
"_id" : "1",
"_score" : 0.2876821,
"_source" : {
"content" : {
"a" : {
"b" : {
"text" : "42"
}
}
}
}
}
]
}
}
As you can see, the content_text field is not highlighted. It's also not in the response at all. How do I get highlights for this field to show up?
This is a tricky one, but will make sense once you read what follows.
As per the official documentation on highlighting, the actual content of a field is required to exist somewhere. So if the field is not stored (i.e. the mapping does not set store to true), the actual _source is loaded and the relevant field is extracted from _source.
In your case, the content_text field doesn't exist in the _source document (i.e. it is just indexed from other text fields present in content.*) and in the mapping, the store parameter is not set to true (it is false by default).
So you simply need to change your mapping to this:
"content_text": {
"store": true,
"type": "text"
},
And then your query will yield this:
"highlight" : {
"content_text" : [
"<em>42</em>"
]
}
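If it helps, here is a toy Python model of that lookup. It is only a sketch of the behaviour described above, not how Elasticsearch is implemented; index_document, text_for_highlighting and the dictionary layout are all made up for illustration.

```python
def flatten(obj, prefix=""):
    """Yield (dotted_path, value) pairs for every leaf of a nested dict."""
    for key, value in obj.items():
        path = f"{prefix}{key}"
        if isinstance(value, dict):
            yield from flatten(value, path + ".")
        else:
            yield path, value


def index_document(source, copy_target, store):
    """Toy model of copy_to: every content.* leaf is copied into copy_target
    at index time. The _source itself is never modified."""
    copied = [str(v) for p, v in flatten(source) if p.startswith("content.")]
    return {
        "_source": source,                                  # what was sent in
        "indexed": {copy_target: copied},                   # what queries search
        "stored": {copy_target: copied} if store else {},   # only with store: true
    }


def text_for_highlighting(doc, field):
    """Prefer the stored field; otherwise fall back to the field in _source."""
    if field in doc["stored"]:
        return doc["stored"][field]
    return doc["_source"].get(field)


source = {"content": {"a": {"b": {"text": "42"}}}}
not_stored = index_document(source, "content_text", store=False)
stored = index_document(source, "content_text", store=True)

print(text_for_highlighting(not_stored, "content_text"))  # nothing to highlight
print(text_for_highlighting(stored, "content_text"))      # the copied text is available
```

The query still matches in both cases because the indexed terms exist; only the text needed to build the highlight is missing when the field is not stored.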

access query value from function_score to compute new score

I need to customize ES score. The score function I need to implement is:
score = len(document_term) - len(query_term)
For instance, one of the documents in my ES index is:
{
"name": "foobar"
}
And the search query
{
"query": {
"function_score": {
"query": {
"match": {
"name": {
"query": "foo"
}
}
},
"functions": [
{
"script_score": {
"script": {
"source": "doc['name'].value.length() - ?LEN(query_tem)?"
}
}
}
],
"boost_mode": "replace"
}
}
}
The above search should produce a score of 6 - 3 = 3, but I haven't found a way to access the value of the query term.
Is it possible to access the value of the query term in a function_score context?
There is no direct way to do this. However, you can achieve it by passing the query string in two different parts of the query: once in the match query and once as a script parameter.
Before that, one important note: you cannot use doc['myfield'].value if the field is of type text. Instead, you need a sibling sub-field of type keyword and must refer to that in the script, as shown below:
Mapping:
PUT myindex
{
"mappings" : {
"properties" : {
"myfield" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
}
}
}
}
Sample Document:
POST myindex/_doc/1
{
"myfield": "I've become comfortably numb"
}
Query:
POST <your_index_name>/_search
{
"query": {
"function_score": {
"query": {
"match": {
"myfield": "numb"
}
},
"functions": [
{
"script_score": {
"script": {
"source": "return doc['myfield.keyword'].value.length() - params.myquery.length()",
"params": {
"myquery": "numb" <---- Add the query string here as well
}
}
}
}
],
"boost_mode": "replace"
}
}
}
Response:
{
"took" : 558,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 1,
"relation" : "eq"
},
"max_score" : 24.0,
"hits" : [
{
"_index" : "myindex",
"_type" : "_doc",
"_id" : "1",
"_score" : 24.0,
"_source" : {
"myfield" : "I've become comfortably numb"
}
}
]
}
}
Hope this helps!
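Since the query string has to appear twice, it is easy for the two copies to drift apart. A small Python sketch that builds the request body from a single variable is shown below; the helper name length_difference_query is hypothetical, and the body is only printed, not sent to a cluster. The last lines reproduce the arithmetic behind the score of 24 in the response above.

```python
import json


def length_difference_query(field, keyword_field, query_text):
    """Build the function_score body, using query_text both in the match
    query and as the script parameter so the two cannot get out of sync."""
    script = (
        f"return doc['{keyword_field}'].value.length()"
        " - params.myquery.length()"
    )
    return {
        "query": {
            "function_score": {
                "query": {"match": {field: query_text}},
                "functions": [
                    {
                        "script_score": {
                            "script": {
                                "source": script,
                                "params": {"myquery": query_text},
                            }
                        }
                    }
                ],
                "boost_mode": "replace",
            }
        }
    }


body = length_difference_query("myfield", "myfield.keyword", "numb")
print(json.dumps(body, indent=2))

# Same arithmetic as the script, done locally for the sample document.
expected_score = len("I've become comfortably numb") - len("numb")
print(expected_score)
```

For plain ASCII text like this, Python's len gives the same result as the script's length(); for text outside the Basic Multilingual Plane the two can differ.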

Elasticsearch geospatial queries returning no hits

I'm using Kibana to look at a geospatial dataset in Elasticsearch for a feature currently under development. There is an index of positions which contains the field "loc.coordinates", which is a geo_point and holds data such as:
loc.coordinates 25.906958000000003, 51.776407000000006
However when I run the following query I get no results:
Query
GET /positions/_search
{
"query": {
"bool" : {
"must" : {
"match_all" : {}
},
"filter" : {
"geo_distance" : {
"distance" : "2000km",
"loc.coordinates" : {
"lat" : 25,
"lon" : 51
}
}
}
}
}
}
Response
{
"took": 12,
"timed_out": false,
"_shards": {
"total": 6,
"successful": 6,
"skipped": 0,
"failed": 0
},
"hits": {
"total": 0,
"max_score": null,
"hits": []
}
}
I'm trying to understand why this is, as there are over 250,000 data points in the index, and I'm getting no hits regardless of how big the search area is. When I look at the positions index mapping I see the following:
"loc": {
"type": "nested",
"properties": {
"coordinates": {
"type": "geo_point"
},
"type": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
}
}
},
I'm new to Elasticsearch and have been making my way through the documentation, but so far I don't see why my geo queries aren't working as expected. What am I doing wrong?
Your loc field is of type nested, so you need to query that field accordingly with a nested query:
GET /positions/_search
{
"query": {
"bool" : {
"filter" : {
"nested": {
"path": "loc",
"query": {
"geo_distance" : {
"distance" : "2000km",
"loc.coordinates" : {
"lat" : 25,
"lon" : 51
}
}
}
}
}
}
}
}
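As a sanity check that the radius is not the problem, and to show the shape of the nested wrapper, here is a small Python sketch. It assumes the sample value in the question is given as latitude, longitude, uses a mean Earth radius of 6371 km, and the helper names haversine_km and nested_geo_distance are made up for illustration. Nothing is sent to Elasticsearch.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2, radius_km=6371.0):
    """Great-circle distance between two points, in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(a))


def nested_geo_distance(path, field, distance, lat, lon):
    """Build a geo_distance filter wrapped in a nested query for `path`."""
    return {
        "query": {
            "bool": {
                "filter": {
                    "nested": {
                        "path": path,
                        "query": {
                            "geo_distance": {
                                "distance": distance,
                                f"{path}.{field}": {"lat": lat, "lon": lon},
                            }
                        },
                    }
                }
            }
        }
    }


# Sample point from the question, assumed to be (lat, lon).
distance_km = haversine_km(25, 51, 25.906958, 51.776407)
print(distance_km < 2000)  # the point is well inside the search radius

body = nested_geo_distance("loc", "coordinates", "2000km", 25, 51)
print(body["query"]["bool"]["filter"]["nested"]["path"])
```

So the empty result comes from the missing nested wrapper, not from the distance.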

Elasticsearch Filtering Parents by Filtered Child Document Count

I'm attempting to do some elasticsearch query fu on a set of data I have.
I have a user document that is the parent of many child page view documents. I'm looking to return all users that have viewed a specific page an arbitrary number of times (the number comes from a user input box). So far, I've got a has_child query that returns all the users that have a page view with certain ids. However, this returns those parents with all their children. Next, I tried to write an aggregation on those query results that essentially does the same has_child query in aggregation form. Now I have the right document count for my filtered child documents, and I need to use this count to go back and filter the parents. To explain the query in words: "return all the users that have viewed a specific page more than 4 times". It's possible that I may need to restructure my data. Any thoughts?
Here is my query thus far:
curl -XGET 'http://localhost:9200/development_users/_search?pretty=true' -d '
{
"query" : {
"has_child" : {
"type" : "page_view",
"query" : {
"terms" : {
"viewed_id" : [175,180]
}
}
}
},
"aggs" : {
"to_page_view": {
"children": {
"type" : "page_view"
},
"aggs" : {
"page_views_that_match" : {
"filter" : { "terms": { "viewed_id" : [175,180] } }
}
}
}
}
}'
This returns me a response like:
{
"took" : 3,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"failed" : 0
},
"hits" : {
"total" : 1,
"max_score" : 1.0,
"hits" : [ {
"_index" : "development_users",
"_type" : "user",
"_id" : "22548",
"_score" : 1.0,
"_source":{"id":22548,"account_id":1009}
} ]
},
"aggregations" : {
"to_page_view" : {
"doc_count" : 53,
"page_views_that_match" : {
"doc_count" : 2
}
}
}
}
Associated Mappings:
{
"development_users" : {
"mappings" : {
"page_view" : {
"dynamic" : "false",
"_parent" : {
"type" : "user"
},
"_routing" : {
"required" : true
},
"properties" : {
"created_at" : {
"type" : "date",
"format" : "date_time"
},
"id" : {
"type" : "integer"
},
"viewed_id" : {
"type" : "integer"
},
"time_on_page" : {
"type" : "integer"
},
"title" : {
"type" : "string"
},
"type" : {
"type" : "string"
},
"updated_at" : {
"type" : "date",
"format" : "date_time"
},
"url" : {
"type" : "string"
}
}
},
"user" : {
"dynamic" : "false",
"properties" : {
"account_id" : {
"type" : "integer"
},
"id" : {
"type" : "integer"
}
}
}
}
}
}
Okay, so this is kind of involved. I made a few simplifications to keep it straight in my head. First, I used this mapping:
PUT /test_index
{
"mappings": {
"page_view": {
"_parent": {
"type": "development_user"
},
"properties": {
"viewed_id": {
"type": "string"
}
}
},
"development_user": {
"properties": {
"id": {
"type": "string"
}
}
}
}
}
Then I added some data. In this little universe, I have three users and two pages. I want to find users who have viewed "page_a" at least twice, so if I construct the correct query only user 3 will be returned.
POST /test_index/development_user/_bulk
{"index":{"_type":"development_user","_id":1}}
{"id":"user_1"}
{"index":{"_type":"page_view","_parent":1}}
{"viewed_id":"page_a"}
{"index":{"_type":"development_user","_id":2}}
{"id":"user_2"}
{"index":{"_type":"page_view","_parent":2}}
{"viewed_id":"page_b"}
{"index":{"_type":"development_user","_id":3}}
{"id":"user_3"}
{"index":{"_type":"page_view","_parent":3}}
{"viewed_id":"page_a"}
{"index":{"_type":"page_view","_parent":3}}
{"viewed_id":"page_a"}
{"index":{"_type":"page_view","_parent":3}}
{"viewed_id":"page_b"}
To get that answer we'll use aggregations. Notice that I don't want documents returned (the normal way), but I do want to filter down the documents we analyze, because it will make things more efficient. So I use the same basic filter you had before.
So the aggregation tree starts with terms_parent_id, which just separates the parent documents into one bucket each. Inside that I have children_page_view, which filters the child documents down to the ones I want ("page_a"). Next to it in the hierarchy is bucket_selector_page_id_term_count, which uses a bucket selector (you'll need ES 2.0 or later) to keep only the parent buckets that meet the criterion, and finally a top hits aggregation, which shows the documents that match the requirements.
POST /test_index/development_user/_search
{
"size": 0,
"query": {
"has_child": {
"type": "page_view",
"query": {
"terms": {
"viewed_id": [
"page_a"
]
}
}
}
},
"aggs": {
"terms_parent_id": {
"terms": {
"field": "id"
},
"aggs": {
"children_page_view": {
"children": {
"type": "page_view"
},
"aggs": {
"filter_page_ids": {
"filter": {
"terms": {
"viewed_id": [
"page_a"
]
}
}
}
}
},
"bucket_selector_page_id_term_count": {
"bucket_selector": {
"buckets_path": {
"children_count": "children_page_view>filter_page_ids._count"
},
"script": "children_count >= 2"
}
},
"top_hits_users": {
"top_hits": {
"_source": {
"include": [
"id"
]
}
}
}
}
}
}
}
which returns:
{
"took": 14,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 2,
"max_score": 0,
"hits": []
},
"aggregations": {
"terms_parent_id": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "user_3",
"doc_count": 1,
"children_page_view": {
"doc_count": 3,
"filter_page_ids": {
"doc_count": 2
}
},
"top_hits_users": {
"hits": {
"total": 1,
"max_score": 1,
"hits": [
{
"_index": "test_index",
"_type": "development_user",
"_id": "3",
"_score": 1,
"_source": {
"id": "user_3"
}
}
]
}
}
}
]
}
}
}
Here's all the code I used:
http://sense.qbox.io/gist/43f24461448519dc884039db40ebd8e2f5b7304f
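If the aggregation tree is hard to follow, the same logic can be sketched in plain Python over the sample data: group the child page views by parent, count the ones for the wanted page, and keep the parents whose count reaches the threshold. This is only a model of what the aggregations compute; users_with_min_views is a hypothetical helper, not part of any Elasticsearch client.

```python
from collections import defaultdict

# Sample data from above: parent users and (parent_id, viewed_id) child page views.
users = {1: "user_1", 2: "user_2", 3: "user_3"}
page_views = [
    (1, "page_a"),
    (2, "page_b"),
    (3, "page_a"),
    (3, "page_a"),
    (3, "page_b"),
]


def users_with_min_views(users, page_views, page_ids, min_count):
    """Return the users that have at least min_count child page views
    whose viewed_id is one of page_ids."""
    wanted = set(page_ids)
    counts = defaultdict(int)           # one bucket per parent (the terms agg)
    for parent_id, viewed_id in page_views:
        if viewed_id in wanted:         # the filter agg on the children
            counts[parent_id] += 1
    # The bucket selector: keep only buckets whose count meets the threshold.
    return [users[uid] for uid in sorted(users) if counts[uid] >= min_count]


print(users_with_min_views(users, page_views, ["page_a"], 2))
print(users_with_min_views(users, page_views, ["page_a"], 1))
```

With a threshold of 1 you get the two users that the has_child query alone matches, which is why hits.total is 2 in the response while only one bucket survives the bucket selector.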

ElasticSearch object filter not working

I'm trying to create mappings and indexes using Jest.
After I inserted some data, I tried to filter the query, but it didn't work.
I have an object mapping like this:
http://localhost:9200/contacts?pretty=true
"contacts" : {
...
"mappings" : {
"contact" : {
"properties" : {
...
"user" : {
"properties" : {
"id" : {
"type" : "long"
},
"uuid" : {
"type" : "string"
}
}
}
}
}
Data:
{
"_index" : "contacts",
"_type" : "contact",
"_id" : "131530ff-d125-47c1-8fae-f48f2def9037",
"_version" : 1,
"found" : true,
"_source":{"id":"131530ff-d125-47c1-8fae-f48f2def9037","shared":false,"favourite":false,"user":{"id":1,"uuid":"AB353469"}}
}
My query:
http://localhost:9200/contacts/_search
{
"query":{
"filtered":{
...
"filter":{
"term" : {
"user.uuid" : "AB353469" }
}
}
}
}
Response:
{
"took": 14,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 0,
"max_score": null,
"hits": []
}
}
Could you please tell me why it didn't work?
Thank you very much!
P.S:
- Elasticsearch version: 1.7.2
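A string field is analyzed by default: the standard analyzer lowercases "AB353469" to "ab353469" at index time, while a term filter looks up the exact, unanalyzed value, so nothing matches. Change the mapping of the field you filter on (shown here as initiatorUuid; user.uuid in the mapping above) from
"initiatorUuid" : {
"type" : "string"
}
to
"initiatorUuid" : {
"type" : "string",
"index": "not_analyzed"
}
Then re-create the index, re-index the documents and try again.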
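The effect of not_analyzed on this particular value can be sketched in a few lines of Python. standard_like_analyze is a hypothetical approximation of the standard analyzer (split on non-alphanumerics, lowercase), and term_filter_matches only models an exact lookup among indexed terms; neither is real Elasticsearch code.

```python
import re


def standard_like_analyze(text):
    """Approximation of the standard analyzer: split on non-alphanumerics, lowercase."""
    return [tok.lower() for tok in re.findall(r"[A-Za-z0-9]+", text)]


def term_filter_matches(indexed_terms, value):
    """Model of a term filter: exact lookup of the unanalyzed value."""
    return value in indexed_terms


uuid = "AB353469"

analyzed_terms = standard_like_analyze(uuid)  # default "string" mapping
not_analyzed_terms = [uuid]                   # "index": "not_analyzed"

print(analyzed_terms)
print(term_filter_matches(analyzed_terms, uuid))      # case differs, no match
print(term_filter_matches(not_analyzed_terms, uuid))  # exact value is indexed
```

This also explains why filtering on the lowercased value would appear to work with the original mapping, although not_analyzed is the cleaner fix for identifiers.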
