ElasticSearch object filter not working - elasticsearch

I try to create mappings and indexes by using Jest.
After I inserted some data, I tried to filter the query and It didn't work.
I have an object mapping like this:
http://localhost:9200/contacts?pretty=true
"contacts" : {
...
"mappings" : {
"contact" : {
"properties" : {
...
"user" : {
"properties" : {
"id" : {
"type" : "long"
},
"uuid" : {
"type" : "string"
}
}
}
}
}
Data:
{
"_index" : "contacts",
"_type" : "contact",
"_id" : "131530ff-d125-47c1-8fae-f48f2def9037",
"_version" : 1,
"found" : true,
"_source":{"id":"131530ff-d125-47c1-8fae-f48f2def9037","shared":false,"favourite":false,"user":{"id":1,"uuid":"AB353469"}}
}
My query:
http://localhost:9200/contacts/_search
{
"query":{
"filtered":{
...
"filter":{
"term" : {
"user.uuid" : "AB353469" }
}
}
}
}
Response:
{
"took": 14,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 0,
"max_score": null,
"hits": []
}
}
Could you please tell me why It didn't work?
Thank you very much!
P.S:
- Elasticsearch version: 1.7.2

Change initiatorUuid mapping from
"initiatorUuid" : {
"type" : "string"
}
to
"initiatorUuid" : {
"type" : "string",
"index": "not_analyzed"
}
re-create the index, re-index the documents and try again.

Related

elasticsearch - only return specific fields without _source?

I've found some answer like
Make elasticsearch only return certain fields?
But they all need _source field.
In my system, disk and network are both scarce resources.
I can't store _source field and I don't need _index, _score field.
ElasticSearch Version: 5.5
Index Mapping just likes
{
"index_2020-04-08": {
"mappings": {
"type1": {
"_all": {
"enabled": false
},
"_source": {
"enabled": false
},
"properties": {
"rank_score": {
"type": "float"
},
"first_id": {
"type": "keyword"
},
"second_id": {
"type": "keyword"
}
}
}
}
}
}
My query:
GET index_2020-04-08/type1/_search
{
"query": {
"bool": {
"filter": {
"term": {
"first_id": "hello"
}
}
}
},
"size": 1000,
"sort": [
{
"rank_score": {
"order": "desc"
}
}
]
}
The search results I got :
{
"took": 1,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": 2,
"max_score": null,
"hits": [
{
"_index": "index_2020-04-08",
"_type": "type1",
"_id": "id_1",
"_score": null,
"sort": [
0.06621722
]
},
{
"_index": "index_2020-04-08",
"_type": "type1",
"_id": "id_2",
"_score": null,
"sort": [
0.07864579
]
}
]
}
}
The results I want:
{
"took": 1,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": 2,
"max_score": null,
"hits": [
{
"_id": "id_1"
},
{
"_id": "id_2"
}
]
}
}
Can I implement it?
To return specific fields in the document, you must do one of the two:
Include the _source field in your documents, which is enabled by default.
Store specific fields with the stored fields feature which must be enabled manually
Because you want pretty much the document Ids and some metadata, you can use the filter_path feature.
Here's an example that's close to what you want (just change the field list):
$ curl -X GET "localhost:9200/metricbeat-7.6.1-2020.04.02-000002/_search?filter_path=took,timed_out,_shards,hits.total,hits.max_score,hits.hits._id&pretty"
{
"took" : 1,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 10000,
"relation" : "gte"
},
"max_score" : 1.0,
"hits" : [
{
"_id" : "8SEGSHEBzNscjCyQ18cg"
},
{
"_id" : "8iEGSHEBzNscjCyQ18cg"
},
{
"_id" : "8yEGSHEBzNscjCyQ18cg"
},
{
"_id" : "9CEGSHEBzNscjCyQ18cg"
},
{
"_id" : "9SEGSHEBzNscjCyQ18cg"
},
{
"_id" : "9iEGSHEBzNscjCyQ18cg"
},
{
"_id" : "9yEGSHEBzNscjCyQ18cg"
},
{
"_id" : "-CEGSHEBzNscjCyQ18cg"
},
{
"_id" : "-SEGSHEBzNscjCyQ18cg"
},
{
"_id" : "-iEGSHEBzNscjCyQ18cg"
}
]
}
}
Just to clarify based on the SO question you linked -- you're not storing the _source, you're requesting it from ES. It's usually used to limit what you want to have retrieved, i.e.
...
"_source": ["only", "fields", "I", "need"]
...
_score, _index etc are meta fields that are going to be retrieved no matter what. You can "hack" it a bit by seeting the size to 0 and aggregating, i.e.
{
"size": 0,
"aggs": {
"by_ids": {
"terms": {
"field": "_id"
}
}
}
}
which will save you a few bytes
{
"took" : 1,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 2,
"relation" : "eq"
},
"max_score" : null,
"hits" : [ ]
},
"aggregations" : {
"terms" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : "Ac76WXEBnteqn982smh_",
"doc_count" : 1
},
{
"key" : "As77WXEBnteqn982EGgq",
"doc_count" : 1
}
]
}
}
}
but performing aggregations has a cost of its own.

ElasticSearch search for part of url

I'm working with ElasticSearch 5 and can't find a solution for the following:
I want to search for a string with slashes (part of a url) in a document. But it won't return matching documents.
I've read something that strings with slashes are splitted by ES and that's not what I want for this field. I've tried to set "not_analyzed" on the field with a mapping, but I can't seem to get it to work somehow.
"Create index":
Put http://localhost:9200/test
{
"settings" : {
"number_of_shards" : 1
},
"mappings" : {
"type1" : {
"properties" : {
"field1" : { "type" : "text","index": "not_analyzed" }
}
}
}
}
"Add document":POST http://localhost:9200/test/type1/
{
"field1" : "this/is/a/url/test"
}
"Search document" POST http://localhost:9200/test/type1/_search
{
"size" : 1000,
"query" : {
"bool" : {
"must" : [{
"term" : {
"field1" : {
"value" : "this/is/a/url/test"
}
}
}
]
}
}
}
Response:
{
"took": 1,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"failed": 0
},
"hits": {
"total": 0,
"max_score": null,
"hits": []
}
}
"The mapping response": GET http://localhost:9200/test/_mapping?pretty
{
"test": {
"mappings": {
"type1": {
"properties": {
"field1": {
"type": "text"
}
}
}
}
}
}
Using a term query for getting an exact match is correct. However, your initial mapping is wrong.
"type" : "text", "index": "not_analyzed"
should be this instead
"type": "keyword"
(Note: The keyword type in ES5 is equivalent to a not_analyzed string in ES 2.x)
You need to delete your index and re-create it with the corrected mapping. Then your term query will work.
I suspect what you need is a Match query, not a Terms query. Terms is looking for a single "term"/word and is not breaking down your request with an analyzer.
{
"size" : 1000,
"query" : {
"bool" : {
"must" : [{
"match" : {
"field1" : "this/is/a/url/test"
}
}
]
}
}
}

Elasticsearch Filtering Parents by Filtered Child Document Count

I'm attempting to do some elasticsearch query fu on a set of data I have.
I have a user document that is the parent to many child page view documents. I'm looking to return all users that have viewed a specific page an arbitrary amount of times (defined by user input box). So far, I've got a has_child query that will return me all the users that have a page view with certain ids. However, this will return those parents with all their children. Next, I've tried to write an aggregation on those query results, that will essentially do the same has_child query in aggregation form. Now, I have the right document count for my filtered child documents. I need to use this document count to go back and filter the parents. To explain the query in words, "return to me all the users that have viewed a specific page more than 4 times". It's possible that I may need to restructure my data. Any thoughts?
Here is my query thus far:
curl -XGET 'http://localhost:9200/development_users/_search?pretty=true' -d '
{
"query" : {
"has_child" : {
"type" : "page_view",
"query" : {
"terms" : {
"viewed_id" : [175,180]
}
}
}
},
"aggs" : {
"to_page_view": {
"children": {
"type" : "page_view"
},
"aggs" : {
"page_views_that_match" : {
"filter" : { "terms": { "viewed_id" : [175,180] } }
}
}
}
}
}'
This returns me a response like:
{
"took" : 3,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"failed" : 0
},
"hits" : {
"total" : 1,
"max_score" : 1.0,
"hits" : [ {
"_index" : "development_users",
"_type" : "user",
"_id" : "22548",
"_score" : 1.0,
"_source":{"id":22548,"account_id":1009}
} ]
},
"aggregations" : {
"to_page_view" : {
"doc_count" : 53,
"page_views_that_match" : {
"doc_count" : 2
}
}
}
}
Associated Mappings:
{
"development_users" : {
"mappings" : {
"page_view" : {
"dynamic" : "false",
"_parent" : {
"type" : "user"
},
"_routing" : {
"required" : true
},
"properties" : {
"created_at" : {
"type" : "date",
"format" : "date_time"
},
"id" : {
"type" : "integer"
},
"viewed_id" : {
"type" : "integer"
},
"time_on_page" : {
"type" : "integer"
},
"title" : {
"type" : "string"
},
"type" : {
"type" : "string"
},
"updated_at" : {
"type" : "date",
"format" : "date_time"
},
"url" : {
"type" : "string"
}
}
},
"user" : {
"dynamic" : "false",
"properties" : {
"account_id" : {
"type" : "integer"
},
"id" : {
"type" : "integer"
}
}
}
}
}
}
Okay, so this is kind of involved. I made a few simplifications to keep it straight in my head. First, I used this mapping:
PUT /test_index
{
"mappings": {
"page_view": {
"_parent": {
"type": "development_user"
},
"properties": {
"viewed_id": {
"type": "string"
}
}
},
"development_user": {
"properties": {
"id": {
"type": "string"
}
}
}
}
}
Then I added some data. In this little universe, I have three users and two pages. I want to find users who have viewed "page_a" at least twice, so if I construct the correct query only user 3 will be returned.
POST /test_index/development_user/_bulk
{"index":{"_type":"development_user","_id":1}}
{"id":"user_1"}
{"index":{"_type":"page_view","_parent":1}}
{"viewed_id":"page_a"}
{"index":{"_type":"development_user","_id":2}}
{"id":"user_2"}
{"index":{"_type":"page_view","_parent":2}}
{"viewed_id":"page_b"}
{"index":{"_type":"development_user","_id":3}}
{"id":"user_3"}
{"index":{"_type":"page_view","_parent":3}}
{"viewed_id":"page_a"}
{"index":{"_type":"page_view","_parent":3}}
{"viewed_id":"page_a"}
{"index":{"_type":"page_view","_parent":3}}
{"viewed_id":"page_b"}
To get that answer we'll use aggregations. Notice that I don't want documents returned (the normal way), but I do want to filter down the documents we analyze, because it will make things more efficient. So I use the same basic filter you had before.
So the aggregation tree starts with terms_parent_id which will just separate parent documents. Inside that I have children_page_view which filters the child documents down to the ones I want ("page_a"), and next to it in the hierarchy is bucket_selector_page_id_term_count which uses a bucket selector (you'll need ES 2.x) to filter the parent documents by those meeting the criterium, and then finally a top hits aggregation which shows us the documents that match the requirements.
POST /test_index/development_user/_search
{
"size": 0,
"query": {
"has_child": {
"type": "page_view",
"query": {
"terms": {
"viewed_id": [
"page_a"
]
}
}
}
},
"aggs": {
"terms_parent_id": {
"terms": {
"field": "id"
},
"aggs": {
"children_page_view": {
"children": {
"type": "page_view"
},
"aggs": {
"filter_page_ids": {
"filter": {
"terms": {
"viewed_id": [
"page_a"
]
}
}
}
}
},
"bucket_selector_page_id_term_count": {
"bucket_selector": {
"buckets_path": {
"children_count": "children_page_view>filter_page_ids._count"
},
"script": "children_count >= 2"
}
},
"top_hits_users": {
"top_hits": {
"_source": {
"include": [
"id"
]
}
}
}
}
}
}
}
which returns:
{
"took": 14,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 2,
"max_score": 0,
"hits": []
},
"aggregations": {
"terms_parent_id": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "user_3",
"doc_count": 1,
"children_page_view": {
"doc_count": 3,
"filter_page_ids": {
"doc_count": 2
}
},
"top_hits_users": {
"hits": {
"total": 1,
"max_score": 1,
"hits": [
{
"_index": "test_index",
"_type": "development_user",
"_id": "3",
"_score": 1,
"_source": {
"id": "user_3"
}
}
]
}
}
}
]
}
}
}
Here's all the code I used:
http://sense.qbox.io/gist/43f24461448519dc884039db40ebd8e2f5b7304f

Elasticsearch Prefix query not working on nested documents

I'm using a prefix query for an elasticsearch query. It works fine when using it on top-level data, but once applied to nested data there are no results returned. The data I try to query looks as follows:
Here the prefix query works fine:
Query:
{ "query": { "prefix" : { "duration": "7"} } }
Result:
{
"took": 25, ... },
"hits": {
"total": 6,
"max_score": 1,
"hits": [
{
"_index": "itemresults",
"_type": "itemresult",
"_id": "ITEM_RESULT_7c8649c2-6cb0-487e-bb3c-c4bf0ad28a90_8bce0a3f-f951-4a01-94b5-b55dea1a2752_7c965241-ad0a-4a83-a400-0be84daab0a9_61",
"_score": 1,
"_source": {
"score": 1,
"studentId": "61",
"timestamp": 1377399320017,
"groupIdentifiers": {},
"assessmentItemId": "7c965241-ad0a-4a83-a400-0be84daab0a9",
"answered": true,
"duration": "7.078",
"metadata": {
"Korrektur": "a",
"Matrize12_13": "MA.1.B.1.d.1",
"Kompetenz": "ZuV",
"Zyklus": "Z2",
"Schwierigkeit": "H",
"Handlungsaspekt": "AuE",
"Fach": "MA",
"Aufgabentyp": "L"
},
"assessmentSessionId": "7c8649c2-6cb0-487e-bb3c-c4bf0ad28a90",
"assessmentId": "8bce0a3f-f951-4a01-94b5-b55dea1a2752"
}
},
Now trying to use the prefix query to apply on the nested structure 'metadata' doesn't return any result:
{ "query": { "prefix" : { "metadata.Fach": "M"} } }
Result:
{
"took": 18,
"timed_out": false,
"_shards": {
"total": 15,
"successful": 15,
"failed": 0
},
"hits": {
"total": 0,
"max_score": null,
"hits": []
}
}
What am I doing wrong? Is it at all possible to apply prefix on nested data?
It does not depends whether is nested or not. It depends on your mapping, if you are analyzing the string at index time or not.
I'm going to put an example:
I've created and index with the following mapping:
curl -XPUT 'http://localhost:9200/test/' -d '
{
"mappings": {
"test" : {
"properties" : {
"text_1" : {
"type" : "string",
"index" : "analyzed"
},
"text_2" : {
"index": "not_analyzed",
"type" : "string"
}
}
}
}
}'
Basically 2 text fields, one analyzed and the other not_analyzed. Now I index the following document:
curl -XPUT 'http://localhost:9200/test/test/1' -d '
{
"text_1" : "Hello world",
"text_2" : "Hello world"
}'
text_1 query
As text_1 is analyzed one of the things that elasticsearch does is to convert the field into lower case. So if I make the following query it doesn't find any document:
curl -XGET 'http://localhost:9200/test/test/_search?pretty=true' -d '
{ "query": { "prefix" : { "text_1": "H"} } }
'
{
"took" : 2,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"failed" : 0
},
"hits" : {
"total" : 0,
"max_score" : null,
"hits" : [ ]
}
}
But if I do the trick and use lower case for making the query:
curl -XGET 'http://localhost:9200/test/test/_search?pretty=true' -d '
{ "query": { "prefix" : { "text_1": "h"} } }
'
{
"took" : 2,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"failed" : 0
},
"hits" : {
"total" : 1,
"max_score" : 1.0,
"hits" : [ {
"_index" : "test",
"_type" : "test",
"_id" : "1",
"_score" : 1.0, "_source" :
{
"text_1" : "Hello world",
"text_2" : "Hello world"
}
} ]
}
}
text_2 query
As text_2 is not analyzed, when I make the original query it matches:
curl -XGET 'http://localhost:9200/test/test/_search?pretty=true' -d '
{ "query": { "prefix" : { "text_2": "H"} } }
'
{
"took" : 1,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"failed" : 0
},
"hits" : {
"total" : 1,
"max_score" : 1.0,
"hits" : [ {
"_index" : "test",
"_type" : "test",
"_id" : "1",
"_score" : 1.0, "_source" :
{
"text_1" : "Hello world",
"text_2" : "Hello world"
}
} ]
}
}

How to get a detailed results output from ElasticSearch

When I do a query like this:
curl 'http://localhost:9200/xenforo/_search?q=message:test'
I get the following result:
{
"took": 3,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 1,
"max_score": 12.816886,
"hits": [
{
"_index": "xenforo",
"_type": "post",
"_id": "1778114",
"_score": 12.816886
}
]
}
}
The important _id is shown, but how would I get more information like the date, user and node information.
Here is some of my mapping information, I think the important part is shown:
curl -X GET 'http://localhost:9200/xenforo/_mapping?pretty=true'
{
"xenforo113" : {
"post" : {
"_source" : {
"enabled" : false
},
"properties" : {
"date" : {
"type" : "long",
"store" : "yes"
},
"discussion_id" : {
"type" : "long",
"store" : "yes"
},
"message" : {
"type" : "string"
},
"node" : {
"type" : "long"
},
"thread" : {
"type" : "long"
},
"title" : {
"type" : "string"
},
"user" : {
"type" : "long",
"store" : "yes"
}
}
},
I assume I will need to do a DSL query but I don't know which command would show the other information I'm after in the results.
As you have disabled _source, you have to ask for explicit fields:
curl 'http://localhost:9200/xenforo/_search -d '{
"fields" : ["user", "date", "node"],
"query" : {
"match" : { "message" : "test" }
}
}'
See documentation.

Resources