Elasticsearch [match] unknown token [START_OBJECT] after [created_utc] - elasticsearch

I am learning Elasticsearch using the 2006 dataset of Reddit comments from pushshift.io.
created_utc is the field that holds the time a comment was created.
I am trying to get all the posts within a certain time range. I googled a bit and found out that I need to use a "range" query.
This is my query right now:
{
  "query": {
    "match" : {
      "range": {
        "created_utc": {
          "gte": "1/1/2006",
          "lte": "31/1/2006",
          "format": "dd/MM/yyyy"
        }
      }
    }
  }
}
I then tried using a bool query so that I can combine the time range with edited must not = False (edited being the boolean field that tells me whether a post has been edited or not):
{
  "query": {
    "bool" : {
      "must" : {
        "range" : {
          "created_utc": {
            "gte" : "01/12/2006", "lte": "31/12/2006", "format": "dd/MM/yyyy"
          }
        }
      },
      "must_not": {
        "edited": False
      }
    }
  }
}
However, this gave me another error that I can't figure out:
[edited] query malformed, no start_object after query name
I'd appreciate it if anyone could help me out with this, thanks!
Here is my mapping for the comment if it helps:
{
  "comment":{
    "properties":{
      "author":{
        "type":"text",
        "fields":{
          "keyword":{
            "type":"keyword",
            "ignore_above":256
          }
        }
      },
      "body":{
        "type":"text",
        "fields":{
          "keyword":{
            "type":"keyword",
            "ignore_above":256
          }
        }
      },
      "controversiality":{
        "type":"long"
      },
      "created_utc":{
        "type":"date"
      },
      "edited":{
        "type":"boolean"
      },
      "gilded":{
        "type":"long"
      },
      "id":{
        "type":"text",
        "fields":{
          "keyword":{
            "type":"keyword",
            "ignore_above":256
          }
        }
      },
      "link_id":{
        "type":"text",
        "fields":{
          "keyword":{
            "type":"keyword",
            "ignore_above":256
          }
        }
      },
      "parent_id":{
        "type":"text",
        "fields":{
          "keyword":{
            "type":"keyword",
            "ignore_above":256
          }
        }
      },
      "score":{
        "type":"long"
      },
      "subreddit":{
        "type":"text",
        "fields":{
          "keyword":{
            "type":"keyword",
            "ignore_above":256
          }
        }
      }
    }
  }
}

If you want to get all the posts within a time range, you need a range query. The problem with your query is that you are using range inside a match query, which is not allowed in Elasticsearch, so your query should look like this:
{
  "query": {
    "range": {
      "created_utc": {
        "gte": 1136074029,
        "lte": 1136076410
      }
    }
  }
}
Given that the created_utc field is stored as an epoch timestamp, the simplest option is to query it with epoch values as well.
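If you would rather keep thinking in calendar dates, you can compute the epoch bounds on the client side. The sketch below is a minimal Python example, assuming created_utc holds Unix epoch seconds (as Reddit's created_utc does) and that the dates are meant as UTC; to_epoch_seconds and created_utc_range are hypothetical helper names. It uses "lt" on the following midnight so that the whole last day is included.

```python
from datetime import datetime, timezone


def to_epoch_seconds(day: str) -> int:
    """Parse a dd/MM/yyyy date as midnight UTC and return Unix epoch seconds."""
    parsed = datetime.strptime(day, "%d/%m/%Y").replace(tzinfo=timezone.utc)
    return int(parsed.timestamp())


def created_utc_range(start: str, end_exclusive: str) -> dict:
    """Range query body covering [start, end_exclusive) on created_utc."""
    return {
        "query": {
            "range": {
                "created_utc": {
                    "gte": to_epoch_seconds(start),
                    # "lt" on the next midnight keeps the whole last day in range
                    "lt": to_epoch_seconds(end_exclusive),
                }
            }
        }
    }


# All of January 2006: 01/01/2006 up to, but not including, 01/02/2006
body = created_utc_range("01/01/2006", "01/02/2006")
print(body["query"]["range"]["created_utc"])
# {'gte': 1136073600, 'lt': 1138752000}
```

If created_utc is mapped as a date with the default format, Elasticsearch reads bare numbers as epoch milliseconds, so make sure the unit of the numbers you query with matches the unit of the indexed values (a range query also accepts "format": "epoch_second").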
For the second query, where you want the posts within the range whose edited field must not be false:
{
  "query": {
    "bool": {
      "must": [
        {
          "range": {
            "created_utc": {
              "gte": 1136074029,
              "lte": 1136076410
            }
          }
        }
      ],
      "must_not": [
        {
          "match": {
            "edited": false
          }
        }
      ]
    }
  }
}
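The False in the question suggests the request may be built from Python (an assumption). In that case the sketch below covers both fixes: every must_not entry is a full query clause, and json.dumps turns Python's False into JSON's false. range_query_excluding_unedited is a hypothetical name; the body mirrors the query above.

```python
import json


def range_query_excluding_unedited(gte: int, lte: int) -> dict:
    """bool query: created_utc within [gte, lte] and edited not false."""
    return {
        "query": {
            "bool": {
                "must": [{"range": {"created_utc": {"gte": gte, "lte": lte}}}],
                # must_not needs full query clauses such as match or term;
                # a bare {"edited": False} makes Elasticsearch read "edited"
                # as a query name, hence the "query malformed" error.
                "must_not": [{"match": {"edited": False}}],
            }
        }
    }


body = range_query_excluding_unedited(1136074029, 1136076410)
# json.dumps writes Python's False as JSON's lowercase false
print(json.dumps(body["query"]["bool"]["must_not"]))
# [{"match": {"edited": false}}]
```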
Note: if you query created_utc with "format": "dd/MM/yyyy", give zero-padded values that match the format exactly, i.e. 01/01/2006 instead of 1/1/2006.
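If you generate those date strings in code, a zero-padded formatter avoids the problem. A minimal Python sketch, where dd_mm_yyyy is a hypothetical helper and the range body mirrors the question's first query:

```python
from datetime import date


def dd_mm_yyyy(d: date) -> str:
    """Zero-padded dd/MM/yyyy string, e.g. date(2006, 1, 1) -> '01/01/2006'."""
    return d.strftime("%d/%m/%Y")


body = {
    "query": {
        "range": {
            "created_utc": {
                "gte": dd_mm_yyyy(date(2006, 1, 1)),
                "lte": dd_mm_yyyy(date(2006, 1, 31)),
                # tells Elasticsearch how to parse the two strings above
                "format": "dd/MM/yyyy",
            }
        }
    }
}
print(body["query"]["range"]["created_utc"]["gte"])  # 01/01/2006
```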
Hope this helps!

Related

Elasticsearch composite aggregate query on nested fields

I have a question about an aggregation on nested objects.
A document looks like this:
{
  "features": [
    {
      "key": "key1",
      "values": ["A", "B"]
    },
    {
      "key": "key2",
      "values": ["C", "D"]
    },
    {
      "key": "key2",
      "values": ["E"]
    }
  ]
}
where features is a nested object.
I can aggregate and get the distinct values of key and of values separately, but I need a combined bucket aggregation that gives me:
key1 -> A,B
key2 -> C,D,E
Do I have to use a composite aggregation, or which aggregation is the right one to use?
Java samples are also welcome!
Thanks!
You don't really need a composite aggregation for this. The following should be fine:
{
  "size": 0,
  "aggs": {
    "nested_aggs": {
      "nested": {
        "path": "features"
      },
      "aggs": {
        "by_key": {
          "terms": {
            "field": "features.key.keyword"
          },
          "aggs": {
            "by_values": {
              "terms": {
                "field": "features.values.keyword"
              }
            }
          }
        }
      }
    }
  }
}
This assumes your mapping looks like this:
{
  "mappings":{
    "properties":{
      "features":{
        "type":"nested",
        "properties":{
          "key":{
            "type":"text",
            "fields":{
              "keyword":{
                "type":"keyword",
                "ignore_above":256
              }
            }
          },
          "values":{
            "type":"text",
            "fields":{
              "keyword":{
                "type":"keyword",
                "ignore_above":256
              }
            }
          }
        }
      }
    }
  }
}
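Reading the result back is a matter of walking the two levels of terms buckets. The Python sketch below assumes the aggregation names from the query above (nested_aggs, by_key, by_values); values_by_key is a hypothetical helper, and sample_response is a hand-written example shaped like a terms aggregation response for the document in the question, not output captured from a real cluster.

```python
def values_by_key(response: dict) -> dict:
    """Flatten nested_aggs -> by_key -> by_values buckets into {key: [values]}."""
    result = {}
    for key_bucket in response["aggregations"]["nested_aggs"]["by_key"]["buckets"]:
        result[key_bucket["key"]] = [
            value_bucket["key"] for value_bucket in key_bucket["by_values"]["buckets"]
        ]
    return result


# Hypothetical sample in the shape of the aggregation response
sample_response = {
    "aggregations": {
        "nested_aggs": {
            "doc_count": 3,
            "by_key": {
                "buckets": [
                    {"key": "key2", "doc_count": 2,
                     "by_values": {"buckets": [
                         {"key": "C", "doc_count": 1},
                         {"key": "D", "doc_count": 1},
                         {"key": "E", "doc_count": 1}]}},
                    {"key": "key1", "doc_count": 1,
                     "by_values": {"buckets": [
                         {"key": "A", "doc_count": 1},
                         {"key": "B", "doc_count": 1}]}},
                ]
            },
        }
    }
}

print(values_by_key(sample_response))
# {'key2': ['C', 'D', 'E'], 'key1': ['A', 'B']}
```

Keep in mind that a terms aggregation returns only the top 10 buckets by default, so raise its size if you expect more keys or more values per key.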

Nested query on ElasticSearch for Long type (ES 5.0.4)

This is my first question on Stack Overflow, so please excuse any mistakes; I will improve in the future.
I am also new to Elasticsearch. I am trying to do an exact match in Elasticsearch 5.0.4, but instead of doing an exact match, the request returns all the documents.
I'm not sure why it behaves this way.
Here is the mapping:
{
  "properties":{
    "debug_urls":{
      "properties":{
        "characteristics":{
          "type":"text",
          "fields":{
            "keyword":{
              "type":"keyword",
              "ignore_above":256
            }
          }
        },
        "url_id":{
          "type":"long"
        }
      },
      "type":"nested"
    },
    "scanId":{
      "type":"text",
      "fields":{
        "keyword":{
          "type":"keyword",
          "ignore_above":256
        }
      }
    }
  }
}
This is my request.
{
  "query": {
    "nested": {
      "path": "debug_urls",
      "query": {
        "match": {
          "debug_urls.url_id": 1
        }
      }
    }
  }
}
The response I receive:
{
  "took":1,
  "timed_out":false,
  "_shards":{
    "total":5,
    "successful":5,
    "failed":0
  },
  "hits":{
    "total":1,
    "max_score":1,
    "hits":[
      {
        "_index":"cust_cca39c0c6c8141008e9411032bbf4d21",
        "_type":"debug-urls",
        "_id":"AW70h0l72s9qXitMsWgC",
        "_score":1,
        "_source":{
          "scan_id":"n_a0a523fb5c81435fb79c34c624c7fbd6",
          "debug_urls":[
            {
              "url_id":1,
              "characteristics":["FORM", "EXTERNAL_SCRIPT", "INLINE_SCRIPT"]
            },
            {
              "url_id":2,
              "characteristics":["EXTERNAL_SCRIPT", "INLINE_SCRIPT"]
            },
            {
              "url_id":3,
              "characteristics":["EXTERNAL_SCRIPT", "INLINE_SCRIPT"]
            },
            {
              "url_id":4,
              "characteristics":["EXTERNAL_SCRIPT", "INLINE_SCRIPT"]
            },
            {
              "url_id":5,
              "characteristics":["EXTERNAL_SCRIPT", "INLINE_SCRIPT"]
            },
            {
              "url_id":6,
              "characteristics":["EXTERNAL_SCRIPT", "INLINE_SCRIPT"]
            },
            {
              "url_id":7,
              "characteristics":["EXTERNAL_SCRIPT", "INLINE_SCRIPT"]
            }
          ]
        }
      }
    ]
  }
}
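The nested query matches the parent document, and _source always contains that whole document, including every element of debug_urls. If you only want to see the nested documents that match the criteria, you can leverage nested inner_hits: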
{
  "_source":["scan_id"], <--- add this line
  "query": {
    "nested": {
      "path": "debug_urls",
      "query": {
        "match": {
          "debug_urls.url_id": 1
        }
      },
      "inner_hits": {} <--- add this line
    }
  }
}
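To see why inner_hits helps, the sketch below reproduces its effect locally in Python: from the document's _source it keeps only the debug_urls elements that satisfy the query condition, which is what the inner hits report, while the top-level hit still carries the full document. matching_nested is a hypothetical helper, and the source is trimmed to three of the seven elements from the question.

```python
def matching_nested(source: dict, path: str, predicate) -> list:
    """Keep only the nested objects under `path` that satisfy `predicate`.

    This mirrors what inner_hits reports; the top-level hit's _source is
    always the whole parent document.
    """
    return [obj for obj in source.get(path, []) if predicate(obj)]


# Trimmed copy of the _source from the question (3 of the 7 elements)
source = {
    "scan_id": "n_a0a523fb5c81435fb79c34c624c7fbd6",
    "debug_urls": [
        {"url_id": 1, "characteristics": ["FORM", "EXTERNAL_SCRIPT", "INLINE_SCRIPT"]},
        {"url_id": 2, "characteristics": ["EXTERNAL_SCRIPT", "INLINE_SCRIPT"]},
        {"url_id": 3, "characteristics": ["EXTERNAL_SCRIPT", "INLINE_SCRIPT"]},
    ],
}

print(matching_nested(source, "debug_urls", lambda obj: obj["url_id"] == 1))
# [{'url_id': 1, 'characteristics': ['FORM', 'EXTERNAL_SCRIPT', 'INLINE_SCRIPT']}]
```

In the real response, the matching elements appear under each hit's inner_hits section, keyed by the nested path (debug_urls) unless you give the inner_hits a name.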

Elastic search query to return documents matching all elements in an array

I have a structure similar to this:
Document 1:
{
  "nestedobject": {
    "uniqueid": "12345",
    "field": [
      { "id": 1, "color": "blue", "fruit": "banana" },
      { "id": 2, "color": "red", "fruit": "apple" }
    ]
  }
}
Document 2 (in the same index):
{
  "nestedobject": {
    "uniqueid": "23456",
    "field": [
      { "id": 3, "color": "blue", "fruit": "banana" },
      { "id": 4, "color": "blue", "fruit": "banana" }
    ]
  }
}
The field mappings look like this:
{
  "mappings":{
    "nestedobject":{
      "properties":{
        "uniqueid":{
          "type":"text",
          "fields":{
            "keyword":{
              "type":"keyword",
              "ignore_above":256
            }
          }
        },
        "field":{
          "type":"nested",
          "properties":{
            "id":{
              "type":"text",
              "fields":{
                "keyword":{
                  "type":"keyword",
                  "ignore_above":256
                }
              }
            },
            "color":{
              "type":"text",
              "fields":{
                "keyword":{
                  "type":"keyword",
                  "ignore_above":256
                }
              }
            },
            "fruit":{
              "type":"text",
              "fields":{
                "keyword":{
                  "type":"keyword",
                  "ignore_above":256
                }
              }
            }
          }
        }
      }
    }
  }
}
Now I query this index, which holds these 2 documents, and I want only the document in which every element of the field array has color blue and fruit banana, not just at least one element.
Right now, my query returns both documents, because the first element of the first document matches.
How can I make this work?
{
  "query": {
    "nested" : {
      "path" : "nestedobject.field",
      "query" : {
        "bool" : {
          "must" : [
            { "match" : {"nestedobject.field.color" : "blue"} },
            { "match" : {"nestedobject.field.fruit" : "banana"}}
          ]
        }
      }
    }
  }
}
Change your query to the following:
POST <your_index_name>/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "nested": {
            "path": "field",
            "query": {
              "match":{
                "field.color": "blue"
              }
            }
          }
        },
        {
          "nested": {
            "path": "field",
            "query": {
              "match":{
                "field.fruit": "banana"
              }
            }
          }
        }
      ]
    }
  }
}
Note that there are two nested queries inside a must clause.
Also note that, to get exact matches, you should use term queries on the keyword fields, as shown below:
POST <your_index_name>/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "nested": {
            "path": "field",
            "query": {
              "term": {
                "field.color.keyword": "blue"
              }
            }
          }
        },
        {
          "nested": {
            "path": "field",
            "query": {
              "term": {
                "field.fruit.keyword": "banana"
              }
            }
          }
        }
      ]
    }
  }
}
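Note that the two nested clauses above are evaluated independently: a document matches if some element is blue and some element, not necessarily the same one, is banana, so document 1 from the question would still be returned. If you need every element to match, one common pattern (a different technique from the queries above) is a double negation: exclude documents that have at least one element which is not both blue and banana. The Python sketch below builds such a body, assuming the field and keyword names used above, and checks the logic locally on the question's two documents; every_element_query, every_element_matches and some_element_each are hypothetical names.

```python
def every_element_query(color: str, fruit: str) -> dict:
    """Documents whose field elements are *all* color AND fruit (double negation)."""
    both = {
        "bool": {
            "filter": [
                {"term": {"field.color.keyword": color}},
                {"term": {"field.fruit.keyword": fruit}},
            ]
        }
    }
    return {
        "query": {
            "bool": {
                # require at least one nested element
                "must": [{"nested": {"path": "field", "query": {"match_all": {}}}}],
                # reject documents having any element that is NOT (color AND fruit)
                "must_not": [
                    {
                        "nested": {
                            "path": "field",
                            "query": {"bool": {"must_not": [both]}},
                        }
                    }
                ],
            }
        }
    }


def every_element_matches(elements: list, color: str, fruit: str) -> bool:
    """Local equivalent of every_element_query, to check the logic."""
    return bool(elements) and all(
        e.get("color") == color and e.get("fruit") == fruit for e in elements
    )


def some_element_each(elements: list, color: str, fruit: str) -> bool:
    """What two independent nested clauses check: some element has the color,
    and some (possibly different) element has the fruit."""
    return any(e.get("color") == color for e in elements) and any(
        e.get("fruit") == fruit for e in elements
    )


doc1 = [{"id": 1, "color": "blue", "fruit": "banana"},
        {"id": 2, "color": "red", "fruit": "apple"}]
doc2 = [{"id": 3, "color": "blue", "fruit": "banana"},
        {"id": 4, "color": "blue", "fruit": "banana"}]

print(some_element_each(doc1, "blue", "banana"))      # True: doc1 is still returned
print(every_element_matches(doc1, "blue", "banana"))  # False
print(every_element_matches(doc2, "blue", "banana"))  # True
```

The outer nested match_all clause keeps documents that have no field elements at all out of the results; drop it if such documents should count as matching.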
Hope that helps!

Can spring data elasticsearch join parent and child relationship?

{
  "properties":{
    "id":{
      "type":"text",
      "fields":{
        "keyword":{
          "ignore_above":256,
          "type":"keyword"
        }
      }
    },
    "username":{
      "type":"text",
      "fields":{
        "keyword":{
          "ignore_above":256,
          "type":"keyword"
        }
      }
    },
    "parentId":{
      "type":"text",
      "fields":{
        "keyword":{
          "ignore_above":256,
          "type":"keyword"
        }
      }
    }
  }
}
For example, I have a user:
id:1,
username: admin,
parentId: null
I have another user:
id:5,
username:manager,
parentId:1
I have another user:
id:10,
username:staff001,
parentId:5
If I query like this:
{
  "query": {
    "query_string": {
      "query": "*staff*",
      "default_field": "*"
    }
  }
}
My expected result is staff001 together with his parent's details.
Is it possible to do this with Spring Data Elasticsearch?
I am sure that it is possible to do this with Spring JPA mappings using @OneToOne or @ManyToOne (for example with MySQL/PostgreSQL).

Why can't Elasticsearch find my data?

Elasticsearch version: 6.2.4
I set up an Elasticsearch environment and created the index settings and mapping like this:
{
  "state":"open",
  "settings":{
    "index":{
      "number_of_shards":"5",
      "provided_name":"lara_cart",
      "creation_date":"1529082175034",
      "analysis":{
        "filter":{
          "engram":{
            "type":"edgeNGram",
            "min_gram":"1",
            "max_gram":"36"
          },
          "maxlength":{
            "type":"length",
            "max":"36"
          },
          "word_delimiter":{
            "split_on_numerics":"false",
            "generate_word_parts":"true",
            "preserve_original":"true",
            "generate_number_parts":"true",
            "catenate_all":"true",
            "split_on_case_change":"true",
            "type":"word_delimiter",
            "catenate_numbers":"true"
          }
        },
        "char_filter":{
          "normalize":{
            "mode":"compose",
            "name":"nfkc",
            "type":"icu_normalizer"
          },
          "whitespaces":{
            "pattern":"\\s{2,}",
            "type":"pattern_replace",
            "replacement":"\u0020"
          }
        },
        "analyzer":{
          "keyword_analyzer":{
            "filter":[
              "lowercase",
              "trim",
              "maxlength"
            ],
            "char_filter":[
              "normalize",
              "whitespaces"
            ],
            "type":"custom",
            "tokenizer":"keyword"
          },
          "autocomplete_index_analyzer":{
            "filter":[
              "lowercase",
              "trim",
              "maxlength",
              "engram"
            ],
            "char_filter":[
              "normalize",
              "whitespaces"
            ],
            "type":"custom",
            "tokenizer":"keyword"
          },
          "autocomplete_search_analyzer":{
            "filter":[
              "lowercase",
              "trim",
              "maxlength"
            ],
            "char_filter":[
              "normalize",
              "whitespaces"
            ],
            "type":"custom",
            "tokenizer":"keyword"
          }
        },
        "tokenizer":{
          "engram":{
            "type":"edgeNGram",
            "min_gram":"1",
            "max_gram":"36"
          }
        }
      },
      "number_of_replicas":"1",
      "uuid":"5xyW07F-RRCuIJlvBufNbA",
      "version":{
        "created":"6020499"
      }
    }
  },
  "mappings":{
    "products":{
      "properties":{
        "sale_end_at":{
          "format":"yyyy-MM-dd HH:mm:ss",
          "type":"date"
        },
        "image_5":{
          "type":"text"
        },
        "image_4":{
          "type":"text"
        },
        "created_at":{
          "format":"yyyy-MM-dd HH:mm:ss",
          "type":"date"
        },
        "description":{
          "analyzer":"keyword_analyzer",
          "type":"text",
          "fields":{
            "autocomplete":{
              "search_analyzer":"autocomplete_search_analyzer",
              "analyzer":"autocomplete_index_analyzer",
              "type":"text"
            }
          }
        },
        "sale_start_at":{
          "format":"yyyy-MM-dd HH:mm:ss",
          "type":"date"
        },
        "sale_price":{
          "type":"integer"
        },
        "category_id":{
          "type":"integer"
        },
        "updated_at":{
          "format":"yyyy-MM-dd HH:mm:ss",
          "type":"date"
        },
        "price":{
          "type":"integer"
        },
        "image_1":{
          "type":"text"
        },
        "name":{
          "analyzer":"keyword_analyzer",
          "type":"text",
          "fields":{
            "autocomplete":{
              "search_analyzer":"autocomplete_search_analyzer",
              "analyzer":"autocomplete_index_analyzer",
              "type":"text"
            },
            "keyword":{
              "analyzer":"keyword_analyzer",
              "type":"text"
            }
          }
        },
        "image_3":{
          "type":"text"
        },
        "categories":{
          "type":"nested",
          "properties":{
            "parent_category_id":{
              "type":"integer"
            },
            "updated_at":{
              "type":"text",
              "fields":{
                "keyword":{
                  "ignore_above":256,
                  "type":"keyword"
                }
              }
            },
            "name":{
              "analyzer":"keyword_analyzer",
              "type":"text",
              "fields":{
                "autocomplete":{
                  "search_analyzer":"autocomplete_search_analyzer",
                  "analyzer":"autocomplete_index_analyzer",
                  "type":"text"
                }
              }
            },
            "created_at":{
              "type":"text",
              "fields":{
                "keyword":{
                  "ignore_above":256,
                  "type":"keyword"
                }
              }
            },
            "id":{
              "type":"long"
            }
          }
        },
        "id":{
          "type":"long"
        },
        "image_2":{
          "type":"text"
        },
        "stock":{
          "type":"integer"
        }
      }
    }
  },
  "aliases":[
  ],
  "primary_terms":{
    "0":1,
    "1":1,
    "2":1,
    "3":1,
    "4":1
  },
  "in_sync_allocations":{
    "0":[
      "clYoJWUKTru2Z78h0OINwQ"
    ],
    "1":[
      "MGQC73KiQsuigTPg4SQG4g"
    ],
    "2":[
      "zW6v82gNRbe3wWKefLOAug"
    ],
    "3":[
      "5TKrfz7HRAatQsJudKX9-w"
    ],
    "4":[
      "gqiblStYSYy_NA6fYtkghQ"
    ]
  }
}
I want to implement suggest (autocomplete) search using the autocomplete field.
So I added a document like this:
{
  "_index":"lara_cart",
  "_type":"products",
  "_id":"19",
  "_version":1,
  "_score":1,
  "_source":{
    "id":19,
    "name":"Conqueror, whose.",
    "description":"I should think you'll feel it a bit, if you wouldn't mind,' said Alice: 'besides, that's not a regular rule: you invented it just missed her. Alice caught the flamingo and brought it back, the fight.",
    "category_id":81,
    "stock":79,
    "price":11533,
    "sale_price":15946,
    "sale_start_at":null,
    "sale_end_at":null,
    "image_1":"https://lorempixel.com/640/480/?56260",
    "image_2":"https://lorempixel.com/640/480/?15012",
    "image_3":"https://lorempixel.com/640/480/?14138",
    "image_4":"https://lorempixel.com/640/480/?94728",
    "image_5":"https://lorempixel.com/640/480/?99832",
    "created_at":"2018-06-01 16:12:41",
    "updated_at":"2018-06-01 16:12:41",
    "deleted_at":null,
    "categories":{
      "id":81,
      "name":"A secret, kept.",
      "parent_category_id":"33",
      "created_at":"2018-06-01 16:12:41",
      "updated_at":"2018-06-01 16:12:41",
      "deleted_at":null
    }
  }
}
After that, I tried to search with the query below, but it returns nothing.
Do you know how to resolve this?
I think the cause is in my mapping and settings.
{
  "query":{
    "bool":{
      "must":[
        {
          "term":{
            "name.autocomplete":"Conqueror"
          }
        }
      ],
      "must_not":[
      ],
      "should":[
      ]
    }
  },
  "from":0,
  "size":10,
  "sort":[
  ],
  "aggs":{
  }
}
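This is because name.autocomplete is an analyzed field, while a term query is not analyzed: it looks for the exact token "Conqueror", but the index analyzer lowercased everything it stored.
You can try a match query on the autocomplete field instead, since match runs the query text through the field's search analyzer. Some basic knowledge of autocomplete and n-grams will help you understand this problem better.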
For example, suppose you define the following analyzer:
PUT /my_index
{
  "settings": {
    "number_of_shards": 1,
    "analysis": {
      "filter": {
        "autocomplete_filter": {
          "type": "edge_ngram",
          "min_gram": 1,
          "max_gram": 20
        }
      },
      "analyzer": {
        "autocomplete": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": [
            "lowercase",
            "autocomplete_filter"
          ]
        }
      }
    }
  }
}
After that, you can test the autocomplete analyzer with the following request:
GET /my_index/_analyze
{
  "analyzer": "autocomplete",
  "text": "quick brown"
}
As configured above, the autocomplete analyzer generates edge n-grams of length 1 to 20 for each token, so the request returns the following tokens:
q
qu
qui
quic
quick
b
br
bro
brow
brown
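To make the token list above concrete, here is a rough Python reproduction of the autocomplete analyzer. It assumes that splitting on whitespace is a good enough stand-in for the standard tokenizer for plain words like these; edge_ngrams and autocomplete_analyze are hypothetical helpers, not Elasticsearch APIs.

```python
def edge_ngrams(token: str, min_gram: int = 1, max_gram: int = 20) -> list:
    """Prefixes of token with lengths min_gram..max_gram (edge n-grams)."""
    return [token[:n] for n in range(min_gram, min(max_gram, len(token)) + 1)]


def autocomplete_analyze(text: str) -> list:
    """Rough stand-in for the "autocomplete" analyzer defined above."""
    tokens = []
    # assumption: whitespace splitting approximates the standard tokenizer here
    for word in text.split():
        tokens.extend(edge_ngrams(word.lower()))  # lowercase, then autocomplete_filter
    return tokens


print(autocomplete_analyze("quick brown"))
# ['q', 'qu', 'qui', 'quic', 'quick', 'b', 'br', 'bro', 'brow', 'brown']
```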
A term query, by contrast, searches for documents whose field contains exactly the term you pass, without analyzing it, much like a WHERE condition in MySQL.
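Applied to the question's own settings, the same reasoning explains the empty result. The Python sketch below approximates autocomplete_index_analyzer and autocomplete_search_analyzer as configured in the question (NFKC normalization, whitespace collapsing, keyword tokenizer, lowercase, trim, length filter with max 36, edge n-grams), assuming the whitespaces char filter was meant to collapse runs of two or more whitespace characters; Python's NFKC normalization only approximates the ICU normalizer. index_tokens and search_tokens are hypothetical helpers.

```python
import re
import unicodedata


def _char_filters(text: str) -> str:
    # ICU nfkc normalizer approximated with Python's NFKC normalization
    text = unicodedata.normalize("NFKC", text)
    # assumed intent of the "whitespaces" filter: collapse 2+ whitespace chars
    return re.sub(r"\s{2,}", " ", text)


def search_tokens(text: str, max_len: int = 36) -> list:
    """Approximate autocomplete_search_analyzer from the question's settings."""
    # keyword tokenizer: the whole input is a single token, then lowercase + trim
    token = _char_filters(text).lower().strip()
    # the length filter removes (does not truncate) tokens longer than max_len
    return [token] if len(token) <= max_len else []


def index_tokens(text: str, max_len: int = 36) -> list:
    """Approximate autocomplete_index_analyzer: search chain + edge n-grams 1..36."""
    tokens = []
    for token in search_tokens(text, max_len):
        tokens.extend(token[:n] for n in range(1, min(len(token), max_len) + 1))
    return tokens


indexed = set(index_tokens("Conqueror, whose."))

# term query: not analyzed, so it looks for the literal "Conqueror"
print("Conqueror" in indexed)                                 # False
# match query: analyzes "Conqueror" with the search analyzer first
print(any(t in indexed for t in search_tokens("Conqueror")))  # True
```

Two consequences follow from this approximation: a term query on name.autocomplete only matches lowercase input such as "conqueror", while a match query lowercases "Conqueror" for you; and because the keyword tokenizer keeps the whole value as one token and the length filter drops tokens longer than 36 characters, long values such as most descriptions would, as far as these settings suggest, produce no autocomplete tokens at all.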
