I get a parse exception when I try to get data out of my Elasticsearch index. My document looks like this:
{
"_index": "some name",
"_type": "row",
"_id": "665",
"_score": 6.3700795,
"_source": {
"dateOfClaim": 1215986400000,
"employer": {
"username": null,
"password": null,
"name": "customer",
"customerNumber": "some number",
"dosierNumbers": null
},
"recipient": {
"username": null,
"password": null,
"name": "some name",
"taxNumber": "some number"
},
"claim": 402401,
"dosierNumber": "",
"worthWayTaxes": "",
"good": {
"brutoWeight": 25,
"nettoWeight": 25050,
"coll": 25000,
"taxWorth": "58830.67",
"eori": ""
},
"poDValues": "YES",
"correctedTaxNumber": null,
"note": null
}
},
And my query looks like this:
POST /customs/_search
{
"nested" : {
"path" : "employer",
"score_mode" : "none",
"query" : {
"match": {
"employer.name" : "customer"
}
}
}
}
I want to get all documents of a specific employer where poDValues is NO. But my query already fails with a parse exception (All shards failed for phase: [query]), even before I add the condition that poDValues should be NO.
I think you misunderstood the concept of nested objects. Your employer field is a plain object, not a nested one, so you can query its sub-fields directly with dot notation. For example:
POST /_search
{
"query": {
"term": {
"employer.titel": {
"value": "Dit is mijn titel"
}
}
}
}
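Applied to the document from the question, a request body could be sketched as below. This is only a sketch under assumptions: it assumes employer is a plain object field, that employer.name and poDValues use the default analyzed text mapping, and the helper name build_claims_query is made up for illustration. Note that the whole clause sits under a top-level "query" key, which the request body in the question did not have.

```python
import json


def build_claims_query(employer_name: str, pod_value: str) -> dict:
    """Build a search body: employer name AND poDValues must both match.

    Hypothetical helper; field names are taken from the sample document.
    """
    return {
        "query": {  # the search body needs this top-level "query" key
            "bool": {
                "must": [
                    # plain object field, addressed with dot notation
                    {"match": {"employer.name": employer_name}},
                    {"match": {"poDValues": pod_value}},
                ]
            }
        }
    }


body = build_claims_query("customer", "NO")
print(json.dumps(body, indent=2))
```

The printed JSON can be sent as the body of POST /customs/_search.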
I have an OpenSearch index with a string field message defined as below:
{"name":"message.message","type":"string","esTypes":["text"],"count":0,"scripted":false,"searchable":true,"aggregatable":false,"readFromDocValues":false}
Sample data:
"_source" : {
"message" : {
"message" : "user: AB, from: home, to: /app1"
}
}
I would like to convert the message field into JSON so that I can access the values message.user, message.from and message.to individually.
How do I go about it?
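You can use the JSON processor in an ingest pipeline. The simulate request below shows it converting a JSON string into an object: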
POST /_ingest/pipeline/_simulate
{
"pipeline": {
"description": "convert json to object",
"processors": [
{
"json": {
"field": "foo",
"target_field": "json_target"
}
}
]
},
"docs": [
{
"_index": "index",
"_id": "id",
"_source": {
"foo": "{\"name\":\"message.message\",\"type\":\"string\",\"esTypes\":[\"text\"],\"count\":0,\"scripted\":false,\"searchable\":true,\"aggregatable\":false,\"readFromDocValues\":false}\r\n"
}
}
]
}
Response:
{
"docs": [
{
"doc": {
"_index": "index",
"_id": "id",
"_version": "-3",
"_source": {
"foo": """{"name":"message.message","type":"string","esTypes":["text"],"count":0,"scripted":false,"searchable":true,"aggregatable":false,"readFromDocValues":false}
""",
"json_target": {
"esTypes": [
"text"
],
"readFromDocValues": false,
"name": "message.message",
"count": 0,
"aggregatable": false,
"type": "string",
"scripted": false,
"searchable": true
}
},
"_ingest": {
"timestamp": "2022-11-09T19:38:01.16232Z"
}
}
}
]
}
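One caveat: the JSON processor expects the field to hold a valid JSON string, and the sample value from the question ("user: AB, from: home, to: /app1") is a comma-separated key: value string instead. As a rough client-side sketch of the conversion, assuming every pair has the form key: value and values contain no commas, something like the following could work. The function name parse_message is made up for illustration.

```python
import json


def parse_message(raw: str) -> dict:
    """Turn a JSON object string or a 'key: value, key: value' string into a dict."""
    try:
        parsed = json.loads(raw)
        if isinstance(parsed, dict):
            return parsed
    except json.JSONDecodeError:
        pass  # not JSON, fall back to key: value splitting

    result = {}
    for pair in raw.split(","):
        key, sep, value = pair.partition(":")
        if sep:  # skip fragments that have no colon at all
            result[key.strip()] = value.strip()
    return result


message = parse_message("user: AB, from: home, to: /app1")
print(message)
```

The resulting dict can then be indexed as an object, so that message.user, message.from and message.to become separate fields.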
I'm implementing a search box in Elasticsearch and I have an Elasticsearch index with the following mappings:
{
"mappings": {
"properties": {
"name": {
"type": "text"
},
"brand": {
"type": "text"
}
}
}
}
And I'd like, quite simply, to do a query such as (in SQL):
SELECT * FROM <table> WHERE brand ILIKE '%test%' OR name ILIKE '%test%';
I've tried a query such as:
{
"query": {
"query_string": {
"query": "*test*",
"fields": ["brand", "name"]
}
}
}
That gives me my desired result. However, I've noticed that the docs recommend against using query_string for a search box, as it can lead to performance issues.
I then tried a multi_match query:
{
"query": {
"multi_match" : {
"query": "test"
}
}
}
But that yielded no results. Further, when I used an ngram tokenizer, it returned all documents all the time.
I've consulted countless resources on this and even on StackOverflow there are countless unanswered questions regarding this topic. Could somebody explain how this is achieved in the Elasticsearch world, or am I simply using the wrong tool for the job? Thanks.
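Since you have not provided sample documents, I have created a complete example. What you are trying to do is very much possible in Elasticsearch with a simple bool query whose should clauses are wildcard queries, as shown below. Note that the query targets the .keyword sub-fields, which exist when the index uses the default dynamic mapping rather than the text-only mapping from the question.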
{
"query": {
"bool": {
"should": [
{
"wildcard": {
"name.keyword": {
"value": "*test*"
}
}
},
{
"wildcard": {
"brand.keyword": {
"value": "*test*"
}
}
}
],
"minimum_should_match": 1,
"boost": 1.0
}
}
}
You can test the above query on the sample documents below:
{
"brand" : "test",
"name" : "name foo according to use"
}
{
"brand" : "barand name is foo",
"name" : "name foo according to use"
}
{
"brand" : "barand name is test",
"name" : "name tested according to use"
}
{
"brand" : "barand name is testing",
"name" : "test the name"
}
On the 4 sample documents above, the query returns the following documents:
"hits": [
{
"_index": "73885469",
"_id": "1",
"_score": 2.0,
"_source": {
"brand": "barand name is testing",
"name": "test the name"
}
},
{
"_index": "73885469",
"_id": "2",
"_score": 2.0,
"_source": {
"brand": "barand name is test",
"name": "name tested according to use"
}
},
{
"_index": "73885469",
"_id": "4",
"_score": 1.0,
"_source": {
"brand": "test",
"name": "name foo according to use"
}
}
]
Which, I believe, are the documents you expect.
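For a search box, the same query can be generated from the user's input. Below is a minimal sketch, assuming the .keyword sub-fields exist; the helper names build_contains_query and matches_locally are made up for illustration, and the local check only mimics the wildcard semantics (case-sensitive substring match) so the expected hits can be verified without a cluster.

```python
from fnmatch import fnmatchcase


def build_contains_query(term: str, fields: list) -> dict:
    """Build a bool/should query with one wildcard clause per field."""
    pattern = f"*{term}*"
    return {
        "query": {
            "bool": {
                "should": [
                    {"wildcard": {f"{field}.keyword": {"value": pattern}}}
                    for field in fields
                ],
                "minimum_should_match": 1,
            }
        }
    }


def matches_locally(doc: dict, term: str, fields: list) -> bool:
    """Case-sensitive substring check, mimicking the wildcard pattern locally."""
    pattern = f"*{term}*"
    return any(fnmatchcase(str(doc.get(field, "")), pattern) for field in fields)


docs = [
    {"brand": "test", "name": "name foo according to use"},
    {"brand": "barand name is foo", "name": "name foo according to use"},
    {"brand": "barand name is test", "name": "name tested according to use"},
    {"brand": "barand name is testing", "name": "test the name"},
]
query = build_contains_query("test", ["name", "brand"])
matching = [d for d in docs if matches_locally(d, "test", ["name", "brand"])]
print(len(matching))
```

Keep in mind that leading wildcards are expensive on large indices, so this is worth benchmarking before it goes behind a search box.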
We currently have a 'message' that can have a link to a 'parent' message. E.g. a reply would have the original message as the parent_id.
PUT {
"mappings": {
"message": {
"properties": {
"subject": {
"type": "text"
},
"body" : {
"type" : "text"
},
"parent_id" : {
"type" : "long"
}
}
}
}
}
Until now we didn't use an Elasticsearch parent-child join on the documents, because parent and child weren't allowed to be of the same type. Now, with 5.6 and Elastic's drive to get rid of types, we are trying to use the new parent-child join field in 5.6.
PUT {
"settings": {
"mapping.single_type": true
},
"mappings": {
"message": {
"properties": {
"subject": {
"type": "text"
},
"body" : {
"type" : "text"
},
"join_field": {
"type" : "join",
"relations": {
"parent_message":"child_message"
}
}
}
}
}
}
I know I will have to create a new index for this and then reindex everything with _reindex but I am not quite sure how I would do that.
Indexing a parent_message (and a reply that points to it) is simple:
PUT localhost:9200/testm1/message/1
{
"subject": "Message 1",
"body" : "body 1"
}
PUT localhost:9200/testm1/message/3?routing=1
{
"subject": "Message Reply to 1",
"body" : "body 3",
"join_field": {
"name": "child_message",
"parent": "1"
}
}
A search would now return
{
"_index": "testm1",
"_type": "message",
"_id": "2",
"_score": 1,
"_source": {
"subject": "Message 2",
"body": "body 2"
}
},
{
"_index": "testm1",
"_type": "message",
"_id": "1",
"_score": 1,
"_source": {
"subject": "Message 1",
"body": "body 1"
}
},
{
"_index": "testm1",
"_type": "message",
"_id": "3",
"_score": 1,
"_routing": "1",
"_source": {
"subject": "Message Reply to 1",
"body": "body 3",
"join_field": {
"name": "child_message",
"parent": "1"
}
}
}
I tried to create the new index (testmnew) and then just do a _reindex
POST _reindex
{
"source": {
"index" : "testm"
},
"dest" :{
"index" : "testmnew"
},
"script" : {
"inline" : """
ctx._routing = ctx._source.parent_id;
--> Missing need to set join_field here as well I guess <--
"""
}
}
The scripting is still not quite clear to me, but am I on the right path here? Would I simply set the _routing on the messages (it would be null on parent messages)? And how would I set the join_field only for child messages?
This is the re-indexing script that I've used in the end:
curl -XPOST 'localhost:9200/_reindex' -H 'Content-Type: application/json' -d'
{
"source": {
"index" : "testm"
},
"dest" :{
"index" : "testmnew"
},
"script" : {
"lang" : "painless",
"source" : "if(ctx._source.parent_id != null){ctx._routing = ctx._source.parent_id; ctx._source.join_field= params.cjoin; ctx._source.join_field.parent = ctx._source.parent_id;}else{ctx._source.join_field = params.parent_join}",
"params" : {
"cjoin" :{
"name": "child_message",
"parent": 1
},
"parent_join" : {"name": "parent_message"}
}
}
}
'
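To make the logic of the script easier to follow, here is the same per-document transformation sketched in Python. It is only an illustration of what the Painless script does to each document, not something Elasticsearch runs; the function name to_join_document is made up, and converting the routing value to a string is my own choice.

```python
def to_join_document(source: dict):
    """Return (new_source, routing) for one message, mirroring the reindex script."""
    doc = dict(source)  # copy, so the input document is left untouched
    parent_id = doc.get("parent_id")
    if parent_id is not None:
        # Child message: point the join field at the parent and route to it,
        # so the child lands on the same shard as its parent.
        doc["join_field"] = {"name": "child_message", "parent": parent_id}
        routing = str(parent_id)
    else:
        # Parent message: only the relation name is needed, no routing.
        doc["join_field"] = {"name": "parent_message"}
        routing = None
    return doc, routing


parent, parent_routing = to_join_document({"subject": "Message 1", "body": "body 1"})
child, child_routing = to_join_document(
    {"subject": "Message Reply to 1", "body": "body 3", "parent_id": 1}
)
print(parent, parent_routing)
print(child, child_routing)
```

Unlike the script, this builds a fresh join_field dict for every child instead of reusing the one passed in through params.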
I am able to search by customer id, name and lastname, and by kids' id, name, lastname and birthdate.
Searching by id must be exact, and it is. Searching by name or lastname tolerates misspellings up to edit distance 2, and that works. But I also want to search by kid birthdate with an exact match (misspelling distance 0).
So far, whenever I search by birthdate, the results come back as if the misspelling distance were 2. I don't know how to search for exact dates.
Rails 5.1.0.rc1
elasticsearch-5.0.3
searchkick-2.2.0
class Customer < ActiveRecord::Base
include Searchable
def search_data
attributes.merge avatar_url: avatar.url, kids: kids
end
has_many :kids
...
end
class Kid < ActiveRecord::Base
belongs_to :customer
def reindex_customer
customer.reindex async: true
end
...
end
module Searchable
extend ActiveSupport::Concern
included do
SEARCH_RESULTS_PER_PAGE = 10
def self.elastic_search(query, opts = { page: 1 })
# This regex accepts strings that contain digits or dates
regexp = /(\d+)|(^(0[1-9]|1\d|2\d|3[01])-(0[1-9]|1[0-2])-(19|20)\d{2}$)/
distance = query.match?(regexp) ? 0 : 2 # Misspelling distance: 0 for digits and dates, 2 for other strings
options = { load: false,
match: :word_middle,
misspellings: { edit_distance: distance },
per_page: SEARCH_RESULTS_PER_PAGE,
page: opts[:page] }
search query, options
end
end
end
My index contains customer data together with the data of their kids. Kids are nested under their parent customer.
How can I force exact matching when searching for dates?
For this query:
curl http://localhost:9200/customers_development/_search?pretty -d '{"query":{"dis_max":{"queries":[{"match":{"_all":{"query":"28388","boost":10,"operator":"and","analyzer":"searchkick_search"}}},{"match":{"_all":{"query":"28388","boost":10,"operator":"and","analyzer":"searchkick_search2"}}},{"match":{"_all":{"query":"28388","boost":1,"operator":"and","analyzer":"searchkick_search","fuzziness":0,"prefix_length":0,"max_expansions":3,"fuzzy_transpositions":true}}},{"match":{"_all":{"query":"28388","boost":1,"operator":"and","analyzer":"searchkick_search2","fuzziness":0,"prefix_length":0,"max_expansions":3,"fuzzy_transpositions":true}}}]}},"size":10,"from":0,"timeout":"11s"}'
This is what the index returns:
{
"took": 11,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 1,
"max_score": 97.29381,
"hits": [
{
"_index": "customers_development_20170913145033808",
"_type": "customer",
"_id": "28388",
"_score": 97.29381,
"_source": {
"id": 28388,
"created_at": "2017-07-10T19:49:43.856Z",
"updated_at": "2017-09-13T03:01:51.727Z",
"name": "Linda",
"lastname": "Schott",
"email": "linda.schott#web.de",
"avatar": null,
"phone": null,
"mobile": null,
"erster_kontakt": null,
"memo": null,
"brief_title": null,
"newsletter": null,
"avatar_url": "/no_customer.png",
"kids": [
{
"id": 34229,
"name": "Jakob",
"lastname": "Schott",
"birthdate": "2013-03-22",
"age": "4,5",
"avatar": {
"url": "/avatars/kid/34229/Jellyfish.png",
"thumb": {
"url": "/avatars/kid/34229/thumb_Jellyfish.png"
}
},
"created_at": "2017-07-10T19:50:16.058Z",
"updated_at": "2017-09-13T03:02:52.962Z",
"customer_id": 28388,
"member": null,
"year_certified": null,
"zahlart": null,
"tn_merge_markiert": null,
"family": null,
"medal": "black",
"score": 30,
"current_level": "swimmys"
},
{
"id": 34228,
"name": "Lilith",
"lastname": "Schott",
"birthdate": "2013-03-22",
"age": "4,5",
"avatar": {
"url": "/avatars/kid/34228/Penguins.png",
"thumb": {
"url": "/avatars/kid/34228/thumb_Penguins.png"
}
},
"created_at": "2017-07-10T19:50:16.058Z",
"updated_at": "2017-09-13T03:02:52.962Z",
"customer_id": 28388,
"member": null,
"year_certified": null,
"zahlart": null,
"tn_merge_markiert": null,
"family": null,
"medal": "green",
"score": 17,
"current_level": "beginner"
},
{
"id": 27718,
"name": "Johanna",
"lastname": "Plischke",
"birthdate": "2010-12-29",
"age": "6,8",
"avatar": {
"url": "/avatars/kid/27718/Koala.png",
"thumb": {
"url": "/avatars/kid/27718/thumb_Koala.png"
}
},
"created_at": "2017-07-10T19:50:16.034Z",
"updated_at": "2017-09-13T04:01:15.261Z",
"customer_id": 28388,
"member": null,
"year_certified": null,
"zahlart": null,
"tn_merge_markiert": null,
"family": null,
"medal": "red",
"score": 27,
"current_level": ""
}
]
}
}
]
}
}
Let's analyze one part of the query:
"match":{
"_all":{
"query":"28388",
"boost":1,
"operator":"and",
"analyzer":"searchkick_search",
"fuzziness":0,
"prefix_length":0,
"max_expansions":3,
"fuzzy_transpositions":true
}
}
_all
You said that kids is a nested field, but you only search _all, so the first thing to make clear is whether kids is included in _all.
As the documentation says:
Sets the default include_in_all value for all the properties within the nested object. Nested documents do not have their own _all field. Instead, values are added to the _all field of the main “root” document.
So the first question is whether the nested type in your index sets include_in_all to false, which would prevent the nested fields from being searched through _all.
Nested Query
Alternatively, you can use a nested query to search the nested objects:
GET /_search
{
"query": {
"nested" : {
"path" : "kids",
"score_mode" : "avg",
"query" : {
"query_string": {
"fields": ["kids.birthdate"],
"query": "xxx"
}
}
}
}
}
Fuzzy
To handle misspellings, Elasticsearch provides the fuzzy query:
GET /_search
{
"query": {
"fuzzy" : {
"name" : {
"value" : "xxx",
"boost" : 1.0,
"fuzziness" : 2,
"prefix_length" : 0,
"max_expansions": 100
}
}
}
}
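To see why a fuzziness of 2 is a poor fit for dates, it helps to compute the edit distance by hand. The sketch below implements plain Levenshtein distance (insertions, deletions, substitutions). Elasticsearch's fuzzy matching can also count a transposition as one edit, which this simplified version does not, so treat it as an approximation.

```python
def edit_distance(a: str, b: str) -> int:
    """Plain Levenshtein distance between two strings (no transpositions)."""
    previous = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ch_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(
                min(
                    previous[j] + 1,         # delete ch_a
                    current[j - 1] + 1,      # insert ch_b
                    previous[j - 1] + cost,  # substitute, or keep if equal
                )
            )
        previous = current
    return previous[-1]


print(edit_distance("schott", "shot"))
print(edit_distance("2013-03-22", "2013-03-23"))
```

Two birthdates that differ by one digit are only one edit apart, so any fuzziness above 0 will treat neighbouring dates as matches.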
Combine Query
And finally, we can combine them using bool query:
POST _search
{
"query": {
"bool" : {
"must" : [{
"nested" : {
"path" : "kids",
"query" : {
"query_string": {
"fields": ["kids.birthdate"],
"query": "xxx"
}
}
}
},
{ "fuzzy" : {
"name" : {
"value" : "xxx",
"boost" : 1.0,
"fuzziness" : 2,
"prefix_length" : 0,
"max_expansions": 100
}
}
}]
}
}
}
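If you build the request from application code, the combined query could be assembled roughly as follows. This is a sketch under assumptions: kids is mapped as nested, kids.birthdate is a date or keyword field, and I have swapped the query_string clause for a term clause because the goal is an exact match on the date. The function name build_customer_query is made up for illustration.

```python
import json


def build_customer_query(name: str, birthdate: str) -> dict:
    """Exact match on a kid's birthdate, fuzzy match on the customer name."""
    return {
        "query": {
            "bool": {
                "must": [
                    {
                        "nested": {
                            "path": "kids",
                            # term: no analysis and no fuzziness, so the date is exact
                            "query": {"term": {"kids.birthdate": birthdate}},
                        }
                    },
                    # misspellings up to 2 edits are tolerated on the name only
                    {"fuzzy": {"name": {"value": name, "fuzziness": 2}}},
                ]
            }
        }
    }


body = build_customer_query("linda", "2013-03-22")
print(json.dumps(body, indent=2))
```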
I am not familiar with Ruby, so that is all I can help with. I hope it is useful.
I have an index with documents like this:
"_index": "test",
"_type": "products",
"_id": "URpYIFBAQRiPPu1BFOZiQg",
"_score": null,
"_source": {
"currency": null,
"colors": [],
"api": 1,
"sku": 9999227900050002,
"category_path": [
{
"id": "cat00000",
"name": "B1"
},
{
"id": "abcat0400000",
"name": "Cameras & Camcorders"
},
{
"id": "abcat0401000",
"name": "Digital Cameras"
},
{
"id": "abcat0401005",
"name": "Digital SLR Cameras"
},
{
"id": "pcmcat180400050006",
"name": "DSLR Package Deals"
}
],
"price": 1034.99,
"status": 1,
"description": null,
}
I want to search only for the exact text ["Camcorders"] in the category_path field.
I tried some match queries, but they return all the products that have "Camcorders" as part of the text. Can someone help me solve this?
Thanks
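To search in that field, use a term query like the following. The value is lowercase because, assuming the default analyzer, tokens are lowercased at index time: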
{
"query": {
"term": {
"category_path.name": {
"value": "b1"
}
}
}
}
Hope it helps!
You could also add one more field, raw_name, mapped with "index": "not_analyzed" (the keyword type in Elasticsearch 5 and later), and match against it.
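The difference between the analyzed field and a raw field can be illustrated without a cluster. The sketch below uses a crude stand-in for the standard analyzer (split on non-word characters, lowercase); the real analyzer has more rules, and the function names are made up for illustration.

```python
import re


def simple_analyze(text: str) -> list:
    """Rough approximation of the standard analyzer: split into words, lowercase."""
    return [token.lower() for token in re.findall(r"\w+", text)]


def term_matches_analyzed(term: str, value: str) -> bool:
    """A term query compares the unanalyzed term against the indexed tokens."""
    return term in simple_analyze(value)


def term_matches_raw(term: str, value: str) -> bool:
    """On a not_analyzed (keyword) field the whole value is a single token."""
    return term == value


print(simple_analyze("Cameras & Camcorders"))
print(term_matches_analyzed("camcorders", "Cameras & Camcorders"))
print(term_matches_raw("Cameras & Camcorders", "Cameras & Camcorders"))
```

On the analyzed field, the lowercase token camcorders matches every name that contains that word, while the raw field only matches when the whole name is identical.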