Rails Searchkick has_many indexing and searching - elasticsearch

I am able to search by customer id, name, and lastname, and by kid id, name, lastname, and birthdate.
Searching by id must be exact, and it is. Searching by name or lastname tolerates misspellings up to an edit distance of 2, and that works. But I also want to search by the kid's birthdate with an exact match (misspelling distance 0).
So far, whenever I search by birthdate, the results come back as if the misspelling distance were 2. I don't know how to search for exact dates.
Rails 5.1.0.rc1
elasticsearch-5.0.3
searchkick-2.2.0
class Customer < ActiveRecord::Base
  include Searchable
  def search_data
    attributes.merge avatar_url: avatar.url, kids: kids
  end
  has_many :kids
  ...
end
class Kid < ActiveRecord::Base
  belongs_to :customer
  def reindex_customer
    customer.reindex async: true
  end
  ...
end
module Searchable
  extend ActiveSupport::Concern
  included do
    SEARCH_RESULTS_PER_PAGE = 10
    def self.elastic_search(query, opts = { page: 1 })
      # This regex accept string that contains digits or dates
      regexp = /(\d+)|(^(0[1-9]|1\d|2\d|3[01])-(0[1-9]|1[0-2])-(19|20)\d{2}$)/
      # This is for calculate the distance for misspelling: 0 for digits and dates and 2 for strings
      distance = query.match?(regexp) ? 0 : 2
      options = { load: false,
                  match: :word_middle,
                  misspellings: { edit_distance: distance },
                  per_page: SEARCH_RESULTS_PER_PAGE,
                  page: opts[:page] }
      search query, options
    end
  end
end
My index contains customer data together with the data of each customer's kids. Kids are nested under their parent customer.
How can I force exact matching when searching for dates?
For this query:
curl http://localhost:9200/customers_development/_search?pretty -d '{"query":{"dis_max":{"queries":[{"match":{"_all":{"query":"28388","boost":10,"operator":"and","analyzer":"searchkick_search"}}},{"match":{"_all":{"query":"28388","boost":10,"operator":"and","analyzer":"searchkick_search2"}}},{"match":{"_all":{"query":"28388","boost":1,"operator":"and","analyzer":"searchkick_search","fuzziness":0,"prefix_length":0,"max_expansions":3,"fuzzy_transpositions":true}}},{"match":{"_all":{"query":"28388","boost":1,"operator":"and","analyzer":"searchkick_search2","fuzziness":0,"prefix_length":0,"max_expansions":3,"fuzzy_transpositions":true}}}]}},"size":10,"from":0,"timeout":"11s"}'
This is what the search returns, showing how the indexed document looks:
{
"took": 11,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 1,
"max_score": 97.29381,
"hits": [
{
"_index": "customers_development_20170913145033808",
"_type": "customer",
"_id": "28388",
"_score": 97.29381,
"_source": {
"id": 28388,
"created_at": "2017-07-10T19:49:43.856Z",
"updated_at": "2017-09-13T03:01:51.727Z",
"name": "Linda",
"lastname": "Schott",
"email": "linda.schott#web.de",
"avatar": null,
"phone": null,
"mobile": null,
"erster_kontakt": null,
"memo": null,
"brief_title": null,
"newsletter": null,
"avatar_url": "/no_customer.png",
"kids": [
{
"id": 34229,
"name": "Jakob",
"lastname": "Schott",
"birthdate": "2013-03-22",
"age": "4,5",
"avatar": {
"url": "/avatars/kid/34229/Jellyfish.png",
"thumb": {
"url": "/avatars/kid/34229/thumb_Jellyfish.png"
}
},
"created_at": "2017-07-10T19:50:16.058Z",
"updated_at": "2017-09-13T03:02:52.962Z",
"customer_id": 28388,
"member": null,
"year_certified": null,
"zahlart": null,
"tn_merge_markiert": null,
"family": null,
"medal": "black",
"score": 30,
"current_level": "swimmys"
},
{
"id": 34228,
"name": "Lilith",
"lastname": "Schott",
"birthdate": "2013-03-22",
"age": "4,5",
"avatar": {
"url": "/avatars/kid/34228/Penguins.png",
"thumb": {
"url": "/avatars/kid/34228/thumb_Penguins.png"
}
},
"created_at": "2017-07-10T19:50:16.058Z",
"updated_at": "2017-09-13T03:02:52.962Z",
"customer_id": 28388,
"member": null,
"year_certified": null,
"zahlart": null,
"tn_merge_markiert": null,
"family": null,
"medal": "green",
"score": 17,
"current_level": "beginner"
},
{
"id": 27718,
"name": "Johanna",
"lastname": "Plischke",
"birthdate": "2010-12-29",
"age": "6,8",
"avatar": {
"url": "/avatars/kid/27718/Koala.png",
"thumb": {
"url": "/avatars/kid/27718/thumb_Koala.png"
}
},
"created_at": "2017-07-10T19:50:16.034Z",
"updated_at": "2017-09-13T04:01:15.261Z",
"customer_id": 28388,
"member": null,
"year_certified": null,
"zahlart": null,
"tn_merge_markiert": null,
"family": null,
"medal": "red",
"score": 27,
"current_level": ""
}
]
}
}
]
}
}

Let's analyze part of the query:
"match":{
"_all":{
"query":"28388",
"boost":1,
"operator":"and",
"analyzer":"searchkick_search",
"fuzziness":0,
"prefix_length":0,
"max_expansions":3,
"fuzzy_transpositions":true
}
}
_all
You said that kids is a nested field, but you only search _all, so the first thing to clarify is whether kids is included in _all.
As the documentation says:
Sets the default include_in_all value for all the properties within the nested object. Nested documents do not have their own _all field. Instead, values are added to the _all field of the main “root” document.
So the first question is whether the nested type in your index sets include_in_all to false, which would make the nested fields unsearchable through _all.
Nested Query
Alternatively, you can use a nested query to search the nested object:
GET /_search
{
"query": {
"nested" : {
"path" : "kids",
"score_mode" : "avg",
"query" : {
"query_string": {
"fields": ["kids.birthdate"],
"query": "xxx"
}
}
}
}
}
Fuzzy
When it comes to misspellings, Elasticsearch provides the fuzzy query:
GET /_search
{
"query": {
"fuzzy" : {
"name" : {
"value" : "xxx",
"boost" : 1.0,
"fuzziness" : 2,
"prefix_length" : 0,
"max_expansions": 100
}
}
}
}
Combine Query
And finally, we can combine them using a bool query:
POST _search
{
"query": {
"bool" : {
"must" : [{
"nested" : {
"path" : "kids",
"query" : {
"query_string": {
"fields": ["kids.birthdate"],
"query": "xxx"
}
}
}
},
{ "fuzzy" : {
"name" : {
"value" : "xxx",
"boost" : 1.0,
"fuzziness" : 2,
"prefix_length" : 0,
"max_expansions": 100
}
}
}]
}
}
}
I am not familiar with Ruby, so that is all I can help with. I hope it is helpful.
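On the Ruby side, one way to approach this might be to detect date-like input before calling Searchkick and to turn it into an exact filter instead of a fuzzy text match. The following is only a sketch: the helper names parse_birthdate and search_arguments are hypothetical, the accepted date formats are assumptions, and it assumes that Searchkick's where and misspellings options behave as documented for your version and that kids.birthdate is indexed as "2013-03-22". It only builds the arguments and does not call Searchkick itself.

```ruby
require 'date'

# Hypothetical helper: returns a Date when the query looks like a date, else nil.
# Accepts ISO (2013-03-22) and day-first (22-03-2013) input; both are assumptions.
def parse_birthdate(query)
  text = query.to_s.strip
  if text.match?(/\A\d{4}-\d{2}-\d{2}\z/)
    Date.strptime(text, '%Y-%m-%d')
  elsif text.match?(/\A\d{2}-\d{2}-\d{4}\z/)
    Date.strptime(text, '%d-%m-%Y')
  end
rescue ArgumentError
  nil # shaped like a date, but not a valid calendar day
end

# Hypothetical helper: builds the [term, options] pair for Searchkick's `search`.
def search_arguments(query, page: 1, per_page: 10)
  base = { load: false, per_page: per_page, page: page }
  date = parse_birthdate(query)
  if date
    # Exact date: match everything, then filter on the birthdate field.
    ['*', base.merge(where: { 'kids.birthdate' => date.iso8601 })]
  elsif query.to_s.match?(/\A\d+\z/)
    # Pure digits (ids): no misspelling tolerance.
    [query, base.merge(misspellings: false)]
  else
    # Names: keep the fuzzy behaviour from the question.
    [query, base.merge(match: :word_middle, misspellings: { edit_distance: 2 })]
  end
end

term, options = search_arguments('22-03-2013')
puts term
puts options.inspect
```

Inside elastic_search this would be used roughly as `search(*search_arguments(query, page: opts[:page]))`. Note that the anchors \A and \z matter here: the unanchored (\d+) alternative in the original regexp matches any string that merely contains a digit.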

Related

ElasticSearch sorting by more conditions

I have an index with simple data, and I need to filter and sort it as follows.
Records look like this:
{
"name": "Product ABC variant XYZ subvariant JKL",
"date": "2023-01-03T10:34:39+01:00"
}
I'm searching the name field for, say, "Product FGH", and I want to:
1) Get records with an exact match on the name field and sort them by the date field, descending.
2) If nothing is found in 1), or if there are similar records that are not exact matches, sort the remaining records by the default score.
Is it possible to do this in a single Elasticsearch request? If so, what should the whole query look like?
Thanks
What you are looking for is running Elasticsearch queries conditionally, which is not possible in a single query. You need to fire the first query and, if it doesn't return any hits, fire the second one.
Using a script_score query, you can do what you want. Convert the date to milliseconds and return it as the score for an exact match. For a non-exact match, simply return _score.
Exact matches will then be sorted by the date field, descending.
Non-exact matches will be sorted by _score.
For example:
Mapping:
{
"mappings": {
"properties": {
"name" : {"type": "keyword"},
"date" : {"type": "date", "format": "yyyy-MM-dd HH:mm:ss"}
}
}
}
Insert:
PUT func/_doc/1
{
"name" : "Product ABC variant XYZ subvariant JKL",
"date" : "2023-01-03 10:34:39"
}
PUT func/_doc/2
{
"name" : "Product ABC variant XYZ subvariant JKL",
"date" : "2022-12-03 10:33:39"
}
PUT func/_doc/3
{
"name" : "Product ABC",
"date" : "2022-11-03 10:33:39"
}
PUT func/_doc/4
{
"name" : "Product ABC",
"date" : "2023-01-03 10:33:39"
}
Query:
GET /func/_search
{
"query": {
"script_score": {
"query": {
"match_all": {}
},
"script": {
"source": "if (doc['name'].value == params.search_term) { return doc['date'].value.toInstant().toEpochMilli(); } else return _score",
"params": {
"search_term": "Product ABC"
}
}
}
}
}
output:
{
"took": 29,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 4,
"relation": "eq"
},
"max_score": 1672742040000,
"hits": [
{
"_index": "func",
"_id": "4",
"_score": 1672742040000,
"_source": {
"name": "Product ABC",
"date": "2023-01-03 10:33:39"
}
},
{
"_index": "func",
"_id": "3",
"_score": 1667471640000,
"_source": {
"name": "Product ABC",
"date": "2022-11-03 10:33:39"
}
},
{
"_index": "func",
"_id": "1",
"_score": 1,
"_source": {
"name": "Product ABC variant XYZ subvariant JKL",
"date": "2023-01-03 10:34:39"
}
},
{
"_index": "func",
"_id": "2",
"_score": 1,
"_source": {
"name": "Product ABC variant XYZ subvariant JKL",
"date": "2022-12-03 10:33:39"
}
}
]
}
}
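One caveat worth checking with this approach: _score is a 32-bit float, so an epoch-millisecond value of this size cannot be represented exactly. That is why the scores in the output above are round figures such as 1672742040000 rather than the precise timestamps. The sketch below is plain Ruby with hypothetical helper names (to_float32, epoch_millis) and assumes the dates are interpreted as UTC. It shows two timestamps one minute apart collapsing to the same score:

```ruby
# Round a number to the nearest 32-bit float by packing it as single precision.
def to_float32(value)
  [value.to_f].pack('e').unpack1('e')
end

# Epoch milliseconds for a "yyyy-MM-dd HH:mm:ss" timestamp, read as UTC.
def epoch_millis(text)
  Time.utc(*text.scan(/\d+/).map(&:to_i)).to_i * 1000
end

first  = epoch_millis('2023-01-03 10:33:39')
second = epoch_millis('2023-01-03 10:34:39')

puts first                  # exact value in milliseconds
puts to_float32(first)      # value after rounding to a 32-bit float
puts to_float32(first) == to_float32(second)
```

At this magnitude, adjacent 32-bit float values are 131072 ms (a little over two minutes) apart, so exact matches whose dates fall into the same window would tie on score.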

Query on Elastic Search on multiple criterias

I have this document in elastic search
{
"_index" : "master",
"_type" : "_doc",
"_id" : "q9IGdXABeXa7ITflapkV",
"_score" : 0.0,
"_source" : {
"customer_acct" : "64876457056",
"ssn_number" : "123456789",
"name" : "Julie",
"city" : "NY"
}
}
I want to query the master index by customer_acct and ssn_number to retrieve the entire document. I also want to disable scoring and relevance. I have used the query below:
curl -X GET "localhost/master/_search/?pretty" -H 'Content-Type: application/json' -d'
{
"query": {
"term": {
"customer_acct": {
"value":"64876457056"
}
}
}
}'
I need to include a second criterion, ssn_number, in the term query as well. How would I do that? I also want to turn off scoring and relevance. Is that possible? I am new to Elasticsearch, so how would I fit the ssn_number criterion into the query above?
First, you need to define a proper mapping for your index. Your customer_acct and ssn_number values are numeric, but you are storing them as strings. Judging by your sample, they need the long type. Then you can simply use filter context in your query, since you don't need scores or relevance in your result. Read more about filter context in the official ES documentation; here is the relevant snippet:
In a filter context, a query clause answers the question “Does this
document match this query clause?” The answer is a simple Yes or
No — no scores are calculated. Filter context is mostly used for
filtering structured data,
which is exactly your use-case.
1. Index Mapping
{
"mappings": {
"properties": {
"customer_acct": {
"type": "long"
},
"ssn_number" :{
"type": "long"
},
"name" : {
"type": "text"
},
"city" :{
"type": "text"
}
}
}
}
2. Index sample docs
{
"name": "Smithe John",
"city": "SF",
"customer_acct": 64876457065,
"ssn_number": 123456790
}
{
"name": "Julie",
"city": "NY",
"customer_acct": 64876457056,
"ssn_number": 123456789
}
3. Main search query to filter without the score (note that it contains only a filter clause)
{
"query": {
"bool": {
"filter": [
{
"term": {
"customer_acct": 64876457056
}
},
{
"term": {
"ssn_number": 123456789
}
}
]
}
}
}
Above search query gives below result:
{
"took": 186,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 1,
"relation": "eq"
},
"max_score": 0.0,
"hits": [
{
"_index": "so-master",
"_type": "_doc",
"_id": "1",
"_score": 0.0, --> notice score is 0.
"_source": {
"name": "Smithe John",
"city": "SF",
"customer_acct": 64876457056,
"ssn_number": 123456789
}
}
]
}
}
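If the request body is built from application code, the same filter-only query can be generated from a list of criteria. This is a minimal sketch in plain Ruby; the helper name filter_query is hypothetical, and it assumes every criterion should be an exact term match on a field mapped as above.

```ruby
require 'json'

# Builds a bool query whose clauses all run in filter context (no scoring).
# `criteria` maps field names to the exact values they must have.
def filter_query(criteria)
  clauses = criteria.map { |field, value| { term: { field => value } } }
  { query: { bool: { filter: clauses } } }
end

body = filter_query('customer_acct' => 64_876_457_056, 'ssn_number' => 123_456_789)
puts JSON.pretty_generate(body)
```

Adding a third criterion is then just one more key in the hash passed to filter_query.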

Elasticsearch query scores all documents 1.0. Why?

I'm using ElasticSearch 2.4.1. When I execute the following query, all documents are scored 1.0. Why?
I get the same behavior if I remove the "bool" and just do a match on one field.
Query:
{
"query" :
{
"bool": {
"must" : [
{"match" : { "last" : { "query" : "SMITH" , "fuzziness": 2.0}} }
],
"should" : [
{"match" : {"first" :{ "query" : "JOE", "fuzziness": 1.0, "boost": 99.0}}}
]
}
}
}
Explain for one match gives me:
1.0 = sum of:
1.0 = ConstantScore(+(last:1mith^0.8 last:1smith^0.8 last:4mith^0.8 last:amith^0.8 last:asmith^0.8 last:bsmith^0.8 last:csmith^0.8 last:dsmith^0.8 last:emith^0.8 last:esmith^0.8 last:fsmith^0.8 last:hmith^0.8 last:hsmith^0.8 last:imith^0.8 last:ismith^0.8 last:jmith^0.8 last:jsmith^0.8 last:ksmith^0.8 last:lsmith^0.8 last:msith^0.8 last:msmith^0.8 last:nsmith^0.8 last:omith^0.8 last:osmith^0.8 last:psmith^0.8 last:qsmith^0.8 last:rsmith^0.8 last:saith^0.8 last:samith^0.8 last:scmith^0.8 last:seith^0.8 last:shith^0.8 last:simith^0.8 last:simth^0.8 last:skith^0.8 last:slith^0.8 last:smaith^0.8 last:smath^0.8 last:smdith^0.8 last:smeth^0.8 last:smfith^0.8 last:smich^0.8 last:smidh^0.8 last:smidth^0.8 last:smieth^0.8 last:smigh^0.8 last:smiht^0.8 last:smiih^0.8 last:smiith^0.8 last:smith) (first:aoe^0.6666666 first:bjoe^0.6666666 first:boe^0.6666666 first:coe^0.6666666 first:djoe^0.6666666 first:doe^0.6666666 first:eoe^0.6666666 first:foe^0.6666666 first:goe^0.6666666 first:hoe^0.6666666 first:ioe^0.6666666 first:j0e^0.6666666 first:jae^0.6666666 first:jbe^0.6666666 first:jce^0.6666666 first:jee^0.6666666 first:jeo^0.6666666 first:jge^0.6666666 first:jhe^0.6666666 first:jhoe^0.6666666 first:jie^0.6666666 first:jioe^0.6666666 first:jke^0.6666666 first:jle^0.6666666 first:jme^0.6666666 first:jne^0.6666666 first:jnoe^0.6666666 first:joa^0.6666666 first:joae^0.6666666 first:job^0.6666666 first:jobe^0.6666666 first:joc^0.6666666 first:joce^0.6666666 first:jod^0.6666666 first:jode^0.6666666 first:joe first:joea^0.6666666 first:joeb^0.6666666 first:joec^0.6666666 first:joed^0.6666666 first:joee^0.6666666 first:joef^0.6666666 first:joeg^0.6666666 first:joeh^0.6666666 first:joei^0.6666666 first:joej^0.6666666 first:joek^0.6666666 first:joel^0.6666666 first:joem^0.6666666 first:joen^0.6666666)^99.0), product of:
1.0 = boost
1.0 = queryNorm
0.0 = match on required clause, product of:
0.0 = # clause
0.0 = weight(_type:mytype in 327) [], result of:
0.0 = score(doc=327,freq=1.0), with freq of:
1.0 = termFreq=1.0
Type mapping:
{
"ourindex1": {
"mappings": {
"people": {
"properties": {
"city": {
"type": "string"
},
"first": {
"type": "string"
},
"last": {
"type": "string"
},
"middle": {
"type": "string"
},
"state": {
"type": "string"
},
"street": {
"type": "string"
},
"suffix": {
"type": "string"
},
"suite": {
"type": "string"
},
"territory": {
"type": "string"
},
"zip5": {
"type": "string"
}
}
}
}
}
}
Edit: Simplified Reproduction:
Download clean version of elasticsearch 2.4.1 and start it up
Create new index with:
POST /newindex/people
{"first" : "JOE", "last": "SMITH", "street" : "1 FIRST STREET", "city" : "LOS ANGELES", "state" : "CA", "middle" : ""}
Issue the following query:
{ "query" : {"match" : { "last" : { "query" : "SMITHX", fuzziness: 1.0} } }}
When I do this, document returned is scored 1.0 and explain says something about ConstantScore.
Edit 2: It appears my reproduction steps included an unintentional lie
The library my app uses to communicate with elasticsearch (elastic4s), appears to mangle the query so that it becomes:
{"query" : { "query" : {"match" : { "last" : { "query" : "SMITHX", fuzziness: 1.0} } }}}
(Note the extra "query". This mangled query returns the results I'd expect, but with score = 1.0.) I thought I had already tried executing the query directly with curl, but evidently not.
This is happening because of the doubled query keyword. Basically, it works like this: the inner query selects the hits and produces something like this:
{
"took": 7,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 2,
"max_score": 0.30685285,
"hits": [
{
"_index": "my_index",
"_type": "people",
"_id": "2",
"_score": 0.30685285,
"_source": {
"first": "JOHN",
"last": "SMITHS",
"street": "2 SECOND STREET",
"city": "LA",
"state": "CA",
"middle": ""
}
},
{
"_index": "my_index",
"_type": "people",
"_id": "1",
"_score": 0.30685282,
"_source": {
"first": "JOE",
"last": "SMITH",
"street": "1 FIRST STREET",
"city": "LOS ANGELES",
"state": "CA",
"middle": ""
}
}
]
}
}
which is a fully correct response with proper scores. But then the outer query is applied, which doesn't change the result set but only "eats" the score and replaces it with 1.0 (this is the ConstantScore you see in the explain output). So you need to fix your usage of elastic4s.
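To check whether a client library has produced such a doubled wrapper, it can help to inspect the request body before it is sent. The sketch below is plain Ruby, used here only for illustration since the application in question uses elastic4s; the helper name unwrap_query is hypothetical. It strips any redundant "query" nesting from a body that has already been parsed into a hash.

```ruby
require 'json'

# Returns the body with redundant {"query": {"query": ...}} nesting removed.
# A body that is already well formed is returned with the same content.
def unwrap_query(body)
  inner = body['query']
  inner = inner['query'] while inner.is_a?(Hash) && inner.keys == ['query']
  body.merge('query' => inner)
end

mangled = JSON.parse(
  '{"query": {"query": {"match": {"last": {"query": "SMITHX", "fuzziness": 1.0}}}}}'
)
puts JSON.generate(unwrap_query(mangled))
```

This only illustrates the shape of the problem. The proper fix is still to correct how the query is passed to the client library.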

manipulate returned fields in elasticsearch

Is there a way to manipulate (for example concatenate) returned fields from a query?
This is how I created my index:
PUT /megacorp/employee/1
{
"first_name" : "John",
"last_name" : "Smith",
"age" : 25,
"about" : "I love to go rock climbing",
"interests": [ "sports", "music" ]
}
And this is how I query it:
GET /megacorp/employee/_search
{
"query": {"match_all": {}}
}
The response is this:
{
"took": 4,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 1,
"max_score": 1,
"hits": [
{
"_index": "megacorp",
"_type": "employee",
"_id": "1",
"_score": 1,
"_source": {
"first_name": "John",
"last_name": "Smith",
"age": 25,
"about": "I love to go rock climbing",
"interests": [
"sports",
"music"
]
}
}
]
}
}
That's all working fine.
What I want is to concatenate two fields from the _source and display it in the output as a new field.
first_name and last_name should be combined into a new field, "full_name". I can't figure out how to do that without creating a new field in my index. I have looked at "copy_to", but it requires you to explicitly set the store property in the mapping and to explicitly ask for the stored field in the query. The main downside is that even when you do both, first_name and last_name are returned comma-separated. I would like a nice string: "John Smith".
GET /megacorp/employee/_search
{
"query": {"match_all": {}},
"script_fields": {
"combined": {
"script": "_source['first_name'] + ' ' + _source['last_name']"
}
}
}
You also need to enable dynamic scripting for this to work.
You can use script_fields to achieve that:
GET /megacorp/employee/_search
{
"query": {"match_all": {}},
"script_fields" : {
"full_name" : {
"script" : "[doc['first_name'].value, doc['last_name'].value].join(' ')"
}
}
}
You need to make sure to enable dynamic scripting in order for this to work.
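If enabling dynamic scripting is not an option, the concatenation could also be done on the client after the response arrives. This is a different technique from script_fields, sketched here in plain Ruby with a hypothetical helper name (with_full_names), and assuming the response has already been parsed into a hash shaped like the one in the question.

```ruby
require 'json'

# Returns each hit's _source with an extra "full_name" key built client-side.
# Missing name parts are skipped, so a lone first or last name still works.
def with_full_names(response)
  response.fetch('hits').fetch('hits').map do |hit|
    source = hit.fetch('_source')
    full_name = [source['first_name'], source['last_name']].compact.join(' ')
    source.merge('full_name' => full_name)
  end
end

response = JSON.parse(<<~BODY)
  {"hits": {"total": 1, "hits": [
    {"_index": "megacorp", "_type": "employee", "_id": "1",
     "_source": {"first_name": "John", "last_name": "Smith", "age": 25}}
  ]}}
BODY

with_full_names(response).each { |employee| puts employee['full_name'] }
```

Because this reads from _source, the names keep their original casing.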

Elasticsearch nested Query parseException

I get a parse exception when I try to get data out of Elasticsearch. My document looks like this:
{
"_index": "some name",
"_type": "row",
"_id": "665",
"_score": 6.3700795,
"_source": {
"dateOfClaim": 1215986400000,
"employer": {
"username": null,
"password": null,
"name": "customer",
"customerNumber": "some number",
"dosierNumbers": null
},
"recipient": {
"username": null,
"password": null,
"name": "some name",
"taxNumber": "some number"
},
"claim": 402401,
"dosierNumber": "",
"worthWayTaxes": "",
"good": {
"brutoWeight": 25,
"nettoWeight": 25050,
"coll": 25000,
"taxWorth": "58830.67",
"eori": ""
},
"poDValues": "YES",
"correctedTaxNumber": null,
"note": null
}
},
and my query looks like this
POST /customs/_search
{
"nested" : {
"path" : "employer",
"score_mode" : "none",
"query" : {
"match": {
"employer.name" : "customer"
}
}
}
}
I want to get all documents of a specific employer where poDValues is NO. But my query already gives me a parse exception (All shards failed for phase: [query]), even before I add the condition that poDValues should be NO.
I think you have misunderstood the concept of nested objects. employer is a plain object field here, not a nested one, so you can query its properties directly with dot notation. Check this one:
POST /_search
{
"query": {
"term": {
"employer.titel": {
"value": "Dit is mijn titel"
}
}
}
}
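Applied to the document in the question, the full request might be built as follows. This is a sketch in plain Ruby with a hypothetical helper name (employer_claims_query). It assumes employer is mapped as a plain object and that match queries on employer.name and poDValues are appropriate for how those fields are analyzed. Note also that the search body needs a top-level "query" key, which the request in the question lacks.

```ruby
require 'json'

# Builds a search body that requires both the employer name and the
# poDValues flag to match. Field names are taken from the question's document.
def employer_claims_query(employer_name, pod_value)
  {
    query: {
      bool: {
        must: [
          { match: { 'employer.name' => employer_name } },
          { match: { 'poDValues' => pod_value } }
        ]
      }
    }
  }
end

body = employer_claims_query('customer', 'NO')
puts JSON.pretty_generate(body)
```

The generated JSON would be sent as the body of POST /customs/_search.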
