How to match first and last names with Elasticsearch? - elasticsearch

A typical Elasticsearch JSON response looks something like this:
[
{
"_index": "articles",
"_id": "993",
"_score": 10.443843,
"_source": {
"title": "This is a test title",
"authors": [
{
"first_name": "john",
"last_name": "smith"
},
How can I query for all articles where one of the authors is 'john smith'? Currently I have:
const {
hits: { hits }
} = await client.search({
index: "articles",
body: {
query: {
bool: {
should: [
{
match: {
"authors.first_name": "john"
}
},
{
match: {
"authors.last_name": "Smith"
}
}
]
}
}
}
});
But this returns articles where a first or last name is john or smith, not only articles that have a 'john smith' as an author.

I think you are facing the nested vs. object dilemma here. You can achieve what you are looking for by changing the type of the authors field to nested (you didn't share your index mapping, so I'm assuming it is currently a plain object field) and using this query:
{
"query":{
"nested":{
"path":"authors",
"query":{
"bool":{
"must":[
{
"match":{
"authors.first_name":{
"query":"john"
}
}
},
{
"match":{
"authors.last_name":{
"query":"Smith"
}
}
}
]
}
}
}
}
}
Hope that helps.
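For reference, here is a rough JavaScript sketch of the mapping and query bodies described above. It assumes the field names first_name and last_name from the question's sample document and the typeless mapping format of Elasticsearch 7 and later. The names authorsMapping and buildAuthorQuery are made up for illustration; the returned objects are only request bodies, which you would hand to something like client.indices.create and client.search.

```javascript
// Assumed mapping: `authors` is declared as nested so that each author object
// is indexed separately. Field names are taken from the question's sample.
const authorsMapping = {
  mappings: {
    properties: {
      title: { type: "text" },
      authors: {
        type: "nested",
        properties: {
          first_name: { type: "text" },
          last_name: { type: "text" }
        }
      }
    }
  }
};

// Hypothetical helper: builds a nested query in which both names must match
// inside the same author object.
function buildAuthorQuery(firstName, lastName) {
  return {
    query: {
      nested: {
        path: "authors",
        query: {
          bool: {
            must: [
              { match: { "authors.first_name": firstName } },
              { match: { "authors.last_name": lastName } }
            ]
          }
        }
      }
    }
  };
}

// Print the body so it can be compared with the JSON query above.
console.log(JSON.stringify(buildAuthorQuery("john", "smith"), null, 2));
```

Note that an existing object field cannot be switched to nested in place, so this would normally mean creating a new index with the mapping and reindexing.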

Well, in this case you're using a "should" clause, which can be read as
firstname:john OR lastname:smith
This can easily be fixed by using "must" instead, which can be read as
firstname:john AND lastname:smith
Also, as rob mentions in his response, nested vs. object is indeed a dilemma,
but this dilemma only appears when you're dealing with arrays of objects.
For example, say you have the following entry:
entry #1
{
"serviceType": "mysql",
"allowedUsers": [
{
"firstName": "Daniel",
"lastName": "Acevedo"
},
{
"firstName": "John",
"lastName": "Smith"
},
{
"firstName": "Mike",
"lastName": "K"
}
]
}
and you do the following search
{
"size": 10,
"query": {
"query_string": {
"query": "allowedUsers.firstName:john AND allowedUsers.lastName:acevedo"
}
}
}
You WILL get a match on this document, because both firstName and lastName match it, even though they match in different user objects. This is how OBJECT mapping behaves.
In this case there is no workaround: you must use NESTED mapping to get a natural match.
In your specific case I don't think you're facing this, so going with OBJECT mapping and a MUST query (AND, instead of SHOULD, which is OR) should do fine, as long as each article has only one author.
If you need further explanation, let me know and I'll edit in more details.
Cheers.
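To make the difference concrete, here is a toy model in plain JavaScript. It is not Elasticsearch code, only an imitation of the matching behaviour described above, and the function names matchesAsObject and matchesAsNested are invented for this sketch. With object mapping the values of each field are pooled across the whole array; with nested mapping each array element is checked on its own.

```javascript
// Toy model only: imitates object vs. nested matching, no Elasticsearch involved.
const entry = {
  serviceType: "mysql",
  allowedUsers: [
    { firstName: "Daniel", lastName: "Acevedo" },
    { firstName: "John", lastName: "Smith" },
    { firstName: "Mike", lastName: "K" }
  ]
};

// Object mapping: all first names end up in one pool and all last names in
// another, so the link between a first name and its last name is lost.
function matchesAsObject(users, firstName, lastName) {
  const firstNames = users.map((u) => u.firstName.toLowerCase());
  const lastNames = users.map((u) => u.lastName.toLowerCase());
  return (
    firstNames.includes(firstName.toLowerCase()) &&
    lastNames.includes(lastName.toLowerCase())
  );
}

// Nested mapping: both conditions must hold within the same array element.
function matchesAsNested(users, firstName, lastName) {
  return users.some(
    (u) =>
      u.firstName.toLowerCase() === firstName.toLowerCase() &&
      u.lastName.toLowerCase() === lastName.toLowerCase()
  );
}

console.log(matchesAsObject(entry.allowedUsers, "john", "acevedo")); // true
console.log(matchesAsNested(entry.allowedUsers, "john", "acevedo")); // false
console.log(matchesAsNested(entry.allowedUsers, "john", "smith")); // true
```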

Related

Elasticsearch: add new fields to an existing documents

I would like to add a new object with new fields to existing documents matched by a query.
PUT test/_doc/1
{
"id" : 1,
"text": "My life is beautiful",
"category": "optimistic"
}
I would like to add a new object to all the documents with "category": "optimistic", something like:
{"references": {
"group": "Pro-life",
"responsable": "Mr. Happy Guy",
"job": "Happiness bringer"
}}
I would like to try update_by_query, but I cannot make it work with an object like this. Any ideas?
I did try with this:
{
"script": "ctx._source.references='{\"hello\":\"world\"}'",
"query": {
"match": {
"category": "optimistic"
}
}
}
But it doesn't give me the expected result. It just saved it as the string """{"hello":"world"}""",
whereas I wanted it stored as a JSON object.
Not tested, but you could try passing the object through params with update_by_query:
{
"query": {
"match": {
"category": "optimistic"
}
},
"script": {
"source": "ctx._source.references = params.new_fields",
"params": {
"new_fields": {
"group": "Pro-life",
"responsable": "Mr. Happy Guy",
"job": "Happiness bringer"
}
}
}
}
You should use Painless syntax for that; try it like this:
"script": "ctx._source.references=[ \"hello\":\"world\" ]"
More info: https://www.elastic.co/guide/en/elasticsearch/painless/current/painless-operators-reference.html#map-initialization-operator
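If you are calling this from JavaScript, the params approach could be wrapped roughly as follows. This is only a sketch: buildReferencesUpdate is a made-up helper name, and the object it returns is just the request body, which you would pass to something like client.updateByQuery({ index: "test", body }). The point it illustrates is that the new object travels in params as real JSON, so it is stored as an object rather than as a string.

```javascript
// Hypothetical helper: builds an _update_by_query body that sets
// `references` on every document whose category matches.
function buildReferencesUpdate(category, references) {
  return {
    query: {
      match: { category: category }
    },
    script: {
      lang: "painless",
      // The script only refers to params.new_fields; the object itself is
      // sent alongside the script, not embedded in the script text.
      source: "ctx._source.references = params.new_fields",
      params: { new_fields: references }
    }
  };
}

const body = buildReferencesUpdate("optimistic", {
  group: "Pro-life",
  responsable: "Mr. Happy Guy",
  job: "Happiness bringer"
});

console.log(JSON.stringify(body, null, 2));
```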

May Elasticsearch nested query return only matched nested documents for nested fields?

I'm new to Elasticsearch and have a question: can a nested query return only the matched nested documents for a nested field?
For example, I have a type named blog with a nested field named comments:
{
"id": 1,
...
"comments":[
{"content":"Michael is a basketball player"},
{"content":"David is a soccer player"}
]
}
{
"id": 2,
...
"comments":[
{"content":"Wayne is a soccer player"},
{"content":"Steven is also a soccer player"}
]
}
and the nested query
{"query":{
"nested":{
"path":"comments",
"query":{"match":{"comments.content":"soccer"}}
}
}}
What I need is to search for blog posts whose comments mention "soccer", together with the number of comments that matched "soccer" for each blog post (for blog 1 in the example the count is 1, since its other comment only mentions "basketball").
{"hits":[
{
"id":1,
...
"count_for_comments_that_matches_query":1
},
{
"id":2,
...
"count_for_comments_that_matches_query":2
}
]}
However, it seems Elasticsearch always returns the full document. How can I achieve this, or is it not possible?
The answer is here:
https://www.elastic.co/guide/en/elasticsearch/reference/current/search-request-inner-hits.html#nested-inner-hits
You need to use the nested inner hits feature of Elasticsearch.
{
"_source": [
"id"
],
"query": {
"bool": {
"must": [
{
"match": {
"id": "1"
}
},
{
"nested": {
"path": "comments",
"query": {
"match": {
"comments.content": "soccer"
}
},
"inner_hits": {}
}
}
]
}
}
}
I think this will solve the problem.
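To get from inner hits to the per-post counts asked for in the question, something along these lines might work. It is a sketch under a few assumptions: the sample response below is made-up data trimmed to the fields used here, it assumes the query was run without the id clause so both posts come back, and countMatchedComments is an invented helper name. The inner hits total is read in both shapes it can take (a plain number in older versions, an object with a value from 7.0 on).

```javascript
// Made-up sample response, reduced to the parts this sketch reads.
const sampleResponse = {
  hits: {
    hits: [
      {
        _source: { id: 1 },
        inner_hits: {
          comments: {
            hits: {
              total: { value: 1, relation: "eq" },
              hits: [{ _source: { content: "David is a soccer player" } }]
            }
          }
        }
      },
      {
        _source: { id: 2 },
        inner_hits: {
          comments: {
            hits: {
              total: 2, // older versions report the total as a plain number
              hits: [
                { _source: { content: "Wayne is a soccer player" } },
                { _source: { content: "Steven is also a soccer player" } }
              ]
            }
          }
        }
      }
    ]
  }
};

// Hypothetical helper: one entry per blog post with the number of nested
// comments that matched the nested query.
function countMatchedComments(response, innerHitsName = "comments") {
  return response.hits.hits.map((hit) => {
    const total = hit.inner_hits[innerHitsName].hits.total;
    return {
      id: hit._source.id,
      count_for_comments_that_matches_query:
        typeof total === "number" ? total : total.value
    };
  });
}

console.log(countMatchedComments(sampleResponse));
```

Using the total rather than the length of the inner hits array matters because inner hits return only a limited number of matches per document by default.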

Elasticsearch: Get report of unmatched should elements in a bool query

I'm looking for a way to get a report of the unmatched should queries and display it.
For instance, I have two user objects:
User 1:
{
"username": "user1",
"docType": "user",
"level": "Professor",
"discipline": "Sciences",
"subDiscipline": "Mathematical"
}
User 2:
{
"username": "user2",
"docType": "user",
"level": "Professor",
"discipline": "Sciences",
"subDiscipline": "Physics"
}
I run a bool query where the discipline match is in the must clause and the sub-discipline match is in the should clause:
bool:
must: [{
term: { "doc.docType": "user" }
},{
term: { "doc.level": "professor" }
},{
term: { "doc.discipline": "sciences" }
}],
should: [{
term: { "subDiscipline": "physics" }
}]
How can I get the unmatched elements in my result, like this:
Result 1: user1 match 100%
Result 2: user2 match 70% (unmatched subDiscipline "physics")
I had a look at the Explain API, but its output doesn't seem to be designed for this use case and looks very complicated to parse.
You will need to use named queries for this.
Create a bool query like the one below, giving each clause a _name:
{
"query": {
"bool": {
"must": [
{
"match": {
"SourceName": {
"query": "CNN",
"_name": "sourceMatch"
}
}
},
{
"match": {
"author": {
"query": "qbox.io",
"_name": "author"
}
}
}
]
}
}
}
In the response, each hit has a matched_queries array listing which named queries matched.
You can use this information to build the stats you are looking for.
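As a rough idea of how such a report could be built on the client side, here is a small JavaScript sketch. It assumes you gave every clause you care about (in particular the should clauses) a _name and that you keep the list of those names yourself. The helper matchReport, the sample hit and the simple ratio used for the percentage are all invented for illustration; the question's 70% would need whatever weighting you actually want.

```javascript
// Hypothetical helper: compares the names you assigned with the
// matched_queries array of one hit and reports what did not match.
function matchReport(hit, allNames) {
  const matched = new Set(hit.matched_queries || []);
  const unmatched = allNames.filter((name) => !matched.has(name));
  // Plain ratio of matched names to all names; an assumption, not a score
  // computed by Elasticsearch.
  const percent = Math.round(
    ((allNames.length - unmatched.length) / allNames.length) * 100
  );
  return { id: hit._id, percent: percent, unmatched: unmatched };
}

// Made-up hit in the shape of a search response entry.
const sampleHit = {
  _id: "user1",
  matched_queries: ["level", "discipline"]
};

console.log(matchReport(sampleHit, ["level", "discipline", "subDiscipline"]));
```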

Is there any way not to return arrays when specifying return fields in an Elasticsearch query?

If I have documents like this:
[
{
"model": "iPhone",
"brand": "Apple"
},
{
"model": "Nexus 5",
"brand": "Google"
}
]
and I run a query that only returns the model field, like this:
{
"fields": ["model"],
"query": {
"term": {
"brand": "apple"
}
}
}
Then each document field is returned within an array like this:
{ "model": ["iPhone"] }
instead of
{ "model": "iPhone" }
How can I avoid that and get the fields in the same format as when the fields query option is not defined?
In the end the answer was pretty easy: you have to use the _source query option instead of fields.
Example:
{
"_source": ["model"],
"query": {
"term": {
"brand": "apple"
}
}
}
This way I get documents in the following format, like in the original one (without the _source option):
{ "model": "iPhone" }
I had the same problem, and indeed (as Wax Cage said) I thought that _source would bring some performance problems. I think using both fields and _source solves the problem:
const fields = ['model']
{
fields: fields,
_source: fields,
query: {
term: {
brand: 'apple'
}
}
}
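If you do have to stay with the fields option, another possibility is to unwrap the arrays on the client. The following is only a sketch, and unwrapFields is a made-up helper name: it turns single-element arrays back into plain values and leaves genuinely multi-valued fields as arrays.

```javascript
// Hypothetical helper: converts { model: ["iPhone"] } into { model: "iPhone" }.
// Arrays with zero or several values are left untouched.
function unwrapFields(fields) {
  const result = {};
  for (const [name, values] of Object.entries(fields)) {
    result[name] =
      Array.isArray(values) && values.length === 1 ? values[0] : values;
  }
  return result;
}

console.log(unwrapFields({ model: ["iPhone"] })); // { model: 'iPhone' }
console.log(unwrapFields({ tags: ["a", "b"] })); // { tags: [ 'a', 'b' ] }
```

The drawback is that a multi-valued field which happens to hold one value also comes back as a scalar, so this only suits fields you know are single-valued.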

Elasticsearch complex proximity query

Given that I have a query like below:
council* W/5 (tip OR tips)
The above query can be translated as: Find anything that has council* and (tip OR tips) no more than 5 words apart.
So the following texts will match:
Shellharbour City Council Tip
council best tip
councils top 10 tips
But this one should not match:
... City Council at Shellharbour. There is not any good tip at all.
I need help building an Elasticsearch query for that. I was thinking about a regexp query, but I'm not sure whether there are better alternatives. Thanks
You can use a combination of the span_near, span_multi and span_or queries. The query below performs the same search.
{
"query": {
"span_near": {
"clauses": [
{
"span_multi":
{
"match":
{
"prefix": { "text": "council"}
}
}
},
{
"span_or": {
"clauses": [
{
"span_term": {
"text": {
"value": "tip"
}
}
},
{
"span_term": {
"text": {
"value": "tips"
}
}
}
]
}
}
],
"slop": 5,
"in_order": true
}
}
}
The important things to look out for are the span_term clauses, which hold the terms you're searching for. In this example I only had one field, called "text". slop indicates the number of words we will allow between the terms, and in_order indicates that the order of the words matters, so "tip council" will not match, whereas "council tip" will.
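If you need to generate this kind of query for different terms, a small builder like the one below could help. It is a sketch only: buildProximityQuery is an invented name, and the function just assembles the same span_near / span_multi / span_or structure as the query above. Since span_term is not analyzed, the terms are assumed to be given already lowercased, as they would appear in the index with a standard analyzer.

```javascript
// Hypothetical helper: prefix W/slop (alt1 OR alt2 ...) on a single field.
function buildProximityQuery(field, prefix, alternatives, slop, inOrder = true) {
  return {
    query: {
      span_near: {
        clauses: [
          // Prefix part, e.g. council*
          { span_multi: { match: { prefix: { [field]: prefix } } } },
          // Alternatives part, e.g. (tip OR tips)
          {
            span_or: {
              clauses: alternatives.map((value) => ({
                span_term: { [field]: { value: value } }
              }))
            }
          }
        ],
        slop: slop,
        in_order: inOrder
      }
    }
  };
}

const proximityBody = buildProximityQuery("text", "council", ["tip", "tips"], 5);
console.log(JSON.stringify(proximityBody, null, 2));
```

Passing false as the last argument would let "tip ... council" match as well, which may be closer to how W/5 is meant in some query languages.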

Resources