ElasticSearch and NEST - How do I construct a simple OR query?

I'm developing a query against a repository of buildings.
Here is the query that I am trying to write.
(Exact match on zipCode) AND ((Case-insensitive exact match on address1) OR (Case-insensitive exact match on siteName))
In my repository, I have a document that looks like the following:
address1: 4 Myrtle Street
siteName: Myrtle Street
zipCode: 90210
And I keep getting matches on:
address1: 45 Myrtle Street
siteName: Myrtle
zipCode: 90210
Here are some attempts that have not worked:
{
  "query": {
    "bool": {
      "must": [
        {
          "bool": {
            "should": [
              { "term": { "address1": { "value": "45 myrtle street" } } },
              { "term": { "siteName": { "value": "myrtle" } } }
            ]
          }
        },
        { "term": { "zipCode": { "value": "90210" } } }
      ]
    }
  }
}
{
  "query": {
    "filtered": {
      "query": {
        "term": { "zipCode": { "value": "90210" } }
      },
      "filter": {
        "or": {
          "filters": [
            { "term": { "address1": "45 myrtle street" } },
            { "term": { "siteName": "myrtle" } }
          ]
        }
      }
    }
  }
}
{
  "filter": {
    "bool": {
      "must": [
        {
          "or": {
            "filters": [
              { "term": { "address1": "45 myrtle street" } },
              { "term": { "siteName": "myrtle" } }
            ]
          }
        },
        { "term": { "zipCode": "90210" } }
      ]
    }
  }
}
{
  "query": {
    "bool": {
      "must": [
        {
          "span_or": {
            "clauses": [
              { "span_term": { "siteName": { "value": "myrtle" } } }
            ]
          }
        },
        { "term": { "zipCode": { "value": "90210" } } }
      ]
    }
  }
}
Does anyone know what I am doing wrong?
I'm writing this with NEST, so I would prefer NEST syntax, but ElasticSearch syntax would certainly suffice as well.
EDIT: Per @Greg Marzouka's comment, here are the mappings:
{
  "[indexname]": {
    "mappings": {
      "[indexname]elasticsearchresponse": {
        "properties": {
          "address": { "type": "string" },
          "address1": { "type": "string" },
          "address2": { "type": "string" },
          "address3": { "type": "string" },
          "city": { "type": "string" },
          "country": { "type": "string" },
          "id": { "type": "string" },
          "originalSourceId": { "type": "string" },
          "placeId": { "type": "string" },
          "siteName": { "type": "string" },
          "siteType": { "type": "string" },
          "state": { "type": "string" },
          "systemId": { "type": "long" },
          "zipCode": { "type": "string" }
        }
      }
    }
  }
}

Based on your mapping, you won't be able to search for exact matches on siteName because it's being analyzed with the standard analyzer, which is more tuned for full text search. This is the default analyzer that is applied by Elasticsearch when one isn't explicitly defined on a field.
The standard analyzer is breaking up the value of siteName into multiple tokens. For example, Myrtle Street is tokenized and stored as two separate terms in your index, myrtle and street, which is why your query is matching that document. For a case-insensitive exact match, instead you want Myrtle Street stored as a single, lower-cased token in your index: myrtle street.
You could set siteName to not_analyzed, which won't subject the field to the analysis chain at all, meaning the values will not be modified. However, this will produce a single Myrtle Street token, which will work for exact matches, but will be case-sensitive.
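To see the difference between the two analysis chains, here is a rough Python sketch of how each tokenizes the value (an illustration only, not Elasticsearch's actual analysis code):

```python
# Rough simulation of how the two analysis chains tokenize "Myrtle Street".

def standard_like(text):
    # The standard analyzer splits on word boundaries and lowercases,
    # producing one term per word.
    return [t.lower() for t in text.split()]

def keyword_lowercase(text):
    # The keyword tokenizer emits the whole input as a single token;
    # the lowercase filter then lowercases it.
    return [text.lower()]

print(standard_like("Myrtle Street"))      # ['myrtle', 'street']
print(keyword_lowercase("Myrtle Street"))  # ['myrtle street']
```

With the second chain, a term query for "myrtle street" matches only documents whose siteName is exactly Myrtle Street (in any letter case).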
What you need to do is create a custom analyzer using the keyword tokenizer and lowercase token filter, then apply it to your field.
Here's how you can accomplish this with NEST's fluent API:
// Create the custom analyzer using the keyword tokenizer and lowercase token filter
var myAnalyzer = new CustomAnalyzer
{
    Tokenizer = "keyword",
    Filter = new[] { "lowercase" }
};

var response = this.Client.CreateIndex("your-index-name", c => c
    // Add the custom analyzer to your index settings
    .Analysis(an => an
        .Analyzers(az => az
            .Add("my_analyzer", myAnalyzer)
        )
    )
    // Create the mapping for your type and apply "my_analyzer" to the siteName field
    .AddMapping<YourType>(m => m
        .MapFromAttributes()
        .Properties(ps => ps
            .String(s => s.Name(t => t.SiteName).Analyzer("my_analyzer"))
        )
    )
);

Related

Elastic search sublist query with and logic

I have posted the JSON data below:
[
  {
    "id": 1,
    "shopname": "seven up",
    "shopkeeper": "John",
    "salesbooks": [
      {
        "bookid": 11,
        "bookname": "Tom-story",
        "soldout": false
      },
      {
        "bookid": 12,
        "bookname": "Iron-Man",
        "soldout": true
      }
    ]
  }
]
and I make a simple elastic query as below:
{
  "from": 0,
  "query": {
    "bool": {
      "must": [
        { "wildcard": { "salesbooks.bookname": { "value": "*Iron-Man*" } } },
        { "term": { "salesbooks.soldout": { "value": false } } }
      ]
    }
  }
}
The result should be empty, as I want to filter on salesbooks.bookname containing "iron-man" AND soldout being false, but it didn't work.
Can I know what's wrong here?
Thank you
The "must" clause requires that all clauses be true; in your case there is no salesbook with "bookname": "Iron-Man" and soldout false, only one with "bookname": "Iron-Man" and soldout true.
Another important point is that wildcard is a term-level query: it runs against the exact indexed terms and is case-sensitive by default.
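To see why no single salesbook satisfies both clauses, here is a hypothetical Python sketch of the per-object matching (with a nested mapping, each salesbook is matched as its own hidden document, so both clauses must hold for the same object):

```python
import fnmatch

salesbooks = [
    {"bookid": 11, "bookname": "Tom-story", "soldout": False},
    {"bookid": 12, "bookname": "Iron-Man", "soldout": True},
]

def matches(book, pattern, soldout):
    # fnmatchcase is case-sensitive, mirroring the default wildcard
    # behavior against an indexed keyword term.
    return (fnmatch.fnmatchcase(book["bookname"], pattern)
            and book["soldout"] == soldout)

# No object is both "Iron-Man" and unsold, hence the empty result:
print(any(matches(b, "*Iron-Man*", False) for b in salesbooks))  # False
print(any(matches(b, "*Iron-Man*", True) for b in salesbooks))   # True
```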
Here is the example I tested (I assumed salesbooks is a nested object):
PUT /house
{
  "mappings": {
    "properties": {
      "salesbooks": {
        "type": "nested",
        "properties": {
          "bookname": { "type": "keyword" },
          "soldout": { "type": "boolean" }
        }
      }
    }
  }
}
PUT /house/_doc/1
{
  "shopname": "seven up",
  "shopkeeper": "John",
  "salesbooks": [
    {
      "bookid": 11,
      "bookname": "Tom-story",
      "soldout": false
    },
    {
      "bookid": 12,
      "bookname": "Iron-Man",
      "soldout": true
    }
  ]
}
GET house/_search
{
  "from": 0,
  "query": {
    "nested": {
      "path": "salesbooks",
      "query": {
        "bool": {
          "must": [
            { "wildcard": { "salesbooks.bookname": "*Iron-Man*" } },
            { "term": { "salesbooks.soldout": { "value": true } } }
          ]
        }
      }
    }
  }
}

Query hashmap structure with elasticsearch

I have two questions regarding mapping and querying a java hashmap in elasticsearch.
Does this mapping make sense in elasticsearch (is it the correct way to map a hashmap)?:
{
  "properties": {
    "itemsMap": {
      "type": "nested",
      "properties": {
        "key": {
          "type": "date",
          "format": "yyyy-MM-dd"
        },
        "value": {
          "type": "nested",
          "properties": {
            "itemVal1": { "type": "double" },
            "itemVal2": { "type": "double" }
          }
        }
      }
    }
  }
}
Here is some example data:
{
  "itemsMap": {
    "2021-12-31": {
      "itemVal1": 100.0,
      "itemVal2": 150.0
    },
    "2021-11-30": {
      "itemVal1": 200.0,
      "itemVal2": 50.0
    }
  }
}
My queries don't seem to work. For example:
{
  "query": {
    "nested": {
      "path": "itemsMap",
      "query": {
        "bool": {
          "must": [
            { "match": { "itemsMap.key": "2021-11-30" } }
          ]
        }
      }
    }
  }
}
Am I doing something wrong? How can I query such a structure? I have the possibility to change the mapping if it's necessary.
Thanks
TLDR;
The way you are uploading your data, nothing is stored in key.
You will have fields named 2021-11-30 ... and key is going to be empty.
Either you have a limited number of "dates" (fewer than 1000) and this is a viable option, or your format is not viable in the long run.
If you don't want to change your doc, here is the query
GET /71525899/_search
{
  "query": {
    "nested": {
      "path": "itemsMap",
      "query": {
        "bool": {
          "must": [
            { "exists": { "field": "itemsMap.2021-12-31" } }
          ]
        }
      }
    }
  }
}
To understand
If you inspect the mapping by querying the index
GET /<index_name>/_mapping
You will see that the number of fields named after your dates is going to grow.
And in all your docs, itemsMap.key is going to be empty. (This explains why my previous answer did not work.)
A more viable option
Keep your mapping, update the shape of your docs.
They will look like
{
  "itemsMap": [
    {
      "key": "2021-12-31",
      "value": { "itemVal1": 100, "itemVal2": 150 }
    },
    {
      "key": "2021-11-30",
      "value": { "itemVal1": 200, "itemVal2": 50 }
    }
  ]
}
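Reshaping the dict-shaped documents into this list-of-objects form is straightforward in most client languages; a sketch in Python (field names taken from the question):

```python
# Convert the dict-shaped itemsMap into the list-of-{key, value} shape
# that matches the nested mapping above.
def reshape(doc):
    return {
        "itemsMap": [
            {"key": date, "value": vals}
            for date, vals in doc["itemsMap"].items()
        ]
    }

doc = {
    "itemsMap": {
        "2021-12-31": {"itemVal1": 100.0, "itemVal2": 150.0},
        "2021-11-30": {"itemVal1": 200.0, "itemVal2": 50.0},
    }
}
reshaped = reshape(doc)
print([e["key"] for e in reshaped["itemsMap"]])  # ['2021-12-31', '2021-11-30']
```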
DELETE /71525899

PUT /71525899/
{
  "mappings": {
    "properties": {
      "itemsMap": {
        "type": "nested",
        "properties": {
          "key": {
            "type": "date",
            "format": "yyyy-MM-dd"
          },
          "value": {
            "type": "nested",
            "properties": {
              "itemVal1": { "type": "double" },
              "itemVal2": { "type": "double" }
            }
          }
        }
      }
    }
  }
}
POST /_bulk
{"index":{"_index":"71525899"}}
{"itemsMap":[{"key":"2021-12-31", "value": {"itemVal1":100,"itemVal2":150}},{"key":"2021-11-30", "value":{"itemVal1":200,"itemVal2":50}}]}
{"index":{"_index":"71525899"}}
{"itemsMap":[{"key":"2022-12-31", "value": {"itemVal1":100,"itemVal2":150}},{"key":"2021-11-30", "value":{"itemVal1":200,"itemVal2":50}}]}
{"index":{"_index":"71525899"}}
{"itemsMap":[{"key":"2021-11-31", "value": {"itemVal1":100,"itemVal2":150}},{"key":"2021-11-30", "value":{"itemVal1":200,"itemVal2":50}}]}
GET /71525899/_search
{
  "query": {
    "nested": {
      "path": "itemsMap",
      "query": {
        "bool": {
          "must": [
            { "match": { "itemsMap.key": "2021-12-31" } }
          ]
        }
      }
    }
  }
}

Elastic Search query for an AND condition on two properties of a nested object

I have the post_filter below, where I am trying to filter records where the school name is HILL SCHOOL AND which contain a nested child object with name JOY AND section A.
school is present in the parent object, which holds a children list of nested objects.
All of the above are AND conditions.
But the query doesn't seem to work. Any idea why? And is there a way to combine the two nested queries?
GET /test_school/_search
{
  "query": {
    "match_all": {}
  },
  "post_filter": {
    "bool": {
      "must_not": [
        {
          "bool": {
            "must": [
              { "term": { "schoolname": { "value": "HILL SCHOOL" } } },
              {
                "nested": {
                  "path": "children",
                  "query": {
                    "bool": {
                      "must": [
                        { "match": { "name": "JACK" } }
                      ]
                    }
                  }
                }
              },
              { "term": { "children.section": { "value": "A" } } }
            ]
          }
        }
      ]
    }
  }
}
The schema is as below:
PUT /test_school
{
  "mappings": {
    "_doc": {
      "properties": {
        "schoolname": { "type": "keyword" },
        "children": {
          "type": "nested",
          "properties": {
            "name": {
              "type": "keyword",
              "index": true
            },
            "section": {
              "type": "keyword",
              "index": true
            }
          }
        }
      }
    }
  }
}
Sample data as below:
POST /test_school/_doc
{
  "schoolname": "HILL SCHOOL",
  "children": {
    "name": "JOY",
    "section": "A"
  }
}
And a second record:
POST /test_school/_doc
{
  "schoolname": "HILL SCHOOL",
  "children": {
    "name": "JACK",
    "section": "B"
  }
}
https://stackoverflow.com/a/17543151/183217 suggests special mapping is needed to work with nested objects. You appear to be falling foul of the "cross object matching" problem.
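The cross-object problem can be sketched in Python: without a nested mapping, Elasticsearch flattens an array of child objects into parallel value lists, losing which name belongs to which section (an illustration of the indexing model, not real Elasticsearch internals):

```python
# Without "type": "nested", children like
# [{"name": "JACK", "section": "B"}, {"name": "JOY", "section": "A"}]
# are flattened into independent value lists:
flattened = {
    "children.name": ["JACK", "JOY"],
    "children.section": ["B", "A"],
}

# A (name == JACK AND section == A) query then matches, because each
# clause only checks its own list -- the name/section pairing is lost:
cross_object_match = ("JACK" in flattened["children.name"]
                      and "A" in flattened["children.section"])
print(cross_object_match)  # True, even though no single child is JACK/A
```

A nested mapping avoids this by indexing each child as its own hidden document, so clauses inside one nested query must all match the same child.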

Create keyword string type with custom analyzer in 5.3.0

I have a string I'd like to index as keyword type but with a special comma analyzer:
For example:
"San Francisco, Boston, New York" -> "San Francisco", "Boston", "New York"
should be both indexed and aggregatable at the same time so that I can split it up by buckets. In pre 5.0.0 the following worked:
Index settings:
{
  "settings": {
    "analysis": {
      "tokenizer": {
        "comma": {
          "type": "pattern",
          "pattern": ","
        }
      },
      "analyzer": {
        "comma": {
          "type": "custom",
          "tokenizer": "comma"
        }
      }
    }
  }
}
with the following mapping:
{
  "city": {
    "type": "string",
    "analyzer": "comma"
  }
}
Now in 5.3.0 and above the analyzer is no longer a valid property for the keyword type, and my understanding is that I want a keyword type here. How do I specify an aggregatable, indexed, searchable text type with custom analyzer?
You can use multi-fields to index the same field in two different ways: one for searching and one for aggregations.
I also suggest adding lowercase and trim filters on the tokens produced, to give you better search results.
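The comma analyzer with those filters behaves roughly like this Python sketch (an approximation of the analysis chain, not Elasticsearch code):

```python
# Rough simulation of the pattern tokenizer (split on ",") followed by
# the lowercase and trim token filters.
def comma_analyze(text):
    tokens = text.split(",")              # pattern tokenizer: split on commas
    tokens = [t.lower() for t in tokens]  # lowercase filter
    tokens = [t.strip() for t in tokens]  # trim filter
    return tokens

print(comma_analyze("san fransisco, new york, london"))
# ['san fransisco', 'new york', 'london']
```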
Mappings
PUT commaindex2
{
  "settings": {
    "analysis": {
      "tokenizer": {
        "comma": {
          "type": "pattern",
          "pattern": ","
        }
      },
      "analyzer": {
        "comma": {
          "type": "custom",
          "tokenizer": "comma",
          "filter": ["lowercase", "trim"]
        }
      }
    }
  },
  "mappings": {
    "city_document": {
      "properties": {
        "city": {
          "type": "keyword",
          "fields": {
            "city_custom_analyzed": {
              "type": "text",
              "analyzer": "comma",
              "fielddata": true
            }
          }
        }
      }
    }
  }
}
Index Document
POST commaindex2/city_document
{
  "city": "san fransisco, new york, london"
}
Search Query
POST commaindex2/city_document/_search
{
  "query": {
    "bool": {
      "must": [
        { "term": { "city.city_custom_analyzed": { "value": "new york" } } }
      ]
    }
  },
  "aggs": {
    "terms_agg": {
      "terms": {
        "field": "city",
        "size": 10
      }
    }
  }
}
Note
In case you want to run aggregations on the analyzed tokens, e.g. to count each individual city in buckets, you can run a terms aggregation on the city.city_custom_analyzed field.
POST commaindex2/city_document/_search
{
  "query": {
    "bool": {
      "must": [
        { "term": { "city.city_custom_analyzed": { "value": "new york" } } }
      ]
    }
  },
  "aggs": {
    "terms_agg": {
      "terms": {
        "field": "city.city_custom_analyzed",
        "size": 10
      }
    }
  }
}
Hope this helps
Since you're using ES 5.3, I suggest a different approach, using an ingest pipeline to split your field at indexing time.
PUT _ingest/pipeline/city-splitter
{
  "description": "City splitter",
  "processors": [
    {
      "split": {
        "field": "city",
        "separator": ","
      }
    },
    {
      "foreach": {
        "field": "city",
        "processor": {
          "trim": {
            "field": "_ingest._value"
          }
        }
      }
    }
  ]
}
Then you can index a new document:
PUT cities/city/1?pipeline=city-splitter
{
  "city": "San Francisco, Boston, New York"
}
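The pipeline transforms the document before it is indexed; roughly, in Python (a sketch of the split and foreach/trim processors' effect, not the actual ingest code):

```python
# Rough equivalent of the split + foreach/trim processors: the stored
# document ends up with city as an array of trimmed values.
def city_splitter(doc):
    doc = dict(doc)
    doc["city"] = [v.strip() for v in doc["city"].split(",")]
    return doc

print(city_splitter({"city": "San Francisco, Boston, New York"}))
# {'city': ['San Francisco', 'Boston', 'New York']}
```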
And finally you can search/sort on city and run an aggregation on the field city.keyword as if the cities had been split in your client application:
POST cities/_search
{
  "query": {
    "match": { "city": "boston" }
  },
  "aggs": {
    "cities": {
      "terms": { "field": "city.keyword" }
    }
  }
}

Elasticsearch : search document with conditional filter

I have two documents in my index (same type) :
{
  "first_name": "John",
  "last_name": "Doe",
  "age": "24",
  "phone_numbers": [
    {
      "contract_number": "123456789",
      "phone_number": "987654321",
      "creation_date": ...
    },
    {
      "contract_number": "123456789",
      "phone_number": "012012012",
      "creation_date": ...
    }
  ]
}
{
  "first_name": "Roger",
  "last_name": "Waters",
  "age": "36",
  "phone_numbers": [
    {
      "contract_number": "546987224",
      "phone_number": "987654321",
      "creation_date": ...,
      "expired": true
    },
    {
      "contract_number": "87878787",
      "phone_number": "55555555",
      "creation_date": ...
    }
  ]
}
Clients would like to perform a full-text search. Okay, no problem there.
My problem:
In this full-text search, the user will sometimes search by phone_number. In this case there is a parameter like expired=true.
Example :
First client search request : "987654321" with expired absent or set to false
--> Result : Only first document
Second client search request : "987654321" with expired set to true
--> Result : The two documents
How can I achieve that ?
Here is my mapping :
{
  "user": {
    "_all": {
      "auto_boost": true,
      "omit_norms": true
    },
    "properties": {
      "phone_numbers": {
        "type": "nested",
        "properties": {
          "phone_number": { "type": "string" },
          "creation_date": {
            "type": "string",
            "index": "no"
          },
          "contract_number": { "type": "string" },
          "expired": { "type": "boolean" }
        }
      },
      "first_name": { "type": "string" },
      "last_name": { "type": "string" },
      "age": { "type": "string" }
    }
  }
}
Thanks !
MC
EDIT :
I tried this query :
{
  "query": {
    "filtered": {
      "query": {
        "query_string": {
          "query": "987654321",
          "analyze_wildcard": "true"
        }
      },
      "filter": {
        "nested": {
          "path": "phone_numbers",
          "filter": {
            "bool": {
              "should": [
                {
                  "bool": {
                    "must": [
                      { "term": { "phone_number": "987654321" } },
                      { "missing": { "field": "expired" } }
                    ]
                  }
                },
                {
                  "bool": {
                    "must_not": [
                      { "term": { "phone_number": "987654321" } }
                    ]
                  }
                }
              ]
            }
          }
        }
      }
    }
  }
}
But I get both documents instead of only the first one.
You're very close. Try using a combination of must and should, where the must clause ensures the phone_number matches the search value, and the should clause ensures that either the expired field is missing or set to false. For example:
{
  "query": {
    "filtered": {
      "query": {
        "query_string": {
          "query": "987654321",
          "analyze_wildcard": "true"
        }
      },
      "filter": {
        "nested": {
          "path": "phone_numbers",
          "query": {
            "filtered": {
              "filter": {
                "bool": {
                  "must": [
                    { "term": { "phone_number": "987654321" } }
                  ],
                  "should": [
                    { "missing": { "field": "expired" } },
                    { "term": { "expired": false } }
                  ]
                }
              }
            }
          }
        }
      }
    }
  }
}
I ran this query using your mapping and sample documents and it returned the one document for John Doe, as expected.
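The nested filter's intended logic can be expressed as a hypothetical Python predicate over each phone_numbers entry (a sketch of the semantics, not Elasticsearch code):

```python
# The intended semantics per nested phone_numbers entry: the number must
# match, and expired must be either absent or false.
def entry_matches(entry, number):
    return (entry["phone_number"] == number
            and entry.get("expired", False) is False)

john = [{"phone_number": "987654321"}, {"phone_number": "012012012"}]
roger = [{"phone_number": "987654321", "expired": True},
         {"phone_number": "55555555"}]

print(any(entry_matches(e, "987654321") for e in john))   # True
print(any(entry_matches(e, "987654321") for e in roger))  # False
```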