Indexing a field for both phrase searching and partial matches - elasticsearch

I am creating an index on an object, and wanting to be able to do both full phrase searches as well as partial matches. The type is called "deponent", and a simplified index creation is shown below:
{
"deponent": {
"properties": {
"name": {
"type": "multi_field",
"fields": {
"name": {
"type": "string"
},
"full": {
"type": "string",
"index": "not_analyzed",
"omit_norms": true,
"index_options": "docs",
"include_in_all": false
}
}
}
}
}
}
The intent of this is to index the values in the "name" field twice: once where the individual words within the field are not broken up (name.full) and once where the words are broken up (name.name).
I have a document which has been indexed whose name field is set to "Dr. Danny Watson". I would expect the following behaviors to occur when executing a term query (whose query string is not analyzed according to the documentation):
When searching name.full using "Dr. Danny Watson", the record
should be returned
When searching name.full using "Watson", the record should not be returned
When searching name.name using "Dr. Danny Watson", the record should not be returned
When searching name.name using "Watson", the record should be returned
The queries for the four points above:
1 - works as expected (returns the record)
{
"query" : {
"term": {
"name.full": {
"value": "Dr. Danny Watson"
}
}
}
}
2 - works as expected (does not return the record)
{
"query" : {
"term": {
"name.full": {
"value": "Watson"
}
}
}
}
3 - works as expected (does not return the record)
{
"query" : {
"term": {
"name.name": {
"value": "Dr. Danny Watson"
}
}
}
}
4 - does NOT work as expected - the record is not returned
{
"query" : {
"term": {
"name.name": {
"value": "Watson"
}
}
}
}
So it seems my understanding of something is flawed. What am I missing?

You don't need to call the field "name.name". The multi-field with the original name is used as the default, so you should use just "name" for that.
Also it's always good to make sure the index and search analyzers are in order (so for instance both your indexed terms and the search term are changed to lower case).

Related

How to formulate an and condition for a nested object in Opensearch?

Opensearch ingests documents similar to this example (its just a minimal example):
PUT nested_test/_doc/4
{
"log": "This is a fourth log message",
"function": "4 test function",
"related_objects": [
{ "type": "user", "id": "10" },
{ "type": "offer", "id": "120" }
]
}
PUT nested_test/_doc/5
{
"log": "This is a fifth log message",
"function": "5 test function",
"related_objects": [
{ "type": "user", "id": "120" },
{ "type": "offer", "id": "90" }
]
}
With many of these documents, I'd like to filter those which have a specific related object (e.g. type=user and id=120). With the example data above, this should only return the document with id 5. Using simple filters (DQL syntax) as follows does not work:
related_objects.type:user and related_objects.id:120
As this would also match a document 5, as there is a related_object with type user and a related object with id 120, although its not the related user object with id 120, its the related offer.
If Array[object] is used, the field type is nested, The document reference
Elasticsearch query example:
{
"query" : {
"nested" : {
"path" : "related_objects",
"query" : {
"bool" : {
"must" : [
{
"term" : {"related_objects.type" : "MYTYPE"}
},
{
"term" : {"related_objects.id" : "MYID"}
}
]
}
}
}
}
}
Basically just go into a nested query and specify all your AND conditions as MUST clauses inside a bool query.
As soon as the field is declared as nested field, it is possible to run a simple DQL query to get the desired information:
related_objects:{type:"user" and id:120}
This requires that the field has been defined as nested before:
PUT my-index-000001
{
"mappings": {
"properties": {
"related_objects": {
"type": "nested"
}
}
}
}

how to use filter in ElasticSearch?

I'm trying to implement filter using ElasticSearch I'm simply want to implement range filter I've the following data:
{
"result": [
{
"Id": "144039",
"posted_dt": 1506951883637,
"submit_dt": 1507609800000,
"title": "Request for Information (RFI) # 306-18-0018",
"fname": "RODRI",
"email": "",
"desc": "dummy Text"
}
]
}
I want to get data from last 3 or 5 days I'm using this :
query = {
"bool": {
"must": [
{
"range" : {
"posted_dt" : {
"gte" : "now-3d/d",
"lt" : "now/d"
}
}
} ]
}
}
My mapping for posted_dt is :
"posted_dt": {
"type": "long"
},
I did try the filter as well but didn't succeed.
Please help.
Thanks
Randheer
Your mapping of "posted_dt" field is incorrect. You intend to store date which is in epoch in millis but you are storing it as long type. So the date range filter won't work on long datatype. Update your "posted_dt" field's mapping like :
PUT my_index
{
"mappings": {
"my_type": {
"properties": {
"posted_dt": {
"type": "date",
"format": "epoch_millis"
}
}
}
}
}
Refer Date datatype in Elasticsearch.
First you need to share your mapping. Actually make sure that posted_dt and submit_dt are defined as date in your mapping. Here you are using a long which is incorrect to deal with dates.
A side note is that you should use filter instead of must in your case. Will be faster IMO.

How to specify or target a field from a specific document type in queries or filters in Elasticsearch?

Given:
Documents of two different types, let's say 'product' and 'category', are indexed to the same Elasticsearch index.
Both document types have a field 'tags'.
Problem:
I want to build a query that returns results of both types, but the documents of type 'product' are allowed to have tags 'X' and 'Y', and the documents of type 'category' are only allowed to have tag 'Z'. How can I achieve this? It appears I can't use product.tags and category.tags since then ES will look for documents' product/category field, which is not what I intend.
Note:
While for the example above there might be some kind of workaround, I'm looking for a general way to target or specify fields of a specific document type when writing queries. I basically want to 'namespace' the field names used in my query so only documents of the type I want to work with are considered.
I think field aliasing would be the best answer for you, but it's not possible.
Instead you can use "copy_to" but I it probably affects index size:
DELETE /test
PUT /test
{
"mappings": {
"product" : {
"properties": {
"tags": { "type": "string", "copy_to": "ptags" },
"ptags": { "type": "string" }
}
},
"category" : {
"properties": {
"tags": { "type": "string", "copy_to": "ctags" },
"ctags": { "type": "string" }
}
}
}
}
PUT /test/product/1
{ "tags":"X" }
PUT /test/product/2
{ "tags":"Y" }
PUT /test/category/1
{ "tags":"Z" }
And you can query one of fields or many of them:
GET /test/product,category/_search
{
"query": {
"term": {
"ptags": {
"value": "x"
}
}
}
}
GET /test/product,category/_search
{
"query": {
"multi_match": {
"query": "x",
"fields": [ "ctags", "ptags" ]
}
}
}

ElasticSearch - issue with sub term aggregation with array fields

I have the two following documents:
{
"title":"The Avengers",
"year":2012,
"casting":[
{
"name":"Robert Downey Jr.",
"category":"Actor",
},
{
"name":"Chris Evans",
"category":"Actor",
}
]
}
and:
{
"title":"The Judge",
"year":2014,
"casting":[
{
"name":"Robert Downey Jr.",
"category":"Producer",
},
{
"name":"Robert Duvall",
"category":"Actor",
}
]
}
I would like to perform aggregations, based on two fields : casting.name and casting.category.
I tried with a TermsAggregation based on casting.name field, with a subaggregation, which is another TermsAggregation based on the casting.category field.
The problem is that for the "Chris Evans" entry, ElasticSearch set buckets for ALL categories (Actor, Producer) whereas it should set only 1 bucket (Actor).
It seems that there is a cartesian product between all casting.category occurences and all casting.name occurences.
It behaves like this with array fields (casting), whereas I don't have the problem with simple fields (as title, or year).
I also tried to use nested aggregations, but maybe not properly, and ElasticSearch throws an error telling that casting.category is not a nested field.
Any idea here?
Elasticsearch will flatten the nested objects, so internally you will get:
{
"title":"The Judge",
"year":2014,
"casting.name": ["Robert Downey Jr.","Robert Duvall"],
"casting.category": ["Producer", "Actor"]
}
if you want to keep the relationship you'll need to use either nested objects or a parent child relationship
To do a nested mapping you'd need to do something like this:
"mappings": {
"movies": {
"properties": {
"title" : { "type": "string" },
"year" : { "type": "integer" },
"casting": {
"type": "nested",
"properties": {
"name": { "type": "string" },
"category": { "type": "string" }
}
}
}
}
}

Elasticsearch mapping - different data types in same field

I am trying to to create a mapping that will allow me to have a document looking like this:
{
"created_at" : "2014-11-13T07:51:17+0000",
"updated_at" : "2014-11-14T12:31:17+0000",
"account_id" : 42,
"attributes" : [
{
"name" : "firstname",
"value" : "Morten",
"field_type" : "string"
},
{
"name" : "lastname",
"value" : "Hauberg",
"field_type" : "string"
},
{
"name" : "dob",
"value" : "1987-02-17T00:00:00+0000",
"field_type" : "datetime"
}
]
}
And the attributes array must be of type nested, and dynamic, so i can add more objects to the array and index it by the field_type value.
Is this even possible?
I have been looking at the dynamic_templates. Can i use that?
You actually can index multiple datatypes into the same field using a multi-field mapping and the ignore_malformed parameter, if you are willing to query the specific field type if you want to do type specific queries (like comparisons).
This will allow elasticsearch to populate the fields that are pertinent for each input, and ignore the others. It also means you don’t need to do anything in your indexing code to deal with the different types.
For example, for a field called user_input that you want to be able to do date or integer range queries over if that is what the user has entered, or a regular text search if the user has entered a string, you could do something like the following:
PUT multiple_datatypes
{
"mappings": {
"_doc": {
"properties": {
"user_input": {
"type": "text",
"fields": {
"numeric": {
"type": "double",
"ignore_malformed": true
},
"date": {
"type": "date",
"ignore_malformed": true
}
}
}
}
}
}
}
We can then add a few documents with different user inputs:
PUT multiple_datatypes/_doc/1
{
"user_input": "hello"
}
PUT multiple_datatypes/_doc/2
{
"user_input": "2017-02-12"
}
PUT multiple_datatypes/_doc/3
{
"user_input": 5
}
And when you search for these, and have ranges and other type-specific queries work as expected:
// Returns only document 2
GET multiple_datatypes/_search
{
"query": {
"range": {
"user_input.date": {
"gte": "2017-01-01"
}
}
}
}
// Returns only document 3
GET multiple_datatypes/_search
{
"query": {
"range": {
"user_input.numeric": {
"lte": 9
}
}
}
}
// Returns only document 1
GET multiple_datatypes/_search
{
"query": {
"term": {
"user_input": {
"value": "hello"
}
}
}
}
I wrote about this as a blog post here
No - you cannot have different datatypes for the same field within the same type.
e.g. the field index/type/value can not be both a string and a date.
A dynamic template can be used to set the datatype and analyzer based on the format of the field name
For example:
set all fields with field names ending in "_dt" to type datetime.
But this won't help in your scenario, once the datatype is set you can't change it.

Resources