Elasticsearch nested objects with query_string as first class attributes - elasticsearch

I'm trying to index a nested field as a first-class attribute in my document so that I can search them using query_string without dot syntax.
For example, if I have a document like
"data": { "name": "Bob" }
instead of searching for data.name:Bob I would like to be able to search for name:Bob
The root of my issue is that we index a jsonb column that may have varying attributes. In some instances the data property may contain a data.business attribute, etc. I would like users to be able to search on these attributes without needing to "dig" into the object.
The data field does not have to be indexed as a nested type unless necessary; I was indexing it as an object previously.
I have tried to leverage the _all field as suggested in this post.
I have also tried to use include_in_parent:true and set the datatype as nested for my data field as suggested in this post.
I have also looked into the inner_hits feature to no avail.
Here's an example of my mapping for the data attribute.
PUT my_index
{
"mappings": {
"my_type": {
"properties": {
"data": {
"type": "object"
}
}
}
}
}
Example document
PUT my_index/_doc/1
{
"data": {
name: "bob",
business: "None of yours"
}
}
And how my query currently looks:
GET my_index/_search
{
"query": {
"query_string": {
"query": "name:bob",
"fields": ["data.*"]
}
}
}
With the current setup I almost get my desired results. I can search on individual properties like data.name:bob and data.business:"None of yours" and get back the correct documents.
However I want to be able to get the exact same results with business:"None of yours" or name:bob.
Thanks in advance for any help!

I figured it out using dynamic templates. For anyone coming across this in the future, here is how I solved the issue:
I used path_match to match the data object (data.*).
Then using copy_to and {name} I dynamically created top-level fields on my parent object.
{
"dynamic_templates":[
{"template_1":
{"mapping":
{"copy_to":"{name}"},
"path_match":"data.*"
}
}
]
}

Related

Elasticsearch 6.2: terms query require lowercase input when searching on keyword

I've created an example index, with the following mapping:
{
"_doc": {
"_source": {
"enabled": False
},
"properties": {
"status": { "type": "keyword" }
}
}
}
And indexed a document:
{"status": "CMP"}
When searching the documents with this status with a terms query, I find no results:
{
"query" : {
"terms": { "status": ["CMP"]}
}
}
However, if I make the same query by putting the input in lowercase, I will find my document:
{
"query" : {
"terms": { "status": ["cmp"]}
}
}
Why is it? Since I'm searching on a keyword field, the indexed content should not be analyzed and should match an uppercase value...
no more #Oliver Charlesworth Now - in Elastic 6.x - you could continue to use a keyword datatype, lowercasing your text with a normalizer,doc here. However in every cases you should change your index mapping and reindex your docs
The index and mapping creation and the search were part of a test suite. It seems that the setup part of the test suite was not executed, and the mapping was not applied to the index.
The index was then using the default types instead of the mapping types, resulting of the use of string fields instead of keywords.
After changing the setup method of the automated tests, the mappings are well applied to the index, and the uppercase values for the status "CMP" are now matching documents.
The symptoms you're seeing shouldn't occur, unless something else is wrong.
A keyword index is not analysed, so your index should contain only CMP. A terms query is also not analysed, etc. so your index is searched only for CMP. Hence there should be a match.

Find documents in Elasticsearch where `ignore_malformed` was triggered

Elasticsearch by default throws an exception if inserting data to a field which does not fit the existing type. For example, if a field has been created as number type, inserting a document with a string value for that field causes an error.
This behavior can be changed by enabling then ignore_malformed setting, which means such fields are silently ignored for indexing purposes, but retained in the _source document - meaning that the invalid values cannot be searched or aggregated, but are still included in the returned document.
This is preferable behavior in our use case, but we would wish to be able to locate such documents somehow so we can fix them in the future.
Is there any way to somehow flag documents for which some malformed fields were ignored? We control the document insertion process fully, so we can modify all insertion flags, or do a trial insert, or anything, to reach our goal.
You can use the exists query to find document where this field does not exist, see this example
PUT foo
{
"mappings": {
"bar": {
"properties": {
"baz": {
"type": "integer",
"ignore_malformed": true
}
}
}
}
}
PUT foo/bar/1
{
"baz": "field"
}
GET foo/bar/_search
{
"query": {
"bool": {
"filter": {
"bool": {
"must_not": [
{
"exists": {
"field": "baz"
}
}
]
}
}
}
}
}
There is no dedicated mechanism though, so this search finds also documents where the field is not set intentionally
You cannot, when you search on elasticsearch, you don't search on document source but on the inverted index, which contains the analyzed data.
ignore_malformed flag is saying "always store document, analyze if possible".
You can try, create a mal-formed document, and use _termvectors API to see how the document is analyzed and stored in the inverted index, in a case of a string field, you can see an "Array" is stored as an empty string etc.. but the field will exists.
So forget the inverted index, let's use the source!
Scroll all your data until you find the anomaly, I use a small python script that search scroll, unserialize and I test field type for every documents (very long) but I can have a list of wrong document IDs.
Use a script query can be very long and crash your cluster, use with caution, maybe as a post_filter:
Here I want to retrieve the document where country_name is not a string:
{
"_source": false,
"timeout" : "30s",
"query" : {
"query_string" : {
"query" : "locale:de_ch"
}
},
"post_filter": {
"script": {
"script": "!(_source.country_name instanceof String)"
}
}
}
"_source:false" => I want only document ID
"timeout" => prevent crash
As you notice, this is a missing feature, I know logstash will tag
document that fail, so elasticsearch could implement the same thing.

Document with multiple nested types

I have a type of document that can have lots of nested type objects, how can I map all of these as nested without actually having to tediously specify a mapping for every single field in the document?
Have you seen Dynamic templates?
Dynamic templates allow you to define custom mappings that can be
applied to dynamically added fields based on:
the datatype detected by Elasticsearch, with match_mapping_type.
the name of the field, with match and unmatch or match_pattern.
the full dotted path to the field, with path_match and path_unmatch.
The original field name {name} and the detected datatype
{dynamic_type} template variables can be used in the mapping
specification as placeholders.
So you could potentially use this example by adding some sort of special pattern to your field, so template would recognize it and map it as a nested object.
PUT my_index
{
"mappings": {
"my_type": {
"dynamic_templates": [
{
"nested_objects": {
"match": "nested_*",
"mapping": {
"type": "nested"
}
}
}
]
}
}
}
P.S. I haven't tested this myself. Let me know if this helps you.

Elasticsearch field name aliasing

Is it possible to setup alias for field names in elasticsearch? (Just like how index names can be aliased)
For example: i have a document {'firstname': 'John', 'lastname': 'smith'}
I would like to alias 'firstname' to 'fn'...
Just a quick update, Elasticsearch 6.4 came up with feature called Alias Datatype. Check the below mapping and query as sample.
Note that the type of the field is alias in the below mapping for fieldname fn
Sample Mapping:
PUT myindex
{
"mappings": {
"_doc": {
"properties": {
"firstname": {
"type": "text"
},
"fn": {
"type": "alias",
"path": "firstname"
}
}
}
}
}
Sample Query:
GET myindex/_search
{
"query": {
"match" : {
"fn" : "Steve"
}
}
}
The idea is to use the alias for actual field on which inverted index is created. Note that fields with alias datatype aren't meant for write operations and its only meant for querying purpose.
Although you can refer to the link I've mentioned for more details, below are just some of the important points.
Field alias is only meant to be used when your index has a single mapping. Index has to be created post 6.xx version or be created in older version with the setting index.mapping.single_type: true
Can be used in querying, aggregations, sorting, highlighting and suggestion operations
Target field must be actual field on which inverted index is created
Cannot create alias of another alias field
Cannot use alias on multiple fields. Single alias, Single field.
Cannot be used as part of source filtering using _source.
There is no direct field alias functionality. However, you could rename the fields upon indexing using the index_name property in your mappings.
index_name : The name of the field that will be stored in the index.
Defaults to the property/field name.
See here for more information: http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/mapping-core-types.html
Adding alias fn for existing field firstname
PUT myindex/_mapping
{
"properties": {
"fn": {
"type": "alias",
"path": "firstname"
}
}
}
Should work this way as of Elasticsearch 7.
Probably you can try creating an alias on your index with filter on the desired field. Your filter must be written in such a way that it selects all the entries from your field. Please refer Filtered aliases section in here. But I am interested in knowing your use case. Why you want to create alias on particular field.

Sorting a match query with ElasticSearch

I'm trying to use ElasticSearch to find all records containing a particular string. I'm using a match query for this, and it's working fine.
Now, I'm trying to sort the results based on a particular field. When I try this, I get some very unexpected output, and none of the records even contain my initial search query.
My request is structured as follows:
{
"query":
{
"match": {"_all": "some_search_string"}
},
"sort": [
{
"some_field": {
"order": "asc"
}
}
] }
Am I doing something wrong here?
In order to sort on a string field, your mapping must contain a non-analyzed version of this field. Here's a simple blog post I found that describes how you can do this using the multi_field mapping type.

Resources