Algolia Facet Filters with Array Value

For some reason, facet filters recently broke in Algolia. I have products with array attributes such as pa_size: ["XS", "S", "M", "L", "XL"]. I added "_product_attributes.pa_size.value" to the Attributes for faceting and set it to "searchable".
But when I try adding a facet filter such as "pa_size:M" directly in the Algolia explorer, I get 0 rows back. The same thing happens with other, similar facet filters.
Here's the raw query:
index.search("", {
  "getRankingInfo": 1,
  "facets": "*",
  "attributesToRetrieve": "*",
  "highlightPreTag": "<em>",
  "highlightPostTag": "</em>",
  "hitsPerPage": 10,
  "facetFilters": [
    "pa_size:M"
  ],
  "maxValuesPerFacet": 100
});
Here's the attribute:
"_product_attributes" : {
"pa_size" : {
"name" : "pa_size",
"value" : "",
"position" : "1",
"is_visible" : 1,
"is_variation" : 1,
"is_taxonomy" : 1
},
Any ideas what could be causing this?
Thanks!
Rob

It turned out to be a fairly simple issue: the attribute did need to be in the Attributes for faceting list, but I was having problems saving that setting, which is why the filter returned nothing. Once I was able to save the "pa_size" attribute, it worked as expected.
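For anyone hitting the same thing, here is a minimal sketch of the relevant index setting, assuming it is pushed through the API rather than the dashboard (the attribute path comes from the record structure shown above):

{
  "attributesForFaceting": [
    "searchable(_product_attributes.pa_size.value)"
  ]
}

Note that the name used in facetFilters has to match the attribute that was actually declared for faceting, so depending on how the records are laid out the working filter may be "pa_size:M" or the full "_product_attributes.pa_size.value:M".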

Related

Create field based on an existing field in Elasticsearch

I have an Elasticsearch index that stores products and their properties, like size, color, and material, as a dynamic field:
"raw_properties" : {
"dynamic" : "true",
"properties" : {
"Color" : {
"type" : "text",
"fields" : {
"keyword" : { "type" : "keyword", "ignore_above" : 256 }
}
},
"Size" : {
"type" : "text",
"fields" : {
"keyword" : { "type" : "keyword", "ignore_above" : 256
}
}
}
}
}
An indexed document looks like this:
{
  "_index" : "development-products",
  "_type" : "_doc",
  "_id" : "3",
  "_score" : 1.0,
  "_source" : {
    "raw_properties" : {
      "Size" : ["XS", "S", "XL"],
      "Color" : ["blue", "orange"]
    }
  }
}
The problem is that the values in raw_properties come from various sources, and they differ a lot. For example, the field Color is called Colour by another source, blue might arrive as light-blue, and so on.
So I've implemented a normalization step in my app that does a simple mapping like this (for simplicity, the mapping here is just a Ruby Hash; in reality it is read from a database):
PROPERTY_MAPPING = {
  "Colour_blue" => ["Color", "blue"],
  "Color_light-blue" => ["Color", "blue"],
  "Size_46" => ["Size", "S"]
}
When my app indexes a product, it looks into this property mapping and normalizes the property. This keeps the cardinality of the fields low, and the user isn't presented with too many properties to filter on.
The problem: Updating those mappings is pretty slow, as I have to reindex the affected products by applying the new mapping in my app and sending the data to Elasticsearch. I'm dealing with about 3 million products here, and new data with a new normalization comes in every day. I try to only find the products that are affected and so on, but it is still too slow.
So I was thinking if there was a way to do the normalization inside Elasticsearch? I've read about enriching data (https://www.elastic.co/guide/en/elasticsearch/reference/master/ingest-enriching-data.html) or the pipelines with processors (https://www.elastic.co/guide/en/elasticsearch/reference/master/ingest-processors.html) and had a look into Painless.
The main idea would be to only update the mapping, do an update_by_query, and let Elasticsearch take care of the rest.
So, I'm not sure if this is possible at all or where I should start. Any advice or hint is appreciated!
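To make the idea concrete, here is a rough sketch of that direction, not a finished solution: an ingest pipeline with a Painless script processor rewrites raw_properties, and _update_by_query accepts a pipeline parameter, so the reindexing work stays inside Elasticsearch. The pipeline name, the query, and the single hard-coded rename are purely illustrative; a real version would drive the script from the mapping table:

PUT _ingest/pipeline/normalize_properties
{
  "processors": [
    {
      "script": {
        "lang": "painless",
        "source": "if (ctx.raw_properties != null && ctx.raw_properties.Colour != null) { ctx.raw_properties.Color = ctx.raw_properties.remove('Colour'); }"
      }
    }
  ]
}

POST development-products/_update_by_query?pipeline=normalize_properties
{
  "query": { "exists": { "field": "raw_properties.Colour" } }
}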

Preserve wrong messages in Elasticsearch

I have a static mapping in Elasticsearch index. When a message doesn't match this mapping, it is discarded. Is there a way to route it to a default index for wrong messages?
To give you example, I have some fields with integer type:
"status_code": {
"type": "integer"
},
When a message contains a number
"status_code": 123,
it's ok. But when it is
"status_code": "abc"
it fails.
You can have ES do this triage pretty easily using ingest nodes/processors.
The main idea is to create an ingest pipeline with a convert processor for the status_code field; if the conversion doesn't work, an on_failure handler directs the document to another index, which you can process later.
So create the failures ingest pipeline:
PUT _ingest/pipeline/failures
{
  "processors": [
    {
      "convert": {
        "field": "status_code",
        "type": "integer"
      }
    }
  ],
  "on_failure": [
    {
      "set": {
        "field": "_index",
        "value": "failed-{{ _index }}"
      }
    }
  ]
}
Then when you index a document, you can simply specify the pipeline as a parameter. Indexing a document with a correct status code will succeed:
PUT test/doc/1?pipeline=failures
{
"status_code": 123
}
However, indexing a document with a bad status code will also succeed, but the document will land in the failed-test index rather than in test:
PUT test/doc/2?pipeline=failures
{
"status_code": "abc"
}
After running these two commands, you'll see this:
GET failed-test/_search
{
  "took" : 3,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 1,
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "failed-test",
        "_type" : "doc",
        "_id" : "2",
        "_score" : 1.0,
        "_source" : {
          "status_code" : "abc"
        }
      }
    ]
  }
}
To sum up, you didn't have to handle that exceptional case in your client code and could fully leverage ES ingest nodes to achieve the same task.
You can set the ignore_malformed parameter to ignore just the field with the type mismatch instead of rejecting the whole document.
You can combine it with multi-fields, which allow you to map the same value in different ways.
You will probably need something like this:
"status_code": {
"type": "integer",
"fields": {
"as_string": {
"type": "keyword"
}
}
}
This way you will have status_code mapped as an integer and the same value in a field named status_code.as_string as a keyword, but you should test whether it really does what you want.
Use strict mapping and you will be able to catch the exception raised by Elasticsearch.
Below is the excerpt from Elastic docs:
By default, when a previously unseen field is found in a document, Elasticsearch will add the new field to the type mapping. This behaviour can be disabled, both at the document and at the object level, by setting the dynamic parameter to false (to ignore new fields) or to strict (to throw an exception if an unknown field is encountered).
As part of the exception handling, you can push the message to some other index where dynamic mapping is enabled.
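A minimal sketch of the strict mapping itself (index and field names here are only placeholders; on versions that still use mapping types, dynamic sits under the type name):

PUT strict-index
{
  "mappings": {
    "dynamic": "strict",
    "properties": {
      "status_code": { "type": "integer" }
    }
  }
}

Any document carrying a field that is not listed under properties is then rejected with an exception (strict_dynamic_mapping_exception in recent versions), which the client can catch and reroute to a more permissive index.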

Elasticsearch appends random strings to source data inside indexes

I am new to Elasticsearch and have a peculiar problem: I am using Elasticsearch with Kibana to store and visualize most of the events in my application. For example, to store a user login with a user_id of 123, I would write to an index user/login/123 with the following data:
{
  "details" : {
    "fname" : "John",
    "lname" : "Smith",
    "click" : "login-button",
    etc...
  },
  "ip_address" : "127.0.0.1",
  "browser_type" : "Chrome",
  "browser_version" : "17"
}
However, the problem I encountered is that some records show up with a random string after the "details" array (see screenshot). Can anyone suggest what I am doing wrong and how I can fix the existing indexes?
I think you should have something like this in your data:
{
  "details" : {
    "28d211adbf" : {
      "stats" : {
        "merge_field_count": 6,
        "unsubscribe_count_since_send": 3
      }
    },
    "555cd3bcba" : {
      "stats" : {
        "merge_field_count": 6,
        "unsubscribe_count_since_send": 3
      }
    }
  },
  "ip_address" : "127.0.0.1",
  "browser_type" : "Chrome",
  "browser_version" : "17"
}
Indexing documents this way is actually not good practice in Elasticsearch.
Read about mapping explosion for more info about this:
https://www.elastic.co/blog/found-crash-elasticsearch#mapping-explosion
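If the data really does look like that, one common way to keep the mapping bounded (sketched here with the field names from the example above; adjust to the real schema) is to move the random identifiers out of the field names and into values:

{
  "details" : [
    {
      "id" : "28d211adbf",
      "merge_field_count" : 6,
      "unsubscribe_count_since_send" : 3
    },
    {
      "id" : "555cd3bcba",
      "merge_field_count" : 6,
      "unsubscribe_count_since_send" : 3
    }
  ],
  "ip_address" : "127.0.0.1",
  "browser_type" : "Chrome",
  "browser_version" : "17"
}

That way details can be mapped once (for example as a nested field) instead of growing a new sub-mapping for every generated key.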

Elasticsearch: Comparing two fields of the same document, where one of the fields is inside a nested document

Consider the following document:
{
  "group" : "fans",
  "preferredFanId" : 1,
  "user" : [
    {
      "fanId" : 1,
      "first" : "John",
      "last" : "Smith"
    },
    {
      "fanId" : 2,
      "first" : "Alice",
      "last" : "White"
    }
  ]
}
where "user" is a nested document. I want to get inner_hits (from 2.0.0-SNAPSHOT) where preferredFanId == user.fanId , and so I want only the John Smith record returned in the inner_hits.
Is it possible? I've tried several approaches like using "include_in_parent" or "_source", but nothing seems to work.
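For context, the fixed-value version of the query is straightforward; the open part of the question is making the hard-coded 1 below come from the document's own preferredFanId. A sketch, with the index name assumed:

GET fans/_search
{
  "query": {
    "nested": {
      "path": "user",
      "query": {
        "term": { "user.fanId": 1 }
      },
      "inner_hits": {}
    }
  }
}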

Get all fields of a document in ElasticSearch search query

How can I get all fields in documents matched by search query? ES documentation on fields says that using *, one can get all fields: http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-request-fields.html
With this document and this query, I get a hit in the result, but no fields are returned:
Put document:
curl -XPUT http://localhost:9200/idx/t/doc1 -d '{
"f": "value"
}'
Search it:
curl -XPOST http://localhost:9200/idx/_search?pretty -d '{
"fields": "*",
"query": { "term" : { "f" : "value" }}
}'
I also tried ["*"], but the result is the same: only the default fields (_id and _type) are returned. The hits part of the response looks like this:
"hits" : {
"total" : 1,
"max_score" : 0.30685282,
"hits" : [ {
"_index" : "idx",
"_type" : "t",
"_id" : "doc1",
"_score" : 0.30685282
} ]
}
The doc actually says:
"* can be used to load all stored fields from the document."
The core types doc says that the default for storing fields is 'false'.
Since by default ElasticSearch stores all fields of the source document in the special _source field, this option is primarily useful when the _source field has been disabled in the type definition. Defaults to false.
If you don't specify 'fields' in your search, you can see what's in _source.
So, if you want to return it as a field, change your mapping to store the field.
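For example, here is a sketch of a mapping that stores the field explicitly, using the 1.x-era syntax that matches the rest of this question (index and type names as above):

curl -XPUT http://localhost:9200/idx -d '{
  "mappings": {
    "t": {
      "properties": {
        "f": { "type": "string", "store": true }
      }
    }
  }
}'

After re-creating the index with this mapping and re-indexing doc1, the same search with "fields" should return f alongside the hit.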
I was facing this problem too.
I found out that if I just search the text or keyword fields, everything is OK.
Hope this helps.
