Sorting in Elasticsearch using the new Java API - elasticsearch

I am using the latest Java API for communication with the Elasticsearch server.
I need to retrieve search results in sorted order.
SortOptions sort = new SortOptions.Builder()
        .field(f -> f.field("customer.keyword").order(SortOrder.Asc))
        .build();
List<SortOptions> list = new ArrayList<SortOptions>();
list.add(sort);
SearchResponse<Order> response = elasticsearchClient.search(b -> b
        .index("order")
        .size(100)
        .sort(list)
        .query(q -> q.bool(bq -> bq
                .filter(fb -> fb.range(r -> r
                        .field("orderTime")
                        .gte(JsonData.of(timeStamp("01-01-2022-01-01-01")))
                        .lte(JsonData.of(timeStamp("01-01-2022-01-01-10")))))
                // .must(query)
        )), Order.class);
I have written the above code to get search results sorted by customer, but I get the error below when I run the program.
Exception in thread "main" co.elastic.clients.elasticsearch._types.ElasticsearchException: [es/search] failed: [search_phase_execution_exception] all shards failed
    at co.elastic.clients.transport.rest_client.RestClientTransport.getHighLevelResponse(RestClientTransport.java:281)
    at co.elastic.clients.transport.rest_client.RestClientTransport.performRequest(RestClientTransport.java:147)
    at co.elastic.clients.elasticsearch.ElasticsearchClient.search(ElasticsearchClient.java:1487)
    at co.elastic.clients.elasticsearch.ElasticsearchClient.search(ElasticsearchClient.java:1504)
    at model.OrderDAO.fetchRecordsQuery(OrderDAO.java:128)
The code runs fine if I remove the .sort() method.
My index is configured in the following format.
{
  "order": {
    "aliases": {},
    "mappings": {
      "properties": {
        "customer": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 256
            }
          }
        },
        "orderId": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 256
            }
          }
        },
        "orderTime": {
          "type": "long"
        },
        "orderType": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 256
            }
          }
        }
      }
    },
    "settings": {
      "index": {
        "routing": {
          "allocation": {
            "include": {
              "_tier_preference": "data_content"
            }
          }
        },
        "number_of_shards": "1",
        "provided_name": "order",
        "creation_date": "1652783550822",
        "number_of_replicas": "1",
        "uuid": "mrAj8ZT-SKqC43-UZAB-Jw",
        "version": {
          "created": "8010299"
        }
      }
    }
  }
}
Please let me know what is wrong here, and if possible, please send me the correct syntax for using sort() in the new Java API.
Thanks a lot.

As you have confirmed in the comments, customer is a text-type field, and that is the reason you are getting the above error: sorting cannot be applied to a text field.
Your index should be configured like below so that sort can be applied to the customer field:
{
  "mappings": {
    "properties": {
      "customer": {
        "type": "text",
        "fields": {
          "keyword": {
            "type": "keyword",
            "ignore_above": 256
          }
        }
      }
    }
  }
}
Once you have an index mapping like the above, you can use customer.keyword as the field name for sorting and customer as the field name for full-text search.
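For reference, the sort() syntax in the question is essentially correct once this mapping is in place. Below is a minimal sketch with the 8.x Java client using the inline lambda form instead of a prebuilt SortOptions list; the index name and Order class come from the question, the bool/range filter is swapped for match_all to keep the example short, and imports from co.elastic.clients.* are omitted as in the question's snippet:
// Minimal sketch: sort on the keyword sub-field, which holds the exact
// (non-analyzed) value; "order" and Order are taken from the question.
SearchResponse<Order> response = elasticsearchClient.search(s -> s
        .index("order")
        .size(100)
        .sort(so -> so.field(f -> f
                .field("customer.keyword")
                .order(SortOrder.Asc)))
        .query(q -> q.matchAll(m -> m)),
    Order.class);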

Related

How to declare mapping for nested fields in Elasticsearch to allow for storing different types?

In essence, I want my mapping to be as schemaless as possible, but allow for nested types and be able to store data that may have different types.
When I try to add a document where some fields have different types of values, I get an error like this:
"type": "illegal_argument_exception",
"reason": "mapper [data.customData.value] of different type, current_type [long], merged_type [text]"
This can easily be solved by mapping the field value to text (or by creating it dynamically by first inserting a document with only text values). However, I would like to avoid having a schema. Perhaps all of the fields nested under customData could be set to text? How do I do that? (See the sketch after this question.)
I had the problem earlier, but then it started working after I accidentally managed to get a dynamic mapping that worked (since everything was regarded as text). I was later made aware of this problem when I needed to change the mapping to allow for nested types.
Documents with this kind of data are troublesome to store successfully:
"customData": [
{
"value": "some_text",
"key": "some_text"
},
{
"value": 0,
"key": "some_text"
}
]
A part of the mapping that works:
{
  "my_index": {
    "aliases": {},
    "mappings": {
      "_doc": {
        "properties": {
          "data": {
            "properties": {
              "customData": {
                "properties": {
                  "key": {
                    "type": "text",
                    "fields": {
                      "keyword": {
                        "type": "keyword",
                        "ignore_above": 256
                      }
                    }
                  },
                  "value": {
                    "type": "text",
                    "fields": {
                      "keyword": {
                        "type": "keyword",
                        "ignore_above": 256
                      }
                    }
                  }
                }
              }
            }
          },
          "some_list": {
            "type": "nested",
            "properties": {
              "some_field": {
                "type": "text",
                "fields": {
                  "keyword": {
                    "type": "keyword",
                    "ignore_above": 256
                  }
                }
              }
            }
          }
        }
      }
    }
  }
}
In essence, I want the mapping to be as schemaless as possible, but allow for nested types and be able to store data that may have different types:
{
  "mappings": {
    "_doc": {
      "properties": {
        "data": {
          "type": "object"
        },
        "some_list": {
          "type": "nested"
        }
      }
    }
  }
}
So what would be the best approach to this problem?
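Not a definitive answer, but a minimal sketch of the "map everything under customData to text" idea raised above, written with the same 8.x Java client as the first question on this page (the question itself predates that client; my_index and the template name custom_data_as_text are placeholders, and imports are omitted):
// Hypothetical sketch: a dynamic template that maps every field under
// data.customData to text, so long and text values no longer collide.
elasticsearchClient.indices().create(c -> c
        .index("my_index")
        .mappings(m -> m
            .dynamicTemplates(Map.of("custom_data_as_text",
                DynamicTemplate.of(dt -> dt
                    .pathMatch("data.customData.*")
                    .mapping(p -> p.text(t -> t)))))));
With such a template in place, the two customData entries from the question (one text value, one long) should index without the merged_type conflict, since both values are coerced to text.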

How to set elasticsearch index mapping as not_analyzed for all the fields

I want my Elasticsearch index to match the exact value for all the fields. How do I map my index as "not_analyzed" for all the fields?
I'd suggest making use of multi-fields in your mapping (which is the default behavior if you aren't creating the mapping yourself, i.e. dynamic mapping).
That way you can switch between full-text search and exact-match searches when required.
Note that for exact matches you would need the keyword datatype plus a Term Query. Sample examples are provided in the links I've specified.
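For illustration, an exact match on a keyword multi-field with the 8.x Java client from the first question might look like the sketch below (the index products, the field color, and the Product class are assumptions for the example; older ES versions would send the equivalent term query over REST):
// Hypothetical sketch: term query on the keyword sub-field for an exact,
// non-analyzed match; "products", "color" and Product are placeholders.
SearchResponse<Product> response = elasticsearchClient.search(s -> s
        .index("products")
        .query(q -> q.term(t -> t
                .field("color.keyword")
                .value("dark red"))),
    Product.class);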
Hope it helps!
You can use a dynamic_templates mapping for this. By default, Elasticsearch maps string fields as text with index: true, like below:
{
  "products2": {
    "mappings": {
      "product": {
        "properties": {
          "color": {
            "type": "text",
            "fields": {
              "keyword": {
                "type": "keyword",
                "ignore_above": 256
              }
            }
          },
          "type": {
            "type": "text",
            "fields": {
              "keyword": {
                "type": "keyword",
                "ignore_above": 256
              }
            }
          }
        }
      }
    }
  }
}
As you can see, it also creates a keyword field as a multi-field. This keyword field is indexed but, unlike text, not analyzed. If you want to drop this default behaviour, you can use the configuration below while creating the index:
PUT products
{
  "settings": {
    "number_of_shards": 1,
    "number_of_replicas": 0
  },
  "mappings": {
    "product": {
      "dynamic_templates": [
        {
          "strings": {
            "match_mapping_type": "string",
            "mapping": {
              "type": "keyword",
              "index": false
            }
          }
        }
      ]
    }
  }
}
After doing this, the index will look like below:
{
  "products": {
    "mappings": {
      "product": {
        "dynamic_templates": [
          {
            "strings": {
              "match_mapping_type": "string",
              "mapping": {
                "type": "keyword",
                "index": false
              }
            }
          }
        ],
        "properties": {
          "color": {
            "type": "keyword",
            "index": false
          },
          "type": {
            "type": "keyword",
            "index": false
          }
        }
      }
    }
  }
}
Note: I don't know your use case, but you can use the multi-field feature as mentioned by @Kamal; otherwise, you cannot search on these non-indexed fields. You can also use dynamic_templates to keep some fields analyzed.
Please check the documentation for more information:
https://www.elastic.co/guide/en/elasticsearch/reference/current/dynamic-templates.html
I also explained this behaviour in an article; sorry, but it is in Turkish. You can check its example code samples with Google Translate if you want.

ElasticSearch Logstash JDBC: How to aggregate into different column names

I am new to Elasticsearch and I am trying to use Logstash to load data into an index. Following is part of my Logstash config:
filter {
  aggregate {
    task_id => "%{code}"
    code => "
      map['campaignId'] = event.get('CAM_ID')
      map['country'] = event.get('COUNTRY')
      map['countryName'] = event.get('COUNTRYNAME')
      # etc
    "
    push_previous_map_as_event => true
    timeout => 5
  }
}
output {
  elasticsearch {
    document_id => "%{code}"
    document_type => "company"
    index => "company_v1"
    codec => "json"
    hosts => ["127.0.0.1:9200"]
  }
}
I was expecting the aggregation to map, for instance, the column 'CAM_ID' into a property named 'campaignId' in the Elasticsearch index. Instead, it is creating a property named 'cam_id', which is the column name in lowercase. The same happens with the rest of the properties.
Following is the index mapping after Logstash has executed:
{
  "company_v1": {
    "aliases": {},
    "mappings": {
      "company": {
        "properties": {
          "@timestamp": {
            "type": "date"
          },
          "@version": {
            "type": "text",
            "fields": {
              "keyword": {
                "type": "keyword",
                "ignore_above": 256
              }
            }
          },
          "cam_id": {
            "type": "long"
          },
          "campaignId": {
            "type": "long"
          },
          "cam_type": {
            "type": "text",
            "fields": {
              "keyword": {
                "type": "keyword",
                "ignore_above": 256
              }
            }
          },
          "campaignType": {
            "type": "text"
          }
        }
      }
    },
    "settings": {
      "index": {
        "creation_date": "1545905435871",
        "number_of_shards": "5",
        "number_of_replicas": "1",
        "uuid": "Dz0x16ohQWWpuhtCB3Y4Vw",
        "version": {
          "created": "6050399"
        },
        "provided_name": "company_v1"
      }
    }
  }
}
'campaignId' and 'campaignType' were created by me when I created the index, but Logstash created the other two.
Can someone explain how to configure Logstash to customize the document property names when data is being loaded?
Thank you very much.
Best Regards

elasticsearch query child list containing specific value

I am writing a query to return the products that have a specific promotionCode. In my index, a product has the following property indexed:
"offers": [
{
"promotionCode": "MV"
},
{
"promotionCode": "LI"
},
.....
]
My initial thought was that the following would be the answer:
GET alias-live-dev/_search
{
  "query": {
    "match": {
      "offers.promotionCode": "MV"
    }
  }
}
However, this always returns 0 hits. I am guessing it fails because offers is a list. Could anyone please advise what the right query would be for this scenario? Thanks in advance.
In the mapping:
"productId": {
"type": "keyword"
},
"offers": {
"type": "nested",
"properties": {
......
"promotionCode": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
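Since offers is mapped as nested, queries on its fields have to be wrapped in a nested query with path offers rather than issued as a plain match; that is why the query above returns no hits. Below is a sketch with the 8.x Java client from the first question (the Product class is an assumption for the example; the same query can equally be sent as JSON over REST):
// Hypothetical sketch: wrap the term in a nested query because "offers"
// is a nested field; Product is a placeholder entity class.
SearchResponse<Product> response = elasticsearchClient.search(s -> s
        .index("alias-live-dev")
        .query(q -> q.nested(n -> n
                .path("offers")
                .query(nq -> nq.term(t -> t
                        .field("offers.promotionCode.keyword")
                        .value("MV"))))),
    Product.class);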

Aggregating over _field_names in elasticsearch 5

I'm trying to aggregate over field names in ES 5 as described in "Elasticsearch aggregation on distinct keys", but the solution described there is not working anymore.
My goal is to get the keys across all the documents. The mapping is the default one.
Data:
PUT products/product/1
{
  "param": {
    "field1": "data",
    "field2": "data2"
  }
}
Query:
GET _search
{
  "aggs": {
    "params": {
      "terms": {
        "field": "_field_names",
        "include": "param.*",
        "size": 0
      }
    }
  }
}
I get the following error: Fielddata is not supported on field [_field_names] of type [_field_names]
After looking around, it seems the only way in ES 5.x and later to get the unique field names is through the mappings endpoint. Since you cannot aggregate on _field_names, you may need to slightly change your data format, because the mapping endpoint will return every field regardless of nesting.
My personal problem was getting unique keys for various child/parent documents.
I found that if you prefix your field names in the format prefix.field, the mapping endpoint will automatically nest the information for you.
PUT products/product/1
{
  "param.field1": "data",
  "param.field2": "data2",
  "other.field3": "data3"
}
GET products/product/_mapping
{
  "products": {
    "mappings": {
      "product": {
        "properties": {
          "other": {
            "properties": {
              "field3": {
                "type": "text",
                "fields": {
                  "keyword": {
                    "type": "keyword",
                    "ignore_above": 256
                  }
                }
              }
            }
          },
          "param": {
            "properties": {
              "field1": {
                "type": "text",
                "fields": {
                  "keyword": {
                    "type": "keyword",
                    "ignore_above": 256
                  }
                }
              },
              "field2": {
                "type": "text",
                "fields": {
                  "keyword": {
                    "type": "keyword",
                    "ignore_above": 256
                  }
                }
              }
            }
          }
        }
      }
    }
  }
}
Then you can grab the unique fields based on the prefix.
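To make that concrete, here is a sketch of collecting the unique top-level field names from the mapping with the 8.x Java client used in the first question (an assumption for illustration; the original question targets ES 5, where the GET products/product/_mapping call above does the same job, and imports are omitted):
// Hypothetical sketch: fetch the mapping and collect the top-level
// property names; "products" is the index name from the example above.
GetMappingResponse mapping = elasticsearchClient.indices()
        .getMapping(g -> g.index("products"));
Set<String> topLevelFields = mapping.result()
        .get("products")
        .mappings()
        .properties()
        .keySet(); // e.g. [other, param]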
This is probably because setting size: 0 is no longer allowed in ES 5. You have to set a specific size now.
POST _search
{
  "aggs": {
    "params": {
      "terms": {
        "field": "_field_names",
        "include": "param.*",
        "size": 100 <--- change this
      }
    }
  }
}
