Sorting on fields containing numbers stored as strings

I am trying to sort results in Elasticsearch based on a field named "segmentNumber" that has numbers stored as strings, e.g. "1", "2", "10", "14". I need to write an ES query that returns results in ascending and descending order, as shown below.
Ascending order:
{segmentNumber: "1"},
{segmentNumber: "2"},
{segmentNumber: "10"},
{segmentNumber: "14"},
Descending Order:
{segmentNumber: "14"},
{segmentNumber: "10"},
{segmentNumber: "2"},
{segmentNumber: "1"},
I am using the sort script shown below to achieve this, but it doesn't work for me:
"sort": [
{
"_script": {
"script": "try { Integer.parseInt(doc[\"segmentNumber.keyword\"].value); } catch(Exception e){ return Integer.MAX_VALUE;}",
"type": "number",
"order": "asc"
}
}
]
}
Could anyone please suggest how to achieve this? Thanks!
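One likely culprit in the script above: the try branch never returns the parsed value, and a Painless sort script must return a number on every path. A minimal corrected sketch (untested, and assuming segmentNumber is indexed with a .keyword sub-field) would be:

{
  "sort": [
    {
      "_script": {
        "type": "number",
        "order": "asc",
        "script": {
          "lang": "painless",
          "source": "try { return Integer.parseInt(doc['segmentNumber.keyword'].value); } catch (NumberFormatException e) { return Integer.MAX_VALUE; }"
        }
      }
    }
  ]
}

Flip "order" to "desc" for the descending case (note that unparseable values, mapped to Integer.MAX_VALUE, would then sort first).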

How to sort case-insensitively without changing the settings

My index name is data_new. Below is the code to insert documents into the index:
test = [
    {'id': 1, 'name': 'A', 'professor': ['Bill Cage', 'accounting']},
    {'id': 2, 'name': 'AB', 'professor': ['Gregg Payne', 'engineering']},
    {'id': 3, 'name': 'a', 'professor': ['Bill Cage', 'accounting']},
    {'id': 4, 'name': 'Tax Accounting 200', 'professor': ['Thomas Baszo', 'finance']},
    {'id': 5, 'name': 'Capital Markets 350', 'professor': ['Thomas Baszo', 'finance']},
    {'id': 6, 'name': 'Theatre 410', 'professor': ['Sebastian Hern', 'art']},
    {'id': 7, 'name': 'Accounting 101', 'professor': ['Thomas Baszo', 'finance']},
    {'id': 8, 'name': 'Marketing 101', 'professor': ['William Smith', 'finance']},
    {'id': 9, 'name': 'Anthropology 230', 'professor': ['Devin Cranford', 'history']},
    {'id': 10, 'name': 'Computer Science 101', 'professor': ['Gregg Payne', 'engineering']}
]
from elasticsearch import Elasticsearch

es = Elasticsearch()
es.indices.create(index='data_new', ignore=400)
for e in test:
    es.index(index="data_new", body=e, id=e['id'])
# Refresh so the freshly indexed documents are visible to search
es.indices.refresh(index="data_new")
search = es.search(index="data_new", body={"from": 0, "size": 2, "query": {"match_all": {}}})
search['hits']['hits']
Right now I get:
[{'id':1,'name': 'A'},
{ 'id':2, 'name': 'AB'},
{'id':3, 'name': 'a'}]
The expected order is:
[{'id':1,'name': 'A'},
{ 'id':3, 'name': 'a'},
{'id':2, 'name': 'AB'}]
For the input ["a", "b", "B", "C", "c", "A"]
the result is ["A", "B", "C", "a", "b", "c"],
but I want the output to be ["A", "a", "B", "b", "C", "c"].
Expected output: I need to sort the output with respect to name only, case-insensitively. I need to normalise the name keyword and sort on it.
How do I modify search = es.search(index="data_new", body={"from": 0, "size": 2, "query": {"match_all": {}}}) to do this?
I have updated the code as below, adding "normalizer": "case_insensitive" to the sort clause:
search = es.search(index="data_new", body={"sort": [{"name.keyword": {"order": "asc", "normalizer": "case_insensitive"}}], "size": 1000, "query": {"query_string": {"query": "A"}}})
I got this error:
RequestError: RequestError(400, 'x_content_parse_exception', '[1:41] [field_sort] unknown field [normalizer]')
In order to do this you will have to use a sort script that lowercases the value, e.g. doc['name.keyword'].value.toLowerCase():
https://www.elastic.co/guide/en/elasticsearch/painless/current/painless-sort-context.html
You can find another post which talks about it:
Script-based sorting on Elasticsearch date field
And a good article with an example here:
https://qbox.io/blog/how-to-painless-scripting-in-elasticsearch
The code will look like this (not tested):
{
  "query": {
    "match_all": {}
  },
  "sort": {
    "_script": {
      "type": "string",
      "order": "asc",
      "script": {
        "lang": "painless",
        "source": "doc['name.keyword'].value.toLowerCase()"
      }
    }
  }
}
Note: this is bad practice and you should do it only for a one-shot query. If you want your application to stay healthy you should implement the solution suggested by saeednasehi below.
You can also use index sorting to be more performant.
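For reference, index sorting is configured in the index settings at creation time; a minimal sketch (reusing the data_new index and its name.keyword sub-field from the question, which are assumptions about the mapping) looks like:

PUT /data_new
{
  "settings": {
    "index": {
      "sort.field": "name.keyword",
      "sort.order": "asc"
    }
  }
}

Keep in mind that index sorting cannot run a script, so it is only case-insensitive if the sort field itself carries a lowercase normalizer, as in the mapping below.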
In order to use a normalizer, you need to define it in your mapping; you cannot pass it as an argument in your search. In your case, you need two fields to sort on. I did this by copying the data to another field: the first field has a lowercase normalizer and the other one does not.
PUT /test_index/
{
  "settings": {
    "analysis": {
      "normalizer": {
        "myLowercase": {
          "type": "custom",
          "filter": ["lowercase"]
        }
      }
    }
  },
  "mappings": {
    "post": {
      "properties": {
        "name": {
          "type": "keyword",
          "normalizer": "myLowercase",
          "copy_to": ["name2"]
        },
        "name2": {
          "type": "keyword"
        }
      }
    }
  }
}
And your query would be something like this:
GET test_index/_search
{
  "query": {
    "match_all": {}
  },
  "sort": [
    {
      "name": {
        "order": "asc"
      }
    },
    {
      "name2": {
        "order": "asc"
      }
    }
  ]
}
This is the mapping and settings you need for the name field in your indices; add your other fields to the mapping as well. Note that this is for Elasticsearch versions below 7. If you use Elasticsearch 7 or later, you must remove the doc type (named post here) from the mapping, since mapping types were removed.
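Applied back to the question's Python client, a search against an index created with the mapping above would sort on the normalized field first and use the raw copy as a tie-breaker; a sketch (assuming data_new was created with that mapping) is:

# 'name' carries the lowercase normalizer, 'name2' is the raw keyword copy
search = es.search(index="data_new", body={
    "size": 1000,
    "query": {"match_all": {}},
    "sort": [{"name": {"order": "asc"}}, {"name2": {"order": "asc"}}]
})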

Sort by date + show past results after upcoming results

In Elasticsearch I'd like to sort results by start_date ascending, but with past dates showing up after upcoming dates.
Example desired results:
[
{id: 5, start_date: '3000-01-01'},
{id: 7, start_date: '3001-01-01'},
{id: 8, start_date: '3002-01-01'},
{id: 1, start_date: '1990-01-01'},
{id: 4, start_date: '1991-01-01'},
{id: 3, start_date: '1992-01-01'},
]
Something like this would be possible in SQL:
ORDER BY (start_date > NOW()) DESC, start_date ASC
But I'm not sure how to accomplish this in Elastic. The only thing I can think of would be to set a boolean is_upcoming flag and reindex that every day.
Also, I may be limiting and paginating the number of search results, so fetching them in reverse start_date order and then manipulating the results in my code isn't really doable.
It's perfectly possible using a sort script if your start_date is of type date and its format is yyyy-MM-dd (I found YYYY-... to not work properly).
GET future/_search
{
  "sort": [
    {
      "_script": {
        "type": "number",
        "script": {
          "lang": "painless",
          "source": "return doc['start_date'].value.millis > params.now ? (doc['start_date'].value.millis - params.now) : Long.MAX_VALUE",
          "params": {
            "now": 1594637988236
          }
        },
        "order": "asc"
      }
    },
    {
      "start_date": {
        "order": "asc"
      }
    }
  ]
}
The parametrized now is needed for synchronization reasons as described here.
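Since that value has to come from the client, a sketch of computing params.now and sending the query (Python here, matching the client used earlier on this page; the index name future comes from the request above) might look like:

import time

from elasticsearch import Elasticsearch

es = Elasticsearch()
# Evaluate "now" once on the client so every shard compares against the same instant
now_ms = int(time.time() * 1000)
search = es.search(index="future", body={
    "sort": [
        {
            "_script": {
                "type": "number",
                "order": "asc",
                "script": {
                    "lang": "painless",
                    "source": "return doc['start_date'].value.millis > params.now ? (doc['start_date'].value.millis - params.now) : Long.MAX_VALUE",
                    "params": {"now": now_ms}
                }
            }
        },
        {"start_date": {"order": "asc"}}
    ]
})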

JMESPath current array index

In JMESPath with this query:
people[].{"index":#.index,"name":name, "state":state.name}
On this example data:
{
"people": [
{
"name": "a",
"state": {"name": "up"}
},
{
"name": "b",
"state": {"name": "down"}
},
{
"name": "c",
"state": {"name": "up"}
}
]
}
I get:
[
{
"index": null,
"name": "a",
"state": "up"
},
{
"index": null,
"name": "b",
"state": "down"
},
{
"index": null,
"name": "c",
"state": "up"
}
]
How do I get the index property to actually contain the index of the array element? I realize that #.index is not the correct syntax, but I have not been able to find a function that returns the index. Is there a way to include the current array index?
Use-case
Use the JMESPath query syntax to extract the numeric index of the current array element from a series of array elements.
Pitfalls
As of this writing (2019-03-22), this feature is not part of the standard JMESPath specification.
Workaround
This is possible when running JMESPath from within any of various programming languages; however, it must be done outside of JMESPath.
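For example, a sketch of the workaround in Python (assuming the jmespath package from PyPI):

import jmespath

data = {
    "people": [
        {"name": "a", "state": {"name": "up"}},
        {"name": "b", "state": {"name": "down"}},
        {"name": "c", "state": {"name": "up"}},
    ]
}

# Extract the fields with JMESPath, then attach the index in the host language
people = jmespath.search("people[].{name: name, state: state.name}", data)
result = [{"index": i, **person} for i, person in enumerate(people)]
# [{'index': 0, 'name': 'a', 'state': 'up'}, ...]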
This is not exactly the form you requested but I have a possible answer for you:
people[].{"name":name, "state":state.name} | merge({count: length(#)}, #[*])
This request gives this result:
{
"0": {
"name": "a",
"state": "up"
},
"1": {
"name": "b",
"state": "down"
},
"2": {
"name": "c",
"state": "up"
},
"count": 3
}
So each attribute of this object has an index, except the last one, count, which just holds the number of attributes. If you want to iterate over the object's attributes with a loop, for example, you can do so because the count attribute tells you how many attributes there are to visit.
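A sketch of such a loop in Python, using the merged result above:

# 'merged' is the object produced by the JMESPath expression above
merged = {
    "0": {"name": "a", "state": "up"},
    "1": {"name": "b", "state": "down"},
    "2": {"name": "c", "state": "up"},
    "count": 3,
}

for i in range(merged["count"]):
    item = merged[str(i)]  # the numeric keys are strings
    print(i, item["name"], item["state"])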

mgo with aggregation and grouping

I am trying to perform a query using golang mgo, to effectively get distinct values from a join. I understand that this might not be the best paradigm to work with in Mongo.
Something like this:
pipe := []bson.M{
    {
        "$group": bson.M{
            "_id": bson.M{"user": "$user"},
        },
    },
    {
        "$match": bson.M{
            "_id":  bson.M{"$exists": 1},
            "user": bson.M{"$exists": 1},
            "date_updated": bson.M{
                "$gt": durationDays,
            },
        },
    },
    {
        "$lookup": bson.M{
            "from":         "users",
            "localField":   "user",
            "foreignField": "_id",
            "as":           "user_details",
        },
    },
    {
        "$lookup": bson.M{
            "from":         "organizations",
            "localField":   "organization",
            "foreignField": "_id",
            "as":           "organization_details",
        },
    },
}
err := d.Pipe(pipe).All(&result)
If I comment out the $group section, the query returns the join as expected.
If I run it as-is, I get NULL.
If I move the $group to the bottom of the pipe, I get an array response with null values.
Is it possible to do an aggregation with a $group (with the goal of simulating DISTINCT)?
The reason you're getting NULL is that your $match filter is filtering out all of the documents after the $group phase.
After your first $group stage, the documents look like this:
{"_id": { "user": "foo"}},
{"_id": { "user": "bar"}},
{"_id": { "user": "baz"}}
They no longer contain the other fields, i.e. user, date_updated and organization. If you would like to keep their values, you can utilise a Group Accumulator Operator. Depending on your use case you may also benefit from using Aggregation Expression Variables.
As an example using the mongo shell, let's use the $first operator, which simply picks the first occurrence. This may make sense for organization but not for date_updated; please choose a more appropriate accumulator operator.
{"$group": {
"_id":"$user",
"date_updated": {"$first":"$date_updated"},
"organization": {"$first":"$organization"}
}
}
Note that the above also replaces {"_id":{"user":"$user"}} with simpler {"_id":"$user"}.
Next we'll add a $project stage to rename the result of the _id field from the group operation back to user, and carry along the other fields without modification.
{"$project": {
"user": "$_id",
"date_updated": 1,
"organization": 1
}
}
Your $match stage can be simplified by listing only the date_updated filter. We can remove _id since it's no longer relevant at this point in the pipeline, and if you would like to make sure that you only process documents with a user value, you should place the $match before the $group. See Aggregation Pipeline Optimization for more.
So, all of those combined will look something like this:
[
  {"$group": {
    "_id": "$user",
    "date_updated": {"$first": "$date_updated"},
    "organization": {"$first": "$organization"}
  }},
  {"$project": {
    "user": "$_id",
    "date_updated": 1,
    "organization": 1
  }},
  {"$match": {
    "date_updated": {"$gt": durationDays}
  }},
  {"$lookup": {
    "from": "users",
    "localField": "user",
    "foreignField": "_id",
    "as": "user_details"
  }},
  {"$lookup": {
    "from": "organizations",
    "localField": "organization",
    "foreignField": "_id",
    "as": "organization_details"
  }}
]
(I know you're aware of this.) Lastly, based on the database schema above with users and organizations collections, depending on your application's use case you may want to reconsider embedding some values. You may find 6 Rules of Thumb for MongoDB Schema Design useful.

Elasticsearch: search by sub-collection value

I need help with a specific ES query. I have objects in an Elasticsearch index. Here is an example of one of them (a Participant):
{
  "_id": null,
  "ObjectID": 6008,
  "EventID": null,
  "IndexName": "crmws",
  "version_id": 66244,
  "ObjectData": {
    "PARTICIPANTTYPE": "2",
    "STATE": "ACTIVE",
    "EXTERNALID": "01010111",
    "CREATORID": 1006,
    "partAttributeList": [
      {
        "SYSNAME": "A",
        "VALUE": "V1"
      },
      {
        "SYSNAME": "B",
        "VALUE": "V2"
      },
      {
        "SYSNAME": "C",
        "VALUE": "V2"
      }
    ],
    ....
I need to find entities by their partAttributeList entries; for example, the whole Participant entity that has SYSNAME=A and VALUE=V1 in the same partAttributeList entry.
If I use the usual matches:
{"match": {"ObjectData.partAttributeList.SYSNAME": "A"}},
{"match": {"ObjectData.partAttributeList.VALUE": "V1"}}
Of course this will find more objects than I really need. Here is an example of a redundant object that can be found:
...
{
  "SYSNAME": "A",
  "VALUE": "X"
},
{
  "SYSNAME": "B",
  "VALUE": "V1"
}
...
What I gather you are trying to do is search multiple fields of the same object for exact matches of a piece of text, so please try this out:
https://www.elastic.co/guide/en/elasticsearch/guide/current/multi-query-strings.html
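A sketch of what that looks like for the fields in the question (combining both clauses in a bool query; untested against the mapping above):

{
  "query": {
    "bool": {
      "must": [
        {"match": {"ObjectData.partAttributeList.SYSNAME": "A"}},
        {"match": {"ObjectData.partAttributeList.VALUE": "V1"}}
      ]
    }
  }
}

Be aware that with a plain object mapping these clauses can still match across different array entries, as the question observed; constraining both matches to the same entry generally requires mapping partAttributeList as a nested type and using a nested query.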
