Spring Redis sort keys

I have the following keys in Redis (Spring Data Redis):
localhost> KEYS *
"1+ { \"_id\" : \"1\", \"Name\" : \"C5796\" , \"Site\" : \"DRG1\"}"
"2+ { \"_id\" : \"2\", \"Name\" : \"CX1XE\" , \"Site\" : \"DG1\"}"
"3+ { \"_id\" : \"3\", \"Name\" : \"C553\" , \"Site\" : \"DG1\"}"
If I want to sort by id/name/site, how can I do it in Spring Redis?
List<Object> keys = redistemplate.sort(SortQueryBuilder.sort("Customer").build());
and,
SortQuery<String> sort = SortQueryBuilder.sort(key).noSort().get(field).build();
List<?> keys = redistemplate.boundHashOps(key).getOperations().sort(sort);
are not working.

The code is at the end of the post; if you are already familiar with how to sort multiple hash keys in Redis, skip the explanation and read the code directly.
Redis SORT is designed to sort the elements of a List/Set/ZSet, but it can also be used to order multiple keys by a field we specify. We can use SORT to sort multiple hash keys by a given field, but there is a limitation on the naming pattern of those hash keys.
For example, if the hash keys follow the pattern "hash{i}" (where i is an integer), we can sort them.
127.0.0.1:6379> keys hash*
1) "hash3"
2) "hash2"
3) "hash1"
Take a look at the content of hash1:
127.0.0.1:6379> hgetall hash1
1) "id"
2) "24"
3) "name"
4) "kobe"
Every hash key contains two fields: "id" and "name". What should we do if we want to sort these hash keys by id?
First, add a set key named "myset" containing the members {"1", "2", "3"}.
127.0.0.1:6379> smembers myset
1) "1"
2) "2"
3) "3"
Then run the following command:
127.0.0.1:6379> SORT myset BY hash*->id GET hash*->id GET hash*->name
1) "3"
2) "wade"
3) "24"
4) "kobe"
5) "30"
6) "curry"
Eureka: hash1 through hash3 are sorted by id.
Here is the code that uses Spring Redis to do the job:
public static String getRandomStr() {
    return String.valueOf(new Random().nextInt(100));
}

public static void redisTemplateSort(RedisTemplate redisTemplate) {
    String sortKey = "sortKey";
    StringRedisSerializer stringRedisSerializer = new StringRedisSerializer();
    redisTemplate.setKeySerializer(stringRedisSerializer);
    redisTemplate.setValueSerializer(stringRedisSerializer);
    redisTemplate.setHashKeySerializer(stringRedisSerializer);
    redisTemplate.setHashValueSerializer(stringRedisSerializer);
    redisTemplate.delete(sortKey);
    if (!redisTemplate.hasKey(sortKey)) {
        for (int i = 0; i < 10; i++) {
            redisTemplate.boundSetOps(sortKey).add(String.valueOf(i));
            String hashKey = "hash" + i,
                   strId = String.valueOf(i),
                   strName = getRandomStr(),
                   strSite = getRandomStr();
            redisTemplate.boundHashOps(hashKey).put("_id", strId);
            redisTemplate.boundHashOps(hashKey).put("Name", strName);
            redisTemplate.boundHashOps(hashKey).put("Site", strSite);
            System.out.printf("%s : {\"_id\": %s, \"Name\": %s, \"Site\": %s}\n",
                    hashKey, strId, strName, strSite);
        }
    }
    SortQuery<String> sortQuery = SortQueryBuilder.sort(sortKey).by("hash*->Name")
            .get("hash*->_id").get("hash*->Name").get("hash*->Site").build();
    // The GET clauses return values in the order they were specified:
    // index i is _id, i+1 is Name, i+2 is Site.
    List<String> sortRslt = redisTemplate.sort(sortQuery);
    for (int i = 0; i < sortRslt.size(); i += 3) {
        System.out.printf("{\"_id\": %s, \"Name\": %s, \"Site\": %s}\n",
                sortRslt.get(i), sortRslt.get(i + 1), sortRslt.get(i + 2));
    }
}
Result of running redisTemplateSort(redisTemplate) (sorting by Name, as in the code):
hash0 : {"_id": 0, "Name": 59, "Site": 60}
hash1 : {"_id": 1, "Name": 37, "Site": 57}
hash2 : {"_id": 2, "Name": 6, "Site": 40}
hash3 : {"_id": 3, "Name": 91, "Site": 58}
hash4 : {"_id": 4, "Name": 39, "Site": 32}
hash5 : {"_id": 5, "Name": 27, "Site": 82}
hash6 : {"_id": 6, "Name": 43, "Site": 10}
hash7 : {"_id": 7, "Name": 17, "Site": 55}
hash8 : {"_id": 8, "Name": 14, "Site": 91}
hash9 : {"_id": 9, "Name": 39, "Site": 91}
{"_id": 2, "Name": 6, "Site": 40}
{"_id": 8, "Name": 14, "Site": 91}
{"_id": 7, "Name": 17, "Site": 55}
{"_id": 5, "Name": 27, "Site": 82}
{"_id": 1, "Name": 37, "Site": 57}
{"_id": 4, "Name": 39, "Site": 32}
{"_id": 9, "Name": 39, "Site": 91}
{"_id": 6, "Name": 43, "Site": 10}
{"_id": 0, "Name": 59, "Site": 60}
{"_id": 3, "Name": 91, "Site": 58}

I don't know about Spring Data Redis, so let me give you a sample that achieves this in plain Redis. Say you have hashes holding id, name, and site, and a list holding the keys of those hashes.
My structure will be like:
lpush("Values", 1);
hset("hash_1", "id", "1"), hset("hash_1", "Name", "C5796"), hset("hash_1", "Site", "DRG1")
For the second hash:
lpush("Values", 2);
...
Do the same for all the values you want to set in the hashes. Now, to sort, you do this:
SORT "Values" BY hash_*->id GET hash_*->id GET hash_*->Name GET hash_*->Site
This returns the hashes sorted in ascending order by id; you can do the same for Name/Site. For more info about sorting in Redis: http://redis.io/commands/sort

Related

two level nested aggregation in elastic search based on condition over first level aggregation

My ES document structure is like this:
{
"_index": "my_index",
"_type": "_doc",
"_id": "1296",
"_version": 1,
"_seq_no": 431,
"_primary_term": 1,
"_routing": "1296",
"found": true,
"_source": {
"id": 1296,
"test_name": "abc",
"test_id": 513,
"inventory_arr": [
{
"city": "bangalore",
"after_tat": 168,
"before_tat": 54,
"popularity_score": 15,
"rank": 0,
"discounted_price": 710,
"labs": [
{
"lab_id": 395,
"lab_name": "Prednalytics Laboratory",
"lab_rating": 34
},
{
"lab_id": 363,
"lab_name": "Neuberg Diagnostics",
"lab_rating": 408
}
]
},
{
"city": "mumbai",
"after_tat": 168,
"before_tat": 54,
"popularity_score": 15,
"rank": 0,
"discounted_price": 710,
"labs": [
{
"lab_id": 395,
"lab_name": "Prednalytics Laboratory",
"lab_rating": 34
},
{
"lab_id": 380,
"lab_name": "Neuberg Diagnostics",
"lab_rating": 408
}
]
}
]
}
}
I want to know how many tests are performed in each lab that is in Bangalore.
The problem I'm facing is this:
If I group by lab_id using a nested aggregation, it groups by each lab regardless of which city it is in.
Suppose there is only one record in my index; then for the city Bangalore I expect an answer like this:
[
{key: 395, doc_count: 1}
{key: 363, doc_count: 1}
]
Note: lab id can be duplicated in each city.
This problem can be solved using a filter aggregation.
When you use a nested aggregation, you iterate over the nested documents. The filter aggregation filters out the nested documents that don't match the query you provide inside it. In your case, you want to filter out the nested documents that aren't in the city of Bangalore. After removing those nested documents, you can use a terms bucket aggregation on lab_id.
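A rough sketch of such a query, assuming inventory_arr and inventory_arr.labs are both mapped as nested fields (names taken from the document above; adjust to your mapping):

```json
{
  "size": 0,
  "aggs": {
    "inventories": {
      "nested": { "path": "inventory_arr" },
      "aggs": {
        "bangalore_only": {
          "filter": { "term": { "inventory_arr.city": "bangalore" } },
          "aggs": {
            "labs": {
              "nested": { "path": "inventory_arr.labs" },
              "aggs": {
                "by_lab": { "terms": { "field": "inventory_arr.labs.lab_id" } }
              }
            }
          }
        }
      }
    }
  }
}
```

The filter sits between the two nested levels, so only labs inside Bangalore inventory entries reach the terms aggregation.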
Good luck!

How can I query docs base on its field value in other index in ElasticSearch 7

I have two indexes in elasticsearch 7.
How do I sort user docs by age, based on the user_ids in group "abc"?
Here's some sample docs in the indexes.
================ index: group ====================
# doc
{
"abc": [1, 3, 5, 7],
"efg": [1, 3, 46, 53]
}
================ index: user ====================
# doc
{
"user_id": 1,
"age": 28
}
{
"user_id": 3,
"age": 21
}
{
"user_id": 46,
"age": 29
}
Expected Return
[
{
"user_id": 3,
"age": 21
},
{
"user_id": 1,
"age": 28
}
]
Or, for this purpose, should my doc structure in the group index be like the one below instead of the one above?
{
"group_id": "abc",
"users": [1, 3, 5, 7]
}
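One hedged sketch of an approach: Elasticsearch cannot join across indices in a single query, so a common pattern is two round trips — first fetch the group doc to get the member ids, then query the user index with a terms query, sorted by age. Assuming the restructured group doc above and the user fields shown:

```json
{
  "query": { "terms": { "user_id": [1, 3, 5, 7] } },
  "sort": [ { "age": { "order": "asc" } } ]
}
```

Elasticsearch also supports a "terms lookup" form of the terms query that reads the id list from a document in another index, which would avoid the first round trip; check the terms query documentation for the exact syntax in your version.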

Want to get distinct records in hits section from elasticsearch

I want to get all the distinct records as per "departmentNo" .
Please check the index data below (it is dummy data):
{'departmentNo': 1, 'departmentName': 'Food', 'departmentLoc': "I1", "departmentScore": "5", "employeeid" : 1, "employeeName": "vijay", ...}
{'departmentNo': 1, 'departmentName': 'Food', 'departmentLoc': "I1", "departmentScore": "5", "employeeid" : 2, "employeeName": "rathod", ...}
{'departmentNo': 2, 'departmentName': 'Non-Food', 'departmentLoc': "I2", "departmentScore": "6", "employeeid" : 3, "employeeName": "ajay", ...}
{'departmentNo': 2, 'departmentName': 'Non-Food', 'departmentLoc': "I2", "departmentScore": "6", "employeeid" : 4, "employeeName": "kamal", ...}
{'departmentNo': 1, 'departmentName': 'Food', 'departmentLoc': "I1", "departmentScore": "5", "employeeid" : 5, "employeeName": "rahul", ...}
I want the below output.
{'departmentNo': 1, 'departmentName': 'Food', 'departmentLoc': "I1", "departmentScore": "5", "employeeid" : 1, "employeeName": "vijay", ...}
{'departmentNo': 2, 'departmentName': 'Non-Food', 'departmentLoc': "I2", "departmentScore": "6", "employeeid" : 3, "employeeName": "ajay", ...}
I was trying to get the data in the hits section but didn't find an answer, so I tried an aggregation, using the query below:
{
"size": 0,
"aggs": {
"Group_By_Dept": {
"terms": {
"field": "departmentNo"
},
"aggs": {
"group_docs": {
"top_hits": {
"size": 1
}
}
}
}
}
}
The above query gets me the data, but I want all the distinct data with support for pagination + sorting.
In Elasticsearch 6.0 we could use bucket_sort, but I am using 5.6.7, so I can't use bucket_sort.
Can I do it some other way?
It would be good if I could get the data in the hits section.
(I don't want to change my index mapping. The mapping here is dummy data, but the use case is the same.)
You can do that by using field collapsing:
{
"query": { ... },
"from": 153,
"size": 27,
"collapse": {
"field": "departmentNo"
}
}
This leaves only one document for each repeating value of that field. You can control which document it is by using a standard sort (i.e. the document with the highest sort value among the collapsed ones is returned).
Please note there is additional functionality called inner hits, which you may want to use in the future; be aware that it multiplies document fetches and negatively affects performance.
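Combining collapsing with sorting and pagination might look like the sketch below (assuming departmentNo is mapped as a sortable field such as keyword or integer; adjust the sort field to whatever ordering you need):

```json
{
  "query": { "match_all": {} },
  "from": 0,
  "size": 10,
  "sort": [ { "departmentNo": { "order": "asc" } } ],
  "collapse": { "field": "departmentNo" }
}
```

from/size paginate over the collapsed result set, and the sort decides both the page order and which document survives the collapse for each departmentNo.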

SUTime custom date parsing

Need to parse a sentence like "Bob was born on Jan fifteen nineteen seventy nine." and extract the date. How do I create a new rule to handle the date expression?
If I use "Bob was born on Jan fifteenth nineteen seventy nine." the parser extracts the correct date 01/15/1979. Simply changing "fifteenth" to "fifteen" leads to incorrect parsing.
val input = "Bob was born on Jan fifteenth nineteen seventy nine."
val document = new CoreDocument(input)
val props = new Properties()
val annotators_ner = "tokenize,ssplit,pos,lemma,ner"
props.setProperty("annotators", annotators_ner)
val pipeline = new StanfordCoreNLP(props)
pipeline.annotate(document)
val writer = new StringWriter
pipeline.jsonPrint(document.annotation(), writer);
val json = writer.toString()
println(json)
The json excerpt below shows correct entity mentions.
"entitymentions": [
{
"docTokenBegin": 0,
"docTokenEnd": 1,
"tokenBegin": 0,
"tokenEnd": 1,
"text": "Bob",
"characterOffsetBegin": 0,
"characterOffsetEnd": 3,
"ner": "PERSON"
},
{
"docTokenBegin": 4,
"docTokenEnd": 9,
"tokenBegin": 4,
"tokenEnd": 9,
"text": "Jan fifteenth nineteen seventy nine",
"characterOffsetBegin": 16,
"characterOffsetEnd": 51,
"ner": "DATE",
"normalizedNER": "1979-01-15",
"timex": {
"tid": "t1",
"type": "DATE",
"value": "1979-01-15"
}
}
]
Changing the input as follows:
val input = "Bob was born on Jan fifteen nineteen seventy nine."
leads to the following output for entity mentions:
"entitymentions": [
{
"docTokenBegin": 0,
"docTokenEnd": 1,
"tokenBegin": 0,
"tokenEnd": 1,
"text": "Bob",
"characterOffsetBegin": 0,
"characterOffsetEnd": 3,
"ner": "PERSON"
},
{
"docTokenBegin": 4,
"docTokenEnd": 7,
"tokenBegin": 4,
"tokenEnd": 7,
"text": "Jan fifteen nineteen",
"characterOffsetBegin": 16,
"characterOffsetEnd": 36,
"ner": "DATE",
"normalizedNER": "1519-01",
"timex": {
"tid": "t1",
"type": "DATE",
"value": "1519-01"
}
},
{
"docTokenBegin": 7,
"docTokenEnd": 9,
"tokenBegin": 7,
"tokenEnd": 9,
"text": "seventy nine",
"characterOffsetBegin": 37,
"characterOffsetEnd": 49,
"ner": "NUMBER",
"normalizedNER": "79.0"
}
]
This is complicated because the rules appear to recognize "Jan fifteen nineteen" as January 1519. One thing you could try is adding post-processing rules if you find a common pattern, such as a "DATE" followed by a "NUMBER" that is actually a written-out "DATE". But you can imagine scenarios where that heuristic breaks, such as "In Jan Fifteen Nineteen Seventy Nine people visited".
There is a full write up about how to add TokensRegex rules to the ner pipeline here: https://stanfordnlp.github.io/CoreNLP/ner.html
and info on writing tokensregex rules here: https://stanfordnlp.github.io/CoreNLP/tokensregex.html
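As a hedged illustration only (the annotation key and exact syntax should be verified against the TokensRegex documentation linked above), a post-processing rule that re-tags a DATE mention immediately followed by a NUMBER mention as one DATE might look roughly like this in a TokensRegex rules file:

```
// Bind a short name to the NER annotation key (hypothetical sketch).
ner = { type: "CLASS", value: "edu.stanford.nlp.ling.CoreAnnotations$NamedEntityTagAnnotation" }

// If a run of DATE tokens is directly followed by a run of NUMBER tokens,
// re-annotate the whole span as a single DATE.
{ ruleType: "tokens",
  pattern: ( [ { ner:"DATE" } ]+ [ { ner:"NUMBER" } ]+ ),
  action: ( Annotate($0, ner, "DATE") ) }
```

This only fixes the NER label, not the normalized date value, and as noted above it can mis-fire on sentences where the number is not part of the date.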

How do I iterate over this hash selectively?

{
"menu": {
"header": "menu",
"items": [
{"id": 27},
{"id": 0, "label": "Label 0"},
null,
{"id": 93},
{"id": 85},
{"id": 54},
null,
{"id": 46, "label": "Label 46"}
]
}
}
Above is the JSON I am trying to iterate through. Essentially, I want the value of the "id" key whenever that hash also has a "label" key.
So the above should return 0 and 46.
I am stuck here:
require 'json'
line = '{"menu": {"header": "menu", "items": [{"id": 27}, {"id": 0, "label": "Label 0"}, null, {"id": 93}, {"id": 85}, {"id": 54}, null, {"id": 46, "label": "Label 46"}]}}'
my_parse = JSON.parse(line)
items = my_parse['menu']['items'].compact.select { |item| item['label'] }
puts items.inject
Use Array#select to identify the elements that have both "id" and "label", then Array#map to pluck only the "id"s.
hash = JSON.parse(your_json_string)
hash['menu']['items'].select { |h| h && h['id'] && h['label'] }.map {|h| h['id']}
# => [0, 46]
A more cleaned-up version could look like this:
def ids_with_label(json_str)
hash = JSON.parse(json_str)
items = hash['menu']['items']
items_with_label = items.select { |h| h && h.include?('id') && h.include?('label') }
ids = items_with_label.map { |h| h['id'] }
ids
end
ids_with_label(your_json_string) # => [0, 46]
I do not know if this is what you want exactly:
items = my_parse['menu']['items'].compact.select { |item| item.has_key?('label') }.map{ |item| item['id'] }
There's no need to create a temporary array:
my_parse['menu']['items'].each_with_object([]) { |o,a|
a << o['id'] if o && o.key?('id') && o.key?('label') }
#=> [0, 46]
