Request specific fields (not all fields) with RediSearch - performance

Here's an official example of a RediSearch query:
127.0.0.1:6379> FT.SEARCH myIdx "hello world" LIMIT 0 10
1) (integer) 1
2) "doc1"
3) 1) "title"
2) "hello world"
3) "body"
4) "lorem ipsum"
5) "url"
6) "http://redis.io"
My question is: how can I request just one or two fields, e.g. get back only the "title" value ("hello world"), or only the ID and "title" fields ([1, "hello world"])? Mainly for performance reasons.

Yes, RediSearch supports this with the RETURN option:
127.0.0.1:6379> FT.SEARCH myIdx "hello world" LIMIT 0 10 RETURN 2 title url
See: https://oss.redislabs.com/redisearch/Commands/#ftsearch

How to get all HMSET - Laravel Redis

[Screenshot: example HMSET data]
This is my demo HMSET example. I try to get the whole list with Redis::hgetall("hmset_demo:user_id:*"), but it returns an empty array [].
How can I get this return:
[{
    "id": "1",
    "name": "name edited 1",
    "description": "desc edited 1",
    "avatar": "xxx 1"
},
{
    "id": "1",
    "name": "name edited 1",
    "description": "desc edited 1",
    "avatar": "xxx 1"
},
...
]
PS: I use the following code to insert:
Redis::hmset("hmset_demo:user_id:".$id, [
    'id' => $id,
    'name' => 'name edited '.$id,
    'description' => 'desc edited '.$id,
    'avatar' => 'xxx '.$id,
]);
I can get specific hmset data by this code:
$id = 1;
Redis::hgetall("hmset_demo:user_id:".$id);
You cannot use wildcards in key names.
Let's see an example. We store two hash-type keys:
127.0.0.1:6379> HSET hmset_demo:user_id:0 NAME PETER
127.0.0.1:6379> HSET hmset_demo:user_id:1 NAME GWEN
We can retrieve each hash's values by its key:
127.0.0.1:6379> HGETALL hmset_demo:user_id:0
1) "NAME"
2) "PETER"
127.0.0.1:6379> HGETALL hmset_demo:user_id:1
1) "NAME"
2) "GWEN"
But we cannot use wildcards with the HGETALL command:
127.0.0.1:6379> HGETALL hmset_demo:user_id:*
(empty array)
If you need to use wildcards to retrieve many keys, you can use the SCAN command:
127.0.0.1:6379> SCAN 0 MATCH hmset_demo:user_id:* COUNT 10
1) "0"
2) 1) "hmset_demo:user_id:1"
2) "hmset_demo:user_id:0"
Iterate over each element of the result set and retrieve its values using HGETALL. There is no way to retrieve everything with just one Redis command.
Also, the HMSET command is deprecated; you should use HSET instead.
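The scan-then-fetch loop can be sketched in Python. This is a minimal simulation against an in-memory dict standing in for Redis (a real application would use a client such as redis-py, iterating with SCAN and fetching each hash with HGETALL); the keys and values are the ones from the example above:

```python
import fnmatch

# In-memory stand-in for a Redis instance holding hash-type keys.
store = {
    "hmset_demo:user_id:0": {"NAME": "PETER"},
    "hmset_demo:user_id:1": {"NAME": "GWEN"},
    "unrelated:key": {"x": "y"},
}

def scan_match(pattern):
    """Mimic SCAN MATCH: yield every key matching the glob pattern."""
    for key in store:
        if fnmatch.fnmatch(key, pattern):
            yield key

def get_all_hashes(pattern):
    """SCAN for matching keys, then fetch each hash individually (HGETALL)."""
    return {key: dict(store[key]) for key in scan_match(pattern)}

users = get_all_hashes("hmset_demo:user_id:*")
```

Note that with a real server SCAN is cursor-based, so the client keeps calling it until the returned cursor is "0"; the one-pass loop above glosses over that detail.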

elasticsearch - query between document types

I have a production_order document type, e.g.
{
    "part_number": "abc123",
    "start_date": "2018-01-20"
},
{
    "part_number": "1234",
    "start_date": "2018-04-16"
}
I want to create a commodity document type, e.g.
{
    "part_number": "abc123",
    "commodity": "1 meter machining"
},
{
    "part_number": "1234",
    "commodity": "small flat & form"
}
Production orders are data-warehoused every week and are immutable.
Commodities, on the other hand, could change over time; e.g. abc123 could change from "1 meter machining" to "5 meter machining", so I don't want to store this data with the production_order records.
If a user searches for "small flat & form" in the commodity document type, I want to pull all matching records from the production_order document type, the match being on part number.
Obviously I can do this in a relational database with a join. Is it possible to do the same in Elasticsearch?
If it helps, we have about 500k part numbers that will be commoditized and our production order data warehouse currently holds 20 million records.
I have found that you can indeed now query between indices in Elasticsearch; however, you have to ensure your data is stored correctly. Here is an example from the Elasticsearch 6.x docs:
Terms lookup twitter example: at first we index the information for user with id 2, specifically its followers, then index a tweet from user with id 1. Finally we search on all the tweets that match the followers of user 2.
PUT /users/user/2
{
    "followers" : ["1", "3"]
}

PUT /tweets/tweet/1
{
    "user" : "1"
}

GET /tweets/_search
{
    "query" : {
        "terms" : {
            "user" : {
                "index" : "users",
                "type" : "user",
                "id" : "2",
                "path" : "followers"
            }
        }
    }
}
Here is the link to the original page: https://www.elastic.co/guide/en/elasticsearch/reference/6.1/query-dsl-terms-query.html
In my case I need to set up my storage so that commodity is a field and its values are an array of part numbers, e.g.
{
    "1 meter machining": ["abc123", "1234"]
}
I can then look up the "1 meter machining" part numbers against my production_order documents. I have tested this and it works.
There are no joins in Elasticsearch.
You can query twice: first get all the part numbers matching "small flat & form", then use those part numbers to query the other index.
Alternatively, try to find a way to merge these into a single index; that would be better. Updating the commodities would not cause you any problems after combining the two.
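A minimal sketch of the two-query approach, with plain Python lists standing in for the two indices (find_orders_by_commodity is a made-up helper name; against a real cluster each step would be an Elasticsearch query, the second typically a terms filter):

```python
# The two lists stand in for the commodity and production_order indices.
commodities = [
    {"part_number": "abc123", "commodity": "1 meter machining"},
    {"part_number": "1234", "commodity": "small flat & form"},
]
production_orders = [
    {"part_number": "abc123", "start_date": "2018-01-20"},
    {"part_number": "1234", "start_date": "2018-04-16"},
]

def find_orders_by_commodity(term):
    # Query 1: collect the part numbers carrying the commodity.
    parts = {c["part_number"] for c in commodities if c["commodity"] == term}
    # Query 2: a terms-style filter on the production orders.
    return [o for o in production_orders if o["part_number"] in parts]

orders = find_orders_by_commodity("small flat & form")
```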

ElasticSearch boosting relevance based on the count of the field value

I'm trying to boost relevance based on the count of a field value: the lower the count, the more relevant the document.
For example, I have 1001 documents. 1000 documents are written by John, and only one is written by Joe.
// 1000 documents by John
{"title": "abc 1", "author": "John"}
{"title": "abc 2", "author": "John"}
// ...
{"title": "abc 1000", "author": "John"}
// 1 document by Joe
{"title": "abc 1", "author": "Joe"}
I'll get 1001 documents when I search for "abc" in the title field. These documents will have pretty similar relevance scores if the titles are not exactly the same. The count of the field value "John" is 1000 and the count of "Joe" is 1. I'd like to boost the relevance of the document {"title": "abc 1", "author": "Joe"}; otherwise, it would be really hard to find the document by Joe.
Thank you!
In case someone runs into the same use case, here is my workaround using a Function Score Query. It makes at least two calls to the Elasticsearch server:
1. Get the counts for each person (you can use the aggregations feature). In our example, we get 1000 for John and 1 for Joe.
2. Generate a weight from each count; the higher the count, the lower the weight. Something like 1 + sqrt(1/1000) for John and 1 + sqrt(1/1) for Joe.
3. Use the weights in a script to calculate the score according to the author value (the script can be much better):
{
    "query": {
        "function_score": {
            "query": {
                "match": { "title": "abc" }
            },
            "script_score" : {
                "script" : {
                    "inline": "if (doc['author'].value == 'John') { return (1 + Math.sqrt(1.0/1000)) * _score } return (1 + Math.sqrt(1.0/1)) * _score;"
                }
            }
        }
    }
}
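The 1 + sqrt(1/count) weighting can be checked with a quick Python sketch (author_weight is a hypothetical helper; the real counts would come from the aggregation call):

```python
import math

def author_weight(doc_count):
    # The more documents an author has, the smaller the boost.
    return 1 + math.sqrt(1 / doc_count)

weights = {"John": author_weight(1000), "Joe": author_weight(1)}
# Joe's lone document is boosted by a factor of 2.0, John's by only ~1.03,
# which is enough to lift Joe's document over similarly scored ones.
```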

How to get the word count for all the documents based on index and type in elasticsearch?

If I have a few documents, how do I get the count of each word across all documents for a particular field?
For example:
doc1: "aaa bbb aaa ccc"
doc2: "aaa ccc"
doc3: "www"
I want: aaa-3, bbb-1, ccc-2, www-1
If you want the document counts, you can get them using a terms aggregation like this:
POST your_index/_search
{
    "aggs" : {
        "counts" : {
            "terms" : { "field" : "your_field" }
        }
    }
}
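Note that a terms aggregation counts documents, not term occurrences: assuming the field is analyzed into whitespace tokens, doc1 contributes only one count for "aaa" even though it contains it twice. A plain-Python sketch of the bucket counts you would get for the example docs:

```python
from collections import Counter

# The example documents' field values.
docs = {
    "doc1": "aaa bbb aaa ccc",
    "doc2": "aaa ccc",
    "doc3": "www",
}

# A terms aggregation buckets by document: each document counts at most
# once per term, no matter how often the term occurs inside it.
doc_counts = Counter()
for text in docs.values():
    doc_counts.update(set(text.split()))
# doc_counts gives aaa-2, bbb-1, ccc-2, www-1: not the totals asked for.
```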
UPDATE
If you want the term counts, you need to use the _termvector API; however, you can only query one document at a time.
GET /your_index/your_type/1/_termvector?fields=your_field
And for doc1 you'll get
aaa: 2
bbb: 1
ccc: 1
The multi-term vectors API can help, but you still need to specify the documents to get the term vectors from:
POST /your_index/your_type/_mtermvectors
{
    "docs": [
        { "_id": "1" },
        { "_id": "2" },
        { "_id": "3" }
    ]
}
And for your docs you'll get
aaa: 2 + 1
bbb: 1
ccc: 1 + 1
www: 1

Uniformly distributing results in elastic search based on an attribute

I am using Tire to perform searches on sets of objects that have a category attribute (there are 6 different categories).
I want the results to come in pages of 6 with one of each category on a page (while it is possible).
Eg1. If the first, second and third categories had 2 objects each and the fourth, fifth and sixth categories had 1 object each, the pages would look like:
Data: [1,1,2,2,3,3,4,5,6]
1: 1,2,3,4,5,6
2: 1,2,3
Eg2. [1,1,1,1,1,2,2,3,4,5]
1: 1,2,3,4,5,1
2: 2,1,1,1
In something like Ruby it wouldn't be too difficult to sort based on the number of times a category has appeared. Something like:
times_seen = {}
results.sort_by do |r|
  times_seen[r.category] ||= 0
  [times_seen[r.category] += 1, r.category]
end
E.g.
irb(main):032:0> times_seen = {};[1,1,1,1,1,2,2,3,4,5].sort_by{|i| times_seen[i] ||= 1; [times_seen[i] += 1, i];}
=> [1, 2, 3, 4, 5, 1, 2, 1, 1, 1]
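For reference, the same round-robin ordering as a client-side Python sketch (a direct translation of the Ruby snippet, so it shares the same limitation of pulling every result into the client):

```python
from collections import defaultdict

def round_robin(items):
    """Stable sort keyed by (occurrence number, category): every category's
    first item sorts ahead of every second item, and so on."""
    seen = defaultdict(int)
    def key(category):
        # sorted() calls the key function once per element, in input order,
        # so the counter tracks how many times each category has been seen.
        seen[category] += 1
        return (seen[category], category)
    return sorted(items, key=key)

pages = round_robin([1, 1, 1, 1, 1, 2, 2, 3, 4, 5])
# pages == [1, 2, 3, 4, 5, 1, 2, 1, 1, 1], matching the irb result above
```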
Doing this with a large number of results would be really slow, because we would need to pull them all into Ruby first and then sort.
Ideally we want to do this in elastic search and let it handle the pagination for us.
There is script-based sorting in Elasticsearch:
http://www.elasticsearch.org/guide/reference/api/search/sort/
{
    "query" : {
        ....
    },
    "sort" : {
        "_script" : {
            "script" : "doc['field_name'].value * factor",
            "type" : "number",
            "params" : {
                "factor" : 1.1
            },
            "order" : "asc"
        }
    }
}
If we could do something like this but with the times_seen logic from above, it would make life really easy; however, it would require a times_seen variable that persists between script invocations.
Any ideas on how to achieve a uniform distribution based on an attribute, or on whether it is possible to use such a persistent variable in the script sort?
