How do I query a subdocument with Mongoid in Ruby? - ruby

I have this document, and I only want part of it, but I'm not sure how to express that as a Mongoid query.
{
"_id": {
"$oid": "5297d6773865640002000000"
},
"saved_tweets": [
{
"_id": {
"$oid": "52b0856b6535380002000000"
},
"saved_id": "123456",
"tweet_ids": [
"1",
"2"
]
},
{
"_id": {
"$oid": "52b0856b6535380002000001"
},
"saved_id": "78901",
"tweet_ids": [
"3",
"4"
]
}
]}
What I want is all the tweet_ids for a given saved_id. This is what I'm doing right now, which I think is very inefficient:
existing_user = User.find_by(:social_id => social_id)
existing_user.saved_tweets.each do |saved_tweet|
  if saved_id == saved_tweet.saved_id
    @saved_tweet_ids = saved_tweet.tweet_ids
  end
end

Did you try something like this?
user.saved_tweets.where(saved_id: saved_id).map(&:tweet_ids)

Related

How to filter match in top 3 - elasticsearch?

I have the following data in Elasticsearch:
[
{
"_index": "media",
"_type": "information",
"_id": "6838",
"_source": {
"demographics_countries": {
"AE": 0.17543859649122806,
"CA": 0.013157894736842105,
"FR": 0.017543859649122806,
"GB": 0.043859649122807015,
"IT": 0.02631578947368421,
"LB": 0.013157894736842105,
"SA": 0.49122807017543857,
"TR": 0.017543859649122806,
"US": 0.09210526315789472
}
}
},
{
"_index": "media",
"_type": "information",
"_id": "57696",
"_source": {
"demographics_countries": {
"TN": 0.8125,
"MA": 0.034375,
"DZ": 0.032812,
"FR": 0.0125,
"EG": 0.0125,
"IN": 0.009375,
"SA": 0.009375
}
}
}
]
Expected result:
Find the documents in which a specific country, SA (Saudi Arabia), is among the top 3 entries of demographics_countries.
For example:
"_id": "6838" (the first document) matches because SA (Saudi Arabia) is among the top 3 values in its demographics_countries.
What I tried: filtering with top_hits, but it's not working as expected.
Any suggestions would be appreciated.
With the current data model it's quite difficult to do that. What I'd suggest might not be the easiest way to do it, but it will definitely be the fastest to query eventually.
I'd suggest remodelling your documents to already include top countries:
[
{
"_index": "media",
"_type": "information",
"_id": "6838",
"_source": {
"top_demographics_countries": ["TN", "MA", "DZ"],
"demographics_countries": {
"AE": 0.17543859649122806,
"CA": 0.013157894736842105,
"FR": 0.017543859649122806,
"GB": 0.043859649122807015,
"IT": 0.02631578947368421,
"LB": 0.013157894736842105,
"SA": 0.49122807017543857,
"TR": 0.017543859649122806,
"US": 0.09210526315789472
}
}
},
{
"_index": "media",
"_type": "information",
"_id": "57696",
"_source": {
"top_demographics_countries": ["TN", "MA", "DZ"],
"demographics_countries": {
"TN": 0.8125,
"MA": 0.034375,
"DZ": 0.032812,
"FR": 0.0125,
"EG": 0.0125,
"IN": 0.009375,
"SA": 0.009375
}
}
}
]
Ignore the values I've picked for top_demographics_countries; they are placeholders. With this approach you precalculate the top countries at index time, and then a simple term query checks whether a document contains the value:
{
"query": {
"bool": {
"filter": {
"term": {
"top_demographics_countries": "SA"
}
}
}
}
}
It's cheaper to compute them once when saving than to build that clause dynamically on every query.
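As a rough illustration of the precalculation step, here is a minimal Python sketch that derives the top countries from demographics_countries before a document is indexed. The helper names top_countries and with_top_countries are made up for this example and are not part of any Elasticsearch client.

```python
from typing import Dict, List


def top_countries(shares: Dict[str, float], n: int = 3) -> List[str]:
    """Return the n country codes with the largest share, largest first."""
    ranked = sorted(shares.items(), key=lambda item: item[1], reverse=True)
    return [code for code, _ in ranked[:n]]


def with_top_countries(source: dict, n: int = 3) -> dict:
    """Return a copy of a document source with top_demographics_countries added."""
    enriched = dict(source)
    enriched["top_demographics_countries"] = top_countries(
        source.get("demographics_countries", {}), n
    )
    return enriched


# Sample values taken from document 6838 in the question.
doc_6838 = {
    "demographics_countries": {
        "AE": 0.17543859649122806,
        "CA": 0.013157894736842105,
        "FR": 0.017543859649122806,
        "GB": 0.043859649122807015,
        "IT": 0.02631578947368421,
        "LB": 0.013157894736842105,
        "SA": 0.49122807017543857,
        "TR": 0.017543859649122806,
        "US": 0.09210526315789472,
    }
}

print(with_top_countries(doc_6838)["top_demographics_countries"])
# ['SA', 'AE', 'US']
```

The enriched source is what would be sent to the index, so the term filter on top_demographics_countries can match it directly.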
@Evaldas is right: it's better to extract the top 3 beforehand.
But if you can't help yourself and feel compelled to use Java/Painless, here's one approach:
{
"query": {
"bool": {
"must": [
{
"exists": {
"field": "demographics_countries.SA"
}
},
{
"script": {
"script": {
"source": """
def tuple_list = new ArrayList();
for (def c : params.all_countries) {
def key = 'demographics_countries.'+c;
if (!doc.containsKey(key) || doc[key].size() == 0) {
continue;
}
def val = doc[key].value;
tuple_list.add([c, val]);
}
// sort tuple list by the country values
Collections.sort(tuple_list, (arr1, arr2) -> arr1[1] < arr2[1] ? 1 : -1);
// bail out first: subList(0, 3) throws on a list with fewer than 3 entries
if (tuple_list.size() < 3) {
return false;
}
// slice & take only the top 3
def top_3_countries = tuple_list.subList(0, 3).stream().map(arr -> arr[0]).collect(Collectors.toList());
return top_3_countries.contains(params.country_of_interest);
""",
"params": {
"country_of_interest": "SA",
"all_countries": [
"AE",
"CA",
"FR",
"GB",
"IT",
"LB",
"SA",
"TR",
"US",
"TN",
"MA",
"DZ",
"EG",
"IN"
]
}
}
}
}
]
}
}
}

ElasticSearch not sorting results

I'm trying to sort the results based on a numeric field.
Here is my mapping:
{
"elasticie": {
"mappings": {
"properties": {
"name": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"number": {
"type": "long"
}
}
}
}
}
I'm using Python, and this is my testing data:
data = [
{'name': 'sElwUYiLXGHaQCKbdxtnvVzqIehfFWkJcPSTurgNoRD', 'number': 8583},
{'name': 'XJEtNsIFfcwHTMhqAvRkiygjbUGzZQPdS', 'number': 8127},
{'name': 'ZIeAGosUKJbjOdylM', 'number': 5862},
{'name': 'HYvcafoXkC', 'number': 7458},
{'name': 'tATJCjNuizOlGckXBpyVqSQL', 'number': 530},
{'name': 'TFYixotjhXzNZPvHnkraRDpAMEImJfqdcVGLC', 'number': 7052},
{'name': 'JCEGfoKDHRrcIkPQSqiVgNshZOBaMdXjAlxwUzmeWLy', 'number': 6168},
{'name': 'IpCTwUAQynSizJtcsuDmbX', 'number': 6492},
{'name': 'fTrcoXSBJNFhAkzWpDMxsEiLmZRvgnC', 'number': 382},
{'name': 'ulVNmqKTpPXfEIdiykhDjMrUGOYazLBFvgnWwsRtJoQbxSe', 'number': 2061}
]
Using following code, I'm creating the index and inserting the data:
from elasticsearch import Elasticsearch
from data import data # the data I've shown above
INDEX = 'elasticie'
es = Elasticsearch('http://127.0.0.1:9200')
for doc in data:
    es.index(index=INDEX, body=doc)
I'm trying to sort the data by number, ascending or descending.
Here is what I have tried so far:
es.search(index=INDEX, params={'sort': {'number': {'order': 'asc'}}})
es.search(index=INDEX, params={'sort': {'number': 'asc'}})
es.search(index=INDEX, params={'sort': [('number', 'asc')]})
es.search(index=INDEX, params={'sort': {'number': {'order': 'asc', 'ignore_unmapped': True}}})
es.search(index=INDEX, params={'sort': {'number': {'order': 'asc', 'unmapped_type': 'integer'}}})
es.search(index=INDEX, params={'sort': {'number': {'order': 'asc', 'unmapped_type': 'long'}}})
es.search(index=INDEX, params={'sort': {'number.raw': 'asc'}})
None of the above methods worked for me. The results come back in the same order as the inserted data.
If I assign any of the above calls to a variable named search_result and print the result using the following code:
for index, result in enumerate(search_result['hits']['hits']):
    print(f'{index}. {result["_source"]["number"]}')
I'll get the following result:
0. 8583
1. 8127
2. 5862
3. 7458
4. 530
5. 7052
6. 6168
7. 6492
8. 382
9. 2061
Which is obviously not sorted by the number field.
I don't know what I'm doing wrong. I'm using Elasticsearch 7.6 and Python 3.8.
How can I make the sorting work?
Update
Based on debugging logs, Python is sending a GET request to the following URL using the first method:
http://127.0.0.1:9200/elasticie/_search?sort={%27number%27%3A+{%27order%27%3A+%27asc%27}}
I am not familiar with Python, but here is the Elasticsearch JSON query that sorts your documents by number in descending order. I tried it with your data set and it gives proper results.
Sort search query
{
"sort": [
{
"number": {
"order": "desc"
}
}
]
}
Results
"hits": [
{
"_index": "so-60598395-sort",
"_type": "_doc",
"_id": "1",
"_score": null,
"_source": {
"name": "sElwUYiLXGHaQCKbdxtnvVzqIehfFWkJcPSTurgNoRD",
"number": 8583
},
"sort": [
8583
]
},
{
"_index": "so-60598395-sort",
"_type": "_doc",
"_id": "2",
"_score": null,
"_source": {
"name": "XJEtNsIFfcwHTMhqAvRkiygjbUGzZQPdS",
"number": 8127
},
"sort": [
8127
]
},
{
"_index": "so-60598395-sort",
"_type": "_doc",
"_id": "4",
"_score": null,
"_source": {
"name": "HYvcafoXkC",
"number": 7862
},
"sort": [
7862
]
},
{
"_index": "so-60598395-sort",
"_type": "_doc",
"_id": "3",
"_score": null,
"_source": {
"name": "ZIeAGosUKJbjOdylM",
"number": 5862
},
"sort": [
5862
]
}
EDIT: Based on the OP's comments, the Python library they are using supports the POST method of the search endpoint, and using it solved the issue. Refer to the comments on the question for more details.
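For reference, the sort clause from the JSON query above could be sent in the request body from Python roughly like this. It is only a sketch: sorted_search is a made-up helper, and FakeClient is a stand-in that records the call so the snippet runs without a cluster. With a real client, the object created by Elasticsearch('http://127.0.0.1:9200') would be passed instead.

```python
def sorted_search(es, index, field, order="desc"):
    """Run a search whose sort clause travels in the request body."""
    body = {"sort": [{field: {"order": order}}]}
    return es.search(index=index, body=body)


class FakeClient:
    """Stand-in for an Elasticsearch client; it only records the call it receives."""

    def search(self, **kwargs):
        self.last_call = kwargs
        return {"hits": {"hits": []}}


fake = FakeClient()
sorted_search(fake, "elasticie", "number")
print(fake.last_call)
```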
My mistake. I had read the documentation and inspected the code using the help and dir functions.
There is no parameter named sort defined on the Elasticsearch.search method. That's why I thought I should pass it as a key within the params dict that the method takes.
Thanks to @OpsterElasticSearchNinja and his comment, I realized something was wrong with either the library or how I was using it.
Sending a POST request with the sort key in the request body worked well.
So I decided to read the whole source code to find out what was going wrong:
@query_params(
    # ...
    "size",
    "sort",
    # ...
)
def search(self, body=None, index=None, doc_type=None, params=None):
    # ...
This is how the sort parameter is defined: a decorator adds it at runtime!
That's when I tried this code, and it worked:
es.search(index=INDEX, sort=['number:asc'])
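The URL in the debugging log suggests the nested dict was converted to its Python string form before it reached the query string. The sketch below reproduces that effect with the standard library only. It illustrates the general behaviour and is not a claim about how the client library encodes parameters internally.

```python
from urllib.parse import unquote_plus, urlencode

# A nested dict has no query-string representation, so it is stringified with str().
nested = urlencode({"sort": {"number": {"order": "asc"}}})

# The "field:direction" form is a plain string and survives encoding intact.
flat = urlencode({"sort": "number:asc"})

print(unquote_plus(nested))  # sort={'number': {'order': 'asc'}}
print(flat)                  # sort=number%3Aasc
```

The first value is a Python repr rather than something the sort URL parameter understands, while the second decodes to number:asc, the form that worked in the call above.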

doc['field'].value never returning values

Using Kibana/Elasticsearch version 6.6.
I'm trying to run the simple Painless script below:
String val = "Vanished";
if(doc.containsKey('type')) {
return doc['type'].value;
}
return val;
In the Preview Results section, when I try to run the code, the First 10 results section is always:
[]
However, if I alter the code like below:
String val = "Vanished";
if(doc.containsKey('type')) {
return "Present";
}
return val;
I am getting the below result in the same Preview Results section:
[
{
"_id": "Kha1NmkBcY4KotEKXsZz",
"test112": [
"Present"
]
},
{
"_id": "1oS1NmkBjBc6pl9UX0IW",
"test112": [
"Present"
]
},
{
"_id": "14S1NmkBjBc6pl9UX0IW",
"test112": [
"Present"
]
},
{
"_id": "whC1NmkBCa8dRNQVXzEW",
"test112": [
"Present"
]
},
{
"_id": "X221NmkBZQRXPOstYIHB",
"test112": [
"Present"
]
},
{
"_id": "Rca1NmkBZrtXVVVdY50r",
"test112": [
"Present"
]
},
{
"_id": "CMS1NmkBwiujVR8BZAt2",
"test112": [
"Present"
]
},
{
"_id": "xhC1NmkBCa8dRNQVZTFf",
"test112": [
"Present"
]
},
{
"_id": "yBC1NmkBCa8dRNQVZTFf",
"test112": [
"Present"
]
},
{
"_id": "yRC1NmkBCa8dRNQVZTFf",
"test112": [
"Present"
]
}
]
Can someone please help me figure out why doc['type'].value is failing?
The issue was solved by changing the script to the following:
String val = "Vanished";
if(doc.containsKey('type.keyword')) {
return doc['type.keyword'].value;
}
return val;
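Note the use of type.keyword instead of type as the field name. Presumably type is mapped as an analyzed text field, which has no doc values by default, so the script has to read the type.keyword sub-field instead.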

Get data by Popularity/Rating using Elasticsearch

I am trying to make a GET request with Elasticsearch that returns data ordered by its popularity/rating. So I followed this link. I set the rating of my items using the following:
#1 http://localhost:9200/cars/car/_rank_eval
Above is the API I call from Postman to create the _rank_eval request.
Below is my body content:
{
"requests": [
{
"id": "horsepower",
"request": {
"query": {
"match": {
"horsepower": "68"
}
}
},
"ratings": [
{
"_index": "cars",
"_id": "25",
"rating": 10
},
{
"_index": "cars",
"_id": "119",
"rating": 1
},
{
"_index": "cars",
"_id": "52",
"rating": 2
}
]
}
],
"metric": {
"precision": {
"relevant_rating_threshold": 1,
"ignore_unlabeled": false
}
}
}
Steps I have done so far:
1. Created a database with dummy data in MySQL
2. Transferred my DB data to Elasticsearch using Logstash
3. Set some ratings on my data
What are the next steps? Did I miss anything? I need some clarification/help on this.

Logstash 5.0 Ruby Filter Can't Update Hash in Array

I'm a newbie to both Logstash and Ruby, and I ran into a subtle problem today.
My input JSON looks like the following:
{
"1": "1",
"2": "2",
"market": [
{
"id": "1",
"name": "m1"
},
{
"id": "2",
"name": "m2"
}
]
}
My filter looks like the following code. I want to set event["1"] to m1, event["2"] to m2, event["market"][0]["id"] to m1, and event["market"][1]["id"] to m2:
filter {
......
ruby {
code => "
markets = event.get('market')
markets.each_index do |index|
event.set(markets[index]['id'], markets[index]['name'])
markets[index]['id'] = markets[index]['name']
end
"
}
......
}
And the output is the following:
{
"1": "m1",
"2": "m2",
"market": [
{
"id": "1",
"name": "m1"
},
{
"id": "2",
"name": "m2"
}
]
}
The event["1"] and event["2"] fields get the expected values, but event["market"][0]["id"] and event["market"][1]["id"] do not, and I want to know why. The desired output is:
{
"1": "m1",
"2": "m2",
"market": [
{
"id": "m1",
"name": "m1"
},
{
"id": "m2",
"name": "m2"
}
]
}
PS: The Logstash version I'm using is 5.0.
I think it is because of the new Event API introduced in Logstash 5.0. After changing my filter to the following, I get the desired output:
filter {
......
ruby {
code => "
markets = event.get('market')
markets.each_index do |index|
event.set(markets[index]['id'], markets[index]['name'])
markets[index]['id'] = markets[index]['name']
end
event.set('market', markets) # write the mutated array back to the event
"
}
......
}
According to a Logstash Git issue, "Mutating a collections after setting it in the Event has an undefined behaviour".
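One way to picture why the explicit setter is needed is a toy model in Python. It assumes the getter hands back a copy of the stored value, so in-place changes never reach the event until they are written back. ToyEvent is entirely hypothetical and does not reproduce Logstash's actual Event implementation.

```python
import copy


class ToyEvent:
    """Toy stand-in for an event store whose getter returns copies (hypothetical)."""

    def __init__(self, data):
        self._data = copy.deepcopy(data)

    def get(self, key):
        # Hand back a copy, so callers cannot mutate the stored value in place.
        return copy.deepcopy(self._data[key])

    def set(self, key, value):
        self._data[key] = copy.deepcopy(value)


event = ToyEvent({"market": [{"id": "1", "name": "m1"}, {"id": "2", "name": "m2"}]})

markets = event.get("market")
for market in markets:
    market["id"] = market["name"]  # changes only the local copy

unchanged = event.get("market")    # the event still holds ids "1" and "2"
event.set("market", markets)       # write the mutated list back
updated = event.get("market")      # the event now holds ids "m1" and "m2"

print(unchanged[0]["id"], updated[0]["id"])  # 1 m1
```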
