Elasticsearch: How do you search scroll without HTTP 404 errors?

I'm trying to retrieve all of the documents matching a search query from Elasticsearch; the query has 600k+ hits. You can't use the search size parameter, or raise the index setting that caps the number of returned documents (index.max_result_window), because both expect a much smaller number of documents.
From the Python elasticsearch module I do:
result = client.search(..., scroll='1m')
scroll_id = result['_scroll_id']
while True:
    # ...
    result = client.scroll(..., scroll_id=scroll_id)
    # ...
The console reports successful scrolls for about 200 ms (roughly 7 scroll calls), then I get an HTTP 404.
How do you search scroll without HTTP 404 errors?
P.S. I'm on version 7.3 here.

I found a solution here: https://github.com/ropensci/elastic/issues/178
You have to modify your client.scroll call to include the scroll timeout parameter scroll='1m'. Each scroll request needs the timeout again; otherwise the server frees the scroll context and subsequent calls come back as HTTP 404:
result = client.search(..., scroll='1m')
scroll_id = result['_scroll_id']
while True:
    # ...
    result = client.scroll(..., scroll_id=scroll_id, scroll='1m')
    # ...
Then my scrolling works.
P.S. I only use Python here because it makes the writing easier (the solution originally comes from a GitHub issue for the R language). This solution applies to any Elasticsearch REST API usage, IMO.
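For instance, here is the same pattern as a minimal sketch using the official Ruby elasticsearch gem (the index name and the match_all query are placeholders of mine, not from the question):
require 'elasticsearch'

client = Elasticsearch::Client.new
response = client.search(index: 'my-index', scroll: '1m',
                         body: { query: { match_all: {} } })

until response['hits']['hits'].empty?
  response['hits']['hits'].each { |hit| puts hit['_id'] }
  # Pass scroll: '1m' on every scroll call, not just the initial search
  response = client.scroll(scroll_id: response['_scroll_id'], scroll: '1m')
end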

Related

Grafana: use subtraction of two fields in an Elasticsearch data source

I have two fields, called 'status_codes' and 'requests'.
I want to get the number of failed requests.
My equation is [requests - number of successful requests].
In the script I wrote something like this: _value - doc['@status_codes.200'].value
But the value returned in the graph is 'N/A'.
I'm using Elasticsearch (7.6.0) and Grafana (6.6.2).
The following is the output file that I'm sending to Elasticsearch:
{ "latencies":{
"total":3981710268690,
"mean":43876078,
"50th":916913,
"90th":2217744,
"95th":5162430,
"99th":60233348,
"max":60000209373,
"min":43652
},
"#version":"1",
"latest":"2020-03-05T16:14:44.23387091Z",
"path":"test23.json",
"duration":61163899322,
"wait":552109,
"status_codes":{
"0":90624,
"200":125
},
"earliest":"2020-03-05T16:13:43.069971588Z",
"rate":1483.702004057131,
"throughput":2.0436707446156577,
"#timestamp":"2020-03-05T16:14:44.453Z",
"errors":[
"Post http://www: dial tcp 0.0.0.0:0->10.133.9.87:8688: socket: too many open files",
"Post http://www: dial tcp: lookup internal-netty-load-balancer-937469711.us-east-1.elb.amazonaws.com on 10.20.30.30: dial udp 10.20.30:45: socket: too many open files"
],
"bytes_in":{
"mean":70.90298515686123,
"total":6434375
},
"requests":90749,
"Report_Title":"test23",
"host":"ABS",
"success":0.0013774256465635985,
"end":"2020-03-05T16:14:44.234423019Z",
"bytes_out":{
"mean":70.90298515686123,
"total":6434375
}
}
I have also used the Singlestat plugin as @yash mentioned, but I still couldn't resolve the issue.
[Screenshot: Query section]
[Screenshot: Visualization section]
Can someone help me?
This is a fairly easy task. You just need to use either the 'Singlestat Math' or the 'MetaQueries' plugin. Use the count metric in two queries in the same panel: one for the count of successful status codes, and one for the unsuccessful ones. Then use either plugin to subtract the result of one query from the other.
https://grafana.com/grafana/plugins/blackmirror1-singlestat-math-panel
https://grafana.com/grafana/plugins/goshposh-metaqueries-datasource
I suggest you go with the Singlestat Math plugin; from my experience it is easier to work with.
Note: in the Singlestat Math plugin, the calculation (A-B) is done in the visualization section, not in the query section.
P.S. The Singlestat Math plugin actually adds a new panel type in the visualization section; it's a different panel from the default Singlestat panel.
Finally I found the solution, as follows:
[Screenshot: solution]
Thanks everyone.

Displaying JSON output from an API call in Ruby using VSCode

For context, I'm someone with zero experience in Ruby - I just asked my Senior Dev to copy-paste me some of his Ruby code so I could try to work with some APIs that he ended up putting off because he was too busy.
So I'm using an API wrapper called zoho_hub for the Zoho APIs (https://github.com/rikas/zoho_hub/blob/master/README.md).
My IDE is VSCode.
I execute the entire script, and I'm faced with just this:
[Done] exited with code=0 in 1.26 seconds
The API is supposed to return a paginated list of records, but I don't see any output in VSCode, even though no error is reported. The last 2 lines of my code are:
ZohoHub.connection.get 'Leads'
p "testing"
I use the dummy string "testing" to make sure that it's being executed up till the very end, and it does get printed.
This has been baffling me for hours now - is my response actually being outputted somewhere, and I just can't see it??
Ruby does not print anything unless you tell it to. For debugging there is a pretty-printing method called pp, which is decent for printing structured data.
In this case, if you want to output the records that your get method returns, you would do:
pp ZohoHub.connection.get 'Leads'
To get the next page you can look at the source code, and you will see the get request has an additional Hash parameter.
def get(path, params = {})
Then you have to read the Zoho API documentation for get, and you will see that the page is requested using the page param.
Therefore we can finally piece it together:
pp ZohoHub.connection.get('Leads', page: NNN)
Where NNN is the number of the page you want to request.
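Putting it all together, a sketch that walks every page until the API stops returning records (the 'data' key and the empty-page stop condition are my assumptions; check them against the Zoho response format):
page = 1
loop do
  response = ZohoHub.connection.get('Leads', page: page)
  records = response['data'] || [] # key name assumed from Zoho's JSON shape
  break if records.empty?
  records.each { |record| pp record }
  page += 1
end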

How to Fix Document Not Found errors with find

I have a collection of Person documents, stored in a legacy MongoDB server (2.4) and accessed with the Mongoid gem via the Ruby MongoDB driver.
If I perform a
Person.where(email: 'some.existing.email@server.tld').first
I get a result (let's assume I store the id in a variable called "the_very_same_id_obtained_above")
If I perform a
Person.find(the_very_same_id_obtained_above)
I get a
Mongoid::Errors::DocumentNotFound
exception
If I use the javascript syntax to perform the query, the result is found
Person.where("this._id == #{the_very_same_id_obtained_above}").first # this works!
I'm trying to migrate the data to a newer version. Currently I'm running mongorestore into Amazon DocumentDB (MongoDB 3.6 compatible) for testing, and the issue remains.
One thing I noticed is that those object ids are peculiar:
5ce24b1169902e72c9739ff6 (this works anyway)
59de48f53137ec054b000004 (this requires the trick)
The run of zeroes toward the end of the id seems to be highly correlated with the problem (I have no idea of the reason).
That's the default:
# Raise an error when performing a #find and the document is not found.
# (default: true)
raise_not_found_error: true
Source: https://docs.mongodb.com/mongoid/current/tutorials/mongoid-configuration/#anatomy-of-a-mongoid-config
If this doesn't answer your question, it's very likely the find method is overridden somewhere in your code!
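If you'd rather avoid the exception entirely, here are two options (a sketch, assuming a standard Mongoid setup):
# Option 1: query by id; #first returns nil instead of raising
person = Person.where(_id: the_very_same_id_obtained_above).first

# Option 2: keep #find but handle the exception explicitly
begin
  person = Person.find(the_very_same_id_obtained_above)
rescue Mongoid::Errors::DocumentNotFound
  person = nil
end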

How can I get ALL records from Route53?

I'm referring to the code snippet here, which seemed to work for someone but isn't clear to me: https://github.com/aws/aws-sdk-ruby/issues/620
I'm trying to get all of them (roughly 7000 records) via resource record sets, but I can't seem to get the pagination to work with list_resource_record_sets. Here's what I have:
route53 = Aws::Route53::Client.new
response = route53.list_resource_record_sets({
  start_record_name: fqdn(name),
  start_record_type: type,
  max_items: 100, # fyi - aws api maximum is 100 so we'll need to page
})
response.last_page?
response = response.next_page until response.last_page?
I verified I'm hooked into the right region. I can see the record I'm trying to get (so I can delete it later) in the AWS console, but I can't seem to get it through the API. I used this as a starting point: https://github.com/aws/aws-sdk-ruby/issues/620
Any ideas on what I'm doing wrong? Or is there an easier way, perhaps another method in the API I'm not finding, to get just the record I need given the hosted_zone_id, type, and name?
The issue you linked is for the Ruby AWS SDK v2, but the latest is v3. It also looks like things may have changed around a bit since 2014, as I'm not seeing the #next_page or #last_page? methods in the v2 API or the v3 API.
Consider using the #next_record_name and #next_record_type from the response when #is_truncated is true. That's more consistent with how other paginations work in the Ruby AWS SDK, such as with DynamoDB scans for example.
Something like the following should work (though I don't have an AWS account with records to test it out):
route53 = Aws::Route53::Client.new
hosted_zone = ? # Required field according to the API docs
next_name = fqdn(name)
next_type = type

loop do
  response = route53.list_resource_record_sets(
    hosted_zone_id: hosted_zone,
    start_record_name: next_name,
    start_record_type: next_type,
    max_items: 100, # fyi - aws api maximum is 100 so we'll need to page
  )
  records = response.resource_record_sets

  # Break here if you find the record you want
  # Also break if we've run out of pages
  break unless response.is_truncated

  next_name = response.next_record_name
  next_type = response.next_record_type
end
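For example, the "break here if you find the record you want" part could look something like this inside the loop (a sketch; fqdn(name) and type are the same values the scan started from):
target = records.find { |r| r.name == fqdn(name) && r.type == type }
# Route53 returns fully-qualified names (usually with a trailing dot),
# so make sure fqdn(name) produces the same form
break if target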

Using a function query from Solr

I'm trying to calculate the tf*idf of a term in my index.
Following Yonik's post at http://yonik.com/posts/solr-relevancy-function-queries/, I tried:
http://localhost:8080/solr/select/?fl=score,id&defType=func&q=mul(tf(texto_completo,bug),idf(texto,bug))
(where texto_completo is the field and 'bug' is the term) without much success. The response was:
error 400: The request sent by the client was syntactically incorrect (null).
I went ahead and looked at this answer (/a/13477887), so I tried a simpler function query:
http://localhost:8080/solr/select/?q={!func}docFreq(texto_completo,bug)
And yet, I got the same error.
What is my syntax missing to make this work properly?
For this query, which is not working:
q={!func}docFreq(texto_completo,bug)
use all lower-case docfreq:
q={!func}docfreq(texto_completo,bug)
I just tried:
q={!func}mul(tf(name,movie),idf(name,movie))
in Solr 4.2.1 and it works fine. My field is called name (a Text type) and the term I am looking for is movie.
UPDATE: You need at least Solr 4.0 to use these. See http://wiki.apache.org/solr/FunctionQuery#Relevance_Functions
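Putting the two answers together for the original question's field and term (and keeping both functions on the same field, texto_completo, unlike the mixed texto/texto_completo in the question), the request should look something like:
http://localhost:8080/solr/select/?fl=score,id&q={!func}mul(tf(texto_completo,bug),idf(texto_completo,bug))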
