Query all in Elasticsearch using NEST v2.1

var settings = new ConnectionSettings(Constants.ElasticSearch.Node);
var client = new ElasticClient(settings);
var response = client.Search<DtoTypes.Customer.SearchResult>(s =>
s.From(0)
.Size(100000)
.Query(q => q.MatchAll()));
It works when the size is smaller, but I want to retrieve all documents in an index that has over 100k documents. There must be a configuration setting I'm missing to get around a limit. I've also tried Take() instead of Size().
The Debug Info returned back is
"Invalid NEST response built from a unsuccesful low level call on
POST: /_search\r\n# Audit trail of this API call:\r\n - BadResponse:
Node: http://127.0.0.1:9200/ Took: 00:00:00.2964038\r\n# ServerError:
ServerError: 500Type: search_phase_execution_exception Reason: \"all
shards failed\"\r\n# OriginalException: System.Net.WebException: The
remote server returned an error: (500) Internal Server Error.\r\n at
System.Net.HttpWebRequest.GetResponse()\r\n at
Elasticsearch.Net.HttpConnection.Request[TReturn](RequestData
requestData) in
C:\users\russ\source\elasticsearch-net\src\Elasticsearch.Net\Connection\HttpConnection.cs:line
138\r\n# Request:\r\n\r\n#
Response:\r\n\r\n"

Elasticsearch has a soft limit on the number of results it allows to return. If you want more than 10,000 results in one go, you should use the scan and scroll functionality :)
From the Elasticsearch documentation:
"Note that from + size can not be more than the
index.max_result_window index setting which defaults to 10,000. See
the Scroll API for more efficient ways to do deep scrolling."
Reference:
https://www.elastic.co/guide/en/elasticsearch/reference/current/search-request-from-size.html
https://www.elastic.co/guide/en/elasticsearch/reference/current/search-request-scroll.html
https://nest.azurewebsites.net/nest/search/scroll.html
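For completeness, here is a minimal sketch of what paging through every document with the scroll API could look like in NEST 2.x. It assumes the same client and DtoTypes.Customer.SearchResult type as in the question; the "2m" keep-alive and the page size of 1000 are arbitrary starting values:
var documents = new List<DtoTypes.Customer.SearchResult>();

// Open a scroll context that is kept alive for 2 minutes between calls
var searchResponse = client.Search<DtoTypes.Customer.SearchResult>(s => s
    .Size(1000)
    .Scroll("2m")
    .Query(q => q.MatchAll()));

while (searchResponse.IsValid && searchResponse.Documents.Any())
{
    documents.AddRange(searchResponse.Documents);

    // Fetch the next page using the scroll id returned by the previous call
    searchResponse = client.Scroll<DtoTypes.Customer.SearchResult>("2m", searchResponse.ScrollId);
}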


Elastic-Cloud Not Receiving Data from Serilog Sink

I set up an Elastic Cloud instance to offload my local Elasticsearch config (as one does), but for reasons unknown to me, I can't get it to show any logs in Elastic Cloud, despite it working fine locally.
The code I have (modified for privacy reasons):
//var uri = new Uri("http://localhost:9200"); // old one
var uri = new Uri("https://my-server.kb.eastus2.azure.elastic-cloud.com:9243");
var sinkOptions = new ElasticsearchSinkOptions(uri)
{
AutoRegisterTemplate = true,
ModifyConnectionSettings = x => x.BasicAuthentication("elastic", "the password I was given"),
IndexFormat = $"test-logs-{env.EnvironmentName?.ToLower().Replace('.', '-')}-{DateTime.Now:yyyy-MM}",
};
Log.Logger = new LoggerConfiguration()
.ReadFrom.Configuration(config)
.Enrich.FromLogContext()
.Enrich.WithMachineName()
.WriteTo.Console()
.WriteTo.Elasticsearch(sinkOptions)
.Enrich.WithProperty("Environment", env.EnvironmentName)
.CreateLogger();
There are two possible reasons I can think of that might be the cause of this not working:
The credentials are wrong
The Uri is wrong
Every solution I've been given so far has provided the data in this fashion, and nowhere does it say what the URI I'm supposed to use looks like.
I get no errors.
I get no warnings.
I get no logs.
What am I doing wrong here?
The issue was using the incorrect URI. I wrote
my-server.kb.eastus2.azure.elastic-cloud.com:9243 rather than
my-server.es.eastus2.azure.elastic-cloud.com:9243.
Note the very small difference: kb (the Kibana endpoint) vs es (the Elasticsearch endpoint) in the URL.
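In other words, the sink has to point at the Elasticsearch endpoint rather than the Kibana one. A minimal sketch of the corrected setup, with the sink's self-log enabled so failures are no longer silent (SelfLog and EmitEventFailure are standard Serilog / Serilog.Sinks.Elasticsearch facilities; the host name is still the placeholder from the question):
// Surface errors from the sink itself instead of failing silently
Serilog.Debugging.SelfLog.Enable(Console.Error);

// Use the .es. (Elasticsearch) endpoint, not the .kb. (Kibana) endpoint
var uri = new Uri("https://my-server.es.eastus2.azure.elastic-cloud.com:9243");

var sinkOptions = new ElasticsearchSinkOptions(uri)
{
    AutoRegisterTemplate = true,
    ModifyConnectionSettings = x => x.BasicAuthentication("elastic", "the password I was given"),
    IndexFormat = $"test-logs-{env.EnvironmentName?.ToLower().Replace('.', '-')}-{DateTime.Now:yyyy-MM}",
    EmitEventFailure = EmitEventFailureHandling.WriteToSelfLog
};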

Limitations for Google Speech to text Model Adaptation Apis

I aim to use Google model adaptation to improve speech-to-text accuracy, but these APIs are not well documented anywhere.
https://cloud.google.com/speech-to-text/docs/reference/rest/v1p1beta1/projects.locations.customClasses
I tried to create a custom class with 200,000 values. Above that count, it gives an error about the size of the payload, not about an entry-count limit.
Where can I find proper information about the API and its restrictions?
I am using the Ruby library to create custom classes.
Code to create the custom class:
# `client` is assumed to be an already initialised Speech adaptation client;
# Faker is only used here to generate dummy item values.
cname = "TestClass"
items = 300_000.times.map { |e| Google::Cloud::Speech::V1p1beta1::CustomClass::ClassItem.new(value: Faker::Name.name) }
_class = Google::Cloud::Speech::V1p1beta1::CustomClass.new(name: cname, items: items)
request = Google::Cloud::Speech::V1p1beta1::CreateCustomClassRequest.new({custom_class: _class, parent: "projects/<projectID>/locations/global", custom_class_id: cname})
_klass = client.create_custom_class request
I get the following error; it looks like the request is rejected once the payload exceeds 10,485,760 bytes:
Google::Cloud::InvalidArgumentError: 3:Request payload size exceeds the limit: 10485760 bytes.. debug_error_string:{"created":"#1628230030.306827000","description":"Error received from peer ipv4:142.251.42.10:443","file":"src/core/lib/surface/call.cc","file_line":1067,"grpc_message":"Request payload size exceeds the limit: 10485760 bytes.","grpc_status":3}
Here's all the publicly available documentation about the API.
https://cloud.google.com/speech/docs/
https://cloud.google.com/speech-to-text/docs/release-notes
https://cloud.google.com/speech-to-text/pricing
https://cloud.google.com/speech-to-text/quotas
https://cloud.google.com/speech-to-text/sla
https://cloud.google.com/speech-to-text/docs/support#troubleshooting
https://cloud.google.com/speech-to-text/docs/best-practices
https://cloud.google.com/speech-to-text/docs/encoding
https://cloud.google.com/speech-to-text/docs/languages
https://cloud.google.com/speech-to-text/docs/apis
https://cloud.google.com/speech-to-text/docs/concepts
https://cloud.google.com/speech-to-text/docs/how-to
https://cloud.google.com/speech/docs/tutorials

Why and how is the quota "critical read requests" exceeded when using batchCreateContacts

I'm programming a contacts export from our database to Google Contacts using the Google People API. I'm sending the requests via URL fetch in Google Apps Script.
The code below - using https://people.googleapis.com/v1/people:batchCreateContacts - works for 13 to about 15 single requests, but then Google returns this error message:
Quota exceeded for quota metric 'Critical read requests (Contact and Profile Reads)' and limit 'Critical read requests (Contact and Profile Reads) per minute per user' of service 'people.googleapis.com' for consumer 'project_number:***'.
For speed, I send the requests in batches of 10 parallel requests.
I have the following two questions regarding this problem:
Why, for creating contacts, would I hit a quota regarding read requests?
Given the picture link below, why would sending 2 batches of 10 simultaneous requests (more precisely: 13 to 15 single requests) hit that quota limit anyway?
(Image: quota limit of 90 read requests per user per minute, as displayed on console.cloud.google.com)
Thank you for any clarification!
Further reading: https://developers.google.com/people/api/rest/v1/people/batchCreateContacts
let payloads = [];
let lengthPayloads;
let limitPayload = 200;
/*Break up contacts in payload limits*/
contacts.forEach(function (contact, index) /*contacts is an array of objects for the API*/
{
if(!(index%limitPayload))
{
lengthPayloads = payloads.push(
{
'readMask': "userDefined",
'sources': ["READ_SOURCE_TYPE_CONTACT"],
'contacts': []
}
);
}
payloads[lengthPayloads-1]['contacts'].push(contact);
}
);
Logger.log("which makes "+payloads.length+" payloads");
let parallelRequests = [];
let lengthParallelRequests;
let limitParallelRequest = 10;
/*Break up payloads in parallel request limits*/
payloads.forEach(function (payload, index)
{
if(!(index%limitParallelRequest))
lengthParallelRequests = parallelRequests.push([]);
parallelRequests[lengthParallelRequests-1].push(
{
'url': "https://people.googleapis.com/v1/people:batchCreateContacts",
'method': "post",
'contentType': "application/json",
'payload': JSON.stringify(payload),
'headers': { 'Authorization': "Bearer " + token }, /*token is a token of a single user*/
'muteHttpExceptions': true
}
);
}
);
Logger.log("which makes "+parallelRequests.length+" parallelrequests");
let responses;
parallelRequests.forEach(function (parallelRequest)
{
responses = UrlFetchApp.fetchAll(parallelRequest); /* error occurs here*/
responses = responses.map(function (response) { return JSON.parse(response.getContentText()); });
responses.forEach(function (response)
{
if(response.error)
{
Logger.log(JSON.stringify(response));
throw response;
}
else Logger.log("ok");
}
);
}
);
Output of logs:
which makes 22 payloads
which makes 3 parallelrequests
ok (15 times)
(the error message)
I had raised the same issue in Google's issue tracker.
It seems that a single BatchCreateContacts or BatchUpdateContacts call consumes six (6) units of "Critical read requests" quota per request. I still did not get an answer as to why, for creating/updating contacts, we are hitting the limit on critical read requests.
Quota exceeded for quota metric 'Critical read requests (Contact and Profile Reads)' and limit 'Critical read requests (Contact and Profile Reads) per minute per user' of service 'people.googleapis.com' for consumer 'project_number:***'.
There are two types of quotas: project based quotas and user based quotas. Project based quotas are limits placed upon your project itself. User based quotas are more like flood protection: they limit the number of requests a single user can make over a period of time.
When you send a batch request with 10 requests in it, it counts as ten requests, not as a single batch request. If you are trying to run this in parallel, then you are definitely going to overflow the requests per minute per user quota.
Slow down; this is not a race.
Why, for creating contacts, would I hit a quota regarding read requests?
I would chalk it up to a bad error message.
Given the picture link below, why would sending 13 to 15 requests hit that quota limit anyway? (There are 3 read requests before this code.) Quota limit of 90 read requests per user per minute, as displayed on console.cloud.google.com.
Well, you are sending 13 * 10 = 130 requests per minute, which would exceed the requests per minute limit. There is also no way of knowing exactly how fast your system is running; it could be going faster than you think, and it will depend on what else the server is doing at the time it gets your requests and which minute they are actually recorded in.
My advice is to just respect the quota limits and not try to understand why; there are too many variables on Google's servers to be able to track down what exactly a minute is. You could send 100 requests in 10 seconds and then try to send another 100 after 55 seconds and you will get the error; you could also get the error after 65 seconds, depending on when they hit the server and when the server finished processing your initial 100 requests.
Again, slow down.

C# NEST Bulk api failing with System.IO.IOException [duplicate]

This question already has an answer here:
Elasticsearch bulk insert with NEST returns es_rejected_execution_exception
(1 answer)
Closed 5 years ago.
I am trying to bulk insert data from SQL into an Elasticsearch index. Below is the code I am using; the total number of records is around 1.5 million. I think it has something to do with a connection setting, but I am not able to figure it out. Can someone please help with this code or suggest a better way to do it?
public void InsertReceipts()
{
IEnumerable<Receipts> receipts = GetFromDB(); // get receipts from SQL DB
const string index = "receipts";
var config = ConfigurationManager.AppSettings["ElasticSearchUri"];
var node = new Uri(config);
var settings = new ConnectionSettings(node).RequestTimeout(TimeSpan.FromMinutes(30));
var client = new ElasticClient(settings);
var bulkIndexer = new BulkDescriptor();
foreach (var receiptBatch in receipts.Batch(20000)) //using MoreLinq for Batch
{
Parallel.ForEach(receiptBatch, (receipt) =>
{
bulkIndexer.Index<OfficeReceipt>(i => i
.Document(receipt)
.Id(receipt.TransactionGuid)
.Index(index));
});
var response = client.Bulk(bulkIndexer);
if (!response.IsValid)
{
_logger.LogError(response.ServerError.ToString());
}
bulkIndexer = new BulkDescriptor();
}
}
The code works fine but takes around 10 minutes to complete. When I try to increase the batch size, it fails with the error below:
Invalid NEST response built from a unsuccessful low level call on
POST: /_bulk
Invalid Bulk items: OriginalException: System.Net.WebException: The
underlying connection was closed: An unexpected error occurred on a
send. ---> System.IO.IOException: Unable to write data to the
transport connection: An existing connection was forcibly closed by
the remote host. ---> System.Net.Sockets.SocketException: An existing
connection was forcibly closed by the remote host
A good place to start is with batches of 1,000 to 5,000 documents or, if your documents are very large, with even smaller batches.
It is often useful to keep an eye on the physical size of your bulk requests. One thousand 1KB documents is very different from one thousand 1MB documents. A good bulk size to start playing with is around 5-15MB in size.
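If the NEST version in use is recent enough to ship the BulkAll helper, the batching, parallelism and back-off can also be left to the client rather than hand-rolled. A minimal sketch along those lines, reusing GetFromDB and the receipts index from the question; the size, parallelism and back-off values are arbitrary starting points:
var receipts = GetFromDB(); // receipts from SQL, as in the question

var bulkAll = client.BulkAll(receipts, b => b
    .Index("receipts")
    .Size(2000)                 // documents per bulk request; start in the 1,000-5,000 range
    .MaxDegreeOfParallelism(4)  // number of bulk requests in flight at once
    .BackOffRetries(2)          // retry a rejected bulk request a couple of times
    .BackOffTime("30s"));       // wait between retries

// Blocks until all documents have been sent, reporting progress per bulk response
bulkAll.Wait(TimeSpan.FromMinutes(30), response =>
    Console.WriteLine($"Indexed page {response.Page}"));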
I had a similar problem. My problem was solved by adding the following code before the ElasticClient connection is established:
System.Net.ServicePointManager.Expect100Continue = false;

Elasticsearch.Net and Timeouts

I have a 4-node Elasticsearch cluster. I have a .NET console application that is designed to fill the cluster with data which comes from SQL. Everything works fine as long as I keep the rate of records being added (or deleted) fairly low. If I increase the number of threads, eventually I will see timeout errors from my console app. The cluster has a total of 48 cores and the average time it takes to index a record is about 0.1 seconds.
I have been able to get it to do about 7000 records (documents) per second. I never see any exceptions thrown from Elasticsearch.Net that indicate low resources. I never see any of the indexing queues overloaded. The servers never peak at more than about 10% CPU. It looks like the issue is not the cluster or its configuration but something in the NEST connection. Here is my code for the connection:
//set up the es client
Uri node = new Uri(ConfigurationManager.AppSettings["ESConnectionString"]);
var connectionPool = new SniffingConnectionPool(new[] { node });
ConnectionSettings settings = new ConnectionSettings(connectionPool);
settings.SetDefaultPropertyNameInferrer(p => p); //ditch the camelcase
settings.SniffOnConnectionFault(true);
settings.SniffOnStartup(true);
settings.SniffLifeSpan(TimeSpan.FromMinutes(1));
settings.SetPingTimeout(3000);
settings.SetTimeout(5000);
settings.MaximumRetries(5);
//settings.SetMaximumAsyncConnections(20);
settings.SetDefaultIndex("dummyindex");
settings.SetBasicAuthentication(ConfigurationManager.AppSettings["ESUser"], ConfigurationManager.AppSettings["ESPass"]);
ElasticClient client = new ElasticClient(settings);
I have the cluster set up with http.basic authentication, but I have tried with it turned on and off and there is no difference.
Here are some of the pertinent settings from the ES nodes:
discovery.zen.minimum_master_nodes: 2
discovery.zen.fd.ping_timeout: 30s
discovery.zen.ping.multicast.enabled: false
discovery.zen.ping.unicast.hosts: ["CACHE01","CACHE02","CACHE03","CACHE04"]
cluster.routing.allocation.node_concurrent_recoveries: 5
indices.recovery.max_bytes_per_sec: 50mb
http.basic.enabled: true
http.basic.user: "admin"
http.basic.password: "XXXXXXX"
At this point I can't figure out whether the issue is the .NET client or the servers. Everything points to the client, but I'm at a loss for what to try next.
I don't think I can use the Bulk API, because I'm essentially just replicating changes from a SQL server, and in order to keep them in sync I execute each change as soon as it's received.
It seems that when I'm inserting new documents I can go at a much faster pace than when updating. I have read the update docs, and it almost reads like partial updates are better than full updates, but there is the whole get-update-delete-reindex cycle that seems to happen with every update.
According to the ES docs I'm not supposed to tweak the thread pools or the performance settings. I don't think I'm hitting any of those limits anyway. The ES error logs don't indicate any issues either.
Anyone have advice on what I can do to track down the connection errors?
UPDATE:
This is the actual error:
Error: Unexpected result (SaveToES). Elasticsearch.Net.Exceptions.MaxRetryException: Sniffing known nodes in the cluster caused a maxretry exception of its own ---> Elasticsearch.Net.Exceptions.SniffException: Sniffing known nodes in the cluster caused a maxretry exception of its own ---> Elasticsearch.Net.Exceptions.MaxRetryException: Retry timeout 00:00:05 was hit after retrying 1 times: 'GET _nodes/_all/clear?timeout=3000'.
InnerException: WebException, InnerMessage: The operation has timed out, InnerStackTrace: at System.Net.HttpWebRequest.GetResponse()
at Elasticsearch.Net.Connection.HttpConnection.DoSynchronousRequest(HttpWebRequest request, Byte[] data, IRequestConfiguration requestSpecificConfig)
InnerException: WebException, InnerMessage: The operation has timed out, InnerStackTrace: at System.Net.HttpWebRequest.GetResponse()
at Elasticsearch.Net.Connection.HttpConnection.DoSynchronousRequest(HttpWebRequest request, Byte[] data, IRequestConfiguration requestSpecificConfig) ---> System.AggregateException: One or more errors occurred. ---> System.Net.WebException: The operation has timed out
at System.Net.HttpWebRequest.GetResponse()
at Elasticsearch.Net.Connection.HttpConnection.DoSynchronousRequest(HttpWebRequest request, Byte[] data, IRequestConfiguration requestSpecificConfig)
--- End of inner exception stack trace ---
--- End of inner exception stack trace ---
at Elasticsearch.Net.Connection.RequestHandlers.RequestHandlerBase.ThrowMaxRetryExceptionWhenNeeded[T](TransportRequestState`1 requestState, Int32 maxRetries)
at Elasticsearch.Net.Connection.RequestHandlers.RequestHandler.RetryRequest[T](TransportRequestState`1 requestState)
at Elasticsearch.Net.Connection.RequestHandlers.RequestHandler.DoRequest[T](TransportRequestState`1 requestState)
at Elasticsearch.Net.Connection.RequestHandlers.RequestHandler.RetryRequest[T](TransportRequestState`1 requestState)
at Elasticsearch.Net.Connection.RequestHandlers.RequestHandler.DoRequest[T](TransportRequestState`1 requestState)
at Elasticsearch.Net.Connection.RequestHandlers.RequestHandler.Request[T](TransportRequestState`1 requestState, Object data)
at Elasticsearch.Net.Connection.Transport.Elasticsearch.Net.Connection.ITransportDelegator.Sniff(ITransportRequestState ownerState)
--- End of inner exception stack trace ---
--- End of inner exception stack trace ---
at Elasticsearch.Net.Connection.Transport.Elasticsearch.Net.Connection.ITransportDelegator.Sniff(ITransportRequestState ownerState)
at Elasticsearch.Net.Connection.Transport.Elasticsearch.Net.Connection.ITransportDelegator.SniffClusterState(ITransportRequestState requestState)
at Elasticsearch.Net.Connection.Transport.Elasticsearch.Net.Connection.ITransportDelegator.SniffOnConnectionFailure(ITransportRequestState requestState)
at Elasticsearch.Net.Connection.RequestHandlers.RequestHandler.RetryRequest[T](TransportRequestState`1 requestState)
at Elasticsearch.Net.Connection.RequestHandlers.RequestHandler.DoRequest[T](TransportRequestState`1 requestState)
at Elasticsearch.Net.Connection.RequestHandlers.RequestHandler.Request[T](TransportRequestState`1 requestState, Object data)
at Elasticsearch.Net.Connection.Transport.DoRequest[T](String method, String path, Object data, IRequestParameters requestParameters)
at Elasticsearch.Net.ElasticsearchClient.DoRequest[T](String method, String path, Object data, IRequestParameters requestParameters)
at Elasticsearch.Net.ElasticsearchClient.IndicesCreatePost[T](String index, Object body, Func`2 requestParameters)
at Nest.RawDispatch.IndicesCreateDispatch[T](ElasticsearchPathInfo`1 pathInfo, Object body)
at Nest.ElasticClient.<CreateIndex>b__281_0(ElasticsearchPathInfo`1 p, ICreateIndexRequest d)
at Nest.ElasticClient.Nest.IHighLevelToLowLevelDispatcher.Dispatch[D,Q,R](D descriptor, Func`3 dispatch)
at Nest.ElasticClient.CreateIndex(Func`2 createIndexSelector)
at DCSCache.esvRepository.CreateIndex(String IndexName, String IndexVersion)
at DCSCache.esvRepository.Save(esv ItemToSave, String IndexName, String IndexVersion)
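For reference, the values that the error points at (the 5-second request timeout and the 3-second ping timeout used by the sniff call) can be relaxed through the same NEST 1.x-style setters shown above. A minimal sketch of settings one could experiment with, not a diagnosis:
ConnectionSettings settings = new ConnectionSettings(connectionPool);
settings.SetDefaultPropertyNameInferrer(p => p);
settings.SniffOnStartup(true);
settings.SniffOnConnectionFault(false);  // avoid re-sniffing the whole cluster on a single slow call
settings.SetPingTimeout(10000);          // give the ping/sniff calls more head room than 3 seconds
settings.SetTimeout(60000);              // raise the 5s request timeout seen in the MaxRetryException
settings.MaximumRetries(5);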
