Watson Discovery Service - File size limit

The Watson Discovery service documentation states the file size limit as: "The maximum file size that can be uploaded to the Discovery service is 50MB." But there seems to be a character limit as well, of 50,000 characters.
The warning message is "Text content exceeds 50000 character limit. Only first 50000 characters processed...".
Can anyone please confirm?

The warning about exceeding 50,000 characters is talking about enrichments. The text of the entire document is still added to your Discovery collection, but the enrichments only apply to the first 50,000 characters of the document text.
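If you need enrichments over the full text, one hedged workaround is to split long documents into pieces under the limit before upload and index each piece as its own document. A minimal PHP sketch; the uploadToDiscovery() helper and the doc-part naming are hypothetical, and a real splitter should cut at sentence or word boundaries rather than mid-word:

<?php
// Split a long document into chunks under the 50,000-character
// enrichment limit and upload each chunk as its own document.
const ENRICHMENT_CHAR_LIMIT = 50000;

function splitForEnrichment(string $text, int $limit = ENRICHMENT_CHAR_LIMIT): array
{
    // mb_str_split (PHP 7.4+) keeps multi-byte characters intact.
    return mb_str_split($text, $limit);
}

foreach (splitForEnrichment($documentText) as $i => $chunk) {
    uploadToDiscovery($chunk, "doc-part-$i"); // hypothetical upload helper
}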

Related

OpenSearch throws 429 errors when Fluent Bit outputs logs under heavy load

With the Fluent Bit configuration below, we are getting errors from OpenSearch under heavy load.
[Graph: HTTP bulk requests to OpenSearch by Fluent Bit; the 429 errors show up as spikes]
Fluent Bit config:
[INPUT]
    Name             tail
    Tag              kube.*
    Path             /var/log/containers/*.log
    DB               /var/log/flb_kube.db
    Mem_Buf_Limit    400M
    storage.type     filesystem
    Skip_Long_Lines  On
    Refresh_Interval 1
    Rotate_Wait      600

[OUTPUT]
    Name                     es
    Match                    kube.*
    Host                     ${ES_HOST}
    Port                     ${PORT}
    Buffer_Size              False
    AWS_Auth                 Off
    AWS_Role_ARN             ${ES_ARN}
    AWS_External_ID          ${ES_IAMROLE}
    HTTP_User                ${ES_USER}
    HTTP_Passwd              ${ES_PASSWD}
    tls                      On
    tls.verify               Off
    Trace_Output             ${TRACE_OUTPUT}
    Trace_Error              On
    Replace_Dots             On
    Index                    fluentbit
    Type                     flb
    AWS_Region               ${AWS_REGION}
    Logstash_Format          On
    Logstash_Prefix          ${ES_LOGSTASHPREFIX}_app_log
    Logstash_DateFormat      %Y.%m.%d
    Retry_Limit              10
    storage.total_limit_size 1G
To resolve this, we upgraded our OpenSearch instance type from r5.xlarge.search (4 nodes) to r5.2xlarge.search (3 nodes), but that didn't solve the issue.
We also increased the ES index refresh_interval to 60s, but that didn't help.
We read that Fluent Bit's output to ES can be throttled via buffering, so we set Mem_Buf_Limit to 400M, but that didn't help either.
Can someone suggest anything else we can try, or point out what we are missing?
The issue here is not with Fluent Bit but with OpenSearch/Elasticsearch.
HTTP 429 errors (es_request_rejected_exception) occur when more requests are sent to the cluster than its thread pool can handle. OpenSearch allocates thread pools differently for different tasks, with search operations getting a larger share. Manually modifying the thread pool allocation is not possible in versions 5.1 and later.
You can try to resolve this in a few ways:
1. Refresh rate (you already did that and it didn't help).
2. Change the indexing speed: send logs at a longer interval than your current one (see the sketch after this list).
3. Upscale (you did, and it didn't work either).
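For point 2, one knob on the Fluent Bit side is the Flush setting in the [SERVICE] section, which controls how often buffered records are forwarded to the outputs. A hedged sketch; the value is arbitrary, and larger values mean bigger but less frequent bulk requests:

[SERVICE]
    # Forward buffered records to outputs every 10 seconds
    Flush    10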
You can get an idea from the following formulas for thread pools:
Number of threads allocated for writes = number of virtual CPUs (your case)
Number of threads allocated for search = ((3 * number of virtual CPUs) / 2) + 1
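To make that concrete: assuming each r5.2xlarge.search node has 8 vCPUs, each node gets 8 write threads and ((3 * 8) / 2) + 1 = 13 search threads. Every bulk request from Fluent Bit has to fit through those 8 write threads per node, and whatever overflows the write queue is rejected with a 429.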
So, I am guessing your issue here is a large number of shards! You can either decrease the number of shards per index, or, if you only hit this issue occasionally under extra load, set the replica count to 0 and change it back to the original once the load has passed.
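For reference, the replica change is a single settings call against the index (my-index is a placeholder; the same body works on OpenSearch and Elasticsearch):

PUT /my-index/_settings
{
  "index": {
    "number_of_replicas": 0
  }
}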
Check these two links to find out more about optimizing your ES domain.
indexing performance
Best practices

What counts as a "Schema" for the Power Automate limit of 512?

I have a Swagger document that is 431 lines long. When I enter it into the custom connector's Swagger editor, it parses just fine.
But when I click "Update Connection" I get the following error:
Specified swagger has the following errors:
'Definition is not valid. Error: 'Critical : definitions/Order :
The total number of schemas in the object exceeds the max schema count allowed value of '512'.
Please remove any unnecessary property or item definitions. '
I went to this page, and it does indicate that Power Automate has a limit of 512 for its "Maximum schema count per body allowed in a Swagger file".
But I am not sure what this really means. My whole file is fewer than 512 lines, so whatever it is calling a "schema" should also number fewer than 512, right?
So here is my question:
What is a "Schema" to Power Automate? (And how can I see where all of them are hiding in my file?)
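The error text itself ("remove any unnecessary property or item definitions") suggests the counter is not tallying top-level definitions but every schema object, including each nested property and array-item schema. As a hedged illustration, this single hypothetical definition would contribute five schemas to the count, not one:

"definitions": {
  "Order": {
    "type": "object",
    "properties": {
      "id": { "type": "string" },
      "items": {
        "type": "array",
        "items": {
          "type": "object",
          "properties": {
            "sku": { "type": "string" }
          }
        }
      }
    }
  }
}

Here Order, id, the items array, the array's item object, and sku are each schema objects, so a 431-line file with deeply nested objects can plausibly exceed 512.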

Random (403) User Rate Limit Exceeded

I am using the Translate API to translate some texts on my page. Those texts are large HTML-formatted texts, so I had to develop a function that splits them into smaller pieces of less than 4,500 characters (including HTML tags) to stay under the limit of 5,000 characters per request; I also had to modify the Google PHP API to allow sending requests via POST.
I have enabled the paid version of the API in the Google Developers Console and changed the total quota to 50M characters per day and 500 requests/second/user.
Now I am translating the whole database of texts with a script. It works fine, but at some random points I receive the error "(403) User Rate Limit Exceeded", and then I have to wait a few minutes before re-running the script, because once the error occurs the API keeps returning it until I wait some time.
I don't know why it keeps returning the error if I don't exceed the number of requests; it's as if there were some kind of maximum characters per interval of time or something...
You probably exceeded the quota limits you set before: either the daily billable limit or the limit on request characters per second.
To change the usage limits or request an increase to your quota, do the following:
1. Go to the Google Developers Console "https://console.developers.google.com/".
2. Select a project.
3. On the left sidebar, expand APIs & auth.
4. Click APIs.
5. Click the name of an activated API you're interested in (e.g. the Translate API).
6. Near the top of the info page for the API, click Quota.
If you have billing enabled, just click Quota and it will take you to the quota page, where you can view and change the quota-related settings.
If not, clicking Quota shows information about any free quota and limits that apply to the Translate API.
Google Developer Console has a rate limit of 10 requests per second, regardless of the settings or limits you may have changed.
You may be exceeding this limit.
I was unable to find any documentation around this, but could verify it myself with various API requests.
You control the character limit but not the concurrency.
You are either making more than 500 concurrent requests per second or you are using another Google API that is hitting that concurrency limit.
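If the bursts cannot be smoothed out upstream, a client-side retry with exponential backoff is a common way to ride out the 403s. A minimal sketch, assuming the google/cloud-translate PHP client used in the snippets below ($t), whose ServiceException exposes the HTTP status code; the retry budget and delays are arbitrary:

<?php
use Google\Cloud\Core\Exception\ServiceException;

// Retry a translate call with exponential backoff on rate-limit errors.
function translateWithBackoff($t, string $text, int $maxRetries = 5)
{
    $delay = 1; // initial delay in seconds
    for ($attempt = 0; ; $attempt++) {
        try {
            return $t->translate($text);
        } catch (ServiceException $e) {
            // Only retry rate-limit style responses; rethrow anything else
            // or give up once the retry budget is spent.
            if (!in_array($e->getCode(), [403, 429]) || $attempt >= $maxRetries) {
                throw $e;
            }
            sleep($delay);
            $delay *= 2;
        }
    }
}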
The referer header is not set by default, but it is possible to add the headers to a request like so:
$result = $t->translate('Hola Mundo', [
    'restOptions' => [
        'headers' => [
            'referer' => 'https://your-uri.com'
        ]
    ]
]);
If it makes more sense for you to set the referer at the client level (so all requests flowing through the client receive the header), this is possible as well:
$client = new TranslateClient([
    'key' => 'my-api-key',
    'restOptions' => [
        'headers' => [
            'referer' => 'https://your-uri.com'
        ]
    ]
]);
This worked for me!
Reference
In my case, this error was caused by my invalid payment information. Go to the Billing area and make sure everything is OK.

AppFabric QuotaExceededException

When trying to insert a large item into the AppFabric cache, I get an error:
Microsoft.ApplicationServer.Caching.DataCacheException:ErrorCode<ERRCA0016>:SubStatus<ES0001>:The connection was terminated, possibly due to server or network problems or serialized Object size is greater than MaxBufferSize on server. Result of the request is unknown. ---> System.ServiceModel.CommunicationException: The maximum message size quota for incoming messages (183886080) has been exceeded. To increase the quota, use the MaxReceivedMessageSize property on the appropriate binding element. ---> System.ServiceModel.QuotaExceededException: The maximum message size quota for incoming messages (183886080) has been exceeded. To increase the quota, use the MaxReceivedMessageSize property on the appropriate binding element.
--- End of inner exception stack trace ---
at System.Runtime.AsyncResult.End[TAsyncResult](IAsyncResult result)
at System.ServiceModel.Channels.TransportDuplexSessionChannel.EndReceive(IAsyncResult result)
at Microsoft.ApplicationServer.Caching.WcfClientChannel.CompleteProcessing(IAsyncResult result)
The problem is that I can find very little documentation on this issue.
I can see various links discussing it, all pointing to sites that no longer exist.
e.g.
http://www.biztalkgurus.com/appfabric/b/appfabric-syn/archive/2011/04/19/understanding-the-windows-azure-appfabric-service-bus-quotaexceededexception.aspx
I've also found the following, which discusses setting the MaxReceivedMessageSize property.
http://msdn.microsoft.com/en-us/library/ee677250(v=azure.10).aspx
However, on my install of AppFabric 1.1 on Windows Server I don't have the Set-ASAppServiceEndpoint cmdlet and cannot find where to get it.
The error is not in AppFabric itself. The service needs a larger message size quota to transmit the message to the other endpoint. Remember that this configuration must be the same at both endpoints.
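For the cache client side, the buffer limits live in the dataCacheClient section of the application config. A hedged sketch; the host name, port, and sizes are placeholders, and the matching limits must also be raised in the cache cluster configuration so both endpoints agree:

<configuration>
  <configSections>
    <section name="dataCacheClient"
             type="Microsoft.ApplicationServer.Caching.DataCacheClientSection, Microsoft.ApplicationServer.Caching.Core" />
  </configSections>
  <dataCacheClient>
    <hosts>
      <host name="CacheServer1" cachePort="22233" />
    </hosts>
    <!-- Raise the transport buffer limits for large cached items. -->
    <transportProperties maxBufferSize="500000000"
                         maxBufferPoolSize="500000000" />
  </dataCacheClient>
</configuration>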

Named pipe performance

I have an application where I send approx. 125 data items via a named pipe.
Each data item consists of data block 1 with max. 300 characters and data block 2 with max. 600 characters.
This gives 125 data items * (300 + 600) characters * 2 bytes per character = 125 * 900 * 2 = 225,000 bytes.
Each data item is surrounded by curly braces, like {Message1}{Message2}.
I noticed that when I send the messages, there are sending/receiving problems: instead of {Message1}{Message2}, the receiving application gets {Messa{Message2}.
Then I changed the sending code so that the messages are sent at 500 ms intervals, and the problem disappeared.
Assuming I do everything correctly (no bugs on my side, no misconfiguration of the named pipes), how much time is required to send 225,000 bytes over a named pipe from a Delphi 2009 application to a .NET application on the same machine?
What is a reasonable time for sending data of that size?
