"Recent values to use" in Toloka quality control - crowdsourcing

Can the "Recent values to use" for control tasks be set to a larger value than the current pool size to include older pools?

Yes, you can enter a value larger than the current pool size. The calculation then extends to the other pools in which you also filled in "Recent values to use" for control tasks. To base the calculation on control task responses from all project pools, fill in the field in the rule for each pool.
In other words, imagine you have 3 pools, each with a control tasks rule. In the 1st and 3rd pools you set "Recent values to use" = 10. In the 2nd pool you did not enter any value for "Recent values to use". The performer's control task history will then include only the 1st and 3rd pools.
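The three-pool example can be modeled with a small sketch. This is only an illustration of the behavior described above, not Toloka's implementation: the `Pool` class and `control_history` function are hypothetical names, pools are assumed to be listed oldest first, the field is assumed to hold a positive number when filled in, and a pool with an empty field is assumed to look only at its own responses.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Pool:
    name: str
    recent_values_to_use: Optional[int]  # None means the field was left empty
    control_answers: List[bool]          # oldest first; True = correct answer


def control_history(pools: List[Pool], current: Pool) -> List[bool]:
    """Return the control-task answers the rule in `current` would consider (hypothetical model)."""
    if current.recent_values_to_use is None:
        # Assumption: with an empty field, only the pool's own answers count.
        return list(current.control_answers)
    # Collect answers from every pool that filled in the field, oldest pool first.
    shared = [
        answer
        for pool in pools
        if pool.recent_values_to_use is not None
        for answer in pool.control_answers
    ]
    # Keep only the most recent N answers.
    return shared[-current.recent_values_to_use:]
```

With pools 1 and 3 set to 10 and pool 2 left empty, calling `control_history` for pool 3 returns answers from pools 1 and 3 only.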

Related

Experiments Feature stuck on collecting data

I am trying to split traffic from a given flow into different versions to measure statistical performance over time using the Experiment feature. However, it always shows the state "Collecting Data".
Here are the steps to reproduce the issue:
1. Create an Experiment on a flow and select different versions.
2. Select Auto rollout and select the Steps option.
3. Add steps for the gradual increase of traffic and the minimum duration.
4. Save and start the Experiment.
5. Send queries to the chatbot that trigger the flow configured for the Experiment.
The experiment should show some results in the Status tab and compare the performance of multiple flow versions. However, it does not produce any results. The status always shows "Collecting Data" and Auto Rollout shows "Not Started".
The only prerequisite for the Experiments feature is that interaction logs are enabled, and they are already enabled on my virtual agent.
About 2.5K sessions (~4K interactions) were created in the last 48 hours. Are there any minimum requirements for it to generate results, such as a minimum number of sessions?

Azure Storage Account crossing IOPS limit

I have an Azure Storage Account and it is crossing the 20K IOPS limit. How can I reduce the IOPS of the storage account?
We are doing copy and delete operations on the file share. Can we do this with a batch operation so that it is counted as one transaction?
Please advise.
If all the IOs are on a file share, you are also constrained by SMB behavior and by the nature of the file share contents. Are you deleting many small files? That would basically mean you are deleting and creating 20,000 files every second?
Batch operations can be used only when you can delete the whole share.
For example, if you are uploading a blob, using Put Blob instead of multiple Put Block or Put Page calls will help reduce the IOs.
You can use the following steps to measure the share IOPS from Performance Monitor. This can also be used to isolate which share has a significant amount of activity:
1. Open Performance Monitor (PerfMon.msc).
2. Go to Performance | Monitoring Tools | Performance Monitor.
3. Delete all default counters at the bottom of the display.
4. Click on the green + sign at the top of the display.
5. Under "Available counters," expand "SMB Client Shares."
6. Select "Data Requests/sec" which is the total number of IOPS.
   a. You can also select "Data Bytes/sec" if you want to see throughput and whether you are getting to the 60 MBps limit.
7. Under "Instances of selected object," choose the share(s) that you suspect are hitting the 1000 IOPS limit.
   a. Use Ctrl to select multiple, or choose "<All instances>" to select all.
8. Click on "Add >>" and then click on OK.
The graph will display the IOPS over time. The "Last" field displays the number of IOs in the previous second.
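If the "Data Requests/sec" values are noted down or exported, a short script can summarize them against the per-share limit. The sketch below is only an illustration: the function name `summarize_iops` is made up, the input is assumed to be a plain list of one-second samples, and the default limit of 1000 is taken from the steps above.

```python
def summarize_iops(samples, limit=1000):
    """Summarize per-second request counts against an IOPS limit."""
    if not samples:
        raise ValueError("no samples to summarize")
    # Seconds in which the share exceeded the limit.
    over = [s for s in samples if s > limit]
    return {
        "peak": max(samples),
        "average": sum(samples) / len(samples),
        "seconds_over_limit": len(over),
    }
```

Running it per share makes it easier to see which share accounts for most of the activity.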

Nifi QueryDatabaseTable processor, when will it reset the value?

According to the documentation linked below, it seems that restarting the processor will reset the maximum column value I provided, so that it starts fetching data from the beginning.
Document Link: https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi/nifi-standard-nar/1.5.0/org.apache.nifi.processors.standard.QueryDatabaseTable/index.html
A comma-separated list of column names. The processor will keep track of the maximum value for each column that has been returned since the processor started running.
However, I tested this behavior, and even if I restart the processor I get an incremental load only. Is there a mistake in the documentation, or have I missed something?
What would happen if I re-deploy the job, that is, delete the job and re-create it from the template?
The code mentions that the value is stored as part of Scope.CLUSTER. Would someone please explain what that is, and under which conditions the state is cleared?
@Stateful(scopes = Scope.CLUSTER, description = "After performing a query on the specified table, the maximum values for " + "the specified column(s) will be retained for use in future executions of the query. This allows the Processor " + "to fetch only those records that have max values greater than the retained values. This can be used for " + "incremental fetching, fetching of newly added rows, etc. To clear the maximum values, clear the state of the processor " + "per the State Management documentation")
Once the processor has been started for the first time, it will never reset its value unless you go into the "View State" menu of the processor and click "Clear State".
It would not make sense to clear the state when stopping and starting the processor, because the value would then be reset every time NiFi restarted for maintenance or after a crash, which would not be desired.
Where the state is stored depends on whether you are running a single node or a cluster. On a single node it is stored in a local write-ahead log; in a cluster it is stored in ZooKeeper so that all nodes can access it if necessary. In either case it is stored by the UUID of the processor.
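The retained-maximum behavior can be sketched outside NiFi. The example below is not NiFi code: `state_store`, `fetch_new_rows`, `clear_state`, and the `events` table are hypothetical names, and an in-memory dictionary keyed by processor UUID stands in for the write-ahead log or ZooKeeper.

```python
import sqlite3

# Hypothetical stand-in for NiFi's state store: processor UUID -> retained maximum id.
state_store = {}


def fetch_new_rows(conn, processor_uuid):
    """Return rows with id above the retained maximum, then retain the new maximum."""
    last_max = state_store.get(processor_uuid)
    if last_max is None:
        # No retained value yet: fetch everything.
        rows = conn.execute("SELECT id, payload FROM events ORDER BY id").fetchall()
    else:
        rows = conn.execute(
            "SELECT id, payload FROM events WHERE id > ? ORDER BY id", (last_max,)
        ).fetchall()
    if rows:
        # Rows are ordered by id, so the last row carries the maximum.
        state_store[processor_uuid] = rows[-1][0]
    return rows


def clear_state(processor_uuid):
    """Mimic 'Clear State': forget the retained maximum for this processor."""
    state_store.pop(processor_uuid, None)


def make_demo_db():
    """Create an in-memory table with three rows (ids 1 to 3)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, payload TEXT)")
    conn.executemany("INSERT INTO events (payload) VALUES (?)", [("a",), ("b",), ("c",)])
    return conn
```

In this model, repeated calls with the same UUID return only new rows, while calling `clear_state` or using a different UUID starts again from the beginning. Whether a processor re-created from a template behaves like the "different UUID" case is an inference from the answer above, not something the sketch proves.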

Default termination policy of AWS auto scaling group

In an Auto Scaling group, if there is an equal number of instances in multiple Availability Zones, which Availability Zone will be selected for terminating instances under the AWS default termination policy? Is it selected randomly?
According to the documentation, if you did not assign a specific termination policy to the group, it uses the default termination policy.
When multiple Availability Zones have an equal number of instances, the Auto Scaling group selects the Availability Zone with the instances that use the oldest launch configuration.
If the instances were launched from the same launch configuration, then the Auto Scaling group selects the instance that is closest to the next billing hour and terminates it.
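The selection order described above can be written as a minimal sketch. It is a simplified model, not AWS code: the `Instance` fields and the function name are hypothetical, launch configuration age is represented as a plain number (larger means older), and the first step (starting from the Availability Zones with the most instances) is assumed from the premise of the question.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List


@dataclass
class Instance:
    instance_id: str
    az: str
    launch_config_age: int        # hypothetical: larger value = older launch configuration
    minutes_to_billing_hour: int  # minutes until the next billing hour


def pick_instance_to_terminate(instances: List[Instance]) -> Instance:
    """Pick an instance following the order described in the answer (simplified)."""
    # Start from the Availability Zones that hold the most instances.
    counts = Counter(i.az for i in instances)
    most = max(counts.values())
    pool = [i for i in instances if counts[i.az] == most]
    # Among those, prefer instances that use the oldest launch configuration.
    oldest = max(i.launch_config_age for i in pool)
    pool = [i for i in pool if i.launch_config_age == oldest]
    # If several remain, take the one closest to the next billing hour.
    return min(pool, key=lambda i: i.minutes_to_billing_hour)
```

The sketch does not model any further tie-breaking, so when several instances remain equal on all three criteria it simply returns the first one.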

Bulk Movement Jobs in FileNet 5.2.1

I have a requirement to move documents from one storage area to another and am planning to use Bulk Movement Jobs under Sweep Jobs in FileNet P8 v5.2.1.
My filter criterion is obviously (and only) the storage area ID, as I want to target a specific storage area and move the content to another storage area (kind of like archiving) without altering the security, relationship containment, document class, etc.
When I run the job, although I have around 100,000 objects in the storage area that I am targeting, the examined objects field of the job shows 500M objects, and it took around 15 hours to move the objects. The DBA analyzed the situation and told me that although I have all the necessary indexes created on the DocVersion table (as per the FileNet documentation), the job still does a full table scan.
1. Why would something like this happen?
2. What additional indexes can be used, and how would they help?
3. Is there a better way to do this that takes less time?
I can answer only questions 2 and 3.
For indexes, you can use this documentation: https://www-01.ibm.com/support/knowledgecenter/SSNW2F_5.2.0/com.ibm.p8.performance.doc/p8ppt237.htm
You can improve the performance of your jobs by splitting the documents into batches with the "Policy controlled batch size" option (as far as I remember) on the "Sweeps subsystem" tab in the Domain settings.
Use Time Slot management:
https://www-01.ibm.com/support/knowledgecenter/SSNW2F_5.2.1/com.ibm.p8.ce.admin.tasks.doc/p8pcc179.htm?lang=ru
and the Filter Timelimit option:
https://www-01.ibm.com/support/knowledgecenter/SSNW2F_5.2.1/com.ibm.p8.ce.admin.tasks.doc/p8pcc203.htm?lang=ru
In general, you split all your documents into portions and process them at separate times and in separate threads.
