Azure Storage Account crossing IOPS limit - performance

I have an Azure Storage Account and it is crossing the 20K IOPS limit. How can I reduce the IOPS of the storage account?
We are doing copy and delete operations on the file share. Can we do this using a batch operation so that it is counted as one transaction?
Please advise.

If all the IOs are against the file share, then you are also constrained by SMB behavior and by the nature of the file share contents. Are you deleting many small files? To hit that limit you would essentially have to be deleting and creating around 20,000 files every second.
Batch operations would only help if you could delete the whole share at once.
For example, if you are uploading a blob, using a single Put Blob call reduces the IOs, instead of multiple Put Block/Put Page calls.
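Roughly, in the azure-storage-blob v12 Python SDK that might look like the sketch below (my illustration, not from the original answer; the connection string, container, and file names are placeholders, and max_single_put_size is the threshold under which upload_blob issues a single Put Blob):

from azure.storage.blob import BlobServiceClient

# Uploads smaller than max_single_put_size go out as one Put Blob call instead of
# several Put Block calls plus a Put Block List, so they count as fewer IOs.
service = BlobServiceClient.from_connection_string(
    "<connection-string>",                    # placeholder
    max_single_put_size=64 * 1024 * 1024,     # 64 MiB threshold for a single Put Blob
)
blob = service.get_blob_client(container="mycontainer", blob="report.csv")

with open("report.csv", "rb") as data:
    blob.upload_blob(data, overwrite=True)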
You can use the following steps to measure the share IOPS from Performance Monitor. This can also be used to isolate which share has a significant amount of activity:
1. Open Performance Monitor (PerfMon.msc).
2. Go to Performance | Monitoring Tools | Performance Monitor.
3. Delete all default counters at the bottom of the display.
4. Click the green + sign at the top of the display.
5. Under "Available counters," expand "SMB Client Shares."
6. Select "Data Requests/sec," which is the total number of IOPS.
   a. You can also select "Data Bytes/sec" if you want to see throughput and whether you are reaching the 60 MBps limit.
7. Under "Instances of selected object," choose the share(s) that you suspect are hitting the 1,000 IOPS limit.
   a. Use Ctrl to select multiple shares, or choose "<All instances>" to select all.
8. Click "Add >>" and then click OK.
The graph will display the IOPS over time. The "Last" field displays the number of IOs in the previous second.
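If you prefer to capture the same counter from a script instead of the PerfMon UI (my own suggestion, not part of the steps above), typeperf can sample it, for example wrapped in Python:

import subprocess

# Sample "Data Requests/sec" for every mapped SMB share once per second, 60 times.
COUNTER = r"\SMB Client Shares(*)\Data Requests/sec"
subprocess.run(["typeperf", COUNTER, "-si", "1", "-sc", "60"], check=True)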

Related

Experiments Feature stuck on collecting data

I am trying to split traffic from a given flow into different versions to measure statistical performance over time using the Experiment feature. However, it always shows the state "Collecting Data".
Here are the steps to reproduce the issue --
Create an Experiment on a flow and select different versions
Select Auto rollout and the Steps option
Add steps for gradual progress of traffic increase and minimum duration
Save and Start the Experiment
Send queries to chatbot triggering the configured flow for the Experiment
The experiment should show some results in the Status tab and compare the performance of multiple flow versions. However, it does not produce any results; it always shows the status as "Collecting Data" and Auto Rollout as "Not Started".
The only prerequisite for the Experiments feature to work is to enable Interaction logs, which are already enabled on my virtual agent.
About 2.5K sessions (~4K interactions) were created in the last 48 hours. Are there any minimum requirements for it to generate results, such as a minimum number of sessions?

How to make a Spotfire link open faster?

I've published a Spotfire file with 70 '.txt' files linked to it. The total size of the files is around 2 GB. When users open it in their web browser it takes roughly 27 minutes to load the linked tables.
I need an option that improves opening performance. The issue seems to be the amount of data and the way the files are linked to Spotfire.
This runs on a server and the users open the analysis in their browser.
I've tried embedding the data; it lowers the load time, but it forces me to interact with the software every time I want to update the data. The solution is supposed to run automatically.
I need to open this in less than 5 minutes.
Update:
- I need the data to be updated at least twice a day.
- The embedded link is acceptable from the time perspective, but the system needs to run without my intervention.
- I've never used Spotfire automation services.
Schedule the report to cache twice a day on the Spotfire server by setting up a rule under Scheduling and Routing. The good thing about this is that while the server is updating the analysis for the second time during the day, users can still quickly open the older cached data until the update completes. To the end user it will open in seconds, because behind the scenes you have just pre-opened the report. Once you set up the rule, this runs automatically with no intervention needed.
All functionality and scripting within the report will work the same, and it can be opened at the same time by many different users. This is really the best way if you have to link to that many files. Otherwise, try consolidating files, aggregating data, and removing unnecessary columns and data tables so the data pulls through faster.

Hadoop vs Cassandra: Which is better for the following scenario?

There is a situation in our systems in which a user can view and "close" a report. After they close it, the report is moved to a temporary table inside the database where it is kept for 24 hours, and then moved to an archives table (where the report is stored for the next 7 years). At any point during those 7 years, a user can "reopen" the report and work on it. The problem is that the archive storage is getting large, and finding/reopening reports is becoming time consuming. I also need to get statistics on the archives from time to time (e.g. report dates, clients, average length "opened", etc.). I want to use a big data approach but am not sure whether to use Hadoop, Cassandra, or something else. Can someone provide some guidelines on how to get started and decide what to use?
If your archive is large and you'd like to get reports from it, you won't be able to use just Cassandra, as it has no easy means of aggregating the data. You'll end up colocating Hadoop and Cassandra on the same nodes.
In my experience, archives (write once, read many) are not the best use case for Cassandra if you have a lot of writes (we tried it as the backend for a backup system). Depending on your compaction strategy you'll pay either in space or in IOPS for that: added changes are propagated through the SSTable hierarchy, resulting in many more writes than the original change.
It is not possible to answer your question in full without knowing other variables: how much hardware (servers, and their RAM/CPU/HDD/SSD) are you going to allocate? What is the size of each "report" entry? How many reads/writes do you usually serve daily? How large is your archive storage now?
Cassandra might work fine. Keep two tables, reports and reports_archive. Define the schema using a TTL of 24 hours and 7 years:
CREATE TABLE reports (
...
) WITH default_time_to_live = 86400;        -- 24 hours
CREATE TABLE reports_archive (
...
) WITH default_time_to_live = 220752000;    -- 7 years (86400 * 365 * 7 seconds)
Use the new Time Window Compaction Strategy (TWCS) to minimize write amplification. It could be advantageous to store the report metadata and report binary data in separate tables.
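If it helps, switching the archive table to TWCS from the Python driver would look roughly like the sketch below (my illustration; the contact point and keyspace name are placeholders, and the window size is just an example):

from cassandra.cluster import Cluster

cluster = Cluster(["10.0.0.1"])            # placeholder contact point
session = cluster.connect("reports")       # placeholder keyspace

# TWCS groups SSTables into time windows, which keeps write amplification low
# for append-mostly, TTL'd data such as an archive table.
session.execute("""
    ALTER TABLE reports_archive
    WITH compaction = {
        'class': 'TimeWindowCompactionStrategy',
        'compaction_window_unit': 'DAYS',
        'compaction_window_size': 7
    }
""")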
For roll-up analytics, use Spark with Cassandra. You don't mention the size of your data, but roughly speaking 1-3 TB per Cassandra node should work fine. Using RF=3 you'll need at least three nodes.
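As a rough illustration of the Spark-on-Cassandra roll-up (a sketch, assuming the spark-cassandra-connector package is on the classpath; the keyspace and the client_id/opened_seconds columns are made up for the example):

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = (SparkSession.builder
         .appName("archive-rollup")
         .config("spark.cassandra.connection.host", "10.0.0.1")   # placeholder node
         .getOrCreate())

archive = (spark.read
           .format("org.apache.spark.sql.cassandra")
           .options(keyspace="reports", table="reports_archive")
           .load())

# Example statistics: report count and average time a report stayed "opened", per client.
stats = (archive.groupBy("client_id")
         .agg(F.count(F.lit(1)).alias("report_count"),
              F.avg("opened_seconds").alias("avg_opened_seconds")))
stats.show()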

Bulk Movement Jobs in FileNet 5.2.1

I have a requirement to move documents from one storage area to another, and I am planning to use Bulk Movement Jobs under Sweep Jobs in FileNet P8 v5.2.1.
My filter criterion is only the storage area id, since I want to target a specific storage area and move its content to another storage area (somewhat like archiving) without altering the security, relationship containment, document class, etc.
When I run the job, although I have around 100,000 objects in the storage area I am targeting, the examined objects field shows 500M objects, and it took around 15 hours to move the objects. The DBA analyzed the situation and told me that although I have all the necessary indexes created on the DocVersion table (as per the FileNet documentation), the job still performs a full table scan.
1. Why would something like this happen?
2. What additional indexes can be used, and how would they be helpful?
3. Is there a better way to do this with less time consumption?
Answering only questions 2 and 3:
For indexes, you can use this documentation: https://www-01.ibm.com/support/knowledgecenter/SSNW2F_5.2.0/com.ibm.p8.performance.doc/p8ppt237.htm
You can improve the performance of your jobs if you split the documents into batches via the "Policy controlled batch size" option (as I remember) on the "Sweeps subsystem" tab in the Domain settings.
Use Time Slot management
https://www-01.ibm.com/support/knowledgecenter/SSNW2F_5.2.1/com.ibm.p8.ce.admin.tasks.doc/p8pcc179.htm?lang=ru
and the Filter Timelimit option
https://www-01.ibm.com/support/knowledgecenter/SSNW2F_5.2.1/com.ibm.p8.ce.admin.tasks.doc/p8pcc203.htm?lang=ru
In short, you split all your documents into portions and process them at separate times and in separate threads.

How to know that Fusion Tables usage is over limit?

I use the Fusion Tables API to add rows to my tables. I found this answer explaining the limits. Will I get some error message when these limits are reached?
I am currently getting the following error:
https://www.googleapis.com/upload/fusiontables/v1/tables/my-table-id/import?uploadType=media&alt=json
returned "Internal error when processing import. Please try again.">
and I don't know what the reason is.
The total number of rows in my document is 464,938. The number of cells is 13 * 464,938 (5,295,364 non-empty cells). The downloaded file size is 43 MB (I'm not sure how to check the file size directly on the Google side). But when I created a new table, it started to work well.
Looks like a capacity limit was hit.
In December 2015, Fusion Tables announced increased limits.
We are happy to announce that, starting immediately:
All users have 1 GB of storage quota for their tables. There continues to be a 250 MB limit per table.
Newly created tables can show up to 350,000 features on a map. There continues to be a limit of 1 M characters per cell and 10 M vertices per table. You can activate the new limit for existing tables by opening the row editor and then clicking the "Save" button.
If you try again it may work, but 465k rows is still high.
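One thing you could try (my own sketch, not from the posts above): split the CSV into smaller batches and retry the transient 500s, using the same import endpoint from your error message. The access token, file name, and batch size are placeholders:

import time
import requests

TABLE_ID = "my-table-id"                            # placeholder from the question
URL = ("https://www.googleapis.com/upload/fusiontables/v1/tables/"
       + TABLE_ID + "/import?uploadType=media&alt=json")
HEADERS = {
    "Authorization": "Bearer <access-token>",       # placeholder OAuth token
    "Content-Type": "application/octet-stream",
}

def import_rows(csv_chunk, retries=3):
    # POST one batch of CSV rows; back off and retry on transient 5xx errors.
    for attempt in range(retries):
        resp = requests.post(URL, headers=HEADERS, data=csv_chunk.encode("utf-8"))
        if resp.ok:
            return
        if resp.status_code >= 500:
            time.sleep(2 ** attempt)
            continue
        resp.raise_for_status()                      # 4xx (quota/limit) should surface immediately
    resp.raise_for_status()

# Upload in batches of 50,000 rows instead of all 465k rows at once.
with open("rows.csv", encoding="utf-8") as f:
    batch = []
    for line in f:
        batch.append(line)
        if len(batch) == 50_000:
            import_rows("".join(batch))
            batch = []
    if batch:
        import_rows("".join(batch))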
