Why do file operations hang after deleting a folder on a large NTFS volume? - Windows

There is a computer running Windows Server 2012 R2 with a 54.5 TB NTFS volume. The volume is almost full and highly fragmented (defrag.exe reports 98% fragmented space). It is used for storing a video archive and has the folder structure d:\Video\. There are about 4K folders under Video, and each folder contains 100 - 12K files.
When I delete any of those folders, there is a very long interval (minutes or tens of minutes) during which every WinAPI file function "hangs". The functions do not return an error; they simply do not return at all. After that period they finally return, without any error.
In Windows Performance Analyzer I can see that during the hang there is one CPU-consuming thread in the System process. Its stack:
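A minimal way to quantify the hang looks roughly like this (Python sketch; the folder and file names are placeholders, not the real archive layout):

```python
import os
import shutil
import time

VOLUME = r"d:\Video"                                   # archive root on the affected volume
FOLDER_TO_DELETE = os.path.join(VOLUME, "folder_0001") # placeholder: one archive folder
PROBE_FILE = os.path.join(VOLUME, "probe.tmp")         # placeholder: unrelated file on the same volume

# Delete one archive folder (100 - 12K files) and time the call itself.
t0 = time.perf_counter()
shutil.rmtree(FOLDER_TO_DELETE)
print(f"rmtree returned after {time.perf_counter() - t0:.1f} s")

# Immediately afterwards, time an unrelated file operation on the same volume.
# During the hang described above, this blocks for minutes without returning an error.
t0 = time.perf_counter()
with open(PROBE_FILE, "w") as f:
    f.write("probe")
os.remove(PROBE_FILE)
print(f"create/delete of a small probe file took {time.perf_counter() - t0:.1f} s")
```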
[Root]
|- ntoskrnl.exe!KiStartSystemThread
|  ntoskrnl.exe!PspSystemThreadStartup
|  |- ntoskrnl.exe!ExpWorkerThread
|  |  |- Ntfs.sys!NtfsCheckpointAllVolumes
|  |  |  Ntfs.sys!NtfsForEachVcb
|  |  |  Ntfs.sys!NtfsCheckpointAllVolumesWorker
|  |  |  Ntfs.sys!NtfsCheckpointVolume
|  |  |  Ntfs.sys!NtfsFreeRecentlyDeallocated
|  |  |  |- Ntfs.sys!NtfsDeviceIoControl
|  |  |  |  |- Ntfs.sys!NtfsCallStorageDriver
|  |  |  |  |  ntoskrnl.exe!KeExpandKernelStackAndCalloutInternal
|  |  |  |  |  ntoskrnl.exe!KiSwitchKernelStackContinue
|  |  |  |  |  ntoskrnl.exe!KySwitchKernelStackCallout
|  |  |  |  |  Ntfs.sys!NtfsStorageDriverCallout
|  |  |  |  |  volsnap.sys!VolSnapDeviceControl
|  |  |  |  |  |- volsnap.sys!VspQueryCopyFreeBitmap
|  |  |  |  |  |  |- ntoskrnl.exe!RtlFindNextForwardRunClearCapped
Can anyone help me understand what's going on? I have full access to the server and can provide any additional info.
NTFSInfo output:
NTFS Information Dump V1.01
Copyright (C) 1997 Mark Russinovich
http://www.sysinternals.com
Volume Size
-----------
Volume size : 57223549 MB
Total sectors : 117193830399
Total clusters : 3662307199
Free clusters : 7644452
Free space : 119444 MB (0% of drive)
Allocation Size
----------------
Bytes per sector : 512
Bytes per cluster : 16384
Bytes per MFT record : 1024
Clusters per MFT record: 0
MFT Information
---------------
MFT size : 14959 MB (0% of drive)
MFT start cluster : 196608
MFT zone clusters : 3653996704 - 3654008160
MFT zone size : 179 MB (0% of drive)
MFT mirror start : 1

Unfortunately, I could not reach the developers through my support conversation. The issue stopped reproducing on this particular machine after we made a volume backup, and the support incident was closed.
We reworked the archive-writing mechanism to reduce file fragmentation and free-space fragmentation, and we have never seen the issue again on volumes with low fragmentation.
I guess the issue is related to free-space fragmentation, but I have no proof.

Related

When restoring a database on CockroachDB, why is there a difference in the number of index entries/rows restored?

When backing up a database in CockroachDB, the number of rows/index_entries/bytes written is included in the output:
job_id | status | fraction_completed | rows | index_entries | bytes
---------------------+-----------+--------------------+---------+---------------+--------------
903515943941406094 | succeeded | 1 | 100000 | 200000 | 4194304
When restoring the same backup, the same metrics are reported:
job_id | status | fraction_completed | rows | index_entries | bytes
---------------------+-----------+--------------------+---------+---------------+--------------
803515943941406094 | succeeded | 1 | 99999 | 199999 | 4194200
What causes the difference between the two and is all my data restored?
A couple of things impact the metrics you are observing. The backup phase may make a copy of specific system tables for metadata, which are not directly restored from the backup image. The configurations from those system tables are applied to your restored database, but they do not count toward the reported rows / index_entries metrics.
So a relatively small delta between the two values is not unusual in full backup/restore scenarios.

Converting Raw Data to Event Log

I do research in the field of healthcare process mining (Health-PM) and am facing unstructured big data that needs a preprocessing phase to be converted into a suitable event log.
From my searches, it seems no ProM plug-in, stand-alone tool, or script has been developed specifically for this task, except Celonis, which claims to have developed an event log converter. I am also writing event log generator code for my specific case study.
I just want to know: is there any business solution, case study, or article that investigates this issue?
Thanks.
Soureh
What exactly do you mean by unstructured? Is it a badly structured table like the example you provided, or data that is not structured at all (e.g. a hard disk full of files)?
In the first case, Celonis indeed provides an option to extract events from tables using Vertica SQL; you can learn how to do that in their free SNAP environment.
In the latter case, I guess at least semi-structured data is needed to extract events at scale, otherwise your script has no clue where to look.
Good question! Many process mining papers mention that most existing information systems are PAIS (process-aware information systems) and hence qualify for process mining. This is true, BUT it does not mean you can get the data out of the box!
What's the solution? You may transform the existing data (typically from a relational database of your business solution, e.g., an ERP or HIS system) into an event log that process mining can understand.
It works like this: you look into the table containing, e.g., patient registration data. You need the patient ID and the registration timestamp for each record. You create an empty table for your event log, typically called "Activity_Table". You give each activity a name depending on the business context; in our example, "Patient Registration" would be a sound name. You insert all the patient IDs with their respective timestamps into the Activity_Table, followed by the same activity name for all rows, i.e., "Patient Registration". The result looks like this:
|Patient-ID | Activity | timestamp |
|:----------|:--------------------:| -------------------:|
| 111 |"Patient Registration"| 2021.06.01 14:33:49 |
| 112 |"Patient Registration"| 2021.06.18 10:03:21 |
| 113 |"Patient Registration"| 2021.07.01 01:20:00 |
| ... | | |
Congrats! You have an event log with one activity. The rest works the same way: you create a similar table for every important action that has a timestamp in your database, e.g., "Diagnose finished", "lab test requested", "treatment A finished".
|Patient-ID | Activity | timestamp |
|:----------|:-----------------:| -------------------:|
| 111 |"Diagnose finished"| 2021.06.21 18:03:19 |
| 112 |"Diagnose finished"| 2021.07.02 01:22:00 |
| 113 |"Diagnose finished"| 2021.07.01 01:20:00 |
| ... | | |
Then you UNION all these mini tables and sort the result by Patient-ID and then by timestamp:
|Patient-ID | Activity | timestamp |
|:----------|:--------------------:| -------------------:|
| 111 |"Patient Registration"| 2021.06.01 14:33:49 |
| 111 |"Diagnose finished" | 2021.06.21 18:03:19 |
| 112 |"Patient Registration"| 2021.06.18 10:03:21 |
| 112 |"Diagnose finished" | 2021.07.02 01:22:00 |
| 113 |"Patient Registration"| 2021.07.01 01:20:00 |
| 113 |"Diagnose finished" | 2021.07.01 01:20:00 |
| ... | | |
Notice that the last two rows have the same timestamp. This is very common when working with real data. To handle it, we need an extra "Order" column, which helps the process mining algorithm understand the "normal" order of activities that share a timestamp, according to the nature of the underlying business. In this case we know that registration happens before diagnosis, so we assign a low value (e.g., 1) to all "Patient Registration" activities. The table might look like this:
|Patient-ID | Activity | timestamp |Order |
|:----------|:--------------------:|:-------------------:| ----:|
| 111 |"Patient Registration"| 2021.06.01 14:33:49 | 1 |
| 111 |"Diagnose finished" | 2021.06.21 18:03:19 | 2 |
| 112 |"Patient Registration"| 2021.06.18 10:03:21 | 1 |
| 112 |"Diagnose finished" | 2021.07.02 01:22:00 | 2 |
| 113 |"Patient Registration"| 2021.07.01 01:20:00 | 1 |
| 113 |"Diagnose finished" | 2021.07.01 01:20:00 | 2 |
| ... | | | |
Now you have an event log that process mining algorithms understand!
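To make the recipe above concrete, here is a minimal sketch (Python/pandas; the source tables and column names are invented for illustration) that builds the per-activity mini tables, UNIONs them, and adds the tie-breaking Order column:

```python
import pandas as pd

# Hypothetical source tables pulled from the hospital information system.
registrations = pd.DataFrame({
    "Patient-ID": [111, 112, 113],
    "timestamp": ["2021-06-01 14:33:49", "2021-06-18 10:03:21", "2021-07-01 01:20:00"],
})
diagnoses = pd.DataFrame({
    "Patient-ID": [111, 112, 113],
    "timestamp": ["2021-06-21 18:03:19", "2021-07-02 01:22:00", "2021-07-01 01:20:00"],
})

def to_activity_table(df, activity_name, order):
    """Turn one source table into a mini event log with a fixed activity name."""
    out = df[["Patient-ID", "timestamp"]].copy()
    out["Activity"] = activity_name
    out["Order"] = order          # tie-breaker for identical timestamps
    return out

# One mini table per activity, then UNION (concat) them all.
event_log = pd.concat([
    to_activity_table(registrations, "Patient Registration", 1),
    to_activity_table(diagnoses, "Diagnose finished", 2),
], ignore_index=True)

event_log["timestamp"] = pd.to_datetime(event_log["timestamp"])

# Sort by case ID, then timestamp, then the Order column for ties.
event_log = event_log.sort_values(["Patient-ID", "timestamp", "Order"]).reset_index(drop=True)
print(event_log)
```

The same pattern scales to any number of activities: one helper call per timestamped action, then one concat.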
Side note:
There have been many attempts to automate the event log extraction process. The work of Eduardo González López de Murillas is really interesting if you want to follow this topic. I can also recommend this open-access paper by Eduardo et al., 2018:
"Connecting databases with process mining: a meta model and toolset" (https://link.springer.com/article/10.1007/s10270-018-0664-7)

Cost Savings of ECS/EKS over Straight EC2

I've read plenty of blogs that talk about 25-50% cost savings from moving a microservices fleet from straight EC2 VMs to containers on either ECS or EKS. While that's compelling, I'm scratching my head over how that might be, given cost estimates from some simple models in the AWS Pricing Calculator. I'm sure I'm oversimplifying the problem with my estimates below, but the price difference is nearly a factor of five ($68 vs. $319), which raises the question: where are the cost savings?
For instance, assume a small cluster of eight services that work well on a [small t4g][2]:
| Instance | EC2 Type | vCPU | Mem (GB) | Storage (GB) | Monthly Cost |
| ---------- | --------- | ---- | -------- | ------------ | ------------:|
| Service 1 | t4g.small | 2 | 2 | 8 | USD 8.47 |
| Service 2 | t4g.small | 2 | 2 | 8 | USD 8.47 |
| Service 3 | t4g.small | 2 | 2 | 8 | USD 8.47 |
| Service 4 | t4g.small | 2 | 2 | 8 | USD 8.47 |
| Service 5 | t4g.small | 2 | 2 | 8 | USD 8.47 |
| Service 6 | t4g.small | 2 | 2 | 8 | USD 8.47 |
| Service 7 | t4g.small | 2 | 2 | 8 | USD 8.47 |
| Service 8 | t4g.small | 2 | 2 | 8 | USD 8.47 |
| **Totals** | | 16 | 16 | 64 | USD 67.76 |
If I were to move to ECS/EKS and purchase some larger c5's with equivalent vCPU, this is what I'm guessing I'd need to accomplish the same thing:
| Instance | EC2 Type | vCPU | Mem (GB) | Storage (GB) | Monthly Cost |
| ---------- | ---------- | ---- | -------- | ------------ | ------------:|
| Service 1 | c5.2xlarge | 8 | 16 | 32 | USD 159.42 |
| Service 2 | | | | | |
| Service 3 | | | | | |
| Service 4 | | | | | |
| Service 5 | c5.2xlarge | 8 | 16 | 32 | USD 159.42 |
| Service 6 | | | | | |
| Service 7 | | | | | |
| Service 8 | | | | | |
| **Totals** | | 16 | [32][1] | 64 | USD 318.84 |
As I mentioned, I'm sure this is a naive comparison, but I figured I'd end up in the same ballpark, not off by a factor of 5. I understand that ECS/EKS will give me better resource utilization, but it would need to increase efficiency by 470% just to break even, which seems unreasonable.
[1]: While the c5's have twice the memory, given the 1:10 ratio of mem:vCPU, I don't believe this is contributing significantly to the delta.
[2]: Assuming 1 Year Reservation, EC2 Instance Savings Plan, No Upfront
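For reference, the factor-of-five figure comes straight from the tables above; a quick sanity check in Python (prices as listed, 1-year savings plan, no upfront):

```python
# Monthly prices copied from the tables above.
t4g_small_monthly = 8.47     # per t4g.small (1-yr EC2 Instance Savings Plan, no upfront)
c5_2xlarge_monthly = 159.42  # per c5.2xlarge (same pricing assumptions)

# Option A: one t4g.small per service, eight services on plain EC2.
ec2_total = 8 * t4g_small_monthly            # USD 67.76

# Option B: two c5.2xlarge hosts running all eight services under ECS/EKS.
containers_total = 2 * c5_2xlarge_monthly    # USD 318.84

print(f"EC2 total:     USD {ec2_total:.2f}")
print(f"ECS/EKS total: USD {containers_total:.2f}")
print(f"Ratio:         {containers_total / ec2_total:.2f}x")  # ~4.7x, i.e. the 470% gap
```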
The comparison is not valid because these are different products; as Vikrant said, it is comparing apples to oranges.
**t4g is a burstable CPU instance**
It is suitable for websites with spiky traffic and a relatively small number of users (spikes in visitor numbers).
Before t4g came out there were t3a, t3, t2, t1... each newer generation offers better performance at a lower price. t4g is also based on Graviton processors, not the Intel Xeon that C5 uses. Additionally, you are factoring in reserved-instance pricing.
**When CPU credits run out for a t4g instance...**
T instances are so affordable because once the CPU credits run out, the CPU is throttled to a crawl (for example, to 10% on micro instances).
**C5 is for high, constant CPU load**
Firstly, its pricing is not as competitive as that of newer products released a few months ago. It also does not offer burstable CPU performance.
C5 provides high, constant raw CPU power, focuses on CPU with relatively less RAM, and offers better network bandwidth.
C5 is suitable for applications with a constantly heavy CPU load. A web server is usually not CPU-demanding, and its workload spikes when the traffic pattern changes.
Unless the website has very fast response-time requirements involving CPU-heavy calculations, T-family instances are more suitable for web servers.
Of course, if the website serves a large number of people across multiple time zones, the workload will be higher and more stable; in that situation C5 may be the better choice.
You can run a C5's CPU at 100% all the time if you need to; there are no CPU credits and it will not slow down. It provides constant, high CPU performance that you can use all the time.
**Using C5 with t4g**
A tactical setup for a high-traffic web server is to use C5 for a solid performance baseline and T instances to handle the extra traffic during busy hours. For example, a food-ordering platform can use C5 to handle baseline customer orders and have T instances take care of the peaks around lunch and dinner.
This way, when traffic drops, the T instances slowly regain CPU credits. You also do not have to worry about the servers becoming very slow (10% speed) if the credits run out, because the fast C5 instance backs them up even while the T instances are throttled.

How much faster is tensorflow-gpu with AVX and AVX2 compared to without them?

I tried to find an answer using Google, but with no success. It's hard to recompile tensorflow-gpu for Windows, so I want to know whether it's worth it.
If your computation is one giant matmul on the CPU, you will get a 3x speed-up on Xeon v3 (see the benchmark here). But it's also possible to see no speed-up, presumably because not enough time is spent in high-arithmetic-intensity ops executed on the CPU.
Here's a table from the "High Performance Models" guide for training ResNet-50 on CPU with different optimizations. It looks like you can get a 2.5x speed-up with the best settings:
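If you want to check this on your own machine, a rough way to compare builds is to time a large matmul pinned to the CPU with each wheel (a minimal sketch, assuming a TensorFlow 2.x install with eager execution; matrix size and repeat count are arbitrary):

```python
import time
import tensorflow as tf

N = 4096  # matrix size; large enough that the matmul dominates the timing

with tf.device("/CPU:0"):                       # force the CPU kernels we want to compare
    a = tf.random.uniform((N, N), dtype=tf.float32)
    b = tf.random.uniform((N, N), dtype=tf.float32)

    # Warm-up so one-time allocation costs are excluded from the measurement.
    tf.matmul(a, b)

    t0 = time.perf_counter()
    for _ in range(10):
        c = tf.matmul(a, b)
    _ = c.numpy()                               # force execution to finish
    elapsed = time.perf_counter() - t0

print(f"10 matmuls of {N}x{N}: {elapsed:.2f} s "
      f"({10 * 2 * N**3 / elapsed / 1e9:.1f} GFLOP/s)")
```

Run it once with the stock wheel and once with the AVX/AVX2 build and compare the GFLOP/s numbers.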
| Optimization | Data Format | Images/Sec (step time) | Intra threads | Inter Threads |
| ------------ | ----------- | ---------------------- | ------------- | ------------- |
| AVX2         | NHWC        | 6.8 (147ms)            | 4             | 0             |
| MKL          | NCHW        | 6.6 (151ms)            | 4             | 1             |
| MKL          | NHWC        | 5.95 (168ms)           | 4             | 1             |
| AVX          | NHWC        | 4.7 (211ms)            | 4             | 0             |
| SSE3         | NHWC        | 2.7 (370ms)            | 4             | 0             |
If you are able to compile an optimized version for Windows, it would help to mention it in this issue: https://github.com/yaroslavvb/tensorflow-community-wheels/issues/13 ; it seems there is some demand for such a build.

Magento compilation mode vs APC

Magento has a compilation mode in which you can compile all files of a Magento installation in order to create a single include path to increase performance. http://alanstorm.com/magento_compiler_path http://www.magentocommerce.com/wiki/modules_reference/english/mage_compiler/process/index
In my current shop setup, I have already configured APC as an opcode cache and am leveraging its performance gains. http://www.aitoc.com/en/blog/apc_speeds_up_Magento.html
My questions are:
1) Is there any advantage of using APC over Magento compilation mode, or vice versa? I have a dedicated server for Magento and am looking for maximum performance gains.
2) Will it be useful to use both of these together? Why, or why not?
These do different things, so using both together is fine. APC will usually give a greater performance gain than simply enabling compilation, but doing both gives you the best of both worlds.
Just remember that once compilation is enabled, you need to disable it before making any code changes or updating/installing modules, then recompile afterwards.
As JohnBoy has already said in his answer, both can be used in conjunction.
Beyond that, another concern was whether using APC would make compilation redundant.
So I verified the scenario with some siege load tests, and overall there is a definite improvement.
Here are the test results:
siege --concurrent=50 --internet --file=urls.txt --verbose --benchmark --reps=30 --log=compilation.log
| Compilation | Date & Time         | Trans | Elap Time | Data Trans | Resp Time | Trans Rate | Throughput | Concurrent | OKAY | Failed |
| ----------- | ------------------- | ----- | --------- | ---------- | --------- | ---------- | ---------- | ---------- | ---- | ------ |
| No          | 2013-09-26 12:27:23 | 600   | 202.37    | 6          | 9.79      | 2.96       | 0.03       | 29.01      | 600  | 0      |
| Yes         | 2013-09-26 12:34:05 | 600   | 199.78    | 6          | 9.73      | 3.00       | 0.03       | 29.24      | 600  | 0      |
| No          | 2013-09-26 12:59:42 | 1496  | 510.40    | 17         | 9.97      | 2.93       | 0.03       | 29.23      | 1496 | 4      |
| Yes         | 2013-09-26 12:46:05 | 1500  | 491.98    | 17         | 9.59      | 3.05       | 0.03       | 29.24      | 1500 | 0      |
There was a certain amount of variance; however, the good thing was that there was always some improvement, however minuscule.
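For the record, the size of that improvement, computed from the transaction rates in the table above (a quick Python check; the run labels are mine):

```python
# Transaction rates (trans/sec) from the siege results above; APC was enabled in every run.
runs = [
    ("run 1", 2.96, 3.00),  # compilation off vs. on
    ("run 2", 2.93, 3.05),
]

for label, without_comp, with_comp in runs:
    gain = (with_comp - without_comp) / without_comp * 100
    print(f"{label}: {gain:.1f}% higher transaction rate with compilation enabled")
```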
So we can use both.
The only extra overhead here is disabling and recompiling after module changes.
