Very slow copy of MyISAM .MYD file - Windows

We noticed that a few of our MyISAM .MYD files (MySQL database tables) copy extremely slowly. Both the C: drive and the D: drive are SSDs; the theoretical limit is a 500 MB/sec data rate. For the timings, we turned off the MySQL service. Here are some sample timings for the 6 GB file test.myd:
NET STOP MYSQL56
Step 1: COPY D:\MySQL_Data\test.myd C:\Temp --> 61 MB/sec copy speed
Step 2: COPY C:\Temp\test.myd D:\temp --> 463 MB/sec
Step 3: COPY D:\Temp\test.myd c:\temp\test1.myd --> 92 MB/sec
Strange results; why would the speed in one direction be so different from the other direction?
Let's try this:
NET START MYSQL56
in MySQL: REPAIR TABLE test; (took about 6 minutes)
NET STOP MYSQL56
Step 4: COPY D:\MySQL_Data\test.myd C:\Temp --> 463 MB/sec
Step 5: COPY C:\Temp\test.myd D:\temp --> 463 MB/sec
Step 6: COPY D:\Temp\test.myd c:\temp\test1.myd --> 451 MB/sec
Can anybody explain the difference in copy speed?
What might have caused the slow copy speed in the first place?
Why would REPAIR make a difference when OPTIMIZE, which we tried first, did not?
Would there have been any kind of performance hit at the SQL level with the initial version (i.e., before the REPAIR)? Sorry, I did not test this before running these tests.

REPAIR would scan through the table and fix issues that it finds. This means that the table is completely read.
OPTIMIZE copies the entire table over, then RENAMEs it back to the old name. The result is as if the entire table were read.
COPY reads one file and writes to the other file. If the target file does not exist, it must create it; this is a slow process on Windows.
When reading a file, the data is fetched from disk (SSD, in your case) and cached in RAM. A second reading will use the cached copy, thereby being faster.
This last bullet item may explain the discrepancies you found.
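To see that caching effect in isolation, you could time two back-to-back copies of the same file: the first read comes from the SSD, the second mostly from RAM. Below is a minimal Python sketch, assuming the file fits in memory; the paths are placeholders, not your actual layout.

import os
import shutil
import time

# Placeholder paths -- substitute your own .myd file and a scratch location.
SRC = r"D:\MySQL_Data\test.myd"
DST = r"C:\Temp\test_copy.myd"

def timed_copy(src, dst):
    start = time.perf_counter()
    shutil.copyfile(src, dst)
    elapsed = time.perf_counter() - start
    size_mb = os.path.getsize(src) / (1024 * 1024)
    print(f"{size_mb:.0f} MB in {elapsed:.1f} s -> {size_mb / elapsed:.0f} MB/sec")

# The first copy reads from the SSD; the second usually reads from the OS
# file cache, so it should be noticeably faster if caching is the explanation.
timed_copy(SRC, DST)
timed_copy(SRC, DST)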
Another possibility is "wear leveling" and/or "erase-before-write" -- two properties of SSDs.
Wear leveling is when the SSD moves things around to avoid too much "wear". Note that an SSD block "wears out" after N writes to it. By moving blocks around, this physical deficiency is avoided. (It is a feature of enterprise-grade SSDs, but may be missing on cheap drives.)
Before a write can occur on an SSD, the spot must first be "erased". This extra step is simply a physical requirement of how SSDs work. I doubt if it factors into your question, but it might.
I am removing [mysql] and [myisam] tags since the question really only applies to file COPY with Windows and SSD.

Related

HP Fortify: Issues while handling very large FPR reports on the Fortify server

We have this huge source-code base. We scan it using HP SCA and create an FPR file (size approx. 620 MB). Then we upload it to our Fortify server using the "fortifyclient" command.
After uploading, if I log into the Fortify server and go into the details of that project, I see that the artifact is in the "processing" stage. It remains in the processing stage even after a few days. The dashboard provides no way to stop, kill, or delete it.
Question 1: Why is it taking so long to process? (We have one successfully processed FPR report that took 6 days.) What can we do to make it faster?
Question 2: If I want to delete an artifact while it is in the processing stage, how do I do that?
Machine Info:
6 CPUs (Intel(R) Xeon(R) 3.07 GHz)
RAM: 36 GB
Thanks,
Addition:
We had one report that was successfully processed earlier in the month for the same codebase. The FPR file for that was also of similar size (610 MB). I can see the issue count for that report. Here it is:
EDIT:
Fortify Version: Fortify Static Code Analyzer 6.02.0014
HP Fortify Software Security Center Version 4.02.0014
Total issues: 157000
Total issues Audited: 0.0%
Critical issues: 4306
High: 151200
Low: 1640
Medium: 100
That's a large FPR file, so it will need time to process. SSC is basically unzipping a huge ZIP file (that's what an FPR file is) and then transferring the data into the database. Here are a few things to check:
Check the amount of memory allotted for SSC. You may need to pass up to 16 GB of memory as the Xmx value to handle an FPR that size, maybe more. The easiest way to tell would be to upload the FPR and then watch the Java process that your app server uses, and see how long it takes to reach the maximum amount of memory (a small monitoring sketch follows this answer).
Make sure the database is configured for performance. Having the database on a separate server, with the data files on another hard drive, can significantly speed up processing.
As a last resort, you could also try making the FPR smaller. You can turn off the source rendering so that source code is not bundled with the FPR file. You can do this with this command:
sourceanalyzer -b mybuild -disable-source-bundling -fvdl-no-snippets -scan -f mySourcelessResults.fpr
As far as deleting an in-progress upload goes, I think you have to let it finish. With some tuning, you should be able to get the processing time down.
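To watch the SSC Java process approach its heap ceiling during an upload, a small monitor like the one below can help. This is only a sketch: it assumes the third-party psutil package is installed and that the app server shows up as a "java" process on the SSC host.

import time
import psutil  # third-party: pip install psutil

# Poll every 10 seconds and report the resident memory of all java processes.
# The process name "java" is an assumption about how your app server runs SSC.
while True:
    for proc in psutil.process_iter(["name", "memory_info"]):
        name = proc.info["name"]
        mem = proc.info["memory_info"]
        if name and "java" in name.lower() and mem:
            rss_gb = mem.rss / (1024 ** 3)
            print(f"pid={proc.pid} rss={rss_gb:.2f} GB")
    time.sleep(10)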

ChipScope Error - Did not find trigger mark in buffer

Has anybody encountered data errors, trigger errors, or upload errors in ChipScope?
I'm using ChipScope (from ISE 14.7) with the IP core flow. So I created 15 different ICON IP cores as ngc files and wrapped them all in a VHDL module. This module chooses by generic which ngc file should be instantiated, so I can easily choose the number of active VIO/ILA cores.
Currently my project has 2 VIO cores and 5 ILA cores, utilizing circa 190 BlockRAMs on a Kintex-7 325T (>400 BlockRAMs in total). When a trigger event occurs, I sometimes get the warning Did not find trigger mark in buffer. Data buffer may be corrupted. or Data upload error.
This error is independent of the trigger mode (normal trigger event, immediate trigger, startup trigger). It seems to happen mostly on Unit 4 (91-bit data * 32k depth + 3 trigger ports on each of 4 units). The upload progress bar can stop at any percentage from 1 to 95%, as far as I have noticed.
Additionally I get hundreds of these warnings:
Xst - Edge .../TransLayer_ILA2_ControlBus<14> has no source ports and will not be translated to ABC.
My Google research says: ignore them :)
There is also a bug in XST: This warning has no ID and can't be filtered :(
So far, I have tried the following to fix this problem:
Reduced / increased the JTAG speed -> no effect (programming the device is not affected)
Recompiled the IP core / generated a new ngc file
Reduced the ILA window size
So what can it be?
P.S. All timings are met.
I found the problem and a solution.
The problem: I changed one ILA CORE Generator file's name and its contents (modified the internal name with an editor). But I missed one parameter, so CoreGen generated some sources under the old name. That name was still in use by another ILA core, so one of them got overwritten.
Solution:
I opened every ILA xco file and every cgp file and checked all the names.

How does HDFS append work?

Let's assume one is using the default block size (128 MB), and there is a file using 130 MB, so one full-size block and one block with 2 MB. Then 20 MB needs to be appended to the file (the total should now be 150 MB). What happens?
Does HDFS actually resize the last block from 2 MB to 22 MB, or create a new block?
How does appending to a file in HDFS deal with concurrency?
Is there a risk of data loss?
Does HDFS create a third block, put the 20+2 MB in it, and delete the block with 2 MB? If yes, how does this work concurrently?
According to the latest design document in the Jira issue mentioned before, we find the following answers to your question:
HDFS will append to the last block, not create a new block and copy the data from the old last block. This is not difficult because HDFS just uses a normal filesystem to write these block-files as normal files. Normal file systems have mechanisms for appending new data. Of course, if you fill up the last block, you will create a new block.
Only one single write or append to any file is allowed at the same time in HDFS, so there is no concurrency to handle. This is managed by the namenode. You need to close a file if you want someone else to begin writing to it.
If the last block in a file is not replicated, the append will fail. The append is written to a single replica, which pipelines it to the other replicas, similar to a normal write. It seems to me that there is no extra risk of data loss compared to a normal write.
Here is a very comprehensive design document about append, and it covers the concurrency issues.
The current HDFS docs link to that document, so we can assume that it is the recent one. (The document date is 2009.)
And the related issue.
Hadoop Distributed File System supports appends to files, and in this case it should add the 20 MB to the 2nd block in your example (the one with 2 MB in it initially). That way you will end up with two blocks, one with 128 MB and one with 22 MB.
This is the reference to the append java docs for HDFS.
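For completeness, here is a hedged sketch of what an append looks like from a client script; it simply shells out to the standard hdfs CLI, and the paths are made up. The comments restate the block and single-writer behaviour described above.

import subprocess

# Placeholder paths -- adjust to your cluster and local file layout.
LOCAL_CHUNK = "/tmp/extra_20mb.bin"
HDFS_FILE = "/data/myfile.bin"

# HDFS allows only one writer per file at a time; if another client holds the
# lease on HDFS_FILE, this append fails rather than interleaving data.
subprocess.run(
    ["hdfs", "dfs", "-appendToFile", LOCAL_CHUNK, HDFS_FILE],
    check=True,
)

# Afterwards, the last block simply grows (2 MB -> 22 MB in the example above);
# a new block is only allocated once the 128 MB block boundary is crossed.
print(subprocess.run(
    ["hdfs", "fsck", HDFS_FILE, "-files", "-blocks"],
    capture_output=True, text=True,
).stdout)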

TortoiseHg is Slow

Basically, what it says on the tin: TortoiseHg is slow.
My team moved from Subversion to Mercurial recently (in part to take advantage of Kiln for code reviews). One of the things we've noticed is that interacting with Mercurial through TortoiseHg is painfully slow. Some stats:
Open TortoiseHg Workbench: 8 minutes 13 seconds
Response time when clicking on a revision: 2.8 seconds
Time to "Refresh Current Repository": 6.4 seconds
Time to check for incoming changes: 12.8 seconds
All this really adds up to a very slow feeling application. For reference, here are the command line tool times:
hg status: 4.573 seconds
hg incoming: 12.150 seconds
The command-line times seem to jibe with the workbench times, but the workbench makes the delay much more frustrating because it is synchronous with the use of the program. For example, a typical task is "get the latest stuff my coworker just pushed". It looks like this (only listing the time spent waiting on the computer, rounded):
Open TortoiseHg: 10 minutes.
Open the appropriate repository by double-clicking in the repository registry: 5 seconds.
Commit local changes that need committing:
Click on "Working Directory": 5 seconds.
Select important files and type a commit message.
Press Commit: 20 seconds.
Get coworker's changes:
Check for incoming changesets: 10 seconds.
Review them.
Accept incoming changesets: 40 seconds.
Shelve unready changes:
Open Shelve dialog: 2 seconds.
Shelve remaining files: 6 minutes
Refresh: 5 seconds.
Merge:
Click the other head: 3 seconds.
Merge with local:
Wait for "Clean" verification: 15 seconds.
Wait for merge (assuming no conflicts): 10 seconds.
Commit: 30 seconds.
Unshelve changes:
Open Shelve dialog: 2 seconds.
Unshelve: 6 minutes.
Refresh: 5 seconds.
Total: 24 minutes, 32 seconds.
Twelve of those minutes are spent shelving and unshelving. Ten are spent just opening. One consequence of this is that people tend to commit stuff they aren't sure will go anywhere just to avoid the shelving cost. But even if you assume no shelving and no opening cost (maybe you just leave it open), it still takes two and a half minutes of meticulous clicking to get the latest stuff.
And that doesn't even count the more significant stuff like cloning and whatnot. Everything is this slow.
I have:
Disabled antivirus.
Disabled indexing.
Rebooted.
Tried it on 3 different versions of Windows.
Tried it on varying hardware, most of it of reasonable quality: Core 2 Duo @ 3.16 GHz, 8 GB RAM.
Tried it on 32- and 64-bit OSes.
Tried it disconnected from a network.
The repository is actually two repositories: a primary repo and a sub-repo that contains all our third-party binaries. The .hg folder of the primary repo is 676 MB. The .hg folder of the sub-repo is 641 MB. The contents of default in the primary repo is 7.05 GB. The contents of default in the sub-repo is 642 MB. The average file size in the main repo is 563 KB. The max file size in the main repo is 170 MB. There are 13,438 files in the main repo. The average file size in the sub-repo is 23 KB. The max file size in the sub-repo is 132 MB. There are 57,087 files in the sub-repo.
I have big-push, caseguard, fetch, gestalt, kbfiles, kiln, kilnauth, kilnpath, mq, purge, and transplant extensions enabled.
Any ideas where to start figuring out how to speed stuff up? The slowness is driving us crazy.
Ok, answering my own question because I found the answer while following Tim's advice.
The culprit is kbfiles from FogCreek. Disabling that dropped stat times from 12 seconds to .7 seconds. Likewise, the GUI opens faster than I can time. Re-enabling it causes everything to slow down drastically again.
It doesn't look like every slow thing can be blamed on kbfiles, but the worst of it can. (Specifically, shelve is still pretty slow -- CPU bound. We can work around that, though.)
That is a ton of files... and some are awfully big. How does it perform without the larger files? Binary files aren't exactly the best thing to track with hg/git, in my humble opinion.
What about breaking the big repo up into smaller ones? Do they really need to be in two huge repos?
Maybe a defrag on the hard drives could slightly improve some of those times. Also look at the extensions that have been created to help deal specifically with big binary files. See here:
https://www.mercurial-scm.org/wiki/HandlingLargeFiles
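If you want to see which files in the working copy are the big offenders before settling on one of those extensions, a quick scan like this may help. It is just a Python sketch; the repository path is a placeholder.

import os

REPO = r"C:\src\main-repo"  # placeholder path to your working copy
TOP_N = 20

sizes = []
for root, dirs, files in os.walk(REPO):
    # Skip Mercurial's own metadata so only tracked content is measured.
    dirs[:] = [d for d in dirs if d != ".hg"]
    for name in files:
        path = os.path.join(root, name)
        try:
            sizes.append((os.path.getsize(path), path))
        except OSError:
            pass

for size, path in sorted(sizes, reverse=True)[:TOP_N]:
    print(f"{size / (1024 * 1024):8.1f} MB  {path}")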
In some cases the advice given in the documentation may be useful for improving THG speed:
5.4.8. Performance Implications
There are some Workbench features that could have performance implications in large repositories.
View ‣ Choose Log columns…
Enabling the Changes column can be expensive to calculate on repositories with large working copies, causing both refreshes and scrolling to be slow.
View ‣ Load all
Normally, when the user scrolls through the history, chunks of changesets are read as you scroll. This menu choice allows you to have the Workbench read all the changesets from the repository, probably allowing smoother moving through the history.
In my own experience these are definitely worth doing! You should at least try them and see if there is a noticeable effect.
Also, if you have read Why is mercurial's hg rebase so slow? there is a setting which can speed up rebase significantly:
By default, rebase writes to the working copy, but you can configure it to run in-memory for better performance, and to allow it to run if the working copy is dirty. Just add the following lines to your .hgrc file:
[rebase]
experimental.inmemory = True

Is the order of cached writes preserved in Windows 7?

When writing to a file in Windows 7, Windows will cache the writes by default. When it completes the writes, does Windows preserve the order of writes, or can the writes happen out of order?
I have an existing application that writes continuously to a binary file. Every 20 seconds, it writes a block of data, updates the file's Table of Contents, and calls _commit() to flush the data to disk.
I am wondering if it is necessary to call _commit(), or if we can rely on Windows 7 to get the data to disk properly.
If the computer goes down, I'm not too worried about losing the most recent 20 seconds worth of data, but I am concerned about making the file invalid. If the file's Table of Contents is updated, but the data isn't present, then the file will not be correct. If the data is updated, but the Table of Contents isn't, then there will be extra data at the end of the file, but since it's not referenced by the Table of Contents, it is ignored when reading the file, and we have a correct file.
The writes will not necessarily happen in order. In particular, if there are multiple disk I/Os outstanding, the filesystem/disk driver may reorder the I/O operations to reduce head motion. That means there is no guarantee that the data will reach the disk in the order it was written to the file.
Having said that, flushing the file to disk will stall until the I/O is complete - that may mean several dozen milliseconds (or even longer) of inactivity when your application could be doing something more useful.
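So the usual discipline, and what your existing code is effectively doing, is: write the data block, flush it, and only then update the Table of Contents (and flush again). Here is a minimal Python sketch of that ordering; the record layout is hypothetical, and os.fsync() plays the same role here that _commit() plays in your application.

import os

def append_record(f, data: bytes, toc_offset: int, toc_entry: bytes):
    # f is assumed to be a file object opened in "r+b" mode.
    # 1) Append the new data block at the end of the file.
    f.seek(0, os.SEEK_END)
    data_offset = f.tell()
    f.write(data)

    # 2) Force the data to disk before the TOC points at it.
    f.flush()
    os.fsync(f.fileno())  # fills the same role as _commit()

    # 3) Only now update the Table of Contents to reference the new block.
    f.seek(toc_offset)
    f.write(toc_entry)
    f.flush()
    os.fsync(f.fileno())
    return data_offset

If the machine goes down between steps 2 and 3, the file still has a consistent Table of Contents and the orphaned data at the end is ignored on the next read, which matches the failure mode you said you can tolerate.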
