How to improve the performance of writing data into the registry? - Windows

I am working on performance optimization for our legacy application. It uses VC++ 2008; the OS is Windows XP or above.
During installation, it parses a file and writes some information about the file into the registry.
As the number of files increases, the installation takes a very long time.
I tried commenting out the code that writes to the registry, and that reduced the installation time sharply.
But we can't remove the registry writes.
The old code uses RegCreateKey and RegSetValueEx to set the registry data.
So I tried another method: I write the data into a file and call something like "regedit /s /c aaa.tmp" to import the file.
That reduces the time a little, but not significantly.
Could you suggest some methods I could try?
Many thanks,
Well

If you're writing data to the registry frequently enough that performance matters to your user, then you're doing too much of it; that data should be written to a simple file instead.

Registry I/O was never well optimized, as it's not meant to be done often. However, your customers may benefit a little from using products such as RegClean to reduce the size of their registries.

You might be using too many keys or opening and closing them too many times. Storing all your values under one key and keeping it open until you finish writing to it might improve performance.
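As a concrete sketch of that advice (the key path, value names, and REG_SZ data below are placeholders, not the application's real layout): create or open the key once, push every RegSetValueEx call through the same handle, and close it once at the end, instead of paying for a create/open/close per value.

    #include <windows.h>
    #include <string>
    #include <vector>

    // Write a whole batch of values through one key handle.
    bool WriteAllValues(const std::vector<std::pair<std::wstring, std::wstring> >& values)
    {
        HKEY hKey = NULL;
        // One RegCreateKeyEx for the whole batch instead of one per value.
        if (RegCreateKeyExW(HKEY_CURRENT_USER, L"Software\\MyApp\\FileInfo",
                            0, NULL, REG_OPTION_NON_VOLATILE, KEY_SET_VALUE,
                            NULL, &hKey, NULL) != ERROR_SUCCESS)
            return false;

        bool ok = true;
        for (size_t i = 0; i < values.size(); ++i)
        {
            const std::wstring& name = values[i].first;
            const std::wstring& data = values[i].second;
            if (RegSetValueExW(hKey, name.c_str(), 0, REG_SZ,
                               reinterpret_cast<const BYTE*>(data.c_str()),
                               static_cast<DWORD>((data.size() + 1) * sizeof(wchar_t)))
                != ERROR_SUCCESS)
            {
                ok = false;
                break;
            }
        }
        RegCloseKey(hKey);  // one close for the whole batch
        return ok;
    }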

Store the state inside golang binary

I am developing an on-premise solution for a client, with no control over the machine and no internet connection on it.
The solution is monetized based on the number of allowed requests (REST API calls) for a purchased license. Currently we store the request count in an encrypted file on the file system itself, but this solution is not perfect: the file can be copied somewhere and then restored once the request quota is used up. Also, if the file is deleted, manual intervention from support is needed.
I'm looking for a way to store the state/data in the binary itself and update it at runtime (think of a usage count that updates inside the binary).
Looking for a better approach.
The binary should also start from the previously stored state.
Is there a way to do it?
P.S. I know writing to the binary won't solve the issue, but I think it will raise the difficulty by increasing the number of permutations and combinations of places where the state can be stored, and since it's not common knowledge that you can change an executable, that would be the last place someone would look for the state if they were trying to mess with the system (security by obscurity).
Is there a way to do it?
No.
(At least no official, portable way. You can of course modify a binary and change, e.g., the data or BSS segment, but this is hard, OS-dependent, and does not solve your problem, because it has the same weakness as an external file: the user can simply keep a copy of the original executable and start over with that one. Some things simply cannot be solved technically.)
If your REST API is within your control and is the part you are monetizing, then surely this is the point at which you should enforce the license, perhaps with some kind of certificate authentication or API key. Then you can keep the count on the API side, which you control, and it won't matter whether it lives in a flat file or a DB, because you control it.
Here is a solution to what you are trying to achieve (not to writing into the executable itself) that will defeat casual copying of files.
A possible approach is to regularly write the request count and the current system time to a file. The file does not even have to be encrypted - you just need to generate a hash of the data (e.g. using SHA-2), sign it with a private key, and append the signature to the file.
Then, when you (re)start the service, read and verify the file using your public key, and check that it has not been too long since the time that was written to the file. Note that an initial file will have to be written at installation time, and your service will need to run continually, allowing only for brief restarts. You would probably also verify that the time is not in the future, as that would indicate an attempt to circumvent the system.
Of course this approach has problems, such as the client fiddling with the system time or even debugging your code to find the private key, and probably others. Hopefully these are hard enough to act as a deterrent. Also, if the service or system is shut down for an extended period of time, some sort of manual intervention would be required.
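A minimal sketch of the sign-and-verify part, written here in C++ against OpenSSL's EVP API and assuming an RSA or EC key pair (Go's standard crypto packages offer the same primitives). Key loading, file I/O, and the timestamp check are omitted, and the "count|unix-time" record format is just an example:

    #include <openssl/evp.h>
    #include <string>
    #include <vector>

    // Sign a record such as "count|unix-time" with the vendor's private key.
    // EVP_PKEY loading from PEM and all error handling are omitted for brevity.
    std::vector<unsigned char> SignRecord(const std::string& record, EVP_PKEY* priv)
    {
        EVP_MD_CTX* ctx = EVP_MD_CTX_new();
        EVP_DigestSignInit(ctx, NULL, EVP_sha256(), NULL, priv);
        EVP_DigestSignUpdate(ctx, record.data(), record.size());
        size_t sigLen = 0;
        EVP_DigestSignFinal(ctx, NULL, &sigLen);        // first call: query size
        std::vector<unsigned char> sig(sigLen);
        EVP_DigestSignFinal(ctx, &sig[0], &sigLen);     // second call: sign
        sig.resize(sigLen);
        EVP_MD_CTX_free(ctx);
        return sig;                                     // append this to the file
    }

    // On (re)start: verify record + signature with the embedded public key.
    bool VerifyRecord(const std::string& record,
                      const std::vector<unsigned char>& sig, EVP_PKEY* pub)
    {
        EVP_MD_CTX* ctx = EVP_MD_CTX_new();
        EVP_DigestVerifyInit(ctx, NULL, EVP_sha256(), NULL, pub);
        EVP_DigestVerifyUpdate(ctx, record.data(), record.size());
        int rc = EVP_DigestVerifyFinal(ctx, &sig[0], sig.size());
        EVP_MD_CTX_free(ctx);
        return rc == 1;                                 // 1 = signature is valid
    }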

Windows Task Manager Save or Graph a Process

I've been monitoring the memory usage of specific processes through Windows Task Manager for a while, and so far I've just been copying values into a Notepad file. This is pretty inefficient, and it's also not easy to glean information from at a glance. It may not even be accurate: with how much the processes change, I have definitely missed some readings. I was wondering if anyone knew of a tool that could save process information to a file, or maybe even graph the memory usage of specific processes. I don't think it would be worth writing my own script, since I have no idea how to access the Task Manager data programmatically, and I feel like something like what I'm looking for must already be out there. If anyone knows of anything, that would be nice.
Use perfmon. It can monitor memory usage metrics for a process and can export the results to a file.
To get a list of all processes with some interesting numbers such as working set, private bytes, CPU usage, etc., you can simply enter at a command prompt:
wmic process
You can run this on a regular basis and then build a graph from the output. The text file is a rather simple fixed-width column format, which is easy to parse.
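A minimal sketch of that polling idea (the property list, the output file name, and the one-minute interval are arbitrary choices): run wmic periodically and append its fixed-width output, stamped with the current time, to a log file for later graphing.

    #include <windows.h>
    #include <cstdio>
    #include <ctime>

    // Append a timestamped "wmic process" snapshot to proc_log.txt once a minute.
    int main()
    {
        for (;;)
        {
            FILE* out = fopen("proc_log.txt", "a");
            FILE* p = _popen("wmic process get Name,WorkingSetSize", "r");
            if (out && p)
            {
                time_t now = time(NULL);
                fprintf(out, "=== %s", ctime(&now));   // ctime() ends with '\n'
                char line[512];
                while (fgets(line, sizeof(line), p))
                    fputs(line, out);                  // raw fixed-width columns
            }
            if (p) _pclose(p);
            if (out) fclose(out);
            Sleep(60 * 1000);                          // one sample per minute
        }
        return 0;
    }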
Yours,
Alois Kraus

5GB file to read

I have a design question. I have a 3-4 GB data file, ordered by timestamp. I am trying to figure out the best way to deal with this file.
I was thinking of reading this whole file into memory, then transmitting this data to different machines and then running my analysis on those machines.
Would it be wise to upload this into a database before running my analysis?
I plan to run my analysis on different machines, so going through a database would be easier, but if I increase the number of machines running the analysis, the database might become too slow.
Any ideas?
Update:
I want to process the records one by one. Basically I am trying to run a model on timestamped data, but I have various models, so I want to distribute the work so that the whole process runs overnight every day. I want to make sure that I can easily increase the number of models without degrading system performance, which is why I am planning to distribute the data to all the machines running the models (each machine will run a single model).
You can also access the file on the hard disk directly, reading a small chunk at a time. Java has something called RandomAccessFile for this, but the same concept is available in other languages as well.
Whether you want to load the data into a database before the analysis should be governed purely by the requirements. If you can read the file and keep processing it as you go, there is no need to store it in a database. But if the analysis requires data from different areas of the file, a database would be a good idea.
You do not need the whole file in memory, just the data you need for the analysis. You can read every line and store only the needed parts of each line, plus the offset where the line starts in the file, so you can find it later if you need more data from that line.
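A minimal sketch of that idea (the file name and the leading comma-separated timestamp field are assumptions): keep a tiny per-line summary plus the byte offset of each line, and seek back only when the full record is needed.

    #include <fstream>
    #include <iostream>
    #include <string>
    #include <vector>

    // Per line: the byte offset where it starts, plus the one field we need.
    struct IndexEntry {
        std::streamoff offset;
        std::string    key;      // e.g. the timestamp field
    };

    int main()
    {
        std::ifstream in("big.dat", std::ios::binary);  // placeholder file name
        std::vector<IndexEntry> index;
        std::string line;
        std::streamoff pos = in.tellg();
        while (std::getline(in, line))
        {
            IndexEntry e;
            e.offset = pos;
            e.key = line.substr(0, line.find(','));     // assume the timestamp
            index.push_back(e);                         // is the first field
            pos = in.tellg();
        }

        // Later, fetch the full record of an entry with a single seek:
        if (!index.empty())
        {
            in.clear();                                 // clear the EOF flag
            in.seekg(index[0].offset);
            std::getline(in, line);
            std::cout << line << '\n';
        }
        return 0;
    }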
Would it be wise to upload this into a database before running my analysis?
Yes.
I plan to run my analysis on different machines, so going through a database would be easier, but if I increase the number of machines running the analysis, the database might become too slow.
Don't worry about it; it will be fine. Just introduce a marker so the rows processed by each computer are identified.
I'm not sure I fully understand all of your requirements, but if you need to persist the data (refer to it more than once), then a DB is the way to go. If you just need to process portions of these output files and trust the results, you can do it on the fly without storing any contents.
Only store the data you need, not everything in the files.
Depending on the analysis needed, this sounds like a textbook case for using MapReduce with Hadoop. It will support your requirement of adding more machines in the future. Have a look at the Hadoop wiki: http://wiki.apache.org/hadoop/
Start with the overview, get the standalone setup working on a single machine, and try doing a simple analysis on your file (e.g. start with a "grep" or something). There is some assembly required but once you have things configured I think it could be the right path for you.
I had a similar problem recently and, as @lalit mentioned, I used RandomAccessFile against the file on the hard disk.
In my case I only needed read access, so I launched a bunch of threads, each starting at a different point in the file. That got the job done and really improved my throughput, since each thread could spend a good amount of time blocked doing some processing while the other threads were reading the file.
A program like the one I described should be very easy to write; just try it and see if the performance is what you need.
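A minimal sketch of that layout using std::thread (C++11; the file name is a placeholder and the per-line work is a stub): split the file into byte ranges, give each thread its own stream, and resynchronize each range to the next line boundary so every line is processed exactly once.

    #include <fstream>
    #include <string>
    #include <thread>
    #include <vector>

    // Each worker owns its own ifstream and handles the lines that start
    // inside [begin, end); a line crossing 'end' is finished by this worker.
    static void Worker(const std::string& path, std::streamoff begin, std::streamoff end)
    {
        std::ifstream in(path.c_str(), std::ios::binary);
        std::string line;
        if (begin != 0)
        {
            in.seekg(begin - 1);
            std::getline(in, line);                  // sync to the next line start
        }
        while (in.tellg() < end && std::getline(in, line))
        {
            // run the model / analysis on 'line' here
        }
    }

    int main()
    {
        const std::string path = "big.dat";          // placeholder file name
        std::ifstream probe(path.c_str(), std::ios::binary | std::ios::ate);
        const std::streamoff size = probe.tellg();   // file size via seek-to-end

        unsigned n = std::thread::hardware_concurrency();
        if (n == 0) n = 4;                           // fallback if unknown

        std::vector<std::thread> pool;
        for (unsigned i = 0; i < n; ++i)
            pool.emplace_back(Worker, path, size * i / n, size * (i + 1) / n);
        for (std::size_t i = 0; i < pool.size(); ++i)
            pool[i].join();
        return 0;
    }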

Is RegNotifyChangeKeyValue as coarse as it seems?

I've been using ReadDirectoryChangesW to monitor a particular portion of the file system. It rather nicely provides a partial pathname to the file or directory which changed along with a clue about the nature of the change. This may have spoiled me.
I also need to monitor a particular portion of the registry, but it looks as if RegNotifyChangeKeyValue is very coarse. It will tell me that something under the given key changed, but it doesn't seem to want to tell me what that something might have been. Bummer!
The portion of the registry in question is arbitrarily deep, so enumerating all the sub-keys and calling RegNotifyChangeKeyValue for each probably isn't a hot idea, because I'd eventually run into MAXIMUM_WAIT_OBJECTS. Plus I'd have to adjust the set of keys I'd passed to RegNotifyChangeKeyValue, which would be a fair amount of effort without enumerating the sub-keys every time, and doing that would defeat a fair amount of the purpose.
Any ideas?
Unfortunately, yes. You probably have to cache all the values of interest to your code and update this cache yourself whenever you get a change trigger, or else set up multiple watchers, one on each of the individual data items of interest. As you noted, the second solution gets unwieldy very quickly.
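A minimal sketch of the cache-and-diff approach (TakeSnapshot, a recursive RegEnumKeyEx/RegEnumValue walk that flattens the subtree into a path-to-data map, is an assumed helper and not shown): one watcher over the whole subtree, re-armed after every notification, with the "what changed" question answered by comparing snapshots.

    #include <windows.h>
    #include <map>
    #include <string>

    typedef std::map<std::wstring, std::wstring> Snapshot;  // value path -> data
    Snapshot TakeSnapshot(HKEY root);  // assumed helper: recursive enumeration

    void WatchSubtree(HKEY root)
    {
        HANDLE hEvent = CreateEvent(NULL, FALSE, FALSE, NULL);
        Snapshot before = TakeSnapshot(root);
        for (;;)
        {
            // One watcher for the whole subtree; must be re-armed every time.
            RegNotifyChangeKeyValue(root, TRUE /* watch subtree */,
                                    REG_NOTIFY_CHANGE_NAME | REG_NOTIFY_CHANGE_LAST_SET,
                                    hEvent, TRUE /* asynchronous */);
            WaitForSingleObject(hEvent, INFINITE);

            Snapshot after = TakeSnapshot(root);
            for (Snapshot::const_iterator it = after.begin(); it != after.end(); ++it)
            {
                Snapshot::const_iterator old = before.find(it->first);
                if (old == before.end() || old->second != it->second)
                {
                    // it->first was added or modified - handle it here
                }
            }
            // (a symmetric pass over 'before' catches deleted values)
            before.swap(after);
        }
    }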
If you can implement the required code in .NET, you can get the same effect more elegantly via the WMI RegistryEvent class and its subclasses.

Trying to write a program / library like LogParser - How does it work internally?

LogParser isn't open source and I need this functionality for an open source project I'm working on.
I'd like to write a library that allows me to query huge (mostly IIS) log files, preferably with Linq.
Do you have any links that could help me? How does a program like LogParser work so fast? How does it handle memory limitations?
It probably processes the information in the log as it reads it. This means the library doesn't have to allocate a huge amount of memory to store the information; it can read a chunk, process it, and throw it away. This is a common and very effective way to process data.
You could, for example, work line by line and parse each line. For the actual parsing you can write a state machine or, if the requirements allow it, use a regex.
Another approach would be a state machine that both reads and parses the data. If a log entry can span more than one line, this might be needed.
Some state-machine-related links:
A very simple state machine written in C: http://snippets.dzone.com/posts/show/3793
A lot of Python-related code, but some sections are universally applicable: http://www.ibm.com/developerworks/library/l-python-state.html
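A minimal sketch of the read/parse/discard loop for W3C-format IIS logs (the log file name and the aggregate computed are placeholders; only the "#Fields:" directive, which is how IIS names its columns, is assumed):

    #include <fstream>
    #include <iostream>
    #include <sstream>
    #include <string>
    #include <vector>

    int main()
    {
        std::ifstream in("u_ex120101.log");            // placeholder log file name
        std::vector<std::string> fields;               // column names from #Fields:
        std::size_t uriIdx = 0;
        long long requests = 0;

        std::string line;
        while (std::getline(in, line))
        {
            if (line.compare(0, 9, "#Fields: ") == 0)
            {
                // Header: record the column order for the data lines that follow.
                std::istringstream hdr(line.substr(9));
                fields.clear();
                std::string name;
                while (hdr >> name)
                    fields.push_back(name);
                for (std::size_t i = 0; i < fields.size(); ++i)
                    if (fields[i] == "cs-uri-stem")    // column of interest
                        uriIdx = i;
                continue;
            }
            if (line.empty() || line[0] == '#')        // skip other directives
                continue;

            std::istringstream row(line);
            std::vector<std::string> cols;
            std::string col;
            while (row >> col)
                cols.push_back(col);
            if (uriIdx < cols.size())
                ++requests;                            // or aggregate per cols[uriIdx]
            // 'cols' is discarded here - memory use stays constant
        }
        std::cout << "requests: " << requests << '\n';
        return 0;
    }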
If your aim is to query IIS log data with LINQ, then I suggest you move the raw IIS log data into a database and query the database using LINQ. This blog post might help:
http://getsrirams.blogspot.in/2012/07/migrate-iislog-data-to-sqlce-4-database.html
