How does Google Desktop Search manage to stay light and fast? - performance

I have always wondered what methods Google Desktop Search uses to keep CPU and memory usage so low while indexing a computer containing more than 100,000 files on average.
In just a few hours it had indexed the whole system, and I did not see it eating up my CPU, memory, etc.
If any of you have done some research, please do share.

The trick is simple: it starts to work, then very soon stops and just sits there in memory, doing nothing. Of course it's then totally useless, but at least it stays light and fast. Sorry, couldn't resist :-) I switched to Windows Search 4.0 and I'm much happier with it.

It doesn't...
I installed it on one computer, and quickly removed it because it was intrusive (although this can probably be configured) and hungry (particularly on a low-end PC).
It is installed on a laptop near me right now, and if I compare it to a couple of small utilities I run permanently (SlickRun, CLCL, my AutoHotkey script...) it uses more than 10 times their CPU and 5 to 20 times their memory. Times two, since, for some reason, I have one instance running another, plus the ToolbarNotifier (less hungry).
Even Trend Micro anti-virus uses less memory and CPU.
Perhaps I will try it again when I get a more modern PC with lots of memory, but right now I am happy enough with some grep utilities, even if they are slower.

Take a look at disk usage. If you build many keys/indexes you will use lots of disk space and the searches will be fast.
For example:
30 GB drive, 75% used. 3.6 GB used for 2 instances of Google Desktop (roaming profiles suck).

Once it has done the initial index and written it to disk, it doesn't need to do anything.
Searching using the index requires very few resources; the only thing that needs more is indexing new or modified files.
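The "only index new or modified files" idea can be sketched as below. This is an illustration only, not how Google Desktop Search is actually implemented: the function name `changed_files` and the `seen` dictionary are hypothetical, and comparing modification times is just one assumed way to detect changes.

```python
import os


def changed_files(root, seen):
    """Return paths under root whose modification time differs from `seen`.

    `seen` maps path -> mtime recorded at the last scan and is updated in place,
    so a second scan of an unchanged tree returns an empty list.
    """
    changed = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                mtime = os.stat(path).st_mtime_ns
            except OSError:
                continue  # file vanished or is unreadable; skip it
            if seen.get(path) != mtime:
                seen[path] = mtime
                changed.append(path)
    return changed
```

After the first full scan, each later call only reports files that need re-indexing, which is why the steady-state cost can stay small.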

Related

Why do you use the keyword delete?

I understand that delete returns memory that was allocated from the heap back to the heap, but what is the point? Computers have plenty of memory, don't they? And all of the memory is returned as soon as you "X" out of the program.
Example:
Consider a server that allocates an object Packet for each packet it receives (this is bad design for the sake of the example).
A server, by nature, is intended to never shut down. If you never delete the thousands of Packet your server handles per second, your system is going to swamp and crash in a few minutes.
Another example:
Consider a video game that allocates particles for a special effect every time a new explosion is created (and never deletes them). In a game like Starcraft (or other recent ones), after a few minutes of hilarity and destruction (and hundreds of thousands of particles), lag will be so huge that your game will turn into a PowerPoint slideshow, effectively making your player unhappy.
Not all programs exit quickly.
Some applications may run for hours, days or longer. Daemons may be designed to run without cease. Programs can easily consume more memory over their lifetime than available on the machine.
In addition, not all programs run in isolation. Most need to share resources with other applications.
There are a lot of reasons why you should manage your memory usage, as well as any other computer resources you use:
What might start off as a lightweight program could soon become more complex, and depending on your design, areas of memory consumption may grow exponentially.
Remember you are sharing memory resources with other programs. Being a good neighbour allows other processes to use the memory you free up, and helps to keep the entire system stable.
You don't know how long your program might run for. Some people hibernate their session (or never shut their computer down) and might keep your program running for years.
There are many other reasons, I suggest researching on memory allocation for more details on the do's and don'ts.
I see your point that computers have lots of memory, but you are wrong. As an engineer you have to create programs that use computer resources properly.
Imagine you made a program which runs all the time the computer is on. It sometimes creates objects/variables with "new". After some time you don't need them anymore, but you don't delete them. This happens from time to time, and each time you take some RAM out of circulation. After a while the user has to terminate your program and launch it again. That is not so bad, but it is not comfortable either; what is more, your program may take a while to load. Because of this, the user suffers for your silly decision.
Another thing: when you use "new" to create an object you call its constructor, and "delete" calls its destructor. Let's say the object opens some file, and the destructor closes it and makes it accessible to other processes. In this case you would steal not only memory but also files from other processes.
If you don't want to use "delete" you can use shared pointers (they use reference counting to delete the object automatically once nothing refers to it).
It is in the standard library as std::shared_ptr. It has one disadvantage: it needs a C++11 standard library, which older toolchains (for example those targeting Windows XP SP2 and older) may not provide. So if you want to release something publicly and support those, you can use Boost, which also has boost::shared_ptr. To use Boost you need to download it and configure your development environment to use it.

Give all possible resources to a program

I created a program in C# to work with 2.5 million records in Oracle Express (local instance), parse/split those records and create an additional 5 million records.
I added some code to print times on the screen and it seems fairly fast: it does all the processing for 1K records every 9 seconds, which means it takes more than 6 hours to finish.
Now, with Task Manager I can see the program is using 6% of CPU (max) and around 50MB of memory. I understand the OS, and Oracle itself need resources to operate but..... is there a way to tell this little program "hey, it's ok, go ahead and use at least 50% of CPU, there are 4GB of RAM so knock yourself out"?
Note: One of the reasons I'm using a local instance with Oracle Express is to reduce the network bottleneck. Also I might not run this process quite often but I was intrigued to see if this was at all possible.
Please forgive my noobness,
Thanks!
The operating system will give your program all the resources it needs. The reason your process is not consuming all the CPU is probably that it is waiting on the I/O subsystem more than on the processor.
If you want to see if you can consume more CPU cycles try writing a program that runs a short infinite loop as fast as possible and you will see the difference in CPU usage.
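A minimal sketch of that experiment, in Python rather than the asker's C#: it compares CPU time with wall-clock time for a busy loop and for a wait. The names `cpu_fraction`, `busy_loop`, and `waiting` are made up for the illustration, and `time.sleep` is only an assumed stand-in for blocking on I/O.

```python
import time


def cpu_fraction(work, seconds=0.2):
    """Run work(seconds) and return CPU time used divided by wall-clock time."""
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    work(seconds)
    cpu = time.process_time() - cpu_start
    wall = time.perf_counter() - wall_start
    return cpu / wall


def busy_loop(seconds):
    """Spin on the CPU until the deadline passes."""
    deadline = time.perf_counter() + seconds
    while time.perf_counter() < deadline:
        pass


def waiting(seconds):
    """Stand-in for I/O: the process is blocked and uses almost no CPU."""
    time.sleep(seconds)


if __name__ == "__main__":
    print(f"busy loop: {cpu_fraction(busy_loop):.0%} of one core")
    print(f"waiting:   {cpu_fraction(waiting):.0%} of one core")
```

A program that spends most of its time in the second state will show low CPU usage in Task Manager no matter how much CPU is available to it.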
A number of thoughts, not really answers I guess, but:
You could raise the priority of the application's thread; however, it's possible that the code may be less efficient than you think, so...
Have you run a profiler on it?
If it's currently a single-threaded app, you could see whether you can process the records in batches and run those in parallel.
Without knowing a lot of detail about the splitting of records, is it possible to offload more of that work to Oracle? Then it would matter less whether the database is local or across the network.
If your app is drawing or updating a screen or UI, then that will almost certainly slow the work down. An example: I ran an app which sorted about 10k emails into around 250k lines in a database. If I added an item to a listbox for each line, the time went from short to ridiculous (as in, I got bored and killed it). So, again, offloading the work to a thread with as few UI updates as possible can help.
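The batching suggestion above could look roughly like this. It is a sketch in Python, not the asker's C# program: `split_record`, `process_batch`, and `process_all` are hypothetical names, and the comma split stands in for whatever parsing the real job does. Threads help most when each batch spends its time waiting on the database; in CPython, CPU-heavy parsing would need processes instead.

```python
from concurrent.futures import ThreadPoolExecutor


def batches(items, size):
    """Yield consecutive slices of items, each at most `size` long."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def split_record(record):
    """Hypothetical per-record work: split one record into its parts."""
    return record.split(",")


def process_batch(batch):
    """Process one batch; a real job would do its database round trip here."""
    return [part for record in batch for part in split_record(record)]


def process_all(records, batch_size=1000, workers=4):
    """Process batches concurrently and return the parts in the original order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(process_batch, batches(records, batch_size))
        return [part for batch_result in results for part in batch_result]
```

`pool.map` returns results in submission order, so the output matches what a sequential loop would produce.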

Proving that replacing hardware will improve developer performance

Now the machines we are forced to use are 2GB RAM, Intel Core 2 Duo E6850 @ 3GHz CPU...
The policy within the company is that everyone has the same computer no matter what and that they are on a 3 year refresh cycle... Meaning I will have this machine for the next 2 years... :S
We have been complaining like crazy but they said they want proof that upgrading the machines will provide exactly X time saving before doing anything... And with that they are only semi considering giving us more RAM...
Even when you put forward that developer resources are much more expensive than hardware, they firstly say go away, then after a while they say prove it. As far as they are concerned paying wages comes from a different bucket of money to the machines and that they don't care (i.e. the people who can replace the machines, because paying wages doesn't come from their pockets)...
So how can I prove that $X benefit will be gained by spending $Y on new hardware...
The stack I'm working with is as follows: VS 2008, SQL 2005/2008. As duties dictate, we are SQL admins as well as Web/Winform/WebService developers. So it's very typical to have 2 VS sessions and at least one SQL session open at the same time.
Cheers
Anthony
Actually, the main cost for your boss is not the lost productivity. It is that his developers don't enjoy their working conditions. This leads to:
loss of motivation and productivity
more stress causing illness
external opportunities causing developers to go away
That sounds like a decent machine for your stack. Have you proven to yourself that you're going to get better performance, using real-world tests?
Check with your IT people to see if you can get the disks benchmarked, and max out the memory. Management should be more willing to take these incremental steps first.
The machine looks fine apart from the RAM.
If you want to prove this sort of thing, time all the things you wait for (typically load times and compile times), add it all up and work out how much it costs you to sit around. From that, make some sort of guess at how much time you'll save (it'll have to be a guess unless you can compare like with like, which is difficult if they won't upgrade your systems). You'll probably find that they'll make the money back on the RAM at least in next to no time - and that's before you even begin to factor in the loss of productivity from people's minds wandering whilst they wait for stuff to happen.
Unfortunately if they're skeptical then it's unlikely you can prove it to them in a quantitative way alone. Even if you came up with numbers, they'll probably question the methodology. I suggest you see if they're willing to watch a 10 minute demo (maybe call it a presentation), and show them the experience of switching between VS instances (while explaining why you'd need to switch and how often), show them the build process (again explaining why you'd need to create a build and how often), etc.
Ask them if you're allowed to bring your own hardware. If you're really convinced it would make you more productive, upgrade it yourself and when you start producing more ask for a raise or to be reimbursed.
Short of that though..
I have to ask: what else are you running? I'm not really that familiar with that stack, but it really shouldn't be that taxing. Are they forcing you to run some kind of system-slowing monitoring or antivirus app?
You'd probably have better luck convincing them to let you change that than getting them to roll out new updates.
If you really must convince them, your best bet is to benchmark your machine as accurately as you can and price out exactly what you need upgraded. It's a lot easier to get them to agree to an exact (and low) dollar amount than some open-ended upgrade.
Even discussing this with them for more than five minutes will cost more than just calling your local PC dealer and buying the RAM out of your own pocket. Ask your project lead whether they can put it on the tab of the project as another "development tool". If (s)he can't, don't bother and cough up the
When they come complaining, then put the time of the meetings for this on their budget (since they come crying). See how long they can take this.
When we had the same issue, my boss bought better gfx cards for the whole team out of his own pockets and went to the PC guys to get each of us a second monitor. A few days later, he went again to get each of us 2GB more RAM, too.
The main cost from slow developer machines comes from the slow builds and the 'context switching', ie the time that it takes you to switch between the tasks required of you:
Firing up the second instance of VS and waiting for it to load and build
Checking out or updating a source tree
Starting up another instance of VS or checking out a clean source tree to 'have a quick look at' some bug that's been assigned
Multiple build/debug cycles to fix difficult bugs
The mental overhead in switching between different tasks, which shouldn't be underestimated
I made a case a while ago for new hardware after doing a breakdown of the amount of time that was wasted waiting for the machine to catch up. In a typical day we might need to do 2 or 3 full builds at half an hour each. The link time was around 3 minutes, and in a build/debug cycle you might do that 40 times a day. So that's 3.5 hours a day waiting for the machine. The bulk of that is in small 2 or 3 minute pockets which isn't long enough for you to context switch and do something else. It's long enough to check your mail, check stackoverflow, blow your nose and that's about it. So there's nothing else productive you can do with that time.
If you can show that a new machine will build the full project in 15 minutes and link in 1 minute then that's theoretically given you an extra 2 hours of productivity a day (or more realistically, the potential for more build cycles).
So I would get some objective timings that show how long it takes for different parts of your work cycle, then try to do comparative timings on machines with 4GB of RAM, a second drive (eg something fast like a WD Raptor), an SSD, whatever, to come up with some hard figures to support your case.
EDIT: I forgot to mention: present this as your current hardware making you lose productivity, and put a cost on the amount of time lost by multiplying it by a typical developer hourly rate. On this basis I was able to show that a new PC would pay for itself in about a month.
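The waiting-time arithmetic in this answer can be reproduced with a few lines. The function name `daily_wait_hours` is made up for the sketch, and it assumes 3 full builds a day (the upper end of the "2 or 3" quoted) and the same 40 link steps on both machines.

```python
def daily_wait_hours(full_builds, build_minutes, links, link_minutes):
    """Hours per day spent waiting on full builds plus link steps."""
    return (full_builds * build_minutes + links * link_minutes) / 60


# Current machine: 30-minute builds, 3-minute links.
current = daily_wait_hours(full_builds=3, build_minutes=30, links=40, link_minutes=3)
# Faster machine: 15-minute builds, 1-minute links.
faster = daily_wait_hours(full_builds=3, build_minutes=15, links=40, link_minutes=1)
saved = current - faster
print(f"current: {current:.2f} h, faster: {faster:.2f} h, saved: {saved:.2f} h per day")
```

Plugging in your own measured timings gives the "hard figures" the answer recommends collecting.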
Take a task you do regularly that would be improved with faster hardware - ex: running the test suite, running a build, booting and shutting down a virtual machine - and measure the time it takes with current hardware and with better hardware.
Then compute the monthly, or yearly cost: how many times per month x time gained x hourly salary, and see if this is enough to make a case.
For instance, suppose you made $10,000/month and gained 5 minutes a day with a better machine. The loss to your company per month would be around (5/60 hours lost a day) x 20 work days/month x $10,000 / (20 work days x 8 hours/day) = about $104/month, or about $1,250/year lost because of the machine. Now before talking to your manager, think about whether this number is significant.
Now this is assuming that 1) you can measure the improvement, even though you don't have a better machine, and 2) while you are "wasting" your 5 minutes a day, you are not doing anything productive, which is not obvious.
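As a check on the arithmetic, here is the same calculation as a small function. The name `monthly_cost_of_waiting` is hypothetical; the 20 work days per month and 8 hours per day are the figures used in the example above.

```python
def monthly_cost_of_waiting(monthly_salary, minutes_lost_per_day,
                            work_days_per_month=20, hours_per_day=8):
    """Salary cost of the time lost per month, using an hourly rate derived from salary."""
    hourly_rate = monthly_salary / (work_days_per_month * hours_per_day)
    hours_lost = minutes_lost_per_day / 60 * work_days_per_month
    return hours_lost * hourly_rate


monthly = monthly_cost_of_waiting(10_000, 5)
print(f"${monthly:.2f} per month, ${monthly * 12:.2f} per year")
```

With these inputs the result is a little over $100 a month, so the case only becomes compelling when the minutes lost per day are considerably higher.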
For me, the cost of a slow machine is more psychological, but it's hard to quantify - after a few days of having to wait for things to happen on the PC, I begin to get cranky, which is both bad for my focus, and my co-workers!
It’s easy; hardware is cheap, developers are expensive. Throwing reasonable amounts of money at the machinery should be an absolute no brainer and if your management doesn’t understand that and won’t be guided by your professional opinion then you might be in the wrong job.
As for your machine, throw some more RAM at it and use a fast disk (have a look at how intensive VS is on disk IO using the resource monitor – it’s very hungry). Lots of people are moving to 10,000 RPM drives or even SSDs these days, and they make a big difference to your productivity.
Try this; take the price of the hardware you need (say fast disk and more RAM), split it across a six month period (a reasonable time period in which to recoup the investment) and see what it’s worth in “developer time” each day. You’ll probably find it only needs to return you a few minutes a day to pay for itself. Once again, if your management can’t understand or support this then question if you’re in the right place.
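The break-even exercise can be sketched as follows. Both the $500 hardware price and the $50 hourly rate are made-up illustrative numbers, not figures from this thread, and `break_even_minutes_per_day` is a hypothetical helper name.

```python
def break_even_minutes_per_day(hardware_cost, hourly_rate,
                               months=6, work_days_per_month=20):
    """Minutes a developer must save each working day to repay the hardware."""
    work_days = months * work_days_per_month
    return hardware_cost / work_days / hourly_rate * 60


# Assumed figures: $500 of RAM and disk, developer time valued at $50/hour.
minutes = break_even_minutes_per_day(hardware_cost=500, hourly_rate=50)
print(f"{minutes:.1f} minutes per day over six months")
```

Under those assumptions the hardware only has to save about five minutes a day to pay for itself within the six months.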

Is it reasonable for modern applications to consume large amounts of memory?

Applications like Microsoft Outlook and the Eclipse IDE consume RAM, as much as 200MB. Is it OK for a modern application to consume that much memory, given that a few years back we had only 256MB of RAM? Also, why is this happening? Are we taking the resources for granted?
Is it acceptable when most people have 1 or 2 gigabytes of RAM on their PCs?
Think of this - although your 200MB is small and nothing to worry about given a 2GB limit, everyone else also has apps that take masses of RAM. Add them together and you find that the 2GB I have very quickly gets all used up. End result - your app appears slow, resource hungry and takes a long time to start up.
I think people will start to rebel against resource-hungry applications unless they get 'value for RAM'. You can see this starting to happen on servers as virtualised systems gain popularity - people are complaining about resource requirements and corresponding server costs.
As a real-world example, I used to code with VC6 on my old 512MB 1.7GHz machine, and things were fine - I could open 4 or 5 copies along with Outlook, Word and a web browser and my machine was responsive.
Today I have a dual-processor 2.8GHz server box with 3GB RAM, but I cannot realistically run more than 2 copies of Visual Studio 2008. They both take ages to start up (as all that RAM still has to be copied in and set up, along with all the other startup costs we now have), and even Word takes ages to load a document.
So if you can reduce memory usage you should. Don't think that you can just use whatever bloated framework/library/practice you want with impunity.
http://en.wikipedia.org/wiki/Moore%27s_law
also:
http://en.wikipedia.org/wiki/Wirth%27s_law
There's a couple of things you need to think about.
1/ Do you have 256M now? I wouldn't think so - my smallest memory machine is 2G so a 200M application is not much of a problem.
2a/ That 200M you talk about might not be "real" memory. It may just be address space in which case it might not all be in physical memory at once. Some bits may only be pulled in to physical memory when you choose to do esoteric things.
2b/ It may also be shared between other processes (such as a DLL). This means it could be held in physical memory as only one copy but be present in the address space of many processes. That way, the usage is amortized over those many processes. Both 2a and 2b depend on where your figure of 200M actually came from (which I don't know and, running Linux, I'm unlikely to find out without you telling me :-).
3/ Even if it is physical memory, modern operating systems aren't like the old DOS or Windows 3.1 - they have virtual memory where bits of applications can be paged out (data) or thrown away completely (code, since it can always reload from the executable). Virtual memory gives you the ability to use far more memory than your actual physical memory.
Many modern apps will take advantage of the existence of more memory to cache more. Some, like Firefox and SQL Server, have explicit settings for how much memory they will use. In my opinion, it's foolish not to use available memory - what's the point of having 2GB of RAM if your apps all sit around at 10MB, leaving 90% of your physical memory unused? Of course, if your app does use caching like this, it had better be good at releasing that memory if page file thrashing starts, or allow the user to limit the cache size manually.
You can see the advantage of this by running a decent-sized query against SQL server. The first time you run the query, it may take 10 seconds. But when you run that exact query again, it takes less than a second - why? The query plan was only compiled the first time and cached for use later. The database pages that needed to be read were only loaded from disk the first time - the second time, they were still cached in RAM. If done right, the more memory you use for caching (until you run into paging) the faster you can re-access data. You'll see the same thing in large documents (e.g. in Word and Acrobat) - when you scroll to new areas of a document, things are slow, but once it's been rendered and cached, things speed up. If you don't have enough memory, that cache starts to get overwritten and going to the old parts of the document gets slow again.
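A minimal sketch of the trade described here, spending memory to avoid repeating slow work, with a cap so the cache cannot grow without bound. It is not how SQL Server or Word implement caching; `load_page` is a hypothetical stand-in for a slow disk read, and the limit of 128 entries is an arbitrary assumption.

```python
from functools import lru_cache

calls = 0  # counts how often the slow path actually runs


@lru_cache(maxsize=128)  # the cap plays the role of a user-settable cache limit
def load_page(page_id):
    """Hypothetical stand-in for a slow disk read or query."""
    global calls
    calls += 1
    return f"contents of page {page_id}"


load_page(1)  # miss: runs the slow path
load_page(1)  # hit: served from memory
print(calls, load_page.cache_info())
```

Once more than 128 distinct pages have been requested, the least recently used entries are dropped, which mirrors the "old parts of the document get slow again" effect.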
If you can make good use of the RAM, it is your responsibility to use it.
Yes, it is perfectly normal. Also, a lot has changed since 256MB was normal... and do not forget that before that, 640KB was supposed to be enough for everybody!
Now most software is built in languages with a garbage collector: C#, Java, Ruby, Python... Everybody loves them because development can certainly be faster; however, there is one catch.
The same program can be free of memory leaks with either manual or automatic memory deallocation. However, in the second case memory consumption is likely to grow. Why? In the first case, memory is deallocated immediately after something becomes useless (garbage). However, it takes time and computing power to detect that automatically, hence most collectors (except for reference counting) wait for garbage to accumulate so that the cost of the scan is worth paying. The longer you wait, the more garbage you can sweep in one pass, but more memory is needed to hold that garbage in the meantime. If you try to force the collector constantly, your program will spend more time exploring memory than working on your problems.
You can be completely sure that as long as programmers get more resources, they will sacrifice them using heavier tools in exchange for more freedom, abstraction and faster development.
A few years ago 256 MB was the norm for a PC, and Outlook consumed about 30 - 35 MB or so of memory; that's around 10% of the available memory. Now PCs have 2 GB or more as a norm, and Outlook consumes 200 MB of memory; that's about 10% also.
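The "garbage accumulates until the collector runs" behaviour can be observed directly in CPython, which is the assumption behind this sketch. CPython frees most objects by reference counting and uses a separate collector only for reference cycles, so the example builds a cycle on purpose; `Node` and `make_garbage_cycle` are names invented for the illustration.

```python
import gc
import weakref


class Node:
    """Object that can point at another Node, so two of them can form a cycle."""

    def __init__(self):
        self.other = None


def make_garbage_cycle():
    """Create two nodes that reference each other, then drop all outside references."""
    a, b = Node(), Node()
    a.other, b.other = b, a
    return weakref.ref(a)  # a weak reference observes `a` without keeping it alive


gc.disable()                           # pause automatic cycle collection for the demo
probe = make_garbage_cycle()
still_allocated = probe() is not None  # the cycle is garbage but has not been freed yet
gc.collect()                           # one explicit pass reclaims the accumulated garbage
freed = probe() is None
gc.enable()
print(still_allocated, freed)
```

Between the moment the cycle becomes unreachable and the call to `gc.collect()`, the memory stays in use, which is the extra consumption the answer describes.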
The 1st conclusion: as more memory is available applications use more of it.
The 2nd conclusion: no matter what time frame you pick there are applications that are true memory hogs (like Outlook) and applications that are very efficient memory wise.
The 3rd conclusion: memory consumption of an app can't go down with time, else 640K would have been enough even today.
It completely depends on the application.

Windows Stalls When My Program Uses Swapfile

I am running a user mode program on normal priority. My program is searching an NP problem, and as a result, uses up a lot of memory which eventually ends up in the swap file.
Then my mouse freezes up, and it takes forever for task manager to open up and let me end the process.
What I want to know is how I can stop my Windows operating system from completely locking up because of this, even though only 1 of my 2 cores is being used.
Edit:
Thanks for the replies.
I know that making it use less memory will help, but it just doesn't make sense to me that the whole OS should lock up.
The obvious answer is "use less memory". When your app uses up all the available memory, the OS has to page the task manager (etc.) out to make room for your app. When you switch programs, the OS has to page the other programs back in (as they are needed).
Disk reads are slower than memory reads, so everything appears to be going slower.
If you want to avoid this, have your app manage its own memory, or use a better algorithm than brute force. (There are genetic algorithms, simulated annealing, etc.)
The problem is that when another program (e.g. explorer.exe) is going to execute, all of its code and memory has been swapped out. To make room for the other program Windows has to first write data that your program is using to disk, then load up the other program's memory. Every new page of code that is executed in the other program requires disk access, causing it to run slowly.
I don't know the access pattern of your program, but I'm guessing it touches all of its memory pages a lot in a random fashion, which makes the problem worse because as soon as Windows evicts a memory page from your program, suddenly you need it again and Windows has to find some other page to give the same treatment.
To give other processes more RAM to live in, you can use SetProcessWorkingSetSize to reduce the maximum amount of RAM that your program may use. Of course this will make your program run more slowly because it has to do more swapping.
Another alternative you could try is to add more drives to the system, and distribute the swap files over those. You may have a dual-core CPU, but you have only a single drive. Distributing the swap file over multiple drives allows Windows to balance work across them (although I don't have first-hand experience of how well it does this).
I don't think there's a programming answer to this question, aside from "restructure your app to use less memory." The swapfile problem is most likely due to the bottleneck in accessing the disk, especially if you're using an IDE HDD or a highly fragmented swapfile.
It's a bit extreme, but you could always minimise your swap file so you don't have all the disk thrashing and your program isn't allowed to allocate much virtual memory. Under Control panel / Advanced / Advanced tab / Performance / Virtual memory, set the page file to custom size and enter a value of 2 MB (smallest allowed on XP). When an allocation fails, you should get an exception and be able to exit gracefully. It doesn't quite fix your problem, just speeds it up ;)
Another thing worth considering, if you are on a 32-bit platform, would be to port to a 64-bit system and get a box with much more addressable RAM.

Resources