Is it possible to use different garbage collection policies for Go? [closed] - performance

As the title says, I wonder whether it is possible to change the GC policy used by the Go runtime?

Not in the sense that you can swap in entirely different collectors, as you can in Java. This is deliberate: the Go team wants a collector that works decently well everywhere and wants to avoid GC parameter tuning becoming a specialty for Go programmers.
The most commonly used option is GOGC. The default value of 100 essentially lets your program's heap grow to twice the amount of live data it had after the last GC before another collection is triggered. GOGC=200 would let the heap grow to 3x the live data after the last GC, and GOGC=50 would let it grow to only 1.5x. The exact timing and pacing of collections is a little more complicated under the hood now that Go 1.5 has made the GC concurrent, but the idea is still to target peak memory use of roughly 2x the amount of live data.
Practically speaking, the main use of GOGC I've seen is people increasing it to reduce a program's garbage collection workload when they know they have memory to spare. People have run the Go compiler with GOGC=400 or such for a little speedup, for instance. But note it's far more disastrous to bog down your servers by eating a lot of RAM than to spend a few percent more CPU time GC'ing, so don't go overboard with this.
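For illustration, here is a minimal sketch of the two ways to turn this knob: the GOGC environment variable at startup, or runtime/debug.SetGCPercent at run time. The value 400 is just the example from above; everything else is standard library.

```go
package main

import (
	"fmt"
	"runtime/debug"
)

func main() {
	// Same effect as starting the process with GOGC=400: allow the heap to
	// reach ~5x the live data left after each collection before the next
	// GC cycle is triggered. SetGCPercent returns the previous setting.
	old := debug.SetGCPercent(400)
	fmt.Println("previous GOGC value:", old)

	// ... memory-hungry but throwaway work would go here ...

	// Restore the previous setting once the GC-heavy phase is over.
	debug.SetGCPercent(old)
}
```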
The other knobs that exist are documented in the runtime package. With the current implementation you can force GCs to run stop-the-world again, or explicitly trigger a (potentially stop-the-world) collection with runtime.GC(). Separate from the GC knobs, the runtime package also lets you read memory statistics with ReadMemStats and collect profiles, which are often useful.
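A short sketch of those calls (which MemStats fields you print is up to you; these are just examples):

```go
package main

import (
	"fmt"
	"runtime"
	"time"
)

func main() {
	// Explicitly trigger a collection; the call blocks until the GC cycle
	// has finished.
	runtime.GC()

	// ReadMemStats takes a consistent snapshot of the allocator state.
	var m runtime.MemStats
	runtime.ReadMemStats(&m)
	fmt.Printf("heap in use: %d KiB, completed GC cycles: %d, total pause: %v\n",
		m.HeapInuse/1024, m.NumGC, time.Duration(m.PauseTotalNs))
}
```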
You generally don't want to focus much on what little GC tuning is possible; the defaults work pretty well, and your time is usually better spent thinking about your application.

Related

What is the most limited and expensive resource in a computer? [closed]

What is the most expensive and limited resource in a computer today?
Is it the CPU? Maybe the memory, or, as I was told, the bandwidth? Or something else entirely?
Does that mean a computer should do everything it can to use that resource more efficiently, even if that puts more load on other resources?
For example, by compressing files we put more load on the CPU so that the file can be transmitted over the network faster.
I think I know the answer, but I would like to hear it from someone else; please provide an explanation.
There is a more costly resource that you left out -- Design and Programming.
I answer a lot of questions here. Rarely do I say "beef up the hardware". I usually say "redesign or rewrite".
Most hardware improvements are measured in percentages. Clever redesigns are measured in multiples.
A complex algorithm can be replaced by a big table lookup. -- "Speed" vs "space".
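A tiny, made-up illustration of that trade in Go: spend 256 bytes on a lookup table so counting set bits becomes a handful of table reads instead of a loop over every bit.

```go
package main

import "fmt"

// popTable spends 256 bytes of memory so that each byte's bit count is a
// single lookup instead of a loop over its bits.
var popTable = func() [256]uint8 {
	var t [256]uint8
	for i := 1; i < 256; i++ {
		t[i] = t[i/2] + uint8(i&1)
	}
	return t
}()

// popCount counts the set bits in x using eight table lookups.
func popCount(x uint64) int {
	n := 0
	for i := uint(0); i < 8; i++ {
		n += int(popTable[byte(x>>(8*i))])
	}
	return n
}

func main() {
	fmt.Println(popCount(0xF0F0F0F0)) // 16
}
```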
"Your search returned 8,123,456 results, here are the first 10" -- You used to see things like that from search engines. Now it says "About 8,000,000 results" or does not even say anything. -- "Alter the user expectations" or "Get rid of the bottleneck!".
One time I was looking at why a program was so slow and found that 2 lines of code were responsible for 50% of the CPU consumed. I rewrote those 2 lines into about 20 and nearly doubled the speed. This is an example of focusing the effort where the programmer's time pays off.
Before SSDs, large databases were severely dominated by disk speed. SSDs shrank that by a factor of 10, but disk access is still a big problem.
Many metrics in computing have followed Moore's law. But one hit a brick wall -- CPU speed. That has only doubled in the past 20 years. To make up for it, there are multiple CPUs/cores/threads. But that requires much more complex code. Most products punt -- and simply use a single 'cpu'.
"Latency" vs "throughput" -- These two are mostly orthogonal. The former measures elapsed time, which is limited by the speed of light, etc. The latter measures how much data -- fiber optics is much "fatter" than a phone wire.

Calling antiviruses from software to scan in-memory images [closed]

Our system periodically (several times a minute) calls an external service to download images. As security is a priority, we are performing various validations and checks on our proxy service, which interfaces with the external service.
One control we are looking into is anti-malware which is supposed to scan the incoming image and discard it if it contains malware. The problem is that our software does not persist the images (where they can be scanned the usual way) and instead holds them in an in-memory (RAM) cache for a period of time (due to the large volume of images).
Do modern antiviruses offer APIs that can be called by the software to scan a particular in-memory object? Does Windows offer a unified way to call this API across different antivirus vendors?
On a side note, does anybody have a notion of how this might affect performance?
You should contact the antivirus vendors directly. Some of them do offer such APIs, but you will probably find it tricky even to get pricing information out of them.
Windows has AMSI (the Antimalware Scan Interface), which offers both a stream interface and a buffer interface. I don't know whether it copies the data out of the buffer or scans the buffer in place.
And it will probably wreck your performance.
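For reference, here is a rough, Windows-only sketch of what calling the AMSI buffer interface from Go could look like. AmsiInitialize, AmsiScanBuffer and AmsiUninitialize are the documented amsi.dll exports; the scanInMemory wrapper and the "image-proxy" app name are made up for the example.

```go
//go:build windows

package main

import (
	"fmt"
	"syscall"
	"unsafe"
)

var (
	amsiDLL          = syscall.NewLazyDLL("amsi.dll")
	amsiInitialize   = amsiDLL.NewProc("AmsiInitialize")
	amsiUninitialize = amsiDLL.NewProc("AmsiUninitialize")
	amsiScanBuffer   = amsiDLL.NewProc("AmsiScanBuffer")
)

// scanInMemory asks whatever AV is registered as the AMSI provider to scan a
// blob held in RAM. Result values of 32768 (AMSI_RESULT_DETECTED) and above
// mean the provider considers the content malicious.
func scanInMemory(data []byte, name string) (uint32, error) {
	if len(data) == 0 {
		return 0, fmt.Errorf("empty buffer")
	}
	appName, _ := syscall.UTF16PtrFromString("image-proxy") // arbitrary app name
	var ctx uintptr
	if hr, _, _ := amsiInitialize.Call(
		uintptr(unsafe.Pointer(appName)),
		uintptr(unsafe.Pointer(&ctx)),
	); hr != 0 {
		return 0, fmt.Errorf("AmsiInitialize failed: 0x%x", hr)
	}
	defer amsiUninitialize.Call(ctx)

	contentName, _ := syscall.UTF16PtrFromString(name)
	var result uint32
	if hr, _, _ := amsiScanBuffer.Call(
		ctx,
		uintptr(unsafe.Pointer(&data[0])),
		uintptr(len(data)),
		uintptr(unsafe.Pointer(contentName)),
		0, // no AMSI session; AmsiOpenSession could correlate related scans
		uintptr(unsafe.Pointer(&result)),
	); hr != 0 {
		return 0, fmt.Errorf("AmsiScanBuffer failed: 0x%x", hr)
	}
	return result, nil
}

func main() {
	res, err := scanInMemory([]byte("not really an image"), "test.bin")
	fmt.Println("AMSI result:", res, "error:", err)
}
```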
What might be faster is code that simply verifies the files really are images that can be decoded and re-encoded. Re-encoding .jpg files has obvious downsides, so for those you might only sanity-check the header and data. This could also turn out to be slower -- decoding large images is slow -- but it would probably do a better job of catching zero-day exploits targeting libpng/libjpeg.
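A minimal sketch of that kind of sanity check in Go, assuming the standard library decoders cover the formats you receive; the looksLikeImage helper and the dimension limits are invented for illustration.

```go
package main

import (
	"bytes"
	"fmt"
	"image"
	_ "image/gif" // register the decoders for the formats we expect
	_ "image/jpeg"
	_ "image/png"
)

// looksLikeImage rejects anything the standard decoders cannot parse and
// anything with absurd dimensions. The 20000-pixel limit is arbitrary.
func looksLikeImage(data []byte) error {
	cfg, format, err := image.DecodeConfig(bytes.NewReader(data))
	if err != nil {
		return fmt.Errorf("not a decodable image: %w", err)
	}
	if cfg.Width <= 0 || cfg.Height <= 0 || cfg.Width > 20000 || cfg.Height > 20000 {
		return fmt.Errorf("suspicious %s dimensions: %dx%d", format, cfg.Width, cfg.Height)
	}
	// A full image.Decode here would exercise the decoder on the pixel data
	// as well, at a much higher CPU cost.
	return nil
}

func main() {
	fmt.Println(looksLikeImage([]byte("definitely not an image")))
}
```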
There are also horror stories of scanning servers like that being targeted by malware hidden in otherwise benign files, though the last one I remember is from the previous decade.

Website Performance Issue [closed]

If a website is experiencing performance issues all of a sudden, what can be the reasons behind it?
In my view the database can be one reason, or disk space on the server can be another; I would like to know more about the possibilities.
There can be any number of reasons, and which ones apply depends on your setup.
Based on what you have described, you can have a look at:
System counters of the web server/app server: CPU, memory, paging, I/O, disk.
Any recent changes to the application: were they costly in performance terms? Do a round of analysis on those changes to check whether any of them need improvement.
If a system counter is maxing out, work out which resource is the bottleneck and try to resolve it.
Check all layers/tiers of the application: app server, database, directory service, etc.
If the database is the bottleneck, identify the costly queries and apply indexes and other DB tuning.
If the app server is the bottleneck, identify and improve the methods that are resource-heavy (see the profiling sketch after this answer).
Performance tuning is not a fast-track process; it takes time. Identify bottlenecks, try to resolve them, and repeat until you reach the desired performance.
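As one concrete illustration (assuming, just for the example, that the app server is written in Go), the standard net/http/pprof handlers are a cheap way to find those resource-heavy methods:

```go
package main

import (
	"log"
	"net/http"
	_ "net/http/pprof" // registers the /debug/pprof/* handlers on the default mux
)

func main() {
	// Expose the profiling endpoints on a private port. Then, for example,
	//   go tool pprof http://localhost:6060/debug/pprof/profile?seconds=30
	// shows which functions are burning the most CPU.
	log.Println(http.ListenAndServe("localhost:6060", nil))
}
```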

Measuring performance and scalability of MPI programs [closed]

I want to measure the scalability and performance of an MPI program I wrote. So far I have used the MPI_Barrier function and a stopwatch library to measure the time. The problem is that the computation time depends a lot on the current CPU and RAM load, so I get different results every run. On top of that, my program runs inside a VMware virtual machine, which I need in order to use Unix.
How can I get an objective measurement of the times? I want to see whether or not my program scales well.
In general, the way most people measure time in their MPI programs is MPI_Wtime, since it's meant to be a portable way to get wall-clock time. That will give you a decent real-time result.
If you're looking to measure CPU time instead of real time, that's a very different and much more difficult problem. Usually the way most people handle that is to run their benchmarks on an otherwise quiet system.

What environment do I need for Testing Big Data Frameworks? [closed]

As part of my thesis I have to evaluate and test some big-data frameworks like Hadoop and Storm. What minimal setup would you recommend to get relevant information about performance and scalability? Which cloud platforms would be best suited for this? Since I'm evaluating more than one framework, an out-of-the-box PaaS solution wouldn't be the best choice, right? What is the minimal number of nodes/servers needed to get relevant information? The cheaper the better, since the company I'm doing it for probably won't grant me a 20-machine cluster ;)
thanks a lot,
kroax
Well, you're definitely going to want at least two physical machines. Anything like putting multiple VMs on one physical machine is out of the question, as then you don't get the network overhead that's typical of distributed systems.
Three is probably the absolute minimum you could get away with as being a realistic scenario. And even then, a lot of the time, the overhead of Hadoop is just barely outweighed by the gains.
I would say five is the most realistic minimum, and a pretty typical small cluster size. 5 - 8 is a good, small range.
As far as platforms go, I would say Amazon EC2/EMR should always be a good first option to consider. It's a well-established, great service, and many real-world clusters are running on it. The upsides are that it's easy to use, relatively inexpensive, and representative of real-world scenarios. The only downside is that the virtualization could cause it to scale slightly differently than individual physical machines, but that may or may not be an issue for you. If you use larger instance types, I believe they are less virtualized.
Hope this helps.
