Tuning the performance of a program using different combinations of GCC options - gcc

As I recall, there was a project that explored combinations of GCC options (CFLAGS)
to find the one that gives the best performance for a program.
If I'm not mistaken, it did this through randomized testing.
Could somebody remind me of the name of the project?
It is difficult to dig up on Google since the project was halted.
Thanks!

I think you mean ACOVEA (Analysis of Compiler Options via Evolutionary Algorithm).

You probably mean acovea.
http://www.coyotegulch.com/products/acovea/


How can I build an intuitive sense of the relative cost of WebAssembly instructions?

I'm building a simple compiler which emits WebAssembly. As I craft the Wasm that the compiler will emit, there are often multiple ways to implement a given behavior and I'm left unsure which one would be more performant.
For example, there are some cases where I could chain some math instructions to avoid storing/retrieving a value into/out of a variable. When is that tradeoff worth making? Is that even a thing I should be considering?
Obviously the only real answer to that question is "build both and then measure the performance on multiple Wasm interpreters", but that feels infeasible for the number of questions I have. I'm guessing there are some types of instructions which are an order of magnitude more expensive than others, and knowing that would help me make better intuitive decisions.
Are there any rules of thumb for how to think about this? Has anyone written about this? Are there tools which will show me what a given browser/interpreter would compile a snippet of Wasm to?
Sounds like you need something similar to JsPerf but for WASM.
Performance is going to depend on the browser and version you're using, and I'm afraid there isn't a good answer here. If you look at JS performance over time, compilers do help a bit, but it is the vendors who can do something meaningful.
Over time we can expect the gap between Wasm and native apps to shrink quite a bit, and I suspect the main bottleneck for Wasm is the bridge between JS and Wasm.
There is a paper that may interest you: https://www.usenix.org/system/files/atc19-jangda.pdf
You can optimize your Wasm code to reduce the size of the output: a smaller output file can be downloaded faster. This matters if your Wasm code is stored on a web server and executed in client browsers.
Besides this, the Wasm compiler in the browser may optimize the code too. But even in that case, it is not pointless to do the optimizations ahead of time in your own compiler: putting more burden on the browser will likely cause it to compile the code to a native binary more slowly, which results in slightly longer loading times or a slightly less responsive web application.
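To make the tradeoff from the original question concrete, here is a sketch in WebAssembly text format of the two alternatives: spilling an intermediate result to a local versus chaining instructions and recomputing. The function names and the computation `(a + b) * (a + b)` are invented for illustration.

```wat
;; Version 1: store the intermediate sum in a local.
(func $with_local (param $a i32) (param $b i32) (result i32)
  (local $t i32)
  local.get $a
  local.get $b
  i32.add
  local.set $t        ;; spill the sum to a local
  local.get $t
  local.get $t
  i32.mul)

;; Version 2: chain the instructions, recomputing the sum
;; instead of storing it.
(func $chained (param $a i32) (param $b i32) (result i32)
  local.get $a
  local.get $b
  i32.add
  local.get $a
  local.get $b
  i32.add
  i32.mul)
```

In an optimizing engine, locals typically become registers, so after the JIT runs both versions will often compile to very similar machine code; differences like this tend to matter more for code size and for baseline (non-optimizing) tiers than for peak performance.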

Parallel STL algorithms in OS X

I'm working on converting an existing program to take advantage of some of the parallel functionality of the STL.
Specifically, I've rewritten a big loop to work with std::accumulate. It runs nicely.
Now, I want to have that accumulate operation run in parallel.
The documentation I've seen for GCC outlines two specific steps:
Include the compiler flag -D_GLIBCXX_PARALLEL
Possibly add the header <parallel/algorithm>
Adding the compiler flag doesn't seem to change anything. The execution time is the same, and I don't see any indication of multiple core usage when monitoring the system.
I get an error when adding the parallel/algorithm header. I thought it would be included with the latest version of gcc (4.7).
So, a few questions:
Is there some way to definitively determine if code is actually running in parallel?
Is there a "best practices" way of doing this on OS X? (Ideal compiler flags, header, etc?)
Any and all suggestions are welcome.
Thanks!
See http://threadingbuildingblocks.org/
If you only ever parallelize STL algorithms, you are going to be disappointed in the results in general. Those algorithms generally only begin to show a scalability advantage when working over very large datasets (e.g. N > 10 million).
TBB (and others like it) work at a higher level, focusing on the overall algorithm design, not just the leaf functions (like std::accumulate()).
A second alternative is to use OpenMP, which is supported by both GCC and Clang; it is not STL by any means, but it is cross-platform.
A third alternative is to use Grand Central Dispatch, the official multicore API on OS X; again, hardly STL.
A fourth alternative is to wait for C++17, which will have a parallelism module.

Fastest math programming language?

I have an application that requires millions of subtractions and remainder operations. I originally programmed this algorithm in C#/.NET, but it takes five minutes to process the data and I need it to be faster than that.
I have considered Perl, and that seems to be the best alternative right now. VB.NET was slower in testing. C++ may be better also. Any advice would be greatly appreciated.
You need a compiled language like Fortran, C, or C++. Other languages are designed to give you flexibility, object-orientation, or other advantages, and assume absolutely fastest performance is not your highest priority.
Know how to get maximum performance out of a single thread, and after you have done so investigate sharing the work across multiple cores, for example with MPI. To get maximum performance in a single thread, one thing I do is single-step it at the machine instruction level, to make sure it's not dawdling about in stuff that could be removed.
Some calculations are regular enough to benefit from GPGPUs: recent graphics cards are essentially specialized, massively parallel numerical co-processors. For instance, you could code your numerical kernels in OpenCL. Otherwise, learn C++11 (not some earlier version of the C++ standard) or C. In many cases OCaml can be nearly as fast as C++ but much easier to code in.
Perhaps your problem can be handled by Scilab or R; I did not understand it well enough to help more.
You might also take advantage of your multi-core processor, e.g. by using Pthreads or MPI.
Finally, the Linux operating system is perhaps better suited to massive calculations. It is telling that most supercomputers run it today.
If execution speed is the highest priority, that usually means Fortran.
Try Julia: its killer feature is being easy to code in a high-level, concise way, while keeping performance in the same order of magnitude as Fortran/C.
PARI/GP is the best I have used so far. It's written in C.
Take a look at the DMelt mathematical program. The program calls Java libraries, and the Java virtual machine can optimize long mathematical calculations for you.
The standard tool for numerical mathematics in engineering is often Matlab (or, as free alternatives, Octave or the already mentioned Scilab).

Minimalist Programming Tools

What tools go well with, or help with, minimalist programming? Examples would be libraries with a tight, clean interface and a very small size for their genre.
Techniques, functions, or concepts that result in smaller and/or more efficient apps would be great. If you know of any other relevant tools, that would help as well.
This may not be quite what you're looking for, but I enjoyed reading A Whirlwind Tutorial on Creating Really Teensy ELF Executables for Linux, which starts out with basic techniques for reducing bloat, before going into far more detail than I thought possible in order to shave every last byte from an executable!
If not assembler, then almost any Forth.
See colorFORTH - minimal and strange ... best of both worlds :)

Curious: Could LLVM be used for Infocom z-machine code, and if so how? (in general)

Forgive me if this is a silly question, but I'm wondering if/how LLVM could be used to obtain a higher performance Z-Machine VM for interactive fiction. (If it could be used, I'm just looking for some high-level ideas or suggestions, not a detailed solution.)
It might seem odd to desire higher performance for a circa-1978 technology, but apparently Z-Machine games produced by the modern Inform 7 IDE can have performance issues due to the huge number of rules that need to be evaluated with each turn.
Thanks!
FYI: The Z-machine architecture was reverse-engineered by Graham Nelson and is documented at http://www.inform-fiction.org/zmachine/standards/z1point0/overview.html
Yes, it could be. A naïve port of the interpreter to a compiler could be done relatively easily.
That said, it wouldn't be a big performance win. The problem with any compiler for ZCode or Glulx is that they're both relatively low-level. For instance, Glulx supports indirect jumps and self-modifying code. There's no way to statically compile that into efficient native code. Making it truly fast would require trace compilation or something similar.
It would certainly be possible (but difficult) to use LLVM as a kind of JIT for Z-machine code, but wouldn't it be easier to simply compile the Inform source directly to a faster language? E.g., C for maximum speed, or .NET or Java if you prefer portability. I suspect this route would be a lot easier, and better performing, than jerry-rigging a JIT onto the side of the interpreter.
