How to optimize cppcheck static analysis

I want to run cppcheck with the shortest possible runtime.
Suppose I have a directory with over 3,000 .c, .cpp, and .h files.
Is there a difference in performance if I run a separate cppcheck command on each file individually vs. giving cppcheck the root path to all of the files?
If there are several core files, which are included by all of the other files, will this cause a performance hit since the core file will have to be loaded and analyzed separately for each file being analyzed?
On the other hand, if the files are being analyzed individually (and not by giving cppcheck a root directory), then this means that I can analyze several files simultaneously using threads.

The --cppcheck-build-dir option can speed up the analysis a lot. After preprocessing, Cppcheck generates a hash and compares it to the old hash. If the old and new hashes are the same, the old results are reused. I believe the speedup is ~10x.
If you compile Cppcheck yourself, make sure MATCHCOMPILER is used. The speedup is ~2x.
I have the feeling that analysis in Linux is faster than in Windows.
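For illustration, a run that combines both suggestions might look like this (the cache directory, job count and source path are arbitrary examples, not recommendations):
mkdir -p .cppcheck-cache
cppcheck --cppcheck-build-dir=.cppcheck-cache -j 4 path/to/sources
# when building Cppcheck from its Makefile, the match compiler is enabled at build time:
make MATCHCOMPILER=yes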

Related

How to write a "bidirectional" makefile that keeps two files synchronized?

I'm working with a set of binary files that can be "decompiled" to or "compiled" from a set of INI files. Since both the binary and INI files are checked into my repository, I use a small script to (de)compile all of them.
Our workflow usually involves editing the binary files directly and decompiling the modified binaries to INI format. However, occasionally we need to edit the INI files and compile the changes back to binaries.
The question: can I make a single makefile that detects which set was modified more recently, and automatically issues (de)compile commands in either direction to keep both sets of files in sync? I prefer using common (GNU?) make features, but if there is a more specialized tool that works, I'm all ears.
(I could make two separate directives, "decompile-all" and "compile-all". I want to know if there's a single-command option.)
I don't see how that can work. Suppose it could be done in make; now you have two files foo.exe and foo.ini (you don't say what your actual filename patterns are). You run make and it sees that foo.exe is newer than foo.ini, so it decompiles the binary to build a new foo.ini. Now, you run make again and this time it sees that foo.ini is newer than foo.exe, because you just built the former, so it compiles foo.ini into foo.exe.
Etc. Every time you run make it will perform an operation on all the files because one or the other will always be out of date.
The only way this could work would be if you (a) tested whether the files' last-modified times were exactly identical, and (b) had a way to reset the time on the compiled/decompiled file so that it was identical to the file it was built from, rather than "now", which is of course the default.
The answer is that make cannot be used for this situation. You could of course write yourself a small shell script that went through every file, tested whether the last-modified times were identical, and if not, ran the (de)compile and then used touch -m -r origin, where origin is the file with the newer modification time, so that both end up with the same modification time.
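A minimal sketch of that shell-script idea, assuming the pairs are named foo.exe / foo.ini and using made-up compile/decompile commands as placeholders (the -nt test is a common, though not strictly POSIX, shell extension):
#!/bin/sh
# For each binary/INI pair, regenerate whichever file is older, then copy the
# newer file's modification time onto it so the pair ends up in sync again.
for exe in *.exe; do
    ini="${exe%.exe}.ini"
    if [ "$exe" -nt "$ini" ]; then
        decompile "$exe" "$ini"     # placeholder command: binary -> INI
        touch -m -r "$exe" "$ini"
    elif [ "$ini" -nt "$exe" ]; then
        compile "$ini" "$exe"       # placeholder command: INI -> binary
        touch -m -r "$ini" "$exe"
    fi
done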

How to rebuild when the recipe has changed

I apologize if this question has already been asked. It's not easy to search for.
make has been designed with the assumption that the Makefile is kinda god-like. It is all-knowing about the future of your project and will never need any modification besides adding new source files. Which is obviously not true.
I used to make all my targets in a Makefile depend on the Makefile itself, so that if I change anything in the Makefile, the whole project is rebuilt.
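For reference, that naive approach is essentially a one-line rule (GNU make syntax; $(OBJECTS) stands for whatever list of objects the Makefile already defines):
$(OBJECTS): Makefile    # every object is rebuilt whenever the Makefile changes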
This has two main limitations:
It rebuilds too often. Adding a linker option or a new source file rebuilds everything.
It won't rebuild if I pass a variable on the command line, like make CFLAGS=-O3.
I see a few ways of doing it correctly, but none of them seems satisfactory at first glance.
Make every target depend on a file that contains the content of the recipe.
Generate the whole rule with its recipe into a file destined to be included from the Makefile.
Conditionally add a dependency to the targets to force them being rebuilt whenever necessary.
Use the eval function to generate the rules.
But all these solutions need an uncommon way of writing the recipes: either putting the whole rule as a string in a variable, or wrapping the recipes in a function that does some magic.
What I'm looking for is a solution to write the rules in a way as straightforward as possible. With as little additional junk as possible. How do people usually do this?
I have projects that compile for multiple platforms. When building a single project that had previously been compiled for a different architecture, one can force a rebuild manually. However, when compiling all projects for OpenWRT, manual cleanup is unmanageable.
My solution was to create a marker file identifying the platform. If it is missing, everything recompiles.
ARCH ?= $(shell uname -m)
CROSS ?= $(shell uname -s).$(ARCH)

# marker for the last built architecture
BUILT_MARKER := out/$(CROSS).built

$(BUILT_MARKER):
	-rm -f out/*.built
	touch $(BUILT_MARKER)

build: $(BUILT_MARKER)
	# TODO: add your build commands here
If your flags are too long, you may reduce them to a checksum.
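As an illustration of the flag-file idea, a sketch assuming GNU make (the .cflags file name is arbitrary; the checksum variant mentioned above would simply hash the string before writing it):
CFLAGS ?= -O2 -Wall

# Rewrite .cflags only when its contents differ from the current $(CFLAGS),
# so its timestamp changes exactly when the flags change (even when they are
# overridden on the command line).
_dummy := $(shell echo '$(CFLAGS)' | cmp -s - .cflags || echo '$(CFLAGS)' > .cflags)

%.o: %.c .cflags
	$(CC) $(CFLAGS) -c -o $@ $<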
"make has been designed with the assumption that the Makefile is kinda god-like. It is all-knowing about the future of your project and will never need any modification beside adding new source files."
I disagree. make was designed in a time when having your source tree sitting in a hierarchical file system was about all you needed to know about software configuration management, and it took this idea to its logical consequence, namely that all that is, is a file (with a timestamp). So, having linker options, locator tables, compiler flags and everything else but the kitchen sink in a file, and putting the dependencies thereof also in a file, will yield a consistent, complete and error-free build environment as far as make is concerned.
This means that passing data to a process (which is nothing else than saying that this process is dependent on that data) has to be done via a file; command-line arguments as make variables are an abuse of make's capabilities and lead to erroneous results. make clean is the technical remedy for a systemic misbehaviour. It wouldn't be necessary had the software engineer designed the make process properly and correctly.
The problem is that a clean build process is hard to design and maintain. BUT: in a modern software process, transient/volatile build parameters such as make all CFLAGS=-O3 never have a place anyway, as they wreck all good foundations of config management.
The only thing that can be criticised about make may be that it isn't the be-all-end-all solution to software building. I question whether a program with that task would have reached even one percent of make's popularity.
TL;DR
place your compiler/linker/locator options into separate files (at a central, prominent, easy-to-maintain, logical location), decide on the level of control through the granularity of information (e.g. compiler flags in one file, linker flags in another), put down the true dependencies for all files, and voilà: you will have exactly the necessary amount of compilation and a correct build.
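A compact sketch of what that might look like in practice (assuming GNU make; the file names are invented for illustration):
include cflags.mk                        # defines CFLAGS, nothing else
include ldflags.mk                       # defines LDFLAGS, nothing else

OBJECTS := $(patsubst %.c,%.o,$(wildcard *.c))

# The option files are ordinary prerequisites, so touching cflags.mk
# recompiles the objects and touching ldflags.mk relinks the program.
%.o: %.c cflags.mk
	$(CC) $(CFLAGS) -c -o $@ $<

program: $(OBJECTS) ldflags.mk
	$(CC) $(LDFLAGS) -o $@ $(OBJECTS)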

In Windows, opening executables and writing output files quickly is failing randomly

I've got an executable that does some structural analysis. It's compiled from old Fortran code, somewhat of a black box. It reads an input file and writes output to the command window.
I've integrated that executable into an Excel VBA macro to do design optimization. My optimization routine does
Write 10 input files in different directories
Call 10 concurrent instances of the executable (each of the 10 instances is from a copied and renamed version of the exe file) and pipe the output to a file
Wait for them all to finish
Read in output files, use the results to generate a new set of designs, and start again.
The executable runs very quickly, less than a second for all the concurrent instances.
This scheme is pretty reliable when I run it on its own. However, I'd like to run multiple optimization jobs concurrently: imagine 8 or 10 instances of Excel, each running these optimizations at the same time. On my computer it generally runs fine. On other machines of nominally identical spec, we're running into problems where the output file isn't getting created, either because the executable isn't getting called, it is failing to run, or the output is failing to be piped to the results file; I'd welcome suggestions for checking which of those it is. This doesn't happen every time, maybe once per 1,000 iterations. But when it does, it happens simultaneously across most of the Excel instances and most of the 10 executable calls.
Any idea what is going wrong? It seems like it has something to do with calling so many executables or writing so many files so quickly.

Why do we describe build procedures with Makefiles instead of shell scripts?

Remark: This is a variation on the question "What is the purpose of linking object files separately in a Makefile?" by user4076675, taking a slightly different point of view. See also the corresponding META discussion.
Let us consider the classical case of a C project. The gcc compiler
is able to compile and link programs in one step. We can then easily
describe the build routine with a shell script:
case $1 in
build) gcc -o test *.c;;
clean) rm -f test;;
esac
# This script is intentionally very brittle, to keep
# the example simple.
However, it appears to be idiomatic to describe the build procedure
with a Makefile, involving extra steps to compile each compilation
unit to an object file and ultimately linking these files. The
corresponding GNU Makefile would be:
.PHONY: all
SOURCES=$(wildcard *.cpp)
OBJECTS=$(SOURCES:.cpp=.o)

%.o: %.cpp
	g++ -c -o $@ $<

all: default

default: $(OBJECTS)
	g++ -o test $^

clean:
	rm -rf *.o
This second solution is arguably more involved than the simple shell script we wrote before. It also has a drawback: it clutters the source directory with object files. So, why do we describe build procedures with Makefiles instead of shell scripts? Judging by the previous example, it seems to be a useless complication.
In the simple case where we compile and link three moderately sized files, any approach is likely to be equally satisfying. I will therefore consider the general case, but bear in mind that many of the benefits of Makefiles only matter on larger projects. Once we have learned the best tool for mastering complicated cases, we want to use it in the simple cases as well. Let me highlight the benefits of using make instead of a simple shell script for compilation jobs. But first, I would like to make an innocuous observation.
The procedural paradigm of shell scripts is wrong for compilation-like jobs
Writing a Makefile is similar to writing a shell script with a slight change of perspective. In a shell script, we describe a procedural solution to a problem: we can start by describing the whole procedure in very abstract terms using undefined functions, and we refine this description until we reach the most elementary level, where a procedure is just a plain shell command. In a Makefile, we do not introduce any similar abstraction; instead, we focus on the files we want to produce and how we can produce them. This works well because in UNIX everything is a file, so each treatment is accomplished by a program which reads its input data from input files, does some computation, and writes the results to some output files.
If we want to compute something complicated, we have to use a lot of
input files which are treated by programs whose outputs are used as
inputs to other programs, and so on until we have produced our final
files containing our result. If we translate the plan to prepare our
final file into a bunch of procedures in a shell script, then the
current state of the processing is made implicit: the plan executor
knows “where it is at” because it is executing a given procedure,
which implicitly guarantees that such and such computations were
already done, that is, that such and such intermediary files were
already prepared. Now, which data describes “where the plan executor
is at”?
Innocuous observation The data which describes “where the plan
executor is at” is precisely the set of intermediary files which
were already prepared, and this is exactly the data which is made
explicit when we write Makefiles.
This innocuous observation is actually the conceptual difference
between shell scripts and Makefiles which explains all the advantages
of Makefiles over shell scripts in compilation jobs and similar jobs.
Of course, to fully appreciate these advantages, we have to write
correct Makefiles, which might be hard for beginners.
Make makes it easy to continue an interrupted task where it was at
When we describe a compilation job with a Makefile, we can easily
interrupt it and resume it later. This is a consequence of the
innocuous observation. A similar effect can only be achieved with considerable effort in a shell script, while it comes built into make.
Make makes it easy to work with several builds of a project
You observed that Makefiles will clutter the source tree with object
files. But Makefiles can actually be parametrised to store these
object files in a dedicated directory. I work with BSD Owl
macros for bsdmake and use
MAKEOBJDIR='/usr/home/michael/obj${.CURDIR:S#^/usr/home/michael##}'
so that all object files end up under ~/obj and do not pollute my sources. See this answer for more details.
Advanced Makefiles allow us to have simultaneously several directories
containing several builds of a project with distinct compilation
options. For instance, with distinct features enabled, or debug
versions, etc. This is also a consequence of the innocuous observation
that Makefiles are actually articulated around the set of intermediary
files. This technique is illustrated in the testsuite of BSD Owl.
Make makes it easy to parallelise builds
We can easily build a program in parallel since this is a standard
function of many versions of make. This is also a consequence of the innocuous observation: because "where the plan executor is at" is explicit data in a Makefile, it is possible for make to reason about it. Achieving a similar effect in a shell script would require great effort.
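With GNU make, for instance, the degree of parallelism is simply requested on the command line (the job count below is an arbitrary example):
make -j4            # run up to four recipes in parallel
make -j"$(nproc)"   # or match the number of available processors (GNU coreutils nproc)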
The parallel mode of any version of make will only work correctly if the dependencies are correctly specified. This might be quite complicated to achieve, but bsdmake has a feature which literally annihilates the problem. It is called the META mode. It uses a first, non-parallel pass of a compilation job to compute the actual dependencies by monitoring file access, and uses this information in later parallel builds.
Makefiles are easily extensible
Because of the special perspective — that is, as another consequence
of the innocuous observation — used to write Makefiles, we can
easily extend them by hooking into all aspects of our build system.
For instance, if we decide that all our database I/O boilerplate code
should be written by an automatic tool, we just have to write in the
Makefile which files should the automatic tool use as inputs to write
the boilerplate code. Nothing less, nothing more. And we can add this
description pretty much where we like, make will get it
anyway. Doing such an extension in a shell script build would be
harder than necessary.
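To make this concrete, such a hook could be as small as the following sketch; the generator name and the file names are invented for illustration:
# State which input the (hypothetical) generator reads and which file it
# writes; make schedules it like any other build step.
db_boilerplate.c: schema.sql
	./gen-db-boilerplate schema.sql > $@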
This ease of extension is a great incentive for Makefile code reuse.

How many files is most advised to have in a Windows folder (NTFS)?

We have a project that consists of a large archive of image files.
We try to split them into sub-folders within the main archive folder.
Each sub-folder contains up to 2500 files in it.
For example:
C:\Archive
C:\Archive\Animals\
C:\Archive\Animals\001 - 2500 files...
C:\Archive\Animals\002 - 2300 files..
C:\Archive\Politics\
C:\Archive\Politics\001 - 2000 files...
C:\Archive\Politics\002 - 2100 files...
Etc. What would be the best way of storing files in such a structure under Windows, and why exactly?
Later on, the files have their EXIF metadata extracted and indexed for keywords, to be added into a Lucene index... (this is done by a Windows service that lives on the server)
We have an application where we try to make sure we don't store more than around 1,000 files in a directory. Under Windows at least, we noticed extreme degradation in performance above this number. A folder can theoretically store up to 4,294,967,295 files in Windows 7. Note that because the OS does a scan of the folder, lookups and directory listings degrade very quickly as you add many more files. Once we got to 100,000 files in a folder it was almost completely unusable.
I'd recommend breaking down the animals even further, perhaps by the first letter of the name, and the same with the other categories. This will let you separate things out more, so you won't have to worry about directory performance. The best advice I can give is to perform some stress tests on your system to see where the performance starts to tail off once you have enough files in a directory. Just be aware you'll need several thousand files to test this out.

Resources