Merging C caller graphs with Doxygen, or determining the union of all calls, for refactoring

I have a collection of legacy C code which I'm refactoring to split the computational C code from the GUI. This is complicated by the heavily recursive mathematical core code using K&R-style declarations. I've already abandoned an attempt to convert these to ANSI declarations because of nested use of function parameters (I just couldn't get those last 4 compiler errors to go away).
I need to move some files into a pure DLL and determine the minimal interface to make public, which is going to require writing wrapper functions to publish a typed interface.
I've marked up the key source files with Doxygen's \callergraph command so that informative graphs are generated for individual functions. What I'd like to do beyond that is amalgamate these graphs so I can determine the narrowest boundary of functions exposed to the outside world.
The original header files are no use - they expose everything as untyped C functions.
There are hundreds of functions, so simple inspection of the generated caller graphs is painful.
I'm considering writing some kind of DOT merge tool - setting DOT_CLEANUP = NO makes Doxygen leave the intermediate DOT files in place rather than retaining only the PNGs generated from them.
I'm not obsessed with this being a graphical solution - I'd be quite happy if someone could suggest an alternative analysis tool (free or relatively cheap) or a technique using Doxygen's XML output to achieve the same goal.
That said, a caller graph amalgamated at the file level does have a certain appeal for client documentation, rather than a plain list :-)

In your Doxyfile, set
GENERATE_XML = YES
and then you can find your call graph in the XML files. For each function marked with \callergraph, you'll find <referencedby> elements in the output that you can use. Here's a sample from one of my C files:
<memberdef kind="function" id="spfs_8c_1a3"
           prot="public" static="yes" const="no" explicit="no"
           inline="no" virt="non-virtual">
  <type>svn_error_t *</type>
  <definition>svn_error_t * init_apr</definition>
  <argsstring>(apr_pool_t **ppool, int *argc, char const *const **argv)</argsstring>
  <name>init_apr</name>
  <!-- param and description elements clipped for brevity ... -->
  <location file="src/spfs.c" line="26" bodystart="101" bodyend="120"/>
  <referencedby refid="spfs_8c_1a13" compoundref="spfs_8c"
                startline="45" endline="94">main</referencedby>
</memberdef>
So for every memberdef/referencedby pair, you have a caller-callee relationship, which you can grab via XSLT:
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>
  <xsl:template match="/">
    <xsl:apply-templates select="//memberdef[@kind eq 'function']"/>
  </xsl:template>
  <xsl:template match="memberdef">
    <xsl:variable name="function-name"
                  select="concat(definition, argsstring)"/>
    <xsl:for-each select="referencedby">
      <xsl:value-of select="concat(./text(), ' calls ', $function-name, '&#10;')"/>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
This gives you one line per caller-callee pair, like this:
main calls svn_error_t * init_apr(apr_pool_t **ppool, int *argc, char const *const **argv)
You'll want to tweak that XSLT and then partition that directed graph in the way that cuts across the fewest edges. Woo hoo, an NP-complete problem! Luckily, there are lots of papers on the subject, some are here: http://www.sandia.gov/~bahendr/partitioning.html
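If you'd rather not drive this through XSLT, the same referencedby information is easy to pull apart with a small script. Below is a rough sketch (Python, standard library only, not a polished tool) that lists every function defined in a candidate set of DLL sources but called from a file outside that set - i.e. the boundary you'd have to expose. The dll_files set is a placeholder for your own file list, and the xml/ path assumes Doxygen's default XML output directory.
import glob
import xml.etree.ElementTree as ET

dll_files = {"src/core.c", "src/solver.c"}   # placeholder: the sources destined for the DLL

defined_in = {}   # memberdef refid -> file the function is defined in
functions = []    # (name, file, [caller refids])

for path in glob.glob("xml/*.xml"):          # Doxygen's per-compound XML output
    for m in ET.parse(path).getroot().iter("memberdef"):
        if m.get("kind") != "function":
            continue
        loc = m.find("location")
        if loc is None:
            continue
        defined_in[m.get("id")] = loc.get("file")
        callers = [r.get("refid") for r in m.findall("referencedby")]
        functions.append((m.findtext("name"), loc.get("file"), callers))

# A function belongs to the public boundary if it is defined in the DLL sources
# but at least one of its callers is defined outside them.
boundary = {name for name, filename, callers in functions
            if filename in dll_files
            and any(defined_in.get(c) not in dll_files for c in callers)}
print("\n".join(sorted(boundary)))
Once you have that list, the wrapper functions you need to write are exactly the ones it prints.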

I had a similar requirement, so I wrote a Perl script to merge a set of DOT files into a single DOT file:
https://github.com/bharanis/scripts/blob/master/doxygen_dot_merge.pl
It merges multiple Doxygen-generated DOT files, which is useful for generating a call map for a file or a group of files.
1) Run the command from outside the html directory where Doxygen puts all the HTML, DOT and map files.
2) The command assumes the flat directory structure Doxygen uses with
CREATE_SUBDIRS = NO
3) Doxygen prefixes the source filename to the names of the output DOT files; one DOT file is generated per function.
4) Provide the list of Doxygen-generated DOT files to be merged, e.g.:
./doxydotmerge.pl `ls html/ssd_*_8c*_cgraph.dot | grep -v test | grep -v buf`
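The merge itself only takes a few lines; here is a rough sketch of the same idea in Python, assuming the Node<n> [label="..."] node statements and Node<n> -> Node<m> edges that Doxygen writes into each *_cgraph.dot file (node ids are local to each file, so they are remapped via their labels):
import re
import sys

node_re = re.compile(r'^\s*(Node\d+)\s*\[label="([^"]*)"')
edge_re = re.compile(r'^\s*(Node\d+)\s*->\s*(Node\d+)')
edges = set()

for path in sys.argv[1:]:                      # the *_cgraph.dot files to merge
    labels, raw_edges = {}, []
    for line in open(path, errors="replace"):
        m = node_re.match(line)
        if m:
            labels[m.group(1)] = m.group(2)    # local node id -> function name
            continue
        m = edge_re.match(line)
        if m:
            raw_edges.append((m.group(1), m.group(2)))
    for a, b in raw_edges:                     # resolve edges once the whole file is read
        if a in labels and b in labels:
            edges.add((labels[a], labels[b]))

print("digraph merged {")
for src, dst in sorted(edges):
    print('  "%s" -> "%s";' % (src, dst))
print("}")
Run it over the same file list you would pass to the Perl script and pipe the output through dot -Tpng to render the combined graph.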

You could use Understand from Scientific Toolworks to see your system-wide call graph.
If you want to automate the analysis and the cutting, you might consider
the DMS Software Reengineering Toolkit. It can compute full call graphs for C
(complete with points-to analysis for function pointers), and has been
proven on systems of 35 million lines of code. It will
produce full-system DOT files to inspect; they are pretty big, but you can
pick out subsets to look at. See flow analysis and call graphs.

Related

XPath - Select a node based on sibling node's value while using local-name()

For reasons out of scope for this question I need to be able to handle multiple xml documents of the same structure but belonging to different namespaces (don't ask).
To achieve this I've become very accustomed to using an xpath like the following for many of my value selections:
//*[local-name()='apple']/*[local-name()='flavor']/text()
My lack of understanding of predicates is preventing me from selecting a node's value based upon a sibling node's value. Consider the following xml:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<fruit>
  <apple>
    <kind>Red Delicious</kind>
    <flavor>starchy</flavor>
  </apple>
  <apple>
    <kind>Granny Smith</kind>
    <flavor>tart</flavor>
  </apple>
  <apple>
    <kind>Pink Lady</kind>
    <flavor>sweet</flavor>
  </apple>
</fruit>
Let's say I want to write an xpath that will select the flavor of a Granny Smith apple. While I would normally do something like:
//apple[kind/text()='Granny Smith']/flavor/text()
I cannot figure out how to merge the concept of utilizing local-name() to be namespace agnostic while still selecting a node based upon a sibling's value.
In short, what is the xpath necessary to return "tart" regardless of what namespace the input fruit xml document belongs to?
I need to be able to handle multiple xml documents of the same
structure but belonging to different namespaces (don't ask)
My preferred way to handle this is to first transform the data to use a single namespace, and then do the transformation proper. Doing it this way (a) keeps the real transformation much simpler, (b) puts all the namespace conversion logic in one place, and (c) makes the namespace conversion logic reusable - you can use the same transformation regardless how the data will subsequently be used.
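That said, if you do want a single namespace-agnostic expression, the local-name() trick works inside the predicate too, so the sibling test can stay namespace-agnostic. A quick sketch - the XPath expression is the point; Python/lxml is used here only as an easy way to run it, and the urn:example namespace is made up:
from lxml import etree

doc = etree.fromstring(
    '<fruit xmlns="urn:example:whatever">'
    '<apple><kind>Granny Smith</kind><flavor>tart</flavor></apple>'
    '</fruit>')

expr = ("//*[local-name()='apple']"
        "[*[local-name()='kind']='Granny Smith']"
        "/*[local-name()='flavor']/text()")
print(doc.xpath(expr))   # ['tart']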

Binary operation between hexadecimal numbers in Ruby

Is there any simple way in Ruby to operate with Hex numbers?
[Updated 2017-02-23] added background.
I created a ruby parser to analyse C code.
Background: The C code is automatically generated by a Python script that reads from a big configuration file. This Python script uses templates to create C and H files. Basically these C files are configuration for a C project.
The file contains macro definitions, arrays with parameters and operations like:
0X5EEA11 & 0X000100 // checking whether bit 8 is active
Since this code is safety related, its correctness has to be ensured somehow, so I decided to use Ruby to parse the generated files and compare them back to the original seeds (the configuration files, which are Excel lists with thousands of rows).
So I wonder whether I have to convert the values to binary and check bit by bit that the operation is correct.
I also check the result of the executable, verifying that the mask was calculated correctly.
I saw how to convert to hex, but those values are actually integers, so I don't think I can do binary operations on them as if they were hex.
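For what it's worth, hex literals in Ruby, as in Python, are just ordinary integers, so the bitwise operators apply to them directly and there is no need to convert anything to binary and check bit by bit. A minimal sketch (written in Python; the parenthesised expressions work essentially verbatim in Ruby):
value = 0x5EEA11
mask  = 0x000100                 # bit 8

print(value & mask)              # 0 -> bit 8 is not set in 0x5EEA11
print((value & mask) != 0)       # False (parentheses matter in Python: != binds tighter than &)
print(hex(value | mask))         # '0x5eeb11' -- use hex() when you want hex output back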

Parse and write a content MathML rational number with Boost ptree containing sep

I am trying to write and to read/parse content MathML XML files with Boost ptree (property_tree). I cannot work out how to write or to parse this code:
<?xml version="1.0" encoding="utf-8"?>
<math xmlns="http://www.w3.org/1998/Math/MathML">
<cn type="rational">1<sep/>2</cn>
</math>
The problem is the <sep/>. When I use get_value() or get() with string or int, I get "12"; I cannot separate the 1 and the 2. How can I get or write the two separate values "1" and "2"?
Boost Property Tree is not an XML parser.
Instead, it's a settings-persistence utility that facilitates (de)serializing a certain set of hierarchical data types to a number of (partly interchangeable) formats.
Note that the feature set for each format is not the same.
Specifically, for your goal you need a parser that handles mixed-content elements (elements containing both text and sub-elements, mixed). There's a surprising number of XML parsers that don't handle this, and Boost Property Tree is (or uses) such a parser.
So, you should look at another library to get you this.
What XML parser should I use in C++?
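To illustrate what a mixed-content-aware parser gives you - sketched in Python with lxml purely to show the shape of the data; a C++ DOM parser such as pugixml or Xerces-C++ exposes the same pieces as sibling text nodes - the "1" is the text before <sep/> and the "2" is the text immediately after it:
from lxml import etree

doc = etree.fromstring(
    '<math xmlns="http://www.w3.org/1998/Math/MathML">'
    '<cn type="rational">1<sep/>2</cn></math>')

cn  = doc.find('{http://www.w3.org/1998/Math/MathML}cn')
sep = cn[0]                       # the <sep/> child element
print(cn.text, sep.tail)          # 1 2  (numerator, denominator)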

XSLT Performance Issue

The XSLT transformation is done through .NET code using the API provided by Saxon; I am using the Saxon 9 Home Edition API. The XSLT version is 2.0 and it generates XML output. The input file size is 123 KB.
The XSLT adds attributes to the input XML file depending on certain scenarios. There are 7 modes in total used in this XSLT; the value of an attribute generated in one mode is used in another mode, hence the multiple modes.
The output is generated correctly, but it takes around 10 seconds to execute this XSLT. When the same XSLT is executed in 'Altova XMLSpy 2013', it takes around 3-4 seconds.
Is there a way to reduce this 10-second execution time further? What could be the cause of it taking this long to execute?
The XSLT is available for download at the link below.
XSLT Link
Without having a source document to run this against (and therefore to make measurements) it's very hard to be definitive about where the inefficiencies are, but the most obvious at first glance is the weird testing of element names in patterns like:
match="*[name()='J' or name()='H' or name()='F' or name()='D' or name()='B' or name()='I' or name()='G' or name()='E' or name()='C' or name()='A' or name()='X' or name()='Y' or name()='O' or name()='K' or name()='L' or name()='M' or name()='N']
which in Saxon would be vastly more efficient if written the natural way as
match="J|H|F|D|B|I|G|E|C|A|X|Y|O|K|L|M|N"
It's also more likely to be correct that way, since comparing name() against a string is sensitive to the chosen prefix, and XSLT code really ought to work whatever namespace prefix the source document author has chosen.
The reason the latter is much more efficient is that Saxon organizes the source tree for rapid matching of elements by name (meaning namespace URI plus local name, excluding prefix). When you match by name in this way, the matching template rule can be found by a quick hash table lookup. If you use predicates that have to be evaluated by expanding the name (with prefix) as a string and comparing the string, not only does each comparison take longer, it can't be optimized with a hash lookup.

Can Nokogiri retain attribute quoting style?

Here is the contents of my file (note the nested quotes):
<?xml version="1.0" encoding="utf-8"?>
<property name="eventData" value='{"key":"value"}'/>
in Ruby I have:
file = File.read(settings.test_file)
@xml = Nokogiri::XML(file)
puts "@xml " + @xml.to_s
and here is the output:
<property name="eventData" value="{"key":"value"}"/>
Is there a way to convert it so the output would preserve the quotes exactly? i.e. single on the outside, double on the inside?
No, it cannot. There is no information stored in a Nokogiri::XML::Attr (nor the underlying data structure in libxml2) about what type of quotes were (or should be) used to delimit an attribute. As such, all serialization (done by libxml2) uses the same attribute quoting style.
Indeed, this information is not even properly retained within the XML Information Set, as described by the specs:
Appendix D: What is not in the Information Set
The following information is not represented in the current version of the XML Information Set (this list is not intended to be exhaustive):
[...]
17) The kind of quotation marks (single or double) used to quote attribute values.
The good news is that the two XML serialization styles describe exactly the same content. The bad news is that unless you're using a Canonical XML Serialization (which Nokogiri has only recently become able to produce), there are a large variety of ways to represent the same document that would look like many spurious 'changes' to a standard text-diffing tool.
Perhaps if you can describe why you wanted this functionality (what is the end goal you are trying to accomplish?) we could help you further.
You might also be interested in this similar question.
