XSLT optimization: multiple templates or xsl:choose? [closed] - performance

In XSLT, I have a modal template. Applied to certain elements (let's say A|B|C) it should return true. Applied to some other elements (let's say D|E|F) it should return false. For all other elements, it should print an error message.
I could do this with one template, <xsl:template match="*" mode="mymode">, within which an xsl:choose directs to the desired result.
Or I could do this with three templates, <xsl:template match="A|B|C" mode="mymode">, <xsl:template match="D|E|F" mode="mymode">, <xsl:template match="*" mode="mymode">.
Is there any reason to prefer one approach over the other? For example, is it more efficient to use multiple templates and avoid the xsl:choose? Or vice versa?
Is the answer different if there are more than 3 outcomes? If there are only 2 outcomes?

I can only answer for one processor, namely Saxon. It's an area where different processors are likely to be very different.
Saxon will try to optimize both large collections of template rules, and xsl:choose with large numbers of branches, but the optimisations it is capable of performing are different in the two cases. So if you have something with a 200-way choice, and performance matters to you, then it's worth doing some experiments to see. But with a 3-way choice it's very unlikely the difference will be measurable.
Don't even think about using performance as the criterion for choosing one coding style over another unless you have good reason to believe that it will make a difference to your bottom line. This one won't.

I would certainly prefer using xsl:template match="A | B | C" plus xsl:template match="D | E | F" plus xsl:mode on-no-match="fail" to having a single template with a nested xsl:choose/xsl:when approach.
The template matching feels more like the "idiomatic" XSLT approach; I find xsl:choose/xsl:when cumbersome and verbose.
As for performance, I don't think it matters for the simple example; in more complex cases you would need to measure with your particular XSLT processor.
In the past, when discussions/questions came up about whether to use separate templates or a single catch-all template with nested xsl:choose/xsl:when, the majority of the XSLT community in my experience was in favour of the separate-template approach. The main exception, I think, was one developer who argued that with a single template you can ensure your code covers all the cases/nodes you need to handle, whereas in a large code base developed over a long time it is hard to find all the relevant templates that might handle your input node. That has a point, but I would nevertheless prefer a coding style with separate templates, together with a coding guideline that all templates of a certain mode are kept together in your code. In a future XSLT 4 we might be allowed to wrap all templates belonging to a mode in a container, e.g. <xsl:mode on-no-match="fail"><xsl:template match="A | B | C">..</xsl:template><xsl:template match="D | E | F">..</xsl:template></xsl:mode>, to ensure the templates are kept together anyway: https://qt4cg.org/branch/master/xslt-40/Overview.html#declaring-modes
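To make the comparison concrete, here is a minimal, hypothetical sketch of both styles (assuming XSLT 3.0 for xsl:mode/on-no-match; the element names and the mode name come from the question, everything else is illustrative):

<!-- Style 1: separate template rules; unmatched elements are reported as an error -->
<xsl:mode name="mymode" on-no-match="fail"/>

<xsl:template match="A | B | C" mode="mymode">
  <xsl:sequence select="true()"/>
</xsl:template>

<xsl:template match="D | E | F" mode="mymode">
  <xsl:sequence select="false()"/>
</xsl:template>

<!-- Style 2: a single catch-all rule with xsl:choose -->
<xsl:template match="*" mode="mymode">
  <xsl:choose>
    <xsl:when test="self::A or self::B or self::C">
      <xsl:sequence select="true()"/>
    </xsl:when>
    <xsl:when test="self::D or self::E or self::F">
      <xsl:sequence select="false()"/>
    </xsl:when>
    <xsl:otherwise>
      <xsl:message terminate="yes">Unexpected element: <xsl:value-of select="name()"/></xsl:message>
    </xsl:otherwise>
  </xsl:choose>
</xsl:template>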

Related

Techniques for calculating adjective frequency [closed]

I need to calculate word frequencies of a given set of adjectives in a large set of customer support reviews. However I don't want to include those that are negated.
For example suppose my list of adjectives was: [helpful, knowledgeable, friendly]. I want to make sure "friendly" isn't counted in a sentence such as "The representative was not very friendly."
Do I need to do a full NLP parse of the text or is there an easier approach? I don't need super high accuracy.
I'm not at all familiar with NLP. I'm hoping for something that doesn't have such a steep learning curve and isn't so processor intensive.
Thanks
If all you want is adjective frequencies, then the problem is relatively simple, as opposed to some brutal, not-so-good machine learning solution.
Wat do?
Do POS tagging on your text. This annotates your text with part of speech tags, so you'll have 95% accuracy or more on that. You can tag your text using the Stanford Parser online to get a feel for it. The parser actually also gives you the grammatical structure, but you only care about the tagging.
You also want to make sure the sentences are broken up properly. For this you need a sentence breaker. That's included with software like the Stanford parser.
Then just break up the sentences, tag them, and count everything tagged as an adjective (JJ, JJR, JJS in the Penn Treebank tagset). If the tags don't make sense, look up the Penn Treebank tagset (treebanks are used to train NLP tools, and the Penn Treebank tags are the common ones).
How?
Java and Python are the languages of most NLP tools. For Python, use NLTK. It's easy, well documented, and well understood.
For Java, you have GATE, LingPipe and the Stanford Parser, among others. The Stanford Parser is a complete pain in the ass to use; fortunately I've suffered so you do not have to if you choose to go that route. See my Google page for some code examples (at the bottom of the page) with the Stanford Parser.
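To sketch the NLTK route for the original question, here is a rough, hypothetical example (not production code): the adjective list comes from the question, while the negation words and the three-token look-back window are assumptions of mine.

from collections import Counter
import nltk  # requires the 'punkt' and 'averaged_perceptron_tagger' data packages

ADJECTIVES = {"helpful", "knowledgeable", "friendly"}  # from the question
NEGATIONS = {"not", "n't", "never", "no"}              # assumed heuristic list
WINDOW = 3                                             # assumed look-back window

def adjective_counts(text):
    counts = Counter()
    for sentence in nltk.sent_tokenize(text):                # sentence breaking
        tagged = nltk.pos_tag(nltk.word_tokenize(sentence))  # Penn Treebank POS tags
        for i, (word, tag) in enumerate(tagged):
            # JJ/JJR/JJS are the Penn Treebank adjective tags
            if tag.startswith("JJ") and word.lower() in ADJECTIVES:
                prior = {w.lower() for w, _ in tagged[max(0, i - WINDOW):i]}
                if not prior & NEGATIONS:                     # crude negation check
                    counts[word.lower()] += 1
    return counts

print(adjective_counts("The representative was not very friendly. She was helpful."))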
Das all?
Nah, you might want to stem the adjectives too - that's where you get the root form of a word:
cars -> car
I can't actually think of a situation where this is necessary with adjectives, but it might happen. When you look at your output it'll be apparent if you need to do this. A POS tagger/parser/etc will get you your stemmed words (also called lemmas).
More NLP Explanations
See this question.
It depends on the source of your data. If the sentences come from some kind of generator, you can probably split them automatically. Otherwise you will need NLP, yes.
Properly parsing natural language is pretty much an open problem. It works "largely" for English, in particular because English sentences tend to stick to SVO order. German, for example, is quite nasty here, as different word orders convey different emphasis (and thus can convey different meanings, in particular when irony is used). Additionally, German tends to use subordinate clauses much more.
NLP clearly is the way to go. At least some basic parser will be needed. It really depends on your task, too: do you need to make sure every one is correct, or is a probabilistic approach good enough? Can "difficult" cases be discarded or fed to a human for review? etc.

Proactively using 'lines of code' (LOC) metric in your software-development process? [closed]

Codebase size has a lot to do with the complexity of a software system (the larger the codebase, the higher the costs for maintenance and extensions). A simple way to measure codebase size is the 'lines of code' (LOC) metric (see also the blog entry 'implications of codebase-size').
I wondered how many of you out there are using this metric as part of a retrospective to create awareness (for removing unused functionality or dead code). I think creating awareness that more lines of code mean more complexity in maintenance and extension can be valuable.
I am not taking the LOC as a fine-grained metric (at the method or function level), but at the subcomponent or complete-product level.
I find it a bit useless. Some kinds of functions - user input handling, for example - are going to be a bit long-winded no matter what. I'd much rather use some form of complexity metric. Of course, you can combine the two, and/or any other metrics that take your fancy. All you need is a good tool - I use Source Monitor (with which I have no relationship other than being a satisfied user), which is free and can give you both LOC and complexity metrics.
I use SM when writing code to make me notice methods that have got too complex. I then go back and take a look at them. About half the time I say, OK, that NEEDS to be that complicated. What I'd really like is a (free) tool as good as SM but which also supports a tag list of some sort that says "ignore methods X, Y & Z - they need to be complicated". But I guess that could be dangerous, which is why I have so far not suggested the feature to SM's author.
I'm thinking it could be used to reward the team when the LOC decreases (assuming they are still producing valuable software and readable code...).
Not always true. While it is usually preferable to have a low LOC, it doesn't mean the code is any less complex. In fact, it's usually more so. Code that's been optimized down to the minimal number of cycles can be completely unreadable, even by the person who wrote it, a week later.
As an example from a recent project, imagine setting individual color values (RGBA) from a PNG file. You can do this in a bunch of ways, the most compact being just one line using bit shifts. This is a lot less readable and maintainable than another approach, such as using bitfields, which would take a structure definition and many more lines.
It also depends on the tool doing the LOC calculations. Does it count lines with just a single symbol on them as code (e.g. { and } in C-style languages)? Those definitely don't make the code more complex, but they do make it more readable.
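As a rough illustration of how much those counting rules matter, here is a hypothetical little counter (Python) that reports a file's physical line count alongside a LOC figure that skips blank lines, brace-only lines, and full-line comments:

import sys

def loc(path):
    physical = code = 0
    with open(path, encoding="utf-8", errors="replace") as f:
        for line in f:
            physical += 1
            stripped = line.strip()
            # skip blanks, lone braces, and full-line comments (C-style // and Python #)
            if stripped in ("", "{", "}") or stripped.startswith(("//", "#")):
                continue
            code += 1
    return physical, code

if __name__ == "__main__":
    total_physical = total_code = 0
    for path in sys.argv[1:]:
        p, c = loc(path)
        total_physical += p
        total_code += c
    print(f"physical lines: {total_physical}, counted LOC: {total_code}")

The two numbers can differ substantially, which is exactly why the choice of tool (and its settings) matters when comparing LOC across subcomponents.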
Just my two cents.
LOCs are easy to obtain and deliver reasonable information within a non-trivial project. My first step in a new project is always counting LOCs.

If vs Case statements [closed]

Are there any performance differences between using if-else and case statements when handling multiple conditions?
Which is preferred?
Use the one that's most readable in the given context.
In some languages, like C, switch may be faster because it's usually implemented with a jump table. Modern compilers are sometimes smart enough to use one for several ifs as well, though.
And anyway it probably won't matter, micro optimizations are (almost) never useful.
When you have more than one if/else, I recommend switch/select. Then again, it doesn't always work.
Suppose you have something like this (not recommended, for illustration only):
If a > 0 And b > 0 Then
    ' blabla
ElseIf b = -5 Then
    ' blabla2
ElseIf a = -3 And b = 6 Then
End If
Using switch/select there is NOT the way to go. However, when you are testing a single variable against specific values, like this:
Select Case a
    Case 1
        ' blabla
    Case 2
        ' blabla2
    Case 3
        ' blabla3
    Case 4
        ' blabla4
    Case Else
End Select
In those cases, I highly recommend it because it is more readable for other people.
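For comparison, here is the same kind of value dispatch written both ways in Python 3.10+ (a hypothetical sketch; the names and return values are placeholders):

def describe(a):
    # switch/select style: structural pattern matching on a single value
    match a:
        case 1:
            return "blabla"
        case 2:
            return "blabla2"
        case 3:
            return "blabla3"
        case 4:
            return "blabla4"
        case _:
            return "default"

def describe_if(a):
    # the equivalent if/elif chain
    if a == 1:
        return "blabla"
    elif a == 2:
        return "blabla2"
    elif a == 3:
        return "blabla3"
    elif a == 4:
        return "blabla4"
    return "default"

Both do the same thing; the match version simply reads as "one value, many cases", which is the readability argument above.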
Some programming languages restrict when you can use switch/case statements. For example, in many C-like languages the case values must be constant integers known at compile time.
Performance characteristics may differ between the two techniques, but you're unlikely to be able to predict in advance what they are. If performance is really critical for this code in your application, make sure you profile both approaches before deciding, as the answers may surprise you.
Normally, performance differences will be negligible, and you should therefore choose the most readable, understandable and maintainable code, as you should in pretty much any other programming situation.
Case or switch statements are really just special cases of "if .. elseif..." structures that you can use when the same object is being compared to a different simple value in every branch, and that is all that is being done. The nice thing about using them is that most compilers can implement them as jump tables, so effectively an entire 200-way (or however many) branch check can be implemented as a single table-indexing operation and a jump.
For that reason, you'd want to use a case statement when you can and when you have a fairly large number of branches. The larger the number of "elseif"s, the more attractive a case statement is.
Case statements are generally preferred for readability and are generally faster if there is any speed difference, but this does not apply to every possible environment.
You could probably write a test that shows which is faster for your environment, but be careful with caching and compiler optimizations.
I will add to some of the answers here. This is not a performance question, and if you are really concerned about performance... write a test and see which is faster.
However, this should be a question about which is proper to use, not which is better. If you have multiple if/else statements then do yourself a favor and use a case statement. If it is a simple if/else then use an if/else. You'll thank yourself later.

Is MVC at odds with Agile? [closed]

Agile emphasizes quick iterations without wasteful planning.
MVC emphasizes separation of concerns based on a planned architecture.
Since non-MVC technologies require less planning, could they be more appropriate in an Agile project?
Separation of concerns does not necessitate that you plan out every detail before you start coding. And agile does not mean that you just write the code down as it comes to mind. Agile means not being too attached to your initial idea of what the project will look like, being ready to refactor should the need arise (as it usually does), and not being afraid to throw big pieces of code out in the process.
Separation of concerns can very well make refactoring a lot easier, so MVC can be a big helper of agility.
Agile development is typically a process of rapid prototyping and refactoring. MVC's separation of concerns can often make both processes easier and faster.
Design patterns are a fundamental part of quick development. Popular design patterns are popular because they have wide utility. Relying heavily on patterns can make a workable architecture for a project crystallize much more quickly. The common vocabulary afforded by design patterns makes it easier for a team to communicate the structures of a project and focus on the domain-specific issues. Should one pattern turn out to be inconvenient for the progress of the project, the relationship that pattern has with other alternatives is likely well understood, simplifying the task of refactoring to an alternative layout.
That being said, the MVC pattern has tremendous gravity. One of the major reasons it works so well is that it tends to emphasize APIs. This sort of isolation makes it much easier to change certain parts of a system without having a major effect on unrelated parts. If a layer of the system has a defect, it's normally easy to alter that layer without affecting other layers, because they are separated by a well-defined API. If an API is itself deficient, then it is often possible to alter the API exposed without affecting the actual logic of either layer (although this tends to be more difficult than the first kind of deficiency).
When you find the right balance between structure and flexibility, it's worth its weight in gold.
I tend not to like most (current) MVC paradigms, because I believe they introduce pointless abstraction, reinvent the wheel, and add a lot of rigidity.
But I also tend to have highly structured programs that separate content from business logic from data access, and have as few "configurations" as possible in order to accomplish 1 thing. Ideally, to accomplish 1 thing, you should only have to edit 1-2 things.
Needless abstraction is the root of many problems.
The key phrase in agile is 'the simplest thing that could possibly work'.
If the simplest solution to a problem is:
a single script
a single web page
a single installation of a standard tool like a wiki
a single-user single-database 'just edit the data' editor
Then those won't have MVC, and will be the appropriate agile solutions.
If it is obvious from the start of the project that nothing like that is going to come close to solving the problem, it would be pointlessly literal process-following to try them and wait to fail before trying the next simplest solution.

Is Erlang a concise language from a programmer's perspective? [closed]

Where would Erlang fall on the spectrum of conciseness between, say, Java/.NET on the less concise end and Ruby/Python on the more concise end? I have an RSI problem, so conciseness is particularly important to me for health reasons.
Conciseness as a language feature is ill-defined and probably not uniform. Different languages can be more or less concise depending on the problem.
Erlang, as a functional language, can be very concise - beyond Ruby or Python. Specifically, pattern matching often replaces if statements, and recursion and list comprehensions can replace loops.
For example, Java would have something like this:
String foobar(int number) throws Exception {
    if (number == 0) {
        return "foo";
    } else if (number == 1) {
        return "bar";
    }
    throw new Exception();
}
while Erlang code would look like:
foobar(0) -> "foo";
foobar(1) -> "bar".
The exception is inherent, because there is no clause for input other than 0 or 1. This is of course a problem that lends itself well to Erlang-style development.
In general, anything you can define as a transformation will match a functional language particularly well and can be written very concisely. Of course, many functional-language zealots state that any problem in programming is a transformation.
Erlang allows you to realize functionality in very few lines of code, compared to my experiences in Java and Python. Only Smalltalk or Scheme came near for me in the past. You get only a little overhead, but you typically tend to use descriptive identifiers for modules, functions, variables, and atoms. They make the code more readable. And you've got lots of parentheses, curly braces, and square brackets, so it depends on your keyboard layout how comfortable it will be. You should give it a try.
mue
Erlang is surprisingly concise, especially when you want to achieve performance and reliability.
Erlang is concise even when compared to Haskell:
http://thinkerlang.com/2006/01/01/haskell-vs-erlang-reloaded.html
And is surprisingly fast (and reliable) even when compared to C++:
http://www.erlang.se/euc/06/proceedings/1600Nystrom.ppt
(18x fewer SLOC is no surprise).
Anyway, it always depends on your preferences and on the goals you want to achieve.
You have to spend some time writing code to understand Erlang's sweet spot, vs. all the other emerging tools: DHTs, doc stores, mapreduce frameworks, Hadoop, GPUs, Scala, ... If you try to do, say, SIMD-type apps outside the sweet spot, you'll probably end up fighting the paradigm and writing verbose code, whereas if you hit problems that need to scale servers and middleware seamlessly up and down, it flows naturally. (And the rise of Scala in its sweet spot is inevitable, too, I think.)
A good thing to look up would be the Tim Bray Wide Finder experiment (distilling big Apache log files) from a couple of years ago, and how he was disappointed with Erlang.
I generally don't recommend putting much store in the Alioth shootout, given that you inevitably end up comparing really good and really bad code, but if you need to put numbers on LOC, Erlang vs. C, Ruby, whatever:
https://benchmarksgame-team.pages.debian.net/benchmarksgame/faster/erlang.html
