What does the period do in this line of Fortran? - syntax

I am not sure what version of Fortran this is, but the line is:
Term = F*F - 4.*E*G
I know that it multiplies F by F and then subtracts something, but I don't know what the period after the 4 is doing there.

I'm going to venture a guess based on every other programming language I've ever seen, and say that it's making the constant "4" of type Real, rather than Integer. In other words, it's making sure the types in the expression all match up. "4.0" would be equivalent; whoever wrote this code was just feeling extra concise that day.
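The same trailing-period literal exists in Python, so the effect can be seen there. This is only an analogy (Python is not Fortran, and the function name `term` is made up for illustration), but the idea is the same: `4.` is a floating-point constant, `4` is an integer.

```python
# Python analogy, not Fortran: a trailing period makes the literal floating-point.
def term(f, e, g):
    # Mirrors Term = F*F - 4.*E*G; the 4. forces floating-point arithmetic.
    return f * f - 4. * e * g

print(type(4).__name__)    # integer literal
print(type(4.).__name__)   # floating-point literal, same value as 4.0
print(term(2, 1, 1))       # integer arguments, floating-point result
```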

It makes it a real number instead of an integer.

If you're new to Fortran, a "REAL" number is what is called in C-like languages a "float".
But only Fortran programmers can say the GOD is REAL, by default.


Chicken Scheme.- How to convert a complex number (for ex: (sqrt 2) ) to an integer? Regardless of rounding strategy

I am working on a C extension for Chicken Scheme and have everything in place but I am running into an issue with complex number types.
My code can only handle integers, and when any math is done that involves, say, a square root, my extension may end up having to handle a complex number.
I just need to remove the decimal place and get whatever integer is close by. I am not worried about accuracy for this.
I have looked around and through the code but did not find anything.
Thanks!
Well, you can inspect the number type from the header tag. A complex number is a block object which has 2 slots; the real and imaginary part. Then, those numbers themselves can be ratnums, flonums, fixnums or bignums. You'll need to handle those situations as well if you want to do it all in C.
It's probably a lot easier to declare your C code as accepting an integer and do any conversion necessary in Scheme.
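The "convert before calling into C" idea can be sketched in Python rather than Scheme; this is only an illustration of the dispatch on number type, not Chicken's API. The function name `to_nearby_int` is hypothetical, and truncation is assumed as the rounding strategy since accuracy does not matter here.

```python
import math
from fractions import Fraction

def to_nearby_int(value):
    """Collapse an int, ratio, float or complex value to some nearby integer."""
    if isinstance(value, complex):
        value = value.real          # keep only the real slot
    # math.trunc handles int, float and Fraction alike (rounds toward zero).
    return math.trunc(value)

print(to_nearby_int(2 ** 0.5))
print(to_nearby_int(complex(3.7, 2.0)))
print(to_nearby_int(Fraction(7, 2)))
```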

How to recognize variables that don't affect the output of a program?

Sometimes the value of a variable accessed within the control flow of a program cannot possibly have any effect on its output. For example:
global var_1
global var_2
start program hello(var_3, var_4)
    if (var_2 < 0) then
        save-log-to-disk (var_1, var_3, var_4)
    end-if
    return ("Hello " + var_3 + ", my name is " + var_1)
end program
Here only var_1 and var_3 have any influence on the output, while var_2 and var_4 are only used for side effects.
Do variables such as var_1 and var_3 have a name in dataflow-theory/compiler-theory?
Which static dataflow analysis techniques can be used to discover them?
References to academic literature on the subject would be particularly appreciated.
The problem that you stated is undecidable in general,
even for the following very narrow special case:
Given a single routine P(x), where x is a parameter of type integer. Is the output of P(x) independent of the value of x, i.e., does
P(0) = P(1) = P(2) = ...?
We can reduce the following still undecidable version of the halting problem to the question above: Given a Turing machine M(), does the program
never stop on the empty input?
I assume that we use a (Turing-complete) language in which we can build a "Turing machine simulator":
Given the program M(), construct this routine:
P(x):
    if x == 0:
        return 0
    Run M() for x steps
    if M() has terminated then:
        return 1
    else:
        return 0
Now:
P(0) = P(1) = P(2) = ...
=>
M() does not terminate.
M() does terminate
=> P(x) = 1 for a sufficiently large x
=> P(x) != P(0) = 0
So, it is very difficult for a compiler to decide whether a variable actually does not influence the return value of a routine; in your example, the "side effect routine" might manipulate one of its values (or even loop infinitely, which would most definitely change the return value of the routine ;-)
Of course overapproximations are still possible. For example, one might conclude that a variable does not influence the return value if it does not appear in the routine body at all. You can also see some classical compiler analyses (like Expression Simplification, Constant propagation) having the side effect of eliminating appearances of such redundant variables.
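The construction of P(x) above can be sketched in Python. Here a generator function stands in for the machine M(), with each `yield` counting as one step; that encoding and the names `make_p`, `halts_after_three` and `loops_forever` are assumptions made for illustration only.

```python
def make_p(machine):
    """Build P(x) from a zero-argument generator function standing in for M()."""
    def p(x):
        if x == 0:
            return 0
        run = machine()
        for _ in range(x):          # run M() for at most x steps
            try:
                next(run)
            except StopIteration:
                return 1            # M() terminated within x steps
        return 0                    # M() still running after x steps
    return p

def halts_after_three():
    for _ in range(3):
        yield

def loops_forever():
    while True:
        yield

p_halting = make_p(halts_after_three)
p_looping = make_p(loops_forever)
print([p_halting(x) for x in range(6)])
print([p_looping(x) for x in range(6)])
```

P is constant exactly when the machine never halts, so deciding "does x influence the output of P" would decide halting.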
Pachelbel has discussed the fact that you cannot do this perfectly. OK, I'm an engineer, I'm willing to accept some dirt in my answer.
The classic way to answer your question is to do dataflow tracing from program outputs back to program inputs. A dataflow is the connection of a program assignment (or side effect) to a variable value, to a place in the application that consumes that value.
If there is (transitive) dataflow from a program output that you care about (in your example, the printed text stream) to an input you supplied (var_2), then that input "affects" the output. A variable that does not flow from the input to your desired output is useless from your point of view.
If you focus your attention only on the computations involved in the dataflows, and display them, you get what is generally called a "program slice". There are (very few) commercial tools that can show this to you.
Grammatech has a good reputation here for C and C++.
There are standard compiler algorithms for constructing such dataflow graphs; see any competent compiler book.
They all suffer from some limitation due to Turing's impossibility proofs as pointed out by Pachelbel. When you implement such a dataflow algorithm, there will be places that it cannot know the right answer; simply pick one.
If your algorithm chooses to answer "there is no dataflow" in certain places where it is not sure, then it may miss a valid dataflow and might incorrectly report that a variable does not affect the answer. (This is called a "false negative".) This occasional error may be satisfactory if the algorithm has some other nice properties, e.g., it runs really fast on millions of lines of code. (The trivial algorithm simply says "no dataflow" in all places, and it is really fast :)
If your algorithm chooses to answer "yes there is a dataflow", then it may claim that some variable affects the answer when it does not. (This is called a "false positive").
You get to decide which is more important; many people prefer false positives when looking for a problem, because then you have to at least look at possibilities detected by the tool. A false negative means it didn't report something you might care about. YMMV.
Here's a starting reference: http://en.wikipedia.org/wiki/Data-flow_analysis
Any of the books on that page will be pretty good. I have Muchnick's book and like it a lot. See also this page: (http://en.wikipedia.org/wiki/Program_slicing)
You will discover that implementing this is a pretty big effort for any real language. You are probably better off finding a tool framework that does most or all of this for you already.
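As a rough illustration of tracing from outputs back to inputs, here is a minimal backward slice over straight-line code. The program encoding (a list of `(defined_variable, used_variables)` pairs) and the name `relevant_inputs` are assumptions for this sketch; real tools work on control-flow graphs and must handle branches, loops and aliasing.

```python
def relevant_inputs(statements, outputs):
    """Return the variables whose initial values can flow into `outputs`.

    statements: list of (defined_var, iterable_of_used_vars) in program order.
    outputs: the variables the caller cares about at the end of the program.
    """
    live = set(outputs)
    for target, uses in reversed(statements):
        if target in live:
            live.discard(target)    # this definition supplies the value...
            live |= set(uses)       # ...so its operands become relevant
    return live

# Loosely modelled on the question's example: the log call only has side effects.
program = [
    ("log", ["var_1", "var_2", "var_3", "var_4"]),
    ("result", ["var_3", "var_1"]),
]
print(sorted(relevant_inputs(program, ["result"])))
```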
I use the following algorithm: a variable is used if it is a parameter or it occurs anywhere in an expression, excluding as the LHS of an assignment. First, count the number of uses of all variables. Delete unused variables and assignments to unused variables. Repeat until no variables are deleted.
This algorithm only implements a subset of the OP's requirement, and it is horribly inefficient because it requires multiple passes. A garbage collection may be faster but is harder to write: my algorithm only requires a list of variables with usage counts. Each pass is linear in the size of the program. The algorithm effectively does a limited kind of dataflow analysis by elimination of the tail of a flow ending in an assignment.
For my language the elimination of side effects in the RHS of an assignment to an unused variable is mandated by the language specification; it may not be suitable for other languages. Effectiveness is improved by running it before inlining, to reduce the cost of inlining unused function applications, then running it again afterwards, which eliminates parameters of inlined functions.
Just as an example of the utility of the language specification, the library constructs a thread pool and assigns a pointer to it to a global variable. If the thread pool is not used, the assignment is deleted, and hence the construction of the thread pool elided.
IMHO compiler optimisations are almost invariably heuristics whose performance matters more than their effectiveness at achieving a theoretical goal (like removing unused variables). Simple reductions are useful not only because they're fast and easy to write, but because a programmer who understands the basics of the compiler's operation can leverage this knowledge to help the compiler. The best-known example of this is probably the refactoring of recursive functions to place the recursion in tail position: a pointless exercise unless the programmer knows the compiler can do tail-recursion optimisation.
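The repeated-pass elimination described in this answer might look roughly like the following. The program representation (a list of `(lhs, rhs_variables)` assignments plus the variables read by the final result) and the name `eliminate_unused` are assumptions for the sketch, and a set of used names stands in for the usage counts.

```python
def eliminate_unused(assignments, params, results):
    """Drop assignments to variables that are never read, until nothing changes."""
    assignments = list(assignments)
    while True:
        # A variable is used if it is a parameter, is read by the result,
        # or occurs on the right-hand side of a remaining assignment.
        used = set(params) | set(results)
        for _, rhs in assignments:
            used |= set(rhs)
        kept = [(lhs, rhs) for lhs, rhs in assignments if lhs in used]
        if len(kept) == len(assignments):
            return kept             # fixed point: no variable was deleted
        assignments = kept

# a = p; b = a; c = b; return a  -> c goes in pass one, b in pass two.
chain = [("a", ["p"]), ("b", ["a"]), ("c", ["b"])]
print(eliminate_unused(chain, params=["p"], results=["a"]))
```

The chain example shows why several passes are needed: `b` only becomes unused once the assignment to `c` has been removed.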

most readable way in XPath to write "is value X a member of sequence S"?

XPath 2.0 has some new functions and syntax, relative to 1.0, that work with sequences. Some of these don't really add to what the language could already do in 1.0 (with node sets), but they make it easier to express the desired logic in ways that are more readable. This increases the chances of the programmer getting the code correct -- and keeping it that way. For example,
empty(s) is equivalent to not(s), but its intent is much clearer when you want to test whether a sequence is empty.
Correction: the effective boolean value of a sequence is in general more complicated than that. E.g. empty((0)) != not((0)). This applies to exists(s) vs. s in a boolean context as well. However, there are domains of s where empty(s) is equivalent to not(s), so the two could be used interchangeably within those domains. But this goes to show that the use of empty() can make a non-trivial difference in making code easier to understand.
Similarly, exists(s) is equivalent to boolean(s) that already existed in XPath 1.0 (or just s in a boolean context), but again is much clearer about the intent.
Quantified expressions; e.g. "some $x in expression satisfies test($x)" would be equivalent to boolean(expression[test(.)]) (although the new syntax is more flexible, in that you don't need to worry about losing the context item because you have the variable to refer to it by).
Similarly, "every $x in expression satisfies test($x)" would be equivalent to not(expression[not(test(.))]) but is more readable.
These functions and syntax were evidently added at no small cost, solely to serve the goal of writing XPath that is easier to map to how humans think. This implies, as experienced developers know, that understandable code is significantly superior to code that is difficult to understand.
Given all that ... what would be a clear and readable way to write an XPath test expression that asks
Does value X occur in sequence S?
Some ways to do it: (Note: I used X and S notation here to indicate the value and the sequence, but I don't mean to imply that these subexpressions are element name tests, nor that they are simple expressions. They could be complicated.)
X = S: This would be one of the most unreadable, since it requires the reader to
think about which of X and S are sequences vs. single values
understand general comparisons, which are not obvious from the syntax
However, one advantage of this form is that it allows us to put the topic (X) before the comment ("is a member of S"), which, I think, helps in readability.
See also CMS's good point about readability, when the syntax or names make the "cardinality" of X and S obvious.
index-of(S, X): This one is clear about what's intended as a value and what as a sequence (if you remember the order of arguments to index-of()). But it expresses more than we need to: it asks for the index, when all we really want to know is whether X occurs in S. This is somewhat misleading to the reader. An experienced developer will figure out what's intended, with some effort and with understanding of the context. But the more we rely on context to understand the intent of each line, the more understanding the code becomes a circular (spiral) and potentially Sisyphean task! Also, since index-of() is designed to return a list of all the indexes of occurrences of X, it could be more expensive than necessary: a smart processor, in order to evaluate X = S, wouldn't necessarily have to find all the contents of S, nor enumerate them in order; but for index-of(S, X), correct order would have to be determined, and all contents of S must be compared to X. One other drawback of using index-of() is that it's limited to using eq for comparison; you can't, for example, use it to ask whether a node is identical to any node in a given sequence.
Correction: This form, used as a conditional test, can result in a runtime error: Effective boolean value is not defined for a sequence of two or more items starting with a numeric value. (But at least we won't get wrong boolean values, since index-of() can't return a zero.) If S can have multiple instances of X, this is another good reason to prefer form 3 or 6.
exists(index-of(S, X)): makes the intent clearer, and would help the processor eliminate the performance penalty if the processor is smart enough.
some $m in S satisfies $m eq X: This one is very clear, and matches our intent exactly. It seems long-winded compared to 1, and that in itself can reduce readability. But maybe that's an acceptable price for clarity. Keep in mind that X and S could potentially be complex expressions themselves -- they're not necessarily just variable references. An advantage is that since the eq operator is explicit, you can replace it with is or any other comparison operator.
S[. eq X]: clearer than 1, but shares the semantic drawbacks of 2: it computes all members of S that are equal to X. Actually, this could return a false negative (incorrect effective boolean value), if X is falsy. E.g. (0, 1)[. eq 0] returns 0 which is falsy, even though 0 occurs in (0, 1).
exists(S[. eq X]): Clearer than 1, 2, 3, and 5. Not as clear as 4, but shorter. Avoids the drawbacks of 5 (or at least most of them, depending on the processor smarts).
I'm kind of leaning toward the last one, at this point: exists(S[. eq X])
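To make the difference between forms 5 and 6 concrete, here is a simplified Python model of XPath 2.0's effective boolean value, restricted to sequences of atomic values (no nodes). The function names are hypothetical, Python lists stand in for sequences, and `==` stands in for `eq`.

```python
import math

def effective_boolean_value(seq):
    """Simplified fn:boolean for a list of booleans, strings or numbers."""
    if len(seq) == 0:
        return False
    if len(seq) > 1:
        raise TypeError("EBV is not defined for two or more atomic items")
    item = seq[0]
    if isinstance(item, bool):
        return item
    if isinstance(item, str):
        return len(item) > 0
    if isinstance(item, (int, float)):
        is_nan = isinstance(item, float) and math.isnan(item)
        return not (item == 0 or is_nan)
    raise TypeError("EBV is not defined for this item type")

def form_5(s, x):
    """S[. eq X] used directly as a test."""
    return effective_boolean_value([m for m in s if m == x])

def form_6(s, x):
    """exists(S[. eq X])"""
    return len([m for m in s if m == x]) > 0

print(form_5([0, 1], 0), form_6([0, 1], 0))
```

In this model form 5 gives the wrong answer for a falsy X and raises an error when X occurs more than once in S, while form 6 does neither.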
What about you... As a developer coming to a complex, unfamiliar XSLT or XQuery or other program that uses XPath 2.0, and wanting to figure out what that program is doing, which would you find easiest to read?
Apologies for the long question. Thanks for reading this far.
Edit: I changed = to eq wherever possible in the above discussion, to make it easier to see where a "value comparison" (as opposed to a general comparison) was intended.
For what it's worth, if names or context make clear that X is a singleton, I'm happy to use your first form, X = S -- for example when I want to check an attribute value against a set of possible values:
<xsl:when test="@type = ('A', 'A+', 'A-', 'B+')" />
or
<xsl:when test="@type = $magic-types"/>
If I think there is a risk of confusion, then I like your sixth formulation. The less frequently I have to remember the rules for calculating an effective boolean value, the less frequently I make a mistake with them.
I prefer this one:
count(distinct-values($seq)) eq count(distinct-values(($x, $seq)))
When $x is itself a sequence, this expression implements the (value-based) subset-of relation between two sets of values that are represented as sequences. This implementation of subset-of has just linear time complexity, whereas many other ways of expressing it have O(N^2) time complexity.
To summarize, the question whether a single value belongs to a set of values is a special case of the question whether one set of values is a subset of another. If we have a good implementation of the latter, we can simply use it for answering the former.
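The counting trick carries over directly to Python sets, which may make it easier to see why it works: adding values that are already present does not change the number of distinct values. The name `is_subset` is made up for this sketch.

```python
def is_subset(xs, seq):
    """True if every value in xs also occurs in seq (value-based comparison)."""
    # Mirrors count(distinct-values($seq)) eq count(distinct-values(($x, $seq)))
    return len(set(seq)) == len(set(list(xs) + list(seq)))

print(is_subset(["bob"], ["alice", "bob"]))
print(is_subset(["carol"], ["alice", "bob"]))
```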
The functx library has a nice implementation of this function, so you can use
functx:is-node-in-sequence($X, $Y)
(this particular function can be found at http://www.xqueryfunctions.com/xq/functx_is-node-in-sequence.html)
The whole functx library is available for both XQuery (http://www.xqueryfunctions.com/) and XSLT (http://www.xsltfunctions.com/)
Marklogic ships the functx library with their core product; other vendors may also.
Another possibility, when you want to know whether node X occurs in sequence S, is
exists((X) intersect S)
I think that's pretty readable, and concise. But it only works when X and the values in S are nodes; if you try to ask
exists(('bob') intersect ('alice', 'bob'))
you'll get a runtime error.
In the program I'm working on now, I need to compare strings, so this isn't an option.
As Dimitri notes, the occurrence of a node in a sequence is a question of identity, not of value comparison.

Writing a program that writes a program

It's well known in theoretical computer science that the "Hello world tester" problem is undecidable. (Here is a link to what I mean by "Hello world tester".)
My question is: since, given a program as input, we can't say what the program will do, can we solve the reverse problem?
Given a set of inputs and outputs, is there an algorithm for writing a program that writes a program to achieve a one-to-one mapping between the given inputs and outputs?
I know about metaprogramming but my question is more of theoretical interest. Something which can apply for a generic case.
With this kind of musing one has to be very careful. A lot of confusion arises from not clearly distinguishing between a program x for which proposition P(x) holds and any program x for which proposition P(x) holds. As long as the set of programs for which P(x) holds is finite, there always is a proof of their correctness (although this proof may not be known).
At this point you also have to distinguish between programs which are and can be known, and programs which can only be shown to exist by full enumeration of all possibilities. Let's make an example:
Take 10 programs, which take no input and may or may not terminate and produce "Hello World". Then there is a program which decides exactly which of these programs are correct and which aren't. Let's call these programs (x_1,...,x_10). Then take the programs (X_0,...,X_{2^10 - 1}), where X_i outputs true for program x_j if the j-th bit in the binary representation of i is set. One of these programs has to be the one which decides correctly for all ten x_j; there just might not be any way to ever figure out which one of these 1024 X_i is the correct one (a meta-problem at this point).
This goes to show that when considering finite sets of programs and input/output pairs, one can always resort to full enumeration, and all halting-problem types of paradox instantly disappear. In your case the set of generated programs for each input is of size one and the set of input/output pairs is of finite size (because you have to supply it to the meta-program). Hence full enumeration solves your problem very simply, and you can also easily prove both the correctness of the generated program and the correctness of the meta-program.
Note: Since the set of generated programs is infinite, this is one of the few cases where you can prove P(x) for an infinite set of programs (actually you can prove P(x,input,output) for this set). This shows that the set being infinite is only a necessary, not a sufficient, condition for this type of paradox to appear.
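The enumeration argument for the ten programs can be sketched as follows. Each candidate decider is just a bit pattern; the ground truth used below is made up for illustration, since the whole point of the argument is that in general we may not be able to compute it.

```python
def make_decider(i):
    """Decider number i answers True for program j iff bit j of i is set."""
    return lambda j: bool((i >> j) & 1)

deciders = [make_decider(i) for i in range(2 ** 10)]

def matching_deciders(truth):
    """Indices of deciders that agree with `truth` on all ten programs."""
    return [i for i, decide in enumerate(deciders)
            if all(decide(j) == truth[j] for j in range(10))]

# Hypothetical ground truth: programs 0, 2, 4, 6, 8 are the correct ones.
truth = [True, False] * 5
print(len(deciders), matching_deciders(truth))
```

Whatever the real truth values are, exactly one of the 1024 candidates matches them; the enumeration proves a correct decider exists without telling us which one it is.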
Your question is ambiguously phrased.
How would you specify "what a program should do"?
Any precise, complete, and machine-readable specification of a program's functionality is already a program.
Thus, the answer to your question is, a compiler.
Now, you're asking how to find a function based on a sample of its input and output.
That is a question about statistical analysis that I cannot answer.
Sounds like you want to generate a state machine that learns by being given an input sequence and then updates itself to produce the appropriate output sequence. Assuming your output sequences are always the same for the same input sequence it should be simple enough to write. If your output is not deterministic, such as changing the output depending on the time of day, then you cannot automatically generate a state machine.
Depends on what you mean by "one to one mapping". (And also, I suppose, "input" and "output".)
My guess is that you're asking whether, given an example of inputs and outputs for a given arbitrary program, can one devise an algorithm to write an equivalent program? If so, the answer is no. Eg, you could have a program with the inputs/outputs of 1/1, 2/2, 3/3, 4/4, and yet, if the next input value was 5, the output would be 3782. There's no way to know, from a given set of results, what the next result might be.
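One way to see this concretely: the two functions below agree on the samples 1 to 4 and then diverge. The names `identity` and `surprise` are made up for the sketch, and the coefficient is chosen only so that the fifth value matches the 3782 in the example above.

```python
from fractions import Fraction

def identity(n):
    return n

def surprise(n):
    # The product vanishes at n = 1, 2, 3, 4, so the samples cannot reveal it.
    bump = (n - 1) * (n - 2) * (n - 3) * (n - 4)
    return n + Fraction(3777, 24) * bump

samples = [1, 2, 3, 4]
print(all(identity(n) == surprise(n) for n in samples))
print(identity(5), surprise(5))
```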
The question is underspecified since you did not say how the input and output are presented. For finite lists, the answer is "yes", as in this Python code:
def f(inputs, outputs):
    # Emit the source of a program g() that maps each input to its output.
    print("def g():")
    print("    x = {" + ",".join(repr(x) + ":" + repr(y) for x, y in zip(inputs, outputs)) + "}")
    print("    print(x[input()])")
>>> f(['2','3','4'],['a','b','x'])
def g():
    x = {'2':'a','3':'b','4':'x'}
    print(x[input()])
>>> def g():
...     x = {'2':'a','3':'b','4':'x'}
...     print(x[input()])
...
>>> g()
3
b
For infinite sets, how are you going to present them? If you show only a small sample of the input, this does not specify the whole algorithm. Guessing the best fit is undecidable. If you have a "magic black box", then there are continuum many mappings but only a countable number of programs, so it's impossible.
I think I agree with SLaks, but taking things from a different angle, what does a compiler do?
(EDIT: I see SLaks edited his original answer, which was essentially 'you're describing the identity function').
It takes a program in one source language that describes the intended behaviour of a program, and "writes" another program in a target language that exhibits that behaviour.
We could also think of this in terms of things like process refinement --- given an abstract specification, we can construct a refinement mapping to some "more concrete" (read: less non-deterministic, usually) implementation.
But based on your question, it's really very difficult to tell which of these you meant, if any.

How many lines of machine code are generated by one statement in programming language X?

Reading an article about Lost Programming Skills, the author brings up this chat:
Me: How much horsepower do you need?
SE: I don't know.
Me: Let's see, how many lines of code in your main loop?
SE: 10,000.
Me: what language?
SE: Fortran
Me: ok, that's about 10 lines of machine code per line of Fortran, so
100,000 instructions per loop; how many times does the loop execute per
second?
SE: every 1/20th of a second.
Me: OK, so that's 20 x 100,000 = 2mops (which was faster than anything we had
at the time), maybe we'd better rethink this.
Which makes me wonder, what is the number for modern languages, say Ruby? How does one find out?
I don't think there is an exact number, as in "for language X the compiled binary has Y instructions per source code line". But if you still want to find out, you could take a large amount of compiled code and the corresponding source code and compute the average per source line.
You can open the binary in a tool that shows the machine code, e.g. OllyDbg, to see how many instructions it contains.
In terms of determining how long a piece of code will take to execute, that doesn't even really work for Fortran any more! If you write this in Fortran 90:
SUBROUTINE foo(x, y)
  IMPLICIT NONE
  REAL, DIMENSION(:), INTENT(IN) :: x
  REAL, DIMENSION(:), INTENT(OUT) :: y
  y = EXP(x)
END SUBROUTINE foo
the line that says y = EXP(x) can take arbitrarily long to execute, depending on the size of the arrays x and y. The same goes for any language with vector assignment.
In the chat they were trying to estimate the CPU performance required.
If you know the CPU performance and the execution time of the loop, you can get the number of CPU instructions per loop and then per line.
The calculation in your chat is not precise.
You can do a similarly imprecise calculation even for Ruby.
Be aware that it is wrong to say that one Fortran line is always 10 CPU instructions, but as an average for a certain loop it was true.
Estimate the time taken by your loop in Ruby.
Multiply your CPU performance (in operations per second) by the loop time. You will get operations per loop.
Divide operations per loop by the number of lines in the loop. That is the value for your loop.
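The arithmetic in these steps is small enough to write down. This sketch plugs in the numbers from the chat (2 million operations per second, one loop every 1/20th of a second, 10,000 lines); the function name is made up, and the result is only as good as the rough inputs.

```python
def instructions_per_line(ops_per_second, loop_seconds, lines_in_loop):
    """Back-of-envelope estimate of instructions executed per source line."""
    ops_per_loop = ops_per_second * loop_seconds
    return ops_per_loop / lines_in_loop

# Numbers from the chat: roughly 10 instructions per line of Fortran.
print(instructions_per_line(2_000_000, 1 / 20, 10_000))
```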
For X = "C#" you might want to take a look at Faster Managed Code: Know What Things Cost from Microsoft. It says that (some) modern languages are heavily optimized before actually touching the hardware.

Resources