How would I write a syntax checker? - ruby

I am interested in writing a syntax checker for a language. Basically what I want to do is make a cli tool that will take an input file, and then write errors that it finds. The language I would want to parse is basically similar to Turing, and it is rather ugly and sometimes a pain to work with. The only other syntax checker for it must be used
What language should I use? I figured I would write it in Ruby, but Python may be faster or have better parsing libraries.
What libraries should I use, in Ruby or Pearl? Which would be easier.
Is there a primer to read for defining a grammar? Such a task can become confusing, and I'm not sure how I would handle it.

If it were me, I would write it in Ruby, and worry about speed later. If the program is a runaway hit, I might add a native gem to speed up the slowest bit, but leave most of it in Ruby. If it becomes the most important program in the world, or if I had nothing else to do, I might rewrite it in C or C++ at that point, but not before.
And I would do all parsing using Treetop.
I might add that writing and optimizing a language parser directly in C is an interesting learning experience. You get roughly no string handling help, so you end up doing all the parsing, but you have a chance to do only the minimum amount of processing. It's sort of the opposite of the Ruby experience. To get maximum speed you end up doing things like writing frond-ends for malloc, where multiple objects you know you never have to free get allocated permanently within a malloced block. Although it is typical to use yacc(1) with C/C++, you can certainly write a recursive-descent parser and have an even deeper learning experience.
Of course, having done all that already, I'm happy to stick with Ruby these days.

Related

Convert Processing sketch to ruby-processing sketch

Having discovered Processing, and in the middle of trying to learn ruby, I naturally installed ruby-processing for Processing 2.2 (not 3 which requires JRubyArt as the alternative to ruby-processing as I understand).
I am hoping to kill two birds with one stone and create Processing sketches whilst learning more ruby.
However it would be really useful to translate example Processing sketches into ruby so I can then play with them. At the moment I am doing it manually. Does anyone know of a script to do this?
Short answer: no.
This kind of translation is not trivial. Generally, you can't really translate the syntax of one language into the syntax of a different language. You don't go line-by-line and simply change the code one line at a time. That makes what you're asking pretty difficult, so you probably won't find many tools that do stuff like that.
Instead, to translate a program from one language to another, you have to think about the semantics, not the syntax. You have to ask yourself "what does this program do?" and then you simply write a program that does the same thing in the target language. It's not a one-to-one syntax mapping.
In fact, if you're trying to learn, that's a pretty great exercise. The examples that come with Processing are pretty small, so it should be relatively straightforward. Here is what I would do if I were you:
Take an existing example Processing sketch.
Write down in English (not pseudocode) what the program does. Be as specific as possible. Break things down into a list of small steps, and break those steps down into even smaller sub-steps whenever possible. You should be able to hand your list to somebody who has never seen the sketch, and they should be able to tell you in their own words exactly what the sketch does.
Take that list and treat that as your programming assignment, and implement it in your target language. If you get stuck on one of the specific steps, post an MCVE and we'll go from there.
So, translating code involves a middle step of translating it into English first, which is why tools like what you're asking for are much more difficult than you might first think.

Convert Ruby to low level languages?

I have all kind of scripting with Ruby:
rails (symfony)
ruby (php, bash)
rb-appscript (applescript)
Is it possible to replace low level languages with Ruby too?
I write in Ruby and it converts it to java, c++ or c.
Cause People say that when it comes to more performance critical tasks in Ruby, you could extend it with C. But the word extend means that you write C files that you just call in your Ruby code. I wonder, could I instead use Ruby and convert it to C source code which will be compiled to machine code. Then I could "extend" it with C but in Ruby code.
That is what this post is about. Write everything in Ruby but get the performance of C (or Java).
The second advantage is that you don't have to learn other languages.
Just like HipHop for PHP.
Are there implementations for this?
Such a compiler would be an enormous piece of work. Even if it works, it still has to
include the ruby runtime
include the standard library (which wasn't built for performance but for usability)
allow for metaprogramming
do dynamic dispatch
etc.
All of these inflict tremendous runtime penalties, because a C compiler can neither understand nor optimize such abstractions. Ruby and other dynamic languages are not only slower because they are interpreted (or compiled to bytecode which is then interpreted), but also because they are dynamic.
Example
In C++, a method call can be inlined in most cases, because the compiler knows the exact type of this. If a subtype is passed, the method still can't change unless it is virtual, in which case a still very efficient lookup table is used.
In Ruby, classes and methods can change in any way at any time, thus a (relatively expensive) lookup is required every time.
Languages like Ruby, Python or Perl have many features that simply are expensive, and most if not all relevant programs rely heavily on these features (of course, they are extremely useful!), so they cannot be removed or inlined.
Simply put: Dynamic languages are very hard to optimize, simply doing what an interpreter would do and compiling that to machine code doesn't cut it. It's possible to get incredible speed out of dynamic languages, as V8 proves, but you have to throw huge piles of money and offices full of clever programmers at it.
There is https://github.com/seattlerb/ruby_to_c Ruby To C compiler. It actually only takes in a subset of Ruby though. I believe the main missing part is the Meta Programming features
In a recent interview (November 16th, 2012) Yukihiro "Matz" Matsumoto (creator of Ruby) talked about compiling Ruby to C
(...) In University of Tokyo a research student is working on an academic research project that compiles Ruby code to C code before compiling the binary code. The process involves techniques such as type inference, and in optimal scenarios the speed could reach up to 90% of typical hand-written C code. So far there is only a paper published, no open source code yet, but I’m hoping next year everything will be revealed... (from interview)
Just one student is not much, but it might be an interesting project. Probably a long way to go to full support of Ruby.
"Low level" is highly subjective. Many people draw the line differently, so for the sake of this argument, I'm just going to assume you mean compiling Ruby down to an intermediate form which can then be turned into machine code for your particular platform. I.e., compiling ruby to C or LLVM IR, or something of that nature.
The short answer is yes this is possible.
The longer answer goes something like this:
Several languages (Objective-C most notably) exist as a thin layer over other languages. ObjC syntax is really just a loose wrapper around the objc_*() libobjc runtime calls, for all practical purposes.
Knowing this, then what does the compiler do? Well, it basically works as any C compiler would, but also takes the objc-specific stuff, and generates the appropriate C function calls to interact with the objc runtime.
A ruby compiler could be implemented in similar terms.
It should also be noted however, that just by converting one language to a lower level form does not mean that language is instantly going to perform better, though it does not mean it will perform worse either. You really have to ask yourself why you're wanting to do it, and if that is a good reason.
There is also JRuby, if you still consider Java a low level language. Actually, the language itself has little to do here: it is possible to compile to JVM bytecode, which is independent of the language.
Performance doesn't come solely from "low level" compiled languages. Cross-compiling your Ruby program to convoluted, automatically generated C code isn't going to help either. This will likely just confuse things, include long compile times, etc. And there are much better ways.
But you first say "low level languages" and then mention Java. Java is not a low-level language. It's just one step below Ruby in terms of high- or low-level languages. But if you look at how Java works, the JVM, bytecode and just-in-time compilation, you can see how high level languages can be fast(er). Ruby is currently doing something similar. MRI 1.8 was an interpreted language, and had some performance problems. 1.9 is much faster, it's using a bytecode interpreter. I'm not sure if it'll ever happen on MRI, but Ruby is just one step away from JIT on MRI.
I'm not sure about the technologies behind jRuby and IronRuby, but they may already be doing this. However, both have their own advantages and disadvantages. I tend to stick with MRI, it's fast enough and it works just fine.
It is probably feasible to design a compiler that converts Ruby source code to C++. Ruby programs can be compiled to Python using the unholy compiler, so they could be compiled from Python to C++ using the Nuitka compiler.
The unholy compiler was developed more than a decade ago, but it might still work with current versions of Python.
Ruby2Cextension is another compiler that translates a subset of Ruby to C++, though it hasn't been updated since 2008.

Pseudocode interpreter?

Like lots of you guys on SO, I often write in several languages. And when it comes to planning stuff, (or even answering some SO questions), I actually think and write in some unspecified hybrid language. Although I used to be taught to do this using flow diagrams or UML-like diagrams, in retrospect, I find "my" pseudocode language has components of C, Python, Java, bash, Matlab, perl, Basic. I seem to unconsciously select the idiom best suited to expressing the concept/algorithm.
Common idioms might include Java-like braces for scope, pythonic list comprehensions or indentation, C++like inheritance, C#-style lambdas, matlab-like slices and matrix operations.
I noticed that it's actually quite easy for people to recognise exactly what I'm triying to do, and quite easy for people to intelligently translate into other languages. Of course, that step involves considering the corner cases, and the moments where each language behaves idiosyncratically.
But in reality, most of these languages share a subset of keywords and library functions which generally behave identically - maths functions, type names, while/for/if etc. Clearly I'd have to exclude many 'odd' languages like lisp, APL derivatives, but...
So my questions are,
Does code already exist that recognises the programming language of a text file? (Surely this must be a less complicated task than eclipse's syntax trees or than google translate's language guessing feature, right?) In fact, does the SO syntax highlighter do anything like this?
Is it theoretically possible to create a single interpreter or compiler that recognises what language idiom you're using at any moment and (maybe "intelligently") executes or translates to a runnable form. And flags the corner cases where my syntax is ambiguous with regards to behaviour. Immediate difficulties I see include: knowing when to switch between indentation-dependent and brace-dependent modes, recognising funny operators (like *pointer vs *kwargs) and knowing when to use list vs array-like representations.
Is there any language or interpreter in existence, that can manage this kind of flexible interpreting?
Have I missed an obvious obstacle to this being possible?
edit
Thanks all for your answers and ideas. I am planning to write a constraint-based heuristic translator that could, potentially, "solve" code for the intended meaning and translate into real python code. It will notice keywords from many common languages, and will use syntactic clues to disambiguate the human's intentions - like spacing, brackets, optional helper words like let or then, context of how variables are previously used etc, plus knowledge of common conventions (like capital names, i for iteration, and some simplistic limited understanding of naming of variables/methods e.g containing the word get, asynchronous, count, last, previous, my etc). In real pseudocode, variable naming is as informative as the operations themselves!
Using these clues it will create assumptions as to the implementation of each operation (like 0/1 based indexing, when should exceptions be caught or ignored, what variables ought to be const/global/local, where to start and end execution, and what bits should be in separate threads, notice when numerical units match / need converting). Each assumption will have a given certainty - and the program will list the assumptions on each statement, as it coaxes what you write into something executable!
For each assumption, you can 'clarify' your code if you don't like the initial interpretation. The libraries issue is very interesting. My translator, like some IDE's, will read all definitions available from all modules, use some statistics about which classes/methods are used most frequently and in what contexts, and just guess! (adding a note to the program to say why it guessed as such...) I guess it should attempt to execute everything, and warn you about what it doesn't like. It should allow anything, but let you know what the several alternative interpretations are, if you're being ambiguous.
It will certainly be some time before it can manage such unusual examples like #Albin Sunnanbo's ImportantCustomer example. But I'll let you know how I get on!
I think that is quite useless for everything but toy examples and strict mathematical algorithms. For everything else the language is not just the language. There are lots of standard libraries and whole environments around the languages. I think I write almost as many lines of library calls as I write "actual code".
In C# you have .NET Framework, in C++ you have STL, in Java you have some Java libraries, etc.
The difference between those libraries are too big to be just syntactic nuances.
<subjective>
There has been attempts at unifying language constructs of different languages to a "unified syntax". That was called 4GL language and never really took of.
</subjective>
As a side note I have seen a code example about a page long that was valid as c#, Java and Java script code. That can serve as an example of where it is impossible to determine the actual language used.
Edit:
Besides, the whole purpose of pseudocode is that it does not need to compile in any way. The reason you write pseudocode is to create a "sketch", however sloppy you like.
foreach c in ImportantCustomers{== OrderValue >=$1M}
SendMailInviteToSpecialEvent(c)
Now tell me what language it is and write an interpreter for that.
To detect what programming language is used: Detecting programming language from a snippet
I think it should be possible. The approach in 1. could be leveraged to do this, I think. I would try to do it iteratively: detect the syntax used in the first line/clause of code, "compile" it to intermediate form based on that detection, along with any important syntax (e.g. begin/end wrappers). Then the next line/clause etc. Basically write a parser that attempts to recognize each "chunk". Ambiguity could be flagged by the same algorithm.
I doubt that this has been done ... seems like the cognitive load of learning to write e.g. python-compatible pseudocode would be much easier than trying to debug the cases where your interpreter fails.
a. I think the biggest problem is that most pseudocode is invalid in any language. For example, I might completely skip object initialization in a block of pseudocode because for a human reader it is almost always straightforward to infer. But for your case it might be completely invalid in the language syntax of choice, and it might be impossible to automatically determine e.g. the class of the object (it might not even exist). Etc.
b. I think the best you can hope for is an interpreter that "works" (subject to 4a) for your pseudocode only, no-one else's.
Note that I don't think that 4a,4b are necessarily obstacles to it being possible. I just think it won't be useful for any practical purpose.
Recognizing what language a program is in is really not that big a deal. Recognizing the language of a snippet is more difficult, and recognizing snippets that aren't clearly delimited (what do you do if four lines are Python and the next one is C or Java?) is going to be really difficult.
Assuming you got the lines assigned to the right language, doing any sort of compilation would require specialized compilers for all languages that would cooperate. This is a tremendous job in itself.
Moreover, when you write pseudo-code you aren't worrying about the syntax. (If you are, you're doing it wrong.) You'll wind up with code that simply can't be compiled because it's incomplete or even contradictory.
And, assuming you overcame all these obstacles, how certain would you be that the pseudo-code was being interpreted the way you were thinking?
What you would have would be a new computer language, that you would have to write correct programs in. It would be a sprawling and ambiguous language, very difficult to work with properly. It would require great care in its use. It would be almost exactly what you don't want in pseudo-code. The value of pseudo-code is that you can quickly sketch out your algorithms, without worrying about the details. That would be completely lost.
If you want an easy-to-write language, learn one. Python is a good choice. Use pseudo-code for sketching out how processing is supposed to occur, not as a compilable language.
An interesting approach would be a "type-as-you-go" pseudocode interpreter. That is, you would set the language to be used up front, and then it would attempt to convert the pseudo code to real code, in real time, as you typed. An interactive facility could be used to clarify ambiguous stuff and allow corrections. Part of the mechanism could be a library of code which the converter tried to match. Over time, it could learn and adapt its translation based on the habits of a particular user.
People who program all the time will probably prefer to just use the language in most cases. However, I could see the above being a great boon to learners, "non-programmer programmers" such as scientists, and for use in brainstorming sessions with programmers of various languages and skill levels.
-Neil
Programs interpreting human input need to be given the option of saying "I don't know." The language PL/I is a famous example of a system designed to find a reasonable interpretation of anything resembling a computer program that could cause havoc when it guessed wrong: see http://horningtales.blogspot.com/2006/10/my-first-pli-program.html
Note that in the later language C++, when it resolves possible ambiguities it limits the scope of the type coercions it tries, and that it will flag an error if there is not a unique best interpretation.
I have a feeling that the answer to 2. is NO. All I need to prove it false is a code snippet that can be interpreted in more than one way by a competent programmer.
Does code already exist that
recognises the programming language
of a text file?
Yes, the Unix file command.
(Surely this must be a less
complicated task than eclipse's syntax
trees or than google translate's
language guessing feature, right?) In
fact, does the SO syntax highlighter
do anything like this?
As far as I can tell, SO has a one-size-fits-all syntax highlighter that tries to combine the keywords and comment syntax of every major language. Sometimes it gets it wrong:
def median(seq):
"""Returns the median of a list."""
seq_sorted = sorted(seq)
if len(seq) & 1:
# For an odd-length list, return the middle item
return seq_sorted[len(seq) // 2]
else:
# For an even-length list, return the mean of the 2 middle items
return (seq_sorted[len(seq) // 2 - 1] + seq_sorted[len(seq) // 2]) / 2
Note that SO's highlighter assumes that // starts a C++-style comment, but in Python it's the integer division operator.
This is going to be a major problem if you try to combine multiple languages into one. What do you do if the same token has different meanings in different languages? Similar situations are:
Is ^ exponentiation like in BASIC, or bitwise XOR like in C?
Is || logical OR like in C, or string concatenation like in SQL?
What is 1 + "2"? Is the number converted to a string (giving "12"), or is the string converted to a number (giving 3)?
Is there any language or interpreter
in existence, that can manage this
kind of flexible interpreting?
On another forum, I heard a story of a compiler (IIRC, for FORTRAN) that would compile any program regardless of syntax errors. If you had the line
= Y + Z
The compiler would recognize that a variable was missing and automatically convert the statement to X = Y + Z, regardless of whether you had an X in your program or not.
This programmer had a convention of starting comment blocks with a line of hyphens, like this:
C ----------------------------------------
But one day, they forgot the leading C, and the compiler choked trying to add dozens of variables between what it thought was subtraction operators.
"Flexible parsing" is not always a good thing.
To create a "pseudocode interpreter," it might be necessary to design a programming language that allows user-defined extensions to its syntax. There already are several programming languages with this feature, such as Coq, Seed7, Agda, and Lever. A particularly interesting example is the Inform programming language, since its syntax is essentially "structured English."
The Coq programming language allows "syntax extensions", so the language can be extended to parse new operators:
Notation "A /\ B" := (and A B).
Similarly, the Seed7 programming language can be extended to parse "pseudocode" using "structured syntax definitions." The while loop in Seed7 is defined in this way:
syntax expr: .while.().do.().end.while is -> 25;
Alternatively, it might be possible to "train" a statistical machine translation system to translate pseudocode into a real programming language, though this would require a large corpus of parallel texts.

How to write an interpreter?

I have decided to write a small interpreter as my next project, in Ruby. What knowledge/skills will I need to have to be successful?
I haven't decided on the language to interpret yet, but I am looking for something that is not a toy language, but would be relatively easy to write an interpreter for.
Thanks in advance.
You will have to learn at least:
lexical analysis (grouping characters into tokens)
parsing (grouping tokens together into structure)
abstract syntax trees (representing program structure in a data structure)
data representation (assuming your language will have variables)
an evaluation loop that "runs" your program
An excellent introduction to some of these topics can be found in the introductory text Structure and Interpretation of Computer Programs. The language used in that book is Scheme, which is a robust, well-specified language that is ideally suited for your first interpreter implementation. Highly recommended.
I haven't decided on the language to interpret yet, but I am looking for
something that is not a toy language, but would be relatively easy to write an
interpreter for. Thanks in advance.
Try some dialect of Lisp like Scheme or Clojure. (Now there's an idea: Clojure-in-Ruby, which integrates with Ruby as well as Clojure does with Java.)
With Lisp, there is no need to bother with idiosyncracies of syntax, as Lisp's syntax is much closer to the abstract syntax tree.
This SICP chapter shows how to write a Lisp interpreter in Lisp (a metacircular evaluator). In my opinion this is the best place to start. Then you can move on to Lisp in Small Pieces to learn how to write advanced interpreters and compilers for Lisp. The advantage of implementing a language like Lisp (in Lisp itself!) is that you get the lexical analyzer, parser, AST, data/program representation and REPL for free. You can concentrate on the task of getting your great language working!
There is Tree top project wich can be helpful for you http://treetop.rubyforge.org/
You can checkout Ruby Draft Specification http://ruby-std.netlab.jp/
I had a similar idea a couple of days ago. LISP is by far the easiest to implement because the syntax is so simple, and the data structures that the language manipulates are the same structures that the code is written in. Hence you need only a minimal implementation, and can define the rest in terms of itself.
However, if you are trying to learn about parsing, you may want to do a more complex language with Abstract Syntax Trees, etc.
If you want to check out my (literally two days old) Java implementation of lisp, check out mylisp.googlecode.com. I'm still working on it but it is incredible how short a time it took to get the existing stuff working.
It's not sooo hard. here's a LISP interpreter in ruby and the source is so small you are supposed to copy/paste it. but are you gonna learn LISP now? hehe.
If you're just doing this for fun, make up your own, simple language and just try it. My recommendation would be something like a really simple classic BASIC (no visual basic or object oriented stuff). With line numbers, GOTO, INPUT and PRINT and that's it. You get to do the basics, and you get a better understanding of how things work.
The knowledge you'll need?
Tokenizing (turning that huge chunk of characters into something more efficiently readable, effectively splitting it up into 'words')
Parsing (going over the tokens and building a data structure from it)
Interpreting (looping over the data structure and executing each command)
And for that last one you'll also need a way to keep around variables. Usually you'd just implement a "stack", one huge block of data where you can mark off an area at the end.
It's not implemented in Lisp, but I found Write Yourself A Scheme in 48 Hours to be a very useful document while I was starting out with Haskell (though I didn't get anywhere near finishing it after 48 hours; YMMV). It also gives you a lot of insight into interpreters in general.
I can recommend this book. It discusses patterns for writing parsers and interpreters and more:
http://www.amazon.co.uk/s/ref=nb_sb_noss?url=search-alias%3Daps&field-keywords=language+implementation+patterns&x=0&y=0

Static/strong typing and refactoring

It seems to me that the most invaluable thing about a static/strongly-typed programming language is that it helps refactoring: if/when you change any API, then the compiler will tell you what that change has broken.
I can imagine writing code in a runtime/weakly-typed language ... but I can't imagine refactoring without the compiler's help, and I can't imagine writing tens of thousands of lines of code without refactoring.
Is this true?
I think you're conflating when types are checked with how they're checked. Runtime typing isn't necessarily weak.
The main advantage of static types is exactly what you say: they're exhaustive. You can be confident all call sites conform to the type just by letting the compiler do it's thing.
The main limitation of static types is that they're limited in the constraints they can express. This varies by language, with most languages having relatively simple type systems (c, java), and others with extremely powerful type systems (haskell, cayenne).
Because of this limitation types on their own are not sufficient. For example, in java types are more or less restricted to checking type names match. This means the meaning of any constraint you want checked has to be encoded into a naming scheme of some sort, hence the plethora of indirections and boiler plate common to java code. C++ is a little better in that templates allow a bit more expressiveness, but don't come close to what you can do with dependent types. I'm not sure what the downsides to the more powerful type systems are, though clearly there must be some or more people would be using them in industry.
Even if you're using static typing, chances are it's not expressive enough to check everything you care about, so you'll need to write tests too. Whether static typing saves you more effort than it requires in boilerplate is a debate that's raged for ages and that I don't think has a simple answer for all situations.
As to your second question:
How can we re-factor safely in a runtime typed language?
The answer is tests. Your tests have to cover all the cases that matter. Tools can help you in gauging how exhaustive your tests are. Coverage checking tools let you know wether lines of code are covered by the tests or not. Test mutation tools (jester, heckle) can let you know if your tests are logically incomplete. Acceptance tests let you know what you've written matches requirements, and lastly regression and performance tests ensure that each new version of the product maintains the quality of the last.
One of the great things about having proper testing in place vs relying on elaborate type indirections is that debugging becomes much simpler. When running the tests you get specific failed assertions within tests that clearly express what they're doing, rather than obtuse compiler error statements (think c++ template errors).
No matter what tools you use: writing code you're confident in will require effort. It most likely will require writing a lot of tests. If the penalty for bugs is very high, such as aerospace or medical control software, you may need to use formal mathematical methods to prove the behavior of your software, which makes such development extremely expensive.
I totally agree with your sentiment. The very flexibility that dynamically typed languages are supposed to be good at is actually what makes the code very hard to maintain. Really, is there such a thing as a program that continues to work if the data types are changed in a non trivial way without actually changing the code?
In the mean time, you could check the type of variable being passed, and somehow fail if its not the expected type. You'd still have to run your code to root out those cases, but at least something would tell you.
I think Google's internal tools actually do a compilation and probably type checking to their Javascript. I wish I had those tools.
To start, I'm a native Perl programmer so on the one hand I've never programmed with the net of static types. OTOH I've never programmed with them so I can't speak to their benefits. What I can speak to is what its like to refactor.
I don't find the lack of static types to be a problem wrt refactoring. What I find a problem is the lack of a refactoring browser. Dynamic languages have the problem that you don't really know what the code is really going to do until you actually run it. Perl has this more than most. Perl has the additional problem of having a very complicated, almost unparsable, syntax. Result: no refactoring tools (though they're working very rapidly on that). The end result is I have to refactor by hand. And that is what introduces bugs.
I have tests to catch them... usually. I do find myself often in front of a steaming pile of untested and nigh untestable code with the chicken/egg problem of having to refactor the code in order to test it, but having to test it in order to refactor it. Ick. At this point I have to write some very dumb, high level "does the program output the same thing it did before" sort of tests just to make sure I didn't break something.
Static types, as envisioned in Java or C++ or C#, really only solve a small class of programming problems. They guarantee your interfaces are passed bits of data with the right label. But just because you get a Collection doesn't mean that Collection contains the data you think it does. Because you get an integer doesn't mean you got the right integer. Your method takes a User object, but is that User logged in?
Classic example: public static double sqrt(double a) is the signature for the Java square root function. Square root doesn't work on negative numbers. Where does it say that in the signature? It doesn't. Even worse, where does it say what that function even does? The signature only says what types it takes and what it returns. It says nothing about what happens in between and that's where the interesting code lives. Some people have tried to capture the full API by using design by contract, which can broadly be described as embedding run-time tests of your function's inputs, outputs and side effects (or lack thereof)... but that's another show.
An API is far more than just function signatures (if it wasn't, you wouldn't need all that descriptive prose in the Javadocs) and refactoring is far more even than just changing the API.
The biggest refactoring advantage a statically typed, statically compiled, non-dynamic language gives you is the ability to write refactoring tools to do quite complex refactorings for you because it knows where all the calls to your methods are. I'm pretty envious of IntelliJ IDEA.
I would say refactoring goes beyond what the compiler can check, even in statically-typed languages. Refactoring is just changing a programs internal structure without affecting the external behavior. Even in dynamic languages, there are still things that you can expect to happen and test for, you just lose a little bit of assistance from the compiler.
One of the benefits of using var in C# 3.0 is that you can often change the type without breaking any code. The type needs to still look the same - properties with the same names must exist, methods with the same or similar signature must still exist. But you can really change to a very different type, even without using something like ReSharper.

Resources