Related
When I was skimming some prolog related questions recently, I stumbled upon this answer by #mat to question How to represent directed cyclic graph in Prolog with direct access to neighbour verticies .
So far, my personal experience with attributed variables in Prolog has been very limited. But the use-case given by #mat sparked my interest. So I tried using it for answering another question, ordering lists with constraint logic programming.
First, the good news: My first use of attributed variables worked out like I wanted it to.
Then, the not so good news: When I had posted by answer, I realized there were several API's and implementations for attributed variables in Prolog.
I feel I'm over my head here... In particular I want to know the following:
What API's are in wide-spread use? Up to now, I found two: SICStus and SWI.
Which features do the different attributed variable implementations offer? The same ones? Or does one subsume the other?
Are there differences in semantics?
What about the actual implementation? Are some more efficient than others?
Can be (or is) using attributed variables a portability issue?
Lots of question marks, here... Please share your experience / stance?
Thank you in advance!
Edit 2015-04-22
Here's a code snippet of the answer mentioned above:
init_att_var(X,Z) :-
put_attr(Z,value,X).
get_att_value(Var,Value) :-
get_attr(Var,value,Value).
So far I "only" use put_attr/3 and get_attr/3, but---according to the SICStus Prolog documentation on attributed variables---SICStus offers put_attr/2 and get_attr/2.
So even this very shallow use-case requires some emulation layer (one way or the other).
I would like to focus on one important general point I noticed when working with different interfaces for attributes variables: When designing an interface for attributed variables, an implementor should also keep in mind the following:
Is it possible to take attributes into account when reasoning about simultaneous unifications, as in [X,Y] = [0,1]?
This is possible for example in SICStus Prolog, because such bindings are undone before verify_attributes/3 is called. In the interface provided by hProlog (attr_unify_hook/2, called after the unification and with all bindings already in place) it is hard to take into account the (previous) attributes of Y when reasoning about the unification of X in attr_unify_hook/2, because Y is no longer a variable at this point! This may be sufficient for solvers that can make decisions based on ground values alone, but it is a serious limitation for solvers that need additional data, typically stored in attributes, to see whether a unification should succeed, and which are then no longer easily available. One obvious example: Boolean unification with decision diagrams.
As of 2016, the verify-attributes branch of SWI-Prolog also supports verify_attributes/3, thanks to great implementation work by Douglas Miles. The branch is ready for testing and intended to be merged into master as soon as it works correctly and efficiently. For compatibility with hProlog, the branch also supports attr_unify_hook/2: It does so by rewriting such definitions to the more general verify_attributes/3 at compilation time.
Performance-wise, it is clear that there may be a downside to verify_attributes/3, because making several variables ground at the same time may let you sooner see (in attr_unify_hook/2) that a unification cannot succeed. However, I will gladly and any time exchange this typically negligible advantage for the improved reliability, ease of use, and increased functionality that the more general interface gives you, and which is in any case already the standard behaviour in SICStus Prolog which is on top of its generality also one of the faster Prolog systems around.
SICStus Prolog also features an important predicate called project_attributes/2: It is used by the toplevel to project constraints to query variables. SWI-Prolog also supports this in recent versions.
There is also one huge advantage of the SWI interface: The residual goals that attribute_goals//1 and hence copy_term/3 give you are always a list. This helps users to avoid defaultyness in their code, and encourages a more declarative interface, because a list of pure constraint goals cannot contain control structures.
Interestingly, neither interface lets you interpret unifications other than syntactically. Personally, I think there are cases where you may want to interpret unifications differently than syntactically, however, there may also be good arguments against that.
The other interface predicates for attributed variables are mostly easily interchangable with simple wrapper predicates for different systems.
Jekejeke Minlog has state-less or thin attribute variables. Well not exactly, an attribute variable can have zero, one or many hooks, which are allowed to be closures, and hence can carry a little state.
But typically an implementation manages the state elsewere. For this
purpose Jekejeke Minlog allows creating reference types from variables,
so that they can be used as indexes into tables.
The full potential is unleashed if this combined with trailing and/or
forward chaining. As an example we have implemented CLP(FD). There is also a little solver tutorial.
The primitive ingredients in our case are:
1) State-less Attribute Variables
2) Trailing and Variable Keys
3) Continuation Queue
The attribute variables hooks might have binding effects upto extending the continuation queue but are only executed once. Goals from the continuation queue can be non-deterministic.
There are some additional layers before realizing applications, that are mostly aggregations of the primitives to make changes temporarily.
The main applications so far are open source here and here:
a) Finite Domain Constraint Solver
b) Herbrand Constraints
c) Goal Suspension
Bye
An additional perspective on attributed variable libraries is how many attributes can be defined per module. In the case of SWI-Prolog/YAP and citing SWI documentation:
Each attribute is associated to a module, and the hook
(attr_unify_hook/2) is executed in this module.
This is a severe limitation for implementers of libraries such as CLP(FD) as it forces using additional modules for the sole purpose of having multiple attributes instead of being able to define as many attributes as required in the module implementing their library. This limitation doesn't exist on the SICStus Prolog interface, which provides a directive attribute/1 that allows the declaration of an arbitrary number of attributes per module.
You can find one of the oldest and most elaborate implementations of attributed variables in ECLiPSe, where it forms part of the wider infrastructure for implementing constraint solvers.
The main characteristics of this design are:
attributes must be declared, and in return the compiler supports efficient access
a syntax for attributed variables, so that they can be read and written
a more complete set of handlers for attribute operations, so that attributes are not only taken into account for unification, but also for other generic operations such as term copying and subsumption tests
a clear separation between the concepts of variable attribute and suspended goals
used in over a dozen of ECLiPSe's libraries
This paper (section 4) and the ECLiPSe documentation have more details.
I'm learning scheme and until now have been using guile. I'm really just learning as a way to teach myself a functional programming language, but I'd like to publish an open source project of some sort to reenforce the study— not sure what yet... I'm a web developer, so probably something webby.
It's becoming apparent that publishing scheme code isn't very easy to do, with all these different implementations and no real standards beyond the core of the language itself (R5RS). For example, I'm almost certainly going to need to do basic IO on disk and over a TCP socket, along with string manipulation, such as scanning/regex, which seems not to be covered by R5RS, unless I'm not seeing it in the document. It seems like Scheme is more of a "concept" than a practical language... is this a fair assessment? Perhaps I should look to something like Haskell if I want to learn a functional programming language that lends itself more to use in open source projects?
In reality, how much pain do the differing scheme implementations pose when you want to publish an open source project? I don't really fancy having to maintain 5 different functions for basic things like string manipulation under various mainstream implementations (Chicken, guile, MIT, DrRacket). How many people actually write scheme for cross-implementation compatibility, as opposed to being tightly coupled with the library functions that only exist in their own scheme?
I have read http://www.ccs.neu.edu/home/dorai/scmxlate/scheme-boston/talk.html, which doesn't fill me with confidence ;)
EDIT | Let's re-define "standard" as "common".
I believe that in Scheme, portability is a fool's errand, since Scheme implementations are more different than they are similar, and there is no single implementation that other implementations try to emulate (unlike Python and Ruby, for example).
Thus, portability in Scheme is analogous to using software rendering for writing games "because it's in the common subset between OpenGL and DirectX". In other words, it's a lowest common denominator—it can be done, but you lose access to many features that the implementation offers.
For this reason, while SRFIs generally have a portable reference implementation (where practical), some of them are accompanied by notes that a quality Scheme implementation should tailor the library to use implementation-specific features in order to function optimally.
A prime example is case-lambda (SRFI 16); it can be implemented portably, and the reference implementation demonstrates it, but it's definitely less optimal compared to a built-in case-lambda, since you're having to implement function dispatch in "user" code.
Another example is stream-constant from SRFI 41. The reference implementation uses an O(n) simulation of circular lists for portability, but any decent implementation should adapt that function to use real circular lists so that it's O(1).†
The list goes on. Many useful things in Scheme are not portable—SRFIs help make more features portable, but there's no way that SRFIs can cover everything. If you want to get useful work done efficiently, chances are pretty good you will have to use non-portable features. The best you can do, I think, is to write a façade to encapsulate those features that aren't already covered by SRFIs.
† There is actually now a way to implement stream-constant in an O(1) fashion without using circular lists at all. Portable and fast for the win!
Difficult question.
Most people decide to be pragmatic. If portability between implementations is important, they write the bulk of the program in standard Scheme and isolate non-standard parts in (smallish) libraries. There have been various approaches of how exactly to do this. One recent effort is SnowFort.
http://snow.iro.umontreal.ca/
An older effort is SLIB.
http://people.csail.mit.edu/jaffer/SLIB
If you look - or ask for - libraries for regular expressions and lexer/parsers you'll quickly find some.
Since the philosophy of R5RS is to include only those language features that all implementors agree on, the standard is small - but also very stable.
However for "real world" programming R5RS might not be the best fit.
Therefore R6RS (and R7RS?) include more "real world" libraries.
That said if you only need portability because it seems to be the Right Thing, then reconsider carefully if you really want to put the effort in.
I would simply write my program on the implementation I know the best. Then if necessary port it afterwards. This often turns out to be easier than expected.
I write a blog that uses Scheme as its implementation language. Because I don't want to alienate users of any particular implementation of Scheme, I write in a restricted dialect of Scheme that is based on R5RS plus syntax-case macros plus my Standard Prelude. I don't find that overly restrictive for the kind of algorithmic programs that I write, but your needs may be different. If you look at the various exercises on the blog, you will see that I wrote my own regular-expression matcher, that I've done a fair amount of string manipulation, and that I've snatched files from the internet by shelling out to wget (I use Chez Scheme -- users have to provide their own non-portable shell mechanism if they use anything else); I've even done some limited graphics work by writing ANSI terminal sequences.
I'll disagree just a little bit with Jens. Instead of porting afterwards, I find it easier to build in portability from the beginning. I didn't use to think that way, but my experience over the last three years shows that it works.
It's worth pointing out that modern Scheme implementations are themselves fairly portable; you can often port whole programs to new environments simply by bringing the appropriate Scheme along. That doesn't help library programmers much, though, and that's where R7RS-small, the latest Scheme definition, comes in. It's not widely implemented yet, but it provides a larger common core than R5RS.
Like lots of you guys on SO, I often write in several languages. And when it comes to planning stuff, (or even answering some SO questions), I actually think and write in some unspecified hybrid language. Although I used to be taught to do this using flow diagrams or UML-like diagrams, in retrospect, I find "my" pseudocode language has components of C, Python, Java, bash, Matlab, perl, Basic. I seem to unconsciously select the idiom best suited to expressing the concept/algorithm.
Common idioms might include Java-like braces for scope, pythonic list comprehensions or indentation, C++like inheritance, C#-style lambdas, matlab-like slices and matrix operations.
I noticed that it's actually quite easy for people to recognise exactly what I'm triying to do, and quite easy for people to intelligently translate into other languages. Of course, that step involves considering the corner cases, and the moments where each language behaves idiosyncratically.
But in reality, most of these languages share a subset of keywords and library functions which generally behave identically - maths functions, type names, while/for/if etc. Clearly I'd have to exclude many 'odd' languages like lisp, APL derivatives, but...
So my questions are,
Does code already exist that recognises the programming language of a text file? (Surely this must be a less complicated task than eclipse's syntax trees or than google translate's language guessing feature, right?) In fact, does the SO syntax highlighter do anything like this?
Is it theoretically possible to create a single interpreter or compiler that recognises what language idiom you're using at any moment and (maybe "intelligently") executes or translates to a runnable form. And flags the corner cases where my syntax is ambiguous with regards to behaviour. Immediate difficulties I see include: knowing when to switch between indentation-dependent and brace-dependent modes, recognising funny operators (like *pointer vs *kwargs) and knowing when to use list vs array-like representations.
Is there any language or interpreter in existence, that can manage this kind of flexible interpreting?
Have I missed an obvious obstacle to this being possible?
edit
Thanks all for your answers and ideas. I am planning to write a constraint-based heuristic translator that could, potentially, "solve" code for the intended meaning and translate into real python code. It will notice keywords from many common languages, and will use syntactic clues to disambiguate the human's intentions - like spacing, brackets, optional helper words like let or then, context of how variables are previously used etc, plus knowledge of common conventions (like capital names, i for iteration, and some simplistic limited understanding of naming of variables/methods e.g containing the word get, asynchronous, count, last, previous, my etc). In real pseudocode, variable naming is as informative as the operations themselves!
Using these clues it will create assumptions as to the implementation of each operation (like 0/1 based indexing, when should exceptions be caught or ignored, what variables ought to be const/global/local, where to start and end execution, and what bits should be in separate threads, notice when numerical units match / need converting). Each assumption will have a given certainty - and the program will list the assumptions on each statement, as it coaxes what you write into something executable!
For each assumption, you can 'clarify' your code if you don't like the initial interpretation. The libraries issue is very interesting. My translator, like some IDE's, will read all definitions available from all modules, use some statistics about which classes/methods are used most frequently and in what contexts, and just guess! (adding a note to the program to say why it guessed as such...) I guess it should attempt to execute everything, and warn you about what it doesn't like. It should allow anything, but let you know what the several alternative interpretations are, if you're being ambiguous.
It will certainly be some time before it can manage such unusual examples like #Albin Sunnanbo's ImportantCustomer example. But I'll let you know how I get on!
I think that is quite useless for everything but toy examples and strict mathematical algorithms. For everything else the language is not just the language. There are lots of standard libraries and whole environments around the languages. I think I write almost as many lines of library calls as I write "actual code".
In C# you have .NET Framework, in C++ you have STL, in Java you have some Java libraries, etc.
The difference between those libraries are too big to be just syntactic nuances.
<subjective>
There has been attempts at unifying language constructs of different languages to a "unified syntax". That was called 4GL language and never really took of.
</subjective>
As a side note I have seen a code example about a page long that was valid as c#, Java and Java script code. That can serve as an example of where it is impossible to determine the actual language used.
Edit:
Besides, the whole purpose of pseudocode is that it does not need to compile in any way. The reason you write pseudocode is to create a "sketch", however sloppy you like.
foreach c in ImportantCustomers{== OrderValue >=$1M}
SendMailInviteToSpecialEvent(c)
Now tell me what language it is and write an interpreter for that.
To detect what programming language is used: Detecting programming language from a snippet
I think it should be possible. The approach in 1. could be leveraged to do this, I think. I would try to do it iteratively: detect the syntax used in the first line/clause of code, "compile" it to intermediate form based on that detection, along with any important syntax (e.g. begin/end wrappers). Then the next line/clause etc. Basically write a parser that attempts to recognize each "chunk". Ambiguity could be flagged by the same algorithm.
I doubt that this has been done ... seems like the cognitive load of learning to write e.g. python-compatible pseudocode would be much easier than trying to debug the cases where your interpreter fails.
a. I think the biggest problem is that most pseudocode is invalid in any language. For example, I might completely skip object initialization in a block of pseudocode because for a human reader it is almost always straightforward to infer. But for your case it might be completely invalid in the language syntax of choice, and it might be impossible to automatically determine e.g. the class of the object (it might not even exist). Etc.
b. I think the best you can hope for is an interpreter that "works" (subject to 4a) for your pseudocode only, no-one else's.
Note that I don't think that 4a,4b are necessarily obstacles to it being possible. I just think it won't be useful for any practical purpose.
Recognizing what language a program is in is really not that big a deal. Recognizing the language of a snippet is more difficult, and recognizing snippets that aren't clearly delimited (what do you do if four lines are Python and the next one is C or Java?) is going to be really difficult.
Assuming you got the lines assigned to the right language, doing any sort of compilation would require specialized compilers for all languages that would cooperate. This is a tremendous job in itself.
Moreover, when you write pseudo-code you aren't worrying about the syntax. (If you are, you're doing it wrong.) You'll wind up with code that simply can't be compiled because it's incomplete or even contradictory.
And, assuming you overcame all these obstacles, how certain would you be that the pseudo-code was being interpreted the way you were thinking?
What you would have would be a new computer language, that you would have to write correct programs in. It would be a sprawling and ambiguous language, very difficult to work with properly. It would require great care in its use. It would be almost exactly what you don't want in pseudo-code. The value of pseudo-code is that you can quickly sketch out your algorithms, without worrying about the details. That would be completely lost.
If you want an easy-to-write language, learn one. Python is a good choice. Use pseudo-code for sketching out how processing is supposed to occur, not as a compilable language.
An interesting approach would be a "type-as-you-go" pseudocode interpreter. That is, you would set the language to be used up front, and then it would attempt to convert the pseudo code to real code, in real time, as you typed. An interactive facility could be used to clarify ambiguous stuff and allow corrections. Part of the mechanism could be a library of code which the converter tried to match. Over time, it could learn and adapt its translation based on the habits of a particular user.
People who program all the time will probably prefer to just use the language in most cases. However, I could see the above being a great boon to learners, "non-programmer programmers" such as scientists, and for use in brainstorming sessions with programmers of various languages and skill levels.
-Neil
Programs interpreting human input need to be given the option of saying "I don't know." The language PL/I is a famous example of a system designed to find a reasonable interpretation of anything resembling a computer program that could cause havoc when it guessed wrong: see http://horningtales.blogspot.com/2006/10/my-first-pli-program.html
Note that in the later language C++, when it resolves possible ambiguities it limits the scope of the type coercions it tries, and that it will flag an error if there is not a unique best interpretation.
I have a feeling that the answer to 2. is NO. All I need to prove it false is a code snippet that can be interpreted in more than one way by a competent programmer.
Does code already exist that
recognises the programming language
of a text file?
Yes, the Unix file command.
(Surely this must be a less
complicated task than eclipse's syntax
trees or than google translate's
language guessing feature, right?) In
fact, does the SO syntax highlighter
do anything like this?
As far as I can tell, SO has a one-size-fits-all syntax highlighter that tries to combine the keywords and comment syntax of every major language. Sometimes it gets it wrong:
def median(seq):
"""Returns the median of a list."""
seq_sorted = sorted(seq)
if len(seq) & 1:
# For an odd-length list, return the middle item
return seq_sorted[len(seq) // 2]
else:
# For an even-length list, return the mean of the 2 middle items
return (seq_sorted[len(seq) // 2 - 1] + seq_sorted[len(seq) // 2]) / 2
Note that SO's highlighter assumes that // starts a C++-style comment, but in Python it's the integer division operator.
This is going to be a major problem if you try to combine multiple languages into one. What do you do if the same token has different meanings in different languages? Similar situations are:
Is ^ exponentiation like in BASIC, or bitwise XOR like in C?
Is || logical OR like in C, or string concatenation like in SQL?
What is 1 + "2"? Is the number converted to a string (giving "12"), or is the string converted to a number (giving 3)?
Is there any language or interpreter
in existence, that can manage this
kind of flexible interpreting?
On another forum, I heard a story of a compiler (IIRC, for FORTRAN) that would compile any program regardless of syntax errors. If you had the line
= Y + Z
The compiler would recognize that a variable was missing and automatically convert the statement to X = Y + Z, regardless of whether you had an X in your program or not.
This programmer had a convention of starting comment blocks with a line of hyphens, like this:
C ----------------------------------------
But one day, they forgot the leading C, and the compiler choked trying to add dozens of variables between what it thought was subtraction operators.
"Flexible parsing" is not always a good thing.
To create a "pseudocode interpreter," it might be necessary to design a programming language that allows user-defined extensions to its syntax. There already are several programming languages with this feature, such as Coq, Seed7, Agda, and Lever. A particularly interesting example is the Inform programming language, since its syntax is essentially "structured English."
The Coq programming language allows "syntax extensions", so the language can be extended to parse new operators:
Notation "A /\ B" := (and A B).
Similarly, the Seed7 programming language can be extended to parse "pseudocode" using "structured syntax definitions." The while loop in Seed7 is defined in this way:
syntax expr: .while.().do.().end.while is -> 25;
Alternatively, it might be possible to "train" a statistical machine translation system to translate pseudocode into a real programming language, though this would require a large corpus of parallel texts.
What do you lose in practice when you choose a statically-typed language such as Scala (or F#, Haskell, C#) instead of dynamically-typed ones like Ruby, Python, Clojure, Groovy (which have macros or runtime metaprogramming capabilities)? Please consider best statically-typed languages and best (in your opinion) dynamically-typed languages, not the worst ones.
Answers Summary:
Key advantages of dynamic languages like Ruby over statically-typed language like Scala IMHO are:
Quick edit-run cycle (does JavaRebel reduces the gap?)
Currently community of Scala/Lift is much smaller then of Ruby/Rails or Python/Django
Possible to modify type definitions (though motivation or need for that is not very clear)
In principle, you give up being able to ignore what type you're using when it is not clear (in the static context) what the right thing to do is, and that's about it.
Since complex type-checking can be rather time-consuming, you also probably are forced to give up fast on-line metaprogramming.
In practice, with Scala, you give up very little else--and nothing that I particularly care about. You can't inject new methods, but you can compile and run new code. You do have to specify types in function arguments (and the return type with recursive functions), which is slightly annoying if you never make type errors yourself. Since it compiles each command, the Scala REPL isn't as snappy as e.g. the Python shell. And since it uses Java reflection mechanisms, you don't have quite the ease of online inspection that you do with e.g. Python (not without building your own inspection library, anyway).
The choice of which static or dynamic language is more significant than the static/dynamic choice itself. Some dynamic languages have good performance and good tools. Some static languages can be concise, expressive, and incremental. Some languages have few of these qualities, but do have large libraries of proven code.
Dynamic languages tend to have much more flexible type systems. For example, Python lets you inject a new method into an existing classes, or even into a single object.
Many (not all) static languages lack the facility to construct complex literals. For instance, languages like C# and Java cannot easily mimic the following JavaScript { 'request':{'type':'GET', 'path':mypath}, 'oncomplete':function(response) { alert(response.result) } }.
Dynamic languages have very fluid semantics. Python allows import statements, function definitions and class definitions to appear inside functions and if statements.
eval is a staple of most dynamic languages and few static languages.
Higher order programming is easier (in my subjective opinion) in dynamic languages than static languages, due to the awkwardness of having to fully specify the types of function parameters.
This is particulary so with recursive HOP constructs where the type system can really get in the way.
Dynamic language users don't have to deal with covariance and contravariance.
Generic programming comes practically free in dynamic languages.
I'm not sure if you lose anything but simplicity. Static type systems are an additional burden to learn.
I suppose you usually also lose eval, but I never use it, even in dynamic languages.
I find the issue is much more about everything else when it comes to choosing which language to use for a given task. Tooling, culture, libraries are all much more interesting than typing when it comes to solving a problem with a language.
Programming language research, on the other hand, is completely different. :)
Some criticism of Scala has been expressed by Steve Yegge here and here, and by Guido van Rossum, who mainly attacked Scala's type system complexity. They clearly aren't "Scala programmers" though. On the other hand, here's some praise from James Strachan.
My 2 cents...
IMO (strong) statically-typed languages might reduce the amount of necessary testing code, because some of that work will be done by the compiler. On the other hand, if the compiling step is relatively long, it makes it more difficult to do "incremental-style" programming, which in the real life might result in error-prone code that was only tested to pass the compiler.
On the other hand, dynamically-typed languages feel like there is less threshold to change things, that might reduce the responding time from the point of bug-fixing and improvement, and as a result might provide a smoother curve during application development: handling constant flow of small changes is easier/less risky than handling changes which are coming in bug chunks.
For example, for the project where the design is very unclear and is supposed to change often, it might have been easier to use dynamic language than a static one, if it helps reduce interdependencies between different parts. (I don't insist on that one though:) )
I think Scala sits somewhere in between (e.g. you don't have to explicitly specify types of the variables, which might ease up code maintenance in comparison with e.g. C++, but if you end up with the wrong assumption about types, the compiler will remind about it, unlike in PHP where you can write whatever and if you don't have good tests covering the functionality, you are doomed to find it out when everything is live and bleeding). Might be terribly wrong of course :)
In my opinion, the difference between the static and dynamic typing comes down to the style of coding. Although there is structural types in Scala, most of the time the programmer is thinking in terms of the type of the object including cool gadgets like trait. On the other hand, I think Python/Javascript/Ruby programmers think in terms of prototype of the object (list of methods and properties), which is slightly different from types.
For example, suppose there's a family of classes called Vehicle whose subclasses include Plane, Train, and Automobile; and another family of classes called Animal whose subclasses include Cat, Dog, and Horse. A Scala programmer would probably create a trait called Transportation or something which has
def ride: SomeResult
def ride(rider: Someone): SomeResult
as a member, so she can handle both Train and Horse as a means of transportation. A Python programmer would just pass the train object without additional code. At the run time the language figures out that the object supports ride.
The fact that the method invocations are resolved at the runtime allows languages like Python and Ruby to have libraries that redefines the meaning of properties or methods. A good example of that is O/R mapping or XML data binding, in which undefined property name is interpreted to be the field name in a table/XML type. I think this is what people mean by "flexibility."
In my very limited experience of using dynamic languages, I think it's faster coding in them as long as you don't make mistakes. And probably as you or your coworkers get good at coding in dynamic language, they would make less mistakes or start writing more unit tests (good luck). In my limited experience, it took me very long to find simple errors in dynamic languages that Scala can catch in a second. Also having all types at compile time makes refactoring easier.
It seems to me that the most invaluable thing about a static/strongly-typed programming language is that it helps refactoring: if/when you change any API, then the compiler will tell you what that change has broken.
I can imagine writing code in a runtime/weakly-typed language ... but I can't imagine refactoring without the compiler's help, and I can't imagine writing tens of thousands of lines of code without refactoring.
Is this true?
I think you're conflating when types are checked with how they're checked. Runtime typing isn't necessarily weak.
The main advantage of static types is exactly what you say: they're exhaustive. You can be confident all call sites conform to the type just by letting the compiler do it's thing.
The main limitation of static types is that they're limited in the constraints they can express. This varies by language, with most languages having relatively simple type systems (c, java), and others with extremely powerful type systems (haskell, cayenne).
Because of this limitation types on their own are not sufficient. For example, in java types are more or less restricted to checking type names match. This means the meaning of any constraint you want checked has to be encoded into a naming scheme of some sort, hence the plethora of indirections and boiler plate common to java code. C++ is a little better in that templates allow a bit more expressiveness, but don't come close to what you can do with dependent types. I'm not sure what the downsides to the more powerful type systems are, though clearly there must be some or more people would be using them in industry.
Even if you're using static typing, chances are it's not expressive enough to check everything you care about, so you'll need to write tests too. Whether static typing saves you more effort than it requires in boilerplate is a debate that's raged for ages and that I don't think has a simple answer for all situations.
As to your second question:
How can we re-factor safely in a runtime typed language?
The answer is tests. Your tests have to cover all the cases that matter. Tools can help you in gauging how exhaustive your tests are. Coverage checking tools let you know wether lines of code are covered by the tests or not. Test mutation tools (jester, heckle) can let you know if your tests are logically incomplete. Acceptance tests let you know what you've written matches requirements, and lastly regression and performance tests ensure that each new version of the product maintains the quality of the last.
One of the great things about having proper testing in place vs relying on elaborate type indirections is that debugging becomes much simpler. When running the tests you get specific failed assertions within tests that clearly express what they're doing, rather than obtuse compiler error statements (think c++ template errors).
No matter what tools you use: writing code you're confident in will require effort. It most likely will require writing a lot of tests. If the penalty for bugs is very high, such as aerospace or medical control software, you may need to use formal mathematical methods to prove the behavior of your software, which makes such development extremely expensive.
I totally agree with your sentiment. The very flexibility that dynamically typed languages are supposed to be good at is actually what makes the code very hard to maintain. Really, is there such a thing as a program that continues to work if the data types are changed in a non trivial way without actually changing the code?
In the mean time, you could check the type of variable being passed, and somehow fail if its not the expected type. You'd still have to run your code to root out those cases, but at least something would tell you.
I think Google's internal tools actually do a compilation and probably type checking to their Javascript. I wish I had those tools.
To start, I'm a native Perl programmer so on the one hand I've never programmed with the net of static types. OTOH I've never programmed with them so I can't speak to their benefits. What I can speak to is what its like to refactor.
I don't find the lack of static types to be a problem wrt refactoring. What I find a problem is the lack of a refactoring browser. Dynamic languages have the problem that you don't really know what the code is really going to do until you actually run it. Perl has this more than most. Perl has the additional problem of having a very complicated, almost unparsable, syntax. Result: no refactoring tools (though they're working very rapidly on that). The end result is I have to refactor by hand. And that is what introduces bugs.
I have tests to catch them... usually. I do find myself often in front of a steaming pile of untested and nigh untestable code with the chicken/egg problem of having to refactor the code in order to test it, but having to test it in order to refactor it. Ick. At this point I have to write some very dumb, high level "does the program output the same thing it did before" sort of tests just to make sure I didn't break something.
Static types, as envisioned in Java or C++ or C#, really only solve a small class of programming problems. They guarantee your interfaces are passed bits of data with the right label. But just because you get a Collection doesn't mean that Collection contains the data you think it does. Because you get an integer doesn't mean you got the right integer. Your method takes a User object, but is that User logged in?
Classic example: public static double sqrt(double a) is the signature for the Java square root function. Square root doesn't work on negative numbers. Where does it say that in the signature? It doesn't. Even worse, where does it say what that function even does? The signature only says what types it takes and what it returns. It says nothing about what happens in between and that's where the interesting code lives. Some people have tried to capture the full API by using design by contract, which can broadly be described as embedding run-time tests of your function's inputs, outputs and side effects (or lack thereof)... but that's another show.
An API is far more than just function signatures (if it wasn't, you wouldn't need all that descriptive prose in the Javadocs) and refactoring is far more even than just changing the API.
The biggest refactoring advantage a statically typed, statically compiled, non-dynamic language gives you is the ability to write refactoring tools to do quite complex refactorings for you because it knows where all the calls to your methods are. I'm pretty envious of IntelliJ IDEA.
I would say refactoring goes beyond what the compiler can check, even in statically-typed languages. Refactoring is just changing a programs internal structure without affecting the external behavior. Even in dynamic languages, there are still things that you can expect to happen and test for, you just lose a little bit of assistance from the compiler.
One of the benefits of using var in C# 3.0 is that you can often change the type without breaking any code. The type needs to still look the same - properties with the same names must exist, methods with the same or similar signature must still exist. But you can really change to a very different type, even without using something like ReSharper.