What are the key aspects of a strongly typed language? - strong-typing

What makes a language strongly typed? I'm looking for the most important aspects of a strongly typed language.
Yesterday I asked if PowerShell was strongly typed, but no one could agree on the definition of "strongly-typed", so I'm looking to clarify the definition.
Feel free to link to wikipedia or other sources, but don't just cut and paste for your answer.

The term "strongly typed" has no agreed-upon definition.
It makes a "great" argument in a flamewar, because whenever someone is proven wrong, they can just redefine it to mean whatever they want it to mean. Other than that, the term serves no real purpose.
It is best to just not use the term, or, if you use it, rigorously define it first. If you see someone else use it, ask him to define the term.
Everybody has their own definition. Some that I have seen are:
strongly typed = statically typed
strongly typed = explicitly typed
strongly typed = nominally typed
strongly typed = typed
strongly typed = has no implicit typecasts, only explicit
strongly typed = has no typecasts at all
strongly typed = what I understand / weakly typed = what I don't understand
strongly typed = C++ / weakly typed = everything else
strongly typed = Java / weakly typed = everything else
strongly typed = .NET / weakly typed = everything else
strongly typed = my programming language / weakly typed = your programming language
In Type Theory, there exists the notion of one type system being stronger than another. In particular, if there exists an expression e1 such that it is accepted by a type system T1, but rejected by a type system T2, then T2 is said to be stronger than T1. There are two important things to note here:
this is a comparative, not an absolute: there is no strong or weak, only stronger and weaker
there is no value implied by the term; stronger does not mean better

According to Benjamin C. Pierce, the author of "Types and Programming Languages" and "Advanced Topics in Types and Programming Languages":
I spent a few weeks trying to sort out the terminology of "strongly typed," "statically typed," "safe," etc., and found it amazingly difficult... The usage of these terms is so various as to render them almost useless.
So it's no wonder your colleagues disagree.
I'd go with the simplest answer: if you can concatenate a string and an int without casting, then it's not strongly typed.
EDIT: as stated in comments, Java just does that :-(

The key is to remember that there is a distinction between statically typed and strongly typed. A strongly typed language simply means that once assigned, a given variable will always behave as a certain type until it is reassigned. By definition statically typed languages like Java and C# are strongly typed, but so are many popular dynamic languages like Ruby and Python.
So in a strongly typed language
x = "5"
x will always be a string and will never be an integer.
In certain weakly typed languages you could do something like
x = "5"
y = x + 3
// y is now 8
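For contrast, here is a minimal sketch of the strongly typed behaviour in Python (dynamically but strongly typed); mixing the types only works with an explicit conversion:
x = "5"
print(int(x) + 3)     # 8    -- explicit conversion, allowed
print(x + str(3))     # "53" -- explicit conversion the other way, also allowed

try:
    y = x + 3         # no implicit coercion between str and int
except TypeError as err:
    print(err)        # can only concatenate str (not "int") to str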

People are confusing statically typed with strongly typed. Statically typed means "A string is a string is a string". Strongly typed means "Once you make this a string it will be treated as a string until it is reassigned as something different."
edit: I see someone else did point this out after all :)

I heard someone say in an interview (I think it was Anders Hejlsberg, of C# and Turbo Pascal fame) that strong typing is not something that's on or off; some languages have a stronger type system than others.
There's also a lot of confusion between strong, weak, static and dynamic typing, where statically typed languages assign types to variables and dynamic languages give types to the objects stored in variables.
Try wikipedia for more info but don't expect a conclusive answer:
http://en.wikipedia.org/wiki/Strongly_typed_language

Strongly typed means you declare your variables of a certain type, and your compiler will throw a hissy fit if you try to convert that variable to another type without casting it.
Example (in java mind you):
int i = 4;
char s = i; // Type mismatch: cannot convert from int to char

The term 'strongly typed' is completely and utterly nonsensical. It has no meaning, and never did. Even if some of the claimed definitions were accurate, I see no purpose in the distinction: why is it important to know, discuss or debate whether a language is strongly typed (whatever that may mean) or not?
This is very similar to the terms 'Web 2.0' or 'OEM', which also have no real meaning.
What is interesting to consider, is how these phrases begin, and take root in everyday communication.

A statically typed language is one where variables need to be declared before they can be used, while a dynamically typed language is one where variables can be used without being declared first; the only condition is that they must be initialized before they are used.
Now let us come to strongly typed languages. In such a language variables have a type, and they will always be that type; they cannot be assigned a value of another type. A weakly typed language is one where variables don't have a fixed type, and you can assign a value of any type to them.
Example:
Java is a statically typed as well as strongly typed language. It is statically typed because you have to declare variables before they can be used. It is strongly typed because a variable of type int will always hold integer values; you can't assign a boolean to it.
PowerShell is a dynamically typed as well as weakly typed language. It is dynamically typed because variables need not be declared before using them. It is weakly typed because a variable may hold a value of one type at one point and a value of another type at a different point in time.

Related

Is there a meaningful difference between pass_by_reference vs pass_by_object_sharing in ruby?

Context: I argue that saying pass_by_reference when it's really pass_by_sharing is misleading
Here is the excerpt from the book "Effective Ruby" I'm arguing against
"Most objects are passed around as references and not as actual values. When these types of objects are inserted into a container the collection class is actually storing a reference to the object and not the object itself. (The notable exception to the rule is the Fixnum class whose objects are always passed by value and not by reference.)
The same is true when objects are passed as method arguments. The method will receive a reference to the object and not a new copy. This is great for efficiency but has a startling implication.
"
The 'call by value' and 'call by object sharing' terminology matches Ruby's behavior, and the terminology is consistent with other object-oriented languages that have the same semantics.
'Call by value' and 'call by object sharing' basically mean the same thing in object-oriented languages, so which one is used doesn't really matter. Someone just thought it would clarify the confusion in the terminology by adding more terminology.
If 'call by reference' was implemented in Ruby though, it would be something like:
def f(byref x)
  x = "CHANGED"
end
x = ""
f(x)
# x is "CHANGED"
Here, the value of x is changed. The value being which object x refers to.
Using the term 'call by reference' just creates confusion though, because it means different things to different people. It's unnecessary in languages like Ruby because you don't have a choice. In languages with different calling mechanisms, like C++ and C#, it makes more sense to teach these terms because they have a real effect on programs and we can come up with non-hypothetical examples of them.
When explaining parameters in Ruby, you don't need to use any of these terms, though. They're meaningless to people that don't already know the language. Just describe the behavior itself without that terminology and avoid the baggage.
I would say if you insist on using these terms, then use 'call by value' because it's usually considered more correct. The 'Programming Ruby' book calls it 'call by value', as well as plenty of Ruby programmers. Using the term with a different meaning than its technical one isn't helpful.
You are right. Ruby is pass-by-value only. The semantics of passing and assigning in Ruby are exactly identical to those in Java. And Java is universally described (on Stack Overflow and the rest of the Internet) as pass-by-value only. Terms about languages such as pass-by-value and pass-by-reference must be consistently used across languages to be meaningful.
The thing that is often misunderstood by people who say Java, Ruby, etc. "pass objects by reference" is that "objects" are not values in these languages, and thus cannot be "passed". The value of every variable and result of every expression is a "reference", which is a pointer to an object. The expression for creating an object returns an object pointer; when you access an attribute through the dot notation, the left side takes an object pointer; when you assign one variable to another, you copy the pointer resulting in two pointers to the same object. You always deal with pointers to objects, never objects themselves.
This is made explicit in Java, as the only types in Java are primitive types and reference types -- there are no "object types". So every value in Java that is not a primitive is a reference (a pointer to an object). Ruby is dynamically typed, so variables don't have explicit types. But you can imagine a dynamically typed language as just a statically typed language having exactly one type; and for languages like Python and Ruby, if this type were described, it would be a pointer-to-object type.
The issue ultimately boils down to a problem of definitions. People argue over things because there is no precise definition, or they each have slightly different definitions. Rather than argue over vaguely-defined things like what is the "value" of a variable, or whether named values are "variables" or "names", etc., we need to use a definition for pass-by-value and pass-by-reference that is based purely on the semantics of a language construct. #fgb's answer provides a clear semantic test for pass-by-reference. In "true pass-by-reference", e.g. with & in C++ and PHP, or with ref or out in C#, simple assignment (i.e. =) to a parameter variable has the same effect as simple assignment to the passed variable in the original scope. In pass-by-value, simple assignment (i.e. =) to a parameter variable has no effect in the original scope. This is what we see in Java, Python, Ruby, and many other languages.
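A minimal Python sketch of that test (Python is named above as having the same passing semantics; the function names are just for illustration):
def reassign(param):
    param = "CHANGED"          # simple assignment to the parameter: invisible to the caller

def mutate(param):
    param.append("CHANGED")    # mutating the shared object: visible to the caller

x = []
reassign(x)
print(x)    # []           -> assignment had no effect in the original scope: pass-by-value
mutate(x)
print(x)    # ['CHANGED']  -> both names were pointers to the same object all along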
I dislike people coming up with new names like "pass by object sharing" when they don't understand that the semantics are covered by an existing term, pass-by-value. Adding a new term only adds to the confusion rather than reducing it, because it does not resolve the definitions of the existing terms; it only adds a new term that also needs to be defined.

What is the difference between syntax and semantics in programming languages?

What is the difference between syntax and semantics in programming languages (like C, C++)?
TL; DR
In summary, syntax is the concept that concerns itself only with whether or not the sentence is valid for the grammar of the language. Semantics is about whether or not the sentence has a valid meaning.
Long answer:
Syntax is about the structure or the grammar of the language. It answers the question: how do I construct a valid sentence? All languages, even English and other human (aka "natural") languages have grammars, that is, rules that define whether or not the sentence is properly constructed.
Here are some C language syntax rules:
separate statements with a semi-colon
enclose the conditional expression of an IF statement inside parentheses
group multiple statements into a single statement by enclosing in curly braces
data types and variables must be declared before the first executable statement (this requirement was dropped in C99; C99 and later allow declarations to be mixed with statements)
Semantics is about the meaning of the sentence. It answers the questions: is this sentence valid? If so, what does the sentence mean? For example:
x++; // increment
foo(xyz, --b, &qrs); // call foo
are syntactically valid C statements. But what do they mean? Is it even valid to attempt to transform these statements into an executable sequence of instructions? These questions are at the heart of semantics.
Consider the ++ operator in the first statement. First of all, is it even valid to attempt this?
If x has a struct type, this statement has no meaning (the C language rules do not define ++ for structs) and thus it is an error, even though the statement is syntactically correct.
If x is a pointer to some data type, the meaning of the statement is to "add sizeof(some data type) to the value at address x and store the result into the location at address x".
If x is a scalar, the meaning of the statement is "add one to the value at address x and store the result into the location at address x".
Finally, note that some semantics can not be determined at compile-time and therefore must be evaluated at run-time. In the ++ operator example, if x is already at the maximum value for its data type, what happens when you try to add 1 to it? Another example: what happens if your program attempts to dereference a pointer whose value is NULL?
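A small Python analogue of that run-time case (my own sketch, not taken from the C examples above): the statement parses fine, but its meaning only breaks when it is executed with the wrong value.
class User:
    def __init__(self, name):
        self.name = name

def describe(user):
    return user.name.upper()   # syntactically valid no matter what 'user' is

print(describe(User("ada")))   # ADA

try:
    describe(None)             # roughly the dynamic-language cousin of a NULL dereference
except AttributeError as err:
    print(err)                 # 'NoneType' object has no attribute 'name'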
Syntax refers to the structure of a language, tracing its etymology to how things are put together.
For example you might require the code to be put together by declaring a type then a name and then a semicolon, to be syntactically correct.
Type token;
On the other hand, the semantics is about meaning.
A compiler or interpreter could complain about syntax errors. Your co-workers will complain about semantics.
Semantics is what your code means--what you might describe in pseudo-code. Syntax is the actual structure--everything from variable names to semi-colons.
Wikipedia has the answer. Read syntax (programming languages) & semantics (computer science) wikipages.
Or think about the work of any compiler or interpreter. The first step is lexical analysis, where tokens are generated by dividing the string into lexemes; then comes parsing, which builds an abstract syntax tree (a representation of the syntax). The next steps involve transforming or evaluating this AST (semantics).
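As a rough illustration (a toy sketch using Python's standard tokenize and ast modules, not a real compiler frontend), the two syntactic phases and the later semantic step look like this:
import ast
import io
import tokenize

source = "total = price * quantity"

# Lexical analysis: the source string is divided into tokens.
for tok in tokenize.generate_tokens(io.StringIO(source).readline):
    print(tok.type, tok.string)

# Parsing: the tokens are turned into an abstract syntax tree.
tree = ast.parse(source)
print(ast.dump(tree))

# Semantics come later: evaluating the tree fails, because the names
# 'price' and 'quantity' have no meaning (no bound values) yet.
try:
    exec(compile(tree, "<example>", "exec"))
except NameError as err:
    print(err)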
Also, observe that if you defined a variant of C where every keyword was transformed into its French equivalent (so if becoming si, do becoming faire, else becoming sinon, etc.) you would definitely change the syntax of your language, but you wouldn't change the semantics much: programming in that French-C wouldn't be any easier!
You need correct syntax to compile.
You need correct semantics to make it work.
Late to the party - but to me, the answers here seem correct but incomplete.
Pragmatically, I would distinguish between three levels:
Syntax
Low level semantics
High level semantics
1. SYNTAX
Syntax is the formal grammar of the language, which specifies a well-formed statement the compiler will recognise.
So in C, the syntax of variable initialisation is:
data_type variable_name = value_expression;
Example:
int volume = 66 * 22 * 55;
While in Go, which offers type inference, one form of initialisation is:
variable_name := value_expression
Example:
volume := 66 * 22 * 55
Clearly, a Go compiler won't recognise the C syntax, and vice versa.
2. LOW LEVEL SEMANTICS
Where syntax is concerned with form, semantics is concerned with meaning.
In natural languages, a sentence can be syntactically correct but semantically meaningless. For example:
The man bought the infinity from the store.
The sentence is grammatically correct but doesn't make real-world sense.
At the low level, programming semantics is concerned with whether a statement with correct syntax is also consistent with the semantic rules as expressed by the developer using the type system of the language.
For example, this is a syntactically correct assignment statement in Java, but semantically it's an error as it tries to assign an int to a String
String firstName = 23;
So type systems are intended to protect the developer from unintended slips of meaning at the low level.
Loosely typed languages like JavaScript or Python provide very little semantic protection, while languages like Haskell or F# with expressive type systems provide the skilled developer with a much higher level of protection.
For example, in F# your ShoppingCart type can specify that the cart must be in one of three states:
type ShoppingCart =
    | EmptyCart                    // no data
    | ActiveCart of ActiveCartData
    | PaidCart of PaidCartData
Now the compiler can check that your code hasn't tried to put the cart into an illegal state.
In Python, you would have to write your own code to check for valid state.
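For instance, a hand-rolled version of that check might look like the following Python sketch (the cart API here is invented purely for illustration); note that the illegal transition is only caught when the code runs, not when it compiles:
from enum import Enum, auto

class CartState(Enum):
    EMPTY = auto()
    ACTIVE = auto()
    PAID = auto()

class ShoppingCart:
    def __init__(self):
        self.state = CartState.EMPTY
        self.items = []

    def add_item(self, item):
        if self.state is CartState.PAID:
            # Nothing enforces this at compile time; the check lives in our own code.
            raise ValueError("cannot add items to a paid cart")
        self.items.append(item)
        self.state = CartState.ACTIVE

    def pay(self):
        if self.state is not CartState.ACTIVE:
            raise ValueError("only an active cart can be paid")
        self.state = CartState.PAID

cart = ShoppingCart()
cart.add_item("book")
cart.pay()
# cart.add_item("pen")   # would raise ValueError at run time, not at compile time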
3. HIGH LEVEL SEMANTICS
Finally, at a higher level, semantics is concerned with what the code is intended to achieve - the reason that the program is being written.
This can be expressed as pseudo-code which could be implemented in any complete language. For example:
// Check for an open trade for EURUSD
// For any open trade, close if the profit target is reached
// If there is no open trade for EURUSD, check for an entry signal
// For an entry signal, use risk settings to calculate trade size
// Submit the order.
In this (heroically simplified) scenario, you are making a high-level semantic error if your system enters two trades at once for EURUSD, enters a trade in the wrong direction, miscalculates the trade size, and so on.
TL; DR
If you screw up your syntax or low-level semantics, your compiler will complain.
If you screw up your high-level semantics, your program isn't fit for purpose and your customer will complain.
Syntax is the structure or form of expressions, statements, and program units, while semantics is the meaning of those expressions, statements, and program units. Semantics follows directly from syntax.
Syntax refers to the structure/form of the code that a specific programming language specifies, while semantics deals with the meaning assigned to the symbols, characters and words.
Understanding how the compiler sees the code
Usually, syntax and semantics analysis of the code is done in the 'frontend' part of the compiler.
Syntax: the compiler generates a token for each keyword and symbol; a token records the kind of keyword or symbol and its location in the code.
Using these tokens, an AST (short for Abstract Syntax Tree) is created and analysed.
What the compiler actually checks here is whether the code is lexically and structurally well formed, i.e. does the sequence of tokens comply with the grammar of the language? As suggested in previous answers, you can see it as the grammar of the language (not the sense/meaning of the code).
Side note: syntax errors are reported in this phase (the offending tokens are returned with the error type).
Semantics: now the compiler will check whether your code's operations 'make sense'.
e.g. if the language supports type inference, a semantic error will be reported if you try to assign a string to a float, or if you declare the same variable twice.
These are errors that are 'grammatically'/syntactically correct, but make no sense during the operation.
Side note: to check whether the same variable is declared twice, the compiler maintains a symbol table.
So the output of these two frontend phases is an annotated AST (with data types) and a symbol table.
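As a toy illustration of that symbol table idea (my own sketch in Python, far simpler than a real compiler): ast.parse does the syntax work, and the loop below flags a name that is read before it has been assigned, which is exactly the kind of error the parser itself cannot see.
import ast

def check_undefined_names(source: str) -> list[str]:
    tree = ast.parse(source)              # raises SyntaxError if the grammar is violated
    symbol_table: set[str] = set()
    errors: list[str] = []
    for stmt in tree.body:                # walk top-level statements in order
        for node in ast.walk(stmt):
            if isinstance(node, ast.Name) and isinstance(node.ctx, ast.Load):
                if node.id not in symbol_table:
                    errors.append(f"line {node.lineno}: '{node.id}' used before assignment")
        for node in ast.walk(stmt):
            if isinstance(node, ast.Name) and isinstance(node.ctx, ast.Store):
                symbol_table.add(node.id)
    return errors

print(check_undefined_names("x = 1\ny = x + z"))
# ["line 2: 'z' used before assignment"]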
Understanding it in a less technical way
Considering the normal language we use; here, English:
e.g. He go to the school. - Incorrect grammar/syntax, though he wanted to convey correct sense/semantics.
e.g. He goes to the cold. - cold is an adjective. In English, we might say this doesn't comply with grammar, but it's actually the closest example of incorrect semantics with correct syntax I could think of.
He drinks rice (wrong semantics - meaningless; right syntax - grammatical)
He drink water (right semantics - has meaning; wrong syntax - broken grammar)
Syntax: this refers to the grammatical structure of the language. If you are writing in C, you have to be careful with data types and tokens (a token can be a literal or a symbol; for example, "printf()" has 3 tokens: "printf", "(", ")"). In the same way, you have to be careful about how you use functions: their syntax, declaration, definition, initialization and calls.
Semantics: this concerns the logic or meaning of a sentence or statement. If you say or write something outside that logic or meaning, then you are semantically wrong.

Why isn't DRY considered a good thing for type declarations?

It seems like people who would never dare cut and paste code have no problem specifying the type of something over and over and over. Why isn't it emphasized as a good practice that type information should be declared once and only once so as to cause as little ripple effect as possible throughout the source code if the type of something is modified? For example, using pseudocode that borrows from C# and D:
MyClass<MyGenericArg> foo = new MyClass<MyGenericArg>(ctorArg);
void fun(MyClass<MyGenericArg> arg) {
    gun(arg);
}
void gun(MyClass<MyGenericArg> arg) {
    // do stuff.
}
Vs.
var foo = new MyClass<MyGenericArg>(ctorArg);
void fun(T)(T arg) {
    gun(arg);
}
void gun(T)(T arg) {
    // do stuff.
}
It seems like the second one is a lot less brittle if you change the name of MyClass, or change the type of MyGenericArg, or otherwise decide to change the type of foo.
I don't think you're going to find a lot of disagreement with your argument that the latter example is "better" for the programmer. A lot of language design features are there because they're better for the compiler implementer!
See Scala for one reification of your idea.
Other languages (such as the ML family) take type inference much further, and create a whole style of programming where the type is enormously important, much more so than in the C-like languages. (See The Little MLer for a gentle introduction.)
It isn't considered a bad thing at all. In fact, C# maintainers are already moving a bit towards reducing the tiring boilerplate with the var keyword, where
MyContainer<MyType> cont = new MyContainer<MyType>();
is exactly equivalent to
var cont = new MyContainer<MyType>();
Although you will see many people who will argue against var usage, which kind of shows that many people are not familiar with strongly typed languages that have type inference; type inference gets mistaken for dynamic/soft typing.
Repetition may lead to more readable code, and sometimes may be required in the general case. I've always seen the focus of DRY being more about duplicating logic than repeating literal text. Technically, you can eliminate 'var' and 'void' from your bottom code as well. Not to mention you indicate scope with indentation, why repeat yourself with braces?
Repetition can also have practical benefits: parsing by a program is easier by keeping the 'void', for example.
(However, I still strongly agree with you on preferring "var name = new Type()" over "Type name = new Type()".)
It's a bad thing. This very topic was mentioned in Google's Go language Techtalk.
Albert Einstein said, "Everything should be made as simple as possible, but not one bit simpler."
Your complaint makes no sense in the case of a dynamically typed language, so you must intend this to refer to statically typed languages. In that case, your replacement example implicitly uses generics (aka template classes), which means that any time fun or gun is used, a new definition is generated based upon the type of the argument. That could result in dozens of extra methods, regardless of the intent of the programmer. In particular, you're throwing away the benefit of compiler-checked type safety for a runtime error.
If your goal was to simply pass through the argument without checking its type, then the correct type would be Object not T.
Type declarations are intended to make the programmer's life simpler, by catching errors at compile-time, instead of failing at runtime. If you have an overly complex type definition, then you probably don't understand your data. In your example, I would have suggested adding fun and gun to MyClass, instead of defining them separately. If fun and gun don't apply to all possible template types, then they should be defined in an explicit subclass, not as separate functions that take a templated class argument.
Generics exist as a way to wrap behavior around more specific objects. List, Queue, Stack, these are fine reasons for Generics, but at the end of the day, the only thing you should be doing with a bare Generic is creating an instance of it, and calling methods on it. If you really feel the need to do more than that with a Generic, then you probably need to embed your Generic class as an instance object in a wrapper class, one that defines the behaviors you need. You do this for the same reason that you embed primitives into a class: because by themselves, numbers and strings do not convey semantic information about their contents.
Example:
What semantic information does a List of integer triples convey? Just that you're working with multiple triples of integers. On the other hand, List<Color>, where a Color has 3 integers (red, blue, green) with bounded values (0-255), conveys the intent that you're working with multiple Colors, but provides no hint as to whether the List is ordered, allows duplicates, or any other information about the Colors. Finally, a Palette can add those semantics for you: a Palette has a name, contains multiple Colors, but no duplicates, and order isn't important.
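A rough Python sketch of that wrapper idea (the Color and Palette classes here are invented for illustration, matching the description above rather than any real library):
from dataclasses import dataclass

@dataclass(frozen=True)
class Color:
    red: int
    green: int
    blue: int

    def __post_init__(self):
        # Enforce the bounded 0-255 channels described above.
        for channel in (self.red, self.green, self.blue):
            if not 0 <= channel <= 255:
                raise ValueError(f"channel out of range: {channel}")

class Palette:
    """A named, unordered collection of Colors with no duplicates."""
    def __init__(self, name, colors=()):
        self.name = name
        self._colors = set(colors)   # a set gives "no duplicates, order unimportant" for free

    def add(self, color):
        self._colors.add(color)

    def __contains__(self, color):
        return color in self._colors

    def __len__(self):
        return len(self._colors)

warm = Palette("warm", [Color(255, 0, 0), Color(255, 128, 0)])
warm.add(Color(255, 0, 0))     # a duplicate; silently absorbed by the set
print(len(warm))               # 2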
This has gotten a bit far afield from the original question, but what it means to me is that DRY (Don't Repeat Yourself) means specifying information once, but that specification should be as precise as is necessary.

Do you use articles in your variable names?

Edit: There appears to be at least two valid reasons why Smalltalkers do this (readability during message chaining and scoping issues) but perhaps the question can remain open longer to address general usage.
Original: For reasons I've long forgotten, I never use articles in my variable names. For instance:
aPerson, theCar, anObject
I guess I feel like articles dirty up the names with meaningless information. When I'd see a coworker's code using this convention, my blood pressure would tick up oh-so-slightly.
Recently I've started learning Smalltalk, mostly because I want to learn the language that Martin Fowler, Kent Beck, and so many other greats grew up on and loved.
I noticed, however, that Smalltalkers appear to widely use indefinite articles (a, an) in their variable names. A good example would be in the following Setter method:
name: aName address: anAddress
    self name: aName.
    self address: anAddress
This has caused me to reconsider my position. If a community as greatly respected and influential as Smalltalkers has widely adopted articles in variable naming, maybe there's a good reason for it.
Do you use it? Why or why not?
This naming convention is one of the patterns in Kent Beck's book Smalltalk Best Practice Patterns. IMHO this book is a must-have even for non-Smalltalkers, as it really helps with naming things and writing self-documenting code. Plus it's probably one of the few pattern languages to exhibit Alexander's quality without a name.
Another good book on code patterns is Smalltalk with Style, which is available as a free PDF.
Generally, the convention is that instance variables and accessors use the bare noun, and parameters use the indefinite article plus either a role or a type, or a combination. Temporary variables can use bare nouns because they rarely duplicate the instance variable; alternatively, it's quite frequent to name them with more precision than just an indefinite article, in order to indicate their role in the control flow: eachFoo, nextFoo, randomChild...
It is in common use in Smalltalk, as a typeless language, because it hints at the type of an argument in a method call. The article itself signals that you are dealing with an instance of some object of the specified class.
But remember that in Smalltalk methods look different: we use so-called keyword messages, and in this case the articles actually help readability:
anAddressBook add: aPerson fromTownNamed: aString
I think I just found an answer. As Konrad Rudolph said, they use this convention because of a technical reason:
...this means it [method variable] cannot duplicate the name of an instance variable, a temporary variable defined in the interface, or another temporary variable.
-IBM Smalltalk Tutorial
Basically a local method variable cannot be named the same as an object/class variable. Coming from Java, I assumed a method's variables would be locally scoped, and you'd access the instance variables using something like:
self address
I still need to learn more about the method/local scoping in Smalltalk, but it appears they have no other choice; they must use a different variable name than the instance one, so anAddress is probably the simplest approach. Using just address results in:
Name is already defined ->address
if you have an instance variable address defined already...
I always felt the articles dirtied up the names with meaningless information.
Exactly. And this is all the reason necessary to drop articles: they clutter the code needlessly and provide no extra information.
I don’t know Smalltalk and can't talk about the reasons for “their” conventions but everywhere else, the above holds. There might be a simple technical reason behind the Smalltalk convention (such as ALL_CAPS in Ruby, which is a constant not only by convention but because of the language semantics).
I wobble back and forth on using this. I think that it depends on the ratio of C++ to Objective C in my projects at any given time. As for the basis and reasoning, Smalltalk popularized the notion of objects being "things". I think that it was Yourdon and Coad that strongly pushed describing classes in the first person. In Python it would be something like the following snippet. I really wish that I could remember enough SmallTalk to put together a "proper" example.
class Rectangle:
    """I am a rectangle. In other words, I am a polygon
    of four sides and 90 degree vertices."""
    def __init__(self, aPoint, anotherPoint):
        """Call me to create a new rectangle with the opposite
        vertices defined by aPoint and anotherPoint."""
        self.myFirstCorner = aPoint
        self.myOtherCorner = anotherPoint
Overall, it is a conversational approach to program readability. Using articles in variable names was just one portion of the entire idiom. There was also an idiom surrounding the naming of parameters and message selectors IIRC. Something like:
aRect <- [Rectangle createFromPoint: startPoint
                             toPoint: otherPoint]
It was just another passing fad that still pops up every so often. Lately I have been noticing that member names like myHostName are popping up in C++ code as an alternative to m_hostName. I'm becoming more enamored with this usage which I think hearkens back to SmallTalk's idioms a little.
Never used them, maybe because there aren't any articles in my main language :P
Anyway, I think that as long as a variable's name is meaningful, it doesn't matter whether there are articles or not; it's up to the coder's own preference.
Nope. I feel it is a waste of character space and erodes the readability of your code. I might use variations of the noun, for example Person vs People, depending on the context. For example:
ArrayList People = new ArrayList();
Person newPerson = new Person();
People.add(newPerson);
No I do not. I don't feel like it adds anything to the readability or maintainability of my code base and it does not distinguish the variable for me in any way.
The other downside is if you encourage articles in variable names, it's just a matter of time before someone does this in your code base.
var person = new Person();
var aPerson = GetSomeOtherPerson();
Where I work, the standard is to prefix all instance fields with "the-", local variables with "my-" and method parameters with "a-". I believe this came about because many developers were using text editors like vi instead of IDE's that can display different colors per scope.
In Java, I'd have to say I prefer it over writing setters where you dereference this.
Compare
public void setName(String name) {
    this.name = name;
}
versus
public void setName(String aName) {
    theName = aName;
}
The most important thing is to have a standard and for everyone to adhere to it.

Ruby Terminology Question: Is this a Ruby declaration, definition and assignment, all at the same time?

If I say:
x = "abc"
this seems like a declaration, definition and assignment, all at the same time, regardless of whether I have said anything about x in the program before.
Is this correct?
I'm not sure what the correct terminology is in Ruby for declarations, definitions and assigments or if there is even a distinction between these things because of the dynamic typing in Ruby.
#tg: Regarding your point # 2: even if x existed before the x = "abc" statement, couldn't you call the x = "abc" statement a definition/re-definition?
Declaration: No.
It doesn't make sense to talk about declaring variables in Ruby, because there's nothing analogous to a declaration in the language. Languages designed for compilers have declarations because the compiler needs to know in advance how big datatypes are and how to access different parts of them. e.g., if I say in C:
int *i;
then the compiler knows that somewhere there is some memory set aside for i, and it's as big as it needs to be to hold a pointer to an int. Eventually the linker will hook all the references to i together, but at least the compiler knows it's out there somewhere.
Definition: Probably.
A definition typically sets an initial value for something (at least in the familiar compiled languages). If x didn't exist before the x = "abc" statement, then I guess you could call this a definition, since that is when Ruby has to assign a value to the symbol x.
Again, though, definition is a specific term that people typically use to distinguish the initial, static assignment of a value to some variable from that variable's declaration. In Ruby, you don't have that kind of statement. You typically just say a variable is defined if it's been assigned a value somewhere in your current scope, and you say it's undefined if it hasn't.
You usually don't talk about it having a definition, because in Ruby that just amounts to assignment. There's no special context that would justify you saying definition like there is in other languages.
Which brings us to...
Assignment: Yes.
You can definitely call this an assignment, since it is assigning a value to the symbol x. I don't think anyone will disagree with that.
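A quick sketch of the same behaviour in Python, which works like Ruby here: there is no separate declaration step, and the first assignment is what brings the name into existence.
try:
    print(x)           # x has never been assigned in this scope
except NameError as err:
    print(err)         # name 'x' is not defined

x = "abc"              # first assignment: the name now exists
print(x)               # abc

x = 1                  # reassignment may bind a value of another type
print(type(x))         # <class 'int'>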
Pretty much. And if, on the very next line, you do:
x = 1
Then you've just re-defined it, as well as assigned it (it's now an integer, not a string). Duck typing is very different from what you're probably used to.
