High-level programming languages are made to be understandable to humans, but 0 is often not counted as a natural number in mathematics. I do not understand why all the programming languages I have seen start counting from 0, e.g. int[0] is the 1st element instead of int[1]. I want to know whether there are any programming languages that index from 1. If not, why?
Yes, lots. Fortran for example.
And then there are languages which allow array indexing to start at almost any integer. Fortran, for example.
Not so many (considering the total number of programming languages)
ALGOL 68
APL
AWK
CFML
COBOL
Fortran
FoxPro
Informix
Julia
Lua
Mathematica
MATLAB
PL/I
Ring
RPG
Sass
Smalltalk
Wolfram Language
XPath/XQuery
You can do it in Perl:
$[ = 1; # set the base array index to 1 (note: $[ is deprecated in modern Perl)
Erlang's tuples and lists index starting at 1.
Sources
Wikipedia
Forth famously allows users to alter the language by defining new words for control flow (beyond those given by the standard: DO, LOOP, BEGIN, UNTIL, WHILE, REPEAT, LEAVE, IF, THEN, ELSE, CASE, ENDCASE, etc.)
Are there common examples of people actually creating their own new control flow words? What are some typical and useful examples? Or has the standard already defined everything that people actually need?
I'm hoping to find examples of useful language extensions that have gained acceptance or proved generally helpful to make the language more expressive.
Another big direction for control flow structures in Forth is backtracking. It is a very expressive and powerful mechanism. To be implemented, it requires return address manipulation [Gas99].
Backtracking in Forth was developed as the BacFORTH extension by M.L. Gassanenko in about 1988-1990. The first papers on this topic were in Russian.
The technique of backtracking enables one to create abstract iterator
and filter modules responsible for looking over sets of all possible
values and rejecting "undue" ones [Gas96b].
For an introduction, see the short description Backtracking (by mlg); the "multi-threading in Forth?" discussion in comp.lang.forth can also be useful (see the messages from Gassanenko).
Just one example of a generator in BacFORTH:
: (0-2)=> PRO 3 0 DO I CONT LOOP ; \ generator
: test (0-2)=> CR . ." : " (0-2)=> . ;
test CR
Output:
0 : 0 1 2
1 : 0 1 2
2 : 0 1 2
PRO and CONT are special control flow words. PRO designates a generator word, and CONT calls the consumer; it is something like yield in Ruby or ECMAScript. A number of other special words are also defined in BacFORTH.
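If you do not read Forth, a rough Python analogue of the example above may help (a sketch of the control flow only, not a translation of BacFORTH's semantics): the generator below plays the role of (0-2)=>, and each yield corresponds to CONT handing control to the consumer.

def zero_to_two():               # plays the role of the (0-2)=> generator
    for i in range(3):           # 3 0 DO I CONT LOOP
        yield i

def test():
    for outer in zero_to_two():            # first (0-2)=>
        print(f"{outer} : ", end="")        # print the outer value and ": "
        for inner in zero_to_two():         # second (0-2)=>
            print(inner, end=" ")           # print each inner value
        print()                             # newline per row

test()
# Output:
# 0 : 0 1 2
# 1 : 0 1 2
# 2 : 0 1 2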
You can play with BacFORTH in SP-Forth (just include the ~profit/lib/bac4th.f library).
Etymology
In general, backtracking is just an algorithm for finding solutions. In Prolog this algorithm is built in under the hood, so backtracking in Prolog is simply the way the language itself works. Backtracking in BacFORTH is a programming technique supported by a set of special control flow words.
References
[Gas96a] M.L. Gassanenko, Formalization of Backtracking in Forth, 1996 (mirror)
[Gas96b] M.L. Gassanenko, Enhancing the Capabilities of Backtracking, 1996 (mirror)
[Gas99] M.L. Gassanenko, The Open Interpreter Word Set, 1999
Here's one example. CASE was a somewhat late addition to the set of Forth control flow words. In early 1980, a competition for defining the best CASE statement was announced in Forth Dimensions. It was settled later that year with a tie between three entries. One of those ended up in the Forth94 standard.
I'd like to be able to reason about code on paper better than just writing boxes or pseudocode.
The key thing here is paper. On a machine, I can most likely use a high-level language with a linter/compiler very quickly, and a keyboard restricts what can be done, somewhat.
A case study is APL, a language that we semi-jokingly describe as "write-only". Here is an example:
m ← +/3+⍳4
(Explanation: ⍳4 creates an array, [1,2,3,4], then 3 is added to each component, which are then summed together and the result stored in variable m.)
Look how concise that is! Imagine having to type those symbols in your day job! But writing iota and arrows on a whiteboard is fine, and saves time and ink.
Here's its Haskell equivalent:
m = foldl (+) 0 (map (+3) [1..4])
And Python:
from functools import reduce; from operator import add
m = reduce(add, map(lambda x: x + 3, range(1, 5)))
But the principle behind these concise programming languages is different: they use words and punctuation to describe high-level actions (such as fold), whereas I want to write symbols for these common actions.
Does such a formalised pseudocode exist?
Not to be snarky, but you could use APL. It was after all originally invented as a mathematical notation before it was turned into a programming language. I seem to remember that there was something like what I think you are talking about in Backus' Turing Award lecture. Finally, maybe Z Notation is what you want: https://en.m.wikipedia.org/wiki/Z_notation
Okay, so what's up with this?
irb(main):001:0> 4/3
=> 1
irb(main):002:0> 7/8
=> 0
irb(main):003:0> 5/2
=> 2
I realize Ruby is doing integer division here, but why? With a language as flexible as Ruby, why couldn't 5/2 return the actual, mathematical result of 5/2? Is there some common use for integer division that I'm missing? It seems to me that making 7/8 return 0 would cause more confusion than any good that might come from it is worth. Is there any real reason why Ruby does this?
Because most languages (even advanced/high-level ones) do it? You will get the same behaviour on integers in C, C++, Java, Perl, Python (via its // operator), and so on. This is essentially Euclidean division (hence the corresponding modulo operator %).
The integer division operation is even implemented at the hardware level on many architectures. Others have asked this question, and one reason is symmetry: in statically typed languages such as C, this allows all integer operations to return integers, without loss of precision. It also allows easy access to the corresponding low-level assembler instruction, since C was designed as a sort of thin layer over assembly.
Moreover, as explained in one comment to the linked article, floating-point operations were costly (or not supported at all on some architectures) for many years, and are not required for tasks such as splitting a dataset into fixed-size lots.
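For a concrete illustration, here is the same integer division and the matching remainder in Python (its // operator plays roughly the same role that / plays on two Ruby integers):

print(7 // 8)        # 0
print(5 // 2)        # 2
print(7 % 8)         # 7, the remainder that goes with the quotient 0
print(divmod(5, 2))  # (2, 1): quotient and remainder in one call
print(5 / 2)         # 2.5 -- in Python 3, / on integers returns a float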
I'm trying to understand the concept of languages levels (regular, context free, context sensitive, etc.).
I can look this up easily, but all explanations I find are a load of symbols and talk about sets. I have two questions:
Can you describe in words what a regular language is, and how the languages differ?
Where do people learn to understand this stuff? As I understand it, it is formal mathematics? I had a couple of courses at uni which used it and barely anyone understood it as the tutors just assumed we knew it. Where can I learn it and why are people "expected" to know it in so many sources? It's like there's a gap in education.
Here's an example:
Any language belonging to this set is a regular language over the alphabet.
How can a language be "over" anything?
In the context of computer science, a word is a concatenation of symbols. The symbols used are called the alphabet. For example, some words formed from the alphabet {0,1,2,3,4,5,6,7,8,9} would be 1, 2, 12, 543, 1000, and 002.
A language is then a subset of all possible words. For example, we might want to define a language that captures all elite MI6 agents. Those all start with double-0, so words in the language would be 007, 001, 005, and 0012, but not 07 or 15. For simplicity's sake, we say a language is "over an alphabet" instead of "a subset of words formed by concatenation of symbols in an alphabet".
In computer science, we now want to classify languages. We call a language regular if membership of a word can be decided by an algorithm/a machine with constant (finite) memory that examines the symbols of the word one after another. The language consisting only of the word 42 is regular: you can decide whether a word is in it without requiring arbitrary amounts of memory; you just check whether the first symbol is 4, whether the second is 2, and whether any further symbols follow.
All languages with a finite number of words are regular, because we can (in theory) just build a control flow tree of constant size (you can visualize it as a bunch of nested if statements that examine one digit after the other). For example, we can test whether a word is in the "prime numbers between 10 and 99" language with the following construct, which requires no memory beyond what is needed to keep track of which line of code we are currently on:
if word[0] == 1:
    if word[1] == 1:  # 11
        return true   # "accept" word, i.e. it's in the language
    if word[1] == 3:  # 13
        return true
...
return false
Note that all finite languages are regular, but not all regular languages are finite; our double-0 language contains infinitely many words (007, 008, but also 004242 and 0012345), yet it can be tested with constant memory: to test whether a word belongs to it, check whether the first symbol is 0 and whether the second symbol is 0. If so, accept it; if the word does not start with 00, it's not an MI6 code name.
Formally, the construct of a finite-state machine or a regular grammar is used to prove that a language is regular. These are similar to the if-statements above, but allow for arbitrarily long words. If there's a finite-state machine, there is also a regular grammar, and vice versa, so it's sufficient to show either. For example, the finite state machine for our double-0 language is:
start state: if input = 0 then goto state 2
start state: if input = 1 then fail
start state: if input = 2 then fail
...
state 2: if input = 0 then accept
state 2: if input != 0 then fail
accept: for any input, accept
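As a small sketch, the same machine can be written directly in Python; the states are exactly those in the listing above, and the memory used never grows with the length of the word:

def fsm_accepts(word):
    state = "start"
    for symbol in word:
        if state == "start":
            state = "state2" if symbol == "0" else "fail"
        elif state == "state2":
            state = "accept" if symbol == "0" else "fail"
        elif state == "accept":
            state = "accept"   # any further input keeps accepting
        else:                  # "fail" is a dead state
            return False
    return state == "accept"

print(fsm_accepts("007"))  # True
print(fsm_accepts("15"))   # False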
The equivalent regular grammar is:
start → 0 B
B → 0 accept
accept → 0 accept
accept → 1 accept
...
accept → ε
The equivalent regular expression is:
00[0-9]*
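The same expression also works as a Python regular expression; using fullmatch requires the whole word to match, mirroring "the word is in the language":

import re

double_0 = re.compile(r"00[0-9]*")
print(bool(double_0.fullmatch("0012")))  # True
print(bool(double_0.fullmatch("07")))    # False
print(bool(double_0.fullmatch("15")))    # False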
Some languages are not regular. For example, the language of any number of 1s followed by the same number of 2s (often written as 1^n 2^n, for arbitrary n) is not regular: you need more than a constant amount of memory (= a constant number of states) to store the number of 1s in order to decide whether a word is in the language.
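A small Python sketch of such a membership test makes the memory requirement visible: the counter n has to grow with the input, which is exactly what a machine with a fixed, finite number of states cannot do.

def in_1n2n(word):
    n = 0                        # unbounded counter -- this is the problem
    i = 0
    while i < len(word) and word[i] == "1":
        n += 1
        i += 1
    return word[i:] == "2" * n   # the rest must be exactly n twos

print(in_1n2n("1122"))  # True
print(in_1n2n("1222"))  # False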
This should usually be explained in a theoretical computer science course. Luckily, Wikipedia explains both formal and regular languages quite nicely.
Here are some of the equivalent definitions from Wikipedia:
[...] a regular language is a formal language (i.e., a possibly
infinite set of finite sequences of symbols from a finite alphabet)
that satisfies the following equivalent properties:
it can be accepted by a deterministic finite state machine.
it can be accepted by a nondeterministic finite state machine.
it can be described by a formal regular expression.
Note that the "regular expression" features provided with many programming languages
are augmented with features that make them capable of recognizing
languages which are not regular, and are therefore not strictly
equivalent to formal regular expressions.
The first thing to note is that a regular language is a formal language, with some restrictions. A formal language is essentially a (possibly infinite) collection of strings. For example, the formal language Java is the collection of all syntactically valid Java source files, which is a subset of the collection of all possible text files.
One of the most important characteristics is that unlike the context-free languages, a regular language does not support arbitrary nesting/recursion, but you do have arbitrary repetition.
A language always has an underlying alphabet which is the set of allowed symbols. For example, the alphabet of a programming language would usually either be ASCII or Unicode, but in formal language theory it's also fine to talk about languages over other alphabets, for example the binary alphabet where the only allowed characters are 0 and 1.
At my university, we were taught some formal language theory in the Compilers class, but this probably differs between schools.
I am not sure what version of Fortran this is, but the line is:
Term = F*F - 4.*E*G
I know that it multiplies F by F and then subtracts something, but I don't know what the period after the 4 is doing there.
I'm going to venture a guess based on every other programming language I've ever seen, and say that it's making the constant "4" of type Real, rather than Integer. In other words, it's making sure the types in the expression all match up. "4.0" would be equivalent; whoever wrote this code was just feeling extra concise that day.
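For comparison (this is an analogy in Python, not Fortran semantics): the same trailing-dot shorthand makes a floating-point literal there too, and one floating-point operand makes the whole expression floating-point.

print(type(4.))    # <class 'float'>
print(4. * 3 - 4)  # 8.0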
It makes it a real number instead of an integer.
If you're new to Fortran, a "REAL" number is what is called in C-like languages a "float".
But only Fortran programmers can say that GOD is REAL, by default.