How do you make a lexical analyzer with pascal? - compilation

I study Computer Science in college and one of my group tasks is to create a simple statement lexical analyzer, along with a simple arithmetic expression, simple conditional If with block, and simple looping with conditional If block. I'm trying my hand at making at least a simple statement lexical analyzer.
However, I have no clue on how to code with Pascal language as I was never taught this language over the course of my 5 semesters in college.
I tried looking up documentations of Pascal programming language along with various examples of it online while trying to do it myself on an online compiler, cross referencing a lexical analyzer made with C language and tried converting it myself into Pascal language.
It turned out to be more complicated than I had anticipated.
Again, I am very new to Pascal language, just starting a few days ago in fact. So if there are any tips or examples, preferably short and to the point, it would be very much appreciated, thanks.
If you want to see my code, its not much, as I was more focused on grasping the language. The only languages I was taught and can understand is Java and C, along with a little bit of PHP.
program SimpleStatement;
var letters: array['a'...'z'] of char,
numbers: array['1'...'9'] of integer,
symbols: char;
function letVer(letters : char): boolean;
begin
clrscr;
if(letters = 'a'...'z') then
begin
for letters: 'a' to 'z' do
writeln('keywords: ', letters)
end
letVer = true;
end
begin
letVer;
writeln ();
end.
Along with the error code:
Free Pascal Compiler version 3.0.4+dfsg-23 [2019/11/25] for x86_64
Copyright (c) 1993-2017 by Florian Klaempfl and others
Target OS: Linux for x86-64
Compiling main.pas
main.pas(12,23) Error: Error in type definition
main.pas(12,23) Fatal: Syntax error, "]" expected but "..." found
Fatal: Compilation aborted
Error: /usr/bin/ppcx64 returned an error exitcode

Related

Pascal Syntax error near unexpected token

I am trying to figure out the issue with this piece of Pascal code
function Factorial(n: integer): integer;
begin
if n = 0 then
Result := 1
else if n > 0 then
Result := Factorial(n - 1) * n;
end;
When I run the code I get the error
-bash: syntax error near unexpected token `n:'
Anyone can tell why that is? I am using the fpc (free pascal compiler) is this code meant for a different Pascal compiler?
That code compiles fine in fpc.
From the error message you quote, as #KenWhite says, it sounds like you are using the wrong tool to try to compile it - bash is an operating system shell for Linux and it is a bash error message. bash is not for comiling Pascal code.
I suggest you download and use Lazarus, which is the freeware IDE for fpc and runs on Linux and Windows. Once you have Lazarus installed on your system, create a new project (a "simple project" from Lazarus's list of new project types. Then copy/paste the code above the begin ...end of the project source, then save and compile it and you will see that Lazarus reports successfully compiling the project.
Btw, there is an omission from the code - it only covers cases where n is greater or equal to zero, so the function has an undefined result for n less that zero.

When comparing a variable to a literal, should one place the literal on the left or right of the equals '==' operator?

When learning to code, I was taught the following style when checking the value of a variable:
int x;
Object *object;
...
if(x == 7) { ... }
if(object == NULL) { ... }
However, now that I am in the field, I have encountered more than one co-worker who swears by the approach of switching the lhs and rhs in the if statements:
if(7 == x) { ... }
if(NULL == object) { ... }
The reasoning being that if you accidentally type = instead of ==, then the code will fail at compile. Being unaccustomed to this style, reading 7 == x is difficult for me, slowing my comprehension of their code.
It seems if I adopt this style, I will likely someday in the future save myself from debugging an x = 7 bug, but in the mean time, every time somebody reads my code I may be wasting their time because I fear the syntax is unorthodox.
Is the 7 == x style generally accepted and readable in the industry, or is this just a personal preference of my coworkers?
The reasoning being that if you accidentally type = instead of ==, then the code will fail at compile.
True. On the other hand, I believe modern C and C++ compilers (I'm assuming you're using one of those languages? You haven't said) will warn you if you do this.
Have you tried it with the compiler you're using? If it doesn't do it by default, look to see if there are flags you can use to provoke it - ideally to make it an error rather than just a warning.
For example, using the Microsoft C compiler, I get:
cl /Wall Test.c
test.c(3) : warning C4706: assignment within conditional expression
That's pretty clear, IMO. (The default warning settings don't spot it, admittedly.)
Being unaccustomed to this style, reading 7 == x is difficult for me, slowing my comprehension of their code.
Indeed. Your approach is the more natural style, and should (IMO) be used unless you're really dealing with a compiler which doesn't spot this as a potential problem (and you have no alternative to using that compiler).
EDIT: Note that this isn't a problem in all languages - not even all C-like languages.
For example, although both Java and C# have a similar if construct, the condition expression in both needs to be implicitly convertible to a Boolean value. While the assignment part would compile, the type of the expression in your first example would be int, which isn't implicitly convertible to the relevant Boolean type in either language, leading to a compile-time error. The rare situation where you'd still have a problem would be:
if (foo == true)
which, if typo'd to:
if (foo = true)
would compile and do the wrong thing. The MS C# compiler even warns you about that, although it's generally better to just use
if (foo)
or
if (!foo)
where possible. That just leaves things like:
if (x == MethodReturningBool())
vs
if (MethodReturningBool() == x)
which is still pretty rare, and there's still a warning for it in the MS C# compiler (and probably in some Java compilers).

Treetop grammar infinite loop

I have had some ideas for a new programming language floating around in my head, so I thought I'd take a shot at implementing it. A friend suggested I try using Treetop (the Ruby gem) to create a parser. Treetop's documentation is sparse, and I've never done this sort of thing before.
My parser is acting like it has an infinite loop in it, but with no stack traces; it is proving difficult to track down. Can somebody point me in the direction of an entry-level parsing/AST guide? I really need something that list rules, common usage etc for using tools like Treetop. My parser grammer is on GitHub, in case someone wishes to help me improve it.
class {
initialize = lambda (name) {
receiver.name = name
}
greet = lambda {
IO.puts("Hello, #{receiver.name}!")
}
}.new(:World).greet()
I asked treetop to compile your language into an .rb file. That gave me something to dig into:
$ tt -o /tmp/rip.rb /tmp/rip.treetop
Then I used this little stub to recreate the loop:
require 'treetop'
load '/tmp/rip.rb'
RipParser.new.parse('')
This hangs. Now, isn't that interesting! An empty string reproduces the behavior just as well as the dozen-or-so-line example in your question.
To find out where it's hanging, I used an Emacs keyboard macro to edit rip.rb, adding a debug statement to the entry of each method. For example:
def _nt_root
p [__LINE__, '_nt_root'] #DEBUG
start_index = index
Now we can see the scope of the loop:
[16, "root"]
[21, "_nt_root"]
[57, "_nt_statement"]
...
[3293, "_nt_eol"]
[3335, "_nt_semicolon"]
[3204, "_nt_comment"]
[57, "_nt_statement"]
[57, "_nt_statement"]
[57, "_nt_statement"]
...
Further debugging from there reveals that an integer is allowed to be an empty string:
rule integer
digit*
end
This indirectly allows a statement to be an empty string, and the top-level rule statement* to forever consume empty statements. Changing * to + fixes the loop, but reveals another problem:
/tmp/rip.rb:777:in `_nt_object': stack level too deep (SystemStackError)
from /tmp/rip.rb:757:in `_nt_compound_object'
from /tmp/rip.rb:1726:in `_nt_range'
from /tmp/rip.rb:1671:in `_nt_special_literals'
from /tmp/rip.rb:825:in `_nt_literal_object'
from /tmp/rip.rb:787:in `_nt_object'
from /tmp/rip.rb:757:in `_nt_compound_object'
from /tmp/rip.rb:1726:in `_nt_range'
from /tmp/rip.rb:1671:in `_nt_special_literals'
... 3283 levels...
Range is left-recursing, indirectly, via special_literals, literal_object, object, and compound_object. Treetop, when faced with left recursion, eats stack until it pukes. I don't have a quick fix for that problem, but at least you've got a stack trace to go from now.
Also, this is not your immediate problem, but the definition of digit is odd: It can either one digit, or multiple. This causes digit* or digit+ to allow the (presumably) illegal integer 1________2.
I really enjoyed Language Implementation Patterns by Parr; since Parr created the ANTLR parser generator, it's the tool he uses throughout the book, but it should be simple enough to learn from it all the same.
What I really liked about it was the way each example grew upon the previous one; he doesn't start out with a gigantic AST-capable parser, instead he slowly introduces problems that need more and more 'backend smarts' to do the job, so the book scales well along with the language that needs parsing.
What I wish it covered in a little more depth is the types of languages that one can write and give advice on Do's and Do Not Do's when designing languages. I've seen some languages that are a huge pain to parse and I'd have liked to know more about the design decisions that could have been made differently.

How to do OOP with Pascal?

I am learning OOP with the pascal programming language.After googling the Internet, I found the OOP -- The GNU Pascal Manaul.
What does this function declaration mean?
function Baz (b, a, z: Char) = s: Str100; { not virtual }
I have never seen = xx before, and it seems that the pascal syntax does not have it.
That appears to be some nonstandard extension. The implementation of the function in question is:
function FooParent.Baz (b, a, z: Char) = s: Str100;
begin
WriteStr (s, 'FooParent.Baz (', b, ', ', a, ', ', z, ')')
end;
Normally in Pascal, the name of the function is used as the function return variable, so the above function would be written like this:
function FooParent.Baz (b, a, z: Char): Str100;
begin
WriteStr (Baz, 'FooParent.Baz (', b, ', ', a, ', ', z, ')')
end;
It appears that the = s syntax indicates that the return value of the function is stored in the variable s inside the function body. I'm not sure why that would need to be exposed in the object interface, though.
I can't answer your question but I suggest to learn OOP with a language like Python or Groovy. Pascal isn't really something on which you'll be able to build a career. I'd rather suggest that as a third or fourth language to give you some ideas how other languages work but not as your first/main language.
Both offer an interactive console (for Python, I suggest iPython) where you can do small experiments.
Note: Some people attack me on the grounds that I don't know much about Pascal. I'm not sure where you get that idea. I have over 200K points on SO, I know 30 programming languages (know == written large amounts of code in it), I've written code in Pascal at the beginning of my career.
I understand it hurts when your beloved programming language doesn't get the recognition you feel it deserves. A no brainer: You made the decision to use that language, so it must be the best one - otherwise your decision would be wrong.
But when I compare code written in all the languages that I know, Pascal/Delphi/Modula/Oberon are among those which I don't recommend for various reasons. Delphi got some traction because of Borland but that's over. I've seen the code which good Delphi programmers wrote and my first thought was: Maintenance cost.

What obscure syntax ruined your day? [closed]

As it currently stands, this question is not a good fit for our Q&A format. We expect answers to be supported by facts, references, or expertise, but this question will likely solicit debate, arguments, polling, or extended discussion. If you feel that this question can be improved and possibly reopened, visit the help center for guidance.
Closed 10 years ago.
When have you run into syntax that might be dated, never used or just plain obfuscated that you couldn't understand for the life of you.
For example, I never knew that comma is an actual operator in C. So when I saw the code
if(Foo(), Bar())
I just about blew a gasket trying to figure out what was going on there.
I'm curious what little never-dusted corners might exist in other languages.
C++'s syntax for a default constructor on a local variable. At first I wrote the following.
Student student(); // error
Student student("foo"); // compiles
This lead me to about an hour of reading through a cryptic C++ error message. Eventually a non-C++ newbie dropped by, laughed and pointed out my mistake.
Student student;
This is always jarring:
std::vector <std::vector <int> >
^
mandatory space.
When using the System.DirectoryServices name space to bind to an ADAM (Active Directory Application Mode; now called AD LDS, I think), I lost an entire day trying to debug this simple code:
DirectoryEntry rootDSE = new DirectoryEntry(
"ldap://192.168.10.78:50000/RootDSE",
login,
password,
AuthenticationTypes.None);
When I ran the code, I kept getting a COMException with error 0x80005000, which helpfully mapped to "Unknown error."
I could use the login and password and bind to the port via ADSI Edit. But this simple line of code didn't work. Bizarre firewall permission? Something screwed in configuration? Some COM object not registered correctly? Why on earth wasn't it working?
The answer? It's LDAP://, not ldap://.
And this is why we drink.
C++
class Foo
{
// Lots of stuff here.
} bar;
The declaration of bar is VERY difficult to see. More commonly found in C, but especially annoying in C++.
Perl's syntax caused me a bad day a while ago:
%table = {
foo => 1,
bar => 2
};
Without proper warnings (which are unavailable on the platform I was using), this creates a one-element hash with a key as the given hash reference and value undef. Note the subtle use of {}, which creates a new hash reference, and not (), which is an array used to populate the %table hash.
I was shocked Python's quasi-ternary operator wasn't a syntax error the first time I saw it:
X if Y else Z
This is stupid and common, but this syntax:
if ( x = y ) {
// do something
}
Has caught me about three times in the past year in a couple of different languages. I really like the R language's convention of using <- for assignment, like this:
x <- y
If the x = y syntax were made to mean x == y, and x <- y to mean assignment, my brain would make a smoother transition to and from math and programming.
C/C++'s bitvector syntax. The worst part about this is trying to google for it simply based on the syntax.
struct C {
unsigned int v1 : 12;
unsigned int v2 : 1;
};
C#'s ?? operator threw me for a loop the first time I saw it. Essentially it will return the LHS if it's non-null and the RHS if the LHS is null.
object bar = null;
object foo = bar ?? new Student(); // gets new Student()
Powershell's function calling semantics
function foo() {
params ($count, $name);
...
}
foo (5, "name")
For the non powershellers out there. This will work but not how you expect it to. It actually creates an array and passes it as the first argument. The second argument has no explicit value. The correct version is
foo 5 "name"
The first time I saw a function pointer in C++ I was confused. Worse, because the syntax has no key words, it was really hard to look up. What exactly does one type into a search engine for this?
int (*Foo)(float, char, char);
I ended up having to ask the local C++ guru what it was.
VB's (yeah yeah, I have to use it) "And" keyword - as in:
If Object IsNot Nothing And Object.Property Then
See that Object.Property reference, after I've made sure the object isn't NULL? Well, VB's "And" keyword * does * not * block * further * evaluation and so the code will fail.
VB does have, however, another keyword - AndAlso:
If Object IsNot Nothing AndAlso Object.Property Then
That will work as you'd expect and not explode when run.
I was once very confused by some C++ code that declared a reference to a local variable, but never used it. Something like
MyLock &foo;
(Cut me some slack on the syntax, I haven't done C++ in nearly 8 years)
Taking that seemingly unused variable out made the program start dying in obscure ways seemingly unrelated to this "unused" variable. So I did some digging, and found out that the default ctor for that class grabbed a thread lock, and the dtor released it. This variable was guarding the code against simultaneous updates without seemingly doing anything.
Javascript: This syntax ...
for(i in someArray)
... is for looping through arrays, or so I thought. Everything worked fine until another team member dropped in MooTools, and then all my loops were broken because the for(i in ...) syntax also goes over extra methods that have been added to the array object.
Had to translate some scientific code from old FORTRAN to C. A few things that ruined my day(s):
Punch-card indentation. The first 6 characters of every line were reserved for control characters, goto labels, comments, etc:
^^^^^^[code starts here]
c [commented line]
Goto-style numbering for loops (coupled with 6 space indentation):
do 20, i=0,10
do 10, j=0,10
do_stuff(i,j)
10 continue
20 continue
Now imagine there are multiple nested loops (i.e., do 20 to do 30) which have no differentiating indentation to know what context you are in. Oh, and the terminating statements are hundreds of lines away.
Format statement, again using goto labels. The code wrote to files (helpfully referred to by numbers 1,2,etc). To write the values of a,b,c to file we had:
write (1,51) a,b,c
So this writes a,b,c to file 1 using a format statement at the line marked with label 51:
51 format (f10.3,f10.3,f10.3)
These format lines were hundreds of lines away from where they were called. This was complicated by the author's decision to print newlines using:
write (1,51) [nothing here]
I am reliably informed by a lecturer in the group that I got off easy.
C's comma operator doesn't seem very obscure to me: I see it all the time, and if I hadn't, I could just look up "comma" in the index of K&R.
Now, trigraphs are another matter...
void main() { printf("wat??!\n"); } // doesn't print "wat??!"
Wikipedia has some great examples, from the genuinely confusing:
// Will the next line be executed????????????????/
a++;
to the bizarrely valid:
/??/
* A comment *??/
/
And don't even get me started on digraphs. I would be surprised if there's somebody here who can fully explain C's digraphs from memory. Quick, what digraphs does C have, and how do they differ from trigraphs in parsing?
Syntax like this in C++ with /clr enabled. Trying to create a Managed Dictionary object in C++.
gcroot<Dictionary<System::String^, MyObj^>^> m_myObjs;
An oldie:
In PL/1 there are no reserved words, so you can define variables, methods, etc. with the same name as the language keywords.
This can be a valid line of code:
IF ELSE THEN IF ELSE THEN
(Where ELSE is a boolean, and IF and THEN are functions, obviously.)
Iif(condition, expression, expression) is a function call, not an operator.
Both sides of the conditional are ALWAYS evaluated.
It always ruines my day if I have to read/write some kind of Polish notation as used in a lot of HP calculators...
PHP's ternary operator associates left to right. This caused me much anguish one day when I was learning PHP. For the previous 10 years I had been programming in C/C++ in which the ternary operator associates right to left.
I am still a little curious as to why the designers of PHP chose to do that when, in many other respects, the syntax of PHP matches that C/C++ fairly closely.
EDIT: nowadays I only work with PHP under duress.
Not really obscure, but whenever I code too much in one language, and go back to another, I start messing up the syntax of the latter. I always chuckle at myself when I realize that "#if" in C is not a comment (but rather something far more deadly), and that lines in Python do not need to end in a semicolon.
While performing maintentnace on a bit of C++ code I once spotted that someone had done something like this:
for (i=0; i<10; i++)
{
MyNumber += 1;
}
Yes, they had a loop to add 1 to a number 10 times.
Why did it ruin my day? The perpetrator had long since left, and I was having to bug fix their module. I thought that if they were doing something like this, goodness knows what else I was going to encounter!
AT&T assembler syntax >:(
This counter-intuitive, obscure syntax has ruined many of my days, for example, the simple Intel syntax assembly instruction:
mov dword es:[ebp-5], 1 /* Cool, put the value 1 into the
* location of ebp minus five.
* this is so obvious and readable, and hard to mistake
* for anything else */
translates into this in AT&T syntax
movl $1, %es:-4(%ebp) /* huh? what's "l"? 4 bytes? 8 bytes? arch specific??
* wait, why are we moving 1 into -4 times ebp?
* or is this moving -4 * ebp into memory at address 0x01?
* oh wait, YES, I magically know that this is
* really setting 4 bytes at ebp-5 to 1!
More...
mov dword [foo + eax*4], 123 /* Intel */
mov $123, foo(, %eax, 4) /* AT&T, looks like a function call...
* there's no way in hell I'd know what this does
* without reading a full manual on this syntax */
And one of my favorites.
It's as if they took the opcode encoding scheme and tried to incorporate it into the programming syntax (read: scale/index/base), but also tried to add a layer of abstraction on the data types, and merge that abstraction into the opcode names to cause even more confusion. I don't see how anyone can program seriously with this.
In a scripting language (Concordance Programming Language) for stand alone database software (Concordance) used for litigation document review, arrays were 0 indexed while (some) string functions were 1 indexed. I haven't touched it since.
This. I had my run in with it more then once.
GNU extensions are often fun:
my_label:
unsigned char *ptr = (unsigned char *)&&my_label;
*ptr = 5; // Will it segfault? Finding out is half the fun...
The syntax for member pointers also causes me grief, more because I don't use it often enough than because there's anything really tricky about it:
template<typename T, int T::* P>
function(T& t)
{
t.*P = 5;
}
But, really, who needs to discuss the obscure syntax in C++? With operator overloading, you can invent your own!

Resources