Is the code for interpreted languages re-interpreted every time the line is reached? - bytecode

Suppose that no bytecode is generated for a program, as in Ruby, Perl, or PHP. In that case, is line 1 below re-interpreted every time execution reaches line 1 again?
while ($indexArrayMoviesData < $countArrayMoviesData + $countNewlyAddedMoviesData) {
# do something
}
That is, if the loop runs 100,000 times, will that line be re-interpreted 100,000 times?
And if so, does bytecode creation help not only the initial start-up of the program, but also during execution (because the code doesn't need to be re-interpreted again)?

Typically, it'll be converted into bytecode and that bytecode will then be executed.
But in the case of PHP, for example, the bytecode is regenerated on every request/page view, unless you install a bytecode cache (or opcode cache, as it's often called for PHP) such as XCache, APC or eAccelerator.

For recent languages, including Perl, the code is precompiled before being executed, so most of the analysis work is performed only once.
This is not the case for shells, which interpret every line each time they execute them.

If the interpreter is sensible, it would hopefully check whether $countArrayMoviesData or $countNewlyAddedMoviesData are altered during the loop, and if they aren't, the sum could be calculated once and kept.
If the values are updated within the loop, then in all likelihood even the bytecode would require an addition operation to take place, so it wouldn't be any more efficient.
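To make that concrete, here is a minimal Go sketch (purely illustrative; it is not what any real PHP engine does, and the variable names just mirror the ones in the question) of the difference between re-evaluating the loop bound on every iteration and hoisting it out once:
package main

import "fmt"

func main() {
    countMovies := 50000
    countNewlyAdded := 50000

    // Naive form: the bound is re-evaluated on every iteration,
    // which is effectively what a line-by-line interpreter does.
    iterations := 0
    for i := 0; i < countMovies+countNewlyAdded; i++ {
        iterations++
    }

    // Hoisted form: neither operand changes inside the body,
    // so the sum can be computed once and kept.
    bound := countMovies + countNewlyAdded
    iterations2 := 0
    for i := 0; i < bound; i++ {
        iterations2++
    }

    fmt.Println(iterations, iterations2) // 100000 100000
}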

Very, very few interpreters will do this. An example is the ages-old, no-longer-used HyperTalk interpreter for HyperCard, where you could actually rewrite the text of your code programmatically (it's just a string!).
Even interpreters that don't produce bytecode will parse your code first, as it's hard to do that line by line and much easier to do it all at once. So a really simple interpreter will basically have a tree, with a node for the while loop that has two children: a "less than" expression for the condition and a block for the body of the loop.
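As a rough idea of what such a tree can look like once it is built, here is a small Go sketch with made-up node types; the point is that the same nodes get re-evaluated on every iteration, but nothing is ever parsed again:
package main

import "fmt"

// Hypothetical node types a simple interpreter might build once, at parse time.
type Node interface {
    Eval(env map[string]int) int
}

type Num struct{ Value int }
type Var struct{ Name string }
type Add struct{ Left, Right Node }
type Less struct{ Left, Right Node } // 1 means true, 0 means false in this toy

type While struct {
    Cond Node
    Body func(env map[string]int) // the loop body, simplified to a Go closure
}

func (n Num) Eval(env map[string]int) int { return n.Value }
func (v Var) Eval(env map[string]int) int { return env[v.Name] }
func (a Add) Eval(env map[string]int) int { return a.Left.Eval(env) + a.Right.Eval(env) }

func (l Less) Eval(env map[string]int) int {
    if l.Left.Eval(env) < l.Right.Eval(env) {
        return 1
    }
    return 0
}

// The same tree nodes are re-evaluated on every iteration;
// nothing is ever re-parsed, because parsing already happened.
func (w While) Eval(env map[string]int) int {
    for w.Cond.Eval(env) == 1 {
        w.Body(env)
    }
    return 0
}

func main() {
    env := map[string]int{"index": 0, "count": 3, "newlyAdded": 2}
    loop := While{
        Cond: Less{Var{"index"}, Add{Var{"count"}, Var{"newlyAdded"}}},
        Body: func(env map[string]int) { env["index"]++ }, // "# do something"
    }
    loop.Eval(env)
    fmt.Println("iterations:", env["index"]) // 5
}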

The answer to your question, as all consultants know, is "it depends."
You're right, in some interpreted languages, that line may be reinterpreted each time. I suspect most shells handle it roughly this way.
The original versions of BASIC also did it this way.
Most current interpreters will at least tokenize the language, so that the text doesn't need to be re-scanned each time. That is, a BASIC-ish program like
00010 LET A=42
00020 DO WHILE A > 0
00025 LET A = A - 1
00030 ENDDO
would be converted, at the least, into small tokens for the keywords and addresses for the variables, something like
LET $0003, 42
LABEL 00020
LETEST $0003, 0
IFTRUEGOTO 00030
SUB $0003, $0003, 1
GOTO 00020
LABEL 00030
where each upper-case word in the translation is a single integer internally. That way, there's a single lexical-analysis pass to translate it, after which the interpreter just interprets the token values.
Of course, once you go that far, you find yourself thinking "gee, why not use real opcodes?"
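Here is a deliberately crude Go sketch of that single lexical pass (the token codes are invented): keywords are mapped to small integers once, and from then on the interpreter only ever has to look at integers:
package main

import (
    "fmt"
    "strings"
)

// Invented token codes; each keyword is scanned exactly once and
// replaced by a small integer.
const (
    tokLet = iota
    tokDo
    tokWhile
    tokEnddo
    tokIdent
    tokNumber
    tokOther
)

var keywords = map[string]int{"LET": tokLet, "DO": tokDo, "WHILE": tokWhile, "ENDDO": tokEnddo}

// tokenize is deliberately simplistic: keywords become their codes, anything
// starting with a digit counts as a number, anything starting with a letter
// counts as an identifier, and the rest (operators, "A=42" fragments) is lumped together.
func tokenize(line string) []int {
    var toks []int
    for _, word := range strings.Fields(line) {
        if code, ok := keywords[strings.ToUpper(word)]; ok {
            toks = append(toks, code)
            continue
        }
        switch c := word[0]; {
        case c >= '0' && c <= '9':
            toks = append(toks, tokNumber)
        case c >= 'A' && c <= 'Z' || c >= 'a' && c <= 'z':
            toks = append(toks, tokIdent)
        default:
            toks = append(toks, tokOther)
        }
    }
    return toks
}

func main() {
    program := []string{
        "00010 LET A=42",
        "00020 DO WHILE A > 0",
        "00025 LET A = A - 1",
        "00030 ENDDO",
    }
    for _, line := range program {
        fmt.Println(line, "->", tokenize(line))
    }
}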

Related

readString vs readLine

I am writing an application that reads from a list of files, line by line, and does some processing. I want to use as little RAM as I can.
I came across this answer: https://stackoverflow.com/a/41741702/3531263, where the poster says that readString uses more RAM than readLine, and they have posted some code.
What I don't understand is how one uses more RAM. Because ultimately, the way their code is written, they are still writing an entire line to their buffer. So wouldn't that mean that if they had just used readString, it would have been the same thing?
the way their code is written, they are still writing an entire line to their buffer
Their code, yes. Your code might not need the whole line to be in memory at the same time. For example, your program is filtering a log file by request id, which is at the beginning of the line. It doesn't need to read the whole line, which may be a few megabytes or more, only to reject it because of a wrong request id. But with ReadString you don't have the luxury of choice.
I agree with Sergio. Also, have a look at the current implementation in the standard library. ReadLine calls ReadSlice('\n') once, then runs through a few branches to make sure the appropriate sentinel values or errors are returned with the converted data. On the other hand, ReadBytes and ReadString both loop over repeated calls to ReadSlice(delim), so it follows that they would necessarily be copying at least as much data into memory as ReadLine, and potentially much more if the delimiter wasn't found in the first call.
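To illustrate Sergio's point with a small Go sketch (the request-id prefix "req-42" and the log format are made up for the example), bufio.Reader.ReadLine lets you inspect just the first buffered chunk of each line and skim past the rest of a huge line whose id doesn't match:
package main

import (
    "bufio"
    "bytes"
    "fmt"
    "io"
    "strings"
)

// filterByRequestID prints only the lines whose beginning matches wantID.
// Because ReadLine returns at most one buffer-full at a time, a huge line
// with the wrong id can be skipped without ever holding it all in memory.
func filterByRequestID(r io.Reader, wantID string) {
    br := bufio.NewReaderSize(r, 64) // deliberately small buffer to show the effect
    for {
        chunk, isPrefix, err := br.ReadLine()
        if err == io.EOF {
            return
        }
        if err != nil {
            fmt.Println("read error:", err)
            return
        }
        keep := bytes.HasPrefix(chunk, []byte(wantID))
        if keep {
            fmt.Printf("%s", chunk)
        }
        // Drain the remainder of an over-long line, printing it only if kept.
        for isPrefix {
            chunk, isPrefix, err = br.ReadLine()
            if err != nil {
                return
            }
            if keep {
                fmt.Printf("%s", chunk)
            }
        }
        if keep {
            fmt.Println()
        }
    }
}

func main() {
    log := "req-42 " + strings.Repeat("x", 200) + "\n" +
        "req-07 short line\n" +
        "req-42 another match\n"
    filterByRequestID(strings.NewReader(log), "req-42")
}
With ReadString('\n'), the entire multi-megabyte line would have been accumulated into a single string before the prefix check could even run.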

How does an interpreter translate a for loop?

I understand that an interpreter translates your source code into machine code line by line and stops when it encounters an error.
I am wondering what an interpreter does when you give it a for loop.
E.g. I have the following (MATLAB) code:
for i = 1:10000
pi*pi
end
Does it really run through and translate the for loop line by line 10000 times?
With compilers, is the machine code shorter, consisting only of a set of statements that includes a goto control statement that is in effect for 10000 iterations?
I'm sorry if this doesn't make sense; I don't have very good knowledge of the underlying nuts and bolts of programming, but I want to quickly understand.
I understand that an interpreter translates your source code into machine code line by line and stops when it encounters an error.
This is wrong. There are many different types of interpreters, but very few execute code line-by-line and those that do (mostly shells) don't generate machine code at all.
In general there are four more-or-less common ways that code can be interpreted:
Statement-by-statement execution
This is presumably what you meant by line-by-line, except that usually semicolons can be used as an alternative for line breaks. As I said, this is pretty much only done by shells.
How this works is that a single statement is parsed at a time. That is, the parser reads tokens until the statement is finished. For simple statements that's until a statement terminator is reached, e.g. the end of the line or a semicolon. For other statements (such as if-statements, for-loops or while-loops) it's until the corresponding terminator (endif, fi, etc.) is found. Either way, the parser returns some kind of representation of the statement (usually some type of AST), which is then executed. No machine code is generated at any point.
This approach has the unusual property that a syntax error at the end of the file won't prevent the beginning of the file from being executed. However, everything is still parsed at most once, and the bodies of if-statements etc. will still be parsed even if the condition is false (so a syntax error inside an if false will still abort the script).
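A toy Go sketch of that property, using an invented two-statement language: each statement is parsed and executed as it is reached, so everything before the bad statement still runs:
package main

import (
    "fmt"
    "strconv"
    "strings"
)

// run executes a made-up mini-language one statement at a time.
// Valid statements: "print <text>" and "incr <n>". Anything else is a
// syntax error, but the statements before the error have already run.
func run(source string) {
    total := 0
    for _, stmt := range strings.Split(source, ";") {
        fields := strings.Fields(stmt)
        if len(fields) == 0 {
            continue
        }
        switch fields[0] {
        case "print":
            fmt.Println(strings.Join(fields[1:], " "))
        case "incr":
            if len(fields) != 2 {
                fmt.Println("syntax error in:", strings.TrimSpace(stmt))
                return
            }
            n, err := strconv.Atoi(fields[1])
            if err != nil {
                fmt.Println("syntax error in:", strings.TrimSpace(stmt))
                return
            }
            total += n
            fmt.Println("total is now", total)
        default:
            // Execution stops here, but everything above already ran.
            fmt.Println("syntax error in:", strings.TrimSpace(stmt))
            return
        }
    }
}

func main() {
    run("print hello; incr 2; incr 3; blorp 99; print never reached")
}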
AST-walking interpretation
Here the whole file is parsed at once and an AST is generated from it. The interpreter then simply walks the AST to execute the program. In principle this is the same as above, except that the entire program is parsed first.
Bytecode interpretation
Again the entire file is parsed at once, but instead of an AST-type structure the parser generates some bytecode format. This bytecode is then executed instruction-by-instruction.
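Here is a stripped-down Go sketch of that instruction-by-instruction loop, with an invented stack-machine instruction set:
package main

import "fmt"

// Invented opcodes for a tiny stack machine.
const (
    opPush  = iota // push the following value onto the stack
    opAdd          // pop two values, push their sum
    opMul          // pop two values, push their product
    opPrint        // pop the top of the stack and print it
    opHalt         // stop execution
)

// execute runs the bytecode instruction by instruction.
func execute(code []int) {
    var stack []int
    pc := 0 // program counter; this loop is the whole "interpreter"
    for {
        switch code[pc] {
        case opPush:
            pc++
            stack = append(stack, code[pc])
        case opAdd:
            a, b := stack[len(stack)-1], stack[len(stack)-2]
            stack = append(stack[:len(stack)-2], a+b)
        case opMul:
            a, b := stack[len(stack)-1], stack[len(stack)-2]
            stack = append(stack[:len(stack)-2], a*b)
        case opPrint:
            fmt.Println(stack[len(stack)-1])
            stack = stack[:len(stack)-1]
        case opHalt:
            return
        }
        pc++
    }
}

func main() {
    // Bytecode for: print (2 + 3) * 10
    execute([]int{opPush, 2, opPush, 3, opAdd, opPush, 10, opMul, opPrint, opHalt})
}
A real bytecode format would have many more opcodes and a proper operand encoding, but the shape of the loop (fetch, decode, execute, advance the program counter) is the same.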
JIT-compilation
This is the only variation that actually generates machine code. There are two variations of this:
Generate machine code for all functions before they're called. This might mean translating the whole file as soon as it's loaded or translating each function right before it's called. After the code has been generated, execute it.
Start by interpreting the bytecode and then JIT-compile specific functions or code paths individually once they have been executed a number of times. This allows us to make certain optimizations based on usage data that has been collected during interpretation. It also means we don't pay the overhead of compilation on functions that aren't called a lot. Some implementations (specifically some JavaScript engines) also recompile already-JITed code to optimize based on newly gathered usage data.
So in summary: The overlap between implementations that execute the code line-by-line (or rather statement-by-statement) and implementations that generate machine code should be pretty close to zero. And those implementations that do go statement-by-statement still only parse the code once.

How does an interpreter run code?

Reading all the compiled vs interpreted articles, it seems like compiled means the machine will run the compiled code directly, whereas with interpreted code, the interpreter will run the code. But how does the interpreter run the code if it's on a machine? Doesn't it still end up having to convert what it's interpreting into machine code and STILL having the machine run it? At the end of the day, everything has to end up being machine code in order for the machine to run it, right? It seems like interpreted just means that it's running through the language one line at a time, whereas compiled means going through it all at once. After that, it's pretty much the same, right?
Related: How programs written in interpreted languages are executed if they are never translated into machine language?
No, it doesn't need to convert it to machine code. The statements merely provide instructions to the interpreter, which the interpreter then carries out itself.
Consider a really dumb "language" that consists of the following instructions:
add [number]
subtract [number]
divide [number]
multiply [number]
We could implement an "interpreter" like this (written in C#):
public static void ExecuteStatements(List<string> instructions)
{
    int result = 0;
    foreach (string instruction in instructions)
    {
        string[] action = instruction.Split(' ');
        int number = int.Parse(action[1]);
        switch (action[0].Trim().ToLower())
        {
            case "add":
                result += number;
                break;
            case "subtract":
                result -= number;
                break;
            case "divide":
                result /= number;
                break;
            case "multiply":
                result *= number;
                break;
        }
    }
    Console.WriteLine("Result: " + result);
}
The ExecuteStatements method will be compiled to machine code. Separately, we have a text file like this:
add 1
subtract 1
add 10
multiply 50
divide 5
The result will be 100. The strings are never actually compiled to anything - they just tell the interpreter what actions to take.
Obviously, this "language" isn't even close to Turing-complete, but the point is that at no point are we somehow "translating" this into machine code - the "interpreter" just takes whatever action is specified.
I actually wrote an interpreter once as part of a Test Automation framework. When someone did a call against the API, we would intercept the call, use reflection on it to determine what the call was and what the parameters were, and serialize the reflection metadata to JSON. We then later deserialized the JSON and used reflection to call whatever methods had been run before with the same parameters. The engine consumed JSON, not machine code, and used that to figure out what calls it should perform; at no point did it create or need any machine code. The critical thing was that, assuming that it had been given a valid script, the engine "knew" how to perform all of the specified actions itself, so it never needed to "offload" any of the instructions to the machine itself.
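Here is a rough Go sketch of that record-and-replay idea (the type names, handler names and JSON layout are invented for illustration; they are not the framework described above): the engine consumes serialized call descriptions and dispatches them to handlers it already knows how to run, never touching machine code.
package main

import (
    "encoding/json"
    "fmt"
    "log"
)

// RecordedCall is a hypothetical serialized form of an intercepted API call.
type RecordedCall struct {
    Method string   `json:"method"`
    Args   []string `json:"args"`
}

// The replay engine "knows" how to perform each recorded action itself;
// it never generates machine code from the JSON.
var handlers = map[string]func(args []string){
    "Login":     func(args []string) { fmt.Println("logging in as", args[0]) },
    "OpenPage":  func(args []string) { fmt.Println("opening page", args[0]) },
    "ClickItem": func(args []string) { fmt.Println("clicking", args[0]) },
}

func replay(script []byte) error {
    var calls []RecordedCall
    if err := json.Unmarshal(script, &calls); err != nil {
        return err
    }
    for _, c := range calls {
        h, ok := handlers[c.Method]
        if !ok {
            return fmt.Errorf("unknown method %q", c.Method)
        }
        h(c.Args)
    }
    return nil
}

func main() {
    script := []byte(`[
        {"method": "Login",     "args": ["testuser"]},
        {"method": "OpenPage",  "args": ["/settings"]},
        {"method": "ClickItem", "args": ["Save"]}
    ]`)
    if err := replay(script); err != nil {
        log.Fatal(err)
    }
}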
Here's the key insight: the interpreted code itself is quite literally doing nothing - all it's doing is feeding the interpreter which actions it needs to take. The interpreter already "knows" how to take all of the actions you can perform in the interpreted language, so no additional machine code is required.
As an analogy, think of the code you're interpreting as a recipe and the interpreter as a cook. The recipe specifies actions like "add 1 cup of flour and mix." The cook knows how to follow whatever directions he finds in the recipe and he performs them himself. Strictly speaking, the recipe isn't actually doing anything - it's just sitting there for the cook to read so that the cook can know what actions to take. There's no need for the recipe to actually be a cook in order for the recipe to be completed - it just needs someone who knows how to follow its directions.
TL;DR You don't need to "translate" it into machine code - you just need to have enough information for your interpreter to know what actions to take. A good interpreter already "knows" how to take whatever actions the language could implement, so there's no need to create any additional machine code.

AppleScript has a limit on the number of lines.

I am making an app with a TON of features. My problem is that AppleScript seems to have a cut-off point. After a certain number of lines, the script stops working. It basically only works up to that point; once it gets there, it stops. I have moved the code around to make sure that it is not an error within the code. Help?
I might be wrong, but I believe a single long script is not a good way to organize your code.
It's really hard to read, to debug or to maintain, as one slight change in one part can have unexpected consequences in another part of your file.
If your script is very long, I suggest you break your code into multiple parts.
First, you may use functions if some part of the code is reused several times.
Another benefit of the functions is that you can validate them separately from the rest of the execution code.
Besides, it makes your code easier to read.
on doWhatYouHaveTo(anArgument)
    say "hello!"
end doWhatYouHaveTo
If the functions are used by different scripts, you may want to put your functions in a separate library that you will call as needed.
set cc to load script alias ((path to library folder as string) & "Scripts:Common:CommonLibrary.app")
cc's doWhatYouHaveTo(oneArgument)
Lastly, a thing that I sometimes do is call a different script with some arguments, if a long piece of code serves slightly different purposes:
run script file {mainFileName} with parameters {oneWay}
This last trick has a great yet curious benefit: it can speed up execution for a reason I've never been able to explain (and when I say speed up, I mean reduce the execution time by a factor of 17 or so for the very same code).

Debugging a program without source code (Unix / GDB)

This is homework. Tips only, no exact answers please.
I have a compiled program (no source code) that takes in command line arguments. There is a correct sequence of a given number of command line arguments that will make the program print out "Success." Given the wrong arguments it will print out "Failure."
One thing that is confusing me is that the instructions mention two system tools (without naming them) which will help in figuring out the correct arguments. The only tool I'm familiar with (unless I'm overlooking something) is GDB, so I believe I am missing a critical component of this challenge.
The challenge is to figure out the correct arguments. So far I've run the program in GDB and set a breakpoint at main but I really don't know where to go from there. Any pro tips?
Are you sure you have to debug it? It would be easier to disassemble it. When you disassemble it, look for cmp instructions.
There exist not only tools to disassemble x86 binaries into assembler code listings, but also some which attempt to produce a more high-level or readable listing. Try googling and see what you find. I'd be specific, but that would be counterproductive if your job is to learn some reverse-engineering skills.
It is possible that the code is something like this: if Arg(1) = 'FOO' then print "Success". So you might not need to disassemble at all. Instead, you might only need to find a tool which dumps out all the strings in the executable that look like sequences of ASCII characters; there exist many utilities that will do this (though this helps less if the sequence you are supposed to input is not in the set of characters easily typed from the keyboard). If the program has been very carefully constructed, the author won't have left "FOO" (if that was the "password") in plain sight, but will have tried to obscure it somewhat.
Personally, I would start with an ltrace of the program with any arbitrary set of arguments. I'd then use the strings command and guess from that what some of the hidden argument literals might be. (Let's assume, for the moment, that the professor hasn't encrypted or obfuscated the strings and that they appear in the binary as literals.) Then try again with one or two of them (or the requisite number, if you know it).
If you're lucky, the program was compiled and provided to you without running strip, in which case you might have the symbol table to help. Then you could try single-stepping through the program (read the gdb manuals). It might be tedious, but there are ways to set a breakpoint and tell the debugger to run through some function call (such as any from the standard libraries) and stop upon return. Do this repeatedly: identify where it's calling into standard or external libraries, set a breakpoint for the next instruction after the return, let gdb run the process through the call, and then inspect what the code is doing in between those calls.
Coupled with the ltrace, it should be fairly easy to see the sequencing of the strcmp() (or similar) calls. As you see each string against which your input is being compared, you can break out of the whole process and re-invoke gdb and the program with that one argument, trace through till the next one, and so on. Or you might learn some more advanced gdb tricks and actually modify your argument vector and restart main() from scratch.
It actually sounds like fun, and I might have my wife whip up a simple binary for me to try this on. I might also create a little program to generate binaries of this sort. I'm thinking of a little #INCLUDE in the sources which provides the "passphrase" of arguments, and a makefile that selects three to five words from /usr/dict/words, generates that #INCLUDE file from a template, then compiles the binary using that sequence.
