How does the interpretive program(ex:perl,shell) work - shell

I propose a hypothesis: 1. the operating system creates a process space to start the interpreter; 2. the interpreter creates a new process space to start the program that needs to be interpreted, translating the first statement into machine language; 3. the execution of the first statement ends and interrupts; 4. the interpreter translates the next statement and dynamically modifies and creates new instructions. Well, I can't make it up. I can't understand the concept of explaining and executing.

Here's a example interpreter:
while (<>) {
my ($cmd, #args) = split;
if ($cmd eq '...') { ... }
elsif ($cmd eq '...') { ... }
elsif ($cmd eq '...') { ... }
else { ... }
}
This points out that the interpreted program isn't run in a separate process from the interpreter.
This also points out there isn't necessarily any translation to machine language.
Due note that Perl is a compiled language rather than interpreted one.
$ perl -MO=Concise,-exec -e'print("Hello, world!\n");'
1 <0> enter
2 <;> nextstate(main 1 -e:1) v:{
3 <0> pushmark s
4 <$> const[PV "Hello, world!\n"] s
5 <#> print vK
6 <#> leave[1 ref] vKP/REFC
-e syntax OK
That said, the compiled form is not native instructions. There are different ways this can be handled, but Perl effectively interprets these. The following is that interpreter:
int
Perl_runops_standard(pTHX)
{
OP *op = PL_op;
PERL_DTRACE_PROBE_OP(op);
while ((PL_op = op = op->op_ppaddr(aTHX))) {
PERL_DTRACE_PROBE_OP(op);
}
PERL_ASYNC_CHECK();
TAINT_NOT;
return 0;
}
(Copied from here.)
The ops are really data structures arranged in a linked list (with other pointers for jumps) rather than a stream of bytes encoding instructions. The above loop traverses the list, executing the function associated with each op. These function returns the address of the next op to execute, thus forming the program.
Some languages probably take a similar approach. Other languages definitely take a different approach.

Related

Tcl syntax explanation

Tcl code:
for {local i 0 } { $i < $bsLen } { incr i } {
local topb [bs rhex $bsStream 1]
local botb [bs rhex $bsStream 1]
local hexStr [strcat $hexStr $topb $botb ]
}
What are some documents that can help to explain the above syntax?
Only for and incr are standard Tcl commands in that sample. If you know C or Java or C#, you'll probably be able to guess what those do without too much difficulty; the syntax is a little different but not very.
The other commands are these, but I know not who defines them:
local — appears to be setting a local variable. What's wrong with using set? I don't know…
bs — specifically bs rhex, and it appears to be getting a value (in hex?) from a stream (named in the bsStream variable). This one is totally guesswork.
strcat — I'd guess this is doing string concatenation of its arguments, as it has the name of a standard C function that does that (doing anything else would be weird and designed to trip its own programmer up).
That last line would be more conventionally written:
append hexStr $topb $botb
as the append command is optimised (in its memory management, which tends to dominate these sorts of things) for the building-a-string-piecemeal case. In particular, it doesn't demonstrate quadratic Shlemiel the painter misbehaviour.
There is nothing like the local standard keyword in Tcl.
for {set i 0 } { $i < $bsLen } { incr i } {
set topb [bs rhex $bsStream 1]
set botb [bs rhex $bsStream 1]
sethexStr [strcat $hexStr $topb $botb ]
}
for {initialize} {condition_check} {inclriment} {body to execute}

Why can't dead code detection be fully solved by a compiler?

The compilers I've been using in C or Java have dead code prevention (warning when a line won't ever be executed). My professor says that this problem can never be fully solved by compilers though. I was wondering why that is. I am not too familiar with the actual coding of compilers as this is a theory-based class. But I was wondering what they check (such as possible input strings vs acceptable inputs, etc.), and why that is insufficient.
The dead code problem is related to the Halting problem.
Alan Turing proved that it is impossible to write a general algorithm that will be given a program and be able to decide whether that program halts for all inputs. You may be able to write such an algorithm for specific types of programs, but not for all programs.
How does this relate to dead code?
The Halting problem is reducible to the problem of finding dead code. That is, if you find an algorithm that can detect dead code in any program, then you can use that algorithm to test whether any program will halt. Since that has been proven to be impossible, it follows that writing an algorithm for dead code is impossible as well.
How do you transfer an algorithm for dead code into an algorithm for the Halting problem?
Simple: you add a line of code after the end of the program you want to check for halt. If your dead-code detector detects that this line is dead, then you know that the program does not halt. If it doesn't, then you know that your program halts (gets to the last line, and then to your added line of code).
Compilers usually check for things that can be proven at compile-time to be dead. For example, blocks that are dependent on conditions that can be determined to be false at compile time. Or any statement after a return (within the same scope).
These are specific cases, and therefore it's possible to write an algorithm for them. It may be possible to write algorithms for more complicated cases (like an algorithm that checks whether a condition is syntactically a contradiction and therefore will always return false), but still, that wouldn't cover all possible cases.
Well, let's take the classical proof of the undecidability of the halting problem and change the halting-detector to a dead-code detector!
C# program
using System;
using YourVendor.Compiler;
class Program
{
static void Main(string[] args)
{
string quine_text = #"using System;
using YourVendor.Compiler;
class Program
{{
static void Main(string[] args)
{{
string quine_text = #{0}{1}{0};
quine_text = string.Format(quine_text, (char)34, quine_text);
if (YourVendor.Compiler.HasDeadCode(quine_text))
{{
System.Console.WriteLine({0}Dead code!{0});
}}
}}
}}";
quine_text = string.Format(quine_text, (char)34, quine_text);
if (YourVendor.Compiler.HasDeadCode(quine_text))
{
System.Console.WriteLine("Dead code!");
}
}
}
If YourVendor.Compiler.HasDeadCode(quine_text) returns false, then the line System.Console.WriteLn("Dead code!"); won't be ever executed, so this program actually does have dead code, and the detector was wrong.
But if it returns true, then the line System.Console.WriteLn("Dead code!"); will be executed, and since there is no more code in the program, there is no dead code at all, so again, the detector was wrong.
So there you have it, a dead-code detector that returns only "There is dead code" or "There is no dead code" must sometimes yield wrong answers.
If the halting problem is too obscure, think of it this way.
Take a mathematical problem that is believed to be true for all positive integer's n, but hasn't been proven to be true for every n. A good example would be Goldbach's conjecture, that any positive even integer greater than two can be represented by the sum of two primes. Then (with an appropriate bigint library) run this program (pseudocode follows):
for (BigInt n = 4; ; n+=2) {
if (!isGoldbachsConjectureTrueFor(n)) {
print("Conjecture is false for at least one value of n\n");
exit(0);
}
}
Implementation of isGoldbachsConjectureTrueFor() is left as an exercise for the reader but for this purpose could be a simple iteration over all primes less than n
Now, logically the above must either be the equivalent of:
for (; ;) {
}
(i.e. an infinite loop) or
print("Conjecture is false for at least one value of n\n");
as Goldbach's conjecture must either be true or not true. If a compiler could always eliminate dead code, there would definitely be dead code to eliminate here in either case. However, in doing so at the very least your compiler would need to solve arbitrarily hard problems. We could provide problems provably hard that it would have to solve (e.g. NP-complete problems) to determine which bit of code to eliminate. For instance if we take this program:
String target = "f3c5ac5a63d50099f3b5147cabbbd81e89211513a92e3dcd2565d8c7d302ba9c";
for (BigInt n = 0; n < 2**2048; n++) {
String s = n.toString();
if (sha256(s).equals(target)) {
print("Found SHA value\n");
exit(0);
}
}
print("Not found SHA value\n");
we know that the program will either print out "Found SHA value" or "Not found SHA value" (bonus points if you can tell me which one is true). However, for a compiler to be able to reasonably optimise that would take of the order of 2^2048 iterations. It would in fact be a great optimisation as I predict the above program would (or might) run until the heat death of the universe rather than printing anything without optimisation.
I don't know if C++ or Java have an Eval type function, but many languages do allow you do call methods by name. Consider the following (contrived) VBA example.
Dim methodName As String
If foo Then
methodName = "Bar"
Else
methodName = "Qux"
End If
Application.Run(methodName)
The name of the method to be called is impossible to know until runtime. Therefore, by definition, the compiler cannot know with absolute certainty that a particular method is never called.
Actually, given the example of calling a method by name, the branching logic isn't even necessary. Simply saying
Application.Run("Bar")
Is more than the compiler can determine. When the code is compiled, all the compiler knows is that a certain string value is being passed to that method. It doesn't check to see if that method exists until runtime. If the method isn't called elsewhere, through more normal methods, an attempt to find dead methods can return false positives. The same issue exists in any language that allows code to be called via reflection.
Unconditional dead code can be detected and removed by advanced compilers.
But there is also conditional dead code. That is code that cannot be known at the time of compilation and can only be detected during runtime. For example, a software may be configurable to include or exclude certain features depending on user preference, making certain sections of code seemingly dead in particular scenarios. That is not be real dead code.
There are specific tools that can do testing, resolve dependencies, remove conditional dead code and recombine the useful code at runtime for efficiency. This is called dynamic dead code elimination. But as you can see it is beyond the scope of compilers.
A simple example:
int readValueFromPort(const unsigned int portNum);
int x = readValueFromPort(0x100); // just an example, nothing meaningful
if (x < 2)
{
std::cout << "Hey! X < 2" << std::endl;
}
else
{
std::cout << "X is too big!" << std::endl;
}
Now assume that the port 0x100 is designed to return only 0 or 1. In that case the compiler cannot figure out that the else block will never be executed.
However in this basic example:
bool boolVal = /*anything boolean*/;
if (boolVal)
{
// Do A
}
else if (!boolVal)
{
// Do B
}
else
{
// Do C
}
Here the compiler can calculate out the the else block is a dead code.
So the compiler can warn about the dead code only if it has enough data to to figure out the dead code and also it should know how to apply that data in order to figure out if the given block is a dead code.
EDIT
Sometimes the data is just not available at the compilation time:
// File a.cpp
bool boolMethod();
bool boolVal = boolMethod();
if (boolVal)
{
// Do A
}
else
{
// Do B
}
//............
// File b.cpp
bool boolMethod()
{
return true;
}
While compiling a.cpp the compiler cannot know that boolMethod always returns true.
The compiler will always lack some context information. E.g. you might know, that a double value never exeeds 2, because that is a feature of the mathematical function, you use from a library. The compiler does not even see the code in the library, and it can never know all features of all mathematical functions, and detect all weired and complicated ways to implement them.
The compiler doesn't necessarily see the whole program. I could have a program that calls a shared library, which calls back into a function in my program which isn't called directly.
So a function which is dead with respect to the library it's compiled against could become alive if that library was changed at runtime.
If a compiler could eliminate all dead code accurately, it would be called an interpreter.
Consider this simple scenario:
if (my_func()) {
am_i_dead();
}
my_func() can contain arbitrary code and in order for the compiler to determine whether it returns true or false, it will either have to run the code or do something that is functionally equivalent to running the code.
The idea of a compiler is that it only performs a partial analysis of the code, thus simplifying the job of a separate running environment. If you perform a full analysis, that isn't a compiler any more.
If you consider the compiler as a function c(), where c(source)=compiled code, and the running environment as r(), where r(compiled code)=program output, then to determine the output for any source code you have to compute the value of r(c(source code)). If calculating c() requires the knowledge of the value of r(c()) for any input, there is no need for a separate r() and c(): you can just derive a function i() from c() such that i(source)=program output.
Others have commented on the halting problem and so forth. These generally apply to portions of functions. However it can be hard/impossible to know whether even an entire type (class/etc) is used or not.
In .NET/Java/JavaScript and other runtime driven environments there's nothing stopping types being loaded via reflection. This is popular with dependency injection frameworks, and is even harder to reason about in the face of deserialisation or dynamic module loading.
The compiler cannot know whether such types would be loaded. Their names could come from external config files at runtime.
You might like to search around for tree shaking which is a common term for tools that attempt to safely remove unused subgraphs of code.
Take a function
void DoSomeAction(int actnumber)
{
switch(actnumber)
{
case 1: Action1(); break;
case 2: Action2(); break;
case 3: Action3(); break;
}
}
Can you prove that actnumber will never be 2 so that Action2() is never called...?
I disagree about the halting problem. I wouldn't call such code dead even though in reality it will never be reached.
Instead, lets consider:
for (int N = 3;;N++)
for (int A = 2; A < int.MaxValue; A++)
for (int B = 2; B < int.MaxValue; B++)
{
int Square = Math.Pow(A, N) + Math.Pow(B, N);
float Test = Math.Sqrt(Square);
if (Test == Math.Trunc(Test))
FermatWasWrong();
}
private void FermatWasWrong()
{
Press.Announce("Fermat was wrong!");
Nobel.Claim();
}
(Ignore the type and overflow errors) Dead code?
Look at this example:
public boolean isEven(int i){
if(i % 2 == 0)
return true;
if(i % 2 == 1)
return false;
return false;
}
The compiler can't know that an int can only be even or odd. Therefore the compiler must be able to understand the semantics of your code. How should this be implemented? The compiler can't ensure that the lowest return will never be executed. Therefore the compiler can't detect the dead code.

Automated GOTO removal algorithm

I've heard that it's been proven theoretically possible to express any control flow in a Turing-complete language using only structured programming constructs, (conditionals, loops and loop-breaks, and subroutine calls,) without any arbitrary GOTO statements. Is there any way to use that theory to automate refactoring of code that contains GOTOs into code that does not?
Let's say I have an arbitrary single subroutine in a simple imperative language, such as C or Pascal. I also have a parser that can verify that this subroutine is valid, and produce an Abstract Syntax Tree from it. But the code contains GOTOs and Labels, which could jump forwards or backwards to any arbitrary point, including into or out of conditional or loop blocks, but not outside of the subroutine itself.
Is there an algorithm that could take this AST and rework it into new code which is semantically identical, but does not contain any Labels or GOTO statements?
In principle, it is always possible to do this, though the results might not be pretty.
One way to always eliminate gotos is to transform the program in the following way. Start off by numbering all the instructions in the original program. For example, given this program:
start:
while (true) {
if (x < 5) goto start;
x++
}
You could number the statements like this:
0 start:
1 while (x < 3) {
2 if (x < 5) goto start;
3 x++
}
To eliminate all gotos, you can simulate the flow of the control through this function by using a while loop, an explicit variable holding the program counter, and a bunch of if statements. For example, you might translate the above code like this:
int PC = 0;
while (PC <= 3) {
if (PC == 0) {
PC = 1; // Label has no effect
} else if (PC == 1) {
if (x < 3) PC = 4; // Skip loop, which ends this function.
else PC = 2; // Enter loop.
} else if (PC == 2) {
if (x < 5) PC = 0; // Simulate goto
else PC = 3; // Simulate if-statement fall-through
} else if (PC == 3) {
x++;
PC = 1; // Simulate jump back up to the top of the loop.
}
}
This is a really, really bad way to do the translation, but it shows that in theory it is always possible to do this. Actually implementing this would be very messy - you'd probably number the basic blocks of the function, then generate code that puts the basic blocks into a loop, tracks which basic block is currently executing, then simulates the effect of running a basic block and the transition from that basic block to the appropriate next basic block.
Hope this helps!
I think you want to read Taming Control Flow by Erosa and Hendren, 1994. (Earlier link on Google scholar).
By the way, loop-breaks are also easy to eliminate. There is a simple mechanical procedure involving the creating of a boolean state variable and the restructuring of nested conditionals to create straight-line control flow. It does not produce pretty code :)
If your target language has tail-call optimization (and, ideally, inlining), you can mechanically remove both break and continue by turning the loop into a tail-recursive function. (If the index variable is modified by the loop body, you need to work harder at this. I'll just show the simplest case.) Here's the transformation of a simple loop:
for (Type Index = Start; function loop(Index: Type):
Condition(Index); if (Condition)
Index = Advance(Index)){ return // break
Body Body
} return loop(Advance(Index)) // continue
loop(Start)
The return statements labeled "continue" and "break" are precisely the transformation of continue and break. Indeed, the first step in the procedure might have been to rewrite the loop into its equivalent form in the original language:
{
Type Index = Start;
while (true) {
if (!Condition(Index))
break;
Body;
continue;
}
}
I use either/both Polyhedron's spag and vast's 77to90 to begin the process of refactoring fortran and then converting it to matlab source. However, these tools always leave 1/4 to 1/2 of the goto's in the program.
I wrote up a goto remover which accomplishes something similar to what you were describing: it takes fortran code and refactors all the remaining goto's from a program and replacing them with conditionals and do/cycle/exit's which can then be converted into other languages like matlab. You can read more about the process I use here:
http://engineering.dartmouth.edu/~d30574x/consulting/consulting_gotorefactor.html
This program could be adapted to work with other languages, but I have not gotten than far yet.

Does awk support dynamic user-defined variables?

awk supports this:
awk '{print $(NF-1);}'
but not for user-defined variables:
awk '{a=123; b="a"; print $($b);}'
by the way, shell supports this:
a=123;
b="a";
eval echo \${$b};
How can I achieve my purpose in awk?
OK, since some of us like to eat spaghetti through their nose, here is some actual code that I wrote in the past :-)
First of all, getting a self modifying code in a language that does not support it will be extremely non-trivial.
The idea to allow dynamic variables, function names, in a language that does not support one is very simple. At some state in the program, you want a dynamic anything to self modify your code, and resume execution
from where you left off. a eval(), that is.
This is all very trivial, if the language supports eval() and such equlavant. However, awk does not have such function. Therefore, you, the programmer has to provide a interface to such thing.
To allow all this to happen, you have three main problems
How to get our self so we can modify it
How to load the modified code, and resume from where we left off
Finding a way for the interpreter to accept our modified code
How to get our self so we can modify it
Here is a example code, suitable for direct execution.
This one is the infastrucure that I inject for enviroments running gawk, as it requires PROCINFO
echo ""| awk '
function push(d){stack[stack[0]+=1]=d;}
function pop(){if(stack[0])return stack[stack[0]--];return "";}
function dbg_printarray(ary , x , s,e, this , i ){
x=(x=="")?"A":x;for(i=((s)?s:1);i<=((e)?e:ary[0]);i++){print x"["i"]=["ary[i]"]"}}
function dbg_argv(A ,this,p){
A[0]=0;p="/proc/"PROCINFO["pid"]"/cmdline";push(RS);RS=sprintf("%c",0);
while((getline v <p)>0)A[A[0]+=1]=v;RS=pop();close(p);}
{
print "foo";
dbg_argv(A);
dbg_printarray(A);
print "bar";
}'
Result:
foo
A[1]=[awk]
A[2]=[
function push(d){stack[stack[0]+=1]=d;}
function pop(){if(stack[0])return stack[stack[0]--];return "";}
function dbg_printarray(ary , x , s,e, this , i ){
x=(x=="")?"A":x;for(i=((s)?s:1);i<=((e)?e:ary[0]);i++){print x"["i"]=["ary[i]"]"}}
function dbg_argv(A ,this,p){
A[0]=0;p="/proc/"PROCINFO["pid"]"/cmdline";push(RS);RS=sprintf("%c",0);
while((getline v <p)>0)A[A[0]+=1]=v;RS=pop();close(p);}
{
print "foo";
dbg_argv(A);
dbg_printarray(A);
print "bar";
}]
bar
As you can see, as long as the OS does not play with our args, and /proc/ is available, it is possible
to read our self. This may appear useless at first, but we need it for push/pop of our stack,
so that our execution state can be enbedded within the code, so we can save/resume and survive OS shutdown/reboots
I have left out the OS detection function and the bootloader (written in awk), because, if I publish that,
kids can build platform independent polynormal code, and it is easy to cause havoc with it.
how to load the modified code, and resume from where we left off
Now, normaly you have push() and pop() for registers, so you can save your state and play with
your self, and resume from where you left off. a Call and reading your stack is a typical way to get the
memory address.
Unfortunetly, in awk, under normal situations we can not use pointers (with out a lot of dirty work),
or registers (unless you can inject other stuff along the way).
However you need a way to suspend and resume from your code.
The idea is simple. Instead of letting awk in control of your loops and while, if else conditions,
recrusion depth, and functions you are in, the code should.
Keep a stack, list of variable names, list of function names, and manage it your self.
Just make sure that your code always calls self_modify( bool ) constantly, so that even upon sudden failure,
As soon as the script is re-run, we can enter self_modify( bool ) and resume our state.
When you want to self modify your code, you must provide a custom made
write_stack() and read_stack() code, that writes out the state of stack as string, and reads string from
the values out from the code embedded string itself, and resume the execution state.
Here is a small piece of code that demonstrates the whole flow
echo ""| awk '
function push(d){stack[stack[0]+=1]=d;}
function pop(){if(stack[0])return stack[stack[0]--];return "";}
function dbg_printarray(ary , x , s,e, this , i ){
x=(x=="")?"A":x;for(i=((s)?s:1);i<=((e)?e:ary[0]);i++){print x"["i"]=["ary[i]"]"}}
function _(s){return s}
function dbg_argv(A ,this,p){
A[0]=0;p="/proc/"PROCINFO["pid"]"/cmdline";push(RS);RS=sprintf("%c",0);
while((getline v <p)>0)A[A[0]+=1]=v;RS=pop();close(p);}
{
_(BEGIN_MODIFY"|");print "#foo";_("|"END_MODIFY)
dbg_argv(A);
sub( \
"BEGIN_MODIFY\x22\x5c\x7c[^\x5c\x7c]*\x5c\x7c\x22""END_MODIFY", \
"BEGIN_MODIFY\x22\x7c\x22);print \"#"PROCINFO["pid"]"\";_(\x22\x7c\x22""END_MODIFY" \
,A[2])
print "echo \x22\x22\x7c awk \x27"A[2]"";
print "function bar_"PROCINFO["pid"]"_(s){print \x22""doe\x22}";
print "\x27"
}'
Result:
Exactly same as our original code, except
_(BEGIN_MODIFY"|");print "65964";_("|"ND_MODIFY)
and
function bar_56228_(s){print "doe"}
at the end of code
Now, this may seem useless, as we are only replaceing code print "foo"; with our pid.
But it becomes usefull, when there are multiple _() with separate MAGIC strings to identify BLOCKS,
and a custome made multi line string replacement routine instead of sub()
You msut provide BLOCKS for stack, function list, execution point, as a bare minimum.
And notice that the last line contains bar
This it self is just a sting, but when this code repeatedly gets executed, notice that
function bar_56228_(s){print "doe"}
function bar_88128_(s){print "doe"}
...
and it keeps growing. While the example is intentionally made so that it does nothing useful,
if we provide a routine to call bar_pid_(s) instead of that print "foo" code,
Sudenly it means we have eval() on our hands :-)
Now, isn't eval() usefull :-)
Don't forget to provide a custome made remove_block() function so that the code maintains
a reasonable size, instead of growing every time you execute.
Finding a way for the interpreter to accept our modified code
Normally calling a binary is trivial. However, when doing so from with in awk, it becomes difficult.
You may say system() is the way.
There are two problems to that.
system() may not work on some envoroments
it blocks while you are executing code, trus you can not perform recrusive calls and keep the user happy at the same time.
If you must use system(), ensure that it does not block.
A normal call to system("sleep 20 && echo from-sh & ") will not work.
The solution is simple,
echo ""|awk '{print "foo";E="echo ep ; sleep 20 && echo foo & disown ; "; E | getline v;close(E);print "bar";}'
Now you have a async system() call that does not block :-)
Not at the moment. However, if you provide a wrapper, it is (somewhat hacky and dirty) possible.
The idea is to use # operator, introduced in the recent versions of gawk.
This # operator is normally used to call a function by name.
So if you had
function foo(s){print "Called foo "s}
function bar(s){print "Called bar "s}
{
var = "";
if(today_i_feel_like_calling_foo){
var = "foo";
}else{
var = "bar";
}
#var( "arg" ); # This calls function foo(), or function bar() with "arg"
}
Now, this is usefull on it's own.
Assuming we know var names beforehand, we can write a wrapper to indirectly modify and obtain vars
function get(varname, this, call){call="get_"varname;return #call();}
function set(varname, arg, this, call){call="set_"varname; #call(arg);}
So now, for each var name you want to prrvide access by name, you declare these two functions
function get_my_var(){return my_var;}
function set_my_var(arg){my_var = arg;}
And prahaps, somewhere in your BEGIN{} block,
BEGIN{ my_var = ""; }
To declare it for global access.
Then you can use
get("my_var");
set("my_var", "whatever");
This may appear useless at first, however there are perfectly good use cases, such as
keeping a linked list of vars, by holding the var's name in another var's array, and such.
It works for arrays too, and to be honest, I use this for nesting and linking Arrays within
Arrays, so I can walk through multiple Arrays like using pointers.
You can also write configure scripts that refer to var names inside awk this way,
in effect having a interpreter-inside-a-interpreter type of things, too...
Not the best way to do things, however, it gets the job done, and I do not have to worry about
null pointer exceptions, or GC and such :-)
The $ notation is not a mark for variables, as in shell, PHP, Perl etc. It is rather an operator, which receives an integer value n and returns the n-th column from the input. So, what you did in the first example is not the setting/getting of a variable dynamically but rather a call to an operator/function.
As stated by commenters, you can archive the behavior you are looking for with arrays:
awk '{a=123; b="a"; v[b] = a; print v[b];}'
I had a similar problem to solve, to load the settings from a '.ini' file and I've used arrays to set the variables dynamically.
It works with Awk or Gawk, Linux or Windows (GnuWin32)
gawk -v Settings_File="my_settings_file.ini" -f awk_script.awk <processing_file>
[my_settings_file.ini]
#comment
first_var=foo
second_var=bar
[awk_script.awk]
BEGIN{
FS="=";
while((getline < Settings_File)>0) {
if($0 !~ /^[#;]|^(\s*)$/) {
var_array[$1] = $2;
}
}
print var_array["first_var"];
print var_array["second_var"];
if (var_array["second_var"] == "bar") {
print "works!";
}
}
{
#more processing
}
END {
#finish processing
}

Check Registry Version using Perl

I need to go to the registry and check a programs installed version. I am using perl to a whole lot of others things but the registry checking part isn't working. The program version has to be 9.7 and up so it could be 9.8 or 9.7.5.
When I install the program it shows 9.7.4 but I just need the 9.7 to be checked.
Bellow is me going to DisplayVersion which is a REG_SZ which shows 9.7.4.
OR
I could use VersionMajor and VersionMinor together which is a REG_DWORD. Which for Major is 9 and Minor is 7.
$ProgVersion= `$rootpath\\system32\\reg\.exe query \\\\$ASSET\\HKLM\\SOFTWARE\\Wow6432Node\\Microsoft\\Windows\\CurrentVersion\\Uninstall\\{9ACB414D-9347-40B6-A453-5EFB2DB59DFA} \/v DisplayVersion`;
if ($ProgVersion == /9.7/)
This doesn't work I could make it 9.200 and it still works. I tried to use this instead and it still wouldn't work. This next part is assuming that a new client needs to be install if it goes from 9.7. I was trying to use Great than or equal to, but it didn't work.
$ProgVersionMajor= `$rootpath\\system32\\reg\.exe query \\\\$ASSET\\HKLM\\SOFTWARE\\Wow6432Node\\Microsoft\\Windows\\CurrentVersion\\Uninstall\\{9ACB414D-9347-40B6-A453-5EFB2DB59DFA} \/v VersionMajor`;
$ProgVersionMinor= `$rootpath\\system32\\reg\.exe query \\\\$ASSET\\HKLM\\SOFTWARE\\Wow6432Node\\Microsoft\\Windows\\CurrentVersion\\Uninstall\\{9ACB414D-9347-40B6-A453-5EFB2DB59DFA} \/v VersionMinor`;
if (($ProgVersionMajor=~ /9/) && ($ProgVersionMinor=~ /7/))
Any help on doing this correctly or fixing what I am doing.
Several things:
You don't mention it, but are you using the Perl module Win32::TieRegistry? If not, you should. It'll make handling the Windows registry much easier.
In the Perl documentation, you can look at Version String under Scalar Value Constructors. This will make manipulating version strings much, much easier. Version strings have either more than one decimal place in them, or start with the letter v. I always prefix them with v to make it obvious what it is.
Here's a sample program below showing you how they can be used in comparisons:
#! /usr/bin/env perl
#
use strict;
use warnings;
my $version = v4.10.3;
for my $testVersion (v3.5.2, v4.4.1, v5.0.1) {
if ($version gt $testVersion) {
printf qq(Version %vd is greater than test %vd\n), $version, $testVersion;
}
else {
printf qq(Version %vd is less than test %vd\n), $version, $testVersion;
}
}
Note that I can't just print version strings. I have to use printf and sprintf and use the %vd vector decimal format to print them out. Printing version strings via a regular print statement can cause all sorts of havoc since they're really unicode representations. You put them in a print statement and you don't know what you're getting.
Also notice that you do not put quotes around them! Otherwise, you'll just make them regular strings.
NEW ANSWER
I was trying to find a way to convert a string into a v-string without downloading an optional package like Perl::Version or (Version), and I suddenly read that v-strings are deprecated, and I don't want to use a deprecated feature.
So, let's try something else...
We could simply divide up version numbers into their constituent components as arrays:
v1.2.3 => $version[0] = 1, $version[1] = 2, $version[2] = 3
By using the following bit of code:
my #version = split /\./, "9.7.5";
my #minVersion = split /\./, "9.7"
Now, we can each part of the version string against the other. In the above example, I compare the 9 of #version with the 9 of #version, etc. If #version was 9.6 I would have compared the 6 in #version against the 7 in #minVersion and quickly discovered that #minVersion is a higher version number. However, in both the second parts are 7. I then look at the third section. Whoops! #minVersion consists of only two sections. Thus, #version is bigger.
Here's a subroutine that does the comparison. Note that I also verify that each section is an integer via the /^\d+$/ regular expression. My subroutine can return four values:
0: Both are the same size
1: First Number is bigger
2: Second Number is bigger
undef: There is something wrong.
Here's the program:
my $minVersion = "10.3.1.3";
my $userVersion = "10.3.2";
# Create the version arrays
my $result = compare($minVersion, $userVersion);
if (not defined $results) {
print "Non-version string detected!\n";
}
elsif ($result == 0) {
print "$minVersion and $userVersion are the same\n";
}
elsif ($result == 1) {
print "$minVersion is bigger than $userVersion\n";
}
elsif ($result == 2) {
print "$userVersion is bigger than $minVersion\n";
}
else {
print "Something is wrong\n";
}
sub compare {
my $version1 = shift;
my $version2 = shift;
my #versionList1 = split /\./, $version1;
my #versionList2 = split /\./, $version2;
my $result;
while (1) {
# Shift off the first value for comparison
# Returns undef if there are no more values to parse
my $versionCompare1 = shift #versionList1;
my $versionCompare2 = shift #versionList2;
# If both are empty, Versions Matched
if (not defined $versionCompare1 and not defined $versionCompare2) {
return 0;
}
# If $versionCompare1 is empty $version2 is bigger
if (not defined $versionCompare1) {
return 2;
}
# If $versionCompare2 is empty $version1 is bigger
if (not defined $versionCompare2) {
return 1;
}
# Make sure both are numeric or else there's an error
if ($versionCompare1 !~ /\^d+$/ or $versionCompare2 !~ /\^\d+$/) {
return;
}
if ($versionCompare1 > $versionCompare2) {
return 1;
}
if ($versionCompare2 > $versionCompare1) {
return 2;
}
}
}
Using Win32::TieRegistry
You said in your answer you didn't use Win32::TieRegistry. I just want to show you what it can do for the readability of your program:
Your Way
$ProgVersion= `$rootpath\\system32\\reg\.exe query \\\\$ASSET\\HKLM\\SOFTWARE\\Wow6432Node\\Microsoft\\Windows\\CurrentVersion\\Uninstall\\{9ACB414D-9347-40B6-A453-5EFB2DB59DFA} \/v DisplayVersion`;
With Win32::TieRegistry
use Win32::TieRegistry ( TiedHash => '%RegHash', DWordsToHex => 0 );
my $key = $TiedHash->{LMachine}->{Software}->{Wow6432Node}->{Microsoft}->{Windows}->{CurrentVersion}->{Uninstall}->{9ACB414D-9347-40B6-A453-5EFB2DB59DFA}->{Version};
my $programValue = $key->GetValue;
my $stringValue = unpack("L", $programValue);
Or, you can split it up:
my $MSSoftware = $TiedHash->{LMachine}->{Software}->{Wow6432Node}->{Microsoft};
my $uninstall = $MSSoftware->{Windows}->{CurrentVersion}->{Uninstall};
my $programVersion = $uninstall->{9ACB414D-9347-40B6-A453-5EFB2DB59DFA}->{Version};
See how much easier that's to read. You can also use this to test keys too.
(Word 'o Warning: I don't have a Windows machine in front of me, so I didn't exactly check the validity of the code. Try playing around with it and see what you get.)

Resources