awk supports this:
awk '{print $(NF-1);}'
but not for user-defined variables:
awk '{a=123; b="a"; print $($b);}'
by the way, shell supports this:
a=123;
b="a";
eval echo \${$b};
How can I achieve my purpose in awk?
OK, since some of us like to eat spaghetti through their nose, here is some actual code that I wrote in the past :-)
First of all, getting a self modifying code in a language that does not support it will be extremely non-trivial.
The idea to allow dynamic variables, function names, in a language that does not support one is very simple. At some state in the program, you want a dynamic anything to self modify your code, and resume execution
from where you left off. a eval(), that is.
This is all very trivial, if the language supports eval() and such equlavant. However, awk does not have such function. Therefore, you, the programmer has to provide a interface to such thing.
To allow all this to happen, you have three main problems
How to get our self so we can modify it
How to load the modified code, and resume from where we left off
Finding a way for the interpreter to accept our modified code
How to get our self so we can modify it
Here is a example code, suitable for direct execution.
This one is the infastrucure that I inject for enviroments running gawk, as it requires PROCINFO
echo ""| awk '
function push(d){stack[stack[0]+=1]=d;}
function pop(){if(stack[0])return stack[stack[0]--];return "";}
function dbg_printarray(ary , x , s,e, this , i ){
x=(x=="")?"A":x;for(i=((s)?s:1);i<=((e)?e:ary[0]);i++){print x"["i"]=["ary[i]"]"}}
function dbg_argv(A ,this,p){
A[0]=0;p="/proc/"PROCINFO["pid"]"/cmdline";push(RS);RS=sprintf("%c",0);
while((getline v <p)>0)A[A[0]+=1]=v;RS=pop();close(p);}
{
print "foo";
dbg_argv(A);
dbg_printarray(A);
print "bar";
}'
Result:
foo
A[1]=[awk]
A[2]=[
function push(d){stack[stack[0]+=1]=d;}
function pop(){if(stack[0])return stack[stack[0]--];return "";}
function dbg_printarray(ary , x , s,e, this , i ){
x=(x=="")?"A":x;for(i=((s)?s:1);i<=((e)?e:ary[0]);i++){print x"["i"]=["ary[i]"]"}}
function dbg_argv(A ,this,p){
A[0]=0;p="/proc/"PROCINFO["pid"]"/cmdline";push(RS);RS=sprintf("%c",0);
while((getline v <p)>0)A[A[0]+=1]=v;RS=pop();close(p);}
{
print "foo";
dbg_argv(A);
dbg_printarray(A);
print "bar";
}]
bar
As you can see, as long as the OS does not play with our args, and /proc/ is available, it is possible
to read our self. This may appear useless at first, but we need it for push/pop of our stack,
so that our execution state can be enbedded within the code, so we can save/resume and survive OS shutdown/reboots
I have left out the OS detection function and the bootloader (written in awk), because, if I publish that,
kids can build platform independent polynormal code, and it is easy to cause havoc with it.
how to load the modified code, and resume from where we left off
Now, normaly you have push() and pop() for registers, so you can save your state and play with
your self, and resume from where you left off. a Call and reading your stack is a typical way to get the
memory address.
Unfortunetly, in awk, under normal situations we can not use pointers (with out a lot of dirty work),
or registers (unless you can inject other stuff along the way).
However you need a way to suspend and resume from your code.
The idea is simple. Instead of letting awk in control of your loops and while, if else conditions,
recrusion depth, and functions you are in, the code should.
Keep a stack, list of variable names, list of function names, and manage it your self.
Just make sure that your code always calls self_modify( bool ) constantly, so that even upon sudden failure,
As soon as the script is re-run, we can enter self_modify( bool ) and resume our state.
When you want to self modify your code, you must provide a custom made
write_stack() and read_stack() code, that writes out the state of stack as string, and reads string from
the values out from the code embedded string itself, and resume the execution state.
Here is a small piece of code that demonstrates the whole flow
echo ""| awk '
function push(d){stack[stack[0]+=1]=d;}
function pop(){if(stack[0])return stack[stack[0]--];return "";}
function dbg_printarray(ary , x , s,e, this , i ){
x=(x=="")?"A":x;for(i=((s)?s:1);i<=((e)?e:ary[0]);i++){print x"["i"]=["ary[i]"]"}}
function _(s){return s}
function dbg_argv(A ,this,p){
A[0]=0;p="/proc/"PROCINFO["pid"]"/cmdline";push(RS);RS=sprintf("%c",0);
while((getline v <p)>0)A[A[0]+=1]=v;RS=pop();close(p);}
{
_(BEGIN_MODIFY"|");print "#foo";_("|"END_MODIFY)
dbg_argv(A);
sub( \
"BEGIN_MODIFY\x22\x5c\x7c[^\x5c\x7c]*\x5c\x7c\x22""END_MODIFY", \
"BEGIN_MODIFY\x22\x7c\x22);print \"#"PROCINFO["pid"]"\";_(\x22\x7c\x22""END_MODIFY" \
,A[2])
print "echo \x22\x22\x7c awk \x27"A[2]"";
print "function bar_"PROCINFO["pid"]"_(s){print \x22""doe\x22}";
print "\x27"
}'
Result:
Exactly same as our original code, except
_(BEGIN_MODIFY"|");print "65964";_("|"ND_MODIFY)
and
function bar_56228_(s){print "doe"}
at the end of code
Now, this may seem useless, as we are only replaceing code print "foo"; with our pid.
But it becomes usefull, when there are multiple _() with separate MAGIC strings to identify BLOCKS,
and a custome made multi line string replacement routine instead of sub()
You msut provide BLOCKS for stack, function list, execution point, as a bare minimum.
And notice that the last line contains bar
This it self is just a sting, but when this code repeatedly gets executed, notice that
function bar_56228_(s){print "doe"}
function bar_88128_(s){print "doe"}
...
and it keeps growing. While the example is intentionally made so that it does nothing useful,
if we provide a routine to call bar_pid_(s) instead of that print "foo" code,
Sudenly it means we have eval() on our hands :-)
Now, isn't eval() usefull :-)
Don't forget to provide a custome made remove_block() function so that the code maintains
a reasonable size, instead of growing every time you execute.
Finding a way for the interpreter to accept our modified code
Normally calling a binary is trivial. However, when doing so from with in awk, it becomes difficult.
You may say system() is the way.
There are two problems to that.
system() may not work on some envoroments
it blocks while you are executing code, trus you can not perform recrusive calls and keep the user happy at the same time.
If you must use system(), ensure that it does not block.
A normal call to system("sleep 20 && echo from-sh & ") will not work.
The solution is simple,
echo ""|awk '{print "foo";E="echo ep ; sleep 20 && echo foo & disown ; "; E | getline v;close(E);print "bar";}'
Now you have a async system() call that does not block :-)
Not at the moment. However, if you provide a wrapper, it is (somewhat hacky and dirty) possible.
The idea is to use # operator, introduced in the recent versions of gawk.
This # operator is normally used to call a function by name.
So if you had
function foo(s){print "Called foo "s}
function bar(s){print "Called bar "s}
{
var = "";
if(today_i_feel_like_calling_foo){
var = "foo";
}else{
var = "bar";
}
#var( "arg" ); # This calls function foo(), or function bar() with "arg"
}
Now, this is usefull on it's own.
Assuming we know var names beforehand, we can write a wrapper to indirectly modify and obtain vars
function get(varname, this, call){call="get_"varname;return #call();}
function set(varname, arg, this, call){call="set_"varname; #call(arg);}
So now, for each var name you want to prrvide access by name, you declare these two functions
function get_my_var(){return my_var;}
function set_my_var(arg){my_var = arg;}
And prahaps, somewhere in your BEGIN{} block,
BEGIN{ my_var = ""; }
To declare it for global access.
Then you can use
get("my_var");
set("my_var", "whatever");
This may appear useless at first, however there are perfectly good use cases, such as
keeping a linked list of vars, by holding the var's name in another var's array, and such.
It works for arrays too, and to be honest, I use this for nesting and linking Arrays within
Arrays, so I can walk through multiple Arrays like using pointers.
You can also write configure scripts that refer to var names inside awk this way,
in effect having a interpreter-inside-a-interpreter type of things, too...
Not the best way to do things, however, it gets the job done, and I do not have to worry about
null pointer exceptions, or GC and such :-)
The $ notation is not a mark for variables, as in shell, PHP, Perl etc. It is rather an operator, which receives an integer value n and returns the n-th column from the input. So, what you did in the first example is not the setting/getting of a variable dynamically but rather a call to an operator/function.
As stated by commenters, you can archive the behavior you are looking for with arrays:
awk '{a=123; b="a"; v[b] = a; print v[b];}'
I had a similar problem to solve, to load the settings from a '.ini' file and I've used arrays to set the variables dynamically.
It works with Awk or Gawk, Linux or Windows (GnuWin32)
gawk -v Settings_File="my_settings_file.ini" -f awk_script.awk <processing_file>
[my_settings_file.ini]
#comment
first_var=foo
second_var=bar
[awk_script.awk]
BEGIN{
FS="=";
while((getline < Settings_File)>0) {
if($0 !~ /^[#;]|^(\s*)$/) {
var_array[$1] = $2;
}
}
print var_array["first_var"];
print var_array["second_var"];
if (var_array["second_var"] == "bar") {
print "works!";
}
}
{
#more processing
}
END {
#finish processing
}
Related
Is there an easy way to change a "for loop header" depending on the packages a user has? For example, #progress for is good for adding a progress bar in Juno/Atom (just found out!), while we also have things like #simd, #acc, and #parallel. So what I want to have on this loop is to put a bunch of these macros conditionally given a boolean from the user or depending on availability. However, if I do a simple
if isdefined(#progress)
#progress for ...
elseif accelerate
#acc for ...
elseif
#parallel for ...
end
or something of that sort, I would have to keep pasting around the same for loop code. Is there some more elegant way of doing this? Also, I may want to combine some, and so once you start looking at the viable combinations that ends up being a lot of code!
The Pkg.installed method will error if the package isn't installed. It takes a string, and returning the decorated expression after that line with the other possibility in the catch block is effective for this sort of thing:
macro optional_something(pkg, expr)
try
Pkg.installed(string(pkg)) == nothing && return expr
esc(quote
#time $expr
end)
catch
expr
end
end
# this won't add the macro #time
#optional_something XXX rand(1000)
# this will
#optional_something Plots rand(1000)
Can you determine at runtime if the executed code is running as a function or a script? If yes, what is the recommended method?
There is another way. nargin(...) gives an error if it is called on a script. The following short function should therefore do what you are asking for:
function result = isFunction(functionHandle)
%
% functionHandle: Can be a handle or string.
% result: Returns true or false.
% Try nargin() to determine if handle is a script:
try
nargin(functionHandle);
result = true;
catch exception
% If exception is as below, it is a script.
if (strcmp(exception.identifier, 'MATLAB:nargin:isScript'))
result = false;
else
% Else re-throw error:
throw(exception);
end
end
It might not be the most pretty way, but it works.
Regards
+1 for a very interesting question.
I can think of a way of determining that. Parse the executed m-file itself and check the first word in the first non-trivial non-comment line. If it's the function keyword, it's a function file. If it's not, it's a script.
Here's a neat one-liner:
strcmp(textread([mfilename '.m'], '%s', 1, 'commentstyle', 'matlab'), 'function')
The resulting value should be 1 if it's a function file, and 0 if it's a script.
Keep in mind that this code needs to be run from the m-file in question, and not from a separate function file, of course. If you want to make a generic function out of that (i.e one that tests any m-file), just pass the desired file name string to textread, like so:
function y = isfunction(x)
y = strcmp(textread([x '.m'], '%s', 1, 'commentstyle', 'matlab'), 'function')
To make this function more robust, you can also add error-handling code that verifies that the m-file actually exists before attempting to textread it.
Maybe I've been staring at my Ruby book to the point where words don't mean anything anymore, but I thought I might as well ask.
What I'm looking to do is, rather than passing a bulk of variables through my function headers, or instead of using global variables, I'm looking to save my variables within a method, and call upon them at multiple times, throughout my functions. What I'm realistically having an issue with, is scope.
def DateGrab()
print "\nEnter the date you're looking for (Month/Day): "
longdate = gets.strip.split(/\/| /)
if longdate[0].length > 3
month = longdate[0].slice(0..2)
else
month = longdate[0]
end
day = longdate[1]
year = `date | awk '{print $6}'`.strip
grepdate = "#{day}/#{month}/#{year}"
date = Date.parse("#{day}-#{month}-#{year}").strftime('%m%d%Y').strip
end
I'm looking to pass "grepdate" and "date" through multiple functions, and I feel that using a method would be easier, but every time I try to call the variable, I get a "undefined local variable or method" error.
You want to look at instance variables. You can set them via #grepdate = "something" These instance variables are accessible throughout your class and in all methods of that class.
I was writing a file parser in Perl, so had to loop through file. File consists of fixed length records and I wanted to make a separate function that parses given record and call that function in a loop. However, final result turned to be slow with big files and my guess was that I shouldn't use external function. So I made some dummy tests with and without function call in a loop:
[A]
foreach (1 .. 10000000) {
$a = &get_string();
}
sub get_string {
return sprintf("%s\n", 'abc');
}
[B]
foreach (1 .. 10000000) {
$a = sprintf "%s\n", 'abc';
}
Measuring showed that A code runs about 3-4 times slower than code B. I knew beforehand that code A was supposed to run slower but still I was surprised that difference is that big. Also tried to run similar tests with Python and Java. In Python code A equivalent was about 20% slower than B and Java code was runing more or less at the same speed (as expected). Changing function from sprintf to something else didn't show any significant difference.
Is there any way to help Perl run such loops faster? Am I doing something totaly wrong here or is it Perl's feature that function calls are such overhead?
Perl function calls are slow. It sucks because the very thing you want to be doing, decomposing your code into maintainable functions, is the very thing that will slow your program down. Why are they slow? Perl does a lot of things when it enters a subroutine, a result of it being extremely dynamic (ie. you can mess with a lot of things at run time). It has to get the code reference for that name, check that it is a code ref, set up a new lexical scratchpad (to store my variables), a new dynamic scope (to store local variables), set up #_ to name a few, check what context it was called in and pass along the return value. Attempts have been made to optimize this process, but they haven't paid out. See pp_entersub in pp_hot.c for the gory details.
Also there was a bug in 5.10.0 slowing down functions. If you're using 5.10.0, upgrade.
As a result, avoid calling functions over and over again in a long loop. Especially if its nested. Can you cache the results, perhaps using Memoize? Does the work have to be done inside the loop? Does it have to be done inside the inner-most loop? For example:
for my $thing (#things) {
for my $person (#persons) {
print header($thing);
print message_for($person);
}
}
The call to header could be moved out of the #persons loop reducing the number of calls from #things * #persons to just #things.
for my $thing (#things) {
my $header = header($thing);
for my $person (#persons) {
print $header;
print message_for($person);
}
}
If your sub has no arguments and is a constant as in your example, you can get a major speed-up by using an empty prototype "()" in the sub declaration:
sub get_string() {
return sprintf(“%s\n”, ‘abc’);
}
However this is probably a special case for your example that do not match your real case. This is just to show you the dangers of benchmarks.
You'll learn this tip and many others by reading perlsub.
Here is a benchmark:
use strict;
use warnings;
use Benchmark qw(cmpthese);
sub just_return { return }
sub get_string { sprintf "%s\n", 'abc' }
sub get_string_with_proto() { sprintf "%s\n", 'abc' }
my %methods = (
direct => sub { my $s = sprintf "%s\n", 'abc' },
function => sub { my $s = get_string() },
just_return => sub { my $s = just_return() },
function_with_proto => sub { my $s = get_string_with_proto() },
);
cmpthese(-2, \%methods);
and its result:
Rate function just_return direct function_with_proto
function 1488987/s -- -65% -90% -90%
just_return 4285454/s 188% -- -70% -71%
direct 14210565/s 854% 232% -- -5%
function_with_proto 15018312/s 909% 250% 6% --
The issue you are raising does not have anything to do with loops. Both your A and B examples are the same in that regard. Rather, the issue is the difference between direct, in-line coding vs. calling the same code via a function.
Function calls do involve an unavoidable overhead. I can't speak to the issue of whether and why this overhead is costlier in Perl relative to other languages, but I can provide an illustration of a better way to measure this sort of thing:
use strict;
use warnings;
use Benchmark qw(cmpthese);
sub just_return { return }
sub get_string { my $s = sprintf "%s\n", 'abc' }
my %methods = (
direct => sub { my $s = sprintf "%s\n", 'abc' },
function => sub { my $s = get_string() },
just_return => sub { my $s = just_return() },
);
cmpthese(-2, \%methods);
Here's what I get on Perl v5.10.0 (MSWin32-x86-multi-thread). Very roughly, simply calling a function that does nothing is about as costly as directly running our sprintf code.
Rate function just_return direct
function 1062833/s -- -70% -71%
just_return 3566639/s 236% -- -2%
direct 3629492/s 241% 2% --
In general, if you need to optimize some Perl code for speed and you're trying to squeeze out every last drop of efficiency, direct coding is the way to go -- but that often comes with a price of less maintainability and readability. Before you get into the business of such micro-optimizing, however, you want to make sure that your underlying algorithm is solid and that you have a firm grasp on where the slow parts of your code actually reside. It's easy to waste a lot of effort working on the wrong thing.
The perl optimizer is constant-folding the sprintf calls in your sample code.
You can deparse it to see it happening:
$ perl -MO=Deparse sample.pl
foreach $_ (1 .. 10000000) {
$a = &get_string();
}
sub get_string {
return "abc\n";
}
foreach $_ (1 .. 10000000) {
$a = "abc\n";
}
- syntax OK
People use gdb on and off for debugging,
of course there are lots of other debugging tools
across the varied OSes, with and without GUI and,
maybe other fancy IDE features.
I would like to know what useful gdb scripts you have written and liked.
While, I do not mean a dump of commands in a something.gdb file that you source to pull out a bunch of data, if that made your day, go ahead and talk about it.
Lets think conditional processing, control loops and functions written for more elegant and refined programming to debug and, maybe even for whitebox testing
Things get interesting when you start debugging remote systems (say, over a serial/ethernet interface)
And, what if the target is a multi-processor (and, multithreaded) system
Let me put a simple case as an example...
Say,
A script that traversed serially over entries
to locate a bad entry in a large hash-table
that is implemented on an embedded platform.
That helped me debug a broken hash-table once.
This script, not written by me, pretty prints STL containers, such as vector, map, etc: http://www.yolinux.com/TUTORIALS/src/dbinit_stl_views-1.03.txt
Totally awesome.
When debugging an AOLserver SIGSEGV crash, I used the following script to examine the TCL-level call stack from GDB:
define tcl_stack_dump
set $interp = *(Interp*)interp
set $frame = $interp->framePtr
while (0 != (CallFrame *)$frame->callerPtr != 0)
set $i = 0
if 0 != $frame->objv
while ($i < $frame->objc)
if (0 != $frame->objv[$i] && 0 != $frame->objv[$i]->bytes)
printf " %s", (char *)(CallFrame *)$frame->objv[$i]->bytes
end
set $i = $i + 1
end
printf "\n"
end
set $frame = (CallFrame *)$frame->callerPtr
end
end
document tcl_stack_dump
Print a list of TCL procs and arguments currently being called in an
interpreter. Most TCL C API functions beginning with Tcl[^_] will have a
Tcl_Interp parameter. Assumes the `interp' local C variable holds a
Tcl_Interp which is represented internally as an Interp struct.
See:
ptype Interp
ptype CallFrame
ptype Proc
ptype Command
ptype Namespace
ptype Tcl_Obj
end
1. When trying to get some 3rd party closed-source DLLs working with our project under Mono, it was giving meaningless errors. Consequently, I resorted to the scripts from the Mono project.
2. I also had a project that could dump it's own information to stdout for use in GDB, so at a breakpoint, I could run the function, then cut-n-paste its output into GDB.
[Edit]
3. Most of my GCC/G++ use has been a while, but I also recall using a macro to take advantage of the fact that GDB knew the members of some opaque data I had (the library was compiled with debug). That was enormously helpful.
4. And I just found this, too. It dumps a list of objects (from a global "headMeterFix" SLL) that contain, among other things, dynamic arrays of another object type. One of the few times I've used nested loops in a macro:
define showFixes
set $i= headMeterFix
set $n = 0
while ($i != 0)
set $p = $i->resolved_list
set $x = $i->resolved_cnt
set $j = 0
printf "%08x [%d] = {", $i, $x
printf "%3d [%3d] %08x->%08x (D/R): %3d/%-3d - %3d/%-3d {", $n, $i, $x, $i->fix, $i->depend_cnt, dynArySizeDepList($i->depend_list), $i->resolved_cnt, dynArySizeDepList($i->resolved_list)
while ($j < $x)
printf " %08x", $p[$j]
set $j=$j+1
end
printf " }\n"
set $i = $i->next
set $n = $n+1
end
end