Replacing a field in text file with the result of a system command - bash

Using Bash commands, I would like to substitute field 3 of each line of a text file with the result of a command which takes the original field 3 as an argument. Fields are /-delimited.
Input file:
./REMOTE_PARENT_DIR/0x134000564f:0x4c:0x0/test_runs/testgsi_O1
./REMOTE_PARENT_DIR/0x134000564f:0x4c:0x0/test_runs/testgsi_O2
...
Desired output file (don't print fields 1 and 2; field 3 will be the result of a Unix command; print the remaining fields):
/scratch/000011/rin/test_runs/testgsi_O1
/scratch/000011/rin/test_runs/testgsi_O2
...
Command to translate field 3 into normal path components:
hostx#lfs fid2path /scratch [0x134000564f:0x4c:0x0]
/scratch/000011/rin
Maybe use awk to grab the relevant field then sed with command substitution then spit out the new line?
This prints out the bit I need, but I'm not sure how to substitute it back into the lines of the file:
awk -F "/" '{ system("/bin/lfs fid2path /scratch " $3) }' outfile.70.sample.tmp

The following awk one-liner should achieve your goal:
awk 'BEGIN { FS="/"; OFS="/" } { cmd = "/bin/lfs fid2path /scratch "$3; cmd | getline path; close(cmd); for (i = 4; i <= NF; i++) { path = path""OFS""$i};print path }' your_input_file.txt
Here we assign the field separator FS and the output field separator OFS to a slash / in the BEGIN rule, before any input is read and processed.
Then we build the command string cmd from your desired shell call, with the third field $3 as its argument.
We execute the cmd shell command and pipe its output into the built-in getline, which stores it in the variable path.
The close() function is called to close cmd after it has produced its output, ensuring the command is re-run for each record.
Then, using a for loop, we concatenate the values from the 4th field to the end of the line onto the path variable, separated by OFS.
Finally we print out the desired, changed path. Since I don't have the /bin/lfs command installed, I tested it with cmd = "echo "$3" | cut -d: -f2" instead, and the results look fine.
For example paths.txt:
./REMOTE_PARENT_DIR/0x134000564f:0x4c:0x0/test_runs/testgsi_O1
./REMOTE_PARENT_DIR/0x134000564f:0x4c:0x0/test_runs/testgsi_O2
Example call:
awk 'BEGIN { FS="/"; OFS="/" } { cmd = "echo "$3" | cut -d':' -f2"; cmd | getline path; close(cmd); for (i = 4; i <= NF; i++) { path = path""OFS""$i};print path }' paths.txt
Produces the result:
0x4c/test_runs/testgsi_O1
0x4c/test_runs/testgsi_O2
Here a specific part is extracted from the third awk field $3 using the shell command cut -d: -f2, i.e. the 2nd field of the colon (:) separated string 0x134000564f:0x4c:0x0.
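A side note on close(): awk keys pipes by the exact command string, so reading twice from the same command without closing it returns nothing the second time. A toy demonstration:
awk 'BEGIN { cmd = "echo hi"; cmd | getline a; cmd | getline b; print "without close:", a, b }'
awk 'BEGIN { cmd = "echo hi"; cmd | getline a; close(cmd); cmd | getline b; print "with close:", a, b }'
The first call leaves b empty because the pipe is already at end of file; the second re-runs the command after close() and fills both variables.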

I hope I understood your problem correctly; otherwise, please tell me and I will delete this answer.
If you do not mind using Perl, then you can do this very easily and straightforwardly.
Consider the following one-liner
perl -F'/' -ne '$F[2]="add-some-text/"; print #F[2..$#F]' file
It reads the file line by line and substitutes field 3 ($F[2]) with add-some-text/, which gives this output:
add-some-text/test_runstestgsi_O1
add-some-text/test_runstestgsi_O2
Now, if you want to use a command, just put a command in place of the simple text, using the backtick operator in Perl:
perl -F'/' -ane '$F[2]=`date "+%H:%M:%S"`; print @F[2..$#F]' file
or qx(), which is more readable:
perl -F'/' -ane '$F[2]=qx(date "+%H:%M:%S"); print @F[2..$#F]' file
Also, if you want to pass an argument, you can do that as well:
perl -F'/' -ane '$F[2]=qx(echo -n $F[2]/); print @F[2..$#F]' file
And finally, for in-place substitution, just add -i.bak before -F. It will create a backup file like file.bak and modify your original one.
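For example, an in-place sketch combining -i.bak with the lfs call from the question (untested here, since it assumes /bin/lfs is available on the host):
perl -i.bak -F'/' -ane '$F[2] = qx(/bin/lfs fid2path /scratch $F[2]); chomp $F[2]; print join("/", @F[2..$#F])' file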

#!/bin/bash
OUTFILE03=tmpfile
while IFS=/ read -r first second fid remainder
do
    REAL=$(/bin/lfs fid2path /scratch "$fid")
    echo "$REAL/$remainder"
done <"input.70" >"$OUTFILE03"

How to find content in a file and replace the adjacent value

Using bash, how do I find a string and update the string next to it? For example, I pass the value
my.site.com|test2.spin:80
and proxy_pass.map contains:
my.site2.com test2.spin:80
my.site.com test.spin:8080;
Expected output is to update proxy_pass.map with
my.site2.com test2.spin:80
my.site.com test2.spin:80;
I tried using awk
awk '{gsub(/^my\.site\.com\s+[A-Za-z0-9]+\.spin:8080;$/,"my.site2.comtest2.spin:80"); print}' proxy_pass.map
but it does not seem to work. Is there a better way to approach the problem?
One awk idea, assuming spacing needs to be maintained:
awk -v rep='my.site.com|test2.spin:80' '
BEGIN { split(rep,a,"|") # split "rep" variable and store in
site[a[1]]=a[2] # associative array
}
$1 in site { line=$0 # if 1st field is in site[] array then make copy of current line
match(line,$1) # find where 1st field starts (in case 1st field does not start in column #1)
newline=substr(line,1,RSTART+RLENGTH-1) # save current line up through matching 1st field
line=substr(line,RSTART+RLENGTH) # strip off 1st field
match(line,/[^[:space:];]+/) # look for string that does not contain spaces or ";" and perform replacement, making sure to save everything after the match (";" in this case)
newline=newline substr(line,1,RSTART-1) site[$1] substr(line,RSTART+RLENGTH)
$0=newline # replace current line with newline
}
1 # print current line
' proxy_pass.map
If the input looks like:
$ cat proxy_pass.map
my.site2.com test2.spin:80
my.site.com test.spin:8080;
This awk script generates:
my.site2.com test2.spin:80
my.site.com test2.spin:80;
NOTES:
if multiple replacements need to be performed I'd suggest placing them in a file and having awk process said file first
the 2nd match() is hardcoded based on OP's example; depending on actual file contents it may be necessary to expand on the regex used in the 2nd match()
once satisfied with the result, the original input file can be updated in a couple of ways ... a) if using GNU awk then awk -i inplace -v rep.... or b) save the result to a temp file and then mv the temp file to proxy_pass.map
If the number of spaces between the columns is not significant, a simple
proxyf=proxy_pass.map
tmpf=$$.txt
awk '$1 == "my.site.com" { $2 = "test2.spin:80;" } {print}' <$proxyf >$tmpf && mv $tmpf $proxyf
should do. If you need the columns to be lined up nicely, you can replace the print by a suitable printf .... statement.
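For instance (the field width of 20 is an arbitrary assumption):
awk '$1 == "my.site.com" { $2 = "test2.spin:80;" } { printf "%-20s %s\n", $1, $2 }' <$proxyf >$tmpf && mv $tmpf $proxyf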
With your shown samples and attempts, please try the following awk code. A shell variable named var stores the value my.site.com|test2.spin:80 and is passed into the awk program, where it is received in the awk variable named var1.
In the BEGIN section of awk, the split function splits the value of var1 into an array named arr using | as the separator, with num holding the total number of elements produced by split. A for loop then runs up to num, filling an array named arr2 whose index is the current element and whose value is the next element (basically element i is the key of the array and element i+1 is its value).
In the main block of the awk program, the condition checks whether $1 is in arr2; if so, arr2's value is printed, else the $2 value is printed, as per the requirement.
##Shell variable named var is being created here...
var="my.site.com|test2.spin:80"
awk -v var1="$var" '
BEGIN{
num=split(var1,arr,"|")
for(i=1;i<=num;i+=2){
arr2[arr[i]]=arr[i+1]
}
}
{
print $1,(($1 in arr2)?arr2[$1]:$2)
}
' Input_file
OR, in case you want to maintain the spaces between the 1st and 2nd field(s), try the following code, a little tweak of the above. Written and tested with your shown samples only.
awk -v var1="$var" '
BEGIN{
num=split(var1,arr,"|")
for(i=1;i<=num;i+=2){
arr2[arr[i]]=arr[i+1]
}
}
{
match($0,/[[:space:]]+/)
print $1 substr($0,RSTART,RLENGTH) (($1 in arr2)?arr2[$1]:$2)
}
' Input_file
NOTE: This program can take multiple values separated by | in the shell variable, to be passed in and checked by the awk program, but it assumes they always come in the format key|value|key|value....
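For example, two replacements could be passed at once like this (the second pair is hypothetical), with the awk program itself unchanged:
var="my.site.com|test2.spin:80|my.other.site.com|other.spin:9090"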
#!/bin/sh -x
f1=$(echo "my.site.com|test2.spin:80" | cut -d'|' -f1)
f2=$(echo "my.site.com|test2.spin:80" | cut -d'|' -f2)
echo "${f1}%${f2};" >> proxy_pass.map
tr '%' '\t' < proxy_pass.map >> p1
cat > ed1 <<EOF
$
-1
d
wq
EOF
ed -s p1 < ed1
mv -v p1 proxy_pass.map
rm -v ed1
This might work for you (GNU sed):
<<<'my.site.com|test2.spin:80' sed -E 's#\.#\\.#g;s#^(\S+)\|(\S+)#/^\1\\b/s/\\S+/\2/2#' |
sed -Ef - file
Build a sed script from the input arguments and apply it to the input file.
The input arguments are first prepared so that their metacharacters (in this case the .'s) are escaped.
Then the first argument is used to prepare a match command, and the second is used as the replacement value in a substitution command.
The result is piped into a second sed invocation that takes the sed script and applies it to the input file.
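For reference, the first sed invocation above turns the argument into this one-line sed script, which the second invocation then applies to file:
/^my\.site\.com\b/s/\S+/test2\.spin:80/2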

Appending result of function on another field into csv using shell script, awk

I have a csv file stored as a temporary variable in a shell script (*.sh).
Let's say the data looks like this:
Account,Symbol,Price
100,AAPL US,200
102,SPY US,500
I want to add a fourth column, "Type", which is the result of a shell function "foobar". Run from the command line or a shell script itself:
$ foobar "AAPL US"
"Stock"
$ foobar "SPY US"
"ETF"
How do I add this column to my csv, and populate it with calls to foobar which take the second column as an argument? To clarify, this is my ideal result post-script:
Account,Symbol,Price,Type
100,AAPL US,200,Common Stock
102,SPY US,500,ETF
I see many examples online involving such a column addition using awk, and populating the new column with fixed values, conditional values, mathematical derivations from other columns, etc. - but nothing that calls a function on another field and stores its output.
You may use this awk:
export -f foobar
awk 'BEGIN{FS=OFS=","} NR==1{print $0, "Type"; next} {
cmd = "foobar \"" $2 "\""; cmd | getline line; close(cmd);
print $0, line
}' file.csv
Account,Symbol,Price,Type
100,AAPL US,200,Common Stock
102,SPY US,500,ETF
@anubhava's answer is a good approach, so please don't change the accepted answer; I'm only posting this as an answer because it's too big and in need of formatting to fit in a comment.
FWIW I'd write his awk script as:
awk '
BEGIN { FS=OFS="," }
NR==1 { type = "Type" }
NR > 1 {
cmd = "foobar \047" $2 "\047"
type = ((cmd | getline line) > 0 ? line : "ERROR")
close(cmd)
}
{ print $0, type }
' file.csv
to:
better protect $2 from shell expansion, and
protect from silently printing the previous value if/when cmd | getline fails, and
consolidate the print statements to 1 line so it's easy to change for all output lines if/when necessary
awk to the rescue!
$ echo "Account,Symbol,Price
100,AAPL US,200
102,SPY US,500" |
awk -F, 'NR>1{cmd="foobar "$2; cmd | getline type} {print $0 FS (NR==1?"Type":type)}'
Not sure you need to quote the input to foobar
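Since the symbols here do contain spaces ("AAPL US"), a quoted variant (which also closes the pipe each time) would be safer:
awk -F, 'NR>1{cmd="foobar \"" $2 "\""; cmd | getline type; close(cmd)} {print $0 FS (NR==1?"Type":type)}'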
Another way not using awk:
paste -d, input.csv <({ read; printf "Type\n"; while IFS=, read -r _ s _; do foobar "$s"; done; } < input.csv)

how to select the last line of the shell output

Hi I have a shell command like this.
s3=$(awk 'BEGIN{ print "S3 bucket path" }
/Executing command\(queryId/{ sub(/.*queryId=[^[:space:]]+: /,""); q=$0 }
/s3:\/\//{ print "," $10 }' OFS=',' hive-server2.log)
The output of the above command like this.
echo $s3
2018-02-21T17:58:22,
2018-02-21T17:58:26,
2018-02-21T18:05:33,
2018-02-21T18:05:34
I want to select the last line only. I need the last output like this.
2018-02-21T18:05:34
I tried like this.
awk -v $s3 '{print $(NF)}'
Not working. Any help will be appreciated.
In general, command | tail -n 1 prints the last line of the output from command. However, where command is of the form awk '... { ... print something }' you can refactor to awk '... { ... result = something } END { print result }' to avoid spawning a separate process just to discard the other output. (Conversely, you can replace awk '/condition/ { print something }' | head -n 1 with awk '/condition/ { print something; exit }'.)
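Applied to the command in the question, a minimal sketch of that refactoring (keeping only the last matching line) might look like:
s3=$(awk '/s3:\/\//{ last = $10 } END { if (last != "") print last }' hive-server2.log)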
If you already have the result in a shell variable s3 and want to print just the last line, a parameter expansion echo "${s3##*$'\n'}" does that. The C-style string $'\n' to represent a newline is a Bash extension, and the parameter expansion operator ## to remove the longest matching prefix isn't entirely portable either, so you should make sure the shebang line says #!/bin/bash, not #!/bin/sh.
Notice also that $s3 without quotes is an error unless you specifically require the shell to perform whitespace tokenization and wildcard expansion on the value. You should basically always use double quotes around variables except in a couple of very specific scenarios.
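A quick demonstration of the difference:
s3=$'line1\nline2'
echo $s3      # word splitting turns the newline into a space: line1 line2
echo "$s3"    # preserves the newline: line1 and line2 on separate lines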
Your Awk command would not work for two reasons; firstly, as explained in the previous paragraph, you are setting s3 to the first token of the variable, and the second is your Awk script (probably a syntax error). In more detail, you are basically running
awk -v s3=firstvalue secondvalue thirdvalue '{ print $(NF) }'
^ value ^ script to run ^ names of files ...
where you probably wanted to say
awk -v s3=$'firstvalue\nsecondvalue\nthirdvalue' '{ print $(NF) }'
But even with quoting, your script would set s3 to something, but then tell Awk to (ignore the variable and) process standard input, which on the command line leaves it reading from your terminal. A fixed script might look like
awk 'END { print }' <<<"$s3"
which passes the variable as standard input to Awk, which prints the last line. The <<<value "here string" syntax is also a Bash extension, and not portable to POSIX sh.
A much simpler way is
command | grep "your filter" | tail -n 1
or directly
command | tail -n 1
You could try this:
echo -e "This is the first line \nThis is the second line" | awk 'END{print}'
Another approach can be processing the file from the end and exiting after the first match:
tac file | awk '/match/{print; exit}'
Hi, you can do it just by adding echo "$s3" | sed '$!d':
s3=$(awk 'BEGIN{ print "S3 bucket path" }/Executing command\(queryId/{ sub(/.*queryId=[^[:space:]]+: /,""); q=$0 } /s3:\/\//{ print "," $10 }' OFS=',' hive-server2.log)
echo "$s3" | sed '$!d'
It will simply print:
2018-02-21T18:05:34
Hope this will help you.

How to execute awk command in shell script

I have an awk command that extracts the 16th column from 3rd line in a csv file and prints the first 4 characters.
awk -F"," 'NR==3{print $16}' sample.csv|sed -e 's/^[ \t]*//'|awk '{print substr($0,0,4)}'
This works fine.
But when I execute it from a shell script, I get an error:
#!/bin/ksh
YEAR=awk -F"," 'NR==3{print $16}' sample.csv|sed -e 's/^[ \t]*//'|awk '{print substr($0,0,4)}'
Error message:
-F,: not found
Use command substitution to assign the output of a command to a variable, as shown below:
YEAR=$(awk -F"," 'NR==3{print $16}' sample.csv|sed -e 's/^[ \t]*//'|awk '{print substr($0,0,4)}')
you are asking the shell to do:
VAR=value command [arguments...]
which means: launch command, but pass it the VAR=value environment first.
(For example, LC_ALL=C grep '[0-9]*' /some/file.txt will grep for a number in file.txt, with the LC_ALL variable set to C just for the duration of the grep call.)
So here you are asking the shell to launch the command -F, (that is, -F"," once the shell has interpreted the quotes), with arguments 'NR==3.........., and with the variable YEAR set to the value awk for the duration of the command invocation.
Just replace it with :
#!/bin/ksh
YEAR="$(awk -F',' 'NR==3{print $16}' sample.csv|sed -e 's/^[ \t]*//'|awk '{print substr($0,1,4)}')"
(I didn't try it, but I hope it works for you and your sample.csv file.)
(Note that the original used 0 as the start position in substr(), which many awk implementations treat as 1, but not all, so 1 is used here.)
From your description, it looks like you want to extract the year from the 16th field, which might contain leading spaces. You can accomplish it by calling AWK once:
YEAR=$(awk -F, 'NR==3{sub(/^[ \t]*/, "", $16); print substr($16,1,4)}' sample.csv)
Better yet, you don't even have to use awk. Since you are already writing shell script, let's do it all in shell script:
{ read line; read line; read line; } < sample.csv   # Get the third line
old_IFS=$IFS
IFS=,; set -- $line              # Break line into comma-separated fields
IFS=' '; set -- ${16}            # Trick to remove leading spaces; field 16 becomes field 1
IFS=$old_IFS
YEAR=${1:0:4}                    # Extract the first 4 chars from field 1 (bash/ksh syntax)
Do this:
year=$(awk -F, 'NR==3{sub(/^[ \t]+/,"",$16); print substr($16,1,4); exit }' sample.csv)

Calling an executable program using awk

I have a program in C that I want to call by using awk in shell scripting. How can I do something like this?
From the AWK man page:
system(cmd)
executes cmd and returns its exit status
The GNU AWK manual also has a section that, in part, describes the system function and provides an example:
system("date | mail -s 'awk run done' root")
A much more robust way is to use awk's getline function to read a command's output from a pipe. In the form cmd | getline result, cmd is run and its output is piped into getline, one line per call. It returns 1 if it got output, 0 at end of file, and -1 on failure.
First construct the command to run in a variable in the BEGIN clause if the command does not depend on the contents of the file, e.g. a simple date or ls.
A simple example of the above would be
awk 'BEGIN {
    cmd = "ls -lrth"
    while ( ( cmd | getline result ) > 0 ) {
        print result
    }
    close(cmd)
}'
When the command to run is built from the columnar content of a file, you generate the cmd string in the main {..} block, as below. E.g. consider a file whose second field $2 contains a file name that you want replaced with that file's md5sum hash. You can do
awk '{ cmd = "md5sum "$2
while ( ( cmd | getline md5result ) > 0 ) {
$2 = md5result
}
close(cmd);
}1'
Another frequent use of external commands in awk is date processing, when your awk does not support time functions like mktime() and strftime() out of the box.
Consider a case where you have a Unix EPOCH timestamp stored in a column and want to convert it to a human-readable date format. Assuming GNU date is available:
awk '{ cmd = "date -d #" $1 " +\"%d-%m-%Y %H:%M:%S\""
while ( ( cmd | getline fmtDate) > 0 ) {
$1 = fmtDate
}
close(cmd);
}1'
for an input string as
1572608319 foo bar zoo
the above command produces an output as
01-11-2019 07:38:39 foo bar zoo
The command can be tailored to modify the date fields in any of the columns of a given line. Note that -d is a GNU-specific extension; the *BSD variants support -f (though it is not exactly similar to -d).
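On the BSDs (and macOS), epoch input is given with -r instead, so the command string would be built along these lines (a sketch, untested):
awk '{
    cmd = "date -r " $1 " +\"%d-%m-%Y %H:%M:%S\""
    while ( ( cmd | getline fmtDate ) > 0 ) {
        $1 = fmtDate
    }
    close(cmd)
}1'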
More information about getline can be found in the AllAboutGetline article on the awk.freeshell.org page.
There are several ways.
awk has a system() function that will run a shell command:
system("cmd")
You can print to a pipe:
print "blah" | "cmd"
You can have awk construct commands, and pipe all the output to the shell:
awk 'some script' | sh
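For example, to generate and run one command per input line (the file names.txt is hypothetical):
awk '{ print "touch " $1 ".bak" }' names.txt | sh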
Something as simple as this will work
awk 'BEGIN{system("echo hello")}'
and
awk 'BEGIN { system("date"); close("date")}'
I use the power of awk to delete some of my stopped docker containers. Observe carefully how I construct the cmd string first, before passing it to system.
docker ps -a | awk '$3 ~ "/bin/clish" { cmd="docker rm "$1;system(cmd)}'
Here, I match the 3rd column against the pattern "/bin/clish", then extract the container ID from the first column to construct my cmd string, and pass that to system.
It really depends :) One of the handy Linux core utils (info coreutils) is xargs. If you are using awk you probably have a more involved use-case in mind; your question is not very detailed.
printf "1 2\n3 4" | awk '{ print $2 }' | xargs touch
Will execute touch 2 4. Here touch could be replaced by your program. More info at info xargs and man xargs (really, read these).
I believe you would like to replace touch with your program.
Breakdown of the aforementioned script:
printf "1 2\n3 4"
# Output:
1 2
3 4
# The pipe (|) makes the output of the left command the input of
# the right command (simplified)
printf "1 2\n3 4" | awk '{ print $2 }'
# Output (of the awk command):
2
4
# xargs will execute a command with arguments. The arguments
# are made up from the input to xargs (in this case the output
# of the awk command, which is "2 4").
printf "1 2\n3 4" | awk '{ print $2 }' | xargs touch
# No output, but executes: `touch 2 4` which will create (or update
# timestamp if the files already exist) files with the name "2" and "4"
Update: In the original answer, I used echo instead of printf. However, printf is the better and more portable alternative, as was pointed out in a comment (where great links to discussions can be found).
#!/usr/bin/awk -f
BEGIN {
    command = "ls -lh"
    while ( ( command | getline line ) > 0 ) {
        print line
    }
    close(command)
}
Runs "ls -lh" from an awk script and prints its output.
You can easily call a command with parameters via system().
For example, to kill jobs matching a certain string (we could match on something else, of course):
ps aux | grep my_searched_string | awk '{system("kill " $2)}'
I was able to get this done via the method below:
cat ../logs/em2.log.1 | grep -i 192.168.21.15 | awk '{system("date"); print $1}'
awk has a function called system that enables you to execute any shell command from within awk.
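A minimal example, which also shows that system() returns the command's exit status:
awk 'BEGIN { status = system("echo hello from awk"); print "exit status:", status }'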
