Bash awk append to same line - bash

There are numerous posts about removing leading white space and appending an entry to a single existing line in a file using awk. None of my attempts work - just three examples here of the many I have tried.
Say I have a file called $log with a single line
a:b:c
and I want to add a fourth entry,
awk '{ print $4"d" }' $log | tee -a $log
output seems to be a newline
`a:b:c:
d`
whereas, I want all on the same line;
a:b:c:d
try
BEGIN { FS = ":" } ; awk '{ print $4"d" }' $log | tee -a $log
or, this - avoid a new line
awk 'BEGIN { ORS=":" }; { print $4"d" }' $log | tee -a $log
no change
`a:b:c:
d`
awk is placing a space after c: and then writing d to the next line.
EDIT: | tee -a $log appears to be necessary to write the additional string to the file.
$log contains 39 variables and was generated using awk without | tee -a
odd...
The actual command to write $40 to the single line entries
awk '{ print $40"'$imagedir'" }' $log
output
+ awk '{ print $40"/home/geoland/Asterism-DEVEL/DSO" }'
/home/geoland/.asterism/log
but this does not write to the $log file.
How should I append d to the same line without leading white space using awk - also looking at sed xargs and other alternatives.

Using awk:
awk '{ print $0":d" }' file
Using sed:
sed 's/$/:d/' file
Using only bash:
while IFS= read -r line; do
echo "$line:d"
done < file

Using sed:
$ echo a:b:c | sed 's,\(^.*$\),\1:d,'
a:b:c:d

Thanks all... This is the solution I went with. I also needed to write the entire line to a perpetual log file because the log file is overwritten at each new process instance.
I will further investigate an awk solution.
logname=$imagedir/log_$name
while IFS=: read -r line; do
echo "$line$imagedir"
done < $log | tee $logname
This places $imagedir directly behind the last IFS ':' separator
There is probably room for refinement.

I too am not entirely sure what you're trying to do here.
Your command line, awk '{ print $4"d" }' $log | tee -a $log is problematic in a number of ways.
First, your awk script tries to print the 4th field, which is empty. Unless you say otherwise, fields are separated by whitespace, and the string a:b:c has no whitespace. So .. awk prints "d". And tee -a appends to your existing logfile, so what you're seeing is the original data, along with the d printed by awk. That's totally expected.
Second, it appears to have tee appending to the same file that awk is in the process of reading. This won't make an endless loop, as awk should stop reading the input file after whatever was the last byte when the file was opened, but it does mean you may have repeated data there.
Your other attempts, aside from some syntactical errors, all suffer from the same assumption that $4 means something that it does not.
The following awk snippet sets the input and output field separators to :, then sets the 4th field to "d", then prints the line.
$ echo "a:b:c" | awk 'BEGIN{FS=OFS=":"} {$4="d"} 1'
a:b:c:d
Is that what you want?
If you really do need to append this data to an existing log file, you can do so with tee -a or simple >> redirection. Just bear in mind that awk will only see the content of the file as of the time it was run, and by appending, you are not replacing lines.
One other thing. If you are actually hoping to use the content of the shell variable $imagedir inside awk, you should pass the variable in rather than exiting your quotes. For example:
$ echo "a:b:c" | awk -v d="foo/bar" 'BEGIN{FS=OFS=":"} {$4=d} 1'
a:b:c:foo/bar
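If the goal is to update the log file itself rather than print to the terminal, one way is to write awk's output to a temporary file and then move it over the original, so awk never reads a file that is being appended to. This is only a sketch: the function name append_field, the sample line and the path are made up for illustration, and it assumes mktemp is available.

```shell
#!/bin/bash
# Hypothetical helper: append_field FILE VALUE
# Appends VALUE as a new last colon-separated field on every line of FILE.
append_field() {
    local file=$1 value=$2 tmp
    tmp=$(mktemp) || return 1
    # $(NF + 1) addresses the field just past the current last field.
    awk -v v="$value" 'BEGIN { FS = OFS = ":" } { $(NF + 1) = v } 1' "$file" > "$tmp" &&
        mv "$tmp" "$file"
}

# Illustrative data only: a scratch file standing in for $log.
log=$(mktemp)
printf 'a:b:c\n' > "$log"
append_field "$log" "/home/user/images"
cat "$log"
```

Note that awk -v interprets backslash escapes in the value, so a value containing backslashes would need different handling.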

sed "s|$|$imagedir|" file | tee newfile
This does the trick. Read 'file' and write the contents of 'file' with the substitution to a 'new file', so as to read the image directory when using a secondary standalone process.
Because the variable is a directory with several / these need to be escaped, so as not to interpret as sed delimiters. I had difficulty with this using a variable.
A neater option was to use an alternative delimiter. Not to be confused with the pipe that follows.
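A small illustration of the alternative-delimiter idea, using a made-up directory and sample line. It assumes the variable contains no |, & or backslash characters, since sed would treat those specially in the replacement.

```shell
#!/bin/bash
imagedir=/home/user/images   # hypothetical path, for illustration only

# With | as the s-command delimiter, the slashes in $imagedir need no escaping.
# \$ keeps the shell from touching the end-of-line anchor inside double quotes.
result=$(printf 'a:b:c:\n' | sed "s|\$|$imagedir|")
echo "$result"
```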

Related

How to remove the username/hostname line from an output on Korn Shell?

I run the command
df -gP /data1 /data2 | grep -v File | awk '{print $1}' |
awk -F/dev/ '$0=$2' | tr '\n' '
on the AIX shell (ksh) and it prints the output below:
lv_data01 lv_data02 root#testhost:/
However, I would like the output to be printed this way. Could someone help?
lv_data01 lv_data02
Using grep … | awk … | awk … is not necessary; a single awk could do the whole job. So could sed and it might even be easier. I'd be tempted to deal with the spacing by using:
x=$(df … | sed …); echo $x
The tr command, once corrected, replaces newlines with spaces, so the prompt follows without a newline before it. The ; echo suggestion adds the missing newline; the echo $x suggestion (note no double quotes) does too.
As for the sed command:
sed -n '/File/!{ s/[[:space:]].*//; s%^.*/dev/%%p; }'
Don't print anything by default
If the line doesn't match File (doing the work of grep -v):
remove the first space (blank or tab) and everything after it (doing the work of awk '{print $1}')
replace everything up to /dev/ with nothing and print (doing the work of awk -F/dev/ '$0=$2')
The command substitution and capture, followed by echo, deals with spaces and newlines.
So, my suggested solution is:
x=$(df -gP /data1 /data2 | sed -n '/File/!{ s/[[:space:]].*//; s%^.*/dev/%%p; }'); echo $x
You could add unset x after the echo if you are going to be using this directly in the shell and not in a shell script. If it'll be encapsulated in a shell script, you don't have to worry about it.
I'm blithely assuming the output from df -gP won't contain a path such as this, with two occurrences of /dev:
/who/knows/dev/lv_data01/dev/bin
If that's a real problem, you can fix the sed script, but I don't think it will be. It's one thing the second awk script in the question handles differently.
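For comparison, here is roughly what the single-awk version mentioned at the start could look like. This is a sketch only: the sample text stands in for the real df -gP output (its exact column layout is an assumption), and lv_names is a made-up function name.

```shell
#!/bin/bash
# Assumed sample of `df -gP /data1 /data2` output, for illustration only.
sample='Filesystem GB blocks Used Available Capacity Mounted on
/dev/lv_data01 10.00 2.00 8.00 20% /data1
/dev/lv_data02 10.00 3.00 7.00 30% /data2'

# Hypothetical helper: reads df-style output on stdin and prints the
# logical volume names on one line, separated by single spaces.
lv_names() {
    awk '!/File/ {
             sub(/^.*\/dev\//, "", $1)   # strip everything up to /dev/
             printf "%s%s", sep, $1      # sep is empty before the first name
             sep = " "
         }
         END { print "" }'               # finish with a newline
}

printf '%s\n' "$sample" | lv_names
```

In real use the pipeline would be df -gP /data1 /data2 | lv_names.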

How to capture first column values of a command?

I am new to shell scripting. I am trying to write a script that is supposed to run a command and use a for loop to capture the first column of the output and do further processing.
command: tst get files
output of this command is something like
NAME COUNT ADMIN
FileA.txt 30 adminA
FileB.txt 21 local
FileC.txt 9 local
FileD.txt 90 adminA
Here is what I have tried so far : UPDATED also want to run additional commands
#!/bin/bash
for f in $(tst get files)
do
echo "FILE :[${f}]"
tst setprimary ${f} && tst get dataload
done
the output I am seeing is something like
FILE :[NAME]
FILE :[COUNT]
FILE :[ADMIN]
FILE :[FileA.txt]
FILE :[30]
FILE :[adminA]
FILE :[FileB.txt]
FILE :[21]
FILE :[local]
FILE :[FileC.txt]
FILE :[9]
FILE :[local]
FILE :[FileD.txt]
FILE :[90]
FILE :[adminA]
I am looking for an output something like
FILE :[FileA.txt]
FILE :[FileB.txt]
FILE :[FileC.txt]
FILE :[FileD.txt]
What should I modify in the shell script to only capture NAME column values? Am I executing the tst get files command correctly in the for loop or is there a better way to execute a command and loop thru the results?
EDIT (Samuel Kirschner): you can do without the for loop entirely and just use awk to print the lines you're interested in
tst get files | awk 'NR > 1 {print "FILE :[" $1 "]"}'
If you want to keep the for loop for some reason and just extract the file name from the lines while skipping the header, you have a few choices. Awk is probably the easiest because of the NR builtin variable (which counts lines) and automatic field-splitting ($1 refers to the first field in the line, for instance), but you can use sed and cut as well.
You can use awk 'NR > 1 {print $1}' to get the first column (using any whitespace character as a delimiter while skipping the first line) or sed 1d | cut -d$'\t' -f1. Note that $'\t' is bash-specific syntax for a literal tab character. If your file is padded with spaces rather than using tabs to delimit fields, you can't use the sed ... | cut ... example.
i.e.
#!/bin/bash
for f in $(tst get files | awk 'NR > 1 {print $1}')
do
echo "FILE :[${f}]"
done
or
#!/bin/bash
for f in $(tst get files | sed 1d | cut -d$'\t' -f1)
do
echo "FILE :[${f}]"
done
to avoid unnecessary splitting in the for loop. It's best to set IFS to something specific outside the loop body to prevent 'a file with whitespace.txt' from being broken up.
OLD_IFS=$IFS
IFS=$'\n\t'
for f in $(tst get files | sed 1d | cut -d$'\t' -f1)
do
echo "FILE :[${f}]"
done
IFS=$OLD_IFS
You can just do:
tst get files | awk 'NR > 1 { printf "FILE :[%s]\n", $1 }'
Update: To answer extended problem as per comments below by OP:
while read -r file _; do
tst setprimary "$file" && tst get dataload
done < <(tst get files | sed 1d)
Or perl:
tst ... | perl -lanE 'say "File: [$F[0]]" if $.>1'
the variable $. contains the current line number
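Another way to skip the header without an extra process is to read it once before the loop starts. The sketch below uses a stub function in place of the real tst get files command, so the names and sample rows are illustrative only.

```shell
#!/bin/bash
# Stub standing in for `tst get files` (hypothetical output).
tst_get_files() {
    printf '%s\n' 'NAME COUNT ADMIN' 'FileA.txt 30 adminA' 'FileB.txt 21 local'
}

names=()
{
    read -r _header                 # consume and discard the header line
    while read -r name _rest; do    # first column in name, the rest ignored
        names+=("$name")
    done
} < <(tst_get_files)

printf 'FILE :[%s]\n' "${names[@]}"
```

Because the loop reads from process substitution rather than a pipe, it runs in the current shell and the names array is still set afterwards.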

Extract first word in colon separated text file

How do I iterate through a file and print only the first word of each line? The lines are colon separated. Example:
root:01:02:toor
The file contains several lines. This is what I've done so far, but it doesn't work.
FILE=$1
k=1
while read line; do
echo $1 | awk -F ':'
((k++))
done < $FILE
I'm not good with bash-scripting at all. So this is probably very trivial for one of you..
edit: variable k is to count the lines.
Use cut:
cut -d: -f1 filename
-d specifies the delimiter
-f specifies the field(s) to keep
If you need to count the lines, just
count=$( wc -l < filename )
-l tells wc to count lines
awk -F: '{print $1}' FILENAME
That will print the first word when separated by colon. Is this what you are looking for?
To use a loop, you can do something like this:
$ cat test.txt
root:hello:1
user:bye:2
test.sh
#!/bin/bash
while IFS=':' read -r line || [[ -n $line ]]; do
echo $line | awk -F: '{print $1}'
done < test.txt
Example of reading line by line in bash: Read a file line by line assigning the value to a variable
Result:
$ ./test.sh
root
user
A solution using perl
%> perl -F: -ane 'print "$F[0]\n";' [file(s)]
change the "\n" to " " if you don't want a new line printed.
You can get the first word without any external commands in bash like so:
printf '%s' "${line%%:*}"
This takes the variable named line and removes the longest suffix matching the glob :* (the %% makes the match greedy, whereas a single % would remove the shortest match), so everything from the first colon onward is deleted.
Though with this solution you do need to do the loop yourself. If this is the only thing you want to do with the variable the cut solution is better so you don't have to do the file iteration yourself.
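Put together with the loop, the builtin-only approach might look like the sketch below. The function name first_fields and the second sample line are made up for illustration.

```shell
#!/bin/bash
# Hypothetical helper: prints the first colon-separated field of each stdin line.
first_fields() {
    # The || [ -n "$line" ] part also handles a last line with no trailing newline.
    while IFS= read -r line || [ -n "$line" ]; do
        printf '%s\n' "${line%%:*}"   # drop everything from the first colon on
    done
}

printf 'root:01:02:toor\nuser:03:04:resu\n' | first_fields
```

For a real file this would be first_fields < "$FILE".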

sed - unterminated `s' command

I have this piece of code:
cat BP.csv | while read line ; do
goterm=$(awk '{print $1}') ;
name=$(awk '{print $2}') ;
grep -w "$goterm" GOEA.csv | sed "s/$goterm/pi/g" ;
done
file BP.csv has this format:
GO:0008283 cell proliferation
GO:0009405 pathogenesis
GO:0010201 response to continuous far red light stimulus by the high-irradiance response system
GO:0009641 shade avoidance
while GOEA.csv has this format:
4577 GO:0006807 0.994 2014_06_01
4577 GO:0016788 0.989 2014_06_01
4577 GO:0043169 0.977 2014_06_01
4577 GO:0043170 0.963 2014_06_01
sed doesn't work. I want to change GO:0043170 for example, to string "pi", but it gives:
sed: -e expression #1, char 12: unterminated `s' command
Why?
Thanks.
Your awk commands are not being given the current line as input. Try this:
cat BP.csv | while read line ; do
goterm=$(awk '{print $1}' <<< "$line") ;
name=$(awk '{print $2}' <<< "$line" ) ;
grep -w "$goterm" GOEA.csv | sed "s/$goterm/pi/g" ;
done
Let's clean up this code a bit:
while read goterm name
do
grep -w "$goterm" GOEA.csv | sed "s/$goterm/pi/g"
done < BP.csv
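The problem is that your awk statements are attempting to read in from STDIN just like your while is doing. You're reading from the same input stream. The first awk therefore consumes all the remaining lines, so $goterm ends up holding several GO terms separated by newlines, and the newline in the middle of the s command is what sed reports as unterminated.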
What you want to do is to pull out the values from your line. I'm using read to do this. The read statement uses the values in $IFS to separate out the input. This is normally spaces, tabs, and newlines. The read reads each variable you put on the line, and the last value read in contains the entire rest of the line.
Thus:
while read line
reads in the entire line while:
while read goterm name
will break the line as
goterm="GO:0008283"
name="cell proliferation"
One more thing. When you use grep and sed together, you probably can get away with just sed:
while read goterm name
do
sed -n "/$goterm/s/$goterm/pi/gp" GOEA.csv
done < BP.csv
The format of this sed command is:
/address/s/pattern/replacement/flags
So, I'm selecting lines with $goterm in them, then replacing $goterm with pi. The -n means don't print lines as sed processes them, and the p flag prints the lines where a substitution was made.
By the way, csv as a file suffix means comma separated values, but neither file looks like it is comma separated. Are tabs separating each field? If so, you'll need to modify $IFS to be tabs.
I would restructure that whole thing more like this:
while read goterm restofline
do
grep -w "${goterm}" GOEA.csv | sed -e "s/${goterm}/pi/g"
done < BP.csv
No reason for the awk things, as the bash read builtin will do rudimentary field splitting for you if you give it multiple variables. Also, you aren't using name anyway, so it's not needed. cat is unnecessary as well.
Depending on your exact use case, even the grep may be unnecessary, making the inner command simply sed -ne "s/${goterm}/pi/gp" GOEA.csv. Unless your purpose for the grep -w is eliminating lines where ${goterm} is a substring of a word instead of the whole word...
For future reference, inserting a set -x above your loop in your script would show you the exact commands that are being run, so that you can compare them with your expectations.
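To see the read-based loop working end to end, here is a self-contained sketch that builds two small scratch files first. The file contents are cut-down, partly invented samples in the same shape as the ones in the question, and the scratch directory is hypothetical.

```shell
#!/bin/bash
# Scratch copies of the two files, for illustration only.
workdir=$(mktemp -d)
printf '%s\n' 'GO:0043170 some process' > "$workdir/BP.csv"
printf '%s\n' '4577 GO:0006807 0.994 2014_06_01' \
              '4577 GO:0043170 0.963 2014_06_01' > "$workdir/GOEA.csv"

result=$(
    # read splits each line itself: first word into goterm, the rest into _name
    while read -r goterm _name; do
        grep -w -- "$goterm" "$workdir/GOEA.csv" | sed "s/$goterm/pi/g"
    done < "$workdir/BP.csv"
)
echo "$result"
rm -r "$workdir"
```

Because read does the splitting, nothing inside the loop competes with it for standard input.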

Concise and portable "join" on the Unix command-line

How can I join multiple lines into one line, with a separator where the new-line characters were, and avoiding a trailing separator and, optionally, ignoring empty lines?
Example. Consider a text file, foo.txt, with three lines:
foo
bar
baz
The desired output is:
foo,bar,baz
The command I'm using now:
tr '\n' ',' <foo.txt |sed 's/,$//g'
Ideally it would be something like this:
cat foo.txt |join ,
What's:
the most portable, concise, readable way.
the most concise way using non-standard unix tools.
Of course I could write something, or just use an alias. But I'm interested to know the options.
Perhaps a little surprisingly, paste is a good way to do this:
paste -s -d","
This won't deal with the empty lines you mentioned. For that, pipe your text through grep, first:
grep -v '^$' | paste -s -d"," -
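Wrapped up as a small function, this gets close to the cat foo.txt | join , form asked for. The name join_lines is made up (join is already a standard utility), and the sketch assumes a single-character separator.

```shell
#!/bin/bash
# Hypothetical helper: join_lines SEP
# Joins non-empty stdin lines into one line separated by SEP.
join_lines() {
    grep -v '^$' | paste -s -d "$1" -
}

printf 'foo\n\nbar\nbaz\n' | join_lines ,
```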
This sed one-line should work -
sed -e :a -e 'N;s/\n/,/;ba' file
Test:
[jaypal:~/Temp] cat file
foo
bar
baz
[jaypal:~/Temp] sed -e :a -e 'N;s/\n/,/;ba' file
foo,bar,baz
To handle empty lines, you can remove the empty lines and pipe it to the above one-liner.
sed -e '/^$/d' file | sed -e :a -e 'N;s/\n/,/;ba'
How about to use xargs?
for your case
$ cat foo.txt | sed 's/$/, /' | xargs
Be careful about the command-line length limit of xargs: a very long input file cannot be handled reliably this way. Note also that this leaves a trailing comma after the last item.
Perl:
cat data.txt | perl -pe 'if(!eof){chomp;$_.=","}'
or yet shorter and faster, surprisingly:
cat data.txt | perl -pe 'if(!eof){s/\n/,/}'
or, if you want:
cat data.txt | perl -pe 's/\n/,/ unless eof'
Just for fun, here's an all-builtins solution
IFS=$'\n' read -r -d '' -a data < foo.txt ; ( IFS=, ; echo "${data[*]}" ; )
You can use printf instead of echo if the trailing newline is a problem.
This works by setting IFS, the delimiters that read will split on, to just newline and not other whitespace, then telling read to not stop reading until it reaches a nul, instead of the newline it usually uses, and to add each item read into the array (-a) data. Then, in a subshell so as not to clobber the IFS of the interactive shell, we set IFS to , and expand the array with *, which delimits each item in the array with the first character in IFS
I needed to accomplish something similar, printing a comma-separated list of fields from a file, and was happy with piping STDOUT to xargs and ruby, like so:
cat data.txt | cut -f 16 -d ' ' | grep -o "\d\+" | xargs ruby -e "puts ARGV.join(', ')"
I had a log file where some data was broken into multiple lines. When this occurred, the last character of the first line was the semi-colon (;). I joined these lines by using the following commands:
for LINE in `cat $FILE | tr -s " " "|"`
do
if [ $(echo $LINE | egrep ";$") ]
then
echo "$LINE\c" | tr -s "|" " " >> $MYFILE
else
echo "$LINE" | tr -s "|" " " >> $MYFILE
fi
done
The result is a file where lines that were split in the log file were one line in my new file.
Simple way to join the lines with space in-place using ex (also ignoring blank lines), use:
ex +%j -cwq foo.txt
If you want to print the results to the standard output, try:
ex +%j +%p -scq! foo.txt
To join lines without spaces, use +%j! instead of +%j.
To use different delimiter, it's a bit more tricky:
ex +"g/^$/d" +"%s/\n/_/e" +%p -scq! foo.txt
where g/^$/d (or v/\S/d) removes blank lines and s/\n/_/ is substitution which basically works the same as using sed, but for all lines (%). When parsing is done, print the buffer (%p). And finally -cq! executing vi q! command, which basically quits without saving (-s is to silence the output).
Please note that ex is equivalent to vi -e.
This method is quite portable, as most Linux/Unix systems ship with ex/vi by default. It is also more portable than editing in place with sed, whose -i option is a non-standard extension; sed itself is stream oriented.
POSIX shell:
( set -- $(cat foo.txt) ; IFS=+ ; printf '%s\n' "$*" )
My answer is:
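awk '{printf "%s%s", sep, $0; sep=","} END{print ""}' foo.txt
printf is enough. The sep variable is empty for the first line, which avoids a leading comma, and the END block prints the final newline. We don't need -F"\n" to change the field separator.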

Resources