Using awk, eliminate any empty fields in a file and print in proper format - bash

How can I use awk on the following file, named "awk.txt", to print all fields with a consistent amount of space (or a tab) between them?
# cat /root/awk.txt
abc hij klm
def pqr hij
mmm fgf hgt
yyt ghf jkw
I wanted to use awk on this and print in the following proper format.
abc hij klm
def pqr hij
mmm fgf hgt
yyt ghf jkw
Please help!!

Use the column command (shipped with util-linux on most Linux systems, not coreutils):
column -t file
column does the job even if the entries have different lengths. In this special case, where all entries have the same length, the following awk command would do the trick as well:
awk '{$1=$1}1' OFS=' ' file

This line of awk formats the output using printf:
awk '{printf "%3s\t%3s\t%3s\n",$1,$2,$3}' awk.txt
If you want to skip lines starting with #:
awk '!/^#/{printf "%3s\t%3s\t%3s\n",$1,$2,$3}' awk.txt
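If the entries differ in length and column isn't available, a two-pass awk can work out the column widths itself. This is only a sketch: the sample data, the variable names and the two-space gap between columns are assumptions made for illustration, not part of the question.

```shell
# Made-up sample with irregular spacing and entries of different lengths.
tmp=$(mktemp)
printf 'abc   hij klm\ndefgh pqr    hij\nmm fgf hgt\n' > "$tmp"

# Pass 1 (NR == FNR): remember the widest entry seen in each column.
# Pass 2: left-align every field to that width, with two spaces between columns.
aligned=$(awk '
  NR == FNR { for (i = 1; i <= NF; i++) if (length($i) > w[i]) w[i] = length($i); next }
  {
    line = ""
    for (i = 1; i <= NF; i++)
      line = line sprintf("%-" w[i] "s%s", $i, (i < NF ? "  " : ""))
    sub(/ +$/, "", line)   # drop padding after the last field
    print line
  }' "$tmp" "$tmp")
printf '%s\n' "$aligned"
rm -f "$tmp"
```

The file is passed twice on purpose, so awk reads it once to measure and once to print.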

Related

Trying to merge 2 files but ignore new lines

I'm trying to merge 2 lists together: Only copy over common differences, but ignore new lines. Might be easier to explain by this:
a.txt b.txt
abc 123
def abc.^$234,~12
ghi abcdd
jkl asdf
mnn ghi.^$321,~11
opq jkl
mnn^$qws
zxy
Becomes:
output.txt:
abc.^$234,~12
def
ghi.^$321,~11
jkl
mnn^$qws
opq
Trying to combine two lists, copying common lines while dropping new lines.
This might work for you (GNU sed):
sed -nE '1{x;s/.*/cat file2/e;x};G;s/^([^\n]+)(\n.*)*\n(\1\>[^\n]*).*/\3/;P' file1
Slurp file2 into the hold space and then append it to each line in file1.
If the word in file1 matches a word in file2, print the contents of that line in file2. Otherwise, print the current line in file1.
You could try the diff and patch commands; they might help you.
diff -u old_file new_file > change.diff
patch old_file < change.diff
Your requirements aren't at all clear, but this will produce the expected output you posted given the sample input you posted, so it may be what you're looking for:
$ awk -F'[^[:alnum:]]' 'NR==FNR{a[$1]=$0; next} {print ($1 in a ? a[$1] : $1)}' b.txt a.txt
abc.^$234,~12
def
ghi.^$321,~11
jkl
mnn^$qws
opq
Using awk:
$ awk '
NR==FNR {
a[$0]
next
}
{
for(i in a)
if(index(i,$0)) {
print i
next
}
print
}' b a
Output:
abc.^$234,~12
def
ghi.^$321,~11
jkl
mnn^$qws
opq
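Substring matching with index() can also hit a line such as abcdd when looking for abc, and the order of for (i in a) is unspecified, so a lookup keyed on the leading word is a bit more predictable. Below is a minimal sketch of that key-based idea run against the sample lists. The scratch directory, the seen array and the use of [^A-Za-z0-9] as an ASCII stand-in for [^[:alnum:]] are assumptions for illustration.

```shell
# Recreate the sample lists from the question in a scratch directory.
dir=$(mktemp -d)
printf '%s\n' abc def ghi jkl mnn opq > "$dir/a.txt"
printf '%s\n' 123 'abc.^$234,~12' abcdd asdf 'ghi.^$321,~11' jkl 'mnn^$qws' zxy > "$dir/b.txt"

# First file (b.txt): index each line by its leading alphanumeric word.
# Second file (a.txt): print the stored b.txt line if the word is a key,
# otherwise print the word itself.
merged=$(awk -F'[^A-Za-z0-9]' '
  NR == FNR { if (!($1 in seen)) seen[$1] = $0; next }
  { print (($1 in seen) ? seen[$1] : $1) }
' "$dir/b.txt" "$dir/a.txt")
printf '%s\n' "$merged"
rm -rf "$dir"
```

Here abcdd is stored under its own key, so it can never be returned for abc.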

BASH - Split file into several files based on conditions

I have a file (input.txt) with the following structure:
>day_1
ABC
DEF
GHI
>day_2
JKL
MNO
PQR
>day_3
STU
VWX
YZA
>month_1
BCD
EFG
HIJ
>month_2
KLM
NOP
QRS
...
I would like to split this file into multiple files (day.txt; month.txt; ...). Each new text file would contain all "header" lines (the one starting with >) and their content (lines between two header lines).
day.txt would therefore be:
>day_1
ABC
DEF
GHI
>day_2
JKL
MNO
PQR
>day_3
STU
VWX
YZA
and month.txt:
>month_1
BCD
EFG
HIJ
>month_2
KLM
NOP
QRS
I cannot use split -l in this case because the number of lines is not the same for each category (day, month, etc.). However, each sub-category has the same number of lines (=3).
EDIT: Adding one more solution, as requested by the OP.
awk -F'[>_]' '/^>/{file=$2".txt"} {print > file}' Input_file
Explanation:
awk -F'[>_]' ' ##Split each line into fields on > or _.
/^>/{ file=$2".txt" } ##On a header line (one starting with >), set the variable file to the 2nd field followed by ".txt".
{ print > file } ##Write the current line to the file named by the variable file.
' Input_file ##Name of the input file.
The following awk may also help:
awk '/^>day/{file="day.txt"} /^>month/{file="month.txt"} {print > file}' Input_file
You can set the record separator to > and then just set the file name based on the category given by $1.
$ awk -v RS=">" 'NF {f=$1; sub(/_.*$/, ".txt", f); printf ">%s", $0 > f}' input.txt
$ cat day.txt
>day_1
ABC
DEF
GHI
>day_2
JKL
MNO
PQR
>day_3
STU
VWX
YZA
$ cat month.txt
>month_1
BCD
EFG
HIJ
>month_2
KLM
NOP
QRS
Here's a generic solution for >name_number format
$ awk 'match($0, /^>[^_]+_/){k = substr($0, RSTART+1, RLENGTH-2);
if(!(k in a)){close(op); a[k]; op=k".txt"}}
{print > op}' ip.txt
match($0, /^>[^_]+_/) if line matches >name_ at start of line
k = substr($0, RSTART+1, RLENGTH-2) save the name portion
if(!(k in a)) if the key is not found in array
a[k] add key to array
op=k".txt" output file name
close(op) in case there are too many files to write
print > op print input record to filename saved in op
Since each subcategory is composed of the same number of lines, you can use grep's -A / --after-context option to print that number of lines after each matching header.
So if you know the list of categories in advance, you just have to grep for the headers of their subcategories and redirect them, with their content, to the correct file:
lines_by_subcategory=3 # number of lines *after* a subcategory's header
for category in "month" "day"; do
grep ">$category" -A $lines_by_subcategory input.txt >> "$category.txt"
done
Note that this isn't the most efficient solution, as it must read the whole input once for each category. Other solutions could instead read the input once and redirect each subcategory to its respective file in a single pass.
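A single-pass version might look like the sketch below. It is not one of the posted answers: the sample data, the scratch directory and the variable names (dir, name, out) are made up for illustration, and it assumes the category is the text between > and the first _.

```shell
# Hypothetical sample with interleaved categories, written to a scratch directory.
dir=$(mktemp -d)
printf '%s\n' '>day_1' ABC DEF '>month_1' BCD EFG '>day_2' JKL MNO > "$dir/input.txt"

# On each header line, derive the category, close the previous output file
# and switch to <category>.txt.
# ">>" appends, so a category that shows up again later is not truncated
# (it also means leftovers from an earlier run would be kept).
awk -v dir="$dir" '
  /^>/ {
    name = substr($0, 2)
    sub(/_.*/, "", name)
    if (out != "") close(out)
    out = dir "/" name ".txt"
  }
  out != "" { print >> out }
' "$dir/input.txt"

day_out=$(cat "$dir/day.txt")
month_out=$(cat "$dir/month.txt")
printf '%s\n---\n%s\n' "$day_out" "$month_out"
rm -rf "$dir"
```

Closing the previous file on every header keeps only one output file open at a time, at the cost of reopening files when categories alternate.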

Replace a word of a line if matched

I am given a file. If a line has "xxx" as its third word then I need to replace it with "yyy". My final output must have all the original lines with the modified lines.
The input file is-
abc xyz mno
xxx xyz abc
abc xyz xxx
abc xxx xxx xxx
The required output file should be-
abc xyz mno
xxx xyz abc
abc xyz yyy
abc xxx yyy xxx
I have tried-
grep "\bxxx\b" file.txt | awk '{if ($3=="xxx") print $0;}' | sed -e 's/[^ ]*[^ ]/yyy/3'
but this gives the output as-
abc xyz yyy
abc xxx yyy xxx
The following simple awk may help:
awk '$3=="xxx"{$3="yyy"} 1' Input_file
Output will be as follows.
abc xyz mno
xxx xyz abc
abc xyz yyy
abc xxx yyy xxx
Explanation: If the third field ($3) equals the string xxx, $3 is set to yyy. awk programs are made of condition/action pairs; the trailing 1 is a condition that is always true and has no action, so the default action (printing the current line, whether or not its third field was changed) runs for every line.
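As a quick check, here is the same one-liner run on the sample lines plus one extra line with irregular spacing (that extra line is made up for illustration). Assigning to $3 makes awk rebuild the line with single spaces between fields, which is worth knowing if the original spacing matters.

```shell
# Sample lines from the question, plus one made-up line with irregular spacing.
input=$(printf '%s\n' 'abc xyz mno' 'xxx xyz abc' 'abc xyz xxx' 'abc xxx xxx xxx' 'abc   xyz   xxx')

# $3 == "xxx" is the condition, { $3 = "yyy" } the action;
# the bare 1 prints every line, changed or not.
result=$(printf '%s\n' "$input" | awk '$3 == "xxx" { $3 = "yyy" } 1')
printf '%s\n' "$result"
```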
sed solution:
sed -E 's/^(([^[:space:]]+[[:space:]]+){2})xxx\>/\1yyy/' file
The output:
abc xyz mno
xxx xyz abc
abc xyz yyy
abc xxx yyy xxx
To modify the file in place, add the -i option: sed -Ei ....
In general the awk command may look like
awk '{command set 1}condition{command set 2}' file
The command set 1 would be executed for every line while command set 2 will be executed if the condition preceding that is true.
"My final output must have all the original lines with the modified lines"
In your case
awk 'BEGIN{print "Original File";i=1}
{print}
$3=="xxx"{$3="yyy"}
{rec[i++]=$0}
END{print "Modified File";for(i=1;i<=NR;i++)print rec[i]}' file
should solve that.
Explanation
$3 is the third space-delimited field in awk. If it equals "xxx", it is replaced. The unmodified lines are printed first while the modified lines are stored in an array; at the end, the modified lines are printed. BEGIN and END blocks are executed only once, at the beginning and the end respectively. NR is the awk built-in variable that holds the number of records processed so far. Since it is used in the END block, it gives the total number of records.
All good :-)
Ravinder has already provided you with the shortest awk solution possible.
In sed, the following would work:
sed -E 's/^(([^ ]+ ){2})xxx/\1yyy/' input.txt
Or if your sed doesn't support -E, you can use the more painful BRE notation:
sed 's/^\(\([^ ][^ ]* \)\{2\}\)xxx/\1yyy/' input.txt
And if you're in the mood to handle this in bash alone, something like this might work:
while read -r line; do
read -r -a a <<<"$line"
[[ "${a[2]}" == "xxx" ]] && a[2]="yyy"
printf '%s ' "${a[@]}"
printf '\n'
done < input.txt
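For the sed route, it may help to pin the match to the start of the line and to require xxx to be a whole word; otherwise xxx in a later position, or a word that merely starts with xxx, could be rewritten. The following is a sketch under the assumption that words are separated by single spaces. The two extra input lines are made-up edge cases.

```shell
# Sample lines from the question, plus two made-up edge cases:
# a third word that merely starts with xxx, and xxx in fifth position.
input=$(printf '%s\n' 'abc xyz mno' 'xxx xyz abc' 'abc xyz xxx' 'abc xxx xxx xxx' 'abc xyz xxxfoo' 'a b c d xxx')

# ^ pins the match to the start of the line, ([^ ]+ ){2} skips exactly two words,
# and ( |$) requires xxx to end at a space or at the end of the line.
result=$(printf '%s\n' "$input" | sed -E 's/^(([^ ]+ ){2})xxx( |$)/\1yyy\3/')
printf '%s\n' "$result"
```

Only the third and fourth sample lines change; both edge cases pass through untouched.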

How can I replace a character in a specific column? [duplicate]

I have a text file and I'm trying to replace a specific character (.) in the first column to another character (-). Every field is delimited by comma. Some of the lines have the last 3 columns empty, so they have 3 commas at the end.
Example of text file:
abc.def.ghi,123.4561.789,ABC,DEF,GHI
abc.def.ghq,124.4562.789,ABC,DEF,GHI
abc.def.ghw,125.4563.789,ABC,DEF,GHI
abc.def.ghe,126.4564.789,,,
abc.def.ghr,127.4565.789,,,
What I tried was using awk to replace '.' in the first column with '-', then print out the contents.
ETA: Tried out sarnold's suggestion and got the output I want.
ETA2: I could have a longer first column. Is there a way to change ONLY the first 3 '.' in the first column to '-', so I get the output
abc-def-ghi-qqq.www,123.4561.789,ABC,DEF,GHI
abc-def-ghq-qqq.www,124.4562.789,ABC,DEF,GHI
abc-def-ghw-qqq.www,125.4563.789,ABC,DEF,GHI
abc-def-ghe-qqq.www,126.4564.789,,,
abc-def-ghr-qqq.www,127.4565.789,,,
In a regexp, . means "any character". Escape it with \ to match a literal dot:
$ awk -F, '{gsub(/\./,"-",$1); print}' textfile.csv
abc-def-ghi 123.4561.789 ABC DEF GHI
abc-def-ghq 124.4562.789 ABC DEF GHI
abc-def-ghw 125.4563.789 ABC DEF GHI
abc-def-ghe 126.4564.789
abc-def-ghr 127.4565.789
$
The output field separator is a space by default. Set OFS = "," to change that:
$ awk -F, 'BEGIN {OFS=","} {gsub(/\./,"-",$1); print}' textfile.csv
abc-def-ghi,123.4561.789,ABC,DEF,GHI
abc-def-ghq,124.4562.789,ABC,DEF,GHI
abc-def-ghw,125.4563.789,ABC,DEF,GHI
abc-def-ghe,126.4564.789,,,
abc-def-ghr,127.4565.789,,,
This still allows changing multiple fields:
$ awk -F, 'BEGIN {OFS=","} {gsub(/\./,"-",$1); gsub("1", "#",$2); print}' textfile.csv
abc-def-ghi,#23.456#.789,ABC,DEF,GHI
abc-def-ghq,#24.4562.789,ABC,DEF,GHI
abc-def-ghw,#25.4563.789,ABC,DEF,GHI
abc-def-ghe,#26.4564.789,,,
abc-def-ghr,#27.4565.789,,,
I don't know what -OFS, does, but it isn't a supported command line option; using it to set the output field separator was a mistake on my part. Setting OFS within the awk program works well.
This might work for you:
awk -F, -vOFS=, '{for(n=1;n<=3;n++)sub(/\./,"-",$1)}1' file
abc-def-ghi-qqq.www,123.4561.789,ABC,DEF,GHI
abc-def-ghq-qqq.www,124.4562.789,ABC,DEF,GHI
abc-def-ghw-qqq.www,125.4563.789,ABC,DEF,GHI
abc-def-ghe-qqq.www,126.4564.789,,,
abc-def-ghr-qqq.www,127.4565.789,,,
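The loop works because sub() replaces only the first match each time it is called. Here is a small sketch with the count pulled out into a variable; the name n and the two input lines (a longer first column, as in ETA2) are assumptions for illustration.

```shell
# Two made-up lines with a longer first column, as described in ETA2.
input=$(printf '%s\n' 'abc.def.ghi.qqq.www,123.4561.789,ABC,DEF,GHI' 'abc.def.ghe.qqq.www,126.4564.789,,,')

# sub() replaces only the first match, so calling it n times on $1 turns the
# first n dots into dashes and leaves later dots and the other columns alone.
result=$(printf '%s\n' "$input" | awk -F, -v OFS=, -v n=3 '{ for (i = 1; i <= n; i++) sub(/\./, "-", $1) } 1')
printf '%s\n' "$result"
```

Setting OFS to a comma matters here, since changing $1 makes awk rebuild the line with the output separator.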

How to get the lines from one pattern to another pattern with sed when those patterns are repeated

If I have a file like this:
test.txt
abc naveen
abc cde
naveen cde
kumar
naveen
abc
cde
abc
naveen
cde
Question 1: The file contains repeated patterns such as abc, naveen, cde, etc.
Now we have to get the lines from a given occurrence of one pattern to a given occurrence of another pattern.
For example, I want to get the lines from the 2nd occurrence of abc to the 3rd occurrence of naveen, i.e. the output should be
abc cde
naveen cde
kumar
naveen
Question 2 (a continuation of the question above):
I want to get only the lines between them (excluding abc and naveen themselves).
So I want the output to be
cde
naveen cde
kumar
This should be doable with the sed command ....
Could anyone please give me an answer for this?
Try this:
a=2
b=3
abcocc=$(awk '/abc/{print NR}' txt | awk -v occ="$a" 'NR==occ')
naveenocc=$(awk '/naveen/{print NR}' txt | awk -v occ="$b" 'NR==occ')
# 1) lines from the abc match through the naveen match, inclusive
awk -v abc="$abcocc" -v naveen="$naveenocc" 'NR>=abc && NR<=naveen' txt
# 2) only the lines strictly between the two matches
awk -v abc="$abcocc" -v naveen="$naveenocc" 'NR>abc && NR<naveen' txt
a is the occurrence number of abc, b is the occurrence number of naveen, and txt is the input file. Try it and let me know if any modification is needed.
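The same selection can probably be done in one pass by counting matching lines as they go by. This is a sketch rather than a drop-in replacement: the counters na and nb and the flag on are made-up names, and, like the commands above, it counts lines that contain the pattern, not individual words.

```shell
# The test file from the question.
input=$(printf '%s\n' 'abc naveen' 'abc cde' 'naveen cde' kumar naveen abc cde abc naveen cde)

# Inclusive: start printing on the a-th line containing abc,
# stop right after the b-th line containing naveen.
inclusive=$(printf '%s\n' "$input" | awk -v a=2 -v b=3 '
  /abc/ && ++na == a { on = 1 }
  on { print }
  /naveen/ && ++nb == b { exit }')

# Exclusive: same counters, but the checks are reordered so that
# neither boundary line is printed.
exclusive=$(printf '%s\n' "$input" | awk -v a=2 -v b=3 '
  /naveen/ && ++nb == b { exit }
  on { print }
  /abc/ && ++na == a { on = 1 }')

printf '%s\n---\n%s\n' "$inclusive" "$exclusive"
```

The exclusive variant drops whole boundary lines, so it prints naveen cde and kumar, matching command 2) above.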

Resources