sed | grep weird behaviour in script - bash

THE ISSUE
I have two files,
File1
INT1;INT2;INT3
INT4;INT5;INT6
INT7;INT7;INT9
File2
INT1;INT2;INT3
Next I'll grep the difference between the files and take only the integers in the third column.
DIFFERENCE=`grep -vxFf File2 File1 | awk 'BEGIN { FS = ";" } ; { print $3 }'`
resulting in
INT6 INT9
Next I want to substitute the spaces with line breaks
echo $DIFFERENCE | sed 's/ /;\n/g'
which results in
INT6;
INT9
Just as it should.
However, when I run it in the script, it returns
INT6
INT9
Why does it do this in the script, and is there a solution to this / how can I easily modify my result?
ORIGINAL CODE - FOR CLARIFICATION
Original code and output here
CODE=`grep -vxFf $FOUND $COMPARETO | awk 'BEGIN { FS = ";" } ; { print $3 }'`
echo "$CODE;" | sed 's/ /;\n/g' > "testfile"
8000070118157
8002820000804
3394700015011;

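Your intermediate output is not INT6 INT9 on a single line but already two lines. In your interactive test the unquoted echo $DIFFERENCE lets the shell split the value into words and join them with spaces, so sed has spaces to replace. In the script, "$CODE;" is quoted, the newlines are preserved and there are no spaces, therefore sed doesn't replace anything.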
You can do all of this in awk itself, for example
$ awk -F';' 'NR==FNR{a[$0];next} !($0 in a){print $3 FS}' file2 file1
INT6;
INT9;
If you don't want the last ;, it is perhaps easiest to pipe the output to sed '$ s/;$//'
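To see the quoting effect in isolation, here is a minimal sketch. The variable names unquoted and quoted are only for illustration, and the printf stands in for the real grep | awk pipeline.

```shell
# Simulate the two-line result of the grep | awk pipeline.
DIFFERENCE=$(printf 'INT6\nINT9\n')

# Unquoted: word splitting turns the newline into a single space.
unquoted=$(echo $DIFFERENCE)

# Quoted: the newline survives, so there is no space for sed to match.
quoted=$(echo "$DIFFERENCE")

printf 'unquoted: [%s]\n' "$unquoted"
printf 'quoted:   [%s]\n' "$quoted"
```

The first value is what the interactive test fed to sed, the second is what the script fed to it.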

Related

Replace one line of a file with another line in a second file if it matches the condition

I am wondering whether I can read each line of a.txt and compare it to each line in b.txt. If a line in a.txt matches the beginning of a line in b.txt, the matched line in b.txt should be replaced with the line from a.txt. For example, suppose a.txt contains alias cd /correct/path/ and b.txt contains alias cd /wrong/path/sth. After I execute my command, I would like both files to contain alias cd /correct/path/. My own solution uses two nested while ... read loops and sed -i to replace the line, but I think it is clumsy and inefficient. I am looking for a cleaner and more efficient solution. Here is my code, in case it helps:
awk 'NR==FNR { array[$0]; next } { delete array[$0] } END{for (key in array) { print key } }' a.txt b.txt > tmp
input="tmp"
while IFS= read -r line
do
echo "$line"
cat b.txt > n_tmp
n_input="$n_tmp"
while IFS= read -r n_line
do
if $n_line | awk '{print $1, $2}' == $line | awk '{print $1, $2}'; then
sed -i "s/$n_line/$line/" b.txt
fi
done < "$n_input"
rm -rf n_tmp
done < "$input"
rm -rf tmp
There are a few mistakes in this script, and most of them are in the line: if $n_line | awk '{print $1, $2}' == $line | awk '{print $1, $2}'; then. First of all, $n_line | awk '{print $1, $2}' does not give the fields of the string, because the shell tries to run the contents of n_line as a command. An echo is needed so that the string is written to the pipe and awk can read it. Secondly, the strings being compared are not double-quoted. Lastly, the two sides of the comparison need to be wrapped in double brackets. So in the end it should look something like this:
a_string=`echo "$line" | awk '{print $1, $2}'`
b_string=`echo "$n_line" | awk '{print $1, $2}'`
if [[ "$a_string" == "$b_string" ]]; then
I decided to store the output of each echo pipeline in a variable as well, since it looks a bit cleaner and is easier to handle. There are still some other problems with this script, but I think the primary issue is solved.
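As a self-contained check of the corrected comparison, here is a small sketch in bash. The two sample lines are taken from the question, and the result variable is only an illustration, not part of the original script.

```shell
# Sample lines from the question: one from a.txt, one from b.txt.
line='alias cd /correct/path/'
n_line='alias cd /wrong/path/sth'

# Extract the first two whitespace-separated fields of each line.
a_string=$(echo "$line" | awk '{print $1, $2}')
b_string=$(echo "$n_line" | awk '{print $1, $2}')

# Compare the quoted strings inside double brackets (bash syntax).
if [[ "$a_string" == "$b_string" ]]; then
    result="match"
else
    result="no match"
fi
echo "$result"
```

Both lines start with alias cd, so the branch that would run the sed replacement is taken.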

AWK -F with print all but last record

/Home/in/test_file.txt
echo /Home/in/test_file.txt | awk -F'/' '{ print $2,$3 }'
Gives the result as:
Home in
But I need /Home/in/ as the result. I have to get everything except test_file.txt.
How can I achieve this?
$ echo '/Home/in/test_file.txt' | awk '{sub("/[^/]+$","")} 1'
/Home/in
$ echo '/Home/in/test_file.txt' | awk '{sub("[^/]+$","")} 1'
/Home/in/
$ echo '/Home/in/test_file.txt' | sed 's:/[^/]*$::'
/Home/in
$ echo '/Home/in/test_file.txt' | sed 's:[^/]*$::'
/Home/in/
$ dirname '/Home/in/test_file.txt'
/Home/in
Your attempt awk -F'/' '{ print $2,$3 }' didn't do what you wanted as -F'/' is telling awk to split the input into fields at every / and then print $2,$3 is telling awk to print the 2nd and 3rd fields separated by a blank char (the default value for OFS). You could do:
$ echo '/Home/in/test_file.txt' | awk 'BEGIN{FS=OFS="/"} { print "",$2,$3,"" }'
/Home/in/
to get the expected output, but it'd be the wrong approach: it removes the field you don't want AND removes the input separators AND then adds new output separators, which happen to have the same value as the input separators, rather than simply removing the field you don't want like the other solutions above do.
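To make the field splitting visible, this small sketch prints every field awk sees for that path. The variable names path and fields are only for illustration.

```shell
path='/Home/in/test_file.txt'

# Print each field with its index; the leading / produces an empty first field.
fields=$(echo "$path" | awk -F'/' '{ for (i = 1; i <= NF; i++) printf "%d=[%s]\n", i, $i }')
echo "$fields"
```

The output lists four fields, the first of which is empty, and shows that none of them contains a slash. That is why print $2,$3 cannot give back the separators on its own.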
echo /Home/in/test_file.txt | awk -F'/[^/]*$' '{ print $1 }'
...will print everything up to, but not including, the last slash (/Home/in).
There are several ways to achieve this:
Using dirname:
$ dirname /home/in/test_file.txt
/home/in
Using Shell substitution:
$ var="/home/in/test_file.txt"
$ echo "${var%/*}"
/home/in
Using sed: (See Ed Morton)
Using AWK:
$ echo "/home/in/test_file.txt" | awk -F'/' '{OFS=FS;$NF=""}1'
/home/in/
Remark: all these work since you can't have a filename with a forward slash (Is it possible to use "/" in a filename?)
Note: these behave differently if you just have a single file name without a path. dirname foo returns . (the current directory), while the shell substitution returns foo unchanged and the awk command prints an empty line.
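If you want to check the single-file-name case yourself, a quick sketch like the following compares the approaches. The variable names are made up for this example.

```shell
var="foo"   # a bare file name with no directory part

with_dirname=$(dirname "$var")
with_subst="${var%/*}"
with_awk=$(echo "$var" | awk -F'/' '{OFS=FS;$NF=""}1')

printf 'dirname: [%s]\n' "$with_dirname"
printf 'subst:   [%s]\n' "$with_subst"
printf 'awk:     [%s]\n' "$with_awk"
```

Only dirname yields something you can safely use as a directory here, so it is the safer choice when the input may lack a path.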
awk behaves as it should.
When you define slash / as a separator, the fields in your expression become the content between the separators.
If you need the separator to be printed as well, you need to do it explicitly, like:
echo /Home/in/test_file.txt | awk -F'/' '{ printf "%s/%s/",$2,$3 }'
Alternatively, replace your last field with an empty string and
put the slash back in as the (builtin) Output Field Separator (OFS):
echo /Home/in/test_file.txt | awk -F'/' -vOFS='/' '{$NF="";print}'

awk append in CSV file

How can I use awk to append 000 to the timestamp column shown below? I tried the following command,
head -n 10000001 ratings.csv | tail -n +2 | awk '{print $1 "000"}' >> ratings_1.csv
but the data is not as expected.
$ cat ratings.csv |wc -l
20000264
$ head ratings.csv
userId,movieId,rating,timestamp
1,2,3.5,1112486027
1,29,3.5,1112484676
1,32,3.5,1112484819
1,47,3.5,1112484727
1,50,3.5,1112484580
1,112,3.5,1094785740
1,151,4.0,1094785734
1,223,4.0,1112485573
1,253,4.0,1112484940
My expected output should look like
1,2,3.5,1112486027000
awk '{ if (NR > 1) { $1 = $1 "000" } print }'
Maybe a faster version that wouldn't run the if on every line would be:
awk 'BEGIN { getline; print } { print $0 "000" }'
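Both commands above append to the whole line, which works because the timestamp is the last column. If you would rather name the column explicitly, a sketch along these lines should do it, assuming the file is plain comma-separated with no quoted commas. The sample rows are copied from the question, and csv and out are illustrative names.

```shell
# Three sample lines from ratings.csv, header included.
csv=$(printf 'userId,movieId,rating,timestamp\n1,2,3.5,1112486027\n1,29,3.5,1112484676\n')

# Split and re-join on commas; append 000 to field 4 on every line but the header.
out=$(printf '%s\n' "$csv" | awk 'BEGIN { FS = OFS = "," } NR > 1 { $4 = $4 "000" } { print }')
echo "$out"
```

On the real file you would pass ratings.csv as the awk argument and redirect the output to ratings_1.csv.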

Print all lines in "file2" which have line number stored in "file1" $2

File1:
count line_num
xy 55
ab 67
File2:
a|b|c
d|e|f
I want to print lines 55 and 67 of file2.
I am trying:
#!/usr/bin/ksh
while read file_name; do
line_num=`echo $file_name | awk '{print $2}'`
awk 'NR==$line_num{print;exit}' file2 >> file3.txt
done < file1
but it's not working!
Using awk you can do:
awk 'NR==FNR{line[$2]; next} FNR in line' file1 file2
We read the first file and store its second column as keys in an array called line (we could skip the first line, which is the header, with NR>1, but since it doesn't contain a number we don't need to). Once the first file is loaded, we read the second file and print the lines whose line number is a key in the array. NR is the number of records read so far across all files, while FNR restarts at 1 for each file, so NR==FNR is true only while the first file is being read.
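Here is a small, self-contained run of that idiom on throwaway files. The file contents and the line numbers 2 and 4 are invented for the demo; only the awk program comes from the answer.

```shell
tmpdir=$(mktemp -d)

# file1: header plus the line numbers we want (2 and 4 in this demo).
printf 'count line_num\nxy 2\nab 4\n' > "$tmpdir/file1"
# file2: four pipe-separated lines.
printf 'a|b|c\nd|e|f\ng|h|i\nj|k|l\n' > "$tmpdir/file2"

# First file: remember column 2 as array keys. Second file: print matching line numbers.
picked=$(awk 'NR==FNR { line[$2]; next } FNR in line' "$tmpdir/file1" "$tmpdir/file2")
echo "$picked"

rm -rf "$tmpdir"
```

The header value line_num also ends up as a key, but it never equals a line number, so it is harmless.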
You can use awk to read the line numbers in a loop and sed to print out the specific lines:
while read a; do sed -n ${a}p f2.txt; done < <(awk 'NR>1{print$2}' f1.txt)
If you have a bigger file, performance can be an issue as Ed pointed out, in that case you can use awk alone:
awk 'NR==FNR{if(NR>1)l[$2]=1;next}{if(l[FNR])print $0}' f1.txt f2.txt
Another way is to use xargs (-I already runs one command per input line, so -n1 is not needed):
awk 'NR>1{print $2}' f1.txt | xargs -I {} sed -n {}p f2.txt
Use sed to construct a sed one-liner (in the case of file1 it'd output and run sed -n "55p;67p;" file2):
sed -n "$(sed -n '2~1{s/.* //;s/.*/&p/p}' file1)" file2
A good advertisement for awk, alas!

Awk: Drop last record separator in one-liner

I have a simple command (part of a bash script) that I'm piping through awk but can't seem to suppress the final record separator without then piping to sed. (Yes, I have many choices and mine is sed.) Is there a simpler way without needing the last pipe?
dolls=$(egrep -o 'alpha|echo|november|sierra|victor|whiskey' /etc/passwd \
| uniq | awk '{IRS="\n"; ORS=","; print}'| sed s/,$//);
Without the sed, this produces output like echo,sierra,victor, and I'm just trying to drop the last comma.
You don't need awk, try:
egrep -o ....uniq|paste -d, -s
Here is another example:
kent$ echo "a
b
c"|paste -d, -s
a,b,c
Also, I think your chained command could be simplified. awk could do everything in a one-liner.
Instead of egrep, uniq, awk, sed etc, all this can be done in one single awk command:
awk -F":" '!($1 in a){l=l $1 ","; a[$1]} END{sub(/,$/, "", l); print l}' /etc/passwd
Here is a small and quite straightforward one-liner in awk that suppresses the final record separator:
echo -e "alpha\necho\nnovember" | awk 'y {print s} {s=$0;y=1} END {ORS=""; print s}' ORS=","
Gives:
alpha,echo,november
So, your example becomes:
dolls=$(egrep -o 'alpha|echo|november|sierra|victor|whiskey' /etc/passwd | uniq | awk 'y {print s} {s=$0;y=1} END {ORS=""; print s}' ORS=",");
The benefit of using awk over paste or tr is that this also works with a multi-character ORS.
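As a quick illustration of the multi-character case, the same one-liner can be run with a three-character separator. The separator " | " and the variable name joined are arbitrary choices for this sketch.

```shell
# Same awk program as above, but with a multi-character ORS.
joined=$(printf 'alpha\necho\nnovember\n' \
  | awk 'y {print s} {s=$0;y=1} END {ORS=""; print s}' ORS=" | ")

echo "$joined"
```

Each line is held back until the next one arrives, so the separator is only ever printed between records and never after the last one.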
Since you tagged it bash here is one way of doing it:
#!/bin/bash
# Read the /etc/passwd file in to an array called names
while IFS=':' read -r name _; do
names+=("$name");
done < /etc/passwd
# Assign the content of the array to a variable
dolls=$( IFS=, ; echo "${names[*]}")
# Display the value of the variable
echo "$dolls"
echo "a
b
c" |
mawk 'NF-= _==$NF' FS='\n' OFS=, RS=
a,b,c
