Removing lines based on column values read from file - bash

I use the following code to extract lines from input_file with a certain value in the first column. The values on which the extraction of lines is based are in "one_column.txt":
while read file
do
awk -v col="$file" '$1==col {print $0}' input_file >> output_file
done < one_column.txt
My question is, how do I extract the lines where the first column does not match any of the values in one_column.txt? In other words, how do I extract only the remaining lines from input_file that don't end up in output_file?

grep -vf can do it:
grep -vf output_file input_file
grep -f reads the patterns, one per line, from a file (here output_file), and grep -v inverts the match, so this prints the lines of input_file that match none of them.
Test
$ cat a
hello
good
bye
$ cat b
hello
good
bye
you
all
$ grep -f a b
hello
good
bye
$ grep -vf a b ## opposite
you
all
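The loop in the question runs awk once per value. A minimal single-pass sketch, assuming whitespace-separated columns: the hypothetical function exclude_by_first_column loads every value from the key file into an awk array, then prints only the data lines whose first column is not one of them. Unlike grep -vf output_file, it compares the first column exactly instead of matching patterns anywhere in the line.

```shell
# Hypothetical helper: print lines of data_file whose first column is NOT
# listed in keys_file (one value per line, e.g. one_column.txt).
exclude_by_first_column() {
  local keys_file=$1 data_file=$2
  # NR==FNR is only true while reading the first file (the keys).
  # Caveat: if keys_file is empty, that test stays true for data_file too,
  # so nothing is printed.
  awk 'NR == FNR { keys[$1]; next }   # remember each key
       !($1 in keys)                  # print lines whose first column is not a key
      ' "$keys_file" "$data_file"
}
```

Usage: exclude_by_first_column one_column.txt input_file > remaining_lines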

Related

How to write to first n lines of file in bash

I have a file test.txt with the following content:
first
second
AAA
BBB
CCC
DDD
I want to remove the first two lines and then add new values as the first two lines. Once the first two lines are removed, the file should look like this:
AAA
BBB
CCC
DDD
Then the two values should be added as the first and second lines, so the file finally looks like this:
new value at line 1
new value at line 2
AAA
BBB
CCC
DDD
I tried the commands below, but how can I remove the first two lines?
SERVER_HOSTNAME=$(hostname)
SERVER_IP=$(ip -o route get to 8.8.8.8 | sed -n 's/.*src \([0-9.]\+\).*/\1/p')
sed -i "1s/.*/$SERVER_HOSTNAME/" /tmp/test.txt
sed -i "2s/.*/$SERVER_IP/" /tmp/test.txt
My problem is that if I first remove the first two lines and then run the commands above, they replace lines 1 and 2 with the new values. I want to add the values at the top instead, so that the existing content shifts down.
Your existing sed already does what you want; it doesn't need you to remove the first two lines yourself first.
$ cat tmp.txt
first
second
AAA
BBB
CCC
DDD
$ SERVER_HOSTNAME=example.local
$ SERVER_IP=127.0.0.1
$ sed -i "1s/.*/$SERVER_HOSTNAME/;2s/.*/$SERVER_IP/" tmp.txt
$ cat tmp.txt
example.local
127.0.0.1
AAA
BBB
CCC
DDD
Don't run sed -i repeatedly. Instead, combine all your commands into one script.
sed -i "1s/.*/$(hostname)/
2s/.*/$(ip -o route get to 8.8.8.8)/
2s/.*src \([0-9.]\+\).*/\1/" /tmp/test.txt
This is rather brittle, though; in particular, it will break if either command substitution produces a slash in its output.
(IIRC ip has options to produce machine-readable output; you should probably look into that instead of stripping out the parts you don't want.)
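If you really want to add and delete, the sed i and d commands do that. But removing two lines and adding two is obviously equivalent to replacing two.
sed -i "1i\hello
1d
2i\hello
2d" file
Each i has to come before its d: d discards the line and starts the next cycle immediately, so nothing after it runs for that line, while i prints its text at once.
Unfortunately, the one-line i\text form (like a\text) is poorly standardized, and it's unclear why two s commands would not work, so I'm leaving this as a sketch.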
Could you please try the following, written and tested with the shown samples in GNU awk. It replaces the 1st and 2nd lines with the values of the shell variables and saves the result back into Input_file (via a temporary file).
If you want to keep the original 1st and 2nd lines as well, printing each new value just before its original line, remove next from the FNR==1 and FNR==2 blocks.
SERVER_HOSTNAME=$(hostname)
SERVER_IP=$(ip -o route get to 8.8.8.8 | sed -n 's/.*src \([0-9.]\+\).*/\1/p')
awk -v server_ip="$SERVER_IP" -v server_hostname="$SERVER_HOSTNAME" '
FNR==1{ print server_hostname; next}
FNR==2{ print server_ip; next}
1
' Input_file > temp && mv temp Input_file
Explanation for above solution:
awk -v server_ip="$SERVER_IP" -v server_hostname="$SERVER_HOSTNAME" '
##Starting awk program from here, creating server_ip and server_hostname vars with respective shell vars.
FNR==1{ print server_hostname; next}
##Checking condition if this is 1st line then print server_hostname.
FNR==2{ print server_ip; next}
##Checking condition if this is 2nd line then print server_ip here.
1
##1 will print current line.
' Input_file
##Mentioning Input_file name here.
You may use this GNU sed:
gsed -i -e "1i\\$SERVER_HOSTNAME\n$SERVER_IP" -e '1,2d' /tmp/test.txt
Or with the POSIX i\ syntax (the -i option itself is a GNU/BSD extension, not POSIX):
sed -i.bak "1,2d;3i\\
$SERVER_HOSTNAME
3i\\
$SERVER_IP
" /tmp/test.txt
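If the goal is simply to drop the first two lines and put two new ones on top, a sed-free sketch is possible with tail -n +3 (print from line 3 onwards). The function name replace_first_two_lines is hypothetical. Because printf '%s\n' writes the values verbatim, a slash or & in the hostname or IP needs no escaping, which sidesteps the brittleness of the sed s command noted above.

```shell
# Hypothetical helper: replace the first two lines of a file with two new lines.
replace_first_two_lines() {
  local file=$1 line1=$2 line2=$3 tmp
  tmp=$(mktemp) || return 1
  {
    printf '%s\n' "$line1" "$line2"   # the two new lines, written verbatim
    tail -n +3 "$file"                # the original content from line 3 on
  } > "$tmp" &&
    cat "$tmp" > "$file"              # copy back, keeping the file's permissions
  local status=$?
  rm -f "$tmp"
  return "$status"
}
```

Usage: replace_first_two_lines /tmp/test.txt "$(hostname)" "$SERVER_IP". Copying back with cat rather than mv keeps the original file's inode and permissions.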

replace different text in different lines using sed

I need to do the following:
I have two files, the first one contains only the lines that are going to be modified:
1
2
3
and the second contains the new text that is going to replace the old text in the original file (final_output.txt):
13e
19f
16a
the original file is
wire1: 0x'd318
wire2: 0x'd415
wire3: 0x'd362
I want to get the following:
wire1: 0x13e
wire2: 0x19f
wire3: 0x16a
This is only a part of final_output.txt; the file can contain 100 lines or more. I intend to do it with a for loop, but I don't know how to implement it.
awk to the rescue!
This assumes that everything from the single quote onward is replaced.
$ awk -v q="'" 'NR==FNR {a[$1]=$2;next}
FNR in a {sub(q".*",a[FNR])}1' <(paste index rep) file
index is the index file, rep is the replacement file, and file is the original data file.
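The <(paste index rep) process substitution is a bash/ksh/zsh feature. A minimal sketch of the same awk logic that instead reads the paste output from standard input (the - operand), so it also runs under plain sh; the function name replace_after_quote is hypothetical. It assumes replacement values contain no whitespace (awk splits the pasted line into $1 and $2), and note that & and \ in the replacement text are special to awk's sub().

```shell
# Hypothetical helper: for each line number listed in index_file, replace the
# text from the first single quote to the end of that line in data_file with
# the entry at the same position in rep_file.
replace_after_quote() {
  index_file=$1 rep_file=$2 data_file=$3
  paste "$index_file" "$rep_file" |
    awk -v q="'" '
      NR == FNR { rep[$1] = $2; next }      # stdin: line number -> replacement
      FNR in rep { sub(q ".*", rep[FNR]) }  # cut from the quote, append replacement
      1                                     # print every line
    ' - "$data_file"
}
```

Usage: replace_after_quote index rep final_output.txt > new_output.txt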
Another solution, where file1 contains only the line numbers, file2 contains the replacement text, and final_output.txt contains your original text:
for ((i=1;i<=$(wc -l < file1);i++)); do sed -i "$(sed -n "${i}p" file1)s#$(sed -n "$(sed -n "${i}p" file1)p" final_output.txt | grep -oP "'.*")#$(sed -n "${i}p" file2)#g" final_output.txt; done
Output
darby@Debian:~/Scrivania$ cat final_output.txt
wire1: 0x13e
wire2: 0x19f
wire3: 0x16a
darby@Debian:~/Scrivania$

Compare 2 csv files and delete rows - Shell

I have 2 CSV files. One has several columns; the other is just one column with domains. Simplified data of these files would be:
file1.csv:
John,example.org,MyCompany,Australia
Lenny,domain.com,OtherCompany,US
Martha,site.com,ThirdCompany,US
file2.csv:
example.org
google.es
mysite.uk
The output should be
Lenny,domain.com,OtherCompany,US
Martha,site.com,ThirdCompany,US
I have tried this solution
grep -v -f file2.csv file1.csv >output-file
Found here
http://www.unix.com/shell-programming-and-scripting/177207-removing-duplicate-records-comparing-2-csv-files.html
But since there is no explanation whatsoever about how the script works, and I suck at shell, I cannot tweak it to make it work for me
A solution for this would be highly appreciated, a solution with some explanation would be awesome! :)
EDIT:
I have tried the line that was supposed to work, but for some reason it does not. Here is the output from my terminal. What's wrong with this?
Desktop $ cat file1.csv ; echo
John,example.org,MyCompany,Australia
Lenny ,domain.com,OtherCompany,US
Martha,mysite.com,ThirCompany,US
Desktop $ cat file2.csv ; echo
example.org
google.es
mysite.uk
Desktop $ grep -v -f file2.csv file1.csv
John,example.org,MyCompany,Australia
Lenny ,domain.com,OtherCompany,US
Martha,mysite.com,ThirCompany,US
Why doesn't grep remove the line
John,example.org,MyCompany,Australia
The line you posted works just fine.
$ grep -v -f file2.csv file1.csv
Lenny,domain.com,OtherCompany,US
Martha,site.com,ThirdCompany,US
And here's an explanation. grep will search for a given pattern in a given file and print all lines that match. The simplest example of usage is:
$ grep John file1.csv
John,example.org,MyCompany,Australia
Here we used a simple literal pattern, in which each character matches itself, but you can also use regular expressions (basic, extended, and even Perl-compatible ones).
To invert the logic, and print only the lines that do not match, we use the -v switch, like this:
$ grep -v John file1.csv
Lenny,domain.com,OtherCompany,US
Martha,site.com,ThirdCompany,US
To specify more than one pattern, you can use the option -e pattern multiple times, like this:
$ grep -v -e John -e Lenny file1.csv
Martha,site.com,ThirdCompany,US
However, if there is a larger number of patterns to check for, we can use the -f file option, which reads all patterns (one per line) from the specified file.
So, when we combine all of those; reading patterns from a file with -f and inverting the matching logic with -v, we get the line you need.
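Putting those options together, here is a minimal sketch (the function name drop_rows_with_domains is hypothetical). It adds -F so each domain is taken as a fixed string; without it, the . in example.org is a regex wildcard matching any character. It also strips carriage returns from the domain list first. That step is only an assumption about the EDIT above: a file saved with Windows (CRLF) line endings yields patterns ending in an invisible \r that never match. The tr step runs in a process substitution, so it needs bash. Like the original command, this still matches a domain anywhere in the row, not only in the second column.

```shell
# Hypothetical helper: print the rows of csv_file that contain none of the
# domains listed (one per line) in domains_file.
drop_rows_with_domains() {
  local domains_file=$1 csv_file=$2
  # -v: invert the match, -F: fixed strings, -f: read patterns from a file.
  # tr -d '\r' removes Windows line endings from the patterns (assumption).
  grep -v -F -f <(tr -d '\r' < "$domains_file") "$csv_file"
}
```

Usage: drop_rows_with_domains file2.csv file1.csv > output-file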
One in awk:
$ awk -F, 'NR==FNR{a[$1];next}($2 in a==0)' file2 file1
Lenny,domain.com,OtherCompany,US
Martha,site.com,ThirdCompany,US
Explained:
$ awk -F, ' # using awk, comma-separated records
NR==FNR { # process the first file, file2
a[$1] # hash the domain to a
next # proceed to next record
}
($2 in a==0) # process file1, if domain in $2 not in a, print the record
' file2 file1 # file order is important

Creating a script that checks to see if each word in a file

I am pretty new to Bash and scripting in general and could use some help. Words in the first file are separated by newlines, while the second file could contain anything. If a string in the first file is not found in the second file, I want to output it. Pretty much "check if these words are in these words and tell me the ones that are not".
File1.txt contains something like:
dog
cat
fish
rat
file2.txt contains something like:
dog
bear
catfish
magic ->rat
I know I want to use grep (or do I?) and the command would be (to my best understanding):
$ foo.sh file1.txt file2.txt
Now for the script...
I have no idea...
grep -iv $1 $2
Give this a try. It is straightforward and not optimized, but it does the trick (I think).
while IFS= read -r line; do
  grep -Fq -- "$line" file2.txt || echo "$line"
done < file1.txt
Below is a playful version with 4 parallel searches and an additional result.txt file.
> result.txt
nb_parallel=4
while IFS= read -r line; do
  # wait while nb_parallel background searches are still running
  while [ "$(jobs -r | wc -l)" -ge "$nb_parallel" ]; do sleep 1; done
  grep -Fq -- "$line" file2.txt || echo "$line" >> result.txt &
done < file1.txt
wait
cat result.txt
You can increase the value 4 to run more searches in parallel, depending on the number of CPUs and cores and the IOPS available.
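A different technique for the same idea: instead of polling jobs, let xargs -P (supported by GNU and BSD xargs, though not required by POSIX) cap the number of parallel searches. A minimal sketch with a hypothetical function name; it assumes the words contain no quotes or backslashes, which xargs would otherwise interpret, and the output order is not guaranteed.

```shell
# Hypothetical helper: print each word of words_file (one per line) that does
# not occur anywhere in text_file, running up to max_jobs greps at a time.
find_missing_words_parallel() {
  local words_file=$1 text_file=$2 max_jobs=${3:-4}
  # -I{} runs one sh per input line; inside it $1 is the word, $2 the text file.
  xargs -P "$max_jobs" -I{} \
    sh -c 'grep -Fq -- "$1" "$2" || printf "%s\n" "$1"' _ {} "$text_file" \
    < "$words_file"
}
```

Usage: find_missing_words_parallel file1.txt file2.txt 4 | sort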
With the -f flag you can tell grep to read its patterns from a file.
grep -vf file2.txt file1.txt
To get a good match on complete lines, use
grep -vFxf file2.txt file1.txt
As @anubhava commented, this will not match substrings. To fix that, we use the result of grep -Fof file1.txt file2.txt (all the words from file1.txt that occur somewhere in file2.txt).
Combining these will give
grep -vFxf <(grep -Fof file1.txt file2.txt) file1.txt
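Since the question asks for foo.sh file1.txt file2.txt, here is a minimal sketch wrapping that combined grep in a function (the name find_missing_words is hypothetical); a foo.sh could end with find_missing_words "$1" "$2". The process substitution needs bash. The check is substring-based: cat counts as found because it occurs inside catfish.

```shell
# Hypothetical helper: print the words of words_file that do not occur
# anywhere in text_file.
find_missing_words() {
  local words_file=$1 text_file=$2
  # Inner grep: every word from words_file that occurs in text_file (-o prints
  # only the matched part). Outer grep: drop those found words (-v) from the
  # word list, comparing whole lines (-x) as fixed strings (-F).
  grep -vFxf <(grep -oFf "$words_file" "$text_file") "$words_file"
}
```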
Using awk you can do:
awk 'FNR==NR{a[$0]; next} {for (i in a) if (index(i, $0)) next} 1' file2 file1
rat
You can simply do the following (comm expects both files to be sorted, and it compares whole lines):
comm -2 -3 file1.txt file2.txt
and also:
diff -u file1.txt file2.txt
I know you were looking for a script, but I don't think you need one; if you still want a script, you can just run these commands from it.
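comm only works on sorted input and compares whole lines. A minimal sketch that sorts on the fly (the function name is hypothetical; the process substitutions need bash). With the sample data from the question it reports cat and fish as well as rat, because catfish is not the same line as cat.

```shell
# Hypothetical helper: lines that appear in file_a but not in file_b,
# compared as whole lines after sorting both files the same way.
lines_only_in_first() {
  local file_a=$1 file_b=$2
  # -2 hides lines only in file_b, -3 hides lines in both; LC_ALL=C keeps the
  # sort order consistent with what comm expects.
  LC_ALL=C comm -23 <(LC_ALL=C sort "$file_a") <(LC_ALL=C sort "$file_b")
}
```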
A similar awk:
$ awk 'NR==FNR{a[$0];next} {for(k in a) if(k~$0) next}1' file2 file1
rat

grep "output of cat command - every line" in a different file

Sorry, the title of this question is a little confusing, but I couldn't think of anything else.
I am trying to do something like this
cat fileA.txt | grep `awk '{print $1}'` fileB.txt
fileA contains 100 lines while fileB contains 100 million lines.
What I want is to get each id from fileA, grep for that id in a different file, fileB, and print the matching line.
e.g. fileA.txt
1234
1233
e.g. fileB.txt
1234|asdf|2012-12-12
5555|asdd|2012-11-12
1233|fvdf|2012-12-11
Expected output is
1234|asdf|2012-12-12
1233|fvdf|2012-12-11
Getting rid of cat and awk altogether:
grep -f fileA.txt fileB.txt
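grep -f fileA.txt fileB.txt treats each id as a pattern that may match anywhere in a line, so an id such as 123 would also select 1234|asdf|2012-12-12. A minimal sketch that anchors every id to the start of the line and the first |, assuming the ids contain no regex special characters and a grep that accepts -f - (read patterns from standard input, as GNU grep does); the function name is hypothetical.

```shell
# Hypothetical helper: print the lines of records_file whose first
# |-separated field is exactly one of the ids in ids_file.
grep_by_first_field() {
  local ids_file=$1 records_file=$2
  # Turn each id into the basic regex ^id| ("|" is literal in a BRE),
  # then feed those patterns to grep on standard input.
  sed 's/.*/^&|/' "$ids_file" | grep -f - "$records_file"
}
```

Usage: grep_by_first_field fileA.txt fileB.txt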
awk alone can do that job well:
awk -F'|' 'NR==FNR{a[$0];next;}$1 in a' fileA fileB
see the test:
kent$ head a b
==> a <==
1234
1233
==> b <==
1234|asdf|2012-12-12
5555|asdd|2012-11-12
1233|fvdf|2012-12-11
kent$ awk -F'|' 'NR==FNR{a[$0];next;}$1 in a' a b
1234|asdf|2012-12-12
1233|fvdf|2012-12-11
EDIT
Added explanation:
-F'|' # | as the field separator (for both files)
'NR==FNR{a[$0];next;} #save lines in fileA in array a
$1 in a # if $1 (the 1st field of the fileB line) is in array a, print the current line from fileB
Briefly, NR counts records across all input files while FNR restarts at 1 for each file, so NR==FNR is true only while awk reads the first file (fileA). Further details I cannot explain here, sorry. I suggest trying this awk line in case the accepted answer didn't work for you. If you want to dig a little deeper, read some awk tutorials.
If the ids are on separate lines, you could use the -f option of grep as follows:
cut -d "|" -f1 < fileB.txt | grep -F -f fileA.txt
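The cut command ensures that grep searches only the first field for the patterns. Note that, as a result, only that first field (the id) is printed, not the whole line, and that without -x an id can still match inside a longer one (1234 would match 12345).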
From the man page:
-f FILE, --file=FILE
Obtain patterns from FILE, one per line.
The empty file contains zero patterns, and therefore matches nothing.
(-f is specified by POSIX.)
