I have a file named file1.txt that looks like this:
Alex Dog
Ana Cat
Jack Fish
Kyle Mouse
And a file named file2.txt that looks like this:
Alex Lion
Ana Cat
Jack Fish
Kyle Mouse
What would be a good way to run a loop that checks whether the names (Alex, Ana, etc.) still own the same pets (second column)?
I want the script to run the comparison and, if everything matches, do nothing. If there is one mismatch or more, echo the pet that has changed. For example, on these two files (file1.txt and file2.txt) the script would print:
Lion
There are a lot of assumptions in the following code, but it works as per KamilCuk's design:
$ join file1.txt file2.txt | awk '$2 != $3 { print $3 }'
Lion
Assumptions:
the files are sorted
the files have the same people in the same order
the people have only first names - no spaces in the names
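To see what the command is working with, the intermediate join output on the sample files (rows matched on the first field) should look like this:
$ join file1.txt file2.txt
Alex Dog Lion
Ana Cat Cat
Jack Fish Fish
Kyle Mouse Mouse
The awk filter then simply prints the third field wherever it differs from the second.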
If we want to do a little better, we can sort the inputs and allow the names to have spaces as follows:
$ join <( sort file1.txt) <( sort file2.txt ) | awk ' $(NF - 1) != $NF { print $NF } '
Remove all lines in file2.txt that are also found in file1.txt, then show the part after the space.
grep -vf file1.txt file2.txt | cut -d" " -f2
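On the sample files this should print the expected result (note that grep treats each line of file1.txt as a regular expression; adding -F would match fixed strings instead):
$ grep -vf file1.txt file2.txt | cut -d" " -f2
Lion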
Edit:
You asked for a loop. A loop will use a lot more resources and should be avoided (use awk when you cannot think of a smarter combination of commands, perhaps with some pipes).
# Avoid the following slow loop
while read -r name animal; do
    grep "^${name} " file2.txt | grep -v " ${animal}$" | cut -d" " -f2
done < file1.txt
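If the loop must be avoided, a sketch of an awk equivalent that reads each file only once (assuming, as above, single-word names) could be:
awk 'NR==FNR { pet[$1]=$2; next } ($1 in pet) && pet[$1] != $2 { print $2 }' file1.txt file2.txt
The first block records each person's pet from file1.txt; the second prints the new pet whenever file2.txt disagrees.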
Related
I have file with two columns
apple apple
ball cat
cat hat
dog delta
I need to extract the values that are common to both columns (i.e. occur in both columns), like
apple apple
cat cat
There is no ordering in items in each column.
Could you please try the following and let me know if it helps you.
awk '
{
  col1[$1]++
  col2[$2]++
}
END{
  for(i in col1){
    if(col2[i]){
      while(++count<=(col1[i]+col2[i])){
        printf("%s%s",i,count==(col1[i]+col2[i])?ORS:OFS)
      }
      count=""
    }
  }
}' Input_file
NOTE: it prints each common value as many times as it occurs in the two columns combined.
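On the sample input above, it should print:
apple apple
cat cat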
$ awk '{a[$1];b[$2]} END{for(k in a) if(k in b) print k}' file
apple
cat
To print the values twice, change print k to print k,k.
with sort/join
$ join <(cut -d' ' -f1 file | sort) <(cut -d' ' -f2 file | sort)
apple
cat
perhaps,
$ function f() { cut -d' ' -f"$1" file | sort; }; join <(f 1) <(f 2)
Assuming I can use unix commands:
cut -d' ' -f2 file | egrep `cut -d' ' -f1 < file | paste -sd'|'` -
Basically, this works as follows:
The second cut command (inside the backticks) collects all the words in the first column. The paste command joins them with a pipe (i.e. apple|ball|cat|dog).
The first cut command takes the second column of words and pipes them into a regexp-enabled egrep command.
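On the sample data the backticks expand to apple|ball|cat|dog, so the command effectively runs:
cut -d' ' -f2 file | egrep 'apple|ball|cat|dog' -
and prints apple and cat. Be aware that egrep matches substrings, so a column-2 value like cats would also match cat; egrep -x would require whole-line matches.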
Here is the closest I could get. Maybe you could loop through the whole file and print when it reaches another occurrence.
Code
cat file.txt | gawk '$1==$2 {print $1,"=",$2}'
or
gawk '$1==$2 {print $1,"=",$2}' file.txt
I would like to count the number of students in a .csv file depending on the category
Category 1 is the name, Category 2 is the country, Category 3 is the city
The .csv file is displayed as such :
michael_s;jpa;NYC
john_d;chn;TXS
jim_h;usa;POP
I have tried this in my .sh script, but it didn't work:
sort -k3 -t; students.csv
edit:
I am trying to make a bash script that counts students by city, and that can also count just one city when executing the script, such as
cat students.csv | ./script.sh NYC
The terminal will only display the students from NYC
If I've understood you correctly, something like this?
cut -d";" -f3 mike.txt | sort | uniq -c
To count only one city:
cut -d";" -f3 mike.txt | grep "NYC" | wc -l
Depending on the size of the file, how often you'll be doing this, etc., it may be sensible to look at other solutions, e.g. awk. But this solution will work just fine.
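The edit to the question asks for a script.sh that filters by the city given as an argument, reading the CSV on stdin. A minimal sketch under those assumptions:
#!/bin/sh
# Hypothetical script.sh: print the students whose third field
# matches the city passed as the first argument.
awk -F';' -v city="$1" '$3 == city'
which would be invoked exactly as in the question: cat students.csv | ./script.sh NYC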
The reason for the error message "sort: multi-character tab 'students.csv'" is you haven't given the -t option the separator character. If you add a semicolon after -t, the sort will work as expected:
sort -k3 -t';' students.csv
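On the sample data this should sort the rows by city:
$ sort -k3 -t';' students.csv
michael_s;jpa;NYC
jim_h;usa;POP
john_d;chn;TXS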
There is always awk:
$ awk -F\; 'a[$1]++==0{c++}END{print c}' file
3
Once you describe your requirements more thoroughly (you say count the names, but you sort with -k3; please update the OP), we can help you better.
Edited to match your update:
$ awk -F\; -v col=3 -v val=NYC '
(length(val) && $col==val) || length(val)==0 && a[$col]++==0 {
c++
}
END { print c }
' file
1
If you set -v val= with the value you are looking for and -v col= with the column number, it counts the occurrences of val in col. If you set col but not val, it counts the distinct values in col.
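For example, leaving val empty should count the distinct cities in the sample data:
$ awk -F\; -v col=3 -v val= '(length(val) && $col==val) || length(val)==0 && a[$col]++==0 {c++} END { print c }' file
3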
Trying to figure out how to store a list as a variable (array?) and use it with awk.
I have a file like such:
Jimmy
May31
John
June19
Paul
Aug15
Mark
Sept1
David
Nov15
I want to use awk to search my file and remove three names and the line following each of those names. So the final file should only contain 2 names (and birthdays).
I can do this with:
awk '/Jimmy|Mark|David/{n=2}; n {n--; next}; 1' < file
But is there a way to store the "Jimmy|Mark|David" list in the above command as a variable/array and do the same thing? (The real project I'm working on has a much longer list to match in a much bigger file.)
Thanks!
You can do it with the -v/--assign option:
awk -v pat='Jimmy|Mark|David' '$0~pat {n=2}; n {n--; next}; 1' birthdays
The regex comparison against the complete line is then invoked manually with the ~ operator.
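Since the question mentions an array: a sketch of building pat from a bash array (the array name names is made up here):
names=(Jimmy Mark David)
pat=$(IFS='|'; echo "${names[*]}")   # "${names[*]}" joins the elements with the first character of IFS
awk -v pat="$pat" '$0~pat {n=2}; n {n--; next}; 1' birthdays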
Alternatively, if you have a long list of names to filter out in a file, grep with -f would probably be a much faster option (see here). For example:
$ cat names
Jimmy
Mark
David
$ paste - - <birthdays | grep -vFf names | tr '\t' '\n'
John
June19
Paul
Aug15
You can get the list in a variable like this:
LIST=$(cat list.txt | tr "\n" "|")
and then use #randomir 's answer
awk -v pat="$LIST" '$0~pat {n=2}; n {n--; next}; 1' birthdays
If I put your list:
Jimmy
John
Paul
Mark
David
into the file list.txt
LIST=$(cat list.txt | tr "\n" "|")
will output
Jimmy|John|Paul|Mark|David
providing you don't add a linebreak at the end of the last line
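An alternative that sidesteps the trailing-delimiter issue is paste, which only inserts the separator between lines:
LIST=$(paste -sd'|' list.txt)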
Seems like it would be easier to do this:
Paste 2 lines together
cat file | paste - -
then use awk to do what you need to do
$ cat file | paste - -
Jimmy May31
John June19
Paul Aug15
Mark Sept1
David Nov15
I have a bunch of files with data from a company and I need to count, let's say, how many people from certain cities there are. Initially I was doing it manually with
grep -c 'Chicago' file.csv
But now I have to look for a lot of cities and it would be time consuming to do this manually every time. So I did some research and found this:
#!/bin/sh
for p in 'Chicago' 'Washington' 'New York'; do
grep -c '$p' 'file.csv'
done
But it doesn't work. It keeps giving me 0s as output and I'm not sure what is wrong. Anyway, basically what I need is an output with every result (just the values) given by grep in a column, so I can copy it directly to a spreadsheet. Ex.:
132
407
523
Thanks in advance.
You should use sort + uniq for that:
$ awk '{print $<N>}' file.csv | sort | uniq -c
where N is the column number of the cities (I assume it's structured, as it's a CSV file; set the field separator with -F if the delimiter is not whitespace).
For example, counting how often each shell is used on my system:
$ awk -F: '{print $7}' /etc/passwd | sort | uniq -c
1 /bin/bash
1 /bin/sync
1 /bin/zsh
1 /sbin/halt
41 /sbin/nologin
1 /sbin/shutdown
$
From the title, it sounds like you want to count the number of occurrences of the string rather than the number of lines on which the string appears, but since you accepted the grep -c answer I'll assume you actually only care about the latter. Do not use grep and read the file multiple times. Count everything in one pass:
awk '/Chicago/ {c++} /Washington/ {w++} /New York/ {n++}
END { print c; print w; print n }' input-file
Note that this will print a blank line instead of "0" for any string that does not appear, so you might want to initialize the counters. There are several ways to do that. I like:
awk '/Chicago/ {c++} /Washington/ {w++} /New York/ {n++}
END { print c; print w; print n }' c=0 w=0 n=0 input-file
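If the list of cities keeps growing, a sketch of the same one-pass idea that reads the patterns from a file (cities.txt is hypothetical here) and prints the counts in file order:
awk 'NR==FNR { cities[++n] = $0; next }
{ for (i = 1; i <= n; i++) if ($0 ~ cities[i]) count[i]++ }
END { for (i = 1; i <= n; i++) print count[i]+0 }' cities.txt file.csv
The +0 forces a 0 for cities that never appear, so the output column lines up with the city list.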
I have a file, list.txt, which contains a list of words. I want to check how many times each word appears in another file, file1.txt, then output the results. A simple output of all of the numbers is sufficient, as I can manually add them to list.txt with a spreadsheet program, but if the script adds the numbers at the end of each line in list.txt, that is even better, e.g.:
bear 3
fish 15
I have tried this, but it does not work:
cat list.txt | grep -c file1.txt
You can do this in a loop that reads a single word at a time from a word-list file, and then counts the instances in a data file. For example:
while read; do
echo -n "$REPLY "
fgrep -ow "$REPLY" data.txt | wc -l
done < <(sort -u word_list.txt)
The "secret sauce" consists of:
using the implicit REPLY variable;
using process substitution to collect words from the word-list file; and
ensuring that you are grepping for whole words in the data file.
This awk method only has to pass through each file once:
awk '
# read the words in list.txt
NR == FNR { count[$1] = 0; next }
# process file1.txt
{
  for (i = 1; i <= NF; i++)
    if ($i in count)
      count[$i]++
}
# output the results
END {
  for (word in count)
    print word, count[word]
}
' list.txt file1.txt
This might work for you (GNU sed):
tr -s ' ' '\n' < file1.txt |
sort |
uniq -c |
sed -e '1i\s|.*|& 0|' -e 's/\s*\(\S*\)\s\(\S*\)\s*/s|\\<\2\\>.*|\2 \1|/' |
sed -f - list.txt
Explanation:
Split file1.txt into words
Sort the words
Count the words
Create a sed script to match the words (initially zero out each word)
Run the generated sed script against list.txt
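To make step 4 concrete: with the example counts from the question (bear 3, fish 15), the generated sed script would look roughly like this:
s|.*|& 0|
s|\<bear\>.*|bear 3|
s|\<fish\>.*|fish 15|
The first line appends a zero count to every line of list.txt, and the word-specific substitutions then overwrite it for the words that actually occur.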
A single-line command:
cat file1.txt | tr " " "\n" | sort | uniq -c | sort -n -r -k1 | grep -w -f list.txt
The last part of the command tells grep to read the words to match from a list (the -f option) and to match whole words (-w), i.e. if list.txt contains car, grep should ignore carriage.
However, keep in mind that your view of a whole word and grep's view might differ. For example, although car will not match carriage, it will match car-wash; notice that "-" is considered a word boundary. grep takes anything except letters, numbers and underscores as a word boundary, which should not be a problem as this conforms to the accepted definition of a word in the English language.