I have two files, each containing a list of items, where the item name and its quantity are separated by a space. These lists are supposed to be already ordered and to always have the same number of items, but I would prefer code that relies on the item name rather than on the line number.
I need output where only the changes are present, for example an echo for every item whose associated value has changed. I know I could use diff or meld for this, but I need a very specific output, because I then have to send a mail for every one of these changes, so I guess I should be using something like awk.
cat before.txt
Apples 3
Oranges 5
Bananas 7
Avocados 2
cat after.txt
Apples 3
Oranges 7
Bananas 7
Avocados 3
output wanted:
Oranges has changed from 5 to 7
Avocados has changed from 2 to 3
awk is your friend
awk 'NR==FNR{price[$1]=$2; next}   # 1st file: store each value, keyed by item
     $1 in price{
       if(price[$1]!=$2){          # 2nd file: report items whose value differs
         printf "%s has changed from %s to %s%s",$1,price[$1],$2,ORS
       }
     }' before.txt after.txt
Output
Oranges has changed from 5 to 7
Avocados has changed from 2 to 3
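Since every change then has to trigger a mail, you can feed that output straight into a read loop. A minimal sketch; the mailer invocation, subject, and address are placeholders for whatever you actually use:
awk 'NR==FNR{price[$1]=$2; next}
     $1 in price && price[$1]!=$2{
       printf "%s has changed from %s to %s%s",$1,price[$1],$2,ORS
     }' before.txt after.txt |
while IFS= read -r change; do
  printf '%s\n' "$change" | mail -s "Item change" you@example.com
done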
If you're new to awk, consider buying Effective awk Programming by Arnold Robbins.
Not as robust as the other answer, but simple to understand. This is not a very economical way to do the task; I have added it because it keeps things simple, and it is fine if performance is not really a concern.
paste before.txt after.txt | awk '$2!=$4 {print $1 " has changed from " $2 " to " $4}'
Oranges has changed from 5 to 7
Avocados has changed from 2 to 3
I know how to use awk to remove duplicate lines in a file:
awk '!x[$0]++' myfile.txt
But how can I remove the duplicates only if there are more than two occurrences of this duplicate?
For example:
apple
apple
banana
apple
pear
banana
cherry
would become:
banana
pear
banana
cherry
Thanks in advance!
I would harness GNU AWK for this task in the following way. Let file.txt content be
apple
apple
banana
apple
pear
banana
cherry
then
awk 'FNR==NR{cnt[$0]+=1;next}cnt[$0]<=2' file.txt file.txt
gives output
banana
pear
banana
cherry
Explanation: This is a 2-pass approach. FNR==NR (record number in the current file equal to the total record number) holds true only for the 1st file; there I simply count the number of occurrences in file.txt by increasing (+=) the value in array cnt, under the key being the whole line ($0), by 1, and then I instruct GNU AWK to go to the next line, as I do not want to do anything else. After that, only lines whose number of occurrences is less than or equal to two are output. Note: file.txt file.txt is intentional.
(tested in gawk 4.2.1)
If you don't care about output order, this would do what you want without reading the whole file into memory in awk:
$ sort file | awk '
$0!=prev { if (cnt<3) printf "%s", vals; prev=$0; cnt=vals="" }
{ vals=vals $0 ORS; cnt++ }
END { if (cnt<3) printf "%s", vals }
'
banana
banana
cherry
pear
The output of sort has all the values grouped together so you only need to look at the count when the values change to know how many of the previous value there were. sort still has to consider the whole input but it's designed to handle massive files by using demand paging, etc. and so is far more likely to be able to handle huge files than reading it all into memory in awk.
If you do care about output order you could use a DSU approach, see How to sort data based on the value of a column for part (multiple lines) of a file?
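For example, a minimal DSU sketch for this same task, assuming bash and tab-free data: decorate each line with its line number, sort by content so duplicates group together, drop any group of three or more, then sort the survivors back into input order and strip the decoration:
awk -v OFS='\t' '{print NR, $0}' file |
sort -t$'\t' -k2 |
awk -F'\t' '
  $2!=prev { if (cnt<3) printf "%s", vals; prev=$2; cnt=vals="" }
  { vals=vals $0 ORS; cnt++ }
  END { if (cnt<3) printf "%s", vals }' |
sort -n |
cut -f2-
For the sample file above this prints banana, pear, banana, cherry in their original order.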
I want to sort some space separated numbers in bash. The following doesn't work, however:
sort -dt' ' <<< "3 5 1 4"
Output is:
3 5 1 4
Expected output is:
1
3
4
5
As I understand it, the -t option should use its argument as a delimiter. Why isn't my code working? I know I can tr the spaces to newlines, but I'm working on a code golf thing and want to be able to do it without any other utility.
EDIT: everybody is answering by splitting the spaces to lines. I do not want to do this. I already know how to do this with other utilities. I am specifically asking how to do this with sort, and sort only. If -t doesn't delimit input, what does it do?
Use process substitution with printf to put each input number on a separate line; otherwise sort gets only one line to sort:
sort <(printf "%s\n" 3 5 1 4)
1
3
4
5
When doing it this way, -dt' ' is not needed.
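And if the point is to avoid calling any utility other than sort (as in code golf), bash parameter expansion can do the splitting itself; a small sketch, assuming bash:
s="3 5 1 4"
nl=$'\n'
sort <<< "${s// /$nl}"    # turn every space into a newline, then sort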
After searching around, I have discovered what -t is for. It is for delimiting fields within a line when you want to sort by a certain part of each line. E.g., if you have
Hello,56
Cat,81
Book,14
Nope,62
And you want to sort by the number, you would use -t',' to delimit by the comma and then use -k to select which part to sort by. It is for field delimiting, not record delimiting like I thought.
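For example, to sort that data numerically on the second comma-separated field (the file name here is just a placeholder):
sort -t',' -k2,2n file
Book,14
Hello,56
Nope,62
Cat,81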
Since sort only separates fields within a single line, you have no choice but to feed it one record per line, for example by letting the shell split the string and printf re-join it with newlines:
#!/bin/bash
var="8 3 5 1 4 7 2 9 6"
old_IFS="$IFS"
IFS=" "                       # split $var on spaces (the default, made explicit)
printf "%s\n" $var | sort -d  # one number per line, then sort
IFS="$old_IFS"                # restore the original field separator
This will give an obvious output of:
1
2
3
4
5
6
7
8
9
If this is not the way you wish to use sort, well, you have already answered your own question by doing a bit of digging into the issue; doing that digging before asking would have saved time for the others giving answers, as well as for yourself.
I have a file with a bunch of lines that looks like this:
3 world
3 moon
3 night
2 world
2 video
2 pluto
1 world
1 pluto
1 moon
1 mars
I want to take each line that contains the same word, and combine them while adding the preceding number, so that it looks like this:
6 world
4 moon
3 pluto
3 night
2 video
1 mars
I've been trying combinations with sed, but I can't seem to get it right. My next idea was to sort them, and then check if the following line was the same word, then add them, but I couldn't figure out how to get it to sort by word rather than the number.
Sum and sort:
awk '{c[$2]+=$1} END {for (i in c) print c[i], i}' file | sort -n -r
Trying to work out how to get a frequency appended or prepended to each line in a file WITHOUT deleting duplicate occurrences (which uniq can do for me).
So, if input file is:
mango
mango
banana
apple
watermelon
banana
I need output:
mango 2
mango 2
banana 2
apple 1
watermelon 1
banana 2
All the solutions I have seen delete the duplicates. In other words, what I DON'T want is:
mango 2
banana 2
apple 1
watermelon 1
Basically you cannot do it in one pass without keeping everything in memory. If this is what you want to do, then use python/perl/awk/whatever. The algorithm is quite simple.
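For instance, a minimal awk sketch of that algorithm, reading the file twice so that only the counts (rather than all the lines) are kept in memory; input is the file holding your sample data:
awk 'FNR==NR{cnt[$0]++; next}{print $0, cnt[$0]}' input input
This prints mango 2, mango 2, banana 2, apple 1, watermelon 1, banana 2, as requested.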
Let's do it with standard Unix tools. This is a bit cumbersome and can be improved, but it should do the job:
$ sort input | uniq -c > input.count
$ nl input | sort -k 2 > input.line
$ join -1 2 -2 2 input.line input.count | sort -k 2 -n | awk '{print $1 " " $3}'
The first step counts the number of occurrences of each word.
As you said, you cannot both repeat lines and keep their ordering with uniq alone, so we have to fix that: the second step prepends the line number that we will use later to restore the original ordering.
In the last step, we join the two temporary files on the original word; the second column then contains the original line number, so we sort on this key and strip it from the final output.
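For illustration, with the sample input above the two temporary files contain (padding trimmed):
input.count:
1 apple
2 banana
2 mango
1 watermelon
input.line:
4 apple
3 banana
6 banana
1 mango
2 mango
5 watermelon
Joining them on the word and re-sorting on the saved line number yields the requested mango 2, mango 2, banana 2, apple 1, watermelon 1, banana 2.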
I need help with this shell script.
Must use a loop of some sort.
Must use input data exactly as shown.
Output redirection should be accomplished within the script, not on the command line.
Here are the input files I have:
http://pastebin.com/m3f783597
Here's what the output needs to be:
http://pastebin.com/m2c53b25a
Here's my failed attempt:
http://pastebin.com/m2c60b41
And that failed attempt's output:
http://pastebin.com/m3460e78c
Here's the help. Try to follow these hints as much as possible before looking at my solution below. That will help you out more in the long run, and in the short run too, since it's a certainty that your educator can see this as easily as you can.
If he finds you've plagiarized code, it will probably mean an instant fail.
Your "failed attempt" as you put it is here. It's actually not too bad for a first attempt.
echo -e "Name\t\t On-Call\t\t Phone"
for daycount in 2 1 4 5 7 6 3
do
for namecount in 3 2 6 1 7 4 5
do
day=`head -n $daycount p2input2|tail -n 1|cut -f 2 -d " "`
name=`head -n $namecount p2input1|tail -n 1|cut -f 1 -d " "`
phone=`head -n $namecount p2input1|tail -n 1|cut -f 2 -d " "`
echo -e "$name\c"
echo -e "\t\t$day\c"
echo -e "\t\t$phone"
continue
done
done
And here's the hints:
You have two loops, one inside the other, each executing 7 times. That means 49 lines of output rather than 7. You want to process each day and look up the name and phone for that day (actually the name for that day, then the phone for that name).
It's not really suitable to hardcode line numbers (although I admit it is sneaky): what if the order of the data changes? Better to search on values.
Tabs make things messy; use spaces instead, since then the output doesn't rely on terminal settings and you don't need to worry about misaligned tabs.
And, for completeness, here's the two input files and the expected output:
p2input1                p2input2
========                ========
Dave 734.838.9801       Bob Tuesday
Bob 313.123.4567        Carol Monday
Carol 248.344.5576      Ted Sunday
Mary 313.449.1390       Alice Wednesday
Ted 248.496.2204        Dave Thursday
Alice 616.556.4458      Mary Saturday
Frank 634.296.3357      Frank Friday
Expected output
===============
Name            On-Call         Phone

carol           monday          248.344.5576
bob             tuesday         313.123.4567
alice           wednesday       616.556.4458
dave            thursday        734.838.9801
frank           friday          634.296.3357
mary            saturday        313.449.1390
ted             sunday          248.496.2204
Having said all that, and assuming you've gone away for at least two hours to try and get your version running, here's mine:
1 #!/bin/bash
2 spc20="                    "
3 echo "Name            On-Call         Phone"
4 echo
5 for day in monday tuesday wednesday thursday friday saturday sunday
6 do
7     name=`grep -i " ${day}$" p2input2 | awk '{print $1}'`
8     name=`echo ${name} | tr '[A-Z]' '[a-z]'`
9     bigname=`echo "${name}${spc20}" | cut -c1-15`
10
11    bigday=`echo "${day}${spc20}" | cut -c1-15`
12
13    phone=`grep -i "^${name} " p2input1 | awk '{print $2}'`
14
15    echo "${bigname} ${bigday} ${phone}"
16 done
And the following description should help:
Line 1 selects the right shell; not always necessary.
Line 2 gives us enough spaces to make formatting easier.
Lines 3-4 give us the title and blank line.
Lines 5-6 cycles through the days, one at a time.
Line 7 gives us a name for the day. 'grep -i " ${day}$"' searches for the given day (regardless of upper or lower case) at the end of a line in p2input2, while the awk statement gives you field 1 (the name).
Line 8 simply makes the name all lowercase.
Line 9 creates a string of the right size for output by appending 20 spaces and then cutting everything off after column 15.
Line 11 does the same for the day.
Line 13 is very similar to line 7 except that it searches p2input1, looks for the name at the start of the line, and returns the phone number as the second field.
Line 15 just outputs the individual items.
Line 16 ends the loop.
So there you have it, enough hints to (hopefully) fix up your own code, and a sample as to how a professional would do it :-).
It would be wise to read up on the tools used, grep, tr, cut and awk.
This is homework, I assume?
Read up on the sort and paste commands: man sort, man paste
Pax has given a good answer, but this code invokes fewer processes (11 versus a minimum of 56 = 7 * 8). It uses an auxiliary data file to give the days of the week and their sequence numbers.
cat <<! >p2input3
1 Monday
2 Tuesday
3 Wednesday
4 Thursday
5 Friday
6 Saturday
7 Sunday
!
sort -k2 p2input3 > p2.days
sort -k2 p2input2 > p2.call
join -1 2 -2 2 p2.days p2.call | sort -k3 > p2.duty
sort -k1 p2input1 > p2.body
join -1 3 -2 1 p2.duty p2.body | sort -k3n | tr '[A-Z]' '[a-z]' |
awk 'BEGIN { printf("%-14s %-14s %s\n", "Name", "On-Call", "Phone");
printf "\n"; }
{ printf("%-14s %-14s %s\n", $1, $2, $4);}'
rm -f p2input3 p2.days p2.call p2.duty p2.body
The join command is powerful, but requires the data in the two files in sorted order on the joining keys. The cat command gives a list of days and the day number. The first sort places that list in alphabetic order of day name. The second sort places the names of the people on duty in alphabetic order of day name too. The first join then combines those two files on day name, and then sorts based on user name, yielding the output:
Wednesday 3 Alice
Tuesday 2 Bob
Monday 1 Carol
Thursday 4 Dave
Friday 5 Frank
Saturday 6 Mary
Sunday 7 Ted
The last sort puts the names and phone numbers into alphabetic name order. The second join then combines the name + phone number list with the name + duty list, yielding a 4 column output. This is run through tr to make the data all lower case, and then formatted with awk, which demonstrates its power and simplicity nicely here (you could use Perl or Python instead, but frankly, that would be messier).
Perl has a motto: TMTOWTDI "There's more than one way to do it".
That often applies to shell scripting too.
I suppose my code does not use a loop...oh dear. Replace the initial cat command with:
for day in "1 Monday" "2 Tuesday" "3 Wednesday" "4 Thursday" \
"5 Friday" "6 Saturday" "7 Sunday"
do echo $day
done > p2input3
This now meets the letter of the rules.
Try this one:
sort file1.txt > file1sort.txt
sort file2.txt > file2sort.txt
join file2sort.txt file1sort.txt | column -t > result.txt
rm file1sort.txt file2sort.txt