Comparison of old and new files to update existing records and insert new records into the old file - shell

I want to compare a file that arrives with new records against a file with the same structure that already exists on the server.
For example, the old file is file1:
SKU PROD NAME CODE PRICE
A001 shirt jm s02 478
B002 jean jn j02 348
C003 mwear mw m02 567
The new file, which arrives with the new records, is:
SKU PROD NAME CODE PRICE
A001 shirt jm s02 680
m01 mwear mw m02 567
c02 kurta kr k04 677
d12 dr d3 d03 400
Based on the new records, existing records should be updated in place and genuinely new records appended after the old ones.
I need to write a Unix shell script for the above scenario. Please help.

Collect the distinct SKUs from both files and look each one up in the new file first; if a SKU is not found there, take its record from the old file:
grep '^SKU ' file1.txt > out.txt        # copy the header line
cat file1.txt file2.txt |
grep -v '^SKU ' | cut -d' ' -f1 | sort -u | while read -r sku
do
    # Prefer the record from the new file; anchor on the SKU column.
    grep "^$sku " file2.txt >> out.txt
    if [[ $? -eq 1 ]]  # SKU was not in file2.txt
    then
        grep "^$sku " file1.txt >> out.txt
    fi
done
Result:
SKU PROD NAME CODE PRICE
A001 shirt jm s02 680
B002 jean jn j02 348
C003 mwear mw m02 567
c02 kurta kr k04 677
d12 dr d3 d03 400
m01 mwear mw m02 567
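
If awk is available, the whole merge can be done in one pass instead of one grep per SKU. This is just a sketch against the sample files above: records from file2.txt replace old records with the same SKU (the header matches itself, so it survives), and SKUs that exist only in the new file are appended at the end.
awk '
    NR == FNR { new[$1] = $0; order[++n] = $1; next }  # slurp the new file
    $1 in new { print new[$1]; delete new[$1]; next }  # updated record (and header)
    { print }                                          # unchanged old record
    END {
        for (i = 1; i <= n; i++)                       # append SKUs only in the new file
            if (order[i] in new) print new[order[i]]
    }
' file2.txt file1.txt > out.txt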

Look at the problem in reverse: your new file is essentially your newest data. You could simply replace the old file with the new one, except that there may be some old data that is not in the new file.
So a lazy way to do this is to read the old data, look for anything that isn't in the new file, and append it as needed.
Something like this (off the top of my head, couldn't test it)
while read -r old_data
do
    # find the key (first space-separated field)
    sku=$(echo "$old_data" | cut -d' ' -f1)
    # Look for the key in the new data; append the line if it's not there.
    grep -q "^$sku " new_file || echo "$old_data" >> new_file
done < old_file
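
The same idea without an explicit loop: turn the new file's SKUs into anchored grep patterns (written to a scratch file, here called skus.pat) and append every old line whose SKU matches none of them. This assumes space-separated fields with the SKU in column 1.
cut -d' ' -f1 new_file | sed 's/^/^/; s/$/ /' > skus.pat   # e.g. "^A001 "
grep -v -f skus.pat old_file >> new_file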

Related

bash: conserve tab with spaces for alignment with column

I am trying to display .tsv files nicely aligned as columns, while limiting the display to the current screen width. I can get this done in the following way, which works in general but will fail if the input contains the particular character I hand to column as the delimiter. My current solution works as follows:
bash$ cat sample.tsv | tr '\t' '#' | column -n -t -s # | cut -c-`tput cols`
I tried using the tab character itself directly but could not make it work, and with column's default options any whitespace, not just tabs, delimits fields, so that does not work for me either. I would be thankful for any better alternative to the above.
PS:
A sample is shown below
bash:~$ cat sample.tsv
Sl Name Number Status
1 W Jhon +1 234 4454 y
2 M Walter +2 232 453 n
3 S M Ray +1 343 453 y
bash:~$ cat sample.tsv | tr '\t' '#' | column -n -t -s # | cut -c-`tput cols`
Sl  Name      Number       Status
1   W Jhon    +1 234 4454  y
2   M Walter  +2 232 453   n
3   S M Ray   +1 343 453   y
bash:~$ cat sample.tsv | column -n -t | cut -c-`tput cols`
Sl  Name  Number  Status
1   W     Jhon    +1      234  4454  y
2   M     Walter  +2      232  453   n
3   S     M       Ray     +1   343   453  y
bash:~$
You can tell column to use the tab character to delimit columns with -s:
column -t -s $'\t' -n sample.tsv
Sl  Name      Number       Status
1   W Jhon    +1 234 4454  y
2   M Walter  +2 232 453   n
3   S M Ray   +1 343 453   y
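To keep the screen-width limit from the original pipeline, feed the result straight into cut:
column -t -s $'\t' -n sample.tsv | cut -c-"$(tput cols)"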

A UNIX Command to Find the Name of the Student who has the Second Highest Score

I am new to Unix programming. Could you please help me solve the following question?
For example, if the input file has the content below:
RollNo Name Score
234 ABC 70
567 QWE 12
457 RTE 56
234 XYZ 80
456 ERT 45
The output will be
ABC
I tried something like this
sort -k3,3 -rn -t" " | head -n2 | awk '{print $2}'
Using awk
awk 'NR>1{arr[$3]=$2} END {n=asorti(arr,arr_sorted); print arr[arr_sorted[n-1]]}'
Demo:
$cat file.txt
RollNo Name Score
234 ABC 70
567 QWE 12
457 RTE 56
234 XYZ 80
456 ERT 45
$awk 'NR>1{arr[$3]=$2} END {n=asorti(arr,arr_sorted); print arr[arr_sorted[n-1]]}' file.txt
ABC
$
Explanation:
NR>1 --> skip the first record (the header)
{arr[$3]=$2} --> build an associative array with the score as index and the name as value
END --> runs after the whole file has been read
n=asorti(arr,arr_sorted) --> sort arr by its indices (the scores) into arr_sorted; n = number of elements
print arr[arr_sorted[n-1]] --> arr_sorted[n-1] holds the second-highest score; print the corresponding name
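One caveat: asorti compares indices as strings by default, which only works here because every score has two digits; with scores of mixed widths (say 100 vs 80) the string order is wrong. GNU awk accepts a third argument requesting a numeric comparison of indices:
awk 'NR>1 {arr[$3]=$2} END {n=asorti(arr, arr_sorted, "@ind_num_asc"); print arr[arr_sorted[n-1]]}' file.txt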
Your attempt is 90% correct; it needs just one more step.
Try this... it will work:
sort -k3,3 -rn -t" " file.txt | head -n2 | tail -n1 | awk '{print $2}'
head -n2 keeps the two highest-scoring lines, and tail -n1 then drops the top one, leaving the second highest (ABC). Note that head -n1 on its own would print the name with the highest score (XYZ), not the second highest.

How to produce a pivot table using a shell script

I have data in a CSV file as below...
Emailid Storeid
a#gmail.com 2000
b#gmail.com 2001
c#gmail.com 2000
d#gmail.com 2000
e#gmail.com 2001
I am expecting the output below, basically counting how many email ids there are for each store.
StoreID Emailcount
2000 3
2001 2
So far I have tried this to solve my issue:
IFS=","
while read f1 f2
do
awk -F, '{ A[$1]+=$2 } END { OFS=","; for (x in A) print x,A[x]; }' > /home/ec2-user/storewiseemials.csv
done < temp4.csv
With the above shell script I am not getting the desired output. Can you guys please help me?
Using Miller (https://github.com/johnkerl/miller) and starting from this input (I have used a CSV, because I do not know whether you use a tab or a space as the separator)
Emailid,Storeid
a#gmail.com,2000
b#gmail.com,2001
c#gmail.com,2000
d#gmail.com,2000
e#gmail.com,2001
and running
mlr --csv count-distinct -f Storeid -o Emailcount input >output
you will have
+---------+------------+
| Storeid | Emailcount |
+---------+------------+
| 2000 | 3 |
| 2001 | 2 |
+---------+------------+
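
If Miller is not to hand, plain awk can build the same counts. A sketch assuming the whitespace-separated layout shown in the question, with the file named temp4.csv as in the asker's script (the for-in loop emits stores in no particular order, hence the sort):
echo "StoreID Emailcount"
awk 'NR > 1 { count[$2]++ }    # skip the header, count rows per store id
     END { for (s in count) print s, count[s] }' temp4.csv | sort -n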

shell script inserting "$" into a formatted column and adding new column

Hi guys, pardon my bad English. I managed to display my data nicely and neatly using the column program in the code below. But how do I add a "$" to the price column? Secondly, how do I add a new column with the total sum (Price * Sold) and display it with "$" too?
(echo "Title:Author:Price:Quantity:Sold" && cat BookDB.txt) | column -s: -t
Output:
Title                     Author               Price   Quantity  Sold
The Godfather             Mario Puzo           21.50   50        20
The Hobbit                J.R.R Tolkien        40.50   50        10
Romeo and Juliet          William Shakespeare  102.80  200       100
The Chronicles of Narnia  C.S.Lewis            35.90   80        15
Lord of the Flies         William Golding      29.80   125       25
Memories of a Geisha      Arthur Golden        35.99   120       50
I guess you could do it with awk (line break added before && for readability):
(echo "Title:Author:Price:Quantity:Sold:Calculated"
&& awk -F: '{printf ("%s:%s:$%.2f:%d:%d:$%.2f\n",$1,$2,$3,$4,$5,$3*$5)}' BookDB.txt) | column -s: -t
Note the %.2f formats for the money columns: %d would truncate a price like 21.50 to 21, and the leading "$" in the format string puts the dollar sign on both Price and the new Calculated column.
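With the sample BookDB.txt above, that pipeline prints something like:
Title                     Author               Price    Quantity  Sold  Calculated
The Godfather             Mario Puzo           $21.50   50        20    $430.00
The Hobbit                J.R.R Tolkien        $40.50   50        10    $405.00
Romeo and Juliet          William Shakespeare  $102.80  200       100   $10280.00
The Chronicles of Narnia  C.S.Lewis            $35.90   80        15    $538.50
Lord of the Flies         William Golding      $29.80   125       25    $745.00
Memories of a Geisha      Arthur Golden        $35.99   120       50    $1799.50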

How can I compare two 2D-array files with bash?

I have two 2D-array files to read with bash.
What I want to do is extract and compare the elements that appear in both files.
The two files have different numbers of rows and columns, for example:
file1.txt (nx7)
NO DESC ID TYPE W S GRADE
1 AAA 20 AD 100 100 E2
2 BBB C0 U 200 200 D
3 CCC 9G R 135 135 U1
4 DDD 9H Z 246 246 T1
5 EEE 9J R 789 789 U1
.
.
.
file2.txt (mx3)
DESC W S
AAA 100 100
CCC 135 135
EEE 789 789
.
.
.
Here is what I want to do:
Extract each element in the DESC column of file2.txt, then find the row with that DESC in file1.txt.
Extract the W,S elements from that row of file2.txt and the corresponding W,S elements from the matching row of file1.txt.
If [W1==W2 && S1==S2]; then echo "${DESC[colindex]} ok"; else echo "${DESC[colindex]} NG"
How can I read this kind of file as a 2D array in bash, or is there a more convenient way to do this?
bash does not support 2D arrays. You can simulate them by generating 1D array variables like array1, array2, and so on.
Assuming DESC is a key (i.e. has no duplicate values) and does not contain any spaces:
#!/bin/bash

# read data from file1 into arrays data0, data1, ...
idx=0
while read -a data$idx; do
    let idx++
done <file1.txt

# process data from file2
while read desc w2 s2; do
    for ((i=0; i<idx; i++)); do
        v="data$i[1]"              # DESC column of row i
        [ "$desc" = "${!v}" ] && {
            w1="data$i[4]"         # W column
            s1="data$i[5]"         # S column
            if [ "$w2" = "${!w1}" -a "$s2" = "${!s1}" ]; then
                echo "$desc ok"
            else
                echo "$desc NG"
            fi
            break
        }
    done
done <file2.txt
For brevity, optimizations such as taking advantage of sort order are left out.
If the files actually contain the header NO DESC ID TYPE ... then use tail -n +2 to discard it before processing.
A more elegant solution is also possible, which avoids reading the entire file in memory. This should only be relevant for really large files though.
If the row order does not need to be preserved (the rows can be sorted), maybe this is enough:
join -2 2 -o 1.1,1.2,1.3,2.5,2.6 <(tail -n +2 file2.txt|sort) <(tail -n +2 file1.txt|sort) |\
sed 's/^\([^ ]*\) \([^ ]*\) \([^ ]*\) \2 \3/\1 OK/' |\
sed '/ OK$/!s/\([^ ]*\) .*/\1 NG/'
For file1.txt
NO DESC ID TYPE W S GRADE
1 AAA 20 AD 100 100 E2
2 BBB C0 U 200 200 D
3 CCC 9G R 135 135 U1
4 DDD 9H Z 246 246 T1
5 EEE 9J R 789 789 U1
and file2.txt
DESC W S
AAA 000 100
CCC 135 135
EEE 789 000
FCK xxx 135
produces:
AAA NG
CCC OK
EEE NG
Explanation:
skip the header line in both files - tail -n +2
sort both files
join the needed columns from both files into one table; only lines with a common DESC field appear in the result, like:
AAA 000 100 100 100
CCC 135 135 135 135
EEE 789 000 789 789
on lines where column 2 equals column 4 and column 3 equals column 5, replace everything after the 1st column with OK
on the remaining lines, replace everything after the 1st column with NG
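
For comparison, a single awk pass can do the same lookup while preserving file2.txt's row order; this sketch also reports DESC values that are missing from file1.txt as NG:
awk 'NR == FNR { if (FNR > 1) { w[$2] = $5; s[$2] = $6 }; next }   # file1: key W and S by DESC
     FNR > 1 { print $1, (($1 in w && w[$1] == $2 && s[$1] == $3) ? "ok" : "NG") }' \
    file1.txt file2.txt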
