Unix sort function that sorts a space-delimited txt file according to a specific column by ASCII value - bash

I have tried to come up with an answer, but everything I try does not work.
My code below is what I have come up with:
sort -k$field_number "$1".db > temp.txt && cp temp.txt "$1.db"
Shouldn't this line of code sort the .db file by ASCII value (sort sorts by ASCII by default, right)? In the code, field_number corresponds to the column I wish to sort the lines of the file by. When I use my code to sort the file by column 2, I get the output below.
Textfile (the .db file) format:
a 5 5 5
Green 72 72 72
Smith 84 72 93
Jones 85 73 94
z 9 9 9
Ford 92 64 93
Miller 93 73 87
bobua che Apple Xor

Maybe your problem is with your collation (locale). Try this please:
LC_COLLATE=C sort -n --ignore-case -k"$field_number" "$1".db > temp.txt && cp temp.txt "$1.db"
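To see what the collation setting changes, here is a minimal sketch (mine, not part of the original answer): under a typical UTF-8 locale, plain sort interleaves upper- and lowercase, while forcing the C locale gives pure byte (ASCII) order:
$ printf '%s\n' b A a B | LC_COLLATE=C sort
A
B
a
b
Note that -n switches the key comparison to numeric, so drop it if you want true ASCII ordering of the column.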

How to add the elements in a for loop

So basically my code looks through data and greps whatever it begins with, and I've been trying to figure out a way to add those values together.
The sample input is:
35 45 75 76
34 45 53 55
33 34 32 21
My code:
for id in $(awk '{ print $1 }' < $3); do echo $id; done
I'm printing it right now to see the values, but basically what's output is
35
34
33
I'm trying to add them all together, but I can't figure out how; some help would be appreciated.
My desired output would be
102
Lots of ways to do this, a few ideas ...
$ cat numbers.dat
35 45 75 76
34 45 53 55
33 34 32 21
Tweaking OP's current code:
$ sum=0
$ for id in $(awk '{ print $1 }' < numbers.dat); do ((sum+=id)); done
$ echo "${sum}"
102
Eliminating awk:
$ sum=0
$ while read -r id rest_of_line; do sum=$((sum+id)); done < numbers.dat
$ echo "${sum}"
102
Using just awk (looks like Aivean beat me to it):
$ awk '{sum+=$1} END {print sum}' numbers.dat
102
awk '{ sum += $1 } END { print sum }'
Test:
35 45 75 76
34 45 53 55
33 34 32 21
Result:
102
(sum(35, 34, 33) = 102, that's what you want, right?)
Here is the detailed explanation of how this works:
$1 is the first column of the input.
sum is the variable that holds the sum of all the values in the first column.
END { print sum } is the action to be performed after all the input has been processed.
So the awk program is basically summing up the first column of the input and printing the result.
This answer was partially generated by Davinci Codex model, supervised and verified by me.
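As a small extension (my addition, not part of either answer): if the column to sum needs to vary, like the $field_number in the first question, awk can take the column index as a variable:
$ awk -v col=1 '{ sum += $col } END { print sum }' numbers.dat
102
With col=2 the same command prints 124 for this sample data.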

Extracting lines if the first field matches another list saved in a different file (shell command)

I have two files. One contains a list of items, e.g.,
Allie
Bob
John
Laurie
Another file (file2) contains a different list of items in a different order, but some items might overlap with the items in file1, e.g.,
Laurie 45 56 6 75
Moxipen 10 45 56 56
Allie 45 56 67 23
I want to intersect these two files and extract only those lines from file2 whose first field matches an item in file1.
i.e., my output should be
Allie 45 56 67 23
Laurie 45 56 6 75
(preferably in this order, but it's OK if not)
grep -f file1 file2 doesn't do what I want.
I also need something efficient because the second file is HUGE.
I also tried this:
awk -F, 'FNR==NR {a[$1]=$0; next}; $1 in a {print a[$1]}' file2 file1
If order doesn't matter, then:
awk 'FNR==NR{ arr[$1]; next }$1 in arr' file1 file2
Explanation
FNR==NR{ arr[$1]; next } Here we read the first file (file1); arr is an array whose index keys are the first field, $1.
$1 in arr Here we read the second file (file2); if the array arr built while reading the first file has an index key equal to the second file's first column ($1 in arr is true if the key exists), then the current record/row/line from file2 is printed.
Test Results:
akshay#db-3325:/tmp$ cat file1
Allie
Bob
John
Laurie
akshay#db-3325:/tmp$ cat file2
Laurie 45 56 6 75
Moxipen 10 45 56 56
Allie 45 56 67 23
akshay#db-3325:/tmp$ awk 'FNR==NR{ arr[$1]; next }$1 in arr' file1 file2
Laurie 45 56 6 75
Allie 45 56 67 23
No need for complex joins; this is a filtering job:
$ grep -wFf file1 file2
Laurie 45 56 6 75
Allie 45 56 67 23
This has the benefit of keeping the order of file2 as well. The -w option requires full-word matches, eliminating the substring matches that would create false positives. Of course, if your sample input is not representative and your data may contain key-like entries in other fields, this will not work without anchoring the match to the beginning of the line.
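To make the substring issue concrete, a small sketch with hypothetical names (not from the question's data):
$ printf '%s\n' Bob > keys
$ printf '%s\n' 'Bobby 1 2' 'Bob 3 4' | grep -Ff keys
Bobby 1 2
Bob 3 4
$ printf '%s\n' 'Bobby 1 2' 'Bob 3 4' | grep -wFf keys
Bob 3 4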
This is the job that join is built for.
Providing a reproducer testable via copy-and-paste with shell functions (which you could replace with your actual input files):
cat_file1() {
    printf '%s\n' Allie Bob John Laurie
}
cat_file2() {
    printf '%s\n' 'Laurie 45 56 6 75' \
                  'Moxipen 10 45 56 56' \
                  'Allie 45 56 67 23'
}
join <(cat_file1 | sort) <(cat_file2 | sort)
...properly emits:
Allie 45 56 67 23
Laurie 45 56 6 75
Of course, don't use cat file1 | sort; run sort <file1 to give sort a real file handle for better efficiency, or (better!) store your inputs in sorted form in the first place.
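With actual files, and assuming neither is sorted yet (join needs both inputs sorted on the join field, the first column by default), that looks like:
join <(sort file1) <(sort file2)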

While loop in bash getting duplicate result

$ cat grades.dat
santosh 65 65 65 65
john 85 92 78 94 88
andrea 89 90 75 90 86
jasper 84 88 80 92 84
santosh 99 99 99 99 99
Script:
#!/usr/bin/bash
filename="$1"
while read line
do
    a=`grep -w "santosh" $1 | awk '{print$1}' |wc -l`
    echo "total is count of the file is $a";
done <"$filename"
Output:
total is count of the file is 2
total is count of the file is 2
total is count of the file is 2
total is count of the file is 2
total is count of the file is 2
The real output should just be a single line:
total is count of the file is 2
Please let me know where I am going wrong in the above script.
Whilst others have shown you better ways to solve your problem, the answer to your question is in the following line:
a=`grep -w "santosh" $1 | awk '{print$1}' |wc -l`
You store each name in the variable line as the while loop reads the file, but line is never used. Instead, the loop always greps for "santosh", which appears twice; and because you run the same query once for each of the 5 lines in the file, you get 5 identical lines of output.
You could alter your current script like so:
a=$(grep -w "$line" "$filename" | awk '{print$1}' | wc -l)
The above is not meant to be a full solution, as others have pointed out, but it does fix your immediate issue.
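For reference, here is the loop with that line substituted in; a sketch only, since better approaches exist, and the read -r with the _ placeholder for the rest of each line is my addition:
#!/usr/bin/bash
filename="$1"
while read -r line _
do
    a=$(grep -w "$line" "$filename" | awk '{print$1}' | wc -l)
    echo "total count for $line is $a"
done < "$filename"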

Sort according to second column numerically and first alphabetically

I have two columns that I want to sort using bash.
I used the command:
sort -k2 -n
c 9
c 11
c 11
sh 11
c 13
c 15
txt 47
txt 94
txt 345
txt 628
sh 3673
This is the result, but I need them to be sorted like this:
c 9
c 11
c 11
c 13
c 15
sh 11
sh 3673
txt 47
txt 94
txt 345
txt 628
Any ideas?
First sort by column 1 alphabetically, then by column 2 numerically:
sort -k1,1 -k2,2n file.txt
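The stop fields matter here (an explanatory note of mine, not part of the original answer): -k1,1 limits the first key to column 1 alone, whereas a bare -k1 would compare from column 1 through the end of the line, and the n in -k2,2n applies numeric comparison to the second key only. A quick check:
$ printf '%s\n' 'txt 94' 'c 11' 'txt 345' | sort -k1,1 -k2,2n
c 11
txt 94
txt 345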

Removing multiple block of lines of a text file in bash

Assume a text file with 40 lines of data. How can I remove lines 3 to 10, 13 to 20, 23 to 30, and 33 to 40, in place, using a bash script?
I already know how to remove lines 3 to 10 with sed, but I wonder if there is a way to do all the removing, in place, with only one command line. I could use a for loop, but the problem is that with each iteration the line numbers shift, which would require recalculating the line numbers to remove.
Here is an awk one-liner that works whether your file has 40 lines or 40k lines:
awk 'NR~/[12]$/' file
It keeps only the lines whose line number ends in the digit 1 or 2 (lines 1, 2, 11, 12, 21, 22, ...), which is exactly the complement of the blocks you want removed. For example, with 50 lines:
kent$ seq 50|awk 'NR~/[12]$/'
1
2
11
12
21
22
31
32
41
42
sed -i '3,10d;13,20d;23,30d;33,40d' file
Each first,lastd expression deletes one block, and -i applies the edit to the file in place.
This might work for you (GNU sed):
sed '3~10,+7d' file
This deletes lines starting at line 3 and at every 10th line thereafter (3, 13, 23, ...), each time deleting that line and the 7 lines that follow it.
If the file was longer than 40 lines and you were only interested in the first 40 lines:
sed '41,$b;3~10,+7d' file
The first instruction tells sed to ignore lines 41 to end-of-file.
Could also be written:
sed '1,40{3~10,+7d}' file
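Run against the same seq 50 test used for the awk one-liner above, the basic form keeps the expected lines (GNU sed):
$ seq 50 | sed '3~10,+7d'
1
2
11
12
21
22
31
32
41
42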
@Kent's answer is the way to go for this particular case, but in general:
$ seq 50 | awk '{idx=(NR%10)} idx>=1 && idx<=2'
1
2
11
12
21
22
31
32
41
42
The above will work even if you want to select the 4th through 7th lines out of every 13, for example:
$ seq 50 | awk '{idx=(NR%13)} idx>=4 && idx<=7'
4
5
6
7
17
18
19
20
30
31
32
33
43
44
45
46
It's not constrained to selecting N lines out of every 10.
Or to select just the 3rd, 5th and 6th lines out of every 13:
$ seq 50 | awk 'BEGIN{split("3 5 6",tmp); for (i in tmp) tgt[tmp[i]]=1} tgt[NR%13]'
3
5
6
16
18
19
29
31
32
42
44
45
The point is - selecting ranges of lines is a job for awk, definitely not sed.
awk '{m=NR%10} !(m==0 || m>=3)' file > tmp && mv tmp file
Here m is the line number modulo 10; the filter keeps only lines where m is 1 or 2 (the first two lines of every block of ten), and the temp file plus mv gives the in-place effect.
