bash script to sort

I have this file I created:
Kuala Lumpur 78 56
Seoul 86 66
Karachi 95 75
Tokyo 85 60
Lahore 85 75
Manila 90 85
On the command line I can sort it no problem using sort -t with a tab as the delimiter, but now I'm trying to write a script to read this in and print out different sorts. If I read it into an array and tell it to split on whitespace, the "Kuala Lumpur" line is thrown off, and then so is the sort. What do I do about that space? I don't want to take it out or replace it with a comma, but if I have to I will.
#!/bin/bash
cat asiapac-temps | sort -t' ' -k 1,1d
echo ""
cat asiapac-temps | sort -t' ' -k 2,2n
echo ""
cat asiapac-temps | sort -t' ' -k 3
This is what I'm using now. I was trying to do this a different way so as not to call sort over and over.
The output is:
By city:
Karachi 95 75
Kuala Lumpur 78 56
Lahore 85 75
Manila 90 85
Seoul 86 66
Tokyo 85 60
by high temp (col2)
Kuala Lumpur 78 56
Lahore 85 75
Tokyo 85 60
Seoul 86 66
Manila 90 85
Karachi 95 75
by low temp (col3)
Kuala Lumpur 78 56
Tokyo 85 60
Seoul 86 66
Karachi 95 75
Lahore 85 75
Manila 90 85

Since feature requests to mark a comment as an answer remain declined, I am copying the solution from the comments here:
You can't sort anything once and output 3 different results. Any time you write a loop in shell you've probably got the wrong approach (shell is primarily an environment from which to call tools, not a programming language). Just calling sort each time you want to produce sorted output will almost certainly be simpler and more efficient than any approach you can come up with involving array indexing. – Ed Morton

If your question is "how do I input the tab character from the command line", the answer is "you don't need to" -- sort recognizes the tab character as a separator by default.
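If the file really is tab-delimited, you can also pass a literal tab explicitly with bash's $'\t' quoting, which keeps the space inside "Kuala Lumpur" within field 1. Here is a minimal sketch of the script along those lines, reading the file only once and assuming the low-temp column should also sort numerically:
#!/bin/bash
# read the file into a variable once, then hand the same data to each sort
data=$(<asiapac-temps)
echo "By city:"
sort -t$'\t' -k1,1d <<<"$data"
echo ""
echo "by high temp (col2)"
sort -t$'\t' -k2,2n <<<"$data"
echo ""
echo "by low temp (col3)"
sort -t$'\t' -k3,3n <<<"$data"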

Related

Combining summary statistics from multiple input files in Bash

I want to generate some summary statistics for "Mary" based on data in multiple files.
input1.txt looks like
Jose 88518 95 75 95 62 100 78 68
Alex 97502 84 79 80 73 88 95 79 85 93
Mary 98765 80 75 100 51 83 75 99 50 75 89 94
...
input2.txt looks like
Jack 32954 100 98 95 100 93 100 99 98 100 100
Mary 98765 85 83 96 77 81 84 98 75 87
Lisa 83746 100 100 100 100 99 100 98 100 100 100
...
Running the following one-liner in Bash for input1.txt:
awk '/Mary/{for(n=3;n<=NF;n++) print $n}' input1.txt | Rscript -e 'summary (as.numeric (readLines ("stdin")))'
The results are:
Min. 1st Qu. Median Mean 3rd Qu. Max.
50.00 75.00 80.00 79.18 91.50 100.00
Running the following code for input2.txt:
awk '/Mary/{for(n=3;n<=NF;n++) print $n}' input2.txt | Rscript -e 'summary (as.numeric (readLines ("stdin")))'
The results are:
Min. 1st Qu. Median Mean 3rd Qu. Max.
75.00 81.00 84.00 85.11 87.00 98.00
How can I write a one-liner solution to combine "Mary"'s stats from each data file into one report that results in something similar to the following?
Min. 1st Qu. Median Mean 3rd Qu. Max.
50.00 75.00 80.00 79.18 91.50 100.00
75.00 81.00 84.00 85.11 87.00 98.00
I think you need to use a bash for loop.
for file in input*.txt; do awk '/Mary/{for(n=3;n<=NF;n++) print $n}' "$file" | Rscript -e 'summary (as.numeric (readLines ("stdin")))'; done
You will probably end up with two headers now, but since we have no visibility into how the headers are created, it is hard to suggest more.
Min. 1st Qu. Median Mean 3rd Qu. Max.
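If the duplicate line is the header shown above, one way to keep only its first occurrence is to filter the combined output; a sketch, assuming the header is the only line containing "Min.":
for file in input*.txt; do
  awk '/Mary/{for(n=3;n<=NF;n++) print $n}' "$file" |
    Rscript -e 'summary (as.numeric (readLines ("stdin")))'
done | awk '/Min\./ && seen++ {next} 1'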

filter multiline record file based on whether one of the lines meets a condition (word count)

Hi everyone,
I am looking for a way to keep the records from a txt file that meet the following condition.
This is an example of the data:
aa bb cc
11 22 33
44 55 66
77 88 99
aa bb cc
11 22 33 44 55 66 77
44 55 66 66
77 88 99
aa bb cc
11 22 33 44 55
44 55 66
77 88 99 77
...
Basically, it's a file where one record consists of 5 lines in total: 4 lines contain strings/numbers with a tab delimiter, and the last is a blank line (\n).
The first line of a record always has 3 elements, while the number of elements in the 2nd, 3rd, and 4th lines can differ.
What I need to do is remove every record (5-line block) where the total number of elements in the second line is > 3 (I don't care about the number of elements in the remaining lines). The output for the example should look like this:
aa bb cc
11 22 33
44 55 66
77 88 99
...
so only the records where the second line has 3 elements are kept and written to the new txt file.
I tried to do it with awk by modifying FS and RS values like this:
awk 'BEGIN {RS="\n\n"; FS="\n";}
{if(length($2)==3) print $2"\n\n"; }' test_filter.txt
but if(length($2)==3) is not correct, as I should count the number of entries in the 2nd field instead of its length, which I can't find how to do. Any help would be much appreciated!
Thanks in advance,
You can use the split() function to break a line/field/string into components; in this case:
n=split($2,arr," ")
Where:
we split field #2, using a single space (" ") as the delimiter; as a special case, a single-space separator makes awk split on any run of whitespace (spaces or tabs), which suits the tab-delimited data here ...
components are stored in array arr[] and ...
n is the number of elements in the array
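A quick standalone demonstration of split()'s return value (a throwaway example, not OP's data):
$ echo '11 22 33 44' | awk '{n=split($0,arr," "); print n, arr[1], arr[n]}'
4 11 44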
Pulling this into OP's current awk code, along with a couple of small changes, we get:
awk 'BEGIN {ORS=RS="\n\n"; FS="\n"} {n=split($2,arr," "); if (n>=4) next}1' test_filter.txt
With an additional block added to our sample:
$ cat test_filter.txt
aa bb cc
11 22 33
44 55 66
77 88 99
aa bb cc
11 22 33 44 55 66 77
44 55 66 66
77 88 99
aa bb cc
111 222 333
444 555 665
777 888 999
aa bb cc
11 22 33 44 55
44 55 66
77 88 99 77
This awk solution generates:
aa bb cc
11 22 33
44 55 66
77 88 99
aa bb cc
111 222 333
444 555 665
777 888 999
# blank line here
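Note that a multi-character RS such as "\n\n" is a gawk extension; a more portable sketch of the same idea uses awk's paragraph mode, where an empty RS splits records on blank lines:
awk 'BEGIN {RS=""; FS="\n"; ORS="\n\n"} split($2,arr," ") <= 3' test_filter.txt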

in bash split a variable into an array with each array value containing n values from the list

So I'm issuing a query to MySQL and it's returning, say, 1,000 rows, but each iteration of the program could return a different number of rows. I need to break up this result set (without using a MySQL LIMIT) into chunks of 100 rows that I can then programmatically iterate through.
So
MySQLOutput="1 2 3 4 ... 10000"
I need to turn that into an array that looks like
array[1]="1 2 3 ... 100"
array[2]="101 102 103 ... 200"
etc.
I have no clue how to accomplish this elegantly
Using Charles' data generation:
MySQLOutput=$(seq 1 10000 | tr '\n' ' ')
# the sed command will add a newline after every 100 words
# and the mapfile command will read the lines into an array
mapfile -t MySQLOutSplit < <(
sed -r 's/([^[:blank:]]+ ){100}/&\n/g; $s/\n$//' <<< "$MySQLOutput"
)
echo "${#MySQLOutSplit[#]}"
# 100
echo "${MySQLOutSplit[0]}"
# 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100
echo "${MySQLOutSplit[99]}"
# 9901 9902 9903 9904 9905 9906 9907 9908 9909 9910 9911 9912 9913 9914 9915 9916 9917 9918 9919 9920 9921 9922 9923 9924 9925 9926 9927 9928 9929 9930 9931 9932 9933 9934 9935 9936 9937 9938 9939 9940 9941 9942 9943 9944 9945 9946 9947 9948 9949 9950 9951 9952 9953 9954 9955 9956 9957 9958 9959 9960 9961 9962 9963 9964 9965 9966 9967 9968 9969 9970 9971 9972 9973 9974 9975 9976 9977 9978 9979 9980 9981 9982 9983 9984 9985 9986 9987 9988 9989 9990 9991 9992 9993 9994 9995 9996 9997 9998 9999 10000
Something like this:
# generate content
MySQLOutput=$(seq 1 10000 | tr '\n' ' ') # seq is awful, don't use in real life
# split into a large array, each item stored individually
read -r -a MySQLoutArr <<<"$MySQLOutput"
# add each batch of 100 items into a new array entry
batchSize=100
MySQLoutSplit=( )
for ((i=0; i<${#MySQLoutArr[@]}; i+=batchSize)); do
MySQLoutSplit+=( "${MySQLoutArr[*]:i:batchSize}" )
done
To explain some of the finer points:
read -r -a foo reads contents into an array named foo, split on IFS, up to the next character specified by read -d (none given here, thus reading only a single line). If you wanted each line to be a new array entry, consider IFS=$'\n' read -r -d '' -a foo, which will read each line into an array, terminated at the first NUL in the input stream.
"${foo[*]:i:batchSize}" expands to a list of items in array foo, starting at index i, and taking the next batchSize items, concatenated into a single string with the first character in $IFS used as a separator.

Creating a sequence of distinct random numbers within a certain range in bash script

I have a file which contains entries numbered 0 to 149. I am writing a bash script which randomly selects 15 of these 150 entries and creates another file from them.
I tried using a random number generator:
var=$RANDOM
var=$[ $var % 150 ]
Using var I picked those 15 entries, but I want all of these entries to be different. Sometimes the same entry gets picked twice. Is there a way to create a sequence of distinct random numbers within a certain range (in my example, 0-149)?
Use shuf -i to generate a random list of numbers.
$ entries=($(shuf -i 0-149 -n 15))
$ echo "${entries[#]}"
55 96 80 109 46 58 135 29 64 97 93 26 28 116 0
If you want them in order then add sort -n to the mix.
$ entries=($(shuf -i 0-149 -n 15 | sort -n))
$ echo "${entries[#]}"
12 22 45 49 54 66 78 79 83 93 118 119 124 140 147
To loop over the values, do:
for entry in "${entries[@]}"; do
echo "$entry"
done
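To then pull the chosen entries out of the original file, assuming one entry per line with 0-based numbering (the file names here are hypothetical), something like this sketch should work:
shuf -i 0-149 -n 15 |
  awk 'NR==FNR {want[$1]; next} (FNR-1) in want' - entries.txt > selected.txt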

using sort command in shell scripting

I execute the following code :
for i in {1..12}; do printf "%s %s\n" "${edate1[$i]}" "${etime1[$i]}"; done
(I retrieve the values of edate1 and etime1 from my database and store them in arrays, which works fine.)
I receive the output as:
97 16
97 16
97 12
107 16
97 16
97 16
97 16
97 16
97 16
97 16
97 16
100 15
I need to sort the first column using the sort command.
Expected o/p:
107 16
100 16
97 12
97 16
97 16
97 16
97 16
97 16
97 16
97 16
97 16
97 15
Here is what I did to arrive at a solution:
Copy your original input to in.txt
Run this code, which uses awk, sort, and paste.
awk '{print $1}' in.txt | sort -g -r -s > tmp.txt
paste tmp.txt in.txt | awk '{print $1 " " $3}' > out.txt
Then out.txt matches the expected output in your original post.
To see how it works, look at this:
$ paste tmp.txt in.txt
107 97 16
100 97 16
97 97 12
97 107 16
97 97 16
97 97 16
97 97 16
97 97 16
97 97 16
97 97 16
97 97 16
97 100 15
So you're getting the first column sorted, then the original columns in place.
Awk makes it easy to print out the columns (fields) you're interested in, i.e., the first and third.
This is the simplest way to sort your data:
<OUTPUT> | sort -nrk1
Refer to the sort man page to learn more about the magic of sort.
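Wired into the loop from the question, that would look like this sketch (note it reorders whole rows by column 1, which is slightly different from the paste-based answer above, where column 1 is re-sorted independently of column 2):
for i in {1..12}; do printf "%s %s\n" "${edate1[$i]}" "${etime1[$i]}"; done | sort -nrk1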
