Reading Column and Find Median (Bash)

I want to find the median of each column, but my code doesn't work the way I want. Given this input:
1 2 3
3 2 1
2 1 5
I expect
2 2 3
as the result. Instead, it just gives a sum error and some "sum" of each column. Below is a snippet of the code for "median in column":
while read -r line; do
read -a array <<< "$line"
for i in "${!array[@]}"
do
column[${i}]=${array[$i]}
((length[${i}]++))
result=${column[*]} | sort -n
done < file
for i in ${!column[@]}
do
#some median calculation.....
Note: I want to practice Bash, which is why I wrote this in pure Bash.
I'd really appreciate it if someone could help me, especially with a Bash solution. Thank you.

Bash is really not suitable for low-level text processing like this: the read command does a system call for each character that it reads, which means that it's slow, and it's a CPU hog. It's ok for processing interactive input, but using it for general text processing is madness. It would be much better to use awk (Python, Perl, etc) for this.
As an exercise in learning about Bash I guess it's ok, but please try to avoid using read for bulk text processing in real programs. For further information, please see "Why is using a shell loop to process text considered bad practice?" on the Unix & Linux Stack Exchange site, especially the answer written by Stéphane Chazelas (the discoverer of the Shellshock Bash bug).
Anyway, to get back to your question... :)
Most of your code is ok, but
result=${column[*]} | sort -n
doesn't do what you want it to.
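To see why, note that the line is a pipeline: the assignment on the left of the | runs in its own subshell and prints nothing, so sort receives empty input and the variable is never set in the main shell. The following is a minimal sketch of the difference, using a made-up three-element column; capturing the sorted values needs command substitution instead.

```bash
#!/usr/bin/env bash
# Illustrative data only; not taken from the question's file.
column=(3 1 2)

# Each part of a pipeline runs in its own subshell, so this assignment
# is lost, and sort gets no input because an assignment prints nothing.
unset result
result=${column[*]} | sort -n
echo "after the pipeline: '${result-}'"

# Command substitution captures what the pipeline prints.
sorted=$(printf '%s\n' "${column[@]}" | sort -n)
echo "$sorted"
```

The first echo shows an empty value, while the second prints the three numbers in ascending order, one per line.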
Here's one way to get the column medians in pure Bash:
#!/usr/bin/env bash
# Find medians of columns of numeric data
# See http://stackoverflow.com/q/33095764/4014959
# Written by PM 2Ring 2015.10.13
fname=$1
echo "input data:"
cat "$fname"
echo
#Read rows, saving into columns
numrows=1
while read -r -a array; do
    ((numrows++))
    for i in "${!array[@]}"; do
        #Separate column items with a newline
        column[i]+="${array[i]}"$'\n'
    done
done < "$fname"
#Calculate line number of middle value; which must be 1-based to use as `head`
#argument, and must compensate for extra newline added by 'here' string, `<<<`
midrow=$((1+numrows/2))
echo "midrow: $midrow"
#Get median of each column
result=''
for i in "${!column[@]}"; do
    median=$(sort -n <<<"${column[i]}" | head -n "$midrow" | tail -n 1)
    result+="$median "
done
echo "result: $result"
Output:
input data:
1 2 3
3 2 1
2 1 5
midrow: 3
result: 2 2 3

Related

Is there a way for me to simplify these echos? [duplicate]

I am still learning shell scripting, and I have been given a challenge: find an easier way to echo "Name1", "Name2", ..., "Name15". I'm not sure where to start. I've had some ideas, but I don't want to look silly if I mess it up. Any help?
I haven't actually tried anything yet; so far it has mostly been thought.
#This is what I wrote to start
#!/bin/bash
echo "Name1"
echo "Name2"
echo "Name3"
echo "Name4"
echo "Name5"
echo "Name6"
echo "Name7"
echo "Name8"
echo "Name9"
echo "Name10"
echo "Name11"
echo "Name12"
echo "Name13"
echo "Name14"
echo "Name15"
My expected results are obviously just for it to output "Name1" "Name2" etc. But I'm looking for a more creative way to do it. If possible throw in a few ways to do it so I can learn. Thank you.

The easiest (possibly not the most creative) way to do this is to use printf:
printf "%s\n" Name{1..15}
This relies on Bash brace expansion: {1..15} expands to produce the 15 strings.

Use a for loop
for i in {1..15};do echo "Name$i";done
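Brace expansion happens before variables are expanded, so {1..$n} does not produce a range when the count is held in a variable. A small sketch of a C-style loop that handles that case; the function name print_names is only illustrative.

```bash
#!/usr/bin/env bash
# Print Name1 .. NameN, where N is passed as the first argument.
print_names() {
    local count=$1 i
    for ((i = 1; i <= count; i++)); do
        printf 'Name%d\n' "$i"
    done
}

print_names 15
```

Calling print_names 3 would print Name1, Name2 and Name3 on separate lines.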

A few esoteric solutions, from the least to the most unreasonable :
base64 encoded string :
base64 -d <<<TmFtZTEKTmFtZTIKTmFtZTMKTmFtZTQKTmFtZTUKTmFtZTYKTmFtZTcKTmFtZTgKTmFtZTkKTmFtZTEwCk5hbWUxMQpOYW1lMTIKTmFtZTEzCk5hbWUxNApOYW1lMTUK
The odd-looking string is your expected result encoded in base64, an encoding generally used to represent binary data as text. base64 -d <<< encodedString passes that string as input to the base64 tool and asks it to decode it, which displays your expected result.
generate an infinite stream of "Name", truncate it, use line numbers :
yes Name | awk 'NR == 16 { exit } { printf("%s%s\n", $0, NR) }'
yes outputs an infinite stream of whatever argument it is passed (or y by default; it is typically used to automate interactive scripts that ask for [y/n] confirmation). The awk command exits once it reaches the 16th line, and otherwise prints its input (provided by yes) followed by the line number. The truncation could just as easily be done with head -15. I also tried numbering the lines with the nl "number lines" utility or grep -n, but both add the line number as a prefix, which requires an extra reformatting step.
read random binary data and hope to stumble on all the lines you want to output :
timeout 1d strings /dev/urandom | grep -Eo "Name(1[0-5]|[1-9])" | sort -uV
strings /dev/urandom will extract ASCII sequences from the binary random source /dev/urandom, grep will keep those that match the format of a line of your expected output, and sort will put those lines in the correct order. Since sort needs to have received its whole input before it can reorder it, and /dev/urandom never stops producing data, we use timeout 1d to stop reading from /dev/urandom after a whole day, in the hope that it has sifted through enough random data to find your 15 lines (I'm not sure that's even remotely likely).
use an HTTP client to retrieve this page, extract the bash script you posted and execute it.
my_old_script=$(curl "https://stackoverflow.com/questions/57818680/" | grep "#This is what I wrote to start" -A 18 | tail -n+4)
eval "$my_old_script"
curl is a command line tool that can be used as an HTTP client, grep with its -A 18 parameter will select the "This is what I wrote to start" text and the 18 lines that follow, tail will remove the first 3 lines, and eval will execute your script.
While it will be much more efficient than the previous solution, it's an even less reasonable solution because high-rep users can edit your question to make this solution execute arbitrary code on your computer. Ideally you'd be using an HTML-aware parser rather than basic string manipulation to extract the code, but we're not talking about best practices here...

Bash split stdin by null and pipe to pipeline

I have a stream that is null delimited, with an unknown number of sections. For each delimited section I want to pipe it into another pipeline until the last section has been read, and then terminate.
In practice, each section is very large (~1GB), so I would like to do this without reading each section into memory.
For example, imagine I have the stream created by:
for I in {3..5}; do seq $I; echo -ne '\0'; done
I'll get a stream that looks like this when piped through cat -v:
1
2
3
^@1
2
3
4
^@1
2
3
4
5
^@
I would like to pipe each section through paste -sd+ | bc, so I get a stream that looks like:
6
10
15
This is simply an example. In actuality the stream is much larger and the pipeline is more complicated, so solutions that don't rely on streams are not feasible.
I've tried something like:
set -eo pipefail
while head -zn1 | head -c-1 | ifne -n false | paste -sd+ | bc; do :; done
but I only get
6
10
If I leave off bc I get
1+2+3
1+2+3+4
1+2+3+4+5
which is basically correct. This leads me to believe that the issue is potentially related to buffering and the way each process is actually interacting with the pipes between them.
Is there some way to fix the way that these commands exchange streams so that I can get the desired output? Or, alternatively, is there a way to accomplish this with other means?
In principle this is related to this question, and I could certainly write a program that reads stdin into a buffer, looks for the null character, and pipes the output to a spawned subprocess, as the accepted answer does for that question. Given the general support of streams and null delimiters in bash, I'm hoping to do something that's a little more "native". In particular, if I want to go this route, I'll have to escape the pipeline (paste -sd+ | bc) in a string instead of just letting the same shell interpret it. There's nothing too inherently bad about this, but it's a little ugly and will require a bunch of somewhat error prone escaping.
Edit
As was pointed out in an answer, head makes no guarantees about how much it buffers. Unless it only buffers a single byte at a time, which would be impractical, this will never work. Thus, it seems like the only solution would be to read each section into memory, or to write a specific program.

The issue with your original code is that head doesn't guarantee that it won't read more than it outputs. Thus, it can consume more than one (NUL-delimited) chunk of input, even if it's emitting only one chunk of output.
read, by contrast, guarantees that it won't consume more than you ask it for.
set -o pipefail
while IFS= read -r -d '' line; do
bc <<<"${line//$'\n'/+}"
done < <(build_a_stream)
If you want native logic, there's nothing more native than just writing the whole thing in shell.
Calling external tools -- including bc, cut, paste, or others -- involves a fork() penalty. If you're only processing small amounts of data per invocation, the efficiency of the tools is overwhelmed by the cost of starting them.
while read -r -d '' -a numbers; do # read up to the next NUL into an array
    sum=0 # initialize an accumulator
    for number in "${numbers[@]}"; do # iterate over that array
        (( sum += number )) # ...using an arithmetic context for our math
    done
    printf '%s\n' "$sum"
done < <(build_a_stream)
For all of the above, I tested with the following build_a_stream implementation:
build_a_stream() {
local i j IFS=$'\n'
local -a numbers
for ((i=3; i<=5; i++)); do
numbers=( )
for ((j=0; j<=i; j++)); do
numbers+=( "$j" )
done
printf '%s\0' "${numbers[*]}"
done
}
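The stream in the question differs slightly from build_a_stream: seq ends every section with a newline, so replacing newlines with + would leave a trailing + in the expression. Here is a minimal sketch that strips that newline first and sums with shell arithmetic. The function name sum_sections is hypothetical, each section is still read into memory (as in the loops above), and the input is assumed to contain only integers.

```bash
#!/usr/bin/env bash
# Read NUL-delimited sections from stdin and print the sum of each one.
sum_sections() {
    local section
    while IFS= read -r -d '' section; do
        section=${section%$'\n'}          # drop one trailing newline, if present
        echo $(( ${section//$'\n'/+} ))   # "1\n2\n3" becomes 1+2+3
    done
}

# The stream from the question: seq output followed by a NUL byte.
for i in {3..5}; do seq "$i"; printf '\0'; done | sum_sections
```

With the question's stream this prints 6, 10 and 15, one per line.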

As discussed, the only real solution seemed to be writing a program to do this specifically. I wrote one in Rust called xstream-util. After installing it with cargo install xstream-util, you can pipe the input into
xstream -0 -- bash -c 'paste -sd+ | bc'
to get the desired output
6
10
15
It doesn't avoid having to run the program in bash, so it still needs escaping if the pipeline is complicated. Also, it currently only supports single byte delimiters.

Read range of numbers into a for loop

So, I am building a bash script which iterates through folders named by numbers from 1 to 9. The script depends on getting the folder names by user input. My intention is to use a for loop using read input to get a folder name or a range of folder names and then do some stuff.
Example:
Let's assume I want to make a backup with rsync -a of a certain range of folders. Usually I would do:
for p in {1..7}; do
rsync -a $p/* backup.$p
done
The above would recursively backup all content in the directories 1 2 3 4 5 6 and 7 and put them into folders named as 'backup.{index-number}'. It wouldn't catch folders/files with a leading . but that is not important right now.
Now I have a similar loop in an interactive bash script. I am using select and case statements for this task. One of the options in case is this loop and it shall somehow get a range of numbers from user input. This now becomes a problem.
Problem:
If I use read to get the range then it fails when using {1..7} as input. The input is taken literally and the output is just:
{1..7}
I really would like to know why this happens. Let me use a more descriptive example with a simple echo command.
var={1..7} # fails and just outputs {1..7}
for p in $var; do echo $p;done
read var # Same result as above. Just outputs {1..7}
for p in $var; do echo $p;done
for p in {1..7}; do echo $p;done # works fine and outputs the numbers 1-7 separated with a newline.
I've found a workaround by storing the numbers in an array. The user can then input folder names separated by a space character like this: 1 2 3 4 5 6 7
read -a var # In this case the output is similar to the 3rd loop above
for p in ${var[@]}; do echo $p; done
This could be a way to go but when backing up 40 folders ranging from 1-40 then adding all the numbers one-by-one completely makes my script redundant. One could find a solution to one of the millennium problems in the same time.
Is there any way to read a range of numbers like {1..9} or could there be another way to get input from terminal into the script so I can iterate through the range within a for-loop?
This sounds like a question for Google, but I am obviously using the wrong search terms to get a useful answer. Most similar-looking issues on SO refer to brace and parameter expansion issues, which is not exactly the problem I have. However, it feels to me like the answer to this problem lies in a similar direction. I fail to understand why using {1..7} directly in a for loop works but assigning it first, as in var={1..7}, doesn't. Please help.
EDIT: My bash version:
$ echo $BASH_VERSION
4.2.25(1)-release
EDIT2: The versatility of a brace expansion is very important to me. A possible solution should include the ability to define as many ranges as possible. Like I would like to be able to choose between backing up just 1 folder or a fixed range between f.ex 4-22 and even multiple options like folders 1,2,5,6-7

Brace expansion is not performed on the right-hand side of a variable assignment, nor on the result of a parameter expansion: it happens before parameter expansion, so by the time $var is expanded it is too late. Use a C-style for loop, with the user entering the upper end of the range if necessary.
read -r upper
for ((i=1; i<=$upper; i++)); do
To input both a lower and upper bound separated by whitespace:
read -r lower upper
for ((i=$lower; i <= $upper; i++)); do
For an arbitrary set of values, just push the burden to the user to generate the appropriate list; don't try to implement your own parser to process something like 1,2,20-22:
while read -r p; do
    rsync -a "$p"/* "backup.$p"
done
The input is one value per line, such as
1
2
20
21
22
If the user is working in a shell, they can call your script with something like
printf '%s\n' 1 2 {20..22} | backup.sh
It's easier for the user to generate the list than it is for you to safely parse a string describing the list.

The evil eval
$ var={1..7}
$ for i in $(eval echo $var); do echo $i; done
this also works,
$ var="1 2 {5..9}"
$ for i in $(eval echo $var); do echo $i; done
1
2
5
6
7
8
9
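The "evil eval" heading was a joke: eval is fine as long as you know what you're evaluating. Don't pass it untrusted input, since it will run any command embedded in the string.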
Or, with awk
$ echo "1 2 5-9 22-25" |
awk -v RS=' ' '/-/{split($0,a,"-"); while(a[1]<=a[2]) print a[1]++; next}1'
1
2
5
6
7
8
9
22
23
24
25
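The same kind of range list can also be expanded in plain bash without eval or awk. A minimal sketch, assuming the items are separated by spaces or commas and each item is either N or N-M with non-negative integers; the function name expand_ranges is hypothetical, and anything else is reported and skipped.

```bash
#!/usr/bin/env bash
# Expand a list such as "1,2,5,6-7" into one number per line.
expand_ranges() {
    local item lo hi i
    local -a items
    # Split the argument on commas and spaces.
    IFS=', ' read -r -a items <<< "$1"
    for item in "${items[@]}"; do
        if [[ $item =~ ^([0-9]+)-([0-9]+)$ ]]; then
            # Force base 10 so that values like 08 are not read as octal.
            lo=$((10#${BASH_REMATCH[1]}))
            hi=$((10#${BASH_REMATCH[2]}))
            for ((i = lo; i <= hi; i++)); do
                printf '%s\n' "$i"
            done
        elif [[ $item =~ ^[0-9]+$ ]]; then
            printf '%s\n' "$item"
        else
            printf 'ignoring invalid item: %s\n' "$item" >&2
        fi
    done
}

expand_ranges "1,2,5,6-7"
```

The output can feed a loop such as while read -r p; do ...; done < <(expand_ranges "$input"), which keeps user input away from eval.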

Output of command in Bash script to Drop-down box?

First off, I appreciate any and all help in answering this question.
I have a command in a bash script that will output the following:
255 254 253 252 ... 7 6 5 4 3 2 1
It is a specific list of numbers, beginning with the largest (which is what I would like), then going to the smallest. The dataset is space-delimited. The output above (except including all numbers), is what you would see if you ran this command in the terminal on a linux machine, or through a bash script.
I have configured my apache2 server to allow for cgi/bash through the cgi-bin directory. When I run this command in a bash file from the web, I get the expected output.
What I'm looking for is for a way to be able to put these numbers each as a separate entry in a drop-down box for selection, meaning the user can select one point of data (254, for example) from the drop down menu.
I'm not sure how to go about this, so any help would be appreciated. I'm not sure whether I need to convert the data into an array or do something else. The drop-down menu can be on the same page as the bash script, but wherever it is, it has to update its list of numbers from the command every time it is run.
Thank you for your help.

I've always found this site useful when fiddling with shell scripts: http://tldp.org/LDP/abs/html/
You'll have to get your output into an array, using string manipulation with the spaces as delimiters, then loop over that array to build some HTML output. The output of your CGI/bash script will then basically be the select box, rendered on the page where the script is executed.
-sean
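A rough sketch of that approach as a CGI script follows. The function list_numbers is a hypothetical stand-in for the real command (here it just prints 10 down to 1), the function and field names are made up for illustration, and the values are assumed to be plain numbers, so no HTML escaping is done.

```bash
#!/usr/bin/env bash
# Hypothetical stand-in for the real command, which prints space-separated numbers.
list_numbers() {
    echo {10..1}
}

# Print a <select> element named $1 with one <option> per remaining argument.
render_select() {
    local name=$1 value
    shift
    printf '<select name="%s">\n' "$name"
    for value in "$@"; do
        printf '  <option value="%s">%s</option>\n' "$value" "$value"
    done
    printf '</select>\n'
}

# Split the command's output on whitespace into an array.
read -r -a numbers < <(list_numbers)

# A CGI response is a header block, a blank line, then the body.
printf 'Content-Type: text/html\n\n'
printf '<html><body><form>\n'
render_select number "${numbers[@]}"
printf '</form></body></html>\n'
```

Because the command runs on every request, the list in the drop-down is rebuilt each time the page is loaded.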

Repeating the answer (since the original question was marked as duplicate):
You can write a bash for loop to do everything. This just prints out the elements of the array x:
for i in "${!x[@]}"; do
    echo "|${x[i]} |"
done
To get the alignment correct, you need to figure out the max length (one loop) and then print out the terms:
# w will be the length of the longest element
w=0
for i in "${!x[@]}"; do
    if [ "$w" -lt "${#x[i]}" ]; then w=${#x[i]}; fi
done
# each row is "|" + w characters + " |", i.e. w+3 characters wide
for j in $(seq 1 $((w+3))); do printf "%s" "-"; done
printf "\n"
for i in "${!x[@]}"; do
    printf "|%-${w}s |\n" "${x[i]}"
done
for j in $(seq 1 $((w+3))); do printf "%s" "-"; done
printf "\n"

Manipulating data text file with bash command?

I was given this text file, called stock.txt. The content of the text file is:
pepsi;drinks;3
fries;snacks;6
apple;fruits;9
baron;drinks;7
orange;fruits;2
chips;snacks;8
I need to use a bash script to come up with this output:
Total amount for drinks: 10
Total amount for snacks: 14
Total amount for fruits: 11
Total of everything: 35
My gut tells me I will need to use sed, group, grep and something else.
Where should I start?

I would break the exercise down into steps
Step 1: Read the file one line at a time
while read -r line
do
# do something with $line
done
Step 2: Pattern match (drinks, snacks, fruits) and do some simple arithmetic. This step requires that you tokenize each line, which I'll leave as an exercise for you to figure out.
if [[ "$line" =~ "drinks" ]]
then
echo "matched drinks"
.
.
.
fi

Pure Bash. A nice application for an associative array:
declare -A category # associative array
while IFS=';' read -r name cate price ; do
    ((category[$cate]+=price))
done < stock.txt
sum=0
for cate in "${!category[@]}"; do # loop over the indices
    printf "Total amount for %s: %d\n" "$cate" "${category[$cate]}"
    ((sum+=${category[$cate]}))
done
printf "Total of everything: %d\n" "$sum"
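Associative arrays do not remember insertion order, so the categories may come out in a different order from the expected output in the question. One way around that, sketched here under the assumption of Bash 4 or later and well-formed name;category;amount lines, is to record each category in an ordinary indexed array the first time it is seen. The names total_by_category, totals and order are illustrative only.

```bash
#!/usr/bin/env bash
# Sum the third field per category, printing categories in first-seen order.
total_by_category() {
    local file=$1 name cate price sum=0
    local -A totals=()   # category -> running total
    local -a order=()    # categories in the order they first appear
    while IFS=';' read -r name cate price; do
        [[ -n $cate ]] || continue                          # skip blank lines
        [[ -n ${totals[$cate]+set} ]] || order+=("$cate")   # first occurrence
        (( totals[$cate] += price, sum += price ))
    done < "$file"
    for cate in "${order[@]}"; do
        printf 'Total amount for %s: %d\n' "$cate" "${totals[$cate]}"
    done
    printf 'Total of everything: %d\n' "$sum"
}

# Run only when the sample file from the question is present.
if [[ -r stock.txt ]]; then
    total_by_category stock.txt
fi
```

With the stock.txt from the question, the categories appear as drinks, snacks, fruits, followed by the overall total.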

There is a short description of processing comma-separated files in bash here:
http://www.cyberciti.biz/faq/unix-linux-bash-read-comma-separated-cvsfile/
You could do something similar. Just change IFS from comma to semicolon.
Oh yeah, and a general hint for learning bash: man is your friend. Use this command to see manual pages for all (or most) of commands and utilities.
Example: man read shows the manual page for read command. On most systems it will be opened in less, so you should exit the manual by pressing q (may be funny, but it took me a while to figure that out)

The easy way to do this is using a hash table, which is supported directly by bash 4.x and of course can be found in awk and perl. If you don't have a hash table then you need to loop twice: once to collect the unique values of the second column, once to total.
There are many ways to do this. Here's a fun one which doesn't use awk, sed or perl. The only external utilities I've used here are cut, sort and uniq. You could even replace cut with a little more effort. In fact lines 5-9 could have been written more easily with grep, (grep $kind stock.txt) but I avoided that to show off the power of bash.
for kind in $(cut -d\; -f 2 stock.txt | sort | uniq) ; do
total=0
while read d ; do
total=$(( total+d ))
done < <(
while read line ; do
[[ $line =~ $kind ]] && echo $line
done < stock.txt | cut -d\; -f3
)
echo "Total amount for $kind: $total"
done
We lose the strict ordering of your original output here. An exercise for you might be to find a way not to do that.
Discussion:
The first line contains a command substitution with a simple pipeline using cut. We read the second field from the stock.txt file, with fields delimited by ;, written \; here so the shell does not interpret it. The result is a newline-separated list of values from stock.txt. This is piped to sort, then uniq. This performs our "grouping" step, since the pipeline will output an alphabetic list of items from the second column but will only list each item once no matter how many times it appeared in the input file.
Also on the first line is a typical for loop: for each item resulting from the command substitution we loop once, storing the value of the item in the variable kind. This is the other half of the grouping step, making sure that each "Total" output line occurs once.
On the second line total is initialized to zero so that it always resets whenever a new group is started.
The third line begins the 'totaling' loop, in which for the current kind we find the sum of its occurrences. Here we declare that we will read the variable d in from stdin on each iteration of the loop.
On the fourth line the totaling actually occurs: using shell arithmetic we add the value in d to the value in total.
Line five ends the while loop and then describes its input. We use shell input redirection via < to specify that the input to the loop, and thus to the read command, comes from a file. We then use process substitution to specify that the file will actually be the results of a command.
On the sixth line the command that will feed the while-read loop begins. It is itself another while-read loop, this time reading into the variable line. On the seventh line the test is performed via a conditional construct. Here we use [[ for its =~ operator, which is a pattern matching operator. We are testing to see whether $line matches our current $kind.
On the eighth line we end the inner while-read loop and specify that its input comes from the stock.txt file, then we pipe the output of the entire loop, which by now is simply all lines matching $kind, to cut and instruct it to show only the third field, which is the numeric field. On line nine we then end the process substitution command, the output of which is a newline-delineated list of numbers from lines which were of the group specified by kind.
Given that the total is now known and the kind is known it is a simple matter to print the results to the screen.

The below answer is OP's. As it was edited in the question itself and OP hasn't come back for 6 years, I am editing out the answer from the question and posting it as wiki here.
My answer, to get the total price, I use this:
...
PRICE=0
IFS=";" # use ';' as the field separator
while read name cate price
do
let PRICE=PRICE+$price
done < stock.txt
echo $PRICE
When I echo it, the result is 35, which is correct. Now I will move on to using awk to get the sub-category results.
Whole Solution:
Thanks guys, I managed to do it myself. Here is my code:
#!/bin/bash
INPUT=stock.txt
PRICE=0
DRINKS=0
SNACKS=0
FRUITS=0
old_IFS=$IFS # save the field separator
IFS=";" # use ';' as the field separator
while read name cate price
do
if [ $cate = "drinks" ]; then
let DRINKS=DRINKS+$price
fi
if [ $cate = "snacks" ]; then
let SNACKS=SNACKS+$price
fi
if [ $cate = "fruits" ]; then
let FRUITS=FRUITS+$price
fi
# Total
let PRICE=PRICE+$price
done < $INPUT
echo -e "Drinks: " $DRINKS
echo -e "Snacks: " $SNACKS
echo -e "Fruits: " $FRUITS
echo -e "Price " $PRICE
IFS=$old_IFS
