sort fields within a line - sorting

input:
87 6,1,9,13
3 9,4,14,35,38,13
31 3,1,6,5
(i.e. a tab-delimited column where the second field is a comma-delimited list of unordered integers.)
desired output:
87 1,6,9,13
3 4,9,13,14,35,38
31 1,3,5,6
Goal:
For each line separately, sort the comma-separated list in the second field, i.e. sort the 2nd column within each line.
Note: the rows should not be re-ordered.
What I've tried:
sort - Since the order of the rows should not change, then sort is simply not applicable.
awk - since the greater file is tab-delimited, not comma-delimited, it cannot parse the second column as multiple "sub-fields"
There might be a perl way? I know nothing about perl though...

It can be done with a simple Perl one-liner:
perl -F'/\t/' -alne'$s=join",",sort{$a<=>$b}split",",$F[1];print"$F[0]\t$s"'
and with a shell (bash) one as well:
while read a b;do echo -e "$a\t$(echo $b|tr , '\n'|sort -n|tr '\n' ,|sed 's/,$//')"; done

while read LINE; do
echo -e "$(echo $LINE | awk '{print $1}')\t$(echo $LINE | awk '{print $2}' | tr ',' '\n' | sort -n | paste -s -d,)";
done < input
Obviously a lot going on here so here we go:
input contains your input
$(echo $LINE | awk '{print $1}') prints the first field, pretty straightforward
$(echo $LINE | awk '{print $2}' | tr ',' '\n' | sort -n | paste -s -d,) prints the second field, but breaks it down into lines by replacing the commas by newlines (tr ',' '\n'), then sort numerically, then assemble the lines back to comma-delimited values (paste -s -d,).
$ cat input
87 6,1,9,13
3 9,4,14,35,38,13
31 3,1,6,5
$ while read LINE; do echo -e "$(echo $LINE | awk '{print $1}')\t$(echo $LINE | awk '{print $2}' | tr ',' '\n' | sort -n | paste -s -d,)"; done < input
87 1,6,9,13
3 4,9,13,14,35,38
31 1,3,5,6
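For comparison, awk can handle the "sub-fields" too, by calling split() on the second field. The following is only a sketch under a few assumptions: the input is tab-delimited with exactly two columns, the list holds plain numbers, and the function name sort_second_field is made up for illustration. It uses a small insertion sort so that it does not depend on GNU awk's asort().

```shell
#!/usr/bin/env bash
# Hypothetical helper: numerically sort the comma-separated list in column 2.
sort_second_field() {
  awk -F'\t' -v OFS='\t' '{
    n = split($2, v, ",")              # break the list into v[1..n]
    for (i = 2; i <= n; i++) {         # insertion sort, comparing as numbers
      x = v[i]
      for (j = i - 1; j >= 1 && v[j] + 0 > x + 0; j--) v[j + 1] = v[j]
      v[j + 1] = x
    }
    s = v[1]
    for (i = 2; i <= n; i++) s = s "," v[i]
    print $1, s                        # rows are printed in their original order
  }' "$@"
}

printf '87\t6,1,9,13\n3\t9,4,14,35,38,13\n31\t3,1,6,5\n' | sort_second_field
```

Because everything happens inside one awk process, this avoids starting several commands per input line, which may matter on large files.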

Another way:
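echo happybirthday|awk '{split($0,A);asort(A); for (i=1;i<=length(A);i++) {print A[i]}}' FS=""|tr -d '\n';echo
aabdhhipprtyy
(asort() is a GNU awk extension. The loop must run up to and including length(A), otherwise the last character is dropped.)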

Related

How to grab fields in inverted commas

I have a text file which contains the following lines:
"user","password_last_changed","expires_in"
"jeffrey","2021-09-21 12:54:26","90 days"
"root","2021-09-21 11:06:57","0 days"
How can I grab the two fields jeffrey and 90 days from inside the inverted commas and save them in variables?
If awk is an option, you could save an array and then save the elements as individual variables.
$ IFS="\"" read -ra var <<< $(awk -F, '/jeffrey/{ print $1, $NF }' input_file)
$ var2="${var[3]}"
$ echo "$var2"
90 days
$ var1="${var[1]}"
$ echo "$var1"
jeffrey
while IFS= read -r line; do # read the file line by line
name=$(echo "$line" | awk -F, '{ print $1 }' | sed 's/"//g') # grab first column and strip "
expire=$(echo "$line" | awk -F, '{ print $3 }' | sed 's/"//g') # grab third column and strip "
echo "$name" "$expire" # do your business
done < yourfile.txt
IFS=","
arr=( $(cat txt | head -2 | tail -1 | cut -d, -f 1,3 | tr -d '"') )
echo "${arr[0]}"
echo "${arr[1]}"
The result is stored in an array, so you can access the elements by index.
Maybe the method below, using the sed and awk commands, will help you:
#!/bin/sh
username=$(sed -n '/jeffrey/p' demo.txt | awk -F',' '{print $1}' | tr -d '"') # tr strips the surrounding quotes
echo "$username"
expires_in=$(sed -n '/jeffrey/p' demo.txt | awk -F',' '{print $3}' | tr -d '"')
echo "$expires_in"
Output :
jeffrey
90 days
Note:
This method works only if the username appears once in the file.
As far as I know, usernames are not duplicated.
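The same lookup can also be sketched without awk or sed, letting read split each line on commas and using parameter expansion to strip the quotes. This assumes that no field contains an embedded comma or an escaped quote. The function name get_user_expiry and the variable names are invented for this example.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the user name and expiry for one user, one per line.
# The CSV is read from stdin; $1 is the user to look for.
get_user_expiry() {
  local wanted=$1 user changed expires
  while IFS=, read -r user changed expires; do
    user=${user//\"/}          # remove every double quote
    expires=${expires//\"/}
    if [ "$user" = "$wanted" ]; then
      printf '%s\n%s\n' "$user" "$expires"
      return 0
    fi
  done
  return 1                     # user not found
}

csv='"user","password_last_changed","expires_in"
"jeffrey","2021-09-21 12:54:26","90 days"
"root","2021-09-21 11:06:57","0 days"'

{ read -r name; read -r expiry; } < <(get_user_expiry jeffrey <<< "$csv")
echo "$name: $expiry"          # prints "jeffrey: 90 days"
```

Matching on the exact first field, rather than grepping for jeffrey anywhere in the line, avoids false hits when the name happens to appear inside another field.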

How to get nth line of a file in bash?

I want to extract from a file named datax.txt the second line being :
0/0/0/0/0/0 | 0/0/0/0/0/0 | 0/0/0/0/0/0
And then I want to store in 3 variables the 3 sequences 0/0/0/0/0/0.
How should I do this?
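Read the 2nd line into the variables a, b and c. Printing the fields separated by commas makes awk put a space between them, so read can split them even if the input has no padding around the | characters:
read a b c <<< "$(awk -F'|' 'NR==2{print $1, $2, $3}' datax)"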
The key is to split the problem in two:
you want to get the nth line of a file -> see the Stack Overflow question linked in the script below
you want to split a line into chunks according to a delimiter -> that's the job of many tools; cut is one of them
For future questions, be sure to include a more complete dataset, here is one for now. I changed a bit the second line so that we can verify that we got the right column:
f.txt
4/4/4/4/4/4 | 4/4/4/4/4/4 | 4/4/4/4/4/4
0/0/0/0/a/0 | 0/0/0/0/b/0 | 0/0/0/0/c/0
8/8/8/8/8/8 | 8/8/8/8/8/8 | 8/8/8/8/8/8
8/8/8/8/8/8 | 8/8/8/8/8/8 | 8/8/8/8/8/8
Then a proper script building on the two key actions described above:
extract.bash
file=$1
target_line=2
# get the n-th line
# https://stackoverflow.com/questions/6022384/bash-tool-to-get-nth-line-from-a-file
line=$(cat $file | head -n $target_line | tail -1)
# get the n-th field on a line, using delimiter '|'
var1=$(echo $line | cut --delimiter='|' --fields=1)
echo $var1
var2=$(echo $line | cut --delimiter='|' --fields=2)
echo $var2
var3=$(echo $line | cut --delimiter='|' --fields=3)
echo $var3
aaand:
$ ./extract.bash f.txt
0/0/0/0/a/0
0/0/0/0/b/0
0/0/0/0/c/0
Please try the following:
IFS='|' read a b c < <(sed -n 2P < datax | tr -d ' ')
Then the variables a, b and c are assigned to each field of the 2nd line.
You can use sed to print a specific line of a file, so for your example on the second line:
sed -n -e 2p ./datax
Set the output of the sed to be a variable:
Var=$(sed -n -e 2p ./datax)
Then split the string into the 3 variables you need:
A="$(echo $Var | cut -d'|' -f1)"
B="$(echo $Var | cut -d'|' -f2)"
C="$(echo $Var | cut -d'|' -f3)"
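Putting the two steps together, here is a minimal sketch that fetches line N with sed and splits it with a single read. It assumes the fields never contain spaces of their own, since tr deletes all of them. The helper name nth_line is hypothetical, and the sample data is written to a temporary file only so that the example runs on its own.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print line $2 of file $1 and stop reading right after it.
nth_line() {
  sed -n "${2}{p;q;}" "$1"
}

tmp=$(mktemp)
printf '%s\n' \
  '4/4/4/4/4/4 | 4/4/4/4/4/4 | 4/4/4/4/4/4' \
  '0/0/0/0/a/0 | 0/0/0/0/b/0 | 0/0/0/0/c/0' \
  '8/8/8/8/8/8 | 8/8/8/8/8/8 | 8/8/8/8/8/8' > "$tmp"

# Delete the padding spaces, then split on "|" (IFS is changed for this read only).
IFS='|' read -r a b c <<< "$(nth_line "$tmp" 2 | tr -d ' ')"
rm -f "$tmp"

printf '%s\n' "$a" "$b" "$c"
```

The q command makes sed quit once the wanted line has been printed, so the rest of a long file is not scanned.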

what does this bash script line of code mean

I am new to shell scripting and I found following line of code in a given script.
Could someone explain to me, with an example, what the following line of code means?
Path=`echo $line | awk -F '|' '{print $1}'`
echo $line will print the value of the variable $line, the | symbol means that the output of this will be passed (or piped) to another program/command/script. I will not attempt to explain awk here, but what is done above is that the output from the echo $line is taken and processed with it.
The option -F, as per the awk man page, means
-F fs Use fs for the input field separator
so the string after it is used to split the input given to awk into separate fields. For example, if your variable $line has the value a|b, it is split into the two fields a and b. What is done with them is specified within the '{}' expression.
Again, what can be done in there is next to infinite, here the only thing that is done is to print the first field which can be accessed with $1, or a in the above example ($2 would be b as can be guessed).
Finally, the output of this whole operation is then stored in the variable Path.
to summarize:
line="a|b"
echo $line | awk -F '|' '{print $1}'
> a
Path=`echo $line | awk -F '|' '{print $1}'`
echo $Path
> a
echo $line | awk -F '|' '{print $1}'
Explanation:
echo -> display a line of text
$line -> parameter expansion; it is replaced by the value of the variable line
| -> A pipeline is a sequence of one or more commands separated by one of the control operators |
awk -> Invoke awk program
-F '|' -> Field separator as | for the data feed
'{print $1}' -> Print the first field
Example
echo 'a|b|c' | awk -F '|' '{print $1}'
will print a
I think this is just a complicated way to express
echo ${line%%|*}
i.e. write to stdout the part of the content of the variable line which goes up to - but not including - the first vertical bar.
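As a small illustration of that expansion and its relatives, the sketch below splits a value on the vertical bar using parameter expansion only. The sample value and the variable names first, rest and last are chosen just for this example.

```shell
#!/usr/bin/env bash
line='a|b|c'

first=${line%%|*}   # remove the longest suffix matching "|*"  -> a
rest=${line#*|}     # remove the shortest prefix matching "*|" -> b|c
last=${line##*|}    # remove the longest prefix matching "*|"  -> c

printf '%s\n' "$first" "$rest" "$last"
```

No external process is started here, which makes this form convenient inside loops. When the value contains no vertical bar at all, each of these expansions simply returns the whole string.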
Path=`echo $line | awk -F '|' '{print $1}'`
 ^     ^                ^        ^
 |     |                |        |
 |     |                |        print 1st column
 |     |                |
 |     |                input field separator
 |     |
 |     echo variable line
 |
 variable Path
-F'|' - by default awk splits each record (line) into columns on runs of whitespace (spaces and tabs); with -F'|', awk splits on the pipe character instead
Above one can be written as
Path=$( awk -F '|' '{ print $1 }' <<< "$line" )
Suppose:
$ line="1|2|3"
$ Path=$( awk -F '|' '{ print $1 }' <<< "$line" )
$ echo $Path; # you get first column
1
Same as
$ Path=$( cut -d'|' -f1 <<< "$line" )
$ echo $Path;
1
The default field separator is whitespace; -F '|' changes the separator to '|'.

Bash math - Dividing a bunch of rows using for statement

Example file:
25 Firstname1 Lastname1 domain1.com planname #1.00 USD Monthly Active 04/24/2016 Edit
1068 Firstname2 Lastname2 domain2.com planname #7.95 USD Annually Active 05/09/2016 Edit
3888 Firstname3 Lastname3 domain3.com planname #19.95 USD Biennially Active 05/04/2016 Edit
I am extracting just the price and billing cycle, and converting each billing cycle into a number of months, so that I can divide the price by the billing cycle to get a cost per month.
When I use the for statement, it adds line breaks, which breaks the math.
Code:
for i in `cat asd | cut -d "#" -f 2 | awk '{print $1, $3}' | sed 's/Monthly/\/ 1/g' | sed 's/Annually/\/ 12/g' | sed 's/Biennially/\/ 24/g' |grep -Ev 0.00` ; do echo $i | bc -l ; done
I would prefer to be able to get 1 answer meaning all the rows get divided up then added together to get one final answer.
All those calls to cat, cut, awk, sed, grep and bc - what a waste.
This is a mis-named post, because you are not using Bash to do any calculations. The reason is that bash, unlike korn shell (ksh), does not support floating point. So you fall back to utilities like bc. Hold on though, awk supports floating point as well.
awk is a programming language in its own right. This just uses one instance of awk. I have embedded it inside a bash script because you are probably doing other stuff, but with a little adjustment it could be stand-alone with #!/bin/awk -f at the top (adjust the path to wherever awk is installed on your system):
infile='asd'
# -f - means "read the program from stdin"
# << '_END_' is a here document. Redirect stdin from here to the label _END_
awk -f - "$infile" << '_END_'
BEGIN {
    # an associative array for the billing cycles
    cycles["Monthly"] = 1
    cycles["Annually"] = 12
    cycles["Biennially"] = 24
}
{
    sub(/#/,"",$6) # Remove the # from the amount
    total += $6/cycles[$8] # divide amount by the billing cycle, add to total
}
END { print total }
_END_
Don't you think this is simpler to understand and maintain? It's also more efficient. This awk script is probably a good exercise for an awk 101 training course.
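A slightly more defensive variant of the same idea could look like the sketch below. It assumes the layout of the example file (amount in field 6, billing cycle in field 8), skips rows whose cycle is not in the table so that nothing is divided by zero, and skips zero amounts in the spirit of the original grep -Ev 0.00 filter. The function name monthly_total is made up for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: sum the per-month cost of every row read from stdin or files.
monthly_total() {
  awk '
    BEGIN { cycles["Monthly"] = 1; cycles["Annually"] = 12; cycles["Biennially"] = 24 }
    {
      amount = $6
      sub(/^#/, "", amount)                  # drop the leading # from the price
      if (($8 in cycles) && amount + 0 != 0) # ignore unknown cycles and zero prices
        total += amount / cycles[$8]
    }
    END { printf "%.2f\n", total }
  ' "$@"
}

monthly_total <<'EOF'
25 Firstname1 Lastname1 domain1.com planname #1.00 USD Monthly Active 04/24/2016 Edit
1068 Firstname2 Lastname2 domain2.com planname #7.95 USD Annually Active 05/09/2016 Edit
3888 Firstname3 Lastname3 domain3.com planname #19.95 USD Biennially Active 05/04/2016 Edit
EOF
# prints 2.49 (1.00/1 + 7.95/12 + 19.95/24 = 2.49375)
```

Rounding happens only once, in the END block, so the individual divisions keep their full precision until the final total is printed.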
You could do something like this: (If you are totally set on a single line)
cat asd | cut -d "#" -f 2 | awk '{print $1, $3}' | sed 's/Monthly/\/ 1/g' | sed 's/Annually/\/ 12/g' | sed 's/Biennially/\/ 24/g' | grep -Ev 0.00 | while IFS= read -r line; do echo "$line" | bc -l; done | tr '\n' '+' | sed 's/+$/\n/' | bc -l
But this would be way more clear:
tmp=$(mktemp)
cat asd | cut -d "#" -f 2 | awk '{print $1, $3}' | sed 's/Monthly/\/ 1/g' | sed 's/Annually/\/ 12/g' | sed 's/Biennially/\/ 24/g' | grep -Ev 0.00 > $tmp
tmp2=$(mktemp)
cat $tmp | while IFS= read -r line; do
echo "$line" | bc -l >> $tmp2
done
# Actual output
cat $tmp2 | tr '\n' '+' | sed 's/+$/\n/' | bc -l
rm $tmp $tmp2

How to get first and last element from a csv column

I have a csv file with the following format:
Time, Field1, Field2,
1000, 1, 2,
1001, 3, 4,
1002, 5, 6,
I want to get the first and last element from the time column and store them in variables in my bash script.
So, based in this example I need:
$start=1000
$end=1002
How can I do this?
You have a lot of alternatives. Here are some of them:
Using head, tail and cut
start=$(head -n2 file.csv | tail -n1 | cut -d',' -f1)
end=$(tail -n1 file.csv | cut -d',' -f1)
Using awk
start=$(awk -F',' 'NR==2{print $1}' file.csv)
end=$(awk -F',' 'END{print $1}' file.csv)
One-Liner using awk (thanks to this answer)
read start finish <<< "$(awk -F',' 'NR==2{first=$1} END{print first, $1}' file.csv)"
Another One-Liner using awk
read -d '' start finish < <(awk -F',' 'NR==2{print $1}END{print $1}' file.csv)
You can use a while loop like this:
while IFS=',' read -r c _; do
((end=c))
((start==0 && c>0)) && start=$c
done < file.csv
Check variables:
declare -p start end
declare -- start="1000"
declare -- end="1002"
Also, try this:
start_end(){
start=$(cat file.csv | head -n +2 | tail -n 1 | awk -F ',' '{print $1}')
end=$(cat file.csv | tail -n 1 | awk -F ',' '{print $1}')
}
You can use sed -n 's/,.*//;2p;$p' file.csv to extract the first and last fields from the first column. From its output, you can separate each of them and read it into a variable like so:
{
read start
read end
} < <(sed -n 's/,.*//;2p;$p' file.csv)
The first read reads the first line into the variable $start, while the second read reads the second line of the output into the variable $end.
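If the file might end with blank lines, taking "the last line" can return an empty value. One possible way around that, sketched here on the assumption that the first line is always a header, is to let awk remember the first column of every non-empty data row and print both values on a single line. The function name first_last is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the first column of the first and last data rows.
first_last() {
  awk -F',' '
    NR == 2     { first = $1 }   # first row after the header
    NR > 1 && NF { last = $1 }   # any later non-empty row overwrites "last"
    END         { print first, last }
  ' "$@"
}

read -r start end <<< "$(printf 'Time, Field1, Field2,\n1000, 1, 2,\n1001, 3, 4,\n1002, 5, 6,\n\n' | first_last)"
echo "start=$start end=$end"     # prints "start=1000 end=1002"
```

Printing both values on one line means a single read is enough to fill both variables.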
