How can I sort bash arguments alphabetically?
$ ./script.sh bbb aaa ddd ccc
and put them into an array such that I end up with {aaa, bbb, ccc, ddd}?
You can do:
A=( $(sort <(printf "%s\n" "$@")) )
printf "%s\n" "${A[@]}"
aaa
bbb
ccc
ddd
It uses these steps:
sort the argument list, i.e. "$@"
store the output of sort in an array
print the sorted array
I hope the following two lines will help.
sorted=$(printf '%s\n' "$@" | sort)
echo $sorted
This will give you the sorted command-line arguments. I wonder, though, why it's needed :)
But anyway, it will sort your command line.
Removed whatever was not required.
Here's an invocation that breaks all the other solutions proposed here:
./script.sh "foo bar" "*" "" $'baz\ncow'
Here's a piece of code that works correctly:
array=()
(( $# )) && while IFS= read -r -d '' var
do
array+=("$var")
done < <(printf "%s\0" "$@" | sort -z)
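For what it's worth, the same NUL-delimited idea can be written more compactly with readarray; a minimal sketch, assuming bash >= 4.4 (for readarray -d '') and GNU sort (for -z):
# Sketch: requires bash >= 4.4 and GNU sort; the guard avoids a
# single empty element when no arguments were given
(( $# )) && readarray -d '' -t array < <(printf '%s\0' "$@" | sort -z)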
As some do not seem to appreciate my effort to reduce forks, here is a better solution than using IFS for parsing and setting a variable.
Part 1: Very short and robust solution:
As suggested by @rici in a comment on another post, I add the -t argument to mapfile:
mapfile -t args < <(sort < <(printf "%s\n" "$@"))
This works with whitespace too.
Sample:
#!/bin/bash
mapfile args < <(sort < <(printf "%s\n" "$@"))
# or, with -t, so there are no trailing newlines to strip:
# mapfile -t args < <(sort < <(printf "%s\n" "$@"))
declare -p args
for (( i=0 ; i<${#args[@]} ; i++ ));do
printf "%3d: %s\n" $i "${args[i]%$'\n'}"
# with -t this is simply: printf "%3d: %s\n" $i "${args[i]}"
done
run sample:
/tmp/script ccc "a a" aaa ddd aa AA z aab
declare -a args='([0]="aa
" [1]="a a
" [2]="AA
" [3]="aaa
" [4]="aab
" [5]="ccc
" [6]="ddd
" [7]="z
")'
0: aa
1: a a
2: AA
3: aaa
4: aab
5: ccc
6: ddd
7: z
Part 2: Very quick: pure bash way (without forks!)
Note: of course, this is not the most robust way of sorting, but in many cases it can be used efficiently.
As at least one person seems to prefer a to be sorted before aa, this has been edited to replace the z fill character with 0.
This sample is limited to the first 6 characters, but you could replace 6 with a bigger number, as long as you extend the 000000 padding string to the same length.
#!/bin/bash
sep='§'
for i;do
a=${i//[^a-zA-Z0-9]/0}000000
args[36#${a:0:6}]+=${args[36#${a:0:6}]+$sep}${i}
done
IFS=$sep args=(${args[*]})
printf "%s\n" ${args[#]}
declare -p args
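Why this sorts: the zero-padded base-36 value of a key preserves lexicographic order, so the numeric array indices line up with alphabetic order. A tiny check (a sketch, values computed by bash arithmetic):
echo $(( 36#aa0000 < 36#aaa000 ))   # 1 (true): "aa" sorts before "aaa"
echo $(( 36#aaa000 < 36#aab000 ))   # 1 (true): "aaa" sorts before "aab"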
For case sensitivity, you could replace 36# by 64#:
Working sample:
#!/bin/bash
sep=§
base=64
chars=8
fillc=0
usage() {
cat <<eousage
Usage: $0 [-ai] [-p precision] [-s inner separator]
-a for sorting \`\`empty'' After (\`\`aa'' after \`\`aaa'')
-i for case Insensitive
-p NUM tell the number of characters to compare (default: $chars)
-s SEP let you precise inner separator, (default \`\`$sep'')
eousage
}
while getopts "iap:s:" opt;do case $opt in
a ) fillc=z ;;
i ) base=36 ;;
p ) chars=$OPTARG ;;
s ) sep=$OPTARG ;;
* ) usage ; exit 1 ;;
esac ; done ;
shift $((OPTIND-1))
printf -v cfill "%${chars}s"
cfill=${cfill// /$fillc}
for i;do
a=${i//[^a-zA-Z0-9]/$fillc}$cfill
idx=$(( $base#${a:0:$chars} ))
args[$idx]+=${args[$idx]+$sep}${i}
done
declare -p args
IFS=$sep args=(${args[*]})
declare -p args
for (( i=0 ; i++<${#args[@]} ; ));do
printf "%3d: %s\n" $i "${args[i-1]}"
done
Run cases:
/tmp/script ccc aaa ddd aa AA z aab
declare -a args='([44667659878400]="aa" [44678397296640]="aaa"
[44679471038464]="aab" [53614076755968]="ccc" [58081916485632]="ddd"
[153931627888640]="z" [160803575562240]="AA")'
declare -a args='([0]="aa" [1]="aaa" [2]="aab" [3]="ccc" [4]="ddd"
[5]="z" [6]="AA")'
1: aa
2: aaa
3: aab
4: ccc
5: ddd
6: z
7: AA
Case insensitive:
/tmp/script -i ccc aaa ddd aa AA z aab
declare -a args='([805409464320]="aa§AA" [806014126080]="aaa"
[806074592256]="aab" [967216951296]="ccc" [1047818363904]="ddd"
[2742745743360]="z")'
declare -a args='([0]="aa" [1]="AA" [2]="aaa" [3]="aab" [4]="ccc"
[5]="ddd" [6]="z")'
1: aa
2: AA
3: aaa
4: aab
5: ccc
6: ddd
Empty sorted after:
/tmp/script -ia ccc aaa ddd aa AA z aab
declare -a args='([806074592255]="aaa" [806135058431]="aab"
[807586246655]="aa§AA" [967277417471]="ccc" [1047878830079]="ddd"
[2821109907455]="z")'
declare -a args='([0]="aaa" [1]="aab" [2]="aa" [3]="AA" [4]="ccc"
[5]="ddd" [6]="z")'
1: aaa
2: aab
3: aa
4: AA
5: ccc
6: ddd
7: z
precision: 1 char:
/tmp/script -iap1 ccc aaa ddd aa AA z aab
declare -a args='([10]="aaa§aa§AA§aab" [12]="ccc" [13]="ddd" [35]="z")'
declare -a args='([0]="aaa" [1]="aa" [2]="AA" [3]="aab" [4]="ccc"
[5]="ddd" [6]="z")'
1: aaa
2: aa
3: AA
4: aab
5: ccc
6: ddd
7: z
and precision: 10 chars:
/tmp/script -p 10 ccc aaa ddd aa AA z aab
declare -a args='([182958734861926400]="aa" [183002715327037440]="aaa"
[183007113373548544]="aab" [219603258392444928]="ccc"
[237903529925148672]="ddd" [630503947831869440]="z"
[658651445502935040]="AA")'
declare -a args='([0]="aa" [1]="aaa" [2]="aab" [3]="ccc" [4]="ddd"
[5]="z" [6]="AA")'
1: aa
2: aaa
3: aab
4: ccc
5: ddd
6: z
7: AA
Whitespaces and other chars:
/tmp/script -is @ ccc "a a" aaa ddd 'a*a' 'a§a' aa AA z aab
declare -a args='([784246302720]="a a@a*a@a§a" [805409464320]="aa@AA"
[806014126080]="aaa" [806074592256]="aab" [967216951296]="ccc"
[1047818363904]="ddd" [2742745743360]="z")'
declare -a args='([0]="a a" [1]="a*a" [2]="a§a" [3]="aa" [4]="AA"
[5]="aaa" [6]="aab" [7]="ccc" [8]="ddd" [9]="z")'
1: a a
2: a*a
3: a§a
4: aa
5: AA
6: aaa
7: aab
8: ccc
9: ddd
10: z
Related
I am trying to use bash to merge/combine all text files in a directory with the same prefix into one text file. Thank you :).
directory
111.txt
aaa
aaa
222_1.txt
bbb
222_2.txt
ccc
ccc
333_1.txt
aaa
333_2.txt
ccc
ccc
333_3.txt
bbb
desired
111.txt
aaa
aaa
222.txt
bbb
ccc
ccc
333.txt
aaa
ccc
ccc
bbb
bash
for file in $(ls | cut -d"_" -f1) ; do
cat ${file}_* > ${file}
done
This is a good use of an associative array as a set. Iterate over the file names, trimming the trailing _* from each name before adding it to the associative array. Then you can iterate over the array's keys, treating each one as a filename prefix.
# IMPORTANT: Assumes there are no suffix-less file names that contain a _
declare -A prefixes
for f in *; do
prefixes[${f%_*}]=
done
for f in "${!prefixes[#]}"; do
[ -f "$f".txt ] && continue # 111.txt doesn't need anything done
cat "$f"_* > "$f".txt
done
Build a test environment just as you did:
mkdir -p tmp/test
cd !$
touch {111,222,333}.{txt,_2.txt,_3.txt}
cat > 111.txt
aaa
aaa
and so on
then you know how to iterate over the filenames:
for i in $( seq 1 3 ) ; do echo $i* ; done
111._2.txt 111._3.txt 111.txt
222._2.txt 222._3.txt 222.txt
333._2.txt 333._3.txt 333.txt
so you build your resulting files, and here is the mechanism that answers your need:
for i in $( seq 1 9 ) ; do cat $i* >> new.$i.txt ; done
and finally
ls -l new.[1-3]*
-rw-r--r-- 1 francois francois 34 Aug 4 14:04 new.1.txt
-rw-r--r-- 1 francois francois 34 Aug 4 14:04 new.2.txt
-rw-r--r-- 1 francois francois 34 Aug 4 14:04 new.3.txt
All of 3*'s contents are in new.3.txt here, for example.
You only have to set the desired destination file for the appended content and, if needed (though not asked in the initial question), sort the data alphabetically or numerically, etc.
I have a text file like this:
id ; lorem ipsum fgdg df gdg
id ; lorem ipsum fgdg df gdg
id ; lorem ipsum fgdg df gdg
id ; lorem ipsum fgdg df gdg
id ; lorem ipsum fgdg df gdg
If two ids are identical, I want to separate the lines whose id appears more than once from the lines whose id is unique.
uniquefile contains the lines with unique ids.
notuniquefile contains the lines whose ids are not unique.
I already found a way to almost do it, but only with the first word. Basically it just isolates the id and deletes the rest of the line.
Command 1: isolating unique id (but missing the line):
awk -F ";" '{!seen[$1]++};END{for(i in seen) if(seen[i]==1)print i }' originfile >> uniquefile
Command 2: isolating the not unique id (but missing the line and losing the "lorem ipsum" content that can be different depending on the line):
awk -F ":" '{!seen[$1]++;!ligne$0};END{for(i in seen) if(seen[i]>1)print i }' originfile >> notuniquefile
So in a perfect world I would like you to help me obtain this type of result:
originfile:
1 ; toto
2 ; toto
3 ; toto
3 ; titi
4 ; titi
uniquefile:
1 ; toto
2 ; toto
4 ; titi
notuniquefile:
3 ; toto
3 ; titi
Have a good day.
Yet another method, with just two unix commands, which works if your id fields always have the same length (let's assume they are one character long, as in my test data, but it also works for longer fields):
# feed the testfile.txt sorted to uniq
# -w means: only compare the first 1 character of each line
# -D means: output only duplicate lines (fully not just one per group)
sort testfile.txt | uniq -w 1 -D > duplicates.txt
# then filter out all duplicate lines from the text file
# to just let the unique files slip through
# -v means: negate the pattern
# -F means: use fixed strings instead of regex
# -f means: load the patterns from a file
grep -v -F -f duplicates.txt testfile.txt > unique.txt
And the output is (for the same input lines as used in my other post):
$ uniq -w 2 -D testfile.txt
2;line B
2;line C
3;line D
3;line E
3;line F
and:
$ grep -v -F -f duplicates.txt testfile.txt
1;line A
4;line G
Btw, in case you want to avoid the grep, you can also store the output of the sort (let's say in sorted_file.txt) and replace the second command by
uniq -w 1 -u sorted_file.txt > unique.txt
where the number behind -w again is the length of your id field in characters.
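With the sorted sample data used above, that would look like this (the unique lines match the grep output):
$ sort testfile.txt > sorted_file.txt
$ uniq -w 1 -u sorted_file.txt
1;line A
4;line G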
Untested: process the file twice: first to count the ids, then to decide where to print each record:
awk -F';' '
NR == FNR {count[$1]++; next}
count[$1] == 1 {print > "uniquefile"}
count[$1] > 1 {print > "nonuniquefile"}
' file file
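With the originfile from the question, this should yield (untested, as noted):
$ cat uniquefile
1 ; toto
2 ; toto
4 ; titi
$ cat nonuniquefile
3 ; toto
3 ; titi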
Here is a small Python script which does this:
#!/usr/bin/env python3
import sys

unique_markers = []
unique_lines = []
nonunique_markers = set()
for line in sys.stdin:
    marker = line.split(' ')[0]
    if marker in nonunique_markers:
        # found a line which is not unique
        print(line, end='', file=sys.stderr)
    elif marker in unique_markers:
        # found a double
        index = unique_markers.index(marker)
        print(unique_lines[index], end='', file=sys.stderr)
        print(line, end='', file=sys.stderr)
        del unique_markers[index]
        del unique_lines[index]
        nonunique_markers.add(marker)
    else:
        # marker not known yet
        unique_markers.append(marker)
        unique_lines.append(line)
for line in unique_lines:
    print(line, end='', file=sys.stdout)
It is not a pure shell solution (which would be cumbersome and hard to maintain IMHO), but maybe it helps you.
Call it like this:
separate_uniq.py < original.txt > uniq.txt 2> nonuniq.txt
With a pure bash script, you could do it like this:
duplicate_file="duplicates.txt"
unique_file="unique.txt"
file="${unique_file}"
rm -f "$duplicate_file" "$unique_file"
last_id=""
sort testfile.txt | (
    while IFS=";" read -r id line ; do
        if [[ "${last_id}" != "" ]] ; then
            if [[ "${last_id}" != "${id}" ]] ; then
                echo "${last_id};${last_line}" >> "${file}"
                file="${unique_file}"
            else
                file="${duplicate_file}"
                echo "${last_id};${last_line}" >> "${file}"
            fi
        fi
        last_line="${line}"
        last_id="${id}"
    done
    echo "${last_id};${last_line}" >> "${file}"
)
With an input file such as:
1;line A
2;line B
2;line C
3;line D
3;line E
3;line F
4;line G
It outputs:
$ cat duplicates.txt
2;line B
2;line C
3;line D
3;line E
3;line F
$ cat unique.txt
1;line A
4;line G
I have this table structure (assume that the delimiters are tabs):
AAA BBBB CCC
01 Item Description here
02 Meti A very very veeeery long description which will easily extend the recommended output width of 80 characters.
03 Etim Last description
What i want is this:
AAA BBBB CCC
01 Item Description here
02 Meti A very very veeeery
long description which
will easily extend the
recommended output width
of 80 characters.
03 Etim Last description
That means I want to split $3 into an array of strings of a predefined WIDTH, where the first element is appended "normally" to the current line and all subsequent elements go on new lines, indented according to the padding of the first two columns (the padding could also be fixed, if that's easier).
Alternatively, the text in $0 could be split by a GLOBAL_WIDTH (e.g. 80 chars) into a first string and a "rest" -> the first string gets printed "normally" with printf, the rest is split by GLOBAL_WIDTH - (COLPAD1 + COLPAD2) and appended on new lines as above.
I tried to work with fmt and fold after my awk formatting (which basically just puts headings on the table), but of course they are not aware of awk's fields.
How can I achieve this using bash tools and/or awk?
First build a test file (called file.txt):
echo "AA BBBB CCC
01 Item Description here
02 Meti A very very veeeery long description which will easily extend the recommended output width of 80 characters.
03 Etim Last description" > file.txt
Now the script (called ./split-columns.sh):
#!/bin/bash
FILE=$1
# find position of 3rd column (starting with 'CCC')
padding=$(head -n1 "$FILE" | grep -aob 'CCC' | grep -oE '[0-9]+')
paddingstr=$(printf "%-${padding}s" ' ')
# set max length
maxcolsize=50
maxlen=$((padding + maxcolsize))
while IFS= read -r line; do
    # split the line only if it exceeds the desired length
    if [[ ${#line} -gt $maxlen ]] ; then
        echo "$line" | fmt -s -w$maxcolsize - | head -n1
        echo "$line" | fmt -s -w$maxcolsize - | tail -n+2 | sed "s/^/$paddingstr/"
    else
        echo "$line"
    fi
done < "$FILE"
Finally run it with the file as a single argument
./split-columns.sh file.txt > fixed-width-file.txt
Output will be:
AA BBBB CCC
01 Item Description here
02 Meti A very very veeeery long description
which will easily extend the recommended output
width of 80 characters.
03 Etim Last description
You can try a Perl one-liner:
perl -lpe ' s/(.{20,}?)\s/$1\n\t /g ' file
With the given input:
$ cat thurse.txt
AAA BBBB CCC
01 Item Description here
02 Meti A very very veeeery long description which will easily extend the recommended output width of 80 characters.
03 Etim Last description
$ perl -lpe ' s/(.{20,}?)\s/$1\n\t /g ' thurse.txt
AAA BBBB CCC
01 Item Description
here
02 Meti A very very
veeeery long description
which will easily extend
the recommended output
width of 80 characters.
03 Etim Last description
$
If you want to try it with a length window of 30/40/50:
$ perl -lpe ' s/(.{30,}?)\s/$1\n\t /g ' thurse.txt
AAA BBBB CCC
01 Item Description here
02 Meti A very very veeeery
long description which will easily
extend the recommended output width
of 80 characters.
03 Etim Last description
$ perl -lpe ' s/(.{40,}?)\s/$1\n\t /g ' thurse.txt
AAA BBBB CCC
01 Item Description here
02 Meti A very very veeeery long description
which will easily extend the recommended
output width of 80 characters.
03 Etim Last description
$ perl -lpe ' s/(.{50,}?)\s/$1\n\t /g ' thurse.txt
AAA BBBB CCC
01 Item Description here
02 Meti A very very veeeery long description which
will easily extend the recommended output width of
80 characters.
03 Etim Last description
$
#!/usr/bin/awk -f
# Read standard input, which should be a file of lines each line
# containing tab-separated strings. The string values may be very long.
# Columnate the output by
# wrapping long strings onto multiple lines within each field's
# specified length.
# Arguments are numeric field lengths. If an input line contains more
# values than the # of field lengths supplied, the last field length will
# be re-used.
#
# arguments are the field lengths
# invoke like this: wrapcolumns 30 40 40
BEGIN {
    FS = "\t";   # fields are tab-separated; values may contain spaces
    for (i = 1; i < ARGC; i++) {
        fieldlengths[i-1] = ARGV[i];
        ARGV[i] = "";
    }
    if (ARGC < 2) {
        print "usage: wrapcolumns length1 ... lengthn";
        exit;
    }
}
function blanks(n) {
    result = " ";
    while (length(result) < n) {
        result = result result;
    }
    return substr(result, 1, n);
}
{
    # ARGC - 1 is the length of the fieldlengths array,
    # so ARGC - 2 is the index of its last element (it is zero-origin).
    # If the input line has more fields than the fieldlengths array,
    # use the last element.
    # any nonempty fields left?
    gotanyleft = 1;
    while (gotanyleft == 1) {
        gotanyleft = 0;
        for (i = 1; i <= NF; i++) {
            # length of the current field
            len = (ARGC - 2 < i) ? fieldlengths[ARGC - 2] : fieldlengths[i - 1];
            # print that much of the current field and remove that much from the front
            printf "%s", substr($(i) blanks(len), 1, len) ":::"
            $(i) = substr($(i), len + 1);
            if ($(i) != "") {
                gotanyleft = 1;
            }
        }
        print ""
    }
}
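A hypothetical invocation (assuming the script is saved as wrapcolumns and made executable, and using the file.txt built earlier in this thread; the field widths are arbitrary). Each field is padded to its width and terminated with ::: so the boundaries are visible:
$ chmod +x wrapcolumns
$ ./wrapcolumns 4 6 40 < file.txt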
A loop-free awk solution:
{m,g}awk -v ______="${WIDTH}" 'BEGIN {
OFS = ""
FS = "\t"
___ = "\32\23"
__ = sprintf("\n%*s",
(_+=_^=_<_)+_^!_+(_+=_____=_+=_+_)+_____,__)
____ = sprintf("%*s",______-length(__),"")
gsub(".",".",____)
sub("[.].......$","..?.?.?.?.?.?.?.[ ]",____)
______ = _
} $!NF = sprintf("%.*s %*s %-*s %-s", _<_,_= $NF,_____,
$2,______, $--NF, substr("",gsub(____,
("&")___,_) * gsub("("(___)")+$","",_),
__ * gsub( (___), (__),_) )_)'
Output:
AAA BBBB CCC
01 Item Description here
02 Meti A very very veeeery long description which
will easily extend the recommended output
width of 80 characters.
03 Etim Last description
I have a file which looks like this:
aaa 15
aaa 12
bbb 131
bbb 12
ccc 123
ddddd 1
ddddd 2
ddddd 3
I would like to get a sum for each unique element on the left side, like this, and also calculate the rounded percentage each of these represents out of the total:
aaa 27 - 9%
bbb 143 - 48%
ccc 123 - 41%
ddddd 6 - 2%
How would I accomplish this in BASH?
Since I cannot find a proper duplicate, I am posting an answer. Feel free to report a good one, and I will delete my answer and close as a duplicate.
awk '{count[$1]+=$2} END {for (i in count) print i, count[i]}' file
This builds an array in which count[key] accumulates the sum of the values seen for that key. Finally, it loops through the keys and prints each one with its sum.
It returns:
aaa 27
ccc 123
bbb 143
ddddd 6
To show percentages, just keep track of the total sum and divide accordingly:
awk '{tot+=$2; count[$1]+=$2}
END {for (i in count)
printf "%s %d - %d%%\n", i, count[i], (count[i]/tot)*100
}' file
So you can get:
aaa 27 - 9%
ccc 123 - 41%
bbb 143 - 47%
ddddd 6 - 2%
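Note that awk's %d truncates rather than rounds, which is why bbb shows 47% here instead of the 48% in the question's expected output. If you want true rounding, one option (a small, untested tweak) is to add 0.5 before the conversion:
awk '{tot+=$2; count[$1]+=$2}
     END {for (i in count)
         printf "%s %d - %d%%\n", i, count[i], (count[i]/tot)*100 + 0.5
     }' file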
Since you asked for Bash, here's a Bash≥4 solution (needs Bash≥4 for associative arrays):
#!/bin/bash
declare -Ai sums
while read -r ref num; do
# check that num is a valid number or continue
[[ $num = +([[:digit:]]) ]] || continue
sums[$ref]+=$(( 10#$num ))
done < file
for ref in "${!sums[#]}"; do
printf '%s %d\n' "$ref" "${sums[$ref]}"
done
The output is not sorted; pipe through sort (or use a sorting algorithm) to sort it.
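For instance (a sketch; script.sh is a hypothetical name for the script above, and the sort key is your choice):
./script.sh | sort -k1,1     # alphabetically by key
./script.sh | sort -k2,2nr   # numerically by sum, descending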
So now you added the percentage requirement! I hope you're not going to edit the question further adding more and more stuff…
Once we have the associative array sums, we can sum the sums:
sum=0
for x in "${sums[#]}"; do ((sum+=x)); done
and print the percentage:
for ref in "${!sums[#]}"; do
printf '%s %d - %d%%\n' "$ref" "${sums[$ref]}" "$((100*${sums[$ref]}/sum))"
done
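With the question's data this prints, in arbitrary order (integer arithmetic truncates, so bbb shows 47% rather than a rounded 48%):
aaa 27 - 9%
bbb 143 - 47%
ccc 123 - 41%
ddddd 6 - 2%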
And a solution for bash 3, without associative arrays:
while read key value
do
keys=$(echo -e "$keys\n$key")
var=data_$key
(($var=${!var}+$value))
((total=total+$value))
done < input_file
unique=$(echo "${keys:1}" | sort -u)
while read key
do
var=data_$key
((percentage=100*${!var} / total))
echo "$key $percentage%"
done <<EOF
$unique
EOF
Changed to use indirect variable references, rather than the more traditional eval.
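One caveat worth noting: var=data_$key only works while every key is a valid shell identifier; a key such as a-b would break the arithmetic assignment. A quick sanity check of the indirection itself (a sketch):
key=aaa; value=15
var=data_$key
(($var=${!var}+$value))   # data_aaa is empty on first use, so this evaluates as data_aaa=+15
echo "${!var}"            # 15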
I have a bash script which runs as follows:
./script.sh var1 var2 var3.... varN varN+1
What I need to do is take the first 2 variables and the last 2 variables and insert them into a file. The variables between the last 2 and the first 2 should be passed as a whole string to another file. How can this be done in bash?
Of course I can define a special variable with the "read var" directive and then input this whole string from the keyboard, but my objective is to pass them from the script's arguments.
argc=$#
argv=("$#")
first_two="${argv[#]:0:2}"
last_two="${argv[#]:$argc-2:2}"
others="${argv[#]:2:$argc-4}"
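A quick check of the slicing (a sketch using set -- to fake the positional parameters):
set -- aaa bbb ccc ddd eee fff
argc=$#; argv=("$@")
echo "${argv[@]:0:2}"        # aaa bbb
echo "${argv[@]:argc-2:2}"   # eee fff
echo "${argv[@]:2:argc-4}"   # ccc ddd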
#!/bin/bash
# first two
echo "${#:1:2}"
# last two
echo "${#:(-2):2}"
# middle
echo "${#:3:(($# - 4))}"
So a sample:
./script aaa bbb ccc ddd eee fff gggg hhhh
aaa bbb
gggg hhhh
ccc ddd eee fff