Align around a given character in bash

Is there an easy way to align multiple rows of text around a single character, similar to this question, but in bash?
Also open to zsh solutions.
What I have:
aaa:aaaaaaaa
bb:bbb
cccccccccccc:cc
d:d
What I want:
         aaa:aaaaaaaa
          bb:bbb
cccccccccccc:cc
           d:d
Preferably the output can be piped out and retain its layout too.

You can try column and GNU sed:
column -t -s':' infile | sed -E 's/(\S+)(\s{0,})( )(.*)/\2\1:\4/'
column pads the first field to a common width; the sed expression then moves that padding in front of the field, right-aligning it, and restores the colon.

The shell itself does not seem like a particularly suitable tool for this task. Using an external tool makes for a solution which is portable between shells. Here is a simple Awk solution.
awk -F ':' '{ a[++n] = $1; b[n] = $2; if (length($1) > max) max = length($1) }
END { for (i=1; i<=n; ++i) printf "%" max "s:%s\n", a[i], b[i] }'
Demo: https://ideone.com/Eaebhh
This stores the input file in memory; if you need to process large amounts of text, it would probably be better to split this into a two-pass script, as sketched below: on the first pass, just read all the lines to find max, then change the END block to print the output during the second pass. This requires the input to be seekable (i.e. readable twice).
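A minimal sketch of that two-pass variant, assuming the input is a regular file (here infile, passed twice) rather than a pipe:
awk -F ':' 'NR == FNR { if (length($1) > max) max = length($1); next }  # pass 1: find widest first field
    { printf "%" max "s:%s\n", $1, $2 }' infile infile                  # pass 2: right-align and print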

Related

bash command for group by count

I have a file in the following format
abc|1
def|2
abc|8
def|3
abc|5
xyz|3
I need to group the rows by the word in the first column and sum the values of the second column. For instance, the output for this file should be
abc|14
def|5
xyz|3
Explanation: the corresponding values for word "abc" are 1, 8, and 5. By adding these numbers, the sum comes out to be 14 and the output becomes "abc|14". Similarly, for word "def", the corresponding values are 2 and 3. Summing up these, the final output comes out to be "def|5".
Thank you very much for the help :)
I tried the following command
awk -F "|" '{arr[$1]+=$2} END {for (i in arr) {print i"|"arr[i]}}' filename
Another command which I found was
awk -F "," 'BEGIN { FS=OFS=SUBSEP=","}{arr[$1]+=$2 }END {for (i in arr) print i,arr[i]}' filename
Neither showed me the intended results, and I'm also unsure how these commands actually work.
Short GNU datamash solution:
datamash -s -t\| -g1 sum 2 < filename
The output:
abc|14
def|5
xyz|3
-s - sort the input first (grouping requires sorted input)
-t\| - use | as the field separator
-g1 - group by the 1st column
sum 2 - sum up the values of the 2nd column
I will just add an answer to fix the sorting issue you had: in your Awk logic you don't need to pipe the output of Awk to sort/uniq; you can sort within Awk itself.
Referring to GNU Awk's Using Predefined Array Scanning Orders with gawk, you can use the PROCINFO["sorted_in"] variable (gawk-specific) to control how you want Awk to sort your final output.
Referring to the section below,
@ind_str_asc
Order by indices in ascending order compared as strings; this is the most basic sort. (Internally, array indices are always strings, so with a[2*5] = 1 the index is "10" rather than numeric 10.)
So to apply this to your requirement, in the END clause just do:
END{PROCINFO["sorted_in"]="@ind_str_asc"; for (i in unique) print i,unique[i]}
with your full command being,
awk '
BEGIN{FS=OFS="|"}{
unique[$1]+=$2;
next
}
END{
PROCINFO["sorted_in"]="#ind_str_asc";
for (i in unique)
print i,unique[i]
}' file
awk -F\| '{ arry[$1]+=$2 } END { n = asorti(arry, arry2); for (i = 1; i <= n; i++) print arry2[i] "|" arry[arry2[i]] }' filename
Your initial solution should work apart from the sorting issue. Use the asorti function to sort the indices from arry into arry2, then loop over them by position (a plain for (i in arry2) does not guarantee order).

How to use output of a command inside an awk command?

I want to print out the last update of a log file and nothing above it (old logs). Every 5 minutes the log is updated/appended to, and there is no option to overwrite instead of append. The number of lines per update doesn't vary now, but I don't want to have to change the script if and when new fields are added. Each appendage starts with "Date: ...."
This is my solution so far. I'm finding the line number of the last occurrence of "Date" and then trying to send that to awk 'NR>line_num_here' filename -
line=$(grep -n Date stats.log | tail -1 | cut --delimiter=':' --fields=1) | awk "NR>$line" file.log
However, I cannot update $line! It always holds the very first value from the very first time I ran the script. Is there a way to correctly update $line? Or are there any other ways to do this? Maybe a way to directly pipe into awk instead of making a variable?
The problem in your solution is that you need to replace the pipe in front of awk with a ;. These are two separate commands, which would normally appear on two separate lines:
line=$(...)
awk "NR>$line" file
However, you can separate them with a ; if they should appear on the same line:
line=$(...); awk "NR>$line" file
But anyway, you can significantly simplify the command. Simply use awk twice, like this:
awk -v ln="$(awk '/Date/{l=NR}END{print l}' a.log)" 'NR>ln' a.log
I'm using
awk '/Date/{l=NR}END{print l}' a.log
to obtain the line number of the last occurrence of Date. This value gets passed via -v ln=... to the outer awk command.
Here's a way you could do it, in one invocation of awk and only reading the file once:
awk '/Date/ { n = 1 } { a[n++] = $0 } END { for (i = 1; i < n; ++i) print a[i] }' file
This writes each line to an array a, resetting the counter n back to 1 every time the pattern /Date/ matches. It then loops through the array once the file has been read, printing all the most recently saved values.
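If GNU tac is available, a further alternative (a sketch of mine, not from the answers above, using the same log file) is to read the file backwards, stop at the last Date line, and flip the result back:
tac file | awk '{ print } /Date/ { exit }' | tac  # assumes GNU coreutils tac
The first tac emits the file last-line-first, awk prints until it has emitted the (reversed) last Date line, and the second tac restores the original order.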

Using "awk" to find a string between two other specific strings:

I have a large output of text that'll include several lines like this:
sending:WHATIWANT:output
How would I use awk to make it so that this output would ONLY include WHATIWANT on each line?
Edit: there is a changing amount of text before and after WHATIWANT, so something like awk -F: '{print $2}' would not always work.
From what you mention in the comments, this should do it:
perl -n -e'/sending:([^:]+):output/ && print $1' input_file
This runs a simple regex match line-by-line, capturing the interesting part and then printing it. It assumes that WHATIWANT does not contain the character :
If for some reason you absolutely must use awk(1), then I think you don't have much choice but to do this:
awk -F: '{ for (i = 2; i < NF; i++) if ($(i-1) == "sending" && $(i+1) == "output") print $i }' input_file
It basically splits each line by : and iterates through every field, comparing the left and right fields until it finds one that is between sending and output. Again, it assumes that WHATIWANT does not have a :
Can't you just use sed?
echo "asfasfdsf__sending:WHATIWANT:output__asdfadas" | sed -n 's/.*sending\:\([a-zA-Z0-9]*\)\:output.*/\1/p'
Gives you "WHATIWANT"
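If GNU grep is available, a lookaround-based one-liner (my addition, with the same assumption that WHATIWANT contains no :) also works:
grep -oP '(?<=sending:)[^:]+(?=:output)' input_file  # -o: print only the match; -P: PCRE lookarounds
-o prints only the matched part of each line, and the -P lookbehind/lookahead anchor the match between sending: and :output without including them.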

AWK array parsing issue

My two input files are pipe separated.
File 1 :
a|b|c|d|1|44
File 2 :
44|ab|cd|1
I want to store all the values of the first file in an array.
awk -F\| 'FNR==NR {a[$6]=$0;next}'
So if I store it the above way, is it possible to interpret the array; say I want to know $3 of File 1. How can I get that from a[]?
Also, will I be able to access the array values after I come out of that awk?
Thanks
I'll answer the question as it is stated, but I have to wonder whether it is complete. You state that you have a second input file, but it doesn't play a role in your actual question.
1) It would probably be most sensible to store the fields individually, as in
awk -F \| '{ for(i = 1; i < NF; ++i) a[$NF,i] = $i } END { print a[44,3] }' filename
See here for details on multidimensional arrays in awk. You could also use the split function:
awk -F \| '{ a[$NF] = $0 } END { split(a[44], fields); print fields[3] }'
but I don't see the sense in it here.
2) No. At most you can print the data in a way that the surrounding shell understands and use command substitution to build a shell array from it, but POSIX shell doesn't know arrays at all, and bash only knows one-dimensional arrays. If you require that sort of functionality, you should probably use a more powerful scripting language such as Perl or Python.
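A sketch of that workaround (my example, assuming bash and the sample File 1): have awk print one field per line and capture them into a one-dimensional bash array.
fields=($(awk -F '|' '{ for (i = 1; i <= NF; i++) print $i }' file1))  # command substitution + word splitting
echo "${fields[2]}"  # bash arrays are zero-based, so this prints field 3: "c"
This relies on word splitting, which is fine here only because the sample fields contain no whitespace.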
If, and I'm wildly guessing here, you want to use the array built from the first file while processing the second, you don't have to quit awk for this. A common pattern is
awk -F \| 'FNR == NR { for(i = 1; i < NF; ++i) { a[$NF,i] = $i }; next } { code for the second file here }' file1 file2
Here FNR == NR is a condition that is only true when the first file is processed (the number of the record in the current file is the same as the number of the record overall; this is only true in the first file).
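Here is a hypothetical second block using the sample files from the question: for each File 2 line, print its first field together with field 3 of the File 1 record keyed by it.
awk -F \| 'FNR == NR { for (i = 1; i < NF; ++i) a[$NF,i] = $i; next }  # file1: index fields by $NF
    { print $1, a[$1,3] }' file1 file2  # for the samples, prints "44 c"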
To keep it simple, you can reach your goal of storing (and accessing) values in an array without using awk:
arr=($(cat yourFilename |tr "|" " ")) #store in array named arr
# accessing individual elements
echo ${arr[0]}
echo ${arr[4]}
# ...or accessing all elements
for n in ${arr[*]}
do
echo "$n"
done
...even though I wonder if that's what you are looking for. The initial question is not really clear.

sort string with delimiter as string in unix

I have some data in the following format:
Info-programNumber!/TvSource/11100001_233a_32c0/13130^Info-channelName!5 USA^Info-Duration!1575190^Info-programName!CSI: ab cd
Delimiter = Info-
I tried to sort the string based on the delimiter in ascending order, but none of my solutions are working.
Expected Result:
Info-channelName!5 USA^Info-Duration!1575190^Info-programName!CSI: ab cd^Info-programNumber!/TvSource/11100001_233a_32c0/13130
Is there any command that will allow me to do this, or do I need to write an awk script to iterate over the string and sort it?
Temporarily split the info onto multiple lines so you can sort:
tr '^' '\n' | sort | tr '\n' '^'
Note: if you have multiple entries, you have to write a loop that processes the input line by line. With huge datasets this is probably not a good idea (too slow), in which case pick a programming language... but you were asking about the shell.
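For a single line of input, a complete pipeline might look like this (my sketch; $line is assumed to hold one record, sort -f forces the case-insensitive comparison that the expected order, channelName before Duration, relies on, and paste rejoins the lines without leaving a trailing ^):
printf '%s\n' "$line" | tr '^' '\n' | sort -f | paste -s -d '^' -  # sort -f: ignore case; paste: no trailing ^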
This can be done in awk itself (gawk, which provides asort):
awk -F "^" '{for (i=1; i<=NF; i++) a[i]=$i}
END {n=asort(a, b); for (i=1; i<=n; i++) printf("%s%s", b[i], i<n ? FS : ORS)}' file
