I have a bash script that greps and sorts information from /etc/passwd:
export FT_LINE1=13
export FT_LINE2=23
cat /etc/passwd | grep -v "#" | awk 'NR%2==1' | cut -f1 -d":" | rev | sort -r | awk -v l1="$FT_LINE1" -v l2="$FT_LINE2" 'NR>=l1 && NR<=l2' | tr '\n' ',' | sed 's/, */, /g'
The result is this list:
sstq_, sorebrek_brk_, soibten_, sirtsa_, sergtsop_, sec_, scodved_, rlaxcm_, rgmecived_, revreswodniw_, revressta_,
How can I replace the last comma with a dot (.)? I want it to look like this:
sstq_, sorebrek_brk_, soibten_, sirtsa_, sergtsop_, sec_, scodved_, rlaxcm_, rgmecived_, revreswodniw_, revressta_.

You can add:
| sed 's/, *$/./'
(where $ means "end of line"; the " *" also absorbs the trailing space that your sed 's/, */, /g' leaves after the last comma).
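If you would rather never produce the trailing comma in the first place, here is a minimal sketch (the function name join_with_dot is hypothetical, and it assumes the names themselves contain no commas): paste -sd, - joins all input lines with commas and adds nothing after the last one, so sed only has to put a space after each comma and append the dot.

```shell
#!/usr/bin/env bash
# join_with_dot (hypothetical helper): join stdin lines with ", " and end with "."
# paste -s joins all lines into one line, -d, uses "," as the separator,
# so no trailing comma is ever produced.
join_with_dot() {
  paste -sd, - | sed 's/,/, /g; s/$/./'
}

printf '%s\n' sstq_ sorebrek_brk_ soibten_ | join_with_dot
# prints: sstq_, sorebrek_brk_, soibten_.
```

In the question's pipeline, it would take the place of the final tr '\n' ',' | sed 's/, */, /g' stage.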

There are way too many pipes in your command; some of them can be removed.
As explained in the comments, cat FILE | grep is a bad habit! In general, cat FILE | cmd should be replaced by cmd FILE or cmd < FILE, depending on whether your command accepts file name arguments.
On a file of a few GB, you will already feel the difference.
That said, you can do the whole processing without a single pipe by using awk, for example:
awk -v l1="$FT_LINE1" -v l2="$FT_LINE2" 'function reverse(s,  p, i){p=""; for(i=length(s); i>0; i--) p=p substr(s,i,1); return p} BEGIN{FS=":"; ORS=", "} !/#/{cmp++; if(cmp%2==1) a[cmp]=reverse($1)} END{n=asort(a); last=(n<l2 ? n : l2); for(i=n; i>0; i--){r=n-i+1; if(r>=l1 && r<=l2){if(r==last) ORS=".\n"; print a[i]}}}' /etc/passwd
The same program, expanded with comments (asort() requires GNU awk; it compares plain strings, so the order can differ slightly from sort -r, which follows your locale):
# BEGIN rule(s)
BEGIN {
    FS = ":"      # field separator
    ORS = ", "    # output record separator, to get "name, name, ..."
}
# Rule(s)
!/#/ {            # for lines that do not contain a "#"
    cmp++         # count these lines ourselves, since NR also counts the "#" lines
    if (cmp % 2 == 1) {
        a[cmp] = reverse($1)   # store the reverse of the first field
    }
}
# END rule(s)
END {
    n = asort(a)                 # sort the values into a[1]..a[n]
    last = (n < l2 ? n : l2)     # rank of the last element that will be printed
    for (i = n; i > 0; i--) {    # walk the array in reverse order, like sort -r
        r = n - i + 1            # rank in that order, like NR after sort -r
        if (r >= l1 && r <= l2) {     # apply your range conditions
            if (r == last) {     # for the last element, print a dot instead of the comma
                ORS = ".\n"
            }
            print a[i]           # print the array element
        }
    }
}
# Functions
# If the reverse operation is necessary, this function reverses your strings.
# p and i are extra parameters so that they stay local to the function.
function reverse(s,    p, i)
{
    p = ""
    for (i = length(s); i > 0; i--) {
        p = p substr(s, i, 1)
    }
    return p
}
If you don't need the reverse part, you can just remove it from the awk script.
In the end, not a single pipe is used!


modularize awk script to mask sensitive data in delimited file

I have a delimited file in the below format:
Since all the fields contain sensitive data, I wrote the following awk command to mask the first field with random data:
awk -F'|' -v cmd="strings /dev/urandom | tr -dc '0-9' | fold -w 5" 'BEGIN {OFS=FS} {cmd | getline a;$1=a;print}' source.dat > source_masked.dat
If I want to mask additional fields, I add the following:
awk -F'|' -v cmd1="strings /dev/urandom | tr -dc '0-9' | fold -w 5" -v cmd2="strings /dev/urandom | tr -dc 'A-Za-z0-9' | fold -w 7" 'BEGIN {OFS=FS} {cmd1 | getline a; cmd2 | getline b; $1=a; $2=b; print}' source.dat > source_masked.dat
How do I scale this if I want to mask hundreds of columns with different data types?
Basically, I want to take the following from a config file:
column number, datatype, length
and use it in awk to generate the commands and the replacement script dynamically.
Could you please advise?
I rewrote the accepted answer in awk, since the bash version took a long time to mask larger files.
The code is:
function mask(datatype, precision,    command, v) {
    switch (datatype) {          # switch is a GNU awk extension
    case "string":
        command = "strings /dev/urandom | tr -dc '[:alpha:]' | fold -w "
        break
    case "alphaNumeric":
        command = "strings /dev/urandom | tr -dc '[:alnum:]' | fold -w "
        break
    case "number":
        command = "strings /dev/urandom | tr -dc '[:digit:]' | fold -w "
        break
    default:
        command = "strings /dev/urandom | tr -dc '[:alnum:]' | fold -w "
    }
    # the pipe stays open, so each call reads the next chunk of "precision" characters
    (command precision) | getline v
    return v
}
BEGIN {
    while ((getline line < "properties.conf") > 0) {
        split(line, a, ",")
        col = a[1]
        type = a[2]
        len = a[3]
        masks[col] = type " " len
    }
    FS = "|"     # awk uses FS, not the shell's IFS
    OFS = "|"
}
{
    for (i = 1; i <= NF; i++) {
        if (i in masks) {
            split(masks[i], m, " ")
            $i = mask(m[1], m[2])
        }
    }
    print
}
One approach is to read the mask configuration file into an array indexed by column number.
Then, read the data file line by line and put each field in a second array. Then, for each element of the mask array, randomize the appropriate data field. When all fields are updated, output the new line and move on to the next line.
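A minimal sketch of that approach, written as one awk program so no external command is started per field. Assumptions: the config lines look like "column,datatype,length" as in the question's properties.conf, the data file is '|'-delimited, unknown types fall back to letters plus digits, and the helper name mask_fields is hypothetical. It uses awk's rand(), which is not a cryptographic random source.

```shell
#!/usr/bin/env bash
# mask_fields CONF DATA (hypothetical helper)
# CONF lines: "column,datatype,length"; DATA is '|'-delimited.
# Every configured column is replaced by random characters of the given length.
mask_fields() {
  awk -F'|' -v OFS='|' '
    BEGIN {
      srand()   # seed from the time of day
      sets["string"] = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"
      sets["number"] = "0123456789"
      sets["alphaNumeric"] = sets["string"] sets["number"]
    }
    # first file: read the mask configuration into arrays indexed by column number
    FNR == NR { split($0, c, ","); type[c[1]] = c[2]; len[c[1]] = c[3]; next }
    # second file: randomize every configured field, then print the record
    {
      for (col in type) {
        chars = (type[col] in sets) ? sets[type[col]] : sets["alphaNumeric"]
        s = ""
        for (k = 0; k < len[col]; k++)
          s = s substr(chars, int(rand() * length(chars)) + 1, 1)
        $(col + 0) = s
      }
      print
    }' "$1" "$2"
}

# usage: mask_fields properties.conf source.dat > source_masked.dat
```

The FNR == NR test relies on the config file being non-empty; with an empty config, the data file would be read as configuration.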
Does this have to be done in awk? It might be easier/quicker to do it in native bash:
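declare mask_file=masks.conf
declare input_file=input.dat
declare output_file=output.dat
declare -a mask_type mask_len
function create_mask() {
    # ${1} is type, ${2} is length
    case ${1} in
        string ) ;;
        date ) ;;
        number ) ;;
        * ) ;;
    esac
}
# masks.conf: one "column type length" entry per line (columns are 0-based, like bash arrays)
while read -r column type length; do
    mask_type[column]=${type}
    mask_len[column]=${length}
done < "${mask_file}"
IFS='|'    # field separator for read -a, and output separator for "${data[*]}"
while read -r -a data; do
    for column in "${!mask_type[@]}"; do
        data[column]=$(create_mask "${mask_type[column]}" "${mask_len[column]}")
    done
    echo "${data[*]}" # Uses the first character of IFS as output separator.
done < "${input_file}" > "${output_file}"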
I have not included the full contents of the create_mask() function, as I do not know what types you plan to support or what format you want for each type.
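One possible body for create_mask(), as a sketch only. It assumes that string means letters, number means digits and any other type means letters plus digits; the date type is not handled because its format is unknown. It uses bash's $RANDOM, which is not cryptographically secure.

```shell
#!/usr/bin/env bash
# create_mask TYPE LENGTH: print LENGTH random characters for TYPE (sketch).
# Assumed mapping: string -> letters, number -> digits, anything else -> letters+digits.
create_mask() {
  local chars out='' k
  case ${1} in
    string ) chars=abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ ;;
    number ) chars=0123456789 ;;
    * )      chars=abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789 ;;
  esac
  for (( k = 0; k < ${2}; k++ )); do
    # append one character taken at a random offset ($RANDOM is 0..32767)
    out+=${chars:RANDOM % ${#chars}:1}
  done
  printf '%s\n' "$out"
}

create_mask number 5    # prints five random digits
```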
You can use awk's built-in rand() function instead to generate random numbers.
Define an associative array with the list of fields that you want to mask.
For example, here is sample code that masks fields 1 and 4:
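awk -F\| '
BEGIN { srand(); A_mask_field[1] = A_mask_field[4] = 1 }   # seed rand(); mask fields 1 and 4
{
    for ( i = 1; i <= NF; i++ )
        if ( i in A_mask_field )
            $i = sprintf( "%d", rand() * length($i) * 100000 )
    print
}
' OFS=\| file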

printing contents of variable to a specified line in outputfile with sed/awk

I have been working on a script to concatenate multiple CSV files into a single, large CSV. The CSVs contain names of folders and their respective sizes, in a 2-column setup with the format "Size, Projectname".
Example of a single csv file:
For my current test I have 25 similar files, with different numbers in the first column.
I am trying to get this script to do the following:
Read each csv file
For each Project it sees, scan the output file to check whether that Project was already printed to the file. If not, print the Projectname.
For each file, for each Project, if the Project was found, print the Size to the output csv.
However, I need the Projects to all be on textline 1, comma separated, so I can use this outputfile as input for a javascript graph. The Sizes should be added in the column below their projectname.
My current script:
csv_folder=$(echo "$1" | sed 's/^[ \t]*//;s/\/[ \t]*$//')
echo -n "" > $csv_outputfile
for csv_inputfile in $csv_allfiles; do
while read line && [[ $line != "" ]]; do
projectname=$(echo $line | sed 's/^\([^,]*\),//')
projectfound1=$(cat $csv_outputfile | grep -w $projectname)
if [[ ! $projectfound1 ]]; then
sed "${textline}s/$/${projectname}, /" >> $csv_outputfile
for csv_foundfile in $csv_allfiles; do
textline=$(echo $textline + 1 | bc )
projectfound2=$(cat $csv_foundfile | grep -w $projectname)
projectdata=$(echo $projectfound2 | sed 's/\,.*$//')
if [[ $projectfound2 ]]; then
sed "${textline}s/$/$projectdata, /" >> $csv_outputfile
done < $csv_inputfile
My current script finds the right information (projectname, projectdata) and if I just 'echo' those variables, it prints the correct data to a file. However, with echo it only prints in a long list per project. I want it to 'jump back' to line 1 and print the new project at the end of the current line, then run the loop to print data at the end of each next line.
I was thinking this should be possible with sed or awk. sed should have a way of inserting text to a specific line with
sed '{n}s/search/replace/'
where {n} is the line to insert to
awk should be able to do the same thing with something like
awk -v l2="$textline" -v d="$projectdata" 'NR == l2 {print d} {print}' >> $csv_outputfile
However, while replacing the sed commands in the script with
echo $projectname
echo $projectdata
prints the correct information (so I know my variables are filled correctly), the sed and awk commands tend to print the entire contents of their current input CSV, not just the line I want them to.
Pastebin outputs per variant of writing to file
https://pastebin.com/XwxiAqvT - sed output
https://pastebin.com/xfLU6wri - echo, plain output (single column)
https://pastebin.com/wP3BhgY8 - echo, detailed output per variable
https://pastebin.com/5wiuq53n - desired output
As you can see, the sed output pastes the whole contents of the input CSV, making the loop stop after one iteration (since it finds the other Projects after one loop).
So my question is one of these;
How do I make sed / awk behave the way I want it to; i.e. print only the info in my var to the current textline, instead of the whole input csv. Is sed capable of this, printing just one line of variable? Or
Should I output the variables through 'echo' into a temp file, then loop over the temp file to make sed sort the lines the way I want them to? (Bear in mind that more .csv files will be added in the future, I can't just make it loop x times to sort the info)
Is there a way to echo/print text to a specific text line without using sed or awk? Is there a printf option I'm missing? Other thoughts?
Any help would be very much appreciated.
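A detail that may explain the symptom: when sed or awk is given no file operand, it reads standard input, and inside a while read ...; done < $csv_inputfile loop that is the input CSV itself, so the first sed call consumes the remaining lines. Editing a specific line therefore needs the target file as an operand. A minimal sketch follows; the helper name append_to_line is hypothetical, and TEXT must not contain /, & or \, since it is pasted into a sed replacement.

```shell
#!/usr/bin/env bash
# append_to_line FILE N TEXT (hypothetical helper): append TEXT to the end of line N.
# FILE is passed to sed explicitly, so sed never reads the caller's stdin.
append_to_line() {
  local tmp
  tmp=$(mktemp) || return 1
  # "${2}s/\$/${3}/" expands to e.g. "2s/$/, x/": on line 2 only, add TEXT at end of line
  if sed "${2}s/\$/${3}/" "$1" > "$tmp"; then
    cat "$tmp" > "$1"   # overwrite in place, keeping the original file permissions
  fi
  rm -f "$tmp"
}
```

For example, append_to_line out.csv 1 'projectA, ' adds a project name to the header line. sed cannot create line N if the file has fewer than N lines, so the file needs enough (possibly empty) lines first.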
A way to accomplish this transposition is to save the data to an associative array.
In the following example, we use a two-dimensional array to keep track of our data. Because ordering seems to be important, we create a col array and assign the next index whenever we see a new projectname; this col index ends up being our first index into data. We also create a row array, which we increment whenever we see new data for the current column. The row number is our second index into data. At the end, we print out all the records.
#! /usr/bin/awk -f
BEGIN {
    FS = ","
    OFS = ", "
    split("", data)
    split("", row)
    split("", col)
}
!($2 in col) {          # new project
    if (head == "")
        head = $2
    else
        head = head OFS $2
    i = col[$2] = cols++
    row[i] = 0
}
{
    i = col[$2]
    j = row[i]++
    data[i,j] = $1
    if (j > rows)
        rows = j
}
END {
    print head
    for (j=0; j<=rows; ++j) {
        if ((0,j) in data)
            x = data[0,j]
        else
            x = ""
        for (i=1; i<cols; ++i) {
            if ((i,j) in data)
                x = x OFS data[i,j]
            else
                x = x OFS
        }
        print x
    }
}
As a bonus, here is a script to reproduce the detailed output from one of your pastebins.
#! /usr/bin/awk -f
BEGIN {
    FS = ","
    split("", data)   # accumulated data for a project
    split("", line)   # keep track of textline for data
    split("", idx)    # index into above to maintain input order
    sz = 0
}
$2 in idx {           # have seen this projectname
    i = idx[$2]
    x = ORS "textline = " (++line[i])
    x = x ORS "textdata = " $1
    data[i] = data[i] x
    next
}
{                     # new projectname
    i = sz++
    idx[$2] = i
    x = "textline = 1"
    x = x ORS "projectname = " $2
    x = x ORS "textline = 2"
    x = x ORS "projectdata = " $1
    data[i] = x
    line[i] = 2
}
END {
    for (i=0; i<sz; ++i)
        print data[i]
}
Fill parray with project names and array with values, then print them with bash printf. You can choose the column width in the printf command (currently 13 characters: %13s).
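declare -i index=0
declare -i pindex=0
declare -i maxi=0
declare -a parray      # project names
declare -A array       # array[project,row] = size
while read project; do
    parray[$pindex]=$project
    index=0
    while read; do
        array[$pindex,$index]=$REPLY
        index+=1
    done <<< "$(grep -h "$project" *.csv|cut -d, -f1)"
    (( index > maxi )) && maxi=$index
    pindex+=1
done <<< "$(cat *.csv|cut -d, -f 2|sort -u)"
maxp=$pindex
STR=""; VAL=""
for (( pindex=0; $pindex < $maxp ; pindex+=1 ));do
    STR="%13s $STR"
    VAL="$VAL ${parray[$pindex]}"
done
printf "$STR\n" $VAL
for (( index=0; $index < $maxi;index+=1 ));do
    STR=""; VAL=""
    for (( pindex=0; $pindex < $maxp;pindex+=1 )); do
        STR="%13s $STR"
        VAL="$VAL ${array[$pindex,$index]:--}"   # "-" keeps empty cells from shifting columns
    done
    printf "$STR\n" $VAL
done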
If you are OK with the output being sorted by name this one-liner might be of use:
awk 'BEGIN {FS=",";OFS=","} {print $2,$1}' * | sort | uniq
The files have to be in the same directory; if not, replace the * with a list of files. First, awk exchanges the two fields; given a list of files, it also does the concatenation. Then sort sorts the lines and uniq prints just the unique lines. This depends on the project size always being the same.
The simple one-liner above gives you one line for each project. If you really want awk to write the two lines, you need the following: a second awk at the end accumulates each column entry in an array, then prints them all at the end:
awk 'BEGIN {FS=","} {print $2,$1}' *| sort |uniq | awk 'BEGIN {n=0}
{p[n]=$1; s[n]=$2; n++}
END {for (i=0;i<n;i++) printf "%s,",p[i];print "";
for (i=0;i<n;i++) printf "%s,",s[i];print ""}'
If you have the rs utility then this can be simplified to
awk 'BEGIN {FS=","} {print $2,$1}' *| sort |uniq | rs -C',' -T

Counting palindromes in a text file

After following the thread "BASH Finding palindromes in a .txt file", I can't figure out what I am doing wrong in my script:
search() {
tr -d '[[:punct:][:digit:]#]' \
| sed -E -e '/^(.)\1+$/d' \
| tr -s '[[:space:]]' \
| tr '[[:space:]]' '\n'
}
search "$1"
paste <(search <"$1") <(search < "$1" | rev) \
| awk '$1 == $2 && (length($1) >=3) { print $1 }' \
| sort | uniq -c
All I'm getting from this script is the whole text file. I want to output only palindromes of length >= 3 and count them, such as:
425 did
120 non
etc. My text file is called sample.txt, and every time I run the script with cat sample.txt | source palindrome I get the message 'bash: : No such file or directory'.
Using awk and sed
awk 'function palindrome(str) {len=length(str); for(k=1; k<=len/2+len%2; k++) { if(substr(str,k,1)!=substr(str,len+1-k,1)) return 0 } return 1 } {for(i=1; i<=NF; i++) {if(length($i)>=3){ gsub(/[^a-zA-Z]/,"",$i); if(length($i)>=3) {$i=tolower($i); if(palindrome($i)) arr[$i]++ }} } } END{for(i in arr) print arr[i],i}' file | sed -E '/^[0-9]+ (.)\1+$/d'
Tested on a 1.2 GB file; the execution time was ~4m 40s (i5-6440HQ @ 2.60GHz/4 cores/16GB).
Explanation:
awk '
function palindrome(str,    len, k) {        # Function to check palindrome
    len = length(str)
    for (k = 1; k <= len/2 + len%2; k++)     # compare each char with its mirror
        if (substr(str, k, 1) != substr(str, len+1-k, 1))
            return 0
    return 1
}
{
    for (i = 1; i <= NF; i++) {              # For each field in a record
        if (length($i) >= 3) {               # if length >= 3
            gsub(/[^a-zA-Z]/, "", $i)        # remove non-alpha characters from it
            if (length($i) >= 3) {           # check length again after removal
                $i = tolower($i)             # convert to lowercase
                if (palindrome($i))          # check if it is a palindrome
                    arr[$i]++                # and store it in an array
            }
        }
    }
}
END { for (i in arr) print arr[i], i }' file | sed -E '/^[0-9]+ (.)\1+$/d'
sed -E '/^[0-9]+ (.)\1+$/d': from the final result, find the strings composed of a single repeated character, like AAA, BBB etc., and remove them.
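The two-pointer comparison that the awk palindrome() function performs can also be sketched in plain bash for checking single words. The name is_palindrome is hypothetical; it compares raw characters, so lowercase the word and strip punctuation first, as the awk version does.

```shell
#!/usr/bin/env bash
# is_palindrome WORD (hypothetical helper): exit status 0 if WORD reads the same backwards.
# Character k is compared with its mirror at len-1-k; the middle character of an
# odd-length word needs no check, so the loop stops at len/2.
is_palindrome() {
  local s=$1 len=${#1} k
  for (( k = 0; k < len / 2; k++ )); do
    [[ ${s:k:1} == "${s:len-1-k:1}" ]] || return 1
  done
  return 0
}

for w in kayak bob pikachu; do
  if is_palindrome "$w"; then echo "$w"; fi
done
# prints kayak and bob, one per line
```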
Old Answer (Before EDIT)
You can try the steps below if you want:
Step 1 : Pre-processing
Remove all unnecessary chars and store the result in temp file
tr -dc 'a-zA-Z\n\t ' <file | tr ' ' '\n' > temp
tr -dc 'a-zA-Z\n\t ': this removes everything except letters, \n, \t and space
tr ' ' '\n': this converts each space to \n, so that every word is on its own line
Step-2: Processing
grep -wof temp <(rev temp) | sed -E -e '/^(.)\1+$/d' | awk 'length>=3 {a[$1]++} END{ for(i in a) print a[i],i; }'
grep -wof temp <(rev temp) This will give you all palindromes
-w : Select only those lines containing matches that form whole words.
For example : level won't match with levelAAA
-o : Print only the matched part
-f : Use each line of the temp file as a pattern to search for in <(rev temp)
sed -E -e '/^(.)\1+$/d': This removes words made of a single repeated letter, like AAA, BBBBB
awk 'length>=3 {a[$1]++} END{ for(i in a) print a[i],i; }': This keeps words of length >= 3, counts their frequency, and finally prints the result
Example :
Input File :
$ cat file
kayak nalayak bob dad , pikachu. meow !! bhow !! 121 545 ding dong AAA BBB done
kayak nalayak bob dad , pikachu. meow !! bhow !! 121 545 ding dong AAA BBB done
kayak nalayak bob dad , pikachu. meow !! bhow !! 121 545 ding dong AAA BBB done
$ tr -dc 'a-zA-Z\n\t ' <file | tr ' ' '\n' > temp
$ grep -wof temp <(rev temp) | sed -E -e '/^(.)\1+$/d' | awk 'length>=3 {a[$1]++} END{ for(i in a) print a[i],i; }'
3 dad
3 kayak
3 bob
Just a quick Perl alternative:
perl -0nE 'for( /(\w{3,})/g ){ $a{$_}++ if $_ eq reverse($_)}
END {say "$_ $a{$_}" for keys %a}'
in Perl, $_ should be read as "it".
for( /(\w{3,})/g ) ... for all relevant words (may need some work to reject false positives like "12a21")
if $_ eq reverse($_) ... if it is palindrome
END {say "$_ $a{$_}" for...} ... tell us all the its and its number
Running the Script
The script expects that the file is given as an argument. The script does not read stdin.
Remove the line search "$1" in the middle of the script. It is not part of the linked answer.
Make the script executable using chmod u+x path/to/palindrome.
Call the script using path/to/palindrome path/to/sample.txt. If all the files are in the current working directory, then the command is
./palindrome sample.txt
Alternative Script
Sometimes the linked script works and sometimes it doesn't. I haven't found out why. However, I wrote an alternative script which does the same and is also a bit cleaner:
#! /bin/bash
grep -Po '\w{3,}' "$1" | grep -Evw '(.)\1*' | sort > tmp-words
grep -Fwf <(rev tmp-words) tmp-words | uniq -c
rm tmp-words
Save the script, make it executable, and call it with a file as its first argument.

How to combine the two awk files into one?

Here is an original awk file; I want to format it.
input content: the original awk file, named test.txt
awk 'BEGIN {maxlength = 0}\
{\
if (length($0) > maxlength) {\
maxlength = length($0);\
longest = $0;\
}\
}\
END {print longest}' somefile
expected output: the well-formatted awk file
awk 'BEGIN {maxlength = 0} \
{ \
if (length($0) > maxlength) { \
maxlength = length($0); \
longest = $0; \
} \
} \
END {print longest}' somefile
Step 1: get the length of the longest line (its number of characters)
#! /usr/bin/awk -f
BEGIN {max = 0}
{ if (length($0) > max) { max = length($0) } }
END {print max}
awk -f step1.awk test.txt
Now the max length for all lines is 50.
Step 2: put the \ at position 50+2=52.
#! /usr/bin/awk
if($0 ~ /\\$/){
awk -f step2.awk -v n=52 test.txt > well_formatted.txt
How can I combine step 1 and step 2 into only one step, and combine step1.awk and step2.awk into one awk file?
Better version: use sub() instead of gsub(), and avoid testing the same regexp twice by using the substitution itself as the pattern, sub(/\\$/,""){ ... }:
awk 'FNR==NR{
        if(length>max)max = length
        next
     }
     sub(/\\$/,""){
        printf "%-*s\\\n", max+2, $0
        next
     }1' test.txt test.txt
awk 'FNR==NR{                         # Here we read the file and find
                                      # the max line length in the file;
                                      # FNR==NR is true while awk reads the first file
        if(length>max)max = length    # find max length
        next                          # stop processing, go to next line
     }
     sub(/\\$/,""){                   # Here we read the same file once again;
                                      # if a substitution was made for the regex in the record, then
        printf "%-*s\\\n", max+2, $0  # printf with width max+2
        next                          # go to next line
     }1                               # 1 at the end does the default operation, print $0:
                                      # nothing but your else statement printf("%s\n",$0) in step2
' test.txt test.txt
You have not shown us your input and expected output, so with some assumptions:
if your input looks like this
akshay@db-3325:/tmp$ cat f
123 \
123456 \
1234567 \
you get output as follows:
akshay@db-3325:/tmp$ awk 'FNR==NR{ if(length>max)max = length; next}
sub(/\\$/,"",$0){ printf "%-*s\\\n",max+2,$0; next }1' f f
123 \
123456 \
1234567 \
awk '
   # first round
   FNR == NR {
      # take longest (compare and take longest line by line)
      M = M < (l = length( $0) ) ? l : M
      # go to next line
      next
      }
   # for every line of second round (due to previous next) that finish by /
   /[/]$/ {
      # if a modification is needed
      if ( ( l = length( $0) ) < M ) {
         # add the missing space (using sprintf "%9s" for 9 spaces)
         sub( /[/]$/, sprintf( "%" (M - l) "s/", ""))
         }
      }
   # print all line [modified or not] (7 is private joke but important is <> 0 )
   7
   ' test.txt test.txt
Passing the file twice at the end is mandatory, so that awk reads it twice.
This assumes that there is nothing after the last / (no space). It could easily be adapted, but that is not the purpose.
Lines without a trailing / are not modified, but they are still printed.
Here is one for GNU awk. It makes two passes: the first one finds the max length and the second one produces the output. FS is set to "" so that each char goes into its own field and the last char is in $NF:
$ awk 'BEGIN{FS=OFS=""}NR==FNR{m=(m<NF?NF:m);next}$NF=="\\"{$NF=sprintf("% "m-NF+2"s",$NF)}1' file file
awk 'BEGIN {maxlength = 0} \
{ \
if (length($0) > maxlength) { \
maxlength = length($0); \
longest = $0; \
} \
} \
END {print longest}' somefile
BEGIN { FS=OFS="" } # each char on different field
NR==FNR { m=(m<NF?NF:m); next } # find max length
$NF=="\\" { $NF=sprintf("% " m-NF+2 "s",$NF) } # $NF gets space padded
1 # output
If you want the backslashes further away from the code, change that 2 in sprintf to suit your liking.
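A sketch of the same two-pass idea as one reusable function that avoids FS="" and the * width in printf, which may not be supported by every awk. The name align_backslashes is hypothetical. It also strips any spaces already in front of a trailing backslash, so every trailing backslash ends up one column after the end of the longest line (measured without its backslash), and the function can be re-run on its own output.

```shell
#!/usr/bin/env bash
# align_backslashes FILE (hypothetical helper): print FILE with every trailing "\"
# aligned one column past the longest line. FILE is read twice (two passes).
align_backslashes() {
  awk '
    # pass 1: longest line length, ignoring any trailing spaces + backslash
    FNR == NR { line = $0; sub(/[ \t]*\\$/, "", line)
                if (length(line) > max) max = length(line)
                next }
    # pass 2: strip the old padding and backslash, then pad to max + 1 columns
    /\\$/     { sub(/[ \t]*\\$/, ""); printf "%-" (max + 1) "s\\\n", $0; next }
    { print }
  ' "$1" "$1"
}

# usage: align_backslashes test.txt > well_formatted.txt
```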
Maybe something like this?
wc -L test.txt | cut -f1 -d' ' | xargs -I{} sed -i -e :a -e 's/^.\{1,'{}'\}$/& /;ta' test.txt && sed -i -r 's/(\\)([ ]*)$/\2\1/g' test.txt

sorting group of lines

I have a text file like the one below:
RATERUSG.iv_destination_code_10 = WORK.maf_feature_info[53,6]
RATERUSG.iv_destination_code_2 = WORK.maf_feature_info[1,6]
RATERUSG.iv_destination_code_3 = WORK.maf_feature_info[7,6]
RATERUSG.iv_destination_code_4 = WORK.maf_feature_info[13,6]
RATERUSG.iv_destination_code_5 = WORK.maf_feature_info[19,6]
RATERUSG.iv_destination_code_6 = WORK.maf_feature_info[29,6]
RATERUSG.iv_destination_code_7 = WORK.maf_feature_info[35,6]
RATERUSG.iv_destination_code_8 = WORK.maf_feature_info[41,6]
RATERUSG.iv_destination_code_9 = WORK.maf_feature_info[47,6]
A combination of three lines forms a unit:
RATERUSG.iv_destination_code_9 = WORK.maf_feature_info[47,6]
is one unit.
The 9 indicates the number by which I have to sort.
I need a shell script or awk program that sorts the units in descending order.
How is this possible?
cat file | tr '\n' '#' | sed 's/]#/]\n/g' | sort -nrt_ -k4 | tr '#' '\n'
First, all newlines are replaced by #, and the newlines at the end of blocks (]#) are recreated.
Then a numeric reverse sort is performed on the fourth field, with fields separated by _.
Finally, the original newlines are restored.
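Another way, sketched under the assumption that every unit is exactly three lines and that its first line contains code_<N>: decorate each line with its unit's number and its original line number, sort on those two keys, then cut the decoration off again. No marker character is needed that could clash with the data, and the second key keeps the three lines of each unit in their original order. The function name sort_units is hypothetical.

```shell
#!/usr/bin/env bash
# sort_units FILE (hypothetical helper): sort 3-line units by the number after "code_",
# in descending order, keeping the lines inside each unit in their original order.
sort_units() {
  # decorate: "<unit number> TAB <line number> TAB <original line>"
  awk 'NR % 3 == 1 { key = $0; sub(/.*code_/, "", key); sub(/[^0-9].*/, "", key) }
       { printf "%s\t%d\t%s\n", key, NR, $0 }' "$1" |
    sort -t "$(printf '\t')" -k1,1nr -k2,2n |   # unit number descending, then line number
    cut -f3-                                     # undecorate
}
```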
sed 'N;N;s/\n/#/g' file |sort -t"_" -nr -k4 | sed 's|#|\n|g'
Or with gawk
awk -vRS="\niv_" -vFS="\n" 'BEGIN{t=0}
line[a[m]] = $0
cmd="sort -nr"
for(i in num){ print i |& cmd }
while((cmd |& getline m) > 0) {
print line[ arr2[1] ]
if(line[ arr2[j]] != "" ){
print "iv_"line[ arr2[j] ]
}' file
This works similarly to mouvicel's answer, but uses non-printing characters as the special markers (and assumes that the original file doesn't contain them).
sed 's/]$/]'$'\1''/' text_file | tr '\1' '\0' | sort -znrt_ | tr '\0' '\n' | sed '/^$/d'
It assumes that there are no blank lines in the original file since it deletes them at the end. It also relies on every group-ending line to end in "]".
