I need to take a large file, with lines such as:
member: cn=user0001,ou=people
and replace every username with random characters, keeping letters where there were letters and digits where there were digits. So the output might be something like:
member: cn=kvud7405,ou=people
The usernames vary in length and format, but they're always bounded by a cn= and a comma.
Can anyone offer a solution, preferably with sed/awk/bash? Failing that, Python might be an option (I'm not sure which version).
Thanks in advance.
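Something like this (the command substitution has to be inside double quotes for the shell to expand it, and it runs only once, so every match gets the same random string):
sed -i "s/blah/blah?$(tr -dc 'a-z0-9' < /dev/urandom | fold -w 6 | head -n 1)/g" /home/test.html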
awk -F 'cn=|,' 'BEGIN {srand(); OFS = ""} {new = ""; n = split($2, a, ""); for (i = 1; i <= n; i++) {if (a[i] ~ /[[:digit:]]/) {new = new int(rand() * 10)} else {new = new sprintf("%c", int(rand() * 26 + 97))}}; $2 = "cn=" new ","; print}'
Broken out on multiple lines:
awk -F 'cn=|,' '
BEGIN {
    srand();
    OFS = ""
}
{
    new = "";                   # reset for every line, otherwise names accumulate
    n = split($2, a, "");       # one character per array element
    for (i = 1; i <= n; i++) {
        if (a[i] ~ /[[:digit:]]/) {
            new = new int(rand() * 10)
        }
        else {
            new = new sprintf("%c", int(rand() * 26 + 97))
        }
    };
    $2 = "cn=" new ",";
    print
}'
It could easily be modified to handle uppercase alpha characters if needed.
Edit:
More robust:
awk 'BEGIN {srand()} {new = ""; match($0, /cn=[^,]*,/); n = split(substr($0, RSTART+3, RLENGTH-4), a, ""); for (i = 1; i <= n; i++) {if (a[i] ~ /[[:digit:]]/) {new = new int(rand() * 10)} else {new = new sprintf("%c", int(rand() * 26 + 97))}}; print substr($0, 1, RSTART+2) new substr($0, RSTART+RLENGTH-1)}'
This version doesn't use FS so it works when there are additional fields.
A Bash solution:
letter=( a b c d e f g h i j k l m n o p q r s t u v w x y z )
digit=( 0 1 2 3 4 5 6 7 8 9 )
while IFS= read -r line; do
    user=''
    line=${line#*=}   # drop everything up to and including the first "="
    line=${line%,*}   # drop everything from the last "," onwards
    for (( CNTR=0; CNTR<${#line}; CNTR+=1 )); do
        if [[ ${line:CNTR:1} =~ [[:alpha:]] ]] ; then
            user=$user${letter[RANDOM%26]}
        else
            user=$user${digit[RANDOM%10]}
        fi
    done
    echo "member: cn=${user},ou=people"
done < "$infile" > "$tempfile"
mv "$tempfile" "$infile" # replace original file
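Since the question allows Python as a fallback, here is a minimal sketch of the same idea in Python 3. The names scramble and scramble_line and the regular expression are my own illustrative choices, not taken from the answers above. It assumes ASCII usernames, keeps upper/lower case, and leaves any other character (a dot or hyphen, say) untouched.

```python
import random
import re
import string

# Hypothetical pattern: the username is whatever sits between "cn=" and the next comma.
CN_PATTERN = re.compile(r"(cn=)([^,]*)(,)")


def scramble(name, rng=random):
    """Replace each ASCII digit with a random digit and each ASCII letter
    with a random letter of the same case; keep everything else as is."""
    out = []
    for ch in name:
        if ch in string.digits:
            out.append(rng.choice(string.digits))
        elif ch in string.ascii_lowercase:
            out.append(rng.choice(string.ascii_lowercase))
        elif ch in string.ascii_uppercase:
            out.append(rng.choice(string.ascii_uppercase))
        else:
            out.append(ch)
    return "".join(out)


def scramble_line(line, rng=random):
    """Scramble the first cn=...,  value on the line; other lines pass through."""
    return CN_PATTERN.sub(
        lambda m: m.group(1) + scramble(m.group(2), rng) + m.group(3),
        line,
        count=1,
    )


if __name__ == "__main__":
    print(scramble_line("member: cn=user0001,ou=people"))
```

To process a file you would loop over its lines and write scramble_line(line) for each one to a temporary file, then move it over the original, as in the Bash version.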
I want to use this table:
a 16 moe max us
b 11 tom mic us
d 14 roe fox au
t 29 ann teo au
n 28 joe joe ca
and make this matrix by using awk (or any other simple option in bash):
a_16; b_11; d_14; t_29; n_28
us; moe_max; tom_mic; ; ;
au; ; ; roe_fox; ann_teo;
ca; ; ; ; ; joe_joe
I tried this but it didn't work:
awk '{a[$5]=a[$5]?a[$5] FS $1"_"$2:$1"_"$2; b[$5]=b[$5]?b[$5] FS $3"_"$4:$3"_"$4;} END{for (i in a){print i"\t" a[i] "\t" b[i];}}' fis.txt
Using any awk
$ cat tst.awk
{
    row = $NF
    col = $1 "_" $2
    vals[row,col] = $3 "_" $4
}
!seenRow[row]++ { rows[++numRows] = row }
!seenCol[col]++ { cols[++numCols] = col }
END {
    OFS = "; "
    printf " "
    for ( colNr=1; colNr<=numCols; colNr++ ) {
        col = cols[colNr]
        printf "%s%s", col, (colNr<numCols ? OFS : ORS)
    }
    for ( rowNr=1; rowNr<=numRows; rowNr++ ) {
        row = rows[rowNr]
        printf "%s%s", row, OFS
        for ( colNr=1; colNr<=numCols; colNr++ ) {
            col = cols[colNr]
            #val = ((row,col) in vals ? vals[row,col] : " ")
            val = vals[row,col]
            printf "%s%s", val, (colNr<numCols ? OFS : ORS)
        }
    }
}
$ awk -f tst.awk file
a_16; b_11; d_14; t_29; n_28
us; moe_max; tom_mic; ; ;
au; ; ; roe_fox; ann_teo;
ca; ; ; ; ; joe_joe
I can't see the pattern in the expected output in your question of when there should be 1, 2, 3, or 4 spaces after each ; so I just used a consistent 2 in the above. Massage it to suit.
Using gawk multidimensional arrays for collecting header columns and row indices:
awk '{
head[NR] = $1"_"$2;
idx[$5][NR] = $3"_"$4
}
END {
h = ""; col_size = length(head);
for (i = 1; i <= col_size; i++) {
h = sprintf("%s %s", h, head[i])
}
print h;
for (lab in idx) {
printf("%s", lab);
for (i = 1; i <= col_size; i++) {
v = sprintf("%s; %s", v, idx[lab][i])
}
print v;
v = "";
}
}' test.txt
a_16 b_11 d_14 t_29 n_28
ca; ; ; ; ; joe_joe
au; ; ; roe_fox; ann_teo;
us; moe_max; tom_mic; ; ;
Here is a Ruby script to do that:
ruby -e 'd=$<.read.
split(/\R/).
map(&:split).
map{|sa| sa.each_slice(2).map{|ss| ss.join("_") } }.
group_by{|sa| sa[-1] }
# {"us"=>[["a_16", "moe_max", "us"], ["b_11", "tom_mic", "us"]], "au"=>[["d_14", "roe_fox", "au"], ["t_29", "ann_teo", "au"]], "ca"=>[["n_28", "joe_joe", "ca"]]}
heads=d.values.flatten(1).map{|sa| sa[0]}
# ["a_16", "b_11", "d_14", "t_29", "n_28"]
hsh=Hash.new {|h,k| h[k] = ["\t"]*heads.length}
d.each{|k,v|
v.each{|sa|
hsh[k][heads.index(sa[0])]="\t#{sa[1]}"
}
}
puts heads.map{|e| "\t#{e}" }.join(";")
hsh.each{|k,v| puts "#{k};\t#{v.join(";")}"}
' file
Prints:
a_16; b_11; d_14; t_29; n_28
us; moe_max; tom_mic; ; ;
au; ; ; roe_fox; ann_teo;
ca; ; ; ; ; joe_joe
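For comparison, a rough Python 3 sketch of the same pivot. The function name pivot and the SAMPLE constant are hypothetical, made up for this illustration. It assumes every input line has exactly five whitespace-separated fields, and it keeps rows and columns in first-seen order, like the "any awk" answer.

```python
SAMPLE = """\
a 16 moe max us
b 11 tom mic us
d 14 roe fox au
t 29 ann teo au
n 28 joe joe ca
"""


def pivot(lines):
    """Return the matrix as a list of rows; the first row is the header."""
    cols, rows, cells = [], [], {}
    for line in lines:
        fields = line.split()
        if len(fields) != 5:
            continue  # skip blank or malformed lines
        k1, k2, v1, v2, row = fields
        col = f"{k1}_{k2}"
        if col not in cols:
            cols.append(col)
        if row not in rows:
            rows.append(row)
        cells[row, col] = f"{v1}_{v2}"
    table = [[""] + cols]  # empty corner cell above the row labels
    for row in rows:
        table.append([row] + [cells.get((row, col), "") for col in cols])
    return table


if __name__ == "__main__":
    for r in pivot(SAMPLE.splitlines()):
        print("; ".join(r))
```

Returning a list of lists rather than formatted strings keeps the separator and padding decisions in one place, so the spacing can be adjusted to taste at print time.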
I would like some help writing awk code that randomly chooses 5,000 records out of 10,000.
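GNU sort has a randomizer (-R). Note that it sorts by a random hash of each line, so identical lines end up next to each other; if that matters, GNU shuf -n 5000 picks lines independently.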
Assuming an input filename of 10k,
sort -R 10k | head -5000 > 5k # write selections to "5k"
The following method works for single as well as multi-line records or records with specific record-separators.
Define a script subset.awk:
# Uniform(m) :: returns a random integer such that
# 1 <= Uniform(m) <= m
function Uniform(m) { return 1+int(m * rand()) }
# KnuthShuffle(m) :: creates a random permutation of the range [1,m]
function KnuthShuffle(m, i,j,k) {
    for (i = 1; i <= m; i++) { permutation[i] = i }
    for (i = m; i >= 2; i--) {
        j = Uniform(i)              # 1 <= j <= i, so position i can stay put
        k = permutation[i]
        permutation[i] = permutation[j]
        permutation[j] = k
    }
}
BEGIN{ srand() }
{a[NR]=$0}
END{ KnuthShuffle(NR); for(r = 1; r <= count; r++) print a[permutation[r]] }
Then you can run it as:
$ awk -v count=5000 -f subset.awk inputfile > outputfile
Or if you have a file where the record separator is given by a character like #, then you can do:
$ awk -v count=5000 -v RS='#' -v ORS='#' -f subset.awk inputfile > outputfile
If you want to select random paragraphs, you can do:
$ awk -v count=5000 -v RS='' -v ORS='\n\n' -f subset.awk inputfile > outputfile
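If Python is available, the same selection can be sketched with a partial Fisher-Yates shuffle: only the first count positions need to be shuffled. The function name random_subset is hypothetical; the sketch assumes all records fit in memory. The standard library's random.sample(records, count) gives an equivalent result in one call.

```python
import random


def random_subset(records, count, rng=random):
    """Return `count` distinct records chosen uniformly at random.

    Partial Fisher-Yates: position i is swapped with a random position
    in [i, n), and we stop after `count` positions are filled.
    """
    pool = list(records)
    count = min(count, len(pool))  # asking for too many returns everything
    for i in range(count):
        j = rng.randrange(i, len(pool))  # i <= j < len(pool)
        pool[i], pool[j] = pool[j], pool[i]
    return pool[:count]


if __name__ == "__main__":
    records = [f"record {n}" for n in range(1, 10001)]
    picked = random_subset(records, 5000)
    print(len(picked), "records selected")
```

For multi-line records you would split the input on the record separator first (for example on blank lines for paragraphs) and pass the resulting list in.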
I want to create a table (with at most 4 rows) from a one-column file using awk.
I have a file:
1 a,b
2 r,i
3 w
4 r,t
5 o,s
6 y
The desired output:
1 a,b 5 o,s
2 r,i 6 y
3 w
4 r,t
So far, I have just been separating the rows into different files and using "paste" to join them into one. I would appreciate a more sophisticated method.
$ cat tst.awk
BEGIN {
    numRows = 4
    OFS = "\t"
}
{
    rowNr = (NR - 1 ) % numRows + 1
    if ( rowNr == 1 ) {
        numCols++
    }
    val[rowNr,numCols] = $0
}
END {
    for (rowNr=1; rowNr<=numRows; rowNr++) {
        for (colNr=1; colNr<=numCols; colNr++) {
            printf "%s%s", val[rowNr,colNr], (colNr<numCols ? OFS : ORS)
        }
    }
}
$
$ awk -f tst.awk file
1 a,b 5 o,s
2 r,i 6 y
3 w
4 r,t
Combination of awk to join lines and column to pretty-print them:
awk -v max=4 '
{ i = (NR-1) % max + 1; line[i] = line[i] "\t" $0 }
END { for(i=1; i<=max && i<=length(line); i++) print line[i] }' file | column -t -s $'\t'
Output:
1 a,b 5 o,s
2 r,i 6 y
3 w
4 r,t
Another:
$ awk ' {
i=(NR%4) # using modulo indexed array
a[i]=a[i] (a[i]==""?"":" ") $0 # append to it
}
END { # in the END
for(i=1;i<=4;i++) # loop all indexes in order
print a[i%4] # dont forget the modulo
}' file
1 a,b 5 o,s
2 r,i 6 y
3 w
4 r,t
Naturally it will be ugly if there are missing columns.
Here is another awk approach:
awk '
{
A[++c] = $0
}
END {
m = sprintf ( "%.0f", ( c / 4 ) )
for ( i = 1; i <= 4; i++ )
{
printf "%s\t", A[i]
for ( j = 1; j <= m; j++ )
printf "%s\t", A[i+(j*4)]
printf "\n"
}
}
' file
You can combine split and paste:
split -l 4 file part- && paste part-*
-l <number> means to split file to smaller files of <number> lines each.
part- is a prefix of our choice to be used for the new files. Note that they will be in alphabetical order, e.g. part-aa, part-ab etc. So paste will paste them as expected.
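The wrap-around logic shared by these answers (line n goes to row (n - 1) mod 4) can also be sketched in Python 3 without any temporary files. The function name to_columns is hypothetical; cells are joined with tabs, as in the first awk answer.

```python
def to_columns(lines, num_rows=4):
    """Distribute lines over at most `num_rows` rows, filling column by column."""
    lines = list(lines)
    rows = [[] for _ in range(min(num_rows, len(lines)))]
    for n, line in enumerate(lines):
        rows[n % num_rows].append(line)  # 0-based n, so line 5 lands in row 1 again
    return rows


if __name__ == "__main__":
    data = ["1 a,b", "2 r,i", "3 w", "4 r,t", "5 o,s", "6 y"]
    for row in to_columns(data):
        print("\t".join(row))
```

Rows that run out of input are simply shorter, so there are no trailing separators to clean up.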
How can I calculate totals from the following data?
Input:
2 Printers
2 x 2 Cartridges
2 Router
1 Cartridge
Output:
Total Number of Printers: 2
Total Number of Cartridges: 5
Total Number of Router: 2
Please note that the Cartridges have been multiplied: (2 x 2) + 1 = 5. I tried the following, but I'm not sure how to get the number when I have a (2 x 2) type of entry:
awk -F " " '{print $1}' Cartridges.txt >> Cartridges_count.txt
CartridgesCount=`( echo 0 ; sed 's/$/ +/' Cartridges_count.txt; echo p ) | dc`
echo "Total Number of Cartridges: $CartridgesCount"
Please advise.
This assumes that there are only multiplication operators in the data.
awk '{$NF = $NF "s"; sub("ss$", "s", $NF); qty = $1; for (i = 2; i < NF; i++) {if ($i ~ /^[[:digit:]]+$/) {qty *= $i}}; items[$NF] += qty} END {for (item in items) {print "Total number of", item ":", items[item]}}'
Broken out on multiple lines:
awk '{
$NF = $NF "s";
sub("ss$", "s", $NF);
qty = $1;
for (i = 2; i < NF; i++) {
if ($i ~ /^[[:digit:]]+$/) {
qty *= $i
}
};
items[$NF] += qty
}
END {
for (item in items) {
print "Total number of", item ":", items[item]
}
}'
Try something like this (assuming a well formatted input) ...
sed -e 's| x | * |' -e 's|^\([ 0-9+*/-]*\)|echo $((\1)) |' YourFileName | sh | awk '{a[$2]+=$1;} END {for (var in a) print a[var] " "var;}'
P.S. Cartridges and Cartridge are different. If you want to take care of that too, it would be even more difficult but you can modify the last awk in the pipeline.
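A small Python 3 sketch of the same arithmetic, handling the singular/plural mismatch the way the awk answer does (append an "s" when the item name lacks one). The function name totals is hypothetical. It assumes the item name is the last word on the line and that every other numeric token is a factor to multiply.

```python
def totals(lines):
    """Sum quantities per item, multiplying all numeric tokens on a line."""
    counts = {}
    for line in lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # need at least a quantity and an item name
        item = fields[-1]
        if not item.endswith("s"):
            item += "s"  # treat "Cartridge" and "Cartridges" as the same item
        qty = 1
        for token in fields[:-1]:
            if token.isdecimal():  # skips the "x" between factors
                qty *= int(token)
        counts[item] = counts.get(item, 0) + qty
    return counts


if __name__ == "__main__":
    data = ["2 Printers", "2 x 2 Cartridges", "2 Router", "1 Cartridge"]
    for item, qty in totals(data).items():
        print(f"Total Number of {item}: {qty}")
```

Note that, as with the awk version, "Router" is reported as "Routers"; naive pluralization like this will not cope with irregular plurals.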
I have data printed out in the console like this:
A B C D E
1 2 3 4 5
I want to manipulate it so A:1 B:2 C:3 D:4 E:5 is printed.
What is the best way to go about it? Should I tokenize the two lines and then print it out using arrays?
How do I go about it in bash?
Awk is good for this.
awk 'NR==1{for(i=1;i<=NF;i++){row[i]=$i}} NR==2{for(i=1;i<=NF;i++){printf "%s:%s%s",row[i],$i,(i<NF?" ":"\n")}}' oldfile > newfile
A slightly more readable version for scripts:
#!/usr/bin/awk -f
NR == 1 {
    # awk fields are numbered from 1; $0 is the whole line
    for(i = 1; i <= NF; i++) {
        first_row[i] = $i
    }
}
NR == 2 {
    for(i = 1; i <= NF; i++) {
        printf "%s:%s%s", first_row[i], $i, (i < NF ? " " : "\n")
    }
}
If you want it to scale vertically, you'll have to say how.
For two lines with any number of elements:
(read LINE;
LINE_ONE=($LINE);
read LINE;
LINE_TWO=($LINE);
for i in `seq 0 $((${#LINE_ONE[@]} - 1))`;
do
echo ${LINE_ONE[$i]}:${LINE_TWO[$i]};
done)
To do pairs of lines just wrap it in a loop.
This might work for you:
echo -e "A B C D E FFF GGG\n1 2 3 4 5 666 7" |
sed 's/ \|$/:/g;N;:a;s/\([^:]*:\)\([^\n]*\)\n\([^: ]\+ *\)/\2\1\3\n/;ta;s/\n//'
A:1 B:2 C:3 D:4 E:5 FFF:666 GGG:7
Perl one-liner:
perl -lane 'if($.%2){@k=@F}else{print join" ",map{"$k[$_]:$F[$_]"}0..$#F}'
Somewhat more legible version:
#!/usr/bin/perl
my @keys;
while (<>) {
chomp;
if ($. % 2) { # odd lines are keys
@keys = split ' ', $_;
} else { # even lines are values
my @values = split ' ', $_;
print join ' ', map { "$keys[$_]:$values[$_]" } 0..$#values;
}
}
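The same pairing is short in Python 3 as well, using zip to walk both lines in step. The function names pair_lines and pair_all are hypothetical; like the Perl version, pair_all treats odd lines as keys and even lines as values. If the two lines have different numbers of fields, zip silently stops at the shorter one.

```python
def pair_lines(keys_line, values_line):
    """Join the n-th word of each line as key:value, separated by spaces."""
    pairs = zip(keys_line.split(), values_line.split())
    return " ".join(f"{key}:{value}" for key, value in pairs)


def pair_all(lines):
    """Process consecutive pairs of lines (keys line, then values line)."""
    it = iter(lines)
    # zip(it, it) pulls two items at a time from the same iterator
    return [pair_lines(keys, values) for keys, values in zip(it, it)]


if __name__ == "__main__":
    print(pair_lines("A B C D E", "1 2 3 4 5"))
```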