generate text file from set of character - bash

I want to generate a single text file containing all possible combinations of a restricted character set, in bash or maybe Python.
For example
I have
aAbBc01+
and I want all combinations 9 and 10 characters long, starting with
aaaaaaaaa
finish with
++++++++++
passing through
+++++++++
aaaaaaaaaa

Already discussed in the forum
For python:
python -c "from itertools import permutations as p ; print('\n'.join([''.join(item) for line in open('File') for item in p(line[:-1])]))"
where File contains your input string
For bash -- Much slower
perm() {
    items="$1"
    out="$2"
    [[ "$items" == "" ]] && echo "$out" && return
    for (( i=0; i<${#items}; i++ )) ; do
        ( perm "${items:0:i}${items:i+1}" "$out${items:i:1}" )
    done
}
while read line ; do perm $line ; done < File

Here is a python solution:
def combinations(chars, length, result="", place=0):
    # Once `length` characters have been placed, print the finished string.
    if place >= length:
        print(result)
        return
    # Try every character of the set at this position (repetition allowed).
    for ch in chars:
        combinations(chars, length, result + ch, place + 1)
This function takes a string and the desired result length, and prints every combination of its characters that has the specified length.
if you want the combinations of length 9 or 10, just call
combinations("aAbBc01+",9)
combinations("aAbBc01+",10)
and redirect the output to the text file
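As an alternative to the recursive function, here is a minimal sketch built on itertools.product, which yields the strings lazily and in the order of the character set (so it starts at aaaaaaaaa and ends at ++++++++++). The names words and write_words are made up for this illustration. Keep in mind that 8 characters at lengths 9 and 10 give 8^9 + 8^10, about 1.2 billion lines, so the output file will be on the order of 13 GB.

```python
from itertools import product


def words(chars, lengths):
    """Yield every string over `chars` for each length, in the order of `chars`."""
    for n in lengths:
        for combo in product(chars, repeat=n):
            yield "".join(combo)


def write_words(path, chars, lengths):
    """Stream all strings to a text file, one per line; return the line count."""
    count = 0
    with open(path, "w") as out:
        for w in words(chars, lengths):
            out.write(w + "\n")
            count += 1
    return count


if __name__ == "__main__":
    # Small demo with a 3-character set and lengths 1 and 2.
    print(list(words("ab+", (1, 2))))
```

For the real job the call would be write_words("out.txt", "aAbBc01+", (9, 10)); because the strings are generated one at a time, memory use stays flat.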

Related

BASH Script to find a string in a file by position, match, then modify that position and insert if it exists

I have several lines in a file (input.in) that may look like this (asterisks are not literal; added for emphasis):
200928,121546,00002,**0000004015K**,**0000000641}**,00102020
200928,121546,00002,**0000000227B**,**0000000970R**,84839923
200928,121546,00003,**0000001197A**,**0000000227B**,93877763
I need to be able to find the value of the last character in the fourth and fifth elements (or look at positions 31 and 43) to determine what the actual number should be and whether it's positive or negative. The result should look like the following after modifications:
200928,121546,00002,-00000040152,-00000006410,00102020
200928,121546,00002,00000002272,-00000009709,84839923
200928,121546,00003,00000011971,00000002272,93877763
{ABCDEFGHI mark a positive field and are replaced by 0123456789, respectively
}JKLMNOPQR mark a negative field and are replaced by 0123456789, respectively
I'm able to get all the positive number conversions working correctly but I am having problems with the negative conversions.
My code looks sorta like this for getting the positive switches (This is a "packed field" conversion btw):
sed -i -E "s/^(.{$a})\{/\10/" input.in
This is for the { positive case where the sub will be 0.
Where $a is introduced by a for a in 30 42 do loop. I have no issues identifying and updating the last char for that string, but I can't figure out how to flip only the negative values when the corresponding character is found. I was thinking of looking at the entire group of 11 characters (4th and 5th elements) and, if the last char in that group is one of }JKLMNOPQR, inserting - at the first position and replacing }JKLMNOPQR with 0123456789, respectively. I'm stuck here, though. Of course the objective is to update the file with the changes after the substitutions have been completed.
Code sample:
input="input.in"
for a in 30 42
do
while IFS= read -r line
do
echo "${line:$a:1} found, converting"
edbvalue=${line:$a:1}
case $edbvalue in
{)
echo -n -e "{ being replaced with 0\n"
sed -i -E "s/^(.{$a})\{/\10/" input.in
;;
A)
echo -n -e "A being replaced with 1\n"
sed -i -E "s/^(.{$a})A/\11/" input.in
;;
.
.
.
R)
echo -n -e "R being replaced with 9\n"
sed -i -E "s/^(.{$a})R/\19/" input.in
;;
*)
echo -n -e "no conversion needed\n"
;;
esac
done < "$input"
done
Rewriting the input file repeatedly is horrendously inefficient. You want to perform all the replacements in one go.
sed is rather hard to read once you start doing nontrivial things, so I would recommend switching to Awk (or a proper modern scripting language like Python if you want to invest more into this).
awk -F , 'BEGIN { OFS=FS
    pos = "{ABCDEFGHI"; neg = "}JKLMNOPQR";
    for (i=0; i<10; ++i) { p[substr(pos, i+1, 1)] = i; n[substr(neg, i+1, 1)] = i }
}
{ for (i=4; i<=5; i++) {
    where = length($i)
    what = substr($i, where, 1)
    if (what ~ "^[" pos "]$") sign = ""
    else if (what ~ "^[" neg "]$") sign = "-"
    else { print "Error: field " i " " $i " malformed" >"/dev/stderr"; continue }  # leave a malformed field untouched
    $i = sign substr($i, 1, where-1) (sign ? n[what] : p[what])
  }
}1' input.in
Demo: https://ideone.com/z8wK0V
This isn't entirely obvious, but here's a quick breakdown.
In the BEGIN block, we create two associative arrays, such that
p["{"] = 0, n["}"] = 0
p["A"] = 1, n["J"] = 1
p["B"] = 2, n["K"] = 2
p["C"] = 3, n["L"] = 3
p["D"] = 4, n["M"] = 4
p["E"] = 5, n["N"] = 5
p["F"] = 6, n["O"] = 6
p["G"] = 7, n["P"] = 7
p["H"] = 8, n["Q"] = 8
p["I"] = 9, n["R"] = 9
(We also set OFS to FS so that Awk will print the output comma-separated, like it reads the input.)
Down in the main block, we loop over fields 4 and 5, extracting the last character and mapping it to the corresponding entry from the correct one of the two arrays, and add a sign if warranted.
This simply writes to standard output; save to a new file and move it back over the original input file, or if you have GNU Awk, explore its -i inplace option.
If you really wanted to do this in sed, it offers a rather convenient y/{ABCDEFGHI/0123456789/ but picking apart the fields and then reassembling the line when you are done is not going to be pleasant.
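Since Python was mentioned as an option, here is a minimal sketch of the same mapping in that language. The function names decode_field and convert_line are invented for this illustration, and the field indexes (3 and 4, zero-based) assume the comma-separated layout shown in the question. Like the Awk version, it treats a field that ends in any other character as an error.

```python
POS = "{ABCDEFGHI"  # last-character codes of positive fields, standing for 0-9
NEG = "}JKLMNOPQR"  # last-character codes of negative fields, standing for 0-9


def decode_field(field):
    """Replace the trailing sign code with its digit; prefix '-' if negative."""
    last = field[-1]
    if last in POS:
        return field[:-1] + str(POS.index(last))
    if last in NEG:
        return "-" + field[:-1] + str(NEG.index(last))
    raise ValueError("malformed field: " + field)


def convert_line(line, columns=(3, 4)):
    """Decode the given zero-based columns of one comma-separated line."""
    parts = line.split(",")
    for c in columns:
        parts[c] = decode_field(parts[c])
    return ",".join(parts)


if __name__ == "__main__":
    print(convert_line("200928,121546,00002,0000004015K,0000000641},00102020"))
```

Applying convert_line to every line of input.in and writing the results to a new file handles all replacements in a single pass.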

change characters in a string based on vcf table data

I have a file (string.txt) containing one long string (abcdefghijklmnop)
and a vcf table (file.vcf) which looks like this
position 2 4 6 10 n...
name1 a b c d
name2 x y z a
namen...
The table also contains "mis" and "het", and in those cases the character should not be replaced.
I want to change the characters at the specified positions and store all the resulting strings in a new file that will look like this
>name1
aacbecghidklmnop
>name2
axcyezghiaklmnop
is there a way to do it in a bash loop ?
Would you please try the following:
mapfile -t string < <(fold -w1 "string.txt")
# set string to an array of single characters: ("a" "b" "c" "d" ..)
while read -ra ary; do
    if [[ ${ary[0]} = "position" ]]; then
        # 1st line of file.vcf
        declare -a pos=("${ary[@]:1}")
        # now the array pos holds: (2 4 6 10 ..)
    else
        # 2nd line of file.vcf and after
        declare -a new=("${string[@]}")
        # make a copy of string to modify
        for ((i=0; i<${#pos[@]}; i++ )); do
            repl="${ary[$i+1]}"    # replacement
            if [[ $repl != "mis" && $repl != "het" ]]; then
                new[${pos[$i]}-1]="$repl"
                # modify the position with the replacement
            fi
        done
        echo ">${ary[0]}"
        (IFS=""; echo "${new[*]}")
        # print the modified array as a concatenated string
    fi
done < "file.vcf"
string.txt:
abcdefghijklmnop
file.vcf:
position 2 4 6 10
name1 a b c d
name2 x y z a
name3 i mis k l
Output:
>name1
aacbecghidklmnop
>name2
axcyezghiaklmnop
>name3
aicdekghilklmnop
I have tried to embed explanations as comments in the script above, but
if you still have a question, please feel free to ask.
Hope this helps.
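For comparison, the same logic can be sketched in Python. This is only an illustration under the assumptions of the sample data: whitespace-separated columns, a header line starting with "position", 1-based positions, and single-character replacements. The function name apply_table is made up here.

```python
def apply_table(base, table_lines):
    """Return (name, modified string) pairs for each data row of the table."""
    positions = []
    records = []
    for line in table_lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if fields[0] == "position":
            # Header row: remember the 1-based positions to modify.
            positions = [int(p) for p in fields[1:]]
            continue
        chars = list(base)  # fresh copy of the string for every row
        for pos, repl in zip(positions, fields[1:]):
            if repl not in ("mis", "het"):
                chars[pos - 1] = repl
        records.append((fields[0], "".join(chars)))
    return records


if __name__ == "__main__":
    table = [
        "position 2 4 6 10",
        "name1 a b c d",
        "name2 x y z a",
        "name3 i mis k l",
    ]
    for name, seq in apply_table("abcdefghijklmnop", table):
        print(">" + name)
        print(seq)
```

Run as is, it prints the same three records as the Output listing above.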

Random word Bash script if a number is supplied as the first command line argument then it will select from only words with that many characters

I am trying to create a Bash script that
- prints a random word
- if a number is supplied as the first command line argument then it will select from only words with that many characters.
This is my go at the first section (print a random word):
C=$(sed -n "$RANDOM p" /usr/share/dict/words)
echo $C
I am really stuck with the second section. Can anyone help?
might help someone coming from ryans tutorial
#!/bin/bash
charlen=$1
grep -E "^.{$charlen}$" $PWD/words.txt | shuf -n 1
You can use a while loop to read every line of that file and check whether the length of the word equals the specified number (apostrophes included). On my OS the file has 99171 lines.
#!/usr/bin/env bash
readWords() {
    declare -i int="$1"
    (( int == 0 )) && {
        printf "%s\n" "$int is 0, can't find 0 words"
        return 1
    }
    while read getWords; do
        if [[ ${#getWords} -eq $int ]]; then
            printf "%s\n" "$getWords"
        fi
    done < /usr/share/dict/words
}
readWords 20
This function takes a single argument. The declare -i command coerces the argument into an integer; if the argument is a non-numeric string, it is coerced to 0. Since there are no words of length 0, the function returns early if the argument is 0 (or a string coerced to 0).
It then reads every line in /usr/share/dict/words, gets the length of each line with ${#getWords} (${#var} gives the length of a string, $# the number of command-line parameters, and ${#array[@]} the size of an array) and checks whether it equals the specified number.
A loop is not required, you can do something like
CH=$1; # how many characters the word must have
WordFile=/usr/share/dict/words; # file to read from
# find how many words that matches that length
TOTW=$(grep -Ec "^.{$CH}$" $WordFile);
# pick a random one, if you expect more than 32767 hits you
# need to do something like ($RANDOM+1)*($RANDOM+1)
RWORD=$(($RANDOM%$TOTW+1));
#show that word
grep -E "^.{$CH}$" $WordFile|sed -n "$RWORD p"
Depending on your needs, you probably want to add checks: that $1 is a reasonable number, that the file exists, that TOTW is greater than 0, and so on.
This code would achieve what you want:
awk -v n="$1" 'length($0) == n' /usr/share/dict/words > /tmp/wordsHolder
shuf -n 1 /tmp/wordsHolder
Some comments: "$RANDOM" (as used in your original attempt) generates an integer in the range 0 - 32767, which could be more (or less) than the number of words (lines) available for the desired number of characters, so there is potential for errors here.
To avoid that, we use shuf, which retrieves a pseudo-randomly picked word (line) from the file using its entire range (from line 1 to the last line of the file).
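The same filter-then-pick idea can be sketched in Python. The function names load_words and random_word are invented for this example, and the default path is the dictionary file used in the question; random.choice picks uniformly from whatever candidates remain, so the range problem described above does not arise.

```python
import random


def load_words(path="/usr/share/dict/words"):
    """Read the word list, one word per line, skipping empty lines."""
    with open(path) as f:
        return [line.strip() for line in f if line.strip()]


def random_word(words, length=None):
    """Pick a random word; if length is given, only from words of that length."""
    candidates = [w for w in words if length is None or len(w) == length]
    if not candidates:
        raise ValueError("no word of the requested length")
    return random.choice(candidates)


if __name__ == "__main__":
    sample = ["cat", "horse", "dog", "zebra", "ox"]
    print(random_word(sample))     # any of the five words
    print(random_word(sample, 3))  # "cat" or "dog"
```

In a real script the optional length would come from the first command-line argument and the list from load_words().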

Expand a range of numbers with grep

I work in telecoms and regularly need to expand number ranges.
For example, 6121234567X [note that there are 10 digits preceding the X] is shorthand for:
61212345670
61212345671
61212345672....... etc (a 10 number range)
and 612123456X [note that there are only 9 digits preceding the X] is shorthand for
61212345600
61212345601....... etc (a 100 number range)
So I need a grep command that...
reads how many characters in the line precede the X (to determine how many suffixes)
writes the appropriate number of lines (10, 100, or 1000) with ascending suffixes
hopefully removes the original line
Below is the Python script that does it, file-name is the expected first argument. Example usage: python script.py file.in > file.out
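#!/usr/bin/env python
import sys

def generate(pattern):
    # Index of the X placeholder; expanded numbers are 11 digits long.
    p = pattern.lower().find('x')
    if p < 0:
        return pattern  # no placeholder: pass the line through unchanged
    width = 11 - p
    # One expanded number per line, zero-padded to fill the remaining digits.
    return "\n".join(pattern[:p] + str(i).zfill(width) for i in range(10 ** width))

if __name__ == "__main__":
    if len(sys.argv) <= 1:
        print("Filename needed!")
    else:
        with open(sys.argv[1]) as f:
            for ln in f:
                print(generate(ln.rstrip()))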
You can do this in awk quite quickly:
awk -v val="$a" -v max=11 '
BEGIN {
    gsub("X", "", val)
    items = max - length(val)
    for (i = 0; i < 10^items; i++)
        print val * (10^items) + i
}'
This works as an example. To do the same reading from a file, use $1 (the first field of each record) instead of val and move all the code from BEGIN into the main block.
Explanation
-v val="$a" -v max=11 passes parameters: $a is the variable containing the string in the form 12345678X, and max is the total number of digits the expanded number will have (11 in your case).
BEGIN {} performs all these actions before (or without) reading a file.
gsub("X","",val) removes X from val.
items = max - length(val) computes how many digits have to be filled in, that is, max minus the length of val without the X.
for (i=0; i<10^items; i++) print val*(10^items)+i loops from 0 to 10^items - 1. This means from 0 to 9, or from 0 to 99, and so on, depending on the value of items.
Test
With 9 as maximum:
$ a=12345678X
$ awk -v val="$a" -v max=9 'BEGIN {gsub("X","",val); items=max - length(val); for (i=0; i<10^items; i++) print val*(10^items)+i}'
123456780
123456781
123456782
123456783
123456784
123456785
123456786
123456787
123456788
123456789
echo 6121234567X | perl -nE 'm/(.*)X/;
say $1. $_ foreach (0..10**(11-length $1)-1)'
61212345670
61212345671
61212345672
61212345673
61212345674
61212345675
61212345676
61212345677
61212345678
61212345679
It's a little uglier to get the zero padded format:
echo 611234567X | perl -wne 'm/(.*)X/; $b=$1; $r=11 - length $b;
$fmt="%0" . $r . "s\n";
printf "$b$fmt", $_ foreach (0..10**$r-1) '

KornShell Script: List all even numbers in a range

What I am trying to do is list all the numbers that are even, between the two numbers the user enters via a KornShell (ksh) script. So if user enters for the first digit 2 then the second digit 25 it would display
2,4,6,8,10,12,14,16,18,20,22,24
first=2 # from user
last=25 # from user
seq $first 2 $last
This should work with ksh93 and bash, doesn't require seq or perl which might not be installed depending on the OS used.
function evens {
    for ((i=($1+($1%2)); i<($2-1); i+=2)); do printf "%s," $i; done
    echo $i
}
$ evens 2 25
2,4,6,8,10,12,14,16,18,20,22,24
$ evens 3 24
4,6,8,10,12,14,16,18,20,22,24
$ evens 0 9
0,2,4,6,8
In ksh, assuming you have used variables start and end:
set -A evens # use an array to store the numbers
n=0
i=$start
(( i % 2 == 1 )) && (( i+=1 )) # start at an even number
while (( i <= end )); do
evens[n]=$i
(( n+=1 ))
(( i+=2 ))
done
IFS=,
echo "${evens[*]}" # output comma separated string
outputs
2,4,6,8,10,12,14,16,18,20,22,24
there are many ways to do it in shell, shell script, awk, seq etc...
since you tagged question with vi, I added one with vim:
fun! GetEven(f,t)
let ff=a:f%2?a:f+1:a:f
echom join(range(ff,a:t,2),",")
endf
source that function, and type :call GetEven(2,25) you will see your expected output.
It currently echoes in command area, if you want it to be shown in file, just use put or setline, easy too.
Using perl:
perl -e 'print join q{,}, grep { $_ % 2 == 0 } (shift .. shift)' 2 25
It yields:
2,4,6,8,10,12,14,16,18,20,22,24
EDIT to fix the trailing newline:
perl -e 'print join( q{,}, grep { $_ % 2 == 0 } (shift .. shift) ), "\n"' 2 25
By setting first=$(($1+($1%2))) and using the -s option to format the output you can use seq:
first=$(($1+($1%2)))
last=$2
seq -s, $first 2 $last
Save as a script called evens and call with even values of $first:
$ ./evens 2 25
2,4,6,8,10,12,14,16,18,20,22,24
Or odd values of $first:
$ ./evens 3 25
4,6,8,10,12,14,16,18,20,22,24
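For readers who want to check the arithmetic outside the shell, here is a small Python sketch of the same rounding trick (first + first % 2 moves an odd start up to the next even number). The function names evens and format_evens are chosen for this illustration only.

```python
def evens(first, last):
    """Return the even numbers in the inclusive range first..last."""
    start = first + (first % 2)  # round an odd start up to the next even number
    return list(range(start, last + 1, 2))


def format_evens(first, last):
    """Comma-separated form, matching the output format asked for."""
    return ",".join(str(n) for n in evens(first, last))


if __name__ == "__main__":
    print(format_evens(2, 25))
    print(format_evens(3, 25))
```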

Resources