Generate non-repeating random number sequences in bash

I have been futzing about learning bash tonight and I have been trying to create a random number sequence that uses all the numbers of a range and uses each number just once. So something like inputting the range of 1-5 will output something like 4-3-5-2-1 or 2-5-1-3-4 and so on. I am as stuck as can be on this one.
Thanks!

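The following command is not specific to bash, but it works:
seq 1 5 | shuf
A more bash-specific approach uses substring expansion (it only works while every number in the range is a single digit):
x=12345
for ((i=5; i>0; i--)); do
  ((r = RANDOM % i + 1))   # pick a position among the i remaining digits
  echo "${x:r-1:1}"        # print the digit at that position
  x=${x:0:r-1}${x:r}       # remove it from the pool
done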
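The substring trick above relies on every number being one character wide. For wider ranges, one option is to shuffle a bash array in place. The following is a minimal sketch, not the answerer's method: shuffle_range is a hypothetical helper name, and the swap loop is a Fisher-Yates shuffle. Because RANDOM only yields values from 0 to 32767, it is only suitable for ranges much smaller than that.

```shell
#!/usr/bin/env bash
# shuffle_range LOW HIGH: print LOW..HIGH in random order, one per line.
# Hypothetical helper name; uses a Fisher-Yates shuffle over a bash array.
shuffle_range() {
  local low=$1 high=$2 i j tmp
  local -a nums=()
  for ((i = low; i <= high; i++)); do
    nums+=("$i")
  done
  # Walk from the end, swapping each slot with a random slot at or before it.
  for ((i = ${#nums[@]} - 1; i > 0; i--)); do
    j=$((RANDOM % (i + 1)))
    tmp=${nums[i]}
    nums[i]=${nums[j]}
    nums[j]=$tmp
  done
  printf '%s\n' "${nums[@]}"
}

shuffle_range 1 5
```

To get the dash-separated form from the question, the output can be joined with paste, for example shuffle_range 1 5 | paste -sd- -.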

Related

How do you repeatedly execute a command in bash with a loop statement

I'm trying to execute the command:
echo 'Hello World'
Ten times, but I don't want to type it out ten times, and I thought I could do it with a loop, but I don't know how.
You seem new to Bash loops so I will take some time to explain my approach to this question.
Approach
Yes, you can definitely use loops in Bash. In this example I will use the for loop together with the seq command. When seq receives only one argument, an integer, it generates the numbers 1, 2, 3, ... up to and including that integer. Since I want Hello World printed ten times, I will just specify 10 as the sole argument of seq. More information is available in the official seq documentation.
So my code below does basically this: for every number (stated as i here) from 1 to 10, print the string Hello World.
#!/bin/bash
for i in $(seq 10)
do
echo "Hello World"
done
And it is basically the same as:
#!/bin/bash
for i in 1 2 3 4 5 6 7 8 9 10
do
echo "Hello World"
done
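Two further variants avoid the external seq call. This is only a sketch: repeat is a hypothetical helper name, not a bash builtin. Brace expansion is convenient when the count is written literally, while the C-style loop inside the helper also works when the count is held in a variable.

```shell
#!/usr/bin/env bash
# repeat N CMD...: run CMD N times (hypothetical helper name).
repeat() {
  local n=$1 i
  shift
  for ((i = 0; i < n; i++)); do
    "$@"
  done
}

# Brace expansion works when the count is a literal:
for i in {1..3}; do
  echo "Hello World"
done

# The helper works when the count comes from a variable:
count=2
repeat "$count" echo "Hello World"
```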

Pipe input redirection to random spot in my new function

Sorry for the bad wording of the question, couldn't really think about a decent way to say this.
Problem:
I want a certain sequence to show up, and given is a number, 19 for example.
$ echo 19 | seq 1 2 [INPUT FROM PIPE HERE]
So I want the sequence to go from 1, with an increment of 2, till it reaches the input number 19.
And I don't know how to do this, although it's probably very easy; I'm still very new to the shell.
PS: Sorry if this is a duplicate, I couldn't find what I was looking for after 15 min of searching.
I'm not sure whether you can pipe parameters into seq, but you can capture the output of a command and use it as an argument to seq:
e=$(echo 19) && seq 1 2 "$e"
This runs a command, echo 19 in this case, and assigns its output to the variable e. The next command runs only if this assignment was successful (this is ensured by &&), and it then uses that variable as a parameter via "$e". The double quotation marks are not necessary in this case, but they might be useful in some other cases of this method.
A more straightforward way is:
seq 1 2 $(echo 19)
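If the number really does have to arrive through a pipe, xargs can turn standard input into command-line arguments. The sketch below is one possible way to do that, not part of the original answer; odds_up_to is a hypothetical helper name, and it assumes exactly one number arrives on standard input.

```shell
#!/usr/bin/env bash
# odds_up_to: read an upper bound from standard input and print 1, 3, 5, ...
# up to that bound. Hypothetical helper name, for illustration only.
odds_up_to() {
  # xargs appends the words it reads from stdin to the end of the command,
  # so "19" on stdin turns this into: seq 1 2 19
  xargs seq 1 2
}

echo 19 | odds_up_to
```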

text manipulation using unix commands only

I have a task where I need to parse through files and extract information. I can do this easy using bash but I have to get it done through unix commands only.
For example, I have a file similar to the following:
Set<tab>one<tab>two<tab>three
Set<tab>four<tab>five<tab>six
ENDSET
Set<tab>four<tab>two<tab>nine
ENDSET
Set<tab>one<tab>one<tab>one
Set<tab>two<tab>two<tab>two
ENDSET
...
So on and so forth. I want to be able to extract a certain number of sets, say the first 10. Also, I want to be able to extract info from the columns.
Once again, this is a trivial thing to do using bash scripting, but I am unsure of how to do this with unix commands only. I can combine the commands together in a shell script but, once again, only unix commands.
Without an output example, it's hard to know your goal, but anyway, one UNIX command you can use is AWK.
Examples:
Extract 2 sets from your data sample (without including "ENDSET" or blank lines):
$ awk '/ENDSET/{ if(++count==2) exit(0);next; }NF{print}' file.txt
Set one two three
Set four five six
Set four two nine
Extract 3 sets and print 2nd column only (Note 1st column is always "Set"):
$ awk '/ENDSET/{ if(++count==3) exit(0);next; }$2{print $2}' file.txt
one
four
four
one
two
And so on... (more info: $ man awk)
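The two one-liners differ only in the set count and the printed column, so they can be folded into one parameterised command. This is a rough sketch under stated assumptions: first_sets is a hypothetical helper name, fields are assumed to be tab-separated, every set is assumed to end with a line starting with ENDSET, and N is assumed to be at least 1.

```shell
#!/usr/bin/env bash
# first_sets N COL FILE: print column COL from the first N sets in FILE.
# COL=0 prints whole lines. Hypothetical helper; assumes tab-separated
# fields and that each set ends with a line starting with ENDSET.
first_sets() {
  awk -F '\t' -v n="$1" -v col="$2" '
    /^ENDSET/ { if (++count == n) exit; next }  # end of a set: stop after n sets
    NF        { print $col }                    # skip blank lines
  ' "$3"
}

# Small demo on standard input ("-" makes awk read stdin):
printf 'Set\tone\ttwo\tthree\nENDSET\nSet\tfour\ttwo\tnine\nENDSET\n' | first_sets 1 2 -
```

With the sample file from the question, first_sets 2 0 file.txt would correspond to the first awk command and first_sets 3 2 file.txt to the second.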

AWK - replace with constant character in a specified number of random lines

I'm tasked with imputing masked genotypes, and I have to mask (hide) 2% of genotypes.
The file I do this in looks like this (genotype.dat):
M rs4911642
M rs9604821
M rs9605903
M rs5746647
M rs5747968
M rs5747999
M rs2070501
M rs11089263
M rs2096537
and to mask it, I simply change M to S2.
Yet, I have to do this for 110 (2%) of 5505 lines, so my strategy of using a random number generator (generating 110 numbers between 1 and 5505 and then manually changing the corresponding line number's M to S2) took almost an hour... (I know, not terribly sophisticated).
I thought about saving the numbers in a separate file (maskedlines.txt) and then telling awk to replace the first character in that line number with S2, but I could not find any adaptable example of how to do this.
Anyway, any suggestions of how to tackle this will be deeply appreciated.
Here's one simple way, if you have shuf (it's in GNU coreutils, so if you have Linux, you almost certainly have it):
sed "$(printf '%ds/M/S2/;' $(shuf -n110 -i1-5505 | sort -n))" \
genotype.dat > genotype.masked
A more sophisticated version wouldn't depend on knowing that you want 110 of 5505 lines masked; you can easily extract the line count with lines=$(wc -l < genotype.dat), and from there you can compute the percentage.
shuf is used to produce a random sample of lines, usually from a file; the -i1-5505 option means to use the integers from 1 to 5505 instead, and -n110 means to produce a random sample of 110 (without repetition). I sorted that for efficiency before using printf to create a sed edit script.
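The "more sophisticated version" mentioned above could look roughly like the following. It is a minimal sketch, assuming GNU shuf is available; mask_percent is a hypothetical helper name, and the percentage is rounded down by integer division (for 5505 lines and 2 percent this gives 110).

```shell
#!/usr/bin/env bash
# mask_percent FILE PERCENT: print FILE with PERCENT% of its lines masked,
# i.e. the first "M" on each chosen line replaced by "S2".
# Hypothetical helper; assumes GNU shuf is available.
mask_percent() {
  local file=$1 percent=$2 lines count script
  lines=$(( $(wc -l < "$file") ))      # arithmetic strips any padding from wc
  count=$((lines * percent / 100))     # integer division rounds down
  if ((count == 0)); then
    cat "$file"                        # nothing to mask
    return
  fi
  # Build one "<line>s/M/S2/" command per randomly chosen line number.
  script=$(shuf -n "$count" -i "1-$lines" | sort -n | sed 's|$|s/M/S2/|')
  sed "$script" "$file"
}
```

A call such as mask_percent genotype.dat 2 > genotype.masked would then replace the hard-coded 110 and 5505.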
awk 'NR==FNR{a[$1]=1;next;} a[FNR]{$1="S2"} 1' maskedlines.txt genotype.dat
How it works
In sum, we first read maskedlines.txt into an associative array a. This file is assumed to have one number per line, and for each number n the element a[n] is set to 1. We then read in genotype.dat. If a[FNR] is 1 for the current line number, we change the first field to S2 to mask it. The line, whether changed or not, is then printed.
In detail:
NR==FNR{a[$1]=1;next;}
In awk, FNR is the number of records (lines) read so far from the current file and NR is the total number of lines read so far. So, when NR==FNR, we are reading the first file (maskedlines.txt). This file contains the line numbers of the lines in genotype.dat that are to be masked. For each of these line numbers, we set the corresponding element of a to 1. We then skip the rest of the commands and jump to the next line.
a[FNR]{$1="S2"}
If we get here, we are working on the second file: genotype.dat. For each line in this file, we check to see if its line number, FNR, was mentioned in maskedlines.txt. If it was, we set the first field to S2 to mask this line.
1
This is awk's cryptic shorthand to print the current line.
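The same three awk rules can be wrapped in a small function and tried on a tiny input. This is only an illustrative sketch; mask_lines is a hypothetical name. One caveat worth keeping in mind: if the list file is empty, NR==FNR also holds while reading the data file, so nothing is masked or printed.

```shell
#!/usr/bin/env bash
# mask_lines LIST FILE: print FILE with the first field set to "S2" on every
# line whose number appears in LIST (one line number per line).
# Hypothetical wrapper around the awk one-liner above.
mask_lines() {
  awk 'NR==FNR { a[$1] = 1; next }   # first file: remember line numbers
       a[FNR]  { $1 = "S2" }         # second file: mask remembered lines
       1                             # print every line of the second file
  ' "$1" "$2"
}
```

For the files in the question, the call would be mask_lines maskedlines.txt genotype.dat > genotype.masked.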

How to repeat a function a random number of times with an upper limit on the random number in bash

I am trying to repeatedly run a python script a random number of times using bash. However, to avoid running the script a massive amount of times I want to place an upper limit on the amount of times it can run. Currently I am using the 'modulo' operator to return a remainder and then using that as a string when performing a loop:
#!/bin/bash
RANGE=1000
number=$RANDOM
let "number %= $RANGE"
for run in {1..$number}
do
python script.py
done
The random number works (i.e. $number is a random number between 1-1000), but the problem is that this only seems to be running the script once, no matter what the random number is.
What might the problem be?
Problem is this line:
for run in {1..$number}
Brace expansion is performed before variable expansion, so {1..$number} is never expanded into a range. The loop therefore runs only once, with run set to a single word such as {1..523}, no matter what the value of $number is.
Use it like this:
#!/bin/bash
range=1000
number=$((RANDOM % range))
for ((run=1; run <= number; run++)); do
python script.py
done
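Note that RANDOM % range yields a value from 0 to range-1, so the loop above may run zero times. If at least one run is wanted, adding 1 shifts the result to 1..range. The sketch below wraps this in a function; run_random_times is a hypothetical helper name, and the limit is assumed to be at most 32768 because RANDOM only yields values from 0 to 32767.

```shell
#!/usr/bin/env bash
# run_random_times LIMIT CMD...: run CMD between 1 and LIMIT times.
# Hypothetical helper name; LIMIT should be at most 32768, since RANDOM
# only yields values from 0 to 32767.
run_random_times() {
  local limit=$1 count run
  shift
  count=$((RANDOM % limit + 1))   # 1..limit rather than 0..limit-1
  for ((run = 1; run <= count; run++)); do
    "$@"
  done
}

run_random_times 5 echo "running"
```

For the script in the question, the call would be run_random_times 1000 python script.py.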
