How to Get Random Array From Case - bash

I tried to get a random array in bash.
I used the following code:
agents=$(shuf $DISTR)
case $DISTR in
"1")
echo "amos"
;;
"2")
echo "dia"
;;
"3")
echo "dia lagi"
;;
esac
echo "$agents"
But I am getting a blank result.
Can anyone help me? I want the output to be one of the case branches.

Assumption
$DISTR is the path to a non-empty file.
Fix Existing Code
agents=($(shuf "$DISTR")) # create shuffled array
echo "${agents[0]}" # print the first entry
The first entry is a random entry since the array was shuffled.
Note that this solution only works well for files with one word (no whitespace) per line. If a line contains more than one word, each word will be stored in a separate array entry after the lines are shuffled. For a file with the lines
...
first second
1st 2nd
...
we may print first or 1st, but never second or 2nd.
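To see the caveat in action, here is a throwaway demo (/tmp/demo.txt is just an illustrative path):
printf '%s\n' 'first second' '1st 2nd' > /tmp/demo.txt
agents=($(shuf /tmp/demo.txt))
echo "${#agents[@]}"   # 4 word entries, not 2 line entries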
Better Solution
Shuffling the complete array is a lot of work if you are only interested in one random entry. It is faster to read the array in its original order and pick a random index.
mapfile -t agents < "$DISTR"       # create array, one line per entry (-t strips newlines)
length=${#agents[@]}
randomIndex=$(( RANDOM % length )) # random number between 0 and length-1
echo "${agents[randomIndex]}"
Or in short
mapfile -t agents < "$DISTR"
echo "${agents[RANDOM % ${#agents[@]}]}"
RANDOM is a built-in variable that expands to a pseudo-random number (do not use it for encryption and the like). mapfile stores one line per array entry, even if the lines contain whitespace.
Best Solution
Again we did too much work. Why read the whole file into an array when we could pick just one random line? (Typical case of an XY problem).
shuf -n 1 "$DISTR" # print one random line

To generate random numbers with bash, use the $RANDOM internal Bash variable. Note that $RANDOM should not be used to generate an encryption key. $RANDOM is seeded using your current process ID (PID) and the current time/date, as defined by the number of seconds elapsed since 1970.
$(( ( RANDOM % 10 ) + 3 ))   # expands to a random number between 3 and 12
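Applied to the original question, a minimal sketch (assuming the goal is simply to print one of the three names from the case statement at random):
#!/bin/bash
agents=("amos" "dia" "dia lagi")           # the three case branches as an array
echo "${agents[RANDOM % ${#agents[@]}]}"   # pick one at random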

Related

Is there a faster way to combine files in an ordered fashion than a for loop?

For some context, I am trying to combine multiple files (in an ordered fashion) named FILENAME.xxx.xyz (xxx starts from 001 and increases by 1) into a single file (denoted as $COMBINED_FILE), then replace a number of lines of text in the $COMBINED_FILE taking values from another file (named $ACTFILE). I have two for loops to do this which work perfectly fine. However, when I have a larger number of files, this process tends to take a fairly long time. As such, I am wondering if anyone has any ideas on how to speed this process up?
Step 1:
for i in {001..999}; do
[[ ! -f ${FILENAME}.${i}.xyz ]] && break
cat ${FILENAME}.${i}.xyz >> ${COMBINED_FILE}
mv -f ${FILENAME}.${i}.xyz ${XYZDIR}/${JOB_BASENAME}_${i}.xyz
done
Step 2:
for ((j=0; j<=${NUM_CONF}; j++)); do
let "n = 2 + (${j} * ${LINES_PER_CONF})"
let "m = ${j} + 1"
ENERGY=$(awk -v NUM=$m 'NR==NUM { print $2 }' $ACTFILE)
sed -i "${n}s/.*/${ENERGY}/" ${COMBINED_FILE}
done
I forgot to mention: there are other files named FILENAME.*.xyz which I do not want to append to the $COMBINED_FILE
Some details about the files:
FILENAME.xxx.xyz are molecular xyz files of the form:
Line 1: Number of atoms
Line 2: Title
Line 3-Number of atoms: Molecular coordinates
Line (number of atoms +1): same as line 1
Line (number of atoms +2): Title 2
... continues on (where line 1 through Number of atoms is associated with conformer 1, and so on)
The ACT file is a file containing the energies which has the form:
Line 1: conformer1 Energy
Line 2: conformer2 Energy2
Where conformer1 is in column 1 and the energy is in column 2.
The goal is to make the energy for each conformer the title for that conformer in the combined file (the energy must be the title of its specific conformer).
If you know that at least one matching file exists, you should be able to do this:
cat -- ${FILENAME}.[0-9][0-9][0-9].xyz > ${COMBINED_FILE}
Note that this will match the 000 file, whereas your script counts from 001. If you know that 000 either doesn't exist or isn't a problem if it were to exist, then you should just be able to do the above.
However, moving these files to renamed names in another directory does require a loop, or one of the less-than-highly portable pattern-based renaming utilities.
If you could change your workflow so that the filenames are preserved, it could just be:
mv -- ${FILENAME}.[0-9][0-9][0-9].xyz ${XYZDIR}/${JOB_BASENAME}
where we now have a directory named after the job basename, rather than a path component fragment.
The Step 2 processing should be doable entirely in Awk, rather than a shell loop; you can read the file into an associative array indexed by line number, and have random access over it.
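As a rough sketch of that idea (assuming LINES_PER_CONF and the file layout from the question, where the title sits on line 2 of each conformer block, and writing through a hypothetical temp file):
awk -v lines="$LINES_PER_CONF" '
    NR == FNR { energy[FNR] = $2; next }                          # pass 1: energies from the ACT file
    (FNR - 2) % lines == 0 { $0 = energy[(FNR - 2) / lines + 1] } # title lines get their energy
    { print }
' "$ACTFILE" "$COMBINED_FILE" > combined.tmp && mv combined.tmp "$COMBINED_FILE"
This replaces the whole Step 2 loop with a single pass over the combined file, instead of one sed -i rewrite per conformer.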
Awk can also accept multiple files, so the following pattern may be workable for processing the individual files:
awk 'your program' ${FILENAME}.[0-9][0-9][0-9].xyz
for instance just before concatenating and moving them away. Then you don't have to rely on a fixed LINES_PER_CONF and such. Awk has the FNR variable, which is the record number in the current file; condition/action pairs can tell when processing has moved to the next file.
GNU Awk also has extensions BEGINFILE and ENDFILE, which are similar to the standard BEGIN and END, but are executed around each processed file; you can do some calculations over the record and in ENDFILE print the results for that file, and clear your accumulation variables for the next file. This is nicer than checking for FNR == 1, and having an END action for the last file.
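A minimal illustration of that per-file pattern (the per-file accumulation here is hypothetical):
gawk '
    BEGINFILE { sum = 0 }               # reset the accumulator for each new file
    { sum += NF }                       # accumulate something per record
    ENDFILE   { print FILENAME, sum }   # report once the file is done
' ${FILENAME}.[0-9][0-9][0-9].xyz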
If you really want to materialize all the file names without globbing, you can always jot them (jot is like seq, with more integer digits in its default mode before switching to scientific notation):
jot -w 'myFILENAME.%03d' - 0 999 |
mawk '_<(_+=(NR == +_)*__)' \_=17 __=91 # extracting fixed interval
# samples without modulo(%) math
myFILENAME.016
myFILENAME.107
myFILENAME.198
myFILENAME.289
myFILENAME.380
myFILENAME.471
myFILENAME.562
myFILENAME.653
myFILENAME.744
myFILENAME.835
myFILENAME.926

Arithmetic in shell script (arithmetic in string)

I'm trying to write a simple script that creates five text files, enumerated by a variable in a loop. Can anybody tell me how to make the arithmetic expression be evaluated? This doesn't seem to work:
touch ~/test$(($i+1)).txt
(I am aware that I could evaluate the expression in a separate statement or restructure the loop...)
Thanks in advance!
The correct answer would depend on the shell you're using. It looks a little like bash, but I don't want to make too many assumptions.
The command you list, touch ~/test$(($i+1)).txt, will correctly touch the file with whatever $i+1 is, but what it's not doing is changing the value of $i.
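For the immediate goal of five enumerated files, a minimal working loop (assuming bash) would be:
for i in {0..4}; do
    touch ~/"test$((i + 1)).txt"   # creates test1.txt through test5.txt
done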
What it seems to me like you want to do is:
Find the largest value of n amongst the files named testn.txt where n is a number larger than 0
Increment the number as m.
touch (or otherwise output) to a new file named testm.txt where m is the incremented number.
Using techniques listed here you could strip the parts of the filename to build the value you wanted.
Assume the following was in a file named "touchup.sh":
#!/bin/bash
# first param is the basename of the file (e.g. "~/test")
# second param is the extension of the file (e.g. ".txt")
# assume the files are named so that we can locate via $1*$2 (test*.txt)
largest=0
for candidate in $1*$2; do   # glob directly instead of parsing ls
intermed=${candidate#$1*}
final=${intermed%%$2}
# don't want to assume that the files are in any specific order by ls
if [[ $final -gt $largest ]]; then
largest=$final
fi
done
# Now, increment and output.
largest=$(($largest+1))
touch "$1$largest$2"

Bash Script: How to read a line from a file, that was passed as an argument, and store it in a variable

I need to make a program that takes as arguments a number of files that contain lines like this: num1:num2.
I need to store the left column of numbers in one array and the right column in another, and then do some things to them. I need some help with the first part.
The number of files passed as arguments is variable. Also, I don't know the names of the files nor how many lines they have. I just know that I will get at least 1 file with 1 line.
I am trying to make a loop over each argument file, read each file line by line, break down each line with some string manipulation, and then store the results in the 2 arrays. However, I haven't succeeded. I know that I also have other kinds of mistakes, but I can fix those.
When I try to run the program using:
sh <my_program_name>.sh <argument1_filename>
I just get no results on the terminal, a blank screen, as if it were calculating something in an endless loop.
#!/bin/bash
length=0
b=1
c=1
d=0
args=$#
j=0
temp=0
temp2=0
temp3=0
temp4=0
for temp in "$#"
do
while read line
do
stringtmp=line
tmp=`expr index "$stringtmp" :`
let tmp=tmp-1
stringtmp2='expr substr $stringtmp $1 $tmp'
lengh=`expr index "$stringtmp" \n`
let tmp=tmp+2
let lengh=lengh-1
stringtmp3='expr substr $stringtmp $tmp $lengh'
array1[$length]=stringtmp2
array2[$length]=stringtmp3
let length=length+1
done
...
done
Your while loop is waiting for input from stdin. If you want to loop through the contents of $temp, you could use:
while read line; do
...
done < "$temp"
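Putting that together with the original goal, a minimal sketch of the whole task (assuming every line has the form num1:num2):
#!/bin/bash
left=() right=()
for file in "$@"; do                # loop over every argument file
    while IFS=: read -r a b; do     # split each line on the colon
        left+=("$a")
        right+=("$b")
    done < "$file"                  # feed the loop from the file, not stdin
done
printf 'left:  %s\n' "${left[@]}"
printf 'right: %s\n' "${right[@]}"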

Bash: list files ordered by the number they contain

Goal
I have a series of files with various names. Each file's name contains a unique integer number made of between 1 and 4 digits. I would like to list the files (by displaying their full names), ordered by the number their names contain.
Example
The files ..
Hello54.txt
piou2piou.txt
hap_1002_py.txt
JustAFile_78.txt
darwin2012.txt
.. should be listed as
piou2piou.txt
Hello54.txt
JustAFile_78.txt
hap_1002_py.txt
darwin2012.txt
You just need to sort by the numeric part. One approach might be to eliminate the non-numeric part and build an array. On the upside, bash has sparse arrays, so whatever order you add the members in, they will come out in the correct order. Pseudo code:
array[name_with_non_numerics_stripped]=name
print array
My quick and not careful implementation:
sortnum() {
# Store the output in a sparse array to get implicit sorting
local -a map
local key
# (REPLY is the default name `read` reads into.)
local REPLY
while read -r; do
# Substitution parameter expansion to nuke non-numeric chars
key="${REPLY//[^0-9]}"
# If key occurs more than once, you will lose a member.
map[key]="$REPLY"
done
# Split output by newlines.
local IFS=$'\n'
echo "${map[*]}"
}
If you have two members with the same "numeric" value, only the last one will be printed. @BenjaminW suggested appending such entries together with a newline. Something like…
[[ ${map[key]} ]] && map[key]+=$'\n'
map[key]+="$REPLY"
…in place of the map[key]="$REPLY" above.
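For the example files from the question, usage could look like this (feeding the names on stdin, one per line):
printf '%s\n' *.txt | sortnum
# piou2piou.txt
# Hello54.txt
# JustAFile_78.txt
# hap_1002_py.txt
# darwin2012.txt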

AWK - replace with constant character in a specified number of random lines

I'm tasked with imputing masked genotypes, and I have to mask (hide) 2% of genotypes.
The file I do this in looks like this (genotype.dat):
M rs4911642
M rs9604821
M rs9605903
M rs5746647
M rs5747968
M rs5747999
M rs2070501
M rs11089263
M rs2096537
and to mask it, I simply change M to S2.
Yet, I have to do this for 110 (2%) of 5505 lines, so my strategy of using a random number generator (generate 110 numbers between 1 and 5505, then manually change the corresponding line numbers' M to S2) took almost an hour... (I know, not terribly sophisticated).
I thought about saving the numbers in a separate file (maskedlines.txt) and then telling awk to replace the first character on each of those lines with S2, but I could not find any adaptable example of how to do this.
Anyway, any suggestions of how to tackle this will be deeply appreciated.
Here's one simple way, if you have shuf (it's in GNU coreutils, so if you have Linux, you almost certainly have it):
sed "$(printf '%ds/M/S2/;' $(shuf -n110 -i1-5505 | sort -n))" \
genotype.dat > genotype.masked
A more sophisticated version wouldn't depend on knowing that you want 110 of 5505 lines masked; you can easily extract the line count with lines=$(wc -l < genotype.dat), and from there you can compute the percentage.
shuf is used to produce a random sample of lines, usually from a file; the -i1-5505 option means to use the integers from 1 to 5505 instead, and -n110 means to produce a random sample of 110 (without repetition). I sorted that for efficiency before using printf to create a sed edit script.
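For illustration, if shuf happened to produce 17, 42 and 95 (hypothetical values), the command substitution would expand to an inline sed script like this:
sed "17s/M/S2/;42s/M/S2/;95s/M/S2/;" genotype.dat > genotype.masked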
awk 'NR==FNR{a[$1]=1;next;} a[FNR]{$1="S2"} 1' maskedlines.txt genotype.dat
How it works
In sum, we first read in maskedlines.txt into an associative array a. This file is assumed to have one number per line and a of that number is set to one. We then read in genotype.dat. If a for that line number is one, we change the first field to S2 to mask it. The line, whether changed or not, is then printed.
In detail:
NR==FNR{a[$1]=1;next;}
In awk, FNR is the number of records (lines) read so far from the current file and NR is the total number of lines read so far. So, when NR==FNR, we are reading the first file (maskedlines.txt). This file contains the line number of lines in genotype.dat that are to be masked. For each of these line numbers, we set a to 1. We then skip the rest of the commands and jump to the next line.
a[FNR]{$1="S2"}
If we get here, we are working on the second file: genotype.dat. For each line in this file, we check to see if its line number, FNR, was mentioned in maskedlines.txt. If it was, we set the first field to S2 to mask this line.
1
This is awk's cryptic shorthand to print the current line.
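Putting the pieces together, a sketch of the full pipeline that derives the 2% count instead of hard-coding 110 of 5505:
total=$(wc -l < genotype.dat)
count=$(( total * 2 / 100 ))                 # 2% of the lines
shuf -n "$count" -i 1-"$total" > maskedlines.txt
awk 'NR==FNR{a[$1]=1;next;} a[FNR]{$1="S2"} 1' maskedlines.txt genotype.dat > genotype.masked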
