Adding padded zeros to name files [duplicate] - bash

This question already has answers here:
How to zero pad a sequence of integers in bash so that all have the same width?
(15 answers)
Closed 2 years ago.
Based off this: How to zero pad a sequence of integers in bash so that all have the same width?
I need to create new file names to enter into an array representing chromosomes 1-22 with three digits (chromsome001_results_file.txt..chromsome022_results_file.txt)
Prior to using a three digit system (which sorts easier) I was using
for i in {1..22};
do echo chromsome${i}_results_file.txt;
done
I have read about printf and seq but was wondering how they could be put within the middle of a loop surrounded by text to get the 001 to 022 to stick to the text.
Many thanks

Use printf, specifying a field width and zero padding.
for i in {1..22};
do
printf 'chromsome%03d_results_file.txt\n' "$i"
done
In %03d, d means decimal output, 3 is the minimum field width (3 characters), and 0 means the field is padded with zeros instead of spaces.
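Since the question mentions putting the names into an array, here is a minimal sketch of one way to do that. The array name files and the helper variable name are illustrative, not from the question; printf -v writes the formatted string into a variable instead of printing it.

```bash
#!/usr/bin/env bash
# Collect zero-padded file names into an array.
# "files" and "name" are illustrative variable names.
files=()
for i in {1..22}; do
    # printf -v stores the formatted result in "name" instead of printing it
    printf -v name 'chromsome%03d_results_file.txt' "$i"
    files+=("$name")
done

echo "${#files[@]} names, from ${files[0]} to ${files[21]}"
```

In bash 4 and later, the brace expansion {001..022} should also produce zero-padded numbers directly, which may be enough if no other formatting is needed.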


bash - Expliciting repetitions in a sequence : how to make AACCCC into 2A4C?

I am looking for a way to quantify the repetitiveness of a DNA sequence. My question is: how are the tandem repeats of a single nucleotide distributed within a given DNA sequence?
To answer that I would need a simple way to "compress" a sequence where there are identical letters repeated several times.
For instance:
AAAATTCGCATTTTTTAGGTA --> 4A2T1C1G1C1A6T1A2G1T1A
From this I would be able to extract the numbers to study the distribution of the repetitions (probably a Poisson distribution I would say), like :
4A2T1C1G1C1A6T1A2G1T1A --> 4 2 1 1 1 1 6 1 2 1 1
The limiting step for me is the first one. There are some topics which give an answer to my question but I am looking for a bash solution using regular expressions.
how to match dna sequence pattern (solution in C++)
Analyze tandem repeat motifs in DNA sequences (solution in python)
Sequence Compression? (solution in Javascript)
So if my question inspires some regex kings, it would help me a lot.
If there is software that does this, I would certainly take it as well!
Thanks all, I hope I was clear enough
Egill
As others mentioned, Bash might not be ideal for data crunching. That being said, the compression part is not that difficult to implement:
#!/usr/bin/env bash
# Compress DNA sequence [$1: sequence string, $2: name of output variable]
function compress_sequence() {
    local input="$1"
    local -n output="$2"; output=""   # nameref to the caller's variable (requires bash 4.3+)
    local curr_char="" last_char="${input:0:1}" char_count=1 i
    # Run one index past the end: the empty final character flushes the last run
    for ((i=1; i <= ${#input}; i++)); do
        curr_char="${input:i:1}"
        if [[ "${curr_char}" != "${last_char}" ]]; then
            output+="${char_count}${last_char}"
            last_char="${curr_char}"
            char_count=1
        else
            char_count=$((char_count + 1))
        fi
    done
}
compress_sequence "AAAATTCGCATTTTTTAGGTA" compressed
echo "${compressed}"
This algorithm processes the sequence string character by character, counts identical characters and adds <count><char> to the output whenever characters change. I did not use regular expressions here and I'm pretty sure there wouldn't be any benefits in doing so.
I might as well add the number extracting part as it is trivial:
numbers_string="${compressed//[^0-9]/ }"
numbers_array=(${numbers_string})
This replaces everything that is not a digit with a space. The array is just a suggestion for further processing.
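As a possible next step toward the distribution the question asks about, the run lengths can be tallied in an associative array. This is only a sketch: the names run_lengths and histogram are made up for illustration, the compressed string is hard-coded so the snippet runs on its own, and associative arrays need bash 4 or later.

```bash
#!/usr/bin/env bash
# Tally how often each run length occurs in a compressed sequence.
# "run_lengths" and "histogram" are illustrative names.
compressed="4A2T1C1G1C1A6T1A2G1T1A"

# Replace every non-digit with a space, then split into an array
numbers_string="${compressed//[^0-9]/ }"
read -r -a run_lengths <<< "${numbers_string}"

declare -A histogram
for n in "${run_lengths[@]}"; do
    histogram[$n]=$(( ${histogram[$n]:-0} + 1 ))
done

# Print one line per distinct run length, sorted numerically by run length
for n in "${!histogram[@]}"; do
    echo "run length ${n}: ${histogram[$n]}"
done | sort -k3n
```

For the example sequence, a run length of 1 occurs 7 times, 2 occurs twice, and 4 and 6 occur once each.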

Unexpected arithmetic result with zero padded numbers

I have a problem in my script wherein I'm reading a file and each line has data which is a representation of an amount. The said field always has a length of 12 and it's always a whole number. So let's say I have an amount of 25,000, the data will look like this 000000025000.
Apparently, I have to get the total amount of these lines but the zero prefixes are disrupting the computation. If I add the above mentioned number to a zero value like this:
echo $(( 0 + 000000025000 ))
Instead of getting 25000, I get 10752 instead. I was thinking of looping through 000000025000 and when I finally get a non-zero value, I'm going to substring the number from that index onwards. However, I'm hoping that there must be a more elegant solution for this.
The number 000000025000 is interpreted as an octal number because it starts with 0.
If you use bash as your shell, you can use the prefix 10# to force the base number to decimal:
echo $(( 10#000000025000 ))
From the bash man pages:
Constants with a leading 0 are interpreted as octal numbers. A leading 0x or 0X denotes hexadecimal. Otherwise, numbers take the form [base#]n, where the optional base is a decimal number between 2 and 64 representing the arithmetic base, and n is a number in that base.
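Applied to the original goal of totalling several such fields, a minimal sketch could look like this. The sample amounts are invented for illustration; in the real script they would come from the file being read.

```bash
#!/usr/bin/env bash
# Sum zero-padded 12-digit amounts, forcing base 10 with the 10# prefix.
# The sample values are invented for illustration.
amounts=(000000025000 000000000800 000000001234)

total=0
for amount in "${amounts[@]}"; do
    # Without 10#, 000000000800 would be rejected as an invalid octal number
    total=$(( total + 10#$amount ))
done

echo "$total"   # 27034
```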
Using Perl
$ echo "000000025000" | perl -ne ' { printf("%d\n",scalar($_)) } '
25000

Why do I not see the full expected range of random numbers? [duplicate]

This question already has answers here:
Random number from a range in a Bash Script
(19 answers)
Closed 8 years ago.
I would expect the below code to generate (quasi) random numbers between 0.9 and 1.0 for RH.
randno5=$((RANDOM % 100001))
upper_limit5=$(echo "scale=10; 1*1.0"|bc)
lower_limit5=$(echo "scale=10; 1*0.9"|bc)
range5=$(echo "scale=10; $upper_limit5-$lower_limit5"|bc)
RH=`echo "scale=10; ${lower_limit5}+${range5}*${randno5}/100001" |bc`
However, when I run this code I get values between 0.9 and 0.933 (3 s.f.). Why is this the case?
$RANDOM is, at most, 32767:
RANDOM Each time this parameter is referenced, a random integer between
0 and 32767 is generated. The sequence of random numbers may be
initialized by assigning a value to RANDOM. If RANDOM is unset,
it loses its special properties, even if it is subsequently
reset.
Your modulus has no effect, because every generated number is already below 100001. The largest value you can get is therefore 0.9 + 0.1*32767/100001, which is about 0.933.
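One way around the 15-bit limit is to combine two $RANDOM draws into a single 30-bit integer before reducing it. The sketch below is an assumption about what the asker wants, not part of the original answer: it uses fixed-point integers in steps of 0.00001 instead of bc, and the names rand30, offset and scaled are illustrative. The modulo step introduces a very small bias, which is usually acceptable for quasi-random use.

```bash
#!/usr/bin/env bash
# Combine two 15-bit RANDOM draws into one 30-bit integer (0 .. 1073741823)
rand30=$(( (RANDOM << 15) | RANDOM ))

# Map onto 0 .. 10000, i.e. steps of 0.00001 across a range of width 0.1
offset=$(( rand30 % 10001 ))

# Fixed-point value in units of 0.00001: 90000 .. 100000 means 0.90000 .. 1.00000
scaled=$(( 90000 + offset ))

# Insert the decimal point when formatting
printf -v RH '%d.%05d' $(( scaled / 100000 )) $(( scaled % 100000 ))
echo "$RH"
```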

Arithmetic with variable [duplicate]

This question already has answers here:
How do I use floating-point arithmetic in bash?
(23 answers)
Closed 8 years ago.
I need to perform some arithmetic with bash. It goes like this:
VariableA = (VariableB-VariableC) / 60
VariableA should be rounded to 2 decimal places.
I don't know which one of these is the right answer (I don't have a Linux server at hand at the moment to test):
VariableA = $((VariableB-VariableC)/60)
VariableA = $(((VariableB-VariableC)/))/60)
It would be nice if someone could also help me out about how to round the VariableA to 2 decimal places without using third party tools like bc
Bash itself can only do integer arithmetic, so if you need a fixed number of decimals, you can shift the decimal point (it's like computing in cents instead of dollars or euros). Then, only at output time, you need to make sure there's a . before the last two digits of your number:
a=800
b=300
result=$((a*100/b)) # factor 100 because of division!
echo "${result:0:-2}.${result: -2}"
will print 2.66.
If you want to make computations in floating points, you should use a tool like bc to do that for you:
bc <<<'scale=2; 8.00/3.00'
will print out 2.66.
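Applied to the formula from the question, the same fixed-point idea can be sketched in pure bash as follows. The input values are invented for illustration, and the rounding trick (adding half the divisor before dividing) assumes a non-negative result. Using printf with %02d to place the decimal point also covers results below 1.00.

```bash
#!/usr/bin/env bash
# VariableA = (VariableB - VariableC) / 60, rounded to 2 decimal places,
# using integer arithmetic only. Input values are invented for illustration.
VariableB=500
VariableC=100

diff=$(( VariableB - VariableC ))

# Scale by 100 for two decimals; adding 30 (half of 60) rounds to nearest
# instead of truncating. Assumes diff is non-negative.
scaled=$(( (diff * 100 + 30) / 60 ))

# Split into integer part and two-digit fraction
printf -v VariableA '%d.%02d' $(( scaled / 100 )) $(( scaled % 100 ))
echo "$VariableA"   # 6.67
```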

Print integer with "most appropriate" kilo/mega/etc multiplier [duplicate]

This question already has answers here:
Closed 10 years ago.
Possible Duplicate:
How to convert byte size into human readable format in java?
Given an integer, I'd like to print it in a human-readable way using kilo, mega, giga etc. multipliers. How do I pick the "best" multiplier?
Here are some examples
1 print as 1
12345 print as 12.3k
987654321 print as 988M
Ideally the number of digits printed should be configurable, e.g. in the last example, 3 digits would lead to 988M, 2 digits would lead to 1.0G, 1 digit would lead to 1G, and 4 digits would lead to 987.7M.
Example: Apple uses an algorithm of this kind, I think, when OSX tells me how many more bytes have to be copied.
This will be for Java, but I'm more interested in the algorithm than the language.
As a starting point, you could use the Math.log() function to get the "magnitude" of your value, and then use some form of associative container for the suffix (k, M, G, etc).
var magnitude = Math.log(value) / Math.log(10);
Hope this helps somehow
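To illustrate the idea in shell form (the question is about Java, so this is only a sketch of the algorithm): repeated division by 1000 plays the role of the logarithm, and an indexed array of suffixes plays the role of the container. The function name human_readable is made up for illustration. This version always prints one truncated decimal and does not implement the configurable digit count or the rounding the question asks for.

```bash
#!/usr/bin/env bash
# human_readable VALUE: print a non-negative integer with a k/M/G/T suffix
# and one truncated decimal. The function name is illustrative.
human_readable() {
    local value=$1
    local suffixes=("" k M G T)
    local idx=0 divisor=1

    # Raise the magnitude until the scaled value drops below 1000
    while (( value / divisor >= 1000 && idx < ${#suffixes[@]} - 1 )); do
        divisor=$(( divisor * 1000 ))
        idx=$(( idx + 1 ))
    done

    if (( idx == 0 )); then
        echo "$value"
    else
        # Tenths of the scaled value, truncated (not rounded)
        local tenths=$(( value * 10 / divisor ))
        echo "$(( tenths / 10 )).$(( tenths % 10 ))${suffixes[idx]}"
    fi
}

human_readable 1           # 1
human_readable 12345       # 12.3k
human_readable 987654321   # 987.6M
```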
