Bash: base64 decode a file x times

During a scripting challenge, I was asked to decode a base64 file (base64.txt) X times (say, 100).
So I wrote this small bash script to do it.
for item in `cat base64.txt`;do
    for count in {1..100};do
        if [ $count -eq 1 ]; then
            current=$(echo "$item" |base64 --decode)
        else
            current=$(echo "$current" |base64 --decode)
        fi
        if [ $count -eq 100 ]; then
            echo $current
        fi
    done
done
It works as expected, and I got the expected result.
What I am looking for now is a way to improve this script. I am far from being a specialist, and I would like to see how I could better approach this kind of challenge.
Could you please give me some advice?

decode X times (saying 100) a base64 file (base64.txt)
There is only one file, and it contains a single line.
Just read the content of the file, decode it 100 times, and output the result.
state=$(<base64.txt)
for i in {1..100}; do
    state=$(<<<"$state" base64 --decode)
done
echo "$state"
Notes:
Backticks ` are discouraged; use $(...) instead (see: bash deprecated and obsolete syntax).
for i in `cat file` is a common antipattern in bash (see: How to read a file line by line in bash).
If the file contains only one line, there is no need to iterate over the words in the file.
In bash, echo "$item" | is a useless use of echo, and there is a small risk that it will not work, for example when item=-e. In bash you can use a here string instead.
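The notes above can be folded into a small reusable function. This is only a sketch: the name decode_n and its two parameters (file, number of rounds) are made up for illustration, and it assumes the decoded data is text, since command substitution drops NUL bytes and trailing newlines.

```shell
#!/bin/bash
# decode_n FILE TIMES: base64-decode the content of FILE, TIMES times in a row.
# (decode_n is a hypothetical helper name, not part of any tool.)
decode_n() {
    local file=$1 times=$2 state i
    state=$(<"$file")                       # read the whole file, no cat needed
    for ((i = 0; i < times; i++)); do
        # a here string replaces `echo "$state" |`; stop on the first failed decode
        state=$(base64 --decode <<<"$state") || return 1
    done
    printf '%s\n' "$state"
}

# Usage: decode_n base64.txt 100
```

Taking the count as a parameter avoids hard-coding 100, and returning early on a failed decode makes it obvious when the data was encoded fewer times than expected.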

Related

Why won't my for/if loop work for renaming sequence names in a fasta file in bash

I am currently working on a bioinformatics project, and have been assigned the task of editing some genetic sequence files (fasta/.fa) so that they are suitable for the next stage of processing. I am doing this on the Linux command line with bash.
Because of how the files were obtained, each read in the file has been assigned an arbitrary name of the form V1_x, where x runs from 1 to 1587663.
For the next step, I need to reformat these names to follow a specific naming pattern in which the number is left-padded with zeros to seven digits. For example, V1_1 must become V1_0000001, V1_15 must become V1_0000015, and V1_1050 must become V1_0001050, eventually ending with V1_1587663.
I will give an example of how one file is laid out:
V1_1 flag=1 multi=9.0000 len=342\
AAGGAGTGATGGCATGGCGTGGGACTTCTCCACCGACCCCGAGTTCCAGGAGAAGCTCGACTGGGTCGAGCGGTTCTGCCAGGAAAGGGTCGAGCCGCTCGACTATGTGTTTCCCCACGCGGTGCGCTGGCCAGACCCGGTGGTAAAGGCGTACGTCCGCGAACTCCAGCAGGAGGTCAAGGACCAGGGCCTGTGGGCGATCTTCCTCGACCGGGAACTAGGTGGCCCGGGCTTCGGACAGCTCAGGCTGGCTCTGCTCAACGAGGTGATCGGCCGCTATCCCGGCGCGCCCGCGATGTTCGGTGCCGCGGCGCCCGATACCGGGAA
V1_2 flag=1 multi=9.0000 len=330
ATCTTCACCCAGCTCGGCAGCATGTTTCCCGTGGCGATGGAGTGCAGCATCGAGCCCAGGCAGATCACCAGCCCGGCGTCTTTCAACTGCGCGGCGTAGGCGTCCTGCGCCGCGTTCATATCGGTAATCGTATCGGGCAGCGGGCCGTCGTCGCGCAGGCTGCCCGCCAGCACGAACGGAATCCCAGAGCGCACGCATTCGTACAGGATGCCTTCCCGCAGGCATCCGCCCTCCACGGCCTGCCGGACGCTCCCGGCGCGATAGATCGCATTGATGGCGCGCATGTGATTGCGGTGCCCGTGCTCTTCCTGCCTCCCGTCGCTCAGCCGC\
I am currently trying to write a loop that does this all in one go, as there are a lot of reads and I have several of these fasta files.
I don't want to ruin my file, so I have created a copy containing the first 5000 reads to test my code on.
The code I have been trying to make work is as follows:
for i in {1..5000}
do
if [ "$i" -le "9"]; then
sed -i 's/V1_i/V1_000000i/' testfile.fa
elif [["$i" -gt "9"] && ["i" -le "99"]]; then
sed -i s/V1_i/V1_00000i/' testfile.fa
elif [["i" -gt "99"] && ["i" -le "999"]]; then
sed -i s/V1_i/V1_0000i/' testfile.fa
elif [["i" -gt "999"] && ["i" -le "9999"]]; then
sed -i s/V1_i/V1_000i/' testfile.fa
fi
done
I will rewrite the code below to explain what I think each line should be doing
for i in {1..5000} - **Run the loop with i taking each value from 1 to 5000**
do
if [ "$i" -le "9"]; then **If i is less than or equal to 9, then do...**
sed -i 's/V1_i/V1_000000i/' testfile.fa **replace V1_i with V1_000000i within testfile.fa**
elif [["$i" -gt "9"] && ["i" -le "99"]]; then **else if i is greater than 9 but less than or equal to 99, then do...**
sed -i s/V1_i/V1_00000i/' testfile.fa **replace V1_i with V1_00000i within testfile.fa**
elif [["i" -gt "99"] && ["i" -le "999"]]; then
sed -i s/V1_i/V1_0000i/' testfile.fa
elif [["i" -gt "999"] && ["i" -le "9999"]]; then
sed -i s/V1_i/V1_000i/' testfile.fa
fi
done
The result I get every time is 4 lots of 'command not found' as pasted below, per number in the range.
[1: command not found
[[1: command not found
[[1: command not found
[[1: command not found
[2: command not found
[[2: command not found
[[2: command not found
[[2: command not found
etc until 5000
I assume I must have something wrong with how I've written the code, but as someone who is new to this, I can't see what is wrong.
Thank you for reading; any help is very much appreciated. If you need any more details, I will gladly provide them as best I can. Unfortunately, I can't share the exact files I'm working on (I know this isn't helpful, sorry), as I do not have permission.
Shell syntax
The result I get every time is 4 lots of 'command not found' as pasted
below, per number in the range.
[1: command not found
[[1: command not found
[[1: command not found
[[1: command not found
[2: command not found
[[2: command not found
[[2: command not found
[[2: command not found
etc until 5000
The [ character is not special to the shell. [ and [[ are not operators, but rather an ordinary command and a reserved word, respectively. They have no involvement in splitting command lines into words. The same applies to ] and ]]: the shell does not automatically break words on either side of them.
The " character is special to the shell, but it does not create a word boundary. The shell has quoting, but it does not have quote-delimited strings as a syntactic unit in the sense that some other languages do.
With that in mind, consider this code fragment:
elif [["$i" -gt "9"] && ["i" -le "99"]]; then
Because neither [[ nor " produces a word break, [["$i" expands to a single word, for example [[1, which, given its position, is interpreted as the name of a command to execute. There being no built-in command by that name and no program by that name in the path, executing that command fails with "command not found".
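This is easy to check in an interactive session. The sketch below asks bash how it classifies [ and [[, then runs the fragment without the space after [[ and reports its exit status. The function name missing_space_status is hypothetical, chosen only for this illustration.

```shell
#!/bin/bash
# type -t prints how bash classifies a word.
type -t '['     # prints: builtin
type -t '[['    # prints: keyword

# Hypothetical helper: run the test without a space after [[ and
# print the resulting exit status.
missing_space_status() {
    local i=1 status=0
    # [["$i" is the single word [[1 here, so bash looks for a command named [[1
    [["$i" -gt 9 ]] 2>/dev/null || status=$?
    echo "$status"
}
missing_space_status   # prints: 127, the status bash uses for "command not found"
```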
You need to insert whitespace to make separate words separate (but see also below):
elif [[ "$i" -gt "9" ] && [ "i" -le "99" ]]; then
Moreover, again, [ is a command and [[ is a reserved word naming a built-in command. ] is an argument with special meaning to the [ command, and ]] is an argument with special significance to the [[ built-in. Although they (intentionally) have a similar appearance, these are not analogous to parentheses. You don't need to impose grouping here anyway. The && operator already separates commands, and the overall pipeline does not need to be explicitly demarcated as a group. This would be correct and more natural:
elif [[ "$i" -gt "9" ]] && [[ "$i" -le "99" ]]; then
Furthermore, although it is not wrong, it is unnecessary and a bit weird to quote your numbers. The case is a more nuanced one for the expansions of $i, since its values are fully under your control, but "always quote your variable expansions" is a pretty good rule until your shell scripting is strong enough for you to decide for yourself when you can do otherwise. So, this is where we arrive:
elif [[ "$i" -gt 9 ]] && [[ "$i" -le 99 ]]; then
You will want to do likewise throughout your script.
But wait, there's more!
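I think the changes described above would fix the test syntax, but the sed commands need attention too: several are missing their opening quote, and inside single quotes i is a literal letter rather than the loop variable. Even with all of that corrected, the script would be extremely slow, because it would make 5000 passes through the whole file. On the full 1.5M-entry file, you would need a version that made 1.5M passes through the whole half-gigabyte-ish of data. It would take years to complete.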
That approach is not viable, not really even for the 5000 lines. You need something that will make only a single pass through the file, or at worst a small, fixed number of passes. I think a one-pass approach would be possible with sed, but it would take a complex and very cryptic sed expression. I'm a sed fan, but I would recommend awk for this, or even shell without any external tool.
A pure-shell version could be built with the read and printf built-in commands combined with some of the shell's other features. An awk version could be expressed as a not-overly-complex one-liner. Details of either of these options depend on the file syntax, however, which, as I commented on the question, I think you have misrepresented.
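As a rough illustration of the pure-shell route, here is a minimal single-pass sketch. It assumes the header lines look exactly as shown in the question, that is, they start with V1_ followed by digits; if the real files use a different header syntax (for example a leading >), the pattern would need adjusting. The function name pad_names is made up for this example.

```shell
#!/bin/bash
# pad_names: read fasta-like text on stdin, write it to stdout with every
# leading V1_<number> rewritten to V1_<number zero-padded to 7 digits>.
# (Hypothetical helper; the header format is an assumption taken from the question.)
pad_names() {
    local line
    while IFS= read -r line || [[ -n $line ]]; do
        if [[ $line =~ ^(V1_)([0-9]+)(.*)$ ]]; then
            # 10# forces base 10, so digits with leading zeros are not read as octal
            printf '%s%07d%s\n' "${BASH_REMATCH[1]}" \
                "$((10#${BASH_REMATCH[2]}))" "${BASH_REMATCH[3]}"
        else
            printf '%s\n' "$line"      # sequence lines pass through unchanged
        fi
    done
}

# Usage: pad_names < testfile.fa > testfile.padded.fa
```

Writing to a new file rather than editing in place keeps the original intact. On files of several hundred megabytes a bash read loop is still slow compared with awk, so this is better suited to checking the logic on the 5000-read test copy.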

How to pad a value with zeroes based on a match in a string and the length of the following string?

I have had some problems adapting the answers from previous questions, so I hope it is OK to ask for a specific solution.
I have a file with RNA reads in the fasta format; however, the end of each read name has been messed up, so I need to correct it.
It is a simple task of padding zeroes into the middle of a string, but I cannot get it to work because I also need to identify the length and the position of the part to pad.
My read file header looks like this:
#V350037327L1C001R0010000023/1_U1
and I need to search for the "/1_U" and then left-pad the rest of the line with zeroes to a total length of 6.
The result should look like this:
#V350037327L1C001R0010000023/1_U000001
The part following "/1_U" should always be six characters long.
More examples (input = padded suffix):
#V350037327L1C001R0010000055/1_U300 = /1_U000300
#V350037327L1C001R0010000122/1_U45000 = /1_U045000
I have tried with awk, but I cannot get it to check the initial length, so it does not pad with the correct number of zeroes.
Thank you in advance, and thank you for your never-ending support in this forum.
Try this:
#! /bin/bash
files=('#V350037327L1C001R0010000023/1_U1'
       '#V350037327L1C001R0010000055/1_U300'
       '#V350037327L1C001R0010000122/1_U45000')
for file in "${files[@]}"; do
    # BASH_REMATCH[1] is everything up to the last U, BASH_REMATCH[2] the number to pad
    if [[ $file =~ ^(.*U)([0-9]+)$ ]]; then
        printf '%s%06d\n' "${BASH_REMATCH[@]:1}"
    fi
done
Update: This reads the lines from stdin.
#! /bin/bash
while read -r file; do
    if [[ $file =~ ^(.*U)([0-9]+)$ ]]; then
        printf '%s%06d\n' "${BASH_REMATCH[@]:1}"
    fi
done
Update 2: You should really learn the basics of shell programming, such as conditional constructs, before you start programming the shell. This version also prints the lines that do not match, unchanged.
#! /bin/bash
while read -r file; do
    if [[ $file =~ ^(.*U)([0-9]+)$ ]]; then
        printf '%s%06d\n' "${BASH_REMATCH[@]:1}"
    else
        printf '%s\n' "$file"
    fi
done
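One caveat with printf's %d, which may matter if the script is ever run on names that are already padded: bash reads a number with leading zeros as octal, so 000300 would come out as 000192 and 08 would be rejected. A possible safeguard, sketched below under the assumption that the number always follows "/1_U" at the end of the line, is to force base 10 with the 10# prefix. The function name pad_read_name is hypothetical.

```shell
#!/bin/bash
# pad_read_name NAME: print NAME with the digits after "/1_U" padded to 6 places.
# (Hypothetical helper for illustration.)
pad_read_name() {
    local name=$1
    if [[ $name =~ ^(.*/1_U)([0-9]+)$ ]]; then
        # 10#... makes bash treat the digits as decimal even with leading zeros
        printf '%s%06d\n' "${BASH_REMATCH[1]}" "$((10#${BASH_REMATCH[2]}))"
    else
        printf '%s\n' "$name"          # leave non-matching lines untouched
    fi
}

# Usage: while IFS= read -r line; do pad_read_name "$line"; done < reads.fa
```

With this change, applying the function twice gives the same result as applying it once.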

Script which will move non-ASCII files

I need help. I have to write a script which will move all non-ASCII files from one directory to another. I have this code, but I don't know why it is not working.
#!/bin/bash
for file in "/home/osboxes/Parkhom"/*
do
if [ -eq "$( echo "$(file $file)" | grep -nP '[\x80-\xFF]' )" ];
then
if test -e "$1"; then
mv $file $1
fi
fi
done
exit 0
It's not clear which one you are after, but:
• To test if the variable $file contains a non-ASCII character, you can do:
if [[ $file == *[^[:ascii:]]* ]]; then
• To test if the file $file contains a non-ASCII character, you can do:
if grep -qP '[^[:ascii:]]' "$file"; then
So for example your code would look like:
for file in "/some/path"/*; do
    if grep -qP '[^[:ascii:]]' "$file"; then
        test -d "$1" && mv "$file" "$1"
    fi
done
The first problem is that your first if statement has an invalid test clause. The -eq operator of [ needs to take one argument before and one after; your before argument is gone or empty.
The second problem is that I think the echo is redundant.
The third problem is that the file command always has ASCII output but you're checking for binary output, which you'll never see.
Using file is pretty smart for this application, although there are two ways you can go with it. file reports a variety of things, and the ones you're interested in are data and ASCII; but not all files that don't identify as data are ASCII, and not all files that don't identify as ASCII are data. You might be better off going with the original idea of using grep, unless you need to support Unicode files. Your grep is a bit strange to me, so I don't know what your environment is, but I might try this:
#!/bin/bash
for file in "/home/osboxes/Parkhom"/*
do
    # LC_ALL=C makes grep work on raw bytes, so \x80-\xFF matches any non-ASCII byte
    if LC_ALL=C grep -qP '[\x80-\xFF]' "$file"; then
        [ -e "$1" ] && mv "$file" "$1"
    fi
done
The -q option means be quiet: only return an exit status, don't show the matches. (It might be -s in your grep.) The exit status is tested directly by the if statement (no need to use [ or test). The && in the next line is just a quick way of saying that if the left-hand side succeeds, the right-hand side is executed. You could also write this as an if statement if you find that clearer. [ is a synonym for test. Personally, if $1 is a directory and doesn't change, I'd check it once at the beginning of the script instead of once per file; it would be faster.
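The suggestion to check the destination once could look something like the sketch below. It assumes GNU grep built with -P support; the function name move_non_ascii and its two parameters are invented for this example.

```shell
#!/bin/bash
# move_non_ascii SRC DEST: move every regular file in SRC that contains at
# least one non-ASCII byte into DEST. (Hypothetical helper for illustration.)
move_non_ascii() {
    local src=$1 dest=$2 file
    # Validate the destination once, before the loop, rather than per file
    if [[ ! -d $dest ]]; then
        echo "destination is not a directory: $dest" >&2
        return 1
    fi
    for file in "$src"/*; do
        [[ -f $file ]] || continue     # skip directories and an unmatched glob
        # LC_ALL=C: compare raw bytes; 0x80-0xFF never occur in pure ASCII
        if LC_ALL=C grep -qP '[\x80-\xFF]' -- "$file"; then
            mv -- "$file" "$dest"/
        fi
    done
}

# Usage: move_non_ascii /home/osboxes/Parkhom "$1"
```

Quoting "$file" and "$dest" keeps names with spaces intact, and the -- guards against file names that begin with a dash.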
If you mean you want to know if something is not a plain text file then you can use the file command which returns information about the type of a file.
[[ ! $( file -b "$file" ) =~ (^| )text($| ) ]]
The -b simply tells it not to bother returning the filename.
The returned value will be something like:
ASCII text
HTML document text
POSIX shell script text executable
PNG image data, 21 x 34, 8-bit/color RGBA, non-interlaced
gzip compressed data, from Unix, last modified: Mon Oct 31 14:29:59 2016
The regular expression will check whether the returned file information includes the word "text" that is included for all plain text file types.
You can instead filter for specific file types like "ASCII text" if that is all you need.

Bash Script Loop

I am trying to make a bash script loop that takes the file name a user wants and the number of files they want, and creates that many empty files. I wrote the script, but I keep getting the error "./dog.sh: line 6: 0]: No such file or directory". I'm new to bash scripting and don't know what I'm doing wrong. Any help would be awesome, thanks!
#!/bin/bash
echo "what file name do you want?"; read filename
echo "how many files do you want"; read filenumber
x=$filenumber
if [$x < 0]
then
touch $fiename$filenumber
x=$((x--))
fi
for x in $(seq "$filenumber"); do touch "$filename$x"; done
seq $filenumber produces a list of numbers from 1 to $filenumber. The for loop assigns each of these numbers to x in turn. The touch command is run once for each value of x.
Alternative
In bash, if we can type the correct file number into the command line, the same thing can be accomplished without a for loop:
$ touch myfile{1..7}
$ ls
myfile1 myfile2 myfile3 myfile4 myfile5 myfile6 myfile7
{1..7} is called "brace expansion". Bash will expand the expression myfile{1..7} to the list of seven files that we want.
Brace expansion does have a limitation: it does not support shell variables, because bash performs brace expansion before it expands variables. So, touch myfile{1..$filenumber} would not work. We have to enter the correct number in the braces directly.
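When the count is only known at run time, one alternative that stays inside bash (no seq) is the arithmetic for loop. The sketch below is a minimal example; make_files and its parameters are hypothetical names.

```shell
#!/bin/bash
# make_files PREFIX COUNT: create empty files PREFIX1 .. PREFIXCOUNT.
# (Hypothetical helper for illustration.)
make_files() {
    local prefix=$1 count=$2 n
    # Inside (( )), variables are evaluated arithmetically, so a count
    # held in a variable works where {1..$count} would not
    for ((n = 1; n <= count; n++)); do
        touch -- "$prefix$n"
    done
}

# Usage: make_files "$filename" "$filenumber"
```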
Maybe it's a typo: $fiename instead of $filename.
Also, you might want some kind of loop like so:
x=1
while [ "$x" -le "$filenumber" ]; do
    touch "$filename$x"
    let x=x+1
done
#!/bin/bash
echo "what file name do you want?"; read filename
echo "how many files do you want"; read filenumber
x=$filenumber
# Count down from the requested number to 1, creating one file per step
while [ "$x" -gt 0 ]; do
    touch "$filename$x"
    x=$((x - 1))
done

Bash: Too many arguments

I've coded the following script to add users from a text file. It works, but I'm getting an error that says "too many arguments"; what is the problem?
#!/bin/bash
file=users.csv
while IFS="," read USRNM DOB SCH PRG PST ENROLSTAT ; do
if [ $ENROLSTAT == Complete ] ;
then
useradd $USRNM -p $DOB
else
echo "User $USRNM is not fully enrolled"
fi
done < $file
#cat users.csv | head -n 2 | tail -n 1
Use quotes. Liberally.
if [ "$ENROLSTAT" = Complete ]
(It's a single equal sign, too.) My greatest problem in shell programming is always hidden spaces. It's one of the reasons I write so much in Perl, and why, in Perl, I tell everyone on my team to avoid the shell whenever running external programs. There is just so much power in the shell, with so many little things that can trip you up, that I avoid it where possible. (And not where not possible.)
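To see why the quotes matter, consider a field that contains a space. This is only a guess at what the CSV holds, since the data is not shown. Unquoted, the value is split into several words before [ sees it. The two helper functions below are hypothetical and exist only to print the exit status of each form.

```shell
#!/bin/bash
# Print the exit status of the unquoted test for a given value.
unquoted_status() {
    local value=$1 status=0
    # Deliberately unquoted: "Not Complete" becomes two separate arguments
    [ $value = Complete ] 2>/dev/null || status=$?
    echo "$status"
}

# Print the exit status of the quoted test for a given value.
quoted_status() {
    local value=$1 status=0
    [ "$value" = Complete ] || status=$?
    echo "$status"
}

unquoted_status 'Not Complete'   # prints 2: [ fails with "too many arguments"
quoted_status 'Not Complete'     # prints 1: a clean "not equal"
quoted_status 'Complete'         # prints 0: equal
```

A status of 2 means [ could not evaluate the expression at all, which is different from the comparison simply being false.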
