How do I test if a fasta file exists in bash?

I am working with fasta files in a bash script. Before starting it, I would like to check that I have a file in fasta format.
Let's say I have this file:
>seq1
ASVNJF
>seq2
PNGRW
I was trying
#!/bin/bash
if ! [[ $text_file =~ ^(>).+\n[A-Z\n] ]]
then
echo "It is not a fasta file"
fi
But it is not working. Any ideas for it?
Thank you!!

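[[ $text_file =~ ^(>).+\n[A-Z\n] ]] tests the value of $text_file, which is probably the file's name, not its contents. Even if the variable held the contents, \n inside a bash regex is not a newline escape, so the pattern would not match line breaks.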
You can use the following Perl one-liner to check the files:
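perl -ne '
    $id = /^>.+/;                                  # is this line an id line?
    die "Empty $.\n" if $id && $p || $id && eof;   # id follows id, or id is the last line
    $p = $id;                                      # remember it for the next line
    die "Invalid char $1 ($.)\n" if !$id && /([^A-Z\n])/
' -- file.seq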
It stores whether the current line is an id line in the variable $id. $p holds the previous value of $id, which guards against two consecutive id lines. If the current line is not an id line but contains anything other than A-Z or a newline, the second error is reported. The special variable $. contains the current input line number.
To make the shell script exit when the Perl command fails, you need to tell the shell script to exit. Just add || exit 1 after the Perl command:
perl -ne '...' -- file.seq || exit 1
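If Perl is not available, roughly the same checks can be sketched in bash alone. The function name is_fasta is made up for this example, and it is slightly stricter than the one-liner: it also rejects blank lines and sequence data that appears before the first id line. Treat it as a starting point under those assumptions, not a full FASTA validator.

```shell
#!/bin/bash
# is_fasta FILE (hypothetical helper): succeed only if FILE consists of
# ">id" lines, each followed by at least one line of capital letters.
is_fasta() {
    local LC_ALL=C                  # make [A-Z] mean ASCII capitals only
    local file=$1 line prev_was_id=0 seen_id=0
    [[ -f $file ]] || return 1
    while IFS= read -r line || [[ -n $line ]]; do
        if [[ $line == '>'?* ]]; then        # id line: ">" plus at least one character
            (( prev_was_id )) && return 1    # two id lines in a row: empty record
            prev_was_id=1
            seen_id=1
        elif [[ $line =~ ^[A-Z]+$ ]]; then   # sequence line
            (( seen_id )) || return 1        # sequence before any id line
            prev_was_id=0
        else
            return 1                         # blank line or invalid character
        fi
    done < "$file"
    # valid only if there was at least one id and the last id had a sequence
    (( seen_id && ! prev_was_id ))
}
```

It could be used as: is_fasta "$text_file" || { echo "It is not a fasta file"; exit 1; }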

Related

Finding a file extension in a string using shell script

I have a long string, which contains a filename somewhere in it. I want to return just the filename.
How can I do this in a shell script, i.e. using sed, awk etc?
The following works in python, but I need it to work in a shell script.
import re
def find_filename(string, match):
    string_list = string.split()
    match_list = []
    for word in string_list:
        if match in word:
            match_list.append(word)
    #remove any characters after file extension
    fullfilename = match_list[0][:-1]
    #get just the filename without full directory
    justfilename = fullfilename.split("/")
    return justfilename[-1]
mystr = "the string contains a lot of irrelevant information and then a filename: /home/test/this_filename.txt: and then more irrelevant info"
file_ext = ".txt"
filename = find_filename(mystr, file_ext)
print(filename)
this_filename.txt
EDIT adding shell script requirement
I would call shell script like this:
./test.sh "the string contains a lot of irrelevant information and then a filename: /home/test/this_filename.txt: and then more irrelevant info" ".txt"
test.sh
#!/bin/bash
longstring=$1
fileext=$2
echo $longstring
echo $fileext
With bash and a regex:
#!/bin/bash
longstring="$1"
fileext="$2"
regex="[^/]+\\$fileext"
[[ "$longstring" =~ $regex ]] && echo "${BASH_REMATCH[0]}"
Output:
this_filename.txt
Tested only with your example.
See: The Stack Overflow Regular Expressions FAQ
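A variant that avoids building a regex from the extension is sketched below. It follows the steps of the Python function (split into words, pick the first word containing the extension, cut everything after the extension, drop the directory) using only parameter expansion. The function name find_filename is borrowed from the question; the sketch assumes the long string is a single line.

```shell
#!/bin/bash
# find_filename STRING EXT: print the base name of the first word in STRING
# that contains EXT; return 1 if no word contains it.
find_filename() {
    local string=$1 ext=$2 word
    local -a words
    read -r -a words <<< "$string"           # split on whitespace, no globbing
    for word in "${words[@]}"; do
        if [[ $word == *"$ext"* ]]; then     # quoted, so EXT is matched literally
            word=${word%%"$ext"*}$ext        # cut everything after the extension
            printf '%s\n' "${word##*/}"      # strip the directory part
            return 0
        fi
    done
    return 1
}
```

Inside test.sh this could be called as find_filename "$1" "$2".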
If you want to extract the file name with its extension and then check whether that file exists on the system, you could try the following. It also exits early unless exactly two arguments are passed to the script.
cat script.bash
if [[ "$#" -ne 2 ]]
then
    echo "Please pass two arguments (the string and the file extension); exiting."
    exit 1;
fi
# The matched path already ends with the extension, so ext is only checked, not appended.
fileName=$(echo "$1" | awk -v ext="$2" 'match($0,/\/[^ :]*/){f=substr($0,RSTART,RLENGTH); if (substr(f,length(f)-length(ext)+1)==ext) print f}')
echo "File name with file extension is: $fileName"
if [[ -f "$fileName" ]]
then
echo "File $fileName is present"
else
echo "File $fileName is NOT present."
fi

Not able to skip blank lines in a shell script

I am reading a text file line by line and taking the count of all lines as a part of my requirement.
When there is a blank line, the count gets messed up. I tried an if condition with [ -z "$line" ], but could not get it to work.
Here is my current code:
countNumberOfCases() {
echo "2. Counting number of test cases -----------"
cd $SCRIPT_EXECUTION_DIR
FILE_NAME=Features
while read line || [[ -n "$line" ]]
do
TEST_CASE="$line"
if [ "${TEST_CASE:0:1}" != "#" ] ; then
cd $MVN_EXECUTION_DIR
runTestCase
fi
done < $FILE_NAME
echo " v_ToalNoOfCases : = " $v_ToalNoOfCases
}
And below is Features file
web/sprintTwo/TC_002_MultipleLoginScenario.feature
#web/sprintOne/TC_001_SendMoneyTransaction_Spec.feature
web/sprintTwo/TC_003_MultipleLoginScenario.feature
#web/sprintOne/TC_004_SendMoneyTransaction_Spec.feature
When there is a blank line it won't work properly, so my requirement is that blank lines should be skipped and not counted.
You can write your loop in a little more robust way:
#!/bin/bash
while read -r line || [[ $line ]]; do # read lines one by one
    cd "$mvn_execution_dir" # make sure this is an absolute path
    # or move it outside the loop unless "runTestCase" function changes the current directory
    runTestCase "$line" # need to pass the argument?
done < <(sed -E '/^[[:blank:]]*$/d; /^[[:blank:]]*#/d' "$file_name") # strip blank lines and comments
A few things:
get your script checked at shellcheck for common mistakes
see this post for proper variable naming convention:
Correct Bash and shell script variable capitalization
see this discussion about [ vs [[ in Bash
Test for non-zero length string in Bash: [ -n "$var" ] or [ "$var" ]
about reading lines from a text file
Looping through the content of a file in Bash
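The same filtering can also be done without sed. The sketch below is one possible way, with a made-up function name list_test_cases: it trims each line, then skips it if nothing is left or if it starts with #. It assumes that a comment is any line whose first non-blank character is #.

```shell
#!/bin/bash
# list_test_cases FILE (hypothetical helper): print the lines of FILE that
# are neither blank nor comments, with surrounding whitespace removed.
list_test_cases() {
    local line
    while IFS= read -r line || [[ -n $line ]]; do
        line=${line#"${line%%[![:space:]]*}"}    # trim leading whitespace
        line=${line%"${line##*[![:space:]]}"}    # trim trailing whitespace
        [[ -z $line || $line == '#'* ]] && continue
        printf '%s\n' "$line"
    done < "$1"
}
```

The number of test cases is then the number of lines it prints, for example list_test_cases Features | wc -l, and the loop from the answer could read from < <(list_test_cases "$file_name") instead of the sed command.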

bash: dealing with strange filenames tail invalid option --1

I want my script to find a file (in the current directory) whose first line is START. That file should have FILE <file_name> as its last line, and I want to extract the <file_name>; I use tail for this. It works for ordinary file names but breaks for unusual ones like a a or a+b-c\ = e, with tail reporting: tail: option used in invalid context -- 1
Here is the beginning of the script:
#!/bin/bash
next_stop=0;
# find the first file
start_file=$(find . -type f -exec sed '/START/F;Q' {} \;)
mv "$start_file" $start_file # << that trick doesn't work
if [ ! -f "$start_file" ]
then
echo "File with 'START' head not found."
exit 1
else
echo "Found $start_file"
fi
# parse the last line of the start file
last_line=$(tail -1 $start_file) # << here it crashes for hacky names
echo "last line: $last_line"
if [[ $last_line == FILE* ]] ; then
next_file=${last_line#* }
echo "next file from last line: $next_file"
elif [[ $last_line == STOP ]] ; then
next_stop=true;
else
echo "No match for either FILE or STOP => exit"
exit 1
fi
I tried wrapping the find output in quotes this way
mv "$start_file" $start_file
but it doesn't help
The error occurs because the variable is unquoted: the shell splits the file name at its spaces, so tail receives several arguments instead of one file name.
You should write the start_file variable in quotes:
last_line=$(tail -1 $start_file) --> last_line=$(tail -1 "$start_file")
If you pass such names without quotes, as in your two examples, you need to escape the space and the equals sign in the file name with a \ character, and escape the backslash itself too.
So a a has to be written as a\ a when passed to tail, and a+b-c\ = e as a+b-c\\\ \=\ e.
You can use sed to make this replacement.
printf with the %q format gives you a better and easier way to produce the escaped form:
printf '%q' "$Strange_filename"
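Put together, a quoting-safe version of the first steps might look like the sketch below. The function names find_start_file and last_line_of are invented for this example. It uses find -print0 with read -d '' so that names containing spaces or backslashes arrive intact, tail -n 1 instead of the older -1 form, and -- so that a name starting with a dash is not taken for an option.

```shell
#!/bin/bash
# find_start_file DIR (hypothetical helper): print the first regular file
# under DIR whose first line is exactly START; return 1 if there is none.
find_start_file() {
    local dir=${1:-.} file first
    while IFS= read -r -d '' file; do
        first=
        IFS= read -r first < "$file" || true   # first line, even without a trailing newline
        if [[ $first == START ]]; then
            printf '%s\n' "$file"
            return 0
        fi
    done < <(find "$dir" -type f -print0)      # NUL-separated, safe for any name
    return 1
}

# last_line_of FILE (hypothetical helper): print the last line of FILE.
last_line_of() {
    tail -n 1 -- "$1"     # the quotes keep the whole name as one argument
}
```

In the script this would replace the two command substitutions: start_file=$(find_start_file .) and last_line=$(last_line_of "$start_file"). Names that contain newlines would still need different handling, since the result is passed back through a command substitution.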

extract information from a file in unix using shell script

I have the file below, which contains some data
name:Mark
age:23
salary:100
I want to read only name and age and assign them to variables in a shell script.
How can I achieve this?
I am able to read all of the file's data with the script below, but not a particular field:
#!/bin/bash
file="/home/to/person.txt"
val=$(cat "$file")
echo $val
please suggest.
Rather than running multiple greps or bash loops, you could just run a single read that reads the output of a single invocation of awk:
read age salary name <<< $(awk -F: '/^age/{a=$2} /^salary/{s=$2} /^name/{n=$2} END{print a,s,n}' file)
Results
echo $age
23
echo $salary
100
echo $name
Mark
If the awk script sees an age, it sets a to the age. If it sees a salary , it sets s to the salary. If it sees a name, it sets n to the name. At the end of the input file, it outputs what it has seen for the read command to read.
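For comparison, the same idea (look at each key and store only the values of interest) can be sketched in bash alone by letting read split each line at the first colon. The function name read_fields is made up for this example, and it assumes one key:value pair per line, as in the sample file.

```shell
#!/bin/bash
# read_fields FILE (hypothetical helper): set the shell variables name and
# age from "key:value" lines in FILE; every other key is ignored.
read_fields() {
    local key value
    while IFS=: read -r key value || [[ -n $key ]]; do
        case $key in
            name) name=$value ;;
            age)  age=$value ;;
        esac
    done < "$1"
}
```

Call it directly, not inside $( ), so that the assignments happen in the current shell: read_fields /home/to/person.txt; echo "$name $age". Because value is the last variable given to read, it receives the rest of the line, including any further colons.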
Using grep: \K is part of Perl regex syntax (enabled by -P). The text to its left must be present for the line to match, but it is dropped from the reported match, so with -o only the part after it is printed.
name=$(grep -oP 'name:\K.*' person.txt)
age=$(grep -oP 'age:\K.*' person.txt)
salary=$(grep -oP 'salary:\K.*' person.txt)
Or use an awk one-liner; this may break if a line contains an extra : .
declare $(awk '{sub(/:/,"=")}1' person.txt )
This gives the following result:
sh-4.1$ echo $name
Mark
sh-4.1$ echo $age
23
sh-4.1$ echo $salary
100
You could try this
if your data is in a file: data.txt
name:vijay
age:23
salary:100
then you could use a script like this
#!/bin/bash
# read will read a line until it hits a record separator i.e. newline, at which
# point it will return true, and store the line in variable $REPLY
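while read -r
do
    if [[ $REPLY =~ ^name:.* || $REPLY =~ ^age:.* ]]
    then
        # key is the text before the first colon, value is the text after it;
        # declare assigns without re-evaluating the file's contents as shell code
        declare "${REPLY%%:*}=${REPLY#*:}"
    fi
done < data.txt # read data.txt from STDIN into the while loop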
echo $name
echo $age
output
vijay
23
If you can store the data in JSON or a similar format, it becomes very easy to access complex data
data.json
{
"name":"vijay",
"salary":"100",
"age": 23
}
then you can use jq to parse json and get data easily
jq -r '.name' data.json
vijay

Add lines to a document if they do not already exist within the document

I am trying to say, if document does not exist, then create document. Next read each line of the document and if none of the lines match the $site/$name variables, then add the $site/$name variable into the document.
#!/bin/bash
site=http://example.com
doc=$HOME/myfile.txt
if [ ! -f $doc ]
then
touch $doc
fi
read -p "name? " name
while read lines
do
if [[ $lines != $site/$name ]]
then
echo $site/$name >> $doc
fi
done <$doc
echo $doc
echo $site
echo $name
echo $site/$name
echo $lines
Typing test at the read -p prompt the results are
path/to/myfile.txt
http://example.com
test
http://example.com/test
I feel like I should know this but I'm just not seeing it. What am I doing wrong?
If the file is initially empty, you'll never enter the loop, and thus never add the line. If the file is not empty, you'd add your line once for every non-matching line anyway. Try this: set a flag to indicate whether or not to add the line, then read through the file. If you ever find a matching line, clear the flag to prevent the line from being added after the loop.
do_it=true
while read lines
do
if [[ $lines = $site/$name ]]
then
do_it=false
break
fi
done < "$doc"
if [[ $do_it = true ]]; then
echo "$site/$name" >> "$doc"
fi
The following creates the file if it doesn't exist. It then checks to see if it contains $site/$name. If it doesn't find it, it adds the string to the end of the file:
#!/bin/bash
site=http://example.com
doc=$HOME/myfile.txt
read -p "name? " name
touch "$doc"
grep -q "$site/$name" "$doc" || echo "$site/$name" >>"$doc"
How it works
touch "$doc"
This creates the file if it doesn't exist. If it does already exist, the only side-effect of running this command is that the file's timestamp is updated.
grep -q "$site/$name" "$doc" || echo "$site/$name" >>"$doc"
The grep command exits with a success status (0) if it finds the string. If it doesn't find it, then the "or" clause (in shell, || means logical-or) is triggered and the echo command adds the string to the end of the file.
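One thing to watch with a plain grep -q is that it treats the string as a regular expression and matches it anywhere in a line, so http://example.com/test would also be "found" in a file that only contains http://example.com/test2. A possible refinement, sketched here with an invented function name add_line_if_missing, uses -F (fixed string) and -x (whole line) so that only an identical line counts as a match.

```shell
#!/bin/bash
# add_line_if_missing FILE LINE (hypothetical helper): append LINE to FILE
# unless FILE already contains an identical line.
add_line_if_missing() {
    local file=$1 line=$2
    touch -- "$file"                    # create the file if it does not exist
    # -q: quiet, -x: match the whole line, -F: no regex interpretation
    grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
}
```

In the script above this would be called as add_line_if_missing "$doc" "$site/$name".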
