How can I make an if statement inside a while loop perform certain actions only on certain lines? - bash

I'm trying to write a simple bash script, launched with sh, that creates a new output file from an input file. Each line starting with ">" must be kept unchanged in its position; from every other line, the script must delete every third character and then append the result to the new file.
input file:
>0197_16S
-AAAAACATGTCCTCTTGTTTATA-----TNTGAGGTTTGACCTGCCCTATG--A---
>0688_16S
-----ACATCTTCTCTTGAGTTAT-----TTTGAGATATGACCTGCCCAATG--A-T-
.
.
.
.
sh script:
while IFS= read line; do
if [ "$line" = ">"* ]; then echo "$line" >> output.txt
else
var=$(echo "$line" | awk -vFS= '{for (i = 1; i <=NF; i+3) {printf $i(i+1)} printf "\n"}');
echo "$var" >> output.txt
fi;
done <foo.txt
The else branch seems to work; however, the if condition is never true, so every third character is also removed from the lines that begin with ">".
actual output:
>09716
-AAACAGTCTTTTTAT----NTAGTTGACTCCTAG-A--
>08816
----CACTCTTTAGTA----TTAGTAGACTCCAAG-A--
.
.
.
expected output:
>0197_16S
-AAACAGTCTTTTTAT----NTAGTTGACTCCTAG-A--
>0688_16S
----CACTCTTTAGTA----TTAGTAGACTCCAAG-A--
.
.
.

Try avoiding a while-loop.
Without the condition that keeps lines starting with ">" unchanged, you can do
sed -r 's/(..)./\1/g' foo.txt
The condition for the lines starting with > can be added by applying the substitution only to lines that don't match:
sed -r '/^>/ !s/(..)./\1/g' foo.txt
Or with awk (gensub is a GNU awk extension):
awk '/^>/ {print;next} {print gensub(/(..)./,"\\1", "g")}' foo.txt
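If you do want to keep the loop, it helps to see why the original test never matches: inside [ ], the = operator compares two strings literally, and the unquoted * is treated by the shell as a filename glob rather than as a pattern to match against. A minimal sketch of a loop that should also work under plain sh, using case for the prefix match (the function name strip_every_third is only illustrative, it is not from the question):

```shell
# Reads lines on stdin; header lines (starting with ">") pass through untouched,
# every other line has each third character removed.
strip_every_third() {
    while IFS= read -r line || [ -n "$line" ]; do
        case $line in
            '>'*) printf '%s\n' "$line" ;;
            *)    printf '%s\n' "$line" | sed 's/\(..\)./\1/g' ;;
        esac
    done
}
```

It could be called as strip_every_third < foo.txt > output.txt. Spawning sed once per line is slow on large files, which is one reason the single sed or awk command above is preferable.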

Related

Trying to take input file and textline from a given file and save it to other, using bash

What I have is a file (let's call it 'xfile'), containing lines such as
file1 <- this line goes to file1
file2 <- this goes to file2
and what I want to do is run a script that does the work of actually taking the lines and writing them into the file.
The way I would do that manually could be like the following (for the first line)
(echo "this line goes to file1"; echo) >> file1
So, to automate it, this is what I tried to do
IFS=$'\n'
for l in $(grep '[a-z]* <- .*' xfile); do
$(echo $l | sed -e 's/\([a-z]*\) <- \(.*\)/(echo "\2"; echo)\>\>\1/g')
done
unset IFS
But what I get is
-bash: file1(echo "this content goes to file1"; echo)>>: command not found
-bash: file2(echo "this goes to file2"; echo)>>: command not found
(on OS X)
What's wrong?
This solves your problem on Linux
awk -F ' <- ' '{print $2 >> $1}' xfile
Take care to choose the field separator so that the new files do not end up with leading or trailing spaces.
Give this a try on OSX
You can use the regex capabilities of bash directly. When you use the =~ operator to compare a variable to a regular expression, bash populates the BASH_REMATCH array with matches from the groups in the regex.
re='(.*) <- (.*)'
while read -r; do
if [[ $REPLY =~ $re ]]; then
file=${BASH_REMATCH[1]}
line=${BASH_REMATCH[2]}
printf '%s\n' "$line" >> "$file"
fi
done < xfile
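As for what went wrong in the original attempt: the text produced by $(...) is not parsed again as shell syntax, so the >> and the ( ) in it are never seen as a redirection or a subshell, and the result is simply looked up as a command name. A minimal sketch that avoids generating shell code altogether and should also work in plain sh, splitting each line with parameter expansion (the function name route_lines and the variable names are assumptions made for this example):

```shell
# Reads "target <- text" lines on stdin and appends each text to its target file.
route_lines() {
    while IFS= read -r entry || [ -n "$entry" ]; do
        case $entry in
            *' <- '*)
                target=${entry%%' <- '*}   # everything before the first " <- "
                text=${entry#*' <- '}      # everything after the first " <- "
                printf '%s\n' "$text" >> "$target"
                ;;
        esac
    done
}
```

It could be used as route_lines < xfile. Lines that do not contain " <- " are skipped.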

How to prevent writing new line while read line in bash

The example code below writes hi on a new line at every iteration. Is there a way to prevent this?
#!/bin/bash
while read line; do
var=$(echo $line | cut -d \, -f 2)
echo -n " $var"
done < file.csv > output.txt
Desired output is a concatenation of '$var's at each iteration. The code is run in OS X.
[Resolved]
In most similar cases klashww's answer would be the one to try, so I am accepting it as the answer. In my case, however, none of those options fixed the bug. The behavior was caused by an invisible '^M' (carriage return) at the end of each line, because the file came from Windows. The lesson is to strip the '^M' characters before processing the file in bash, using the line below. After that, the original code works fine.
tr -d '\015' < file > newfile
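If creating a cleaned copy of the file is inconvenient, the carriage return can also be removed inside the loop. A rough sketch, assuming the value of interest is the second comma-separated field (the function name join_second_field and the variable names are made up for this example):

```shell
# Prints the second comma-separated field of every stdin line on one output line,
# dropping a trailing carriage return left over from Windows line endings.
join_second_field() {
    cr=$(printf '\r')
    while IFS=, read -r first second rest || [ -n "$first" ]; do
        second=${second%"$cr"}   # only matters when the second field is the last one
        printf ' %s' "$second"
    done
    printf '\n'
}
```

It could be used as join_second_field < file.csv > output.txt.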
You might like to try using pure bash:
while IFS=',' read nu1 var nu2; do
echo -n " $var"
done < file.csv > output.txt
nu: "not used"
Use echo "hi\c" instead of echo -n "hi", or use printf if available, for example printf "hi".
In your example, this should work:
while read line; do
var=$(echo $line | cut -d \, -f 2)
printf ' %s' "$var"
done < file.csv > output.txt
Or you can use a better tool:
awk -F\, '{printf " %s", $2}' file.csv > output.txt
If everything else fails, brute-force it with tr:
echo " $var"| tr -d '\n'

Using cut on stdout with tabs

I have a file which contains lines of text with tabs
echo -e "foo\tbar\tfoo2\nx\ty\tz" > file.txt
I'd like to get the first column with cut. It works if I do
$ cut -f 1 file.txt
foo
x
But if I read it in a bash script
while read line
do
new_name=`echo -e $line | cut -f 1`
echo -e "$new_name"
done < file.txt
Then I get instead
foo bar foo2
x y z
What am I doing wrong?
/edit: My script looks like that right now
while IFS=$'\t' read word definition
do
clean_word=`echo -e $word | external-command`
echo -e "$clean_word\t<b>$word</b><br>$definition" >> $2
done < $1
External command removes diacritics from a Greek word. Can the script be optimized any further without changing external-command?
What is happening is that you did not quote $line when passing it to echo. The shell splits the unquoted value on whitespace and echo joins the resulting words with single spaces, so the original tab-delimited format is lost. Since cut's default delimiter is a TAB, it does not find any and prints the whole line.
So quoting works:
while read line
do
new_name=`echo -e "$line" | cut -f 1`
#----------------^^^^^^^
echo -e "$new_name"
done < file.txt
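The difference is easy to reproduce outside the loop. A small sketch (the variable names unquoted and quoted are only for illustration):

```shell
# A line with literal tab characters, like the ones in file.txt.
line=$(printf 'foo\tbar\tfoo2')

# Unquoted: the shell splits $line on whitespace and echo rejoins the words
# with single spaces, so cut finds no tab and prints everything.
unquoted=$(echo $line | cut -f 1)

# Quoted: the tabs survive, so cut can pick out the first field.
quoted=$(printf '%s\n' "$line" | cut -f 1)

printf '%s\n' "unquoted: $unquoted" "quoted: $quoted"
# prints:
# unquoted: foo bar foo2
# quoted: foo
```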
Note, however, that you could have used IFS to set the tab as field separator and read more than one parameter at a time:
while IFS=$'\t' read name rest;
do
echo "$name"
done < file.txt
returning:
foo
x
And, again, note that awk is even faster for this purpose:
$ awk -F"\t" '{print $1}' file.txt
foo
x
So, unless you want to call some external command while looping the file, awk (or sed) is better.

Bash script get item from array

I'm trying to read file line by line in bash.
Every line has format as follows text|number.
I want to produce file with format as follows text,text,text etc. so new file would have just text from previous file separated by comma.
Here is what I've tried and couldn't get it to work :
FILENAME=$1
OLD_IFS=$IFS
IFS=$'\n'
i=0
for line in $(cat "$FILENAME"); do
array=(`echo $line | sed -e 's/|/,/g'`)
echo ${array[0]}
i=i+1;
done
IFS=$OLD_IFS
But this prints both text and number but in different format text number
here is sample input :
dsadadq-2321dsad-dasdas|4212
dsadadq-2321dsad-d22as|4322
here is sample output:
dsadadq-2321dsad-dasdas,dsadadq-2321dsad-d22as
What did I do wrong?
Not pure bash, but you could do this in awk:
awk -F'|' 'NR>1{printf(",")} {printf("%s",$1)}'
Alternately, in pure bash and without having to strip the final comma:
#!/bin/bash
# You can get your input from somewhere else if you like. Even stdin to the script.
input=$'dsadadq-2321dsad-dasdas|4212\ndsadadq-2321dsad-d22as|4322\n'
# Output should be reset to empty, for safety.
output=""
# Step through our input. (I don't know your column names.)
while IFS='|' read left right; do
# Only add a field if it exists. Salt to taste.
if [[ -n "$left" ]]; then
# Append data to output string
output="${output:+$output,}$left"
fi
done <<< "$input"
echo "$output"
No need for arrays and sed:
while IFS='' read line ; do
echo -n "${line%|*}",
done < "$FILENAME"
You just have to remove the last comma :-)
Using sed:
$ sed ':a;N;$!ba;s/|[0-9]*\n*/,/g;s/,$//' file
dsadadq-2321dsad-dasdas,dsadadq-2321dsad-d22as
Alternatively, here is a bit more readable sed with tr:
$ sed 's/|.*$/,/g' file | tr -d '\n' | sed 's/,$//'
dsadadq-2321dsad-dasdas,dsadadq-2321dsad-d22as
Choroba has the best answer (imho) except that it does not handle blank lines and it adds a trailing comma. Also, mucking with IFS is unnecessary.
This is a modification of his answer that solves those problems:
while read line ; do
if [ -n "$line" ]; then
if [ -n "$afterfirst" ]; then echo -n ,; fi
afterfirst=1
echo -n "${line%|*}"
fi
done < "$FILENAME"
The first if is just to filter out blank lines. The second if and the $afterfirst variable are just there to prevent the extra comma: a comma is echoed before every entry except the first one. ${line%|*} is a parameter expansion that deletes the end of a parameter's value if it matches a pattern. line is the parameter, % is the symbol that indicates a trailing pattern should be deleted, and |* is the pattern to delete.
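To see the expansion in isolation, here is a small sketch using one of the sample lines from the question. The ## form, which strips the longest matching prefix instead of a suffix, is included only for comparison; the variable names are illustrative:

```shell
record='dsadadq-2321dsad-dasdas|4212'

text="${record%|*}"      # remove the shortest trailing match of "|*"
number="${record##*|}"   # remove the longest leading match of "*|"

printf '%s\n' "$text" "$number"
# prints:
# dsadadq-2321dsad-dasdas
# 4212
```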

BASH - Reading Multiple Lines from Text File

I am trying to read a text file, say file.txt, which contains multiple lines.
Say the contents of file.txt are
$ cat file.txt
this is line 1
this is line 2
this is line 3
I want to store the entire output as a variable say, $text.
When the variable $text is echoed, the expected output is:
this is line 1 this is line 2 this is line 3
my code is as follows
while read line
do
test="${LINE}"
done < file.txt
echo $test
The output I get is always only the last line. Is there a way to concatenate the multiple lines in file.txt as one long string?
You can translate each \n (newline) to a space:
$ text=$(tr '\n' ' ' <file.txt)
$ echo $text
this is line 1 this is line 2 this is line 3
If the lines end with \r\n, you can do this:
$ text=$(tr -d '\r' <file.txt | tr '\n' ' ')
Another one:
line=$(< file.txt)
line=${line//$'\n'/ }
test=$(cat file.txt | xargs)
echo $test
You have to append the content of the next line to your variable:
while read line
do
test="${test} ${line}"
done < file.txt
echo $test
Or, even simpler, you can read the full file into the variable at once:
test=$(cat file.txt)
or
test=$(tr "\n" " " < file.txt)
If you want to keep the newlines, it is as simple as:
test=$(<file.txt)
I believe it's the simplest method:
text=$(echo $(cat FILE))
But it doesn't preserve multiple spaces/tabs between words.
Use arrays
#!/bin/bash
while read line
do
a=( "${a[@]}" "$line" )
done < file.txt
echo -n "${a[@]}"
output:
this is line 1 this is line 2 this is line 3
See e.g. tldp section on arrays
