I have this input file
gb|KY798440.1|
gb|KY842329.1|
MG082893.1
MG173246.1
and I want to get all the characters that are between the "|" characters, or the full line if there is no "|". That is, the desired output looks like
KY798440.1
KY842329.1
MG082893.1
MG173246.1
I wrote:
while IFS= read -r line; do
if [[ $line == *\|* ]] ; then
sed 's/.*\|\(.*\)\|.*/\1/' <<< $line >> output_file
else echo $line >> output_file
fi
done < input_file
Which gives me
empty line
empty line
MG082893.1
MG173246.1
(note: empty line means an actual empty line - it doesn't actually write "empty line")
The sed command works on a single example (i.e. sed 's/.*\|\(.*\)\|.*/\1/' <<< "gb|KY842329.1|" outputs KY842329.1), but within the loop it just produces a blank line. The else echo $line >> output_file branch seems to work.
Bare sed:
$ sed 's/^[^|]*|\||[^|]*$//g' file
Output:
KY798440.1
KY842329.1
MG082893.1
MG173246.1
You could do
sed '/|/s/[^|]*|\([^|]*\)|.*/\1/' input
or
awk 'NF>1 {print $2} NF < 2 { print $1}' FS=\| input
or
sed -e 's/[^|]*|//' -e 's/|.*//' input
Related
I have a file for example with the name file.csv and content
adult,REZ
man,BRB
women,SYO
animal,HIJ
and a line (neither a directory nor a file) passed on the command line:
file.csv BRB1 REZ3 SYO2
And what I want to do is match the capitalized words on that line against the second field of the file and, for each match, take the nth letter of the first field, where n is the number at the end of the capitalized word,
and the output should then be
umo
I know that I can iterate over the words on the line with
for i in "${@:2}"
do
words+="$i "
done
and then the output is
REZ3 BRB1 SYO2
Using awk:
Pass the string of values as an awk variable and then split them into an array a. For each record in file.csv, iterate this array and if the second field of current record matches the first three characters of the current array value, then strip the target character from the first field of the current record and append it to a variable. Print the value of the aggregated variable.
awk -v arr="BRB1 REZ3 SYO2" -F, 'BEGIN{split(arr,a," ")} {for (v in a) { if ($2 == substr(a[v],1,3)) {n=substr(a[v],length(a[v]),1); w=w""substr($1,n,1) }}} END{print w}' file.csv
umo
You can also put this into a script:
#!/bin/bash
words="${2}"
src_file="${1}"
awk -v arr="$words" -F, 'BEGIN{split(arr,a," ")} \
{for (v in a) { \
if ($2 == substr(a[v],0,3)) { \
n=substr(a[v],length(a[v]),1); \
w=w""substr($1,n,1);
}
}
} END{print w}' "$src_file"
Script execution:
./script file.csv "BRB1 REZ3 SYO2"
umo
This is a way using sed.
Create a pattern string from command arguments and convert lines with sed.
#!/bin/bash
file="$1"
pat='s/^/ /;Te;'
for i in "${@:2}"; do
pat+=$(echo $i | sed 's#^\([^0-9]*\)\([0-9]*\)$#s/.\\{\2\\}\\(.\\).*,\1$/\\1/;#')
done
pat+='Te;H;:e;${x;s/\n//g;p}'
eval "sed -n '$pat' $file"
Try this code:
#!/bin/bash
declare -A idx_dic
filename="$1"
pattern_string=""
for i in "${#:2}";
do
pattern_words=$(echo "$i" | grep -oE '[A-Z]+')
index=$(echo "$i" | grep -oE '[0-9]+')
pattern_string+=$(echo "$pattern_words|")
idx_dic["$pattern_words"]="$index"
done
pattern_string=${pattern_string%|*}
while IFS= read -r line
do
line_pattern=$(echo "$line" | grep -oE "$pattern_string")
[[ -n $line_pattern ]] && line_index="${idx_dic[$line_pattern]}" && echo "$line" | awk -v i="$line_index" '{split($0, chars, ""); printf("%s", chars[i]);}'
done < "$filename"
First, find each capitalized word and record the index that goes with it.
Then construct the whole pattern string by joining the words with |.
Finally, iterate over every line, match it against the pattern string, and pick the letter at the recorded index.
Execute this script.sh like:
bash script.sh file.csv BRB1 REZ3 SYO2
I have a file which contains lines of text with tabs
echo -e "foo\tbar\tfoo2\nx\ty\tz" > file.txt
I'd like to get the first column with cut. It works if I do
$ cut -f 1 file.txt
foo
x
But if I read it in a bash script
while read line
do
new_name=`echo -e $line | cut -f 1`
echo -e "$new_name"
done < file.txt
Then I get instead
foo bar foo2
x y z
What am I doing wrong?
Edit: My script looks like this right now:
while IFS=$'\t' read word definition
do
clean_word=`echo -e $word | external-command`
echo -e "$clean_word\t<b>$word</b><br>$definition" >> $2
done < $1
External command removes diacritics from a Greek word. Can the script be optimized any further without changing external-command?
What is happening is that you did not quote $line when echoing it. Because of word splitting, the original tab-delimited format is lost: the tabs between the words become single spaces. And since cut's default delimiter is a TAB, it does not find any and prints the whole line.
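A quick way to see the effect (a minimal sketch; cat -A is the GNU coreutils flag that shows tabs as ^I and line ends as $):
line=$'foo\tbar\tfoo2'
echo -e $line   | cat -A    # word splitting: foo bar foo2$
echo -e "$line" | cat -A    # quoted, tabs preserved: foo^Ibar^Ifoo2$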
So quoting works:
while read line
do
new_name=`echo -e "$line" | cut -f 1`
#----------------^^^^^^^
echo -e "$new_name"
done < file.txt
Note, however, that you could have used IFS to set the tab as field separator and read more than one parameter at a time:
while IFS=$'\t' read name rest;
do
echo "$name"
done < file.txt
returning:
foo
x
And, again, note that awk is even faster for this purpose:
$ awk -F"\t" '{print $1}' file.txt
foo
x
So, unless you want to call some external command while looping the file, awk (or sed) is better.
This is my code
title=""
line=""
fname=$1
numoflines=$(wc -l < $fname)
for ((i=2 ; i<=$numoflines ; i++))
do
...
done
In the for loop I want to put the first word of every line into $title
and the rest of the line, without the first word, into $line
(using bash)
tnx
I am assuming that by print to a variable you mean add the contents of each line to the variable. To do this, you can use the bash built-in function read:
while read -r t l; do title+="$t"; line+="$l"; done < "$fname"
This will add the first word of every line to $title and the rest of the line to $line.
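A quick check with a hypothetical two-line file (sample.txt is made up here) shows what ends up in the variables; note that += appends without any separator:
printf 'This is my line.\nMy cat is green.\n' > sample.txt
title=""; line=""
while read -r t l; do title+="$t"; line+="$l"; done < sample.txt
echo "$title"   # ThisMy
echo "$line"    # is my line.cat is green.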
You can do something like this:
echo "$fname"
This is my line.
My cat is green.
title=$(awk '{print $1}' <<< "$fname")
line=$(awk '{$1="";sub(/^ /,"")}1' <<< "$fname")
echo "$title"
This
My
echo "$line"
is my line.
cat is green.
Alternative approach using the cut command:
file="./myfile.txt"
title=$(cut -f1 -d ' ' "$file")
line=$(cut -f2- -d ' ' "$file")
#check print
pr -tm <(echo -e "TITLES\n$title") <(echo -e "LINES\n$line")
for the following myfile.txt
My cat is green.
Green cats are strange.
prints
TITLES LINES
My cat is green.
Green cats are strange.
do
Tempo="$( sed -n "${i} {s/^[[:blank:]]*\([^[:blank:]]*\)[[:blank:]]*\(.*\)/title='\1';line='\2'/p;q;}" ${fname} )"
eval "${Tempo}"
done
# or
do
sed -n "${i} {p;q;}" | read Line Title
# but this does not keep content available on each OS/shell
done
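If Title and Line must remain set after that command, a sketch of a workaround is to use bash's <<< here-string instead of the pipe (still relying on $i and $fname from the question's loop):
do
    read -r Title Line <<< "$( sed -n "${i} {p;q;}" "${fname}" )"
done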
I am trying to split each line of a file into several lines at the string "\n", adding the number of the original line at the start of each line of the new file. I'll explain it with an example:
The original file:
2,5,6,\n6,3,4\n7,8,3
23,4,1,\n5,5,6,\n2,3,8
The file I want to get:
1,2,5,6
1,6,3,4
1,7,8,3
2,23,4,1
2,5,5,6
2,2,3,8
I have tried the following code but it didn't work at all:
a=1
while read line
do
sed 's/^/tty/' "$line\n" >>file.csv
tr -s '\n' >> out.csv
a=`expr $a + 1`
done < file.csv
With awk
awk '{n=split($0, a, /\\n/)
for (i=1; i<=n; i++) {sub(/,$/, "", a[i]); print NR","a[i]}}' file.txt
Using sed and a while loop:
a=1
while read -r line; do
sed -r "s/(([0-9]+,)+[0-9]+)[^0-9]*/$a,\1\n/g" <<< "$line"
((a++))
done < file.csv
1,2,5,6
1,6,3,4
1,7,8,3
2,23,4,1
2,5,5,6
2,2,3,8
a=1
while read -r line
do
echo "$line"|sed 's/\\n/\
'"$a"'/g'>>output.csv
a=`expr $a + 1`
done<filename
Press Enter manually after sed 's/\\n/\ so that the replacement contains a literal newline before $a.
I'm trying to create a script to fix a csv file like this:
field_one,field_two,field_three
,field_two,field_three
So I need to check inside my loop whether the current line is missing field_one and, if it is, use sed to substitute a new value for field_one (overwriting the line that is missing it).
For this I have a loop, but I need some help with identifying whether the line is missing field one or not. I should probably use grep? But how do I use it in a loop and get its response?
while read -r line; do
# this is pseudocode:
# if $line matches regex then
# sed 's/,/newfieldone/'
# overwrite the corrected line in the file
# end if
done < my_file
Thanks a lot in advance for your help!!!!
Inside your loop you can run the following sed command:
sed 's/^\s*,/newfieldone,/'
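For completeness, a sketch of how that could sit inside the loop from the question (my_file.fixed is just a made-up output name, and \s is a GNU sed extension):
while read -r line; do
    echo "$line" | sed 's/^\s*,/newfieldone,/'
done < my_file > my_file.fixed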
To see if a line begins with a , and is hence missing field one, you can use if [[ "$line" =~ ^, ]].
For example:
while read -r line; do
if [[ "$line" =~ ^, ]]
then
echo "newfieldone$line"
else
echo "$line"
fi
done < my_file
Just for the heck of it, here's a solution in awk:
awk '{FS=","} {if ($1 == "") print "field_one" $0;else print $0} ' < /tmp/test.txt
$ sed -e "/^,/s/^,\([^,]*\),\([^,]\)/new_field_one,\1,\2/" < my_file
Edit: This probably is too complicated. Take one of the other fine answers :)
With sed, try something like this:
sed -i 's|\(^,.*\)|new_field_one\1|g' <your file>
This might work for you:
a=Field_one,Field_two,Field_three
sed '/^,/c\'$a'' file
field_one,field_two,field_three
Field_one,Field_two,Field_three
Or if just inserting field_one:
a=Field_one
sed '/^,/s/^/'$a'/' file
field_one,field_two,field_three
Field_one,field_two,field_three
Simple bash solution using a case statement:
while read -r line; do
case "$line" in
,*) printf "%s%s\n" newfieldone "$line" ;;
*) printf "%s\n" "$line" ;;
esac
done < my_file
case uses "glob" matching, not regular expressions, so ,* matches a string beginning with a comma.
sed -i 's/^,/fieldone,/' YOURFILE
This will replace the leading , of every line starting with , by fieldone, (in place, so the original file gets overwritten; if you need a backup, try -i.backup).
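For example, with a backup suffix (GNU sed; the .backup suffix is arbitrary):
sed -i.backup 's/^,/fieldone,/' YOURFILE
# edits YOURFILE in place and keeps the original as YOURFILE.backup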
If you want a dynamic fieldone value, well, it depends how dynamic you want it to be :-), e.g.:
MYDYNAMICFIELDONE="DYNAF1"
sed -i "s/^,/${MYDYNAMICFIELDONE},/" YOURFILE
Or with your while loop:
while read -r line; do
MYDYNAMICFIELDONE="SET IT"
sed -i "s/^,/${MYDYNAMICFIELDONE},/"
done < my_file > tmpfile
mv tmpfile my_file
Or with awk:
awk '/^,/ {
    DYNAF1="SET IT HERE"
    $0 = gensub("^,", DYNAF1 ",", "g", $0)
}
{ print }' INPUT > OUTPUT
This is a pretty short 1-liner with awk
awk '{$1="field_one"}1' FS=',' OFS=',' file.csv
. . . and another awk one-liner:
awk '$1==""{$1="field_one"}1' FS=',' OFS=',' file
What about using bash only:
while IFS=, read -r field_one field_two rest_of_line
do
    echo "${field_one:-default_field_one_value},$field_two,$rest_of_line"
done < my_file > my_correct_file
where 'default_field_one_value' is used if 'field_one' is empty.
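Against the two sample lines from the question this should produce:
field_one,field_two,field_three
default_field_one_value,field_two,field_three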