output of oddlines in sed not appearing on separate lines - bash

I have the following file:
>A6NGG8_201_I_F
line2
>B1AK53_719_S_R
line4
>B1AK53_744_D_N
line5
>B7U540_205_R_H
line6
>B7U540_354_T_M
line7
where I want to print out all odd lines. I can do this by:
$ sed -n 1~2p file
>A6NGG8_201_I_F
>B1AK53_719_S_R
>B1AK53_744_D_N
>B7U540_205_R_H
>B7U540_354_T_M
and so I want to store the number in each line as a variable in bash, however I run into a problem - storing the result of sed puts the output all on one line:
#!/bin/bash
line1=$(sed -n 1~2p)
echo ${line1}
in which the output is:
>A6NGG8_201_I_F >B1AK53_719_S_R >B1AK53_744_D_N >B7U540_205_R_H >B7U540_354_T_M
so that when I do something like:
#!/bin/bash
line1=$(sed -n 1~2p)
pos=$(echo ${line1} | awk -F"[__]" 'NF>2{print $2}')
echo ${pos}
I get
201
where I of course want:
201
719
744
205
354
How do I store the result of sed into separate lines so that they are processed properly when piped into my awk statement? I see you can use the /a notation, however when I tried sed -n '/1~2p/a' file this does not work in my bash script. Thanks

As said in comments, you need to quote the variable to make this happen:
echo "${line1}"
instead of
echo ${line1}
However, you can directly say:
awk -F_ 'NR%2 && NF>2 {print $2}' file
This will process the odd lines and, in them, print the 2nd _-separated field, but only if there are more than 2 fields.
From tripleee's answer I observe that a FASTA file can contain a different format. If so, I guess you will still want to get the ID in the lines starting with ">". This can be translated as:
awk -F_ '/^>/ && NF>2 {print $2}' file
See an example of how quoting preserves the format:
The file:
$ cat a
hello
bye
Read it into a variable:
$ var=$(< a)
echo without quoting:
$ echo $var
hello bye
Let's quote!
$ echo "$var"
hello
bye
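Applied to the original question, the same idea could look like the sketch below. It assumes a small sample file recreated from the question; the temporary file and the variable names headers and positions are made up for illustration, and sed -n 'p;n' is used as a portable stand-in for GNU sed's 1~2p.

```shell
#!/bin/bash
# Sample data recreated from the question; the temp file is only for illustration.
sample=$(mktemp)
printf '%s\n' '>A6NGG8_201_I_F' 'line2' '>B1AK53_719_S_R' 'line4' > "$sample"

# 'p;n' prints a line and then skips the next one, i.e. it keeps the odd lines.
headers=$(sed -n 'p;n' "$sample")

# Quoting "$headers" keeps the embedded newlines, so awk sees one record per line.
positions=$(printf '%s\n' "$headers" | awk -F_ 'NF>2 {print $2}')
printf '%s\n' "$positions"

rm -f "$sample"
```

Without the double quotes around $headers, the shell would split the value into words and awk would receive everything as a single line.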

If you are trying to get the header lines out of a FASTA file, your problem statement is wrong -- the data between the headers could be more than one line. You could simply do
sed -n '/^>/!d;s/^[^_]*_//;s/_.*//p' file.fasta
to get just the second underscore-delimited field out of each header line; or equivalently, in Awk,
awk -F _ '/^>/ { print $2 }' file.fasta
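To see why matching on ">" is safer than counting lines, here is a small sketch. The FASTA content is invented for illustration (the first record deliberately has two sequence lines), and by_header / by_parity are hypothetical variable names.

```shell
#!/bin/bash
# Made-up FASTA sample: the first record spans two sequence lines.
fasta=$(mktemp)
printf '%s\n' '>A6NGG8_201_I_F' 'MKV' 'LLA' '>B1AK53_719_S_R' 'MTT' > "$fasta"

# Select header lines by their leading ">".
by_header=$(awk -F_ '/^>/ {print $2}' "$fasta")

# Select odd lines instead; the second header sits on line 4 and is missed.
by_parity=$(awk -F_ 'NR % 2 && NF > 2 {print $2}' "$fasta")

printf 'by header:\n%s\n' "$by_header"
printf 'by parity:\n%s\n' "$by_parity"

rm -f "$fasta"
```

The header-based version reports both positions, while the odd-line version only finds the first one once a sequence wraps onto a second line.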

Related

Assign bash value from value in specific line

I have a file that looks like:
>ref_frame=1
TPGIRYQYNVLPQGWKGSPAIFQSSMTKILEPFRKQNPDIVIYQYMDDLYVGSD
>ref_frame=2
HQGLDISTMCFHRDGKDHQQYSKVA*QKS*SLLENKIQT*LSINTWMICM*DLT
>ref_frame=3
TRD*ISVQCASTGMERITSNIPK*HDKNLRAF*KTKSRHSYLSIHG*FVCRI*
>test_3_2960_3_frame=1
TPGIRYQYNVLPQGWKGSPAIFQSSMTKILEPSRKQNPDIVIYQYMDDLYVGSD
I want to assign a bash variable so that echo $variable gives test_3_2960
The line/row that I want to assign the variable to will always be line 7. How can I accomplish this using bash?
so far I have:
variable=`cat file.txt | awk 'NR==7'`
echo $variable = >test_3_2960_3_frame=1
Using sed
$ variable=$(sed -En '7s/>(([^_]*_){2}[0-9]+).*/\1/p' input_file)
$ echo "$variable"
test_3_2960
No pipes needed here...
$: variable=$(awk -F'[>_]' 'NR==7{ OFS="_"; print $2, $3, $4; exit; }' file)
$: echo $variable
test_3_2960
-F is using either > or _ as field separators, so your data starts in field 2.
OFS="_" sets the Output Field Separator, but you could also just use "_" instead of commas.
exit keeps it from wasting time bothering to read beyond line 7.
If you wish to continue with awk
$ variable=$(awk 'NR==7' file.txt | awk -F "[>_]" '{print $2"_"$3"_"$4}')
$ echo $variable
test_3_2960
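Another possible variant, sketched here under the assumption that the part to drop is always the last two underscore-separated fields (as in _3_frame=1), is to let sed fetch line 7 and do the trimming with bash parameter expansion. The sequence lines in the sample are shortened placeholders, not the real data.

```shell
#!/bin/bash
# Sample with the header of interest on line 7; sequences are placeholders.
sample=$(mktemp)
printf '%s\n' '>ref_frame=1' 'AAA' '>ref_frame=2' 'BBB' '>ref_frame=3' 'CCC' \
    '>test_3_2960_3_frame=1' 'DDD' > "$sample"

line=$(sed -n '7{p;q;}' "$sample")   # print line 7 only, then stop reading
line=${line#">"}                      # strip the leading ">"
variable=${line%_*_*}                 # drop the last two "_"-separated fields
echo "$variable"

rm -f "$sample"
```

The % operator removes the shortest matching suffix, so only _3_frame=1 is cut off.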

How to find values in quotes using bash?

I have a file with the following content:
"X-Apple-I-MD-M" = "MR7v7ctwW0yr3mAUY3rAluXgOReA4CIn1JWJS2ba1s";
I want to extract the quoted value, so that the output is:
MR7v7ctwW0yr3mAUY3rAluXgOReA4CIn1JWJS2ba1s
Tks Everybody!
One awk idea, assuming this is the only line in the file:
$ awk -F'"' '{print $4}' file
MR7v7ctwW0yr3mAUY3rAluXgOReA4CIn1JWJS2ba1s
If there are other lines and you wish to focus only on the line with the string "X-Apple-I-MD-M":
Input file:
$ cat file
some line to ignore
"X-Apple-I-MD-M" = "MR7v7ctwW0yr3mAUY3rAluXgOReA4CIn1JWJS2ba1s";
other line to ignore and "with" some "quotes"
New awk idea:
$ pattern='X-Apple-I-MD-M'
$ awk -v ptn="${pattern}" -F'"' '$2==ptn {print $4}' file
MR7v7ctwW0yr3mAUY3rAluXgOReA4CIn1JWJS2ba1s
And saving the awk result in a variable:
$ mystring=$(awk ... )
$ echo "${mystring}"
MR7v7ctwW0yr3mAUY3rAluXgOReA4CIn1JWJS2ba1s
NOTE: keep in mind that if there are multiple matching lines in file then ${mystring} will contain a multi-line value (eg, line1match\nline2match\nline3match)
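As a rough illustration of the multi-line case, the sketch below uses an invented file with two matching lines (first-value and second-value are placeholders) and walks over the captured value one line at a time.

```shell
#!/bin/bash
# Invented sample with two matching lines.
sample=$(mktemp)
cat > "$sample" <<'EOF'
"X-Apple-I-MD-M" = "first-value";
some line to ignore
"X-Apple-I-MD-M" = "second-value";
EOF

pattern='X-Apple-I-MD-M'
mystring=$(awk -v ptn="${pattern}" -F'"' '$2==ptn {print $4}' "$sample")

# Read the multi-line value back one match per iteration.
count=0
while IFS= read -r value; do
    count=$((count + 1))
    printf 'match %d: %s\n' "$count" "$value"
done <<EOF
$mystring
EOF

rm -f "$sample"
```

If only the first match is wanted, adding an exit after the print in the awk script would stop at the first matching line.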
I always like sed.
$: echo '"X-Apple-I-MD-M" = "MR7v7ctwW0yr3mAUY3rAluXgOReA4CIn1JWJS2ba1s";'| sed -E 's/^.*= *"([^"]+)" *; *$/\1/'
MR7v7ctwW0yr3mAUY3rAluXgOReA4CIn1JWJS2ba1s
if it's a file,
$: sed -E 's/^.*= *"([^"]+)" *; *$/\1/' file
MR7v7ctwW0yr3mAUY3rAluXgOReA4CIn1JWJS2ba1s
With GNU grep(1), something like.
grep -Po '(?<="X-Apple-I-MD-M" = ").*(?=";)' <<< '"X-Apple-I-MD-M" = "MR7v7ctwW0yr3mAUY3rAluXgOReA4CIn1JWJS2ba1s";'
If it is in a file.
grep -Po '(?<="X-Apple-I-MD-M" = ").*(?=";)' file.txt
If your content is consistent, an ugly solution is:
VAL='"X-Apple-I-MD-M" = "MR7v7ctwW0yr3mAUY3rAluXgOReA4CIn1JWJS2ba1s";'
echo $VAL
echo $VAL | awk '{split($0, a, " = "); print(substr(a[2], 2, length(a[2]) - 3))}'
Guessing by the bash tag, this is probably supposed to be in pure Bash, without external processes…? Two (somewhat) random options:
while IFS='"' read _ _ _ code _; do
echo "$code"
done
while read line; do
line="${line#\"*\" = \"}"
line="${line%\";}"
echo "$line"
done
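Both loops read from standard input, so they need something fed into them. A minimal sketch of how they might be wired to a file follows; the temporary file and the variables from_read and from_trim are assumptions added for illustration, and -r is added to read so backslashes are left alone.

```shell
#!/bin/bash
sample=$(mktemp)
printf '%s\n' '"X-Apple-I-MD-M" = "MR7v7ctwW0yr3mAUY3rAluXgOReA4CIn1JWJS2ba1s";' > "$sample"

# Option 1: split on double quotes; the value lands in the fourth field.
while IFS='"' read -r _ _ _ code _; do
    from_read=$code
done < "$sample"

# Option 2: strip the '"key" = "' prefix and the '";' suffix.
while IFS= read -r line; do
    line=${line#\"*\" = \"}
    line=${line%\";}
    from_trim=$line
done < "$sample"

echo "$from_read"
echo "$from_trim"
rm -f "$sample"
```

Because the loops are fed by redirection rather than a pipe, the variables set inside them are still available afterwards.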

Replace text in file with incremented text

I have a file in the directory with this text
VERSION_NUMBER: 1
I need to get the value of VERSION_NUMBER, convert it to a number, add 1 to it, and store the result in a variable, for example a variable named test.
How can I do this using sed?
Assumptions:
there's only one line in the input file
there's no need to verify that the value following the : is a number
no need to update the file with the new value
Input file:
$ cat myfile
VERSION_NUMBER: 1
One sed idea:
$ x=$(sed -En 's/^.*: (.*)$/\1/p' myfile)
$ ((x++))
$ echo "${x}"
2
One cut idea:
$ x=$(cut -d: -f2 myfile)
$ ((x++))
$ echo "${x}"
2
Same thing with awk:
$ x=$(awk '{print $2}' myfile)
$ ((x++))
$ echo "${x}"
2
In a comment OP has asked how to update the file with the new value.
Since we're only talking about a single line the following ...
$ echo "VERSION_NUMBER: ${x}" > myfile
... is probably going to be easier/simpler than running another sed or awk command to overwrite the current file.
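For completeness, the read, increment and write-back steps could be combined roughly as follows. This is only a sketch: unlike the assumptions above it does check that the value is numeric, and next_version and updated are hypothetical variable names.

```shell
#!/bin/bash
myfile=$(mktemp)
echo 'VERSION_NUMBER: 1' > "$myfile"

# Only accept a line of the form "VERSION_NUMBER: <digits>".
x=$(sed -n 's/^VERSION_NUMBER: *\([0-9][0-9]*\)$/\1/p' "$myfile")

if [ -z "$x" ]; then
    echo "no numeric VERSION_NUMBER found" >&2
else
    next_version=$((x + 1))
    echo "VERSION_NUMBER: ${next_version}" > "$myfile"
fi

updated=$(cat "$myfile")
echo "$updated"
rm -f "$myfile"
```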

Read each line of a column of a file and execute grep

I have file.txt exemplary here:
This line contains ABC
This line contains DEF
This line contains GHI
and here the following list.txt:
contains ABC<TAB>ABC
contains DEF<TAB>DEF
Now I am writing a script that executes the following commands for each line of this external file list.txt:
take the string from column 1 of list.txt and search in a third file file.txt
if the first command is positive, return the string from column 2 of list.txt
So my output.txt is:
ABC
DEF
This is my code for grep/echo with putting the query/return strings manually:
if grep -i -q 'contains abc' file.txt
then
echo ABC >output.txt
else
echo -n
fi
if grep -i -q 'contains def' file.txt
then
echo DEF >>output.txt
else
echo -n
fi
I have about 100 search terms, which makes the task laborious if done manually. So how do I include while read line; do [commands]; done<list.txt together with the commands about column1 and column2 inside that script?
I would like to use simple grep/echo/awk commands if possible.
Something like this?
$ awk -F'\t' 'FNR==NR { a[$1] = $2; next } {for (x in a) if (index($0, x)) {print a[x]}} ' list.txt file.txt
ABC
DEF
For the lines of the first file (FNR==NR), read the key-value pairs to array a. Then for the lines of the second file, loop through the array, check if the key is found on the line, and if so, print the stored value. index($0, x) tries to find the contents of x from (the current line) $0. $0 ~ x would instead take x as a regex to match with.
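The difference between index() and ~ only shows up when the key contains regex metacharacters. A tiny sketch with an invented line and key (1.5, where the dot is a regex wildcard):

```shell
#!/bin/bash
line='version 1x5 released'

# index() looks for the literal string "1.5", which is not in the line.
literal=$(printf '%s\n' "$line" | awk -v x='1.5' '{print (index($0, x) ? "hit" : "miss")}')

# ~ treats "1.5" as a regex, where "." matches any character, including "x".
regex=$(printf '%s\n' "$line" | awk -v x='1.5' '{print ($0 ~ x ? "hit" : "miss")}')

echo "index: $literal, regex: $regex"
```

For plain search terms like "contains ABC" both behave the same, so index() is simply the more predictable choice.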
If you want to do it in the shell, starting a separate grep for each and every line of list.txt, something like this:
while IFS=$'\t' read k v ; do
grep -qFe "$k" file.txt && echo "$v"
done < list.txt
read k v reads a line of input and splits it (based on IFS) into k and v.
grep -F takes the pattern as a fixed string, not a regex, and -q prevents it from outputting the matching line. grep returns true if any matching lines are found, so $v is printed if $k is found in file.txt.
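A self-contained sketch of that loop, with two additions that are my assumptions about what the question wants: -i for case-insensitive matching (as in the manual grep calls) and a redirect into output.txt. The extra "contains xyz" entry is invented to show a term that is not found.

```shell
#!/bin/bash
dir=$(mktemp -d)
printf '%s\n' 'This line contains ABC' 'This line contains DEF' \
    'This line contains GHI' > "$dir/file.txt"
printf 'contains abc\tABC\ncontains def\tDEF\ncontains xyz\tXYZ\n' > "$dir/list.txt"

tab=$(printf '\t')
# For each "search<TAB>label" pair, print the label if the search term occurs.
while IFS=$tab read -r k v; do
    if grep -qiF -e "$k" "$dir/file.txt"; then
        printf '%s\n' "$v"
    fi
done < "$dir/list.txt" > "$dir/output.txt"

result=$(cat "$dir/output.txt")
echo "$result"
rm -rf "$dir"
```

Redirecting once after done means output.txt is written a single time, instead of mixing > and >> for each term.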
Using awk and grep:
for text in `awk '{print $4}' file.txt `
do
grep "contains $text" list.txt |awk -F $'\t' '{print $2}'
done

Bash Shell: Infinite Loop

The problem is the following: I have a file in which each line has this form:
id|lastName|firstName|gender|birthday|joinDate|IP|browser
I want to sort all the first names in that file alphabetically and print them one per line, with each name appearing only once.
I have created the following program, but for some reason it creates an infinite loop:
array1=()
while read LINE
do
if [ ${LINE:0:1} != '#' ]
then
IFS="|"
array=($LINE)
if [[ "${array1[@]}" != "${array[2]}" ]]
then
array1+=("${array[2]}")
fi
fi
done < $3
echo ${array1[@]} | awk 'BEGIN{RS=" ";} {print $1}' | sort
NOTES
if [ ${LINE:0:1} != '#' ] : this command is used because there are comments in the file that I don't want to print
$3 : filename
array1 : is used for all the separate names
Wow, there's a MUCH simpler and cleaner way to achieve this, without having to mess with the IFS variable or using arrays. You can use "for" to do this:
First I created a file with the same structure as yours:
$ cat file
id|lastName|Douglas|gender|birthday|joinDate|IP|browser
id|lastName|Tim|gender|birthday|joinDate|IP|browser
id|lastName|Andrew|gender|birthday|joinDate|IP|browser
id|lastName|Sasha|gender|birthday|joinDate|IP|browser
#id|lastName|Carly|gender|birthday|joinDate|IP|browser
id|lastName|Madson|gender|birthday|joinDate|IP|browser
Here's the script I wrote using "for":
#!/bin/bash
for LINE in `cat file | grep -v "^#" | awk -F'|' '{print$3}' | sort -u`
do
echo $LINE
done
And here's the output of this script:
$ ./script.sh
Andrew
Douglas
Madson
Sasha
Tim
Explanation:
for LINE in `cat file`
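Creates a loop over the output of the command between the backticks. Note that "for" splits that output on whitespace, so it iterates over words rather than whole lines; that works here because each name is a single word. The commands between ` are run by the shell (command substitution); for example, if you wanted to store the date inside of a variable you could use "VARDATE=`date`".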
grep -v "^#"
The option -v is used to exclude results matching the pattern, in this case the pattern is "^#". The "^" character means "line begins with". So grep -v "^#" means "exclude lines beginning with #".
awk -F'|' '{print$3}'
The -F option switches the column delimiter from the default (runs of spaces and tabs) to whatever you put between ' after it, in this case the "|" character.
The '{print$3}' prints the 3rd column.
sort -u
And the "sort -u" command sorts the names alphabetically and, because of -u, keeps each name only once.
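Since the pipeline already prints one name per line, the for loop is arguably optional. A shorter sketch that lets awk skip the comment lines as well is shown below; the records are made up for illustration (with Tim appearing twice to show the de-duplication), and names is a hypothetical variable.

```shell
#!/bin/bash
# Made-up records in the id|lastName|firstName|... layout; one is commented out.
sample=$(mktemp)
cat > "$sample" <<'EOF'
1|Smith|Tim|male|1990-01-01|2010-01-01|10.0.0.1|Firefox
#2|Jones|Carly|female|1991-02-02|2011-02-02|10.0.0.2|Chrome
3|Brown|Andrew|male|1992-03-03|2012-03-03|10.0.0.3|Safari
4|Green|Tim|male|1993-04-04|2013-04-04|10.0.0.4|Opera
EOF

# Skip lines starting with "#", print the 3rd field, then sort and de-duplicate.
names=$(awk -F'|' '!/^#/ {print $3}' "$sample" | sort -u)
printf '%s\n' "$names"

rm -f "$sample"
```

In the original script the file name would come from "$3" instead of the temporary file used here.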
