Isolate words followed by certain word combinations [closed] - terminal

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 1 year ago.
I have a dataset that looks something like the sample below. I'm working in the terminal on my Mac. I want to keep only the queries that are followed by 'Nohitsfound', like ENST00000446470.1, and remove the queries that are followed by 'Length' two or more consecutive times, like ENST00000382676.1. However, I don't know how to do this.
Query=ENST00000446470.1 Length=261 Nohitsfound Query=MSTRG.50645.1 Length=2007 Nohitsfound Query=ENST00000382676.1 Length=285 Length=94 Length=94 Length=94 Length=94 Query=ENST00000641821.1 Length=1217 Nohitsfound Query=ENST00000641436.1 Length=1821 Nohitsfound Query=ENST00000649959.1 Length=1734 Nohitsfound Query=MSTRG.50650.1 Length=245 Nohitsfound Query=ENST00000514465.1 Length=1395 Length=464 Length=464 Length=464
Any help is highly appreciated!

echo 'Query=ENST00000446470.1 Length=261 Nohitsfound Query=MSTRG.50645.1 Length=2007 Nohitsfound Query=ENST00000382676.1 Length=285 Length=94 Length=94 Length=94 Length=94 Query=ENST00000641821.1 Length=1217 Nohitsfound Query=ENST00000641436.1 Length=1821 Nohitsfound Query=ENST00000649959.1 Length=1734 Nohitsfound Query=MSTRG.50650.1 Length=245 Nohitsfound Query=ENST00000514465.1 Length=1395 Length=464 Length=464 Length=464' | tr "Q" "\n" | grep Nohitsfound | grep -vwE "Length.*Length"
This works as long as the character Q does not appear inside a query ID itself.
Explanation:
echo is a terminal command that prints its arguments to the screen.
| (called a pipe) sends the output of one command to the next command as its input.
tr replaces every "Q" with "\n" (a newline), so each query starts on its own line; the "Q" itself is consumed, so those lines begin with "uery=".
grep Nohitsfound keeps only the lines that contain Nohitsfound.
grep -vwE removes the lines that match the regular expression "Length.*Length", i.e. lines where Length occurs at least twice.
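Because tr swallows the "Q", a variant that keeps the query IDs intact may be handy. The following is only a minimal sketch, not the answerer's method: it assumes the tokens are separated by single spaces and that every record starts with a Query= token, and it uses a shortened three-query sample taken from the question. The variable names data and result are made up for illustration.

```shell
# Shortened sample from the question (three queries only).
data='Query=ENST00000446470.1 Length=261 Nohitsfound Query=ENST00000382676.1 Length=285 Length=94 Length=94 Query=ENST00000641821.1 Length=1217 Nohitsfound'

# Put each token on its own line, remember the most recent Query= token,
# and print it only when a Nohitsfound token follows.
result=$(printf '%s\n' "$data" | tr ' ' '\n' |
  awk '/^Query=/ { q = $0 } /^Nohitsfound$/ { print q }')

printf '%s\n' "$result"
```

With this sample, only the two queries followed by Nohitsfound are printed, each with its full Query= prefix.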

Related

Bash grep pattern matching on a substring [closed]

Closed. This question needs debugging details. It is not currently accepting answers.
Closed 2 years ago.
I'm struggling a little bit with grep pattern matching. I thought ${d%?} would match on all but the last character, but it seems to be matching more aggressively than that.
dbs=$(${db_home}/bin/srvctl config | sort)
counter=1
for d in ${dbs[@]};do
echo -e "dbs[${counter}] = ${d}"
inst2[${d}]=$(sudo ${grid_home}/bin/crsctl stat res -w "((TYPE = ora.database.type) AND (LAST_SERVER = $(hostname -s)))" -f | grep ^USR_ORA_INST_NAME= | grep ${d%?})
echo -e "inst2[${d}] = ${inst2[${d}]}"
I'd expect my output to be something like
dbs[1] = ope3u005
inst2[ope3u005] = USR_ORA_INST_NAME=ope3u0051
dbs[2] = ope3u006
inst2[ope3u006] = USR_ORA_INST_NAME=ope3u0061
But instead I'm getting
dbs[1] = ope3u005
inst2[ope3u005] = USR_ORA_INST_NAME=ope3u0051
USR_ORA_INST_NAME=ope3u0061
dbs[2] = ope3u006
inst2[ope3u006] = USR_ORA_INST_NAME=ope3u0051
USR_ORA_INST_NAME=ope3u0061
It's pretty clearly stripping off more than the last character for matching.
This isn't a use case for where we need to strip off the last character, but I'm trying to find a solution that works for this case as well as the cases where we do need to.
If you strip the final character off "ope3u005", the pattern becomes "ope3u00", which obviously matches both "ope3u0051" and "ope3u0061".
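A small sketch makes the effect visible. It reuses the database and instance names from the question's output; the variable names pattern, names, short_hits and full_hits are made up for illustration.

```shell
d=ope3u005
pattern=${d%?}          # strips exactly one trailing character -> ope3u00

# Instance names taken from the question's output.
names='USR_ORA_INST_NAME=ope3u0051
USR_ORA_INST_NAME=ope3u0061'

# Count how many instance lines each pattern matches.
short_hits=$(printf '%s\n' "$names" | grep -c -- "$pattern")
full_hits=$(printf '%s\n' "$names" | grep -c -- "$d")

echo "pattern=$pattern short_hits=$short_hits full_hits=$full_hits"
```

The shortened pattern matches both instance lines, while the unmodified name matches only one, so ${d%?} is doing what it should and the extra match comes from the shorter pattern.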

Bash-script won't work, initializing a variable (string) with output of sed [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 4 years ago.
I have some trouble with this script.
#!/bin/bash
i=1
for i in $(cat rows.txt);
do
j = "$(sed -ne "$[i-1]p" LOB_final.html)"
echo $j
sed -ne "$[i-1],$[i+4]p" LOB_final.html >> ./cards/$j.txt
done;
I have another file that contains the row numbers (not exactly: [row-1] is the relevant row). That row contains a string with spaces in it, which should become the name of the file.
The script works as expected so far, but initializing j doesn't work.
Does anyone have a tip?
Thank you.
EDIT: The goal was to write six rows (one before the given row, the given row itself, and four after it) to a file. The file should be named after the row just before the given row (a string that includes white space).
The question is resolved, thanks to all.
Check this; I have fixed quite a few issues with your script:
#!/bin/bash
i=1
for i in $(cat rows.txt);
do
j="$(sed -ne "$((i-1))p" LOB_final.html)"
echo "$j"
sed -ne "$((i-1)),$((i+4))p" LOB_final.html >> ./cards/"$j".txt
done;
Removed the spaces around = in the assignment to j (with the spaces, the shell tries to run a command named j)
Replaced the deprecated $[var] with $((var))
Put the variable expansions in double quotes
You seem to want to get the filename from line i-1 and then to store that line together with the next five lines in a file with that name.
Using awk:
awk 'NR == FNR { rows[$0]; next }
FNR + 1 in rows { name = sprintf("./cards/%s.txt", $0); left = 6 }
left > 0 { print >name; --left }' rows.txt LOB_final.html
This has been tested on some toy data, but since I don't know what your data looks like I can't say for certain that it will work without (minor) modifications.
It has the benefit that it will not parse the whole of LOB_final.html twice for each row number read from rows.txt, which your original code does. In fact, it only ever reads each file once.
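To see what the awk program produces, here is a small sketch that runs it on toy data. The file contents below are hypothetical stand-ins (the real rows.txt and LOB_final.html are not shown in the question), and everything happens in a scratch directory created with mktemp.

```shell
# Work in a scratch directory so nothing real is overwritten.
tmp=$(mktemp -d)
cd "$tmp" || exit 1
mkdir cards

# Hypothetical stand-ins for the real files: row number 3 is listed,
# so line 2 ("card title") should become the file name.
printf '%s\n' 'line 1' 'card title' 'line 3' 'line 4' \
              'line 5' 'line 6' 'line 7' 'line 8' > LOB_final.html
echo 3 > rows.txt

awk 'NR == FNR { rows[$0]; next }
FNR + 1 in rows { name = sprintf("./cards/%s.txt", $0); left = 6 }
left > 0 { print >name; --left }' rows.txt LOB_final.html

# Show the resulting card: the title line plus the five lines after it.
card=$(cat 'cards/card title.txt')
printf '%s\n' "$card"
```

The output file is named after the line just before the listed row, and it holds that line followed by the next five, which matches the six rows described in the question's edit.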

shell - remove numbers from a string column [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 6 years ago.
SER1828-ZXC-A1-10002
SER1878-IOP-B1-98989
SER1930-QWE-A2-10301
SER1930-QWE-A2-10301
SER1930-QWS_GH-A2-10301
SER1930-REM_PH-A2-10301
From the above data, my requirement is to remove any number from, e.g., "ZXC-A1".
output required is
SER1828-ZXC-A-10002
SER1878-IOP-B-98989
SER1930-QWE-A-10301
SER1930-QWE-A-10301
SER1930-QWS_GH-A-10301
SER1930-REM_PH-A-10301
[I] have tried everything
... except, say:
awk -F- 'BEGIN { OFS="-" } { sub(/[0-9]/,"",$3); print }' yourdata
You should have been clearer when you raised the problem. Do not add test cases later.
You can try this; I have changed the third field to the second-to-last field. But credit to @Kaz.
~> more test
SER1828-ZXC-A1-10002
SER1878-IOP-B1-98989
SER1930-QWE-A2-10301
SER1930-QWE-A2-10301
SER1930-QWS_GH-A2-10301
SER1930-REM_PH-A2-10301
SER1930-REM-SEW-PH-A2-10301
SER1940-REM-SPD-PL-D3-10301
~> awk -F- 'BEGIN { OFS="-" } { sub(/[0-9]/,"",$(NF-1)); print }' test
SER1828-ZXC-A-10002
SER1878-IOP-B-98989
SER1930-QWE-A-10301
SER1930-QWE-A-10301
SER1930-QWS_GH-A-10301
SER1930-REM_PH-A-10301
SER1930-REM-SEW-PH-A-10301
SER1940-REM-SPD-PL-D-10301
Edit
Explanation
-F- -> Set the input field separator to "-"
OFS="-" -> Set the output field separator to "-"
sub( -> Substitute the first match only
/[0-9]/ -> Match a single digit
"" -> Replace it with nothing
$(NF-1) -> Operate on the second-to-last field
print -> Print the modified record
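Note that sub replaces only the first match in the field, while gsub replaces every match. The sketch below contrasts the two on a hypothetical record whose second-to-last field holds two digits (A22 does not occur in the question's data); the variable names are made up for illustration.

```shell
# Hypothetical record whose second-to-last field contains two digits.
line='SER1930-QWE-A22-10301'

# sub removes only the first digit of the field.
with_sub=$(echo "$line" | awk -F- 'BEGIN { OFS="-" } { sub(/[0-9]/, "", $(NF-1)); print }')

# gsub removes every digit of the field.
with_gsub=$(echo "$line" | awk -F- 'BEGIN { OFS="-" } { gsub(/[0-9]/, "", $(NF-1)); print }')

echo "sub:  $with_sub"
echo "gsub: $with_gsub"
```

For the data shown in the question, where the field has a single digit, both calls give the same result; gsub only matters if fields such as A22 can occur.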

Grep string if it has multiple match [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 6 years ago.
It was hard to formulate the question, so I will show an example instead.
A text file has these lines:
city:state:address
city:state
city:
I need to extract the lines that have
a) exactly one occurrence of : and nothing after it
b) exactly one occurrence of : and a value after it
c) two occurrences of :
and put them into different files, so that one file contains all the lines like
city:state:address, a second the lines like city:state, and a third the lines like city:
Note: The file has many such lines. It is not necessary to create all three files in one command. A single command where I can specify how many : a line should contain would be enough.
Use these invocations of grep and pipe the output into different files:
grep -E "^[^:]+:\s*$" file.txt
grep -E "^[^:]+:[^:]+$" file.txt
grep -E "^[^:]+:[^:]+:.*$" file.txt
Each pattern looks for a run of characters that are not : with the regex [^:]+. It uses ^ and $ at the beginning and end to match the whole input line.
This is a job for awk, not grep. All you need is:
awk -F':' '
NF==3 { print > "file_c"; next }
{ print > ($2=="" ? "file_a" : "file_b") }
' file
and that'll create all the files you want in one pass of your input file.
If you have more fields and more rules just write them all down so they're mutually exclusive, e.g. you could implement the above as:
NF==3 { print > "file_c" }
NF==2 && $2=="" { print > "file_a" }
NF==2 && $2!="" { print > "file_b" }
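The question also asks for a single command where the number of : can be chosen. One possible sketch, not taken from either answer, relies on the fact that a line with n separators splits into n + 1 fields. The helper function sample and the variable n are made up for illustration.

```shell
# Hypothetical helper that prints the three sample lines from the question.
sample() { printf '%s\n' 'city:state:address' 'city:state' 'city:'; }

# A line containing n colons has n + 1 colon-separated fields.
two_colons=$(sample | awk -F: -v n=2 'NF == n + 1')
one_colon=$(sample | awk -F: -v n=1 'NF == n + 1')

echo "two colons:"; printf '%s\n' "$two_colons"
echo "one colon:";  printf '%s\n' "$one_colon"
```

With n=1 both city:state and city: are selected, since an empty trailing field still counts as a field; telling those two apart needs the extra $2=="" test used in the awk answer.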

ruby and sed -n matched group [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 8 years ago.
Running git config -l | sed -n 's/^user.name=\(.*\)$/{\1}/p' in the shell yields the current "user.name" set in the git config. But if I run the same command in backticks `` or with %x(<shell code>) in ruby, I get nothing returned.
I've found another way around without using sed in this case, but I'm wondering why I can get the output of sed without the -n flag, which would be whatever is piped to it, but I can never get the matched group (whether it be by itself or part of the stream that sed without the -n outputs).
You could do most of that in ruby:
conf = %x{git config -l}
if m = conf.match(/^user.name=(.*)/)
username = m[1]
end
To directly answer your question, the text in %x{} is subject to the same substitutions as double quoted strings, so you need to escape the backslashes:
irb(main):023:0> u = %x{git config -l | sed -n 's/^user.name=\(.*\)$/{\1}/p'}
=> ""
irb(main):024:0> u = %x{git config -l | sed -n 's/^user.name=\\(.*\\)$/{\\1}/p'}
=> "{Glenn Jackman}\n"
Or you could store the command in a single quoted string:
irb(main):020:0> cmd = %q{git config -l | sed -n 's/^user.name=\(.*\)$/{\1}/p'}
=> "git config -l | sed -n 's/^user.name=\\(.*\\)$/{\\1}/p'"
irb(main):022:0> u = %x{#{cmd}}
=> "{Glenn Jackman}\n"
