RNA Splicing Python - bioinformatics

I have a gene sequence –
"acguccgcaagagaagccuuaauauauucaaaaagcuacgccucagauuucgcgcucgagcccaaaacaacugguguacggguugaucacaucaaaugaagucgcuaaagucggugaucucacuauccuugucuucggcuuuugcucucucggcuaucaucuaagcaggcgaguuccauggugaccggaacgacggcuacuggaguccaugaucgcaagcgucgggcugggguaaaagaggcucagcucauaauaguccgccccaccaguacgggacucgauaggccccgucguugccguagaaacgcaauuuuccucagacccacuauacgcaccucgauuuagcaugguuccgggguugcgcuuugagaaucauacguaaggaucggaaccuaggaaugcaccacagaacuuugaaauacuagaacaaguugauugacaacggaguaucggcgccccacauuuaacgaauaauugcaggcgccagacgaugcuaggugcguccguaucaagauucgaggucgcuacuggcuucgcuugccgaucgagcucagaguuugugagaguuguuacuaauugcguggucgccuaauauccuugauacuacguggguguacuagacaucccggacagaaaaucucuuaaacgcuagaguucucuuggaagcgccugcacuucuugugaacauacgaugauagccacucuaagcccaacgcacuucgcuuggcccacauugcccccagagcuuauucaucgacaggcguuccacucuuggauucaucaguaaacuuuauuauacgugguaagcgugcuuauagcugucggaaucucacuuaggcggauugaagugagacagccugaaaguaaccguguacaggcgccgucaauguguuuugagugugcaccuacaaaaaguguuauuuaggcaggggagcuuuguaguuucuuuagaagagccgcgaaugaaccaacgguagacugcgagcgcguucaaccuaau"
I want to splice the RNA and extract two lists, exons and introns. An intron starts with gu and ends with ag. However, if ag appears before gu, it is part of the exon and not of an intron.
def splice(sequence):
    introns = list()
    exons = list()
    while(sequence.count("gu")):
        if "gu" not in sequence:
            break
        else:
            exons.append(sequence[:sequence.find("gu")])
            sequence = sequence[sequence.find("gu"):]
        if "ag" not in sequence:
            break
        else:
            introns.append(sequence[:sequence.find("ag")+2])
            sequence = sequence[sequence.find("ag")+2:]
    return introns, exons
This is what I have so far. It works for most of the sequence, but the problem comes at the end, when gu appears with no ag in the remaining string.
Output:
Exons:
['ac',
'agaagccuuaauauauucaaaaagcuacgccucagauuucgcgcucgagcccaaaacaacug',
'ucgcuaaa',
'caggcga',
'uccaugaucgcaagc',
'aggcucagcucauaaua',
'uacgggacucgauaggcccc',
'aaacgcaauuuuccucagacccacuauacgcaccucgauuuagcaug',
'aaucauac',
'gaucggaaccuaggaaugcaccacagaacuuugaaauacuagaacaa',
'uaucggcgccccacauuuaacgaauaauugcaggcgccagacgaugcuag',
'auucgag',
'cucaga',
'a',
'acaucccggacagaaaaucucuuaaacgcuaga',
'cgccugcacuucuu',
'ccacucuaagcccaacgcacuucgcuuggcccacauugcccccagagcuuauucaucgacaggc',
'uaaacuuuauuauac',
'c',
'cu',
'gcggauugaa',
'acagccugaaa',
'gcgcc',
'u',
'u',
'gcaggggagcuuu',
'uuucuuuagaagagccgcgaaugaaccaacg',
'acugcgagcgc']
Introns:
['guccgcaag',
'guguacggguugaucacaucaaaugaag',
'gucggugaucucacuauccuugucuucggcuuuugcucucucggcuaucaucuaag',
'guuccauggugaccggaacgacggcuacuggag',
'gucgggcugggguaaaag',
'guccgccccaccag',
'gucguugccguag',
'guuccgggguugcgcuuugag',
'guaag',
'guugauugacaacggag',
'gugcguccguaucaag',
'gucgcuacuggcuucgcuugccgaucgag',
'guuugugag',
'guuguuacuaauugcguggucgccuaauauccuugauacuacguggguguacuag',
'guucucuuggaag',
'gugaacauacgaugauag',
'guuccacucuuggauucaucag',
'gugguaag',
'gugcuuauag',
'gucggaaucucacuuag',
'gugag',
'guaaccguguacag',
'gucaauguguuuugag',
'gugcaccuacaaaaag',
'guuauuuag',
'guag',
'guag']

I fixed the problem by using a regular expression.
import re

def splice(gene_sequence):
    # "gu", then as few characters as possible, then "ag"
    regex = r"gu\w*?ag"
    introns = re.findall(regex, gene_sequence)
    # Remove every intron match to get the spliced sequence as one string
    exon = re.sub(regex, "", gene_sequence)
    return introns, exon
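The question asks for the exons as a list rather than as one string. A minimal sketch of that, assuming the same lazy "gu ... ag" pattern is the right definition of an intron: re.split cuts the sequence at every intron match, so the pieces in between are the exons. The name splice_to_lists and the short test sequence are made up for illustration.

```python
import re

# "gu", then as few characters as possible, then "ag"
INTRON = re.compile(r"gu\w*?ag")

def splice_to_lists(sequence):
    """Return (introns, exons) as two lists (hypothetical helper)."""
    introns = INTRON.findall(sequence)
    # The text between intron matches is the exons; a trailing "gu"
    # with no later "ag" is never matched, so it stays in the last exon.
    exons = INTRON.split(sequence)
    return introns, exons

sequence = "acguccgcaagagaagguagcgucc"
introns, exons = splice_to_lists(sequence)
print(introns)  # ['guccgcaag', 'guag']
print(exons)    # ['ac', 'agaag', 'cgucc']
```

Because re.split always returns one more piece than there are matches, interleaving the two lists rebuilds the original sequence. If the sequence ends exactly on an intron, the last exon is an empty string, which can be filtered out if it is not wanted.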

Related

How to parse username, ID or whole part using Ruby Regex in this sentence?

I have a sentence like this:
Hello #[Pratha](user:1), did you see #[John](user:3)'s answer?
What I want is to get #[Pratha](user:1) and #[John](user:3), either as names and ids or just as the text I quoted, so that I can split it and parse the name and id myself.
But there is an issue here. Names like Pratha and John may include non-alphabetic characters such as ', ,, -, +, etc., but not [] and ().
What I tried so far:
c = ''
f = c.match(/(?:\s|^)(?:#(?!(?:\d+|\w+?_|_\w+?)(?:\s(\[)|$)))(\w+)(?=\s|$)/i)
But no success.
You may use
/#\[([^\]\[]*)\]\([^()]*:(\d+)\)/
Details
# - a # char
\[ - a [
([^\]\[]*) - Group 1: 0+ chars other than [ and ]
\] - a ] char
\( - a ( char
[^()]*- 0+ chars other than ( and )
: - a colon
(\d+) - Group 2: 1 or more digits
\) - a ) char.
Sample Ruby code:
s = "Hello #[Pratha](user:1), did you see #[John](user:3)'s answer?"
rx = /#\[([^\]\[]*)\]\([^()]*:(\d+)\)/
res = s.scan(rx)
p res
# => [["Pratha", "1"], ["John", "3"]]
"Hello #[Pratha](user:1), did you see #[John](user:3)'s answer?".scan(/#.*?\)/)
#⇒ ["#[Pratha](user:1)", "#[John](user:3)"]
Since the line is not coming from user input, you can rely on the part you are interested in starting with # and ending with ).
You could use 2 capturing groups to get the names and the ids:
#\[([^]]+)]\([^:]+:([^)]+)\)
That will match
# Match # literally
\[ Match [
([^]]+) 1st capturing group, which matches not ] 1+ times using a negated character class
] Match ]
\( Match (
[^:]+: Match not : 1+ times, then match :
([^)]+) 2nd capturing group, which matches not ) 1+ times
\) Match )

Remove nTh record from array using loop [closed]

I'm writing a program that reads a .csv file, and then loops through it removing every 10th record it encounters before outputting it.
I've been stuck on what I believe is a syntax issue for a while now and just can't seem to nail it. Anyone mind having a look?
lines = []
i = 0
elements = []
element2 = []
output = []
file = File.open("./properties.csv", "r")
while (line = file.gets)
  i += 1
  # use split to break array up using commas
  arr = line.split(',')
  elements.push({ id: arr[0], streetAddress: arr[1], town: arr[2], valuationDate: arr[3], value: arr[4] })
end
file.close
# filter out blanks and nil rows
x = elements.select { |elements| elements[:id].to_i >= 0.1}
# Loop to remove every 10th record
e = 0
d = 1
loop do x.length
  if e == (10 * d)
    d ++
    e ++
  else
    x = elements.select[e]
    e ++
  end
puts x
puts "#{x.length} house in list, #{d} records skipped."
CSV FILE
ID,Street address,Town,Valuation date,Value
1,1 Northburn RD,WANAKA,1/1/2015,280000
2,1 Mount Ida PL,WANAKA,1/1/2015,280000
3,1 Mount Linton AVE,WANAKA,1/1/2015,780000
4,1 Kamahi ST,WANAKA,1/1/2015,155000
5,1 Kapuka LANE,WANAKA,1/1/2015,149000
6,1 Mohua MEWS,WANAKA,1/1/2015,560000
7,1 Kakapo CT,WANAKA,1/1/2015,430000
8,1 Mt Gold PL,WANAKA,1/1/2015,1260000
9,1 Penrith Park DR,WANAKA,1/1/2015,1250000
10,1 ATHERTON PL,WANAKA,1/1/2015,650000
11,1 WAIMANA PL,WANAKA,1/1/2015,780000
12,1 ROTO PL,WANAKA,1/1/2015,1470000
13,1 Toms WAY,WANAKA,1/1/2015,2230000
14,1 MULBERRY LANE,WANAKA,1/1/2015,415000
15,1 Range View PL,WANAKA,1/1/2015,300000
16,1 Clearview ST,WANAKA,1/1/2015,1230000
17,1 Clutha PL,WANAKA,1/1/2015,700000
18,1 Centre CRES,WANAKA,1/1/2015,295000
19,1 Valley CRES,WANAKA,1/1/2015,790000
20,1 Edgewood PL,WANAKA,1/1/2015,365000
21,1 HUNTER CRES,WANAKA,1/1/2015,335000
22,1 KOWHAI DR,WANAKA,1/1/2015,480000
23,1 RIMU LANE,WANAKA,1/1/2015,465000
24,1 CHERRY CT,WANAKA,1/1/2015,495000
25,1 COLLINS ST,WANAKA,1/1/2015,520000
26,1 AUBREY RD,WANAKA,1/1/2015,985000
27,1 EELY POINT RD,WANAKA,1/1/2015,560000
28,1 LINDSAY PL,WANAKA,1/1/2015,385000
29,1 WINDERS ST,WANAKA,1/1/2015,760000
30,1 Manuka CRES,WANAKA,1/1/2015,510000
31,1 WILEY RD,WANAKA,1/1/2015,420000
32,1 Baker GR,WANAKA,1/1/2015,820000
33,1 Briar Bank DR,WANAKA,1/1/2015,1260000
34,1 LAKESIDE RD,WANAKA,1/1/2015,440000
35,1 PLANTATION RD,WANAKA,1/1/2015,345000
36,1 Allenby PL,WANAKA,1/1/2015,640000
37,1 ROB ROY LANE,WANAKA,1/1/2015,380000
38,1 Ansted PL,WANAKA,1/1/2015,590000
39,1 Fastness CRES,WANAKA,1/1/2015,640000
40,1 APOLLO PL,WANAKA,1/1/2015,385000
41,1 AEOLUS PL,WANAKA,1/1/2015,370000
42,1 Peak View RDGE,WANAKA,1/1/2015,1750000
43,1 Moncrieff PL,WANAKA,1/1/2015,530000
44,1 Islington PL,WANAKA,1/1/2015,190000
45,1 Hidden Hills DR,WANAKA,1/1/2015,1280000
46,1 Weatherall CL,WANAKA,1/1/2015,425000
47,1 Terranova PL,WANAKA,1/1/2015,900000
48,1 Cliff Wilson ST,WANAKA,1/1/2015,1200000
49,1 TOTARA TCE,WANAKA,1/1/2015,460000
50,1 Koru WAY,WANAKA,1/1/2015,570000
51,1 Bovett PL,Wanaka,1/1/2015,495000
52,1 Pearce PL,Wanaka,1/1/2015,675000
53,1 Ironside DR,WANAKA,1/1/2015,570000
54,1 Bob Lee PL,WANAKA,1/1/2015,610000
55,1 Hogan LANE,WANAKA,1/1/2015,395000
56,1 ARDMORE ST,WANAKA,1/1/2015,1190000
57,1 Bullock Creek LANE,WANAKA,1/1/2015,11125000
58,1 DUNMORE ST,WANAKA,1/1/2015,1300000
59,1 Primary LANE,WANAKA,1/1/2015,430000
60,1 SYCAMORE PL,WANAKA,1/1/2015,720000
61,1 FAULKS TCE,WANAKA,1/1/2015,780000
62,1 Alpha CL,WANAKA,1/1/2015,500000
63,1 Coromandel ST,WANAKA,1/1/2015,530000
64,1 Niger ST,WANAKA,1/1/2015,475000
65,1 Maggies Way,WANAKA,1/1/2015,375000
66,1 Hollyhock LANE,QUEENSTOWN,1/1/2015,1080000
67,1 ELDERBERRY CRES,WANAKA,1/1/2015,1340000
68,1 Foxglove HTS,WANAKA,1/1/2015,2520000
69,1 MEADOWSTONE DR,WANAKA,1/1/2015,650000
70,1 OAKWOOD PL,WANAKA,1/1/2015,580000
71,1 MEADOWBROOK PL,WANAKA,1/1/2015,645000
72,1 Jessies CRES,WANAKA,1/1/2015,320000
73,1 Lansdown ST,WANAKA,1/1/2015,700000
74,1 Stonebrook DR,WANAKA,1/1/2015,640000
75,1 Hyland ST,WANAKA,1/1/2015,500000
76,1 TAPLEY PADDOCK,WANAKA,1/1/2015,720000
77,1 Homestead CL,WANAKA,1/1/2015,1750000
78,1 NORMAN TCE,WANAKA,1/1/2015,620000
79,1 Sunrise Bay DR,WANAKA,1/1/2015,3000000
80,1 LARCH PL,WANAKA,1/1/2015,570000
81,1 MILL END,WANAKA,1/1/2015,600000
82,1 Bills WAY,WANAKA,1/1/2015,750000
83,1 Heuchan LANE,WANAKA,1/1/2015,610000
84,1 SARGOOD DR,WANAKA,1/1/2015,455000
85,1 Frederick ST,WANAKA,1/1/2015,455000
86,1 Connell TCE,WANAKA,1/1/2015,600000
87,1 Soho ST,QUEENSTOWN,1/1/2015,320000
88,1 Hikuwai DR,ALBERT TOWN,1/1/2015,280000
89,1 Harrier LANE,WANAKA,1/1/2015,1000000
90,1 Ewing PL,WANAKA,1/1/2015,780000
91,1 Sherwin AVE,ALBERT TOWN,1/1/2015,440000
92,1 Hardie PL,WANAKA,1/1/2015,830000
93,1 Finch ST,ALBERT TOWN,1/1/2015,540000
94,1 Poppy LANE,ALBERT TOWN,1/1/2015,395000
95,1 Warbler LANE,ALBERT TOWN,1/1/2015,410000
96,1 Balneaves LANE,WANAKA,1/1/2015,250000
97,1 Mill Green,Arrowtown,1/1/2015,800000
require 'csv'
elements = {}
CSV.foreach("properties.csv", :headers => true, :header_converters => :symbol) do |row|
  elements[row.fields[0]] = Hash[row.headers[1..-1].zip(row.fields[1..-1])]
end
d = 0
e = 0
elements.delete_if do |key, value|
  e += 1
  if e == 10
    e = 0
    d += 1
  end
  e == 0
end
puts "#{elements.length} house in list, #{d} records skipped."
At the end of this, elements will have every 10th row removed, and d contains the number of rows removed.

Regex for First Line (Only) that Contains a String

I have a bunch of phone numbers with one per line:
[Home] (202) 121-7777
C (202) 456-1111
[mobile] 55 55 5 55555
[Work] (404) 555-1234
[Cell] (505) 555-1234
W 303-555-5555
M 777-555-5555
c 12346567s
I want to grab the first one that contains the letter "c" upper or lower case.
So far, I have this /^.*[C].*$/i and that matches C (202) 456-1111, [Cell] (505) 555-1234 and c 12346567s. How do I return only the first? In other words, the match should only be C (202) 456-1111.
I have been blindly putting question marks everywhere without success.
I am using Ruby if it makes a difference http://www.rubular.com/r/h6ReB9IN8t
Edit: Here is another question that Hrishi pointed to but I cannot figure out how to adapt it to match the whole line.
Try the match method. Here is an example:
list = <<EOF
[Home] (202) 121-7777
C (202) 456-1111
[mobile] 55 55 5 55555
[Work] (404) 555-1234
[Cell] (505) 555-1234
W 303-555-5555
M 777-555-5555
c 12346567s
EOF
Update
# match the first line containing the letter "c", even as part of a word
puts list.match(/^.*C.*$/i)
# match the first line where "c" is a separate letter at the start, not part of a word
puts list.match(/^\W*C\W.*$/i)
I'd go about this a bit differently. I prefer to reduce regular expressions to very simple patterns:
str = <<EOT
[Home] (202) 121-7777
C (202) 456-1111
[mobile] 55 55 5 55555
[Work] (404) 555-1234
[Cell] (505) 555-1234
W 303-555-5555
M 777-555-5555
c 12346567s
EOT
Finding the right line to work with is easily done using either select or find:
str.split("\n").select{ |s| s[/c/i] }.first # => "C (202) 456-1111"
str.split("\n").find{ |s| s[/c/i] } # => "C (202) 456-1111"
I'd recommend find because it only returns the first occurrence.
Once the desired string is found, use scan to grab the numbers:
str.split("\n").find{ |s| s[/c/i] }.scan(/\d+/) # => ["202", "456", "1111"]
Then join them. When you have phone numbers stored in a database you don't really want them to be formatted, you just want the numbers. Formatting occurs later when you're outputting them again.
phone_number = str.split("\n").find{ |s| s[/c/i] }.scan(/\d+/).join # => "2024561111"
When you need to output the number, break it into the right grouping based on the regional phone-number representation. You should have some idea where the person is located, because you've usually also got their country code. Based on that you know how many digits you should have, plus the groups:
area_code, prefix, number = phone_number[0 .. 2], phone_number[3 .. 5], phone_number[6 .. 9] # => ["202", "456", "1111"]
Then output them so they're displayed correctly:
"(%s) %s-%s" % [area_code, prefix, number] # => "(202) 456-1111"
As for your original pattern /^.*[C].*$/i, there are some problems with it:
^.* says "start at the beginning of the line and find zero or more characters", which is no more effective than saying /[C]/.
Using [C] creates an unnecessary character set, which means "find one of the letters in the set 'C'". It does nothing useful, so just use C, as in /C/.
.*$ artificially finds the end of the line too, but since you're not capturing it nothing is accomplished, so don't bother with it. The regex is now /C/.
Since you want to match upper and lower case, use /C/i or /c/i. (Or you could use /[cC]/, but why?)
Instead:
To find a "c" or "C" anywhere in the string, just use /c/i. That's all that's needed. http://rubular.com/r/uPyxACOWls
To find "c", "C", "cell" or "Cell", you can use /c(?:ell)?/i. http://rubular.com/r/TkSRPWG2y6
To find "c", "C", "cell" or "Cell" as a separate word, use word-break markers, as in /\bc(?:ell)?\b/i. http://rubular.com/r/Smo0bFs9w8
You can get a whole lot more complicated, but if you're not accomplishing anything with the additional pattern information, you're just wasting the regex-engine's CPU-time, and slowing your code. A confused regex-engine can waste a LOT of CPU-time, so be efficient and aware of what you're asking it to do.
EDIT Added two more ways of handling this. The last one is preferable.
This will do what you want. It will search for matches of your regex, and then get the first one. Please note that this will produce an error if string does not have any matches.
string = "[Home] (202) 121-7777
C (202) 456-1111
[mobile] 55 55 5 55555
[Work] (404) 555-1234
[Cell] (505) 555-1234
W 303-555-5555
M 777-555-5555
c 12346567s"
puts string.match(/^(.*[C].*)$/i).captures.first
puts string.match(/^(.*[C].*)$/i)
puts string[/^(.*[C].*)$/i]
Ruby Docs String#match.
Split the string by the new line characters, and select the substring which matches your requirements and grab the first one:
str = '[Home] (202) 121-7777
C (202) 456-1111
[mobile] 55 55 5 55555
[Work] (404) 555-1234
[Cell] (505) 555-1234
W 303-555-5555
M 777-555-5555
c 12346567s'
p str.split(/\n/).select{|el| el =~ /^.*[C].*$/i}[0]
or use match:
p str.match(/^.*[C].*$/i)[0]
EDITED:
Or, in case you want to find the first chunk that exactly starts with C try this:
p str.match(/^C.*$/)[0]

Join array of strings into 1 or more strings each within a certain char limit (+ prepend and append texts)

Let's say I have an array of Twitter account names:
string = %w[example1 example2 example3 example4 example5 example6 example7 example8 example9 example10 example11 example12 example13 example14 example15 example16 example17 example18 example19 example20]
And a prepend and append variable:
prepend = 'Check out these cool people: '
append = ' #FollowFriday'
How can I turn this into an array of as few strings as possible each with a maximum length of 140 characters, starting with the prepend text, ending with the append text, and in between the Twitter account names all starting with an #-sign and separated with a space. Like this:
tweets = ['Check out these cool people: #example1 #example2 #example3 #example4 #example5 #example6 #example7 #example8 #example9 #FollowFriday', 'Check out these cool people: #example10 #example11 #example12 #example13 #example14 #example15 #example16 #example17 #FollowFriday', 'Check out these cool people: #example18 #example19 #example20 #FollowFriday']
(The order of the accounts isn't important so theoretically you could try and find the best order to make the most use of the available space, but that's not required.)
Any suggestions? I'm thinking I should use the scan method, but haven't figured out the right way yet.
It's pretty easy using a bunch of loops, but I'm guessing that won't be necessary when using the right Ruby methods. Here's what I came up with so far:
# Create one long string of #usernames separated by a space
tmp = twitter_accounts.map!{|a| a.insert(0, '#')}.join(' ')
# alternative: tmp = '#' + twitter_accounts.join(' #')
# Number of characters left for mentioning the Twitter accounts
length = 140 - (prepend + append).length
# This method would split a string into multiple strings
# each with a maximum length of 'length' and it will only split on empty spaces (' ')
# ideally strip that space as well (although .map(&:strip) could be used too)
tweets = tmp.some_method(' ', length)
# Prepend and append
tweets.map!{|t| prepend + t + append}
P.S.
If anyone has a suggestion for a better title let me know. I had a difficult time summarizing my question.
The String rindex method has an optional parameter where you can specify where to start searching backwards in a string:
arr = %w[example1 example2 example3 example4 example5 example6 example7 example8 example9 example10 example11 example12 example13 example14 example15 example16 example17 example18 example19 example20]
str = arr.map{|name|"##{name}"}.join(' ')
prepend = 'Check out these cool people: '
append = ' #FollowFriday'
max_chars = 140 - prepend.size - append.size
until str.size <= max_chars do
  p str.slice!(0, str.rindex(" ", max_chars))
  str.lstrip! # get rid of the leading space
end
p str unless str.empty?
I'd make use of reduce for this:
string = %w[example1 example2 example3 example4 example5 example6 example7 example8 example9 example10 example11 example12 example13 example14 example15 example16 example17 example18 example19 example20]
prepend = 'Check out these cool people:'
append = '#FollowFriday'
# Extra -1 is for the space before `append`
max_content_length = 140 - prepend.length - append.length - 1
content_strings = string.reduce([""]) { |result, target|
  result.push("") if result[-1].length + target.length + 2 > max_content_length
  result[-1] += " ##{target}"
  result
}
tweets = content_strings.map { |s| "#{prepend}#{s} #{append}" }
Which would yield:
"Check out these cool people: #example1 #example2 #example3 #example4 #example5 #example6 #example7 #example8 #example9 #FollowFriday"
"Check out these cool people: #example10 #example11 #example12 #example13 #example14 #example15 #example16 #example17 #FollowFriday"
"Check out these cool people: #example18 #example19 #example20 #FollowFriday"

multiline matching with ruby

I have a string variable with multiple lines: e.g.
"SClone VARPB63A\nSeq_vec SVEC 1 65 pCR2.1-topo\nSequencing_vector \"pCR2.1-topo\"\nSeq_vec SVEC 102 1710 pCR2.1-topo\nClipping QUAL 46 397\n"
I want to get both lines that start with "Seq_vec SVEC" and extract the integer values from each of them...
string = "Clone VARPB63A\nSeq_vec SVEC 1 65 pCR2.1-topo\nSequencing_vector \"pCR2.1-topo\"\nSeq_vec SVEC 102 1710 pCR2.1-topo\nClipping QUAL 46 397\n"
seqvector = Regexp.new("Seq_vec\\s+SVEC\\s+(\\d+\\s+\\d+)",Regexp::MULTILINE )
vector = string.match(seqvector)
if vector
  vector_start,vector_stop = vector[1].split(/ /)
  puts vector_start.to_i
  puts vector_stop.to_i
end
However, this only grabs the first match's values and not the second, as I would like.
Any ideas what I could be doing wrong?
Thank you
To capture groups use String#scan
vector = string.scan(seqvector)
=> [["1 65"], ["102 1710"]]
match finds just the first match. To find all matches use String#scan e.g.
string.scan(seqvector)
=> [["1 65"], ["102 1710"]]
or to do something with each match:
string.scan(seqvector) do |match|
  # match[0] will be the substring captured by your first regexp grouping
  puts match.inspect
end
Just to make this a bit easier to handle, I would split the whole string into an array first and then would do:
string = "SClone VARPB63A\nSeq_vec SVEC 1 65 pCR2.1-topo\nSequencing_vector \"pCR2.1-topo\"\nSeq_vec SVEC 102 1710 pCR2.1-topo\nClipping QUAL 46 397\n"
selected_strings = string.split("\n").select{|x| /Seq_vec SVEC/.match(x)}
selected_strings.collect{|x| x.scan(/\s\d+/)}.flatten # => [" 1", " 65", " 102", " 1710"]
