RNA Splicing Python - bioinformatics
I have a gene sequence:
"acguccgcaagagaagccuuaauauauucaaaaagcuacgccucagauuucgcgcucgagcccaaaacaacugguguacggguugaucacaucaaaugaagucgcuaaagucggugaucucacuauccuugucuucggcuuuugcucucucggcuaucaucuaagcaggcgaguuccauggugaccggaacgacggcuacuggaguccaugaucgcaagcgucgggcugggguaaaagaggcucagcucauaauaguccgccccaccaguacgggacucgauaggccccgucguugccguagaaacgcaauuuuccucagacccacuauacgcaccucgauuuagcaugguuccgggguugcgcuuugagaaucauacguaaggaucggaaccuaggaaugcaccacagaacuuugaaauacuagaacaaguugauugacaacggaguaucggcgccccacauuuaacgaauaauugcaggcgccagacgaugcuaggugcguccguaucaagauucgaggucgcuacuggcuucgcuugccgaucgagcucagaguuugugagaguuguuacuaauugcguggucgccuaauauccuugauacuacguggguguacuagacaucccggacagaaaaucucuuaaacgcuagaguucucuuggaagcgccugcacuucuugugaacauacgaugauagccacucuaagcccaacgcacuucgcuuggcccacauugcccccagagcuuauucaucgacaggcguuccacucuuggauucaucaguaaacuuuauuauacgugguaagcgugcuuauagcugucggaaucucacuuaggcggauugaagugagacagccugaaaguaaccguguacaggcgccgucaauguguuuugagugugcaccuacaaaaaguguuauuuaggcaggggagcuuuguaguuucuuuagaagagccgcgaaugaaccaacgguagacugcgagcgcguucaaccuaau"
I want to splice the RNA and extract two lists (exons and introns). The key is that an intron starts with gu and ends with ag. However, if ag appears before gu, it is part of the exon and not the intron.
    def splice(sequence):
        introns = list()
        exons = list()
        while(sequence.count("gu")):
            if "gu" not in sequence:
                break
            else:
                exons.append(sequence[:sequence.find("gu")])
                sequence = sequence[sequence.find("gu"):]
            if "ag" not in sequence:
                break
            else:
                introns.append(sequence[:sequence.find("ag")+2])
                sequence = sequence[sequence.find("ag")+2:]
        return introns, exons
This is what I have so far. It works for most of the sequence, but it breaks down at the end, when gu appears without a following ag in the remaining string.
Output:
Exons:
['ac',
'agaagccuuaauauauucaaaaagcuacgccucagauuucgcgcucgagcccaaaacaacug',
'ucgcuaaa',
'caggcga',
'uccaugaucgcaagc',
'aggcucagcucauaaua',
'uacgggacucgauaggcccc',
'aaacgcaauuuuccucagacccacuauacgcaccucgauuuagcaug',
'aaucauac',
'gaucggaaccuaggaaugcaccacagaacuuugaaauacuagaacaa',
'uaucggcgccccacauuuaacgaauaauugcaggcgccagacgaugcuag',
'auucgag',
'cucaga',
'a',
'acaucccggacagaaaaucucuuaaacgcuaga',
'cgccugcacuucuu',
'ccacucuaagcccaacgcacuucgcuuggcccacauugcccccagagcuuauucaucgacaggc',
'uaaacuuuauuauac',
'c',
'cu',
'gcggauugaa',
'acagccugaaa',
'gcgcc',
'u',
'u',
'gcaggggagcuuu',
'uuucuuuagaagagccgcgaaugaaccaacg',
'acugcgagcgc']
Introns:
['guccgcaag',
'guguacggguugaucacaucaaaugaag',
'gucggugaucucacuauccuugucuucggcuuuugcucucucggcuaucaucuaag',
'guuccauggugaccggaacgacggcuacuggag',
'gucgggcugggguaaaag',
'guccgccccaccag',
'gucguugccguag',
'guuccgggguugcgcuuugag',
'guaag',
'guugauugacaacggag',
'gugcguccguaucaag',
'gucgcuacuggcuucgcuugccgaucgag',
'guuugugag',
'guuguuacuaauugcguggucgccuaauauccuugauacuacguggguguacuag',
'guucucuuggaag',
'gugaacauacgaugauag',
'guuccacucuuggauucaucag',
'gugguaag',
'gugcuuauag',
'gucggaaucucacuuag',
'gugag',
'guaaccguguacag',
'gucaauguguuuugag',
'gugcaccuacaaaaag',
'guuauuuag',
'guag',
'guag']
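The tail problem described above comes from breaking out of the loop once a gu has no closing ag, which drops the rest of the sequence from both lists. A minimal sketch of one way to keep the loop-based approach follows. It assumes that the leftover sequence (including the unmatched gu) should count as a final exon, which the question does not state; the name splice_scan and the toy input are hypothetical.

```python
def splice_scan(sequence):
    """Split an RNA string into (introns, exons) using plain index scanning."""
    introns, exons = [], []
    pos = 0
    while True:
        start = sequence.find("gu", pos)
        if start == -1:
            break
        # Look for the closing "ag" only after the opening "gu".
        end = sequence.find("ag", start + 2)
        if end == -1:
            break
        exons.append(sequence[pos:start])
        introns.append(sequence[start:end + 2])
        pos = end + 2
    # Assumption: whatever is left (even an unmatched "gu") is the last exon.
    exons.append(sequence[pos:])
    return introns, exons


introns, exons = splice_scan("acagguccagccguuu")
print(introns)  # ['guccag']
print(exons)    # ['acag', 'ccguuu']
```

The leading ag in the toy input stays inside the first exon because it appears before any gu, and the trailing guuu has no closing ag, so it ends up in the last exon instead of being lost.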
I fixed the problem by using regular expressions.

    import re

    def splice(gene_sequence):
        # Non-greedy match: each intron runs from "gu" to the nearest following "ag"
        regex = r"gu\w*?ag"
        introns = re.findall(regex, gene_sequence)
        # Everything between the introns is exon sequence
        exons = re.split(regex, gene_sequence)
        return introns, exons
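As a quick sanity check of the non-greedy pattern, here is a small sketch that walks the matches with re.finditer and collects the stretches between them as exons. The helper name splice_regex and the toy sequence are made up for illustration, and treating the unmatched tail as an exon is an assumption rather than something the post specifies.

```python
import re

# Non-greedy: an intron runs from "gu" to the nearest following "ag".
INTRON = re.compile(r"gu\w*?ag")


def splice_regex(sequence):
    """Return (introns, exons); exons are the stretches between intron matches."""
    introns, exons = [], []
    last_end = 0
    for match in INTRON.finditer(sequence):
        exons.append(sequence[last_end:match.start()])
        introns.append(match.group())
        last_end = match.end()
    # Assumption: the tail, including any "gu" with no closing "ag", is an exon.
    exons.append(sequence[last_end:])
    return introns, exons


toy = "acguccgcaagagaaguc"
introns, exons = splice_regex(toy)
print(introns)  # ['guccgcaag']
print(exons)    # ['ac', 'agaaguc']

# Interleaving exons and introns should reproduce the input exactly.
rebuilt = "".join(e + i for e, i in zip(exons, introns + [""]))
print(rebuilt == toy)  # True
```

Checking that the pieces rebuild the original string is a cheap way to confirm that no part of the sequence was dropped or counted twice.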
Related
How to parse username, ID or whole part using Ruby Regex in this sentence?
I have a sentence like this:

    Hello #[Pratha](user:1), did you see #[John](user:3)'s answer?

What I want is to get #[Pratha](user:1) and #[John](user:3), either as names and ids or just as the text I quoted, so that I can split it and parse the name and id myself. But there is an issue here: the names Pratha and John may include non-alphabetic characters like ', ,, -, +, etc., but not [] and ().

What I tried so far:

    c = ''
    f = c.match(/(?:\s|^)(?:#(?!(?:\d+|\w+?_|_\w+?)(?:\s(\[)|$)))(\w+)(?=\s|$)/i)

But no success.
You may use

    /#\[([^\]\[]*)\]\([^()]*:(\d+)\)/

Details:

# - a # char
\[ - a [ char
([^\]\[]*) - Group 1: 0+ chars other than [ and ]
\] - a ] char
\( - a ( char
[^()]* - 0+ chars other than ( and )
: - a colon
(\d+) - Group 2: 1 or more digits
\) - a ) char

Sample Ruby code:

    s = "Hello #[Pratha](user:1), did you see #[John](user:3)'s answer?"
    rx = /#\[([^\]\[]*)\]\([^()]*:(\d+)\)/
    res = s.scan(rx)
    p res
    # => [["Pratha", "1"], ["John", "3"]]

    "Hello #[Pratha](user:1), did you see #[John](user:3)'s answer?".scan(/#.*?\)/)
    # => ["#[Pratha](user:1)", "#[John](user:3)"]

Since the line does not come from user input, you can rely on the part you are interested in starting with # and ending with ).
You could use 2 capturing groups to get the names and the ids:

    #\[([^]]+)]\([^:]+:([^)]+)\)

That will match:

# - Match # literally
\[ - Match [
([^]]+) - 1st capturing group, which matches any char except ] 1+ times using a negated character class
] - Match ]
\( - Match (
[^:]+: - Match any char except : 1+ times, then match :
([^)]+) - 2nd capturing group, which matches any char except ) 1+ times
\) - Match )
Remove nTh record from array using loop [closed]
I'm writing a program that reads a .csv file, and then loops through it removing every 10th record it encounters before outputting it. I've been stuck on what I believe is a syntax issue for a while now and just can't seem to nail it. Anyone mind having a look?

    lines = []
    i = 0
    elements = []
    element2 = []
    output = []
    file = File.open("./properties.csv", "r")
    while (line = file.gets)
      i += 1
      # use split to break array up using commas
      arr = line.split(',')
      elements.push({ id: arr[0], streetAddress: arr[1], town: arr[2], valuationDate: arr[3], value: arr[4] })
    end
    file.close
    # filter out blanks and nill rows
    x = elements.select { |elements| elements[:id].to_i >= 0.1}
    # Loop to remove every 10th record
    e = 0
    d = 1
    loop do x.length
      if e == (10 * d)
        d ++
        e ++
      else
        x = elements.select[e]
        e ++
      end
    puts x
    puts "#{x.length} house in list, #{d} records skipped."
CSV FILE

    ID,Street address,Town,Valuation date,Value
    1,1 Northburn RD,WANAKA,1/1/2015,280000
    2,1 Mount Ida PL,WANAKA,1/1/2015,280000
    3,1 Mount Linton AVE,WANAKA,1/1/2015,780000
    4,1 Kamahi ST,WANAKA,1/1/2015,155000
    5,1 Kapuka LANE,WANAKA,1/1/2015,149000
    6,1 Mohua MEWS,WANAKA,1/1/2015,560000
    7,1 Kakapo CT,WANAKA,1/1/2015,430000
    8,1 Mt Gold PL,WANAKA,1/1/2015,1260000
    9,1 Penrith Park DR,WANAKA,1/1/2015,1250000
    10,1 ATHERTON PL,WANAKA,1/1/2015,650000
    11,1 WAIMANA PL,WANAKA,1/1/2015,780000
    12,1 ROTO PL,WANAKA,1/1/2015,1470000
    13,1 Toms WAY,WANAKA,1/1/2015,2230000
    14,1 MULBERRY LANE,WANAKA,1/1/2015,415000
    15,1 Range View PL,WANAKA,1/1/2015,300000
    16,1 Clearview ST,WANAKA,1/1/2015,1230000
    17,1 Clutha PL,WANAKA,1/1/2015,700000
    18,1 Centre CRES,WANAKA,1/1/2015,295000
    19,1 Valley CRES,WANAKA,1/1/2015,790000
    20,1 Edgewood PL,WANAKA,1/1/2015,365000
    21,1 HUNTER CRES,WANAKA,1/1/2015,335000
    22,1 KOWHAI DR,WANAKA,1/1/2015,480000
    23,1 RIMU LANE,WANAKA,1/1/2015,465000
    24,1 CHERRY CT,WANAKA,1/1/2015,495000
    25,1 COLLINS ST,WANAKA,1/1/2015,520000
    26,1 AUBREY RD,WANAKA,1/1/2015,985000
    27,1 EELY POINT RD,WANAKA,1/1/2015,560000
    28,1 LINDSAY PL,WANAKA,1/1/2015,385000
    29,1 WINDERS ST,WANAKA,1/1/2015,760000
    30,1 Manuka CRES,WANAKA,1/1/2015,510000
    31,1 WILEY RD,WANAKA,1/1/2015,420000
    32,1 Baker GR,WANAKA,1/1/2015,820000
    33,1 Briar Bank DR,WANAKA,1/1/2015,1260000
    34,1 LAKESIDE RD,WANAKA,1/1/2015,440000
    35,1 PLANTATION RD,WANAKA,1/1/2015,345000
    36,1 Allenby PL,WANAKA,1/1/2015,640000
    37,1 ROB ROY LANE,WANAKA,1/1/2015,380000
    38,1 Ansted PL,WANAKA,1/1/2015,590000
    39,1 Fastness CRES,WANAKA,1/1/2015,640000
    40,1 APOLLO PL,WANAKA,1/1/2015,385000
    41,1 AEOLUS PL,WANAKA,1/1/2015,370000
    42,1 Peak View RDGE,WANAKA,1/1/2015,1750000
    43,1 Moncrieff PL,WANAKA,1/1/2015,530000
    44,1 Islington PL,WANAKA,1/1/2015,190000
    45,1 Hidden Hills DR,WANAKA,1/1/2015,1280000
    46,1 Weatherall CL,WANAKA,1/1/2015,425000
    47,1 Terranova PL,WANAKA,1/1/2015,900000
    48,1 Cliff Wilson ST,WANAKA,1/1/2015,1200000
    49,1 TOTARA TCE,WANAKA,1/1/2015,460000
    50,1 Koru WAY,WANAKA,1/1/2015,570000
    51,1 Bovett PL,Wanaka,1/1/2015,495000
    52,1 Pearce PL,Wanaka,1/1/2015,675000
    53,1 Ironside DR,WANAKA,1/1/2015,570000
    54,1 Bob Lee PL,WANAKA,1/1/2015,610000
    55,1 Hogan LANE,WANAKA,1/1/2015,395000
    56,1 ARDMORE ST,WANAKA,1/1/2015,1190000
    57,1 Bullock Creek LANE,WANAKA,1/1/2015,11125000
    58,1 DUNMORE ST,WANAKA,1/1/2015,1300000
    59,1 Primary LANE,WANAKA,1/1/2015,430000
    60,1 SYCAMORE PL,WANAKA,1/1/2015,720000
    61,1 FAULKS TCE,WANAKA,1/1/2015,780000
    62,1 Alpha CL,WANAKA,1/1/2015,500000
    63,1 Coromandel ST,WANAKA,1/1/2015,530000
    64,1 Niger ST,WANAKA,1/1/2015,475000
    65,1 Maggies Way,WANAKA,1/1/2015,375000
    66,1 Hollyhock LANE,QUEENSTOWN,1/1/2015,1080000
    67,1 ELDERBERRY CRES,WANAKA,1/1/2015,1340000
    68,1 Foxglove HTS,WANAKA,1/1/2015,2520000
    69,1 MEADOWSTONE DR,WANAKA,1/1/2015,650000
    70,1 OAKWOOD PL,WANAKA,1/1/2015,580000
    71,1 MEADOWBROOK PL,WANAKA,1/1/2015,645000
    72,1 Jessies CRES,WANAKA,1/1/2015,320000
    73,1 Lansdown ST,WANAKA,1/1/2015,700000
    74,1 Stonebrook DR,WANAKA,1/1/2015,640000
    75,1 Hyland ST,WANAKA,1/1/2015,500000
    76,1 TAPLEY PADDOCK,WANAKA,1/1/2015,720000
    77,1 Homestead CL,WANAKA,1/1/2015,1750000
    78,1 NORMAN TCE,WANAKA,1/1/2015,620000
    79,1 Sunrise Bay DR,WANAKA,1/1/2015,3000000
    80,1 LARCH PL,WANAKA,1/1/2015,570000
    81,1 MILL END,WANAKA,1/1/2015,600000
    82,1 Bills WAY,WANAKA,1/1/2015,750000
    83,1 Heuchan LANE,WANAKA,1/1/2015,610000
    84,1 SARGOOD DR,WANAKA,1/1/2015,455000
    85,1 Frederick ST,WANAKA,1/1/2015,455000
    86,1 Connell TCE,WANAKA,1/1/2015,600000
    87,1 Soho ST,QUEENSTOWN,1/1/2015,320000
    88,1 Hikuwai DR,ALBERT TOWN,1/1/2015,280000
    89,1 Harrier LANE,WANAKA,1/1/2015,1000000
    90,1 Ewing PL,WANAKA,1/1/2015,780000
    91,1 Sherwin AVE,ALBERT TOWN,1/1/2015,440000
    92,1 Hardie PL,WANAKA,1/1/2015,830000
    93,1 Finch ST,ALBERT TOWN,1/1/2015,540000
    94,1 Poppy LANE,ALBERT TOWN,1/1/2015,395000
    95,1 Warbler LANE,ALBERT TOWN,1/1/2015,410000
    96,1 Balneaves LANE,WANAKA,1/1/2015,250000
    97,1 Mill Green,Arrowtown,1/1/2015,800000
    require 'csv'

    elements = {}
    CSV.foreach("properties.csv", :headers => true, :header_converters => :symbol) do |row|
      elements[row.fields[0]] = Hash[row.headers[1..-1].zip(row.fields[1..-1])]
    end

    d = 0
    e = 0
    elements.delete_if do |key, value|
      e += 1
      if e == 10
        e = 0
        d += 1
      end
      e == 0
    end

    puts "#{elements.length} house in list, #{d} records skipped."

At the end of this, elements will have every 10th row removed, and d contains the number of rows removed.
Regex for First Line (Only) that Contains a String
I have a bunch of phone numbers with one per line:

    [Home] (202) 121-7777
    C (202) 456-1111
    [mobile] 55 55 5 55555
    [Work] (404) 555-1234
    [Cell] (505) 555-1234
    W 303-555-5555
    M 777-555-5555
    c 12346567s

I want to grab the first one that contains the letter "c", upper or lower case. So far, I have this /^.*[C].*$/i and that matches C (202) 456-1111, [Cell] (505) 555-1234 and c 12346567s. How do I return only the first? In other words, the match should only be C (202) 456-1111. I have been blindly putting question marks everywhere without success. I am using Ruby if it makes a difference: http://www.rubular.com/r/h6ReB9IN8t

Edit: Here is another question that Hrishi pointed to but I cannot figure out how to adapt it to match the whole line.
Try the match method. Here is an example:

    list = <<EOF
    [Home] (202) 121-7777
    C (202) 456-1111
    [mobile] 55 55 5 55555
    [Work] (404) 555-1234
    [Cell] (505) 555-1234
    W 303-555-5555
    M 777-555-5555
    c 12346567s
    EOF

Update:

    # match the first line containing the letter "c", even when it is part of a word
    puts list.match(/^.*C.*$/i)
    # match the first line where the letter "c" is not part of a word
    puts list.match(/^\W*C\W.*$/i)
I'd go about this a bit differently. I prefer to reduce regular expressions to very simple patterns:

    str = <<EOT
    [Home] (202) 121-7777
    C (202) 456-1111
    [mobile] 55 55 5 55555
    [Work] (404) 555-1234
    [Cell] (505) 555-1234
    W 303-555-5555
    M 777-555-5555
    c 12346567s
    EOT

Finding the right line to work with is easily done using either select or find:

    str.split("\n").select{ |s| s[/c/i] }.first # => "C (202) 456-1111"
    str.split("\n").find{ |s| s[/c/i] } # => "C (202) 456-1111"

I'd recommend find because it only returns the first occurrence.

Once the desired string is found, use scan to grab the numbers:

    str.split("\n").find{ |s| s[/c/i] }.scan(/\d+/) # => ["202", "456", "1111"]

Then join them. When you have phone numbers stored in a database you don't really want them to be formatted, you just want the numbers. Formatting occurs later when you're outputting them again.

    phone_number = str.split("\n").find{ |s| s[/c/i] }.scan(/\d+/).join # => "2024561111"

When you need to output the number, break it into the right grouping based on the regional phone-number representation. You should have some idea where the person is located, because you've usually also got their country code. Based on that you know how many digits you should have, plus the groups:

    area_code, prefix, number = phone_number[0 .. 2], phone_number[3 .. 5], phone_number[6 .. 9] # => ["202", "456", "1111"]

Then output them so they're displayed correctly:

    "(%s) %s-%s" % [area_code, prefix, number] # => "(202) 456-1111"

As far as your original pattern /^.*[C].*$/i, there are some things wrong with your understanding of regex:

^.* says "start at the beginning of the string and find zero or more characters", which is no more effective than saying /[C]/.

Using [C] creates an unnecessary character set which means "find one of the letters in the set 'C'". It does nothing useful, so just use C, as in /C/.

.*$ artificially finds the end of the string also, but since you're not capturing it there's nothing accomplished, so don't bother with it. The regex is now /C/.

Since you want to match upper and lower case, use /C/i or /c/i. (Or you could use /[cC]/ but why?)

Instead:

To find a "c" or "C" anywhere in the string, just use /c/i. That's all that's needed. http://rubular.com/r/uPyxACOWls

To find "c", "C", "cell" or "Cell", you can use /c(?:ell)?/i. http://rubular.com/r/TkSRPWG2y6

To find "c", "C", "cell" or "Cell" as a separate word, use word-break markers like /\bc(?:ell)?\b/i. http://rubular.com/r/Smo0bFs9w8

You can get a whole lot more complicated, but if you're not accomplishing anything with the additional pattern information, you're just wasting the regex engine's CPU time and slowing your code. A confused regex engine can waste a LOT of CPU time, so be efficient and aware of what you're asking it to do.
EDIT: Added two more ways of handling this. The last one is preferable.

This will do what you want. It will search for matches of your regex, and then get the first one. Please note that this will produce an error if the string does not have any matches.

    string = "[Home] (202) 121-7777
    C (202) 456-1111
    [mobile] 55 55 5 55555
    [Work] (404) 555-1234
    [Cell] (505) 555-1234
    W 303-555-5555
    M 777-555-5555
    c 12346567s"

    puts string.match(/^(.*[C].*)$/i).captures.first
    puts string.match(/^(.*[C].*)$/i)
    puts string[/^(.*[C].*)$/i]

See the Ruby docs for String#match.
Split the string on newline characters, select the substrings that match your requirements, and grab the first one:

    str = '[Home] (202) 121-7777
    C (202) 456-1111
    [mobile] 55 55 5 55555
    [Work] (404) 555-1234
    [Cell] (505) 555-1234
    W 303-555-5555
    M 777-555-5555
    c 12346567s'

    p str.split(/\n/).select{|el| el =~ /^.*[C].*$/i}[0]

or use match:

    p str.match(/^.*[C].*$/i)[0]

EDITED: Or, in case you want to find the first chunk that starts exactly with C, try this:

    p str.match(/^C.*$/)[0]
Join array of strings into 1 or more strings each within a certain char limit (+ prepend and append texts)
Let's say I have an array of Twitter account names:

    string = %w[example1 example2 example3 example4 example5 example6 example7 example8 example9 example10 example11 example12 example13 example14 example15 example16 example17 example18 example19 example20]

And a prepend and append variable:

    prepend = 'Check out these cool people: '
    append = ' #FollowFriday'

How can I turn this into an array of as few strings as possible, each with a maximum length of 140 characters, starting with the prepend text, ending with the append text, and in between the Twitter account names, all starting with an #-sign and separated with a space? Like this:

    tweets = ['Check out these cool people: #example1 #example2 #example3 #example4 #example5 #example6 #example7 #example8 #example9 #FollowFriday',
              'Check out these cool people: #example10 #example11 #example12 #example13 #example14 #example15 #example16 #example17 #FollowFriday',
              'Check out these cool people: #example18 #example19 #example20 #FollowFriday']

(The order of the accounts isn't important so theoretically you could try and find the best order to make the most use of the available space, but that's not required.)

Any suggestions? I'm thinking I should use the scan method, but haven't figured out the right way yet. It's pretty easy using a bunch of loops, but I'm guessing that won't be necessary when using the right Ruby methods.
Here's what I came up with so far:

    # Create one long string of #usernames separated by a space
    tmp = twitter_accounts.map!{|a| a.insert(0, '#')}.join(' ')
    # alternative: tmp = '#' + twitter_accounts.join(' #')

    # Number of characters left for mentioning the Twitter accounts
    length = 140 - (prepend + append).length

    # This method would split a string into multiple strings
    # each with a maximum length of 'length' and it will only split on empty spaces (' ')
    # ideally strip that space as well (although .map(&:strip) could be used too)
    tweets = tmp.some_method(' ', length)

    # Prepend and append
    tweets.map!{|t| prepend + t + append}

P.S. If anyone has a suggestion for a better title let me know. I had a difficult time summarizing my question.
The String rindex method has an optional parameter where you can specify where to start searching backwards in a string:

    arr = %w[example1 example2 example3 example4 example5 example6 example7 example8 example9 example10 example11 example12 example13 example14 example15 example16 example17 example18 example19 example20]
    str = arr.map{|name|"##{name}"}.join(' ')
    prepend = 'Check out these cool people: '
    append = ' #FollowFriday'
    max_chars = 140 - prepend.size - append.size

    until str.size <= max_chars do
      p str.slice!(0, str.rindex(" ", max_chars))
      str.lstrip! #get rid of the leading space
    end
    p str unless str.empty?
I'd make use of reduce for this:

    string = %w[example1 example2 example3 example4 example5 example6 example7 example8 example9 example10 example11 example12 example13 example14 example15 example16 example17 example18 example19 example20]
    prepend = 'Check out these cool people:'
    append = '#FollowFriday'

    # Extra -1 is for the space before `append`
    max_content_length = 140 - prepend.length - append.length - 1

    content_strings = string.reduce([""]) { |result, target|
      result.push("") if result[-1].length + target.length + 2 > max_content_length
      result[-1] += " ##{target}"
      result
    }

    tweets = content_strings.map { |s| "#{prepend}#{s} #{append}" }

Which would yield:

    "Check out these cool people: #example1 #example2 #example3 #example4 #example5 #example6 #example7 #example8 #example9 #FollowFriday"
    "Check out these cool people: #example10 #example11 #example12 #example13 #example14 #example15 #example16 #example17 #FollowFriday"
    "Check out these cool people: #example18 #example19 #example20 #FollowFriday"
multiline matching with ruby
I have a string variable with multiple lines, e.g.

    "SClone VARPB63A\nSeq_vec SVEC 1 65 pCR2.1-topo\nSequencing_vector \"pCR2.1-topo\"\nSeq_vec SVEC 102 1710 pCR2.1-topo\nClipping QUAL 46 397\n"

I want to get both of the lines that start with "Seq_vec SVEC" and extract the integer values from each match:

    string = "Clone VARPB63A\nSeq_vec SVEC 1 65 pCR2.1-topo\nSequencing_vector \"pCR2.1-topo\"\nSeq_vec SVEC 102 1710 pCR2.1-topo\nClipping QUAL 46 397\n"
    seqvector = Regexp.new("Seq_vec\\s+SVEC\\s+(\\d+\\s+\\d+)",Regexp::MULTILINE )
    vector = string.match(seqvector)
    if vector
      vector_start,vector_stop = vector[1].split(/ /)
      puts vector_start.to_i
      puts vector_stop.to_i
    end

However, this only grabs the first match's values and not the second, as I would like. Any ideas what I could be doing wrong? Thank you.
To capture groups use String#scan:

    vector = string.scan(seqvector)
    # => [["1 65"], ["102 1710"]]
match finds just the first match. To find all matches use String#scan, e.g.

    string.scan(seqvector)
    # => [["1 65"], ["102 1710"]]

or to do something with each match:

    string.scan(seqvector) do |match|
      # match[0] will be the substring captured by your first regexp grouping
      puts match.inspect
    end
Just to make this a bit easier to handle, I would first split the whole string into an array of lines and then do:

    string = "SClone VARPB63A\nSeq_vec SVEC 1 65 pCR2.1-topo\nSequencing_vector \"pCR2.1-topo\"\nSeq_vec SVEC 102 1710 pCR2.1-topo\nClipping QUAL 46 397\n"
    selected_strings = string.split("\n").select{|x| /Seq_vec SVEC/.match(x)}
    selected_strings.collect{|x| x.scan(/\s\d+/)}.flatten
    # => [" 1", " 65", " 102", " 1710"]