Is there a neater way to put these hashes/arrays? [closed] - ruby
I have a method from a long script that creates a hash from genetic sequences. It is really messy, so I was wondering whether there is a more elegant way to write it.
Here is a sample of the script, with example data hard-coded:
def make_hash(motif)
  main_hash = Hash.new
  id = ">isotig00009_f2_3 ~: S.P. Cleavage Site: 22:23 - S.P. D-value: 0.532"
  seq = "MLKCFSIIMGLILLLEIGGGCA~IYFYRAQIQAQFQKSLTDVTITDYRENADFQDLIDALQSGLSCCGVNSYEDWDNNIYFNCSGPANNPEALWCAFLLLYTGSSKRSSQHPVRLWSSFPRTTKYFPHKDLHHWLCGYVYNVD"
  id_hash = Hash[[[:id_start, :id_end], id.split("~").map(&:strip)].transpose]
  seq_hash = Hash[[[:signalp, :seq_end], seq.split("~").map(&:strip)].transpose]
  signalp = seq_hash[:signalp]
  new_seq_end = seq_hash[:seq_end].gsub(/#{motif}/, '<span class="motif">\0</span>')
  new_seq_hash = Hash[:signalp => signalp, :new_seq_end => new_seq_end ]
  main_hash[id_hash] = [new_seq_hash]
  return main_hash
end
motif = "VT|QAQ|F.D"
main_hash = make_hash(motif)
main_hash.each do |id_hash, seq_hash|
  puts id_hash[:id_start]
  puts id_hash[:id_end]
  puts seq_hash[0][:signalp]
  puts seq_hash[0][:new_seq_end]
end
So, is there a more elegant way to write the make_hash method?
Many Thanks
I haven't tested this, but I think this simplification will work:
def make_hash(motif)
  id = ">isotig00009_f2_3 ~: S.P. Cleavage Site: 22:23 - S.P. D-value: 0.532"
  seq = "MLKCFSIIMGLILLLEIGGGCA~IYFYRAQIQAQFQKSLTDVTITDYRENADFQDLIDALQSGLSCCGVNSYEDWDNNIYFNCSGPANNPEALWCAFLLLYTGSSKRSSQHPVRLWSSFPRTTKYFPHKDLHHWLCGYVYNVD"
  id_hash = Hash[[[:id_start, :id_end], id.split("~").map(&:strip)].transpose]
  f, s = seq.split("~").map(&:strip)
  s.gsub!(/#{motif}/, '<span class="motif">\0</span>')
  # One flat hash with both keys (nesting Hash[...] calls would make the first hash a key of the second)
  new_seq_hash = { :signalp => f, :new_seq_end => s }
  # Wrap the value in an array so callers can keep using seq_hash[0][...]
  { id_hash => [new_seq_hash] }
end
If (as it appears) id and seq both have constant values, you might consider breaking them apart manually, rather than with id.split("~").map(&:strip); i.e.,
id1 = ">isotig00009_f2_3"
id2 = ": S.P. Cleavage Site: 22:23 - S.P. D-value: 0.532"
seq1 = "MLKCFSIIMGLILLLEIGGGCA"
seq2 = "IYFYRAQIQAQFQKSLTDVTITDYRENADFQDLIDALQSGLSCCGVNSYEDWDNNIYFNCSGPANNPEALWCAFLLLYTGSSKRSSQHPVRLWSSFPRTTKYFPHKDLHHWLCGYVYNVD"
If there were a need to make seq2 more readable, we could use the "line continuation" character, \ (which even works within strings) like this:
seq2 = "IYFYRAQIQAQFQKSLTDVTITDYRENADFQDLIDALQSGLSCCGVNSYEDWDNNIYFNC"\
"SGPANNPEALWCAFLLLYTGSSKRSSQHPVRLWSSFPRTTKYFPHKDLHHWLCGYVYNVD"
or this:
seq2 = "IYFYRAQIQAQFQKSLTDVTITDYRENADFQDLIDALQSGLSCCGVNSYEDWDNNIYFNC\
SGPANNPEALWCAFLLLYTGSSKRSSQHPVRLWSSFPRTTKYFPHKDLHHWLCGYVYNVD"
If you preferred, you could make 'id' and 'seq' constants ('ID' and 'SEQ', say) and move them outside the method definition. Not surprisingly, line continuation also works for constant strings.
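To illustrate the constants suggestion, here is a minimal sketch rather than a definitive rewrite. The constant names ID and SEQ follow the suggestion above; passing them as default arguments and building the hashes with literals are my own assumptions about what "neater" could look like.

```ruby
# Sample record from the question, moved out of the method as constants.
ID  = ">isotig00009_f2_3 ~: S.P. Cleavage Site: 22:23 - S.P. D-value: 0.532"
SEQ = "MLKCFSIIMGLILLLEIGGGCA~" \
      "IYFYRAQIQAQFQKSLTDVTITDYRENADFQDLIDALQSGLSCCGVNSYEDWDNNIYFNC" \
      "SGPANNPEALWCAFLLLYTGSSKRSSQHPVRLWSSFPRTTKYFPHKDLHHWLCGYVYNVD"

# Assumption: id and seq default to the constants but can be overridden.
def make_hash(motif, id = ID, seq = SEQ)
  id_start, id_end = id.split("~").map(&:strip)
  signalp, seq_end = seq.split("~").map(&:strip)
  # Wrap every motif match in a span; \0 is the whole match.
  marked = seq_end.gsub(/#{motif}/, '<span class="motif">\0</span>')
  # Same shape as the original: a hash key mapping to a one-element array.
  { { id_start: id_start, id_end: id_end } =>
      [{ signalp: signalp, new_seq_end: marked }] }
end

make_hash("VT|QAQ|F.D").each do |id_hash, seq_hash|
  puts id_hash[:id_start]
  puts id_hash[:id_end]
  puts seq_hash[0][:signalp]
  puts seq_hash[0][:new_seq_end]
end
```

The return value keeps the structure the calling loop in the question expects, so that loop works unchanged.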
Related
Multiple choice quiz in Ruby via terminal
Is there any way to code in Ruby so that the terminal presents two options, which the user selects between using the arrow keys and confirms with Enter?
Pseudo code:
p "What is the capital of Scotland?"
user_select = gets.chomp
p "Edinburgh"
p "Glasgow"
if user_select == "Edinburgh" etc etc
I want to know if this can be achieved without the user having to type in their answer. Can the terminal behave like a GUI?
Alternatively, you could use TTY::Prompt. It will let you use the arrow keys.
Code sample
require 'tty-prompt'
prompt = TTY::Prompt.new
greeting = 'What is the capital of Scotland?'
choices = %w(Edinburgh Glasgow)
answer = prompt.select(greeting, choices)
'do something' if answer == choices[0]
Result
$ ruby quiz.rb
What is the capital of Scotland? (Use arrow keys, press Enter to select)
‣ Edinburgh
  Glasgow
You could use something like Highline, though that will not let you use arrow keys:
→ ruby test.rb
1. Edinburgh
2. Glasgow
What is the capital of Scotland? → 1
Correct!
Code (just to get an idea):
require 'highline'
cli = HighLine.new
cli.choose do |menu|
  menu.prompt = "What is the capital of Scotland?"
  menu.choice("Edinburgh") { cli.say "Correct!" }
  menu.choice("Glasgow") { cli.say "Wrong!" }
end
For more of a GUI, try using something like MRDialog. Example:
require 'mrdialog'
dialog = MRDialog.new
dialog.clear = true
dialog.title = "Quiz"
question = "What is the capital of Scotland?"
answers = [['E', 'Edinburgh'], ['G', 'Glasgow']]
height = 0
width = 0
menu_height = 2
selected_item = dialog.menu(question, answers, height, width, menu_height)
puts "Selected item: #{selected_item}"
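If adding a gem is not an option, the same idea can be roughly sketched with only the standard library's io/console. This is an illustration under assumptions, not how the gems above work internally: the helper names next_index, read_key and select_with_arrows are hypothetical, the terminal is assumed to send the common ANSI sequences for the arrow keys, and a bare Esc press will wait for two more keystrokes.

```ruby
require 'io/console'

# Pure helper: move the highlighted index for one key press, wrapping around.
def next_index(index, key, count)
  case key
  when "\e[A" then (index - 1) % count # up arrow
  when "\e[B" then (index + 1) % count # down arrow
  else index
  end
end

# Reads one key press; arrow keys arrive as three characters ("\e", "[", "A").
def read_key(io = $stdin)
  key = io.getch.to_s
  key += io.getch.to_s + io.getch.to_s if key == "\e"
  key
end

# Interactive loop: redraws the list until Enter is pressed.
def select_with_arrows(question, choices, io = $stdin)
  index = 0
  puts question
  loop do
    choices.each_with_index do |choice, i|
      print "\e[2K" # clear the current line before redrawing it
      puts "#{i == index ? '>' : ' '} #{choice}"
    end
    key = read_key(io)
    return choices[index] if ["\r", "\n"].include?(key)
    raise Interrupt if key == "\u0003" # Ctrl-C arrives as a raw byte
    index = next_index(index, key, choices.size)
    print "\e[#{choices.size}A" # move the cursor back up to the first choice
  end
end
```

From a script run in a real terminal, it could be used as answer = select_with_arrows("What is the capital of Scotland?", %w[Edinburgh Glasgow]). Keeping next_index free of I/O makes the wrap-around logic easy to check without a terminal.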
Extract multiple protein sequences from a Protein Data Bank along with Secondary Structure
I want to extract protein sequences and their corresponding secondary structure from any protein data bank, say RCSB. I just need short sequences and their secondary structure. Something like:
ATRWGUVT Helix
It is fine even if the sequences are long, but I want a tag at the end that denotes the secondary structure. Is there any programming tool or anything else available for this? As shown above, I want only this minimal information. How can I achieve this?
import sys
from shutil import which
from Bio.PDB import *

Extract sequence:
def get_seq(pdbfile):
    p = PDBParser(PERMISSIVE=0)
    structure = p.get_structure('test', pdbfile)
    ppb = PPBuilder()
    seq = ''
    for pp in ppb.build_peptides(structure):
        seq += str(pp.get_sequence())
    return seq

Extract secondary structure with DSSP, as described in the other answer:
def get_secondary_struc(pdbfile):
    # get secondary structure info for whole pdb.
    if not which("dssp"):
        sys.stderr.write('dssp executable needs to be on the PATH\n')
        sys.exit(1)
    p = PDBParser(PERMISSIVE=0)
    structure = p.get_structure('test', pdbfile)
    model = structure[0]
    dssp = DSSP(model, pdbfile)
    sec = ''
    # index 2 of each DSSP entry holds the secondary structure code
    for a_key in dssp.keys():
        sec += dssp[a_key][2]
    print(sec)
    return sec

Between them, these functions give you both the sequence and the secondary structure.
You can use DSSP. The output of DSSP is explained extensively under 'explanation'. The very short summary of the output is:
H = α-helix
B = residue in isolated β-bridge
E = extended strand, participates in β ladder
G = 3-helix (3₁₀ helix)
I = 5 helix (π-helix)
T = hydrogen bonded turn
S = bend
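The one-letter codes above can be turned into the "sequence plus tag" output the question asks for. The following is a small sketch in Ruby that assumes you already have a sequence and a per-residue DSSP string of the same length; the names DSSP_NAMES and tag_segments are hypothetical, and any code not in the summary above (DSSP also emits a "no structure" symbol) is simply labelled "other".

```ruby
# Names for the DSSP codes listed above.
DSSP_NAMES = {
  'H' => 'alpha helix',
  'B' => 'isolated beta bridge',
  'E' => 'extended strand',
  'G' => '3-10 helix',
  'I' => 'pi helix',
  'T' => 'hydrogen bonded turn',
  'S' => 'bend'
}.freeze

# Splits the sequence into runs of residues sharing the same DSSP code
# and returns [subsequence, structure name] pairs.
def tag_segments(sequence, structure)
  unless sequence.length == structure.length
    raise ArgumentError, 'sequence and structure must have the same length'
  end

  sequence.chars.zip(structure.chars)
          .chunk_while { |a, b| a[1] == b[1] }
          .map { |run| [run.map(&:first).join, DSSP_NAMES.fetch(run.first[1], 'other')] }
end

tag_segments('ATRWGUVTKL', 'HHHHHHHHTT').each do |subsequence, name|
  puts "#{subsequence} #{name}"
end
# prints:
# ATRWGUVT alpha helix
# KL hydrogen bonded turn
```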
Remove nTh record from array using loop [closed]
I'm writing a program that reads a .csv file, and then loops through it removing every 10th record it encounters before outputting it. I've been stuck on what I believe is a syntax issue for a while now and just can't seem to nail it. Anyone mind having a look?
lines = []
i = 0
elements = []
element2 = []
output = []
file = File.open("./properties.csv", "r")
while (line = file.gets)
  i += 1
  # use split to break array up using commas
  arr = line.split(',')
  elements.push({ id: arr[0], streetAddress: arr[1], town: arr[2], valuationDate: arr[3], value: arr[4] })
end
file.close
# filter out blanks and nill rows
x = elements.select { |elements| elements[:id].to_i >= 0.1}
# Loop to remove every 10th record
e = 0
d = 1
loop do x.length
  if e == (10 * d)
    d ++
    e ++
  else
    x = elements.select[e]
    e ++
  end
puts x
puts "#{x.length} house in list, #{d} records skipped."
CSV FILE
ID,Street address,Town,Valuation date,Value
1,1 Northburn RD,WANAKA,1/1/2015,280000
2,1 Mount Ida PL,WANAKA,1/1/2015,280000
3,1 Mount Linton AVE,WANAKA,1/1/2015,780000
4,1 Kamahi ST,WANAKA,1/1/2015,155000
5,1 Kapuka LANE,WANAKA,1/1/2015,149000
6,1 Mohua MEWS,WANAKA,1/1/2015,560000
7,1 Kakapo CT,WANAKA,1/1/2015,430000
8,1 Mt Gold PL,WANAKA,1/1/2015,1260000
9,1 Penrith Park DR,WANAKA,1/1/2015,1250000
10,1 ATHERTON PL,WANAKA,1/1/2015,650000
11,1 WAIMANA PL,WANAKA,1/1/2015,780000
12,1 ROTO PL,WANAKA,1/1/2015,1470000
13,1 Toms WAY,WANAKA,1/1/2015,2230000
14,1 MULBERRY LANE,WANAKA,1/1/2015,415000
15,1 Range View PL,WANAKA,1/1/2015,300000
16,1 Clearview ST,WANAKA,1/1/2015,1230000
17,1 Clutha PL,WANAKA,1/1/2015,700000
18,1 Centre CRES,WANAKA,1/1/2015,295000
19,1 Valley CRES,WANAKA,1/1/2015,790000
20,1 Edgewood PL,WANAKA,1/1/2015,365000
21,1 HUNTER CRES,WANAKA,1/1/2015,335000
22,1 KOWHAI DR,WANAKA,1/1/2015,480000
23,1 RIMU LANE,WANAKA,1/1/2015,465000
24,1 CHERRY CT,WANAKA,1/1/2015,495000
25,1 COLLINS ST,WANAKA,1/1/2015,520000
26,1 AUBREY RD,WANAKA,1/1/2015,985000
27,1 EELY POINT RD,WANAKA,1/1/2015,560000
28,1 LINDSAY PL,WANAKA,1/1/2015,385000
29,1 WINDERS ST,WANAKA,1/1/2015,760000
30,1 Manuka CRES,WANAKA,1/1/2015,510000
31,1 WILEY RD,WANAKA,1/1/2015,420000
32,1 Baker GR,WANAKA,1/1/2015,820000
33,1 Briar Bank DR,WANAKA,1/1/2015,1260000
34,1 LAKESIDE RD,WANAKA,1/1/2015,440000
35,1 PLANTATION RD,WANAKA,1/1/2015,345000
36,1 Allenby PL,WANAKA,1/1/2015,640000
37,1 ROB ROY LANE,WANAKA,1/1/2015,380000
38,1 Ansted PL,WANAKA,1/1/2015,590000
39,1 Fastness CRES,WANAKA,1/1/2015,640000
40,1 APOLLO PL,WANAKA,1/1/2015,385000
41,1 AEOLUS PL,WANAKA,1/1/2015,370000
42,1 Peak View RDGE,WANAKA,1/1/2015,1750000
43,1 Moncrieff PL,WANAKA,1/1/2015,530000
44,1 Islington PL,WANAKA,1/1/2015,190000
45,1 Hidden Hills DR,WANAKA,1/1/2015,1280000
46,1 Weatherall CL,WANAKA,1/1/2015,425000
47,1 Terranova PL,WANAKA,1/1/2015,900000
48,1 Cliff Wilson ST,WANAKA,1/1/2015,1200000
49,1 TOTARA TCE,WANAKA,1/1/2015,460000
50,1 Koru WAY,WANAKA,1/1/2015,570000
51,1 Bovett PL,Wanaka,1/1/2015,495000
52,1 Pearce PL,Wanaka,1/1/2015,675000
53,1 Ironside DR,WANAKA,1/1/2015,570000
54,1 Bob Lee PL,WANAKA,1/1/2015,610000
55,1 Hogan LANE,WANAKA,1/1/2015,395000
56,1 ARDMORE ST,WANAKA,1/1/2015,1190000
57,1 Bullock Creek LANE,WANAKA,1/1/2015,11125000
58,1 DUNMORE ST,WANAKA,1/1/2015,1300000
59,1 Primary LANE,WANAKA,1/1/2015,430000
60,1 SYCAMORE PL,WANAKA,1/1/2015,720000
61,1 FAULKS TCE,WANAKA,1/1/2015,780000
62,1 Alpha CL,WANAKA,1/1/2015,500000
63,1 Coromandel ST,WANAKA,1/1/2015,530000
64,1 Niger ST,WANAKA,1/1/2015,475000
65,1 Maggies Way,WANAKA,1/1/2015,375000
66,1 Hollyhock LANE,QUEENSTOWN,1/1/2015,1080000
67,1 ELDERBERRY CRES,WANAKA,1/1/2015,1340000
68,1 Foxglove HTS,WANAKA,1/1/2015,2520000
69,1 MEADOWSTONE DR,WANAKA,1/1/2015,650000
70,1 OAKWOOD PL,WANAKA,1/1/2015,580000
71,1 MEADOWBROOK PL,WANAKA,1/1/2015,645000
72,1 Jessies CRES,WANAKA,1/1/2015,320000
73,1 Lansdown ST,WANAKA,1/1/2015,700000
74,1 Stonebrook DR,WANAKA,1/1/2015,640000
75,1 Hyland ST,WANAKA,1/1/2015,500000
76,1 TAPLEY PADDOCK,WANAKA,1/1/2015,720000
77,1 Homestead CL,WANAKA,1/1/2015,1750000
78,1 NORMAN TCE,WANAKA,1/1/2015,620000
79,1 Sunrise Bay DR,WANAKA,1/1/2015,3000000
80,1 LARCH PL,WANAKA,1/1/2015,570000
81,1 MILL END,WANAKA,1/1/2015,600000
82,1 Bills WAY,WANAKA,1/1/2015,750000
83,1 Heuchan LANE,WANAKA,1/1/2015,610000
84,1 SARGOOD DR,WANAKA,1/1/2015,455000
85,1 Frederick ST,WANAKA,1/1/2015,455000
86,1 Connell TCE,WANAKA,1/1/2015,600000
87,1 Soho ST,QUEENSTOWN,1/1/2015,320000
88,1 Hikuwai DR,ALBERT TOWN,1/1/2015,280000
89,1 Harrier LANE,WANAKA,1/1/2015,1000000
90,1 Ewing PL,WANAKA,1/1/2015,780000
91,1 Sherwin AVE,ALBERT TOWN,1/1/2015,440000
92,1 Hardie PL,WANAKA,1/1/2015,830000
93,1 Finch ST,ALBERT TOWN,1/1/2015,540000
94,1 Poppy LANE,ALBERT TOWN,1/1/2015,395000
95,1 Warbler LANE,ALBERT TOWN,1/1/2015,410000
96,1 Balneaves LANE,WANAKA,1/1/2015,250000
97,1 Mill Green,Arrowtown,1/1/2015,800000
require 'csv'
elements = {}
CSV.foreach("properties.csv", :headers => true, :header_converters => :symbol) do |row|
  elements[row.fields[0]] = Hash[row.headers[1..-1].zip(row.fields[1..-1])]
end
d = 0
e = 0
elements.delete_if do |key, value|
  e += 1
  if e == 10
    e = 0
    d += 1
  end
  e == 0
end
puts "#{elements.length} house in list, #{d} records skipped."
At the end of this, elements will have every 10th row removed, and d contains the number of rows removed.
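The counter bookkeeping above can also be expressed with an index, which some may find easier to read. This is a sketch of an alternative, not the answer's method: the helper name drop_every_nth is hypothetical, and the inline sample (the first seven rows of the file, with n = 3) is only there so the example runs on its own.

```ruby
require 'csv'

# Returns [kept_rows, skipped_count], dropping every nth row (1-based position).
def drop_every_nth(rows, n = 10)
  kept = rows.reject.with_index(1) { |_row, position| (position % n).zero? }
  [kept, rows.length - kept.length]
end

# Small inline sample so the sketch is self-contained.
csv_text = <<~CSV
  ID,Street address,Town,Valuation date,Value
  1,1 Northburn RD,WANAKA,1/1/2015,280000
  2,1 Mount Ida PL,WANAKA,1/1/2015,280000
  3,1 Mount Linton AVE,WANAKA,1/1/2015,780000
  4,1 Kamahi ST,WANAKA,1/1/2015,155000
  5,1 Kapuka LANE,WANAKA,1/1/2015,149000
  6,1 Mohua MEWS,WANAKA,1/1/2015,560000
  7,1 Kakapo CT,WANAKA,1/1/2015,430000
CSV

rows = CSV.parse(csv_text, headers: true, header_converters: :symbol).map(&:to_h)
kept, skipped = drop_every_nth(rows, 3)
puts "#{kept.length} houses in list, #{skipped} records skipped."
# prints: 5 houses in list, 2 records skipped.
```

For the real file, rows could instead come from CSV.read("properties.csv", headers: true, header_converters: :symbol).map(&:to_h), with n left at its default of 10.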
Improve genbank feature addition
I am trying to add more than 70000 new features to a genbank file using biopython. I have this code:
from Bio import SeqIO
from Bio.SeqFeature import SeqFeature, FeatureLocation
fi = "myoriginal.gbk"
fo = "mynewfile.gbk"
for result in results:
    start = 0
    end = 0
    result = result.split("\t")
    start = int(result[0])
    end = int(result[1])
    for record in SeqIO.parse(original, "gb"):
        record.features.append(SeqFeature(FeatureLocation(start, end), type = "misc_feat"))
        SeqIO.write(record, fo, "gb")
Results is just a list of lists containing the start and end of each one of the features I need to add to the original gbk file. This solution is extremely costly for my computer and I do not know how to improve the performance. Any good ideas?
You should parse the genbank file just once. Omitting what results contains (I do not know exactly, because there are some missing pieces of code in your example), I would guess something like this would improve performance, modifying your code:
from Bio import SeqIO
from Bio.SeqFeature import SeqFeature, FeatureLocation
fi = "myoriginal.gbk"
fo = "mynewfile.gbk"
# Parse the file once and keep the records in memory
original_records = list(SeqIO.parse(fi, "gb"))
for result in results:
    result = result.split("\t")
    start = int(result[0])
    end = int(result[1])
    for record in original_records:
        record.features.append(SeqFeature(FeatureLocation(start, end), type = "misc_feat"))
# Write all records once, after every feature has been added
SeqIO.write(original_records, fo, "gb")
Date-time comparison in Ruby
I have one date, let's say '2010-12-20', of a flight departure, and two times, for instance '23:30' and '02:15'.
The problem: I need to get datetimes (yyyy-MM-dd HH:mm:ss, for example 2010-12-17 14:38:32) for both of these times, but I don't know the day of the second time (it can be the same day as the departure, or the next one).
I am looking for the best solution in Ruby on Rails. In PHP I would just use string splitting multiple times, but I believe that Rails, as usual, has a much more elegant way.
So, here is my pseudo code, which I want to turn into Ruby:
depart_time = '23:30'
arrive_time = '02:15'
depart_date = '2010-12-20'
arrive_date = (arrive.hour < depart.hour and arrive.hour < 5) ? depart_date + 1 : depart_date
# Final results
depart = depart_date + ' ' + depart_time
arrive = arrive_date + ' ' + arrive_time
I want to find the best way to implement this in Ruby on Rails, instead of just playing with strings.
This is just pure Ruby, nothing to do with Rails:
require 'date'
depart_time = DateTime.strptime '23:30', '%H:%M'
arrive_time = DateTime.strptime '02:15', '%H:%M'
arrive_date = depart_date = Date.parse( '2010-12-20' )
arrive_date += 1 if arrive_time.hour < depart_time.hour and arrive_time.hour < 5
puts "#{depart_date} #{depart_time.strftime '%H:%M'}",
     "#{arrive_date} #{arrive_time.strftime '%H:%M'}"
#=> 2010-12-20 23:30
#=> 2010-12-21 02:15
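If full datetime objects are wanted rather than separate date and time parts, the same idea can be sketched as below. Note the assumptions: the helper name flight_datetimes is hypothetical, and it uses a simpler rule than the question's pseudo code (any arrival time that is not later than the departure time is treated as next-day), which only holds for flights shorter than 24 hours and ignores time zones.

```ruby
require 'date'

# Returns [depart, arrive] as DateTime objects.
# Assumption: an arrival clock time at or before the departure time
# means the flight lands on the following day.
def flight_datetimes(depart_date, depart_time, arrive_time)
  format = '%Y-%m-%d %H:%M'
  depart = DateTime.strptime("#{depart_date} #{depart_time}", format)
  arrive = DateTime.strptime("#{depart_date} #{arrive_time}", format)
  arrive += 1 if arrive <= depart # adding 1 to a DateTime adds one day
  [depart, arrive]
end

depart, arrive = flight_datetimes('2010-12-20', '23:30', '02:15')
puts depart.strftime('%Y-%m-%d %H:%M:%S') #=> 2010-12-20 23:30:00
puts arrive.strftime('%Y-%m-%d %H:%M:%S') #=> 2010-12-21 02:15:00
```

Because the results are real DateTime objects, they can be compared, subtracted, or formatted without any further string handling.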