Is there a neater way to put these hashes/arrays? [closed] - ruby

I have a method from a long script that creates a hash from genetic sequences. However, it is really messy, so I was wondering whether there is a more elegant way to write it.
Here is a sample of the script (i.e. it contains an example)...
def make_hash(motif)
main_hash = Hash.new
id = ">isotig00009_f2_3 ~: S.P. Cleavage Site: 22:23 - S.P. D-value: 0.532"
seq = "MLKCFSIIMGLILLLEIGGGCA~IYFYRAQIQAQFQKSLTDVTITDYRENADFQDLIDALQSGLSCCGVNSYEDWDNNIYFNCSGPANNPEALWCAFLLLYTGSSKRSSQHPVRLWSSFPRTTKYFPHKDLHHWLCGYVYNVD"
id_hash = Hash[[[:id_start, :id_end], id.split("~").map(&:strip)].transpose]
seq_hash = Hash[[[:signalp, :seq_end], seq.split("~").map(&:strip)].transpose]
signalp = seq_hash[:signalp]
new_seq_end = seq_hash[:seq_end].gsub(/#{motif}/, '<span class="motif">\0</span>')
new_seq_hash = Hash[:signalp => signalp, :new_seq_end => new_seq_end ]
main_hash[id_hash] = [new_seq_hash]
return main_hash
end
motif = "VT|QAQ|F.D"
main_hash = make_hash(motif)
main_hash.each do |id_hash, seq_hash|
puts id_hash[:id_start]
puts id_hash[:id_end]
puts seq_hash[0][:signalp]
puts seq_hash[0][:new_seq_end]
end
So is there a more elegant way to write the make_hash method?
Many Thanks

I haven't tested this, but I think this simplification will work:
def make_hash(motif)
id = ">isotig00009_f2_3 ~: S.P. Cleavage Site: 22:23 - S.P. D-value: 0.532"
seq = "MLKCFSIIMGLILLLEIGGGCA~IYFYRAQIQAQFQKSLTDVTITDYRENADFQDLIDALQSGLSCCGVNSYEDWDNNIYFNCSGPANNPEALWCAFLLLYTGSSKRSSQHPVRLWSSFPRTTKYFPHKDLHHWLCGYVYNVD"
id_hash = Hash[[[:id_start, :id_end], id.split("~").map(&:strip)].transpose]
f, s = seq.split("~").map(&:strip)
s.gsub!(/#{motif}/, '<span class="motif">\0</span>')
new_seq_hash = { signalp: f, new_seq_end: s }
# Wrap the value in an array so callers can still use seq_hash[0], as in the question.
{ id_hash => [new_seq_hash] }
end
If (as it appears) id and seq both have constant values, you might consider breaking them apart manually, rather than with id.split("~").map(&:strip); i.e.,
id1 = ">isotig00009_f2_3"
id2 = ": S.P. Cleavage Site: 22:23 - S.P. D-value: 0.532"
seq1 = "MLKCFSIIMGLILLLEIGGGCA"
seq2 = "IYFYRAQIQAQFQKSLTDVTITDYRENADFQDLIDALQSGLSCCGVNSYEDWDNNIYFNCSGPANNPEALWCAFLLLYTGSSKRSSQHPVRLWSSFPRTTKYFPHKDLHHWLCGYVYNVD"
If there were a need to make seq2 more readable, we could use the "line continuation" character, \ (which even works within strings) like this:
seq2 = "IYFYRAQIQAQFQKSLTDVTITDYRENADFQDLIDALQSGLSCCGVNSYEDWDNNIYFNC"\
"SGPANNPEALWCAFLLLYTGSSKRSSQHPVRLWSSFPRTTKYFPHKDLHHWLCGYVYNVD"
or this:
seq2 = "IYFYRAQIQAQFQKSLTDVTITDYRENADFQDLIDALQSGLSCCGVNSYEDWDNNIYFNC\
SGPANNPEALWCAFLLLYTGSSKRSSQHPVRLWSSFPRTTKYFPHKDLHHWLCGYVYNVD"
If you preferred, you could make 'id' and 'seq' constants ('ID' and 'SEQ', say) and move them outside the method definition. Not surprisingly, line continuation also works for constant strings.
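Going one step further, here is a minimal sketch (not a tested drop-in for the full script) that passes id and seq in as arguments and builds both hashes with zip and to_h, which needs Ruby 2.1 or later. The helper name split_pair and the three-argument signature are assumptions made for illustration, and the sequence is shortened for the example.

```ruby
# Hypothetical helper: splits "a ~ b" on "~" and labels the two halves.
def split_pair(text, keys)
  keys.zip(text.split("~").map(&:strip)).to_h
end

# Returns the same structure as the original method: { id_hash => [seq_hash] }.
def make_hash(id, seq, motif)
  id_hash  = split_pair(id, [:id_start, :id_end])
  seq_hash = split_pair(seq, [:signalp, :seq_end])
  # Wrap every motif match in a span; \0 stands for the whole match.
  highlighted = seq_hash[:seq_end].gsub(/#{motif}/, '<span class="motif">\0</span>')
  { id_hash => [{ signalp: seq_hash[:signalp], new_seq_end: highlighted }] }
end

id  = ">isotig00009_f2_3 ~: S.P. Cleavage Site: 22:23 - S.P. D-value: 0.532"
# Sequence shortened for the example.
seq = "MLKCFSIIMGLILLLEIGGGCA~IYFYRAQIQAQFQKSLTDVTITDYRENADFQDLIDALQSGLSC"

make_hash(id, seq, "VT|QAQ|F.D").each do |id_hash, seq_hashes|
  puts id_hash[:id_start], id_hash[:id_end]
  puts seq_hashes[0][:signalp], seq_hashes[0][:new_seq_end]
end
```

Because the method no longer hard-codes its data, the calling loop from the question works unchanged and the method can be reused for other records.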


Multiple choice quiz in Ruby via terminal

Is there any way to code in Ruby so that the terminal presents two options among which the user is required to select using the arrow keys and confirm using Enter?
Pseudo code:
p "What is the capital of Scotland?"
user_select = gets.chomp
p "Edinburgh"
p "Glasgow"
if user_select == "Edinburgh" etc etc
I want to know if this can be achieved without the user having to type in their answer. Can the terminal behave like a GUI?
Alternatively, you could use TTY::Prompt. It will let you use the arrow keys.
Code sample
require 'tty-prompt'
prompt = TTY::Prompt.new
greeting = 'What is the capital of Scotland?'
choices = %w(Edinburgh Glasgow)
answer = prompt.select(greeting, choices)
'do something' if answer == choices[0]
Result
$ ruby quiz.rb
What is the capital of Scotland? (Use arrow keys, press Enter to select)
‣ Edinburgh
Glasgow
You could use something like Highline, though that will not let you use arrow keys:
→ ruby test.rb
1. Edinburgh
2. Glasgow
What is the capital of Scotland?
→ 1
Correct!
Code (just to get an idea):
require 'highline'
cli = HighLine.new
cli.choose do |menu|
menu.prompt = "What is the capital of Scotland?"
menu.choice("Edinburgh") { cli.say "Correct!" }
menu.choice("Glasgow") { cli.say "Wrong!" }
end
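If you would rather not install a gem, a numbered menu like the one above can be sketched with the standard library alone. This is only an illustration: it does not support arrow keys, the helper name ask_choice and its input/output keywords are made up for this example, and Integer(..., exception: false) needs Ruby 2.6 or later.

```ruby
require 'stringio'

# Hypothetical helper: prints numbered choices and returns the chosen string.
# Re-prompts until the input is a valid number; returns nil at end of input.
def ask_choice(question, choices, input: $stdin, output: $stdout)
  output.puts question
  choices.each_with_index { |choice, i| output.puts "#{i + 1}. #{choice}" }
  loop do
    output.print "> "
    line = input.gets
    return nil if line.nil?
    index = Integer(line.strip, exception: false)
    return choices[index - 1] if index && index.between?(1, choices.size)
    output.puts "Please enter a number between 1 and #{choices.size}."
  end
end

# Scripted input via StringIO so the example runs non-interactively;
# drop the input: argument to read from the keyboard instead.
answer = ask_choice("What is the capital of Scotland?", %w[Edinburgh Glasgow],
                    input: StringIO.new("3\n1\n"))
puts answer == "Edinburgh" ? "Correct!" : "Wrong!"
```

Taking the input and output streams as parameters keeps the prompt logic testable without a terminal.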
For more of a GUI, try using something like MRDialog.
Example:
require 'mrdialog'
dialog = MRDialog.new
dialog.clear = true
dialog.title = "Quiz"
question = "What is the capital of Scotland?"
answers = [['E', 'Edinburgh'], ['G', 'Glasgow']]
height = 0
width = 0
menu_height = 2
selected_item = dialog.menu(question, answers, height, width, menu_height)
puts "Selected item: #{selected_item}"

Extract multiple protein sequences from a Protein Data Bank along with Secondary Structure

I want to extract protein sequences and their corresponding secondary structure from any Protein Data bank, say RCSB. I just need short sequences and their secondary structure. Something like,
ATRWGUVT Helix
It is fine even if the sequences are long, but I want a tag at the end that denotes the secondary structure. Is there any programming tool or anything else available for this?
As I've shown above I want only this much minimal information. How can I achieve this?
import sys
from Bio.PDB import *
from distutils import spawn
Extract sequence:
def get_seq(pdbfile):
    p = PDBParser(PERMISSIVE=0)
    structure = p.get_structure('test', pdbfile)
    ppb = PPBuilder()
    seq = ''
    for pp in ppb.build_peptides(structure):
        seq += pp.get_sequence()
    return seq
Extract secondary structure with DSSP as explained earlier:
def get_secondary_struc(pdbfile):
    # get secondary structure info for whole pdb.
    if not spawn.find_executable("dssp"):
        sys.stderr.write('dssp executable needs to be in folder')
        sys.exit(1)
    p = PDBParser(PERMISSIVE=0)
    ppb = PPBuilder()
    structure = p.get_structure('test', pdbfile)
    model = structure[0]
    dssp = DSSP(model, pdbfile)
    count = 0
    sec = ''
    for residue in model.get_residues():
        count = count + 1
        # print residue,count
        a_key = list(dssp.keys())[count - 1]
        # index 2 of a DSSP entry is the secondary structure code
        sec += dssp[a_key][2]
    print(sec)
    return sec
This should print both sequence and secondary structure.
You can use DSSP.
The output of DSSP is explained extensively under 'explanation'. The very short summary of the output is:
H = α-helix
B = residue in isolated β-bridge
E = extended strand, participates in β ladder
G = 3-helix (3₁₀ helix)
I = 5 helix (π-helix)
T = hydrogen bonded turn
S = bend
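To get from a sequence string and a DSSP string to lines like "ATRWGUVT Helix", the two strings still have to be cut into runs that share one code. That step is plain string processing, so here is a rough sketch in Ruby that does not depend on Biopython. The helper name label_segments, the short label names, and the treatment of "-" as "Coil" are assumptions of this sketch, and the input strings are made up.

```ruby
# Names for the DSSP one-letter codes listed above. "-" (no assignment) is
# labelled "Coil" here, which is a naming choice of this sketch.
DSSP_NAMES = {
  "H" => "Helix", "B" => "Bridge", "E" => "Strand", "G" => "3-10 helix",
  "I" => "Pi helix", "T" => "Turn", "S" => "Bend", "-" => "Coil"
}.freeze

# Hypothetical helper: splits a sequence into runs that share one DSSP code.
# Both strings must have exactly one character per residue.
def label_segments(sequence, structure)
  raise ArgumentError, "length mismatch" unless sequence.length == structure.length
  sequence.chars.zip(structure.chars)
          .chunk_while { |a, b| a[1] == b[1] }
          .map { |run| [run.map(&:first).join, DSSP_NAMES.fetch(run[0][1], "Unknown")] }
end

# Made-up input strings, for illustration only.
label_segments("ATRWGVTKL", "HHHHHEEE-").each do |segment, name|
  puts "#{segment} #{name}"
end
```

The same grouping can be done with itertools.groupby if you prefer to stay in Python next to the Biopython code.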

Remove nTh record from array using loop [closed]

I'm writing a program that reads a .csv file, and then loops through it removing every 10th record it encounters before outputting it.
I've been stuck on what I believe is a syntax issue for a while now and just can't seem to nail it. Anyone mind having a look?
lines = []
i = 0
elements = []
element2 = []
output = []
file = File.open("./properties.csv", "r")
while (line = file.gets)
i += 1
# use split to break array up using commas
arr = line.split(',')
elements.push({ id: arr[0], streetAddress: arr[1], town: arr[2], valuationDate: arr[3], value: arr[4] })
end
file.close
# filter out blanks and nill rows
x = elements.select { |elements| elements[:id].to_i >= 0.1}
# Loop to remove every 10th record
e = 0
d = 1
loop do x.length
if e == (10 * d)
d ++
e ++
else
x = elements.select[e]
e ++
end
puts x
puts "#{x.length} house in list, #{d} records skipped."
CSV FILE
ID,Street address,Town,Valuation date,Value
1,1 Northburn RD,WANAKA,1/1/2015,280000
2,1 Mount Ida PL,WANAKA,1/1/2015,280000
3,1 Mount Linton AVE,WANAKA,1/1/2015,780000
4,1 Kamahi ST,WANAKA,1/1/2015,155000
5,1 Kapuka LANE,WANAKA,1/1/2015,149000
6,1 Mohua MEWS,WANAKA,1/1/2015,560000
7,1 Kakapo CT,WANAKA,1/1/2015,430000
8,1 Mt Gold PL,WANAKA,1/1/2015,1260000
9,1 Penrith Park DR,WANAKA,1/1/2015,1250000
10,1 ATHERTON PL,WANAKA,1/1/2015,650000
11,1 WAIMANA PL,WANAKA,1/1/2015,780000
12,1 ROTO PL,WANAKA,1/1/2015,1470000
13,1 Toms WAY,WANAKA,1/1/2015,2230000
14,1 MULBERRY LANE,WANAKA,1/1/2015,415000
15,1 Range View PL,WANAKA,1/1/2015,300000
16,1 Clearview ST,WANAKA,1/1/2015,1230000
17,1 Clutha PL,WANAKA,1/1/2015,700000
18,1 Centre CRES,WANAKA,1/1/2015,295000
19,1 Valley CRES,WANAKA,1/1/2015,790000
20,1 Edgewood PL,WANAKA,1/1/2015,365000
21,1 HUNTER CRES,WANAKA,1/1/2015,335000
22,1 KOWHAI DR,WANAKA,1/1/2015,480000
23,1 RIMU LANE,WANAKA,1/1/2015,465000
24,1 CHERRY CT,WANAKA,1/1/2015,495000
25,1 COLLINS ST,WANAKA,1/1/2015,520000
26,1 AUBREY RD,WANAKA,1/1/2015,985000
27,1 EELY POINT RD,WANAKA,1/1/2015,560000
28,1 LINDSAY PL,WANAKA,1/1/2015,385000
29,1 WINDERS ST,WANAKA,1/1/2015,760000
30,1 Manuka CRES,WANAKA,1/1/2015,510000
31,1 WILEY RD,WANAKA,1/1/2015,420000
32,1 Baker GR,WANAKA,1/1/2015,820000
33,1 Briar Bank DR,WANAKA,1/1/2015,1260000
34,1 LAKESIDE RD,WANAKA,1/1/2015,440000
35,1 PLANTATION RD,WANAKA,1/1/2015,345000
36,1 Allenby PL,WANAKA,1/1/2015,640000
37,1 ROB ROY LANE,WANAKA,1/1/2015,380000
38,1 Ansted PL,WANAKA,1/1/2015,590000
39,1 Fastness CRES,WANAKA,1/1/2015,640000
40,1 APOLLO PL,WANAKA,1/1/2015,385000
41,1 AEOLUS PL,WANAKA,1/1/2015,370000
42,1 Peak View RDGE,WANAKA,1/1/2015,1750000
43,1 Moncrieff PL,WANAKA,1/1/2015,530000
44,1 Islington PL,WANAKA,1/1/2015,190000
45,1 Hidden Hills DR,WANAKA,1/1/2015,1280000
46,1 Weatherall CL,WANAKA,1/1/2015,425000
47,1 Terranova PL,WANAKA,1/1/2015,900000
48,1 Cliff Wilson ST,WANAKA,1/1/2015,1200000
49,1 TOTARA TCE,WANAKA,1/1/2015,460000
50,1 Koru WAY,WANAKA,1/1/2015,570000
51,1 Bovett PL,Wanaka,1/1/2015,495000
52,1 Pearce PL,Wanaka,1/1/2015,675000
53,1 Ironside DR,WANAKA,1/1/2015,570000
54,1 Bob Lee PL,WANAKA,1/1/2015,610000
55,1 Hogan LANE,WANAKA,1/1/2015,395000
56,1 ARDMORE ST,WANAKA,1/1/2015,1190000
57,1 Bullock Creek LANE,WANAKA,1/1/2015,11125000
58,1 DUNMORE ST,WANAKA,1/1/2015,1300000
59,1 Primary LANE,WANAKA,1/1/2015,430000
60,1 SYCAMORE PL,WANAKA,1/1/2015,720000
61,1 FAULKS TCE,WANAKA,1/1/2015,780000
62,1 Alpha CL,WANAKA,1/1/2015,500000
63,1 Coromandel ST,WANAKA,1/1/2015,530000
64,1 Niger ST,WANAKA,1/1/2015,475000
65,1 Maggies Way,WANAKA,1/1/2015,375000
66,1 Hollyhock LANE,QUEENSTOWN,1/1/2015,1080000
67,1 ELDERBERRY CRES,WANAKA,1/1/2015,1340000
68,1 Foxglove HTS,WANAKA,1/1/2015,2520000
69,1 MEADOWSTONE DR,WANAKA,1/1/2015,650000
70,1 OAKWOOD PL,WANAKA,1/1/2015,580000
71,1 MEADOWBROOK PL,WANAKA,1/1/2015,645000
72,1 Jessies CRES,WANAKA,1/1/2015,320000
73,1 Lansdown ST,WANAKA,1/1/2015,700000
74,1 Stonebrook DR,WANAKA,1/1/2015,640000
75,1 Hyland ST,WANAKA,1/1/2015,500000
76,1 TAPLEY PADDOCK,WANAKA,1/1/2015,720000
77,1 Homestead CL,WANAKA,1/1/2015,1750000
78,1 NORMAN TCE,WANAKA,1/1/2015,620000
79,1 Sunrise Bay DR,WANAKA,1/1/2015,3000000
80,1 LARCH PL,WANAKA,1/1/2015,570000
81,1 MILL END,WANAKA,1/1/2015,600000
82,1 Bills WAY,WANAKA,1/1/2015,750000
83,1 Heuchan LANE,WANAKA,1/1/2015,610000
84,1 SARGOOD DR,WANAKA,1/1/2015,455000
85,1 Frederick ST,WANAKA,1/1/2015,455000
86,1 Connell TCE,WANAKA,1/1/2015,600000
87,1 Soho ST,QUEENSTOWN,1/1/2015,320000
88,1 Hikuwai DR,ALBERT TOWN,1/1/2015,280000
89,1 Harrier LANE,WANAKA,1/1/2015,1000000
90,1 Ewing PL,WANAKA,1/1/2015,780000
91,1 Sherwin AVE,ALBERT TOWN,1/1/2015,440000
92,1 Hardie PL,WANAKA,1/1/2015,830000
93,1 Finch ST,ALBERT TOWN,1/1/2015,540000
94,1 Poppy LANE,ALBERT TOWN,1/1/2015,395000
95,1 Warbler LANE,ALBERT TOWN,1/1/2015,410000
96,1 Balneaves LANE,WANAKA,1/1/2015,250000
97,1 Mill Green,Arrowtown,1/1/2015,800000
require 'csv'
elements = {}
CSV.foreach("properties.csv", :headers => true, :header_converters => :symbol) do |row|
elements[row.fields[0]] = Hash[row.headers[1..-1].zip(row.fields[1..-1])]
end
d = 0
e = 0
elements.delete_if do |key, value|
e += 1
if e == 10
e = 0
d += 1
end
e == 0
end
puts "#{elements.length} house in list, #{d} records skipped."
At the end of this, elements will have every 10th row removed, and d contains the number of rows removed.
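If you want to keep the rows in an array, as in the question, rather than in a hash keyed by ID, a shorter variant is possible with reject.with_index. This is a sketch under assumptions: the helper name drop_every_nth is made up, and a small inline CSV with invented rows stands in for properties.csv.

```ruby
require 'csv'

# Hypothetical helper: returns the records with every nth one dropped.
# with_index(1) numbers the records from 1, so position 10, 20, ... are removed.
def drop_every_nth(records, n)
  records.reject.with_index(1) { |_record, position| (position % n).zero? }
end

# A small inline CSV stands in for properties.csv (made-up rows).
csv_text = "ID,Town\n" + (1..25).map { |i| "#{i},WANAKA" }.join("\n")
rows = CSV.parse(csv_text, headers: true, header_converters: :symbol).map(&:to_h)

kept = drop_every_nth(rows, 10)
puts "#{kept.length} houses in list, #{rows.length - kept.length} records skipped."
```

Unlike delete_if, this leaves the original array untouched and returns a new one, so the skipped count is simply the difference in length.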

Improve genbank feature addition

I am trying to add more than 70000 new features to a genbank file using biopython.
I have this code:
from Bio import SeqIO
from Bio.SeqFeature import SeqFeature, FeatureLocation
fi = "myoriginal.gbk"
fo = "mynewfile.gbk"
for result in results:
    start = 0
    end = 0
    result = result.split("\t")
    start = int(result[0])
    end = int(result[1])
    for record in SeqIO.parse(original, "gb"):
        record.features.append(SeqFeature(FeatureLocation(start, end), type = "misc_feat"))
        SeqIO.write(record, fo, "gb")
Results is just a list of lists containing the start and end of each one of the features I need to add to the original gbk file.
This solution is extremely costly for my computer and I do not know how to improve the performance. Any good idea?
You should parse the genbank file just once. Omitting what results contains (I do not know exactly, because there are some missing pieces of code in your example), I would guess something like this would improve performance, modifying your code:
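from Bio import SeqIO
from Bio.SeqFeature import SeqFeature, FeatureLocation

fi = "myoriginal.gbk"
fo = "mynewfile.gbk"
# Parse the GenBank file once and keep the records in memory.
original_records = list(SeqIO.parse(fi, "gb"))
for result in results:
    result = result.split("\t")
    start = int(result[0])
    end = int(result[1])
    for record in original_records:
        record.features.append(SeqFeature(FeatureLocation(start, end), type="misc_feat"))
# Write all records once, after every feature has been added.
SeqIO.write(original_records, fo, "gb")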

Date-time comparison in Ruby

I have one date, let's say '2010-12-20' of a flight departure, and two times, for instance, '23:30' and '02:15'.
The problem: I need to get datetimes (yyyy-MM-dd HH:mm:ss, for example, 2010-12-17 14:38:32) for both of these times, but I don't know the day of the second time (it can be the same day as departure, or the next one).
I am looking for the best solution in Ruby on Rails. In PHP I would just use string splitting multiple times, but I believe that Rails, as usual, has a much more elegant way.
So, here is my pseudo code, which I want to turn into Ruby:
depart_time = '23:30'
arrive_time = '02:15'
depart_date = '2010-12-20'
arrive_date = (arrive.hour < depart.hour and arrive.hour < 5) ? depart_date + 1 : depart_date
# Final results
depart = depart_date + ' ' + depart_time
arrive = arrive_date + ' ' + arrive_time
I want to find the best way to implement this in Ruby on Rails, instead of just playing with strings.
This is just pure Ruby, nothing to do with Rails:
require 'date'
depart_time = DateTime.strptime '23:30', '%H:%M'
arrive_time = DateTime.strptime '02:15', '%H:%M'
arrive_date = depart_date = Date.parse( '2010-12-20' )
arrive_date += 1 if arrive_time.hour < depart_time.hour and arrive_time.hour < 5
puts "#{depart_date} #{depart_time.strftime '%H:%M'}",
"#{arrive_date} #{arrive_time.strftime '%H:%M'}"
#=> 2010-12-20 23:30
#=> 2010-12-21 02:15
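If you need the full yyyy-MM-dd HH:mm:ss strings, one option is to parse date and time together and compare the resulting objects. The sketch below swaps the hour-based check for a different rule: it assumes the flight is shorter than 24 hours, so an arrival clock time that is not later than the departure clock time must fall on the next day. It also ignores time zones. The names flight_times and FLIGHT_FORMAT are made up for this example.

```ruby
require 'date'

FLIGHT_FORMAT = "%Y-%m-%d %H:%M"

# Hypothetical helper: returns [depart, arrive] as DateTime objects.
# Assumes a flight shorter than 24 hours and no time zone change.
def flight_times(depart_date, depart_time, arrive_time)
  depart = DateTime.strptime("#{depart_date} #{depart_time}", FLIGHT_FORMAT)
  arrive = DateTime.strptime("#{depart_date} #{arrive_time}", FLIGHT_FORMAT)
  arrive += 1 if arrive <= depart # roll over to the next day
  [depart, arrive]
end

depart, arrive = flight_times("2010-12-20", "23:30", "02:15")
puts depart.strftime("%Y-%m-%d %H:%M:%S")
puts arrive.strftime("%Y-%m-%d %H:%M:%S")
```

Comparing whole DateTime values avoids the special case for hours before 5, at the cost of the 24-hour assumption stated above.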
