Complementary DNA sequence - for-loop

I'm having a problem writing this loop; it seems to stop after the second sequence.
I want to return the complementary DNA sequence to the given DNA sequence.
E.g. ('AGATTC') -> ('TCTAAG'), where A:T and C:G
def get_complementary_sequence(dna):
"""(str) -> str
> Return the DNA sequence that is complementary to the given DNA sequence
>>> get_complementary_sequence('AT')
('TA')
>>> get_complementary_sequence('AGATTC')
('TCTAAG')
"""
x = 0
complementary_sequence = ''
for char in dna:
complementary_sequence = (get_complement(dna))
return complementary_sequence + (dna[x:x+1])
Can anyone spot why the loop does not continue?

Here is an example how I would do it - only several lines of code really:
from string import maketrans
DNA="CCAGCTTATCGGGGTACCTAAATACAGAGATAT" #example DNA fragment
def complement(sequence):
reverse = sequence[::-1]
return reverse.translate(maketrans('ATCG','TAGC'))
print complement(DNA)

What's wrong?
You're calling:
complementary_sequence = (get_complement(dna))
...n times where n is the length of the string. This leaves you with whatever the return value of get_complement(dna) is in complementary_sequence. Presumably just one letter.
You then return this one letter (complementary_sequence) followed by the substring dna[0:1] (i.e. the first letter in dna), because x is always 0.
This would be why you always get two characters returned.
How to fix it?
Assuming you have a function like:
def get_complement(d):
return {'T': 'A', 'A': 'T', 'C': 'G', 'G': 'C'}.get(d, d)
...you could fix your function by simply using str.join() and a list comprehension:
def get_complementary_sequence(dna):
"""(str) -> str
> Return the DNA sequence that is complementary to the given DNA sequence
>>> get_complementary_sequence('AT')
('TA')
>>> get_complementary_sequence('AGATTC')
('TCTAAG')
"""
return ''.join([get_complement(c) for c in dna])

You call get_complement on all of dna instead of each char. This will simply call the came function with the same parameters len(dna) times. There's no reason to loop through the chars if you never use them. If get_complement() can take a char, I would recommend:
for char in dna:
complementary_sequence += get_complement(char)
The implementation of get_complement would take a single character and return its complement.
Also, you're returning complementary_sequence + (dna[x:x+1]). If you want the function to conform to the behavior that you've documented, the + (dna[x:x+1]) will add an extra (wrong) character from the beginning off the dna string. All you need to return is complementary_sequence! Thanks to #Kevin for noticing.
What you're doing:
>>> dna = "1234"
>>> for char in dna:
... print dna
...
1234
1234
1234
1234
what I think is closer to what you want to be doing:
>>> for char in dna:
... print char
...
1
2
3
4
Putting it all together:
# you could also use a list comprehension, with a join() call, but
# this is closer to your original implementation.
def get_complementary_sequence(seq):
complement = ''
for char in seq:
complement += get_complement(char)
return complement
def get_complement(base):
complements = {'A':'T', 'T':'A', 'C':'G', 'G':'C'}
return complements[base]
>>> get_complementary_sequence('AT')
'TA'
>>> get_complementary_sequence('AGATTC')
'TCTAAG'

Related

Move decimal fixed number of spaces with Ruby

I have large integers (typically 15-30 digits) stored as a string that represent a certain amount of a given currenty (such as ETH). Also stored with that is the number of digits to move the decimal.
{
"base_price"=>"5000000000000000000",
"decimals"=>18
}
The output that I'm ultimately looking for is 5.00 (which is what you'd get if took the decimal from 5000000000000000000 and moved it to the left 18 positions).
How would I do that in Ruby?
Given:
my_map = {
"base_price"=>"5000000000000000000",
"decimals"=>18
}
You could use:
my_number = my_map["base_price"].to_i / (10**my_map["decimals"]).to_f
puts(my_number)
h = { "base_price"=>"5000000000000000000", "decimals"=>18 }
​
bef, aft = h["base_price"].split(/(?=\d{#{h["decimals"]}}\z)/)
#=> ["5", "000000000000000000"]
bef + '.' + aft[0,2]
#=> "5.00"
The regular expression uses the positive lookahead (?=\d{18}\z) to split the string at a ("zero-width") location between digits such that 18 digits follow to the end of the string.
Alternatively, one could write:
str = h["base_price"][0, h["base_price"].size-h["decimals"]+2]
#=> h["base_price"][0, 3]
#=> "500"
str.insert(str.size-2, '.')
#=> "5.00"
Neither of these address potential boundary cases such as
{ "base_price"=>"500", "decimals"=>1 }
or
{ "base_price"=>"500", "decimals"=>4 }
Nor do they consider rounding issues.
Regular expressions and interpolation?
my_map = {
"base_price"=>"5000000000000000000",
"decimals"=>18
}
my_map["base_price"].sub(
/(0{#{my_map["decimals"]}})\s*$/,
".#{$1}"
)
The number of decimal places is interpolated into the regular expression as the count of zeroes to look for from the end of the string (plus zero or more whitespace characters). This is matched, and the match is subbed with a . in front of it.
Producing:
=> "5.000000000000000000"

Convert digits in string to ints and then back to string

Say I have a string: formula = "C3H12O4"
How can I convert the digit chars in the string to ints?
My end goal is to do something along the lines of:
formula * 4
Once converted formula chars to an int, it would be best to report the result back to a string, thus
outputting as:
"C12H48O16"
formula = "C3H12O4"
Code
p formula.gsub(/\d+/) { |x| x.to_i * 4 }
output
"C12H48O16"
If you had many conversions to do it might be worthwhile to include the following in a benchmark of different methods:
h = (0..9).each_with_object({}) { |n,h| h[n.to_s] = (4*n).to_s }
#=> {"0"=>"0", "1"=>"4", "2"=>"8", "3"=>"12", "4"=>"16",
# "5"=>"20", "6"=>"24", "7"=>"28", "8"=>"32", "9"=>"36"}
Then for each string of interest the following calculation would be performed:
"C3H12O4".gsub(/\d/, h)
#=> "C12H48O16"
"99Ra$32".gsub(/\d/, h)
#=> "3636Ra$128"
This uses the form of String#gsub that employs a hash to make the substitutions.
A variant of this is the following.
"C3H12O4".gsub(/./) { |c| h.fetch(c, c) }
#=> "C12H48O16"
Here gsub matches every character, which it passes to the block to be held by the block variable c. Hash#fetch is then used to look up and return h[c], provided h has a key c. If h does not have a key c, fetch's second argument (c) is returned.
The use of the hash avoids the need to convert back and forth between integers and strings, except in the creation of the hash, of course, but that is done only once.

How to convert bytes in number into a string of characters? (character representation of a number)

How do I easily convert a number, e.g. 0x616263, equivalently 6382179 in base 10, into a string by dividing the number up into sequential bytes? So the example above should convert into 'abc'.
I've experimented with Array.pack but cant figure out how to get it to convert more than one byte in the number, e.g. [0x616263].pack("C*") returns 'c'.
I've also tried 0x616263.to_s(256), but that throws an ArgumentError: invalid radix. I guess it needs some sort of encoding information?
(Note: Other datatypes in pack like N work with the example I've given above, but only because it fits within 4 bytes, so e.g. [0x616263646566].pack("N") gives cdef, not abcdef)
This question is vaguely similar to this one, but not really. Also, I sort of figured out how to get the hex representation string from a character string using "abcde".unpack("c*").map{|c| c.to_s(16)}.join(""), which gives '6162636465'. I basically want to go backwards.
I don't think this is an X-Y problem, but in case it is - I'm trying to convert a number I've decoded with RSA into a character string.
Thanks for any help. I'm not too experienced with Ruby. I'd also be interested in a Python solution (for fun), but I don't know if its right to add tags for two separate programming languages to this question.
To convert a single number 0x00616263 into 3 characters, what you really need to do first is separate them into three numbers: 0x00000061, 0x00000062, and 0x00000063.
For the last number, the hex digits you want are already in the correct place. But for the other two, you have to do a bitshift using >> 16 and >> 8 respectively.
Afterwards, use a bitwise and to get rid of the other digits:
num1 = (0x616263 >> 16) & 0xFF
num2 = (0x616263 >> 8) & 0xFF
num3 = 0x616263 & 0xFF
For the characters, you could then do:
char1 = ((0x616263 >> 16) & 0xFF).chr
char2 = ((0x616263 >> 8) & 0xFF).chr
char3 = (0x616263 & 0xFF).chr
Of course, bitwise operations aren't very Ruby-esque. There are probably more Ruby-like answers that someone else might provide.
64 bit integers
If your number is smaller than 2**64 (8 bytes), you can :
convert the "big-endian unsigned long long" to 8 bytes
remove the leading zero bytes
Ruby
[0x616263].pack('Q>').sub(/\x00+/,'')
# "abc"
[0x616263646566].pack('Q>').sub(/\x00+/,'')
# "abcdef"
Python 2 & 3
In Python, pack returns bytes, not a string. You can use decode() to convert bytes to a String :
import struct
import re
print(re.sub('\x00', '', struct.pack(">Q", 0x616263646566).decode()))
# abcdef
print(re.sub('\x00', '', struct.pack(">Q", 0x616263).decode()))
# abc
Large numbers
With gsub
If your number doesn't fit in 8 bytes, you could use a modified version of your code. This is shorter and outputs the string correctly if the first byte is smaller than 10 (e.g. for "\t") :
def decode(int)
if int < 2**64
[int].pack('Q>').sub(/\x00+/, '')
else
nhex = int.to_s(16)
nhex = '0' + nhex if nhex.size.odd?
nhex.gsub(/../) { |hh| hh.to_i(16).chr }
end
end
puts decode(0x616263) == 'abc'
# true
puts decode(0x616263646566) == 'abcdef'
# true
puts decode(0x0961) == "\ta"
# true
puts decode(0x546869732073656e74656e63652069732077617920746f6f206c6f6e6720666f7220616e20496e743634)
# This sentence is way too long for an Int64
By the way, here's the reverse method :
def encode(str)
str.reverse.each_byte.with_index.map { |b, i| b * 256**i }.inject(:+)
end
You should still check if your RSA code really outputs arbitrary large numbers or just an array of integers.
With shifts
Here's another way to get the result. It's similar to #Nathan's answer, but it works for any integer size :
def decode(int)
a = []
while int>0
a << (int & 0xFF)
int >>= 8
end
a.reverse.pack('C*')
end
According to fruity, it's twice as fast as the gsub solution.
I'm currently rolling with this:
n = 0x616263
nhex = n.to_s(16)
nhexarr = nhex.scan(/.{1,2}/)
nhexarr = nhexarr.map {|e| e.to_i(16)}
out = nhexarr.pack("C*")
But was hoping for a concise/built-in way to do this, so I'll leave this answer unaccepted for now.

Ruby Truncate Words + Long Text

I have the following function which accepts text and a word count and if the number of words in the text exceeded the word-count it gets truncated with an ellipsis.
#Truncate the passed text. Used for headlines and such
def snippet(thought, wordcount)
thought.split[0..(wordcount-1)].join(" ") + (thought.split.size > wordcount ? "..." : "")
end
However what this function doesn't take into account is extremely long words, for instance...
"Helloooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooo
world!"
I was wondering if there's a better way to approach what I'm trying to do so it takes both word count and text size into consideration in an efficient way.
Is this a Rails project?
Why not use the following helper:
truncate("Once upon a time in a world far far away", :length => 17)
If not, just reuse the code.
This is probably a two step process:
Truncate the string to a max length (no need for regex for this)
Using regex, find a max words quantity from the truncated string.
Edit:
Another approach is to split the string into words, loop through the array adding up
the lengths. When you find the overrun, join 0 .. index just before the overrun.
Hint: regex ^(\s*.+?\b){5} will match first 5 "words"
The logic for checking both word and char limits becomes too convoluted to clearly express as one expression. I would suggest something like this:
def snippet str, max_words, max_chars, omission='...'
max_chars = 1+omision.size if max_chars <= omission.size # need at least one char plus ellipses
words = str.split
omit = words.size > max_words || str.length > max_chars ? omission : ''
snip = words[0...max_words].join ' '
snip = snip[0...(max_chars-3)] if snip.length > max_chars
snip + omit
end
As other have pointed out Rails String#truncate offers almost the functionality you want (truncate to fit in length at a natural boundary), but it doesn't let you independently state max char length and word count.
First 20 characters:
>> "hello world this is the world".gsub(/.+/) { |m| m[0..20] + (m.size > 20 ? '...' : '') }
=> "hello world this is t..."
First 5 words:
>> "hello world this is the world".gsub(/.+/) { |m| m.split[0..5].join(' ') + (m.split.size > 5 ? '...' : '') }
=> "hello world this is the world..."

Code Golf: Validate Sudoku Grid

Locked. This question and its answers are locked because the question is off-topic but has historical significance. It is not currently accepting new answers or interactions.
Introduction
A valid Sudoku grid is filled with numbers 1 to 9, with no number occurring more than once in each sub-block of 9, row or column. Read this article for further details if you're unfamiliar with this popular puzzle.
Challenge
The challenge is to write the shortest program that validates a Sudoku grid that might not be full.
Input will be a string of 9 lines of 9 characters each, representing the grid. An empty cell will be represented by a .. Your output should be Valid if the grid is valid, otherwise output Invalid.
Example
Input
123...789
...456...
456...123
789...456
...123...
564...897
...231...
897...564
...564...
Output
Valid
Input
123456789
987654321
123456789
123456789
987654321
123456789
123456789
987654321
123456789
Output
Invalid
Code Golf Rules
Please post your shortest code in any language that solves this problem. Input and output may be handled via stdin and stdout or by other files of your choice.
Winner will be the shortest solution (by byte count) in a language with an implementation existing prior to the posting of this question. So while you are free to use a language you've just made up in order to submit a 0-byte solution, it won't count, and you'll probably get downvotes.
Golfscript: 56
n%{zip''+9/.{'.'-..&=}%$0=\}:|2*{3/}%|;**"InvV"3/="alid"
C: 165 162 161 160 159
int v[1566],x,y=9,c,b;main(){while(y--)for(x=9;x--+1;)if((c
=getchar()*27)>1242)b|=v[x+c]++|v[y+9+c]++|v[x-x%3+y/3+18+c]
++;puts(b?"Invalid":"Valid");return 0;}
The two newlines are not needed. One char saved by josefx :-) ...
Haskell: 207 230 218 195 172
import List
t=take 3
h=[t,t.drop 3,drop 6]
v[]="V"
v _="Inv"
f s=v[1|v<-[s,transpose s,[g=<<f s|f<-h,g<-h]],g<-map(filter(/='.'))v,g/=nub g]++"alid\n"
main=interact$f.lines
Perl: 168 128
$_=join'',<>;#a=/.../g;print+(/(\d)([^\n]{0,8}|(.{10})*.{9})\1/s
+map"#a[$_,$_+3,$_+6]"=~/(\d).*\1/,0..2,9..11,18..20)?Inv:V,alid
The first regex checks for duplicates that are in the same row and column; the second regex handles duplicates in the "same box".
Further improvement is possible by replacing the \n in the first regex with a literal newline (1 char), or with >= Perl 5.12, replacing [^\n] with \N (3 char)
Earlier, 168 char solution:
Input is from stdin, output is to stderr because it makes things so easy. Linebreaks are optional and not counted.
$_=join'',<>;$m=alid.$/;$n=Inv.$m;/(\d)(\N{0,8}|(.{10})*.{9})\1/s&&
die$n;#a=/.../g;for$i(0,8,17){for$j($i..$i+2){
$_=$a[$j].$a[$j+3].$a[$j+6];/(\d).*\1/&&die$n}}die"V$m"
Python: 230 221 200 185
First the readable version at len=199:
import sys
r=range(9)
g=[raw_input()for _ in r]
s=[[]for _ in r*3]
for i in r:
for j in r:
n=g[i][j]
for x in i,9+j,18+i/3*3+j/3:
<T>if n in s[x]:sys.exit('Invalid')
<T>if n>'.':s[x]+=n
print'Valid'
Since SO doesn't display tab characters, I've used <T> to represent a single tab character.
PS. the same approach minEvilized down to 185 chars:
r=range(9)
g=[raw_input()for _ in r]
s=['']*27
for i in r:
for j in r:
for x in i,9+j,18+i/3*3+j/3:n=g[i][j];s[x]+=n[:n>'.']
print['V','Inv'][any(len(e)>len(set(e))for e in s)]+'alid'
Perl, 153 char
#B contains the 81 elements of the board.
&E tests whether a subset of #B contains any duplicate digits
main loop validates each column, "block", and row of the puzzle
sub E{$V+="#B[#_]"=~/(\d).*\1/}
#B=map/\S/g,<>;
for$d(#b=0..80){
E grep$d==$_%9,#b;
E grep$d==int(($_%9)/3)+3*int$_/27,#b;
E$d*9..$d*9+8}
print$V?Inv:V,alid,$/
Python: 159 158
v=[0]*244
for y in range(9):
for x,c in enumerate(raw_input()):
if c>".":
<T>for k in x,y+9,x-x%3+y//3+18:v[k*9+int(c)]+=1
print["Inv","V"][max(v)<2]+"alid"
<T> is a single tab character
Common Lisp: 266 252
(princ(let((v(make-hash-table))(r "Valid"))(dotimes(y 9)(dotimes(x
10)(let((c(read-char)))(when(>(char-code c)46)(dolist(k(list x(+ 9
y)(+ 18(floor(/ y 3))(- x(mod x 3)))))(when(>(incf(gethash(+(* k
9)(char-code c)-49)v 0))1)(setf r "Invalid")))))))r))
Perl: 186
Input is from stdin, output to stdout, linebreaks in input optional.
#y=map/\S/g,<>;
sub c{(join'',map$y[$_],#$h)=~/(\d).*\1/|c(#_)if$h=pop}
print(('V','Inv')[c map{$x=$_;[$_*9..$_*9+8],[grep$_%9==$x,0..80],[map$_+3*$b[$x],#b=grep$_%9<3,0..20]}0..8],'alid')
(Linebreaks added for "clarity".)
c() is a function that checks the input in #y against a list of lists of position numbers passed as an argument. It returns 0 if all position lists are valid (contain no number more than once) and 1 otherwise, using recursion to check each list. The bottom line builds this list of lists, passes it to c() and uses the result to select the right prefix to output.
One thing that I quite like is that this solution takes advantage of "self-similarity" in the "block" position list in #b (which is redundantly rebuilt many times to avoid having #b=... in a separate statement): the top-left position of the ith block within the entire puzzle can be found by multiplying the ith element in #b by 3.
More spread out:
# Grab input into an array of individual characters, discarding whitespace
#y = map /\S/g, <>;
# Takes a list of position lists.
# Returns 0 if all position lists are valid, 1 otherwise.
sub c {
# Pop the last list into $h, extract the characters at these positions with
# map, and check the result for multiple occurences of
# any digit using a regex. Note | behaves like || here but is shorter ;)
# If the match fails, try again with the remaining list of position lists.
# Because Perl returns the last expression evaluated, if we are at the
# end of the list, the pop will return undef, and this will be passed back
# which is what we want as it evaluates to false.
(join '', map $y[$_], #$h) =~ /(\d).*\1/ | c(#_) if $h = pop
}
# Make a list of position lists with map and pass it to c().
print(('V','Inv')[c map {
$x=$_; # Save the outer "loop" variable
[$_*9..$_*9+8], # Columns
[grep$_%9==$x,0..80], # Rows
[map$_+3*$b[$x],#b=grep$_%9<3,0..20] # Blocks
} 0..8], # Generates 1 column, row and block each time
'alid')
Perl: 202
I'm reading Modern Perl and felt like coding something... (quite a cool book by the way:)
while(<>){$i++;$j=0;for$s(split//){$j++;$l{$i}{$s}++;$c{$j}{$s}++;
$q{(int(($i+2)/3)-1)*3+int(($j+2)/3)}{$s}++}}
$e=V;for$i(1..9){for(1..9){$e=Inv if$l{$i}{$_}>1or$c{$i}{$_}>1or$q{$i}{$_}>1}}
print $e.alid
Count is excluding unnecessary newlines.
This may require Perl 5.12.2.
A bit more readable:
#use feature qw(say);
#use JSON;
#$json = JSON->new->allow_nonref;
while(<>)
{
$i++;
$j=0;
for $s (split //)
{
$j++;
$l{$i}{$s}++;
$c{$j}{$s}++;
$q{(int(($i+2)/3)-1)*3+int(($j+2)/3)}{$s}++;
}
}
#say "lines: ", $json->pretty->encode( \%l );
#say "columns: ", $json->pretty->encode( \%c );
#say "squares: ", $json->pretty->encode( \%q );
$e = V;
for $i (1..9)
{
for (1..9)
{
#say "checking {$i}{$_}: " . $l{$i}{$_} . " / " . $c{$i}{$_} . " / " . $q{$i}{$_};
$e = Inv if $l{$i}{$_} > 1 or $c{$i}{$_} > 1 or $q{$i}{$_} > 1;
}
}
print $e.alid;
Ruby — 176
f=->x{x.any?{|i|(i-[?.]).uniq!}}
a=[*$<].map{|i|i.scan /./}
puts f[a]||f[a.transpose]||f[a.each_slice(3).flat_map{|b|b.transpose.each_slice(3).map &:flatten}]?'Invalid':'Valid'
Lua, 341 bytes
Although I know that Lua isn't the best golfing language, however, considering it's size, I think it's worth posting it ;).
Non-golfed, commented and error-printing version, for extra fun :)
i=io.read("*a"):gsub("\n","") -- Get input, and strip newlines
a={{},{},{}} -- checking array, 1=row, 2=columns, 3=squares
for k=1,3 do for l=1,9 do a[k][l]={0,0,0,0,0,0,0,0,0}end end -- fillup array with 0's (just to have non-nils)
for k=1,81 do -- loop over all numbers
n=tonumber(i:sub(k,k):match'%d') -- get current character, check if it's a digit, and convert to a number
if n then
r={math.floor((k-1)/9)+1,(k-1)%9+1} -- Get row and column number
r[3]=math.floor((r[1]-1)/3)+3*math.floor((r[2]-1)/3)+1 -- Get square number
for l=1,3 do v=a[l][r[l]] -- 1 = row, 2 = column, 3 = square
if v[n] then -- not yet eliminated in this row/column/square
v[n]=nil
else
print("Double "..n.." in "..({"row","column","square"}) [l].." "..r[l]) --error reporting, just for the extra credit :)
q=1 -- Flag indicating invalidity
end
end
end
end
io.write(q and"In"or"","Valid\n")
Golfed version, 341 bytes
f=math.floor p=io.write i=io.read("*a"):gsub("\n","")a={{},{},{}}for k=1,3 do for l=1,9 do a[k][l]={0,0,0,0,0,0,0,0,0}end end for k=1,81 do n=tonumber(i:sub(k,k):match'%d')if n then r={f((k-1)/9)+1,(k-1)%9+1}r[3]=f((r[1]-1)/3)+1+3*f((r[2]-1)/3)for l=1,3 do v=a[l][r[l]]if v[n]then v[n]=nil else q=1 end end end end p(q and"In"or"","Valid\n")
Python: 140
v=[(k,c) for y in range(9) for x,c in enumerate(raw_input()) for k in x,y+9,(x/3,y/3) if c>'.']
print["V","Inv"][len(v)>len(set(v))]+"alid"
ASL: 108
args1["\n"x2I3*x;{;{:=T(T'{:i~{^0}?})}}
{;{;{{,0:e}:;{0:^},u eq}}/`/=}:-C
dc C#;{:|}C&{"Valid"}{"Invalid"}?P
ASL is a Golfscript inspired scripting language I made.

Resources