Completely new to Ruby, looking through docs and can't seem to find what I'm looking for. I have an object and I'm trying to dig down into it to access something. The object tweets[0] looks like this...
--- !ruby/object:Twitter::Tweet
attrs:
  :created_at: Wed Apr 10 00:58:21 +0000 2013
  :user:
    :location: ''
    :entities:
      :description:
        :urls: []
    :protected: false
  :geo:
  :entities:
    :hashtags:
    - :text: adult
      :indices:
      - 34
      - 40
    :urls: []
    :user_mentions: []
    :media:
      :indices:
      - 41
      - 63
      :url: http:t.co/i-need-this-image
      :type: photo
      :sizes:
        :thumb:
          :w: 150
          :h: 150
          :resize: crop
        :small:
          :w: 340
          :h: 453
          :resize: fit
        :medium:
          :w: 600
          :h: 800
          :resize: fit
        :large:
          :w: 768
          :h: 1024
          :resize: fit
I've tried so many different ways, none of them seem to be working correctly. In order to dump them out I've been using
puts YAML::dump(tweets[0])
--
puts YAML::dump(tweets[0].media) # returns the media method correctly
puts YAML::dump(tweets[0]['media']) # also seems to do it
puts YAML::dump(tweets[0].media.url) # idk
puts YAML::dump(tweets[0]['media']['url']) # I feel like this should work but it doesn't
The following worked for me:
require 'yaml'
tweets = YAML.load_file('test.yml') # this file contains a copy of the YAML
p tweets["attrs"][:entities][:media][:url] # "http:t.co/i-need-this-image"
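The reason the string-key attempts return nothing is that the nested keys in the dump are symbols (:entities, :media, :url), not strings. The following is a minimal sketch of that difference, assuming a plain hash that mimics the relevant part of attrs (the attrs variable here is a hypothetical stand-in, not the real Twitter::Tweet object) and assuming Ruby 2.3 or later for Hash#dig:

```ruby
# Hypothetical stand-in for the relevant part of the tweet's attrs hash.
attrs = {
  entities: {
    hashtags: [{ text: "adult", indices: [34, 40] }],
    media: { url: "http:t.co/i-need-this-image", type: "photo" }
  }
}

# Symbol keys walk the structure; dig returns nil instead of raising
# when an intermediate key is missing.
url = attrs.dig(:entities, :media, :url)
missing = attrs.dig(:entities, :video, :url)

# String keys do not match symbol keys, so this lookup finds nothing.
by_string = attrs[:entities]["media"]

p url
p missing
p by_string
```

Using dig rather than chained [] calls avoids a NoMethodError on nil when a tweet has no media at all.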
I want to edit a tree in Nexus format that I got from BEAST2 TreeAnnotator.
Usually I use the Phylo module from Biopython for this kind of work, but Phylo.read(r"filename.tree", "nexus") raised the following exception:
---------------------------------------------------------------------------
NexusError Traceback (most recent call last)
Input In [29], in <cell line: 1>()
----> 1 Phylo.read(r"filename.tree", "nexus")
File ~\miniconda3\lib\site-packages\Bio\Phylo\_io.py:60, in read(file, format, **kwargs)
58 try:
59 tree_gen = parse(file, format, **kwargs)
---> 60 tree = next(tree_gen)
61 except StopIteration:
62 raise ValueError("There are no trees in this file.") from None
File ~\miniconda3\lib\site-packages\Bio\Phylo\_io.py:49, in parse(file, format, **kwargs)
34 """Parse a file iteratively, and yield each of the trees it contains.
35
36 If a file only contains one tree, this still returns an iterable object that
(...)
46
47 """
48 with File.as_handle(file) as fp:
---> 49 yield from getattr(supported_formats[format], "parse")(fp, **kwargs)
File ~\miniconda3\lib\site-packages\Bio\Phylo\NexusIO.py:40, in parse(handle)
32 def parse(handle):
33 """Parse the trees in a Nexus file.
34
35 Uses the old Nexus.Trees parser to extract the trees, converts them back to
(...)
38 eventually change Nexus to use the new NewickIO parser directly.)
39 """
---> 40 nex = Nexus.Nexus(handle)
42 # NB: Once Nexus.Trees is modified to use Tree.Newick objects, do this:
43 # return iter(nex.trees)
44 # Until then, convert the Nexus.Trees.Tree object hierarchy:
45 def node2clade(nxtree, node):
File ~\miniconda3\lib\site-packages\Bio\Nexus\Nexus.py:668, in Nexus.__init__(self, input)
665 self.options["gapmode"] = "missing"
667 if input:
--> 668 self.read(input)
669 else:
670 self.read(DEFAULTNEXUS)
File ~\miniconda3\lib\site-packages\Bio\Nexus\Nexus.py:718, in Nexus.read(self, input)
716 break
717 if title in KNOWN_NEXUS_BLOCKS:
--> 718 self._parse_nexus_block(title, contents)
719 else:
720 self._unknown_nexus_block(title, contents)
File ~\miniconda3\lib\site-packages\Bio\Nexus\Nexus.py:759, in Nexus._parse_nexus_block(self, title, contents)
757 for line in block.commandlines:
758 try:
--> 759 getattr(self, "_" + line.command)(line.options)
760 except AttributeError:
761 raise NexusError("Unknown command: %s " % line.command) from None
File ~\miniconda3\lib\site-packages\Bio\Nexus\Nexus.py:1144, in Nexus._translate(self, options)
1142 break
1143 elif c != ",":
-> 1144 raise NexusError("Missing ',' in line %s." % options)
1145 except NexusError:
1146 raise
NexusError: Missing ',' in line 1 AB298157.1_2015_-7.9133750332192605_114.8086828279248, 2 AB298158.1_2007_-8.41698974207…
Using Nexus.read(Nexus(), input=r"filename.tree") gave the same result. Could anyone please help with this? I cannot understand the reason for this error, because the Nexus file looks correct.
The reason is that Biopython cannot read Nexus trees that are split into a translation table plus a Newick tree that refers to its entries. So you first need to convert the file to a form with the full names written directly in the tree, as below.
Begin trees;
  tree TREE1 = (((your,tree),(in,(the, newick))),format);
End;
P.S. The Newick format allows labels to be surrounded with quotes, and some programs or scripts add them to names that contain ambiguous characters. This can lead to exceptions in later phylogenetic analysis, for instance in BEAST, so be careful with it.
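The conversion described above (writing full names into the tree instead of translation numbers) can be sketched in a few lines. This is only an illustration under stated assumptions: the translate hash and the newick string are made-up examples, the translation table is assumed to have been read into a hash already, and labels are assumed to be plain integers that appear directly after "(" or ",".

```ruby
# Hypothetical translation table (number => full taxon name),
# standing in for the entries of a Translate block.
translate = { "1" => "taxon_A", "2" => "taxon_B", "3" => "taxon_C" }

# Hypothetical Newick tree that refers to taxa by number.
newick = "((1:0.1,2:0.2):0.05,3:0.3);"

# Replace a number only when it directly follows "(" or "," and is followed
# by ":", ",", ")" or "[", so branch lengths (which follow ":") stay untouched.
expanded = newick.gsub(/(?<=[(,])\d+(?=[:,)\[])/) { |id| translate.fetch(id, id) }

puts expanded
```

Numbers without an entry in the table are left as they are, which makes it easy to spot an incomplete table in the output.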
I'm importing content from an outside database that is infected with a variety of odd characters, e.g.
> str
=> "Nature’s Variety, Best Friends Animal Society team up"
From context it seems that ’ represents a right single-quote. In cp1252 encoding:
> str.encode('cp1252')
=> "Nature\xE2\x80\x99s Variety, Best Friends Animal Society team up"
So how do I convert it to the correct UTF-8 character? Here's what I've tried:
> str.encode('UTF-8')
=> "Nature’s Variety, Best Friends Animal Society team up"
> str.encode('cp1252').encode('UTF-8')
=> "Nature’s Variety, Best Friends Animal Society team up"
> str.encode('UTF-8', invalid: :replace, replace: '?', undef: :replace)
=> "Nature’s Variety, Best Friends Animal Society team up"
> str.encode('cp1252').encode('UTF-8', invalid: :replace, replace: '?', undef: :replace)
=> "Nature’s Variety, Best Friends Animal Society team up"
I'd rather find a generic way to re-encode the string so that it handles all such mis-encoded characters, but if I have to I'll do individual search-and-replace. I'm not able to make that work either:
> str.encode('cp1252').gsub('\xE2/x80/x99', "'")
=> "Nature\xE2\x80\x99s Variety, Best Friends Animal Society team up"
> str.encode('cp1252').gsub(%r{\xE2\x80\x99}, "'")
SyntaxError: unexpected tIDENTIFIER, expecting $end
> str.encode('cp1252').gsub(Regexp.escape('\xE2\x80\x99'), "'")
=> "Nature\xE2\x80\x99s Variety, Best Friends Animal Society team up"
I'd like to do this, but I can't even paste these characters into my REPL:
> str.gsub('’', "'")
When I try I get:
> str.gsub('C"b,b,b
* "', ",")
=> "Nature’s Variety, Best Friends Animal Society team up"
Frustrating. Any suggestions on how to encode this properly into UTF-8?
Edit: At the request for the actual bytes in the string:
> str.bytes.to_a.join(' ')
=> "78 97 116 117 114 101 195 162 226 130 172 226 132 162 115 32 86 97 114 105 101 116 121 44 32 66 101 115 116 32 70 114 105 101 110 100 115 32 65 110 105 109 97 108 32 83 111 99 105 101 116 121 32 116 101 97 109 32 117 112"
I had this problem too (see "Fixing Incorrect String Encoding From MySQL"). You need to encode the string into the encoding that was wrongly assumed, and then force its encoding back to UTF-8.
fallback = {
"\u0081" => "\x81".force_encoding("CP1252"),
"\u008D" => "\x8D".force_encoding("CP1252"),
"\u008F" => "\x8F".force_encoding("CP1252"),
"\u0090" => "\x90".force_encoding("CP1252"),
"\u009D" => "\x9D".force_encoding("CP1252")
}
str.encode('CP1252', fallback: fallback).force_encoding('UTF-8')
The fallback may not be necessary depending on your data, but it ensures that it won't raise an error by handling the five bytes which are undefined in CP1252.
Once Ruby has got the encoding wrong, the characters will stay incorrect, according to the original mistake. Conversions simply convert the now wrong characters into the new encoding.
To correct Ruby's mistake on input, you need to use the force_encoding method, which does not do a conversion, it just corrects Ruby's note of what encoding a String has.
In your case the fault occurred before you read the values from the DB. If you pick out the problem bytes with bytes = %w(195 162 226 130 172 226 132 162).map(&:to_i), they look like valid UTF-8 that was already double-encoded when it was stored in the database. You can probably assume a problem with whatever wrote these values into the DB. (If that is a live process, it is a bug that needs fixing; otherwise you will keep getting these bad values.)
What has happened is your DB (or code that writes to it) received some UTF-8 bytes representing the correct character, but assumed they were CP1252 to be converted to UTF-8. It made that conversion and wrote valid UTF-8 (but wrong characters) into the DB.
If I do the following in Ruby console using UTF-8 encoding in my terminal and as the default Ruby encoding, I can replicate your problem:
str = "Nature’s Variety, Best Friends Animal Society team up"
=> "Nature’s Variety, Best Friends Animal Society team up"
str = str.force_encoding('CP1252').encode('UTF-8')
=> "Nature’s Variety, Best Friends Animal Society team up"
The fault is reversible, as shown here:
str = str.encode('CP1252').force_encoding('UTF-8')
=> "Nature’s Variety, Best Friends Animal Society team up"
The encode('CP1252') undoes the original mistaken conversion.
The force_encoding('UTF-8') sets the encoding back to what the system most likely received in the first place.
You will want to find where in your system an assumption of CP1252 input is being made, and instead assume UTF-8 (it may get more complicated than that if you have multiple sources in different encodings).
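The round trip described above can be checked in isolation. The sketch below assumes a UTF-8 source file and uses a shortened sample string; the variable names are illustrative only. It first reproduces the damage (UTF-8 bytes misread as CP1252 and converted to UTF-8), then reverses it.

```ruby
# The correct text, with U+2019 (right single quotation mark).
original = "Nature\u2019s Variety"

# Reproduce the fault: pretend the UTF-8 bytes are CP1252, then convert to UTF-8.
garbled = original.dup.force_encoding("CP1252").encode("UTF-8")

# Reverse it: convert back to CP1252 (undoing the mistaken conversion),
# then relabel the resulting bytes as the UTF-8 they always were.
repaired = garbled.encode("CP1252").force_encoding("UTF-8")

# The eight bytes after "Nature" in the garbled string.
p garbled.bytes[6, 8]
p repaired == original
```

The eight bytes printed for the garbled string are the same problem bytes that appear in the byte dump from the question, which supports the double-encoding diagnosis.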
I have a few MP3 files as binary strings, all with the same number of channels and the same sample rate. I need to concatenate them in memory without using command-line tools.
Currently I just do string concatenation, like this:
out = ''
mp3s.each { |mp3| out << mp3 }
Audio players can play the result, but with some warnings, because, as far as I understand, the MP3 headers were not handled correctly.
Is there a more correct way to do the concatenation?
After reading this article about MP3 (in Russian) I came up with a solution.
You should be able to get the complete ID3 specification at http://id3.org/, but it seems to be down at the moment.
An MP3 file usually has the following layout:
[ID3 head(10 bytes) | ID3 tags | MP3 frames ]
ID3 is not part of the MP3 format; it is a kind of container used to store information such as artists, albums, etc.
The audio data itself is stored in MP3 frames. Every frame starts with a 4-byte header that provides meta information (codec, bitrate, etc.).
Every frame holds a fixed number of samples, so if there are not enough samples to fill the last frame, the encoder adds silence to pad it to the necessary size. I also found chunks like LAME3.97 (the name and version of the encoder) in there.
So all we need to do is get rid of the ID3 container. The following solution works perfectly for me: no more warnings, and the output file became smaller:
# Length of the header that describes the ID3 container
ID3_HEADER_SIZE = 10

# Get the size of the ID3 container.
# The length is stored in 4 bytes, and the most significant bit (bit 7)
# of every byte is ignored.
#
# Example:
#   Hex:      00 00 07 76
#   Bin:      00000000 00000000 00000111 01110110
#   Real bin: 111 1110110
#   Real dec: 1014
#
def get_id3_size(header)
  result = 0
  str = header[6..9]
  # Read the 4 size bytes from left to right, applying a bit mask
  # to exclude bit 7 of every byte.
  4.times do |i|
    result += (str[i].ord & 0x7F) * (2 ** (7 * (3 - i)))
  end
  result
end

def strip_mp3!(raw_mp3)
  # Nothing to strip if the data does not start with an ID3 container.
  return raw_mp3 unless raw_mp3.start_with?('ID3')
  # 10 bytes that describe the ID3 container.
  id3_header = raw_mp3[0...ID3_HEADER_SIZE]
  id3_size = get_id3_size(id3_header)
  # Offset from which the MP3 frames start
  offset = id3_size + ID3_HEADER_SIZE
  # Get rid of the ID3 container
  raw_mp3.slice!(0...offset)
  raw_mp3
end

# Read raw MP3s
hi = File.binread('hi.mp3')
bye = File.binread('bye.mp3')
# Get rid of ID3 tags
strip_mp3!(hi)
strip_mp3!(bye)
# Concatenate MP3 frames
hi << bye
# Save result to disk
File.binwrite('out.mp3', hi)
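The size arithmetic from the comment above (four bytes, seven useful bits each) can also be written with shifts. This is an alternative sketch, not the answer's own code: syncsafe_to_int is a hypothetical helper name, and the input is assumed to be a 4-byte binary string.

```ruby
# Decode a 4-byte size field in which only the low 7 bits of each byte count.
# Hypothetical helper, equivalent in intent to the size calculation above.
def syncsafe_to_int(bytes)
  bytes.unpack("C4").reduce(0) { |acc, byte| (acc << 7) | (byte & 0x7F) }
end

# The worked example from the comment: 00 00 07 76
example = syncsafe_to_int("\x00\x00\x07\x76".b)
# The largest value four such bytes can hold: 7F 7F 7F 7F
largest = syncsafe_to_int("\x7F\x7F\x7F\x7F".b)

puts example
puts largest
```

Shifting by 7 per byte makes it visible that the field carries 28 bits in total, which is why the second value is 2**28 - 1.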
I have written the following function to find whether a pixel belongs to an image in MATLAB.
To begin with, I wanted to test it on a simpler case, checking whether a number belongs to a vector, like this:
function traverse_pixels(img)
for i:1:length(img)
c(i) = img(i)
end
But, when I run the following commands for example, I get the error shown at the end:
>> A = [ 34 565 456 535 34 54 5 5 4532 434 2345 234 32332434];
>> traverse_pixels(A);
??? Error: File: traverse_pixels.m Line: 2 Column: 6
Unexpected MATLAB operator.
Why is that? How can I fix the problem?
Thanks.
There is a syntax error in the head of your for loop. It is supposed to be:
for i = 1:length(img)
Also, to check whether an array contains a specific value you could use:
A = [1 2 3];
if sum(A==2)>0
    disp('there is at least one 2 in A')
end
This should be faster since no for loop is involved.
for i = 1:length(img)
A small syntax error: the loop variable must be followed by =, not :.
I created a big array a, whose memory grew to ~500 MB:
a = []
t = Thread.new do
  loop do
    sleep 1
    print "#{a.size} "
  end
end
5_000_000.times do
  a << [rand(36**10).to_s(36)]
end
puts "\n size is #{a.size}"
a = []
t.join
After that, I "cleared" a, but the allocated memory didn't change until I killed the process. Is there something special I need to do to remove all these data which were assigned to a from the memory?
If I use the Ruby Garbage Collection Profiler on a lightly modified version of your code:
GC::Profiler.enable
GC::Profiler.clear
a = []
5_000_000.times do
  a << [rand(36**10).to_s(36)]
end
puts "\n size is #{a.size}"
a = []
GC::Profiler.report
I get the following output on Ruby 1.9.3 (some columns and rows removed):
GC 60 invokes.
Index Invoke Time(sec) Use Size(byte) Total Size(byte) ...
1 0.109 131136 409200 ...
2 0.125 192528 409200 ...
...
58 33.484 199150344 260938656 ...
59 36.000 211394640 260955024 ...
The profile starts with 131 136 bytes used and ends with 211 394 640 bytes used. Since the used size never decreases anywhere in the run, we can assume that no garbage was actually collected.
If I then add a line of code which adds a single element to the array a, placed after a has grown to 5 million elements, and then has an empty array assigned to it:
GC::Profiler.enable
GC::Profiler.clear
a = []
5_000_000.times do
  a << [rand(36**10).to_s(36)]
end
puts "\n size is #{a.size}"
a = []
# the only change is to add one element to the (now) empty array a
a << [rand(36**10).to_s(36)]
GC::Profiler.report
This changes the profiler output to (some columns and rows removed):
GC 62 invokes.
Index Invoke Time(sec) Use Size(byte) Total Size(byte) ...
1 0.156 131376 409200 ...
2 0.172 192792 409200 ...
...
59 35.375 211187736 260955024 ...
60 36.625 211395000 469679760 ...
61 41.891 2280168 307832976 ...
This profiler run starts with 131 376 bytes used, which is similar to the previous run, and grows, but it ends with 2 280 168 bytes used, significantly lower than the previous run, which ended with 211 394 640 bytes used. We can assume that garbage collection took place during this run, probably triggered by our new line of code that adds an element to a.
The short answer is no, you don't need to do anything special to remove the data that was assigned to a, but hopefully this gives you the tools to prove it.
You can call GC.start, but you might not want to. See, for example, the question "Ruby garbage collect" here on Stack Overflow for a discussion. Basically, I'd let the garbage collector decide for itself when to run unless you have a compelling reason to force it.
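If you do want to see the effect of an explicit collection, a small experiment along these lines may help. It is a sketch only: it uses a much smaller array than the question so it runs quickly, and it assumes Ruby 2.2 or later, where GC.stat accepts the :heap_live_slots key. The exact numbers printed will differ between Ruby versions and machines.

```ruby
# Build a moderately large array of small objects.
a = []
200_000.times { a << [rand(36**10).to_s(36)] }

live_before = GC.stat(:heap_live_slots)
runs_before = GC.count

a = []     # drop the only reference to the big array
GC.start   # explicitly request a collection

live_after = GC.stat(:heap_live_slots)
runs_after = GC.count

puts "GC runs:    #{runs_before} -> #{runs_after}"
puts "live slots: #{live_before} -> #{live_after}"
```

Note that a drop in live slots does not necessarily mean the process returns memory to the operating system, which is consistent with what the question observed from outside the process.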