Ruby broken core methods?

Here we go:
images = {:default=>["http://original-img", "http://original-img2"]}
img_src = ["http://localhost/image987.jpeg", "http://localhost/image988.jpeg"]
img_ids = [2046, 2047]
_images_src = images.clone
_images_src.each_value{|v| v.map!{img_src.shift}}
p _images_src # {:default=>["http://localhost/image987.jpeg", "http://localhost/image988.jpeg"]}
images.each_value{|v| v.map!{img_ids.shift}}
p images # {:default=>[2046, 2047]}
p _images_src # {:default=>[2046, 2047]}
How does calling each_value on images change the _images_src hash? They refer to different objects: _images_src is a clone of images, and yet it still changes.

You've done a "shallow clone" but need a "deep clone." Search around for how to make that happen and what the tradeoffs are. You can see this by running the below. Note the object ids are the same.
[8] pry(main)> images.values.first.object_id
=> 70308363136840
[9] pry(main)> _images_src.values.first.object_id
=> 70308363136840
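For a hash whose values are flat arrays, one possible fix is to copy each array as well as the hash. The following is a minimal sketch under that assumption; it uses Hash#transform_values, which needs Ruby 2.4 or newer, and the variable names shallow and deep are only for illustration.

```ruby
# Sketch: copy the hash *and* each array it holds, so map! on one copy
# cannot touch the other.
images  = { default: ["http://original-img", "http://original-img2"] }
img_src = ["http://localhost/image987.jpeg", "http://localhost/image988.jpeg"]

shallow = images.clone                    # new hash, same array objects
deep    = images.transform_values(&:dup)  # new hash, new arrays

same_array_shallow = shallow[:default].equal?(images[:default])
same_array_deep    = deep[:default].equal?(images[:default])

# Mutate the arrays of the deep copy in place, as in the question.
deep.each_value { |v| v.map! { img_src.shift } }

p same_array_shallow  # true
p same_array_deep     # false
p images[:default]    # still the original URLs
p deep[:default]      # the localhost URLs
```

Note that this copies only one level: the strings inside the arrays are still shared, which is harmless here because map! replaces elements rather than mutating them.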

Related

Ruby Twitter, Retrieving Full Tweet Text

I'm using the Ruby Twitter gem to retrieve the full text of a tweet.
I first tried this, and as you can see the text was truncated.
[5] pry(main)> t = client.status(782845350918971393)
=> #<Twitter::Tweet id=782845350918971393>
[6] pry(main)> t.text
=> "A #Gameofthrones fan? Our #earlybird Dublin starter will get you
touring the GOT location in 2017
#traveldealls… (SHORTENED URL WAS HERE)"
Then I tried this:
[2] pry(main)> t = client.status(782845350918971393, tweet_mode: 'extended')
=> #<Twitter::Tweet id=782845350918971393>
[3] pry(main)> t.full_text
=>
[4] pry(main)> t.text
=>
Both the text and full text are empty when I use the tweet_mode: 'extended' option.
I also tried editing the bit of the gem that makes the request, the response was the same.
perform_get_with_object("/1.1/statuses/show/#{extract_id(tweet)}.json?tweet_mode=extended", options, Twitter::Tweet)
Any help would be greatly appreciated.
Here's a workaround I found helpful:
Below is the way I am handling this issue ATM. Seems to be working. I am using both Streaming (with Tweetstream) and REST APIs.
status = @client.status(1234567890, tweet_mode: "extended")
if status.truncated? && status.attrs[:extended_tweet]
# Streaming API, and REST API default
t = status.attrs[:extended_tweet][:full_text]
else
# REST API with extended mode, or untruncated text in Streaming API
t = status.attrs[:text] || status.attrs[:full_text]
end
From https://github.com/sferik/twitter/pull/848#issuecomment-329425006
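The same selection logic can be tried without the gem or network access. This is only a sketch: full_tweet_text is a hypothetical helper, the sample hashes are invented stand-ins for status.attrs, and it assumes that truncated? reflects the :truncated key of the payload.

```ruby
# Hypothetical helper: picks the longest available text from a hash
# shaped like status.attrs. Not part of the twitter gem.
def full_tweet_text(attrs)
  extended = attrs[:extended_tweet]
  if attrs[:truncated] && extended
    # Streaming API, and REST API default
    extended[:full_text]
  else
    # REST API with extended mode, or untruncated text in Streaming API
    attrs[:text] || attrs[:full_text]
  end
end

# Invented sample payloads, for illustration only.
streaming     = { truncated: true, text: "cut off",
                  extended_tweet: { full_text: "the complete streaming text" } }
rest_extended = { truncated: false, full_text: "the complete REST text" }
short_tweet   = { truncated: false, text: "a short tweet" }

puts full_tweet_text(streaming)
puts full_tweet_text(rest_extended)
puts full_tweet_text(short_tweet)
```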

Ruby - Matching Twitter URL from any html page using Regex

I am trying to fetch the Twitter URL from this page, for instance; however, my result is nil. I am pretty sure my regex is not too bad, but my code fails. Here it is:
doc = `(curl --url "http://www.rabbitreel.com/")`
twitter_url = ("/^(?i)[http|https]+:\/\/(?i)[twitter]+\.(?i)(com)\/?\S+").match(doc)
puts twitter_url
# => nil
Maybe, I misused regex syntax. My initial idea was simple: I wanted to match a regular Twitter url structure. I even tried http://rubular.com to test my regex, and it seemed to be fine when I entered a Twitter url.
http://ruby-doc.org/core-2.2.0/String.html#method-i-match
tells you that the object you call match on should be the string you're parsing, and the parameter should be the regex pattern. So, if anything, you should call:
doc.match("/^(?i)[http|https]+:\/\/(?i)[twitter]+\.(?i)(com)\/?\S+")
I prefer
doc[/your_regex/]
syntax, because it returns a String directly rather than a MatchData, which needs another step to get the information out.
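A small sketch of the difference, using a made-up one-line document instead of the downloaded page (the markup and the variable names are assumptions for illustration):

```ruby
# Made-up sample markup standing in for the downloaded page.
doc     = '<a href="https://twitter.com/rabbitreel">Follow us</a>'
pattern = %r{https?://twitter\.com/\w+}

via_index = doc[pattern]        # String, or nil when nothing matches
via_match = doc.match(pattern)  # MatchData, or nil when nothing matches

p via_index             # "https://twitter.com/rabbitreel"
p via_match[0]          # same text, but one extra step
p doc[/no-such-text/]   # nil
```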
For Regexen, I always try to begin as simply as possible:
[3] pry(main)> doc[/twitter/]
=> "twitter"
[4] pry(main)> doc[/twitter\.com/]
=> "twitter.com"
[5] pry(main)> doc[/twitter\.com\//]
=> "twitter.com/"
[6] pry(main)> doc[/twitter\.com\/\//] #OOPS. One \/ too many
=> nil
[7] pry(main)> doc[/twitter\.com\//]
=> "twitter.com/"
[8] pry(main)> doc[/twitter\.com\/\S+/]
=> "twitter.com/rabbitreel\""
[9] pry(main)> doc[/twitter\.com\/[^"]+/]
=> "twitter.com/rabbitreel"
[10] pry(main)> doc[/http:\/\/twitter\.com\/[^"]+/]
=> nil
[11] pry(main)> doc[/https?:\/\/twitter\.com\/[^"]+/]
=> "https://twitter.com/rabbitreel"
[12] pry(main)> doc[/https?:\/\/twitter\.com\/[^" ]+/]
=> "https://twitter.com/rabbitreel"
[13] pry(main)> doc[/https?:\/\/twitter\.com\/\w+/] #DONE
=> "https://twitter.com/rabbitreel"
EDIT:
Sure, Regexen cannot parse an entire HTML document.
Here, we only want to find the first occurrence of a Twitter URL. So, depending on the requirements, the possible input and the chosen platform, it could make sense to use a Regexp.
Nokogiri is a huge gem, and it might not be possible to install it.
Independently from this fact, it would be a very good idea to check that the returned String really is a correct Twitter URL.
I think this Regexp:
/https?:\/\/twitter\.com\/\w+/
is safe.
[31] pry(main)> malicious_doc = "https://twitter.com/userid#maliciouswebsite.com"
=> "https://twitter.com/userid#maliciouswebsite.com"
[32] pry(main)> malicious_doc[/https?:\/\/twitter\.com\/\w+/]
=> "https://twitter.com/userid"
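One way to double-check an extracted string is to parse it with the standard library's URI and compare the host. This is a sketch, not a complete validator: twitter_url? is a hypothetical helper, and the list of accepted hosts is an assumption.

```ruby
require 'uri'

# Hypothetical helper: true only when the candidate parses as an http(s)
# URL whose host is exactly twitter.com or www.twitter.com.
def twitter_url?(candidate)
  uri = URI.parse(candidate)
  %w[http https].include?(uri.scheme) &&
    %w[twitter.com www.twitter.com].include?(uri.host)
rescue URI::InvalidURIError
  false
end

p twitter_url?("https://twitter.com/userid")
p twitter_url?("http://maliciouswebsitethatisnottwitter.com/")
p twitter_url?("not a url")
```

Comparing the parsed host avoids the trap of an unanchored /twitter.com/ test, which also matches hosts that merely end in that text.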
Using Nokogiri doesn't prevent you from checking for malicious input.
The proposed solution from @mudasobwa is interesting, but isn't safe yet:
[33] pry(main)> Nokogiri::HTML('<html><body><a href="http://maliciouswebsitethatisnottwitter.com/">Link</a></body></html>').css('a').map { |e| e.attributes.values.first.value }.select {|e| e =~ /twitter.com/ }
=> ["http://maliciouswebsitethatisnottwitter.com/"]
NB: as of Nov 2021, the rabbitreel.com domain is for sale, so please read the comments about the possibility of its serving malicious content.
One should never use regexps to parse HTML and here is why.
Below is a robust solution using Nokogiri HTML parsing library:
require 'nokogiri'
doc = Nokogiri::HTML(`(curl --url "http://www.rabbitreel.com/")`)
doc.css('a').map { |e| e.attributes.values.first.value }
.select {|e| e =~ /twitter.com/ }
#⇒ [
# [0] "https://twitter.com/rabbitreel",
# [1] "https://twitter.com/rabbitreel"
# ]
Or, alternatively, with xpath:
require 'nokogiri'
doc = Nokogiri::HTML(`(curl --url "http://www.rabbitreel.com/")`)
doc.xpath('//a[contains(@href, "twitter.com")]')
.map { |e| e.attributes['href'].value }

Ruby leaked objects are referenced by RubyVM::Env

I am tracing a memory leak in our application (Ruby 2.1) using two techniques. The first is ObjectSpace.dump_all, which dumps all objects to a JSON stream for offline analysis. The second is live analysis with ObjectSpace.reachable_objects_from. Both show that my leaked objects are referenced by a RubyVM::Env object. Can anyone explain what RubyVM::Env is, and how to remove those references?
RubyVM::Env is an internal ruby class that holds variable references. Here is my test:
require 'objspace'
a = Object.new
a_id = a.object_id # we use #object_id to avoid creating more reference to `a`
ObjectSpace.each_object.select{ |o| ObjectSpace.reachable_objects_from(o).map(&:object_id).include?(a_id) }.count
# => 1
env = ObjectSpace.each_object.select{ |o| ObjectSpace.reachable_objects_from(o).map(&:object_id).include?(a_id) }.first
# => #<RubyVM::Env:0x007ff39ac09a78>
ObjectSpace.reachable_objects_from(env).count
# => 5
a = nil # remove reference
ObjectSpace.reachable_objects_from(env).count
# => 4

Weird behavior of #upcase! in Ruby

Consider the following code:
@person = { :email => 'hello@example.com' }
temp = @person.clone
temp[:email].upcase!
p temp[:email] # => HELLO@EXAMPLE.COM
p @person[:email] # => HELLO@EXAMPLE.COM, why?!
# But
temp[:email] = 'blah@example.com'
p @person[:email] # => HELLO@EXAMPLE.COM
Ruby version is: "ruby 2.1.0p0 (2013-12-25 revision 44422) [i686-linux]".
I have no idea why this is happening. Can anyone help, please?
In the clone documentation you can read:
Produces a shallow copy of obj—the instance variables of obj are
copied, but not the objects they reference. clone copies the frozen
and tainted state of obj.
Also pay attention to this:
This method may have class-specific behavior. If so, that behavior
will be documented under the #initialize_copy method of the class.
This means that some classes can override this behaviour.
So any object references are kept instead of new objects being created. What you want is a deep copy, and for that you can use Marshal:
temp = Marshal.load(Marshal.dump(@person))
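To see the difference side by side, here is a small sketch that uses a local variable person in place of the instance variable from the question (the names shallow, deep and the two snapshot variables are only for illustration):

```ruby
person = { :email => 'hello@example.com' }

shallow = person.clone                        # new hash, same String object
deep    = Marshal.load(Marshal.dump(person))  # rebuilds every object

# Mutating the deep copy's string leaves the original alone.
deep[:email].upcase!
after_deep = person[:email].dup

# Mutating the shallow copy's string changes the original too.
shallow[:email].upcase!
after_shallow = person[:email].dup

p after_deep     # "hello@example.com"
p after_shallow  # "HELLO@EXAMPLE.COM"
```

Keep in mind that Marshal cannot dump every kind of object (procs and IO objects, for example), so this round trip only works for plain data.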

chef 11: any way to turn attributes into a ruby hash?

I'm generating a config for my service in chef attributes. However, at some point, I need to turn the attribute mash into a simple ruby hash. This used to work fine in Chef 10:
node.myapp.config.to_hash
However, starting with Chef 11, this does not work. Only the top level of the attribute is converted to a hash, with the nested values remaining immutable mash objects. Modifying them leads to errors like this:
Chef::Exceptions::ImmutableAttributeModification
------------------------------------------------
Node attributes are read-only when you do not specify which precedence level to set. To
set an attribute use code like `node.default["key"] = "value"'
I've tried a bunch of ways to get around this issue which do not work:
node.myapp.config.dup.to_hash
JSON.parse(node.myapp.config.to_json)
The json parsing hack, which seems like it should work great, results in:
JSON::ParserError
unexpected token at '"#<Chef::Node::Attribute:0x000000020eee88>"'
Is there any actual reliable way, short of including a nested parsing function in each cookbook, to convert attributes to a simple, ordinary, good old ruby hash?
After a resounding lack of answers both here and on the Opscode Chef mailing list, I ended up using the following hack:
class Chef
  class Node
    class ImmutableMash
      # Recursively convert nested values that also respond to to_hash.
      def to_hash
        h = {}
        self.each do |k,v|
          if v.respond_to?('to_hash')
            h[k] = v.to_hash
          else
            h[k] = v
          end
        end
        return h
      end
    end
  end
end
I put this into the libraries dir in my cookbook; now I can use attribute.to_hash in both Chef 10 (which already worked properly and is unaffected by this monkey-patch) and Chef 11. I've also reported this as a bug to Opscode.
If you don't want to have to monkey-patch your Chef, speak up on this issue:
http://tickets.opscode.com/browse/CHEF-3857
Update: monkey-patch ticket was marked closed by these PRs
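For anyone who would rather not reopen Chef's classes, the same idea can be written as a standalone helper. This is only a sketch: deep_to_hash is a hypothetical name, not part of Chef, and the frozen nested Hash below merely stands in for immutable node attributes, on the assumption that they are enumerable in the same way. Unlike the patch above, it also descends into arrays.

```ruby
# Hypothetical helper, not part of Chef: recursively converts anything
# hash-like (responds to each_pair) or any Array into plain Hash and Array.
def deep_to_hash(value)
  if value.respond_to?(:each_pair)
    result = {}
    value.each_pair { |k, v| result[k] = deep_to_hash(v) }
    result
  elsif value.is_a?(Array)
    value.map { |v| deep_to_hash(v) }
  else
    value
  end
end

# A frozen nested structure stands in for immutable attributes here.
config = { "fizz" => { "buzz" => [{ "bar" => "baz" }.freeze].freeze }.freeze }.freeze

copy = deep_to_hash(config)
copy["fizz"]["buzz"][0]["bar"] = "changed"  # allowed: the copy is not frozen

p copy
p config
```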
I hope I am not too late to the party but merging the node object with an empty hash did it for me:
chef (12.6.0)> {}.merge(node).class
=> Hash
I had the same problem and after much hacking around came up with this:
json_string = node[:attr_tree].inspect.gsub(/\=\>/,':')
my_hash = JSON.parse(json_string, {:symbolize_names => true})
inspect does the deep parsing that is missing from the other methods proposed and I end up with a hash that I can modify and pass around as needed.
This has been fixed for a long time now:
[1] pry(main)> require 'chef/node'
=> true
[2] pry(main)> node = Chef::Node.new
[....]
[3] pry(main)> node.default["fizz"]["buzz"] = { "foo" => [ { "bar" => "baz" } ] }
=> {"foo"=>[{"bar"=>"baz"}]}
[4] pry(main)> buzz = node["fizz"]["buzz"].to_hash
=> {"foo"=>[{"bar"=>"baz"}]}
[5] pry(main)> buzz.class
=> Hash
[6] pry(main)> buzz["foo"].class
=> Array
[7] pry(main)> buzz["foo"][0].class
=> Hash
[8] pry(main)>
It was probably fixed sometime in or around Chef 12.x or Chef 13.x; it is certainly no longer an issue in Chef 15.x/16.x/17.x.
The above answer is a little unnecessary. You can just do this:
json = node[:whatever][:whatever].to_hash.to_json
JSON.parse(json)
