Grab XML XPath by attribute - Ruby

http://www.mdr.de/export/sandmann/folgen/sandmann612-mediaRss_doca-1_zc-1a3071ad.xml returns, among other things, these lines:
(...)
<media:content url="http://x4100mp4dynonlc22033.f.o.l.lb.core-cdn.net/22033mdr/ondemand/4100mp4dynonl/FCMS-066eb3e7-81b2-4dae-898d-4963137eb4b6-e9ebd6e42ce1.mp4" type="video/mpeg" expression="full" width="512" height="288" bitrate="512" duration="398" />
<media:content url="http://x4100mp4dynonlc22033.f.o.l.lb.core-cdn.net/22033mdr/ondemand/4100mp4dynonl/FCMS-066eb3e7-81b2-4dae-898d-4963137eb4b6-c7cca1d51b4b.mp4" type="video/mpeg" expression="full" width="960" height="544" bitrate="1536" duration="398" />
(...)
How would I tell Nokogiri to extract only the element where bitrate="1536"?
I actually just need the URL from that element, so I expect (I find it rather rude to write "expect" here, but I was told to do so ;)) the following string to be returned:
http://x4100mp4dynonlc22033.f.o.l.lb.core-cdn.net/22033mdr/ondemand/4100mp4dynonl/FCMS-066eb3e7-81b2-4dae-898d-4963137eb4b6-c7cca1d51b4b.mp4
If someone is interested, this will allow me to download the daily episode of the Sandmännchen, a German TV miniseries for little kids. :)
So far I have tried using SimpleRSS with this:
(...)
rss.entries.each do |entry|
  pp entry
end
But that only returns the first item of the media:group "set" of links:
{:title=>"Sandmann vom 14. Oktober 2012",
:link=>"http://www.mdr.de/export/sandmann/folgen/video78338.html",
:description=>
"Die j\xC3\xBCngste Geschichte vom Sandmann gibt es f\xC3\xBCr 24 Stunden hier auf Abruf. Heute: Molly mag keine Schuhe. Das finden die anderen Monster merkw\xC3\xBCrdig, weil Monster Schuhe lieben.",
:pubDate=>2012-09-19 14:54:43 +0200,
:guid=>
"mp4:4100mp4dynonl/FCMS-066eb3e7-81b2-4dae-898d-4963137eb4b6-8442e17c3177",
:media_content_url=>
"rtmp://x4100mp4dynonlc22033.f.o.f.lb.core-cdn.net/22033mdr/ondemand",
:media_content_type=>"fms/h264",
:media_content_height=>"272",
:media_content_width=>"480",
:media_title=>"Sandmann vom 14. Oktober 2012",
:media_thumbnail_url=>
"http://www.mdr.de/export/sandmann/folgen/sandmann864_v-standard43_zc-698fff06.jpg",
:media_thumbnail_height=>"135",
:media_thumbnail_width=>"180"}

How about this:
doc.at_xpath('//media:content[@bitrate="1536"]/@url').text
The link, by the way, doesn't work for me, so I wasn't actually able to test this on the full document.
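For what it's worth, the expression can be sanity-checked against just the snippet from the question by wrapping it in a root element that declares the media prefix (Media RSS uses http://search.yahoo.com/mrss/); a minimal sketch with shortened placeholder URLs:
require 'nokogiri'

snippet = <<EOF
<media:group xmlns:media="http://search.yahoo.com/mrss/">
  <media:content url="http://example.invalid/FCMS-...-e9ebd6e42ce1.mp4" bitrate="512" />
  <media:content url="http://example.invalid/FCMS-...-c7cca1d51b4b.mp4" bitrate="1536" />
</media:group>
EOF

doc = Nokogiri::XML(snippet)
# Namespaces declared on the root element are registered automatically for XPath
puts doc.at_xpath('//media:content[@bitrate="1536"]/@url').text
#=> http://example.invalid/FCMS-...-c7cca1d51b4b.mp4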
UPDATE:
Using the info from your answer below, in Nokogiri:
filme = Nokogiri::XML(open('http://www.sandmann.de/static/san/app/filme.xml'))
folge = Nokogiri::XML(open(filme.xpath('//filme/folge').text))
folge.at_xpath('//media:content[@bitrate="1536"]/@url').text
#=> "http://x4100mp4dynonlc22033.f.o.l.lb.core-cdn.net/22033mdr/ondemand/4100mp4dynonl/FCMS-066eb3e7-81b2-4dae-898d-4963137eb4b6-c7cca1d51b4b.mp4"

This is what I came up with in the end - no Nokogiri (which, I assume, is very powerful but has a rather steep learning curve; plus, I simply don't understand it...) but Crack instead. It seems more Ruby-ish and plays along nicely with the MRSS feed I am getting:
require 'rubygems'
require 'pp'
require 'crack'
require 'asciify'
require 'open-uri'
fileurl = ""
filme = Crack::XML.parse(open('http://www.sandmann.de/static/san/app/filme.xml'))
folge = Crack::XML.parse(open(filme['filme']['folge']))
titel = folge['rss']['channel']['item']['description'].to_s.sub(/.*Die jüngste Geschichte vom Sandmann gibt es für 24 Stunden hier auf Abruf. Heute: /, '')
folge['rss']['channel']['item']['media:group']['media:content'].each do |x|
  fileurl << x['url'] if x['bitrate'] == "1536"
end
filename = titel.split(".").first.asciify + ".m4v"
filename.gsub!(" ","_")
system("curl -o \"#{filename}\" \"#{fileurl}\"")
Just in case your kids want to watch, too ;)

To make it easy (after remove_namespaces!, as in the full example below), simply:
doc.at('content[bitrate="1536"]')[:url]

require 'nokogiri'
require 'open-uri'
url = 'http://www.mdr.de/export/sandmann/folgen/sandmann612-mediaRss_doca-1_zc-1a3071ad.xml'
doc = Nokogiri.XML(open(url))
doc.remove_namespaces! # Just to make our life simpler
content = doc.at_css('content[bitrate="1536"]')
puts content['url']
#=> http://x4100mp4dynonlc22033.f.o.l.lb.core-cdn.net/22033mdr/ondemand/4100mp4dynonl/FCMS-fd2af820-ec90-4f34-a58e-db1b9fdcc25a-c7cca1d51b4b.mp4
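If you'd rather keep the namespaces instead of calling remove_namespaces!, the same lookup should also work namespace-aware; a minimal sketch (the media prefix is declared on the feed's root element):
require 'nokogiri'
require 'open-uri'

url = 'http://www.mdr.de/export/sandmann/folgen/sandmann612-mediaRss_doca-1_zc-1a3071ad.xml'
doc = Nokogiri.XML(open(url))

# CSS: namespaced elements use the pipe syntax instead of a colon
puts doc.at_css('media|content[bitrate="1536"]')['url']

# XPath: the media: prefix from the document's root is registered automatically
puts doc.at_xpath('//media:content[@bitrate="1536"]/@url').text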

Related

Twitter API not working in terminal

I am using the twitter gem to connect to the Twitter streaming API.
When I run the code in the console in Sublime Text 2, everything works as it should and I get results from the API. However, when I try to run the script from the terminal I get this error:
/Users/username/.rbenv/versions/2.1.4/lib/ruby/gems/2.1.0/gems/twitter-5.15.0/lib/twitter/streaming/connection.rb:16:in `initialize': Can't assign requested address - connect(2) for "199.16.156.217" port (Errno::EADDRNOTAVAIL)
I am only using the example code from the GitHub page of the twitter gem:
https://github.com/sferik/twitter
client = Twitter::Streaming::Client.new do |config|
  config.consumer_key = "YOUR_CONSUMER_KEY"
  config.consumer_secret = "YOUR_CONSUMER_SECRET"
  config.access_token = "YOUR_ACCESS_TOKEN"
  config.access_token_secret = "YOUR_ACCESS_SECRET"
end

client.sample do |object|
  puts object.text if object.is_a?(Twitter::Tweet)
end
Does anyone know why I get this error, and how I can fix it?
require 'twitter'

while true
  config = {
    :consumer_key => CONSUMER_KEY,
    :consumer_secret => CONSUMER_SECRET,
    :access_token => ACCESS_TOKEN,
    :access_token_secret => ACCESS_TOKEN_SECRET,
  }
  sClient = Twitter::Streaming::Client.new(config)
  topics = ['edelweiss', 'rose']
  sClient.filter(:track => topics.join(',')) do |tweet|
    if tweet.is_a?(Twitter::Tweet)
      puts "#{tweet.user.screen_name}: #{tweet.text}"
    end
  end
end
Running the code:
$ ruby lasswi.rb
Suphatra_Rfc: RT #GGiftfyy: Rose Gold ในมือนั้นอิจเเรงงงง เครื่องเก่าโยนมาทางนี้ก็ได้นะเพ่~ น้องพร้อมเสมอ😂😂 #งานซูมต้องมา #อยากได้อ่ะอยากได้ 😆 http://t.…
CBullsfans: Jimmy Butler Reportedly Doesn't Respect Derrick Rose's Work Ethic http://t.co/3Ikjvvjuth #Bulls #NBA
sobinasalvez: RT #iPhoneTeam: Rose gold everything http://t.co/1DLhXokknu
magicearth_: RT #magazine_wmw: Rose-ringed parakeets in flight on their way to roost in an urban cemetery in London, England.
Photograph: Sam Hobson htt…
demoo2012: Rose, use this pic 👍
#Razana96 http://t.co/uAwS9JdHyl
EndearingImages: New artwork for sale! - "Grace" - http://t.co/ugpIaxABqg #fineartamerica http://t.co/jg0e3eDNll
LoveKnitting: Great rose workshop with #NickyKnits at #TheKnittingandStitchingShow such a lovely lady! #twistedthread http://t.co/rV0Bsjg63t
camarillonican4: Brand New Sealed - Apple iPhone 6S Plus - 64GB - Rose Gold - UNLOCKED http://t.co/ygFPoN7pLO http://t.co/mqXKJmmrNz
Dekho00: RT #PAPIGFUNK: Giveaway ENDING on Sunday! Enter Now- iPhone 6S Plus - Rose Gold - Unboxing + Giveaway! https://t.co/02ONZ6D8IS #iPhone6SPlu…
souravmishra1: RT #RHIndia: "Only in art will the lion lie down with the lamb, and the rose grow without thorn." - Martin Amis​ #RandomAmis http://t.co/MT…
exol_lzw0112: RT #DOThFanclub: [Preview] 151009 ONE K Concert (cr.Like a star, Lovely Rose, Chibimori)
อันนยอง~~ http://t.co/w8vvE40dEK
BruhninhaD: Livro: Hugo & Rose da Editora Agir
Será o correto deixar a realidade para viver um sonho?http://t.co/OfxkKrzBog #books #book #livros #blog
This was a known issue with the twitter gem; using an updated version from GitHub solved the problem.
https://github.com/sferik/twitter/issues/709
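If it helps, one way to pull the patched gem straight from its repository is a Gemfile entry along these lines (a sketch; pin whatever ref actually contains the fix):
# Gemfile
source 'https://rubygems.org'

# Use the twitter gem from GitHub instead of the released 5.15.0
gem 'twitter', git: 'https://github.com/sferik/twitter.git'
Then run bundle install and start the script with bundle exec ruby lasswi.rb so the GitHub version is the one that gets loaded.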

Ruby - URL to Markdown

TOTAL rookie here.
I'm working on customizing a script made by Brett Terpstra - http://brettterpstra.com/2013/11/01/save-pocket-favorites-to-nvalt-with-ifttt-and-hazel/
Mine is a different use: I'd like to save my Pinboard bookmarks with a specific tag to a file in Dropbox, in Markdown.
I feed it a text file such as:
Title: Yesterday is over.
URL: http://www.jonacuff.com/blog/want-to-change-the-world-get-doing/
Tags: 2md, 2wcx, 2pdf
Date: June 20, 2013 at 06:20PM
Image: notused
Excerpt: You can't start the next chapter of your life if you keep re-reading the last one.
And it outputs the markdown file.
Everything works great except when the 'excerpt' (see above) is more than one line; sometimes it's a couple of paragraphs. When that happens, it stops working: when I hit Enter on the command line, it's still waiting for more input.
Here's an example of a file that it doesn't work on:
Title: Talking ’bout my Generation.
URL: http://blog.greglaurie.com/?p=8881
Tags: 2md, 2wcx, 2pdf
Date: June 28, 2013 at 09:46PM
Image: notused
Excerpt: Contrast two men from the 19th century: Max Jukes and Jonathan Edwards.
Max Jukes lived in New York. He did not believe in Christ or in raising his children in the way of the Lord. He refused to take his children to church, even when they asked to go. Of his 1,026 descendants:
•300 were sent to prison for an average term of 13 years
•190 were prostitutes
•680 were admitted alcoholics
His family, thus far, has cost the state in excess of $420,000 and has made no contribution to society.
Jonathan Edwards also lived in New York, at the same time as Jukes. He was known to have studied 13 hours a day and, in spite of his busy schedule of writing, teaching, and pastoring, he made it a habit to come home and spend an hour each day with his children. He also saw to it that his children were in church every Sunday. Of his 929 descendants:
•430 were ministers
•86 became university professors
•13 became university presidents
•75 authored good books
•7 were elected to the United States Congress
•1 was Vice President of the United States
Edwards’ family never cost the state one cent.
We tend to think that our decisions only affect ourselves, but they have ramifications for generations to come.
Here's a screenshot of what it looks like after I run the command: https://www.dropbox.com/s/i9zg483k7nkdp6f/Screenshot%202013-11-22%2016.39.17.png
I'm hoping it's something easy. Any ideas?
#!/usr/bin/env ruby
# Works with IFTTT recipe https://ifttt.com/recipes/125999
#
# Set Hazel to watch the folder you specify in the recipe.
# Make sure nvALT is set to store its notes as individual files.
# Edit the $target_folder variable below to point to your nvALT
# notes folder.
require 'date'
require 'open-uri'
require 'net/http'
require 'fileutils'
require 'cgi'

$target_folder = "~/Dropbox/messx/urls2md"

def url_to_markdown(url)
  res = Net::HTTP.post_form(URI.parse("http://heckyesmarkdown.com/go/"), {'u' => url, 'read' => '1'})
  if res.code.to_i == 200
    res.body
  else
    false
  end
end

file = ARGV[0]

begin
  input = IO.read(file).force_encoding('utf-8')
  headers = {}
  input.each_line {|line|
    key, value = line.split(/: /)
    headers[key] = value.strip || ""
  }
  outfile = File.join(File.expand_path($target_folder), headers['Title'].gsub(/["!*?'|]/, '') + ".txt")
  date = Time.now.strftime("%Y-%m-%d %H:%M")
  date_added = Date.parse(headers['Date']).strftime("%Y-%m-%d %H:%M")
  content = "Title: #{headers['Title']}\nDate: #{date}\nDate Added: #{date_added}\nSource: #{headers['URL']}\n"
  tags = false
  if headers['Tags'].length > 0
    tag_arr = headers['Tags'].split(", ")
    tag_arr.map! {|tag|
      %Q{"#{tag.strip}"}
    }
    tags = tag_arr.join(" ")
    content += "Keywords: #{tags}\n"
  end
  markdown = url_to_markdown(headers['URL']).force_encoding('utf-8')
  if markdown
    content += headers['Image'].length > 0 ? "\n\n> #{headers['Excerpt']}\n\n---#{markdown}\n" : "\n\n" + markdown
  else
    content += headers['Image'].length > 0 ? "\n\n![](#{headers['Image']})\n\n#{headers['Excerpt']}\n" : "\n\n" + headers['Excerpt']
  end
  File.open(outfile, 'w') {|f|
    f.puts content
  }
  if tags && File.exists?("/usr/local/bin/openmeta")
    %x{/usr/local/bin/openmeta -a #{tags} -p "#{outfile}"}
  end
  # FileUtils.rm(file)
rescue Exception => e
  puts e
end
How about this? Modify your input.each_line area accordingly:
headers = {}
key = nil
input.each_line do |line|
  match = /^(?<key>\w+)\s*:\s*(?<value>.*)/.match(line)
  if match
    key = match[:key].strip
    headers[key] = match[:value].strip
  else
    headers[key] += line
  end
end
First, splitting on just ":" is dangerous, since a colon can appear in the content. Instead, a (simplified) regex of /^\w+:.*/ will match "Word: Content". Since the lines after "Excerpt:" aren't prefixed with a key, you need to hang on to the last seen key and simply append the line when no key is found. You may need to add a newline in there, depending on what you do with that header information, but it seems to work.
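As a quick check, running that loop over a small made-up input keeps the whole excerpt together under the last seen key:
input = <<EOF
Title: Talking about my Generation.
URL: http://blog.greglaurie.com/?p=8881
Excerpt: Contrast two men from the 19th century.
Max Jukes lived in New York.
Jonathan Edwards also lived in New York.
EOF

headers = {}
key = nil
input.each_line do |line|
  match = /^(?<key>\w+)\s*:\s*(?<value>.*)/.match(line)
  if match
    key = match[:key].strip
    headers[key] = match[:value].strip
  else
    headers[key] += line
  end
end

puts headers['Excerpt']
# All three excerpt lines end up in headers['Excerpt'] instead of just the first one.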

How do I parse Google image URLs using Ruby and Nokogiri?

I'm trying to make an array of all the image files on a Google Images webpage.
I want a regular expression to pull everything after "imgurl=" and before "&amp;", as seen in this HTML:
<img height="124" width="124" src="https://encrypted-tbn0.gstatic.com/images?q=tbn:ANd9GcRLy5inpSdHxWuE7z3QSZw35JwN3upbBaLr11LR25noTKbSMn9-qrySSg"><br><cite title="trendytree.com">trendytree.com</cite><br>Silent Night Chapel <b>20031</b><br>400 × 400 - 58k - jpg</td>
I feel like I can do this with a regex, but I can't find a way to search my parsed document using a regex, and I haven't found any solutions.
str = '<img height="124" width="124" src="https://encrypted-tbn0.gstatic.com/images?q=tbn:ANd9GcRLy5inpSdHxWuE7z3QSZw35JwN3upbBaLr11LR25noTKbSMn9-qrySSg"><br><cite title="trendytree.com">trendytree.com</cite><br>Silent Night Chapel <b>20031</b><br>400 × 400 - 58k - jpg</td>'
str.split('imgurl=')[1].split('&amp')[0]
#=> "http://www.trendytree.com/old-world- christmas/images/20031chapel20031-silent-night-chapel.jpg"
Is that what you're looking for?
The problem with using a regex is that you assume too much knowledge about the order of the parameters in the URL. If the order changes, or the & disappears, the regex won't work.
Instead, parse the URL, then split out the values:
# encoding: UTF-8
require 'nokogiri'
require 'cgi'
require 'uri'
doc = Nokogiri::HTML.parse('<img height="124" width="124" src="https://encrypted-tbn0.gstatic.com/images?q=tbn:ANd9GcRLy5inpSdHxWuE7z3QSZw35JwN3upbBaLr11LR25noTKbSMn9-qrySSg"><br><cite title="trendytree.com">trendytree.com</cite><br>Silent Night Chapel <b>20031</b><br>400 × 400 - 58k - jpg</td>')
doc.search('a').each do |a|
  query_params = CGI::parse(URI(a['href']).query)
  puts query_params['imgurl']
end
Which outputs:
http://www.trendytree.com/old-world-christmas/images/20031chapel20031-silent-night-chapel.jpg
Both URI and CGI are used because URI's decode_www_form raises an exception when trying to decode the query.
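As a side note, CGI.parse maps every parameter to an array of values, which is why query_params['imgurl'] prints cleanly with puts; a tiny illustration on a made-up query string:
require 'cgi'

CGI.parse('imgurl=http://example.com/a.jpg&w=400&h=400')
#=> {"imgurl"=>["http://example.com/a.jpg"], "w"=>["400"], "h"=>["400"]}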
I've also been known to decode the query string into a hash using something like:
Hash[URI(a['href']).query.split('&').map{ |p| p.split('=') }]
That will return:
{"imgurl"=>
"http://www.trendytree.com/old-world-christmas/images/20031chapel20031-silent-night-chapel.jpg",
"imgrefurl"=>
"http://www.trendytree.com/old-world-christmas/silent-night-chapel-20031-christmas-ornament-old-world-christmas.html",
"usg"=>"__YJdf3xc4ydSfLQa9tYnAzavKHYQ",
"h"=>"400",
"w"=>"400",
"sz"=>"58",
"hl"=>"en",
"start"=>"19",
"zoom"=>"1",
"tbnid"=>"ajDcsGGs0tgE9M:",
"tbnh"=>"124",
"tbnw"=>"124",
"ei"=>"qagfUbXmHKfv0QHI3oG4CQ",
"itbs"=>"1",
"sa"=>"X",
"ved"=>"0CE4QrQMwEg"}
To get all the img URLs you want, do:
require 'nokogiri'
require 'open-uri'

# get all links
url = 'some-google-images-url'
links = Nokogiri::HTML(open(url)).css('a')

# get regex match or nil on desired img
img_urls = links.map {|a| a['href'][/imgurl=(.*?)&/, 1] }

# get rid of nils
img_urls.compact
The regex you want is /imgurl=(.*?)&/ because you want a non-greedy match between imgurl= and &; otherwise the greedy .* would take everything up to the last & in the string.
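A quick comparison of the two on a made-up href value shows the difference:
href = 'imgurl=http://example.com/a.jpg&imgrefurl=http://example.com/page.html&h=400'

href[/imgurl=(.*?)&/, 1] #=> "http://example.com/a.jpg"
href[/imgurl=(.*)&/, 1]  #=> "http://example.com/a.jpg&imgrefurl=http://example.com/page.html"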

Parsing an XML feed using Nokogiri isn't working

This is my code:
doc = Nokogiri::HTML(open("http://www.cincinnatisun.com/index.php?rss/90d24f4ad98a2793", 'User-Agent' => 'ruby'))
search = doc.css('item')
if !search.blank?
  search.each do |data|
    title = data.css("title").text
    link = data.css("link").text
  end
end
But I did not get the link.
Several things are wrong:
if !search.blank?
won't work because search would be a NodeSet returned by doc.css. NodeSets don't have a blank? method. Perhaps you meant empty??
title=data.css("title").text
isn't the correct way to find the title because, as in the problem above, you're getting a NodeSet instead of a Node. Getting text from a NodeSet can return a lot of garbage you don't want. Instead do:
title = data.at("title").text
Changing the code to this:
require 'nokogiri'
require 'open-uri'

doc = Nokogiri::HTML(open("http://www.cincinnatisun.com/index.php?rss/90d24f4ad98a2793", 'User-Agent' => 'ruby'))
search = doc.css('item')
if !search.empty?
  search.each do |data|
    title = data.at("title").text
    link = data.at("link").text
    puts "title: #{ title } link: #{ link }"
  end
end
Outputs:
title: Ex-Bengals cheerleaders lawsuit trial to begin link:
title: Freedom Center Offering Free Admission Monday link:
title: Miami University Band Performing in the Inaugural Parade link:
title: Northern Kentucky Man To Present Colors At Inauguration link:
title: John Gumms Monday Forecast link:
title: President Obama VP Biden sworn in officially begin second terms link:
title: Colerain Township Pizza Hut Robbed Saturday Night link:
title: Cold Snap Coming to Tri-State link:
title: 2 Men Arrested After Police Chase in Northern Kentucky link:
The link won't work because the XML is malformed, which, in my experience, is unbelievably common on the internet because people don't take the time to check their work.
The fix will take massaging the XML before Nokogiri receives the content, or modifying your accessors. Luckily, this particular XML is easy to work around, so this should help:
require 'nokogiri'
require 'open-uri'

doc = Nokogiri::HTML(open("http://www.cincinnatisun.com/index.php?rss/90d24f4ad98a2793", 'User-Agent' => 'ruby'))
search = doc.css('item')
if !search.empty?
  search.each do |data|
    title = data.at("title").text
    link = data.at("link").next_sibling.text
    puts "title: #{ title } link: #{ link }"
  end
end
Which outputs:
title: Ex-Bengals cheerleaders lawsuit trial to begin link: http://www.cincinnatisun.com/index.php/sid/212072454/scat/90d24f4ad98a2793
title: Freedom Center Offering Free Admission Monday link: http://www.cincinnatisun.com/index.php/sid/212072914/scat/90d24f4ad98a2793
title: Miami University Band Performing in the Inaugural Parade link: http://www.cincinnatisun.com/index.php/sid/212072915/scat/90d24f4ad98a2793
title: Northern Kentucky Man To Present Colors At Inauguration link: http://www.cincinnatisun.com/index.php/sid/212072913/scat/90d24f4ad98a2793
title: John Gumms Monday Forecast link: http://www.cincinnatisun.com/index.php/sid/212070535/scat/90d24f4ad98a2793
title: President Obama VP Biden sworn in officially begin second terms link: http://www.cincinnatisun.com/index.php/sid/212060033/scat/90d24f4ad98a2793
title: Colerain Township Pizza Hut Robbed Saturday Night link: http://www.cincinnatisun.com/index.php/sid/212057132/scat/90d24f4ad98a2793
title: Cold Snap Coming to Tri-State link: http://www.cincinnatisun.com/index.php/sid/212057131/scat/90d24f4ad98a2793
title: 2 Men Arrested After Police Chase in Northern Kentucky link: http://www.cincinnatisun.com/index.php/sid/212057130/scat/90d24f4ad98a2793
With all that done, you can write your code more clearly:
require 'nokogiri'
require 'open-uri'
doc = Nokogiri::HTML(open("http://www.cincinnatisun.com/index.php?rss/90d24f4ad98a2793", 'User-Agent' => 'ruby'))
doc.css('item').each do |data|
  title = data.at("title").text
  link = data.at("link").next_sibling.text
  puts "title: #{ title } link: #{ link }"
end
Interestingly enough, now the sample page appears to have its links fixed.
According to http://nokogiri.org/tutorials/searching_a_xml_html_document.html, something like:
doc = Nokogiri::XML(File.read("feed.xml"))
doc.xpath('//xmlns:link')
should do the job. But be aware that your provided XML snippet isn't a valid XML feed at all (no root element, the item tag only closed and never opened, etc.). The code assumes the XML feed looks like this:
<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
  <item>
    <title>Atom-Powered Robots Run Amok</title>
    <link>http://example.org/2003/12/13/atom03</link>
  </item>
</feed>
And extracts:
<link>http://example.org/2003/12/13/atom03</link>
as the result. Please try to look at the documentation/reference first if you have problems like this. If you tried something and it didn't work like you would expect, then you can consult Stack Overflow with actual code - that makes it easier to understand your problem and provide help.
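Put together, a minimal runnable version of that sketch (assuming the sample feed above is saved as feed.xml) would be:
require 'nokogiri'

doc = Nokogiri::XML(File.read("feed.xml"))

# The feed's default (Atom) namespace is registered under the xmlns prefix
doc.xpath('//xmlns:link').each do |link|
  puts link.text
end
#=> http://example.org/2003/12/13/atom03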

Trying to get content inside CDATA tags in an XML file using Nokogiri

I have seen several things on this, but nothing has seemed to work so far. I am parsing an XML document via a URL using Nokogiri on Rails 3 / Ruby 1.9.2.
A snippet of the xml looks like this:
<NewsLineText>
<![CDATA[
Anna Kendrick is ''obsessed'' with 'Game of Thrones' and loves to cook, particularly creme brulee.
]]>
</NewsLineText>
I am trying to parse this out to get the text associated with NewsLineText:
r = node.at_xpath('.//newslinetext') if node.at_xpath('.//newslinetext')
s = node.at_xpath('.//newslinetext').text if node.at_xpath('.//newslinetext')
t = node.at_xpath('.//newslinetext').content if node.at_xpath('.//newslinetext')
puts r
puts s.blank? ? 'NOTHING' : s
puts t.blank? ? 'NOTHING' : t
What I get in return is
<newslinetext></newslinetext>
NOTHING
NOTHING
So I know my tags are named/spelled correctly to get at the newslinetext data, but the CDATA text never shows up.
What do I need to do with Nokogiri to get this text?
You're trying to parse XML using Nokogiri's HTML parser. If node were from the XML parser, then r would be nil, since XML is case sensitive; your r is not nil, so you're using the HTML parser, which is case insensitive.
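A quick way to see that difference on a trimmed-down snippet:
require 'nokogiri'

snippet = '<NewsLineText><![CDATA[ ... ]]></NewsLineText>'

Nokogiri::XML(snippet).at_xpath('//newslinetext')  #=> nil, the XML parser is case sensitive
Nokogiri::HTML(snippet).at_xpath('//newslinetext') #=> an element, the HTML parser is not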
Use Nokogiri's XML parser and you will get things like this:
>> r = doc.at_xpath('.//NewsLineText')
=> #<Nokogiri::XML::Element:0x8066ad34 name="NewsLineText" children=[#<Nokogiri::XML::Text:0x8066aac8 "\n ">, #<Nokogiri::XML::CDATA:0x8066a9c4 "\n Anna Kendrick is ''obsessed'' with 'Game of Thrones' and loves to cook, particularly creme brulee.\n ">, #<Nokogiri::XML::Text:0x8066a8d4 "\n">]>
>> r.text
=> "\n \n Anna Kendrick is ''obsessed'' with 'Game of Thrones' and loves to cook, particularly creme brulee.\n \n"
and you'll be able to get at the CDATA through r.text or r.children.
Ah, I see. What @mu said is correct. But to get at the CDATA directly, maybe:
require 'nokogiri'

xml = <<EOF
<NewsLineText>
<![CDATA[
Anna Kendrick is ''obsessed'' with 'Game of Thrones' and loves to cook, particularly creme brulee.
]]>
</NewsLineText>
EOF
node = Nokogiri::XML xml
cdata = node.search('NewsLineText').children.find{|e| e.cdata?}
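The CDATA node's text is then available directly:
puts cdata.text.strip if cdata
#=> Anna Kendrick is ''obsessed'' with 'Game of Thrones' and loves to cook, particularly creme brulee.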
