Extracting with Regex in Yahoo Pipes

I am trying to use Yahoo Pipes to remove everything from "Article" to the end of the page.
If I use the regex Article.+ I can extract everything up to the end of the line, i.e. up to "2011". But I need to extract up to the end of the page, i.e. up to "url.replace".
What am I doing wrong? I am using http://gskinner.com/RegExr/, which is awesome.
Here is what the section of the page looks like
This Article was reviewed by Brown Last updated on: Oct 2, 2011
'; _url = _url.replace
Thanks
Hil

I think I got it:
The pattern is:
Article.+
This extracts only up to the end of the line.
If you want to extract to the end of the page, tick the "s" (dot matches newline) checkbox in Yahoo Pipes.
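For context, here is a minimal Ruby sketch of the same idea; the "s" checkbox corresponds to the dot-matches-newline flag, which Ruby spells /m:
page = "This Article was reviewed by Brown Last updated on: Oct 2, 2011\n'; _url = _url.replace"
# Without the flag, . stops at the end of the line
page[/Article.+/]  #=> "Article was reviewed by Brown Last updated on: Oct 2, 2011"
# With the dot-matches-newline flag ("s" in Yahoo Pipes, /m in Ruby), the match runs to the end of the page
page[/Article.+/m] #=> "Article was reviewed by Brown Last updated on: Oct 2, 2011\n'; _url = _url.replace"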

Related

Why is Chef FileEdit search_file_replace not working?

I am trying to get this code to replace a single attribute value, but it erases the whole line.
ruby_block "update connection pool" do
block do
fe = Chef::Util::FileEdit.new(servlet_xml_path)
fe.search_file_replace("maxPoolSize=\"[0-9]+\"", "maxPoolSize=\"20\"")
fe.write_file
end
end
The function works if I have a simpler regex like:
fe.search_file_replace("maxPoolSize=", "maxPoolSize2=")
ShellFish provided the answer I was looking for: pass a Regexp instead of a string:
fe.search_file_replace(/maxPoolSize="[0-9]+"/, 'maxPoolSize="20"')
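For reference, here is a sketch of the original resource with that fix applied (servlet_xml_path is assumed to be defined elsewhere in the recipe, as in the question):
ruby_block "update connection pool" do
  block do
    fe = Chef::Util::FileEdit.new(servlet_xml_path)
    # Per the accepted answer: pass a Regexp object rather than an escaped string,
    # so only the matched attribute value is rewritten instead of the whole line
    fe.search_file_replace(/maxPoolSize="[0-9]+"/, 'maxPoolSize="20"')
    fe.write_file
  end
end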

Bizarre field switch between CLI and file output in Ruby

I'm having a strange issue with a Ruby script I'm working on. The script parses an iTunes Library XML file and builds objects for Artists, Albums and Tracks. My Album class has two numeric fields, YEAR and TRACK_COUNT.
The script parses both fields correctly; for example, here is the inspect output of one object:
#<Album:0x007f59b1472a18 @compilation=false, @title="Straight Out Of Hell", @year=2013, @track_count=13, @trackList=[], @coverList=[]>
When I write this same object out to a file, it gets mangled, turning into this (shown here in JSON format):
{"compilation":false,"title":"Straight Out Of Hell","year":13,"track_count":13,"trackList":[],"coverList":[]}]
As you can see, the YEAR field gets overwritten with the value of the TRACK_COUNT field. I'm going crazy over this, as I don't make any change to this field between these two outputs!
UPDATE
As asked by @Amadan...
http://pastebin.com/1FUuvaCr Biblioteca.xml (EXCERPT)
http://pastebin.com/F8wgu6bz Track.rb
http://pastebin.com/3qhd4TRU Song.rb
http://pastebin.com/RNf5S7AZ dependencies.rb
http://pastebin.com/haXPpJgN Cover.rb
http://pastebin.com/1JYtT1nn Artist.rb
http://pastebin.com/qsgLsAJa Album.rb
http://pastebin.com/eiUAMfwR app.rb (MAIN SCRIPT)
This is happening because your source file is not as clean as you believe. For some albums in the source XML, "Track Count" and "Year" appear on the same line, with no recognized line break between them. So you might have a line like this:
<key>Track Count</key><integer>12</integer><key>Year</key><integer>2006</integer>
When your if-else-if ladder asks if "track count" appears in the line, it does, so you're grabbing the first <integer>something</integer> match on the line. This works fine. But when you try to extract the year out of this line, you're again asking for the first <integer> on the line, which is the Track Count.
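To make the failure mode concrete, here is a small sketch using the line above:
line = '<key>Track Count</key><integer>12</integer><key>Year</key><integer>2006</integer>'
# Asking for the first <integer> on the line always yields the Track Count,
# regardless of which key the if/else branch thought it was handling
line[/<integer>(\d+)<\/integer>/, 1]   #=> "12"
# Both values are actually there; first-match logic simply never reaches the second one
line.scan(/<integer>(\d+)<\/integer>/) #=> [["12"], ["2006"]]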
The bigger problem is that you're attempting to parse an XML file line by line, and that's not how it's meant to be read. Install the nokogiri gem and call this:
data = Nokogiri::XML(File.read('Biblioteca.xml'))
Now you can get to any information contained in the document. The official tutorials on using Nokogiri are here: http://www.nokogiri.org/tutorials/
Use this method to parse your file:
def parse(filename)
  xml = Nokogiri::XML(File.read(filename))
  # In the iTunes plist, each song's <dict> is preceded by a <key> holding its numeric track ID
  songs = xml.css('dict key').select { |key| key.text =~ /^[0-9]{4}$/ }
  songs.map do |song|
    info = {}
    # Walk the key/value pairs inside the song's <dict> (the element after the track-ID key)
    song.next_element.css('key').each do |attribute|
      info[attribute.text] = attribute.next_element.text
    end
    info
  end
end
This will create a list of song hashes (the values are strings, since they come from Nokogiri's #text). Here are some examples of how to use it:
# load the two songs in your example file
songs = parse('Biblioteca.xml')
# Get the year of the first song
songs[0]['Year'] #=> "2006"
# Get the Track Count of the second song's album
songs[1]['Track Count'] #=> "12"
# Get the Name of the second song
songs[1]['Name'] #=> "Baby Come On"
# Get the Album name of the second song
songs[1]['Album'] #=> "When Your Heart Stops Beating"
From here, you can easily put info into your song objects. Let me know if you have any more questions.
I've found a library for iTunes' dodgy plist XML standard: nokogiri-plist. Working fine now :D

How can I QUICKLY get a string from one of the first couple lines of a long CSV at a remote URL?

I'm working on an assignment where I retrieve several stock prices from online, using Yahoo's stock price system. Unfortunately, the Yahoo API I'm required to use returns a .csv file that apparently contains a line for every single day that stock has been traded, which is at least 5 thousand lines for the stocks I'm working with, and over 10 thousand lines for some of them (example).
I only care about the current price, though, which is in the second line.
I'm currently doing this:
require 'open-uri'
def get_ticker_price(stock)
  open("http://ichart.finance.yahoo.com/table.csv?s=#{stock}") do |io|
    io.read.split(',')[10].to_f
  end
end
…but it's really slow.
1. Is all the delay coming from getting the file, or is there some from the way I'm handling it? Is io.read reading the entire file?
2. Is there a way to download only the first couple lines from the Yahoo CSV file?
3. If the answers to questions 1 & 2 don't render this one irrelevant, is there a better way to process it that doesn't require looking at the entire file (assuming that's what io.read is doing)?
You can use the date-range query string parameters to reduce the data to just the current date.
Example for MO on 7/13/2012 (the start/end month parameters are zero-indexed, 00-11):
http://ichart.finance.yahoo.com/table.csv?s=MO&a=06&b=13&c=2012&d=6&e=13&f=2012&g=d
API description here:
http://etraderzone.com/free-scripts/47-historical-quotes-yahoo.html
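A rough sketch of how get_ticker_price could pass those range parameters, assuming the endpoint behaves as in the example URL above (Date comes from Ruby's standard library; the parameter letters are taken from that URL):
require 'open-uri'
require 'date'

def get_ticker_price(stock, date = Date.today)
  # a/b/c = start month (zero-indexed), day, year; d/e/f = end month, day, year; g=d for daily
  params = "s=#{stock}&a=#{date.month - 1}&b=#{date.day}&c=#{date.year}" \
           "&d=#{date.month - 1}&e=#{date.day}&f=#{date.year}&g=d"
  open("http://ichart.finance.yahoo.com/table.csv?#{params}") do |io|
    # Only the header plus one data row come back, so the same column index as before still works
    io.read.split(',')[10].to_f
  end
end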

How to trim text and use it as a parameter for the next step in Watir using Ruby

This may be a very simple question, but I am very new to Ruby (or any programming language). I want to trim some text and use it as a parameter for the next step. Can anyone please show me code for doing this? I am testing a web application in the financial domain, and I need to use the CVV2 and expiry date of a card number as parameters in the next step. The text displayed in the HTML is:
CVV2 - 657  Expiry - 05/12 (mm/yy)
Now, from the above text, I somehow need to get just '657' and '0512' as values to use in the next step.
Urgent assistance would be appreciated.
If all your strings will be formatted like this, I suggest using a regexp, e.g. with String#scan or String#gsub. There are lots of places to learn about regexps if you don't know them already, and Rubular lets you test them in your browser, with a short cheat sheet.
To do it quickly you could use:
card_details = "CVV2 - 657 Expiry - 05/12 (mm/yy)"
card_details = card_details.scan(/\d{2,}/)
cvv2 = card_details[0]
expiry = card_details[1] + card_details[2]
There are probably better ways of doing it, as I'm no expert, but you said urgent, so.
For getting the text out of the cell you could try (I don't use the original watir anymore, so I might not be able to remember this):
card_details = browser.td(:text => /CVV2/).text
If that doesn't work give this a try (actually on second thought TRY THIS ONE FIRST)
card_details = browser.cell(:text => /CVV2/).text
For these examples I'm assuming your browser object is called "browser".
We can use a regular expression to achieve the same:
> "CVV2 - 657 Expiry - 05/12 (mm/yy)".match(/\d{3}/)[0]
=> "657"
> "CVV2 - 657 Expiry - 05/12 (mm/yy)".match(/\d+\/\d+/)[0]
=> "05/12"

Extract a single string from HTML using Ruby/Mechanize (and Nokogiri)

I am extracting data from a forum, and my script is working fine so far. Now I need to extract the date and time (21 Dec 2009, 20:39) from a single post, and I cannot get it to work. I used FireXPath to determine the xpath.
Sample code:
require 'rubygems'
require 'mechanize'
post_agent = WWW::Mechanize.new
post_page = post_agent.get('http://www.vbulletin.org/forum/showthread.php?t=230708')
puts post_page.parser.xpath('/html/body/div/div/div/div/div/table/tbody/tr/td/div[2]/text()').to_s.strip
puts post_page.parser.at_xpath('/html/body/div/div/div/div/div/table/tbody/tr/td/div[2]/text()').to_s.strip
puts post_page.parser.xpath('//*[@id="post1960370"]/tbody/tr[1]/td/div[2]/text()')
all my attempts end with empty string or an error.
I cannot find any documentation on using Nokogiri within Mechanize. The Mechanize documentation says at the bottom of the page:
After you have used Mechanize to navigate to the page that you need to scrape, then scrape it using Nokogiri methods.
But what methods? Where can I read about them with samples and explained syntax? I did not find anything on Nokogiri's site either.
Radek. I'm going to show you how to fish.
When you call Mechanize::Page::parser, it's giving you the Nokogiri document. So your "xpath" and "at_xpath" calls are invoking Nokogiri. The problem is in your xpaths. In general, start out with the most general xpath you can get to work, and then narrow it down. So, for example, instead of this:
puts post_page.parser.xpath('/html/body/div/div/div/div/div/table/tbody/tr/td/div[2]/text()').to_s.strip
start with this:
puts post_page.parser.xpath('//table').to_html
This gets any table, anywhere, and then prints it as HTML. Examine the HTML to see what tables it brought back. It probably grabbed several when you want only one, so you'll need to tell it how to pick out the one table you want. If, for example, you notice that the table you want has CSS class "userdata", then try this:
puts post_page.parser.xpath("//table[@class='userdata']").to_html
Any time you don't get back an array, you goofed up the xpath, so fix it before proceeding. Once you're getting the table you want, then try to get the rows:
puts post_page.parser.xpath("//table[@class='userdata']//tr").to_html
If that worked, then take off the "to_html" and you now have an array of Nokogiri nodes, each one a table row.
And that's how you do it.
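A condensed sketch of that narrowing-down workflow; the 'userdata' class name is only the hypothetical example from above, so substitute whatever the real HTML shows when you inspect it:
doc    = post_page.parser                          # the Nokogiri document behind the Mechanize page
tables = doc.xpath("//table[@class='userdata']")   # hypothetical class; widen or narrow until you get the right table
rows   = tables.first.xpath(".//tr")               # the rows of the table you picked
# Once you have identified the right row, pull out its text and strip the whitespace
post_date = rows[0].text.strip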
I think you have copied this from Firebug; Firebug gives you an extra tbody, which might not be there in the actual markup. So my suggestion is to remove that tbody and try again.
If it still doesn't work, then follow Wayne Conrad's process; that's the best approach!
