I was creating an RSS feed using Yahoo Pipes, but I want the title in the description and the description in the title; that is, I want to swap the title and description.
<item>
<title>"**title**"</title>
<description>"**description**"</description>
<link>http://link</link>
</item>
I want:
<item>
<title>"**description**"</title>
<description>"**title**"</description>
<link>http://link</link>
</item>
Here's one way you can do it, using three Pipes modules to swap the title and description fields:
Fetch Feed: Pull in the RSS feed. I'm using a Yahoo Finance feed as a sample.
Rename: Copy item.title as newDesc, and item.description as newTitle.
Create RSS: Set title to newTitle and description to newDesc, leaving the other fields at their default values.
I created a sample pipe you can view and copy from: http://pipes.yahoo.com/pipes/pipe.info?_id=ffc846056c71a4dd3df7b01d16fdd613
Here's a sample from the original Yahoo Finance RSS feed:
<item>
<title>Samsung Posts $7.4B Profit as Handsets Mask Weak Chip Sales</title>
<link>http://us.rd.yahoo.com/finance/news/rss/story/SIG=149kpevtc/*http%3A//us.rd.yahoo.com/finance/news/topfinstories/SIG=126hhlj6u/*http%3A//finance.yahoo.com/news/samsung-posts-7-4-bln-234859667.html?l=1</link>
<description>Samsung Electronics Co., the world's top technology firm by revenue, reported record quarterly profit of $7.4 billion on Friday, with strong sales of its Galaxy range of phones masking sharply lower memory chip sales.</description>
<guid isPermaLink="false">yahoo_finance/1866191516</guid>
<pubDate>Fri, 26 Oct 12 00:55:44 GMT</pubDate>
</item>
...and here's the corresponding output from the sample pipe, with title & description swapped:
<item>
<title>Samsung Electronics Co., the world's top technology firm by revenue, reported record quarterly profit of $7.4 billion on Friday, with strong sales of its Galaxy range of phones masking sharply lower memory chip sales.</title>
<link>http://us.rd.yahoo.com/finance/news/rss/story/SIG=149kpevtc/*http%3A//us.rd.yahoo.com/finance/news/topfinstories/SIG=126hhlj6u/*http%3A//finance.yahoo.com/news/samsung-posts-7-4-bln-234859667.html?l=1</link>
<description>Samsung Posts $7.4B Profit as Handsets Mask Weak Chip Sales</description>
<guid isPermaLink="false">yahoo_finance/1866191516</guid>
<pubDate>Sat, 30 Mar 1918 19:36:14 +0000</pubDate>
</item>
Note: in my sample output above, something happened to the pubDate. I think it might be a Yahoo Pipes caching issue that will clear up the next time the pipe runs.
My university has a website where it posts announcements. I can't afford to miss these announcements, and at the same time, checking the website every day is kinda cumbersome. The website has no RSS feed.
The announcements are posted on a web page with the following format for an announcement's URL:
http://example.com/news/detail/1/n
where n is the announcement ID, which is numeric.
When there is an announcement, the above web page (http://example.com/news/detail/1/180, for example) contains the announcement in the following format:
<div class="middleconten">
<h3>
Title </h3>
11 October, 2019
<p>
<a href='/some/link' target='_blank'>Click here for more details</a>
</p>
</div>
and when there is no announcement (that is, when a user visits a web page with an n value that doesn't correspond to an actual announcement ID, http://example.com/news/detail/1/1234567890, for example), the web page is as follows:
<div class="middleconten">
<h3>
</h3>
1 January, 1970
<p>
</p>
</div>
How do I make an RSS feed for the website, capturing the <h3> value, the href attribute, and the date?
You will need to scrape the website regularly for new news items. You can use goquery for extracting the data.
The idea is simple. You generate the URLs for the news section (filling in the value of n) starting from 1 and visit each URL. If you find news (the structure exists), store the data, then add 1 to n to get the next ID. If the URL doesn't contain news, stop and store the ID of the last successful news item; the next time, you can start from that ID instead of from the beginning.
For example, say I start from 1 and find the last successful news item at ID 32. I save that somewhere, and next time I can start from 33 instead of 1.
Once you have a database of the data extracted from the website, you can publish your own RSS feed from it. You can use a router like chi together with gorilla/feeds to create the RSS feed, as in the sketch below.
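A minimal sketch of that loop, assuming the example.com URL pattern and the div.middleconten markup from the question; the feed title, the link prefix, and the stop-on-empty-title rule are illustrative assumptions, not something the original answer specifies:

package main

import (
    "fmt"
    "net/http"
    "strings"

    "github.com/PuerkitoBio/goquery"
    "github.com/gorilla/feeds"
)

func main() {
    feed := &feeds.Feed{
        Title: "University announcements", // illustrative title
        Link:  &feeds.Link{Href: "http://example.com/news"},
    }

    // Start from 1 on the first run; afterwards, start from the last stored ID + 1.
    for id := 1; ; id++ {
        resp, err := http.Get(fmt.Sprintf("http://example.com/news/detail/1/%d", id))
        if err != nil {
            break
        }
        doc, err := goquery.NewDocumentFromReader(resp.Body)
        resp.Body.Close()
        if err != nil {
            break
        }

        // An empty <h3> means this ID has no announcement: stop here and
        // persist id-1 as the last successful ID for the next run.
        title := strings.TrimSpace(doc.Find("div.middleconten h3").Text())
        if title == "" {
            break
        }
        href, _ := doc.Find("div.middleconten p a").Attr("href")

        feed.Items = append(feed.Items, &feeds.Item{
            Title: title,
            Link:  &feeds.Link{Href: "http://example.com" + href},
        })
    }

    // In a real service you would mount this on a chi route instead of printing it.
    if rss, err := feed.ToRss(); err == nil {
        fmt.Println(rss)
    }
}

The date sits in a bare text node between the <h3> and the <p>, so extracting it takes a little more work (for example, walking the div's child nodes); the sketch leaves that out.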
Following vCard 2.1, I'm adding photos to vCards. I'm encoding an image fetched from a URL, then adding the encoded value in the proper place within the vCard. This displays the photo correctly in every program that can open vCards except the Windows Contacts program on Windows 7 (and probably newer versions of Windows as well).
As far as I can tell, the snippet below should display the vCard photo when opened in Windows Contacts:
BEGIN:VCARD
VERSION:2.1
N;CHARSET=ISO-8859-1:Lastname;Firstname;
FN;CHARSET=ISO-8859-1:Firstname Lastname
ORG;CHARSET=ISO-8859-1: Organization LLP
PHOTO;ENCODING=b;TYPE=jpg: <base64 encoded image as one line>
TITLE;CHARSET=ISO-8859-1:Position
TEL;WORK;VOICE:+1 999 999 9999
END:VCARD
All the other information displays in Windows Contacts, but not the photo; the standard blank image placeholder displays instead.
I have tried:
ENCODING=BASE64
omitting the ENCODING keyword altogether
removing the TYPE keyword altogether
using specifically a 240px by 240px image
adding the image URL value in the file instead of the encoded value
Anyone have any ideas?
Version 2.1 uses ENCODING=BASE64.
Put an empty line after the PHOTO property. Outlook requires this, so Contacts might too.
Put all parameter names/values in upper case. I know of one compatibility problem with Windows Contacts where it doesn't recognize a parameter value if it's in lower case.
Remove the space character before the base64 data.
Try setting the TYPE parameter to JPEG.
You've correctly encoded the image data, right? Try using an online decoder to make sure.
Corrected property:
PHOTO;ENCODING=BASE64;TYPE=JPEG:<base64 encoded image as one line>
[empty line]
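Putting those fixes together, and keeping the rest of your card exactly as you posted it, the whole card would look like this (note the blank line after the PHOTO data):
BEGIN:VCARD
VERSION:2.1
N;CHARSET=ISO-8859-1:Lastname;Firstname;
FN;CHARSET=ISO-8859-1:Firstname Lastname
ORG;CHARSET=ISO-8859-1: Organization LLP
PHOTO;ENCODING=BASE64;TYPE=JPEG:<base64 encoded image as one line>

TITLE;CHARSET=ISO-8859-1:Position
TEL;WORK;VOICE:+1 999 999 9999
END:VCARD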
According to the specs, if you have a URL, you should set the VALUE parameter to URL.
PHOTO;TYPE=JPEG;VALUE=URL:<url goes here>
I ran into your post while researching this question, and I was able to find the correct way to achieve this.
Below is a self-contained example of a vCard containing a 96x96 embedded red.gif image as the PHOTO.
https://www.rfc-editor.org/rfc/rfc2426#section-3.1.4
Be sure to note the VERSION line and the PHOTO line. This currently works in Outlook 365.
BEGIN:VCARD
VERSION:3.0
N:Gump;Forrest;;Mr.;
FN:Forrest Gump
ORG:Bubba Gump Shrimp Co.
TITLE:Shrimp Man
PHOTO;ENCODING=BASE64;TYPE=GIF:R0lGODdhYABgAPAAALccHMlFJiH5BAEAAAEALAAAAABgAGAAAAJuhI+py+0Po5y02ouz3rz7D4biSJbmiabqyrbuC8fyTNf2jef6zvf+DwwKh8Si8YhMKpfMpvMJjUqn1Kr1is1qt9yu9wsOi8fksvmMTqvX7Lb7DY/L5/S6/Y7P6/f8vv8PGCg4SFhoeIiYqLiIUgAAOw==
TEL;TYPE=work,voice;VALUE=uri:tel:+1-111-555-1212
TEL;TYPE=home,voice;VALUE=uri:tel:+1-404-555-1212
ADR;TYPE=WORK;PREF=1;LABEL="100 Waters Edge\nBaytown\, LA 30314\nUnited States of America":;;100 Waters Edge;Baytown;LA;30314;United States of America
ADR;TYPE=HOME;LABEL="42 Plantation St.\nBaytown\, LA 30314\nUnited States of America":;;42 Plantation St.;Baytown;LA;30314;United States of America
EMAIL:forrestgump@example.com
REV:20080424T195243Z
x-qq:21588891
END:VCARD
I have three samples of text nodes, and I want to extract three different parts of the text using a universal XPath.
First
<p class="product-summary">
This is an amazing game from the company Midway Games. Excellent gameplay. Very good game.
</p>
Second
<p class="product-summary">
New Line Cinema distributed this movie in 1995.
</p>
Third
<p class="product-summary">
New game from 2011, with new 3D graphics. This game was made by NetherRealm Studios.
</p>
The extraction should yield either Midway Games, New Line Cinema, or NetherRealm Studios.
Note that the text node always includes just one company, never two or three.
My attempt comes from this question, but the problem is that it doesn't work, nor does it cover all three companies.
substring('Midway Games',1,12*contains(//p[@class='product-summary']/following-sibling::text()[1], 'Midway Games'))
As the input will only contain one of them, you can use concat to join the results.
concat(
substring('Midway Games', 1,
12*contains(//p[@class='product-summary'], 'Midway Games')),
substring('New Line Cinema', 1,
15*contains(//p[@class='product-summary'], 'New Line Cinema')),
substring('NetherRealm Studios', 1,
19*contains(//p[@class='product-summary'], 'NetherRealm Studios'))
)
You can remove the line breaks I added for readability if you want.
I had to fix the query you provided: the text nodes are not following siblings of the <p> element, but children. Your XPath processor will query the (concatenated) text nodes below that element anyway, as contains() works on strings.
My friend works as a CMS admin for a YouTube network and asked me if there is a way to automatically extract the number of views and the revenue for specific videos. Let me explain. If people upload copyrighted material, as a CMS admin you have two options: either remove the video, or claim it / add an asset so that ads appear on the video when it is watched. The problem is that those videos are not connected to a partnered account, so you can't see the revenue for that channel; you have to check each video individually. So if I have a (made-up) video with the link http://www.youtube.com/watch?v=123abcEFG56, you can take the code of the video, "123abcEFG56", paste it into the search box of the YouTube CMS analytics, and you get all the information for that video if you claimed it / added an asset to it. (He tried searching by words from the video title, even the exact title, in the YT CMS analytics, but that only works for videos uploaded on a partnered channel/account; for videos uploaded on non-partnered channels, you can only view a video's statistics by putting its code into the search box.)
I came up with an idea: I visited some of the channels with the claimed videos, clicked on the Videos tab, and copied the HTML code. Then, using a regular expression and PowerShell (Win7), I extracted all the video codes into a .txt file. Each line of the .txt file contains one video code; for example, it would look something like this:
123abcEFG56
123abcEFG57
123abcEFG58
...
So, this is not about a regular YouTube account but a CMS account, and since analytics offers it, I would like to extract the data for the last month (the default on YouTube is "Last 30 days").
I am not familiar with the YouTube API, so my question is: is it possible (and if so, how) to make a batch script that would take one code per line, request the views and revenue made last month for the video with the corresponding code, and then write that info into another .txt or .csv file (ideally: "Video name", "Number of views", "Revenue")?
Thanks in advance for your answers!
You can use the Content ID API for this. Please reach out to your partner manager for details.
Can anyone recommend a Ruby library for creating a summary of a given URL? What I have in mind is the sort of one- or two-sentence summary as seen in search engine results.
You could just scrape the web page for either the description meta tag or, if that's not available, the first few sentences from the first <p> element on the page. The description meta tag looks like this:
<meta name="description" content="Nokogiri (鋸) is an HTML, XML, SAX, and Reader parser with XPath and CSS selector support." />
There are several Ruby libraries for parsing HTML. I hear that Nokogiri is good for this sort of thing, though I have no personal experience with it. A sketch of the approach follows.
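Here's a minimal sketch of that fallback logic. It's written in Go with goquery purely to illustrate the idea (a Ruby version with Nokogiri would use the same CSS selectors), and the URL and length limit are arbitrary examples:

package main

import (
    "fmt"
    "net/http"
    "strings"

    "github.com/PuerkitoBio/goquery"
)

// summarize returns the page's description meta tag if present,
// otherwise the text of the first <p> element, truncated to maxLen runes.
func summarize(url string, maxLen int) (string, error) {
    resp, err := http.Get(url)
    if err != nil {
        return "", err
    }
    defer resp.Body.Close()

    doc, err := goquery.NewDocumentFromReader(resp.Body)
    if err != nil {
        return "", err
    }

    summary, ok := doc.Find(`meta[name="description"]`).Attr("content")
    if !ok || strings.TrimSpace(summary) == "" {
        summary = doc.Find("p").First().Text()
    }
    summary = strings.Join(strings.Fields(summary), " ") // collapse whitespace

    if r := []rune(summary); len(r) > maxLen {
        summary = string(r[:maxLen]) + "..."
    }
    return summary, nil
}

func main() {
    if s, err := summarize("https://nokogiri.org/", 160); err == nil {
        fmt.Println(s)
    }
}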
Spidering a site and scraping pages is easy. Summarizing a page is difficult.
The metatags can help a little, as there is supposed to be a direct correlation between the summary and the content.
Unfortunately, not all pages have them, and many that do are inaccurate. That leaves us having to scrape text, hoping it's pertinent to the content and context. Page layouts vary, and there is no standard saying where on a page the main content actually lies; because of CSS and Ajax, it might not be where we'd expect it, in the first couple of lines of text. There might not even be <p> tags, since a <div> or <span> with the appropriate CSS can produce the same look.
I've written many spiders that did contextual analysis of the pages, trying to summarize, and it's ugly and not bullet-proof, especially when dealing with the English language because of homonyms, synonyms, and other "nyms" that get in the way.
If you can locate text to summarize, there are decent tools to reduce several paragraphs, or a paper, into a short sentence. Mac OS comes with a summarizer, and has for years. "Summarize Text Using Mac OSX Summarize Or Microsoft Word AutoSummarize" talks about enabling it if you want to experiment. "Mac 101: Shorten text using the Summarize Service" is about using it on the Mac. There's a driver or app for it that can be called from the CLI. See "How to use Mac OS X's Summary Service on the command line?" for more info.
And, as a demo, here's Lincoln's Gettysburg address summarized to one line:
It is rather for us to be here dedicated to the great task remaining before us—that from these honored dead we take increased devotion to that cause for which they gave the last full measure of devotion—that we here highly resolve that these dead shall not have died in vain—that this nation, under God, shall have a new birth of freedom—and that government of the people, by the people, for the people, shall not perish from the earth.