Yahoo Pipes: Fetch Page

I want to use part of a web page in an open source digital signage application.
Can anybody tell me how I can use Yahoo Pipes' Fetch Page module to get the part of the web page that I want, in a very simple way? It seems too complex to me.
Thanks for your help.

Yahoo Pipes has a module called "Fetch Page" and all you have to do is fill in the URL of the page.
However, I would not advise using Pipes for this application, because:
The owner of a website can easily block Yahoo Pipes.
Yahoo has a bad habit of changing the way the Pipes service works.
A Yahoo Pipe can only be called 200 times in ten minutes.
I suggest using cURL to fetch the information you need instead.
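For example, if you have cURL and cron available, something along these lines will grab only the part of the page you care about. This is just a sketch: the URL, the div markers and the output path are placeholders you would replace with your own page and signage setup.

    #!/bin/sh
    # Fetch the page and keep only the fragment between two markers
    # (placeholder URL, markers and output path -- adjust to your page).
    curl -s "http://www.example.com/page.html" \
      | sed -n '/<div id="news">/,/<\/div>/p' \
      > /var/www/signage/fragment.html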

Related

Indexing Hash Bang #! Content Using Google Search Appliance (GSA)

Has anyone had success indexing content that contains #! (Hashbang) in the URL? If so, how did you do it?
We host a third-party help center that requires the use of #! in its URLs, and we need to be able to index this content with our GSA. We are running version 7.0.14.G.238 of our GSA.
Here's an example of one of our help articles with a hashbang in the URL:
/templates/selfservice/example/#!portal/201500000001006/article/201500000006039/Resume-and-Cover-Letter-Reviews
I understand that #! requires JavaScript, that it is not the most SEO-friendly approach in the world, and that many popular sites (Facebook, Twitter, etc.) have deprecated its use.
While some JavaScript content is indexed, if you want to be sure this site's content actually ends up in the index you have two options: either make sure the site degrades gracefully without JavaScript (which many JS front ends support), or use a content feed to push the data into the GSA instead. To check the first option, turn off JavaScript in your browser, visit the site, and see whether the content links are still generated.
If you have access to the underlying database, you could push the content straight in. Read up on feeds, which let you send data directly to the appliance, here: http://www.google.com/support/enterprise/static/gsa/docs/admin/72/gsa_doc_set/feedsguide/ , or read up on connectors in general: https://support.google.com/gsa/topic/2721859?hl=en&ref_topic=2707841
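If you go the feed route, the rough shape of a push looks like the following. This is only a sketch: 19900 is the default feedergate port and feedtype/datasource/data are the standard parameters described in the feeds guide above, but check them against your appliance; the hostname, datasource name and feed.xml file are placeholders.

    # Push a feed file (built per the feeds guide) to the GSA feedergate.
    curl -F "feedtype=metadata-and-url" \
         -F "datasource=helpcenter" \
         -F "data=@feed.xml" \
         "http://YOUR_GSA_HOSTNAME:19900/xmlfeed"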

Is there a way to be notified when a wiki page changes on a Google Code hosted site?

Is there a way to be automatically notified (via email) when a wiki page changes on a Google Code hosted site?
I'll use http://code.google.com/p/mutagen/wiki/FAQ as an example. The page has an RSS feed of edits, http://code.google.com/feeds/p/mutagen/svnchanges/basic?path=/wiki/FAQ.wiki.
If you wish to know when any page in the wiki is edited, then the wiki index page and its associated RSS feed (e.g. http://code.google.com/p/mutagen/w/list and http://code.google.com/feeds/p/mutagen/svnchanges/basic?path=/wiki/) should do the job.
Use IFTTT feeds or a more dedicated service such as Blogtrottr. Enter the RSS feed URL and your email address, then off you go.
EDIT: In fact Blogtrottr works with the URL of the wiki page itself, so there is no need to work out the RSS URL. IFTTT might as well.
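If you would rather not depend on a third-party service, a rough do-it-yourself equivalent is to poll the feed from cron and mail yourself when it changes. A minimal sketch, where the feed URL and email address are placeholders:

    #!/bin/sh
    # Poll the wiki's RSS feed and send a mail whenever it changes.
    FEED="http://code.google.com/feeds/p/mutagen/svnchanges/basic?path=/wiki/FAQ.wiki"
    curl -s "$FEED" > /tmp/feed.new
    if ! cmp -s /tmp/feed.new /tmp/feed.old; then
        mail -s "Wiki page changed" you@example.com < /tmp/feed.new
        mv /tmp/feed.new /tmp/feed.old
    fi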
OK, I think I've found a simpler way of doing things.
Go to: https://code.google.com/hosting/settings
Check the box for "Whenever comments are added to a wiki page by another user, send me an email"
Star the wiki pages you want to be notified about.

A script to log into a web page

I want to write a script to log in and interact with a web page, and I am a bit at a loss as to where to start. I can probably figure out the HTML parsing, but how do I handle the login part? I was planning on using bash, since that is what I know best, but I am open to any other suggestions. I'm just looking for some reference material or links to help me get started. I'm not really sure whether the password is then stored in a cookie or whatnot, so how do I figure that out as well?
Thanks,
Dan
Take a look at cURL, which is generally available in a Linux/Unix environment. It lets you script a call to a web page, including POST parameters (say, a username and password), and it lets you manage the cookie store, so that a subsequent call (to get a different page within the site) can use the same cookie and your login will persist across calls.
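A minimal sketch of that flow, where the URLs and the username/password field names are placeholders (look at the login form's action and input names to get the real ones):

    # Log in and store the session cookie (placeholder URL and field names).
    curl -c cookies.txt -d "username=dan&password=secret" \
         "https://www.example.com/login"

    # Reuse the stored cookie for later requests to the same site.
    curl -b cookies.txt "https://www.example.com/members/page.html"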
I did something like that at work some time ago; I had to log in to a page and post the same data over and over...
Take a look here. I used wget because I could not get it working with curl.
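In case it helps, the wget version of the same cookie dance might look like this (again, the URLs and form field names are placeholders for whatever your login form actually uses):

    # Log in and save the session cookie (placeholder URL and field names).
    wget --save-cookies cookies.txt --keep-session-cookies \
         --post-data "username=dan&password=secret" \
         -O /dev/null "https://www.example.com/login"

    # Reuse the cookie on later requests.
    wget --load-cookies cookies.txt -O page.html "https://www.example.com/members/page.html"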
Search this site for screen scraping. It can get hairy, since you will need to deal with cookies, JavaScript and hidden fields (viewstate!). Usually you will need to scrape the login page to get the hidden fields and then post to the login page. Have fun :D
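For the hidden-field case (ASP.NET's __VIEWSTATE is the classic example), the usual pattern is to fetch the login page first, pull the hidden value out, and post it back along with the credentials. A sketch, assuming a single-line hidden input; the URL and field names are placeholders:

    #!/bin/sh
    # Fetch the login page, extract the hidden __VIEWSTATE value, then post it
    # back together with the credentials (placeholder URL and field names).
    VIEWSTATE=$(curl -s -c cookies.txt "https://www.example.com/login.aspx" \
      | sed -n 's/.*id="__VIEWSTATE" value="\([^"]*\)".*/\1/p')

    curl -b cookies.txt -c cookies.txt \
         --data-urlencode "__VIEWSTATE=$VIEWSTATE" \
         --data-urlencode "username=dan" \
         --data-urlencode "password=secret" \
         "https://www.example.com/login.aspx"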

When trying to integrate one website with another what is the way to go? Iframe or pulling content?

My company has multiple vendors that all have their own websites. I am creating a website that acts as a dashboard where customers can access all of the vendors' sites. I wanted to know what the best option for doing this is.
Here's what I have so far:
Iframe
Can bring in the entire website
Seems secure enough (not sure if I'm missing any information on security issues for this)
Users can interact with the vendor's website through our site
Our website cannot fully interact with the vendor's website (Also may be missing info here)
Pulling in the content
Can bring in the entire website
Not very secure from what I hear (some websites actually say that pulling another website in is a violation of security and will alert the user of this, or something similar)...
Users can interact with their website through our site
Our website can fully interact with the vendor's website
Anyone have any other options...?
What are some of the downsides to bringing in a site with an iframe and is this really our only option for doing something like this?
Ideally, we would like to pull their site into ours without using an iframe. What options do we have on this level? Is there anything better than an iframe?
Please add in as much information as you can about iframes, pulling content, security, and website interactions like this. Anything to add in is appreciated.
Thanks,
Matt
As far as "pulling content" is concerned I wouldn't advise it as it can break. All it takes is a simple HTML change on their end and your bot will break. Also, it's more work than you think to do this for one site, let alone the many that you speak of. However, there are 3rd party apps that can do this for you if you have the budget.
You could use an iframe/frames, however, many sites might try to bust out of them and it can ruin the user experience of the site within the frame.
My advice is to use a plain link for each vendor in your dashboard, i.e. an ordinary anchor tag pointing at the vendor's site (opening it in a new tab), rather than embedding the site at all.
If you can have the sites that you are embedding add some client-side script, then you could use easyXSS. It allows for easy transfer of data, and also for calling JavaScript methods across the domain boundary.
I would recommend iFrames. Whilst not the most glamorous of elements, many payment service providers use iFrames for the Verified by Visa/Mastercard Secure Code integration.

Reverse Engineer A Web Form

I have a web site from which I download 2-3 MB of raw data that then feeds into an ETL process to load it into my data mart. Unfortunately the data provider is the US Dept. of Agriculture (USDA) and they do not allow downloading via FTP. They require that I use a web form to select the elements I want, click through 2-3 screens, and eventually click to download the file. I'd like to automate this download process. I am not a web developer, but it seems that I should be able to use some tool to tell me exactly what PUT/GET magic is sent to the server in the final request. If I had a tool that said, "pass these parameters to this URL and wait for a response," I could then hack something together in Perl to automate this process.
I realize that if I deconstructed all 5 of their pages, read through the JavaScript includes and tapped my heels together 3 times, I could get this info from what I have access to. But I want a faster and more direct path that does not require me to manually parse all their JS.
Restatement of the final question: is there a tool or method that will show clearly what the final request sent from a web form was and how it was structured?
A tamperer's best friends (these are Firefox extensions; you could also use something like Wireshark):
HTTPFox
Tamper Data
Best of luck
Use Fiddler2 as a proxy to see what is being passed back and forth. I've done this with success in other, similar circumstances.
Home page is here: http://www.fiddler2.com/fiddler2/
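Once Fiddler (or any of the tools above) shows you the final request, you can usually replay it outside the browser. A sketch with cURL, where the URL and parameter names are made-up stand-ins for whatever the USDA form actually sends:

    # Replay the captured POST outside the browser.
    # The URL and parameters are placeholders for the ones you captured.
    curl -c cookies.txt -b cookies.txt \
         -d "commodity=corn&year=2010&format=csv" \
         -o usda_data.csv \
         "https://www.example.gov/dataDownload"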
As with the other responses, except my tool of choice is Charles
What about using a web testing toolkit, like Watir with Ruby?
It's easy to fill in the forms; just use the output.
Use WatiN and combine it with the WatiN Test Recorder (Google for it).
It can "simulate" a user sitting in front of the browser punching in values, which you can supply from your own C# code...
