Automatic embedded script - AJAX

I have a free website on 000webhost. The problem is that it automatically inserts its analytics code into all my files. It does not show up everywhere, but it causes a problem when I use AJAX calls and display the returned data in a div: the div shows the data as well as that injected code. Is there any way to avoid this or make the code invisible?
Also, in Google Webmaster Tools, when the crawler fetches my robots.txt file (which is a plain text file), the injected code shows up there as well and it returns an error.
Please help!
Here is the link to my website:
Portfolio

Your hosting provider seems to inject the following snippet at the end of some responses:
<!-- www.000webhost.com Analytics Code -->
<script type="text/javascript" src="http://analytics.hosting24.com/count.php"></script>
<noscript><img src="http://analytics.hosting24.com/count.php" alt="web hosting" /></noscript>
<!-- End Of Analytics Code -->
Their code is injected simply at the end of the response. When I checked, it ended up outside the closing html tag (and thus broke HTML validity).
I see 2 possible solutions:
Work the issue out with them.
Find a workaround: your provider might only be injecting this content when they detect certain types of requests. If you find out that they only inject it on requests to URLs ending in .html (for example), you could try changing the destination of your AJAX call to another URL (see also the sketch below).
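If neither of those works out, one more possible workaround (just a sketch, nothing the host documents) is to strip the injected block out of the AJAX response on the client before putting it into the div. The endpoint URL and the div id below are placeholders:
var xhr = new XMLHttpRequest();
xhr.open('GET', '/data.html', true); // placeholder URL for the AJAX endpoint
xhr.onload = function () {
  // Cut everything from the provider's opening comment onwards,
  // so only the real payload reaches the page.
  var cleaned = xhr.responseText.split('<!-- www.000webhost.com Analytics Code -->')[0];
  document.getElementById('result').innerHTML = cleaned; // 'result' is a placeholder div id
};
xhr.send();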

Related

response.xpath not returning value

I am trying to extract the pricing for an item on the following page: https://www.lowesforpros.com/pd/DEWALT-20-Volt-Max-1-2-in-Cordless-Brushless-Drill/1000135807
In the following code nothing is returned:
response.xpath("//*[#id='main']/div[6]/section[1]/div[3]/div[2]/div[2]/div/span[1]/text()").extract()]
I have looked at the source and do not see any indication of JS in use to pull the pricing.
What about simple:
response.xpath('//span[@itemprop="price"]/@content').extract_first()
The price section is not included in the basic HTML of the page. It is loaded by JavaScript after the page has finished loading. Consequently, the XPath does not match anything. You have to use a JavaScript rendering engine such as Splash, or a web driver like Selenium.
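The question uses Scrapy (Python), but just to illustrate the render-first idea, here is a minimal Node.js sketch using the selenium-webdriver package; it assumes Chrome and chromedriver are available, and reuses the selector from the answer above:
const { Builder, By, until } = require('selenium-webdriver');

(async function fetchPrice() {
  // A real browser executes the page's JavaScript, so the price markup gets rendered.
  const driver = await new Builder().forBrowser('chrome').build();
  try {
    await driver.get('https://www.lowesforpros.com/pd/DEWALT-20-Volt-Max-1-2-in-Cordless-Brushless-Drill/1000135807');
    // Wait until the JS-rendered price element appears in the DOM.
    const price = await driver.wait(
      until.elementLocated(By.css('span[itemprop="price"]')), 10000);
    console.log(await price.getAttribute('content'));
  } finally {
    await driver.quit();
  }
})();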

Ajax content indexing, Google

I've followed the instructions from the Google website to enable Ajax crawling on my AngularJS site by adding the following meta tag:
<meta name="fragment" content="!">
The rendered content has some links like:
User 1
User 2
User 3
Also some Ajax tabs which render dynamic content like:
Popular
Recent
Looking at the server logs, GoogleBot did come and correctly passed the _escaped_fragment_ in the URI:
_escaped_fragment_=%2fpopular
_escaped_fragment_=%2frecent
The problem is that, looking at the actual indexed content using site:www.somesite.com and at the server logs, I see that GoogleBot attempted to index pages like:
/user/1/#!/popular
/user/1/#!/recent
Why would something like this happen, considering those URLs are relative and don't have #! on them to indicate AJAX content? And is there a way to prevent this?
If those URLs are available on all pages, Google will simply add them.
So, if I go to User 1 and the Popular and Recent tabs are there again, then it's logical that Google loads: /user/1#!/popular
You might want to know that I've solved this puzzle with a script that's on Github: https://github.com/kubrickology/Logical-escaped_fragment
Simply build your AJAX pages with: __init()

Why is my ajax content not being indexed by google

I have tried to set my site up (http://www.diablo3values.com) according to the guidelines set out here: https://developers.google.com/webmasters/ajax-crawling/. However, it appears that Google has updated its index (because I see the revisions to the meta description tags), but the AJAX content does not show up in the index.
I am trying to use the “Handle pages without hash fragments” option.
If you view either of the following:
http://www.diablo3values.com/?_escaped_fragment_=
http://www.diablo3values.com/about?_escaped_fragment_=
you will correctly see the HTML snapshot with my content (those are the two pages I am most concerned about).
Any ideas? Am I doing something wrong? How do you get Google to correctly recognize the tag?
I'm typing this as an answer, since it got a little too long to be a comment.
First of all, your links seem to point to localhost:8080/about, and not /about, which is probably why Google doesn't index them in the first place.
Second, here's my experience with pushstate urls and Google AJAX crawling:
My experience is that AJAX crawling with pushstate URLs is handled a little differently by Google than with hashbang URLs. Since Google won't know that your URL is a pushstate URL (it looks just like a regular URL), you need to add <meta name="fragment" content="!"> to all your pages, not only the "root" page. And Google doesn't seem to know that the pages are part of the same application, so it treats every page as a separate AJAX application. So the Google bot will never actually create a navigation structure inside _escaped_fragment_, like _escaped_fragment_=/about, as it would with a hashbang URL (#!/about). Instead, it will request /about?_escaped_fragment_= (which you apparently already have set up). This goes for all your "deep links": instead of /?_escaped_fragment_=/thelink, Google will always request /thelink?_escaped_fragment_=.
But as said initially, the reason it doesn't work for you is probably because you have localhost:8080 urls in your _escaped_fragment_ generated html.
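For what it's worth, the server-side part of this scheme usually boils down to checking the query string. A minimal sketch, assuming a Node/Express backend (the question doesn't say what the server runs); the snapshot body is a placeholder for whatever actually produces your pre-rendered HTML:
const express = require('express');
const app = express();

app.get('*', (req, res, next) => {
  // Googlebot turns pages carrying <meta name="fragment" content="!">
  // into ?_escaped_fragment_= requests.
  if ('_escaped_fragment_' in req.query) {
    // Placeholder: in a real setup this would return the pre-rendered HTML
    // snapshot for req.path (e.g. generated by a headless browser or read from disk).
    return res.send('<html><body>snapshot for ' + req.path + '</body></html>');
  }
  next(); // regular visitors get the normal AngularJS application
});

app.listen(3000);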
Googlebot only knows to crawl the escaped fragment if your URLs conform to the hashbang convention. As users navigate your site, your URLs need to be:
http://www.diablo3values.com/
http://www.diablo3values.com/#!contact
http://www.diablo3values.com/#!about
Googlebot actually needs to see these urls in the source code so that it can follow them. Then it knows to download the following urls:
http://www.diablo3values.com/?_escaped_fragment_=contact
http://www.diablo3values.com/?_escaped_fragment_=about
On your site you appear to be loading a new page on each click, and then loading the content of each page via AJAX too. This is not how I would expect an AJAX site to work. Usually the purpose of using AJAX is so that the user never has to load a whole new page. When the user clicks, the new content section is loaded and inserted into the page. You serve the navigation once and then you only serve escaped fragments of the content.
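A rough sketch of that pattern (the fragment URL and element ids are placeholders, not taken from the site in question): the navigation is served once, and only the content area is re-fetched when the hash changes.
// Re-render the content area whenever the #! fragment changes.
window.addEventListener('hashchange', loadSection);

function loadSection() {
  var section = location.hash.replace(/^#!\/?/, '') || 'home'; // e.g. #!about -> "about"
  var xhr = new XMLHttpRequest();
  xhr.open('GET', '/fragments/' + section + '.html', true); // placeholder fragment URL
  xhr.onload = function () {
    document.getElementById('content').innerHTML = xhr.responseText;
  };
  xhr.send();
}

loadSection(); // render the initial section on page load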

Making content accessible on Addon SDK

I am developing an add-on using Firefox's Addon SDK (v. 1.11). My extension dynamically creates an iframe on each website and then loads an HTML file which includes other resources such as images, font files, etc. from the add-on's local directory.
Problem
When loading any such local resource (i.e. the "resource://" scheme), the iframe fails to display it and a message is thrown:
Security Error: Content at http: //www.XXX may not load or link to
resource://XXX
This is a security measure introduced in Firefox 3. When developing without the Addon SDK, the way around it is to declare a directory with "contentaccessible=yes", making the directory's contents accessible to anyone, including my add-on. However, I have not been able to find similar functionality in the Addon SDK. Is there a better way of using local data in an iframe that my add-on creates and inserts into a page?
I don't think you can directly load an iframe that points to a resource inside your add-on's URL. The browser complains because it's breaking either the same-origin policy or the cross-site scripting protection; I can't remember which one right now.
If it is HTML content you want to load, you can always inject it into the DOM and then send a message to the document object using the events API to display your custom HTML. I've done this in the past and it works.
So from main.js, send a message to the content script, which will then inject your iframe HTML into the DOM; then you can send the document object a message to display it.
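Roughly like this, as a sketch using the SDK's page-mod and self modules; the file names and message name are made up for illustration, and the markup is handed over as a string and injected directly rather than loaded from a resource:// URL:
// main.js
var pageMod = require("sdk/page-mod");
var data = require("sdk/self").data;

pageMod.PageMod({
  include: "*",
  contentScriptFile: data.url("inject.js"),
  onAttach: function (worker) {
    // Send the raw markup to the content script instead of a resource:// URL.
    worker.port.emit("show-widget", data.load("widget.html"));
  }
});

// inject.js (content script)
self.port.on("show-widget", function (html) {
  var container = document.createElement("div");
  container.innerHTML = html; // inject the add-on's markup straight into the page DOM
  document.body.appendChild(container);
});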
I hope this helps.
Not sure if this was the case when you posted the question, but it appears that "resource://" should no longer be used with the Addon SDK.
If you're using the resource inside of an HTML file in the extension, you can reference it locally, otherwise you should use data.url('whatever.jpg') and pass around that value as needed.
Full info is here: http://blog.mozilla.org/addons/2012/01/11/sdk-1-4-known-issue-with-hard-coding-resource-uris/

create a widget to retrieve and display data via ajax

I tried the classic AJAX approach, but that throws an access denied JavaScript exception when trying to add a script stored on another domain.
Now, I'm sure this is possible, since Google populates Google Ads via JS only; so does Twitter, and the list goes on.
How I thought of it so far:
<div id="divId"></div>
<script type="text/javascript" src="http://mysite.com/script.js"></script>
The script in script.js should have changed the innerHTML of the div above. Instead, I get the following message in Firebug: Access to restricted URI denied, code: 1012
I googled around a bit but only found workarounds that are useless for me, like PHP proxies and such, whereas I want this widget to be copy-pasted into other people's sites, blogs, forums, etc.
Thanks in advance for any helpful replies. :)
The behavior that you are seeing is intended and there for security reasons. You wouldn't want a third party script to make any changes to your page as that can be exploited heavily.
Instead, give your users a JavaScript snippet to embed on their page.
<script>
// do stuff here
</script>
Note that inside this snippet you can create a script tag dynamically, set the src attribute and load the actual JavaScript. This snippet that your users embed on their page has access to the entire DOM, but the script loaded externally does not.
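For example, the embeddable snippet could look roughly like this (the script URL is taken from the question; everything else is just a sketch):
<script>
// Create the script tag at runtime so the widget library loads asynchronously.
var s = document.createElement('script');
s.src = 'http://mysite.com/script.js'; // the externally hosted widget library
s.async = true;
document.getElementsByTagName('head')[0].appendChild(s);
</script>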
Here's an example of the profile widget that Twitter gives out to embed on web pages:
<!-- external js, can't access or change the DOM -->
<script src="http://widgets.twimg.com/j/2/widget.js"></script>
<!-- local js, does that -->
<script>
new TWTR.Widget({
version: 2,
..
..
}).render().setUser('hulu').start();
</script>
The first script tag loads the library, while the second one which actually manipulates the page is added as code directly.
I finally found a solution that doesn't involve ajax.
I simply use
<div id="objectId"></div>
<script type="text/javascript" src="http://mysite.com/getAndDisplayData.php"></script>
<script type="text/javascript">getAndDisplayData();</script>
And in getAndDisplayData.php I generate a JS script that creates my widget inside the div above. The PHP file also connects to the database and retrieves all the required widget data.
Apparently this is how Google Ads works, though I am not sure. It is certain, though, that they don't use AJAX.
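For illustration only, the JavaScript emitted by getAndDisplayData.php could look roughly like this; the list items are placeholders for whatever the PHP builds from the database rows:
// Output of getAndDisplayData.php: a plain function the host page can call.
function getAndDisplayData() {
  var target = document.getElementById('objectId');
  // In the real script this HTML is assembled server-side from the database.
  target.innerHTML = '<ul><li>Example item A</li><li>Example item B</li></ul>';
}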
