I am trying to scrape data from an infinite scroll site. I need a list of URL's though to be able to parse the data I need. The site is http://www.cabinetparts.com/c/hinges/european-cabinet-hinges/standard-european-cabinet-hinges. I'm looking to find something like http://www.cabinetparts.com/c/hinges/european-cabinet-hinges/standard-european-cabinet-hinges#/Page=2... Page=3 etc. I was able to find this by inspecting the other infinite scroll sites in Chrome, but this site continues to elude me. Thanks in advance!
Related
I have a few problems i could use some input on.
I have a website, where all the content is loaded with ajax, it works quite well. There are a few issues with that approach though, or some UX issues.
User cannot copy URL from loaded content, since it will allways show the default URL only.
SEO will take a hit, since it cannot be crawled, the sitemap is like 2 pages only, even though when a normal user browses, they will see alot more.
Browser history, back and forward, does not work. Hitting the back button goes to the main page.
Now, i have searched and read alot.
Google has a hack, that seems to allow the site to be crawled, IF you use # in your url, does not work with empty url, which leads me to...
Manipulating the browser history with pushState/popState.
Now, i have tried getting it to work, but i just cant get my head around which process is the best way to take. Should i redo all my ajax?
Right now i have 2 div boxes, and i switch between them with loaded content, to get that nice sweet transition between pages. My frontpage is basically just 2 empty divs, nothing else. It works, but i get the feeling it is a pretty bad way to do it, thoughts?
If anyone know some good guides, feel free to give me, i have as i said read alot, but i might have missed some golden ones out there.
Google does execute some Javascript when indexing and ranking pages. However, text which is not immediately visible to users is demoted when establishing content relevancy.
Manipulating the browser history with pushState/popState.
It is very unlikely Google will trust your content if you need to use those tricks. And content which is not trusted is not ranked.
UPDATE: Manipulating browser history with pushState is ok.
Moreover, if your URLs change all the time, Google won't appreciate it, unless you manage to set canonical links.
I'm working on a Wordpress site which displays posts through a JSON api and AngularJS. I render all post thumbnails on a page and when one is clicked the post is rendered in an overlay on the same page. The post url becomes something like mysite.com/#!/post-name.
Here's the development page http://givakt.kund.griffel.se/blogg-jobb/
Since everything is fetched by AJAX calls none of this info is available to search engines. I have tried to figure out a good approach to make it indexable but it's all very new ground to me.
Would it be possible to get content from or redirect the search engine to a php-rendered (wordpress) page, say like mysite.com/post-name, while thinking it's getting the correct content at mysite.com/#!/post-name. Is it even allowed or even frowned upon? The actual content would of course be as identical as possible at both sources.
Not sure if this is legit approach however, or if it could even work. Is there any other easier or preferred approach that I'm missing?
BTW, I have read http://www.yearofmoo.com/2012/11/angularjs-and-seo.html and how to use PhantomJS and so on to provide indexable pages. So what I'm basically asking is if there's a way to utilize wordpress pages to serve the content instead.
I'm not exactly sure how to do it in terms of technicalities, but Google is usually not happy if you show one version of the page to search engines and something else to actually visitors. It's called cloaking. Just keep it in mind.
I am trying to create a one-page WordPress website, something like the ones you sometimes see in ThemeForest's WP section: the whole website is a long page that has everything in one place, from about us, to portfolio, to some blog posts, to contacts.
Placing all things on one page is not difficult. But when I started thinking about how to present individual posts and pages, I realised that I probably need a general way of getting posts' data via AJAX, and create new blocks with JS. How should I go about this? I suppose this was done before, but I struggle to find something this specific on Codex or a tutorial with best practices.
Any advice or link will be greatly appreciated.
You could use a plugin such as jQuery Easytabs, download it here, that has a built-in Ajax component.
I've found that the easiest way is to just get all content to load into the divs ahead of time, vs. trying to load all pages through Ajax. However, appending something like '?ajax/ajax' to the end of your urls through the Easytabs plugin is one option that I have successfully used in the past.
If you decide to use the easytabs functionality, there is ample documentation on the page that I linked to.
I am developing my portfolio website and I thought that it would be nice to show websites I've created within my portfolio website.
So say I will have header with navigation of my portfolio website and beneath it a or with another website from my portfolio list.
I mentioned and as I've heard that they can do such job, but I have no idea of how to use them.
So could anyone please suggest the way that is efficient, browser friendly, wouldn't take a lot of my server resources to display different websites (and will do the job).
Additionally: the way that I was planning to do this is by having links to different websites like http://webpage.com/portfolio.php?url=http://anotherWebpage.com so when user clicks it the website shown in the (or something else) would be http://anotherWebpage.com . Also if possible could someone suggest "AJAX" method that will not require page refresh and will show different webpage in (or something else ;D ) instantly after user clicks on different links
Thank You )))
Use javascript and iframe
My page
<iframe id="iframe"></iframe>
I would say that the iframe is your best bet.
You would want to have a page with your toolbar and all your navigational elements and then an iframe that can change its source when you click a link.
You've probably heard that iframes don't always play nice with every browser but there are some ways you can eliminate the problems that come up.
Sometimes they really are the best way to do things!
Check out some tips here and here
I have build a site based on Ajax navigation.
I have build it that way, that whenever someone without javascript visits my site, the nav links, which usually load content via Ajax, are acting like normal links and the user can browse through the pages as usual.
Since, Google bot doesn't run javascript, it should theoretically be able to go through all links and corresponding sites as usual, right? Since they are valid links with the href tag pointed to the corresponding site.
Now I was wondering if thats sufficient or if I need to implant this method from Google too to make sure Google sees all my content?
Thanks for your insights and excuse my poor English!
If you can navigate your site by showing source (ctrl-u in chrome), google can also crawl your site. Yes, its that simple