Bulk import + export URL rewrites for Magento

I found a "bulk import and export URL rewrites" extension for Magento while searching online for how to bulk redirect URLs from my current live site to the new URLs of the new site, which is still on a development server.
I've asked my programmer to help me out, and they've sent me two CSV files: one with all request and target URLs from the current live site (these often differ, probably due to earlier redirects), and a similar one for the new site. The current live site has about 2,500 URLs, the future site about 3,500 (probably because some old, inactive and unnecessary categories are still present in the new site as well).
I was thinking of pasting the current site's URLs into an Excel sheet and then entering the matching future URL for each one. A lot of work… Then I thought: can't I limit my work to the approx. 300 URLs that Google has indexed (which can be found through Google Webmaster Tools, as you probably know)?
What would you recommend? Would there be advantages to using such an extension, and if so, which ones? (Keep in mind that my programmer would upload all of my redirects into an .htaccess file for me.)
Thanks in advance.
Kind regards, Bob.

Axel is giving HORRIBLE advice. A 301 redirect tells Google to transfer the old page's PageRank and authority to the new page in the redirect. Moreover, if other websites linked to those pages, you don't want a bunch of dead links; someone might just remove them.
Even worse, if you don't handle the 404s correctly, Google can and will penalize you.
ALWAYS set up 301 redirects when changing platforms; only someone who either doesn't understand or doesn't care about SEO would suggest otherwise.
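If it helps to picture the .htaccess route: once you have a single CSV whose two columns are the old path and the new URL (an assumption about the format; adjust to whatever your programmer actually exports), a few lines of PHP can turn it into Redirect 301 rules. A sketch, with redirect-map.csv as a hypothetical file name:

<?php
// redirect-map.csv (hypothetical): rows like "/old-page.html,https://www.example.com/new-page.html"
$handle = fopen('redirect-map.csv', 'r');
while (($row = fgetcsv($handle)) !== false) {
    list($old, $new) = $row;
    // One line per mapping, ready to paste into .htaccess
    echo 'Redirect 301 ' . $old . ' ' . $new . "\n";
}
fclose($handle);

Run it once, paste the output into .htaccess, and you avoid typing 300 (or 2,500) rules by hand.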

Related

Robots.txt in Magento Websites

I have recently started working with a company that has a Magento eCommerce website.
We spotted that the traffic dipped considerably in May, and so did the ranking on Google.
When I started investigating I saw that the pages of the ES website were not appearing in Screaming Frog.
Only the homepage showed, and the status said "blocked by robots.txt".
I mentioned this to my developer and they said they would move the robots.txt file to the /pub folder.
But would that not mean the file was in two places? Would this be an issue?
The developer has gone ahead and done this; how long should it take to see whether Screaming Frog is crawling the pages?
Are there any Magento developers that could help with advice on this?
Thanks
Neo
There is a documentation page on how to manage robots.txt in Magento 2.x.
And you can use this to allow all traffic to your site:
User-agent: *
Disallow:
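For contrast, and this is often the culprit when a tool reports "blocked by robots.txt": adding a single slash to the Disallow line blocks the entire site instead of allowing everything:
User-agent: *
Disallow: /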
Regarding the Googlebot crawl rate, here is some explanation.
According to Google, "crawling and indexing are processes which can take some time and which rely on many factors." It's estimated that it takes anywhere between a few days and four weeks before Googlebot indexes a new site. If you have an older website that isn't being crawled, the design of the site may be a problem. Sometimes sites are temporarily unavailable when Google attempts to crawl them, so checking the crawl stats and looking at the errors can help you make changes to get your site crawled. Google also has two different crawlers: a desktop crawler to simulate a user on a desktop, and a mobile crawler to simulate a user on a mobile device.

Website looks different when I upload it via FTP

I started developing a few weeks ago and I bought a domain, but when I upload the files to the live server, the website looks different from what I uploaded. This gets fixed when I clear my cache. The problem is that my visitors see the page one way, and after I update it they still see the previous version!
Is there any possible solution for this? I don't want my visitors to clear cache every time I make a change on my website!
This is most likely due to CSS caching: a cached version of the stylesheet is being served. You can control the cache lifetime in a few ways; ETags and .htaccess rules (on Apache) are the most common.
A very simple trick is to add a GET-style parameter to the end of your stylesheet URL (where you load your main style in the head of the document), like this:
main.css?v=2
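If the site is served through PHP, one way to avoid bumping that version number by hand is to use the stylesheet's modification time as the parameter. A sketch (main.css stands for whatever your main stylesheet is called):

<link rel="stylesheet" href="main.css?v=<?php echo filemtime('main.css'); ?>">

Whenever the file changes on disk, the query string changes with it, so browsers fetch the new stylesheet instead of reusing the cached one.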

Magento - prevent browsing without URL rewrites

I have a problem with someone (using many IP addresses) browsing all over my shop using:
example.com/catalog/category/view/id/$i
I have URL rewrite turned on, so the usual human browsing looks "friendly":
example.com/category_name.html
Therefore, the question is: how do I prevent the shop from being browsed via the "old" (non-rewritten) URLs, so that only the "friendly" URLs are allowed?
This is pretty important, since the crawler is using hundreds of threads, which is making the shop really slow.
Since there are many random IP addresses, clearly you can't just block access from a single address or a small group of addresses. You may need to implement some logging that somehow identifies this crawler uniquely (maybe by user agent, or possibly with some clever use of the Modernizr JavaScript library).
Once you've been able to distinguish some unique identifiers of this crawler, you could probably use a rule in .htaccess (if it's a user-agent thing) to redirect it or otherwise prevent it from consuming your server's resources.
This SO question provides details on rules for user agents: "Block all bots/crawlers/spiders for a special directory with htaccess".
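If the crawler does identify itself with a distinctive User-Agent string, a rule along these lines would refuse it. A sketch, assuming Apache with mod_rewrite; "BadBot" and "EvilScraper" are placeholders for whatever strings show up in your access logs:

RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} (BadBot|EvilScraper) [NC]
RewriteRule ^ - [F,L]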
If the spider crawls all the urls of the given pattern:
example.com/catalog/category/view/id/$i
then you can simply kill these URLs in .htaccess. The rewrite from category.html to /catalog/category/view/id/$i is made internally, so you would only be blocking the bots (and anyone else requesting the old URLs directly).
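A sketch of such a rule, assuming Apache with mod_rewrite and that it is placed before Magento's own rewrite to index.php (the friendly URLs never reach Apache under /catalog/..., so only direct external requests are affected):

RewriteEngine On
RewriteCond %{REQUEST_URI} ^/catalog/category/view/id/ [NC]
RewriteRule ^ - [F,L]

This answers those requests with a 403 instead of letting them hit the application.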
Once the rewrites are there... they are there. They are stored in the Magento database for several reasons: one is crawlers like the one hitting your site, another is users who might have the old pages bookmarked. People have come up with a number of methods to go through and clean up these rewrites (Google around), but as it stands, once they exist they are not easily managed from within Magento.
I might suggest generating a new sitemap and submitting it to the search engine whose crawler is affecting your site. Not only is this crawler crawling tons of pages it doesn't need to, it's also going to see duplicate content (bad juju).

Does the robots meta tag NOINDEX remove indexed URLs?

We have an application with about 15,000 pages. For SEO reasons we had to change the URLs. Google had already crawled all of these pages earlier, and because of the change we now see a lot of duplicate titles/meta descriptions in Webmaster Tools. Our impressions on Google have dropped and we believe this is the reason; correct me if my assumption is incorrect. We are not able to write a regular expression for the URL change and use a single 301 redirect rule, because of the way the URLs changed. The only way to do it would be to write 301 redirects for individual URLs, which is not feasible for 10,000 URLs. Can we use a robots meta tag with NOINDEX instead? My question basically is: if I add a NOINDEX meta tag, will Google remove the already indexed URLs? If not, what are the other ways to remove the old indexed URLs from Google? Another thing I could do is make all the previous pages return 404 errors to avoid the duplicates, but would that be the right thing to do?
"We are not able to write a regular expression for the URL change and use a single 301 redirect rule, because of the way the URLs changed. The only way to do it would be to write 301 redirects for individual URLs, which is not feasible for 10,000 URLs."
Of course you can! I'm rewriting more than 15000 URLs with mod_rewrite and RewriteMap!
This is just a matter of scripting (echo all the URLs) and mastering vim, but you can do it, and easily. If you need more information, just ask.
What you can do is create a RewriteMap file like this:
/baskinrobbins/branch/branch1/ /baskinrobbins/branch/Florida/Jacksonville/branch1
I've made a huge answer here and you can very easily adapt it to your needs.
I could do that job in 1-2 hours max but I'm expensive ;).
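A sketch of how such a map could be wired up. This goes in the server or virtual-host configuration, since RewriteMap is not allowed in .htaccess; the map name and file path below are placeholders, and the map file simply holds one "old-path new-path" pair per line, as in the example above:

RewriteEngine On
RewriteMap redirectmap txt:/etc/apache2/redirect-map.txt
RewriteCond ${redirectmap:$1} !=""
RewriteRule ^(.*)$ ${redirectmap:$1} [R=301,L]

Lookups in a txt: map are linear, so with 10,000+ entries it is usually worth converting the file to a dbm: map with Apache's httxt2dbm tool.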
Reindexing is slow
It would take weeks for Google to ignore the older URLs anyways.
Use .htaccess 301 redirects
You can add a file on your Apache server, called .htaccess, that lists all the old URLs and the new URLs and instantly redirects the user to the new page. Can you generate such a text file? I'm sure you can loop through the sections in your app, or whatever, and generate a list of URLs.
Use the following syntax.
Redirect 301 /oldpage.html http://www.yoursite.com/newpage.html
Redirect 301 /oldpage2.html http://www.yoursite.com/folder/
This prevents 404 File Not Found errors, and it is better than a meta refresh or redirect tag because the old page is not even served to clients.
I did this for a website that had gone through a recent upgrade, and since Google kept pointing to the older files, we needed to redirect clients to the new content instead.
Where's .htaccess?
Go to your site's root folder, create or download the .htaccess file to your local computer, and edit it with a plain-text editor (e.g. Notepad). If you are using FTP client software and you don't see an .htaccess file on your server, make sure you are viewing invisible/system files.

Content Watermarking

We have members-only paid content that is frequently copied and republished without our permission.
We are trying to 'watermark' our content by including each customer's user ID in a fake CSS class, for example <p class='userid_1234'> (except not so obvious, of course :), which would help us track the source of the copying; we place that class somewhere in the article body.
The problem is, by including user-specific information into an article, it makes it so that the article content is ineligible for caching because it is now unique to each user.
This bumps the page load time from ~0.8 ms to ~2.5 sec for each article page view.
Does anyone know of any watermarking strategies that can still be used with caching?
Alternatively, what can be done to speed up database access? ( ha, ha, that there’s just a tiny topic i’m sure.. )
We're using the CMS Expression Engine, but I'd like to hear about any strategies. They don't have to be EE-specific.
If you're talking about images then you could use PHP to add a watermark to the images.
How can I add an image onto an image in PHP like a watermark
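For completeness, a minimal sketch of that approach with PHP's GD extension (file names are just examples): it stamps watermark.png onto the bottom-right corner of photo.jpg and writes the result to a new file.

<?php
// Load the photo and the watermark (example file names).
$photo = imagecreatefromjpeg('photo.jpg');
$mark  = imagecreatefrompng('watermark.png');
// Copy the watermark onto the photo, 10px in from the bottom-right corner.
imagecopy(
    $photo, $mark,
    imagesx($photo) - imagesx($mark) - 10,
    imagesy($photo) - imagesy($mark) - 10,
    0, 0, imagesx($mark), imagesy($mark)
);
imagejpeg($photo, 'photo-watermarked.jpg', 90);
imagedestroy($photo);
imagedestroy($mark);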
It's a tool to help track down the lazy copiers who just copy the source code as-is. This is not preventative, nor is it a deterrent. – Ian
Going by your comment above, you are happy with users copying your content, just not without the formatting etc. So what you could do is provide users with an embed-style snippet for that particular content, just like YouTube does with videos. Into that embed code you could add your own links back to your site, use your own CSS, etc.
That way you can still allow the members to use the content but it will always come out the way you intended it with links back to your site.
Thanks
You could always cache a version that uses a special string, like #!username!#, and then later fill it in with PHP based on which user is viewing it.
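A sketch of that idea (the cache path is hypothetical, and how you look up the current member depends on your CMS): the cached HTML is shared by everyone, and only the token is swapped per request.

<?php
// One cached copy of the article, containing the literal token #!username!#
$html = file_get_contents('/var/cache/articles/article-1234.html'); // hypothetical cache path
// Swap the token for the current visitor's ID (the session lookup is an assumption)
$userId = isset($_SESSION['member_id']) ? (int) $_SESSION['member_id'] : 0;
echo str_replace('#!username!#', 'userid_' . $userId, $html);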
Another way, I believe, is to switch from caching on the server to letting the browser cache the page locally for a while. That way it is only cached per user, and it still reduces the calls to your database. Because an article is pretty static, you could let the local machine cache it and pull the comments in via JavaScript.
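If you go the browser-caching route, the relevant headers can be sent from PHP before any output; the max-age value here is just an illustration.

<?php
// Let the visitor's browser keep its own private copy for five minutes.
header('Cache-Control: private, max-age=300');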
This last one is probably not what you are really looking for, but I'm going to say it anyway. You could stop treating your users like thieves and instead treat the thieves as thieves. Go to whoever is hosting the servers your content has been copied to and send them an email telling them that copyrighted premium content is being hosted on their servers without your permission. You can even automate that process.
How do you find out which sites are reposting your content? Put a link to your site in the body content and do a Google search / blog search for articles linking to it. To automate this, use Google Blog Search because it offers RSS feeds. Anything with a link back to your site could go into a database with a link to the page; someone could look at it, and if it is the entire article, do a WHOIS lookup and send them an email.
What makes you think adding CSS to something is going to stop people from copying it without that CSS? It's more likely that they are just copying the source of the content you are showing them and ignoring all the styling around it. For example, I use Tamper Data to look at all the HTTP requests made by Firefox; if I can see it on the page, I can see it in the logs. Even with all the "protection" some sites try to put in place, it generally never works. I can grab what I want without using any screen capture/recording.
If you were serving FLVs, for example, I would easily be able to grab the source of those even if you overlaid them with some CSS. I think the best approach would be to contact the sites publishing your premium content and ask them to remove it. It's either that or watermark the actual content on the fly while sending it to the browser.
