It doesn't make any sense - mod_rewrite

So I am attached to this rather annoying project where a client's client is all nitpicky about the little things. He's giving my guy hell, who is gladly returning the favor by following the good old rule of shoving shi* down the chain of command.
Now, my question. The application basically consists of three mini projects: the backend interface for the administrator, the backend interface for the client, and the frontend for everyone.
I was specifically asked to apply mod_rewrite rules to make things SEO friendly. That was the ultimate aim, so this was basically an exercise in making things more search friendly rather than in making the links look better.
So I worked on the frontend, which is basically the landing page for everyone. It looks beautiful; the links are at worst followed by a single slash.
My client's issue: he wants to know why the backend interfaces for the admin and the user are still displaying those gigantic, ugly links. And these are very, very ugly links; I am talking three to four slashes followed by various GET sequences and whatnot, so you can probably understand the complexity of rewriting something like this.
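To give a sense of the general pattern (the paths and parameters here are made-up placeholders, not the project's real ones), a typical frontend rule looks something like:

    # .htaccess sketch -- hypothetical paths and parameters
    RewriteEngine On
    # /articles/42 -> index.php?page=article&id=42 (rewritten internally;
    # visitors and crawlers only ever see the pretty URL)
    RewriteRule ^articles/([0-9]+)/?$ index.php?page=article&id=$1 [L,QSA]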
On the spur of the moment, I said that I had left it the way it was to make sure the backend interface wouldn't be sniffed out by any crawlers.
But I am not sure that's necessarily true. Where do crawlers stop? When do they give up on trying to parse links? I know I can use a robots.txt file to specify rules. But, as indigenous creatures, what are their instincts?
I know this is more of a rant than anything and I am running a very high risk of having my first question rejected :| But hey, it feels good to have this off my chest.
Cheers!

Where do crawlers stop? When do they give up on trying to parse links?
Robots.txt does not work for all bots.
You can use basic authentication or IP-restricted access to hide the back end, as long as the front end doesn't need any of its files.
If that's not practicable, try sending 404 or 401 headers for back-end files. But this is just an idea, no guarantee.
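For the basic-auth/IP idea, something along these lines in the back-end directory's .htaccess could work (a sketch in Apache 2.2-style syntax; the paths, IP range and auth file are placeholders):

    # .htaccess for the back-end directory (sketch; adjust paths/IPs)
    AuthType Basic
    AuthName "Backend"
    AuthUserFile /path/to/.htpasswd
    Require valid-user

    # Or, alternatively, allow only known IPs:
    # Order Deny,Allow
    # Deny from all
    # Allow from 192.0.2.0/24

    # Or answer 404 for anything under a hypothetical /admin path,
    # per the "send 404 headers" idea above:
    # RewriteEngine On
    # RewriteRule ^admin/ - [R=404,L]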
But, as indigenous creatures, what are their instincts?
They follow hyperlinks, and they also pick up URLs through browser toolbars and browser-side, pre-activated malware-, spam- and fraud-warning features...

Related

Can I block adblock-enabled users from viewing content by wrapping content in an ad?

Just trying to find a good, simple way to block users who use Adblock Plus (ads are the only way writers can receive any money on our website, as with many sites).
It seems the plugins that block adblock-plus users can also be blocked.
I considered a simple solution. If their computers block ads, can't I just wrap my content in the ad, so they view all or nothing?
If not, can anyone think of any similar methods of denying users who want to deny any chance of compensation to the hardworking writers whose content they already enjoy without charge?
Possibly using a conditional statement dependent on the ad script?
It is impossible to do that.
What you can do, however, (and this is a very popular technique) is put an image behind your ad telling your viewers about how ad-blocking is bad. That way, you can possibly persuade your viewers to disable their ad-block.
If your viewer has ad-block on, they'll be able to see the image. If they don't have ad-block on, the ad will overlap the image, so they'll only see the ad.
By doing that, at least you can convince sympathetic people to perhaps disable their ad-block.
Aside from that, you're really out of luck.
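A minimal sketch of the image-behind-the-ad idea (the class name, image, and 300x250 size are made up; the real markup depends on your ad network):

    <!-- Container sized like the ad unit; the notice sits underneath. -->
    <div style="position: relative; width: 300px; height: 250px;">
      <img src="please-disable-adblock.png"
           alt="Ads pay our writers - please disable your ad blocker"
           style="position: absolute; top: 0; left: 0;">
      <div class="ad-slot" style="position: absolute; top: 0; left: 0;">
        <!-- your ad network's script/tag goes here -->
      </div>
    </div>

If the ad blocker removes or hides the ad element, the image underneath shows through; otherwise the ad renders on top of it and the visitor never sees the notice.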

Squid ICAP Rewriting Content

I have a need for the following:
Web App at http://mything.com/.
Reverse Proxy Server at http://mything.com/proxy/
Reverse Proxy should proxy: http://mything.com/proxy/http://bbc.co.uk/ for example.
The Reverse Proxy should rewrite all links, src, href, and so forth in the source page (http://bbc.co.uk) to rebase them under http://mything.com/proxy/$1.
We have a 90% working implementation in Ruby, developed as a Rack app: a series of 8 middlewares that do various jobs, including:
Rewrite the cookie domain
Inject a JavaScript chunk from our site
Rewrite links to fit the proxy
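The Ruby itself isn't shown here, but the link-rewriting step amounts to something like this rough Python sketch (regex-based, so it shares the fragility we're suffering from; PROXY_BASE is the prefix from above):

    import re

    PROXY_BASE = "http://mything.com/proxy/"  # the prefix from the question

    def rebase_links(html):
        # Prefix every absolute href/src so it routes back through the proxy:
        # href="http://bbc.co.uk/x" -> href="http://mything.com/proxy/http://bbc.co.uk/x"
        return re.sub(
            r'\b(href|src)=(["\'])(https?://[^"\']+)\2',
            lambda m: '{}={}{}{}{}'.format(
                m.group(1), m.group(2), PROXY_BASE, m.group(3), m.group(2)),
            html)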
Admittedly our Ruby version isn't bad, but it requires about one second to process a page, which is shockingly slow and simply doesn't cut it. It gets an order of magnitude worse when we bring SSL into the mix, and management are talking about stripping all SSL out and only working with HTTP (which, naturally, is short-sighted and incredibly foolish).
This kind of problem simply has to have been solved more widely by tools such as Squid and protocols such as ICAP.
I'd like to move away from our home grown solution as it's a source of constant pain, and it's slow.
I've just come to learn about Squid's Content Adaptation function, but haven't been able to find a decent way to make it work.
I can see the following problems which I haven't been able to solve:
How to extract the URL component from the original request URI?
i.e. how can I take .../proxy/(.*) and use that as the backend address? (A rough sketch of this follows the list.)
How to rewrite cookies in an ICAP program?
i.e. does the ICAP content adaptation program receive the full request?
How to rewrite URLs, links, etc. with ICAP? (Squirm looks promising, but it looks like things have to be hard-coded; see point #1.)
i.e. are there any smarter tools than simply using regexes, etc.?
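On point #1, Squid's url_rewrite_program can do the /proxy/(.*) extraction without ICAP at all. A minimal sketch follows; the helper line format varies between Squid versions, so check the url_rewrite_program docs for yours, and note that rewriting the page body (links, cookies) would still need ICAP/eCAP or the app:

    # squid.conf fragment (sketch):
    #     url_rewrite_program /usr/local/bin/proxy_rewrite.py
    #
    # proxy_rewrite.py -- classic Squid redirector protocol: one request per
    # line on stdin, the first field being the URL; echo the (possibly
    # rewritten) URL back on stdout.
    import sys

    PREFIX = "http://mything.com/proxy/"  # the prefix from the question

    for line in sys.stdin:
        fields = line.split()
        if not fields:
            continue
        url = fields[0]
        if url.startswith(PREFIX):
            # .../proxy/http://bbc.co.uk/news -> http://bbc.co.uk/news
            url = url[len(PREFIX):]
        sys.stdout.write(url + "\n")
        sys.stdout.flush()  # Squid waits for each answer; don't buffer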
I know this question is at risk of being flagged "demonstrates no evidence of research", but I'm at the limits of what I've been able to ascertain without investing a number of days (or so it would appear) to learn the intricacies of Squid, when the only thing I need is to rewrite some content!

iTunes URL in tweet in app

In my app, I've got a share button for Facebook, Twitter, etc. I want people to be able to tweet and post a link to my app, but the app is not yet available in the App Store, so I can't link to it.
In some apps there is a link to the app when you are composing a tweet. How do they do that?
As H2CO3 pointed out, having that AppStore URL delivered from a server you control would likely be something you would prefer. Naturally, the first question to pop to mind would be something of the form 'Why do I have to use a server at all? Can't I just code it into my app directly?'
In short form, you are not required to use a server, but you may find the benefits and flexibility it offers you, at the cost of a little more networking code and complexity in your app, helpful. Let's expand on this a bit by imagining we lived in a perfect world where we knew with 100% accuracy what that AppStore URL would be and that it would never, ever change. We could get away with just coding that URL directly into our app, and we would never again have to think about that part of our app -- it would always tweet the correct URL, users would be happy, and you as the developer could turn your attention to more important matters.
Unfortunately, things often don't work perfectly and as software developers we have to consider the ugly edges of reality and write code to handle them gracefully. Here are just a few of the potential problems with hard-coding a URL:
We don't actually know the full AppStore URL.
Apple may decide to change AppStore URL formats and render old URLs invalid.
Despite our best efforts to carefully type the URL into our code, we made a mistake and now we have to issue an update and wait for our app to be reviewed to get a simple typo fixed - Eep!
H2CO3's suggestion to use a server likely stems from experience in writing software, coupled with a developer's inherent risk-management-driven thought process -- if there is something I can do to make my software handle these ugly edges more gracefully and make my software more reliable, it may make sense to take the extra time to implement my feature differently, to protect it from the shadowy unknowns the future may have in store for my app.
For the sake of a balanced argument against putting the URL on a server:
If your users are in a place that has spotty cell or wifi coverage, they may not be able to connect to your server to get the AppStore URL.
Your server might not be working properly so it can't deliver information to your app when requested.
Adding a network request like this can be very easy to do, but also introduces its own set of risks to have to consider (isn't writing software great?)
As indicated above, you do not need to make your app snag the URL from a server you control, but you may want it to do so. Only you, as the app's developer, can determine what degree of risk you are willing to accept and which of the available options you've researched, invented, or otherwise acquired seem to fit your specific needs the best.
Since I don't know your background, I'm going to stay relatively high level and give you another couple of nudges in some directions you can go and do some additional research:
On the 'set up a server' idea -- you can purchase a hosting account from a number of hosting providers around the Internet, or if this is a school or work project, you may be able to speak with your IT people to request space for a website. After you have that set up, you can put a file on that space with a placeholder URL and write some code in your app to connect to your server and read your file (the one with your fake URL in it!), then put it into your tweet. Once your app is approved, you can replace the fake URL on your server with the real one, and your app will work like you want. For the app part of things, you might look into some of the simple +stringWithContentsOfURL: methods on NSString (though do remember to consider things like what happens if the Internet is down, or if you don't get anything back from your server, etc.!)
On the 'just hard code the URL into the app' idea -- Apple makes some marketing resources available to developers even before they release an app. Check out https://developer.apple.com/appstore/resources/marketing/index.html, with an emphasis on the Shortlinks section, and also check out Technical Q&A 1633 (https://developer.apple.com/library/ios/#qa/qa1633/_index.html). Both of these links give you information about how to build a link directly to an application or vendor on the AppStore given just their name(s). Like before, do remember to consider what happens if you ever decide to rename your app, or if linking elsewhere (or maybe nowhere!) would make more sense.
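For reference, the shortlinks those pages describe are built from just the app or company name, along these lines ("mygreatapp" and "somecompany" are hypothetical; check the linked pages for the exact rules about spaces and punctuation):

    http://appstore.com/mygreatapp          (marketing shortlink, app name)
    http://itunes.com/apps/mygreatapp       (QA1633, app name)
    http://itunes.com/apps/somecompany      (QA1633, company/vendor name)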
Hopefully this will help you think a little more about what you are actually trying to achieve, and give you a sense about what other developers think about when faced with decisions like the one you've posed here.
While I agree with Bryan, I've always avoided using servers for basic things. With iOS 5+ you can send tweets from inside the app, and you can add a default tweet (i.e. a link to the app).
Your problem can be solved easily this way: make a short link pointing to the App Store (App Store links are formatted like this: https://itunes.apple.com/app/id<app id>, and the app ID is the one in iTunes Connect under Apple ID).
For example, you can make a default tweet like this: "Check out this awesome app!! goo.gl/buya", and then the user can edit it as they wish.
Also, it's extremely unlikely that Apple will change the format of their links; too many users depend on this format to do a lot of things.

How effective is ajaxcrawling compared to serverside created website SEO?

I'm looking for real world experiences in regards to ajaxcrawling:
http://code.google.com/web/ajaxcrawling/index.html
I'm particularly concerned about the infamous Gizmodo failure of late. I know I can find them via Google now, but it's not clear to me how effective this method of AJAX crawling is in comparison to server-side generated sites.
I would like to make a wiki that lives mostly on the client side and is populated by AJAX/JSON. It just feels more fluid, and I think it would be a plus point over my competition (Wikipedia, Wikimedia).
Obviously, for a wiki it's incredibly important to have working SEO.
I would be very happy to hear about any experiences you have had with client-side development.
My research shows that the general consensus on the web right now is that you should absolutely avoid doing AJAX sites unless you don't care about SEO (for example, a portfolio site, a corporate site, etc.).
Well, these SEO problems arise when you have a single page that loads content dynamically based on sophisticated client-side behavior. Spiders aren't always smart enough to know when JavaScript is being injected, so if they can't follow links to get to your content, most of them won't understand what's going on in a predictable way, and thus won't be able to fully index your site.
If you could have the option of unique URLs that lead to static content, even if they all route back to a single page by a URL rewriting scheme, that could solve the problem. Also, it will yield huge benefits down the road when you've got a lot of traffic -- the whole page can be cached at the web server/proxy level, leading to less load on your servers.
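Concretely, the scheme in the linked Google doc works by mapping #! URLs onto an _escaped_fragment_ query parameter that crawlers request instead. A hedged Apache sketch of the serving side (snapshot.php is a hypothetical server-side renderer, /wiki a made-up path):

    # Crawlers request /wiki?_escaped_fragment_=Some_Page in place of
    # /wiki#!Some_Page; serve them a pre-rendered HTML snapshot instead
    # of the empty JavaScript shell.
    RewriteEngine On
    RewriteCond %{QUERY_STRING} ^_escaped_fragment_=(.*)$
    RewriteRule ^wiki$ /snapshot.php?page=%1 [L]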
Hope that helps.

Why is CoreGui RobloxLocked in the DataModel and why can't trusted users use CoreScripts?

We should be able to access some of it so that we can edit the placement of each GUI object inside of CoreGui. So, other than security reasons, why are we not allowed to edit placement of GUI objects?
Also, why can't trusted users use CoreScripts? What if they need to access HttpGet so they can provide a nice display showing where their best friend is at the current time and place? SocialService won't always do the trick.
Can a developer (or any other experienced Roblox player, particularly one that knows the UI in and out) please answer these questions to the best of his/her ability?
I asked this in the OBC cast, specifically about editing the UI inside CoreGui. I'm not sure what security reasons could be preventing this, however. They did reply - the answer was, "Well, we definitely don't want you moving the little help icon, or the exit button."
I got the feeling the general reason is that users would become confused if everything were misplaced. For example, if you went to a website where you could play several games all made by that company (like ROBLOX), would you expect the exit or help buttons to be placed differently in every game?
They did say we will be able to change the colours.
Hope this clears things up.
There are some GUI objects, like the report-abuse button, that we don't want users to be able to remove. Another sensitive area is the chat window. If it were completely scriptable, you could write a script to make it look like another user was saying something that he wasn't. This is not really desirable.
HttpGet is currently a privileged function for two main reasons:
It would allow users to get dynamic content into levels, which would make moderation a more difficult task.
Poorly or maliciously written scripts could HttpGet roblox.com in an infinite loop, sapping our server resources.
There was no obvious benefit, but some obvious downsides. We prefer to solve only the problems that need to be solved in order to ship features, so we err on the side of caution for things like this. If we later decide to open up new functionality, like making the ROBLOX social graph available through an API, we can do that with a dedicated interface that limits the number of requests you can make to the website in a given period, and only return the info that we are sure we want you to be able to get.
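That kind of "N requests per period" gate is typically just a token bucket. A generic Python sketch of the idea (nothing ROBLOX-specific; the numbers are arbitrary):

    import time

    class TokenBucket:
        """Allow up to `rate` requests per `per` seconds, smoothly refilled."""
        def __init__(self, rate, per):
            self.capacity = float(rate)
            self.tokens = float(rate)
            self.refill = rate / float(per)   # tokens added per second
            self.last = time.monotonic()

        def allow(self):
            now = time.monotonic()
            self.tokens = min(self.capacity,
                              self.tokens + (now - self.last) * self.refill)
            self.last = now
            if self.tokens >= 1:
                self.tokens -= 1
                return True
            return False   # caller should reject or delay the request

    # e.g. one bucket per user: 60 API calls per minute
    bucket = TokenBucket(rate=60, per=60)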
It's interesting to note that for a very long time Adobe Flash player didn't support TCP sockets for the same reason.
