How to know quantity of users with turned off images in browser? - image

I'm working on the quite popular website, which looks good if user has turned on "Load images" option in his browser's settings.
When you try to open the website with "turned off images" option, it becomes not usable, many components won't work, because user won't see "important" buttons(we don't use standard OS buttons).
So, we can't understand and measure negative business impact of this mistake(absent alt/title attributes).
We can't set priority for this task - because we don't know how much such users comes to our website.
Please give me some advice how this problem can be solved?

Look in the logs for how many hits you get on a page without the subsequent requests from the browser for the other images.
Of course the browser might have images cached, so look for the first time you get a hit.
You can even use IP address for this, since it's OK if you throw out good data (that is, hits that are new that you disregard). The question is just: Of the hits you know are first-time, how many don't get images?
If this is a public page (i.e. not a web application that you've logged in to), also disregard search engine bots to the greatest extent possible; they usually won't retrieve images.

Related

How to read content from reCAPTCHA protected site

My client needs data scraped from a website. I am planning to use php_curl. The problem is, the site is using Google reCAPTCHA. Few powerful data items are visible only when you click "show this information link". then the reCAPTCHA appears in lightbox and vanishes, and information is displayed.
I have checked the source html, the protected item is actually loaded when someone clicks, and there is no way for me to automate this click. I have even tried to open the site in iframe and then use JS to click it, but it fails as both domains are different. I have also tried to use Selenium stand alone version but its downloads are corrupt.
Unless there is a design flaw with the website, the reCAPTCHA will prevent you from scraping the material without human intervention.
Technically, your best bet is to employ humans to solve CAPTCHAs all day and write some software to automatically scrape the material it protects for each one they solve. A number of viable businesses have been created this way, where the data is valuable and there is a genuine public interest in opening the data-set. (For example I heard that flight companies use CAPTCHA devices to prevent price comparison sites from driving down the cost to the consumer, and I'd argue in such a case there is an overwhelming public interest to defeating such defences).
Morally, however, you would need to tell us what you are doing in order for us to advise you. It is possible your client is merely planning to steal other people's material and then attempt to monetise it for him/herself, even though they had no hand in creating it. That may breach some copyright laws, but moreover, they (and you) need to decide if the scraping is fair.
I am facing the same problem but resolved it using clear my cookies in httprequest in useragent after clear cookie wait time function (tread sleep) for some time and then start scrapping again. But I am doing this in C#, not in PHP. Applying this logic may help you.

Can I blog adblock-enabled users from viewing content by wrapping content in an ad?

Just trying to find a good, simple way to block users who use adblock plus (ads are the only way writers can receive any money on our website, as with many sites).
It seems the plugins that block adblock-plus users can also be blocked.
I considered a simple solution. If their computers block ads, can't I just wrap my content in the ad, so they view all or nothing?
If not, can anyone think of any similar methods of denying users who want to deny any chance of compensation to the hardworking writers whose content they already enjoy without charge?
Possibly using a conditional statement dependent on the ad script?
It is impossible to do that.
What you can do, however, (and this is a very popular technique) is put an image behind your ad telling your viewers about how ad-blocking is bad. That way, you can possibly persuade your viewers to disable their ad-block.
If your viewer has ad-block on, they'll be able to see the image. If they don't have ad-block on, the ad will overlap the image, so they'll only see the ad.
By doing that, at least you can convince sympathetic people to perhaps disable their ad-block.
Aside from that, you're really out of luck.

Ajax Website, problems with History and Seo

I have a few problems i could use some input on.
I have a website, where all the content is loaded with ajax, it works quite well. There are a few issues with that approach though, or some UX issues.
User cannot copy URL from loaded content, since it will allways show the default URL only.
SEO will take a hit, since it cannot be crawled, the sitemap is like 2 pages only, even though when a normal user browses, they will see alot more.
Browser history, back and forward, does not work. Hitting the back button goes to the main page.
Now, i have searched and read alot.
Google has a hack, that seems to allow the site to be crawled, IF you use # in your url, does not work with empty url, which leads me to...
Manipulating the browser history with pushState/popState.
Now, i have tried getting it to work, but i just cant get my head around which process is the best way to take. Should i redo all my ajax?
Right now i have 2 div boxes, and i switch between them with loaded content, to get that nice sweet transition between pages. My frontpage is basically just 2 empty divs, nothing else. It works, but i get the feeling it is a pretty bad way to do it, thoughts?
If anyone know some good guides, feel free to give me, i have as i said read alot, but i might have missed some golden ones out there.
Google does execute some Javascript when indexing and ranking pages. However, text which is not immediately visible to users is demoted when establishing content relevancy.
Manipulating the browser history with pushState/popState.
It is very unlikely Google will trust your content if you need to use those tricks. And content which is not trusted is not ranked.
UPDATE: Manipulating browser history with pushState is ok.
Moreover, if your URLs change all the time, Google won't appreciate it, unless you manage to set canonical links.

Does your average web user understand the concept of the clipboard?

I'm designing a web site where I would expect the intended audience to have limited computer skills.
An important part of my site's functionality will require the end user to copy an URL that my site generates and use it in emails, or social network postings, or on their own web site.
I could write the URL to the clipboard when a button is pushed (like tinyurl.com does). However I'm wondering whether the average user even understands what that means and how to use it.
Any guidence will be appreciated.
There are several paths of though possible, you can make many of the scenarios possible at once.
Make the URL copied at once.
Make the URL copied when pushing the button.
Make the URL selected at once so the user can copy it directly
Of course since the URL is copied at once the two other actions does not need to do anything with the clipboard unless the user has modified the clipboard after the URL was presented.
When the URL is copied, at least in the first two steps you could show a message for the user that is has been copied, that makes it useful for those that understand it.
For those that might be focused on copying the URL themselves step 3 helps them on their way.
Whatever path the user tries should go as smooth as possible without forcing them to understand the benefits offered. In the long run it might help them, but if only used a few times the work of understanding what is going on outweighs the benefit.

Content Water Marking

We have members-only paid content that is frequently copied and republished without our permission.
We are trying to ‘watermark’ our content by including each customer’s user id in a fake css class, for example <p class='userid_1234'> (except not so obivous, of course :), that would help us track the source of the copying, and then we place that class somewhere in the article body.
The problem is, by including user-specific information into an article, it makes it so that the article content is ineligible for caching because it is now unique to each user.
This bumps the page load time from ~.8ms to ~2.5sec for each article page view.
Does anyone know of any watermarking strategies that can still be used with caching?
Alternatively, what can be done to speed up database access? ( ha, ha, that there’s just a tiny topic i’m sure.. )
We're using the CMS Expression Engine, but I'd like to hear about any strategies. They don't have to be EE-specific.
If you're talking about images then you could use PHP to add a watermark to the images.
How can I add an image onto an image in PHP like a watermark
its a tool to help track down the lazy copiers who just copy the source code as-is. this is not preventative, nor is it a deterrent. – Ian 12 hours ago
Going by your above comment you are happy with users copying your content, just not without the formatting etc. So what you could do is provide the users an embed type of source code for that particular content just like YouTube does with videos. Into that embed source code you could add your own links back to your site, utilize your own CSS etc.
That way you can still allow the members to use the content but it will always come out the way you intended it with links back to your site.
Thanks
You could always cache a version that uses a special string, like #!username!#, and then later fill it in with PHP based on which user is viewing it.
Another way I believe is to switch from caching on the server to instead let the browser cache it locally for a little. That way it is only cached per user, and it reduces the calls to your database. Because an article is pretty static, you could just let the local computer cache it, and pull in comments via javascript.
This last one is probably not one you are really looking for, but I'm gonna come out and say it anyway. You could not treat your users like thieves, and instead treat the thieves as thieves. Go to the person hosting the servers your content is on and send them an email telling them copyrighted premium content is being hosted on their servers without your permission. You can even automate that process.
How to find out what sites are posting your content? Put a link in the body content to your site, and do a Google Search/Blog Search for articles linking to that site. To automate it, use Google Blog Search because it offers RSS feeds. Any one that has a link back to your site could go into a database with a link to the page, someone could look at it, and if it is the entire article, go do a Whois and send them an email.
What makes you think adding css to something is going to stop people from copying it without that CSS? It's more likely that they are just coping the source of the content you are showing them and ignoring all the styling around it. For example, I use tamper data to look at all HTTP requests made by Firefox, if I can see it on the page, I can see it in the logs. Even with all the "protection" some sites try to put in place, they generally will never work. I can grab what I want, without using any screen capture/recording.
If you were serving flv's, for example, I would easily be able to grab the source of that even if you overlayed it with some CSS. I think the best approach would be to get the sites publishing your premium content and ask them to remove it. It's either that or watermark the actual content on the fly while sending it to the browser.

Resources