Google says: Sitemap.xml is HTML - sitemap

So I have been having a war with Google these last couple of days.
I created a sitemap (which is generated dynamically) and routed to it using a reverse proxy in nginx.
Happy with myself, I submitted the URL to Google Search Console.
I got an error right away, as seen in the picture: "Sitemap is HTML".
After digging around for a while, it turned out that our pre-renderer had picked up the request and served Google a pre-rendered version of the XML file, that is, as HTML.
But even after fixing this and making sure no request for sitemap.xml goes to our pre-renderer, Google still gives the same error message.
I have tried removing it and adding it again in Search Console multiple times on different days, I have tried waiting, I have tried serving it under another name (sitemap2.xml), and I have tried adding an actual XML file instead of the dynamic one. Nothing works!
I have verified the XML file on multiple other sites after I disabled the pre-renderer, and every one gives me an OK.
It's as though Google is ignoring my requests to re-check the file.
Sitemap location: https://www.tirex.se/sitemap.xml
Any tips at this point would be much appreciated!
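For anyone debugging the same symptom, here is a minimal Python sketch of the check described above: does the response really look like an XML sitemap, or did something (such as a pre-renderer) turn it into HTML? The function names `fetch` and `diagnose_sitemap` are made up for this example, and the idea that the pre-renderer keys on the User-Agent header is an assumption, not something confirmed for this setup.

```python
import xml.etree.ElementTree as ET
from urllib.request import Request, urlopen

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def fetch(url, user_agent):
    """Fetch a URL with a chosen User-Agent; returns (content_type, body_bytes)."""
    request = Request(url, headers={"User-Agent": user_agent})
    with urlopen(request, timeout=10) as response:
        return response.headers.get("Content-Type", ""), response.read()


def diagnose_sitemap(content_type, body):
    """Return 'ok', or a short reason why the response is not an XML sitemap."""
    media_type = content_type.split(";")[0].strip().lower()
    if media_type not in ("application/xml", "text/xml"):
        return "unexpected Content-Type: " + (media_type or "missing")
    try:
        root = ET.fromstring(body)
    except ET.ParseError as exc:
        return "not well-formed XML: " + str(exc)
    allowed_roots = ("{%s}urlset" % SITEMAP_NS, "{%s}sitemapindex" % SITEMAP_NS)
    if root.tag not in allowed_roots:
        return "unexpected root element: " + root.tag
    return "ok"
```

Running `diagnose_sitemap(*fetch(url, user_agent))` once with a browser-like User-Agent and once with a crawler-like one would show whether the two get different responses.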

Related

Google Search Console verification fails on single site while many others had no issue

Interestingly, this is apparently the official way to reach Google API support? (...akin to Microsoft/SO's documentation partnership?) Interesting — but obviously this limits the private information that I can include in my "support request"...
I have added-then-verified 400+ domains (with each of their http/https/www/no-www variations, for 800+ total) on Google Search Console via the related APIs, without issue.
One domain is giving me a problem with verification via 'HTML File Upload', even though it's triple-checked to be set up the same as the other 825 that verified without issue.
I compared WHOIS and the intodns.com DNS Health report, and I also cleared the DNS cache and waited a couple of days to see if it was a caching issue.
        
I've tried multiple verification methods, but this error persists on both the http:// and http://www. versions of the one site. The site itself works fine and I can't see any anomalies with it on my end.
I'm not sure if this could be related, but the webmaster's site list does include one strange property that is apparently verified (in addition to the two unverified versions of the problem domain):
(I've masked the ID number since I have no idea what it represents.)
How can I get my ownership of this site verified on Google Search Console?
You can verify ownership of your site using an alternate method. Inserting an HTML tag is an easy way to do it; Search Console gives you the tag to add. Other ways to verify ownership are Google Tag Manager and Google Analytics.
A sample HTML tag is: <meta name="google-site-verification" content="String_we_ask_for">
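Before asking Search Console to verify, it can help to confirm that the tag really appears in the HTML your server returns. The following is a rough Python sketch using only the standard library; the class and function names are invented for this example, and it only inspects a string of HTML that you have already downloaded.

```python
from html.parser import HTMLParser
from typing import List


class VerificationTagFinder(HTMLParser):
    """Collects the content of every google-site-verification meta tag."""

    def __init__(self):
        super().__init__()
        self.tokens = []

    def handle_starttag(self, tag, attrs):
        # Also called for self-closing tags such as <meta ... />.
        if tag != "meta":
            return
        attr_map = dict(attrs)
        if (attr_map.get("name") or "").lower() == "google-site-verification":
            self.tokens.append(attr_map.get("content") or "")


def find_verification_tokens(html: str) -> List[str]:
    """Return the verification tokens found in the given HTML text."""
    finder = VerificationTagFinder()
    finder.feed(html)
    return finder.tokens
```

An empty result for the home page of the problem domain would suggest the tag is not being served, for example because a cache or a different virtual host answers the request.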

How to remove files from Varnish-cache

I'm developing a game in JS/PHP. When I first uploaded my project, it contained a file named "index.html" with nonsensical content (only the word "bla" and a Facebook like button). I later deleted that "index.html" so that requests to the domain would hit my "index.php" instead (which contains the actual game).
This happened over a week ago, and I still see people (friends I asked to test the game) getting this dumb "index.html" when they open the site in their browsers. I also see this happening in roughly a third of the browsers when requesting screenshots via browserstack.com or browsershots.org.
I'm assuming the index.html is still cached by cloudControl's Varnish cache, but I can't find any way to clear this cache for my site. How can I do this, or what else can I do to get rid of this cached version?
For anyone who wants to test this live: http://dotgame2.cloudcontrolled.com/ (note that this doesn't happen every time or for everyone)
Consider using cache breakers that depend on the deployment version. You can also try our *.cloudcontrolapp.com routing tier, which does not provide caching at all: http://dotgame2.cloudcontrolapp.com.
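As an illustration of a version-dependent cache breaker, here is a small Python sketch that appends the deployment version to a URL as a query parameter, so each deployment produces URLs the cache has not seen before. The function name, the parameter name `v`, and the version strings are assumptions made for this example; it applies to URLs your app emits itself (scripts, styles, links), not to what a visitor types into the address bar.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_cache_breaker(url, version, param="v"):
    """Return url with ?<param>=<version> set, replacing any earlier value."""
    parts = urlsplit(url)
    # Keep existing query parameters, but drop a stale cache breaker if present.
    query = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != param
    ]
    query.append((param, version))
    return urlunsplit(parts._replace(query=urlencode(query)))


print(with_cache_breaker("/js/game.js", "42"))  # prints /js/game.js?v=42
```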

URL Re-writes and Google Indexing

I was asked to perform some URL re-writes for a new site with numerous dynamic pages and this has all worked fine.
However, when I look at the URLs that Google has indexed, it has indexed the 'non-rewritten' URLs, so all the '?', '&' etc. are still being used.
What do you have to do to force Google to index your re-written URLs?
I just assumed it would do this automatically and never expected it to be an issue.
All help is gratefully appreciated.
Thanks.
Steps
1) Make sure that expired pages are no longer publicly accessible
2) Anything you do not wish Bots to crawl should be flagged with appropriate "nofollow" meta tags
3) Submit a new sitemap to your Google Web developer account
4) Make sure your website returns a 404 error when a page isn't found. It is always a good idea to make a splash page for a 404 error that links back to your home page. (This is accomplished in different ways across different server-side languages.)
Google will automatically remove indexed pages if they no longer exist, so be patient.
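Step 4 can be sketched in Python as a tiny WSGI app. This is only an illustration under assumed names: `KNOWN_PAGES` stands in for whatever routing your site really uses, and the paths in it are invented. The point is that an unknown path gets a real 404 status together with a splash page linking home, rather than a 200 response that merely looks like an error.

```python
# Hypothetical page table; a real site would consult its router or database.
KNOWN_PAGES = {
    "/": "<h1>Home</h1>",
    "/products/blue-widget": "<h1>Blue widget</h1>",
}

NOT_FOUND_PAGE = '<h1>Page not found</h1><p><a href="/">Back to the home page</a></p>'


def app(environ, start_response):
    """WSGI app: serve known pages with 200, everything else with 404."""
    path = environ.get("PATH_INFO", "/")
    body = KNOWN_PAGES.get(path)
    status = "200 OK"
    if body is None:
        status, body = "404 Not Found", NOT_FOUND_PAGE
    payload = body.encode("utf-8")
    headers = [
        ("Content-Type", "text/html; charset=utf-8"),
        ("Content-Length", str(len(payload))),
    ]
    start_response(status, headers)
    return [payload]
```

To try it locally, `wsgiref.simple_server.make_server("", 8000, app).serve_forever()` from the standard library is enough.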

Pdf files are not getting updated after docusign

We are using embedded signing with the DocuSign REST API to e-sign files. To sign a file, we upload the required file to our web app and then display it in a viewer in the browser. This file can be signed immediately or later.
What happens is that when the file is signed and the process is completed, we return to the same file view, but the updated file is not shown. Only after we refresh the page 3-4 times does it show the signature on the file.
This issue occurs only for files that were uploaded and signed later. For a fresh file that is uploaded and signed immediately, we get the updated file view.
It appears that all browsers cache files (not the HTML page, but the embedded files). One recommended solution is to add a parameter to the request when the file is reloaded after signing, but this works only intermittently. The other is to rename the file so that the browser picks up the updated file, but renaming the file is not an option for us.
Is there some other alternative? Have any other DocuSign API users faced something similar? (I believe this issue would not occur if you use the email request mode for e-signing.)
Thanks.
There have been no similar reports from anyone... I am not necessarily discounting yours, but going only by the description of your web app, I can think of a few things it could be doing out of sequence that would cause this behavior.
The first common mistake with embedded signing that comes to mind is this. In general, embedded signing requires several steps: (1) login call, (2) create envelope, (3) get the view for the recipient.
Most people put that logic in the controller code behind a web page, so when they come back, it goes through the same sequence. I understand that your page may have some logic to guard against this, but ideally, when viewing, you should only call (3), getting the view. If you somehow end up calling (2) again, you will see the signing sequence all over again.
That's the most common mistake. However, I do not want to discount your report. To actually get to the bottom of it, you should post the web service call traces (XML for SOAP / JSON for REST) and show exactly what your app is doing.
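The sequencing point (call step 2 once per document, step 3 on every view) could be sketched like this in Python. Note that `create_envelope` and `get_recipient_view` are hypothetical callables supplied by your own code; they are not DocuSign SDK functions, and the in-memory dictionary stands in for whatever persistent storage your app would really use.

```python
from typing import Callable, Dict


class SigningSession:
    """Creates an envelope at most once per document, then only fetches views."""

    def __init__(
        self,
        create_envelope: Callable[[str], str],     # hypothetical: document id -> envelope id
        get_recipient_view: Callable[[str], str],  # hypothetical: envelope id -> view URL
    ):
        self._create_envelope = create_envelope
        self._get_recipient_view = get_recipient_view
        self._envelopes: Dict[str, str] = {}  # document id -> envelope id

    def signing_url(self, document_id: str) -> str:
        envelope_id = self._envelopes.get(document_id)
        if envelope_id is None:
            # Step (2): runs only the first time this document is seen.
            envelope_id = self._create_envelope(document_id)
            self._envelopes[document_id] = envelope_id
        # Step (3): runs on every page view, always against the same envelope.
        return self._get_recipient_view(envelope_id)
```

If the page controller rebuilds everything on each request, the mapping has to live somewhere that survives the request (a database row next to the uploaded file, for instance); otherwise the guard never triggers.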
Hope this helps.
-mb // i work for docusign

Google Docs Viewer - File Request Timeout

I'm working on a Joomla website that has a set of documents that need to be displayed using the Google Docs viewer.
Only authenticated users can reach the files through the site, but a file can also be accessed through a direct path like http://www.example.com/files/somefile.pdf, even without authentication.
So I tried to view a file through the Google viewer with a link like this:
http://docs.google.com/viewer?url=http://www.example.com/files/somefile.pdf
Files smaller than 100 KB are viewable, but for all the rest an error message is displayed:
Sorry, it took too long to find the document at the original source. Please try again later.
You can also try to download the original document by clicking here.
So I'm not sure whether this has something to do with the Google Docs viewer, Joomla, or a request timeout on the server.
How can I make every file viewable with Google Docs, irrespective of size?
If it's PDF only, you can also just use pdfjs from Mozilla directly. Then you should check your URL encoding. If the issue remains, check out https://code.google.com/p/google-api-php-client/ for converting your docs in place. Opening them with pdfjs is still recommended to bypass Google Docs viewer problems; at least, that is how I got this working properly.
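On the URL encoding point, here is a minimal Python sketch of building the viewer link with the document URL percent-encoded as a single query value. The function name is made up for this example, and it assumes the document URL passed in is raw (not already encoded); whether encoding is what causes the timeout in this case is not established.

```python
from urllib.parse import quote

VIEWER_BASE = "http://docs.google.com/viewer?url="


def viewer_url(document_url):
    """Build a viewer link, percent-encoding the whole document URL."""
    # safe="" also encodes ':' and '/', so the document URL cannot be
    # mistaken for part of the viewer's own query string.
    return VIEWER_BASE + quote(document_url, safe="")


print(viewer_url("http://www.example.com/files/some file.pdf"))
# prints http://docs.google.com/viewer?url=http%3A%2F%2Fwww.example.com%2Ffiles%2Fsome%20file.pdf
```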
