Dynamic Sitemap URL in robots.txt file - sitemap

I have the following content in my robots.txt file:
Sitemap: https://example.com/sitemap.php
Is it ok to have the sitemap in robots.txt as .php file instead of .xml as I generate it dynamically?

A consumer (like a search engine bot) can't know whether a file is static or dynamically generated; it can only guess (e.g., based on the response time, HTTP response headers, or the URL design).
You could have a static file named sitemap.php, and you could have a dynamically generated file named sitemap.xml.
And a consumer doesn’t need to know. What matters is the content of the file, not how the file was/is created, not its URL.
In any case, make sure to send the correct Content-Type for this file. Some servers automatically select a content type based on the filename extension, which would fail with .php, so you might have to set it explicitly.
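Setting the content type explicitly could look something like the following. This is only a minimal sketch in Python rather than PHP, written as a WSGI app; the page list and the names build_sitemap and sitemap_app are made up for illustration, and application/xml is assumed to be the content type you want for an XML sitemap.

```python
from xml.sax.saxutils import escape

# Hypothetical page list; a real generator would query a database or CMS.
PAGES = ["https://example.com/", "https://example.com/about?lang=en&ref=home"]


def build_sitemap(urls):
    """Return a sitemap document as UTF-8 bytes."""
    # escape() turns &, < and > into XML entities so the URLs stay valid XML.
    entries = "".join(f"  <url><loc>{escape(u)}</loc></url>\n" for u in urls)
    xml = (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n'
        f"{entries}"
        "</urlset>\n"
    )
    return xml.encode("utf-8")


def sitemap_app(environ, start_response):
    """WSGI app: serve the generated sitemap with an explicit XML content type."""
    body = build_sitemap(PAGES)
    start_response("200 OK", [
        ("Content-Type", "application/xml; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ])
    return [body]
```

The URL this app is mounted at can end in .php, .xml, or nothing at all; the consumer only sees the header and the body.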

Related

DocPad generate pages without extension

Is there any way to configure DocPad to generate pages without an extension, so that when hosted as a static site the URL looks like: http://mysite.com/page1/ ?
Yes, but in a different way.
The cleanurls plugin, when generating for a static environment (docpad generate --env static), will output, say, pages/welcome.html as pages/welcome/index.html. That accomplishes what you're after: you can access pages/welcome in your browser, no worries. Document URLs will also be updated to reflect this.
The issue with just outputting pages/welcome without the extension is that the server then has no way to know the MIME type of the file, which the browser often needs in order to process the file correctly.
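The path mapping described above can be sketched roughly like this. It is a Python illustration of the idea only, not DocPad's or the cleanurls plugin's actual code; the function name clean_url_output_path is hypothetical.

```python
from pathlib import PurePosixPath


def clean_url_output_path(path):
    """Map 'pages/welcome.html' to 'pages/welcome/index.html'.

    Files already named index.html and non-HTML files are left unchanged.
    """
    p = PurePosixPath(path)
    if p.suffix != ".html" or p.name == "index.html":
        return str(p)
    # Drop the extension and move the document into a directory of that name.
    return str(p.with_suffix("") / "index.html")
```

Because the file on disk is still called index.html, a static server can pick the MIME type from the extension while the visible URL stays extension-free.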

Don't execute flag in http response?

I was reading about attacks on sites that let users upload and download files. Some attacks involved uploading a jpg that is really an HTML file, and one comment asked what to do if you want users to be allowed to store HTML files and download them (or perhaps view them in the browser without using the save-as feature).
Is there some type of flag I can use to say "do not execute"? I will want users to view images or video files others have uploaded. What if I'd like user HTML to be displayed, but I don't want to force users to download it (Content-Disposition: attachment)?
Is there a way I can say: here is some user data; it could be an image, so img src should work; it could be HTML, so I'd like users to see it, but it must not be allowed to read/write cookies or localStorage, make AJAX requests, etc.?
-edit- Come to think of it, all of my user data is hosted on its own cookieless subdomain for static files. That gets rid of many of the problems I mention, but what else is left to deal with? Also, I believe my MIME response depends completely on what my web server (nginx at the moment) does, which could simply be to look at the file extension.
-edit2- I adjusted my nginx config to add the application/unknown Content-Type. It seems to do exactly what I want. I saw a suggestion to use octet-stream for unknown files, but that causes browsers (at least Firefox) to try to download the file even if it's a jpg capable of being viewed in the browser.
It all depends on the Content-Type in your HTTP response.
Browsers decide how to handle the returned data based on the Content-Type of the HTTP response.
For example, say a user uploads an HTML file through an upload field meant for photos. As long as your web server sends the Content-Type as image/jpeg (or image/png, etc.), the browser should handle it as an image, and in this case as an invalid image, because the file contains HTML instead of the usual binary image data.
In any case, if you are still feeling insecure about it, you can always peek into the file data during upload validation.
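Peeking into the file data could be as simple as comparing the first few bytes against known image signatures. The following is a rough Python sketch under that assumption; the SIGNATURES table and the function name sniff_image_type are illustrative, and only JPEG, PNG and GIF are covered.

```python
# Leading byte sequences ("magic numbers") of a few common image formats.
SIGNATURES = {
    "image/jpeg": (b"\xff\xd8\xff",),
    "image/png": (b"\x89PNG\r\n\x1a\n",),
    "image/gif": (b"GIF87a", b"GIF89a"),
}


def sniff_image_type(data):
    """Return the MIME type whose signature matches the start of data, else None."""
    for mime, prefixes in SIGNATURES.items():
        # bytes.startswith accepts a tuple and matches any of its prefixes.
        if data.startswith(prefixes):
            return mime
    return None
```

An upload handler could reject the file when this returns None, or when the result disagrees with the declared type. Note that this only looks at the start of the file, so it catches an HTML file renamed to .jpg but says nothing about what follows a valid header.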

Valums file uploader: get full path to file in file system

I'm using valums file uploader and I want to display the file path in a textbox after the user has chosen a file (like with a standard file upload). Is there any possible solution to achieve this?
No. Based on my research, I've found numerous posts suggesting that browser security features prevent objects from knowing about the file system until the appropriate submit action is invoked.
Furthermore, they suggest that if you do want to display the full path, you'll need a non-browser solution like a Java plugin (possibly even a Flash object could do it).

Do browsers serve a cached file if its name or contents change?

Two general questions I'm wondering about, both for the case where you've set an Expires header on a given file (.js, .css, etc.) and for the case where you have not:
Do browsers request a new file (NOT serving the cached one) only if the file name has changed? Browsers don't assess the file contents too, correct?
Do ALL browsers behave the same regarding question #1, or are there known differences between them, for example on mobile (iOS Safari, etc.)?
thank you,
tim
The browser can't check file contents unless it downloads the file (it does not, for example, request a checksum). It usually delegates the task of content-checking (or timestamp-checking) to the server: the browser sends an If-Modified-Since header with the timestamp of its cached copy, the web server checks whether the file has changed since then, and if not, it responds with 304 Not Modified instead of sending the file again.
All browsers follow this basic protocol. Servers may vary in how they decide if a file has changed.
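The server-side half of that exchange might look roughly like this. It is a simplified Python sketch of a timestamp comparison, not how any particular web server implements it; the function name conditional_status is made up, and ETag handling is left out entirely.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def conditional_status(if_modified_since, last_modified):
    """Return 304 if the resource is unchanged since the client's copy, else 200.

    if_modified_since: raw header value from the request, or None.
    last_modified: timezone-aware datetime of the file's last change.
    """
    if not if_modified_since:
        return 200  # no cached copy announced: send the full file
    try:
        client_time = parsedate_to_datetime(if_modified_since)
    except (TypeError, ValueError):
        return 200  # unparseable header: ignore it and send the full file
    if client_time.tzinfo is None:
        client_time = client_time.replace(tzinfo=timezone.utc)
    # HTTP dates have one-second resolution, so drop microseconds first.
    if last_modified.replace(microsecond=0) <= client_time:
        return 304
    return 200
```

For example, calling it with the header value "Wed, 21 Oct 2015 07:28:00 GMT" and a file last changed at that same instant yields 304, while a file changed a day later yields 200.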

Is it safe to serve an image on the web without an extension?

I'm treating all *.jpg files as static, but I need to serve a few dynamically. Can I simply omit the extension so I don't have to get fancy with my URL rules? Is it enough to just set the file type in the header?
I've never had a problem serving dynamic images with a strange extension or no extension at all. Querystrings are also fine.
It will be enough for the headers to be correct and the binary file correctly formed. When you do this, make sure you also set Content-Disposition to a reasonable file name, so people don't end up downloading your files with crazy querystring names (which Windows users will be unable to save, since the names will most likely have a "?" in them).
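As a rough illustration of those two headers, here is a small Python sketch. The function name image_response_headers is hypothetical, and the choice of "inline" (display in the browser, but suggest a name on save) is an assumption about what you want.

```python
def image_response_headers(mime_type, filename):
    """Build headers for an image served from a URL with no extension."""
    # Strip characters that would break the quoted filename parameter.
    safe_name = filename.replace("\\", "").replace('"', "")
    return [
        ("Content-Type", mime_type),
        # "inline" lets the browser display the image, while still suggesting
        # a sensible name if the user chooses to save it.
        ("Content-Disposition", f'inline; filename="{safe_name}"'),
    ]
```

A dynamic handler at a URL such as /image?id=42 could return image_response_headers("image/jpeg", "photo-42.jpg") along with the image bytes.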
Instead of omitting the extension (on your server), activate content negotiation (i.e. +MultiViews if you're using Apache) and omit the extensions in your URIs. That way, Apache will decide what file to serve; you could have an image in both png and svg format, and serve the one accepted by the browser.
Generally, a correct Content-Type header is enough.
