Apache Traffic Server stores partial documents - apache-traffic-server

https://docs.trafficserver.apache.org/en/latest/admin/faqs.en.html?highlight=partial%20content
This document says 'Traffic Server does not store partial documents in the cache.'
However, Traffic Server does cache partial image files (JPEGs, for example); sometimes only the upper half of an image ends up cached.
Is there a solution?
I'm using ATS 5.3.0.

The short answer is that ATS currently does not support storing partial objects. This is a new feature that is actively being worked on, and will hopefully be available with v6.1.0 or possibly 6.2.0.
That much said, if you see partial objects in the cache like this, you are likely experiencing one of the following (a quick origin check is sketched after this list):
Corrupted response from origin server.
Non-conformant response from the origin (did it respond without proper headers?)
Corrupted cache (very unlikely).
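A quick way to rule the first two causes in or out is to fetch the image straight from the origin (bypassing ATS) and compare the declared Content-Length with the bytes actually received. Here is a minimal sketch in Python; the origin URL is just a placeholder:

import urllib.request
from http.client import IncompleteRead

url = "http://origin.example.com/images/photo.jpg"  # placeholder: your origin, not ATS

try:
    with urllib.request.urlopen(url) as resp:
        declared = resp.headers.get("Content-Length")
        body = resp.read()
        print("status:", resp.status)
        print("Content-Length header:", declared)
        print("bytes received:", len(body))
        if declared is None:
            print("no Content-Length header (chunked or non-conformant response?)")
except IncompleteRead as err:
    # The origin closed the connection before sending the full body.
    print("truncated response from origin:", err)

If the origin itself returns a short or header-less response, that, rather than the cache, is the likely culprit.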

Related

HTTP/2 does not yet support ETags?

I am currently building a server for dynamic and static files with Node, and I'm trying to implement HTTP/2. What surprises me is that HTTP/2 push does not seem to support ETags!
When the client sends the headers to retrieve a file that the server starts to push, and that it has accepted, the "If-None-Match" header is ignored.
This seems wasteful, and I do not understand the reason for this behavior. Is this really the case, or am I missing something?
As discussed in the comments the server pushes the resource, so there is no client request, so there is no Etag to send.
So HTTP/2 does support Etags - they just have no relevance for pushed requests.
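For comparison, this is the ordinary (non-push) revalidation flow that ETags exist for: the client remembers the ETag from the first response and sends it back as If-None-Match, getting a 304 if nothing changed. A minimal sketch with Python's urllib; the URL is only a placeholder and may not actually return an ETag:

import urllib.request, urllib.error

url = "https://example.com/styles.css"  # placeholder resource

# First request: remember the ETag the server sends (if any).
with urllib.request.urlopen(url) as resp:
    etag = resp.headers.get("ETag")
    print("first fetch:", resp.status, "ETag:", etag)

# Second request: revalidate with If-None-Match.
req = urllib.request.Request(url, headers={"If-None-Match": etag or ""})
try:
    with urllib.request.urlopen(req) as resp:
        print("revalidation:", resp.status)   # 200 with a fresh body
except urllib.error.HTTPError as err:
    print("revalidation:", err.code)          # 304 Not Modified is raised as HTTPError

Push skips the request step entirely, so there is never an If-None-Match for the server to honour.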
And yes, this does mean cached resources are ignored for pushed resources, which is one of the big drawbacks of Push and why many people do not recommend using it. When a client sees the PUSH_PROMISE that a server sends before pushing a resource, it can reject it with a RST_STREAM frame, but by the time that makes it to the server, often a good part (if not all) of the resource will have already been pushed.
There are a few ways around this:
You could track what has already been pushed, using cookies for example. I have a simple example with Apache config here: https://www.tunetheweb.com/performance/http2/http2-push/, and a rough sketch of the idea appears after this list. Of course that assumes that cookies and cache are in sync, but they may not be (they can be cleared independently).
Some servers track what has already been pushed. Apache, for example, allows an HTTP/2 push diary to be configured (set to 256 items by default) which tracks items pushed on that connection. If you visit page1.html and it pushes styles.css, and then you visit page2.html which also attempts to push styles.css, Apache will not push it because it knows you already have it. However, that only works if you are using the same connection. If you come back later on a new connection while the resource is still in your cache, it will be re-pushed.
There was also a proposal for Cache Digests, which would allow the browser to send an encoded list of what is in its cache at the start of a connection, so the server could use that to decide whether to push an item. However, work on that has recently been stopped because of privacy concerns.
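As an illustration of the cookie-tracking idea from the first point, here is a rough, framework-agnostic sketch in Python. The cookie name and asset list are hypothetical; real servers such as Apache implement the equivalent in their own configuration:

from http.cookies import SimpleCookie

PUSH_COOKIE = "h2_pushed"              # hypothetical cookie name
ASSETS = ["/styles.css", "/app.js"]    # assets we would like to push

def assets_to_push(cookie_header):
    """Return the assets not yet recorded in the client's push cookie."""
    cookie = SimpleCookie(cookie_header or "")
    already = set()
    if PUSH_COOKIE in cookie:
        already = set(cookie[PUSH_COOKIE].value.split("|"))
    return [a for a in ASSETS if a not in already]

def push_cookie_value(pushed):
    """Value to set on the response so later requests skip these assets."""
    return "|".join(sorted(pushed))

print(assets_to_push(""))                               # first visit: push everything
print(assets_to_push("h2_pushed=/styles.css|/app.js"))  # later visits: push nothing

The obvious weakness is the one mentioned above: the cookie and the cache can get out of sync.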
Ultimately, HTTP/2 Push has proven tricky to make useful, and usage of it is incredibly low because of this. That is in large part due to the caching problem, but also because Push is complex and has other implementation issues. Even if all of those were solved, it is still easy to over-push resources when it might be better to let the browser request resources in the order it knows it needs them. The Chrome team have even talked about turning Push off and not supporting it.
Many people now recommend using Early Hints (status code 103) instead, as it tells the browser what to request rather than just pushing it. The browser can then use all its usual knowledge (what is in the cache, what priority the resource should be requested with, and so on) rather than having all of that overridden, as Push does.
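For reference, an Early Hints exchange looks roughly like the following: the server sends an interim 103 response carrying Link preload headers (the paths here are placeholders), the browser fetches whatever it does not already have cached, and the final 200 follows as usual.

HTTP/1.1 103 Early Hints
Link: </styles.css>; rel=preload; as=style
Link: </app.js>; rel=preload; as=script

HTTP/1.1 200 OK
Content-Type: text/html
...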
Cheap plug, but if you are interested in this, Chapter 5 of my recently published book discusses it all in a lot more detail than can be squeezed into an answer on Stack Overflow.

Implementing "extreme" bandwidth saving for web browsing with a compression proxy

I have a network connection where I pay per megabyte, so I'm interested in reducing my bandwidth usage as far as possible while still having a reasonably good browsing experience. I use this wonderful extension (https://bandwidth-hero.com/). The extension runs an image-compression proxy on my Heroku account that accepts image URLs and returns low-quality versions of those images. This reduces bandwidth usage by 30-40% when images are loaded.
To further reduce usage, I typically browse with both JavaScript and images disabled (there are various extensions for doing this in firefox/firefox-esr/google-chrome). This has an added bonus of blocking most ads (since they usually need JavaScript to run).
For daily browsing, the most efficient solution is using a text-mode browser in a virtual console, such as elinks/lynx/links2, running over ssh (with zlib compression) on a VPS server. But sometimes JavaScript becomes necessary, as some sites will not render without it. Elinks is the only text-mode browser that even tries to support JavaScript, and even that support is quite rudimentary. When I have to come back to using firefox/chrome, I find my bandwidth usage shooting up, and I would like to avoid this.
I find that bandwidth is used partially to get the 'raw' html files of the sites I'm browsing, but more often for the associated .js/.css files. These are typically highly compressible. On my local workstation, html+css+javascript files typically compress by a factor of more than 10x when using lzma(2) compression.
It seems to me that one way to drastically reduce bandwidth consumption would be to use the same approach as the bandwidth-hero extension, i.e. run a compression proxy either on a VPS or on my Heroku account, but do so for text content (.html/.js/.css).
Ideally, I would like to run a compression proxy on my local machine. When I open a site (say www.stackoverflow.com), the browser should send a request to this local proxy. This local proxy then sends a request to a back-end running on heroku/vps. The heroku/vps back-end actually fetches all the content, and compresses it (lzma/bzip/gzip). The compressed content is sent back to my local proxy. The local proxy decompresses the content and finally gives it to the browser.
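As a very rough sketch of the local half of that idea, using only Python's standard library: the backend URL below is purely hypothetical, standing in for whatever service you would deploy on Heroku or a VPS (which would fetch the page, compress it, and send it back), and a real setup would also need to handle headers, errors and HTTPS properly.

import gzip
import urllib.parse
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

BACKEND = "https://my-compressor.example.com/fetch"   # hypothetical remote backend

class CompressingProxyHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Simplification: the browser requests e.g. /https://stackoverflow.com/
        # (a real browser proxy would need the full proxy protocol).
        target = self.path.lstrip("/")
        backend_url = BACKEND + "?" + urllib.parse.urlencode({"url": target})

        with urllib.request.urlopen(backend_url) as resp:
            payload = resp.read()

        # Assume the backend gzip-compresses whatever it fetched.
        body = gzip.decompress(payload)

        self.send_response(200)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

if __name__ == "__main__":
    HTTPServer(("127.0.0.1", 8888), CompressingProxyHandler).serve_forever()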
There is something like this mentioned in this answer (https://stackoverflow.com/a/42505732/10690958) for Node.js. I am thinking of doing the same in Python.
From what Google searches show, HTTP can "automatically" ask for gzip versions of pages. But does this also apply to the associated files that are loaded by JavaScript, and to the CSS files? Perhaps what I am thinking about is already implemented by default?
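For what it's worth, ordinary HTTP content negotiation does cover this: a browser sends Accept-Encoding: gzip (or br) on every request, including the requests for .js and .css files it discovers later, and decompresses transparently. The same mechanism is easy to see from Python (the URL is just an example):

import gzip
import urllib.request

req = urllib.request.Request(
    "https://stackoverflow.com/",
    headers={"Accept-Encoding": "gzip"},
)
with urllib.request.urlopen(req) as resp:
    raw = resp.read()
    if resp.headers.get("Content-Encoding") == "gzip":
        raw = gzip.decompress(raw)

print(len(raw), "bytes after decompression")

Whether the response actually comes back compressed still depends on the origin server, which is where a compressing proxy can help.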
Any pointers would be welcome. I was thinking of writing the local proxy in Python, as I am reasonably fluent in it, but I know little about Heroku or the intricacies of HTTP.
Thanks.
Update: I found a possible solution here: https://github.com/barnacs/compy
It does almost exactly what I need (minify + compress with brotli/gzip + transcode jpeg/gif/png). It is written in Go instead of Python, but that does not really matter. It also has a Docker image here: https://hub.docker.com/r/andrewgaul/compy/. Since I'm not very familiar with Heroku, I can't figure out how to use this to run the compression proxy service on my account, and the Heroku docs weren't of much help to me either. Any pointers would be welcome.

How can I compress the JSON content from CouchDB's HTTP responses?

I am making a lot of _all_docs requests to CouchDB's HTTP server. One thing I'm realizing is that the data is not compressed, which results in large responses. Even when using limit and skip, the responses can sometimes be 10MB each. That doesn't cause any problems for my app, but it does mean that if a connection to our CouchDB server is slower than our office connection, things will go rather slowly.
Is there any way I can enable HTTP compression? I am not referring to attachments - just the JSON files.
Also, I am using Windows Server - not Linux/Unix.
Thanks!
There is no direct support for this in CouchDB, but it has been requested (so voice your support there if you want it included).
That being said, there are a number of options you have. First, you can set up nginx as a reverse proxy and allow it to compress (and possibly cache) responses for you. After a quick search, I found this plugin that you install in CouchDB directly.
Another thing is that CouchDB does a pretty solid job of allowing clients to cache reliably. You can leverage this to prevent repeatedly downloading the same large resource.
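As an illustration of that second point: CouchDB attaches an ETag to _all_docs and view responses, so a client can at least attempt to revalidate instead of re-downloading the full 10MB every time. A small sketch using the requests library; the host, database and parameters are placeholders, and this assumes the endpoint honours If-None-Match:

import requests

url = "http://localhost:5984/mydb/_all_docs"   # placeholder host/database
params = {"limit": 1000, "skip": 0}

first = requests.get(url, params=params)
etag = first.headers.get("ETag")
print("first fetch:", first.status_code, len(first.content), "bytes, ETag:", etag)

# Later: only download a full body if something has changed.
second = requests.get(url, params=params, headers={"If-None-Match": etag or ""})
if second.status_code == 304:
    print("unchanged -- reuse the copy we already have")
else:
    print("changed -- got", len(second.content), "bytes")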

Blue Dragon Coldfusion server cache issue

I have an application built with the ColdFusion MVC framework "Mach-II" and hosted on a BlueDragon ColdFusion server.
I am running into a caching issue. When I add a new page with some content and load it, it works fine. But when I make changes to the same file and hit it again, my changes are not picked up: it always shows the content from the very first version. It seems the server is caching my page and not picking up further changes. I have tried many solutions but have failed to solve the problem.
Please let me know if you have any solution for that.
This is a bit too long for a comment - but it's not much of an answer.
First off, your question is quite broad for StackOverflow. If you aren't looking at the code yourself, and have nothing to show us, there is no guarantee we can help you at all.
It sounds like maybe this service is using query caching - which looks something like this.
<!--- cachedwithin keeps this query's result cached in server memory for 30 minutes --->
<cfquery datasource="CRM" name="testQuery" cachedwithin="#CreateTimeSpan(0,0,30,0)#">
    <!--- SQL statement here --->
</cfquery>
Basically it stores a query's result in memory on the server. It can really help reduce strain on the database. It's possible that they've set a time limit on this caching feature that's longer than you'd like.
If you don't have access to the code, THIS is the issue you want to ask about first.
Edit: It may be entirely different.
https://docs.oracle.com/cd/E13176_01/bluedragon/621/BlueDragon_621_WL_User_Guide.html#_Toc121303111
From source:
Where ColdFusion (5 and MX) defines a "template cache" as a place to hold templates in memory once rendered from source code, BlueDragon has the same notion but refers to this as the "file cache". In both engines, a template once rendered from source will remain in the cache until the server (or J2EE or .NET web app) is restarted.
The cache size, specified in the Admin Console, indicates how many of these cached templates to keep. It defaults to 60, but that number may need to change for your application, depending on how many CFML templates your application uses. One entry is used for each template (CFM or CFC file) requested.
It's very important to understand that this is not caching the OUTPUT of the page but rather the rendering of the template from source into its internal objects. One cached instance of the template is shared among all users in the application.
As in ColdFusion, once the file cache is full (for instance, you set it to 60 and 60 templates have been requested), the next request for a template not yet cached will force the engine to flush the oldest (least recently used) entry in the cache to make room. Naturally, if you set this file cache size too low, thrashing in the cache could occur as room is made for files only to soon have the flushed file requested again.
It sounds like you might have to either restart the ColdFusion application or clear the Template Cache in the CFAdmin.

Do websites share cached files?

We're currently doing optimizations to our web project when our lead told us to push the use of CDNs for external libraries as opposed to including them into a compile+compress process and shipping them off a cache-enabled nginx setup.
His assumption is that if the user visits example.com, which uses a CDN'ed version of jQuery, the jQuery is cached at that time. If the user then happens to visit example2.com, which happens to use the same CDN'ed jQuery, the jQuery will be loaded from cache instead of over the network.
So my question is: Do domains actually share their cache?
I argued that even if the browser does share cache across domains, the problem is that we are running on the assumption that the previous site used exactly the same file from exactly the same CDN. What are the chances that a user has already visited a site using the same CDN'ed file? He said to use the largest CDN to increase the chances.
So the follow-up question would be: If the browser does share cache, is it worth the hassle to optimize based on his assumption?
I have looked up topics about CDNs and I have found nothing about this "shared domain cache" or CDNs being used this way.
Well, your lead is right: this is basic HTTP.
All you are doing is indicating to the client where it can find the file.
The client then handles sending a request to the CDN in compliance with their caching rules.
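Those "caching rules" are just the ordinary Cache-Control/Expires headers the CDN serves with the file, and you can inspect them yourself. A quick sketch in Python; the jQuery URL on Google's CDN is only an example, any CDN-hosted file will do:

import urllib.request

url = "https://ajax.googleapis.com/ajax/libs/jquery/1.11.3/jquery.min.js"
req = urllib.request.Request(url, method="HEAD")

with urllib.request.urlopen(req) as resp:
    for header in ("Cache-Control", "Expires", "ETag", "Last-Modified"):
        print(header + ":", resp.headers.get(header))

A long max-age on that response is what makes the cross-site reuse your lead describes possible at all.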
But you shouldn't over-use CDNs for libraries either; keep in mind that if you need a specific version of a library, especially an older one, you are not likely to get many cache hits, because of version fragmentation.
For widely used and heavy libraries like jQuery, using the latest version is recommended.
If you can take them all from the same CDN (e.g. Google's), all the better, especially as HTTP/2 is coming.
Additionally, they save you bandwidth, which can amount to a lot when you have high traffic, and they can reduce load times for users far from your server (Google's is great for this).
