Is there any option to get a file uniquely by ETag in MinIO?

I am currently making some progress with MinIO. Is there any way I can get an object by its ETag, which is available for each object after uploading, so that I can identify it uniquely?
The existing way I am using is to get the object only by specifying the bucket name and the exact object name.

There is no get-object-by-ETag operation in the S3 API. The correct way to get an object is via its object name, or optionally its version ID if versioning is enabled on the bucket.
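For reference, a minimal sketch with the minio-go v7 client; the endpoint, credentials, bucket, object name, and version ID are all placeholders:

package main

import (
	"context"
	"io"
	"log"
	"os"

	"github.com/minio/minio-go/v7"
	"github.com/minio/minio-go/v7/pkg/credentials"
)

func main() {
	// Endpoint and credentials are placeholders.
	client, err := minio.New("play.min.io", &minio.Options{
		Creds:  credentials.NewStaticV4("ACCESS_KEY", "SECRET_KEY", ""),
		Secure: true,
	})
	if err != nil {
		log.Fatal(err)
	}

	// Objects are addressed by bucket + object name; a VersionID can be supplied
	// when versioning is enabled on the bucket.
	opts := minio.GetObjectOptions{VersionID: "example-version-id"}
	obj, err := client.GetObject(context.Background(), "my-bucket", "my-object", opts)
	if err != nil {
		log.Fatal(err)
	}
	defer obj.Close()

	// The ETag is still readable from the object's metadata once you have it,
	// but it cannot be used as a lookup key.
	if stat, err := obj.Stat(); err == nil {
		log.Println("ETag:", stat.ETag)
	}
	if _, err := io.Copy(os.Stdout, obj); err != nil {
		log.Fatal(err)
	}
}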

Related

get s3 object expiration in golang

There seems to be a way to set expiration days on an upload by attaching Metadata with an "expdays" key to the PutObjectInput, but there doesn't appear to be any way to then get this metadata back from an object in, say, a ListObjects call. The ListObjectsOutput returns Contents, which is a list of s3.Object values, but that type doesn't have a metadata field, and I don't see any other way to get it either.
The ListObjects calls return a []Object for the actual object list in the result, and Object does not include the expiration. However, GetObject returns GetObjectOutput which does have an Expiration field. So you could iterate your returned Objects and get the expiration for each, though this could be time-consuming if there are many of them.
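A rough sketch of that iteration with aws-sdk-go v1 (the bucket name is a placeholder; HeadObject is used here instead of GetObject so the object bodies aren't downloaded, and its output also exposes the Expiration field):

package main

import (
	"log"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/session"
	"github.com/aws/aws-sdk-go/service/s3"
)

func main() {
	sess := session.Must(session.NewSession())
	svc := s3.New(sess)

	bucket := aws.String("my-bucket") // placeholder bucket name

	err := svc.ListObjectsV2Pages(&s3.ListObjectsV2Input{Bucket: bucket},
		func(page *s3.ListObjectsV2Output, lastPage bool) bool {
			for _, obj := range page.Contents {
				// One extra request per object to read the Expiration value.
				head, err := svc.HeadObject(&s3.HeadObjectInput{Bucket: bucket, Key: obj.Key})
				if err != nil {
					log.Printf("head %s: %v", aws.StringValue(obj.Key), err)
					continue
				}
				// Expiration is a raw header string containing the expiry date and rule id.
				log.Printf("%s expires: %s", aws.StringValue(obj.Key), aws.StringValue(head.Expiration))
			}
			return true
		})
	if err != nil {
		log.Fatal(err)
	}
}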

How to check if a url path exists in the service worker cache

I need to check if a particular URL path exists in the service worker cache.
For example, suppose my URL is:
/myserviceworker/service?a=110&b=70
this URL exists in the cache, but there are many of them with different values of a and b.
Now, suppose I want to refresh all of these URLs, how can I do that?
I want to know how to access the key values from Service Worker cache.
If I know the key, my plan is as follows:
var url = new URL(key);
if (url.pathname === "/myserviceworker/service")
then refetch the key
But I am not sure how to get the cache key and in what format it is. I mean, is it a string or is it already a URL?
The Cache API has a match() method which returns a promise that resolves to a Response object if there is a match, or undefined if there is none. The second parameter is an options object where you can set ignoreSearch so that URL query parameters are not taken into account.
The ignoreSearch option is, at the time of writing, supported only by Firefox (Chrome has not shipped it yet).
On the other hand, to retrieve all the cache entries, you can use the keys() method.
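Putting those pieces together, a hedged sketch (the cache name "my-cache" is a placeholder) that walks the keys, filters by pathname, and re-fetches the matching entries:

// "my-cache" is a placeholder; use whatever cache name your service worker opens.
async function refreshServicePaths() {
  const cache = await caches.open('my-cache');

  // cache.keys() resolves to Request objects, so request.url is a plain string.
  const requests = await cache.keys();

  for (const request of requests) {
    const url = new URL(request.url);
    if (url.pathname === '/myserviceworker/service') {
      // Re-fetch and overwrite the cached entry, keeping its original query string.
      const response = await fetch(request);
      if (response.ok) {
        await cache.put(request, response);
      }
    }
  }
}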

Setting noindex on Amazon S3 objects

We have some publicly shared S3 files that we want to make sure won't be indexed by Google. I can't seem to find any documentation on how to do this. Is there a way to set a "noindex" x-robots-tag response header on individual S3 objects?
(We're using the Ruby AWS client)
There does not appear to be a way to do this.
Only certain headers from an S3 PUT object request are documented as being returned when the object is fetched.
http://docs.aws.amazon.com/AmazonS3/latest/API/RESTObjectPUT.html
Anything else you send appears to be simply disregarded, as long as it doesn't actually invalidate the request.
Actually, that's what I thought before researching this, and it's almost true.
The documentation there seems incomplete, and elsewhere it suggests that the following request headers, if sent with the upload, will appear in the download:
Cache-Control
Content-Disposition
Content-Encoding
Content-Type
x-amz-meta-*
Other headers are listed at the latter link, but some of these like Expect wouldn't make sense on a GET request, so they logically wouldn't appear.
So far, this is all consistent with my experience with S3.
If you send a random but not-invalid header with your request, it's ignored. Example:
X-Foo: bar
S3 seems to accept this on upload, but discards it (presumably doesn't store it)... downloading the object does not return the X-Foo header.
But X-Robots-Tag appears to be an undocumented exception to this.
Uploading a file with X-Robots-Tag: noindex (for example) does indeed result in the same header and value being returned with the object when you GET it.
Unless somebody can cite the documentation that explains why this works, we're operating in distinctly undocumented territory.
But, if you're interested in going there, the simple answer appears to be, you just add this header to the HTTP PUT request you send to the REST API to upload the object.
"Not so fast," you say, "I'm using the Ruby SDK." Indeed. The AWS Ruby client seems to be too "helpful" to let you get away with this, at least, not easily. The docs there show how to add "metadata" --
:metadata (Hash) — A hash of metadata to be included with the object. These will be sent to S3 as headers prefixed with x-amz-meta. Each name, value pair must conform to US-ASCII.
Well, that's not going to work, because you'd get x-amz-meta-x-robots-tag.
How do you set other headers in the upload? Every other header you'd normally set is an element of the options hash, like :cache_control, which turns into Cache-Control: in the upload request. Unless they're blindly applying the keys from that hash to the upload transaction (which would be terrible design combined with excellent luck), you may not have a straightforward way to get here from there. I can't be much more specific, because the only thing I really know about Ruby is the same thing I know about Java: from what I've seen of it, I don't like it. :)
But X-Robots-Tag does appear to be a custom header S3 supports, to some extent, without clear documentation of that fact. It's, at least, accepted by the REST API.
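For illustration, the raw upload would look roughly like this (the bucket, key, and body are placeholders, and the SigV4 Authorization header is elided):

PUT /hello.txt HTTP/1.1
Host: my-bucket.s3.amazonaws.com
Authorization: AWS4-HMAC-SHA256 <signature elided>
Content-Type: text/plain
X-Robots-Tag: noindex, nofollow

hello world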
Failing the above, you can manually add this header to the metadata in the S3 console after uploading the object. (Note, X-Foo: Bar doesn't work from the S3 console, either -- it's silently discarded, with no error -- but X-Robots-Tag: works fine).
You can also, of course, put a publicly-readable robots.txt file (with the appropriate directives in it) in the root of the bucket. Depending on your content mix, path hierarchy, and other factors, that isn't (perhaps) as simple as selectively setting headers; but if the entire bucket consists of information you don't want indexed, it should easily accomplish what you want, since content should not be indexed if it is disallowed in robots.txt, even when a search spider follows a link to it from another site. Every domain's (and subdomain's) robots.txt file stands alone.
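For example, if nothing in the bucket should be indexed, a minimal robots.txt at the bucket root could be as simple as:

User-agent: *
Disallow: /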
@Michael - sqlbot is correct. The SDKs don't support it by default and it won't show in the AWS Console, but if you set it directly with the REST API it works. For those who don't want to figure out the REST API and its authentication method, I was able to modify the node.js aws-sdk to support this feature.
Amazon stores the method parameter configuration and validation in a large JSON file: apis/s3-2006-03-01.min.json. I guess that the other SDKs may implement their validation in the same way.
You can go to the "PutObject" command, and under "input.members" you can add a new parameter "XRobotsTag". Configure it as a "header" and set the location to "X-Robots-Tag".
"XRobotsTag": {
"location": "header",
"locationName": "X-Robots-Tag"
}
Your local aws-sdk is now configured to support X-Robots-Tag on your putObject requests. In node.js this would look like this:
s3.putObject({
  ACL: "public-read",
  Body: "hello world",
  Bucket: "my-bucket",
  CacheControl: "public, max-age=31536000",
  ContentType: "text/plain",
  Key: "hello.txt",
  XRobotsTag: "noindex, nofollow"
}, function(err, resp){});

Fully update documents without creating if not existent

Is there any method in Elasticsearch for fully (not partially) updating documents without creating new ones in case they don't already exist?
So far, I have found the _update method, which partially updates documents when a doc attribute is passed inside the JSON request body; however, I would like to replace the entire document in this case, not just part of it.
I have also found the index method, where sending a PUT request works fine, although it creates a new document if the ID is not yet indexed.
Setting the op_type parameter to create will enforce document creation instead of an update.
I was wondering if there is any way to always enforce update and never create a new one?
Or perhaps there is another method that would allow me to achieve this?
If I understand correctly, you want to index a doc, but only if it already exists? Like an op_type option of update?
You can mostly do it with the update API, given that your mapping remains consistent. With an _update, if the document doesn't exist, you'll get back a 404. If it does exist, ES will merge the contents of doc with whatever document exists there. If you make sure you're sending over a new doc with all the fields in the mapping, then you're effectively replacing it outright.
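For example, using the illustrative idx/type/id path and two made-up fields, the full-document _update would be roughly:

POST /idx/type/id/_update
{
  "doc": {
    "title": "replacement title",
    "body": "replacement body, with every field from the mapping included"
  }
}

If the document is missing, this returns a 404; if it exists and the doc covers every mapped field, the merge amounts to a full replacement.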
Note, however, that you can do it without the document merge rather efficiently in two requests; the first one checking for doc existence with a HEAD request. If HEAD /idx/type/id is successful, then do a PUT. This is essentially what's happening internally anyway with the update API, with a little extra overhead. But HEAD is really cheap because it's not shuffling any payload around. It simply returns an HTTP 200/404.
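The two-request variant, sketched with the same illustrative path and fields:

HEAD /idx/type/id
(200 means the document exists, 404 means it does not)

PUT /idx/type/id
{
  "title": "replacement title",
  "body": "replacement body"
}

The PUT is only issued after the HEAD returns 200, so nothing is created for a missing ID.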

About the object sequence in the RestKit

I recently started using RestKit to handle my network requests. There is a way to sort with a sort descriptor, but the data sent by the server contains no sort key.
How can I keep the data in the same order the server sent it?
One solution is to add a sortID to the object, but this is not very elegant. I want to know if there is any API in RestKit for this problem.
You should add sortID to the object; this is the appropriate solution. To populate it with a value you need to use the @metadata made available to your mappings:
#"#metadata.mapping.collectionIndex" : #"sortID"
This code assumes that you are specifying your mapping with a dictionary (addAttributeMappingsFromDictionary:).
As covered in the RestKit documentation, collectionIndex provides you with an NSNumber representing the order of the item in the response data.
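A short Objective-C sketch of that mapping (RestKit 0.20-style API; the MyItem class, its name attribute, and the sortID property are assumptions, not taken from the question):

// Map the server payload, capturing each item's position in the response array into sortID.
RKObjectMapping *mapping = [RKObjectMapping mappingForClass:[MyItem class]];
[mapping addAttributeMappingsFromDictionary:@{
    @"name"                              : @"name",
    @"@metadata.mapping.collectionIndex" : @"sortID"
}];

// The mapped objects can then be re-ordered locally with an ordinary sort descriptor.
NSSortDescriptor *byIndex = [NSSortDescriptor sortDescriptorWithKey:@"sortID" ascending:YES];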
