Case-insensitive file search in an S3 bucket - Spring Boot

I am using the AWS SDK for Java in a Spring Boot application, and I want to check whether a file exists in my S3 bucket, ignoring the case of the filename. Right now I check whether the file exists with:
s3client.doesObjectExist(bucketname,objectname)
objectname is the complete S3 key, ending with the filename. What I want is this: if the path is a/b/c/d/car.pdf, the check should return true even when the actual key in S3 is a/b/c/d/CAR.pdf, a/b/c/d/caR.pdf or a/b/c/D/car.pdf.

ObjectListing listObjects(String bucketName) throws SdkClientException, AmazonServiceException
Returns a list of summary information about the objects in the specified buckets. List results are always returned in lexicographic (alphabetical) order.
Because buckets can contain a virtually unlimited number of keys, the complete results of a list query can be extremely large. To manage large result sets, Amazon S3 uses pagination to split them into multiple responses. Always check the ObjectListing.isTruncated() method to see if the returned listing is complete or if additional calls are needed to get more results. Alternatively, use the AmazonS3Client.listNextBatchOfObjects(ObjectListing) method as an easy way to get the next page of object listings.
The total number of keys in a bucket doesn't substantially affect list performance.
So, you can do something like this:
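ObjectListing objectListing = s3client.listObjects(bucketname);
for (S3ObjectSummary objectSummary : objectListing.getObjectSummaries()) {
    // S3 keys are case-sensitive, so compare each key ourselves, ignoring case
    // (this checks only the first page; see objectListing.isTruncated())
    if (objectSummary.getKey().equalsIgnoreCase(objectname)) {
        return true;
    }
}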
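The loop above stops at the first page. A minimal sketch of the full search, written in plain Java so it runs without AWS: it follows pages while the listing is truncated, as the Javadoc advises, and collects every key that equals the requested name ignoring case. Listing and PageSource are hypothetical stand-ins for ObjectListing and for the listObjects / listNextBatchOfObjects calls; they are not SDK types. Because S3 keys are case-sensitive, several keys can match, so the sketch returns all of them. Passing a prefix to listObjects would not narrow this search safely, since S3 prefix matching is case-sensitive as well.

```java
import java.util.*;

/** Hypothetical stand-in for the SDK's ObjectListing: one page of keys plus the truncation flag. */
record Listing(List<String> keys, boolean truncated) {}

/** Hypothetical stand-in for listObjects(bucketName) and listNextBatchOfObjects(previous). */
interface PageSource {
    Listing first();
    Listing next(Listing previous);
}

class CaseInsensitiveKeySearch {
    /** Locale.ROOT keeps case folding independent of the default locale (e.g. Turkish dotless i). */
    private static String fold(String key) {
        return key.toLowerCase(Locale.ROOT);
    }

    /**
     * Walks every page until one is not truncated and returns all keys equal to
     * objectName ignoring case, in listing order. Empty list means "does not exist".
     */
    static List<String> findIgnoringCase(PageSource source, String objectName) {
        String wanted = fold(objectName);
        List<String> matches = new ArrayList<>();
        Listing page = source.first();
        while (true) {
            for (String key : page.keys()) {
                if (fold(key).equals(wanted)) {
                    matches.add(key);
                }
            }
            if (!page.truncated()) {
                return matches;
            }
            page = source.next(page);
        }
    }
}
```

With the real SDK the loop has the same shape: start from listObjects(bucketname), call listNextBatchOfObjects(listing) while listing.isTruncated() is true, and compare each S3ObjectSummary.getKey(). Since every check lists the whole bucket, frequent lookups may be better served by storing keys in one normalized case at upload time.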

Related

How to get a list of objects from an S3 bucket sorted by last-modified timestamp using the minio-go API?

I went through the minio-go API documentation but found no solution, because objects are always listed in alphabetical order.
A hacky workaround would be to list all the objects first, read each one's last-modified date, and build a new sorted list, but that is not at all feasible in production.
@Siddhanta Rath, one way to handle this is the mc tool: the commands mc find --newer and mc find --older cover this case. Internally, though, mc still calls listObjects and does the sorting for you.
Another approach is to subscribe to bucket notifications and keep a list of uploaded objects in a database.
There is no way to specify a sort order in the Amazon S3 API. Your application will need to sort the objects into the desired order.
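Following the last answer, a minimal client-side sketch: collect each object's key and last-modified time from the listing, then sort newest first. ObjectInfo is a hypothetical record, not a minio-go or SDK type. If only the newest few objects are needed, the newestN variant keeps a bounded min-heap instead of a fully sorted copy, although every object still has to be listed.

```java
import java.time.Instant;
import java.util.*;

/** Hypothetical pair of key and last-modified time taken from one listing entry. */
record ObjectInfo(String key, Instant lastModified) {}

class ByLastModified {
    /** Returns a new list ordered newest first; the input is left untouched. */
    static List<ObjectInfo> newestFirst(Collection<ObjectInfo> objects) {
        List<ObjectInfo> sorted = new ArrayList<>(objects);
        sorted.sort(Comparator.comparing(ObjectInfo::lastModified).reversed());
        return sorted;
    }

    /** Keeps only the n newest objects using a min-heap of size n, then returns them newest first. */
    static List<ObjectInfo> newestN(Iterable<ObjectInfo> objects, int n) {
        PriorityQueue<ObjectInfo> heap = new PriorityQueue<>(Comparator.comparing(ObjectInfo::lastModified));
        for (ObjectInfo o : objects) {
            heap.add(o);
            if (heap.size() > n) {
                heap.poll(); // drop the oldest object seen so far
            }
        }
        return newestFirst(heap);
    }
}
```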

How to filter an NSArray of file urls by Spotlight File Metadata Attributes / NSMetadataQuery?

The NSMetadataQuery class seems to be how Finder/Spotlight searches for files via their metadata.
The NSMetadataQuery class is provided by the Foundation framework. Queries can run in two modes: asynchronous, and asynchronous with live updates. The first simply searches the files that exist at the time of the initial search; the latter keeps searching, updating the results as files start or stop matching the search parameters.
https://developer.apple.com/library/content/documentation/Carbon/Conceptual/SpotlightQuery/Concepts/Introduction.html#//apple_ref/doc/uid/TP40001843-BBCFBCAG
However, it seems oriented around providing a directory (searchScopes), and then asynchronously returning results that were found in those directories (NSMetadataQueryDidFinishGathering).
I already have an NSArray containing file URLs. I would like to filter those NSURLs using the same metadata attributes and query syntax as a Spotlight search, but by supplying a list of files to filter quickly, rather than providing a directory and receiving asynchronous results.
// Something like this...
let imageFileTypePredicate = NSPredicate(fromMetadataQueryString: "(kMDItemGroupId = 13)")
let imageURLs = allURLs.filter{ imageFileTypePredicate.evaluate(with:$0) };
However, that uses a standard NSPredicate evaluation rather than a file-metadata filter, and it throws the error:
this class is not key value coding-compliant for the key _kMDItemGroupId.
The Spotlight Metadata Attributes I'm interested in filtering by are listed here:
https://developer.apple.com/library/content/documentation/CoreServices/Reference/MetadataAttributesRef/Reference/CommonAttrs.html#//apple_ref/doc/uid/TP40001694-SW1
How can an array of file urls be filtered by Spotlight metadata?
Create an MDItem for each url to get the file's spotlight attributes.
MDItem is a CF-compliant object that represents a file and the metadata associated with the file.
https://developer.apple.com/reference/coreservices/1658213-mditem
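MDItemRef item = MDItemCreateWithURL(kCFAllocatorDefault, (__bridge CFURLRef)url); // url is an NSURL
if (item == NULL) return nil; // no metadata item could be created for this URL
CFArrayRef attributes = MDItemCopyAttributeNames(item);
NSDictionary *attributeValues = CFBridgingRelease(MDItemCopyAttributes(item, attributes));
if (attributes != NULL) CFRelease(attributes); // Copy rule: the caller owns the names array
CFRelease(item);                              // Create rule: the caller owns the MDItem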
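Once each file's attributes are in a dictionary, filtering the original array is an ordinary in-memory filter over those dictionaries. Java has no Spotlight binding, so this is only a language-neutral illustration of that second step: FileWithAttributes is a hypothetical record, and the attribute values used with it are made up rather than real Spotlight output.

```java
import java.util.*;
import java.util.function.Predicate;

/** Hypothetical pairing of a file URL with the attribute dictionary read for it. */
record FileWithAttributes(String url, Map<String, Object> attributes) {}

class AttributeFilter {
    /** Predicate that is true when the named attribute's value equals the expected value. */
    static Predicate<FileWithAttributes> attributeEquals(String name, Object expected) {
        return f -> Objects.equals(f.attributes().get(name), expected);
    }

    /** Returns the URLs of the files that satisfy the predicate, preserving input order. */
    static List<String> filterUrls(List<FileWithAttributes> files, Predicate<FileWithAttributes> keep) {
        List<String> urls = new ArrayList<>();
        for (FileWithAttributes f : files) {
            if (keep.test(f)) {
                urls.add(f.url());
            }
        }
        return urls;
    }
}
```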

Google Drive API v3, is there a way to get a list of folders that are parents of a fileId?

In v2 it was possible to make a call to /files with the query fileId in children to get a list of DriveFile objects that were parents of the supplied file.
Now, it seems to be required to make a call to /files/:fileId?fields=parents, then make a separate call to /files/:parentId for each returned parent, possibly turning one call into a dozen.
Is this correct, and if so why? This is a huge performance hit to our app, so hopefully there's an undocumented method.
The query 'fileId' in children doesn't exist publicly in v2 either (it's not documented or supported), and I don't recall it ever existing. What does exist in v2 is the Parents collection, which effectively answers the same question. In v3, to get the parents of a file, you just get the child and ask for its parents field.
As for whether that is a performance hit, I don't think it is in practice. The Parents resource in v2 was very light to begin with, and other than the ID the only useful field was the isRoot property. You can compute that yourself: call files/root once up front to get the ID of the user's root folder, and save it (it won't change for that user).
If you need more information about the parents than just their IDs and are worried about the number of calls, use batching to fetch them. If a file has just one parent, there's no need to batch (it would only add overhead). If a file has multiple parents, create a batch request. That is sent as a single HTTP request/response and is handled very efficiently on the back end.
The point is that if you only need IDs, it's no worse than before: one call gets the parents of a file.
If you need more than IDs, it's at most two HTTP requests (outside bizarre edge cases such as a file with more parents than fit in a single batch request).
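The counting argument above can be written down directly. A minimal sketch, assuming the parent IDs came back from a /files/:fileId?fields=parents request and the root folder ID was fetched once and cached; the class is hypothetical, and the per-batch call limit is a parameter because it is set by the API, not by this code.

```java
import java.util.*;

class DriveParentPlan {
    /** Derives v2's isRoot flag for each parent ID by comparing it with the cached root folder ID. */
    static Map<String, Boolean> isRootFlags(List<String> parentIds, String cachedRootId) {
        Map<String, Boolean> flags = new LinkedHashMap<>();
        for (String id : parentIds) {
            flags.put(id, id.equals(cachedRootId));
        }
        return flags;
    }

    /**
     * HTTP requests needed after the initial fields=parents call to fetch full parent metadata:
     * none for no parents, one plain GET for a single parent, otherwise one batch per
     * maxCallsPerBatch parents.
     */
    static int followUpRequests(int parentCount, int maxCallsPerBatch) {
        if (parentCount <= 0) {
            return 0;
        }
        return (parentCount + maxCallsPerBatch - 1) / maxCallsPerBatch; // ceiling division
    }
}
```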
In V3 it is possible to list all children of a parent as it's explained here: https://developers.google.com/drive/v3/web/search-parameters
Example call:
https://www.googleapis.com/drive/v3/files?q='0Byho0qAdzabmVl8xcDR1S0pNY3c' in parents (URL-encode the query, e.g. spaces as %20 and quotes as %27). This lists all the files in the folder whose ID is 0Byho0qAdzabmVl8xcDR1S0pNY3c.
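A minimal Java sketch of building that request URL, with the q parameter in the 'folderId' in parents form and properly encoded. Only the URL is built: an actual request also needs OAuth credentials, and client libraries normally handle the encoding. The class name is hypothetical.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

class DriveChildrenQuery {
    private static final String FILES_ENDPOINT = "https://www.googleapis.com/drive/v3/files";

    /** Builds the files.list URL whose q parameter selects items that have folderId among their parents. */
    static String childrenUrl(String folderId) {
        String q = "'" + folderId + "' in parents";
        // URLEncoder encodes spaces as '+'; swap in %20 as the answer suggests.
        // Any literal '+' in q has already become %2B, so this replacement is safe.
        String encoded = URLEncoder.encode(q, StandardCharsets.UTF_8).replace("+", "%20");
        return FILES_ENDPOINT + "?q=" + encoded;
    }
}
```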
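With the .NET client library, for example, listing the items in the root folder looks like this:
var request = service.Files.List();
request.Q = "'root' in parents"; // items whose parents include the user's root folder
var fileList = request.Execute();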

How do I access information from this unfamiliar data structure via Ruby?

I'm using Fog to access a cloud environment at Terremark. When I pull down our organizational data, it returns a data structure that confuses me, although I know it should be straightforward.
Using irb I initialize the connection and then request the data using conn.organizations and display it with awesome_print. It returns:
[
[0] <Fog::Compute::Ecloud::Organization
href="/cloudapi/ecloud/organizations/#######",
name="****************************** (***-###-###)",
type="application/vnd.tmrk.cloud.organization",
other_links=[{:href=>"/cloudapi/ecloud/admin/organizations/#######", :name=>"****************************** (***-###-###)", :type=>"application/vnd.tmrk.cloud.admin.organization", :rel=>"alternate"}, {:href=>"/cloudapi/ecloud/environments/organizations/#######", :type=>"application/vnd.tmrk.cloud.environment; type=collection", :rel=>"down"}, {:href=>"/cloudapi/ecloud/devicetags/organizations/#######", :type=>"application/vnd.tmrk.cloud.deviceTag; type=collection", :rel=>"down"}, {:href=>"/cloudapi/ecloud/alerts/organizations/#######", :type=>"application/vnd.tmrk.cloud.alertLog", :rel=>"down"}]
>
]
So it returns an array with a single element. That element is another data structure surrounded by < and >, but I'm not sure that's accurate, because there also appears to be an array of hashes embedded within that structure.
My issue is that I need to extract the value represented by ####### but I don't know how to access any of the parts of the output that contain it.
What data structure am I looking at, and how do I access the data it contains?
It's a Fog::Compute::Ecloud::Organization object and the documentation of that class should tell you what methods are available. Or you can just ask the object itself, by calling Object#methods.
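The inspect output suggests the organization exposes href as an attribute, and the masked ####### is the last path segment of that href. Java is used here only to illustrate the string handling, which carries over to the string returned by the object's href attribute; the href values below are made-up examples in the same shape as the masked ones, and the class is hypothetical.

```java
import java.net.URI;

class HrefId {
    /** Returns the last non-empty path segment of an href such as /cloudapi/ecloud/organizations/1234567. */
    static String lastSegment(String href) {
        String path = URI.create(href).getPath(); // drops any query string
        String trimmed = path.endsWith("/") ? path.substring(0, path.length() - 1) : path;
        return trimmed.substring(trimmed.lastIndexOf('/') + 1);
    }
}
```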

Matching multiple DataItems using .getDataItems

I have data items stored with the path /item/<id>. I'm trying to get all current items stored in the network using Wearable.DataApi.getDataItems(GoogleApiClient, Uri), but passing in the uri with the /item/ path doesn't match anything. I've checked to make sure the hosts match up.
The Android Wear documentation says that:
if the host is elided, all data items matching that path are returned.
What does that mean?
Did you try Uri.parse("wear:/item")?
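In Wearable data item URIs the host position holds the ID of the node that created the item, so wear://<node-id>/item/42 names one node's item, while eliding the host, as in wear:/item/42, names the path without tying it to a node. A minimal sketch with java.net.URI (not Android's Uri class) shows that parse; the node ID is made up, and the two match helpers are hypothetical, contrasting an exact path comparison with a prefix comparison, which is the distinction that decides whether /item/ can select /item/<id> entries.

```java
import java.net.URI;

class WearUriDemo {
    /** Host part of the URI, or null when it is elided as in wear:/item/42. */
    static String host(String uri) {
        return URI.create(uri).getHost();
    }

    /** Hypothetical exact match: the item's path must equal the requested path. */
    static boolean literalMatch(String itemUri, String requestedPath) {
        return URI.create(itemUri).getPath().equals(requestedPath);
    }

    /** Hypothetical prefix match: the item's path only has to start with the requested path. */
    static boolean prefixMatch(String itemUri, String requestedPath) {
        return URI.create(itemUri).getPath().startsWith(requestedPath);
    }
}
```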
https://developer.android.com/reference/com/google/android/gms/wearable/DataApi.html
