I am converting Oracle BLOB content into a byte stream and uploading the contents to Azure cloud storage. Is there any way I can cross-check whether the files uploaded to storage are intact and not corrupted?
Thanks for your support.
@Bala,
As far as I know, we can check whether an upload succeeded in these ways:
After uploading the file, we can fetch the blob's length property and compare it with the original file size:
// Re-fetch the blob's properties from the service, then compare the stored
// length with the original file size (held in `length` here).
blob.FetchAttributes();
bool success = blob.Properties.Length == length;
Another approach is to split the file into chunks and upload those chunks asynchronously using the PutBlockAsync method. You can also show upload progress by building a progress bar from the chunk size and the number of chunks uploaded. I recommend this post on how to use the method:
https://stackoverflow.com/a/21182669/4836342 or this blog.
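For illustration, here is a minimal sketch of that chunked approach, assuming the classic Microsoft.WindowsAzure.Storage SDK; the 4 MB block size, the progress callback and the per-block Content-MD5 check are my own assumptions rather than anything from the linked post. Sending a Content-MD5 with each PutBlockAsync call lets the service verify every chunk as it arrives, which also helps with the original corruption question.
using System;
using System.Collections.Generic;
using System.IO;
using System.Security.Cryptography;
using System.Text;
using System.Threading.Tasks;
using Microsoft.WindowsAzure.Storage.Blob;

static class BlockUploader
{
    public static async Task UploadInBlocksAsync(
        CloudBlockBlob blob, Stream source, Action<long> reportProgress, int blockSize = 4 * 1024 * 1024)
    {
        var blockIds = new List<string>();
        var buffer = new byte[blockSize];
        long totalUploaded = 0;
        int blockNumber = 0, bytesRead;

        while ((bytesRead = await source.ReadAsync(buffer, 0, buffer.Length)) > 0)
        {
            // Block IDs must be Base64 strings of equal length.
            string blockId = Convert.ToBase64String(Encoding.UTF8.GetBytes(blockNumber.ToString("d6")));
            blockNumber++;

            // The service checks this MD5 against the bytes it actually receives for the block.
            string blockMd5;
            using (var md5 = MD5.Create())
                blockMd5 = Convert.ToBase64String(md5.ComputeHash(buffer, 0, bytesRead));

            await blob.PutBlockAsync(blockId, new MemoryStream(buffer, 0, bytesRead), blockMd5);

            blockIds.Add(blockId);
            totalUploaded += bytesRead;
            reportProgress(totalUploaded); // drive a progress bar from here
        }

        // Commit the block list to make the uploaded blocks the blob's content.
        await blob.PutBlockListAsync(blockIds);
    }
}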
I am trying to upload videos using Laravel, FilePond and an S3 bucket. When the file size is greater than 5 MB, AWS does not return anything and the file is not uploaded. But when the file size is less than 5 MB, it is uploaded and I am able to get the S3 file path.
public function upload_video(Request $request)
{
    if ($request->hasFile('link')) {
        $video_link = Storage::disk('s3')->put('videos', $request->file('link'));
        return $video_link;
    }
}
Ensure your PHP.INI settings allow uploads that are large enough. PHP only allows 2 MB by default, so increase this limit (to 20 MB in my case). Also, Amazon S3 doesn't like files over 5 MB being uploaded in one go; instead, you need to stream them through.
Please use the reference link below to solve this issue by streaming from the server to S3.
https://www.webdesign101.net/laravel-s3-uploads-failing-when-uploading-large-files-in-laravel-5-8/
If you want to upload big files you should use streams. Here’s the code to do it:
$disk = Storage::disk('s3');
$disk->put($targetFile, fopen($sourceFile, 'r+'));
I provide users with a SAS token to upload blobs. I'd like to check whether the blobs represent a valid image or not. How can I do this? In the SAS token, I make sure the blob name ends with a jpeg extension, but this does not mean the users upload an image since everything is uploaded as a byte stream.
This is not possible, as described here. Perhaps the better way to validate is at the front end, when the user tries to upload the file.
You can write an Azure Function that is triggered every time a new blob is uploaded. In that function you can check whether the blob is a valid image file; if it is not, you can delete it or send an email to the uploader.
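As a rough sketch (not a drop-in implementation) of that idea: a blob-triggered function that tries to decode the new blob as an image and deletes it when decoding fails. The "uploads" container name, the System.Drawing check and the CloudBlockBlob binding are my assumptions; the exact namespaces depend on your Functions/Storage SDK versions.
using System;
using System.Drawing;                 // requires the System.Drawing.Common package
using System.IO;
using Microsoft.Azure.Storage.Blob;   // Microsoft.WindowsAzure.Storage.Blob on older SDKs
using Microsoft.Azure.WebJobs;
using Microsoft.Extensions.Logging;

public static class ValidateUploadedImage
{
    [FunctionName("ValidateUploadedImage")]
    public static void Run(
        [BlobTrigger("uploads/{name}")] Stream uploaded,
        [Blob("uploads/{name}", FileAccess.ReadWrite)] CloudBlockBlob blob,
        string name,
        ILogger log)
    {
        try
        {
            // Image.FromStream throws if the bytes cannot be decoded as an image.
            using (var image = Image.FromStream(uploaded))
            {
                log.LogInformation($"{name} is a valid image ({image.Width}x{image.Height}).");
            }
        }
        catch (ArgumentException)
        {
            log.LogWarning($"{name} is not a valid image; deleting it.");
            blob.DeleteIfExists();    // or notify the uploader instead
        }
    }
}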
Summary: I'm using Blobstore to let users upload images to be served. I want to prevent users from uploading files that aren't valid images or have dimensions that are too large. I'm using App Engine's Images service to get the relevant metadata. BUT, in order to get any information about the image type or dimensions from the Images service, you have to first execute a transform, which fetches the transformed image to the App Engine server. I have it do a no-op crop and encode as a very low quality JPEG image, but it's still fetching an actual image, and all I want is the dimensions and file type. Is this the best I can do? Will the internal transfer of the image data (from Blobstore to App Engine server) cost me?
Details:
It seems like Blobstore was carefully designed for efficient serving of images from App Engine. On the other hand, certain operations seem to make you jump through inefficient hoops. I'm hoping someone can tell me that there's a more efficient way, or convince me that what I'm doing is not as wasteful as I think it is.
I'm letting users upload images to be served as part of other user-generated content. Blobstore makes the uploading and serving pretty easy. Unfortunately it lets the user upload any file they want, and I want to impose restrictions.
(Side note: Blobstore does let you limit the file size of uploads, but this feature is poorly documented. It turns out that if the user tries to exceed the limit, Blobstore will return a 413 "Entity too large", and the App Engine handler is not called at all.)
I want to allow only valid JPEG, GIF, and PNG files, and I want to limit the dimensions. The way to do this seems to be to check the file after upload, and delete it if it's not allowed. Here's what I've got:
class ImageUploadHandler(blobstore_handlers.BlobstoreUploadHandler):
    def post(self):
        try:
            # TODO: Check that user is logged in and has quota; xsrfToken.
            uploads = self.get_uploads()
            if len(uploads) != 1:
                logging.error('{} files uploaded'.format(len(uploads)))
                raise ServerError('Must be exactly 1 image per upload')
            image = images.Image(blob_key=uploads[0].key())
            # Do a no-op transformation; otherwise execute_transforms()
            # doesn't work and you can't get any image metadata.
            image.crop(0.0, 0.0, 1.0, 1.0)
            image.execute_transforms(output_encoding=images.JPEG, quality=1)
            if image.width > 640 or image.height > 640:
                raise ServerError('Image must be 640x640 or smaller')
            resultUrl = images.get_serving_url(uploads[0].key())
            self.response.headers['Content-Type'] = 'application/json'
            self.response.body = jsonEncode({'status': 0, 'imageUrl': resultUrl})
        except Exception as e:
            for upload in uploads:
                blobstore.delete(upload.key())  # TODO: delete in parallel with delete_async
            self.response.headers['Content-Type'] = 'text/plain'
            self.response.status = 403
            self.response.body = e.args[0]
Comments in the code highlight the issue.
I know the image can be resized on the fly at serve time (using get_serving_url), but I'd rather force users to upload a smaller image in the first place, to avoid using up storage. Later, instead of putting a limit on the original image dimensions, I might want to have it automatically get shrunk at upload time, but I'd still need to find out its dimensions and type before shrinking it.
Am I missing an easier or more efficient way?
Actually, the Blobstore is not exactly optimized for serving images; it operates on any kind of data. The BlobReader class can be used to manage the raw blob data.
The GAE Images service can be used to manage images (including those stored as blobs in the Blobstore). You are right in the sense that this service offers info about an uploaded image only after executing a transformation on it, which doesn't help with deleting undesirable blob images prior to processing.
What you can do is use the Image module from the PIL library (available among GAE's runtime-provided libraries) overlaid on top of the BlobReader class.
The PIL Image format and size attributes give you the info you seek, letting you sanitize the image data before reading the entire image:
>>> image = Image.open('Spain-rail-map.jpg')
>>> image.format
'JPEG'
>>> image.size
(410, 317)
Accessing these attributes should be very efficient, since the open method only reads the image header info from the blob:
Opens and identifies the given image file. This is a lazy operation;
the function reads the file header, but the actual image data is not
read from the file until you try to process the data (call the load
method to force loading).
This is how overlaying can be done in your ImageUploadHandler:
from PIL import Image

with blobstore.BlobReader(uploads[0].key()) as fd:
    image = Image.open(fd)
    logging.error('format=%s' % image.format)
    logging.error('size=%dx%d' % image.size)
When you upload to Google Cloud Storage (GCS) instead of the blobstore you have much more control over object upload conditions like name, type and size. A policy document controls the user conditions. If the user does not meet these upload conditions, the object will be rejected.
Docs here.
Example:
{"expiration": "2010-06-16T11:11:11Z",
"conditions": [
["starts-with", "$key", "" ],
{"acl": "bucket-owner-read" },
{"bucket": "travel-maps"},
{"success_action_redirect":"http://www.example.com/success_notification.html" },
["eq", "$Content-Type", "image/jpeg" ],
["content-length-range", 0, 1000000]
]
}
The POST response if the content length was exceeded:
<Error>
  <Code>EntityTooLarge</Code>
  <Message>
    Your proposed upload exceeds the maximum allowed object size.
  </Message>
  <Details>Content-length exceeds upper bound on range</Details>
</Error>
The POST response if a PDF was sent:
<Error>
  <Code>InvalidPolicyDocument</Code>
  <Message>
    The content of the form does not meet the conditions specified in the policy document.
  </Message>
  <Details>Policy did not reference these fields: filename</Details>
</Error>
And here you can find my Python code for a direct upload to GCS.
Is there any way to read a text file line by line from blob storage in Windows Azure?
Thanks
Yes, you can do this with streams, and it doesn't necessarily require that you pull the entire file, though please read to the end (of the answer... not the file in question) because you may want to pull the whole file anyway.
Here is the code:
StorageCredentialsAccountAndKey credentials = new StorageCredentialsAccountAndKey(
    "YourStorageAccountName",
    "YourStorageAccountKey");
CloudStorageAccount account = new CloudStorageAccount(credentials, true);
CloudBlobClient client = new CloudBlobClient(account.BlobEndpoint.AbsoluteUri, account.Credentials);
CloudBlobContainer container = client.GetContainerReference("test");
CloudBlob blob = container.GetBlobReference("CloudBlob.txt");

using (var stream = blob.OpenRead())
{
    using (StreamReader reader = new StreamReader(stream))
    {
        while (!reader.EndOfStream)
        {
            Console.WriteLine(reader.ReadLine());
        }
    }
}
I uploaded a text file called CloudBlob.txt to a container called test. The file was about 1.37 MB in size (I actually used the CloudBlob.cs file from GitHub copied into the same file six or seven times). I tried this out with a BlockBlob which is likely what you'll be dealing with since you are talking about a text file.
This gets a reference to the blob as usual, then I call the OpenRead() method on the CloudBlob object, which returns a BlobStream that you can wrap in a StreamReader to get the ReadLine method. I ran Fiddler with this and noticed that it ended up making three additional calls to fetch further blocks and complete the file. It looks like the BlobStream has a few properties you can use to tweak how much read-ahead it does, but I didn't try adjusting them. According to one reference I found, the retry policy also works at the last-read level, so it won't attempt to re-read the whole thing, just the last request that failed. Quoted here:
Lastly, the DownloadToFile/ByteArray/Stream/Text() methods performs it’s entire download in a single streaming get. If you use CloudBlob.OpenRead() method it will utilize the BlobReadStream abstraction which will download the blob one block at a time as it is consumed. If a connection error occurs, then only that one block will need to be re-downloaded(according to the configured RetryPolicy). Also, this will potentially help improve performance as the client may not need cache a large amount of data locally. For large blobs this can help significantly, however be aware that you will be performing a higher number of overall transactions against the service. -- Joe Giardino
I think it is important to note the caution that Joe points out in that this will lead to an overall larger number of transactions against your storage account. However, depending on your requirements this may still be the option you are looking for.
If these are massive files and you are doing a lot of this, then it could mean many, many transactions (though you could see if you can tweak the properties on the BlobStream to increase the number of blocks retrieved at a time, etc.). It may still make sense to do a DownloadToStream on the CloudBlob (which will pull the entire contents down), then read from that stream the same way I did above.
The only real difference is that one pulls smaller chunks at a time and the other pulls the full file immediately. There are pros and cons for each, and it will depend heavily on how large these files are and whether you plan on stopping at some point in the middle of reading the file (such as "yeah, I found the string I was searching for!") or plan on reading the entire file anyway. If you plan on pulling the whole file no matter what (because you are processing the entire file, for example), then just use DownloadToStream and wrap that in a StreamReader, as sketched below.
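For completeness, a short sketch of that whole-file alternative, reusing the same client and placeholder names as the snippet above:
CloudBlobContainer container = client.GetContainerReference("test");
CloudBlob blob = container.GetBlobReference("CloudBlob.txt");

using (var fullContents = new MemoryStream())
{
    // One streaming GET pulls the entire blob down up front.
    blob.DownloadToStream(fullContents);
    fullContents.Position = 0;

    using (StreamReader reader = new StreamReader(fullContents))
    {
        while (!reader.EndOfStream)
        {
            Console.WriteLine(reader.ReadLine());
        }
    }
}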
Note: I tried this with the 1.7 SDK. I'm not sure in which SDK version these options were introduced.
In case anyone finds themselves here, the Python SDK for Azure Blob Storage (v12) now has the simple download_blob() method, which accepts two parameters - offset and length.
Using Python, my goal was to extract the header row from (many) files in blob storage. I knew the locations of all of the files, so I created a list of the blob clients - one for each file. Then, I iterated through the list and ran the download_blob method.
Once you have created a blob client (either directly via a connection string or using the BlobServiceClient.get_blob_client() method), just download the first (say) 4 KB to cover any long header rows, then split the text on the end-of-line character ('\n'). The first element of the resulting list will be the header row. My working code (just for a single file) looked like:
from azure.storage.blob import BlobServiceClient
MAX_LINE_SIZE = 4096  # You can change this..
my_blob_service_client = BlobServiceClient(account_url=my_url, credential=my_shared_access_key)
my_blob_client = my_blob_service_client.get_blob_client('my-container', 'my_file.csv')
file_size = my_blob_client.get_blob_properties().size
offset = 0
You can then write a loop to download the text line by line, counting the byte offset of the first end-of-line and requesting the next MAX_LINE_SIZE bytes from there. For optimum efficiency it'd be nice to know the maximum length of a line, but if you don't, guess a sufficiently large length.
while offset < file_size - 1:
    # Download the next chunk and decode it to text.
    next_text_block = my_blob_client.download_blob(offset=offset, length=MAX_LINE_SIZE).readall().decode('utf-8')
    line = next_text_block.split('\n')[0]
    offset += len(line) + 1  # advance past this line and its newline
    # Do something with your line..
Hope that helps. The obvious trade-off here is network overhead: each call for a line of text is not fast, but it achieves your requirement of reading line by line.
To directly answer your question, you will have to write code to download the blob locally first and then read its content. This is mainly because you cannot just peek into a blob and read its content in the middle. If you were using Windows Azure Table Storage, you could certainly read specific content from the table.
As your text file is a blob located in Azure Blob storage, what you really need is to download the blob locally (as a local file or memory stream) and then read its content. You will have to download the blob in full or in part depending on what type of blob you uploaded. With page blobs you can download a specific range of content locally and process it. It is worth understanding the difference between block and page blobs in this regard.
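As a rough illustration of a partial download (assuming a later Microsoft.WindowsAzure.Storage SDK, where CloudBlob exposes DownloadRangeToStream; the container/blob names and the 4 KB range are placeholders):
CloudBlobContainer container = client.GetContainerReference("test");
CloudBlob blob = container.GetBlobReference("CloudBlob.txt");

using (var partial = new MemoryStream())
{
    // Fetch only the first 4 KB of the blob instead of the whole file.
    blob.DownloadRangeToStream(partial, 0, 4096);
    partial.Position = 0;

    using (StreamReader reader = new StreamReader(partial))
    {
        Console.WriteLine(reader.ReadLine());
    }
}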
This is the code I used to fetch a file line by line. The file was stored in Azure Storage; the File service was used, not the Blob service.
// https://learn.microsoft.com/en-us/azure/storage/storage-dotnet-how-to-use-files
// https://<storage account>.file.core.windows.net/<share>/<directory/directories>/<file>
public void ReadAzureFile()
{
    CloudStorageAccount account = CloudStorageAccount.Parse(
        CloudConfigurationManager.GetSetting("StorageConnectionString"));
    CloudFileClient fileClient = account.CreateCloudFileClient();
    CloudFileShare share = fileClient.GetShareReference("jiosongdetails");
    if (share.Exists())
    {
        CloudFileDirectory rootDir = share.GetRootDirectoryReference();
        CloudFile file = rootDir.GetFileReference("songdetails(1).csv");
        if (file.Exists())
        {
            using (var stream = file.OpenRead())
            {
                using (StreamReader reader = new StreamReader(stream))
                {
                    while (!reader.EndOfStream)
                    {
                        Console.WriteLine(reader.ReadLine());
                    }
                }
            }
        }
    }
}
I have a requirement where I need to upload XLS (Excel 2003) files into a SQL Server 2008 R2 database. I am using ORCHAD stuff for scheduling.
I am using the HttpPostedFileBase file stream to convert the file into a byte array and store it in the database.
After storing, a background scheduler picks up the task and processes the stored data. I need to create objects from the data in the Excel file and send them for processing. I am stuck at decoding the byte array :(
What is the best way to handle this kind of requirement? Are there any libraries I can make use of?
My web app is built with MVC3, EF 4.1, the repository pattern and Autofac.
I have not used the HttpPostedFileBase class, but you could:
Convert the file to a byte stream
Save it as the appropriate byte/blob type in your database (store the extension in a separate field)
Retrieve the bytes and add the appropriate extension to the file stream
Treat it as a normal file...
But I'm actually wondering if your requirements even demand this. Why are you storing the file in the first place? If you are only using the file data to shape your business object (that I'm guessing gets saved somewhere), you could perform that data extraction, shaping, and persistence before you store the file as raw bytes so you never have to reconstitute the file for that purpose.
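If you do end up persisting the raw file, a minimal sketch of the first two steps in an MVC3 controller action might look like this; the Document entity and _documentRepository are hypothetical placeholders for your own types:
[HttpPost]
public ActionResult Upload(HttpPostedFileBase file)
{
    if (file == null || file.ContentLength == 0)
        return new HttpStatusCodeResult(400, "No file uploaded");

    // Convert the posted file to a byte array.
    byte[] contents;
    using (var reader = new BinaryReader(file.InputStream))
    {
        contents = reader.ReadBytes(file.ContentLength);
    }

    // Store the bytes plus the extension in a separate field so the
    // file can be reconstituted later.
    var document = new Document
    {
        FileName = Path.GetFileName(file.FileName),
        Extension = Path.GetExtension(file.FileName),
        Data = contents               // maps to varbinary(max) in SQL Server
    };
    _documentRepository.Add(document);

    return RedirectToAction("Index");
}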