Use Azure Blob storage instead of S3 buckets in Apache Superset cache - azure-blob-storage

I'm trying to enable thumbnail caching in my Superset instance. The official docs provide an example of caching the thumbnail images on Amazon S3. But I'm on the Azure cloud. How could I rework this block of code to work with an Azure blob (or other appropriate Azure storage) instead?
From the docs:
from flask import Flask
from s3cache.s3cache import S3Cache
...

class CeleryConfig(object):
    broker_url = "redis://localhost:6379/0"
    imports = ("superset.sql_lab", "superset.tasks", "superset.tasks.thumbnails")
    result_backend = "redis://localhost:6379/0"
    worker_prefetch_multiplier = 10
    task_acks_late = True

CELERY_CONFIG = CeleryConfig

def init_thumbnail_cache(app: Flask) -> S3Cache:
    return S3Cache("bucket_name", 'thumbs_cache/')

THUMBNAIL_CACHE_CONFIG = init_thumbnail_cache

# Async selenium thumbnail task will use the following user
THUMBNAIL_SELENIUM_USER = "Admin"
I have the rest of my caching working with Redis. I see these Python packages that look relevant: https://github.com/alejoar/Flask-Azure-Storage and https://pypi.org/project/azure-storage-blob/ but I'm not sure where to go next.
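One possible direction, as a minimal sketch under heavy assumptions: the docs example only seems to need a cache object exposing a werkzeug/cachelib-style get/set/delete interface (which is what S3Cache provides), so the azure-storage-blob package from the second link could back a small class of the same shape. The class name, container name, and connection-string placeholder below are illustrative and not part of any published package, and the exact contract THUMBNAIL_CACHE_CONFIG expects can differ between Superset versions (newer releases configure it as a Flask-Caching style dictionary rather than a callable), so check your version's docs before relying on this shape.

from flask import Flask
from azure.core.exceptions import ResourceNotFoundError
from azure.storage.blob import BlobServiceClient

class AzureBlobCache:
    """Minimal werkzeug-style cache that stores each value as a blob under a prefix."""

    def __init__(self, connection_string: str, container: str, prefix: str = "thumbs_cache/"):
        service = BlobServiceClient.from_connection_string(connection_string)
        self.container = service.get_container_client(container)
        self.prefix = prefix

    def _blob(self, key: str):
        return self.container.get_blob_client(self.prefix + key)

    def get(self, key: str):
        try:
            return self._blob(key).download_blob().readall()
        except ResourceNotFoundError:
            return None

    def set(self, key: str, value, timeout=None):
        # timeout is accepted for interface compatibility but not enforced here
        self._blob(key).upload_blob(value, overwrite=True)
        return True

    def delete(self, key: str):
        try:
            self._blob(key).delete_blob()
            return True
        except ResourceNotFoundError:
            return False

def init_thumbnail_cache(app: Flask) -> AzureBlobCache:
    # connection string and container name are placeholders
    return AzureBlobCache("<your-storage-connection-string>", "thumbnails")

THUMBNAIL_CACHE_CONFIG = init_thumbnail_cache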

Related

How do I access BlockBlobClient in Azure Storage JavaScript client library for browsers?

I'm attempting to use BlockBlobClient in a browser page to upload a file using a server-supplied SAS token/URL, similar to this C# code:
var blob = new CloudBlockBlob(new Uri(assetUploadUrl));
blob.UploadFromFile(FilePath, null, new BlobRequestOptions {RetryPolicy = new ExponentialRetry()});
Although the docs suggest BlockBlobClient is available in @azure/storage-blob and should be able to upload browser data from an input[type=file] element using uploadBrowserData, I can find no reference to BlockBlobClient in the browser library source. I looked into modifying the browserify export scripts, but I can't find any references in the main package source either. Also, the example code suggests that using @azure/storage-blob gives you a BlobServiceClient by default:
const { BlobServiceClient } = require("@azure/storage-blob");
Is BlockBlobClient actually available in the JavaScript client library?
Okay, I've figured this out: I need to use the Azure Storage client library for JavaScript; there's even a sample of doing exactly what I need to do. Now I just need to figure out how to bundle the npm package files for use in Razor pages.

How can I get the list of EC2 instances for an app using tags?

I am trying to get all the instance (server) IDs based on the app. Let's say I have an app on a server. How do I know which apps belong to which servers? I want my code to find all the instances (servers) that belong to each app. Is there any way to look through the apps in the EC2 console and figure out which servers are associated with each app, preferably using tags?
import boto3
client = boto3.client('ec2')
my_instance = 'i-xxxxxxxx'
(Disclaimer: I work for AWS Resource Groups)
Since your comments say that you use tags for all apps, you can use AWS Resource Groups to create a group. The example below assumes you used App:Something as the tag; it first creates a Resource Group and then lists all the members of that group.
Using this group, you can, for example, automatically get a CloudWatch dashboard for those resources, or use the group as a target in Run Command.
import json
import boto3

RG = boto3.client('resource-groups')

RG.create_group(
    Name='Something-App-Instances',
    Description='EC2 Instances for Something App',
    ResourceQuery={
        'Type': 'TAG_FILTERS_1_0',
        'Query': json.dumps({
            'ResourceTypeFilters': ['AWS::EC2::Instance'],
            'TagFilters': [{
                'Key': 'App',
                'Values': ['Something']
            }]
        })
    },
    Tags={
        'App': 'Something'
    }
)

# List all resources in a group using a paginator
paginator = RG.get_paginator('list_group_resources')
resource_pages = paginator.paginate(GroupName='Something-App-Instances')
for page in resource_pages:
    for resource in page['ResourceIdentifiers']:
        print(resource['ResourceType'] + ': ' + resource['ResourceArn'])
Another option, to just get the list without saving it as a group, would be to use the Resource Groups Tagging API directly.
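A minimal sketch of that approach (the App=Something tag is an assumption carried over from the example above):

import boto3

tagging = boto3.client('resourcegroupstaggingapi')

# Page through every EC2 instance carrying the App=Something tag
paginator = tagging.get_paginator('get_resources')
pages = paginator.paginate(
    TagFilters=[{'Key': 'App', 'Values': ['Something']}],
    ResourceTypeFilters=['ec2:instance']
)
for page in pages:
    for mapping in page['ResourceTagMappingList']:
        print(mapping['ResourceARN'])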
What you install on an Amazon EC2 instance is totally up to you. You do this by running code on the instance itself. AWS is not involved in the decision of what you install on the instance, nor does it know what you installed on an instance.
Therefore, you will need to keep track of "what apps are installed on what server" yourself.
You might choose to take advantage of Tags on instances to add some metadata, such as the purpose of the server. You could also use AWS Systems Manager to run commands on instances (eg to install software) or even use AWS CodeDeploy to roll-out software to fleets of servers.
However, even with all of these deployment options, AWS cannot track what you have put on each individual server. You will need to do that yourself.
Update: You can use AWS Resource Groups to view/manage resources by tag.
Here's some sample Python code to list tags by instance:
import boto3

ec2_resource = boto3.resource('ec2', region_name='ap-southeast-2')
instances = ec2_resource.instances.all()

for instance in instances:
    # instance.tags is None when an instance has no tags at all
    for tag in instance.tags or []:
        print(instance.instance_id, tag['Key'], tag['Value'])
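If you only want the servers for one particular app, here is a small follow-up sketch (the App key and Something value are assumptions carried over from the Resource Groups answer above) that filters instances by tag instead of listing everything:

import boto3

ec2_resource = boto3.resource('ec2', region_name='ap-southeast-2')

# Only return instances tagged App=Something
filtered = ec2_resource.instances.filter(
    Filters=[{'Name': 'tag:App', 'Values': ['Something']}]
)
for instance in filtered:
    print(instance.instance_id)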

Saving Images Using Spring Boot and MySQL

I am creating a web app using Spring Boot and JPA. I want to upload many images, save them to storage, and save each file's location in the database. I don't understand how to achieve this.
After researching a lot, I have come to a solution. I used Amazon S3 for storing the media data. Amazon S3 comes with its own library, which gives you a rich API to upload media files and manage them. I simply uploaded the files into the S3 bucket and saved the corresponding URLs in the MySQL database. Please comment if you need the corresponding code.
I encountered this problem in a Django application.
What I did was use an Amazon S3 bucket to store all the images. I store the S3 location of each image in the MySQL DB; when I need an image, I just fetch it from S3 and use it.
Accept images as a zip stream (faster)/binary stream or as a MultipartFile.
Load the stream: ZipInputStream zis = new ZipInputStream(inputStream);
Load the stream into a file using the java.io.File class.
Push the file data to the file system (the same server machine or any non-SQL database).
Save the location in the DB using Spring Data JPA.
This is how I did it.
I store my images as a LONGBLOB in my MySQL database, so use columnDefinition = "LONGBLOB". In the HTML code for your upload button, make sure you include multiple = "true" in your tag.
In your controller class, create a MultipartFile array that will receive the images (if you are using binding models), and also have a Set/List of images so you can go through them all.
MultipartFile[] images = bindingModel.getImages();
Set<Image> newImages = new HashSet<>();
I have this neat for loop so I can go through all of my images:
for (MultipartFile multipartFileFOREACH : images)
{
    Image newImage = new Image(multipartFileFOREACH.getBytes(), postEntity); // Image is an Entity here, postEntity is an object
    newImages.add(newImage);
    this.imageRepository.saveAndFlush(newImage);
}
this.articleRepository.saveAndFlush(articleEntity);
So here I store the bytes in the LONGBLOB column in the SQL database.
Then, for the actual view:
Set<Image> imagesBytes = post.getImages();
List<String> images = new ArrayList<>();
for (Image image : imagesBytes)
{
    images.add(Base64.getEncoder().encodeToString(image.getLink()));
}
My Image entity has a link variable of type byte[].
Then I simply add the encoded images to the model for the HTML view:
model.addAttribute("images", images);
Use an S3 bucket for storing the images and MySQL for storing the address of each image.
Using MultipartFile you can upload the file.
Save the file on your server using Files.copy through a BufferedInputStream.
Save the saved location of your file in your database through a Spring Data repository.
There are three options for storing your files:
a) in your local file system
b) in the DB as bytes
c) in a cloud service such as an AWS S3 bucket
This is a good tutorial for setting up an S3 bucket with Spring Boot applications:
https://medium.com/oril/uploading-files-to-aws-s3-bucket-using-spring-boot-483fcb6f8646

Google API + proxy + httplib2

I'm currently running a script to pull data from Google Analytics with the googleapiclient Python package (which is based on the httplib2 client object).
My script works perfectly without any proxy.
But I have to put it behind my corporate proxy, so I need to adapt my httplib2.Http() object to embed the proxy information.
Following the httplib2 docs, I tried:
pi = httplib2.proxy_info_from_url('http://user:pwd@someproxy:80')
httplib2.Http(proxy_info=pi).request("http://www.google.com")
But it did not work.
I always get a timeout error, with or without the proxy info (so the proxy_info parameter is not taken into account).
I also downloaded socks from the PySocks package (v1.5.6) and tried to "wrapmodule" httplib2 as described here:
https://github.com/jcgregorio/httplib2/issues/205
socks.setdefaultproxy(socks.PROXY_TYPE_HTTP, "proxyna", port=80, username='p.tisserand', password='Telematics12')
socks.wrapmodule(httplib2)
h = httplib2.Http()
h.request("http://google.com")
But I get an IndexError: (tuple index out of range)
In the meantime, when I use the requests package, this simple code works perfectly:
os.environ["HTTP_PROXY"] = "http://user:pwd@someproxy:80"
req = requests.get("http://www.google.com")
The problem is that I need to fit with the googleapiclient requirements and provide an httplib2.Http() client object.
Rather than using Python 2, I think you'd better try using httplib2shim.
You can have a look at this tutorial on my blog:
https://dinatam.com/fr/python-3-google-api-proxy/
In simple words, just replace this kind of code:
from httplib2 import Http
http_auth = credentials.authorize(Http())
with this one:
import httplib2shim
http_auth = credentials.authorize(httplib2shim.Http())
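For completeness, a minimal sketch of how that shim-authorized object can then be handed to googleapiclient. The analyticsreporting service name/version and the proxy environment variable are illustrative assumptions (the question mentions Google Analytics and a corporate proxy); whether the shim's underlying urllib3 transport actually honors the environment variable depends on your setup.

import os
import httplib2shim
from googleapiclient.discovery import build

# Mirror the environment-variable approach that worked with requests in the question;
# whether httplib2shim's urllib3 transport picks this up depends on your environment.
os.environ["HTTPS_PROXY"] = "http://user:pwd@someproxy:80"

# 'credentials' is obtained the same way as in the snippet above
http_auth = credentials.authorize(httplib2shim.Http())
analytics = build("analyticsreporting", "v4", http=http_auth)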
I decided to recode my web app in Python 2, still using the httplib2 package.
Proxy info is now taken into account, and it now works.

How do I upload some file into Azure blob storage without writing my own program?

I created an Azure Storage account. I have a 400-megabyte .zip file that I want to put into blob storage for later use.
How can I do that without writing code? Is there some interface for that?
Free tools:
Visual Studio 2010 -- install Azure tools and you can find the blobs in the Server Explorer
Cloud Berry Lab's CloudBerry Explorer for Azure Blob Storage
ClumpsyLeaf CloudXplorer
Azure Storage Explorer from CodePlex (try version 4 beta)
There was an old program called Azure Blob Explorer or something that no longer works with the new Azure SDK.
Out of these, I personally like CloudBerry Explorer the best.
The easiest way is to use Azure Storage PowerShell. It provides many commands to manage your storage containers/blobs/tables/queues.
For your case, you could use Set-AzureStorageBlobContent, which uploads a local file into Azure storage as a block blob or page blob.
Set-AzureStorageBlobContent -Container containerName -File .\filename -Blob blobname
For details, please refer to http://msdn.microsoft.com/en-us/library/dn408487.aspx.
If you're looking for a tool to do so, may I suggest that you take a look at our tool Cloud Storage Studio (http://www.cerebrata.com/Products/CloudStorageStudio). It's a commercial tool for managing Windows Azure Storage and Hosted Service. You can also find a comprehensive list of Windows Azure Storage Management tools here: http://blogs.msdn.com/b/windowsazurestorage/archive/2010/04/17/windows-azure-storage-explorers.aspx
Hope this helps.
The StorageClient has this built into it. No need to really write anything:
var account = new CloudStorageAccount(creds, false);
var client = account.CreateCloudBlobClient();
var blob = client.GetBlobReference("/somecontainer/hugefile.zip");
//1 MB seems to be a pretty good all-purpose size
client.WriteBlockSizeInBytes = 1024 * 1024;
//this sets # of parallel uploads for blocks
client.ParallelOperationThreadCount = 4; //normally set to one per CPU core
//this will break blobs up automatically after this size
client.SingleBlobUploadThresholdInBytes = 4096;
blob.UploadFile("somehugefile.zip");
I use Cyberduck to manage my blob storage.
It is free and very easy to use. It works with other cloud storage solutions as well.
I recently found this one as well: CloudXplorer
Hope it helps.
There is a new open-source tool provided by Microsoft:
Project Deco - a cross-platform Microsoft Azure Storage Account Explorer.
Please check these links:
Download binaries: http://storageexplorer.com/
Source Code: https://github.com/Azure/deco
You can use Cloud Combine for reliable and quick file upload to Azure blob storage.
A simple batch file using Microsoft's AzCopy utility will do the trick. You can drag-and-drop your files on the following batch file to upload into your blob storage container:
upload.bat
@ECHO OFF
SET BLOB_URL=https://<<<account name>>>.blob.core.windows.net/<<<container name>>>
SET BLOB_KEY=<<<your access key>>>
:AGAIN
IF "%~1" == "" GOTO DONE
AzCopy /Source:"%~d1%~p1" /Dest:%BLOB_URL% /DestKey:%BLOB_KEY% /Pattern:"%~n1%~x1" /destType:blob
SHIFT
GOTO AGAIN
:DONE
PAUSE
Note that the above technique only uploads one or more files individually (since the Pattern flag is specified) instead of uploading an entire directory.
You can upload files to an Azure Storage account blob using the Command Prompt.
Install the Microsoft Azure Storage tools.
Then upload the file to your account's blob with the CLI command:
AzCopy /Source:"filepath" /Dest:bloburl /DestKey:accesskey /destType:blob
Hope it helps. :)
You can upload large files directly to Azure Blob Storage using the HTTP PUT verb; the biggest file I have tried with the code below is 4.6 GB. You can do this in C# like this:
// write up to ChunkSize of data to the web request
void WriteToStreamCallback(IAsyncResult asynchronousResult)
{
    var webRequest = (HttpWebRequest)asynchronousResult.AsyncState;
    var requestStream = webRequest.EndGetRequestStream(asynchronousResult);
    var buffer = new Byte[4096];
    int bytesRead;
    var tempTotal = 0;

    File.FileStream.Position = DataSent;

    while ((bytesRead = File.FileStream.Read(buffer, 0, buffer.Length)) != 0
           && tempTotal + bytesRead < CHUNK_SIZE
           && !File.IsDeleted
           && File.State != Constants.FileStates.Error)
    {
        requestStream.Write(buffer, 0, bytesRead);
        requestStream.Flush();

        DataSent += bytesRead;
        tempTotal += bytesRead;

        File.UiDispatcher.BeginInvoke(OnProgressChanged);
    }

    requestStream.Close();

    if (!AbortRequested)
        webRequest.BeginGetResponse(ReadHttpResponseCallback, webRequest);
}

void StartUpload()
{
    var uriBuilder = new UriBuilder(UploadUrl);
    if (UseBlocks)
    {
        // encode the block name and add it to the query string
        CurrentBlockId = Convert.ToBase64String(Encoding.UTF8.GetBytes(Guid.NewGuid().ToString()));
        uriBuilder.Query = uriBuilder.Query.TrimStart('?') + string.Format("&comp=block&blockid={0}", CurrentBlockId);
    }

    // with or without using blocks, we'll make a PUT request with the data
    var webRequest = (HttpWebRequest)WebRequestCreator.ClientHttp.Create(uriBuilder.Uri);
    webRequest.Method = "PUT";
    webRequest.BeginGetRequestStream(WriteToStreamCallback, webRequest);
}
The UploadUrl is generated by Azure itself and contains a Shared Access Signature. This SAS URL says where the blob is to be uploaded and for how long the security access (write access in your case) is granted. You can generate a SAS URL like this:
readonly CloudBlobClient BlobClient;
readonly CloudBlobContainer BlobContainer;

public UploadService()
{
    // Setup the connection to Windows Azure Storage
    var storageAccount = CloudStorageAccount.FromConfigurationSetting("DataConnectionString");
    BlobClient = storageAccount.CreateCloudBlobClient();

    // Get and create the container
    BlobContainer = BlobClient.GetContainerReference("publicfiles");
}

string JsonSerializeData(string url)
{
    var serializer = new DataContractJsonSerializer(url.GetType());
    var memoryStream = new MemoryStream();

    serializer.WriteObject(memoryStream, url);

    return Encoding.Default.GetString(memoryStream.ToArray());
}

public string GetUploadUrl()
{
    var sasWithIdentifier = BlobContainer.GetSharedAccessSignature(new SharedAccessPolicy
    {
        Permissions = SharedAccessPermissions.Write,
        SharedAccessExpiryTime = DateTime.UtcNow.AddMinutes(60)
    });
    return JsonSerializeData(BlobContainer.Uri.AbsoluteUri + "/" + Guid.NewGuid() + sasWithIdentifier);
}
I also have a thread on the subject where you can find more information: How to upload huge files to the Azure blob from a web page.
The new Azure Portal has an 'Editor' menu option (in preview) in the container view. It allows you to upload a file directly to the container from the portal UI.
I've used all the tools mentioned in this post, and all work moderately well with block blobs. My favorite, however, is BlobTransferUtility.
By default, BlobTransferUtility only does block blobs, but changing just two lines of code lets you upload page blobs as well. If, like me, you need to upload a virtual machine image, it needs to be a page blob.
(For the difference, please see this MSDN article.)
To upload page blobs just change lines 53 and 62 of BlobTransferHelper.cs from
new Microsoft.WindowsAzure.Storage.Blob.CloudBlockBlob
to
new Microsoft.WindowsAzure.Storage.Blob.CloudPageBlob
The only other thing to know about this app is to uncheck HELP when you first run the program to see the actual UI.
Check out this post Uploading to Azure Storage where it is explained how to easily upload any file via PowerShell to Azure Blob Storage.
You can use the AzCopy tool to upload the required files to Azure; the default storage type is block blob, and you can change the pattern according to your requirements.
Syntax:
AzCopy /Source:<source> /Dest:<destination> /S
Try the Blob Service API
http://msdn.microsoft.com/en-us/library/dd135733.aspx
However, 400 MB is a large file and I am not sure a single API call will deal with something of this size; you may need to split it and reconstruct it using custom code.
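If a little code turns out to be acceptable after all, here is a minimal Python sketch using the current azure-storage-blob package (an assumption; the connection string, container, and file names are placeholders). Its upload_blob call splits large files into blocks for you, so a 400 MB .zip does not need hand-rolled splitting:

from azure.storage.blob import BlobServiceClient

service = BlobServiceClient.from_connection_string("<your-storage-connection-string>")
blob = service.get_blob_client(container="uploads", blob="hugefile.zip")

with open("hugefile.zip", "rb") as data:
    # upload_blob chunks large files into blocks behind the scenes;
    # max_concurrency controls how many blocks upload in parallel.
    blob.upload_blob(data, overwrite=True, max_concurrency=4)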
