I have the following code:
bucket = get_bucket('bucket-name')
blob = bucket.blob(os.path.join(*pieces))
blob.upload_from_string('test')
blob.make_public()
result = blob.public_url
# result is <Mock name='mock().get_bucket().blob().public_url' ...>
And I would like to mock the result of public_url; my unit test code is something like this:
with ExitStack() as st:
    from google.cloud import storage
    blob_mock = mock.Mock(spec=storage.Blob)
    blob_mock.public_url.return_value = 'http://'
    bucket_mock = mock.Mock(spec=storage.Bucket)
    bucket_mock.blob.return_value = blob_mock
    storage_client_mock = mock.Mock(spec=storage.Client)
    storage_client_mock.get_bucket.return_value = bucket_mock
    st.enter_context(
        mock.patch('google.cloud.storage.Client', storage_client_mock))
    my_function()
Is there something like FakeRedis or moto for Google Storage, so I can mock google.cloud.storage.Blob.public_url?
I found fake-gcs-server, a fake GCS server written in Go, which can be run within a Docker container and consumed by the Python library. See its Python examples.
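Separately from the fake server, the mock setup in the question can also be made to work with plain unittest.mock: public_url is a property, so set the attribute directly instead of configuring return_value, and patch the Client class with return_value so that calling storage.Client() yields the configured mock. A minimal sketch, assuming my_function instantiates storage.Client() itself and the example URL is a placeholder:

from contextlib import ExitStack
from unittest import mock

from google.cloud import storage

with ExitStack() as st:
    blob_mock = mock.Mock(spec=storage.Blob)
    # public_url is a property, so assign the value directly.
    blob_mock.public_url = 'http://example.com/fake-public-url'

    bucket_mock = mock.Mock(spec=storage.Bucket)
    bucket_mock.blob.return_value = blob_mock

    storage_client_mock = mock.Mock(spec=storage.Client)
    storage_client_mock.get_bucket.return_value = bucket_mock

    # Patch the class so storage.Client() returns the configured instance
    # instead of an auto-created child mock.
    st.enter_context(
        mock.patch('google.cloud.storage.Client',
                   return_value=storage_client_mock))

    my_function()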
I successfully built a Logic App where, whenever a blob is added to container-one, it gets copied to container-two. However, it fails when any blob larger than 50 MB (the default size limit) is uploaded.
Blobs are added via the REST API.
Could you please guide me?
Currently, the maximum file size with chunking disabled is 50 MB. One workaround is to use an Azure Function to transfer the files from one container to another.
Below is sample Python code that worked for me for transferring a file from one container to another:
from azure.storage.blob import BlobClient, BlobServiceClient
from azure.storage.blob import ResourceTypes, AccountSasPermissions
from azure.storage.blob import generate_account_sas
from datetime import datetime, timedelta

connection_string = '<Your Connection String>'
account_key = '<Your Account Key>'
source_container_name = 'container1'
blob_name = 'samplepdf.pdf'
destination_container_name = 'container2'

# Create client
client = BlobServiceClient.from_connection_string(connection_string)

# Create SAS token for the blob
sas_token = generate_account_sas(
    account_name=client.account_name,
    account_key=account_key,
    resource_types=ResourceTypes(object=True),
    permission=AccountSasPermissions(read=True),
    expiry=datetime.utcnow() + timedelta(hours=4)
)

# Create blob client for the source blob
source_blob = BlobClient(
    client.url,
    container_name=source_container_name,
    blob_name=blob_name,
    credential=sas_token
)

# Create new blob and start the copy operation
new_blob = client.get_blob_client(destination_container_name, blob_name)
new_blob.start_copy_from_url(source_blob.url)
REFERENCES:
General Limits
How to copy a blob from one container to another container using Azure Blob storage SDK
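Note that start_copy_from_url only starts a server-side copy. If the Function needs to confirm completion before returning, a minimal polling sketch (reusing the new_blob client from the code above; the sleep interval is an arbitrary choice) could look like this:

import time

# Poll the destination blob until the server-side copy finishes.
props = new_blob.get_blob_properties()
while props.copy.status == 'pending':
    time.sleep(5)
    props = new_blob.get_blob_properties()

print('Copy finished with status:', props.copy.status)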
I checked the whole azure-storage-blob gem and didn't find any way to get the URI for a blob. Is there some way to construct it correctly and in a generic way that will work for any other blob in any region?
I used the S3 SDK before and I'm well grounded in S3, but new to Azure.
There is a protected method called blob_uri that looks like this:
def blob_uri(container_name, blob_name, query = {}, options = {})
  if container_name.nil? || container_name.empty?
    path = blob_name
  else
    path = ::File.join(container_name, blob_name)
  end
  options = { encode: true }.merge(options)
  generate_uri(path, query, options)
end
So you could take the shortcut of:
blob_client = Azure::Storage::Blob::BlobService.create(storage_account_name: 'XXX' , storage_access_key: 'XXX')
blob_client.send(:blob_uri, container_name, blob_name)
However, the actual URI is simply:
https://[storage_account_name].blob.core.windows.net/[container_name]/[blob_name]
So, since you already have to know the container and blob name to access the blob,
File.join(blob_client.host, container_name, blob_name)
is the URI to the blob.
I'm trying to use boto3 in an AWS Glue job to call a Lambda function, but without results.
I uploaded a zip with the libraries, like the examples by AWS, and also tried without a zip.
With the zip, the error is "Unable to load data for: endpoints".
When I try to invoke without the zip, it runs into a timeout exception.
import boto3
client = boto3.client('lambda', region_name='us-east-1')
r_lambda = client.invoke(FunctionName='S3GlueJson')
Can someone help me?
In Python, use the Boto3 Lambda client's invoke(). For example, you can create a Lambda container, then call it from a Glue job:
import json
import logging

import boto3
import pandas as pd

logger = logging.getLogger(__name__)

lambda_client = boto3.client('lambda', region_name='us-east-1')

def get_predictions(df):
    # Call getPredictions Lambda container
    response = lambda_client.invoke(
        FunctionName='getPredictions',
        InvocationType='RequestResponse',
        LogType='Tail',
        Payload=df
    )
    logger.info('Received response from Lambda container.')
    data = response["Payload"].read().decode('utf-8')
    x = json.loads(data)
    df_pred = pd.DataFrame.from_dict(x)
    return df_pred

# df is the DataFrame produced earlier in the Glue job
dfjson = df.to_json()
df_pred = get_predictions(dfjson)
df_pred.head()
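For reference, a minimal sketch of what the getPredictions handler might look like on the Lambda side (hypothetical; the original post does not show it). Lambda parses the JSON payload sent by the Glue job, so event arrives as a dict, and the returned dict is serialized back into the response payload that the Glue job reads:

# Hypothetical getPredictions handler sketch; the real one would run a model.
def lambda_handler(event, context):
    # event is the DataFrame sent as JSON by the Glue job, already parsed.
    # Placeholder "model": echo the input back; the caller rebuilds it
    # with pandas.DataFrame.from_dict.
    predictions = event
    return predictions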
If you want to call a Glue job from a Lambda function, you can do it like this:
import logging

import boto3

logger = logging.getLogger(__name__)

glue = boto3.client(service_name='glue', region_name='us-east-1',
                    endpoint_url='https://glue.us-east-1.amazonaws.com')

# Start the job (JOB_NAME is the name of your Glue job)
myNewJobRun = glue.start_job_run(JobName=JOB_NAME)

# Get the current state of the job, to be sure it's running
status = glue.get_job_run(JobName=JOB_NAME, RunId=myNewJobRun['JobRunId'])
logger.info('JOB State {}: {}'.format(
    JOB_NAME, status['JobRun']['JobRunState']))
As the job execution can take some time to finish, it's better not to wait for it to finish inside the Lambda function.
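If you do need to track the run, one option is to persist the JobRunId returned by start_job_run and check it later, for example from a scheduled Lambda. A sketch with a hypothetical job name and saved run id:

import boto3

glue = boto3.client('glue', region_name='us-east-1')

# 'my-glue-job' and saved_run_id are placeholders: saved_run_id would be
# the 'JobRunId' stored when the job was started.
status = glue.get_job_run(JobName='my-glue-job', RunId=saved_run_id)
print(status['JobRun']['JobRunState'])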
I used the following code, but it is not working:
requests = [conn.request_spot_instances(price=0.0034, image_id='ami-6989a659', count=1, type='one-time', instance_type='m1.micro')]
Use the following code to create an instance from the Python command line.
import boto.ec2

# Connect with explicit credentials...
conn = boto.ec2.connect_to_region(
    "us-west-2",
    aws_access_key_id="<aws access key>",
    aws_secret_access_key="<aws secret key>",
)

# ...or rely on credentials configured in the environment / ~/.boto
conn = boto.ec2.connect_to_region("us-west-2")

conn.run_instances(
    "<ami-image-id>",
    key_name="myKey",
    instance_type="t2.micro",
    security_groups=["your-security-group-here"],
)
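Since the question uses the legacy boto 2 request_spot_instances call, here is a sketch of the equivalent request with boto3, reusing the AMI, price, and instance type from the question; region and security group are assumptions to adjust for your account:

import boto3

ec2 = boto3.client('ec2', region_name='us-west-2')

# Request a one-time spot instance; SpotPrice must be a string in boto3.
response = ec2.request_spot_instances(
    SpotPrice='0.0034',
    InstanceCount=1,
    Type='one-time',
    LaunchSpecification={
        'ImageId': 'ami-6989a659',
        'InstanceType': 'm1.micro',
        'KeyName': 'myKey',
        'SecurityGroups': ['your-security-group-here'],
    },
)

print(response['SpotInstanceRequests'][0]['SpotInstanceRequestId'])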
To create an EC2 instance using Python on AWS, you need an aws_access_key_id_value and an aws_secret_access_key_value.
You can store such variables in config.properties and write your code in a create-ec2-instance.py file.
Create a config.properties file and save the following code in it.
aws_access_key_id_value = 'YOUR-ACCESS-KEY-OF-THE-AWS-ACCOUNT'
aws_secret_access_key_value = 'YOUR-SECRET-KEY-OF-THE-AWS-ACCOUNT'
region_name_value = 'region'
ImageId_value = 'ami-id'
MinCount_value = 1
MaxCount_value = 1
InstanceType_value = 't2.micro'
KeyName_value = 'name-of-ssh-key'
Create create-ec2-instance.py and save the following code in it.
import boto3

def getVarFromFile(filename):
    # Load the properties file as a Python module so its variables
    # become attributes of `data`.
    import imp
    f = open(filename)
    global data
    data = imp.load_source('data', '', f)
    f.close()

getVarFromFile('config.properties')

ec2 = boto3.resource(
    'ec2',
    aws_access_key_id=data.aws_access_key_id_value,
    aws_secret_access_key=data.aws_secret_access_key_value,
    region_name=data.region_name_value
)

instance = ec2.create_instances(
    ImageId=data.ImageId_value,
    MinCount=data.MinCount_value,
    MaxCount=data.MaxCount_value,
    InstanceType=data.InstanceType_value,
    KeyName=data.KeyName_value)

print(instance[0].id)
Use the following command to execute the Python code.
python create-ec2-instance.py
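Optionally, the script can wait until the new instance is running before printing its address; a short sketch that continues from the instance list returned by create_instances above:

# Wait for the instance to reach the 'running' state, then refresh its
# attributes and print the public IP (if the subnet assigns one).
instance[0].wait_until_running()
instance[0].reload()
print(instance[0].public_ip_address)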
The following code fetches a parameter from the request and responds from the Couchbase DB according to the value of that parameter.
import json

import tornado.ioloop
import tornado.web

couchbase = Couchbase("ubuntumartini03:8091", "thebucket", "")
bucket = couchbase["thebucket"]

class MH(tornado.web.RequestHandler):
    def get(self):
        key = self.get_argument("pub_id", strip=True)
        result = json.loads(bucket.get(key)[2])
        self.write(result['metaTag'])

if __name__ == "__main__":
    app = tornado.web.Application(handlers=[(r"/", MH)])
    app.listen(8888, "")
    tornado.ioloop.IOLoop.instance().start()
Problem: For the given hardware, we can make 10k calls/sec to Couchbase from the Tornado machine. But when we make calls from a client to the Tornado machine, we can only make 350 calls/sec.
Surely the bottleneck here is Tornado. How do we optimize it to be able to make at least 7k calls/sec?
Edit your code like this:
import json

import tornado.ioloop
import tornado.web
from tornado.ioloop import IOLoop

couchbase = Couchbase("ubuntumartini03:8091", "thebucket", "")
bucket = couchbase["thebucket"]

class MH(tornado.web.RequestHandler):
    async def get(self):
        key = self.get_argument("pub_id", strip=True)
        # Run the blocking Couchbase call on a worker thread so the IOLoop stays free
        result = await IOLoop.current().run_in_executor(None, bucket.get, key)
        self.write(json.loads(result[2])['metaTag'])

if __name__ == "__main__":
    app = tornado.web.Application(handlers=[(r"/", MH)])
    app.listen(8888, "")
    tornado.ioloop.IOLoop.instance().start()
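run_in_executor(None, ...) uses the IOLoop's default executor, which may itself cap concurrency. A sketch that hands the blocking bucket.get calls to a dedicated thread pool; the pool size is an arbitrary assumption to tune for your hardware, and bucket is the same object as above:

from concurrent.futures import ThreadPoolExecutor

# Dedicated pool for blocking Couchbase calls; size is a guess to tune.
couchbase_pool = ThreadPoolExecutor(max_workers=50)

class MH(tornado.web.RequestHandler):
    async def get(self):
        key = self.get_argument("pub_id", strip=True)
        result = await IOLoop.current().run_in_executor(couchbase_pool, bucket.get, key)
        self.write(json.loads(result[2])['metaTag'])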
Which client do you use: a synchronous or an asynchronous one? If it is a synchronous client, it cannot make full use of the Tornado IOLoop reactor.