I've set everything up (as far as I can tell) as described in this article:
building-snowpipe-on-azure-blob
but the pipe only loads files after I run "alter pipe myPipe refresh".
The data loads correctly that way, but auto_ingest doesn't work.
Please give me some advice on how to find the issue.
The pipe refresh command fetches files directly from the stage, while the auto-ingest option takes a different route and consumes messages from Azure Queue Storage. Therefore, even if the Azure Blob Storage container is correct, the message could be delivered to the queue but not to Snowflake.
Solution Details: https://community.snowflake.com/s/article/Ingesting-new-files-with-Snowpipe-for-Azure-fails-while-refreshing-the-pipe-works
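To check whether queue notifications are reaching Snowflake at all, one option is to inspect the pipe's status, e.g. from Python with the snowflake-connector-python package. A minimal sketch; the connection parameters and pipe name below are placeholders:

# Minimal sketch, assuming snowflake-connector-python and placeholder
# credentials/pipe name -- adjust to your environment.
import json
import snowflake.connector

conn = snowflake.connector.connect(
    account="<account_identifier>",
    user="<user>",
    password="<password>",
    warehouse="<warehouse>",
    database="<database>",
    schema="<schema>",
)

cur = conn.cursor()
cur.execute("SELECT SYSTEM$PIPE_STATUS('MYPIPE')")
status = json.loads(cur.fetchone()[0])

# Fields such as executionState, pendingFileCount and notificationChannelName
# indicate whether queue messages are arriving; notificationChannelName should
# match the storage queue configured in the notification integration.
print(json.dumps(status, indent=2))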
I want to implement the following once a file is uploaded to an S3 bucket:
Download the file to a Windows server
Run a third-party exe to process the file and generate an output file on the Windows server
What is the best approach to implement this using .NET Core?
Solution 1:
Create a Lambda function to trigger an API
The API will download and process the file
Solution 2:
Create an executable to download the file from the S3 bucket
Create a Lambda function to trigger the executable
Solution 3:
Create a service that checks for and downloads files from the S3 bucket
The service then processes the downloaded file
Solution 4:
Use AWS Lambda to push a message to SQS
Create an application to monitor SQS.
Please let me know the best solution to implement this. Sorry for asking this non-technical question.
The correct architectural approach would be:
Create a trigger on the Amazon S3 bucket that sends a message to an Amazon SQS queue when the object is created
A Windows server is continually polling the Amazon SQS queue waiting for a message to appear
When a message appears, use the information in the message to download the object from S3 and process the file
Upload the result to Amazon S3 and optionally send an SQS message to signal completion (depending on what you wish to do after a file is processed)
This architecture is capable of scaling to large volumes and allows files to be processed in parallel and even across multiple servers. If a processing task fails and does not signal completion, then Amazon SQS will make the message visible again for processing.
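For illustration, here is a rough sketch of the polling worker described above, shown in Python with boto3 for brevity (the same flow maps onto the AWS SDK for .NET); the queue URL, local paths and processor.exe are placeholder names:

# Rough sketch: poll SQS for S3 event notifications, download each new
# object, run a third-party exe on it, and upload the result.
import json
import subprocess
import urllib.parse
import boto3

QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/my-file-queue"  # placeholder
sqs = boto3.client("sqs")
s3 = boto3.client("s3")

while True:
    # Long-poll the queue (up to 20 seconds per request).
    resp = sqs.receive_message(QueueUrl=QUEUE_URL, MaxNumberOfMessages=1, WaitTimeSeconds=20)
    for msg in resp.get("Messages", []):
        body = json.loads(msg["Body"])
        for record in body.get("Records", []):
            bucket = record["s3"]["bucket"]["name"]
            key = urllib.parse.unquote_plus(record["s3"]["object"]["key"])
            local_path = r"C:\work\input\{}".format(key.split("/")[-1])

            s3.download_file(bucket, key, local_path)

            # Run the third-party processor and upload its output back to S3.
            output_path = local_path + ".out"
            subprocess.run([r"C:\tools\processor.exe", local_path, output_path], check=True)
            s3.upload_file(output_path, bucket, "processed/" + key.split("/")[-1] + ".out")

        # Delete the message only after successful processing, so SQS can
        # redeliver it if the worker fails part-way through.
        sqs.delete_message(QueueUrl=QUEUE_URL, ReceiptHandle=msg["ReceiptHandle"])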
Data Factory Copy activity fails when copying a Delta table from Databricks to a Storage Account (ADLS Gen2)
Details
ErrorCode=AzureDatabricksCommandError,Hit an error when running the command in Azure Databricks. Error details: Failure to initialize configurationInvalid configuration value detected for fs.azure.account.key
Caused by: Invalid configuration value detected for fs.azure.account.key.
Appreciate your help.
The above error mainly happens because staging is not enabled. We need to enable staging to copy data from Delta Lake.
In Azure Databricks, go to the cluster -> Advanced Options and edit the Spark config in the following format.
spark.hadoop.fs.azure.account.key.<storage_account_name>.blob.core.windows.net <Access Key>
After that you can follow the official documentation; it has a detailed explanation of the Copy activity with Delta Lake.
You can also refer to this article by RishShah-4592.
Edit the cluster:
fs.azure.account.key.<storage_account_name>.dfs.core.windows.net {{secrets/<secret-scope-name>/<secret-name>}}
It's working fine now... able to copy data from the Delta Lake table to ADLS Gen2.
I think you can pass the secret as below:
spark.hadoop.fs.azure.account.key.<storage_account_name>.blob.core.windows.net {{secrets/<secret-scope-name>/<secret-name>}}
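For reference, the same account key can also be set per notebook session instead of in the cluster Spark config. A minimal sketch run inside a Databricks notebook, using the same placeholder names as above:

# Minimal sketch (run inside a Databricks notebook); the storage account,
# container, secret scope and secret names are placeholders matching the ones above.
storage_account = "<storage_account_name>"

spark.conf.set(
    f"fs.azure.account.key.{storage_account}.dfs.core.windows.net",
    dbutils.secrets.get(scope="<secret-scope-name>", key="<secret-name>"),
)

# Quick check that the credential works before running the ADF copy.
display(dbutils.fs.ls(f"abfss://<container>@{storage_account}.dfs.core.windows.net/"))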
I am considering using Azure Blob Storage's built-in lifecycle management feature for deleting blobs of a certain age.
However, due to a business requirement, it must be possible to generate a report or log statement after each daily execution of the defined rule set. The report or log must state the number of blobs that were affected, e.g. deleted, during the run.
I have read through the documentation and Googled to see if others have had similar inquiries, but so far without any luck.
So my question: do any of you know if and how I can get the built-in lifecycle management system to do one of the following after each daily run:
Add a log statement to the storage account containing the Blob storage.
Generate and send a report to an endpoint I define.
If the above can't be done I will have to code the daily deletion job and report generation myself, which surely I can do, but I would like to use the built-in feature if possible.
I'll summarize the solution below.
If you want to know which blobs are deleted every day, we can configure Diagnostics settings on the storage account. After doing that, we will get the logs for read, write, and delete requests on the blobs. For more detail, please refer to here and here.
Regarding how to enable it, we can use the PowerShell command Set-AzStorageServiceLoggingProperty.
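If a daily count of deletions is enough, one option is to aggregate the classic Storage Analytics logs that land in the hidden $logs container of the same account. A rough sketch in Python with the azure-storage-blob package, assuming lifecycle deletions show up as DeleteBlob entries like other delete requests (the connection string and date prefix are placeholders):

# Rough sketch: count DeleteBlob operations recorded in the $logs container
# for a given day. Assumes the classic Storage Analytics log format, where
# the operation type is the third semicolon-separated field.
from azure.storage.blob import BlobServiceClient

CONN_STR = "<storage-account-connection-string>"   # placeholder
DAY_PREFIX = "blob/2023/01/31/"                    # $logs blobs are organised by service/date

service = BlobServiceClient.from_connection_string(CONN_STR)
logs = service.get_container_client("$logs")

deleted = 0
for blob in logs.list_blobs(name_starts_with=DAY_PREFIX):
    content = logs.download_blob(blob.name).readall().decode("utf-8")
    for line in content.splitlines():
        fields = line.split(";")
        if len(fields) > 2 and fields[2] == "DeleteBlob":
            deleted += 1

print(f"Blobs deleted on this day: {deleted}")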
I have turned on database activity events, which I think are some kind of log on AWS Aurora. They are currently being passed through AWS Kinesis into S3 via AWS Firehose. The log in S3 looks like this:
{"type":"DatabaseActivityMonitoringRecords","version":"1.0","databaseActivityEvents":"AYADeOC+7S/mFpoYLr17gZCXuq8AXwABABVhd3MtY3J5cHRvLXB1YmxpYy1rZXkAREFvbjhIZ01uQTVpVHlyS0l3NnVIOS9xdXF3OWEza0xZV0c2QXYzQmtWUFI2alpIK2hsczNwalAyTTIzYnpPS2RXUT09AAEAAkJDABtEYXRhS2V5AAAAgAAAAAwzb2YKNe4h6b2CpykAMLzY7gDftUKUr3QxmxSzylw9qCRxnGW9Fn1qL4uKnbDV/PE44WyOQbXKGXv9s8BxEwIAAAAADAAAEAAAAAAAAAAAAAAAAAC+gU55u4hvWxW1RG/FNNSJ/////wAAAAEAAAAAAAAAAAAAAAEAAACtbmBmDwZw2/1rKiwA4Nyl7cm19/RcHhCpMMwbOFFkZHKL/bvsohf5T+yM9vNxCgAi2qTUIEe17VA5bJ0eCcNAA9mb6Ys+PR1w7QhKrQsHHTBC2dhJ4ELwpXamGRmPLga5Dml2rOveA59YefcJ4PhrqztZXfrS8fBYJ3HgBWHY9nPh1jdyinjQAl61hQrz2LPII85zlqAWTNeL2pXwaRdtGdYeIXXoh4VsoV3Q18Hj/uOQzTIbT8EJvwnk0gj8AGcwZQIxAJNuoCJhHPUfbkk0fHF6HYz1STIc4HX2HOl0qSIHqwpgtQK6BMa3YlPI9hNwhB8x+AIwWDY0bMjuLRGQgjjBv5z1xPpZQ+pMZ4K6m9JaNBFVKxZTvqDL1z7lrV0rlbZThad+","key":"AQIDAHhQgnMAiP8TEQ3/r+nxwePP2VOcLmMGvmFXX8om3hCCugE7IUxSH/eJBEKvnkYoNIqFAAAAfjB8BgkqhkiG9w0BBwagbzBtAgEAMGgGCSqGSIb3DQEHATAeBglghkgBZQMEAS4wEQQMQIX97gE5ioBR1+nnAgEQgDuDX2B2T7nOxjKDyL31+wHJb0pwkCeaU7CwA6BwIkiT7FmhMB71XgvCVrY9C9ABUtc1e5J7QIfsVB214w=="}
I think a KMS key is being used to encrypt that log file. How do I decrypt it? Is there working sample code somewhere? Also, more importantly, the Aurora database I'm using is a test database with no activity (no inserts, selects, updates). Why are there so many logs? Why are there so many databaseActivityEvents? They seem to be getting written to S3 every minute of the day.
Yes, it uses the RDS activity stream KMS key (ActivityStreamKmsKeyId) to encrypt the log event, plus base64 encoding. You will have to use the AWS cryptographic SDKs to decrypt the data key and the log event.
For reference, see the sample Java and Python versions here:
Processing a Database Activity Stream using the AWS SDK
In your Firehose pipeline you can add a transformation step with Lambda and do this decryption in your Lambda function.
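As an illustration of the first step of that flow, decrypting the per-record data key with KMS looks roughly like this in Python with boto3 (the file name and cluster resource id are placeholders; the decrypted databaseActivityEvents payload must then be decrypted with the AWS Encryption SDK and gzip-decompressed, as shown in the linked sample):

# Rough sketch of the first step only: decrypt the envelope data key in a
# record's "key" field with KMS. See the linked AWS sample for the full flow,
# which then uses the AWS Encryption SDK with this data key to decrypt
# "databaseActivityEvents" and finally decompresses the result.
import base64
import json
import boto3

kms = boto3.client("kms")

def decrypt_data_key(record: dict, cluster_resource_id: str) -> bytes:
    encrypted_key = base64.b64decode(record["key"])
    response = kms.decrypt(
        CiphertextBlob=encrypted_key,
        # Activity-stream data keys are bound to the cluster resource id
        # via the KMS encryption context.
        EncryptionContext={"aws:rds:dbc-id": cluster_resource_id},
    )
    return response["Plaintext"]

with open("activity_record.json") as f:   # one record, as shown in the question
    record = json.load(f)

data_key = decrypt_data_key(record, cluster_resource_id="cluster-ABCDEFGHIJKLMNOP")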
Why are there so many events in an idle Postgres RDS cluster? They are heartbeat events.
When you decrypt and look at the actual activity event JSON, it has a type field which can be either record or heartbeat. Events with type record are the ones generated by user activity.
I have 10 applications that share the same logic for writing their log to a text file located in the application root folder.
I have an application which reads the log files of all the applications and shows the details on a web page.
Can the same be achieved on Windows Azure? I don't want to use the 'DiagnosticMonitor' APIs, as I cannot change the logging logic of the applications.
Thanks,
Aman
Even if this is technically possible, it is not advisable, as the Fabric Controller can re-create any role at a whim (well, with good reasons, but unpredictable nonetheless), and whenever this happens you will lose any files stored locally on a role.
So, primarily, you should be looking for a different place to store those logs. There are many options, but all require that you change the logging logic of the application.
You could do this, but aside from the issue Yossi pointed out (the log would be ephemeral; it could get deleted at any time), you'd have a different log file on each role instance (VM). That means when you hit your web page to view the log, you'd see whatever happened to be on the log on that particular VM, instead of what you presumably want (a roll-up of the log files across all VMs).
Windows Azure Diagnostics could help, since you can configure it to copy log files off to blob storage (so no need to change the logging). But honestly I find Diagnostics a bit cumbersome for this. It will end up creating a lot of different blobs, and you'll have to change the log viewer to read all those blobs and combine them.
I personally would suggest writing a separate piece of code that monitors the log file and, for each new line, stores the line as an entity (row) in table storage. This bit of code could be launched as a startup task and just run continuously as a separate process (leaving everything else unchanged). Then modify the log viewer to read the last n entities from table storage and display them.
(I'm assuming you can modify the log viewer even if you can't modify the apps that log to the file.)
What about writing logs to something like Azure Table storage? You just need to define a unique PartitionKey/RowKey, then you can easily retrieve the log for the web page.
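For what it's worth, a minimal sketch of such a side-car process in Python with the azure-data-tables package (the connection string, table name, application name and log path are placeholders; the idea is the same regardless of language):

# Minimal sketch: tail an application log file and write each new line to
# Azure Table storage, one partition per application.
import time
import uuid
from azure.data.tables import TableServiceClient

CONN_STR = "<storage-account-connection-string>"   # placeholder
APP_NAME = "App01"                                 # placeholder partition per application
LOG_PATH = r"C:\apps\App01\app.log"                # placeholder log path

service = TableServiceClient.from_connection_string(CONN_STR)
table = service.create_table_if_not_exists("AppLogs")

def tail(path):
    """Yield new lines appended to the file, starting from its current end."""
    with open(path, "r") as f:
        f.seek(0, 2)  # jump to the end of the file
        while True:
            line = f.readline()
            if line:
                yield line.rstrip("\n")
            else:
                time.sleep(1)

for line in tail(LOG_PATH):
    table.create_entity({
        "PartitionKey": APP_NAME,
        # Reverse-timestamp RowKey so the newest entries sort first, which
        # keeps "last n lines" queries cheap for the viewer page.
        "RowKey": f"{2**63 - time.time_ns()}-{uuid.uuid4().hex[:8]}",
        "Message": line,
    })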