Snowflake Not Accepting File Format In Bulk Load - etl

I am creating some new ETL tasks for our data pipeline. We currently have several hundred loading data from various S3 buckets.
So it would go like this:
create or replace stage ETL_STAGE url='s3://bucketname/'
file_format = csv_etl;
create or replace file format csv_etl
type = 'CSV'
field_delimiter = ','
skip_header = 1
FIELD_OPTIONALLY_ENCLOSED_BY='"';
copy into db.schema.table
from @ETL_STAGE/Usage
pattern='/.*[.]csv'
on_error = 'continue';
However, whenever I use this, the file format does not seem to be applied at all: the enclosing double quotes are not stripped, and the header row is not even skipped.
I am pretty perplexed by this, as I am 99% certain the formatting options are correct. This is what I get:
+-------------------+----------+----------------+---------------------+-------------------+
| "Usage Task Name" | "Value" | "etl_uuid" | "etl_deviceServer" | "etl_timestamp" |
| "taskname" | "0" | "adfasdfasdf" | "hostserverip" | "2020-04-06 2124" |
+-------------------+----------+----------------+---------------------+-------------------+

Run the command below with file_format included. This applies the file format while loading the files:
copy into db.schema.table
from @ETL_STAGE/Usage
pattern='/.*[.]csv'
on_error = 'continue'
file_format = csv_etl;

Related

How can I source Terraform HCL variables in bash?

I have Terraform variables defined like
variable "location" {
type = string
default = "eastus"
description = "Desired Azure Region"
}
variable "resource_group" {
type = string
default = "my-rg"
description = "Desired Azure Resource Group Name"
}
and potentially (partially) overridden in a terraform.tfvars file
location = "westeurope"
and then exposed as outputs, e.g. in a file outputs.tf:
output "resource_group" {
value = var.resource_group
}
output "location" {
value = var.location
}
How can I "source" the effective variable values in a bash script to work with these values?
One way is to use Terraform output values as JSON and then a utility like jq to convert and source them as variables:
source <(terraform output --json | jq -r 'keys[] as $k | "\($k|ascii_upcase)=\(.[$k] | .value)"')
Note that output is only available after executing terraform plan, terraform apply or even terraform refresh.
If jq is not available or not desired, sed can be used to convert Terraform HCL output into variables, even with upper case variable names:
source <(terraform output | sed -r 's/^([a-z_]+)\s+=\s+(.*)$/\U\1=\L\2/')
or, using the -chdir argument, if the Terraform templates / modules are in another folder:
source <(terraform -chdir=$TARGET_INFRA_FOLDER output | sed -r 's/^([a-z_]+)\s+=\s+(.*)$/\U\1=\L\2/')
These variables are then available in the bash script:
LOCATION="westeurope"
RESOURCE_GROUP="my-rg"
and can be addressed as $LOCATION and $RESOURCE_GROUP.
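Where neither jq nor GNU sed is at hand, the same JSON-to-variables conversion can be sketched in Python. This is only an illustration, not part of the answer above: the function name outputs_to_env_lines is made up, and the sample string merely imitates the shape of the JSON output (a map from output name to an object with a "value" key).

```python
import json
import shlex


def outputs_to_env_lines(terraform_json):
    """Turn Terraform JSON output text into NAME=value lines that bash can source."""
    outputs = json.loads(terraform_json)
    lines = []
    for name in sorted(outputs):
        value = outputs[name]["value"]
        # Non-string values (lists, maps, numbers) are serialized back to JSON.
        text = value if isinstance(value, str) else json.dumps(value)
        # shlex.quote only adds quotes when the value needs them (spaces, quotes, ...).
        lines.append(f"{name.upper()}={shlex.quote(text)}")
    return lines


# Assumed sample input, modelled on the two outputs defined in the question.
sample = (
    '{"location": {"sensitive": false, "type": "string", "value": "westeurope"},'
    ' "resource_group": {"sensitive": false, "type": "string", "value": "my-rg"}}'
)

for line in outputs_to_env_lines(sample):
    print(line)
```

Unlike the sed variant, this keeps the case of the values unchanged and quotes values that contain spaces.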

Export text output into csv format ready for insert into databases using Powershell

I wish to pipe aws cli output which appears on my screen as text output from a powershell session into a text file in csv format.
I have researched the Export-CSV cmdlet from articles such as the below:
https://learn.microsoft.com/en-us/powershell/module/microsoft.powershell.utility/export-csv?view=powershell-7.1
I cannot see how to use this to help me with my goal. From my testing, it only seems to work with specific windows programs, not general text output.
An article on this site shows how you can achieve my goal with unix commands, by replacing spaces with commas.
Output AWS CLI command with filters to CSV without jq
The answer with unix is to use sed at the end of the command like so:
aws rds describe-db-instance-automated-backups --query 'DBInstanceAutomatedBackups[*].{ARN:DBInstanceArn,EarliestTime:RestoreWindow.EarliestTime,LatestTime:RestoreWindow.LatestTime}' --output text | sed -E 's/\s+/,/g'
Export-Csv appears not to be able to do this.
Does anyone know how I might replicate what sed is doing here with powershell?
Here is an example of the output that I would like in csv format:
arn:aws:rds:ap-southwest-2:9711387875370:db:catflow--prod 2019-03-03T09:54:29.402Z 2019-03-05T01:25:53Z
arn:aws:rds:ap-southwest-2:9711387875370:db:xyz-prod-rds-golf 2019-03-01T09:04:31.477Z 2019-03-05T01:28:40Z
arn:aws:rds:ap-southwest-2:9711387875370:db:-asm-prod-rds-stardb 2019-02-01T09:07:30.648Z 2019-03-05T01:27
:20Z
arn:aws:rds:ap-southwest-2:9711387875370:db:-asm-prod-rds-domaindb 2019-02-02T09:04:30.771Z 2019-03-05T01:28
:40Z
arn:aws:rds:ap-southwest-2:9711387875370:db:-ctz-prod-rds-datavault 2019-02-26T14:14:30.254Z 2019-03-05T01:29
:13Z
arn:aws:rds:ap-southwest-2:9711387875370:db:-gcp-prod-rds-datavault 2019-02-01T14:05:40.456Z 2019-03-05T01:31
:05Z
arn:aws:rds:ap-southwest-2:9711387875370:db:prod-conformed-datavault-prod 2019-02-02T14:06:26.050Z 2019-03-
05T01:27:02Z
arn:aws:rds:ap-southwest-2:9711387875370:db:prod-dqm-datavault-prod 2019-02-01T14:12:05.286Z 2019-03-05T01:26
:53Z
arn:aws:rds:ap-southwest-2:9711387875370:db:prod-prod-dgc-cde-lineage 2019-03-02T09:54:29.053Z 2019-03-05T01:29
:11Z
arn:aws:rds:ap-southwest-2:9711387875370:db:prod-rec-prod 2019-02-02T22:09:00.673Z 2019-03-05T01:29:40Z
arn:aws:rds:ap-southwest-2:9711387875370:db:-serve-prod 2019-03-02T09:54:20.729Z 2019-03-05T01:30:21Z
It's possible that you are working with a tab delimited text file, with no headers.
The tab separator can look like multiple spaces when it is displayed on your screen.
If so, you can actually read this file with Import-Csv, but you have to use the -Header parameter to supply your own field names, and the -Delimiter parameter to specify tab as the delimiter. The tab character has to be specified using the backtick escape mechanism.
For details, see the accepted answer to this question.
If you have control over your data feed, there is an alternative. The aws cli interface has an option to format the output in JSON format. That format will be much easier to import into Powershell in a form you can use.
Edit:
The following script uses the mockup provided by Theo, except that the multiple spaces have been replaced by a tab character. It uses ConvertFrom-Csv rather than Import-Csv, but it's the same idea:
# The three fields on each line below are separated by a single tab character
$awsReturn = @"
arn:aws:rds:ap-southwest-2:9711387875370:db:catflow--prod	2019-03-03T09:54:29.402Z	2019-03-05T01:25:53Z
arn:aws:rds:ap-southwest-2:9711387875370:db:xyz-prod-rds-golf	2019-03-01T09:04:31.477Z	2019-03-05T01:28:40Z
arn:aws:rds:ap-southwest-2:9711387875370:db:-asm-prod-rds-stardb	2019-02-01T09:07:30.648Z	2019-03-05T01:27:20Z
arn:aws:rds:ap-southwest-2:9711387875370:db:-asm-prod-rds-domaindb	2019-02-02T09:04:30.771Z	2019-03-05T01:28:40Z
arn:aws:rds:ap-southwest-2:9711387875370:db:-ctz-prod-rds-datavault	2019-02-26T14:14:30.254Z	2019-03-05T01:29:13Z
arn:aws:rds:ap-southwest-2:9711387875370:db:-gcp-prod-rds-datavault	2019-02-01T14:05:40.456Z	2019-03-05T01:31:05Z
arn:aws:rds:ap-southwest-2:9711387875370:db:prod-conformed-datavault-prod	2019-02-02T14:06:26.050Z	2019-03-05T01:27:02Z
arn:aws:rds:ap-southwest-2:9711387875370:db:prod-dqm-datavault-prod	2019-02-01T14:12:05.286Z	2019-03-05T01:26:53Z
arn:aws:rds:ap-southwest-2:9711387875370:db:prod-prod-dgc-cde-lineage	2019-03-02T09:54:29.053Z	2019-03-05T01:29:11Z
arn:aws:rds:ap-southwest-2:9711387875370:db:prod-rec-prod	2019-02-02T22:09:00.673Z	2019-03-05T01:29:40Z
arn:aws:rds:ap-southwest-2:9711387875370:db:-serve-prod	2019-03-02T09:54:20.729Z	2019-03-05T01:30:21Z
"@
$myarray = $awsReturn | ConvertFrom-Csv -Header "Prod","DateStart","DateEnd" -Delimiter "`t"
$myarray | Format-Table
$myarray | gm
When I ran it in my environment, it produced the following:
Prod DateStart DateEnd
---- --------- -------
arn:aws:rds:ap-southwest-2:9711387875370:db:catflow--prod 2019-03-03T09:54:29.402Z 2019-03-05T01:25:53Z
arn:aws:rds:ap-southwest-2:9711387875370:db:xyz-prod-rds-golf 2019-03-01T09:04:31.477Z 2019-03-05T01:28:40Z
arn:aws:rds:ap-southwest-2:9711387875370:db:-asm-prod-rds-stardb 2019-02-01T09:07:30.648Z 2019-03-05T01:27:20Z
arn:aws:rds:ap-southwest-2:9711387875370:db:-asm-prod-rds-domaindb 2019-02-02T09:04:30.771Z 2019-03-05T01:28:40Z
arn:aws:rds:ap-southwest-2:9711387875370:db:-ctz-prod-rds-datavault 2019-02-26T14:14:30.254Z 2019-03-05T01:29:13Z
arn:aws:rds:ap-southwest-2:9711387875370:db:-gcp-prod-rds-datavault 2019-02-01T14:05:40.456Z 2019-03-05T01:31:05Z
arn:aws:rds:ap-southwest-2:9711387875370:db:prod-conformed-datavault-prod 2019-02-02T14:06:26.050Z 2019-03-05T01:27:02Z
arn:aws:rds:ap-southwest-2:9711387875370:db:prod-dqm-datavault-prod 2019-02-01T14:12:05.286Z 2019-03-05T01:26:53Z
arn:aws:rds:ap-southwest-2:9711387875370:db:prod-prod-dgc-cde-lineage 2019-03-02T09:54:29.053Z 2019-03-05T01:29:11Z
arn:aws:rds:ap-southwest-2:9711387875370:db:prod-rec-prod 2019-02-02T22:09:00.673Z 2019-03-05T01:29:40Z
arn:aws:rds:ap-southwest-2:9711387875370:db:-serve-prod 2019-03-02T09:54:20.729Z 2019-03-05T01:30:21Z
TypeName: System.Management.Automation.PSCustomObject
Name MemberType Definition
---- ---------- ----------
Equals Method bool Equals(System.Object obj)
GetHashCode Method int GetHashCode()
GetType Method type GetType()
ToString Method string ToString()
DateEnd NoteProperty string DateEnd=2019-03-05T01:25:53Z
DateStart NoteProperty string DateStart=2019-03-03T09:54:29.402Z
Prod NoteProperty string Prod=arn:aws:rds:ap-southwest-2:9711387875370:db:catflow--prod
Let's assume the data returned looks like this mockup (in the question it is strangely formatted):
$awsReturn = @"
arn:aws:rds:ap-southwest-2:9711387875370:db:catflow--prod 2019-03-03T09:54:29.402Z 2019-03-05T01:25:53Z
arn:aws:rds:ap-southwest-2:9711387875370:db:xyz-prod-rds-golf 2019-03-01T09:04:31.477Z 2019-03-05T01:28:40Z
arn:aws:rds:ap-southwest-2:9711387875370:db:-asm-prod-rds-stardb 2019-02-01T09:07:30.648Z 2019-03-05T01:27:20Z
arn:aws:rds:ap-southwest-2:9711387875370:db:-asm-prod-rds-domaindb 2019-02-02T09:04:30.771Z 2019-03-05T01:28:40Z
arn:aws:rds:ap-southwest-2:9711387875370:db:-ctz-prod-rds-datavault 2019-02-26T14:14:30.254Z 2019-03-05T01:29:13Z
arn:aws:rds:ap-southwest-2:9711387875370:db:-gcp-prod-rds-datavault 2019-02-01T14:05:40.456Z 2019-03-05T01:31:05Z
arn:aws:rds:ap-southwest-2:9711387875370:db:prod-conformed-datavault-prod 2019-02-02T14:06:26.050Z 2019-03-05T01:27:02Z
arn:aws:rds:ap-southwest-2:9711387875370:db:prod-dqm-datavault-prod 2019-02-01T14:12:05.286Z 2019-03-05T01:26:53Z
arn:aws:rds:ap-southwest-2:9711387875370:db:prod-prod-dgc-cde-lineage 2019-03-02T09:54:29.053Z 2019-03-05T01:29:11Z
arn:aws:rds:ap-southwest-2:9711387875370:db:prod-rec-prod 2019-02-02T22:09:00.673Z 2019-03-05T01:29:40Z
arn:aws:rds:ap-southwest-2:9711387875370:db:-serve-prod 2019-03-02T09:54:20.729Z 2019-03-05T01:30:21Z
"@
Then, you can do this:
# Since I don't know if that is one single string or a string array:
if ($awsReturn -isnot [array]) { $awsReturn = $awsReturn -split '\r?\n' }
# write it to csv file
$awsReturn -replace '\s+', ',' | Set-Content -Path 'WhereEver.csv' -PassThru # PassThru also displays on screen
to get a file that can serve as CSV (although it has no headers or quoted fields)
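For comparison, the same whitespace-to-comma conversion can be sketched outside PowerShell. The Python snippet below is only an illustration under assumptions: the function name whitespace_to_csv is made up, and the sample reuses two lines from the mockup (one separated by a space, one by tabs). It splits each line on any run of whitespace and lets the csv module write the row.

```python
import csv
import io


def whitespace_to_csv(text):
    """Convert whitespace-separated lines into CSV text (no header row)."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    for line in text.splitlines():
        fields = line.split()  # splits on any run of spaces or tabs
        if fields:             # skip blank lines
            writer.writerow(fields)
    return buffer.getvalue()


sample = (
    "arn:aws:rds:ap-southwest-2:9711387875370:db:catflow--prod 2019-03-03T09:54:29.402Z 2019-03-05T01:25:53Z\n"
    "arn:aws:rds:ap-southwest-2:9711387875370:db:xyz-prod-rds-golf\t2019-03-01T09:04:31.477Z\t2019-03-05T01:28:40Z\n"
)

print(whitespace_to_csv(sample), end="")
```

Because the rows go through a CSV writer rather than a plain text replacement, a field that happened to contain a comma would be quoted instead of silently splitting into two columns.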
If you want to use Export-CSV to get a csv file with headers and quoted fields, you need to split the lines and output objects.
Something like this:
# Since I don't know if that is one single string or a string array:
if ($awsReturn -isnot [array]) { $awsReturn = $awsReturn -split '\r?\n' }
# write it to csv file (with headers and quoted values)
$awsReturn | ForEach-Object {
    $data = $_ -split '\s+' # in this case we know we have 3 fields
    [PsCustomObject]@{
        Prod = $data[0]
        DateStart = $data[1]
        DateEnd = $data[2]
    }
} | Export-Csv -Path 'WhereEver.csv' -NoTypeInformation
The WhereEver.csv file will then look like this:
"Prod","DateStart","DateEnd"
"arn:aws:rds:ap-southwest-2:9711387875370:db:catflow--prod","2019-03-03T09:54:29.402Z","2019-03-05T01:25:53Z"
"arn:aws:rds:ap-southwest-2:9711387875370:db:xyz-prod-rds-golf","2019-03-01T09:04:31.477Z","2019-03-05T01:28:40Z"
"arn:aws:rds:ap-southwest-2:9711387875370:db:-asm-prod-rds-stardb","2019-02-01T09:07:30.648Z","2019-03-05T01:27:20Z"
"arn:aws:rds:ap-southwest-2:9711387875370:db:-asm-prod-rds-domaindb","2019-02-02T09:04:30.771Z","2019-03-05T01:28:40Z"
"arn:aws:rds:ap-southwest-2:9711387875370:db:-ctz-prod-rds-datavault","2019-02-26T14:14:30.254Z","2019-03-05T01:29:13Z"
"arn:aws:rds:ap-southwest-2:9711387875370:db:-gcp-prod-rds-datavault","2019-02-01T14:05:40.456Z","2019-03-05T01:31:05Z"
"arn:aws:rds:ap-southwest-2:9711387875370:db:prod-conformed-datavault-prod","2019-02-02T14:06:26.050Z","2019-03-05T01:27:02Z"
"arn:aws:rds:ap-southwest-2:9711387875370:db:prod-dqm-datavault-prod","2019-02-01T14:12:05.286Z","2019-03-05T01:26:53Z"
"arn:aws:rds:ap-southwest-2:9711387875370:db:prod-prod-dgc-cde-lineage","2019-03-02T09:54:29.053Z","2019-03-05T01:29:11Z"
"arn:aws:rds:ap-southwest-2:9711387875370:db:prod-rec-prod","2019-02-02T22:09:00.673Z","2019-03-05T01:29:40Z"
"arn:aws:rds:ap-southwest-2:9711387875370:db:-serve-prod","2019-03-02T09:54:20.729Z","2019-03-05T01:30:21Z"

Wrong chars using PDO ODBC connection to DB2 on Windows

I'm setting up a new server, and I'm updating some old scripts (PHP 5+) to PHP 7.
I'm connecting to a DB2 database via PDO ODBC, reading a CHAR field with CCSID 870, and saving it to a MySQL mediumtext field in a table with CHARSET=utf8. But I get wrong characters in the MySQL database and even in the PHP console.
I tried switching to odbc_connect(), as in the old script, but the result was the same.
Even when saving the field to a txt file, the result is the same.
utf8_encode and utf8_decode don't help.
Here is an example of the code:
$as = new PDO("odbc:MYODBC",$user, $psw);
$as->setAttribute(PDO::ATTR_DEFAULT_FETCH_MODE, PDO::FETCH_ASSOC);
$res = $as->query("SELECT FIELD FROM MYTABLE");
$rows = $res->fetchAll();
$mysql = new PDO("mysql:host=srvip;dbname=mydbname;charset=utf8",$user, $psw);
$mysql->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$mysql->setAttribute(PDO::ATTR_EMULATE_PREPARES, false);
$mysql->setAttribute(PDO::ATTR_DEFAULT_FETCH_MODE, PDO::FETCH_ASSOC);
$ins = $mysql->prepare("INSERT INTO my_MySQL_TABLE (FIELD) VALUES (?)");
$ins->execute(array(trim($rows[0]["FIELD"])));
I expect the results on MySQL to be Wąż, but the actual output is W?? or WÈØ.
Edit on 2019-06-06
| Source | String | HEX |
|------------------|--------|------------|
| DB2 | Wąż | E6A0B2 |
| MySQL | W?? | 573F3F |
| MySQL C/P Insert | Wąż | 57C485C5BC |
The last version is a simple copy-paste to MySQL using a GUI
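The HEX column can be cross-checked with a few lines of Python. This is a small sketch for illustration only (the variable names are made up); it confirms that the copy-paste row holds the UTF-8 bytes of the expected string, and that the failing row holds literal question marks (0x3F), which suggests the characters were substituted somewhere before they were stored rather than merely displayed wrongly.

```python
# Expected string, written with escapes so the snippet does not depend on file encoding.
expected = "W\u0105\u017c"  # W, a-with-ogonek, z-with-dot-above

# UTF-8 bytes of the expected string, as upper-case hex.
utf8_hex = expected.encode("utf-8").hex().upper()
print(utf8_hex)

# Bytes actually stored by the PHP script, decoded as plain ASCII.
stored = bytes.fromhex("573F3F").decode("ascii")
print(stored)
```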
Edit on 2019-06-07
C:\Users\ME\>echo %DB2CODEPAGE%
1208
C:\Users\ME\>acs.exe /PLUGIN=cldownload /system=MYSYS /sql="SELECT FIELD as char,HEX(FIELD) as hex FROM TABLE" /display=1
CHAR HEX
W?? E6A0B2
If I use /clientfile=test.txt instead of /display=1, Notepad++ shows me the file as UTF-8.

PowerShell csv character replacement

I managed to generate a csv through a PowerShell script that collects disk info from a group of servers, but the output in the csv file requires some data massaging.
Below is the script for reference:
foreach ($pc in $comp) {
    # @Params splats a parameter hashtable that is defined elsewhere in the script
    $diskvalue += Get-WmiObject @Params | Select-Object @{l='drives';e='DeviceID'}, @{l='server';e='SystemName'}, @{Name="size(MB)";Expression={"{0:N1}" -f ($_.Size/1mb)}}, @{Name="freespace(MB)";Expression={"{0:N1}" -f ($_.FreeSpace/1mb)}}, @{Name="UsedSpace(MB)";Expression={"{0:N2}" -f (($_.Size - $_.FreeSpace)/1mb)}}
}
$diskvalue | Export-Csv C:\disk_info\DiskReport.csv -NoTypeInformation
The "drives" column of the output csv file will contain:
C:
Yet I would like to remove the trailing ":" from that value, so that it reads:
C
Change:
@{l='drives';e='DeviceID'}
to
@{l='drives';e={"$($_.DeviceID)".Trim(": ")}}
Trim(": ") will remove any leading or trailing space and : characters from the DeviceID string.

How to read length of video file from cmd

I have a bunch of mp4 files in a folder and I want to create a text file with all the names and the length of the files as in:
01_Welcome.mp4 00.01.23
02_Tools.mp4 00.03.12
I know how to read the names of the files, but how do I get the length attribute? When I click a file the length appears in the status bar, so there should be a way to read that property. And I would like to do it from the command line, not through a third-party package.
In Ubuntu you can use
ffmpeg -i myvideo 2>&1 | grep Duration | cut -d ' ' -f 4 | sed s/,//
But on Windows, MediaInfo is one option.
In Windows PowerShell you can do the following to extract the length of a single media file:
$Folder = 'C:\Path\To\Parent\Folder'
$File = 'Video.mp4'
$LengthColumn = 27
$objShell = New-Object -ComObject Shell.Application
$objFolder = $objShell.Namespace($Folder)
$objFile = $objFolder.ParseName($File)
$Length = $objFolder.GetDetailsOf($objFile, $LengthColumn)
Iteration over the folder content is left as an exercise for the reader.
Source
