Find a column name from the list of parquet files in a folder in Synapse using SQL

I am using Azure Synapse and have a folder that contains multiple subfolders, each holding some parquet files. When you right-click a parquet file you get the option to select the top 100 rows from that file.
I want to write a query there: if I have a column name, how do I find, in SQL, which folder contains a parquet file with that column?
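One possible approach (a minimal sketch for a Synapse serverless SQL pool; the storage account, container and folder names below are placeholders) is to ask the engine to describe the columns exposed by the parquet files in a given subfolder and check whether your column is listed:
-- Sketch only: substitute your own storage URL; sp_describe_first_result_set
-- returns one row per column (name, type, ...) of the query's result set.
EXEC sp_describe_first_result_set N'
    SELECT TOP 0 *
    FROM OPENROWSET(
        BULK ''https://<account>.dfs.core.windows.net/<container>/<subfolder>/*.parquet'',
        FORMAT = ''PARQUET''
    ) AS rows
';
Running this once per subfolder (or scripting a loop over the folder list) tells you which folders contain files exposing the column you are looking for.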

Related

Transfer XML data to an Oracle table by column or fields using Talend

I am using Talend Studio with the components tFileInputDelimited row1 (Main) to tOracleOutput; what I want is to transfer the data in an XML file to an Oracle table.
I want to transfer the values of the last two columns (product_label and email_order) of my Excel file to the product table, which has this column structure (PRODUCT_ID, PRODUCT_CODE, PRODUCT_LABEL, EMAIL_COMAND, ORDER_ID).
I also want to apply this condition: if a row in my Excel file has an empty product code column, then its product_label and email_command values are not inserted.
XML File to load
Product table
What are the proper settings in tFileInputDelimited, or do I need to use other tools?
Refer to this image for reference.
Use tFileInputXML to read the file, filter the records using tFilterRow, and then connect to tOracleOutput.

Write to csv file from deltalake table in databricks

How do I write the contents of a Delta Lake table to a CSV file in Azure Databricks?
Is there a way to do this without first dumping the contents to a dataframe? https://docs.databricks.com/delta/delta-batch.html
While loading the data to the Delta table, I used an ADLS Gen2 folder location for the creation of the versioned parquet files.
The conversion of parquet to CSV could then be accomplished using the Copy Data Activity in ADF.
You can simply use Insert Overwrite Directory.
The syntax would be
INSERT OVERWRITE DIRECTORY <directory_path> USING <file_format> <options> select * from table_name
Here you can specify the target directory path where the file should be generated. The file format could be parquet, csv, txt, json, etc.
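For example (a sketch with a hypothetical output path and table name; adjust the format and options to your needs):
-- /mnt/exports/my_table_csv and my_delta_table are placeholders
INSERT OVERWRITE DIRECTORY '/mnt/exports/my_table_csv'
USING CSV
OPTIONS ('header' = 'true', 'delimiter' = ',')
SELECT * FROM my_delta_table;
This writes one or more CSV part files directly into the target directory, without an explicit dataframe step in your code.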

Create a Hive table in my user location using the HDFS files in another location

I have the following requirements:
- I want to create a Hive table in my user location using the HDFS files stored in another location.
- I want to copy, not move, the files, as they are shared by other users.
- The files are already stored in date folders. Each day a new folder is created, with 'n' CSV files inside it. I want my table to hold the data from those files, partitioned by a date field.
- Once the table has been created, I want it to be updated every day with that day's files (see the sketch below).
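A minimal HiveQL sketch of one possible approach, assuming hypothetical paths and a two-column CSV schema (the real column list, locations and partition values would come from your data):
-- Staging external table pointing at one day's shared folder; the source files stay where they are
CREATE EXTERNAL TABLE IF NOT EXISTS staging_day (
  col1 STRING,
  col2 STRING
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
LOCATION '/shared/data/2024-01-01/';

-- Partitioned table in the user location; its data lives under /user/myuser/my_table
CREATE TABLE IF NOT EXISTS my_table (
  col1 STRING,
  col2 STRING
)
PARTITIONED BY (dt STRING)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
LOCATION '/user/myuser/my_table';

-- Daily load: INSERT copies the rows, so the shared files are left in place
INSERT OVERWRITE TABLE my_table PARTITION (dt = '2024-01-01')
SELECT col1, col2 FROM staging_day;
Each day you would point the staging table at the new folder (ALTER TABLE staging_day SET LOCATION '...') and run the INSERT for that day's partition.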

Creating HBase table for files in HDFS directory

I am trying to load the data from all files in an HDFS directory into an existing HBase table. Can you please share how to load all the files' data, and incremental data, into the HBase table?
I created the HBase table as
hbase> create 'sample','cf'
I have to copy
hdfs://ip:port/user/test
into the sample HBase table. Please suggest a solution.
Answer 1:(possible)
With ImportTsv, if you provide only the directory path (e.g. /user/hadoop/) instead of a full file path, it should process all files within that directory.
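A rough example of the invocation (the separator and column mapping below are placeholders; they must match the layout of your files):
hbase org.apache.hadoop.hbase.mapreduce.ImportTsv \
  -Dimporttsv.separator=',' \
  -Dimporttsv.columns=HBASE_ROW_KEY,cf:col1,cf:col2 \
  sample hdfs://ip:port/user/test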
Answer 2:(seems not possible)
The special column name HBASE_ROW_KEY is used to designate that this
column should be used as the row key for each imported record. You
must specify exactly one column to be the row key, and you must
specify a column name for every column that exists in the input data.

How to delete a 000000 file in an S3 bucket in AWS using a Hive script

I've created a working Hive script to back up data from DynamoDB to a file in an S3 bucket in AWS. A code snippet is shown below:
INSERT OVERWRITE DIRECTORY '${hiveconf:S3Location}'
SELECT *
FROM DynamoDBDataBackup;
When I run the Hive script it presumably deletes the old file and creates a new one, but if there are errors in the backup process I guess it rolls back to the old data, because the file is still there when an error has occurred.
We want to make a backup each day, but I need to know whether an error has occurred, so I want to delete the previous day's backup first and then create the new backup. If it fails, there is no file in the folder, which we can detect automatically.
The file automatically gets the name 000000.
In my Hive script I've tried, unsuccessfully:
delete FILE '${hiveconf:S3Location}/000000'
and
delete FILE '${hiveconf:S3Location}/000000.0'
Perhaps the filename is wrong. I haven't set any permissions on the file.
I've just tried this, but it fails at STORED:
SET dynamodb.endpoint= ${DYNAMODBENDPOINT};
SET DynamoDBTableName = "${DYNAMODBTABLE}";
SET S3Location = ${LOCATION};
DROP TABLE IF EXISTS DynamoDBDataBackupPreferenceStore;
CREATE TABLE IF NOT EXISTS DynamoDBDataBackupPreferenceStore(UserGuid STRING,PreferenceKey STRING,DateCreated STRING,DateEmailGenerated STRING,DateLastUpdated STRING,ReceiveEmail STRING,HomePage STRING,EmailFormat STRING,SavedSearchCriteria STRING,SavedSearchLabel STRING),
STORED BY 'org.apache.hadoop.hive.dynamodb.DynamoDBStorageHandler'
LOCATION '${hiveconf:S3Location}',
TBLPROPERTIES ("dynamodb.table.name" = ${hiveconf:DynamoDBTableName}, "dynamodb.column.mapping" = "UserGuid:UserGuid,PreferenceKey:PreferenceKey,DateCreated:DateCreated,DateEmailGenerated:DateEmailGenerated,DateLastUpdated:DateLastUpdated,ReceiveEmail:ReceiveEmail,HomePage:HomePage,EmailFormat:EmailFormat,SavedSearchCriteria:SavedSearchCriteria,SavedSearchLabel:SavedSearchLabel");
You can manage files directly using Hive table commands.
Firstly, if you want to use external data controlled outside Hive, use the EXTERNAL keyword when creating the table:
set S3Path='s3://Bucket/directory/';
CREATE EXTERNAL TABLE IF NOT EXISTS S3table
( data STRING
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
LOCATION ${hiveconf:S3Path};
You can now insert data into this table
INSERT OVERWRITE TABLE S3table
SELECT data
FROM DynamoDBtable;
This will create text files in S3 inside the directory location.
Note that depending on the data size and the number of reducers there may be multiple text files.
The file names also contain a random GUID element, e.g. 03d3842f-7290-4a75-9c22-5cdb8cdd201b_000000.
DROP TABLE S3table;
Dropping the external table just breaks the link to the files; the files themselves remain in S3.
Now, if you want Hive to manage the directory, you can create a table that takes control of the S3 directory (note there is no EXTERNAL keyword):
CREATE TABLE IF NOT EXISTS S3table
( data STRING
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
LOCATION ${hiveconf:S3Path};
If you now issue a DROP TABLE command, all files in the folder are deleted immediately:
DROP TABLE S3table;
I suggest you create a non-external table, drop it, and then carry on with the rest of your script. If you encounter errors, you will have an empty directory after the job finishes.
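A sketch of that flow, reusing the hypothetical names from above (S3Path, DynamoDBtable):
-- 1. Take control of the S3 directory with a non-external table, then drop it to clear the folder
CREATE TABLE IF NOT EXISTS S3cleanup
( data STRING
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
LOCATION ${hiveconf:S3Path};

DROP TABLE S3cleanup;

-- 2. Re-create the external table and write today's backup
CREATE EXTERNAL TABLE IF NOT EXISTS S3table
( data STRING
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
LOCATION ${hiveconf:S3Path};

INSERT OVERWRITE TABLE S3table
SELECT data
FROM DynamoDBtable;
If the INSERT fails, the directory is left empty, which, as described above, is the condition you can detect.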
Hope this covers what you need
