Uploading multiple CSVs from local source to GBQ using command line - bash

I'm putting together a for loop on Google Cloud SDK Shell that will upload every CSV from the current directory (on my local computer) to a separate Google BigQuery table, all in the same dataset. Also, I want the created tables in GBQ to have the same names as their corresponding CSV files (minus the .csv extension).
I was actually able to do all that using the following command line, except that it appends all CSVs to the same table instead of loading them into separate tables.
for %d in (*.csv); do set var1=%d & bq load --autodetect --source_format=CSV "DatasetName.%var1:~0,-5%" %d
Hint: it seems to me that the variable "var1" gets updated on each iteration, but the bq load command doesn't pick up the updated value; it keeps using the original value until the loop ends.
Current Output:

Even though I was not able to reproduce a BigQuery load from my local environment, I was able to reproduce this case by uploading .csv files from Google Cloud Shell to BigQuery.
I tried running your code, but my attempts were unsuccessful. Thus, I created the following bash script to iterate over all the .csv files and upload each one to BigQuery with a bq load command, described here.
#!/bin/bash
echo "Starting the script"
# Loop over every .csv file in the current directory
for i in *.csv; do
  echo "${i%.csv} loading"
  # The destination table is named after the file, minus the .csv extension
  bq load --autodetect --source_format=CSV "project_id:dataset.Table_${i%.csv}" "./$i"
  echo "${i%.csv} was loaded"
done
Notice that the script picks up only the .csv files within the directory where it is located. Also, ${i%.csv} returns just the filename without the extension, which is used to name the destination table. On the other hand, $i returns the whole filename including the .csv, so it is used to point to the source file in the bq load command.
Regarding the bq command, the --autodetect flag is used to auto-detect the schema of each table.
Furthermore, since this load job is from a local data source, it is necessary to specify the project id in the destination table's path, here: project_id:dataset.Table_${i%.csv}.
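For illustration, with a hypothetical project my-project, a dataset my_dataset, and a file sales.csv in the current directory, one iteration of the loop expands to roughly:
bq load --autodetect --source_format=CSV my-project:my_dataset.Table_sales ./sales.csv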
As bonus information, you can also upload your data to a Google Cloud Storage bucket and load all the files into BigQuery using wildcards, a Python script with a loop, or Dataflow (streaming or batch), depending on your needs.
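A rough sketch of the bucket route (the bucket and dataset names are hypothetical): copy the files to Cloud Storage with gsutil, then either loop over the objects to keep one table per file, or use a wildcard URI to load everything into a single table.
# Copy all local CSVs to a (hypothetical) Cloud Storage bucket
gsutil cp ./*.csv gs://my-bucket/csv/
# Option A: one bq load per object, keeping one table per file
for uri in $(gsutil ls gs://my-bucket/csv/*.csv); do
  name=$(basename "$uri" .csv)
  bq load --autodetect --source_format=CSV "my_dataset.${name}" "$uri"
done
# Option B: a wildcard URI loads every matching file into a single table
bq load --autodetect --source_format=CSV my_dataset.all_csvs "gs://my-bucket/csv/*.csv"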

Related

Import a CSV file whose name changes every day into SQLite without any external tool

I would like to import a CSV file with a timestamp in its filename. I want to do this by running an SQLite script file without any external modification (like using bash or another programming tool to copy the target CSV file somewhere and rename it). The file name changes every day, so each day a different CSV file should be imported into the SQLite database file. The file name pattern looks like this:
abc_17-07-2021.csv
abc_18-07-2021.csv
abc_19-07-2021.csv
I would like to have a script consisting solely of SQLite commands and run it with SQLite alone.
e.g. $ sqlite3 < example.sql
If you do not want to use an external tool with your SQLite file, a simple approach would be to export the .csv into a folder to which no files irrelevant to this scenario are ever added; when you import, you should then be able to use a wildcard such as abc_*.csv.

How to download multiple files in multiple sub-directories using curl?

I am downloading multiple files using curl. The base URL for all the files is the same, for example
https://mydata.gov/daily/2017
The data in these directories is further grouped by date and file type. So the first file that I need is at
https://mydata.gov/daily/2017/001/17d/Roger001.gz
The second one is
https://mydata.gov/daily/2017/002/17d/Roger002.gz
I need to download everything up to the data for the last day of 2017, which is
https://mydata.gov/daily/2017/365/17d/Roger365.gz
How can I use curl or any other similar tool to download all the files to a single local folder, preferably adopting the original file names?
Use the following in a bash terminal:
for f in {001..365}; do curl https://mydata.gov/daily/2017/"$f"/17d/Roger"$f".gz -o /your-directory/Roger"$f".gz; done
Replace your-directory with the directory where you want to save the files.
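If some day numbers turn out to be missing on the server, a slight variation (a sketch, assuming the same URL layout) with curl's --fail flag avoids saving HTML error pages as .gz files:
for f in {001..365}; do
  # --fail makes curl exit non-zero on HTTP errors instead of writing the error body
  curl --fail -sS "https://mydata.gov/daily/2017/${f}/17d/Roger${f}.gz" \
       -o "/your-directory/Roger${f}.gz" || echo "skipped day ${f}"
done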

How to extract date from filename in batch, and unzip to multiple directories using batch/7z

I am trying to code a script to automatically process some of our daily ftp files.
I have already scripted the download of the files from the source FTP using WinSCP, calling it in a .bat file, and would ideally like to call the new logic within the same bat. The scripting language does not matter, as long as I can run/call it from the original batch.
I need a script that will extract the date from a filename and unzip the contents into the corresponding folders. The source file is delivered automatically daily via FTP, and the filename is:
SOFL_CLAIM_TC201702270720000075.zip
The date portion (20170227) is what I would like to extract.
The contents of the .zip include two types of content, multiple PDFs and a .dat file.
For the supplied date of 20170227, the pdfs need to get extracted to a folder following the format:
\%root%\FNOIs\2017\02-Feb\02-27-2017
At the same time, the .dat file needs to get extracted to multiple folders following the format:
\%root%\Claim Add\2017 Claim Add\02-2017
\%root2%\vendorFTP\VendorFolder
After extracting, I need to move the source zip to
\%root%\Claim Add\2017 Claim Add\02-2017
What is the best way of accomplishing all of this?
I am assuming it would be the for /f batch command, but I am new to batch coding and cannot figure out how to start it from scratch.
I also have 7zip installed, but do not understand how to use the command-line options.
You have asked for a lot in one question, and not shown any code or demonstrated effort on your part.
For the first part, once you have the filename in a variable:
set FILENAME=SOFL_CLAIM_TC201702270720000075.zip
You can get the date part with:
echo %FILENAME:~13,-14%
The syntax :~13,-14 means "skip the first 13 characters and drop the last 14." That should leave you with just the date.
When you integrate that into your script, show your code.

Bash script behaving differently for different files

I have a bash script that uses awk to process some files that I have downloaded. If I run the script on any of the downloaded files, it does not work properly. However, if I transfer the contents of a file into a newly created one, it seems to work as expected. Could it have anything to do with the settings of the files?
I have two files, hotel_12313.dat and hotel_99999.dat. The first one was downloaded and the second one was created by me. If I copy the data from the first file into the second one and execute the script on both of them, the output is different.
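One way to check whether the two files really are byte-for-byte identical, and to make invisible differences such as line endings visible (a sketch using standard tools; the filenames are the ones from the question):
# Compare the downloaded file with the hand-made copy byte by byte
cmp hotel_12313.dat hotel_99999.dat && echo "identical" || echo "files differ"
# Report the detected file type, including the line-ending style
file hotel_12313.dat hotel_99999.dat
# Show non-printing characters (carriage returns appear as ^M, line ends as $)
cat -A hotel_12313.dat | head
cat -A hotel_99999.dat | head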

Specifying path of CSV file in an SQL script (H2 database)

I have an H2 database file which is created by importing tables from CSV files, using the CSVREAD command provided by H2 (described here: http://www.h2database.com/html/tutorial.html#csv). The CSV files are just tables in text format, and are generated by Perl scripts. Every time I generate new versions of the CSV files, I have to delete the database tables and import the CSV files again to the database, using the CSVREAD command. This database is to be used by a webserver. But, instead of uploading the database file (which gets too big), I prefer to upload the CSV files to the server, and then execute a script (with the RUNSCRIPT command or from the command line, using the H2 RunScript tool), so that the database tables are generated inside the server.
My problem is that, to use the CSVREAD command inside a script, I have to specify the absolute path of the CSV files. Instead of writing the absolute path inside the script file, it would be convenient to be able to pass the absolute path as an argument to the RUNSCRIPT command, so that I don't have to hardcode it inside the script. Then, in the script file, I would just write a placeholder that would get replaced by the argument value. How can I do this?
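For reference, a minimal sketch of the workflow described above, run from the command line with the H2 RunScript tool (the database URL, table name, and CSV path are hypothetical, and the absolute path is hardcoded in the script, which is exactly the inconvenience in question):
# Generate the SQL script with the absolute CSV path written into it
cat > import.sql <<'EOF'
DROP TABLE IF EXISTS mytable;
CREATE TABLE mytable AS SELECT * FROM CSVREAD('/var/data/mytable.csv');
EOF
# Execute the script against the database file with the H2 RunScript tool
java -cp h2.jar org.h2.tools.RunScript \
     -url jdbc:h2:/var/data/mydb -user sa -script import.sql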
