Specifying the path of a CSV file in an SQL script (H2 database)

I have an H2 database file which is created by importing tables from CSV files, using the CSVREAD command provided by H2 (described here: http://www.h2database.com/html/tutorial.html#csv). The CSV files are just tables in text format and are generated by Perl scripts. Every time I generate new versions of the CSV files, I have to delete the database tables and import the CSV files into the database again with CSVREAD. This database is to be used by a web server, but instead of uploading the database file (which gets too big), I prefer to upload the CSV files to the server and then execute a script (with the RUNSCRIPT command, or from the command line using the H2 RunScript tool), so that the database tables are generated on the server.
My problem is that, to use the CSVREAD command inside a script, I have to specify the absolute path of the CSV files. Instead of writing the absolute path inside the script file, it would be convenient to be able to pass the absolute path as an argument to the RUNSCRIPT command, so that I don't have to hardcode it inside the script. Then, in the script file, I would just write a placeholder that would get replaced by the argument value. How can I do this?
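As far as I can tell, RUNSCRIPT has no built-in placeholder mechanism, so one workaround is to substitute the placeholder yourself before invoking H2's RunScript tool. A minimal sketch, assuming the SQL script uses a literal @CSV_DIR@ placeholder (the placeholder name, paths, and JDBC URL below are hypothetical):

# create_tables.sql contains lines such as:
#   CREATE TABLE t1 AS SELECT * FROM CSVREAD('@CSV_DIR@/t1.csv');
CSV_DIR=/srv/data/csv
# Resolve the placeholder into a temporary copy of the script.
sed "s|@CSV_DIR@|$CSV_DIR|g" create_tables.sql > /tmp/create_tables_resolved.sql
# Run the resolved script against the database with H2's RunScript tool.
java -cp h2.jar org.h2.tools.RunScript -url jdbc:h2:/srv/data/mydb -user sa -script /tmp/create_tables_resolved.sql

The same substitution step works if you prefer to execute the resolved file from within an H2 session via RUNSCRIPT FROM '/tmp/create_tables_resolved.sql'.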

Related

Import a CSV file whose name changes every day into SQLite without any external tool

I would like to import a CSV file with a timestamp in its filename. I want to do this by running the SQLite script file without any external modification (like using bash or another programming tool to copy the target CSV file somewhere and rename it). The file name changes every day, so each day a different CSV file should be imported into the SQLite database file. The file name pattern looks like this:
abc_17-07-2021.csv
abc_18-07-2021.csv
abc_19-07-2021.csv
I would like the script to be purely SQLite and to run it with the sqlite3 shell,
e.g. $ sqlite3 < example.sql
If you do not want to use an external tool with your SQLite file, a simple way would be to export the .csv to a folder where no files irrelevant to this scenario will be added; when you import, you should then be able to use the wildcard abc_*.csv, as in the sketch below.
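To make that concrete, here is a minimal sketch; note that it is the shell, not SQLite, that expands the wildcard, and it assumes the folder contains exactly one matching file, that the table abc already exists, and that the CSV has no header row (the table and database names are placeholders):

file=$(echo abc_*.csv)   # the shell glob picks up today's file
sqlite3 example.db <<EOF
.mode csv
.import $file abc
EOF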

Uploading multiple CSVs from local source to GBQ using command line

I'm putting together a for loop in the Google Cloud SDK Shell that will upload every CSV from the current directory (on my local computer) to a separate Google BigQuery table, all in the same dataset. I also want the created tables in GBQ to have the same names as their corresponding CSV files (minus the .csv part).
I was actually able to do all that using the following command line, except that it appends all CSVs to the same table rather than loading them into separate tables.
for %d in (*.csv); do set var1=%d & bq load --autodetect --source_format=CSV "DatasetName.%var1:~0,-5%" %d
Hint: it seems to me that the variable "var1" gets updated in each loop iteration, but the bq load command doesn't see the updated values; it keeps the original value until the loop ends. (This is cmd.exe's behavior: %var1% and %var1:~0,-5% are expanded once, when the whole line is parsed, not on each iteration.)
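If that diagnosis is right, one fix (a sketch for cmd.exe, not verified here) is to drop the intermediate variable entirely: in a cmd for loop, %~nd expands to the filename of %d without its extension and is evaluated on every iteration:

for %d in (*.csv) do bq load --autodetect --source_format=CSV "DatasetName.%~nd" "%d"

Alternatively, keep the variable but start the shell with delayed expansion enabled (cmd /V:ON) and write !var1:~0,-5! so it is re-evaluated per iteration.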
Although I was not able to reproduce a BigQuery load from my local environment, I was able to reproduce this case by uploading .csv files from Google Cloud Shell to BigQuery.
I tried running your code but my attempts were unsuccessful, so I created the following bash script to map and upload all the .csv files to BigQuery with the bq load command.
#!/bin/bash
echo "Starting the script"
# Loop over every .csv file in the current directory.
for i in *.csv; do
  # ${i%.csv} strips the extension; it names the destination table.
  echo "${i%.csv} loading"
  bq load --autodetect --source_format=CSV "project_id:dataset.Table_${i%.csv}" "./$i"
  echo "${i%.csv} was loaded"
done
Notice that the script maps only the .csv files within the directory where it is located. Also, ${i%.csv} returns just the filename without the extension, which is used to name the destination table. On the other hand, $i returns the whole filename including the .csv, so it is used to point to the source file in the bq load command.
About the bq command, the --autodetect flag is used to auto-detect the schema of each table.
Furthermore, since this load job is from a local data source, it is necessary to specify the project id in the destination table's path, here: project_id:dataset.Table_${i%.csv}.
As a bonus, you can also upload your data to a Google Cloud Storage bucket and load all the files into BigQuery using wildcards, a Python script with a loop, or Dataflow (streaming or batch), depending on your needs.
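For the bucket route, bq load accepts wildcard URIs for Cloud Storage sources, so a single command can load all matching files (the bucket and table names below are hypothetical); note that, like the original one-liner, this puts everything into one table, so a loop is still needed for one table per file:

bq load --autodetect --source_format=CSV dataset.all_data "gs://my-bucket/abc/*.csv"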

Unable to Pass option using getopts to oozie shell Action

I created a shell script and pass arguments to it using getopts, like this:
sh my_code.sh -F"file_name"
where my_code.sh is my Unix script name and file_name is the file I am passing to the script via getopts.
This works fine when I invoke the script from the command line.
I want to invoke the same script through Oozie, but I am not sure how to do it.
I tried passing the argument in both the "exec" and the "file" tags of the workflow XML.
When I passed the argument in the exec tag, it threw a Java "NullPointerException":
<exec>my_code.sh -F file_name</exec>
When I passed the argument in the file tag, I got a "No such file or directory" error; it was searching for the file in the /yarn/hadoop directory:
<file>$/user/oozie/my_code.sh#$my_code.sh -F file_name</file>
Can anyone please suggest how I can achieve this using Oozie?
You need to create a lib/ folder as part of your workflow, from which Oozie will pick up the script as part of its process. This directory should also be uploaded to the oozie.wf.application.path location.
The reason this is required is that Oozie runs on an arbitrary YARN node. Imagine you had a hundred-node cluster: you would otherwise have to ensure that every single server had the /user/oozie/my_code.sh file available (which is of course hard to track). When the file is placed on HDFS instead, every node can download it locally.
So if you put the script in the lib directory next to the workflow XML that you submit, you can reference the script by name directly, rather than using the # syntax.
Then, you'll want to use the <argument> XML tags for the opts:
https://oozie.apache.org/docs/4.3.1/DG_ShellActionExtension.html
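Putting that together, a minimal sketch of the shell action (my_code.sh and the -F option come from the question; ${jobTracker} and ${nameNode} are assumed to be defined in job.properties, and the script is assumed to sit in lib/ next to workflow.xml):

<action name="run-my-code">
    <shell xmlns="uri:oozie:shell-action:0.2">
        <job-tracker>${jobTracker}</job-tracker>
        <name-node>${nameNode}</name-node>
        <exec>my_code.sh</exec>
        <argument>-F</argument>
        <argument>file_name</argument>
        <file>my_code.sh</file>
    </shell>
    <ok to="end"/>
    <error to="fail"/>
</action>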
I have created the lib/ folder and uploaded it to the oozie.wf.application.path location. I am now able to pass files to my shell action.

Storing output of a pig script in a file whose filename is dynamically generated

While executing a Pig script, if you have a step that stores output into a file, can the name of that file be generated dynamically rather than given as a fixed filename?
If so, can the current date be used as the name of the file?
You could use parameter substitution, either by passing -param date=value on the pig command line or by putting date=value in a parameter file such as pig.params (passed with -param_file). You can then access the specified value in the Pig script as $date. For example:
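Inside myscript.pig (the script, path, and relation names are hypothetical):
STORE results INTO '/output/report_$date';

Invocation, letting the shell's date command supply the current date:
pig -param date=$(date +%Y-%m-%d) myscript.pig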

JMeter CSV Data Set Config doesn't pick up the user id and password from the CSV file

I have created a CSV file, placed it in the /bin directory, and added a CSV Data Set Config with the parameters UID and PWD, but when I run the test it does not pick up the user id and password; the request shows txtlogin=<EOF>&txtpassword=<EOF> instead.
Why isn't it picking up the given user ids and passwords?
You should place your CSV file in the same directory where you saved your test script file. The test plan is normally saved with a .jmx extension; put your CSV file in that same directory.
Make sure you have put your CSV file alongside your saved test file; doing this solved the <EOF> error for me. :)
See the JMeter documentation on how to read and reference variables from a CSV file.
And look into %JMETER_HOME%/bin/jmeter.log if something goes wrong.
Use the fully qualified path of the CSV file to ensure JMeter can find it.
I was getting <EOF> in my variable values when I had ./mycsvfile.csv as the path. I took the ./ off the front and it started working. My CSV file is in the same directory as my .jmx test script file.
I just ran into the same issue. It seems to be a bug.
Don't name your .csv file "user.csv" or "users.csv"; they seem to be reserved names.
I changed the name of the file and then it worked.
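To summarize the working setup, a minimal sketch (the UID/PWD variable names and txtlogin/txtpassword parameters come from the question; credentials.csv is a hypothetical filename chosen to avoid the reserved names above):

credentials.csv, in the same directory as the .jmx file, no header row:
user1,pass1
user2,pass2

CSV Data Set Config:
Filename: credentials.csv
Variable Names: UID,PWD

HTTP request body parameters:
txtlogin=${UID}&txtpassword=${PWD}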
