How to exclude the total row information from VSQL output - vertica

I am using VSQL to extract data from a table in CSV format using this command:
vsql -h [host_address] -d [db_name] -u [user_name] -w [password] -A -F , -t -f script_to_extract_data.sql -o output.csv
However, when I run it without the -t option, it outputs the column headers, the data rows AND an extra footer row indicating the total number of rows, like this:
Geography,Product,Campaign,VariableName,Outlet,Creative,Period,VariableValue
Geo_00000,Product,BABY,web_sales,Total,Total,2016-10-03,24.06
Geo_00000,Product,BABY,web_sales,Total,Total,2016-10-08,67.17
Geo_00000,Product,BABY,web_sales,Total,Total,2016-10-17,404.67
(3 rows)
With the -t option (as in the command above), it outputs just the data, like this:
Geo_00000,Product,BABY,web_sales,Total,Total,2016-10-03,24.06
Geo_00000,Product,BABY,web_sales,Total,Total,2016-10-08,67.17
Geo_00000,Product,BABY,web_sales,Total,Total,2016-10-17,404.67
I would like the column headers AND the data, but not the row-count footer, like this:
Geography,Product,Campaign,VariableName,Outlet,Creative,Period,VariableValue
Geo_00000,Product,BABY,web_sales,Total,Total,2016-10-03,24.06
Geo_00000,Product,BABY,web_sales,Total,Total,2016-10-08,67.17
Geo_00000,Product,BABY,web_sales,Total,Total,2016-10-17,404.67
From reading through the vsql command-line options, I don't see a way to suppress the row-count footer. If anyone experienced with using vsql from the command line could help me out, I would greatly appreciate it. Thank you!

Without -t, add -P footer=off. This will give you the headers without the footer, like you want:
vsql -h [host_address] -p [port] -d [db_name] -u [user_name] -w [password] -AP footer=off -F , -f script_to_extract_data.sql -o output.csv

There is no documented way to do this.
You could, however, inject a SELECT into your script to print the header row while leaving tuples-only mode on:
\t
\a
\f ,
\o output.csv
select 'Geography', 'Product', 'Campaign', 'VariableName', 'Outlet', 'Creative', 'Period', 'VariableValue';
select Geography, Product, Campaign, VariableName, Outlet, Creative, Period, VariableValue
from mytable;
\o
And I guess if it really, really bothers you to list the fields twice, you could use a variable.
\set fields Geography,Product,Campaign,VariableName,Outlet,Creative,Period,VariableValue
Then reference :fields in both queries (just wrap it in ' quotes for the header list). In that case the header list would just be a single string, and the delimiter would have to be a , since the same list is also used in the SQL. Just a thought.
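A minimal sketch of that variant (assuming vsql interpolates :fields inside the quoted literal the way described above; mytable is the placeholder table from the earlier script):
\set fields Geography,Product,Campaign,VariableName,Outlet,Creative,Period,VariableValue
\t
\a
\f ,
\o output.csv
-- header: the list comes out as one string that already contains the , separators
select ':fields';
-- data: the same list is interpolated as a real column list
select :fields from mytable;
\o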

Related

Create scheduled query with bq mk command-line tool from a SQL file

I am trying to create a scheduled query using the bq command-line tool, storing the query to be executed in a SQL file named scheduled_query.sql, but I am not able to get it working. I am checking this documentation.
I want to schedule the following query:
INSERT INTO github_actions_test_dataset.timestamp_to_unix
SELECT
CURRENT_DATETIME("Europe/Madrid") AS timestamp,
UNIX_MILLIS(CURRENT_TIMESTAMP()) AS unix_miliseconds
For this, I have executed the following command with success:
bq mk --transfer_config --display_name='Example Scheduled Query' --target_dataset=github_actions_test_dataset --service_account_name=my-sa@my-project.iam.gserviceaccount.com --data_source=scheduled_query --params='{"query":"INSERT INTO github_actions_test_dataset.timestamp_to_unix SELECT CURRENT_DATETIME(\"Europe/Madrid\") AS timestamp, UNIX_MILLIS(CURRENT_TIMESTAMP()) AS unix_miliseconds"}'
However, instead of having to write the query in the command, I want to retrieve it from a .sql file and feed it to the command. I tried using sed 's/"/\\"/g' scheduled_query.sql in order to escape the " characters in my query, like this:
bq mk --transfer_config --display_name='Example Scheduled Query' --target_dataset=$DATASET --service_account_name=github-actions-sa@my-project.iam.gserviceaccount.com --data_source=scheduled_query --params='{"query":"'$(sed 's/"/\\"/g' scheduled_query.sql)'"}'
But again I received:
Too many positional args, still have ['SELECT', 'CURRENT_DATETIME(\\"Europe/Madrid\\")', 'AS', 'timestamp,', 'UNIX_MILLIS(CURRENT_TIMESTAMP())', 'AS', 'unix_miliseconds"}']
The solution may be more related to how to concatenate quoted strings in a bash command. What am I doing wrong? Note that I want to use the bq mk command, not the bq query one.
Try the following approach:
QUERY="$(cat test_sql.sql)"
bq query \
--append_table=true \
--destination_table=elZagales:so_test.timestamp_to_unix \
--display_name='test run' \
--schedule='every 15 minutes' \
--use_legacy_sql=false \
"$QUERY"
With this approach you simply pass the query as a shell variable.
This results in the creation of a scheduled query.
I finally managed to use the bq mk command after applying a couple of transformations to the content of the file:
QUERY=$(tr -s '[:space:]' ' ' < scheduled_query.sql | sed 's/"/\\"/g')
PARAMS='{"query":"'$QUERY'"}'
bq mk --transfer_config --display_name="Example scheduled query" --location=EU --data_source=scheduled_query --schedule='every day 06:45' --params="$PARAMS"
I will explain them:
tr -s '[:space:]' ' ' < scheduled_query.sql : since the query needs to be inserted into a JSON-like object, the newline characters and tabs were causing a Too many positional args error. To solve it, I replaced every run of whitespace characters with a single space. There is most probably a better alternative that keeps the original formatting, but after several days of trying things this worked for me.
sed 's/"/\\"/g' : also, since we are providing the query inside JSON, we need to escape all the double quotes inside the query so that they don't interfere with the field-delimiting ones.
With this I was able to load every query I tried without problems. The only limitation I have found is with line comments: since we strip the newline characters, a line comment in the query will comment out everything after it, so you will have to remove such comments manually or apply another rule that automatically finds and deletes them.
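If you want to automate that, one possible extra step (an assumption on my part: the file only uses standard -- line comments and -- never appears inside a string literal) is to strip them before flattening:
# drop everything from "--" to the end of each line, then flatten and escape as before
QUERY=$(sed 's/--.*$//' scheduled_query.sql | tr -s '[:space:]' ' ' | sed 's/"/\\"/g')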
BONUS:
I used this piece of code to create a scheduled query from GitHub Actions. If you came here for the same reason, you can have a look at the logic I implemented to see if it fits your use case:
QUERY_DISPLAYNAME="Example scheduled query"
QUERY=$(tr -s '[:space:]' ' ' < $GITHUB_WORKSPACE/queries/scheduled_query.sql | sed 's/"/\\"/g')
PARAMS='{"query":"'$QUERY'"}'
RESULT=$(bq ls --format=json --transfer_config --transfer_location=eu | jq -r '.[] | select(.displayName=="'"$QUERY_DISPLAYNAME"'")')
if ! [[ "$RESULT" ]];
then
  bq mk --transfer_config --display_name="$QUERY_DISPLAYNAME" --location=EU --service_account_name=${{ secrets.SA_NAME }} --data_source=scheduled_query --schedule='every day 06:45' --params="$PARAMS"
else
  QUERY_NAME=$(echo $RESULT | jq -r '.name')
  bq update --transfer_config --params="$PARAMS" $QUERY_NAME
fi

Storing results from a Postgres query directly into an array in Bash 4?

I'm trying to run a Postgres query like below and store the results into an array:
v2_ids=$(psql $(credstash get database/path/here) -tc "select distinct(user_id) from table where yada yada yada..." )
read -a arr_ids_temp <<< $v2_ids
Is there a better way to do this? It seems that read -a only grabs the first result sometimes and I'm not sure why.
Refer to the following code:
IFS=$'\n' res=(`psql -tA database username -c 'select column_name from table_name where blah blah...'`)
We must tell bash to split the result into array elements on the '\n' delimiter.
The psql options -tA remove the header and footer and produce unaligned output with no padding blanks.
${res[0]} contains the value of the first row, and each array element holds the value of the corresponding row.
Additional tips:
To read all the columns of a single row into an array:
IFS='|' res=(`psql -tA database username -c 'select name1,name2,name3 from table_name where blah blah...'`)
Here bash is told to split the psql result into array elements on the | delimiter.
That should work fine, but use psql with the options -A (unaligned output mode), -q (suppress informational output) and -t (no column names, headers and footers).
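Since the question mentions Bash 4, here is a sketch using mapfile (readarray), which stores one input line per array element and sidesteps the IFS handling; the connection command and query are the placeholders from the question:
# Bash 4+: mapfile reads one array element per line of input
mapfile -t arr_ids_temp < <(psql $(credstash get database/path/here) -Aqt \
  -c "select distinct(user_id) from table where yada yada yada...")
echo "loaded ${#arr_ids_temp[@]} ids; first is ${arr_ids_temp[0]}"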

Assigning multiple column's value from oracle query to multiple variables in Unix

I need to read the values of 10 columns from an Oracle query and assign them to 10 unix variables. The query will return only one record (row).
I tried appending all the columns with a delimiter (;) in the SELECT and assigning the result to a single variable (V) in unix. I thought I could then use 'cut' to pull the required 10 values out of V.
But some of the columns contain special characters, and it is hard for me to cut out the required details. Sometimes the delimiter (;) itself is present in a column value. The code is also very lengthy.
Is there a better way to assign multiple column values from a query to unix variables?
Also, when I read '-e' from the query and pass it to a unix variable it becomes '?e'. Is there any way to solve this?
Thanks.
There are many ways to do it, depending on what you want to achieve. Here is one particular case with ; as a delimiter. However, you can always use the default space delimiter.
variable=$($ORACLE_HOME/bin/sqlplus -s / as sysdba <<EOF
set head off
set verify off
set feedback off
select sysdate || ';' || systimestamp from dual;
exit;
EOF
)
sysdate=$(echo "$variable" | awk -F ';' '{print $1}')
echo 'col1:'$sysdate
systimestamp=$(echo "$variable" | awk -F ';' '{print $2}')
echo 'col2:'$systimestamp
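For the original goal of 10 columns in 10 variables, a sketch that splits the concatenated row in a single step (the names c1..c10 are made up, and it assumes the ; delimiter never occurs inside a column value):
# the SELECT would concatenate all 10 columns with ';' as in the example above
IFS=';' read -r c1 c2 c3 c4 c5 c6 c7 c8 c9 c10 <<< "$variable"
echo "col1:$c1 col10:$c10"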

Mongo Import csv - How can I import csv data starting from row number `x` into mongodb using mongoimport

I have a csv file whose data I want to import into my mongodb, but I want it to start importing at a specific row number in the csv file.
Right now I'm importing it in the following way:
mongoimport -d dbname -c collection_name --type csv --file filename.csv --headerline
The reason I want to import from a specific row number is that the first few rows are informational and not required in the DB.
SampleFile(2015),,,
,,,
,,,
,,,
,,,
Theme,Category,Topic
Automobile,Auto Brands,Acura
Automobile,Auto Brands,Aston Martin
So I want to point it at the row Theme,Category,Topic. Is this possible, or do I have to manually edit the csv file?
On unix, or with a ported version of tail, you can use tail to skip lines in the file, as mongoimport will accept STDIN as an alternative to --file. You will probably want to set up --fieldFile for the headers as well, since --headerline cannot be used when you are not reading that first line of the file:
tail -n+<linesToSkip> filename.csv | mongoimport -d dbname -c collectionname --type csv --fieldFile headers.txt
Note the + there as that tells tail to "skip to that line"
If you don't want to install anything else on Windows, then use a for loop:
for /f "skip=<linesToSkip> delims=" %i in (filename.csv) do @echo %i | mongoimport -d dbname -c collectionname --type csv --fieldFile headers.txt
In your sample, though, you can just skip the lines up to the header line and still use the --headerline option.
So just pipe the input to STDIN and allow mongoimport to slurp it up.
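For the sample file above that would be (assuming the Theme,Category,Topic header sits on line 6 of filename.csv):
tail -n +6 filename.csv | mongoimport -d dbname -c collection_name --type csv --headerline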

Shell script to find the filename & line count of each file, then insert the records into an Oracle table

I have to find the filenames in a folder along with each file's line count. That gives me two-column data.
Now I have to insert these records into an Oracle table with two columns (col1, col2).
Can I write a shell script which will do both?
I have already found how to write the first part, i.e.:
wc -l *| egrep -v " total$" | awk '{print $2 " " $1}' > output.txt
Now, how do I insert the data in output.txt into the Oracle table?
In version 9i Oracle gave us external tables. These objects allow us to query data in OS files through SELECT statements. This is pretty cool. Even cooler, in 11.1.0.7 we can associate a shell script with an external table to generate its OS file. Check out Adrian Billington's article on listing files with the external table preprocessor in 11g. Your shell script is an ideal candidate for the preprocessor functionality.
If you need to know the contents of the directory right now, for whatever purpose, you can simply SELECT from the external table. If you want to keep a permanent record of the file names you can issue an INSERT INTO ... SELECT * FROM external_table;. This statement could be run automatically using a database job.
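A rough, illustrative sketch of that idea; the directory objects (data_dir, exec_dir), the script name and the location file are assumptions, not something taken from the article:
-- count_lines.sh would print lines such as "File1.txt 10" to stdout
CREATE TABLE file_counts_ext (
  file_name  VARCHAR2(255),
  line_count NUMBER
)
ORGANIZATION EXTERNAL (
  TYPE ORACLE_LOADER
  DEFAULT DIRECTORY data_dir
  ACCESS PARAMETERS (
    RECORDS DELIMITED BY NEWLINE
    PREPROCESSOR exec_dir:'count_lines.sh'
    FIELDS TERMINATED BY WHITESPACE
    (file_name, line_count)
  )
  LOCATION ('dummy.dat')
);

-- permanent copy into the two-column table from the question
INSERT INTO your_table (col1, col2)
SELECT file_name, line_count FROM file_counts_ext;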
You have two choices.
I tested both with this table structure:
SQL> create table tbl_files(fileName varchar(20), lineCount number);
The first choice is to generate a SQL script containing INSERT commands and run it with the sqlplus command-line utility. I modified your shell script a little bit:
wc -l *| egrep -v " total$" | awk '{q=sprintf("%c", 39); print "INSERT INTO TBL_FILES(fileName, lineCount) VALUES (" q$2q ", " $1 ");";}' > sqlplusfile.sql
After running this script, the file "sqlplusfile.sql" has this content:
INSERT INTO TBL_FILES(fileName, lineCount) VALUES ('File1.txt', 10);
INSERT INTO TBL_FILES(fileName, lineCount) VALUES ('File2.txt', 20);
INSERT INTO TBL_FILES(fileName, lineCount) VALUES ('File3.txt', 30);
Now you can run the sqlplus command directly with this file as a parameter:
sqlplus username/password@oracle_database @sqlplusfile.sql
After running this script, the table looks like this:
SQL> select * from tbl_files;
FILENAME LINECOUNT
-------------------- ----------
File1.txt 10
File2.txt 20
File3.txt 30
Note: a "commit;" must be present at the end of "sqlplusfile.sql", otherwise you will not see the data in the table.
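If you generate the file from the same shell script, one simple way to guarantee that (plus an exit so sqlplus returns control to the shell afterwards) is:
echo "commit;" >> sqlplusfile.sql
echo "exit;" >> sqlplusfile.sql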
The second choice is to use the sqlldr command-line tool (this tool is part of the Oracle Client installation).
Again, I modified your script a little bit:
wc -l *| egrep -v " total$" | awk '{print "\""$2"\"" "," $1}' > data.txt
The content of the file "data.txt" looks like this:
"File1.txt",10
"File2.txt",20
"File3.txt",30
In the same directory I created the file "settings.ctl" with this content:
LOAD DATA
INFILE data.txt
INSERT
INTO TABLE TBL_FILES
FIELDS TERMINATED BY ','
OPTIONALLY ENCLOSED BY '"'
(fileName, lineCount)
Now you can run this command, which loads data into the database:
sqlldr userid=user/passwd@oracle_database control=settings.ctl
The sqlldr utility is the better choice, but it is not present in some Oracle Client installations.
You can load text data into Oracle with its command-line tool SQL*Loader. There is too much to describe here about how to use SQL*Loader; start by reading this web page:
http://www.orafaq.com/wiki/SQL*Loader_FAQ
I do not know Oracle, but it appears that the SQL syntax is quite similar to MySQL.
In essence, you would do what you have done here with one minor change.
wc -l *| egrep -v " total$" | awk '{print $2 "|" $1}' > output.txt
You would write a SQL*Loader control file called thesql.sql that would look like the following:
LOAD DATA
INFILE output.txt
INTO TABLE yourtable
FIELDS TERMINATED BY '|'
(col1, col2)
Then, in your script (this is where I am hazy), run SQL*Loader with that control file, with a line like this:
sqlldr <user>/<pw>@<db> control=thesql.sql
I found some help from a couple of different sources - here and here.
Hope this helps!
