Will this kind of bash script work as expected?

#!/bin/bash
MYSQL="/usr/bin/mysql -uroot "

function create_db()
{
    local db_name=${1}
    $MYSQL <<!
create database IF NOT EXISTS ${db_name};
!
}

###-------------------------tb_bind_userid_xxx-------------------------------------
function create_table_userid()
{
    $MYSQL <<!
create table if NOT EXISTS bind_pay.tb_bind_userid_${1}(
    b_userid bigint not null,
    b_membercode bigint not null,
    PRIMARY KEY (b_userid)
) ENGINE=MyISAM DEFAULT CHARSET=utf8;
!
}
Will $MYSQL be persistent across function calls, or will it reconnect each time?
If it reconnects every time, I don't think create_table_userid will work as expected, because it doesn't specify a database name yet.

Well, because you call the function each time you want to create a table, you call mysql to connect to the database each time. If you want a persistent connection, one way is to use a MySQL library, which most major programming languages (Perl, Python, Ruby, PHP, etc.) support these days. You make a DB connection, do whatever work you want, then finally close the connection. For example, from the Python/MySQL documentation:
import MySQLdb

# one connection is opened, reused for the query, then closed
conn = MySQLdb.connect(host = "localhost",
                       user = "testuser",
                       passwd = "testpass",
                       db = "test")
cursor = conn.cursor()
cursor.execute("SELECT VERSION()")
row = cursor.fetchone()
print "server version:", row[0]
cursor.close()
conn.close()
As you can see, a connection conn is opened to the database. Using that connection handle (or rather, database handle), the work is done, and finally the connection is closed.
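Applied to the original script, a rough sketch (untested; the _001 suffix just stands in for the ${1} argument, and it assumes the same passwordless root login) could look like this:
import MySQLdb

# One connection is opened and reused for both statements, so there is no
# reconnect between creating the database and creating the table.
conn = MySQLdb.connect(host="localhost", user="root")
cursor = conn.cursor()
cursor.execute("CREATE DATABASE IF NOT EXISTS bind_pay")
cursor.execute("""
    CREATE TABLE IF NOT EXISTS bind_pay.tb_bind_userid_001 (
        b_userid     BIGINT NOT NULL,
        b_membercode BIGINT NOT NULL,
        PRIMARY KEY (b_userid)
    ) ENGINE=MyISAM DEFAULT CHARSET=utf8
""")   # _001 is just an example value for the table-name suffix
cursor.close()
conn.close()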

$MYSQL is just a variable, so your code runs mysql each time it calls one of these functions.
You can create a persistent connection to mysql easily enough: just have your functions write their SQL to standard output, then pipe the whole result into a single mysql process:
create_table() {
    cat <<!
create table $1 ....
!
}

(
create_table "foo"
create_table "bar"
) | mysql

Related

Very slow connection to Snowflake from Databricks

I am trying to connect to Snowflake using R in Databricks. My connection works and I can run queries and retrieve data successfully; however, my problem is that it can take more than 25 minutes simply to connect, although once connected all my queries are quick thereafter.
I am using the sparklyr function 'spark_read_source', which looks like this:
query <- spark_read_source(
  sc = sc,
  name = "query_tbl",
  memory = FALSE,
  overwrite = TRUE,
  source = "snowflake",
  options = append(sf_options, client_Q)
)
where 'sf_options' is a list of connection parameters that looks similar to this:
sf_options <- list(
  sfUrl = "https://<my_account>.snowflakecomputing.com",
  sfUser = "<my_user>",
  sfPassword = "<my_pass>",
  sfDatabase = "<my_database>",
  sfSchema = "<my_schema>",
  sfWarehouse = "<my_warehouse>",
  sfRole = "<my_role>"
)
and my query is a string appended to the 'options' argument, e.g.
client_Q <- 'SELECT * FROM <my_database>.<my_schema>.<my_table>'
I can't understand why it is taking so long; if I run the same query from RStudio using a local Spark instance and 'dbGetQuery', it is instant.
Is spark_read_source the problem? Is it an issue between Snowflake and Databricks? Or something else? Any help would be great. Thanks.

Airflow retain the same database connection?

I'm using Airflow for some ETL things, and in some stages I would like to use temporary tables (mostly to keep the code and data objects self-contained and to avoid using a lot of metadata tables).
Using the Postgres connection in Airflow and the PostgresOperator, the behaviour I found was: for each execution of a PostgresOperator we get a new connection (or session, if you prefer) in the database. In other words, we lose all temporary objects created by the previous component of the DAG.
To emulate a simple example, I use this code (do not run it, just look at the objects):
import os
from datetime import datetime, timedelta
from airflow import DAG
from airflow.operators.postgres_operator import PostgresOperator
default_args = {
    'owner': 'airflow'
    ,'depends_on_past': False
    ,'start_date': datetime(2018, 6, 13)
    ,'retries': 3
    ,'retry_delay': timedelta(minutes=5)
}
dag = DAG(
    'refresh_views'
    , default_args=default_args)
# Create database workflow
drop_exist_temporary_view = "DROP TABLE IF EXISTS temporary_table_to_be_used;"
create_temporary_view = """
CREATE TEMPORARY TABLE temporary_table_to_be_used AS
SELECT relname AS views
,CASE WHEN relispopulated = 'true' THEN 1 ELSE 0 END AS relispopulated
,CAST(reltuples AS INT) AS reltuples
FROM pg_class
WHERE relname = 'some_view'
ORDER BY reltuples ASC;"""
use_temporary_view = """
DO $$
DECLARE
is_correct integer := (SELECT relispopulated FROM temporary_table_to_be_used WHERE views LIKE '%<<some_name>>%');
BEGIN
start_time := clock_timestamp();
IF is_materialized = 0 THEN
EXECUTE 'REFRESH MATERIALIZED VIEW ' || view_to_refresh || ' WITH DATA;';
ELSE
EXECUTE 'REFRESH MATERIALIZED VIEW CONCURRENTLY ' || view_to_refresh || ' WITH DATA;';
END IF;
END;
$$ LANGUAGE plpgsql;
"""
# Objects to be executed
drop_exist_temporary_view = PostgresOperator(
    task_id='drop_exist_temporary_view',
    sql=drop_exist_temporary_view,
    postgres_conn_id='dwh_staging',
    dag=dag)
create_temporary_view = PostgresOperator(
    task_id='create_temporary_view',
    sql=create_temporary_view,
    postgres_conn_id='dwh_staging',
    dag=dag)
use_temporary_view = PostgresOperator(
    task_id='use_temporary_view',
    sql=use_temporary_view,
    postgres_conn_id='dwh_staging',
    dag=dag)
# Data workflow
drop_exist_temporary_view >> create_temporary_view >> use_temporary_view
At the end of execution, I receive the following message:
[2018-06-14 15:26:44,807] {base_task_runner.py:95} INFO - Subtask: psycopg2.ProgrammingError: relation "temporary_table_to_be_used" does not exist
Does anyone know if Airflow has some way to retain the same connection to the database? I think it could save a lot of work in creating/maintaining several objects in the database.
You can retain the connection to the database by building a custom operator that leverages the PostgresHook to keep a connection to the DB open while you perform a set of SQL operations.
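As a rough illustration only (untested; MultiStatementPostgresOperator and its arguments are invented for this answer, and the import paths follow the Airflow 1.x style used in the question), such an operator might look something like this:
from airflow.hooks.postgres_hook import PostgresHook
from airflow.models import BaseOperator
from airflow.utils.decorators import apply_defaults

class MultiStatementPostgresOperator(BaseOperator):
    # Runs a list of SQL statements on ONE connection, so temporary tables
    # created by an earlier statement are still visible to the later ones.
    @apply_defaults
    def __init__(self, sql_statements, postgres_conn_id='postgres_default', *args, **kwargs):
        super(MultiStatementPostgresOperator, self).__init__(*args, **kwargs)
        self.sql_statements = sql_statements
        self.postgres_conn_id = postgres_conn_id

    def execute(self, context):
        hook = PostgresHook(postgres_conn_id=self.postgres_conn_id)
        conn = hook.get_conn()          # single session for all statements
        try:
            cur = conn.cursor()
            for statement in self.sql_statements:
                cur.execute(statement)
            conn.commit()
        finally:
            conn.close()
With something like this you would pass the three SQL strings above as one sql_statements list to a single task, instead of three separate PostgresOperator tasks.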
You may find some examples in contrib on incubator-airflow or in Airflow-Plugins.
Another option is to persist this temporary data to XComs. That keeps the data associated with the task that created it, which may also help with troubleshooting down the road.
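As a sketch (untested; the task names and the returned rows are hypothetical, and it reuses the dag object defined above), pushing and pulling a small result through XCom with PythonOperator looks roughly like this:
from airflow.operators.python_operator import PythonOperator

def compute_views(**context):
    # whatever the callable returns is pushed to XCom automatically
    return [("some_view", 1)]   # hypothetical (view name, relispopulated) rows

def refresh_views(**context):
    rows = context['ti'].xcom_pull(task_ids='compute_views')
    for view_name, is_populated in rows:
        pass  # decide here whether to REFRESH ... CONCURRENTLY or not

compute = PythonOperator(task_id='compute_views', python_callable=compute_views,
                         provide_context=True, dag=dag)
refresh = PythonOperator(task_id='refresh_views', python_callable=refresh_views,
                         provide_context=True, dag=dag)
compute >> refresh
Keep in mind that XCom is intended for small pieces of metadata, not for bulk data.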

Create Postgresql database in Heroku with Ruby (without Rails)

I'm currently hosting a simple Ruby script that stores URLs and scores and saves them to YAML. However, I'd like to save to a PostgreSQL database instead, since the YAML file is deleted every time I restart the app. Here's the error I'm getting in Heroku:
could not connect to server: No such file or directory (PG::ConnectionBad)
Here's an example script that works locally but throws the above error on Heroku:
require 'pg'

conn = PG.connect( dbname: 'template1' )
res1 = conn.exec('SELECT * from pg_database where datname = $1', ['words'])
if res1.ntuples == 1 # db exists
  # do nothing
else
  conn.exec('CREATE DATABASE words')
  words_conn = PGconn.connect( :dbname => 'words')
  words_conn.exec("create table top (url varchar, score integer);")
  words_conn.exec("INSERT INTO top (url, score) VALUES ('http://apple.com', 1);")
end
Thanks in advance for any help or suggestions!
Assuming you have created a Postgres database using the Heroku toolchain via heroku addons:add heroku-postgresql:dev (or the plan of your choice), you should have a DATABASE_URL environment variable that contains your connection string. You can check that locally through heroku pg:config.
Using the pg gem (docs: http://deveiate.org/code/pg/PG/Connection.html), and modifying the example from there to suit:
require 'pg'
# source the connection string from the DATABASE_URL environmental variable
conn = PG::Connection.new(ENV['DATABASE_URL'])
res = conn.exec("create table top (url varchar, score integer);")
Update: A slightly more complete example for the purposes of error handling:
conn = PG::Connection.new(ENV['TEST_DATABASE_URL'])
begin
  # Ensures the table is created if it doesn't exist
  res = conn.exec("CREATE TABLE IF NOT EXISTS top (url varchar, score integer);")
  res.result_status
rescue PG::Error => pg_error
  puts "Table creation failed: #{pg_error.message}"
end

Perl connectivity issue for newbie

Friends, could one of the Perl experts tell me what I'm doing wrong here?
I'm still learning Perl, so I'm a newbie at this. Whatever I do, my connection string doesn't work.
I'm trying to connect to an Oracle database with a Perl script, using the argument below on the command prompt:
$ list_tables /@testdb
It should query dba_tables and list the tables of user ABC, and also write the output to a logfile.
#!/usr/local/bin/perl -w
use strict;
use Getopt::Std;
use OracleAgent;
use OracleLoginString;

my $exitStatus = 0;
my %options = ();
my $oracleLogin;
getopts("o",\%options);
if (defined $options{o}) {
    $oracleLogin = $options{o};
}
else {
    exitWithError();
}
my $db = DBI->connect('dbi:Oracle:',$oracleLogin,'')
    or die "Can't connect to Oracle database: $DBI::errstr\n";
exit($exitStatus);
Basically, when I execute the script I just want to provide the instance name and not a password.
I can connect from the sqlplus prompt without a password since I'm using an OS-authenticated Oracle login, e.g. $ sqlplus "/@testdb"
Add DBD::Oracle:
use DBD::Oracle;
and write a proper connection string:
my $db = DBI->connect("dbi:Oracle:host=$host;sid=$sid", $user, $passwd);

I'd like to scrape the iTunes top X RSS feed and insert it into a DB

Preferably I'd like to do so with some bash shell scripting, maybe some PHP or Perl, and a MySQL DB. Thoughts?
Here is a solution using Perl, with the help of (of course!) a bunch of modules.
It uses SQLite so you can run it easily (the definition of the (simplistic) DB is at the end of the script). Also it uses Perl hashes and simple SQL statements, instead of proper objects and an ORM layer. I found it easier to parse the XML directly instead of using an RSS module (I tried XML::Feed), because you need access to specific tags (name, preview...).
You can use it as a basis to add more features, more fields in the DB, a table for genre... but at least this way you have a basis that you can expand on (and maybe you can then publish the result as open-source).
#!/usr/bin/perl
use strict;
use warnings;
use XML::Twig; # to parse the RSS
use DBIx::Simple; # DB interaction made easy
use Getopt::Std; # always need options for a script
use PerlIO::gzip; # itunes sends a gzip-ed file
use LWP::Simple 'getstore'; # to get the RSS
my %opt;
getopts( 'vc:', \%opt);
# could also be an option, but I guess it won't change that much
my @URLs= (
'http://ax.itunes.apple.com/WebObjects/MZStoreServices.woa/ws/RSS/topsongs/limit=10/xml',
);
# during debug, it's nice to use a cache of the feed instead of hitting it every single run
if( $opt{c}) { @URLs= ($opt{c}); }
# I like using SQLite when developing,
# replace with MySQL connect parameters if needed (see DBD::mysql for the exact syntax)
my @connect= ("dbi:SQLite:dbname=itunes.db","","", { RaiseError => 1, AutoCommit => 0 }) ;
my $NS_PREFIX='im';
# a global, could be passed around, but would make the code a bit more verbose
my $db = DBIx::Simple->connect(@connect) or die "cannot connect to DB: $DBI::errstr";
foreach my $url (@URLs)
{ add_feed( $url); }
$db->disconnect;
warn "done\n" if( $opt{v});
sub add_feed
{ my( $url)= @_;
# itunes sends gziped RSS, so we need to unzip it
my $tempfile= "$0.rss.gz"; # very crude, should use File::Temp instead
getstore($url, $tempfile);
open( my $in_feed, '<:gzip', $tempfile) or die " cannot open tempfile: $!";
XML::Twig->new( twig_handlers => { 'feed/title' => sub { warn "adding feed ", $_->text if $opt{v}; },
entry => \&entry,
},
map_xmlns => { 'http://phobos.apple.com/rss' => $NS_PREFIX },
)
->parse( $in_feed);
close $in_feed;
}
sub entry
{ my( $t, $entry)= @_;
# get the data
my %song= map { $_ => $entry->field( "$NS_PREFIX:$_") } qw( name artist price);
if( my $preview= $entry->first_child( 'link[@title="Preview"]') )
{ $song{preview}= $preview->att( 'href'); }
# $db->begin_work;
# store it
if( ($db->query( 'SELECT count(*) FROM song WHERE name=?', $song{name})->flat)[0])
{ warn " skipping $song{name}, already stored\n" if $opt{v};
}
else
{
warn " adding $song{name}\n" if $opt{v};
if( my $artist_id= ($db->query( 'SELECT id from ARTIST where name=?', $song{artist})->flat)[0])
{ warn " existing artist $song{name} ($artist_id)\n" if $opt{v};
$song{artist}= $artist_id;
}
else
{ warn " creating new artist $song{artist}\n" if $opt{v};
$db->query( 'INSERT INTO artist (name) VALUES (??)', $song{artist});
# should be $db->last_insert_id but that's not available in DBD::SQLite at the moment
$song{artist}= $db->func('last_insert_rowid');
}
$db->query( 'INSERT INTO song ( name, artist, price, preview) VALUES (??)',
@song{qw( name artist price preview)});
$db->commit;
}
$t->purge; # keeps memory usage lower, probably not needed for small RSS files
}
__END__
=head1 NAME
itunes2db - loads itunes RSS feeds to a DB
=head1 OPTIONS
-c <file> uses a cache instead of the list of URLs
-v verbose
=head1 DB schema
create table song ( id INT PRIMARY KEY, name TEXT, artist INT, price TEXT, preview TEXT);
create table artist (id INT PRIMARY KEY, name TEXT);
From what I can tell it's not actively maintained, but Scriptella could be of some assistance. It uses very simple XML scripts and runs on Java.
Example of how to suck RSS into a database:
<!DOCTYPE etl SYSTEM "http://scriptella.javaforge.com/dtd/etl.dtd">
<etl>
<connection id="in" driver="xpath" url="http://snippets.dzone.com/rss"/>
<connection id="out" driver="text" url="rss.txt"/>
<connection id="db" driver="hsqldb" url="jdbc:hsqldb:db/rss" user="sa" classpath="hsqldb.jar"/>
<script connection-id="db">
CREATE TABLE Rss (
ID Integer,
Title VARCHAR(255),
Description VARCHAR(255),
Link VARCHAR(255)
)
</script>
<query connection-id="in">
/rss/channel/item
<script connection-id="out">
Title: $title
Description: [
${description.substring(0, 20)}...
]
Link: $link
----------------------------------
</script>
<script connection-id="db">
INSERT INTO Rss (ID, Title, Description, Link)
VALUES (?rownum, ?title, ?description, ?link);
</script>
</query>
</etl>
Well, I'm not really sure what sort of answer you're looking for, but I don't think you need any shell scripting. Both PHP and Perl are perfectly capable of downloading the RSS feed and inserting the data into MySQL. Set the PHP or Perl script up to run every X hours/days/whatever with a cron job and you'd be done.
There's not much else to tell you, given how vague the question is.
I'm scraping Stack Overflow's feed to perform some additional filtering, using PHP's DOMDocument and then DOM methods to access what I want. I'd suggest looking into that.
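If you end up preferring Python over PHP or Perl, a minimal sketch of the same fetch-parse-insert flow (untested, Python 3, standard library only; it assumes a plain RSS 2.0 feed such as the one in the Scriptella example and writes to a hypothetical local SQLite file) could look like this:
import sqlite3
import urllib.request
import xml.etree.ElementTree as ET

FEED_URL = "http://snippets.dzone.com/rss"   # the feed used in the Scriptella example

def main():
    db = sqlite3.connect("rss.db")           # hypothetical local SQLite file
    db.execute("CREATE TABLE IF NOT EXISTS rss (title TEXT, description TEXT, link TEXT)")

    with urllib.request.urlopen(FEED_URL) as response:
        tree = ET.parse(response)

    # walk /rss/channel/item, as the Scriptella query above does
    for item in tree.findall("./channel/item"):
        row = tuple(item.findtext(tag, default="") for tag in ("title", "description", "link"))
        db.execute("INSERT INTO rss (title, description, link) VALUES (?, ?, ?)", row)

    db.commit()
    db.close()

if __name__ == "__main__":
    main()
Run from cron, this gives you the same download-every-X-hours setup suggested above, just with SQLite standing in for MySQL.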
