How to upload Query Result from Snowflake to S3 Directly? - jdbc

I have a query interface where users write a SQL query and get results; the warehouse we query is Snowflake. We use the Snowflake JDBC driver to establish a connection, asynchronously queue the query, get a query ID (a UUID) back from Snowflake, and then use that query ID to check the status and fetch the result.
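The submit / poll / fetch part looks roughly like this (a rough sketch based on the Snowflake JDBC driver's asynchronous query API; exact class and method names may vary by driver version); the sample code further below then serializes the fetched result:
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.Statement;
import net.snowflake.client.core.QueryStatus;
import net.snowflake.client.jdbc.SnowflakeConnection;
import net.snowflake.client.jdbc.SnowflakeResultSet;
import net.snowflake.client.jdbc.SnowflakeStatement;

public class AsyncQueryFlow {
    // Queue the query without blocking and remember its query ID.
    public static String submit(Connection conn, String sql) throws Exception {
        Statement stmt = conn.createStatement();
        ResultSet rs = stmt.unwrap(SnowflakeStatement.class).executeAsyncQuery(sql);
        return rs.unwrap(SnowflakeResultSet.class).getQueryID();
    }

    // Re-attach to the query by ID later and wait until it is no longer running.
    public static ResultSet fetch(Connection conn, String queryId) throws Exception {
        ResultSet rs = conn.unwrap(SnowflakeConnection.class).createResultSet(queryId);
        QueryStatus status = rs.unwrap(SnowflakeResultSet.class).getStatus();
        while (status == QueryStatus.RUNNING || status == QueryStatus.RESUMING_WAREHOUSE) {
            Thread.sleep(1000);
            status = rs.unwrap(SnowflakeResultSet.class).getStatus();
        }
        return rs;
    }
}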
Sample Code:
try {
    ResultSetMetaData resultSetMetaData = resultSet.getMetaData();
    int numColumns = resultSetMetaData.getColumnCount();
    // Column metadata
    for (int i = 1; i <= numColumns; i++) {
        arrayNode.add(objectMapper.createObjectNode()
                .put("name", resultSetMetaData.getColumnName(i))
                .put("attribute_number", i)
                .put("data_type", resultSetMetaData.getColumnTypeName(i))
                .put("type_modifier", (Short) null)
                .put("scale", resultSetMetaData.getScale(i))
                .put("precision", resultSetMetaData.getPrecision(i)));
    }
    rootNode.set("metadata", arrayNode);
    // Row data
    arrayNode = objectMapper.createArrayNode();
    while (resultSet.next()) {
        ObjectNode resultObjectNode = objectMapper.createObjectNode();
        for (int i = 1; i <= numColumns; i++) {
            String columnName = resultSetMetaData.getColumnName(i);
            resultObjectNode.put(columnName, resultSet.getString(i));
        }
        arrayNode.add(resultObjectNode);
    }
    rootNode.set("results", arrayNode);
    // TODO: Instead of returning the entire result string, send it in chunks to the S3 utility class for upload
    resultSet.close();
    jsonString = objectMapper.writeValueAsString(rootNode);
}
As you can see, our use case requires sending the metadata (column details) along with the result. The result set is then uploaded to S3, and users are given an S3 link to view the results.
I am trying to figure out whether this can be handled in Snowflake itself, i.e. whether Snowflake can generate the metadata for the query and upload the result set to a user-defined bucket, so that consumers of Snowflake don't have to do this themselves. I have read about Snowflake Streams and COPY from stages. Can someone help me understand whether this is feasible and, if so, how it can be achieved?
Is there any way to upload the result of a query to S3 directly, using the query ID from Snowflake, without fetching it first and uploading it myself?

You can store the results in an S3 bucket using the COPY command. This is a simplified example showing the process on a temporary internal stage. For your use case, you would create and use an external stage in S3:
create temp stage FOO;
select * from "SNOWFLAKE_SAMPLE_DATA"."TPCH_SF1"."NATION";
copy into @FOO from (select * from table(result_scan(last_query_id())));
The reason to use COPY from a previous SELECT is that the COPY command is somewhat limited in what it allows in its inner query. By running the query as a regular SELECT first and then selecting from that result with RESULT_SCAN, you get around those limitations.
The COPY command supports other file formats as well. As written, it will use the default CSV format; you can also specify JSON, Parquet, or a custom delimited format through the FILE_FORMAT copy option or a named file format.
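For the external-stage version, a rough sketch of what this could look like from the JDBC side, reusing the query ID you already have, is shown below (the stage name, bucket, credentials, and path prefix are placeholders, not working values):
import java.sql.Connection;
import java.sql.Statement;

public class UnloadToS3 {
    // Unloads the cached result of an already-executed query to an external S3 stage.
    public static void unload(Connection conn, String queryId) throws Exception {
        try (Statement stmt = conn.createStatement()) {
            // One-time setup: an external stage pointing at your own bucket.
            stmt.execute("create stage if not exists MY_S3_STAGE "
                    + "url = 's3://my-bucket/query-results/' "
                    + "credentials = (aws_key_id = '...' aws_secret_key = '...')");
            // RESULT_SCAN accepts the query ID directly, so the result does not need to be
            // re-fetched; the query ID is a Snowflake-generated UUID, not user input.
            stmt.execute("copy into @MY_S3_STAGE/" + queryId + "/ "
                    + "from (select * from table(result_scan('" + queryId + "'))) "
                    + "file_format = (type = json)");
        }
    }
}
With type = json each unloaded row is written as an object keyed by column name, which covers part of the metadata requirement; column types would still need to be written separately (for example from the JDBC ResultSetMetaData as in the question).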
https://docs.snowflake.com/en/sql-reference/sql/copy-into-location.html

Related

How to get the count of documents for the object store using FileNet APIs

I have more than a million documents in the object store, and I want to know the count of documents for a specific time period. How can I get the count using the FileNet CE APIs?
The code I use is below, but it gives me only a maximum of 200 documents.
--Code
SearchScope scope = new SearchScope(obj);
SearchSQL sql = new SearchSQL();
sql.setMaxRecords(100000);
String query = "select * from document where datecreated > (date)";
sql.setQueryString(query); // attach the query text to the SearchSQL object
RepositoryRowSet res = scope.fetchRows(sql, 1000, null, null);
int count = 0;
PageIterator p = res.pageIterator();
while (p.nextPage()) {
    count += p.getElementCount();
}
It is possible to use the COUNT() function in background search queries:
select COUNT(Id) from Document
Link to SQL syntax for background search query
Working with background search queries via API
Alternatively, you can use a direct database connection and get the document count from the DocVersion table, using the documented database table schema.
Table schema - DocVersion
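If you go the direct-database route, a minimal sketch could look like the following. It assumes a plain JDBC connection to the object store database; the DocVersion table and its create_date column are taken from the documented schema, but verify the exact names for your release before relying on this:
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.Timestamp;

public class DocCount {
    // Counts document versions created after the given timestamp.
    public static long countCreatedSince(Connection conn, Timestamp since) throws Exception {
        String sql = "SELECT COUNT(*) FROM DocVersion WHERE create_date > ?";
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            ps.setTimestamp(1, since);
            try (ResultSet rs = ps.executeQuery()) {
                rs.next();
                return rs.getLong(1);
            }
        }
    }
}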

Unable to delete large data on parse.com

I am facing a problem deleting large amounts of data from parse.com.
First I filtered the data using a filter, but it displays at most 100 rows, so I have to select those 100 rows, delete them, and then select and delete the next 100.
Is there any way I can delete all data matching the filter, something like
DELETE FROM Tablename WHERE fieldname LIKE '%foo%'
Or is it possible to execute a query on parse.com? Or is there a way to delete it using a shell script and Parse somehow (any package might help me)?
If you want to do this programmatically, you can create a query to get all the matching objects and then delete them. Here is an example using Swift for iOS (whereKey:containsString: gives the substring matching that LIKE '%foo%' would):
var query = PFQuery(className: TABLENAME)
// containsString matches substrings, similar to LIKE '%foo%'
query.whereKey(fieldname, containsString: "foo")
query.findObjectsInBackgroundWithBlock({ (objects: [AnyObject]!, error: NSError!) -> Void in
    for object in objects {
        object.deleteInBackground()
    }
})
The documentation for parse in any of its supported languages can be found here: https://parse.com/docs/
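If you are not on iOS, the same idea in Java with the Parse Android SDK might look roughly like this, batching the deletes and paging until nothing matches (the class and field names are placeholders; parse.com caps a single query at 1000 results):
import com.parse.ParseException;
import com.parse.ParseObject;
import com.parse.ParseQuery;
import java.util.List;

public class BulkDelete {
    public static void deleteMatching(String className, String fieldName, String substring)
            throws ParseException {
        while (true) {
            ParseQuery<ParseObject> query = ParseQuery.getQuery(className);
            query.whereContains(fieldName, substring); // substring match, like LIKE '%foo%'
            query.setLimit(1000);                      // maximum page size per query
            List<ParseObject> batch = query.find();
            if (batch.isEmpty()) {
                break;
            }
            ParseObject.deleteAll(batch);              // one round trip per batch
        }
    }
}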

Selectively loading iis log files into Hive

I am just getting started with Hadoop/Pig/Hive on the cloudera platform and have questions on how to effectively load data for querying.
I currently have ~50GB of iis logs loaded into hdfs with the following directory structure:
/user/oi/raw_iis/Webserver1/Org/SubOrg/W3SVC1056242793/
/user/oi/raw_iis/Webserver2/Org/SubOrg/W3SVC1888303555/
/user/oi/raw_iis/Webserver3/Org/SubOrg/W3SVC1056245683/
etc
I would like to load all the logs into a Hive table.
I have two issues/questions:
1.
My first issue is that some of the webservers may not have been configured correctly and will have iis logs without all columns. These incorrect logs need additional processing to map the available columns in the log to the schema that contains all columns.
The data is space delimited; the issue is that when not all columns are enabled, the log only includes the enabled columns. Hive can't automatically insert nulls, since the data does not include the columns that are empty. I need to be able to map the available columns in the log to the full schema.
Example good log:
#Fields: date time s-ip cs-method cs-uri-stem useragent
2013-07-16 00:00:00 10.1.15.8 GET /common/viewFile/1232 Mozilla/5.0+AppleWebKit/537.36+(KHTML,+like+Gecko)+Chrome/27.0.1453.116+Safari/537.36
Example log with missing columns (cs-method and useragent):
#Fields: date time s-ip cs-uri-stem
2013-07-16 00:00:00 10.1.15.8 /common/viewFile/1232
The log with missing columns needs to be mapped to the full schema like this:
#Fields: date time s-ip cs-method cs-uri-stem useragent
2013-07-16 00:00:00 10.1.15.8 null /common/viewFile/1232 null
How can I map these enabled fields to a schema that includes all possible columns, inserting blank/null/- token for fields that were missing? Is this something I could handle with a Pig script?
2.
How can I define my Hive tables to include information from the hdfs path, namely Org and SubOrg in my dir structure example so that it is query-able in Hive? I am also unsure how to properly import data from the many directories into a single hive table.
First, providing sample data makes it easier to help.
How can I map these enabled fields to a schema that includes all possible columns, inserting blank/null/- token for fields that were missing?
If the file has a delimiter you can use Hive, and Hive automatically inserts nulls wherever data is missing, provided the delimiter does not also appear inside your data.
Is this something I could handle with a Pig script?
If the fields are delimited you can use Hive; otherwise go for MapReduce/Pig.
How can I include information from the hdfs path, namely Org and SubOrg in my dir structure example so that it is query-able in Hive?
It seems you are new to Hive; before querying, you have to create a table that includes information such as the path, the delimiter, and the schema (a sketch of such a table definition follows below).
Is this a good candidate for partitioning?
You can partition on date if you wish.
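A minimal sketch of that kind of table definition, issued over the HiveServer2 JDBC driver; the column list, partition keys, connection URL, and paths are illustrative only and would need to be adapted to the full IIS schema:
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class CreateIisLogTable {
    public static void main(String[] args) throws Exception {
        // Hypothetical HiveServer2 connection details.
        try (Connection conn = DriverManager.getConnection("jdbc:hive2://localhost:10000/default");
             Statement stmt = conn.createStatement()) {
            // Space-delimited external table over the raw logs, partitioned so Org/SubOrg are queryable.
            stmt.execute("CREATE EXTERNAL TABLE IF NOT EXISTS iis_logs ("
                    + " log_date STRING, log_time STRING, s_ip STRING,"
                    + " cs_method STRING, cs_uri_stem STRING, useragent STRING)"
                    + " PARTITIONED BY (org STRING, suborg STRING)"
                    + " ROW FORMAT DELIMITED FIELDS TERMINATED BY ' '"
                    + " STORED AS TEXTFILE");
            // Each directory becomes a partition, so org/suborg come from the path, not the file.
            stmt.execute("ALTER TABLE iis_logs ADD IF NOT EXISTS PARTITION (org='Org', suborg='SubOrg')"
                    + " LOCATION '/user/oi/raw_iis/Webserver1/Org/SubOrg/W3SVC1056242793/'");
        }
    }
}
Defining org and suborg as partition columns means their values come from the directory layout rather than the file contents, which also addresses the question of making them queryable.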
I was able to solve both of my issues with a Pig UDF (user-defined function).
Mapping columns to proper schema: See this answer and this one.
All I really had to do was add some logic to handle the IIS headers that start with #. Below are the snippets from getNext() that I used; everything else is the same as mr2ert's example code.
See the values[0].equals("#Fields:") parts.
@Override
public Tuple getNext() throws IOException {
    ...
    Tuple t = mTupleFactory.newTuple(1);
    // ignore header lines except the field definitions
    if (values[0].startsWith("#") && !values[0].equals("#Fields:")) {
        return t;
    }
    ArrayList<String> tf = new ArrayList<String>();
    int pos = 0;
    for (int i = 0; i < values.length; i++) {
        if (fieldHeaders == null || values[0].equals("#Fields:")) {
            // grab field headers, ignoring the #Fields: token at values[0]
            if (i > 0) {
                tf.add(values[i]);
            }
            fieldHeaders = tf;
        } else {
            readField(values[i], pos);
            pos = pos + 1;
        }
    }
    ...
}
To include information from the file path, I added the following to the LoadFunc UDF I used to solve issue 1. In the prepareToRead override, grab the file path and store it in a member variable.
public class IISLoader extends LoadFunc {
    ...
    @Override
    public void prepareToRead(RecordReader reader, PigSplit split) {
        in = reader;
        filePath = ((FileSplit) split.getWrappedSplit()).getPath().toString();
    }
Then within getNext() I could add the path to the output tuple.
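A sketch of that last step, assuming the /user/oi/raw_iis/<server>/<Org>/<SubOrg>/... layout from the question (the helper method is mine, not part of the original UDF):
// Helper added to IISLoader: pulls Org and SubOrg out of the stored file path.
static String[] orgAndSubOrg(String filePath) {
    // The path may be prefixed with a scheme such as hdfs://host:port, so anchor on "raw_iis":
    // .../raw_iis/<server>/<Org>/<SubOrg>/W3SVC.../<logfile>
    String[] parts = filePath.split("/");
    for (int i = 0; i < parts.length - 3; i++) {
        if ("raw_iis".equals(parts[i])) {
            return new String[] { parts[i + 2], parts[i + 3] };
        }
    }
    return new String[] { null, null };
}
In getNext(), append the two returned values to the tuple (t.append(...)) before returning it.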

how to export and import BLOB data type in oracle

How can I export and import BLOB data in Oracle using any tool? I want to include that as part of a release.
Answering since this has a decent view count even though it is a five-year-old question.
Since this question was asked five years ago, there is a new tool named SQLcl (http://www.oracle.com/technetwork/developer-tools/sqlcl/overview/index.html).
We factored the scripting engine out of SQL Developer into the command line. SQL Developer and SQLcl are based on Java, which allows use of the Nashorn/JavaScript engine for client scripting. Here's a short example that selects three columns: ID is just the table PK, NAME is the name of the file to create, and CONTENT is the BLOB to extract from the database.
The script command triggers this scripting. I placed the code below into a file named blob2file.sql.
All this adds up to zero PL/SQL and zero directory objects; it's just a SQL script with JavaScript mixed in.
script
// issue the sql
// bind if needed, but not in this case
var binds = {};
var ret = util.executeReturnList('select id,name,content from images', binds);
// loop the results
for (i = 0; i < ret.length; i++) {
    // debug messages
    ctx.write(ret[i].ID + "\t" + ret[i].NAME + "\n");
    // get the blob stream
    var blobStream = ret[i].CONTENT.getBinaryStream(1);
    // get the path/file handle to write to
    // replace as needed to write the file to another location
    var path = java.nio.file.FileSystems.getDefault().getPath(ret[i].NAME);
    // dump the file stream to the file
    java.nio.file.Files.copy(blobStream, path);
}
/
The result is my table dumped out into files (I only had one row). Just run it like any plain SQL script.
SQL> #blob2file.sql
1 eclipse.png
blob2file.sql eclipse.png
SQL>
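For the import direction (not covered by the SQLcl example above), one option is plain JDBC with a streamed bind; this is a sketch reusing the same hypothetical IMAGES table and columns:
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.sql.Connection;
import java.sql.PreparedStatement;

public class FileToBlob {
    // Loads a file from disk back into the BLOB column.
    public static void load(Connection conn, int id, String fileName) throws Exception {
        String sql = "insert into images (id, name, content) values (?, ?, ?)";
        try (InputStream in = Files.newInputStream(Paths.get(fileName));
             PreparedStatement ps = conn.prepareStatement(sql)) {
            ps.setInt(1, id);
            ps.setString(2, fileName);
            ps.setBinaryStream(3, in); // streams the file contents into the BLOB
            ps.executeUpdate();
        }
    }
}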

How to use variable mapping while using Oracle OLE DB provider in SSIS?

How to use variable mapping while using Oracle OLE DB provider? I have done the following:
Execute SQL Task: Full result set to hold results of the query.
Foreach ADO Enumerator: ADO object source above variable (Object data type).
Variable Mapping: 1 field.
The variable is set up with Evaluate as Expression = True.
Data Flow: SQL Command from variable, as SELECT columnName FROM table where columnName = ?
Basically, what I am trying to do is use the results of a query against a SQL Server table (i.e. account numbers) and pull the records from Oracle that reference the results of that SQL query.
It feels like you're mixing approaches. The parameterization ? is a placeholder for a variable which, in an OLE DB Source component, you'd map by clicking the Parameters button.
However, since you're using SQL Command from Variable, you can't use the parameterization option, probably because the risk of a user changing the shape of the result set via Expressions is too high.
So, pick one: either "SQL Command" with proper parameterization, or "SQL Command from Variable" where you build the parameters in by string concatenation, as in Dynamically assign value to variable in SSIS. SQL Server 2005/2008/2008R2 people, be aware that you are limited to 4,000 characters in a string variable that uses Expressions.
Based on the comment of "Basically what I am trying to do is use the results of a query from a SQL Server table, (ie ..account numbers) and pull records from Oracle reference the results from the SQL query"
There are two ways of going about this. With what you've currently developed, my answer above still stands. You are shredding the account numbers and using those as the filter in your query to Oracle. This will issue a query to Oracle for each account number you have, which may or may not be desirable.
The upside to this approach is that it will allow you to retrieve multiple rows. Assuming you are pulling Sales Order type of information, one account number likely has many sales order rows.
However, if you are working with something that has a zero to one mapping with the account numbers, like account level data, then you can simplify the approach you are taking. Move your SQL Server query to an OLE DB Source component within your data flow.
Then, what you are looking for is the Lookup Component. That allows you to enrich an existing row of data with additional data. There you will specify a query like "SELECT AllTheColumnsICareAbout, AccountNumber FROM schema.Table". Then you will map the AccountNumber from the OLE DB Source to the one in the Lookup Component and click the checkmark next to all the columns you want to augment the existing row with.
I believe what you are asking is how to use SSIS to push data to the Oracle OLE DB provider.
I will assume that Oracle is the destination. Data destinations with variable columns are not supported out of the box. You should be able to use the SSIS API or other means, but I take a simpler approach.
I recently set up a package to get all tables from a database and create dynamic CSV output, one file per table. You could do something similar.
Swap out the StreamWriter part with a section that 1) creates the table in the destination and 2) inserts the records into Oracle. I am not sure whether you will need to do single-row inserts to Oracle. I have another project that works in reverse, loading dynamic CSV into SQL Server; since I work with SQL Server, I load a DataTable and use the SqlBulkCopy class for bulk loading, which gives excellent performance.
public void Main()
{
    string datetime = DateTime.Now.ToString("yyyyMMddHHmmss");
    try
    {
        string TableName = Dts.Variables["User::CurrentTable"].Value.ToString();
        string FileDelimiter = ",";
        string TextQualifier = "\"";
        string FileExtension = ".csv";

        //USE ADO.NET Connection from SSIS Package to get data from table
        SqlConnection myADONETConnection = new SqlConnection();
        myADONETConnection = (SqlConnection)(Dts.Connections["connection manager name"].AcquireConnection(Dts.Transaction) as SqlConnection);

        //Read data from table or view to data table
        string query = "Select * From [" + TableName + "]";
        SqlCommand cmd = new SqlCommand(query, myADONETConnection);
        //myADONETConnection.Open();
        DataTable d_table = new DataTable();
        d_table.Load(cmd.ExecuteReader());
        //myADONETConnection.Close();

        string FileFullPath = Dts.Variables["$Project::ExcelToCsvFolder"].Value.ToString() + "\\Output\\" + TableName + FileExtension;

        StreamWriter sw = null;
        sw = new StreamWriter(FileFullPath, false);

        // Write the Header Row to File
        int ColumnCount = d_table.Columns.Count;
        for (int ic = 0; ic < ColumnCount; ic++)
        {
            sw.Write(TextQualifier + d_table.Columns[ic] + TextQualifier);
            if (ic < ColumnCount - 1)
            {
                sw.Write(FileDelimiter);
            }
        }
        sw.Write(sw.NewLine);

        // Write All Rows to the File
        foreach (DataRow dr in d_table.Rows)
        {
            for (int ir = 0; ir < ColumnCount; ir++)
            {
                if (!Convert.IsDBNull(dr[ir]))
                {
                    sw.Write(TextQualifier + dr[ir].ToString() + TextQualifier);
                }
                if (ir < ColumnCount - 1)
                {
                    sw.Write(FileDelimiter);
                }
            }
            sw.Write(sw.NewLine);
        }
        sw.Close();

        Dts.TaskResult = (int)ScriptResults.Success;
    }
    catch (Exception exception)
    {
        // Create Log File for Errors
        //using (StreamWriter sw = File.CreateText(Dts.Variables["User::LogFolder"].Value.ToString() + "\\" +
        //    "ErrorLog_" + datetime + ".log"))
        //{
        //    sw.WriteLine(exception.ToString());
        //}
        Dts.TaskResult = (int)ScriptResults.Failure;
        throw;
    }

    Dts.TaskResult = (int)ScriptResults.Success;
}
