Snowflake Copy Command Result Scan Inconsistency - validation

I am using the COPY command to load data from a CSV file into a table via an internal stage.
After loading the data, I use the query below to get the number of rows loaded and failed:
SELECT * FROM TABLE(RESULT_SCAN('Copy_Query_ID'));
I am also using the query below to get the actual failed records:
SELECT * FROM TABLE(VALIDATE("Table_Name", JOB_ID => 'Copy_Query_ID'));
This worked fine a few times, but today I noticed that the first query returns:
ROWS_PARSED  ROWS_LOADED  ERRORS_SEEN
10000        9600         400
So I expected 400 rows in the second query's result, but instead I see 10,400 records:
all rows once, plus an additional 400 records for some other errors. If all rows are error rows, then why were they loaded? Can I not use these queries for this purpose?
Note: my file has 6 fields, but I am using only 4 of them in the COPY; the remaining two columns are populated with SYSDATE() and a literal date. Maybe this is the reason for the mismatch?
COPY INTO table (col1, col2, col3, col4, col5, col6) FROM (SELECT $1, $2, $3, $4, SYSDATE(), '10/20/2020' FROM @%table);
So I am guessing VALIDATE is not looking at my new values for fields 5 and 6, and instead takes those values from the file?
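Putting the pieces together, a minimal sketch of the flow described above; my_table stands in for the real table and its table stage, and LAST_QUERY_ID() / the '_last' job id are used here instead of pasting the copy query id by hand:
COPY INTO my_table (col1, col2, col3, col4, col5, col6)
  FROM (SELECT $1, $2, $3, $4, SYSDATE(), '10/20/2020' FROM @%my_table);
-- load statistics (rows parsed/loaded/errors) for the COPY that just ran
SELECT * FROM TABLE(RESULT_SCAN(LAST_QUERY_ID()));
-- rows rejected by the most recent COPY into my_table
SELECT * FROM TABLE(VALIDATE(my_table, JOB_ID => '_last'));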

Related

Get only one row from jdbcTemplate query for performance optimization

jdbcTemplate.query(getQuery(id),
    rs -> {
        if (rs.next()) {
            mainDTO.setSim(rs.getString("sim"));
            mainDTO.setImei(rs.getString("imei"));
        }
    });
I use the above code fragment to retrieve data from the database, and the query returns more than 100 records. For all of the records the sim and imei numbers are the same; the other fields differ. When executing the above code I get the sim and imei from the first record itself, but the query still runs over all the records and therefore takes more than 3 seconds to complete. That is the problem.
How can I stop retrieving the remaining records once I have the sim and imei values from the first record? I can't change the SQL query, as it is fixed by the documentation, so the optimization needs to happen in the Java code itself.
How can I optimize this to run in under 100 ms?
You have two choices: either limit the result set in the SQL query or use JdbcTemplate#setMaxRows:
SQL
You would need to edit the query, specifying the columns to select and the table name:
SELECT * FROM table LIMIT 1
JDBC
Use JdbcTemplate#setMaxRows to configure the JdbcTemplate to return up to one row:
jdbcTemplate.setMaxRows(1);
Internally it delegates to Statement#setMaxRows.

Delete last record from flat file in blob to Azure data warehouse

I have some pipe-delimited flat files in Blob storage, and each file has a header and a footer record containing the filename, the date of extract and the number of records. I am using an ADF pipeline with PolyBase to load into Azure SQL Data Warehouse. I can skip the header record but am unable to skip the footer. The only way I could think of is creating a staging table with all varchar columns, loading into staging, and then converting the data types back in the main tables. But that does not work because the footer has a different number of columns than the data. Is there an easier way to do this? Please advise.
PolyBase does not have an explicit option for removing footer rows, but it does have a set of rejection options which you could potentially take advantage of. If you set REJECT_TYPE to VALUE (rather than PERCENTAGE) and REJECT_VALUE to 1, you are telling PolyBase to reject one row only. If your footer is in a different format from the main data rows, it will be rejected but your query should not fail.
CREATE EXTERNAL TABLE yourTable
...
<reject_options> ::=
{
    REJECT_TYPE = VALUE,
    REJECT_VALUE = 1
}
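A fuller sketch of how those reject options sit inside the external table definition; the data source, file format, and column names below are assumed placeholders, not taken from the question:
CREATE EXTERNAL TABLE dbo.yourTable_ext (
    col1 VARCHAR(100),
    col2 VARCHAR(100),
    col3 VARCHAR(100)
)
WITH (
    LOCATION = '/your/blob/folder/',
    DATA_SOURCE = yourBlobDataSource,       -- assumed external data source name
    FILE_FORMAT = yourPipeDelimitedFormat,  -- assumed pipe-delimited file format
    REJECT_TYPE = VALUE,
    REJECT_VALUE = 1                        -- tolerate the single mismatched footer row
);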
Please post a simple, anonymised example of your file with headers, columns and footers if you need further assistance.
Update: Check this blog post for information on tracking rejected rows:
https://azure.microsoft.com/en-us/blog/load-confidently-with-sql-data-warehouse-polybase-rejected-row-location/

How can I add static values in a column in grafana/influxdb

I'm using Grafana, InfluxDB and JMeter. I have this table.
I need to add a column that says "Base Line" with a value different for each request name. I have tried the following:
- Grafana does not seem to have a way to add static values for columns, or a query equivalent of SQL's "SELECT 'value' AS columnName".
- I tried creating a new time series for the static data (base lines) and doing a join of the results from JMeter with the series I created, but I get this error:
error parsing query: found AS, expected ;
I'm having a hard time creating an extra column with fixed data for each request... my last resort is to modify or create a JMeter plugin for this, but before going down that route, there might be something I'm missing.
The easiest solution I can think of is adding a Dummy Sampler which does nothing but "sleep" for a specified amount of time, which is your baseline.
JMeter's Backend Listener will collect this sampler's metrics, and you will either see a straight horizontal line identifying your baseline in the chart, or the baseline value in the table, or be able to aggregate it as a separate column.
You can install the Dummy Sampler using the JMeter Plugins Manager.

'thrust::system::system_error' error when running GPGPU database engine Alenka

I am trying to run Alenka (https://github.com/antonmks/Alenka) by loading a custom table test.tbl and firing SELECT queries on it.
It works fine with 3 or 4 rows.
But when I increase the number of entries beyond 6 or 10 rows, it does not show any error while loading (./alenka load_test.sql); however, when I run the query (./alenka testquery.sql), it gives an error:
terminate called after throwing an instance of 'thrust::system::system_error'
  what():  invalid argument
Aborted (core dumped)
---test.tbl---
1|2.12345|3|4|5|6|7|
1|2|3|4|5|6|7|
1|2|3|4|5|6|7|
1|2|3|4|5|6|7|
1|2|3|4|5|6|7|
1|2|3|4|5|6|7|
1|2|3|4|5|6|7|
1|2|3|4|5|6|7|
1|2|3|4|5|6|7|
This is the load_test.sql query
A := LOAD 'test.tbl' USING ('|') AS (var1{1}:int, var2{2}:int, var3{4}:int,
var4{5}:int,var5{6}:int, var6{7}:int, var7{8}:int);
STORE A INTO 'test' BINARY;
And testquery.sql
B := FILTER test BY id <= 19980902;
D := SELECT var2 AS var2
FROM B;
STORE D INTO 'mytest.txt' USING ('|');
Can someone explain what the reason for this error is?
Thank you
The problem was caused by a couple of minor mistakes which added up to this confusion.
When a load command is run in Alenka, it creates binary files containing the data from each column of the table.
These files are overwritten if the table is loaded again; however, if the column names are changed, new files are created alongside the old ones.
So it is a good idea to delete those files after renaming columns in a table, in order to avoid picking them up again.
Hence, I got this error because I had loaded data with different column names earlier and forgot to delete those files (test.id*) from the folder.
Along with that, I also made another blunder: filtering by "id" instead of 'var1' in the query file (testquery.sql).
Since the id files had 9 entries (from the previous schema), it ran perfectly for 9 rows, but when the table grew beyond that the thrust library threw the system error.
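For reference, the corrected filter line in testquery.sql, using the column name that actually exists in the stored schema, would be:
B := FILTER test BY var1 <= 19980902;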
Hope this helps someone avoid wasting time like I did.

Ignore error in SSIS

I am getting a "Violation of UNIQUE KEY constraint 'AK_User'. Cannot insert duplicate key in object 'dbo.tblUsers'." error when trying to copy data from an Excel file to a SQL database using SSIS.
Is there any way of ignoring this error and letting the package continue to the next record without stopping?
What I need is: if it inserts three records and the first record is a duplicate, instead of failing it should continue with the other records and insert them.
There is a system variable called Propagate which can be used to continue or stop the execution of the package.
1. Create an OnError event handler for the task which is failing. Generally it is created for the entire Data Flow Task.
2. Press F4 to get the list of all variables and click on the icon at the top to show system variables. By default the Propagate variable is True; change it to False, which basically means that SSIS won't propagate the error to other components and will let the execution continue.
Update 1:
To skip the bad rows there are basically two ways:
1. Use a Lookup
Try to match the primary key column values in the source and the destination, and then send the Lookup No Match Output to your destination. If the value doesn't exist in the destination, the row is inserted; otherwise skip it, or redirect it to a table or flat file using the Lookup Match Output. (A T-SQL sketch of the same idea follows after this list.)
Example
For more details on Lookup refer to this article
2. Or you can redirect the error rows to a flat file or a table. Every SSIS Data Flow component has an Error Output.
For example, the Derived Column component exposes this error output configuration in its editor.
But this option may not help in your case, as redirecting error rows at the destination doesn't work properly: if an error occurs, it redirects the entire batch without inserting any rows into the destination. I think this happens because the OLE DB destination does a bulk insert or inserts data using transactions. So try to use the Lookup to achieve this.
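If the Excel data is first landed in a staging table, the de-duplication the Lookup performs can also be expressed as a T-SQL guard. A minimal sketch, assuming a staging table dbo.stgUsers and that AK_User enforces uniqueness on a UserName column (the staging table and column names are assumptions, not taken from the question):
INSERT INTO dbo.tblUsers (UserName, Email)
SELECT s.UserName, s.Email
FROM dbo.stgUsers AS s
WHERE NOT EXISTS (
    SELECT 1
    FROM dbo.tblUsers AS t
    WHERE t.UserName = s.UserName   -- skip rows that would violate AK_User
);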
