Migration fails when column in column filter is in camel case - amazon-aurora

We are trying to migrate data from AWS Aurora Postgres (v13.4) to another AWS Aurora Postgres database (Serverless v2), but the data migration fails for tables where we add a column filter. Our table names and column names are in upper camel case. When I checked the logs, I saw the following error:
2022-04-27T09:31:12 [TRANSFORMATION ]E: Failed to init filter for column CompanyId [20014] (manipulator.c:605)
2022-04-27T09:31:12 [TRANSFORMATION ]W: The metadata transformations defined for table 'public.Tenants' were not performed as at least one of the transformation expressions contains an error (manipulator.c:147)
Here is the mapping rule:
{
  "rules": [
    {
      "rule-type": "selection",
      "rule-id": "046273916",
      "rule-name": "046273916",
      "object-locator": {
        "schema-name": "public",
        "table-name": "Tenants"
      },
      "rule-action": "include",
      "filters": [
        {
          "filter-type": "source",
          "column-name": "CompanyId",
          "filter-conditions": [
            {
              "filter-operator": "eq",
              "value": "xxxxxx"
            }
          ]
        }
      ]
    },
    {
      "rule-type": "selection",
      "rule-id": "042087387",
      "rule-name": "042087387",
      "object-locator": {
        "schema-name": "public",
        "table-name": "Plans"
      },
      "rule-action": "include",
      "filters": []
    }
  ]
}

Related

AppSync/GraphQL filter nested objects

I have a DynamoDB table with the following structure
site (String)
addresses (List)
|-> address (String)
|-> isCurrent (Boolean)
I want to filter a specific site for either the current address or all addresses.
query MyQuery {
  getSite(site: "site1", isCurrent: true) {
    site
    addresses {
      address
      isCurrent
    }
  }
}
The schema looks like:
type Sites {
  site: String!
  addresses: [Address]
}
type Address {
  address: String
  isCurrent: Boolean
}
type Query {
  getSite(site: String!, isCurrent: Boolean): Sites
}
The resolver I have:
#if($ctx.args.isCurrent)
{
  "version": "2017-02-28",
  "operation": "Query",
  "query": { // Filter for specific Site
    "expression": "#siteName = :siteNameByUser",
    "expressionNames": {
      "#siteName": "site"
    },
    "expressionValues": {
      ":siteNameByUser": {"S": $util.toJson($ctx.args.site)}
    }
  },
  // Filter Current Address(s)
  "filter": {
    "expression": "addresses.isCurrent = :isActiveByUser",
    "expressionValues": {
      ":isActiveByUser": $util.dynamodb.toDynamoDBJson($ctx.args.isCurrent)
    }
  }
}
#else
{
  "version": "2017-02-28",
  "operation": "GetItem",
  "key": {
    "site": $util.dynamodb.toDynamoDBJson($ctx.args.site)
  }
}
#end
I'm not getting any results when I add the filter (it works without the filter, or with isCurrent=false).
I am trying to filter the inner objects in the addresses list based on the value the user sends for isCurrent. Any help is much appreciated!
I tried writing a resolver with a filter condition on an inner value (addresses.isCurrent), as shown in the request mapping template above.
Apparently, DynamoDB does not let you filter on complex object types like a list of maps (your case); see a related question: DynamoDB: How to store a list of items.
I'd suggest changing your DynamoDB table data model, if possible, to site, address, isCurrentAddress to achieve what you are trying to do. Alternatively, you can write logic in the VTL response mapping template to filter your result set based on the isCurrentAddress value. By the way, AppSync recently launched JavaScript resolvers; take a look and see if they make your resolver logic simpler to write.
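As a rough illustration of the VTL option, a response mapping template along these lines could drop non-current addresses after the item is fetched. This is only a sketch: it assumes the request side simply returns the item (the GetItem branch above) and that $ctx.result holds it with its addresses list of maps.
#if($ctx.args.isCurrent)
  ## keep only the addresses whose isCurrent flag matches the argument
  #set($filteredAddresses = [])
  #foreach($addr in $ctx.result.addresses)
    #if($addr.isCurrent == $ctx.args.isCurrent)
      $util.qr($filteredAddresses.add($addr))
    #end
  #end
  $util.qr($ctx.result.put("addresses", $filteredAddresses))
#end
$util.toJson($ctx.result)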

Oracle Table-valued Functions return erroneous decimals in Data Factory

I am working on a cloud data warehouse using Azure Data Factory v2.
Quite a few of my data sources are on-prem Oracle 12g databases.
Extracting tables 1:1 is not a problem.
However, from time to time I need to extract data generated on the fly by parametrized computations in my Copy Activities.
Since I cannot use PL/SQL stored procedures as sources in ADF, I instead use table-valued functions in the source database and query them in the copy activity.
In the majority of cases this works fine. However, when my table-valued function returns a decimal-type column, ADF sometimes returns erroneous values. That is: executing the TVF on the source db and previewing/copying through ADF yield different results.
I have done some experiments to see whether the absolute value or the sign of the decimal number matters, but I cannot find any pattern in which decimals are returned correctly and which are not.
Here are a few examples of the erroneously mapped numbers:
Value in Oracle db -> Value in ADF
-658388.5681 -> 188344991.6319
-205668.1648 -> 58835420.6352
10255676.84 -> 188213627.97348
Have any of you experienced similar problems?
Do you know if this is a bug in ADF (which does not integrate well with PL/SQL in the first place)?
First hypothesis
At first I thought the issue was related to NLS, casting or something similar.
I tested this hypothesis by creating a table on the Oracle db side, persisting the output from the TVF there, and then extracting from that table in ADF (a quick sketch of this test is shown below).
Using this method, the decimals were returned correctly in ADF, so the hypothesis does not hold.
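For reference, the test looked roughly like this (a sketch; test_persist is a hypothetical table name, the TVF is the one from the reproducible example below):
create table test_persist as
select dec_col from table(test_fct(sysdate));
-- extracting from test_persist in ADF returned the decimals correctly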
Second hypothesis
It might have to do with user access rights.
However, the linked service used in ADF uses the same db credentials as the ones used to log in to the database and execute the TVF there.
Observation
The error seems to happen more often when a lot of aggregate functions are involved in the TVF's logic.
Minimum reproducible example
Oracle db:
CREATE OR REPLACE TYPE test_col AS OBJECT
(
  dec_col NUMBER(20,5)
)
/
CREATE OR REPLACE TYPE test_tbl AS TABLE OF test_col;

create or replace function test_fct(param date) return test_tbl
AS
  ret_tbl test_tbl;
begin
  select
    test_col(
      <"some complex logic which return a decimal">
    )
  bulk collect into ret_tbl
  from <"some complex joins and group by's">;
  return ret_tbl;
end test_fct;
select dec_col from table(test_fct(sysdate));
ADF:
Dataset:
{
  "name": "test_dataset",
  "properties": {
    "linkedServiceName": {
      "referenceName": "some_name",
      "type": "LinkedServiceReference"
    },
    "folder": {
      "name": "some_name"
    },
    "annotations": [],
    "type": "OracleTable",
    "structure": [
      {
        "name": "dec_col",
        "type": "Decimal"
      }
    ]
  }
}
Pipeline:
{
  "name": "pipeline1",
  "properties": {
    "activities": [
      {
        "name": "Copy data1",
        "type": "Copy",
        "dependsOn": [],
        "policy": {
          "timeout": "7.00:00:00",
          "retry": 0,
          "retryIntervalInSeconds": 30,
          "secureOutput": false,
          "secureInput": false
        },
        "userProperties": [],
        "typeProperties": {
          "source": {
            "type": "OracleSource",
            "oracleReaderQuery": "select * from table(test_fct(sysdate))",
            "partitionOption": "None",
            "queryTimeout": "02:00:00"
          },
          "enableStaging": false
        },
        "inputs": [
          {
            "referenceName": "test_dataset",
            "type": "DatasetReference"
          }
        ]
      }
    ],
    "annotations": []
  }
}

Filter with complex key does not work (using startkey and endkey)

I created a view with this map function:
function(doc) {
  if (doc.market == "m_warehouse") {
    emit([doc.logTime, doc.dbName, doc.tableName], 1);
  }
}
I want to filter the data with multiple keys:
_design/select_data/_view/new-view/?limit=10&skip=0&include_docs=false&reduce=false&descending=true&startkey=["2018-06-19T09:16:47,527","stage"]&endkey=["2018-06-19T09:16:43,717","stage"]
but I still got:
{
  "total_rows": 248133,
  "offset": 248129,
  "rows": [
    {
      "id": "01CGBPYVXVD88FPDVR3NP50VJW",
      "key": [
        "2018-06-19T09:16:47,527",
        "ods",
        "o_ad_dsp_pvlog_realtime"
      ],
      "value": 1
    },
    {
      "id": "01CGBQ6JMEBR8KBMB8T7Q7CZY3",
      "key": [
        "2018-06-19T09:16:44,824",
        "stage",
        "s_ad_ztc_realpv_base_indirect"
      ],
      "value": 1
    },
    {
      "id": "01CGBQ4BKT8S2VDMT2RGH1FQ71",
      "key": [
        "2018-06-19T09:16:44,707",
        "stage",
        "s_ad_ztc_realpv_base_indirect"
      ],
      "value": 1
    },
    {
      "id": "01CGBQ18CBHQX3F28649YH66B9",
      "key": [
        "2018-06-19T09:16:43,717",
        "stage",
        "s_ad_ztc_realpv_base_indirect"
      ],
      "value": 1
    }
  ]
}
the key "ods" should not in the results.
What did I do wrong?
Your query is not multi-key; it is just a startkey and an endkey.
If you want results by dbName in a specific time range, you need to change the emit to [doc.dbName, doc.logTime, doc.tableName] (sketched below).
Then you query with startkey=["stage","2018-06-19T09:16:43,717"]&endkey=["stage","2018-06-19T09:16:47,527"].
(By the way, are you sure your timestamps are in the right order? In your example the second timestamp is larger than the first.)
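A minimal sketch of the reordered map function, using the same document fields as the original view:
function(doc) {
  if (doc.market == "m_warehouse") {
    // dbName first so keys group by database, then sort by time
    emit([doc.dbName, doc.logTime, doc.tableName], 1);
  }
}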
As you have chosen a full date/time stamp as the first level of your key, down to millisecond precision, there are unlikely to be any repeating values in the first level of your compound key. If you indexed just the date, say, as the first key, your data would be grouped by date, dbName and table name in a more predictable way,
e.g.
["2018-06-19","ods","o_ad_dsp_pvlog_realtime"]
["2018-06-19","stage","s_ad_ztc_realpv_base_indirect"]
["2018-06-19",stage","s_ad_ztc_realpv_base_indirect"
["2018-06-19","stage","s_ad_ztc_realpv_base_indirect"
With this key structure, the hierarchical grouping of keys works in your favour i.e. all the data from "2018-06-19" is together in the index, with all the data matching ["2018-06-19","stage"] adjacent to each other.
If you need to get to millisecond precision, you could index the data as follows:
function(doc) {
  if (doc.market == "m_warehouse") {
    emit([doc.dbName, doc.logTime], 1);
  }
}
This would create an index organised by dbName, with a secondary sort on time. You can then extract the data for a specified dbName between two timestamps.
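For instance, a request along these lines (a sketch reusing the design document and view names from the question, with the [dbName, logTime] keys above) would pull the "stage" rows between the two timestamps:
_design/select_data/_view/new-view?reduce=false&startkey=["stage","2018-06-19T09:16:43,717"]&endkey=["stage","2018-06-19T09:16:47,527"]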

Spring Data ElasticSearch built-in IN query returning partial match

I am new to Elasticsearch with Spring Data. Today I was trying to get the In query working with a Spring Data ES repository.
I have to do a lookup for a list of user names, and if there is an exact match in the index, I need to get those users back as the result.
I tried using the built-in repository 'In' method to do so, but it returns partial matches. Please help me make this work like a SQL IN query.
Here is my repository code:
public interface UserRepository extends ElasticsearchRepository<EsUser, String>
{
    public List<EsUser> findByUserAccountUserNameIn(Collection<String> terms);
}
REQUEST:
{"terms":["vijay", "arun"], "type":"NAME"}
RESPONSE:
[
  {
    "userId": "236000",
    "fbId": "",
    "userAccount": {
      "userName": "arun",
      "urlFriendlyName": "arun"
    },
    "userProfile": {},
    "userStats": {}
  },
  {
    "userId": "6228",
    "userAccount": {
      "userName": "vijay",
      "urlFriendlyName": "vijay"
    },
    "userProfile": {},
    "userStats": {}
  },
  {
    "userId": "236000",
    "fbId": "",
    "userAccount": {
      "userName": "arun singh",
      "urlFriendlyName": "arun-singh"
    },
    "userProfile": {},
    "userStats": {}
  },
  {
    "userId": "236000",
    "fbId": "",
    "userAccount": {
      "userName": "vijay mohan",
      "urlFriendlyName": "vijay-mohan"
    },
    "userProfile": {},
    "userStats": {}
  }
]
This is because your userAccount.userName field is an analyzed string, and thus, the two tokens arun and singh have been indexed. Your query then matches the first token, which is normal.
In order to prevent this and guarantee an exact match you need to declare your field as not_analyzed, like this:
@Field(index = FieldIndex.not_analyzed)
private String userName;
Then you'll need to delete your index and the associated template in /_template, and restart your application so that a new template and index are created with the proper field mapping.
Then your query will work.
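For context, since userName lives on the nested account object, the annotation goes on that class. A minimal sketch using the older Spring Data Elasticsearch annotations referenced above; the "users" index name, the UserAccount class name and the Object mapping on the nested field are assumptions based on the response shown in the question:
import org.springframework.data.annotation.Id;
import org.springframework.data.elasticsearch.annotations.Document;
import org.springframework.data.elasticsearch.annotations.Field;
import org.springframework.data.elasticsearch.annotations.FieldIndex;
import org.springframework.data.elasticsearch.annotations.FieldType;

@Document(indexName = "users") // assumed index name
public class EsUser {
    @Id
    private String userId;

    @Field(type = FieldType.Object)
    private UserAccount userAccount;
    // getters/setters omitted
}

class UserAccount {
    // not_analyzed stores the whole user name as a single term,
    // so findByUserAccountUserNameIn only matches exact names
    @Field(type = FieldType.String, index = FieldIndex.not_analyzed)
    private String userName;

    private String urlFriendlyName;
    // getters/setters omitted
}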

How to use two conditions in one array?

I have a list of tasks stored in Mongo, like the one below:
{
  "name": "task1",
  "requiredOS": [
    {
      "name": "linux",
      "version": [
        "6.0"
      ]
    },
    {
      "name": "windows",
      "version": [
        "2008",
        "2008R2"
      ]
    }
  ],
  "requiredSW": [
    {
      "name": "MySQL",
      "version": [
        "1.0"
      ]
    }
  ]
}
My purpose is to filter the tasks by OS and software; for example, the user gives me the filter condition below:
{
  "keyword": [
    {
      "OS": [
        {
          "name": "linux",
          "version": [
            "6.0"
          ]
        },
        {
          "name": "windows",
          "version": [
            "2008"
          ]
        }
      ]
    },
    {
      "SW": []
    }
  ]
}
I need to find all the tasks that can run on both Windows 2008 and Linux 6.0 by searching the "requiredOS" and "requiredSW" fields. As you can see, the search condition is an array (the "OS" part), and I am having trouble using an array as a search condition. I expect the query to return a list of tasks that satisfy the condition.
A challenging thing is that I need to integrate the query into Spring Data using @Query, so the query must be parameterized.
Can anyone give me a hand?
I have tried a query, but it returns nothing. My intent was to use $all to combine the two conditions and $elemMatch to search the "requiredOS" field:
{"requiredOS": {"$elemMatch": {"$all": [{"name": "linux", "version": "5.0"}, {"name": "windows", "version": "2008"}]}}}
If I understood correctly what you are trying to do, you need to use the $elemMatch operator:
http://docs.mongodb.org/manual/reference/operator/query/elemMatch/#op._S_elemMatch
Taking your example, the query should look like:
@Query("{'requiredOS':{$elemMatch:{name:'linux', version:'7.0'},$elemMatch:{name:'windows', version:'2008'}}}")
It matches the document you provided.
You basically seem to need to translate your "parameters" into a query form that produces results, rather than passing them straight through. Here is an example "translation" where the "empty" array is considered to match "anything".
Also, the other conditions do not "literally" go straight through. The reason for this is that, in that form, MongoDB considers them to mean an "exact match". So what you want is a combination of the $elemMatch operator for multiple array conditions, and the $and operator, which combines the conditions on the same property element.
This is a bit longer than $all, essentially because that operator is a "shortened" form of $and, just as $in is to $or:
db.collection.find({
  "$and": [
    {
      "requiredOS": {
        "$elemMatch": {
          "name": "linux",
          "version": "6.0"
        }
      }
    },
    {
      "requiredOS": {
        "$elemMatch": {
          "name": "windows",
          "version": "2008"
        }
      }
    }
  ]
})
So it is just a matter of transforming the properties in the request to actually match the required query form.
Building this into a query object can be done in a number of ways, such as using the Query builder:
DBObject query = new Query(
    new Criteria().andOperator(
        Criteria.where("requiredOS").elemMatch(
            Criteria.where("name").is("linux").and("version").is("6.0")
        ),
        Criteria.where("requiredOS").elemMatch(
            Criteria.where("name").is("windows").and("version").is("2008")
        )
    )
).getQueryObject();
You can then pass this in to a mongoOperations method, or to any other method that accepts the query object.
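Since the question also asks for a parameterized Spring Data @Query, here is a minimal sketch of how the same $and/$elemMatch form could be bound to positional parameters. The Task document class, the repository name and the method name are illustrative, not from the original post:
import java.util.List;
import org.springframework.data.mongodb.repository.MongoRepository;
import org.springframework.data.mongodb.repository.Query;

// Task is assumed to be the document class mapped from the JSON above (not shown)
public interface TaskRepository extends MongoRepository<Task, String> {

    // ?0..?3 are bound to the method arguments at runtime
    @Query("{ '$and': [ "
         + "{ 'requiredOS': { '$elemMatch': { 'name': ?0, 'version': ?1 } } }, "
         + "{ 'requiredOS': { '$elemMatch': { 'name': ?2, 'version': ?3 } } } ] }")
    List<Task> findByTwoOsRequirements(String osName1, String osVersion1,
                                       String osName2, String osVersion2);
}
For the sample task above, this could be invoked as findByTwoOsRequirements("linux", "6.0", "windows", "2008").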
