I have an ADF pipeline that copies 34 tables from an on-premises Oracle database to an Azure Data Lake Store; 32 of these copy just fine on a daily basis, but the other 2 consistently fail with...
Copy activity met an internal service error.
For more information, provide this message to customer support. ErrorCode: 8601 GatewayNodeName=XXXXXXXX,
ErrorCode=SystemErrorOdbcWrapperError,'Type=Microsoft.DataTransfer.Common.Shared.HybridDeliveryException,
Message=Unknown error from wrapper.,
Source=Microsoft.DataTransfer.ClientLibrary.Odbc.OdbcConnector,
''Type=Microsoft.DataTransfer.ClientLibrary.Odbc.Runtime.ValueException,Message=[DataSource.Error] The ODBC driver returned an invalid value.,Source=Microsoft.DataTransfer.ClientLibrary.Odbc.Wrapper,'.
The activity JSON is templated, so it is identical for all 34 activities. I can run the oracleReaderQuery in Oracle SQL Developer using the same connection details and credentials and get results.
Searches for this have turned up one unanswered question here on Stack Overflow and another on a Microsoft forum with a response that says "We will get back to you ASAP when we have new updates"... but there are no updates.
It seems I am not the only one having this issue; has anyone found a solution?
I have tried a one-off copy in ADF and get the same result; I have also tried copying the table to Blob storage and get the same result.
Can anyone help me fathom what is wrong with this, please?
The activity JSON is as follows...
{
"type": "Copy",
"typeProperties": {
"source": {
"type": "OracleSource",
"oracleReaderQuery": "SELECT stuff FROM <source table>"
},
"sink": {
"type": "AzureDataLakeStoreSink",
"writeBatchSize": 0,
"writeBatchTimeout": "00:00:00"
}
},
"inputs": [
{
"name": "<source table dataset>"
},
{
"name": "<scheduling dependency dataset>"
}
],
"outputs": [
{
"name": "<destination dataset>"
}
],
"policy": {
"timeout": "02:00:00",
"concurrency": 1,
"retry": 3,
"longRetry": 2,
"longRetryInterval": "03:00:00",
"executionPriorityOrder": "OldestFirst"
},
"scheduler": {
"frequency": "Day",
"interval": 1
},
"name": "Copy Activity 34",
"description": "copy activity"
}
As I said though, this is identical, apart from the table it is accessing, to the 32 activities that work perfectly fine.
What's the data type of stuff in your table?
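The comment above points at data types. As a diagnostic, here is a minimal sketch (assuming the python-oracledb driver; the connection details and table name are placeholders, not from the original post) that lists the column types of one of the failing tables, since unusual Oracle types can trip the gateway's ODBC conversion:
# Minimal diagnostic sketch: list column data types of a failing table.
# Connection details and table name below are placeholders.
import oracledb

conn = oracledb.connect(user="my_user", password="my_password",
                        dsn="my-host:1521/my_service")
with conn.cursor() as cur:
    cur.execute(
        "select column_name, data_type, data_precision, data_scale "
        "from all_tab_columns where table_name = :t",
        {"t": "MY_FAILING_TABLE"},
    )
    for column_name, data_type, precision, scale in cur:
        print(column_name, data_type, precision, scale)
conn.close()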
I am working on a cloud data warehouse using Azure Data Factory v2.
Quite a few of my data sources are on-prem Oracle 12g databases.
Extracting tables 1-1 is not a problem.
However, from time to time I need to extract data generated by parametrized computations on the fly in my Copy Activities.
Since I cannot use PL/SQL stored procedures as sources in ADF, I instead use table-valued functions in the source database and query them in the copy activity.
In the majority of cases this works fine. However, when my table-valued function returns a decimal-type column, ADF sometimes returns erroneous values. That is: executing the TVF on the source DB and previewing/copying through ADF yield different results.
I have experimented with whether the absolute value or the sign of the decimal number matters, but I cannot find any pattern in which decimals are returned correctly and which are not.
Here are a few examples of the erroneously mapped numbers:
Value in Oracle db      Value in ADF
-658388.5681            188344991.6319
-205668.1648            58835420.6352
10255676.84             188213627.97348
Have any of you experienced similar problems?
Do you know if this is a bug in ADF (which does not integrate well with PL/SQL in the first place)?
First hypothesis
At first I thought the issue was related to NLS, casting or something similar.
I tested this hypothesis by creating a table on the Oracle db side, persisting the output from the TVF there, and then extracting from that table in ADF.
Using this method, the decimals were returned correctly in ADF. Thus the hypothesis does not hold.
Second hypothesis
It might have to do with user access rights.
However, the linked service used in ADF uses the same DB credentials as the ones used to log in to the database and execute the TVF there.
Observation
The error seems to happen more often when a lot of aggregate functions are involved in the TVF's logic.
Minimum reproducible example
Oracle db:
CREATE OR REPLACE TYPE test_col AS OBJECT
(
dec_col NUMBER(20,5)
)
/
CREATE OR REPLACE TYPE test_tbl AS TABLE OF test_col;
create or replace function test_fct(param date) return test_tbl
AS
ret_tbl test_tbl;
begin
select
test_col(
<"some complex logic which return a decimal">
)
bulk collect into ret_tbl
from <"some complex joins and group by's">;
return ret_tbl;
end test_fct;
/

select dec_col from table(test_fct(sysdate));
ADF:
Dataset:
{
"name": "test_dataset",
"properties": {
"linkedServiceName": {
"referenceName": "some_name",
"type": "LinkedServiceReference"
},
"folder": {
"name": "some_name"
},
"annotations": [],
"type": "OracleTable",
"structure": [
{
"name": "dec_col",
"type": "Decimal"
}
]
}
}
Pipeline:
{
"name": "pipeline1",
"properties": {
"activities": [
{
"name": "Copy data1",
"type": "Copy",
"dependsOn": [],
"policy": {
"timeout": "7.00:00:00",
"retry": 0,
"retryIntervalInSeconds": 30,
"secureOutput": false,
"secureInput": false
},
"userProperties": [],
"typeProperties": {
"source": {
"type": "OracleSource",
"oracleReaderQuery": "select * from table(test_fct(sysdate))",
"partitionOption": "None",
"queryTimeout": "02:00:00"
},
"enableStaging": false
},
"inputs": [
{
"referenceName": "test_dataset",
"type": "DatasetReference"
}
]
}
],
"annotations": []
}
}
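To double-check that the wrong values originate on the ADF/driver side rather than in the TVF itself, here is a minimal sketch (assuming the python-oracledb driver; connection details are placeholders, not part of the original post) that runs the same query directly against the database, so its output can be compared with what the Copy activity writes:
# Minimal sketch: fetch the TVF output directly for comparison with ADF's output.
# Connection details are placeholders.
import oracledb

conn = oracledb.connect(user="my_user", password="my_password",
                        dsn="my-host:1521/my_service")
with conn.cursor() as cur:
    cur.execute("select dec_col from table(test_fct(sysdate))")
    for (dec_col,) in cur:
        print(dec_col)
conn.close()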
I'm working on Cognos Dashboard Embedded, using the reference from
Cognos Dashboard Embedded,
but instead of CSV I'm working with JDBC data sources.
I'm trying to connect to a JDBC data source as follows:
"module": {
"xsd": "https://ibm.com/daas/module/1.0/module.xsd",
"source": {
"id": "StringID",
"jdbc": {
"jdbcUrl": "jdbcUrl: `jdbc:db2://DATABASE-HOST:50000/YOURDB`",
"driverClassName": "com.ibm.db2.jcc.DB2Driver",
"schema": "DEFAULTSCHEMA"
},
"user": "user_name",
"password": "password"
},
"table": {
"name": "ROLE",
"description": "description of the table for visual hints ",
"column": [
{
"name": "ID",
"description": "String",
"datatype": "BIGINT",
"nullable": false,
"label": "ID",
"usage": "identifier",
"regularAggregate": "countDistinct",
},
{
"name": "NAME",
"description": "String",
"datatype": "VARCHAR(100)",
"nullable": true,
"label": "Name",
"usage": "identifier",
"regularAggregate": "countDistinct"
}
]
},
"label": "Module Name",
"identifier": "moduleId"
}
Note: my database is hosted on a private network and is not exposed on a public IP address.
So when I add the above code to register the data source, the data does not load from my DB.
Even though I specified the correct user and password for the JDBC connection in the code above, when I drag and drop any field from the data source, a popup opens asking for a user ID and password.
And even after I fill in the user ID and password in the popup, I am still unable to load the data.
Errors:
1. When any module tries to fetch data, it calls the API
'https://dde-us-south.analytics.ibm.com/daas/v1/data?moduleUrl=%2Fda......'
but in my case this API fails with Status Code: 403 Forbidden.
2. In SignOnDialog.js, at line 98, the call to the saveDataSourceCredential method fails, saying saveDataSourceCredential is not a function.
Expectation:
It should not open a popup asking for a user ID and password, and the data should load directly, just as it does for databases hosted on public IP addresses.
This does not work in general. If you are using any type of functionality hosted outside your network that needs to access an API or data on your private network, there needs to be some communication channel.
That channel could be established by setting up a VPN, by using products like IBM Secure Gateway to create a client/server connection between the IBM Cloud and your Db2 host, or even by setting up a direct link between your company network and the (IBM) cloud.
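To illustrate the point, here is a minimal sketch (placeholder host and port, not from the original answer) that simply tests whether a TCP connection to the Db2 host can be opened; run from inside your network it succeeds, but from the public internet, where the Cognos Dashboard Embedded service runs, it fails unless a VPN, Secure Gateway, or direct link provides that channel:
# Minimal connectivity sketch: can this machine open a TCP connection to the
# Db2 host? Host and port below are placeholders.
import socket

def can_reach(host, port, timeout=5):
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

print(can_reach("DATABASE-HOST", 50000))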
I understand that GitHub's GraphQL-based v4 API is much more efficient than the v3 API.
I would like to use the GraphQL API to retrieve, for a given repo:
All of the open milestones.
For each milestone, its title, description, all of its issues (open and closed)
For each issue, its title, description, status, and all messages.
Is there a straightforward way to do this?
Yes, it is straightforward to do. The query looks like:
{
repository(owner: "gatsbyjs", name: "gatsby") {
description
url
milestones(states: [OPEN],first:2) {
nodes{
title
description
url
issues(states:[OPEN,CLOSED], first:2){
nodes{
title
state
url
comments(first:2){
nodes{
url
body
createdAt
}
pageInfo{
hasNextPage
endCursor
}
}
}
pageInfo{
endCursor
hasNextPage
}
}
}
pageInfo{
endCursor
hasNextPage
}
}
}
}
Note:
For the repository whose URL is https://github.com/gatsbyjs/gatsby, the owner is gatsbyjs and the name is gatsby.
Go to the API Explorer to try and fine-tune the query; pressing Ctrl+Space will auto-suggest the available fields that can be retrieved.
Do the pagination yourself to loop through all records by adjusting the starting cursor and the number of records to be returned via first and after (a pagination sketch follows the sample output below).
It gives you the following:
{
"data": {
"repository": {
"description": "Build blazing fast, modern apps and websites with React",
"url": "https://github.com/gatsbyjs/gatsby",
"milestones": {
"nodes": [
{
"title": "Next Major",
"description": "Issues that will require a breaking change, and which would constitute being done in the next major version of Gatsby.",
"url": "https://github.com/gatsbyjs/gatsby/milestone/5",
"issues": {
"nodes": [
{
"title": "Make accessibility warnings errors",
"state": "OPEN",
"url": "https://github.com/gatsbyjs/gatsby/issues/19945",
"comments": {
"nodes": [
{
"url": "https://github.com/gatsbyjs/gatsby/issues/19945#issuecomment-568891716",
"body": "Hiya!\n\nThis issue has gone quiet. Spooky quiet. 👻\n\nWe get a lot of issues, so we currently close issues after 30 days of inactivity. It’s been at least 20 days since the last update here.\nIf we missed this issue or if you want to keep it open, please reply here. You can also add the label \"not stale\" to keep this issue open!\nAs a friendly reminder: the best way to see this issue, or any other, fixed is to open a Pull Request. Check out [gatsby.dev/contribute](https://www.gatsbyjs.org/contributing/how-to-contribute/) for more information about opening PRs, triaging issues, and contributing!\n\nThanks for being a part of the Gatsby community! 💪💜",
"createdAt": "2019-12-25T12:02:26Z"
},
{
"url": "https://github.com/gatsbyjs/gatsby/issues/19945#issuecomment-570779866",
"body": "Hey again!\n\nIt’s been 30 days since anything happened on this issue, so our friendly neighborhood robot (that’s me!) is going to close it.\nPlease keep in mind that I’m only a robot, so if I’ve closed this issue in error, I’m `HUMAN_EMOTION_SORRY`. Please feel free to reopen this issue or create a new one if you need anything else.\nAs a friendly reminder: the best way to see this issue, or any other, fixed is to open a Pull Request. Check out [gatsby.dev/contribute](https://www.gatsbyjs.org/contributing/how-to-contribute/) for more information about opening PRs, triaging issues, and contributing!\n\nThanks again for being part of the Gatsby community! 💪💜",
"createdAt": "2020-01-04T12:02:28Z"
}
],
"pageInfo": {
"hasNextPage": false,
"endCursor": "Y3Vyc29yOnYyOpHOIgVo2g=="
}
}
},
{
"title": "Configurable output folder",
"state": "OPEN",
"url": "https://github.com/gatsbyjs/gatsby/issues/1878",
"comments": {
"nodes": [
{
"url": "https://github.com/gatsbyjs/gatsby/issues/1878#issuecomment-324062470",
"body": "Do you have a specific use case in mind? This has been discussed before but no one has come up with a concrete use case that justified adding a new option.\r\n\r\nEvery option we add to Gatsby makes the project more complex which has all sorts of long-term costs so unless something is really valuable, I'd rather people handle this sort of thing themselves e.g. just copy the files to the output directory you want or create a symlink. This could easily be turned into a plugin that people could install, etc.",
"createdAt": "2017-08-22T15:27:41Z"
},
{
"url": "https://github.com/gatsbyjs/gatsby/issues/1878#issuecomment-324074853",
"body": "Yes, I have a use-case. I am going to use Gatsby for a documentation part as a part of complex project. All static files (Gatsby output, plus some others) should be placed into one folder `build`, that will be deployed somehow later. In other words, the Gatsby output is only one subfolder in my setup.\r\n\r\nSo far I have worked this around in `postbuild` step, but it looks hacky:\r\n\r\n```\r\n\"build\": \"gatsby build\",\r\n\"postbuild\": \"mv public build/gatsby-subsite\"\r\n```\r\nAdding configurable output folder will reduce this complexity and will help me not to move files around one more time.",
"createdAt": "2017-08-22T16:08:21Z"
}
],
"pageInfo": {
"hasNextPage": true,
"endCursor": "Y3Vyc29yOnYyOpHOE1D9ZQ=="
}
}
}
],
"pageInfo": {
"endCursor": "Y3Vyc29yOnYyOpLPgAAAAAAAArvODwULXA==",
"hasNextPage": true
}
}
}
],
"pageInfo": {
"endCursor": "Y3Vyc29yOnYyOpHOAEEbsw==",
"hasNextPage": false
}
}
}
}
}
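For the pagination mentioned in the notes above, here is a minimal sketch (Python with the requests library and a placeholder personal access token, not part of the original answer) that walks the milestones connection page by page using endCursor and hasNextPage; the same pattern applies to the nested issues and comments connections:
# Minimal pagination sketch for the milestones connection.
# GITHUB_TOKEN is a placeholder environment variable holding a personal access token.
import os
import requests

QUERY = """
query($owner: String!, $name: String!, $cursor: String) {
  repository(owner: $owner, name: $name) {
    milestones(states: [OPEN], first: 50, after: $cursor) {
      nodes { title description url }
      pageInfo { endCursor hasNextPage }
    }
  }
}
"""

def fetch_all_milestones(owner, name, token):
    headers = {"Authorization": f"bearer {token}"}
    cursor, milestones = None, []
    while True:
        variables = {"owner": owner, "name": name, "cursor": cursor}
        resp = requests.post("https://api.github.com/graphql",
                             json={"query": QUERY, "variables": variables},
                             headers=headers)
        resp.raise_for_status()
        page = resp.json()["data"]["repository"]["milestones"]
        milestones.extend(page["nodes"])
        if not page["pageInfo"]["hasNextPage"]:
            return milestones
        cursor = page["pageInfo"]["endCursor"]

print(len(fetch_all_milestones("gatsbyjs", "gatsby", os.environ["GITHUB_TOKEN"])))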
Recently, the search list API (eventType: live) has been returning nothing.
Example: https://www.googleapis.com/youtube/v3/search?part=id%2Csnippet&eventType=live&type=video&key={YOUR_API_KEY}
returns:
{
"kind": "youtube#searchListResponse",
"etag": "\"5g01s4-wS2b4VpScndqCYc5Y-8k/jg2CTBtu0DNa8PVkxeurAMgwBzc\"",
"regionCode": "TW",
"pageInfo": {
"totalResults": 0,
"resultsPerPage": 5
},
"items": [
]
}
Does anyone have this problem too?
This API used to return live channels, but now it returns nothing.
I got results at
https://developers.google.com/youtube/v3/docs/search/list#try-it.
Sometimes it does not work; that can mean the API is being improved, or there may be a server problem. It's not a big thing.
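For what it's worth, here is a minimal sketch (Python with the requests library and a placeholder API key, not from the original answer) that issues the same search.list call so you can check pageInfo.totalResults yourself; note that eventType=live requires type=video, and adding a q term can help surface results:
# Minimal sketch: call search.list with eventType=live and print what comes back.
# API_KEY is a placeholder.
import requests

API_KEY = "YOUR_API_KEY"
params = {
    "part": "id,snippet",
    "eventType": "live",
    "type": "video",       # required when eventType is set
    "q": "news",           # optional query term
    "maxResults": 5,
    "key": API_KEY,
}
resp = requests.get("https://www.googleapis.com/youtube/v3/search", params=params)
resp.raise_for_status()
data = resp.json()
print("totalResults:", data["pageInfo"]["totalResults"])
for item in data.get("items", []):
    print(item["snippet"]["channelTitle"], "-", item["snippet"]["title"])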
Example from the home page of rethinkdb.com doesn't work as expected.
r.db("test").tableCreate("game");
r.db("test").table("game").indexCreate("score");
r.db("test").table("game").insert({name: "brandon", score: 60});
r.db("test").table("game").insert({name: "leon", score: 80});
r.db("test").table("game").insert({name: "connor", score: 100});
r.db("test").table("game").orderBy({index: "score"}).limit(3).changes()
Output:
{ "new_val": { "id": "c727b9eb-5aaa-46f9-bc09-a6c879cfbfa0" , "name":
"brandon" , "score": 60 } } { "new_val": { "id":
"b59d4314-b78c-48c9-8780-0f9d3a6b6887" , "name": "leon" , "score": 80
} } { "new_val": { "id": "519343b1-cd98-4969-8f07-7bff5d981c81" ,
"name": "connor" , "score": 100 } }
r.db("test").table("game").insert({name: "mike", score: 70});
Nothing changes, but something should change because of the ordering by score.
r.db("test").table("game").get("519343b1-cd98-4969-8f07-7bff5d981c81").update({score: 50}) // {name: "connor"}
Still nothing.
So why is the ordered list not updated as it should be?
This is a bug in the data explorer, unfortunately. It was fixed in https://github.com/rethinkdb/rethinkdb/issues/4852 and the fix will be pushed out as a point release soon. Until it's released I'd recommend using one of the drivers to test these queries instead.
This appears to be a failure of the Web interface. Running these commands from a client driver (I tried it with the JavaScript driver on 2.1.3 and 2.1.4) shows the changes as expected, it's just that the Data Explorer does not update correctly. If you swap tabs to Table View and back you can see that the cursor has received the changes at the bottom.
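Following the recommendation to test with a driver, here is a minimal sketch (Python driver, assuming a local RethinkDB server on the default port and the test database from the example) that watches the same top-3-by-score changefeed outside the Data Explorer:
# Minimal changefeed sketch using the RethinkDB Python driver.
# Assumes a server on localhost:28015 with the 'test' database from the example.
from rethinkdb import RethinkDB

r = RethinkDB()
conn = r.connect(host="localhost", port=28015, db="test")

feed = r.table("game").order_by(index="score").limit(3).changes().run(conn)
for change in feed:
    # Each change carries old_val/new_val as rows enter or leave the top 3.
    print(change)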