Azure Data Factory - How to filter out specific files in multiple Zip files?

I have set up an ADF pipeline that gets a set of .Zip files from Azure Storage and iterates through each Zip file's folders and files to land them in an output container with the hierarchy preserved.
Get Metadata:
For Each:
Issue:
The issue is that each .Zip file contains an embedded .PDF file with the same name (ASC_NTS.pdf):
This causes the following error when running the pipeline:
Error
Operation on target ForEach1 failed: Activity failed because an inner activity failed; Inner activity name: Copy data1, Error: ErrorCode=AdlsGen2OperationFailedConcurrentWrite,'Type=Microsoft.DataTransfer.Common.Shared.HybridDeliveryException,Message=Error occurred when trying to upload a file. It's possible because you have multiple concurrent copy activities runs writing to the same file 'FAERS_output/ascii/ASC_NTS.pdf'. Check your ADF configuration.,Source=Microsoft.DataTransfer.ClientLibrary,''Type=Microsoft.DataTransfer.Common.Shared.HybridDeliveryException,Message=ADLS Gen2 operation failed for: Operation returned an invalid status code 'PreconditionFailed'. Account: 'asastgssuaefdbhdg2dbc4'. FileSystem: 'curated'. Path: 'FAERS_output/ascii/ASC_NTS.pdf'. ErrorCode: 'LeaseIdMissing'. Message: 'There is currently a lease on the resource and no lease ID was specified in the request.'. RequestId: 'b21022a6-b01f-0031-641a-453ab6000000'. TimeStamp: 'Thu, 31 Mar 2022 16:15:56 GMT'..,Source=Microsoft.DataTransfer.ClientLibrary,''Type=Microsoft.Azure.Storage.Data.Models.ErrorSchemaException,Message=Operation returned an invalid status code 'PreconditionFailed',Source=Microsoft.DataTransfer.ClientLibrary,'
Is there a workaround for this pipeline setup that allows me to filter within the ForEach loop? I just need the .TXT files; the .PDF files can be discarded.
This was the closest reference I could find, but it does not address my use case:
Filter out file using wildcard path azure data factory

Have you tried using an If Condition activity? You can set the expression to check for the correct file extension.
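For example (a minimal sketch; it assumes the ForEach iterates over the child items returned by Get Metadata, and the activity names are illustrative), the If Condition expression could check the extension of the current item and only run the Copy activity in the True branch:
@endswith(toLower(item().name), '.txt')
With that in place the .PDF files are never copied, so the concurrent-write error on ASC_NTS.pdf should not occur. A Filter activity between Get Metadata and ForEach, with Items set to @activity('Get Metadata1').output.childItems and the same condition, would achieve the same result.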

Related

Reg: database is not starting up - error

I am getting the below error while starting the database:
startup
ORA-01078: failure in processing system parameters
ORA-01565: error in identifying file '+DATA/mis/PARAMETERFILE/spfile.276.967375255'
ORA-17503: ksfdopn:10 Failed to open file +DATA/mis/PARAMETERFILE/spfile.276.967375255
ORA-04031: unable to allocate 56 bytes of shared memory ("shared pool","unknown object","KKSSP^24","kglseshtSegs")
Your database cannot find the SPFILE (the newer replacement for init.ora) in ASM that holds the actual system parameters, or it has no permission to access it.
Either your Grid Infrastructure stack or the dbs/spfile.ora is pointing to the wrong file.
To find out what the Grid Infrastructure stack is using, run srvctl, which should display the parameter file name the database should be using:
srvctl config database -d <dbname>
...
Spfile: +DATA/<dbname>/PARAMETERFILE/spfile.269.1066152225
...
Then check (as the grid user) whether the file is indeed visible, using asmcmd:
asmcmd
ASMCMD> ls +DATA/<dbname>/PARAMETERFILE/
spfile.269.1066152225
If the name is different, you have found the issue (and you have to point the database to the correct file).
If the name is correct, the problem could be wrong permissions on the oracle executable(s) (check My Oracle Support):
RAC Database Can't Start: ORA-01565, ORA-17503: ksfdopn:10 Failed to open file +DATA/BPBL/spfileBPBL.ora (Doc ID 2316088.1)
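For illustration (a hedged sketch; exact srvctl flags vary between Grid Infrastructure versions, and the file name below is the one from the srvctl output above), repointing the registered spfile and checking the executable permissions could look like:
srvctl modify database -d <dbname> -p '+DATA/<dbname>/PARAMETERFILE/spfile.269.1066152225'
ls -l $ORACLE_HOME/bin/oracle   # typically shows -rwsr-s--x with owner oracle:oinstall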

Feature type rename failed after 9

I'm trying to upload a shapefile to GeoServer, and I get this error when I upload the same feature type 10 times into different datastores.
I get this warning:
WARN [rest.catalog] - Feature type surface_zone_line-line already exists in namespace MyWorkSpace, attempting to rename
And on the next line this error is shown:
ERROR java.lang.RuntimeException: java.lang.IllegalArgumentException: Resource named 'surface_zone_line-line9' already exists in namespace: 'MyWorkSpace'
The renaming worked for up to 9 feature types, but it didn't work for the 10th.
Please help!
GeoServer Version 2.14.1

Error in copy data ADF: FTP failed to get file length

While trying to build a pipeline that contains:
Get Metadata --> Filter --> ForEach --> Copy data
I have been facing the below error while trying to retrieve the source file:
ErrorCode=FtpFailedToGetFileLength,'Type=Microsoft.DataTransfer.Common.Shared.HybridDeliveryException,Message=FTP
failed to get file length 'file_01/', file
'Filename_20201113_DATE_20211102_CODE01.DAT'
Microsoft.DataTransfer.ClientLibrary,''Type=System.Net.WebException,Message=The
remote server returned an error: (550) File unavailable (e.g., file
not found, no access).,Source=System,'
Is there any explanation of why this error appears, and how can I fix it?

NiFi PutFile processor doesn't save file to a directory

In my NiFi workflow I need to download a .zip file from a SOAP web server, save it on the machine (optional), and unpack its content to a sub-folder. Everything works on my local Win 10 machine, but issues occur when I try to move to a remote Linux server. Here is the part of my flow where the error happens:
So we have a FlowFile entering UpdateAttribute, where the filename attribute is set to the required name with a .zip extension. The file is correct, as can be seen in the queue after starting the processor.
Problems start to happen when I pass the FlowFile to the PutFile processor. I tried different scenarios based on the selected directory:
Relative to the NiFi main folder, ./out:
12:30:01 MSK ERROR
PutFile[id=05788ae5-64e5-32af-bb40-88d50d4c886c] Penalizing StandardFlowFileRecord[uuid=3e0c5e38-76f8-4ce3-b911-90f6901c35a4,claim=StandardContentClaim [resourceClaim=StandardResourceClaim[id=1586337594333-49, container=default, section=49], offset=2335, length=375628606],offset=0,name=,size=375628606] and transferring to failure due to /opt/nifi/nifi-1.11.4/./out: java.nio.file.DirectoryNotEmptyException: /opt/nifi/nifi-1.11.4/./out
12:30:01 MSK ERROR
PutFile[id=05788ae5-64e5-32af-bb40-88d50d4c886c] Unable to remove temporary file /opt/nifi/nifi-1.11.4/./out/. due to /opt/nifi/nifi-1.11.4/./out/.: Invalid argument: java.nio.file.FileSystemException: /opt/nifi/nifi-1.11.4/./out/.: Invalid argument
Full path /opt/nifi/nifi-1.11.4/out/file/ :
12:32:45 MSK ERROR
PutFile[id=0171102b-c82d-149d-c9ae-ea4da99b1750] Penalizing StandardFlowFileRecord[uuid=0573803f-8407-46e4-93f0-e52a5fc35a07,claim=StandardContentClaim [resourceClaim=StandardResourceClaim[id=1586337594333-49, container=default, section=49], offset=2335, length=375628606],offset=0,name=,size=375628606] and transferring to failure due to Failed to export StandardFlowFileRecord[uuid=0573803f-8407-46e4-93f0-e52a5fc35a07,claim=StandardContentClaim [resourceClaim=StandardResourceClaim[id=1586337594333-49, container=default, section=49], offset=2335, length=375628606],offset=0,name=,size=375628606] to /opt/nifi/nifi-1.11.4/out/file/. due to java.io.FileNotFoundException: /opt/nifi/nifi-1.11.4/out/file/. (No such file or directory): org.apache.nifi.processor.exception.FlowFileAccessException: Failed to export StandardFlowFileRecord[uuid=0573803f-8407-46e4-93f0-e52a5fc35a07,claim=StandardContentClaim [resourceClaim=StandardResourceClaim[id=1586337594333-49, container=default, section=49], offset=2335, length=375628606],offset=0,name=,size=375628606] to /opt/nifi/nifi-1.11.4/out/file/. due to java.io.FileNotFoundException: /opt/nifi/nifi-1.11.4/out/file/. (No such file or directory)
So it adds a dot ('.') to the path, which causes the exception. All folders are created and permissions are granted. I tried to run a simple test flow with a 42 B file and the same path (GenerateFlowFile -> PutFile) and everything was OK.
What am I doing wrong?
The problem was with Linux filesystem permissions: there was a 'nifi' user for flow execution and another user for accessing the Linux filesystem.
Assigning rwxrwxrwx to the folders used solved the issue.
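For reference (a minimal sketch; the path is taken from the error messages above and the user/group name is an assumption), granting the NiFi service user access could look like:
sudo chown -R nifi:nifi /opt/nifi/nifi-1.11.4/out
sudo chmod -R 775 /opt/nifi/nifi-1.11.4/out
chmod -R 777 (rwxrwxrwx) also works, as in the fix above, but is broader than necessary.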

Oracle Data Integrator SQL to HDFS IKM returns error

I am using ODI (12.1.3.0.0). I created a topology for the Oracle DB, which is OK, and a topology for HDFS using the File technology, which is where I think the problem is.
In the DataServer for HDFS, I left the JDBC driver empty and filled the JDBC URL with hdfs://remotehostname:port.
In the Physical Schema for HDFS, I filled both Schema and Work Schema with /my/path.
Then I created a Logical Schema and a Model. After that I created a Datastore under the model with these definitions:
Name: TestName
Resource Name: TESTFILE.txt
File Format: Fixed
After all this, I created a project and a mapping under the project.
Finally, when I run the mapping I see these errors:
ODI-1217: Session Oracle2HDFSMapping_Physical_SESS (15) fails with return code ODI-1298.
ODI-1226: Step Physical_STEP fails after 1 attempt(s).
ODI-1240: Flow Physical_STEP fails while performing a Add execute to Sqoop script-IKM SQL to HDFS File (Sqoop)- operation. This flow loads target table null.
ODI-1298: Serial task "SERIAL-MAP_MAIN- (10)" failed because child task "SERIAL-EU-GGUSER_UNIT (20)" is in error.
ODI-1298: Serial task "SERIAL-EU-GGUSER_UNIT (20)" failed because child task "Add execute to Sqoop script-IKM SQL to HDFS File (Sqoop)- (40)" is in error.
Caused By: java.io.IOException: Cannot run program "chmod": CreateProcess error=2, The system cannot find the file specified
at java.lang.ProcessBuilder.start(ProcessBuilder.java:1047)
at java.lang.Runtime.exec(Runtime.java:617)
at java.lang.Runtime.exec(Runtime.java:450)
at java.lang.Runtime.exec(Runtime.java:347)
at oracle.odi.runtime.agent.execution.cmd.OSCommandExecutor.execute(OSCommandExecutor.java:54)
at oracle.odi.runtime.agent.execution.cmd.OSCommandExecutor.execute(OSCommandExecutor.java:29)
at oracle.odi.runtime.agent.execution.TaskExecutionHandler.handleTask(TaskExecutionHandler.java:52)
at oracle.odi.runtime.agent.execution.SessionTask.processTask(SessionTask.java:203)
at oracle.odi.runtime.agent.execution.SessionTask.doExecuteTask(SessionTask.java:114)
at oracle.odi.runtime.agent.execution.AbstractSessionTask.execute(AbstractSessionTask.java:886)
at oracle.odi.runtime.agent.execution.SessionExecutor$SerialTrain.runTasks(SessionExecutor.java:2198)
at oracle.odi.runtime.agent.execution.SessionExecutor.executeSession(SessionExecutor.java:591)
at oracle.odi.runtime.agent.processor.TaskExecutorAgentRequestProcessor$1.doAction(TaskExecutorAgentRequestProcessor.java:718)
at oracle.odi.runtime.agent.processor.TaskExecutorAgentRequestProcessor$1.doAction(TaskExecutorAgentRequestProcessor.java:611)
at oracle.odi.core.persistence.dwgobject.DwgObjectTemplate.execute(DwgObjectTemplate.java:203)
at oracle.odi.runtime.agent.processor.TaskExecutorAgentRequestProcessor.doProcessStartAgentTask(TaskExecutorAgentRequestProcessor.java:800)
at oracle.odi.runtime.agent.processor.impl.StartSessRequestProcessor.access$1400(StartSessRequestProcessor.java:74)
at oracle.odi.runtime.agent.processor.impl.StartSessRequestProcessor$StartSessTask.doExecute(StartSessRequestProcessor.java:702)
at oracle.odi.runtime.agent.processor.task.AgentTask.execute(AgentTask.java:180)
at oracle.odi.runtime.agent.support.DefaultAgentTaskExecutor$2.run(DefaultAgentTaskExecutor.java:108)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.IOException: CreateProcess error=2, The system cannot find the file specified
at java.lang.ProcessImpl.create(Native Method)
at java.lang.ProcessImpl.<init>(ProcessImpl.java:385)
at java.lang.ProcessImpl.start(ProcessImpl.java:136)
at java.lang.ProcessBuilder.start(ProcessBuilder.java:1028)
... 20 more
I wonder where I went wrong?
For a file Datastore, you need to define the attributes (columns) by opening the Datastore and going to the Attributes tab. If the file already exists, you can reverse-engineer the attributes, rename them, and change the datatypes if needed.
The error message you received for the second task mentions that the file (generated in the first task) does not exist. So there might be a problem with the first task, probably due to the missing attributes in your Datastore.
Here is a detailed article about SQL To HDFS file (Sqoop) KM written by the ODI A-Team : http://www.ateam-oracle.com/importing-data-from-sql-databases-into-hadoop-with-sqoop-and-oracle-data-integrator-odi/
