Apache NiFi GetTwitter

I have a simple question, as I am new to NiFi.
I have a GetTwitter processor set up and configured (correctly, I assume), with the Twitter Endpoint set to Sample Endpoint. The processor runs, but nothing happens: I get no input or output.
How do I troubleshoot what it is doing (or in this case not doing)?

A couple things you might look at:
What activity does the processor show? You can look at the metrics to see whether anything has been attempted (Tasks/Time) and whether it succeeded (Out).
Stop the downstream processor temporarily to make any output FlowFiles visible in the connection queue.
Are there errors? Typically these appear in the top-left corner as a yellow icon.
Are there related messages in the logs/nifi-app.log file?
It might also help us help you if you describe the GetTwitter Property settings a bit more. Can you share a screenshot (minus keys)?

In my case it was because there are two sensitive values set. According to the documentation, when a sensitive value is set, the nifi.sensitive.props.key value in the nifi.properties file must be set; it is an empty string by default in the Hortonworks Data Platform distribution. I set it to some random string (literally random_STRING, but you can use anything), re-created my process from the template, and it began working.
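As a rough illustration of that check, here is a minimal Python sketch that reads the text of a nifi.properties file and reports whether nifi.sensitive.props.key is empty. The function name and the sample file contents are made up for this example; only the property name comes from the answer above.

```python
def sensitive_props_key(properties_text):
    """Return the value of nifi.sensitive.props.key, or None if the key is absent."""
    for line in properties_text.splitlines():
        line = line.strip()
        # Skip comments and anything that is not a key=value pair.
        if line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        if key.strip() == "nifi.sensitive.props.key":
            return value.strip()
    return None


# Hypothetical file contents; a real file has many more entries.
sample = "# security properties\nnifi.sensitive.props.key=\nnifi.web.http.port=8080\n"

value = sensitive_props_key(sample)
if not value:
    print("nifi.sensitive.props.key is missing or empty")
```

To check a real installation, pass in the contents of conf/nifi.properties read with open(...).read().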

In general, I suppose this kind of problem can be debugged by setting the log level to DEBUG.
However, in my case the issue was resolved more easily:
I had just set up a new cluster and decided to copy all the Twitter keys and secrets to Notepad first.
It turned out that, despite my carefully copying the keys from Twitter, one of them had a leading tab. When pasted directly into the GetTwitter processor, the tab did not show, but fortunately it showed up in Notepad, so I was able to remove it and make this work.
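A quick way to catch this kind of problem without relying on an editor is to compare each pasted value with its stripped form. The sketch below is only an illustration: the helper name and the credential values are placeholders, and the dictionary keys simply mirror the GetTwitter property names.

```python
def find_stray_whitespace(credentials):
    """Return {name: repr(value)} for values with leading or trailing whitespace."""
    problems = {}
    for name, value in credentials.items():
        if value != value.strip():
            # repr() makes tabs and spaces visible, e.g. '\tdef456'.
            problems[name] = repr(value)
    return problems


# Placeholder values; the second one carries a leading tab.
pasted = {
    "Consumer Key": "abc123",
    "Consumer Secret": "\tdef456",
}

for name, shown in find_stray_whitespace(pasted).items():
    print(f"{name} has stray whitespace: {shown}")
```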

Related

NiFi: ListFile Processor is not detecting file changes sometimes

The ListFile processor is not detecting changes to a previously processed file, so it does not reprocess it. I have already tried the following options for reprocessing, and only the hack mentioned last works. This is a single-node NiFi running in my development environment.
Update scenario: the ListFile processor does not detect file content changes and does not trigger automatically after an update (i.e. file updates made in the Vim editor).
Timestamp modification scenario: changing the file timestamp with touch -c changes the timestamp, but this does not trigger the ListFile processor either.
Stop-start scenario: stopping and starting the whole process group in NiFi after changing the file as described above does not trigger the ListFile processor either.
Waiting: waiting long enough after the file change does not help either, in case we assume it would auto-trigger after some delay.
Hack: the only way I can trigger reprocessing of the file by the ListFile processor is to change the wildcard expression for "File Filter" in a harmless, idempotent manner, for example from .*test.*\.csv to test.*\.csv and back again later (i.e. going back and forth like this for repeated reprocessing).
Reprocessing files that keep the same names but contain modified data is a requirement for us.
Sometimes forced reprocessing of even an unmodified file could be required, in case of unanticipated data issues upstream or downstream. Please help!
UPDATE
Still facing this sporadic behavior! Only a restart of NiFi helps when the ListFile processor fails to respond to a file change.
This is probably a late answer.
The older List processors such as ListFile, ListFTP and ListSFTP used only a timestamp-tracking strategy to identify changed files. The processor cached the last-seen timestamp in its processor state and listed only files with a greater timestamp.
However, this approach was very buggy, so a much better strategy called Entity Tracking was introduced. It monitors a broad range of file changes by keeping track of the following parameters of each file in the specified directory:
Name
Size
Last modified timestamp
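Any change to a file is reflected in these key parameters. Since they are cached, any difference is treated as a change, and the changed files appear in the success connection. In recent NiFi versions this is selected with the processor's Listing Strategy property (Tracking Entities).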
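To make the idea concrete, here is a small Python sketch of entity tracking on a local directory. It is not NiFi's implementation, only an illustration of the principle: cache name, size and last-modified time, then report any file whose cached values differ. The function names are invented for the example.

```python
import os
import tempfile


def snapshot(directory):
    """Map each file name to its (size, last-modified time in nanoseconds)."""
    entries = {}
    for entry in os.scandir(directory):
        if entry.is_file():
            info = entry.stat()
            entries[entry.name] = (info.st_size, info.st_mtime_ns)
    return entries


def changed_entities(previous, current):
    """Names that are new, or whose size or timestamp differs from the cache."""
    return sorted(name for name, meta in current.items() if previous.get(name) != meta)


with tempfile.TemporaryDirectory() as directory:
    path = os.path.join(directory, "test.csv")
    with open(path, "w") as handle:
        handle.write("a,b\n")
    cached = snapshot(directory)

    # Append a row: the size changes even if the timestamp resolution is coarse.
    with open(path, "a") as handle:
        handle.write("1,2\n")

    listed = changed_entities(cached, snapshot(directory))
    print(listed)
```

Because the size is part of the cached entity, the appended file is reported even when the modification timestamp alone would not distinguish the two versions.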

NiFi: how to avoid copying files that are partially written

I am trying to use NiFi to get a file from an SFTP server. The file can potentially be big, so my question is how to avoid getting the file while it is still being written. I am planning to use ListSFTP + FetchSFTP, but I am also okay with GetSFTP if it can avoid copying partially written files.
Thank you
In addition to Andy's solid answer, you can also be a bit more flexible by using the ListSFTP/FetchSFTP processor pair and doing some metadata-based routing.
After ListSFTP, each flowfile will have attributes such as 'file.lastModifiedTime' and others. You can read about them here: https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi/nifi-standard-nar/1.3.0/org.apache.nifi.processors.standard.ListSFTP/index.html
You can put a RouteOnAttribute processor between the List and Fetch to detect objects that, at least based on the reported last modified time, are 'too new'. You could route those to a processor that is just a slow pass-through to intentionally wait a bit. You can then run those back through the first router until they are 'old enough'. Now, this is admittedly a power-user approach, but it does give you a lot of flexibility and control. The approach I'm mentioning here is not foolproof, as the source system may not report the last modified time correctly, an old timestamp may not mean the source file is done being written, etc. But it gives you additional options if you cannot do the definitely correct thing that Andy describes.
If you have control over the process that writes the file, a common pattern to solve this is to initially write the file with a specific naming structure, such as a name beginning with a dot (.). After the write completes successfully, the file is renamed without the dot and is then picked up by the processor. Both GetSFTP and ListSFTP have a processor property called Ignore Dotted Files, which is set to true by default and means those processors will not operate on or return files beginning with the dot character.
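The writer side of that pattern might look like the following Python sketch. It uses a local temporary directory as a stand-in for the SFTP drop folder, and the function names are invented for the example; a real uploader would perform the same write-then-rename steps on the server.

```python
import os
import tempfile


def publish(directory, name, data):
    """Write under a dotted name first, then rename once the write has finished."""
    hidden = os.path.join(directory, "." + name)
    final = os.path.join(directory, name)
    with open(hidden, "wb") as handle:
        handle.write(data)
    # The rename only happens after the file is completely written and closed.
    os.replace(hidden, final)
    return final


def visible_files(directory):
    """What a lister that ignores dotted files would see."""
    return sorted(n for n in os.listdir(directory) if not n.startswith("."))


with tempfile.TemporaryDirectory() as directory:
    # Simulate another upload that is still in progress under a dotted name.
    with open(os.path.join(directory, ".pending.csv"), "wb") as handle:
        handle.write(b"partial")

    publish(directory, "done.csv", b"a,b\n1,2\n")
    visible = visible_files(directory)
    print(visible)
```

Only the fully written file shows up in the listing; the in-progress one stays hidden until its writer renames it.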
There is a Minimum File Age property you can use. The last modification time gets updated as the file is being written, so setting this value to something other than 0 will help fix the problem.
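Both the minimum-age setting and the 'too new' routing described earlier come down to the same comparison, sketched here in Python. The function name, the 60-second threshold and the sample listing are assumptions for illustration only, not NiFi code.

```python
import time


def is_old_enough(last_modified_s, min_age_s, now_s=None):
    """True if the file has not been modified for at least min_age_s seconds."""
    if now_s is None:
        now_s = time.time()
    return (now_s - last_modified_s) >= min_age_s


# Hypothetical listing: file name -> last-modified time in epoch seconds.
now = 1_000_000.0
listing = {"done.csv": now - 120, "still-writing.csv": now - 5}

ready = sorted(name for name, modified in listing.items()
               if is_old_enough(modified, 60, now))
print(ready)
```

Files that fail the check are simply picked up on a later listing, once their last-modified time has stopped moving for long enough.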

Set Perforce MaxResults from command line

I am getting cryptic error messages from a Perforce server. I am checking out a depot.
p4 sync
Request too large (over 3000000); see 'p4 help maxresults'.
I understand the issue, but p4 help maxresults is a zero-content man page: at no point does it say how I can set this limit on the client.
export MaxResults=3000000
Is there any way to check out the depot?
In general, you can't override max results; it's been set that way for a reason by your administrator. Here's the documentation: http://www.perforce.com/perforce/doc.current/manuals/p4sag/chapter.performance.html#d0e20714
Do you really need to sync more than 3 million files? If so, you may need to ask your administrator to add you to a user group which has a higher resource limit.
Alternatively, you can reduce the scope of your workspace by defining a more precise view mapping in your workspace definition. Rather than
//depot/... //my-client/...
set your workspace view mapping to something more like:
//depot/main/my-project/... //my-client/main/my-project/...
That way, you will only sync the portion of the repository that is actually necessary for your project.

Modify the default WorkManager in WebSphere 7 using a wsadmin script

I want to raise the maximum number of threads in the default work manager's thread pool using a wsadmin (Jython) script. What is the best approach?
I can't seem to find documentation of a fine-grained control that would let me modify just this property. The closest I can find to what I want is AdminTask.applyConfigProperties, which requires passing a file. The documentation explains that if you want to modify an existing property, you must extract the existing properties file, edit it in an editor, and then pass the edited file to applyConfigProperties.
I want to avoid the manual step of extracting the existing properties file and editing it. The script needs to run completely unattended. In fact, I'd prefer not to use a file at all, but just set the property to a value directly in the script.
Something like the following pseudo-code:
defaultwmId = AdminConfig.getid("wm/default")
AdminTask.setProperty(defaultwmId, ['-propertyName', maxThreads, '-propertyValue', 20])
The following represents a fairly simplistic wsadmin approach to updating the max threads on the default work managers:
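# Look up every default work manager across all scopes.
workManagers = AdminConfig.getid("/WorkManagerInfo:DefaultWorkManager/").splitlines()
for workManager in workManagers:
    # Set the maximum thread pool size to 20.
    AdminConfig.modify(workManager, '[[maxThreads "20"]]')
# Persist the change to the master configuration.
AdminConfig.save()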
Note that the getid call retrieves all of the default work managers across all scopes, so if you want to choose only one (for example, if you only want to modify a particular application server's or cluster's work manager properties), you will need to refine the containment path further. Also, you may need to synchronize the nodes and restart the modified servers for the property to be applied at runtime.
More information on the use of the AdminConfig scripting object can be found in the WAS InfoCenter:
http://publib.boulder.ibm.com/infocenter/wasinfo/v7r0/index.jsp?topic=/com.ibm.websphere.nd.doc/info/ae/ae/rxml_adminconfig1.html

JMeter - saving results + configuring "graph results" time span

I am using JMeter and have 2 questions (I have read the FAQ + Wiki etc):
1. I use the Graph Results listener. It seems to have a fixed span, e.g. 2 hours (just guessing - this is not indicated anywhere AFAIK), after which it wraps around and starts drawing on the same canvas from the left again. Hence after a long weekend run it only shows the results of the last 2 hours. Can I configure that span or other properties (beyond the check boxes I see on the Graph Results listener itself)?
2. Can I save the results of a run and later open them? I know I can save the test plan or parts of it. I am unclear whether I can separately save just the test results data, and later open them and perform comparisons etc. Furthermore, can I open them with different listeners even if they weren't part of the original test (i.e. I think of the test as accumulating data, and later on I want to view and interpret the data using different "viewers")?
Thanks,
-- Shaul
Don't know about 1. Regarding 2: listeners typically have a configuration field for "Write All Data to a File", which lets you specify the file name. You can use the Simple Data Writer to store results efficiently for later analysis.
You can load results from a previous test into a visualizer by choosing "Write All Data to a File" and browsing for the file you wish to load. Somewhat counterintuitively, selecting a file for writing also loads that file into the visualizer and displays the results. Just make sure you don't run the test again while that file is selected, otherwise you will lose your saved test data. :-)
Well, I later found a JMeter group that was discussing the issue raised in my first question, and B.Ramann gave me an excellent suggestion: to use a better graph instead, found here.
-- Shaul
