Use sql_last_value for more than one file: Logstash - Elasticsearch

I have a Logstash JDBC config file which includes the query below:
statement => "SELECT * from TEST where id > :sql_last_value"
Suppose I have 2 or more conf files, how do I differentiate my sql_last_value values from each other?
Can I give an alias to differentiate them? How?

The idea is to configure a different last_run_metadata_path value in each configuration file. For instance:
Configuration file 1:
input {
  jdbc {
    ...
    last_run_metadata_path => "/Users/me/.logstash_jdbc_last_run1"
    ...
  }
}
Configuration file 2:
input {
  jdbc {
    ...
    last_run_metadata_path => "/Users/me/.logstash_jdbc_last_run2"
    ...
  }
}
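For completeness, a minimal sketch of what configuration file 1 could then look like, combining the statement from the question with its own metadata path (the use_column_value and tracking_column settings are an assumption, only needed if :sql_last_value should track the id column instead of the last run timestamp):
input {
  jdbc {
    ...
    statement => "SELECT * from TEST where id > :sql_last_value"
    use_column_value => true
    tracking_column => "id"
    last_run_metadata_path => "/Users/me/.logstash_jdbc_last_run1"
    ...
  }
}
Each file then keeps its own sql_last_value, because the value is persisted in that file's own last_run_metadata_path.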

Val's answer is the correct way to implement separate .logstash_jdbc_last_run files.
On top of this I want to give you some hints on implementing multiple jdbc input plugins in the same pipeline:
Keep in mind that when one input plugin throws errors (for example, the query isn't correct or the db user has no grants), the whole pipeline stops - not only the respective input plugin. That means it can block your other input plugins (which may be working fine).
So a common way to avoid this is to specify multiple pipelines with only one jdbc input plugin each. You can then decide whether to copy the rest of the plugins (filters and output) into each pipeline, or to send the incoming events to a central processing pipeline with the pipeline output plugin.
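A minimal pipelines.yml sketch of that layout (the pipeline ids and config paths are placeholders, and it assumes a Logstash version with pipeline-to-pipeline communication, 6.5 or later):
- pipeline.id: jdbc-input-1
  path.config: "/etc/logstash/conf.d/jdbc1.conf"
- pipeline.id: jdbc-input-2
  path.config: "/etc/logstash/conf.d/jdbc2.conf"
- pipeline.id: central-processing
  path.config: "/etc/logstash/conf.d/processing.conf"
Each jdbc pipeline would then end with output { pipeline { send_to => ["central-processing"] } } and the processing pipeline would start with input { pipeline { address => "central-processing" } }, so a failing jdbc input only stops its own pipeline.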

Related

Spring Integration SFTP - issue with filters and number of messages emitted

I started using Spring Integration SFTP and I have some questions.
Filters are not working. I have this example configuration:
Sftp.inboundAdapter(ftpFileSessionFactory())
        .preserveTimestamp(true)
        .deleteRemoteFiles(false)
        .remoteDirectory(integrationProperties.getRemoteDirectory())
        .filter(sftpFileListFilter()) // doesn't work
        .patternFilter("*.xlsx") // doesn't work
And my ChainFileListFilter:
private ChainFileListFilter<ChannelSftp.LsEntry> sftpFileListFilter() {
    ChainFileListFilter<ChannelSftp.LsEntry> chainFileListFilter = new ChainFileListFilter<>();
    chainFileListFilter.addFilter(new SftpPersistentAcceptOnceFileListFilter(metadataStore(), "INT"));
    chainFileListFilter.addFilter(new SftpSimplePatternFileListFilter("*.xlsx"));
    return chainFileListFilter;
}
If I understand correctly, only XLSX files should be saved in the local directory. If so, it doesn't work with this configuration. Am I doing something wrong or have I misunderstood this?
How can I configure SFTP so that each downloaded file emits a message? I see two params in the docs, max-messages-per-poll and max-fetch-size, but I don't know how to set them up so that every file emits a message. I would like to sync files once every 24 hours and produce a batch job queue. Maybe there is a workaround?
Is there a built-in filter which allows me to fetch only files with changed content? The best solution would be to check the checksums of the files.
I will be grateful for your help and explanations.
You cannot combine filter() and patternFilter(). Only one of them can be used: the last one overrides whatever you used before. In other words: either filter() or patternFilter(), not both. By default the logic is like this:
public SftpInboundChannelAdapterSpec patternFilter(String pattern) {
    return filter(composeFilters(new SftpSimplePatternFileListFilter(pattern)));
}

private CompositeFileListFilter<ChannelSftp.LsEntry> composeFilters(FileListFilter<ChannelSftp.LsEntry> fileListFilter) {
    CompositeFileListFilter<ChannelSftp.LsEntry> compositeFileListFilter = new CompositeFileListFilter<>();
    compositeFileListFilter.addFilters(fileListFilter,
            new SftpPersistentAcceptOnceFileListFilter(new SimpleMetadataStore(), "sftpMessageSource"));
    return compositeFileListFilter;
}
So, technically you don't need your custom one if you don't use an external persistent MetadataStore. But if you do, think about flipping SftpSimplePatternFileListFilter with SftpPersistentAcceptOnceFileListFilter, since it is better to check the pattern before storing the file in the MetadataStore.
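A minimal sketch of that flipped order, reusing the metadataStore() bean and the filter values from the question (a sketch, not a definitive implementation):
private ChainFileListFilter<ChannelSftp.LsEntry> sftpFileListFilter() {
    ChainFileListFilter<ChannelSftp.LsEntry> chainFileListFilter = new ChainFileListFilter<>();
    // check the file name pattern first ...
    chainFileListFilter.addFilter(new SftpSimplePatternFileListFilter("*.xlsx"));
    // ... and only then remember accepted files in the external MetadataStore
    chainFileListFilter.addFilter(new SftpPersistentAcceptOnceFileListFilter(metadataStore(), "INT"));
    return chainFileListFilter;
}
You would then pass it only via .filter(sftpFileListFilter()) and drop the .patternFilter("*.xlsx") line.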
It is a fact that every synced remote file that passed those filters is stored in the local dir, and the message for that local file is emitted immediately when the poller does a request.
The maxFetchSize plays its role when we load remote files into the local dir. The maxMessagesPerPoll is used by the poller, but those messages are already built from the local files. The message is emitted per local file, not as a batch for all of them. That's not what messaging is designed for.
Please share more info about what does not work with the files. The SftpPersistentAcceptOnceFileListFilter checks not only the file name, but also the mtime of the file. So it is not about any checksum, but rather about the last-modified timestamp of the file.

How can I globally set Oracle fetch size in my Scala Play (Anorm) application?

Our DBAs would like me to increase the fetch size from the JDBC's default (10). Is there a way to do this globally via application.conf, JDBC URL or similar?
My DB calls essentially look like
object SomeController extends Controller {
  def someMethod(acronym: String) = Action { implicit request =>
    DB.withConnection { implicit c =>
      val cust = SQL("""select whatever.... where acronym = {acronym}""").on("acronym" -> acronym).apply()
But there's a lot of them over many controllers and methods.
What can be done to have a central setting?
defaultRowPrefetch is an Oracle JDBC driver property that can be set to change it from the default of 10 (Table 4-2, Connection Properties Recognized by Oracle JDBC Drivers).
While not explicitly documented, it looks like custom JDBC properties are done under the datasource key (see this and this)
So something like db.default.datasource.defaultRowPrefetch="100" should work.
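A minimal application.conf sketch of that idea (the driver class, the placeholder URL and the db.default pool key are assumptions based on a standard Play setup; the datasource passthrough is the behaviour referenced above):
# conf/application.conf
db.default.driver=oracle.jdbc.OracleDriver
db.default.url="jdbc:oracle:thin:@//dbhost:1521/SERVICE"
db.default.datasource.defaultRowPrefetch=100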
After some searching through the Oracle JDBC jar, I found:
ojdbc6-unjar $ cat ./oracle/jdbc/defaultConnectionProperties.properties
# This properties file sets the default value for connection properties.
# Entries in this file override the predefined defaults as specified
# in the JavaDoc for oracle.jdbc.OracleConnection. These defaults are
# themselves overridden by any values set via -D which are overridden
# by values passed in the Properties argument to getConnection.
#
This bit and the Javadoc do a very bad job of explaining how to derive the actual parameter name, but after many tries of various case styles, package names, etc., I found this to work:
JAVA_OPTS="-Doracle.jdbc.defaultRowPrefetch=1000" \
./activator -Dconfig.file=conf/xe.conf run
This will make BoneCP use a reasonable fetch size without any code change.

How to access ruby code variables in the output section of logstash conf

I am working to create dynamic logstash buckets based on date formulas. My objective is to be able to dynamically calculate the date of a logstash bucket based on a defined variable in the incoming log file.
For this, I am currently testing with a single .conf file that contains the input, filter (with ruby code) and output section. I am pushing the output to my elasticsearch setup. I have worked out the formulas and tested the same in regular ruby through 'irb' and the formulas are working as expected.
I am lost when it comes to accessing, in the output section, a variable which is set in the filter section.
I have successfully used the following syntax in the output section to reference the year/month/date:
output {
  elasticsearch {
    hosts => [ "localhost:9200" ]
    user => elastic
    password => "bar"
    index => "foo-%{+YYYY.MM.dd}"
  }
}
I would try the "%{variable}" syntax, i.e. set a field on the event in your ruby filter and then reference that field from the output section.
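A minimal sketch of that idea, assuming the Logstash 5+ ruby filter event API; the bucket_date field and the retention_days source field are hypothetical placeholders for whatever your formula computes:
filter {
  ruby {
    code => "
      # hypothetical: 'retention_days' is a field parsed from the incoming log line
      bucket = (Time.now + (event.get('retention_days').to_i * 86400)).strftime('%Y.%m.%d')
      event.set('bucket_date', bucket)
    "
  }
}
output {
  elasticsearch {
    hosts => [ "localhost:9200" ]
    user => "elastic"
    password => "bar"
    index => "foo-%{bucket_date}"
  }
}
Anything set with event.set('name', value) in the filter block can be referenced as %{name} in the output section.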

assign variables directly to build in jenkins

I am new to Jenkins. I have a wrapper script that runs overnight in Jenkins.
This wrapper script takes input from a .CSV file which contains a list of projects. I had to invoke it this way: ./wrapper_script project.csv
This has one problem, i.e., it runs all the projects in one single build, but my requirement is to run one build per project. I have already installed the necessary plugins.
How can I give the project.csv content as input to the build where I will trigger wrapper_script.sh directly?
Have a look at the Job DSL Plugin. You could create a seed job that reads the CSV file, iterates over the records, and creates a job for each record. If you need a more detailed code example, please include sample data from your CSV file.
Ok. Given that the CSV you provided is so simple, you could skip using a CSV library. Your Job DSL seed job would be something like this:
new File('project.csv').splitEachLine(',') { fields ->
    job(fields[0]) {
        steps {
            shell("your build command " + fields[1])
        }
    }
}

Trying to import csv data into elasticsearch and then visualise it in Kibana

I am trying to import the CSV-formatted data below into Elasticsearch.
Below is my SampleCSV.csv file:
Date,Amount,Type,Db/Cr,Desc
4/1/2015,10773,Car M&F,Db,Insurance
4/1/2015,900,Supporting Item,Db,Wifi Router
4/1/2015,1000,Car M&F,Db,Fuel
4/1/2015,500,Car M&F,Db,Stepni Tyre Tube
4/1/2015,770,FI,Db,FI
4/1/2015,65,FI,Db,Milk
I am using the configuration below:
input {
  stdin {
    type => "sample"
  }
  file {
    path => "C:/Users/Deepak/Desktop/SampleCSV.csv"
    start_position => "beginning"
  }
}
filter {
  csv {
    columns => ["Date","Amount","Type","Db/Cr","Desc"]
    separator => ","
  }
}
output {
  elasticsearch {
    action => "index"
    host => "localhost"
    index => "sample"
    workers => 1
  }
  stdout {
    debug => true
  }
}
I am executing the command below:
C:\MyDrive\Apps\logstash-2.0.0\bin>logstash agent -f C:\MyDrive\Apps\logstash-2.0.0\bin\MyLogsConfig.conf
io/console not supported; tty will not be manipulated
Default settings used: Filter workers: 2
Logstash startup completed
Now my problem is that when I look in Kibana at the "sample" index I am not getting any data at all. It looks like no data was imported into Elasticsearch, and thus Kibana is not getting anything.
Do you know the reason why?
Several things to take into account. You can split debugging your problem into two parts:
First, you want to make sure that Logstash is picking up the file contents as you would expect it to. For that you can remove/comment out the elasticsearch output. Restart Logstash and add a new line of data to your SampleCSV.csv file (don't forget the newline/CR at the end of the new line, otherwise it won't be picked up). If Logstash picks up the new line it should appear in your console output (because you added the stdout output). Don't forget that the file input remembers where it last read from a logfile and continues reading from that position (it stores this index in a special file called sincedb). start_position => "beginning" only works the first time you start Logstash; on subsequent runs it will start reading from where it last ended, meaning you won't see any new lines in your console unless you a) add new lines to your file or b) manually delete the sincedb file (sincedb_path => null was not working under Windows, at least when I last tried).
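A minimal sketch of such a stripped-down debugging config, reusing the path and columns from the question (the explicit sincedb_path is an assumption, pointed at a throwaway file so the CSV can be re-read from the beginning):
input {
  file {
    path => "C:/Users/Deepak/Desktop/SampleCSV.csv"
    start_position => "beginning"
    sincedb_path => "C:/Users/Deepak/Desktop/SampleCSV.sincedb"
  }
}
filter {
  csv {
    columns => ["Date","Amount","Type","Db/Cr","Desc"]
    separator => ","
  }
}
output {
  stdout { codec => rubydebug }
}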
As soon as you get that working you can start looking at the Elasticsearch/Kibana side of things. Looking at your config, I'd suggest you stick with the default index name pattern (logstash-XXX, where XXX is something like the creation date). Notice: DON'T USE UPPERCASE LETTERS in your index name or Elasticsearch will refuse your data. You should use the default pattern because, afaik, the special Logstash index mapping (e.g. raw fields, ...) is registered for that name pattern only. If you change the name you'd have to change the mapping too.
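For instance, a sketch of an output section using the default-style index name (note that this assumes the Logstash 2.x elasticsearch output, where the option is hosts rather than host):
output {
  elasticsearch {
    hosts => ["localhost"]
    index => "logstash-%{+YYYY.MM.dd}"
  }
  stdout { codec => rubydebug }
}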
The ELK stack is a bit confusing in the beginning but don't hesitate to ask if you have any more questions.
