Pull log data from diverse logging sources - Elasticsearch

We have an environment with Linux and Windows servers, various Cisco components, and NetApp storage technologies. We want to implement a central logging infrastructure (such as Graylog or the ELK stack), and one requirement is to use the pull method for log data collection. Is using an agent considered a pull method, given that the agent monitors log files for changes and then sends them? What other pull methods are there that can retrieve data from so many different sources, and which products provide this functionality?
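For what it's worth, the usual distinction: an agent that watches log files and ships them out is push, because the source initiates the connection, even though the agent reads (pulls) the file locally. Pull means the central collector initiates the fetch, as with Logstash's http_poller, snmp, or jdbc inputs. A minimal Java sketch of the pull idea, assuming a hypothetical source that exposes its log over HTTP (the URL and offset handling are illustrative only):

```java
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

// Minimal pull-based collector sketch: the central server initiates the
// fetch instead of an agent on the source pushing. Assumes the (hypothetical)
// source serves its log file over HTTP and honors Range requests.
public class LogPuller {
    private long offset = 0; // bytes already collected

    public String pullNewLines(String sourceUrl) throws Exception {
        HttpURLConnection conn = (HttpURLConnection) new URL(sourceUrl).openConnection();
        // Ask only for bytes we have not seen yet (HTTP Range request).
        conn.setRequestProperty("Range", "bytes=" + offset + "-");
        if (conn.getResponseCode() == 416) return ""; // nothing new since last poll
        try (InputStream in = conn.getInputStream()) {
            byte[] chunk = in.readAllBytes();
            offset += chunk.length;
            return new String(chunk, StandardCharsets.UTF_8);
        }
    }
}
```

In practice you would not hand-roll this: Logstash ships pull-style inputs such as http_poller, snmp, and jdbc that cover many of these sources with configuration instead of code.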

Related

Master data reload in Spring Boot apps running on Azure VM scale sets

We have the following on-prem flow to reload master data into the cache of our API servers, which are Spring Boot applications.
A user uploads master data as Excel files via an admin UI.
These files are placed on NAS storage. The admin app then calls REST endpoints on multiple API servers to reload the master data into each server's in-memory cache.
We plan to move this setup to Azure, where the API servers would be deployed on VM scale sets with autoscaling enabled.
Given that the number of VMs and their IPs will vary, how can we support this master data reload in the Azure environment? One option I can think of:
The admin app reads the master data file and pushes it to a queue (either Azure Queue Storage or ActiveMQ).
API servers have listeners or schedulers that pick the message off the queue and reload the master data.
Is this the best approach? With queuing solutions, once a message is read off the queue by one API server, it will not be available to the other instances, right?
Could anyone please advise on alternatives that support this master data reload in the Azure environment with minimal changes to the current application?
Regards
Jacob
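A note on the fan-out concern in the question: a plain queue has competing-consumer semantics (each message is consumed once), while a topic has publish/subscribe semantics, so every subscribed instance receives its own copy of the message. Since ActiveMQ was mentioned, here is a minimal JMS sketch of the topic approach (the broker URL and topic name are illustrative):

```java
import javax.jms.*;
import org.apache.activemq.ActiveMQConnectionFactory;

// Sketch: each API server instance subscribes to a topic, so every
// instance receives its own copy of the reload notification
// (unlike a queue, where only one consumer would get it).
public class MasterDataReloadListener {
    public void subscribe(String brokerUrl) throws JMSException {
        ConnectionFactory factory = new ActiveMQConnectionFactory(brokerUrl);
        Connection connection = factory.createConnection();
        Session session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);
        Topic topic = session.createTopic("masterdata.reload"); // hypothetical topic name
        MessageConsumer consumer = session.createConsumer(topic);
        consumer.setMessageListener(message -> reloadCacheFromSharedStorage());
        connection.start();
    }

    private void reloadCacheFromSharedStorage() {
        // Re-read the master data file from shared storage (NAS, or on
        // Azure e.g. Azure Files / Blob Storage) into the in-memory cache.
    }
}
```

Azure Service Bus topics with one subscription per API server instance behave the same way; another minimal-change option is for each instance to poll a version marker (for example a blob's ETag) on a schedule and reload only when it changes, which avoids messaging entirely.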

How to consume data in Elasticsearch from an HTTP/REST API call

I am working on a project where I need to create a Kibana dashboard with DevOps metrics.
We have multiple toolsets in use (Bitbucket, TeamCity, SonarQube, Nexus, Nolio).
The intention of the dashboard is to show a high-level snapshot of project/application health. This will include details such as change lead time, deployment frequency, mean time to recovery, change failure rate, code quality, number of commits, etc.
My question is this: all the above toolsets expose a RESTful API (or HTTP/S, for that matter), so how do I consume the data returned by API calls to these DevOps tools (or their UI pages) and then insert it into Elasticsearch for later use by Kibana?
Installing Logstash or Beats on the servers where these DevOps services run is not an option, as they are centralized for the organization, and installing third-party software there would require a lot of hopping around for approvals and processes.
Please let me know if any more information is required from my side.
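One pattern that respects the no-agents constraint is to run the collection from a single central machine (or scheduled job) that you do control: call each tool's REST API, then index the response into Elasticsearch through its own HTTP document API. A rough Java sketch, where the tool endpoint, Elasticsearch host, and index name are all placeholders:

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Central poller sketch: pull metrics from a tool's REST API and index the
// JSON document into Elasticsearch. URLs and index name are placeholders;
// real use would add auth headers, scheduling, and error handling.
public class DevOpsMetricsCollector {
    private static final HttpClient HTTP = HttpClient.newHttpClient();

    public static void main(String[] args) throws Exception {
        // 1. Pull from the tool's REST API (hypothetical SonarQube-style endpoint).
        HttpRequest pull = HttpRequest.newBuilder()
                .uri(URI.create("https://sonarqube.example.com/api/measures/component?component=myapp&metricKeys=coverage"))
                .GET().build();
        String json = HTTP.send(pull, HttpResponse.BodyHandlers.ofString()).body();

        // 2. Index the response into Elasticsearch via its document API.
        HttpRequest index = HttpRequest.newBuilder()
                .uri(URI.create("http://elasticsearch.example.com:9200/devops-metrics/_doc"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(json))
                .build();
        HTTP.send(index, HttpResponse.BodyHandlers.ofString());
    }
}
```

Logstash's http_poller input, run on that same central box rather than on the tool servers, achieves the same thing with configuration instead of code.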

Confused by Ranger architecture

After spending a whole day setting up and studying Hortonworks' Ranger, I'm now barely able to use it, but I'm still very confused by its structure. I'm listing my questions below:
What's the relationship between Ranger and Knox? Why does Hortonworks provide two solutions for the same role? If I want to apply them to my Hadoop cluster, what's the best practice?
Why do I have to use UserSync? In other words, Ranger Admin is able to talk to LDAP/AD to get users, so why does it still need UserSync? And if UserSync also talks to LDAP/AD (or to a different LDAP server), what would happen? Would it impact Ranger Admin's own LDAP/AD connection?
A similar question for the plugin's audit connection: since Ranger Admin has an audit connection, why does each plugin need its own connection to the audit database? Why don't the plugins just push audit information to the Admin and let the Admin decide where to store it? And if they (Admin and plugin) talk to different databases, what would happen?
I think I can briefly answer Q1:
What's the relationship between Ranger and Knox? Why does Hortonworks provide two solutions for the same role? If I want to apply them to my Hadoop cluster, what's the best practice?
They serve different purposes. Ranger gives you fine-grained ACL control; Knox is a proxy server (gateway) that provides a centralized security layer for web services. That is, with Ranger you have a central place (a UI) to manage ACLs for Hadoop stack services, e.g. who can access a table in Hive. With Knox, you can keep all your Hadoop services inside a private network speaking plain, unsecured HTTP, while the Knox server runs on a gateway node reachable from outside with HTTPS enabled; this gives users a central HTTP/HTTPS entry point, with user login, to web services (something some Hadoop stack services, e.g. Hadoop itself, don't support yet).
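To make the split concrete: Knox changes how you reach a service, while Ranger decides what you may do once the request arrives. As a sketch, a Hive JDBC connection routed through a Knox gateway looks roughly like the following (hostname, credentials, and the "default" topology are assumptions about the local setup; the query itself is still authorized by Ranger's Hive policies):

```java
import java.sql.Connection;
import java.sql.DriverManager;

// Sketch: connect to Hive through Knox's HTTPS entry point instead of
// talking to HiveServer2 directly. Requires the Hive JDBC driver on the
// classpath; a real setup would also configure an SSL truststore.
public class HiveViaKnox {
    public static void main(String[] args) throws Exception {
        String url = "jdbc:hive2://knox.example.com:8443/;ssl=true;"
                + "transportMode=http;httpPath=gateway/default/hive";
        try (Connection conn = DriverManager.getConnection(url, "alice", "secret")) {
            // Whether this succeeds is decided by Ranger's Hive ACLs,
            // not by Knox, which only fronts the transport.
            conn.createStatement().executeQuery("SELECT * FROM some_table LIMIT 1");
        }
    }
}
```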

Akka, AMI - discover remote actors for database access

I am working on a prototype for a client where AWS auto-scaling is used to create new VMs from Amazon Machine Images (AMIs), using Akka.
I want to have just one actor control access to the database; it will create new children as needed and queue up requests that go beyond a set limit.
But I don't know the IP address of the VM, as it may change when Amazon adds/removes VMs based on activity.
How can I discover the actor that will be used to limit access to the database?
I am not certain whether clustering will work (http://doc.akka.io/docs/akka/2.4/scala/cluster-usage.html); this question and its answers are from 2011 (Akka remote actor server discovery), and routing may possibly solve the problem: http://doc.akka.io/docs/akka/2.4.16/scala/routing.html
I have a separate REST service that just goes to the database, so it may be that this service will need to do the throttling before requests reach the actors.
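For what it's worth, the usual Akka answer to "exactly one actor across a dynamically scaling set of nodes" is the Cluster Singleton from the akka-cluster-tools module: the cluster keeps one instance of the actor alive somewhere, and every node reaches it through a proxy path rather than an IP address. A rough Java sketch (the actor names are illustrative):

```java
import akka.actor.ActorRef;
import akka.actor.ActorSystem;
import akka.actor.PoisonPill;
import akka.actor.Props;
import akka.cluster.singleton.ClusterSingletonManager;
import akka.cluster.singleton.ClusterSingletonManagerSettings;
import akka.cluster.singleton.ClusterSingletonProxy;
import akka.cluster.singleton.ClusterSingletonProxySettings;

// Sketch: run exactly one database-gatekeeper actor in the whole cluster,
// regardless of how many VMs auto-scaling adds or removes.
public class DbGatekeeperSetup {
    public static ActorRef start(ActorSystem system, Props dbGatekeeperProps) {
        // Started on every node; the cluster keeps exactly one instance alive.
        system.actorOf(
                ClusterSingletonManager.props(
                        dbGatekeeperProps,
                        PoisonPill.getInstance(),
                        ClusterSingletonManagerSettings.create(system)),
                "dbGatekeeper");

        // Each node talks to the singleton through this location-transparent proxy.
        return system.actorOf(
                ClusterSingletonProxy.props(
                        "/user/dbGatekeeper",
                        ClusterSingletonProxySettings.create(system)),
                "dbGatekeeperProxy");
    }
}
```

Seed-node discovery for freshly started VMs is a separate problem; a common approach on AWS is to resolve the current cluster members from the auto-scaling group (or a fixed DNS/load-balancer name) at startup.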

Windows Centralized Configuration for third party applications?

We are looking at a standard way of configuring the various "endpoints" of our application. Our application is a distributed system with Windows Desktop applications, Windows Server "services" and databases.
We currently configure each piece using XML files. This is getting a little out of hand as we work with larger customers, who can have dozens of servers running our application and hundreds of desktop clients.
Can anyone recommend a Microsoft technology, or a third-party one, that would allow us to centralize all that configuration information and manage it in one place for all our applications? Any changes would be "pushed" to the endpoint(s) that are interested.
For example, if we were to change the login for one of our databases, we would make that change on the database, then reflect it in our centralized system. Following that last step, any service that needs to connect to that database would be notified of the change (and potentially receive the new data). How and what each endpoint does with that information is outside the scope of the system.
Our primary business is not "Centralized Configuration Services". We are a GIS company that provides solutions for various utilities worldwide.
I've done a couple of things to give myself this functionality over the years. I build enterprise applications that may be distributed across many servers, and I don't want to bury config settings in each service's config file or each web server's web.config file. For application-specific stuff I usually create an application settings table in the app's database. The table has only two fields: SettingName and SettingValue. I then write a web or WCF service whose sole function is to retrieve these settings. I write a function called GetSetting where you pass "SettingName" and it returns SettingValue, or an empty string if your setting is not found. This way I can store all application settings for all components of the application in one spot. Maintenance and troubleshooting are really easy; I'm not hunting through scads of config files spread across a dozen web and app servers.
For larger-scale apps I might create a separate AppSettings database, adding a new field to the table mentioned above: ApplicationName. My web or WCF service for this approach has the same method call (GetSetting), only at this scope you pass ApplicationName and SettingName, and it returns SettingValue or an empty string.
Doing either of these things allows you to centralize all app settings for any size application or IT shop. It has worked really well for us.
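A sketch of that GetSetting lookup, shown here in Java with JDBC rather than WCF for brevity (the AppSettings table name is illustrative; the two columns are as described above):

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import javax.sql.DataSource;

// Sketch of the GetSetting service described above: one table,
// two columns (SettingName, SettingValue), empty string when missing.
public class AppSettingsService {
    private final DataSource dataSource;

    public AppSettingsService(DataSource dataSource) {
        this.dataSource = dataSource;
    }

    public String getSetting(String settingName) {
        String sql = "SELECT SettingValue FROM AppSettings WHERE SettingName = ?";
        try (Connection conn = dataSource.getConnection();
             PreparedStatement stmt = conn.prepareStatement(sql)) {
            stmt.setString(1, settingName);
            try (ResultSet rs = stmt.executeQuery()) {
                return rs.next() ? rs.getString(1) : "";
            }
        } catch (Exception e) {
            return ""; // treat lookup failure like a missing setting, per the contract above
        }
    }
}
```

The larger-scale variant just adds an ApplicationName parameter and an extra WHERE clause; the single-endpoint contract stays the same.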
You could use RSS together with BitTorrent to distribute changes. See Wikipedia. It is not MS-specific, but it should provide the flexibility you need: a configuration server holding the configuration and providing the feeds needed to configure the clients and possibly the servers.
Any VCS through a secure channel?
For example, git over ssh (both available in Cygwin).
I think the first step is to have the secure channel (if you want push ability; pulling might work differently).
As for managing the "versions" in different "branches", what's better than a version control system?
As for the Microsoft requirement, well, the Microsoft software that exists in that area would suck pretty badly in your case (as in, not the best tool for the job).
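As a sketch of that idea, each endpoint keeps a clone of a central config repository and refreshes it over SSH on a schedule (pulling rather than pushing); here using the JGit library, with the repository path as a placeholder:

```java
import java.io.File;
import org.eclipse.jgit.api.Git;

// Sketch: endpoints refresh their configuration by pulling a git repo
// over a secure channel (SSH). The local path is a placeholder; requires
// the JGit library (org.eclipse.jgit) on the classpath.
public class ConfigPuller {
    public static void refresh() throws Exception {
        File localRepo = new File("/opt/myapp/config");
        try (Git git = Git.open(localRepo)) {
            git.pull().call(); // fetch and merge the central config changes
        }
        // After pulling, reload or notify the local service as needed.
    }
}
```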
