I need a tool to analyze Hadoop logs

I have log files from Hadoop.
I want to analyze these large files and generate a report from them, so I am looking for a convenient tool for analyzing them. Please suggest some tools for this.

You may want to try Logstash, or better, the whole ELK stack (Elasticsearch, Logstash, Kibana).
To help you, here are a few links/blogs I found, written by other people, that may be useful:
Link1, Link2, Link3

Related

How to gather files from different sources into HDFS?

Currently I am working with a team that works on search engines, especially HP IDOL.
The main idea of my work is to find a new, open-source search engine, so I started working with Elasticsearch, but I still have some problems that I could not find solutions for.
I need to index the documents from the following servers into Elasticsearch:
Sharepoint
Documentum
Alfresco
From my searches on the web, I found:
Talend (we cannot use it, because the team does not want to pay)
Apache ManifoldCF (open source, but it has lots of problems)
Having seen those problems, I am still looking for other solutions.
Is it possible to put all the files from these sources into HDFS and then index them all into Elasticsearch with Apache Spark?
I would also appreciate any other techniques I have not thought of.
Thanks in advance.

How do you configure the location of TeamCity server cache and temp directories?

This blog post indicates that it's possible to re-configure the locations of the $data/system/caches and $server/temp directories in order to optimise a TeamCity installation.
Admittedly the post is a bit outdated, but I've done plenty of searching and tweaking and can find no direct reference on how to do this.
Any help much appreciated.
At the moment the only way is to map the directories to the desired location using OS-specific means (symlinks). The related ticket in the TeamCity bug tracker is TW-15251; please comment on or watch it to get status updates.
It seems there's no way to configure the locations of those directories on the TeamCity side. However, you can use OS-provided file system links to map the $data/system/caches and $server/temp directories to other locations, as the documentation suggests doing for the artifacts directory (see the "Recommendations as to choosing Data Directory Location" section here).
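A minimal Python sketch of the symlink approach, assuming a Unix-like system and that the TeamCity server is stopped while the directory is moved, so nothing writes to it mid-move. The function name and the paths in the usage comment are hypothetical; substitute your real $data/system/caches or $server/temp location. On Windows, a directory junction (mklink /J) is the usual equivalent.

```python
import os
import shutil
from pathlib import Path


def relocate_with_symlink(original, new_location):
    """Move the directory `original` to `new_location` and leave a symlink at the old path."""
    original = Path(original)
    new_location = Path(new_location)
    if original.is_symlink():
        raise ValueError(f"{original} is already a symlink")
    if new_location.exists():
        raise FileExistsError(f"{new_location} already exists")
    new_location.parent.mkdir(parents=True, exist_ok=True)
    shutil.move(str(original), str(new_location))
    # TeamCity keeps using the old path, which now resolves to the new location.
    os.symlink(new_location, original, target_is_directory=True)


# Hypothetical usage (server stopped first):
# relocate_with_symlink("/opt/TeamCity/temp", "/mnt/fastdisk/teamcity/temp")
```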

Log report software for ELK stack

I have an ELK node (Elasticsearch, Logstash, Kibana), and I need to save reports and statistics of the logs. Is there software that does this?
I have found other dashboards for Elasticsearch, but they are all similar to Kibana and don't save reports (Packetbeat, elasticsearch-monitoring, etc.).
According to this GitHub thread, you can save the results of a chart as CSV.
From there, it's easy to use MS Excel or some other tool to create a report.
I think this is as far as you can go with free tools.
P.S.: this seems to be another option, but it looks like it only dumps the results rather than producing reports like you want.
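If you would rather script the "some other tool" step than use Excel, here is a minimal Python sketch that turns such a CSV export into a plain-text report with each row's share of the total. It assumes the export has a header row, one label column (for example a host or status field) and a numeric count column named Count; the actual column names depend on your visualization, so treat them as placeholders.

```python
import csv
import io


def _to_int(value):
    # Exports may contain formatted numbers such as "1,234"; strip the separators.
    return int(value.replace(",", "").strip())


def csv_report(csv_text, label_col, count_col="Count"):
    """Build a plain-text table with each row's count and share of the total, largest first."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    total = sum(_to_int(r[count_col]) for r in rows)
    lines = [f"{label_col:<20}{count_col:>10}{'Share':>8}"]
    for r in sorted(rows, key=lambda r: _to_int(r[count_col]), reverse=True):
        n = _to_int(r[count_col])
        share = 100.0 * n / total if total else 0.0
        lines.append(f"{r[label_col]:<20}{n:>10}{share:>7.1f}%")
    lines.append(f"{'Total':<20}{total:>10}")
    return "\n".join(lines)
```

Running it on a saved export, e.g. print(csv_report(open("export.csv").read(), "host")), gives a report you can archive or mail; the file and column names there are only examples.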

Better Reporting for CruiseControl.NET

Is there any way to generate a good error report from CruiseControl?
I'd like to get the following things in that report:
The file and line number that broke the build
The name of the developer who committed that file. (It should not simply be the last person who committed, because the build might have been broken earlier, before the last person's check-in.)
Thanks.
This should be doable with a bit of XSL parsing alone :-)
Needed steps:
Create the XSL file (blame.xsl, for instance).
This XSL should look at the <modifications/> node and the <msbuild/> node to get the data.
Define a new xslReportBuildPlugin in your webdashboard.config pointing to the new XSL file. Something like:
<xslReportBuildPlugin description="Blame"
actionName="BlameBuildReport" xslFileName="xsl\blame.xsl"/>
Do an iisreset to activate it (just to be sure) and clear your browser cache.
Now you should see a Blame entry in every build report :-)
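The answer's route is an XSL stylesheet rendered by the dashboard. As a swapped-in alternative for trying the same lookup offline, here is a Python sketch that joins MSBuild errors to committers within a single build log. The element layout is an assumption (modification elements with filename and user children, and MSBuild error elements carrying file and line attributes somewhere under msbuild); check it against a real log before relying on it. It only sees modifications recorded in that one log, so it does not handle the question's case of a build that was already broken earlier.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def blame(build_log_xml):
    """Return (file, line, committers) for each MSBuild error found in one build log."""
    root = ET.fromstring(build_log_xml)
    # Map bare file name (lower-cased) -> users who modified it in this build.
    committers = defaultdict(set)
    for mod in root.iter("modification"):
        name = (mod.findtext("filename") or "").strip()
        user = (mod.findtext("user") or "").strip()
        if name and user:
            committers[name.lower()].add(user)
    results = []
    for msbuild in root.iter("msbuild"):
        for err in msbuild.iter("error"):
            path = err.get("file", "")
            # Compare on the bare file name, since the two sections may use different path forms.
            base = path.replace("\\", "/").rsplit("/", 1)[-1].lower()
            results.append((path, err.get("line", "?"), sorted(committers.get(base, ()))))
    return results
```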
Custom report creation information
A custom XSL modification to bring the error info up to the top of the file, which would be helpful in applying the blame
Then, if you add in the steps from Williams' answer, you'd have blame information.
You do already have the file/line number. CruiseControl.NET provides the detailed MSBuild report, which is nothing but the usual compiler output.
This would only be possible with an extension that is specific to your version control system. You would have to write such an extension yourself (but I doubt that it's worth the effort...).
HTH.

Cruise Control .NET time build spends in failed state

My team has a goal to minimize the amount of time that our build is broken.
We use CruiseControl.NET for continuous integration. What I'd like to find out is how best to approach answering the following question:
"In the last {timespan}, how much time has {project-name} spent in a broken status?"
For example:
"Over the last 1 month, how much time has our project spent in a broken status?"
Are there any advanced features of CruiseControl.NET that would make this information available in some kind of report or somewhere in the dashboard?
Alternatively, how would you approach parsing the xml artifact files to glean this info?
You can use the Statistics Publisher:
http://www.cruisecontrolnet.org/projects/ccnet/wiki/Statistics_Publisher
and you can display the results via the Project Statistics plugin.
I see at least two ways to approach this:
You write an external tool which parses CC.NET's XML log files for a project (stored in the buildlogs subdirectory by default), calculates statistics and writes an HTML report. This is probably easier to do, but it won't be directly integrated with CC.NET.
You write a CC.NET plug-in to do this. You'll need to do a bit of investigating in this case. My guess is that the starting point would be to look at the source code of an existing plug-in.
Here are some links about CCNET plugins:
http://www.cruisecontrolnet.org/projects/ccnet/wiki/DevInfo_MakingPlugins
BrekiLabeller - my own plug-in, useful if you want to see how a plug-in can be implemented.
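For the first approach (an external tool over the buildlogs folder), the core calculation can be sketched in Python as below. It assumes the usual CC.NET log naming, where a failed build's log is named logYYYYMMDDhhmmss.xml and a successful one carries its label as logYYYYMMDDhhmmssL<label>.xml; verify that against your own buildlogs folder, since this sketch reads the status from the file name instead of opening each file. The status of the last build before the window is carried into it, so a build that was already broken when the window started counts from the window start.

```python
import re
from datetime import datetime, timedelta
from pathlib import Path

# Assumed CC.NET naming: failed "logYYYYMMDDhhmmss.xml", successful "logYYYYMMDDhhmmssL<label>.xml".
LOG_NAME = re.compile(r"^log(\d{14})(L.+)?\.xml$")


def builds_from_dir(buildlogs):
    """Return sorted (finished_at, succeeded) pairs derived from log file names."""
    builds = []
    for p in Path(buildlogs).iterdir():
        m = LOG_NAME.match(p.name)
        if m:
            when = datetime.strptime(m.group(1), "%Y%m%d%H%M%S")
            builds.append((when, m.group(2) is not None))
    return sorted(builds)


def time_broken(builds, start, end):
    """Sum the time within [start, end) during which the most recent build had failed."""
    broken = timedelta()
    status = None  # result of the latest build seen so far (None = unknown)
    cursor = start
    for when, ok in sorted(builds):
        if when <= start:
            status = ok  # state carried into the window
            continue
        if when >= end:
            break
        if status is False:
            broken += when - cursor
        status, cursor = ok, when
    if status is False:
        broken += end - cursor  # still broken at the end of the window
    return broken
```

Answering "how much time in the last month" is then time_broken(builds_from_dir(path), now - timedelta(days=30), now), with path pointing at the project's buildlogs directory.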
Having had a very quick look at the CC docs, I imagine that if you were writing your own CruiseControl dashboard, you could consume the RSS feed of build results, parse all the date/times and success/failure states up to your threshold, then sum up the totals.
As for displaying it in a dashboard, I think Cruise Control has a plugin architecture which might help http://cruisecontrol.sourceforge.net/main/plugins.html
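The parse-and-sum idea can be sketched in Python as follows. The feed format is an assumption: a standard RSS 2.0 feed in which each item has a pubDate and a title containing the word "failed" for broken builds; the exact titles your CruiseControl feed uses have not been checked here, so adjust that test. Because pubDate usually carries a UTC offset, threshold and now must be timezone-aware datetimes. Unlike a full timeline, this ignores whatever state the build was in before the threshold.

```python
import xml.etree.ElementTree as ET
from datetime import timedelta
from email.utils import parsedate_to_datetime


def results_from_rss(rss_xml):
    """Extract sorted (published, succeeded) pairs from an RSS 2.0 feed of build results."""
    root = ET.fromstring(rss_xml)
    results = []
    for item in root.iter("item"):
        pub = item.findtext("pubDate")
        if not pub:
            continue
        when = parsedate_to_datetime(pub)
        # Assumption: failed builds are recognisable by "failed" in the item title.
        ok = "failed" not in (item.findtext("title") or "").lower()
        results.append((when, ok))
    return sorted(results)


def broken_since(results, threshold, now):
    """Sum the time from each failure to the next result (or `now`), ignoring results before `threshold`."""
    total = timedelta()
    recent = [(t, ok) for t, ok in results if t >= threshold]
    for (t, ok), (next_t, _) in zip(recent, recent[1:] + [(now, None)]):
        if not ok:
            total += next_t - t
    return total
```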
So my eventual solution wasn't ideal, but it was easy to do and it works:
I had CC.NET send build emails to an email address (we'll call it build_emails#build_statistics.com). Then I use a Ruby script to fetch the emails via IMAP and process them to determine our build failure time.
I didn't go the route of directly parsing the xml because I would have had to parse every xml file in the timeframe to build up a timeline and then go over the timeline to make my calculations. It just seemed too complicated to get a simple statistic like this.
I like CC.NET, but in this case TeamCity just does this for you. It has lots of other great statistics too. It's free for fewer than 20 projects.
