Extract logs between a time frame with non-specific timestamps using bash

I want to fetch the log between two timestamps, but I do not have the exact timestamps with me. When both timestamps are present in the log I can fetch the range with sed (note the double quotes, so the shell expands the variables):
sed -ne "/$StartTime/,/$EndTime/p" <filename>
My problem is that the StartTime and EndTime I fetch from my DB might not literally appear in the log file, so I have to fetch the log between the times nearest to the StartTime and EndTime I provide, i.e. compare with >= and <=. I tried the following command but it does not work.
awk '$0>=st && $0<=et' st=$StartTime et=$EndTime <filename>
Sample input and output
Input
Time retrieved from DB
StartTime - 2017-11-02 10:20:00
EndTime - 2017-11-02 11:20:00
The times present in the log
T1 - 2017-11-02 10:17:44
T2 - 2017-11-02 11:19:32
Output: Entire Log text between T1 & T2
Sample Log
2017-03-03 10:43:18,736 [main] WARN - ORACLE_HOSTNAME=xxxxxxxxxx[OVERRIDES:
xxxxxxxxxxxxxxxx]
2017-03-03 10:43:18,736 [main] WARN - NLS_DATE_FORMAT=DD-MON-YYYY
HH24:MI:SS [OVERRIDES: DD-MON-YYYY HH24:MI:SS]
2017-03-03 10:43:18,736 [main] WARN - xxxxUsername=MDMPIUSER [OVERRIDES: MDMPIUSER]
2017-03-03 10:43:18,736 [main] WARN - BUNDLE_GEMFILE=uri:classloader://installer/Gemfile [OVERRIDES: uri:classloader://installer/Gemfile]
2017-03-03 10:43:18,736 [main] WARN - TIMEOUT=900 [OVERRIDES: 900]
2017-03-03 10:43:18,736 [main] WARN - SHLVL=4 [OVERRIDES: 4]
2017-03-03 10:43:18,736 [main] WARN - HISTSIZE=1000 [OVERRIDES: 1000]
2017-03-03 10:43:18,736 [main] WARN - JAVA_HOME=/usr/java/jdk1.8.0_60/jre [OVERRIDES: /usr/java/jdk1.8.0_60/jre]
2017-03-03 10:43:20,156 [main] WARN - APP_PROPS=/home/xxx/conf/appProperties [OVERRIDES: /home/xxx/conf/appProperties]

You can try the awk below. (In your attempt, the unquoted st=$StartTime splits at the space, so awk treats 10:20:00 as an input filename; passing the values with -v avoids that. Normalizing both timestamps to digit-only strings then makes the >= / <= string comparison reliable.)
awk -v start="$StartTime" -v end="$EndTime" '
    # strip "-", ",", " " and ":" so a timestamp becomes a plain digit string
    function fonct(date)
    {
        gsub(/-|,| |:/, "", date)
        return date
    }
    BEGIN {
        start = fonct(start)
        end = fonct(end)
    }
    {
        # $1 $2 are the date and time fields of the current log line
        a = fonct($1 $2)
        if (a >= start && a <= end) print $0
    }' infile
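Note that this compares every line's first two fields, so continuation lines that don't start with a timestamp (like the wrapped xxxxxxxxxxxxxxxx] line in the sample) are dropped. A minimal sketch that keeps them, assuming every entry starts with a YYYY-MM-DD date and that undated lines belong to the preceding entry (GNU awk; older gawks may need --re-interval for the {4} intervals):

awk -v start="$StartTime" -v end="$EndTime" '
    function squash(d) { gsub(/[-,: ]/, "", d); return d }
    BEGIN { s = squash(start); e = squash(end) }
    # a dated line decides whether we are inside the requested window
    /^[0-9]{4}-[0-9]{2}-[0-9]{2} / {
        t = squash($1 $2)
        inrange = (t >= s && t <= e)
    }
    # undated continuation lines inherit the last decision
    inrange' infile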

Related

Split ~200mb log4j log file by day

I have a log file formatted as follows and I want to split it into multiple files by day (i.e. log-2017-10-2, log-2017-10-3, etc.). I've seen people do it with awk, but I'm not sure how to handle stack traces, because lines such as java.io.IOException start on a new line without a date prefix. Is there any convenient way to achieve this?
2017-10-02 04:26:02,534 INFO XXXXXXXXXXXXXXXXX
2017-10-03 04:26:02,543 INFO XXXXXXXXXXXX
2017-10-04 04:26:02,544 INFO XXXXXXXXX
2017-10-04 04:26:02,546 INFO XXXXXXXXXXXXX
2017-10-04 04:26:02,549 INFO XXXXXXXXXXX
2017-10-04 04:53:02,787 WARN class.class.class: [FetcherXXXXXX], Error in fetch XXXXXXXXXXXXXXXXXXXXXX
java.io.IOException: Connection to X was disconnected before the response was read
at XXXXXXXXXXXXXXXXXXXX
at XXXXXXXXXXXXXXXXXXXX
at XXXXXXXXXXXXXXXXXXXXX
at XXXXXXXXXXXXXXXX
at XXXXXXXXXXXXXXXX
2017-10-05 04:26:02,549 INFO XXXXXXXXXXX
Final file contents will be:
log-2017-10-2:
2017-10-02 04:26:02,534 INFO XXXXXXXXXXXXXXXXX
log-2017-10-3:
2017-10-03 04:26:02,543 INFO XXXXXXXXXXXX
log-2017-10-4:
2017-10-04 04:26:02,544 INFO XXXXXXXXX
2017-10-04 04:26:02,546 INFO XXXXXXXXXXXXX
2017-10-04 04:26:02,549 INFO XXXXXXXXXXX
2017-10-04 04:53:02,787 WARN class.class.class: [FetcherXXXXXX], Error in fetch XXXXXXXXXXXXXXXXXXXXXX
java.io.IOException: Connection to X was disconnected before the response was read
at XXXXXXXXXXXXXXXXXXXX
at XXXXXXXXXXXXXXXXXXXX
at XXXXXXXXXXXXXXXXXXXXX
at XXXXXXXXXXXXXXXX
at XXXXXXXXXXXXXXXX
log-2017-10-5:
2017-10-05 04:26:02,549 INFO XXXXXXXXXXX
awk to the rescue!
$ awk --posix 'BEGIN{f="log-header"}
$1~/^[0-9]{4}-[0-9]{2}-[0-9]{2}$/{f="log-"$1} {print > f}' log
If there are too many dates (and hence too many open files) you may need to close files at some point; for a few hundred it should work as is.
The initial file name (log-header) is a fallback in case your log doesn't start with a line matching the checked regex.
awk solution:
awk '/^[0-9]{4}-[0-9]{2}-[0-9]{2} /{
    if (fn && !a[$1]++) close(fn)
    fn = "log-" $1
}
{ print > fn }' logfile
/^[0-9]{4}-[0-9]{2}-[0-9]{2} / - on encountering a line starting with a date string
if (fn && !a[$1]++) close(fn) - close the file descriptor opened for the previous date
fn="log-"$1 - construct the output filename
Viewing results:
$ head log-*
==> log-2017-10-02 <==
2017-10-02 04:26:02,534 INFO XXXXXXXXXXXXXXXXX
==> log-2017-10-03 <==
2017-10-03 04:26:02,543 INFO XXXXXXXXXXXX
==> log-2017-10-04 <==
2017-10-04 04:26:02,544 INFO XXXXXXXXX
2017-10-04 04:26:02,546 INFO XXXXXXXXXXXXX
2017-10-04 04:26:02,549 INFO XXXXXXXXXXX
2017-10-04 04:53:02,787 WARN class.class.class: [FetcherXXXXXX], Error in fetch XXXXXXXXXXXXXXXXXXXXXX
java.io.IOException: Connection to X was disconnected before the response was read
&XXXXXXXXXXXXXXXXXXXX
&XXXXXXXXXXXXXXXXXXXX
&XXXXXXXXXXXXXXXXXXXXX
&XXXXXXXXXXXXXXXX
&XXXXXXXXXXXXXXXX
==> log-2017-10-05 <==
2017-10-05 04:26:02,549 INFO XXXXXXXXXXX

Tez - DAGAppMaster - java.lang.IllegalArgumentException: Invalid ContainerId

I am trying to launch a MapReduce job, but I get an error while executing it in the shell or in Hive:
hive> select count(*) from employee;
Query ID = mapr_20171107135114_a574713d-7d69-45e1-aa73-d4de07a3059b
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapreduce.job.reduces=<number>
Starting Job = job_1510052734193_0005, Tracking URL = http://hdpsrvpre2.intranet.darty.fr:8088/proxy/application_1510052734193_0005/
Kill Command = /opt/mapr/hadoop/hadoop-2.7.0/bin/hadoop job -kill job_1510052734193_0005
Hadoop job information for Stage-1: number of mappers: 0; number of reducers: 0
2017-11-07 13:51:25,951 Stage-1 map = 0%, reduce = 0%
Ended Job = job_1510052734193_0005 with errors
Error during job, obtaining debugging information...
FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask
MapReduce Jobs Launched: Stage-Stage-1: MAPRFS Read: 0 MAPRFS Write: 0 FAIL
Total MapReduce CPU Time Spent: 0 msec
In the ResourceManager logs, this is what I find:
2017-11-07 13:51:25,269 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: appattempt_1510052734193_0005_000002 State change from LAUNCHED to FINAL_SAVING
2017-11-07 13:51:25,269 INFO org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore: Updating info for attempt: appattempt_1510052734193_0005_000002 at: /var/mapr/cluster/yarn/rm/system/FSRMStateRoot/RMAppRoot/application_1510052734193_0005/appattempt_1510052734193_0005_000002
2017-11-07 13:51:25,283 INFO org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService: Unregistering app attempt : appattempt_1510052734193_0005_000002
2017-11-07 13:51:25,283 INFO org.apache.hadoop.yarn.server.resourcemanager.security.AMRMTokenSecretManager: Application finished, removing password for appattempt_1510052734193_0005_000002
2017-11-07 13:51:25,283 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: appattempt_1510052734193_0005_000002 State change from FINAL_SAVING to FAILED
2017-11-07 13:51:25,284 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: The number of failed attempts is 2. The max attempts is 2
2017-11-07 13:51:25,284 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: Updating application application_1510052734193_0005 with final state: FAILED
2017-11-07 13:51:25,284 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: application_1510052734193_0005 State change from ACCEPTED to FINAL_SAVING
2017-11-07 13:51:25,284 INFO org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore: Updating info for app: application_1510052734193_0005
2017-11-07 13:51:25,284 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: Application appattempt_1510052734193_0005_000002 is done. finalState=FAILED
2017-11-07 13:51:25,284 INFO org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore: Updating info for app: application_1510052734193_0005 at: /var/mapr/cluster/yarn/rm/system/FSRMStateRoot/RMAppRoot/application_1510052734193_0005/application_1510052734193_0005
2017-11-07 13:51:25,284 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo: Application application_1510052734193_0005 requests cleared
2017-11-07 13:51:25,296 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: Application application_1510052734193_0005 failed 2 times due to AM Container for appattempt_1510052734193_0005_000002 exited with exitCode: 1
For more detailed output, check application tracking page: http://hdpsrvpre2.intranet.darty.fr:8088/cluster/app/application_1510052734193_0005 Then, click on links to logs of each attempt.
Diagnostics: Exception from container-launch.
Container id: container_e10_1510052734193_0005_02_000001
Exit code: 1
Stack trace: ExitCodeException exitCode=1:
    at org.apache.hadoop.util.Shell.runCommand(Shell.java:545)
    at org.apache.hadoop.util.Shell.run(Shell.java:456)
    at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:722)
    at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.launchContainer(LinuxContainerExecutor.java:304)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:354)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:87)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1152)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:622)
    at java.lang.Thread.run(Thread.java:748)
Shell output: main : command provided 1
main : user is mapr
main : requested yarn user is mapr

Container exited with a non-zero exit code 1
Failing this attempt. Failing the application.
Also, in the syslog of the job I find:
2017-11-07 12:09:46,419 FATAL [main] app.DAGAppMaster: Error starting DAGAppMaster
java.lang.IllegalArgumentException: Invalid ContainerId: container_e10_1510052734193_0001_01_000001
    at org.apache.hadoop.yarn.util.ConverterUtils.toContainerId(ConverterUtils.java:182)
    at org.apache.tez.dag.app.DAGAppMaster.main(DAGAppMaster.java:1794)
Caused by: java.lang.NumberFormatException: For input string: "e10"
    at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
    at java.lang.Long.parseLong(Long.java:441)
    at java.lang.Long.parseLong(Long.java:483)
    at org.apache.hadoop.yarn.util.ConverterUtils.toApplicationAttemptId(ConverterUtils.java:137)
    at org.apache.hadoop.yarn.util.ConverterUtils.toContainerId(ConverterUtils.java:177)
    ... 1 more
It seems that Tez is causing the issue. Is there any solution for this?
Thank you!
I think the execution environment has different versions of Hadoop, and their respective jar files, on the classpath. The NumberFormatException: For input string: "e10" points that way: container IDs of the form container_e10_... (with an epoch component) were introduced in Hadoop 2.6, and the ConverterUtils class from an older Hadoop jar cannot parse them.
Please verify the environment, make sure you use only the required version, and remove references to other versions from your environment variables.
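A quick way to check for mixed versions (a sketch; the search roots are examples, adjust them to your installation) is to list the Hadoop jars the job actually sees:

# show every entry on the Hadoop classpath, one per line, keeping the yarn jars
hadoop classpath | tr ':' '\n' | grep -i yarn
# look for stray copies of hadoop-yarn jars from other versions under common install roots
find /opt/mapr /usr/lib -name 'hadoop-yarn-*.jar' 2>/dev/null | sort

If two different version numbers show up, remove the older one from the classpath or from the environment variables that add it.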

Error while executing Pig script?

p.pig contains the following code:
salaries= load 'salaries' using PigStorage(',') As (gender, age,salary,zip);
salaries= load 'salaries' using PigStorage(',') As (gender:chararray,age:int,salary:double,zip:long);
salaries=load 'salaries' using PigStorage(',') as (gender:chararray,details:bag{b(age:int,salary:double,zip:long)});
highsal= filter salaries by salary > 75000;
dump highsal
salbyage= group salaries by age;
describe salbyage;
salbyage= group salaries All;
salgrp= group salaries by $3;
A= foreach salaries generate age,salary;
describe A;
salaries= load 'salaries.txt' using PigStorage(',') as (gender:chararray,age:int,salary:double,zip:int);
vivek@ubuntu:~/Applications/Hadoop_program/pip$ pig -x mapreduce p.pig
15/09/24 03:16:32 INFO pig.ExecTypeProvider: Trying ExecType : LOCAL
15/09/24 03:16:32 INFO pig.ExecTypeProvider: Trying ExecType : MAPREDUCE
15/09/24 03:16:32 INFO pig.ExecTypeProvider: Picked MAPREDUCE as the ExecType
2015-09-24 03:16:32,990 [main] INFO org.apache.pig.Main - Apache Pig version 0.14.0 (r1640057) compiled Nov 16 2014, 18:02:05
2015-09-24 03:16:32,991 [main] INFO org.apache.pig.Main - Logging error messages to: /home/vivek/Applications/Hadoop_program/pip/pig_1443089792987.log
2015-09-24 03:16:38,966 [main] WARN org.apache.hadoop.util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2015-09-24 03:16:41,232 [main] INFO org.apache.pig.impl.util.Utils - Default bootup file /home/vivek/.pigbootup not found
2015-09-24 03:16:42,869 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
2015-09-24 03:16:42,870 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
2015-09-24 03:16:42,870 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: hdfs://localhost:9000
2015-09-24 03:16:45,436 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1000: Error during parsing. Encountered " <PATH> "salaries=load "" at line 7, column 1.
Was expecting one of:
<EOF>
"cat" ...
"clear" ...
"fs" ...
"sh" ...
"cd" ...
"cp" ...
"copyFromLocal" ...
"copyToLocal" ...
"dump" ...
"\\d" ...
"describe" ...
"\\de" ...
"aliases" ...
"explain" ...
"\\e" ...
"help" ...
"history" ...
"kill" ...
"ls" ...
"mv" ...
"mkdir" ...
"pwd" ...
"quit" ...
"\\q" ...
"register" ...
"rm" ...
"rmf" ...
"set" ...
"illustrate" ...
"\\i" ...
"run" ...
"exec" ...
"scriptDone" ...
"" ...
"" ...
<EOL> ...
";" ...
Details at logfile: /home/vivek/Applications/Hadoop_program/pip/pig_1443089792987.log
2015-09-24 03:16:45,554 [main] INFO org.apache.pig.Main - Pig script completed in 13 seconds and 48 milliseconds (13048 ms)
vivek@ubuntu:~/Applications/Hadoop_program/pip$
The file p.pig comprises the code given above.
I started Pig in mapreduce mode.
While executing the above code it encounters the following error:
ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1000: Error during parsing. Encountered " <PATH> "salaries=load "" at line 7, column 1.
Please help me resolve the error.
You have not put spaces between the alias name and the = operator. Pig expects at least one space before or after '=' here; otherwise the whole token salaries=load is read as a single <PATH> word, which is exactly what the parse error shows.
Change this line:
salaries=load 'salaries' using PigStorage(',') as (gender:chararray,details:bag{b(age:int,salary:double,zip:long)});
TO
salaries = load 'salaries' using PigStorage(',') as (gender:chararray,details:bag{b(age:int,salary:double,zip:long)});
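If you just want to validate the script without launching any MapReduce jobs, Pig's syntax-check mode should catch this class of error (assuming your Pig version supports the -check flag):

pig -check p.pig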

Grep from a file and output pattern if not found

I have a file, ids_list.csv, containing a list of ids, one per line.
I also have a log file, in which I want to find the ids from ids_list.csv.
What I want is to print to a result.txt file the line from the log file if the pattern was found, or the pattern itself otherwise.
So I wrote this script:
#!/bin/bash
for i in `cat ids_list.csv`;
do
echo $i
echo `grep $i log_FVAScope`
if [[ ! res=$(grep $i log_FVAScope) ]]; then
echo $i >> result.txt;
else
echo $res >> result.txt
fi
done
However, result.txt is empty. What am I doing wrong?
Also, it seems rather slow. How can I speed it up (ids_list.csv contains ~40k lines, the log file ~700k lines)?
EDIT: sample input:
ids_list.csv:
KBKEQO17564
SPXTCT769178
KBKFXS1952894
CDNEVL_4148105
BBR10000130794156
log file:
18:51:59.368 [pool-1-thread-4] INFO c.s.m.x.liqor.filter.CodChainFilter - KBKEQO17564 excluded by CodeChain Filter
18:51:59.369 [pool-1-thread-5] INFO c.s.m.x.liqor.filter.CodChainFilter - KBKFXS1952894 excluded by CodeChain Filter
18:51:59.369 [main] INFO c.s.m.x.l.manager.FilterManagerImpl - waiting new deals to submit
18:51:59.369 [pool-1-thread-2] INFO c.s.m.x.liqor.filter.CodChainFilter - CDNEVL_4148105 excluded by CodeChain Filter
18:51:59.369 [pool-1-thread-1] INFO c.s.m.x.liqor.filter.CodChainFilter - BBR10000130794156 excluded by CodeChain Filter
Desired output (result.txt):
18:51:59.368 [pool-1-thread-4] INFO c.s.m.x.liqor.filter.CodChainFilter - KBKEQO17564 excluded by CodeChain Filter
SPXTCT769178
18:51:59.369 [pool-1-thread-5] INFO c.s.m.x.liqor.filter.CodChainFilter - KBKFXS1952894 excluded by CodeChain Filter
18:51:59.369 [main] INFO c.s.m.x.l.manager.FilterManagerImpl - waiting new deals to submit
18:51:59.369 [pool-1-thread-2] INFO c.s.m.x.liqor.filter.CodChainFilter - CDNEVL_4148105 excluded by CodeChain Filter
18:51:59.369 [pool-1-thread-1] INFO c.s.m.x.liqor.filter.CodChainFilter - BBR10000130794156 excluded by CodeChain Filter
Well, you don't actually need a script; a one-line loop does it. grep exits with a non-zero status when nothing matches, so || echo falls back to printing the id itself, which is the part your version was missing:
while read -r id; do grep -- "$id" log_FVAScope || echo "$id"; done < ids_list.csv > result.txt
It still runs one grep per id, though, so it will be slow on ~40k ids.
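For speed, a two-pass fixed-string grep avoids spawning one grep per id (a sketch; note the output order differs from the sample, with all matched lines first and the unmatched ids appended at the end):

# pass 1: all log lines containing any id (-F fixed strings, -f patterns from file)
grep -Ff ids_list.csv log_FVAScope > result.txt
# pass 2: append the ids that never matched; -o prints just the matching id,
# and -Fvxf keeps the list entries that are not among them
grep -oFf ids_list.csv log_FVAScope | sort -u > matched_ids.txt
grep -Fvxf matched_ids.txt ids_list.csv >> result.txt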

Fail to load data to Hortonworks Sandbox in Pig

Hi, I am a complete neophyte to Hadoop, and when I first ran the command
LOAD 'Pig/iris.csv' using PigStorage (',')
this error popped out:
LOAD 'Pig/iris.csv' using PigStorage (',');
2014-09-05 06:04:04,853 [main] INFO org.apache.pig.Main - Apache Pig version 0.12.1.2.1.1.0-385 (rexported) compiled Apr 16 2014, 15:59:00
2014-09-05 06:04:04,885 [main] INFO org.apache.pig.Main - Logging error messages to: /dev/null
2014-09-05 06:04:07,077 [main] INFO org.apache.pig.impl.util.Utils - Default bootup file /usr/lib/hue/.pigbootup not found
2014-09-05 06:04:14,699 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
2014-09-05 06:04:14,699 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
2014-09-05 06:04:14,699 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: hdfs://sandbox.hortonworks.com:8020
2014-09-05 06:05:11,826 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
grunt> LOAD 'Pig/iris.csv' using PigStorage (',');
2014-09-05 06:05:13,203 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1000: Error during parsing. Encountered " <IDENTIFIER> "LOAD "" at line 1, column 1.
Was expecting one of:
<EOF>
"cat" ...
"clear" ...
"fs" ...
"sh" ...
"cd" ...
"cp" ...
"copyFromLocal" ...
"copyToLocal" ...
"dump" ...
"\\d" ...
"describe" ...
"\\de" ...
"aliases" ...
"explain" ...
"\\e" ...
"help" ...
"history" ...
"kill" ...
"ls" ...
"mv" ...
"mkdir" ...
"pwd" ...
"quit" ...
"\\q" ...
"register" ...
"rm" ...
"rmf" ...
"set" ...
"illustrate" ...
"\\i" ...
"run" ...
"exec" ...
"scriptDone" ...
"" ...
"" ...
<EOL> ...
";" ...
Details at logfile: /dev/null
Does anyone know how to solve the problem?
LOAD creates a relation. You need to assign it to an alias so that you can refer to it later:
L = LOAD 'Pig/iris.csv' using PigStorage (',');
DUMP L;
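If you also want named, typed columns to work with, you can give LOAD a schema (a sketch; the field names below are the conventional iris columns and are an assumption, since the question doesn't show the file's layout):

L = LOAD 'Pig/iris.csv' USING PigStorage(',')
        AS (sepal_length:double, sepal_width:double,
            petal_length:double, petal_width:double, class:chararray);
petals = FOREACH L GENERATE class, petal_length; -- project two columns from the relation
DUMP petals;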
