App not displaying data after Ensembles Merge - ensembles

I have an iPad app that uses Ensembles; it uploads the Core Data store to iCloud, but when I test it on a second iPad the app starts and the synced data in iCloud is not downloaded (I can see the data in the second iPad's iCloud Manage Data).
This is the log from Verbose logging:
2015-12-24 09:39:50.352 BookstoreInventoryManager[285:20376] [Crashlytics] Version 3.4.1 (92)
2015-12-24 09:39:50.546 BookstoreInventoryManager[285:20376] ensembleStatus: 1
2015-12-24 09:39:51.219 BookstoreInventoryManager[285:20376] -[CDEICloudFileSystem repairEnsembleDirectory:completion:] line 176: Checking if repairs are needed in iCloud Drive
2015-12-24 09:39:51.274 BookstoreInventoryManager[285:20376] -[CDECloudManager retrieveRegistrationInfoForStoreWithIdentifier:completion:] line 724: Retrieving registration info
2015-12-24 09:39:51.315 BookstoreInventoryManager[285:20376] __77-[CDECloudManager retrieveRegistrationInfoForStoreWithIdentifier:completion:]_block_invoke_2 line 743: Downloading file at remote path: /BookInventoryMgr/stores/1778720D-A5E8-472E-8E62-CD9FE8C31D94-215-0000000391179BA0
2015-12-24 09:39:51.385 BookstoreInventoryManager[285:20376] -[CDECloudManager createRemoteDirectories:withCompletion:] line 688: Creating remote directories
2015-12-24 09:39:51.629 BookstoreInventoryManager[285:20376] -[CDECloudManager importNewDataFilesWithCompletion:] line 159: Transferring new data files from cloud to event store
2015-12-24 09:39:51.633 BookstoreInventoryManager[285:20376] -[CDECloudManager importNewBaselineEventsWithCompletion:] line 143: Transferring new baselines from cloud to event store
2015-12-24 09:39:51.659 BookstoreInventoryManager[285:20376] -[CDEBaselineConsolidator consolidateBaselineWithCompletion:] line 65: Consolidating baselines
2015-12-24 09:39:51.667 BookstoreInventoryManager[285:20451] __61-[CDEBaselineConsolidator consolidateBaselineWithCompletion:]_block_invoke line 76: Found baselines with unique ids: ( "092431DC-4576-4062-9A15-6B2ED4318BBA-215-00000003A933FF2D" )
2015-12-24 09:39:51.677 BookstoreInventoryManager[285:20451] __61-[CDEBaselineConsolidator consolidateBaselineWithCompletion:]_block_invoke line 108: Baselines remaining that need merging: ( "092431DC-4576-4062-9A15-6B2ED4318BBA-215-00000003A933FF2D" )
2015-12-24 09:39:51.679 BookstoreInventoryManager[285:20376] __61-[CDEBaselineConsolidator consolidateBaselineWithCompletion:]_block_invoke_2 line 131: Finishing baseline consolidation
2015-12-24 09:39:51.680 BookstoreInventoryManager[285:20376] -[CDECloudManager importNewRemoteNonBaselineEventsWithCompletion:] line 127: Transferring new events from cloud to event store
2015-12-24 09:39:51.771 BookstoreInventoryManager[285:20420] -[CDEEventIntegrator integrate:] line 311: Integrating new events into main context
2015-12-24 09:39:51.779 BookstoreInventoryManager[285:20420] __32-[CDEEventIntegrator integrate:]_block_invoke line 329: Baseline has changed. Will carry out full integration of the persistent store.
2015-12-24 09:39:51.813 BookstoreInventoryManager[285:20376] -[CDECloudManager exportDataFilesWithCompletion:] line 344: Transferring data files from event store to cloud
2015-12-24 09:39:51.821 BookstoreInventoryManager[285:20376] -[CDECloudManager exportNewLocalBaselineWithCompletion:] line 332: Transferring baseline from event store to cloud
2015-12-24 09:39:51.828 BookstoreInventoryManager[285:20376] -[CDECloudManager exportNewLocalNonBaselineEventsWithCompletion:] line 320: Transferring events from event store to cloud
2015-12-24 09:39:51.833 BookstoreInventoryManager[285:20376] -[CDECloudManager removeOutdatedRemoteFilesWithCompletion:] line 584: Removing outdated files
2015-12-24 09:39:51.836 BookstoreInventoryManager[285:20376] -[CDECloudManager removeOutdatedRemoteFilesWithCompletion:] line 603: Baseline files in cloud: {( "0_092431DC-4576-4062-9A15-6B2ED4318BBA-215-00000003A933FF2D_1778720D.cdeevent" )}
2015-12-24 09:39:51.836 BookstoreInventoryManager[285:20376] -[CDECloudManager removeOutdatedRemoteFilesWithCompletion:] line 604: Aliases for baseline files in store: {( "0_092431DC-4576-4062-9A15-6B2ED4318BBA-215-00000003A933FF2D.cdeevent", "0_092431DC-4576-4062-9A15-6B2ED4318BBA-215-00000003A933FF2D_1778720D.cdeevent" )}
2015-12-24 09:39:51.837 BookstoreInventoryManager[285:20376] -[CDECloudManager removeOutdatedRemoteFilesWithCompletion:] line 605: Baseline files to remove: {( )}
2015-12-24 09:39:51.839 BookstoreInventoryManager[285:20376] -[CDECloudManager removeOutdatedRemoteFilesWithCompletion:] line 612: Event files in cloud: {( )}
2015-12-24 09:39:51.839 BookstoreInventoryManager[285:20376] -[CDECloudManager removeOutdatedRemoteFilesWithCompletion:] line 613: Aliases for event files in store: {( )}
2015-12-24 09:39:51.839 BookstoreInventoryManager[285:20376] -[CDECloudManager removeOutdatedRemoteFilesWithCompletion:] line 614: Event files to remove: {( )}
2015-12-24 09:39:51.841 BookstoreInventoryManager[285:20376] -[CDECloudManager removeOutdatedRemoteFilesWithCompletion:] line 620: Data files in cloud: {( )}
2015-12-24 09:39:51.841 BookstoreInventoryManager[285:20376] -[CDECloudManager removeOutdatedRemoteFilesWithCompletion:] line 621: Data files in store: {( )}
2015-12-24 09:39:51.842 BookstoreInventoryManager[285:20376] -[CDECloudManager removeOutdatedRemoteFilesWithCompletion:] line 622: Data files to remove: {( )}
2015-12-24 09:39:51.842 BookstoreInventoryManager[285:20376] -[CDECloudManager removeOutdatedRemoteFilesWithCompletion:] line 635: Removing cloud files:
This all looks normal to me, but the populated Core Data store doesn't appear to be on the second device (I did an MR_findAll, but nothing was returned).
What do you suppose is wrong?
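For context, here is a minimal sketch of the merge-and-refresh flow Ensembles expects on a second device, assuming the standard CDEPersistentStoreEnsemble API and a MagicalRecord default context (the method names on my side are illustrative, not from the original project). The point is that a fetch such as MR_findAll only sees the synced data after the merge completion block fires and the merge-changes notification has been applied to the main context:
// Sketch only: typical leech/merge flow on a second device.
// Import the Ensembles and MagicalRecord headers as appropriate for your project.
- (void)syncWithEnsemble:(CDEPersistentStoreEnsemble *)ensemble
{
    void (^merge)(void) = ^{
        [ensemble mergeWithCompletion:^(NSError *error) {
            if (error) { NSLog(@"Merge failed: %@", error); return; }
            // Only refetch here (e.g. MR_findAll); cloud data is not in the
            // persistent store before the merge completes.
        }];
    };
    if (ensemble.isLeeched) {
        merge();
    } else {
        [ensemble leechPersistentStoreWithCompletion:^(NSError *error) {
            if (error) { NSLog(@"Leech failed: %@", error); return; }
            merge();
        }];
    }
}
// Ensemble delegate: push merged changes into the main context so the UI sees them.
- (void)persistentStoreEnsemble:(CDEPersistentStoreEnsemble *)ensemble
    didSaveMergeChangesWithNotification:(NSNotification *)notification
{
    NSManagedObjectContext *context = [NSManagedObjectContext MR_defaultContext];
    [context performBlockAndWait:^{
        [context mergeChangesFromContextDidSaveNotification:notification];
    }];
}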

Related

Duplicated log events when deleting registry

I'm currently working on a PoC ELK installation and I'd like to re-send every log line of a file which is registered in Filebeat for testing purposes.
This is what I do:
I stop Filebeat
I delete the index in Elasticsearch through Kibana
I delete the Filebeat registry file
I start Filebeat
In Kibana I can see twice as many events as there are log lines, and I can also see that every event is duplicated once.
Why is that?
Filebeat logs:
2017-05-05T14:25:16+02:00 INFO Setup Beat: filebeat; Version: 5.2.2
2017-05-05T14:25:16+02:00 INFO Max Retries set to: 3
2017-05-05T14:25:16+02:00 INFO Activated logstash as output plugin.
2017-05-05T14:25:16+02:00 INFO Publisher name: anonymized
2017-05-05T14:25:16+02:00 INFO Flush Interval set to: 1s
2017-05-05T14:25:16+02:00 INFO Max Bulk Size set to: 2048
2017-05-05T14:25:16+02:00 INFO filebeat start running.
2017-05-05T14:25:16+02:00 INFO No registry file found under: /var/lib/filebeat/registry. Creating a new registry file.
2017-05-05T14:25:16+02:00 INFO Loading registrar data from /var/lib/filebeat/registry
2017-05-05T14:25:16+02:00 INFO States Loaded from registrar: 0
2017-05-05T14:25:16+02:00 INFO Loading Prospectors: 1
2017-05-05T14:25:16+02:00 INFO Prospector with previous states loaded: 0
2017-05-05T14:25:16+02:00 INFO Loading Prospectors completed. Number of prospectors: 1
2017-05-05T14:25:16+02:00 INFO All prospectors are initialised and running with 0 states to persist
2017-05-05T14:25:16+02:00 INFO Starting Registrar
2017-05-05T14:25:16+02:00 INFO Start sending events to output
2017-05-05T14:25:16+02:00 INFO Starting spooler: spool_size: 2048; idle_timeout: 5s
2017-05-05T14:25:16+02:00 INFO Starting prospector of type: log
2017-05-05T14:25:16+02:00 INFO Harvester started for file: /some/where/anonymized.log
2017-05-05T14:25:46+02:00 INFO Non-zero metrics in the last 30s: registrar.writes=2 libbeat.logstash.publish.read_bytes=54 libbeat.logstash.publish.write_bytes=32390 libbeat.logstash.published_and_acked_events=578 filebeat.harvester.running=1 registar.states.current=1 libbeat.logstash.call_count.PublishEvents=1 libbeat.publisher.published_events=578 publish.events=579 filebeat.harvester.started=1 registrar.states.update=579 filebeat.harvester.open_files=1
2017-05-05T14:26:16+02:00 INFO No non-zero metrics in the last 30s
Deleting the registry file caused the problem.
Filebeat manages the state of each file and the acknowledgement of each event with the prospector (in memory) and with the registry file (persisted on disk).
Please read the documentation here.
You can manage the _id field of each event yourself, so that any event that is duplicated (for any reason, even in a production environment) is not indexed twice in Elasticsearch but instead updates the existing document.
Add the following to your Logstash pipeline configuration file.
filter {
  # if your logs don't have a unique ID, use the fingerprint filter to generate one
  fingerprint {
    # use the message field, or choose other field(s) that can give you a unique ID
    source => ["message"]
    target => "LogID"
    key => "something"
    method => "MD5"
    concatenate_sources => true
  }
}

# in your output section
output {
  elasticsearch {
    hosts => ["localhost:9200"]
    document_id => "%{LogID}"
    index => "yourindex"
  }
}

ERROR 2997: Unable to recreate exception from backed error: while using CSVExcelStorage

ERROR 2997: Unable to recreate exception from backed error.
Here I have parsed an Apache log file, but when I try to export it to CSV format this error occurs. Code and error:
grunt> STORE logs INTO '/home/cloudera/workspace/Test_log.csv' USING org.apache.pig.piggybank.storage.CSVExcelStorage(',', 'NO_MULTILINE','NOCHANGE');
2015-12-24 10:50:44,821 [main] INFO
org.apache.pig.tools.pigstats.ScriptState - Pig features used in the
script: UNKNOWN
2015-12-24 10:50:44,830 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler
- File concatenation threshold: 100 optimistic? false
2015-12-24 10:50:44,937 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler
- mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
2015-12-24 10:50:49,055 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler
- Setting up single store job
2015-12-24 10:50:49,056 [main] INFO
org.apache.pig.data.SchemaTupleFrontend - Key [pig.schematuple] is
false, will not generate code.
2015-12-24 10:50:49,056 [main] INFO
org.apache.pig.data.SchemaTupleFrontend - Starting process to move
generated code to distributed cache
2015-12-24 10:50:49,056 [main] INFO
org.apache.pig.data.SchemaTupleFrontend - Setting key
[pig.schematuple.classes] with classes to deserialize []
2015-12-24 10:50:49,158 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
- 1 map-reduce job(s) waiting for submission.
2015-12-24 10:50:49,158 [main] INFO
org.apache.hadoop.conf.Configuration.deprecation - mapred.job.tracker
is deprecated. Instead, use mapreduce.jobtracker.address
2015-12-24 10:50:49,159 [JobControl] INFO
org.apache.hadoop.yarn.client.RMProxy - Connecting to ResourceManager
at /0.0.0.0:8032
2015-12-24 10:50:49,177 [JobControl] INFO
org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is
deprecated. Instead, use fs.defaultFS
2015-12-24 10:50:49,428 [JobControl] INFO
org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input
paths to process : 1
2015-12-24 10:50:49,431 [JobControl] INFO
org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total
input paths (combined) to process : 2
2015-12-24 10:50:49,467 [JobControl] INFO
org.apache.hadoop.mapreduce.JobSubmitter - number of splits:2
2015-12-24 10:50:49,518 [JobControl] INFO
org.apache.hadoop.mapreduce.JobSubmitter - Submitting tokens for job:
job_1450979216927_0004
2015-12-24 10:50:49,578 [JobControl] INFO
org.apache.hadoop.yarn.client.api.impl.YarnClientImpl - Submitted
application application_1450979216927_0004
2015-12-24 10:50:49,581 [JobControl] INFO
org.apache.hadoop.mapreduce.Job - The url to track the job:
http://quickstart.cloudera:8088/proxy/application_1450979216927_0004/
2015-12-24 10:50:49,659 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
- HadoopJobId: job_1450979216927_0004
2015-12-24 10:50:49,659 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
- Processing aliases logs
2015-12-24 10:50:49,659 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
- detailed locations: M: logs[7,7],null[-1,-1] C: R:
2015-12-24 10:50:49,659 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
- More information at: http://localhost:50030/jobdetails.jsp?jobid=job_1450979216927_0004
2015-12-24 10:50:49,702 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
- 0% complete
2015-12-24 10:51:17,030 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
- 50% complete
2015-12-24 10:52:04,848 [main] WARN
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
- Ooops! Some job has failed! Specify -stop_on_failure if you want Pig to stop immediately on failure.
2015-12-24 10:52:04,848 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
- job job_1450979216927_0004 has failed! Stop running all dependent jobs
2015-12-24 10:52:04,848 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
- 100% complete
2015-12-24 10:52:05,039 [main] ERROR
org.apache.pig.tools.pigstats.SimplePigStats - ERROR 2997: Unable to
recreate exception from backed error:
AttemptID:attempt_1450979216927_0004_m_000001_3 Info:Error:
org.apache.pig.data.Tuple.isNull()Z
2015-12-24 10:52:05,040 [main] ERROR org.apache.pig.tools.pigstats.PigStatsUtil - 1 map reduce job(s) failed!
2015-12-24 10:52:05,040 [main] INFO org.apache.pig.tools.pigstats.SimplePigStats - Script Statistics:
HadoopVersion    PigVersion         UserId     StartedAt            FinishedAt           Features
2.6.0-cdh5.4.0   0.12.0-cdh5.4.0    cloudera   2015-12-24 10:50:44  2015-12-24 10:52:05  UNKNOWN
Failed!
Failed Jobs:
JobId                    Alias   Feature    Message               Outputs
job_1450979216927_0004   logs    MAP_ONLY   Message: Job failed!  /home/cloudera/workspace/Test_log.csv,
Input(s): Failed to read data from "/myhdfs/project/TestLog.txt"
Output(s): Failed to produce result in "/home/cloudera/workspace/Test_log.csv"
Counters:
Total records written : 0
Total bytes written : 0
Spillable Memory Manager spill count : 0
Total bags proactively spilled: 0
Total records proactively spilled: 0
Job DAG: job_1450979216927_0004
2015-12-24 10:52:05,040 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
- Failed!
This seems to be an error in your data-processing chain. The log entry below points out that a tuple is null. Just an idea: this can happen, e.g., when you use custom UDFs that return null in some cases.
2015-12-24 10:52:05,039 [main] ERROR org.apache.pig.tools.pigstats.SimplePigStats - ERROR 2997: Unable to recreate exception from backed error: AttemptID:attempt_1450979216927_0004_m_000001_3 Info:Error: org.apache.pig.data.Tuple.isNull()Z

Big Data: Sqoop-Export Error

I am very new to this world. While running an export command using Sqoop, I am getting the error below: "Input path does not exist: hdfs://quickstart.cloudera:8020/home/cloudera/Test5". I have checked the path /home/cloudera/Test5 and the file exists there. The HDFS path details come from the core-site.xml file of the Sqoop configuration. When I tested it through a file browser (just opening IE and typing hdfs://quickstart.cloudera:8020/home/cloudera/Test5), the message "Unable to connect" comes up. I do not know the correct parameter values for the property. Please help me solve this issue.
Please find the property-file parameter and error details below.
Parameter file
<name>fs.defaultFS</name>
<value>hdfs://quickstart.cloudera:8020</value>
Error
[cloudera@quickstart hadoop-conf]$ sqoop export --connect jdbc:sqlserver://10.34.83.177:54815 --username hadoop --password xxxxxx --table hadoop_sanjeeb3 --export-dir /home/cloudera/Test5 -mapreduce-job-name sqoop_export_job -m 1
Warning: /usr/lib/sqoop/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
15/10/01 08:42:47 INFO sqoop.Sqoop: Running Sqoop version: 1.4.5-cdh5.4.2
15/10/01 08:42:47 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
15/10/01 08:42:48 INFO manager.SqlManager: Using default fetchSize of 1000
15/10/01 08:42:48 INFO tool.CodeGenTool: Beginning code generation
15/10/01 08:42:49 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM [hadoop_sanjeeb3] AS t WHERE 1=0
15/10/01 08:42:49 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /usr/lib/hadoop-mapreduce
Note: /tmp/sqoop-cloudera/compile/aa9c9fd9f69b76202be29508561f22ff/hadoop_sanjeeb3.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
15/10/01 08:42:51 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-cloudera/compile/aa9c9fd9f69b76202be29508561f22ff/hadoop_sanjeeb3.jar
15/10/01 08:42:51 INFO mapreduce.ExportJobBase: Beginning export of hadoop_sanjeeb3
15/10/01 08:42:51 INFO Configuration.deprecation: mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
15/10/01 08:42:51 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
15/10/01 08:42:54 WARN mapreduce.ExportJobBase: Input path hdfs://quickstart.cloudera:8020/home/cloudera/Test5 does not exist
15/10/01 08:42:54 INFO Configuration.deprecation: mapred.reduce.tasks.speculative.execution is deprecated. Instead, use mapreduce.reduce.speculative
15/10/01 08:42:54 INFO Configuration.deprecation: mapred.map.tasks.speculative.execution is deprecated. Instead, use mapreduce.map.speculative
15/10/01 08:42:54 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
15/10/01 08:42:54 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
15/10/01 08:42:58 INFO mapreduce.JobSubmitter: Cleaning up the staging area /tmp/hadoop-yarn/staging/cloudera/.staging/job_1443557935828_0011
15/10/01 08:42:58 WARN security.UserGroupInformation: PriviledgedActionException as:cloudera (auth:SIMPLE) cause:org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does not exist: hdfs://quickstart.cloudera:8020/home/cloudera/Test5
15/10/01 08:42:58 ERROR tool.ExportTool: Encountered IOException running export job: org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does not exist: hdfs://quickstart.cloudera:8020/home/cloudera/Test5
Regards - Sanjeeb
Seems like you have some confusion between the local file system and the Hadoop file system. The file that you are trying to export using Sqoop should be present in HDFS. The directory location /home/cloudera/Test5 seems to be in the local file system.
Execute the command given below and confirm that the location that you mentioned exists in hdfs.
hadoop fs -ls /home/cloudera/Test5
If this gives an error, it means the directory doesn't exist in HDFS. You can't browse HDFS with a simple ls command; you have to use hadoop commands. If you want to browse HDFS directories using a browser, open the namenode web UI (http://namenode-host:50070); there you have an option to browse the files and directories.
You cannot browse the HDFS filesystem in the browser using a URL like hdfs://quickstart.cloudera:8020/home/cloudera/Test5. You can use WebHDFS for a similar operation.
Ensure that the file is present in HDFS and trigger the command again. It will work.
NB: Usually we never keep user directories like /home/cloudera in HDFS. The structure will be something like /user/{username}; by default, HDFS treats /user/{username} as the home directory, where {username} is the currently logged-in Linux user.
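For example, a WebHDFS listing of an HDFS directory over plain HTTP might look like this (a sketch assuming the default quickstart namenode web port 50070 and that WebHDFS is enabled; adjust host, port, and path to your cluster):
# List an HDFS directory through the WebHDFS REST API instead of an hdfs:// URL
curl "http://quickstart.cloudera:50070/webhdfs/v1/user/cloudera?op=LISTSTATUS"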
That file may be in the local file system, but not in the Hadoop Distributed File System (HDFS). You can add those local files from the local file system to HDFS with the
hadoop fs -put <local_file_path> <HDFS_directory>
command. You should do it as an HDFS user.
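Putting it together, a sketch of the whole fix (the HDFS path /user/cloudera/Test5 is illustrative; the connection string and table come from the original command, and -P prompts for the password instead of passing it on the command line):
# Copy the local data into HDFS and verify it is there
hadoop fs -put /home/cloudera/Test5 /user/cloudera/Test5
hadoop fs -ls /user/cloudera/Test5

# Re-run the export with --export-dir pointing at the HDFS location
sqoop export --connect jdbc:sqlserver://10.34.83.177:54815 --username hadoop -P \
  --table hadoop_sanjeeb3 --export-dir /user/cloudera/Test5 -mapreduce-job-name sqoop_export_job -m 1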

If the first reduce attempt fails (network connection issues), the subsequent reduce attempts (retries) fail because the output file already exists

I have MapReduce jobs failing badly on Amazon EMR because, if the first attempt fails to copy results to S3, a (probably partial) file is created and subsequent reduce attempts refuse to write to a file that already exists.
The first attempt log:
2014-11-30 06:56:19,774 INFO [main] com.amazonaws.latency: StatusCode=[404], Exception=[com.amazonaws.services.s3.model.AmazonS3Exception: Not Found (Service: Amazon S3; Status Code: 404; Error Code: null; Request ID: removed), S3 Extended Request ID: removed=], ServiceName=[Amazon S3], AWSErrorCode=[null], AWSRequestID=[removed], ServiceEndpoint=[https://devel.rui.s3.amazonaws.com], Exception=1, HttpClientPoolLeasedCount=0, RequestCount=1, HttpClientPoolPendingCount=0, HttpClientPoolAvailableCount=0, ClientExecuteTime=[130.087], HttpRequestTime=[118.72], HttpClientReceiveResponseTime=[32.585], RequestSigningTime=[0.646], HttpClientSendRequestTime=[0.835],
2014-11-30 06:56:19,803 INFO [main] com.amazonaws.latency: StatusCode=[404], Exception=[com.amazonaws.services.s3.model.AmazonS3Exception: Not Found (Service: Amazon S3; Status Code: 404; Error Code: null; Request ID: removed), S3 Extended Request ID: 1removed=], ServiceName=[Amazon S3], AWSErrorCode=[null], AWSRequestID=[removed], ServiceEndpoint=[https://removed.s3.amazonaws.com], Exception=1, HttpClientPoolLeasedCount=0, RequestCount=1, HttpClientPoolPendingCount=0, HttpClientPoolAvailableCount=1, ClientExecuteTime=[27.899], HttpRequestTime=[26.898], HttpClientReceiveResponseTime=[9.405], RequestSigningTime=[0.559], HttpClientSendRequestTime=[1.016],
2014-11-30 06:56:19,939 INFO [main] com.amazonaws.latency: StatusCode=[200], ServiceName=[Amazon S3], AWSRequestID=[removed], ServiceEndpoint=[https://removedi.s3.amazonaws.com], HttpClientPoolLeasedCount=0, RequestCount=1, HttpClientPoolPendingCount=0, HttpClientPoolAvailableCount=1, ClientExecuteTime=[127.219], HttpRequestTime=[20.791], HttpClientReceiveResponseTime=[15.467], RequestSigningTime=[0.391], ResponseProcessingTime=[82.617], HttpClientSendRequestTime=[0.955],
2014-11-30 06:56:19,999 INFO [main] org.apache.hadoop.conf.Configuration.deprecation: mapred.skip.on is deprecated. Instead, use mapreduce.job.skiprecords
A retry attempt log (they all look the same):
RequestSigningTime=[0.663], ResponseProcessingTime=[12.466], HttpClientSendRequestTime=[0.832],
2014-11-30 07:23:56,526 WARN [main] org.apache.hadoop.mapred.YarnChild: Exception running child :
java.io.IOException: File already
exists:s3n://removed/removed/part-r-00005.gz
at com.amazon.ws.emr.hadoop.fs.s3n.S3NativeFileSystem.create(S3NativeFileSystem.java:615)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:910)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:891)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:788)
at com.amazon.ws.emr.hadoop.fs.EmrFileSystem.create(EmrFileSystem.java:169)
at org.apache.hadoop.mapreduce.lib.output.TextOutputFormat.getRecordWriter(TextOutputFormat.java:135)
at org.apache.hadoop.mapred.ReduceTask$NewTrackingRecordWriter.<init>(ReduceTask.java:548)
at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:622)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:390)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:167)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)
The funny thing is that if I open part-r-00005.gz, it has content inside and is in the format it is supposed to be.
Any ideas how to solve this (and how to do it)?
a) deal with the latency better (e.g. increase the timeout), or
b) have the retry delete the existing file if it already exists.
You can modify your job to write output to a temporary directory that is named with a jobId or timestamp for uniqueness, then, when processing is complete, move the contents to your desired output location. That way, if something goes wrong during processing after partial output has been written, your desired output directory isn't affected. This also means that you won't accidentally read that partial output from the failed job.
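A rough driver-side sketch of that idea, assuming a standard Hadoop 2 MapReduce job (the bucket and prefixes are made up for illustration): the job writes to a run-specific temporary prefix, and the output is only renamed into place after the job reports success, so partial output from failed attempts never lands in the final location.
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class SafeOutputDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "reduce-with-safe-output");
        // ... configure mapper, reducer, input paths, and formats as usual ...

        // Unique temporary prefix per run; bucket and paths are illustrative.
        long runId = System.currentTimeMillis();
        String tmpOut = "s3n://my-bucket/tmp/run-" + runId;
        String finalOut = "s3n://my-bucket/output/run-" + runId;
        FileOutputFormat.setOutputPath(job, new Path(tmpOut));

        if (job.waitForCompletion(true)) {
            // Promote the finished output only on success.
            FileSystem fs = FileSystem.get(URI.create(tmpOut), conf);
            fs.rename(new Path(tmpOut), new Path(finalOut));
        }
    }
}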

Flume NG not writing to HDFS

I'm new to using Flume and Hadoop, so I'm trying to set up the simplest (but somewhat helpful/realistic) example I can. I'm using the HortonWorks Sandbox in a VM client. After following one tutorial (which involves setting up and using Flume), everything seems to be working correctly.
So I setup my own flume.conf that should
Read from an apache access log
Use a memory channel
Write to the HDFS
Simple enough right? Here's my conf file
agent.sources=exec-source
agent.sinks=hdfs-sink
agent.channels=ch1
agent.sources.exec-source.type=exec
agent.sources.exec-source.command=tail -F /var/log/httpd/access_log
agent.sinks.hdfs-sink.type=hdfs
agent.sinks.hdfs-sink.hdfs.path=/flume/events
agent.sinks.hdfs-sink.hdfs.filePrefix=apacheaccess
agent.sinks.hdfs-sink.hdfs.rollInterval=10
agent.sinks.hdfs-sink.hdfs.rollSize=0
agent.channels.ch1.type=memory
agent.channels.ch1.capacity=1000
agent.sources.exec-source.channels=ch1
agent.sinks.hdfs-sink.channel=ch1
I've seen several people have problems writing to HDFS, and in most cases it was that there weren't enough logs to fill the HDFS block. However, rollInterval=10 should generate a new file every 10 seconds as long as at least one line is written to it. I can run "tail -F /var/log/httpd/access_log" in another window and see lines being written to the log fairly consistently, so I don't think that's it.
and here's the command/output from trying to start this agent
[root@sandbox ~]# flume-ng agent -f /etc/flume/conf/flume.conf -n apache-agent
Warning: No configuration directory set! Use --conf <dir> to override.
Info: Including Hadoop libraries found via (/usr/bin/hadoop) for HDFS access
Info: Excluding /usr/lib/hadoop/libexec/../lib/slf4j-api-1.4.3.jar from classpath
Info: Excluding /usr/lib/hadoop/libexec/../lib/slf4j-log4j12-1.4.3.jar from classpath
Info: Including HBASE libraries found via (/usr/bin/hbase) for HBASE access
Info: Excluding /usr/lib/hbase/bin/../lib/slf4j-api-1.6.1.jar from classpath
Info: Excluding /usr/lib/hbase/bin/../lib/slf4j-log4j12-1.6.1.jar from classpath
Info: Excluding /usr/lib/hadoop/lib/slf4j-api-1.4.3.jar from classpath
Info: Excluding /usr/lib/hadoop/lib/slf4j-log4j12-1.4.3.jar from classpath
Info: Excluding /usr/lib/zookeeper/lib/slf4j-api-1.6.1.jar from classpath
Info: Excluding /usr/lib/zookeeper/lib/slf4j-log4j12-1.6.1.jar from classpath
Info: Excluding /usr/lib/hadoop/libexec/../lib/slf4j-api-1.4.3.jar from classpath
Info: Excluding /usr/lib/hadoop/libexec/../lib/slf4j-log4j12-1.4.3.jar from classpath
+ exec /usr/jdk/jdk1.6.0_31//bin/java -Xmx20m -cp '/usr/lib/flume/lib/*:/usr/lib/hadoop/libexec/../conf:/usr/jdk/jdk1.6.0_31/lib/tools.jar:/usr/lib/hadoop/libexec/..:/usr/lib/hadoop/libexec/../hadoop-core-1.2.0.1.3.0.0-107.jar:/usr/lib/hadoop/libexec/../lib/ambari-log4j-1.2.3.7.jar:/usr/lib/hadoop/libexec/../lib/asm-3.2.jar:/usr/lib/hadoop/libexec/../lib/aspectjrt-1.6.11.jar:/usr/lib/hadoop/libexec/../lib/aspectjtools-1.6.11.jar:/usr/lib/hadoop/libexec/../lib/commons-beanutils-1.7.0.jar:/usr/lib/hadoop/libexec/../lib/commons-beanutils-core-1.8.0.jar:/usr/lib/hadoop/libexec/../lib/commons-cli-1.2.jar:/usr/lib/hadoop/libexec/../lib/commons-codec-1.4.jar:/usr/lib/hadoop/libexec/../lib/commons-collections-3.2.1.jar:/usr/lib/hadoop/libexec/../lib/commons-configuration-1.6.jar:/usr/lib/hadoop/libexec/../lib/commons-daemon-1.0.1.jar:/usr/lib/hadoop/libexec/../lib/commons-digester-1.8.jar:/usr/lib/hadoop/libexec/../lib/commons-el-1.0.jar:/usr/lib/hadoop/libexec/../lib/commons-httpclient-3.0.1.jar:/usr/lib/hadoop/libexec/../lib/commons-io-2.1.jar:/usr/lib/hadoop/libexec/../lib/commons-lang-2.4.jar:/usr/lib/hadoop/libexec/../lib/commons-logging-1.1.1.jar:/usr/lib/hadoop/libexec/../lib/commons-logging-api-1.0.4.jar:/usr/lib/hadoop/libexec/../lib/commons-math-2.1.jar:/usr/lib/hadoop/libexec/../lib/commons-net-3.1.jar:/usr/lib/hadoop/libexec/../lib/core-3.1.1.jar:/usr/lib/hadoop/libexec/../lib/guava-11.0.2.jar:/usr/lib/hadoop/libexec/../lib/hadoop-capacity-scheduler-1.2.0.1.3.0.0-107.jar:/usr/lib/hadoop/libexec/../lib/hadoop-fairscheduler-1.2.0.1.3.0.0-107.jar:/usr/lib/hadoop/libexec/../lib/hadoop-lzo-0.5.0.jar:/usr/lib/hadoop/libexec/../lib/hadoop-thriftfs-1.2.0.1.3.0.0-107.jar:/usr/lib/hadoop/libexec/../lib/hadoop-tools.jar:/usr/lib/hadoop/libexec/../lib/hsqldb-1.8.0.10.jar:/usr/lib/hadoop/libexec/../lib/hue-plugins-2.2.0-SNAPSHOT.jar:/usr/lib/hadoop/libexec/../lib/jackson-core-asl-1.8.8.jar:/usr/lib/hadoop/libexec/../lib/jackson-mapper-asl-1.8.8.jar:/usr/lib/hadoop/libexec/../lib/jasper-compiler-5.5.12.jar:/usr/lib/hadoop/libexec/../lib/jasper-runtime-5.5.12.jar:/usr/lib/hadoop/libexec/../lib/jdeb-0.8.jar:/usr/lib/hadoop/libexec/../lib/jersey-core-1.8.jar:/usr/lib/hadoop/libexec/../lib/jersey-json-1.8.jar:/usr/lib/hadoop/libexec/../lib/jersey-server-1.8.jar:/usr/lib/hadoop/libexec/../lib/jets3t-0.6.1.jar:/usr/lib/hadoop/libexec/../lib/jetty-6.1.26.jar:/usr/lib/hadoop/libexec/../lib/jetty-util-6.1.26.jar:/usr/lib/hadoop/libexec/../lib/jsch-0.1.42.jar:/usr/lib/hadoop/libexec/../lib/junit-4.5.jar:/usr/lib/hadoop/libexec/../lib/kfs-0.2.2.jar:/usr/lib/hadoop/libexec/../lib/log4j-1.2.15.jar:/usr/lib/hadoop/libexec/../lib/mockito-all-1.8.5.jar:/usr/lib/hadoop/libexec/../lib/netty-3.6.2.Final.jar:/usr/lib/hadoop/libexec/../lib/oro-2.0.8.jar:/usr/lib/hadoop/libexec/../lib/postgresql-9.1-901-1.jdbc4.jar:/usr/lib/hadoop/libexec/../lib/servlet-api-2.5-20081211.jar:/usr/lib/hadoop/libexec/../lib/xmlenc-0.52.jar:/usr/lib/hadoop/libexec/../lib/jsp-2.1/jsp-2.1.jar:/usr/lib/hadoop/libexec/../lib/jsp-2.1/jsp-api-2.1.jar:/usr/lib/hbase/bin/../conf:/usr/jdk/jdk1.6.0_31/lib/tools.jar:/usr/lib/hbase/bin/..:/usr/lib/hbase/bin/../hbase-0.94.6.1.3.0.0-107-security.jar:/usr/lib/hbase/bin/../hbase-0.94.6.1.3.0.0-107-security-tests.jar:/usr/lib/hbase/bin/../lib/activation-1.1.jar:/usr/lib/hbase/bin/../lib/asm-3.1.jar:/usr/lib/hbase/bin/../lib/avro-1.5.3.jar:/usr/lib/hbase/bin/../lib/avro-ipc-1.5.3.jar:/usr/lib/hbase/bin/../lib/commons-beanutils-1.7.0.jar:/usr/lib/hbase/bin/../lib/commons-beanutils-core-1.8.0.jar:/usr/lib/hbase/
bin/../lib/commons-cli-1.2.jar:/usr/lib/hbase/bin/../lib/commons-codec-1.4.jar:/usr/lib/hbase/bin/../lib/commons-collections-3.2.1.jar:/usr/lib/hbase/bin/../lib/commons-configuration-1.6.jar:/usr/lib/hbase/bin/../lib/commons-digester-1.8.jar:/usr/lib/hbase/bin/../lib/commons-el-1.0.jar:/usr/lib/hbase/bin/../lib/commons-httpclient-3.1.jar:/usr/lib/hbase/bin/../lib/commons-io-2.1.jar:/usr/lib/hbase/bin/../lib/commons-lang-2.5.jar:/usr/lib/hbase/bin/../lib/commons-logging-1.1.1.jar:/usr/lib/hbase/bin/../lib/commons-math-2.1.jar:/usr/lib/hbase/bin/../lib/commons-net-1.4.1.jar:/usr/lib/hbase/bin/../lib/core-3.1.1.jar:/usr/lib/hbase/bin/../lib/guava-11.0.2.jar:/usr/lib/hbase/bin/../lib/hadoop-core.jar:/usr/lib/hbase/bin/../lib/high-scale-lib-1.1.1.jar:/usr/lib/hbase/bin/../lib/httpclient-4.1.2.jar:/usr/lib/hbase/bin/../lib/httpcore-4.1.3.jar:/usr/lib/hbase/bin/../lib/jackson-core-asl-1.8.8.jar:/usr/lib/hbase/bin/../lib/jackson-jaxrs-1.8.8.jar:/usr/lib/hbase/bin/../lib/jackson-mapper-asl-1.8.8.jar:/usr/lib/hbase/bin/../lib/jackson-xc-1.8.8.jar:/usr/lib/hbase/bin/../lib/jamon-runtime-2.3.1.jar:/usr/lib/hbase/bin/../lib/jasper-compiler-5.5.23.jar:/usr/lib/hbase/bin/../lib/jasper-runtime-5.5.23.jar:/usr/lib/hbase/bin/../lib/jaxb-api-2.1.jar:/usr/lib/hbase/bin/../lib/jaxb-impl-2.2.3-1.jar:/usr/lib/hbase/bin/../lib/jersey-core-1.8.jar:/usr/lib/hbase/bin/../lib/jersey-json-1.8.jar:/usr/lib/hbase/bin/../lib/jersey-server-1.8.jar:/usr/lib/hbase/bin/../lib/jettison-1.1.jar:/usr/lib/hbase/bin/../lib/jetty-6.1.26.jar:/usr/lib/hbase/bin/../lib/jetty-util-6.1.26.jar:/usr/lib/hbase/bin/../lib/jruby-complete-1.6.5.jar:/usr/lib/hbase/bin/../lib/jsp-2.1-6.1.14.jar:/usr/lib/hbase/bin/../lib/jsp-api-2.1-6.1.14.jar:/usr/lib/hbase/bin/../lib/jsr305-1.3.9.jar:/usr/lib/hbase/bin/../lib/junit-4.10-HBASE-1.jar:/usr/lib/hbase/bin/../lib/libthrift-0.8.0.jar:/usr/lib/hbase/bin/../lib/log4j-1.2.16.jar:/usr/lib/hbase/bin/../lib/metrics-core-2.1.2.jar:/usr/lib/hbase/bin/../lib/netty-3.2.4.Final.jar:/usr/lib/hbase/bin/../lib/protobuf-java-2.4.0a.jar:/usr/lib/hbase/bin/../lib/servlet-api-2.5-6.1.14.jar:/usr/lib/hbase/bin/../lib/snappy-java-1.0.3.2.jar:/usr/lib/hbase/bin/../lib/stax-api-1.0.1.jar:/usr/lib/hbase/bin/../lib/velocity-1.7.jar:/usr/lib/hbase/bin/../lib/xmlenc-0.52.jar:/usr/lib/hbase/bin/../lib/zookeeper.jar:/etc/hadoop/conf:/usr/lib/hadoop/bin:/usr/lib/hadoop/build.xml:/usr/lib/hadoop/CHANGES.txt:/usr/lib/hadoop/conf:/usr/lib/hadoop/contrib:/usr/lib/hadoop/hadoop-ant-1.2.0.1.3.0.0-107.jar:/usr/lib/hadoop/hadoop-ant.jar:/usr/lib/hadoop/hadoop-client-1.2.0.1.3.0.0-107.jar:/usr/lib/hadoop/hadoop-client.jar:/usr/lib/hadoop/hadoop-core-1.2.0.1.3.0.0-107.jar:/usr/lib/hadoop/hadoop-core.jar:/usr/lib/hadoop/hadoop-examples-1.2.0.1.3.0.0-107.jar:/usr/lib/hadoop/hadoop-examples.jar:/usr/lib/hadoop/hadoop-minicluster-1.2.0.1.3.0.0-107.jar:/usr/lib/hadoop/hadoop-minicluster.jar:/usr/lib/hadoop/hadoop-test-1.2.0.1.3.0.0-107.jar:/usr/lib/hadoop/hadoop-test.jar:/usr/lib/hadoop/hadoop-tools-1.2.0.1.3.0.0-107.jar:/usr/lib/hadoop/hadoop-tools.jar:/usr/lib/hadoop/HDP-CHANGES.txt:/usr/lib/hadoop/ivy:/usr/lib/hadoop/ivy.xml:/usr/lib/hadoop/lib:/usr/lib/hadoop/libexec:/usr/lib/hadoop/LICENSE.txt:/usr/lib/hadoop/logs:/usr/lib/hadoop/LONGWING-CHANGES.txt:/usr/lib/hadoop/NOTICE.txt:/usr/lib/hadoop/pids:/usr/lib/hadoop/README.txt:/usr/lib/hadoop/sbin:/usr/lib/hadoop/webapps:/usr/lib/hadoop/lib/ambari-log4j-1.2.3.7.jar:/usr/lib/hadoop/lib/asm-3.2.jar:/usr/lib/hadoop/lib/aspectjrt-1.6.11.jar:/usr/lib/hadoop/lib/aspectjtools-1.6.11.jar:/usr/lib/ha
doop/lib/commons-beanutils-1.7.0.jar:/usr/lib/hadoop/lib/commons-beanutils-core-1.8.0.jar:/usr/lib/hadoop/lib/commons-cli-1.2.jar:/usr/lib/hadoop/lib/commons-codec-1.4.jar:/usr/lib/hadoop/lib/commons-collections-3.2.1.jar:/usr/lib/hadoop/lib/commons-configuration-1.6.jar:/usr/lib/hadoop/lib/commons-daemon-1.0.1.jar:/usr/lib/hadoop/lib/commons-digester-1.8.jar:/usr/lib/hadoop/lib/commons-el-1.0.jar:/usr/lib/hadoop/lib/commons-httpclient-3.0.1.jar:/usr/lib/hadoop/lib/commons-io-2.1.jar:/usr/lib/hadoop/lib/commons-lang-2.4.jar:/usr/lib/hadoop/lib/commons-logging-1.1.1.jar:/usr/lib/hadoop/lib/commons-logging-api-1.0.4.jar:/usr/lib/hadoop/lib/commons-math-2.1.jar:/usr/lib/hadoop/lib/commons-net-3.1.jar:/usr/lib/hadoop/lib/core-3.1.1.jar:/usr/lib/hadoop/lib/guava-11.0.2.jar:/usr/lib/hadoop/lib/hadoop-capacity-scheduler-1.2.0.1.3.0.0-107.jar:/usr/lib/hadoop/lib/hadoop-fairscheduler-1.2.0.1.3.0.0-107.jar:/usr/lib/hadoop/lib/hadoop-lzo-0.5.0.jar:/usr/lib/hadoop/lib/hadoop-thriftfs-1.2.0.1.3.0.0-107.jar:/usr/lib/hadoop/lib/hadoop-tools.jar:/usr/lib/hadoop/lib/hsqldb-1.8.0.10.jar:/usr/lib/hadoop/lib/hsqldb-1.8.0.10.LICENSE.txt:/usr/lib/hadoop/lib/hue-plugins-2.2.0-SNAPSHOT.jar:/usr/lib/hadoop/lib/jackson-core-asl-1.8.8.jar:/usr/lib/hadoop/lib/jackson-mapper-asl-1.8.8.jar:/usr/lib/hadoop/lib/jasper-compiler-5.5.12.jar:/usr/lib/hadoop/lib/jasper-runtime-5.5.12.jar:/usr/lib/hadoop/lib/jdeb-0.8.jar:/usr/lib/hadoop/lib/jdiff:/usr/lib/hadoop/lib/jersey-core-1.8.jar:/usr/lib/hadoop/lib/jersey-json-1.8.jar:/usr/lib/hadoop/lib/jersey-server-1.8.jar:/usr/lib/hadoop/lib/jets3t-0.6.1.jar:/usr/lib/hadoop/lib/jetty-6.1.26.jar:/usr/lib/hadoop/lib/jetty-util-6.1.26.jar:/usr/lib/hadoop/lib/jsch-0.1.42.jar:/usr/lib/hadoop/lib/jsp-2.1:/usr/lib/hadoop/lib/junit-4.5.jar:/usr/lib/hadoop/lib/kfs-0.2.2.jar:/usr/lib/hadoop/lib/kfs-0.2.LICENSE.txt:/usr/lib/hadoop/lib/log4j-1.2.15.jar:/usr/lib/hadoop/lib/mockito-all-1.8.5.jar:/usr/lib/hadoop/lib/native:/usr/lib/hadoop/lib/netty-3.6.2.Final.jar:/usr/lib/hadoop/lib/oro-2.0.8.jar:/usr/lib/hadoop/lib/hue-plugins-2.2.0-SNAPSHOT.jar:/usr/lib/hadoop/lib/hue-plugins-2.2.0-SNAPSHOT.jar:/usr/lib/hadoop/lib/*plugin*jar:/usr/lib/hadoop/lib/postgresql-9.1-901-1.jdbc4.jar:/usr/lib/hadoop/lib/servlet-api-2.5-20081211.jar:/usr/lib/hadoop/lib/xmlenc-0.52.jar:/usr/lib/zookeeper/bin:/usr/lib/zookeeper/conf:/usr/lib/zookeeper/lib:/usr/lib/zookeeper/zookeeper-3.4.5.1.3.0.0-107.jar:/usr/lib/zookeeper/zookeeper.jar:/usr/lib/zookeeper/lib/ant-1.8.0.jar:/usr/lib/zookeeper/lib/ant-launcher-1.8.0.jar:/usr/lib/zookeeper/lib/backport-util-concurrent-3.1.jar:/usr/lib/zookeeper/lib/classworlds-1.1-alpha-2.jar:/usr/lib/zookeeper/lib/commons-codec-1.6.jar:/usr/lib/zookeeper/lib/commons-io-2.2.jar:/usr/lib/zookeeper/lib/commons-logging-1.1.1.jar:/usr/lib/zookeeper/lib/httpclient-4.2.3.jar:/usr/lib/zookeeper/lib/httpcore-4.2.3.jar:/usr/lib/zookeeper/lib/jline-0.9.94.jar:/usr/lib/zookeeper/lib/jsoup-1.7.1.jar:/usr/lib/zookeeper/lib/log4j-1.2.15.jar:/usr/lib/zookeeper/lib/maven-ant-tasks-2.1.3.jar:/usr/lib/zookeeper/lib/maven-artifact-2.2.1.jar:/usr/lib/zookeeper/lib/maven-artifact-manager-2.2.1.jar:/usr/lib/zookeeper/lib/maven-error-diagnostics-2.2.1.jar:/usr/lib/zookeeper/lib/maven-model-2.2.1.jar:/usr/lib/zookeeper/lib/maven-plugin-registry-2.2.1.jar:/usr/lib/zookeeper/lib/maven-profile-2.2.1.jar:/usr/lib/zookeeper/lib/maven-project-2.2.1.jar:/usr/lib/zookeeper/lib/maven-repository-metadata-2.2.1.jar:/usr/lib/zookeeper/lib/maven-settings-2.2.1.jar:/usr/lib/zookeeper/lib/nekohtml-1.9.6.2.jar:/usr/lib/zookeeper/l
ib/netty-3.2.2.Final.jar:/usr/lib/zookeeper/lib/plexus-container-default-1.0-alpha-9-stable-1.jar:/usr/lib/zookeeper/lib/plexus-interpolation-1.11.jar:/usr/lib/zookeeper/lib/plexus-utils-3.0.8.jar:/usr/lib/zookeeper/lib/wagon-file-1.0-beta-6.jar:/usr/lib/zookeeper/lib/wagon-http-2.4.jar:/usr/lib/zookeeper/lib/wagon-http-lightweight-1.0-beta-6.jar:/usr/lib/zookeeper/lib/wagon-http-shared-1.0-beta-6.jar:/usr/lib/zookeeper/lib/wagon-http-shared4-2.4.jar:/usr/lib/zookeeper/lib/wagon-provider-api-2.4.jar:/usr/lib/zookeeper/lib/xercesMinimal-1.9.6.2.jar:/usr/lib/hadoop/libexec/../conf:/usr/jdk/jdk1.6.0_31/lib/tools.jar:/usr/lib/hadoop/libexec/..:/usr/lib/hadoop/libexec/../hadoop-core-1.2.0.1.3.0.0-107.jar:/usr/lib/hadoop/libexec/../lib/ambari-log4j-1.2.3.7.jar:/usr/lib/hadoop/libexec/../lib/asm-3.2.jar:/usr/lib/hadoop/libexec/../lib/aspectjrt-1.6.11.jar:/usr/lib/hadoop/libexec/../lib/aspectjtools-1.6.11.jar:/usr/lib/hadoop/libexec/../lib/commons-beanutils-1.7.0.jar:/usr/lib/hadoop/libexec/../lib/commons-beanutils-core-1.8.0.jar:/usr/lib/hadoop/libexec/../lib/commons-cli-1.2.jar:/usr/lib/hadoop/libexec/../lib/commons-codec-1.4.jar:/usr/lib/hadoop/libexec/../lib/commons-collections-3.2.1.jar:/usr/lib/hadoop/libexec/../lib/commons-configuration-1.6.jar:/usr/lib/hadoop/libexec/../lib/commons-daemon-1.0.1.jar:/usr/lib/hadoop/libexec/../lib/commons-digester-1.8.jar:/usr/lib/hadoop/libexec/../lib/commons-el-1.0.jar:/usr/lib/hadoop/libexec/../lib/commons-httpclient-3.0.1.jar:/usr/lib/hadoop/libexec/../lib/commons-io-2.1.jar:/usr/lib/hadoop/libexec/../lib/commons-lang-2.4.jar:/usr/lib/hadoop/libexec/../lib/commons-logging-1.1.1.jar:/usr/lib/hadoop/libexec/../lib/commons-logging-api-1.0.4.jar:/usr/lib/hadoop/libexec/../lib/commons-math-2.1.jar:/usr/lib/hadoop/libexec/../lib/commons-net-3.1.jar:/usr/lib/hadoop/libexec/../lib/core-3.1.1.jar:/usr/lib/hadoop/libexec/../lib/guava-11.0.2.jar:/usr/lib/hadoop/libexec/../lib/hadoop-capacity-scheduler-1.2.0.1.3.0.0-107.jar:/usr/lib/hadoop/libexec/../lib/hadoop-fairscheduler-1.2.0.1.3.0.0-107.jar:/usr/lib/hadoop/libexec/../lib/hadoop-lzo-0.5.0.jar:/usr/lib/hadoop/libexec/../lib/hadoop-thriftfs-1.2.0.1.3.0.0-107.jar:/usr/lib/hadoop/libexec/../lib/hadoop-tools.jar:/usr/lib/hadoop/libexec/../lib/hsqldb-1.8.0.10.jar:/usr/lib/hadoop/libexec/../lib/hue-plugins-2.2.0-SNAPSHOT.jar:/usr/lib/hadoop/libexec/../lib/jackson-core-asl-1.8.8.jar:/usr/lib/hadoop/libexec/../lib/jackson-mapper-asl-1.8.8.jar:/usr/lib/hadoop/libexec/../lib/jasper-compiler-5.5.12.jar:/usr/lib/hadoop/libexec/../lib/jasper-runtime-5.5.12.jar:/usr/lib/hadoop/libexec/../lib/jdeb-0.8.jar:/usr/lib/hadoop/libexec/../lib/jersey-core-1.8.jar:/usr/lib/hadoop/libexec/../lib/jersey-json-1.8.jar:/usr/lib/hadoop/libexec/../lib/jersey-server-1.8.jar:/usr/lib/hadoop/libexec/../lib/jets3t-0.6.1.jar:/usr/lib/hadoop/libexec/../lib/jetty-6.1.26.jar:/usr/lib/hadoop/libexec/../lib/jetty-util-6.1.26.jar:/usr/lib/hadoop/libexec/../lib/jsch-0.1.42.jar:/usr/lib/hadoop/libexec/../lib/junit-4.5.jar:/usr/lib/hadoop/libexec/../lib/kfs-0.2.2.jar:/usr/lib/hadoop/libexec/../lib/log4j-1.2.15.jar:/usr/lib/hadoop/libexec/../lib/mockito-all-1.8.5.jar:/usr/lib/hadoop/libexec/../lib/netty-3.6.2.Final.jar:/usr/lib/hadoop/libexec/../lib/oro-2.0.8.jar:/usr/lib/hadoop/libexec/../lib/postgresql-9.1-901-1.jdbc4.jar:/usr/lib/hadoop/libexec/../lib/servlet-api-2.5-20081211.jar:/usr/lib/hadoop/libexec/../lib/xmlenc-0.52.jar:/usr/lib/hadoop/libexec/../lib/jsp-2.1/jsp-2.1.jar:/usr/lib/hadoop/libexec/../lib/jsp-2.1/jsp-api-2.1.jar:/conf' 
-Djava.library.path=:/usr/lib/hadoop/libexec/../lib/native/Linux-amd64-64:/usr/lib/hadoop/libexec/../lib/native/Linux-amd64-64:/usr/lib/hbase/bin/../lib/native/Linux-amd64-64 org.apache.flume.node.Application -f /etc/flume/conf/flume.conf -n apache-agent
13/09/03 12:35:11 INFO node.PollingPropertiesFileConfigurationProvider: Configuration provider starting
13/09/03 12:35:11 INFO node.PollingPropertiesFileConfigurationProvider: Reloading configuration file:/etc/flume/conf/flume.conf
13/09/03 12:35:11 INFO conf.FlumeConfiguration: Added sinks: hdfs-sink Agent: agent
13/09/03 12:35:11 INFO conf.FlumeConfiguration: Processing:hdfs-sink
13/09/03 12:35:11 INFO conf.FlumeConfiguration: Processing:hdfs-sink
13/09/03 12:35:11 INFO conf.FlumeConfiguration: Processing:hdfs-sink
13/09/03 12:35:11 INFO conf.FlumeConfiguration: Processing:hdfs-sink
13/09/03 12:35:11 INFO conf.FlumeConfiguration: Processing:hdfs-sink
13/09/03 12:35:11 INFO conf.FlumeConfiguration: Processing:hdfs-sink
13/09/03 12:35:11 INFO conf.FlumeConfiguration: Post-validation flume configuration contains configuration for agents: [agent]
13/09/03 12:35:11 WARN node.AbstractConfigurationProvider: No configuration found for this host:apache-agent
13/09/03 12:35:11 INFO node.Application: Starting new configuration:{ sourceRunners:{} sinkRunners:{} channels:{} }
Now at this point I realize I'm missing several things.
1) I expect to see something along the lines of "INFO instrumentation.MonitoredCounterGroup: Component type: SINK, name: hdfs-sink started" as my last line, which I don't
2) If I use the command “hadoop fs -lsr /flume” I should see new logs in my HDFS, but I don't. The last logs are from 8/28/2013, when I did the tutorial.
I also don't expect to see that WARN line in there, but I'm not sure why it's there, so maybe that's my problem and someone can tell me why.
So my questions are:
1) Can anyone tell me what might be going wrong here?
2) When I get this problem sorted out, is there anything else I should be looking for to see what Flume is working correctly, reading what it should and writing to where it should and when?
The answer is, of course, to name your agent when you start Flume the same as the agent name in the config file. So my command line should have ended with "-n agent" and NOT "-n apache-agent", since my flume.conf file specifies "agent.X".
After that everything appears to work.
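For reference, the corrected invocation looks roughly like this (also passing --conf to silence the "No configuration directory set" warning; the paths are the ones from the original command):
flume-ng agent --conf /etc/flume/conf -f /etc/flume/conf/flume.conf -n agent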
In the config file you specified
agent.sources=exec-source
agent.sinks=hdfs-sink
agent.channels=ch1
so the agent name is 'agent'. Flume expects you to run the Flume agent with the same name as specified in the config file, so the command should be
/usr/lib/flume/bin/flume-ng agent -n agent
Did you set the agent name in step #3?
Check out the original blog post, and the Hadoop UI Hue and its Hadoop tutorials.
