I'm trying to load two different configurations at the same time in a Logstash pipeline. The two configurations have different inputs, filters, and outputs.
Note: my ELK setup runs on Docker Swarm.
I tried like below:
[ec2-user@ip-10-9-9-92 logstash]$ ls -lrth pipeline/
-rw-rw-r-- 1 ec2-user ec2-user 1.4K Feb 20 16:53 alb.conf
-rw-rw-r-- 1 ec2-user ec2-user 2.0K Feb 20 17:28 logstash.conf
Here my Elasticsearch is the same, but the indexes are different in those two files. Please help me with it.
You can try to use the -f flag for different inputs (edit, with thanks to @leandrojmp):
logstash -f '/path/{file1,file2}'
Note that this still runs a single pipeline: the matched files are concatenated, so events from both inputs pass through all filters and outputs unless you guard them with conditionals. Alternatively, use a single file with if conditions for the filter and output plugins - if you have the same input.
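For example, a minimal sketch of the conditional approach (the paths, tags, and index names here are hypothetical):
input {
  file { path => "/var/log/alb/*.log" tags => ["alb"] }
  file { path => "/var/log/app/*.log" tags => ["app"] }
}
output {
  # same Elasticsearch, different index per tagged input
  if "alb" in [tags] {
    elasticsearch { hosts => ["http://elasticsearch:9200"] index => "alb-%{+YYYY.MM.dd}" }
  } else {
    elasticsearch { hosts => ["http://elasticsearch:9200"] index => "app-%{+YYYY.MM.dd}" }
  }
}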
I'm seeing nonsense values for user names in folder permissions for NFS-mounted HDFS locations, while the HDFS locations themselves (using Hortonworks HDP 3.1) appear fine. E.g.
➜ ~ ls -lh /nfs_mount_root/user
total 6.5K
drwx------. 3 accumulo hdfs 96 Jul 19 13:53 accumulo
drwxr-xr-x. 3 92668751 hadoop 96 Jul 25 15:17 admin
drwxrwx---. 3 ambari-qa hdfs 96 Jul 19 13:54 ambari-qa
drwxr-xr-x. 3 druid hadoop 96 Jul 19 13:53 druid
drwxr-xr-x. 2 hbase hdfs 64 Jul 19 13:50 hbase
drwx------. 5 hdfs hdfs 160 Aug 26 10:41 hdfs
drwxr-xr-x. 4 hive hdfs 128 Aug 26 10:24 hive
drwxr-xr-x. 5 h_etl hdfs 160 Aug 9 14:54 h_etl
drwxr-xr-x. 3 108146 hdfs 96 Aug 1 15:43 ml1
drwxrwxr-x. 3 oozie hdfs 96 Jul 19 13:56 oozie
drwxr-xr-x. 3 882121447 hdfs 96 Aug 5 10:56 q_etl
drwxrwxr-x. 2 spark hdfs 64 Jul 19 13:57 spark
drwxr-xr-x. 6 zeppelin hdfs 192 Aug 23 15:45 zeppelin
➜ ~ hadoop fs -ls /user
Found 13 items
drwx------ - accumulo hdfs 0 2019-07-19 13:53 /user/accumulo
drwxr-xr-x - admin hadoop 0 2019-07-25 15:17 /user/admin
drwxrwx--- - ambari-qa hdfs 0 2019-07-19 13:54 /user/ambari-qa
drwxr-xr-x - druid hadoop 0 2019-07-19 13:53 /user/druid
drwxr-xr-x - hbase hdfs 0 2019-07-19 13:50 /user/hbase
drwx------ - hdfs hdfs 0 2019-08-26 10:41 /user/hdfs
drwxr-xr-x - hive hdfs 0 2019-08-26 10:24 /user/hive
drwxr-xr-x - h_etl hdfs 0 2019-08-09 14:54 /user/h_etl
drwxr-xr-x - ml1 hdfs 0 2019-08-01 15:43 /user/ml1
drwxrwxr-x - oozie hdfs 0 2019-07-19 13:56 /user/oozie
drwxr-xr-x - q_etl hdfs 0 2019-08-05 10:56 /user/q_etl
drwxrwxr-x - spark hdfs 0 2019-07-19 13:57 /user/spark
drwxr-xr-x - zeppelin hdfs 0 2019-08-23 15:45 /user/zeppelin
Notice the difference for users ml1 and q_etl: they show numeric user values when running ls on the NFS locations, rather than their user names.
Even doing something like...
[hdfs@HW04 ml1]$ hadoop fs -chown ml1 /user/ml1
does not change the NFS permissions. Even more annoying, when trying to change the NFS mount permissions as root, we see
[root@HW04 ml1]# chown ml1 /nfs_mount_root/user/ml1
chown: changing ownership of ‘/nfs_mount_root/user/ml1’: Permission denied
This causes real problems, since the differing uids mean that I can't write to these dirs even as the "correct" user. Not sure what to make of this. Does anyone with more Hadoop experience have any debugging suggestions or fixes?
UPDATE:
Doing a bit more testing / debugging, I found that the rules appear to be...
If the NFS server node has no uid (or gid?) that matches the uid of the user on the node accessing the NFS mount, we get the weird numeric uid values seen above.
If there is a uid associated with the username of the user on the requesting node, then that is the uid we see assigned to the location when accessing via NFS (even if that uid on the NFS server node does not actually belong to the requesting user), e.g.
[root@HW01 ~]# clush -ab id ml1
---------------
HW[01,04] (2)
---------------
uid=1025(ml1) gid=1025(ml1) groups=1025(ml1)
---------------
HW[02-03] (2)
---------------
uid=1027(ml1) gid=1027(ml1) groups=1027(ml1)
---------------
HW05
---------------
uid=1026(ml1) gid=1026(ml1) groups=1026(ml1)
[root@HW01 ~]# exit
logout
Connection to hw01 closed.
➜ ~ ls -lh /hdpnfs/user
total 6.5K
...
drwxr-xr-x. 6 atlas hdfs 192 Aug 27 12:04 ml1
...
➜ ~ hadoop fs -ls /user
Found 13 items
...
drwxr-xr-x - ml1 hdfs 0 2019-08-27 12:04 /user/ml1
...
[root@HW01 ~]# clush -ab id atlas
---------------
HW[01,04] (2)
---------------
uid=1027(atlas) gid=1005(hadoop) groups=1005(hadoop)
---------------
HW[02-03] (2)
---------------
uid=1024(atlas) gid=1005(hadoop) groups=1005(hadoop)
---------------
HW05
---------------
uid=1005(atlas) gid=1006(hadoop) groups=1006(hadoop)
If you're wondering why I have users on the cluster whose uids vary across the cluster nodes, see the problem posted here: How to properly change uid for HDP / ambari-created user? (note that these odd uid settings for the hadoop service users were put in place by Ambari by default).
After talking with someone more knowledgeable in HDP hadoop, I found that the problem arises when Ambari is set up and run to initially install the hadoop cluster on nodes that already have preexisting users.
Ambari creates its various service users by giving each one the next available UID from a node's block of user UIDs. However, prior to installing Ambari and HDP on the nodes, I had created some users on the to-be namenode (and others) in order to do some initial maintenance checks and tests (I should have just done this as root). Adding these extra users offset the UID counter on those nodes, so as Ambari created users on each node and incremented the UIDs, it was starting from different counter values. Thus the UIDs did not sync across nodes, which causes problems for HDFS NFS.
To fix this, I...
Used Ambari to stop all running HDP services
Went to Service Accounts in Ambari and copied all of the expected service user name strings
For each user, ran something like id <service username> to get its group(s). For service groups (which may have multiple members), you can do something like grep 'group-name-here' /etc/group. I recommend doing it this way, as the Ambari docs of default users and groups do not have some of the info that you can get here.
Used userdel and groupdel to remove all the Ambari service users and groups
Then recreated all the groups across the cluster
Then recreated all the users across the cluster (you may need to specify the UID if some nodes have users that others don't)
Restarted the HDP services (hopefully everything still runs as if nothing happened, since HDP looks users up by their literal name strings, not their UIDs)
For the last parts, you can use something like clustershell, e.g.
# remove user
$ clush -ab userdel <service username>
# check that the UID you want to use is actually available on all nodes
$ clush -ab id <some specific UID you want to use>
# assign that UID to a new service user
$ clush -ab useradd --uid <the specific UID> --gid <groupname> <service username>
To get the next available UID in the service range on each node (then taking the highest across nodes as the common one), I used...
# for UID
getent passwd | awk -F: '($3>1000) && ($3<10000) && ($3>maxuid) { maxuid=$3; } END { print maxuid+1; }'
# for GID
getent group | awk -F: '($3>1000) && ($3<10000) && ($3>maxgid) { maxgid=$3; } END { print maxgid+1; }'
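To run that probe on every node in one shot, here's a sketch with clustershell (assuming clush is already configured for all nodes; take the highest value reported as the cluster-wide next free UID):
# same awk as above, quoted so it expands on each remote node
clush -ab 'getent passwd | awk -F: "(\$3>1000) && (\$3<10000) && (\$3>m) { m=\$3 } END { print m+1 }"'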
Ambari also creates /home dirs for its users. Once you are done recreating the users, you will need to fix the ownership of those dirs (you can use something like clush there as well).
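E.g., a sketch of that cleanup (the username and group are placeholders):
# restore ownership of each recreated user's home dir on every node
clush -ab 'chown -R <service username>:<groupname> /home/<service username>'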
* Note that this was a huge pain, and you would need to manually correct the UIDs of users whenever you add another cluster node. I did this for a test cluster, but for production (or even a larger test) you should just use Kerberos or SSSD + Active Directory.
I need to sort some data in order to compare them. I use the sort command.
When it's used in a terminal it works well, but when I use it in a script the output file shrinks.
Size of the files:
-rw-r--r-- 1 apache apache 10646 24 juil. 16:43 db1.txt
-rw-r--r-- 1 apache apache 10664 24 juil. 16:43 db2.txt
Directly in the terminal:
sort db1.txt > db1sorted.txt
sort db2.txt > db2sorted.txt
gives
-rw-r--r-- 1 apache apache 10646 24 juil. 16:06 db1sorted.txt
-rw-r--r-- 1 apache apache 10664 24 juil. 16:06 db2sorted.txt
While with the following script (of course the inputs are db1.txt and db2.txt):
FILE1=$1
FILE2=$2
sort ${FILE1} > db1sortedd.txt
sort ${FILE2} > db2sortedd.txt
gives
-rw-r--r-- 1 apache apache 10646 24 juil. 17:01 db1sortedd.txt
-rw-r--r-- 1 apache apache 8193 24 juil. 17:01 db2sortedd.txt
I don't understand why db2sortedd.txt is smaller with the script than from the terminal. Data are disappearing starting at the letter s.
Thank you in advance for your help.
As suggested, I tried with smaller files (10 lines) and it works well.
So it's working for small files but not for big ones...
I'm on OS X 10.9.4 and trying to use newsyslog to rotate my app development log files.
More specifically, I want to rotate the files daily but only if they are not empty (newsyslog writes one or two lines to every logfile it rotates, so let's say I only want to rotate logs that are at least 1kb).
I created a file /etc/newsyslog.d/code.conf:
# logfilename [owner:group] mode count size when flags [/pid_file] [sig_num]
/Users/manuel/code/**/log/*.log manuel:staff 644 7 1 $D0 GN
The way I understand the man page for the configuration file is that size and when conditions should work in combination, so logfiles should be rotated every night at midnight only if they are 1kb or larger.
Unfortunately this is not what happens. The log files are rotated every night, no matter whether they contain only the rotation message from newsyslog or anything else:
~/code/myapp/log (master) $ ls
total 32
drwxr-xr-x 6 manuel staff 204B Aug 8 00:17 .
drwxr-xr-x 22 manuel staff 748B Jul 25 14:56 ..
-rw-r--r-- 1 manuel staff 64B Aug 8 00:17 development.log
-rw-r--r-- 1 manuel staff 153B Aug 8 00:17 development.log.0
~/code/myapp/log (master) $ cat development.log
Aug 8 00:17:41 localhost newsyslog[81858]: logfile turned over
~/code/myapp/log (master) $ cat development.log.0
Aug 7 00:45:17 Manuels-MacBook-Pro newsyslog[34434]: logfile turned over due to size>1K
Aug 8 00:17:41 localhost newsyslog[81858]: logfile turned over
Any tips on how to get this working would be appreciated!
What you're looking for (rotate files daily unless they haven't logged anything) isn't possible using newsyslog. The man page you referenced doesn't say anything about size and when being combined, other than to say that if when isn't specified, then it is as if only size were specified. The reality is that the log is rotated when either condition is met. If the utility is like its FreeBSD counterpart, it won't rotate logs smaller than 512 bytes unless the binary flag is set.
macOS's newer replacement for newsyslog, ASL, also doesn't have the behavior you desire. As far as I know, the only utility that has this is logrotate, via its notifempty configuration option. You can install logrotate on your Mac using Homebrew.
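For example, a minimal logrotate sketch of the behavior you described (the config location is where Homebrew's logrotate usually looks; the schedule and count are assumptions, and note logrotate's globbing may not support ** recursion the way newsyslog's does):
# e.g. /usr/local/etc/logrotate.d/code
/Users/manuel/code/*/log/*.log {
  daily
  rotate 7
  missingok
  notifempty
}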
I am new to Hadoop. I have to find the trend of symbols traded among users.
I have 2 machines, b040n10 and b040n11. The files on the machines are as below:
b040n10:/u/ssekar>ls -lrt
-rw-r--r-- 1 root root 482342353 Feb 8 2014 A.log
-rw-r--r-- 1 root root 481231231 Feb 8 2014 B.log
b040n11:/u/ssekar>ls -lrt
-rw-r--r-- 1 root root 412312312 Feb 8 2014 C.log
-rw-r--r-- 1 root root 412356315 Feb 8 2014 D.log
There is a field called "symbol_name" on all these logs (example below).
IP=145.45.34.2;symbol_name=ABC;timestamp=12:13:05
IP=145.45.34.2;symbol_name=XYZ;timestamp=12:13:56
IP=145.45.34.2;symbol_name=ABC;timestamp=12:14:56
I have Hadoop running on my Laptop and I have 2 machines connected to my Laptop (can be used as Datanodes).
My task now is to get the list of symbol_name values and the count for each symbol.
As mentioned below:
ABC - 2
XYZ - 1
Should I now:
1. Copy all the files (A.log, B.log, C.log, D.log) from b040n10 and b040n11 to my Laptop,
2. Issue a copyFromLocal command to the HDFS system and analyze the data?
Or is there a better way to find the symbol_name counts without copying these files to my laptop?
The question is a basic one, but I am new to Hadoop; please help me understand and use Hadoop better. Please let me know if more information on the question is needed.
Thanks
Copying the files from Hadoop to your local laptop defeats the entire purpose of Hadoop, which is to move the processing to the data, not the other way around. When you really have "Big Data", you won't be able to move the data around to process it locally.
Your problem is a typical Map/Reduce case: all you need is a job that counts the occurrences of each symbol. Just search for the Map/Reduce WordCount example and adapt it to your case; see also the sketch below.
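For instance, a hedged sketch using Hadoop Streaming, so the mapper and reducer can stay in plain shell (the streaming jar path and HDFS paths are assumptions; a Java WordCount adaptation works just as well):
#!/bin/sh
# mapper.sh -- print "SYMBOL<tab>1" for every symbol_name field seen
awk -F';' '{ for (i = 1; i <= NF; i++)
               if ($i ~ /^symbol_name=/) { sub(/^symbol_name=/, "", $i); print $i "\t1" } }'
#!/bin/sh
# reducer.sh -- lines arrive sorted by key; sum the counts per symbol
awk -F'\t' '{ count[$1] += $2 } END { for (s in count) print s " - " count[s] }'
# load the logs into HDFS, then run the streaming job
chmod +x mapper.sh reducer.sh
hdfs dfs -mkdir -p /user/ssekar/input
hdfs dfs -put A.log B.log C.log D.log /user/ssekar/input
hadoop jar "$HADOOP_HOME"/share/hadoop/tools/lib/hadoop-streaming-*.jar \
  -files mapper.sh,reducer.sh \
  -mapper mapper.sh -reducer reducer.sh \
  -input /user/ssekar/input -output /user/ssekar/symbol_counts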
I have a local file
-rw-r--r-- 1 me developers 102445154 Oct 22 10:02 file1.csv
which I'm attempting to put to hdfs:
/usr/bin/hdfs dfs -put ./file1.csv hdfs://000.00.00.00/user/me/
which works fine, but the group is wrong
-rw-r--r-- 3 me me 102445154 2013-10-22 10:23 hdfs://000.00.00.00/user/file1.csv
How do I get the group developers to come with?
Use hdfs dfs -chgrp on the file.
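For example (host and path taken from your put command):
hdfs dfs -chgrp developers hdfs://000.00.00.00/user/me/file1.csv
Also note that new files in HDFS inherit the group of their parent directory (the BSD rule), so setting the group once on the target directory makes future puts land with the right group:
hdfs dfs -chgrp developers hdfs://000.00.00.00/user/me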