Total size of similar indices - elasticsearch

Kibana shows statistics for every index on the monitoring page. How can we group indices by type to get their overall size? For example, I've got a lot of winlogbeat-6.2.2-YYYY.mm.dd indices and would like to know how much space all of them consume in total.
Thanks!

One way to achieve what you want is to use the Index stats API and keep only the store.size_in_bytes value via filter_path, like this:
winlogbeat-6.2.2*/_stats?filter_path=_all.total.store.size_in_bytes
You'll get a response like this:
{
  "_all": {
    "total": {
      "store": {
        "size_in_bytes": 922069687976
      }
    }
  }
}
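If you prefer to pull that number programmatically, here is a minimal Python sketch of the same request (the localhost:9200 host is an assumption; adjust it for your cluster):
import json
import urllib.request

# Query the Index stats API, keeping only the total store size.
url = ("http://localhost:9200/winlogbeat-6.2.2*/_stats"
       "?filter_path=_all.total.store.size_in_bytes")
with urllib.request.urlopen(url) as resp:
    stats = json.load(resp)

print(stats["_all"]["total"]["store"]["size_in_bytes"])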
Another way to achieve what you want involves leveraging the Cat APIs, a bit of grep and a tad of awk...
The following shell command will give you the number of bytes consumed by all your winlogbeat-6.2.2 indices:
curl -s localhost:9200/_cat/indices?bytes=b | grep winlogbeat-6.2.2 | awk '{s+=$9} END {print s}'
You'll get a single number, like this: 922069687976
Let me explain:
The first command will retrieve all indices via the _cat/indices API.
curl -s localhost:9200/_cat/indices?bytes=b
The second command keeps only the indices matching winlogbeat-6.2.2
grep winlogbeat-6.2.2
The last command does the magic of summing up all numbers in the 9th column (i.e. store.size)
awk '{s+=$9} END {print s}'
VoilĂ ...

If you collect monitoring stats for your cluster, you can use Kibana for visualization:
Prerequisites: X-Pack, which creates the .monitoring-* indices.
Create a scripted field for the index pattern .monitoring-es-6-*:
Field name: normalized_index_name. (This will only work for the SOME-INDEX-YYYY.MM.DD pattern, as it removes everything from the index name after the last dash. A scripted field with a regex could be used instead, but regexes in scripted fields must be explicitly enabled in the Elasticsearch config.)
def name = doc['index_stats.index'].value;
if (name != null) {
    // Strip the date suffix: everything after the last dash.
    int lastDashIndex = name.lastIndexOf('-');
    if (lastDashIndex > 0) {
        return name.substring(0, lastDashIndex);
    }
}
return name;
Create a Line visualization. Note:
the time frame should be set to 7 days
start building the visualization from the X-axis, otherwise it may not split the series properly (possibly a bug)

Related

Extract 2 fields from string with search

I have a file with several lines of data. The fields are not always in the same position/column. I want to search for 2 strings and then show only the field and the data that follows. For example:
{"id":"1111","name":"2222","versionCurrent":"3333","hwVersion":"4444"}
{"id":"5555","name":"6666","hwVersion":"7777"}
I would like to return the following:
"id":"1111","hwVersion":"4444"
"id":"5555","hwVersion":"7777"
I am struggling because the data isn't always in the same position, so I can't choose a column number. I feel I need to search for "id" and "hwVersion". Any help is GREATLY appreciated.
Totally agree with @KamilCuk. More specifically:
jq -c '{id: .id, hwVersion: .hwVersion}' <<< '{"id":"1111","name":"2222","versionCurrent":"3333","hwVersion":"4444"}'
Outputs:
{"id":"1111","hwVersion":"4444"}
Not quite the specified output, but valid JSON.
More to the point, your input should probably be processed record by record, and my guess is that a two-column output with "id" and "hwVersion" would be even easier to parse:
cat << EOF | jq -j '"\(.id)\t\(.hwVersion)\n"'
{"id":"1111","name":"2222","versionCurrent":"3333","hwVersion":"4444"}
{"id":"5555","name":"6666","hwVersion":"7777"}
EOF
Outputs:
1111 4444
5555 7777
Since the data looks like mapping objects in JSON format, something like this should do, if you don't mind using Python (which comes with JSON support):
import json

def get_id_hw(s):
    d = json.loads(s)
    return '"id":"{}","hwVersion":"{}"'.format(d["id"], d["hwVersion"])
We take a line of input as string s and parse it as JSON into dictionary d. Then we return a formatted string with the double-quoted id and hwVersion keys, each followed by a colon and the double-quoted value of the corresponding key from the previously obtained dict.
We can try this with these test input strings and print the results:
# These will be our test inputs.
s1 = '{"id":"1111","name":"2222","versionCurrent":"3333","hwVersion":"4444"}'
s2 = '{"id":"5555","name":"6666","hwVersion":"7777"}'
# we pass and print them here
print(get_id_hw(s1))
print(get_id_hw(s2))
But we can just as well iterate over lines of any input.
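For example, a minimal sketch that reuses get_id_hw from above to process every line read from standard input:
import sys

# Print the extracted fields for each non-empty input line.
for line in sys.stdin:
    line = line.strip()
    if line:
        print(get_id_hw(line))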
If you really wanted to use awk, you could, but it's not the most robust and suitable tool:
awk '{ i = gensub(/.*"id":"([0-9]+)".*/, "\\1", "g")
       h = gensub(/.*"hwVersion":"([0-9]+)".*/, "\\1", "g")
       printf("\"id\":\"%s\",\"hwVersion\":\"%s\"\n", i, h) }' /your/file
Since you mention the position is not known, and assuming it can be in any order, we use one regex to extract id and another to get hwVersion, then print them in the given format (note that gensub requires GNU awk). If the values could be something other than decimal digits as in your example, the [0-9]+ part would need to reflect that.
And just for the fun of it (this one preserves the order of the entries in the file), in sed:
sed -e 's#.*\("\(id\|hwVersion\)":"[0-9]\+"\).*\("\(id\|hwVersion\)":"[0-9]\+"\).*#\1,\3#' file
It looks for two groups of "id" or "hwVersion" followed by :"<DECIMAL_DIGITS>".

Get unique count based on part of string but print whole string

I want to get a unique count based on part of the string, but after counting, the whole string should be displayed.
Sample Logs:
Error [VALIDATION_ERROR_OFFER_ALREADY_EXISTS] Code [VAL-00019] Message
Error [VALIDATION_ERROR_OFFER_NOT_EXISTS] Code [VAL-00023] Message [Offer
Error [WEB_SERVICE_CLIENT_INITIALIZATION_FAILED] Code [WS-00001] Message [Error while initializing CBCM Web Service Client.]
Now, based on the first part between [], I want to get counts across the whole log file, but the first of all matching lines should be displayed in full.
zgrep -h 'Error' my.log|awk -F'[][]' '{print $2}'|sort| uniq -c
The above only prints:
3 VALIDATION_ERROR_OFFER_ALREADY_EXISTS
1 VALIDATION_ERROR_OFFER_NOT_EXISTS
5 WEB_SERVICE_CLIENT_INITIALIZATION_FAILED
but I am looking for it to display one complete sample line after the count, like:
3 Error [VALIDATION_ERROR_OFFER_ALREADY_EXISTS] Code [VAL-00019] Message
This prints the first line found along with the count of what's inside the square brackets, using your existing method:
zcat your.log.gz | awk -F'[][]' '
!($2 in c) {c[$2]=$0}
{a[$2]++}
END {for(i in c){printf "%4d %s\n",a[i],c[i]}}
'
The logic here is that the c[] array will store first appearances of content, the a[] array functions as a counter for errors. The END block steps through the array (either would do, as they share indices), printing counts and content. Note that output from this is NOT necessarily in the same order as input, but you haven't specified that as a requirement.
You could make it a single command line if you like. I spread it out for easier reading.
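For comparison, here is a rough Python sketch of the same first-line-plus-count logic (the file name your.log.gz is carried over from the awk version):
import gzip

first = {}   # first full line seen for each bracketed key
counts = {}  # number of lines seen per key

with gzip.open("your.log.gz", "rt") as f:
    for line in f:
        if "[" not in line or "]" not in line:
            continue
        # Equivalent of awk's $2 with -F'[][]': text inside the first [].
        key = line.split("[", 1)[1].split("]", 1)[0]
        first.setdefault(key, line.rstrip("\n"))
        counts[key] = counts.get(key, 0) + 1

for key in first:
    print("%4d %s" % (counts[key], first[key]))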

Calculating averages over ranges of patterns

I am very new to this kind of work so bear with me please :) I am trying to calculate means over ranges of patterns. E.g. I have two files which are tab delimited:
The file coverage.txt contains two columns. The first column indicates the position and the second the value assigned to that position. There are ca. 4*10^6 positions.
coverage.txt
1 10
2 30
3 5
4 10
The second file "patterns.txt" contains three columns 1. the name of the pattern, 2. the starting position of the pattern and 3. end position of the pattern. The pattern ranges do not overlap. There are ca. 3000 patterns.
patterns.txt
rpoB 1 2
gyrA 3 4
Now I want to calculate the mean of the values assigned to the positions of the different patterns and write the output to a new file containing the first column of patterns.txt as an identifier.
output.txt
rpoB 20
gyrA 7.5
I think this can be accomplished using awk but I do not know where to start. Your help would be greatly appreciated!
With four million positions, it might be time to reach for a more substantial programming language than shell/awk, but you can do it in a single pass with something like this:
awk '{
  if (FILENAME ~ "patterns.txt") {
    # First file: record the start and end position for each pattern.
    min[$1] = $2
    max[$1] = $3
  } else {
    # Second file: add this value to every pattern whose range covers the position.
    for (pat in min) {
      if ($1 >= min[pat] && $1 <= max[pat]) {
        total[pat] += $2
        count[pat] += 1
      }
    }
  }
}
END {
  for (pat in total) {
    print pat, total[pat]/count[pat]
  }
}' patterns.txt coverage.txt
This omits any patterns that don't have any data in the coverage file; you can change the loop in the END to loop over everything in the patterns file instead and just output 0s for the ones that didn't show up.
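Since the answer suggests reaching for a more substantial language at this scale, here is a rough single-pass Python sketch of the same computation; because the pattern ranges do not overlap, it maps each position to its pattern up front instead of scanning every pattern for every coverage line (file names are taken from the question):
# Map every covered position to its pattern name.
pos_to_pat = {}
with open("patterns.txt") as f:
    for line in f:
        name, start, end = line.split()
        for pos in range(int(start), int(end) + 1):
            pos_to_pat[pos] = name

# Accumulate sums and counts per pattern in one pass over the coverage.
total = {}
count = {}
with open("coverage.txt") as f:
    for line in f:
        pos, value = line.split()
        pat = pos_to_pat.get(int(pos))
        if pat is not None:
            total[pat] = total.get(pat, 0.0) + float(value)
            count[pat] = count.get(pat, 0) + 1

with open("output.txt", "w") as out:
    for pat in total:
        out.write("%s %g\n" % (pat, total[pat] / count[pat]))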

Grep for displaying count of multiple strings in a single file

Another question: can I get the count of items that are unique? In my previous case, I just took a simple instance. My business requirement is here:
I have strings like the below:
happy=7
happy=5
happy=5,
Basically, I will be using a regex to search for the word happy; I would give something like "happy=*". I need the output as "count of happy = 2", as there is one duplicate instance.
Use awk:
awk '/happy/{ happy+=1 } /sad/ {sad += 1 }
END { print "happy =", happy+0, "sad = ", sad+0 }'
Note that like grep -c, this does not count occurrences of each word but the number of lines that match each word.
You're better off using something like perl or awk, where you can increment counters based on conditional statements.
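A minimal Python sketch of the same conditional counting, reading from standard input:
import sys

happy = sad = 0
for line in sys.stdin:
    # Like the awk version, this counts matching lines, not occurrences.
    if "happy" in line:
        happy += 1
    if "sad" in line:
        sad += 1
print("happy =", happy, "sad =", sad)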

Ruby script for matching 3 patterns

I have a fail2ban.log from which I want to grab specific fields from 'Ban' lines. I can grab the data I need using one regex at a time, but I am not able to combine them. A typical fail2ban log file has many lines; I'm interested in lines like these:
2012-05-02 14:47:40,515 fail2ban.actions: WARNING [ssh-iptables] Ban 84.xx.xx.242
xx = numbers (digits)
I want to grab: a) Date and Time, b) Ban (keyword), c) IP address
Here are my regexes:
IP = (\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})
date & time = ^(\d{4}\W\d{2}\W\d{2}\s\d{2}\W\d{2}\W\d{2})
My problem here is: how can I combine these three? I tried something like this:
^(?=^\d{4}\W\d{2}\W\d{2}\s\d{2}\W\d{2}\W\d{2})(?=\.*d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}$)(?=^(?Ban).)*$).*$
but it does not work as I wanted it to.
To give a clearer example, here is what I want:
greyjewel:FailMap atma$ cat fail2ban.log |grep Ban|awk -F " " '{print $1, $2, $7}'|tail -n 3
2012-05-02 14:47:40,515 84.51.18.242
2012-05-03 00:35:44,520 202.164.46.29
2012-05-03 17:55:03,725 203.92.42.6
Best Regards
A pretty direct translation of the example:
ruby -alne 'BEGIN {$,=" "}; print $F.values_at(0,1,-1) if /Ban/' fail2ban.log
And because I figure you must want them from within Ruby:
results = File.foreach("input").grep(/Ban/).map { |line| line.chomp.split.values_at 0, 1, -1 }
If the field placement doesn't change, you don't even need a regex here:
log_line =
'2012-05-02 14:47:40,515 fail2ban.actions: WARNING [ssh-iptables] Ban 84.12.34.242'
date, time, action, ip = log_line.split.values_at(0,1,-2,-1)
