Elasticsearch disk usage / index size

I use Elasticsearch, and when I use _cat/allocation:
shards disk.indices disk.used disk.avail disk.total disk.percent
10 4.9mb 51.4gb 956.3gb 1007.8gb 5
10 4.7mb 51.5gb 956.2gb 1007.8gb 5
disk.used is over 50 GB.
Using _cat/shards:
index shard prirep state docs store
cs-card-logs_20180712-001 4 p STARTED 724 572.8kb
cs-card-logs_20180712-001 4 r STARTED 724 539.7kb
cs-card-logs_20180712-001 3 r STARTED 673 997.8kb
cs-card-logs_20180712-001 3 p STARTED 673 969.8kb
cs-card-logs_20180712-001 2 p STARTED 699 1mb
cs-card-logs_20180712-001 2 r STARTED 699 556.9kb
cs-card-logs_20180712-001 1 r STARTED 670 1mb
cs-card-logs_20180712-001 1 p STARTED 670 546.7kb
cs-card-logs_20180712-001 0 p STARTED 722 1013.1kb
cs-card-logs_20180712-001 0 r STARTED 722 1020.8kb
Using _cat/indices:
health status index pri rep docs.count docs.deleted store.size pri.store.size
green open read_me 5 1 0 0 1.5kb 795b
green open cs-card-logs_20180712-001 5 1 3106 0 4.8mb 2.4mb
The store size is under 5 MB.
Using _cat/segments:
index shard prirep segment generation docs.count docs.deleted size size.memory committed searchable version compound
cs-card-logs_20180712-001 0 p _5u 210 245 0 209.7kb 45308 true true 5.5.2 false
cs-card-logs_20180712-001 0 p _5v 211 1 0 5.2kb 4113 true true 5.5.2 true
cs-card-logs_20180712-001 0 p _5w 212 1 0 5kb 3872 true true 5.5.2 true
cs-card-logs_20180712-001 0 r _5u 210 243 0 207.8kb 45243 true true 5.5.2 false
cs-card-logs_20180712-001 0 r _5v 211 2 0 10.4kb 8095 true true 5.5.2 true
cs-card-logs_20180712-001 0 r _5w 212 1 0 5.2kb 4113 true true 5.5.2 true
cs-card-logs_20180712-001 0 r _5x 213 1 0 5kb 3872 true true 5.5.2 true
cs-card-logs_20180712-001 1 r _50 180 188 0 178.4kb 44552 true true 5.5.2 false
cs-card-logs_20180712-001 1 r _51 181 1 0 5.2kb 4113 true true 5.5.2 true
cs-card-logs_20180712-001 1 r _52 182 2 0 10.4kb 8095 true true 5.5.2 true
cs-card-logs_20180712-001 1 r _53 183 1 0 4.4kb 3262 true true 5.5.2 true
cs-card-logs_20180712-001 1 r _54 184 1 0 5kb 3872 true true 5.5.2 true
cs-card-logs_20180712-001 1 r _55 185 1 0 5.2kb 4113 true true 5.5.2 true
cs-card-logs_20180712-001 1 r _56 186 1 0 4.4kb 3262 true true 5.5.2 true
cs-card-logs_20180712-001 1 r _57 187 1 0 5.2kb 4113 true true 5.5.2 true
cs-card-logs_20180712-001 1 r _58 188 2 0 8.3kb 6826 true true 5.5.2 true
cs-card-logs_20180712-001 1 p _50 180 189 0 178.7kb 44568 true true 5.5.2 false
cs-card-logs_20180712-001 1 p _51 181 2 0 10.4kb 8095 true true 5.5.2 true
cs-card-logs_20180712-001 1 p _52 182 1 0 4.4kb 3262 true true 5.5.2 true
cs-card-logs_20180712-001 1 p _53 183 1 0 5kb 3872 true true 5.5.2 true
cs-card-logs_20180712-001 1 p _54 184 1 0 5.2kb 4113 true true 5.5.2 true
cs-card-logs_20180712-001 1 p _55 185 1 0 4.4kb 3262 true true 5.5.2 true
cs-card-logs_20180712-001 1 p _56 186 1 0 5.2kb 4113 true true 5.5.2 true
cs-card-logs_20180712-001 1 p _57 187 2 0 8.3kb 6826 true true 5.5.2 true
cs-card-logs_20180712-001 2 p _64 220 240 0 209.8kb 45900 true true 5.5.2 false
cs-card-logs_20180712-001 2 p _65 221 1 0 5kb 3872 true true 5.5.2 true
cs-card-logs_20180712-001 2 r _64 220 238 0 209.8kb 45873 true true 5.5.2 false
cs-card-logs_20180712-001 2 r _65 221 1 0 4.8kb 3872 true true 5.5.2 true
cs-card-logs_20180712-001 2 r _66 222 1 0 4.4kb 3262 true true 5.5.2 true
cs-card-logs_20180712-001 2 r _67 223 1 0 5kb 3872 true true 5.5.2 true
cs-card-logs_20180712-001 3 r _5u 210 226 0 207.1kb 45876 true true 5.5.2 false
cs-card-logs_20180712-001 3 r _5v 211 1 0 6.5kb 5269 true true 5.5.2 true
cs-card-logs_20180712-001 3 r _5w 212 2 0 39.5kb 27250 true true 5.5.2 true
cs-card-logs_20180712-001 3 p _5u 210 223 0 205.6kb 45812 true true 5.5.2 false
cs-card-logs_20180712-001 3 p _5v 211 2 0 10.4kb 8095 true true 5.5.2 true
cs-card-logs_20180712-001 3 p _5w 212 1 0 5.2kb 4113 true true 5.5.2 true
cs-card-logs_20180712-001 3 p _5x 213 1 0 6.5kb 5269 true true 5.5.2 true
cs-card-logs_20180712-001 3 p _5y 214 2 0 39.5kb 27250 true true 5.5.2 true
cs-card-logs_20180712-001 4 p _64 220 240 0 207kb 45498 true true 5.5.2 false
cs-card-logs_20180712-001 4 p _65 221 1 0 4.8kb 3872 true true 5.5.2 true
cs-card-logs_20180712-001 4 p _66 222 1 0 5.2kb 4113 true true 5.5.2 true
cs-card-logs_20180712-001 4 p _67 223 1 0 5.1kb 4113 true true 5.5.2 true
cs-card-logs_20180712-001 4 p _68 224 1 0 6.7kb 5397 true true 5.5.2 true
cs-card-logs_20180712-001 4 p _69 225 2 0 40.2kb 27796 true true 5.5.2 true
cs-card-logs_20180712-001 4 r _64 220 240 0 207.1kb 45506 true true 5.5.2 false
cs-card-logs_20180712-001 4 r _65 221 1 0 4.8kb 3872 true true 5.5.2 true
cs-card-logs_20180712-001 4 r _66 222 1 0 5.2kb 4113 true true 5.5.2 true
cs-card-logs_20180712-001 4 r _67 223 1 0 5.1kb 4113 true true 5.5.2 true
cs-card-logs_20180712-001 4 r _68 224 1 0 6.7kb 5397 true true 5.5.2 true
cs-card-logs_20180712-001 4 r _69 225 2 0 40.2kb 27796 true true 5.5.2 true
I can't figure out why my disk usage is so high. What can I do to find the reason for this disk.used? How can I check what is taking up that much space? Can someone help me?
Thanks

The figure reported by the disk.used column is the total disk space used, i.e. including usage outside of ES.
The size used by ES is in the disk.indices column. This column was added in order to provide more insight into ES vs. non-ES disk usage.
So in order to find out what's taking up the disk space, you can leverage the du command at the root of your filesystem, but it's not ES.
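For illustration only, a minimal Python sketch of the same check (assuming the cluster is reachable at http://localhost:9200 without authentication and that the requests library is installed): it reads _cat/allocation in JSON with raw byte counts and splits each node's usage into ES and non-ES portions.
import requests  # assumption: the requests library is installed

# Assumption: cluster reachable at http://localhost:9200 without authentication.
resp = requests.get("http://localhost:9200/_cat/allocation",
                    params={"format": "json", "bytes": "b"})
resp.raise_for_status()

for row in resp.json():
    if row.get("disk.used") is None:  # the UNASSIGNED row has no disk figures
        continue
    es_bytes = int(row["disk.indices"])   # bytes held by ES shards on this node
    used_bytes = int(row["disk.used"])    # total bytes used on the node's filesystem
    print(row["node"], "ES:", es_bytes, "non-ES:", used_bytes - es_bytes)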

Related

How to encode struct to byte slice and decode byte slice back to original struct using gob encoding?

I am trying to marshal a Go struct to bytes (via gob encoding) and then unmarshal those bytes back to the original object. I am getting an unexpected result (the object does not get the correct values). Please help me correct the program.
Input:
package main

import (
    "bytes"
    "encoding/gob"
    "fmt"
)

type object struct {
    name string
    age  int
}

func main() {
    inputObject := object{age: 22, name: "Zloy"}
    fmt.Println(inputObject)
    var inputBuffer bytes.Buffer
    gob.NewEncoder(&inputBuffer).Encode(inputObject)
    fmt.Println(inputBuffer)
    destBytes := inputBuffer.Bytes()
    fmt.Println("\n", destBytes, "\n")
    var outputBuffer bytes.Buffer
    outputBuffer.Write(destBytes)
    fmt.Println(outputBuffer)
    var outputObject object
    gob.NewDecoder(&outputBuffer).Decode(&outputObject)
    fmt.Println(outputObject)
}
Output:
{Zloy 22}
{[18 255 129 3 1 1 6 111 98 106 101 99 116 1 255 130 0 0 0] 0 0}
[18 255 129 3 1 1 6 111 98 106 101 99 116 1 255 130 0 0 0]
{[18 255 129 3 1 1 6 111 98 106 101 99 116 1 255 130 0 0 0] 0 0}
{ 0}
Expected Output:
{Zloy 22}
{[18 255 129 3 1 1 6 111 98 106 101 99 116 1 255 130 0 0 0] 0 0}
[18 255 129 3 1 1 6 111 98 106 101 99 116 1 255 130 0 0 0]
{[18 255 129 3 1 1 6 111 98 106 101 99 116 1 255 130 0 0 0] 0 0}
{Zloy 22}
You need to capitalize the field names so that they are exported; gob only encodes exported fields. Checking the error returned by Encode would also have surfaced this, since gob reports that the type has no exported fields:
type object struct {
Name string
Age int
}
https://play.golang.org/p/_YqSmeDi6oH

EC2 high stolen time without load

I can see a very high percentage of stolen time on an EC2 web server (t2.micro) without any load (one current user), together with high page load times. Is there a correlation between high load time and high stolen time? I have the same symptoms with another server of class t2.medium.
Do you have an explanation? The vmstat output:
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
r b swpd free buff cache si so bi bo in cs us sy id wa st
2 0 0 79824 7428 479172 0 0 0 0 52 49 18 0 0 0 82
1 0 0 79792 7436 479172 0 0 0 6 54 49 18 0 0 0 82
1 0 0 79824 7444 479172 0 0 0 5 54 51 18 0 0 0 82
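For reference, the same st figure that vmstat reports can also be sampled from Python; a minimal sketch, assuming a Linux guest with the psutil package installed:
import psutil  # assumption: psutil is installed; the 'steal' field is Linux-only

# Sample CPU time percentages once per second for ten seconds.
for _ in range(10):
    cpu = psutil.cpu_times_percent(interval=1)
    # 'steal' is the share of time the hypervisor spent running other guests.
    print("steal=%.1f%% user=%.1f%% idle=%.1f%%" % (cpu.steal, cpu.user, cpu.idle))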

How to sort a pandas DataFrame non-lexically?

To sort credit in the following DataFrame I use the sort_values() function (I've also tried sort()):
df.sort_values('credit', ascending=False, inplace=True)
The problem is that credits are sorted like below:
i credit m reg_date b id
----------------------------------------------------------------------
238 0 4600000.00 0 2014-04-14 False 102214
127 0 4600000.00 0 2014-12-30 False 159479
13 0 16800000.00 0 2015-01-12 False 163503
248 0 16720000.00 0 2012-11-11 False 5116
Ascending is False, which is why 4600000.00 comes before the other credits: the values are being compared as strings. But this is not what I wanted. I want to sort by the numeric values, so in the sample above 16800000.00 and 16720000.00 should come before 4600000.00. How can I sort this DataFrame non-lexically?
EDIT-1:
Data is more than that and can contain:
120 0 16708000.00 0 2013-12-17 False 51433
248 0 16720000.00 0 2012-11-11 False 5116
13 0 16800000.00 0 2015-01-12 False 163503
21 0 4634000.00 0 2014-12-29 False 159239
136 0 4650000.00 0 2012-11-07 False 4701
.. ... ... ... ... ... ...
231 0 7715000.00 0 2014-02-15 False 83936
182 0 7750000.00 0 2015-07-13 False 201584
You could sort the column separately as type float and use the resulting index to reorder the original DataFrame.
In your case:
import pandas as pd
from StringIO import StringIO
text = """136 0 4650000.00 0 2012-11-07 False 4701
231 0 7715000.00 0 2014-02-15 False 83936
13 0 16800000.00 0 2015-01-12 False 163503
120 0 16708000.00 0 2013-12-17 False 51433
248 0 16720000.00 0 2012-11-11 False 5116
21 0 4634000.00 0 2014-12-29 False 159239
182 0 7750000.00 0 2015-07-13 False 201584
"""
df = pd.read_csv(StringIO(text), delim_whitespace=True,
                 header=None, index_col=0,
                 names=['i', 'credit', 'm', 'reg_date', 'b', 'id'])
print df.loc[df.credit.astype(float).sort_values(ascending=False).index]
i credit m reg_date b id
13 0 16800000.0 0 2015-01-12 False 163503
248 0 16720000.0 0 2012-11-11 False 5116
120 0 16708000.0 0 2013-12-17 False 51433
182 0 7750000.0 0 2015-07-13 False 201584
231 0 7715000.0 0 2014-02-15 False 83936
136 0 4650000.0 0 2012-11-07 False 4701
21 0 4634000.0 0 2014-12-29 False 159239
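A variant of the same idea (a sketch, not from the original answer; it assumes the credit column arrived as strings, which is what makes the sort lexical): convert the column to numeric once with pd.to_numeric, after which sort_values compares real numbers.
import pandas as pd

# Hypothetical frame in which 'credit' was read in as strings.
df = pd.DataFrame({
    'credit': ['4600000.00', '16800000.00', '16720000.00', '4650000.00'],
    'id': [102214, 163503, 5116, 4701],
})

# Convert once; errors='coerce' turns unparsable entries into NaN instead of raising.
df['credit'] = pd.to_numeric(df['credit'], errors='coerce')
print(df.sort_values('credit', ascending=False))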

Fastest way to find the sign of a difference of squares

Given an image I and two matrices m1, m2 (the same size as I), the function f is defined as
f = sign((I - m1).^2 - (I - m2).^2)
Because my goal is only to get the sign of f, the function can be rewritten in terms of absolute differences: f = +1 where abs(I - m1) >= abs(I - m2) and f = -1 otherwise (see second_way in the update below).
I think the second formula is faster than the first one because (1) it can ignore the square terms, and (2) it can compute the sign directly, instead of the two steps of the first equation: compute f, then check its sign.
Do you agree with me? Do you have another, faster formula for f?
I =[16 23 11 42 10
11 21 22 24 30
16 22 154 155 156
25 28 145 151 156
11 38 147 144 153];
m1 =[0 0 0 0 0
0 0 22 11 0
0 23 34 56 0
0 56 0 0 0
0 11 0 0 0];
m2 =[0 0 0 0 0
0 0 12 11 0
0 22 111 156 0
0 32 0 0 0
0 12 0 0 0];
The output f is:
f =[1 1 1 1 1
1 1 -1 1 1
1 1 1 1 1
1 1 1 1 1
1 1 1 1 1]
I implemented the first way, but I did not finish the second way in MATLAB. Could you help me with the second way and compare them?
UPDATE: I would like to add the code from chepyle and Divakar to make the question clearer. Note that both of them give the same result as f above.
function compare()
I =[16 23 11 42 10
11 21 22 24 30
16 22 154 155 156
25 28 145 151 156
11 38 147 144 153];
m1 =[0 0 0 0 0
0 0 22 11 0
0 23 34 56 0
0 56 0 0 0
0 11 0 0 0];
m2 =[0 0 0 0 0
0 0 12 11 0
0 22 111 156 0
0 32 0 0 0
0 12 0 0 0];
function f=first_way()
f=sign((I-m1).^2-(I-m2).^2);
f(f==0)=1;
end
function f= second_way()
f = double(abs(I-m1) >= abs(I-m2));
f(f==0) = -1;
end
function f= third_way()
v1=abs(I-m1);
v2=abs(I-m2);
f= int8(v1>v2) + -1*int8(v1<v2); % need to convert to int from logical
f(f==0) = 1;
end
disp(['First way : ' num2str(timeit(@first_way))])
disp(['Second way: ' num2str(timeit(@second_way))])
disp(['Third way : ' num2str(timeit(@third_way))])
end
First way : 1.2897e-05
Second way: 1.9381e-05
Third way : 2.0077e-05
This seems to be comparable and might be a wee bit faster at times than the original approach -
f = sign(abs(I-m1) - abs(I-m2)) + sign(abs(m1-m2)) + ...
sign(abs(2*I-m1-m2)) - 1 -sign(abs(2*I-m1-m2) + abs(m1-m2))
Benchmarking Code
%// Create random inputs
N = 5000;
I = randi(1000,N,N);
m1 = randi(1000,N,N);
m2 = randi(1000,N,N);
num_iter = 20; %// Number of iterations for all approaches
%// Warm up tic/toc.
for k = 1:100000
tic(); elapsed = toc();
end
disp('------------------------- With Original Approach')
tic
for iter = 1:num_iter
out1 = sign((I-m1).^2-(I-m2).^2);
out1(out1==0)=-1;
end
toc, clear out1
disp('------------------------- With Proposed Approach')
tic
for iter = 1:num_iter
out2 = sign(abs(I-m1) - abs(I-m2)) + sign(abs(m1-m2)) + ...
sign(abs(2*I-m1-m2)) - 1 -sign(abs(2*I-m1-m2) + abs(m1-m2));
end
toc
Results
------------------------- With Original Approach
Elapsed time is 1.751966 seconds.
------------------------- With Proposed Approach
Elapsed time is 1.681263 seconds.
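A quick NumPy spot-check (a sketch, assuming NumPy is available) that the proposed closed form matches the original sign of the difference of squares with zeros mapped to -1:
import numpy as np

rng = np.random.default_rng(0)
I  = rng.integers(0, 1000, (200, 200)).astype(float)
m1 = rng.integers(0, 1000, (200, 200)).astype(float)
m2 = rng.integers(0, 1000, (200, 200)).astype(float)

# Original formulation: sign of the difference of squares, zeros mapped to -1.
f1 = np.sign((I - m1)**2 - (I - m2)**2)
f1[f1 == 0] = -1

# Proposed closed form from the answer above.
f2 = (np.sign(np.abs(I - m1) - np.abs(I - m2)) + np.sign(np.abs(m1 - m2))
      + np.sign(np.abs(2*I - m1 - m2)) - 1
      - np.sign(np.abs(2*I - m1 - m2) + np.abs(m1 - m2)))

print(np.array_equal(f1, f2))  # expected: True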
There is a problem with the accuracy of the second formula, but for the sake of comparison, here's how I would implement it in MATLAB, along with a third approach that avoids squaring and the sign() function, in line with your intent. Note that MATLAB's matrix and sign functions are pretty well optimized; the second and third approaches are both slower.
function compare()
I =[16 23 11 42 10
11 21 22 24 30
16 22 154 155 156
25 28 145 151 156
11 38 147 144 153];
m1 =[0 0 0 0 0
0 0 22 11 0
0 23 34 56 0
0 56 0 0 0
0 11 0 0 0];
m2 =[0 0 0 0 0
0 0 12 11 0
0 22 111 156 0
0 32 0 0 0
0 12 0 0 0];
function f=first_way()
f=sign((I-m1).^2-(I-m2).^2);
end
function f= second_way()
v1=(I-m1);
v2=(I-m2);
f= int8(v1<=0 & v2>0) + -1* int8(v1>0 & v2<=0);
end
function f= third_way()
v1=abs(I-m1);
v2=abs(I-m2);
f= int8(v1>v2) + -1*int8(v1<v2); % need to convert to int from logical
end
disp(['First way : ' num2str(timeit(@first_way))])
disp(['Second way: ' num2str(timeit(@second_way))])
disp(['Third way : ' num2str(timeit(@third_way))])
end
The output:
First way : 9.4226e-06
Second way: 1.2247e-05
Third way : 1.1546e-05

How to define contrast coefficient matrix?

I have this data
y x1 x2 pre
1 16 1 1 14
2 15 1 1 13
3 14 1 2 14
4 13 1 2 13
5 12 2 1 12
6 11 2 1 12
7 11 2 2 13
8 13 2 2 13
9 10 3 1 10
10 11 3 1 11
11 11 3 2 11
12 9 3 2 10
And I fitted the following model
lm(y ~ x1 + x2 + x1*x2)
My design matrix is
[,1] [,2] [,3] [,4] [,5] [,6] [,7]
[1,] 1 14 1 0 1 1 0
[2,] 1 13 1 0 1 1 0
[3,] 1 14 1 0 0 0 0
[4,] 1 13 1 0 0 0 0
[5,] 1 12 0 1 1 0 1
[6,] 1 12 0 1 1 0 1
[7,] 1 13 0 1 0 0 0
[8,] 1 13 0 1 0 0 0
[9,] 1 10 0 0 1 0 0
[10,] 1 11 0 0 1 0 0
[11,] 1 11 0 0 0 0 0
[12,] 1 10 0 0 0 0 0
I'm trying to use this design to reproduce the following table:
Source DF Squares Mean Square F Value Pr > F
Model 6 44.79166667 7.46527778 12.98 0.0064
Error 5 2.87500000 0.57500000
Corrected Total 11 47.66666667
Source DF Type III SS Mean Square F Value Pr > F
pre 1 3.12500000 3.12500000 5.43 0.0671
x1 2 4.58064516 2.29032258 3.98 0.0923
x2 1 3.01785714 3.01785714 5.25 0.0706
x1*x2 2 1.25000000 0.62500000 1.09 0.4055
The first part is fine
XtX <- t(x) %*% x
XtXinv <- solve(XtX)
betahat <- XtXinv %*% t(x) %*% y
H <- x %*% XtXinv %*% t(x)
IH <- (diag(1,12) - H)
yhat <- H %*% y
e <- IH %*% y
ybar <- mean(y)
MSS <- t(betahat) %*% t(x) %*% y - length(y)*(ybar^2)
ESS <- t(e) %*% e
TSS <- MSS + ESS
dfM <- sum(diag(H)) - 1
dfE <- sum(diag(IH))
dfT <- dfM + dfE
MSM <- MSS/dfM
MSE <- ESS/dfE
Ftest <- MSM / MSE
pr <- 1 - pf(Ftest, dfM, dfE)
The contrast coefficient matrix for 'pre' seems correct.
L <- matrix(c(0,1,0,0,0,0,0), 1, 7, byrow=T)
Lb <- L %*% betahat
LXtXinvLt <- round(L %*% XtXinv %*% t(L), digits=4)
SSpre <- t(Lb) %*% solve(LXtXinvLt) %*% (Lb)
MSpre <- SSpre / 1
Fpre <- MSpre / MSE
PRpre <- 1 - pf(Fpre, 1, 12-7)
But I can't understand how to define the contrast coefficient matrix for x1, x2, and x1*x2. What's the problem with the rest of my code? Below is an example of how I think I should calculate it for x1:
L <- matrix(c(0,0,1,1,0,0,0), 1, 7, byrow=T)
Lb <- L %*% betahat
LXtXinvLt <- round(L %*% XtXinv %*% t(L), digits=4)
SSX1 <- t(Lb) %*% solve(LXtXinvLt) %*% (Lb)
MSX1 <- SSX1 / 1
FX1 <- MSX1 / MSE
PRX1 <- 1 - pf(FX1, 1, 12-7)
Thanks!
