Background:
I've inherited an Elasticsearch project that is returning some very odd results, and I can't really determine what I need to do to properly fix this.
Based on my reading of the code, it appears that four queries are run against the index for a given set of search terms: the first is an exact match, and the second and subsequent queries allow progressively more "slop" and "fuzziness". The results with the highest scores are then combined into a single response; duplicate matches with lower scores are discarded.
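Roughly, each pass looks like the sketch below. This is illustrative only: the field name "title", the search terms, and the query bodies are placeholders, not the project's actual code.

curl -s -XGET 'localhost:9200/MY_INDEX/_search' -H 'Content-Type: application/json' -d'
{
  "size": 50,
  "query": { "match_phrase": { "title": { "query": "search terms" } } }
}'
# Later passes relax the match, e.g. by adding "slop": 2 to the match_phrase,
# or by switching to a match query with "fuzziness": "AUTO".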
Problem:
The queries with "slop" and "fuzziness" seem to cycle through three different result sets as I rerun them. I determined this by watching for a specific unique item in the results: it shows up in only one out of every three runs. This three-run cycle happens for all three of the non-exact-match queries.
Additional information:
Based on the results of _cat/segments?v&index=[MY_INDEX_NAME], it appears that the index has only one shard, but its copies (one primary plus two replicas) are spread across 3 machines. This gives me some hope that it explains the one-in-three behavior, since each run may be served by a different copy, but it's still very confusing why the copies would return different results.
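One way to test this theory: pin every request to the same shard copy with the standard preference parameter and see whether the cycling stops. Any stable string works; "abc123" below is an arbitrary example, and the query body is a placeholder.

curl -s -XGET 'localhost:9200/MY_INDEX/_search?preference=abc123' \
  -H 'Content-Type: application/json' -d'{ "query": { "match_all": {} } }'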
Band-aid fix:
I've been able to get consistent results for these problematic queries by increasing the "size" parameter from 50 to 150. This slows the query down slightly, but at least it works for now. I'm fairly sure this isn't the correct solution, though.
Topology:
/_cat/nodes:
ip heap.percent ram.percent cpu load_1m load_5m load_15m node.role master name
xx.x.xx.x 2 58 0 0.00 0.00 0.00 i - elastic-ingest-001
xx.x.xx.x 55 86 9 0.59 0.47 0.33 md - elastic-data-002
xx.x.xx.x 1 57 0 0.03 0.02 0.00 i - elastic-ingest-000
xx.x.xx.x 21 94 9 1.05 0.96 0.64 md - elastic-data-001
xx.x.xx.x 18 84 7 0.22 0.21 0.19 md * elastic-data-000
xx.x.xx.xx 7 58 0 0.00 0.00 0.00 i - elastic-ingest-002
/_cat/indices?v:
health status index uuid pri rep docs.count docs.deleted store.size pri.store.size
green open .watcher-history-7-2018.06.05 aCfGd37MT5W2fJfK6HZjsQ 1 1 7 0 203kb 101.5kb
green open .watcher-history-7-2018.06.29 WpTLI_WUSVeDUblRh59uvg 1 1 12114 0 25.2mb 12.7mb
green open .watcher-history-7-2018.06.22 vt-LYb9NRSaZ46eReuixXg 1 1 11953 0 23.7mb 11.9mb
green open .monitoring-es-6-2018.06.30 dnNVGu7pQ1GriAZLfIAQaA 1 1 458763 672 587.3mb 292mb
green open .watcher-history-7-2018.06.02 8zM5yosrQIGJiSfzvMEC_A 1 1 0 0 460b 230b
green open .triggered_watches J9SWF-w8R2yd0aYtPBigAg 1 1 2 61157 53.7mb 24.4mb
green open .watcher-history-7-2018.06.07 x0aT6E71RNCdXjIFEhHOPw 1 1 12094 0 24.3mb 12.3mb
green open .watcher-history-7-2018.06.28 1nhqH54JQJOj9g63ov_NPw 1 1 8909 0 21mb 10.5mb
green open .watcher-history-7-2018.06.26 _rpVOWKkS1mERWgFA5Myag 1 1 9144 0 22.1mb 11.1mb
green open .watcher-history-7-2018.06.17 8zK45nMcR8WmGda4wGW82Q 1 1 12219 0 24.7mb 12.3mb
green open [DIFFERENT_INDEX01] 0GCz0zu3R6SaRjNHiWOa6g 1 2 1818246 0 1.3gb 470.9mb
green open .watcher-history-7-2018.06.20 FGhBth4OTJW-xusT7gplaw 1 1 12180 0 24.2mb 12.2mb
green open .watcher-history-7-2018.06.27 -lK0pwYiTvi3a08dO7AoyQ 1 1 8955 0 20.9mb 10.4mb
green open .watcher-history-7-2018.07.03 JmTpXIY7SXqoVodSpKRtMA 1 1 11896 0 24.3mb 12mb
green open .watcher-history-7-2018.07.05 GMCpHn7MTc-D1HEtDa-Ydw 1 1 7853 0 16.5mb 8.3mb
green open .watcher-history-7-2018.06.04 GXgFHhDdS9GJDou4sBd6RA 1 1 0 0 460b 230b
green open .watches a3dbI5smSauUB7nSc8alTw 1 1 6 0 221.2kb 110.5kb
green open .watcher-history-7-2018.06.19 aCzHvUa5SJ6n6wzKoXBJwA 1 1 12026 0 24mb 12.1mb
green open .watcher-history-7-2018.06.09 56pGfAiWQmeNog8JVZtICw 1 1 11983 0 23.9mb 11.9mb
green open .watcher-history-7-2018.06.01 MRqAmVqmThaIF_6KK5AlRQ 1 1 0 0 460b 230b
green open .watcher-history-7-2018.07.02 Ij_8wgk4T-aJ6-PYAf9gqg 1 1 12015 0 24.4mb 12.2mb
green open .watcher-history-7-2018.06.18 oZViVas5SoWd1D2_naVr3w 1 1 11996 0 23.9mb 11.9mb
green open .watcher-history-7-2018.06.03 2_V6x656RCKGTe0IZyCkqA 1 1 0 0 460b 230b
green open .watcher-history-7-2018.06.11 F4STy7gFS9a7e8qOV81AOA 1 1 11780 0 23.8mb 11.9mb
green open .watcher-history-7-2018.06.10 MjxPItf4SOKtk4l0bPH7Tg 1 1 11909 0 23.7mb 12mb
green open .monitoring-es-6-2018.07.04 3FPHjJFfTvuZrb71X3hcZA 1 1 501436 212 608mb 306.2mb
green open .watcher-history-7-2018.06.12 STvls1wbSvCOU_kRerqckg 1 1 11897 0 24.1mb 12.1mb
green open .monitoring-es-6-2018.07.05 k0wjXw5tR2KaBqrmvJAgCg 1 1 336928 0 488.2mb 242.3mb
green open .security-6 ldkFJ1TkRVScBdJIpA0Aeg 1 2 1 0 10.4kb 3.4kb
green open [DIFFERENT_INDEX02] RAcmKwl3RuiXMgGiRlX1HQ 2 2 46436060 0 60.8gb 20.2gb
green open .monitoring-es-6-2018.07.03 nmBQmnnoTL2wZuF0O_pt1w 1 1 484715 306 593.1mb 305.2mb
green open .monitoring-es-6-2018.06.28 lZR6SssRRx-yPQXk_vfBsw 1 1 97451 192 124.2mb 62mb
green open .watcher-history-7-2018.07.04 8nDY3NoORYmWLOGpX5hb_g 1 1 12082 0 24.9mb 12.4mb
green open .watcher-history-7-2018.07.01 _hmho-_zSu-D9H90gCKzWg 1 1 12072 0 24.9mb 12.5mb
green open .watcher-history-7-2018.06.15 PGXkh70YTjOYhFLjK9a8pA 1 1 11946 0 24.3mb 12.1mb
green open .watcher-history-7-2018.06.21 BEPkxD46TKm2y3yEaGgHNQ 1 1 12077 0 24mb 12.1mb
green open .watcher-history-7-2018.06.14 Y74e7fY4SKS1aT8PK-S2vg 1 1 11907 0 23.9mb 12mb
green open .watcher-history-7-2018.06.06 7opzBsl1SF-mQ_O8Y_5sJg 1 1 1424 0 3.1mb 1.5mb
green open .monitoring-es-6-2018.07.01 AOG4_pk8RB-UanCjMM6GHg 1 1 467312 294 583.3mb 284.9mb
green open .watcher-history-7-2018.06.24 pYKR7RG3RuGdgw7naxn-5Q 1 1 11955 0 23.8mb 11.8mb
green open .watcher-history-7-2018.06.30 j4GdW5xhSNKeqT_c1376AQ 1 1 12125 0 25.1mb 12.7mb
green open [DIFFERENT_INDEX03] CDDpop1nTv6E3466IIhzCg 1 2 4591962 766253 9.4gb 2.6gb
green open .watcher-history-7-2018.06.08 5eP2tPteTwGnoGJhQ37HoA 1 1 11848 0 23.8mb 12.1mb
green open .watcher-history-7-2018.06.25 7xbkQaObSQWJhg93_PmFQw 1 1 12041 0 24.8mb 12.4mb
green open .monitoring-es-6-2018.07.02 HBRphDn_TcSEXIiFn0ZtQg 1 1 475272 300 593.9mb 295mb
green open .watcher-history-7-2018.06.13 CWOQnBuKTNa-DLGvo8XlMQ 1 1 11909 0 23.7mb 11.9mb
green open [MY_INDEX] NdA3qJ16RGa5hpxvKpsDsg 1 2 10171359 1260206 24.1gb 6.4gb
green open .monitoring-alerts-6 5HGKo73hQqa0dakVhdon6w 1 1 48 3 127.1kb 52.1kb
green open .watcher-history-7-2018.06.16 7xyor_rvTemap3DWx6vkqg 1 1 12015 0 24.2mb 12.1mb
green open .monitoring-es-6-2018.06.29 UfXjNo-ATjKKA0Hv5jZw-A 1 1 450751 0 580.1mb 287.3mb
green open .watcher-history-7-2018.06.23 MyZMWHeYSm65MDen6WSGkw 1 1 11919 0 23.8mb 11.9mb
/MY_INDEX/_search_shards:
{
"nodes": {
"9cP8Z9B8SFqq9Plszz7-HQ": {
"name": "elastic-data-000",
"ephemeral_id": "5Mm87T8lR5CFoIjoFGLQGg",
"transport_address": "removed",
"attributes": {}
},
"gg6rbEX8QdqujYjuAu9kvw": {
"name": "elastic-data-002",
"ephemeral_id": "I6ZpdVLgTyigh-2f7gtMFQ",
"transport_address": "removed",
"attributes": {}
},
"JDakz0EGT6aib0m87CfiCg": {
"name": "elastic-data-001",
"ephemeral_id": "c-Z3VRmtTsubCbXiSfsyOg",
"transport_address": "removed",
"attributes": {}
}
},
"indices": {
"MY_INDEX": {}
},
"shards": [
[
{
"state": "STARTED",
"primary": true,
"node": "9cP8Z9B8SFqq9Plszz7-HQ",
"relocating_node": null,
"shard": 0,
"index": "MY_INDEX",
"allocation_id": {
"id": "IQzbKGCMR9O0BobnePiKpg"
}
},
{
"state": "STARTED",
"primary": false,
"node": "gg6rbEX8QdqujYjuAu9kvw",
"relocating_node": null,
"shard": 0,
"index": "MY_INDEX",
"allocation_id": {
"id": "3fvXIyXGTa2NgAsb_uv78A"
}
},
{
"state": "STARTED",
"primary": false,
"node": "JDakz0EGT6aib0m87CfiCg",
"relocating_node": null,
"shard": 0,
"index": "MY_INDEX",
"allocation_id": {
"id": "whHOsuxfTdSnQDi-9RAuKw"
}
}
]
]
}
I have a very basic economic dataset in a consistent wide panel format: years as column names, rows denoting different countries. This should be pretty straightforward to reshape into a long panel. I have found workarounds but I'd like to know how to do this with panelr::long_panel() since it is so much simpler.
However, I keep getting the error "column name "" cannot match any column". Here is a reproducible example:
library(panelr)
mockcountries <- c("A", "B", "C")
mockyears <- c(2001:2020)
mockdata <- data.frame(replicate(20,sample(0:1,3,rep=TRUE)))
mockdata <- cbind(mockcountries, mockdata)
colnames(mockdata) <- c("id", mockyears)
At this point the data looks like this:
id 2001 2002 2003 2004 2005 2006 2007 2008 2009 2010 2011 2012 2013 2014 2015 2016
1 A 0 0 1 1 0 0 0 1 0 1 0 0 0 1 1 0
2 B 0 1 0 1 0 0 0 0 0 0 0 1 0 1 1 0
3 C 1 1 0 1 0 0 0 1 1 1 1 1 0 0 1 0
2017 2018 2019 2020
1 1 0 1 1
2 0 1 1 0
3 1 1 0 0
Then I try to use panelr::long_panel()
mockdata_panel <- panelr::long_panel(mockdata,
id = "id",
begin = 2001,
end = 2020)
#OR alternatively
years <- c("2001", "2002", "2003", "2004", "2005", "2006", "2007", "2008", "2009", "2010", "2011", "2012", "2013",
"2014", "2015", "2016", "2017", "2018", "2019", "2020")
mockdata_panel <- panelr::long_panel(mockdata,
id = "id",
periods = years)
And I get the following error:
Error in `[<-.data.frame`(`*tmp*`, , v.names, value = c(0L, 0L, 1L)) :
  column name "" cannot match any column
Neither approach seems to work. Where does it go wrong? Thank you!
Why are there so many free movable DMA32 blocks on the x86-64 platform?
As the name suggests, I assume this zone is used for DMA. But 730 free order-10 blocks amounts to roughly 2.9 GiB of memory (see the arithmetic below the listing). That is a huge amount of memory!
cat /proc/pagetypeinfo says:
sudo cat /proc/pagetypeinfo
Page block order: 9
Pages per block: 512
Free pages count per migrate type at order 0 1 2 3 4 5 6 7 8 9 10
Node 0, zone DMA, type Unmovable 0 1 1 0 2 1 1 0 1 0 0
Node 0, zone DMA, type Movable 0 0 0 0 0 0 0 0 0 1 3
Node 0, zone DMA, type Reclaimable 0 0 0 0 0 0 0 0 0 0 0
Node 0, zone DMA, type HighAtomic 0 0 0 0 0 0 0 0 0 0 0
Node 0, zone DMA, type Isolate 0 0 0 0 0 0 0 0 0 0 0
Node 0, zone DMA32, type Unmovable 1 0 0 0 0 0 1 1 1 1 0
Node 0, zone DMA32, type Movable 3 4 5 4 2 3 4 4 1 2 730
Node 0, zone DMA32, type Reclaimable 0 0 0 0 0 0 0 0 0 0 0
Node 0, zone DMA32, type HighAtomic 0 0 0 0 0 0 0 0 0 0 0
Node 0, zone DMA32, type Isolate 0 0 0 0 0 0 0 0 0 0 0
Node 0, zone Normal, type Unmovable 17 2 2 1 0 0 0 1 2 13 0
Node 0, zone Normal, type Movable 15 4 0 15 4 1 1 0 0 0 934
Node 0, zone Normal, type Reclaimable 0 6 21 9 6 3 3 1 2 0 0
Node 0, zone Normal, type HighAtomic 0 0 0 0 0 0 0 0 0 0 0
Node 0, zone Normal, type Isolate 0 0 0 0 0 0 0 0 0 0 0
Number of blocks type Unmovable Movable Reclaimable HighAtomic Isolate
Node 0, zone DMA 1 7 0 0 0
Node 0, zone DMA32 2 1526 0 0 0
Node 0, zone Normal 160 2314 78 0 0
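To put a number on the Movable column: an order-10 block is 2^10 contiguous pages, and with the usual 4 KiB page size on x86-64 that is 4 MiB per block.

getconf PAGESIZE                                          # 4096 on typical x86-64
echo "$(( 730 * (1 << 10) * 4096 / 1024 / 1024 )) MiB"    # 2920 MiB, roughly 2.9 GiB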
I want to align the memory of a 5x5 matrix represented as a one-dimensional array.
The original array looks like this:
let mut a = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25];
or
[ 1 2 3 4 5 ]
[ 6 7 8 9 10 ]
a = [ 11 12 13 14 15 ]
[ 16 17 18 19 20 ]
[ 21 22 23 24 25 ]
with a length of 25 elements.
After resizing the memory to memory-aligned bounds (the next power of 2), the array will look like this:
a = [1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ];
or
[ 1 2 3 4 5 6 7 8 ]
[ 9 10 11 12 13 14 15 16 ]
[ 17 18 19 20 21 22 23 24 ]
[ 25 0 0 0 0 0 0 0 ]
a = [ 0 0 0 0 0 0 0 0 ]
[ 0 0 0 0 0 0 0 0 ]
[ 0 0 0 0 0 0 0 0 ]
[ 0 0 0 0 0 0 0 0 ]
The length of a is now 64 elements, so it becomes an 8x8 matrix.
The goal is to have the following representation:
a = [1 2 3 4 5 0 0 0 6 7 8 9 10 0 0 0 11 12 13 14 15 0 0 0 16 17 18 19 20 0 0 0 21 22 23 24 25 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ];
or
[ 1 2 3 4 5 0 0 0 ]
[ 6 7 8 9 10 0 0 0 ]
[ 11 12 13 14 15 0 0 0 ]
[ 16 17 18 19 20 0 0 0 ]
[ 21 22 23 24 25 0 0 0 ]
[ 0 0 0 0 0 0 0 0 ]
[ 0 0 0 0 0 0 0 0 ]
[ 0 0 0 0 0 0 0 0 ]
The background is that memory aligned to a power of two allows calculations to be done partially in parallel (for OpenCL float4, or whatever vector sizes are available). I also do not want to allocate a new array and simply insert the old elements at the correct positions, in order to keep memory consumption low.
At first I thought about swapping the elements in the ranges that should become zero with the elements at the end of the array, keeping a pointer to the elements and simulating a queue, but elements would stack up towards the end, and I didn't come up with a working solution.
My language of choice is Rust. Is there any smart algorithm to achieve the desired result?
So you have an N * N matrix represented as a vector of size N^2, then you resize the vector to M^2 (M > N), so that the first N^2 elements are the original ones. Now you want to rearrange the original elements, so that the N * N sub-matrix in the upper left of the M * M matrix is the same as the original.
One thing to note is that if you go backwards you will never overwrite a value that you will need later.
The position of index X in the M * M matrix is row X / M (integer division) and column X % M.
The desired position of index X is row X / N and column X % N
An element at row R and column C in the M * M matrix has the index R * M + C
Taking all this information together, we can come up with the formula for the new index Y of an old index X:
Y = (X / N) * M + (X % N)
For example, with N = 5 and M = 8, the old index X = 7 sits at row 7 / 5 = 1, column 7 % 5 = 2, so it moves to Y = 1 * 8 + 2 = 10.
So you can just loop from N^2 - 1 down to N, copy each element to the new position calculated with the formula, and set its original position to 0. The first row (indices 0 to N - 1) is already in place. (Everything here is 0-based; Rust indexing is 0-based too, so no +1 adjustments are needed.)
According to maraca's solution, the code would look like this:
fn zeropad<T: Copy>(
    first: T,
    data: &mut Vec<T>,
    dims: (usize, usize),
) -> (usize, usize) {
    // `first` is the fill value for the padding cells (e.g. 0).
    let r = dims.0.next_power_of_two();
    let c = dims.1.next_power_of_two();
    if (r, c) == dims {
        return (r, c);
    }
    let old_len = data.len();
    let old_col = dims.1;
    // Grow the vector; the original elements keep their old flat indices.
    data.resize(r * c, first);
    // Walk backwards so no element is overwritten before it has been moved.
    // The first row (indices 0..old_col) is already in place.
    for i in (old_col..old_len).rev() {
        // Position of the old flat index i in the widened matrix: the
        // rectangular generalization of maraca's Y = (X / N) * M + (X % N).
        let pos_new = (i / old_col) * c + (i % old_col);
        // If only the row count was padded, pos_new == i and the element
        // must not be zeroed out.
        if pos_new != i {
            data[pos_new] = data[i];
            data[i] = first;
        }
    }
    (r, c)
}
I have the following data:
client_id <- c(1,2,3,1,2,3)
product_id <- c(10,10,10,20,20,20)
connected <- c(1,1,0,1,0,0)
clientID_productID <- paste0(client_id,";",product_id)
df <- data.frame(client_id, product_id,connected,clientID_productID)
client_id product_id connected clientID_productID
1 1 10 1 1;10
2 2 10 1 2;10
3 3 10 0 3;10
4 1 20 1 1;20
5 2 20 0 2;20
6 3 20 0 3;20
The goal is to produce a relational matrix:
client_id product_id clientID_productID client_pro_1_10 client_pro_2_10 client_pro_3_10 client_pro_1_20 client_pro_2_20 client_pro_3_20
1 1 10 1;10 0 1 0 0 0 0
2 2 10 2;10 1 0 0 0 0 0
3 3 10 3;10 0 0 0 0 0 0
4 1 20 1;20 0 0 0 0 0 0
5 2 20 2;20 0 0 0 0 0 0
6 3 20 3;20 0 0 0 0 0 0
In other words, when product_id equals 10, clients 1 and 2 are connected. Importantly, I do not want client 1 to be connected with herself. When product_id equals 20, only one client is connected, meaning there is no relationship, so that block should contain only zeros.
To be more specific, all that I am trying to create is a square matrix of relations, with all the combinations of client/product in the columns. A client can only be connected with another if they bought the same product.
I have searched a bunch and played with other code. The difference between this problem and others already answered is that I want to keep client number 3 in my table even though she never bought any product, to show that she has no relationship with any other client. Right now I am able to create the matrix by stacking the relationships by product (How to create relational matrix in R?), but I am struggling to find a way to avoid stacking them.
I apologize if the question is not specific enough, or too specific. Thank you anyway, stackoverflow is a lifesaver for beginners.
I believe I figured it out.
It is for sure not the most elegant answer, though.
library(dplyr)    # for inner_join()
library(reshape2) # for dcast()

client_id <- c(1,2,3,1,2,3)
product_id <- c(10,10,10,20,20,20)
connected <- c(1,1,0,1,0,0)
clientID_productID <- paste0(client_id,";",product_id)
df <- data.frame(client_id, product_id, connected, clientID_productID)

# Self-join on product and connection status to enumerate pairs of clients
df2 <- inner_join(df[c(1:3)], df[c(1:3)], by = c("product_id", "connected"))
df2$Source <- paste0(df2$client_id.x, "|", df2$product_id)
df2$Target <- paste0(df2$client_id.y, "|", df2$product_id)
df2 <- df2[order(df2$product_id), ]
indices <- unique(as.character(df2$Source))

# Cast to a wide matrix, then zero the diagonal so no client connects to herself
mtx <- as.matrix(dcast(df2, Source ~ Target, value.var = "connected", fill = 0))
rownames(mtx) <- mtx[, "Source"]
mtx <- mtx[, -1]
diag(mtx) <- 0
mtx <- as.data.frame(mtx)
mtx <- mtx[indices, indices]
I got the result I wanted:
1|10 2|10 3|10 1|20 2|20 3|20
1|10 0 1 0 0 0 0
2|10 1 0 0 0 0 0
3|10 0 0 0 0 0 0
1|20 0 0 0 0 0 0
2|20 0 0 0 0 0 0
3|20 0 0 0 0 0 0
What is the correct way to track the number of dropped or rejected events in a managed Elasticsearch cluster?
Use GET /_nodes/stats/thread_pool, which gives you something like:
"thread_pool": {
"bulk": {
"threads": 4,
"queue": 0,
"active": 0,
"rejected": 0,
"largest": 4,
"completed": 42
}
....
"flush": {
"threads": 0,
"queue": 0,
"active": 0,
"rejected": 0,
"largest": 0,
"completed": 0
}
...
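If all you want is the rejection counters, you can trim the response with the standard filter_path parameter (localhost:9200 below is a placeholder for your cluster address):

curl -s -XGET 'localhost:9200/_nodes/stats/thread_pool?filter_path=nodes.*.thread_pool.*.rejected'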
Another way to get more concise and better-formatted information about thread pools (especially if you are dealing with several nodes) is to use the _cat/thread_pool API:
$ curl -XGET 'localhost:9200/_cat/thread_pool?v'
host ip bulk.active bulk.queue bulk.rejected index.active index.queue index.rejected search.active search.queue search.rejected
10.10.1.1 10.10.1.1 1 10 0 2 0 0 10 0 0
10.10.1.2 10.10.1.2 2 0 1 4 0 0 4 10 2
10.10.1.3 10.10.1.3 1 0 0 1 0 0 5 0 0
UPDATE
You can also choose which thread pools to show and, for each thread pool, which fields to include in the output. For instance, below we're showing the following fields from the search thread pool:
sqs: the maximum number of search requests that can be queued before being rejected
sq: the number of search requests currently in the search queue
sa: the number of currently active search threads
sr: the number of rejected search requests (since the last restart)
sc: the number of completed search requests (since the last restart)
Here is the command:
curl -s -XGET 'localhost:9200/_cat/thread_pool?v&h=ip,sqs,sq,sa,sr,sc'
ip sqs sq sa sr sc
10.10.1.1 100 0 1 0 62636120
10.10.1.2 100 0 2 0 15528863
10.10.1.3 100 0 4 0 64647299
10.10.1.4 100 0 5 372 103014657
10.10.1.5 100 0 2 0 13947055
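Finally, every _cat endpoint accepts a help parameter that lists all of its available columns and their aliases, which is how you can discover headers like sqs:

curl -s -XGET 'localhost:9200/_cat/thread_pool?help'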