Elasticsearch read_only_allow_delete auto setting - elasticsearch

I have a problem with Elasticsearch. I tried the following:
$ curl -XPUT -H "Content-Type: application/json" \
http://localhost:9200/_all/_settings \
-d '{"index.blocks.read_only_allow_delete": false}'
My settings:
"settings": {
"index": {
"number_of_shards": "5",
"blocks": {
"read_only_allow_delete": "true"
},
"provided_name": "new-index",
"creation_date": "1515433832692",
"analysis": {
"filter": {
"ngram_filter": {
"type": "ngram",
"min_gram": "2",
"max_gram": "4"
}
},
"analyzer": {
"ngram_analyzer": {
"filter": [
"ngram_filter"
],
"type": "custom",
"tokenizer": "standard"
}
}
},
"number_of_replicas": "1",
"uuid": "OSG7CNAWR9-G3QC75K4oQQ",
"version": {
"created": "6010199"
}
}
}
When I check the settings afterwards they look fine, but only a few seconds later (3-5) the flag is back to true. I can't index new documents or do anything except _search and delete.
Does anyone have an idea how to resolve this?
NOTE: I'm using Elasticsearch version: 6.1.1

Elasticsearch automatically sets "read_only_allow_delete": "true" when the node it runs on is low on disk space (by default, once disk usage crosses the 95% flood-stage watermark).
Find the files which are filling up your storage and delete/move them. Once you have sufficient storage available, run the following command through the Dev Tools console in Kibana:
PUT your_index_name/_settings
{
  "index": {
    "blocks": {
      "read_only_allow_delete": "false"
    }
  }
}
OR (through the terminal):
$ curl -XPUT -H "Content-Type: application/json" \
http://localhost:9200/_all/_settings \
-d '{"index.blocks.read_only_allow_delete": false}'
as mentioned in your question.
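If you're not sure why the flag keeps coming back, check per-node disk usage first; Elasticsearch re-applies the block as long as a data node is still above the flood-stage watermark. A quick check, as a sketch using the cat allocation API (adjust host/port to your setup):
$ curl -s 'http://localhost:9200/_cat/allocation?v'
The disk.percent column shows how full each data node is; free up space until you are comfortably below the watermark before clearing the flag.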

To add a bit of value to the accepted answer (and because I'll google this and come back in future): in my case the read_only_allow_delete flag was set because the default disk watermark settings are percentage based, which did not make as much sense on my large disk. So I changed these settings to be based on absolute space remaining, as the documentation explains.
So before setting read_only_allow_delete back to false, I first set the watermark values based on disk space:
(using Kibana UI):
PUT _cluster/settings
{
  "transient": {
    "cluster.routing.allocation.disk.watermark.low": "20gb",
    "cluster.routing.allocation.disk.watermark.high": "15gb",
    "cluster.routing.allocation.disk.watermark.flood_stage": "10gb"
  }
}
PUT your_index_name/_settings
{
  "index": {
    "blocks": {
      "read_only_allow_delete": "false"
    }
  }
}
OR (through the terminal):
$ curl -XPUT -H "Content-Type: application/json" \
http://localhost:9200/_cluster/settings \
-d '{"transient": {
      "cluster.routing.allocation.disk.watermark.low": "20gb",
      "cluster.routing.allocation.disk.watermark.high": "15gb",
      "cluster.routing.allocation.disk.watermark.flood_stage": "10gb"}}'
$ curl -XPUT -H "Content-Type: application/json" \
http://localhost:9200/_all/_settings \
-d '{"index.blocks.read_only_allow_delete": false}'

Background
We maintain a cluster where we have filebeat, metricbeat, packetbeat, etc. shippers pushing data into the cluster. Invariably some index would become hot and we'd want to either disable writing to it for a time, or clean up and re-enable indices which had breached their flood-stage watermark and automatically gone into read_only_allow_delete: true.
Bash Functions
To ease the management of our clusters for the rest of my team I wrote the following Bash functions to help perform these tasks without having to fumble around with curl or through Kibana's UI.
$ cat es_funcs.bash
### es wrapper cmd inventory
declare -A escmd
escmd[l]="./esl"
escmd[p]="./esp"

### es data node naming conventions
nodeBaseName="rdu-es-data-0"
declare -A esnode
esnode[l]="lab-${nodeBaseName}"
esnode[p]="${nodeBaseName}"

usage_chk1 () {
    # usage msg for cmds w/ 1 arg
    local env="$1"
    [[ $env =~ [lp] ]] && return 0 || \
        printf "\nUSAGE: ${FUNCNAME[1]} [l|p]\n\n" && return 1
}

enable_readonly_idxs () {
    # set read_only_allow_delete flag
    local env="$1"
    usage_chk1 "$env" || return 1
    DISALLOWDEL=$(cat <<-EOM
{
  "index": {
    "blocks": {
      "read_only_allow_delete": "true"
    }
  }
}
EOM
    )
    ${escmd[$env]} PUT '_all/_settings' -d "$DISALLOWDEL"
}

disable_readonly_idxs () {
    # clear read_only_allow_delete flag
    local env="$1"
    usage_chk1 "$env" || return 1
    ALLOWDEL=$(cat <<-EOM
{
  "index": {
    "blocks": {
      "read_only_allow_delete": "false"
    }
  }
}
EOM
    )
    ${escmd[$env]} PUT '_all/_settings' -d "$ALLOWDEL"
}
Example Run
The above functions can be sourced in your shell like so:
$ . es_funcs.bash
NOTE: The arrays at the top of the file map short names for clusters if you happen to have multiple. We have 2, one for our lab and one for our production. So I represented those as l and p.
You can then run them like this to enable the read_only_allow_delete attribute (true) on your l cluster:
$ enable_readonly_idxs l
{"acknowledged":true}
or p:
$ enable_readonly_idxs p
{"acknowledged":true}
Helper Script Overview
There's one additional script that contains the curl commands which I use to interact with the clusters. This script is referenced in the escmd array at the top of the es_funcs.bash file. The array contains names of symlinks to a single shell script, escli.bash. The links are called esl and esp.
$ ll
-rw-r--r-- 1 smingolelli staff 9035 Apr 10 23:38 es_funcs.bash
-rwxr-xr-x 1 smingolelli staff 1626 Apr 10 23:02 escli.bash
-rw-r--r-- 1 smingolelli staff 338 Apr 5 00:27 escli.conf
lrwxr-xr-x 1 smingolelli staff 10 Jan 23 08:12 esl -> escli.bash
lrwxr-xr-x 1 smingolelli staff 10 Jan 23 08:12 esp -> escli.bash
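If you are recreating this layout yourself, esl and esp are plain symlinks to the one script, created roughly like this (a sketch):
$ ln -s escli.bash esl
$ ln -s escli.bash esp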
The escli.bash script:
$ cat escli.bash
#!/bin/bash

#------------------------------------------------
# Detect how we were called [l|p]
#------------------------------------------------
[[ $(basename $0) == "esl" ]] && env="lab1" || env="rdu1"

#------------------------------------------------
# source escli.conf variables
#------------------------------------------------
# g* tools via brew install coreutils
[ $(uname) == "Darwin" ] && readlink=greadlink || readlink=readlink
. $(dirname $($readlink -f $0))/escli.conf

usage () {
    cat <<-EOF
USAGE: $0 [HEAD|GET|PUT|POST] '...ES REST CALL...'

EXAMPLES:
  $0 GET '_cat/shards?pretty'
  $0 GET '_cat/indices?pretty&v&human'
  $0 GET '_cat'
  $0 GET ''
  $0 PUT '_all/_settings' -d "\$DATA"
  $0 POST '_cluster/reroute' -d "\$DATA"
EOF
    exit 1
}

[ "$1" == "" ] && usage

#------------------------------------------------
# ...ways to call curl.....
#------------------------------------------------
if [ "${1}" == "HEAD" ]; then
    curl -I -skK \
        <(cat <<<"user = \"$( ${usernameCmd} ):$( ${passwordCmd} )\"") \
        "${esBaseUrl}/$2"
elif [ "${1}" == "PUT" ]; then
    curl -skK \
        <(cat <<<"user = \"$( ${usernameCmd} ):$( ${passwordCmd} )\"") \
        -X$1 -H "${contType}" "${esBaseUrl}/$2" "$3" "$4"
elif [ "${1}" == "POST" ]; then
    curl -skK \
        <(cat <<<"user = \"$( ${usernameCmd} ):$( ${passwordCmd} )\"") \
        -X$1 -H "${contType}" "${esBaseUrl}/$2" "$3" "$4"
else
    curl -skK \
        <(cat <<<"user = \"$( ${usernameCmd} ):$( ${passwordCmd} )\"") \
        -X$1 "${esBaseUrl}/$2" "$3" "$4" "$5"
fi
This script reads a single properties file, escli.conf. In this file you specify the commands used to retrieve your username and password from wherever you keep them (I use LastPass, so I retrieve them via lpass), as well as the base URL to use for accessing your cluster's REST API.
$ cat escli.conf
#################################################
### props used by escli.bash
#################################################
usernameCmd='lpass show --username somedom.com'
passwordCmd='lpass show --password somedom.com'
esBaseUrl="https://es-data-01a.${env}.somdom.com:9200"
contType="Content-Type: application/json"
I've put all this together in a GitHub repo (linked below), which also includes additional functions beyond the two shown as examples for this question.
References
https://github.com/slmingol/escli

Related

creating a nested json output file using jq variables using

I'm trying to create a JSON file via shell script, and someone mentioned jq, but I'm struggling a little bit to make it work.
The desired output is:
inboundurls{
"op": "add",
"path": "/support",
"apiSupports": [
{
"familyType": "EXAMPLE",
"healthCheckUris": "http://example.com"
}
],
"inboundurls": [
{
"healthCheckUris": "http://example.com"
}
]
}
Researching around, I found a starting point, but it's not working properly and I need some help. Here is what I have:
script:
#!/bin/bash
apiSupports=$(jq -n --arg familyType EXAMPLE \
--arg healthCheckUris http://example.com \
'$ARGS.named'
)
final=$(jq -n --arg op "add" \
--arg path "/supportServices" \
--argjson apiSupports "[$apiSupports]" \
'$ARGS.named'
)
echo "$final"
the output of the script above:
{
  "op": "add",
  "path": "/supportServices",
  "apiSupports": [
    {
      "familyType": "EXAMPLE",
      "healthCheckUris": "http://example.com"
    }
  ]
}
If anyone could help me, or even suggest ideas, I would be glad. Thank you in advance!
The following produces the valid JSON component of what is shown as the desired output:
jq -n --arg op "add" \
--arg path "/support" \
--arg familyType EXAMPLE \
--arg healthCheckUris http://example.com '
{$op, $path,
apiSupports: [ {$familyType, $healthCheckUris }],
inboundurls: [ {$healthCheckUris }]
}
'
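If you prefer to keep the two-step structure from your script, the same result can be assembled by passing the pre-built array back in with --argjson; a sketch based on your original variable names:
apiSupports=$(jq -n --arg familyType EXAMPLE \
    --arg healthCheckUris http://example.com \
    '[$ARGS.named]')
jq -n --arg op "add" \
    --arg path "/support" \
    --argjson apiSupports "$apiSupports" \
    --argjson inboundurls '[{"healthCheckUris":"http://example.com"}]' \
    '$ARGS.named'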

Adding Mailchimp members through Bash from .csv file

I have got around 1000 contacts to import to Mailchimp. This is my company's old database, which we have exported from the CSM system, and we want every contact to confirm their subscription if they want to be on our subscription list.
When I try to import them through Mailchimp, I can't give the contacts the status "pending".
So, I have managed to do it for a single contact through bash, but I want to import the whole contact list.
I am not that familiar with this scripting language, so can anybody advise me: is there a way to import the data from the CSV file, and how can I do it?
Or maybe there is some other way to do it?
This is the code that is working for a single contact:
#!/bin/bash
set -euo pipefail
list_id="Add_LIST_ID"
user_email="Add_E_MAIL"
user_fname="Add_F_NAME"
user_lname="Add_L_NAME"
curl -sS --request POST \
--url "https://$API_SERVER.api.mailchimp.com/3.0/lists/$list_id/members" \
--user "key:$API_KEY" \
--header 'content-type: application/json' \
--data @- \
<<EOF | jq '.id'
{
"email_address": "$user_email",
"status": "pending",
"merge_fields": {
"FNAME": "$user_fname",
"LNAME": "$user_lname"
}
}
EOF
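Note that the script references API_SERVER and API_KEY without defining them, and set -u aborts on unset variables, so export them first; a sketch with made-up placeholder values (the data-center prefix is the part after the dash in your API key):
$ export API_SERVER="us19"
$ export API_KEY="0123456789abcdef0123456789abcdef-us19"
$ ./add_single_contact.sh   # or whatever you saved the script as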
EDIT1
Okay, I have managed to load the data from the CSV file. The code is below.
#!/bin/bash
set -euo pipefail
list_id="LIST_ID"
while IFS=, read -r col1
do
    echo "$col1"
    curl -sS --request POST \
        --url "https://$API_SERVER.api.mailchimp.com/3.0/lists/$list_id/members" \
        --user "key:$API_KEY" \
        --header 'content-type: application/json' \
        --data @- \
        <<EOF | jq '.id'
{
  "email_address": "$col1",
  "status": "pending",
  "merge_fields": {
    "FNAME": "",
    "LNAME": ""
  }
}
EOF
done < mails.csv
I have put an echo line after list_id to see if the data is read in correctly.
The code runs without errors, but I have only managed to add a contact to the list once (the subscriber hash came back in the response). On the other tries I got "null" in the response. Does anybody know why?
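One thing worth checking is what Mailchimp actually says when it does not return an id; a CSV exported from Windows also often carries carriage returns that end up inside the email address. A sketch of the same loop that strips them and prints the error fields instead of just .id (it assumes mails.csv has one email per line and reuses your API_SERVER, API_KEY and list_id):
while IFS=, read -r col1
do
    email=$(printf '%s' "$col1" | tr -d '\r')   # drop Windows CR before building the JSON
    curl -sS --request POST \
        --url "https://$API_SERVER.api.mailchimp.com/3.0/lists/$list_id/members" \
        --user "key:$API_KEY" \
        --header 'content-type: application/json' \
        --data @- <<EOF | jq '{id, title, status, detail}'
{"email_address": "$email", "status": "pending", "merge_fields": {"FNAME": "", "LNAME": ""}}
EOF
done < mails.csv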

How to poll the asda website more frequently (bash)

I made a script that checks the Asda home delivery slots from their API.
Here it is, I call it get_slots.sh
You have to start Tor first, or if you don't, get rid of the --socks5-hostname line (you can see the Tor port number on the command line with ps). But if you don't use Tor they might cancel your account if they get narky about you polling their website.
Obviously you have to change the vars at the top.
The query parameters and API URL were found with the inspector in Chrome while using their normal JavaScript front end for the public, so nothing top secret.
#!/bin/bash
my_postcode="SW1A1AA" # CHANGEME
account_id=18972357834 # JUST INVENT A NUMBER
order_id=22985263473 # LIKEWISE
ua='user_agent_I_want_to_fake'
my_tor_port=9150
#----------------
#ftype="POPUP"
#ftype="INSTORE_PICKUP"
ftype="DELIVERY"
format="%Y-%m-%dT00:00:00+01:00"
start_date=$(date "+$format")
end_date=$(date -d "+16 days" "+$format")
read -r -d '' data <<EOF
{
  "data": {
    "customer_info": {
      "account_id": "$account_id"
    },
    "end_date": "$end_date",
    "order_info": {
      "line_item_count": 0,
      "order_id": "$order_id",
      "restricted_item_types": [],
      "sub_total_amount": 0,
      "total_quantity": 0,
      "volume": 0,
      "weight": 0
    },
    "reserved_slot_id": "",
    "service_address": {"postcode":"$my_postcode"},
    "service_info": {
      "enable_express": false,
      "fulfillment_type": "$ftype"
    },
    "start_date": "$start_date"
  },
  "requestorigin": "gi"
}
EOF
data=$(echo $data | tr -d ' ')
url='https://groceries.asda.com/api/v3/slot/view'
referer='https://groceries.asda.com/checkout/book-slot?origin=/account/orders'
curl -s \
--socks5-hostname localhost:$my_tor_port \
-H "Content-type: application/json; charset=utf-8" \
-H "Referer: $referer" \
-A "$ua" \
-d "$data" \
$url \
| python -m json.tool
Anyway, now I make another script to keep running it and mail me if any slots are available.
More vars you need to change at the top of this one:
#!/bin/sh
me="my@email.address"
my_smtp_server="smtp.myisp.net:25"
#------------------------------------
mailed=0
ftmp=/tmp/slots.$$
while true
do
    date
    f=slots/`date +%Y%m%d/%H/%Y-%m-%d_%H%M%S`.json
    d=`dirname $f`
    [ -d $d ] || mkdir -p $d
    ./get_slots.sh > $f
    if egrep -B1 'status.*"AVAILABLE"' $f > $ftmp
    then
        echo "found"
        if [ $mailed -eq 0 ]
        then
            dates=`perl -nle '/start_time.*2020-(..-..T..):/ && print $1' $ftmp`
            mailx \
                -r "$me" -s "asda on $dates lol" \
                -S smtp="$my_smtp_server" "$me" < $ftmp
            echo "mailed"
            mailed=1
        fi
    fi
    sleep 120
done
So I'm being kind of naughty here, because I need the timestamps of the slots with status AVAILABLE to put in the email, and I really can't be bothered to parse the JSON properly, so I just rely on the fact that it's on the line before the status.
The pretty-printed JSON puts the fields in alphabetical order and comes out with something like:
"slot_info": {
STUFF
"slot_type": null,
"start_time": "2020-06-10T19:00:00Z",
"status": "AVAILABLE",
"total_discount": 0.0,
"total_premium": 0.0,
MORE STUFF
So yeah, all I do is egrep -B1.
Oh yeah, I also naughtily hard-coded 2020 instead of doing a proper regex for the year, because if this is all still going on after 2020 I might as well just starve anyway, so I don't want to over-engineer it.
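For what it's worth, if you ever want to pull the start times out properly instead of relying on line order, jq can do it in one go; a sketch against one of the saved JSON files (using the $f variable from the polling script above):
jq -r '.. | objects | select(.status? == "AVAILABLE") | .start_time' "$f"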
Anyway, as you can see, once it has mailed me it still keeps running, because I want to store the JSON files and maybe analyse them later; it just doesn't mail me again after that unless I restart it.
Anyway, my question is that my script only checks every two minutes and I want it to check more often so I can beat other people to the slots.
Okay, sorted it: the sleep 120 is 2 minutes. I thought it was 1.2 minutes; I forgot a minute is 60 seconds, not 100, lol.
Oh yeah, don't worry, I'm not going to do this every 5 seconds or anything!
Now that I know the sleep is working properly I can change it to 60, which is still no more often than a lot of the people sat there reloading the page manually, believe me.

Editing GIST with cURL

#!/bin/bash
COMMIT=$(git log -1 --pretty=format:'{"subject": "%s", "name": "xxx", "date": "%cD"}')
curl -X PATCH -d'{"files": {"latest-commit": {"content": "$COMMIT"}}}' -u user:xxxx https://api.github.com/gists/xxx
This just shows the literal text $COMMIT in the Gist. I tried playing around with quoting but cannot make it work.
Your $COMMIT variable is not expanded to its value, because it is enclosed in single-quotes.
About an actual implementation in Bash
The GitHub API requires you to send the file content as a string: https://developer.github.com/v3/gists/#input-1
When the file content contains newlines, double quotes or other characters that need escaping within a JSON string, the most appropriate shell tool to fill in and escape the content string is jq.
JavaScript provides a JSON.stringify() method, but here in the shell world we use jq to process JSON data.
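As a minimal fix for the one-liner in the question, you can let jq build and escape the whole payload and then hand it to curl; a sketch that keeps the question's placeholder gist id and credentials:
COMMIT=$(git log -1 --pretty=format:'{"subject": "%s", "name": "xxx", "date": "%cD"}')
payload=$(jq -n --arg content "$COMMIT" '{files: {"latest-commit": {content: $content}}}')
curl -X PATCH -d "$payload" -u user:xxxx https://api.github.com/gists/xxx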
If you don't have jq available, you can convert the content of the file to a properly escaped JSON string with GNU sed this way:
# compose the GitHub API JSON data payload
# to update the latest-commit.json file in the $gist_id
# uses sed to properly fill-in and escape the content string
read -r -d '' json_data_payload <<EOF
{
  "description": "Updated from GitHub API call in Bash",
  "files": {
    "latest-commit.json": {
      "filename": "latest-commit.json",
      "content": "$(
        sed ':a;N;$!ba;s/\n/\\n/g;s/\r/\\r/g;s/\t/\\t/g;s/"/\\"/g;' <<<"$latest_commit_json_content"
      )"
    }
  }
}
EOF
This is how jq is used to fill the content string with proper escaping:
json_data_payload="$(
jq \
--arg content "$latest_commit_json_content" \
--compact-output \
'.files."latest-commit.json".content = $content' \
<<'EOF'
{
"files": {
"latest-commit.json": {
"filename": "latest-commit.json",
"content": ""
}
}
}
EOF
)"
Detailed and tested ok implementation:
#!/usr/bin/env bash
# Set to the gist id to update
gist_id='4b85f310233a6b9d385643fa3a889d92'
# Uncomment and set to your GitHub API OAUTH token
github_oauth_token='###################'
# Or uncomment this and set to your GitHub username:password
#github_user="user:xxxx"
github_api='https://api.github.com'
gist_description='Gist update with API call from a Bash script'
filename='latest-commit.json'
get_file_content() {
    # Populate variables from the git log of latest commit
    # reading null delimited strings for safety on special characters
    {
        read -r -d '' subject
        read -r -d '' author
        read -r -d '' date
    } < <(
        # null delimited subject, author, date
        git log -1 --format=$'%s%x00%aN%x00%cD%x00'
    )
    # Compose the latest commit JSON, and populate it with the latest commit
    # variables, using jq to ensure proper encoding and formatting of the JSON
    read -r -d '' jquery <<'EOF'
.subject = $subject |
.author = $author |
.date = $date
EOF
    jq \
        --null-input \
        --arg subject "$subject" \
        --arg author "$author" \
        --arg date "$date" \
        "$jquery"
}
# compose the GitHub API JSON data payload
# to update the latest-commit.json file in the $gist_id
# uses jq to properly fill-in and escape the content string
# and compact the output before transmission
get_gist_update_json() {
    read -r -d '' jquery <<'EOF'
.description = $description |
.files[$filename] |= (
    .filename = $filename |
    .content = $content
)
EOF
    jq \
        --null-input \
        --compact-output \
        --arg description "$gist_description" \
        --arg filename "$filename" \
        --arg content "$(get_file_content)" \
        "$jquery"
}
# prepare the curl call with options for the GitHub API request
github_api_request=(
    curl                                       # The command to send the request
    --fail                                     # Return shell error if request unsuccessful
    --request PATCH                            # The request type
    --header "Content-Type: application/json"  # The MIME type of the request
    --data "$(get_gist_update_json)"           # The payload content of the request
)
if [ -n "${github_oauth_token:-}" ]; then
    github_api_request+=(
        # Authenticate the GitHub API with an OAUTH token
        --header "Authorization: token $github_oauth_token"
    )
elif [ -n "${github_user:-}" ]; then
    github_api_request+=(
        # Authenticate the GitHub API with an HTTP auth user:pass
        --user "$github_user"
    )
else
    echo 'The GitHub API requires either an OAUTH token or a user:pass' >&2
    exit 1
fi
github_api_request+=(
    --                            # End of curl options
    "$github_api/gists/$gist_id"  # The GitHub API url to address the request
)
# perform the GitHub API request call
if ! "${github_api_request[@]}"; then
    echo "Failed execution of:" >&2
    env printf '%q ' "${github_api_request[@]}" >&2
    echo >&2
fi
Here is the generated curl call with my token redacted out:
curl --fail --request PATCH --header 'Content-Type: application/json' \
--data '{"description":"Hello World Examples","files":{"latest-commit.json":{"filename":"latest-commit.json","content":"{\n \"subject\": \"depricate Phosphor\",\n \"name\": \"Blood Asp\",\n \"date\": \"Wed, 12 Dec 2018 18:55:39 +0100\"\n}"}}}' \
--header 'Authorization: token xxxx-redacted-xxxx' \
-- \
https://api.github.com/gists/4b85f310233a6b9d385643fa3a889d92
And the JSON response it replied with:
"url": "https://api.github.com/gists/4b85f310233a6b9d385643fa3a889d92",
"forks_url": "https://api.github.com/gists/4b85f310233a6b9d385643fa3a889d92/forks",
"commits_url": "https://api.github.com/gists/4b85f310233a6b9d385643fa3a889d92/commits",
"id": "4b85f310233a6b9d385643fa3a889d92",
"node_id": "MDQ6R2lzdDRiODVmMzEwMjMzYTZiOWQzODU2NDNmYTNhODg5ZDky",
"git_pull_url": "https://gist.github.com/4b85f310233a6b9d385643fa3a889d92.git",
"git_push_url": "https://gist.github.com/4b85f310233a6b9d385643fa3a889d92.git",
"html_url": "https://gist.github.com/4b85f310233a6b9d385643fa3a889d92",
"files": {
"latest-commit.json": {
"filename": "latest-commit.json",
"type": "application/json",
"language": "JSON",
"raw_url": "https://gist.githubusercontent.com/leagris/4b85f310233a6b9d385643fa3a889d92/raw/7cb7f9d4a0170daf5083929858fb7eef706f8b59/latest-commit.json",
"size": 105,
"truncated": false,
"content": "{\n \"subject\": \"depricate Phosphor\",\n \"name\": \"Blood Asp\",\n \"date\": \"Wed, 12 Dec 2018 18:55:39 +0100\"\n}"
}
},
...

Elasticsearch docker burn data in image

I'm trying to build an elasticsearch image with preloaded data. I'm doing a restore operation from S3.
FROM elasticsearch:5.3.1
ARG bucket
ARG access_key
ARG secret_key
ARG repository
ARG snapshot
ENV ES_JAVA_OPTS="-Des.path.conf=/etc/elasticsearch"
RUN elasticsearch-plugin install repository-s3
ADD https://raw.githubusercontent.com/vishnubob/wait-for-it/e1f115e4ca285c3c24e847c4dd4be955e0ed51c2/wait-for-it.sh wait-for-it.sh
RUN chmod +x wait-for-it.sh
RUN /docker-entrypoint.sh elasticsearch -p /tmp/epid & ./wait-for-it.sh -t 0 localhost:9200 -- echo "Elasticsearch is ready!" && \
curl -H 'Content-Type: application/json' -X PUT "localhost:9200/_snapshot/$repository" -d '{ "type": "s3", "settings": { "bucket": "'$bucket'", "access_key": "'$access_key'", "secret_key": "'$secret_key'" } }' && \
curl -H "Content-Type: application/json" -X POST "localhost:9200/_snapshot/$repository/$snapshot/_restore?wait_for_completion=true" -d '{ "indices": "myindex", "ignore_unavailable": true, "index_settings": { "index.number_of_replicas": 0 }, "ignore_index_settings": [ "index.refresh_interval" ] }' && \
curl -H "Content-Type: application/json" -X GET "localhost:9200/_cat/indices"
RUN kill $(cat /tmp/epid) && wait $(cat /tmp/epid); exit 0;
CMD ["-E", "network.host=0.0.0.0", "-E", "discovery.zen.minimum_master_nodes=1"]
The image is built successfully, but when I start the container the index is lost. I'm not using any volumes. What am I missing?
version: '2'
services:
  elasticsearch:
    container_name: "elasticsearch"
    build:
      context: ./elasticsearch/
      args:
        access_key: access_key_here
        secret_key: secret_key_here
        bucket: bucket_here
        repository: repository_here
        snapshot: snapshot_here
    ports:
      - "9200:9200"
      - "9300:9300"
    environment:
      ES_JAVA_OPTS: "-Xms1g -Xmx1g -Des.path.conf=/etc/elasticsearch"
It seems that volumes cannot be baked into images. The directory that holds the generated data is declared as a VOLUME by the parent image, so anything written there during the build is discarded once the build step finishes. The only way to do this is to fork the parent Dockerfile and remove the VOLUME declaration.
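You can check which paths the parent image declares as volumes with docker inspect; a quick sketch (on the official 5.x images this typically includes the data directory):
$ docker inspect --format '{{json .Config.Volumes}}' elasticsearch:5.3.1
{"/usr/share/elasticsearch/data":{}}
Anything the restore writes to such a path during docker build ends up in a temporary volume rather than an image layer, which is why the index disappears at run time.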
