Elasticsearch not_analyzed field transforms double '$' symbol

I have a problem when I include a double dollar sign in a not_analyzed field. When I check the mapping for the type with the following command:
curl -XGET 'http://localhost:9200/documents/_mapping/document'
I get this output for the code field:
{
  "documents": {
    "mappings": {
      "document": {
        "properties": {
          "code": {
            "index": "not_analyzed",
            "type": "string"
          },
          // More field mappings
If I test the mapping with the following command:
curl -XGET 'http://localhost:9200/documents/_analyze?field=code' -d "ABC$"
I get the following output:
{"tokens":[{"token":"ABC$","start_offset":0,"end_offset":4,"type":"word","position":1}]}
This is OK because the token has the same value as the data entered in the command. The problem is when I use a double dollar sign, like this:
 curl -XGET 'http://localhost:9200/documents/_analyze?field=code' -d "ABC$$"
Then I get the following token:
{"tokens":[{"token":"ABC4088","start_offset":0,"end_offset":7,"type":"word","position":1}]}
It looks like the double dollar sign gets replaced by the number 4088. According to the docs, the not_analyzed index attribute means:
Index this field, so it is searchable, but index the value exactly as specified. Do not analyze it.
Am I missing something in the code field mapping to avoid this?

This is because $$ is a special shell variable that gets expanded to the process ID of the current shell.
So when you run the curl command, the shell substitutes its own PID for $$, hence the 4088 you see instead of $$.
Run echo $$ and you'll see the PID of your current shell.
curl -XGET 'http://localhost:9200/documents/_analyze?field=code' -d "ABC4088"
(the $$ has been replaced by the PID of your shell process)
Simply escape the $ characters and you should be fine:
curl -XGET 'http://localhost:9200/documents/_analyze?field=code' -d "ABC\$\$"
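Single quotes also stop the shell from expanding $$, so an equivalent variant (my own addition, not part of the original answer) is to pass the payload in single quotes:
curl -XGET 'http://localhost:9200/documents/_analyze?field=code' -d 'ABC$$'
echo $$    # prints the PID that was being substituted into your original command, e.g. 4088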


Replace string with Bash variable in jq command

I realize this is a simple question but I haven't been able to find the answer. Thank you to anyone who may be able to help me understand what I am doing wrong.
Goal: Search and replace a string in a specific key in a JSON file with a string in a Bash variable using jq.
For example, in the following JSON file:
"name": "Welcome - https://docs.mysite.com/",
would become
"name": "Welcome",
Input (file.json)
[
  {
    "url": "https://docs.mysite.com",
    "name": "Welcome - https://docs.mysite.com/",
    "Ocurrences": "679"
  },
  {
    "url": "https://docs.mysite.com",
    "name": "Welcome",
    "Ocurrences": "382"
  }
]
Failing script (using variable)
siteUrl="docs.mysite.com"
jq --arg siteUrl "$siteUrl" '.[].name|= (gsub(" - https://$siteUrl/"; ""))' file.json > file1.json
Desired output (file1.json)
[
  {
    "url": "https://docs.mysite.com",
    "name": "Welcome",
    "Ocurrences": "679"
  },
  {
    "url": "https://docs.mysite.com",
    "name": "Welcome",
    "Ocurrences": "382"
  }
]
I've tried various iterations on removing quotes, changing between ' and ", and adding and removing backslashes.
Successful script (not using variable)
jq '.[].name|= (gsub(" - https://docs.mysite.com/"; ""))' file.json > file1.json
More specifically, if it matters, I am processing an export of a website's usage data from Azure App Insights. Unfortunately, the same page may be assigned different names. Later I sum the Ocurrences of the two objects whose url values have become identical. If it is better to fix this in App Insights I am grateful for that insight, although I know Bash better than Kusto queries. I am grateful for any help or direction.
Almost. jq variables are not automatically expanded inside a string literal; you must interpolate them explicitly with \(…):
jq --arg siteUrl 'docs.mysite.com' '.[].name |= (gsub(" - https://\($siteUrl)/"; ""))' file.json
Alternatively, detect a suffix match and extract the prefix by slicing:
jq --arg siteUrl 'docs.mysite.com' '" - https://\($siteUrl)/" as $suffix | (.[].name | select(endswith($suffix))) |= .[:$suffix|-length]' file.json
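As a quick sanity check (my addition, not part of the original answer), you can preview what \(…) interpolation produces before running it against the file:
jq -n --arg siteUrl 'docs.mysite.com' '" - https://\($siteUrl)/"'
" - https://docs.mysite.com/"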

Create index-patterns from console with Kibana 6.0 or 7+ (v7.0.1)

I recently upgraded my ElasticStack instance from 5.5 to 6.0, and it seems that some of the breaking changes of this version have harmed my pipeline. I had a script that, depending on the indices inside ElasticSearch, created index-patterns automatically for some groups of similar indices. The problem is that with the new mapping changes of the 6.0 version, I cannot add any new index-pattern from the console. This was the request I used and worked fine in 5.5:
curl -XPOST "http://localhost:9200/.kibana/index-pattern" -H 'Content-Type: application/json' -d'
{
"title" : "index_name",
"timeFieldName" : "execution_time"
}'
This is the response I get now, in 6.0, from ElasticSearch:
{
  "error": {
    "root_cause": [
      {
        "type": "illegal_argument_exception",
        "reason": "Rejecting mapping update to [.kibana] as the final mapping would have more than 1 type: [index-pattern, doc]"
      }
    ],
    "type": "illegal_argument_exception",
    "reason": "Rejecting mapping update to [.kibana] as the final mapping would have more than 1 type: [index-pattern, doc]"
  },
  "status": 400
}
How could I add index-patterns from the console avoiding this multiple mapping issue?
The URL has changed in version 6.0.0; here is the new URL:
http://localhost:9200/.kibana/doc/index-pattern:my-index-pattern-name
This curl command should work for you:
curl -XPOST "http://localhost:9200/.kibana/doc/index-pattern:my-index-pattern-name" -H 'Content-Type: application/json' -d'
{
  "type" : "index-pattern",
  "index-pattern" : {
    "title": "my-index-pattern-name*",
    "timeFieldName": "execution_time"
  }
}'
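To confirm the document was created (a quick check of my own, not part of the original answer), you can fetch it back with a standard document GET:
curl -XGET "http://localhost:9200/.kibana/doc/index-pattern:my-index-pattern-name?pretty"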
If you are on Kibana 7.0.1 / 7+, you can use the saved_objects API instead.
Refer to https://www.elastic.co/guide/en/kibana/master/saved-objects-api.html (look for Get, Create, Delete, etc.).
In this case, we'll use: https://www.elastic.co/guide/en/kibana/master/saved-objects-api-create.html
$ curl -X POST -u $user:$pass -H "Content-Type: application/json" -H "kbn-xsrf:true" "${KIBANA_URL}/api/saved_objects/index-pattern/dummy_index_pattern" -d '{ "attributes": { "title":"index_name*", "timeFieldName":"sprint_start_date"}}' -w "\n" | jq
and the output:
{
  "type": "index-pattern",
  "id": "dummy_index_pattern",
  "attributes": {
    "title": "index_name*",
    "timeFieldName": "sprint_start_date"
  },
  "references": [],
  "migrationVersion": {
    "index-pattern": "6.5.0"
  },
  "updated_at": "2020-02-25T22:56:44.531Z",
  "version": "Wzg5NCwxNV0="
}
Where $KIBANA_URL was set to: http://my-elk-stack.devops.local:5601
If you don't have jq installed, remove | jq from the command (as listed above).
PS: When Kibana's GUI is used to create an index pattern, Kibana stores the index pattern's ID as an auto-generated alphanumeric value (ex: laskl32ukdflsdjflskadf-sdf-sdfsaldkjfhsdf-dsfasdf), which is hard to use/find/type when doing a GET operation to find info about an existing index pattern with a curl command.
If you pass an index pattern name in the URL (like we did above), then Kibana/Elasticsearch will store the index pattern's ID under the name you gave to the REST call (ex: .../api/saved_objects/index-pattern/dummy_index_pattern).
Here, dummy_index_pattern becomes the ID (only visible if you hover your mouse over the index pattern's name in the Kibana GUI), and its index pattern title will be index_name* (i.e. what's listed in the GUI when you click on Kibana Home > Gear icon > Index Patterns and look at the index patterns listed on the right side).
NOTE: The timeFieldName is very important. This is the field used for time-series events (especially by the TSVB Time Series Visual Builder visualization type). By default it is @timestamp, but if you recreate your index every time and push all data in one shot from a data source (ex: JIRA), instead of sending only delta information to your target Elasticsearch index, then @timestamp won't help with a visualization's time-spanning/window feature (where you change the time range from, say, the last week to the last hour or the last 6 months). In that case you can set a different field, e.g. sprint_start_date as I used above; then on Kibana's Discover page, if you select this index pattern, it will use the sprint_start_date (type: date) field for events.
To GET info about the newly created index pattern, you can refer to https://www.elastic.co/guide/en/kibana/master/saved-objects-api-get.html, or run the following (the last value in the URL path is the ID of the index pattern we created earlier):
curl -X GET "${KIBANA_URL}/api/saved_objects/index-pattern/dummy_index_pattern" | jq
or, if you want to perform a GET on an index pattern that was created via Kibana's GUI (under Index Patterns > Create Index Pattern), you'd have to enter something like this:
curl -X GET "${KIBANA_URL}/api/saved_objects/index-pattern/laskl32ukdflsdjflskadf-sdf-sdfsaldkjfhsdf-dsfasdf" | jq
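If you don't know the auto-generated ID at all, the saved objects find API can list index patterns along with their IDs (a suggestion of mine, not from the original answer):
curl -X GET -u $user:$pass "${KIBANA_URL}/api/saved_objects/_find?type=index-pattern" | jq '.saved_objects[] | {id: .id, title: .attributes.title}'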
For Kibana 7.7.0 with Open Distro security plugin (amazon/opendistro-for-elasticsearch-kibana:1.8.0 Docker image to be precise), this worked for me:
curl -X POST \
-u USERNAME:PASSWORD \
KIBANA_HOST/api/saved_objects/index-pattern \
-H "kbn-version: 7.7.0" \
-H "kbn-xsrf: true" \
-H "content-type: application/json; charset=utf-8" \
-d '{"attributes":{"title":"INDEX-PATTERN*","timeFieldName":"@timestamp","fields":"[]"}}'
Please note that the kbn-xsrf header is required, even though it seems useless from a security point of view.
Output was like:
{"type":"index-pattern","id":"UUID","attributes":{"title":"INDEX-PATTERN*","timeFieldName":"@timestamp","fields":"[]"},"references":[],"migrationVersion":{"index-pattern":"7.6.0"},"updated_at":"TIMESTAMP","version":"VERSION"}
I can't tell why migrationVersion.index-pattern is "7.6.0".
For other Kibana versions you should be able to:
Open the Kibana UI in a browser
Open the developer console and navigate to the Network tab
Create an index pattern using the UI
Open the POST request in the developer console, look at its URL and headers, then rewrite it as a curl command
Indices created in Elasticsearch 6.0.0 or later may only contain a single mapping type.
Indices created in 5.x with multiple mapping types will continue to function as before in Elasticsearch 6.x.
Mapping types will be completely removed in Elasticsearch 7.0.0.
Maybe you are creating an index with more than one doc type in ES 6.0.0.
https://www.elastic.co/guide/en/elasticsearch/reference/current/removal-of-types.html
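A quick way to check whether an index already has more than one mapping type (my own suggestion, not from the original answer) is to dump its mapping and look at the type names directly under "mappings":
curl -XGET 'http://localhost:9200/.kibana/_mapping?pretty'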
Create index-pattern in bulk with timestamp:
cat index_svc.txt
my-index1
my-index2
my-index3
my-index4
my-index5
my-index6
cat index_svc.txt | while read index; do
echo -ne "create index-pattern ${index} \t"
curl -XPOST "http://10.0.1.44:9200/.kibana/doc/index-pattern:${index}" -H 'Content-Type: application/json' -d "{\"type\":\"index-pattern\",\"index-pattern\":{\"title\":\"${index}2020*\",\"timeFieldName\":\"@timestamp\"}}"
echo
done
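To verify the bulk creation (again just a sanity check on my part, not from the original snippet), you can search the .kibana index for the index-pattern documents:
curl -XGET "http://10.0.1.44:9200/.kibana/_search?q=type:index-pattern&pretty"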

How to unset Elasticsearch routing

I'm using the shrink API and it requires you to move all shards to a single node. After the shrink operation is completed I wish to have the shards of the original index reassigned throughout the cluster.
So my question is: how do I reverse this command? I attempted to set _name to "*" but that did not work.
curl -s -XPUT "#{ES_HOST}:9200/#{BULK_INDEX}/_settings?pretty" -d '
{
"settings": {
"index.routing.allocation.require._name": "shrink-node-1"
}
}'
You can try to set it to null instead, but you also need to remove the settings wrapper since you're already hitting the _settings endpoint:
curl -s -XPUT "#{ES_HOST}:9200/#{BULK_INDEX}/_settings?pretty" -d '
{
"index.routing.allocation.require._name": null
}'
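To confirm the allocation requirement has been removed (a quick check of my own, not part of the original answer), fetch the index settings back; index.routing.allocation.require._name should no longer appear:
curl -s -XGET "#{ES_HOST}:9200/#{BULK_INDEX}/_settings?pretty"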

cURL operation performs differently when used in batch script

I'm a bit new to batch scripting, so I apologize if this is glaringly obvious, but I couldn't find any similar information.
I'm trying to perform the following cURL call:
curl -H "Content-Type:application/json" -d '{"lt":"f","sort":"n","max":"1500","offset":"1500"}' [API_KEY]@api.nal.usda.gov/ndb/list
When I run that line in a command line (or Cygwin) it does exactly what I want it to.
However, when I try to call it from a bat file, it seems my parameters are getting messed up somehow.
FOR /L %%A IN (0, 1500, 77500) DO (
curl -H "Content-Type:application/json" -d '{"lt":"f","sort":"n","max":"1500","offset":"%%A"}' [API_KEY]@api.nal.usda.gov/ndb/list > %%A.txt
)
I'm getting output into the correct .txt file, but it doesn't seem like the %%A in offset is getting replaced. I'm getting a "bad parameter" exception from the API. From the output on the command line, it looks accurate.
I'm open to using bash scripting instead if it would make more sense, but I was having the same issue using bash.
(Note: I replaced my API key with a placeholder in the example.. that's not the problem)
In bash at least, the problem is that variable expansion does not occur inside single quotes; you need to use double quotes and escape the nested double quotes:
for a in $(seq 0 1500 77500); do
curl -H "Content-Type:application/json" -d "{\"lt\":\"f\",\"sort\":\"n\",\"max\":\"1500\",\"offset\":\"$a\"}" [API_KEY]@api.nal.usda.gov/ndb/list > "$a".txt
done
I suspect you need to do the equivalent in a batch file.
You can concatenate adjacent single- and double-quoted strings to minimize the number of escaped quotes:
... -d '{"lt": "f", "sort": "n", "max": "1500", "offset": "'"$a"'"}' ...
but you may want to consider one of two other options. First, read the data from a here document instead of using a hard-coded string:
curl -H "..." -d @- [API_KEY]@api.nal.usda.gov/ndb/list > "$a".txt <<EOF
{"lt": "f", "sort": "n", "max": "1500", "offset": "$a"}
EOF
or use something like jq to generate the JSON for you:
curl -H "..." \
-d "$(jq -n --arg a "$a" '{lt: "f", sort: "n", max: "1500", offset: $a}')" \
[API_KEY]@api.nal.usda.gov/ndb/list > "$a".txt
The jq solution would be preferable in general, since you don't have to worry about pre-escaping any variable values.
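For example (a quick illustration of that last point, not from the original answer), jq handles the quoting of the variable value for you:
$ jq -n --arg a 1500 '{lt: "f", sort: "n", max: "1500", offset: $a}'
{
  "lt": "f",
  "sort": "n",
  "max": "1500",
  "offset": "1500"
}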

MongoDB - escaping quotes while inserting record

I have encountered a strange problem while trying to write a bash script to copy some data from one database to another.
To make things simple I will present the issue with the following example:
Let's say we have a file containing mongo insert commands that we want to execute in the mongo client. With Bash it would be:
cat file.json | mongo --verbose --host $HOST
This works fine until we use quotes in a record's content.
For example:
use somedb;
db["collection"].insert({ "field": "test" })
#This of course works
db["collection"].insert({ "field": "test \" test" })
#But this doesn't
db["collection"].insert({ "field": "test \\" test" })
#I thought maybe doubling the backslash would help, but this also doesn't work
db["collection"].insert({ "field": "\"test\"" })
#This SURPRISINGLY works!!!
My question is, what is the proper way of escaping quotes for the mongo client (I am using MongoDB shell version 2.2.4)?
Why does the script succeed when there is an even number of quotes in a record and fail when there is an odd number?
I will add that there are no error messages. Mongo just fails silently (even with --verbose) and no new records appear in the collection.
There's a JIRA ticket for this issue and it's fixed in 2.5.0.
For now, you can use the Unicode code point for a double quote when inserting:
> db.foo.insert({ "field": "test \u0022 test" })
> db.foo.find()
{ "_id" : ObjectId("533455e563083f9b26efb5c2"), "field" : "test \" test" }
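Tying this back to the pipe-a-file setup from the question (a sketch of my own, reusing the question's file name and approach), the workaround would look like this:
cat > file.json <<'EOF'
use somedb;
db["collection"].insert({ "field": "test \u0022 test" })
EOF
cat file.json | mongo --verbose --host $HOST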
