How can curl perform a GET request with a data payload? - elasticsearch

The introductory materials on ElasticSearch include the following example curl request:
curl -X GET "localhost:9200/_search?pretty" -H 'Content-Type: application/json' -d'
{
"query": {
"query_string" : {
"query" : "(new york city) OR (big apple)",
"default_field" : "content"
}
}
}
'
This request has two parameters which I thought were incompatible:
-X GET, which specifies that the request is a GET.
-d [...], which specifies that the request has a data payload.
I thought that specifying a data payload was only possible in PUT or POST requests, because GET requests do not have any concept of a data payload. Is this a valid curl command? What does it do, exactly?

The above curl request is valid. In fact, if you have an index with data in it, you can check the output of the command yourself.
I tried it against an index on my own system and it gave me a proper response.
curl -v -X GET "localhost:9500/querytime/_search?pretty" -H 'Content-Type: application/json' -d'
{
"query": {
"query_string" : {
"query" : "(avengers) OR (big apple)",
"default_field" : "movie_name"
}
}
}'
* Trying ::1...
* TCP_NODELAY set
* Connected to localhost (::1) port 9500 (#0)
> GET /querytime/_search?pretty HTTP/1.1
> Host: localhost:9500
> User-Agent: curl/7.64.1
> Accept: */*
> Content-Type: application/json
> Content-Length: 156
>
* upload completely sent off: 156 out of 156 bytes
< HTTP/1.1 200 OK
< content-type: application/json; charset=UTF-8
< content-length: 905
<
{
"took" : 4,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 3,
"relation" : "eq"
},
"max_score" : 0.14874382,
"hits" : [
{
"_index" : "querytime",
"_type" : "_doc",
"_id" : "1",
"_score" : 0.14874382,
"_source" : {
"movie_name" : "Avengers: Infinity War"
}
}
]
}
}
As mentioned in the official curl manual (on a *nix system, run man curl and search for the option below):
-G, --get
When used, this option will make all data specified with -d, --data, --data-binary or --data-urlencode to be used in an HTTP GET request instead of the POST request that otherwise would be used. The data will be appended to the URL with a '?' separator.
As explained in this SO answer, whether the body of a GET request is parsed also depends on the web server.
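To see the difference in practice, here is a minimal sketch (the query body and the q= value are illustrative, not from the original post). The first command keeps the JSON in the request body even though the method is GET, which is exactly what the Elasticsearch _search API reads; the second uses -G, so curl appends the data to the URL and sends no body at all:
# 1) -X GET with -d: the JSON stays in the request body.
curl -v -X GET "localhost:9200/_search?pretty" -H 'Content-Type: application/json' -d '{"query": {"match_all": {}}}'
# 2) -G with --data-urlencode: the data is appended to the URL after a '?',
#    so this becomes a URI search (q=...) rather than a body search.
curl -v -G "localhost:9200/_search" --data-urlencode 'q=content:apple'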

Related

Why do I have to PUT new documents to a nested URI, if mapping types have been removed?

I'm on Elasticsearch 7.14.0 where mapping types have been removed.
If I run the following:
curl -X PUT "localhost:9200/products/1?pretty" -H 'Content-Type: application/json' -d'
{
"name": "Toast"
}
'
I get
{
"error" : "Incorrect HTTP method for uri [/products/1?pretty] and method [PUT], allowed: [POST]",
"status" : 405
}
It seems that Elastic wants me to PUT it to an /index/type/ URI:
curl -X PUT "localhost:9200/pop/products/1?pretty" -H 'Content-Type: application/json' -d'
{
"name": "Toast"
}
'
{
"_index" : "pop",
"_type" : "products",
"_id" : "1",
"_version" : 1,
"result" : "created",
"_shards" : {
"total" : 2,
"successful" : 1,
"failed" : 0
},
"_seq_no" : 0,
"_primary_term" : 1
}
I am wondering why I must have a nested URI indicating a type, if mapping types have been removed?
You have to add _doc to your PUT request path, as shown below:
curl -X PUT "localhost:9200/products/_doc/1?pretty" -H 'Content-Type: application/json' -d'
{
"name": "Toast"
}
'
As mentioned in the official Elasticsearch documentation, after mapping types were removed in 7.x you need to use _doc (which does not represent a document type; rather, it is the endpoint name) in the paths of the document index, get, and delete APIs.
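As an extra illustration (this variant is not in the original answer), the same _doc endpoint also accepts requests without an explicit ID, in which case Elasticsearch generates one; note that auto-ID indexing must use POST rather than PUT:
# Hypothetical variant: let Elasticsearch generate the document ID.
curl -X POST "localhost:9200/products/_doc?pretty" -H 'Content-Type: application/json' -d'
{
"name": "Toast"
}
'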

Bulk index text field containing new lines with cURL

I am trying to bulk index a file with the following format to my elasticsearch index:
{"index":{"_index":"articles","_type":"_doc"}}
{"title":"My Article Title","text":"My article text. \nNext paragraph here."}
Using this command:
curl -s -XPOST -H 'Content-Type: application/x-ndjson' http://localhost:9200/_bulk --data-binary @/data.json
The problem is that the article text in my documents may contain new line characters \n, which breaks the formatting for a cURL bulk index, so I get this error:
{"error":{"root_cause":[{"type":"parse_exception","reason":"request body is required"}],"type":"parse_exception","reason":"request body is required"},"status":400}
I have been able to bulk index these documents using the JavaScript API, so I'm hoping it will be possible using cURL as well, as I want to index these documents into my Docker image as part of the build.
I've managed to do it on Elasticsearch 7.3 and Red Hat Enterprise Linux 7 (7.7).
1) Changed .json to .txt, added a newline after the last line, saved the file, and uploaded it to the server
[root@host tmp]$ mv data.json data.txt
2) Forced curl to append a newline to its output
[root@host tmp]$ echo '-w "\n"' >> ~/.curlrc
3) Curled to ES:
[root@host tmp]$ curl -s -XPOST -H 'Content-Type: application/x-ndjson' https://localhost:9200/_bulk -k -u user:pass --data-binary @data.txt
{"took":4,"errors":false,"items":[{"index":{"_index":"articles","_type":"_doc","_id":"QdsosG0B3nqkAGly3E6t","_version":1,"result":"created","_shards":{"total":2,"successful":2,"failed":0},"_seq_no":1,"_primary_term":1,"status":201}}]}
4) Result:
[root@host tmp]$ curl -XGET -H 'Content-Type: application/x-ndjson' https://localhost:9200/articles/_search?pretty -k -u user:pass
{
"took" : 1,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 1,
"relation" : "eq"
},
"max_score" : 1.0,
"hits" : [
{
"_index" : "articles",
"_type" : "_doc",
"_id" : "QdsosG0B3nqkAGly3E6t",
"_score" : 1.0,
"_source" : {
"title" : "My Article Title",
"text" : "My article text. \nNext paragraph here."
}
}
]
}
}
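If you still hit the "request body is required" error, it is worth sanity-checking the bulk file itself: every action/source pair must sit on a single line (newlines inside strings must be escaped as \n, as in the example above), and the body must end with a newline. A minimal sketch of such a check, assuming jq is installed (jq is not used in the original answer) and using the data.txt file name from step 1:
# Each line must parse as a complete JSON object on its own.
while IFS= read -r line; do
  printf '%s\n' "$line" | jq -e . > /dev/null 2>&1 || echo "invalid NDJSON line: $line"
done < data.txt
# The _bulk API requires the body to end with a newline character.
tail -c 1 data.txt | od -c    # the last byte should be \n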

Elasticsearch completion : strange behavior when multiple matches per document

When I use the completion type inside a suggest, as described in the Elasticsearch documentation (https://www.elastic.co/guide/en/elasticsearch/reference/6.7/search-suggesters-completion.html), I cannot get all the matching words (I only get one matching word per document).
I tested the following commands on Elasticsearch 6.7.2 (which is the latest version available on AWS at the moment):
Deleting the index in case it exists
curl http://localhost:9200/test -H 'Content-Type: application/json' -X DELETE
Creating the index
curl http://localhost:9200/test -H 'Content-Type: application/json' -X PUT -d '
{
"mappings": {
"page": {
"properties": {
"completion_terms": {
"type": "completion"
}
}
}
}
}
'
Indexing a document
curl http://localhost:9200/test/_doc/1 -H 'Content-Type: application/json' -X PUT -d '
{
"completion_terms": ["restaurant", "restauration", "réseau"]
}'
Check the document exists
curl http://localhost:9200/test/_doc/1 -H 'Content-Type: application/json'
Use the completion
curl -X GET "localhost:9200/test/_search?pretty=true" -H 'Content-Type: application/json' -d'
{
"_source": ["suggestExact"],
"suggest": {
"suggestExact" : {
"prefix" : "res",
"completion" : {
"field" : "completion_terms"
}
}
}
}
'
The result is:
{
"took" : 6,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : 0,
"max_score" : 0.0,
"hits" : [ ]
},
"suggest" : {
"suggestExact" : [
{
"text" : "res",
"offset" : 0,
"length" : 3,
"options" : [
{
"text" : "restaurant",
"_index" : "test",
"_type" : "page",
"_id" : "1",
"_score" : 1.0,
"_source" : { }
}
]
}
]
}
}
I'd like to get ALL the matching words (here, I get at most one result per document)
In the example, "restauration" and "réseau" are missing
Am I doing something wrong?
After much searching, I found that this is the intended behavior (the feature is meant to "suggest documents", not "suggest terms").
Especially, see https://github.com/elastic/elasticsearch/issues/31738
However, I still have not managed to achieve "suggest terms", even with the term suggester, which seems to be the intended way (https://www.elastic.co/guide/en/elasticsearch/reference/6.7/search-suggesters-term.html).
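A possible workaround, if the goal is a list of matching terms rather than documents, is a terms aggregation with an include prefix pattern. This is only a sketch and assumes the terms are also indexed into a keyword field (here called completion_terms_raw), which the mapping above does not define:
# Hypothetical: requires an extra keyword field that holds the same terms.
curl -X GET "localhost:9200/test/_search?pretty=true" -H 'Content-Type: application/json' -d'
{
"size": 0,
"aggs": {
"matching_terms": {
"terms": {
"field": "completion_terms_raw",
"include": "res.*"
}
}
}
}
'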

Elasticsearch-6.x norms false not working

This is what I have done:
First:
curl -X PUT "localhost:9200/log_20180419"
Second
curl -X PUT "localhost:9200/log_20180419/_mapping/_doc" -H 'Content-Type: application/json' -d'
{
"properties": {
"title": {
"type": "text",
"norms": false
}
}
}
'
Third
# I insert data with the Python client: elasticsearch-py
from elasticsearch import Elasticsearch

es_conn = Elasticsearch()
content_tmp = "acxzcasiuchxzuicbhasuicgzyugas%s"
for i in range(10000):
    result = content_tmp % i
    es_conn.index(index="log_20180419", body={"title": result}, doc_type="_doc")
Fourth
I query it:
curl -X GET "localhost:9200/cdn_log_20180419/_search?pretty" -H 'Content-Type: application/json' -d'
{
"query": {
"match":{
"title":"dasuioczxuivcaduciqanbcaiushcauinhauincsaincdjkxzcbyquiwbjkfcznkajsbcjkzxhcuiasbcjkzxchjdsfasckjbjak9999"
}
}
}
'
Result is
{
"took" : 1,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : 1,
"max_score" : 7.2293553,
"hits" : [
{
"_index" : "cdn_log_20180419",
"_type" : "_doc",
"_id" : "oDR99mIBBZEcRu0i7LlO",
"_score" : 7.2293553,
"_source" : {
"title" : "dasuioczxuivcaduciqanbcaiushcauinhauincsaincdjkxzcbyquiwbjkfcznkajsbcjkzxhcuiasbcjkzxchjdsfasckjbjak9999"
}
}
]
}
}
You can see that the result still has a _score field, which confuses me.
The doc is here: https://www.elastic.co/guide/en/elasticsearch/reference/current/norms.html
The norm is only one part of scoring. The norm covers the field length norm and index-time boosting (if you are using that), but term frequency and inverse document frequency (TF/IDF) are independent of it.
If you don't need or want scoring for your query, look into bool filters or the constant_score query.
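For illustration, a minimal sketch of running the same match in filter context, where the clause only includes or excludes documents and contributes no relevance score (the index, field, and query string are taken from the question; this exact request is not in the original answer):
# Filter context: the match clause filters documents without scoring them.
curl -X GET "localhost:9200/log_20180419/_search?pretty" -H 'Content-Type: application/json' -d'
{
"query": {
"bool": {
"filter": [
{ "match": { "title": "dasuioczxuivcaduciqanbcaiushcauinhauincsaincdjkxzcbyquiwbjkfcznkajsbcjkzxhcuiasbcjkzxchjdsfasckjbjak9999" } }
]
}
}
}
'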

Set up Flume HTTP agent on a URL?

I've been able to set up my HTTP source on a host and port like:
agent.sources=s1
...
agent.sources.s1.type=http
agent.sources.s1.bind=0.0.0.0
agent.sources.s1.port=5140
And I can, for example, POST a JSON document to it via:
curl -X POST -H 'Content-Type: application/json; charset=UTF-8' -d '[
{ "headers" : { "ip" : "192.168.1.102", "host" : "random_host.example.com" }, "body" : "random_body" },
{ "headers" : { "ip" : "192.168.1.102", "host" : "random_host.example.com" }, "body" : "really_random_body" }
]' http://hostname:port
However, I would like to be able to POST a JSON document to http://hostname.com:port/a/b/c/ instead.
How may I do this?
