How to configure Elasticsearch in Jaeger without Docker

I am trying to set up jaeger-all-in-one on one server. If I run the jaeger-all-in-one exe, everything works as expected (using in-memory storage). I could not find a help command that shows the options available for Elasticsearch. Now, my requirement is to specify an Elasticsearch URL. I have set the environment variables SPAN_STORAGE_TYPES and ES_SERVER_URLS, but couldn't find how to run jaeger-all-in-one.exe so that it picks up these environment variables.

Are you connecting it to Elasticsearch only, or to a stack like ELK/EFK? I have tried this, and ELK cannot be configured with jaeger-all-in-one.exe alone on Windows without Docker. You can do it by running jaeger-collector, jaeger-agent, and jaeger-query individually and passing them the ELK-related configuration.
In jaeger-collector and jaeger-query you need to set the variables SPAN_STORAGE_TYPES and ES_SERVER_URLS.
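For example, on Windows these binaries read their configuration from the environment at startup, so setting the variables in the same command prompt before launching the exe should be enough. A minimal sketch (note that Jaeger's documented variable name is SPAN_STORAGE_TYPE, singular, and the Elasticsearch URL below is a placeholder):

rem In one command prompt:
set SPAN_STORAGE_TYPE=elasticsearch
set ES_SERVER_URLS=http://elasticsearch-host:9200
jaeger-collector.exe

rem In a second command prompt, set the same two variables, then:
jaeger-query.exe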

Related

Elastic Cloud APM not showing logs in Transactions Page

What makes Kibana not show Docker container logs in the APM "Transactions" page under the "Logs" tab?
I verified that the logs are successfully being generated with the "trace.id" associated for proper linking.
I have the exact same environment and configs (7.16.2) running via docker-compose and it works perfectly.
I could not figure out why this feature works locally but does not show up in the Elastic Cloud deployment.
UPDATE with Solution:
I just solved the problem.
It's related to the Filebeat version.
From 7.16.0 onward, the transaction/log linking stops working.
Reverted Filebeat back to version 7.15.2 and it started working again.
If you are not using Filebeat, for example: we rolled our own logging implementation to send logs from a queue in batches using the Bulk API.
We have our own "ElasticLog" class and then use attributes to match the logs-* schema for the Log Stream.
In particular, we had to make sure that trace.id was the same as the actual trace's trace.id property. Then the logs started to show up there (it does take a few minutes sometimes).
Some more info on how to get the IDs:
We use the OpenTelemetry exporter for traces and an ILoggerProvider for logs. They fire off batches independently of each other.
We populate the trace IDs at the time the class is instantiated, as a default value. This way you are still in the context of the Activity, and it also helps set the timestamp to exactly when the log was created.
This LogEntry then gets passed into the ElasticLogger processor and mapped, as described above, to the ElasticLog entry with the attributes needed for Elasticsearch.
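As a rough, language-agnostic sketch of that approach (not the poster's actual ElasticLog/ElasticLogger code; the index name, endpoint, credentials, and field layout are assumptions), the essential point is that each bulk-indexed log document carries a trace.id equal to the trace ID of the APM transaction it belongs to:

import json
import datetime
import requests

# Hypothetical queued log entries; trace_id must equal the trace.id of the
# APM transaction the log line was emitted under.
entries = [
    {"message": "Order created", "trace_id": "0af7651916cd43dd8448eb211c80319c"},
]

bulk_lines = []
for e in entries:
    # Target a logs-* data stream so the documents appear in the Logs stream / APM Logs tab.
    bulk_lines.append(json.dumps({"create": {"_index": "logs-myapp-default"}}))
    bulk_lines.append(json.dumps({
        "@timestamp": datetime.datetime.utcnow().isoformat() + "Z",
        "message": e["message"],
        "trace": {"id": e["trace_id"]},  # must match the trace's trace.id exactly
    }))

requests.post(
    "https://my-deployment.es.example.com/_bulk",
    data="\n".join(bulk_lines) + "\n",
    headers={"Content-Type": "application/x-ndjson"},
    auth=("elastic", "changeme"),
)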

How to log every query in Solr using the slow query log?

I have been using Elasticsearch, where you just set the slow query log threshold to 0 and all queries get logged, so I tried the same in Solr.
I am using the techproducts example here and just added the following config to the file
/home/ygrover/software/solr-8.3.1/server/solr/configsets/sample_techproducts_configs/conf/solrconfig.xml
<slowQueryThresholdMillis>0</slowQueryThresholdMillis>
I also changed the logging level in Solr via http://localhost:8983/solr/#/~logging/level to ALL.
The log folder is at the location /home/ygrover/software/solr-8.3.1/server/logs
but no entries are being written to the file solr_slow_requests.log.
Am I missing something here?
Note: I am doing this for testing in a local environment only. If there is an alternative way, please suggest it, but I need to know what the missing piece is here, as this process works seamlessly in Elasticsearch.
Edit 1 :
I am facing this problem in cloud mode only, when launching the techproducts example. I followed this tutorial: https://lucene.apache.org/solr/guide/8_4/solr-tutorial.html
I have edited the _default config as well and set the slow query threshold to 0 there too. This config works when I don't run in cloud mode, and I can then see all queries logged in solr_slow_requests.log.
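One possible explanation, not confirmed in this thread: in cloud mode the active configset is read from ZooKeeper, not from the conf directory on disk, so an edited solrconfig.xml (with slowQueryThresholdMillis inside the <query> section) only takes effect after the configset is re-uploaded and the collection is reloaded. A sketch, assuming the embedded ZooKeeper on port 9983 and that the techproducts collection uses a configset named techproducts:

bin/solr zk upconfig -z localhost:9983 -n techproducts -d server/solr/configsets/sample_techproducts_configs/conf
curl "http://localhost:8983/solr/admin/collections?action=RELOAD&name=techproducts"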

How to configure Jaeger collector with ElasticSearch in Windows Server

I am trying to set up jaeger-collector on one server with jaeger-agent running on another server.
If I run the jaeger-all-in-one exe, everything works as expected (using in-memory storage).
I could not find a help command that shows the options available for Elasticsearch. When I run jaeger-collector --help, it shows only Cassandra-related flags. How do I check the Elasticsearch-specific details?
Now, my requirement is to specify an Elasticsearch URL.
I have set up the environment variables SPAN_STORAGE_TYPES and ES_SERVER_URLS, but couldn't find how to run jaeger-collector.exe so that it picks up these environment variables.
Thanks,
Minu
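One detail that may help here, based on Jaeger's documented behavior rather than anything confirmed in this thread: the flags listed by --help depend on the configured storage type, so setting SPAN_STORAGE_TYPE (singular) before asking for help should surface the Elasticsearch flags:

set SPAN_STORAGE_TYPE=elasticsearch
jaeger-collector.exe --help
rem the output should now include the es.* flags, e.g. --es.server-urls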

How do you use s3a with spark 2.1.0 on aws us-east-2?

Background
I have been working on getting a flexible setup for myself to use spark on aws with docker swarm mode. The docker image I have been using is configured to use the latest spark, which at the time is 2.1.0 with Hadoop 2.7.3, and is available at jupyter/pyspark-notebook.
This is working, and I have been just going through to test out the various connectivity paths that I plan to use. The issue I came across is with the uncertainty around the correct way to interact with s3. I have followed the trail on how to provide the dependencies for spark to connect to data on aws s3 using the s3a protocol, vs s3n protocol.
I finally came across the hadoop aws guide and thought I was following how to provide the configuration. However, I was still receiving the 400 Bad Request error, as seen in this question that describes how to fix it by defining the endpoint, which I had already done.
I ended up being too far off the standard configuration by being on us-east-2, making me uncertain if I had a problem with the jar files. To eliminate the region issue, I set things back up on the regular us-east-1 region, and I was able to finally connect with s3a. So I have narrowed down the problem to the region, but thought I was doing everything required to operate on the other region.
Question
What is the correct way to use the configuration variables for hadoop in spark to use us-east-2?
Note: This example uses local execution mode to simplify things.
import os
import pyspark
I can see in the notebook console that these packages download after creating the context, and adding them took me from being completely broken to getting the Bad Request error.
os.environ['PYSPARK_SUBMIT_ARGS'] = '--packages com.amazonaws:aws-java-sdk:1.7.4,org.apache.hadoop:hadoop-aws:2.7.3 pyspark-shell'
conf = pyspark.SparkConf().setMaster('local[1]')
sc = pyspark.SparkContext(conf=conf)
sql = pyspark.SQLContext(sc)
For the AWS config, I tried both the method below and the equivalent conf.set("spark.hadoop.fs.<config_string>", <config_value>) pattern on the conf above; the only difference is that with the latter I set the values on conf before creating the Spark context.
hadoop_conf = sc._jsc.hadoopConfiguration()
hadoop_conf.set("fs.s3.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem")
hadoop_conf.set("fs.s3a.endpoint", "s3.us-east-2.amazonaws.com")
hadoop_conf.set("fs.s3a.access.key", access_id)
hadoop_conf.set("fs.s3a.secret.key", access_key)
One thing to note is that I also tried an alternative endpoint for us-east-2, s3-us-east-2.amazonaws.com.
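For reference, the spark.hadoop.* pattern mentioned above, applied before the context is created, would look roughly like this (a sketch; access_id and access_key are the same placeholders used above):

conf = (pyspark.SparkConf()
        .setMaster('local[1]')
        # the spark.hadoop. prefix forwards these keys into the Hadoop configuration
        .set("spark.hadoop.fs.s3a.endpoint", "s3.us-east-2.amazonaws.com")
        .set("spark.hadoop.fs.s3a.access.key", access_id)
        .set("spark.hadoop.fs.s3a.secret.key", access_key))
sc = pyspark.SparkContext(conf=conf)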
I then read some parquet data off of s3.
df = sql.read.parquet('s3a://bucket-name/parquet-data-name')
df.limit(10).toPandas()
Again, after moving the EC2 instance to us-east-1 and commenting out the endpoint config, the above works for me. To me, it seems like the endpoint config isn't being used for some reason.
us-east-2 is a V4-auth S3 instance, so, as you attempted, the fs.s3a.endpoint value must be set.
If it's not being picked up, then assume the config you are setting isn't the one being used to access the bucket. Know that Hadoop caches filesystem instances by URI, even when the config changes. The first attempt to access a filesystem fixes the config, even when it's lacking auth details.
Some tactics
set the value in spark-defaults
using the config you've just created, try to explicitly load the filesystem via a call to FileSystem.get(new URI("s3a://bucket-name/parquet-data-name"), myConf); this will return the filesystem for that bucket with that config (unless it's already there). I don't know how to make that call in .py though (see the PySpark sketch after this list).
set the property "fs.s3a.impl.disable.cache" to true to bypass the cache before the get command
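For the .py part, a rough sketch of how the same call could be made through PySpark's py4j gateway, combined with the cache-disable tactic above (the bucket name is a placeholder; this is an assumption about how to translate the Java call, not a tested recipe):

# bypass the filesystem cache so the endpoint/credentials set above are used
hadoop_conf = sc._jsc.hadoopConfiguration()
hadoop_conf.set("fs.s3a.impl.disable.cache", "true")

# equivalent of FileSystem.get(new URI(...), conf) via the py4j gateway
jvm = sc._jvm
uri = jvm.java.net.URI("s3a://bucket-name/")
fs = jvm.org.apache.hadoop.fs.FileSystem.get(uri, hadoop_conf)
print(fs.getUri().toString())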
Adding more diagnostics on BadAuth errors, along with a wiki page, is a feature listed for S3A phase III. If you were to add it, along with a test, I can review it and get it in.

Debugging Elasticsearch code with IntelliJ

I am trying to debug the Elasticsearch source code with IntelliJ. I built the source with IntelliJ and the current Program argument is start. I tried passing the parameters necessary to create an index in the program argument section but it doesn't seem to work. Where do I need to pass the parameters to create indices or perform other operations?
To begin testing/debugging requests, you will need to start the server by adding the start command as a Program argument. Once the server has started, open a terminal and issue curl commands. You can place breakpoints in the code to view the workflows involved.
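For example, once the server started from IntelliJ is listening on the default port (and assuming a development setup without security enabled), standard REST calls exercise the code paths you have breakpoints in; the index name here is arbitrary:

curl -X PUT "http://localhost:9200/my-test-index"
curl -X POST "http://localhost:9200/my-test-index/_doc" -H "Content-Type: application/json" -d '{"field": "value"}'
curl "http://localhost:9200/my-test-index/_search?q=field:value"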
