Kibana: Cannot create Dashboard after a new install

I recently installed the ELK stack (Elasticsearch, Logstash & Kibana) on a VM. Everything works great, as in, messages flow through Logstash & show up as expected in Kibana. The only problem is that I am not able to create any dashboards in Kibana. I keep getting this error:
Error on saving 'My dashboard'. Forbidden
I Googled, went through other answers on Stack Overflow & tried several of the recommended suggestions, such as:
Setting 'xpack.security.enabled' false in elasticsearch.yml
cluster.routing.allocation.disk.threshold_enabled: false
index.blocks.read_only_allow_delete: null
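For reference, that last read_only_allow_delete flag is an index-level setting rather than an elasticsearch.yml option, so it is normally cleared through the settings API; a hedged example, assuming Elasticsearch is listening on localhost:9200:
curl -XPUT 'http://localhost:9200/_all/_settings' -H 'Content-Type: application/json' -d '{"index.blocks.read_only_allow_delete": null}'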
Nothing is helping. I have made sure that the disk is not more than 95% full. Here's what I see when I run the df command:
Filesystem 1K-blocks Used Available Use% Mounted on
udev 16438908 0 16438908 0% /dev
tmpfs 3294068 1188 3292880 1% /run
/dev/sda2 51340768 43199000 5504100 89% /
tmpfs 16470332 0 16470332 0% /dev/shm
tmpfs 5120 0 5120 0% /run/lock
tmpfs 16470332 0 16470332 0% /sys/fs/cgroup
/dev/loop3 96128 96128 0 100% /snap/core/8935
/dev/loop2 123264 123264 0 100% /snap/docker/423
/dev/loop0 125312 125312 0 100% /snap/docker/418
tmpfs 3294064 0 3294064 0% /run/user/1001
/dev/loop4 96256 96256 0 100% /snap/core/9066
Any ideas? Note: All versions are latest.

Turns out it was indeed a space issue. When I switched to a new VM that has 300 GB of space, the 'Forbidden' message was gone. A bit of a confusing error message, but now I know.
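For context on why freeing space fixed it: once a node's disk crosses the flood-stage watermark (95% by default), Elasticsearch marks indices, including .kibana, read-only, and saving a dashboard then fails with Forbidden. While freeing space, the threshold can also be raised temporarily; a hedged sketch, again assuming Elasticsearch on localhost:9200:
curl -XPUT 'http://localhost:9200/_cluster/settings' -H 'Content-Type: application/json' -d '{"transient": {"cluster.routing.allocation.disk.watermark.flood_stage": "97%"}}'
Depending on the Elasticsearch version, the read-only block may also need to be removed manually with the settings call shown earlier in the question.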

Related

Sonar-scanner hangs after 'Load active rules (done)' is shown in the logs

The tail of logging shows the following:
22:09:11.016 DEBUG: GET 200 http://someserversomewhere:9000/api/rules/search.protobuf?f=repo,name,severity,lang,internalKey,templateKey,params,actives,createdAt,updatedAt&activation=true&qprofile=AXaXXXXXXXXXXXXXXXw0&ps=500&p=1 | time=427ms
22:09:11.038 INFO: Load active rules (done) | time=12755ms
I have shelled into the running container to see whether the scanner process is pegged/running/etc., and it shows the following:
Mem: 2960944K used, 106248K free, 67380K shrd, 5032K buff, 209352K cached
CPU: 0% usr 0% sys 0% nic 99% idle 0% io 0% irq 0% sirq
Load average: 5.01 5.03 4.83 1/752 46
PID PPID USER STAT VSZ %VSZ CPU %CPU COMMAND
1 0 root S 3811m 127% 1 0% /opt/java/openjdk/bin/java -Djava.awt.headless=true -classpath /opt/sonar-scanner/lib/sonar-scann
40 0 root S 2424 0% 0 0% bash
46 40 root R 1584 0% 0 0% top
I was unable to find any logging in the sonar-scanner-cli container to help indicate the state. It appears to just be hung and waiting for something to happen.
I am running SonarQube locally from Docker at the LTS version 7.9.5.
I am also running the Docker container sonarsource/sonar-scanner-cli, which currently uses the following version in its Dockerfile:
SONAR_SCANNER_VERSION=4.5.0.2216
I am triggering the scan via the following command:
docker run --rm \
-e SONAR_HOST_URL="http://someserversomewhere:9000" \
-e SONAR_LOGIN="nottherealusername" \
-e SONAR_PASSWORD="not12345likeinspaceballs" \
-v "$DOCKER_TEST_DIRECTORY:/usr/src" \
--link "myDockerContainerNameForSonarQube" \
sonarsource/sonar-scanner-cli -X -Dsonar.password=not12345likeinspaceballs -Dsonar.verbose=true \
-Dsonar.sources=app -Dsonar.tests=test -Dsonar.branch=master \
-Dsonar.projectKey="${PROJECT_KEY}" -Dsonar.log.level=TRACE \
-Dsonar.projectBaseDir=/usr/src/$PROJECT_NAME -Dsonar.working.directory=/usr/src/$PROJECT_NAME/$SCANNER_WORK_DIR
I have done a lot of digging to find anyone with similar issues, and found the following older question, which seems similar, but it is unclear how to determine whether I am experiencing something related: Why does sonar-maven-plugin hang at loading global settings or active rules?
I am stuck and not sure what to do next; any help or hints would be appreciated.
An additional note: this process does work for the 8.4.2-developer version of SonarQube that I am planning to migrate to. The purpose of verifying 7.9.5 is to follow SonarQube's recommended upgrade path, which advises first bringing your current version to the latest LTS and running the data migration before jumping to the next major version.
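One way to check whether the hang is on the server side is to watch the SonarQube container's logs while the scanner sits after 'Load active rules (done)'; a minimal sketch, using the container name from the docker run command above:
docker logs -f myDockerContainerNameForSonarQube
If the server side is also idle at that point, the next place to look is the scanner-to-server connection rather than rule processing.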

Azure DevOps build pipeline self-hosted agent "No space left on device"

I am running a build pipeline on Azure that runs on a private build server (Red Hat Enterprise Linux) running a self-hosted agent. This build pipeline has only 1 job and 2 tasks. The 1st task basically SSHes into a repo server we have (a different server that just holds big files), generates an ISO image on that repo server, then uses curl to put that ISO back on the build server where the Azure Pipelines agent is running, in the usual $(Build.ArtifactStagingDirectory) that Azure uses for artifacts.
This 1st task succeeds, and the ISO is generated and copied over to the build server, but the "Publish Artifact" stage keeps failing. It is trying to publish to the path $(Build.ArtifactStagingDirectory) but fails with the error below (the end of the error log is further down):
No space left on device
I already went in and cleared all the directories and files that exceeded 1 GB in this working directory: /home/azure/vsts/_work
I'm not an expert with Linux. When I run df -h and view the filesystems, there are a bunch in the list. Is there a way to know which partition is actually being used by this Azure pipeline agent, which works out of the /home/azure/vsts/_work directory?
My df -h list looks like:
Filesystem Size Used Avail Use% Mounted on
/dev/mapper/vg_root-lv_root 19G 19G 28K 100% /
devtmpfs 3.9G 0 3.9G 0% /dev
tmpfs 3.9G 8.0K 3.9G 1% /dev/shm
tmpfs 3.9G 138M 3.8G 4% /run
tmpfs 3.9G 0 3.9G 0% /sys/fs/cgroup
/dev/sdc 100G 152M 100G 1% /glusterfs
/dev/sda1 488M 119M 334M 27% /boot
/dev/mapper/vg_root-lv_var 997M 106M 891M 11% /var
/dev/mapper/vg_docker-lv_docker 50G 3.1G 44G 7% /var/lib/docker/overlay2
/dev/mapper/vg_root-lv_log 997M 46M 952M 5% /var/log
/dev/mapper/vg_root-lv_crash 997M 33M 965M 4% /var/crash
/dev/mapper/vg_root-lv_root_logins 29M 1.8M 27M 6% /var/log/root_logins
/dev/mapper/vg_root-lv_core 125M 6.6M 119M 6% /var/core
/dev/mapper/vg_root-lv_repo 997M 83M 915M 9% /var/cache/yum
/dev/mapper/vg_root-lv_home 997M 33M 965M 4% /export/home
/dev/mapper/vg_root-lv_logins 93M 5.0M 88M 6% /var/log/logins
/dev/mapper/vg_root-lv_audit 725M 71M 655M 10% /var/log/audit
tmpfs 799M 0 799M 0% /run/user/0
walkie1-ap2.nextgen.com:/hdd-volume0 200G 2.3G 198G 2% /gluster-hdd
If anyone could provide some insight I'd greatly appreciate it.
End of error log:
[2020-05-06 05:49:09Z ERR JobRunner] Caught exception from job steps StepsRunner: System.IO.IOException: No space left on device
at Interop.ThrowExceptionForIoErrno(ErrorInfo errorInfo, String path, Boolean isDirectory, Func`2 errorRewriter)
at Microsoft.Win32.SafeHandles.SafeFileHandle.Open(String path, OpenFlags flags, Int32 mode)
at System.IO.FileStream..ctor(String path, FileMode mode, FileAccess access, FileShare share, Int32 bufferSize, FileOptions options)
at System.IO.FileStream..ctor(String path, FileMode mode)
at Microsoft.VisualStudio.Services.Agent.PagingLogger.NewPage()
at Microsoft.VisualStudio.Services.Agent.PagingLogger.Write(String message)
at Microsoft.VisualStudio.Services.Agent.Worker.ExecutionContext.Write(String tag, String message)
at Microsoft.VisualStudio.Services.Agent.Worker.StepsRunner.RunStepAsync(IStep step, CancellationToken jobCancellationToken)
at Microsoft.VisualStudio.Services.Agent.Worker.StepsRunner.RunAsync(IExecutionContext jobContext, IList`1 steps)
at Microsoft.VisualStudio.Services.Agent.Worker.JobRunner.RunAsync(AgentJobRequestMessage message, CancellationToken jobRequestCancellationToken)
I can't reproduce the same issue on my side, but I think you can check this article for troubleshooting.
As far as I know, this task itself takes extra space while it is being executed. You can try a bash command to make a copy of the content under the path $(Build.ArtifactStagingDirectory), doubling its size, and see whether that copy throws the same No space left on device error.
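A minimal sketch of that check, run on the agent machine (BUILD_ARTIFACTSTAGINGDIRECTORY is the environment-variable form of $(Build.ArtifactStagingDirectory); treat the exact paths as placeholders):
cp -r "$BUILD_ARTIFACTSTAGINGDIRECTORY" "$BUILD_ARTIFACTSTAGINGDIRECTORY.copy"   # roughly doubles the space used
df -h "$BUILD_ARTIFACTSTAGINGDIRECTORY"                                          # shows free space on the backing filesystem
rm -rf "$BUILD_ARTIFACTSTAGINGDIRECTORY.copy"                                    # clean up afterwards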
Also, in the build pipeline there is a clean option to clean the caches before executing the job; enable it to check whether it helps.
If it's a YAML pipeline, try something like:
workspace:
  clean: outputs | resources | all  # what to clean up before the job runs
and
steps:
- checkout: self | none | repository name  # self represents the repo where the initial pipeline YAML file was found
  clean: boolean  # if true, run `git clean -ffdx && git reset --hard HEAD` before fetching
See the YAML schema.
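Putting those options together, a hedged sketch of a YAML job (the job name is illustrative):
jobs:
- job: build_iso
  workspace:
    clean: all
  steps:
  - checkout: self
    clean: true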
So I was a n00b here, and the solution was simply to clean out space from the main directory we used to store our large ISO files:
/dev/mapper/vg_root-lv_root 19G 19G 28K 100% /
This is a custom VM we use to run builds on Azure, and I wasn't accustomed to the error message. But yes, if anyone sees this message and is using a custom build agent, it's definitely a space issue.
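For anyone hitting the same thing, a quick way to confirm which filesystem backs the agent's work directory is to pass the directory to df (path taken from the question above):
df -h /home/azure/vsts/_work
The "Mounted on" column then shows which mount point (here the root filesystem, /) the work directory actually lives on.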

Mac, list the usb port name that my device is connected to

I currently need to figure out the name of the port that my USB device is connected to. More specifically, I need to know what to enter in the field shown in my first image.
I am using a Mac, and I have run the command system_profiler SPUSBDataType, which gave me the output below. The device of interest is the second one, the CP2102 USB to UART Bridge Controller. But how do I figure out which serial port this device is connected to?
USB:
  USB 3.0 Bus:
    Host Controller Driver: AppleUSBXHCIWPT
    PCI Device ID: 0x9cb1
    PCI Revision ID: 0x0003
    PCI Vendor ID: 0x8086
    Bluetooth USB Host Controller:
      Product ID: 0x8290
      Vendor ID: 0x05ac (Apple Inc.)
      Version: 1.46
      Speed: Up to 12 Mb/sec
      Manufacturer: Broadcom Corp.
      Location ID: 0x14300000 / 2
      Current Available (mA): 500
      Current Required (mA): 0
      Extra Operating Current (mA): 0
      Built-In: Yes
    CP2102 USB to UART Bridge Controller:
      Product ID: 0xea60
      Vendor ID: 0x10c4 (Silicon Laboratories, Inc.)
      Version: 1.00
      Serial Number: 0001
      Speed: Up to 12 Mb/sec
      Manufacturer: Silicon Labs
      Location ID: 0x14200000 / 9
      Current Available (mA): 500
      Current Required (mA): 100
      Extra Operating Current (mA): 0
    Microsoft USB Optical Mouse:
      Product ID: 0x00cb
      Vendor ID: 0x045e (Microsoft Corporation)
      Version: 1.00
      Speed: Up to 1.5 Mb/sec
      Manufacturer: PixArt
      Location ID: 0x14100000 / 4
      Current Available (mA): 500
      Current Required (mA): 100
      Extra Operating Current (mA): 0
Update: I ran the df command, and it gave me the following:
Filesystem 512-blocks Used Available Capacity iused ifree %iused Mounted on
/dev/disk1 974716928 77014080 897190848 8% 754757 4294212522 0% /
devfs 380 380 0 100% 660 0 100% /dev
map -hosts 0 0 0 100% 0 0 100% /net
map auto_home 0 0 0 100% 0 0 100% /home
map -fstab 0 0 0 100% 0 0 100% /Network/Servers
/dev/disk2s1 202248 198912 3336 99% 12 4294967267 0% /Volumes/VirtualBox
/dev/disk3s1 81800 67720 14080 83% 121 4294967158 0% /Volumes/Sublime Text
/dev/disk5s1 1228720 386824 841896 32% 370 4294966909 0% /Volumes/Etcher
Try running ls /dev/tty* in a Terminal before and after plugging your device into USB. If there is an entry that only appears when the USB device is plugged in, that would be the entry corresponding to the device.
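A minimal way to capture that before/after comparison (on macOS the callout devices show up under /dev/cu.* as well; for a CP2102 the entry often looks like /dev/cu.SLAB_USBtoUART or /dev/cu.usbserial-XXXX, depending on the driver):
ls /dev/cu.* /dev/tty.* > /tmp/ports_before.txt 2>/dev/null    # with the adapter unplugged
ls /dev/cu.* /dev/tty.* > /tmp/ports_after.txt 2>/dev/null     # with the adapter plugged in
diff /tmp/ports_before.txt /tmp/ports_after.txt                # the added lines are your serial port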

Ambari dashboard retrieving no statistics

I have a fresh install of Hortonworks Data Platform 2.2 on a small cluster (4 machines), but when I log in to the Ambari GUI, the majority of the dashboard stats boxes (HDFS disk usage, network usage, memory usage, etc.) are not populated with any statistics; instead they show the message:
No data There was no data available. Possible reasons include inaccessible Ganglia service
Clicking on the HDFS service link gives the following summary:
NameNode Started
SNameNode Started
DataNodes 4/4 DataNodes Live
NameNode Uptime Not Running
NameNode Heap n/a / n/a (0.0% used)
DataNodes Status 4 live / 0 dead / 0 decommissioning
Disk Usage (DFS Used) n/a / n/a (0%)
Disk Usage (Non DFS Used) n/a / n/a (0%)
Disk Usage (Remaining) n/a / n/a (0%)
Blocks (total) n/a
Block Errors n/a corrupt / n/a missing / n/a under replicated
Total Files + Directories n/a
Upgrade Status Upgrade not finalized
Safe Mode Status n/a
The Alerts and Health Checks box to the right of the screen is not displaying any information, but if I click on the settings icon it opens the Nagios frontend and, again, everything looks healthy there!
The install went smoothly (CentOS 6.5) and everything looks good as far as all services are concerned (all started, with a green tick next to each service name). There are some stats displayed on the dashboard: 4/4 DataNodes are live, 1/1 NodeManagers live & 1/1 Supervisors are live. I can write files to HDFS, so it looks like a Ganglia issue?
The Ganglia daemon seems to be working ok:
ps -ef | grep gmond
nobody 1720 1 0 12:54 ? 00:00:44 /usr/sbin/gmond --conf=/etc/ganglia/hdp/HDPHistoryServer/gmond.core.conf --pid-file=/var/run/ganglia/hdp/HDPHistoryServer/gmond.pid
nobody 1753 1 0 12:54 ? 00:00:44 /usr/sbin/gmond --conf=/etc/ganglia/hdp/HDPFlumeServer/gmond.core.conf --pid-file=/var/run/ganglia/hdp/HDPFlumeServer/gmond.pid
nobody 1790 1 0 12:54 ? 00:00:48 /usr/sbin/gmond --conf=/etc/ganglia/hdp/HDPHBaseMaster/gmond.core.conf --pid-file=/var/run/ganglia/hdp/HDPHBaseMaster/gmond.pid
nobody 1821 1 1 12:54 ? 00:00:57 /usr/sbin/gmond --conf=/etc/ganglia/hdp/HDPKafka/gmond.core.conf --pid-file=/var/run/ganglia/hdp/HDPKafka/gmond.pid
nobody 1850 1 0 12:54 ? 00:00:44 /usr/sbin/gmond --conf=/etc/ganglia/hdp/HDPSupervisor/gmond.core.conf --pid-file=/var/run/ganglia/hdp/HDPSupervisor/gmond.pid
nobody 1879 1 0 12:54 ? 00:00:45 /usr/sbin/gmond --conf=/etc/ganglia/hdp/HDPSlaves/gmond.core.conf --pid-file=/var/run/ganglia/hdp/HDPSlaves/gmond.pid
nobody 1909 1 0 12:54 ? 00:00:48 /usr/sbin/gmond --conf=/etc/ganglia/hdp/HDPResourceManager/gmond.core.conf --pid-file=/var/run/ganglia/hdp/HDPResourceManager/gmond.pid
nobody 1938 1 0 12:54 ? 00:00:50 /usr/sbin/gmond --conf=/etc/ganglia/hdp/HDPNameNode/gmond.core.conf --pid-file=/var/run/ganglia/hdp/HDPNameNode/gmond.pid
nobody 1967 1 0 12:54 ? 00:00:47 /usr/sbin/gmond --conf=/etc/ganglia/hdp/HDPNodeManager/gmond.core.conf --pid-file=/var/run/ganglia/hdp/HDPNodeManager/gmond.pid
nobody 1996 1 0 12:54 ? 00:00:44 /usr/sbin/gmond --conf=/etc/ganglia/hdp/HDPNimbus/gmond.core.conf --pid-file=/var/run/ganglia/hdp/HDPNimbus/gmond.pid
nobody 2028 1 1 12:54 ? 00:00:58 /usr/sbin/gmond --conf=/etc/ganglia/hdp/HDPDataNode/gmond.core.conf --pid-file=/var/run/ganglia/hdp/HDPDataNode/gmond.pid
nobody 2057 1 0 12:54 ? 00:00:51 /usr/sbin/gmond --conf=/etc/ganglia/hdp/HDPHBaseRegionServer/gmond.core.conf --pid-file=/var/run/ganglia/hdp/HDPHBaseRegionServer/gmond.pid
I have checked the Ganglia service on each node; the processes are running as expected:
ps -ef | grep gmetad
nobody 2807 1 2 12:55 ? 00:01:59 /usr/sbin/gmetad --conf=/etc/ganglia/hdp/gmetad.conf --pid-file=/var/run/ganglia/hdp/gmetad.pid
I have tried restarting the Ganglia services with no luck, and restarted all services, but it's still the same. Does anyone have any ideas on how I can get the dashboard to work properly? Thank you.
It turned out to be a proxy issue: to access the internet I had to add my proxy details to the file /var/lib/ambari-server/ambari-env.sh:
export AMBARI_JVM_ARGS=$AMBARI_JVM_ARGS' -Xms512m -Xmx2048m -Dhttp.proxyHost=theproxy -Dhttp.proxyPort=80 -Djava.security.auth.login.config=/etc/ambari-server/conf/krb5JAASLogin.conf -Djava.security.krb5.conf=/etc/krb5.conf -Djavax.security.auth.useSubjectCredsOnly=false'
When Ambari tried to reach Ganglia on each node in the cluster, the request was going via the proxy and never resolving; to overcome the issue I added my nodes to the exclude list (the -Dhttp.nonProxyHosts flag) like so:
export AMBARI_JVM_ARGS=$AMBARI_JVM_ARGS' -Xms512m -Xmx2048m -Dhttp.proxyHost=theproxy -Dhttp.proxyPort=80 -Dhttp.nonProxyHosts="localhost|node1.dms|node2.dms|node3.dms|etc" -Djava.security.auth.login.config=/etc/ambari-server/conf/krb5JAASLogin.conf -Djava.security.krb5.conf=/etc/krb5.conf -Djavax.security.auth.useSubjectCredsOnly=false'
After adding the exclude list the stats were retrieved as expected!
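Note that the Ambari server has to be restarted for the new JVM arguments to take effect; assuming a standard install, that is:
ambari-server restart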

Explore which files are heavily used in the system

I'm using Ubuntu 14.04.1 LTS.
atopsar -d 30 shows that one of the hard drives (sda) in the system is heavily used. This hard drive serves only the MySQL database. The most frequently used DBs were relocated to other hard drives (sdb, sdd) via symbolic links. Yet atopsar still shows nearly the same load for sda and under 5% load on the other HDDs.
Is there a way to know which files are heavily used on the HDD?
Could it be that the MySQL InnoDB log files (ib_logfile) are fragmented, and that is why atopsar shows such a big load (50%-70%)? What can be done in that case?
Here is some output from atopsar -d 30:
08:52:47 disk busy read/s KB/read writ/s KB/writ avque avserv _dsk_
08:53:17 sda 63% 0.0 0.0 50.2 14.6 1.1 12.57 ms
sdb 5% 0.0 0.0 9.4 19.8 4.2 5.81 ms
sdd 2% 0.0 0.0 3.7 18.9 1.4 5.82 ms
08:53:47 sda 60% 0.0 16.0 48.1 15.7 1.0 12.55 ms
sdb 5% 0.0 0.0 6.9 17.5 4.6 7.35 ms
sdd 2% 0.0 0.0 4.7 24.9 1.4 4.06 ms
08:54:17 sda 38% 0.5 16.0 30.6 15.6 1.2 12.25 ms
sdb 3% 0.0 0.0 5.6 18.3 3.3 5.50 ms
sdd 2% 0.0 0.0 3.3 19.2 1.1 4.86 ms
08:54:47 sda 53% 0.0 0.0 42.5 16.5 1.1 12.37 ms
sdb 6% 0.0 0.0 8.7 21.0 5.8 6.37 ms
sdd 2% 0.0 0.0 3.1 23.1 1.3 5.68 ms
08:55:17 sda 51% 0.0 4.0 42.7 16.9 1.1 11.94 ms
sdb 5% 0.0 0.0 9.4 20.5 5.0 5.51 ms
sdd 1% 0.0 0.0 1.5 17.6 1.1 7.73 ms
08:55:47 sda 52% 0.0 0.0 40.6 14.5 1.0 12.85 ms
sdb 5% 0.0 0.0 6.8 19.5 5.4 6.66 ms
sdd 2% 0.0 0.0 4.3 31.3 1.3 4.78 ms
There is the sysdig tool, which allows you to see system-wide activity just like strace does for a single process: http://www.sysdig.org/
There are examples for disk usage info: https://github.com/draios/sysdig/wiki/Sysdig%20Examples#disk-io
See the top processes in terms of disk bandwidth usage
sysdig -c topprocs_file
See the top files in terms of read+write bytes
sysdig -c topfiles_bytes
Print the top files that apache has been reading from or writing to
sysdig -c topfiles_bytes proc.name=httpd
See the top directories in terms of R+W disk activity
sysdig -c fdbytes_by fd.directory "fd.type=file"
See the top files in terms of R+W disk activity in the /tmp directory
sysdig -c fdbytes_by fd.filename "fd.directory=/tmp/"
Observe the I/O activity on all the files named 'passwd'
sysdig -A -c echo_fds "fd.filename=passwd"
Sysdig is a modern and convenient tool. For older Linuxes it is possible to get similar information using SystemTap: http://lukas.zapletalovi.com/2014/05/systemtap-as-a-system-wide-strace-tool.html
PS: Thanks to habrahabr.ru for this post about Sysdig: http://habrahabr.ru/company/selectel/blog/222839/
PPS: Brendan D. Gregg created the picture "A quick tour of many tools..." for his Linux Performance page.
To find out the most heavily used files in the system please use: sudo pt-ioprofile -cell sizes
Example of output:
total pread read pwrite fsync lseek filename
10862592 0 0 10862592 0 0 /var/mysqldata/mysql/ibdata1
827392 0 0 827392 0 0 /var/mysqllog/mysql/ib_logfile0
... (other trivial I/O records truncated)
Got it from https://dba.stackexchange.com/questions/21209/innodb-high-disk-write-i-o-on-ibdata1-file-and-ib-logfile0
Please be aware that by default the Percona tool attaches only to mysqld, so to find the most heavily used files you have to run it against every process that might create such load. In my case I was sure it was the MySQL server, so that was enough for me.
Please read http://www.percona.com/doc/percona-toolkit/2.0/pt-ioprofile.html before you use it.
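If the load might be coming from something other than mysqld, the target process can be chosen explicitly (option names as described in the pt-ioprofile documentation linked above; the PID is just an example):
sudo pt-ioprofile --profile-pid 1234 --cell sizes --run-time 30    # attach to a specific PID for 30 seconds
sudo pt-ioprofile --profile-process mysqld --cell sizes            # or pick the process by name (mysqld is the default)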
Try investigating with:
dstat --top-bio
It will give you the processes that use the most I/O.
In Linux you have /proc/diskstats, but it gives only block-device-level stats.
I have never seen a built-in mechanism to determine which file is busy in Linux.
