Analyze systemd journal of a crashed / dead system

I recently upgraded a system. After the reboot I was not able to log in again; all users were rejected with "Login incorrect". systemd with journaling was running and writing error messages to files in /var/log/journal as usual.
So I booted a system from a recovery USB stick (same distribution), mounted the root device of the failed system at /mnt, and tried to analyze the logs with journalctl --root=/mnt/var/log/journal -xe. journalctl did not find any journal files.
Question: how can I read the systemd journal content of a dead system using a recovery system?
Have fun

I may be a bit late, but I stumbled upon this question and here is what I found:
journalctl logs are located in /var/log/journal/*
journalctl can read foreign journal files with the following switches:
--file= followed by the *.journal file of your choice; this option may be used multiple times
--root= followed by the root directory of your choice, probably a mounted partition
--image= followed by a disk image
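For the setup in the question (the dead system's root mounted at /mnt), a minimal sketch using these switches could look like the following; the key point is that --root expects the mounted root itself, not the journal directory:
journalctl --root=/mnt -xe
journalctl --file=/mnt/var/log/journal/*/system.journal -xe
journalctl -D /mnt/var/log/journal -xe
The last line uses -D/--directory, which is not in the list above but also lets journalctl open a journal directory directly.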

Related

macOS Catalina volume could not be mounted read-write in single user or recovery mode (Permission denied)

I need to have writable access to the file system in recovery mode, but I always get the error
mount_apfs: volume could not be mounted: Permission denied.
I am aware of others who solved it like this: "Read-only file system" with SIP disabled in macOS Catalina
i.e.:
start in recovery mode (Cmd-R at startup)
open terminal and disable SIP with csrutil disable
reboot into single user mode (Cmd-S at startup)
check SIP is disabled with csrutil status
try to mount the volumes with read/write:
sudo mount -uw /
sudo mount -uw /System/Volumes/Data
Neither of the two volumes can be mounted, I always get permission denied even with SIP disabled.
Interestingly, even if I boot the Mac from an external SSD (in this case an old Sierra installation) I cannot mount the disk as read/write.
Any further suggestions?
The problem in this case was a defective SSD, which switched into read-only mode after only 36 TB written, despite a rated endurance of 1200 TBW.
Unfortunately, macOS did not report this. When looking at System Information > Storage > my SSD > SMART status, the system still showed "Verified", which is supposed to mean that everything is OK.
It was not.
I determined this by installing smartmontools and running a check:
brew install smartmontools
smartctl -a disk1
It showed, e.g.:
SMART overall-health self-assessment test result: FAILED
- Available spare has fallen below threshold
- media has been placed in read-only mode
- Data Units Written: ... 35.4 TB
- Available Spare: 1%
- Available Spare Threshold: 10%
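For a quicker check than the full -a output, smartctl can also print just the overall health verdict (a sketch using the same disk name as above; prefix with sudo if permission is denied):
smartctl -H disk1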
So, two issues:
bad luck with the SSD
bad error reporting by macOS

Wbadmin backup failed due to a file system limitation

I'm trying to set up and learn the Wbadmin command-line prompts for making my own backups. I've created a test on Server 2008 R2 in VMware, and I've created a separate B: drive for backups. I'm trying to target specific files, and I've created 6 testFile#.txt files on the C: drive under the !Test folder.
The command that I've used is:
wbadmin start backup -backupTarget:\\localhost\NetworkShare -include:C:\!Test\testFile*
The process starts, but ends up crashing. Screenshot attached below. The logs for both the backup and the error are blank. The main error message is:
There was a failure in updating the backup for deleted items.
The requested operation could not be completed due to a file system limitation
What am I doing wrong? B: was formatted to NTFS, and I've followed the instructions exactly.
So after some research, I found the cause of the error message. The problem came from within the virtual machine itself. The VM or the operating system was not configured properly, so Wbadmin would not accept the destination \\localhost\NetworkShare.
When I tried backing up to a real network drive, everything worked as planned. The * wildcard, meant to grab only the 6 testFiles numbered 1-6, worked correctly. However, in real practice, listing each individual file name separated by commas will probably be more useful for others (see the sketch after the log below). Here is the command that worked:
wbadmin start backup -backuptarget:\\(IP address of network)\Public -include:C:\!Test\testFile*
Here was the log report:
Backed up C:\
Backed up C:\!Test\
Backed up C:\!Test\testFile1.txt
Backed up C:\!Test\testFile2.txt
Backed up C:\!Test\testFile3.txt
Backed up C:\!Test\testFile4.txt
Backed up C:\!Test\testFile5.txt
Backed up C:\!Test\testFile6.txt
I hope this helps someone else
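For reference, the comma-separated variant mentioned above would look something like this (a sketch reusing the same test paths and the same elided share, not a command from the original post):
wbadmin start backup -backupTarget:\\(IP address of network)\Public -include:C:\!Test\testFile1.txt,C:\!Test\testFile2.txt,C:\!Test\testFile3.txt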

Invalid directory item count (It should be 32673 instead of 32674) while partitioning MacBook using Disk Utility

I am trying to set up dual boot on my MacBook Pro. When I try to create a partition from Disk Utility, it gives the error:
Checking multi-linked files.
Checking catalog hierarchy.
Invalid directory item count (It should be 32673 instead of 32674)
I found the solution in this post:
Boot the Mac into Single User Mode by holding down Command+S during system boot, then run:
fsck -fy
Once fsck completes, if you see a "File system was modified" message, you need to run it again:
fsck -fy
After that, run:
reboot

Custom Linux kernel build failure in VMware Workstation

While trying to compile/build and boot a custom kernel inside VMware Workstation, booting the new kernel fails and drops to a shell with the error "failed to find disk by uuid".
I tried this with both Ubuntu and CentOS.
Things I tried that didn't help:
checked the UUID mapping in the boot entry and that the UUID exists in the by-uuid directory
update-initramfs
replaced root=UUID=<> with /dev/disk/sda3
Is this an issue with VMware Workstation?
How can it be rectified?
I had a similar fault with my own attempts to bootstrap Fedora 22 onto a blank partition using a CentOS install on another partition. I never did solve it completely, but I did find the problem was in my initrd rather than the kernel.
The problem is that the initrd isn't starting LVM because dracut didn't tell the initrd that it needs LVM. Therefore, if you start LVM manually, you should be able to boot into your system and fix it.
I believe this is the sequence of commands I ran from the emergency shell to start LVM:
vgscan
vgchange -ay
lvs
Followed by exit to resume normal boot.
You might have to mount your LVM /etc/fstab entries manually; I don't recall whether I did or not.
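If you do have to mount them by hand, a rough sketch looks like this; the volume group and logical volume names below are placeholders, not from the original setup:
lvs
mount /dev/mapper/vg0-root /mnt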
Try this:
sudo update-grub
Then:
mkinitcpio -p linux
It won't hurt to check your fstab file. There, you should find the UUID of your drive. Make sure you have the proper flags set in the fstab.
Also, there's a setting in the grub.cfg that makes GRUB use the old style of hexadecimal UUIDs. Check that out as well!
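To illustrate the fstab check, a short sketch (the device and UUID below are placeholders; compare whatever blkid reports against the entry in /etc/fstab):
blkid /dev/sda3
UUID=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx  /  ext4  defaults  0  1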
The issue is with the creation of the initramfs. After doing a
make oldconfig
and choosing the defaults for the new options, make sure that ENOUGH disk space is available for the image to be created.
In my case the image that was created was not correct, and hence it failed to mount at boot time.
When compared, the image size was quite a bit smaller than the existing image of the lower version, so I added another disk with more than sufficient space, and then
make bzImage
make modules
make modules_install
make install
everything started working like a charm.
I wonder why the image creation completed earlier and produced a corrupt (smaller) image without throwing any error, every single time.
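A quick way to rule out the disk space problem before rebuilding is to check the free space where the image is written and compare the new image's size against the previous version's (file names differ between distributions, e.g. initramfs-* on CentOS and initrd.img-* on Ubuntu):
df -h /boot /
ls -lh /boot/initramfs-* /boot/initrd.img-*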

How to manually start/stop Hadoop services on boot up/shutdown?

Hi, is anyone aware of how to stop and start CDH (Cloudera Distribution of Hadoop) services with a script? We are doing this for production servers. For instance, if the servers are restarted, then before the reboot all the Hadoop services should stop gracefully, and on startup they should start again.
I have an 8-node Hadoop cluster on RHEL with Cloudera 5.4.7 installed on it.
So far I have identified a few ways to do that. One of them, from this link, says I have to use chkconfig to register the service with the OS, e.g. as below:
sudo chkconfig hadoop-hdfs-namenode on
But when I do that I get the error
error reading information on service hadoop-hdfs-namenode: No such file or directory
which clearly states that it is unable to find the file I have specified.
Then I searched for the file, and it is located in
/opt/cloudera/parcels/CDH-5.4.7-1.cdh5.4.7.p0.3/etc/rc.d/init.d/hadoop-hdfs-namenode
/opt/cloudera/parcels/CDH-5.4.7-1.cdh5.4.7.p0.3/etc/default/hadoop-hdfs-namenode
Then I tried executing the same commands from the folder where the files are located, but I got the same error. The permissions on the files are fine, and I tried ./ as well, but still the same error.
I am also able to list all the processes which are currently running with
sudo jps
14035 -- process information unavailable
10615 -- process information unavailable
15323 -- process information unavailable
5486 -- process information unavailable
2001 -- process information unavailable
46991 -- process information unavailable
42667 -- process information unavailable
33732 Jps
2698 -- process information unavailable
2727 -- process information unavailable
7901 -- process information unavailable
42624 -- process information unavailable
As one can see, the process names are not shown, but these are Hadoop processes. To stop them I could kill all of them, but that is not the way to gracefully stop Hadoop managed by Cloudera. Please let me know if anyone is aware of anything which can help me move forward.
Thankfully, Cloudera provides a way to start services on system startup. Below is the way to do that:
Click on the service
Go to the configuration
Search for Automatically Restart Process
Check the Check-Box.
It will restart the services on bootup.
You can do this by executing a curl command from a shell script. For example, to start the Solr service you can use:
curl -u admin:admin -X POST http://ipaddress:7180/api/v4/clusters//services/solr1/commands/start -H 'Content-Type: application/json; charset=utf-8'
For more details, visit:
http://cloudera.github.io/cm_api/apidocs/v10/index.html
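For the shutdown half of the question, the Cloudera Manager API should also expose a corresponding stop command (check the API docs linked above); a sketch mirroring the start call, with the cluster name elided exactly as in the original:
curl -u admin:admin -X POST 'http://ipaddress:7180/api/v4/clusters//services/solr1/commands/stop'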
