Osquery takes too much space - macos

I got some osquery on mac os and there is a file /private/var/log/osquery/osquery-output.log. This file takes almost 16 Gb of disk space. What is it? Can i delete it safely?

By itself, osquery does very little. It can be configured to run a variety of queries to examine system state. Depending on configuration, these results might be stored locally or sent to a log aggregator. The configuration can either be from a local file, or from a remote server.
It sounds like you have an osquery install that is configured to log to local disk, but nothing is collecting those results.
osquery itself does not do anything with that file. So you can certainly truncate it. (Just deleting it will likely leave an unlinked file). But that file implies a misconfigured setup.
Should it be logging to local disk? What consumes those logs? Etc.

Related

get name of memory mapped file

I have a windows host where, according to rammap, almost all memory is in mapped files. I try to find out which file causes such leak. All available guides suggest using tab File Summary to find out connection between file and mapped files. But there is no any file which occupies such amount of mapped files memory.
Is there a way to find out which file is to blame? I guess sysinternals tools like rammap already use windows api functions, so i won't find out more info if i'll try to use functions like GetMappedFileNameA on my own.
I have 24 GB of mapped files on my 96 GB machine. It seems to me that this is simply the Windows file cache ("Smartdrv", if you know that from DOS times).
This is roughly the same amount as displayed in Task Manager as "cached". The tool tip of that reads as
Memory that contains cached data and code that is not actively in use.
So, this is nothing to worry about. In fact it's great, because Windows can read files from memory instead of disk. That makes stuff much faster.

5.6 GB not enough for Cloudera?

I am running Cloudera Hadoop on my laptop and Oracle VirtualBox VM.
I have given 5.6 GB out of mine 8 and six from eight cores as well.
And still I am not able to keep it up and running.
Even without load services would not stay up and running and when I try a query at least Hive will be down within 20 minutes. And sometimes they go down like dominoes: one after another.
More memory seemed to help some: with 3GB and all services, Hue was blinking with red colors when the Hue itself managed to get up. And after rebooting it would takes 30 - 60 minutes before I manage to get the system up enough to even try running anything on it.
There has been two sensible notes (that I have managed to find):
- Warning of swapping.
- Crashing note when the system used 26 GB of virtual memory which was not enough.
My dataset is less than one megabyte, so it is hard to understand why the system would go up to dozens of gigabytes, but for whatever was reason for that has passed: now the system is running more steadily around the 5.6 GB that I have given to it after closing down a few services: see my answer to myself.
And still it is just more stable. Right after I got a warning of swapping and the Hive went down again. What could be reason for more-or-less all Hadoop services going down if the VM starts to swap?
I don't have enough reputation to post the picture to here, but when Hive went down again it was swapping 13 pages / second and utilizing 5.9 GB / 5.6 GB. So basically my system starts crashing more-or-less right after it start to swap. "428 pages were swapped to disk in the previous 15 minute(s)"
I have used default installation options as far as hard drive is concerned.
Only addition is a shared folder between Windows and VM. That works somewhat strangely locking files all the time, so I used it just like FTP and only for passing files from one system to another. Thus I can go days without using it, but systems still crash, so that is not the cause either.
Now that the system is mostly up, services crash still about twice a day: Service Monitor and Hive are quite even with their crashing frequency. After those come Activity Monitor and Event Server, which appear to crash always together. I believe Yarn crashes as well, but it gets up on its own. Last time Hive crashed first, and then it got followed by Service Monitor, Hive (second time), Activity Monitor and Event Server all.
As swap is disk, perhaps the problem is with disk:
# cat /etc/fstab
# swapoff -a
# badblocks -v /dev/VolGroup/lv_swap
Checking blocks 0 to 8388607
Checking for bad blocks (read-only test): done
Pass completed, 0 bad blocks found.
# badblocks -vw /dev/VolGroup/lv_swap
Checking for bad blocks in read-write mode
From block 0 to 8388607
Testing with pattern 0xaa: done
Reading and comparing: done
Testing with pattern 0x55: done
Reading and comparing: done
Testing with pattern 0xff: done
Reading and comparing: done
Testing with pattern 0x00: done
Reading and comparing: done
Pass completed, 0 bad blocks found.
So nothing wrong with swap disk and I have not noticed any disk error anywhere else either.
Note that you could check file system from Windows side also. But I expect that if you make Windows to fix your Linux file system, you have good chances of destroying your Linux with that, so I did my checks somewhat pessimistically, because AFAIK these commands are safe to execute.
About half of the services kept going down, so giving more specifics would be a long story.
I succeeded to get the system more stable by closing down flume, hbase, impala, ks_indexer, oozie, spark and sqoop. And by increasing more memory to some remaining services that complained they had not been given enough memory.
Also I fixed couple of thing on the Windows side, I am not sure which one of these helped:
- MsMpEng.exe kept my hard drive busy. I didn't have permissions to kill it, but I decreased its priority to lowest possible.
- CcmExec.exe got to loop on my DVD and kept reading it for forever. This I solved by taking the DVD out from the drive. Then later on I killed the process tree to keep it from bothering for a while.
I found these using Windows resource manager.
The VM requires 4GB: http://www.cloudera.com/content/cloudera-content/cloudera-docs/DemoVMs/Cloudera-QuickStart-VM/cloudera_quickstart_vm.html You should use that.
I am not clear whether you are using the QuickStart VM though. It's set up to run just the essential services and tuned to conserve memory rather than exploit lots of memory.
It sounds like you are running your own installation, on one virtual machine, on your Windows machine. You may be running an entire cluster's worth of services on one desktop machine. Each of these services has master, worker processes, monitoring processes, etc. You don't need most of them.
You also probably have left memory settings at default suitable for a server-class machine of 16+ GB RAM. Remember these services usually run across many machines, not all on one.
Finally, you're clearly swapping, and that makes things incredibly slow. Remember this is all through a VM too!
Bottom line, use the QuickStart VM if you really want a 1-machine cluster tuned correctly. If you want a real cluster or more services, you need more hardware.
Also consider: cloudera.com/live contains a full CDH 5.1 cluster + sample data, running on demand on AWS. Of course, the advantage of the VM is that you can BYOD, but if you're simply looking for a hands-on Hadoop experience, Live is a great option.

Redis's huge files won't delete?

I'm using Redis-server for windows ( 2.8.4 - MSOpenTech) / windows 8 64bit.
It is working great , but even after I run :
I see this : (and here are my questions)
When Redis-server.exe is up , I see 3 large files :
When Redis-server.exe is down , I see 2 large files :
Question :
— Didn't I just tell it to erase all DB ? so why are those 2/3 huge files are still there ?
How can I completely erase those files? ( without re-generating)
NB
It seems that it is doing deletion of keys without freeing occupied space. if so , How can I free this unused space?
From https://github.com/MSOpenTech/redis/issues/83
"Redis uses the fork() UNIX system API to create a point-in-time snapshot of the data store for storage to disk. This impacts several features on Redis: AOF/RDB backup, master-slave synchronization, and clustering. Windows does not have a fork-like API available, so we have had to simulate this behavior by placing the Redis heap in a memory mapped file that can be shared with a child(quasi-forked) process. By default we set the size of this file to be equal to the size of physical memory. In order to control the size of this file we have added a maxheap flag. See the Redis.Windows.conf file in msvs\setups\documentation (also included with the NuGet and Chocolatey distributions) for details on the usage of this flag. "
I know this is an old thread, but I am facing the same issues with the file sizes.
In case you have problems with your C ssd drive (like me), you can make a directory junction:
1) Stop redis service
2) Move C:\Windows\ServiceProfiles\NetworkService\AppData\Local\Redis folder to another drive / location.
3) Open a command prompt in C:\Windows\ServiceProfiles\NetworkService\AppData\Local then execute:
mklink /J "C:\Windows\ServiceProfiles\NetworkService\AppData\Local\Redis" "[newpath]"
PD: [newpath] must be absolute, like "D:\directory junctions\Redis"
4) Start redis service. Now the files are in another drive.
Check http://ss64.com/nt/mklink.html if doubts regarding this command.
I faced this same issue on my development machine. It was resolved by stopping the redis service and I used WinDirStat (which is what I used to detect the issue originally) to permanently delete these files in appdata/local/redis.
I then started redis back up and things were working fine.
Before following this same procedure others may want to ensure that this data isn't needed. In my case it wasn't critical since this is my development workstation.
When you flush the DB you only flush the keys from memory. I'm not sure why you've got files of different names, it may be an artifact of the way the Windows port of Redis manages files, but Redis itself doesn't delete files when you remove keys. You will need to manage outdated files outside of Redis.

let's say I am writing my code and then my PC died, how necessary is it to do a complete scan if i don't want my later source code to be contaminated?

let's say I am writing a Ruby on Rails program and while editing a file, the machine blue screened. in this case, how necessary is it to re-scan the whole hard drive if I don't want my future files to be damaged?
Let's say if the OS is deleting a tmp file at the moment when my computer crashed, and still have some pointers to some sector on the hard drive. and if my newly created files happen to be in those sector, and next time the OS clean up files again, it may think that the "left-over" sector wasn't cleaned last time and clean it again, and damaging our source code. (esp with Ruby on Rails, where the source code could be generated by rails and not by us, and we may not know why our rails server doesn't work, if a file is affected). we can rely on SVN, but what if the file is affected before we check it in?
i think the official answer will be: "always scan the disk after a crash or power outage, for the data and even the space and indicate attempt to fix any bad sector", but the thing is, nowadays with the hard drive so big, it could take 2 hours to scan everything. And especially at work, we cannot wait for 2 hours if it is the middle of the day.
Does someone know if the modern OS, like XP, Vista, Mac OS, and Linux (when sometimes the power cord was loose and it didn't shut down properly and just shut down on 0% battery), with these modern OS, are our source code safe? Do they know how to structure to write to sector so that at most it will waste sector instead of overlapping sectors?
With a modern journaling file system (ext3/4, NTFS), the only problem would be that a file could be in a "half-written" state. Obviously scanning is not going to help this (that's what backups are for). The file system itself could not be corrupted. If you are using something like FAT, then yes, you should worry about this.
There's really only 1 issue here.
Is any file currently being written in some kind of "half written" state.
The primary cause of this would be if the application/editor is writing the file and the machine dies halfway through. In this case, the file be written is, well, half done. If it was over writing the original file, the original file is "gone", and the new one is "half done". If you don't have a back up file, then, well, you have a problem.
As far as a file having dangling pointers, or references to sectors not written, or somesuch thing. That problem depends on your file system.
The major, modern files ystems are journaled and "won't allow" this to happen. You may have a "half written", but that's because the application only got to write half of it, rather than the file system losing track of a sector pointer.
If you're playing file system games for performance, or whatever (such as using a UFS without logging), then you would want to run a fschk to clean up the file systems meta data.
But if you're using a modern operating system and file system (i.e. anything from the past 5 years), you won't have this problem.
Finally, if you do have version control running, then just do an "svn status", it will show you any "corrupted" files as they will have changed and it will detect that as well.
i see some information on
http://en.wikipedia.org/wiki/Journaling_file_system
Journalized file systems
File systems may provide journaling, which provides safe recovery in the event of a system crash. A journaled file system writes some information twice: first to the journal, which is a log of file system operations, then to its proper place in the ordinary file system. Journaling is handled by the file system driver, and keeps track of each operation taking place that changes the contents of the disk. In the event of a crash, the system can recover to a consistent state by replaying a portion of the journal. Many UNIX file systems provide journaling including ReiserFS, JFS, and Ext3.
In contrast, non-journaled file systems typically need to be examined in their entirety by a utility such as fsck or chkdsk for any inconsistencies after an unclean shutdown. Soft updates is an alternative to journaling that avoids the redundant writes by carefully ordering the update operations. Log-structured file systems and ZFS also differ from traditional journaled file systems in that they avoid inconsistencies by always writing new copies of the data, eschewing in-place updates.

user_dump_dest is there harm in putting this on a NAS

Is it correct to say that typically user_dump_dest is on a local drive?
If so, are there issues with mounting a NAS volume to both Unix and Windows and pointing user_dump_dest at that?
If so, what are they?
Are any issues worth not doing this in prod?
I've run 9.2 instances with user_dump_dest on a NAS and never had a problem with it.
If you are concerned though, have oracle write them locally, then sync them across to your NAS and remove them from local, I've never needed to do that though.
Yes, typically (and by default), user_dump_dest is on a local drive. I wouldn't expect there are any specific issues with putting it on NAS; but it would have all the potential issues that any application might: (1) If the NAS could not be reached, Oracle would not be able to write out user dump files, and (2) the write latency would be increased. The latency issue is probably only important if you're doing large-scale tracing to diagnose performance issues, as the write latency would impact the timing measurements. I would expect that if the NAS because unreachable and Oracle tried to write to it, it would silently fail.
I don't think I would do it unless space constraints on the local disk forced me to. But generally the contents of this directory should be insignificant relative to your data.

Resources