JFFS2 garbage collector runs even if partition is mounted RO - linux-kernel

I use a JFFS2 partition as the root filesystem, mounting it read-only via the kernel command line (and fstab). The system is MontaVista Linux 5.0 (kernel 2.6.18).
Everything works, except that by the time Linux gets to my application, jffs2_gcd_mtd3 keeps the system busy for about 15 s at 98% CPU. This is unacceptable in my case.
I searched the Linux source and saw that the GC thread should be started ONLY when the filesystem is mounted RW, yet in my case it starts nonetheless!
I tried mounting it RW and unmounting it afterwards, but...
Thanks in advance.
UPDATE: The statement about the GC daemon is wrong - I observed it in error. The real cause of the issue is that JFFS2 is VERY, VERY slow compared to the YAFFS2 I used previously. Just to compare: my 14 MiB ELF application loaded from YAFFS2 in 2-2.5 s, while from JFFS2 it takes about 8 s!
This made me think that something is blocking Linux...
Now the question has changed to: what can make JFFS2 SO DREADFULLY slow?! Again, the partition is mounted RO!

OK, the answer is as follows:
It takes A LOT of time for JFFS2 to mount a 120 MiB partition - roughly 10 seconds on an ARMv5 running at 300 MHz. Nothing helps here - neither sumtool nor unmounting after an R/W mount (to let the summary be written).
I solved the issue by:
- not including the unnecessary/unused space in the Linux partitions;
- dividing the remaining 70 MiB in two: one 55 MiB partition with all the Linux files, and one 15 MiB partition with my application and its files.
This solved the problem. The mount time is now about 2-3 s.
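For completeness, this is roughly how the summary approach is applied (it did not help in my case, but others may want to try it). A hedged sketch only - the device node, mount point, and 128 KiB erase-block size are placeholders, and the kernel needs CONFIG_JFFS2_SUMMARY for the summary to be used:
# on the build host: generate a summarised, padded image from the plain JFFS2 image
sumtool -e 128KiB -p -i rootfs.jffs2 -o rootfs.sum.jffs2
# flash rootfs.sum.jffs2 to the MTD partition, then on the target:
mount -t jffs2 -o ro /dev/mtdblock3 /mnt/root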

Related

What can be done to lower UE4Editor startup time?

Status: the problem has lessened, but compared to other users' reports it persists.
I have moved to UE4.27.0 and the startup time dropped from 11 minutes (v4.26.2) to 6 minutes! (RAM usage dropped too!) But that still doesn't compare to the "almost instant" startup other people report...
It is not compiling anything, not even shaders; this is about the 6th time I have run it for the same project.
Should I try disabling plugins? I'm new to UE and don't want to make it harder to use. Though, for example, I have nothing VR-related to test, so those plugins could reasonably be disabled from the start.
HD READ SPEED? NO
I tested moving the whole UE4Editor engine directory (100 GB) to a 3x SSD stripe, but the UE4Editor startup time remained the same. The HDD where it normally lives is also fast, though not as fast as the 3x SSD stripe.
CPU USAGE? MAYBE - would using all 4 cores solve it?
UE4Editor startup uses A SINGLE CORE ONLY. I can confirm with htop and the system monitor: only one core is at 100% at any given moment, and the load hops between the 4 cores, so only one is fully used at a time.
I tested the command-line parameter -USEALLAVAILABLECORES after the project path for UE4Editor, but nothing changed. I read that this option is ignored on some machines, so maybe if I patch how it is handled it could work on mine?
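For reference, this is roughly how the flag is passed on the command line; the binary location assumes a Linux source build of the engine and the project path is a placeholder:
# launch the editor for a project and ask it to use all cores
./Engine/Binaries/Linux/UE4Editor "$HOME/Projects/MyProject/MyProject.uproject" -USEALLAVAILABLECORES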
GPU? PROBABLY NOT
A report from a user with a (weak) integrated graphics card says it doesn't affect the startup time.
LOG for UE4Editor v4.27.0 with the largest gaps ("..." means log lines omitted for readability; "!<interval in seconds>" is just an annotation to make the gaps easier to spot - no lines are omitted there):
[2021.09.15-23.38.20:677][ 0]LogHAL: Linux SourceCodeAccessSettings: NullSourceCodeAccessor
!22s
[2021.09.15-23.38.42:780][ 0]LogTcpMessaging: Initializing TcpMessaging bridge
[2021.09.15-23.38.42:782][ 0]LogUdpMessaging: Initializing bridge on interface 0.0.0.0:0 to multicast group 230.0.0.1:6666.
!16s
[2021.09.15-23.38.58:158][ 0]LogPython: Using Python 3.7.7
...
[2021.09.15-23.39.01:817][ 0]LogImageWrapper: Warning: PNG Warning: Duplicate iCCP chunk
!75s
[2021.09.15-23.40.16:951][ 0]SourceControl: Source control is disabled
...
[2021.09.15-23.40.26:867][ 0]LogAndroidPermission: UAndroidPermissionCallbackProxy::GetInstance
!16s
[2021.09.15-23.40.42:325][ 0]LogAudioCaptureCore: Display: No Audio Capture implementations found. Audio input will be silent.
...
[2021.09.15-23.41.08:207][ 0]LogInit: Transaction tracking system initialized
!9s
[2021.09.15-23.41.17:513][ 0]BlueprintLog: New page: Editor Load
!23s
[2021.09.15-23.41.40:396][ 0]LocalizationService: Localization service is disabled
...
[2021.09.15-23.41.45:457][ 0]MemoryProfiler: OnSessionChanged
!13s
[2021.09.15-23.41.58:497][ 0]LogCook: Display: CookSettings for Memory: MemoryMaxUsedVirtual 0MiB, MemoryMaxUsedPhysical 16384MiB, MemoryMinFreeVirtual 0MiB, MemoryMinFreePhysical 1024MiB
SPECS:
I'm using Ubuntu 20.04.
My CPU has 4 cores at 3.6 GHz.
GeForce GT 710 with 1 GB of VRAM.
Related question but for older UE4: https://answers.unrealengine.com/questions/987852/view.html
Unreal Engine needs a high-end PC with a lot of RAM, fast SSDs, a good CPU, and at least a mid-range graphics card. First of all, there are always some shaders the engine needs to compile and a lot of assets to load at startup. Since you're on Linux, you are probably using a self-compiled Unreal Engine version... not the best thing to do as a newcomer, because it can cause several problems with load time, startup, compiling, and plenty of other things. If this is one of your first times using Unreal, try it on Windows - everything is easier there.

Why doesn't the Linux kernel see the cache sizes in the gem5 emulator in full system mode?

I want to play around with cache sizes in my gem5 simulator to see how they affect the performance of programs, and possibly tune programs at runtime.
As a sanity check, I tried to verify that the command-line arguments I used were taking effect, so I tried the various methods proposed at: https://superuser.com/questions/55776/finding-l2-cache-size-in-linux/1298808#1298808
cat /sys/devices/system/cpu/cpu0/cache/index2/size
getconf LEVEL2_CACHE_SIZE
But I observed that:
the file /sys/devices/system/cpu/cpu0/cache/index2/size does not exist
getconf returns an empty value
Why is that?
I am certain, however, that the caches are being used, since I've benchmarked simple programs and the cycle counts increase when I decrease the cache sizes.
For example, my base command is:
M5_PATH='/data/git/linux-kernel-module-cheat/gem5/gem5-system' '/data/git/linux-kernel-module-cheat/gem5/gem5/build/ARM/gem5.opt' '/data/git/linux-kernel-module-cheat/gem5/gem5/configs/example/fs.py' --command-line='earlyprintk=pl011,0x1c090000 console=ttyAMA0 lpj=19988480 rw loglevel=8 mem=512MB root=/dev/sda nokaslr norandmaps printk.devkmsg=on printk.time=y' --disk-image='/data/git/linux-kernel-module-cheat/buildroot/output.arm-gem5~/images/rootfs.ext2' --dtb-file='/data/git/linux-kernel-module-cheat/gem5/gem5/system/arm/dt/armv7_gem5_v1_1cpu.dtb' --kernel='/data/git/linux-kernel-module-cheat/buildroot/output.arm-gem5~/build/linux-custom/vmlinux' --machine-type=VExpress_GEM5_V1 --num-cpus=1 --caches --l1d_size=1024 --l1i_size=1024 --l2cache --l2_size=1024 --l3_size=1024 --cpu-type=HPI
With those tiny caches, running the following:
m5 resetstats && dhrystone 10000 && m5 dumpstats
takes 175M cycles, and only 16M cycles if I use the exact same command but with huge caches of size 1024MB.
I observe a similar behavior for x86.
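As an extra sanity check, the statistics that m5 dumpstats writes to m5out/stats.txt on the host can be grepped for cache activity. The stat names below are indicative only and can vary between gem5 versions:
grep -E 'system\.cpu\.(dcache|icache)\.overall_misses' m5out/stats.txt
grep -E 'sim_ticks|system\.cpu\.numCycles' m5out/stats.txt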
I'm using this testing infrastructure: https://github.com/cirosantilli/linux-kernel-module-cheat/tree/05d8a324f74849f03404eb847f8da748e2e4502c#gem5-change-system-parameters which implies:
gem5 commit: fbe63074e3a8128bdbe1a5e8f6509c565a3abbd4
Linux kernel v4.15 with configuration: https://github.com/cirosantilli/linux-kernel-module-cheat/blob/05d8a324f74849f03404eb847f8da748e2e4502c/kernel_config_arm-gem5
Related thread on the mailing list: http://gem5-users.gem5.narkive.com/4xVBlf3c/verify-cache-configuration
For comparison, QEMU v2.11.0 on x86 did show the cache sizes, but the ARM build did not.
Maybe for ARM we would need to modify the bootloaders to pass that information to the kernel? But I don't understand those parts very well:
https://github.com/gem5/gem5/blob/fbe63074e3a8128bdbe1a5e8f6509c565a3abbd4/system/arm/simple_bootloader/simple.S
https://github.com/gem5/gem5/blob/fbe63074e3a8128bdbe1a5e8f6509c565a3abbd4/system/arm/aarch64_bootloader/boot.S
I have been told that:
gem5 doesn't implement the cache size discovery registers.
The problem is that it is really hard to configure them correctly in the general case, and they might not even be able to represent the cache hierarchy configured in gem5.

Docker volume with Grunt file watch

I'm porting an existing project with Grunt file watches to a Docker development container. The source files are bind-mounted into the container, and Grunt watches the files for changes (this can probably be optimized, but my current concern is simply to get the existing setup working within Docker).
On the Mac, I'm experiencing enormous CPU usage, so I read the performance tuning guide for osxfs. The guide mentions the cached and delegated volume modes.
The description for delegated says:
the container’s view is authoritative
(permit delays before updates on the container appear in the host)
For cached:
[…] provides all the guarantees of the delegated configuration, and some additional guarantees around the visibility of writes performed by containers. As such, cached typically improves the performance of read-heavy workloads, at the cost of some temporary inconsistency between the host and the container.
Compared to which setting does cached improve performance? And are "read-heavy workloads" meant from the container's perspective?
To cut a long story short: what's the optimal setting to reduce CPU usage for a development environment that uses file watches - cached or delegated?
OK, so I did some testing and here are my results. Setup:
MacBook Air 11", early 2014
macOS 10.12.6
Docker 17.06.0-ce-mac19 (18663)
watch task polling for ~ 1,000 files
The culprit processes eating up CPU cycles in the host are hyperkit and com.docker.osxfs. The following percentage values are the median CPU usage taken over five samples:
delegated: 18.7 % hyperkit + 0.0 % com.docker.osxfs = 18.7 %
cached: 24.3 % hyperkit + 0.1 % com.docker.osxfs = 24.4 %
default (aka consistent): 152.0 % hyperkit + 68.9 % com.docker.osxfs = 220.9 % (!)
Functionality-wise I didn't notice any difference. When changing a file outside the container the changes were picked up virtually immediately by the watch in each of the three cases. So I'm going to use the delegated mode now.
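In case it helps anyone else, the mode is just a suffix on the bind mount. A minimal sketch, assuming a Node base image and that grunt is installed as a local dev dependency of the project (image name and paths are placeholders):
docker run --rm -v "$(pwd):/app:delegated" -w /app node:8 ./node_modules/.bin/grunt watch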

ext4 commit= mount option and dirty_writeback_centisecs

I'm trying to understand the way bytes travel from write() to the physical disk platter, in order to tune my picture server's performance.
What I don't understand is the difference between these two: the commit= mount option and dirty_writeback_centisecs. They seem to concern the same process of writing changes to the storage device, yet they are still different.
It is not clear to me which one fires first on my bytes' way to the disk.
Yeah, I just ran into this while investigating mount options for an SD-card Ubuntu install on an ARM Chromebook. Here's what I can tell you...
Here's how to see the dirty and writeback amounts:
user#chrubuntu:~$ cat /proc/meminfo | grep "Dirty" -A1
Dirty: 14232 kB
Writeback: 4608 kB
(edit: these Dirty and Writeback values are rather high - I had a compile running when I sampled them.)
So data waiting to be written out is "dirty". Dirty data can still be eliminated (if, say, a temporary file is created, used, and deleted before it goes to writeback, it never has to be written out). As dirty data is moved into writeback, the kernel tries to combine smaller dirty requests into single larger I/O requests; this is one reason why dirty_expire_centisecs is usually not set too low. Dirty data is usually put into writeback when a) enough data is cached to reach vm.dirty_background_ratio, or b) the data becomes vm.dirty_expire_centisecs centiseconds old (the default of 3000 is 30 seconds). As for vm.dirty_writeback_centisecs: a writeback daemon runs by default every 500 centiseconds (5 seconds) to actually flush out anything in writeback.
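To inspect the knobs mentioned here, they all live under /proc/sys/vm; a quick way to dump the current values on your own machine is:
grep . /proc/sys/vm/dirty_background_ratio /proc/sys/vm/dirty_ratio \
       /proc/sys/vm/dirty_expire_centisecs /proc/sys/vm/dirty_writeback_centisecs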
fsync will flush out an individual file (force it from dirty into writeback and wait until it has been flushed out of writeback), and sync does the same for everything. As far as I know, it does this ASAP, bypassing any attempt to balance disk reads and writes; it stalls the device with 100% writes until the sync completes.
The default ext4 mount option commit=5 forces a journal commit (effectively a sync of that filesystem) every 5 seconds. This is intended to ensure that writes are not unduly delayed if there's heavy read activity (and ideally you lose at most 5 seconds of data if power is cut or whatever). What I found with an Ubuntu install on an SD card (in a Chromebook) is that this just leads to massive filesystem stalls roughly every 5 seconds if you're writing much to the card. ChromeOS uses commit=600, and I applied that on the Ubuntu side to good effect; an example is sketched below.
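A hedged sketch of applying the longer commit interval - the device node and the other options are placeholders, so adjust them to your own fstab entry:
# /etc/fstab - root filesystem on the SD card with a 10-minute journal commit interval
/dev/mmcblk1p2  /  ext4  defaults,noatime,commit=600  0  1
# or apply it to a live mount without rebooting:
mount -o remount,commit=600 /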
dirty_writeback_centisecs configures the Linux kernel daemons related to virtual memory (hence the vm. prefix), which are in charge of writing data back from RAM to all the storage devices. So if you configure dirty_writeback_centisecs and you have 25 different storage devices mounted on your system, the same writeback interval applies to all 25 of them.
commit, on the other hand, is set per storage device (actually per filesystem) and is tied to the sync/journal mechanism rather than to the virtual-memory daemons.
So you can see it as:
- dirty_writeback_centisecs: writing from RAM to all filesystems
- commit: each filesystem fetching its own data from RAM
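If you end up tuning the writeback interval as well, the usual route is sysctl; the value 1500 below is purely illustrative:
# one-off change
sysctl -w vm.dirty_writeback_centisecs=1500
# persistent across reboots (the file name is arbitrary)
echo 'vm.dirty_writeback_centisecs = 1500' >> /etc/sysctl.d/99-writeback.conf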

What does "mds" use to iterate the mounted file systems?

I have been closely following the OpenZFS development scene for OS X for the last few years. Things have progressed significantly over the last several months since the sad problems that occurred with Greenbytes, etc., and I have been pleased to see that we're finally close to getting real Spotlight support in place. I noticed this email passing by the other day from Jorgen Lundman (who has put a great deal of personal time into getting this going and contributing to the community) and thought perhaps others here might be interested in chiming in on his topic regarding implementing Spotlight support for ZFS on OS X:
To summarize, I think the crux of this question boils down to this:
So then, what does "mds" use to iterate the mounted file systems? I do not
think the sources for "Spotlight-800.28" were ever released, so we can't just
go look and learn, like we did for xnu and IOKit.
It doesn't use the BSD getfsstat(); more likely it asks IOKit, and for some
reason rejects the lower mounts.
And the body of the email for convenience:
Hey guys,
So one of our long-term issues in OpenZFSonOSX is to play nice with Spotlight.
We have reached the point where everything sometimes pretends to work.
For example;
# mdfind helloworld4
/Volumes/hfs1/helloworld4.jpg
/Volumes/hfs2/helloworld4.jpg
/Volumes/zfs1/helloworld4.jpg
/Volumes/zfs2/helloworld4.jpg
Great, picks it up in our regular (control group) HFS mounted filesystems,
as well as the 2 ZFS mounts.
Mounted as:
/dev/disk2 on /Volumes/zfs1 (zfs, local, journaled)
/dev/disk2s1 on /Volumes/zfs2 (zfs, local, journaled)
# diskutil list
/dev/disk1
#: TYPE NAME SIZE IDENTIFIER
0: GUID_partition_scheme *42.9 GB disk1
1: ZFS 42.9 GB disk1s1
2: 6A945A3B-1DD2-11B2-99A6-080020736631 8.4 MB disk1s9
/dev/disk2
#: TYPE NAME SIZE IDENTIFIER
0: zfs_pool_proxy FEST *64.5 MB disk2
1: zfs_filesystem_proxy ssss 64.5 MB disk2s1
So you can see, the actual pool disk is /dev/disk1, and the fake nodes we
create for mounting are /dev/disk2*, as this appears to be required for
Spotlight to work at all. Internally we also let the volumes auto-mount by
issuing "diskutil mount -mountPoint %s %s".
We are not a VOLFS, so there is no ".vol/" directory, nor will mdutil -t
work. But these two points are true for MS-DOS as well, and that does work
with Spotlight.
We correctly reply to zfs.fsbundle's zfs.util for "-p" (volume name) and
"-k" (get uuid), done pre-flight to mounting by DA.
Using the FSMegaInfo tool, we can confirm that stat, statfs, readdir, and
similar tests appear to match those of HFS.
So then, the problem.
The problem comes from mounting ZFS inside ZFS, i.e. when we mount:
/Volumes/hfs1/
/Volumes/hfs1/hfs2/
/Volumes/zfs1/
/Volumes/zfs1/zfs2/
# mdfind helloworld4
/Volumes/hfs1/helloworld4.jpg
/Volumes/hfs1/hfs2/helloworld4.jpg
/Volumes/zfs1/helloworld4.jpg
Absent, of course, is "/Volumes/zfs1/zfs2/helloworld4.jpg".
Interestingly, this works:
# mdfind -onlyin /Volumes/zfs1/zfs2/ helloworld4
/Volumes/zfs1/zfs2/helloworld4.jpg
And additionally, mounting in reverse:
/Volumes/hfs2/
/Volumes/hfs2/hfs1/
/Volumes/zfs2/
/Volumes/zfs2/zfs1/
# mdfind helloworld4
/Volumes/hfs2/helloworld4.jpg
/Volumes/hfs2/hfs1/helloworld4.jpg
/Volumes/zfs2/helloworld4.jpg
So whichever ZFS filesystem is mounted first works, but the second one does not.
The individual ZFS filesystems are otherwise equal. It is as if mds doesn't
realise the lower mount is its own device.
So then, what does "mds" use to iterate the mounted filesystems? I do not
think the sources for "Spotlight-800.28" were ever released, so we can't just
go look and learn, like we did for xnu and IOKit.
It doesn't use the BSD getfsstat(); more likely it asks IOKit, and for some
reason rejects the lower mounts.
Some observations:
# /System/Library/Filesystems/zfs.fs/zfs.util -k disk2
87F06909-B1F6-742F-7355-F0D597849138
# /System/Library/Filesystems/zfs.fs/zfs.util -k disk2s1
8F60C810-2D29-FCD5-2516-2D02EED4566B
# grep uu /Volumes/zfs1/.Spotlight-V100/VolumeConfiguration.plist
<key>uuid.87f06909-b1f6-742f-7355-f0d597849138</key>
# grep uu /Volumes/zfs1/zfs2/.Spotlight-V100/VolumeConfiguration.plist
<key>uuid.8f60c810-2d29-fcd5-2516-2d02eed4566b</key>
Any assistance is appreciated. The main issue tracking Spotlight is:
https://github.com/openzfsonosx/zfs/issues/116
The branch for it:
https://github.com/openzfsonosx/zfs/tree/issue116
vfs_getattr:
https://github.com/openzfsonosx/zfs/blob/issue116/module/zfs/zfs_vfsops.c#L2307
It appears to come down to some undocumented expectations in the vfs_vget method, which is asked to look up entries based entirely on the inode number, e.g. stat /.vol/16777222/1102011.
It is expected that vfs_vget sets the vnode name correctly here, using a call like vnode_update_identity() or similar.
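A quick way to poke at both behaviours from the shell is to repeat the volfs-style lookup above and ask Spotlight what it thinks of each mount point; mdutil -s reports the per-volume indexing status (the paths and inode numbers are the illustrative ones from the email):
# volfs-style lookup by filesystem id and inode number
stat /.vol/16777222/1102011
# per-volume Spotlight indexing status
mdutil -s /Volumes/zfs1
mdutil -s /Volumes/zfs1/zfs2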
