Rounding integers to nearest 5 - bash

With the example below, small values such as 1.5% and 2.5% round down to 0%. How do I get them to come out as 1% instead of 0%?
s='1.5% 2.5% 3.5% 25% 27% 34% 68%'
awk '{for (i=1; i<=NF; i++) $i = int( ($i+2) / 5) * 5 "%"} 1' <<< "$s"
0% 0% 5% 25% 25% 35% 70%
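One possible reading of the request is "round to the nearest 5 as before, but never report less than 1%". A minimal Python sketch of that arithmetic follows. The function name `round_to_5` and the `floor` parameter are hypothetical, and the rule that any result of 0 becomes 1 is an assumption about what the asker wants (it also turns a genuine 0% into 1%).

```python
def round_to_5(values, floor=1):
    """Round each percentage to the nearest 5, replacing a result of 0 with `floor`.

    Mirrors the awk expression int(($i + 2) / 5) * 5 from the question.
    `floor` is a hypothetical parameter, not part of the original command.
    """
    rounded = []
    for token in values.split():
        number = float(token.rstrip("%"))      # "2.5%" -> 2.5
        result = int((number + 2) / 5) * 5     # same arithmetic as the awk one-liner
        if result == 0:
            result = floor                     # assumed rule: never report 0%
        rounded.append(f"{result}%")
    return " ".join(rounded)


print(round_to_5("1.5% 2.5% 3.5% 25% 27% 34% 68%"))
```

The same idea carries over to awk by assigning the rounded value to a variable and replacing it with 1 when it is 0 before appending the "%".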

Related

Filter text disappearing after concat, why?

I'm trying to concatenate 2 videos, while at the same time adding some text to the second one. I'm using php-ffmpeg to do this, with the Sharapov extension, but I have the actual ffmpeg commands.
To do this it seems to encode the first video, then the second one with the text separately, and then joins them together. Problem is the text appears in the encoded (but un-joined) second video, but as soon as they are joined together it disappears.
Here are the commands:
ffmpeg '-y' '-i' 'video_in/first_vid.mp4' '-threads' '8' '-vcodec' 'libx264' '-acodec' 'aac' '-max_muxing_queue_size' '400' '-b:v' '1000k' '-refs' '6' '-coder' '1' '-sc_threshold' '40' '-flags' '+loop' '-me_range' '16' '-subq' '7' '-i_qfactor' '0.71' '-qcomp' '0.6' '-qdiff' '4' '-trellis' '1' '-b:a' '256k' '-ac' '2' '-pass' '1' '-passlogfile' '/tmp/ffmpeg-passes5b73814fa1e29qjif7/pass-5b73814fa1f5b' '/data/tmp/ffmpeg-7a580007c010377da19305b0836e7a4d-5b73814f9ff35.mp4'
2% 2% 2% 4% 4% 6% 6% 6% 8% 8% 10% 10% 10% 12% 12% 12% 14% 14% 14% 16% 16% 18% 18% 18% 20% 20% 22% 22% 22% 24% 24% 24% 26% 26% 26% 28% 28% 28% 30% 30% 30% 32% 32% 34% 34% 34% 36% 36% 36% 38% 38% 38% 40% 40% 40% 42% 44% 44% 44% 46% 46% 46% 48% 48% 48%
ffmpeg '-y' '-i' 'video_in/first_vid.mp4' '-threads' '8' '-vcodec' 'libx264' '-acodec' 'aac' '-max_muxing_queue_size' '400' '-b:v' '1000k' '-refs' '6' '-coder' '1' '-sc_threshold' '40' '-flags' '+loop' '-me_range' '16' '-subq' '7' '-i_qfactor' '0.71' '-qcomp' '0.6' '-qdiff' '4' '-trellis' '1' '-b:a' '256k' '-ac' '2' '-pass' '2' '-passlogfile' '/tmp/ffmpeg-passes5b73814fa1e29qjif7/pass-5b73814fa1f5b' '/data/tmp/ffmpeg-7a580007c010377da19305b0836e7a4d-5b73814f9ff35.mp4'
50% 50% 50% 52% 52% 52% 52% 52% 52% 54% 54% 54% 54% 54% 54% 56% 56% 56% 56% 56% 56% 58% 58% 58% 58% 58% 58% 60% 60% 60% 60% 60% 60% 62% 62% 62% 62% 62% 62% 64% 64% 64% 64% 64% 66% 66% 66% 66% 66% 66% 66% 68% 68% 68% 68% 68% 70% 70% 70% 70% 70% 72% 72% 72% 72% 72% 72% 72% 74% 74% 74% 74% 74% 74% 74% 76% 76% 76% 76% 76% 76% 76% 78% 78% 78% 78% 78% 78% 80% 80% 80% 80% 80% 80% 82% 82% 82% 82% 82% 84% 84% 84% 84% 84% 84% 84% 84% 84% 84% 86% 86% 86% 86% 86% 86% 88% 88% 88% 88% 88% 88% 88% 90% 90% 90% 90% 90% 90% 90% 90% 92% 92% 92% 92% 94% 94% 94% 94% 94% 94% 94% 94% 94% 96% 96% 96% 96% 96% 96% 96% 96% 98% 98% 98% 98% 98% 98%
ffmpeg '-y' '-i' 'video/second_vid.mp4' '-filter_complex' '[0:v]drawtext=fontfile=/usr/share/fonts/truetype/msttcorefonts/arial.ttf:text='test text':fontcolor='ffffff#1':fontsize=30:x=(w-tw)/2:y=50:alpha='if(lt(t,2),0,if(lt(t,4),(t-2)/2,if(lt(t,24),1,if(lt(t,26),(2-(t-24))/2,0))))'' '-threads' '8' '-vcodec' 'libx264' '-acodec' 'aac' '-max_muxing_queue_size' '400' '-b:v' '1000k' '-refs' '6' '-coder' '1' '-sc_threshold' '40' '-flags' '+loop' '-me_range' '16' '-subq' '7' '-i_qfactor' '0.71' '-qcomp' '0.6' '-qdiff' '4' '-trellis' '1' '-b:a' '256k' '-ac' '2' '-pass' '1' '-passlogfile' '/tmp/ffmpeg-passes5b738244e22e0b38pp/pass-5b738244e2412' '/data/tmp/ffmpeg-ed6a389106090a6a9f9c71c7c357400b-5b738244e2215.mp4'
2% 2% 4% 6% 8% 10% 12% 14% 14% 16% 18% 18% 20% 22% 22% 24% 26% 28% 30% 30% 32% 34% 34% 36% 38% 40% 42% 44% 44% 46% 48%
ffmpeg '-y' '-i' 'video/second_vid.mp4' '-filter_complex' '[0:v]drawtext=fontfile=/usr/share/fonts/truetype/msttcorefonts/arial.ttf:text='test text':fontcolor='ffffff#1':fontsize=30:x=(w-tw)/2:y=50:alpha='if(lt(t,2),0,if(lt(t,4),(t-2)/2,if(lt(t,24),1,if(lt(t,26),(2-(t-24))/2,0))))'' '-threads' '8' '-vcodec' 'libx264' '-acodec' 'aac' '-max_muxing_queue_size' '400' '-b:v' '1000k' '-refs' '6' '-coder' '1' '-sc_threshold' '40' '-flags' '+loop' '-me_range' '16' '-subq' '7' '-i_qfactor' '0.71' '-qcomp' '0.6' '-qdiff' '4' '-trellis' '1' '-b:a' '256k' '-ac' '2' '-pass' '2' '-passlogfile' '/tmp/ffmpeg-passes5b738244e22e0b38pp/pass-5b738244e2412' '/data/tmp/ffmpeg-ed6a389106090a6a9f9c71c7c357400b-5b738244e2215.mp4'
52% 52% 54% 54% 56% 56% 58% 58% 60% 60% 62% 64% 66% 68% 70% 70% 72% 74% 76% 76% 78% 80% 82% 84% 86% 86% 88% 88% 90% 90% 92% 94% 96% 98%
ffmpeg '-y' '-f' 'concat' '-safe' '0' '-i' '/data/tmp/ffmpeg-concat-5b73814f9fe10' '-c' 'copy' 'video_out/done_vid.mp4'
(For some reason it seems to run the encoding of each video command again at 50%)
So when I play ffmpeg-ed6a389106090a6a9f9c71c7c357400b-5b738244e2215.mp4, the text is in there, but when I play the output file (done_vid.mp4), the text isn't there.
What's going on?
UPDATE: Here are the ffprobes of the 2 temp files in case they help:
Input #0, mov,mp4,m4a,3gp,3g2,mj2, from 'data/tmp/ffmpeg-7a580007c010377da19305b0836e7a4d-5b73814f9ff35.mp4':
Metadata:
major_brand : isom
minor_version : 512
compatible_brands: isomiso2avc1mp41
encoder : Lavf57.83.100
Duration: 00:02:25.17, start: 0.000000, bitrate: 1247 kb/s
Stream #0:0(und): Video: h264 (High) (avc1 / 0x31637661), yuv420p, 1280x720 [SAR 1:1 DAR 16:9], 999 kb/s, 23.98 fps, 23.98 tbr, 24k tbn, 47.95 tbc (default)
Metadata:
handler_name : VideoHandler
Stream #0:1(eng): Audio: aac (LC) (mp4a / 0x6134706D), 44100 Hz, stereo, fltp, 242 kb/s (default)
Metadata:
handler_name : SoundHandler
Input #0, mov,mp4,m4a,3gp,3g2,mj2, from 'data/tmp/ffmpeg-ed6a389106090a6a9f9c71c7c357400b-5b738244e2215.mp4':
Metadata:
major_brand : isom
minor_version : 512
compatible_brands: isomiso2avc1mp41
encoder : Lavf57.83.100
Duration: 00:01:30.03, start: 0.000000, bitrate: 322 kb/s
Stream #0:0(und): Video: h264 (High 4:4:4 Predictive) (avc1 / 0x31637661), yuv444p, 1280x720 [SAR 1:1 DAR 16:9], 61 kb/s, 23.98 fps, 23.98 tbr, 19184 tbn, 47.96 tbc (default)
Metadata:
handler_name : VideoHandler
Stream #0:1(und): Audio: aac (LC) (mp4a / 0x6134706D), 44100 Hz, stereo, fltp, 258 kb/s (default)
Metadata:
handler_name : SoundHandler

The problem
Your first video output is yuv420p, but your second video output is yuv444p. The concat demuxer with -c copy expects every segment to have the same stream parameters, so both should be yuv420p. Add format=yuv420p to the end of your filterchain:
'-filter_complex' '[0:v]drawtext=fontfile=/usr/share/fonts/truetype/msttcorefonts/arial.ttf:text='test text':fontcolor='ffffff#1':fontsize=30:x=(w-tw)/2:y=50:alpha='if(lt(t,2),0,if(lt(t,4),(t-2)/2,if(lt(t,24),1,if(lt(t,26),(2-(t-24))/2,0))))',format=yuv420p'
Other stuff
Since you're re-encoding everything anyway, use the concat filter; then you can do everything in one command.
Do you really want to perform two passes? Use one pass with -crf instead of -b:v unless you are targeting a specific output file size. See FFmpeg Wiki: H.264.
You don't need to declare -threads. The encoder will automatically choose an appropriate value.
Use the encoding presets (see wiki link above). Then you can stop encoding like it is 2006 and you can remove all of this: '-refs' '6' '-coder' '1' '-sc_threshold' '40' '-flags' '+loop' '-me_range' '16' '-subq' '7' '-i_qfactor' '0.71' '-qcomp' '0.6' '-qdiff' '4' '-trellis' '1'.
I'll assume you're making videos to be played via progressive download. If that's the case add the -movflags +faststart output option so it can begin playback before it is completely downloaded by the viewer.
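The suggestions above can be combined into a single command. The sketch below only builds the argument list in Python and does not run ffmpeg. The helper name `build_concat_command` is hypothetical, the `-crf 23` and `-preset medium` values are placeholder assumptions, and the drawtext options are simplified (white text, no fade expression). It also assumes both inputs have one video and one audio stream with the same frame size, as the ffprobe output suggests.

```python
def build_concat_command(first, second, output, text="test text"):
    """Build an ffmpeg argument list that draws text on the second input,
    joins both inputs with the concat filter, and forces yuv420p output.

    Hypothetical helper; quality settings are illustrative, not recommendations.
    """
    drawtext = (
        "drawtext=fontfile=/usr/share/fonts/truetype/msttcorefonts/arial.ttf"
        f":text='{text}':fontcolor=white:fontsize=30:x=(w-tw)/2:y=50"
    )
    graph = (
        f"[1:v]{drawtext}[v1];"                           # text only on the second video
        "[0:v][0:a][v1][1:a]concat=n=2:v=1:a=1[vc][a];"   # join video and audio
        "[vc]format=yuv420p[v]"                           # one pixel format for the result
    )
    return [
        "ffmpeg", "-y", "-i", first, "-i", second,
        "-filter_complex", graph,
        "-map", "[v]", "-map", "[a]",
        "-c:v", "libx264", "-crf", "23", "-preset", "medium",
        "-c:a", "aac", "-b:a", "256k",
        "-movflags", "+faststart",
        output,
    ]


command = build_concat_command("video_in/first_vid.mp4", "video/second_vid.mp4",
                               "video_out/done_vid.mp4")
print(" ".join(command))
```

Passing the list to `subprocess.run(command, check=True)` would execute it without any shell quoting issues.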

Ruby Regular expression to get percentage from specific line

I have the following output:
Name Stmts Miss Cover Missing
---------------------------------------------------------------
src/global_information.py 8 1 88% 6
src/settings.py 38 0 100%
src/storage_backends.py 4 4 0% 1-5
src/urls.py 8 0 100%
users/admin.py 1 0 100%
users/apps.py 3 3 0% 1-5
users/forms.py 5 0 100%
users/models.py 1 0 100%
users/tests/tests_views_urls.py 5 0 100%
users/urls.py 5 0 100%
users/views.py 1 1 0% 1
---------------------------------------------------------------
TOTAL 79 9 89%
I need to get the TOTAL percentage, which is 89%. I tried the following two regexes:
TOTAL\s+\d+\s+\d+\s+\d+\%
and
(?<=TOTAL\s).*
I can match the correct line, but I'm not sure how to extract just the percentage. This has to be done with a regular expression alone, because I don't have access to any other tool.
Thanks

You can use a regex like this:
TOTAL.*?(\d+)%
Or if you want to capture the % then
TOTAL.*?(\d+%)
Then grab the content from the capturing group $1
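As an illustration of the capturing-group approach, here is a minimal Python sketch. The function name `total_coverage` and the shortened report are made up for the example; the pattern is the one from the answer, anchored to the start of a line.

```python
import re

# Shortened, illustrative version of the coverage report from the question.
report = """\
Name Stmts Miss Cover Missing
src/settings.py 38 0 100%
users/views.py 1 1 0% 1
TOTAL 79 9 89%
"""


def total_coverage(text):
    """Return the digits of the TOTAL percentage, or None if there is no TOTAL line."""
    # re.MULTILINE lets ^ match at the start of every line, not only the first.
    match = re.search(r"^TOTAL.*?(\d+)%", text, re.MULTILINE)
    return match.group(1) if match else None


print(total_coverage(report))
```

The lazy `.*?` skips the statement and miss counts because neither is followed by `%`, so the group ends up holding only the percentage digits.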

str=<<_
Name Stmts Miss Cover Missing
---------------------------------------------------------------
src/global_information.py 8 1 88% 6
src/settings.py 38 0 100%
src/storage_backends.py 4 4 0% 1-5
src/urls.py 8 0 100%
users/admin.py 1 0 100%
users/apps.py 3 3 0% 1-5
users/forms.py 5 0 100%
users/models.py 1 0 100%
users/tests/tests_views_urls.py 5 0 100%
users/urls.py 5 0 100%
users/views.py 1 1 0% 1
---------------------------------------------------------------
TOTAL 79 9 89%
_
str[/^TOTAL.*?\K\d+%/] #=> "89%"
\K means discard everything matched so far. The non-greedy modifier ? in .*? is needed. Without it the match before \K would end with the next-to-last digit in the total percentage (here the "8" in "89%", the "3" in "1234%").

awk '/TOTAL/{ print $4}' <INPUT_FILE>

Emacs is slow and lags when open link

My Emacs is sometimes slow, especially when I open the link under the cursor.
I have run the profiler. What should I do with the results next? How can I improve performance?
The results are below.
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
- command-execute 113 52%
- call-interactively 113 52%
- evil-ex 30 13%
- read-from-minibuffer 23 10%
+ command-execute 6 2%
+ elscreen-run-screen-update-hook 2 0%
redisplay_internal (C function) 1 0%
- eval-expression 28 12%
- eval 28 12%
- debug 28 12%
- recursive-edit 24 11%
- command-execute 16 7%
- call-interactively 16 7%
+ evil-ex 7 3%
+ byte-code 4 1%
+ evil-mouse-drag-region 3 1%
+ org-open-at-point 1 0%
+ mouse-set-point 1 0%
- evil-mouse-drag-region 14 6%
- evil-mouse-drag-track 14 6%
- eval 14 6%
- track-mouse 14 6%
- byte-code 14 6%
- read-event 9 4%
+ redisplay_internal (C function) 1 0%
- org-agenda 10 4%
- byte-code 10 4%
- org-agenda-get-restriction-and-command 10 4%
- byte-code 10 4%
read-char-exclusive 8 3%
- byte-code 9 4%
- read-file-name 9 4%
+ read-file-name-default 9 4%
+ minibuffer-complete 5 2%
+ org-open-at-point 4 1%
+ org-todo 4 1%
+ org-refile 3 1%
+ evil-previous-line 2 0%
+ profiler-report-write-profile 2 0%
+ profiler-report 1 0%
+ org-ctrl-c-ctrl-c 1 0%
- timer-event-handler 62 28%
- byte-code 62 28%
- apply 62 28%
- tooltip-timeout 62 28%
- run-hook-with-args-until-success 62 28%
- tooltip-help-tips 62 28%
- tooltip-show 62 28%
- byte-code 62 28%
- x-show-tip 59 27%
- face-set-after-frame-default 59 27%
- byte-code 59 27%
- face-spec-recalc 57 26%
- make-face-x-resource-internal 54 24%
- set-face-attributes-from-resources 53 24%
- set-face-attribute-from-resource 50 23%
+ face-name 4 1%
+ face-spec-set-2 2 0%
- ... 26 11%
Automatic GC 25 11%
+ vc-backend 1 0%
+ elscreen-run-screen-update-hook 5 2%
mouse-fixup-help-message 4 1%
+ redisplay_internal (C function) 4 1%
and 2 0%
+ tooltip-show-help 1 0%
Update 1
I have not observed this problem for some time now.

How to tell sed to make changes only to 1st column of an output

So I have an output with 6 columns, and what I want to do is, ONLY for the first column, delete everything up to and including the last slash "/".
What i have so far is this
df -k | awk '{print $1}' | sed 's#.*/##'
but I don't want to use awk there just to extract the first column; I want a way to tell sed to make these changes to the first column only.
So the original output is like this:
Filesystem kbytes used avail capacity Mounted on
/dev/dsk/c0d0s0 12324895 5082804 7118843 42% /
/devices 0 0 0 0% /devices
ctfs 0 0 0 0% /system/contract
proc 0 0 0 0% /proc
mnttab 0 0 0 0% /etc/mnttab
swap 8998420 1052 8997368 1% /etc/svc/volatile
objfs 0 0 0 0% /system/object
sharefs 0 0 0 0% /etc/dfs/sharetab
/usr/lib/libc/libc_hwcap1.so.1 12324895 5082804 7118843 42% /lib/libc.so.1
fd 0 0 0 0% /dev/fd
/dev/dsk/c0d0s3 4136995 146364 3949262 4% /var
swap 9145604 148236 8997368 2% /tmp
swap 8997400 32 8997368 1% /var/run
and I want the first column to look like this:
Filesystem
c0d0s0
devices
ctfs
proc
mnttab
swap
objfs
sharefs
libc_hwcap1.so.1
fd
c0d0s3
swap
swap

$ awk '{sub(/.*\//,"",$1)}1' file
Filesystem kbytes used avail capacity Mounted on
c0d0s0 12324895 5082804 7118843 42% /
devices 0 0 0 0% /devices
ctfs 0 0 0 0% /system/contract
proc 0 0 0 0% /proc
mnttab 0 0 0 0% /etc/mnttab
swap 8998420 1052 8997368 1% /etc/svc/volatile
objfs 0 0 0 0% /system/object
sharefs 0 0 0 0% /etc/dfs/sharetab
libc_hwcap1.so.1 12324895 5082804 7118843 42% /lib/libc.so.1
fd 0 0 0 0% /dev/fd
c0d0s3 4136995 146364 3949262 4% /var
swap 9145604 148236 8997368 2% /tmp
swap 8997400 32 8997368 1% /var/run
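To keep the columns aligned, rename the "Mounted on" header to a single word and pipe the result through column -t: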
$ awk 'NR==1{sub(/Mounted on/,"Mounted_on")} {sub(/.*\//,"",$1)}1' file | column -t
Filesystem kbytes used avail capacity Mounted_on
c0d0s0 12324895 5082804 7118843 42% /
devices 0 0 0 0% /devices
ctfs 0 0 0 0% /system/contract
proc 0 0 0 0% /proc
mnttab 0 0 0 0% /etc/mnttab
swap 8998420 1052 8997368 1% /etc/svc/volatile
objfs 0 0 0 0% /system/object
sharefs 0 0 0 0% /etc/dfs/sharetab
libc_hwcap1.so.1 12324895 5082804 7118843 42% /lib/libc.so.1
fd 0 0 0 0% /dev/fd
c0d0s3 4136995 146364 3949262 4% /var
swap 9145604 148236 8997368 2% /tmp
swap 8997400 32 8997368 1% /var/run

Just split the first field in /-slices and replace the first field with the last of these slices whenever it occurs as the first part of the line:
awk '{n=split($1,a,"/"); gsub("^"$1,a[n])}1' file
Test
$ awk '{n=split($1,a,"/"); gsub("^"$1,a[n])}1' file
Filesystem kbytes used avail capacity Mounted on
c0d0s0 12324895 5082804 7118843 42% /
devices 0 0 0 0% /devices
ctfs 0 0 0 0% /system/contract
proc 0 0 0 0% /proc
mnttab 0 0 0 0% /etc/mnttab
swap 8998420 1052 8997368 1% /etc/svc/volatile
objfs 0 0 0 0% /system/object
sharefs 0 0 0 0% /etc/dfs/sharetab
libc_hwcap1.so.1 12324895 5082804 7118843 42% /lib/libc.so.1
fd 0 0 0 0% /dev/fd
c0d0s3 4136995 146364 3949262 4% /var
swap 9145604 148236 8997368 2% /tmp
swap 8997400 32 8997368 1% /var/run
Note awk '{n=split($1,a,"/"); $1=a[n]}1' would also work, only that the format would be lost because the full string gets recalculated when you modify one of its fields.

df -k | awk '{print $1}' | perl -pe 's/^[\S]*\///g'
or
df -k | awk '{print $1}' | perl -lane '$F[0]=~s/.*\///g;print "@F"'

df -k|awk -F' ' '{print $1}'|sed "s/.*\///g"

sed solution, anchored at the start of the line so that only the first column changes:
$ sed -r 's~^\S*/~~' file
or, without GNU extensions:
$ sed 's~^[^ ]*/~~' file
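The idea shared by these answers, removing everything up to the last slash in the first field while leaving the other columns alone, can be sketched in Python as follows. The function name `strip_first_column_path` is hypothetical, and the sketch assumes the first column starts at the beginning of the line and contains no whitespace.

```python
import re


def strip_first_column_path(line):
    """Remove the directory part of the first whitespace-delimited field only."""
    # ^\S*/ matches from the start of the line up to the last slash that is
    # still inside the first field, so later columns are never touched.
    return re.sub(r"^\S*/", "", line, count=1)


sample = [
    "/dev/dsk/c0d0s0 12324895 5082804 7118843 42% /",
    "/devices 0 0 0 0% /devices",
    "swap 8998420 1052 8997368 1% /etc/svc/volatile",
]
for row in sample:
    print(strip_first_column_path(row))
```

Lines whose first field has no slash, such as the `swap` rows, come back unchanged even though their mount point contains slashes.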

Find a local minimum in a special graph

The issue at hand looks easy, but I could not find an easy solution so far.
I've got a histogram describing the value distribution of an array of floats, roughly looking like this:
As you can see, there is a local maximum near 0, which keeps falling down to a local minimum, then rising quickly to a plateau, and in the end falling to 0. I would like to detect the local minimum.
In practice, the histogram is not as smooth:
There are lots of spikes, and the local minimum may be stretched and uneven. I'm not sure how to tackle this problem.
There is little domain knowledge. The first max may even be higher than the second max. There may be spikes in any direction, values may be as low as 0.
This is a real life sample taken from 8 distinct runs. It's scaled to 0 - 10 to make it easier to understand.
0: 22% 12% 19% 17% 6% 5% 6% 5%
1: 3% 2% 1% 1% 4% 1% 4% 1%
2: 6% 2% 13% 5% 0% 2% 0% 2%
3: 62% 62% 52% 42% 2% 5% 2% 5%
4: 4% 19% 12% 28% 10% 13% 10% 13%
5: 0% 0% 3% 29% 30% 29% 30%
6: 37% 31% 37% 30%
7: 1% 7% 1% 7%
8: 6% 1% 6% 1%
9:
10:
Values rounded down. Missing values denote no occurrence of any value.
Explanation of the first line:
0: 22% the initial max
1: 3% local min
2: 6% still min
3: 62% plateau max
4: 4% second min
5: 0% 0
6: no more values
7:
8:
9:
10:
For reference, a list of the same data, this time scaled to 0 - 100 (there were no values in the 90-100 range at all). I messed up on the formatting, but it should give a rough idea.
0: 0% 0% 0% 1% 0% 0% 0% 0%
1: 0% 1% 1% 3% 0% 0% 0% 0%
2: 1% 2% 1% 3% 0% 0% 0% 0%
3: 4% 2% 3% 3% 0% 1% 0% 1%
4: 6% 1% 3% 2% 0% 0% 0% 0%
5: 2% 0% 3% 1% 0% 0% 0% 0%
6: 1% 0% 2% 0% 0% 0% 0% 0%
7: 1% 0% 1% 0% 0% 0% 0% 0%
8: 1% 0% 1% 0% 0% 0% 0% 0%
9: 1% 0% 1% 0% 1% 0% 1% 0%
10: 1% 0% 0% 0% 1% 0% 1% 0%
11: 0% 0% 0% 0% 0% 0% 0% 0%
12: 0% 0% 0% 0% 0% 0% 0% 0%
13: 0% 0% 0% 0% 0% 0% 0% 0%
14: 0% 0% 0% 0% 0% 0% 0% 0%
15: 0% 0% 0% 0% 0% 0% 0% 0%
16: 0% 0% 0% 0% 0% 0% 0% 0%
17: 0% 0% 0% 0% 0% 0% 0% 0%
18: 0% 0% 0% 0% 0% 0% 0% 0%
19: 0% 0% 0% 0% 0% 0% 0% 0%
20: 0% 0% 0% 0% 0% 0% 0% 0%
21: 0% 0% 0% 0% 0% 0% 0% 0%
22: 0% 0% 0% 0% 0% 0% 0% 0%
23: 0% 0% 0% 0% 0% 0% 0% 0%
24: 0% 0% 1% 0% 0% 0% 0% 0%
25: 0% 0% 1% 0% 0% 0% 0% 0%
26: 0% 0% 1% 0% 0% 0% 0% 0%
27: 0% 0% 1% 0% 0% 0% 0% 0%
28: 1% 0% 2% 1% 0% 0% 0% 0%
29: 3% 0% 2% 2% 0% 0% 0% 0%
30: 7% 1% 3% 2% 0% 0% 0% 0%
31: 10% 2% 4% 3% 0% 0% 0% 0%
32: 10% 3% 4% 4% 0% 0% 0% 0%
33: 6% 6% 5% 5% 0% 0% 0% 0%
34: 5% 5% 4% 4% 0% 0% 0% 0%
35: 5% 8% 6% 3% 0% 0% 0% 0%
36: 5% 10% 6% 4% 0% 0% 0% 0%
37: 5% 9% 5% 3% 0% 0% 0% 0%
38: 3% 8% 5% 5% 0% 0% 0% 0%
39: 2% 5% 5% 5% 0% 0% 0% 0%
40: 1% 4% 4% 5% 0% 1% 0% 1%
41: 1% 3% 2% 5% 0% 1% 0% 1%
42: 0% 1% 1% 4% 0% 0% 0% 0%
43: 0% 2% 0% 4% 1% 1% 1% 1%
44: 0% 1% 0% 3% 1% 1% 1% 1%
45: 0% 1% 0% 1% 0% 1% 0% 1%
46: 0% 1% 0% 1% 1% 1% 1% 1%
47: 0% 1% 0% 0% 1% 1% 1% 1%
48: 0% 1% 0% 0% 1% 1% 1% 1%
49: 0% 0% 0% 1% 1% 1% 1% 1%
50: 0% 1% 1% 1% 1% 1%
51: 0% 0% 2% 1% 2% 1%
52: 0% 1% 2% 1% 2% 1%
53: 0% 0% 4% 2% 4% 2%
54: 0% 2% 2% 2% 2%
55: 0% 2% 2% 2% 2%
56: 0% 2% 3% 2% 3%
57: 0% 2% 4% 2% 4%
58: 4% 6% 4% 6%
59: 3% 3% 3% 3%
60: 5% 5% 5% 5%
61: 5% 7% 5% 7%
62: 3% 5% 3% 5%
63: 4% 3% 4% 3%
64: 5% 2% 5% 2%
65: 3% 2% 2% 2%
66: 5% 1% 5% 1%
67: 1% 0% 1% 0%
68: 1% 0% 1% 0%
69: 0% 1% 0% 1%
70: 0% 0% 0% 0%
71: 0% 0% 0% 0%
72: 0% 0% 0% 0%
73: 0% 1% 0% 1%
74: 0% 0% 0% 0%
75: 0% 0% 0% 0%
76: 0% 1% 0% 1%
77: 0% 0% 0% 0%
78: 0% 0% 0% 0%
79: 0% 0% 0% 0%
80: 0% 0% 0% 1%
81: 0% 0% 0% 0%
82: 0% 0% 0% 0%
83: 0% 0% 0% 0%
84: 0% 0% 0% 0%
85: 1% 1%
86: 0% 0%
87: 1% 1%
88: 1% 1%
89: 0% 0%

Your "true" histogram is low frequency. Your noise is high frequency. Low-pass filtering the data with an appropriate bandwidth filter will get rid of most of the noise.

Here's an algorithm:
1. Smooth your data set by calculating a moving average over a small window.
2. Test your smoothed data for local minima (i.e. any single datum that is smaller than its neighbours).
3. If there are more than two local minima, increase the window size and go to step 1.
Update:
Having looked at the sample data you posted, I've realised that you need to detect minimal plateaus rather than just individual points, so step two in the algorithm should be tweaked to identify a point as part of a minimum if there are no neighbours with smaller values between the nearest higher value neighbours on either side. Then when counting minima in step 3, a minimal plateau should count as a single minimum.
I've tested this algorithm on your example datasets and it performs well, picking minima at 18, 12, 15, 13, 23, 20, 23 and 20 for your datasets respectively.
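The steps above, including the plateau tweak, might look roughly like this in Python. This is a sketch under stated assumptions rather than the answerer's actual code: the function names are hypothetical, the window starts at 3 and grows by 2, the average is truncated at the edges, and a plateau is reported by its middle index. It is not claimed to reproduce the minima listed above.

```python
def moving_average(values, window):
    """Centered moving average; the window is truncated at both ends (assumption)."""
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        lo, hi = max(0, i - half), min(len(values), i + half + 1)
        smoothed.append(sum(values[lo:hi]) / (hi - lo))
    return smoothed


def local_minima(values):
    """Return one index per interior minimum; a flat plateau counts once."""
    minima = []
    i, n = 1, len(values)
    while i < n - 1:
        if values[i] < values[i - 1]:          # we just stepped downhill
            j = i
            while j + 1 < n and values[j + 1] == values[i]:
                j += 1                         # walk across a flat stretch
            if j + 1 < n and values[j + 1] > values[i]:
                minima.append((i + j) // 2)    # it rises again: a minimum
            i = j + 1
        else:
            i += 1
    return minima


def find_minima(values, start_window=3, max_minima=2):
    """Widen the smoothing window until at most `max_minima` minima remain."""
    window = start_window
    while True:
        minima = local_minima(moving_average(values, window))
        if len(minima) <= max_minima or window >= len(values):
            return window, minima
        window += 2


# First column of the 0-10 scaled sample from the question.
print(find_minima([22, 3, 6, 62, 4, 0]))
```

Exact equality is used to detect plateaus, which suits integer bin counts; for heavily smoothed floating-point data a small tolerance may be a better choice.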

A possible heuristic: use a spline approximation to smooth the histogram, making it polynomial-like, and then look for a local minimum.
Note that this is only a heuristic and might fail, but I think it will provide a good solution in most cases.

This actually sounds rather like histogram-based image segmentation to me (although this is not an image, so it's really just histogram segmentation). Sounds weird, but bear with me.
Is what's important about the minimum the fact that it's a minimum, or that it divides the small maximum from the large maximum? If it's the fact that it divides the maxima, then segmentation is definitely what you want.
Have a look at K-means clustering. You'd have two clusters. It's not a terribly complicated procedure, but Wikipedia (and other sources) do a much better job of explaining it than I could, so I'll leave it to them.
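To make the clustering suggestion concrete, here is a minimal one-dimensional two-means sketch in Python that works directly on histogram counts. The function name `two_means_threshold`, the choice of the first and last non-empty bins as starting centers, and the use of the midpoint between the final centers as the dividing point are all assumptions made for illustration.

```python
def two_means_threshold(counts, iterations=100):
    """Split histogram bins into two clusters and return the dividing position.

    counts[i] is the weight of bin i. Centers start at the first and last
    non-empty bins (assumption) and are refined by weighted k-means updates.
    """
    positions = [i for i, c in enumerate(counts) if c > 0]
    if len(positions) < 2:
        raise ValueError("need at least two non-empty bins")
    low, high = float(positions[0]), float(positions[-1])
    for _ in range(iterations):
        sums, weights = [0.0, 0.0], [0.0, 0.0]
        for i in positions:
            k = 0 if abs(i - low) <= abs(i - high) else 1   # nearest center
            sums[k] += i * counts[i]
            weights[k] += counts[i]
        # Keep the old center if a cluster ends up empty.
        new_low = sums[0] / weights[0] if weights[0] else low
        new_high = sums[1] / weights[1] if weights[1] else high
        if (new_low, new_high) == (low, high):
            break                                           # converged
        low, high = new_low, new_high
    return (low + high) / 2


print(two_means_threshold([10, 1, 0, 0, 1, 10]))
```

The returned position separates the two maxima rather than locating the lowest bin, which matches the segmentation view described above.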
