I have the following Ansible playbook, which prints some metrics of a remote server. I want to write the output to a CSV file in exactly the msg format shown in the output below. How can I do this?
Ansible playbook:
  tasks:
    - name: Get ip address of the remote node
      ansible.builtin.shell: hostname -i | awk '{print $2}'
      register: ipaddr
    - name: Check uptime
      shell: uptime | cut -d',' -f1
      register: uptime_op
    - debug:
        msg: "{{uptime_op.stdout_lines}}"
    - name: Get lsbkl value
      shell: lsblk
      register: lsblk_output
    - debug:
        msg: "{{lsblk_output.stdout_lines}}"
    - name: Get Disc space value
      shell: df -h
      register: df_output
    - debug:
        msg: "{{df_output.stdout_lines}}"
output:
PLAY [test_host] *************************************************************************************************************
TASK [Gathering Facts] ******************************************************************************************************
Tuesday 20 December 2022 10:07:07 -0800 (0:00:00.017) 0:00:00.017 ******
ok: [hostname.domain.com]
TASK [Get ip address of the remote node] ************************************************************************************
Tuesday 20 December 2022 10:07:14 -0800 (0:00:07.399) 0:00:07.417 ******
changed: [hostname.domain.com]
TASK [Check uptime] *********************************************************************************************************
Tuesday 20 December 2022 10:07:18 -0800 (0:00:03.860) 0:00:11.278 ******
changed: [hostname.domain.com]
TASK [debug] ****************************************************************************************************************
Tuesday 20 December 2022 10:07:22 -0800 (0:00:03.781) 0:00:15.059 ******
ok: [hostname.domain.com] => {
"msg": [
" 23:37pm up 359 days 5:53"
]
}
TASK [Get lsbkl value] ******************************************************************************************************
Tuesday 20 December 2022 10:07:22 -0800 (0:00:00.086) 0:00:15.145 ******
changed: [hostname.domain.com]
TASK [debug] ****************************************************************************************************************
Tuesday 20 December 2022 10:07:26 -0800 (0:00:03.815) 0:00:18.960 ******
ok: [hostname.domain.com] => {
"msg": [
"NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT",
"sda 8:0 0 1.1T 0 disk ",
"├─sda1 8:1 0 15G 0 part /",
"├─sda2 8:2 0 518M 0 part /boot/efi",
"├─sda3 8:3 0 1K 0 part ",
"├─sda5 8:5 0 2G 0 part /ctools",
"├─sda6 8:6 0 10G 0 part /var",
"├─sda7 8:7 0 48G 0 part [SWAP]",
"├─sda8 8:8 0 250M 0 part /dsm",
"├─sda9 8:9 0 501M 0 part /var/cfengine",
"├─sda10 8:10 0 10G 0 part /tmp",
"└─sda11 8:11 0 1T 0 part /infrastructure",
"sdb 8:16 0 1.8T 0 disk ",
"├─sdb1 8:17 0 484.3G 0 part /p4depot",
"├─sdb2 8:18 0 931.3G 0 part /p4meta",
"└─sdb3 8:19 0 372.9G 0 part /p4log"
]
}
TASK [Get Disc space value] *************************************************************************************************
Tuesday 20 December 2022 10:07:26 -0800 (0:00:00.088) 0:00:19.049 ******
changed: [hostname.domain.com]
TASK [debug] ****************************************************************************************************************
Tuesday 20 December 2022 10:07:30 -0800 (0:00:03.787) 0:00:22.836 ******
ok: [hostname.domain.com] => {
"msg": [
"Filesystem Size Used Avail Use% Mounted on",
"devtmpfs 189G 8.0K 189G 1% /dev",
"tmpfs 189G 0 189G 0% /dev/shm",
"tmpfs 189G 4.0G 185G 3% /run",
"tmpfs 189G 0 189G 0% /sys/fs/cgroup",
"/dev/sda1 15G 11G 4.8G 69% /",
"/dev/sda2 518M 0 518M 0% /boot/efi",
"/dev/sda10 10G 83M 10G 1% /tmp",
"/dev/sda11 1.1T 34M 1.1T 1% /infrastructure",
"/dev/sda8 247M 62M 185M 25% /dsm",
"/dev/sda6 10G 1.5G 8.6G 15% /var",
"/dev/sda9 498M 119M 379M 24% /var/cfengine",
"/dev/sdb2 931G 30G 902G 4% /p4meta",
"/dev/sdb3 373G 61M 373G 1% /p4log",
"/dev/sdb1 485G 112G 373G 23% /p4depot",
"/dev/sda5 2.1G 3.6M 1.8G 1% /ctools",
"tmpfs 1.0G 0 1.0G 0% /dsm/tmp/dsmbg.tmpfs",
"10.223.232.121:/new_itools 951G 497G 454G 53% /nfs/site/itools",
"incfs03n03b-04:/common_usr_local 11G 1.2G 8.9G 12% /nfs/iind/local",
"incfs04n08b-1:/prod 513M 1.5M 512M 1% /nfs/iind/proj/prod",
"incfs06n11b-1:/home0 351G 149G 202G 43% /nfs/iind/disks/home23",
"incfs02n10a-1:/iind_disks_home24 501G 59G 442G 12% /nfs/iind/disks/home24",
"incfs06n04a-05:/iind_gen_adm 301G 176G 125G 59% /nfs/site/gen/adm",
"incfs03n06b-1:/ba_ctg_home01 301G 263G 38G 88% /nfs/iind/disks/home110",
"inc08n07b-1:/home_tree 11G 79M 10G 1% /nfs/iind/home",
"incfs06n10a-1:/iind_gen_adm_netmeter_m 81G 28G 53G 35% /nfs/iind/disks/iind_gen_adm_netmeter",
"tmpfs 38G 0 38G 0% /run/user/37124",
"incfs07n05b-1:/common 201G 158G 43G 79% /nfs/site/disks/iind_gen_adm_common",
"tmpfs 38G 0 38G 0% /run/user/12142325"
]
}
PLAY RECAP ******************************************************************************************************************
hostname.domain.com : ok=8 changed=4 unreachable=0 failed=0 skipped=0 rescued=0 ignored=0
I am attaching a CSV file that shows how the result should look.
Since the registered data has df_output.stdout_lines, it must also have the attribute df_output.stdout. Use the filter community.general.jc to parse the registered data:
    - set_fact:
        df: "{{ df_output.stdout|community.general.jc('df') }}"
gives
  df:
    - available: 189
      filesystem: devtmpfs
      mounted_on: /dev
      size: 189G
      use_percent: 1
      used: 8
    - available: 189
      filesystem: tmpfs
      mounted_on: /dev/shm
      size: 189G
      use_percent: 0
      used: 0
    ...
Then, for each host create a CSV file on the controller. For example,
    - copy:
        dest: "/tmp/ansible_df_{{ item }}.csv"
        content: |
          {{ df_output.stdout_lines.0.split()[:-1]|join(',') }}
          {% for m in hostvars[item]['df'] %}
          {{ m.filesystem }},{{ m.size }},{{ m.used }},{{ m.available }},{{ m.use_percent }},{{ m.mounted_on }}
          {% endfor %}
      loop: "{{ ansible_play_hosts }}"
      run_once: true
      delegate_to: localhost
will create
shell> cat /tmp/ansible_df_localhost.csv
Filesystem,Size,Used,Avail,Use%,Mounted
devtmpfs,189G,8,189,1,/dev
tmpfs,189G,0,189,0,/dev/shm
tmpfs,189G,4,185,3,/run
tmpfs,189G,0,189,0,/sys/fs/cgroup
/dev/sda1,15G,11,4,69,/
/dev/sda2,518M,0,518,0,/boot/efi
/dev/sda10,10G,83,10,1,/tmp
/dev/sda11,1.1T,34,1,1,/infrastructure
/dev/sda8,247M,62,185,25,/dsm
/dev/sda6,10G,1,8,15,/var
/dev/sda9,498M,119,379,24,/var/cfengine
/dev/sdb2,931G,30,902,4,/p4meta
/dev/sdb3,373G,61,373,1,/p4log
/dev/sdb1,485G,112,373,23,/p4depot
/dev/sda5,2.1G,3,1,1,/ctools
tmpfs,1.0G,0,1,0,/dsm/tmp/dsmbg.tmpfs
10.223.232.121:/new_itools,951G,497,454,53,/nfs/site/itools
incfs03n03b-04:/common_usr_local,11G,1,8,12,/nfs/iind/local
incfs04n08b-1:/prod,513M,1,512,1,/nfs/iind/proj/prod
incfs06n11b-1:/home0,351G,149,202,43,/nfs/iind/disks/home23
incfs02n10a-1:/iind_disks_home24,501G,59,442,12,/nfs/iind/disks/home24
incfs06n04a-05:/iind_gen_adm,301G,176,125,59,/nfs/site/gen/adm
incfs03n06b-1:/ba_ctg_home01,301G,263,38,88,/nfs/iind/disks/home110
inc08n07b-1:/home_tree,11G,79,10,1,/nfs/iind/home
incfs06n10a-1:/iind_gen_adm_netmeter_m,81G,28,53,35,/nfs/iind/disks/iind_gen_adm_netmeter
tmpfs,38G,0,38,0,/run/user/37124
incfs07n05b-1:/common,201G,158,43,79,/nfs/site/disks/iind_gen_adm_common
tmpfs,38G,0,38,0,/run/user/12142325
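Note that the parsed values above lose their unit suffixes (for example, Used 8.0K becomes 8). If the CSV should keep the values exactly as df printed them, one alternative is to split the raw lines directly. The following is only a minimal sketch in Python, not part of the playbook; the function name df_lines_to_csv is hypothetical, and it assumes every data row has the six columns shown in the header (rows that df wrapped onto two lines are skipped).

```python
import csv
import io

def df_lines_to_csv(lines):
    """Convert `df -h` output lines to CSV text, keeping values as printed."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for line in lines:
        # Split into at most 6 fields: the header's "Mounted on" (and any
        # mount point containing spaces) stays together in the last field.
        fields = line.split(None, 5)
        if len(fields) == 6:  # assumption: skip wrapped or malformed rows
            writer.writerow(fields)
    return out.getvalue()

sample = [
    "Filesystem Size Used Avail Use% Mounted on",
    "devtmpfs 189G 8.0K 189G 1% /dev",
    "/dev/sda1 15G 11G 4.8G 69% /",
]
print(df_lines_to_csv(sample), end="")
```

With the registered data from the question, the list passed in would be the content of df_output.stdout_lines.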
Given the data for testing
shell> cat data.json
{
"df_stdout_lines": [
"Filesystem Size Used Avail Use% Mounted on",
"devtmpfs 189G 8.0K 189G 1% /dev",
"tmpfs 189G 0 189G 0% /dev/shm",
"tmpfs 189G 4.0G 185G 3% /run",
"tmpfs 189G 0 189G 0% /sys/fs/cgroup",
"/dev/sda1 15G 11G 4.8G 69% /",
"/dev/sda2 518M 0 518M 0% /boot/efi",
"/dev/sda10 10G 83M 10G 1% /tmp",
"/dev/sda11 1.1T 34M 1.1T 1% /infrastructure",
"/dev/sda8 247M 62M 185M 25% /dsm",
"/dev/sda6 10G 1.5G 8.6G 15% /var",
"/dev/sda9 498M 119M 379M 24% /var/cfengine",
"/dev/sdb2 931G 30G 902G 4% /p4meta",
"/dev/sdb3 373G 61M 373G 1% /p4log",
"/dev/sdb1 485G 112G 373G 23% /p4depot",
"/dev/sda5 2.1G 3.6M 1.8G 1% /ctools",
"tmpfs 1.0G 0 1.0G 0% /dsm/tmp/dsmbg.tmpfs",
"10.223.232.121:/new_itools 951G 497G 454G 53% /nfs/site/itools",
"incfs03n03b-04:/common_usr_local 11G 1.2G 8.9G 12% /nfs/iind/local",
"incfs04n08b-1:/prod 513M 1.5M 512M 1% /nfs/iind/proj/prod",
"incfs06n11b-1:/home0 351G 149G 202G 43% /nfs/iind/disks/home23",
"incfs02n10a-1:/iind_disks_home24 501G 59G 442G 12% /nfs/iind/disks/home24",
"incfs06n04a-05:/iind_gen_adm 301G 176G 125G 59% /nfs/site/gen/adm",
"incfs03n06b-1:/ba_ctg_home01 301G 263G 38G 88% /nfs/iind/disks/home110",
"inc08n07b-1:/home_tree 11G 79M 10G 1% /nfs/iind/home",
"incfs06n10a-1:/iind_gen_adm_netmeter_m 81G 28G 53G 35% /nfs/iind/disks/iind_gen_adm_netmeter",
"tmpfs 38G 0 38G 0% /run/user/37124",
"incfs07n05b-1:/common 201G 158G 43G 79% /nfs/site/disks/iind_gen_adm_common",
"tmpfs 38G 0 38G 0% /run/user/12142325"
]
}
Example of a complete playbook for testing
- hosts: localhost
  vars_files:
    - data.json
  vars:
    df_output:
      stdout: |
        Filesystem Size Used Avail Use% Mounted on
        devtmpfs 189G 8.0K 189G 1% /dev
        tmpfs 189G 0 189G 0% /dev/shm
        tmpfs 189G 4.0G 185G 3% /run
        tmpfs 189G 0 189G 0% /sys/fs/cgroup
        /dev/sda1 15G 11G 4.8G 69% /
        /dev/sda2 518M 0 518M 0% /boot/efi
        /dev/sda10 10G 83M 10G 1% /tmp
        /dev/sda11 1.1T 34M 1.1T 1% /infrastructure
        /dev/sda8 247M 62M 185M 25% /dsm
        /dev/sda6 10G 1.5G 8.6G 15% /var
        /dev/sda9 498M 119M 379M 24% /var/cfengine
        /dev/sdb2 931G 30G 902G 4% /p4meta
        /dev/sdb3 373G 61M 373G 1% /p4log
        /dev/sdb1 485G 112G 373G 23% /p4depot
        /dev/sda5 2.1G 3.6M 1.8G 1% /ctools
        tmpfs 1.0G 0 1.0G 0% /dsm/tmp/dsmbg.tmpfs
        10.223.232.121:/new_itools 951G 497G 454G 53% /nfs/site/itools
        incfs03n03b-04:/common_usr_local 11G 1.2G 8.9G 12% /nfs/iind/local
        incfs04n08b-1:/prod 513M 1.5M 512M 1% /nfs/iind/proj/prod
        incfs06n11b-1:/home0 351G 149G 202G 43% /nfs/iind/disks/home23
        incfs02n10a-1:/iind_disks_home24 501G 59G 442G 12% /nfs/iind/disks/home24
        incfs06n04a-05:/iind_gen_adm 301G 176G 125G 59% /nfs/site/gen/adm
        incfs03n06b-1:/ba_ctg_home01 301G 263G 38G 88% /nfs/iind/disks/home110
        inc08n07b-1:/home_tree 11G 79M 10G 1% /nfs/iind/home
        incfs06n10a-1:/iind_gen_adm_netmeter_m 81G 28G 53G 35% /nfs/iind/disks/iind_gen_adm_netmeter
        tmpfs 38G 0 38G 0% /run/user/37124
        incfs07n05b-1:/common 201G 158G 43G 79% /nfs/site/disks/iind_gen_adm_common
        tmpfs 38G 0 38G 0% /run/user/12142325
      stdout_lines: "{{ df_stdout_lines }}"
  tasks:
    - debug:
        var: df_output.stdout_lines
    - debug:
        var: df_output.stdout
    - set_fact:
        df: "{{ df_output.stdout|community.general.jc('df') }}"
    - debug:
        var: df
    - copy:
        dest: "/tmp/ansible_df_{{ item }}.csv"
        content: |
          {{ df_output.stdout_lines.0.split()[:-1]|join(',') }}
          {% for m in hostvars[item]['df'] %}
          {{ m.filesystem }},{{ m.size }},{{ m.used }},{{ m.available }},{{ m.use_percent }},{{ m.mounted_on }}
          {% endfor %}
      loop: "{{ ansible_play_hosts }}"
      run_once: true
      delegate_to: localhost
I have output with 6 columns, and I want to delete everything up to and including the last slash ("/"), but ONLY in the first column.
What I have so far is this:
df -k | awk '{print $1}' | sed 's#.*/##'
But I don't want to use awk to cut out the first column like this; I want a way to tell sed to apply the change to the first column only.
So the original output is like this:
Filesystem kbytes used avail capacity Mounted on
/dev/dsk/c0d0s0 12324895 5082804 7118843 42% /
/devices 0 0 0 0% /devices
ctfs 0 0 0 0% /system/contract
proc 0 0 0 0% /proc
mnttab 0 0 0 0% /etc/mnttab
swap 8998420 1052 8997368 1% /etc/svc/volatile
objfs 0 0 0 0% /system/object
sharefs 0 0 0 0% /etc/dfs/sharetab
/usr/lib/libc/libc_hwcap1.so.1 12324895 5082804 7118843 42% /lib/libc.so.1
fd 0 0 0 0% /dev/fd
/dev/dsk/c0d0s3 4136995 146364 3949262 4% /var
swap 9145604 148236 8997368 2% /tmp
swap 8997400 32 8997368 1% /var/run
And I want the first column to look like this:
Filesystem
c0d0s0
devices
ctfs
proc
mnttab
swap
objfs
sharefs
libc_hwcap1.so.1
fd
c0d0s3
swap
swap
$ awk '{sub(/.*\//,"",$1)}1' file
Filesystem kbytes used avail capacity Mounted on
c0d0s0 12324895 5082804 7118843 42% /
devices 0 0 0 0% /devices
ctfs 0 0 0 0% /system/contract
proc 0 0 0 0% /proc
mnttab 0 0 0 0% /etc/mnttab
swap 8998420 1052 8997368 1% /etc/svc/volatile
objfs 0 0 0 0% /system/object
sharefs 0 0 0 0% /etc/dfs/sharetab
libc_hwcap1.so.1 12324895 5082804 7118843 42% /lib/libc.so.1
fd 0 0 0 0% /dev/fd
c0d0s3 4136995 146364 3949262 4% /var
swap 9145604 148236 8997368 2% /tmp
swap 8997400 32 8997368 1% /var/run
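To realign the columns as well, join the header words "Mounted on" and pipe the result through column -t: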
$ awk 'NR==1{sub(/Mounted on/,"Mounted_on")} {sub(/.*\//,"",$1)}1' file | column -t
Filesystem kbytes used avail capacity Mounted_on
c0d0s0 12324895 5082804 7118843 42% /
devices 0 0 0 0% /devices
ctfs 0 0 0 0% /system/contract
proc 0 0 0 0% /proc
mnttab 0 0 0 0% /etc/mnttab
swap 8998420 1052 8997368 1% /etc/svc/volatile
objfs 0 0 0 0% /system/object
sharefs 0 0 0 0% /etc/dfs/sharetab
libc_hwcap1.so.1 12324895 5082804 7118843 42% /lib/libc.so.1
fd 0 0 0 0% /dev/fd
c0d0s3 4136995 146364 3949262 4% /var
swap 9145604 148236 8997368 2% /tmp
swap 8997400 32 8997368 1% /var/run
Just split the first field on "/" and, whenever the original first field occurs at the start of the line, replace it with the last of these pieces:
awk '{n=split($1,a,"/"); gsub("^"$1,a[n])}1' file
Test
$ awk '{n=split($1,a,"/"); gsub("^"$1,a[n])}1' file
Filesystem kbytes used avail capacity Mounted on
c0d0s0 12324895 5082804 7118843 42% /
devices 0 0 0 0% /devices
ctfs 0 0 0 0% /system/contract
proc 0 0 0 0% /proc
mnttab 0 0 0 0% /etc/mnttab
swap 8998420 1052 8997368 1% /etc/svc/volatile
objfs 0 0 0 0% /system/object
sharefs 0 0 0 0% /etc/dfs/sharetab
libc_hwcap1.so.1 12324895 5082804 7118843 42% /lib/libc.so.1
fd 0 0 0 0% /dev/fd
c0d0s3 4136995 146364 3949262 4% /var
swap 9145604 148236 8997368 2% /tmp
swap 8997400 32 8997368 1% /var/run
Note that awk '{n=split($1,a,"/"); $1=a[n]}1' would also work, but the column alignment would be lost, because awk rebuilds the whole record (joining the fields with single spaces) whenever you modify one of its fields.
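The same idea (change only the first column and leave the spacing of the rest of the line alone) can be sketched outside awk as well. This is a minimal Python sketch, not one of the posted solutions; the names FIRST_FIELD_PREFIX and strip_first_field_path are hypothetical, and the extra spaces in the sample lines are illustrative only. The pattern is anchored at the start of the line and uses \S, which cannot cross whitespace, so it can never reach the mount point column.

```python
import re

# Everything in the first whitespace-delimited field up to its last "/".
# \S cannot match whitespace, so later columns are never touched.
FIRST_FIELD_PREFIX = re.compile(r"^\S*/")

def strip_first_field_path(line):
    """Drop the directory part of the first column; keep the rest as is."""
    return FIRST_FIELD_PREFIX.sub("", line, count=1)

sample = [
    "Filesystem        kbytes    used   avail capacity Mounted on",
    "/dev/dsk/c0d0s0 12324895 5082804 7118843    42%   /",
    "/devices               0       0       0     0%   /devices",
    "swap             8998420    1052 8997368     1%   /etc/svc/volatile",
]
for line in sample:
    print(strip_first_field_path(line))
```

Lines whose first field contains no slash (such as the header or swap) pass through unchanged.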
df -k | awk '{print $1}' | perl -pe 's/^[\S]*\///g'
or
df -k | awk '{print $1}' | perl -lane '$F[0]=~s/.*\///g;print "@F"'
df -k|awk -F' ' '{print $1}'|sed "s/.*\///g"
sed solution
$ sed -r 's~.*/(\S+) ~\1~' file
or
$ sed -r 's~.*/(\S+)\s~\1~' file
The issue at hand looks easy, but I could not find an easy solution so far.
I've got a histogram describing the value distribution of an array of floats, which looks roughly like this:
As you can see, there is a local maximum near 0, after which the curve falls to a local minimum, rises quickly to a plateau, and finally falls to 0. I would like to detect the local minimum.
In practice, the histogram is not as smooth:
There are lots of spikes, and the local minimum may be stretched and uneven. I'm not sure how to tackle this problem.
There is little domain knowledge. The first max may even be higher than the second max. There may be spikes in any direction, values may be as low as 0.
This is a real life sample taken from 8 distinct runs. It's scaled to 0 - 10 to make it easier to understand.
0: 22% 12% 19% 17% 6% 5% 6% 5%
1: 3% 2% 1% 1% 4% 1% 4% 1%
2: 6% 2% 13% 5% 0% 2% 0% 2%
3: 62% 62% 52% 42% 2% 5% 2% 5%
4: 4% 19% 12% 28% 10% 13% 10% 13%
5: 0% 0% 3% 29% 30% 29% 30%
6: 37% 31% 37% 30%
7: 1% 7% 1% 7%
8: 6% 1% 6% 1%
9:
10:
Values rounded down. Missing values denote no occurrence of any value.
Explanation of the first line:
0: 22% the initial max
1: 3% local min
2: 6% still min
3: 62% plateau max
4: 4% second min
5: 0% 0
6: no more values
7:
8:
9:
10:
For reference, a list of the same data, this time scaled to 0 - 100 (there were no values in the 90-100 range at all). I messed up on the formatting, but it should give a rough idea.
0: 0% 0% 0% 1% 0% 0% 0% 0%
1: 0% 1% 1% 3% 0% 0% 0% 0%
2: 1% 2% 1% 3% 0% 0% 0% 0%
3: 4% 2% 3% 3% 0% 1% 0% 1%
4: 6% 1% 3% 2% 0% 0% 0% 0%
5: 2% 0% 3% 1% 0% 0% 0% 0%
6: 1% 0% 2% 0% 0% 0% 0% 0%
7: 1% 0% 1% 0% 0% 0% 0% 0%
8: 1% 0% 1% 0% 0% 0% 0% 0%
9: 1% 0% 1% 0% 1% 0% 1% 0%
10: 1% 0% 0% 0% 1% 0% 1% 0%
11: 0% 0% 0% 0% 0% 0% 0% 0%
12: 0% 0% 0% 0% 0% 0% 0% 0%
13: 0% 0% 0% 0% 0% 0% 0% 0%
14: 0% 0% 0% 0% 0% 0% 0% 0%
15: 0% 0% 0% 0% 0% 0% 0% 0%
16: 0% 0% 0% 0% 0% 0% 0% 0%
17: 0% 0% 0% 0% 0% 0% 0% 0%
18: 0% 0% 0% 0% 0% 0% 0% 0%
19: 0% 0% 0% 0% 0% 0% 0% 0%
20: 0% 0% 0% 0% 0% 0% 0% 0%
21: 0% 0% 0% 0% 0% 0% 0% 0%
22: 0% 0% 0% 0% 0% 0% 0% 0%
23: 0% 0% 0% 0% 0% 0% 0% 0%
24: 0% 0% 1% 0% 0% 0% 0% 0%
25: 0% 0% 1% 0% 0% 0% 0% 0%
26: 0% 0% 1% 0% 0% 0% 0% 0%
27: 0% 0% 1% 0% 0% 0% 0% 0%
28: 1% 0% 2% 1% 0% 0% 0% 0%
29: 3% 0% 2% 2% 0% 0% 0% 0%
30: 7% 1% 3% 2% 0% 0% 0% 0%
31: 10% 2% 4% 3% 0% 0% 0% 0%
32: 10% 3% 4% 4% 0% 0% 0% 0%
33: 6% 6% 5% 5% 0% 0% 0% 0%
34: 5% 5% 4% 4% 0% 0% 0% 0%
35: 5% 8% 6% 3% 0% 0% 0% 0%
36: 5% 10% 6% 4% 0% 0% 0% 0%
37: 5% 9% 5% 3% 0% 0% 0% 0%
38: 3% 8% 5% 5% 0% 0% 0% 0%
39: 2% 5% 5% 5% 0% 0% 0% 0%
40: 1% 4% 4% 5% 0% 1% 0% 1%
41: 1% 3% 2% 5% 0% 1% 0% 1%
42: 0% 1% 1% 4% 0% 0% 0% 0%
43: 0% 2% 0% 4% 1% 1% 1% 1%
44: 0% 1% 0% 3% 1% 1% 1% 1%
45: 0% 1% 0% 1% 0% 1% 0% 1%
46: 0% 1% 0% 1% 1% 1% 1% 1%
47: 0% 1% 0% 0% 1% 1% 1% 1%
48: 0% 1% 0% 0% 1% 1% 1% 1%
49: 0% 0% 0% 1% 1% 1% 1% 1%
50: 0% 1% 1% 1% 1% 1%
51: 0% 0% 2% 1% 2% 1%
52: 0% 1% 2% 1% 2% 1%
53: 0% 0% 4% 2% 4% 2%
54: 0% 2% 2% 2% 2%
55: 0% 2% 2% 2% 2%
56: 0% 2% 3% 2% 3%
57: 0% 2% 4% 2% 4%
58: 4% 6% 4% 6%
59: 3% 3% 3% 3%
60: 5% 5% 5% 5%
61: 5% 7% 5% 7%
62: 3% 5% 3% 5%
63: 4% 3% 4% 3%
64: 5% 2% 5% 2%
65: 3% 2% 2% 2%
66: 5% 1% 5% 1%
67: 1% 0% 1% 0%
68: 1% 0% 1% 0%
69: 0% 1% 0% 1%
70: 0% 0% 0% 0%
71: 0% 0% 0% 0%
72: 0% 0% 0% 0%
73: 0% 1% 0% 1%
74: 0% 0% 0% 0%
75: 0% 0% 0% 0%
76: 0% 1% 0% 1%
77: 0% 0% 0% 0%
78: 0% 0% 0% 0%
79: 0% 0% 0% 0%
80: 0% 0% 0% 1%
81: 0% 0% 0% 0%
82: 0% 0% 0% 0%
83: 0% 0% 0% 0%
84: 0% 0% 0% 0%
85: 1% 1%
86: 0% 0%
87: 1% 1%
88: 1% 1%
89: 0% 0%
Your "true" histogram is low frequency. Your noise is high frequency. Low-pass filtering the data with an appropriate bandwidth filter will get rid of most of the noise.
Here's an algorithm:
1. Smooth your data set by calculating a moving average for a small window.
2. Test your smoothed data for local minima (i.e. any single datum that is smaller than its neighbours).
3. If there are more than two local minima, increase the window size and go to step 1.
Update:
Having looked at the sample data you posted, I've realised that you need to detect minimal plateaus rather than just individual points, so step two in the algorithm should be tweaked to identify a point as part of a minimum if there are no neighbours with smaller values between the nearest higher value neighbours on either side. Then when counting minima in step 3, a minimal plateau should count as a single minimum.
I've tested this algorithm on your example datasets and it performs well, picking minima at 18, 12, 15, 13, 23, 20, 23 and 20 for your datasets respectively.
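The steps above, including the plateau tweak, could be sketched roughly as follows. This is only a minimal Python sketch under stated assumptions, not the code used for the results quoted above, and it is not claimed to reproduce those minima. The function names, the centered window truncated at the edges, the start window of 3 and the growth by 2 per round are all assumptions; it also assumes integer bin counts, so that equal smoothed values compare equal.

```python
def moving_average(values, window):
    """Centered moving average; the window is truncated at both ends."""
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        chunk = values[max(0, i - half): i + half + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed

def plateau_minima(values):
    """Middle index of every run of equal values that has a strictly
    higher neighbour on both sides (a single point is a run of one)."""
    minima = []
    i, n = 1, len(values)
    while i < n - 1:
        if values[i] < values[i - 1]:
            j = i
            while j + 1 < n and values[j + 1] == values[i]:
                j += 1  # extend the flat run to the right
            if j + 1 < n and values[j + 1] > values[i]:
                minima.append((i + j) // 2)  # whole plateau counts once
            i = j + 1
        else:
            i += 1
    return minima

def find_minima(values, max_minima=2, start_window=3):
    """Grow the window until at most `max_minima` minima remain."""
    window = start_window
    while True:
        minima = plateau_minima(moving_average(values, window))
        if len(minima) <= max_minima or window >= len(values):
            return minima, window
        window += 2

# First column of the 0-10 table from the question: 22%, 3%, 6%, 62%, 4%, 0%.
print(find_minima([22, 3, 6, 62, 4, 0]))
```

For float input, comparing with a small tolerance instead of == would be the safer choice when detecting plateaus.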
A possible heuristic: use a spline approximation to smooth the histogram so that it becomes polynomial-like, and then look for a local minimum.
Note that this is only a heuristic and might fail, but I think it will provide a good solution in most cases.
This actually sounds rather like histogram-based image segmentation to me (although this is not an image, so it's really just histogram segmentation). Sounds weird, but bear with me.
Is what's important about the minimum the fact that it's a minimum, or that it divides the small maximum from the large maximum? If it's the fact that it divides the maxima, then segmentation is definitely what you want.
Have a look at K-means clustering. You'd have two clusters. It's not a terribly complicated procedure, but Wikipedia (and other sources) do a much better job of explaining it than I could, so I'll leave it to them.
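For a one-dimensional histogram, two-cluster K-means is short enough to sketch directly. The following is a minimal Python sketch of Lloyd's algorithm with k = 2, weighting each bin by its count. The function name two_means_threshold, the choice of the lowest and highest occupied bins as starting centroids, and the use of the midpoint between the final centroids as the dividing point are assumptions for illustration. It requires at least one non-empty bin.

```python
def two_means_threshold(bins, counts, iterations=100):
    """Weighted 1-D k-means with k=2 over histogram bins.
    Returns the midpoint between the two final centroids."""
    occupied = [b for b, c in zip(bins, counts) if c > 0]
    # Assumption: start from the lowest and highest occupied bins.
    lo, hi = float(min(occupied)), float(max(occupied))
    for _ in range(iterations):
        sums = [0.0, 0.0]
        weights = [0.0, 0.0]
        for b, c in zip(bins, counts):
            # Assign each bin to the nearer centroid (ties go to the lower).
            k = 0 if abs(b - lo) <= abs(b - hi) else 1
            sums[k] += b * c
            weights[k] += c
        new_lo = sums[0] / weights[0] if weights[0] else lo
        new_hi = sums[1] / weights[1] if weights[1] else hi
        if (new_lo, new_hi) == (lo, hi):
            break  # assignments are stable
        lo, hi = new_lo, new_hi
    return (lo + hi) / 2

# First column of the 0-10 table from the question.
threshold = two_means_threshold(range(6), [22, 3, 6, 62, 4, 0])
print(threshold)  # falls between bins 1 and 2 for this sample
```

Unlike a search for the minimum itself, this returns a boundary that separates the small peak from the large one, which is the segmentation view described above.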