Black is not honoring #fmt: skip - python-black

With the configuration below in pyproject.toml:
[tool.black]
# How many characters per line to allow.
line-length = 120
# When processing Jupyter Notebooks, add the given magic to the list of known
# python-magics (timeit, prun, capture, pypy, python3, python, time).
# Useful for formatting cells with custom python magics.
# python-cell-magics =
# Require a specific version of Black to be running
# (useful for unifying results across many environments e.g. with a pyproject.toml file).
# It can be either a major version number or an exact version.
# required-version =
# A regular expression that matches files and directories that should be
# included on recursive searches. An empty value means all files are included
# regardless of the name. Use forward slashes for directories on all platforms (Windows, too).
# Exclusions are calculated first, inclusions later.
# include = "(\.pyi?|\.ipynb)$"
# A regular expression that matches files and directories that should be
# excluded on recursive searches. An empty value means no paths are excluded.
# Use forward slashes for directories on all platforms (Windows, too).
# Exclusions are calculated first, inclusions later.
# exclude = "/(\.direnv|\.eggs|\.git|\.hg|\.mypy_cache|\.nox|\.tox|\.venv|venv|\.svn|\.ipynb_checkpoints|_build|buck-out|build|dist|__pypackages__)/"
# Like 'exclude', but adds additional files and directories on top of the excluded ones.
# (Useful if you simply want to add to the default).
# extend-exclude =
# Like 'exclude', but files and directories matching this regex will be excluded
# even when they are passed explicitly as arguments.
# force-exclude =
# The name of the file when passing it through stdin.
# Useful to make sure Black will respect 'force-exclude' option on some editors that rely on using stdin.
# stdin-filename =
# Number of parallel workers.
# Can be a number or a range.
# workers =
and this command line:
black --config "pyproject.toml" --target-version py39 --check --diff .
the following line of code is flagged:
ave_quantity = self.exec_math(math_iterable["mean"], "mean", []) # execute the "mean" fxn on the dataset # cspell: disable-line # fmt: skip
--- properties/datasets/models.py 2022-11-30 00:01:16.590743 +0000
+++ properties/datasets/models.py 2022-11-30 00:01:18.692767 +0000
@@ -746,11 +746,13 @@
         calculate the mean value of all the dataset points
         return: numerical value of this function when all variables are zero
         rtype: float
         """
-        ave_quantity = self.exec_math(math_iterable["mean"], "mean", [])  # execute the "mean" fxn on the dataset  # fmt:skip
+        ave_quantity = self.exec_math(
+            math_iterable["mean"], "mean", []
+        )  # execute the "mean" fxn on the dataset  # fmt:skip
         return getattr(ave_quantity, "magnitude", 0.0)
 
     def serialize(self, flat=False):
         return {
             "type": "dataset",
would reformat properties/datasets/models.py
Oh no! 💥 💔 💥
1 file would be reformatted, 102 files would be left unchanged.
What am I missing here?
Using black v22.10.0
Also asked here --> https://github.com/psf/black/issues/451#issuecomment-1331478945
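For anyone trying to reproduce this in isolation, Black's Python API makes a quick check easy. Below is a minimal sketch; the source string and line length are made up for the demonstration, not taken from the project above:
import black

# A long line carrying a trailing "# fmt: skip" pseudo-comment (hypothetical code).
SRC = 'x = some_call(argument_one, argument_two)  # a note  # fmt: skip\n'

mode = black.Mode(line_length=20)  # deliberately force the line over the limit
try:
    black.format_file_contents(SRC, fast=True, mode=mode)
    print("Black would reformat the line (fmt: skip not honored)")
except black.NothingChanged:
    print("Black left the line alone (fmt: skip honored)")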

Related

How can I use the ruamel.yaml rtsc mode?

I've been working on creating a YAML re-formatter based on ruamel.yaml (which you can see here).
I'm currently using version 0.17.20.
Cleaning up comments and whitespace has been difficult. I want to:
ensure there is only one space before the # for EOL comments
align full line comments with the key or item immediately following
remove duplicate blank lines so there is at most one blank line
To get closer to achieving that, I have a custom Emitter class where I extend write_comment to adjust the comments just before writing with super().write_comment(...). However, the Emitter does not know about which key or item comes next because comments are generally attached as post comments.
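For reference, an Emitter override of the kind described above looks roughly like this; a minimal sketch assuming ruamel.yaml 0.17's Emitter API (the class name and the column-zero adjustment are illustrative):
import sys
import ruamel.yaml
from ruamel.yaml.emitter import Emitter

class CleaningEmitter(Emitter):
    def write_comment(self, comment, pre=False):
        # adjust the comment just before it is written; column 0 makes the
        # emitter fall back to placing it one space after the value
        comment.column = 0
        return super().write_comment(comment, pre=pre)

yaml = ruamel.yaml.YAML()
yaml.Emitter = CleaningEmitter
data = yaml.load("a: 1      # too far to the right\n")
yaml.dump(data, sys.stdout)  # prints: a: 1 # too far to the right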
As I've studied the ruamel.yaml code to figure out how to do this, I found the rtsc mode (Round Trip Split Comments) which looks fantastic because it separates EOLComment, BlankLineComment and FullLineComment instead of lumping them together.
From what I can tell, the Parser and Scanner have been adjusted to capture the comments. So, loading is (mostly?) implemented with this "NEWCMNT" implementation. But Emitter.write_comment expects CommentToken instead of comment line numbers, so dumping does not work yet.
If I update my Emitter.write_comment method, is that enough to finish dumping? Or what else might be necessary? In one of my tries, I ran into a sys.exit in ScannedComments.assign_eol() - what else is needed to finish that?
PS: I wouldn't normally ask how to collaborate on StackOverflow, but this is not a bug report or a feature request, and I'm trying/failing to use a new (undocumented) feature, so I'm filing this here instead of sourceforge.
rtsc is work in progress, i.e. work started but unfinished. Its internals will almost certainly change.
Two of the three points you indicate can be implemented relatively easily:
set the column of each comment to 0 (by recursively going over a loaded data structure, similar to here); if the column is before the position of the end of the value on a line, you'll get one space between the value and the comment
while doing the recursion from the previous point, take each comment value and do something like:
value = '\n'.join(line.strip() for line in value.splitlines())
while '\n\n\n' in value:
    value = value.replace('\n\n\n', '\n\n')
Indenting to the following element is difficult; it depends on the data structure etc. Given that these are full line comments, I suggest you do some postprocessing of the YAML document you generate: find a full line comment, then gather full line comments until the next line is not a full line comment (i.e. some "real" YAML). Since full line comments are in column 0 once the previous steps are applied, you don't have to track whether you are in a (multi-line) literal or folded scalar string where one of the lines happens to start with #. Then determine the number of spaces before the real YAML and apply these to the full line comments.
import sys
import ruamel.yaml

yaml_str = """\
# the following is an example YAML doc
a:
- b: 42
    # collapse multiple empty lines


  c: |
    # this is not a comment
    it is the first line of a block style literal scalar
    processing this gobbles a newline which doesn't go into a comment
  # that is unless you have a (dedented) comment directly following
  d: 42     # and some non-full line comment
  e:    # another one
        # and some more comments to align
  f: glitter in the dark near the Tannhäuser gate
"""

def redo_comments(d):
    def do_one(comment):
        if not comment:
            return
        comment.column = 0
        value = '\n'.join(line.strip() for line in comment.value.splitlines()) + '\n'
        while '\n\n\n' in value:
            value = value.replace('\n\n\n', '\n\n')
        comment.value = value

    def do_values(v):
        for x in v:
            for comment in x:
                do_one(comment)

    def do_loc(v):
        if v is None:
            return
        do_one(v[0])
        if not v[1]:
            return
        for comment in v[1]:
            do_one(comment)

    if isinstance(d, dict):
        do_loc(d.ca.comment)
        do_values(d.ca.items.values())
        for val in d.values():
            redo_comments(val)
    elif isinstance(d, list):
        do_values(d.ca.items.values())
        for elem in d:
            redo_comments(elem)

def realign_full_line_comments(s):
    res = []
    buf = []
    for line in s.splitlines(True):
        if not buf:
            if line and line[0] == '#':
                buf.append(line)
            else:
                res.append(line)
        else:
            if line[0] in '#\n':
                buf.append(line)
            else:
                # YAML line, determine indent
                count = 0
                while line[count] == ' ':
                    count += 1
                    if count > len(line):
                        break  # superfluous?
                indent = ' ' * count
                for cline in buf:
                    if cline[0] == '\n':  # empty
                        res.append(cline)
                    else:
                        res.append(indent + cline)
                buf = []
                res.append(line)
    return ''.join(res)

yaml = ruamel.yaml.YAML()
# yaml.indent(mapping=4, sequence=4, offset=2)
# yaml.preserve_quotes = True
data = yaml.load(yaml_str)
redo_comments(data)
yaml.dump(data, sys.stdout, transform=realign_full_line_comments)
which gives:
# the following is an example YAML doc
a:
- b: 42
  # collapse multiple empty lines

  c: |
    # this is not a comment
    it is the first line of a block style literal scalar
    processing this gobbles a newline which doesn't go into a comment
  # that is unless you have a (dedented) comment directly following
  d: 42 # and some non-full line comment
  e: # another one
  # and some more comments to align
  f: glitter in the dark near the Tannhäuser gate

Sort two text files with its indented text aligned to it

I would like to compare two of my log files, generated before and after an implementation, to see if it has impacted anything. However, the order of the logs I get is not the same all the time. Since the log file also has multiple indented lines, when I tried to sort, everything got sorted; but I would like to keep each child intact with its parent. Indented lines use spaces, not tabs.
Any help would be greatly appreciated. I am fine with either a Windows solution or a Linux one.
Eg of the file:
#This is a sample code
Parent1 to be verified
    Child1 to be verified
    Child2 to be verified
        Child21 to be verified
        Child23 to be verified
        Child22 to be verified
            Child221 to be verified
    Child4 to be verified
    Child5 to be verified
        Child53 to be verified
        Child52 to be verified
            Child522 to be verified
            Child521 to be verified
    Child3 to be verified
I am posting another answer here to sort it hierarchically, using Python.
The idea is to attach the parents to the children to make sure that the children under the same parent are sorted together.
See the python script below:
"""Attach parent to children in an indentation-structured text"""
from typing import Tuple, List
import sys
# A unique separator to separate the parent and child in each line
SEPARATOR = '#'
# The indentation
INDENT = ' '
def parse_line(line: str) -> Tuple[int, str]:
"""Parse a line into indentation level and its content
with indentation stripped
Args:
line (str): One of the lines from the input file, with newline ending
Returns:
Tuple[int, str]: The indentation level and the content with
indentation stripped.
Raises:
ValueError: If the line is incorrectly indented.
"""
# strip the leading white spaces
lstripped_line = line.lstrip()
# get the indentation
indent = line[:-len(lstripped_line)]
# Let's check if the indentation is correct
# meaning it should be N * INDENT
n = len(indent) // len(INDENT)
if INDENT * n != indent:
raise ValueError(f"Wrong indentation of line: {line}")
return n, lstripped_line.rstrip('\r\n')
def format_text(txtfile: str) -> List[str]:
"""Format the text file by attaching the parent to it children
Args:
txtfile (str): The text file
Returns:
List[str]: A list of formatted lines
"""
formatted = []
par_indent = par_line = None
with open(txtfile) as ftxt:
for line in ftxt:
# get the indentation level and line without indentation
indent, line_noindent = parse_line(line)
# level 1 parents
if indent == 0:
par_indent = indent
par_line = line_noindent
formatted.append(line_noindent)
# children
elif indent > par_indent:
formatted.append(par_line +
SEPARATOR * (indent - par_indent) +
line_noindent)
par_indent = indent
par_line = par_line + SEPARATOR + line_noindent
# siblings or dedentation
else:
# We just need first `indent` parts of parent line as our prefix
prefix = SEPARATOR.join(par_line.split(SEPARATOR)[:indent])
formatted.append(prefix + SEPARATOR + line_noindent)
par_indent = indent
par_line = prefix + SEPARATOR + line_noindent
return formatted
def sort_and_revert(lines: List[str]):
"""Sort the formatted lines and revert the leading parents
into indentations
Args:
lines (List[str]): list of formatted lines
Prints:
The sorted and reverted lines
"""
sorted_lines = sorted(lines)
for line in sorted_lines:
if SEPARATOR not in line:
print(line)
else:
leading, _, orig_line = line.rpartition(SEPARATOR)
print(INDENT * (leading.count(SEPARATOR) + 1) + orig_line)
def main():
"""Main entry"""
if len(sys.argv) < 2:
print(f"Usage: {sys.argv[0]} <file>")
sys.exit(1)
formatted = format_text(sys.argv[1])
sort_and_revert(formatted)
if __name__ == "__main__":
main()
Let's save it as format.py, and we have a test file, say test.txt:
parent2
    child2-1
        child2-1-1
    child2-2
parent1
    child1-2
        child1-2-2
        child1-2-1
    child1-1
Let's test it:
$ python format.py test.txt
parent1
    child1-1
    child1-2
        child1-2-1
        child1-2-2
parent2
    child2-1
        child2-1-1
    child2-2
If you wonder how the format_text function formats the text, here are the intermediate results, which also explain why we could make the file sort as we wanted:
parent2
parent2#child2-1
parent2#child2-1#child2-1-1
parent2#child2-2
parent1
parent1#child1-2
parent1#child1-2#child1-2-2
parent1#child1-2#child1-2-1
parent1#child1-1
You may see that each child has its parents attached, all the way up to the root, so that the children under the same parent are sorted together.
Short answer (Linux solution):
sed ':a;N;$!ba;s/\n /#/g' test.txt | sort | sed ':a;N;$!ba;s/#/\n /g'
Test it out:
test.txt
parent2
    child2-1
        child2-1-1
    child2-2
parent1
    child1-1
    child1-2
        child1-2-1
$ sed ':a;N;$!ba;s/\n /#/g' test.txt | sort | sed ':a;N;$!ba;s/#/\n /g'
parent1
    child1-1
    child1-2
        child1-2-1
parent2
    child2-1
        child2-1-1
    child2-2
Explanation:
The idea is to replace each newline followed by an indentation/space with a non-newline character, which has to be unique in your file (here I used #; if it is not unique in your file, use another character or even a string), because we need to turn it back into the newline and indentation/space later.
About the sed command:
:a          create a label 'a'
N           append the next line to the pattern space
$!ba        if not the last line, branch (go to) label 'a'
s/\n /#/g   substitute the regex /\n / (newline followed by a space) with '#',
            a unique character to replace the newline and space (if it is not
            unique in your file, use other characters or even a string);
            the /g flag makes the match global (as many times as it can)
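If sed is not available, the same join-sort-split trick is easy to write in Python; a minimal sketch under the same assumptions (space indentation, '#' not occurring in the file, and a hypothetical input file name):
# Glue indented lines onto their parent with '#', sort the resulting
# blocks as units, then split them back into indented lines.
with open("test.txt") as f:
    text = f.read()

blocks = text.replace("\n ", "#").splitlines()  # '\n ' -> '#': attach children
blocks.sort()                                   # whole blocks sort together
print("\n".join(blocks).replace("#", "\n "))    # '#' -> '\n ': restore indents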

Improving an algorithm for substring search when reading ZIP files

So I have a ZIP reader library, and I read ZIP files by first figuring out where the EOCD record is (the standard way "from the tail"). I have to look for a pattern that is roughly this:
4byte_magic_number, fixed_n_bytes, 2_bytes_of_comment_size, comment
The byte size of the comment is provided in the 2_bytes_of_comment_size field. Just scanning for the magic number is insufficient, because the signature bytes can legitimately occur inside the trailing data (in the archive comment, for instance). So I eager-read a substantial portion at the tail of the file - basically the maximum size the ZIP EOCD record can be - and then look for this pattern in there.
So far, I came up with this
def locate_eocd_signature(in_str)
  # We have to scan from the _very_ tail. We read the very minimum size
  # the EOCD record can have (up to and including the comment size), using
  # a sliding window. Once our end offset matches the comment size we found our
  # EOCD marker.
  eocd_signature_int = 0x06054b50
  unpack_pattern = 'VvvvvVVv'
  minimum_record_size = 22
  end_location = minimum_record_size * -1
  loop do
    # If the window is nil, we have rolled off the start of the string, nothing to do here.
    # We use negative values because if we used positive slice indices
    # we would have to detect the rollover ourselves
    break unless window = in_str[end_location, minimum_record_size]
    window_location = in_str.bytesize + end_location
    unpacked = window.unpack(unpack_pattern)
    # If we found the signature, pick up the comment size, and check if the size of the window
    # plus that comment size is where we are in the string. If we are - bingo.
    if unpacked[0] == eocd_signature_int && comment_size = unpacked[-1]
      assumed_eocd_location = in_str.bytesize - comment_size - minimum_record_size
      # if the comment size is where we should be at - we found our EOCD
      return assumed_eocd_location if assumed_eocd_location == window_location
    end
    end_location -= 1 # Shift the window back, by one byte, and try again.
  end
end
but it just screams "ugly" at me. Is there a better way to do something like this? Is there a pack specifier that says "all the bytes in binary until the end of the string" that I do not know of? Then I could tack that onto the end of the pack specifier, for example... A bit at a loss here.
In the end I opted for the following optimization. First, I made a method for finding all the indices of a given substring in a string - there is no stdlib builtin for this.
def all_indices_of_substr_in_str(of_substring, in_string)
  last_i = 0
  found_at_indices = []
  while last_i = in_string.index(of_substring, last_i)
    found_at_indices << last_i
    last_i += of_substring.bytesize
  end
  found_at_indices
end
Then, we use it to "latch" onto the offsets in our buffer where our signature was found.
def locate_eocd_signature(in_str)
  eocd_signature = 0x06054b50
  eocd_signature_str = [eocd_signature].pack('V')
  unpack_pattern = 'VvvvvVVv'
  minimum_record_size = 22
  str_size = in_str.bytesize
  indices = all_indices_of_substr_in_str(eocd_signature_str, in_str)
  indices.each do |check_at|
    maybe_record = in_str[check_at..str_size]
    # If the record is smaller than the minimum - we will never recover anything
    break if maybe_record.bytesize < minimum_record_size
    # Now we check if the record ends with the combination
    # of the comment size and an arbitrary byte string of that size.
    # If it does - we found our match
    *_unused, comment_size = maybe_record.unpack(unpack_pattern)
    if (maybe_record.bytesize - minimum_record_size) == comment_size
      return check_at # Found the EOCD marker location
    end
  end
  # If we haven't caught anything, return nil deliberately instead of returning the last statement
  nil
end
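For comparison, the same latch-onto-the-signature idea translates directly to Python; a sketch (not the library's actual code), assuming buf already holds the eagerly-read tail of the file:
import struct

EOCD_SIG = b"PK\x05\x06"  # 0x06054b50, little-endian
MIN_EOCD_SIZE = 22        # fixed part of the EOCD record, incl. comment size

def locate_eocd_signature(buf: bytes):
    at = buf.find(EOCD_SIG)
    while at != -1:
        record = buf[at:]
        if len(record) < MIN_EOCD_SIZE:
            return None  # too close to the end to hold a full record
        # the comment size is the last uint16 of the fixed part
        (comment_size,) = struct.unpack_from("<H", record, MIN_EOCD_SIZE - 2)
        # accept only if the comment exactly fills the remaining bytes
        if len(record) - MIN_EOCD_SIZE == comment_size:
            return at
        at = buf.find(EOCD_SIG, at + 1)
    return None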

Geany: Syntax highlighting for custom filetype for SOME words

Geany is a simple, fast and yet powerful text editor.
It has quite strong support for syntax highlighting for almost all kinds
of programming languages.
I was wondering how to create customized syntax highlighting for a program I need, called "Phosim", whose files have the extension .cat.
So far I have done this:
First I created the filetype extension configuration file: ~/.config/geany/filetype_extensions.conf
The contents of this looks like this:
[Extensions]
Gnuplot=*.gp;*.gnu;*.plt;
Galfit=*.gal;
Phosim=*.cat;
[Groups]
Script=Gnuplot;Galfit;Phosim;
Here, I am trying to apply custom highlighting to the programs Gnuplot, Galfit, and Phosim. For Gnuplot and Galfit it works fine, but for Phosim I got some problems.
Then I created the file definition configuration file: ~/.config/geany/filedefs/filetypes.Phosim.conf
The contents of which looks like this:
# Author : Bhishan Poudel
# Date : May 24, 2016
# Version : 1.0
[styling]
# Edit these in the colorscheme .conf file instead
default=default
comment=comment_line
function=keyword_1
variable=string_1,bold
label=label
userdefined=string_2
number=number_2
[keywords]
# all items must be in one line separated by space
variables=object Unrefracted_RA_deg SIM_SEED none
functions=
lables=10
userdefined=angle 30 Angle_RA 20.0 none
numbers=0 1 2 3 4 5 6 7 8 9
[lexer_properties]
nsis.uservars=1
nsis.ignorecase=1
[settings]
# default extension used when saving files
extension=cat
# single comments, like # in this file
comment_single=#
# multiline comments
#comment_open=
#comment_close=
# This setting works only for single line comments
comment_use_indent=true
# context action command (please see Geany's main documentation for details)
context_action_cmd=
# lexer filetype should be an existing lexer that does not use lexer_filetype itself
lexer_filetype=NSIS
[build-menu]
EX_00_LB=Execute
EX_00_CM=
EX_00_WD=
FT_00_LB=
FT_00_CM=
FT_00_WD=
FT_02_LB=
FT_02_CM=
FT_02_WD=
Now my example.cat looks like this:
# example.cat
angle 30
Angle_RA 20.0
object none
# Till now,
# Words highlighted : angle 30 object none
# Words not highlighted: Angle_RA 20.0
# I like them also to be highlighted!
I got syntax highlighting for only two words, viz., object and none.
I tried setting the styling to Fortran, since it uses uppercase letters, but that also did not work.
How can we get syntax highlighting for variable names which contain uppercase letters, lowercase letters, and underscores?
For example:
I got syntax highlighting for the words: object none.
But I did not get syntax highlighting for the words: Angle_RA 20.0
Also, my numbers 0,1,...,9 are highlighted, but decimals are not. How can we highlight decimals too?
For example:
I got syntax highlighting for the words: 1 1000, but did not get syntax highlighting for the words: 49552.3 180.0
Some useful links are following:
Make Geany recognize additional file extensions
Custom syntax highlighting in Geany
http://www.geany.org/manual/current/index.html#custom-filetypes
http://www.geany.org/manual/#lexer-filetype
Instead of creating new file definition files, I added the file extensions to the Python filetype, and it worked for me.
For example, I wanted custom highlighting for files with the extension .icat (if you are interested, this is an instance catalog file for the Phosim software in astronomy).
Drawback: the additional words are also highlighted in Python scripts (.py, .pyc, .ipy).
Note: if anybody posts a solution that works with a new file definition, ~/.config/geany/filedefs/filetypes.Phosim.conf, I would heartily welcome that.
My example.pcat file looks like this:
# example.pcat
Unrefracted_RA_deg 0
Unrefracted_Dec_deg 0
Unrefracted_Azimuth 0
Unrefracted_Altitude 89
Slalib_date 1994/7/19/0.298822999997
Opsim_rotskypos 0
Opsim_rottelpos 0
Opsim_moondec -90
Opsim_moonra 180
Opsim_expmjd 49552.3
Opsim_moonalt -90
Opsim_sunalt -90
Opsim_filter 2
Opsim_dist2moon 180.0
Opsim_moonphase 10.0
Opsim_obshistid 99999999
Opsim_rawseeing 0.65
SIM_SEED 1000
SIM_MINSOURCE 1
SIM_TELCONFIG 0
SIM_CAMCONFIG 1
SIM_VISTIME 15000.0
SIM_NSNAP 1
object 0 0.0 0.0 20 ../sky/sed_flat.txt 0 0 0 0 0 0 bhishan.fits 0.09 0.0 none
I want Geany to highlight all first words with yellow color, numbers with magenta, and the word 'none' with blue color.
First I created (or edited, if it already exists) the file:
~/.config/geany/filetype_extensions.conf
And added following stuff inside it.
[Extensions]
Gnuplot=*.gp;*.gnu;*.plt;
Galfit=*.gal;
Phosim=*.pcat;
Python=*.py;*.pyc;*.ipy;*.icat;*.pcat
[Groups]
Script=Gnuplot;Galfit;Phosim;Python;
Then, I added the additional KEYWORDS to the already existing keywords in the Python filetype.
For this I created (or edited, if it already exists) the file:
~/.config/geany/filedefs/filetypes.python
Now, the file ~/.config/geany/filedefs/filetypes.python looks like this:
# Author : Bhishan Poudel
# Date : June 9, 2016
# Version : 1.0
# File : Filetype for both python and phosim_instance_catalogs
[styling]
default=default
commentline=comment_line
number=number_1
string=string_1
character=character
word=keyword_1
triple=string_2
tripledouble=string_2
classname=type
defname=function
operator=operator
identifier=identifier_1
commentblock=comment
stringeol=string_eol
word2=keyword_2
decorator=decorator
[keywords]
# all items must be in one line
primary=and as assert break class continue def del elif else except exec finally for from global if import in is lambda not or pass print raise return try while with yield False None True Words_after_this_are_for_Phosim_pcat_files Unrefracted_RA_deg Unrefracted_Dec_deg Unrefracted_Azimuth Unrefracted_Altitude Slalib_date Opsim_moondec Opsim_rotskypos Opsim_rottelpos Opsim_moondec Opsim_moonra Opsim_expmjd Opsim_moonalt Opsim_sunalt Opsim_filter Opsim_dist2moon Opsim_moonphase Opsim_obshistid Opsim_rawseeing SIM_SEED SIM_MINSOURCE SIM_TELCONFIG SIM_CAMCONFIG SIM_VISTIME SIM_NSNAP object
identifiers=ArithmeticError AssertionError AttributeError BaseException BufferError BytesWarning DeprecationWarning EOFError Ellipsis EnvironmentError Exception FileNotFoundError FloatingPointError FutureWarning GeneratorExit IOError ImportError ImportWarning IndentationError IndexError KeyError KeyboardInterrupt LookupError MemoryError NameError NotImplemented NotImplementedError OSError OverflowError PendingDeprecationWarning ReferenceError RuntimeError RuntimeWarning StandardError StopIteration SyntaxError SyntaxWarning SystemError SystemExit TabError TypeError UnboundLocalError UnicodeDecodeError UnicodeEncodeError UnicodeError UnicodeTranslateError UnicodeWarning UserWarning ValueError Warning ZeroDivisionError __debug__ __doc__ __import__ __name__ __package__ abs all any apply basestring bin bool buffer bytearray bytes callable chr classmethod cmp coerce compile complex copyright credits delattr dict dir divmod enumerate eval execfile exit file filter float format frozenset getattr globals hasattr hash help hex id input int intern isinstance issubclass iter len license list locals long map max memoryview min next object oct open ord pow print property quit range raw_input reduce reload repr reversed round set setattr slice sorted staticmethod str sum super tuple type unichr unicode vars xrange zip array arange Catagorical cStringIO DataFramedate_range genfromtxt linspace loadtxt matplotlib none numpy np pandas pd plot plt pyplot savefig scipy Series sp StringIO
[lexer_properties]
fold.comment.python=1
fold.quotes.python=1
[settings]
# default extension used when saving files
extension=py
# the following characters are these which a "word" can contains, see documentation
wordchars=_abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789
# MIME type
mime_type=text/x-python
comment_single=#
comment_open="""
comment_close="""
comment_use_indent=true
# context action command (please see Geany's main documentation for details)
context_action_cmd=
[indentation]
width=4
# 0 is spaces, 1 is tabs, 2 is tab & spaces
type=0
[build_settings]
# %f will be replaced by the complete filename
# %e will be replaced by the filename without extension
# (use only one of it at one time)
compiler=python -m py_compile "%f"
run_cmd=python "%f"
[build-menu]
FT_00_LB=Execute
FT_00_CM=python %f
FT_00_WD=
FT_01_LB=
FT_01_CM=
FT_01_WD=
FT_02_LB=
FT_02_CM=
FT_02_WD=
EX_00_LB=Execute
EX_00_CM=clear; python %f
EX_00_WD=
error_regex=([^:]+):([0-9]+):([0-9:]+)? .*
EX_01_LB=
EX_01_CM=
EX_01_WD=
Now, I restarted Geany and I can see all the first words in yellow, the numbers in another color, and the word 'none' in blue.

manipulating array: adding number of occurrences to the duplicate elements

(You are welcome to change the title to a more appropriate one!)
I got another Ruby/ERB question. I have this file:
ec2-23-22-59-32, mongoc, i-b8b44, instnum=1, Running
ec2-54-27-11-46, mongod, i-43f9f, instnum=2, Running
ec2-78-62-192-20, mongod, i-02fa4, instnum=3, Running
ec2-24-47-51-23, mongos, i-546c4, instnum=4, Running
ec2-72-95-64-22, mongos, i-5d634, instnum=5, Running
ec2-27-22-219-75, mongoc, i-02fa6, instnum=6, Running
And I can process the file to create an array like this:
irb(main):007:0> open(inFile).each { |ln| puts ln.split(',').map(&:strip)[0..1] }
ec2-23-22-59-32
mongoc
ec2-54-27-11-46
mongod
....
....
But what I really want is the occurrence number concatenated to the "mongo-type" so that it becomes:
ec2-23-22-59-32
mongoc1
ec2-54-27-11-46
mongod1
ec2-78-62-192-20
mongod2
ec2-24-47-51-23
mongos1
ec2-72-95-64-22
mongos2
ec2-27-22-219-75
mongoc2
The number of each mongo-type is not fixed and changes over time. Any help with how I can do that? Thanks in advance. Cheers!!
Quick answer (maybe could be optimized):
data = 'ec2-23-22-59-32, mongoc, i-b8b44, instnum=1, Running
ec2-54-27-11-46, mongod, i-43f9f, instnum=2, Running
ec2-78-62-192-20, mongod, i-02fa4, instnum=3, Running
ec2-24-47-51-23, mongos, i-546c4, instnum=4, Running
ec2-72-95-64-22, mongos, i-5d634, instnum=5, Running
ec2-27-22-219-75, mongoc, i-02fa6, instnum=6, Running'

# a hash where we will save mongo-type strings as keys
# and numbers of occurrences as values
mtypes = {}

data.lines.each do |ln|
  # get the first and second element of the given string into inst and mtype respectively
  inst, mtype = ln.split(',').map(&:strip)[0..1]
  # check if the mtypes hash has a key equal to the current mtype
  # if yes -> add 1 to the current number of occurrences
  # if not -> create a new key and assign 1 as its value
  # (this is an `if ? true : false` ternary operator)
  mtypes[mtype] = mtypes.has_key?(mtype) ? mtypes[mtype] + 1 : 1
  # combine an output string (everything in #{ } is a variable),
  # so #{mtype}#{mtypes[mtype]} means: take the current value of mtype and
  # place after it the current number of occurrences stored in the mtypes hash
  p "#{inst} : #{mtype}#{mtypes[mtype]}"
end
Output:
# "ec2-23-22-59-32 : mongoc1"
# "ec2-54-27-11-46 : mongod1"
# "ec2-78-62-192-20 : mongod2"
# "ec2-24-47-51-23 : mongos1"
# "ec2-72-95-64-22 : mongos2"
# "ec2-27-22-219-75 : mongoc2"
Quite straightforward, I think. If you don't understand something, let me know.
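For comparison, the same tally in Python, where a defaultdict replaces the has_key? ternary (a sketch using the sample data from the question):
from collections import defaultdict

data = """\
ec2-23-22-59-32, mongoc, i-b8b44, instnum=1, Running
ec2-54-27-11-46, mongod, i-43f9f, instnum=2, Running
ec2-78-62-192-20, mongod, i-02fa4, instnum=3, Running
ec2-24-47-51-23, mongos, i-546c4, instnum=4, Running
ec2-72-95-64-22, mongos, i-5d634, instnum=5, Running
ec2-27-22-219-75, mongoc, i-02fa6, instnum=6, Running"""

counts = defaultdict(int)  # missing keys start at 0
for line in data.splitlines():
    inst, mtype = [part.strip() for part in line.split(',')][:2]
    counts[mtype] += 1
    print(f"{inst} : {mtype}{counts[mtype]}")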
