Ruby serialport gem: who is responsible for checking parity errors?

gems
serialport (1.0.4)
Authors: Guillaume Pierronnet, Alan Stern, Daniel E. Shipton, Tobin
Richard, Hector Parra, Ryan C. Payne
Homepage: http://github.com/hparra/ruby-serialport/
Library for using RS-232 serial ports.
I am using this gem, and my device's specifications are as follows.
9600bps
7bits
1 stop bit
EVEN parity
When I receive data as below, the unpacked data still contains the parity bit.
sp = SerialPort.new("/dev/serial-device", 9600, 7, 1, SerialPort::EVEN)
data = sp.gets
data.chars.each do |char|
  puts char.unpack("B*")
end
For example, if sp receives 'a', the unpacked data is 11100001 instead of 01100001, because of the EVEN parity bit.
To convert the byte back to what it should be, I do this:
data = sp.gets # gets 11100001 for 'a' (even parity)
data.bytes.to_a.each do |byte|
  puts (byte & 127).chr
end
Now, to me, this is way too low-level. I was expecting the serialport gem to do this parity check, but as far as I can tell from its documentation, it doesn't say anything about parity checking.
Am I missing a method that is already implemented in the gem, or is my workaround above necessary because it is my responsibility to check the parity and detect errors?
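For what it's worth, the workaround can at least make error detection explicit instead of silently masking bit 7. This is only a sketch (not an API provided by the gem), and it assumes the parity bit really does arrive as the top bit of each byte, as shown above:

def strip_even_parity(byte)
  # with EVEN parity, the total number of 1 bits (7 data bits + parity bit) must be even
  raise "parity error in byte 0x%02x" % byte unless byte.to_s(2).count("1").even?
  (byte & 0x7F).chr
end

data = sp.gets
puts data.bytes.to_a.map { |b| strip_even_parity(b) }.join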

SerialPort::ODD, SerialPort::MARK, SerialPort::SPACE
(MARK and SPACE are not supported on Posix)
Raise an argError on bad argument.
SerialPort::new and SerialPort::open without a block return an
instance of SerialPort. SerialPort::open with a block passes
a SerialPort to the block and closes it when the block exits
(like File::open).
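For illustration (not part of the README text, just an example based on the block behaviour described above, using the question's port settings):

SerialPort.open("/dev/serial-device", 9600, 7, 1, SerialPort::EVEN) do |sp|
  puts sp.gets
  # the port is closed automatically when the block exits
end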
** Instance methods **
modem_params() -> aHash
modem_params=(aHash) -> aHash
get_modem_params() -> aHash
set_modem_params(aHash) -> aHash
set_modem_params(baudrate [, databits [, stopbits [, parity]]])
Get and set the modem parameters. Hash keys are "baud", "data_bits",
"stop_bits", and "parity" (see above).
Parameters not present in the hash or set to nil remain unchanged.
Default parameter values for the set_modem_params method are:
databits = 8, stopbits = 1, parity = (databits == 8 ?
SerialPort::NONE : SerialPort::EVEN).
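As an illustration (again not part of the README), the hash form described above could be used with the question's settings like this:

sp.modem_params = { "baud" => 9600, "data_bits" => 7,
                    "stop_bits" => 1, "parity" => SerialPort::EVEN }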
baud() -> anInteger
baud=(anInteger) -> anInteger
data_bits() -> 4, 5, 6, 7, or 8
data_bits=(anInteger) -> anInteger
stop_bits() -> 1 or 2
stop_bits=(anInteger) -> anInteger
parity() -> anInteger: SerialPort::NONE, SerialPort::EVEN,
SerialPort::ODD, SerialPort::MARK, or SerialPort::SPACE
parity=(anInteger) -> anInteger
Get and set the corresponding modem parameter.
flow_control() -> anInteger
flow_control=(anInteger) -> anInteger
Get and set the flow control: SerialPort::NONE, SerialPort::HARD,
SerialPort::SOFT, or (SerialPort::HARD | SerialPort::SOFT).
Note: SerialPort::HARD mode is not supported on all platforms.
SerialPort::HARD uses RTS/CTS handshaking; DSR/DTR is not
supported.
read_timeout() -> anInteger
read_timeout=(anInteger) -> anInteger
write_timeout() -> anInteger
write_timeout=(anInteger) -> anInteger
Get and set timeout values (in milliseconds) for reading and writing.
A negative read timeout will return all the available data without
waiting, a zero read timeout will not return until at least one
byte is available, and a positive read timeout returns when the
requested number of bytes is available or the interval between the
arrival of two bytes exceeds the timeout value.
Note: Read timeouts don't mix well with multi-threading.
Note: Under Posix, write timeouts are not implemented.
break(time) -> nil
Send a break for the given time.
time -> anInteger: tenths-of-a-second for the break.
Note: Under Posix, this value is very approximate.
signals() -> aHash
Return a hash with the state of each line status bit. Keys are
"rts", "dtr", "cts", "dsr", "dcd", and "ri".
Note: Under Windows, the rts and dtr values are not included.
rts()
dtr()
cts()
dsr()
dcd()
ri() -> 0 or 1
rts=(0 or 1)
dtr=(0 or 1) -> 0 or 1
Get and set the corresponding line status bit.
Note: Under Windows, rts() and dtr() are not implemented

Related

With ruamel.yaml how can I conditionally convert flow maps to block maps based on line length?

I'm working on a ruamel.yaml (v0.17.4) based YAML reformatter (using the RoundTrip variant to preserve comments).
I want to allow a mix of block- and flow-style maps, but in some cases, I want to convert a flow-style map to use block-style.
In particular, if the flow-style map would be longer than the max line length^, I want to convert that to a block-style map instead of wrapping the line somewhere in the middle of the flow-style map.
^ By "max line length" I mean the best_width that I configure by setting something like yaml.width = 120 where yaml is a ruamel.yaml.YAML instance.
What should I extend to achieve this? The emitter is where the line length gets calculated so wrapping can occur, but I suspect that is too late to convert between block and flow style. I'm also concerned about losing comments when I switch the styles. Here are some possible extension points; can you give me a pointer on where I'm most likely to have success with this?
Emitter.expect_flow_mapping() probably too late for converting flow->block
Serializer.serialize_node() probably too late as it consults node.flow_style
RoundTripRepresenter.represent_mapping() maybe? but this has no idea about line length
I could also walk the data before calling yaml.dump(), but this has no idea about line length.
So, where should I (and where can I) adjust the flow_style depending on whether a flow-style map would trigger line wrapping?
I think the most accurate approach, when you encounter a flow-style mapping during dumping, is to first try to emit it to a buffer, get the length of that buffer, and if that length combined with the column you are in exceeds the width, actually emit block style.
Any attempt to guesstimate the length of the output without actually trying to write that part of the tree is going to be hard, if not impossible, without doing the actual emit. Among other things, the dumping process actually dumps scalars and reads them back to make sure no quoting needs to be forced (e.g. when you dump a string that reads back like a date). It also handles single key-value pairs in a list in a special way ([1, a: 42, 3] instead of the more verbose [1, {a: 42}, 3]). So a simple calculation of the length of the scalars that are the keys and values, plus the separating commas, colons, and spaces, is not going to be precise.
A different approach is to dump your data with a large line width, parse the output, and build a set of line numbers for which the line is too long according to the width you actually want to use. After loading that output back, you can walk over the data structure recursively, inspect the .lc attribute to determine the line number on which a flow-style mapping (or sequence) started, and if that line number is in the set you built beforehand, change the mapping to block style. If you have nested flow-style collections, you might have to repeat this process.
If you run the following, the initial dumped value for quote will be on one line. The change_to_block method as presented changes all mappings/sequences that are too long and that start on such a line.
import sys
import ruamel.yaml

yaml_str = """\
movie: bladerunner
quote: {[Batty, Roy]: [
     I have seen things you people wouldn't believe.,
     Attack ships on fire off the shoulder of Orion.,
     I watched C-beams glitter in the dark near the Tannhäuser Gate.,
  ]}
"""

class Blockify:
    def __init__(self, width, only_first=False, verbose=0):
        self._width = width
        self._yaml = None
        self._only_first = only_first
        self._verbose = verbose

    @property
    def yaml(self):
        if self._yaml is None:
            self._yaml = y = ruamel.yaml.YAML(typ=['rt', 'string'])
            y.preserve_quotes = True
            y.width = 2**16
        return self._yaml

    def __call__(self, d):
        pass_nr = 0
        changed = [True]
        while changed[0]:
            changed[0] = False
            try:
                s = self.yaml.dumps(d)
            except AttributeError:
                print("use 'pip install ruamel.yaml.string' to install plugin that gives 'dumps' to string")
                sys.exit(1)
            if self._verbose > 1:
                print(s)
            too_long = set()
            max_ll = -1
            for line_nr, line in enumerate(s.splitlines()):
                if len(line) > self._width:
                    too_long.add(line_nr)
                if len(line) > max_ll:
                    max_ll = len(line)
            if self._verbose > 0:
                print(f'pass: {pass_nr}, lines: {sorted(too_long)}, longest: {max_ll}')
                sys.stdout.flush()
            new_d = self.yaml.load(s)
            self.change_to_block(new_d, too_long, changed, only_first=self._only_first)
            d = new_d
            pass_nr += 1
        return d, s

    @staticmethod
    def change_to_block(d, too_long, changed, only_first):
        if isinstance(d, dict):
            if d.fa.flow_style() and d.lc.line in too_long:
                d.fa.set_block_style()
                changed[0] = True
                return  # don't convert nested flow styles, might not be necessary
            # don't change keys if any value is changed
            for v in d.values():
                Blockify.change_to_block(v, too_long, changed, only_first)
                if only_first and changed[0]:
                    return
            if changed[0]:  # don't change keys if value has changed
                return
            for k in d:
                Blockify.change_to_block(k, too_long, changed, only_first)
                if only_first and changed[0]:
                    return
        if isinstance(d, (list, tuple)):
            if d.fa.flow_style() and d.lc.line in too_long:
                d.fa.set_block_style()
                changed[0] = True
                return  # don't convert nested flow styles, might not be necessary
            for elem in d:
                Blockify.change_to_block(elem, too_long, changed, only_first)
                if only_first and changed[0]:
                    return

blockify = Blockify(96, verbose=2)  # set verbose to 0, to suppress progress output
yaml = ruamel.yaml.YAML(typ=['rt', 'string'])
data = yaml.load(yaml_str)
blockified_data, string_output = blockify(data)
print('-'*32, 'result:', '-'*32)
print(string_output)  # string_output has no final newline
which gives:
movie: bladerunner
quote: {[Batty, Roy]: [I have seen things you people wouldn't believe., Attack ships on fire off the shoulder of Orion., I watched C-beams glitter in the dark near the Tannhäuser Gate.]}
pass: 0, lines: [1], longest: 186
movie: bladerunner
quote:
  [Batty, Roy]: [I have seen things you people wouldn't believe., Attack ships on fire off the shoulder of Orion., I watched C-beams glitter in the dark near the Tannhäuser Gate.]
pass: 1, lines: [2], longest: 179
movie: bladerunner
quote:
  [Batty, Roy]:
  - I have seen things you people wouldn't believe.
  - Attack ships on fire off the shoulder of Orion.
  - I watched C-beams glitter in the dark near the Tannhäuser Gate.
pass: 2, lines: [], longest: 67
-------------------------------- result: --------------------------------
movie: bladerunner
quote:
  [Batty, Roy]:
  - I have seen things you people wouldn't believe.
  - Attack ships on fire off the shoulder of Orion.
  - I watched C-beams glitter in the dark near the Tannhäuser Gate.
Please note that when using ruamel.yaml<0.18 the sequence [Batty, Roy] will never be in block style,
because the tuple subclass CommentedKeySeq never gets a line number attached.

Why are my byte arrays not different even though print() says they are?

I am new to Python, so please forgive me if I'm asking a dumb question. In my function I generate a random byte array for a given number of bytes called "input_data", then I add some bit errors bytewise and store the result in another byte array called "output_data". The print function shows that it works exactly as expected; there are different bytes. But if I compare the byte arrays afterwards, they seem to be identical!
import random
import binascii

def simulate_ber(packet_length, ber, verbose=False):
    # generate input data
    input_data = bytearray(random.getrandbits(8) for _ in xrange(packet_length))
    if(verbose):
        print(binascii.hexlify(input_data)+" <-- simulated input vector")
    output_data = input_data
    # add bit errors
    num_errors = 0
    for byte in range(len(input_data)):
        error_mask = 0
        for bit in range(0,7,1):
            if(random.uniform(0, 1)*100 < ber):
                error_mask |= 1 << bit
                num_errors += 1
        output_data[byte] = input_data[byte] ^ error_mask
    if(verbose):
        print(binascii.hexlify(output_data)+" <-- output vector")
        print("number of simulated bit errors: " + str(num_errors))
    if(input_data == output_data):
        print ("data identical")
number of packets: 1
bytes per packet: 16
simulated bit error rate: 5
start simulation...
0d3e896d61d50645e4e3fa648346091a <-- simulated input vector
0d3e896f61d51647e4e3fe648346001a <-- output vector
number of simulated bit errors: 6
data identical
Where is the bug? I am sure the problem is somewhere between my ears...
Thank you in advance for your help!
output_data = input_data
Python is a referential language. When you do the above, both variables now refer to the same object in memory, e.g.:
>>> y=['Hello']
>>> x=y
>>> x.append('World!')
>>> x
['Hello', 'World!']
>>> y
['Hello', 'World!']
Cast output_data as a new bytearray and you should be good:
output_data = bytearray(input_data)
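A quick interactive check (illustrative session, not from the original answer) of the difference between aliasing and copying a bytearray:

>>> a = bytearray(b"\x0d\x3e")
>>> b = a                 # alias: b and a are the same object
>>> c = bytearray(a)      # copy: an independent object with the same bytes
>>> b[0] ^= 0x02          # flip a bit through the alias
>>> a == b, a == c
(True, False)
>>> a is b, a is c
(True, False)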

Error in setting max features parameter in Isolation Forest algorithm using sklearn

I'm trying to train a dataset with 357 features using Isolation Forest sklearn implementation. I can successfully train and get results when the max features variable is set to 1.0 (the default value).
However when max features is set to 2, it gives the following error:
ValueError: Number of features of the model must match the input.
Model n_features is 2 and input n_features is 357
It also gives the same error when the feature count is 1 (int) and not 1.0 (float).
My understanding was that when the feature count is 2 (int), two features should be considered in creating each tree. Is this wrong? How can I change the max features parameter?
The code is as follows:
from sklearn.ensemble.iforest import IsolationForest

def isolation_forest_imp(dataset):
    estimators = 10
    samples = 100
    features = 2
    contamination = 0.1
    bootstrap = False
    random_state = None
    verbosity = 0
    estimator = IsolationForest(n_estimators=estimators, max_samples=samples,
                                contamination=contamination,
                                max_features=features,
                                bootstrap=bootstrap, random_state=random_state,
                                verbose=verbosity)
    model = estimator.fit(dataset)
In the documentation it states:
max_features : int or float, optional (default=1.0)
The number of features to draw from X to train each base estimator.
- If int, then draw `max_features` features.
- If float, then draw `max_features * X.shape[1]` features.
So, 2 should mean take two features and 1.0 should mean take all of the features, 0.5 take half and so on, from what I understand.
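Concretely, a small sketch of that reading using the question's 357 features (an illustrative helper mirroring the documented int/float behaviour, not sklearn code):

def effective_max_features(max_features, n_total):
    if isinstance(max_features, int):
        return max_features              # int: draw exactly this many features
    return int(max_features * n_total)   # float: a fraction of all features

print(effective_max_features(2, 357))    # -> 2
print(effective_max_features(1.0, 357))  # -> 357
print(effective_max_features(0.5, 357))  # -> 178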
I think this could be a bug, since, taking a look in IsolationForest's fit:
# Isolation Forest inherits from BaseBagging
# and when _fit is called, BaseBagging takes care of the features correctly
super(IsolationForest, self)._fit(X, y, max_samples,
                                  max_depth=max_depth,
                                  sample_weight=sample_weight)

# however, after _fit the decision_function is called using X - the whole
# sample - not taking into account the max_features
self.threshold_ = -sp.stats.scoreatpercentile(
    -self.decision_function(X), 100. * (1. - self.contamination))
then:
# when the decision function _validate_X_predict is called, with X unmodified,
# it calls the base estimator's (dt) _validate_X_predict with the whole X
X = self.estimators_[0]._validate_X_predict(X, check_input=True)
...
# from tree.py:
def _validate_X_predict(self, X, check_input):
    """Validate X whenever one tries to predict, apply, predict_proba"""
    if self.tree_ is None:
        raise NotFittedError("Estimator not fitted, "
                             "call `fit` before exploiting the model.")
    if check_input:
        X = check_array(X, dtype=DTYPE, accept_sparse="csr")
        if issparse(X) and (X.indices.dtype != np.intc or
                            X.indptr.dtype != np.intc):
            raise ValueError("No support for np.int64 index based "
                             "sparse matrices")
    # so, this check fails because X is the original X, not with the max_features applied
    n_features = X.shape[1]
    if self.n_features_ != n_features:
        raise ValueError("Number of features of the model must "
                         "match the input. Model n_features is %s and "
                         "input n_features is %s "
                         % (self.n_features_, n_features))
    return X
So, I am not sure how you can handle this. Maybe figure out the percentage that leads to just the two features you need - even though I am not sure it will work as expected.
Note: I am using scikit-learn v.0.18
Edit: as @Vivek Kumar commented, this is an issue and upgrading to 0.20 should do the trick.
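For completeness, a minimal sketch assuming an upgraded scikit-learn (>= 0.20, as the comment suggests), where an integer max_features fits and scores without the mismatch; the random matrix is only a stand-in for the 357-feature dataset:

import numpy as np
from sklearn.ensemble import IsolationForest

X = np.random.RandomState(0).rand(100, 357)   # stand-in for the real data
model = IsolationForest(n_estimators=10, max_samples=100, contamination=0.1,
                        max_features=2, random_state=0).fit(X)
scores = model.decision_function(X)           # no "Number of features" error here
print(scores.shape)                           # (100,)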

Lists garbage collection

If I do the following:
List2 = [V || V <- List1, ...]
It seems that List2 refers to List1 and erlang:garbage_collect() doesn't clear the memory. How is it possible to create a new list without references and discard the old one?
In any language with garbage collection you simply need to 'lose' all references to a piece of data before it can be garbage collected. Simply returning from the function that generates the original list, while not storing it in any other 'persistent' location (e.g. the process dictionary), should allow the memory to be reclaimed.
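For instance, a minimal sketch of that point (an illustrative function, not from the original answer): the intermediate list exists only inside the function, so once it returns nothing references it any more.

%% build_even/1 creates List1 locally; after the function returns, only the
%% resulting filtered list is referenced and List1 becomes garbage.
build_even(N) ->
    List1 = lists:seq(1, N),
    [V || V <- List1, V rem 2 =:= 0].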
The VM is supposed to manage the garbage collection. Whether you use a gen_server or a "home made" server_loop(State), you always have the same pattern:
server_loop(State) ->
    A = somefunc(State),
    B = receive
            mesg1 -> func1(...);
            ...
        after Timeout ->
            func2(...)
        end,
    NewState = func3(...),
    server_loop(NewState).
As long as a process is alive, executing this loop, the VM will allocate and manage memory areas to store all the needed information (variables, message queue... plus some margin). As far as I know there is some spare memory allocated to the process, and the VM does not try to recover the memory very quickly after it has been released, but if you force a garbage collection using erlang:garbage_collect(Pid) you can verify that the memory is freed - see the example below.
startloop() -> spawn(?MODULE, loop, [{lists:seq(1,1000), infinity}]).

loop(endloop) -> ok;
loop({S,T}) ->
    NewState = receive
                   biglist -> {lists:seq(1,5000000), T};
                   {timeout,V} -> {S,V};
                   sizelist -> io:format("Size of the list = ~p~n",[length(S)]),
                               {S,T};
                   endloop -> endloop
               after T ->
                   L = length(S) div 2,
                   {lists:seq(1,L), T}
               end,
    loop(NewState).

%% Here, NewState is a copy of State or totally new data, depending on the
%% received message. In general, for performance considerations it can be
%% interesting to take care of the functions used, to avoid big copies
%% and allow the compiler to optimize the beam code
%% ([H|Q] rather than Q ++ [H] to add a term to a list, for example).
and the results in the VM:
2> P = lattice:startloop().
<0.57.0>
...
6> application:start(sasl).
....
ok
7> application:start(os_mon).
...
ok
...
11> P ! biglist.
biglist
...
% get_memory_data() -> {Total,Allocated,Worst}.
14> memsup:get_memory_data().
{8109199360,5346488320,{<0.57.0>,80244336}}
...
23> P ! {timeout,1000}.
{timeout,1000}
24> memsup:get_memory_data().
{8109199360,5367361536,{<0.57.0>,80244336}}
the worst case is the loop process: {<0.57.0>,80244336}
...
28> P ! sizelist.
Size of the list = 0
sizelist
...
31> P ! {timeout,infinity}.
{timeout,infinity}
32> P ! biglist.
biglist
33> P ! sizelist.
Size of the list = 5000000
sizelist
...
36> P ! {timeout,1000}.
{timeout,1000}
37> memsup:get_memory_data().
{8109199360,5314289664,{<0.57.0>,10770968}}
%% note the garbage collecting in the previous line: {<0.57.0>,10770968}
38> P ! sizelist.
sizelist
Size of the list = 156250
39> memsup:get_memory_data().
{8109199360,5314289664,{<0.57.0>,10770968}}
...
46> P ! sizelist.
Size of the list = 0
sizelist
47> memsup:get_memory_data().
{8109199360,5281882112,{<0.57.0>,10770968}}
...
50> erlang:garbage_collect(P).
true
51> memsup:get_memory_data().
{8109199360,5298778112,{<0.51.0>,688728}}
%% after GC, the process <0.57.0> is no longer the worst case
If you create a new list like this, the new list will share elements with the first one. If you then throw the first list away, the shared elements are still reachable from the new list and won't count as garbage.
How do you check whether the first list is garbage collected? Do you test this in the Erlang console? The console stores the result of evaluating each expression, which may be why you don't see the list garbage collected.
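For example, in the shell you can drop your own binding before forcing a collection (an illustrative session; sizes are arbitrary, and the shell's result history may still hold a reference for a while):

1> List1 = lists:seq(1, 1000000).
2> List2 = [V || V <- List1, V rem 2 =:= 0].
3> f(List1).                  % forget the shell binding to the original list
4> erlang:garbage_collect().  % the old list is now unreachable from any binding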

Nice formatting of numbers inside Messages

When printing a string with StyleBox, by default we get nicely formatted numbers inside the string:
StyleBox["some text 1000000"] // DisplayForm
I mean that the numbers look as if they had additional little spaces: "1 000 000".
But in Messages all numbers are displayed without formatting:
f::NoMoreMemory =
"There are less than `1` bytes of free physical memory (`2` bytes \
is free). $Failed is returned.";
Message[f::NoMoreMemory, 1000000, 98000000]
Is there a way to get numbers inside Messages to be formatted?
I'd use Style to apply the AutoNumberFormatting option:
You can use it to target specific messages:
f::NoMoreMemory =
"There are less than `1` bytes of free physical memory (`2` bytes is free). $Failed is returned.";
Message[f::NoMoreMemory,
Style[1000000, AutoNumberFormatting -> True],
Style[98000000, AutoNumberFormatting -> True]]
or you can use it with $MessagePrePrint to apply it to all the messages:
$MessagePrePrint = Style[#, AutoNumberFormatting -> True] &;
Message[f::NoMoreMemory, 1000000, 98000000]
I think you want $MessagePrePrint
$MessagePrePrint =
NumberForm[#, DigitBlock -> 3, NumberSeparator -> " "] &;
Or, incorporating Sjoerd's suggestion:
With[
 {opts =
   AbsoluteOptions[EvaluationNotebook[],
    {DigitBlock, NumberSeparator}]},
 $MessagePrePrint = NumberForm[#, Sequence @@ opts] &];
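With either definition in place, re-issuing the message from the question should render its numeric arguments with grouped digits (the output shown is illustrative):

Message[f::NoMoreMemory, 1000000, 98000000]

(* f::NoMoreMemory: There are less than 1 000 000 bytes of free physical
   memory (98 000 000 bytes is free). $Failed is returned. *)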
Adapting Brett Champion's method, I believe this allows for copy & paste as you requested:
$MessagePrePrint = StyleForm[#, AutoNumberFormatting -> True] &;
