Jython: Determine the number of arguments a Java method takes at runtime

I'm trying to write an object inspector for Java objects in Jython, and I want to determine how many arguments a given Java method expects. Is there any way to do that in Python, or do I have to use Java reflection for that?
To explain, I'd like to call all "get..." methods of a Java object that don't take any arguments:
from java.util import Date, ArrayList

def numberOfArguments(fct):
    # Some magic happens here
    return 0

def check(o):
    print("")
    print(type(o).__name__)
    for fctName in dir(o):
        if not str(fctName).startswith("get"): continue
        print("== " + fctName)
        fct = eval("o." + fctName)
        if numberOfArguments(fct) == 0:
            print(" " + str(fct()))

check(Date())
check(ArrayList())

Oh well, it turns out that I was doing the wrong thing by using dir(obj). It's just way easier to use o.getClass().getMethods(). This way, I also don't get bitten by overloaded methods.
from java.util import Date, ArrayList

def numberOfArguments(fct):
    # Not very magic:
    return len(fct.getParameterTypes())

def check(o):
    print("")
    print(type(o).__name__)
    # Use Java reflection instead of Python's dir() function
    for fct in o.getClass().getMethods():
        fctName = fct.getName()
        if not str(fctName).startswith("get"): continue
        print("== " + fctName)
        if numberOfArguments(fct) == 0:
            print(" " + str(fct.invoke(o, [])))

check(Date())
check(ArrayList())
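For completeness, the JavaBeans machinery can enumerate the no-argument getters directly, which avoids the prefix filtering altogether. The sketch below is only my own illustration (the helper name bean_getters is not part of the original code) and assumes a Jython 2.x runtime:

from java.beans import Introspector
from java.util import Date

def bean_getters(o):
    # Return (property name, no-argument read method) pairs for a Java object.
    info = Introspector.getBeanInfo(o.getClass())
    return [(p.getName(), p.getReadMethod())
            for p in info.getPropertyDescriptors()
            if p.getReadMethod() is not None]

d = Date()
for name, getter in bean_getters(d):
    print(name + " = " + str(getter.invoke(d, [])))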


How can I export layered drawings from drawio to create "animated" slides in beamer?

When preparing lectures, or conference presentations with beamer, I usually use layered drawings. Then for graphics included in consecutive slides ("frames" in beamer), I simply use different sets of layers.
For graphics created in IPE, I have created a dedicated expallviews.lua script.
Unfortunately, for graphics created with diagrams.net run locally as drawio-desktop, no such automated export of particular layers exists. The only option is to manually select the visible layers in the GUI and then export the consecutive drawings to a set of PDF files.
Is there a more convenient method to solve that problem?
The described problem has been reported in issues 405 and 737 in the drawio-desktop repository.
After reviewing those issues, I found a method based on automatically (instead of manually, via the GUI) changing the visibility of layers and exporting such drawings to a set of PDF files. The proposed method is described in a comment on issue 405. It uses a simple Python script:
#!/usr/bin/python3
"""
This script modifies the visibility of layers in the XML
file with a diagram generated by drawio.
It works around the lack of a possibility to export
only the selected layers from the CLI version of drawio.
Written by Wojciech M. Zabolotny 6.10.2022
(wzab01<at>gmail.com or wojciech.zabolotny<at>pw.edu.pl)
The code is published under LGPL V2 license
"""
from lxml import etree as let
import xml.etree.ElementTree as et
import xml.parsers.expat as pe
from io import StringIO
import os
import sys
import shutil
import zlib
import argparse

PARSER = argparse.ArgumentParser()
PARSER.add_argument("--layers", help="Selected layers: \"all\", a comma separated list of integers or integer ranges like \"0-3,6,7\"", default="all")
PARSER.add_argument("--layer_prefix", help="Layer name prefix", default="Layer_")
PARSER.add_argument("--outfile", help="Output file", default="output.drawio")
PARSER.add_argument("--infile", help="Input file", default="input.drawio")
ARGS = PARSER.parse_args()
INFILENAME = ARGS.infile
OUTFILENAME = ARGS.outfile

# Find all elements with 'value' starting with the layer prefix.
# Return tuples with the element and the rest of 'value' after the prefix.
def find_layers(el_start):
    res = []
    for el in el_start:
        val = el.get('value')
        if val is not None:
            if val.find(ARGS.layer_prefix) == 0:
                # This is a layer element. Add it, and its name
                # after the prefix, to the list.
                res.append((el, val[len(ARGS.layer_prefix):]))
                continue
        # If it is not a layer element, scan its children
        res.extend(find_layers(el))
    return res

# Analyse the list of visible layers, and create the list
# of layers that should be visible. Customize this part
# if you want a more sophisticated method for the selection
# of layers.
# Currently only "all", a comma separated list of integers,
# or ranges of integers are supported.
def build_visible_list(layers):
    if layers == "all":
        return layers
    res = []
    for lay in layers.split(','):
        # Is it a range?
        s = lay.find("-")
        if s > 0:
            # This is a range
            first = int(lay[:s])
            last = int(lay[(s + 1):])
            res.extend(range(first, last + 1))
        else:
            res.append(int(lay))
    return res

def is_visible(layer_tuple, visible_list):
    if visible_list == "all":
        return True
    if int(layer_tuple[1]) in visible_list:
        return True
    return False

try:
    EL_ROOT = et.fromstring(open(INFILENAME, "r").read())
except et.ParseError as perr:
    # Handle the parsing error
    ROW, COL = perr.position
    print(
        "Parsing error "
        + str(perr.code)
        + " ("
        + pe.ErrorString(perr.code)
        + ") in column "
        + str(COL)
        + " of line "
        + str(ROW)
        + " of the file "
        + INFILENAME
    )
    sys.exit(1)

visible_list = build_visible_list(ARGS.layers)
layers = find_layers(EL_ROOT)
for layer_tuple in layers:
    if is_visible(layer_tuple, visible_list):
        print("set " + layer_tuple[1] + " to visible")
        layer_tuple[0].attrib['visible'] = "1"
    else:
        print("set " + layer_tuple[1] + " to invisible")
        layer_tuple[0].attrib['visible'] = "0"

# Now write the modified file
t = et.ElementTree(EL_ROOT)
with open(OUTFILENAME, 'w') as f:
    t.write(f, encoding='unicode')
The maintained version of that script, together with a demonstration of its use, is also available in my GitHub repository.
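To actually produce the per-overlay PDFs for beamer, the script has to be run once per layer selection and each result exported. The driver below is only a sketch of that loop: it assumes the script above is saved as select_layers.py and that the drawio CLI from drawio-desktop is on the PATH with its --export, --format and --output options; file names and layer selections are placeholders.

#!/usr/bin/python3
# Hypothetical driver: export one PDF per cumulative layer selection.
import subprocess

INFILE = "figure.drawio"
# One entry per beamer overlay: layers 0, 0-1, 0-2, ...
SELECTIONS = ["0", "0-1", "0-2", "0-3"]

for i, layers in enumerate(SELECTIONS):
    tmp = "figure_step%d.drawio" % i
    pdf = "figure_step%d.pdf" % i
    subprocess.run(["./select_layers.py", "--infile", INFILE,
                    "--outfile", tmp, "--layers", layers], check=True)
    subprocess.run(["drawio", "--export", "--format", "pdf",
                    "--output", pdf, tmp], check=True)

The resulting figure_step*.pdf files can then be placed on consecutive beamer overlays with \includegraphics.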

Expected type '{__name__}', got '() -> None' instead

I have a question about my Python (3.6) code, or possibly about the PyCharm IDE, on a MacBook.
I wrote a function using timeit to measure the time spent by another function:
def timeit_func(func_name, num_of_round=1):
    print("start" + func_name.__name__ + "()")
    str_setup = "from __main__ import " + func_name.__name__
    print('%s() spent %f s' % (func_name.__name__,
                               timeit.timeit(func_name.__name__ + "()",
                                             setup=str_setup,
                                             number=num_of_round)))
    print(func_name.__name__ + "() finish")
parameter "func_name" is just a function need to be tested and has already been defined.
and I call this function with the code
if __name__ == "__main__":
    timeit_func(func_name=another_function)
The function works well, but PyCharm shows the following warning for the argument "func_name=another_function":
Expected type '{__name__}', got '() -> None' instead
This inspection detects type errors in function call expressions. Due to dynamic dispatch and duck typing, this is possible in a limited but useful number of cases. Types of function parameters can be specified in docstrings or in Python 3 function annotations.
I have googled "Expected type '{__name__}', got '() -> None'" but found nothing helpful. I am new to Python.
What does this warning mean, and how can I make it go away? Right now the code is highlighted and it makes me uncomfortable.
I am using Python 3.6; this is the definition I found for timeit.timeit() in the timeit module:
def timeit(stmt="pass", setup="pass", timer=default_timer, number=default_number, globals=None):
    """Convenience function to create Timer object and call timeit method."""
    return Timer(stmt, setup, timer, globals).timeit(number)
Your parameter func_name is badly named, because you are passing it a function, not the name of a function. That is probably the source of your confusion.
The warning is simply saying that PyCharm expects you to pass an object with a __name__ attribute, but it was given a function instead. Functions do have that attribute, but it is an internal detail, not something you normally need to access.
The simplest solution would be to work with the function directly. The documentation for timeit isn't very clear on this point, but you can actually give it a function (or any callable) instead of a string. So your code could be:
def timeit_func(func, num_of_round=1):
    print("start" + func.__name__ + "()")
    print('%s() spent %f s' % (func.__name__,
                               timeit.timeit(func,
                                             number=num_of_round)))
    print(func.__name__ + "() finish")

if __name__ == "__main__":
    timeit_func(func=another_function)
That at least makes the code slightly less confusing, as the parameter name now matches the value rather better. I don't use PyCharm, so I don't know whether it will still warn; that probably depends on whether it knows that timeit accepts a callable.
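If the function you want to time takes arguments, a related trick (my sketch, not something from the question) is to wrap the call in a lambda, since timeit accepts any zero-argument callable:

import timeit

def add(a, b):
    return a + b

# Wrap the call so that timeit sees a zero-argument callable.
elapsed = timeit.timeit(lambda: add(2, 3), number=100000)
print('add() spent %f s' % elapsed)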
An alternative that should get rid of the error would be to make the code match your parameter name by actually passing in a function name:
def timeit_func(func_name, num_of_round=1):
    print("start" + func_name + "()")
    str_setup = "from __main__ import " + func_name
    print('%s() spent %f s' % (func_name,
                               timeit.timeit(func_name + "()",
                                             setup=str_setup,
                                             number=num_of_round)))
    print(func_name + "() finish")

if __name__ == "__main__":
    timeit_func(func_name=another_function.__name__)
This has the disadvantage that you can now only time functions that are defined in, and importable from, your main script, whereas if you pass the function itself to timeit you can time a function defined anywhere.

Is it possible to read pdf/audio/video files(unstructured data) using Apache Spark?

Is it possible to read pdf/audio/video files(unstructured data) using Apache Spark?
For example, I have thousands of PDF invoices and I want to read the data from them and perform some analytics on it. What steps do I need to take to process such unstructured data?
Yes, it is. Use sparkContext.binaryFiles to load the files in binary format, and then use map to transform the values into some other format - for example, parse the binary content with Apache Tika or Apache POI.
Pseudocode:
val rawFile = sparkContext.binaryFiles(...)
val ready = rawFile.map( /* parsing with the other framework goes here */ )
What is important is that the parsing must be done with another framework, as mentioned above; the function passed to map will receive the file content (an input stream) as its argument.
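In PySpark the same idea looks roughly like the sketch below. Treat it as a sketch only: it assumes the third-party pypdf package is installed on every executor, and the input path is a placeholder.

import io
from pyspark import SparkContext
from pypdf import PdfReader  # assumed to be available on all executors

sc = SparkContext(appName="pdf-invoices")

def pdf_to_text(path_and_bytes):
    # Parse one PDF, given as a (path, bytes) pair, and return (path, extracted text).
    path, content = path_and_bytes
    reader = PdfReader(io.BytesIO(content))
    text = "\n".join(page.extract_text() or "" for page in reader.pages)
    return (path, text)

raw = sc.binaryFiles("hdfs:///invoices/*.pdf")  # RDD of (path, bytes)
texts = raw.map(pdf_to_text)
print(texts.take(1))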
We had a scenario where we needed to use a custom decryption algorithm on the input files, and we didn't want to rewrite that code in Scala or Python. The PySpark code follows:
import subprocess
import socket
from pyspark import SparkContext, SparkConf, HiveContext, AccumulatorParam

def decryptUncompressAndParseFile(filePathAndContents):
    '''each line of the file becomes an RDD record'''
    global acc_errCount, acc_errLog
    proc = subprocess.Popen(['custom_decrypt_program', '--decrypt'],
                            stdin=subprocess.PIPE, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    (unzippedData, err) = proc.communicate(input=filePathAndContents[1])
    if len(err) > 0:  # problem reading the file
        acc_errCount.add(1)
        acc_errLog.add('Error: ' + str(err) + ' in file: ' + filePathAndContents[0] +
                       ', on host: ' + socket.gethostname() + ' return code: ' + str(proc.returncode))
        return []  # this is okay with flatMap
    records = list()
    iterLines = iter(unzippedData.splitlines())
    for line in iterLines:
        # sys.stderr.write('Line: ' + str(line) + '\n')
        values = [x.strip() for x in line.split('|')]
        ...
        records.append( (... extract data as appropriate from values into this tuple ...) )
    return records

class StringAccumulator(AccumulatorParam):
    ''' custom accumulator to hold strings '''
    def zero(self, initValue=""):
        return initValue
    def addInPlace(self, str1, str2):
        return str1.strip() + '\n' + str2.strip()

def main():
    ...
    global acc_errCount, acc_errLog
    acc_errCount = sc.accumulator(0)
    acc_errLog = sc.accumulator('', StringAccumulator())
    binaryFileTup = sc.binaryFiles(args.inputDir)
    # use flatMap instead of map, to handle corrupt files
    linesRdd = binaryFileTup.flatMap(decryptUncompressAndParseFile, True)
    df = sqlContext.createDataFrame(linesRdd, ourSchema())
    df.registerTempTable("dataTable")
    ...
The custom string accumulator was very useful in identifying corrupt input files.
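One usage note, continuing the fragment above (this is my addition, not part of the original job): accumulator values only become meaningful on the driver after an action has forced the computation, so the error counters are typically inspected after the DataFrame has been materialized:

# After an action has evaluated the flatMap above:
df.count()
print('corrupt input files: ' + str(acc_errCount.value))
print(acc_errLog.value)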

Using apply functions in SparkR

I am currently trying to implement some functions using SparkR version 1.5.1. I have seen older (version 1.3) examples where people used the apply function on DataFrames, but it looks like this is no longer directly available. Example:
x = c(1,2)
xDF_R = data.frame(x)
colnames(xDF_R) = c("number")
xDF_S = createDataFrame(sqlContext,xDF_R)
Now, I can use the function sapply on the data.frame object
xDF_R$result = sapply(xDF_R$number, ppois, q=10)
When I apply similar logic to the Spark DataFrame
xDF_S$result = sapply(xDF_S$number, ppois, q=10)
I get the error message "Error in as.list.default(X) : no method for coercing this S4 class to a vector".
Can I somehow do this?
This is possible with user defined functions in Spark 2.0.
wrapper = function(df){
+ out = df
+ out$result = sapply(df$number, ppois, q=10)
+ return(out)
+ }
> xDF_S2 = dapplyCollect(xDF_S, wrapper)
> identical(xDF_S2, xDF_R)
[1] TRUE
Note you need a wrapper function like this because you can't pass the extra arguments in directly, but that may change in the future.
Native R functions do not support Spark DataFrames. We can use user-defined functions in SparkR to execute native R code. These functions are executed on the executors, so the required libraries must be available on all executors.
For example, suppose we have a custom function holt_forecast that takes a data.table as an argument.
Sample R code
sales_R_df %>%
group_by(product_id) %>%
do(holt_forecast(data.table(.))) %>%
data.table(.) -> dt_holt
When using UDFs, we need to specify the schema of the output data.frame returned by the native R method. Spark uses this schema to build the resulting Spark DataFrame.
Equivalent SparkR code
Define the schema:
dt_holt_schema <- structType(
  structField("product_id", "integer"),
  structField("audit_date", "date"),
  structField("holt_unit_forecast", "double"),
  structField("holt_unit_forecast_std", "double")
)
Execute the method (here via gapply, grouping by product_id as in the dplyr version; sales_sdf stands for the Spark DataFrame corresponding to sales_R_df):
dt_holt <- gapply(sales_sdf, "product_id", function(key, x) {
  library(data.table)
  library(lubridate)
  library(dplyr)
  library(forecast)
  sales <- data.table(x)
  y <- data.frame(key, holt_forecast(sales))
}, dt_holt_schema)
Reference: https://shbhmrzd.medium.com/stl-and-holt-from-r-to-sparkr-1815bacfe1cc

HMAC-SHA1 in bash

Is there a bash script available to generate a HMAC-SHA1 hash?
The equivalent of the following PHP code:
hash_hmac("sha1", "value", "key", TRUE);
Parameters
The fourth parameter (TRUE here): when set to TRUE, outputs raw binary data; FALSE outputs lowercase hexits.
Thanks.
see HMAC-SHA1 in bash
In bash itself, no. Bash can do a lot of things, but it also knows when to rely on external tools.
For example, the Wikipedia page provides a Python implementation of HMAC-MD5 which bash can call to do the grunt work, repeated below to make this answer self-contained:
#!/usr/bin/env python
from hashlib import md5

trans_5C = "".join(chr(x ^ 0x5c) for x in xrange(256))
trans_36 = "".join(chr(x ^ 0x36) for x in xrange(256))
blocksize = md5().block_size

def hmac_md5(key, msg):
    if len(key) > blocksize:
        key = md5(key).digest()
    key += chr(0) * (blocksize - len(key))
    o_key_pad = key.translate(trans_5C)
    i_key_pad = key.translate(trans_36)
    return md5(o_key_pad + md5(i_key_pad + msg).digest())

if __name__ == "__main__":
    h = hmac_md5("key", "The quick brown fox jumps over the lazy dog")
    print h.hexdigest()  # 80070713463e7749b90c2dc24911e275
(Keep in mind that Python also provides SHA-1; see here for details on how to use HMAC with the hashlib.sha1() constructor.)
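For the HMAC-SHA1 actually being asked about, the Python standard library already provides everything, so there is no need to hand-roll the algorithm. A minimal Python 3 sketch (key and message are placeholders) that bash could call like any other external tool:

#!/usr/bin/env python3
import base64
import hashlib
import hmac

key = b"key"
msg = b"value"

digest = hmac.new(key, msg, hashlib.sha1)
print(digest.hexdigest())                          # lowercase hexits, like hash_hmac(..., FALSE)
print(base64.b64encode(digest.digest()).decode())  # raw binary output, base64-encoded for display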
Or, if you want to run the exact same code as PHP does, you could try running it with phpsh, as detailed here.
