Sax XML Parser, switch does not take a string - enums

We are trying to parse an XML file using a SAX parser, but we ran into a problem using switch in:
public void startElement(String uri, String localName, String qName,
        Attributes atts) throws SAXException {
    switch (MyEnum.valueOf(qName)) {
        case tag1:
            // ...
            break;
        case tag2:
            // ...
            break;
        case tag5:
            // ...
    }
}
In each case we are populating some POJO objects.
The problem is that when the parser encounters a tag we are ignoring, it throws an exception.

The exception is thrown because your own code calls MyEnum.valueOf with an argument that is not guaranteed to be the name of an enum constant.
Because you want to ignore the exception, it is likely better not to have the exception thrown at all. That can be done, for example, by adding the following method to MyEnum:
public static boolean isOneOfTheValues(String val) {
    for (MyEnum m : values()) {
        if (m.name().equals(val)) {
            return true;
        }
    }
    return false;
}
and then not entering the switch statement at all when the value is known to be unhandled:
if (!MyEnum.isOneOfTheValues(qName)) {
    return;
}
switch (MyEnum.valueOf(qName))
If the enumeration contains many constants, using a prebuilt Set instead of iterating over the return value of values() can provide better performance.
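For example, a minimal sketch of that approach, assuming the tag1/tag2/tag5 constants from the question (the name cache is built once when the enum class is loaded, so each lookup is a constant-time Set lookup instead of a scan over values()):
import java.util.HashSet;
import java.util.Set;

public enum MyEnum {
    tag1, tag2, tag5;

    // Cached constant names for O(1) membership checks.
    private static final Set<String> NAMES = new HashSet<>();

    static {
        for (MyEnum m : values()) {
            NAMES.add(m.name());
        }
    }

    public static boolean isOneOfTheValues(String val) {
        return NAMES.contains(val);
    }
}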

Related

How to refactor cascade if statements

I found this question on https://github.com/arialdomartini/Back-End-Developer-Interview-Questions#snippets
I am curious about your opinion: I just can't find a decent solution for this refactoring, or a pattern that would apply in this very common case.
function()
{
    HRESULT error = S_OK;
    if(SUCCEEDED(Operation1()))
    {
        if(SUCCEEDED(Operation2()))
        {
            if(SUCCEEDED(Operation3()))
            {
                if(SUCCEEDED(Operation4()))
                {
                }
                else
                {
                    error = OPERATION4FAILED;
                }
            }
            else
            {
                error = OPERATION3FAILED;
            }
        }
        else
        {
            error = OPERATION2FAILED;
        }
    }
    else
    {
        error = OPERATION1FAILED;
    }
    return error;
}
Do you have any idea of how to refactor this?
Actually, I feel there is way more room for refactoring than what Sergio Tulentsev suggested.
The questions in the repo you linked are more about starting a conversation about code than closed-ended questions. So I think it is worth discussing the smells and design flaws of that code, to set up the refactoring goals.
Smells
I see these problems:
The code violates some of the SOLID principles. It surely violates the Open Closed Principle, as it is not possible to extend it without changing its code. E.g., adding a new operation would require adding a new if/else branch;
It also violates the Single Responsibility Principle. It just does too much. It performs error checks, it's responsible for executing all 4 operations, it contains their implementations, and it's responsible for checking their results and chaining their execution in the right order;
It violates the Dependency Inversion Principle, because there are dependencies between high-level and low-level components;
It has a horrible cyclomatic complexity;
It exhibits high coupling and low cohesion, which is exactly the opposite of what is recommended;
It contains a lot of code duplication: the function Succeeded() is repeated in each branch; the structure of if/elses is replicated over and over; the assignment of error is duplicated.
It could have a pure functional nature, but it relies instead on state mutation, which makes it harder to reason about.
There's an empty if statement body, which might be confusing.
Refactoring
Let's see what could be done.
Here I'm using a C# implementation, but similar steps can be performed in any language.
I renamed some of the elements, as I believe honoring a naming convention is part of the refactoring.
internal class TestClass
{
    HResult SomeFunction()
    {
        var error = HResult.Ok;
        if(Succeeded(Operation1()))
        {
            if(Succeeded(Operation2()))
            {
                if(Succeeded(Operation3()))
                {
                    if(Succeeded(Operation4()))
                    {
                    }
                    else
                    {
                        error = HResult.Operation4Failed;
                    }
                }
                else
                {
                    error = HResult.Operation3Failed;
                }
            }
            else
            {
                error = HResult.Operation2Failed;
            }
        }
        else
        {
            error = HResult.Operation1Failed;
        }
        return error;
    }

    private string Operation1()
    {
        // some operations
        return "operation1 result";
    }

    private string Operation2()
    {
        // some operations
        return "operation2 result";
    }

    private string Operation3()
    {
        // some operations
        return "operation3 result";
    }

    private string Operation4()
    {
        // some operations
        return "operation4 result";
    }

    private bool Succeeded(string operationResult) =>
        operationResult == "some condition";
}

internal enum HResult
{
    Ok,
    Operation1Failed,
    Operation2Failed,
    Operation3Failed,
    Operation4Failed,
}
For the sake of simplicity, I assumed each operation returns a string, and that success or failure is based on an equality check on the string, but of course it could be anything. In the next steps, it would be nice if the code were independent of the result-validation logic.
Step 1
It would be nice to start the refactoring with the support of some test harness.
public class TestCase
{
    [Theory]
    [InlineData("operation1 result", HResult.Operation1Failed)]
    [InlineData("operation2 result", HResult.Operation2Failed)]
    [InlineData("operation3 result", HResult.Operation3Failed)]
    [InlineData("operation4 result", HResult.Operation4Failed)]
    [InlineData("never", HResult.Ok)]
    void acceptance_test(string failWhen, HResult expectedResult)
    {
        var sut = new SomeClass { FailWhen = failWhen };
        var result = sut.SomeFunction();
        result.Should().Be(expectedResult);
    }
}
Our case is a trivial one, but since the quiz is supposed to be a job-interview question, I would not skip it.
Step 2
The first refactoring could be getting rid of the mutable state: each if branch could just return the value, instead of mutating the variable error. Also, the name error is misleading, as it includes the success case. Let's just get rid of it:
HResult SomeFunction()
{
    if(Succeeded(Operation1()))
    {
        if(Succeeded(Operation2()))
        {
            if(Succeeded(Operation3()))
            {
                if(Succeeded(Operation4()))
                    return HResult.Ok;
                else
                    return HResult.Operation4Failed;
            }
            else
                return HResult.Operation3Failed;
        }
        else
            return HResult.Operation2Failed;
    }
    else
        return HResult.Operation1Failed;
}
We got rid of the empty if body and, in the meantime, made the code slightly easier to reason about.
Step 3
If now we invert each if statement (the step suggested by Sergio)
internal HResult SomeFunction()
{
    if (!Succeeded(Operation1()))
        return HResult.Operation1Failed;
    if (!Succeeded(Operation2()))
        return HResult.Operation2Failed;
    if (!Succeeded(Operation3()))
        return HResult.Operation3Failed;
    if (!Succeeded(Operation4()))
        return HResult.Operation4Failed;
    return HResult.Ok;
}
we make it apparent that the code performs a chain of executions: if an operation succeeds, the next operation is invoked; otherwise, the chain is interrupted, with an error. The GOF Chain of Responsibility Pattern comes to mind.
Step 4
We could move each operation to a separate class, and let our function receive a chain of operations to execute in a single shot. Each class would deal with its specific operation logic (honoring the Single Responsibility Principle).
internal HResult SomeFunction()
{
    var operations = new List<IOperation>
    {
        new Operation1(),
        new Operation2(),
        new Operation3(),
        new Operation4()
    };

    foreach (var operation in operations)
    {
        if (!_check.Succeeded(operation.DoJob()))
            return operation.ErrorCode;
    }

    return HResult.Ok;
}
We got rid of the ifs altogether (but one).
Notice how:
The interface IOperation has been introduced, which is a preliminary move to decouple the function from the operations, complying with the Dependency Inversion Principle;
The list of operations can easily be injected into the class, using Dependency Injection.
The result validation logic has been moved to a separate class Check, injected into the main class (Dependency Inversion and Single Responsibility are satisfied).
internal class SimpleStringCheck : IResultCheck
{
    private readonly string _failWhen;

    public SimpleStringCheck(string failWhen)
    {
        _failWhen = failWhen;
    }

    internal bool Succeeded(string operationResult) =>
        operationResult != _failWhen;
}
We gained the ability to switch the check logic without modifying the main class (Open-Closed Principle).
Each operation has been moved to a separate class, like:
internal class Operation1 : IOperation
{
    public string DoJob()
    {
        return "operation1 result";
    }

    public HResult ErrorCode => HResult.Operation1Failed;
}
Each operation knows its own error code. The function itself became independent from it.
Step 5
There is something more to refactor in this code:
foreach (var operation in operations)
{
    if (!_check.Succeeded(operation.DoJob()))
        return operation.ErrorCode;
}
return HResult.Ok;
First, it's not clear why the case return HResult.Ok; is handled as a special case: the chain could contain a terminating operation that never fails and returns that value. This would allow us to get rid of that special case.
Second, our function still has two responsibilities: visiting the chain and checking the result.
An idea could be to encapsulate the operations into a real chain, so our function could reduce to something like:
return operations.ChainTogether(_check).Execute();
We have 2 options:
Each operation knows the next operation, so starting from operation1 we could execute the whole chain with a single call;
Operations are kept unaware of being part of a chain; a separate, encapsulating structure adds to operations the ability to be executed in sequence.
I'm going with the latter, but that's absolutely debatable. I'm introducing a class modelling a ring in a chain, moving the code away from our class:
internal class OperationRing : IRing
{
    private readonly Check _check;
    private readonly IOperation _operation;

    internal IRing Next { private get; set; }

    public OperationRing(Check check, IOperation operation)
    {
        _check = check;
        _operation = operation;
    }

    public HResult Execute()
    {
        var operationResult = _operation.DoJob();
        if (_check.Succeeded(operationResult))
            return Next.Execute();
        return _operation.ErrorCode;
    }
}
This class is responsible for executing an operation and handing execution over to the next ring if it succeeded, or for interrupting the chain and returning the right error code if it failed.
The chain will be terminated by a never-failing element:
internal class AlwaysSucceeds : IRing
{
    public HResult Execute() => HResult.Ok;
}
Our original class reduces to:
internal class SomeClass
{
    private readonly Check _check;
    private readonly List<IOperation> _operations;

    public SomeClass(Check check, List<IOperation> operations)
    {
        _check = check;
        _operations = operations;
    }

    internal HResult SomeFunction()
    {
        return _operations.ChainTogether(_check).Execute();
    }
}
In this case, ChainTogether() is a function implemented as an extension method of List<IOperation>, as I don't believe the chaining logic is the responsibility of our class.
That's not the right answer
It's absolutely debatable whether the responsibilities have been assigned to the most appropriate classes. For example:
is chaining operations a task of our function? Or should it directly receive the chained structure?
why the use of an enum? As Martin Fowler wrote in "Refactoring: Improving the Design of Existing Code", enums are a code smell and should be refactored to polymorphic classes;
how much is too much? Is the resulting design too complex? Does the complexity of the whole application need this level of modularisation?
Therefore, I'm sure there are several other ways to refactor the original function. In a job interview, or in a pair programming session, I expect a lot of discussions and evaluations to occur.
You could use early returns here.
function() {
    if(!SUCCEEDED(Operation1())) {
        return OPERATION1FAILED;
    }
    if(!SUCCEEDED(Operation2())) {
        return OPERATION2FAILED;
    }
    if(!SUCCEEDED(Operation3())) {
        return OPERATION3FAILED;
    }
    if(!SUCCEEDED(Operation4())) {
        return OPERATION4FAILED;
    }
    // everything succeeded, do your thing
    return S_OK;
}

Parallel Stream repeating items

I am retrieving big chunks of data from a DB and using this data to write it somewhere else. In order to avoid a long processing time, I'm trying to use parallel streams to write it. When I run this as a sequential stream, it works perfectly. However, if I change it to parallel, the behavior is odd: it prints the same object multiple times (more than 10).
@PostConstruct
public void retrieveAllTypeRecords() throws SQLException {
    logger.info("Retrieve batch of Type records.");
    try {
        Stream<TypeRecord> typeQueryAsStream = jdbcStream.getTypeQueryAsStream();
        typeQueryAsStream.forEach((type) -> {
            logger.info("Printing Type with field1: {} and field2: {}.", type.getField1(), type.getField2()); // the same object gets printed here multiple times
            // write this object somewhere else
        });
        logger.info("Completed full retrieval of Type data.");
    } catch (Exception e) {
        logger.error("error: " + e);
    }
}
public Stream<TypeRecord> getTypeQueryAsStream() throws SQLException {
    String sql = typeRepository.getQueryAllTypesRecords(); // retrieves SQL query in String format
    TypeMapper typeMapper = new TypeMapper();
    JdbcStream.StreamableQuery query = jdbcStream.streamableQuery(sql);
    Stream<TypeRecord> stream = query.stream()
            .map(row -> {
                return typeMapper.mapRow(row); // maps column values to object values
            });
    return stream;
}
public class StreamableQuery implements Closeable {
    (...)
    public Stream<SqlRow> stream() throws SQLException {
        final SqlRowSet rowSet = new ResultSetWrappingSqlRowSet(preparedStatement.executeQuery());
        final SqlRow sqlRow = new SqlRowAdapter(rowSet);
        Supplier<Spliterator<SqlRow>> supplier = () -> Spliterators.spliteratorUnknownSize(new Iterator<SqlRow>() {
            @Override
            public boolean hasNext() {
                return !rowSet.isLast();
            }

            @Override
            public SqlRow next() {
                if (!rowSet.next()) {
                    throw new NoSuchElementException();
                }
                return sqlRow;
            }
        }, Spliterator.CONCURRENT);
        return StreamSupport.stream(supplier, Spliterator.CONCURRENT, true); // this boolean sets the stream as parallel
    }
}
I've also tried using typeQueryAsStream.parallel().forEach((type) but the result is the same.
Example of output:
[ForkJoinPool.commonPool-worker-1] INFO TypeService - Saving Type with field1: L6797 and field2: P1433.
[ForkJoinPool.commonPool-worker-1] INFO TypeService - Saving Type with field1: L6797 and field2: P1433.
[main] INFO TypeService - Saving Type with field1: L6797 and field2: P1433.
[ForkJoinPool.commonPool-worker-1] INFO TypeService - Saving Type with field1: L6797 and field2: P1433.
Well, look at your code:
final SqlRow sqlRow = new SqlRowAdapter(rowSet);
Supplier<Spliterator<SqlRow>> supplier = () -> Spliterators.spliteratorUnknownSize(new Iterator<SqlRow>() {
    …
    @Override
    public SqlRow next() {
        if (!rowSet.next()) {
            throw new NoSuchElementException();
        }
        return sqlRow;
    }
}, Spliterator.CONCURRENT);
You are returning the same object every time. You achieve your desired effects by implicitly modifying the state of this object when calling rowSet.next().
This obviously can’t work when multiple threads try to access that single object concurrently. Even buffering some items to hand them over to another thread will cause trouble. Therefore, such interference can cause problems with sequential streams as well, as soon as stateful intermediate operations are involved, like sorted or distinct.
Assuming that typeMapper.mapRow(row) will produce an actual data item that does not interfere with other data items, you should integrate this step into the stream source, to create a valid stream.
public Stream<TypeRecord> stream(TypeMapper typeMapper) throws SQLException {
    SqlRowSet rowSet = new ResultSetWrappingSqlRowSet(preparedStatement.executeQuery());
    SqlRow sqlRow = new SqlRowAdapter(rowSet);
    Spliterator<TypeRecord> sp = new Spliterators.AbstractSpliterator<TypeRecord>(
            Long.MAX_VALUE, Spliterator.CONCURRENT | Spliterator.ORDERED) {
        @Override
        public boolean tryAdvance(Consumer<? super TypeRecord> action) {
            if (!rowSet.next()) return false;
            action.accept(typeMapper.mapRow(sqlRow));
            return true;
        }
    };
    return StreamSupport.stream(sp, true); // this boolean sets the stream as parallel
}
Note that for a lot of use cases, like this one, implementing a Spliterator is simpler than implementing an Iterator (which needs to be wrapped via spliteratorUnknownSize anyway). Also, there is no need to encapsulate this instantiation into a Supplier.
As a final note, the current implementation does not perform well for streams of unknown size, as it treats Long.MAX_VALUE like a very large number, ignoring the “unknown” semantics assigned to it by the specification. It will be very beneficial to parallel performance to provide an estimated size; it doesn’t need to be precise. In fact, with the current implementation, even a completely made-up number, say 1000, may perform better than correctly using Long.MAX_VALUE to denote an entirely unknown size.
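For illustration, a minimal sketch of how an estimated size could be supplied, assuming a rough row count is available up front (for example from a separate COUNT(*) query; the estimatedRows parameter is purely illustrative):
public Stream<TypeRecord> stream(TypeMapper typeMapper, long estimatedRows) throws SQLException {
    SqlRowSet rowSet = new ResultSetWrappingSqlRowSet(preparedStatement.executeQuery());
    SqlRow sqlRow = new SqlRowAdapter(rowSet);
    // Passing an estimate instead of Long.MAX_VALUE lets the splitting heuristics
    // choose more sensible batch sizes for parallel execution.
    Spliterator<TypeRecord> sp = new Spliterators.AbstractSpliterator<TypeRecord>(
            estimatedRows, Spliterator.CONCURRENT | Spliterator.ORDERED) {
        @Override
        public boolean tryAdvance(Consumer<? super TypeRecord> action) {
            if (!rowSet.next()) return false;
            action.accept(typeMapper.mapRow(sqlRow));
            return true;
        }
    };
    return StreamSupport.stream(sp, true);
}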

Querying single database row using rxjava2

I am using rxjava2 for the first time on an Android project, and am doing SQL queries on a background thread.
However, I am having trouble figuring out the best way to do a simple SQL query and handle the case where the record may or may not exist. Here is the code I am using:
public Observable<Record> createRecordObservable(int id) {
    Callable<Record> callback = new Callable<Record>() {
        @Override
        public Record call() throws Exception {
            // do the actual sql stuff, e.g.
            // select * from Record where id = ?
            return record;
        }
    };
    return Observable.fromCallable(callback).subscribeOn(Schedulers.computation());
}
This works well when a record is present. But when no record matches the id, it is treated like an error. Apparently this is because rxjava2 doesn't allow the Callable to return null.
Obviously I don't really want this. An error should only occur if the database failed or something similar, whereas an empty result is perfectly valid. I read somewhere that one possible solution is wrapping Record in a Java 8 Optional, but my project is not Java 8, and anyway that solution seems a bit ugly.
This is such a common, everyday task that I'm sure there must be a simple and easy solution, but I haven't found one so far. What is the recommended pattern to use here?
Your use case seems appropriate for RxJava2's new Observable type Maybe, which emits 1 or 0 items.
Maybe.fromCallable will treat a returned null as no item emitted.
You can see this discussion regarding nulls in RxJava2; I guess there aren't many choices besides using something Optional-like in other cases where you need null/empty values.
Thanks to @yosriz, I have it working with Maybe. Since I can't put code in comments, I'll post a complete answer here:
Instead of Observable, use Maybe like this:
public Maybe<Record> lookupRecord(int id) {
    Callable<Record> callback = new Callable<Record>() {
        @Override
        public Record call() throws Exception {
            // do the actual sql stuff, e.g.
            // select * from Record where id = ?
            return record;
        }
    };
    return Maybe.fromCallable(callback).subscribeOn(Schedulers.computation());
}
The good thing is the returned record is allowed to be null. To detect which situation occurred in the subscriber, the code is like this:
lookupRecord(id)
    .observeOn(AndroidSchedulers.mainThread())
    .subscribe(new Consumer<Record>() {
        @Override
        public void accept(Record r) {
            // record was loaded OK
        }
    }, new Consumer<Throwable>() {
        @Override
        public void accept(Throwable throwable) {
            // there was an error
        }
    }, new Action() {
        @Override
        public void run() {
            // there was an empty result
        }
    });
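If you would rather collapse the success and empty callbacks into one, Maybe also offers defaultIfEmpty, which turns the Maybe into a Single. A minimal sketch, assuming a project-defined sentinel such as a hypothetical Record.EMPTY constant:
lookupRecord(id)
    .defaultIfEmpty(Record.EMPTY) // Single<Record>: emits the sentinel when no row matched
    .observeOn(AndroidSchedulers.mainThread())
    .subscribe(new Consumer<Record>() {
        @Override
        public void accept(Record r) {
            if (r == Record.EMPTY) {
                // there was an empty result
            } else {
                // record was loaded OK
            }
        }
    }, new Consumer<Throwable>() {
        @Override
        public void accept(Throwable throwable) {
            // there was an error
        }
    });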

Using org.xmlunit.diff.NodeFilters in XMLUnit DiffBuilder

I am using XMLUnit in JUnit to compare the results of tests. I have a problem wherein an Element in my XML gets the CURRENT TIMESTAMP as the tests run, so when it is compared with the expected output, the results never match.
To overcome this, I read about using org.xmlunit.diff.NodeFilters, but I don't have any examples of how to implement it. The code snippet I have is as below:
final org.xmlunit.diff.Diff documentDiff = DiffBuilder
.compare(sourcExp)
.withTest(sourceActual)
.ignoreComments()
.ignoreWhitespace()
//.withNodeFilter(Node.ELEMENT_NODE)
.build();
return documentDiff.hasDifferences();
My problem is, how do I implement the NodeFilter? What parameter should be passed, and how should it be passed? There are no samples on this. The withNodeFilter method takes a Predicate<Node> as its parameter. What does Predicate<Node> mean?
Predicate is a functional interface with a single test method which, in the case of the NodeFilter, receives a DOM Node as its argument and returns a boolean; see the javadoc of Predicate.
An implementation of Predicate<Node> can be used to filter nodes for the difference engine, and only those Nodes for which the Predicate returns true will be compared; see the javadoc of setNodeFilter and the User-Guide.
Assuming your element containing the timestamp was called timestamp you'd use something like
.withNodeFilter(new Predicate<Node>() {
    @Override
    public boolean test(Node n) {
        return !(n instanceof Element &&
            "timestamp".equals(Nodes.getQName(n).getLocalPart()));
    }
})
or using lambdas
.withNodeFilter(n -> !(n instanceof Element &&
    "timestamp".equals(Nodes.getQName(n).getLocalPart())))
This uses XMLUnit's org.xmlunit.util.Nodes to get the element name more easily.
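Plugged into the DiffBuilder call from the question, that filter would look roughly like this (a sketch assuming XMLUnit 2.x, the same sourcExp/sourceActual inputs, and a timestamp element name):
final org.xmlunit.diff.Diff documentDiff = DiffBuilder
    .compare(sourcExp)
    .withTest(sourceActual)
    .ignoreComments()
    .ignoreWhitespace()
    // skip the element whose content changes on every run
    .withNodeFilter(n -> !(n instanceof Element &&
        "timestamp".equals(Nodes.getQName(n).getLocalPart())))
    .build();
return documentDiff.hasDifferences();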
The code below worked for me:
public final class IgnoreNamedElementsDifferenceListener implements DifferenceListener {
    private Set<String> blackList = new HashSet<String>();

    public IgnoreNamedElementsDifferenceListener(String... elementNames) {
        for (String name : elementNames) {
            blackList.add(name);
        }
    }

    public int differenceFound(Difference difference) {
        if (difference.getId() == DifferenceConstants.TEXT_VALUE_ID) {
            if (blackList.contains(difference.getControlNodeDetail().getNode()
                    .getParentNode().getNodeName())) {
                return DifferenceListener.RETURN_IGNORE_DIFFERENCE_NODES_IDENTICAL;
            }
        }
        return DifferenceListener.RETURN_ACCEPT_DIFFERENCE;
    }

    public void skippedComparison(Node node, Node node1) {
    }
}

String cannot be cast to an Iterable error?

So I'm attempting to go through a GroovyObject's fields and obtain the property of each field. This is what I've got (sorry, it's a little rough, so cleanup would be appreciated but isn't necessary; I'm also doing a little debugging and other stuff with the log and whatnot):
public void traverse(final GroovyObject groovy) throws RepositoryException, NoSuchFieldException, SecurityException, IllegalArgumentException, IllegalAccessException
{
    Field[] theFields = groovy.getClass().getDeclaredFields();
    final ArrayList<Field> fields = new ArrayList<Field>();
    int count = 0;
    for (Field field : theFields)
    {
        fields.add(field);
        LOG.error("{} = {}", field.getName(), groovy.getProperty(field.getName()));
    }
    // this is the guava tree traverser
    TreeTraverser<GroovyObject> traverser = new TreeTraverser<GroovyObject>()
    {
        @Override
        public Iterable<GroovyObject> children(GroovyObject root)
        {
            return (Iterable<GroovyObject>) root.getProperty(fields.get(0).getName());
            // |--> Here I get the "String cannot be cast to Iterable" error, which I find odd
            // since it is still an object; getProperty just takes a string, right?
        }
    };
Thoughts on this? Thanks for the help!
GroovyObject.getProperty(String) retrieves the value of the given property. And if that value happens to be a String, you cannot cast it to an Iterable.
If you adjust your log statement, you can inspect the types of the fields:
LOG.error("{} of type {} = {}", field.getName(), field.getType(), groovy.getProperty(field.getName()));
So I figured it out. Essentially what needs to happen is that I need to make two iterators: one for the groovy objects and one for the property strings, so the end goal looks like
groovyObject.iterate().next().getProperty(string.iterate().next());
Or something like that; I will update this when I figure it out.
Once I make that work, I can go back in and think about making it more efficient.
