How do I get CsvDozerBeanWriter to pull column headers from Dozer XML mapping files - supercsv

I'm writing a feature to produce CSV snapshots of screen data.
I need this to be data-driven. Thus I need to avoid hard-coding each snapshot in Java, but rather load it from a data source such as an XML file or a database. The data is contained in Java beans.
I'm using Super CSV with the Dozer extension, both at version 2.1.0.
This combination seems perfect since I can code the mappings from the beans to the columns in Dozer XML mapping files.
This works well for the data, but I have not found a way to specify the strings to use for the CSV's column headers other than to hard-code them in Java as is done in all of the examples and test cases I've looked at. That is not data-driven.
Is there a way for me to code the column headers in the mapper file? Or even to extract them from the mapper file, construct a List, and pass them to the writeHeader() method?
I think it would be OK to just use the bean property names as the headers, although the ideal situation would be some additional metadata notation in the XML's <field> tag that specifies the header.
I'd have posted this on SourceForge, but I'm getting a 500 error there.

I'm a Super CSV developer. You're the first person I've heard of who's using CsvDozerBeanWriter with their own DozerBeanMapper - great to hear that feature is useful :)
So what's the goal of being 'data driven'? It sounds like you want your code to be really generic, so you can alter the CSV just by changing the XML. Is that right? Of course, you can't configure the cell processors dynamically...or are you trying to do that too!!??
I'd take a look at the MappingMetadata API of Dozer, which you can access by calling getMappingMetadata() on the DozerBeanMapper. I've never used it, but it looks like you could derive the column names this way (though you'd probably be limited to the field names).
Otherwise, you'll have to parse the XML file yourself (I'd probably use XPath). You'd have to do it this way if you want to use some other metadata in the XML for the column name.
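For example, something along these lines (untested) should pull the bean-side field names out of the mapping metadata. It assumes your mapping XML maps your bean (class-a) to Super CSV's CsvDozerBeanData (class-b), as in the Super CSV examples, and that you're on a Dozer version that has the metadata API; everything else is illustrative:

import java.util.ArrayList;
import java.util.List;

import org.dozer.DozerBeanMapper;
import org.dozer.metadata.ClassMappingMetadata;
import org.dozer.metadata.FieldMappingMetadata;
import org.supercsv.io.dozer.CsvDozerBeanData;

public class HeaderExtractor {

    // Derives CSV column headers from the Dozer mapping for a given bean class.
    // The bean property names (the class-a side of each <field> mapping) are
    // used as the headers.
    public static String[] headersFor(DozerBeanMapper mapper, Class<?> beanClass) {
        ClassMappingMetadata mapping = mapper.getMappingMetadata()
                .getClassMapping(beanClass, CsvDozerBeanData.class);
        List<String> headers = new ArrayList<String>();
        for (FieldMappingMetadata field : mapping.getFieldMappings()) {
            headers.add(field.getSourceName());
        }
        return headers.toArray(new String[headers.size()]);
    }
}

You could then pass the result straight to writeHeader() before writing the beans. If you want headers other than the field names, you'd be back to parsing the XML yourself.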

Related

Using Page Objects vs Config Files in Selenium

I've been using Ruby Selenium-Webdriver for one of the automation scripts I'm developing, and I'm being asked to use Page Objects. We use Page Objects a lot, but for this application I am using a CSV file instead: I have defined all the XPaths my application uses in a CSV file, and I parse that CSV file in my script to refer to those objects. I would like to know whether there is much of a difference between using a class to define Page Objects and using a CSV file, apart from the performance concern. I believe using a CSV file will be a plus for us from a configuration standpoint and will make things much easier to maintain. Any suggestions on this?
Edit - In our use case, we're actually automating applications built on a cloud-based tool, so all the applications share the same design structure from an HTML standpoint. We define XPath patterns in the CSV and then pass certain parameters to some custom methods we've developed that generate the XPaths automatically from the CSV, instead of finding them manually. Finding them manually is overhead for us, because we already know that all the applications will share a similar XPath pattern for every element.
Thanks
I think POM is better than the CSV approach. In POM you put the elements for a page in a separate class file, so if any change needs to be made it's easier to find where to change and maintain it. Moreover, it won't get as messy as a CSV file, and you don't need an extra utility function to parse it.
There is also a pageobjects gem that provides a set of libraries over and above webdriver/watir, simplifying the code.
Plus, why XPaths? They're one of the least recommended ways to identify an element.
As for the framework aspect, CSV is likely to be more of a maintenance problem than Page Objects. It's the basic difference between text and code: Page Objects enforce an object-oriented approach on your elements, which is not possible with a CSV.
In the best-case scenario, you have created a column or separate sheets defining which page each element's XPath belongs to. That sounds like overhead. As your application or suite grows there can be thousands of elements; imagine parsing or manually updating a CSV with that kind of data.
With Page Objects, by contrast, your elements are restricted to the page, and any change to the app also tells you which elements may be impacted. And when you define your element as an object in a Page Object, rather than as a raw locator string, you don't need to explicitly create your elements by reading the CSV.
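For instance, a minimal Java Page Object might look like this (the locators are made up; the point is that the elements and the actions on them live together in one class):

import org.openqa.selenium.WebDriver;
import org.openqa.selenium.WebElement;
import org.openqa.selenium.support.FindBy;
import org.openqa.selenium.support.PageFactory;

public class LoginPage {

    // a UI change to this page points you straight at this class
    @FindBy(id = "username")
    private WebElement username;

    @FindBy(id = "password")
    private WebElement password;

    @FindBy(css = "button[type='submit']")
    private WebElement signIn;

    public LoginPage(WebDriver driver) {
        PageFactory.initElements(driver, this);
    }

    public void login(String user, String pass) {
        username.sendKeys(user);
        password.sendKeys(pass);
        signIn.click();
    }
}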
It completely depends on the application and the type of test you might perform.
Since it is an automated test script, you do not have to worry much about performance (parsing might take a few more milliseconds, which should be OK).
Maintaining all the element identification properties and corresponding actions in a CSV file makes maintenance easier and keeps the framework application-independent, which is nice. But it is a bit harder to make such a framework robust. Both approaches have their own pros and cons.
Refer to the posts below (the examples are in Java, but you will get the idea):
Keyword driven framework
Advanced Page Objects
Update:
If you like both, you can come up with your own implementation that integrates the two.
import java.util.Map;
import org.openqa.selenium.WebElement;

// @ObjectRepository is a custom annotation you would write: it tells your
// framework which CSV file holds this page's element definitions.
@ObjectRepository(src = "/login.csv")
public class LoginPage {

    // populated by the framework from the CSV: element name -> WebElement
    private Map<String, WebElement> elements;

    public void login() {
        elements.get("username").sendKeys("");
        elements.get("password").sendKeys("");
        elements.get("signin").click();
    }
}
That is, define all the elements in a config file such as CSV or JSON, and let the page object class refer to that file for its page elements. All the methods remain part of the page class.
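For example (purely illustrative - the column layout and locator strategies are whatever your loader expects), login.csv could contain:

name,how,locator
username,id,username
password,id,password
signin,css,button[type='submit']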

Issue with setting multiple projectionSchemas for AvroParquetInputFormat

I use AvroParquetInputFormat. The use case requires scanning multiple input directories, and each directory will have files with one schema. Since the AvroParquetInputFormat class cannot handle multiple input schemas, I created a workaround by statically creating multiple dummy classes such as MyAvroParquetInputFormat1, MyAvroParquetInputFormat2, etc., where each class just inherits from AvroParquetInputFormat. For each directory I set a different MyAvroParquetInputFormat, and that worked (please let me know if there is a cleaner way to achieve this).
My current problem is as follows:
Each file has a few hundred columns and based on meta-data I construct a projectionSchema for each directory, to reduce unnecessary disk & network IO. I use the static setRequestedProjection() method on each of my MyAvroParquetInputFormat classes. But, being static, the last call’s projectionSchema is used for reading data from all directories, which is not the required behavior.
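For reference, here is roughly what my current setup looks like (class, path, and schema names are simplified, and the imports use the pre-Apache parquet-mr package names):

import org.apache.avro.Schema;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.MultipleInputs;
import parquet.avro.AvroParquetInputFormat;

// Dummy subclasses whose only purpose is to give MultipleInputs a distinct
// InputFormat class per input directory.
class MyAvroParquetInputFormat1 extends AvroParquetInputFormat {}
class MyAvroParquetInputFormat2 extends AvroParquetInputFormat {}

// Placeholder mappers, one per directory (details omitted).
class Dir1Mapper extends Mapper {}
class Dir2Mapper extends Mapper {}

class JobSetup {
    static void configure(Job job, Schema projection1, Schema projection2) {
        // one dummy InputFormat per input directory
        MultipleInputs.addInputPath(job, new Path("/data/dir1"),
                MyAvroParquetInputFormat1.class, Dir1Mapper.class);
        MultipleInputs.addInputPath(job, new Path("/data/dir2"),
                MyAvroParquetInputFormat2.class, Dir2Mapper.class);

        // The problem: setRequestedProjection() is static and stores the
        // projection in the single job-wide configuration, so the second call
        // overwrites the first and both directories get read with projection2.
        MyAvroParquetInputFormat1.setRequestedProjection(job, projection1);
        MyAvroParquetInputFormat2.setRequestedProjection(job, projection2);
    }
}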
Any pointers to workarounds/solutions would be highly appreciated.
Thanks & Regards
MK
Keep in mind that if your Avro schemas are compatible (see the Avro documentation for the definition of schema compatibility) you can access all the data with a single schema. Building on this, it is also possible to construct a Parquet-friendly schema (no unions) that is compatible with all your schemas, so you can use just that one.
As for the approach you took, there is no easy way of doing this that I know of. You have to extend MultipleInputs functionality somehow to assign a different schema for each of your input formats. MultipleInputs works by setting two configuration properties in your job configuration:
mapreduce.input.multipleinputs.dir.formats //contains a comma separated list of InputFormat classes
mapreduce.input.multipleinputs.dir.mappers //contains a comma separated list of Mapper classes.
These two lists must be the same length. And this is where it gets tricky. This information is used deep within hadoop code to initialize mappers and input formats, so that's where you should add your own code.
As an alternative, I would suggest that you do the projection using one of the tools already available, such as hive. If there are not too many different schemas, you can write a set of simple hive queries to do the projection for each of the schemas, and after that you can use a single mapper to process the data or whatever the hell you want.

Yaml properties as a Map in Spring Boot

We have a spring-boot project and are using application.yml files. This works exactly as described in the spring-boot documentation. spring-boot automatically looks in several locations for the files, and obeys any environment overrides we use for the location of those files.
Now we want to also expose those YAML properties as a Map. According to the documentation this can be done with YamlMapFactoryBean. However, YamlMapFactoryBean wants me to specify which YAML files to use via the resources property. I want it to use the same YAML files and processing hierarchy that it used when creating the properties, so that I can still take advantage of "magical" features such as placeholder resolution in property values.
I didn't see any documentation on if this was possible.
I was thinking of writing a MapFactoryBean that looked at the environment and simply reversed the "flattening" performed by the YamlProcessor when creating the properties representation of the file.
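Roughly what I have in mind, as a sketch (the merge/override rules are naive - the first property source wins - and list-style keys like foo[0] get no special treatment):

import java.util.LinkedHashMap;
import java.util.Map;

import org.springframework.core.env.ConfigurableEnvironment;
import org.springframework.core.env.EnumerablePropertySource;
import org.springframework.core.env.PropertySource;

public class EnvironmentMapBuilder {

    // Rebuilds nested maps from the flattened, dotted property names found in
    // the Environment's enumerable property sources.
    @SuppressWarnings("unchecked")
    public static Map<String, Object> asMap(ConfigurableEnvironment env) {
        Map<String, Object> root = new LinkedHashMap<String, Object>();
        for (PropertySource<?> source : env.getPropertySources()) {
            if (!(source instanceof EnumerablePropertySource)) {
                continue;
            }
            for (String key : ((EnumerablePropertySource<?>) source).getPropertyNames()) {
                String[] path = key.split("\\.");
                Map<String, Object> current = root;
                for (int i = 0; i < path.length - 1; i++) {
                    Object child = current.get(path[i]);
                    if (!(child instanceof Map)) {
                        child = new LinkedHashMap<String, Object>();
                        current.put(path[i], child);
                    }
                    current = (Map<String, Object>) child;
                }
                // getProperty() resolves placeholders, so ${...} values come back
                // already substituted; earlier (higher-precedence) sources win.
                if (!current.containsKey(path[path.length - 1])) {
                    current.put(path[path.length - 1], env.getProperty(key));
                }
            }
        }
        return root;
    }
}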
Any other ideas?
The ConfigFileApplicationContextListener contains the logic for searching for files in various locations. And PropertySourcesLoader loads a file (Resource) into property sources. Neither is really designed for standalone use, but you could easily duplicate them if you want more control. The PropertySourcesLoader delegates to a collection of PropertySourceLoaders so you could add one of the latter that delegates to your YamlMapFactoryBean.
A slightly awkward but workable solution would be to use the existing machinery to collect the YAML on startup. Add a new PropertySourceLoader to your META-INF/spring.factories and let it create new property sources, then post process the Environment to extract the source map(s).
Beware, though: creating a single Map from multiple YAML files, or even from a single file with multiple documents (let alone multiple files with multiple documents), isn't as easy as you might think. You have a map-merge problem, and someone is going to have to define the algorithm. The flattening done in YamlPropertiesFactoryBean and the merge in YamlMapFactoryBean are just two choices out of (probably) a larger set of possibilities.

Create Value class for Sequence Files at runtime

I have some types of data that I have to upload on HDFS as Sequence Files.
Initially, I had thought of creating a .jr file at runtime depending on the type of schema, and using Hadoop's rcc DDL tool to create these classes and use them.
But looking at rcc documentation, I see that it has been deprecated. I was trying to see what other options I have to create these value classes per type of data.
This is a problem because I only learn the metadata of the data to be loaded at runtime, along with the data stream. So I have no choice but to create the value class at runtime and then use it to write (key, value) pairs to a SequenceFile.Writer, finally saving the file on HDFS.
Is there any solution for this problem?
You can try looking at other serialization frameworks, like Protocol Buffers, Thrift, or Avro. You might want to look at Avro first, since it doesn't require static code generation, which might be more suitable for you.
Or, if you want something really quick and dirty, each record in the SequenceFile can be a map whose keys are the field names and whose values are the field values.
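Here's a sketch of that quick-and-dirty approach using Hadoop's MapWritable as the concrete map type (SequenceFile values need to be Writable; the path, key, and field names are made up, and it uses the Hadoop 2 writer API):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.MapWritable;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;

public class MapRecordWriter {

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Path out = new Path("/data/records.seq");

        try (SequenceFile.Writer writer = SequenceFile.createWriter(conf,
                SequenceFile.Writer.file(out),
                SequenceFile.Writer.keyClass(Text.class),
                SequenceFile.Writer.valueClass(MapWritable.class))) {

            // one MapWritable per record: field name -> field value
            MapWritable record = new MapWritable();
            record.put(new Text("name"), new Text("Alice"));
            record.put(new Text("age"), new Text("42"));

            writer.append(new Text("record-1"), record);
        }
    }
}

The trade-off is that every record carries its field names, so the files are larger than with a generated value class or Avro.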

xsd - validating values from external dictionary file

I would like to define a schema for a document like:
...
<car>
<make>ford</make>
<model>mondeo</model>
</car>
...
The problem is that I would like to constrain the possible values (so ford/mondeo or audi/a4 would be valid values for make/model, but audi/mondeo would not) against an external data dictionary. When new car models need to be added, only the external data file would change; the XSD schema would remain the same.
Is this possible at all? I have looked at the key/keyref constraints, and I see I can use them within a single document, but this is not what I'm looking for. I don't want to repeat the full data dictionary with every document instance; I would prefer the data file to effectively constitute part of the schema.
That is not possible in XML Schema 1.0.
XML Schema 1.1 will add some support that will allow expressing this kind of constraint (although AFAIK not against external files) - but that is not yet a W3C recommendation.
It is possible to implement this now with Schematron, optionally embedded in the XML Schema.
However, there has already been work in this area with usable results. See the OASIS Code Lists committee:
http://www.oasis-open.org/committees/tc_home.php?wg_abbrev=codelist
More details can be found here:
http://www.genericode.org/
This is used in the OASIS Universal Business Language (UBL)
http://www.oasis-open.org/committees/tc_home.php?wg_abbrev=ubl
Best Regards,
George
