JUnit: duplicate key error on the first three runs (Spring)

I run JUnit tests with almost the same configuration as the main project, against PostgreSQL like the main project but on a separate test database. On every run I insert prepared data for the tests.
https://i.stack.imgur.com/tlb92.jpg
I am testing the service method "addCategory", and on the first three runs I get "duplicate key value violates unique constraint "category_pkey"". On the fourth run the entity gets id=4 and everything works. I have tried different generation strategies, but the problem is not solved.
How can I make the test pass on the first run?

Related

spring batch - how to avoid re-loading(writing) data that was loaded in the previous run

I have a basic Spring Batch app that loads data from a CSV file into MySQL. The program loads the file into the database on the first run. However, when I accidentally re-ran the job/app, it threw a primary key violation (for the right reasons).
What is the best way to avoid reloading data that is already present on the target system? When the batch job is scheduled and, for whatever reason, the source file has not changed since the previous run, I want to see a "0 records processed" message rather than a primary key violation error. I hope that makes sense.
more information:
Thanks. I have probably not understood the answer, so let me explain my requirement better. I have a file containing data from an external data source (say, new hire data) with the fixed name hire.csv. The file should be updated with the delta changes for every run. Because someone may forget to remove all of the already loaded rows, some new hires from the previous run may still be present in the current run. Is there a mechanism within the ItemReader or ItemProcessor to skip records that are already present in the target database? I could do "insert into tb where not in (select from tb)", but that runs for every row, which I want to avoid. I hope it is clear now. Thanks again.
However, when I accidentally re-ran the job/app, it threw a primary key violation (for the right reasons). What is the best way to avoid reloading data that is already present on the target system?
The file you are ingesting should be a (identifying) job parameter. This way, when the first run succeeds, the job instance is complete and cannot be run again. This is by design in Spring Batch for this very use case: preventing accidental job execution twice by error.
Edit: Add further options based on comments
If deleting the file is an option, then you can use a job listener or a final step to delete the file after ingesting it. With this option, you need to add a second identifying parameter (since the file name is always hire.csv) to make sure you have a different job instance for each run. This option does not require having a different file name for each run.
If the file can be renamed to hire-${timestamp}.csv and will be unique, then deleting the file after ingesting it and using a single job parameter with the file name is enough.
Side note: I have seen people using a business key to identify records in the input file and using an item processor to query the database and filter items that have been already ingested. This works for small datasets but performs poorly with large datasets due to the additional query for each item.
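One way to soften the per-item cost mentioned in the side note is to fetch the existing business keys once and filter in memory. The following is a minimal Python sketch of that idea, not Spring Batch code: the table name `hire`, the column `employee_id`, and the helper `filter_new_records` are hypothetical, and an in-memory SQLite database stands in for the real target. It assumes the set of existing keys fits in memory.

```python
import sqlite3


def filter_new_records(conn, records):
    """Return only the records whose business key is not yet in the target table."""
    # One query fetches every existing key, instead of one query per item.
    existing = {row[0] for row in conn.execute("SELECT employee_id FROM hire")}
    return [r for r in records if r["employee_id"] not in existing]


# In-memory SQLite stands in for the real target database (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE hire (employee_id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO hire VALUES (1, 'Ann')")

incoming = [
    {"employee_id": 1, "name": "Ann"},  # left over from the previous run
    {"employee_id": 2, "name": "Bob"},  # genuinely new
]
new_records = filter_new_records(conn, incoming)
conn.executemany("INSERT INTO hire VALUES (:employee_id, :name)", new_records)
```

If the source file is unchanged since the previous run, `new_records` is empty and nothing is inserted, which matches the "0 records processed" behaviour asked for in the question.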

Visual Studio Load Test - Using Data Source with Multiple Agents

I'm using Visual Studio 2015 Load Test and running a Web Performance test that has a data source connected. The data source contains user login information for 250 users.
Running this in sequential order on a single agent works fine. However, I'm attempting to add 10 test agents to share the load. By design, the Load Test copies the data source to each agent and each agent runs the test. What ends up happening is that all 10 agents start the test using the row 1 user from the data source. I'm hoping there's a way to set up the Load Test to run sequentially across all agents (e.g. Agent 1 uses row 1, Agent 2 uses row 2, Agent 3 uses row 3, etc.).
I suspect there's not an option to set this up, but wondered if anyone ran into this and had workarounds to offer. I did find this info via http://vsptqrg.codeplex.com
Multiple machines running as a rig
Sequential – This works the same as if you are on one machine. Each agent receives a full copy of the data and each starts with row 1 in the data source. Then each agent will run through each row in the data source and continue looping until the load test completes.
Random – This also works the same as if you run the test on one machine. Each agent will receive a full copy of the data source and randomly select rows.
Unique – This one works a little differently. Each row in the data source will be used once. So if you have 3 agents, the data will be spread across the 3 agents and no row will be used more than once. As with one machine, once every row is used, the web test will stop executing.
You can split the data set/CSV and distribute a part to each agent, i.e. in your case 25 rows per agent, and then execute the test.
Each agent can then use its own data set/CSV.
CSV Split: http://monchito.com/blog/autosplit-csv
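The split itself is simple to script. Below is a minimal Python sketch, assuming a CSV with a single header row; the function names `split_rows` and `split_csv_text` and the sample column names are hypothetical and not taken from the linked tool.

```python
import csv
import io


def split_rows(rows, agent_count):
    """Split rows into agent_count contiguous chunks of near-equal size."""
    size, extra = divmod(len(rows), agent_count)
    chunks, start = [], 0
    for i in range(agent_count):
        # The first `extra` chunks take one additional row each.
        end = start + size + (1 if i < extra else 0)
        chunks.append(rows[start:end])
        start = end
    return chunks


def split_csv_text(text, agent_count):
    """Return one CSV string per agent, each repeating the header row."""
    header, *rows = list(csv.reader(io.StringIO(text)))
    outputs = []
    for chunk in split_rows(rows, agent_count):
        buf = io.StringIO()
        writer = csv.writer(buf, lineterminator="\n")
        writer.writerow(header)
        writer.writerows(chunk)
        outputs.append(buf.getvalue())
    return outputs


# 250 hypothetical logins split across 10 agents gives 25 rows each.
sample = "username,password\n" + "".join(f"user{i},pw{i}\n" for i in range(250))
parts = split_csv_text(sample, 10)
```

Each element of `parts` could then be written to a separate file and deployed to one agent.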
The nearest you can get to what you want is to use the unique setting. However, each data source row will only be used once, then the test will stop. With a data source containing 250 lines, only 250 test executions will take place. I do not know the exact distribution of data source rows to agents when unique is specified.
If more than one execution per data source row is wanted, then another approach is to have one data source column per agent. Use the agent_id to select the column. Use the sequential data source access. A variation is to have just one set of data in the data source but append the agent_id to some of the values in the data source. This answer has some variations on these ideas and some code.
Another possibility is to use the MoveDataTableCursor method to set a specific row for each test execution. This could be called in a PreWebTest method of a WebTestPlugin. The code would use the context parameters $AgentId and $WebTestIteration. The call would be based on the following:
MoveDataTableCursor(..., ..., $AgentId * NumberOfAgents + $WebTestIteration);
Notes:
The values of $AgentId and $WebTestIteration from the context are strings, they would need to be converted to numbers to do the multiply and add.
Would also need to check whether the two values are zero-based or one-based.
The documentation for MoveDataTableCursor is not very informative
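The row arithmetic can be checked outside Visual Studio. The sketch below is plain Python, not the plugin code, and uses an interleaved variant (iteration * NumberOfAgents + AgentId) rather than the expression quoted above, because with the quoted expression two agents would land on the same row once the iteration count reaches the number of agents. The function name `row_for`, the wrap-around at the end of the data source, and the assumption that both context values are one-based are all hypothetical and would need checking, as the notes say.

```python
def row_for(agent_id, iteration, number_of_agents, row_count, one_based=True):
    """Pick a data source row so that agents interleave instead of colliding."""
    if one_based:
        # Assumption: both context values start at 1; shift them to start at 0.
        agent_id -= 1
        iteration -= 1
    # Agent 0 gets rows 0, N, 2N, ...; agent 1 gets rows 1, N+1, ... and so on.
    return (iteration * number_of_agents + agent_id) % row_count


# Context values arrive as strings, so convert before doing arithmetic.
agent = int("2")
iteration = int("1")
row = row_for(agent, iteration, number_of_agents=10, row_count=250)
```

With 10 agents and 250 rows, the first 25 iterations of every agent together cover each row exactly once; after that the modulo wraps around to the start of the data source.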

How to order ETL tasks in Sql Server Data Tools (Integration Services)?

I'm a newbie in ETL processing. I am trying to populate a data mart through ETL and have hit a bump. I have 4 ETL tasks (each task fills a particular table in the mart), and the problem is that I need to perform them in a particular order to avoid constraint violations such as foreign key violations. How can I achieve this? Any help is really appreciated.
This is a snap of my current ETL:
Create a separate Data Flow Task for each table you're populating in the Control Flow, and then simply connect them together in the order you need them to run in. You should be able to just copy/paste the components from your current Data Flow to the new ones you create.
The connections between Tasks in the Control Flow are called Precedence Constraints, and if you double-click on one you'll see that they give you a number of options for controlling the flow of your ETL package. For now, though, you'll probably be fine leaving them on the defaults. This means that each Data Flow Task will wait for the previous one to finish successfully. If one fails, the next one won't start and the package will fail.
If you want some tables to load in parallel, but then have some later tables wait for all of those to be finished, I would suggest adding a Sequence Container and putting the ones that need to load in parallel into it. Then connect from the Sequence Container to your next Data Flow Task(s) - or even from one Sequence Container to another. For instance, you might want one Sequence Container holding all of your Dimension loading processes, followed by another Sequence Container holding all of your Fact loading processes.
A common pattern goes a step further than using separate Data Flow Tasks. If you create a separate package for every table you're populating, you can then create a parent package, and use the Execute Package Task to call each of the child packages in the correct order. This is fantastic for reusability, and makes it easy for you to manually populate a single table when needed. It's also really nice when you're testing, as you don't need to keep disabling some Tasks or re-running the entire load when you want to test a single table. I'd suggest adopting this pattern early on so you don't have a lot of re-work to do later.
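Before wiring up the precedence constraints, it can help to work out a valid load order from the foreign key dependencies. The following Python sketch does that outside SSIS with a topological sort; the table names are hypothetical, and the mapping lists, for each table, the tables it references.

```python
from graphlib import TopologicalSorter

# Hypothetical data mart: each table maps to the tables it references.
depends_on = {
    "dim_customer": set(),
    "dim_product": set(),
    "dim_date": set(),
    "fact_sales": {"dim_customer", "dim_product", "dim_date"},
}

# static_order() yields every table after all of the tables it depends on.
load_order = list(TopologicalSorter(depends_on).static_order())
```

Here the three dimension tables come out first in some order and `fact_sales` comes last, which mirrors the "dimensions container, then facts container" layout described above.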

JPA(OpenJPA) is very slow when searching on multiple data sources

We have a Spring application running on OpenJPA/Oracle. It has three persistence units to take care of, but all three are on the same Oracle instance, so every unit has its transaction-type set to "RESOURCE_LOCAL".
The problem is that a search using a very basic finder method, such as finding employees by department (not a primary key and not indexed, but the employee table holds only about a thousand records), takes a very long time to return a result. The same query takes only about 0.089 seconds in SQL*Plus. I am opening this thread to discuss what the main cause of the issue could be and what a possible solution might be.
Thanks in advance.

Unit Testing DDL with SQL Developer 3.1

SQL Developer supports unit testing of DML, but I've not found a way to create unit tests for DDL. What would be a good approach to this problem? The schema I'm starting with is small, fewer than a dozen tables, with larger projects on the horizon. Google isn't returning much on the application of unit tests to DDL. Any ideas on an approach to testing DDL, or on other tools that exist for unit testing DDL?
What do you want to test about DDL? Either the table is created as defined or it is not.
What you could do is write a series of tests that queries the Data Dictionary to ensure the tables are present, have the columns with the sizes and datatype you want etc. This would be more of a schema verification script than unit tests however, and I am not sure how valuable it would be.
If you maintain a schema build script (or a series of migrations to add new objects to your schema), then if it applies without errors you know the schema has been created as it was defined.
Then if you have stored procedures, some of them will fail to compile if the schema is not 100% correct. Getting the procedures in cleanly would be another verification step for the schema.
Finally, the unit tests that you write to test the DML and stored procedures will verify that the correct data goes into the correct tables.
You might want some tests to ensure that a table can only accept certain values, or that columns are unique, etc. (i.e. test that the constraints are correct), but that would be down to standard unit tests too.
I am a big believer in writing unit tests for DB code, but I don't like SQL Developer's GUI approach to doing it. Right now I am writing tests for an application, but I am coding the tests in Ruby and it seems to be working well. It will also be easy to integrate into our build and automated test process.
Another alternative is UT_PLSQL, which I have used before. However, simply due to the nature of PL/SQL, it makes the tests very verbose, which is why I decided to use Ruby for my current project.
I know this is an older question, but I've recently been working to solve the same problem. I think it's useful to define tests for DDL prior to creating objects and then creating those objects to pass those tests.
I've done some of this using an assert "pattern" -- i.e., tdd.ddlunit.assert_tableexists(p_schema_name, p_table_name) which raises an exception if the table doesn't exist, and silently runs when it does.
Other assertions I've created are for things like making sure all varchar2 columns use character semantics instead of byte length semantics, and making sure all tables and columns are commented.
These get checked in to the code repository and can be run via continuous integration frameworks to make sure we have a valid database per what we expect.
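The assert pattern described above can be sketched in a few lines. The version below is Python against an in-memory SQLite database, used here only as a stand-in for Oracle and its data dictionary; the function names `assert_table_exists` and `assert_column_type` and the `employee` table are hypothetical. Against Oracle, the same checks would query the data dictionary views instead of `sqlite_master` and `PRAGMA table_info`.

```python
import sqlite3


def assert_table_exists(conn, table_name):
    """Raise AssertionError if the table is missing; return silently otherwise."""
    row = conn.execute(
        "SELECT 1 FROM sqlite_master WHERE type = 'table' AND name = ?",
        (table_name,),
    ).fetchone()
    if row is None:
        raise AssertionError(f"table {table_name!r} does not exist")


def assert_column_type(conn, table_name, column_name, expected_type):
    """Raise AssertionError unless the column exists with the declared type."""
    # PRAGMA table_info rows are (cid, name, type, notnull, dflt_value, pk).
    columns = {r[1]: r[2] for r in conn.execute(f"PRAGMA table_info({table_name})")}
    actual = columns.get(column_name)
    if actual != expected_type:
        raise AssertionError(
            f"{table_name}.{column_name}: expected {expected_type}, got {actual}"
        )


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (id INTEGER PRIMARY KEY, name VARCHAR(50))")
assert_table_exists(conn, "employee")
assert_column_type(conn, "employee", "name", "VARCHAR(50)")
```

Written this way, the assertions can be defined before the DDL exists, fail until the objects are created, and then run unchanged in a continuous integration job.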

Resources