I'm using JMeter to run performance tests, but my sample data set is huge.
We want to simulate production-like traffic, and in order to do that, we need to have a large variety of requests replayed from production logs.
In turn, this causes us to have a huge sample dataset. So the questions are:
What's the recommended CSV sample size for large input samples?
Is CSV Data Config enough to use files that contain 300MB - 500MB or more worth of HTTP request payloads?
Can I just increase JVM memory limits?
Is this method good enough? Is there a better alternative?
Thanks!
The size of the CSV has no impact on memory usage of JMeter provided you use CSV Data Set.
Just don't use the CSVRead function, as per the note in the documentation.
By the way, I see you tagged the question as JMeter 3.2. If that is the version you are using, you should upgrade to JMeter 4.0, which is the most powerful and accurate version.
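To illustrate why file size need not drive memory use, here is a minimal sketch of row-at-a-time reading. It is not JMeter's implementation; the class name, method name, and sample rows are made up for illustration. The point is only that a reader which hands out one row at a time keeps a single line in memory, whether the file is 5 KB or 500 MB.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;

// Illustrative only: this is NOT JMeter's CSV Data Set implementation.
class StreamingCsvSketch {

    // Reads rows one at a time and returns how many non-empty rows were seen.
    // Only the current line is held in memory, whatever the input size.
    static long countRows(BufferedReader reader) {
        long rows = 0;
        try {
            String line;
            while ((line = reader.readLine()) != null) {
                if (!line.isEmpty()) {
                    rows++; // a real test would build one request from this row
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return rows;
    }

    public static void main(String[] args) {
        // Hypothetical two-row sample standing in for a large payload file.
        String sample = "GET,/home\nPOST,/login\n";
        System.out.println(countRows(new BufferedReader(new StringReader(sample))));
    }
}
```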
I just started learning MicroStream. After going through the examples published in the MicroStream GitHub repository, I wanted to test its performance with an application that deals with more data.
Application source code is available here.
Instructions to run the application and the problems I faced are available here.
To summarize, below are my observations:
While loading a file with 2.8+ million records, processing takes 5 minutes
While calculating statistics based on loaded data, application fails with an OutOfMemoryError
Why is microstream trying to load all data (4 GB) into memory? Am I doing something wrong?
MicroStream is not like a traditional database: it starts from the premise that all data is in memory. The object graph is persisted to disk (or other media) when you store it through the StorageManager.
In your case, all data is in one list, so accessing this list reads all records from disk. The Lazy reference is not useful the way you have used it, since it only guards access to the single list that holds all the data.
Some optimizations you can introduce:
Split the data by vendorId or by day, using a Map<String, Lazy<List>>.
When a Map value has been processed, remove it from memory again by clearing the lazy reference. See https://docs.microstream.one/manual/5.0/storage/loading-data/lazy-loading/clearing-lazy-references.html
Increase the number of channels to optimize reading and writing the data. See https://docs.microstream.one/manual/5.0/storage/configuration/using-channels.html
Don't store the object graph every 10000 lines; store it once, at the end of loading.
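The first two points (split per key, clear after processing) can be sketched in plain Java. Note that LazyPartition below is a hypothetical stand-in written for this example, not MicroStream's Lazy class, and the vendor keys and amounts are invented; see the linked documentation for the real API. The sketch only shows the access pattern: one partition in memory at a time.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical stand-in for a lazy reference; NOT MicroStream's Lazy class.
class LazyPartition<T> {
    private final Supplier<T> loader; // simulates reading one partition from storage
    private T value;                  // null while the partition is not in memory

    LazyPartition(Supplier<T> loader) {
        this.loader = loader;
    }

    T get() {
        if (value == null) {
            value = loader.get(); // load on first access only
        }
        return value;
    }

    void clear() {
        value = null; // make the partition eligible for garbage collection
    }

    boolean isLoaded() {
        return value != null;
    }
}

class PartitionedStoreSketch {

    // Sums every partition, loading one at a time and releasing it afterwards.
    static long sumAll(Map<String, LazyPartition<List<Integer>>> byVendor) {
        long total = 0;
        for (LazyPartition<List<Integer>> partition : byVendor.values()) {
            for (int amount : partition.get()) {
                total += amount;
            }
            partition.clear(); // drop the processed partition from memory
        }
        return total;
    }

    public static void main(String[] args) {
        // Invented sample data: two vendors with a few amounts each.
        Map<String, LazyPartition<List<Integer>>> byVendor = new LinkedHashMap<>();
        byVendor.put("vendor-1", new LazyPartition<List<Integer>>(() -> Arrays.asList(10, 20)));
        byVendor.put("vendor-2", new LazyPartition<List<Integer>>(() -> Arrays.asList(5)));
        System.out.println(sumAll(byVendor)); // prints 35
    }
}
```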
Hope this helps you solve the issues you have at the moment
Hi guys, I want to run a load test with almost 200 users. I need to know the exact hardware configuration I would require: RAM, disk space, and so on.
Can someone help me with this? Thanks in advance.
(P.S.: please don't tell me it depends on your users and so on. I'm new to JMeter and don't know anything.)
Actually it depends on what your test is doing, number of Samplers, number of PreProcessors, PostProcessors, Assertions, request and response size, etc.
Minimal configuration would be:
At least 512 KB per thread + around 100 MB for JMeter itself to operate, which for 200 threads gives ~200 MB of RAM
At least ~50 MB of disk space (JMeter) + ~400 MB for the Java SDK/JRE + however much extra space you need for test data and test results
However, only you can get the exact answer, by running your actual test and measuring JMeter's footprint using e.g. JVisualVM or the JMeter PerfMon Plugin.
Also make sure you're following JMeter Best Practices.
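The RAM rule of thumb is simple arithmetic: 200 threads x 512 KB is 100 MB, plus the 100 MB baseline, so about 200 MB. A small sketch that makes it easy to try other thread counts (the class and constant names are made up for illustration, and the result is a floor, not a measurement):

```java
// Rough lower bound only; measure the real footprint with your actual test plan.
class JMeterFootprintSketch {
    static final long KB_PER_THREAD = 512; // per-thread figure quoted in the answer
    static final long BASE_MB = 100;       // baseline for JMeter itself

    // Returns the minimal RAM estimate in MB, rounding the per-thread part up.
    static long estimateRamMb(int threads) {
        long threadsMb = (threads * KB_PER_THREAD + 1023) / 1024;
        return threadsMb + BASE_MB;
    }

    public static void main(String[] args) {
        System.out.println(estimateRamMb(200)); // prints 200
    }
}
```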
@Mehvish It is impossible to state the correct hardware requirement up front. A typical machine of this generation should be able to support it, but the only way to validate that is to actually try it.
Please refer to this article, as it has some good info which might help:
https://sqa.stackexchange.com/questions/15178/what-is-recommended-hardware-infrastructure-for-running-heavy-jmeter-load-tests
My request has 3800 viewstates that come from the previous request's response. It's very hard to capture the values one by one using regular expressions and replace them with variables.
Is there any simple way to handle them?
There is an alternative way of recording a JMeter test using a cloud-based proxy service. It is capable of exporting recordings in SmartJMX format with automatic detection and correlation of dynamic parameters so you won't have to handle them manually - the necessary PostProcessors and variables substitutions will be added to the test plan automatically.
Check out the How to Cut Your JMeter Scripting Time by 80% article for more details.
In general I would recommend talking to your application developers, as almost 4k dynamic parameters is too many: at the very least it creates massive network IO overhead to pass them back and forth, and takes immense CPU/RAM to parse them on both sides.
I have a design question. I have a 3-4 GB data file, ordered by time stamp. I am trying to figure out what the best way is to deal with this file.
I was thinking of reading this whole file into memory, then transmitting this data to different machines and then running my analysis on those machines.
Would it be wise to upload this into a database before running my analysis?
I plan to run my analysis on different machines, so doing it through a database would be easier, but if I increase the number of machines to run my analysis on, the database might get too slow.
Any ideas?
Update:
I want to process the records one by one. Basically, I am trying to run a model on timestamped data, but I have various models, so I want to distribute the work so that the whole process can run overnight every day. I want to make sure that I can easily increase the number of models without decreasing system performance, which is why I am planning to distribute the data to all the machines running the models (each machine will run a single model).
You can also access the file on the hard disk directly, reading a small chunk at a time. Java has a class called RandomAccessFile for this, and the same concept is available in other languages.
Whether you load the file into a database for analysis should be governed purely by the requirements. If you can read the file and process it as you go, there is no need to store it in a database. But if the analysis requires data from many different areas of the file, then a database would be a good idea.
You do not need the whole file in memory, just the data you need for the analysis. You can read every line and store only the needed parts of it, plus the index at which the line starts in the file, so you can find it later if you need more data from that line.
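A minimal sketch of that offset-index idea, assuming a newline-delimited file with single-byte (ASCII) characters; the class and method names are made up for illustration. One pass records where each line starts, and RandomAccessFile.seek later jumps straight to any recorded line without rereading the rest of the file.

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

class LineIndexSketch {

    // Scans the file once and records the byte offset at which each line starts.
    static List<Long> indexLineStarts(Path file) throws IOException {
        List<Long> offsets = new ArrayList<>();
        try (InputStream in = new BufferedInputStream(Files.newInputStream(file))) {
            long pos = 0;
            boolean atLineStart = true;
            int b;
            while ((b = in.read()) != -1) {
                if (atLineStart) {
                    offsets.add(pos);
                    atLineStart = false;
                }
                if (b == '\n') {
                    atLineStart = true;
                }
                pos++;
            }
        }
        return offsets;
    }

    // Jumps straight to a recorded offset and reads that one line.
    // RandomAccessFile.readLine maps each byte to one char, hence the ASCII assumption.
    static String readLineAt(Path file, long offset) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "r")) {
            raf.seek(offset);
            return raf.readLine();
        }
    }

    public static void main(String[] args) throws IOException {
        // Throwaway sample file with invented records.
        Path file = Files.createTempFile("records", ".txt");
        Files.write(file, "t1,a\nt2,b\nt3,c\n".getBytes(StandardCharsets.US_ASCII));
        List<Long> offsets = indexLineStarts(file);
        System.out.println(readLineAt(file, offsets.get(2))); // prints t3,c
    }
}
```

For a 3-4 GB file the index itself stays small: one long per line, plus whatever fields you decide to keep.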
Would it be wise to upload this into a database before running my analysis ?
Yes.
I plan to run my analysis on different machines, so doing it through a database would be easier, but if I increase the number of machines to run my analysis on, the database might get too slow.
Don't worry about it; it will be fine. Just introduce a marker so that the rows processed by each computer are identified.
I'm not sure I fully understand all of your requirements, but if you need to persist the data (refer to it more than once,) then a db is the way to go. If you just need to process portions of these output files and trust the results, you can do it on the fly without storing any contents.
Only store the data you need, not everything in the files.
Depending on the analysis needed, this sounds like a textbook case for using MapReduce with Hadoop. It will support your requirement of adding more machines in the future. Have a look at the Hadoop wiki: http://wiki.apache.org/hadoop/
Start with the overview, get the standalone setup working on a single machine, and try doing a simple analysis on your file (e.g. start with a "grep" or something). There is some assembly required but once you have things configured I think it could be the right path for you.
I had a similar problem recently and, just as @lalit mentioned, I used RandomAccessFile against my file on the hard disk.
In my case I only needed read access to the file, so I launched a bunch of threads, each one starting at a different point in the file. That got the job done and really improved my throughput: each thread could spend a good amount of time blocked while doing some processing, and meanwhile other threads could be reading the file.
A program like the one I mentioned should be very easy to write. Just try it and see if the performance is what you need.
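Here is a rough sketch of that kind of program, not the original one. It assumes newline-delimited records; the class name, the ceiling-division split, and the line-counting placeholder are my own choices. Each thread opens its own RandomAccessFile, seeks to its byte range, skips ahead to the first line that begins inside the range, and handles every line that starts there, so no record is processed twice or dropped at a boundary.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class ParallelFileReadSketch {

    // Processes every line that STARTS inside [start, end) and returns how many there were.
    static long processRange(Path file, long start, long end) {
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "r")) {
            if (start > 0) {
                raf.seek(start - 1);
                raf.readLine(); // skip to the first line that begins at or after 'start'
            }
            long lines = 0;
            while (raf.getFilePointer() < end && raf.readLine() != null) {
                lines++; // placeholder for the real per-record work
            }
            return lines;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Splits the file into equal byte ranges and processes them on a thread pool.
    static long processInParallel(Path file, int threads)
            throws InterruptedException, ExecutionException {
        long length = file.toFile().length();
        long chunk = (length + threads - 1) / threads; // ceiling division
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Long>> parts = new ArrayList<>();
            for (int i = 0; i < threads; i++) {
                long start = Math.min(length, i * chunk);
                long end = Math.min(length, start + chunk);
                Callable<Long> task = () -> processRange(file, start, end);
                parts.add(pool.submit(task));
            }
            long total = 0;
            for (Future<Long> part : parts) {
                total += part.get();
            }
            return total;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        // Throwaway sample file with five invented records of varying length.
        Path file = Files.createTempFile("records", ".txt");
        Files.write(file, "aa\nbbbb\nc\ndddddd\ne\n".getBytes(StandardCharsets.US_ASCII));
        System.out.println(processInParallel(file, 3)); // prints 5
    }
}
```

RandomAccessFile.readLine is unbuffered, so for a multi-gigabyte file you would probably want to add your own buffering inside each range; the boundary handling stays the same.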
I have a bunch of perfmon files that have captured information over a period of time. What's the best tool to crunch this information? Ideally I'd like to be able to see average stats per hour for the object counters that have been monitored.
From my experience, even just Excel makes a pretty good tool for quickly whipping up graphs of perfmon if you relog the data to CSV or TSV. You can just plot a rolling average & see the progression. Excel isn't fancy, but if you don't have more than 30-40 megs of data it can do a pretty quick job. I've found that Excel 2007 tends to get unstable when using tables & over 50 megs of data: at one point an 'undo' caused it to consume 100% cpu & 1.3 GB of RAM.
Addendum - relog isn't the best-known tool, but it is very useful. I don't know of any GUI front ends, so you just have to run it from the command line. The two most common cases I've used it for are:
Removing unnecessary counters from logs that a different sysadmin gave me, e.g. the entire process & memory objects.
Converting the binary perfmon logs to .csv or .tsv files.
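Once the logs are in CSV, per-hour averages are also easy to compute in a few lines of code if Excel runs out of steam. A minimal sketch, under loudly stated assumptions: each data row is a quoted timestamp followed by one quoted counter value, the header row has already been dropped, and the timestamp looks like MM/dd/yyyy HH:mm:ss.SSS. Check your own file before relying on any of that; the class and method names are made up.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.time.temporal.ChronoUnit;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class HourlyAverageSketch {
    // Assumed timestamp layout; adjust to whatever your CSV actually contains.
    static final DateTimeFormatter STAMP =
            DateTimeFormatter.ofPattern("MM/dd/uuuu HH:mm:ss.SSS");

    // Groups rows by the hour of their timestamp and averages the value column.
    static Map<LocalDateTime, Double> averagePerHour(List<String> rows) {
        Map<LocalDateTime, double[]> sums = new TreeMap<>(); // hour -> {sum, count}
        for (String row : rows) {
            String[] cols = row.replace("\"", "").split(",");
            LocalDateTime hour = LocalDateTime.parse(cols[0].trim(), STAMP)
                    .truncatedTo(ChronoUnit.HOURS);
            double[] acc = sums.computeIfAbsent(hour, h -> new double[2]);
            acc[0] += Double.parseDouble(cols[1].trim());
            acc[1]++;
        }
        Map<LocalDateTime, Double> averages = new TreeMap<>();
        sums.forEach((hour, acc) -> averages.put(hour, acc[0] / acc[1]));
        return averages;
    }

    public static void main(String[] args) {
        // Invented sample rows: two samples in the 10:00 hour, one in the 11:00 hour.
        List<String> rows = Arrays.asList(
                "\"01/15/2009 10:05:00.000\",\"10\"",
                "\"01/15/2009 10:35:00.000\",\"20\"",
                "\"01/15/2009 11:05:00.000\",\"40\"");
        averagePerHour(rows).forEach((hour, avg) -> System.out.println(hour + " " + avg));
    }
}
```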
Perhaps look into using LogParser.
It depends on how the info was logged (Perfmon doesn't lack flexibility)
If they're CSV you can even use the ODBC Text drivers and run queries against them!
(performance would be 'intriguing')
And here's the obligatory link to a CodingHorror article on the topic ;-)
PAL is a free tool provided on CodePlex. It provides charting capabilities and built-in thresholds for different server roles, which can also be modified, and it generates HTML reports.
http://www.codeplex.com/PAL/Release/ProjectReleases.aspx?ReleaseId=21261
Take a look at SmartMon (www.perfmonanalysis.com). It analyzes Perfmon data in CSV and SQL Server databases.