How to perform Solr performance testing? [closed]

I have been given the task of performance testing Solr. I am completely new to Solr and have no idea how to approach this.
The Solr instance we are using consumes a lot of RAM and CPU. Because of that, our application hangs and returns server error messages.
What would be the right way to test Solr, and is a tool required to create multiple concurrent threads?

According to the Solr Quick Start guide
Searching
Solr can be queried via REST clients, cURL, wget, Chrome POSTMAN, etc., as well as via the native clients available for many programming languages.
so you can use regular JMeter HTTP Request samplers to mimic multiple users querying Solr concurrently.
References:
Building a Web Test Plan
Testing SOAP/REST Web Services Using JMeter
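If you would rather script this than build a full JMeter test plan, a minimal Python sketch of several concurrent "users" querying Solr could look like the following. The host, port, core name and query terms are assumptions; adjust them to your setup.
```python
# Minimal sketch: fire concurrent queries at a Solr core to mimic multiple users.
# Assumptions: Solr runs on localhost:8983 and the core is called "mycore".
from concurrent.futures import ThreadPoolExecutor
import time

import requests

SOLR_URL = "http://localhost:8983/solr/mycore/select"  # assumed endpoint
QUERIES = ["guitar", "piano", "violin", "drums"]        # placeholder terms

def run_query(q):
    start = time.time()
    resp = requests.get(SOLR_URL, params={"q": q, "rows": 10})
    elapsed = time.time() - start
    return q, resp.status_code, elapsed

if __name__ == "__main__":
    with ThreadPoolExecutor(max_workers=10) as pool:    # 10 simulated "users"
        for q, status, elapsed in pool.map(run_query, QUERIES * 25):
            print(f"{q:10s} status={status} took={elapsed:.3f}s")
```
JMeter's Thread Groups and HTTP Request samplers give you the same effect with much better reporting; the script above is just the smallest possible starting point.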

For search applications, the amount of requests by itself usually isn't as important as the query profile. There's a lot of internal caching going on, and the only useful way to be able to do decent performance testing, is to use your actual query logs to replicate the query profile that represents your users. You'll also have to use the actual data that you have in your Solr server, so you get (at least) close to the same cardinality for fields and values.
This would mean using the same filters, the same kinds of queries and the same kind of simultaneous load. Since you probably want to go above the load you see in production, consider replaying several days of logs as if they were a single day. Be sure to include weekends as well as weekdays, and if you have a particularly heavy day, such as Black Friday for e-commerce, keep those logs available so you're able to replicate that profile.
There are (many) tools to do the HTTP requests to Solr, but be sure to use a query profile and sets of queries that actually represent how you're using Solr; otherwise you're just hitting the query cache every single time, or you have data that doesn't represent the actual data in your dataset - which will give you completely irrelevant response times (i.e. random data performs a lot worse than actual English text where tokens are repeated across documents).
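As a rough illustration of replaying a query log, a sketch like the one below could work. It assumes you have extracted one raw, URL-encoded query string per line from your Solr request logs; the file name and endpoint are placeholders.
```python
# Sketch: replay logged query strings against Solr to preserve the real query profile.
# Assumption: queries.log holds one URL-encoded query string per line,
# e.g. "q=fender+stratocaster&fq=type:guitar&facet=true&facet.field=brand".
from concurrent.futures import ThreadPoolExecutor

import requests

SOLR_SELECT = "http://localhost:8983/solr/mycore/select"  # assumed endpoint

def replay(raw_query):
    # Send the logged parameters exactly as they were, so filters, facets
    # and paging match what production users actually did.
    resp = requests.get(f"{SOLR_SELECT}?{raw_query.strip()}")
    return resp.status_code, resp.elapsed.total_seconds()

if __name__ == "__main__":
    with open("queries.log") as fh:
        logged = [line for line in fh if line.strip()]
    with ThreadPoolExecutor(max_workers=20) as pool:  # simulated concurrency level
        for status, seconds in pool.map(replay, logged):
            print(status, f"{seconds:.3f}s")
```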

You can use SolrMeter to do the performance testing. Read the SolrMeter wiki for details.

The most important thing is not how to test the queries, but putting together the scenarios you want to test, ones that mimic the real application usage you see.
Initially you need to decide what you want to find out with your test. Do you want to find bottlenecks? Do you want to find out if your current setup can meet business requirements? Do you need to find the breaking points of your current architecture and setup?
Solr using a lot of CPU is very often related to indexing, and heavy memory usage may be related to segment merging, so it sounds like you need to define your scenarios:
how much content should you push to Solr for it to perform indexing?
how many queries do you need to send?
what are the features of the queries (facets, highlighting, spellcheck, etc.)? (see the sketch below)
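To make that last question concrete, here is a hedged sketch of what a "feature-rich" query looks like compared to a bare keyword query. The core name and field names ("brand", "description") are assumptions for illustration only, and spellcheck only responds if the component is configured.
```python
# Sketch: the same keyword sent as a bare query vs. a query using facets,
# highlighting and spellcheck. Core and field names are placeholders.
import requests

SOLR_SELECT = "http://localhost:8983/solr/mycore/select"

bare = {"q": "guitar", "rows": 10}

feature_rich = {
    "q": "guitar",
    "rows": 10,
    "facet": "true",
    "facet.field": "brand",
    "hl": "true",
    "hl.fl": "description",
    "spellcheck": "true",
}

for name, params in [("bare", bare), ("feature-rich", feature_rich)]:
    resp = requests.get(SOLR_SELECT, params=params)
    print(name, resp.status_code, f"{resp.elapsed.total_seconds():.3f}s")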

You could use JMeter to test the throughput of your search application, and you could also check IO, load, CPU usage and RAM usage on each Solr instance.
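If you want a quick way to watch resource usage on the Solr hosts while such a test runs, a rough sketch using the third-party psutil package (run on each Solr machine) might look like this; the sampling interval and output format are arbitrary choices.
```python
# Sketch: sample CPU, load average and RAM on a Solr host while a load test runs.
# Requires the third-party "psutil" package; stop with Ctrl+C.
import os
import time

import psutil

INTERVAL_SECONDS = 5  # arbitrary sampling interval

while True:
    cpu = psutil.cpu_percent(interval=1)
    load1, load5, load15 = os.getloadavg()
    mem = psutil.virtual_memory()
    print(f"cpu={cpu:.1f}% load1={load1:.2f} "
          f"mem_used={mem.used / 2**30:.2f}GiB ({mem.percent:.1f}%)")
    time.sleep(INTERVAL_SECONDS)
```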

Related

Impact of `oplog` in meteor application

I implemented oplog on our server and at that time our application response time improved, but after some hours the response time increased and the application became very slow.
Can you let me know:
the disadvantages of oplog
the impact of oplog on a Meteor application
what needs to be taken care of while implementing oplog
Please help; I have gone through several videos and links but have not found a satisfactory answer. Thanks.
As mentioned in the comments, this is a very broad question and the "correct" answer completely depends on your specific situation (e.g. app needs, use case, etc.). Nevertheless, here is my attempted answer based upon having been thru the challenges of scaling Meteor apps.
If at all possible, you will want to enable oplog tailing in your production Meteor app. If you have done any Meteor development, then you are used to using the oplog because it is enabled by default.
The symptom you describe of the application response times increasing over time and becoming very slow is likely the result of something else going on with your app or hosting environment / infrastructure. I have run multiple production Meteor apps with 100+ simultaneous users and have never experienced this occurring.
There is one specific situation where you would NOT want to use the oplog, and that is when you have a complex query where the bulk of the resulting data gets updated often. This can cause CPU spikes and/or thrashing and will kill your app's performance. I have one such application that falls into this category, and after extensive testing I found that it is much better to disable oplog on that query and increase the pollingThrottleMs accordingly. Again, this is an exception case and represents about the only time you would want to stay away from using the oplog.
These are just some very surface level thoughts on the use of oplog based on my experience. I encourage you to experiment and see what works best for your app.

HTTP proxy server tests [closed]

I have implemented an HTTP proxy client/server. Currently I intend to test the performance of this proxy client/server. Can anybody tell me what approaches exist for making these tests?
If you are looking for some tools, the following will be helpful:
RoboHydra is a web server designed precisely to help you write and test software that uses HTTP as a communication protocol. There are many ways to use RoboHydra, but the most common use cases are as follows:
RoboHydra allows you to combine a locally stored website front end with a back end sat on a remote server, allowing you to test your own local front end installation with a fully functional back end, without having to install the back end on your local machine.
If you write a program designed to talk to a server using HTTP, you can use RoboHydra to imitate that server and pass custom responses to the program. This can help you reproduce different bugs and situations that might be otherwise hard, if not impossible, to test.
https://dev.opera.com/articles/robohydra-testing-client-server-interactions/
Webserver Stress Tool simulates large numbers of users accessing a website via HTTP/HTTPS. The software can simulate up to 10,000 users that independently click their way through a set of URLs. Simple URL patterns are supported as well as complex URL patterns (via a script file).
Webserver Stress Tool supports a number of different testing types. For example
✓ Performance Tests—this test queries single URLs of a webserver or web application to identify and discover elements that may be responsible for slower than expected performance. This test provides a unique opportunity to optimize server settings or application configurations by testing various implementations of single web pages/script to identify the fastest code or settings.
✓ Load Tests—this tests your entire website at the normal (expected) load. For load testing you simply enter the URLs, the number of users, and the time between clicks of your website traffic. This is a “real world” test.
✓ Stress Tests—these are simulations of “brute force” attacks that apply excessive load to your webserver. This type of “brute force” situation can be caused by a massive spike in user activity (i.e., a new advertising campaign). This is a great test to find the traffic threshold for your webserver.
✓ Ramp Tests—this test uses escalating numbers of users over a given time frame to determine the maximum number of users the webserver can accommodate before producing error messages.
✓ Various other tests—working with Webserver Stress Tool simply gives you more insight about your website, e.g. to determine that web pages can be requested simultaneously without problems like database deadlocks, semaphores, etc.
http://www.paessler.com/tools/webstress/features
To better understand what client-server and web-based testing are, and how to test such applications, you may read this post: http://www.softwaretestinghelp.com/what-is-client-server-and-web-based-testing-and-how-to-test-these-applications/
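If you would rather start with a small script before reaching for those tools, a minimal sketch like the one below sends concurrent requests through the proxy and records latencies. The proxy address, target URL, request count and concurrency are placeholders for your own setup.
```python
# Sketch: measure request latency through an HTTP proxy under concurrent load.
# The proxy address and target URL below are placeholders.
from concurrent.futures import ThreadPoolExecutor
import statistics

import requests

PROXIES = {"http": "http://127.0.0.1:8080", "https": "http://127.0.0.1:8080"}
TARGET = "http://example.com/"   # any page reachable through the proxy
REQUESTS = 200
CONCURRENCY = 20

def fetch(_):
    resp = requests.get(TARGET, proxies=PROXIES, timeout=10)
    return resp.elapsed.total_seconds()

if __name__ == "__main__":
    with ThreadPoolExecutor(max_workers=CONCURRENCY) as pool:
        latencies = list(pool.map(fetch, range(REQUESTS)))
    print(f"median={statistics.median(latencies):.3f}s "
          f"p95={sorted(latencies)[int(0.95 * len(latencies))]:.3f}s")
```
Comparing these numbers with and without the proxy in the path gives a first rough estimate of the overhead your proxy adds.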

How does one identify why a website is slow? [closed]

I was asked this question once at an interview:
"Suppose you own a website where the server is at some remote location. One day, some user calls/emails you saying the site is abominably slow. How would you identify why the site is slow? Also, when you check the website yourself as any user would (using your browser), the site behaves just fine."
I could think of only one thing (which was shot down):
Check the server logs to analyse incoming traffic. Maybe a DoS attack or exceptionally high traffic. Interviewer told me to assume the server has normal traffic and no DoS.
I was kind of lost because I had never thought of this problem. I have almost no idea how running a server/website works. So if someone could highlight a few approaches, it would be nice.
While googling around, I could find only this relevant, wonderful article. That article is kind of too technical for me now, but I'm slowly breaking it down and understanding it.
Since you already said when you check the site yourself the speed is fine, this means that (at least for the pages you checked) there is nothing wrong with the server and it can serve those pages at a good speed. What you should be figuring out at this point is what the difference is between you and the user that reports your site is slow. It might be a lot of different things:
Is the user using a slow network connection (mobile for example)?
Does the user experience the same problems with other websites hosted at the same webhoster? If so, this could indicate a network problem. Normally this could also indicate a resource problem at the webserver, but in that case the site would also be slow for you.
If neither of the above leads to an answer, you can assume that the connection to the server and the server itself are fine. This means the problem must be in the user's device. Find out which browser/OS they use and try to replicate the problem. If that fails, find out whether they use any antivirus or similar software that might cause problems.
This is a great tool to find the speed of web pages and tells you what makes it slow: https://developers.google.com/speed/pagespeed/insights
I think one important thing that is missing from the above answers is the server location, which can play a vital role in web performance.
When someone says that it takes a long time to open a web page, that means high latency. High latency can be caused by the server location.
Let's assume that, as the owner of the web page, you and the server are co-located, so you will see low latency.
But if the client is far away geographically, latency will increase drastically, and hence performance will be slow.
Another factor is caching, which drastically affects latency.
Taking the example of Facebook, they have servers all over the world to reduce latency (and also to provide several other advantages), and they use a huge caching system to cache their hot data (trending topics), whereas cold data (old data) is stored on hard disk, so it takes longer to load an older photo or post.
So, a user might have complained about this because they were trying to load some cold data.
I can think of these few reasons (the first two are already mentioned above):
High latency due to the location of the client
Server memory might need to be increased
The number of service calls made from the page
A service might have been down at the time of the complaint, which could prevent the page from loading
The server load might have been too high at the time of the poor experience; the server might need more resources (e.g. adding another server/web server to the cluster)
Check whether any background job was running on the server at that time; it is important to check the logs and schedules of batch jobs to determine what was running then
Hope this helps.
Normally the user takes the page loading time as a measure of whether the site is slow. But if you really want to know what is taking the most time, you can open the browser developer tools by pressing F12. If your browser is Chrome, click on the Network tab and see which calls your application is making and which take the longest. If you are using Firefox, you need to install Firebug; once you have it, again press F12 and click on Net.
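If you want to reproduce the same kind of measurement outside the browser (for example from the complaining user's network), a small hedged sketch like this times each request to the page and a few of its resources; the URLs are placeholders.
```python
# Sketch: time a page and a few of its resources from the command line, as a
# rough stand-in for the browser's Network tab. URLs are placeholders.
import requests

URLS = [
    "https://www.example.com/",
    "https://www.example.com/static/app.js",
    "https://www.example.com/api/search?q=test",
]

for url in URLS:
    resp = requests.get(url, timeout=30)
    size_kb = len(resp.content) / 1024
    print(f"{resp.status_code} {resp.elapsed.total_seconds():.3f}s "
          f"{size_kb:7.1f} KiB  {url}")
```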
One reason could be that the role of the user is different from your role. You might have, say, an administrator privilege (something like a superuser role), and the code might simply allow everything for such a role, meaning it does not do much conditional checking to see what is allowed or not. Sometimes it is a considerable overhead to fetch all of a user's privileges and evaluate those checks, depending of course on how the authorization is implemented. That means the page might really be slow for specific roles, so you should find out the user's role and see if that is the reason.
It is obviously either an issue with the connection of the person connecting to your site, or possibly a temporary issue, and by the time you checked your site everything was fine again. You could check your logs or ask your host whether there was an issue at the time the slowdown occurred.
This is usually a memory issue, and it can be resolved by increasing the heap size of the web server hosting the application. If the application is running on WebLogic Server, the heap size can be increased in the "setEnv" file located in the application home.
Good luck!
Michael Orebe
Though your question is quite clear, web site optimisation is a very extensive subject.
The majority of popular web development frameworks are, for some reason, extremely processor inefficient.
The old-fashioned way of developing n-tier web applications is still very relevant and is still considered best practice according to the W3C. If you take a little time to read the source code structure of the most popular web development frameworks, you will see that they run much more code at the server than is necessary.
This may seem a bit of a simple answer, but the less code you run at the server and the more code you run at the client, the faster your servers will work.
Sometimes contrasting framework code against the old-fashioned way is the best way to get an understanding of this. Here is a link to a fully working mini web application which represents W3C best practices and runs the minimum amount of code at the server and the maximum amount of code at the client: http://developersfound.com/W3C_MVC_EX.zip This codebase is also MVC compliant.
This codebase comes with a MySQL database dump, php and client side code. To see this code in action you will need to restore the SQL dump to a MySQL instance (sql dump came from MySQL 8 Community) and add the user and schema permissions that are found in the php file (conn_include.php); setting the user to have execute permissions on the schema.
If you contrast this codebase against all of the most popular web frameworks, it will really open your eyes to just how inefficient those frameworks are. The popular PHP frameworks that claim to be MVC frameworks aren't actually MVC compliant at all, because they rely on embedding PHP tags inside HTML tags or vice versa (considered very bad practice according to the W3C). Also, most popular Node frameworks run way more code at the server than is necessary. Embedded tags also stop asynchronous calls from working properly unless the framework supports AJAX dumps, such as Yii 2.
Two of the most important rules to follow for MVC compliance are: never embed server-side tags (such as PHP tags) in HTML tags or vice versa (unless there is a very good excuse such as SEO), and religiously never write code to run at the server if it can be run at the client. Also, true MVC is based on tier separation, whereas the MVC frameworks are based on code separation. True MVC compliance is very processor efficient. Don't get me wrong, MVC frameworks are very useful for a lot of things, but if you're developing a site that is going to get millions of hits, they are quite useless, or at least they will drive your cloud bills so high that it will really eat into your company's profits.
In summary, frameworks don't give much control over what code runs at the client or the server and are very inefficient, but you can get prototypes up and running more quickly with less code.
In contrast the old fashioned way takes a bit more elbow grease but you have complete control over what runs at the server and what runs at the client.
As an additional bit of optimisation advice, avoid using pass-through queries and triggers and instead opt for stored procedures. Stored procedures did not yet exist when MVC first emerged as a paradigm, but they definitely increase the separation of concerns between the tiers and are much more processor efficient.
Hope this advice helps.

Solr issue - too many search queries

We have a PHP web application which uses Solr for searching. The app uses cURL to connect to the Solr server and runs in a loop over thousands of predefined keywords, which creates thousands of different search queries to Solr at a given time.
My issue is that when a single user is logged into the app, everything works as expected. When more than one user tries to run the app, we get this response from the server:
Failed to connect to xxx.xxx.xxx.xxx: Cannot assign requested address
Failed to connect to xxx.xxx.xxx.xxx: Cannot assign requested address
Failed
Our assumption is that the Solr server is unable to handle this many search queries at a given time. If so, what is the solution to overcome this? Is there any setting like keep-alive in Solr?
Any help would be highly appreciated.
Thanks,
Arun S
What about OR'ing a subset of the keywords and doing just one query?
Then, if nothing is found, try the next subset of keywords.
For this particular case I think you need to improve your application's performance, not Solr's, even if you need to do some trickery.
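As a rough illustration of that suggestion, a hedged sketch that batches keywords into a single OR'ed query instead of one request per keyword might look like this; the endpoint, field name and batch size are assumptions for illustration.
```python
# Sketch: send one OR'ed query per batch of keywords instead of one request
# per keyword, drastically reducing the number of requests made to Solr.
# The endpoint, field name and batch size are placeholders.
import requests

SOLR_SELECT = "http://localhost:8983/solr/mycore/select"
KEYWORDS = ["guitar", "piano", "violin", "drums", "cello", "flute"]
BATCH_SIZE = 50

session = requests.Session()  # reuse the TCP connection (client-side keep-alive)

for i in range(0, len(KEYWORDS), BATCH_SIZE):
    batch = KEYWORDS[i:i + BATCH_SIZE]
    # e.g. title:("guitar" OR "piano" OR ...)
    q = "title:(" + " OR ".join(f'"{kw}"' for kw in batch) + ")"
    resp = session.get(SOLR_SELECT, params={"q": q, "rows": 100, "wt": "json"})
    print(resp.status_code, resp.json()["response"]["numFound"])
```
Reusing a single session (and therefore a persistent connection) on the client side also helps avoid exhausting local ephemeral ports, which is typically what a "Cannot assign requested address" error indicates.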
Is there any Maximum connection limit in SOLR?
This is heavily dependent on the hardware and operating system you are using, and probably which servlet container you use.
If you need more performance out of Solr, you may need to tweak your schema (which you could post here for more help), scale vertically (RAM directories or SSDs) or consider a master-slave setup.

Strategies For AutoCompletion Web Services in .Net. Non UI focused

I'm getting a little tired of all the UI demos of auto completion in ASP.Net. I believe the UI portion of autocompletion has been solved multiple times over again.
My question is how do you best handle the queries hitting your webservices? I'm currently implementing an autocompletion service for a musician database. The database is fairly small with only 20,000 rows, but autocompletion is extremely speed sensitive. It needs to be fairly instant to be of any use.
I'm currently using NHibernate for my DAL, but I'm wondering if this is a place where I may want to bypass NHibernate. Perhaps projections on named queries would be the best strategy? Where do I cache? NHibernate's 2nd level cache? Let the web service cache?
I've already thought of a lot of naive methods to develop this, but I would like to soak in any tips that people already have in the wild. Also, what if you have many different types of entities you want autocompletion on? Do you spread those implementations around in their different repositories or do you design/implement a completely separate autocompletion service?
This depends on how large your site's traffic is. I generally suggest using a product such as MemCached or MemCached Win32, depending on your environment (memcached on cheap Linux boxes is best if you can manage it... all that is needed is a ton of memory!). You might also look at something like Velocity (MS's new cache cloud offering).
This would allow you to cache a key (whatever the query is) with its results efficiently. Keep your cache times low if you update your dataset frequently; if you don't update often, the cache time can be longer. If you find that your cache cloud is growing like crazy, you might want to cache only what is most frequently asked for (though your cache implementation should handle this by evicting what is not accessed frequently!).
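The caching idea above is independent of the store you pick. A minimal in-process sketch of the pattern (cache key = the typed prefix, value = the suggestion list, with a TTL tuned to how often the data changes) might look like the following; in production the same get/set calls would point at memcached or another shared cache, and the lookup function is a placeholder.
```python
# Minimal sketch of the cache-the-query pattern for autocompletion.
# Key = the typed prefix, value = the suggestion list, expiring after a TTL
# chosen to match how often the underlying data changes. In production the
# dict would be replaced by a shared cache such as memcached.
import time

TTL_SECONDS = 300          # data updates are infrequent, so cache for 5 minutes
_cache = {}                # prefix -> (expiry_timestamp, suggestions)

def lookup_in_database(prefix):
    # Placeholder for the real (and comparatively slow) database query.
    return [name for name in ("Miles Davis", "Muddy Waters", "Metallica")
            if name.lower().startswith(prefix.lower())]

def autocomplete(prefix):
    now = time.time()
    hit = _cache.get(prefix)
    if hit and hit[0] > now:          # fresh cache entry: return it directly
        return hit[1]
    suggestions = lookup_in_database(prefix)
    _cache[prefix] = (now + TTL_SECONDS, suggestions)
    return suggestions

print(autocomplete("mi"))   # misses the cache, queries the "database"
print(autocomplete("mi"))   # second call within the TTL is served from cache
```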
