Exchange Web Services (Managed API) vs. WebDav Performance Question - exchange-server

I'm new to Exchange (2007) development so please bear with me. :-). There appear to be a myriad of technologies for Exchange development -- the latest being Exchange Web Services -- and it's related Managed API. I need to write a program that can -- if necessary -- run on the Exchange servers -- to scan people's mailboxes for the purpose of purging messages that meet various criteria (irrelevant for this discussion).
It is my understanding that most of the other technologies -- WebDav, MAPI, CDO -- are now deprecated with respect to Exchange 2007 and Exchange 2010. So since this is a greenfield application, I decided to use the Exchange Web Services Managed API.
I'm concerned about the number of items I can scan per hour. Since it is web services based there is a network hop involved. So I'd like to run this utility on the server with whom I am commnunicating. Am I correct that I have to talk to a "Hub" server?. I'm using Auto Discovery and it appears to resolve to a "hub" server no matter which mail server contains the actual message store I'm scanning.
When pulling multiple items down -- using ExchangeService.FindItems and specifying a page size of 500 -- I get pretty good throughput from my workstation to the hub server. I was able to retrieve 22,000 mail items in 47 seconds. That seems reasonable. However, turns out that not all properties are "bound" when retrieved that way. Certain properties -- like ToRecipients and CcReipients -- are not filled in. You have to explicitly bind them (individually) -- with a call to
Item.Bind(Server, Item.Id)
This is a separate round-trip to the server and this drops throughput down from about 460 items/second to 3 items per second -- which is unworkable.
So -- a few other questions. Is there any way to either force the missing properties to be bound during the call to FindItems? Failing that, is there a way to bind multiple items at once?
Finally, am I right in choosing Exchange Web Services at all for this type of work. I love the simplicity of the programming model and would not like to move to another technology if it is (a) more complex or (b) deprecated. If another technology will do this job better, and it is not deprecated, than I would consider using it if necessary. Your opinion and advice is appreciated.

You can user the service to load many properties for many items in one call to the server - it is designed exactly for your problem. It is just unfortunate that the Managed API documentation is still pretty thin on the ground.
results = folder.findItems... (or whatever find call you are making)
service.LoadPropertiesForItems(results, propertySet);
Where property set is something like:
PropertySet s = new PropertySet(BasePropertySet.IdOnly, ItemSchema.Subject, customDefinitions);
Use the various xSchema classes to load in the specific fields you want to minimise load if you are fetching lots of records back.

Related

How does CM & CD server communicates in sitecore?

I am new to sitecore and just trying to understand its architecture/design. Just curious to know how Intranet and Internet server communicates and how does the data flow happens between these two layers in on-prem and on AWS EC2 environment? I have surfed enough in the web and couldn't find the appropriate explanation.
Really appreciate if anyone can help me understand.
When u do a publish from CM, it puts a record in eventqueue table in Web Db.
all CD servers will hit the eventqueue table table for update and proceed.
default is 2 seconds once this hit happens.
In short, they communicate via events in the database(s). Note: This is very simplified but seeing it this way helped me understand how the events work and troubleshoot issues.
For example, when publishing an item, the publisher (running on CM or on a dedicated role) reads its data from the master database and writes it to the web database. When done, it raises an event by writing a row in the EventQueue table in web database. The CD server(s) picks up this event and clears its corresponding caches etc. causing a reload of that data from the web database.
All Sitecore databases have the EventQueue table and events goes to the table in different databases, depending on the type of event. An events is basically just a class name and a set of serialized data. Events can be raised "locally" and "globally" indicating if several instances should pick up the event. Think of a scenario where you have two CD servers sharing one web database, both CD's would have to pick up the event.
To keep track on what events has been processed, a "EQSTAMP" value is stored in the Properties table. It's named [database]_EQSTAMP_[InstanceName]. It's therefore essential that not two Sitecore instances share the same instance name. If not set, Sitecore will make an instance name by combining the hostname and IIS site name. The decimal Value of this timestamp corresponds to the hexadecimal Stamp column in the EventQueue table.
Normally, you should never have to play with these tables yourself, but I find it good to have some insights in how they work and keep an eye on them. They can grow in size and cause some issues. The CleanupEventQueue scheduled task is responsible for removing old processed events from the EventQueue tables. You may want to play with the scheduling of this agent if your EventQueue grows too large between cleanups.
Note: This is the most common way of communication between the servers. Later versions of Sitecore have other techniques as well, such as Rebus.
Event Queues. Why? How? When? article that explains it in detail, it also describes the pitfalls of using this mechanism in real life as well.
Please also be aware that Sitecore.Link project is a good place to get more knowledge regarding Sitecore functionality.
It accumulates Sitecore knowledge all around the web.
Thanks.

Designing microservices in practice

Yet another question on how to or how not to split up a microservice :-D
The scenario:
What do we need?
Sending emails at different points of time within the work flow of an ecommerce order process. These mails will be containing order information.
What do we have?
1 x persistence service which retrieves order information
Several services which subscribe to order events and processes the relevant use case (e.g. Confirmation, delivery, invoice)
1 x service which can be triggered to send a mail
What's the next step?
Designing the architectural component which transforms the order information so they will fit the data structure of the email rendering service.
The current options are
1 having each processing service transform already existing order information for the mail template and send them to the mail rendering service.
2 have each processing service call a new service which would aggregate and transform the order information and call the mail rendering service.
Currently we're not sure yet if the data structures for the mail templates will be mostly common or if there will be differences.
So what do you think of these options in terms of cohesion, coupling and separation of concerns?
Do you need any more information? Any constructive thoughts are welcome!
Your software architecture should reflect your organizational structure, see Conway's law
Do you have multiple teams, and you want to minimize dependencies between the teams.
Are "services" large and complex enough to justify them being separated into modules?
Does the size of the product justify having advanced devops in place to orchestrate the microservices?
Do you need the flexibility in terms of deployment and replaceability of individual "services"?
If you can answer yes to most of these questions, it would make sense to go for microservices. Otherwise, you are just making your life complicated.
Frankly, microservices require a lot of coordination overhead which makes sense only if the product is large enough. Most (small) projects are just fine with monolithic and MVC architecture.
This is how I propose to proceed man, it's how one of my project's architecture does all SMTP related stuff.
API receives an HTTP request
It persists data needed to the database.
It offloads the long-running and memory intensive processes to mail builder.
Optional, mail builder builds attachment files (XLSX, PDF, etc)
Mail builder uploads to File Server
Mail builder offloads generic SMTP sending to SMTP service.
I suggested this format because it allows you to scale the instance of each piece (Mail builder will have tons of instances) depending on bottlenecks in your processing pipeline.
Given that you have asked this question in microservices, I am assuming you are asking the question in reference to cloud native patterns.
I suggest you start with looking at microservices pattern. An excellent site for the patterns is https://microservices.io/patterns/microservices.html.
Your question does not have the necessary details to provide an educated advice on what patterns are suitable and what are not. So, I suggest you look at these few patterns...
https://microservices.io/patterns/data/shared-database.html
https://microservices.io/patterns/data/database-per-service.html
Also take a look at event sourcing pattern
https://microservices.io/patterns/data/event-sourcing.html
Hope this helps.

Performance Testing Methodology

I'm looking for a "concrete" methodology to individuate performance bottleneck of a service provided through a web application. I'm looking for an holistic approach that includes testing of computer network, database and web applications.
Suppose that you are in front of a web application that allows you to download pdf files once logged in your company network.
You access to the application with a browser.
The end user requirement is that the web application must allows to download pdf files (with size up to 5MB) in no more than 1 minute.
Some technical details:
- The application consists of a database, a document management system (e.g., Alfresco) and pieces of Java code.
- An user authenticates him/herself by providing username and password to the application, the application on its turn sends them to the LDAP server (the LDAP server is deployed on another physical server). A java serlet does this work and additionally queries the DB to understand the role of the user (a user can be the administrator, a reader, a writer).
- An authenticated user access to a search page, after searching a document the file will be downloaded. The search works in this way: the user fills in some fields (e.g., the name of the document) the field is sent to the document management systems which performs the actual search of the file and returns the results back to the application.
When the user clicks the download button, the application retrieves the document from the document management system.
The underlying network should be 1GB Eth with some routers/bridges and a load balancer, we have a broad knowledge of network topology.
My question is: if there is a performance bottleneck somewhere (in the network, in the web application, e.g., poor coding) that violates the former requirement (1 second download time) how can we discover it? From which element should we start? For instance trying to understand network performance, then document management systems and at the end the whole system (application, network, database). How should we incrementally increase the number of download request?
I'm looking for a methodology, I've already read
http://www.agileload.com/performance-testing/performance-testing-methodology/test-methodology
http://msdn.microsoft.com/en-us/library/bb924375.aspx
What performance testing methodology are you using for your webapps?
All them contain nice suggestions, but I want a more practical methodology with reference to testing of web application.
Thank you in advance
Is it one minute or one second for a 5MB file? Can you post a diagram
of how the various pieces are connected?
There is a way to determine how the network latency and application processing contribute towards the total response time.
It requires instrumenting the browser and other components that make up the complete system. I.e. writing code in JavaScript, Java, C/C++, Perl, Python, etc. and embedding it into each of the application component so that components can report events to a central collector.
If instrumentation cannot be easily added to the components, then the other alternative to insert event collecting proxies between components and then have them report events to a central collector. You can determine and factor out delays due to proxies by running few tests with and without proxies in the path.
Once the events arrive at the central collector, one can get good visibility into how the response time is made up.

Performance monitoring all layers of a system

I use several loadtesting tools (Loadrunner, JMeter, NeoLoad) to performance test different applications. Im wondering if it is possible to monitor all layers of an application stack so for example. Say i have the following data chain.
Loadbalancer <-x-> Application Server <-x-> RMI <-x-> Java Application <-x-> MQ <-x-> Legacy application <-x-> Database
Where i have marked the x in the chain i am interested in monitoring, for example avg responsetimes.
Obviously we could simply create a wrapper on all endpoints which would gather the statistics for us and maybe we could import it into loadrunner or other loadtesting tools and sideline hem with the tools inbuilt performance statistics, but maybe there is tools/applications which already does this?
If not, how should we proceed, in order to gather this kind of statistics?
The standard for this was supposed to be Application Response Measurement (ARM). It was a cross language set of APIs that did just what you were looking for. The issue is that the products that implement this spec all tend to be big, expensive "enterprise" level monitoring tools. Think multi-week installs, consultants, more infrastructure and lots of buzzwords.
Still, if this is a mission critical app with a mission critical budget, this may be what you need. But you may be able to build your own that does just enough without too much effort. A quick search turns up at least one open source ARM implementation if you still want to use that API.
Another option is to simply to have transactions you can run against each tier of the system to check general responsiveness. For example you can have a static web page on the LB, a no-op tx on the app server, a "hello" servlet on the Java app, put a message directly on the queue, etc. During a performance / load test, these could be hit directly by the load testing tool or you could write a wrapper servlet / application call that does this as a single HTTP (RMI?) call. Running these a few times a minute won't add too much load to the system, but it should help you pinpoint which tier is slower. The nice thing about this approach is that it also works in production, just watch out for security issues.
For single user kind of test, where you know you have problem (e.g. this tx is "slow"), I have also had pretty good luck with network tracing. It's very tedious, but when you aren't sure what tier is slow, starting up a network trace on a few machines and running a single tx usually gives a good idea of what the system is doing.
I have handled this decomposition a number of ways in the past. The first is at a very low level using protocol analyzer dumped data to find the time points where a conversation leaves tier X and enters tier Y. The second method is through the use of log examination for the various tiers. Something that can make your examination quite usefule in this case is a common log server for all of your components (syslog, Rsyslog, etc....) and a nice log parsing tool, such as the freely available Microsoft Logparser. The third method utilization of the audit trail for an application stored in the database. You may find this when working on enterprise services bus style applications which have a consumer/producer model and a bus to pass information rather than a direct connection. The audit trails I have seen are typically stored in a database and allow the tracking of an individual transaction through the entire application infrastructure. Your Load balancer, as a network device, may be out of the hunt on this one.
Note, if you go the protocol analyzer or log route, then be sure and synchronize all of your source information devices to a common time server. Having one of your collectors (analyzer, app log) off on a time stamp basis can really be a hair pulling experience when you get into the analysis phase.
As to how you move from your collected data into LoadRunner, that part is very mechanical. The Analysis program supports an interface to import external datapoints. The format is very specific and is documented in both help and the online docs. This import process works very well, as I often have to use it for collection of statistics from hosts which I do not have direct monitoring access to, but which need to be included as a part of the monitored test infrastructure.
James Pulley
Moderator (YahooGroups LoadRunner, Advanced-Loadrunner; GoogleGroups lr-LoadRunner; Linkedin LoadRunner, LoadRunnerByTheHour; SQAForums LoadRunner, WinRunner)

Web Hosting, Web Scaling

I have a simple web application to conduct online exams for the college students. All questions are multiple choice questions. Around 5000 users will be taking up the exam. My backend is mysql and using PHP as the front end. I want to know the hardware configuration for the servers that will be required to host this application and work seamlessly for the required no of users.
I am also looking out for cloud solutions. If I choose Amazone EC2 instances, can some body give me advice on what type of EC2 machine I should go into for this application?
It is impossible to tell the exact specs of the servers that will be required to run your setup, because there are too many variables. However, it is definitely a good question: when I was a student at university, it happened that a professor tried to do this, and didn't do testing: on the exam date, the system got overloaded and the exam had to be cancelled!
Start with testing what you already have. You can use something like the ab tool or JMeter. It will simulate the requested load for you automatically, so you can check how your actual server performs, and act accordingly.
Application design is also important. Like you can cache all the question at web layer to avoid database query. Make client heavy app such that server payload is minimum (json response) to reduce download time load on server.
Request multiple questions at once and Batch user responses to answer question together to decrease ajax calls.
Make use of nosql solution to avoid RDMS constraints overhead.

Resources