Networking optimization - etl

At work we do a lot of ETLs for customers from the data that they give to us. Our networking I think is a little inefficient and I want to talk to our networking admin but I do not know what questions to ask.
One of the longest parts of the ETL is loading the work on a test server from my conversion machine. My ETL machine and test machine are on the same rack, can these machines be connected in some way so when they exchange information it is done over their own private network and doesn't have to travel over our vpn. I want a couple of gigabit cable across these guys, also we have two more machines in that room that are used by my department and if possible I would like to increase this data transfer as well.

Related

ruby scrape with multiple ip addresses

I would like to know if it is possible for a Ruby program to possess multiple IP addresses? I am trying to download a lot of data from a site, but it is very slow with only 1 connection at a time.
I intend to multi-thread my program with each thread using its own IP address, but I do not know if it is possible in the first place, any help or hints would be greatly appreciated.
It is definitely possible for a machine or a program to have multiple IP addresses. You can even have multiple network adapters, and tie each of them to different physical connections.
However, it can get really hairy to maintain. The challenge for that is partly in the code, partly in the system maintenance, and partly in the networking required to make that happen.
A better approach that you can take is to design your program so that it can run distributed. As such, you can have several copies of it synchronized and doing the work in parallel. You can then scale it horizontally (build more copies) as required, and over different machines and connections if required.
EDIT: You mentioned that you cannot scale horizontally, and that you prefer to use multiple connections from the same machine.
It's very likely that for this you'll have to go a little bit lower in the network stack, developing yourself the connection through sockets in order to use specific network interfaces.
Check out an introduction to Ruby sockets.
Also, check out these related questions:
How does a socket know which network interface controller to use?
Binding to networking interfaces in ruby
Ruby: Binding a listening socket to a specific interface
Can I make ruby send network traffic over a specific iface?

performance test wireless network using loadrunner

We have a requirement to test a wireless network that will be deployed to 1000s of people. I am looking for a way to test it with loadrunner.
Can you suggest a good approach to test the wireless connection for "Concurrent open connection" and network speed when 1000s of users are connected to the wireless network and downloading data.
Wrong tool. You need a network tool to answer a network question because the items you need to sample the item under test (the network) need to be proven stable and scalable beyond the network under all conditions. You also need t a tool with specific characteristics related to the use of other finite resources, DISK, CPU and RAM specifically. User applications fail these prequalifiers. I recommend the tool IxChariot from Ixia Communications. It is also the tool used by the WiFi alliance for certification.
Network tools are simpler tools in many respects, as you will be testing data flows. Network tools are also far simpler to setup with dual homes load generators, using a hard wired connection for test setup and control, with a different network under test, the wireless network.
http://www.ixiacom.com/products/ixchariot
And no, I don't sell it.

Does it make sense to host movies from a home LAMP server SQL database?

First off, my apologies if the concept of this question makes no sense, as I am attempting to create my first SQL database.
-Does it make sense to host movies from a home LAMP server SQL database?
-If I decide to host videos from an SQL database via my Ubuntu 12.04 LAMP server, will I have advantages in performance?
-Will that include wireless/wired streaming over an Airport Extreme router (or in other words, over a network)?
The alternative would be a file share. I am aiming to learn some basic SQL setup and administration; I thought this might be an interesting and educational way to do it.
Start by researching how various home media solutions solve this problem. Each has differing upsides and downsides. Some focus on indexing and adding metadata to shared network drives then provide a browsing interface to it. Others wholly encapsulate the media store and provide the media directly--sometimes by faking a file share which in reality is backed by the database.
The performance of video over network depends on the performance of your main server. Now if you want to host the videos your server should be very fast and reliable.
It's advisable to use LAMP Server if used on LAN Network.
But if you are planning for later modification in terms of users then I would suggest that you go with the alternative.

Scaling Tigase XMPP server on Amazon EC2

Does anyone have an experience running clustered Tigase XMPP servers on Amazon's EC2, primarily I wish to know about anything that might trip me up that is non-obvious. (For example apparently running Ejabberd on EC2 can cause issues due to Mnesia.)
Or if you have any general advice to installing and running Tigase on Ubuntu.
Extra information:
The system I’m developing uses XMPP just to communicate (in near real-time) between a mobile app and the server(s).
The number of users will initially be small, but hopefully will grow. This is why the system needs to be scalable. Presumably for a just a few thousand users you wouldn’t need a cc1.4xlarge EC2 instance? (Otherwise this is going to be very expensive to run!)
I plan on using a MySQL database hosted in Amazon RDS for the XMPP server database.
I also plan on creating an external XMPP component written in Python, using SleekXMPP. It will be this external component that does all the ‘work’ of the server, as the application I’m making is quite different from instant messaging. For this part I have not worked out how to connect an external XMPP component written in Python to a Tigase server. The documentation seems to suggest that components are written specifically for Tigase - and not for a general XMPP server, using XEP-0114: Jabber Component Protocol, as I expected.
With this extra information, if you can think of anything else I should know about I’d be glad to know.
Thank you :)
I have lots of experience. I think there is a load of non-obvious problems. Like the only reliable instance to run application like Tigase is cc1.4xlarge. Others cause problems with CPU availability and this is just a lottery whether you are lucky enough to run your service on a server which is not busy with others people work.
Also you need an instance with the highest possible I/O to make sure it can cope with network traffic. The high I/O applies especially to database instance.
Not sure if this is obvious or not, but there is this problem with hostnames on EC2, every time you start instance the hostname changes and IP address changes. Tigase cluster is quite sensitive to hostnames. There is a way to force/change the hostname for the instance, so this might be a way around the problem.
Of course I am talking about a cluster for millions of online users and really high traffic 100k XMPP packets per second or more. Generally for large installation it is way cheaper and more efficient to have a dedicated servers.
Generally Tigase runs very well on Amazon EC2 but you really need the latest SVN code as it has lots of optimizations added especially after tests on the cloud. If you provide some more details about your service I may have some more suggestions.
More comments:
If it comes to costs, a dedicated server is always cheaper option for constantly running service. Unless you plan to switch servers on/off on hourly basis I would recommend going for some dedicated service. Costs are lower and performance is way more predictable.
However, if you really want/need to stick to Amazon EC2 let me give you some concrete numbers, below is a list of instances and how many online users the cluster was able to reliably handle:
5*cc1.4xlarge - 1mln 700k online users
1*c1.xlarge - 118k online users
2*c1.xlarge - 127k online users
2*m2.4xlarge (with 5GB RAM for Tigase) - 236k online users
2*m2.4xlarge (with 20GB RAM for Tigase) - 315k online users
5*m2.4xlarge (with 60GB RAM for Tigase) - 400k online users
5*m2.4xlarge (with 60GB RAM for Tigase) - 312k online users
5*m2.4xlarge (with 60GB RAM for Tigase) - 327k online users
5*m2.4xlarge (with 60GB RAM for Tigase) - 280k online users
A few more comments:
Why amount of memory matters that much? This is because CPU power is very unreliable and inconsistent on all but cc1.4xlarge instances. You have 8 virtual CPUs but if you look at the top command you often see one CPU is working and the rest is not. This insufficient CPU power leads to internal queues grow in the Tigase. When the CPU power is back Tigase can process waiting packets. The more memory Tigase has the more packets can be queued and it better handles CPU deficiencies.
Why there is 5*m2.4xlarge 4 times? This is because I repeated tests many times at different days and time of the day. As you can see depending on the time and date the system could handle different load. I guess this is because Tigase instance shared CPU power with some other services. If they were busy Tigase suffered from CPU under power.
That said I think with installation of up to 10k online users you should be fine. However, other factors like roster size greatly matter as they affect traffic, and load. Also if you have other elements which generate a significant traffic this will put load on your system.
In any case, without some tests it is impossible to tell how really your system behaves or whether it can handle the load.
And the last question regarding component:
Of course Tigase does support XEP-0114 and XEP-0225 for connecting external components. So this should not be a problem with components written in different languages. On the other hand I recommend using Tigase's API for writing component. They can be deployed either as internal Tigase components or as external components and this is transparent for the developer, you do not have to worry about this at development time. This is part of the API and framework.
Also, you can use all the goods from Tigase framework, scripting capabilities, monitoring, statistics, much easier development as you can easily deploy your code as internal component for tests.
You really do not have to worry about any XMPP specific stuff, you just fill body of processPacket(...) method and that's it.
There should be enough online documentation for all of this on the Tigase website.
Also, I would suggest reading about Python support for multi-threading and how it behaves under a very high load. It used to be not so great.

bandwidth and traffic simulator for web apps?

Can you suggest how to create a test environment to simulate various types of bandwidths and traffic in a web app?
Or maybe an open source program which does this against localhost?
I think this is a very important subject when programming web apps but it is not a usual topic, the only way i can imagine to create such kind of environment is to use some kind of proxy in a local network but before start looking into the squid documentation i would like to hear your suggestions.
if you're using apache you may want to take a look at apache ab
There are two approaches to shape network traffic to simulate a network link:
Run some software on the client or server that sits somewhere in the networking stack and shapes the traffic between the app and the network interface
Run the traffic shaping software on a dedicated machine with 2 network interfaces through which your traffic is routed
(2) is a better solution if you don't want to install software on the client or server (and possibly impact performance), but requires more hardware fiddling.
Some other features you might want to think about are what shaping parameters can be simulated. Most do delay and packet loss, some do jitter and bandwidth limiting as well. Some solutions can selectively filter traffic (for instance by port number, TCP or UDP etc).
Here is a list of some of the systems I've found:
Open Source or Freeware
DummyNet is an open source BSD Unix-based for dedicated devices. It is not clear if the software is being actively maintained
NistNet is an open source Linux-based system for dedicated devices. The software has not been actively maintained for several years.
Commercial
Apposite Technoligies sell dedicated hardware solutions for simulating WAN links, with a Web based GUI for configuring the settings and collecting traffic measurements
East Coast DataCom sell hardware dedicated simulators for simulating routers and modems
Itrinegy offer both dedicated device solutions, and solutions for running on clients or servers.
Network FX offer several dedicated device products for simulating network impairments between the client & server
NetLimiter is a client side system that allows throttling of individual applications, and includes a firewall.
Shunra Software offer a range of products, from high end enterprise WAN simulation and testing, to a simple client-resident emulator.
The closest I can think of is doing something similar with VEDekstop from Shunra..
Simulating High Latency and Low Bandwidth in Testing of Database Applications
Shunra VE Desktop Standard is a Windows-based client software solution that simulates a wide area network link so that you can test applications under a variety of current and potential network conditions – directly from your desktop.
I wrote a php script awhile back which used CURL to run a sequence of page requests against my server which represented a typical use scenario. I had it output the times that it took for the server to respond to each of the requests. I then had another script which spawned a bunch of these test case scripts simultaneously for a sustained period and correlated the results into a file which I could then look at in a spreadsheet to see average times. This way I could simulate the number of users hitting the site that I wanted. The limitations are that you need to run the test script on a different server to the web server and that the client machine can become too loaded to give meaningful results past a certain point. I've since left the job otherwise I would paste the scripts here.
If you are running a Linux box as your server, Linux box as your client, or have the capability to put (perhaps a VM) a Linux router between your client and server, you can use NetEm.
NetEm is a Linux TC (Traffic Control) discipline which can delay (i.e. add latency) packets leaving a host. Although it's tricky to set up clever rules (e.g. add latency to some traffic, not to others), it's easy to add a simple "delay everything leaving the interface by 50ms" type rules and some recipes are provided.
By sticking a Linux VM between your client and server, you can simulate as much latency as you like. And you can turn it on and off dynamically. Linux has other TC disciplines which can be combined with NetEm to restrict bandwidth (but the script to set this up can be somewhat complicated). NetEm can also randomly drop packets.
I use it and it works a treat :)
Web Application Stress Tool (WAST) from Microsoft is what you need.
http://www.microsoft.com/downloads/details.aspx?familyid=e2c0585a-062a-439e-a67d-75a89aa36495&displaylang=en
I haven't used it for years (lack of need, not because I'd found anything else), but xat webspeed would be the first thing I would point toward
As other people have mentioned, Apache's ab (comes with Apache, so you probably have it already) is good.
Other good options are:
HP's LoadRunner Apache
Jakarta's JMeter
Tsung (if you want to get your erlang on)
I personally like ab and JMeter the best.
We use Loadrunner to do bandwidth and traffic simulation in our App. Loadrunner is can start agents on various machines and you can simulate one machine as running on dialup modem v/s another on DSL v/s another on Cable internet.
We also use Loadrunner to simulate various kinds of traffic conditions from 10 user run to 500 user run. We can also insert think times in the script and simulate a real user executing the http request. The best part is that it comes with a recording studio where it will plug in with Internet explorer and you can record the whole scenario/Usecase that can be as simple as hitting one page to a full blown 50-60 page script or more.
i found this little java program that works great : sloppy
yet not a proffesional solution but it works for simple tests, i guess it uses java streams and buffers to slow down the connection .
Have you looked at Tsung? It's a great utility for seeing if your website will scale in event of attack, I mean massive popularity. We use it for our web frontend, and our internal systems too.
If you're interested in performing your tests out of your browser, there is also a really great Firefox plug-in.
Do not forget about Wanulator (http://www.wanulator.de/).
The name Wanulator comes from "WAN" and "simulator. This pretty much describes what the software does: It simulates different Internet conditions such as delay or packet loss. Furthermore it simulates user access line speeds e.g. modem, ISDN or ADSL.
Wanulator is currently packaged as a Linux boot CD based on SLAX. This will give you a full out of the box experience. You can turn any PC into a test-system within a blink - just by booting the Wanulator CD. The package already includes useful client SW such as web-browser and network sniffer (Wireshark). Nevertheless if the PC has 2 network interfaces the system can run as an intermediate system between your server and your client - as a switch - without any configuration hassles.

Resources