Have program execution switch between computers/servers - windows

I've got two servers and a program that I want to run on them (not necessarily simultaneously).
Let's call one server "SA" and the other one "SB".
SB is a backup for SA. While my program is executing on SA, if SA fails I want the program to immediately pick up where it left off and continue executing on SB.
What is the easiest way I can accomplish this?

There are probably a bunch of ways that this could be done, but I'd use an exclusive file lock to do it. To make that happen, you need enough network connectivity between the two servers that both can open the same file for writing.
Your basic algorithm (pseudocode) goes like this:
File f;
while (true) {
    myTurn = false;
    try {
        Open the network file for exclusive writing
        myTurn = true;
    } catch (IOException e) {
        // Not logging an error because this is expected;
        // you might log that you tried, maybe.
        myTurn = false;
    }
    if (myTurn) {
        Do all of your actual work here.
        Loop many times if that's what you're doing.
        Don't exit this bit until your server wants to shut down
        (or crashes).
        But don't close the file.
    } else {
        Sleep for a short while before trying again.
    }
}
Basically what happens is that your app tries to open a file exclusively.
If it can't open it, then the other server holds the lock, so this server should stay quiet.
If it can open the file, then the other server is not running and this server should do the work.
For this to work, it's absolutely essential that the "work" routine does not hang: as long as the other server's process is alive, it will hang onto that network file lock. So if the other server goes into an infinite loop, you'll be out of luck.
And remember, both servers are trying to open the same network file. If they're trying to open a local file, it's not going to work.
This question has an example that you could probably re-use:
Getting notified when a file lock is released
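The pseudocode above could be sketched in Python roughly as follows. This is only an illustration under stated assumptions: the names try_acquire and run_when_primary, the polling interval, and the do_work callable are all hypothetical, and whether the lock is honoured across machines depends on the file share in use, so it should be tested on the real network path first.

```python
import sys
import time

if sys.platform == "win32":
    import msvcrt
else:
    import fcntl


def try_acquire(lock_path):
    """Return an open file holding an exclusive lock, or None if it is taken."""
    f = open(lock_path, "a+")
    try:
        if sys.platform == "win32":
            # Lock the first byte without blocking; raises OSError if taken.
            f.seek(0)
            msvcrt.locking(f.fileno(), msvcrt.LK_NBLCK, 1)
        else:
            # Non-blocking exclusive lock; raises OSError if taken.
            fcntl.flock(f.fileno(), fcntl.LOCK_EX | fcntl.LOCK_NB)
    except OSError:
        f.close()
        return None
    return f


def run_when_primary(lock_path, do_work, poll_seconds=5.0):
    """Wait until the lock is free, then run do_work while holding it."""
    while True:
        handle = try_acquire(lock_path)
        if handle is None:
            # The other server is active: stay quiet and try again later.
            time.sleep(poll_seconds)
            continue
        try:
            # Hypothetical callable; it should only return on shutdown.
            do_work()
        finally:
            # Closing the file releases the lock.
            handle.close()
        return
```

The operating system drops the lock when the owning process exits or crashes, which is what lets the standby server take over without any explicit hand-off.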

Related

How can I obtain access to a networked location?

My program, when started up with the system, is unable to access a networked location:
fn main() {
    ensure_network("\\\\SERVER\\".to_string());
}

fn ensure_network(network_dir: String) {
    let timer = std::time::Instant::now();
    let mut prev_counter = 0;
    loop {
        if std::fs::read_dir(&network_dir).is_ok() {
            break;
        }
        if timer.elapsed().as_secs() > prev_counter + 60 {
            println!("Still Failing.");
            prev_counter = timer.elapsed().as_secs();
        }
        std::hint::spin_loop();
    }
    println!(
        "Network access obtained (Time elapsed: {})",
        timer.elapsed().as_secs_f32()
    );
}
Edit (Restating problem after much research into the issue):
This program starts up with the PC using Task Scheduler. It is set to "Run only when user is logged on" and to "Run with highest privileges." However, most of the time the program fails to find the connection and gives the error, "The user name or password is incorrect. (os error 1326)."
The program succeeds when run manually with administrator privilege.
On occasion the program will succeed on startup, but this is rare.
The program will succeed if any other application is started as administrator after the program enters its loop.
In Task Scheduler you can delay the execution of the task.
Running it right after login is fine in principle, but when Active Directory (or any domain system) sits between you and the login, the connection to the shared storage may take a while to come up, and the program may try to run before that happens. Try putting a 10-20 second delay on the task and see if this solves your problem.
If that doesn't work, and again supposing there is a domain in the middle, you may need to explicitly provide a user name and password to access the network location that holds the directory you're looking for.
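If the delay has to live in the program rather than in Task Scheduler, a bounded polling loop is one option. The sketch below is in Python for brevity; wait_for_path and its default timeout and interval are hypothetical names and values, not anything from the original program.

```python
import os
import time


def wait_for_path(path, timeout_seconds=120.0, interval_seconds=2.0):
    """Poll until `path` can be listed; True on success, False on timeout."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        try:
            os.listdir(path)
            return True
        except OSError:
            # Covers a missing share as well as logon failures reported
            # by the operating system.
            pass
        if time.monotonic() >= deadline:
            return False
        # Sleep between attempts instead of spinning on the CPU.
        time.sleep(interval_seconds)
```

Sleeping between attempts keeps the wait cheap, and the timeout gives the caller a chance to report the failure instead of looping forever.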

How to Start and Stop a server with command line tool in golang

I have a gRPC server (Golang) which I want to start and stop via a command line tool. After stopping the server, it should perform some housekeeping tasks and exit the process.
I can do this by keeping a loop waiting for user input. Ex -
func main() {
    for {
        var input string
        fmt.Scanln(&input)
        // parse input
        // if 'start' execute - go start()
        // if 'stop' execute - stop() and housekeepingTask() and break
    }
}
There can be different approaches. Is there a better idea or approach that can be used?
I am looking for something similar to how Kafka or any database handles start and stop.
Any pointer to an existing solution or approach would be helpful.
I found one of the correct approaches, which @Mehran also suggested in a comment. Let me take a moment to answer it in detail:
This can be solved with inter-process communication. We can have a bash script that sends user-defined signals to the program (if it is already running), and the program can act on each signal it receives. (We could even have a file serving as stdin that the process reads and acts upon.)
Some useful links -
- https://blog.mbassem.com/2016/05/15/handling-user-defined-signals-in-go/
- Send command to a background process
- How to signal an application without killing it in Linux?
I will try to add a working go program.
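Until a Go version is added, the signal-based idea can be sketched in Python. This is a minimal illustration, not the gRPC server itself: the start, stop and housekeeping callables are hypothetical placeholders, and SIGTERM/SIGINT are used here in place of user-defined signals because they are the ones a plain kill command or Ctrl+C sends.

```python
import signal
import threading

# Set by the signal handler; the main loop waits on it.
stop_requested = threading.Event()


def request_stop(signum, frame):
    """Signal handler: only record that a stop was requested."""
    stop_requested.set()


def install_handlers():
    # Must be called from the main thread.
    signal.signal(signal.SIGTERM, request_stop)
    signal.signal(signal.SIGINT, request_stop)


def serve_until_stopped(start, stop, housekeeping, poll_seconds=0.5):
    """Start the server, block until a stop signal arrives, then clean up."""
    install_handlers()
    start()  # hypothetical: launches the server in the background
    while not stop_requested.wait(poll_seconds):
        pass
    stop()  # hypothetical: graceful server shutdown
    housekeeping()  # hypothetical: cleanup tasks before exit
```

A "stop" command then reduces to sending the signal to the running process, for example with kill and the process ID read from a pid file.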

How can I troubleshoot silently failing queued jobs?

I have a job that is dispatched with two arguments - path and filename to a file. The job parses the file using simplexml, then makes a notice of it in the database and moves the file to the appropriate folder for safekeeping. If anything goes wrong, it moves the file to another folder for failed files, as well as creates an event to give me a notification.
My problem is that sometimes the job will fail silently. The job is removed from the queue, but the file has not been parsed and it remains in the same directory. The failed_jobs table is empty (I'm using the database queue driver for development) and the failed() method has not been triggered. The Queue::failing() method I put in the app service provider has not been triggered either - I know, since both of those contain only a single log call to check whether they were hit. The Laravel log is empty (it's readable and Laravel does write to it for other errors - I double-checked) and so are relevant system log files such as e.g. php's.
At first I thought it was a timeout issue, but the queue listener has not failed or stopped, nor been restarted. I increased the timeout to 300 seconds anyway, and verified that all of the "[datetime] Processed: [job]" lines the listener generates were well within that timespan. Php execution times etc. are also far longer than required for this job.
So how on earth can I troubleshoot this when the logs are empty, nothing appears to fail, and I get no notification of what's wrong? If I queue up 200 files then maybe 180 will be processed and the remaining 20 fail silently. If I refresh the database + migrations and queue up the same 200 again, then maybe 182 will be processed and 18 will fail silently - but they won't necessarily be the same.
My handle method, simplified to show relevant bits, looks as follows:
public function handle()
{
    try {
        $xml = simplexml_load_file($this->path.$this->filename);
        $this->parse($xml);
        $parsedFilename = config('feeds.parsed path').$this->filename;
        File::move($this->path.$this->filename, $parsedFilename);
    } catch (Exception $e) {
        // if i put deliberate errors in the files, this works fine
        $errorFilename = config('feeds.error path').$this->filename;
        File::move($this->path.$this->filename, $errorFilename);
        event(new ParserThrewAnError($this->filename));
    }
}
Okay, so I still have absolutely no idea why, but... after restarting the VM I have tested eight times with various different files and options and had zero problems. If anyone can guess the reason, feel free to reply and I'll accept your answer if it sounds reasonable. For now, I'll mark my own answer as correct once I can, in case somebody else stumbles across this later.

Selenium Firefox Open timeout

Using Windows 2008, C#, Firefox 3.5.1, Selenium RC (v1.0.1)
When it works, this code executes very quickly and the page loads within .5 seconds.
However, the session always seems to fail after 3 - 5 iterations. The open command will cause a window to be spawned, but no page to be loaded. Eventually a timeout exception is returned. The page has not actually timed out. Instead, it is as though the request for a URL has never reached the browser window.
class Program
{
    static void Main(string[] args)
    {
        for (int i = 0; i < 10; i++)
        {
            var s = new DefaultSelenium("localhost", 4444, "firefox", "http://my.server");
            s.Start();
            s.SetSpeed("300");
            s.Open("/");
            s.WaitForPageToLoad("30000");
            s.Type("//input[contains(@id, '_username')]", "my.test");
            s.Type("//input[contains(@id, '_password')]", "password");
            s.Stop();
        }
    }
}
I have a similar set up (Firefox 3.6.15, Selenium RC 1.0.1, but on WinXP and using the Python libraries) and I am working with a couple of sites - one site is naturally prone to timeouts in normal use (e.g. by a human user) whereas the others typically are not. Those that aren't appear a little slower but the one that is prone to timeouts is significantly slower when run via RC than by a person - it won't always timeout but the incidence is much much more common.
My limited mental model for this is that somehow the extra steps RC is doing (communicating with the browser, checking what it sees in the returned pages etc etc) are somehow adding a bit to each step of the page loads and then at some point they will push it over the edge. Obviously this is overly simplified, I just haven't had time to properly investigate.
Also, I do tend to notice that the problem gets worse over time, which fits a little with what the OP has seen (i.e. working the first time but not after 3 - 5 attempts). Often a reboot seems to fix the issues, but without proper investigation I can't tell why this helps, perhaps it is somehow freeing up memory (the machine is used for other things), getting allocated to a different one of our company's proxies or something else I haven't considered.
So... not much of a full answer here (a comment would have been more appropriate, but my login isn't able to yet), but at least it reinforces that you're not the only one. Periodic restarts are an annoying thing to need to do, but in the absence of any smarter analysis and answers, maybe they'd be worth a shot?
I was facing the same problem. This is because the open method of DefaultSelenium has a timeout of 30000 ms, so it waits up to 30 s for your page to load. You can try this trivial solution.
// selenium is DefaultSelenium instance as private member of the class
boolean serverStartTry = false;
int tryCount = 1;
while ((!serverStartTry) && tryCount <= Constants.maxServerTries) {
    try {
        this.selenium.open(ReadConFile.readcoFile("pageName"));
        System.out.println("Server started in try no: " + tryCount);
        serverStartTry = true;
    } catch (SeleniumException e) {
        System.out.println("Server start try no: " + tryCount);
        System.out.println("Server Start Try: " + serverStartTry);
        serverStartTry = false;
        tryCount++;
    }
}
if (!serverStartTry) {
    System.out.println("Server Not started, no. of attempts made: " + tryCount);
    System.exit(0);
}
I solved it by calling:
selenium.setTimeout("60000");
before the open instruction.

Safest way to copy a file

I need to merge two PDF files.
However, sometimes a file might be locked.
I wrote this code, but I'm wondering whether it's the smartest solution:
private static int FILE_LOCKED_WAIT_PERIOD = 1000;

while (true)
{
    // If the file is in use, IOException will be thrown.
    // If file is not permitted to be opened because of Permission
    // Restrictions, UnauthorizedAccessException will be thrown.
    // For all other, Use normal Exception.
    try
    {
        inputDocument1 = PdfReader.Open(fileToMerge, PdfDocumentOpenMode.Import);
        break;
    }
    catch (IOException)
    {
        Thread.Sleep(FILE_LOCKED_WAIT_PERIOD);
    }
    catch (UnauthorizedAccessException)
    {
        Thread.Sleep(FILE_LOCKED_WAIT_PERIOD);
    }
    catch (Exception)
    {
        Thread.Sleep(FILE_LOCKED_WAIT_PERIOD);
    }
}
You should add a timer so that you sleep for a short while before you try the file operation again.
You should also have a counter so that you do not wait indefinitely and instead exit after, say, 15 tries.
Well this depends:
1) Is it a process that is entirely internal to a system, independent of a user? If so, you should try to find out what is locking the file and wait for that explicitly. Waiting for an arbitrary time and then retrying over and over may cause problems of its own.
2) Is it a user that may have the file open? In this case waiting is not helpful since the system could retry all weekend because the user suddenly left for the day. You have no control over user timing. Just tell the user that you cannot do the requested operation because the file is open and have them try again.
Usually waiting for N seconds/minutes is not really a solution. Either you know what the problem may be and poll & resolve the issue or you can't really do anything and just send out notice.
There is no special function to do this. Actually, even if such a function existed, some other process could still easily lock the file between your "lock check" and your "file open".
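Taken together, the suggestions above amount to attempting the open directly (no separate lock check) and giving up after a bounded number of tries. A minimal sketch in Python, where open_with_retry and its defaults (15 attempts, one second apart) are hypothetical:

```python
import time


def open_with_retry(path, mode="rb", attempts=15, wait_seconds=1.0):
    """Open `path`, retrying a bounded number of times on OS-level errors."""
    for attempt in range(1, attempts + 1):
        try:
            # Open directly: a prior "is it locked?" check would be racy.
            return open(path, mode)
        except FileNotFoundError:
            # Waiting will not make a missing file appear; fail right away.
            raise
        except OSError:
            if attempt == attempts:
                # Out of tries: let the caller report the problem.
                raise
            time.sleep(wait_seconds)
```

When the final attempt fails, the exception propagates, so the caller can notify the user rather than wait indefinitely.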
