Raku start-up times are notoriously long.
On my machine (old one but the most modern I own):
time raku -e 'put "Hello, World!"'
495.67 - 567.02 millis
(1.53 secs for the first launch)
time perl -E 'say "Hello, World!"'
13.63 - 19.51 millis
time sh -c 'echo "Hello, World!"'
4.31 - 6.88 millis
C:
#include <stdio.h>
int main(void) {
    puts("Hello, World!");
    return 0;
}
time ./hello
5.03 - 7.67 millis
time clang hello.c -o hello (compile)
126.39 - 195.66 millis
Raku does boast many nifty built-ins, which make it heavy.
Yet sometimes you need a rather simple script; you would prefer Raku for its elegance, but the time spent waiting for launch brings dissatisfaction, so you don't pick Raku for your next simple script.
Is there a way to help it?
Like starting Raku once and keeping it running, and making it handle all Raku scripts, with and without modules, big and small, and raku -e ... as well?
Something like FastCGI but for local shell?
Even permanently sacrificing a huge amount of memory is better for me than wasting time waiting.
In kilobytes, according to ps:

                                        Virtual size   Resident set size
raku -e 'loop {sleep 1}'                      146704              107200
perl -e 'while (1) {sleep 1}'                   1252                4036
sh -c 'while true; do sleep 1; done'             892                 900
C sleep loop (below)                             172                 780

#include <unistd.h>
void main(void) { for(;;) {sleep(1);} }
Update:
Elaborating upon
raku -e 'while prompt "Enter filename: " -> $filename { EVALFILE $filename }'
from https://stackoverflow.com/a/73873471/14812514
Concocted from https://docs.raku.org/type/IO::Socket::INET and https://www.tutorialspoint.com/perl/perl_socket_programming.htm
raku-persistent, heavy server:
#! /usr/bin/env raku
use MONKEY-SEE-NO-EVAL;
my $listen = IO::Socket::INET.new( :listen,
                                   :localhost<localhost>,
                                   :localport(3333) );
loop {
    my $conn = $listen.accept;
    try {
        while my $buf = $conn.recv() {
            put now - (EVAL $buf)[1] - 37; # also subtract 37 leap seconds
            EVALFILE (EVAL $buf)[0];
        }
    }
    $conn.close;
    CATCH { default {.payload.say} }
}
ra, light client:
#! /usr/bin/env perl
use strict;
use warnings;
use Time::HiRes;
use Socket;
my $file = shift || '/tmp/test.raku';
my $arg1 = shift || Time::HiRes::time;
# initialize host and port
my $server = shift || 'localhost'; # host running the server
my $port   = shift || 3333;
# create the socket, connect to the port
socket(SOCKET, PF_INET, SOCK_STREAM, (getprotobyname('tcp'))[2])
    or die "Can't create a socket: $!\n";
connect(SOCKET, pack_sockaddr_in($port, inet_aton($server)))
    or die "Can't connect to port $port!\n";
printf SOCKET '["%s", %f]', $file, $arg1;
close SOCKET or die "close: $!";
/tmp/test.raku:
put "Hello, World!";
run 'notify-send', 'Hello, World!'
Run raku-persistent in one terminal, once;
run ra [script.raku] in another terminal, as many times as you want.
Delays range from 0.008848472 to 1.322056732 seconds, in most cases staying below 0.07.
The 0.008848472 must be some kind of mistake: it is less than perl's startup time, which is impossible.
This is still a proof of concept since arguments don't get into the target script.
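For experimentation, the Perl client can also be approximated in plain shell; this is only a sketch, assuming netcat (nc) is installed and the server from above is listening on localhost:3333. The payload format matches the Perl client's printf.

```shell
# Build the '["<file>", <epoch>]' payload that the Perl client sends.
payload() {
    printf '["%s", %f]' "$1" "$2"    # $1 = script path, $2 = epoch seconds
}

# usage, with the server from above running:
#   payload /tmp/test.raku "$(date +%s.%N)" | nc localhost 3333
payload /tmp/test.raku 5
```

Like the Perl client, this carries no real arguments for the target script, so it shares the same proof-of-concept limitation.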
Update 2:
As a reaction to https://stackoverflow.com/a/73918912/14812514
The suggested bench suite doesn't work via Makefile on my machines, even on Debian-based antiX 21, and a possible solution was found only after I had done all measurements semi-manually. Moreover, on OpenBSD run.c doesn't even compile. Knowing zero C, I resorted to the shell: time for i in $(seq 1000); do ./<file> >/dev/null; done.
All 4 machines have HDDs, no SSDs.
My initial configuration is i5-3320M OpenBSD and marked with ^.
Results in seconds for 1K invocations, also for single invocation for Raku only:
CPU               Cores  GHz  OS            Via        Raku K     Raku 1    Perl K  C K
Pentium 4-M i686  1      2.2  OpenBSD i386  shell seq  36m32.479  2.192479  22.368  6.408
Core 2 Duo T5800  2      2    antiX         c run      10m34.460  0.63446    2.224  0.535
Core 2 Duo T5800  2      2    antiX         shell seq   7m48.153  0.468153   3.878  1.509
^i5-3320M         2      2.6  OpenBSD       shell seq   8m 0.011  0.480011   8.150  2.258
i5-3320M          2      2.6  antiX live    c run       4m53.469  0.293469   1.157  0.276
i5-3320M          2      2.6  antiX live    shell seq   3m37.042  0.217042   1.688  0.615
i7-3770S          4      3.1  OpenBSD       shell seq   6m44.920  0.40492    7.026  2.340
i7-3770S          4      3.1  antiX live    c run       4m 5.571  0.245571   0.872  0.268
Some highlights:
Raku startup times are unwelcomely long regardless of CPU and OS
shell for i in $(seq... is slower than the custom C run for Perl & C but outperforms it for Raku: 7m+ vs 10m+ on Core 2 Duo and 3m+ vs 4m+ on i5
OpenBSD doesn't prioritize speed
3rd-party software doesn't prioritize optimization for OpenBSD
Assuming you have a recent version of Rakudo, I can say that you indeed have a very slow machine:
% time perl -E 'say "Hello, World!"'
Hello, World!
real 0.05s
user 0.00s
sys 0.03s
% time raku -e 'put "Hello, World!"'
Hello, World!
real 0.12s
user 0.12s
sys 0.02s
are the numbers I see on a 2-year-old M1 Mac Mini. On a 10+-year-old Mac Mini running Debian, I see:
$ time perl -E 'say "Hello, World!"'
Hello, World!
real 0m0.009s
user 0m0.004s
sys 0m0.005s
$ time raku -e 'put "Hello, World!"'
Hello, World!
real 0m0.241s
user 0m0.287s
sys 0m0.041s
Now to get back to your question: yes, you can have something like FastCGI but for a local shell:
% raku -e 'while prompt "Enter filename: " -> $filename { EVALFILE $filename }'
Run this, enter the name of the script to execute, press ENTER, and it will run. Now, this is just the principle: it only runs scripts without arguments. You can build on this, allowing for arguments and such.
The client-server model you describe seems like a really good idea to me! In fact, I've been toying with a similar idea myself; I actually gave a lightning talk on the subject at the 2022 Perl and Raku Conference, A Nailgun for Raku.
As that talk mentions, this is the basic idea behind Nailgun, which implements the same idea but for Java. As you note, it gets a bit more complex than the proof of concept, but it definitely seems like a doable Raku project.
On a different note, I agree with Raiph [edit: with Liz, oops!] that the timings you posted are pretty close to a worst-case scenario in terms of launch speed. My go-to reference for language startup time benchmarks is bdrung/startup-time, which has both a benchmarking framework and the author's own data. For comparison, that author clocks a "Hello, World!" C program between 0.26 ms (for a 2018 laptop) and 2.19 ms (for a Raspberry Pi 3). Those times might provide some context for the 5.03 to 7.67 ms you measured for a "Hello, World!" C program.
Related
I am running a Matlab script on Linux (RedHat Enterprise Linux RHEL 7.6, 64-bit) as a cron job. I am not admin on that machine, therefore, I use crontab -e to schedule the job. The installed version of Matlab is 2018b. The email which I receive upon execution includes a couple of >> at the beginning and end which I find a bit irritating.
Here, an example of the email:
MATLAB is selecting SOFTWARE OPENGL rendering.
< M A T L A B (R) >
Copyright 1984-2018 The MathWorks, Inc.
R2018b (9.5.0.944444) 64-bit (glnxa64)
August 28, 2018
To get started, type doc.
For product information, visit www.mathworks.com.
>> >> >> >>
Matlab started: 2020-07-31 21:50:26.
>> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >>
Going to update from 2015-01-01 00:00:00 UTC to 2015-12-31 23:00:00 UTC.
[...]
>> Matlab closes: 2020-07-31 23:26:41.
>>
The corresponding lines at the beginning of the Matlab script look exactly like this:
close all
clearvars
% profile on % to check performance
fprintf('\nMatlab started: %s.\n', char(datetime()))
%% Database user parameters
% connects always to the specified database on "localhost"
DB_conn_name = 'abc';
DB_username = 'def';
DB_password = 'ghi';
% Add path and subfolders
if isunix
addpath(genpath('/project/abc'));
elseif ispc
addpath(genpath('C:\Branches\abc'));
end
% Change working folder
if isunix
cd /project/abc
elseif ispc
cd C:\Branches\abc
end
% Add database driver to path
javaaddpath JDBC_driver/mysql-connector-java.jar % Forward slashes within Matlab work even on Windows
% Set default datetime format
datetime.setDefaultFormats('default','yyyy-MM-dd HH:mm:ss')
%% Begin and end of update period
% now_UTC = datetime('now','TimeZone','UTC');
% time_2 = datetime(now_UTC.Year, now_UTC.Month, now_UTC.Day-1, 22, 0, 0); % Set the end time not too late, otherwise, some data might not yet be available for some areas leading to ugly "dips" in Power BI.
% During each update, we update e.g. the past 30 days
% datetime_month_delay = time_1 - days(30);
% Override automatic dates obtained below, for testing purposes
% time_1 = datetime(2020,1,1,0,0,0);
% time_2 = datetime(2020,2,1,23,0,0);
% Updating several years, one at a time
for iYear = 2015:2019
time_1 = datetime(iYear,1,1,0,0,0);
time_2 = datetime(iYear,12,31,23,0,0);
fprintf(['\nGoing to update from ',char(time_1),' UTC to ',char(time_2),' UTC. \n'])
[...]
It looks as though each line that is outside the for loop produces an empty line and therefore such a >> prompt in the output. This is also visible at the end (not included here).
The crontab -e looks like the following:
SHELL=/bin/bash
PATH=/sbin:/bin:/usr/sbin:/usr/bin
MAILTO=<my email address>
HOME=/project/abc
HTTP_PROXY=<proxy address>:8086
HTTPS_PROXY=<proxy address>:8086
# Run script regularly: minute hour day month dayofweek command
# No linebreaks allowed
15 2 * * * ~/script.sh
The shell script script.sh looks like this:
#!/bin/bash
/prog/matlab2018b/bin/matlab -nodesktop < ~/git-repos/abc/matlabscript.m
Does anyone have an idea what I need to change to get rid of these >>? That would be great! Thanks in advance!
The -nodesktop flag is still giving you an interactive shell, which is why crontab is capturing the prompts at all. You need to tell the matlab command what statement to execute.
I know you are using R2018b; but, I am going to give you BOTH answers for before and after R2019a, in case you ever upgrade.
For both answers: because you are calling this from your crontab, make sure to use the full path to your MATLAB executable for security reasons; and it would be good to use the -sd flag as well so that the directory containing your statement to execute is first on the path. The statement to execute is typed the same way you would type it on the MATLAB command line.
Before R2019a: per the doc page for the R2018b matlab (Linux) command, you need to run your command with the -r and -sd flags together. The -sd flag specifies your startup directory. Also, your code needs an exit statement at the end so that the matlab executable knows it's done.
/path/before_R2019a/matlab -sd /path/startup_directory -r statement
Starting in R2019a, the -batch flag in your invocation of MATLAB is the recommended way to run automated jobs like this, per the matlab (Linux) command doc page.
Note that starting in R2019a, the -r flag is NOT recommended; and, it should NOT be used with the -batch flag.
The -batch flag is simpler to use, and was added to make automation tasks easier. For starters, you no longer need to have an exit statement in your code with this approach.
Also remember that if you need quotes, starting in R2016b, MATLAB handles both double and single quoted strings. Choose appropriately in your script or cron call to handle your linux shell replacements - or avoid them.
/path/R2019a+/matlab -sd /path/startup_directory -batch statement
As an added bonus, if you use the -batch flag, you can tell from inside your script whether it is running from a -batch call or interactively using the MATLAB variable batchStartupOptionUsed.
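Putting it together for the cron job in the question, the R2019a+ crontab entry could look like this sketch (the path to an R2019a-or-newer matlab is an assumption, and -batch takes the script name without the .m extension):

```
15 2 * * * /path/to/R2019a+/matlab -sd ~/git-repos/abc -batch matlabscript
```

With -batch there is no interactive shell at all, so the >> prompts disappear from the emailed output.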
I'm working on a system security project on the topic of buffer overflow attacks.
I have a program, and its source code, with which I should gain root permissions.
The problem I have is caused by a random canary; at the beginning of the program the random number generator is initialized:
srand(time(NULL) ^ (getpid() << 16));
later on the canary gets set by
canary = rand();
My question: is it possible to regenerate the canary? I would like to recompute the seed (time(NULL) returns the time since 1970 in seconds, and the PID is constant once the program starts) and then get the canary by calling rand().
I'm not familiar with any scripting language and do not have a lot of Linux experience, so I hope not to waste time on a solution that would never work.
Thank you in advance! :)
Yes, it's possible. All you need to know is the program's start time and process ID. You can get this information from the command line by entering ps -eo pid,cmd,lstart | grep «program-name». For example, suppose the program you're interested in is called test:
$ ps -eo pid,cmd,lstart | grep test
26798 ./test Thu Oct 26 22:03:52 2017
The number at the start of the output line is the process ID, and the time at which this process started is at the end of the line. You can easily convert this to a Unix timestamp:
$ date --date='Thu Oct 26 22:03:52 2017' '+%s'
1509055432
Last time I checked, PHP was still using the same random number generator. So you can use PHP to calculate the canary:
$ php -r 'srand(1509055432 ^ (26798 << 16)); echo rand()."\n";'
1670313680
(If you don't have PHP, you could easily roll your own C application to do this calculation for you.)
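Since the seed arithmetic is plain integer math, the recovery can also be scripted directly in the shell; a sketch using the example's timestamp and PID:

```shell
ts=1509055432   # Unix timestamp, from `date --date='...' '+%s'`
pid=26798       # process ID, from `ps -eo pid,cmd,lstart`

# Recompute the seed exactly as the target does: time(NULL) ^ (getpid() << 16)
seed=$(( ts ^ (pid << 16) ))
echo "$seed"    # → 828136392; feed this to srand() in PHP or a C helper
```

Note that if the clock ticked between your ps reading and the program's actual srand() call, you may need to try a few adjacent timestamps.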
It should be quite obvious that srand(time(NULL)) is of no use whatsoever if you want to generate values that are hard to guess. A better approach would be to fetch bytes from /dev/urandom instead.
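For illustration, here is a sketch of drawing a seed from /dev/urandom in the shell (od with the -N byte-count option is assumed available, as on GNU and BSD systems):

```shell
# Read 4 raw bytes from the kernel CSPRNG and print them as one
# unsigned 32-bit integer, suitable as an unpredictable seed.
seed=$(od -An -tu4 -N4 /dev/urandom | tr -d ' ')
echo "$seed"
```

Unlike time(NULL) ^ (pid << 16), this value cannot be reconstructed from ps output.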
Running a small jetty application on a raspberry pi I noticed that after the first access, the application keeps burning around 3% CPU. A quick inspection showed that the same is true, with less %, on my laptop. Checking with strace I find a never ending sequence of
...
12:58:01.999717 clock_gettime(CLOCK_MONOTONIC, {2923, 200177551}) = 0
12:58:01.999864 futex(0x693a0f44, FUTEX_WAIT_BITSET_PRIVATE, 1, {2923, 250177551}, ffffffff) = -1 ETIMEDOUT (Connection timed out)
12:58:02.050090 futex(0x693a0f28, FUTEX_WAKE_PRIVATE, 1) = 0
12:58:02.050236 gettimeofday({1436093882, 50296}, NULL) = 0
12:58:02.050403 gettimeofday({1436093882, 50444}, NULL) = 0
12:58:02.050767 clock_gettime(CLOCK_MONOTONIC, {2923, 251228114}) = 0
...
(This is Java 7 on ubuntu 14.04 with Jetty 9.3.* using an h2 db, just in case this rings any bells for someone.)
I learned that it suffices to capture strace -f -tt -p <pid> -o out.txt, grep for clock_gettime, extract the pid, sort and uniq -c to find the thread calling clock_gettime most often. Plotting the delta times nicely shows a line at 50 milliseconds. Further the PID can be found in a thread dump taken with jvisualvm as the nid in hex and turns out to be 'VM Periodic Task Thread'. But why so often? This does not seem to be a standard behaviour of the JVM.
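The per-thread counting described above can be sketched as a small pipeline (out.txt is the capture from strace -f -tt -p <pid> -o out.txt; with -f, each line starts with the thread ID):

```shell
# Rank threads by how often they call clock_gettime in an strace capture.
busiest() {    # $1 = strace -f output file
    grep clock_gettime "$1" | awk '{print $1}' | sort | uniq -c | sort -rn | head
}
# usage: busiest out.txt   (prints "count TID" pairs, busiest thread first)
```

The TID printed first is the one to look up (converted to hex) as the nid in the jvisualvm thread dump.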
I have a Fortran program (which I cannot modify) that requires several inputs from the user (on the command line) when it is run. The program takes quite a while to run, and I would like to retain use of the terminal by running it in the background; however, this is not possible due to its interactive nature.
Is there a way, using a bash script or some other method, that I can pass arguments to the program without directly interacting with it via the command line?
I'm not sure if this is possible; I tried searching for it but came up empty, though I'm not exactly sure what to search for.
Thank you!
P.S. I am working on a Unix system where I cannot install anything not already present.
You can pipe it in:
$ cat delme.f90
program delme
read(*, *) i, j, k
write(*, *) i, j, k
end program delme
$ echo "1 2 3" | ./delme
1 2 3
$ echo "45 46 47" > delme.input
$ ./delme < delme.input
45 46 47
$ ./delme << EOF
> 3 2 1
> EOF
3 2 1
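Combined with redirection, the job can be detached so the terminal stays free, which was the original goal; a sketch (the file names follow the example above, and cat stands in here for the compiled Fortran binary):

```shell
printf '45 46 47\n' > delme.input

# Feed stdin from a file, capture all output in a log, run in background:
#   ./delme < delme.input > delme.log 2>&1 &
# demonstrated with cat standing in for the compiled program:
cat < delme.input > delme.log 2>&1 &
wait "$!"          # optional: block until the background job finishes
cat delme.log      # → 45 46 47
```

With nohup in front, the job also survives closing the terminal.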
I need to examine the output of a certain script 1000s of times on a unix platform and check if any of it has changed from before.
I've been doing this:
(script_stuff) | md5sum
and storing this value. I actually don't really need "md5", just a simple hash function which I can compare against a stored value to see if it's changed. It's okay if there is an occasional false positive.
Is there anything better than md5sum that works faster and generates a fairly usable hash value? The script itself generates a few lines of text - maybe 10-20 on average to max 100 or so.
I had a look at fast md5sum on millions of strings in bash/ubuntu - that's wonderful, but I can't compile a new program. Need a system utility... :(
Additional "background" details:
I've been asked to monitor the DNS record of a set of 1000 or so domains and immediately call certain other scripts if there has been any change. I intend to do a dig xyz +short statement and hash its output and store that, and then check it against a previously stored value. Any change will trigger the other script, otherwise it just goes on. Right now, we're planning on using cron for a set of these 1000, but can think completely differently for "seriously heavy" usage - ~20,000 or so.
I have no idea what the use of such a system would be, I'm just doing this as a job for someone else...
The cksum utility calculates a non-cryptographic CRC checksum.
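A store-and-compare sketch on top of cksum (state.txt is a hypothetical state file; in the question's setting the echo would be the dig xyz +short output):

```shell
# Compare the current checksum against the stored one; on change,
# trigger the follow-up action and update the stored value.
new=$(echo "some script output" | cksum)
old=$(cat state.txt 2>/dev/null)
if [ "$new" != "$old" ]; then
    echo "changed"                 # call the other scripts here
    printf '%s\n' "$new" > state.txt
fi
```

For 1000 domains you would keep one state file (or one line) per domain, keyed by domain name.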
How big is the output you're checking? A hundred lines max. I'd just save the entire original file then use cmp to see if it's changed. Given that a hash calculation will have to read every byte anyway, the only way you'll get an advantage from a checksum-type calculation is if the cost of doing it is less than reading two files of that size.
And cmp won't give you any false positives or negatives :-)
pax> echo hello >qq1.txt
pax> echo goodbye >qq2.txt
pax> cp qq1.txt qq3.txt
pax> cmp qq1.txt qq2.txt >/dev/null
pax> echo $?
1
pax> cmp qq1.txt qq3.txt >/dev/null
pax> echo $?
0
Based on your question update:
I've been asked to monitor the DNS record of a set of 1000 or so domains and immediately call certain other scripts if there has been any change. I intend to do a dig xyz +short statement and hash its output and store that, and then check it against a previously stored value. Any change will trigger the other script, otherwise it just goes on. Right now, we're planning on using cron for a set of these 1000, but can think completely differently for "seriously heavy" usage - ~20,000 or so.
I'm not sure you need to worry too much about the file I/O. The following script executes dig microsoft.com +short 5000 times, first with file I/O and then with output to /dev/null (toggled by changing the comments).
#!/bin/bash
rm -rf qqtemp
mkdir qqtemp
((i = 0))
while [[ $i -ne 5000 ]] ; do
    #dig microsoft.com +short >qqtemp/microsoft.com.$i
    dig microsoft.com +short >/dev/null
    ((i = i + 1))
done
The elapsed times at 5 runs each are:
File I/O | /dev/null
----------+-----------
3:09 | 1:52
2:54 | 2:33
2:43 | 3:04
2:49 | 2:38
2:33 | 3:08
After removing the outliers and averaging, the results are 2:49 for the file I/O and 2:45 for the /dev/null. The time difference is four seconds for 5000 iterations, only 1/1250th of a second per item.
However, since an iteration over the 5000 takes up to three minutes, that's the maximum time it will take to detect a problem (a minute and a half on average). If that's not acceptable, you need to move away from bash to another tool.
Given that a single dig only takes about 0.012 seconds, you should theoretically do 5000 in sixty seconds assuming your checking tool takes no time at all. You may be better off doing something like this in Perl and using an associative array to store the output from dig.
Perl's semi-compiled nature means that it will probably run substantially faster than a bash script and Perl's fancy stuff will make the job a lot easier. However, you're unlikely to get that 60-second time much lower just because that's how long it takes to run the dig commands.