Executing a script takes very long in Git Bash on Windows

I'm currently executing a script in Git Bash on a Windows 7 VM. The same script runs in 15-20 seconds on my Mac, but it takes almost an hour on Windows.
The script itself only uses tools that extract data from XML files; it does not call any APIs or anything of the sort.
I have no idea what's going on, and I've tried the suggestions in the following answers, but to no avail:
https://askubuntu.com/a/738493
https://github.com/git-for-windows/git/wiki/Diagnosing-performance-issues
I would appreciate some help in diagnosing this, or a few pointers on how to work out where the issue is or how to resolve it altogether.
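For what it's worth, the only thing I can think of to narrow it down myself is bash's own tracing, along these lines (just a sketch, wrapped around whichever part of the script I want to time):

PS4='+ ${SECONDS}s line ${LINENO}: '
set -x
# ... the section of the script being timed ...
set +x

Every traced line then gets prefixed with the elapsed seconds and its line number, so big jumps between lines show where the time goes.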
EDIT:
I am not able to share the entire script, but you can see the type of commands it uses through previous questions I have asked on Stack Overflow. Essentially, it is a mixture of XMLStarlet commands.
https://stackoverflow.com/a/58694678/3480297
https://stackoverflow.com/a/58693691/3480297
https://stackoverflow.com/a/58080702/3480297
EDIT2:
As a high-level overview, the script essentially loops over a folder of XML files, retrieves certain data from each of those files, and then writes that data into tables in an HTML page.
A breakdown of these steps in terms of the code can be seen below:
Searching the folder for XML files and looping through each one
for file in "$directory"*
do
    if [[ "$file" == *".xml"* ]]; then
        filePath+=( "$file" )
    fi
done

for ((j=0; j < ${#filePath[@]}; j++)); do
    retrieveData "${filePath[j]}"
done
Retrieving data from the XML file in question
function retrieveData() {
    filePath=$1
    # Retrieve data from the relevant xml file
    dataRow=$(xml sel -t -v "//xsd:element[@name=\"$data\"]/@type" -n "$filePath")
    outputRow "$dataRow"
}
Outputting the data to an HTML table
function outputRow() {
    rowValue=$1
    cat >> "$HTMLFILE" << EOF
<td>
<div>$rowValue</div>
</td>
EOF
}
As previously mentioned, the actual xml commands used to retrieve the relevant data can differ; the links to my previous questions show the different types of commands used.

Your git-bash installation is out of date.
Execute git --version to confirm this. Are you using something from before 2.x?
Please install the latest version of git-bash, which is 2.24.0 as of 2019-11-13.
See the Release Notes for git for more information about performance improvements over time.
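For example (the output below is only illustrative; your exact build string will differ):

$ git --version
git version 2.24.0.windows.1

Anything that starts with 1.x is years out of date and is missing a long run of performance work.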

Related

Shell script to analyze files in multiple directories

I am new to Unix and shell scripting, so I am hoping somebody can help me. I have code that analyzes a file and then averages over several files (3 in the pared-down example below). I submit my analysis code using the bsub command in the shell script. My question is: how do I modify my script so that I can do the analysis, which entails averaging as mentioned, over all 3 files when they live in different directories? The only idea that comes to mind is to copy each file to a separate directory and do all of the analysis there, but the files are huge, so copying them doesn't seem like the best idea. I'll be happy to clarify anything in case that's not clear.
#!/bin/sh
lsf_job_file="submit_analysis.lsf"

start=1
stop=3
increment=1

for id in $(seq $start $increment $stop)
do
    file_dir="${id}"
    cp -R base "${file_dir}"
    cd "${file_dir}"
    # Submit the job
    bsub < "${lsf_job_file}"
    cd ..
done

Setting the current date into a variable in a script in bash

So for the life of me I cannot figure out why my script will not take my date command as a variable. I have a script that is run every time a message is received and procmail filters specific messages by their subject line. The script looks like this:
d=$(date +%Y%m%d)
:0 wc
* ^(From|subject).*xxx
| cat&>/xx/xx/xx/xx/MSG:$d && \
chmod xxx /xx/xx/xx/xx/MSG:$d && \
/xx/xx/xx/otherscript.sh /xx/xx/xx/xx/MSG:$d
I have run the date command plenty of times in other scripts and printed it to stdout without any issue, so I am wondering if this is a procmail issue? I have looked at a few different sites about this but still have not found a solution. My end goal is to create unique file names each time a new email comes in, as well as for organization purposes.
The other reason I believe it has something to do with procmail is that it was working fine just 3 months ago (I didn't change any files or permissions). I have even tried several variations (only showing a few examples):
$'date +"%Y%m%d"'
$(date)
echo $(date)
I get a variety of files created, with names like MSG:(date), MSG:(date, etc. MSG:(date looks like it tries to read the variable but gets cut off, or the space between date and + is causing an issue.
And at the end of my script I send it to another script which also creates a new file with the date appended and it works just fine:
fileOut="/xxx/xxx/xxx/xxx.$v.$(date +"%Y%m%d-%H%M%S").xxx"
prints: xxx.YH8AcFV9.20160628-090506.txt
Thanks for your time :-)
Procmail does not support the modern POSIX shell command substitution syntax; you need to use backticks.
d=`date +%Y%m%d` # or just date +%F
If you want to avoid invoking an external process, the From_ pseudo-header contains a fresh date stamp on many architectures.
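Applied to the recipe from the question, that would look something like this (untested sketch; the xx paths and the chmod mode are the question's own placeholders):

d=`date +%Y%m%d`

:0 wc
* ^(From|subject).*xxx
| cat > /xx/xx/xx/xx/MSG:$d && \
  chmod xxx /xx/xx/xx/xx/MSG:$d && \
  /xx/xx/xx/otherscript.sh /xx/xx/xx/xx/MSG:$d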

bash script rsync itself from remote host - how to?

I have multiple remote sites which run a bash script, initiated by cron (running VERY frequently -- every 10 minutes or less), and one of its jobs is to sync a "scripts" directory. The idea is for me to be able to edit the scripts in one location (a server in a data center) rather than having to log into each remote site and make the edits manually. The question is: what are the best options for syncing the script that is currently running the sync? (I hope that's clear.)
I would imagine syncing a script that is currently running would be very bad. Does the following look feasible if I run it as the last statement of my script? Pros? Cons? Other options?
if [ -e "${newScriptPath}" ]; then
    echo "mv ${newScriptPath} ${permanentPath}" | at "now + 1 minute"
fi
One problem I see is that it's possible that if I use "1 minute" (which is "at's" smallest increment), and the script ends, and cron initiates the next job before "at" replaces the script, it could try to replace it during the next run of the script....
Changing the script file during execution is indeed dangerous (see this previous answer), but there's a trick that (at least with the versions of bash I've tested with) forces bash to read the entire script into memory, so if it changes during execution there won't be any effect. Just wrap the script in {}, and use an explicit exit (inside the {}) so if anything gets added to the end of the file it won't be executed:
#!/bin/bash
{
# Actual script contents go here
exit
}
Warning: as I said, this works on the versions of bash I have tested it with. I make no promises about other versions, or other shells. Test it with the shell(s) you'll be using before putting it into production use.
Also, is there any risk that any of the other scripts will be running during the sync process? If so, you either need to use this trick with all of them, or else find some general way to detect which scripts are in use and defer updates on them until later.
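One simple way to make "in use" detectable (my own sketch, not something from the original answer; it assumes util-linux flock is installed and that the updater checks the same lock file before swapping a script in) is to have every script hold a lock for its lifetime:

#!/bin/bash
# Hold an exclusive lock on a per-script lock file for as long as this script runs.
lockfile="/tmp/$(basename "$0").lock"
exec 200>"$lockfile"
flock -n 200 || exit 1   # another copy is running, or the updater is replacing us

# ... actual script contents go here ...

# The lock is released automatically when the script exits and fd 200 is closed.

The updater can then try flock -n on the same file and defer the replacement if the lock is held.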
So I ended up using the "at" utility, but only if the file changed. I have a ".cur" and ".new" version of the script on the local machine. If the MD5 is the same on both, I do nothing. If they are different, I wait until after the main script completes, then force copy the ".new" to the ".cur" in a different script.
I create the same lock file (name) in update_script.sh, so another instance of the first script won't run while I'm changing it.
The relevant part of the main script:
# compare checksums of the current and the newly synced script
file1=`md5sum < script_cur.sh`
file2=`md5sum < script_new.sh`

if [ "$file1" == "$file2" ] ; then
    echo "Files have the same content"
else
    echo "Files are different, scheduling update_script.sh at command"
    at -f update_script.sh now + 1 minute
fi

Logging bash-history in a database

Previously I wrote a script which logs my visited directories to an sqlite3 db, along with some shortcuts to quickly search and navigate through that history. Now I am thinking of doing the same with my bash commands.
When I execute a command in bash, how can I get the command name? Do I have to change the part of bash's source code responsible for writing the bash history? Once I have a database of my command history, I can do smart searches in it.
Sorry to come to this question so late!
I tend to run a lot of shells where I work, and as a result the history of long-running shells gets mixed up or lost all the time. I finally got so fed up that I started logging to a database :)
I haven't worked out the integration completely, but here is my setup:
Recompile bash with SYSLOG enabled. Since bash version 4.1 this code is all in place; it just needs to be enabled in config-top.h, I believe.
Install the new bash and configure your syslog client to log user.info messages.
Install rsyslog and the rsyslog-pgsql plugin as well as postgresql. I had a couple of problems getting this installed on Debian testing; PM me if you run into problems, or ask here :)
Configure the user messages to feed into the database.
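The last step looks roughly like this in rsyslog's legacy configuration syntax (a sketch only; the database name, user and password are placeholders for whatever the rsyslog-pgsql setup created on your machine):

$ModLoad ompgsql
user.info :ompgsql:localhost,Syslog,rsyslog,mypassword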
At the end of all this, all your commands should be logged into a database table called systemevents. You will definitely want to set up indexes on a couple of the fields if you use the shell regularly, as queries can start to take forever :)
Here are a couple of the indexes I set up:
Indexes:
"systemevents_pkey" PRIMARY KEY, btree (id)
"systemevents_devicereportedtime_idx" btree (devicereportedtime)
"systemevents_fromhost_idx" hash (fromhost)
"systemevents_priority_idx" btree (priority)
"systemevents_receivedat_idx" btree (receivedat)
fromhost, receivedat, and devicereportedtime are especially helpful!
From just the short time I've been using it, this is really amazing. It lets me find commands across any servers I've been on recently! Never lose a command again! You can also correlate it with downtime / other problems if you have multiple users.
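For example, a query along these lines (the host name and search pattern are placeholders; the column names are the ones from the stock rsyslog schema) digs a command back out of a particular machine's history:

SELECT receivedat, fromhost, message
FROM systemevents
WHERE fromhost = 'web01'
  AND message LIKE '%rsync%'
ORDER BY receivedat DESC
LIMIT 20;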
I'm planning on writing my own rsyslog plugin to make the history format in the database a little more usable. I'll update when I do :)
Good luck!
You can use the Advanced Shell History tool to write your shell history to sqlite3 and query the database from the command line using the provided ash_query tool.
vagrant#precise32:~$ ash_query -Q
Query Description
CWD Shows the history for the current working directory only.
DEMO Shows who did what, where and when (not WHY).
ME Select the history for just the current session.
RCWD Shows the history rooted at the current working directory.
You can write your own custom queries and also make them available from the command line.
This tool gives you a lot of extra historical information besides commands - you get exit codes, start and stop times, current working directory, tty, etc.
Full disclosure - I am the author and maintainer.
Bash already records all of your commands to ~/.bash_history which is a plain text file.
You browse the contents with the up/down arrow, or search it by pressing control-r.
Take a look at fc:
fc: fc [-e ename] [-lnr] [first] [last] or fc -s [pat=rep] [command]
Display or execute commands from the history list.
fc is used to list or edit and re-execute commands from the history list.
FIRST and LAST can be numbers specifying the range, or FIRST can be a
string, which means the most recent command beginning with that
string.
Options:
-e ENAME select which editor to use. Default is FCEDIT, then EDITOR,
then vi
-l list lines instead of editing
-n omit line numbers when listing
-r reverse the order of the lines (newest listed first)
With the `fc -s [pat=rep ...] [command]' format, COMMAND is
re-executed after the substitution OLD=NEW is performed.
A useful alias to use with this is r='fc -s', so that typing `r cc'
runs the last command beginning with `cc' and typing `r' re-executes
the last command.
Exit Status:
Returns success or status of executed command; non-zero if an error occurs.
You can invoke it to get the text to insert into your table, but why bother if it's already saved by bash?
In order to get the full history, either use the history command and process its output:
$ history > history.log
or flush the history (as it is being kept in memory by BASH) using:
$ history -a
and then process ~/.bash_history
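If the goal is still a database, a crude way to get started is to pump that file into sqlite3 (a sketch only; the database path and the cmd_history table are my own choices, and it naively re-inserts the whole file with an import timestamp rather than the real execution time):

# run these in the interactive shell whose history you want to capture
db="$HOME/history.db"
sqlite3 "$db" 'CREATE TABLE IF NOT EXISTS cmd_history (ts TEXT, cmd TEXT);'
history -a                              # flush the in-memory history to ~/.bash_history
while IFS= read -r cmd; do
    esc=$(printf '%s' "$cmd" | sed "s/'/''/g")   # escape single quotes for SQL
    sqlite3 "$db" "INSERT INTO cmd_history (ts, cmd) VALUES (datetime('now'), '$esc');"
done < ~/.bash_history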

Compiling historical information (esp. SLOCs) about a project

I am looking for a tool that will help me to compile a history of certain code metrics for a given project.
The project is stored inside a mercurial repository and has about a hundred revisions. I am looking for something that:
checks out each revision
computes the metrics and stores them somewhere with an identifier of the revision
does the same with the next revisions
For a start, counting SLOCs would be sufficient, but it would also be nice to analyze the number of tests, test coverage, etc.
I know such things are usually handled by a CI server; however, I am solo on this project and thus haven't bothered to set one up (I'd like to use TeamCity, but I really didn't see the benefit of doing so in the beginning). If I set up my CI server now, could it handle that?
Following jitter's suggestion, I have written a small bash script (run inside Cygwin) that uses sloccount for counting the source lines. The output is simply dumped to a text file:
#!/bin/bash
COUNT=0        # start revision
STOPATREV=98

until [ $COUNT -gt $STOPATREV ]; do
    hg update -C -r $COUNT >> sloc.log   # update and log
    echo "" >> sloc.log                  # echo a newline
    rm -r lib                            # don't count the lib folder
    sloccount /thisIsTheSourcePath | print_sum >> sloc.log
    let COUNT=COUNT+1
done
You could write, e.g., a shell script which:
checks out the first version
runs sloccount on it (saving the output)
checks out the next version
repeats steps 2 and 3 for the remaining revisions
Or look into Ohloh, which seems to have Mercurial support by now.
Otherwise I don't know of any SCM statistics tool which supports Mercurial. As Mercurial is relatively young (around since 2005), it might take some time until such "secondary use cases" are supported. (Hint: maybe provide an hgstat library yourself, as there are for svn and cvs.)
If it were me writing software to do that kind of thing, I think I'd dump the metrics results for the project into a single file and revision-control that. Then the "historical analysis" tool would only have to pull out old versions of that one file, rather than having to check out every old copy of the entire repository and rerun all the tests every time.
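A rough sketch of that idea (my own illustration, not from the original answer; metrics.csv and the grep pattern for sloccount's total line are assumptions):

# append the current revision's SLOC count to a tracked metrics file
rev=$(hg identify --num)
sloc=$(sloccount . | grep "Total Physical Source Lines of Code" | awk -F= '{gsub(/[ ,]/, "", $2); print $2}')
echo "$rev,$sloc" >> metrics.csv
hg commit -m "code metrics for revision $rev" metrics.csv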
