apachebench (ab) -g switch append to file? - apachebench

The apachebench (ab) -g switch generates graph-friendly (gnuplot) data. I'd like to run it every X minutes to collect performance metrics and build a graph; however, every time the command runs it replaces the data in the file.
Is there a way to make it append the data to the file instead?
example:
ab -g /tmp/graph1 -n 25 -c 1 http://www.google.com/
will put the graph data in /tmp/graph1 file
If I run it a second time, however, the existing data in the file is lost and replaced with the new data; I want it to append instead and keep the data from both runs in the file.

I found a rather simple way to make it work:
ab -g /tmp/graph1 -n 25 -c 1 http://www.google.com/ && cat /tmp/graph1 >> /tmp/full_graph
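To run this every X minutes, the same workaround can be scheduled from cron; a minimal sketch, assuming a 5-minute interval and the paths from the example above:
*/5 * * * * ab -g /tmp/graph1 -n 25 -c 1 http://www.google.com/ && cat /tmp/graph1 >> /tmp/full_graph
Each run's gnuplot file starts with a header row, so when appending you may prefer tail -n +2 /tmp/graph1 >> /tmp/full_graph to skip it.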

Related

Show only newly added lines of logfile in terminal

I use tail -f to show the contents of a logfile.
What I want is when the logfile content changes, instead of appending the new lines to my screen, only the newly added lines should be shown on my screen.
So as if a clearscreen was made every time before printing the new lines.
I tried to find a solution by web search but couldn't find anything useful.
edit:
In my case it happens that several lines will be added at once (it is a php error logfile). So I am looking for a solution where more than the single last line can be shown on screen.
The watch command in combination with the tail command shows the last line of a log file, refreshing every 2 seconds by default. It doesn't refresh the moment a new line is appended to the log file, but since you can specify the interval it might help for your use case.
watch -t tail -1 <path_to_logfile>
If you need a faster interval, for example every 0.5 seconds, you can specify it with the -n option, i.e.:
watch -t -n 0.5 tail -1 <path_to_logfile>
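Since several lines can be added at once, you could also show more than the single last line and, with procps watch, highlight what changed between refreshes; a small variation of the above (the line count of 20 is arbitrary):
watch -t -d -n 0.5 tail -20 <path_to_logfile>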
Try
$ watch 'tac FILE | grep -m1 -C2 PATTERN | tac'
where
PATTERN is any keyword (or regexp) to identify errors you seek in the log,
tac prints the lines in reverse,
-m is a max count of matching lines to grep,
-C is any number of lines of context (before and after the match) to show (optional).
That would be similar to
$ tail -f FILE | grep -C2 PATTERN
if you didn't mind just appending occurrences to the output in real-time.
But if you don't know any generic PATTERN to look for at all,
you'd have to just follow all the updates as the logfile grows:
$ tail -n0 -f FILE
Or even, create a copy of the logfile and then do a diff:
Copy: cp file.log{,.old}
Refresh the webpage with your .php code (or whatever, to trigger the error)
Run: diff file.log{,.old}
(or, if you prefer sort to diff: $ sort file.log{,.old} | uniq -u)
The curly braces are shorthand for both filenames (see Brace Expansion in $ man bash)
If you must avoid any temp copies, store the line count in memory:
z=$(grep -c ^ file.log)
Refresh the webpage to trigger an error
tail -n +$((z + 1)) file.log
The latter approach can be built upon, to create a custom scripting solution more suitable for your needs (check timestamps, clear screen, filter specific errors, etc). For example, to only show the lines that belong to the last error message in the log file updated in real-time:
$ clear; z=$(grep -c ^ FILE); while true; do d=$(date -r FILE); sleep 1; b=$(date -r FILE); if [ "$d" != "$b" ]; then clear; tail -n +$((z + 1)) FILE; z=$(grep -c ^ FILE); fi; done
where
FILE is, obviously, your log file name;
grep -c ^ FILE counts all lines in the file (much like cat FILE | wc -l, except that wc -l only counts newlines and would miss a final line with no trailing newline);
sleep 1 sets the pause/delay between checking the file timestamps to 1 second, but you could change it to even a floating point number (the less the interval, the higher the CPU usage).
To simplify any repetitive invocations in future, you could save this compound command in a Bash script that could take a target logfile name as an argument, or define a shell function, or create an alias in your shell, or just reverse-search your bash history with CTRL+R. Hope it helps!
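As a sketch of the Bash-script idea, the compound command above could be wrapped in a small script (the name lasterror.sh is just an example) that takes the target logfile as its first argument:
#!/bin/bash
# Usage: ./lasterror.sh /path/to/error.log
# Show only the lines appended to the logfile since its last modification.
FILE=$1
clear
z=$(grep -c ^ "$FILE")                # current number of lines
while true; do
    d=$(date -r "$FILE")              # timestamp before the pause
    sleep 1
    b=$(date -r "$FILE")              # timestamp after the pause
    if [ "$d" != "$b" ]; then         # the file was modified
        clear
        tail -n +$((z + 1)) "$FILE"   # print only the newly added lines
        z=$(grep -c ^ "$FILE")        # remember the new line count
    fi
done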

How to download URLs in a csv and naming outputs based on a column value

1. OS: Linux / Ubuntu x86/x64
2. Task:
Write a Bash shell script to download the URLs in a (large) csv (as fast / as parallel as possible), naming each output file after a column value.
2.1 Example Input:
A CSV file containing lines like:
001,http://farm6.staticflickr.com/5342/a.jpg
002,http://farm8.staticflickr.com/7413/b.jpg
003,http://farm4.staticflickr.com/3742/c.jpg
2.2 Example outputs:
A folder, outputs, containing files like:
001.jpg
002.jpg
003.jpg
3. My Try:
I tried two main approaches.
1. Using the download tool's built-in support
Take aria2c as an example: it supports the -i option to import a file of URLs to download, and (I think) it processes them in parallel for maximum speed. It does have a --force-sequential option to force downloads in the order of the lines, but I failed to find a way to make the naming part happen.
2. Splitting first
Split the file into smaller files and run a script like the following to process each one:
#!/bin/bash
# Read "id,url" lines and download each URL as outputs/<id>.jpg
INPUT=$1
while IFS=, read -r serino url
do
    aria2c -c "$url" --dir=outputs --out="$serino.jpg"
done < "$INPUT"
However, this restarts aria2c for every line, which costs time and lowers the overall speed.
Although one can run the script multiple times in parallel to get 'shell-level' parallelism, that doesn't seem to be the best way.
Any suggestions?
Thank you.
aria2c supports so-called option lines in input files. From man aria2c:
-i, --input-file=
Downloads the URIs listed in FILE. You can specify multiple sources for a single entity by putting multiple URIs on a single line separated by the TAB character. Additionally, options can be specified after each URI line. Option lines must start with one or more white space characters (SPACE or TAB) and must only contain one option per line.
and later on
These options have exactly same meaning of the ones in the command-line options, but it just applies to the URIs it belongs to. Please note that for options in input file -- prefix must be stripped.
You can convert your csv file into an aria2c input file:
sed -E 's/([^,]*),(.*)/\2\n out=\1/' file.csv | aria2c -i -
This will convert your file into the following format and run aria2c on it.
http://farm6.staticflickr.com/5342/a.jpg
out=001
http://farm8.staticflickr.com/7413/b.jpg
out=002
http://farm4.staticflickr.com/3742/c.jpg
out=003
However, this won't create the files 001.jpg, 002.jpg, … but 001, 002, …, since that's what the out= options specify. Either specify file names with extensions or guess the extensions from the URLs.
If the extension is always jpg you can use
sed -E 's/([^,]*),(.*)/\2\n out=\1.jpg/' file.csv | aria2c -i -
To extract extensions from the URLs use
sed -E 's/([^,]*),(.*)(\..*)/\2\3\n out=\1\3/' file.csv | aria2c -i -
Warning: This only works reliably if every URL ends with an extension. For instance, a line with no dot in the URL at all (say 001,domain/abc) would not be converted, causing aria2c to fail on the "URL" 001,domain/abc, and a URL whose only dot is earlier in the path (such as 001,domain.tld/abc) would pick up a bogus "extension" for the out= name.
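If some URLs may lack an extension, one possible workaround (just a sketch, assuming GNU sed and that .jpg is an acceptable default) is to fall back to .jpg whenever no trailing extension can be extracted:
sed -E -e 's/([^,]*),(.*(\.[^./]+))$/\2\n out=\1\3/' -e 't' -e 's/([^,]*),(.*)/\2\n out=\1.jpg/' file.csv | aria2c -i -
The first substitution handles URLs that end with an extension, the t command skips the rest of the script if it matched, and the second substitution supplies the .jpg default for everything else.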
Using all standard utilities you can do this to download in parallel:
tr '\n' ',' < file.csv |
xargs -P 0 -d , -n 2 bash -c 'curl -s "$2" -o "$1.jpg"' -
The -P 0 option makes xargs run as many commands in parallel as possible (with GNU xargs, 0 means no limit on the number of concurrent processes)
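A variant that caps the parallelism and writes into an outputs folder, as in the question, might look like this (a sketch assuming GNU xargs and curl; the limit of 8 concurrent downloads is arbitrary):
mkdir -p outputs
tr '\n' ',' < file.csv |
xargs -P 8 -d , -n 2 bash -c 'curl -s "$2" -o "outputs/$1.jpg"' -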

Weka's StringToWordVector filter from command line?

Is it possible to run the StringToWordVector filter in Weka from the command line and get a processed output file? I'd like to pre-process my data separately before feeding it back into Weka for training. So I'm trying to run the filter, get an output file, and then do the rest. I am using a high-end GPU virtual machine with SSH-only access, so I can't use the Weka GUI, only the command line.
See this
java weka.filters.unsupervised.attribute.StringToWordVector -O -L -tokenizer "weka.core.tokenizers.WordTokenizer -delimiters \" \\r\\n\\t.,;:\\\"\\'()?!-¿¡+*&#$%\\\\/=<>[]_`#\"" -W 10000000 -b -i input-train.arff -o output-train-vector.arff -r input-test.arff -s output-test-vector.arff
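A stripped-down form of that invocation may make the batch-mode options easier to see; this is only a sketch, assuming a weka.jar in the current directory and that the default tokenizer settings are acceptable:
java -cp weka.jar weka.filters.unsupervised.attribute.StringToWordVector \
    -b \
    -i input-train.arff -o output-train-vector.arff \
    -r input-test.arff -s output-test-vector.arff
Here -b runs the filter in batch mode, so the dictionary built from the training file (-i/-o) is reused for the test file (-r/-s) and the two output ARFF files stay compatible.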

Possible to output Vowpal Wabbit predictions to .txt along with observed target values?

We're writing a forecasting application that uses Vowpal Wabbit and are looking to automate as much of our model validation process as we can. Anyone know whether vw has a native utility to output the target values in a test file along with the predictions from a vw model? These values are printed to the terminal output during prediction. Is there an argument to the regular vw call, or perhaps a tool in the utl folder that prints targets and forecasts together on a row-wise basis?
Here's what the code I'm using now for prediction looks like:
vw -d /path/to/data/test.vw -t -i lg.vw --link=logistic -p predictions.txt
My goal is to produce from within Vowpal an output file that looks like this:
Predicted Target
0.78 1
0.23 0
0.49 1
...
UPDATE
@arielf's code worked like a charm. I've only made one minor addition to print the streaming results to a validation.txt file:
vw -d test.vw -t -i lg.vw --link=logistic -P 1 2>&1 | \
perl -ane 'print "$F[5]\t$F[4]\n" if (/^\d/)' > validation.txt
Try this:
vw -d test.vw -t -i lg.vw --link=logistic -P 1 2>&1 | \
perl -ane 'print "$F[5]\t$F[4]\n" if (/^\d/)'
Explanation:
-P 1 # Add option: set vw progress report to apply to every example
Note: -P is a capital P (alias for --progress), 1 is the progress printing interval.
Note that you don't need to add predictions with -p ... since that is redundant in this case (predictions are already included in vw progress lines)
A progress report line with headers, looks like this:
average      since        example    example    current    current    current
loss         last         counter    weight     label      predict    features
0.000494     0.000494     1          1.0        -0.0222    0.0000     14
Since progress report goes to stderr, we need to redirect stderr to stdout (2>&1).
Now we pipe the vw progress output into perl for simple post-processing. The perl command loops over each line of input without printing by default (-n), auto-splits it into fields on whitespace (-a), and applies the expression (-e), printing fields $F[5] and $F[4] (@F is zero-indexed) separated by a TAB and terminated by a newline, but only if the line starts with a number (in order to skip whatever isn't a progress line, e.g. headers, preambles and summary lines). The field order is reversed because vw progress lines have the observed value before the predicted value and you asked for the opposite order.
UPDATE
Aaron published a working example using this solution in Google Drive: https://drive.google.com/open?id=0BzKSYsAMaJLjZzJlWFA2N3NnZGc
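If you prefer awk over perl for the post-processing step, an equivalent one-liner (just a sketch, not part of the original answer; note that awk fields are 1-indexed) would be:
vw -d test.vw -t -i lg.vw --link=logistic -P 1 2>&1 | \
awk '/^[0-9]/ {print $6 "\t" $5}' > validation.txt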

Append to the top of a large file: bash

I have a nearly 3 GB file that I would like to add two lines to the top of. Every time I try to manually add these lines, vim and vi freeze up on the save (I let them try to save for about 10 minutes each). I was hoping that there would be a way to just append to the top, in the same way you would append to the bottom of the file. The only approaches I have seen so far, however, involve a temporary file, which I feel would be slow due to the file size.
I was hoping something like:
grep -top lineIwant >> fileIwant
Does anyone know a good way to append to the top of the file?
Try
cat file_with_new_lines file > newfile
I did some benchmarking to compare using sed with in-place edit (as suggested here) to cat (as suggested here).
~3GB bigfile filled with dots:
$ head -n3 bigfile
................................................................................
................................................................................
................................................................................
$ du -b bigfile
3025635308 bigfile
File newlines with two lines to insert on top of bigfile:
$ cat newlines
some data
some other data
$ du -b newlines
26 newlines
Benchmark results using dumbbench v0.08:
cat:
$ dumbbench -- sh -c "cat newlines bigfile > bigfile.new"
cmd: Ran 21 iterations (0 outliers).
cmd: Rounded run time per iteration: 2.2107e+01 +/- 5.9e-02 (0.3%)
sed with redirection:
$ dumbbench -- sh -c "sed '1i some data\nsome other data' bigfile > bigfile.new"
cmd: Ran 23 iterations (3 outliers).
cmd: Rounded run time per iteration: 2.4714e+01 +/- 5.3e-02 (0.2%)
sed with in-place edit:
$ dumbbench -- sh -c "sed -i '1i some data\nsome other data' bigfile"
cmd: Ran 27 iterations (7 outliers).
cmd: Rounded run time per iteration: 4.464e+01 +/- 1.9e-01 (0.4%)
So sed with in-place edit seems to be way slower (80.6% slower than sed with redirection) on large files, probably because it moves an intermediary temp file to the location of the original file afterwards. Using I/O redirection, sed is only 11.8% slower than cat.
Based on these results I would use cat as suggested in this answer.
Try doing this:
using sed:
sed -i '1i NewLine' file
Or using ed:
ed -s file <<EOF
1i
NewLine
.
w
q
EOF
The speed of such an operation depends greatly on the underlying file system. To my knowledge there isn't a FS optimized for this particular operation. Most FS organize files using full disk blocks, except for the last one, which may be only partially used by the end of the file. Indeed, a file of size N would take N/S full blocks, where S is the block size, plus one more block for the remaining part of the file (of size N%S, % being the remainder operator) if N is not divisible by S.
Usually, these blocks are referenced by their indices on the disk (or partition), and these indices are stored within the FS metadata, attached to the file entry which allocates them.
From this description, you can see that it could be possible to prepend content whose size would be a multiple of the block size, by just updating the metadata with the new list of blocks used by the file. However, if that prepended content doesn't fill exactly a number of blocks, then the existing data would have to be shifted by that exceeding amount.
Some FS may implement the possibility of having partially used blocks within the list (and not only as the last entry) of used ones for files, but this is not a trivial thing to do.
See these other SO questions for further details:
Prepending Data to a file
Is there a file system with a low level prepend operation
At a higher level, even if that operation is supported by the FS driver, it is still possible that programs don't use the feature.
For the instance of the problem you are trying to solve, the best way is probably a program capable of concatenating the new content and the existing one into a new file.
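In practice that means writing a new file first and then replacing the original; a minimal sketch of that approach for the two-line case (file names are placeholders):
printf '%s\n' 'first new line' 'second new line' | cat - bigfile > bigfile.new &&
mv bigfile.new bigfile
The mv step only renames the new file when both are on the same file system, so the overall cost is essentially that of the cat approach benchmarked above.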
Given a file:
cat file
Unix
Linux
you can append the two new lines after the first line at the same time, using the command
sed -i '1a C\njava' file
cat file
Unix
C
java
Linux
If you want to insert before a line use i instead of a, and to replace a line use c.

Resources