GoAccess Process Multiple Logs - bash

I have a directory with log files. I want to process the last 13 of them (the past quarter). I can't use a wildcard with GoAccess because I don't want to include all of them, just the last 13 generated weeks' worth.
I have an array of the filenames of those last 13 files, but I don't know the syntax for the GoAccess command to include those files. I can't find any reference on how to do this, as all the notes I've seen refer to using a wildcard. I don't want to start copying and moving files around. There should be a way of doing this on the command line with multiple filenames, which I can generate just fine.
How can I use a multiple-log input syntax in GoAccess?
Something like:
/usr/local/bin/goaccess -p /users/rich/things/goaccess.conf log1.log log2.log log3.log -o qreport.html

MULTIPLE LOG FILES
There are several ways to parse multiple logs with GoAccess. The
simplest is to pass multiple log files to the command line:
goaccess access.log access.log.1
In your case, you need to process only the last 13 generated files, so you can get them using ls. The final command becomes:
/usr/local/bin/goaccess -p /users/rich/things/goaccess.conf $(ls -t log* | head -13 | tr '\r\n' ' ') -o qreport.html
This will process the 13 most recent files whose names start with log.
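Since you mentioned you already have the filenames in an array, you can also expand that array directly on the command line; a minimal sketch, assuming the array is named logs, the file names contain no spaces, and the paths are the ones from your example:
# collect the 13 most recent weekly logs (names assumed to start with "log")
logs=( $(ls -t log* | head -13) )
# each array element becomes a separate log-file argument to goaccess
/usr/local/bin/goaccess -p /users/rich/things/goaccess.conf "${logs[@]}" -o qreport.html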

Related

How to download URLs in a csv and naming outputs based on a column value

1. OS: Linux / Ubuntu x86/x64
2. Task:
Write a Bash shell script to download the URLs in a (large) csv (as fast/simultaneously as possible), naming each output based on a column value.
2.1 Example Input:
A CSV file containing lines like:
001,http://farm6.staticflickr.com/5342/a.jpg
002,http://farm8.staticflickr.com/7413/b.jpg
003,http://farm4.staticflickr.com/3742/c.jpg
2.2 Example outputs:
Files in a folder, outputs, containing files like:
001.jpg
002.jpg
003.jpg
3. My Try:
I tried mainly two approaches.
1. Using the download tool's built-in support
Take aria2c as an example: it supports the -i option to import a file of URLs to download, and (I think) it will process them in parallel for maximum speed. It does have a --force-sequential option to force downloads in the order of the lines, but I failed to find a way to make the naming part happen.
2. Splitting first
Split the file into pieces and run a script like the following to process each piece:
#!/bin/bash
INPUT=$1
# read each "serial,url" line and download the URL as <serial>.jpg into outputs/
while IFS=, read -r serino url
do
    aria2c -c "$url" --dir=outputs --out="$serino.jpg"
done < "$INPUT"
However, this restarts aria2c for every line, which seems to cost time and lower the speed.
One can run the script multiple times to get 'shell-level' parallelism, but that does not seem to be the best way.
Any suggestions?
Thank you,
aria2c supports so-called option lines in input files. From man aria2c:
-i, --input-file=
Downloads the URIs listed in FILE. You can specify multiple sources for a single entity by putting multiple URIs on a single line separated by the TAB character. Additionally, options can be specified after each URI line. Option lines must start with one or more white space characters (SPACE or TAB) and must only contain one option per line.
and later on
These options have exactly same meaning of the ones in the command-line options, but it just applies to the URIs it belongs to. Please note that for options in input file -- prefix must be stripped.
You can convert your csv file into an aria2c input file:
sed -E 's/([^,]*),(.*)/\2\n out=\1/' file.csv | aria2c -i -
This will convert your file into the following format and run aria2c on it.
http://farm6.staticflickr.com/5342/a.jpg
out=001
http://farm8.staticflickr.com/7413/b.jpg
out=002
http://farm4.staticflickr.com/3742/c.jpg
out=003
However this won't create files 001.jpg, 002.jpg, … but 001, 002, … since that's what you specified. Either specify file names with extensions or guess the extensions from the URLs.
If the extension is always jpg you can use
sed -E 's/([^,]*),(.*)/\2\n out=\1.jpg/' file.csv | aria2c -i -
To extract extensions from the URLs use
sed -E 's/([^,]*),(.*)(\..*)/\2\3\n out=\1\3/' file.csv | aria2c -i -
Warning: This works if and only if every URL ends with an extension. For instance, due to the missing extension the line 001,domain.tld/abc would not be converted at all, causing aria2c to fail on the "URL" 001,domain.tld/abc.
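If some of your URLs might lack an extension, a sketch of a more defensive conversion (not from the original answer; it assumes .jpg as the fallback extension) could look like this:
awk -F, '{
    url  = $2
    name = url
    sub(/.*\//, "", name)            # keep only the last path component
    if (name ~ /\./) {
        ext = name
        sub(/.*\./, "", ext)         # text after the last dot
        ext = "." ext
    } else {
        ext = ".jpg"                 # assumed fallback when the URL has no extension
    }
    print url
    print " out=" $1 ext
}' file.csv | aria2c -i -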
Using all standard utilities you can do this to download in parallel:
tr '\n' ',' < file.csv |
xargs -P 0 -d , -n 2 bash -c 'curl -s "$2" -o "$1.jpg"' -
The -P 0 option tells GNU xargs to run as many commands in parallel as possible.
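If unbounded parallelism is too aggressive for the remote server, the same pipeline can be capped at a fixed number of jobs; a sketch assuming GNU xargs and the outputs folder from the question:
mkdir -p outputs
# -P 4 caps the number of concurrent curl processes at four
tr '\n' ',' < file.csv |
xargs -P 4 -d , -n 2 bash -c 'curl -s "$2" -o "outputs/$1.jpg"' -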

Using the split command in shell, how to know the number of files generated

I am using the split command on a large file to generate little files which are put in a folder; my problem is that the folder also contains other files that are not from my split.
I would like to know if there is a way to find out how many files were generated only by my split, not the number of all files in my folder.
My command is split a 2 d. Is there any option I can add to this command to know it?
I know ls -Al | wc -l will give me the number of files in the folder, but that doesn't interest me.
The simplest solution here is to split into a fresh directory.
Assuming that's not possible and you aren't worried about other processes operating on the directory in question, you can just count the files before and after. Something like this:
$ before=(*)
$ split a 2 d
$ after=(*)
$ echo "Split files: $(( ${#after[@]} - ${#before[@]} ))"
If the other files in the directory can't have the same format as the split files (and presumably they can't, or split would fail or overwrite them) then you could use an appropriate glob to get just the files that match the pattern. Something like splitfiles=(d??).
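Counting the matches of that glob is then a one-liner; a short sketch, assuming the split files really do all match d??:
# the number of elements in the array is the number of files produced by split
splitfiles=(d??)
echo "Split files: ${#splitfiles[@]}"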
Failing that, you could see whether the --verbose option to split allows you to use split_count=$(split --verbose a 2 d | wc -l) or similar.
To be different, I will count the lines with grep, utilizing split's --verbose option:
split --verbose other_options file|grep -c ""
Example:
$ split --verbose -b 2 file|grep -c ""
60
# yeah, my file is pretty small, splitting on 2 bytes to produce numerous files
You can also control how the generated files are named: split takes a PREFIX operand and a -a option for the suffix length (while -l sets the number of lines per piece), so with a distinct prefix the generated files are easy to tell apart from everything else in the folder.
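A short sketch of that approach, with made-up names (bigfile for the input, chunk_ for the prefix):
# split into 1000-line pieces named chunk_aaa, chunk_aab, ...
split -l 1000 -a 3 bigfile chunk_
# count only the pieces produced by the split
ls chunk_* | wc -l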

Find and copy the two most recent files added to a directory with a specific format

I'm currently writing a ksh script that will run every 5 minutes. I want to select the two most recently added files within a directory that have a specific format. The format of the file should be: OUS_*_*_*.html. The files should then be copied over to a destination directory.
I assume I can use find, but I am using HP-UX and it does not support the -amin, -cmin, -mmin options. Does anyone know how I can achieve this functionality?
Edit 1: I have found the following commands, each of which is supposed to return the single newest file, but when I use them more than one file is listed:
ls -Art | tail -n 1
ls -t | head -n1
Edit 2: I can see how these commands should work, but ls -t lists files in a multi-column format, so selecting the first line actually selects three separate file names. I attempted to use ls -lt, but now the first line is the string total 112, followed by the file names along with their access rights, time stamps, etc.
Edit 3: I found that the -1 (numeral one, not the letter l) option provides a list with just file names. Using the command ls -1t | head -n 2 I was able to list the two newest files.
Q: Is it possible to restrict the ls command to just look for files with the previously mentioned format?
I was able to use this command to list the two most recently added files in a directory that conform to a specific format:
ls -1t $fileNameFormat | head -n 2
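Combined with the copy step from the original goal, a minimal sketch (the $destDir destination variable is an assumption, and file names are assumed to contain no spaces):
# copy the two newest OUS_*_*_*.html files to the destination directory
for f in $(ls -1t OUS_*_*_*.html | head -n 2); do
    cp "$f" "$destDir"
done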

How to loop through all files in a directory and find if a value exists in those files using a shell script

I have a directory PAYMENT. Inside it I have some text files.
xyx.txt
agfh.txt
hjul.txt
I need to go through all these files and find how many entries in each file contain text like BPR.
If it's one I need to get an alert. For example, if xyx.txt contains only one BPR entry, I need to get an alert.
You do not need to loop; something like this should do it:
grep -l "BPR" /your/path/PAYMENT/*
grep is a tool to find lines matching a pattern in files.
-l shows which files have that string.
"BPR" is the string you are looking for.
/your/path/PAYMENT/* means that it will grep through all files in that dir.
In case you want to search only within specific kinds of files or inside subdirectories, say so, because the command would vary a little.
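If you also want a quick count of how many files contain BPR at all, that same -l output can be piped to wc -l (a small addition, not part of the original answer):
# number of files in PAYMENT that contain BPR at least once
grep -l "BPR" /your/path/PAYMENT/* | wc -l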
Update
Based on your new requests:
I need to go through all these files and find how many entries in each
file contain text like BPR.
If it's one I need to get an alert. For example, if xyx.txt contains
only one BPR entry, I need to get an alert.
grep -c is your friend (more or less). So what you can do is:
if [ "$(grep -c "BPR" a_certain_file)" -eq 1 ]; then
    echo "mai dei mai dei"
fi
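Putting it together for every file in the directory, a minimal sketch (the path, the .txt suffix and the alert message are assumptions based on the question):
for f in /your/path/PAYMENT/*.txt; do
    count=$(grep -c "BPR" "$f")                 # number of lines in $f containing BPR
    if [ "$count" -eq 1 ]; then
        echo "alert: $f contains exactly one BPR entry"
    fi
done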
I need to go through all these files and find if there is an entry like 'BPR'
If you are looking for a command to find file names that contain BPR then use:
echo /path/to/PAYMENT/*BPR*.txt
If you are looking for a command to find files whose contents contain the text BPR then use:
grep -l "BPR" /path/to/PAYMENT/*.txt
Give this a try:
grep -o BPR {path}/* | wc -l
where {path} is the location of all the files. It'll give you JUST the total number of occurrences of the string "BPR" across all those files, not a per-file count.
Also, FYI - this has also been talked about here: count all occurrences of string in lots of files with grep

Create files using grep and wildcards with input file

This should be a no-brainer, but apparently I have no brain today.
I have 50 20-gig logs that contain entries from multiple apps, one of which adds a transaction ID to its log lines. I have 42 transaction IDs I need to review, and I'd like to parse out the appropriate lines into separate files.
To do a single file, the command would be simply,
grep CDBBDEADBEEF2020X02393 server.log* > CDBBDEADBEEF2020X02393.log
that creates a log isolated to that transaction, from all 50 server.logs.
Now, I have a file with 42 txnIDs (shortening to 4 here):
CDBBDEADBEEF2020X02393
CDBBDEADBEEF6548X02302
CDBBDE15644F2020X02354
ABBDEADBEEF21014777811
And I wrote:
#/bin/sh
grep $1 server.\* > $1.log
But that is not working. Changing the shebang to #!/bin/bash -xv gives me this weird output (obviously I'm playing with what the correct escape magic must be):
$ ./xtrakt.sh B7F6E465E006B1F1A
#!/bin/bash -xv
grep - ./server\.\*
' grep - './server.*
: No such file or directory
I have also tried the command line
grep - server.* < txids.txt > $1
But OBVIOUSLY that $1 is pointless and I have no idea how to get a file named per txid using the input redirect form of the command.
Thanks in advance for any ideas. I haven't gone the route of doing a foreach in the shell script, because I want grep to put the original filename in the output lines so I can examine context later if I need to.
Also - it would be great to have the server.* files ordered numerically (server.log.1, server.log.2, ..., server.log.10, NOT server.log.1, server.log.10, server.log.2, ...).
try this:
while read -r txid
do
    # one grep per transaction ID; with multiple input files grep prefixes each match with its file name
    grep "$txid" server.* > "$txid.log"
done < txids.txt
and for the file ordering - rename the files with one-digit suffixes to two digits, with leading zeroes, e.g. mv server.log.1 server.log.01.
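A sketch of that rename step, assuming the one-digit logs are server.log.1 through server.log.9:
for n in 1 2 3 4 5 6 7 8 9; do
    # rename only if the single-digit log actually exists
    [ -e "server.log.$n" ] && mv "server.log.$n" "server.log.0$n"
done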
