awk system call - bash

I want to use awk and the system() function to move a couple of directories around.
I have a file named file.cfg that I want to process with awk; it is organized in the following way:
/path1 /path2
/some_path /some_other_path
and so on.
Each first path is separated from the second by whitespace.
So here's how I did it:
awk '{system(mv -R $1" "$2)}' file.cfg
but it doesn't work and I get
sh: 0/home/my_user/path1: No such file or directory
But file.cfg looks like this:
/home/my_user/path1 /home/my_user/path2
and there is no 0 before /home. So what am I missing here?

You have to quote the command you give to system:
awk '{system("mv -R " $1 " " $2)}' file.cfg
Currently mv -R is interpreted as the awk expression mv - R: the value of the variable mv minus the value of the variable R, which is 0 since neither is defined. That 0 is then concatenated with $1, which explains the leading 0 in the error message.
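To see the difference concretely, here is a small sketch (using print instead of system() so nothing is executed, with made-up paths):

```shell
# Unquoted: awk evaluates mv - R (both undefined, hence 0) and concatenates $1:
printf '/a /b\n' | awk '{print mv -R $1 " " $2}'     # prints: 0/a /b
# Quoted: the command string is built as intended:
printf '/a /b\n' | awk '{print "mv -R " $1 " " $2}'  # prints: mv -R /a /b
```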

Why not just use xargs?
cat file.cfg | xargs -n 2 mv
This will pass tokens (separated by whitespace) from your file into mv in groups of two.
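A dry run with echo in place of mv shows the grouping (the paths here are made up):

```shell
# xargs reads whitespace-separated tokens and hands them to the command
# two at a time; echo lets us inspect the resulting mv invocations.
printf '/tmp/src1 /tmp/dst1\n/tmp/src2 /tmp/dst2\n' | xargs -n 2 echo mv
# mv /tmp/src1 /tmp/dst1
# mv /tmp/src2 /tmp/dst2
```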

Related

Arrange file based on month information on filename

I have a folder containing daily rainfall data in GeoTIFF format from 1981-2019, with the naming convention chirps-v2.0.yyyymmdd.1days.tif.
I would like to arrange all the files based on the MONTH information and move them into new folders, i.e. all files with month = January move to a Month01 folder.
Is there a one-liner solution for that? I am using the terminal on macOS.
This should do it:
for i in $(seq -f "%02g" 1 12); do mkdir -p "Month$i"; mv chirps-v2.0.????$i*.tif "Month$i"; done
Explanation:
For each number in the range 1, 12 (padded with 0 if necessary)...
Make the directories Month01, Month02, etc. If the directory already exists, continue.
Move all files that include the current month number in the relevant part of the filename to the appropriate folder. The question marks in chirps-v2.0.????$i*.tif represent single-character wildcards.
Note: If there is any chance there will be spaces in your .tif filenames, you can use "chirps-v2.0."????"$i"*".tif" instead.
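As a sketch, the loop can be tried in a scratch directory with two made-up filenames following the stated convention (errors from months with no matching files are silenced):

```shell
cd "$(mktemp -d)" || exit 1
# Two sample files: one from January, one from February 1981.
touch chirps-v2.0.19810115.1days.tif chirps-v2.0.19810203.1days.tif
for i in $(seq -f "%02g" 1 12); do
  mkdir -p "Month$i"
  # For months with no matching file the glob stays literal and mv fails;
  # suppress that noise for the demonstration.
  mv chirps-v2.0.????"$i"*.tif "Month$i" 2>/dev/null || true
done
ls Month01   # chirps-v2.0.19810115.1days.tif
```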
I don't think there is a simple way to do this. You can, however, do a "one-liner" solution if you use pipes, for loops, and the like:
for file in $(ls *.tif); do sed -r 's/(.*\.[0-9]{4})([0-9]{2})(.*)/\1 \2 \3/' <<< "$file" | awk '{print "mkdir -p dstDir/Month" $2 "; cp", $1 $2 $3, "dstDir/Month" $2}' | bash ; done
Formatting this a bit:
for file in $(ls *.tif); do \
sed -r 's/(.*\.[0-9]{4})([0-9]{2})(.*)/\1 \2 \3/' <<< "$file" \
| awk '{print "mkdir -p dstDir/Month" $2 "; cp", $1 $2 $3, "dstDir/Month" $2}' \
| bash
done
This needs to be executed from the directory containing your files (note the "ls *.tif"). You will also need to replace "dstDir" with the name of the parent directory where "Month01" will be created.
This may not be perfect, but you can edit it if required. Also, if you don't have bash, only zsh, replacing the "bash" bit with "zsh" should still work.

How to print just the file name when looping through all files of a directory

Imagine there are these 3 subdirectories inside my directory:
dfcg7 yhjs6 gbggse3
Inside each of these subdirectories there is a txt file that I would like to use in another program, so I would like to print all the relative paths to these files.
I am trying:
for file in /mnt/lustre/mydir*
do
printf "$file/*.txt \t"
done
and I also tried:
for file in /mnt/lustre/mydir*
do
printf "$file"/*.txt "\t"
done
but in both cases, my output is this:
/mnt/lustre/mydir/dfcg7/*txt/mnt/lustre/mydir/yhjs6/*txt/mnt/lustre/mydir/gbggse3/*txt
My output is not tab separated.
It is printing the full path instead of the relative path.
It is not printing the file name inside each subdirectory.
So, my desired output would be this:
dfcg7/fileA.txt yhjs6/fileB.txt gbggse3/fileC.txt
How can I solve this?
You could store the path prefix in a variable:
prefix=/mnt/lustre/mydir
Assign the files to an array:
files=("$prefix"/*/*.txt)
And then print the array, tab separated, while removing the prefix from each element:
$ (IFS=$'\t'; printf '%s\n' "${files[*]/#"$prefix"\/}")
dfcg7/fileA.txt gbggse3/fileC.txt yhjs6/fileB.txt
This uses a subshell to contain the scope of the modified IFS.
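As a standalone illustration (bash, with the question's hypothetical paths hard-coded into the array instead of coming from a glob):

```shell
prefix=/mnt/lustre/mydir
files=("$prefix/dfcg7/fileA.txt" "$prefix/yhjs6/fileB.txt" "$prefix/gbggse3/fileC.txt")
# ${files[*]/#"$prefix"\/} strips the anchored prefix from every element;
# joining with [*] uses the first character of IFS, here a tab.
(IFS=$'\t'; printf '%s\n' "${files[*]/#"$prefix"\/}")
# prints the three relative paths on one line, tab separated
```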
This is a quick solution that I've been able to come up with
RESULT=""
for file in /path/to/your/files/*; do
OUTPUT=$(ls $file/*txt)
RESULT="$RESULT:$OUTPUT"
done
# replace every : with a tab symbol and remove the first tab
echo $RESULT | tr ':' '\t' | gsed -r 's/^\s+//g'
Notice that I've used gsed here. It is GNU sed, available for macOS. If you are using Linux you can simply use sed.
You can try:
for file in /mnt/lustre/mydir/*/*.txt; do
printf '%s\t' "${file#*/*/*/*/}"
done
Output
dfcg7/fileA.txt gbggse3/fileC.txt yhjs6/fileB.txt
Try:
cd /mnt/lustre/mydir
for file in */*.txt
do
printf "%s\t" "$file"
done
printf "\n"
Or easier:
(cd /mnt/lustre/mydir; ls -d */*.txt | tr '\n' '\t'); echo
The last echo is to append a \n at the end, but if you don't need it, you can leave it out.

How to rename a CSV file from a value in the CSV file

I have 100 1-line CSV files. The files are currently labeled AAA.txt, AAB.txt, ABB.txt (after I used split -l 1 on them). The first field in each of these files is what I want to rename the file as, so instead of AAA, AAB and ABB it would be the first value.
Input CSV (filename AAA.txt)
1234ABC, stuff, stuff
Desired Output (filename 1234ABC.csv)
1234ABC, stuff, stuff
I don't want to edit the content of the CSV itself, just change the filename
something like this should work:
for f in ./*; do new_name=$(head -1 "$f" | cut -d, -f1); cp "$f" dir/"$new_name"; done
Move them into a new dir just in case something goes wrong, or in case you need the original file names.
Starting with your original file before splitting:
$ awk -F, '{print > ($1".csv")}' originalFile.csv
and it does it all in one shot.
This will store each line of the input file in a .csv file named after its first column:
awk -F, '{print $0 > $1".csv" }' aaa.txt
In a terminal, changed directory, e.g. cd /path/to/directory that the files are in and then use the following compound command:
for f in *.txt; do echo mv -n "$f" "$(awk -F, '{print $1}' "$f").csv"; done
Note: There is an intentional echo command there for you to test with; it will only print out the mv commands so you can check that the outcome is what you want. You can then run the compound command again with the echo removed to actually rename the files via mv.
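For example, with a single sample file in a scratch directory (contents made up), the version without echo gives:

```shell
cd "$(mktemp -d)" || exit 1
printf '1234ABC, stuff, stuff\n' > AAA.txt
# Rename each .txt file after the first comma-separated field of its first line.
for f in *.txt; do mv -n "$f" "$(awk -F, '{print $1}' "$f").csv"; done
ls   # 1234ABC.csv
```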

Add a prefix to logs with AWK

I am facing a problem with a script I need to use for log analysis; let me explain the question:
I have a gzipped file like:
5555_prova.log.gz
Inside the file there are many lines of log like this one:
2018-06-12 03:34:31 95.245.15.135 GET /hls.playready.vod.mediasetpremium/farmunica/2018/06/218742_163f10da04c7d2/hlsrc/w12/21.ts
I need a script that reads the gzipped log file and is capable of writing to stdout a modified log line like this one:
5555 2018-06-12 03:34:31 95.245.15.135 GET /hls.playready.vod.mediasetpremium/farmunica/2018/06/218742_163f10da04c7d2/hlsrc/w12/21.ts
As you can see, the line of log now starts with the number read from the gzip file name.
I need this new line to feed a logstash data crunching chain.
I have tried with a script like this:
echo "./5555_prova.log.gz" | xargs -ISTR -t -r sh -c "gunzip -c STR | awk '{$0="5555 "$0}' "
this is not exactly what I need (the prefix is static, not captured with a regular expression from the file name), but even with this simplified version I receive an error:
sh -c gunzip -c ./5555_prova.log.gz | awk '{-bash=5555 -bash}'
-bash}' : -c: line 0: unexpected EOF while looking for matching `''
-bash}' : -c: line 1: syntax error: unexpected end of file
As you can see from the above output the $0 is no more the whole line passed via pipe to awk but is a strange -bash.
I need to use xargs because the list of gzipped files is fed to the command line by another tool (i.e. an instantiated inotifywait listening to a directory where the files are written via FTP).
What am I missing? Do you have some suggestions to point me in the right direction?
Regards,
S.
Trying to follow the @Charles Duffy suggestion, I have written this code:
#!/bin/bash
#
# Usage: sendToLogstash.sh [pattern]
#
# Executes a command whenever files matching the pattern are closed in write
# mode or moved to. "{}" in the command is replaced with the matching filename (via xargs).
# Requires inotifywait from inotify-tools.
#
# For example,
#
# whenever.sh '/usr/local/myfiles/'
#
#
DIR="$1"
PATTERN="\.gz$"
script=$(cat <<'EOF'
awk -v filename="$file" 'BEGIN{split(filename,array,"_")}{$0=array[1] OFS $0} 1' < $(gunzip -dc "$DIR/$file")
EOF
)
inotifywait -q --format '%f' -m -r -e close_write -e moved_to "$DIR" \
| grep --line-buffered $PATTERN | xargs -I{} -r sh -c "file={}; $script"
But I got the error:
[root@ms-felogstash ~]# ./test.sh ./poppo
gzip: /1111_test.log.gz: No such file or directory
gzip: /1111_test.log.gz: No such file or directory
sh: $(gunzip -dc "$DIR/$file"): ambiguous redirect
Thanks for your help, I feel very lost writing bash scripts.
Regards,
S.
EDIT: Also, in case you are dealing with multiple .gz files and want to print their content along with their file names (first column, _-delimited), the following may help you.
for file in *.gz; do
awk -v filename="$file" 'BEGIN{split(filename,array,"_")}{$0=array[1] OFS $0} 1' <(gzip -dc "$file")
done
I haven't tested your code (I couldn't completely understand it either), so I am trying to give a way here: in case your code could pass the file name to awk, it will be pretty simple to prepend the file's leading digits, as follows (just an example).
awk 'FNR==1{split(FILENAME,array,"_")} {$0=array[1] OFS $0} 1' 5555_prova.log_file
So here I am taking FILENAME, awk's out-of-the-box variable (only on the first line of the file), splitting it into an array named array, and then prepending its first element to each line of the file.
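A quick check with a sample file (name and contents are made up):

```shell
cd "$(mktemp -d)" || exit 1
printf 'GET /a.ts\n' > 5555_prova.log
# FILENAME is "5555_prova.log"; split on "_" puts "5555" in array[1],
# which is then prepended (OFS-separated) to every line.
awk 'FNR==1{split(FILENAME,array,"_")} {$0=array[1] OFS $0} 1' 5555_prova.log
# 5555 GET /a.ts
```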
Also, "gunzip -c STR needs its closing " (which seems to be missing) before you pass its output to awk.
NEVER, EVER use xargs -I with a string substituted into sh -c (or bash -c or any other context where that string is interpreted as code). This allows malicious filenames to run arbitrary commands -- think about what happens if someone runs touch $'$(rm -rf ~)\'$(rm -rf ~)\'.gz', and gets that file into your log.
Instead, let xargs append arguments after your script text, and write your script to iterate over / read those arguments as data, rather than having them substituted into code.
To show how to use xargs safely (well, safely if we assume that you've filtered out filenames with literal newlines):
# This way you don't need to escape the quotes in your script by hand
script=$(cat <<'EOF'
for arg; do gunzip -c <"$arg" | awk '{print "5555 " $0}'; done
EOF
)
# if you **did** want to escape them by hand, it would look like this:
# script='for arg; do gunzip -c <"$arg" | awk '"'"'{print "5555 " $0}'"'"'; done'
echo "./5555_prova.log.gz" | xargs -d $'\n' sh -c "$script" _
To be safer with all possible filenames, you'd instead use:
printf '%s\0' "./5555_prova.log.gz" | xargs -0 sh -c "$script" _
Note the use of NUL-delimited input (created with printf '%s\0') and xargs -0 to consume it.
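Putting it together with a throwaway .gz file (note the awk program here uses an explicit print so the modified line is actually emitted):

```shell
cd "$(mktemp -d)" || exit 1
printf 'hello\n' | gzip > 5555_prova.log.gz
# Hand-escaped variant of the script; xargs -0 passes the filename as an
# argument rather than substituting it into the code.
script='for arg; do gunzip -c <"$arg" | awk '\''{print "5555 " $0}'\''; done'
printf '%s\0' ./5555_prova.log.gz | xargs -0 sh -c "$script" _
# 5555 hello
```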

awk: getting all of the line but the last field, with the delimiters

I have to make a one-liner that renames all files in the current directory
that end in ".hola" to ".txt".
For example:
sample.hola and name.hi.hola will be renamed to sample.txt and name.hi.txt respectively
I was thinking about something like:
ls -1 *.hola | awk '{NF="";print "$0.hola $0.txt"}' (*)
And then passing the stdin to xargs mv -T with a |
But the output of (*) for the example would be sample and name hi.
How do I get the output name.hi for name.hi.hola using awk?
Why would you want to involve awk in this?
$ for f in *.hola; do echo mv "$f" "${f%hola}txt"; done
mv name.hi.hola name.hi.txt
mv sample.hola sample.txt
Remove the echo when you're happy with the output.
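The ${f%hola}txt expansion simply strips the literal suffix hola and appends txt, for example:

```shell
# ${f%hola} removes the shortest trailing match of "hola".
f=name.hi.hola
echo "${f%hola}txt"   # name.hi.txt
```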
Well, for your specific problem, I recommend the rename command. Depending on the version on your system, you can do either rename -s .hola .txt *.hola, or rename 's/\.hola$/.txt/' *.hola.
Also, you shouldn't use ls to get filenames. When you run ls *.hola, the shell expands *.hola to a list of all the filenames matching that pattern, and ls is just a glorified echo at that point. You can get the same result using e.g. printf '%s\n' *.hola without running any program outside the shell.
And your awk is missing any attempt to remove the .hola. If you have GNU awk, you can do something like this:
awk -F. -v OFS=. '{old=$0; NF-=1; new=$0".txt"; print old" "new}'
That won't work on BSD/MacOS awk. In that case you can do something like this:
awk -F. '{
old=$0; new=$1;
for (i=2;i<NF;++i) { new=new"."$i };
print old" "new".txt"; }'
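Checking the portable loop on the sample names from the question:

```shell
# Rebuild the name from all fields except the last, then append ".txt".
printf 'sample.hola\nname.hi.hola\n' | awk -F. '{
  old=$0; new=$1;
  for (i=2;i<NF;++i) { new=new"."$i };
  print old" "new".txt"; }'
# sample.hola sample.txt
# name.hi.hola name.hi.txt
```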
Either way, I'm sure @EdMorton probably has a better awk-based solution.
How about this? Simple and straightforward:
for file in *.hola; do mv "$file" "${file/%hola/txt}"; done
