Finding the file name in a directory with a pattern - shell

I need to find the latest file - filename_YYYYMMDD in the directory DIR.
The below is not working as the position is shifting each time because of the spaces between(occurring mostly at file size field as it differs every time.)
please suggest if there is other way.
report =‘ls -ltr $DIR/filename_* 2>/dev/null | tail -1 | cut -d “ “ -f9’

You can use AWK to cut the last field . like below
report=`ls -ltr $DIR/filename_* 2>/dev/null | tail -1 | awk '{print $NF}'`
Cut may not be an option here

If I understand you want to loop though each file in the directory and file the largest 'YYYYMMDD' value and the filename associated with that value, you can use simple POSIX parameter expansion with substring removal to isolate the 'YYYYMMDD' and compare against a value initialized to zero updating the latest variable to hold the largest 'YYYYMMDD' as you loop over all files in the directory. You can store the name of the file each time you find a larger 'YYYYMMDD'.
For example, you could do something like:
#!/bin/sh
name=
latest=0
for i in *; do
test "${i##*_}" -gt "$latest" && { latest="${i##*_}"; name="$i"; }
done
printf "%s\n" "$name"
Example Directory
$ ls -1rt
filename_20120615
filename_20120612
filename_20120115
filename_20120112
filename_20110615
filename_20110612
filename_20110115
filename_20110112
filename_20100615
filename_20100612
filename_20100115
filename_20100112
Example Use/Output
$ name=; latest=0; \
> for i in *; do \
> test "${i##*_}" -gt "$latest" && { latest="${i##*_}"; name="$i"; }; \
> done; \
> printf "%s\n" "$name"
filename_20120615
Where the script selects filename_20120615 as the file with the greatest 'YYYYMMDD' of all files in the directory.
Since you are using only tools provided by the shell itself, it doesn't need to spawn subshells for each pipe or utility it calls.
Give it a test and let me know if that is what you intended, let me know if your intent was different, or if you have any further questions.

Related

How to rename files with incrementing numbers to files with that number plus 10

Hi I have a list of files ex. 0.png, 1.png ... 60.png, 61.png and I want to rename all the files to 10.png,11.png ... 70.png, 71.png however I do not know how I could do that.
In bash, you can use a parameter expansion to handle the rename, e.g.
for name in *.png; do
val="${name%.png}"
val=$((val+10))
mv "$name" "$val.png"
done
Explanation
val is created from the parameter expansion "${name%.png}" which simply trims ".png" from the right-hand side of the filename.
val=$((val+10)) adds 10 to the number.
mv "$name" "$val.png" moves the file from its original name to the new name with the value increased by 10.
If you want to eliminate the intermediate val variable, you can do it all in a single expression, e.g.
for name in *.png; do
mv "$name" "$((${name%.png} + 10)).png"
done
Look things over and let me know if you have further questions.
Assuming that the filenames are of the form number.ext, this function would do the trick.
#!/bin/bash
function rename_file() {
local file=$1
local fname=$(($(echo $file | cut -d. -f1) + 10))
local ext=$(echo $file | cut -d. -f2)
mv $file $fname.$ext
}
To rename a file, call rename_file file_name in your shell script.

How to sort files in paste command with 500 files csv

My question is similar to How to sort files in paste command?
- which has been solved.
I have 500 csv files (daily rainfall data) in a folder with naming convention chirps_yyyymmdd.csv. Each file has only 1 column (rainfall value) with 100,000 rows, and no header. I want to merge all the csv files into a single csv in chronological order.
When I tried this script ls -v file_*.csv | xargs paste -d, with only 100 csv files, it worked. But when tried using 500 csv files, I got this error: paste: chirps_19890911.csv: Too many open files
How to handle above error?
For fast solution, I can divide the csv's into two folder and do the process using above script. But, the problem I have 100 folders and it has 500 csv in each folder.
Thanks
Sample data and expected result: https://www.dropbox.com/s/ndofxuunc1sm292/data.zip?dl=0
You can do it with gawk like this...
Simply read all the files in, one after the other and save them into an array. The array is indexed by two numbers, firstly the line number in the current file (FNR) and secondly the column, which I increment each time we encounter a new file in the BEGINFILE block.
Then, at the end, print out the entire array:
gawk 'BEGINFILE{ ++col } # New file, increment column number
{ X[FNR SEP col]=$0; rows=FNR } # Save datum into array X, indexed by current record number and col
END { for(r=1;r<=rows;r++){
comma=","
for(c=1;c<=col;c++){
if(c==col)comma=""
printf("%s%s",X[r SEP c],comma)
}
printf("\n")
}
}' chirps*
SEP is just an unused character that makes a separator between indices. I am using gawk because BEGINFILE is useful for incrementing the column number.
Save the above in your HOME directory as merge. Then start a Terminal and, just once, make it executable with the command:
chmod +x merge
Now change directory to where your chirps are with a command like:
cd subdirectory/where/chirps/are
Now you can run the script with:
$HOME/merge
The output will rush past on the screen. If you want it in a file, use:
$HOME/merge > merged.csv
First make one file without pasting and change that file into a oneliner with tr:
cat */chirps_*.csv | tr "\n" "," > long.csv
If the goal is a file with 100,000 lines and 500 columns then something like this should work:
paste -d, chirps_*.csv > chirps_500_merge.csv
Additional code can be used to sort the chirps_... input files into any desired order before pasteing.
The error comes from ulimit, from man ulimit:
-n or --file-descriptor-count The maximum number of open file descriptors
On my system ulimit -n returns 1024.
Happily we can paste the paste output, so we can chain it.
find . -type f -name 'file_*.csv' |
sort |
xargs -n$(ulimit -n) sh -c '
tmp=$(mktemp);
paste -d, "$#" >$tmp;
echo $tmp
' -- |
xargs sh -c '
paste -d, "$#"
rm "$#"
' --
Don't parse ls output
Once we moved from parsing ls output to good find, we find all files and sort them.
the first xargs takes 1024 files at a time, creates temporary file, pastes the output into temporary and outputs the temporary file filename
The second xargs does the same with temporary files, but also removes all the temporaries
As the count of files would be 100*500=500000 which is smaller then 1024*1024 we can get away with one pass.
Tested against test data generated with:
seq 1 2000 |
xargs -P0 -n1 -t sh -c '
seq 1 1000 |
sed "s/^/ $RANDOM/" \
>"file_$(date --date="-${1}days" +%Y%m%d).csv"
' --
The problem seems to be much like foldl with maximum size of chunk to fold in one pass. Basically we want paste -d, <(paste -d, <(paste -d, <1024 files>) <1023 files>) <rest of files> that runs kind-of-recursively. With a little fun I came up with the following:
func() {
paste -d, "$#"
}
files=()
tmpfilecreated=0
# read filenames...c
while IFS= read -r line; do
files+=("$line")
# if the limit of 1024 files is reached
if ((${#files[#]} == 1024)); then
tmp=$(mktemp)
func "${files[#]}" >"$tmp"
# remove the last tmp file
if ((tmpfilecreated)); then
rm "${files[0]}"
fi
tmpfilecreated=1
# start with fresh files list
# with only the tmp file
files=("$tmp")
fi
done
func "${files[#]}"
# remember to clear tmp file!
if ((tmpfilecreated)); then
rm "${files[0]}"
fi
I guess readarray/mapfile could be faster, and result in a bit clearer code:
func() {
paste -d, "$#"
}
tmp=()
tmpfilecreated=0
while readarray -t -n1023 files && ((${#files[#]})); do
tmp=("$(mktemp)")
func "${tmp[#]}" "${files[#]}" >"$tmp"
if ((tmpfilecreated)); then
rm "${files[0]}"
fi
tmpfilecreated=1
done
func "${tmp[#]}" "${files[#]}"
if ((tmpfilecreated)); then
rm "${files[0]}"
fi
PS. I want to merge all the csv files into a single csv in chronological order. Wouldn't that be just cut? Right now each column represents one day.
You can try this Perl-one liner. It will work for any number of files matching *.csv under a directory
$ ls -1 *csv
file_1.csv
file_2.csv
file_3.csv
$ cat file_1.csv
1
2
3
$ cat file_2.csv
4
5
6
$ cat file_3.csv
7
8
9
$ perl -e ' BEGIN { while($f=glob("*.csv")) { $i=0;open($FH,"<$f"); while(<$FH>){ chomp;#t=#{$kv{$i}}; push(#t,$_);$kv{$i++}=[#t];}} print join(",",#{$kv{$_}})."\n" for(0..$i) } ' <
1,4,7
2,5,8
3,6,9
$

How do you compress multiple folders at a time, using a shell?

There are n folders in the directory named after the date, for example:
20171002 20171003 20171005 ...20171101 20171102 20171103 ...20180101 20180102
tips: Dates are not continuous.
I want to compress every three folders in each month into one compression block.
For example:
tar jcvf mytar-20171002_1005.tar.bz2 20171002 20171003 20171005
How to write a shell to do this?
You need to do a for loop on your ls variable, then parse the directory name.
dir_list=$(ls)
prev_month=""
times=0
first_dir=""
last_dir=""
dir_list=()
for i in $dir_list; do
month=${i:0:6} #here month will be year plus month
if [ "$month" = "$prev_month" ]; then
i=$(($i+1))
if [ "$i" -eq "3" ]; then
#compress here
dir_list=()
first_dir=""
last_dir=""
else
last_dir=$i
dir_list+=($i)
fi
else
if [ "$first_dir" = "" ]; then
first_dir=$i
else
#compress here
first_dir="$i"
last_dir=""
dir_list=()
fi
fi
This code is not tested and may contain syntaxe error. '#compress here' need to be replace by a loop on the array to create a string to compress.
Assuming you don't have too many directories (I think the limit is several hundred), then you can use Bash's array manipulation.
So, you first load all your directory names into a Bash array:
dirs=( $(ls) )
(I'm going to assume files have no spaces in their names, otherwise it gets a bit dicey)
Then you can use Bash's array slice syntax to pop 3 elements at a time from the array:
while [ "${#dirs[#]}" -gt 0 ]; do
dirs_to_compress=( "${dirs[#]:0:3}" )
dirs=( "${dirs[#]:3}" )
# do something with dirs_to_compress
done
The rest should be pretty easy.
You can achieve this with xargs, a bash while loop, and awk:
ls | xargs -n3 | while read line; do
tar jcvf $(echo $line | awk '{print "mytar-"$1"_"substr($NF,5,4)".tar.bz2"}') $line
done
unset folders
declare -A folders
g=3
for folder in $(ls -d */); do
folders[${folder:0:6}]+="${folder%%/} "
done
for folder in "${!folders[#]}"; do
for((i=0; i < $(echo ${folders[$folder]} | tr ' ' '\n' | wc -l); i+=g)) do
group=(${folders[$folder]})
groupOfThree=(${group[#]:i:g})
tar jcvf mytar-${groupOfThree[0]}_${groupOfThree[-1]:4:4}.tar.bz2 ${groupOfThree[#]}
done
done
This script finds all folders in the current directory, seperates them in groups of months, makes groups of at most three folders and creates a .tar.bz2 for each of them with the name you used in the question.
I tested it with those folders:
20171101 20171102 20171103 20171002 20171003 20171005 20171007 20171009 20171011 20171013 20180101 20180102
And the created tars are:
mytar-20171002_1005.tar.bz2
mytar-20171007_1011.tar.bz2
mytar-20171013_1013.tar.bz2
mytar-20171101_1103.tar.bz2
mytar-20180101_0102.tar.bz2
Hope that helps :)
EDIT: If you are using bash version < 4.2 then replace the line:
tar jcvf mytar-${groupOfThree[0]}_${groupOfThree[-1]:4:4}.tar.bz2 ${groupOfThree[#]}
by:
tar jcvf mytar-${groupOfThree[0]}_${groupOfThree[`expr ${#groupOfThree[#]} - 1`]:4:4}.tar.bz2 ${groupOfThree[#]}
That's because bash version < 4.2 doesn't support negative indices for arrays.

find only the first file from many directories

I have a lot of directories:
13R
613
AB1
ACT
AMB
ANI
Each directories contains a lots of file:
20140828.13R.file.csv.gz
20140829.13R.file.csv.gz
20140830.13R.file.csv.gz
20140831.13R.file.csv.gz
20140901.13R.file.csv.gz
20131114.613.file.csv.gz
20131115.613.file.csv.gz
20131116.613.file.csv.gz
20131117.613.file.csv.gz
20141114.ab1.file.csv.gz
20141115.ab1.file.csv.gz
20141116.ab1.file.csv.gz
20141117.ab1.file.csv.gz
etc..
The purpose if to have the first file from each directories
The result what I expect is:
13R|20140828
613|20131114
AB1|20141114
Which is the name of the directories pipe the date from the filename.
I guess I need a find and head command + awk but I can't make it, I need your help.
Here what I have test it
for f in $(ls -1);do ls -1 $f/ | head -1;done
But the folder name is missing.
When I mean the first file, is the first file returned in an alphabetical order within the folder.
Thanks.
You can do this with a Bash loop.
Given:
/tmp/test
/tmp/test/dir_1
/tmp/test/dir_1/file_1
/tmp/test/dir_1/file_2
/tmp/test/dir_1/file_3
/tmp/test/dir_2
/tmp/test/dir_2/file_1
/tmp/test/dir_2/file_2
/tmp/test/dir_2/file_3
/tmp/test/dir_3
/tmp/test/dir_3/file_1
/tmp/test/dir_3/file_2
/tmp/test/dir_3/file_3
/tmp/test/file_1
/tmp/test/file_2
/tmp/test/file_3
Just loop through the directories and form an array from a glob and grab the first one:
prefix="/tmp/test"
cd "$prefix"
for fn in dir_*; do
cd "$prefix"/"$fn"
arr=(*)
echo "$fn|${arr[0]}"
done
Prints:
dir_1|file_1
dir_2|file_1
dir_3|file_1
If your definition of 'first' is different that Bash's, just sort the array arr according to your definition before taking the first element.
You can also do this with find and awk:
$ find /tmp/test -mindepth 2 -print0 | awk -v RS="\0" '{s=$0; sub(/[^/]+$/,"",s); if (s in paths) next; paths[s]; print $0}'
/tmp/test/dir_1/file_1
/tmp/test/dir_2/file_1
/tmp/test/dir_3/file_1
And insert a sort (or use gawk) to sort as desired
sort has an unique option. Only the directory should be unique, so use the first field in sorting -k1,1. The solution works when the list of files is sorted already.
printf "%s\n" */* | sort -k1,1 -t/ -u | sed 's#\(.*\)/\([0-9]*\).*#\1|\2#'
You will need to change the sed command when the date field may be followed by another number.
This works for me:
for dir in $(find "$FOLDER" -type d); do
FILE=$(ls -1 -p $dir | grep -v / | head -n1)
if [ ! -z "$FILE" ]; then
echo "$dir/$FILE"
fi
done

UNIX - Replacing variables in sql with matching values from .profile file

I am trying to write a shell which will take an SQL file as input. Example SQL file:
SELECT *
FROM %%DB.TBL_%%TBLEXT
WHERE CITY = '%%CITY'
Now the script should extract all variables, which in this case everything starting with %%. So the output file will be something as below:
%%DB
%%TBLEXT
%%CITY
Now I should be able to extract the matching values from the user's .profile file for these variables and create the SQL file with the proper values.
SELECT *
FROM tempdb.TBL_abc
WHERE CITY = 'Chicago'
As of now I am trying to generate the file1 which will contain all the variables. Below code sample -
sed "s/[(),']//g" "T:/work/shell/sqlfile1.sql" | awk '/%%/{print $NF}' | awk '/%%/{print $NF}' > sqltemp2.sql
takes me till
%%DB.TBL_%%TBLEXT
%%CITY
Can someone help me in getting to file1 listing the variables?
You can use grep and sort to get a list of unique variables, as per the following transcript:
$ echo "SELECT *
FROM %%DB.TBL_%%TBLEXT
WHERE CITY = '%%CITY'" | grep -o '%%[A-Za-z0-9_]*' | sort -u
%%CITY
%%DB
%%TBLEXT
The -o flag to grep instructs it to only print the matching parts of lines rather than the entire line, and also outputs each matching part on a distinct line. Then sort -u just makes sure there are no duplicates.
In terms of the full process, here's a slight modification to a bash script I've used for similar purposes:
# Define all translations.
declare -A xlat
xlat['%%DB']='tempdb'
xlat['%%TBLEXT']='abc'
xlat['%%CITY']='Chicago'
# Check all variables in input file.
okay=1
for key in $(grep -o '%%[A-Za-z0-9_]*' input.sql | sort -u) ; do
if [[ "${xlat[$key]}" == "" ]] ; then
echo "Bad key ($key) in file:"
grep -n "${key}" input.sql | sed 's/^/ /'
okay=0
fi
done
if [[ ${okay} -eq 0 ]] ; then
exit 1
fi
# Process input file doing substitutions. Fairly
# primitive use of sed, must change to use sed -i
# at some point.
# Note we sort keys based on descending length so we
# correctly handle extensions like "NAME" and "NAMESPACE",
# doing the longer ones first makes it work properly.
cp input.sql output.sql
for key in $( (
for key in ${!xlat[#]} ; do
echo ${key}
done
) | awk '{print length($0)":"$0}' | sort -rnu | cut -d':' -f2) ; do
sed "s/${key}/${xlat[$key]}/g" output.sql >output2.sql
mv output2.sql output.sql
done
cat output.sql
It first checks that the input file doesn't contain any keys not found in the translation array. Then it applies sed substitutions to the input file, one per translation, to ensure all keys are substituted with their respective values.
This should be a good start, though there may be some edge cases such as if your keys or values contain characters sed would consider important (like / for example). If that is the case, you'll probably need to escape them such as changing:
xlat['%%UNDEFINED']='0/0'
into:
xlat['%%UNDEFINED']='0\/0'

Resources