extract information regarding: size && time && row_count in a one-line shell script - bash

Hey everyone! I am pretty new to shell scripting and I am stuck.
I need to extract the following information: file_name && size && time && row_count, and I want to do it in one command line. I tried this:
ls -l * && wc -l file.txt && du -ks file.txt | cut -f1| awk '{print $5" " $6 " " $7 " "$8 " " $9 " "$1 " "$2}'
but it is not working properly.
I also tried doing it in a loop, but I don't know how to extract the information from there:
for file in `ls -ltr /export/home/oracle/dbascripts/scripts`
do
[[ -f $file ]] && echo $file | awk '{print $3}'
done
Then I want to redirect the output to a file (with >>) for SQL*Loader purposes.
Thanks in advance!

This could be a start if you have GNU find and GNU coreutils (most Linux distributions will do):
for i in /my/path/*; do
find "$i" ! -type d -printf '%p %TY-%Tm-%Td %TH:%TM:%TS %s '
wc -l <"$i"
done
/my/path/* should be modified to reflect the files you want to probe.
Also keep in mind that this one-liner has a few major issues if any directories are specified. This should be safer in that regard:
for i in *; do
if [[ -d "$i" ]]; then
continue
fi
find "$i" -printf '%p %TY-%Tm-%Td %TH:%TM:%TS %s '
wc -l <"$i"
done
You will want to see the manual page for GNU find to understand this better.
EDIT:
There is at least one other, faster way, using join and bash process substitution, but it's a bit ugly and somewhat harder to make safe and to work the kinks out of.
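A minimal sketch of that idea (an assumption-laden illustration, not a hardened solution: it needs GNU find and coreutils, file names without spaces or newlines, and /my/path is just a placeholder):
join <(find /my/path -maxdepth 1 ! -type d -printf '%p %TY-%Tm-%Td %TH:%TM:%TS %s\n' | sort) \
     <(wc -l /my/path/* | awk '$2 != "total" {print $2, $1}' | sort)
Both streams are keyed on the file path in their first column, so join merges the metadata from find with the line count from wc in one pass instead of running wc once per file.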

ExtractInformation()
{
timesep="-"
sep="|"
dot=":"
sec="00"
lcount=`wc -l < $fname`
modf_time=`ls -l $fname`
f_size=`echo $modf_time | awk '{print $5}'`
time_month=`echo $modf_time | awk '{print $6}'`
time_day=`echo $modf_time | awk '{print $7}'`
time_hrmin=`echo $modf_time | awk '{print $8}'`
time_hr=`echo $time_hrmin | cut -d ':' -f1`
time_min=`echo $time_hrmin | cut -d ':' -f2`
time_year=`date '+%Y'`
time_param="DD-MON-YYYY HH24:MI:SS"
time_date=$time_day$timesep$time_month$timesep$time_year" "$time_hrmin$dot$sec
result=$fname$sep$time_date$sep$f_size$sep$lcount$sep$time_param
sqlresult=`echo $result | awk -F'|' '{q=sprintf("%c", 39); print "INSERT INTO SIP_ICMS_FILE_T(f_name, f_date_time,f_size,f_row_count) VALUES (" q $1 q ", TO_DATE("q $2 q,q $5 q "),"$3","$4");";}'`
echo "$sqlresult" >> data.sql
echo "Reading data....."
}
UploadData()
{
#ss=`sqlplus -s a/a#adb #data.sql
#set serveroutput on
#set feedback off
#set echo off`
echo "loading with sql Loader....."
}
f_data=data.sql
[[ -f $f_data ]] && rm data.sql
for fname in * ;
do
if [[ -f $fname ]]; then
ExtractInformation
fi
UploadData
#Zipdata
done

Related

bulk write in Unix using shell script

Is there any way to write data to a file in bulk in a shell script, instead of writing it line by line?
In the script below, I want to write the difference between the arrival time and the generation time of each file to a test.csv file.
########################################################
echo "Starting the Execution for Time difference\n";
############################################################
# Functions used across the script
datediff() {
Unixtime=`echo $1 $2 $3 $4`
Filetime=`echo $5 $6 $7 $8`
echo $Unixtime;
echo $Filetime;
d1=`date -d "$Unixtime" +%s`
d2=`date -d "$Filetime" +%s`
echo $d1;
echo $d2;
TIME_DIFF=`expr $d1 - $d2`
TIME_DIFF=`expr $TIME_DIFF / 60`
echo $TIME_DIFF;
echo "$Unixtime,$Filetime,$TIME_DIFF,$9" >> ../test.csv
}
rm -f ../test.csv;
for i in `ls -1 | grep -v 'DelayCheck.s*'`
do
DayMonth=`ls -lrt $i | awk '{print $7" "$6" "}'`
Year=`ls --full-time $i | awk '{print $6}' | cut -c1-4`
HourMin=`ls -lrt $i | awk '{print " "$8}'`
timeA=`echo $DayMonth $Year $HourMin`
FileYearMonDay=`ls -ltr $i | awk '{print $9}' | awk -F'--' '{print $3}' | cut -c2-9`
timeB1=`date -d $FileYearMonDay +'%d %b %Y'`
timeB2=`echo $i | awk -F'--' '{print substr($3,10,13)}' | sed -e 's/../:&/2g'`
timeB=`echo $timeB1 $timeB2`
echo "Time A is $timeA";
echo " Time b is $timeB";
datediff $timeA $timeB $i
done
echo $?;
The script works fine, but the problem is that there are over 100k files, so the script's performance is bad.
I tried to search for a way to write data to the file in bulk, but I didn't find any solution.
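One common way to reduce the per-file cost (a sketch, not a drop-in rewrite of the script above) is to open ../test.csv once by redirecting the whole loop, instead of appending with >> inside datediff for every file, and to avoid calling ls and date several times per file. A minimal illustration of the single-redirection idea, with the generation-time parsing left as a placeholder:
#!/bin/bash
# Sketch: one redirection for the whole loop; assumes GNU date.
for i in *; do
    [ -f "$i" ] || continue
    arrival=$(date -r "$i" +%s)       # modification time in epoch seconds (one date call per file)
    generation=$arrival               # placeholder: parse the real generation time from "$i" here
    printf '%s,%s,%s,%s\n' "$arrival" "$generation" "$(( (arrival - generation) / 60 ))" "$i"
done > ../test.csv
Because the redirection is attached to the loop, the output file is opened only once for the whole run rather than once per line.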

How to pass a variable string to a txt file at the beginning of the text?

I have a problem.
I have a general program, gene.sh,
that for every file (e.g. geneX.csv) makes a directory with the name of the gene (example: geneX/geneX.csv). Next, this program runs another program from inside gene.sh, but that program needs a variable and I don't know how to pass it.
This is the program gene.sh:
#!/bin/bash
# Create a directory for each file *.xlsx and *.csv
for fname in *.xlsx *.csv
do
dname=${fname%.*}
[[ -d $dname ]] || mkdir "$dname"
mv "$fname" "$dname"
done
# For each gene, go inside the directory and run the programs getChromosomicPositions.sh to get the positions, and getHaplotypeStrings.sh to get the variants
for geni in */; do
cd $geni
z=$(tail -n 1 *.csv | tr ';' "\n" | wc -l)
cd ..
cp getChromosomicPositions.sh $geni --->
cp getHaplotypeStrings.sh $geni
cd $geni
export z
./getChromosomicPositions.sh *.csv
export z
./getHaplotypeStrings.sh *.csv
cd ..
done
This is the program getChromosomicPositions.sh:
rm chrPosRs.txt
grep '^Haplotype\ ID' $1 | cut -d ";" -f 4-61 | tr ";" "\n" | awk '{print "select chrom,chromStart,chromEnd,name from snp147 where name=\""$1"\";"}' > listOfQuery.txt
while read l; do
echo $l > query.txt
mysql -h genome-mysql.cse.ucsc.edu -u genome -A -D hg38 --skip-column-names < query.txt > queryResult.txt
if [[ "$(cat queryResult.txt)" == "" ]];
then
cat query.txt |
while read line; do
echo $line | awk '$6 ~/rs/ {print $6}' > temp.txt;
if [[ "$(cat temp.txt)" != "" ]];
then cat temp.txt | awk -F'name="' '{print $2}' | sed -e 's/";//g' > temp.txt;
./getHGSVposHG19.sh temp.txt ---> Here is the problem --->
else
echo $line | awk '{num=sub(/.*:g\./,"");num+=sub(/\".*/,"");if(num==2){print};num=""}' > temp2.txt
fi
done
cat query.txt >> varianti.txt
echo "Missing Data" >> chrPosRs.txt
else
cat queryResult.txt >> chrPosRs.txt
fi
done < listOfQuery.txt
rm query*
Here is the problem:
I need to go into the file temp.txt and automatically put the variable $geni from the program gene.sh at the beginning of the file.
How can I do that?
Why not pass "$geni" as, say, the first argument when invoking your script, and treat the rest of the arguments as your expected .csv files:
./getChromosomicPositions.sh "$geni" *.csv
Alternatively, you can set it as an environment variable for the script, so that it can be used there (or just export it):
geni="$geni" ./getChromosomicPositions.sh *.csv
In any case, once you have it available in the second script, you can do the following.
If it was passed as the first argument:
echo "${1}:$(cat temp.txt | awk -F'name="' '{print $2}' | sed -e 's/";//g')
or, if it was passed as an environment variable:
echo "${geni}:$(cat temp.txt | awk -F'name="' '{print $2}' | sed -e 's/";//g')

Find TXT files and show Total Count of records of each file and Size of each file

I need to find the row count and size of each TXT file.
It needs to search all the directories and just show the result as:
FileName|Cnt|Size
ABC.TXT|230|23MB
Here is some code:
v_DIR=$1
echo "the directory to cd is "$1
x=`ls -l $0 | awk '{print $9 "|" $5}'`
y=`awk 'END {print NR}' $0`
echo $x '|' $y
Try something like
find -type f -name '*.txt' -exec bash -c 'lines=$(wc -l "$0" | cut -d " " -f1); size=$(du -h "$0" | cut -f1); echo "$0|$lines|$size"' {} \;
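(In the bash -c snippet, find substitutes the file name for {}, and bash -c assigns that first argument to $0 of the inline script, which is why the file is referenced as "$0" inside it.)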

shell script sum in for loop not working

size=`ls -l /var/temp.* | awk '{ print $5}'`
fin_size=0
for row in ${size} ;
do
fin_size=`echo $(( $row + $fin_size )) | bc`;
done
echo $fin_size
It is not working! echo $fin_size is printing some garbage negative value.
Where am I making a mistake?
(My bash is old, and I am supposed to work only on this Linux kernel: 2.6.39.)
Don't parse ls.
Why not use du as shown below?
du -cb /var/temp.* | tail -1
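(Here -b reports the apparent size of each file in bytes, -c adds a grand total line, and tail -1 keeps only that total.)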
Because it cannot be stressed enough: Why you shouldn't parse the output of ls(1)
Use e.g. du as suggested by dogbane, or find:
$ find /var -maxdepth 1 -type f -name "temp.*" -printf "%s\n" | awk '{total+=$1}END{print total}'
or stat:
$ stat -c%s /var/temp.* | awk '{total+=$1}END{print total}'
or globbing and stat (unnecessary, slow):
total=0
for file in /var/temp.*; do
[ -f "${file}" ] || continue
size="$(stat -c%s "${file}")"
((total+=size))
done
echo "${total}"
Below should be enough:
ls -l /var/temp.* | awk '{a+=$5}END{print a}'
There is no need for you to run the for loop. This means that this:
size=`ls -l /var/temp.* | awk '{ print $5}'`
fin_size=0
for row in ${size} ;
do
fin_size=`echo $(( $row + $fin_size )) | bc`;
done
echo $fin_size
The whole thing above can be replaced with:
fin_size=`ls -l /var/temp.* | awk '{a+=$5}END{printf("%10d",a);}'`
echo $fin_size

How to quote file name using awk?

I want the output 'filename1','filename2','filename3' ....
I'm using awk, but I have no idea how to print the last quote after the filename.
It is printing ,'filename ===> I need ,'filename'
ls -ltr | grep -v ^d | sed '1d'| awk '{print "," sprintf("%c", 39) $9}'
Thanks in advance!
You can use the find command as:
find . -maxdepth 1 -type f -printf "'%f'," | sed s/,$//
If you have Ruby (1.9+):
ruby -e 'puts Dir["*"].select{|x|test(?f,x)}.join("\47,\47")'
Otherwise:
find . -maxdepth 1 -type f -printf '%f\n' | sed -e ':a N' -e "s#\n#','#" -e 'b a'
Use the printf function http://www.gnu.org/manual/gawk/html_node/Basic-Printf.html
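That answer is only a pointer; a minimal sketch of the idea (reusing the find invocation from the answer above, and assuming file names without newlines) might look like:
find . -maxdepth 1 -type f -printf '%f\n' |
awk '{ printf "%s\047%s\047", sep, $0; sep = "," } END { print "" }'
Here \047 is the octal escape for a single quote, and sep is empty for the first file and a comma afterwards.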
Pure bash (probably POSIX sh, too):
comma=
for file in * ; do
if [ ! -d "$file" ] ; then
if [ ! -z $comma ] ; then
printf ","
fi
comma=1
printf "'%s'" "$file"
fi
done
Files with ' in the name are not accounted for, but nobody else has been handling that either. Presuming that escaping with \ is correct, you could do:
comma=
for file in * ; do
if [ ! -d "$file" ] ; then
if [ ! -z $comma ] ; then
printf ","
fi
comma=1
printf "'%s'" "${file//\'/\'}"
fi
done
But some CSV systems would instead require you to write '' (a doubled quote), which would be:
printf "'%s'" "${file//\'/''}"
Let's pretend that you're processing some other data besides the output of ls.
$ printf "hello\ngoodbye\no'malley\n" | awk '{gsub("\047","\047\\\047\047",$1);printf "%s\047%s\047",comma,$1; comma=","}END{printf "\n"}'
'hello','goodbye','o'\''malley'
This variant works fine, but I think there should be a more elegant way to do it:
ls -1 $1 | cut -d'.' -f1 | awk '{printf "," sprintf("%c", 39) $1 sprintf("%c", 39) "\n" }'| sed '1 s/,*//'
