I have a file whose size is approximately 1 GB, and it contains data in the format below.
A|CD|44123|0|0
B|CD|44124|0|0
C|CD|44125|0|0
D|CD|44126|0|0
E|CD|44127|0|0
F|CD|44128|0|0
J|CD|44129|0|0
I|CD|44130|0|0
In this file I have to replace the third column with a value that I get after a conversion, which means I have to open the file, read it line by line, and replace the value. This process takes around 5 hours. Below is the code I am using:
cat "$FILE_NAME" |
while read REC
do
    DATE=$(echo "$REC" | cut -d'|' -f3)
    DATE_NEW=$($UTIL "$DATE" | head -1 | cut -d' ' -f12)
    RECORD="$DATE_NEW,"
    echo "$RECORD" >> "$New_File"
done
Is there a way to make this better and faster?
The desired output will look like the following, where a DATE_NEW value is placed in each third column. DATE_NEW is the converted value obtained from:
DATE_NEW=`$UTIL $DATE | head -1 |cut -d" " -f12`
A|CD|10/20/2020|0|0
B|CD|10/25/2020|0|0
C|CD|10/25/2020|0|0
D|CD|10/25/2020|0|0
E|CD|11/15/2020|0|0
F|CD|11/14/2020|0|0
J|CD|11/16/2020|0|0
I|CD|11/17/2020|0|0
After the comment from Sundeep (Why is using a shell loop to process text considered bad practice?) I rewrote the logic in Perl, and the processing time dropped from 5-7 hours to 99 seconds.
Give this a try:
awk -v cmd="Cmd2GetNEWDATE" 'BEGIN{FS=OFS="|"} {c=cmd" "$3; c|getline v; close(c)} $3=v' file
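Here Cmd2GetNEWDATE is a placeholder for whatever $UTIL invocation prints the converted date. If the same date appears on many lines, most of the runtime goes into spawning that command once per line; a sketch that caches each conversion in an array, assuming the utility prints one date per call:
awk -v cmd="Cmd2GetNEWDATE" '
BEGIN { FS = OFS = "|" }
{
    if (!($3 in cache)) {        # run the converter only once per distinct date
        c = cmd " " $3
        c | getline cache[$3]
        close(c)
    }
    $3 = cache[$3]
}
1' file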
I want the code in Bash scripting. It should print the weekend dates in the below manner, with each pair on its own line:
From: 2015-October-03 2015-October-04
2015-October-10 2015-October-11
...
To: 2017-October-21 2017-October-22
2017-October-28 2017-October-29
So it should print the weekend dates of every month from 2015 till date, in the above format only. Please help me at the earliest.
The following is a solution for your query.
Solution:
#!/bin/bash
# Days elapsed between the start date and today
Date_Diff_Count=$(( ($(date +%s) - $(date -d "2015-01-01" +%s)) / 60 / 60 / 24 ))
for i in $(seq -$Date_Diff_Count 0)
do
    # Keep only dates that fall on a Saturday or Sunday; grab month, day, year
    VALUE=$(date -d "+$i day" | egrep -i "Sat|Sun" | awk '{print $2" "$3" "$6}')
    [[ -n ${VALUE} ]] && date -d "${VALUE}" +%Y-%B-%d
done > sample.txt
paste -d " " - - < sample.txt
Output
2015-January-03 2015-January-04
2015-January-10 2015-January-11
2015-January-17 2015-January-18
2015-January-24 2015-January-25
2015-January-31 2015-February-01
...
2016-May-07 2016-May-08
2016-May-14 2016-May-15
2016-May-21 2016-May-22
2016-May-28 2016-May-29
...
2017-October-07 2017-October-08
2017-October-14 2017-October-15
2017-October-21 2017-October-22
2017-October-28 2017-October-29
Explanation
Date_Diff_Count holds the number of days obtained by subtracting the
start date from the current date. You can change the start date to
whatever you need.
The for loop runs from -Date_Diff_Count to 0. For example, if
Date_Diff_Count is 500, the loop sequence runs from -500 to 0.
VALUE holds only the month, day and year, extracted by piping the output of the date command through egrep and awk; it is empty unless the date falls on a weekend.
If VALUE is not empty, the date is converted into the format YYYY-Month-DD.
The output is saved in the sample.txt file.
The final paste command merges every 2 consecutive lines into a single line. If you want to merge 3 lines, use paste -d " " - - -
-d sets the delimiter that separates the merged lines. You can use any other separator based on your requirements.
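To see what that paste step does, a quick illustration:
printf '%s\n' 2015-January-03 2015-January-04 2015-January-10 2015-January-11 | paste -d " " - -
prints:
2015-January-03 2015-January-04
2015-January-10 2015-January-11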
I have some data like
[09359]0000.365604| =>SttSasph_Hmbm_bSPO_PhQmOm (Hmbm_PhQmOm_utWmP.asp)
[09359]0000.365687| =>Hmbm_bSPO_PhQmOm_Wd (Hmbm_PhQmOm_utWmP.asp)
[09359]0000.365879| =>SttSasph_Hmbm_quOuO_PhQmOm (Hmbm_PhQmOm_utWmP.asp)
[09359]0000.365890| =>Hmbm_quOuO_PhQmOm_Wd (Hmbm_PhQmOm_utWmP.asp)
[09359]0000.365979| WSmmOT SDDQ vSQWSbmO not POt, QOvOQtWnH to Onv mOthod
[09359]0001.625300| db_HOt_POPPWon_Wd: aspuQQOnt POPPWon WD WP 1016,59
[09359]0002.365979| WSmmOT SDDQ vSQWSbmO not POt, QOvOQtWnH to Onv mOthod
Every line starts with a process number (which can change) in square brackets.
Then come the seconds, (0001) in this case.
Then the microseconds, after the full stop.
Then a pipe to terminate.
The rest of the line can be ignored.
What I need is to calculate:
Convert the seconds into microseconds.
Add the microseconds field to the converted microseconds (from 1).
Find the difference in microseconds between consecutive lines, e.g. line2-line1, line3-line2, line4-line3 and so on.
Print the result to a separate file.
I tried the logic below, but it didn't work. May I get suggestions for an optimised way to do it, or improvements to my existing logic?
sec=$(grep '^\[.\{1,\}\]' mass.May28.1 | cut -d "| " -f1 | cut -c8- | cut -d"." -f1)
msec=$(grep '^\[.\{1,\}\]' mass.May28.1 | cut -d "| " -f1 | cut -c8- | cut -d"." -f2)
$f_msec=$((sec * 1000000 + msec)) > final_difference_file
If you are comfortable with awk, then you can use this script:
script.awk
BEGIN { FS = "[\\[\\]\\|]+" }                    # split fields on brackets and pipes
{ printf("[%s]%011.6f|%s\n", $2, $3-prev, $4)    # $2=pid, $3=timestamp, $4=rest
  prev = $3 }
Use it like this: awk -f script.awk yourfile
The first line sets up the field splitting to use the brackets and the pipe (ignore the backslashes; they are needed to escape those symbols, which are regexp metacharacters). The second line prints the fields and calculates the time difference. The last line stores the current timestamp for the calculation on the next line.
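Run against the first three sample lines, this should print something like:
[09359]0000.365604| =>SttSasph_Hmbm_bSPO_PhQmOm (Hmbm_PhQmOm_utWmP.asp)
[09359]0000.000083| =>Hmbm_bSPO_PhQmOm_Wd (Hmbm_PhQmOm_utWmP.asp)
[09359]0000.000192| =>SttSasph_Hmbm_quOuO_PhQmOm (Hmbm_PhQmOm_utWmP.asp)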
This can also be done with a bash script. Since bash lacks floating point arithmetic, we have to handle seconds and microseconds separately (or call an external tool like bc for each line):
script.sh
IFS='|[].'                  # split input lines on pipe, brackets and dot
factor=1000000
prev=0
while read dummy pid secs msecs text
do
    msecs=$(( 10#$secs * factor + 10#$msecs ))   # total microseconds (10# avoids octal)
    timediff=$(( msecs - prev ))
    prev=$msecs
    secs=$(( timediff / factor ))
    msecs=$(( timediff - secs * factor ))
    printf "[%s]%04d.%06d|%s\n" "$pid" "$secs" "$msecs" "$text"
done < "$1"
Use it like this: bash script.sh yourfile
An application is continually writing to a log. Each line forms a new entry; the log is in CSV format. Example:
123123123,asdf,asdf,3453456,sdfgsfgs,4567asd,zxc,aa
444444222,asdf,asdf,3453456,sdfgsfgs,4567asd,zxc,aa
563434535,asdf,asdf,3453456,sdfgsfgs,4567asd,zxc,aa
234234334,asdf,asdf,3453456,sdfgsfgs,4567asd,zxc,aa
234234534,asdf,asdf,3453456,sdfgsfgs,4567asd,zxc,aa
546456456,asdf,asdf,3453456,sdfgsfgs,4567asd,zxc,aa
567567567,asdf,asdf,3453456,sdfgsfgs,4567asd,zxc,aa
234232342,asdf,asdf,3453456,sdfgsfgs,4567asd,zxc,aa
I need to poll the log and extract the data in chunks, appending the data to another log file called newLog.csv.
I need to ensure that:
I don't copy data already moved over to the new file;
if there are not 200 lines of new data, it captures the nearest number of lines available, without getting duplicates.
Can I change this tail statement to meet the above?
tail -n 200 $REMOTE_HOME/data/log.csv >> $SCRIPT_DIR/$project/newLog.csv
Provided the first field on each line is some sort of time code (unixtime?), you could do the following.
1. Check the time of the last written line in the new log:
LAST_LINE=$(tail -n 1 /PATH/new_log | awk -F',' '{print $1}')
2. Check the time of the first line you want to write:
FIRST_LINE=$(tail -n 200 /PATH/old_log | head -n 1 | awk -F',' '{print $1}')
3. If the last line in the new log is older than the first of the 200, write all 200 lines:
if [ "$LAST_LINE" -lt "$FIRST_LINE" ]
then tail -n 200 /PATH/old_log >> /PATH/new_log
fi
Now you have to put this in a loop so it still works when, say, only 3 lines overlap. Basically you do the same as before, shrinking the tail until its first line is newer than the last line already copied.
LAST_LINE=$(tail -n 1 /PATH/new_log | awk -F',' '{print $1}')
COUNT=200
while [ "$COUNT" -gt 0 ]; do
    FIRST_LINE=$(tail -n "$COUNT" /PATH/old_log | head -n 1 | awk -F',' '{print $1}')
    if [ "$LAST_LINE" -lt "$FIRST_LINE" ]
    then tail -n "$COUNT" /PATH/old_log >> /PATH/new_log; break
    fi
    COUNT=$((COUNT - 1))
done
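A sketch of a simpler variant of the same idea, assuming the first field is numeric and strictly increasing: let awk filter the tail instead of probing it line by line.
LAST_LINE=$(tail -n 1 /PATH/new_log | awk -F',' '{print $1}')
# keep only the tail lines whose timestamp is newer than the last one copied
tail -n 200 /PATH/old_log | awk -F',' -v last="$LAST_LINE" '$1 > last' >> /PATH/new_log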
I am trying to take the first number from each file.dat of the form:
5.01 1 56.413481000 -0.00063400 0.00095770
5.01 2 61.193808800 0.00102170 0.00078280
5.01 3 65.974136600 -0.00108170 0.00102620
5.01 4 70.754464300 0.00082490 0.00103630
and then use this number (5.01) as the title of a .png file.
I use a bash script and I know the command line=$(head -n 1 $f), as found in a question here, but this gives me the whole first line of the file $f.
In this case the spaces in the line are kept as well, and the .png file title becomes:
plot 5.01 1 56.413481000 -0.00063400 0.00095770.png
Is there some way to take only 5.01 and get a trimmed title for the plot?
Thanks to all.
I'd probably just do it with perl:
VAL=$( echo "$line" | perl -pe 's/^[^\d]+//g;s/[^\d\.].*$//' )
Something like that anyway.
It should remove:
anything that isn't a digit, from the start of the line;
anything that is not a digit or a ., from there to the end of the line.
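For example, fed one of the sample lines:
echo "  5.01 1 56.413481000 -0.00063400 0.00095770" | perl -pe 's/^[^\d]+//g;s/[^\d\.].*$//'
5.01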
Or with grep:
grep -o "[0-9]*\.[0-9]*" file.dat | head -1
Edit:
Testing without the head -1 on a one-line input:
echo " 5.01 2 61.193808800 0.00102170 0.00078280" | grep -o "[0-9]*\.[0-9]*"
5.01
61.193808800
0.00102170
0.00078280
Using head -1 will return the first match on the first line.
When you know the match will be on the first line, we can also ignore files with an incorrect first line (and avoid grepping through the complete file):
Make a two-headed monster:
head -1 file.dat | grep -o "[0-9]*\.[0-9]*" | head -1
To extract the first field, assuming they are tab separated:
val=$(head -n 1 $f | cut -f 1)
or, if they are space separated instead:
val=$(head -n 1 $f | cut -f 1 -d ' ')
Or you can avoid calling any extra processes and keep all data manipulation in the bash shell:
while read realNum restOfLine; do
    break
done < "$f"
echo "$realNum"
This grabs the first "word" and puts the remainder into "restOfLine".
The break ensures that only the first line of the file is read.
IHTH
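Putting it together for the original goal, a rough sketch (only the title handling is shown; the plotting step is whatever you already use):
for f in *.dat; do
    read -r val _ < "$f"      # val = first whitespace-separated field, e.g. 5.01
    echo "would save the plot for $f as ${val}.png"
done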
I'm trying to analyze an enormous text file (1.6GB), whose data lines look like this:
20090118025859 -2.400000 78.100000 1023.200000 0.000000
20090118025900 -2.500000 78.100000 1023.200000 0.000000
20090118025901 -2.400000 78.100000 1023.200000 0.000000
I don't even know how many lines there are. But I'm trying to split the file by date. The left number is a time stamp (these lines, for example, are from January 18th, 2009).
How can I split this file into pieces according to the date?
The number of entries per date differs, so using split with a constant number won't work.
The only thing I know would be grep '^20090118' file > data20090118.dat, but there surely is a way to do all the dates at once, right?
Thanks in advance,
Alex
Using awk:
awk '{ print > ("data" substr($1,1,8) ".dat") }' myfile
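One caveat: with many distinct dates, some awk implementations run out of open file descriptors. Since the timestamps come in sequence, a sketch that closes each file as the date changes (remove stale data*.dat files first, because >> appends):
awk '{
    out = "data" substr($1,1,8) ".dat"
    if (out != prev) { close(prev); prev = out }   # new date: close the old file
    print >> out
}' myfile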
This should work if the items are in date sequence:
date=20090101 # Change to the earliest date
while IFS= read -r line
do
    if [ "$(echo "$line" | cut -d ' ' -f 1 | cut -c 1-8)" -eq "$date" ]
    then
        echo "$line" >> "$date.dat"
    else
        date=$(echo "$line" | cut -c 1-8)   # advance to the next date in the file
        echo "$line" >> "$date.dat"         # don't drop the first line of the new day
    fi
done < log.dat
With the caveats that each day needs to have more than one record,
and that the output files will contain blank separator lines:
uniq --all-repeated=separate -w8 file | csplit -s - '/^$/' '{*}'
uniq really ought to have an option to output unique records as well. Also, csplit should have an option to suppress the matched line.
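csplit leaves the pieces named xx00, xx01, and so on; a small sketch to rename them by the date they contain (grep -m 1 . skips the blank separator line at the top of the later chunks):
for f in xx*; do
    d=$(grep -m 1 . "$f" | cut -c 1-8)   # date from the chunk's first non-blank line
    mv "$f" "data$d.dat"
done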