"Integer expression expected" bash if statements - bash

I'm trying to extract the xy coordinates of some earthquake occurrences along with their magnitudes from a file "seismic_c_am.txt", and plot them as circles of various sizes and colours based on the magnitude. Here is what I have so far:
25 i=`awk '{ FS = "|" ; print $11}' seismic_c_am.txt`
26
27 if [ "$i" -gt 7 ] ; then
28 awk 'NR%25==0 { FS = "|" ; print $4, $3}' seismic_c_am.txt | psxy $rgn $proj -Sc0.25c -Gred -O -K >> $psfile ;
29 fi
30
31 if [ "$i" -gt 5 ] && [ "$i" -le 7 ] ; then
32 awk 'NR%25==0 { FS = "|" ; print $4, $3}' seismic_c_am.txt | psxy $rgn $proj -Sc0.2c -Gorange -O -K >> $psfile ;
33 fi
34
35 if [ "$i" -le 5 ] ; then
36 awk 'NR%25==0 { FS = "|" ; print $4, $3}' seismic_c_am.txt | psxy $rgn $proj -Sc0.1c -Gyellow -O -K >> $psfile ;
37 fi
This script seems to just print all the magnitudes ($11) into the terminal, and the last line reads:
.
.
3.6
4.0
1.7
3.6 : integer expression expected
But I don't know which line this is referring to! Possibly line 27, 31 or 35? (see above)

Bash doesn't do floating point arithmetic, only integer arithmetic.
Since you're comparing with integers, you can make awk print the integer part.
i=`awk '{ FS = "|" ; printf "%d\n", $11}' seismic_c_am.txt`
If you want to know which line is causing these errors, add the command set -x to your script to turn on tracing mode: bash will print each script line before executing it. If you only want to trace part of the script, you can turn off tracing with set +x.
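For example, a minimal sketch tracing just one of the tests (the + prefix marks each command as it runs; the echo body is made up for illustration):
set -x                      # turn tracing on
if [ "$i" -gt 7 ] ; then    # traced as: + '[' 3.6 -gt 7 ']'
echo "major quake"          # with a bad $i, the error message follows the traced line
fi
set +x                      # turn tracing off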
Since you're repeating the same snippet many times, you may want to restructure your script a bit.
i=`awk '{ FS = "|" ; printf "%d\n", $11}' seismic_c_am.txt`
if [ "$i" -ge 7 ]; then
sc_value=0.25 color=red
elif [ "$i" -ge 5 ]; then
sc_value=0.2 color=orange
else
sc_value=0.1 color=yellow
fi
awk 'NR%25==0 { FS = "|" ; print $4, $3}' seismic_c_am.txt |
psxy $rgn $proj -Sc${sc_value}c -G$color -O -K >> $psfile
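Note that the i=... line still captures the whole magnitude column into one string, so the test compares many numbers at once. If each record should be classified by its own magnitude, here is a sketch that does the whole split inside awk instead (same field layout assumed; the per-class file names are made up):
awk -F'|' 'NR%25==0 {
sc = ($11 > 7) ? 0.25 : ($11 > 5) ? 0.2 : 0.1   # symbol size from this record's magnitude
print $4, $3 > ("mag_" sc ".xy")                # one hypothetical output file per size class
}' seismic_c_am.txt
Each mag_*.xy file can then go through its own psxy call with the matching -Sc size and -G colour.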

Related

Testing grep output

The cmd:
STATUS=`grep word a.log | tail -1 | awk '{print $1,$2,$7,$8,$9}'`
echo "$STATUS"
The output:
2020-05-18 09:27:01 1 of 122
I need to display this $STATUS and also do a test comparison on the count. How do I compare the number 122 below, i.e. how do I get 122 into $X?
The number 122 can be any number; it comes from the command above.
if [ "$X" -gt "300" ]
then
echo "$STATUS. This in HIGH queue ($X)"
else
echo "$STATUS. This is NORMAL ($X)"
fi
You could do it with one awk script:
awk '
/word/{ status=$1" "$2" "$7" "$8" "$9; x=$9 }
END{ printf status". This %s (%s)\n", (x>300 ? "in HIGH queue" : "is NORMAL"), x }
' a.log
I would suggest using lowercase for variables to reduce possible confusion for someone other than the original author reading the script in the future. Also using $() is typically preferable to using back-ticks -- makes quoting easier to get right.
status="$(grep word a.log | tail -1 | awk '{print $1,$2,$7,$8,$9}')"
x="$(printf '%s' "$status" | awk '{ print $NF }')"
if [ "$x" -gt 300 ]
then
echo "$status. This in HIGH queue ($x)"
else
echo "$status. This is NORMAL ($x)"
fi
Note -- we could refactor the status line a bit:
status="$(awk '/word/ { x = $1 OFS $2 OFS $7 OFS $8 OFS $9 } END { print x }' a.log)"

AWK, average columns of different length from multiple files

I need to calculate averages of columns from multiple files, but the columns have different numbers of lines. I guess awk is the best tool for this, but anything from bash will be OK. A solution for one column per file is OK; if it works for files with multiple columns, even better.
Example.
file_1:
10
20
30
40
50
file_2:
20
30
40
Expected result:
15
25
35
40
50
awk can do this easily:
awk '{a[FNR]+=$0;n[FNR]++;next}END{for(i=1;i<=length(a);i++)print a[i]/n[i]}' file1 file2
The same method also works for more than two files.
Brief explanation:
FNR is the record (line) number in the current input file.
a[FNR] accumulates the sum of line FNR across all the files.
n[FNR] counts how many files actually have a line FNR.
The for loop prints the average of each line with print a[i]/n[i].
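Running it on the two example files above reproduces the expected result (note that length(a) on an array is a gawk extension; for (i=1; i in a; i++) is a portable alternative):
$ awk '{a[FNR]+=$0;n[FNR]++;next}END{for(i=1;i<=length(a);i++)print a[i]/n[i]}' file_1 file_2
15
25
35
40
50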
I have prepared the following bash script for you. I hope this helps. Let me know if you have any questions.
#!/usr/bin/env bash
#check that the files provided as parameters exist
if [ ! -f "$1" ] || [ ! -f "$2" ]; then
echo "ERROR: file> $1 or file> $2 is missing"
exit 1;
fi
#save the length of both files in variables
file1_length=$(wc -l "$1" | awk '{print $1}')
file2_length=$(wc -l "$2" | awk '{print $1}')
#if file 1 is longer than file 2, append n zeros to the end of file 2
#until both files are the same length
# you can improve the script by creating temp files instead of working directly on the input ones
if [ "$file1_length" -gt "$file2_length" ]; then
n_zero_to_append=$(( file1_length - file2_length ))
echo "append $n_zero_to_append zeros to file $2"
#append n zeros to the end of the file
yes 0 | head -n "${n_zero_to_append}" >> "$2"
#combine both files and compute the average line by line
awk 'FNR==NR { a[FNR] = $0; next } { print (a[FNR]+$0)/2 }' "$1" "$2"
#if file 2 is longer than file 1, do the inverse operation
# you can improve the script by creating temp files instead of working on the input ones
elif [ "$file2_length" -gt "$file1_length" ]; then
n_zero_to_append=$(( file2_length - file1_length ))
echo "append $n_zero_to_append zeros to file $1"
yes 0 | head -n "${n_zero_to_append}" >> "$1"
awk 'FNR==NR { a[FNR] = $0; next } { print (a[FNR]+$0)/2 }' "$1" "$2"
#if the files have the same size we do not need to append anything
#and we can directly compute the average line by line
else
echo "the files : $1 and $2 have the same size."
awk 'FNR==NR { a[FNR] = $0; next } { print (a[FNR]+$0)/2 }' "$1" "$2"
fi
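A usage sketch with the example files from the question (the script name avg.sh is made up). Note that zero-padding means an unmatched line is averaged against 0, so the tail of the longer file comes out halved (20, 25) instead of passed through (40, 50) as in the awk one-liner above:
$ bash avg.sh file_1 file_2
append 2 zeros to file file_2
15
25
35
20
25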

How to speed up reading txt file in bash script

I am building a script that reads 24 hourly temperature data files per day and extracts a latitude-longitude region for a smaller domain. Each data file has three columns (temperature, longitude, latitude) and 188426 rows.
==> 20120810234500.txt <==
0.0362,-12.5000,33.5000
-0.0188,-12.5000,33.5400
-0.0732,-12.5000,33.5800
-0.1263,-12.5000,33.6200
-0.1778,-12.5000,33.6600
-0.2278,-12.5000,33.7000
-0.2761,-12.5000,33.7400
-0.3226,-12.5000,33.7800
-0.3677,-12.5000,33.8200
-0.4115,-12.5000,33.8600
I have used for and while loops plus awk to read the data, but it takes too long (at least for me) to read, extract, and write the new smaller file. Here is the relevant part of the script:
# Start 24 hours loop
lom1=-3
lom2=3
lam1=35
lam2=42
nhoras=24
n=1
while [ $n -le $nhoras ]
do
# File name (nom_file) and length (nstation=188426)
nom_file=`awk -v i=$n 'BEGIN { FS = ","} NR==i { print $1 }' lista_datos.txt`
nstation=`awk 'END{print NR}' $nom_file`
# Original data came from windows system and has carriage returns
dos2unix -q $nom_file
# Date, time values from file name
year=`echo $nom_file | cut -c 1-4`
month=`echo $nom_file | cut -c 5-6`
day=`echo $nom_file | cut -c 7-8`
hour=`echo $nom_file | cut -c 9-14`
# Part of the string to write in the new smaller file
var1=`echo $nom_file | awk '{print substr($0,1,4) " " substr($0,5,2) " " substr($0,7,2) " " substr($0,9,6)}'`
# Read rows 65000 to 125000 to gain processing time
m=65000
#while [ $m -le $nstation ] # Data extraction loop
while [ $m -le 125000 ] # Data extraction loop
do
station_id=$m
elevation=1.5
lat=`awk -v i=$m 'BEGIN { FS = ","} NR==i { print $3 }' $nom_file`
lon=`awk -v i=$m 'BEGIN { FS = ","} NR==i { print $2 }' $nom_file`
# As lon/lat are floating point I use this workaround to get a smaller region
lom1=`echo $lon'>'$lon1 | bc -l`
lom2=`echo $lon'<'$lon2 | bc -l`
lam1=`echo $lat'>'$lat1 | bc -l`
lam2=`echo $lat'<'$lat2 | bc -l`
if [ $lom1 -eq 1 ] && [ $lom2 -eq 1 ];
then
if [ $lam1 -eq 1 ] && [ $lam2 -eq 1 ];
then
# Second part of the string to write in the new smaller file
var2=`awk -v i=$m -v e=$elevation 'BEGIN { FS = ","} NR==i { print "'${station_id}' " $3 " " $2 " '${elevation}' 000 " $1 " 000" }' $nom_file`
# Paste
paste <(echo "$var1") <(echo "$var2") -d ' ' >> out.txt
fi # end of lat condition
fi # end of lon condition
m=$(( $m + 1 ))
done # End of extracting loop
# Save results
cat cabecera-dp-s.txt out.txt > dp-s$year-$month-$day-$hour
rm out.txt
n=$(( $n + 1 ))
done # End 24 hours loop
Right now it takes two hours to process a single input file. Is there any way to speed up the process?
Thanks in advance
Thanks for all the comments, and special thanks to @fedorqui.
With the right use of awk, processing speed has increased dramatically. My first attempt processed a single file in 2 hours; now 24 files have been processed in 93 minutes. There should still be room for improvement, but right now it is fine for me. Thanks again.
I attach the script; maybe it will be useful for someone.
#!/bin/bash
# PATHS
base=/home/meteo/PROJECTES/TERMED
dades=$base/DADES
files=$base/FILES
msg_data=$dades/MSG/Agosto
treball=$base/TREBALL
# START OF SCRIPT
cd $treball
rm *
# Header for final output
cp $files/cabecera-dp-s.txt ./
# Start of day loop
for dia in 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29
do
cp $msg_data/$dia/* ./
ls 2*.txt > lista_datos.txt
awk '{print substr($0,9,6)}' lista_datos.txt > lista_horas.txt
nhoras=`awk 'END{print NR}' lista_horas.txt`
# Start of hour loop
n=1
while [ $n -le $nhoras ]
do
# File name and size
nom_file=`awk -v i=$n 'BEGIN { FS = ","} NR==i { print $1 }' lista_datos.txt`
nstation=`awk 'END{print NR}' $nom_file`
# avoid carriage returns
dos2unix -q $nom_file
# Date values
year=`echo $nom_file | cut -c 1-4`
month=`echo $nom_file | cut -c 5-6`
day=`echo $nom_file | cut -c 7-8`
hour=`echo $nom_file | cut -c 9-14`
# Extract region, thanks fedorqui
awk -F, '$2>=-3 && $2<=3 && $3>=35 && $3<=42' $nom_file > output-$year$month$day$hour.txt
# Part 1 of the RAMS data line
var1=`echo $nom_file | awk '{print substr($0,1,4) " " substr($0,5,2) " " substr($0,7,2) " " substr($0,9,6)}'`
# station_id, latitude, longitude, elevation and temperature for each point
m=1
nstation=`awk 'END{print NR}' output-$year$month$day$hour.txt`
while [ $m -le $nstation ] # Data extraction loop
do
station_id=$m
elevation=1.5
# Part 2 of the RAMS data line
var2=`awk -v i=$m -v e=$elevation 'BEGIN { FS = ","} NR==i { print "'${station_id}' " $3 " " $2 " '${elevation}' 000 " $1 " 000" }' output-$year$month$day$hour.txt`
# Paste the two parts together to build the data line
paste <(echo "$var1") <(echo "$var2") -d ' ' >> out.txt
m=$(( $m + 1 ))
done # End of data extraction loop
# Save the output with the RAMS name and format
cat cabecera-dp-s.txt out.txt > dp-s$year-$month-$day-$hour
n=$(( $n + 1 ))
rm out.txt
done # End of hour loop
# Delete data to avoid conflicts with lista_horas, lista_datos
rm *txt
done # End of day loop
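One further improvement, if you need it: the inner while loop still starts one awk process per row. A sketch that writes all the RAMS lines in a single awk pass over the extracted region (assuming the same output layout as the paste above):
awk -F, -v d="$year $month $day $hour" '
{ print d, NR, $3, $2, "1.5", "000", $1, "000" }   # date/time, station_id, lat, lon, elevation, temp
' output-$year$month$day$hour.txt > out.txt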

Simple bash script to split csv file by week number

I'm trying to separate a large pipe-delimited file based on a week number field. The file contains data for a full year thus having 53 weeks. I am hoping to create a loop that does the following:
1) check if the week number is less than 10 - if it is, pad it with a leading '0'
2) use grep to send the matching rows to a file (i.e. `grep '|01|' bigFile.txt > smallFile.txt`)
3) gzip the smaller file (ie `gzip smallFile.txt`)
4) repeat
Is there a resource that would show how to do this?
EDIT :
Data looks like this:
1|#gmail|1|0|0|0|1|01|com
1|#yahoo|0|1|0|0|0|27|com
The column I care about is the 2nd from the right.
EDIT 2:
Here's the script I'm using but it's not functioning:
for (( i = 1; i <= 12; i++ )); do
#statements
echo 'i :'$i
q=$i
# echo $q
# $q==10
if [[ q -lt 10 ]]; then
#statements
k='0'$q
echo $k
grep '|$k|' 20150226_train.txt > 'weeks_files/week'$k
gzip weeks_files/week $k
fi
if [[ q -gt 9 ]]; then
#statements
echo $q
grep \'|$q|\' 20150226_train.txt > 'weeks_files/week'$q
gzip 'weeks_files/week'$q
fi
done
Very simple in awk ...
awk -F'|' '{ print > ("smallfile-" $(NF-1) ".txt") }' bigfile.txt
Edit: brackets added for "original-awk".
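One caveat: with 53 distinct weeks, a non-GNU awk can run out of open file descriptors. Closing each file after writing avoids that, at some speed cost; a sketch:
awk -F'|' '{ f = "smallfile-" $(NF-1) ".txt"; print >> f; close(f) }' bigfile.txt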
You're almost there.
#!/bin/bash
for (( i = 1; i <= 12; i++ )); do
#statements
echo 'i :'$i
q=$i
# echo $q
# $q==10
#OLD if [[ q -lt 10 ]]; then
if [[ $q -lt 10 ]]; then
#statements
k='0'$q
echo $k
#OLD grep '|$k|' 20150226_train.txt > 'weeks_files/week'$k
grep "|$k|" 20150226_train.txt > 'weeks_files/week'$k
#OLD gzip weeks_files/week $k
gzip weeks_files/week$k
#OLD fi
#OLD if [[ q -gt 9 ]]; then
elif [[ $q -gt 9 ]] ; then
#statements
echo $q
#OLD grep \'|$q|\' 20150226_train.txt > 'weeks_files/week'$q
grep "|$q|" 20150226_train.txt > 'weeks_files/week'$q
gzip 'weeks_files/week'$q
fi
done
You didn't always use $ in front of your variable names. You can only get away with using k or q without a $ inside the shell arithmetic substitution feature, i.e. z=$(( x+k )), or when operating on a variable like (( k++ )). There are others.
You need to learn the difference between single-quoting and double-quoting. You need double-quoting when you want a value substituted for a variable, as in your lines
grep "|$q|" 20150226_train.txt > 'weeks_files/week'$q
and others.
I'm guessing that your use of grep \'|$q|\' 20150226_train.txt was an attempt to get the value of $q.
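A quick sketch of the difference:
q=5
echo '|$q|'   # single quotes: prints |$q| literally
echo "|$q|"   # double quotes: prints |5|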
The way to get comfortable with debugging this sort of situation is to set the shell debugging option with set -x (turn it off with set +x). You'll see each line that is executed, with the values substituted for the variables. Advanced debugging requires print statements like echo "var of interest now = $var". Also, you can use set -vx (and set +vx) to see each line or block of code before it is executed; the -x output then shows which lines were actually executed. For your script, you'd see the whole if ... elif ... fi block printed, and then just the lines of -x output with values for the variables. It can be confusing, even after years of looking at it. ;-)
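For example, tracing the zero-padding assignment from the script might look like this:
$ q=3
$ set -x
$ k='0'$q
+ k=03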
So you can go through and remove all the lines with the prefix #OLD, and I'm hoping your code will work for you.
IHTH
mkdir -p weeks_files &&
awk -F'|' '
{ file=sprintf("weeks_files/week%02d",$(NF-1)); print > file }
!seen[file]++ { print file }
' 20150226_train.txt |
xargs gzip
If your data is ordered so that all of the rows for a given week number are contiguous you can make it simpler and more efficient:
mkdir -p weeks_files &&
awk -F'|' '
$(NF-1) != prev { file=sprintf("weeks_files/week%02d",$(NF-1)); print file }
{ print > file; prev=$(NF-1) }
' 20150226_train.txt |
xargs gzip
There are certainly a number of approaches - the 'awk' line below will reformat your data. If you take a sequential approach, then:
1) awk to reformat
awk -F '|' '{printf "%s|%s|%s|%s|%s|%s|%s|%02d|%s\n", $1, $2, $3, $4, $5, $6, $7, $8, $9}' SOURCE_FILE > bigFile.txt
2) loop through the weeks, create small file an zip it
for N in {01..53}
do
grep "|${N}|" bigFile.txt > smallFile.${N}.txt
gzip smallFile.${N}.txt
done
3) test script showing reformat step
#!/bin/bash
function show_data {
# Data set w/9 'fields'
# 1| 2 |3|4|5|6|7| 8|9
cat << EOM
1|#gmail|1|0|0|0|1|01|com
1|#gmail|1|0|0|0|1|2|com
1|#gmail|1|0|0|0|1|5|com
1|#yahoo|0|1|0|0|0|27|com
EOM
}
###
function stars {
echo "## $# ##"
}
###
stars "Raw data"
show_data
stars "Modified data"
# 1| 2| 3| 4| 5| 6| 7| 8|9 ##
show_data | awk -F '|' '{printf "%s|%s|%s|%s|%s|%s|%s|%02d|%s\n", $1, $2, $3, $4, $5, $6, $7, $8, $9}'
Sample run:
$ bash test.sh
## Raw data ##
1|#gmail|1|0|0|0|1|01|com
1|#gmail|1|0|0|0|1|2|com
1|#gmail|1|0|0|0|1|5|com
1|#yahoo|0|1|0|0|0|27|com
## Modified data ##
1|#gmail|1|0|0|0|1|01|com
1|#gmail|1|0|0|0|1|02|com
1|#gmail|1|0|0|0|1|05|com
1|#yahoo|0|1|0|0|0|27|com

How can I echo a line once, then keep the rest the way they are, in unix bash?

I have the following command:
(for i in 'cut -d "," -f1 file.csv | uniq`; do var =`grep -c $i file.csv';if (($var > 1 )); then echo " you have the following repeated numbers" $i ; fi ; done)
The output that I get is:
You have the following repeated numbers 455
You have the following repeated numbers 879
You have the following repeated numbers 741
what I want is the following output:
you have the following repeated numbers:
455
879
741
Try moving the echo of the header line before the for-loop:
(echo " you have the following repeated numbers"; for i in `cut -d "," -f1 file.csv | uniq`; do var=`grep -c $i file.csv`; if (( $var > 1 )); then echo $i; fi; done)
Or only print the header once :
(header=" you have the following repeated numbers\n"; for i in 'cut -d "," -f1 file.csv | uniq`; do var =`grep -c $i file.csv';if (($var > 1 )); then echo -e $header$i ; header=""; fi ; done)
Well, here's what I came up with:
1) generated input for testing
for x in {1..35},aa,bb ; do echo $x ; done > file.csv
for x in {21..48},aa,bb ; do echo $x ; done >> file.csv
for x in {32..63},aa,bb ; do echo $x ; done >> file.csv
unsort file.csv > new.txt ; mv new.txt file.csv
2) your line (corrected syntax errors)
dtpwmbp:~ pwadas$ for i in $(cut -d "," -f1 file.csv | uniq);
do var=`grep -c $i file.csv`; if [ "$var" -ge 1 ] ;
then echo " you have the following repeated numbers" $i ; fi ; done | head -n 10
you have the following repeated numbers 8
you have the following repeated numbers 41
you have the following repeated numbers 18
you have the following repeated numbers 34
you have the following repeated numbers 3
you have the following repeated numbers 53
you have the following repeated numbers 32
you have the following repeated numbers 33
you have the following repeated numbers 19
you have the following repeated numbers 7
dtpwmbp:~ pwadas$
3) my line:
dtpwmbp:~ pwadas$ echo "you have the following repeated numbers:";
for i in $(cut -d "," -f1 file.csv | uniq); do var=`grep -c $i file.csv`;
if [ "$var" -ge 1 ] ; then echo $i ; fi ; done | head -n 10
you have the following repeated numbers:
8
41
18
34
3
53
32
33
19
7
dtpwmbp:~ pwadas$
I added quotes, changed the if (( )) to a [ ... ] expression, and finally moved the description sentence out of the loop. The number of occurrences tested is the digit next to the -ge condition. If it is 1, then numbers which appear once or more are printed. Note that with this expression, if the file contains e.g. the numbers
8
12
48
then "8" is listed in output as appearing twice. with "-ge 2", if no digits appear more than once, no output (except heading) is printed.
