Shell script to read each line and run the condition - shell

I have a log file with the data below:
120
140
200
110
200
200
120
90
100
I want to read this file and compare each line (a number) with 200. If a value crosses 200, the script should keep checking the following values; if 5 consecutive values cross 200, it should send an alert, otherwise the script should end.
Please help
Thanks,

Are you saying that you want to detect when 5 consecutive rows contain a value greater than 200? If so:
awk '{a = $1 > lim ? a + 1 : 0}
a >= seq {print "alert on line " NR}' lim=200 seq=5 input
It's not clear what you actually want, and perhaps you want to use >= rather than > in the above.
This simply reads through the file named input and checks whether each number is greater than 200 (the value given to lim). If it is, it increments a counter; otherwise it resets the counter to zero. When that counter reaches seq, it prints a message.
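If you would rather have a standalone shell script than an awk one-liner, a minimal sketch using a while read loop could look like this (the file name logfile and the exact alert action are assumptions to adapt):

#!/bin/sh
# Count consecutive values above the threshold; alert once 5 in a row are seen.
# "logfile" is a placeholder for your actual log file name.
threshold=200
needed=5
count=0
while IFS= read -r value; do
    if [ "$value" -gt "$threshold" ]; then
        count=$((count + 1))
        if [ "$count" -ge "$needed" ]; then
            echo "alert: $needed consecutive values above $threshold"
            exit 1
        fi
    else
        count=0
    fi
done < logfile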

Related

How to sum from 100 files - bash/awk?

I need to sum up values from 100 files. This is part of my input
suma_wiazan_wodorowych_2_1.txt
2536
1928
1830
1774
1732
1673
1620
suma_wiazan_wodorowych_2_101.txt (the number in each file name increases by 100, so 1, 101, 201, etc.)
2535
1987
1895
1829
1805
1714
1657
So my script should add the first row of the first file, the first row of the second file, and so on across all one hundred files:
2535+2536+..+..+2621
and likewise the second row of the first file plus the second row of the second file, etc.
Every file is 5000 rows long (so I will have 5000 sums).
Do you have any idea?
A one-liner using paste and bc:
paste -d + suma_wiazan_wodorowych_2_* | bc
assuming the lines contain only bare numbers without a leading + (negative numbers, that is, numbers with a single leading -, are fine), and the files all have the same number of lines.
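To see what bc is fed, here is the intermediate output of paste on just the two sample files above (with all 100 files each line becomes a 100-term sum):
$ paste -d + suma_wiazan_wodorowych_2_1.txt suma_wiazan_wodorowych_2_101.txt
2536+2535
1928+1987
1830+1895
1774+1829
1732+1805
1673+1714
1620+1657
Piping that into bc evaluates each line, printing 5071, 3915, and so on.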
With awk:
$ awk '{sum[FNR]+=$1} END{for(i=1;i<=FNR;i++) print sum[i]}' file*
This sums the corresponding values across all input files and prints the totals at the end (adjust the file* glob to match your file names, e.g. suma_wiazan_wodorowych_2_*).

Bash script to split a file into n files with each file containing x number of records

I have a requirement where I need to write a bash script to split a single input file into 'n' files, and each file should not contain more than 'x' records (except the last file, which will hold everything remaining). The values of 'n' and 'x' will be passed to the script as arguments by the user.
n should be the total number of split files
x should be the maximum number of records in a split file (except the last file).
Suppose the input file has 5000 records and the user passes n and x as 3 and 1000; then files 1 and 2 should contain 1000 records each and file 3 should contain 3000 records.
Another example: if the input file has 4000 records and the user passes n and x as 2 and 3000, then file 1 should contain 3000 records and file 2 should contain 1000 records.
I tried the below command:
split -n$maxBatch -l$batchSize --numeric-suffixes $fileDir/$nzbnListFileName $splitFileName
But it throws an error saying the split cannot be done in more than one way.
Please advise.
You need to give either the -n parameter or the -l parameter, not both of them together:
split -l1000 --numeric-suffixes yourFile.txt
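With GNU split's defaults (prefix x, two-digit suffixes), that writes 1000-line chunks named x00, x01, x02, and so on; you would then have to merge the trailing chunks yourself if you also need to cap the number of files at n.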
Sounds like split isn't enough for your requirements then - it can do either files of X lines each, or N files, but not the combination. Try something like this:
awk -v prefix="$splitFileName" -v lines="$x" -v maxfiles="$n" '
    (NR - 1) % lines == 0 && fileno < maxfiles { fileno += 1 }
    { print >> (prefix fileno) }' input.txt
That increments a counter every X lines up to N times, and writes lines to a file whose name depends on the counter.
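Wrapped into a small script so that the input file, n, and x arrive as arguments (the script name, the argument order, and the part_ output prefix are assumptions):

#!/bin/bash
# Hypothetical usage: ./split_records.sh input.txt 3 1000
infile=$1   # file to split
n=$2        # maximum number of split files
x=$3        # records per split file, except the last
awk -v prefix=part_ -v lines="$x" -v maxfiles="$n" '
    (NR - 1) % lines == 0 && fileno < maxfiles { fileno += 1 }
    { print >> (prefix fileno) }' "$infile"

With the first example (5000 records, n=3, x=1000) this writes part_1 and part_2 with 1000 records each and part_3 with the remaining 3000.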

How to split big tsv file using unique column element and also keep header

I have a tsv file called myfile.tsv. I want to split this file based on the unique elements in the chr column, using awk/gawk/bash or any faster command-line tool, and get chr1.tsv (header + row 1), chr2.tsv (header + rows 2 and 3), chrX.tsv (header + row 4), chrY.tsv (header + rows 5 and 6) and chrM.tsv (header + the last row).
myfile.tsv
chr value region
chr1 223 554
chr2 433 444
chr2 443 454
chrX 445 444
chrY 445 443
chrY 435 243
chrM 543 544
Here's a little script that does what you're looking for:
NR == 1 {
    header = $0
    next
}
{
    outfile = $1 ".tsv"
    if (!seen[$1]++) {
        print header > outfile
    }
    print > outfile
}
The first row is saved so it can be used later. Each other line is printed to a file named after the value of its first field, and the header is added first if that value hasn't been seen yet.
NR is the record number, so NR == 1 is only true when the record number is one (i.e. the first line). In this block, the whole line $0 is saved to the variable header. next skips any other blocks and moves to the next line. This means that the second block (which would otherwise be run unconditionally on every record) is skipped.
For every other line in the file, the output filename is built from the value of the first field. The array seen keeps track of the values of $1. !seen[$1]++ is only true the first time a given value of $1 is seen, because seen[$1] is incremented every time it is checked. If the value of $1 has not been seen yet, the header is written to the output file first.
Every line is then written to its output file.
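Assuming the script is saved as, say, split_by_chr.awk (the file name is just a placeholder), it can be run as:
$ awk -f split_by_chr.awk myfile.tsv
$ cat chr2.tsv
chr value region
chr2 433 444
chr2 443 454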

Difference between two files after average of selected entries using shell script or awk

I have two files. Each has one column, with missing data coded as 9999 or 9000, e.g.
ifile1.txt ifile2.txt
30 20
9999 10
10 40
40 30
10 31
29 9000
9000 9999
9999 9999
31 1250
550 29
I would like to calculate the difference between the averages of the values (which are > 10) in the above two files without considering the missing values. i.e.
average (the entries > 10 in ifile1.txt) - average (the entries > 10 in ifile2.txt)
Kindly note: the average should be taken over the selected values only, i.e. those that are > 10, e.g. for ifile1.txt:
(30+40+29+31+550)/5
I asked a similar question here, Difference between two files after average using shell script or awk, and tried the following, but I am getting an error.
awk '($0>10) && !/9000|9999/{a[ARGIND]+=$0;b[ARGIND]++}END{print a[1]/b[1]-a[2]/b[2]}' file1 file2
Try this awk:
awk '$1>10 && $1 !~ /^(9000|9999)$/{a[ARGIND]+=$1; b[ARGIND]++}
END{printf "%.2f\n", a[1]/b[1]-a[2]/b[2]}' ifile[12].txt
Output:
-97.33
awk '$1>10 && !/^9999$|^9000$/ {if(NR==FNR) {s1+=$1;n1++} else {s2+=$1;n2++}} END {print s1/n1 - s2/n2}' file1 file2
For the first file (NR==FNR), for values greater than 10 and values not exactly equal to 9999 or 9000, add the values to variable s1. Also increment the count variable n1. So s1/n1 gives average for the first file. Similarly for the second file (NR!=FNR), update variables s2 and n2. In the END block, print the difference of the averages.
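As a check against the sample data: the values kept from ifile1.txt are 30, 40, 29, 31 and 550, averaging 680/5 = 136; the values kept from ifile2.txt are 20, 40, 30, 31, 1250 and 29, averaging 1400/6 ≈ 233.33; 136 - 233.33 ≈ -97.33, which matches the output shown above.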

Merging text files in csh while preserving row placement

I have about 100 text files with two columns that I would like to merge into a single file in a C shell script, using factor "A" as the key.
For example, I have file A that looks like this
A B1
1 100
2 200
3 300
4 400
and File B looks like this
A B2
1 100
2 200
3 300
4 400
5 300
6 400
I want the final file C to look like this:
A B1 B2
1 100 100
2 200 200
3 300 300
4 400 400
5 300
6 400
The cat command only puts the files one after another into file C. I would like to put the data next to each other instead. Is this possible?
To meet your exact spec, this will work. If the spec changes, you'll need to adapt it a little:
paste -d' ' factorA factorB \
| awk 'NF==4||NF==3{print $1, $2, $NF} NF==2{print $1, $2}' \
> factorC
# note: no spaces or tabs after the continuation chars `\` at the ends of lines!
output
$ cat factorC
A B1 B2
1 100 100
2 200 200
3 300 300
4 400 400
5 300
6 400
Not sure how you get bold headers to "transmit" through Unix pipes. ;->
Recall that awk programs all have a basic underlying structure, i.e.
awk 'pattern{action}' file
So pattern can be a range of lines, a reg-exp, an expression (NF==4), missing, or a few other things.
The action is what happens when the pattern is matched. This is more traditional looking code.
If no pattern is specified, the action applies to every line read. If no action is specified but the pattern matches, the line is printed (without further ado).
NF means NumberOfFields in the current line, so NF==2 will only process lines with 2 fields (the trailing records from factorB), and $NF is the last field on the line.
The || is a logical OR operator, so the first block only processes records where the number of fields is 3 OR 4. Hopefully, the print statements are self-explanatory.
The , separating $1, $2, $NF (for example) tells awk to join the output fields with its internal variable OFS, the OutputFieldSeparator. OFS can be assigned explicitly, e.g. OFS="\t" for a tab character, but here we leave it alone, so we get the default value: a single space character (" ").
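For instance, a quick sketch of overriding OFS (run against factorA just to illustrate):
awk 'BEGIN{OFS="\t"} {print $1, $2}' factorA
This reprints factorA's two columns separated by tabs instead of spaces.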
IHTH
