Get the line number of word number X in a file - bash

I need to make a shell script that splits every CSV file (the files use \n as the separator); the limit per output file is a number of words, and I can't cut a line in half.
Finished script, with the help of a wizard!
Example:
sh SliceByWords.sh 1000 .
Slices every file by 1000 words and puts every part into a subfolder.
function has_number_number_of_words {
    re='^[0-9]+$'
    if ! [[ $1 =~ $re ]]; then
        echo "error: Not a number, please run the command with the number of words per file" >&2; exit 1
    fi
}
#MAIN
has_number_number_of_words "$1"
declare -i WORDLIMIT=$1 # number of words per output part
subdir="Result"
mkdir "$subdir"
format=*.csv
for name in $format; do mv "$name" "${name// /___}"; done
for i in $format;
do
    if [[ "$i" == "$format" ]]
    then
        echo "No Files"
    else
        (
            FILENAMEWITHOUTEXTENSION="${i%.*}"
            subnoext="$subdir/$FILENAMEWITHOUTEXTENSION"
            echo Processing file "$FILENAMEWITHOUTEXTENSION"
            awk -v NOEXT="$subnoext" -v wl="$WORDLIMIT" -F" " 'BEGIN{fn=1}{c+=NF}{sv=NOEXT"_snd_"fn".csv";print $0>sv;}c>wl{c=0;++fn;close(sv);}' "$i"
        ) &
    fi
done
wait # wait for all background jobs to finish
for name in $format; do mv "$name" "${name//___/ }"; done
echo All files done.
Since I couldn't figure out how to pass file names containing spaces to awk, I rename them first and restore the original names afterwards with:
for name in $format; do mv "$name" "${name//___/ }"; done
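For what it's worth, quoting "$i" everywhere it is used lets awk receive names with spaces directly, which would make the rename round-trip unnecessary; a minimal sketch of the inner loop under that assumption:
for i in *.csv; do
    (
        noext="${i%.*}"
        # NOEXT may now contain spaces; quoting keeps it intact
        awk -v NOEXT="$subdir/$noext" -v wl="$WORDLIMIT" -F" " 'BEGIN{fn=1}{c+=NF}{sv=NOEXT"_snd_"fn".csv";print $0>sv;}c>wl{c=0;++fn;close(sv);}' "$i"
    ) &
done
wait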

I think this would be a lot easier to handle with awk:
awk -F" " 'BEGIN{filenumber=1}{counter+=NF}{print $0 > FILENAME"_part_"filenumber} counter>1000{counter=0;++filenumber}' yourinputfile
awk here is:
Splitting each line by space: -F" "
Before processing the file, setting the filenumber variable to 1: BEGIN{filenumber=1}
Bumping the counter variable by the number of fields in the line: {counter+=NF}
Printing the line to a file numbered by that variable, using the FILENAME built-in variable to pull through yourinputfile: {print $0 > FILENAME"_part_"filenumber}
If the counter has popped over 1000, sending it back to 0 and bumping the filenumber variable by 1: counter>1000{counter=0;++filenumber}
Minimized a bit:
awk -F" " 'BEGIN{fn=1}{c+=NF}{print $0>FILENAME"_part_"fn}c>1000{c=0;++fn}' yourinputfile

Related

Simple bash program which compares values

I have a file which contains various data (date, time, speed, distance from the front, distance from the back); the file looks like this, just with more rows:
2003.09.23.,05:05:21:64,134,177,101
2009.03.10.,17:46:17:81,57,102,57
2018.01.05.,00:30:37:04,354,145,156
2011.07.11.,23:21:53:43,310,125,47
2011.06.26.,07:42:10:30,383,180,171
I'm trying to write a simple Bash program which prints the dates and times when the 'distance from the front' is less than the provided parameter ($1).
So far I wrote:
#!/bin/bash
if [ $# -eq 0 -o $# -gt 1 ]
then
    echo "wrong number of parameters"
fi
i=0
fdistance=()
input='auto.txt'
while IFS= read -r line
do
    year=${line::4}
    month=${line:5:2}
    day=${line:8:2}
    hour=${line:12:2}
    min=${line:15:2}
    sec=${line:18:2}
    hthsec=${line:21:2}
    fdistance=$(cut -d, -f 4)
    if [ "$fdistance[$i]" -lt "$1" ]
    then
        echo "$year[$i]:$month[$i]:$day[$i],$hour[$i]:$min[$i]:$sec[$i]:$hthsec[$i]"
    fi
    i=`expr $i + 1`
done < "$input"
but this gives the error "whole expression required" and doesn't work at all.
If you have the option of using awk, the entire process can be reduced to:
awk -F, -v dist=150 '$4<dist {split($1,d,"."); print d[1]":"d[2]":"d[3]","$2}' file
In the example above, any record with a distance (field 4, $4) less than the dist variable value has its date field (field 1, $1) split() into the array d on ".", so the first 3 elements are year, mo, day; those three elements are then printed separated by ":" (which eliminates the stray "." at the end of the field). The time (field 2, $2) is output unchanged.
Example Use/Output
With your sample data in file, you can do:
$ awk -F, -v dist=150 '$4<dist {split($1,d,"."); print d[1]":"d[2]":"d[3]","$2}' file
2009:03:10,17:46:17:81
2018:01:05,00:30:37:04
2011:07:11,23:21:53:43
Which provides the records in the requested format where the distance is less than 150. If you call awk from within your script you can pass the 150 in from the 1st argument to your script.
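As a sketch of that last point (the script name is hypothetical; auto.txt is taken from your script), the threshold can simply be forwarded to awk:
#!/bin/bash
# front_filter.sh (hypothetical name): pass the script's 1st argument
# through to awk as the dist variable.
awk -F, -v dist="$1" '$4<dist {split($1,d,"."); print d[1]":"d[2]":"d[3]","$2}' auto.txt
Then ./front_filter.sh 150 produces the output shown above.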
You can also accomplish this task by substituting a ':' for each '.' in the first field with gsub() and outputting a substring of the first field with substr() that drops the last character, e.g.
awk -F, -v dist=150 '$4<dist {gsub(/[.]/,":",$1); print substr($1,1,length($1)-1)","$2}' file
(same output)
While parsing the data is a great exercise for learning string handling in shell or bash, in practice awk will be orders of magnitude faster than a shell script. When processing a million-line file, the difference in runtime can be seconds with awk compared to minutes (or hours) with a shell script.
If this is an exercise to learn string handling in your shell, just put this in your hip pocket for later, with the understanding that awk is the real Swiss Army knife for text processing. (It is well worth the effort to learn.)
Would you try the following:
#!/bin/bash
if (( $# != 1 )); then
    echo "usage: $0 max_distance_from_the_front" >&2   # output error message to stderr
    exit 1
fi
input="auto.txt"
while IFS=, read -r mydate mytime speed fdist bdist; do   # split csv and assign variables
    mydate=${mydate%.}; mydate=${mydate//./:}             # reformat the date string
    if (( fdist < $1 )); then                             # if the front distance is less than $1
        echo "$mydate,$mytime"                            # then print the date and time
    fi
done < "$input"
Sample output with the same parameter as Keldorn:
$ ./test.sh 130
2009:03:10,17:46:17:81
2011:07:11,23:21:53:43
There are a few odd things in your script:
Why is fdistance an array? It is not necessary (and done wrong here), since the file is read line by line.
What is the cut in the line fdistance=$(cut -d, -f 4) supposed to cut? What is its input?
(Note: with invalid parameters, it is better to end the script right away. This is added in the example below.)
Here is a working version (apart from the parsing of the date, but that is not what your question was about, so I skipped it):
#!/usr/bin/env bash
if [ $# -eq 0 -o $# -gt 1 ]
then
    echo "wrong number of parameters"
    exit 1
fi
input='auto.txt'
while IFS= read -r line
do
    fdistance=$(echo "$line" | awk '{split($0,a,","); print a[4]}')
    if [ "$fdistance" -lt "$1" ]
    then
        echo "$line"
    fi
done < "$input"
Sample output:
$ ./test.sh 130
2009.03.10.,17:46:17:81,57,102,57
2011.07.11.,23:21:53:43,310,125,47
$

Remove Files of given range in bash/shell/unix

I have files of below name format.
test_1_452161_987654321.ARC
test_1_452162_987654321.ARC
test_1_452163_987654321.ARC
.
.
.
test_1_452190_987654321.ARC
I.e. I need to delete 30 files in the above case, where the user gives the inputs below:
Start_File = 452161
End_File = 452190
How do I delete this series of files? (I am new to bash/shell programming; I tried using a while loop with rm, which didn't work, so I'm seeking experts' help.)
My Code
echo "Enter Start File"
read start
echo "Enter End File"
read end
i=$start
j=$end
while [$i -le $j]; do
    rm *$i*.ARC
    ((i++))
done
Replace everything after the fourth line with:
for ((i=$start; i<=$end; i++)); do
    echo rm *_${i}_*.ARC
done
If everything looks fine, remove the echo.
You can use awk + xargs:
printf "%s\n" test*ARC | awk -v s=452161 -v e=452190 -F_ '$3 >= s && $3 <= e' | xargs rm
awk splits the file names on _ and then checks that the 3rd field is within the start/end range.
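If the range might match nothing, a slightly more defensive variant (assuming GNU xargs, whose -r flag skips running rm when the input is empty):
printf "%s\n" test*ARC | awk -v s=452161 -v e=452190 -F_ '$3 >= s && $3 <= e' | xargs -r rm --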
You may be interested in sequence expansion (see the additional documentation). The code below will echo the file names you are interested in, based on the provided start and end input.
echo -n "Enter Start File => "; read start
echo -n "Enter End File => "; read end
for file in *{$start..$end}*
do
    echo $file
done
EDIT: This expansion of variables inside a brace range works in zsh but not in bash.
Just to add to your options, you can use seq:
echo "Enter Start File"
read start
echo "Enter End File"
read end
for i in $(seq $start $end)
do
    rm -f *_${i}_*.ARC
done
To me this seems clean and clear.

I want to write a shell script to find the smallest filename (based on string length) in the current directory

I want to write a shell script to find the smallest filename (based on string length) in the current directory.
#!/bin/bash
data=$(ls -trh *)
max=0;
for entry in ${data}
do
    echo ${entry}
    len=${#entry}
    echo ${len}
    max1=${len}
    echo ${max1}
    echo ${max}
    if( $max1 -gt $max )
    then
        word=$entry
        max=$max1;
    fi;
done
Don't parse ls -- you want:
for entry in *
Quote "$entry" everywhere you use it: echo "$entry"
The syntax for your if condition is wrong:
if [[ $max1 -gt $max ]]
Check your script at http://www.shellcheck.net/
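Putting those fixes together (and noting that your max/-gt logic as written tracks the longest name; the smallest needs the opposite comparison), a minimal sketch:
#!/bin/bash
# Print the shortest filename in the current directory (keeps the first
# shortest it sees; the awk answer below prints all ties instead).
min=-1
for entry in *; do
    len=${#entry}
    if [[ $min -eq -1 || $len -lt $min ]]; then
        word=$entry
        min=$len
    fi
done
echo "$word"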
@forte: try:
ls | awk '{LEN=length($0);B[LEN]=B[LEN]?B[LEN] ORS $0:$0;MIN=MIN<LEN?(MIN?MIN:LEN):LEN} END{print B[MIN]}'
The above should give you the smallest-named file(s); even if more than one file ties for the smallest length, it will print them all.
EDIT1: Adding a non-one-liner form of the solution too.
ls | awk '{
    LEN=length($0);
    B[LEN]=B[LEN]?B[LEN] ORS $0:$0;
    MIN=MIN<LEN?(MIN?MIN:LEN):LEN
}
END{
    print B[MIN]
}
'
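For example, with three hypothetical files where two names tie for the shortest length, both are printed:
$ ls
a.txt  b.txt  longer.txt
$ ls | awk '{LEN=length($0);B[LEN]=B[LEN]?B[LEN] ORS $0:$0;MIN=MIN<LEN?(MIN?MIN:LEN):LEN} END{print B[MIN]}'
a.txt
b.txt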

issue with if statement in bash

I have an issue with an if statement. A log file whose name is stored in WEDI_RC has the following format:
name_of_file date number_of_starts
I want to compare the first argument $1 with the first column and, if it matches, increment the number of starts. When I run my script it works, but only within one unbroken run of the same file, e.g.:
file1.c 11:23:07 1
file1.c 11:23:14 2
file1.c 11:23:17 3
file1.c 11:23:22 4
file2.c 11:23:28 1
file2.c 11:23:35 2
file2.c 11:24:10 3
file2.c 11:24:40 4
file2.c 11:24:53 5
file1.c 11:25:13 1
file1.c 11:25:49 2
file2.c 11:26:01 1
file2.c 11:28:12 2
Every time I switch to a different file, the count starts again from 1. I need it to continue counting from where it left off for that file.
I hope you understand me.
while read -r line
do
    echo "line:"
    echo $line
    if [ "$1"="$($line | grep ^$1)" ]; then
        number=$(echo $line | grep $1 | awk -F'[ ]' '{print $3}')
    else
        echo "error"
    fi
done < $WEDI_RC
echo "file"
((number++))
echo $1 `date +"%T"` $number >> $WEDI_RC
There are at least two ways to resolve the problem. The most succinct is probably:
echo "$1 $(date +"%T") $(($(grep -c "^$1 " "$WEDI_RC") + 1))" >> "$WEDI_RC"
However, if you want to have counts for each file separately, you can do that using an associative array, assuming you have Bash version 4.x (not 3.x as is provided on Mac OS X, for example). This code assumes the file is correctly formatted (so that the counts do not reset to 1 each time the file name changes).
declare -A files                       # Associative array
while read -r file time count          # Split line into three variables
do
    echo "line: $file $time $count"    # One echo - not two
    files[$file]="$count"              # Record the current maximum for file
done < "$WEDI_RC"
echo "$1 $(date +"%T") $(( ${files[$1]} + 1 ))" >> "$WEDI_RC"
The code uses read to split the line into three separate variables. It echoes what it read and records the current count. When the loop's done, it echoes the data to append to the file. If the file is new (not mentioned in the file yet), then you will get a 1 added.
If you need to deal with the broken file as input, then you can amend the code to count the number of entries for a file, instead of trusting the count value. The bare-array reference notation used in the (( … )) operation is necessary when incrementing the variable; you can't use ${array[sub]}++ with the increment (or decrement) operator because that evaluates to the value of the array element, not its name!
declare -A files                       # Associative array
while read -r file time count          # Split line into three variables
do
    echo "line: $file $time $count"    # One echo - not two
    ((files[$file]++))                 # Count the occurrences of file
done < "$WEDI_RC"
echo "$1 $(date +"%T") $(( ${files[$1]} + 1 ))" >> "$WEDI_RC"
You can even detect whether the format is in the broken or fixed style:
declare -A files                       # Associative array
while read -r file time count          # Split line into three variables
do
    echo "line: $file $time $count"    # One echo - not two
    if [ $((++files[$file])) != "$count" ]   # pre-increment so the first entry compares as 1
    then echo "$0: warning - count out of sync: ${files[$file]} vs $count" >&2
    fi
done < "$WEDI_RC"
echo "$1 $(date +"%T") $(( ${files[$1]} + 1 ))" >> "$WEDI_RC"
I don't get exactly what you want to achieve with your test [ "$1"="$($line | grep ^$1)" ], but it seems you are checking that the line starts with the first argument.
If so, I think you can either:
provide the -o option to grep so that it prints just the matched output (i.e. $1), or
use [[ "$line" =~ ^"$1" ]] as the test.

How to printf a variable length line in fixed length chunks?

I need to analyze (with grep) and print (with some formatting) the content of an app's log.
This log contains text data in variable-length lines. What I need is, after some grepping, to loop over each line of this output and print it with a maximum fixed length of 50 characters: if a line is longer than 50 chars, print a newline and continue with the rest on the following line, and so on until the line is completed.
I tried to use printf to do this, but it's not working and I don't know why. It just outputs the lines in the same fashion as echo, without any regard for the printf formatting, though the \t character (tab) works.
function printContext
{
    str="$1"
    log="$2"
    tmp="/tmp/deluge/$$"
    rm -f $tmp
    echo ""
    echo -e "\tLog entries for $str :"
    ln=$(grep -F "$str" "$log" &> "$tmp" ; cat "$tmp" | wc -l)
    if [ $ln -gt 0 ];
    then
        while read line
        do
            printf "\t%50s\n" "$line"
        done < $tmp
    fi
}
What's wrong? I know that I could write a substring routine to accomplish this task, but printf should be handy for stuff like this.
Instead of:
printf "\t%50s\n" "$line"
use
printf "\t%.50s\n" "$line"
to truncate your line to 50 characters only.
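A quick way to see the truncation (the test string here is hypothetical):
line=$(printf 'x%.0s' {1..60})   # build a 60-character test string
printf "\t%.50s\n" "$line"       # prints a tab and only the first 50 x's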
I'm not sure about printf, but seeing as how perl is installed everywhere, how about a simple one-liner?
echo "$line" | perl -ne 'while (m/.{1,50}/g) { print "$&\n" }'
Here's a clunky bash-only way to break a string into 50-character chunks:
y="$line"   # the string to split
i=0
chars=50
while [[ -n "${y:$((chars*i)):$chars}" ]]; do
    printf "\t%s\n" "${y:$((chars*i)):$chars}"
    ((i++))
done
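Dropped into the question's printContext function, the read loop might then look like this (a sketch; $tmp and the tab prefix are taken from the original):
chars=50
while read -r line; do
    i=0
    while [[ -n "${line:$((chars*i)):$chars}" ]]; do
        printf "\t%s\n" "${line:$((chars*i)):$chars}"
        ((i++))
    done
done < "$tmp"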
