How do I check for blank fields on a delimited line with sed or awk (bash)

I'm parsing source input files with a bash script and generating delimited output in a file. I need a way to check that each field of the delimited output is populated. For example, AA,BB,3,4,5,6,7,8 would be good and AA,,3,4,5,6,,8 would be bad. How do I check whether there are blank fields on a line using sed/awk or some other tool I can put in a bash script? Thanks in advance!

With bash:
string='AA,,3,4,5,6,,8'
if [[ $string =~ ^,|,,|,$ ]]; then
    echo "error"
else
    echo "okay"
fi
Output:
error
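To apply the same check to every line of a file, a minimal sketch (the filename output.csv is a placeholder for your generated output):
while IFS= read -r line; do
    if [[ $line =~ ^,|,,|,$ ]]; then
        echo "bad line: $line"
    fi
done < output.csv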

You can print the lines with at least one empty field using:
awk -F, '{for (i=1;i<=NF;i++) if ($i=="") {print; next}}'
-F, sets the field delimiter as ,
for (i=1;i<=NF;i++) iterates over the fields
if ($i=="") {print; next} prints the record if the field being tested is empty and goes to the next record
Example:
% cat file.txt
AA,BB,3,4,5,6,7,8
AA,,3,4,5,6,,8
% awk -F, '{for (i=1;i<=NF;i++) if ($i=="") {print; next}}' file.txt
AA,,3,4,5,6,,8
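If you only need a pass/fail result for the whole file rather than the offending lines, a variation on the same awk program (a sketch) exits non-zero at the first empty field:
awk -F, '{for (i=1;i<=NF;i++) if ($i=="") exit 1}' file.txt && echo "okay" || echo "error"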

You can test with a regular expression that uses a repeating group matching your requirement:
grep -E '^([^,]+,)*[^,]+$' <<< 'AA,,3,4,5,6,,8'
Testcode:
for str in "AA,BB,3,4,5,6,7,8" "AA,,3,4,5,6,,8" ; do
    echo "==========="
    echo "Testing >>>${str}<<<"
    grep -Eq '^([^,]+,)*[^,]+$' <<< "${str}" || echo "String incorrect"
done
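Expected output:
===========
Testing >>>AA,BB,3,4,5,6,7,8<<<
===========
Testing >>>AA,,3,4,5,6,,8<<<
String incorrect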
You can grep the incorrect lines from a file using
grep -vE '^([^,]+,)*[^,]+$' inputfile
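To turn that into a pass/fail check inside a script (a sketch):
if grep -qvE '^([^,]+,)*[^,]+$' inputfile; then
    echo "error: line(s) with blank fields found"
fi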

Related

Unable to get second column using awk

I have a file that contains three columns separated by four spaces
1234    567    q
1902    190    r
I'm trying to get the second column by searching for the first column string
i=`grep $str $file | awk -F "[ ]" '{print $2 }'`
j=`grep $str $file | awk -F "[ ]" '{print $3 }'`
echo second_col=$i
echo third_col=$j
I modified the file and used tab and comma as separators but I'm still unable to print the second or third column values for a particular string.
What am I doing wrong?
I'm trying to get the second column by searching for the first column string
If you don't have spaces in your columns then you can just use awk for this:
awk -v str="$str" '$1 ~ str { print $2 }' "$file"
awk automatically splits fields on whitespace.
In case you have spaces in your column value then use:
awk -F ' {4}' -v str="$str" '$1 ~ str { print $2 }' "$file"
' {4}' is a regex that makes 4 spaces the input field separator.
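For example, with the sample file above (assuming the columns really are separated by exactly four spaces and the file is named file):
$ awk -F ' {4}' -v str="1234" '$1 ~ str { print $2 }' file
567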
Reference: Effective AWK Programming
If you have a broken awk, try this sed solution:
sed -nE 's/^1234\s+(\S+).*/\1/p' file
It finds the pattern at the beginning of the line and prints the next non-space field. If your fields include spaces, this approach is not going to work.

Shell Script - Extract number at X column in current line in file

I am reading a file (test.log.csv) line by line until the end of the file, and I want to extract the value in the 4th column of the current line, then append that value to a text file (output.txt).
For example, say I have just read the 2nd line (INSERT,SLT_TEST_1,TEST,1127192896,0,DEBUG1); I want to extract the number in column 4 of that line and write it to output.txt.
test.log.csv
INSERT,SLT_TEST_1,TEST,1127192896,0,DEBUG1
INSERT,SLT_TEST_1,TEST,1127192896,0,DEBUG1
INSERT,SLT_TEST_1,TEST,1127192896,0,DEBUG1
The desired output is
output.txt
1127192896
1127192896
1127192896
Right now my script is as below
#! /bin/bash
clear
rm /home/mobaxterm/Script/output.txt
while IFS= read -r line
do
if [[ $line == *"INSERT"* ]] && [[ $line == *"$1"* ]]
then
echo $line >> /home/mobaxterm/Script/output.txt
lastID=$(awk -F "," '{if (NR==curLine) { print $4 }}' curLine="${lineCount}")
echo $lastID
else
if [ lastID == "$1" ]
then
echo $line >> /home/mobaxterm/Script/output.txt
fi
fi
lineCount=$(($lineCount+1))
done < "/home/mobaxterm/Script/test.log.csv"
The parameter ($1) will be 1127192896
I tried declaring a counter in the loop and comparing NR with the counter, but the script just stopped after it found the first one.
Find all the lines where the 4th field is 1127192896 and output the 4th field:
awk -F, -v SEARCH="1127192896" '$4 ~ SEARCH {print $4}' test.log.csv
1127192896
1127192896
1127192896
Find all the lines containing the word "INSERT" and where the 4th field is 1127192896
awk -F, -v SEARCH="1127192896" '$4 ~ SEARCH && /INSERT/ {print $4}' test.log.csv
If you have the number you want to look for in a variable called $1, put that in place of the 1127192896, like this:
awk -F, -v SEARCH="$1" '$4 ~ SEARCH && /INSERT/ {print $4}' test.log.csv
You can combine variable substitution with an array definition.
array_variable=( ${line//,/ } )
sth_you_need=${array_variable[1]}
Or you can just use awk/cut:
sth_you_need=$(echo $line | awk -F, '{print $2}')
# or
sth_you_need=$(echo $line | cut -d, -f2)
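Another pure-bash option for a single line is read with a comma IFS (a sketch; the variable names are placeholders):
IFS=, read -r col1 col2 col3 col4 rest <<< "$line"
echo "$col4"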

printing results in one line separated by commas in bash

How can I print all the text file locations on one line, separated by commas? Can I do this in a for loop?
Here is an example of files.
/data/home/files/txt_files_1/file1.txt
/data/home/files/txt_files_1/file2.txt
/data/home/files/txt_files_1/file3.txt
/data/home/files/txt_files_2/file1.txt
/data/home/files/txt_files_2/file2.txt
/data/home/files/txt_files_2/file3.txt
output would look like
/data/home/files/txt_files_1/file1.txt,/data/home/files/txt_files_1/file2.txt,/data/home/files/txt_files_1/file3.txt \
/data/home/files/txt_files_2/file1.txt,/data/home/files/txt_files_2/file2.txt,/data/home/files/txt_files_2/file3.txt
Thanks
Here is the correct code
#!/bin/bash
delim=""
for i in /data/home/files/txt_files_1/file*
do
printf "%s%s" "$delim" "$i"
delim=","
done
printf "\\"
printf "\n"
for i in /data/home/files/txt_files_2/file*
do
printf "%s%s" "$delim" "$i"
delim=","
done
For input in a single file, with the two groups separated by a blank line (an empty RS puts awk in paragraph mode):
awk -v OFS=, -v RS= 'NF { $1 = $1; print }' file
Output:
/data/home/files/txt_files_1/file1.txt,/data/home/files/txt_files_1/file2.txt,/data/home/files/txt_files_1/file3.txt
/data/home/files/txt_files_2/file1.txt,/data/home/files/txt_files_2/file2.txt,/data/home/files/txt_files_2/file3.txt
Or
awk -v OFS=, -v RS= -v ORS='\n\n' 'NF { $1 = $1; print }' file
Output:
/data/home/files/txt_files_1/file1.txt,/data/home/files/txt_files_1/file2.txt,/data/home/files/txt_files_1/file3.txt

/data/home/files/txt_files_2/file1.txt,/data/home/files/txt_files_2/file2.txt,/data/home/files/txt_files_2/file3.txt
You can use printf "%s," "$file" to print several names into a single line. To get the delimiters right, I use this trick:
delim=""
...loop...
printf "%s%s" "$delim" "$file"
delim=","
printf "\n"
<command to generate lines of paths> | tr '\n' ','
example:
echo "/data/home/files/txt_files_1/file1.txt
/data/home/files/txt_files_1/file2.txt
/data/home/files/txt_files_1/file3.txt
/data/home/files/txt_files_2/file1.txt
/data/home/files/txt_files_2/file2.txt" | tr '\n' ','
outputs:
/data/home/files/txt_files_1/file1.txt,/data/home/files/txt_files_1/file2.txt,/data/home/files/txt_files_1/file3.txt,,/data/home/files/txt_files_2/file1.txt,/data/home/files/txt_files_2/file2.txt
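Note that tr also converts the final newline from echo, which leaves a trailing comma; if that matters, you can strip it (a sketch):
<command to generate lines of paths> | tr '\n' ',' | sed 's/,$//'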
Assuming your input is in a file called list, this Perl one-liner does the job:
perl -F'\n' -00 -ane 'push @a, join(",", @F) }{ print(join(" \\\n\n", @a), "\n")' list
explanation
-00, in combination with -n, reads the file one block (paragraph) at a time.
The -a switch in combination with -F'\n' auto-splits the text on each newline. The result goes into the array @F.
An array @a is built, each element containing the comma-separated join of the elements in @F.
Once the file has been processed, all the elements of the array @a are printed, joined together as you specified. The additional "\n" on the end is optional.
Output:
/data/home/files/txt_files_1/file1.txt,/data/home/files/txt_files_1/file2.txt,/data/home/files/txt_files_1/file3.txt \
/data/home/files/txt_files_2/file1.txt,/data/home/files/txt_files_2/file2.txt,/data/home/files/txt_files_2/file3.txt

unix command to get lines from in between first and last occurence of a word and write to a file

I want a unix command to find the lines between the first and last occurrence of a word.
For example:
Let's imagine we have 1000 lines. The tenth line contains the word "stackoverflow", and the thirty-fifth line also contains "stackoverflow".
I want to print lines 10 through 35 and write them to a new file.
You can do it in two steps. The basic idea is to:
1) get the line numbers of the first and last match.
2) print the range of lines between them.
$ read first last <<< $(grep -n stackoverflow your_file | awk -F: 'NR==1 {printf "%d ", $1}; END{print $1}')
$ awk -v f=$first -v l=$last 'NR>=f && NR<=l' your_file
Explanation
read first last reads two values and stores them in $first and $last.
grep -n stackoverflow your_file greps and shows the output like this: line_number:matching_line
awk -F: 'NR==1 {printf "%d ", $1}; END{print $1}' prints the line numbers of the first and last matches of stackoverflow in the file.
And
awk -v f=$first -v l=$last 'NR>=f && NR<=l' your_file prints all lines from $first line number till $last line number.
Test
$ cat a
here we
have some text
stackoverflow
and other things
bla
bla
bla bla
stackoverflow
and whatever else
stackoverflow
to make more fun
blablabla
$ read first last <<< $(grep -n stackoverflow a | awk -F: 'NR==1 {printf "%d ", $1}; END{print $1}')
$ awk -v f=$first -v l=$last 'NR>=f && NR<=l' a
stackoverflow
and other things
bla
bla
bla bla
stackoverflow
and whatever else
stackoverflow
By steps:
$ grep -n stackoverflow a
3:stackoverflow
9:stackoverflow
11:stackoverflow
$ grep -n stackoverflow a | awk -F: 'NR==1 {printf "%d ", $1}; END{print $1}'
3 11
$ read first last <<< $(grep -n stackoverflow a | awk -F: 'NR==1 {printf "%d ", $1}; END{print $1}')
$ echo "first=$first, last=$last"
first=3, last=11
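The same first-to-last range can also be extracted by one awk command that reads the file twice (a sketch, not part of the original answer): the first pass records the first and last matching line numbers, the second pass prints that range.
awk 'NR==FNR { if (/stackoverflow/) { if (!f) f=FNR; l=FNR } next } FNR>=f && FNR<=l' a a
With the test file a above, the output matches the two-step version.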
If you know an upper bound on how many lines there can be (say, a million), then you can use this simple abusive script:
(grep -A 1000000 stackoverflow | grep -B 1000000 stackoverflow) < file
You can append | tail -n +2 | head -n -1 to strip the border lines as well:
(grep -A 1000000 stackoverflow | grep -B 1000000 stackoverflow |
    tail -n +2 | head -n -1) < file
I'm not 100% sure from the question whether the output should be inclusive of the first and last matching lines, so I'm assuming it is. But this can be easily changed if we want exclusive instead.
This pure-bash solution does it all in one step - i.e. the file (or pipe) is only read once:
#!/bin/bash
function midgrep {
    while read -r ln; do
        [ "$saveline" ] && linea[$((i++))]=$ln
        if [[ $ln =~ $1 ]]; then
            if [ "$saveline" ]; then
                for ((j=0; j<i; j++)); do echo "${linea[$j]}"; done
                i=0
            else
                saveline=1
                linea[$((i++))]=$ln
            fi
        fi
    done
}
midgrep "$1"
Save this as a script (e.g. midgrep.sh) and pipe whatever output you like to it as follows:
$ cat input.txt | ./midgrep.sh stackoverflow
This works as follows:
find the first matching line and buffer in the first element of an array
continue reading lines until the next match, buffering to the array as we go
on each subsequent match, flush the buffer array to output
continue reading the file to the end. If there are no more matches, then the last buffer is simply discarded.
The advantage of this approach is that we read through the input only once. The disadvantage is that we buffer everything between each pair of matches; if there are many lines between matches, they are all held in memory until we hit the next match.
This also uses the bash =~ regular-expression operator to keep the script pure bash, but you could replace it with a grep call instead, if you are more comfortable with that, as in the sketch below.
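For example, the regex test in the loop could be swapped out like this (a sketch; note that grep interprets its argument as a basic regular expression rather than an ERE):
# instead of: if [[ $ln =~ $1 ]]; then
if grep -q -- "$1" <<< "$ln"; then
    : # ...same body as above...
fi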
Using perl :
perl -00 -lne '
chomp(my @arr = split /stackoverflow/);
print join "\nstackoverflow", @arr[1 .. $#arr - 1]
' file.txt | tee newfile.txt
The idea is to slurp the whole input and split it into an array of chunks, using the string "stackoverflow" as the separator. We then print the chunks from the second through the next-to-last, rejoining them with "stackoverflow".

Save variable from txt using awk

I have a txt file named parameters.txt in my folder which contains
PP1 20 30 40 60
PP2 0 0 0 0
I'd like to use awk to read the different parameters depending on the value of the first text field in each line. At the moment, if I run
src_dir='/PP1/'
awk "$src_dir" '{ print $2 }' parameters.txt
I correctly get
20
I would simply like to store that 20 into a variable and to export the variable itself.
Thanks in advance!
If you want to save the output, use var=$(awk expression). Note that the variable passed to awk must hold the bare key (PP1), not the /PP1/ awk-pattern syntax:
result=$(awk -v value="$src_dir" '($1==value) { print $2 }' parameters.txt)
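Since the goal is to export the variable itself, you can then simply run:
export result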
You can make the command more general by giving awk the variable with the -v syntax:
$ var="PP1"
$ awk -v v=$var '($1==v) { print $2 }' a
20
$ var="PP2"
$ awk -v v=$var '($1==v) { print $2 }' a
0
You don't really need awk for that. You can do it in bash.
$ src_dir="PP1"
$ while read -r pattern columns ; do
      set -- $columns
      if [[ $pattern =~ $src_dir ]]; then
          variable=$1
      fi
  done < parameters.txt
shell_pattern=PP1
output_var=$(awk -v patt="$shell_pattern" '$0 ~ patt {print $2}' file)
Note that $output_var may contain more than one value if the pattern matches more than one line. If you're only interested in the first value, have the awk program exit after printing, as in the sketch below.
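For example (a minimal sketch):
output_var=$(awk -v patt="$shell_pattern" '$0 ~ patt {print $2; exit}' file)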
