I have if test '\n' = "$line" but this doesn't seem to catch the newlines. What is wrong with that code?
How about
if test "$line" = $'\n'
If you are testing that $line is exactly a newline, you can do either of
test "$line" = $'\n' # This is non-standard, and will only work in some shells
test "$line" = '
' # This (two-liner) will work in any shell
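The reason the original test never matches is that '\n' in single quotes is just the two characters backslash and n, not a newline. A quick sketch (bash syntax) to see the difference:
nl=$'\n'                          # a real newline character
printf '%s' '\n' | od -c          # shows a backslash followed by n (two characters)
printf '%s' "$nl" | od -c         # shows \n (one newline character)
[ "$nl" = '\n' ] && echo same || echo different   # prints: different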
If you want to know if $line merely contains a newline (i.e., it is at
least two lines long), you could do:
if echo "$line" | sed 1d | grep -q .; then
echo line is at least 2 lines
fi
I am trying to collect the lines from a file which don't start with a # as their first character.
I have this code and I am able to get them:
while IFS= read -r line
do
[[ -z "$line" ]] && continue
[[ "$line" =~ ^# ]] && continue
#echo "LINEREADED: $line"
done < "$file"
So the output I have is something like this:
modules/core_as/xxxx/xxxxxxxxxxxxxxxxxxxxxxxxxxx [100]
My question is how can I get only the string without the [100]?
I know there are commands like sed or trim, but the problem is that the string is not always the same length; sometimes it is different, like:
cross_modules/core_as/xxxx/xxxxxxxxx [100-103]
or
cross_modules/core_as/xxxxxxxxxxxx/xxxxxxxxx [100-103]
or anything like that...
And in all these cases I only need the string without the [....] and without the trailing blank space after the last x, whatever the length of the string is, like cross_modules/core_as/xxxxxxxxxxxx/xxxxxxxxx
echo ${caseReaded:1:${#caseReaded}-7}
This also does the job, but it is not generic for any length.
Does anyone know how I can get this?
You can strip a trailing part of a string in bash:
echo "${line% [*}"
cross_modules/core_as/xxxx/xxxxxxxxx
modules/core_as/xxxx/xxxxxxxxxxxxxxxxxxxxxxxxxxx
cross_modules/core_as/xxxxxxxxxxxx/xxxxxxxxx
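Combined with the filtering loop from the question, a minimal sketch (path is just an illustrative variable name):
while IFS= read -r line
do
    [[ -z $line ]] && continue        # skip empty lines
    [[ $line =~ ^# ]] && continue     # skip comment lines
    path=${line% [*}                  # drop the trailing " [...]"
    echo "$path"
done < "$file"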
If the spaces are only before [:
while read -r line _
do
[[ -z $line ]] && continue
[[ $line =~ ^# ]] && continue
done < "$file"
Use grep to match all lines not starting with #, then display the first field with cut; this works if the first field doesn't contain spaces:
grep -v ^# "$file" | cut -f1 -d' '
If the thing before [100] contains spaces, this may be the way to go:
grep -v ^# "$file" | sed -E 's/^(.*) .*$/\1/'
The last one works because .* in sed is greedy, so only the last space is left to match the trailing .*$.
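To see the greedy match in action on a made-up line whose path contains spaces:
echo 'cross modules/core as/xxxx [100-103]' | sed -E 's/^(.*) .*$/\1/'
cross modules/core as/xxxx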
Below is a shell script written to process a huge file. It reads a fixed-length file line by line, performs substring extraction, and appends the result to another file as a delimited file. It works correctly, but it is too slow.
array=() # Create array
while IFS='' read -r line || [[ -n "$line" ]] # Read a line
do
coOrdinates="$(echo -e "${line}" | grep POSITION | cut -d'(' -f2 | cut -d')' -f1 | cut -d':' -f1,2)"
if [[ -z "${coOrdinates// }" ]];
then
echo "Not adding"
else
array+=("$coOrdinates")
fi
done < "$1_CTRL.txt"
while read -r line;
do
result='"'
for e in "${array[@]}"
do
SUBSTRING1=`echo "$e" | sed 's/.*://'`
SUBSTRING=`echo "$e" | sed 's/:.*//'`
result1=`perl -e "print substr('$line', $SUBSTRING,$SUBSTRING1)"`
result1="$(echo -e "${result1}" | sed -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//')"
result=$result$result1'"'',''"'
done
echo $result >> $1_1.txt
done < "$1.txt"
Earlier, I had used the cut command and changed it as above, but there was no improvement in the time taken.
Can you please suggest what kind of changes can be done to improve the processing time?
Thanks in advance.
Update:
Sample content of the input file :
XLS01G702012 000034444132412342134
Control File :
OPTIONS (DIRECT=TRUE, ERRORS=1000, rows=500000) UNRECOVERABLE
load data
CHARACTERSET 'UTF8'
TRUNCATE
into table icm_rls_clientrel2_hg
trailing nullcols
(
APP_ID POSITION(1:3) "TRIM(:APP_ID)",
RELATIONSHIP_NO POSITION(4:21) "TRIM(:RELATIONSHIP_NO)"
)
Output file:
"LS0","1G702012 0000"
perl:
#!/usr/bin/env perl
use strict;
use warnings;
use autodie;
use feature qw(say);  # needed for the say builtin used below
# read the control file
my $ctrl;
{
local $/ = "";
open my $fh, "<", shift @ARGV;
$ctrl = <$fh>;
close $fh;
}
my @positions = ( $ctrl =~ /\((\d+):(\d+)\)/g );
# read the data file
open my $fh, "<", shift @ARGV;
while (<$fh>) {
my @words;
for (my $i = 0; $i < scalar(@positions); $i += 2) {
push @words, substr($_, $positions[$i], $positions[$i+1]);
}
say join ",", map {qq("$_")} @words;
}
close $fh;
perl parse.pl x_CTRL.txt x.txt
"LS0","1G702012 00003"
Different results from what you requested:
in the POSITION(m:n) syntax of the control file, is n a length or an index?
in the data file, are those spaces or tabs?
I suggest this, in pure bash and avoiding subshells:
if [[ $line =~ POSITION ]] ; then # grep POSITION
coOrdinates="${line#*(}" # cut -d'(' -f2
coOrdinates="${coOrdinates%)*}" # cut -d')' -f1
coOrdinates="${coOrdinates/:/ }" # cut -d':' -f1,2
if [[ -z "${coOrdinates// }" ]]; then
echo "Not adding"
else
array+=("$coOrdinates")
fi
fi
More efficient, by gniourf_gniourf:
if [[ $line =~ POSITION\(([[:digit:]]+):([[:digit:]]+)\) ]]; then
array+=( "${BASH_REMATCH[*]:1:2}" )
fi
similarly:
SUBSTRING1=${e#*:} # $( echo "$e" | sed 's/.*://' )
SUBSTRING=${e%:*} # $( echo "$e" | sed 's/:.*//' )
# to confirm, I don't know perl substr
result1=${line:$SUBSTRING:$SUBSTRING1} # $( perl -e "print substr('$line', $SUBSTRING,$SUBSTRING1)" )
#result1= # "$(echo -e "${result1}" | sed -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//')"
# trim, if necessary?
result1="${result1%${result1##*[^[:space:]]}}" # right
result1="${result1#${result1%%[^[:space:]]*}}" # left
gniourf_gniourf suggests moving the grep out of the loop:
while read ...; do
...
done < <(grep POSITION ...)
For extra efficiency: while read loops are very slow in Bash, so prefiltering as much as possible will speed up the process quite a lot.
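Putting those pieces together, a rough sketch of the reworked script (untested; it keeps the $1_CTRL.txt / $1.txt / $1_1.txt naming from the question and treats POSITION(m:n) as offset and length, exactly as the original perl substr call did):
array=()
while IFS= read -r line; do
    if [[ $line =~ POSITION\(([[:digit:]]+):([[:digit:]]+)\) ]]; then
        array+=( "${BASH_REMATCH[1]}:${BASH_REMATCH[2]}" )   # store as offset:length
    fi
done < <(grep POSITION "$1_CTRL.txt")
while IFS= read -r line; do
    result=
    for e in "${array[@]}"; do
        start=${e%:*}                                 # part before the colon
        len=${e#*:}                                   # part after the colon
        field=${line:start:len}                       # same as the perl substr call
        field="${field#"${field%%[^[:space:]]*}"}"    # trim leading blanks
        field="${field%"${field##*[^[:space:]]}"}"    # trim trailing blanks
        result+=",\"$field\""
    done
    printf '%s\n' "${result#,}"                       # drop the leading comma
done < "$1.txt" > "$1_1.txt"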
Updated Answer
Here is a version where I parse the control file with awk, save the character positions and then use those when parsing the input file:
awk '
/APP_ID/ {
sub(/\).*/,"") # Strip closing parenthesis and all that follows
sub(/^.*\(/,"") # Strip everything up to opening parenthesis
split($0,a,":") # Extract the two character positions separated by colon into array "a"
next
}
/RELATIONSHIP/ {
sub(/\).*/,"") # Strip closing parenthesis and all that follows
sub(/^.*\(/,"") # Strip everything up to opening parenthesis
split($0,b,"[():]") # Extract character positions into array "b"
next
}
FNR==NR{next}
{ f1=substr($0,a[1]+1,a[2]); f2=substr($0,b[1]+1,b[2]); printf("\"%s\",\"%s\"\n",f1,f2)}
' ControlFile InputFile
Original Answer
Not a complete, rigorous answer, but this should give you an idea of how to do the extraction with awk once you have the POSITION parameters from the control file:
awk -v a=2 -v b=3 -v c=5 -v d=21 '{f1=substr($0,a,b); f2=substr($0,c,d); printf("\"%s\",\"%s\"\n",f1,f2)}' InputFile
Sample Output
"LS0","1G702012 00003"
Try running that on your large input file to get an idea of the performance, then tweak the output. Reading the control file is not at all time-critical so don't bother with optimising that.
To avoid the (slow) while loop, you can use cut and paste:
#!/bin/bash
inFile=${1:-checkHugeFile}.in
ctrlFile=${1:-checkHugeFile}_CTRL.txt
outFile=${1:-checkHugeFile}.txt
cat /dev/null > $outFile
typeset -a array # Create array
while read -r line # Read a line
do
coOrdinates="${line#*(}"
coOrdinates="${coOrdinates%%)*}"
[[ -z "${coOrdinates// }" ]] && { echo "Not adding"; continue; }
array+=("$coOrdinates")
done < <(grep POSITION "$ctrlFile" )
echo coOrdinates: "${array[@]}"
for e in "${array[@]}"
do
nr=$((nr+1))
start=${e%:*}
len=${e#*:}
from=$(( start + 1 ))
to=$(( start + len + 1 ))
cut -c$from-$to $inFile > ${outFile}.$nr
done
paste $outFile.* | sed -e 's/^/"/' -e 's/\t/","/' -e 's/$/"/' >${outFile}
rm $outFile.[0-9]
I am reading in from a .txt file which looks something like this:
:DRIVES
name,server,share_1
other_name,other_server,share_2
new_name,new_server,share 3
:NAME
which is information to mount drives. I want to load them into a bash array to cycle through and mount them; however, my current code breaks at the third line because the array is being split on any whitespace. Instead of reading
new_name,new_server,share 3
as one line, it reads it as 2 lines
new_name,new_server,share
3
I have tried changing the value of IFS to
IFS=$'\n' #and
IFS='
'
however neither works. How can I create an array from the above file, separated by newlines? My code is below.
file_formatted=$(cat ~/location/to/file/test.txt)
IFS='
' # also tried $'\n'
drives=($(sed 's/^.*:DRIVES //; s/:.*$//' <<< $file_formatted))
for line in "${drives[#]}"
do
# separates lines into individual parts
drive="$(echo $line | cut -d, -f2)"
server="$(echo $line | cut -d, -f3)"
share="$(echo $line | cut -d, -f4 | tr '\' '/' | tr '[:upper:]' '[:lower:]')"
#mount //$server/$share using osascript
#script breaks because it tries to mount /server/share instead of /server/share 3
done
EDIT:
I tried this and got the same output as before:
drives=$(sed 's/^.*:DRIVES //; s/:.*$//' <<< $file_formatted)
while IFS= read -r line; do
printf '%s\n' "$line"
done <<< "$drives"
This is the correct way to iterate over your file; no arrays needed.
{
# Skip over lines until we read :DRIVES
while IFS= read -r line; do
[[ $line = :DRIVES ]] && break
done
# Split each comma-separated line into the desired variables,
# until we read :NAMES, at which point we break out of this loop
while IFS=, read -r drive server share; do
[[ $drive == :NAMES ]] && break
# Use $drive, $server, and $share
done
# No need to read the rest of the file, if any
} < ~/location/to/file/test.txt
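For the "# Use $drive, $server, and $share" step, the transformations from the original loop body could be done with parameter expansions instead of cut/tr, e.g. (a sketch; ${share,,} needs bash 4 or later):
share=${share//\\//}   # like tr '\' '/': turn backslashes into forward slashes
share=${share,,}       # like tr '[:upper:]' '[:lower:]': lowercase the share name
# then mount //$server/$share with your osascript call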
I have a read loop that is reading a variable but not behaving the way I expect. I want to read every line of my variable and process each one. Here is my loop:
while read -r line
do
echo $line | sed 's/<\/td>/<\/td>$/g' | cut -d'$' -f2,3,4 >> file.txt
done <<< "$TABLE"
I expect it to process every line of the variable, but instead it just does the first one. If the loop body is simply echo $line >> file.txt, it works as expected. What's going on here? How do I get the behavior I want?
It seems your lines are delimited by \r instead of \n.
Use this while loop to iterate over the input, using read -d $'\r':
while read -rd $'\r' line; do
echo "$line" | sed 's~</td>~</td>$~g' | cut -d'$' -f2,3,4 >> file.txt
done <<< "$TABLE"
If $TABLE contains a multi-line string, I recommend
printf '%s\n' "$TABLE" |
while read -r line; do
echo $line | sed 's/<\/td>/<\/td>$/g' | cut -d'$' -f2,3,4 >> file.txt
done
This is also more portable since the '<<<' operator for here-strings is not POSIX.
I have a file, file1.txt, like this:
This is some text.
This is some more text. ② This is a note.
This is yet some more text.
I need to delete any text appearing after "②", including the "②" and any single space appearing immediately before, if such a space is present. E.g., the above file would become file2.txt:
This is some text.
This is some more text.
This is yet some more text.
How can I delete the "②", anything coming after, and any preceding single space?
The solutions at How can I remove all text after a character in bash? do not seem to work, perhaps because "②" is not an ordinary character.
The file is saved in UTF-8.
A Perl solution:
$ perl -CS -i~ -p -E's/ ②.*//' file1.txt
You'll end up with the correct data in file1.txt and a backup of the original file in file1.txt~.
I hope you realize that most Unix utilities do not work with Unicode. I assume your input is in UTF-8; if not, you have to adjust accordingly.
#!/bin/bash
function px {
local a="$@"
local i=0
while [ $i -lt ${#a} ]
do
printf \\x${a:$i:2}
i=$(($i+2))
done
}
(iconv -f UTF8 -t UTF16 | od -x | cut -b 9- | xargs -n 1) |
if read utf16header
then
echo -e $utf16header
out=''
while read line
do
if [ "$line" == "000a" ]
then
out="$out $line"
echo -e $out
out=''
else
out="$out $line"
fi
done
if [ "$out" != '' ] ; then
echo -e $out
fi
fi |
(perl -pe 's/( 0020)* 2461 .*$/ 000a/;s/ *//g') |
while read line
do
px $line
done | (iconv -f UTF16 -t UTF8 )
sed -e "s/[[:space:]]②[^\.]*\.//"
However, I am not sure that the ② symbol is parsed correctly. Maybe you have to use UTF-8 codes or something like that.
Try this:
sed -e '/②/ s/[ ]*②.*$//'
/②/ looks only at the lines containing the magic symbol;
[ ]* matches any number (including none) of spaces before the magic symbol;
.*$ matches everything else up to the end of the line.
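Usage might look like this (assuming the file is UTF-8 and your locale is a UTF-8 locale, so sed treats ② as a single character):
sed -e '/②/ s/[ ]*②.*$//' file1.txt > file2.txt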