sed DON'T remove extra whitespace - bash

It seems everybody else wants to remove extra whitespace, but I have the opposite problem.
I have a file, call it some_file.txt, that looks like
a b c d
and some more
and I'm reading it line-by-line with sed,
num_lines=$(cat some_file.txt | wc -l)
for i in $(seq 1 $num_lines); do
echo $(sed "${i}q;d" $file)
string=$(sed "${i}q;d" $file)
echo $string
done
I would expect the number of whitespace characters to stay the same; however, the output I get is
a b c d
a b c d
and some more
and some more
So it seems that the problem is sed removing the extra whitespace between characters. Is there any way to fix this?

Have a look at this example:
$ echo Hello     World
Hello World
$ echo "Hello     World"
Hello     World
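sed is not your problem. Your problem is that bash performs word splitting on the unquoted $(...) and $string: the output of sed is split on whitespace into separate arguments, and echo prints its arguments separated by single spaces, so the runs of spaces are gone before echo ever sees them.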
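A quick way to see this is to count the arguments echo would receive. The following sketch uses a made-up sample string and a hypothetical helper, count_args, that only prints how many arguments it was given.

```shell
#!/usr/bin/env bash
# count_args is a hypothetical helper: it prints how many arguments it got.
count_args() { echo "$#"; }

string='a   b  c'       # made-up sample with runs of spaces

count_args $string      # unquoted: word splitting gives 3 arguments
count_args "$string"    # quoted: 1 argument, spaces intact

echo $string            # prints: a b c
echo "$string"          # prints: a   b  c
```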
You just need to surround whatever echo is supposed to print with double quotation marks. So instead of
echo $(sed "${i}q;d" $file)
echo $string
You write
echo "$(sed "${i}q;d" $file)"
echo "$string"
The new script should look like this:
#!/usr/bin/env bash
file=some_file.txt
num_lines=$(wc -l < "$file")
for i in $(seq 1 "$num_lines"); do
echo "$(sed "${i}q;d" "$file")"
string=$(sed "${i}q;d" "$file")
echo "$string"
done
This prints the correct output:
a b c d
a b c d
and some more
and some more
However, if you just want to go through your file line by line, I strongly recommend something like this, which reads the file once instead of starting two sed processes for every line:
while IFS= read -r line; do
echo "$line"
done < some_file.txt
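The IFS= and -r parts of that loop matter. Here is a small comparison, assuming a made-up sample line with leading spaces, inner runs of spaces and a backslash: IFS= keeps read from trimming leading and trailing whitespace, and -r keeps it from treating the backslash as an escape character.

```shell
#!/usr/bin/env bash
sample='   a   b\c  '          # made-up line: edge spaces, inner spaces, backslash

read plain <<< "$sample"       # default: trims edges, eats the backslash
IFS= read -r exact <<< "$sample"

printf '[%s]\n' "$plain"       # prints: [a   bc]
printf '[%s]\n' "$exact"       # prints: [   a   b\c  ]
```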
Question from the comments: what to do if you only want 33 lines starting from line x? One possible solution is this:
#!/usr/bin/env bash
file=some_file.txt
declare -i s=$1
declare -i e=${s}+32
sed -n "${s},${e}p" "$file" | while IFS= read -r line; do
echo "$line"
done
(Note that I would probably include some validation of $1 in there as well.)
I declare s and e as integer variables, so bash evaluates anything assigned to them as an arithmetic expression; that is how ${s}+32 becomes the actual number of the last line to print.
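Here is a sketch that does the same with $(( )) arithmetic instead of declare -i and adds the validation mentioned above. The function name print_block and the rule that the start line must be a positive integer are assumptions for illustration.

```shell
#!/usr/bin/env bash
# print_block FILE START: print 33 lines of FILE starting at line START.
# Hypothetical helper; "START must be a positive integer" is an assumed rule.
print_block() {
    local file=$1 start=$2
    if [[ ! $start =~ ^[1-9][0-9]*$ ]]; then
        echo "usage: print_block FILE START_LINE" >&2
        return 1
    fi
    local end=$(( start + 32 ))        # 33 lines in total, inclusive
    sed -n "${start},${end}p" "$file"
}

# Usage example with a generated 100-line file.
tmp=$(mktemp)
seq 1 100 > "$tmp"
print_block "$tmp" 10 | head -n 1      # first line printed: 10
print_block "$tmp" 10 | tail -n 1      # last line printed: 42
rm -f "$tmp"
```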

Related

mv: Cannot stat - No such file or directory

I have piped the output of the ls command into a file. The contents look like this:
[Chihiro]_Grisaia_no_Kajitsu_-_01_[1920x816_Blu-ray_FLAC][D2B961D6].mkv
[Chihiro]_Grisaia_no_Kajitsu_-_02_[1920x816_Blu-ray_FLAC][38F88A81].mkv
[Chihiro]_Grisaia_no_Kajitsu_-_03_[1920x816_Blu-ray_FLAC][410F74F7].mkv
My attempt to rename these episodes according to episode number is as follows:
cat grisaia | while read line;
#get the episode number
do EP=$(echo $line | egrep -o "_([0-9]{2})_" | cut -d "_" -f2)
if [[ $EP ]]
#escape special characters
then line=$(echo $line | sed 's/\[/\\[/g' | sed 's/\]/\\]/g')
mv "$line" "Grisaia_no_Kajitsu_${EP}.mkv"
fi
done
The mv commands exit with code 1 with the following error:
mv: cannot stat '\[Chihiro\]_Grisaia_no_Kajitsu_-_01_\[1920x816_Blu-ray_FLAC\]\[D2B961D6\].mkv': No such file or directory
What I really don't get is that if I copy the filename that could not be stat'ed and run stat on it myself, it works. I can even take the exact same string that is printed and run the mv command by hand.
If you surround your variable ($line) with double quotes ("), you don't need to escape those special characters: inside double quotes, [ and ] are taken literally, so the backslashes you add become part of the filename that mv looks for. So you have two options:
Remove the following assignment completely:
then # line=$(echo $line | sed 's/\[/\\[/g' | sed 's/\]/\\]/g')
or
Remove the double quotes in the following line:
mv $line "Grisaia_no_Kajitsu_${EP}.mkv"
Further considerations
Parsing the output of ls is never a good idea. Think about filenames with spaces. See this document for more information.
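As a sketch of a glob-based alternative (not the original poster's approach), the loop below iterates over the matching files directly and extracts the episode number with bash's =~ operator and BASH_REMATCH, so no escaping is needed. The function name rename_episodes and the glob pattern are assumptions based on the filenames in the question.

```shell
#!/usr/bin/env bash
# Hypothetical helper: rename episodes without parsing ls.
# The glob pattern and target names follow the filenames in the question.
rename_episodes() {
    local f
    local re='_-_([0-9]{2})_'              # episode number between _-_ and _
    for f in *_-_[0-9][0-9]_*.mkv; do
        [[ -e $f ]] || continue            # pattern matched no files
        if [[ $f =~ $re ]]; then
            mv -- "$f" "Grisaia_no_Kajitsu_${BASH_REMATCH[1]}.mkv"
        fi
    done
}

# Usage example in a scratch directory with two of the question's filenames.
demo=$(mktemp -d)
(
    cd "$demo" || exit 1
    touch '[Chihiro]_Grisaia_no_Kajitsu_-_01_[1920x816_Blu-ray_FLAC][D2B961D6].mkv' \
          '[Chihiro]_Grisaia_no_Kajitsu_-_02_[1920x816_Blu-ray_FLAC][38F88A81].mkv'
    rename_episodes
)
ls "$demo"    # lists Grisaia_no_Kajitsu_01.mkv and Grisaia_no_Kajitsu_02.mkv
```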
The cat here is unnecessary:
cat grisaia | while read line;
...
done
Use this instead to avoid an unnecessary pipe:
while read line;
...
done < grisaia
Why is it good to avoid pipes in some scenarios? (answering a comment)
Each part of a pipeline runs in a subshell (which is relatively expensive), and that also makes mistakes like the following easy to make:
last=""
cat grisaia | while read line; do
last=$line
done
echo $last # surprise!! it outputs an empty string
The reason is that $last inside the loop belongs to another subshell.
Now, see the same approach without pipes:
while read line; do
last=$line
done < grisaia
echo $last # it works as expected and prints the last line
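If you do want to keep the pipe, bash 4.2 and later offer shopt -s lastpipe, which runs the last command of a pipeline in the current shell. A minimal sketch, with printf standing in for the grisaia file; note that lastpipe only takes effect when job control is off, as it is in scripts but not in an interactive shell.

```shell
#!/usr/bin/env bash
# Requires bash >= 4.2; only effective while job control is off (scripts).
shopt -s lastpipe

last=""
printf '%s\n' first second third | while read -r line; do
    last=$line          # runs in the current shell, so the value survives
done
echo "$last"            # prints: third
```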

Name (and set) variables in current shell, based on line input data

I have a SQL*Plus output written into a text file in the following format:
3459906| |2|X1|WAS1| Output1
334596| |2|X1|WAS2| Output1
3495792| |1|X1|WAS1| Output1
687954| |1|X1|WAS2| Output1
I need a shell script to fetch the count at the beginning of each line, based on the text that follows it.
For example, if the text is like |2|X1|WAS1|, then 3459906 should be assigned to a variable x1was12, and if the text is like |2|X1|WAS2|, then 334596 should be assigned to a variable x1was22.
I tried writing a loop with if conditions to assign the counts, but was unsuccessful:
export filename1='file1.dat'
while read -r line ; do
if [[ grep -i "*|2|X1|WAS1| Output1*" | wc -l -eq 0 ]] ; then
export xwas12=sed -n ${line}p $filename1 | \
sed 's/[^0-9]*//g' | sed 's/..$//'
elif [[ grep -i "*|2|X1|WAS2| Output1*" | wc -l -eq 0 ]] ; then
export x1was22=sed -n ${line}p $filename1 | \
sed 's/[^0-9]*//g' | sed 's/..$//'
elif [[ grep -i "*|1|X1|WAS1| Output1*" | wc -l -eq 0 ]] ; then
export x1was11=sed -n ${line}p $filename1 | \
sed 's/[^0-9]*//g' | sed 's/..$//'
elif [[ grep -i "*|1|X1|WAS2| Output1*" | wc -l -eq 0 ]]
export x1was21=sed -n ${line}p $filename1 | \
sed 's/[^0-9]*//g' | sed 's/..$//'
fi
done < "$filename1"
echo '$x1was12' > output.txt
echo '$x1was22' >> output.txt
echo '$x1was11' >> output.txt
echo '$x1was21' >> output.txt
What I was trying to do was:
Go to the first line in the file
-> Search for the text and if found then assign the sed output to the variable
Then go to the second line of the file
-> Search for the texts in the if commands and assign the sed output to another variable.
The same goes for the other lines.
while IFS='|' read -r count _ n x was _; do
# remove spaces from all variables
count=${count// /}; n=${n// /}; x=${x// /}; was=${was// /}
varname="${x}${was}${n}"
printf -v "${varname,,}" %s "$count"
done <<'EOF'
3459906| |2|X1|WAS1| Output1
334596| |2|X1|WAS2| Output1
3495792| |1|X1|WAS1| Output1
687954| |1|X1|WAS2| Output1
EOF
With the above executed:
$ echo "$x1was12"
3459906
Of course, the redirection from a heredoc could be replaced with a redirection from a file as well.
How does this work? Let's break it down:
Every time IFS='|' read -r count _ n x was _ is run, it reads a single line, separating it by |s, putting the first column into count, discarding the second by assigning it to _, reading the third into n, the fourth into x, the fifth into was, and the sixth and all following content into _. This practice is discussed in detail in BashFAQ #1.
count=${count// /} is a parameter expansion which prunes spaces from the variable count, by replacing all such spaces with empty strings. See also BashFAQ #100.
"${varname,,}" is another parameter expansion, this one converting a variable's contents to all-lowercase. (This requires bash 4.0; in prior versions, consider "$(tr '[:upper:]' '[:lower:]' <<<"$varname")" as a less efficient alternative.)
printf -v "$varname" %s "value" is a mechanism for doing an indirect assignment to the variable named in the variable varname.
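To see printf -v and its counterpart for reading, ${!name} (indirect expansion), in isolation, here is a sketch with the variable name hard-coded instead of built by the loop:

```shell
#!/usr/bin/env bash
# Requires bash >= 4.0 for the ${var,,} lowercase expansion.
varname="X1WAS12"                     # in the loop this is built from the fields
printf -v "${varname,,}" %s "3459906" # assigns to the variable x1was12
echo "$x1was12"                       # prints: 3459906

lower=${varname,,}
echo "${!lower}"                      # indirect expansion, prints: 3459906
```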
If not for the variable names, the whole thing could be done with two commands:
cut -d '|' -f1 file1.dat | tr -d ' ' > output.txt
The variable names make it more interesting. Two bash methods follow, plus a POSIX method...
The following bash code ought to do what the OP's sample code was meant to do:
declare $(while IFS='|' read a b c d e f ; do
echo $a 1>&2 ; echo x1${e,,}$c=${a/ /}
done < file1.dat 2> output.txt )
Notes:
The bash shell is needed for ${e,,} (turns "WAS" into "was"), ${a/ /} (removes the first space that might be in $a), and declare.
The while loop parses file1.dat and outputs a bunch of variable assignments. Without the declare, this code:
while IFS='|' read a b c d e f ; do
echo x1${e,,}$c=${a/ /} ;
done < file1.dat
Outputs:
x1was12=3459906
x1was22=334596
x1was11=3495792
x1was21=687954
The while loop outputs to two separate streams: stdout (for the declare) and stderr (using the 1>&2 and 2> redirects for output.txt).
Using bash associative arrays:
declare -A x1was="( $(while IFS='|' read a b c d e f ; do
echo $a 1>&2 ; echo [${e/WAS/}$c]=${a/ /}
done < file1.dat 2> output.txt ) )"
In that case, each value is accessed with a subscript in brackets:
echo ${x1was[21]}
687954
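If declaring the array from a generated string feels fragile, a possible alternative (a sketch, not one of the methods above) is to fill the associative array directly inside the loop. It needs bash 4.0 or later, reads the sample rows from a heredoc, and leaves out writing the counts to output.txt:

```shell
#!/usr/bin/env bash
# Requires bash >= 4.0 for associative arrays.
declare -A x1was=()
while IFS='|' read -r count _ n _ was _; do
    was=${was// /}                              # e.g. "WAS2"
    x1was[${was#WAS}${n// /}]=${count// /}      # key "2" + "1" -> "21"
done <<'EOF'
3459906| |2|X1|WAS1| Output1
334596| |2|X1|WAS2| Output1
3495792| |1|X1|WAS1| Output1
687954| |1|X1|WAS2| Output1
EOF

echo "${x1was[21]}"    # prints: 687954
```

Because the loop reads from a redirection rather than a pipe, the array is still set after done.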
POSIX shell code (tested using dash):
eval $(while IFS='|' read a b c d e f ; do
echo $a 1>&2; echo x1$(echo $e | tr '[A-Z]' '[a-z]')$c=$(echo $a)
done < file1.dat 2> output.txt )
eval should not be used if there's any doubt about what's in file1.dat. The above code assumes the data in file1.dat is uniformly dependable.

Cut unix variable

I have the following at the moment:
for file in *
do
list="$list""$file "`cat $file | wc -l | sort -k1`$'\n'
done
echo "$list"
This is printing:
fileA 10
fileB 20
fileC 30
I would then like to cycle through $list and cut column 2 and perform calculations.
When I do:
for line in "$list"
do
noOfLinesInFile=`echo "$line" | cut -d' ' -f2`
echo "$noOfLinesInFile"
done
It prints:
10
20
30
BUT, the for loop is only being entered once. In this example, it should be entering the loop 3 times.
Can someone please tell me what I should do here to achieve this?
If you quote the variable
for line in "$list"
there is only one word, so the loop is executed just once.
Without quotes, $line would be set to each word in $list in turn, which is not what you want either: it would process the values one by one, not line by line.
You can set the $IFS variable to newline to split $list on newlines:
IFS=$'\n'
for line in $list ; do
...
done
Don't forget to reset IFS to its original value: either put the whole part into a subshell (if no variables need to survive the loop)
(
IFS=$'\n'
for line in $list ; do
...
done
)
or back up the value:
IFS_=$IFS
IFS=$'\n'
for line in $list ; do
...
done
IFS=$IFS_
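On bash 4.0 or later, a sketch that avoids changing IFS altogether is to read $list into an array with mapfile, one element per line; the sample data below is copied from the question's output.

```shell
#!/usr/bin/env bash
# Requires bash >= 4.0 for mapfile.
list=$'fileA 10\nfileB 20\nfileC 30\n'     # sample data from the question

mapfile -t lines <<< "${list%$'\n'}"       # strip the final newline first
for line in "${lines[@]}"; do
    echo "${line##* }"                     # field after the last space
done
```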
This is because a list in the shell is just a string that is split on whitespace (the default value of IFS) when it is expanded unquoted.
# list="a b c"
# for i in $list; do echo $i; done
a
b
c
# for i in "$list"; do echo $i; done
a b c
In your first loop, you are not actually building a list in the shell sense.
You should set a separator other than the default, either for the loop, in the append, or in the cut.
Use arrays instead:
#!/bin/bash
files=()
linecounts=()
for file in *; do
files+=("$file")
linecounts+=("$(wc -l < "$file")")
done
for i in "${!files[@]}" ;do
echo "${linecounts[i]}"
printf '%s %s\n' "${files[i]}" "${linecounts[i]}" ## Another form.
done
Although the first form can be done more simply as printf '%s\n' "${linecounts[@]}".
wc -l will only output one value, so you don't need to sort it:
for file in *; do
list+="$file "$( wc -l < "$file" )$'\n'
done
echo "$list"
Then, you can use a while loop to read the list line-by-line:
while read file nlines; do
echo $nlines
done <<< "$list"
That while loop is fragile if any filename has spaces. This is a bit more robust:
while read -a words; do
echo ${words[-1]}
done <<< "$list"
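Since the goal was to perform calculations on the second column, here is a sketch that sums the counts with $(( )) arithmetic. It takes the count as the text after the last space (${line##* }), so a filename containing spaces, like the hypothetical "my file" below, does not break it:

```shell
#!/usr/bin/env bash
list=$'fileA 10\nmy file 20\nfileC 30\n'   # hypothetical sample data

total=0
while IFS= read -r line; do
    [[ -n $line ]] || continue             # skip the empty line from the final \n
    n=${line##* }                          # count = text after the last space
    total=$(( total + n ))
done <<< "$list"
echo "$total"                              # prints: 60
```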

Concatenate strings in bash

I have in a bash script:
for i in `seq 1 10`
do
read AA BB CC <<< $(cat file1 | grep DATA)
echo ${i}
echo ${CC}
SORT=${CC}${i}
echo ${SORT}
done
So i is an integer, and CC is a string like "TODAY".
I would like SORT to contain "TODAY1", "TODAY2", etc.
But I get "1ODAY", "2ODAY" and so on.
Where is the error?
Thanks
You should try
SORT="${CC}${i}"
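Make sure your file does not contain a "\r" (carriage return, as in Windows CRLF line endings), which would end up at the end of $CC.
This could well explain why you get "1ODAY": when printed, the \r moves the cursor back to the start of the line, so the digit overwrites the "T".
Try including
| tr -d '\r'
after the cat command, which deletes every carriage return.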
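A small reproduction of the symptom, assuming a hypothetical CRLF-terminated sample file with DATA in the first field: read leaves the \r at the end of CC, and ${CC%$'\r'} strips it again.

```shell
#!/usr/bin/env bash
demo_file=$(mktemp)                        # hypothetical CRLF sample file
printf 'DATA foo TODAY\r\n' > "$demo_file"

read -r AA BB CC < "$demo_file"            # CC is "TODAY" plus a trailing \r
SORT=${CC}1
printf '%s\n' "$SORT" | od -c              # the \r sits between TODAY and 1

CC=${CC%$'\r'}                             # strip one trailing carriage return
SORT=${CC}1
echo "$SORT"                               # prints: TODAY1
rm -f "$demo_file"
```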
Try this:
for i in {1..10}
do
while read -r line
do
case "$line" in
*DATA* )
set -- $line
CC=$3
SORT=${CC}${i}
echo ${SORT}
esac
done <"file1"
done
Otherwise, show an example of file1 and your desired output.
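ghostdog's use of read -r is a good habit: -r stops read from treating backslashes as escape characters. (It does not remove carriage returns, so a CRLF file still needs its \r stripped.) Using arrays makes the -r option more pleasant:
for i in `seq 1 10`
do
read -ra line <<< "$(grep DATA file1)"
CC="${line[2]}" # arrays are zero-indexed, so the third field is index 2
echo ${i}
echo ${CC}
SORT=${CC}${i}
echo ${SORT}
done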

Randomizing arg order for a bash for statement

I have a bash script that processes all of the files in a directory using a loop like
for i in *.txt
do
ops.....
done
There are thousands of files and they are always processed in alphanumerical order because of '*.txt' expansion.
Is there a simple way to randomize the order and still ensure that I process each file exactly once?
Assuming the filenames do not have spaces, just substitute the output of List::Util::shuffle.
for i in `perl -MList::Util=shuffle -e'$,=$";print shuffle<*.txt>'`; do
....
done
If filenames do have spaces but don't have embedded newlines or backslashes, read a line at a time.
perl -MList::Util=shuffle -le'$,=$\;print shuffle<*.txt>' | while read i; do
....
done
To be completely safe in Bash, use NUL-terminated strings.
perl -MList::Util=shuffle -0 -le'$,=$\;print shuffle<*.txt>' |
while read -r -d '' i; do
....
done
Not very efficient, but it is possible to do this in pure Bash if desired. sort -R does something like this, internally.
declare -a a # create an indexed (sparse) array; $RANDOM only goes up to 32767, so this handles at most 32768 files
for i in *.txt; do
j=$RANDOM # find an unused slot
while [[ -n ${a[$j]} ]]; do
j=$RANDOM
done
a[$j]=$i # fill that slot
done
for i in "${a[@]}"; do # iterate in index order (which is random)
....
done
Or use a traditional Fisher-Yates shuffle.
a=(*.txt)
for ((i=${#a[@]}; i>1; i--)); do
j=$((RANDOM % i))
tmp=${a[j]}
a[j]=${a[i-1]}
a[i-1]=$tmp
done
for i in "${a[@]}"; do
....
done
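The same Fisher-Yates shuffle can be wrapped in a reusable function. This sketch assumes bash 4.3 or later for declare -n (a name reference to the caller's array); the name shuffle_array is hypothetical, and the caller's array must not itself be named arr.

```shell
#!/usr/bin/env bash
# Hypothetical helper: shuffle the named array in place (bash >= 4.3).
# Assumes contiguous indices 0..n-1, as a=(*.txt) produces.
shuffle_array() {
    declare -n arr=$1                      # name reference to the caller's array
    local i j tmp
    for (( i = ${#arr[@]} - 1; i > 0; i-- )); do
        j=$(( RANDOM % (i + 1) ))          # 0 <= j <= i (slight modulo bias)
        tmp=${arr[i]}
        arr[i]=${arr[j]}
        arr[j]=$tmp
    done
}

files=(a.txt b.txt c.txt d.txt)            # stand-in for files=(*.txt)
shuffle_array files
printf '%s\n' "${files[@]}"                # same four names, random order
```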
With GNU sort, you could pipe your filenames through the sort command:
ls | sort --random-sort | xargs ....
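Here's an answer that relies on very basic functions within awk, so it should be portable between Unix systems. Calling srand() first seeds the generator from the time of day; without it, awk starts from the same seed and produces the same order on every run.
ls -1 | awk 'BEGIN { srand() } {print rand()*100, $0}' | sort -n | awk '{print $2}'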
EDIT:
ephemient makes a good point that the above is not space-safe. Here's a version that is:
ls -1 | awk 'BEGIN { srand() } {print rand()*100, $0}' | sort -n | sed 's/[0-9\.]* //'
If you have GNU coreutils, you can use shuf:
while IFS= read -r -d '' f
do
# some stuff with $f
done < <(shuf -ze *)
This will work with files with spaces or newlines in their names.
Off-topic Edit:
To illustrate SiegeX's point in the comment:
$ a=42; echo "Don't Panic" | while read line; do echo $line; echo $a; a=0; echo $a; done; echo $a
Don't Panic
42
0
42
$ a=42; while read line; do echo $line; echo $a; a=0; echo $a; done < <(echo "Don't Panic"); echo $a
Don't Panic
42
0
0
The pipe causes the while to be executed in a subshell and so changes to variables in the child don't flow back to the parent.
Here's a solution with standard unix commands:
for i in *; do echo "$RANDOM-$i"; done | sort | cut -d- -f 2-
Here's a Python solution, if it's available on your system:
import glob
import random
files = glob.glob("*.txt")
random.shuffle(files)  # shuffles the list in place and returns None
for file in files:
    print(file)
