bash loop through file replace string - bash

I have a file called file.txt that contains the following:
123
223
Lane,id,s_id_sample_id
1,3_range.single_try,N76
2,44_range.single_try,N77
3,92_out_range.double_try,N79
I like to loop through this file and do the following:
begin from line after 'Lane' then split using comma and take the second column (id)
then take the id column and split on underscore, then
search and replace all dots and underscores with 'X' EXCEPT THE LAST TWO UNDERSCORES. So do not search and replace the last underscore (e.g. double_try).
So will like to end up with:
123
223
Lane,id,s_id_sample_id
1,3Xrange_single_try,N76
2,44Xrange_single_try,N77
3,92XoutXrange_double_try,N79
This is what I have done:
while IFS=',' read -r f1 f2; do
sed -e 's/_/X/g;s/\./X/g;s/'
echo "$f1,$f2"
done < "$file" > output
mv output $file
The problem is how can I specify to ignore the last two underscores?

This works by first replacing the last two dots or underscores with '#', then replacing the remaining dots and underscores with 'X', and finally, replacing all the '#' characters with underscores:
IFS=','
while read -r f1 f2 f3; do
f2=$(sed 's/[._]\([^._]\+\)[._]\([^._]\+\)$/#\1#\2/;s/[._]/X/g;s/#/_/g' <<< "$f2")
echo -n "$f1"
[[ -n $f2 ]] && echo -n ",$f2"
[[ -n $f3 ]] && echo -n ",$f3"
echo
done < "$file" > output
mv output "$file"
If '#' is likely to occur in your input data, you may want to use a different character. Anything that you can be reasonably sure won't occur in your input will do.

Related

UNIX :: Padding for files containing string and multipleNumber

I have many files not having consistent filenames.
For example
IMG_20200823_1.jpg
IMG_20200823_10.jpg
IMG_20200823_12.jpg
IMG_20200823_9.jpg
I would like to rename all of them and ensure they all follow same naming convention
IMG_20200823_0001.jpg
IMG_20200823_0010.jpg
IMG_20200823_0012.jpg
IMG_20200823_0009.jpg
Found out it's possible to change for file having only a number using below
printf "%04d\n"
However am not able to do with my files considering they mix string + "_" + different numbers.
Could anyone help me ?
Thanks !
With Perl's standalone rename or prename command:
rename -n 's/(\d+)(\.jpg$)/sprintf("%04d%s",$1,$2)/e' *.jpg
Output:
rename(IMG_20200823_10.jpg, IMG_20200823_0010.jpg)
rename(IMG_20200823_12.jpg, IMG_20200823_0012.jpg)
rename(IMG_20200823_1.jpg, IMG_20200823_0001.jpg)
rename(IMG_20200823_9.jpg, IMG_20200823_0009.jpg)
if everything looks fine, remove -n.
With Bash regular expressions:
re='(IMG_[[:digit:]]+)_([[:digit:]]+)'
for f in *.jpg; do
[[ $f =~ $re ]]
mv "$f" "$(printf '%s_%04d.jpg' "${BASH_REMATCH[1]}" "${BASH_REMATCH[2]}")"
done
where BASH_REMATCH is an array containing the capture groups of the regular expression. At index 0 is the whole match; index 1 contains IMG_ and the first group of digits; index 2 contains the second group of digits. The printf command is used to format the second group with zero padding, four digits wide.
Use a regex to extract the relevant sub-strings from the input and then pad it...
For each file.
Extract the prefix, number and suffix from the filename.
Pad the number with zeros.
Create the new filename.
Move files
The following code for bash:
echo 'IMG_20200823_1.jpg
IMG_20200823_10.jpg
IMG_20200823_12.jpg
IMG_20200823_9.jpg' |
while IFS= read -r file; do # foreach file
# Use GNU sed to extract parts on separate lines
tmp=$(<<<"$file" sed 's/\(.*_\)\([0-9]*\)\(\..*\)/\1\n\2\n\3\n/')
# Read the separate parts separated by newlines
{
IFS= read -r prefix
IFS= read -r number
IFS= read -r suffix
} <<<"$tmp"
# create new filename
newfilename="$prefix$(printf "%04d" "$number")$suffix"
# move the files
echo mv "$file" "$newfilename"
done
outputs:
mv IMG_20200823_1.jpg IMG_20200823_0001.jpg
mv IMG_20200823_10.jpg IMG_20200823_0010.jpg
mv IMG_20200823_12.jpg IMG_20200823_0012.jpg
mv IMG_20200823_9.jpg IMG_20200823_0009.jpg
Being puzzled by your hint at printf...
Current folder content:
$ ls -1 IMG_*
IMG_20200823_1.jpg
IMG_20200823_21.jpg
Surely is not a good solution but with printf and sed we can do that:
$ printf "mv %3s_%8s_%d.%3s %3s_%8s_%04d.%3s\n" $(ls -1 IMG_* IMG_* | sed 's/_/ /g; s/\./ /')
mv IMG_20200823_1.jpg IMG_20200823_0001.jpg
mv IMG_20200823_21.jpg IMG_20200823_0021.jpg

How to remove last part of a string with different length in bash

I am trying to collect the lines from a file which doesn't start with a # as its first caracter.
I have this code I am able to get them:
while IFS= read -r line
do
[[ -z "$line" ]] && continue
[[ "$line" =~ ^# ]] && continue
#echo "LINEREADED: $line"
done < $file
So the output I have is something like this:
modules/core_as/xxxx/xxxxxxxxxxxxxxxxxxxxxxxxxxx [100]
My question is how can I get only the string without the [100]?
I know there is some commands like sed or trim but the problem is that the string is not always that length, sometimes is different like:
cross_modules/core_as/xxxx/xxxxxxxxx [100-103]
or
cross_modules/core_as/xxxxxxxxxxxx/xxxxxxxxx [100-103]
or anything like that...
And in all this cases I only need the string without the [....] and without the last blank space at the end of last x, whichever the length of the string is, like cross_modules/core_as/xxxxxxxxxxxx/xxxxxxxxx
echo ${caseReaded:1:${#caseReaded}-7}
This also do the job but is not generic for any length.
Does anyone knows how I can get this?
You can strip a certain part of a string in bash
echo "${line% [*}"
cross_modules/core_as/xxxx/xxxxxxxxx
modules/core_as/xxxx/xxxxxxxxxxxxxxxxxxxxxxxxxxx
cross_modules/core_as/xxxxxxxxxxxx/xxxxxxxxx
If the spaces are only before [:
while IFS= read -r line _
do
[[ -z $line ]] && continue
[[ $line =~ ^# ]] && continue
done < "$file"
grep to match all lines not starting with # and then display the first field using cut, which works if the first field doesn't contain spaces:
grep -v ^# "$file" | cut -f1 -d' '
If the thing before [100] contains spaces, this may be the way to go:
grep -v ^# "$file" | sed -E 's/^(.*) .*$/\1/'
The last one works because the .* match in sed is greedy so only the last space will be left to match the outer condition .*$.

bash while loop "eats" my space characters

I am trying to parse a huge text file, say 200mb.
the text file contains some strings
123
1234
12345
12345
so my script looked like
while read line ; do
echo "$line"
done <textfile
however using this above method, my string " 12345" gets truncated to "12345"
I tried using
sed -n "$i"p textfile
but the the throughput is reduced from 27 to 0.2 lines per second, which is inacceptable ;-)
any Idea howto solve this?
You want to echo the lines without a fieldsep:
while IFS="" read line; do
echo "$line"
done <<< " 12345"
When you also want to skip interpretation of special characters, use
while IFS="" read -r line; do
echo "$line"
done <<< " 12345"
You can write the IFS without double quotes:
while IFS= read -r line; do
echo "$line"
done <<< " 12345"
This seems to be what you're looking for:
while IFS= read line; do
echo "$line"
done < textfile
The safest method is to use read -r in comparison to just read which will skip interpretation of special characters (thanks Walter A):
while IFS= read -r line; do
echo "$line"
done < textfile
OPTION 1:
#!/bin/bash
# read whole file into array
readarray -t aMyArray < <(cat textfile)
# echo each line of the array
# this will preserve spaces
for i in "${aMyArray[#]}"; do echo "$i"; done
readarray -- read lines from standard input
-t -- omit trailing newline character
aMyArray -- name of array to store file in
< <() -- execute command; redirect stdout into array
cat textfile -- file you want to store in variable
for i in "${aMyArray[#]}" -- for every element in aMyArray
"" -- needed to maintain spaces in elements
${ [#]} -- reference all elements in array
do echo "$i"; -- for every iteration of "$i" echo it
"" -- to maintain variable spaces
$i -- equals each element of the array aMyArray as it cycles through
done -- close for loop
OPTION 2:
In order to accommodate your larger file you could do this to help alleviate the work and speed up the processing.
#!/bin/bash
sSearchFile=textfile
sSearchStrings="1|2|3|space"
while IFS= read -r line; do
echo "${line}"
done < <(egrep "${sSearchStrings}" "${sSearchFile}")
This will grep the file (faster) before it cycles it through the while command. Let me know how this works for you. Notice you can add multiple search strings to the $sSearchStrings variable.
OPTION 3:
and an all in one solution to have a text file with your search criteria and everything else combined...
#!/bin/bash
# identify file containing search strings
sSearchStrings="searchstrings.file"
while IFS= read -r string; do
# if $sSearchStrings empty read in strings
[[ -z $sSearchStrings ]] && sSearchStrings="${string}"
# if $sSearchStrings not empty read in $sSearchStrings "|" $string
[[ ! -z $sSearchStrings ]] && sSearchStrings="${sSearchStrings}|${string}"
# read search criteria in from file
done <"${sSearchStrings}"
# identify file to be searched
sSearchFile="text.file"
while IFS= read -r line; do
echo "${line}"
done < <(egrep "${sSearchStrings}" "${sSearchFile}")

Checking for empty lines in a file

I don't have a code example here since I'm not sure how to do this at all, but I have a file. A legal empty line is one that only contains the new-line tab. Spaces or tabs are illegal.
How do I check if a line is "legally empty"?
If it doesn't have any words (I can check this with wc -w), how do I check if it has no spaces or tabs either, just new-line?
So I've tried something like this:
while read line; do
if [[ "$line" =~ ^$ ]]; then
echo empty line
continue
fi
done < $1
But it's not working. If I put a " " in an otherwise empty line, it still considers it empty.
If you want the line numbers of those empty lines:
perl -lne 'print $. if(/^$/)' your_file
If you want to delete those lines without Perl:
grep . your_file >new_file
If you want to delete those empty line in place using Perl:
perl -i -lne 'print if(/./)' your_file
Terminology: a line that contains only white space is a blank line. A line that contains nothing (except for the newline terminator) is an empty line.
The read builtin strips off leading and trailing whitespace. So if it encounters a blank line, it sets its argument to an empty string, regardless of the amount of whitespace. To avoid this behavior and return the input line unmodified, set the field separator characters to nothing (by default, they are space, tab and newline): set the IFS variable to the empty string. See Why is while IFS= read used so often, instead of IFS=; while read..? for a more detailed explanation. While you're at it, pass the -r option to read, unless you want backslash-newline sequences to be a line continuation.
while IFS= read -r line; do
if [ -z "$line" ]; then
echo empty line
fi
done <"$1"
If you want to tell whether a line is blank:
while IFS= read -r line; do
case "$line" in
'') echo "empty line";;
*[![:space:]]*) echo "non-blank line";;
*) echo "non-empty blank line";;
esac
done <"$1"
You can use Bash regular expression matching if you prefer:
while IFS= read -r line; do
if [[ "$line" =~ ^$ ]]; then
echo "empty line"
elif [[ "$line" =~ ^[[:space:]]+$ ]]; then
echo "non-empty blank line"
else
echo "non-blank line"
fi
done <"$1"
These can be done with pattern matching too (using shell wildcards, which have a different syntax from common regular expressions):
while IFS= read -r line; do
if [[ "$line" == "" ]]; then
echo "empty line"
elif [[ "$line" != *[![:space:]]* ]]; then
echo "non-empty blank line"
else
echo "non-blank line"
fi
done <"$1"
If you merely want to look for empty lines in the file and aren't processing the lines in any other way, you can use grep:
if grep -qxF '' <"$1"; then
echo "$1 contains an empty line"
fi
If you're looking for blank lines that are not empty:
if grep -Ex '[[:space:]]+' <"$1"; then
echo "$1 contains a non-empty blank line"
fi
You can check for an empty line with the regex
^$
^ is the beginning of a line, $ is the end of a line, the above regex matches if there are no other characters.
You can now use that in e.g. sed
sed '/^$/d' input.txt
This would delete all empty lines from your input file.
This would remove empty lines from the file and display the file content on console. The file still remains unchanged.
If you want to remove the empty lines from the file (meaning, changing the file content), then run:
sed -i '/^$/d' input.txt

Bash script get item from array

I'm trying to read file line by line in bash.
Every line has format as follows text|number.
I want to produce file with format as follows text,text,text etc. so new file would have just text from previous file separated by comma.
Here is what I've tried and couldn't get it to work :
FILENAME=$1
OLD_IFS=$IFSddd
IFS=$'\n'
i=0
for line in $(cat "$FILENAME"); do
array=(`echo $line | sed -e 's/|/,/g'`)
echo ${array[0]}
i=i+1;
done
IFS=$OLD_IFS
But this prints both text and number but in different format text number
here is sample input :
dsadadq-2321dsad-dasdas|4212
dsadadq-2321dsad-d22as|4322
here is sample output:
dsadadq-2321dsad-dasdas,dsadadq-2321dsad-d22as
What did I do wrong?
Not pure bash, but you could do this in awk:
awk -F'|' 'NR>1{printf(",")} {printf("%s",$1)}'
Alternately, in pure bash and without having to strip the final comma:
#/bin/bash
# You can get your input from somewhere else if you like. Even stdin to the script.
input=$'dsadadq-2321dsad-dasdas|4212\ndsadadq-2321dsad-d22as|4322\n'
# Output should be reset to empty, for safety.
output=""
# Step through our input. (I don't know your column names.)
while IFS='|' read left right; do
# Only add a field if it exists. Salt to taste.
if [[ -n "$left" ]]; then
# Append data to output string
output="${output:+$output,}$left"
fi
done <<< "$input"
echo "$output"
No need for arrays and sed:
while IFS='' read line ; do
echo -n "${line%|*}",
done < "$FILENAME"
You just have to remove the last comma :-)
Using sed:
$ sed ':a;N;$!ba;s/|[0-9]*\n*/,/g;s/,$//' file
dsadadq-2321dsad-dasdas,dsadadq-2321dsad-d22as
Alternatively, here is a bit more readable sed with tr:
$ sed 's/|.*$/,/g' file | tr -d '\n' | sed 's/,$//'
dsadadq-2321dsad-dasdas,dsadadq-2321dsad-d22as
Choroba has the best answer (imho) except that it does not handle blank lines and it adds a trailing comma. Also, mucking with IFS is unnecessary.
This is a modification of his answer that solves those problems:
while read line ; do
if [ -n "$line" ]; then
if [ -n "$afterfirst" ]; then echo -n ,; fi
afterfirst=1
echo -n "${line%|*}"
fi
done < "$FILENAME"
The first if is just to filter out blank lines. The second if and the $afterfirst stuff is just to prevent the extra comma. It echos a comma before every entry except the first one. ${line%|\*} is a bash parameter notation that deletes the end of a paramerter if it matches some expression. line is the paramter, % is the symbol that indicates a trailing pattern should be deleted, and |* is the pattern to delete.

Resources