When a pattern is found, how to prepend the last preceding line that contains another pattern in bash? - bash

After writing a list of all folders and subfolders to list.txt with the command ls -R, I have this kind of data:
Invertebrates/Phylum_echinoderma/Class_Echinoidea/Fam_Diadematidae/Sp_01:
DSCF0214.JPG
DSCF0215.JPG
DSCF0231.JPG
Invertebrates/Phylum_echinoderma/Class_Echinoidea/Fam_PasDiadematidae:
Sp_02
Sp_03
Invertebrates/Phylum_echinoderma/Class_Echinoidea/Fam_PasDiadematidae/Sp_02:
DSCF8981.JPG
DSCF8988.JPG
Invertebrates/Phylum_echinoderma/Class_Echinoidea/Fam_PasDiadematidae/Sp_03:
DSCF0638.JPG
Invertebrates/Phylum_echinoderma/Class_Holothuroidea/Fam_Stichopodidae:
Sp_07
Invertebrates/Phylum_echinoderma/Class_Holothuroidea/Fam_Stichopodidae/Sp_07:
DSCF0724.JPG
I would like to add a line of code that prepends the path to each picture name ("XXX.JPG").
So I tried to say in bash: if a line contains the ".JPG" pattern, paste before the picture name the last preceding line that contains "/Sp*", and replace : with /.
I want to obtain this:
Invertebrates/Phylum_echinoderma/Class_Echinoidea/Fam_Diadematidae/Sp_01:
Invertebrates/Phylum_echinoderma/Class_Echinoidea/Fam_Diadematidae/Sp_01/DSCF0214.JPG
Invertebrates/Phylum_echinoderma/Class_Echinoidea/Fam_Diadematidae/Sp_01/DSCF0215.JPG
Invertebrates/Phylum_echinoderma/Class_Echinoidea/Fam_Diadematidae/Sp_01/DSCF0231.JPG
Invertebrates/Phylum_echinoderma/Class_Echinoidea/Fam_PasDiadematidae:
Sp_02
Sp_03
Invertebrates/Phylum_echinoderma/Class_Echinoidea/Fam_PasDiadematidae/Sp_02:
Invertebrates/Phylum_echinoderma/Class_Echinoidea/Fam_PasDiadematidae/Sp_02/DSCF8981.JPG
Invertebrates/Phylum_echinoderma/Class_Echinoidea/Fam_PasDiadematidae/Sp_02/DSCF8988.JPG
Invertebrates/Phylum_echinoderma/Class_Echinoidea/Fam_PasDiadematidae/Sp_03:
Invertebrates/Phylum_echinoderma/Class_Echinoidea/Fam_PasDiadematidae/Sp_03/DSCF0638.JPG
Invertebrates/Phylum_echinoderma/Class_Holothuroidea/Fam_Stichopodidae:
Sp_07
Invertebrates/Phylum_echinoderma/Class_Holothuroidea/Fam_Stichopodidae/Sp_07:
Invertebrates/Phylum_echinoderma/Class_Holothuroidea/Fam_Stichopodidae/Sp_07/DSCF0724.JPG
I didn't find a way to tell bash "the last preceding line" that contains "/Sp*".
This is my code:
# Find the .JPG pattern and catch the picture name ("(.*\).JPG") and add "the last line before" that contain "/Sp*" and reput the .JPG pattern with the picture name:
sed 's/\(.*\).JPG/"the last line before" that contain "/Sp*""\1.JPG/' list.txt > list2.txt
sed -e 's/\:/\//g' list2.txt > list3.txt
Any advice to help me to complete this part of code is greatly appreciated.

While there may be a better alternative for getting the list of files, if that is not an option, I would write a simple bash script for your specific problem.
prefix=""
outfile=list2.txt
> "$outfile" # clean any existing file content, remove if not expected
while IFS= read -r line; do
    if [[ $line =~ (.*):$ ]]; then
        # directory line: print it and remember the path without the colon
        echo "$line" >> "$outfile"
        prefix="${BASH_REMATCH[1]}"
    elif [[ $line =~ \.JPG$ ]]; then
        # picture line: prepend the last directory seen
        echo "${prefix}/${line}" >> "$outfile"
    else
        echo "${line}" >> "$outfile"
    fi
done < list.txt

If I understand your question correctly you are actually looking for a way to find all files in this folder and all sub-folders and get the full path to them. If that is the case you should use find instead of ls. Like:
find .
or if you do want the full path from root you could do:
find /home/yourname/thedirectory/you/are/looking/in
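If only the pictures are wanted, find can also filter by type and name. A minimal sketch, assuming the pictures all end in .JPG as in the listing from the question; list_jpgs is a hypothetical helper name, not something from the answer:

```bash
#!/usr/bin/env bash
# list_jpgs DIR: print the path of every regular file under DIR whose
# name ends in .JPG (hypothetical helper; DIR defaults to the current directory).
list_jpgs() {
    find "${1:-.}" -type f -name '*.JPG'
}

# Example: write the sorted picture paths to a file
# list_jpgs . | sort > list2.txt
```

Each printed path starts with the directory given to find, so running it with . gives paths such as ./Invertebrates/.../Sp_01/DSCF0214.JPG.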

If your data is in a file named 'd', try GNU sed:
sed -E '/Sp_[0-9]+:$/{h;p;:c N;/\.JPG$/{s!:\n\s*!/!p;g;bc}; z}' d

Although misguided, it is possible to do with sed :
sed -n -e '/:$/{p;s#:$#/#;h}' -e '/\.JPG$/{H;x;h;s/\n//;p;x;s/\n.*//;h}'
The first expression is used when a directory is encountered (based on the fact that the line ends with :), prints it and saves the directory path in the hold buffer after having replaced the : by the / path-separator.
The second expression is used when a .JPG file is encountered, and performs this sequence of actions :
appends the line to the hold buffer (pattern space : picture.JPG ; hold buffer : dir/\npicture.JPG)
exchanges the pattern space and the hold buffer (pattern space : dir/\npicture.JPG ; hold buffer : picture.JPG)
saves the pattern space to the hold buffer (pattern space : dir/\npicture.JPG ; hold buffer : dir/\npicture.JPG)
removes the linefeed from the pattern space (pattern space : dir/picture.JPG ; hold buffer : dir/\npicture.JPG)
prints the pattern space (buffers unchanged)
exchanges the hold buffer and pattern space (pattern space : dir/\npicture.JPG ; hold buffer : dir/picture.JPG)
removes the linefeed and what follows from the pattern space (pattern space : dir/ ; hold buffer : dir/picture.JPG)
saves the pattern space to the hold buffer (pattern space : dir/ ; hold buffer : dir/)
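The buffer sequence above can be checked on a small input. This sketch feeds a shortened, made-up listing (the A/... paths are placeholders) through the same sed command; prefix_jpgs is a hypothetical wrapper name. Because of -n, lines that are neither directory lines nor .JPG lines are not printed.

```bash
#!/usr/bin/env bash
# prefix_jpgs: read an `ls -R` style listing on stdin and print each
# directory line followed by its .JPG entries prefixed with that directory.
prefix_jpgs() {
    sed -n -e '/:$/{p;s#:$#/#;h}' \
           -e '/\.JPG$/{H;x;h;s/\n//;p;x;s/\n.*//;h}'
}

printf '%s\n' 'A/Sp_01:' 'DSCF0214.JPG' 'DSCF0215.JPG' 'A/Fam:' 'Sp_02' | prefix_jpgs
# prints:
# A/Sp_01:
# A/Sp_01/DSCF0214.JPG
# A/Sp_01/DSCF0215.JPG
# A/Fam:
```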

Related

Rename multiple datetime files in Unix by inserting - and _ characters

I have many files in a directory that I want to rename so that they are recognizable according to a certain convention:
SURFACE_OBS:2019062200
SURFACE_OBS:2019062206
SURFACE_OBS:2019062212
SURFACE_OBS:2019062218
SURFACE_OBS:2019062300
etc.
How can I rename them in UNIX to be as follows?
SURFACE_OBS:2019-06-22_00
SURFACE_OBS:2019-06-22_06
SURFACE_OBS:2019-06-22_12
SURFACE_OBS:2019-06-22_18
SURFACE_OBS:2019-06-23_00
A bash shell loop using mv and parameter expansion could do it:
for file in *:[[:digit:]][[:digit:]][[:digit:]][[:digit:]][[:digit:]][[:digit:]][[:digit:]][[:digit:]][[:digit:]][[:digit:]]
do
prefix=${file%:*}
suffix=${file#*:}
mv -- "${file}" "${prefix}:${suffix:0:4}-${suffix:4:2}-${suffix:6:2}_${suffix:8:2}"
done
This loop picks up every file that matches the pattern:
* -- anything
: -- a colon
[[:digit:]][[:digit:]][[:digit:]][[:digit:]][[:digit:]][[:digit:]][[:digit:]][[:digit:]][[:digit:]][[:digit:]] -- 10 digits
... and then renames it by inserting dashes and an underscore in the desired locations.
I've chosen the wildcard for the loop carefully so that it tries to match the "input" files and not the renamed files. Adjust the pattern as needed if your actual filenames have edge cases that cause the wildcard to fail (and thus rename the files a second time).
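To preview the result of the parameter expansions before renaming anything, the same expansions can be wrapped in a function that only prints the new name. A sketch; new_name is a hypothetical helper and no file is touched:

```bash
#!/usr/bin/env bash
# new_name NAME: print NAME with its 10-digit suffix YYYYMMDDHH
# rewritten as YYYY-MM-DD_HH. Nothing is renamed on disk.
new_name() {
    local file=$1
    local prefix="${file%:*}"   # text before the last colon
    local suffix="${file#*:}"   # text after the first colon
    printf '%s\n' "${prefix}:${suffix:0:4}-${suffix:4:2}-${suffix:6:2}_${suffix:8:2}"
}

new_name 'SURFACE_OBS:2019062218'   # prints SURFACE_OBS:2019-06-22_18
```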
#!/bin/bash
strindex() {
# get position of character in string
x="${1%%"$2"*}"
[[ "$x" = "$1" ]] && echo -1 || echo "${#x}"
}
get_new_filename() {
# change filenames like: SURFACE_OBS:2019062218
# into filenames like: SURFACE_OBS:2019-06-22_18
src_str="${1}"
# add last underscore 2 characters from end of string
final_underscore_pos=${#src_str}-2
src_str="${src_str:0:final_underscore_pos}_${src_str:final_underscore_pos}"
# get position of colon in string
colon_pos=$(strindex "${src_str}" ":")
# get dash locations relative to colon position
y_dash_pos=${colon_pos}+5
m_dash_pos=${colon_pos}+8
# now add dashes in date
src_str="${src_str:0:y_dash_pos}-${src_str:y_dash_pos}"
src_str="${src_str:0:m_dash_pos}-${src_str:m_dash_pos}"
echo "${src_str}"
}
# accept path as argument or default to /tmp/baz/data
target_dir="${1:-/tmp/baz/data}"
while read -r line ; do
# since file renaming depends on position of colon extract
# base filename without path in case path has colons
base_dir=${line%/*}
filename_to_change=$(basename "${line}")
echo "mv ${line} ${base_dir}/$(get_new_filename "${filename_to_change}")"
# find cmd attempts to exclude files that have already been renamed
# (-name takes a shell glob, not a regex, so the two digits are spelled out)
done < <(find "${target_dir}" -name 'SURFACE*' ! -name '*_[0-9][0-9]')

Using sed in order to change a specific character in a specific line

I'm a beginner in bash and here is my problem. I have a file just like this one:
Azzzezzzezzzezzz...
Bzzzezzzezzzezzz...
Czzzezzzezzzezzz...
I am trying to edit this file in a script. The letters A, B and C are unique in the file and there is only one per line.
I want to replace the first e of each line with a number, which should be:
1 in line beginning with an A,
2 in line beginning with a B,
3 in line beginning with a C,
and I'd like to loop this in order to have this type of result
Azzz1zzz5zzz1zzz...
Bzzz2zzz4zzz5zzz...
Czzz3zzz6zzz3zzz...
All the numbers here are random int variables between 0 and 9. I really need to start by replacing 1,2,3 in first exec of my loop, then 5,4,6 then 1,5,3 and so on.
I tried this
sed "0,/e/s/e/$1/;0,/e/s/e/$2/;0,/e/s/e/$3/" /tmp/myfile
But the result was this (because I didn't specify the line)
Azzz1zzz2zzz3zzz...
Bzzzezzzezzzezzz...
Czzzezzzezzzezzz...
I noticed that doing sed -i "/A/ s/$/ezzz/" /tmp/myfile will add ezzz at the end of A line so I tried this
sed -i "/A/ 0,/e/s/e/$1/;/B/ 0,/e/s/e/$2/;/C/ 0,/e/s/e/$3/" /tmp/myfile
but it failed
sed: -e expression #1, char 5: unknown command: `0'
Here I'm lost.
I have in a variable (let's call it number_of_e_per_line) the number of e in either A, B or C line.
Thank you for the time you take for me.
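Just apply the s command only on the lines that match, for example lines starting with A. Use double quotes so that the shell expands $1, $2 and $3:
sed "
/^A/{ s/e/$1/; }
/^B/{ s/e/$2/; }
# or shorter
/^C/s/e/$3/
" /tmp/myfile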
The s command replaces only the first occurrence by default. For example, s/e/$1/2 replaces the second occurrence, and s/e/$1/g (g for "global") replaces all occurrences.
0,/e/ specifies a range of lines (a GNU sed extension): it selects lines from the start of the input up to the first line that matches /e/.
sed is not part of Bash. It is a separate (crude) programming language and is a very standard command. See https://www.grymoire.com/Unix/Sed.html .
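To see the address-plus-substitution behaviour described above without touching a file, here is a small sketch; replace_first_e is a hypothetical helper and the three-line input is a shortened version of the sample data.

```bash
#!/usr/bin/env bash
# replace_first_e N1 N2 N3: on stdin, replace the first 'e' of the line
# starting with A by N1, with B by N2, and with C by N3.
replace_first_e() {
    # Double quotes let the shell expand $1..$3 before sed sees the script.
    sed -e "/^A/s/e/$1/" -e "/^B/s/e/$2/" -e "/^C/s/e/$3/"
}

printf '%s\n' 'Azzzezzze' 'Bzzzezzze' 'Czzzezzze' | replace_first_e 1 2 3
# prints:
# Azzz1zzze
# Bzzz2zzze
# Czzz3zzze

# A numeric flag selects a later occurrence instead of the first one:
echo 'Azzzezzze' | sed 's/e/5/2'   # prints Azzzezzz5
```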
Continuing from the comment. sed is a poor choice here unless all your files can only have 3 lines. The reason is sed processes each line and has no way to keep a separate count for the occurrences of 'e'.
Instead, wrapping sed in a script and keeping track of the replacements allows you to handle any file no matter the number of lines. You just loop and handle the lines one at a time, e.g.
#!/bin/bash
[ -z "$1" ] && { ## validate one argument for filename provided
printf "error: filename argument required.\nusage: %s filename\n" "$0" >&2
exit 1
}
[ -s "$1" ] || { ## validate file exists and non-empty
printf "error: file not found or empty '%s'.\n" "$1" >&2
exit 1
}
declare -i n=1 ## occurrence counter initialized 1
## loop reading each line
while read -r line || [ -n "$line" ]; do
[[ $line =~ ^.*e.*$ ]] || continue ## line has 'e' or get next
sed "s/e/1/$n" <<< "$line" ## substitute the nth occurrence of 'e'
((n++)) ## increment counter
done < "$1"
Your data file having "..." at the end of each line suggests your file is larger than the snippet posted. If you have lines beginning 'A' - 'Z', you don't want to have to write 26 separate /match/s/find/replace/ substitutions. And if you have somewhere between 3 and 26 (or more), you don't want to have to rewrite a different sed expression for every new file you are faced with.
That's why I say sed is a poor choice. You really have no way to make the task a generic task with sed alone. The downside to using a script is that it becomes a poor choice as the number of records you need to process increases (beyond 100000 or so, purely for efficiency reasons, since it starts one sed process per line).
Example Use/Output
With the script in replace-e-incremental.sh and your data in file, you would do:
$ bash replace-e-incremental.sh file
Azzz1zzzezzzezzz...
Bzzzezzz1zzzezzz...
Czzzezzzezzz1zzz...
To Modify file In-Place
Since you make multiple calls to sed here, you need to redirect the output of the file to a temporary file and then replace the original by overwriting it with the temp file, e.g.
$ bash replace-e-incremental.sh file > mytempfile && mv -f mytempfile file
$ cat file
Azzz1zzzezzzezzz...
Bzzzezzz1zzzezzz...
Czzzezzzezzz1zzz...

how to add a word at the end of a line with ^ without a line break?

I would like to add a string (example: "1565555555") at the end of a particular line in my file.
My file .txt before :
mystrinsdsfssffdfdg
mystrdsfdsfdfffding
mystrsfdsdfsffdfing
mystrdsfdfsdfsffing
Here is my script:
for file in mydirectory/*txt; do
filename=`basename "$file"`
# read each line
while IFS= read -r line
do
old="$IFS"
IFS="^"
set $line
IFS="$old"
count=1
id="2656556655"
sed "s/$line/&^$id/" -i $file #my problem
((count++))
done < "$file"
done
Today, my result :
mystrinsdsfssffdfdg
^2656556655
mystrdsfdsfdfffding
^2656556655
mystrsfdsdfsffdfing
^2656556655
mystrdsfdfsdfsffing
^2656556655
Expected result :
mystrinsdsfssffdfdg^2656556655
mystrdsfdsfdfffding^2656556655
mystrsfdsdfsffdfing^2656556655
mystrdsfdfsdfsffing^2656556655
Assuming the objective is to append a string (^2656556655) on the end of every line in a given file ...
One sample file:
$ cat mystring.txt
mystrinsdsfssffdfdg
mystrdsfdsfdfffding
mystrsfdsdfsffdfing
mystrdsfdfsdfsffing
One sed solution that appends to the end of every line in the file:
$ sed 's/$/^2656556655/g' mystring.txt
mystrinsdsfssffdfdg^2656556655
mystrdsfdsfdfffding^2656556655
mystrsfdsdfsffdfing^2656556655
mystrdsfdfsdfsffing^2656556655
One benefit to this method is that you replace a) the inner looping construct and the repeated sed calls for each line in the file with b) a single sed call and a single pass through the input file. Net result is that you should see a noticeable speed up in the time it takes to process a given file.
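The same single-pass idea can be applied to every file in a directory, as in the original loop. A sketch, assuming GNU sed (for -i without a backup suffix) and an id that contains no characters special in a sed replacement (such as /, & or \); append_id is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# append_id DIR ID: append ^ID to every line of each *.txt file in DIR,
# editing the files in place with one sed call per file.
append_id() {
    local dir=$1 id=$2 file
    for file in "$dir"/*.txt; do
        [ -f "$file" ] || continue     # skip if the glob matched nothing
        sed -i "s/\$/^$id/" "$file"    # \$ keeps the end-of-line anchor literal
    done
}

# Example: append_id mydirectory 2656556655
```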

How to replace lower case with sed

SET_VALUE(ab.ms.r.gms_dil_cfg.f().gms_dil_mode, dsad_sd );
How can I use sed to replace only from the SET_VALUE until the , with each letter after _ to be upper case?
result:
SET_VALUE(ab.ms.r.gmsDilCfg.f().gmsDilMode, dsad_sd );
For your input string you may apply the following sed expression + bash variable substitution:
s="SET_VALUE(ab.ms.r.gms_dil_cfg.f().gms_dil_mode, dsad_sd );"
res=$(sed '1s/_\([a-z]\)/\U\1/g;' <<< "${s%,*}"),${s#*,}
echo "$res"
The output:
SET_VALUE(ab.ms.r.gmsDilCfg.f().gmsDilMode, dsad_sd );
Got distracted while writing this one up so Roman beat me to the punch, but this has a slight variation so figured I'd post it as another option ...
$ s="SET_VALUE(ab.ms.r.gms_dil_cfg.f().gms_dil_mode, dsad_sd );"
$ sed 's/,/,\n/g' <<< "$s" | sed -n '1{s/_\([a-z]\)/\U\1/g;N;s/\n//;p}'
SET_VALUE(ab.ms.r.gmsDilCfg.f().gmsDilMode, dsad_sd );
s/,/,\n/g : break input into separate lines at the comma (leave comma on first line, push rest of input to a second line)
at this point we've broken our input into 2 lines; the second sed invocation will now be working with a 2-line input
sed -n : refrain from printing input lines as they're processed; we'll explicitly print lines when required
1{...} : for the first line, apply the commands inside the braces ...
s/_\([a-z]\)/\U\1/g : for each match of '_[a-z]', capture the letter in group #1 and replace the whole match with the upper-case form of group #1
at this point we've made the desired edits to line #1 (ie, everything before the comma in the original input), now ...
N : read and append the next line into the pattern space
s/\n// : remove the newline (replace it with nothing)
at this point we've pasted lines #1 and #2 together into a single line
p : print the pattern space
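The two-stage pipeline explained above can be wrapped in a function for reuse. A sketch, assuming GNU sed (needed for \U and for \n in the replacement); camel_before_comma is a hypothetical name, and the g flag on the first sed is dropped here so that only the first comma splits the line:

```bash
#!/usr/bin/env bash
# camel_before_comma: read one line on stdin; in the part up to the first
# comma, turn each "_x" (x lower case) into "X". Text after the comma is kept.
camel_before_comma() {
    sed 's/,/,\n/' | sed -n '1{s/_\([a-z]\)/\U\1/g;N;s/\n//;p}'
}

camel_before_comma <<< 'SET_VALUE(ab.ms.r.gms_dil_cfg.f().gms_dil_mode, dsad_sd );'
# prints SET_VALUE(ab.ms.r.gmsDilCfg.f().gmsDilMode, dsad_sd );
```

The sketch expects the input to contain at least one comma, as the example line does.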

Displaying only single most recent line of a command's output

How can I print a command output like one from rm -rv * in a single line ? I think it would need \r but I can't figure out how.
I would need to have something like this :
From:
removed /path/file1
removed /path/file2
removed /path/file3
To : Line 1 : removed /path/file1
Then : Line 1 : removed /path/file2
Then : Line 1 : removed /path/file3
EDIT : I may have been misunderstood. I want the whole process to be printed on one and the same line, which changes each time the command outputs another line (like removed /path/file123).
EDIT2 : The output is sometimes too long to be displayed on one line (very long path). I would need something that handles that problem too :
/very/very/very/long/path/to/a/very/very/very/far/file/with-a-very-very-very-long-name1
/very/very/very/long/path/to/a/very/very/very/far/file/with-a-very-very-very-long-name2
/very/very/very/long/path/to/a/very/very/very/far/file/with-a-very-very-very-long-name3
Here's a helper function:
shopt -s checkwinsize # ensure that COLUMNS is available w/ window size
oneline() {
local line ws
while IFS= read -r line; do
if (( ${#line} >= COLUMNS )); then
# Moving cursor back to the front of the line so user input doesn't force wrapping
printf '\r%s\r' "${line:0:$COLUMNS}"
else
ws=$(( COLUMNS - ${#line} ))
# by writing each line twice, we move the cursor back to the end of the content
# thus: CR, content, whitespace, CR, content
printf '\r%s%*s\r%s' "$line" "$ws" " " "$line"
fi
done
echo
}
Used as follows:
rm -rv -- * 2>&1 | oneline
To test this a bit more safely, one might use:
for f in 'first line' 'second line' '3rd line'; do echo "$f"; sleep 1; done | oneline
...you'll see that that test displays first line for a second, then second line for a second, then 3rd line for a second.
If you want a "status line" result that is showing the last line output by the program where the line gets over-written by the next line when it comes out you can send the output for the command through a short shell while loop like this:
YourCommand | while read line ; do echo -n "$line"$' ...[lots of spaces]... \r' ; done
The [Lots of spaces] is needed in case a shorter line comes after a longer line. The short line needs to overwrite the text from the longer line or you will see residual characters from the long line.
The echo -n $' ... \r' sends a literal carriage return without a line-feed to the screen which moves the position back to the front of the line but doesn't move down a line.
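An alternative to padding with spaces is to erase the rest of the row with the ANSI "erase in line" escape sequence (ESC [ K), which most terminal emulators support. This is a swapped-in technique, not the one used in the answers above; status_line and its optional width argument are hypothetical.

```bash
#!/usr/bin/env bash
# status_line [WIDTH]: show each stdin line on the same terminal row.
# WIDTH defaults to $COLUMNS, or 80 if that is unset.
status_line() {
    local width=${1:-${COLUMNS:-80}} line
    while IFS= read -r line; do
        # \r returns to column 0, ESC[K erases to the end of the row, then
        # the line is written, cut to WIDTH-1 characters to avoid wrapping.
        printf '\r\033[K%s' "${line:0:width-1}"
    done
    echo    # finish with a newline so the prompt starts on a fresh row
}

# Example:
# for f in 'first line' 'second line' '3rd line'; do echo "$f"; sleep 1; done | status_line
```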
If you want the text from your command to be output as one long line, pipe the output of the command through this sed command; it replaces the newlines with spaces, which puts all the output on one line. You could change the space to another delimiter if desired.
your command | sed ':rep; {N;}; s/\n/ /; {t rep};'
:rep; is a non-command that marks where to go to in the {t rep} command.
{N;} will join the current line to the next line.
It doesn't remove the newline but just puts the 2 lines in the pattern space to be used by the following commands.
s/\n/ /; says to replace the newline character with a space character. The space is between the second and third / characters.
You may also need to remove \r if the input has Windows-style \r\n line endings. UNIX files don't unless they came from a PC and haven't been converted.
{t rep}; says that if the match was found in the s/// command then go to the :rep; marker.
This will keep joining lines, replacing the \n, then jumping to :rep; until there are no more lines to join.
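For the plain "everything on one line" case, the standard paste utility performs the same join without a sed loop. A sketch, assuming the input may carry Windows-style \r\n endings that should be dropped; join_lines is a hypothetical name.

```bash
#!/usr/bin/env bash
# join_lines: read stdin, delete any carriage returns, and join all lines
# into a single line separated by single spaces.
join_lines() {
    tr -d '\r' | paste -s -d ' ' -
}

printf 'a\r\nb\r\nc\r\n' | join_lines   # prints a b c
```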
