How to extract strings from a text in shell

I have a file named
"PHOTOS_TIMESTAMP_5373382".
From this file name I want to extract "PHOTOS_5373382" and add "ABC", i.e. I finally want it to look like
"abc_PHOTOS_5373382", in a shell script.

echo "PHOTOS_TIMESTAMP_5373382" | awk -F"_" '{print "ABC_"$1"_"$3}'
echo provides the input for the awk command.
awk splits the input into fields on the character '_', which is set with the -F option.
Each field (numbered from 1) can be accessed as $n, where n is the field number.

You can run the following sequence of commands directly in your shell (preferably bash), or save it as a complete script that takes a single argument: the file to be renamed.
#!/bin/bash
myFile="$1"                                             # Input argument (file name with extension)
filename=$(basename "$myFile")                          # Strip any leading directory path
extension="${filename##*.}"                             # Extension only (text after the last '.')
filename="${filename%.*}"                               # File name without the extension
IFS="_" read -r string1 string2 string3 <<<"$filename"  # Split the name into fields on '_'
mv -v -- "$myFile" "ABC_${string1}_${string3}.${extension}"  # Rename; the new file is created in the current directory
Running the script as
$ ./script.sh PHOTOS_TIMESTAMP_5373382.jpg
`PHOTOS_TIMESTAMP_5373382.jpg' -> `ABC_PHOTOS_5373382.jpg'

Although I like awk, here are two alternatives.
Native shell solution:
k="PHOTOS_TIMESTAMP_5373382"
IFS="_" read -r -a arr <<< "$k"
echo "abc_${arr[0]}_${arr[2]}"
sed solution:
echo "abc_$k" | sed -e 's/TIMESTAMP_//g'
abc_PHOTOS_5373382
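If you would rather not depend on the middle field being exactly "TIMESTAMP", POSIX parameter expansion can keep the first and last underscore-separated fields without starting awk or sed. This is a minimal sketch, assuming the name contains at least two underscores; the function name make_abc_name is hypothetical. Any extra middle fields are dropped as well.

```shell
# Hypothetical helper: build abc_<first field>_<last field> from a name like
# PHOTOS_TIMESTAMP_5373382, using only POSIX parameter expansion.
make_abc_name() {
    name=$1
    first=${name%%_*}   # remove the longest "_..." suffix  -> PHOTOS
    last=${name##*_}    # remove the longest "..._" prefix  -> 5373382
    printf 'abc_%s_%s\n' "$first" "$last"
}

make_abc_name "PHOTOS_TIMESTAMP_5373382"   # prints abc_PHOTOS_5373382
```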

Related

How to print just the file name when looping through all files of a directory

Imagine there are these 3 subdirectories inside my directory:
dfcg7 yhjs6 gbggse3
Inside each of these subdirectories there is a txt file that I would like to use in another program, so I would like to print the relative paths to these files.
I am trying:
for file in /mnt/lustre/mydir*
do
printf "$file/*.txt \t"
done
and I also tried:
for file in /mnt/lustre/mydir*
do
printf "$file"/*.txt "\t"
done
but in both cases, my output is this:
/mnt/lustre/mydir/dfcg7/*txt/mnt/lustre/mydir/yhjs6/*txt/mnt/lustre/mydir/gbggse3/*txt
My output is not tab-separated
It prints the full path instead of the relative one
It does not print the file name inside each subdirectory
So, my desired output would be this:
dfcg7/fileA.txt yhjs6/fileB.txt gbggse3/fileC.txt
How can I solve this?

You could store the path prefix in a variable:
prefix=/mnt/lustre/mydir
Assign the files to an array:
files=("$prefix"/*/*.txt)
And then print the array, tab separated, while removing the prefix from each element:
$ (IFS=$'\t'; printf '%s\n' "${files[*]/#"$prefix"\/}")
dfcg7/fileA.txt gbggse3/fileC.txt yhjs6/fileB.txt
This uses a subshell to contain the scope of the modified IFS.
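A variant of the same prefix-stripping idea avoids the subshell and the array and writes tab separators only between entries, so there is no trailing tab. This is a minimal POSIX sh sketch; the function name print_relative_txt is hypothetical, and it assumes exactly one directory level between the prefix (given without a trailing slash) and the .txt files.

```shell
# Hypothetical helper: print_relative_txt PREFIX
# Prints every PREFIX/*/*.txt relative to PREFIX, tab-separated, ending with a newline.
print_relative_txt() {
    prefix=$1
    tab=$(printf '\t')
    sep=''
    for file in "$prefix"/*/*.txt; do
        [ -e "$file" ] || continue                # the glob matched nothing
        printf '%s%s' "$sep" "${file#"$prefix"/}" # strip "PREFIX/" from the front
        sep=$tab                                  # separator only between entries
    done
    printf '\n'
}
```

For the question's layout this would be called as print_relative_txt /mnt/lustre/mydir.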

This is a quick solution that I've been able to come up with:
RESULT=""
for file in /path/to/your/files/*; do
OUTPUT=$(ls "$file"/*txt)
RESULT="$RESULT:$OUTPUT"
done
# replace every : with a tab and remove the leading tab
echo $RESULT | tr ':' '\t' | gsed -r 's/^\s+//g'
Notice that I've used gsed here: it is GNU sed, which is available for macOS (for example via Homebrew). If you are using Linux, you can simply use sed.

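You can try:
for file in /mnt/lustre/mydir/*/*.txt; do
printf '%s\t' "${file#*/*/*/*/}"
done
Output
dfcg7/fileA.txt gbggse3/fileC.txt yhjs6/fileB.txt
Here ${file#*/*/*/*/} removes the shortest leading match of */*/*/*/, which is /mnt/lustre/mydir/, leaving the relative path.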

Try:
cd /mnt/lustre/mydir
for file in */*.txt
do
printf "%s\t" "$file"
done
printf "\n"
Or easier:
(cd /mnt/lustre/mydir; ls -d */*.txt | tr '\n' '\t'); echo
The last echo is to append a \n at the end, but if you don't need it, you can leave it out.

Read file by line then process as different variable

I have created a text file with a list of file names like below
022694-39.tar
022694-39.tar.2017-05-30_13:56:33.OLD
022694-39.tar.2017-07-04_09:22:04.OLD
022739-06.tar
022867-28.tar
022867-28.tar.2018-07-18_11:59:19.OLD
022932-33.tar
I am trying to read the file line by line then strip anything after .tar with awk and use this to create a folder unless it exists.
Then the plan is to copy the original file to the new folder with the original full name stored in $LINE.
$QNAP= "Path to storage"
$LOG_DIR/$NOVA_TAR_LIST= "Path to text file containing file names"
while read -r LINE; do
CURNT_JOB_STRIPED="$LINE | `awk -F ".tar" '{print $1}'`"
if [ ! -d "$QNAP/$CURNT_JOB_STRIPED" ]
then
echo "Folder $QNAP/$CURNT_JOB_STRIPED doesn't exist."
#mkdir "$QNAP/$CURNT_JOB_STRIPED"
fi
done <"$LOG_DIR/$NOVA_TAR_LIST"
Unfortunately this seems to be trying to join all the file names together when trying to create the directories rather than doing them one by one and I get a
File name too long
output:
......951267-21\n951267-21\n961075-07\n961148-13\n961520-20\n971333-21\n981325-22\n981325-22\n981743-40\n999111-99\n999999-04g\n999999-44': File name too long
Apologies if this is trivial, bit of a rookie...

Try modifying your script as follows:
CURNT_JOB_STRIPED=$(echo "${LINE}" | awk -F ".tar" '{print $1}')
You have to use $(...) for command substitution. Also, inside double quotes the | is just a literal character, so nothing was piped to awk; you need to print the value of LINE (here with echo) and pipe it into awk. Finally, remove the backticks around the awk expression (backticks are the deprecated syntax for command substitution), since what you want is the output of the whole pipeline. With the backticks, awk had no input of its own, so it read the rest of the loop's standard input (the remaining lines of the list file), which is why all the names ended up joined into one long directory name.
For further information, see http://tldp.org/LDP/abs/html/commandsub.html
Alternatively (it is far less readable and no faster, so this is just a curiosity), you can replace the whole while loop with:
xargs -I{} bash -c 'mkdir -p "${2}/${1%.tar*}"' - '{}' "${QNAP}" < "${LOG_DIR}/${NOVA_TAR_LIST}"
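The question also plans to copy each file into its new folder under its original full name. That can be sketched as a loop that uses parameter expansion (${line%.tar*} removes the shortest trailing match of .tar*) instead of awk. This is a minimal sketch: archive_by_job is a hypothetical name, and the source-directory argument is an assumption, since the question does not say where the listed files are stored.

```shell
# Hypothetical helper: archive_by_job LIST_FILE SRC_DIR DEST_DIR
# For every name in LIST_FILE, create DEST_DIR/<name up to .tar> and copy
# SRC_DIR/<name> into it, keeping the original full file name.
archive_by_job() {
    list=$1 src=$2 dest=$3
    while IFS= read -r line; do
        [ -n "$line" ] || continue        # skip blank lines
        job=${line%.tar*}                 # 022694-39.tar.2017-05-30_13:56:33.OLD -> 022694-39
        mkdir -p -- "$dest/$job"          # no error if the folder already exists
        cp -p -- "$src/$line" "$dest/$job/"
    done < "$list"
}
```

For example: archive_by_job "${LOG_DIR}/${NOVA_TAR_LIST}" /path/to/tars "${QNAP}", where /path/to/tars is a placeholder for wherever the .tar files actually live.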

The problem is with the CURNT_JOB_STRIPED="$LINE | `awk -F ".tar" '{print $1}'`" line.
The `command` form is legacy syntax; $(command) should be used instead.
The $LINE variable has to be printed (for example with echo) so that awk receives its value through a pipe.
Command substitution, $(command), runs the command in a subshell and lets you assign its output to a variable: var=$(date)
It is safer to write variables as ${var}, so that surrounding text does not give unexpected results.
This should work:
CURNT_JOB_STRIPED=$(echo "${LINE}" | awk -F '.tar' '{print $1}')
With variable substitution (parameter expansion) this can be done with more efficient code, and I believe it is also cleaner to read.
Variable substitution does not change the ${LINE} variable, so it can still be used later as the unchanged full file name, while ${LINE%.tar*} removes the last .tar and everything after it (the *) from the value.
while read -r LINE; do
if [ ! -d "${QNAP}/${LINE%.tar*}" ]
then
echo "Folder ${QNAP}/${LINE%.tar*} doesn't exist."
#mkdir "${QNAP}/${LINE%.tar*}"
fi
done <"${LOG_DIR}/${NOVA_TAR_LIST}"
This way you do not need to store the directory name in a variable, and ${LINE} holds only the file name. If you do need the directory name in a variable, that is easy: var="${LINE%.tar*}"
Variable Substitution:
There are more forms; I picked only these four because they are similar and relevant here.
${var#pattern} - the value of var with the shortest match of pattern removed from the beginning (left)
${var##pattern} - same as above, but removes the longest match instead of the shortest
${var%pattern} - the value of var with the shortest match of pattern removed from the end (right)
${var%%pattern} - same as above, but removes the longest match instead of the shortest
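To see the difference between the four forms, here they are applied to one of the names from the question. These are POSIX parameter expansions, so this also works outside bash.

```shell
# Apply the four expansions to one of the names from the question.
name='022694-39.tar.2017-05-30_13:56:33.OLD'
short_prefix=${name#*.}   # tar.2017-05-30_13:56:33.OLD        (shortest "*." removed from the left)
long_prefix=${name##*.}   # OLD                                (longest "*." removed from the left)
short_suffix=${name%.*}   # 022694-39.tar.2017-05-30_13:56:33  (shortest ".*" removed from the right)
long_suffix=${name%%.*}   # 022694-39                          (longest ".*" removed from the right)
job=${name%.tar*}         # 022694-39                          (the form used in the loop above)
printf '%s\n' "$short_prefix" "$long_prefix" "$short_suffix" "$long_suffix" "$job"
```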

How to use variables in a sed command line?

I have this configuration file that has
# some other configuration settings
.....
wrapper.java.classpath.1=/opt/project/services/wrapper.jar
wrapper.java.classpath.2=/opt/project/RealTimeServer/RTEServer.jar
wrapper.java.classpath.3=/opt/project/mysql-connector-java-5.1.39-bin.jar
.....
# some other configuration settings
and I want it to look like this
# some other configuration settings
.....
wrapper.java.classpath.1=/opt/project/services/wrapper.jar
wrapper.java.classpath.2=/opt/project/RealTimeServer/RTEServer.jar
wrapper.java.classpath.3=/opt/project/mysql-connector-java-5.1.39-bin.jar
wrapper.java.classpath.4=/opt/project/RealTimeServer/some_other.jar
.....
# some other configuration settings
So I wrote this bash script:
#!/bin/bash
CONF_FILE=$1
JAR_FILE=$2
DIR=$3
# Get the last wrapper.java.classpath.N=/some_path line
CLASSPATH=`awk '/classpath/ {aline=$0} END{print aline}' $CONF_FILE`
echo $CLASSPATH
# Get the left side of the equation
IFS='=' read -ra LS <<< "$CLASSPATH"
# Get the value of N
NUM=${LS##*\.}
# Increment by 1
NUM=$((NUM+1))
echo $NUM
NEW_LINE="wrapper.java.classpath.$NUM=$DIR/$JAR_FILE"
echo $NEW_LINE
# Append classpath line to conf file
sed "/$CLASSPATH/a \\${NEW_LINE}" $CONF_FILE
I call it this way
./append_classpath.sh some_file.conf some_other.jar /opt/project/RealTimeServer
But I get
sed: -e expression #1, char 28: unknown command: `o'

I just saw your shell script. A shell is an environment from which to call tools, it is NOT a tool to manipulate text. The standard, general purpose UNIX tool to manipulate text is awk. Your entire shell script can be reduced to:
$ dir="/opt/project/RealTimeServer"
$ jar_file="some_other.jar"
$ awk -v new="$dir/$jar_file" 'p~/classpath/ && !/classpath/{match(p,/([^=]+\.)([0-9]+)=/,a); print a[1] (++a[2]) "=" new} {print; p=$0}' file
# some other configuration settings
.....
wrapper.java.classpath.1=/opt/project/services/wrapper.jar
wrapper.java.classpath.2=/opt/project/RealTimeServer/RTEServer.jar
wrapper.java.classpath.3=/opt/project/mysql-connector-java-5.1.39-bin.jar
wrapper.java.classpath.4=/opt/project/RealTimeServer/some_other.jar
.....
# some other configuration settings
The above uses GNU awk for the 3rd arg to match(). Read the book Effective Awk Programming, 4th Edition, by Arnold Robbins if you will ever have to manipulate text in a UNIX environment.
Now back to your question:
This is the syntax for what you are TRYING to do:
sed '/'"$some_string"'/a '"$some_line" "$some_file"
BUT DON'T DO IT or you'll be condemning yourself to cryptic, non-portable, unmaintainable, peeling the onion, escaping-everything hell (see Is it possible to escape regex metacharacters reliably with sed)!
sed is for simple substitutions on individual lines, that is all. For anything else, e.g. what you are attempting, you should be using awk:
awk -v regexp="$some_string" -v line="$some_line" '{print} $0~regexp{print line}' file
Note that although your shell variable is named "some_string" you were using it in a regexp context (all you can do with sed) so I used it in a regexp context in the awk command too and named the awk variable "regexp" rather than "string" for clarity (it's just a variable name, though, no hidden meaning).
If you really DID want it treated as a string rather than a regexp then that'd be:
awk -v string="$some_string" -v line="$some_line" '{print} index($0,string){print line}' file
The only caveat to the above is that escape sequences in the shell variables will be interpreted when the awk variables are initialized from them, so \t, for example, would become a literal tab character. If that's undesirable, let us know and we can provide an alternative syntax for initializing the awk variables that does not expand backslashes; see http://cfajohnson.com/shell/cus-faq-2.html#Q24.
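If GNU awk is not available, the first awk command can be sketched in POSIX awk by taking the previous line apart with sub() instead of the 3-argument match(). This is a sketch with a hypothetical function name, append_classpath. Like the command above, it inserts the new entry after the last line of a classpath block, so nothing is added if a classpath line is the very last line of the file.

```shell
# Hypothetical helper: append_classpath CONF_FILE NEW_VALUE
# Prints CONF_FILE with wrapper.java.classpath.<N+1>=NEW_VALUE inserted
# after the last wrapper.java.classpath.<N>=... line of a block.
append_classpath() {
    awk -v new="$2" '
        # Previous line was a classpath entry and this one is not:
        # emit the next numbered entry before printing the current line.
        prev ~ /classpath\.[0-9]+=/ && $0 !~ /classpath\.[0-9]+=/ {
            key = prev
            sub(/=.*/, "", key)       # wrapper.java.classpath.3
            n = key
            sub(/.*\./, "", n)        # 3
            sub(/[0-9]+$/, "", key)   # wrapper.java.classpath.
            print key (n + 1) "=" new
        }
        { print; prev = $0 }
    ' "$1"
}
```

Usage: append_classpath some_file.conf /opt/project/RealTimeServer/some_other.jar prints the updated configuration to standard output.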

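The sed command will have problems with the slashes in your variables, because / also delimits the regex in sed.
Pick a delimiter that does not occur in your data, such as #, and try something like the following (the \n in the replacement is a GNU sed extension):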
CLASSPATH="wrapper.java.classpath.4=/opt/project/RealTimeServer/some_other.jar"
NEW_LINE="wrapper.java.classpath.5=your/data.rar"
echo "# some other configuration settings
.....
wrapper.java.classpath.1=/opt/project/services/wrapper.jar
wrapper.java.classpath.2=/opt/project/RealTimeServer/RTEServer.jar
wrapper.java.classpath.3=/opt/project/mysql-connector-java-5.1.39-bin.jar
wrapper.java.classpath.4=/opt/project/RealTimeServer/some_other.jar
.....
# some other configuration settings
Some more config lines
" | sed "s#${CLASSPATH}#&\n${NEW_LINE}#"

This is a one-pass, pure bash solution; it should be fine as long as the configuration file is not huge:
pfx=wrapper.java.classpath.
while IFS= read -r line; do
if [[ $line == "$pfx"*=* ]]; then
lastclasspath=$line
elif [[ -n $lastclasspath ]]; then
newline=${lastclasspath#"$pfx"}
num=${newline%%=*}
newline="$pfx$((num+1))=$DIR/$JAR_FILE"
echo "$newline"
unset lastclasspath
fi
echo "$line"
done < "$CONF_FILE"

How to split the contents of `$PATH` into distinct lines?

Suppose echo $PATH yields /first/dir:/second/dir:/third/dir.
Question: How does one echo the contents of $PATH one directory at a time as in:
$ newcommand $PATH
/first/dir
/second/dir
/third/dir
Preferably, I'm trying to figure out how to do this with a for loop that issues one instance of echo per instance of a directory in $PATH.

echo "$PATH" | tr ':' '\n'
Should do the trick. This simply takes the output of echo "$PATH" and replaces every colon with a newline.
Note that the quotation marks around $PATH prevent word splitting and globbing, so runs of successive spaces in $PATH are not collapsed, while the content of the variable is still output.

As an additional option (and in case you need the entries in an array for some other purpose) you can do this with a custom IFS and read -a:
IFS=: read -r -a patharr <<<"$PATH"
printf '%s\n' "${patharr[@]}"
Or, since the question asks for a version with a for loop:
for dir in "${patharr[@]}"; do
echo "$dir"
done

How about this:
echo "$PATH" | sed -e 's/:/\n/g'
(See sed's s command; sed -e 'y/:/\n/' will also work, and is equivalent to the tr ":" "\n" from some other answers.)
It's preferable not to complicate things unless absolutely necessary: a for loop is not needed here. There are other ways to execute a command for each entry in the list, more in line with the Unix Philosophy:
This is the Unix philosophy: Write programs that do one thing and do it well. Write programs to work together. Write programs to handle text streams, because that is a universal interface.
such as:
echo "$PATH" | sed -e 's/:/\n/g' | xargs -n 1 echo
This is functionally equivalent to a for loop iterating over the PATH elements and executing that last echo command for each element. The -n 1 tells xargs to supply only one argument to its command; without it we would get the same output as echo "$PATH" | sed -e 'y/:/ /'.
Since xargs has built-in support for splitting its input, and runs echo when no command is given, we can write that as:
echo -n "$PATH" | xargs -d ':' -n 1
The -d ':' (a GNU xargs option) tells xargs to separate its input on : rather than on blanks and newlines, and the -n on the first echo stops it from writing a trailing newline; otherwise the last entry would end in a newline and we would end up with a blank trailing line.

Here is another, shorter one:
echo -e ${PATH//:/\\n}

You can use tr (translate) to replace the colons (:) with newlines (\n), and then iterate over that in a for loop.
directories=$(echo $PATH | tr ":" "\n")
for directory in $directories
do
echo $directory
done

My idea is to use echo and awk.
echo "$PATH" | awk 'BEGIN {FS=":"} {for (i=1; i<=NF; i++) print $i}'
EDIT
This command is better than my former idea.
echo "$PATH" | awk 'BEGIN {FS=":"; OFS="\n"} {$1=$1; print $0}'

If you can guarantee that PATH does not contain embedded spaces, you can:
for dir in ${PATH//:/ }; do
echo $dir
done
If there are embedded spaces, this will fail badly.

# preserve the existing internal field separator
OLD_IFS=${IFS}
# define the internal field separator to be a colon
IFS=":"
# do what you need to do with $PATH
for DIRECTORY in ${PATH}
do
echo ${DIRECTORY}
done
# restore the original internal field separator
IFS=${OLD_IFS}
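One more option splits on colons without touching IFS or relying on word splitting, so entries containing spaces or glob characters come through unchanged. This is a minimal POSIX sh sketch; print_path_entries is a hypothetical name. An empty entry (for example from "::") is printed as an empty line.

```shell
# Hypothetical helper: print each entry of a colon-separated list on its own line.
print_path_entries() {
    rest=$1
    while :; do
        entry=${rest%%:*}               # text before the first colon
        printf '%s\n' "$entry"
        case $rest in
            *:*) rest=${rest#*:} ;;     # drop the first entry and its colon
            *)   break ;;               # no colon left: that was the last entry
        esac
    done
}
```

Usage: print_path_entries "$PATH"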

Use sed to extract an ASCII hex string from a single line in a file

I have a file that looks like this:
some random
text
00ab46f891c2emore random
text
234324fc234ba253069
and yet more text
Only one line in the file consists solely of hex characters (234324fc234ba253069). How do I extract it? I tried sed -ne 's/^\([a-f0-9]*\)$/\1/p' file, using the line start and line end anchors (^ and $) as delimiters, but I am obviously missing something...

Grep does the job,
$ grep '^[a-f0-9]\+$' file
234324fc234ba253069
Through awk,
$ awk '/^[a-f0-9]+$/{print}' file
234324fc234ba253069
Based on the given search pattern, awk and grep print the matching line.
^ # start of line
[a-f0-9]\+ # one or more lowercase hex characters (A-F is not included)
$ # end of line

sed can do it:
sed -n '/^[a-f0-9][a-f0-9]*$/p' file
234324fc234ba253069
By the way, your command sed -ne 's/^\([a-f0-9]*\)$/\1/p' file works for me (although, because of the *, it would also match empty lines). Note also that it is not necessary to use \1 to print the match back. It is handy in many cases, but here it is overkill because you want to print the whole line. Just sed -n '/pattern/p' does the job, as shown above.
As there is just one match in the whole file, you may want to exit once it is found (thanks NeronLeVelu!):
sed -n '/^[a-f0-9][a-f0-9]*$/{p;q;}' file
Another approach is to let printf decide whether the line is hexadecimal:
while IFS= read -r line
do
printf "%f\n" "0x$line" >/dev/null 2>&1 && echo "$line"
done < file
Based on Hexadecimal To Decimal in Shell Script, printf "%f" 0xNUMBER succeeds if the number is indeed hexadecimal; otherwise, it returns an error.
Hence, printf ... >/dev/null 2>&1 && echo "$line" keeps printf silent (its output and errors go to /dev/null) and then prints the line only if it was hexadecimal.
For your given file, it returns:
$ while IFS= read -r line; do printf "%f\n" "0x$line" >/dev/null 2>&1 && echo "$line"; done < a
234324fc234ba253069

Using grep -E (the non-deprecated form of egrep), you can restrict your regex to select lines that contain only valid hex characters, i.e. [a-fA-F0-9]:
grep -E '^[a-fA-F0-9]+$' file
234324fc234ba253069
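The same filter can be written with grep's -x option, which anchors the pattern to the whole line, and the POSIX [[:xdigit:]] class, which matches 0-9, a-f and A-F. This is a small sketch; the function name hex_lines is hypothetical.

```shell
# Hypothetical helper: print only the lines made up entirely of hex digits.
# -x requires the whole line to match; [[:xdigit:]] covers 0-9, a-f and A-F.
hex_lines() {
    grep -x -E '[[:xdigit:]]+' "$@"
}
```

Usage: hex_lines file, or pipe the text into hex_lines.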
