Read file by line then process as different variable - bash

I have created a text file with a list of file names like below
022694-39.tar
022694-39.tar.2017-05-30_13:56:33.OLD
022694-39.tar.2017-07-04_09:22:04.OLD
022739-06.tar
022867-28.tar
022867-28.tar.2018-07-18_11:59:19.OLD
022932-33.tar
I am trying to read the file line by line then strip anything after .tar with awk and use this to create a folder unless it exists.
Then the plan is to copy the original file to the new folder with the original full name stored in $LINE.
$QNAP= "Path to storage"
$LOG_DIR/$NOVA_TAR_LIST= "Path to text file containing file names"
while read -r LINE; do
CURNT_JOB_STRIPED="$LINE | `awk -F ".tar" '{print $1}'`"
if [ ! -d "$QNAP/$CURNT_JOB_STRIPED" ]
then
echo "Folder $QNAP/$CURNT_JOB_STRIPED doesn't exist."
#mkdir "$QNAP/$CURNT_JOB_STRIPED"
fi
done <"$LOG_DIR/$NOVA_TAR_LIST"
Unfortunately this seems to be trying to join all the file names together when trying to create the directories rather than doing them one by one and I get a
File name too long
output:
......951267-21\n951267-21\n961075-07\n961148-13\n961520-20\n971333-21\n981325-22\n981325-22\n981743-40\n999111-99\n999999-04g\n999999-44': File name too long
Apologies if this is trivial, bit of a rookie...

Try modifying your script as follows:
CURNT_JOB_STRIPED=$(echo "${LINE}" | awk -F ".tar" '{print $1}')
You have to use $(...) for command substitution. You also need to echo the variable LINE so that its value is passed to awk as input through the pipe; inside a quoted string, $LINE | is just literal text, and awk is left reading the loop's own standard input. Finally, remove the backticks around the awk expression (they are the legacy syntax for command substitution), since what you want is the output of the whole pipeline.
For further information, take a look over http://tldp.org/LDP/abs/html/commandsub.html
Alternatively, purely as a curiosity (it is far less readable and no faster), you can replace the whole while loop with:
xargs -I{} bash -c 'mkdir -p "${2}/${1%.tar*}"' - '{}' "${QNAP}" < "${LOG_DIR}/${NOVA_TAR_LIST}"
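To see why the original line glued every name together, here is a minimal sketch, assuming a throwaway three-line list in a temporary file (the list contents and the counter variables are made up for illustration). Inside the quoted string, $LINE | is only text, and the awk in the command substitution has no pipe feeding it, so it reads the loop's own standard input and swallows the rest of the list in one go.

```bash
#!/usr/bin/env bash
# Throwaway list standing in for the real file of names (illustrative data).
list=$(mktemp)
printf '%s\n' 022694-39.tar 022739-06.tar 022867-28.tar > "$list"

# Broken form: awk is not fed by a pipe, so it consumes the loop's stdin.
broken_iterations=0
while read -r LINE; do
    broken_value="$LINE | $(awk -F ".tar" '{print $1}')"
    broken_iterations=$((broken_iterations + 1))
done < "$list"

# Fixed form: echo feeds exactly one line to awk per iteration.
fixed_names=""
while read -r LINE; do
    job=$(echo "${LINE}" | awk -F ".tar" '{print $1}')
    fixed_names="${fixed_names}${job},"
done < "$list"

rm -f -- "$list"
printf 'broken loop ran %s time(s)\n' "$broken_iterations"
printf 'fixed names: %s\n' "$fixed_names"
```

The broken loop runs only once because awk has already read every remaining line, which is how one very long "file name" ends up being built.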

The problem is with the CURNT_JOB_STRIPED="$LINE | `awk -F ".tar" '{print $1}'`" line.
The `command` form is legacy syntax; $(command) should be used instead.
The $LINE variable has to be printed (with echo, for example) so that awk can receive its value through a pipe.
If you run the whole pipeline in a command substitution ( $(command) ), you can assign its output to a variable: var=$(date)
It is safer to write variables as ${var}, so that surrounding text cannot be mistaken for part of the name and give unexpected results.
This should work:
CURNT_JOB_STRIPED=$(echo "${LINE}" | awk -F '.tar' '{print $1}')
With parameter expansion (often called variable substitution) this can be done more efficiently, and I believe the code is also cleaner to read.
The expansion does not change ${LINE} itself, so it can still be used later as the variable holding the full, unchanged file name, while ${LINE%.tar*} removes the last .tar from the value and, because of the *, anything after it.
while read -r LINE; do
if [ ! -d "${QNAP}/${LINE%.tar*}" ]
then
echo "Folder ${QNAP}/${LINE%.tar*} doesn't exist."
#mkdir "${QNAP}/${LINE%.tar*}"
fi
done <"${LOG_DIR}/${NOVA_TAR_LIST}"
This way the directory name is not stored in a variable, and ${LINE} still holds only the file name. If you need it in a variable, that is easy to do: var="${LINE%.tar*}"
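The question also plans to copy each original file into its folder. A minimal sketch of the complete loop, assuming a hypothetical helper named copy_into_job_dirs and assuming the originals all sit in one source directory (the question does not say where they live):

```bash
#!/usr/bin/env bash
# Hypothetical helper: $1 = list of file names, $2 = directory holding the
# originals (an assumption), $3 = destination root (QNAP in the question).
copy_into_job_dirs() {
    local list=$1
    local src=$2
    local dest=$3
    local LINE job_dir
    while IFS= read -r LINE; do
        [ -n "$LINE" ] || continue          # skip blank lines
        job_dir="$dest/${LINE%.tar*}"       # name with .tar and the rest removed
        mkdir -p -- "$job_dir"              # -p: no error if it already exists
        cp -- "$src/$LINE" "$job_dir/"      # keep the full original name
    done < "$list"
}
```

It could be called as copy_into_job_dirs "$LOG_DIR/$NOVA_TAR_LIST" /path/to/originals "$QNAP". Using mkdir -p makes the separate [ ! -d ... ] test unnecessary.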
Variable Substitution:
There are more; I picked only these four for now because they are similar and relevant here.
${var#pattern} - Use the value of var after removing the shortest piece that matches pattern from the left
${var##pattern} - Same as above, but remove the longest matching piece instead of the shortest
${var%pattern} - Use the value of var after removing the shortest piece that matches pattern from the right
${var%%pattern} - Same as above, but remove the longest matching piece instead of the shortest
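As a quick illustration of the four forms, here they are applied to one of the names from the question. The variable names are made up for this sketch; the comments show the value each expansion yields.

```bash
#!/usr/bin/env bash
name='022694-39.tar.2017-05-30_13:56:33.OLD'

short_left=${name#*.}     # tar.2017-05-30_13:56:33.OLD  (up to the first dot)
long_left=${name##*.}     # OLD                          (up to the last dot)
short_right=${name%.*}    # 022694-39.tar.2017-05-30_13:56:33
long_right=${name%%.*}    # 022694-39                    (from the first dot)
job=${name%.tar*}         # 022694-39                    (the form used above)

printf '%s\n' "$short_left" "$long_left" "$short_right" "$long_right" "$job"
```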

Related

How to extract strings from a text in shell

I have a file name
"PHOTOS_TIMESTAMP_5373382"
I want to extract from this filename "PHOTOS_5373382" and add "ABC" i.e. finally want it to look like
"abc_PHOTOS_5373382" in shell script.
echo "PHOTOS_TIMESTAMP_5373382" | awk -F"_" '{print "ABC_"$1"_"$3}'
echo will provide input for awk command.
awk command does the data tokenization on character '_' of input using the option -F.
Individual token (starting from 1) can be accessed using $n, where n is the token number.
You can run the following sequence of commands directly in your shell (preferably bash), or save it as a complete script that takes a single argument, the file to be renamed:
#!/bin/bash
myFile="$1" # Input argument (file-name with extension)
filename=$(basename "$myFile") # Stripping any leading directory path
extension="${filename##*.}" # Extracting the extension part
filename="${filename%.*}" # Extracting the file-name part without extension
IFS="_" read -r string1 string2 string3 <<<"$filename" # Extracting the sub-string needed from the original file-name with '_' de-limiter
mv -v "$myFile" ABC_"$string1"_"$string3"."$extension" # Renaming the actual file
On running the script as
$ ./script.sh PHOTOS_TIMESTAMP_5373382.jpg
`PHOTOS_TIMESTAMP_5373382.jpg' -> `ABC_PHOTOS_5373382.jpg'
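Note that the mv in the script writes the renamed file into the current directory. A sketch of a variant that keeps the file next to the original, assuming a hypothetical function name rename_drop_middle and a file name with exactly three _-separated parts plus an extension:

```bash
#!/usr/bin/env bash
# Hypothetical helper: renames A_B_C.ext to ABC_A_C.ext in the same directory.
rename_drop_middle() {
    local path=$1
    local dir base ext stem first last
    dir=$(dirname -- "$path")       # directory part, "." if there is none
    base=$(basename -- "$path")     # file name without the directory
    ext=${base##*.}                 # text after the last dot
    stem=${base%.*}                 # text before the last dot
    # Split on "_": keep the first field, discard the second, keep the rest.
    IFS=_ read -r first _ last <<< "$stem"
    mv -- "$path" "$dir/ABC_${first}_${last}.${ext}"
}
```

With more than three parts, last receives everything after the second underscore, so names in another shape would need different handling.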
Although I like awk, here are two alternatives.
Native shell solution
k="PHOTOS_TIMESTAMP_5373382"
IFS="_" read -r -a arr <<< "$k"
echo "abc_${arr[0]}_${arr[2]}"
Sed solution
echo "abc_$k" | sed -e 's/TIMESTAMP_//g'
abc_PHOTOS_5373382
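For completeness, the same result can be had with parameter expansion alone, with no array and no external command. This is a sketch that assumes the wanted pieces are always the text before the first underscore and the text after the last one; the variable names first, last and result are illustrative.

```bash
#!/usr/bin/env bash
k="PHOTOS_TIMESTAMP_5373382"

first=${k%%_*}   # drop the longest suffix starting at an underscore
last=${k##*_}    # drop the longest prefix ending at an underscore
result="abc_${first}_${last}"

printf '%s\n' "$result"
```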

How to split the contents of `$PATH` into distinct lines?

Suppose echo $PATH yields /first/dir:/second/dir:/third/dir.
Question: How does one echo the contents of $PATH one directory at a time as in:
$ newcommand $PATH
/first/dir
/second/dir
/third/dir
Preferably, I'm trying to figure out how to do this with a for loop that issues one instance of echo per instance of a directory in $PATH.
echo "$PATH" | tr ':' '\n'
Should do the trick. This simply takes the output of echo "$PATH" and replaces every colon with a newline.
Note that the quotation marks around $PATH prevent word splitting, which would otherwise collapse runs of successive spaces in the value, while still outputting the content of the variable.
As an additional option (and in case you need the entries in an array for some other purpose) you can do this with a custom IFS and read -a:
IFS=: read -r -a patharr <<<"$PATH"
printf %s\\n "${patharr[@]}"
Or since the question asks for a version with a for loop:
for dir in "${patharr[@]}"; do
echo "$dir"
done
How about this:
echo "$PATH" | sed -e 's/:/\n/g'
(See sed's s command; sed -e 'y/:/\n/' will also work, and is equivalent to the tr ":" "\n" from some other answers.)
It's preferable not to complicate things unless absolutely necessary: a for loop is not needed here. There are other ways to execute a command for each entry in the list, more in line with the Unix Philosophy:
This is the Unix philosophy: Write programs that do one thing and do it well. Write programs to work together. Write programs to handle text streams, because that is a universal interface.
such as:
echo "$PATH" | sed -e 's/:/\n/g' | xargs -n 1 echo
This is functionally equivalent to a for-loop iterating over the PATH elements, executing that last echo command for each element. The -n 1 tells xargs to supply only one argument to its command; without it we would get the same output as echo "$PATH" | sed -e 'y/:/ /'.
Since this uses xargs, which has built-in support to split the input, and echoes the input if no command is given, we can write that as:
echo -n "$PATH" | xargs -d ':' -n 1
The -d ':' (a GNU xargs extension) tells xargs to split its input on : rather than on whitespace and newlines, and the -n passed to the first echo tells it not to write a trailing newline; otherwise we end up with a blank trailing line.
Here is another, shorter one:
echo -e ${PATH//:/\\n}
You can use tr (translate) to replace the colons (:) with newlines (\n), and then iterate over that in a for loop.
directories=$(echo $PATH | tr ":" "\n")
for directory in $directories
do
echo $directory
done
My idea is to use echo and awk.
echo "$PATH" | awk 'BEGIN {FS=":"} {for (i=1; i<=NF; i++) print $i}'
EDIT
This command is better than my former idea.
echo "$PATH" | awk 'BEGIN {FS=":"; OFS="\n"} {$1=$1; print $0}'
If you can guarantee that PATH does not contain embedded spaces, you can:
for dir in ${PATH//:/ }; do
echo $dir
done
If there are embedded spaces, this will fail badly.
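To make the failure concrete, here is a small sketch that uses a made-up value in place of the real PATH (sample_path and the counters are illustrative). It compares the unquoted expansion with a split into an array on : only.

```bash
#!/usr/bin/env bash
# Illustrative value; one entry contains a space.
sample_path='/first/dir:/opt/my tools/bin:/third/dir'

# Unquoted expansion: word splitting also breaks "/opt/my tools/bin" in two.
naive_count=0
for dir in ${sample_path//:/ }; do
    naive_count=$((naive_count + 1))
done

# Splitting on ':' only keeps the entry with the space intact.
IFS=: read -r -a parts <<< "$sample_path"
array_count=${#parts[@]}

printf 'naive loop saw %s entries, array holds %s\n' "$naive_count" "$array_count"
```

The naive loop sees four words where there are three directories.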
# preserve the existing internal field separator
OLD_IFS=${IFS}
# define the internal field separator to be a colon
IFS=":"
# do what you need to do with $PATH
for DIRECTORY in ${PATH}
do
echo ${DIRECTORY}
done
# restore the original internal field separator
IFS=${OLD_IFS}
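One caveat with saving and restoring by hand: if IFS was unset to begin with, ${IFS} expands to an empty string, and restoring that empty value is not the same as leaving IFS unset. A sketch of one way around this, assuming a hypothetical function name list_path_dirs, is to confine the change to a subshell so that nothing needs restoring:

```bash
#!/usr/bin/env bash
# Hypothetical helper: prints each ':'-separated entry of $1 on its own line.
# The body is wrapped in ( ... ), so it runs in a subshell and the changes
# to IFS and to globbing never reach the calling shell.
list_path_dirs() (
    IFS=:
    set -f                      # do not expand * or ? found in an entry
    for dir in $1; do
        printf '%s\n' "$dir"
    done
)

list_path_dirs "$PATH"
```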

Bash variables not acting as expected

I have a bash script which parses a file line by line, extracts the date using a cut command and then makes a folder using that date. However, it seems like my variables are not being populated properly. Do I have a syntax issue? Any help or direction to external resources is very appreciated.
#!/bin/bash
ls | grep .mp3 | cut -d '.' -f 1 > filestobemoved
cat filestobemoved | while read line
do
varYear= $line | cut -d '_' -f 3
varMonth= $line | cut -d '_' -f 4
varDay= $line | cut -d '_' -f 5
echo $varMonth
mkdir $varMonth'_'$varDay'_'$varYear
cp ./$line'.mp3' ./$varMonth'_'$varDay'_'$varYear/$line'.mp3'
done
You have many errors and non-recommended practices in your code. Try the following:
for f in *.mp3; do
f=${f%%.*}
IFS=_ read -r _ _ varYear varMonth varDay <<< "$f"
echo "$varMonth"
mkdir -p "${varMonth}_${varDay}_${varYear}"
cp "$f.mp3" "${varMonth}_${varDay}_${varYear}/$f.mp3"
done
The actual error is that you need to use command substitution. For example, instead of
varYear= $line | cut -d '_' -f 3
you need to use
varYear=$(cut -d '_' -f 3 <<< "$line")
A secondary error there is that $foo | some_command on its own line does not pipe the contents of $foo to the next command as input; instead, the shell tries to execute the value of $foo as a command and passes that command's output to the next one.
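Both working ways of feeding a value to cut can be sketched as follows. The file name stem is invented for illustration; it only assumes the year, month and day sit in the third, fourth and fifth _-separated fields, as in the question.

```bash
#!/usr/bin/env bash
line='show_live_2019_07_04'   # illustrative stem, not from the question

# Here-string: the value of $line becomes cut's standard input.
varYear=$(cut -d '_' -f 3 <<< "$line")
# Pipe: printf writes the value, cut reads it.
varMonth=$(printf '%s\n' "$line" | cut -d '_' -f 4)
varDay=$(cut -d '_' -f 5 <<< "$line")

target="${varMonth}_${varDay}_${varYear}"
printf '%s\n' "$target"
```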
Some best practices and tips to take into account:
Use a portable shebang line - #!/usr/bin/env bash (disclaimer: That's my answer).
Don't parse ls output.
Avoid useless uses of cat.
Use More Quotes™
Don't use files for temporary storage if you can use pipes. It is literally orders of magnitude faster, and generally makes for simpler code if you want to do it properly.
If you have to use files for temporary storage, put them in the directory created by mktemp -d. Preferably add a trap to remove the temporary directory cleanly.
There's no need for a var prefix in variables.
grep searches for basic regular expressions by default, so .mp3 matches any single character followed by the literal string mp3. If you want to search for a dot, you need to either use grep -F to search for literal strings or escape the regular expression as \.mp3.
You generally want to use read -r (defined by POSIX) to treat backslashes in the input literally.
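The grep point in the list above is easy to check with two made-up file names, one of which contains mp3 but not .mp3. The variable names are illustrative.

```bash
#!/usr/bin/env bash
names=$(printf '%s\n' song.mp3 notes_mp3_list.txt)

# Unescaped dot matches any character, so "_mp3" counts as a match too.
loose=$(printf '%s\n' "$names" | grep -c '.mp3')
# Escaped dot matches only a literal dot.
escaped=$(printf '%s\n' "$names" | grep -c '\.mp3')
# -F treats the whole pattern as a literal string.
literal=$(printf '%s\n' "$names" | grep -cF '.mp3')

printf 'loose=%s escaped=%s literal=%s\n' "$loose" "$escaped" "$literal"
```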

bash script prepending ? to file name

I am using the below script. When I have it echo $f as shown below, it gives the correct result:
#/bin/bash
var="\/home\/"
while read p; do
f=$(echo $p | sed "s/${var}/\\n/g")
f=${f%.sliced.bam}.fastq
echo $f
~/bin/samtools view $p | awk '{print "@"$1"\n"$10"\n+\n"$11}' > $f
./run.sh $f ${f%.fastq}
rm ${f%.sliced.bam}.fastq
done < $1
I get the output as expected
test.fastq
But the file being created by awk > $f has the name
?test.fastq
Note that the overall goal here is to run this loop on every file listed in a file with absolute paths but then write locally (which is what the sed call is for)
edit: Run directly on the command line (without variables) the samtools | awk line runs correctly.
Awk cannot possibly have anything to do with your problem. The shell is completely responsible for file redirection, so f MUST have a weird character in it.
Most likely whatever you are sending to this script has a special character in it (e.g. perhaps a UTF character, and your terminal is showing ASCII only). When you do the echo, the shell doesn't know how to display the char, and probably just shows it as whitespace, and when you send it through ls (which might be doing things like colorization) it combines in a strange way and ends up showing the ?.
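Oh wait... why are you putting a newline into the filename with sed? That is very likely your problem: the unquoted echo $f drops the leading newline through word splitting, so the name looks fine, while ls shows the newline as ?. Try just: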
sed "s/${var}//g"
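A short sketch of the effect, with the newline put into the name directly rather than through sed (the path and variable names are illustrative). The last two lines show one way to derive the local name with parameter expansion, assuming the local name should simply be the base name of the absolute path.

```bash
#!/usr/bin/env bash
f=$'\n'"test.fastq"          # a name that starts with a newline

unquoted=$(echo $f)          # word splitting drops the leading newline
length=${#f}                 # 11: ten visible characters plus the newline
printf 'echo shows: %s (real length %s)\n' "$unquoted" "$length"

# Deriving the local name without sed (assumes base name is what is wanted).
p='/home/test.sliced.bam'
local_name=${p##*/}                          # test.sliced.bam
local_name=${local_name%.sliced.bam}.fastq   # test.fastq
printf '%s\n' "$local_name"
```

Quoting the variable, as in echo "$f", would have made the stray newline visible straight away.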

Error while using while read; do grep

I am using this command
cat text.csv | while read a ; do grep $a text1.csv >> text2.csv; done
text.csv has file names with full paths. The file names contain spaces.
Example: C:\Users\Downloads\File Name.txt
text1.csv contains logs showing user id and the file name with full path.
Example: MyName,C:\Users\Downloads\File Name.txt
When I run the command, I get an error
grep: Name: No such file or Directory
I know that the error is because of the spaces in the file name. I would like to know how can I remove this error.
Put the grep pattern in double quotes; otherwise the shell splits it into several arguments to grep:
while read a ; do grep "$a" text1.csv >> text2.csv; done < text.csv
There is no need for the extra cat, so I removed it in my answer.
Quote the variable:
cat text.csv | while read a ; do grep "$a" text1.csv >> text2.csv; done
In general, you should usually quote variables, unless you specifically want the value to undergo word splitting and wildcard expansion.
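Because the paths in this question contain backslashes, two further adjustments are probably worth making: read -r, so that read does not treat the backslashes as escapes, and grep -F, so that grep matches the line as a literal string rather than as a regular expression. A minimal sketch with temporary files standing in for text.csv and text1.csv (the second log line is invented):

```bash
#!/usr/bin/env bash
tmp=$(mktemp -d)
printf '%s\n' 'C:\Users\Downloads\File Name.txt' > "$tmp/text.csv"
printf '%s\n' 'MyName,C:\Users\Downloads\File Name.txt' \
              'Other,C:\Users\Other\a.txt' > "$tmp/text1.csv"

# IFS= and -r keep each line exactly as written; -F matches it literally;
# -- protects against patterns that start with a dash.
while IFS= read -r a; do
    grep -F -- "$a" "$tmp/text1.csv"
done < "$tmp/text.csv" > "$tmp/text2.csv"

result=$(cat "$tmp/text2.csv")
rm -rf -- "$tmp"
printf '%s\n' "$result"
```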
