How do I deal with unusual filenames in a command's parameters? - bash

I have a file ('list') which contains a large list of filenames, with all kinds of character combinations such as:
-sensei
I am using the following script to process this list of files:
#!/bin/bash
while read -r line
do
html2text -o ./text/$line $line
done < list
Which is giving me 'Cannot open input file' errors.
What is the correct way of dealing with these filenames, to prevent any errors?
I have changed the example list above so that it now includes only one filename (out of many) that does not work, no matter how I quote or don't quote it.
#!/bin/bash
while read -r line
do
html2text -o "./text/$line" "$line"
done < list
The error I get is:
Unrecognized command line option "-sensei", try "-help".
As such this question does not resolve this issue.

Something like this should fix your issues (unless the file list has CRLF line endings):
while IFS='' read -r file
do
html2text -o ./text/"$file" -- "$file"
done < filelist.txt
notes:
IFS='' read -r is mandatory when you want to capture a line accurately
most commands support -- to signal the end of options; whatever the following arguments might be, they will not be treated as options. BTW, another common workaround for filenames that start with - is to prepend ./ to them.
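As a quick illustration of both workarounds (a sketch using cat as a stand-in for any command; the -sensei filename is the one from the question):
touch ./-sensei   # create a file whose name starts with a dash
cat -sensei       # fails: cat parses the name as options
cat -- -sensei    # works: -- marks the end of options
cat ./-sensei     # works: the name no longer starts with a dash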

Are you sure the files are present? Usually there should not be a problem with your script. For example:
#!/bin/bash
while read -r line
do
touch ./text/"$line"
done < "$1"
ls -l ./text
works perfectly fine for me (with your example input). Bash passes those names through unchanged. Are you in the right directory? If you are sure the files are present, the problem is with html2text.
Also make sure not to have a trailing blank line in your input file.
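If the list turns out to have CRLF line endings (as noted in the answer above), a sketch that strips the carriage return inside the loop:
while IFS='' read -r file
do
file=${file%$'\r'}   # drop a trailing carriage return, if any
html2text -o ./text/"$file" -- "$file"
done < list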

Related

Bash while loop from txt file?

I'm trying to make a script to organize a pair of lists I have and process them with other programs, but I'm a little bit stuck now.
I want to process every line of a text file, first creating a folder for each line in the list and then processing it with the different scripts I have.
My problem is that if the list I give to the script has 3-4 elements, it works great and each one gets its own directory, but if I give it a list with 1000+ lines, the script only processes a few elements through the scripts.
EDIT: the processing involves around 30-35 scripts in different languages: Python, Bash, and Golang.
Any suggestions?
cat $STORES+NEW.txt | while read NEWSTORES
do
cd $STORES && mkdir $NEWSTORES && cd $NEWSTORES && mkdir .Files
python3 checkstatus.py -n $NEWSTORES
checkemployes $NEWSTORES -status
storemanagers -s $NEWSTORES -o $NEWSTORES+managers.txt
curl -s https://redacted.com/store?=$NEWSTORES | grep -vE "<|^[\*]*[\.]*$NEWSTORES" | sort -u | awk 'NF' > $NEWSTORES+site.txt
..
..
..
..
..
..
cd ../..
done
I'm not supposed to give an answer yet, but I mistakenly posted what should have been a comment as an answer. Anyway, here are a few things I can suggest:
Avoid unnecessary use of cat.
Open your input file on another FD to prevent commands that read stdin inside the loop from eating the input: while IFS= read -ru 3 NEWSTORES; do ...; done 3< "$STORES+NEW.txt" or { while IFS= read -ru "$FD" NEWSTORES; do ...; done; } {FD}< "$STORES+NEW.txt". Also see https://stackoverflow.com/a/28837793/445221.
Not completely related, but don't run a while loop in a pipeline, since it will execute in a subshell: if you alter a variable inside the loop and expect the change to survive outside it, it won't. You can use lastpipe to avoid this, but it's unnecessary most of the time.
Place your variable expansions around double quotes to prevent unwanted word splitting and filename expansion.
Use -r option unless you want backslashes to escape characters.
Specify IFS= before read to prevent stripping of leading and trailing spaces.
Using readarray or mapfile makes it more convenient: readarray -t ALL_STORES_DATA < "$STORES+NEW.txt"; for NEWSTORES in "${ALL_STORES_DATA[@]}"; do ...; done
Use lowercase names for variables you don't use in a global manner, to avoid conflicts with bash's own variables. A sketch combining these suggestions follows below.
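Putting those suggestions together, a rough sketch (hedged: $STORES and the helper commands checkstatus.py, checkemployes, and storemanagers are taken from the question and assumed to exist):
#!/bin/bash
stores_file="$STORES+NEW.txt"
while IFS= read -ru 3 newstores; do
# one mkdir -p call creates the store directory and its .Files subdirectory
mkdir -p -- "$STORES/$newstores/.Files"
(
cd "$STORES/$newstores" || exit 1
python3 checkstatus.py -n "$newstores"
checkemployes "$newstores" -status
storemanagers -s "$newstores" -o "$newstores+managers.txt"
)   # the subshell means there is no need to cd back
done 3< "$stores_file"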

How to use each line of a file as an argument to a for loop

I'm hoping to do a command on each line listed within foo.txt, where each line of foo.txt is a file name.
There's been plenty of great support for this question; I have tried a while read, then another while read, and am now trying a for loop. However, I'm starting to think the issue is in the body of the loop.
#!/bin/bash
File=/mnt/d/R_projects/EC/foo.txt
Lines=$(cat $File)
for Line in $Lines
do
echo "fastp -i /mnt/d/R_projects/EC/download/fastq/$Line -o /mnt/e/EC/fastp_trimmed/$Line"
./fastp -i /mnt/d/R_projects/EC/download/fastq/$Line -o /mnt/e/EC/fastp_trimmed/$Line
done
I unfortunately receive the error:
ERROR: Failed to open file: /mnt/d/R_projects/EC/download/fastq/SRR6132950_1.fastq
The file exists, and doing less confirms.
Oddly, the echo doesn't echo what I was expecting and instead states:
" -o /mnt/e/EC/fastp_trimmed/SRR6132950_1.fastqRR6132950_1.fastq"
What could be causing this issue? It's as if the first half was cut off.
Thank you all,
I had suspected the Windows \r\n line endings were causing an issue here, but I had tried deleting them in nano and moved on. Notepad++ showed the problem, and I was able to fix it with Edit > EOL Conversion > Linux (LF).
I also adapted the suggestions to remove cat and set IFS; the suggestions were helpful.
#!/bin/bash
IFS=$'\n'
for Line in $(< /mnt/d/R_projects/EC/foo.txt); do
echo "$Line"
echo "fastp -i /mnt/d/R_projects/EC/download/fastq/$Line -o /mnt/e/EC/fastp_trimmed/$Line"
./fastp -i /mnt/d/R_projects/EC/download/fastq/$Line -o /mnt/e/EC/fastp_trimmed/$Line
done
I have like a million needs for running commands based on a text file and this will be extremely helpful for further applications.
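For reference, a sketch of the same loop as a while read, which avoids changing IFS for the whole script and also tolerates stray \r endings (paths are the ones from the question):
#!/bin/bash
while IFS= read -r line; do
line=${line%$'\r'}   # strip a trailing carriage return, if present
./fastp -i "/mnt/d/R_projects/EC/download/fastq/$line" -o "/mnt/e/EC/fastp_trimmed/$line"
done < /mnt/d/R_projects/EC/foo.txt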

How do I use `sed` to alter a variable in a bash script?

I'm trying to use enscript to print PDFs from Mutt, and hitting character encoding issues. One way around them seems to be to just use sed to replace the problem characters: sed -ir 's/[“”]/"/g' {input}
My test input file is this:
“very dirty”
we’re
I'm hoping to get "very dirty" and we're but instead I'm still getting
â\200\234very dirtyâ\200\235
weâ\200\231re
I found a nice little post on printing to PDFs from Mutt that I used as a starting point. I have a bash script that I point to from my .muttrc with set print_command="$HOME/.mutt/print.sh" -- the script currently reads about like this:
#!/bin/bash
input="$1" pdir="$HOME/Desktop" open_pdf=evince
# Straighten out curly quotes
sed -ir 's/[“”]/"/g' $input
sed -ir "s/[’]/'/g" $input
tmpfile="`mktemp $pdir/mutt_XXXXXXXX.pdf`"
enscript --font=Courier8 $input -2r --word-wrap --fancy-header=mutt -p - 2>/dev/null | ps2pdf - $tmpfile
$open_pdf $tmpfile >/dev/null 2>&1 &
sleep 1
rm $tmpfile
It does a fine job of creating a PDF (and works fine if you give it a file as an argument) but I can't figure out how to fix the curly quotes.
I've tried a bunch of variations on the sed line:
input=sed -r 's/[“”]/"/g' $input
$input=sed -ir "s/[’]/'/g" $input
Per the suggestion at Can I use sed to manipulate a variable in bash? I also tried input=$(sed -r 's/[“”]/"/g' <<< $input) and I get an error: "Syntax error: redirection unexpected"
But none manages to actually change $input -- what is the correct syntax to change $input with sed?
Note: I accepted an answer that resolved the question I asked, but as you can see from the comments there are a couple of other issues here. enscript is taking in a whole file as a variable, not just the text of the file. So trying to tweak the text inside the file is going to take a few extra steps. I'm still learning.
On Editing Variables In General
BashFAQ #21 is a comprehensive reference on performing search-and-replace operations in bash, including within variables, and is thus recommended reading. On this particular case:
Use the shell's native string manipulation instead; it is far faster than forking off a subshell, launching an external process inside it, and reading that external process's output. BashFAQ #100 covers this topic in detail and is well worth reading.
Depending on your version of bash and configured locale, it might be possible to use a bracket expression (i.e. [“”], as your original code did). However, the most portable approach is to treat “ and ” separately, which will work even without multi-byte character support available.
input='“hello ’cruel’ world”'
input=${input//'“'/'"'}
input=${input//'”'/'"'}
input=${input//'’'/"'"}
printf '%s\n' "$input"
...correctly outputs:
"hello 'cruel' world"
On Using sed
To provide a literal answer -- you almost had a working sed-based approach in your question.
input=$(sed -r 's/[“”]/"/g' <<<"$input")
...adds the missing syntactic double quotes around the parameter expansion of $input, ensuring that it's treated as a single token regardless of how it might be string-split or glob-expanded.
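A tiny demonstration of what those quotes buy you (made-up value):
input='a   *   b'
printf '<%s>\n' $input     # unquoted: word-split, and * glob-expands against the current directory
printf '<%s>\n' "$input"   # quoted: one token, spacing preserved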
But All That May Not Help...
The below is mentioned because your test script is manipulating content passed on the command line; if that's not the case in production, you can probably disregard the below.
If your script is invoked as ./yourscript “hello * ’cruel’ * world”, then information about exactly what the user entered is lost before the script is started, and nothing you can do here will fix that.
This is because $1, in that scenario, will only contain “hello; ’cruel’ and world” are in their own argv locations, and the *s will have been replaced with lists of files in the current directory (each such file substituted as a separate argument) before the script was even started. Because the shell responsible for parsing the user's command line (which is not the same shell running your script!) did not recognize the quotes as valid at the time when it ran this parsing, by the time the script is running, there's nothing you can do to recover the original data.
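You can see this with a two-line script (showargs is a hypothetical name) that prints its argv:
#!/bin/bash
# showargs: print each argument received, one per line
printf '<%s>\n' "$@"
Running ./showargs “hello ’cruel’ world” prints <“hello>, <’cruel’>, and <world”> on separate lines: the curly quotes are not shell quoting, so the argument was split before the script ever ran.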
Abstract: how to use sed to change a variable is explored first, but what you really need is a way to use and edit a file; that is covered further down.
Sed
The (two) sed lines could be replaced with this (note that -i is not used: we are working on a value, not a file):
input='“very dirty”
we’re'
sed 's/[“”]/\"/g;s/’/'\''/g' <<<"$input"
But it should be faster (for small strings) to use the shell's built-in string manipulation:
input='“very dirty”
we’re'
input=${input//[“”]/\"}
input=${input//[’]/\'}
printf '%s\n' "$input"
$1
But there is an underlying problem with your script: you are trying to clean input received from the command line, using $1 as the source of the string. Once somebody writes:
./script “very dirty”
we’re
That input is lost. It is broken into the shell's tokens, and "$1" will be “very only.
But I do not believe that is what you really have.
file
However, you are also saying that the input comes from a file. If that is the case, then read it in with:
input="$(<infile)" # not $1
sed 's/[“”]/\"/g;s/’/'\''/g' <<<"$input"
Or, if you don't mind editing (changing) the file, do this instead:
sed -i 's/[“”]/\"/g;s/’/'\''/g' infile
input="$(<infile)"
Or, if you are certain that what is being given to the script is a filename, like:
./script infile
You can use:
infile="$1"
sed -i 's/[“”]/\"/g;s/’/'\''/g' "$infile"
input="$(<"$infile")"
Other comments:
Quote your variables.
Do not use the very old `…` syntax; use $(…) instead.
Do not use variables in UPPER case; those are conventionally reserved for environment variables.
And (unless you actually meant sh) use a shebang (first line) that targets bash.
The command enscript most definitely requires a file, not a variable.
Maybe you should use evince to open the PS file directly; there is no need for the PDF-conversion step unless you know you really need it.
I believe it is better to use a file to store the output of enscript and ps2pdf.
Do not hide the errors printed by the commands until everything is working as desired; then just call the script as:
./script infile 2>/dev/null
Or as required to make it less verbose.
Final script.
If you call the script with the name of the file that enscript is going to use, something like:
./script infile
Then the whole script will look like this (it runs in both bash and sh):
#!/usr/bin/env bash
Usage(){ echo "$0: This script requires a source file"; exit 1; }
[ $# -lt 1 ] && Usage
[ ! -e "$1" ] && Usage
infile="$1"
pdir="$HOME/Desktop"
open_pdf=evince
# Straighten out curly quotes
sed -i 's/[“”]/\"/g;s/’/'\''/g' "$infile"
tmpfile="$(mktemp "$pdir"/mutt_XXXXXXXX.pdf)"
outfile="${tmpfile%.*}.ps"
enscript --font=Courier10 "$infile" -2r \
--word-wrap --fancy-header=mutt -p "$outfile"
ps2pdf "$outfile" "$tmpfile"
"$open_pdf" "$tmpfile" >/dev/null 2>&1 &
sleep 5
rm "$tmpfile" "$outfile"

How does a while read loop work in bash?

This is a crawler from GitHub that I am trying to implement myself, but I am unable to read the bash since I am a novice. Can this be explained in an answer?
#!/bin/bash
# Create an array files that contains list of filenames
files=($(< url.txt))
cities=($(< city.txt))
url="http://www.grotal.com/"
citycodes=($(<citycode.txt))
# Read through the url.txt file and execute wget command for every filename
while IFS='=| ' read -r param uri; do
for file in "${files[@]}"; do
for city in "${cities[@]}"; do
mkdir "${city}"
mkdir "${city}/${file}"
wget -O "${city}/${file}/${file}${citycodes[@]}" "${uri}${url}${city}/${file}-${citycodes[@]}/"
done
done
done < url.txt
specifically these (even if you choose to downvote...)
while IFS='=| ' read -r param uri;
and then this:
done < url.txt
Let's break this down into pieces:
read, unless given a non-default -d argument to specify a terminator to use in place of the newline, reads a single line from stdin (that is, reads up to the next newline); splits that line on IFS characters, and writes each field into a different variable. If it stops being able to read more data before reaching a newline, then it emits a nonzero exit status, even if it successfully populated the variables given. (The -r argument prevents read from treating backslashes as continuation characters rather than literals; unless you have a specific reason to have continuation characters available in the context at hand, you should make a habit of passing -r to read by default).
< url.txt redirects a read handle on url.txt into stdin for the command (including a compound command such as a while loop) to which it's appended.
A while loop runs the conditional command it's given, checks whether that conditional reports success or failure, and then proceeds to run the body and restart on success, or exit on failure.
Thus, if you have IFS='=| ' read -r param uri, it will read a single line from stdin; assign everything up to the first =, | or space to the variable named param, and assign what's left to the variable uri.
If you put that in the conditional part of a while loop, then the loop will operate until that read fails -- as it will if there isn't more content (up to and including a newline character) available to be read.
For more in-depth discussion of the idiom and its uses, see BashFAQ #1.
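As a small, self-contained illustration of that splitting (the input lines here are made up):
while IFS='=| ' read -r param uri; do
printf 'param=%s uri=%s\n' "$param" "$uri"
done <<'EOF'
city=mumbai
code=022|area
EOF
This prints param=city uri=mumbai and then param=code uri=022|area: the first field goes to param, and the remainder of the line (internal delimiters included) goes to uri.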
Some asides:
Using mkdir -p -- "${city}/${file}" will let you have only a single mkdir command that creates both directories (and avoids generating error messages if they already exist).
Using readarray -t files < url.txt is a more robust way to read the contents of url.txt into an array named files, though it requires bash 4.0 or newer. For older versions of the shell, consider IFS=$'\n' read -r -d '' -a files <url.txt || (( ${#files[@]} )). These will behave far better than the original idiom if you have wildcards, whitespace, or other unexpected content in your input files.
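A sketch applying both asides to the loop from the question (bash 4.0+, same input files):
readarray -t files < url.txt
readarray -t cities < city.txt
for city in "${cities[@]}"; do
for file in "${files[@]}"; do
# -p creates parent directories as needed and is quiet if they already exist
mkdir -p -- "${city}/${file}"
done
done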

Bash: Why mv won't move files under this for-loop?

Using a Bash script, I'd like to move a list of files with a for loop, not a while loop (for testing purposes). Can anyone explain why mv always acts as a file rename rather than a file move under this for loop? How can I fix it to move the list of files?
The following works:
for file in "/Volumes/HDD1/001.jpg" "/Volumes/HDD1/002.jpg"
do
mv "$file" "/Volumes/HDD2/"
done
UPDATE#1:
However, suppose that I have a sample_pathname.txt
cat sample_pathname.txt
"/Volumes/HDD1/001.jpg" "/Volumes/HDD1/002.jpg"
Why will the following for-loop not work, then?
array=$(cat sample_path2.txt)
for file in "${array[@]}"
do
mv "$file" "/Volumes/HDD2/"
done
Thanks.
System: OS X
Bash version: 3.2.53(1)
cat sample_pathname.txt
"/Volumes/HDD1/001.jpg" "/Volumes/HDD1/002.jpg"
The quotation marks here are the problem. Unless you need to cope with file names with newlines in them, the simple and standard way to do this is to list one file name per line, with no quotes or other metainformation.
vbvntv$ cat sample_pathname_fixed.txt
/Volumes/HDD1/001.jpg
/Volumes/HDD1/002.jpg
vbvntv$ while read -r file; do
> mv "$file" "/Volumes/HDD2/"
> done <sample_pathname_fixed.txt
In fact, you could even
xargs mv -t /Volumes/HDD2 <sample_pathname_fixed.txt
(somewhat depending on how braindead your xargs is).
The syntax used in your example will not create an array... It is just storing the file contents in a variable named array.
IFS=$'\n' array=$(cat sample_path2.txt)
If you have a text file containing filenames (each on a separate line would be simplest), you can load it into an array and iterate over it as follows. Note the use of $(< file ) as a better alternative to cat, and the parentheses that initialize the contents into an array. Each line of the file corresponds to an index.
array=($(< file_list.txt ))
for file in "${array[@]}"; do
mv "$file" "/absolute/path"
done
Update: Your IFS was probably not set correctly if the command at the top of the post didn't work; I updated it to reflect that. Also, there are a couple of other reliable ways to initialize an array from a file. But as you mentioned, if you are just piping the file directly into a while loop, you may not need it.
readarray is a shell builtin in Bash 4+ and a synonym of mapfile. It works great if it's available:
readarray -t array < file
The 'read' command can also initialize an array for you:
IFS=$'\n' read -d '' -r -a array < file
use this:
for file in "/Volumes/HDD1/001.jpg" "/Volumes/HDD1/002.jpg"
do
f=$(basename "$file")
mv "$file" "/Volumes/HDD2/$f"
done
