i trying to make a script to organize a pair of list i have, and process with other programs, but im a little bit stuck now.
I want from a List in Txt process every line first creating a folder to each line in the list and then process due to different scripts i have.
But my problem is is the list i give to the script is like 3-4 elements works great and create there own directory, but if i put a list with +1000 lines, then my script process only a few elements thru the scripts.
EDIT: the process are like 30-35 scripts, different language python,bash,python and golang
Any suggestions?
cat $STORES+NEW.txt | while read NEWSTORES
do
cd $STORES && mkdir $NEWSTORES && cd $NEWSTORES && mkdir .Files
python3 checkstatus.py -n $NEWSTORES
checkemployes $NEWSTORES -status
storemanagers -s $NEWSTORES -o $NEWSTORES+managers.txt
curl -s https://redacted.com/store?=$NEWSTORES | grep -vE "<|^[\*]*[\.]*$NEWSTORES" | sort -u | awk 'NF' > $NEWSTORES+site.txt
..
..
..
..
..
..
cd ../..
done
I'm not supposed to give an answer yet but I mistakenly answered my what should be a comment reply. Anyway here a few things I can suggest:
Avoid unnecessary use of cat.
Open your input file using another FD to prevent commands that read input inside the loop from eating the input: IFS= read -ru 3 NEWSTORES; do ...; done 3< "$STORES+NEW.txt" or { IFS= read -ru "$FD" NEWSTORES; do ...; done; } {FD}< "$STORES+NEW.txt". Also see https://stackoverflow.com/a/28837793/445221.
Not completely related but don't use while loop in a pipeline since it will execute in a subshell. In the future if you try to alter a variable and expect it to be saved outside the loop, it won't. You can use lastpipe to avoid it but it's unnecessary most of the time.
Place your variable expansions around double quotes to prevent unwanted word splitting and filename expansion.
Use -r option unless you want backslashes to escape characters.
Specify IFS= before read to prevent stripping of leading and trailing spaces.
Using readarray or mapfile makes it more convenient: readarray -t ALL_STORES_DATA < "$STORES+NEW.txt"; for NEWSTORES IN "${ALL_STORES_DATA[#]}"; do ...; done
Use lowercase characters on your variables when you don't use them in a global manner to avoid conflict with bash's variables.
Related
Here is my code:
ls | grep -E '^application--[0-9]{4}-[0-9]{2}.tar.gz$' | awk '{if($1<"application--'"${CLEAR_DATE_LEVEL0}"'.tar.gz") print $1}' | xargs -r echo
ls | grep -E '^application--[0-9]{4}-[0-9]{2}.tar.gz$' | awk '{if($1<"application--'"${CLEAR_DATE_LEVEL0}"'.tar.gz") print $1}' | xargs -r rm
As you can see it will get a list of files, show it on screen (for logging purpose) and then delete it.
The issue is that if a file was created between first and second line gets executed, I will delete a file without logging that fact.
Is there a way to create a script that will read the same pipe twice, so the awk result will be piped to both xargs echo and xargs rm commands?
I know I can use a file as a temporary buffer, but I would like to avoid that.
You can change your command to something like
touch example
ls example* | tee >(xargs rm)
I would prefer to avoid parsing ls:
while IFS= read -r file; do
if [[ "$1" < "application--${CLEAR_DATE_LEVEL0}.tar.gz" ]]; then
echo "Removing ${file}"
rm "${file}"
fi
done < <(find . -regextype egrep -regex "./application--[0-9]{4}-[0-9]{2}.tar.gz")
EDIT: An improvement:
As #tripleee mentioned is their answer, using rm -v avoids the additional echo and will also avoid an echo when removing a file failed.
For your specific case, you don't need to read the pipe twice, you can just use rm -v to have rm itself also "echo" each file.
Also, in cases like this, it is better for shell scripts to use globs instead grep ..., both for robustness and performance reasons.
And once you do that, even better: you can loop on the glob and not go through any pipes at all (even more robust in the general case, because there are even less places to worry "could a character in this be special to that program?", and might perform better because everything stays in one process):
for file in application--[0-9][0-9][0-9][0-9]-[0-9][0-9].tar.gz
do
if [[ "$file" < "application--${CLEAR_DATE_LEVEL0}.tar.gz" ]]
then
# echo "$file"
# rm "$file"
rm -v "$file"
fi
done
But if you find yourself in a situation where you really do need to get data from a pipe and a glob won't work, there are a couple ways:
One neat trick in the shell is that loops and other compound commands can be pipes - so a loop can read a pipe, and the inside of the loop can have all the commands you wanted to have read from the pipe:
ls ... | awk ... | while IFS="" read -r file
do
# echo "$file"
# rm "$file"
rm -v "$file"
done
(As a general best practice, you'd want to set IFS= to the empty string for the read command so that read doesn't split the input on characters like spaces, and give read the -r argument to tell it to not interpret special characters like backslashes. In your specific case it doesn't matter.)
But if a loop doesn't work for what you need, then in the general case, you can catch the result of a pipe in a shell variable:
pipe_contents="$(ls application--[0-9][0-9][0-9][0-9]-[0-9][0-9].tar.gz | awk '{if($1<"application--'"${CLEAR_DATE_LEVEL0}"'.tar.gz") print $1}')"
echo "$pipe_contents"
rm $pipe_contents
(This works fine unless your pipe output contains characters that would be special to the shell at the point that the pipe output has to be unquoted - in this case, it needs to be unquoted for the rm, because if it's quoted then the shell won't split the captured pipe output on whitespace, and rm will end up looking for one big file name that looks like the entire pipe output. Part of why looping on a glob is more robust is that it doesn't have these kinds of problems: the pipe combines all file names into one big text that needs to be re-split on whitespace. Luckily in your case, your file names don't have whitespace nor globbing characters, so leaving the pipe output unquoted ends up being fine.)
Also, since you're using bash and your pipe data is multiple separate things, you can use an array variable (bash extension, also found in shells like zsh) instead of a regular variable:
files=($(ls application--[0-9][0-9][0-9][0-9]-[0-9][0-9].tar.gz | awk '{if($1<"application--'"${CLEAR_DATE_LEVEL0}"'.tar.gz") print $1}'))
echo "${files[#]}"
rm "${files[#]}"
(Note that an unquoted expansion still happens with the array, it just happens when defining the array instead of when passing the pipe contents to rm. A small advantage is that if you had multiple commands which needed the unquoted contents, using an array does the splitting only once. A big advantage is that once you recognize array syntax, it does a better job of expressing your big-picture intent through the code itself.)
You can also use a temporary file instead of a shell variable, but you said you want to avoid that. I also prefer a variable when the data fits in memory because Linux/UNIX does not give shell scripts a reliable way to clean up external resources (you can use trap but for example traps can't run on uncatchable signals).
P.S. ideally, in the general habit, you should use printf '%s\n' "$foo" instead of echo "$foo", because echo has various special cases (and portability inconsistencies, but that doesn't matter as much if you always use bash until you need to care about portable sh). In modern featureful shells like bash, you can also use %q instead of %s in printf, which is great because for example printf '%q\n' "${files[#]}" will actually print each file with any special characters properly quoted or escaped, which can help with debugging if you ever are dealing with files that have special whitespace or globbing characters in them.
No, a pipe is a stream - once you read something from it, it is forever gone from the pipe.
A good general solution is to use a temporary file; this lets you rewind and replay it. Just take care to remove it when you're done.
temp=$(mktemp -t) || exit
trap 'rm -f "$temp"' ERR EXIT
cat >"$temp"
cat "$temp"
xargs rm <"$temp"
The ERR and EXIT pseudo-signals are Bash extensions. For POSIX portability, you need a somewhat more involved set of trap commands.
Properly speaking, mktemp should receive an argument which is used as a template for the temporary file's name, so that the user can see which temporary file belongs to which tool. For example, if this script was called rmsponge, you could use mktemp rmspongeXXXXXXXXX to have mktemp generate a temporary file name which begins with rmsponge.
If you only expect a limited amount of input, perhaps just capture the input in a variable. However, this scales poorly, and could have rather unfortunate problems if the input data exceeds available memory;
# XXX avoid: scales poorly
values=$(cat)
xargs printf "%s\n" <<<"$values"
xargs rm <<<"$values"
The <<< "here string" syntax is also a Bash extension. This also suffers from the various issues from https://mywiki.wooledge.org/BashFAQ/020 but this is inherent to your problem articulation.
Of course, in this individual case, just use rm -v to see which files rm removes.
I'm trying to loop same variables from multiple files in bash. Here's my file structure and their contents;
script.sh
first.conf
second.conf
Inside of first.conf;
var=Hello1
Inside of second.conf;
var=Hello2
Inside of script.sh;
#!/bin/bash
_a=`find ~/ -name "*.conf"`
source ${_a}
for x in ${_a}
do
echo "$var"
done
This might look really dumb tho, I'm really new to programming.
What I'm trying to do is to loop and echo these $vars from 2 different configs.
How can I do that?
Consider:
while IFS= read -r -d '' conf; do
(source "$conf" && echo "$var")
done < <(find ~ -name '*.conf' -print0)
Breaking down how this works:
The while read syntax is discussed in BashFAQ #1. The variant with -d '' expects input separated by NULs rather than newlines -- more on that later.
Putting (source "$conf" && echo "$var") in parens prevents side effects on the rest of your script -- while this has a performance cost, it ensures that variables added by source are only present for the echo. Using the && prevents the echo from running if the source command fails.
<(...) is process substitution syntax; it's replaced with a filename that can be read to retrieve the output of the command therein (in this case find). Using this syntax rather than piping into the loop avoids the bugs discussed in BashFAQ #24.
The -print0 action in find prints the name of the file found, followed by a NUL character -- a zero byte. What's useful about NUL bytes is that, unlike any other ASCII character, they can't exist in UNIX paths; using them thus prevents your code from being subject to trickery (think about someone running d=$'./tmp/\n/etc/passwd\n' && mkdir -p -- "$d" && touch "$d/hi.conf" -- traditional find output would have /etc/passwd showing up on its own line, but with find -print0, the newlines in the name aren't mistaken for a separator between files.
This is a shorter and simpler way.
#!/bin/bash
for f in *.conf
do
source "$f"; echo "$f : $var"
done
I'm trying to use enscript to print PDFs from Mutt, and hitting character encoding issues. One way around them seems to be to just use sed to replace the problem characters: sed -ir 's/[“”]/"/g' {input}
My test input file is this:
“very dirty”
we’re
I'm hoping to get "very dirty" and we're but instead I'm still getting
â\200\234very dirtyâ\200\235
weâ\200\231re
I found a nice little post on printing to PDFs from Mutt that I used as a starting point. I have a bash script that I point to from my .muttrc with set print_command="$HOME/.mutt/print.sh" -- the script currently reads about like this:
#!/bin/bash
input="$1" pdir="$HOME/Desktop" open_pdf=evince
# Straighten out curly quotes
sed -ir 's/[“”]/"/g' $input
sed -ir "s/[’]/'/g" $input
tmpfile="`mktemp $pdir/mutt_XXXXXXXX.pdf`"
enscript --font=Courier8 $input -2r --word-wrap --fancy-header=mutt -p - 2>/dev/null | ps2pdf - $tmpfile
$open_pdf $tmpfile >/dev/null 2>&1 &
sleep 1
rm $tmpfile
It does a fine job of creating a PDF (and works fine if you give it a file as an argument) but I can't figure out how to fix the curly quotes.
I've tried a bunch of variations on the sed line:
input=sed -r 's/[“”]/"/g' $input
$input=sed -ir "s/[’]/'/g" $input
Per the suggestion at Can I use sed to manipulate a variable in bash? I also tried input=$(sed -r 's/[“”]/"/g' <<< $input) and I get an error: "Syntax error: redirection unexpected"
But none manages to actually change $input -- what is the correct syntax to change $input with sed?
Note: I accepted an answer that resolved the question I asked, but as you can see from the comments there are a couple of other issues here. enscript is taking in a whole file as a variable, not just the text of the file. So trying to tweak the text inside the file is going to take a few extra steps. I'm still learning.
On Editing Variables In General
BashFAQ #21 is a comprehensive reference on performing search-and-replace operations in bash, including within variables, and is thus recommended reading. On this particular case:
Use the shell's native string manipulation instead; this is far higher performance than forking off a subshell, launching an external process inside it, and reading that external process's output. BashFAQ #100 covers this topic in detail, and is well worth reading.
Depending on your version of bash and configured locale, it might be possible to use a bracket expression (ie. [“”], as your original code did). However, the most portable thing is to treat “ and ” separately, which will work even without multi-byte character support available.
input='“hello ’cruel’ world”'
input=${input//'“'/'"'}
input=${input//'”'/'"'}
input=${input//'’'/"'"}
printf '%s\n' "$input"
...correctly outputs:
"hello 'cruel' world"
On Using sed
To provide a literal answer -- you almost had a working sed-based approach in your question.
input=$(sed -r 's/[“”]/"/g' <<<"$input")
...adds the missing syntactic double quotes around the parameter expansion of $input, ensuring that it's treated as a single token regardless of how it might be string-split or glob-expanded.
But All That May Not Help...
The below is mentioned because your test script is manipulating content passed on the command line; if that's not the case in production, you can probably disregard the below.
If your script is invoked as ./yourscript “hello * ’cruel’ * world”, then information about exactly what the user entered is lost before the script is started, and nothing you can do here will fix that.
This is because $1, in that scenario, will only contain “hello; ’cruel’ and world” are in their own argv locations, and the *s will have been replaced with lists of files in the current directory (each such file substituted as a separate argument) before the script was even started. Because the shell responsible for parsing the user's command line (which is not the same shell running your script!) did not recognize the quotes as valid at the time when it ran this parsing, by the time the script is running, there's nothing you can do to recover the original data.
Abstract: The way to use sed to change a variable is explored, but what you really need is a way to use and edit a file. It is covered ahead.
Sed
The (two) sed line(s) could be solved with this (note that -i is not used, it is not a file but a value):
input='“very dirty”
we’re'
sed 's/[“”]/\"/g;s/’/'\''/g' <<<"$input"
But it should be faster (for small strings) to use the internals of the shell:
input='“very dirty”
we’re'
input=${input//[“”]/\"}
input=${input//[’]/\'}
printf '%s\n' "$input"
$1
But there is an underlying problem with your script, you are trying to clean an input received from the command line. You are using $1 as the source of the string. Once somebody writes:
./script “very dirty”
we’re
That input is lost. It is broken into shell's tokens and "$1" will be “very only.
But I do not believe that is what you really have.
file
However, you are also saying that the input comes from a file. If that is the case, then read it in with:
input="$(<infile)" # not $1
sed 's/[“”]/\"/g;s/’/'\''/g' <<<"$input"
Or, if you don't mind to edit (change) the file, do this instead:
sed -i 's/[“”]/\"/g;s/’/'\''/g' infile
input="$(<infile)"
Or, if you are clear and certain that what is being given to the script is a filename, like:
./script infile
You can use:
infile="$1"
sed -i 's/[“”]/\"/g;s/’/'\''/g' "$infile"
input="$(<"$infile")"
Other comments:
Then:
Quote your variables.
Do not use the very old `…` syntax, use $(…) instead.
Do not use variables in UPPER case, those are reserved for environment variables.
And (unless you actually meant sh) use a shebang (first line) that targets bash.
The command enscript most definitively requires a file, not a variable.
Maybe you should use evince to open the PS file, there is no need of the step to make a pdf, unless you know you really need it.
I believe that is better use a file to store the output of enscript and ps2pdf.
Do not hide the errors printed by the commands until everything is working as desired, then, just call the script as:
./script infile 2>/dev/null
Or as required to make it less verbose.
Final script.
If you call the script with the name of the file that enscript is going to use, something like:
./script infile
Then, the whole script will look like this (runs both in bash or sh):
#!/usr/bin/env bash
Usage(){ echo "$0; This script require a source file"; exit 1; }
[ $# -lt 1 ] && Usage
[ ! -e $1 ] && Usage
infile="$1"
pdir="$HOME/Desktop"
open_pdf=evince
# Straighten out curly quotes
sed -i 's/[“”]/\"/g;s/’/'\''/g' "$infile"
tmpfile="$(mktemp "$pdir"/mutt_XXXXXXXX.pdf)"
outfile="${tmpfile%.*}.ps"
enscript --font=Courier10 "$infile" -2r \
--word-wrap --fancy-header=mutt -p "$outfile"
ps2pdf "$outfile" "$tmpfile"
"$open_pdf" "$tmpfile" >/dev/null 2>&1 &
sleep 5
rm "$tmpfile" "$outfile"
I have a delimited (|) input file (TableInfo.txt) that has data as shown below
dbName1|Table1
dbName1|Table2
dbName2|Table3
dbName2|Table4
...
I have a shell script (LoadTables.sh) that parses each line and calls a executable passing args from the line like dbName, TableName. This process reads data from a SQL Server and loads it into HDFS.
while IFS= read -r line;do
fields=($(printf "%s" "$line"|cut -d'|' --output-delimiter=' ' -f1-))
query=$(< ../sqoop/"${fields[1]}".sql)
sh ../ProcessName "${fields[0]}" "${fields[1]}" "$query"
done < ../TableInfo.txt
Right now my process is running in sequential for each line in the file and its time consuming based on the number of entries in the file.
Is there any way I can execute the process in parallel? I have heard about using xargs/GNU parallel/ampersand and wait options. I am not familiar on how to construct and use it. Any help is appreciated.
Note:I don't have GNU parallel installed on the Linux machine. So xargs is the only option as I have heard some cons on using ampersand and wait option.
Put an & on the end of any line you want to move to the background. Replacing the silly (buggy) array-splitting method used in your code with read's own field-splitting, this looks something like:
while IFS='|' read -r db table; do
../ProcessName "$db" "$table" "$(<"../sqoop/${table}.sql")" &
done < ../TableInfo.txt
...FYI, re: what I meant about "buggy" --
fields=( $(foo) )
...performs not only string-splitting but also globbing on the output of foo; thus, a * in the output is replaced with a list of filenames in the current directory; a name such as foo[bar] can be replaced with files named foob, fooa or foor; the globfail shell option can cause such an expansion to result in a failure, the nullglob shell option can cause it to result in an empty result; etc.
If you have GNU xargs, consider the following:
# assuming you have "nproc" to get the number of CPUs; otherwise, hardcode
xargs -P "$(nproc)" -d $'\n' -n 1 bash -c '
db=${1%|*}; table=${1##*|}
query=$(<"../sqoop/${table}.sql")
exec ../ProcessName "$db" "$table" "$query"
' _ < ../TableInfo.txt
I am sure this has been answered before, but I cannot find the solution.
for i in `ls | grep ^t`; do echo $i; done
This gives me the expected results (the files and directories starting with t).
If I want to use a shell variable in the for loop (with the pipe), how do I do this?
This was my start but it does not work.
z="ls | grep ^t"
for i in `$z` ; do echo $i; done
EDIT: I agree this example was not wisely chosen, but what I basically need ishow to use a variable (here $z) in the for loop (for i in $z) which has at least one pipe.
I try to state my question more clearly: How do I need to define my variable (if I want to loop over the output of a command with one or more pipes) and how is the call for the for loop?
To loop through all files starting with t, you should use glob expansion instead of parsing ls output:
$ ls t*
tf tfile
$ ls
abcd tf tfile
$ for i in t*; do echo "$i"; done
tf
tfile
$
Your approach will break in a number of cases, the simplest being when file names contain spaces. There is no need to use any external tool here; t* expands to all files starting with t.
As for your question, you use ls | grep ^t
And another good practice is to use subshell instead of backticks which are more readable and can be nested, use this: $(ls | grep ^t)
You don't use a variable for this. You don't put commands in strings. See Bash FAQ 050 for discussion about why.
You shouldn't be using a for loop to read data line-by-line anyway. See Bash FAR 001 for better ways to do that.
Specifically (though really read that page):
while IFS= read -r line; do
printf '%s\n' "$line"
done < <(some command)