Error while using while read; do grep - bash

I am using this command
cat text.csv | while read a ; do grep $a text1.csv >> text2.csv; done
text.csv has file names with full path. The file names are having spaces.
Example: C:\Users\Downloads\File Name.txt
text1.csv contains logs showing user id and the file name with full path.
Example: MyName,C:\Users\Downloads\File Name.txt
When I run the command, I get an error:
grep: Name: No such file or Directory
I know that the error is caused by the spaces in the file name. I would like to know how I can get rid of it.

Put the grep pattern in double quotes; otherwise the shell splits it on whitespace and passes the pieces to grep as separate arguments:
while read a ; do grep "$a" text1.csv >> text2.csv; done < text.csv
The extra cat is not needed, so I replaced it with input redirection in my answer.

Quote the variable:
cat text.csv | while read a ; do grep "$a" text1.csv >> text2.csv; done
In general, you should usually quote variables, unless you specifically want the value to undergo word splitting and wildcard expansion.
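Quoting fixes the word splitting, but with Windows paths two more things can bite: plain read drops backslashes, and grep treats the pattern as a regular expression. The following is a minimal sketch, assuming the goal is a literal match of each path; the file contents are made up for the demo and the files live in a temporary directory.

```shell
# Throwaway copies of the question's files (contents are made up for the demo).
tmp=$(mktemp -d)
printf '%s\n' 'C:\Users\Downloads\File Name.txt' > "$tmp/text.csv"
printf '%s\n' 'MyName,C:\Users\Downloads\File Name.txt' \
              'Other,C:\Users\Downloads\Other.txt' > "$tmp/text1.csv"

# read -r keeps the backslashes; IFS= keeps leading and trailing blanks.
# grep -F treats the pattern as a fixed string; -- ends option parsing.
while IFS= read -r a; do
    grep -F -- "$a" "$tmp/text1.csv"
done < "$tmp/text.csv" > "$tmp/text2.csv"

cat "$tmp/text2.csv"
```

With these inputs only the MyName line should end up in text2.csv. If no per-line processing is needed, grep -F -f text.csv text1.csv does the same job without a loop, though an empty line in text.csv would then match every line.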

Related

Read file by line then process as different variable

I have created a text file with a list of file names like below
022694-39.tar
022694-39.tar.2017-05-30_13:56:33.OLD
022694-39.tar.2017-07-04_09:22:04.OLD
022739-06.tar
022867-28.tar
022867-28.tar.2018-07-18_11:59:19.OLD
022932-33.tar
I am trying to read the file line by line then strip anything after .tar with awk and use this to create a folder unless it exists.
Then the plan is to copy the original file to the new folder with the original full name stored in $LINE.
$QNAP= "Path to storage"
$LOG_DIR/$NOVA_TAR_LIST= "Path to text file containing file names"
while read -r LINE; do
CURNT_JOB_STRIPED="$LINE | `awk -F ".tar" '{print $1}'`"
if [ ! -d "$QNAP/$CURNT_JOB_STRIPED" ]
then
echo "Folder $QNAP/$CURNT_JOB_STRIPED doesn't exist."
#mkdir "$QNAP/$CURNT_JOB_STRIPED"
fi
done <"$LOG_DIR/$NOVA_TAR_LIST"
Unfortunately this seems to be trying to join all the file names together when trying to create the directories rather than doing them one by one and I get a
File name too long
output:
......951267-21\n951267-21\n961075-07\n961148-13\n961520-20\n971333-21\n981325-22\n981325-22\n981743-40\n999111-99\n999999-04g\n999999-44': File name too long
Apologies if this is trivial, bit of a rookie...
Try modifying your script as follows:
CURNT_JOB_STRIPED=$(echo "${LINE}" | awk -F ".tar" '{print $1}')
Use $(...) for command substitution. Echo the variable LINE so that its value is piped into awk as input, instead of sitting in the string as literal text. Finally, remove the backticks around the awk expression (backticks are the legacy syntax for command substitution), since what you want is the output of the whole pipeline.
For further information, take a look over http://tldp.org/LDP/abs/html/commandsub.html
Alternatively, purely as a curiosity (it is far less readable and no faster), you can replace the whole while loop with:
xargs -I{} bash -c 'mkdir -p "${2}/${1%.tar*}"' - '{}' "${QNAP}" < "${LOG_DIR}/${NOVA_TAR_LIST}"
The problem is with the CURNT_JOB_STRIPED="$LINE | `awk -F ".tar" '{print $1}'`" line.
The `command` form is legacy syntax; $(command) should be used instead.
The $LINE variable should be printed so that awk receives its value through a pipe. Otherwise awk has no input of its own and reads the loop's standard input, that is, the rest of the list file, which is why all the names end up joined together.
If you run the whole thing in a subshell with $(command), you can assign its output to a variable: var=$(date)
It is safer to wrap variable names in ${}, so that adjacent text does not produce unexpected results.
This should work:
CURNT_JOB_STRIPED=$(echo "${LINE}" | awk -F '.tar' '{print $1}')
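To see the difference between the two assignments, here is a small sketch under the assumption that the list is read from a regular file, as in the question. The three file names and the variable names broken and fixed are made up for the demo, and each loop stops after its first iteration.

```shell
# Hypothetical list file standing in for the question's file of names.
list=$(mktemp)
printf '%s\n' 'a.tar' 'b.tar.OLD' 'c.tar' > "$list"

# Broken form: nothing is piped into awk, so awk reads the loop's stdin
# and consumes every remaining line of the list.
while read -r LINE; do
    broken="$LINE | $(awk -F '.tar' '{print $1}')"
    break
done < "$list"
printf 'broken: %s\n' "$broken"

# Fixed form: the current line is piped into awk explicitly.
while read -r LINE; do
    fixed=$(printf '%s\n' "$LINE" | awk -F '.tar' '{print $1}')
    break
done < "$list"
printf 'fixed: %s\n' "$fixed"
```

The fixed form yields a for the first line, while the broken form keeps the literal text a.tar | and appends whatever awk collected from the rest of the file. Note that awk treats a multi-character field separator as a regular expression, so -F '[.]tar' is the stricter spelling if names may contain text such as star.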
With variable substitution this can be done more efficiently, and I believe the code is also cleaner to read.
Variable substitution does not modify the ${LINE} variable, so it can still be used later when the full, unchanged filename is needed, while ${LINE%.tar*} strips the last .tar and, because of the *, everything after it.
while read -r LINE; do
if [ ! -d "${QNAP}/${LINE%.tar*}" ]
then
echo "Folder ${QNAP}/${LINE%.tar*} doesn't exist."
#mkdir "${QNAP}/${LINE%.tar*}"
fi
done <"${LOG_DIR}/${NOVA_TAR_LIST}"
This way the directory name is not stored in a separate variable, and ${LINE} still holds the full filename. If you need the directory name in a variable, that is easy to do: var="${LINE%.tar*}"
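For completeness, a sketch of the loop with the mkdir actually enabled. QNAP and LIST here are temporary stand-ins for the real storage path and list file, which the question does not show, and mkdir -p is a swapped-in shortcut for the explicit existence test.

```shell
# Hypothetical stand-ins for the question's QNAP path and list file.
QNAP=$(mktemp -d)
LIST=$(mktemp)
printf '%s\n' '022694-39.tar' \
              '022694-39.tar.2017-05-30_13:56:33.OLD' \
              '022739-06.tar' > "$LIST"

while IFS= read -r LINE; do
    dir="${QNAP}/${LINE%.tar*}"
    # mkdir -p succeeds silently when the directory already exists,
    # so no separate [ ! -d ... ] test is needed.
    mkdir -p -- "$dir"
    echo "$LINE -> $dir"
done < "$LIST"

ls "$QNAP"
```

The three names collapse into two directories, 022694-39 and 022739-06. The copy step would go right after the mkdir, with "$LINE" still holding the original full name.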
Variable Substitution:
There are more forms; I picked only these four because they are similar and relevant here.
${var#pattern} - Use the value of var after removing the shortest match of pattern from the beginning
${var##pattern} - Same as above, but remove the longest match instead of the shortest
${var%pattern} - Use the value of var after removing the shortest match of pattern from the end
${var%%pattern} - Same as above, but remove the longest match instead of the shortest
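A quick way to get a feel for the four operators is to run them against one of the file names from the question. The variable name f is only for this demo; the forms used are plain POSIX shell.

```shell
f='022694-39.tar.2017-05-30_13:56:33.OLD'

short_suffix="${f%.*}"     # drop the shortest suffix matching .*
long_suffix="${f%%.*}"     # drop the longest suffix matching .*
short_prefix="${f#*.}"     # drop the shortest prefix matching *.
long_prefix="${f##*.}"     # drop the longest prefix matching *.
job="${f%.tar*}"           # the form used in the loop above

printf '%s\n' "$short_suffix" "$long_suffix" "$short_prefix" "$long_prefix" "$job"
```

This prints 022694-39.tar.2017-05-30_13:56:33, 022694-39, tar.2017-05-30_13:56:33.OLD, OLD and 022694-39, one per line.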

Use sed substitution from different files

Okay, I am a newbie to Unix scripting. I was given the task to find a temporary work around for this:
cat /directory/filename1.xml |sed -e "s/ABCXYZ/${c}/g" > /directory/filename2.xml
$c is a variable from a sqlplus count query. I totally understand how this sed command is working. But here is where I am stuck. I am storing the count associated with the variable in another file called filename3 as count[$c] where $c is replaced with a number. So my question is how can I update this sed command to substitute ABCXYZ with the count from file3?
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
UPDATE: In case anyone has a similar issue I got mine to work using:
rm /directory/folder/variablefilename.dat
echo $c >> /directory/folder/variablefilename.dat
d=$(grep '[0-9]' /directory/folder/variablefilename.dat)
sed -e "s/ABC123/${d}/g" /directory/folder/inputfile.xml >> /directory/folder/outputfile.xml
Thank you to Kaz for pointing me in the right direction.
Store the count in filename3 using the syntax c=number. Then you can source the file as a shell script:
. /filename3 # get c variable
sed -e "s/ABCXYZ/${c}/g" /directory/filename1.xml > /directory/filename2.xml
If you can't change the format of filename3, you can write a shell function which scrapes the number out of that file and sets the c variable. Or you can scrape the number out with an external program like grep, and then interpolate its output into a variable assignment using command substitution: $(command arg ...) syntax.
Suppose we can rely on file3 to contain exactly one line of the form count[42]. Then we can just extract the digits with grep -o:
c=$(grep -E -o '[0-9]+' filename3)
sed -e "s/ABCXYZ/$c/g" /directory/filename1.xml > /directory/filename2.xml
The c variable can be eliminated, of course; you can stick the $(grep ...) into the sed command line in place of $c.
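Since the extracted text is pasted straight into a sed script, it may be worth checking that it really is a single number first. A minimal sketch, assuming filename3 holds exactly one count[N] line; the XML snippet and the temporary directory are made up for the demo.

```shell
dir=$(mktemp -d)
printf '%s\n' 'count[42]' > "$dir/filename3"
printf '%s\n' '<total>ABCXYZ</total>' > "$dir/filename1.xml"

c=$(grep -E -o '[0-9]+' "$dir/filename3")

# Refuse to continue unless c is one run of digits: an empty result or
# several matches (joined by newlines) would otherwise break the sed script.
case $c in
    ''|*[!0-9]*) echo "unexpected count: $c" >&2; exit 1 ;;
esac

sed -e "s/ABCXYZ/${c}/g" "$dir/filename1.xml" > "$dir/filename2.xml"
cat "$dir/filename2.xml"
```

With the sample input this prints <total>42</total>.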
A file which contains numerous instances of syntax like count[42] for various variables could be transformed into a set of shell variable assignments using sed, and then sourced into the current shell to make those assignments happen:
$ sed -n -e 's/^\([A-Za-z_][A-Za-z0-9_]\+\)\[\(.*\)\]/\1=\2/p' filename3 > vars.sh
$ . ./vars.sh
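Here is a slightly stricter variant of that idea as a sketch. It assumes every value is a plain number, so only name[digits] lines are turned into assignments and anything else is skipped before the file is sourced. The names errors and note and the sample values are made up; the pattern uses * instead of \+ so it does not rely on GNU extensions.

```shell
dir=$(mktemp -d)
printf '%s\n' 'count[42]' 'errors[7]' 'note[not a number]' > "$dir/filename3"

# Keep only lines of the form name[digits] and rewrite them as name=digits.
# A ] outside a bracket expression is an ordinary character in a basic regex.
sed -n -e 's/^\([A-Za-z_][A-Za-z0-9_]*\)\[\([0-9][0-9]*\)]$/\1=\2/p' \
    "$dir/filename3" > "$dir/vars.sh"

# Source the generated assignments into the current shell.
. "$dir/vars.sh"
echo "count=$count errors=$errors"
```

This prints count=42 errors=7, and the note line never reaches vars.sh. Restricting the values matters because sourcing a generated file executes whatever ends up in it.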
You can use sed like this:
sed -r "s/ABCXYZ/$(sed -nr 's/.*count[[]([0-9]+)[]].*/\1/p' path_to_file)/g" path_to_file
The expression is double-quoted, which lets the shell run the command substitution below first. It finds the number in count[$c] in the file, and the outer sed then uses that number as the replacement:
$(sed -nr 's/.*count[[]([0-9]+)[]].*/\1/p' path_to_file)

How to Read a file word by word and use those words to grep in bash shell?

I want to read a file word by word and use each word in that text file as an input to grep.
To read the file word by word I have used the following code:
for word in $(<filename)
do
echo "$word"
done
now when I replaced
echo "$word"
with
grep -i "$word"
I'm not getting any output.
The following will read the file word by word and apply grep using the read word as input:
#!/bin/bash
while read line; do
for word in $line; do
grep -i "<REGULAR_EXPRESSION_HERE>" "$word"
done
done < filename
The reason you are not getting any output is that grep expects two arguments. If you leave out the filename argument, it will wait for you to type in the text to grep from; it is reading standard input. (This is what allows you to use it in a pipeline, like command | grep error.)
Anyway, what you are attempting is already built into grep. Just pass it the file of search expressions as an argument to -f.
grep -irf filename .
where -r says to search recursively through all the files in a directory and . is the current directory.
Note, however, that this will search for matches anywhere on a line. If your input file contains dog then grep will find a match on lines which contain dogmatic or endogenous; and if it contains an empty line, it will match all lines in all files. Maybe look at the -w and/or -x options (as well as perhaps -F to disarm any regex specials in the input) to address these issues.
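The effect of -w and -F is easy to check on a toy example. The word list and data file below are made up and live in a temporary directory; -c just counts the matching lines.

```shell
dir=$(mktemp -d)
printf '%s\n' 'dog' 'cat' > "$dir/words"
printf '%s\n' 'a dog barks' 'dogmatic view' 'the CAT sleeps' 'nothing here' > "$dir/data.txt"

# Plain -f: each line of "words" is a regex that may match anywhere on a line.
loose=$(grep -i -c -f "$dir/words" "$dir/data.txt")

# -w requires whole-word matches; -F takes each line as a literal string.
strict=$(grep -i -w -F -c -f "$dir/words" "$dir/data.txt")

echo "loose=$loose strict=$strict"
```

This prints loose=3 strict=2: the dogmatic line is counted only in the first run.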
See if this serves your purpose:
$ grep -o "\S*" filename | grep -i "<your regex here>"
The first grep in the pipeline flattens the file to one word per line. The second grep then searches those words for your regex.
Note: This answer assumes that the individual words in file are the data you want to grep in. If those are supposed to be interpreted as filenames, refer to higuaro's answer.
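The \S escape is a GNU grep extension, so on other systems the flattening step may need a different tool. One possible substitute, shown here as a sketch with made-up sample text, is tr:

```shell
f=$(mktemp)
printf '%s\n' 'error in module' 'all fine' 'Error again' > "$f"

# Turn every run of whitespace into a single newline (one word per line),
# then search the words case-insensitively.
hits=$(tr -s '[:space:]' '\n' < "$f" | grep -i 'error')
printf '%s\n' "$hits"
```

With this input the two matching words, error and Error, are printed on separate lines.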
This is what worked for me
while read line
do
output=`grep -i "$line" /filepath/*`
if [ $? -eq 0 ]; then
echo "$line present in file : $output"
fi
done <filename

bash string expansion for grep -E expression in single quotes

Within a bash script I am trying to construct a string for grep -E so that it appears as
grep -E 'alice|bar|bob|foo'
If I test the grep at the command line-- ls * | grep -E 'alice|bar|bob|foo'-- things work as expected. It excludes all the files with the same name as the list within the extended regular expression.
The issue I've found is that it will not match the first and last strings within a bash script if I construct the string as 'alice|bar|bob|foo'
Broken testcase:
#!/bin/bash
touch foo.txt bar.txt alice.txt bob.txt
touch alice.tmp bob.tmp foo.tmp crump.tmp dammitall.tmp
EXCLUDE_PATTERN=$(echo *.txt | sed 's/\.txt /|/g' | sed 's/\.txt//')
EXCLUDE_PATTERN="'""$EXCLUDE_PATTERN""'"
echo "Excluding files that match the string $EXCLUDE_PATTERN"
for file in *.tmp
do
if echo $file | grep -q -E $EXCLUDE_PATTERN
then
echo "Keeping $file"
else
echo "Deleting $file"
rm -f $file
fi
done
Outputs:
Excluding files that match the string 'alice|bar|bob|foo'
Deleting alice.tmp
Keeping bob.tmp
Deleting crump.tmp
Deleting dammitall.tmp
Deleting foo.tmp
... and yet I don't want it to delete alice.tmp or foo.tmp because they're in the regex!
I assume the shell is getting some characters that it's not when the string is expanded in this script, but I can't for the life of me figure out in what manner the string passed to grep -E is getting hosed by the "broken" script above.
Variations like EXCLUDE_PATTERN="'$EXCLUDE_PATTERN'" don't seem to help. Haven't found the magic string.
Edit to include useful comment below:
Using set -x indicates that bash does the single-quote wrapping itself, so the incorrect code above does this EXCLUDE_PATTERN=''\''alice|bar|bob|foo'\''' which is just adding single quotes around single quotes.
Why are you adding the single quote marks? Just remove this line:
EXCLUDE_PATTERN="'""$EXCLUDE_PATTERN""'"
I'm getting the following without that line:
Excluding files that match the string alice|bar|bob|foo
Keeping alice.tmp
Keeping bob.tmp
Deleting crump.tmp
Deleting dammitall.tmp
Keeping foo.tmp
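The reason only the first and last names break is that the added quote characters become part of the first and last alternatives, so grep looks for 'alice and foo' with a literal apostrophe. A small sketch that shows this without touching any files; the helper function matches is made up for the demo.

```shell
# matches NAME PATTERN -> prints yes or no
matches() {
    if printf '%s\n' "$1" | grep -q -E -- "$2"; then
        echo yes
    else
        echo no
    fi
}

plain='alice|bar|bob|foo'
quoted="'$plain'"    # same as the EXCLUDE_PATTERN line in the question

for name in alice.tmp bob.tmp foo.tmp crump.tmp; do
    printf '%-10s plain=%s quoted=%s\n' "$name" \
        "$(matches "$name" "$plain")" "$(matches "$name" "$quoted")"
done
```

alice.tmp and foo.tmp match only the plain pattern, bob.tmp matches both, and crump.tmp matches neither. Quoting the expansion as "$EXCLUDE_PATTERN" in the script is still a good idea; those quotes are shell syntax and never reach grep.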

How to read output of sed into a variable

I have a variable which has the value "abcd.txt".
I want to store everything before the ".txt" in a second variable, replacing the ".txt" with ".log".
I have no problem echoing the desired value:
a="abcd.txt"
echo $a | sed 's/.txt/.log/'
But how do I get the value "abcd.log" into the second variable?
You can use command substitution as:
new_filename=$(echo "$a" | sed 's/.txt/.log/')
or the less recommended backtick way:
new_filename=`echo "$a" | sed 's/.txt/.log/'`
You can use backticks to assign the output of a command to a variable:
logfile=`echo $a | sed 's/.txt/.log/'`
That's assuming you're using Bash.
Alternatively, for this particular problem Bash has its own pattern-matching parameter expansions:
stem=${textfile%%.txt}
logfile=${stem}.log
or
logfile=${textfile/%.txt/.log}
The % in the last example anchors the match to the end of the string, so only a trailing .txt is replaced.
The simplest way is:
logfile="${a/.txt/.log}"
This replaces the first occurrence of .txt. If the filename in $a may contain .txt more than once, the following is safer, because it only strips a trailing .txt before appending .log:
logfile="${a%%.txt}.log"
If you have Bash or ksh:
$ var="abcd.txt"
$ echo ${var%.txt}.log
abcd.log
$ variable=${var%.txt}.log
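The approaches in this thread differ once the name contains .txt more than once. A short comparison as a sketch, using a made-up file name and made-up variable names, and only POSIX constructs:

```shell
a='my.txt.notes.txt'

# Unescaped, unanchored sed: the dot matches any character and the
# first match wins.
naive=$(printf '%s\n' "$a" | sed 's/.txt/.log/')

# Escaped dot plus $ anchor: only a trailing .txt is replaced.
anchored=$(printf '%s\n' "$a" | sed 's/\.txt$/.log/')

# Parameter expansion: strip a trailing .txt, then append .log.
expanded="${a%.txt}.log"

printf '%s\n' "naive:    $naive" "anchored: $anchored" "expanded: $expanded"
```

The naive form gives my.log.notes.txt, while the other two both give my.txt.notes.log. The parameter expansion also avoids starting a subshell and a sed process.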

Resources