bash script prepending ? to file name

I am using the below script. When I have it echo $f as shown below, it gives the correct result:
#/bin/bash
var="\/home\/"
while read p; do
f=$(echo $p | sed "s/${var}/\\n/g")
f=${f%.sliced.bam}.fastq
echo $f
~/bin/samtools view $p | awk '{print "#"$1"\n"$10"\n+\n"$11}' > $f
./run.sh $f ${f%.fastq}
rm ${f%.sliced.bam}.fastq
done < $1
I get the output as expected
test.fastq
But the file being created by awk > $f has the name
?test.fastq
Note that the overall goal here is to run this loop on every file listed in a file of absolute paths, but write the output locally (which is what the sed call is for).
Edit: run directly on the command line (without variables), the samtools | awk line runs correctly.

Awk cannot possibly have anything to do with your problem. The shell is completely responsible for file redirection, so f MUST have a weird character in it.
Most likely whatever you are sending to this script has a special character in it (perhaps a UTF-8 character, with your terminal displaying ASCII only). When you do the echo, the shell doesn't know how to display the character and probably just shows it as whitespace; when the name goes through ls (which might be doing things like colorization), it ends up shown as ?.
Oh wait...why are you putting a newline into the filename with sed??? That is possibly your problem...try just:
sed "s/${var}//g"

Related

Read file by line then process as different variable

I have created a text file with a list of file names like below
022694-39.tar
022694-39.tar.2017-05-30_13:56:33.OLD
022694-39.tar.2017-07-04_09:22:04.OLD
022739-06.tar
022867-28.tar
022867-28.tar.2018-07-18_11:59:19.OLD
022932-33.tar
I am trying to read the file line by line then strip anything after .tar with awk and use this to create a folder unless it exists.
Then the plan is to copy the original file to the new folder with the original full name stored in $LINE.
$QNAP= "Path to storage"
$LOG_DIR/$NOVA_TAR_LIST= "Path to text file containing file names"
while read -r LINE; do
CURNT_JOB_STRIPED="$LINE | `awk -F ".tar" '{print $1}'`"
if [ ! -d "$QNAP/$CURNT_JOB_STRIPED" ]
then
echo "Folder $QNAP/$CURNT_JOB_STRIPED doesn't exist."
#mkdir "$QNAP/$CURNT_JOB_STRIPED"
fi
done <"$LOG_DIR/$NOVA_TAR_LIST"
Unfortunately this seems to be trying to join all the file names together when creating the directories, rather than doing them one by one, and I get a
File name too long
output:
......951267-21\n951267-21\n961075-07\n961148-13\n961520-20\n971333-21\n981325-22\n981325-22\n981743-40\n999111-99\n999999-04g\n999999-44': File name too long
Apologies if this is trivial, bit of a rookie...
Try modifying your script as follows:
CURNT_JOB_STRIPED=$(echo "${LINE}" | awk -F ".tar" '{print $1}')
You have to use $(...) for command substitution. Also, you should print the variable LINE so that its value is passed to the next command in the pipe (as input) rather than being interpreted by the shell as a command. Finally, you should remove the backticks from the awk expression (backticks are the deprecated syntax for command substitution), since what you want is the result of the whole pipeline.
For further information, take a look over http://tldp.org/LDP/abs/html/commandsub.html
Alternatively, and far less readably (with no performance gain either, so just as a "curiosity"), you can replace the whole while loop with:
xargs -I{} bash -c 'mkdir -p "${2}/${1%.tar*}"' - '{}' "${QNAP}" < "${LOG_DIR}/${NOVA_TAR_LIST}"
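To see how the positional parameters line up in that bash -c call: $0 is the placeholder -, $1 is the line that xargs substitutes for {}, and $2 is the value of "${QNAP}". A quick demo with hypothetical values:
bash -c 'echo "would create: ${2}/${1%.tar*}"' - '022694-39.tar.2017-05-30_13:56:33.OLD' '/mnt/qnap'
# prints: would create: /mnt/qnap/022694-39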
The problem is with the CURNT_JOB_STRIPED="$LINE | `awk -F ".tar" '{print $1}'`" line.
The `command` form is legacy syntax; $(command) should be used instead.
The $LINE variable's value should be printed so awk can receive it through a pipe.
If you run the whole thing in a command substitution ($(command)), you can assign the output to a variable: var=$(date)
It is safer to put variables into ${} so that surrounding text does not give you unexpected results.
This should work:
CURNT_JOB_STRIPED=$(echo "${LINE}" | awk -F '.tar' '{print $1}')
With variable substitution this can be achieved with more efficient code, and it is also cleaner to read, I believe.
Variable substitution does not change the ${LINE} variable, so ${LINE} can still be used later with the full filename unchanged, while ${LINE%.tar*} cuts the last .tar from the value and, with the *, anything after it.
while read -r LINE; do
if [ ! -d "${QNAP}/${LINE%.tar*}" ]
then
echo "Folder ${QNAP}/${LINE%.tar*} doesn't exist."
#mkdir "${QNAP}/${LINE%.tar*}"
fi
done <"${LOG_DIR}/${NOVA_TAR_LIST}"
This way you don't store the directory name in a variable, and ${LINE} only stores the filename. If you need it in a variable, you can do that easily: var="${LINE%.tar*}"
Variable Substitution:
There are more; I have only picked these four for now, as they are similar and relevant here.
${var#pattern} - Use the value of var after removing the text matching pattern from the left
${var##pattern} - Same as above, but remove the longest matching piece instead of the shortest
${var%pattern} - Use the value of var after removing the text matching pattern from the right
${var%%pattern} - Same as above, but remove the longest matching piece instead of the shortest
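A quick illustration of all four, using a filename from the question and a hypothetical path:
f="022694-39.tar.2017-05-30_13:56:33.OLD"
p="/home/user/archive/${f}"
echo "${f%.*}"     # right, shortest: 022694-39.tar.2017-05-30_13:56:33
echo "${f%%.*}"    # right, longest:  022694-39
echo "${p#*/}"     # left, shortest:  home/user/archive/022694-39.tar.2017-05-30_13:56:33.OLD
echo "${p##*/}"    # left, longest:   022694-39.tar.2017-05-30_13:56:33.OLD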

For Loop Issues with CAT and tr

I have about 700 text files that consist of config output which uses various special characters. I am using this script to remove the special characters so I can then run a different script referencing a SED file to remove the commands that should be there, leaving behind what should not be in the config.
I got the below from Remove all special characters and case from string in bash but am hitting a wall.
When I run the script it continues to loop and writes the script into the output file. Ideally, it just takes out the special characters and creates a new file with the updated information. I have not gotten to the point of removing the previous text file since it probably won't be needed. Any insight is greatly appreciated.
for file in *.txt
do
cat * | tr -cd '[:alnum:]\n\r' | tr '[:upper:]' '[:lower:]' >> "$file" >> "$file".new_file.txt
done
A less-broken version of this might look like:
#!/usr/bin/env bash
for file in *.txt; do
[[ $file = *.new_file.txt ]] && continue ## skip files created by this same script
tr -cd '[:alnum:]\n\r' <"$file" \
| tr '[:upper:]' '[:lower:]' \
>> "$file".new_file.txt
done
Note:
We're referring to the "$file" variable being set by for.
We aren't using cat. It slows your script down with no compensating benefits whatsoever. Instead, using <"$file" redirects from the specific input file being iterated over at present.
We're skipping files that already have .new_file.txt extensions.
We only have one output redirection (to the new_file.txt version of the file; you can't safely write to the file you're using as input in the same pipeline).
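If the goal is eventually to replace the originals, a temp-file sketch of the same pipeline (assumption: once you've verified the output, the originals are no longer needed):
for file in *.txt; do
tr -cd '[:alnum:]\n\r' <"$file" | tr '[:upper:]' '[:lower:]' > "$file.tmp" && mv "$file.tmp" "$file"
done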
Using GNU sed:
sed -i 's/[^[:alnum:]\n\r]//g;s/./\l&/g' *.txt
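To see what that one-liner does, here is a hypothetical config line run through it (without -i, so nothing is modified):
printf 'Interface FastEthernet0/1 !\n' | sed 's/[^[:alnum:]\n\r]//g;s/./\l&/g'
# prints: interfacefastethernet01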

Deleting first n rows and column x from multiple files using Bash script

I am aware that the "deleting n rows" and "deleting column x" questions have both been answered individually before. My current problem is that I'm writing my first bash script, and am having trouble making that script work the way I want it to.
file0001.csv (there are several hundred files like these in one folder)
Data number of lines 540
No.,Profile,Unit
1,1027.84,µm
2,1027.92,µm
3,1028,µm
4,1028.81,µm
Desired output
1,1027.84
2,1027.92
3,1028
4,1028.81
I am able to use sed and cut individually but for some reason the following bash script doesn't take cut into account. It also gives me an error "sed: can't read ls: No such file or directory", yet sed is successful and the output is saved to the original files.
sem2csv.sh
for files in 'ls *.csv' #list of all .csv files
do
sed '1,2d' -i $files | cut -f '1-2' -d ','
done
Actual output:
1,1027.84,µm
2,1027.92,µm
3,1028,µm
4,1028.81,µm
I know there may be awk one-liners but I would really like to understand why this particular bash script isn't running as intended. What am I missing?
The -i option of sed modifies the file in place. Your pipeline to cut receives no input because sed -i produces no output. Without this option, sed would write the results to standard output, instead of back to the file, and then your pipeline would work; but then you would have to take care of writing the results back to the original file yourself.
Moreover, single quotes inhibit expansion -- you are "looping" over the single literal string ls *.csv. The fact that you are not quoting it properly then causes the string to be subject to wildcard expansion inside the loop. So after variable interpolation, your sed command expands to
sed -i 1,2d ls *.csv
and then the shell expands *.csv because it is not quoted. (You should have been receiving a warning that there is no file named ls in the current directory, too.) You probably attempted to copy an example which used backticks (ASCII 96) instead of single quotes (ASCII 39) -- the difference is quite significant.
Anyway, the ls is useless -- the proper idiom is
for files in *.csv; do
sed '1,2d' "$files" ... # the double quotes here are important
done
Mixing sed and cut is usually not a good idea because you can express anything cut can do in terms of a simple sed script. So your entire script could be
for f in *.csv; do
sed -i -e '1,2d' -e 's/,[^,]*$//' "$f"
done
which says to remove the last comma and everything after it. (If your sed does not like multiple -e options, try with a semicolon separator: sed -i '1,2d;s/,[^,]*$//' "$f")
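If you do want to keep cut, the pipeline has to write to a temporary file and then replace the original, since you can't redirect back onto the file you're reading. A sketch:
for f in *.csv; do
sed '1,2d' "$f" | cut -d ',' -f 1-2 > "$f.tmp" && mv "$f.tmp" "$f"
done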
You may use awk,
$ awk 'NR>2{sub(/,[^,]*$/,"",$0);print}' file
1,1027.84
2,1027.92
3,1028
4,1028.81
or
sed -i '1,2d;s/,[^,]*$//' file
1,2d; for deleting the first two lines.
s/,[^,]*$// removes the last comma part in remaining lines.

Reading a file line by line in ksh

We use a package called Autosys and there are some commands specific to this package. I have a list of variables which I'd like to pass, one by one, to one of the Autosys commands.
For example, one such variable is var1; using var1 I would like to launch a command something like this:
autosys_showJobHistory.sh var1
Now when I launch the command written below, it gives me the desired output.
echo "var1" | while read line; do autosys_showJobHistory.sh $line | grep 1[1..6]:[0..9][0..9] | grep 24.12.2012 | tail -1 ; done
But if I put var1 in a file, say Test.txt, and launch the same command using cat, it gives me nothing. I have the impression that the autosys_showJobHistory.sh command does not work in that case.
cat Test.txt | while read line; do autosys_showJobHistory.sh $line | grep 1[1..6]:[0..9][0..9] | grep 24.12.2012 | tail -1 ; done
What am I doing wrong in the second command?
Wrote all of below, and then noticed your grep statement.
Recall that ksh doesn't support .. as an indicator for 'expand this range of values' (I assume that's your intent). It's also made ambiguous by your unquoted arguments to grep: if you were using syntax that the shell would convert, then you wouldn't really know what regexp is being sent to grep. It's always better to quote arguments unless you know for sure that you need the unquoted values. Try rewriting as
grep '1[1-6]:[0-9][0-9]' | grep '24.12.2012'
Also, are you deliberately using the 'match any char' operator '.', or do you want to match only a period char? If you want to match only a period, then you need to escape it, like \.
Finally, if any of the files you're processing were created on a Windows machine and then transferred to Unix/Linux, it is very likely that the line endings (Ctrl-M Ctrl-J, i.e. \r\n) are causing you problems. Clean up your PC-based files (or anything that was sent via ftp) with dos2unix file [file2 ...].
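If dos2unix isn't available, plain tr can do the same cleanup (a sketch; it assumes there are no carriage returns you want to keep):
tr -d '\r' < Test.txt > Test.txt.unix && mv Test.txt.unix Test.txt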
If the above doesn't help, you'll have to "divide and conquer" to debug your problem.
When I did the following tests, I got the expected output
$ echo "var1" | while read line ; do print "line=${line}" ; done
line=var1
$ vi Test.txt
$ cat Test.txt
var1
$ cat Test.txt | while read line ; do print "line=${line}" ; done
line=var1
Unrelated to your question, but certain to cause comment, is your use of the cat command in this context, which will bring you the UUOC (Useless Use Of Cat) award. It can be rewritten as
while read line ; do print "line=${line}" ; done < Test.txt
But to solve your problem, now turn on the shell debugging/trace options, either by changing the top line of the script (the shebang line) like
#!/bin/ksh -vx
Or by using a matched pair of set commands to turn tracing on and off around just these lines, i.e.
set -vx
while read line; do
print -u2 -- "#dbg: Line=${line}XX"
autosys_showJobHistory.sh $line \
| grep '1[1-6]:[0-9][0-9]' \
| grep '24.12.2012' \
| tail -1
done < Test.txt
set +vx
I've added an extra debug step, the print -u2 -- .... (-u2 = stderr; -- ends option processing for print)
Now you can make sure no extra space or tab chars are creeping in, by looking at that output.
They shouldn't matter, as you have left your $line unquoted. As part of your testing, I'd recommend quoting it like "${line}".
Then I'd comment out the tail and the grep lines. You want to see which step is causing this to break, right? So does the autosys script by itself still produce the intermediate output you're expecting? Then does autosys plus one grep produce output as expected, plus two greps, plus the tail? You should be able to easily see where you're losing your output.
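In other words, bisect the pipeline stage by stage, redirecting each stage to a file you can inspect, something like:
autosys_showJobHistory.sh "${line}" > step1.out
grep '1[1-6]:[0-9][0-9]' step1.out > step2.out
grep '24.12.2012' step2.out | tail -1
Whichever stepN.out first comes up empty points at the stage that's eating your output.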
IHTH

Trying to write a script to clean <script.aa=([].slice+'hjkbghkj') from multiple htm files, recursively

I am trying to modify a bash script to remove a glob of malicious code from a large number of files.
The community will benefit from this, so here it is:
#!/bin/bash
grep -r -l 'var createDocumentFragm' /home/user/Desktop/infected_site/* > /home/user/Desktop/filelist.txt
for i in $(cat /home/user/Desktop/filelist.txt)
do
cp -f $i $i.bak
done
for i in $(cat /home/user/Desktop/filelist.txt)
do
$i | sed 's/createDocumentFragm.*//g' > $i.awk
awk '/<\/SCRIPT>/{p=1;print}/<\/script>/{p=0}!p'
This is where the script bombs out with this message:
+ for i in '$(cat /home/user/Desktop/filelist.txt)'
+ sed 's/createDocumentFragm.*//g'
+ /home/user/Desktop/infected_site/index.htm
I get 2 errors and the script stops.
/home/user/Desktop/infected_site/index.htm: line 1: syntax error near unexpected token `<'
/home/user/Desktop/infected_site/index.htm: line 1: `<html><head><script>(function (){ '
I have the first 2 parts done.
The files containing createDocumentfragm have been enumerated in a text file correctly.
The files in the textfile.txt have been duplicated, in their original location with a .bak added to them IE: infected_site/some_directory/infected_file.htm and infected_file.htm.bak
effectively making sure we have a backup.
All I need to do now is write an awk command that will use the list of files in filelist.txt, use the entire glob of malicious text as a pattern, and remove it from the files, using just the uppercase SCRIPT as the starting point, since the lowercase script is too generic and could delete legitimate text.
I suspect this may help me, but I don't know how to use it correctly.
http://backreference.org/2010/03/13/safely-escape-variables-in-awk/
Once I have this part figured out, and after you have verified that the files weren't mangled, you can do this to clean out the .bak files:
for i in $(cat /home/user/Desktop/filelist.txt)
do
rm -f $i.bak
done
Several things:
You have:
$i | sed 's/var createDocumentFragm.*//g' > $i.awk
You probably meant this (keeping your use of cat, which we'll talk about in a moment):
cat $i | sed 's/var createDocumentFragm.*//g' > $i.awk
You're treating each file in your file list as if it was a command and not a file.
Now, about your use of cat. If you're using cat for almost anything but concatenating multiple files together, you probably are doing something not quite right. For example, you could have done this:
sed 's/var createDocumentFragm.*//g' "$i" > $i.awk
I'm also a bit confused about the awk statement. Exactly what file are you using awk on? Your awk statement is using STDIN and STDOUT, so it's reading file names from the for loop and then printing the output on the screen. Is the sed statement supposed to feed into the awk statement?
Note that I don't have to print out my file to STDOUT, then pipe that into sed. The sed command can take the file name directly.
You also want to avoid for loops over a list of files. That is very inefficient, and can cause problems with the command line getting overloaded. Not a big issue today, but can affect you when you least suspect it. What happens is that your $(cat /home/user/Desktop/filelist.txt) must execute first before the for loop can even start.
A little rewriting of your program:
cd ~/Desktop
grep -r -l 'var createDocumentFragm' infected_site/* > filelist.txt
while read file
do
cp -f "$file" "$file.bak"
sed 's/var createDocumentFragm.*//g' "$file" > "$file.awk"
awk '/<\/SCRIPT>/{p=1;print}/<\/script>/{p=0}!p'
done < filelist.txt
We can use one loop, and we made it a while loop. I could even feed the grep into that while loop:
grep -r -l 'var createDocumentFragm' infected_site/* | while read file
do
cp -f "$file" "$file.bak"
sed 's/var createDocumentFragm.*//g' "$file" > "$file.awk"
awk '/<\/SCRIPT>/{p=1;print}/<\/script>/{p=0}!p'
done
and then I don't even have to create a temporary file.
Let me know what's going on with the awk. I suspect you wanted something like this:
grep -r -l 'var createDocumentFragm' infected_site/* | while read file
do
cp -f "$file" "$file.bak"
sed 's/var createDocumentFragm.*//g' "$file" \
| awk '/<\/SCRIPT>/{p=1;print}/<\/script>/{p=0}!p' > "$file.awk"
done
Also note I put quotes around file names. This helps prevent problems if file name has a space in it.
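If the file names might contain newlines as well as spaces, a null-delimited variant is safer (a sketch using GNU grep's -Z/--null option and bash's read -d ''):
grep -rlZ 'var createDocumentFragm' infected_site/* | while IFS= read -r -d '' file
do
cp -f "$file" "$file.bak"
sed 's/var createDocumentFragm.*//g' "$file" > "$file.awk"
done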
