Shell Scripting - awk option help required [duplicate]

This question already has answers here:
Shell Scripting -Help needed [closed]
(2 answers)
Closed 8 years ago.
The script should read each file in the path and replace a string in every row. How do I create a temp file and mv it over the original while iterating over 10 different input files in the same path?
Please advise.
SunOS 5.10
FILES=/export/home/*.txt
for f in $FILES
do
echo "Processing $f file..."
cat $f | awk 'BEGIN {FS="|"; OFS="|"} {$8=substr($8, 1, 6)"XXXXXXXXXX\""; print}'
done
input file:
"2013-04-30"|"X"|"0000628"|"15000231"|"1999-12-05"|"ST"|"2455525445552000"|"1111-11-11"|75.00|"XXE11111"|"224425"
"2013-04-30"|"Y"|"0000928"|"95000232"|"1999-12-05"|"VT"|"2455525445552000"|"1111-11-11"|95.00|"VVE11111"|"224425"
output file:
"2013-04-30"|"X"|"0000628"|"15000231"|"1999-12-05"|"ST"|"245552xxxxxxxxxx"|"1111-11-11"|75.00|"XXE11111"|"224425"
"2013-04-30"|"Y"|"0000928"|"95000232"|"1999-12-05"|"VT"|"245552xxxxxxxxxx"|"1111-11-11"|95.00|"VVE11111"|"224425"
Not sure how to use this:
cat $f | awk 'BEGIN {FS="|"; OFS="|"} {$8=substr($8, 1, 6)"XXXXXXXXXX\""; print}' $f > tmp.txt && mv tmp.txt $f

To achieve what appears to be your desired end result you can use ed instead of awk.
FILES=/export/home/*.txt
for f in $FILES; do
echo "Processing ${f} file..."
ed "${f}" <<EOF
% s/\([0-9]\{6\}\)[0-9]\{10\}/\1xxxxxxxxxx/
w
q
EOF
done
This requires fewer steps (i.e., command calls) because you're not creating a temp file and moving it. Instead you're editing the file in place with the desired changes and then closing it.
% means "operate on every line" (don't worry about lines that don't match)
s means "perform a substitution" -- /[pattern]/[replacement]/
w means write
q means quit
EOF closes out the "here document"
Hope that helps.
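If you'd rather stay with awk, the temp-file-and-mv pattern from the question also works once the redundant cat and the doubled file argument are removed. A minimal sketch, untested on Solaris; the field number and substring offsets here are guesses from the sample data, where the masked card number is field 7 with its quotes included:
FILES=/export/home/*.txt
for f in $FILES; do
echo "Processing $f file..."
# On SunOS 5.10 you may need nawk or /usr/xpg4/bin/awk instead of awk.
awk 'BEGIN {FS = OFS = "|"} {$7 = "\"" substr($7, 2, 6) "xxxxxxxxxx\""; print}' "$f" > "$f.tmp" &&
mv "$f.tmp" "$f"
done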
Edit Note: Charles Duffy pointed out that ed ${f} would fail on file names with spaces in them and ed "${f}" would not suffer from that particular deficiency. This is true. It's also the case, however, that the for loop above would likely split on any spaces in the file names. You can set IFS (IFS='\n') to get around this limitation on KSH, BASH, MKSH, ASH, and DASH. In ZSH (depending on your version) you may need to set SH_WORD_SPLIT. As an alternative you can change from a for loop to a while loop with read:
FILES=/export/home/*.txt
ls ${FILES} | while read f; do
echo "Processing ${f} file..."
ed "${f}" <<-EOF
% s/\([0-9]\{6\}\)[0-9]\{10\}/\1xxxxxxxxxx/
w
q
EOF
done
Edit Note: My erroneous statements above stricken but kept for historical purposes. See comments from Charles Duffy (below) for clarification.

Related

Some tips to improve a bash script for count fastq files

Hi guys, I got this bash one-liner that I wish to make into a script:
for i in `ls *.fastq.gz`; do echo $(zcat ${i} | wc -l)/4|bc; done
I would like to make it a script that reads from a data dir and prints out the result with the name of the file.
I tried to put the dir in front ('data/*.fastq.gz') but got an error: no such dir exists...
I would like some like this:
name1.fastq.gz 1898516
name2.fastq.gz 2467421
namen.fastq.gz 1234532
I am not experienced in bash.
Could you guys help me out?
Thanks
Take the dir as an argument, but default to the current dir if it's not set.
dir="${1-.}"
Then put it in the glob: "$dir"/*.fastq.gz
As well:
Quote variables and command expansions.
Don't parse ls.
Don't trust echo with arbitrary data (filenames). Use printf instead.
Use an end-of-options flag -- when giving filenames to commands.
I prefer not to have any inline command expansions, but that's just personal preference.
Putting it together:
#!/bin/bash
dir="${1-.}"
for file in "$dir"/*.fastq.gz; do
printf '%s ' "$file"
lines="$(zcat -- "$file" | wc -l)"
bc <<< "$lines/4" # Using a here-string (Bash feature)
done
There is no need to shell out to bc for integer math (dividing by 4), or to use ls to enumerate the files. The original version will do with minor changes:
#!/bin/bash
dir="${1-.}"
for i in "$dir"/*.fastq.gz; do
lines=$(zcat "${i}" | wc -l)
printf '%s %d\n' "$i" "$((lines/4))"
done
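Assuming the script is saved as count_fastq.sh (a name made up here) and the files live under data/, a run would look like:
$ bash count_fastq.sh data
data/name1.fastq.gz 1898516
data/name2.fastq.gz 2467421
The dir prefix appears in each name because the loop variable holds the whole glob match, directory included.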

Reading and writing line by line in a bash script

After searching online I was able to figure out how to read a file line by line:
while read p; do
echo $p
done < file.txt
But I would actually like to modify the line in the file.
For example:
while read p; do
if condition
then
echo $p | perl -i -pe 's/a/b/'
fi
done < file.txt
However this doesn't actually modify the file.
Update: A far better version of the bash code has been added. Thanks to Charles Duffy for comments.
Your Perl one-liner takes a line piped into it by echo $p |, getting its standard input that way. It doesn't do anything with the file itself, so the -i flag has no effect. The -p makes it print to the standard output stream. So that whole line, echo ..., doesn't touch the file.
You can redirect the output to a new file and then move that to overwrite file.txt. Here is a simple-minded example that appends each line to a new file. For better bash code see the update below.
while read p; do
if condition
then
echo $p | perl -pe 's/a/b/' >> temp_out.txt
else
echo $p >> temp_out.txt
fi
done < file.txt
mv temp_out.txt file.txt
We have to add the else branch so that unmodified lines are also appended. Note that in general we cannot replace just some lines; the whole file has to be re-written.
If this is all that the script does, you can do it with a very simple one-liner; see the end. If more work is done you can also put it all in a Perl script, but I take it that there may be other good reasons for a bash script.
Update: A much better version of the above. See read and echo under Builtins in the Bash manual.
Appending each line opens the file anew each time; there is no need for that.
Just redirect at the end of the loop, much like it is done in the terminal
read uses backslash for escaping, removing it from input. Turn that off with -r
Trailing white space is removed, as a part of breaking the line into words. Suppress this by unsetting the variable that controls which characters are used for splitting, IFS=
The echo $p can do all kinds of unintended things. A formatted print is better, printf '%s\n' "$p", or at least echo "$p"
With this,
while IFS= read -r p; do
if condition
then
echo "$p" | perl -pe 's/a/b/'
else
echo "$p"
fi
done < file.txt > temp_out.txt
mv temp_out.txt file.txt
Finally, if the sole purpose of the Perl one-liner were to run a simple substitution, it is much better to simply do that in the shell itself than to have a pipeline and run a whole new process for each line.
echo "${p//a/b}"
Thanks to Charles Duffy for raising all these points in comments.
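With that substitution in place of the pipeline, the loop needs no Perl at all. A sketch, where condition is still the placeholder from the question:
while IFS= read -r p; do
if condition
then
printf '%s\n' "${p//a/b}"
else
printf '%s\n' "$p"
fi
done < file.txt > temp_out.txt
mv temp_out.txt file.txt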
A few comments on Perl one-liners. See documentation at perlrun.
The command perl -e '...' executes any valid Perl code between the quotes. When we add the -n or -p switch it also reads standard input and executes that code on one line of it at a time, where -p also prints out each line after it's processed. The standard input can be supplied to it from a file,
perl -pe '...' input.txt
in which case adding -i flag will result in the file being changed in-place. Or, the input can be piped into it, for example
echo "input text" | perl -pe '...'
in which case the processed line is printed to standard output. This can be redirected to a file, as in the answer above.
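The difference between -n and -p is just the implicit printing; these two invocations are roughly equivalent:
perl -pe 's/a/b/' input.txt          # -p prints each line automatically
perl -ne 's/a/b/; print' input.txt   # -n leaves the printing to your code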
To make changes to a given file a line at a time, you only need this on the command line:
perl -i -pe 's/a/b/' file.txt
If there is more work to do then it may well be better to put it in a script, of course. In this case the one-liner can be a command in the bash script as well, replacing all that code above (unless some bash-specific functionality is preferred for processing lines).
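For instance, if the surrounding bash script only needs the substitution done, the whole loop above collapses to a single line inside it:
#!/bin/bash
# The read/echo loop from above reduces to the one-liner;
# -i.bak instead of -i would keep a backup of the original.
perl -i -pe 's/a/b/' file.txt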

Loop for deleting first line of Multiple Files using Bash Script

Just new to Bash scripting and programming in general. I would like to automate the deletion of the first line of multiple .data files in a directory. My script is as follows:
#!/bin/bash
for f in *.data ;
do tail -n +2 $f | echo "processing $f";
done
I get the echo message but when I cat the file nothing has changed. Any ideas?
Thanks in advance
I get the echo message but when I cat the file nothing has changed.
Because simply tailing wouldn't change the file.
You could use sed to modify the files in-place with the first line excluded. Saying
sed -i '1d' *.data
would delete the first line from all .data files.
EDIT: BSD sed (on OSX) expects an argument to -i, so you can either specify an extension to back up the original files, or pass an empty argument to edit the files in-place, say:
sed -i '' '1d' *.data
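If the script has to run under both flavors, one common heuristic is to probe for --version, which GNU sed accepts and BSD sed does not. A sketch:
# Guess the sed flavor from whether --version is accepted.
if sed --version >/dev/null 2>&1; then
sed -i '1d' *.data      # GNU sed: the suffix to -i is optional
else
sed -i '' '1d' *.data   # BSD sed: -i requires a (possibly empty) argument
fi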
You are not changing the file itself. By using tail you simply read the file and print parts of it to stdout (the terminal). You have to redirect that output to a temporary file and then overwrite the original file with the temporary one.
#!/usr/bin/env bash
for f in *.data; do
tail -n +2 "$f" > "${f}".tmp && mv "${f}".tmp "$f"
echo "Processing $f"
done
Moreover it's not clear what you'd like to achieve with the echo command. Why do you use a pipe (|) there?
sed will give you an easier way to achieve this. See devnull's answer.
I'd do it this way:
#!/usr/bin/env bash
set -eu
for f in *.data; do
echo "processing $f"
tail -n +2 "$f" | sponge "$f"
done
If you don't have sponge you can get it in the moreutils package.
The quotes around the filename are important: they make the script work with filenames containing spaces. The env line at the top is there so that people can choose which Bash interpreter they want to use via their PATH, in case someone has a non-default one. The set -eu makes Bash exit if a command fails or an unset variable is used, which is usually safer.
ed is the standard editor:
shopt -s nullglob
for f in *.data; do
echo "Processing file \`$f'"
ed -s -- "$f" < <( printf '%s\n' "1d" "wq" )
done
The shopt -s nullglob is here just because you should always use this when using globs, especially in a script: it will make globs expand to nothing if there are no matches; you don't want to run commands with uncontrolled arguments.
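A quick way to see what nullglob changes, assuming no *.data files exist in the current directory:
shopt -u nullglob
for f in *.data; do echo "got: $f"; done   # prints "got: *.data" (the literal pattern)
shopt -s nullglob
for f in *.data; do echo "got: $f"; done   # prints nothing; the glob expands to zero words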
Next, we loop on all your files, and use ed with the commands:
1: go to first line
d: delete that line
wq: write and quit
Options for ed:
-s: tells ed to shut up! we don't want ed to print its junk on our screen.
--: end of options. This makes your script much more robust: if a file name starts with a hyphen, the hyphen would otherwise confuse ed into processing it as an option. With --, ed knows that there are no more options after that and will happily process any file, even one whose name starts with a hyphen.
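To see the difference -- makes, try it on a contrived file whose name starts with a hyphen:
touch -- '-odd.data'                        # even touch needs -- for this name
printf '%s\n' 1d wq | ed -s '-odd.data'     # fails: ed parses -o as an (unknown) option
printf '%s\n' 1d wq | ed -s -- '-odd.data'  # works: -- ends option processing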

list in script shell bash

I did this script
#!/bin/bash
liste=`ls -l`
for i in $liste
do
echo $i
done
The problem is I want the script to display each result line by line, but it displays word by word:
I have:
my_name
etud
4096
Oct
8
10:13
and I want to have:
my_name etud 4096 Oct 8 10:13
The final aim of the script is to analyze each line; that is the reason I want to be able to recover the entire line. Maybe the list is not the best solution, but I don't know how else to recover the lines.
To start, we'll assume that none of your filenames ever contain newlines:
ls -l | while IFS= read -r line; do
echo "$line"
# Do whatever else you want with $line
done
If your filenames could contain newlines, things get tricky. In this case, it's better (although slower) to use stat to retrieve the desired metadata from each file individually. Consult man stat for details about how your local variety of stat works, as it is unfortunately not very standardized.
for f in *; do
line=$(stat -c "%U %n %s %y" "$f") # One possibility
# Work with $line as if it came from ls -l
done
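As an illustration of just how non-standardized stat is: -c with those format codes is the GNU form; on BSD or macOS roughly the same fields would be spelled with -f instead (an approximation, check your local man page):
line=$(stat -f "%Su %N %z %Sm" "$f")   # BSD/macOS: owner, name, size, mtime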
You can replace
echo $i
with
echo -n "$i "
echo -n outputs to the console without a trailing newline.
Another way to do it, with a while loop and without a pipe:
#!/bin/bash
while read line
do
echo "line: $line"
done < <(ls -l)
First, I hope that you aren't genuinely using ls in your real code, but only using it as an example. If you want a list of files, ls is the wrong tool; see http://mywiki.wooledge.org/ParsingLs for details.
Second, modern versions of bash have a builtin called readarray.
Try this:
readarray -t my_array < <(ls -l)
for entry in "${my_array[#]}"; do
read -a pieces <<<"$entry"
printf '<%s> ' "${pieces[#]}"; echo
done
First, it creates an array (called my_array) with all the output from the command being run.
Then, for each line in that output, it creates an array called pieces, and emits each piece with arrow brackets around them.
If you want to read a line at a time, rather than reading the entire file at once, see http://mywiki.wooledge.org/BashFAQ/001 ("How can I read a file (data stream, variable) line-by-line (and/or field-by-field)?")
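For reference, a line-at-a-time version of the same loop could look like this; the process substitution keeps the while body in the current shell:
while IFS= read -r entry; do
read -ra pieces <<<"$entry"
printf '<%s> ' "${pieces[@]}"; echo
done < <(ls -l)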
Joining the previous answers with the need to store the list of files in a variable, you can do this:
list=$(ls -l)
echo "$list" | while read -r lin
do
echo "$lin"
done

Unix shell for loop [duplicate]

This question already has answers here:
Closed 11 years ago.
Possible Duplicate:
Unix for loop help please?
I am trying to list the names of all the files in a directory, separated by a blank line. I was using a for loop, but after trying a few examples, none really worked for adding blank lines in between. Any ideas?
Also, is there any command that outputs only the first line of a file in Unix? How can I display only the first line?
for i in ls
do
echo "\n" && ls -l
done
for i in ls
do
echo "\n"
ls
done
Use head or sed 1q to display only the first line of a file. But in this case, if I'm understanding you correctly, you want to capture and modify the output of ls.
ls -l | while read f; do
printf '%s\n\n' "$f"
# alternately
echo "$f"; echo
done
IFS="
"
for i in $(ls /dir/name/here/or/not)
do
echo -e "$i\n"
done
To see the first part of a file use head and for the end of a file use tail (of course). The command head -n 1 filename will display the first line. Use man head to get more options. (I know how that sounds).
Use shell expansion instead of ls to list files.
for file in *
do
echo "$file"
echo
if [ -f "$file" ];then
read firstline < "$file"
echo "$firstline" # read first line
fi
done
