Modifying data using awk


In a long file I'm searching for something like this:
c 0.5p_f
10 px 2
I need to modify the 3rd column of the line after the 'c 0.5p_f' marker.
It's part of a bash script, and I would like to avoid awk scripts and use only bash commands.

Why not use awk? It's perfect.
do_modify{$3="modify";do_modify=0}/c 0\.5p_f/{do_modify=1}1
If you can use sed scripts,
/c 0\.5p_f/{n;s/\([^[:space:]]*[[:space:]]\+[^[:space:]]*[[:space:]]\+\)\S*/\1modify/}
would do. Not that pure Bash is hard either, though.
do_modify=
while read -r line; do
    if [[ -n ${do_modify} ]]; then
        columns=(${line})
        columns[2]=modified
        line=${columns[*]}
        do_modify=
    fi
    printf '%s\n' "${line}"
    if [[ ${line} = *'c 0.5p_f'* ]]; then
        do_modify=1
    fi
done
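
A quick way to check the awk one-liner above against the sample from the question is sketched below. The function name modify_after_marker and the replacement value 99 are made up for illustration; the awk program is the same as above, with the replacement passed in through -v.

```shell
# Hypothetical helper: replace the 3rd column of the line that follows the marker.
# $1 is the replacement text; input is read from stdin.
modify_after_marker() {
    awk -v repl="$1" '
        do_modify   { $3 = repl; do_modify = 0 }   # line right after the marker
        /c 0\.5p_f/ { do_modify = 1 }              # remember that the marker was seen
        1                                          # print every line
    '
}

printf '%s\n' 'a b c' 'c 0.5p_f' '10 px 2' 'x y z' | modify_after_marker 99
```

Only the line after the marker should change, becoming 10 px 99. Note that assigning to $3 makes awk rebuild the line with single spaces between fields.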

Related

Some tips to improve a bash script for count fastq files

Hi guys, I have this bash one-liner that I would like to turn into a script:
for i in `ls *.fastq.gz`; do echo $(zcat ${i} | wc -l)/4|bc; done
I would like the script to read from a data dir and print out the result with the name of the file.
I tried putting the dir in front, as in 'data/*.fastq.gz', but got an error: No such dir exist...
I would like something like this:
name1.fastq.gz 1898516
name2.fastq.gz 2467421
namen.fastq.gz 1234532
I am not experienced in bash.
Could you guys help?
Thanks

Take the dir as an argument, but default to the current dir if it's not set.
dir="${1-.}"
Then put it in the glob: "$dir"/*.fastq.gz
As well:
Quote variables and command expansions.
Don't parse ls.
Don't trust echo with arbitrary data (filenames). Use printf instead.
Use an end-of-options flag -- when giving filenames to commands.
I prefer not to have any inline command expansions, but that's just personal preference.
Putting it together:
#!/bin/bash
dir="${1-.}"
for file in "$dir"/*.fastq.gz; do
    printf '%s ' "$file"
    lines="$(zcat -- "$file" | wc -l)"
    bc <<< "$lines/4" # Using a here-string (Bash feature)
done

There is no need to shell out to bc for integer math (dividing by 4), or to use ls to enumerate the files. The original version will do with minor changes:
#!/bin/bash
dir="${1-.}"
for i in "$dir"/*.fastq.gz; do
    lines=$(zcat "${i}" | wc -l)
    printf '%s %d\n' "$i" "$((lines/4))"
done
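
The two scripts above can also be wrapped in a function, sketched here under a few assumptions: the name count_reads is hypothetical, gzip -dc stands in for zcat (the two are equivalent for .gz files), the file name is printed without its directory to match the output asked for in the question, and each record is taken to be four lines as in the original one-liner.

```shell
# Hypothetical helper: print "<file name> <record count>" for every *.fastq.gz in a directory.
# $1 is the directory; it defaults to the current one.
count_reads() {
    dir=${1-.}
    for file in "$dir"/*.fastq.gz; do
        # If nothing matched, the pattern is left unexpanded; skip it.
        [ -e "$file" ] || continue
        lines=$(gzip -dc -- "$file" | wc -l)
        # Shell arithmetic is enough for integer division by 4.
        printf '%s %d\n' "${file##*/}" "$((lines / 4))"
    done
}
```

It would be called as count_reads data, or with no argument for the current directory.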

Why does my variable set in a do loop disappear? (unix shell)

This part of my script is comparing each line of a file to find a preset string. If the string does NOT exist as a line in the file, it should append it to the end of the file.
STRING=foobar
cat "$FILE" | while read LINE
do
    if [ "$STRING" == "$LINE" ]; then
        export ISLINEINFILE="yes"
    fi
done
if [ ! "$ISLINEINFILE" == yes ]; then
    echo "$LINE" >> "$FILE"
fi
However, it appears that both $LINE and $ISLINEINFILE are cleared upon finishing the do loop. How can I avoid this?

Using shell
If we want to make just the minimal change to your code to get it working, all we need to do is switch the input redirection:
string=foobar
while read line
do
    if [ "$string" == "$line" ]; then
        islineinfile="yes"
    fi
done <"$file"
if [ ! "$islineinfile" == yes ]; then
    echo "$string" >> "$file"
fi
In the above, we changed cat "$file" | while do ...done to while do...done<"$file". With this one change, the while loop is no longer in a subshell and, consequently, shell variables created in the loop live on after the loop completes.
Using sed
I believe that the whole of your script can be replaced with:
sed -i.bak '/^foobar$/H; ${x;s/././;x;t; s/$/\nfoobar/}' file*
The above adds line foobar to the end of each file that doesn't already have a line that matches ^foobar$.
The above shows file* as the final argument to sed. This will apply the change to all files matching the glob. You could list specific files individually if you prefer.
The above was tested on GNU sed (linux). Minor modifications may be needed for BSD/OSX sed.
Using GNU awk (gawk)
awk -i inplace -v s="foobar" '$0==s{f=1} {print} ENDFILE{if (f==0) print s; f=0}' file*
Like the sed command, this can tackle multiple files all in one command.

Why does my variable set in a do loop disappear?
It disappears because it is set in a shell pipeline component. Most shells run each part of a pipeline in a subshell. By Unix design, variables set in a subshell cannot affect their parent or any already running other shell.
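
The effect can be seen in a small experiment, sketched here with made-up variable names (count, piped, redirected). The first loop is the last part of a pipeline, the second reads from a here-document and so runs in the current shell.

```shell
# Loop as the last component of a pipeline: in bash with default settings
# (and in dash) it runs in a subshell, so the increments are lost.
count=0
printf '%s\n' a b c | while read -r _; do count=$((count + 1)); done
piped=$count

# Same loop fed by a redirection: no pipeline, no subshell.
count=0
while read -r _; do count=$((count + 1)); done <<EOF
a
b
c
EOF
redirected=$count

echo "piped=$piped redirected=$redirected"
# bash with default settings prints: piped=0 redirected=3
```

Shells that run the last pipeline component in the current shell (ksh, or bash with lastpipe in a script) would report 3 for both.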
How can I avoid this?
There are several ways:
The simplest is to use a shell that doesn't run the last component of a pipeline in a subshell. This is the default behavior of ksh, so one option is to use this shebang:
#!/bin/ksh
Bash can behave the same way when the lastpipe option is set (it only takes effect when job control is off, which is the default in scripts):
shopt -s lastpipe
You might use the variable in the same subshell that set it. Note that your original script indentation is wrong and might lead to the incorrect assumption that the if block is inside the pipeline, which isn't the case. Enclosing the whole block with parentheses will rectify that and would be the minimal change (two extra characters) to make it working:
STRING=foobar
cat "$FILE" | ( while read LINE
    do
        if [ "$STRING" == "$LINE" ]; then
            export ISLINEINFILE="yes"
        fi
    done
    if [ ! "$ISLINEINFILE" == yes ]; then
        # $LINE is empty once the loop has ended, so append $STRING
        echo "$STRING" >> "$FILE"
    fi
)
The variable would still be lost after that block though.
You might simply avoid the pipeline, which is straightforward in your case, the cat being unnecessary:
STRING=foobar
while read LINE
do
    if [ "$STRING" == "$LINE" ]; then
        export ISLINEINFILE="yes"
    fi
done < "$FILE"
if [ ! "$ISLINEINFILE" == yes ]; then
    # $LINE is empty once the loop has ended, so append $STRING
    echo "$STRING" >> "$FILE"
fi
You might use another algorithmic approach, like using sed or gawk as suggested by John1024.
See also https://unix.stackexchange.com/a/144137/2594 for standard compliance details.
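
One more variant, not taken from the answers above, lets grep do the searching so that no loop or flag variable is needed. This is only a sketch: append_if_missing is a hypothetical name, and it relies on the standard grep options -F (fixed string), -x (whole-line match) and -q (quiet).

```shell
# Hypothetical helper: append line $1 to file $2 unless an identical line is already there.
append_if_missing() {
    grep -Fxq -- "$1" "$2" || printf '%s\n' "$1" >> "$2"
}
```

It would be used as append_if_missing foobar "$FILE". Calling it twice in a row adds the line only once.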

Command not acting properly in script

The command:
value=${value%?}
will remove the last character from a variable.
Is there any logical reason why it would not work from within a script?
In my script it has no effect whatsoever.
if [[ $line =~ "What I want" ]]
then
    if [[ $CURRENT -eq 3 ]]
    then
        echo "line is " $line
        value=`echo "$line" | awk '{print $4}'`
        echo "value = "$value
        value=${value%?}
        echo "value = $value "
        break
    fi
fi
I can't post the whole script, but this is the piece I refer to. The loop is being entered properly, but the 2 echo $value lines return the same thing.
Edit - this question still stands. The code works fine line by line in a terminal, but all together in a script it fails.

Echo adds an extra line character to $value in this line:
value=`echo "$line" | awk '{print $4}'`
And afaik that extra char is removed with %?, so it seems it does not change anything at all.
Try echo -n instead, which does not add \n to the string:
value=`echo -n "$line" | awk '{print $4}'`

Since you have provided only the relevant part in the code and not the whole file, I'm going to assume that the first line of your file reads `#!/bin/sh'. This is your problem. What you are trying to do (parameter expansion) is specific to bash, so unless /bin/sh points to bash via a symlink, then you are running the script in a shell which does not understand bash parameter expansion.
To see what /bin/sh really is you can do: ls -l /bin/sh. And to remedy the situation, force the script to run in bash by changing the `shebang' at the top to read `#!/bin/bash'
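
As a sanity check, the expansion from the question can be tried in isolation. The sample value below is invented for illustration.

```shell
# Sample value with a trailing character to strip (illustrative only).
value="12345,"
# %? removes the shortest suffix matching the pattern ?, i.e. exactly one character.
trimmed=${value%?}
printf 'before: %s\nafter:  %s\n' "$value" "$trimmed"
```

If this behaves as expected when run by the same interpreter as the script, the cause is more likely in what $value actually holds at that point than in the expansion itself.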

bash sed fail in while loop

#!/bin/bash
fname=$2
rname=$1
echo "$(<$fname)" | while read line ; do
    result=`echo "$(<$rname)" | grep "$line"; echo $?`
    if [ $result != 0 ]
    then
        sed '/$line/d' $fname > newkas
    fi 2> /dev/null
done
Hi all, I am new to bash.
I have two lists, one older than the other. I wish to compare the names in 'fname' against 'rname'. 'result' is the standard output which I will get if the name is still available in 'rname'. If it is not, then I will get the non-zero output.
I then use sed to delete that line and redirect the output to a new file.
I have tried the code part by part and it works until I add in the while loop. sed doesn't seem to work, as the final output 'newkas' is the same as the initial input 'fname'.
Is my method wrong or did I miss any parts?

Part 1: What's wrong
The reason your sed expression "doesn't work" is that you used single quotes. You said
sed '/$line/d' $fname > newkas
Supposing fname='input.txt' and line='example text' this will expand to:
sed '/$line/d' input.txt > newkas
Note that $line is still literally present. This is because bash will not interpolate variables inside single quotes, thus sed sees the $ literally.
You could fix this by saying
sed "/$line/d" $fname > newkas
Because inside double quotes the variable will expand. However, if your sed expression becomes more complicated you could run into difficulty in cases where bash interprets things which you intended to be interpreted by sed. I tend to use the form
sed '/'"$line"'/d' $fname > newkas
Which is a bit harder to read but, if you look carefully, single-quotes everything I intend to be part of the sed expression and double quotes the variable I want to expand.
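
The difference between the two quoting styles can be checked with a couple of sample lines. This is a minimal sketch; the variable names input, single and double and the sample text are invented for illustration.

```shell
line='example text'
input=$(printf '%s\n' 'keep me' 'some example text here')

# Single quotes: sed receives the characters $line unchanged, which match nothing here.
single=$(printf '%s\n' "$input" | sed '/$line/d')

# Double quotes: the shell substitutes the value of $line before sed runs.
double=$(printf '%s\n' "$input" | sed "/$line/d")

printf '%s\n' '--- single quotes:' "$single" '--- double quotes:' "$double"
```

With single quotes both input lines survive; with double quotes only keep me is left.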
Part 2: How to improve it
Your script contains a number of things which could be improved.
echo "$(<$fname)" | while read line ; do
    :
done
In the first place you're reading the file with "$(<$fname)" when you could just redirect the stdin of the while loop. This is a bit redundant, but more importantly you're piping to while, which creates an extra subshell and means you can't modify any variables from the enclosing scope. Better to say
while IFS= read -r line ; do
    :
done < "$fname"
Next, consider your grep
echo "$(<$rname)" | grep "$line"
Again you're reading the file and echoing it to grep. But, grep can read files directly.
grep "$line" "$rname"
Afterwards you echo the return code and check its value in an if statement, which is a classic useless construct.
result=$( grep "$line" "$rname" ; echo $?)
Instead you can just pass grep directly to if, which will test its return code.
if grep -q "$line" "$rname" ; then
    sed "/$line/d" "$fname" > newkas
fi
Note here that I have quoted $fname, which is important if it might ever contain a space. I have also added -q to grep, which suppresses its output.
There's now no need to suppress error messages from the if statement, here, because we don't have to worry about $result containing an unusual value or grep not returning properly.
The final result is this script
while IFS= read -r line ; do
    if grep -q "$line" "$rname" ; then
        sed "/$line/d" "$fname" > newkas
    fi
done < "$fname"
Which will not work, because newkas is overwritten on every loop. This means that in the end only the last line in $fname was used. Instead you could say:
cp "$fname" newkas
while IFS= read -r line ; do
    if grep -q "$line" "$rname" ; then
        sed -i '' "/$line/d" newkas
    fi
done < "$fname"
Which, I believe, will do what you expect. Note that sed -i '' is the BSD/macOS spelling; GNU sed takes -i with no separate argument.
Part 3: But don't do that
But this is all tangential to solving your actual problem. It appears to me that you simply want to create a file newkas which contains all the lines of $fname except those that appear in $rname. This is easily done with the comm utility:
comm -2 -3 <(sort "$fname") <(sort "$rname") > newkas
This also changes the sort order of the lines, which may not be good for you. If you want to do it without changing the ordering then using the method @fge suggests is best.
grep -F -v -x -f "$rname" "$fname"
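
The ordering difference between the two commands can be seen on a tiny example. This is a sketch with invented file contents; it uses temporary sorted files instead of process substitution so that it also runs in a plain POSIX shell, and it forces the C locale so that sort and comm agree on collation.

```shell
tmp=$(mktemp -d)
printf '%s\n' cherry apple banana > "$tmp/fname"
printf '%s\n' apple > "$tmp/rname"

# comm needs sorted input, so the result comes out in sorted order.
LC_ALL=C sort "$tmp/fname" > "$tmp/fname.sorted"
LC_ALL=C sort "$tmp/rname" > "$tmp/rname.sorted"
with_comm=$(LC_ALL=C comm -2 -3 "$tmp/fname.sorted" "$tmp/rname.sorted")

# grep keeps the original order of $fname.
with_grep=$(grep -F -v -x -f "$tmp/rname" "$tmp/fname")

printf 'comm:\n%s\ngrep:\n%s\n' "$with_comm" "$with_grep"
rm -rf "$tmp"
```

Both drop apple; comm lists banana before cherry, while grep keeps cherry first as in the input.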

If I understand your need correctly, you want a file newkas which contains the lines in $fname which are also in $rname.
If this is what you want, using sed is overkill. Use fgrep:
fgrep -x -f $fname $rname > newkas
Also, there are problems with your script:
you capture the output of grep in result, which means it will never be exactly 0; what you want is to execute the command and simply check $?
your echoes are convoluted, just do grep whatever thefilename, or while...done <thefile;
finally, you take the line as is from the source file: the line can potentially be a regex, which means you will try and match a regex in $rname, which may lead to unexpected results.
And others.

bash search for string in each line of file

I'm trying what seems like a very simple task: use bash to search a file for strings, and if they exist, output those to another file. It could be jetlag, but this should work:
#!/bin/bash
cnty=CNTRY
for line in $(cat wheatvrice.csv); do
    if [[ $line = *$cnty* ]]
    then
        echo $line >> wr_imp.csv
    fi
done
I also tried this for completeness:
#!/bin/bash
cnty=CNTRY
for line in $(cat wheatvrice.csv); do
    case $line in
        *"$cnty"*) echo $line >> wr_imp.csv;;
        *) echo "no";;
    esac
done
both output everything, regardless of whether the line contains CNTRY or not, and I'm copy/pasting from seemingly reliable sources, so apparently there's something simple about bash-ness that I'm missing?

Don't use bash, use grep.
grep -F "$cnty" wheatvrice.csv >> wr_imp.csv

While I would suggest simply using grep too, the question remains why your approach didn't work. Here is a self-referential modification of your second approach, with the keyword 'bash' so that the script matches itself:
#!/bin/bash
cnty=bash
while read -r line
do
    case $line in
        *${cnty}*)
            echo $line " yes" >> bashgrep.log
            ;;
        *)
            echo "no"
            ;;
    esac
done < bashgrep.sh
The key point is while read -r line ... < FILE. Your command with cat involves word splitting, so the loop processes every single word, not every line.
Example 1 has the same problem.
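
The difference is easy to see by counting loop iterations. The sketch below uses invented sample data and variable names (data, words, lines) and assumes the default IFS.

```shell
data=$(printf '%s\n' 'CNTRY one two' 'other line')

# Unquoted expansion in a for loop: the text is split on whitespace,
# so the loop body runs once per word.
words=0
for w in $data; do words=$((words + 1)); done

# read consumes one line per iteration.
lines=0
while IFS= read -r l; do lines=$((lines + 1)); done <<EOF
$data
EOF

echo "for loop iterations: $words, while-read iterations: $lines"
```

Here the for loop runs five times and the while-read loop twice, which is why the original scripts tested single words instead of whole lines.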
