Replace a line containing variables and spaces using bash commands - bash

I have some files containing lines, some of which are similar to the one shown below:
HETATM 2340 C2 2FN 1 15.566 27.839 11.677 1.00 24.33 C
I need to replace
2FN 1
with
2FN D 1
so that the final result is:
HETATM 2340 C2 2FN D 1 15.566 27.839 11.677 1.00 24.33 C
This is rather easy using the sed command when you always have the same words to replace:
sed 's/2FN 1/2FN D 1/g' input.file > output.file
However, when one wants to use variables
A="2FN"
B="1"
in the sed command, the result is not what is expected; I suppose this is due to the multiple spaces in the text to replace.
I tried several ways, such as:
A="2FN"
B="1"
S=' '
G=$(echo "$A${S}$B")
sed 's/$G/2FN D 1/g' input.file > output.file
But the expected result was not obtained.
Interestingly, echoing the G variable gives:
"2FN 1"
but sed doesn't replace it with
"2FN D 1"
Do you have any suggestions?
Thanks

The problem is you're trying to get bash to resolve variables within single quotes. Single quotes are telling bash: "Don't resolve anything in here, take it literally as is"
If you simply replace the single quotes in your sed command with double quotes, as @oguzismail suggested, you'll be fine.
Much more detail, if needed, is here:
https://stackoverflow.com/a/13802438/236528
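For example, a minimal sketch mirroring the literal command from the question, just switching to double quotes so the shell expands the variables before sed runs (adjust the spacing in the pattern if your file really has multiple spaces between the fields):
A="2FN"
B="1"
# double quotes let the shell expand $A and $B inside the sed script
sed "s/$A $B/$A D $B/g" input.file > output.file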

Related

Shell separate line into multiple lines after every number

So I have a selection of text files, all of which are on one line.
I need a way to separate the line into multiple lines after every number.
At the minute I have something like this
a 111111b 222c 3d 444444
and I need a way to get it to this
a 111111
b 222
c 3
d 444444
I have been trying to write a gawk command with a regex, but I'm not aware of a way to get this to work. (I am fairly new to shell.)
Easy with sed.
$: cat file
a 51661b 99595c 65652d 51515
$: sed -E 's/([a-z] [0-9]+)\n*/\1\n/g' file
a 51661
b 99595
c 65652
d 51515
Pretty easy with awk.
$: awk '{ print gensub("([a-z] [0-9]+)\n*", "\\1\n", "g") }' file
a 51661
b 99595
c 65652
d 51515
Could even do with bash built-ins only...but don't...
while read -r line
do  while [[ "$line" =~ [a-z]\ [0-9]+ ]]
    do  printf "%s\n" "$BASH_REMATCH"
        line=${line#$BASH_REMATCH}
    done
done < file
a 51661
b 99595
c 65652
d 51515
You already have a good answer from Paul, but for sed an arguably more direct expression simply using the first two numbered backreferences separated by a newline would be:
sed -E 's/([0-9])([^0-9])/\1\n\2/g' file
Example Use/Output
In your case that would be:
$ echo "a 111111b 222c 3d 444444" | sed -E 's/([0-9])([^0-9])/\1\n\2/g'
a 111111
b 222
c 3
d 444444

sed replace text with backslash \ and curved brackets {} using bash variable

I have a line of LaTeX source code which I want to replace.
The problem is, it contains curly brackets and a backslash.
Furthermore, I would like to replace it with a bash variable.
Before: In case0.tex, I have this line:
\title{Analysis Case 0}
I want to change the title inside the curly brackets to a string contained in a bash variable called $CASE.
This is what I tried, however I am not sure how to treat this special case with sed.
CASE="Analysis Case 1"
sed -e "s/ \title{Analysis Case 0} / \title{ $CASE } /g" ./case0.tex > ./case1.tex
After: In case1.tex I would like to get this line.
\title{Analysis Case 1}
It would be nice if someone could tell me how to do that!
Using sed
$ case="Analysis Case 1"
$ sed s"/\({\)[^}]*/\1$case/" case0.tex > case1.tex
$ cat case1.tex
\title{Analysis Case 1}
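One caveat, as a hedged aside: if the value of $case could itself contain a / (or an &, which is special in the replacement text), the s/// command would break. A small variation using | as the delimiter, with a hypothetical slash-containing title:
case="Analysis/Case 1"
# '|' as the s-command delimiter avoids clashing with '/' in $case; a literal '&' would still need escaping
sed "s|\({\)[^}]*|\1$case|" case0.tex > case1.tex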

sed/awk between two patterns in a file: pattern 1 set by a variable from lines of a second file; pattern 2 designated by a specified character

I have two files. One file contains a pattern that I want to match in a second file. I want to print from that pattern (included) up to a specified character (not included), and then concatenate the results into a single output file.
For instance,
File_1:
a
c
d
and File_2:
>a
MEEL
>b
MLPK
>c
MEHL
>d
MLWL
>e
MTNH
I have been using variations of this loop:
while read $id;
do
    sed -n "/>$id/,/>/{//!p;}" File_2;
done < File_1
hoping to obtain something like the following output:
>a
MEEL
>c
MEHL
>d
MLWL
But have had no such luck. I have played around with grep/fgrep, awk, and sed, and between the three cannot seem to get the right (or any) output. Would someone kindly point me in the right direction?
Try:
$ awk -F'>' 'FNR==NR{a[$1]; next} NF==2{f=$2 in a} f' file1 file2
>a
MEEL
>c
MEHL
>d
MLWL
How it works
-F'>'
This sets the field separator to >.
FNR==NR{a[$1]; next}
While reading in the first file, this creates a key in array a for every line in file1.
NF==2{f=$2 in a}
For every line in file 2 that has two fields, this sets variable f to true if the second field is a key in a or false if it is not.
f
If f is true, print the line.
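As a quick standalone check of that field splitting (not part of the original answer): a header line such as >a splits into an empty first field and a second field holding the id, so NF is 2, while sequence lines like MEEL contain no > and keep NF at 1, leaving f untouched.
$ echo ">a" | awk -F'>' '{print NF, "["$1"]", "["$2"]"}'
2 [] [a]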
A plain (GNU) sed solution. Files are read only once. It is assumed that characters in File_1 needn't be quoted in the sed expression.
pat=$(sed ':a; $!{N;ba;}; y/\n/|/' File_1)
sed -E -n ":a; /^>($pat)/{:b; p; n; /^>/ba; bb}" File_2
Explanation:
The first call to sed generates a regular expression to be used in the second call to sed and stores it in the variable pat. The aim is to avoid repeatedly reading the entire File_2 for each line of File_1. It just "slurps" File_1 and replaces newline characters with | characters. So the sample File_1 becomes a string with the value a|c|d. The regular expression a|c|d matches if at least one of the alternatives (a, c, d for this example) matches (this is a GNU sed extension).
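You can verify the generated pattern for the sample File_1 directly:
$ pat=$(sed ':a; $!{N;ba;}; y/\n/|/' File_1)
$ echo "$pat"
a|c|d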
The second sed expression, ":a; /^>($pat)/{:b; p; n; /^>/ba; bb}", could be converted to pseudo code like this:
begin:
read next line (from File_2) or quit on end-of-file
label_a:
if line begins with `>` followed by one of the alternatives in `pat` then
label_b:
print the line
read next line (from File_2) or quit on end-of-file
if line begins with `>` goto label_a else goto label_b
else goto begin
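Applied to the sample File_1 and File_2 from the question, the two commands above should reproduce the desired output:
>a
MEEL
>c
MEHL
>d
MLWL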
Let me try to explain why your approach does not work well:
You need to say while read id instead of while read $id.
The sed command />$id/,/>/{//!p;} will exclude the lines which start
with >.
Then you might want to say something like:
while read id; do
    sed -n "/^>$id/{N;p}" File_2
done < File_1
Output:
>a
MEEL
>c
MEHL
>d
MLWL
But the code above is inefficient because it reads File_2 as many times as the count of the id's in File_1.
Please try the elegant solution by John1024 instead.
If ed is available, and since the shell is involved:
#!/usr/bin/env bash
mapfile -t to_match < file1.txt
ed -s file2.txt <<-EOF
g/\(^>[${to_match[*]}]\)/;/^>/-1p
q
EOF
It will only run ed once, rather than once for every line that matches a pattern from file1. Say you have a to z in file1: ed will not run 26 times.
Requires bash4+ because of mapfile.
How it works
mapfile -t to_match < file1.txt
Saves the entry/value from file1 in an array named to_match
ed -s file2.txt points ed at file2; the -s flag means don't print info about the file (the same info you get with wc file).
<<-EOF A here document, shell syntax.
g/\(^>[${to_match[*]}]\)/;/^>/-1p
g means search the whole file aka global.
( ) capture group, it needs escaping because ed only supports BRE, basic regular expression.
^> matches a > at the start of the line; the ^ is an anchor which means the start.
[ ] is a bracket expression; it matches any single character inside it, in this case the characters from the expansion of the array "${to_match[*]}"
; Include the next address/pattern
/^>/ Match a leading >
-1 go back one line from the pattern match.
p print the addressed lines.
q quit ed
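Assuming File_1 and File_2 from the question are saved as file1.txt and file2.txt (the names used in the script), running it should print the desired blocks:
>a
MEEL
>c
MEHL
>d
MLWL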

Bash Columns SED and BASH Commands without AWK?

I wrote 2 different scripts but I am stuck at the same problem.
The problem is I am making a table from a file ($2) that I get in args, and $1 is the number of columns. It's a little bit hard to explain, but I am gonna show you input and output.
The problem now is that I don't know how I can save every column in a different variable so I can build it into my HTML code later:
#printf "<TR><TD>$...</TD><TD>$...</TD><TD>$..</TD></TR><TD>$..."
so input look like that :
Name\tSize\tType\tprobe
bla\t4711\tfile\t888888888
abcde\t4096\tdirectory\t5555
eeeee\t333333\tblock\t6666
aaaaaa\t111111\tpackage\t7777
sssss\t44444\tfile\t8888
bbbbb\t22222\tfolder\t9999
Code :
c=1
column=$1
file=$2
echo "$( < $file)"| while read Line ; do
Name=$(sed "s/\\\t/ /g" $file | cut -d' ' -f$c,-$column)
printf "$Name \n"
#let c=c+1
#printf "<TR><TD>$Name</TD><TD>$Size</TD><TD>$Type</TD></TR>\n"
exit 0
done
Output:
Name Size Type probe
bla 4711 file 888888888
abcde 4096 directory 5555
eeeee 333333 block 6666
aaaaaa 111111 package 7777
sssss 44444 file 8888
bbbbb 22222 folder 9999
This is a tailor-made job for awk. See this script:
awk -F'\t' '{printf "<tr>";for(i=1;i<=NF;i++) printf "<td>%s</td>", $i;print "</tr>"}' input
<tr><td>bla</td><td>4711</td><td>file</td><td>888888888</td></tr>
<tr><td>abcde</td><td>4096</td><td>directory</td><td>5555</td></tr>
<tr><td>eeeee</td><td>333333</td><td>block</td><td>6666</td></tr>
<tr><td>aaaaaa</td><td>111111</td><td>package</td><td>7777</td></tr>
<tr><td>sssss</td><td>44444</td><td>file</td><td>8888</td></tr>
<tr><td>bbbbb</td><td>22222</td><td>folder</td><td>9999</td></tr>
In bash:
celltype=th
while IFS=$'\t' read -a columns; do
    rowcontents=$( printf '<%s>%s</%s>' "$celltype" "${columns[@]}" "$celltype" )
    printf '<tr>%s</tr>\n' "$rowcontents"
    celltype=td
done < <( sed $'s/\\\\t/\t/g' "$2")
Some explanations:
IFS=$'\t' read -a columns reads a line from standard input, using only the tab character to separate fields, and putting each field into a separate element of the array columns. We change IFS so that other whitespace, which could occur in a field, is not treated as a field delimiter.
On the first line read from standard input, <th> elements will be output by the printf line. After resetting the value of celltype at the end of the loop body, all subsequent rows will consist of <td> elements.
When setting the value of rowcontents, take advantage of the fact that the first argument is repeated as many times as necessary to consume all the arguments.
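A standalone illustration of that reuse (not part of the script above): printf cycles through its format string until every argument has been consumed.
$ printf '<td>%s</td>' Name Size Type probe; echo
<td>Name</td><td>Size</td><td>Type</td><td>probe</td>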
Input is via process substitution from the sed command, which requires a crazy amount of quoting. First, the entire argument is quoted with $'...', which tells bash to replace escaped characters. bash converts this to the literal string s/\\t/^T/g, where I am using ^T to represent a literal ASCII 09 tab character. When sed sees this argument, it performs its own escape replacement, so the search text is a literal backslash followed by a literal t, to be replaced by a literal tab character.
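If you want to see exactly what bash hands to sed, print the argument through GNU cat -A, which displays a tab as ^I and marks the end of the line with $:
$ printf '%s\n' $'s/\\\\t/\t/g' | cat -A
s/\\t/^I/g$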
The first argument, the column count, is unnecessary and is ignored.
Normally, you avoid making the while loop part of a pipeline when you set parameters in the loop that you want to use later. Here, all the variables are truly local to the while loop, so you could avoid the process substitution and use a pipeline if you wish:
sed $'s/\\\\t/\t/g' "$2" | while IFS=$'\t' read -a columns; do
...
done
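For reference, the classic gotcha with the pipeline form (a tiny standalone demo, not from the original answer): variables set inside a piped while loop live in a subshell and are lost once the loop ends.
n=0
printf 'a\nb\n' | while read -r line; do n=$((n+1)); done
echo "$n"   # prints 0, not 2, because the loop body ran in a subshell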

Compare Lines of file to every other line of same file

I am trying to write a program that will print out every line from a file with another line of that file added at the end, basically creating pairs from a portion of each line. If the two lines are the same, it will do nothing. Also, it must avoid repeating the same pairs: A B is the same as B A.
In short
FileInput:
otherstuff A
otherstuff B
otherstuff C
otherstuff D
Output:
A B
A C
A D
B C
B D
C D
I was trying to do this with a BASH script, but was having trouble because I could not get my nested while loops to work. It would read the first line, compare it to each other line, and then stop (Basically only outputting the first 3 lines in the example output above, the outer while loop only ran once).
I also suspect I might be able to do this using MATLAB, so suggestions using that are also welcome.
Here is the bash script that I have thus far. As I said, it is not printing out correctly for me, as the outer loop only runs once.
#READS IN file from terminal
FILE1=$1
#START count at 0
count=0
exec 3<&0
exec 0< $FILE1
while read LINEa; do
    while read LINEb; do
        eventIDa=$(echo $LINEa | cut -c20-23)
        eventIDb=$(echo $LINEb | cut -c20-23)
        echo $eventIDa $eventIDb
    done
done
Using bash:
#!/bin/bash
[ -f "$1" ] || { echo >&2 "File not found"; exit 1; }
mapfile -t lines < <(cut -c20-23 <"$1" | sort | uniq)
for i in ${!lines[@]}; do
    elem1=${lines[$i]}
    unset lines[$i]
    for elem2 in "${lines[@]}"; do
        echo "$elem1" "$elem2"
    done
done
This will read a file given as a parameter on the command line, sort and filter out duplicates, and output all combinations. You can modify the parameter to cut to adjust to your particular input file.
Due to the particular way you seem to intend to use cut, your input example above won't work. Instead, use something with the correct line length, such as:
123456789012345678 A
123456789012345678 B
123456789012345678 C
123456789012345678 D
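With an input file like that, the script above should print exactly the pairs asked for in the question:
A B
A C
A D
B C
B D
C D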
Assuming the otherstuff is not relevant (otherwise you can of course add it later), this should do the trick in MATLAB:
combnk({'A' 'B' 'C' 'D'},2)
