shell script read several lines from file A and write them in file B - shell

Hi, I'm new to shell and got stuck on this.
I have a file A like:
[area1]
a
b
[area2]
c
d
[area3]
e
f
I want to read the lines in a certain area and append them to file B. For example, for [area2] I'm expecting to read:
c
d
Also, the area names are random and stored in a variable, say $AREA, so I need to match against the variable instead of hard-coding "[area2]". What I need to cut runs from the line equal to [$AREA] up to the next line that starts with "[".
How can I achieve this? Any help would be appreciated!

a="area2"
sed '/\['"$a"'\]/,+4!d' A|sed '1,2d' >>B

sed has a write command (w) that you can use to write the output to a file:
AREA="area2";
sed '/\['"$AREA"'\]/,/^\[/!d;//d;w B' A
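/\['"$AREA"'\]/,/^\[/!d: deletes every line outside the range that starts at your area line and ends at the next line starting with [, so only that range is kept
//d: the empty pattern reuses the most recently used regular expression, so this deletes the two boundary lines (here [area2] and [area3])
w B: writes the remaining lines to file B, overwriting its previous contents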
To append output to file B:
sed '/\['"$AREA"'\]/,/^\[/!d;//d;' A >> B
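The same extraction can also be sketched in awk, which compares the header line as a plain string, so characters in $AREA that are special in regular expressions need no escaping. This is only a sketch under assumptions: the temporary directory and the sample data are hypothetical stand-ins for files A and B, and the area name is assumed to contain no backslashes (awk -v interprets escape sequences).

```shell
# Hypothetical sample data mirroring file A from the question.
workdir=$(mktemp -d)
printf '%s\n' '[area1]' a b '[area2]' c d '[area3]' e f > "$workdir/A"
: > "$workdir/B"

AREA="area2"

# Start printing after the exact header line, stop at the next line starting with "[".
awk -v header="[$AREA]" '
  $0 == header { inside = 1; next }   # header found: switch on, skip the header itself
  /^\[/        { inside = 0 }         # any other header: switch off
  inside                              # print lines while switched on
' "$workdir/A" >> "$workdir/B"

cat "$workdir/B"
```

Because the redirection is >>, repeated runs keep appending to B, matching the append behaviour asked for in the question.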

Related

sed/awk between two patterns in a file: pattern 1 set by a variable from lines of a second file; pattern 2 designated by a specified character

I have two files. One file contains a pattern that I want to match in a second file. I want to use that pattern to print between that pattern (included) up to a specified character (not included) and then concatenate into a single output file.
For instance,
File_1:
a
c
d
and File_2:
>a
MEEL
>b
MLPK
>c
MEHL
>d
MLWL
>e
MTNH
I have been using variations of this loop:
while read $id;
do
sed -n "/>$id/,/>/{//!p;}" File_2;
done < File_1
hoping to obtain something like the following output:
>a
MEEL
>c
MEHL
>d
MLWL
But I have had no such luck. I have played around with grep/fgrep, awk and sed, and between the three I cannot seem to get the right output (or any output). Would someone kindly point me in the right direction?
Try:
$ awk -F'>' 'FNR==NR{a[$1]; next} NF==2{f=$2 in a} f' file1 file2
>a
MEEL
>c
MEHL
>d
MLWL
How it works
-F'>'
This sets the field separator to >.
FNR==NR{a[$1]; next}
While reading in the first file, this creates a key in array a for every line in file1.
NF==2{f=$2 in a}
For every line in file 2 that has two fields, this sets variable f to true if the second field is a key in a or false if it is not.
f
If f is true, print the line.
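To try the command above on the sample data from the question, something like the following should work. The temporary directory and the variable names workdir and result are only illustrative.

```shell
# Recreate the two sample files from the question in a scratch directory.
workdir=$(mktemp -d)
printf '%s\n' a c d > "$workdir/file1"
printf '%s\n' '>a' MEEL '>b' MLPK '>c' MEHL '>d' MLWL '>e' MTNH > "$workdir/file2"

# First file: remember each ID. Second file: on header lines (two fields when
# split on ">"), set the flag f; every line is printed while f is true.
result=$(awk -F'>' 'FNR==NR{a[$1]; next} NF==2{f=$2 in a} f' \
  "$workdir/file1" "$workdir/file2")

printf '%s\n' "$result"
```

Both files are read exactly once, regardless of how many IDs file1 contains.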
A plain (GNU) sed solution. Each file is read only once. It is assumed that the characters in File_1 do not need to be escaped in a sed expression.
pat=$(sed ':a; $!{N;ba;}; y/\n/|/' File_1)
sed -E -n ":a; /^>($pat)/{:b; p; n; /^>/ba; bb}" File_2
Explanation:
The first call to sed generates a regular expression to be used in the second call to sed and stores it in the variable pat. The aim is to avoid repeatedly reading the entire File_2 for each line of File_1. It just "slurps" File_1 and replaces the newline characters with | characters, so the sample File_1 becomes a string with the value a|c|d. The regular expression a|c|d matches if at least one of the alternatives (a, c, d for this example) matches (this is a GNU sed extension).
The second sed expression, ":a; /^>($pat)/{:b; p; n; /^>/ba; bb}", could be converted to pseudo code like this:
begin:
    read next line (from File_2) or quit on end-of-file
label_a:
    if line begins with `>` followed by one of the alternatives in `pat` then
label_b:
        print the line
        read next line (from File_2) or quit on end-of-file
        if line begins with `>` goto label_a else goto label_b
    else goto begin
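The pattern-building step can also be sketched without sed: paste -s joins all lines of a file using the given delimiter. This is an alternative to the first sed call, not what the answer itself uses, and the temporary file stands in for File_1.

```shell
# Hypothetical stand-in for File_1 from the question.
file1=$(mktemp)
printf '%s\n' a c d > "$file1"

# Join every line with "|" to form an alternation for an extended regex.
pat=$(paste -s -d '|' "$file1")

printf '%s\n' "$pat"
```

The resulting string can then be interpolated into the second sed expression in the same way as before.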
Let me try to explain why your approach does not work well:
You need to say while read id instead of while read $id.
The sed command />$id/,/>/{//!p;} will exclude the lines which start with >.
Then you might want to say something like:
while read id; do
sed -n "/^>$id/{N;p}" File_2
done < File_1
Output:
>a
MEEL
>c
MEHL
>d
MLWL
But the code above is inefficient because it reads File_2 as many times as the count of the id's in File_1.
Please try the elegant solution by John1024 instead.
If ed is available, and since the shell is involved:
#!/usr/bin/env bash
mapfile -t to_match < file1.txt
ed -s file2.txt <<-EOF
g/\(^>[${to_match[*]}]\)/;/^>/-1p
q
EOF
This runs ed only once, not once per pattern from file1. For example, if file1 contains the entries a to z, ed will not run 26 times.
Requires bash 4+ because of mapfile.
How it works
mapfile -t to_match < file1.txt
Saves the entry/value from file1 in an array named to_match
ed -s file2.txt points ed at file2.txt; the -s flag suppresses the informational output, such as the byte count ed normally prints after loading the file (the same number you get from wc -c file).
<<-EOF A here document, shell syntax.
g/\(^>[${to_match[*]}]\)/;/^>/-1p
g means search the whole file aka global.
( ) capture group, it needs escaping because ed only supports BRE, basic regular expression.
^> If line starts with a > the ^ is an anchor which means the start.
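[ ] is a bracket expression, which matches any single character listed inside it, in this case the expanded value of the array "${to_match[*]}". This works here because every entry in file1 is a single character.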
; Include the next address/pattern
/^>/ Match a leading >
-1 moves back one line from that match, so the range ends just before the next > line.
p prints the addressed lines.
q quit ed

How can I create array of lines in this case?

Given a file so that in any line can be more than one word, and exists a single space between any word to other, for example:
a a a a
b b b b
c c
d d
a a a a
How can I create an array so that cell number i holds line number i, but without duplicates among the elements of the array?
According to the file above, we would need to create this array:
Array[0]="a a a a" , Array[1]="b b b b" , Array[2]="c c" , Array[3]="d d".
(The name of the file pass to the script as argument).
I know how to create array that will contain all the lines. Something like that:
Array=()
while read line; do
Array=("${Array[@]}" "${line}")
done < $1
But how can I pass to the while read.. the sorting (and uniq) output of the file?
You should be able to use done < <(sort "$1" | uniq) in place of done < $1.
The <(...) syntax is process substitution: it runs the commands in a subshell and exposes their output as a file that the loop can read from.
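A minimal bash sketch of the whole loop follows. It swaps sort | uniq for awk '!seen[$0]++', which keeps the first occurrence of each line in its original order; that substitution is a design choice for this sketch rather than part of the answer above. The temporary file stands in for the script argument $1.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for the file passed as $1 in the question.
input=$(mktemp)
printf '%s\n' 'a a a a' 'b b b b' 'c c' 'd d' 'a a a a' > "$input"

Array=()
# IFS= and -r keep each line intact (leading spaces and backslashes preserved).
while IFS= read -r line; do
  Array+=("$line")
done < <(awk '!seen[$0]++' "$input")   # print each distinct line once, first-seen order

for i in "${!Array[@]}"; do
  printf 'Array[%d]="%s"\n' "$i" "${Array[$i]}"
done
```

With sort | uniq the elements would come out in sorted order instead, which happens to be the same for this sample input.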

Formatting need to change for text file in bash scripting

I have the output below from a text file. It is a long file, so I have copied only a few rows here.
HP83904B74E6
13569.06
7705.509999999999
HP4DC2EECAA8
4175.1
2604.13
And i want to print it like below.
HP83904B74E6 13569.06 7705.509999999999
HP4DC2EECAA8 4175.1 2604.13
I have tried reading the file line by line with a while loop and storing each value in a variable such as variablename$i, so that I can print variablename0, variablename1 and so on. After every 3 lines I used an if statement to print the values of variablename0 variablename1 variablename2, but that did not work for me.
Use pr:
$ pr -a3t tmp.txt
HP83904B74E6 13569.06 7705.509999999999
HP4DC2EECAA8 4175.1 2604.13
I have tried reading the file line by line with a while loop and storing each value in a variable such as variablename$i, so that I can print variablename0, variablename1 and so on. After every 3 lines I used an if statement to print the values of variablename0 variablename1 variablename2, but that did not work for me. I am just learning bash.
while read -r a; do
    read -r b
    read -r c
    echo "$a $b $c"
done < file
you get,
HP83904B74E6 13569.06 7705.509999999999
HP4DC2EECAA8 4175.1 2604.13
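Another way to group every three lines, sketched here as an alternative to the loop above, is paste reading standard input three times per output line. The temporary file is a stand-in for the real input, filled with the sample rows from the question.

```shell
# Sample rows from the question, written to a scratch file.
input=$(mktemp)
printf '%s\n' HP83904B74E6 13569.06 7705.509999999999 \
              HP4DC2EECAA8 4175.1 2604.13 > "$input"

# Each "-" consumes one line from stdin, so three of them merge three
# consecutive lines; -d' ' joins them with a single space.
joined=$(paste -d' ' - - - < "$input")

printf '%s\n' "$joined"
```

Adding or removing a "-" changes how many input lines end up on each output line.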

read column from csv file in terminal ignoring the header

I'm writing a simple .ksh file to read a single column from a .csv file and then print the output to the screen:
fname=($(cut -d, -f2 "myfile.csv"))
# loop through these names
for i in ${fname[@]};
do echo "$i"
done
This works fine but I don't want to return the header row, that is the first row of the file. How would I alter the cut command so that it ignore the first value or string. In this case the header is called 'NAME'. I want to print all of the other rows of this file.
That being said, is it easier to loop through from 2:fname as the code is currently written or is it best to alter the cut command?
You could do
fname=($(sed 1d myfile.csv | cut -d, -f2))
Alternatively, since the index of the first element of the array is 0, you can start the loop at index 1:
for i in "${fname[@]:1}"; do
Demo:
$ a=(a b c d e f)
$ echo "${a[@]:1}"
b c d e f
Note, you should always put the array expansion in double quotes.
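If the array is not needed at all, the header can be skipped and the column read line by line in one pipeline. The following is only a sketch: the file contents and the column layout (ID,NAME,AGE) are made up for illustration, with NAME as the second column as in the question.

```shell
# Hypothetical CSV with a header row; only the NAME column matters here.
csv=$(mktemp)
printf '%s\n' 'ID,NAME,AGE' '1,alice,30' '2,bob,25' > "$csv"

# tail -n +2 starts output at line 2, which drops the header row.
names=$(tail -n +2 "$csv" | cut -d, -f2)

# Loop over the values one line at a time instead of word-splitting an array.
printf '%s\n' "$names" | while IFS= read -r name; do
  printf 'name: %s\n' "$name"
done
```

Reading line by line also keeps values that contain spaces in one piece, which the unquoted array assignment would split.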

Compare Lines of file to every other line of same file

I am trying to write a program that will print out every line from a file with another line of that file added at the end, basically creating pairs from a portion of each line. If the line is the same, it will do nothing. Also, it must avoid repeating the same pairs. A B is the same as B A
In short
FileInput:
otherstuff A
otherstuff B
otherstuff C
otherstuff D
Output:
A B
A C
A D
B C
B D
C D
I was trying to do this with a BASH script, but was having trouble because I could not get my nested while loops to work. It would read the first line, compare it to each other line, and then stop (Basically only outputting the first 3 lines in the example output above, the outer while loop only ran once).
I also suspect I might be able to do this using MATLAB, so suggestions using that are also welcome.
Here is the bash script that I have thus far. As I said, it is not printing out correctly for me, as the outer loop only runs once.
#READS IN file from terminal
FILE1=$1
#START count at 0
count0=
exec 3<&0
exec 0< $FILE1
while read LINEa; do
    while read LINEb; do
        eventIDa=$(echo $LINEa | cut -c20-23)
        eventIDb=$(echo $LINEb | cut -c20-23)
        echo $eventIDa $eventIDb
    done
done
Using bash:
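#!/bin/bash
[ -f "$1" ] || { echo >&2 "File not found"; exit 1; }
# Extract the IDs (columns 20-23), then sort them and drop duplicates.
mapfile -t lines < <(cut -c20-23 <"$1" | sort | uniq)
for i in "${!lines[@]}"; do
    elem1=${lines[$i]}
    # Remove the current element so it is not paired with itself or paired again later.
    unset 'lines[i]'
    for elem2 in "${lines[@]}"; do
        echo "$elem1" "$elem2"
    done
done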
This will read a file given as a parameter on the command line, sort and filter out duplicates, and output all combinations. You can modify the parameter to cut to adjust to your particular input file.
Due to the particular way you seem to intend to use cut, your input example above won't work. Instead, use something with the correct line length, such as:
123456789012345678 A
123456789012345678 B
123456789012345678 C
123456789012345678 D
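The question's nested loops stop early because both read commands consume the same standard input, so the inner loop drains the file during the first outer iteration. One way around that, sketched here in awk, is to load the IDs into an array first and pair them by index. This sketch makes two assumptions that differ from the script above: the ID is taken as the last whitespace-separated field rather than columns 20-23, and the input contains no repeated IDs.

```shell
# Sample input from the question, written to a scratch file.
input=$(mktemp)
printf '%s\n' 'otherstuff A' 'otherstuff B' 'otherstuff C' 'otherstuff D' > "$input"

pairs=$(awk '
  { id[NR] = $NF }                      # remember the last field of every line
  END {
    for (i = 1; i <= NR; i++)
      for (j = i + 1; j <= NR; j++)     # j > i, so "B A" never follows "A B"
        if (id[i] != id[j]) print id[i], id[j]
  }
' "$input")

printf '%s\n' "$pairs"
```

Starting the inner index at i + 1 is what avoids both self-pairs and mirrored pairs.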
Assuming the otherstuff is not relevant (otherwise you can of course add it later) this should do the trick in Matlab:
combnk({'A' 'B' 'C' 'D'},2)
