I have a number of text files, each of which consists of a single line.
I need a way to separate the line into multiple lines after every number.
At the moment I have something like this:
a 111111b 222c 3d 444444
and I need a way to get it to this
a 111111
b 222
c 3
d 444444
I have been trying to write a gawk command with a regex, but I can't find a way to get it to work. (I am fairly new to shell.)
Easy with sed.
$: cat file
a 51661b 99595c 65652d 51515
$: sed -E 's/([a-z] [0-9]+)\n*/\1\n/g' file
a 51661
b 99595
c 65652
d 51515
Pretty easy with GNU awk (gensub is a gawk extension).
$: awk '{ print gensub("([a-z] [0-9]+)\n*", "\\1\n", "g") }' file
a 51661
b 99595
c 65652
d 51515
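If gawk is not available, the same idea can be sketched with POSIX awk's gsub. This is a minimal sketch, not the answer's original method: it assumes every number should end a line, and it uses printf so that no extra blank line is printed after the last number.

```shell
# POSIX awk sketch: append a newline after every run of digits.
# "&" in the gsub replacement stands for the text that was matched.
result=$(printf '%s\n' 'a 111111b 222c 3d 444444' |
    awk '{ gsub(/[0-9]+/, "&\n"); printf "%s", $0 }')
printf '%s\n' "$result"
```

Because the newline is added by the replacement itself, the record is printed with printf "%s" rather than print.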
Could even do with bash built-ins only...but don't...
while read -r line
do  while [[ "$line" =~ [a-z]\ [0-9]+ ]]
    do  printf "%s\n" "$BASH_REMATCH"
        line=${line#"$BASH_REMATCH"}
    done
done < file
a 51661
b 99595
c 65652
d 51515
You already have a good answer from Paul, but for sed an arguably more direct expression simply using the first two numbered backreferences separated by a newline would be:
sed -E 's/([0-9])([^0-9])/\1\n\2/g' file
Example Use/Output
In your case that would be:
$ echo "a 111111b 222c 3d 444444" | sed -E 's/([0-9])([^0-9])/\1\n\2/g'
a 111111
b 222
c 3
d 444444
I have two files. One file contains a pattern that I want to match in a second file. I want to use that pattern to print between that pattern (included) up to a specified character (not included) and then concatenate into a single output file.
For instance,
File_1:
a
c
d
and File_2:
>a
MEEL
>b
MLPK
>c
MEHL
>d
MLWL
>e
MTNH
I have been using variations of this loop:
while read $id;
do
sed -n "/>$id/,/>/{//!p;}" File_2;
done < File_1
hoping to obtain something like the following output:
>a
MEEL
>c
MEHL
>d
MLWL
But I have had no such luck. I have played around with grep/fgrep, awk and sed, and between the three I cannot seem to get the right output (or any output). Would someone kindly point me in the right direction?
Try:
$ awk -F'>' 'FNR==NR{a[$1]; next} NF==2{f=$2 in a} f' file1 file2
>a
MEEL
>c
MEHL
>d
MLWL
How it works
-F'>'
This sets the field separator to >.
FNR==NR{a[$1]; next}
While reading in the first file, this creates a key in array a for every line in file1.
NF==2{f=$2 in a}
For every line in file 2 that has two fields, this sets variable f to true if the second field is a key in a or false if it is not.
f
If f is true, print the line.
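The same logic can be written out with longer names and comments. This is only a sketch of the command above: the sample files are recreated in a temporary directory, and the names wanted and keep are illustrative replacements for a and f.

```shell
# Hypothetical sample data, matching the files in the question.
tmpdir=$(mktemp -d)
printf '%s\n' a c d > "$tmpdir/file1"
printf '%s\n' '>a' MEEL '>b' MLPK '>c' MEHL '>d' MLWL '>e' MTNH > "$tmpdir/file2"

result=$(awk -F'>' '
    FNR == NR { wanted[$1]; next }       # first file: remember each id
    NF == 2   { keep = ($2 in wanted) }  # header line: update the print flag
    keep                                 # print the line while the flag is set
' "$tmpdir/file1" "$tmpdir/file2")

printf '%s\n' "$result"
rm -r "$tmpdir"
```

The flag stays set after a matching header, so the sequence lines that follow it are printed until the next header resets it.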
A plain (GNU) sed solution. Files are read only once. It is assumed that the characters in File_1 do not need to be escaped in a sed expression.
pat=$(sed ':a; $!{N;ba;}; y/\n/|/' File_1)
sed -E -n ":a; /^>($pat)/{:b; p; n; /^>/ba; bb}" File_2
Explanation:
The first call to sed generates a regular expression to be used in the second call to sed and stores it in the variable pat. The aim is to avoid repeatedly reading the entire File_2 for each line of File_1. It just "slurps" File_1 and replaces newline characters with | characters. So the sample File_1 becomes a string with the value a|c|d. The regular expression a|c|d matches if at least one of the alternatives (a, c, d for this example) matches (alternation with | requires extended regular expressions, which is why the second call uses -E).
The second sed expression, ":a; /^>($pat)/{:b; p; n; /^>/ba; bb}", could be converted to pseudo code like this:
begin:
    read next line (from File_2) or quit on end-of-file
label_a:
    if line begins with `>` followed by one of the alternatives in `pat` then
label_b:
        print the line
        read next line (from File_2) or quit on end-of-file
        if line begins with `>` goto label_a else goto label_b
    else goto begin
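The pattern-building step does not depend on GNU sed. As a hedged alternative (not part of the answer above), paste can join the lines of File_1 with | in one step. The sample file is recreated here in a temporary directory, and the ids are assumed to contain no regex metacharacters.

```shell
# Hypothetical sample File_1, matching the question.
tmpdir=$(mktemp -d)
printf '%s\n' a c d > "$tmpdir/File_1"

# -s joins all lines of the file into one; -d sets the joining character.
pat=$(paste -sd'|' "$tmpdir/File_1")
printf '%s\n' "$pat"

rm -r "$tmpdir"
```

The resulting value of pat can then be used in the second sed call exactly as before.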
Let me try to explain why your approach does not work well:
You need to say while read id instead of while read $id.
The sed command />$id/,/>/{//!p;} will exclude the lines which start with >.
Then you might want to say something like:
while read id; do
sed -n "/^>$id/{N;p}" File_2
done < File_1
Output:
>a
MEEL
>c
MEHL
>d
MLWL
But the code above is inefficient because it reads File_2 as many times as the count of the id's in File_1.
Please try the elegant solution by John1024 instead.
If ed is available, and since the shell is involved:
#!/usr/bin/env bash
mapfile -t to_match < file1.txt
ed -s file2.txt <<-EOF
g/\(^>[${to_match[*]}]\)/;/^>/-1p
q
EOF
It runs ed only once, not once for every pattern from file1. For example, if file1 contains a to z, ed will not run 26 times.
Requires bash4+ because of mapfile.
How it works
mapfile -t to_match < file1.txt
Saves the entry/value from file1 in an array named to_match
ed -s file2.txt points ed at file2.txt; the -s flag means don't print info about the file (the byte count, the same number you get from wc -c file)
<<-EOF A here document, shell syntax.
g/\(^>[${to_match[*]}]\)/;/^>/-1p
g means search the whole file aka global.
( ) capture group, it needs escaping because ed only supports BRE, basic regular expression.
^> If line starts with a > the ^ is an anchor which means the start.
[ ] is a bracket expression match whatever is inside of it, in this case the value of the array "${to_match[*]}"
; Include the next address/pattern
/^>/ Match a leading >
-1 go back one line after the pattern match.
p print whatever was matched by the pattern.
q quit ed
I have two files and would like to insert the contents of one file into the other, replacing a specified line.
File 1:
abc
def
ghi
jkl
File 2:
123
The following code is what I have.
file1=numbers.txt
file2=letters.txt
linenumber=3s
echo $file1
echo $file2
sed "$linenumber/.*/r $file1/" $file2
Which results in the output:
abc
def
r numbers.txt
jkl
The output I am hoping for is:
abc
def
123
jkl
I thought it could be an issue with bash variables but I still get the same output when I manually enter the information.
How am I misunderstanding sed and/or the read command?
Your script replaces the line with the string "r $file1". The replacement part of the s command is not re-interpreted as a sed command; it is taken literally.
You can:
linenumber=3
sed "$linenumber"' {
r '"$file1"'
d
}' "$file2"
Read line number 3, print file1 and then delete the line.
See here for a good explanation and reference.
Surely we can make that a oneliner:
sed -e "$linenumber"' { r '"$file1"$'\n''d; }' "$file2"
Live example at tutorialspoint.
I would use the c command as follows:
linenumber=3
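sed "${linenumber}c $(< "$file1")" "$file2"
This replaces the current line with the text that comes after c. (The one-line form of c is a GNU sed extension, and this works as written only when $file1 contains a single line.)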
Your command didn't work because it expands to this:
sed "3s/.*/r numbers.txt/" letters.txt
and you can't use r like that. r has to be the command that is being run.
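Another way to keep r as its own command, without embedding a newline in the script, is to pass r and d as separate -e expressions for the same line number. This is a minimal sketch with the question's sample data recreated in a temporary directory; it assumes the file name contains no newline.

```shell
# Hypothetical copies of the question's two files.
tmpdir=$(mktemp -d)
printf '%s\n' abc def ghi jkl > "$tmpdir/letters.txt"
printf '%s\n' 123 > "$tmpdir/numbers.txt"

linenumber=3
# r queues numbers.txt for output at the end of the cycle for line 3;
# d then deletes line 3 itself, so only the file contents appear there.
result=$(sed -e "${linenumber}r $tmpdir/numbers.txt" \
             -e "${linenumber}d" "$tmpdir/letters.txt")

printf '%s\n' "$result"
rm -r "$tmpdir"
```

Each -e expression ends the r command's file name, which is what the newline does in the multi-line version.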
I am learning how to use awk and sed this week. I know this question might have been asked before, but I am not sure what is wrong with my script. I have three files and I am using grep to search for the pattern gge0001x gge0001y gge0001z. x is in file1, y is in file2, and z is in file3. If anyone wants to see L2E[1-3].iva they are here: https://gist.github.com/anonymous/1112988408874c730cd4f3d313226ba4
#!/bin/bash
echo "Performance Data"
sed -n '1,19p' L2E1.iva|cat > file1 #take lines 1-19 in L2E1 and take the
# output into file1. The next two commands do the same thing
sed -n '1,19p' L2E2.iva|cat > file2
sed -n '1,19p' L2E3.iva|cat > file3
curveName=`grep "F" file1|sed "s/F/ /"`
# This will search for F in file 1, and then substitute F with a space
curveName2=`grep "F" file2|sed "s/F/ /"`
curveName3=`grep "F" file3|sed "s/F/ /"`
echo "Curve Name" "$curveName $curveName2 $curveName3"
I want my output to be Curve Name gge0001x gge0001y gge0001z. But the output is this instead:
Performance Data gge0006ze gge0006x
If I echo them out by themselves then it is fine, but once I echo all three on the same line the output gets skewed. Why does x show up last when it is first when I echo it and where did my y go to?
A few tips at first:
sed -n '1,19p' L2E1.iva|cat > file1
You can omit the cat and redirect the output of sed directly to the file:
sed -n '1,19p' L2E1.iva > file1
curveName=`grep "F" file1|sed "s/F/ /"`
Use $() instead of backticks for command substitution:
curveName=$(grep "F" file1 | sed "s/F/ /")
But the output is this instead: Performance Data gge0006ze gge0006x
The reason for Performance Data in your output is that you echo it at the beginning of your script.
Moreover, you've got a typo in the last echo: $curvename2 -> $curveName2. This is why your y is missing.
Did you double check your files for the right contents? That's the only reason I can imagine why your x comes last and the z first.
You can perhaps compress your script into one line
$ echo Curve name $(grep -Pohm1 '(?<=F ).*' L2E{1..3}.iva)
Curve name gge0006x gge0006y gge0006z
As an exercise, look up the grep options used here.
I'm trying to retrieve the nth column from the "busfile" file by substituting values in "i".
The code below works fine on Red Hat Linux, but when I tried it on HP-UX I got the error
"sed: Function {i}{p} cannot be parsed."
here is my code
acList=/z/temp/busfile
i=1
temp1=`sed -n "{i}{p}" $acList`
echo $temp1
Update:
Even when I add the $ as suggested in some of the answers, I still have the same problem.
temp1=`sed -n "${i}{p}" $acList`
If you're trying to use the i variable to print a line, you need to precede it with $:
temp1=`sed -n "${i}p" $acList`
as per the following transcript:
pax> i=3
pax> echo 'a
...> b
...> c
...> d
...> e
...> f
...> g' | sed -n "${i}p"
c
In situations like this, I tend to first try the simplest solution then gradually add complexity until it fails.
The first step would be to create a four-line file (called myfile) with the words one through four:
one
two
three
four
then try various commands with it, in ever increasing complexity:
sed -n "p" myfile # Print all lines.
sed -n "3p" myfile # Print hard-coded line.
i=3 ; sed -n "${i}p" myfile # Print line with parameter.
i=3 ; x=`sed -n "${i}p" myfile` ; echo $x # Capture line with parameter.
At some point, it will hopefully "break" and you can then target your investigations in a more concentrated manner.
However, I suspect it's unnecessary here since your purported use of that command to extract a column is incorrect. If you're trying to print a column rather than a line, then awk may be a better tool for the job:
pax> i=5
pax> echo 'pax is a really great guy' | awk -vf=$i '{print $f}'
great
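To make the line-versus-column distinction concrete, here is a small sketch that uses the same shell variable both ways. The three-line sample file and the names line, column and col are made up for illustration.

```shell
# Hypothetical sample file with two whitespace-separated columns.
tmpfile=$(mktemp)
printf '%s\n' 'one uno' 'two dos' 'three tres' > "$tmpfile"

i=2
# sed addresses lines: this prints the 2nd line of the file.
line=$(sed -n "${i}p" "$tmpfile")
# awk addresses fields: this prints the 2nd field of every line.
column=$(awk -v col="$i" '{ print $col }' "$tmpfile")

printf 'line %s: %s\n' "$i" "$line"
printf 'column %s:\n%s\n' "$i" "$column"
rm -f "$tmpfile"
```

Passing the value with -v keeps the awk program in single quotes, so the shell never has to expand anything inside it.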
You can use:
acList=/z/temp/busfile
i=1
temp1=`sed -n $i'p' $acList`
echo "$temp1"
I have two programs that produce data on stdout, and I'd like to paste their output together. I can successfully do this like so:
paste <(./prog1) <(./prog2)
But I find that this method will print all lines from both inputs,
and what I really want is to stop paste after either input program is finished.
So if ./prog1 produces the output:
a
b
c
But ./prog2 produces:
Hello
World
I would expect the output:
a Hello
b World
Also note that one of the input programs may actually produce infinite output, and I want to be able to handle that case as well. For example, if my inputs are yes and ./prog2, I should get:
y Hello
y World
Use join instead, with a variation on the Schwartzian transform:
numbered () {
    nl -ba -nrz
}
join -t $'\t' -j 1 <(prog1 | numbered) <(prog2 | numbered) | cut -f2-
Piping to nl prefixes each line with a zero-padded line number followed by a tab (nl's default separator), and join -j 1 with the tab as the field delimiter joins corresponding lines with the same number. The extra lines in the longer file will have no join partner and be omitted. Once the join is complete, pipe through cut -f2- to remove the line numbers.
Here's one solution:
while IFS= read -r -u7 a && IFS= read -r -u8 b; do echo "$a $b"; done 7<"$file1" 8<"$file2"
This has the slightly annoying effect of ignoring the last line of an input file if it is not terminated with a newline (but such a file is not a valid text file).
You can wrap this in a function, of course:
paste_short() {
    (
        while IFS= read -r -u7 a && IFS= read -r -u8 b; do
            echo "$a $b"
        done
    ) 7<"$1" 8<"$2"
}
Consider using awk:
awk 'FNR==NR{a[++i]=$0;next} FNR>i{exit}
{print a[FNR], $0}' <(printf "hello\nworld\n") <(printf "a\nb\nc\n")
hello a
world b
Pass the program that produces more output as the second input, because the first input is read completely into memory before anything is printed.
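A streaming variant avoids holding either input in memory by reading the second input one line at a time with getline. This is a sketch under assumptions, not one of the answers above: paste_short here is a hypothetical helper that takes two command strings, and the second one is passed with -v, so it should not contain backslashes (awk would interpret them).

```shell
# paste_short CMD1 CMD2 (hypothetical helper): pair up lines of two commands
# and stop as soon as either one runs out of output.
paste_short() {
    sh -c "$1" | awk -v cmd="$2" '
        {
            # Read the next line of the second command; stop when it ends.
            if ((cmd | getline other) <= 0) exit
            print $0, other
        }'
}

result=$(paste_short 'echo a; echo b; echo c' 'echo Hello; echo World')
printf '%s\n' "$result"
```

awk finishes when the first command ends or when getline fails on the second, so neither side has to be the shorter one.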