How to insert a value generated by a loop while reading a file in bash

Let's say that I have:
cat FILENAME1.txt
Definition john
cat FILENAME2.txt
Definition mary
cat FILENAME3.txt
Definition gary
cat textfile.edited
text
text
text
I want to obtain an output like:
1 john text
2 mary text
3 gary text
I tried to use "stored" values from the FILENAME files, "generated" by a loop. I wrote this:
for file in $(ls *.txt); do
    name=$(cat $file | grep -i Definition | awk '{$1="";print $0}')
    #echo $name --> this command works as it gives the names
done
cat textfile.edited| awk '{printf "%s\t%s\n",NR,$0}'
which is very close to what I want to get:
1 text
2 text
3 text
My issue came when I tried to add the "stored" value. I tried the following, with no success:
cat textfile.edited| awk '{printf "%s\t%s\n",$name,NR,$0}'
cat textfile.edited| awk '{printf "%s\t%s\n",name,NR,$0}'
cat textfile.edited| awk -v name=$name '{printf "%s\t%s\n",NR,$0}'
Sorry if the terminology used is not the best, but I started scripting recently.
Thank you in advance!!!

One solution using paste and awk ...
We'll append a count to the lines in textfile.edited (so we can see which lines are matched by paste):
$ cat textfile.edited
text1
text2
text3
First we'll look at the paste component:
$ paste <(egrep -hi Definition FILENAME*.txt) textfile.edited
Definition john text1
Definition mary text2
Definition gary text3
From here awk can do the final slicing-n-dicing-n-numbering:
$ paste <(egrep -hi Definition FILENAME*.txt) textfile.edited | awk 'BEGIN {OFS="\t"} {print NR,$2,$3}'
1 john text1
2 mary text2
3 gary text3
NOTE: It's not clear (to me) whether the requirement is for a space or a tab between the 2nd and 3rd columns; the above solution assumes a tab, while a space would be doable via an (awk) printf call.
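If a space is preferred, the printf variant mentioned above might look like this (a sketch against the same sample files):
$ paste <(egrep -hi Definition FILENAME*.txt) textfile.edited | awk '{printf "%s %s %s\n",NR,$2,$3}'
1 john text1
2 mary text2
3 gary text3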

You can do it all with one awk command.
The first file is textfile.edited; the other files come last.
awk 'NR==FNR {text[NR]=$0;next}
/^Definition/ {namenr++; names[namenr]=$2}
END { for (i=1;i<=namenr;i++) printf("%s %s %s\n", i, names[i], text[i]);}
' textfile.edited FILENAME*.txt
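Run against the sample files from the question, this should print:
1 john text
2 mary text
3 gary text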
You can avoid awk with
paste -d' ' <(seq $(wc -l <textfile.edited)) \
    <(sed -n 's/^Definition //p' FILE*) \
    textfile.edited
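Here seq just generates the line numbers; a quick sketch of what the two process substitutions emit for the sample files:
$ seq $(wc -l <textfile.edited)
1
2
3
$ sed -n 's/^Definition //p' FILE*
john
mary
gary
paste then glues the three columns together with single spaces (-d' ').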

Another version of the paste solution with a slightly careless grep -
$: paste -d\ <( grep -ho '[^ ]*$' FILENAME?.txt ) textfile.edited
john text
mary text
gary text
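The grep is "careless" in that it simply grabs the last space-free token on each line; on its own it should emit:
$: grep -ho '[^ ]*$' FILENAME?.txt
john
mary
gary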
Or, one more way to look at it...
$: a=( $(sed '/^Definition /s/.* //;' FILENAME[123].txt) )
$: echo "${a[#]}"
john mary gary
$: b=( $(<textfile.edited) )
$: echo "${b[#]}"
text text text
$: c=-1 # initialize so that the first pre-increment returns 0
$: while [[ -n "${a[++c]}" ]]; do echo "${a[c]} ${b[c]}"; done
john text
mary text
gary text
This will put all the values in memory before printing anything, so if the lists are really large it might not be your best bet. If they are fairly small, it's pretty efficient, and a single parallel index will keep them in order.
If the number of lines is not the same as the number of files, what did you want to do? As long as there aren't more files than lines, and any extra lines are OK to ignore, this still works. If there are more files than lines, then we need to know how you'd prefer to handle that.

A one-liner using GNU utilities:
paste -d ' ' <(cat -n FILENAME*.txt | sed 's/\sDefinition//') textfile.edited
Or,
paste -d ' ' <(cat -n FILENAME*.txt | sed 's/^\s*//;s/\sDefinition//') textfile.edited
if the leading white spaces are not desired.
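To see why, this is (roughly) what the first process substitution emits on the sample files — cat -n pads the numbers with leading spaces:
$ cat -n FILENAME*.txt | sed 's/\sDefinition//'
     1 john
     2 mary
     3 gary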
Alternatively:
paste -d ' ' <(sed 's/^Definition\s//' FILENAME*.txt | cat -n) textfile.edited

Related

How to merge two files into one file in bash, line by line [duplicate]

What's the easiest/quickest way to interleave the lines of two (or more) text files? Example:
File 1:
line1.1
line1.2
line1.3
File 2:
line2.1
line2.2
line2.3
Interleaved:
line1.1
line2.1
line1.2
line2.2
line1.3
line2.3
Sure, it's easy to write a little Perl script that opens them both and does the task. But I was wondering if it's possible to get away with less code, maybe a one-liner using Unix tools?
paste -d '\n' file1 file2
Here's a solution using awk:
awk '{print; if(getline < "file2") print}' file1
produces this output:
line 1 from file1
line 1 from file2
line 2 from file1
line 2 from file2
...etc
Using awk can be useful if you want to add some extra formatting to the output, for example if you want to label each line based on which file it comes from:
awk '{print "1: "$0; if(getline < "file2") print "2: "$0}' file1
produces this output:
1: line 1 from file1
2: line 1 from file2
1: line 2 from file1
2: line 2 from file2
...etc
Note: this code assumes that file1 is at least as long as file2.
If file1 contains more lines than file2 and you want to output blank lines for file2 after it finishes, add an else clause to the getline test:
awk '{print; if(getline < "file2") print; else print ""}' file1
or
awk '{print "1: "$0; if(getline < "file2") print "2: "$0; else print"2: "}' file1
@Sujoy's answer points in a useful direction. You can add line numbers, sort, and strip the line numbers:
(cat -n file1 ; cat -n file2 ) | sort -n | cut -f2-
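The intermediate stage makes the mechanism visible; ties on the line number are broken by comparing the whole lines, which keeps file1's line ahead of file2's, and the final cut -f2- then strips the numbers:
$ (cat -n file1 ; cat -n file2) | sort -n
     1	line1.1
     1	line2.1
     2	line1.2
     2	line2.2
     3	line1.3
     3	line2.3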
Note (of interest to me): this needs a little more work to get the ordering right if, instead of static files, you use the output of commands that may run faster or slower than one another. In that case you need to add/sort/remove another tag in addition to the line numbers:
(cat -n <(command1...) | sed 's/^/1\t/' ; cat -n <(command2...) | sed 's/^/2\t/' ; cat -n <(command3) | sed 's/^/3\t/' ) \
| sort -n | cut -f2- | sort -n | cut -f2-
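A hedged sketch with printf standing in for the real commands shows the round trip (the first sort groups by the source tag, the second restores the interleaving by line number):
$ (cat -n <(printf 'a1\na2\n') | sed 's/^/1\t/' ; cat -n <(printf 'b1\nb2\n') | sed 's/^/2\t/') \
| sort -n | cut -f2- | sort -n | cut -f2-
a1
b1
a2
b2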
With GNU sed:
sed 'R file2' file1
Output:
line1.1
line2.1
line1.2
line2.2
line1.3
line2.3
Here's a GUI way to do it: Paste them into two columns in a spreadsheet, copy all cells out, then use regular expressions to replace tabs with newlines.
cat file1 file2 | sort -t. -k 2.1
Here it's specified that the separator is "." and that we are sorting on the first character of the second field. For the sample files that field is the trailing line number, so all the .1 lines sort before the .2 lines; ties are then broken by comparing the whole lines, which keeps file1's line ahead of file2's.

Compare two files, and keep only if first word of each line is the same

Here are two files where I need to eliminate the data that they do not have in common:
a.txt:
hello world
tom tom
super hero
b.txt:
hello dolly 1
tom sawyer 2
miss sunshine 3
super man 4
I tried:
grep -f a.txt b.txt >> c.txt
And this:
awk '{print $1}' test1.txt
because I need to check only if the first word of the line exists in the two files (even if not at the same line number).
But then what is the best way to get the following output in the new file?
output in c.txt:
hello dolly 1
tom sawyer 2
super man 4
Use awk where you iterate over both files:
$ awk 'NR == FNR { a[$1] = 1; next } a[$1]' a.txt b.txt
hello dolly 1
tom sawyer 2
super man 4
NR == FNR is only true while the first file is being read, so { a[$1] = 1; next } runs only on a.txt, recording each first word. The bare pattern a[$1] that follows then prints every line of b.txt whose first word was recorded.
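Spelled out with comments (a sketch; $1 in seen is a slightly more defensive spelling of the bare a[$1], since it doesn't create empty array entries):
awk '
    NR == FNR { seen[$1] = 1; next }   # first file: remember every first word
    $1 in seen                         # second file: a bare pattern prints matching lines
' a.txt b.txt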
Use sed to generate a sed script from the input, then use another sed to execute it.
sed 's=^=/^=;s= .*= /p=' a.txt | sed -nf- b.txt
The first sed turns your a.txt into
/^hello /p
/^tom /p
/^super /p
which prints (p) whenever a line contains hello, tom, or super at the beginning of line (^) followed by a space.
This combines grep, cut and sed with process substitution:
$ grep -f <(cut -d ' ' -f 1 a.txt | sed 's/^/^/;s/$/ /') b.txt
hello dolly 1
tom sawyer 2
super man 4
The output of the process substitution is this (piping to cat -A to show spaces):
$ cut -d ' ' -f 1 a.txt | sed 's/^/^/;s/$/ /' | cat -A
^hello $
^tom $
^super $
We then use this as input for grep -f, resulting in the above.
If your shell doesn't support process substitution, but your grep supports reading from stdin with the -f option (it should), you can use this instead:
$ cut -d ' ' -f 1 a.txt | sed 's/^/^/;s/$/ /' | grep -f - b.txt
hello dolly 1
tom sawyer 2
super man 4

How to get a word from a text file in BASH

I want to get only one word from this txt file: http://pastebin.com/jFDu0Le5 . The word is in the last row: WER: 45.67% Correct: 65.87% Acc: 54.33%
I want to get only the value 45.67 and save it to the file value.txt. I want to create a BASH script to get this value. Can you give me an example of how to do it? I am new to Bash and I need it for school. The whole .txt file is saved on my server as file.txt.
Try this:
grep WER file.txt | awk '{print $2}' | uniq | sed -e 's/%//' > value.txt
Note that this will overwrite value.txt each time you run the command.
You want grep "WER:" file.txt | cut -???
I have ??? because I do not know the structure of the file. Tab delimited? Fixed width?
Do man cut and you can get the arguments you need.
There are many ways and tools to do the task:
sed
tac file.txt | sed -n '/^WER: /{s///;s/%.*//;p;q}' > value.txt
awk
tac file.txt | awk -F'[ %]' '/^WER:/{print $2;exit}' > value.txt
bash
while read -r a b c
do
    if [ "$a" = "WER:" ]
    then
        echo "${b%\%*}"
        break
    fi
done < <(tac file.txt) > value.txt
If the format is as you said, then this also works
awk -F'[: %]' '/^WER/{print $3}' file.txt > value.txt
Explanation
-F specifies the field separator as any one of the characters [: %]
/<PATTERN>/ {<ACTION>} means: if a line matches PATTERN, then do ACTION
in my case,
the PATTERN is: the line starts (^) with the string WER
the ACTION is: print field $3 (as split by the -F field separators)
> sends the output to value.txt
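To see the splitting concretely, run the relevant line through the same field separator (an illustrative trace); the empty $2 comes from the adjacent ':' and space:
$ echo 'WER: 45.67% Correct: 65.87% Acc: 54.33%' | awk -F'[: %]' '{print $1 "|" $2 "|" $3}'
WER||45.67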

sed, capture only the number

I have this text file:
some text A=10 some text
some more text A more text
some other text A=30 other text
I'm trying to use sed to capture only the numeric value of A. Using this
cat textfile | sed -r 's/.*A=(\S+).*/\1/'
I get:
10
some more text A more text
30
But what i really need is:
10
0
30
If the string A= does not exist, output a 0. How can I accomplish this?
I cannot think of a one-liner, so this is my approach:
while read -r line
do
    grep -Po '(?<=A=)\d+' <<< "$line" || echo "0"
done < file
I am using the look-behind grep to get any number after A=. In case there is none, the || (else) will print a 0.
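A quick demonstration of both branches on the sample lines:
$ grep -Po '(?<=A=)\d+' <<< 'some text A=10 some text'
10
$ grep -Po '(?<=A=)\d+' <<< 'some more text A more text' || echo "0"
0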
I love code-golf!
sed -e 's/^/A=0 /; s/.*\<A=\([0-9]\+\).*/\1/'
This prepends A=0 to the line before substituting.
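After the prepend every line has a fallback match, and the greedy .* makes the last (i.e. the real) A=number win when one exists, e.g.:
$ echo 'some more text A more text' | sed -e 's/^/A=0 /; s/.*\<A=\([0-9]\+\).*/\1/'
0
$ echo 'some text A=10 some text' | sed -e 's/^/A=0 /; s/.*\<A=\([0-9]\+\).*/\1/'
10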
try this one-liner:
awk -F'A=' 'NF==1{print "0";next}{sub(/ .*/,"",$2);print $2}' file
with your data:
kent$ echo "some text A=10 some text
some more text A more text
some other text A=30 other text"|awk -F'A=' 'NF==1{print "0";next}{sub(/ .*/,"",$2);print $2}'
10
0
30
gawk
awk '{$0=gensub(/^.*A=?([[:digit:]]+).*$/, "\\1", "g"); print($0+0)}' file.txt
This might work for you (GNU sed):
sed '/.*A=\([0-9][0-9]*\).*/s//\1/;t;s/.*/0/' file
Look for the string A= followed by one or more digits and, if found, replace the whole line with the back reference; the t command then branches past the fallback. Otherwise, replace the whole line with 0.
I think the best way is to use two commands: the first replaces lines without 'A=' with the line 'A=0', the second does what you did.
So
cat textfile | sed -r 's/^([^A]|A[^=])*$/A=0/' | sed -r 's/.*A=(\S+).*/\1/'
How about:
sed -r -e 's/.*A=(\S+).*/\1/' -e 's/.*A.*/0/'
Some grep-sed-cut combination:
grep -o 'A=\?[0-9]*' input | sed 's/A$/A=0/' | cut -d= -f2
Produces:
10
0
30

Remove the first word in a text stream

How would I remove the first word from each line of text in a stream?
For example,
$ cat myfile
some text 1
some text 2
some text 3
I want:
$ cat myfile | magiccommand
text 1
text 2
text 3
How would I go about this using Bash? I could use awk '{print $2 $3 $4 $5 ....}', but that's messy and would result in extra spaces for all null arguments. I was thinking that sed might be able to do this, but I could not find any examples of this.
Based on your example text,
cut -d' ' -f2- yourFile
should do the job.
That should work:
$ cat test.txt
some text 1
some text 2
some text 3
$ sed -e 's/^\w*\ *//' test.txt
text 1
text 2
text 3
Here is a solution using awk:
awk '{$1=""; print $0}' yourfile
Run this:
sed "s/^some\s//g" myfile
You don't even need to use a pipe. (Note that this removes the literal word some, matching the example input.)
To remove the first word and any spaces after it, no matter how many spaces there are, use: sed 's/[^ ]* *//'
Example:
$ cat myfile
some text 1
some text 2
some text 3
$ cat myfile | sed 's/[^ ]* *//'
text 1
text 2
text 3
