write a SHELL Script to substitute new line to tab [duplicate] - bash

This question already has an answer here:
Closed 11 years ago.
Possible Duplicate:
Shell script to remove new line after numeric string
I need to write a SHELL Script that removes the newline \n after any line that does not start with a number, replacing it with a tab \t (or with 5 spaces, for example).
For example, I have a file:
asasas
12345
adab-123
123
I need output like this:
asasas 12345
adab-123 123

This should work -
[jaypal~/Temp]$ cat file9
asasas
12345
adab-123
123
fffd
223
2323
afdf
23234
with tab:
[jaypal~/Temp]$ sed '/^[0-9]/!{N;s/\n/\t/}' file9
asasas 12345
adab-123 123
fffd 223
2323
afdf 23234

The awk way =)
awk '/^[0-9]/{print} !/^[0-9]/{printf("%s\t", $0)}' file
Same output as Jaypal's.
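For completeness, the same join can be sketched in plain bash with no external tools (a sketch; it assumes bash for the [[ ... ]] regex test, and the function name is made up):

```shell
# Plain-bash sketch: print lines that start with a digit as-is;
# end every other line with a tab instead of a newline, so it is
# joined to the following line.
join_lines() {
  local line
  while IFS= read -r line; do
    if [[ $line =~ ^[0-9] ]]; then
      printf '%s\n' "$line"   # numeric line: keep the newline
    else
      printf '%s\t' "$line"   # non-numeric line: join with a tab
    fi
  done
}
```

Usage: join_lines < file9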


How to reduce run time of shell script? [duplicate]

This question already has answers here:
Take nth column in a text file
(6 answers)
Closed 2 years ago.
I have written a simple script that reads a text file (which has space-separated columns and 1.5 million rows) and outputs the specified column. But this code takes more than an hour to execute. Can anyone help me optimize the runtime?
a=0
cat 1c_input.txt/$1 | while read p
do
    IFS=" "
    for i in $p
    do
        a=`expr $a + 1`
        if [ $a -eq $2 ]
        then
            echo "$i"
        fi
    done
    a=0
done >> ./1.c.$2.column.freq
some lines of sample input:
1 ib Jim 34
1 cr JoHn 24
1 ut MaRY 46
2 ti Jim 41
2 ye john 6
2 wf JoHn 22
3 ye jOE 42
3 hx jiM 21
some lines of sample output if the second argument entered is 3:
Jim
JoHn
MaRY
Jim
john
JoHn
jOE
jiM
I guess you are trying to print just one column; in that case, do something like:
#! /bin/bash
awk -v c="$2" '{print $c}' "1c_input.txt/$1" >> "./1.c.$2.column.freq"
If you just want something faster, use a utility like cut. To extract the third field from a single-space-delimited file bigfile, do:
cut -d ' ' -f 3 bigfile
To optimize the shell code in the question, using only builtin shell
commands, do something like:
while read a b c d; do echo "$c"; done < bigfile
...if the field to be printed is a command line parameter, there are
several shell command methods, but they're all based on that line.
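That line can be generalized so the column number is a parameter, using read -a to split each line into an array (a pure-bash sketch; the function name is made up, and the column index is 1-based):

```shell
# Sketch: print column $2 (1-based) of the whitespace-separated file $1.
print_column() {
  local file=$1 col=$2
  local -a fields
  while read -r -a fields; do
    printf '%s\n' "${fields[col-1]}"
  done < "$file"
}
```

Usage: print_column bigfile 3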

How to get the length of each word in a column without AWK, sed or a loop? [duplicate]

This question already has answers here:
Length of string in bash
(11 answers)
Closed 6 years ago.
Is it even possible? I currently have a one-liner to count the number of words in a file. If I output what I currently have it looks like this:
3 abcdef
3 abcd
3 fec
2 abc
This is all done in one line without loops, and I was wondering if I could add a column with the length of each word. I was thinking I could use wc -m to count the characters, but I don't know if I can do that without a loop.
As seen in the title, no AWK, sed, perl.. Just good old bash.
What I want:
3 abcdef 6
3 abcd 4
3 fec 3
2 abc 3
Where the last column is length of each word.
while read -r num word; do
printf '%s %s %s\n' "$num" "$word" "${#word}"
done < file
You can do something like this also:
File
> cat test.txt
3 abcdef
3 abcd
3 fec
2 abc
Bash script
> cat test.txt.sh
#!/bin/bash
while read -r line; do
items=($line) # split the line
strlen=${#items[1]} # get the 2nd item's length
echo $line $strlen # print the line and the length
done < test.txt
Results
> bash test.txt.sh
3 abcdef 6
3 abcd 4
3 fec 3
2 abc 3

Randomly sample lines retaining commented header lines

I'm attempting to randomly sample lines from a (large) file, while always retaining a set of "header lines". Header lines are always at the top of the file and unlike any other lines, begin with a #.
The actual file format I'm dealing with is a VCF, but I've kept the question general
Requirements:
Output all header lines (identified by a # at line start)
The command / script should (have the option to) read from STDIN
The command / script should output to STDOUT
For example, consider the following sample file (file.in):
#blah de blah
1
2
3
4
5
6
7
8
9
10
An example output (file.out) would be:
#blah de blah
10
2
5
3
4
I have a working solution (in this case selecting 5 non-header lines at random) using bash. It can read from STDIN (I can cat the contents of file.in into the rest of the command); however, it writes to a named file rather than STDOUT:
cat file.in | tee >(awk '$1 ~ /^#/' > file.out) | awk '$1 !~ /^#/' | shuf -n 5 >> file.out
By using process substitution (thanks Tom Fenech), both commands are seen as files.
Then using cat we can concatenate these "files" together and output to STDOUT.
cat <(awk '/^#/' file) <(awk '!/^#/' file | shuf -n 10)
Input
#blah de blah
1
2
3
4
5
6
7
8
9
10
Output
#blah de blah
1
9
8
4
7
2
3
10
6
5
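To satisfy the STDIN/STDOUT requirements in one reusable piece, the same split can be wrapped in a function that buffers the stream in a temporary file, since two passes over it are needed (a sketch; the function name is made up, and GNU shuf is assumed):

```shell
# Sketch: read a file on stdin; print all '#' header lines first,
# then n randomly chosen non-header lines, all on stdout.
sample_with_header() {
  local n=$1 tmp
  tmp=$(mktemp) || return 1
  cat > "$tmp"                         # buffer stdin for two passes
  grep '^#' "$tmp"                     # pass 1: the header lines
  grep -v '^#' "$tmp" | shuf -n "$n"   # pass 2: the random sample
  rm -f "$tmp"
}
```

Usage: sample_with_header 5 < file.in > file.out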

how to loop over pattern from a file with grep [duplicate]

This question already has answers here:
Looping through the content of a file in Bash
(16 answers)
Closed 7 years ago.
I am trying to grep a series of patterns that are, one per line, in a text file (list.txt), to find how many matches there are of each pattern in the target file (file1). The target file looks like this:
$ cat file1
2346 TGCA
2346 TGCA
7721 GTAC
7721 GTAC
7721 CTAC
And I need counts of each numerical pattern (2346 and 7721).
I have this script that works if you provide a list of the patterns in quotes:
$ for p in '7721' '2346'; do printf '%s = ' "$p"; grep -c "$p" file1; done
7721 = 3
2346 = 2
What I would like to do is search for all patterns in list.txt:
$ cat list.txt
7721
2346
6555
25425
22
125
....
19222
How can I convert my script above to look in the list.txt and search for each pattern, and return the same output as above (pattern = count) e.g.:
2346 = 2
7721 = 3
....
19222 = 6
Try this awk one-liner (it prints "pattern = count", with 0 for patterns that never match):
awk 'NR==FNR{p[$0];next}$1 in p{p[$1]++}END{for(x in p)print x" = "p[x]+0}' list.txt file1
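If you would rather keep your original loop and its exact output format, it adapts almost unchanged by reading the patterns from list.txt instead of listing them inline (a sketch; the function name is made up):

```shell
# Sketch: for each pattern in $2 (one per line), print
# "pattern = count" of matching lines in the target file $1.
count_patterns() {
  local target=$1 patterns=$2 p
  while IFS= read -r p; do
    printf '%s = ' "$p"
    grep -c "$p" "$target"
  done < "$patterns"
}
```

Usage: count_patterns file1 list.txt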

Unix code wanted to copy template file and replace strings in template file in the copied files

I have 2 files:
File_1.txt:
John
Mary
Harry
Bill
File_2.txt:
My name is ID, and I am on line NR of file 1.
I want to create four files that look like this:
Output_file_1.txt:
My name is John, and I am on line 1 of file 1.
Output_file_2.txt:
My name is Mary, and I am on line 2 of file 1.
Output_file_3.txt:
My name is Harry, and I am on line 3 of file 1.
Output_file_4.txt:
My name is Bill, and I am on line 4 of file 1.
Normally I would use the following sed command to do this:
for q in John Mary Harry Bill
do
sed 's/ID/'${q}'/g' File_2.txt > Output_file.txt
done
But that would only replace ID with the name, and not include the line number from File_1.txt. Unfortunately, my bash skills don't go much further than that... Any tips or suggestions for a command that uses both file 1 and file 2? I do need to include file 1, because the real files are much larger than in this example, but I'm hoping I can figure out the rest of the code once I know how to do it with this simpler example. Many thanks in advance!
How about:
n=1
while read q
do
sed -e "s/ID/${q}/g" -e "s/NR/$n/" File_2.txt > Output_file_${n}.txt
((n++))
done < File_1.txt
See the Advanced Bash Scripting Guide on redirecting input to code blocks, and maybe the section on double parentheses for further reading.
How about awk, instead?
[ghoti#pc ~]$ cat file1
John
Mary
[ghoti#pc ~]$ cat file2
Harry
Bill
[ghoti#pc ~]$ cat merge.txt
My name is %s, and I am on the line %s of file '%s'.
[ghoti#pc ~]$ cat doit.awk
#!/usr/bin/awk -f
BEGIN {
while (getline line < "merge.txt") {
fmt = fmt line "\n";
}
}
{
file="Output_File_" NR ".txt";
printf(fmt, $1, FNR, FILENAME) > file;
}
[ghoti#pc ~]$ ./doit.awk file1 file2
[ghoti#pc ~]$ grep . Output_File*txt
Output_File_1.txt:My name is John, and I am on the line 1 of file 'file1'.
Output_File_2.txt:My name is Mary, and I am on the line 2 of file 'file1'.
Output_File_3.txt:My name is Harry, and I am on the line 1 of file 'file2'.
Output_File_4.txt:My name is Bill, and I am on the line 2 of file 'file2'.
[ghoti#pc ~]$
If you really want your filenames to be numbered, we can do that too.
What's going on here?
The awk script BEGINs by reading in your merge.txt file and appending it to the variable "fmt", line by line (separated by newlines). This makes fmt a printf-compatible format string.
Then, for every line in your input files (specified on the command line), an output file is selected (NR is the current record count spanning all files). The printf() function replaces each %s in the fmt variable with one of its options. Output is redirected to the appropriate file.
The grep just shows you all the files' contents with their filenames.
This might work for you:
sed '=' File_1.txt |
sed '1{x;s/^/'"$(<File_2.txt)"'/;x};N;s/\n/ /;G;s/^\(\S*\) \(\S*\)\n\(.*\)ID\(.*\)NR\(.*\)/echo "\3\2\4\1\5" >Output_file_\1.txt/' |
bash
TXR:
$ txr merge.txr
My name is John, and I am on the line 1 of file1.
My name is Mary, and I am on the line 2 of file1.
My name is Harry, and I am on the line 3 of file1.
My name is Bill, and I am on the line 4 of file1.
merge.txr:
#(bind count #(range 1))
#(load "file2.txt")
#(next "file1.txt")
#(collect)
#name
#(template name #(pop count) "file1")
#(end)
file2.txt:
#(define template (ID NR FILE))
#(output)
My name is #ID, and I am on the line #NR of #FILE.
#(end)
#(end)
Read the names into an array.
Get the array length.
Iterate over the array.
Test preparation:
echo "John
Mary
Harry
Bill
" > names
Names and numbers:
name=($(<names))
max=$(( ${#name[@]} - 1 ))
for i in $(seq 0 $max) ; do echo $i":"${name[i]}; done
with template:
for i in $(seq 0 $max) ; do echo "My name is ID, and I am on the line NR of file 1." | sed "s/ID/${name[i]}/g;s/NR/$((i+1))/g"; done
My name is John, and I am on the line 1 of file 1.
My name is Mary, and I am on the line 2 of file 1.
My name is Harry, and I am on the line 3 of file 1.
My name is Bill, and I am on the line 4 of file 1.
A little modification is needed in your script. That's it.
pearl.306> cat temp.sh
#!/bin/ksh
count=1
cat file1|while read line
do
sed -e "s/ID/${line}/g" -e "s/NR/${count}/g" File_2.txt > Output_file_${count}.txt
count=$(($count+1))
done
pearl.307>
pearl.303> temp.sh
pearl.304> ls -l Out*
-rw-rw-r-- 1 nobody nobody 59 Mar 29 18:54 Output_file_1.txt
-rw-rw-r-- 1 nobody nobody 58 Mar 29 18:54 Output_file_2.txt
-rw-rw-r-- 1 nobody nobody 58 Mar 29 18:54 Output_file_3.txt
-rw-rw-r-- 1 nobody nobody 58 Mar 29 18:54 Output_file_4.txt
-rw-rw-r-- 1 nobody nobody 58 Mar 29 18:54 Output_file_5.txt
pearl.305> cat Out*
My name is linenumber11, and I am on the line 1 of file 1.
My name is linenumber2, and I am on the line 2 of file 1.
My name is linenumber1, and I am on the line 3 of file 1.
My name is linenumber4, and I am on the line 4 of file 1.
My name is linenumber6, and I am on the line 5 of file 1.
pearl306>
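If you want to avoid spawning sed once per name, the same substitution can also be sketched with bash's own parameter expansion (a sketch; the function name is made up, and the template file is assumed to contain the literal placeholders ID and NR):

```shell
# Sketch: for each name in the file $1 (one per line), replace ID and
# NR in the template file $2 and write Output_file_<n>.txt.
make_files() {
  local names=$1 template n=0 name out
  template=$(<"$2")
  while IFS= read -r name; do
    n=$((n+1))
    out=${template//ID/$name}              # every ID -> the name
    printf '%s\n' "${out//NR/$n}" > "Output_file_${n}.txt"
  done < "$names"
}
```

Usage: make_files File_1.txt File_2.txt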
