Hi, I have a tab-delimited text file with two columns, age and name, and 50 rows.
I want to find the rows where the age is "eighteen" and output them to a new file.
cat old_filename | grep "eighteen" > new_filename
When I try this, it does not output all the matching rows into the new file, only the first instance of "eighteen" it finds.
grep can search inside files directly, and by default it matches every line containing the pattern.
So if the input was:
name age
tom eighteen
joe sixteen
sam eighteen
grep "eighteen" old_filename > new_filename
new_filename would then contain:
tom eighteen
sam eighteen
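Note that grep matches substrings anywhere on the line, so a name that happens to contain "eighteen" would also match. A field-exact sketch with awk (assuming, as in the example above, that the file is tab-delimited with age in the second column):

```shell
# Hypothetical sample data: name<TAB>age, matching the example above.
printf 'tom\teighteen\njoe\tsixteen\nsam\teighteen\n' > old_filename

# Keep only rows whose second (age) field is exactly "eighteen".
awk -F'\t' '$2 == "eighteen"' old_filename > new_filename
```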
I have 2 text files as below
A.txt (with 2 rows):
abc-1234
tik-3456
B.txt (with 4 rows)
123456
234567
987
12
I want to combine these 2 to get the below file in CSV format:
column-1 column-2
abc-1234 123456
tik-3456 234567
987
12
I am trying the command below; however, it does not produce the above result.
paste -d "," A.txt B.txt > C.csv
It gives the following output:
abc-1234
,123456
tik-3456,234567
,987
,12
Can anyone please let me know what I am missing here?
In Linux we have utilities that each do one thing very well. So:
paste merges files
column with -t creates tables
The following:
paste -d',' A.txt B.txt | column -t -N 'column-1,column-2' -s',' -o' '
outputs the desired result.
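If column -N is unavailable (it needs a fairly recent util-linux), the header row can simply be prepended by hand. A minimal sketch with the sample data from the question:

```shell
# Recreate the sample inputs from the question.
printf 'abc-1234\ntik-3456\n' > A.txt
printf '123456\n234567\n987\n12\n' > B.txt

# Prepend a header row, then merge the columns with a comma delimiter.
{ echo 'column-1,column-2'; paste -d',' A.txt B.txt; } > C.csv
```

Incidentally, broken output like abc-1234 on its own line followed by ,123456 is often a sign of stray carriage returns (CRLF line endings) in one of the input files; running tr -d '\r' over the inputs first may fix it.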
I'm very new to shell scripting and wasn't sure how to go about doing this.
Suppose I have two files:
file1.csv | file2.csv
--------------------
Apples Apples
Dogs Dogs
Cats Cats
Grapes Oranges
Batman Thor
Borgs Daleks
Kites Kites
Blah Blah
xyz xyz
How do I only keep the differences in each file, and 2 lines above the start of the differences, and 2 lines after? For example, the output would be:
file1.csv | file2.csv
-----------------------
Dogs Dogs
Cats Cats
Grapes Oranges
Batman Thor
Borgs Daleks
Kites Kites
Blah Blah
Thank you very much!
This is a job for diff.
diff -u2 file1.csv file2.csv | sed '1,3d;/@@/,+2d' > diff
The diff command will produce a patch style difference containing meta information of the files in the form:
--- file1.csv 2017-05-12 15:21:47.564801174 -0700
+++ file2.csv 2017-05-12 15:21:52.462801174 -0700
@@ -2,7 +2,7 @@
Any block of differences will have a header like @@ -2,7 +2,7 @@. We want to throw these away using sed.
1,3d - means delete the top 3 lines
/@@/,+2d - delete any line containing @@ and the next 2 lines after it. This is not needed for your single-hunk case, but it is good to include in case your input has multiple blocks of differences.
The above command produces this list:
Dogs
Cats
-Grapes
-Batman
-Borgs
+Oranges
+Thor
+Daleks
Kites
Blah
Each line has a one-character prefix: ' ' is common to both files, '-' is only in file1.csv, and '+' is only in file2.csv. Now all we need is to distribute these lines back to the 2 files.
sed '/^+.*/d;s/^.//' diff > file1.csv
sed '/^-.*/d;s/^.//' diff > file2.csv
The sed commands here will filter the file and write the proper contents to each of the input files.
/^+.*/d - lines starting with '+' will be deleted.
s/^.// - will remove the 1 character prefix which was added by diff.
/^-.*/d - lines starting with '-' will be deleted.
Finally, remove the temporary file diff.
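Putting it all together on the sample data (a sketch assuming GNU sed, which supports the addr,+N address form):

```shell
# Recreate the sample files from the question.
printf '%s\n' Apples Dogs Cats Grapes Batman Borgs Kites Blah xyz > file1.csv
printf '%s\n' Apples Dogs Cats Oranges Thor Daleks Kites Blah xyz > file2.csv

# Unified diff with 2 context lines; drop the two file headers and the
# first hunk header (lines 1-3), plus any later hunk headers and their
# 2 following lines.
diff -u2 file1.csv file2.csv | sed '1,3d;/@@/,+2d' > diff

# Distribute: drop '+' lines for file1, '-' lines for file2,
# then strip the one-character prefix.
sed '/^+/d;s/^.//' diff > file1.csv
sed '/^-/d;s/^.//' diff > file2.csv
rm diff
```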
I have a folder, my_folder, containing over 800 files named myfile_*.dat, where * is the unique ID of each file. Each file contains a variety of repeated fields, but the one I am interested in is the <rating> field. Lines of this field look like <rating>n, where n is the rating score. I have a script which sums up all of the ratings per file, but now I must divide that sum by the number of lines containing <rating>n in order to obtain an average rating per file. Here is my script:
dir=$1
cd $dir
grep -P -o '(?<=<rating>).*' * |awk -F: '{A[$1]+=$2;next}END{for(i in A){print i,A[i]}}'|sort -nr -k2
I figure that I would use grep -c '<rating>' myfile_*.dat to count the number of matching lines and then divide the sum by this count per file, but I do not know where this fits in my script. Any suggestions are appreciated.
My script takes the folder name as an argument in the command line.
INPUT FILE
<Overall Rating>
<Avg. Price>$155
<URL>
<Author>Jeter5
<Content>I hope we're not disappointed! We enjoyed New Orleans...
<Date>Dec 19, 2008
<No. Reader>-1
<No. Helpful>-1
<rating>4
<Value>-1
<Rooms>3
<Location>5
<Cleanliness>3
<Check in / front desk>5
<Service>5
<Business service>5
<Author>...
repeat fields again...
Just set up another array, L, to track the count of items:
grep -P -o '(?<=<rating>).*' * |
awk -F: '{A[$1]+=$2;L[$1]++;next}END{for(i in A){print i,A[i],A[i]/L[i]}}' |
sort -nr -k2
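grep -P needs PCRE support compiled into grep; if that is unavailable, the whole thing can be done in awk alone by splitting on the literal <rating> tag (a sketch, assuming the field layout shown above):

```shell
# Hypothetical sample files standing in for myfile_*.dat.
printf '<Author>a\n<rating>4\n<rating>2\n' > myfile_1.dat
printf '<Author>b\n<rating>5\n' > myfile_2.dat

# With FS set to "<rating>", matching lines have NF > 1 and the score
# in $2. Sum and count per file, then print filename, sum, and average.
awk -F'<rating>' 'NF > 1 { A[FILENAME] += $2; L[FILENAME]++ }
    END { for (f in A) print f, A[f], A[f] / L[f] }' myfile_*.dat |
sort -nr -k2
```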
Cut a file into several files according to numbers in a list:
$ wc -l all.txt
8500 all.txt
$ wc -l STS.*.txt
2000 STS.input.answers-forums.txt
1500 STS.input.answers-students.txt
2000 STS.input.belief.txt
1500 STS.input.headlines.txt
1500 STS.input.images.txt
How do I split my all.txt according to the line counts of the STS.*.txt files and then save the pieces to the respective STS.output.*.txt files?
I've been doing it manually as such:
$ sed '1,2000!d' all.txt > STS.output.answers-forums.txt
$ sed '2001,3500!d' all.txt > STS.output.answers-students.txt
$ sed '3501,5500!d' all.txt > STS.output.belief.txt
$ sed '5501,7000!d' all.txt > STS.output.headlines.txt
$ sed '7001,8500!d' all.txt > STS.output.images.txt
The all.txt input would look something like this:
$ head all.txt
2.3059
2.2371
2.1277
2.1261
2.0576
2.0141
2.0206
2.0397
1.9467
1.8518
Or sometimes all.txt looks like this:
$ head all.txt
2.3059 92.123
2.2371 1.123
2.1277 0.12452
2.1261123 213
2.0576 100
2.0141 0
2.02062 1
2.03972 34.123
1.9467 9.23
1.8518 9123.1
As for the STS.*.txt, they are just plain text lines, e.g.:
$ head STS.output.answers-forums.txt
The problem likely will mean corrective changes before the shuttle fleet starts flying again. He said the problem needs to be corrected before the space shuttle fleet is cleared to fly again.
The technology-laced Nasdaq Composite Index .IXIC inched down 1 point, or 0.11 percent, to 1,650. The broad Standard & Poor's 500 Index .SPX inched up 3 points, or 0.32 percent, to 970.
"It's a huge black eye," said publisher Arthur Ochs Sulzberger Jr., whose family has controlled the paper since 1896. "It's a huge black eye," Arthur Sulzberger, the newspaper's publisher, said of the scandal.
I wish you'd posted some sample input for splitting an input file of, say, 10 lines into output files of, say, 2, 3, and 5 lines instead of 8500 lines, as that would have given us something to test a solution against. Oh well, this might work, but it is untested of course:
awk '
ARGIND < (ARGC-1) { outfile[NR] = gensub(/input/,"output",1,FILENAME); next }
{ print > outfile[FNR] }
' STS.input.* all.txt
The above used GNU awk for ARGIND and gensub().
It just creates an array that maps each line number across all "input" files to the name of the "output" file that the same line number of all.txt should be written to.
Any time you write a loop in shell just to manipulate text, you have the wrong approach. The people who created shell also created awk for shell to call when manipulating text, so just do that.
I would suggest writing a loop:
total=0
for file in answers-forums answers-students belief headlines images; do
    lines=$(wc -l < "STS.input.$file.txt")
    sed "$(( total + 1 )),$(( total + lines ))!d" all.txt > "STS.output.$file.txt"
    (( total += lines ))
done
total keeps track of how many lines have been read so far. The sed command extracts the lines from total + 1 through total + lines, writing them to the corresponding output file.
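A toy run of the same loop (hypothetical 2- and 3-line inputs, written POSIX-style with total=$((...)) in place of the bash-only (( ... ))):

```shell
# Toy inputs: only their line counts matter for the split.
printf 'x\ny\n' > STS.input.small.txt     # 2 lines
printf 'x\ny\nz\n' > STS.input.big.txt    # 3 lines
printf '%s\n' a b c d e > all.txt         # 5 lines total

total=0
for file in small big; do
    lines=$(wc -l < "STS.input.$file.txt")
    sed "$(( total + 1 )),$(( total + lines ))!d" all.txt > "STS.output.$file.txt"
    total=$(( total + lines ))
done
```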
I have 2 files:
File_1.txt:
John
Mary
Harry
Bill
File_2.txt:
My name is ID, and I am on line NR of file 1.
I want to create four files that look like this:
Output_file_1.txt:
My name is John, and I am on line 1 of file 1.
Output_file_2.txt:
My name is Mary, and I am on line 2 of file 1.
Output_file_3.txt:
My name is Harry, and I am on line 3 of file 1.
Output_file_4.txt:
My name is Bill, and I am on line 4 of file 1.
Normally I would use the following sed command to do this:
for q in John Mary Harry Bill
do
sed 's/ID/'${q}'/g' File_2.txt > Output_file.txt
done
But that would only replace ID with the name, and would not include the line number from File_1.txt. Unfortunately, my bash skills don't go much further than that. Any tips or suggestions for a command that uses both file 1 and file 2? I do need to include file 1, because the real files are much larger than in this example, but I'm hoping I can figure out the rest of the code once I know how to do it with this simpler example. Many thanks in advance!
How about:
n=1
while read q
do
sed -e "s/ID/${q}/g" -e "s/NR/${n}/" File_2.txt > "Output_file_${n}.txt"
((n++))
done < File_1.txt
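Run against the sample files, the loop looks like this (a POSIX-sh-friendly sketch with the expansions quoted):

```shell
# Recreate the sample inputs from the question.
printf '%s\n' John Mary Harry Bill > File_1.txt
echo 'My name is ID, and I am on line NR of file 1.' > File_2.txt

# For each name, substitute ID and NR in the template and write
# one numbered output file.
n=1
while read q
do
    sed -e "s/ID/$q/g" -e "s/NR/$n/g" File_2.txt > "Output_file_$n.txt"
    n=$(( n + 1 ))
done < File_1.txt
```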
See the Advanced Bash Scripting Guide on redirecting input to code blocks, and maybe the section on double parentheses for further reading.
How about awk, instead?
[ghoti@pc ~]$ cat file1
John
Mary
[ghoti@pc ~]$ cat file2
Harry
Bill
[ghoti@pc ~]$ cat merge.txt
My name is %s, and I am on the line %s of file '%s'.
[ghoti@pc ~]$ cat doit.awk
#!/usr/bin/awk -f
BEGIN {
while (getline line < "merge.txt") {
fmt = fmt line "\n";
}
}
{
file="Output_File_" NR ".txt";
printf(fmt, $1, FNR, FILENAME) > file;
}
[ghoti@pc ~]$ ./doit.awk file1 file2
[ghoti@pc ~]$ grep . Output_File*txt
Output_File_1.txt:My name is John, and I am on the line 1 of file 'file1'.
Output_File_2.txt:My name is Mary, and I am on the line 2 of file 'file1'.
Output_File_3.txt:My name is Harry, and I am on the line 1 of file 'file2'.
Output_File_4.txt:My name is Bill, and I am on the line 2 of file 'file2'.
[ghoti@pc ~]$
If you really want your filenames to be numbered, we can do that too.
What's going on here?
The awk script BEGINs by reading your merge.txt file and appending it, line by line (separated by newlines), to the variable fmt. This makes fmt a printf-compatible format string.
Then, for every line in the input files (specified on the command line), an output file is selected (NR is the current record count spanning all input files). The printf() function replaces each %s in the fmt string with one of its arguments, and output is redirected to the appropriate file.
The grep just shows you all the files' contents with their filenames.
This might work for you:
sed '=' File_1.txt |
sed '1{x;s/^/'"$(<File_2.txt)"'/;x};N;s/\n/ /;G;s/^\(\S*\) \(\S*\)\n\(.*\)ID\(.*\)NR\(.*\)/echo "\3\2\4\1\5" >Output_file_\1.txt/' |
bash
TXR:
$ txr merge.txr
My name is John, and I am on the line 1 of file1.
My name is Mary, and I am on the line 2 of file1.
My name is Harry, and I am on the line 3 of file1.
My name is Bill, and I am on the line 4 of file1.
merge.txr:
#(bind count #(range 1))
#(load "file2.txt")
#(next "file1.txt")
#(collect)
#name
#(template name #(pop count) "file1")
#(end)
file2.txt:
#(define template (ID NR FILE))
#(output)
My name is #ID, and I am on the line #NR of #FILE.
#(end)
#(end)
Read the names into an array.
Get the array length.
Iterate over the array.
Test preparation:
echo "John
Mary
Harry
Bill
" > names
Names and numbers:
name=($(<names))
max=$(( ${#name[@]} - 1 ))
for i in $(seq 0 $max) ; do echo $i":"${name[i]}; done
with template:
for i in $(seq 0 $max) ; do echo "My name is ID, and I am on the line NR of file 1." | sed "s/ID/${name[i]}/g;s/NR/$((i+1))/g"; done
My name is John, and I am on the line 1 of file 1.
My name is Mary, and I am on the line 2 of file 1.
My name is Harry, and I am on the line 3 of file 1.
My name is Bill, and I am on the line 4 of file 1.
Only a little modification is needed in your script. That's it.
pearl.306> cat temp.sh
#!/bin/ksh
count=1
cat file1|while read line
do
sed -e "s/ID/${line}/g" -e "s/NR/${count}/g" File_2.txt > Output_file_${count}.txt
count=$(($count+1))
done
pearl.307>
pearl.303> temp.sh
pearl.304> ls -l Out*
-rw-rw-r-- 1 nobody nobody 59 Mar 29 18:54 Output_file_1.txt
-rw-rw-r-- 1 nobody nobody 58 Mar 29 18:54 Output_file_2.txt
-rw-rw-r-- 1 nobody nobody 58 Mar 29 18:54 Output_file_3.txt
-rw-rw-r-- 1 nobody nobody 58 Mar 29 18:54 Output_file_4.txt
-rw-rw-r-- 1 nobody nobody 58 Mar 29 18:54 Output_file_5.txt
pearl.305> cat Out*
My name is linenumber11, and I am on the line 1 of file 1.
My name is linenumber2, and I am on the line 2 of file 1.
My name is linenumber1, and I am on the line 3 of file 1.
My name is linenumber4, and I am on the line 4 of file 1.
My name is linenumber6, and I am on the line 5 of file 1.
pearl306>