I need to sort data which resides in a txt file. The sample data is as follows:
======
Jhon
Doe
score -
------
======
Ann
Smith
score +
------
======
Will
Marrow
score -
------
I need to extract only the sections where "score +" appears, so the result should be:
======
Ann
Smith
score +
------
I would try this one:
$ grep -B3 -A1 "score +" myfile
It means: print three lines Before and one line After each match of "score +".
Sed can do it as follows:
sed -n '/^======/{:a;N;/\n------/!ba;/score +/p}' infile
======
Ann
Smith
score +
------
where -n suppresses automatic printing, and
/^======/ { # If the pattern space starts with "======"
:a # Label to branch to
N # Append next line to pattern space
/\n------/!ba # If we don't match "------", branch to :a
/score +/p # If we match "score +", print the pattern space
}
Things could be more properly anchored with /\n------$/, but there are spaces at the end of the lines, and I'm not sure if those are real or copy-paste artefacts – but this works for the example data.
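If those trailing spaces turn out to be real, a hedged variant of the same script that anchors the record terminator while tolerating trailing whitespace (still assuming GNU sed, which the semicolon-separated label syntax above already relies on):
sed -n '/^======/{:a;N;/\n------ *$/!ba;/score +/p}' infile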
Give this one-liner a try:
awk -v RS="==*" -F'\n' '{p=0;for(i=1;i<=NF;i++)if($i~/score \+/)p=1}p' file
with the given data, it outputs:
Ann
Smith
score +
------
The idea is to treat all the lines between the ====... separators as one multi-line record, check whether the record contains the search pattern, and print it if so.
With GNU awk for multi-char RS:
$ awk -v RS='=+\n' '/score \+/' file
Ann
Smith
score +
------
Given:
$ echo "$txt"
======
Jhon
Doe
score -
------
======
Ann
Smith
score +
------
======
Will
Marrow
score -
------
You can create a toggle-type match in awk to print only the section that you want:
$ echo "$txt" | awk '/^=+/{f=1;s=$0;next} /^score \+/{f=2} f {s=s"\n"$0} /^-+$/ {if(f==2) {print s} f=0}'
======
Ann
Smith
score +
------
Use Grep Context Flags
Assuming you have a truly fixed-format file, you can just use fgrep (or GNU or BSD grep with the speedy --fixed-strings flag) along with the --before-context and --after-context flags. For example:
$ fgrep -A1 -B3 'score +' /tmp/foo
======
Ann
Smith
score +
------
The flags will find your match, and include the three lines before and one line after each match. This gives you the output you're after, but with a lot less complexity than a sed or awk script. YMMV.
Let's say I have two large files. The first one looks like this:
File1:
N 4764
56.067000 50.667000 24.026000
HT1 4765
55.129000 51.012000 24.198000
HT2 4766
56.059000 50.183000 23.126000
and the second one:
File2:
N NH2 -0.850000
HT1 H 0.222000
HT2 H 0.222000
I would like to replace all the N, HT1, and so on in the first file with their matches in the second file (the second column of file2), so the outcome would be:
Outcome:
NH2 4764
56.067000 50.667000 24.026000
H 4765
55.129000 51.012000 24.198000
H 4766
56.059000 50.183000 23.126000
I have been trying to do it with sed but it has not worked yet. Maybe awk is a better option?
Edit: my initial examples looked confusing, so I changed them to the actual files I am dealing with. These are just three lines of each file.
If the first field of both files is sorted, then a simple join command will give you the expected result (the output below reflects the question's original example data):
join -o 2.2,1.2,1.3 file1.txt file2.txt
A3 125 111
B1 132 195
C56 145 695
D3 177 1001
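If the inputs are not already sorted, one sketch (assuming bash, for its process substitution) sorts them on the fly before joining:
join -o 2.2,1.2,1.3 <(sort file1.txt) <(sort file2.txt)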
Alternatively, you can use awk:
awk '
FNR == NR { arr[$1] = $2 OFS $3; next }
$1 in arr { print $2,arr[$1] }
' file1.txt file2.txt
One awk idea:
awk '
FNR==NR { a[$1]=$2; next } # 1st file: save entries in array
$1 in a { $1=a[$1] } # 2nd file: if $1 is an index in array then replace $1 with match from array
1 # print current line
' File2 File1
This generates:
NH2 4764
56.067000 50.667000 24.026000
H 4765
55.129000 51.012000 24.198000
H 4766
56.059000 50.183000 23.126000
NOTE: assumes spacing does not need to be maintained in the lines undergoing a replacement
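If spacing does matter, a hedged variant that uses sub() to swap only the first field's text and leaves the rest of the line byte-for-byte intact (assuming the keyed lines have no leading whitespace):
awk '
FNR==NR { a[$1]=$2; next }                 # 1st file: save entries in array
$1 in a { sub(/^[^[:space:]]+/, a[$1]) }   # replace just the first token in place
1                                          # print current line
' File2 File1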
Here is the code; the CN part is working, but the awk part is not. I ran the pieces outside the function and they seem really clear. I have only just met bash :(
windowsearch()
{
starting_line_number=$1
ending_line_number=$2
position=$3
file_name=$4
CN=$(head -40 "$4" | sed -n "$starting_line_number","$ending_line_number p" )
awk -v CN="$CN" -F "\t" '{ print CN }' "$file_name" | sort -n -k"$position"
}
windowsearch 10 20 2 $imdbdir/tsv2/title.principals.tsv
The desired output should be:
tt0000009 nm0085156,nm0063086,nm1309758,nm0183823
tt0000014 nm0166380,nm0525910,nm0244989
tt0000010 nm0525910
tt0000016 nm0525910
tt0000012 nm0525910,nm0525908
tt0000015 nm0721526
tt0000018 nm0804434,nm3692071
tt0000019 nm0932055
tt0000013 nm1715062,nm0525910,nm0525908
tt0000017 nm3691272,nm0804434,nm1587194,nm3692829
tt0000011 nm3692297,nm0804434
But my output gives me all the data in the file, so I think my filter doesn't work.
Edit: sorry for the misunderstanding, this is my first question.
Your question lacks a description of the task and, ideally, examples of input data and desired output. It is hard to guess someone’s intentions from a completely broken script snippet. A possible wild guess might be:
windowsearch() {
    awk "NR > ${2} {exit}
         NR >= ${1}" < "$4" | sort -k "$3"
}
The awk code exits after it exceeds the upper limit on line numbers and prints entire lines after it reaches the lower limit. (NR is the current line number.) The output from awk (which is the interval of lines between the lower and upper limit) gets sorted (which awk itself can do as well, but using sort was shorter in this case).
Example (sort /etc/fstab lines 9 through 13 by mount point (field 2)):
windowsearch 9 13 2 /etc/fstab
My interpretation of your intention: you want to sort a range of lines from a file based on a given column.
$ awk -v start=10 -v end=20 'start<=NR && NR<=end' file | sort -n -k2
Just parametrize the input values in your script, as sketched below.
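A sketch of that parametrization, reusing the argument order from the question (start line, end line, sort column, file):
windowsearch() {
    awk -v start="$1" -v end="$2" 'start<=NR && NR<=end' "$4" | sort -n -k"$3"
}
windowsearch 10 20 2 "$imdbdir/tsv2/title.principals.tsv"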
I have written simple code that takes data from a text file (which has space-separated columns and 1.5 million rows) and outputs a file containing the specified column. But this code takes more than an hour to execute. Can anyone help me optimize the runtime?
a=0
cat 1c_input.txt/$1 | while read p
do
IFS=" "
for i in $p
do
a=`expr $a + 1`
if [ $a -eq $2 ]
then
echo "$i"
fi
done
a=0
done >> ./1.c.$2.column.freq
some lines of sample input:
1 ib Jim 34
1 cr JoHn 24
1 ut MaRY 46
2 ti Jim 41
2 ye john 6
2 wf JoHn 22
3 ye jOE 42
3 hx jiM 21
some lines of sample output if the second argument entered is 3:
Jim
JoHn
MaRY
Jim
john
JoHn
jOE
jiM
I guess you are trying to print just one column; in that case do something like:
#! /bin/bash
awk -v c="$2" '{print $c}' 1c_input.txt/$1 >> ./1.c.$2.column.freq
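Assuming the script is saved as, say, extract_column.sh (the name is just for illustration), you would invoke it with the file name (under 1c_input.txt/) and the column number, and the column values end up appended to ./1.c.3.column.freq:
bash extract_column.sh myfile 3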
If you just want something faster, use a utility like cut. To extract the third field from a single-space-delimited file bigfile, do:
cut -d ' ' -f 3 bigfile
To optimize the shell code in the question, using only builtin shell
commands, do something like:
while read -r a b c d; do echo "$c"; done < bigfile
If the field to be printed is a command-line parameter, there are several shell methods, but they're all based on that line; one sketch follows.
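For a parameterized field number, a builtin-only sketch reads each line into an array and indexes it (n is an assumed variable holding the 1-based field number):
n=3
while read -r -a fields; do
    echo "${fields[n-1]}"   # bash arrays are 0-indexed, so field n is element n-1
done < bigfile
Note this relies on bash's read -a, so it is not portable to plain POSIX sh.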
I have a file containing a lot of books, each with an index number. I want to search for books by index number. The file format is something like this:
"The Declaration of Independence of the United States of America,
1
by Thomas Jefferson"
......................
Alice's Adventures in Wonderland, by Lewis Carroll
11
#!/bin/bash
echo "Enter the content your are searching for:"
read content
echo -e "\nResult Showing For: $content\n"
grep "$content" GUTINDEX.ALL
If the user searches for 1, this code prints 1, 11, and every other line that has a 1 in it. I want to print only the line whose index is exactly 1:
"The Declaration of Independence of the United States of America, 1
Simply use the -w flag (read more at grep --help):
grep -w ${line_number} ${file_name}
For example, grep -w 1 books gives:
The Declaration of Independence of the United States of America 1
Bobs's 1 in Wonderland, by Lewis Carroll 11
That may still catch book names that contain the number, so it is better to use a regex such as [${digit}]$ (for example [1]$) to match the index at the end of the line:
grep -w "[${line_number}]$" "${file_name}"
For example, grep -w '1$' books gives:
The Declaration of Independence of the United States of America, 1
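Another option, a sketch that assumes the index number is the last whitespace-separated field on the line (as in the examples above), is an exact field comparison in awk, which avoids regex escaping entirely:
awk -v n=1 '$NF == n' books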
You need to use a regex. Change grep to egrep.
file:
1
11
111
If you want to match only 1, then you can use:
egrep "^1$" file   # matches lines consisting of exactly 1 (anchored at start and end)
Then you need to extend the script. For example, given file.txt:
abc,1
abd,111
abf,11111
while read -r line; do
    # grep's exit status tells us whether field 2 is exactly 1
    res=$(echo "${line}" | awk -v FS=',' '{print $2}' | grep "^1$")
    if [ $? -eq 0 ]; then
        echo "$line"
    fi
done < file.txt
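For what it's worth, the same filtering can be collapsed into a single awk invocation (a sketch, assuming the comma-separated format of file.txt above):
awk -F',' '$2 == "1"' file.txt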
Good day. I've been trying to sort the following data from a txt file using a shell script, but so far I've been unable to do so. Here is what the data in the file looks like:
Name:ID:Date
Clinton Mcdaniel:100:16/04/2016
Patience Mccarty:101:18/03/2013
Carol Holman:102:24/10/2013
Roth Lamb:103:11/02/2015
Chase Gardner:104:14/06/2014
Jacob Tucker:105:05/11/2013
Maite Barr:106:24/04/2014
Acton Galloway:107:18/01/2013
Helen Orr:108:10/05/2014
Avye Rose:109:07/06/2014
What I want to do is sort this by date instead of by name or ID. When I execute the following code I get this:
Code:
sort -t "/" -k3.9 -k3.4 -k3
Result:
Acton Galloway:107:18/01/2013
Amaya Lynn:149:11/08/2013
Anne Sullivan:190:12/01/2013
Bruno Hood:169:01/08/2013
Cameron Phelps:187:17/11/2013
Carol Holman:102:24/10/2013
Chaney Mcgee:183:11/09/2013
Drew Fowler:173:28/07/2013
Hadassah Green:176:17/01/2013
Jacob Tucker:105:05/11/2013
Jenette Morgan:160:28/11/2013
Lael Aguirre:148:29/05/2013
Lareina Morin:168:06/05/2013
Laura Mercado:171:06/06/2013
Leonard Richard:154:02/06/2013
As you can see, it only sorts by the year; the months and everything else are still a little out of place. Does anyone know how to correctly sort this by date?
EDIT:
Well, I've found out how to do it; answer below:
Code: sort -n -t":" -k3.9 -k3.4,3.5 -k3
Result:
Anne Sullivan:190:12/01/2013
Hadassah Green:176:17/01/2013
Acton Galloway:107:18/01/2013
Nasim Gonzalez:163:18/01/2013
Patience Mccarty:101:18/03/2013
Sacha Stevens:164:01/04/2013
Lareina Morin:168:06/05/2013
Lael Aguirre:148:29/05/2013
Leonard Richard:154:02/06/2013
Laura Mercado:171:06/06/2013
Drew Fowler:173:28/07/2013
Bruno Hood:169:01/08/2013
Virginia Puckett:144:08/08/2013
Moses Mckay:177:09/08/2013
Amaya Lynn:149:11/08/2013
Chaney Mcgee:183:11/09/2013
Willa Bond:153:22/09/2013
Oren Flores:184:27/09/2013
Olga Buckley:181:11/10/2013
Carol Holman:102:24/10/2013
Jacob Tucker:105:05/11/2013
Veda Gillespie:125:09/11/2013
Thor Workman:152:12/11/2013
Cameron Phelps:187:17/11/2013
Jenette Morgan:160:28/11/2013
Mason Contreras:129:29/12/2013
Martena Sosa:158:30/12/2013
Vivian Stevens:146:20/01/2014
Benedict Massey:175:02/03/2014
Macey Holden:127:01/04/2014
Orla Estrada:174:06/04/2014
Maite Barr:106:24/04/2014
Helen Orr:108:10/05/2014
Randall Colon:199:27/05/2014
Avye Rose:109:07/06/2014
Cleo Decker:117:12/06/2014
Chase Gardner:104:14/06/2014
Mark Lynn:113:21/06/2014
Geraldine Solis:197:24/06/2014
Thor Wheeler:180:25/06/2014
Aimee Martin:192:21/07/2014
Gareth Cervantes:166:26/08/2014
Serena Fernandez:122:24/09/2014
The sort you are using will fail for any date before year 2000 (e.g. 1999 will sort after 2098). Continuing from your question in the comment, you currently show
sort -n -t":" -k3.9 -k3.4,3.5 -k3
You should use
sort -n -t":" -k3.7 -k3.4,3.5 -k3.1,3.2
Explanation:
Your -t separates the fields on each colon (':'). The -k KEYDEF option takes KEYDEF in the form f[.c][opt] (that is, field.character, with an optional flag; no separator is needed before the option). Your date field (field 3) is:
d  d  /  m  m  /  y  y  y  y
1  2  3  4  5  6  7  8  9  10   -- characters counting from 1 in field 3
So you first sort by -k3.9 (the 9th character in field 3), which is only the last two digits of the 4-digit year. You really want to sort on -k3.7 (the start of the 4-digit year).
You next sort by the month (characters 4,5) which is fine.
Lastly, you sort on -k3 (which fails to limit the characters considered). Just as you have limited the sort on the month to chars 4,5, you should limit the sort of the days to characters 1,2.
Putting that together gives you sort -n -t":" -k3.7 -k3.4,3.5 -k3.1,3.2. Hope that answers your question from the comment.
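As a quick sanity check of the corrected keys on a pre-2000 date (the two sample lines here are made up for illustration):
$ printf 'A:1:02/03/1999\nB:2:01/02/2000\n' | sort -n -t":" -k3.7 -k3.4,3.5 -k3.1,3.2
A:1:02/03/1999
B:2:01/02/2000
With the original -k3.9, the 1999 line would have sorted after the 2000 line, since only the last two digits (99 vs 00) would be compared.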
You're hamstrung by your (terrible, IMO) date format. Here's a bit of a Schwartzian transform:
awk -F'[:/]' '{printf "%s%s%s %s\n", $NF, $(NF-1), $(NF-2), $0}' file | sort -n | cut -d' ' -f2-
That extracts the year, month, and day and prepends them as a single word at the start of each line; then you can sort quite simply, and finally cut discards that synthetic date.