Create final column containing row numbers in text file - bash

I am new to using the Mac terminal. I need to add a tab delimited column to a text file with 3 existing columns. The columns look pretty much like this:
org1 1-20 1-40
org2 3-35 6-68
org3 16-38 40-16
etc.
I need them to look like this:
org1 1-20 1-40 1
org2 3-35 6-68 2
org3 16-38 40-16 3
etc.
My apologies if this question has been covered. Answers to similar questions are sometimes exceedingly esoteric and are not easily translatable to this specific situation.

In awk, print the record followed by a tab and the row number:
$ awk '{print $0 "\t" NR }' foo
org1 1-20 1-40 1
org2 3-35 6-68 2
org3 16-38 40-16 3
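To store the numbered result back in the file, awk's output can go to a temporary file that is then copied over the original. This is only a minimal sketch for a POSIX shell; the function name `append_row_numbers` and the temporary-file template are hypothetical, not part of the answer above:

```shell
# Hypothetical helper: append a tab and the row number to every line of FILE,
# then overwrite FILE with the result.
append_row_numbers() {
  file=$1
  tmp=$(mktemp "${TMPDIR:-/tmp}/rows.XXXXXX") || return 1
  # $0 is the whole record, NR is the number of the current record (row)
  awk '{ print $0 "\t" NR }' "$file" > "$tmp" &&
    cat "$tmp" > "$file"      # copying back keeps the permissions of FILE
  rm -f "$tmp"
}

# Usage: append_row_numbers foo
```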

If you want to add the line numbers to the last column:
perl -i -npe 's/$/"\t$."/e' file
where
-i replaces the file in-place (remove it if you want to print the result to standard output);
-n loops over each line of the file without printing it, like sed -n; here it is overridden by -p;
-p loops over each line and prints it after the expression has run, just like sed;
-e accepts Perl expression;
s/.../.../e substitutes the first part to the second (delimited with slash), and the e flag causes Perl to evaluate the replacement as Perl expression;
$ is the end-of-line anchor;
$. variable keeps the number of the current line
In other words, the command replaces the end of the line ($) with a tab followed by the line number $..

You can paste the file next to the same file with line numbers prepended (nl), and all the other columns removed (cut -f 1):
$ paste infile <(nl infile | cut -f 1)
org1 1-20 1-40 1
org2 3-35 6-68 2
org3 16-38 40-16 3
The <(...) construct is called process substitution and basically allows you to treat the output of a command like a file.
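By default nl right-justifies its numbers, so the pasted column can carry leading spaces, and it leaves blank lines unnumbered. A minimal sketch that strips the padding, numbers every line, and avoids process substitution for shells that lack it; the function name `number_last_column` is hypothetical:

```shell
# Hypothetical helper: print FILE with a tab-separated line-number column appended.
number_last_column() {
  # nl -b a numbers every line, blank ones included; cut keeps only the number;
  # tr removes the spaces nl pads it with; paste reads the numbers from stdin (-).
  nl -b a "$1" | cut -f 1 | tr -d ' ' | paste "$1" -
}

# Usage: number_last_column infile
```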

Is there anyway to insert new lines in-between two patterns

Is there anyway to insert new lines in-between 2 specific patterns of characters? I want to insert a new line every time "butterfly" occurs in a text file, however I want this new line to be inserted between the "butter" and "fly". For example butter\nfly
I also want to find the length of each line after splitting.
Eg:
if textfile contains:
fgsccgewvdhbejbecbecboubutterflybvdcvhkebcjl
vdjchvhecbihbutterflyglehblejkbedkbutterflyr
Then, I want a result like the following:
29 fgsccgewvdhbejbecbecboubutter
33 flybvdcvhkebcjlvdjchvhecbihbutter
22 flyglehblejkbedkbutter
4 flyr
I believe one way to tackle it would be to insert a new line using "sed" everywhere "butter" occurs and is followed by "fly". Strip out all blank line using grep with a -v flag. Then get the length of each line. However, even after trying a lot, I am unable to get the correct answer.
sed's s command and awk can work together (the \n in the replacement requires GNU sed):
sed -e "s/butterfly/butter\\nfly/g" < input.txt | awk '{ print length, $0 }'
This might work for you (GNU sed & bash):
sed -Ez 's/\n//g;s/(butter)(fly)/\1\n\2/g;s/^.*$/l=&;printf "%d %s\n" ${#l} &/meg' file
Slurp the file into memory with sed's -z option. Remove all existing newlines, then insert new ones between butter and fly. The m, g and e flags of the substitute command then treat each resulting line separately and turn it into a shell command: the line is assigned to a variable l, and printf prints its length and content in the required format.
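The expected output in the question joins pieces across the original line breaks, which the first sed and awk pipeline does not do. A portable sketch that joins the input first, assuming the file fits comfortably on one line in memory; the function name `split_butterfly` is hypothetical:

```shell
# Hypothetical helper: join all input lines, break between "butter" and "fly",
# and print each piece prefixed by its length.
split_butterfly() {
  # tr joins the input into one long line before awk sees it
  tr -d '\n' |
    awk '{
      gsub(/butterfly/, "butter\nfly")    # insert a newline inside every match
      n = split($0, piece, "\n")          # one array element per output line
      for (i = 1; i <= n; i++) print length(piece[i]), piece[i]
    }'
}

# Usage: split_butterfly < input.txt
```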

Use grep to print only the context

Using grep, you can print lines that match your search query. Adding the -C 2 option also prints two lines of surrounding context, like this:
> grep -C 2 'lorem'
some context
some other context
**lorem ipsum**
another line
yet another line
Similarly, you can use grep -B 2 or grep -A 2 to print matching lines with two preceding or two following lines, respectively, for example:
> grep -A 2 'lorem'
**lorem ipsum**
another line
yet another line
Is it possible to skip the matching line and only print the context? Specifically, I would like to only print the line that is exactly 2 lines above a match, like this:
> <some magic command>
some context
If you can afford a couple of grep instances, you can try the following, as I mentioned in the comments:
$ grep -v "lorem" < <(grep -A2 "lorem" file)
another line
yet another line
$ grep -A2 "lorem" file | grep -v "lorem"
another line
yet another line
If you are interested in a dose of awk, there is a cool way to do it as
$ awk -v count=2 '{a[++i]=$0;}/lorem/{for(j=NR-count;j<NR;j++)print a[j];}' file
some context
some other context
It works by storing every line of the file in an array. When a line matches the pattern lorem, the awk special variable NR, which holds the current row number, points at exactly that line. Looping over the count lines before it, with count set through -v, prints the lines needed.
If you also want to print the matching line, just change the for-loop condition from j<NR to j<=NR. That's it!
There’s no way to do this purely through a grep command. If there’s only one instance of lorem in the text, you could pipe the output through head.
grep -B2 lorem t | head -1
If there may be multiple occurrence of lorem, you could use awk:
awk '{second_previous=previous; previous=current_line; current_line=$0}; /lorem/ { print second_previous; }'
This awk command saves each line (along with the previous one and the one before that) in variables, so when it encounters a line containing lorem, it prints the line two above it. If lorem occurs in the first or second line of the input, an empty line is printed, because second_previous is still unset.
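The same idea extends to any distance above the match by keeping the recent lines in a small ring buffer. A minimal sketch; the function name `line_above` and its argument order are hypothetical, and the pattern is passed through awk's -v, so backslashes in it would be subject to escape processing:

```shell
# Hypothetical helper: for every line matching PATTERN, print the line N lines above it.
# usage: line_above N PATTERN [FILE...]
line_above() {
  n=$1 pattern=$2
  shift 2
  awk -v n="$n" -v pat="$pattern" '
    # On a match, print the line stored N lines ago (skipped if there is none).
    $0 ~ pat && NR > n { print buf[(NR - n) % (n + 1)] }
    # Remember the current line in a ring buffer of N + 1 slots.
    { buf[NR % (n + 1)] = $0 }
  ' "$@"
}

# Usage: line_above 2 lorem file
```

Unlike the fixed-variable version, this prints nothing at all when the match sits too close to the top of the input.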
awk, as others have said, is your friend here. You don't need complex loops or arrays or other junk, though; basic patterns suffice.
When you use -B N, (and the --no-group-separator flag) you get output in groups of M=N+1 lines. To select precisely one of those lines (in your question, you want the very first of the group), you can use modular arithmetic (tested with GNU awk).
awk -vm=3 -vx=1 'NR%m==x{print}'
You can think of the lines being numbered like this: they count up until you reach the match, at which point they go back to zero. So set m to N+1 and x to the line you want to extract.
1 some context
2 some other context
0 **lorem ipsum**
So the final command would be
grep -B2 --no-group-separator lorem "$input" | awk -vm=3 -vx=1 'NR%m==x{print}'

Sed replace the first value

I want to replace the first value (first column of the first line, so 1 here) with that value plus one. I have a file like this:
1
1 1
2 5
1 6
I use these commands:
read -r aa < file
echo $aa
sed "s/$aa/$(($aa + 1))/" file
# or
sed 's/$aa/$(($aa + 1))/' file
But when I do that, it changes every 1 in the first column to 2. I have tried changing the quotes, but it makes no difference.
Restrict the substitution to the first line only, i.e.
sed '1s/old/new/'
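Applied to the file from the question, the line-1 restriction might look like the following sketch. It assumes the first field is a non-negative integer; the function name `increment_first` is hypothetical:

```shell
# Hypothetical helper: print FILE with the first field of its first line incremented.
increment_first() {
  # read splits the first line at whitespace: first field, then the remainder
  read -r first rest < "$1"
  # the 1 address limits s to line 1; ^ anchors the match to the start of that line
  sed "1s/^$first/$((first + 1))/" "$1"
}

# Usage: increment_first file
```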
awk might be a better tool for this.
awk 'NR==1{$1=$1+1}1'
On the first line, this adds 1 to the first field; the trailing 1 is an always-true pattern that prints every line. It can be rewritten as
awk 'NR==1{$1+=1}1'
or
awk 'NR==1{$1++}1'
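With Perl, -0 reads the whole file as a single record, so the substitution (which has no g flag) changes only the first number:
perl -p0e 's/(\d+)/$1+1/e' file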

using xargs as an argument for cut

Say I have a file a.txt in which every line contains a word followed by a number:
and 3
now 2
for 2
something 7
completely 8
different 6
I need to select the nth char from every word (specified by the number next to the word)
cat a.txt | cut -d' ' -f2 | xargs -i -n1 cut a.txt -c {}
I tried this command, which selects the numbers and uses xargs to put them into the -c option from cut, but the cut command gets executed on every line, instead of a.txt being looped (which I had expected to happen) How can I resolve this problem?
EDIT: Since it seems to be unclear, i want to select a character from a word. The character which I need to select can be found next to the word, for example:
and 3, will give me d. I want to do this for the entire file, which will then form a word :)
A pure shell solution:
$ while read word num; do echo ${word:$((num-1)):1}; done < a.txt
d
o
o
i
e
r
This is using a classic while; do ... ; done shell loop and the read builtin. The general format is
while read variable1 variable2 ... variableN; do something; done < input_file
This iterates over each line of the input file, splitting it into as many variables as you have given. By default it splits at whitespace, but you can change that by setting the $IFS variable. If you give a single variable, the entire line is saved in it; if you give more, read fills them in order and stores the rest of the line in the last one.
In this particular loop, we're reading the word into $word and the number into $num. Once we have the word, we can use the shell's string manipulation capabilities to extract a substring. The general format is
${string:start:length}
So, ${string:0:2} would extract the first two characters from the variable $string. Here, the variable is $word, the start is the number minus one (this starts counting at 0) and the length is one. The result is the single letter at the position given by the number.
I would suggest using awk:
awk '{print substr($1,$2,1)}' file
substr takes a substring of the first field starting from the number contained in the second field and of length 1.
Testing it out (using the original input from your question):
$ cat file
and 3
now 2
for 2
something 7
completely 8
different 6
$ awk '{print substr($1,$2,1)}' file
d
o
o
i
e
r
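Since the question mentions that the selected characters should form a word, the same substr call can print them on one line. A minimal sketch; the function name `pick_chars` is hypothetical:

```shell
# Hypothetical helper: print the selected characters on a single line.
# usage: pick_chars [FILE...]
pick_chars() {
  # printf emits each character without a newline; END adds the final one
  awk '{ printf "%s", substr($1, $2, 1) } END { print "" }' "$@"
}

# Usage: pick_chars a.txt
```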

'grep +A': print everything after a match [duplicate]

I have a file that contains a list of URLs. It looks like below:
file1:
http://www.google.com
http://www.bing.com
http://www.yahoo.com
http://www.baidu.com
http://www.yandex.com
....
I want to get all the records after: http://www.yahoo.com, results looks like below:
file2:
http://www.baidu.com
http://www.yandex.com
....
I know that I could use grep to find the line number of where yahoo.com lies using
grep -n 'http://www.yahoo.com' file1
3:http://www.yahoo.com
But I don't know how to get the part of the file after line 3. I also know that grep has an -A flag that prints the lines after a match, but you need to specify how many lines you want. I am wondering whether there is a way around that, like:
Pseudocode:
grep -n 'http://www.yahoo.com' -A all file1 > file2
I know we could use the line number I got and wc -l to get the number of lines after yahoo.com, however... it feels pretty lame.
AWK
If you don't mind using AWK:
awk '/yahoo/{y=1;next}y' data.txt
This script has two parts:
/yahoo/ { y = 1; next }
y
The first part states that if we encounter a line with yahoo, we set the variable y=1, and then skip that line (the next command will jump to the next line, thus skip any further processing on the current line). Without the next command, the line yahoo will be printed.
The second part is a short hand for:
y != 0 { print }
Which means, for each line, if variable y is non-zero, we print that line. In AWK, a variable is created the first time you refer to it and starts out as either zero or the empty string, depending on context. Before yahoo is encountered, variable y is 0, so the script does not print anything. After yahoo is encountered, y is 1, so every line after that is printed.
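The regexp /yahoo/ matches any line containing those letters. When the full URL should be matched literally, awk's index function avoids having to escape the dots and slashes. A minimal sketch of the same flag technique; the function name `after_match` is hypothetical:

```shell
# Hypothetical helper: print every line after the first one containing STRING.
# usage: after_match STRING [FILE...]
after_match() {
  needle=$1
  shift
  awk -v needle="$needle" '
    found { print }                     # only true on lines after the match
    index($0, needle) { found = 1 }     # literal substring test, not a regexp
  ' "$@"
}

# Usage: after_match 'http://www.yahoo.com' file1 > file2
```

Testing the flag before setting it plays the role of next in the script above: the matching line itself is never printed.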
Sed
Or, using sed, the following will delete everything up to and including the line with yahoo:
sed '1,/yahoo/d' data.txt
This is much easier done with sed than grep. sed can apply any of its one-letter commands to an inclusive range of lines; the general syntax for this is
START , STOP COMMAND
except without any spaces. START and STOP can each be a number (meaning "line number N", starting from 1); a dollar sign (meaning "the end of the file"), or a regexp enclosed in slashes, meaning "the first line that matches this regexp". (The exact rules are slightly more complicated; the GNU sed manual has more detail.)
So, you can do what you want like so:
sed -n -e '/http:\/\/www\.yahoo\.com/,$p' file1 > file2
The -n means "don't print anything unless specifically told to", and the script passed with -e means "from the first line that matches the regexp /http:\/\/www\.yahoo\.com/ to the end of the file, print."
This will include the line with http://www.yahoo.com/ on it in the output. If you want everything after that point but not that line itself, the easiest way to do that is to invert the operation:
sed -e '1,/http:\/\/www\.yahoo\.com/d' file1 > file2
which means "for line 1 through the first line matching the regexp /http:\/\/www\.yahoo\.com/, delete the line" (and then, implicitly, print everything else; note that -n is not used this time).
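The escaped slashes can be avoided by choosing another regexp delimiter, which sed allows with the \cREGEXPc form. A minimal sketch of the two commands above rewritten that way; the function names `from_yahoo` and `after_yahoo` are hypothetical:

```shell
# Hypothetical helpers built on the two sed commands above.
# \|...| makes "|" the regexp delimiter, so the slashes in the URL stay unescaped.

# Print from the first matching line to the end of the input.
from_yahoo() {
  sed -n '\|http://www\.yahoo\.com|,$p' "$@"
}

# Print only what follows the first matching line.
after_yahoo() {
  sed '1,\|http://www\.yahoo\.com|d' "$@"
}

# Usage: after_yahoo file1 > file2
```

One caveat with the deleting form: sed starts looking for the end of a range on the line after the one that opened it, so if the pattern matched on line 1, the deletion would run on to the next match. GNU sed accepts 0,/regexp/ for that case.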
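Another awk approach: c++ evaluates to 0 on the first matching line, so that line is skipped, and c is non-zero from then on, so every later line is printed:
awk '/yahoo/ ? c++ : c' file1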
Or golfed
awk '/yahoo/?c++:c' file1
Result
http://www.baidu.com
http://www.yandex.com
This is most easily done in Perl:
perl -ne 'print unless 1 .. m(http://www\.yahoo\.com)' file
In other words, print all lines that aren’t between line 1 and the first occurrence of that pattern.
Using this script:
# Get the line number of the first line containing "yahoo"
index=$(grep -n "yahoo" filepath | head -n 1 | cut -d':' -f1)
# Get the total number of lines in the file
totallines=$(wc -l < filepath)
# Subtract index from totallines to get the number of lines after the match
result=$((totallines - index))
# Print the match plus everything after it, then drop the matching line itself
grep -A "$result" "yahoo" filepath | tail -n +2
