Compare lines within a file in bash

input.txt file
12345678,Manoj,23,Developer
12345678,Manoj,34,Developer
12345678,Manoj,67,Developer
12345679,Vijay,12,Tester
12345679,Vijay,98,Tester
12345676,Samrat,100,Manager
12345676,Samrat,25,Manager
12345676,Samrat,28,Manager
Desired output file
12345678,Manoj,23,Developer,0
12345678,Manoj,34,Developer,1
12345678,Manoj,67,Developer,2
12345679,Vijay,12,Tester,0
12345679,Vijay,98,Tester,1
12345676,Samrat,100,Manager,0
12345676,Samrat,25,Manager,1
12345676,Samrat,28,Manager,2
Explanation
The first value (e.g. 12345678) is the same in the first 3 lines of my input file, so those lines should get ,0 ,1 and ,2 appended respectively, and similarly for the following groups of lines.
How can this be done in a shell script?
Edit in Desired Output
Is it also possible to change the number format in the desired output to the following?
12345678,Manoj,23,Developer,0000000
12345678,Manoj,34,Developer,0000001
12345678,Manoj,67,Developer,0000002
12345679,Vijay,12,Tester,0000000
12345679,Vijay,98,Tester,0000001
12345676,Samrat,100,Manager,0000000
12345676,Samrat,25,Manager,0000001
12345676,Samrat,28,Manager,0000002
New:
Is it possible to start the numbering from 0000019? Is there any other option to initialize a variable, like a=5, a=19 or a=39, from which I can increment afterwards?
12345678,Manoj,23,Developer,0000019
12345678,Manoj,34,Developer,0000020
12345678,Manoj,67,Developer,0000021
12345679,Vijay,12,Tester,0000019
12345679,Vijay,98,Tester,0000020
12345676,Samrat,100,Manager,0000019
12345676,Samrat,25,Manager,0000020
12345676,Samrat,28,Manager,0000021

Using awk:
$ awk 'BEGIN{FS=OFS=",";RS="\r?\n"}{print $0,a[$1]++}' file
Output:
12345678,Manoj,23,Developer,0
12345678,Manoj,34,Developer,1
12345678,Manoj,67,Developer,2
12345679,Vijay,12,Tester,0
12345679,Vijay,98,Tester,1
12345676,Samrat,100,Manager,0
12345676,Samrat,25,Manager,1
12345676,Samrat,28,Manager,2
Edit:
As the requirements changed and a lot of commenting took place, here is the final version (revision one as the requirements were different in comments and the OP, knocking on wood):
$ awk 'BEGIN{FS=","}{sub(/\r$/,"");printf "%s,%07d" ORS,$0,a[$1]++}' file
Explained:
$ awk '
BEGIN {
FS=","
# ORS="\r\n" # uncomment if Windows line-endings are desired
}
{
sub(/\r$/,"") # remove Windows line-endings (ie. \r from \r\n)
printf "%s,%07d" ORS,$0,a[$1]++ # output zeropadded running count on $1
}' file
Tested with gawk, mawk, busybox awk and the original-awk (awk version 20121220). Oh, and recycled my Solaris box 5 years ago. ;D

Updated to fix a line-ending error I had not noticed before.
This works with both \r\n and \n line endings; the output always ends in \n:
awk -F, 'sub(/\r$/,"") ($(NF+1)=sprintf("%07d",a[$2]++))' OFS=, input.txt
Output:
12345678,Manoj,23,Developer,0000000
12345678,Manoj,34,Developer,0000001
12345678,Manoj,67,Developer,0000002
12345679,Vijay,12,Tester,0000000
12345679,Vijay,98,Tester,0000001
12345676,Samrat,100,Manager,0000000
12345676,Samrat,25,Manager,0000001
12345676,Samrat,28,Manager,0000002
I wrote it that way for conciseness; it is functionally equivalent to:
awk 'BEGIN{FS=OFS=","}{sub(/\r$/,"");$(NF+1)=sprintf("%07d",a[$2]++)}1' input.txt
If you have ruby installed:
ruby -aF, -pe 'BEGIN{a=Hash.new(-1)};sub(/\r?$/, "," + "%07d" % a[$F[1]]+=1)' input.txt
Same output.
By the way, if you want the numbering to start at 19, you can use this (add 19+ to the value):
awk 'sub(/\r$/,"") ($(NF+1)=sprintf("%07d",19+a[$2]++))' FS=, OFS=, input.txt
Or this (initialize with 18):
ruby -aF, -pe 'BEGIN{a=Hash.new(18)};sub(/\r?$/, "," + "%07d" % a[$F[1]]+=1)' input.txt
All of these use $2 (column 2) as the key. In your samples $1 and $2 always go together, so either one works.
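To make the starting number and the padding width adjustable (the a=5, a=19, a=39 idea from the question), both can be passed in as awk variables. The following is only a sketch: the function name number_rows and the variable names start, width and seen are made up for illustration, and it keys on $1 as the question describes.

```shell
#!/usr/bin/env bash
# number_rows START WIDTH  (hypothetical helper; reads CSV on stdin)
# Appends a zero-padded running count per value of column 1.
number_rows() {
  awk -F, -v start="${1:-0}" -v width="${2:-7}" '
    BEGIN { fmt = "%s,%0" width "d\n" }      # e.g. "%s,%07d\n" for width 7
    {
      sub(/\r$/, "")                         # drop a trailing CR if present
      printf fmt, $0, start + seen[$1]++     # count is per key, offset by start
    }'
}

# Example call: numbering starts at 19, padded to 7 digits.
printf '12345678,Manoj,23,Developer\n12345678,Manoj,34,Developer\n12345679,Vijay,12,Tester\n' |
  number_rows 19 7
```

Building the format string in BEGIN avoids relying on the %0*d form, which not every awk accepts.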

Could you please try the following (it leaves the line unedited and simply prints it, followed by the running count kept in an additional array):
awk 'BEGIN{FS=OFS=","} {printf("%s,%07d\n",$0,count[$2]++)}' Input_file

Using Perl
$ cat manoj.txt
12345678,Manoj,23,Developer
12345678,Manoj,34,Developer
12345678,Manoj,67,Developer
12345679,Vijay,12,Tester
12345679,Vijay,98,Tester
12345676,Samrat,100,Manager
12345676,Samrat,25,Manager
12345676,Samrat,28,Manager
$ perl -F, -lane ' $F[$#F]=~s/\r//g; $F[$#F+1]=sprintf("%07d",$kv{$F[0]}++);$,=","; print @F ' manoj.txt
12345678,Manoj,23,Developer,0000000
12345678,Manoj,34,Developer,0000001
12345678,Manoj,67,Developer,0000002
12345679,Vijay,12,Tester,0000000
12345679,Vijay,98,Tester,0000001
12345676,Samrat,100,Manager,0000000
12345676,Samrat,25,Manager,0000001
12345676,Samrat,28,Manager,0000002
$
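Since the question asks for a shell script, here is a rough pure-bash sketch of the same counting idea, with no awk or perl involved. It assumes bash 4 or later (for associative arrays); the function name number_rows_bash and the array name seen are made up for illustration.

```shell
#!/usr/bin/env bash
# number_rows_bash  (hypothetical helper; reads CSV on stdin, needs bash >= 4)
number_rows_bash() {
  local -A seen=()                       # per-key counters
  local line key
  while IFS= read -r line || [ -n "$line" ]; do
    line=${line%$'\r'}                   # drop a trailing CR if present
    key=${line%%,*}                      # everything before the first comma
    if [ -z "$key" ]; then               # empty key: pass the line through
      printf '%s\n' "$line"
      continue
    fi
    printf '%s,%07d\n' "$line" "${seen[$key]:-0}"
    seen[$key]=$(( ${seen[$key]:-0} + 1 ))
  done
}

printf '12345678,Manoj,23,Developer\n12345678,Manoj,34,Developer\n12345679,Vijay,12,Tester\n' |
  number_rows_bash
```

For large files the awk versions above will usually be noticeably faster than a bash read loop.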

Related

Take string from multiple files and copy to new file and print filename into second column in bash

I have multiple files containing this information:
sP12345.txt
COMMENT Method: conceptual translation.
FEATURES Location/Qualifiers
source 1..3024
/organism="H"
/isolate="sP12345"
/isolation_source="blood"
/host="Homo sapiens"
/db_xref="taxon:11103"
/collection_date="31-Mar-2014"
/note="genotype: 3"
sP4567.txt
COMMENT Method: conceptual translation.
FEATURES Location/Qualifiers
source 1..3024
/organism="H"
/isolate="sP4567"
/isolation_source="blood"
/host="Homo sapiens"
/db_xref="taxon:11103"
/collection_date="31-Mar-2014"
/note="genotype: 2"
Now I would like to take the /note="genotype: 3" line, copy only the number that comes after genotype: to a new text file, and print the name of the file it was taken from as column 2.
Expected Output:
3 sP12345
2 sP4567
I tried this code, but it only prints the first column and not the filename:
awk -F'note="genotype: ' -v OFS='\t' 'FNR==1{++c} NF>1{print $2, c}' *.txt > output_file.txt
You may use:
awk '/\/note="genotype: /{gsub(/^.* |"$/, ""); f=FILENAME; sub(/\.[^.]+$/, "", f); print $0 "\t" f}' sP*.txt
3 sP12345
2 sP4567
$ awk -v OFS='\t' 'sub(/\/note="genotype:/,""){print $0+0, FILENAME}' sP12345.txt sP4567.txt
3 sP12345.txt
2 sP4567.txt
You can do:
awk '/\/note="genotype:/{split($0,a,": "); print a[2]+0,"\t",FILENAME}' sP*.txt
3 sP12345.txt
2 sP4567.txt
With your shown samples, please try the following GNU awk code.
awk -v RS='/note="genotype: [0-9]*"' '
RT{
gsub(/.*: |"$/,"",RT)
print RT,FILENAME
nextfile
}
' *.txt
Explanation: all .txt files are passed to the GNU awk program. RS (the record separator) is set to the regular expression /note="genotype: [0-9]*", as per the shown samples, so the text that matched it is available in RT. In the main block, gsub (global substitution) removes everything up to the colon and the following space, as well as the " at the end, from RT. The program then prints RT followed by the current file's name. nextfile skips the rest of the current file and moves straight to the next one, which saves some time.
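RT and a regex RS are GNU awk features. If gawk is not available, a similar result can be sketched with match() and substr(), which other awks also provide. The function name genotypes is made up for illustration, and stripping the directory and extension from FILENAME is an assumption based on the expected output in the question.

```shell
#!/usr/bin/env bash
# genotypes FILE...  (hypothetical helper)
# Prints "<genotype number><TAB><file name without directory and extension>".
genotypes() {
  awk '
    match($0, /\/note="genotype: [0-9]+"/) {
      s = substr($0, RSTART, RLENGTH)    # the matched /note=... text
      gsub(/[^0-9]/, "", s)              # keep only the digits
      f = FILENAME
      sub(/.*\//, "", f)                 # drop any leading directory
      sub(/\.[^.]+$/, "", f)             # drop the extension
      print s "\t" f
    }' "$@"
}

# Usage (file names from the question):
#   genotypes sP*.txt > output_file.txt
```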

Prepend text to specific line numbers with variables

I have spent hours trying to solve this. There are a bunch of answers as to how to prepend to all lines or specific lines but not with a variable text and a variable number.
while [ $FirstVariable -lt $NextVariable ]; do
#sed -i "$FirstVariables/.*/$FirstVariableText/" "$PWD/Inprocess/$InprocessFile"
cat "$PWD/Inprocess/$InprocessFile" | awk 'NR==${FirstVariable}{print "$FirstVariableText"}1' > "$PWD/Inprocess/Temp$InprocessFile"
FirstVariable=$[$FirstVariable+1]
done
Essentially I am looking for a particular string delimiter and then figuring out where the next one is and appending the first result back into the following lines... Note that I already figured out the logic I am just having issues prepending the line with the variables.
Example:
This >
Line1:
1
2
3
Line2:
1
2
3
Would turn into >
Line1:
Line1:1
Line1:2
Line1:3
Line2:
Line2:1
Line2:2
Line2:3
You can do all that using below awk one liner.
Assuming your pattern starts with Line, then the below script can be used.
> awk '{if ($1 ~ /Line/ ){var=$1;print $0;}else{ if ($1 !="")print var $1}}' $PWD/Inprocess/$InprocessFile
Line1:
Line1:1
Line1:2
Line1:3
Line2:
Line2:1
Line2:2
Line2:3
Here is how the above script works:
If the first field of a record matches Line, the record is printed as it is and its first field is saved in the awk variable var. For each following record, if the first field is not empty, var is prepended to that field and the result is printed, producing the desired output.
If you need to pass variables dynamically from the shell to awk, you can use the -v option, like below:
awk -v var1="$FirstVariable" -v var2="$FirstVariableText" 'NR==var1{print var2}1' "$PWD/Inprocess/$InprocessFile" > "$PWD/Inprocess/Temp$InprocessFile"
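As a sketch of how -v could replace the whole while loop from the question, the line range and the text can all be handed to a single awk call. The function name prepend_range and the awk variable names first, next_line and text are made up for illustration (next is an awk keyword, hence next_line). Note that awk interprets backslash escapes in values passed with -v.

```shell
#!/usr/bin/env bash
# prepend_range FIRST NEXT TEXT FILE  (hypothetical helper)
# Prepends TEXT to every line with FIRST <= line number < NEXT.
prepend_range() {
  awk -v first="$1" -v next_line="$2" -v text="$3" '
    NR >= first && NR < next_line { $0 = text $0 }   # inside the range: prepend
    { print }                                        # print every line
  ' "$4"
}

# Usage, with the variables from the question:
#   prepend_range "$FirstVariable" "$NextVariable" "$FirstVariableText" \
#     "$PWD/Inprocess/$InprocessFile" > "$PWD/Inprocess/Temp$InprocessFile"
```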
Your approach processes the file with both bash and awk: bash picks out a line, and awk is then used to manipulate that one line. The whole thing can actually be done with a single awk script:
awk '/^Line/{str=$1; print; next}{print (NF ? str $0 : "")}' inputfile > outputfile
or
awk 'BEGIN{RS="";ORS="\n\n";FS=OFS="\n"}{gsub(FS,OFS $1)}1' inputfile > outputfile

How to write a bash script that dumps itself out to stdout (for use as a help file)?

Sometimes I want a bash script that's mostly a help file. There are probably better ways to do things, but sometimes I want to just have a file called "awk_help" that I run, and it dumps my awk notes to the terminal.
How can I do this easily?
Another idea, use #!/bin/cat -- this will literally answer the title of your question since the shebang line will be displayed as well.
Turns out it can be done as pretty much a one-liner, thanks to @CharlesDuffy for the suggestions!
Just put the following at the top of the file, and you're done (the exit keeps bash from going on to run the notes below it as commands):
cat "$BASH_SOURCE" | grep -v EZREMOVEHEADER; exit # EZREMOVEHEADER
So for my awk_help example, it'd be:
cat "$BASH_SOURCE" | grep -v EZREMOVEHEADER; exit # EZREMOVEHEADER
# Basic form of all awk commands
awk search pattern { program actions }
# advanced awk
awk 'BEGIN {init} search1 {actions} search2 {actions} END { final actions }' file
# awk boolean example for matching "(me OR you) OR (john AND ! doe)"
awk '( /me|you/ ) || (/john/ && ! /doe/ )' /path/to/file
# awk - print # of lines in file
awk 'END {print NR,"coins"}' coins.txt
# Sum up gold ounces in column 2, and find out value at $425/ounce
awk '/gold/ {ounces += $2} END {print "value = $" 425*ounces}' coins.txt
# Print the last column of each line in a file, using a comma (instead of space) as a field separator:
awk -F ',' '{print $NF}' filename
# Sum the values in the first column and pretty-print the values and then the total:
awk '{s+=$1; print $1} END {print "--------"; print s}' filename
# functions available
length($0) > 72, toupper,tolower
# count the # of times the word PASSED shows up in the file /tmp/out
cat /tmp/out | awk 'BEGIN {X=0} /PASSED/{X+=1; print $1 X}'
# awk regex operators
https://www.gnu.org/software/gawk/manual/html_node/Regexp-Operators.html
I found another solution that works on Mac/Linux and works exactly as one would hope.
Just use the following as your "shebang" line, and it'll output everything from line 2 on down:
test.sh
#!/usr/bin/tail -n+2
hi there
how are you
Running this gives you what you'd expect:
$ ./test.sh
hi there
how are you
Another possible solution: just use less as the interpreter, so that your file opens in a searchable pager:
#!/usr/bin/less
This way you can also grep it for something, e.g.
$ ./test.sh | grep something
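Another common way to get a similar result, not mentioned in the answers above, is a quoted here-document: the notes live inside the script as literal text, so nothing in them is ever executed or expanded. This is only a sketch, and the function name show_help is made up for illustration.

```shell
#!/usr/bin/env bash
# show_help  (hypothetical helper): print the notes verbatim.
# Quoting the delimiter ('EOF') disables $-expansion and command substitution.
show_help() {
  cat <<'EOF'
# Print the last column of each line, using a comma as the field separator:
awk -F ',' '{print $NF}' filename
EOF
}

show_help
```

Because the output goes to stdout, it can be piped to grep or less just like the other variants.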

modify line in a .txt file in bash [duplicate]

I have a .txt file whose lines contain data separated by ",", for example:
10,05,nov,2016,122,2,2,330,user
What I want is to be able to modify one field of a given line, where the line is found by its first number, which is unique (it is never repeated).
For example, find the line whose first field (f1) is 10 and modify the field containing 122 (f5).
I've tried it with sed but I can't get it to work.
I've been told that I could do it with awk, but I haven't studied that command.
Any help?
A simple awk script like the following should do the trick :
awk -v find="10" -v field="5" -v newval="abcd" 'BEGIN {FS=OFS=","} {if ($1 == find) $field=newval; print $0}' test.csv
Explanation:
awk -v find="10" -v field="5" -v newval="abcd" : defines 3 variables for awk: find, the value we are looking for; field, the number of the field we want to edit; and newval, the replacement value.
BEGIN {FS=OFS=","} : before iterating through the file, we set the field separator (FS) and the output field separator (OFS) to ",".
if ($1 == find) $field=newval: if the first field of a line equals the value we are looking for, we set the Nth field (1st if field=1, 2nd if field=2, ...) to the value of newval.
print $0: whatever the result of the if test, we print the whole line.
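A shorter (but less understandable) version of this script could be written as follows:
awk -v a="10" -v f="5" -v n="abcd" -F, -v OFS=, '$1 == a {$f=n} 1' test.csv
Where a refers to find, f refers to field, n refers to newval and -F, refers to FS=",". OFS is set with -v before any line is read, so a match on the very first line is also rebuilt with commas. The trailing 1 is an always-true pattern that prints every line.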
Script in action :
> cat test.csv
11,05,nov,2016,122,2,2,330,user
10,05,nov,2016,123,2,2,330,user
12,05,nov,2016,124,2,2,330,user
> awk -v find="10" -v field="5" -v newval="abcd" 'BEGIN {FS=OFS=","} {if ($1 == find) $field=newval; print $0}' test.csv
11,05,nov,2016,122,2,2,330,user
10,05,nov,2016,abcd,2,2,330,user
12,05,nov,2016,124,2,2,330,user
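The commands above print the modified data to stdout and leave test.csv untouched. If the file itself should be updated, one portable pattern is to write to a temporary file and move it over the original. This is a sketch only: the function name set_field is made up for illustration, and note that the replaced file takes on the temporary file's permissions.

```shell
#!/usr/bin/env bash
# set_field KEY FIELD VALUE FILE  (hypothetical helper)
# In FILE, sets column FIELD to VALUE on the line whose first column is KEY.
set_field() {
  local tmp
  tmp=$(mktemp) || return 1
  awk -F, -v OFS=, -v key="$1" -v field="$2" -v value="$3" '
    $1 == key { $field = value }   # numeric-looking keys compare as numbers
    { print }
  ' "$4" > "$tmp" && mv "$tmp" "$4"
}

# Usage:
#   set_field 10 5 abcd test.csv
```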
With sed:
$ sed '/^10,/s/,[^,]*/,333/4' <<< "10,05,nov,2016,122,2,2,330,user"
10,05,nov,2016,333,2,2,330,user
In lines whose first field is 10 (the comma in /^10,/ keeps it from also matching 100, 101 and so on), replace the 4th occurrence of a comma followed by non-comma characters, i.e. the 5th field, with your substitution string.

Join lines based on pattern

I have the following file:
test
1
My
2
Hi
3
I need a way to use cat, grep or awk to give the following output:
test1
My2
Hi3
How can I achieve this in a single command? Something like
cat file.txt | grep ... | awk ...
Note that it's always a string followed by a number in the original text file.
sed 'N;s/\n//' file.txt
This should give the desired output when the content is in file.txt
paste -d "" - - < filename
This takes consecutive lines and pastes them together delimited by the empty string.
awk '{printf("%s", $0);} !(NR%2){printf("\n");}' file.txt
EDIT: I just noticed that your question requires the use of cat and grep. Both of those programs are unnecessary to achieve your stated aims. If you have some reason for including them that you haven't mentioned, try this (uselessly inefficient) version of the line I wrote immediately above:
cat file.txt | grep '^' | awk '{printf("%s", $0);} !(NR%2){printf("\n");}'
It is possible that this command uses features not present in the original awk program. You may need to invoke the new awk program, nawk instead.
If your input file is always one string followed by one number, and you only want the strings, all you have to do is take every other line.
If you only want the odd lines (the strings), you can do awk 'NR % 2' file.txt
If you want the even lines (the numbers), this becomes awk 'NR % 2==0' file.txt
Here is the answer:
cat file.txt | awk 'BEGIN { lno = 0 } { val=$0; if (lno % 2 == 1) {printf "%s\n", $0} else {printf "%s", $0}; ++lno}'
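The awk versions above leave the last line without a terminating newline when the file has an odd number of lines. A small variation that handles that case could look like the sketch below. The function name join_pairs and the variable names held and pending are made up for illustration.

```shell
#!/usr/bin/env bash
# join_pairs [FILE...]  (hypothetical helper; reads stdin if no file is given)
# Joins each odd line with the even line that follows it.
join_pairs() {
  awk '
    NR % 2 { held = $0; pending = 1; next }   # odd line: hold it
    { print held $0; pending = 0 }            # even line: emit the pair
    END { if (pending) print held }           # odd line count: flush the last line
  ' "$@"
}

printf 'test\n1\nMy\n2\nHi\n3\n' | join_pairs
```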

Resources