I need to split a string of digits into runs of consecutive numbers, where each digit in a run is one greater than the previous one.
Case 1) 34123
Output:
34
123
Case 2) 123434123
Output:
1234
34
123
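One way to do it in FreeMarker is to walk through the digits and append a space after each digit that is not followed by its successor:
[#function splitConsecutiveNumbers input]
  [#local output = ""]
  [#list 0..<input?length as i]
    [#-- a run ends at the last digit or when the next digit is not current + 1 --]
    [#if !i?is_last && (input[i]?number + 1) != input[i + 1]?number]
      [#local output += input[i] + " "]
    [#else]
      [#local output += input[i]]
    [/#if]
  [/#list]
  [#return output]
[/#function]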
${splitConsecutiveNumbers("34123")}
${splitConsecutiveNumbers("123434123")}
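For comparison, the same rule (start a new run whenever the next digit is not the current digit plus one) can be sketched in Ruby outside the template. This is an illustration only; the method name split_consecutive is hypothetical:

```ruby
# Split a digit string into runs where each digit is one more than the previous.
# Enumerable#slice_when (Ruby 2.2+) cuts the array wherever the block returns true.
def split_consecutive(digits)
  digits.chars
        .slice_when { |a, b| b.to_i != a.to_i + 1 }
        .map(&:join)
end

puts split_consecutive("34123")      # one run per line
puts split_consecutive("123434123")
```

The block receives each pair of neighbouring digits, so the run boundaries fall exactly where the FreeMarker version inserts a space.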
I have a file that looks messed up:
contig_1 bin.0013 Rhizobium flavum (taxid 1335061)
contig_2 Alphaproteobacteria (taxid 28211)
contig_3 bin.009
contig_4 bin.008 unclassified (taxid 0)
contig_5 bin.001 Fluviicoccus keumensis (taxid 1435465)
contig_12 bin.003
I want it to look properly formatted, with tab-delimited columns and a 0 wherever a field is empty:
contig_1 bin.0013 Rhizobium flavum (taxid 1335061)
contig_2 0 Alphaproteobacteria (taxid 28211)
contig_3 bin.009 0
contig_4 bin.008 unclassified (taxid 0)
contig_5 bin.001 Fluviicoccus keumensis (taxid 1435465)
contig_12 bin.003 0
If I use something like sed 's/ /,/g' filename, commas are inserted at every space, not only between columns 1-2 and 2-3.
If awk is an option, please try the following:
awk -v OFS="\t" '
NR==FNR {
# in the 1st pass, detect the starting positions of the 2nd field and the 3rd
sub(" +$", "") # it avoids misdetection due to extra trailing blanks
if (match($0, "[^[:blank:]]+[[:blank:]]+")) {
# RLENGTH holds the ending position of the 1st blank
if (col2 == 0 || RLENGTH < col2) col2 = RLENGTH + 1
if (match($0, "[^[:blank:]]+[[:blank:]]+[^[:blank:]]+[[:blank:]]+")) {
# RLENGTH holds the ending position of the 2nd blank
if (col3 == 0 || RLENGTH < col3) col3 = RLENGTH + 1
}
}
next
}
{
# in the 2nd pass, extract the substrings at the fixed positions and reformat them
# by removing extra spaces and putting "0" if the field is empty
c1 = substr($0, 1, col2 - 1); sub(" +$", "", c1); if (c1 == "") c1 = "0"
c2 = substr($0, col2, col3 - col2); sub(" +$", "", c2); if (c2 == "") c2 = "0"
c3 = substr($0, col3); gsub(" +", " ", c3); if (c3 == "") c3 = "0"
# print c1, c2, c3 # use this for the tab-separated output
printf("%-12s%-12s%-s\n", c1, c2, c3)
}' file file
Output:
contig_1 bin.0013 Rhizobium flavum (taxid 1335061)
contig_2 0 Alphaproteobacteria (taxid 28211)
contig_3 bin.009 0
contig_4 bin.008 unclassified (taxid 0)
contig_5 bin.001 Fluviicoccus keumensis (taxid 1435465)
contig_12 bin.003 0
The process consists of two passes. In the 1st pass, it detects the starting positions of the fields.
In the 2nd pass, it cuts out individual fields using the positions calculated in the 1st pass.
I have used printf to align the output visually. If you prefer tab-separated values, switch to the
commented-out print line, which uses OFS="\t".
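Here is a minimal Ruby sketch of the same two-pass idea. It assumes the columns are space-aligned as in a fixed-width listing. The names column_starts and reformat, and the sample rows, are hypothetical. The rows are built with ljust so the alignment is explicit:

```ruby
# Two-pass idea from the awk answer: find where columns 2 and 3 start,
# then cut every line at those positions and fill empty fields with "0".
def column_starts(lines)
  col2 = col3 = nil
  lines.each do |line|
    line = line.rstrip                      # trailing blanks would skew the match
    if (m = line.match(/\A\S+\s+/))         # field 1 plus the blanks after it
      col2 = [col2, m[0].length].compact.min
    end
    if (m = line.match(/\A\S+\s+\S+\s+/))   # fields 1 and 2 plus their blanks
      col3 = [col3, m[0].length].compact.min
    end
  end
  [col2, col3]                              # 0-based start indexes
end

def reformat(lines, sep = "\t")
  col2, col3 = column_starts(lines)
  lines.map do |line|
    c1 = line[0...col2].to_s.strip
    c2 = line[col2...col3].to_s.strip
    c3 = line[col3..-1].to_s.strip.squeeze(" ")   # collapse runs of spaces
    [c1, c2, c3].map { |c| c.empty? ? "0" : c }.join(sep)
  end
end

# Hypothetical aligned sample: column 2 starts at index 11, column 3 at 22.
FIXED_WIDTH_SAMPLE = [
  "contig_1".ljust(11) + "bin.0013".ljust(11) + "Rhizobium flavum (taxid 1335061)",
  "contig_2".ljust(22) + "Alphaproteobacteria (taxid 28211)",
  "contig_3".ljust(11) + "bin.009",
  "contig_12".ljust(11) + "bin.003",
]

puts reformat(FIXED_WIDTH_SAMPLE)
```

Passing another separator as the second argument to reformat changes the delimiter. Like the awk version, this relies on every column starting at the same position in every line.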
I have an input file whose fields are spread over several lines: each line of field names is followed by a line of values, and this pattern repeats once per query.
ZZZZ
21293
YYYYY XXX WWWW VV
13242 MUTUAL BOTH NO
UUUUU TTTTTTTT SSSSSSSS RRRRR QQQQQQQQ PPPPPPPP
3 0 3 0
NNNNNN MMMMMMMMM LLLLLLLLL KKKKKKKK JJJJJJJJ
2 0 5 3
IIIIII HHHHHH GGGGGGG FFFFFFF EEEEEEEEEEE DDDDDDDDDDD
5 3 0 3
My desired output is one line per complete group of fields, with empty
fields marked, for example with "X":
21293 13242 MUTUAL BOTH NO 3 0 X 3 0 X 2 0 X 5 3 5 3 0 X 3 X
12345 67890 MUTUAL BOTH NO 3 0 X 3 0 X 2 0 X 5 3 5 3 0 X 3 X
I have been thinking about how I can get the desired output with awk/Unix scripts, but I can't figure it out. Any ideas? Thank you very much!
This isn't really a great fit for awk's style of programming, which is based on fields that are delimited by a pattern, not fields with variable positions on the line. But it can be done.
When you process the first line in each pair, scan through it finding the positions of the beginning of each field name.
awk '
# assumes each header/data pair is followed by a blank line (3 lines per block)
NR%3 == 1 {
delete fieldpos;
delete fieldlen;
lastspace = 1;
fieldindex = 0;
for (i = 1; i <= length(); i++) {
if (substr($0, i, 1) != " ") {
if (lastspace) {
fieldpos[fieldindex] = i;
if (fieldindex > 0) {
fieldlen[fieldindex-1] = i - fieldpos[fieldindex-1];
}
fieldindex++;
}
lastspace = 0;
} else {
lastspace = 1;
}
}
}
NR%3 == 2 {
for (i = 0; i < fieldindex; i++) {
if (i in fieldlen) {
f = substr($0, fieldpos[i], fieldlen[i]);
} else { # last field, go to end of line
f = substr($0, fieldpos[i]);
}
gsub(/^ +| +$/, "", f); # trim surrounding spaces
if (f == "") { f = "X" }
printf("%s ", f);
}
}
NR%15 == 14 { print "" } # print newline after 5 data blocks
' file
Assuming your fields are separated by blank chars and not tabs, GNU awk's FIELDWIDTHS is designed to handle this sort of situation:
$ cat tst.awk
/^ZZZZ/ { if (rec!="") print rec; rec="" }
/^[[:upper:]]/ {
FIELDWIDTHS = ""
while ( match($0,/\S+\s*/) ) {
FIELDWIDTHS = (FIELDWIDTHS ? FIELDWIDTHS " " : "") RLENGTH
$0 = substr($0,RLENGTH+1)
}
next
}
NF {
for (i=1;i<=NF;i++) {
gsub(/^\s+|\s+$/,"",$i)
$i = ($i=="" ? "X" : $i)
}
rec = (rec=="" ? "" : rec " ") $0
}
END { print rec }
$ awk -f tst.awk file
2129 13242 MUTUAL BOTH NO 3 0 X 3 0 X 2 0 X 5 3 5 3 0 X 3 X
In other awks you'd use match()/substr(). Note that the above isn't perfect: it truncates a char off 21293. That's because I'm not convinced your input file is accurate, and if it is, you haven't told us why that number is longer than the string on the preceding line or how to deal with that.
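As a further illustration, here is a Ruby sketch of the position-based approach from the first answer. It cuts each data line at the start columns of the words in the header line above it. Because the last field simply runs to the end of the line, a value longer than its header (like 21293 under ZZZZ) is not truncated. Two things are assumptions: that a record starts at a ZZZZ header, and that no data line is entirely blank. The method names and sample lines are hypothetical:

```ruby
# Start index of every word in a header line.
def field_starts(header)
  starts = []
  header.scan(/\S+/) { starts << Regexp.last_match.begin(0) }
  starts
end

# Cut a data line at the header's word positions; the last field runs to the end.
def extract(header, data)
  starts = field_starts(header)
  starts.each_with_index.map do |start, i|
    stop = starts[i + 1]                          # nil for the last field
    piece = (stop ? data[start...stop] : data[start..-1]).to_s.strip
    piece.empty? ? "X" : piece                    # mark empty fields
  end
end

# Blank lines are treated as separators, so a data line must not be fully blank.
def records(lines, first_header = /\AZZZZ/)
  out = []
  lines.reject { |l| l.strip.empty? }.each_slice(2) do |header, data|
    out << [] if out.empty? || header.match?(first_header)
    out.last.concat(extract(header, data.to_s))
  end
  out.map { |fields| fields.join(" ") }
end

# Hypothetical sample with two records; ljust makes the alignment explicit.
BLOCK_SAMPLE = [
  "ZZZZ", "21293",
  "YYYYY XXXXXX WWWW VV", "13242 MUTUAL BOTH NO",
  "UUU TTT SSS", "3".ljust(8) + "0",
  "ZZZZ", "12345",
  "UUU TTT SSS", "".ljust(4) + "0".ljust(4) + "5",
]

puts records(BLOCK_SAMPLE)
```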
I want output like this:
1
0 1
0 1 0
1 0 1 0
Just add print " " * (5 - i) at the start of each row, like this:
for i in 1..5
  print " " * (5 - i)
  for j in 1..i
    if i % 2 == 0
      k = (j % 2 == 0) ? 1 : 0
    else
      k = (j % 2 == 0) ? 0 : 1
    end
    print k, " "
  end
  puts
end
The n-th line has n digits plus n-1 spaces, so the fifth line is nine characters wide.
Generate each line as a string and print it using puts str.center(9).
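Here is a small Ruby sketch of the center approach. The name pyramid is hypothetical, and the digit rule is taken from the loop above (odd rows start with 1, even rows with 0):

```ruby
# Build each row as a string of alternating digits, then centre it in the
# width of the last row (n digits + n - 1 spaces = 2n - 1 characters).
def pyramid(rows)
  width = 2 * rows - 1
  (1..rows).map do |i|
    digits = (1..i).map { |j| (i + j).even? ? 1 : 0 }
    digits.join(" ").center(width).rstrip   # rstrip drops the invisible right padding
  end
end

puts pyramid(5)
```

Because the width comes from the row count, the same method works for any number of rows without hard-coding 9.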
I have an input that is the result of a text comparison. It has a very simple format with 3 columns: position, original text and new text.
But some of the records look like this:
4 ATCG ATCGC
10 1234 123
How can I write a short script to normalize it to:
7 G GC
12 34 3
For context, the whole original text and the whole new text are probably, respectively:
ACCATCGGA1234
ACCATCGCGA123
"Normalize" means "moving the position in the first column to the position where the change actually occurs", or "removing the common prefix ATC and adding its length 3 to the first field; similarly, on line 2 the prefix removed has length 2" (one character is always left in each column).
This script
awk '
BEGIN {OFS = "\t"}
function common_prefix_length(str1, str2, max_len, idx) {
idx = 1
if (length(str1) < length(str2))
max_len = length(str1)
else
max_len = length(str2)
while (substr(str1, idx, 1) == substr(str2, idx, 1) && idx < max_len)
idx++
return idx - 1
}
{
len = common_prefix_length($2, $3)
print $1 + len, substr($2, len + 1), substr($3, len + 1)
}
' << END
4 ATCG ATCGC
10 1234 123
END
outputs
7 G GC
12 34 3
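The same normalization can be sketched in Ruby, mirroring the awk function above, including its cap that always keeps at least one character. The method names are hypothetical:

```ruby
# Length of the common prefix of a and b, capped (like the awk function) so
# that at least one character of the shorter string is always kept.
def common_prefix_length(a, b)
  max_len = [a.length, b.length].min - 1
  n = 0
  n += 1 while n < max_len && a[n] == b[n]
  n
end

# Move the position past the shared prefix and drop it from both texts.
def normalize(line)
  pos, original, changed = line.split
  len = common_prefix_length(original, changed)
  [pos.to_i + len, original[len..-1], changed[len..-1]].join("\t")
end

puts normalize("4 ATCG ATCGC")
puts normalize("10 1234 123")
```

For the two sample records this gives 7, G, GC and 12, 34, 3, matching the awk output.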
A file of student records contains the name, gender (M or F), age (in years) and marital status (single or married) of each student. Design an algorithm that reads through the file, calculates the number of married men, single men and married women, and prints these numbers on a student summary report. If any single men are over 30 years of age, print their names and ages on a separate eligible bachelors report.
Can anyone tell me whether I am wrong in any line? Thanks, I hope you can help me!
Set marriedMen to 0
Set singleMen to 0
Set marriedWomen to 0
Set singleWomen to 0
READ name, sex, age, status
DOWHILE(NOT EOF)
IF (status = married) THEN //check if status is married, if yes then check next
IF (sex = ‘F’) THEN //check if sex is F, if yes then +1
marriedWomen = marriedWomen + 1
ELSE
IF (sex = ‘M’) THEN //under married, and sex is M then +1
marriedMen = marriedMen + 1
ENDIF
ENDIF
ENDIF
IF (status = single) THEN //check if status is single, if yes then check next
IF (sex = ‘F’) THEN //check if sex is F, if yes then +1 to singleWomen
singleWomen = singleWomen + 1
ELSE
IF (sex = ‘M’) THEN //under single, and sex is M then +1
singleMen = singleMen + 1
IF (age > 30) THEN //under single, sex = M and age is over 30 then print the name, age
Print ‘Eligible bachelors Report’
Print ‘Name: ‘, name
Print ‘Age: ‘, age
ENDIF
ENDIF
ENDIF
ENDIF
READ next record
ENDDO
Print ‘Student Summary Report’
Print ‘Married Men: ‘, marriedMen
Print ‘Single Men: ‘, singleMen
Print ‘Married Women: ‘, marriedWomen
Print ‘Single Women: ‘, singleWomen
In the code below, I have done the following:
Added a boolean to prevent your Eligible bachelors Report header from being printed for every bachelor found.
Indented your code so it is easier to read.
Replaced your use of ‘ with ' because you obviously used a word processor such as MS Word to write your code. (May I suggest Notepad++?)
I left your code mostly intact: I take it for granted that ElseIf isn't a valid keyword in your pseudocode and that your professor might pass you a file with genders other than M and F.
Set marriedMen to 0
Set singleMen to 0
Set marriedWomen to 0
Set singleWomen to 0
Set hasPrintedHeader to False
READ name, sex, age, status
DOWHILE(NOT EOF)
    IF (status = married) THEN //check if status is married, if yes then check next
        IF (sex = 'F') THEN //check if sex is F, if yes then +1
            marriedWomen = marriedWomen + 1
        ELSE
            IF (sex = 'M') THEN //under married, and sex is M then +1
                marriedMen = marriedMen + 1
            ENDIF
        ENDIF
    ENDIF
    IF (status = single) THEN //check if status is single, if yes then check next
        IF (sex = 'F') THEN //check if sex is F, if yes then +1 to singleWomen
            singleWomen = singleWomen + 1
        ELSE
            IF (sex = 'M') THEN //under single, and sex is M then +1
                singleMen = singleMen + 1
                IF (age > 30) THEN //under single, sex = M and age is over 30 then print the name, age
                    IF (hasPrintedHeader = False) THEN
                        Print 'Eligible bachelors Report'
                        hasPrintedHeader = True
                    ENDIF
                    Print 'Name: ', name
                    Print 'Age: ', age
                ENDIF
            ENDIF
        ENDIF
    ENDIF
    READ next record
ENDDO
Print 'Student Summary Report'
Print 'Married Men: ', marriedMen
Print 'Single Men: ', singleMen
Print 'Married Women: ', marriedWomen
Print 'Single Women: ', singleWomen
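As a cross-check of the logic, here is a Ruby sketch. It assumes the records are already parsed into name, sex, age and status, because the file format isn't given. Student, summarize and the sample records are hypothetical. Collecting the bachelors first and printing them afterwards removes the need for the hasPrintedHeader flag:

```ruby
Student = Struct.new(:name, :sex, :age, :status)

# Tally students by [status, sex] and collect single men over 30.
def summarize(students)
  counts = Hash.new(0)                      # missing keys count as 0
  bachelors = []
  students.each do |s|
    counts[[s.status, s.sex]] += 1
    bachelors << s if s.status == "single" && s.sex == "M" && s.age > 30
  end

  report = ["Student Summary Report",
            "Married Men: #{counts[%w[married M]]}",
            "Single Men: #{counts[%w[single M]]}",
            "Married Women: #{counts[%w[married F]]}",
            "Single Women: #{counts[%w[single F]]}"]
  unless bachelors.empty?                   # header appears once, only if needed
    report << "Eligible bachelors Report"
    bachelors.each { |b| report << "Name: #{b.name}, Age: #{b.age}" }
  end
  report
end

# Hypothetical records standing in for the student file.
STUDENTS = [
  Student.new("Ann", "F", 22, "married"),
  Student.new("Bob", "M", 35, "single"),
  Student.new("Carl", "M", 28, "single"),
  Student.new("Dan", "M", 40, "married"),
]

puts summarize(STUDENTS)
```

Keying the counts on the [status, sex] pair replaces the nested IFs. Any gender other than M or F is simply counted under its own key and never reported.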
DrawTree(n, direction, length)
    if n > 0 do
        DrawTrunk(direction, length)
        DrawTree(n-1, 3DRandomAngle(direction), length*Factor(n))
        DrawTree(n-1, direction + random % 10, length*Factor(n))
        DrawTree(n-1, 3DRandomAngle(direction), length*Factor(n))
    else
        DrawLeaf()
    end if
end DrawTree
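Here is a runnable 2D Ruby sketch of the same recursion. DrawTrunk and DrawLeaf are replaced by collecting segments and leaf points. Several pieces are assumptions, not part of the original: the shrink factor, the angle ranges (in radians, standing in for 3DRandomAngle and random % 10) and all method names:

```ruby
Segment = Struct.new(:x1, :y1, :x2, :y2)

# Hypothetical shrink factor; the original leaves Factor(n) undefined.
def factor(_n)
  0.7
end

def draw_tree(n, x, y, direction, length, rng, trunks, leaves)
  if n > 0
    x2 = x + length * Math.cos(direction)
    y2 = y + length * Math.sin(direction)
    trunks << Segment.new(x, y, x2, y2)     # stands in for DrawTrunk
    shorter = length * factor(n)
    # wide random turn, small random turn, wide random turn (angles in radians)
    draw_tree(n - 1, x2, y2, direction + rng.rand(-0.6..0.6), shorter, rng, trunks, leaves)
    draw_tree(n - 1, x2, y2, direction + rng.rand(-0.17..0.17), shorter, rng, trunks, leaves)
    draw_tree(n - 1, x2, y2, direction + rng.rand(-0.6..0.6), shorter, rng, trunks, leaves)
  else
    leaves << [x, y]                        # stands in for DrawLeaf
  end
end

trunks, leaves = [], []
draw_tree(3, 0.0, 0.0, Math::PI / 2, 10.0, Random.new(42), trunks, leaves)
puts "#{trunks.size} trunks, #{leaves.size} leaves"
```

Each level calls itself three times, so depth n gives (3^n - 1) / 2 trunks and 3^n leaves. For depth 3 that is 13 trunks and 27 leaves.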