How to define ctag parser for custom file format - ctags

I regularly use a file format that doesn't have a parser for Ctags. I would like to write a parser for it, but I'm not sure how. The file format doesn't have keywords like a computer language does, but instead where you are in the file is dependent on the content of the last 10 columns of each line in the file. (Sorry, the ENDF format was created in the 1960s.)
How can I create a new parser that depends on the contents of a particular column?
Here is an abbreviated example of the file, but it still contains enough information to get the gist of what I'm trying to do:
MMMMFFTTT
33 856 176 17434 1451
34 2 155 17434 1451
34 51 115 17434 1451
0.000000+0 0.000000+0 0 0 0 07434 1 0
0.000000+0 0.000000+0 0 0 0 07434 0 0
7.418300+4 1.813790+2 0 0 1 07434 2151
7.418300+4 1.000000+0 0 0 2 07434 2151
1.000000-5 5.000000+3 1 7 0 17434 2151
0.000000+0 0.000000+0 0 3 5 07434 2151
0.000000+0 0.000000+0 2 0 24 47434 2151
7.418300+4 1.813790+2 0 0 0 07434 3 28
-7.222000+6-7.222000+6 0 0 1 397434 3 28
39 2 7434 3 28
7.261820+6 0.000000+0 9.300000+6 0.000000+0 9.600000+6 2.18585-137434 3 28
1.000000+7 5.01372-13 1.050000+7 1.32071-11 1.100000+7 8.70475-107434 3 28
0.000000+0 0.000000+0 0 0 0 07434 3 0
7.418300+4 1.813790+2 0 0 0 07434 3 37
-2.093600+7-2.093600+7 0 0 1 207434 3 37
2.105140+7 0.000000+0 2.200000+7 7.150990-5 2.400000+7 2.707920-27434 3 37
1.300000+8 5.411910-2 1.500000+8 3.895580-2 7434 3 37
0.000000+0 0.000000+0 0 0 0 07434 3 0
7.418300+4 1.813790+2 0 0 0 07434 3 41
-1.328500+7-1.328500+7 0 0 1 267434 3 41
26 2 7434 3 41
1.335820+7 0.000000+0 1.550000+7 0.000000+0 1.600000+7 2.56183-147434 3 41
1.700000+7 9.60380-12 1.800000+7 3.02742-10 1.900000+7 1.474340-77434 3 41
1.300000+8 1.582280-2 1.500000+8 1.154350-2 7434 3 41
I've labeled the columns MMMM, FF, and TT. When these change is when I need a "tag" (using the term loosely) to tell me that it has changed. Note, this is (kind of) nested in that, there are many TTs in each FF, and many FFs inside each MMMM.
I'm not sure what the tag output should look like. I've never even looked at the tag output; I've always relied on someone else to parse them for me. Please assist this novice as I try to learn.
I wrote a syntax parser for Vim several years ago and was hoping this might be a good addition.

My answer assumes you use Universal Ctags (https://ctags.io).
I assume you know the basic concepts of ctags: kinds and fields. If not, see https://docs.ctags.io/en/latest/man/ctags.1.html#tag-entries.
I also assume you know the output format of ctags. If not, see https://docs.ctags.io/en/latest/man/tags.5.html.
There are various ways to implement a parser in ctags. In this case, you will probably want to write the parser in C, in a line-oriented way.
33 856 176 17434 1451
34 2 155 17434 1451
...
You probably expect the 7434 on the first line to be tagged as a material (MMMM).
However, you probably do not expect the 7434 on the second line to be tagged again.
The parser must therefore be able to track the state of the input: it should not emit a tag for a name that it has already tagged.
This means you cannot define the parser for the language in your .ctags with regular expressions. You may have to write it in C.
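The state tracking described above can be sketched as follows. This is only a Python illustration of the logic, not the Universal Ctags C parser API. It assumes that each record ends with the nine identifier characters MMMMFFTTT, as labeled in the question, and that a value of 0 is an end marker that needs no tag. The name endf_tags and the kind letters m, f and t are chosen for this sketch only.

```python
def endf_tags(lines):
    """Yield (name, kind, line_number) each time MMMM, FF or TT changes.

    Assumption: the last 9 characters of a record are MMMMFFTTT.
    Assumption: a value of 0 marks the end of a block and gets no tag.
    """
    last_mat = last_mf = last_mt = None
    for lineno, line in enumerate(lines, start=1):
        line = line.rstrip("\n")
        if len(line) < 9:
            continue
        ident = line[-9:]
        mat = ident[0:4].strip()
        mf = ident[4:6].strip()
        mt = ident[6:9].strip()
        # Skip header lines or anything that is not a numeric identifier.
        if not (mat.isdigit() and mf.isdigit() and mt.isdigit()):
            continue
        if mat != last_mat:
            # A new material resets the nested state.
            last_mat, last_mf, last_mt = mat, None, None
            if mat != "0":
                yield (mat, "m", lineno)
        if mf != last_mf:
            last_mf, last_mt = mf, None
            if mf != "0":
                yield (mf, "f", lineno)
        if mt != last_mt:
            last_mt = mt
            if mt != "0":
                yield (mt, "t", lineno)
```

Because the function remembers the last value it saw at each level, a repeated 7434 on the following line produces no new tag, which is exactly the behaviour a stateless regular expression cannot provide.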
The input is line-oriented, so you can use the readLineFromInputFile function. It is the heart of a line-oriented parser.
https://github.com/masatake/ctags/commit/e8e0015393ae7a3b447ee886bd0884f45d11ced2 is a runnable example illustrating how to use readLineFromInputFile.
With the example, ctags emits the following tags output:
$ ctags --options=NONE --list-kinds=ENDF
m materials
f material files
t material subdivisions
$ ctags --options=NONE --sort=no -o - input.endf
434 input.endf /^ 33 856 176 17434 1451$/;" m
14 input.endf /^ 33 856 176 17434 1451$/;" f mat:434
51 input.endf /^ 33 856 176 17434 1451$/;" t mf:434 14
...

Related

Extract column from file with shell [duplicate]

I would like to extract column number 8 from the following table using shell (ash):
0xd024 2 0 32 20 3 0 1 0 2 1384 1692 -61 27694088
0xd028 0 1 5 11 1 0 46 0 0 301 187 -74 27689154
0xd02c 0 0 35 14 1 0 21 0 0 257 250 -80 27689410
0xd030 1 1 15 13 1 0 38 0 0 176 106 -91 27689666
0xd034 1 1 50 20 1 0 8 0 0 790 283 -71 27689980
0xd038 0 0 0 3 4 0 89 0 0 1633 390 -90 27690291
0xd03c 0 0 8 3 3 0 82 0 0 1837 184 -95 27690603
0xd040 0 0 4 5 1 0 90 0 0 0 148 -97 27690915
0xd064 0 0 36 9 1 0 29 0 0 321 111 -74 27691227
0xd068 0 0 5 14 14 0 40 0 0 8066 2270 -85 27691539
0xd06c 1 1 39 19 1 0 15 0 0 1342 261 -74 27691850
0xd070 0 0 12 11 1 0 53 0 0 203 174 -73 27692162
0xd074 0 0 18 2 1 0 75 0 0 301 277 -94 27692474
How can I do that?
The following command works fine: awk '{print $8}' file

Remove rows that have a specific numeric value in a field

I have a very bulky file about 1M lines like this:
4001 168991 11191 74554 60123 37667 125750 28474
8 145 25 101 83 51 124 43
2985 136287 4424 62832 50788 26847 89132 19184
3 129 14 101 88 61 83 32 1 14 10 12 7 13 4
6136 158525 14054 100072 134506 78254 146543 41638
1 40 4 14 19 10 35 4
2981 112734 7708 54280 50701 33795 75774 19046
7762 339477 26805 148550 155464 119060 254938 59592
1 22 2 12 10 6 17 2
6 136 16 118 184 85 112 56 1 28 1 5 18 25 40 2
1 26 2 19 28 6 18 3
4071 122584 14031 69911 75930 52394 89733 30088
1 9 1 3 4 3 11 2 14 314 32 206 253 105 284 66
I want to remove rows that have a value less than 100 in the second column.
How to do this with sed?
I would use awk to do this. Example:
awk ' $2 >= 100 ' file.txt
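This displays only the rows from file.txt whose second column ($2) is greater than or equal to 100. The file itself is not modified; redirect the output to a new file to save the result.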
Use the following approach:
sed '/^\w+\s+([0-9]{1,2}|[0][0-9]+)\b/d' -E /tmp/test.txt
(replace /tmp/test.txt with your current file path)
([0-9]{1,2}|[0][0-9]+) - matches either a one- or two-digit number (0 to 99) or a number written with a leading zero (e.g. 012, 00982)
d - delete the pattern space
-E (--regexp-extended) - use extended regular expressions rather than basic regular expressions
To remove the matched lines in place, use the -i option:
sed -i -E '/^\w+\s+([0-9]{1,2}|[0][0-9]+)\b/d' /tmp/test.txt
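The numeric comparison used in the awk answer can also be sketched in Python. This is a minimal illustration, assuming the second field is always an integer; the function name keep_rows is made up for the example, and lines with fewer than two fields are dropped as a choice of this sketch.

```python
def keep_rows(lines, minimum=100):
    """Return the lines whose second whitespace-separated field is >= minimum."""
    kept = []
    for line in lines:
        fields = line.split()
        # Compare numerically, so leading zeros do not change the value.
        if len(fields) >= 2 and int(fields[1]) >= minimum:
            kept.append(line)
    return kept
```

Comparing numbers rather than matching digit patterns means a value written as 0982 is treated as 982.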

How do I remove some lines from a text file in Ruby?

I want to keep 4 lines, remove the following 6 lines, and repeat this until the end of the file.
The lines that are not removed can be written to another file.
File to remove the lines from:
34
511
6977
511
0
22
20
8569
15
23
6466
390
1
54
9140
-100
0
12
10
5308
19
12
9240
442
1
46
433
55
File after removing the lines:
34
511
6977
511
6466
390
1
54
19
12
9240
442
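The basis for this is the each_with_index method:
# "input.txt" is a placeholder name for the file to read.
lines = File.readlines("input.txt")
lines.each_with_index do |line, i|
  # Keep the first 4 lines of every block of 10; skip the other 6.
  case (i % 10)
  when 0..3
    puts line
  end
end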
You can adapt that code to put the output somewhere else, like an additional array or what have you.
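As an illustration of putting the output somewhere else, here is the same index-modulo idea as a small Python sketch that collects the kept and the removed lines in two separate lists. The function name split_lines and its parameters are hypothetical.

```python
def split_lines(lines, keep=4, period=10):
    """Split lines into (kept, removed): keep the first `keep` of every `period` lines."""
    kept, removed = [], []
    for i, line in enumerate(lines):
        if i % period < keep:
            kept.append(line)
        else:
            removed.append(line)
    return kept, removed
```

Either list can then be written to its own file.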

how to add text to next line in tab separated file from other file?

I have a set of files containing tab-separated values. The value I want is on the third line from the end of each file. I extracted it with:
cat result1.tsv | tail -3 | head -1 > final1.tsv
cat result2.tsv | tail -3 | head -1 > final2.tsv
... and so on (I have about 30-40 files)
I want the content of each final*.tsv file on its own line in a single new file.
I tried
cat final1.tsv final2.tsv > final.tsv
but this only works for a small number of files; it is impractical to type the names of all the files.
I tried to put the file names in a loop as variables, but it did not work.
final1.tsv contains:
270 96 284 139 271 331 915 719 591 1679 1751 1490 968 1363 1513 1184 1525 490 839 425 967 855 356
final2.tsv contains:
1 1 0 2 6 5 1 1 11 7 1 3 4 1 0 3 2 1 0 3 2 1 28
All the files (final1.tsv, final2.tsv, final3.tsv, ...) contain the same number of columns but different values.
I want the row of each file merged into a new file, like this:
final.tsv
final1 270 96 284 139 271 331 915 719 591 1679 1751 1490 968 1363 1513 1184 1525 490 839 425 967 855 356
final2 1 1 0 2 6 5 1 1 11 7 1 3 4 1 0 3 2 1 0 3 2 1 28
final3 270 96 284 139 271 331 915 719 591 1679 1751 1490 968 1363 1513 1184 1525 490 839 425 967 855 356
final4 1 1 0 2 6 5 1 1 11 7 1 3 4 1 0 3 2 1 0 3 2 1 28
Here you go:
for f in final{1..4}.tsv
do
    # Write the file name without its .tsv extension, then a tab, then the file's content.
    printf '%s\t' "${f%.tsv}" >> final.tsv
    cat "$f" >> final.tsv
done
Try this:
rm -f final.tsv
for FILE in result*.tsv
do
    tail -n 3 "$FILE" | head -n 1 >> final.tsv
done
As long as the files aren't enormous, it's simplest to read each file into an array and select the third record from the end.
This solves your problem for you. It looks for all files in the current directory that match result*.tsv and writes the required line from each of them to final.tsv.
use strict;
use warnings 'all';
# Sort the input files by the first number in their names.
my @results = sort {
    my ($aa, $bb) = map /(\d+)/, ($a, $b);
    $aa <=> $bb;
} glob 'result*.tsv';
open my $out_fh, '>', 'final.tsv' or die qq{Unable to open "final.tsv" for output: $!};
for my $result_file ( @results ) {
    open my $fh, '<', $result_file or die qq{Unable to open "$result_file" for input: $!};
    my @data = <$fh>;
    next unless @data >= 3;
    my ($name) = $result_file =~ /([^.]+)/;
    print { $out_fh } "$name\t$data[-3]";
}

Maths in a while loop causing random negative numbers

So I have done this in both Python and Bash. The code I am about to post probably has a world of things wrong with it, but it is very basic and I cannot see why it would cause the 'bug' I explain below. My Python version is much cleaner, yet it shows the same error: at some point the maths generates a negative number, which makes no sense.
#!/bin/bash
while [ 1 ];
do
zero=0
ARRAY=()
ARRAY2=()
first=`command to generate a list of numbers`
sleep 1
second=`command to generate a list of numbers`
# so now we have two data sets, 1 second between the capture of each.
for i in $first;
do
ARRAY+=($i)
done
for i in $second;
do
ARRAY2+=($i)
done
for (( c=$zero; c<=${#ARRAY2[@]}; c++ ))
do
expr ${ARRAY2[$c]} - ${ARRAY[$c]}
done
ARRAY=()
ARRAY2=()
zero=0
c=0
first=``
second=``
math=''
done
So the script grabs a set of data, waits 1 second, grabs it again, and subtracts the first set from the second; the differences are printed. It's very simple, and I have done it elegantly in Python too. No matter how I do it, every now and then (anywhere from 3 to 30 loops in) we get negative numbers, like so:
START 0 0 0 0 0 19 10 563 0
-34 19 14 2 0
-1302 1198
-532 639
-1078 1119 1 0 0
-843 33 880 0 5
-8
-13508 8773 4541 988 181
-12
-205 217
-9 7 1
-360 303 60 1 0 0
-12
-96 98 3
-870 904
-130
-2105 2264 6
-3084 1576 1650
-939 971
-2249 1150 1281
-693 9 513 142 76 expr: syntax error
Please help, I simply can't find anything about this.
Sample OUTPUT as requested:
ARRAY1 OUTPUT
1 15 1 25 25 1 2 1 3541 853 94567 42 5 1 351 51 1 11 1 13 7 14 12 3999 983 5 1938 3 8287 40 1 1 1 5253 706 1 1 1 1 5717 3 50 1 85 100376 17334 4655 1 1345 2 1 16 1777 1 3 38 23 8 32 47 781 947 1 1 206 9 1 3 2 81 2602 7 158 1 1 43 91 1 120 6589 6 2534 1092 1 6014 7 2 2 37 1 1 1 80 2 1 1270 15448 66 1 10238 1 10794 16061 4 1 1 1 9754 5617 1123 926 3 24 10 16
ARRAY2 OUTPUT
1 15 1 25 25 1 2 1 3555 859 95043 42 5 1 355 55 1 11 1 13 7 14 12 4015 987 5 1938 3 8335 40 1 1 1 5280 706 1 1 1 1 5733 3 50 1 85 100877 17396 4691 1 1353 2 1 16 1782 1 3 38 23 8 32 47 787 947 1 1 206 9 1 3 2 81 2602 7 159 1 1 43 91 1 120 6869 6 2534 1092 1 6044 7 2 2 37 1 1 1 80 2 1 1270 15563 66 1 10293 1 10804 16134 4 1 1 1 9755 5633 1135 928 3 24 10 16
START
The answer lies in Russell Uhl's comment above. Your loop runs one time too many (this is your code):
for (( c=$zero; c<=${#ARRAY2[@]}; c++ ))
do
expr ${ARRAY2[$c]} - ${ARRAY[$c]}
done
To fix it, you need to change the test condition from c <= ${#ARRAY2[@]} to c < ${#ARRAY2[@]}:
for (( c=$zero; c < ${#ARRAY2[@]}; c++ ))
do
echo $((${ARRAY2[$c]} - ${ARRAY[$c]}))
done
I've also changed the expr to use arithmetic evaluation builtin $((...)).
The test script (sum.sh):
#!/bin/bash
zero=0
ARRAY=()
ARRAY2=()
first="1 15 1 25 25 1 2 1 3541 853 94567 42 5 1 351 51 1 11 1 13 7 14 12 3999 983 5 1938 3 8287 40 1 1 1 5253 706 1 1 1 1 5717 3 50 1 85 100376 17334 4655 1 1345 2 1 16 1777 1 3 38 23 8 32 47 7
second="1 15 1 25 25 1 2 1 3555 859 95043 42 5 1 355 55 1 11 1 13 7 14 12 4015 987 5 1938 3 8335 40 1 1 1 5280 706 1 1 1 1 5733 3 50 1 85 100877 17396 4691 1 1353 2 1 16 1782 1 3 38 23 8 32 47
for i in $first; do
ARRAY+=($i)
done
# Alternately as chepner suggested:
ARRAY2=($second)
for (( c=$zero; c < ${#ARRAY2[@]}; c++ )); do
echo -n $((${ARRAY2[$c]} - ${ARRAY[$c]})) " "
done
Running it:
samveen@precise:/tmp$ echo $BASH_VERSION
4.2.25(1)-release
samveen@precise:/tmp$ bash sum.sh
0 0 0 0 0 0 0 0 14 6 476 0 0 0 4 4 0 0 0 0 0 0 0 16 4 0 0 0 48 0 0 0 0 27 0 0 0 0 0 16 0 0 0 0 501 62 36 0 8 0 0 0 5 0 0 0 0 0 0 0 6 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 280 0 0 0 0 30 0 0 0 0 0 0 0 0 0 0 0 115 0 0 55 0 10 73 0 0 0 0 1 16 12 2 0 0 0 0
EDIT:
* Added improvements from suggestions in comments.
I think the problem has to be when the two arrays don't have the same size. It's easy to reproduce that syntax error -- one of the operands for the minus operator is an empty string:
$ a=5; b=3; expr $a - $b
2
$ a=""; b=3; expr $a - $b
expr: syntax error
$ a=5; b=""; expr $a - $b
expr: syntax error
$ a=""; b=""; expr $a - $b
-
Try
ARRAY=( $(command to generate a list of numbers) )
sleep 1
ARRAY2=( $(command to generate a list of numbers) )
if (( ${#ARRAY[@]} != ${#ARRAY2[@]} )); then
    echo "error: different size arrays!"
    echo "ARRAY: ${#ARRAY[@]} (${ARRAY[*]})"
    echo "ARRAY2: ${#ARRAY2[@]} (${ARRAY2[*]})"
fi
"The error occurs whenever the first array is smaller than the second" -- of course. You're looping from 0 to the array size of ARRAY2. When ARRAY has fewer elements, you'll eventually try to access an index that does not exist in the array. When you try to reference an unset variable, bash gives you the empty string.
$ a=(1 2 3)
$ b=(4 5 6 7)
$ i=2; expr ${a[i]} - ${b[i]}
-3
$ i=3; expr ${a[i]} - ${b[i]}
expr: syntax error
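Since the question mentions a Python version with the same symptom, here is a minimal Python sketch of the size check suggested above. It assumes both samples are already parsed into lists of integers; the function name differences is made up for the example.

```python
def differences(first, second):
    """Return second[i] - first[i] for every index, refusing mismatched samples."""
    if len(first) != len(second):
        # The two captures no longer line up, so a pairwise
        # subtraction would compare unrelated entries.
        raise ValueError(
            f"different size samples: {len(first)} vs {len(second)}"
        )
    return [b - a for a, b in zip(first, second)]
```

Raising an error when the sizes differ makes the mismatch visible instead of letting it show up later as odd values in the output.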

Resources