Given a number of JSON files in a directory, I need to extract the names of those that contain, in a line labeled "keyword":, a combination of words that matches a Boolean query.
For example:
file 01 contains the line:
...
"keyword": "Alabama", "Washington"
...
file 02 contains the line:
...
"keyword": "Washington", "Pennsylvania"
...
this is how the search should work:
$ query "Washington"
01
02
$ query "Washington & !Alabama"
02
$ query "Alabama | Pennsylvania"
01
02
and so on, for any Boolean combination of individual keys, say
query "(Alabama & !California) | Maine"
grep "keyword" <directory> will extract the lines of interest, and the rest can be pipelined from there, but grep's implementation of logical operators differs a lot from Boolean expressions, and awk, while closer in its syntax, doesn't seem to work for me (maybe I haven't dedicated enough time to it).
What would be a good candidate for implementing this functionality, and how can such an intuitive query syntax be translated into it?
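One possible approach, sketched in Python under some assumptions: collect the quoted values from each "keyword": line into a set, then evaluate the query against that set with a small recursive-descent parser. The names keywords_in, evaluate and query_dir are hypothetical, keywords are assumed to be single words without spaces, and the precedence chosen here (! binds tighter than &, which binds tighter than |) is an assumption, not something the question specifies.

```python
import re
from pathlib import Path

# A token is one of ( ) & | ! or a run of other non-space characters.
TOKEN = re.compile(r"\s*([()&|!]|[^\s()&|!]+)")


def keywords_in(text):
    """Collect the quoted values on every line that carries a "keyword": label."""
    found = set()
    for line in text.splitlines():
        _, label, rest = line.partition('"keyword":')
        if label:
            found.update(re.findall(r'"([^"]*)"', rest))
    return found


def evaluate(query, keywords):
    """Return True if the set `keywords` satisfies the Boolean `query`."""
    tokens = TOKEN.findall(query)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        pos += 1
        return tokens[pos - 1]

    def parse_or():
        value = parse_and()
        while peek() == "|":
            take()
            rhs = parse_and()  # always parse the right side, even if value is True
            value = value or rhs
        return value

    def parse_and():
        value = parse_not()
        while peek() == "&":
            take()
            rhs = parse_not()
            value = value and rhs
        return value

    def parse_not():
        if peek() == "!":
            take()
            return not parse_not()
        return parse_atom()

    def parse_atom():
        tok = peek()
        if tok is None or tok in {"&", "|", ")"}:
            raise ValueError(f"unexpected token {tok!r} in query {query!r}")
        take()
        if tok == "(":
            value = parse_or()
            if peek() != ")":
                raise ValueError(f"missing ')' in query {query!r}")
            take()
            return value
        return tok in keywords  # a bare word is true if the file has that keyword

    result = parse_or()
    if peek() is not None:
        raise ValueError(f"unexpected token {peek()!r} in query {query!r}")
    return result


def query_dir(query, directory):
    """Return the sorted base names of the files in `directory` that match `query`."""
    return sorted(
        path.stem
        for path in Path(directory).iterdir()
        if path.is_file() and evaluate(query, keywords_in(path.read_text()))
    )
```

With the two example files above, evaluate("Washington & !Alabama", ...) is true only for the keyword set of file 02, and query_dir(query, directory) would list the matching file names.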
I have two requirements:
I must concatenate some fields from a file in a Cobol program. How they are concatenated depends on the value of one of those fields. The concatenated fields must be written to a new file.
I must then sort this new file with a sort utility invoked by JCL.
The Issue
I need to sort the same file under two conditions. I have tried IFTHEN with OUTREC BUILD. How can I sort it in one pass?
Here is a source-code example :
ID DIVISION.
PROGRAM-ID. FOO.
ENVIRONMENT DIVISION.
INPUT-OUTPUT SECTION.
FILE-CONTROL.
Select Infil assign to inp001.
Select Outfil assign to out001.
DATA DIVISION.
FILE SECTION.
FD Infil.
01 Main.
03 A.
05 ws-Pc Pic x(1).
05 filler Pic x(5).
03 B Pic x(4).
03 C Pic x(4).
03 D Pic 9(13)V99.
03 E Pic x(13).
FD Outfil.
01 Temp Pic x(42).
WORKING-STORAGE SECTION.
01 file-flag Pic x(01).
88 file-end value 'Y'.
88 not-file-end value 'N'.
PROCEDURE DIVISION.
Open input Infil
Open output Outfil
read Infil
at end
set file-end to true
not at end
set not-file-end to true
end-read
Perform until file-end
If ws-Pc = 3
String A B C Delimited by size
into Temp
End-String
Else
String A B C E Delimited by size
into Temp
End-String
End-if
Write Temp
read Infil
at end
Set file-end to true
end-read
end-Perform.
end program foo.
Here is the logic I need for the sort utility :
If ws-Pc=3
Sort(fieldA,fieldB)
Else
Sort(fieldA,fieldB,fieldE)
End-if.
I can propose three variations:
If you are sure that the last 13 characters are the same (spaces, for instance) when there is no field E (ws-Pc equal to 3), this SYSIN is enough:
SORT FIELDS=(1,6,CH,A,7,4,CH,A,15,13,CH,A),EQUALS
Thanks to EQUALS, records with equal keys keep their relative input order. When ws-Pc = 3, the records are effectively sorted on fields A and B only: field E has no influence because it is the same for all of them.
If you are not sure that the last 13 characters are the same, you can blank them yourself:
INREC IFTHEN=(WHEN=(1,1,CH,EQ,C'3'),BUILD=(1,14,13X))
SORT FIELDS=(1,6,CH,A,7,4,CH,A,15,13,CH,A),EQUALS
This forces the last 13 characters to spaces.
If you don't want to use EQUALS, you can create your own ordering by writing a sequence number into field E when it is unused. You then have to remove it after the sort.
INREC IFTHEN=(WHEN=(1,1,CH,EQ,C'3'),BUILD=(1,14,SEQNUM,13,ZD))
SORT FIELDS=(1,6,CH,A,7,4,CH,A,15,13,CH,A)
OUTREC IFTHEN=(WHEN=(1,1,CH,EQ,C'3'),BUILD=(1,14,13X))
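To check the expected ordering outside the mainframe, the same key logic can be sketched in Python. This is only an illustration, assuming 27-character records laid out as A (6), B (4), C (4), E (13); the names sort_key and sort_records are hypothetical, and Python compares by Unicode code point rather than in EBCDIC, so the order of mixed letters and digits can differ from the sort utility.

```python
def sort_key(record):
    """Key (A, B, E), with E blanked when the first character (ws-Pc) is '3'."""
    a, b, e = record[0:6], record[6:10], record[14:27]
    if record[0] == "3":
        e = " " * 13  # E is unused for these records, so it must not affect the order
    return (a, b, e)


def sort_records(records):
    # sorted() is stable: records with equal keys keep their input order,
    # which plays the role of EQUALS in the SORT statement.
    return sorted(records, key=sort_key)
```

Records starting with '3' that share the same A and B come out in their input order, while the others are additionally ordered by E.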
I have a file with several lines of data. The fields are not always in the same position/column. I want to search for 2 strings and then show only the field and the data that follows. For example:
{"id":"1111","name":"2222","versionCurrent":"3333","hwVersion":"4444"}
{"id":"5555","name":"6666","hwVersion":"7777"}
I would like to return the following:
"id":"1111","hwVersion":"4444"
"id":"5555","hwVersion":"7777"
I am struggling because the data isn't always in the same position, so I can't choose a column number. I feel I need to search for "id" and "hwVersion". Any help is GREATLY appreciated.
Totally agree with @KamilCuk. More specifically:
jq -c '{id: .id, hwVersion: .hwVersion}' <<< '{"id":"1111","name":"2222","versionCurrent":"3333","hwVersion":"4444"}'
Outputs:
{"id":"1111","hwVersion":"4444"}
Not quite the specified output, but valid JSON.
More to the point, your input should probably be processed record by record, and my guess is that a two column output with "id" and "hwVersion" would be even easier to parse:
cat << EOF | jq -j '"\(.id)\t\(.hwVersion)\n"'
{"id":"1111","name":"2222","versionCurrent":"3333","hwVersion":"4444"}
{"id":"5555","name":"6666","hwVersion":"7777"}
EOF
Outputs:
1111 4444
5555 7777
Since the data looks like a mapping and even conforms to the JSON format, something like this should do, if you don't mind using Python (which comes with JSON support):
import json
def get_id_hw(s):
    d = json.loads(s)
    return '"id":"{}","hwVersion":"{}"'.format(d["id"], d["hwVersion"])
We take a line of input as the string s and parse it as JSON into a dictionary d. Then we return a formatted string containing the double-quoted key names id and hwVersion, each followed by a colon and the double-quoted value of the corresponding key from that dict.
We can try this with these test input strings and prints:
# These will be our test inputs.
s1 = '{"id":"1111","name":"2222","versionCurrent":"3333","hwVersion":"4444"}'
s2 = '{"id":"5555","name":"6666","hwVersion":"7777"}'
# we pass and print them here
print(get_id_hw(s1))
print(get_id_hw(s2))
But we can just as well iterate over lines of any input.
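Iterating over lines could look like the sketch below. The helper name extract_all is hypothetical; it accepts any iterable of lines (a list, an open file, or sys.stdin) and skips blank lines, which is an assumption about the input rather than something stated in the question.

```python
import json


def get_id_hw(s):
    # Parse one JSON object per line and format the two fields of interest.
    d = json.loads(s)
    return '"id":"{}","hwVersion":"{}"'.format(d["id"], d["hwVersion"])


def extract_all(lines):
    """Yield the formatted id/hwVersion pair for every non-blank line."""
    for line in lines:
        line = line.strip()
        if line:
            yield get_id_hw(line)


sample = [
    '{"id":"1111","name":"2222","versionCurrent":"3333","hwVersion":"4444"}',
    '{"id":"5555","name":"6666","hwVersion":"7777"}',
]
for result in extract_all(sample):
    print(result)
```

To process a real file, pass the open file object instead of the sample list, for example extract_all(open("/your/file")).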
If you really wanted to use awk, you could (gensub requires GNU awk), but it's not the most robust or suitable tool:
awk '{ i = gensub(/.*"id":"([0-9]+)".*/, "\\1", "g")
h = gensub(/.*"hwVersion":"([0-9]+)".*/, "\\1", "g")
printf("\"id\":\"%s\",\"hwVersion\":\"%s\"\n", i, h) }' /your/file
Since you mention that the position is not known, and assuming the fields can be in any order, we use one regex to extract id and another to get hwVersion, then we print them in the given format. If the values could be something other than decimal digits as in your example, the [0-9]+ part would need to reflect that.
And for the fun of it (this preserves the order of the entries from the file), in sed:
sed -e 's#.*\("\(id\|hwVersion\)":"[0-9]\+"\).*\("\(id\|hwVersion\)":"[0-9]\+"\).*#\1,\3#' file
It looks for two groups of "id" or "hwVersion" followed by :"<DECIMAL_DIGITS>".
I have log input lines.
I want my filter to keep only the lines that contain the word "Add"
(this word can be anywhere in the line)
and extract some values from each line
to get something like this (in JSON format):
Action: Add, val1: 12, val2: 15
Action: Add, val1: 11, val2: 12
from these input lines:
ifoeife, Add, val1:12, val2:15
eife, frfr, 90088, Add, val1:11, val2:12
eife, val1:11, val2:12
[val1, val2, action are indexes]
Well. You can use Grok filter. It is possible to create some kind of complicated pattern or just use Regular expressions as described here: https://www.elastic.co/guide/en/logstash/current/plugins-filters-grok.html#_regular_expressions
Regexp for your line would be something like
[a-zA-Z0-9,]*?(Add|SomeOtherPossibleAction), (val1:\d+), (val2:\d+)
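Before putting such a pattern into a grok filter, it can be checked outside Logstash. The following Python sketch is only a way to test the regular expression against the sample lines; it is not Logstash configuration, the name parse_line is hypothetical, and it assumes "Add" is the only action of interest.

```python
import re

# Same idea as the pattern above: the action, then val1 and val2 as digits.
LINE = re.compile(r"\b(Add), val1:(\d+), val2:(\d+)")


def parse_line(line):
    """Return the extracted fields, or None if the line has no Add action."""
    m = LINE.search(line)
    if m is None:
        return None
    return {"Action": m.group(1), "val1": int(m.group(2)), "val2": int(m.group(3))}


lines = [
    "ifoeife, Add, val1:12, val2:15",
    "eife, frfr, 90088, Add, val1:11, val2:12",
    "eife, val1:11, val2:12",
]
for line in lines:
    fields = parse_line(line)
    if fields is not None:  # lines without "Add" are dropped
        print(fields)
```

Only the first two sample lines produce a result; the third has no "Add" and is filtered out.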
Forgive me, I am very new to Ruby and relatively new to programming in general. My problem is probably not hard, but I have googled until my fingers bled looking for a solution and I just can't get it.
I have a line of text that looks like this:
6 19 11 28 22 localhost G6UI ip0 cameraLink cameraLinkMissingScans 15116
After all is said and done, I want it to look like this:
6.19.2014,11.28.22,localhost,G6UI,ip0,cameraLink cameraLinkMissingScans 15116
I have accomplished this in Bash (I am essentially just making a CSV file, with the time and date formatted the way I want it) but, for reasons too lengthy to explain, I'd like to do it with Ruby.
I have a start, although it's probably a bit sad:
myLineOfText.sub!(/[^-a-zA-Z0-9]/,'\1.\2.')
Which gives me this:
6..19 11 28 22 localhost G6UI ip0 cameraLink cameraLinkMissingScans 15116
Any help would be greatly appreciated, I just need something to get me started.
Thanks in advance.
If you can be sure the format always remains the same, you can do:
str.sub!(/(\d+) (\d+) (\d+)/,'\1.\2.\3').gsub!(/ /,',')
Example:
str='6 19 11 28 22 localhost G6UI ip0 cameraLink cameraLinkMissingScans 15116'
str.sub!(/(\d+) (\d+) (\d+)/,'\1.\2.\3').gsub!(/ /,',')
puts str
=> "6.19.11,28,22,localhost,G6UI,ip0,cameraLink,cameraLinkMissingScans,15116"
With questions such as this one, the answer depends on what is fixed and what is variable in the data's format. I have assumed:
there are at least nine substrings separated with spaces
substrings 0 and 1 (base 0) correspond to the month and day, and are to be combined with the literal "2014" to form a date of the form m.d.2014, followed by ','
substrings 2-4 are to be joined with '.' and followed with ','
substrings 5-7 are to be joined with ',' and followed with ','
the remainder of the substrings are to be joined with a space
I don't think a regex is the right tool for the formatting; rather, just split the string on spaces and form the new string with a series of Array#join calls, combining the resulting substrings in the obvious way:
s = "6 19 11 28 22 localhost G6UI ip0 cameraLink cameraLinkMissingScans 15116"
a = s.split(' ')
#=> ["6", "19", "11", "28", "22", "localhost", "G6UI", "ip0", "cameraLink",
# "cameraLinkMissingScans", "15116"]
a[0]+'.'+a[1]+'.2014,'+a[2..4].join('.')+','+a[5..7].join(',')+','+
a[8..-1].join(' ')
#=> "6.19.2014,11.28.22,localhost,G6UI,ip0,cameraLink cameraLinkMissingScans 15116"
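For comparison, the same split-and-join idea can be sketched in Python under the assumptions listed above. The function name to_csv_line and the year parameter are hypothetical; the year is not present in the input line, so it has to be supplied.

```python
def to_csv_line(line, year="2014"):
    """Rebuild the line as date,time,host fields,remainder."""
    parts = line.split()
    date = ".".join(parts[0:2] + [year])  # month.day.year
    time = ".".join(parts[2:5])           # hour.minute.second
    host = ",".join(parts[5:8])           # three comma-separated fields
    rest = " ".join(parts[8:])            # the remainder keeps its spaces
    return ",".join([date, time, host, rest])


s = "6 19 11 28 22 localhost G6UI ip0 cameraLink cameraLinkMissingScans 15116"
print(to_csv_line(s))
```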
Suppose I am in the directory A/B/C/prop and need to check the files e0 and e1 out of 100 other files. In these two files I have the following entries:
$DBConnection_target=targetname
$DBConnection_source1=sourcename
I need to change only targetname and sourcename; the replacement strings will be read from the keyboard (read).
These strings may occur more than 2-3 times.
You can use sed to perform the replacements in the two files, e0 and e1, as shown below:
# set what you want the source and target to be changed to here:
newSource=foo
newTarget=bar
sed -i 's/\($DBConnection_target\)=.*$/\1='"$newTarget"'/;s/\($DBConnection_source1\)=.*$/\1='"$newSource"'/' e0 e1
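If a script is preferred over a one-liner, the same substitution can be sketched in Python. The function name replace_connections is hypothetical; like the sed expressions, it replaces everything after the = on every line that contains one of the two variable names.

```python
import re


def replace_connections(text, new_target, new_source):
    """Rewrite the values of $DBConnection_target and $DBConnection_source1."""
    replacements = (
        ("$DBConnection_target", new_target),
        ("$DBConnection_source1", new_source),
    )
    for name, value in replacements:
        # re.escape makes the leading $ literal; .*$ stops at the end of each line.
        pattern = re.compile("(" + re.escape(name) + ")=.*$", re.MULTILINE)
        # A function replacement avoids backslash handling in the new value.
        text = pattern.sub(lambda m, v=value: m.group(1) + "=" + v, text)
    return text
```

The two new values could be read from the keyboard with input() and the function applied to the contents of e0 and e1 before writing them back.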