Search-For Utility Mainframe Algorithm - algorithm

Can someone please give me some pointers on how the IBM mainframe Search-For Utility algorithm works?
How does it compare strings? What kind of matching algorithm does it use? How should I enter different strings in order to make as few comparisons as possible?
I am using the utility but I do not know how it works, and I believe I am not using it as well as I should.
Thank you very much for your help!

Think of it as a very dumb search.
It doesn't accept a regular expression or anything like that. I don't think anyone will be able to tell you what algorithm is used.
Search-For uses the SuperC program to actually perform the search. What it appears to do is search line by line for a match to the string you provided. So if I do a search for:
'PIC 9(9)'
I am going to get back results for every line that has that string in it. The only way to bring back fewer search results would be to add more to that string. So maybe search for:
'PIC 9(9).' 'PIC 9(9) VALUE' 'PIC 9(9) COMP'
Any of these three would return fewer results than the first search. However, if the string is broken across lines, like:
05 WS-SOME-VARIABLE PIC 9(9)
VALUE 123456.
a search for 'PIC 9(9) VALUE' will not return anything, but a search for 'PIC 9(9)' would.
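To make the line-break behaviour concrete, here is a minimal Python sketch of a plain line-by-line substring search. It only mimics the behaviour described above; it is not SuperC's actual algorithm, and the function name search_lines is made up for the illustration.

```python
def search_lines(lines, needle):
    """Return (line_number, line) pairs for lines that contain needle."""
    # Each line is checked on its own, so a string that is split
    # across two lines can never match.
    return [(n, line) for n, line in enumerate(lines, start=1) if needle in line]


source = [
    "05 WS-SOME-VARIABLE PIC 9(9)",
    "VALUE 123456.",
]

print(search_lines(source, "PIC 9(9)"))        # [(1, '05 WS-SOME-VARIABLE PIC 9(9)')]
print(search_lines(source, "PIC 9(9) VALUE"))  # []
```

The longer string finds nothing because no single line contains all of it.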
The more specific you are, the fewer search results you will get back. Depending on what you are looking for, you may be able to get better results by using Search-For in batch, or using File-Aid instead. Every specific scenario is different. So without knowing exactly what you are searching for and what your requirement is, it's hard to tell you how to proceed.

You might consider IBM Developer for z, which can do regular-expression-based searches. When the Remote Systems Explorer Daemon (RSED) is set up and running on the z/OS LPAR, you can search across a single PDS or groups of PDSs using IDz filters. Very powerful. It also searches in the background so you can do other tasks while it searches. The searches can be saved for future ease of reference.

Related

Parsing STDF Files to Compare results

I am new to this site and I would like to get some input regarding parsing STDF files. Generally speaking, I am trying to parse an STDF file to gather only the results (numbers) and not the rest of the line. If I am able to achieve this, I would then like to compare all the numbers together through a bubble sort or insertion sort and see if any numbers are equal to each other. I am capable of doing this in C/C++ and Java but I have no experience parsing documents using scripts.
Could anyone push me in the right direction? What should I be reading to learn my way around this?
Are you already using an STDF library?
You did not mention one, so I assume not.
You should find a library you are comfortable with (the list changes over time, but you can find some by Googling or looking at the STDF page on Wikipedia) rather than attempting to parse STDF yourself, unless you have a good reason to recreate the STDF parser wheel.
An STDF file contains many tests. It generally does not make sense to compare the results for different tests, so I assume you are looking for matching values within the set of results for each test.
I would use your chosen STDF parser to read the value of each test for each part. Keep a set of the results for each test. As you read each new result, check the set to see if it already exists. If it does, you have found the case you were looking for; otherwise add the result to the set.
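A rough Python sketch of that set-per-test idea follows. It assumes your STDF library has already given you (test number, result) pairs; the records list and the function name find_repeated_results are invented for the example and are not part of any STDF library.

```python
from collections import defaultdict


def find_repeated_results(records):
    """Return (test_id, result) pairs whose result was already seen for that test."""
    seen = defaultdict(set)  # test_id -> set of results read so far
    repeats = []
    for test_id, result in records:
        if result in seen[test_id]:
            repeats.append((test_id, result))
        else:
            seen[test_id].add(result)
    return repeats


# Hypothetical data standing in for what the parser would yield.
records = [(1000, 0.52), (1001, 3.3), (1000, 0.47), (1000, 0.52), (1001, 0.52)]
print(find_repeated_results(records))  # [(1000, 0.52)]
```

No sorting is needed, since each set lookup is a constant-time check on average. Note that this compares floating-point values exactly; if "equal" should mean "equal within a tolerance", round the results before adding them to the set.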

Can't figure out how to search LOINC using FHIR for a specific test by name?

Can anyone provide some insight on the required syntax to use to search LOINC using FHIR for a specific string in the labs descriptive text portion of an Observation resource?
Is this even possible?
The documentation is all over the place and I can't find an example for this generic kind of search.
I found similar examples here: https://www.hl7.org/fhir/2015Sep/valueset-operations.html
Such as: GET "[base]/ValueSet/23/$validate-code?system=http://loinc.org&code=1963-8&display=test"
But none of them are providing a general enough case to do a global search of the LOINC system for a specific string in an Observation resource.
None of my attempts to use the FHIR UI here, http://polaris.i3l.gatech.edu:8080/gt-fhir-webapp/search?serverId=gatechreadonly&resource=Observation , have been successful. I keep getting a 500 Internal Server Error because I don't know the correct syntax to use for the value part of the search, and I can't find any documentation out of all the copious documents online that explains this very simple concept.
Can anyone provide some insight?
Totally frustrated at this point.
Observation?code=12345-6
or
Observation?code=http://loinc.org|12345-6
where 12345-6 is whatever LOINC code you want to look for (e.g. 39802-4)
The second ensures you'll only match on LOINC codes as opposed to codes from other systems, though given the relatively unique format of LOINC codes, you're mostly safe without including that.
If you want to search for a set of codes, then you can separate the codes or the tuples with commas: E.g.
Observation?code=12345-6,12345-7
or
Observation?code=http://loinc.org|12345-6,http://loinc.org|12345-7
If you expect to search by a really long list of codes frequently, you can define a value set that includes all the desired codes and then filter by value set:
Observation?code:in=http://somwhere.org/whatever/ValueSet/123
Note: for readability, I haven't escaped the URL contents, but you'll need to escape the URL values appropriately.
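As a sketch of the escaping step, Python's standard urllib.parse.urlencode can build the query string. The base URL http://example.org/fhir and the helper name observation_search_url are placeholders of mine, not a real server or a FHIR client API.

```python
from urllib.parse import urlencode

BASE = "http://example.org/fhir"  # hypothetical server base URL


def observation_search_url(base, codes, system=None):
    """Build an escaped Observation search URL for one or more codes."""
    # Prefix each code with "system|" when a code system is given.
    tokens = [f"{system}|{code}" if system else code for code in codes]
    # urlencode percent-encodes ':', '/', '|' and ',' in the value.
    return f"{base}/Observation?" + urlencode({"code": ",".join(tokens)})


print(observation_search_url(BASE, ["12345-6"], system="http://loinc.org"))
# http://example.org/fhir/Observation?code=http%3A%2F%2Floinc.org%7C12345-6
```

Any HTTP client that takes the query parameters separately will normally do this encoding for you.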

to_tsquery() validation

I'm currently developing a website that allows a search on a PostgreSQL database. The search works with to_tsquery() and I'm trying to find a way to validate the input before it's sent as a query.
Other than that, I'm also trying to add a phrasing capability, so that if someone searches for HELLO | "I LIKE CATS" it will only find results with "hello" or the entire phrase "i like cats" (as opposed to I & LIKE & CATS, which will find articles that have all 3 words, regardless of where they might appear).
Is there some reason why it's too expensive to let the DB server validate it? It does seem a bit excessive to duplicate the to_tsquery parsing algorithm in the client.
If the concern is that you don't want it to try running the whole query (which presumably will involve table access) each time it validates, you could use the input in a smaller query, just in pseudocode (which may look a bit like Python, but that's just coincidence):
is_valid_query(input):
    try:
        # Only parses the input; no table access is involved
        execute("SELECT to_tsquery($1)", input)
        return True
    except DatabaseError:
        return False
With regard to phrasing, it's probably easiest to search by the non-phrased query first (using indexes), then filter those for having the phrase. That could be done server side or client side. Depending on the language being parsed, it might be easiest to construct a simple regex of the phrase that deals with repeated whitespace or other ignorable symbols.
Search for to_tsquery('HELLO|(I&LIKE&CATS)'), getting back a list of documents which loosely match.
In the client, filter that to those matching the regex "HELLO|(I\s+LIKE\s+CATS)".
The downside is you do need some additional code for translating your query into the appropriate looser query, and then for translating it into a regex.
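For the client-side half, a minimal Python sketch of turning the terms into such a regex might look like this. The helper names phrase_pattern and build_filter are made up for the example, and it only handles a flat OR of words and phrases, not the full to_tsquery syntax.

```python
import re


def phrase_pattern(phrase):
    """Regex source matching the words of phrase separated by any whitespace."""
    words = [re.escape(word) for word in phrase.split()]
    return r"\b" + r"\s+".join(words) + r"\b"


def build_filter(terms):
    """Compile one case-insensitive regex that matches any of the terms."""
    alternatives = (f"(?:{phrase_pattern(term)})" for term in terms)
    return re.compile("|".join(alternatives), re.IGNORECASE)


# Pretend these came back from the looser to_tsquery search.
candidates = ["I   like\ncats a lot", "Cats like what I like", "hello world"]
matcher = build_filter(["HELLO", "I LIKE CATS"])
print([doc for doc in candidates if matcher.search(doc)])
# ['I   like\ncats a lot', 'hello world']
```

The second candidate contains all three words but not the phrase, so it is dropped.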
Finally, there might be a technique in PostgreSQL to do proper phrase searching using the lexeme positions that are stored in tsvector values. I'm guessing that phrase searches are one of the intended uses, but I couldn't find an example of it in my cursory search. There's a section on it near the bottom of http://linuxgazette.net/164/sephton.html at least.

How does spell checker and spell fixer of Google (or any search engine) work?

When searching for something in Google, if you misspell a word (may be by mistake or may be when you really mean this non-dictionary word), Google says:
"Showing results for ..... Search instead for .......".
I am trying to figure out how this would work.
This basically means being able to find the closest dictionary word to the non-dictionary word entered. How does it work? One way I can guess is:
count the number of instances of each character and then scan the dictionary to find a word with the same number of instances of each character (allowing only a +-1 difference). But this will also return anagrams.
Is some kind of probabilistic model, such as a Markov model, of any use here? I don't understand Markov well enough to throw it around, so this is just a very wild guess.
Any insights?
You're forgetting that Google has a lot more information available to it than you do. They track when people type in a word, don't select a result, and then do another search shortly afterwards. They then use this information to suggest better searches for you.
See How does the Google "Did you mean?" Algorithm work? for a fuller explanation.
Note that this approach makes sense when you consider that Google aren't actually doing spell-checking. Instead, they are trying to work out what search term will give you the answer you are looking for. Obviously there is a lot of overlap between this and spell-checking, but it means they are not always trying to correct a search for, e.g., "Flickr".
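As a toy illustration of that idea (certainly not Google's actual implementation), the Python sketch below counts which query people most often switch to after a search that got no click. The session format, the rule that the follow-up search must get a click, and the name build_suggestions are all assumptions made for the example.

```python
from collections import Counter, defaultdict


def build_suggestions(sessions):
    """Map each abandoned query to its most common successful follow-up."""
    follow_ups = defaultdict(Counter)
    for session in sessions:
        # Look at each pair of consecutive searches in a session.
        for (first, first_clicked), (second, second_clicked) in zip(session, session[1:]):
            if not first_clicked and second_clicked and first != second:
                follow_ups[first][second] += 1
    return {query: counts.most_common(1)[0][0] for query, counts in follow_ups.items()}


# Each session is a time-ordered list of (query, clicked_a_result) pairs.
sessions = [
    [("flikr", False), ("flickr", True)],
    [("flikr", False), ("flickr", True)],
    [("flikr", False), ("flicker", True)],
    [("flickr", True)],
]
print(build_suggestions(sessions))  # {'flikr': 'flickr'}
```

"flickr" never gets a suggestion here, because people who search for it click a result, which is how a non-dictionary word can escape "correction".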
When your search is close to other searches that were performed earlier and returned more results, Google suggests those instead.
It is not really spell checking; it shows what other people queried for the related keywords.

How does google know if I type in redflower.jpg I mean Red Flower?

I'm curious what programming term or methodology applies when Google shows you the "did you mean" link for a word that is made up of multiple words.
For example, if I type in "redflower.jpg", it knows to break that up into Red Flower.
Is there a common paradigm for doing that sort of operation? Would a Lucene search give you that?
thanks!
If Google does not see a lot of matching results for redflower.jpg, it might then try to split the string into multiple words until it finds a lot of matching results.
It might also recognize the extension (.jpg) as an image extension and then try to find images with a similar name.
If I had to write an algorithm like this, I would use a huge EXISTING database (either a dictionary or a search engine) and then try what I said at the beginning of my post.
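A small Python sketch of the "strip the extension, then split against a dictionary" idea is below. The tiny VOCAB set stands in for the huge existing database, and segment and suggest are names I made up; a real system would presumably rank candidate splits by how many results they return rather than take the first one that fits.

```python
import os


def segment(text, vocabulary):
    """Split text into vocabulary words covering the whole string, or return None."""
    text = text.lower()
    # best[i] holds one segmentation of text[:i], or None if none was found.
    best = [None] * (len(text) + 1)
    best[0] = []
    for end in range(1, len(text) + 1):
        for start in range(end):
            if best[start] is not None and text[start:end] in vocabulary:
                best[end] = best[start] + [text[start:end]]
                break
    return best[len(text)]


def suggest(filename, vocabulary):
    """Drop the file extension and return the segmented words, title-cased."""
    stem, _extension = os.path.splitext(filename)
    words = segment(stem, vocabulary)
    return " ".join(word.capitalize() for word in words) if words else None


VOCAB = {"red", "flower", "flowers", "blue", "sky"}  # stand-in dictionary
print(suggest("redflower.jpg", VOCAB))  # Red Flower
```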
Perhaps they could look at what other people do when they have searched for redflowers.jpg? Maybe a number of people searched for "redflowers.jpg", didn't click on any links, and then searched for "Red Flower" and found some results worth clicking on.
Of course they would have to take into account that the queries are similar (contain matching strings), otherwise some strange results might appear.
