I have a problem: for example, we have a few letters: b, o, s. They are letters from some word, and they appear in the same order as in the word (in this case the word is books). But, of course, it may be another word.
So I need to get a list of the possible words, for example of length = 10. How can I do this? I feel that the problem is close to crossword solving, so maybe there are some services with an API?
Get yourself a word list: https://www.google.com/?gws_rd=ssl#q=english+word+list
Put the words in a database table: CREATE TABLE Words(Word VARCHAR(64) PRIMARY KEY) ... INSERT INTO Words(Word) VALUES (UPPER(#word)) ...
Query the table with a LIKE query: SELECT Word FROM Words WHERE Word LIKE 'B%O%S'
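If you'd rather filter in application code than in SQL, the in-order check the LIKE pattern performs can be sketched in Java. Note one difference: this version does not anchor the first and last letters the way 'B%O%S' does (which behaves like '%B%O%S%' instead); class and method names are illustrative.

```java
public class SubsequenceMatch {
    // Returns true if 'letters' occur in 'word' in order (case-insensitive).
    // Filter additionally on word.length() == 10 for the length requirement.
    static boolean matchesInOrder(String word, String letters) {
        String w = word.toUpperCase();
        String l = letters.toUpperCase();
        int i = 0;
        for (int j = 0; j < w.length() && i < l.length(); j++) {
            if (w.charAt(j) == l.charAt(i)) i++;
        }
        return i == l.length();
    }

    public static void main(String[] args) {
        System.out.println(matchesInOrder("books", "bos")); // true
        System.out.println(matchesInOrder("bonus", "bos")); // true
        System.out.println(matchesInOrder("sob", "bos"));   // false
    }
}
```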
Related
I have a large table with a CLOB column (100,000+ rows) from which I need to search for specific words within a certain timeframe.
select id, clob_field,
       dbms_lob.instr(clob_field, '.doc',  1, 1) as doc,   --ideally want .doc
       dbms_lob.instr(clob_field, '.docx', 1, 1) as docx,  --ideally want .docx
       dbms_lob.instr(clob_field, '.DOC',  1, 1) as DOC,   --ideally want .DOC
       dbms_lob.instr(clob_field, '.DOCX', 1, 1) as DOCX   --ideally want .DOCX
from clob_table, search_words s
where (to_char(date_entered, 'DD-MON-YYYY')
       between to_date('01-SEP-2018') and to_date('30-SEP-2018'))
  and (contains(clob_field, s.words) > 0);
The set of words is '.doc', '.DOC', '.docx', and '.DOCX'. When I use
CONTAINS() it seems to ignore the dot and so provides me with lots of rows, but not ones with the document extensions in them. It finds emails with .doc as part of the address, so the doc will have a period on either side of it,
i.e. mail.doc.george@here.com
I don't want those occurrences. I have tried it with a space at the end of the word and it ignores the spaces. I have put these in a search table I created, as shown above, and it still ignores the spaces. Any suggestions?
Thanks!!
Here are two suggestions.
The simple, inefficient way is to use something besides CONTAINS. Context indexes are notoriously tricky to get right. So instead of the last line, you could do:
AND regexp_instr(clob_field, '\.docx?', 1, 1, 0, 'i') > 0
I think that should work, but it might be very slow. Which is when you'd use an index. But Oracle Text indexes are more complicated than normal indexes. This old doc explains that punctuation characters (as defined in the index parameters) are not indexed, because the point of Oracle Text is to index words. If you want special characters to be indexed as part of the word, you need to add it to the set of printjoin characters. This doc explains how, but I'll paste it here. You need to drop your existing CONTEXT index and re-create it with this preference:
begin
  ctx_ddl.create_preference('mylex', 'BASIC_LEXER');
  ctx_ddl.set_attribute('mylex', 'printjoins', '._-'); -- periods, underscores, dashes can be parts of words
end;
/
CREATE INDEX myindex ON clob_table(clob_field) INDEXTYPE IS CTXSYS.CONTEXT
  PARAMETERS ('LEXER mylex');
Keep in mind that CONTEXT indexes are case-insensitive by default; I think that's what you want, but FYI you can change it by setting the 'mixed_case' attribute to 'Y' on the lexer, right below where you set the printjoins attribute above.
Also, it seems like you're trying to search for words which end in .docx, but CONTAINS isn't INSTR: by default it matches entire words, not strings of characters. You'd probably want to modify your query to do AND contains(clob_field, '%.docx') > 0
I want to use a hash table to store words.
For example, I have two words, aba and aab. Because they are made up of the same elements, just in a different order, I want to store them at the same index and attach a linked list there, so it's easy for me to search in a certain way. The words consist only of the 26 letters. How do I design a proper index for the hash table? How do I organize the table?
So the questions you want to answer with your hash table is: What words can be built with the letters I have?
I assume you are reading some dictionary and want to put all the values in the hash table. Then you could use an int array as the key, holding the count of how many times each letter occurs (e.g. 'a' would be index 0 and 'z' index 25), and for the value you would have to use a list, so that you can add more than one word to that entry.
But the simplest solution is probably just to use the sorted word as the key (e.g. 'aba' gets key 'aab', and 'aab' obviously does too). Because words are not very long, the sort isn't expensive (avoid creating new strings all the time by working with the character array).
So in Java you could get the key like this:
char[] key = word.toCharArray();
Arrays.sort(key);
// and if you want a string
String myKey = new String(key);
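Building on the sorted-key idea, here is a minimal sketch of the whole table (class and method names are illustrative): the map goes from sorted key to the list of words sharing that key.

```java
import java.util.*;

public class AnagramIndex {
    private final Map<String, List<String>> index = new HashMap<>();

    // The sorted letters of the word serve as the hash key.
    static String keyOf(String word) {
        char[] key = word.toCharArray();
        Arrays.sort(key);
        return new String(key);
    }

    void add(String word) {
        index.computeIfAbsent(keyOf(word), k -> new ArrayList<>()).add(word);
    }

    List<String> lookup(String word) {
        return index.getOrDefault(keyOf(word), Collections.emptyList());
    }

    public static void main(String[] args) {
        AnagramIndex idx = new AnagramIndex();
        idx.add("aba");
        idx.add("aab");
        idx.add("baa");
        System.out.println(idx.lookup("aab")); // all three words share key "aab"
    }
}
```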
I have two classes, Claim and Index. I have a field in my Claim class called topic, which is a string. I am trying to index the topic column, not using the database's index column features, but by coding the following method.
Suppose I have claim 1; for claim 1's topic field ("i love muffins muffins") I'll do the following treatment:
#1. Create an empty dictionary with "word" => occurrences
#2. Create a list of the stopwords, example: stopwords = ("For", "This", ... etc.)
#3. Create a list of the delimiters, example: delimiter_chars = ",.;:!?"
#4. Split the text (topic field) into words delimited by whitespace.
#5. Remove unwanted delimiter characters adjoining words.
#6. Remove stopwords.
#7. Remove duplicates.
#8. Now I create multiple Index objects: (word="love", occurrences=1, looked=0, reference to claim 1), (word="muffins", occurrences=2, looked=0, reference to claim 1).
Now whenever I look up the word muffins, for example, looked will increase by one and I will move the record up in my database. So my questions are the following: is this method good? Is it better than the database's index features? Are there some ways to improve things?
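Steps 1 through 7 above can be sketched in Java as follows; the stop-word list here is illustrative, not the real one, and the duplicate-removal step falls out of counting occurrences in a map.

```java
import java.util.*;

public class TopicIndexer {
    // Illustrative stop-word and delimiter sets (steps 2 and 3).
    static final Set<String> STOPWORDS = Set.of("for", "this", "i");
    static final String DELIMITERS = ",.;:!?";

    // Steps 4-7: split on whitespace, strip adjoining delimiters,
    // drop stopwords, and count occurrences per distinct word.
    static Map<String, Integer> indexTopic(String topic) {
        Map<String, Integer> occurrences = new LinkedHashMap<>();
        for (String raw : topic.split("\\s+")) {
            String word = raw.toLowerCase();
            while (!word.isEmpty() && DELIMITERS.indexOf(word.charAt(0)) >= 0)
                word = word.substring(1);
            while (!word.isEmpty() && DELIMITERS.indexOf(word.charAt(word.length() - 1)) >= 0)
                word = word.substring(0, word.length() - 1);
            if (word.isEmpty() || STOPWORDS.contains(word)) continue;
            occurrences.merge(word, 1, Integer::sum);
        }
        return occurrences;
    }

    public static void main(String[] args) {
        System.out.println(indexTopic("i love muffins muffins"));
        // {love=1, muffins=2}
    }
}
```

Each resulting (word, count) entry would then become one Index object pointing back at the claim.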
What I think you are looking for is something called a trie (a prefix tree; a tree whose nodes each branch once per letter, as you describe, is a trie rather than a B-Tree). In your case, each node would have 26 branches (or 52 if you need case sensitivity). This will make finding objects very fast: lookup time is proportional to the length of the word. In the node, you would have a pointer to the actual data in an array, list, file, or something else.
However, unless you are willing to put the time in to code something specific for your application, you might be better off using a database such as Oracle, Microsoft SQL Server, or MySQL because these are professionally developed and profiled to get the maximum performance possible.
Let's say I have 3 tables: A, B, C.
For every table I have some insert queries.
I want to use Find (Ctrl+F) to find every insert query with a certain format.
Example: I want to find code that contains "insert [table_name] value" no matter what the table name is (A or B or C), so I want to search for some code but skip the word in the middle of it.
I have googled with various keywords, but I don't get any solution that is even close to what I want.
Is it possible to do something like this?
You need to use what are known as "wildcard" characters.
In the find window, you'll notice there is a check box called "Use Pattern Matching".
If you check this, then you can use some special characters to expand your search.
? is a wildcard that indicates any single character can take its place.
* is a wildcard that indicates a string of any length could take its place.
e.g. ca? would match cat, car, cam, etc.
ca* would match cat, car, catastrophe, called ... etc.
So something along the lines of insert * value should find what you are interested in.
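If your editor's Find supports regular expressions rather than pattern matching, the same search translates directly: regex `.*` plays the role of the `*` wildcard and `.` plays the role of `?`. A small Java sketch of the equivalence (class and field names are illustrative):

```java
import java.util.regex.Pattern;

public class WildcardSearch {
    // "insert * value" rewritten as a regex, matched case-insensitively.
    static final Pattern INSERT_PATTERN =
            Pattern.compile("insert .* value", Pattern.CASE_INSENSITIVE);

    static boolean matches(String line) {
        return INSERT_PATTERN.matcher(line).find();
    }

    public static void main(String[] args) {
        System.out.println(matches("INSERT A VALUE ('x')")); // true
        System.out.println(matches("INSERT B VALUE ('y')")); // true
        System.out.println(matches("SELECT * FROM A"));      // false
    }
}
```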
I'm new to Oracle; I'm using Oracle 11g. I'm storing UK postal codes. The values look like these:
N22 5HF
SW1 4JD
N14 8IT
N22 1JT
E1 5DP
e1 8DS
E3 8TU
I should be able to easily compare first four characters of each postal code.
What is the best data type to store these data ?
As a slight variation on Lalit's answer, since you want the outward code rather than a fixed substring of the first four characters (which could include a space and the start of the inward code), you can create a virtual column based on the first word of the value:
postcode varchar2(8),
outward_code generated always as
(substr(postcode, 1, instr(postcode, ' ', 1, 1) - 1))
And optionally, but probably if you're using this to search, an index on the virtual column.
This assumes the post codes are formatted properly in the first place. It won't work if you don't always have the space between the outward and inward codes. And to answer your original question, the actual post code should be a varchar2(8) column, to hold alphanumeric values up to the maximum size and with the standard format.
SQL Fiddle demo.
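The same first-word extraction the virtual column performs can be mirrored application-side; a small Java sketch (class name illustrative), which returns an empty string where the SQL expression would yield NULL for a malformed code with no space:

```java
public class Postcode {
    // Mirrors substr(postcode, 1, instr(postcode, ' ', 1, 1) - 1):
    // the outward code is everything before the first space.
    static String outwardCode(String postcode) {
        int space = postcode.indexOf(' ');
        return space < 0 ? "" : postcode.substring(0, space);
    }

    public static void main(String[] args) {
        System.out.println(outwardCode("N22 5HF")); // N22
        System.out.println(outwardCode("SW1 4JD")); // SW1
        System.out.println(outwardCode("E1 5DP"));  // E1
    }
}
```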
I should be able to easily compare first four characters of each postal code.
Then keep these first four characters in a separate column, and index this column. You could keep the other characters in a different column. Now, if the codes are a mixture of alphanumeric characters, then you are left with the VARCHAR2 data type.
Your query predicate would look like:
WHERE post_code_col = substr('N22 5HF', 1, 4)
Thus the indexed column post_code_col would be efficient in performance.
On 11g, you have the option to create a virtual column. However, indexing it would be equivalent to a function-based index. So I would prefer the first way, as I suggested above.
It is better to normalize the table during the design phase, else the issues would start creeping in later.
In my opinion you should use the varchar2 data type, because this field is not going to be used in mathematical calculations (so it should not be int or decimal) and these values are not big enough (so it should not be text).