GlazedLists with two Comparators per column - sorting

I would like to know if somebody already did the effort to integrate the following into GlazedLists:
I want a separate Comparator for each sort direction of a column. A practical example for something like this is a file browser where the directories are always sorted to the front and only as secondary sorting requirement, I want to sort by the file/directory name. If i have only one Comparator, this is of course not possible.
I also just realized that it might exist also a workaround if I use sorting by multiple columns and the main sorting priority is on the column that says whether some element is a file or directory.
Does anyone have experience with this issue?
Thanks.

Ok, it seems I still did not unleash the full power of GlazedLists when I asked the question. The answer is very simple: I can just add another SortedList that sorts by file types as decorator of my initial SortedList that is sorted based on the table.

Related

Recursive database viewing

I have this situation. Starting from a table, I have to check all the records that match a key. If records are found, I have to check another table using a key from the first table and so on, more on less on five levels. There is a way to do this in a recursive way, or I have to write all the code "by hand"? The language I am using is Visual Fox Pro. If this is is not possible, is it al least possible to use recursion to popolate a treeview?
You can set a relation between tables. For example:
USE table_1.dbf IN 0 SHARED
USE table_2.dbf IN 0 SHARED
SET ORDER TO TAG key_field OF table_2.cdx IN table_2
SET RELATION TO key_field INTO table_2 ADDITIVE IN table_1
First two commands open table_1 and table_2. Then you have to set the order/index of table_2. If you don't have an index for the key field then this will not work. The final command sets the relation between the two tables on the key field.
From here you can browse both tables and table_2's records will be filtered based on table_1's key field. Hope this helps.
If the tables have similar structure or you only need to look at a few fields, you could write a recursive routine that receives the name of the table, the key to check, and perhaps the fields you need to check as parameters. The tricky part, I guess, is knowing what to pass down to the next call.
I don't think I can offer any more advice without at least seeing some table structures.
Sorry for answering so late, but the problem was of course that the recursion wasn't a viable solution since I had to search inside multiple tables. So I resolved by doing a simple 2-Level search in the tables that I needed.
Thank you very much for the help, and sorry again for answering so late.

Field order between Source and Source Qualifier need to be same?

This is might be a basic question but just curious to know, My source is a flat file and do we need to maintain same field order between source and source qualifier objects in Informaitca?
Thanks!
No but you need to maintain field order against list of fields returned by the query... especially if you use an sql override
No, you should not.You even may need to change this order if you are going to Sort data by Source Qualifier. If you set integer value for ‘Number of Sorted Ports’ SQ property Informatica Information Service(IS) will look at that number of ports starting at the top. So, you may need to rearrange ports order by moving sorting ports at top.

SAS: alternatives to First. and Last. variables when data can not be sorted?

Please help me with the following SAS problem. I need to transform my data set from "original" to "new" as shown in the picture. Because the "priority" variable can not be sorted, it seems that first. and last. variables would not work here, no? The goal is to have each sequence of priorities represent one entry in the "new" dataset.
Thank you!
p.s. I did not know how to create a table in this post so I just took a snapshot of the screen.
Seems fairly straightforward to me. Just create a batch ID.
data have_pret;
set have;
by subject;
if first.subject then batchID=0;
if priority=1 then batchID+1;
run;
Then you can transpose by subject/batchID. This assumes priority always starts at 1 - if it starts at > 1 sometimes, you may want to adjust your logic and keep track of prior value of priority.

Algorithm and data structure to store First name and last name

Is there a efficient way to store first name and last name in data structure so that we can lookup using either first or last name? I would consider a binary search tree with first name. It would be efficient to search first name. But wouldnt be efficient when trying to search last name. we can also consider one more BST with last name. Any ideas to implement it efficiently?
What if the question is
String names[] = { "A B","C D"};
A requirement is to be able to extend this directory dynamically at runtime,
without persistent storage. The directory can eventually grow to hundreds or
thousands of names and must be searchable by first or last name.
Now we can't have hash tables to store. Any ideas?
Two hash tables: one from first name to person, and one from last name to person.
Simple is best.
Why not put both first and last names in a trie?
As a bonus, this way you can even get suggestions on partial names by traversing all leaves after current node (maybe on an asynchronous call)
You're idea is pretty good, but here's another option: how about implementing to hash tables?
The first hash table would use first names as a key, and the associated value would either be the last name or a pointer to a Name object. The second hash table would use last names as keys, with the first names or pointers to Name as the values.
Personally, for choosing the values, I would go for a pointer to a Name object, since this method would be more applicable in case you'd like to store even more information (e.g. data of birth, etc.)
Also, see Does Java have a HashMap with reverse lookup?…, which is specific to Java but the discussion on the data structures is relevant to any language.
Note that structures such as Bidirectional Sorted Maps also allow range searches (which dual hash tables don't).
if you need to search only by first name or only by last name then yes, two hashmaps are the best (and notice you're not duplicating the data, you're partitioning it) but if you don't mind then put both first and last names in a single hashmap and don't differentiate between the two.

Dynamic search and display

I have a big load of documents, text-files, that I want to search for relevant content. I've seen a searching tool, can't remeber where, that implemented a nice method as I describe in my requirement below.
My requirement is as follows:
I need an optimised search function: I supply this search function with a list (one or more) partially-complete (or complete) words separated with spaces.
The function then finds all the documents containing words starting or equal to the first word, then search these found documents in the same way using the second word, and so on, at the end of which it returns a list containing the actual words found linked with the documents (name & location) containing them, for the complete the list of words.
The documents must contain all the words in the list.
I want to use this function to do an as-you-type search so that I can display and update the results in a tree-like structure in real-time.
A possible approach to a solution I came up with is as follows:
I create a database (most likely using mysql) with three tables: 'Documents', 'Words' and 'Word_Docs'.
'Documents' will have (idDoc, Name, Location) of all documents.
'Words' will have (idWord, Word) , and be a list of unique words from all the documents (a specific word appears only once).
'Word_Docs' will have (idWord, idDoc) , and be a list of unique id-combinations for each word and document it appears in.
The function is then called with the content of an editbox on each keystroke (except space):
the string is tokenized
(here my wheels spin a bit): I am sure a single SQL statement can be constructed to return the required dataset: (actual_words, doc_name, doc_location); (I'm not a hot-number with SQL), alternatively a sequence of calls for each token and parse-out the non-repeating idDocs?
this dataset (/list/array) is then returned
The returned list-content is then displayed:
e.g.: called with: "seq sta cod"
displays:
sequence - start - code - Counting Sequences [file://docs/sample/con_seq.txt]
- stop - code - Counting Sequences [file://docs/sample/con_seq.txt]
sequential - statement - code - SQL intro [file://somewhere/sql_intro.doc]
(and-so-on)
Is this an optimal way of doing it? The function needs to be fast, or should it be called only when a space is hit?
Should it offer word-completion? (Got the words in the database) At least this would prevent useless calls to the function for words that does not exist.
If word-completion: how would that be implemented?
(Maybe SO could also use this type of search-solution for browsing the tags? (In top-right of main page))
What you're talking about is known as an inverted index or posting list, and operates similary to what you propose and what Mecki proposes. There's a lot of literature about inverted indexes out there; the Wikipedia article is a good place to start.
Better, rather than trying to build it yourself, use an existing inverted index implementation. Both MySQL and recent versions of PostgreSQL have full text indexing by default. You may also want to check out Lucene for an independent solution. There are a lot of things to consider in writing a good inverted index, including tokenisation, stemming, multi-word queries, etc, etc, and a prebuilt solution will do all this for you.
The fastest way is certainly not using a database at all, since if you do the search manually with optimized data, you can easily beat select search performance. The fastest way, assuming the documents don't change very often, is to build index files and use these for finding the keywords. The index file is created like this:
Find all unique words in the text file. That is split the text file by spaces into words and add every word to a list unless already found on that list.
Take all words you have found and sort them alphabetically; the fastest way to do this is using Three Way Radix QuickSort. This algorithm is hard to beat in performance when sorting strings.
Write the sorted list to disk, one word a line.
When you now want to search the document file, ignore it completely, instead load the index file to memory and use binary search to find out if a word is in the index file or not. Binary search is hard to beat when searching large, sorted lists.
Alternatively you can merge step (1) and step (2) within a single step. If you use InsertionSort (which uses binary search to find the right insert position to insert a new element into an already sorted list), you not only have a fast algorithm to find out if the word is already on the list or not, in case it is not, you immediately get the correct position to insert it and if you always insert new ones like that, you will automatically have a sorted list when you get to step (3).
The problem is you need to update the index whenever the document changes... however, wouldn't this be true for the database solution as well? On the other hand, the database solution buys you some advantages: You can use it, even if the documents contain so many words, that the index files wouldn't fit into memory anymore (unlikely, as even a list of all English words will fit into memory of any average user PC); however, if you need to load index files of a huge number of documents, then memory may become a problem. Okay, you can work around that using clever tricks (e.g. searching directly within the files that you mapped to memory using mmap and so on), but these are the same tricks databases use already to perform speedy look-ups, thus why re-inventing the wheel? Further you also can prevent locking problems between searching words and updating indexes when a document has changed (that is, if the database can perform the locking for you or can perform the update or updates as an atomic operation). For a web solution with AJAX calls for list updates, using a database is probably the better solution (my first solution is rather suitable if this is a locally running application written in a low level language like C).
If you feel like doing it all in a single select call (which might not be optimal, but when you dynamacilly update web content with AJAX, it usually proves as the solution causing least headaches), you need to JOIN all three tables together. May SQL is a bit rusty, but I'll give it a try:
SELECT COUNT(Document.idDoc) AS NumOfHits, Documents.Name AS Name, Documents.Location AS Location
FROM Documents INNER JOIN Word_Docs ON Word_Docs.idDoc=Documents.idDoc
INNER JOIN Words ON Words.idWord=Words_Docs.idWord
WHERE Words.Word IN ('Word1', 'Word2', 'Word3', ..., 'WordX')
GROUP BY Document.idDoc HAVING NumOfHits=X
Okay, maybe this is not the fastest select... I guess it can be done faster. Anyway, it will find all matching documents that contain at least one word, then groups all equal documents together by ID, count how many have been grouped togetehr, and finally only shows results where NumOfHits (the number of words found of the IN statement) is equal to the number of words within the IN statement (if you search for 10 words, X is 10).
Not sure about the syntax (this is sql server syntax), but:
-- N is the number of elements in the list
SELECT idDoc, COUNT(1)
FROM Word_Docs wd INNER JOIN Words w on w.idWord = wd.idWord
WHERE w.Word IN ('word1', ..., 'wordN')
GROUP BY wd.idDoc
HAVING COUNT(1) = N
That is, without using like. With like things are MUCH more complex.
Google Desktop Search or a similar tool might meet your requirements.

Resources