if..if else..else..in linq to objects - linq

Given a DataView that contains multiple rows, I want to extract a row based on the following criteria:
If a row starts with a certain string and ends with a certain string, then choose that row above all others.
If no rows meet the first criterion, then just look for a row that starts with the certain string.
If we can't match any of the above, default to null.
My quick attempt simply returns the first row that meets any of the criteria (excuse me if the VB syntax isn't right, I'm not that familiar with it):
Dim result = (From row In dv.Table.Rows() _
Where (GetString(row, "id").StartsWith(Me.ID.Substring(0, 3), StringComparison.OrdinalIgnoreCase) AndAlso _
GetString(row, "id").EndsWith(Me.ID.Substring(Me.RegisteredID.Length - 2, 2), StringComparison.OrdinalIgnoreCase)) OrElse _
GetString(row, "id").StartsWith(Me.ID.Substring(0, 3), StringComparison.OrdinalIgnoreCase) _
Select row).FirstOrDefault()
Edit: I meant to add that something like https://stackoverflow.com/a/443055/685760 looks promising, but I don't think it will work in my situation. Feel free to correct me if I'm wrong.

It would be difficult, if not impossible, to express that kind of algorithm in a single query. A query judges each row independently of the others.
You need the result set of the first criterion before you can check the second...
In my opinion you need separate queries, one per criterion. In practice only the first may run: if it returns a result, you have met your first criterion and do not need to check the others.
The other way would be, I guess, through some kind of ranking of the data, but you might end up with something less clear AND less performant.
In any case, with separate queries your code will be a hundred times more readable for common mortals, even if there were a way to express your needs in one query. There is such a thing as overuse of Linq ;)
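
As a rough illustration of the tiered lookup, here is a sketch in Python rather than VB. It is a single-pass variant of the idea, not the separate queries described above: it returns immediately on a prefix-and-suffix match and otherwise remembers the first prefix-only match. The function name `pick_row` and the use of plain id strings instead of DataView rows are assumptions for the example, and `lower()` only approximates `StringComparison.OrdinalIgnoreCase`.

```python
from typing import Iterable, Optional


def pick_row(ids: Iterable[str], prefix: str, suffix: str) -> Optional[str]:
    """Return the best-matching id, or None if nothing starts with prefix."""
    prefix, suffix = prefix.lower(), suffix.lower()
    fallback = None
    for row_id in ids:
        lowered = row_id.lower()
        if not lowered.startswith(prefix):
            continue
        if lowered.endswith(suffix):
            return row_id      # first criterion met: best possible match
        if fallback is None:
            fallback = row_id  # remember the first prefix-only match
    return fallback            # None when no id starts with prefix
```

Because the loop stops at the first full match, later rows are never inspected once the preferred criterion is satisfied.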

Related

Is it indexing Or tagging?

I have two classes, claim and index. My claim class has a string field called topic. I'm trying to index the topic column in code rather than with the database's own index features, using the following method.
Suppose I have claim 1. For its topic field ("i love muffins muffins") I would do the following:
#1. Create an empty Dictionary of "word" => occurrences.
#2. Create a List of the stopwords, for example stopwords = ("For", "This", ... etc.)
#3. Create a List of the delimiters, for example delimiter_chars = ",.;:!?"
#4. Split the text (topic field) into words delimited by whitespace.
#5. Remove unwanted delimiter characters adjoining words.
#6. Remove stopwords.
#7. Remove duplicates.
#8. Create multiple index objects: (word="love", occurrences=1, looked=0, reference to claim 1), (word="muffins", occurrences=2, looked=0, reference to claim 1).
Now, whenever I look up the word muffins, for example, looked increases by one and I move the record up in my database. So my question is: is this method good? Is it better than database index features? Are there ways to improve it?
What I think you are looking for is something called a B-Tree. In your case, you would use a node with 26 branches (or 52 if you need case sensitivity), one per letter; a tree that branches on letters like this is usually called a trie. This will make finding objects very fast: a B-Tree lookup takes O(log n) time, and a lookup in a per-letter tree takes time proportional to the length of the word. In the node, you would have a pointer to the actual data in an array, list, file, or something else.
However, unless you are willing to put the time in to code something specific for your application, you might be better off using a database such as Oracle, Microsoft SQL Server, or MySQL because these are professionally developed and profiled to get the maximum performance possible.
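
For reference, steps #1 to #7 from the question amount to a per-claim word count. A minimal sketch in Python, assuming a made-up stopword list and a hypothetical function name `index_topic`; it does not cover the `looked` counter or how the result is stored:

```python
from collections import Counter
from typing import Dict

# Illustrative only: a real stopword list would be much longer.
STOPWORDS = {"for", "this", "i", "the", "a"}
DELIMITERS = ",.;:!?"


def index_topic(topic: str) -> Dict[str, int]:
    """Map each non-stopword in the topic to its number of occurrences."""
    # Split on whitespace, strip adjoining delimiters, normalise case.
    words = (w.strip(DELIMITERS).lower() for w in topic.split())
    # Counting collapses duplicates into one entry per word.
    return dict(Counter(w for w in words if w and w not in STOPWORDS))
```

For the example topic, `index_topic("i love muffins muffins")` gives one entry for "love" with count 1 and one for "muffins" with count 2.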

Split a Value in a Column with Right Function in SSIS

I need some urgent help. I have a column that holds the full name of a user, and I want to split it into first and last name.
The format of the full name is "World, hello", where the first name is hello and the last name is world.
I am using a Derived Column (SSIS) with the RIGHT function for the first name and the SUBSTRING function for the last name, but both results come out blank, and that is where I am drawing a blank too. :)
It's working for me. In general, you should provide more detail in your questions on places such as this to help others recreate and troubleshoot your issue. You did not specify whether we needed to address NULLs in this field nor do I know how you'd want to interpret it so there is room for improvement on this answer.
I started with a simple OLE DB Source and hard coded a query of "SELECT 'World, Hello' AS Name".
I created 2 Derived Column Tasks. The first one adds a column to the Data Flow called FirstCommaPosition. The formula I used is FINDSTRING(Name,",", 1). If Name is NULLable, then we will need to test for NULL prior to calling the FINDSTRING function. You'll then need to determine how you want to store the split data in the case of NULLs. I would assume both first and last name should be NULL, but I don't know that.
There are two reasons for doing this in separate steps. The first is performance. As counter-intuitive as it sounds, doing less in a derived column results in better performance because the SSIS engine can better parallelize the operations. The other is simpler: I will need this value for both the first and last name split, so it is easier and less maintenance to reference a column than to copy and paste a formula.
The second Derived Column is going to actually perform the split.
My FirstNameUnicode column uses this formula: (FirstCommaPosition > 0) ? RTRIM(LTRIM(RIGHT(Name, LEN(Name) - FirstCommaPosition))) : "" That says "If we found a comma in the preceding step, then slice out everything after the comma and apply trim operations. If we didn't find a comma, then just return a blank string." RIGHT takes a character count rather than a position, hence LEN(Name) - FirstCommaPosition. The default string type for expressions is Unicode (DT_WSTR), so if that is not what you need, you will have to cast the result to the correct string code page (DT_STR).
My LastNameUnicode column uses this formula: (FirstCommaPosition > 0) ? SUBSTRING(Name,1,FirstCommaPosition -1) : "" Similar logic as above, except now I use the SUBSTRING operation instead of RIGHT. Users of the 2012 release of SSIS and beyond, rejoice, for you can use the LEFT function instead of SUBSTRING. Also note that you need to back off 1 position to remove the comma.
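
The same split can be sketched outside SSIS to check the logic. This is a minimal Python illustration, not an SSIS expression; the function name `split_name` is made up, returning `None` for both parts on a NULL input is an assumption (the answer leaves that choice open), and unlike the SUBSTRING formula it also trims the last name.

```python
from typing import Optional, Tuple


def split_name(full_name: Optional[str]) -> Tuple[Optional[str], Optional[str]]:
    """Split 'Last, First' into (first, last)."""
    if full_name is None:
        return None, None          # assumption: NULL in, NULL out
    comma = full_name.find(",")    # index of the first comma, -1 if absent
    if comma < 0:
        return "", ""              # no comma found: blank strings
    first = full_name[comma + 1:].strip()  # everything after the comma
    last = full_name[:comma].strip()       # everything before the comma
    return first, last
```

A name with unequal halves such as "Smith, John Paul" is a better test than "World, Hello", because it shows whether the first-name slice really starts after the comma.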

Iterate through items on a given date within date range rails

I have the feeling this has been asked before, but I have been searching and cannot find a clear description.
I have a Rails app that holds items that occur on a specific date (like birthdays). Now I would like to make a view that creates a table (or something else, divs are all right as well) that states each date once and then iterates over the related items one by one.
Items have a date field and are, of course, not related to a date in a separate table or something.
I could of course query the database ~30 times (as I want a representation of one month's worth of items), but I think that looks ugly and would be massively repetitive. I would like the outcome to look like this (consider it a table with two columns for the time being):
Jan/1 | jan1.item1.desc
| jan1.item2.desc
| jan1.item3.desc
Jan/2 | jan2.item1.desc
| etc.
So I think I need to know two things: how to construct a correct query (but it could be that this is as simple as Item.where("date > ? < ?", lower_bound, upper_bound)) and how to translate that into the view.
I have also thought about a hash with a key for each individual day and an array for the values, but I'd have to construct that as above (with repetition), which I expect is not very elegant.
Using GROUP BY does not seem to give me anything different to work with than other queries (apart from the grouping of the items, of course). I just get an array of objects, but I might be doing this wrong.
Sorry if it is a basic question. I am relatively new to the field (and programming in general).
If you're making a calendar, you probably want to GROUP BY date:
SELECT COUNT(*) AS instances, DATE(`date`) AS on_date FROM items GROUP BY DATE(`date`)
This presumes your column is literally called date, which, seeing as that's a SQL keyword, is probably a bad idea. If that's the case you'll need to escape it whenever it's used, with backticks (`) as shown here in MySQL notation. Postgres and others use a different approach.
For instances in a range, what you want is probably the BETWEEN operator:
@items = Item.where("`date` BETWEEN ? AND ?", lower_bound, upper_bound)
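
For the view side of the question, one range query followed by in-memory grouping is enough. In Ruby this is usually done with `Enumerable#group_by`; the sketch below shows the same idea in Python. It assumes the items arrive as (date, description) pairs, and the names `group_by_day` and `render` are hypothetical.

```python
from collections import defaultdict
from datetime import date


def group_by_day(items):
    """Turn (date, description) pairs into {date: [descriptions]}, sorted by date."""
    grouped = defaultdict(list)
    for day, desc in items:
        grouped[day].append(desc)
    return dict(sorted(grouped.items()))


def render(grouped):
    """Produce two-column text rows, printing each date only on its first row."""
    lines = []
    for day, descs in grouped.items():
        for i, desc in enumerate(descs):
            label = day.isoformat() if i == 0 else ""
            lines.append(f"{label:<11}| {desc}")
    return lines
```

The database is queried once for the whole month, and the date label is left blank on every row after the first for that day.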

Query performance in PostgreSQL using 'similar to'

I need to retrieve certain rows from a table depending on certain values in a specific column, named columnX in the example:
select *
from tableName
where columnX similar to ('%A%|%B%|%C%|%1%|%2%|%3%')
So if columnX contains at least one of the values specified (A, B, C, 1, 2, 3), I will keep the row.
I can't find a better approach than using similar to. The problem is that the query takes too long for a table with more than a million rows.
I've tried indexing it:
create index tableName_columnX_idx on tableName (columnX)
where columnX similar to ('%A%|%B%|%C%|%1%|%2%|%3%')
However, if the condition is variable (the values could be other than A, B, C, 1, 2, 3), I would need a different index for each condition.
Is there any better solution for this problem?
EDIT: Thanks everybody for the feedback. It looks like I ended up here because of a design mistake (a topic I've posted in a separate question).
If you are only going to search lists of one-character values, then split each string into an array of characters and index the array:
CREATE INDEX
ix_tablename_columnxlist
ON tableName
USING GIN((REGEXP_SPLIT_TO_ARRAY(columnX, '')))
then search against the index:
SELECT *
FROM tableName
WHERE REGEXP_SPLIT_TO_ARRAY(columnX, '') && ARRAY['A', 'B', 'C', '1', '2', '3']
I agree with @Quassnoi, a GIN index is fastest and simplest - unless write performance or disk space are issues because it occupies a lot of space and eats quite a bit of performance for INSERT, UPDATE and DELETE.
My additional answer is triggered by your statement:
I can't find a better approach than using similar to.
If that is what you found, then your search isn't over yet. SIMILAR TO is a complete waste of time. Literally. PostgreSQL only features it to comply with the (weird) SQL standard. Inspect the output of EXPLAIN ANALYZE for your query and you will find that SIMILAR TO has been replaced by a regular expression.
Internally every SIMILAR TO expression is rewritten to a regular expression. Consequently, for each and every SIMILAR TO expression there is at least one regular expression match that is a bit faster. Let EXPLAIN ANALYZE translate it for you, if you are not sure. You won't find this in the manual, PostgreSQL does not promise to do it this way, but I have yet to see an exception.
More details in this related answer on dba.SE.
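
To make the equivalence concrete: the SIMILAR TO pattern in the question keeps any value that contains at least one of the listed characters, which as a regular expression is just the character class `[ABC123]` (in PostgreSQL, something like `columnX ~ '[ABC123]'`). The Python sketch below only illustrates the matching semantics and says nothing about database performance; `keep_row` is a made-up name.

```python
import re

# One character class replaces the six '%X%' alternatives.
ANY_OF = re.compile(r"[ABC123]")


def keep_row(value: str) -> bool:
    """True if value contains at least one of A, B, C, 1, 2, 3."""
    return ANY_OF.search(value) is not None
```

An unanchored regex match like this still cannot use a plain B-tree index, so it addresses the rewriting overhead rather than the full table scan.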
This strikes me as a data modelling issue. You appear to be using a text field as a set, storing single character codes to identify values present in the set.
If so, I'd want to remodel this table to use one of the following approaches:
Standard relational normalization. Drop columnX, and replace it with a new table with a foreign key reference to tableName(id) and a charcode column that contains one character from the old columnX per row, like CREATE TABLE tablename_columnx_set(tablename_id integer not null references tablename(id), charcode "char", primary key (tablename_id, charcode)). You can then fairly efficiently search for keys in columnX using normal SQL subqueries, joins, etc. If your application can't cope with that change you could always keep columnX and maintain the side table using triggers.
Convert columnX to a hstore of keys with a dummy value. You can then use hstore operators like columnX ?| ARRAY['A','B','C']. A GiST index on the hstore of columnX should provide fairly solid performance for those operations.
Split to an array as recommended by Quassnoi if your table change rate is low and you can pay the costs of the GIN index;
Convert columnX to an array of integers, use intarray and the intarray GiST index. Have a mapping table of codes to integers or convert in the application.
Time permitting I'll follow up with demos of each. Making up the dummy data is a pain, so it'll depend on what else is going on.
I'll post this as an answer because it may guide other people in the future: Why not have 6 columns, haveA, haveB ~ have3 and do a 6-part OR query? Or use a bitmask?
If there are too many attributes to assign a column each, I might try creating an "attribute" table:
(fkey, attr) VALUES (1, 'A'), (1, 'B'), (2, '3')
and let the DBMS worry about the optimization.
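
The bitmask suggestion can be sketched as follows. This is a Python illustration of the encoding only, assuming the attribute set is exactly the six codes from the question; the names `BIT`, `to_mask` and `has_any` are hypothetical, and in the database the mask would live in an integer column tested with a bitwise AND.

```python
ATTRS = "ABC123"
# Assign each attribute code its own bit: A=1, B=2, C=4, 1=8, 2=16, 3=32.
BIT = {ch: 1 << i for i, ch in enumerate(ATTRS)}


def to_mask(codes: str) -> int:
    """Encode a string of attribute codes as an integer bitmask."""
    mask = 0
    for ch in codes:
        mask |= BIT[ch]  # raises KeyError for a code outside ATTRS
    return mask


def has_any(row_mask: int, wanted_mask: int) -> bool:
    """True if the row has at least one of the wanted attributes."""
    return (row_mask & wanted_mask) != 0
```

The "contains at least one of" test then becomes a single integer comparison per row, at the cost of fixing the set of attributes in advance.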

Salesforce SOQL query length and efficiency

I am trying to solve a problem of deleting only rows matching two criteria, each being a list of ids. Now these Ids are in pairs, if the item to be deleted has one, it must have the second one in the pair, so just using two in clauses will not work. I have come up with two solutions.
1) Use the two in clauses but then loop over the items and check that the two ids in question appear in the correct pairing.
I.E.
for(Object__c obj : [SELECT Id, Relation1__c, Relation2__c FROM Object__c WHERE Relation1__c IN :idlist1 AND Relation2__c IN :idlist2]){
    // Both relation fields must be selected in the query to be readable here
    if(preConstructedPairingsAsString.contains('' + obj.Relation1__c + obj.Relation2__c)){
        listToDelete.add(obj);
    }
}
2) Loop over the ids and build an admittedly long query.
I like the second choice because I only get the items I need and can just throw the list into delete, but I know that Salesforce has hang-ups with SOQL queries. Is there a penalty to the second option? Is it better to build and query off a long string, or to get more objects than necessary and filter?
In general you want to put as much logic as you can into SOQL queries, because that won't use any script statements and the queries execute faster than your code. However, there is a 10k character limit on SOQL queries (can be raised to 20k), so based on my back-of-the-envelope calculations you'd only be able to put in 250 id pairs or so before hitting that limit.
I would go with option 1 or, if you really care about efficiency, you can create a formula field on the object that pairs the ids and filter on that.
formula: relation1__c + '-' + relation2__c
for(list<Object__c> objs : [SELECT Id FROM Object__c WHERE formula__c in :idpairs]){
delete objs;
}
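
The pairing check behind both approaches can be illustrated in Python. This is a sketch of the filtering logic only, not Apex: records are modelled as plain dicts, and `pair_key` and `filter_pairs` are made-up names. It uses the same '-' separator as the formula field so that two different id pairs cannot concatenate to the same key.

```python
def pair_key(rel1: str, rel2: str) -> str:
    """Combine two ids into one key, mirroring relation1__c + '-' + relation2__c."""
    return f"{rel1}-{rel2}"


def filter_pairs(records, wanted_pairs):
    """Keep only records whose (Relation1__c, Relation2__c) is a wanted pair."""
    wanted = {pair_key(a, b) for a, b in wanted_pairs}  # set gives O(1) lookups
    return [
        r for r in records
        if pair_key(r["Relation1__c"], r["Relation2__c"]) in wanted
    ]
```

A record that matches one id from each list but in the wrong combination is dropped, which is exactly the case two independent IN clauses cannot rule out.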

Resources