How to paginate when every page may start from a different offset - ajax

For example: 50 rows per page.
Page 1: I first fetch 50 rows (row 0 to row 49) and a filter removes 15 of them, so I have only 35 rows. Then I fetch another 50 rows (row 50 to row 99) and the same filter keeps 15 of those, which I merge with the 35. Now I have 50 rows, but I have consumed at least 65 source rows.
Page 2: starts at >= row 65.
Page 3: starts at >= row 115.
Page 4: starts at >= row 165.
How do I get the start offset of page 2 when I am fetching page 4?

If you are fetching from a database, make the filter part of the query itself and let the database software do the counting for you. E.g.
SELECT * from myrows WHERE [condition] LIMIT 50 OFFSET 200
would give you the filtered records 200..249, i.e. contents of page #5.
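A minimal sketch, assuming a hypothetical active flag as the filter condition and 50 rows per page; the ORDER BY keeps the filtered offsets deterministic between requests, so the client never needs to track the raw source offsets (65, 115, 165, ...):
SELECT * FROM myrows WHERE active = 1 ORDER BY id LIMIT 50 OFFSET 0;    -- page 1
SELECT * FROM myrows WHERE active = 1 ORDER BY id LIMIT 50 OFFSET 50;   -- page 2
SELECT * FROM myrows WHERE active = 1 ORDER BY id LIMIT 50 OFFSET 150;  -- page 4, offset = (page - 1) * 50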

Related

Talend: How to fix "The code of method is exceeding the 65535 bytes limit"

I have a set of 5 tables that have approximately 2 million rows and 450 columns each.
My job looks like this:
tDBInput 1 ---tMap-----
tDBInput 2 ---tMap-----
tDBInput 3 ---tMap---tUnite---tDBOutput
tDBInput 4 ---tMap-----
tDBInput 5 ---tMap-----
These are the 5 tables I'm trying to union, with the tMap where I add an Id to trace which table the data comes from and reduce the number of columns (from 450 to 20).
Then I unite the 5 flows in one tUnite that loads a table in Truncate - Insert mode.
I'm trying to make it work but always get the same error: "The code of method tDBInput-10Process is exceeding the 65535 bytes limit".
If you use only 20 of the 450 columns, you could select only those columns in the query of each tDBInput, instead of extracting all 450 columns and filtering them in the tMap.
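For instance, the query in each tDBInput could look something like the sketch below (the column names are placeholders, not your real schema); keep the same 20 columns in the same order in all five inputs so tUnite can union them, and you could even add the trace id here instead of in the tMap:
SELECT 'TABLE_1' AS source_table,              -- trace id, one literal per input
       customer_id, order_date, amount, status -- only the ~20 columns you actually need
FROM table_1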

In sqoop, what does "size" mean when used with the --split-limit argument?

From the sqoop docs:
Using the --split-limit parameter places a limit on the size of the split section created. If the size of the split created is larger than the size specified in this parameter, then the splits would be resized to fit within this limit, and the number of splits will change according to that.
What does "size" refer to here. Can some one explain with a little example.
I was just reading this and I think it would be interpreted like this.
Example: a table with a primary key column called ID of type INT, holding 1000 rows with ID values from 1 to 1000. If you set num-mappers to 50, you get 50 tasks that each try to import 20 rows. The first query would have a predicate that says WHERE ID >= 1 AND ID <= 20, the 2nd mapper would say WHERE ID >= 21 AND ID <= 40, and so on.
If you also define the split-limit, then depending on the size of the splits, this parameter may adjust the number of tasks used to sqoop the data.
For example, with num-mappers set to 50 and split-limit set to 10, you would now need 100 tasks importing 10 rows of data each to get all 1000 rows. Your first task would now say WHERE ID >= 1 AND ID <= 10.
In the case of a DateTime column, the size is now measured in seconds. So if you have 10 years of data with 1 row for every day, you have about 3,653 rows of data. If you set num-mappers to 10, each task tries to sqoop about 365 days of data with a predicate that looks something like MYDATETIMECOL >= '2010-01-01' AND MYDATETIMECOL <= '2010-12-31'; but if you also set split-limit to something like 2592000 (the number of seconds in 30 days), you would need about 122 tasks to sqoop the data, and the first task would have a predicate like MYDATETIMECOL >= '2010-01-01' AND MYDATETIMECOL <= '2010-01-30'.
These two examples both use a 1:1 ratio of column value to row count. If either table had 1000 rows per value in the split-by column, ALL of those rows would be sqooped as well.
Example with a DateTime column where you have loaded 1000 rows every day for the last 10 years: you now have 3,653,000 rows; the predicates and the number of tasks would stay the same, but the number of rows sqooped by each of those tasks would be 1000x more.
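As a rough sketch of what this looks like at the SQL level for the integer example (the table name is just an assumption for illustration, and the exact range boundaries sqoop generates may differ slightly), sqoop first runs a boundary query on the split-by column and then issues one range query per split:
SELECT MIN(ID), MAX(ID) FROM example_table;               -- boundary query: 1 and 1000
-- num-mappers=50, no split-limit: 50 splits of 20 ID values each
SELECT * FROM example_table WHERE ID >= 1 AND ID <= 20;   -- task 1
SELECT * FROM example_table WHERE ID >= 21 AND ID <= 40;  -- task 2
-- num-mappers=50 with --split-limit 10: split size capped at 10 ID values, so 100 splits
SELECT * FROM example_table WHERE ID >= 1 AND ID <= 10;   -- task 1
SELECT * FROM example_table WHERE ID >= 11 AND ID <= 20;  -- task 2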

Sampling Issue with hive

"all_members" is a table in hive with 10m rows and 1 column: "membership_nbr". I want to sample 3000 rows. This is what I have done:
hive>create table sample_members as select * from all_members limit 1;
hive>insert overwrite table sample_members select membership_nbr from all_members tablesample(3000 rows);
hive>select count(*) from sample_members;
OK 45000
The result won't change if I replace 3000 rows with 300 rows.
Am I doing something wrong?
Table sampling with tablesample(3000 rows) won't fetch 3000 rows from the entire table; instead it fetches 3000 rows from each input split.
Your query probably ran 15 mappers, so each mapper fetched 3000 rows, giving a total of 3000 * 15 = 45000 rows. Likewise, if you change 3000 rows to 300 rows, you get 4500 rows after sampling.
So, for your requirement you have to give tablesample(200 rows). Each mapper then fetches 200 rows, and the 15 mappers together fetch the 3000 sampled rows.
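A sketch of the corrected insert, assuming the job still runs with 15 mappers:
hive>insert overwrite table sample_members select membership_nbr from all_members tablesample(200 rows);
hive>select count(*) from sample_members;
OK 3000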
Refer to the link below for the various types of sampling:
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Sampling

Sum Column At Each Value Change of another field in foxpro

Sum a column at each value change of another field in FoxPro. How can I get the ADDITIVE column? I was able to do a running total across all items, but how can I get it to restart at each item change?
E.g.
Item Number QTY ADDITIVE
1045 50 50
1045 25 75
1045 35 110
2045 50 50
2045 50 100
2045 25 125
3056 30 30
3056 30 60
3056 30 90
It looks like simple addition, but how are you planning on storing and presenting the results to the end-user... in a grid, or just a final running total per individual item? It looks like this might represent sales order item / qty sold. I would probably query into a read/write cursor ordered by item, then apply a scan loop to update each record... something like:
select ItemNumber, Qty, 000000 as RunningTotal ;
   from YourTable ;
   order by ItemNumber ;
   into cursor C_Sample readwrite

lastItem = ""
runTotal = 0

scan
   */ If different item, reset running total back to zero
   if lastItem != ItemNumber
      runTotal = 0
   endif
   */ Update running total
   runTotal = runTotal + Qty
   */ Update the record column
   replace RunningTotal with runTotal
   */ preserve the ID we just processed for comparison to next record
   lastItem = ItemNumber
endscan
*/ Done...

Use lag in the next line after its line has been executed

This is a very complicated situation for me and I was wondering if someone can help me with it:
Here is my table:
Record_no Type Solde SQLCalculatedPmu DesiredValues
------------------------------------------------------------------------
2570088 Insertion 60 133 133
2636476 Insertion 67 119,104 119,104
2636477 Insertion 68 117,352 117,352
2958292 Insertion 74 107,837 107,837
3148350 Radiation 73 107,837 107,83 <---
3282189 Insertion 80 98,401 98,395
3646066 Insertion 160 49,201 49,198
3783510 Insertion 176 44,728 44,725
3783511 Insertion 177 44,475 44,472
4183663 Insertion 188 41,873 41,87
4183664 Insertion 189 41,651 41,648
4183665 Radiation 188 41,651 41,64 <---
4183666 Insertion 195 40,156 40,145
4183667 Insertion 275 28,474 28,466
4183668 Insertion 291 26,908 26,901
4183669 Insertion 292 26,816 26,809
4183670 Insertion 303 25,842 25,836
4183671 Insertion 304 25,757 25,751
In my table, every value in the SQLCalculatedPmu column or DesiredValues column is calculated based on the preceding value.
As you can see, I calculated the SQLCalculatedPmu column rounding to 3 decimals. The issue is that on each Radiation line, the client wants the next calculation to start from 2 decimals instead of 3 (represented in the DesiredValues column), and the following values are then recalculated. For example, line 6 changes because the value in line 5 is now on 2 decimals. I could handle this if there were a single Radiation, but in my case I have a lot of Radiations, and each of them changes everything that follows based on the 2-decimal calculation.
In summary, here are the steps:
1 - round the value of the preceding row of a Radiation and put it in the Radiation row.
2 - recalculate all the following Insertion rows.
3 - when we reach another Radiation, we redo steps 1 and 2, and so on.
I'm using an Oracle DB and I'm the owner, so I can create procedures, insert, update, select.
But I'm not familiar with procedures or loops.
For information, the formula for SQLCalculatedPmu uses two additional columns, price and number, and is calculated cumulatively line by line for each investor:
(price * number) + (cumulative (price * number) of the preceding lines)
I tried something like this:
update PMUTemp
set SQLCalculatedPmu =
   case when Type = 'Insertion' then
      ((number * price) + lag(SQLCalculatedPmu, 1) over (partition by investor order by Record_no))
      / (number + lag(solde, 1) over (partition by investor order by Record_no))
   else
      TRUNC(lag(SQLCalculatedPmu, 1) over (partition by investor order by Record_no), 2)
   end;
but it gave me this error (I think it's because I'm looking at the preceding line, which is itself modified during the SQL statement):
ORA-30486: window functions are allowed only in the SELECT list of a query.
I was wondering if creating a procedure that would be called as many times as there are Radiations would do the job, but I'm really not good with procedures.
Any help?
Regards,
Just to make my need simpler: all I want is to obtain the DesiredValues column starting from the SQLCalculatedPmu column. The steps are:
1 - on a Radiation, the value becomes = trunc(preceding value, 2)
2 - recalculate all the following Insertion rows this way: (price * number) + (cumulative (price * number) of the preceding lines). As the Radiation value has changed, the next lines need to be recalculated based on it.
3 - when we reach another Radiation, we redo steps 1 and 2, and so on.
Kindest regards
You should not need a procedure here -- a SQL update of the Radiation rows in the table would do this quicker and more reliably.
Something like:
update my_table t1
set (column_1, column_2) =
      (select round(column_1, 2), round(column_2, 2)
       from my_table t2
       where t2.type = 'Insertion' and
             t2.record_no = (select max(t3.record_no)
                             from my_table t3
                             where t3.type = 'Insertion' and
                                   t3.record_no < t1.record_no))
where t1.type = 'Radiation'
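If you do want the LAG-based version, the usual way around the ORA-30486 error is to compute the analytic function in a subquery and apply it through a MERGE instead of putting it in the UPDATE's SET clause. A rough sketch, reusing the column names from the question; it only truncates the Radiation rows, so the cascading recalculation of the following Insertion rows would still need a further pass or a procedural loop:
merge into PMUTemp t
using (select Record_no,
              trunc(lag(SQLCalculatedPmu) over (partition by investor order by Record_no), 2) as prev_pmu
       from PMUTemp) s
on (t.Record_no = s.Record_no and t.Type = 'Radiation')
when matched then update set t.SQLCalculatedPmu = s.prev_pmu;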
