How to split FoxPro records? - visual-foxpro

I have 60,000 records in the dbf file in FoxPro. I want to split it into each 20,000 records (20000 * 3 = 60,000).
How can I achieve this?
I am new to FoxPro. I am using Visual FoxPro 5.0.
Thanks in advance.

You must issue a SKIP command when using the COPY command to make sure you are starting on the next record.
USE MyTable
GO TOP
COPY NEXT 20000 TO NewTable1
SKIP 1
COPY NEXT 20000 TO NewTable2
SKIP 1
COPY NEXT 20000 TO NewTable3

Todd's suggestion will work if you don't care how the records are split. If you want to divide them up based on their content, you'll want to do something like Stuart's first suggestion, though his exact answer will only work if the IDs for the records run from 1 to 60,000 in order.
What's the ultimate goal here? Why divide the table up?
Tamar

You can directly select from the first table:
SELECT * from MyBigTable INTO TABLE SmallTable1 WHERE ID < 20000
SELECT * from MyBigTable INTO TABLE SmallTable2 WHERE ID BETWEEN (20000, 39999)
SELECT * from MyBigTable INTO TABLE SmallTable3 WHERE ID > 39999
if you want more control, though, or you need to manipulate the data, you can use xbase code, something like this:
SELECT MyBigTable
scan
scatter name oRecord memo
if oRecord.Id < 20000
select SmallTable1
append blank
gather name oRecord memo
else if oRecord.Id < 40000
select SmallTable2
append blank
gather name oRecord memo
else
select SmallTable3
append blank
gather name oRecord memo
endscan
It's been a while since I used VFP and I don't have it here, so apologies for any syntax errors.

use in 0 YourTable
select YourTable
go top
copy to NewTable1 next 20000
copy to NewTable2 next 20000
copy to NewTable3 next 20000

If you wanted to split based on record numbers, try this:
SELECT * FROM table INTO TABLE tbl1 WHERE RECNO() <= 20000
SELECT * FROM table INTO TABLE tbl2 WHERE BETWEEN(RECNO(), 20001, 40000)
SELECT * FROM table INTO TABLE tbl3 WHERE RECNO() > 40000

Related

Duplicate removal from a table using Informatica

I have a scenario to be implemented in informatica where I need to remove duplicate records from a table based on PK. But I need to keep the 1st occurrence of the PK values and remove the others(in case of duplicate PK).
For example, If my source has 1,1,1,2,3,3,4,5,4. I want to see my target data as 1,2,3,4,5. I have to read data from the same table and need to load into the same table., no new table can be introduced. please help me with your inputs.
Thanks in Advance!
I suppose you want the first occurrence because there are other (data) columns in addition to the key you entered. Therefore you want
1,b
1,c
1,a
2,d
3,c
3,d
4,e
5,f
4,b
Turned into
1,b
2,d
3,c
4,e
5,f
??
In that case try this mapping layout:
SRC -> SQ -> SRT -> AGG -> TGT
SEQ /
Where the sorter is set to sort on the KEY,sequence_port (desc)
And the aggregator is set to group by the KEY, and the sequence_port should not go out of the sorter
Hope you can follow me :)
There are multiple ways to do this, the simplest would be too do it in the SQL override.
Assuming the example quoted above, the SQL would be like this. General idea is to set a row number for a primary key ( so if you have 3 rows with same pk they will have 1,2,3 as row numbers before being reset for the next pk)
SQL:
select * from (
Select primary_key,column2 row_number() over (partition by primary_key order by primary_key) as distinct_key) where distinct_key=1
Before:
1,b
1,c
1,a
2,d
3,c
3,d
Intermediate query:
1,c,1
1,a,2
2,d,1
3,c,1
3,d,2
output:
1,c
2,d
3,d
I am able to achieve this by following the below steps.
1. Passing Sorted data(keys are EMP_ID, MOBILE, DEPTID) to an expression.
2. Creating the following variable ports in the expression and getting the counts.
V_CURR_EMP_ID = EMP_ID
V_CURR_MOBILE = MOBILE
V_CURR_DEPTID = DEPTID
V_COUNT =
IIF(V_CURR_EMP_ID=V_PREV_EMP_ID AND V_CURR_MOBILE=V_PREV_MOBILE AND V_CURR_DEPTID=V_PREV_DEPTID ,V_COUNT+1,1)
V_PREV_EMP_ID = EMP_ID
V_PREV_MOBILE = MOBILE
V_PREV_DEPTID = DEPTID
O_COUNT =V_COUNT
3. In the next transformation which is filter, I am taking only the records which have count more than 1 and deleting them using update strategy(DD_DELETE).
Here is the mapping flow.
SQ->SRTR->EXP->FIL->UPD->TGT
Also, when I tried to delete them using aggregator , it is deleting only the first occurrence of duplicates but not all.
Thanks again for your inputs!

Count Length and then Count those records.

I am trying to create a view that displays size (char) of LastName and the total number of records whose last name has that size. So far I have:
SELECT LENGTH(LastName) AS Name_Size
FROM Table
ORDER BY Name_Size;
I need to add something like
COUNT(LENGTH(LastName)) AS Students
This is giving me an error. Do I need to add a GROUP BY command? I need the view:
Name_Size Students
3 11
4 24
5 42
SELECT LENGTH(LastName) as Name_Size, COUNT(*) as Students
FROM Table
GROUP BY Name_Size
ORDER BY Name_Size;
You may have to change the group by and order by to LENGTH(LastName) as not all SQL engines let you reference an alias from the select statement in a clause on that same statement.
HTH,
Eric

Comparing Similar Hive Tables

I have two hive tables (t1 and t2) that I would like to compare. The second table has 5 additional columns that are not in the first table. Other than the five disjoint fields, the two tables should be identical. I am trying to write a query to check this. Here is what I have so far:
SELECT * FROM t1
UNION ALL
select * from t2
GROUP BY some_value
HAVING count(*) == 2
If the tables are identical, this should return 0 records. However, since the second table contains 5 extra fields, I need to change the second select statement to reflect this. There are almost 60 column names so I would really hate to write it like this:
SELECT * FROM t1
UNION ALL
select field1, field2, field3,...,fieldn from t2
GROUP BY some_value
HAVING count(*) == 2
I have looked around and I know there is no select * EXCEPT syntax, but is there a way to do this query without having to explicity name each column that I want included in the final result?
You should have used UNION DISTINCT for the logic you are applying.
However, the number and names of columns returned by each select_statement have to be the same otherwise a schema error is thrown.
You could have a look at this Python program that handles such comparisons of Hive tables (comparing all the rows and all the columns), and would show you in a webpage the differences that might appear: https://github.com/bolcom/hive_compared_bq
To skip the 5 extra fields, you could use the "--ignore-columns" option.

Special Sorting in Teradata

I have the following query that is not sorting the table the way I want it:
SELECT * FROM tbl
ORDER BY
BAN,
BEN,
bill_seq_no DESC,
CASE
WHEN Ebene='BAN - Open Debts' THEN 1
WHEN Ebene='BEN - Open Debts' THEN 2
END,
Rufnummer
;
It should sort the table first by BAN, then by BEN. Now in the third level row with Ebene='BEN - Open Debts' has bill_seq_no = NULL. This is why it sorts this row in the bottom.
I want it at the top.
How can I do that?
Got it! It's
SELECT * FROM adam_tmp.AAM711119__result
ORDER BY
BAN,
BEN,
CASE
WHEN Ebene LIKE '%BEN - Open Debts%' THEN 1
ELSE 2
END,
bill_seq_no DESC,
Rufnummer
;

How to get records randomly from the oracle database?

I need to select rows randomly from an Oracle DB.
Ex: Assume a table with 100 rows, how I can randomly return 20 of those records from the entire 100 rows.
SELECT *
FROM (
SELECT *
FROM table
ORDER BY DBMS_RANDOM.RANDOM)
WHERE rownum < 21;
SAMPLE() is not guaranteed to give you exactly 20 rows, but might be suitable (and may perform significantly better than a full query + sort-by-random for large tables):
SELECT *
FROM table SAMPLE(20);
Note: the 20 here is an approximate percentage, not the number of rows desired. In this case, since you have 100 rows, to get approximately 20 rows you ask for a 20% sample.
SELECT * FROM table SAMPLE(10) WHERE ROWNUM <= 20;
This is more efficient as it doesn't need to sort the Table.
SELECT column FROM
( SELECT column, dbms_random.value FROM table ORDER BY 2 )
where rownum <= 20;
In summary, two ways were introduced
1) using order by DBMS_RANDOM.VALUE clause
2) using sample([%]) function
The first way has advantage in 'CORRECTNESS' which means you will never fail get result if it actually exists, while in the second way you may get no result even though it has cases satisfying the query condition since information is reduced during sampling.
The second way has advantage in 'EFFICIENT' which mean you will get result faster and give light load to your database.
I was given an warning from DBA that my query using the first way gives loads to the database
You can choose one of two ways according to your interest!
In case of huge tables standard way with sorting by dbms_random.value is not effective because you need to scan whole table and dbms_random.value is pretty slow function and requires context switches. For such cases, there are 3 additional methods:
1: Use sample clause:
https://docs.oracle.com/en/database/oracle/oracle-database/19/sqlrf/SELECT.html#GUID-CFA006CA-6FF1-4972-821E-6996142A51C6
https://docs.oracle.com/en/database/oracle/oracle-database/19/sqlrf/SELECT.html#GUID-CFA006CA-6FF1-4972-821E-6996142A51C6
for example:
select *
from s1 sample block(1)
order by dbms_random.value
fetch first 1 rows only
ie get 1% of all blocks, then sort them randomly and return just 1 row.
2: if you have an index/primary key on the column with normal distribution, you can get min and max values, get random value in this range and get first row with a value greater or equal than that randomly generated value.
Example:
--big table with 1 mln rows with primary key on ID with normal distribution:
Create table s1(id primary key,padding) as
select level, rpad('x',100,'x')
from dual
connect by level<=1e6;
select *
from s1
where id>=(select
dbms_random.value(
(select min(id) from s1),
(select max(id) from s1)
)
from dual)
order by id
fetch first 1 rows only;
3: get random table block, generate rowid and get row from the table by this rowid:
select *
from s1
where rowid = (
select
DBMS_ROWID.ROWID_CREATE (
1,
objd,
file#,
block#,
1)
from
(
select/*+ rule */ file#,block#,objd
from v$bh b
where b.objd in (select o.data_object_id from user_objects o where object_name='S1' /* table_name */)
order by dbms_random.value
fetch first 1 rows only
)
);
To randomly select 20 rows I think you'd be better off selecting the lot of them randomly ordered and selecting the first 20 of that set.
Something like:
Select *
from (select *
from table
order by dbms_random.value) -- you can also use DBMS_RANDOM.RANDOM
where rownum < 21;
Best used for small tables to avoid selecting large chunks of data only to discard most of it.
Here's how to pick a random sample out of each group:
SELECT GROUPING_COLUMN,
MIN (COLUMN_NAME) KEEP (DENSE_RANK FIRST ORDER BY DBMS_RANDOM.VALUE)
AS RANDOM_SAMPLE
FROM TABLE_NAME
GROUP BY GROUPING_COLUMN
ORDER BY GROUPING_COLUMN;
I'm not sure how efficient it is, but if you have a lot of categories and sub-categories, this seems to do the job nicely.
-- Q. How to find Random 50% records from table ?
when we want percent wise randomly data
SELECT *
FROM (
SELECT *
FROM table_name
ORDER BY DBMS_RANDOM.RANDOM)
WHERE rownum <= (select count(*) from table_name) * 50/100;

Resources