Incremental import using Sqoop2

I would like to import data from a MySQL table into HDFS. I have everything configured and I am able to create a simple job in sqoop-shell that copies the data. However, I would like to copy only new records on each run, and I am not sure how to achieve this.
When I create the job there is a parameter named "check column", and columns like ID or eventTimestamp seem suitable for it. In that case, however, I should also enter a "last value". Do I have to manage this last value myself and create a new job with a new "last value" each time? Why create a job at all in that case, if it is used only once and then has to be recreated? Is it not possible for Sqoop to manage this itself, by storing the new "last value" after each run and importing only the new records? Moreover, why do I get this error message whenever I enter anything as "last value": "Size of input exceeds allowance for this input field. Maximal allowed size is -1"?

Concerning the problem with "last value", I have filed a bug:
https://issues.apache.org/jira/browse/SQOOP-2640
It is marked as fixed now, so it should be fine in release 1.99.7.
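Until a release with that fix is available, or if you decide to drive the increments yourself, the "last value" bookkeeping the question asks about can be kept outside Sqoop entirely. The sketch below is not Sqoop API code: the JDBC URL, the events table, its auto-increment id column, and the properties file are all assumed names. The value it prints is what you would feed into the job's "last value" input (for example with the shell's update job command) before starting the job.

```java
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.util.Properties;

public class LastValueTracker {
    // Placeholder connection details; requires the MySQL Connector/J driver on the classpath
    static final String JDBC_URL = "jdbc:mysql://dbhost:3306/mydb";
    static final Path STATE = Paths.get("sqoop-last-value.properties");

    public static void main(String[] args) throws Exception {
        // Load the previously stored high-water mark (0 on the very first run)
        Properties state = new Properties();
        if (Files.exists(STATE)) {
            try (InputStream in = Files.newInputStream(STATE)) { state.load(in); }
        }
        long lastValue = Long.parseLong(state.getProperty("last.value", "0"));

        try (Connection c = DriverManager.getConnection(JDBC_URL, "user", "password");
             PreparedStatement ps = c.prepareStatement(
                     "SELECT COALESCE(MAX(id), ?) FROM events WHERE id > ?")) {
            ps.setLong(1, lastValue);
            ps.setLong(2, lastValue);
            try (ResultSet rs = ps.executeQuery()) {
                rs.next();
                long newMax = rs.getLong(1);
                // In a real run you would update the Sqoop job's "last value" to lastValue,
                // start the job, and persist newMax only after the import succeeds.
                // Here we just print both values and store the new high-water mark.
                System.out.println("run import with last value = " + lastValue
                        + ", new high-water mark = " + newMax);
                state.setProperty("last.value", Long.toString(newMax));
                try (OutputStream out = Files.newOutputStream(STATE)) {
                    state.store(out, "Sqoop incremental high-water mark");
                }
            }
        }
    }
}
```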

Related

You cannot import data to this record because the record was updated in Microsoft Dynamics 365 after it was exported

I'm having a strange issue with exporting/updating/importing data in our on-premises Dynamics 365 (8.2). I was doing a bulk update of over 3000 records by exporting the records to an Excel workbook, updating the data in a specific column, then importing the workbook back into CRM. It worked for all of the records except 14 of them, which according to the import log failed because "You cannot import data to this record because the record was updated in Microsoft Dynamics 365 after it was exported." I looked at the Audit History of those 14 records and found that they had not been modified in any way for a good two months. Strangely, the modified date of the most recent Audit History entry for ALL 14 records is the exact same date/time.
We have a custom workflow that runs once every 24 hours on a schedule and automatically updates the Age field of our Contact records based on the value in the respective Birthday field. ALL 14 of these records have a birthday of November 3rd, but in different years. That means the last modification made to them was on 11/3/2019 via the workflow. However, I cannot understand why the system "thinks" that this should prevent a data update/import.
I am happy to provide any additional information that I may have forgotten to mention here. Can anyone help me, please?
While I was not able to discover why the records would not update, I was able to resolve the issue. Before I share what I did to update the records, I will try and list as many things as I can remember that I tried that did not work:
I reworked my Advanced Find query that I was using to export the records that needed updating so that it returned ONLY those records that had actual updates. Previously, I used a more forgiving query that returned about 30 or so records, even though I knew that only 14 of them had new data to import. I did so because the query was easier to construct, and it was no big deal to remove the "extra" records from the workbook before uploading it for import. I would write a VLOOKUP for the 30-something records and remove the rows for which the VLOOKUP didn't find a value in my dataset, leaving me with the 14 that did have new data. After getting the error a few times, I started to ensure that I only exported the 14 records that needed to be updated. However, I still got the error when trying to import.
I tried formatting the (Do Not Modify) Modified On column in the exported workbook to match the date format in the import window. On export of the records, Excel was formatting this column as m/d/yyyy h:mm while the import window with the details on each successful and failed import showed this column in mm/dd/yyyy hh:mm:ss format. I thought maybe if I matched the format in Excel to the import window format it might allow the records to import. It did not.
I tried using some Checksum verification tool to ensure that the value in the (Do Not Modify) Checksum column in the workbook wasn't being written incorrectly or in an invalid format. While the tool I used didn't actually give me much useful information, it did recognize that the values were checksum hashes, so I supposed that was helpful enough for my purposes.
I tried switching my browser from the new Edge browser (the one that uses Chromium) to just IE as suggested on the thread provided by Arun. However, it did not resolve the issue.
What ended up working in the end was Arun's suggestion to just make some arbitrary edit to all the records and export them again afterward. This was okay to do for just 14 records, but I'm still slightly vexed, as this wouldn't really be a feasible solution if it were, say, a thousand records that were not importing. There was no field that ALL 14 Contact records had in common that I could just bulk edit, and bulk edit back again. What I ended up doing was finding a text field on the Contact form that did not have any value in it for any of the records, putting something in that field, then going to each record in turn and removing the value (since I don't know of a way to "blank out" or clear a text field while bulk editing). Again, this was okay for such a small number of records, but if it were to happen on a larger number, I would have to come up with an easier way to bulk edit and then bulk "restore" the records. Thanks to Arun for the helpful insights, and for taking the time to answer. It is highly appreciated!
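If this ever does hit a larger batch, the same "arbitrary edit" could be scripted against the Dynamics Web API instead of done by hand. This is only a rough sketch under assumptions: the organization URL, the access token, the scratch text field new_scratch, and the record GUIDs are placeholders, and an on-premises 8.2 deployment would need whatever authentication scheme it is actually configured for.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.List;

public class TouchRecords {
    // Placeholders: organization URL, token, and scratch field name are assumptions
    static final String ORG = "https://crm.example.com/MyOrg/api/data/v8.2";
    static final String TOKEN = "<access token>";

    public static void main(String[] args) throws Exception {
        HttpClient client = HttpClient.newHttpClient();
        // Replace with the GUIDs of the affected contact records
        List<String> contactIds = List.of(
                "00000000-0000-0000-0000-000000000001",
                "00000000-0000-0000-0000-000000000002");
        for (String id : contactIds) {
            // Write a throwaway value, then clear it again; each update bumps
            // Modified On and the row version, so a fresh export should then match.
            patch(client, id, "{\"new_scratch\": \"touch\"}");
            patch(client, id, "{\"new_scratch\": null}");
        }
    }

    static void patch(HttpClient client, String contactId, String json) throws Exception {
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create(ORG + "/contacts(" + contactId + ")"))
                .header("Authorization", "Bearer " + TOKEN)
                .header("Content-Type", "application/json")
                .method("PATCH", HttpRequest.BodyPublishers.ofString(json))
                .build();
        HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(contactId + " -> HTTP " + response.statusCode());
    }
}
```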
When you first do an import of an entity (Contacts, for example), you will see that the Excel file you are importing contains 3 hidden columns: (Do Not Modify) Contact, (Do Not Modify) Row Checksum, and (Do Not Modify) Modified On.
When you want to create new instances of the entity, just edit the records and clear the content of the 3 hidden columns.
This error happens when the checksum or row version of the exported record differs from the record in the database.
Try doing a dummy edit on the affected records and then export/reimport again.
I can think of two reasons: either the datetime format is confusing the system :( or it is the weird scenario explained in the community thread.
Apparently when importing the file, amending and then saving as a different file type alters the spreadsheet's parameters.
I hence used Internet Explorer, since when importing the file the system asks the user to save it in a different format. I added .xlsx at the end to save it in the required format. I amended the file and imported it back into CRM. It worked.
For me it turned out to be a different CRM time zone setting for the exporter and importer. Unfortunately, this setting does not appear to be changeable by an administrator via the user interface.
The setting is available for each user under File->Options->Time Zone.

Question on changing the APEX data load wizard to give default target columns

Within the data load wizard that comes with APEX 18.1, after you choose your csv file to be uploaded you are offered a "TARGET COLUMN" drop down LOV which defaults to "DO NOT LOAD". It is possible to tell APEX which values you want in this LOV. I have done this.
My issue is that this is quite laborious. Your users will not necessarily know which value you want them to pick from the LOV to map the related column when they are using a CSV file with no header, which is what they are going to be doing.
Does anyone know how to change the "DO NOT LOAD" value in the LOV to another value? If I could get it to default to a column of my choosing, this would be great. Alternatively, there's a "SOURCE COLUMN" field in the wizard.
Getting the "SOURCE COLUMN" field to denote which column I wish users to map to the LOV value would also be helpful. Has anyone faced this before? Does anyone know if it is possible to do what I am suggesting as a workaround? Thanks for looking and for your thoughts.
APEX does the column mapping automatically by checking the column names in the CSV (the first row usually holds the names).
So if the names of the columns match in the table and the csv, it will connect them by itself. What you can also do is set column aliases.
If you want to edit the existing data load, you can go to Shared Components > Data Load Definitions and pick the one you are using there.
Then you can set column aliases there. But as far as I know, you can only set one alias per column.

Dynamics CRM Option Set Duplicated on Solution Import

Scenario: a new option set label of "Update" with value 100,000,000 was added in error to a field in the default managed solution in Production.
An identical label with value 866,100,002 was added to the same field in the unmanaged Development solution. When the latest round of export and import occurred, a duplicate option set label for "Update" was added to the managed solution in Production.
I now have data in the tables with both values and duplicate labels in the managed solution.
Question: how do I unwind this mess? Can I delete the label relating to value 100,000,000, seeing as it will just get duplicated again on the next solution import?
What happens to the data in the database? Is there a way to update the recorded values of 100,000,000 to the correct 866,100,002?
Do an Advanced Find for records having the option set value 100,000,000 and export them to Excel (with the option to make them available for re-import selected). Bulk update the exported records to the correct option set value 866,100,002 and re-import the file. Do this first; it will correct the data.
Then you can delete the duplicate label in the picklist and monitor future imports.
Test it in lower environments first, and take a solution/database backup as a precaution.
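If the Excel round trip is impractical for the volume involved, the same correction can also be scripted against the Dynamics Web API. This is only a sketch under assumptions: the organization URL and access token are placeholders, the entity set contacts and field name new_myfield stand in for whatever entity and field actually hold the option set, and parsing the query response is left out.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class FixOptionSetValues {
    // Placeholders: substitute your organization URL, field name, and authentication
    static final String ORG = "https://crm.example.com/MyOrg/api/data/v8.2";
    static final String TOKEN = "<access token>";

    public static void main(String[] args) throws Exception {
        HttpClient client = HttpClient.newHttpClient();

        // 1) Find records still carrying the wrong option set value (100,000,000)
        HttpRequest query = HttpRequest.newBuilder()
                .uri(URI.create(ORG + "/contacts?$select=contactid&$filter=new_myfield%20eq%20100000000"))
                .header("Authorization", "Bearer " + TOKEN)
                .header("Accept", "application/json")
                .GET()
                .build();
        String body = client.send(query, HttpResponse.BodyHandlers.ofString()).body();
        // Extract the contactid GUIDs from 'body' with your preferred JSON library (omitted)

        // 2) PATCH each affected record with the correct value (866,100,002)
        String contactId = "00000000-0000-0000-0000-000000000000"; // replace with a GUID from step 1
        HttpRequest patch = HttpRequest.newBuilder()
                .uri(URI.create(ORG + "/contacts(" + contactId + ")"))
                .header("Authorization", "Bearer " + TOKEN)
                .header("Content-Type", "application/json")
                .method("PATCH", HttpRequest.BodyPublishers.ofString("{\"new_myfield\": 866100002}"))
                .build();
        System.out.println("HTTP " + client.send(patch, HttpResponse.BodyHandlers.ofString()).statusCode());
    }
}
```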

Get data's source in kettle

When I use Kettle, I was wondering how to get a table column's source column. For example, after I have merged two tables into one table based on the primary key, given any column in the output table, I would like to determine which table it belongs to and get the original column name in the original table. Thank you for helping and sorry for my poor English...
http://i.stack.imgur.com/xoR0s.png
Given any field in table3 (suppose a field named A in table3), I would like to know where it comes from without using the graphical view (from Java code or other means): the original table name (here input1 or input2) and the original column name (maybe B in input1, which is represented as A in table3). Besides, I use MySQL.
There are a couple of ways to do this:
1) Manually. If you right-click on the output step and choose Show Output fields (or whatever it's called), you will see the "origin step" for each of the outgoing fields. You can do the same for input fields. Then you can trace them back to those origin steps, and repeat the process of viewing the input fields at those steps, and seeing those fields' origins, and so on. This is probably not what you're looking for.
2) With code. Prior to 6.0, you'd need to programmatically perform the same operations as are listed in option 1 above. In 6.0 there is the Data Lineage capability, which offers the LineageClient API that can find the origin fields for the specified output fields. For more information see my blog post describing the Data Lineage capability. Also I put a Gremlin Console in the PDI Marketplace, to make the use of LineageClient easier (and you can visually see the lineage graph too).
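To sketch what the pre-6.0 route looks like (option 1 done programmatically), the Kettle API already records an "origin" step for every field a step emits. The transformation path and step name below are placeholders; note that getOrigin() gives you the step that produced the field, so recovering the original database column name still means reading that input step's own definition (for example the Table input step's SQL).

```java
import org.pentaho.di.core.KettleEnvironment;
import org.pentaho.di.core.row.RowMetaInterface;
import org.pentaho.di.core.row.ValueMetaInterface;
import org.pentaho.di.trans.TransMeta;

public class FieldOriginLookup {
    public static void main(String[] args) throws Exception {
        // Initialize the Kettle environment (loads core plugins)
        KettleEnvironment.init();

        // Load the transformation definition; the path is a placeholder
        TransMeta transMeta = new TransMeta("/path/to/merge_inputs.ktr");

        // Fields as they leave the step that produces table3; the step name is a placeholder
        RowMetaInterface fields = transMeta.getStepFields("table3");
        for (ValueMetaInterface field : fields.getValueMetaList()) {
            // getOrigin() reports the step where the field was introduced or last changed
            System.out.println(field.getName() + "  <-  origin step: " + field.getOrigin());
        }
    }
}
```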

FileMaker Pro 10 - How to have cursor be in first field on new record?

When I start a new record in FileMaker is there a way for the cursor to automatically be in the first field so I can just start typing? And to specify which field that should be?
Background:
I'm trying to set up a FileMaker layout for use with a barcode scanner. So someone can scan in one record (there are two fields on the layout). After scanning it should go to a new record and place the cursor in the scan field so it's ready to scan again.
I put a trigger on the scan field to run a script to create a new record after hitting the enter key in the one field. After the new record statement I put a "go to field" statement but it doesn't seem to do anything. It always goes to the other field instead of the scan field.
Updates
I just tried using a "set selection" statement in the script instead of "go to field" (I also tried using both one after the other). Neither of those seemed to work.
I tried changing the tab order but it still goes to the other field instead of the scan field.
The default behavior when you create a new record is to go to the first field in the tab order, so this should work without you having to do anything.
The fact that it's not suggests to me that there might be a script trigger, either at the layout or field level, that's interfering with this or exiting the record. Try turning on the script debugger, creating a new record, and seeing if a script runs.
I ended up doing a workaround.
I have just one field on the scan layout. After a user scans, a script fires which changes the layout to one which shows all the information about the scanned record that was just entered. It pauses for 1 second and then goes back to the scan layout for the next scan.
You may be able to set your scanner to submit pre- and post-data keystrokes. We routinely use this to invoke a FileMaker script pre-scan to "go to field", enter the data, then post-scan to "perform a find".
I eventually devised a decent workaround with a second trigger script which seems to have no disadvantages.
However the tabs are set, I have found that an OnObjectExit trigger or an OnObjectSave trigger set up on the scan field will perform a script to process the scanned data, but the step to return the cursor to the scan field will NOT work, probably because that field is still active in some way.
Rather than banging my head against a brick wall, I decided to set up an OnObjectExit trigger on the field to which the cursor is always deflected. This fires off a script to clear the scan field and then return the cursor to the scan field, ready for the next scan. This way, the cursor DOES arrive back where I want it.
Perhaps rather inelegant, but it works fine!
