compare two files then put matched records into one file and unmatched records into another file - dfsort

I have a requirement to compare two files and then write the matched records to one file and the unmatched records to another file using JOINKEYS in JCL.
I'm not sure whether this is the correct one or not. Could you, please, help me?
JOINKEYS FILE=F1,FIELDS=(1,18,A)
JOINKEYS FILE=F2,FIELDS=(1,18,A)
JOIN UNPAIRED,F1,F2
REFORMAT FIELDS=(F1:1,100,F2:1,503,?)
SORT FIELDS=COPY
OUTFIL FNAMES=NOMATCH,INCLUDE=(604,1,SS,EQ,C'1,2'),
IFTHEN=(WHEN=(604,1,CH,EQ,C'1'),
BUILD=(1:1,258,260:264,1,263:334,2)),
IFTHEN=(WHEN=NONE,
BUILD=(604,603))
OUTFIL FNAMES=MATCH,SAVE,
BUILD=(1:1,258,260:264,1,263:334,2)
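For what it's worth, here is a minimal sketch of how the matched/unmatched split is typically written with JOINKEYS, assuming (as in your REFORMAT) that F1 records are 100 bytes and F2 records are 503 bytes, so the ? match indicator lands in position 604; your BUILD reformatting is left out so that only the split itself is shown:

JOINKEYS FILE=F1,FIELDS=(1,18,A)
JOINKEYS FILE=F2,FIELDS=(1,18,A)
JOIN UNPAIRED,F1,F2
REFORMAT FIELDS=(F1:1,100,F2:1,503,?)
SORT FIELDS=COPY
* The ? indicator byte is 'B' for paired records, '1' for F1-only, '2' for F2-only
OUTFIL FNAMES=MATCH,INCLUDE=(604,1,CH,EQ,C'B')
OUTFIL FNAMES=NOMATCH,SAVE

SAVE on the second OUTFIL writes every record not selected by the first OUTFIL, i.e. the unpaired ones, so it does not need its own INCLUDE.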

Related

How can I sort the 2nd column from 100 CSV files with EmEditor?

I have over 100 CSV files where I need to sort all the data by the 2nd column, and I am having some trouble figuring out how.
I was able to use this guide to delete specific columns:
How can you delete the first and fifth columns from 100 CSV files with EmEditor?
I also tried using some of the sort commands from the website, without any luck.
The Advanced Sort feature can sort all opened documents.
Open Sort | Advanced Sort... and change the Column dropdown to the 2nd column. Change How to Sort to the method you need. Then test that this sort works on a single CSV document.
Now open all 100 files. This time, check Apply to all documents in the group in Advanced Sort and click Sort Now.

Compare 2 CSV files using a shell script and print the output to a 3rd file

I am learning shell scripting and am using it to build a testing framework for my team, so I need your help with something.
Overview: I am extracting aggregated values from Hive through my queries using a shell script and storing the result in a separate file, let's say File1.csv.
Now I want to compare the above CSV file with another CSV file, File2.csv, using a shell script and print the result as PASS (if the records match) or FAIL (if they do not) row-wise into a third file, let's say output.txt.
Note: first we need to sort the records in File1.csv and then compare them with File2.csv, storing the PASS/FAIL result row-wise in output.txt.
Format of File1.csv
Postcode Location InnerLocation Value_% Volume_%
XYZ London InnerLondon 6.987 2.561
ABC NY High Street 3.564 0.671
DEF Florida Miami 8.129 3.178
Quick help will be appreciated. Thanks in Advance.
You have two sorted text files and want to see which lines are different. There is nothing in your question which would make the problem CSV specific.
A convenient tool for this type of task would be sdiff.
sdiff -s File[12].csv
The -s option ensures that you see only differing lines, but have a look at the sdiff man page: maybe you also want to add one of the options dealing with white space.
If you need to go into more detail and, for example, show not just which CSV lines differ but which field within a line is different, and if these really are general CSV files, you should use a CSV parser and not do it in a shell script. Parsing a CSV file from a shell script only works if you know for sure that just a subset of the features allowed in CSV files is actually used.
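As a rough sketch of the row-wise PASS/FAIL output asked for above (this assumes both files end up with the same number of rows and that the '|' character never occurs in the data, and uses the file names File1.csv, File2.csv and output.txt from the question):

# Sort File1.csv first, as required, then compare it line by line with File2.csv
# and write PASS or FAIL for each row to output.txt.
sort File1.csv > File1_sorted.csv
paste -d'|' File1_sorted.csv File2.csv | while IFS='|' read -r left right
do
    if [ "$left" = "$right" ]; then
        echo "PASS"
    else
        echo "FAIL"
    fi
done > output.txt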

Pentaho Data Integration (DI) Get Last File in a Directory of a SFTP Server

I am doing a transformation in Pentaho Data Integration and I have a list of files in a directory of my SFTP server. These files are named in the FILE_YYYYMMDDHHIISS.txt format, and my directory looks like this:
mydirectory
FILE_20130701090000.txt
FILE_20130701170000.txt
FILE_20130702090000.txt
FILE_20130702170000.txt
FILE_20130703090000.txt
FILE_20130703170000.txt
My problem is that I need to get the last file in this list according to its creation date, to pass it to another transformation step...
How can I do this in Pentaho Data Integration?
In fact this is quite simple because your file names can be sorted textually, and the max in the sort list will be your most recent file.
Since a list of files is likely short, you can use a Memory Group by step. A grouping step needs a separate column by which to aggregate. If you only have one column and you want to find the max in the entire set, you can add a grouping column with an Add Constants step and configure it to add a column with, say, an integer 1 in every row.
Configure your Memory Group by to group on the column of 1s, and use the file name column as the subject. Then simply select the Maximum grouping type. This will produce a single row with your grouping column, the file name field removed, and the aggregate column containing your max file name.
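Outside of Pentaho, purely to illustrate why a textual sort is enough here (the timestamp in FILE_YYYYMMDDHHIISS.txt is zero-padded and runs from year down to seconds, so lexicographic order is chronological order), a quick shell check on the names from the question would be:

# The lexicographically largest name is the most recent file.
ls mydirectory/FILE_*.txt | sort | tail -n 1
# prints mydirectory/FILE_20130703170000.txt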

Bash script - Construct a single line out of many lines having duplicates in a single column

I have an instrumented log file in which each first-column value (ID) appears on 6 lines, as below.
//SC001#1/1/1#1/1,get,ClientStart,1363178707755
//SC001#1/1/1#1/1,get,TalkToSocketStart,1363178707760
//SC001#1/1/1#1/1,get,DecodeRequest,1363178707765
//SC001#1/1/1#1/1,get-reply,EncodeReponse,1363178707767
//SC001#1/1/1#1/2,get,DecodeRequest,1363178708765
//SC001#1/1/1#1/2,get-reply,EncodeReponse,1363178708767
//SC001#1/1/1#1/2,get,TalkToSocketEnd,1363178708770
//SC001#1/1/1#1/2,get,ClientEnd,1363178708775
//SC001#1/1/1#1/1,get,TalkToSocketEnd,1363178707770
//SC001#1/1/1#1/1,get,ClientEnd,1363178707775
//SC001#1/1/1#1/2,get,ClientStart,1363178708755
//SC001#1/1/1#1/2,get,TalkToSocketStart,1363178708760
Note: , (comma) is the delimiter here
Likewise there are many duplicate first-column values (IDs) in the log file (the above example has only two IDs: //SC001#1/1/1#1/1 and //SC001#1/1/1#1/2). I need to consolidate the log records into the format below.
ID,ClientStart,TalkToSocketStart,DecodeRequest,EncodeReponse,TalkToSocketEnd,ClientEnd
//SC001#1/1/1#1/1,1363178707755,1363178707760,1363178707765,1363178707767,1363178707770,1363178707775
//SC001#1/1/1#1/2,1363178708755,1363178708760,1363178708765,1363178708767,1363178708770,1363178708775
I suppose I need a bash script for this exercise and would appreciate expert support with it. I hope there may be a sed or awk solution which is more efficient.
Thanks much
One way:
sort -t, -k4n,4 file | awk -F, '{a[$1]=a[$1]?a[$1] FS $NF:$NF;}END{for(i in a){print i","a[i];}}'
The sort command orders the file numerically on the last (4th) comma-separated column, the timestamp. awk then takes the sorted input and builds an array keyed on the 1st field, where each value is the accumulated list of last-column timestamps for that key; the END block prints each ID followed by its timestamps.
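For readability, the same pipeline can be spelled out with comments; this is functionally the same as the one-liner above, with the header row from the desired output added on the assumption that it belongs in the final result:

sort -t, -k4n,4 file |
awk -F, '
    # print the header line of the desired output
    BEGIN { print "ID,ClientStart,TalkToSocketStart,DecodeRequest,EncodeReponse,TalkToSocketEnd,ClientEnd" }
    # append the timestamp (last field) to the list kept for this ID (first field)
    { a[$1] = a[$1] ? a[$1] FS $NF : $NF }
    # print one consolidated line per ID
    END { for (i in a) print i "," a[i] }
'

As with the one-liner, the order in which the IDs are printed by the END loop is not guaranteed.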

List of names and their numbers needed to be sorted .TXT file

I have a list of names (never over 100 names) with a value for each of them, either 3 or 4 digits.
john2E=1023
mary2E=1045
fred2E=968
And so on... They're formatted exactly like that in the .txt file. I have Python and Excel, and I'm also willing to download whatever I need.
What I want to do is sort all the names according to their values in descending order, so the highest is on top. I've tried to use Excel by replacing the '2E=' with ',' so I can have name,value pairs, then importing the data so each is in a separate column, but I still couldn't sort them any way other than A to Z.
Help is much appreciated, I did take my time to look around before posting this.
Replace the "2E=" with a tab character so that the data is displayed in excel in two columns. Then sort on the value column.
