Find the latest entry for a particular record in a Unix file - bash

I have a file which has multiple entries for a single record. For example:
abc~20160120~120
abc~20160125~150
xyz~20160201~100
abc~20160205~200
xyz~20160202~90
pqr~20160102~250
The first column is the record name, the second column is the date, and the third column is the entry for that particular date.
Now what I want to display in my file is the latest entry for each record. This is how my output should look:
abc~20160205~200
xyz~20160202~90
pqr~20160102~250
Can anybody help with a shell script for this? Keep in mind that I have a great many records, which need to be sorted first by record name and then reduced to the latest entry per record by date.

Sort the lines by record name and date reversed, then use the -u (unique) flag of sort to output only the first entry for each record:
sort -t~ -k1,2r < input-file | sort -t~ -k1,1 -u
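If you would rather avoid the double sort, here is a minimal awk sketch; it assumes the dates are zero-padded YYYYMMDD, so plain string comparison orders them correctly (input-file is a placeholder name):
awk -F'~' '$2 > d[$1] {d[$1] = $2; line[$1] = $0} END {for (k in line) print line[k]}' input-file
Note that awk's for-in order is unspecified, so pipe the result through sort -t~ -k1,1 if the records must come out ordered by name.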

Related

How can I extract any cell value from a given column (called, say, "X") in a text file with multiple columns using Bash?

I have a huge file with 100 columns.
I am concerned with one column called 'Location'. I know for a fact that all rows of this column have the same value. I need to get that value through Bash.
Any thoughts on how to go about this?
If the column is always in the same position relative to the other columns (say, the 10th), you could use:
cut -d" " -f10
This assumes a single space between columns; change the delimiter to whatever actually separates your columns.
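If the separator is variable whitespace, or you would rather find the column by its header name, here is a hedged awk sketch; it assumes the first row is a header that contains the literal column name Location:
awk 'NR==1 {for (i=1; i<=NF; i++) if ($i=="Location") c=i; next} c {print $c; exit}' file
Since every row of the column holds the same value, printing it from the first data row and exiting immediately is enough.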

Gnumeric Sort function

Can someone please direct me to a detailed explanation (link) of the Gnumeric sort function? The Gnumeric manual is terse and has no examples. I haven't been able to find any appropriate information through the search engines, and even Stack Overflow has only half a dozen questions on it, none of which suit.
My problem is:
I have a table with rows of dates, names, and columns of data. (pretty straightforward stuff).
I want to sort ALL columns by the NAME column.
That is: keep each row intact for data but move them in the table up or down so that the order is alphabetic by name.
I can do this easily with LibreOffice Calc but prefer the feel and simplicity of Gnumeric, yet I have never been able to work out from the drop-down sort menu how to get this done. I can sort any single column fine by itself, but can't seem to lock the other data in each row so that it moves along with it.
This is such a frequent operation that I'm surprised it isn't made clearer in the drop-down menu, i.e. "Order by column X".
The only way one can sort with Gnumeric, apparently, is to move the key column (in my case the NAME column) to be the left-most column (column A) of the table, sort, and then move the columns back into the required format (date and time in the first column). This seems very clumsy, and I wondered if there is an easier way to order a table in whatever format it arrives (e.g. just as it is imported from the CSV file) by simply selecting the sort column wherever it is in the table, as can be done in LibreOffice Calc?
1) Select ALL the columns you want to sort, then open:
menu > Data > Sort
2) In the sort specification, keep the column with the NAMEs to be sorted on and remove the rest of the columns.

Pentaho Data Integration (DI) Get Last File in a Directory of a SFTP Server

I am doing a transformation in Pentaho Data Integration, and I have a list of files in a directory of my SFTP server. These files are named in the FILE_YYYYMMDDHHIISS.txt format; my directory looks like this:
mydirectory
FILE_20130701090000.txt
FILE_20130701170000.txt
FILE_20130702090000.txt
FILE_20130702170000.txt
FILE_20130703090000.txt
FILE_20130703170000.txt
My problem is that I need to get the last file in this list according to its creation date, to pass it to another transformation step...
How can I do this in Pentaho Data Integration?
In fact this is quite simple, because your file names can be sorted textually, and the maximum in the sorted list will be your most recent file.
Since the list of files is likely short, you can use a Memory Group by step. A grouping step needs a separate column by which to aggregate; if you only have one column and you want the maximum over the entire set, you can add a grouping column with an Add Constants step, configuring it to add a column with, say, the integer 1 in every row.
Configure your Memory Group by to group on the column of 1s, and use the file name column as the subject, selecting the Maximum grouping type. This produces a single row with your grouping column and an aggregate column containing your maximum file name.
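This works because the timestamp embedded in the names is zero-padded YYYYMMDDHHIISS, so lexical order and chronological order agree. You can sanity-check the idea outside Pentaho with a quick shell one-liner (using the sample naming pattern from the question):
printf '%s\n' FILE_*.txt | sort | tail -n 1
For the directory listing above, this prints FILE_20130703170000.txt.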

Bash script - Construct a single line out of many lines having duplicates in a single column

I have an instrumented log file in which each request produces 6 lines that share a duplicated first column, as below.
//SC001#1/1/1#1/1,get,ClientStart,1363178707755
//SC001#1/1/1#1/1,get,TalkToSocketStart,1363178707760
//SC001#1/1/1#1/1,get,DecodeRequest,1363178707765
//SC001#1/1/1#1/1,get-reply,EncodeReponse,1363178707767
//SC001#1/1/1#1/2,get,DecodeRequest,1363178708765
//SC001#1/1/1#1/2,get-reply,EncodeReponse,1363178708767
//SC001#1/1/1#1/2,get,TalkToSocketEnd,1363178708770
//SC001#1/1/1#1/2,get,ClientEnd,1363178708775
//SC001#1/1/1#1/1,get,TalkToSocketEnd,1363178707770
//SC001#1/1/1#1/1,get,ClientEnd,1363178707775
//SC001#1/1/1#1/2,get,ClientStart,1363178708755
//SC001#1/1/1#1/2,get,TalkToSocketStart,1363178708760
Note: , (comma) is the delimiter here.
Likewise, there are many duplicated first-column values (IDs) in the log file (the example above has only two IDs: //SC001#1/1/1#1/1 and //SC001#1/1/1#1/2). I need to consolidate the log records into the format below:
ID,ClientStart,TalkToSocketStart,DecodeRequest,EncodeReponse,TalkToSocketEnd,ClientEnd
//SC001#1/1/1#1/1,1363178707755,1363178707760,1363178707765,1363178707767,1363178707770,1363178707775
//SC001#1/1/1#1/2,1363178708755,1363178708760,1363178708765,1363178708767,1363178708770,1363178708775
I expect a bash script would work for this exercise and would appreciate expert support. Perhaps there is a sed or awk solution that is more efficient.
Thanks much.
One way:
sort -t, -k4n,4 file | awk -F, '{a[$1]=a[$1]?a[$1] FS $NF:$NF;}END{for(i in a){print i","a[i];}}'
The sort command sorts the file numerically on the last (4th) column. awk then takes the sorted input and builds an array keyed on the 1st field, whose value is the concatenation of the last-column values; at the end it prints one consolidated line per key.
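The one-liner above does not print the header row shown in the expected output. Here is a hedged variant that adds it, assuming every ID logs all six events so the columns line up:
sort -t, -k4n,4 file | awk -F, 'BEGIN{print "ID,ClientStart,TalkToSocketStart,DecodeRequest,EncodeReponse,TalkToSocketEnd,ClientEnd"} {a[$1]=a[$1] "," $NF} END{for(i in a) print i a[i]}'
As with the original, the for-in loop makes no ordering promise, so pipe the output through sort again if the IDs must appear in order.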

How to sort file rows with vi?

I have to edit multiple files with multiple rows, where everything is in three columns, like this:
#file
save get go
go save get
rest place reset
The columns are separated with tabs. Is there any way to sort the rows based on the second or third column using vi?
Sort by the 2nd column (the pattern tells :sort to compare only the text after the match, i.e. after the first tab):
:sor /\t/
Sort by the 3rd column:
:sor /\t[^\t]*\t/
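For the sample file above, sorting on the second column this way compares "get go", "place reset" and "save get", so the rows would come out as:
save get go
rest place reset
go save get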
Second column:
:sort /\%9c/
Third column:
:sort /\%16c/
\%16c means "column 16"; :sort compares the text from that position onward. Note that these are character columns in the buffer, so the exact numbers depend on the widths of your own columns.
Highlight the rows you want to sort with the "V" command.
Then use an external command with "!" to filter the selection, for example:
!sort -k 2
where the number is the whitespace-separated field number of the column to sort on (note that sort's -k takes field numbers, not character positions).
vi will replace the selection with the output of the sort command, which is given the original selection as input.
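For example, with the three sample rows selected, :'<,'>!sort -k2,2 sorts them on the second field, which for this sample gives the same order as the :sor /\t/ approach above.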
You can specify a pattern for sort. For example:
:sort /^\w*\s*/
will sort on the second column (sorting compares the text that follows the matched pattern).
Likewise,
:sort /^\w*\s*\w*\s*/
should sort on the third column.
Delimit the columns using some character; here I use the | symbol as the delimiter. Once that is done, you can use the command below to sort on a specific column (add the n flag, as in :sort n /|.*|/, if you want a numeric sort). It works on some versions of vi but not on Ubuntu's vi :(
:sort /|.*|/
