When I use join to merge two sorted files, the result is unexpected.
Here is the example:
//file a.dat
12
123
456
13421
123456
//file b.dat
12
123
5432
123456
Running the command:
$ join -1 1 -2 1 -o '1.1 2.1' a.dat b.dat
12 12
123 123
The line 123456 is missing! I tried other files too, and some of them also gave incomplete results. Why does this happen?
Your input needs to be lexically sorted on the join field for join to work correctly. Your input is numerically sorted, so join can miss matches once it reaches a key that is out of lexical order. In lexical order, all strings which start with 1 come before all strings which start with 2, and so on, so 123456 has to appear before 13421 and 456.
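A minimal sketch of the fix, assuming the two files from the question are recreated in a scratch directory (the .sorted file names are made up for this illustration). Setting LC_ALL=C is an assumption that keeps sort and join on the same plain byte ordering:

```shell
# Recreate the question's files in a throwaway directory.
tmpdir=$(mktemp -d)
printf '%s\n' 12 123 456 13421 123456 > "$tmpdir/a.dat"
printf '%s\n' 12 123 5432 123456 > "$tmpdir/b.dat"

# Sort lexically (no -n), using the same locale that join will use.
LC_ALL=C sort "$tmpdir/a.dat" > "$tmpdir/a.sorted"
LC_ALL=C sort "$tmpdir/b.dat" > "$tmpdir/b.sorted"

# Same join as in the question, now on lexically sorted input.
joined=$(LC_ALL=C join -1 1 -2 1 -o '1.1 2.1' "$tmpdir/a.sorted" "$tmpdir/b.sorted")
printf '%s\n' "$joined"
```

With the inputs sorted this way, 123456 is paired along with 12 and 123.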
Related
I have a very simple text file with 3 fields, each separated by a space, like the following:
123 15 0
123 14 0
345 12 0
345 11 0
I ran a sort command to sort by the first column: sort -k 1 myfile. But it does not sort by the first column only. It sorts by the whole line, and I get the following result:
123 14 0
123 15 0
345 11 0
345 12 0
Is there anything wrong with my command or my file?
You need to use:
sort -k 1,1 -s myfile
if you want to sort only on the first field. This syntax specifies the start and end field of the sort key. sort -k 1 means the key starts at the first field and runs through to the end of the line. To keep lines with equal sort keys in the same relative order as in the input, you need a stable sort, which the -s flag (GNU) provides.
See this from the sort(1) man page:
KEYDEF is F[.C][OPTS][,F[.C][OPTS]] for start and stop position, where
F is a field number and C a character position in the field; both are
origin 1, and the stop position defaults to the line's end.
and the info page:
The --stable (-s) option disables this last-resort comparison so that
lines in which all fields compare equal are left in their original relative
order.
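The difference between the two key definitions can be checked with a short sketch like the one below. It assumes the sample file from the question is recreated in a scratch directory; the variable names are invented for this illustration, and LC_ALL=C is set only to make the ordering predictable:

```shell
# Recreate the question's sample file in a throwaway directory.
tmpdir=$(mktemp -d)
printf '%s\n' '123 15 0' '123 14 0' '345 12 0' '345 11 0' > "$tmpdir/myfile"

# Key runs from field 1 to the end of the line, so ties on the first
# field are decided by the remaining fields.
whole_line=$(LC_ALL=C sort -k 1 "$tmpdir/myfile")

# Key is field 1 only; -s keeps lines with equal keys in input order.
first_only=$(LC_ALL=C sort -k 1,1 -s "$tmpdir/myfile")

printf '%s\n' "-k 1:" "$whole_line" "-k 1,1 -s:" "$first_only"
```

The first command reproduces the reordering seen in the question, while the second leaves the file in its original order because it is already sorted on the first field.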
I have 2 text files, as shown below.
A.txt (with 2 rows):
abc-1234
tik-3456
B.txt (with 4 rows)
123456
234567
987
12
I want to combine these 2 to get the below file in CSV format:
column-1 column-2
abc-1234 123456
tik-3456 234567
987
12
I am trying the command below, but it does not produce the result above.
paste -d "," A.txt B.txt > C.csv
It gives the output below:
abc-1234
,123456
tik-3456,234567
,987
,12
Can anyone please tell me what I am missing here?
On Linux, each utility does one thing and does it well. So:
paste merges files line by line
column with -t creates tables
The following (with /tmp/1 and /tmp/2 standing in for the two input files):
paste -d',' /tmp/1 /tmp/2 | column -t -N 'column-1,column-2' -s',' -o' '
outputs the desired result.
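The paste half of that pipeline can be tried on its own with a sketch like this. It assumes A.txt and B.txt are recreated with plain newline line endings in a scratch directory, and it leaves out the column step:

```shell
# Recreate the two input files with plain newline line endings.
tmpdir=$(mktemp -d)
printf '%s\n' 'abc-1234' 'tik-3456' > "$tmpdir/A.txt"
printf '%s\n' 123456 234567 987 12 > "$tmpdir/B.txt"

# paste pairs up corresponding lines; once A.txt runs out,
# its field is left empty.
merged=$(paste -d ',' "$tmpdir/A.txt" "$tmpdir/B.txt")
printf '%s\n' "$merged"
```

Rows that exist only in B.txt come out with an empty first field, such as ,987.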
I want to compare two files only by their first column.
My first file looks like this:
0009608a4138a8e7 hdisk26 altinst_rootvg
000f7d4a8234a675 hdisk12 vgdbf
000f7d4a8234d5c9 hdisk22 vgarcbkp
My second file looks like this:
000f7d4a8234a675 hdiskpower64 [Lun_vgdbf]
000f7d4a8234d5c9 hdiskpower61 [Lun_vgarcbkp]
This is the output I would like to generate:
0009608a4138a8e7 hdisk26 altinst_rootvg
000f7d4a8234a675 hdisk12 vgdbf hdiskpower64 [Lun_vgdbf]
000f7d4a8234d5c9 hdisk22 vgarcbkp hdiskpower61 [Lun_vgarcbkp]
I wonder why diff does not support positional compare.
Something like diff -y -p1-17 file1 file2. Any idea?
You can use join to produce your desired output:
join -a 1 file1 file2
The -a 1 option tells join to also output lines from the first file that have no match in the second, so this assumes the first file contains every id that is present in the second.
It also relies on both files being sorted on their first field, which appears to be the case in your sample data. If they are not, you will need to sort them beforehand (the join command will warn you about your files not being sorted).
Sample execution :
$ echo '1 a b
> 2 c d
> 3 e f' > test1
$ echo '2 9 8
> 3 7 6' > test2
$ join -a 1 test1 test2
1 a b
2 c d 9 8
3 e f 7 6
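For input that is not already in order, the sort step can be added in front of join. The sketch below is an illustration under assumptions: it uses the question's sample rows in a deliberately shuffled order, the .sorted file names are made up, and LC_ALL=C keeps sort and join on the same ordering:

```shell
# The question's sample rows, deliberately out of order.
tmpdir=$(mktemp -d)
printf '%s\n' \
  '000f7d4a8234d5c9 hdisk22 vgarcbkp' \
  '0009608a4138a8e7 hdisk26 altinst_rootvg' \
  '000f7d4a8234a675 hdisk12 vgdbf' > "$tmpdir/file1"
printf '%s\n' \
  '000f7d4a8234d5c9 hdiskpower61 [Lun_vgarcbkp]' \
  '000f7d4a8234a675 hdiskpower64 [Lun_vgdbf]' > "$tmpdir/file2"

# Sort both files on the first field only.
LC_ALL=C sort -k 1,1 "$tmpdir/file1" > "$tmpdir/file1.sorted"
LC_ALL=C sort -k 1,1 "$tmpdir/file2" > "$tmpdir/file2.sorted"

# -a 1 keeps lines from file1 that have no match in file2.
joined=$(LC_ALL=C join -a 1 "$tmpdir/file1.sorted" "$tmpdir/file2.sorted")
printf '%s\n' "$joined"
```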