I have a file containing file paths and filenames that I want to sort starting at the end of the string.
My file contains a list such as below:
/Volumes/Location/Workers/Andrew/2015-08-12_Andrew_PC/DOCS/3177109.doc
/Volumes/Location/Workers/Andrew/2015-09-17_Andrew_PC/DOCS/2130419.doc
/Volumes/Location/Workers/Bill/2016-03-17_Bill_PC/DOCS/1998816.doc
/Volumes/Location/Workers/Charlie/2016-07-06_Charlie_PC/DOCS/4744123.doc
I want to sort this list such that the filenames will be sequential, this will help find duplicates based on filename regardless of path.
The list should appear like this:
/Volumes/Location/Workers/Bill/2016-03-17_Bill_PC/DOCS/1998816.doc
/Volumes/Location/Workers/Andrew/2015-09-17_Andrew_PC/DOCS/2130419.doc
/Volumes/Location/Workers/Andrew/2015-08-12_Andrew_PC/DOCS/3177109.doc
/Volumes/Location/Workers/Charlie/2015-07-06_Charlie_PC/DOCS/4744128.doc
Here's a way to do this:
sed -e 's|^.*/\(.*\)$|\1\t\0|' list.txt | sort | cut -f 2-
This uses sed to insert a copy of the filename to the beginning of each line so that we can sort the list with sort. Then we remove the stuff that we added in the first step.
This should work:
sort -t/ -k7 input_file
This will sort based on dynamic last field which is separated by /.
First it will append last field to the start of the line and then sort. First field which is appended earlier is removed by second awk.
awk -F'/' '{ $0= $NF " " $0;print $0 |"sort -k1"}' fil |awk '{print $2}'
/Volumes/Location/Workers/Bill/2016-03-17_Bill_PC/DOCS/1998816.doc
/Volumes/Location/Workers/Andrew/2015-09-17_Andrew_PC/DOCS/2130419.doc
/Volumes/Location/Workers/Andrew/2015-08-12_Andrew_PC/DOCS/3177109.doc
/Volumes/Location/Workers/Charlie/2016-07-06_Charlie_PC/DOCS/4744123.doc
Say I have file - a.csv
ram,33,professional,doc
shaym,23,salaried,eng
Now I need this output (pls dont ask me why)
ram,doc,doc,
shayam,eng,eng,
I am using cut command
cut -d',' -f1,4,4 a.csv
But the output remains
ram,doc
shyam,eng
That means cut can only print a Field just one time. I need to print the same field twice or n times.
Why do I need this ? (Optional to read)
Ah. It's a long story. I have a file like this
#,#,-,-
#,#,#,#,#,#,#,-
#,#,#,-
I have to covert this to
#,#,-,-,-,-,-
#,#,#,#,#,#,#,-
#,#,#,-,-,-,-
Here each '#' and '-' refers to different numerical data. Thanks.
You can't print the same field twice. cut prints a selection of fields (or characters or bytes) in order. See Combining 2 different cut outputs in a single command? and Reorder fields/characters with cut command for some very similar requests.
The right tool to use here is awk, if your CSV doesn't have quotes around fields.
awk -F , -v OFS=, '{print $1, $4, $4}'
If you don't want to use awk (why? what strange system has cut and sed but no awk?), you can use sed (still assuming that your CSV doesn't have quotes around fields). Match the first four comma-separated fields and select the ones you want in the order you want.
sed -e 's/^\([^,]*\),\([^,]*\),\([^,]*\),\([^,]*\)/\1,\4,\4/'
$ sed 's/,.*,/,/; s/\(,.*\)/\1\1,/' a.csv
ram,doc,doc,
shaym,eng,eng,
What this does:
Replace everything between the first and last comma with just a comma
Repeat the last ",something" part and tack on a comma. VoilĂ !
Assumptions made:
You want the first field, then twice the last field
No escaped commas within the first and last fields
Why do you need exactly this output? :-)
using perl:
perl -F, -ane 'chomp($F[3]);$a=$F[0].",".$F[3].",".$F[3];print $a."\n"' your_file
using sed:
sed 's/\([^,]*\),.*,\(.*\)/\1,\2,\2/g' your_file
As others have noted, cut doesn't support field repetition.
You can combine cut and sed, for example if the repeated element is at the end:
< a.csv cut -d, -f1,4 | sed 's/,[^,]*$/&&,/'
Output:
ram,doc,doc,
shaym,eng,eng,
Edit
To make the repetition variable, you could do something like this (assuming you have coreutils available):
n=10
rep=$(seq $n | sed 's:.*:\&:' | tr -d '\n')
< a.csv cut -d, -f1,4 | sed 's/,[^,]*$/'"$rep"',/'
Output:
ram,doc,doc,doc,doc,doc,doc,doc,doc,doc,doc,
shaym,eng,eng,eng,eng,eng,eng,eng,eng,eng,eng,
I had the same problem, but instead of adding all the columns to awk, I just used (to duplicate the 2nd column):
awk -v OFS='\t' '$2=$2"\t"$2' # for tab-delimited files
For CSVs you can just use
awk -F , -v OFS=, '$2=$2","$2'
I am trying to learn bash/shell *nix commands /scripting.
So rather than writing a python program, I thought of trying it out using bash/awk etc but am having a hard time.
I have a huge text (its actually csv )file
id_1, id_2, some attributes.
I want to sort this file based on id2?
how do i do this?
Thanks
Use the --key option for sort.
For example, the following sorts input.csv on the second field (using comma as a field separator) and writes the output to output.csv.
sort --key=2,2 -t',' input.csv > output.csv
p.s. Don't forget to use the -n option if you're doing a numerical sort.
For more info, see the man page for sort.
You can use -k option of sort(1)
-k, --key=POS1[,POS2]
start a key at POS1, end it at POS2 (origin 1)
sort -t, -k2 filename.csv
I don't have a shell to verify, but basically you need to specify the separator and the sort key
checkout the command cut:
cat file.cvs | cut -d";" -f 2 | sort
I assumed your csv is semi-colon separated, but you can change it.
Save into a different name:
cat file.cvs | cut -d";" -f 2 | sort > newfile.txt
I am looking for the easiest way to solve this problem. I have a huge data set that i cannot load into excel of this type of format
This is a sentence|10
This is another sentence|5
This is the last sentence|20
What I want to do is sort this from least to greatest based on the number.
cat MyDataSet.txt | tr "|" "\t" | ???
Not sure what the best way is to do this, I was thinking about using awk to switch the columns and the do a sort, but I was having trouble doing it.
Help me out please
sort -t\| -k +2n dataset.txt
Should do it. field separator and alternate key selection
You usually don't need cat to send the file to a filter. That said, you can use the sort filter.
sort -t "|" -k 2 -n MyDataSet.txt
This sorts the MyDataSet.txt file using the | character as field separator and sorting numerically according to the second field (the number).
have you tried sort -n
$ sort -n inputFile
This is another sentence|5
This is a sentence|10
This is the last sentence|20
you could switch the columns with awk too
$ awk -F"|" '{print $2"|"$1}' inputFile
10|This is a sentence
5|This is another sentence
20|This is the last sentence
combining awk and sort:
$ awk -F"|" '{print $2"|"$1}' inputFile | sort -n
5|This is another sentence
10|This is a sentence
20|This is the last sentence
per comments
if you have numbers in the sentence
$ sort -n -t"|" -k2 inputFile
This is another sentence|5
This is a sentence|10
This is the last sentence|20
this is a sentence with a number in it 2|22
and of course you could redirect it to a new file:
$ awk -F"|" '{print $2"|"$1}' inputFile | sort -n > outFile
Try this sort command:
sort -n -t '|' -k2 file.txt
Sort by number, change the separator and grab the second group using sort.
sort -n -t'|' -k2 dataset.txt
I have a text file containing ~300k rows. Each row has a varying number of comma-delimited fields, the last of which is guaranteed numerical. I want to sort the file by this last numerical field. I can't do:
sort -t, -n -k 2 file.in > file.out
as the number of fields in each row is not constant. I think sed, awk maybe the answer, but not sure how. E.g:
awk -F, '{print $NF}' file.in
gives me the last column value, but how to use this to sort the file?
Use awk to put the numeric key up front. $NF is the last field of the current record. Sort. Use sed to remove the duplicate key.
awk -F, '{ print $NF, $0 }' yourfile | sort -n -k1 | sed 's/^[0-9][0-9]* //'
vim file.in -c '%sort n /.*,\zs/' -c 'saveas file.out' -c 'q'
Maybe reverse the fields of each line in the file before sorting? Something like
perl -ne 'chomp; print(join(",",reverse(split(","))),"\n")' |
sort -t, -n -k1 |
perl -ne 'chomp; print(join(",",reverse(split(","))),"\n")'
should do it, as long as commas are never quoted in any way. If this is a full-fledged CSV file (in which commas can be quoted with backslash or space) then you need a real CSV parser.
Perl one-liner:
#lines=<STDIN>;foreach(sort{($a=~/.*,(\d+)/)[0]<=>($b=~/.*,(\d+)/)[0]}#lines){print;}
I'm going to throw mine in here as an alternative (and I couldn't get awk to work) :)
sample file:
Call of Doody 1322
Seam the Ripper 1329
Mafia Bots 1 1109
Chicken Fingers 1243
Batup Light 1221
Hunter F Tomcat 1140
Tober 0833
code:
for i in `sed -e 's/.* \(\d\)*/\1/' file.txt | sort`; do grep $i file.txt; done > file_sort.txt
Python one-liner:
python -c "print ''.join(sorted(open('filename'), key=lambda l: int(l.split(',')[-1])))"