Hi, I want to sort my file output using a custom sort order. Please help.
Input file:
USA|123|Pin
UK|1243|Pin
Australia|555|Pin
Germany|1|Pin
Singapore|65|Pin
Germany|10|Pin
Here I want the rows containing Germany to appear first, and the rest of the rows to stay in the same order as in the file.
Output >>
Germany|1|Pin
Germany|10|Pin
USA|123|Pin
UK|1243|Pin
Australia|555|Pin
Singapore|65|Pin
Easy enough with two passes through the file.
grep '^Germany|' originalfile >newfile
grep -v '^Germany|' originalfile >>newfile
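If you'd rather make only a single pass over the file, an awk command can do the same thing. This is just a sketch using the same file names as above; note that it buffers the non-Germany rows in memory until the end:
awk -F'|' '$1 == "Germany" { print; next } { rest = rest $0 ORS } END { printf "%s", rest }' originalfile > newfile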
I'm on the Mac terminal.
I have a txt file with one column of 9 IDs, allofthem.txt, where every ID starts with "rs":
rs382216
rs11168036
rs9296559
rs9349407
rs10948363
rs9271192
rs11771145
rs11767557
rs11
Also, I have another txt file, useful.txt, with the IDs that were useful in an analysis I did. It looks the same, one column with several rows of IDs, but with fewer IDs, only 5.
rs9349407
rs10948363
rs9271192
rs11
Problem: I want to generate a new txt file with the non-useful ones (the ones that appear in allofthem.txt but not in useful.txt).
I want to do the inverse of:
grep -f useful.txt allofthem.txt
I want some systematic way of deleting all the IDs in useful.txt and obtaining a file with the remaining ones. Maybe with awk or sed, but I can't see it. Can you help me, please? Thanks in advance!
Desired output:
rs382216
rs11168036
rs9296559
rs11771145
rs11767557
The -v option does the inverse for you:
grep -vxf useful.txt allofthem.txt > remaining.txt
The -x option matches whole lines in allofthem.txt, not parts of them.
As @hek2mgl rightly pointed out, you need -F if you want to treat the content of useful.txt as fixed strings and not patterns:
grep -vxFf useful.txt allofthem.txt > remaining.txt
Make sure your files have no leading or trailing white spaces - they could affect the results.
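If you suspect stray whitespace, you can strip it first; here's a minimal sketch with sed (useful.clean.txt is just an example name):
sed 's/^[[:space:]]*//; s/[[:space:]]*$//' useful.txt > useful.clean.txt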
I recommend using awk:
awk 'FNR==NR{patterns[$0];next} !($0 in patterns)' useful.txt allofthem.txt
Explanation:
FNR==NR is true as long as we are reading useful.txt. We create an entry in the patterns array for every line of useful.txt. next stops further processing of that line.
!($0 in patterns) runs, because of the previous next statement, only on the lines of allofthem.txt. It checks for every line of that file whether it is a key in patterns. If it is not, awk prints the line.
So, I have a file which contains the results of some calculations I've run in the past weeks. I've collected the results in a file which I intend to plot. It is basically a bunch of rows with the format "x" "y" "f(x,y)", like this:
1.7 4.7 -460.5338556921
1.7 4.9 -460.5368762353
1.7 5.5
However, some lines, exemplified by the last one, contain a blank in the 3rd column, resulting from failed calculations. I'd still like to plot the viable points, but, as there are thousands of points (and therefore rows), that task just can't be accomplished easily by hand. I'd like to know how to make a script or program (I'd prefer a shell script, but I'll gladly go along with whatever works) which identifies those lines and deletes them. Does anyone know a way to do it?
awk '$3' <filename>
or better
awk 'NF > 2' <filename> # use this in case an entry in column 3 happens to be zero
This will do the job!
The simplest form of grep command, which should be understood by just about any grep these days:
grep -v '^[^[:space:]]*[[:space:]]*[^[:space:]]*[[:space:]]*$' <filename>
With grep:
grep ' .* [^ ]' file
or using ERE:
grep -E '\s\S+\s\S' file
I would use:
perl -lanE 'print if @F==3 && /^[\d\s\.+-]+$/' file
This will print only lines that:
contain 3 fields
and contain only digits, whitespace, and the characters . + -
I do not know how you are going to plot. You could use a grep or awk solution and pipe all valid lines into your plotting application.
When you need to call a program for each set of values, you can skip the invalid lines when you are reading the values:
while read -r x y fxy; do
if [ -n "${fxy}" ]; then
myplotter "$x" "$y" "${fxy}"
fi
done < file
How would one remove the 3rd column, for example, from a csv file directly from the command line of the Mac terminal? I understand that
cut -d',' -f3 data.csv
extracts the column info out directly into the terminal, but I want the 3rd column to be entirely removed from the dataset. How can I do this via the terminal?
Try
cut -d',' -f1-2,4- data.csv
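cut only writes to standard output, so to actually remove the column from the dataset, redirect into a new file and, if you want to replace the original, move it back. A sketch (data_trimmed.csv is just a placeholder name):
cut -d',' -f1-2,4- data.csv > data_trimmed.csv && mv data_trimmed.csv data.csv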
All the examples seem a bit tricky when trying to delete multiple fields or when you only want to see a few columns. As such, I simply show only the columns I want. So if I only want columns 1, 2 and 5, I'd do this:
cut -d, -f 1,2,5 hugeData.csv
NB: -d sets the field separator used in the file; in the example above it is a comma (,).
My grep/regex is rusty, but if the number of columns is fixed, then you can simply write a grep statement for each quote pair (and its contents), and then replace with all but the third pair. That's a clunky way of doing it, but it should nonetheless get the job done.
man grep
and page down to the REGULAR EXPRESSIONS section for help with how to specify.
UPDATE 2:
OK, I can't get this to work properly; it was working on one file but not another:
C:\cygwin\bin\sort -t"," -k5,2 c:\tmp\CI-tosplit.csv > c:\tmp\CI-DEN-sorted.csv
This seems to sort the data, but it's ignoring the header. I thought the ,2 was saying to start at line 2, which it does on one file but not another.
All I am trying to do is sort a csv file by column 5 and keep the header.
Thanks again for all the input.
UPDATED:
OK I've now switched to cygwin for this and I'm using the following command:
C:\cygwin\bin>sort -t"," -k8 c:\tmp\test.csv > c:\tmp\test-sorted.csv
-t to set the delimiter
-k for the column number
This works, but I cannot get the header to stay in place.
Any input would be great, thanks guys.
I am trying to sort a CSV by a specified column using awk, but I cannot find anything that works.
sort -t, -k2 - u test.csv
Input file specified two times.
Please help, I am using Windows BTW.
It looks like you are using sort rather than awk. sort has a -u option (no space) for unique, and the -t option should have a value (the separator). Try:
sort -t, -k2 -u test.csv
If anybody else finds this useful: the easiest way I found to sort csv files by column was using cygwin; the following works for me:
sort -t"," -k5,2 c:\tmp\test.csv > c:\tmp\test-sorted.csv
-t sets the delimiter
-k sets the column; the ,2 is meant to start at line 2 if you have a header
(screenshots of the input and output files)
Thanks for all the input guys.
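For anyone else who needs the header row to stay in place, one common approach is to write the header first and then sort only the remaining lines. A sketch, assuming a comma-separated file sorted on column 5:
head -n 1 test.csv > test-sorted.csv
tail -n +2 test.csv | sort -t"," -k5,5 >> test-sorted.csv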
I am currently working on a script which processes csv files, and one of the things it does is remove and keep note of duplicate lines in the files. My current method is to run uniq -d once to display all the duplicates, then run uniq again without any options to actually remove them.
Having said that, I was wondering if it would be possible to perform this same function in one action instead of having to run uniq twice. I've found a bunch of different examples of using awk to remove duplicates out there, but I have not been able to find any that both display the duplicates and remove them at the same time.
If anyone could offer advice or help for this I would really appreciate it though, thanks!
Here's something to get you started:
awk 'seen[$0]++{print|"cat>&2";next}1' file > tmp && mv tmp file
The above will print any duplicated lines to stderr at the same time as removing them from your input file. If you need more, tell us more....
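For readability, here is the same one-liner spelled out across several lines with comments; the behavior is identical:
awk '
    seen[$0]++ {            # this line has been seen before
        print | "cat>&2"    # send the duplicate copy to stderr
        next                # and skip printing it to stdout again
    }
    1                       # first occurrence: default action prints to stdout
' file > tmp && mv tmp file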
In general, the size of your input should be your guide. If you're processing GBs of data, you often have no choice other than relying on sort and uniq, because these tools support external operations.
That said, here's the AWK way:
If your input is sorted, you can keep track of duplicate items in AWK easily by comparing line i to line i-1, using O(1) state: if line i equals line i-1, you have a duplicate (a sketch follows after the next paragraph).
If your input is not sorted, you have to keep track of all lines, requiring O(c) state, where c is the number of unique lines. You can use a hash table in AWK for this purpose.
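As an illustration of the sorted case mentioned above, here's a minimal sketch; the file names and the use of stderr for reporting duplicates are just example choices:
awk '
    NR > 1 && $0 == prev { print > "/dev/stderr"; next }  # same as the previous line: report it, skip it
    { print; prev = $0 }                                  # new line: keep it and remember it
' sortedfile.txt > unique.txt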
This solution does not use awk but it does produce the result you need. In the command below replace sortedfile.txt with your csv file.
cat sortedfile.txt | tee >(uniq -d > duplicates_only.txt) | uniq > unique.txt
tee sends one copy of the output of the cat command to the process substitution running uniq -d (which writes the duplicated lines to duplicates_only.txt), while the other copy continues down the pipe to uniq, which writes the de-duplicated lines to unique.txt.