Keep text file rows by line number in bash [duplicate]

This question already has answers here:
Print lines indexed by a second file
(4 answers)
Closed 8 years ago.
I have two files. The first file (called k.txt) looks like this:
lineTTY
lineRTU
lineERT
...more lines like this...
The other file (called w.txt) contains indices of rows which shall be kept. It looks like:
2
9
12
The indices in the latter file are sorted. Is there a way to do this quickly in bash, as my file is large (over 1 million rows)?
Each line is a row of a matrix stored in a text file, and only the rows listed in the second file should remain in the matrix.

I think what you need here is:
cat w.txt | xargs -i{} sed -n '{}p' k.txt
If you also need to sort the index file first, then:
sort -g w.txt | xargs -i{} sed -n '{}p' k.txt
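Note that the xargs/sed approach re-scans k.txt once per index, which can be slow on a million-row file. A single awk pass reads each file only once; here is a minimal sketch with small hypothetical sample files:

```shell
# Hypothetical sample data matching the question's layout
printf 'lineTTY\nlineRTU\nlineERT\nlineQQQ\n' > k.txt
printf '2\n4\n' > w.txt
# First pass (NR==FNR) loads the wanted line numbers from w.txt into an
# array; the second pass prints each line of k.txt whose number is in it
awk 'NR==FNR {keep[$1]; next} FNR in keep' w.txt k.txt > kept.txt
```

This does not require w.txt to be sorted, since array membership is order-independent.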


bash: sort applied to a file returns right results as terminal output, but does not change the file itself [duplicate]

This question already has answers here:
How can I use a file in a command and redirect output to the same file without truncating it?
(14 answers)
Closed 8 months ago.
I am using this command
sort -k1 -n source-g5.txt
to sort the content of the file source-g5.txt (n rows, 2 columns) according to the numerical value of the first column.
When I run that line, the terminal prints out the desired result, but as I try to save the result into the same file,
sort -k1 -n source-g5.txt > source-g5.txt
the file shows no difference from before.
What am I doing wrong?
SOLVED
From this thread it turns out that redirecting the output of sort into the same file from which sort reads as source will not work since
the shell performs the redirections (not the sort(1) program), and the
input file (which is also the output) is truncated before the sort(1)
program gets the opportunity to read it.
So I have split my command into two:
sort -k1 -n source-g5.txt > tmp-source-g5.txt
mv tmp-source-g5.txt source-g5.txt
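As an aside, sort can also do this safely on its own: with -o it reads all input before opening the output file, so the same name may appear as both. A minimal sketch with hypothetical sample data:

```shell
# Hypothetical two-column data
printf '3 c\n1 a\n2 b\n' > source-g5.txt
# -o makes sort itself write the output, after all input has been read,
# so using the same file as input and output is safe here
sort -k1 -n -o source-g5.txt source-g5.txt
```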

Cut columns two and three using bash [duplicate]

This question already has answers here:
How to extract one column of a csv file
(18 answers)
Closed 1 year ago.
I have a .csv file with three columns. I want to keep the first column only. I have been trying to work with a command similar to the one below.
cut -f 1,4 output.csv > output.txt
No matter what I do, my output remains the same, giving me all three columns. Can anyone give me some insight?
Thanks!
Read the file one line at a time and trim everything to the right of the first comma:
while read -r line; do echo "${line%%,*}"; done < output.csv > output.txt
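The likely reason cut changed nothing is its default delimiter: cut splits on tabs unless told otherwise, so on a comma-separated file the whole line looks like one field. Passing the delimiter explicitly fixes that (the sample data here is hypothetical):

```shell
# Hypothetical three-column CSV
printf 'a,b,c\nd,e,f\n' > output.csv
# -d, tells cut to split on commas instead of the default tab
cut -d, -f1 output.csv > output.txt
```

For large files this is much faster than a shell read loop.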

How to shuffle multiple files and save different files? [duplicate]

This question already has answers here:
Shuffle multiple files in same order
(3 answers)
Closed 4 years ago.
I have three files as:
file1 file2 file3
A  B  C
D  E  F
G  H  I
The lines in each file relate to each other.
Thus, I want to generate shuffled files as:
file1.shuf file2.shuf file3.shuf
G     H    I
D     E    F
A     B    C
I often face this kind of problem and I always write a small script in Ruby or Python, but I thought it can be solved by some simple shell commands.
Could you suggest any simple ways to do this by shell commands or a script?
Here’s a simple script that does what you want. Specify all the input
files on the command line. It assumes all of the files have the same
number of lines.
First it creates a list of numbers and shuffles it. Then it combines
those numbers with each input file, sorts that, and removes the numbers.
Thus, each input file is shuffled in the same order.
#!/bin/bash
# Temp file to hold shuffled order
shuffile=$(mktemp)
# Create shuffled order
lines=$(wc -l < "$1")
digits=$(printf "%d" "$lines" | wc -c)
fmt=$(printf "%%0%d.0f" "$digits")
seq -f "$fmt" "$lines" | shuf > "$shuffile"
# Shuffle each file in same way
for fname in "$@"; do
    paste "$shuffile" "$fname" | sort | cut -f 2- > "$fname.shuf"
done
# Clean up
rm "$shuffile"
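For just two or three files, the same idea works as a short pipeline without a number column: paste the files side by side, shuffle the combined lines once, then split the columns back out. A sketch with hypothetical data:

```shell
# Hypothetical related files (one column each)
printf 'A\nD\nG\n' > file1
printf 'B\nE\nH\n' > file2
# Shuffle the paired lines once, then split the columns back apart
paste file1 file2 | shuf > combined
cut -f1 combined > file1.shuf
cut -f2 combined > file2.shuf
rm combined
```

This assumes the lines themselves contain no tab characters, since paste joins columns with tabs.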

How to merge two csv files not including duplicates [duplicate]

This question already has answers here:
Remove duplicate lines without sorting [duplicate]
(8 answers)
Closed 9 years ago.
If I have a csv files like this
lion#mammal#scary animal
human#mammal#human
hummingbird#bird#can fly
dog#mammal#man's best friend
cat#mammal#purrs a lot
shark#fish#very scary
fish#fish#blub blub
and I have another csv file like this
cat#mammal#purrs a lot
shark#fish#very scary
fish#fish#blub blub
rockets#pewpew#fire
banana#fruit#yellow
I want the output to be like this:
lion#mammal#scary animal
human#mammal#human
hummingbird#bird#can fly
dog#mammal#man's best friend
cat#mammal#purrs a lot
shark#fish#very scary
fish#fish#blub blub
rockets#pewpew#fire
banana#fruit#yellow
Some of the entries in the first csv file are also present in the second csv file; they overlap quite a bit. How can I combine these csv files in the correct order? It is guaranteed that the new entries always appear in the first few lines of the first csv file.
Solution 1:
awk '!a[$0]++' file1.csv file2.csv
Solution 2 (if you don't care about the original order):
sort -u file1 file2
Here's one way:
Use cat -n to concatenate the input files and prepend line numbers
Use sort -uk2 to remove duplicate lines, comparing only the original content (fields 2 onward)
Use sort -nk1 to restore the original order by the prepended line number
Use cut to remove the line numbers
$ cat -n file1 file2 | sort -uk2 | sort -nk1 | cut -f2-
lion#mammal#scary animal
human#mammal#human
hummingbird#bird#can fly
dog#mammal#man's best friend
cat#mammal#purrs a lot
shark#fish#very scary
fish#fish#blub blub
rockets#pewpew#fire
banana#fruit#yellow
$

Shell line command Sorting command [duplicate]

This question already has answers here:
find difference between two text files with one item per line [duplicate]
(11 answers)
Closed 9 years ago.
I have a Masters.txt (all records) and a New.txt file. I want to process New.txt against Masters.txt and output all the lines from New.txt that do not exist in Masters.txt.
I'm not sure if this is something the sort -u command can do.
Sort both files first using sort, then use the comm command to list the lines that exist only in new.txt and not in masters.txt. Something like:
sort masters.txt >masters_sorted.txt
sort new.txt >new_sorted.txt
comm -2 -3 new_sorted.txt masters_sorted.txt
comm produces three columns in its output by default: column 1 contains lines unique to the first file, column 2 contains lines unique to the second file, and column 3 contains lines common to both files. The -2 -3 switches suppress the second and third columns.
See this brief tutorial on the Linux comm command:
http://unstableme.blogspot.com/2009/08/linux-comm-command-brief-tutorial.html
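A self-contained run of the steps above, using small hypothetical sample files:

```shell
# Hypothetical sample files
printf 'apple\nbanana\ncherry\n' > masters.txt
printf 'banana\ndate\ncherry\nelderberry\n' > new.txt
# comm requires its inputs to be sorted
sort masters.txt > masters_sorted.txt
sort new.txt > new_sorted.txt
# -2 -3 suppress lines unique to masters and lines common to both,
# leaving only the lines that appear in new.txt alone
comm -23 new_sorted.txt masters_sorted.txt > only_new.txt
```

Here only_new.txt ends up holding the two lines found in new.txt but not in masters.txt.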
