subset ls based on position - bash

I have a list of files in a folder that need to be piped through to more commands. If I know the position of the files in the output of ls -v file_*.nc, is it possible to remove/ignore files based on their position? So if ls -v file_*.nc returns 300 files, and I want files 8, 73, and 151 removed from the pipe, I could do something like ls -v file_*.nc | {remove 8,73,151} | do other stuff.
I don't want to delete/move the files, I just don't want them piped through to the next command.

If you want to filter names out of the input (as you said: "is it possible to remove/ignore files"), you can use grep -v <PATTERN>, where -v inverts the match so that only non-matching lines are printed.
Input files:
ls -v1 txt*
txt
txt-1
txt-2
txt-3
txt-4
txt-5
txt-6
txt-7
txt-8
txt-9
txt-10
Then ignore any file whose name contains 7, 8, or 9:
ls -v txt* | grep -v '[789]'
txt
txt-1
txt-2
txt-3
txt-4
txt-5
txt-6
txt-10
removed/ignored
txt-7
txt-8
txt-9
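The grep approach filters by what the name contains, not by where it appears in the listing. To drop entries by position, as the question asks, one option is to delete those line numbers with sed. This is a minimal sketch, assuming file names contain no newlines and at least one position is given; drop_positions is a hypothetical helper name, not a standard command.

```shell
# drop_positions: read lines on stdin and drop those whose 1-based
# position is listed in the arguments (hypothetical helper name).
drop_positions() {
    # Build a sed script with one delete command per line, e.g. "8d", "73d".
    script=$(printf '%sd\n' "$@")
    sed "$script"
}

# Demo on generated names standing in for the output of `ls -v file_*.nc`.
seq 1 10 | sed 's/^/file_/; s/$/.nc/' | drop_positions 2 5 9
```

With the question's files this would read ls -v file_*.nc | drop_positions 8 73 151 | do other stuff, which is equivalent to piping through sed '8d;73d;151d'.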

Related

Wondering how to delete files when their names increment?

I have these files in a directory:
file3.proto
file2.proto
file1.proto
I want to delete file1 and file2; the file with the highest number is the latest one, and I don't want to delete it. How can I achieve this in a shell script?
The command below does the job, but I want it to be more dynamic. I don't want to change the shell script every time the number increments. For example, if the latest file is 4, I need to change the range to 1..3.
ls | grep '.proto' | rm file{1..2}.proto
One option is to list the files, drop the last entry with head, and pass the rest to rm:
ls *.proto | head -n -1 | xargs rm
which with these files
file1.proto
file2.proto
file3.proto
executes the command
rm file1.proto file2.proto
UPDATE: Be warned that ls sorts names alphabetically, not numerically. If you also have a file25.proto, you may get output like this from ls (the exact order depends on the locale):
file1.proto
file25.proto
file2.proto
file3.proto
So it would be better (if possible) to rename the files with zero-padded numbers, like file001.proto, with the padding width chosen for the maximum possible number of files in the folder. This is a common issue with file name ordering.
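If renaming is not an option, a version-aware sort avoids the ordering problem. The following is a sketch, assuming GNU tools (sort -V, head -n -1, and xargs -r are GNU extensions) and file names without whitespace or newlines. It runs in a throwaway directory so that nothing real is deleted.

```shell
# Work in a scratch directory so the demo deletes nothing real.
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1
touch file1.proto file2.proto file3.proto file25.proto

# Sort names in version order (1, 2, 3, 25), drop the last line
# (the highest number), and remove everything that is left.
# -r keeps xargs from running rm when there is nothing to delete.
printf '%s\n' file*.proto | sort -V | head -n -1 | xargs -r rm --

ls   # only file25.proto should remain
```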

Grep using shell script to get files with some name less files with specified name

I need to use a shell script to get all files whose names contain X, excluding the files whose names contain Y.
I'm trying to use grep -l -v to do it, but I don't really know how to use it.
Considering that you want to get files from a dir:
Try this:
ls | grep "X" | grep -v "Y"
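Note that grep -l works on file contents rather than file names, which is why the pipeline above filters the output of ls instead. An alternative that avoids parsing ls is to let find match the names directly. This is a sketch, assuming X and Y are the literal placeholders from the question and that your find supports -maxdepth (GNU and BSD find do). The demo file names are made up for illustration.

```shell
# Scratch directory with three made-up file names.
demo_dir=$(mktemp -d)
touch "$demo_dir/aXb" "$demo_dir/aXYb" "$demo_dir/other"

# Regular files directly in demo_dir whose names contain X but not Y.
find "$demo_dir" -maxdepth 1 -type f -name '*X*' ! -name '*Y*'
```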

How to print all lines of a file that do not contain a *partial* pattern

We know grep -v pattern file prints lines that do not contain pattern.
My file to search is a table:
Sample File, Sample Name, Panel, Marker, Allele 1, Allele 2, GQ,
M090972.s-206_B01.fsa, M090972-206, Sample ID-1, SNPchr1, C, T,0.9933,
I want to weed out the lines that contain "M090972-206" and some more patterns like that.
My search patterns come from a directory of text files:
$ ls 20170227_snap_genotypes_1_VCF
M070370-208_S1.genome.vcf M170276-201_S20.genome.vcf
M170308-201_S5.genome.vcf
Only the part of these filenames up to the first "_" is in my table (or the first "." if I remove the ".s" in the example). It is not a constant number of characters. I could remove the characters after the first "." but could not find a way in the sed and awk documentation.
Alternatively I tried using agrep 3.441 with the "-f" option for reading the patterns from a temporary file made with
$ ls "directory" > temp.txt
$ ./agrep -v -f temp.txt $infile >> $outfile
But agrep -f does not find any match (or everything with -v).
What am I missing? Is there a better way, perhaps with sed or awk?
If you are deriving your patterns from the name of files (up to the first _) that exist in 20170227_snap_genotypes_1_VCF directory, then you could do this:
# run from the parent of 20170227_snap_genotypes_1_VCF directory
grep -vf <(cd 20170227_snap_genotypes_1_VCF; ls | cut -f1 -d_) file
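The question also asks how to cut each file name at the first "_" with sed or awk. Both can do it, as the sketch below shows using the file names from the question's listing. The variable names are only for illustration.

```shell
# Sample names taken from the question's directory listing.
names='M070370-208_S1.genome.vcf
M170276-201_S20.genome.vcf
M170308-201_S5.genome.vcf'

# sed: delete everything from the first "_" to the end of the line.
ids_sed=$(printf '%s\n' "$names" | sed 's/_.*//')

# awk: use "_" as the field separator and print the first field.
ids_awk=$(printf '%s\n' "$names" | awk -F_ '{ print $1 }')

printf '%s\n' "$ids_sed"
```

Because the resulting IDs are literal strings rather than regular expressions, adding -F to the grep command (grep -vFf ...) makes grep treat them as fixed strings.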

How to quickly check a .gz file without unzip? [duplicate]

How do I get the first few lines from a gzipped file?
I tried zcat, but it's throwing an error:
zcat CONN.20111109.0057.gz|head
CONN.20111109.0057.gz.Z: A file or directory in the path name does not exist.
zcat(1) can be supplied by either compress(1) or by gzip(1). On your system, it appears to be compress(1) -- it is looking for a file with a .Z extension.
Switch to gzip -cd in place of zcat and your command should work fine:
gzip -cd CONN.20111109.0057.gz | head
Explanation
-c --stdout --to-stdout
Write output on standard output; keep original files unchanged. If there are several input files, the output consists of a sequence of independently compressed members. To obtain better compression, concatenate all input files before compressing them.
-d --decompress --uncompress
Decompress.
On some systems (e.g., Mac), you need to use gzcat.
On a Mac, you need to redirect the file into zcat with <:
zcat < CONN.20111109.0057.gz|head
If a contiguous range of lines is needed, one option might be:
gunzip -c file.gz | sed -n '5,10p;11q' > subFile
where lines 5 through 10 (both inclusive) of file.gz are extracted into a new subFile, and sed quits at line 11 instead of reading the rest of the input. For sed options, refer to the manual.
If every 5th line is required, starting from the first:
gunzip -c file.gz | sed -n '1~5p' > subFile
which extracts the 1st line, skips the next 4 lines, extracts the 6th line, and so on.
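The first~step address form is a GNU sed extension. Where it is unavailable, awk's modulo operator on the record number gives the same selection. This is a minimal sketch; every_fifth is a hypothetical helper name.

```shell
# every_fifth: print lines 1, 6, 11, ... of stdin (hypothetical helper name).
every_fifth() {
    # NR is the current line number; keep lines where NR mod 5 is 1.
    awk 'NR % 5 == 1'
}

# Demo on the numbers 1..20 instead of a real gzipped file.
seq 1 20 | every_fifth
```

With a gzipped file this would read gunzip -c file.gz | every_fifth > subFile.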
If you want to use zcat, this will show the first 10 lines:
zcat your_filename.gz | head
Let's say you want the first 16 lines:
zcat your_filename.gz | head -n 16
This awk snippet lets you show not only the first few lines but any range you specify. It also adds line numbers, which I needed for debugging an error message pointing to a certain line far down in a gzipped file.
gunzip -c file.gz | awk -v from=10 -v to=20 'NR>=from { print NR,$0; if (NR>=to) exit 1}'
Here is the awk program used in the one-liner above. In awk, NR is a built-in variable (the number of records read so far), which is usually equivalent to the line number. The from and to variables are set on the command line via the -v options. Note that exit 1 ends awk with a non-zero exit status once the last requested line has been printed.
NR>=from {
    print NR,$0;
    if (NR>=to)
        exit 1
}

How can I select the filename with the highest version number?

I wrote a build script and would like to be able to select the latest version of the script when it installs, e.g. the package name is package_X.X.X.tar.gz and there are multiple copies.
Is there a way to point the build command to package_Y.tar.gz, where Y = max(X.X.X)?
If the files are equal except for the version numbers, you could use something like
ls -v | tail -n 1
From the man-page of ls:
...
-v natural sort of (version) numbers within text
...
Example usage:
$ ls
package_1.5.7.9.tar.gz package_2.5.3.9.tar.gz package_4.6.1.0.tar.gz
$ ls -v | tail -n 1
package_4.6.1.0.tar.gz
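In a build script, it may be more robust to restrict the candidates with a glob and store the result in a variable. The following is a sketch, assuming GNU sort (for -V) and file names without newlines. The extra package_10.0.0.0.tar.gz file is made up to show that version order differs from plain alphabetical order.

```shell
# Scratch directory with sample package files.
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1
touch package_1.5.7.9.tar.gz package_2.5.3.9.tar.gz \
      package_4.6.1.0.tar.gz package_10.0.0.0.tar.gz

# Version-sort the matching names and keep the last (highest) one.
latest=$(printf '%s\n' package_*.tar.gz | sort -V | tail -n 1)
echo "$latest"
```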
