subset ls based on position

I have a list of files in a folder that need to be piped through to more commands. If I know the position of the files in the output of ls -v file_*.nc, is it possible to remove/ignore files based on their position? For example, if ls -v file_*.nc returns 300 files and I want files 8, 73, and 151 removed from the pipe, could I do something like ls -v file_*.nc | {remove 8,73,151} | do other stuff?
I don't want to delete/move the files, I just don't want them piped through to the next command.

If you want to filter the input, as you asked ("is it possible to remove/ignore files"), you can use grep -v <PATTERN>, where -v inverts the match so that only non-matching lines are passed through.
Input files:
ls -v1 txt*
txt
txt-1
txt-2
txt-3
txt-4
txt-5
txt-6
txt-7
txt-8
txt-9
txt-10
Then ignore any file whose name contains 7, 8, or 9:
ls -v txt* | grep -v '[789]'
txt
txt-1
txt-2
txt-3
txt-4
txt-5
txt-6
txt-10
Removed/ignored:
txt-7
txt-8
txt-9
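The grep approach filters by name, not by position. To drop entries by their position in the listing, as the question asks, one option is awk's built-in NR (the current line number). This is a minimal sketch, assuming bash and 1-based positions counted in the same ls -v output you pipe in. drop_positions is a hypothetical helper name, not a standard command:

```shell
#!/usr/bin/env bash
# drop_positions N...: copy stdin to stdout, skipping the 1-based line numbers given as arguments.
# Hypothetical helper name; it is not a standard command.
drop_positions() {
  awk -v skip="$*" '
    BEGIN { n = split(skip, a, " "); for (i = 1; i <= n; i++) drop[a[i]] = 1 }
    !(NR in drop)   # print only lines whose number was not listed
  '
}

# Example (assumes GNU ls for -v):
#   ls -v file_*.nc | drop_positions 8 73 151 | do-other-stuff
```

Positions are only meaningful if the listing order is stable, so count them from the exact same command you use in the pipe. As with any parsing of ls output, file names containing newlines would shift the count.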

Related

Wondering how to delete the files when their names increment?

I have these files in the directory:
file3.proto
file2.proto
file1.proto
I want to delete file1 and file2; the file with the highest number is the latest one, which I want to keep. How can I do this in a shell script?
The command below does the job, but I want it to be more dynamic: I don't want to edit the script every time the number increments (for example, when file4 appears, I have to change the range to 1..3).
ls | grep '.proto' | rm file{1..2}.proto

ls *.proto | head -n -1 | xargs rm
With these files:
file1.proto
file2.proto
file3.proto
it executes the command:
rm file1.proto file2.proto
UPDATE: Be warned that ls lists files in alphabetical order, which is not numerical order. For example, if you also have a file25.proto, ls outputs:
file1.proto
file25.proto
file2.proto
file3.proto
So it is better, if possible, to rename the files with zero-padded numbers such as file001.proto, using enough digits for the maximum number of files the folder may hold. This is a common issue with filename ordering.
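If renaming is not an option, the numbers can be compared numerically instead of relying on ls order. This is a minimal sketch, assuming bash and names of the exact form file<N>.proto from the question. prune_old_protos is a hypothetical helper name:

```shell
#!/usr/bin/env bash
# prune_old_protos: in the current directory, delete every file<N>.proto except the one
# with the highest N. Hypothetical helper; assumes names follow the file<N>.proto pattern.
prune_old_protos() {
  local f n max=-1
  # First pass: find the highest N, comparing numerically (so file25 beats file3).
  for f in file*.proto; do
    n=${f#file}; n=${n%.proto}
    case $n in ''|*[!0-9]*) continue ;; esac   # skip unmatched glob or non-numeric N
    if (( 10#$n > max )); then max=$((10#$n)); fi
  done
  (( max >= 0 )) || return 0                   # no numbered files: nothing to do
  # Second pass: remove every numbered file whose N is below the maximum.
  for f in file*.proto; do
    n=${f#file}; n=${n%.proto}
    case $n in ''|*[!0-9]*) continue ;; esac
    if (( 10#$n < max )); then rm -- "$f"; fi
  done
}
```

With GNU ls, ls -v also sorts embedded numbers naturally (file2 before file25), so ls -v *.proto | head -n -1 | xargs rm is a shorter alternative on GNU systems; the loop above avoids parsing ls altogether.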

Grep using shell script to get files with some name less files with specified name

In a shell script, I need to get all files whose names contain X, excluding files whose names contain Y.
I'm trying to use grep -l -v to do it, but I don't really know how to use it.

Assuming you want to get files from a directory, try this:
ls | grep "X" | grep -v "Y"
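The same filter can be done with shell globs instead of parsing ls. Note one difference: grep treats X and Y as regular expressions, while the sketch below matches them as literal substrings. This is a minimal sketch, assuming bash; list_x_not_y is a hypothetical helper name:

```shell
#!/usr/bin/env bash
# list_x_not_y X Y: print names in the current directory that contain X but not Y.
# Hypothetical helper; X and Y are matched as literal substrings, not regexes.
list_x_not_y() {
  local f
  for f in *"$1"*; do
    [ -e "$f" ] || continue               # glob matched nothing
    case $f in *"$2"*) continue ;; esac   # skip names that also contain Y
    printf '%s\n' "$f"
  done
}
```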

How to print all lines of a file that do not contain a partial pattern

We know grep -v pattern file prints lines that do not contain pattern.
My file to search is a table:
Sample File, Sample Name, Panel, Marker, Allele 1, Allele 2, GQ,
M090972.s-206_B01.fsa, M090972-206, Sample ID-1, SNPchr1, C, T,0.9933,
I want to weed out the lines that contain "M090972-206" and some more patterns like that.
My search patterns come from a directory of text files:
$ ls 20170227_snap_genotypes_1_VCF
M070370-208_S1.genome.vcf M170276-201_S20.genome.vcf
M170308-201_S5.genome.vcf
Only the part of each filename up to the first "_" appears in my table (or up to the first "." if I remove the ".s" in the example), and it is not a fixed number of characters. I wanted to remove everything after the first "." but could not find how to do it in the sed and awk documentation.
Alternatively, I tried agrep 3.441 with the -f option, reading the patterns from a temporary file made with:
$ ls "directory" > temp.txt
$ ./agrep -v -f temp.txt $infile >> $outfile
But agrep -f does not find any match (or everything with -v).
What am I missing? Is there a better way, perhaps with sed or awk?

If you are deriving your patterns from the names of the files in the 20170227_snap_genotypes_1_VCF directory (up to the first _), you could do this:
# run from the parent of 20170227_snap_genotypes_1_VCF directory
grep -vf <(cd 20170227_snap_genotypes_1_VCF; ls | cut -f1 -d_) file
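The pattern list can also be built without parsing ls. This is a minimal sketch, assuming bash: ${f%%_*} strips everything from the first "_" onward (${f%%.*} would cut at the first "." instead, which is what the question tried to do with sed/awk), and grep -F treats each pattern as a fixed string, so characters such as "." are not regex wildcards. exclude_by_dir_prefixes is a hypothetical helper name:

```shell
#!/usr/bin/env bash
# exclude_by_dir_prefixes DIR FILE: print the lines of FILE that contain none of the
# filename prefixes (text before the first "_") of the entries in DIR. Hypothetical helper.
exclude_by_dir_prefixes() {
  local dir=$1 file=$2 f
  local -a pats=()
  for f in "$dir"/*; do
    [ -e "$f" ] || continue   # empty directory: glob matched nothing
    f=${f##*/}                # basename
    pats+=("${f%%_*}")        # keep text up to the first "_"
  done
  # With no patterns, printf would emit one empty pattern that matches every line.
  (( ${#pats[@]} )) || { cat -- "$file"; return; }
  grep -vF -f <(printf '%s\n' "${pats[@]}") -- "$file"
}
```

For the question's layout, the call would be exclude_by_dir_prefixes 20170227_snap_genotypes_1_VCF "$infile" > "$outfile".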

How to quickly check a .gz file without unzip? [duplicate]

How do I get the first few lines of a gzipped file?
I tried zcat, but it's throwing an error:
zcat CONN.20111109.0057.gz|head
CONN.20111109.0057.gz.Z: A file or directory in the path name does not exist.

zcat(1) can be supplied by either compress(1) or gzip(1). On your system it appears to be compress(1): it is looking for a file with a .Z extension.
Switch to gzip -cd in place of zcat and your command should work fine:
gzip -cd CONN.20111109.0057.gz | head
Explanation
-c --stdout --to-stdout
Write output on standard output; keep original files unchanged. If there are several input files, the output consists of a sequence of independently compressed members. To obtain better compression, concatenate all input files before compressing them.
-d --decompress --uncompress
Decompress.

On some systems (e.g., Mac), you need to use gzcat.

On a Mac, you need to redirect the file into zcat with <:
zcat < CONN.20111109.0057.gz|head

If you need a contiguous range of lines, one option is:
gunzip -c file.gz | sed -n '5,10p;11q' > subFile
This extracts lines 5 through 10 (inclusive) of file.gz into a new file, subFile; the 11q makes sed quit at line 11 instead of reading the rest of the stream. For sed options, refer to the manual.
If you need every 5th line, say:
gunzip -c file.gz | sed -n '1~5p' > subFile
This prints the 1st line, skips the next 4, prints the 6th, and so on (lines 1, 6, 11, ...). Note that the first~step address is a GNU sed extension.

If you want to use zcat, this shows the first 10 lines:
zcat your_filename.gz | head
If you want the first 16 lines:
zcat your_filename.gz | head -n 16
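Since the original error comes from which program provides zcat, calling gzip directly sidesteps the problem on both kinds of systems. This is a minimal sketch, assuming bash and a gzip on PATH; gzhead is a hypothetical helper name, not an existing command:

```shell
#!/usr/bin/env bash
# gzhead FILE [N]: print the first N lines (default 10) of a gzip-compressed FILE.
# Hypothetical helper; it calls gzip -dc directly, so it does not depend on
# whether zcat is provided by compress or by gzip on this system.
gzhead() {
  gzip -dc -- "$1" | head -n "${2:-10}"
}
```

For example, gzhead CONN.20111109.0057.gz 16 prints the first 16 lines.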

This awk snippet lets you show not just the first few lines but any range you specify. It also prefixes each line with its line number, which I needed when debugging an error message that pointed to a line far down in a gzipped file.
gunzip -c file.gz | awk -v from=10 -v to=20 'NR>=from { print NR, $0; if (NR>=to) exit }'
Here is the awk program from the one-liner above, laid out over several lines. In awk, NR is a built-in variable (the number of records read so far), which is usually equivalent to the line number. The from and to variables are set from the command line with the -v options, and exit stops reading as soon as the last requested line has been printed.
NR>=from {
    print NR, $0
    if (NR>=to)
        exit
}

How can I select the filename with the highest version number?

I wrote a build script and would like it to select the latest version of the package when it installs; e.g. the package name is package_X.X.X.tar.gz and there are multiple versions.
Is there a way to point the build command to package_Y.tar.gz, where Y = max(X.X.X)?

If the file names are identical except for the version numbers, you could use something like:
ls -v | tail -n 1
From the man-page of ls:
...
-v natural sort of (version) numbers within text
...
Example usage:
$ ls
package_1.5.7.9.tar.gz package_2.5.3.9.tar.gz package_4.6.1.0.tar.gz
$ ls -v | tail -n 1
package_4.6.1.0.tar.gz
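A variant that avoids parsing ls uses sort -V, the version-sort option of GNU sort (GNU coreutils; other sort implementations may lack it). This is a minimal sketch, assuming bash; latest_package is a hypothetical helper name:

```shell
#!/usr/bin/env bash
# latest_package: print the package_*.tar.gz in the current directory with the highest
# version. Hypothetical helper; relies on sort -V (version sort, GNU coreutils).
latest_package() {
  local files=(package_*.tar.gz)
  [ -e "${files[0]}" ] || return 1        # no matching package
  printf '%s\n' "${files[@]}" | sort -V | tail -n 1
}
```

A build script could then use it as tar -xzf "$(latest_package)". Version sort compares digit runs numerically, so package_10.0.0.0 ranks above package_4.6.1.0, which a plain alphabetical sort would get wrong.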

Develop Reference

ruby bash windows laravel spring algorithm oracle macos go visual-studio

subset ls based on position - bash

Related

Wondering how to delete the files when their names increment?

Grep using shell script to get files with some name less files with specified name

How to print all lines of a file that do not contain a partial pattern

How to quickly check a .gz file without unzip? [duplicate]

How can I select the filename with the highest version number?

Categories

Resources
