skip lines while reading text files from ksh - shell

I have a text file having some names line by line.
I am reading this file through KornShell (ksh) and getting those names and performing some operations in loop.
I want to put some comments in the text file for readability (i.e., lines starting with # are comments and do not need to be read).
So, what I want is to read the lines which are not starting with # symbol.
In ksh, I am reading like this:
while read base
do
---
---
done<file
I tried to use grep, but it is not working.
I want the correct syntax to achieve it in ksh.

You can do for example this (read.sh):
#!/bin/ksh
while read line
do
[[ $line = \#* ]] && continue
echo "$line"
done < read.sh

How about this (edited to include full code snippet):
while read base
do
# skip comments
[ -z "`echo $base | grep '^#'`" ] || continue
# handle remaining lines here
done<file
But the other answer contains a much more concise and ksh-ish solution.
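As a further sketch of the same idea, a plain case statement also skips comment lines and should behave the same in ksh, bash and other POSIX-style shells. The function name process_names and the inline sample input are made up for illustration; the printf line stands in for whatever the loop really does.

```shell
# process_names (hypothetical name): read names from stdin and skip
# every line that starts with '#'.
process_names() {
    while IFS= read -r base; do
        case $base in
            \#*) continue ;;    # comment line: skip it
        esac
        printf 'processing %s\n' "$base"    # placeholder for the real work
    done
}

# Sample input, inlined here instead of a real file:
printf '%s\n' '# team A' alice '# team B' bob | process_names
```

With a real file, the call would be process_names < file. The grep route mentioned in the question can also work if the filtering happens before the loop, for example grep -v '^#' file piped into the while loop.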

Related

How to pad a value with zeroes based on a match in a string and the length of the following string?

I have some problems adapting the answers from previous questions, so I hope it is ok to write for a specific solution.
I have a file with RNA-reads in the fasta format, however the end of the readname has been messed up, so I need to correct it.
It is a simple task of padding zeroes into the middle of a string, however I cannot get it to work as I also need to identify the length and the position of the problem.
My read file header looks like this:
#V350037327L1C001R0010000023/1_U1
and I need to search for the "/1_U" and then left pad zeroes to the rest of the line up to a total length of 6.
It will look like this:
#V350037327L1C001R0010000023/1_U000001
The final length should be six following "/1_U".
eg: input:
#V350037327L1C001R0010000055/1_U300 = /1_U000300
#V350037327L1C001R0010000122/1_U45000 = /1_U045000
I have tried with awk, however I cannot get it to check the initial length and hence not pad the correct number of zeroes.
Thank you in advance and thank you for your neverending support in this forum
Try this:
#! /bin/bash
files=('#V350037327L1C001R0010000023/1_U1'
'#V350037327L1C001R0010000055/1_U300'
'#V350037327L1C001R0010000122/1_U45000')
for file in "${files[@]}"; do
if [[ $file =~ ^(.*U)([0-9]+)$ ]]; then
printf '%s%06d\n' "${BASH_REMATCH[@]:1}"
fi
done
Update: This reads the files from stdin.
#! /bin/bash
while read -r file; do
if [[ $file =~ ^(.*U)([0-9]+)$ ]]; then
printf '%s%06d\n' "${BASH_REMATCH[@]:1}"
fi
done
Update 2: You should really learn the basics of shell programming before you start programming the shell. Typical basics are conditional constructs.
#! /bin/bash
while read -r file; do
if [[ $file =~ ^(.*U)([0-9]+)$ ]]; then
printf '%s%06d\n' "${BASH_REMATCH[@]:1}"
else
printf '%s\n' "$file"
fi
done
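One caveat with %06d: printf reads a number with a leading zero as octal, so a suffix such as 08 would be rejected. A minimal sketch that forces base 10 with the 10# prefix is shown below. The function name pad_read_name is made up for illustration, and the regex is narrowed to the literal /1_U marker from the question.

```shell
#!/bin/bash
# pad_read_name (hypothetical name): left-pad the digits after "/1_U"
# with zeroes to a width of six; other lines are printed unchanged.
pad_read_name() {
    local name="$1"
    if [[ $name =~ ^(.*/1_U)([0-9]+)$ ]]; then
        # 10# forces decimal, so "08" is not parsed as an octal number
        printf '%s%06d\n' "${BASH_REMATCH[1]}" "$((10#${BASH_REMATCH[2]}))"
    else
        printf '%s\n' "$name"
    fi
}

pad_read_name '#V350037327L1C001R0010000055/1_U300'
```

To convert a whole file, the function can be called once per line from a while IFS= read -r loop.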

Adding test_ in front of a file name with path

I have a list of files stored in a text file, and if a Python file is found in that list, I want to run the corresponding test file using Pytest.
My file looks like this:
/folder1/file1.txt
/folder1/file2.jpg
/folder1/file3.md
/folder1/file4.py
/folder1/folder2/file5.py
When 4th/5th files are found, I want to run the command pytest like:
pytest /folder1/test_file4.py
pytest /folder1/folder2/test_file5.py
Currently, I am using this command:
cat /workspace/filelist.txt | while read line; do if [[ $$line == *.py ]]; then exec "pytest test_$${line}"; fi; done;
which is not working correctly, as I have file path in the text as well. Any idea how to implement this?
Using Bash's variable substring removal to add the test_. One-liner:
$ while read line; do if [[ $line == *.py ]]; then echo "pytest ${line%/*}/test_${line##*/}"; fi; done < file
In more readable form:
while read line
do
if [[ $line == *.py ]]
then
echo "pytest ${line%/*}/test_${line##*/}"
fi
done < file
Output:
pytest /folder1/test_file4.py
pytest /folder1/folder2/test_file5.py
Don't know anything about the Google Cloudbuild so I'll let you experiment with the double dollar signs.
Update:
In case there are files already with test_ prefix, use this bash script that utilizes extglob in variable substring removal:
shopt -s extglob # notice
while read line
do
if [[ $line == *.py ]]
then
echo "pytest ${line%/*}/test_${line##*/?(test_)}" # notice
fi
done < file
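The same substring removal can be wrapped in a small function that also copes with names that contain no slash at all and does not need extglob. This is only a sketch; the function name to_test_path and the sample paths in the here-document are made up for illustration.

```shell
# to_test_path (hypothetical name): print the path with test_ inserted
# before the file name, unless the file name already starts with test_.
to_test_path() {
    dir=${1%/*}      # everything before the last slash
    base=${1##*/}    # file name after the last slash
    case $base in
        test_*) ;;                  # already a test file: keep as is
        *) base=test_$base ;;
    esac
    case $1 in
        */*) printf '%s/%s\n' "$dir" "$base" ;;
        *)   printf '%s\n' "$base" ;;    # no directory part at all
    esac
}

while IFS= read -r line; do
    case $line in
        *.py) to_test_path "$line" ;;
    esac
done <<'EOF'
/folder1/file3.md
/folder1/file4.py
/folder1/folder2/test_file5.py
EOF
```

Replacing the to_test_path call with pytest "$(to_test_path "$line")" would run the tests instead of printing the paths.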
You can easily refactor all your conditions into a simple sed script. This also gets rid of the useless cat and the similarly useless exec.
sed -n 's%[^/]*\.py$%test_&%p' /workspace/filelist.txt |
xargs -n 1 pytest
The regular expression matches anything after the last slash, which means the entire line if there is no slash; we include the .py suffix to make sure this only matches those files.
The pipe to xargs is a common way to convert standard input into command-line arguments. The -n 1 says to pass one argument at a time, rather than as many as possible. (Maybe pytest allows you to specify many tests; then, you can take out the -n 1 and let xargs pass in as many as it can fit.)
If you want to avoid adding the test_ prefix to files which already have it, one solution is to break up the sed script into two separate actions:
sed -n '/test_[^/]*\.py$/{p;d;};s%[^/]*\.py$%test_&%p' /workspace/filelist.txt |
xargs -n 1 pytest
The first p simply prints the matches verbatim; the d then ends processing of that input line, so the substitution only runs on lines that do not have the prefix yet.
(MacOS / BSD sed may want newlines instead of semicolons around the braces.)
sed is arguably a bit of a read-only language; this is already pressing towards the boundary where perhaps you would rewrite this in Awk instead.
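For comparison, here is a rough sketch of what the Awk version might look like. It splits each line on slashes and only touches the last field. The function name add_test_prefix and the sample paths are made up for illustration.

```shell
# add_test_prefix (hypothetical name): keep only the .py lines and put
# test_ in front of the file name unless it is already there.
add_test_prefix() {
    awk -F/ -v OFS=/ '
        /\.py$/ {
            # $NF is the part after the last slash
            if ($NF !~ /^test_/) $NF = "test_" $NF
            print
        }
    ' "$@"
}

printf '%s\n' /folder1/file3.md /folder1/file4.py /folder1/folder2/test_file5.py |
    add_test_prefix
```

Called as add_test_prefix /workspace/filelist.txt | xargs -n 1 pytest, it would slot into the same pipeline as the sed script.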
You may want to focus on lines that end with the ".py" string.
You can achieve that using grep combined with a regex so you can figure out if a line ends with .py - that eliminates the if statement.
IFS=$'\n'
for file in $(grep '\.py$' /workspace/filelist.txt); do pytest "$file"; done

Unexpected end of file in while loop in bash

I am trying to write a bash script that will do the following:
Take a directory or file as input (will always begin with /mnt/user/)
Search other mount points for same file or directory (will always begin with /mnt/diskx)
Return value
So, for example, the input will be "/mnt/user/my_files/file.txt". It will search if "/mnt/disk1/my_files/file.txt" exists and will incrementally look for each disk (disk2, disk3, etc) until it finds it or disk20.
This is what I have so far:
#/user/bin/bash
var=$1
i=0
while [ -e $check_var = echo $var | sed 's:/mnt/user:/mnt/disk$i+1:']
do
final=$check_var
done
It's incomplete yes, but I am not that proficient in bash so I'm doing a little at a time. I'm sure my command won't work properly yet either but right now I am getting an "unexpected end of file" and I can't figure out why.
There are many issues here:
If this is the actual code you're getting "unexpected end of file" on, you should save the file in Unix format, not DOS format.
The shebang should be #!/usr/bin/bash or #!/bin/bash depending on your system
You have to assign check_var before running [ .. ] on it.
You have to use $(..) to expand a command
Variables like $i are not expanded in single quotes
sed can't add numbers
i is never incremented
The loop logic is inverted: it should loop until it matches, not while it matches.
You'd want to assign final after -- not in -- the loop.
Consider doing it in even smaller pieces, it's easier to debug e.g. the single statement sed 's:/mnt/user:/mnt/disk$i+1:' than your entire while loop.
Here's a more canonical way of doing it:
#!/bin/bash
var="${1#/mnt/user/}"
for file in /mnt/disk{1..20}/"$var"
do
[[ -e "$file" ]] && final="$file" && break
done
if [[ $final ]]
then
echo "It exists at $final"
else
echo "It doesn't exist anywhere"
fi
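If you would rather keep an explicit counter, closer to the original attempt, a sketch along these lines should also work. The function name find_on_disks and its first parameter (the mount root, normally /mnt) are made up for illustration; the root is a parameter only so the function can be tried against a scratch directory.

```shell
#!/bin/bash
# find_on_disks ROOT PATH (hypothetical helper): print the first
# ROOT/diskN/... copy of PATH for N = 1..20 and return 0, or return 1
# if no disk has it. ROOT would normally be /mnt.
find_on_disks() {
    local root="$1" rel="${2#/mnt/user/}" i candidate
    for ((i = 1; i <= 20; i++)); do      # i is incremented by the loop itself
        candidate="$root/disk$i/$rel"
        if [[ -e $candidate ]]; then
            printf '%s\n' "$candidate"
            return 0                     # stop at the first match
        fi
    done
    return 1
}

if final=$(find_on_disks /mnt "${1:-/mnt/user/my_files/file.txt}"); then
    echo "It exists at $final"
else
    echo "It doesn't exist anywhere"
fi
```

The result is assigned after the search has finished, not inside the loop, which addresses the last two points in the list above.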

Bash/Shell | How to prioritize quote from IFS in read [duplicate]

This question already has answers here:
IFS separate a string like "Hello","World","this","is, a boring", "line"
(3 answers)
Closed 6 years ago.
I'm working with a hand-filled file and I am having trouble parsing it.
My file input file cannot be altered, and the language of my code can't change from bash script.
I made a simple example to make it easy for you ^^
var="hey","i'm","happy, like","you"
IFS="," read -r one two tree for five <<<"$var"
echo $one:$two:$tree:$for:$five
Now I think you already saw the problem here. I would like to get
hey:i'm:happy, like:you:
but I get
hey:i'm:happy: like:you
I need a way to tell the read that the " " are more important than the IFS. I have read about the eval command but I can't take that risk.
Finally, this is a directory file and the troublesome field is the description one, so it could have basically anything in it.
The original file looks like this:
"type","cn","uid","gid","gecos","description","timestamp","disabled"
"type","cn","uid","gid","gecos","description","timestamp","disabled"
"type","cn","uid","gid","gecos","description","timestamp","disabled"
Edit #1
I will give a better example; the one I used above is too simple, and @StefanHegny found that it causes another error.
while read -r ldapLine
do
IFS=',' read -r objectClass dumy1 uidNumber gidNumber username description modifyTimestamp nsAccountLock gecos homeDirectory loginShell createTimestamp dumy2 <<<"$ldapLine"
isANetuser=0
while IFS=":" read -r -a class
do
for i in "${class[@]}"
do
if [ "$i" == "account" ]
then
isANetuser=1
break
fi
done
done <<< $objectClass
if [ $isANetuser == 0 ]
then
continue
fi
#MORE STUFF APPEND#
done < file.csv
So this is a small part of the code but it should explain what I do. The file.csv is a lot of lines like this:
"top:shadowAccount:account:posixAccount","Jdupon","12345","6789","Jdupon","Jean Mark, Dupon","20140511083750Z","","Jean Mark, Dupon","/home/user/Jdupon","/bin/ksh","20120512083750Z","",""
If the various bash versions you will use are all more recent than v3.0, when regexes and BASH_REMATCH were introduced, you could use something like the following function (the negative length in ${BASH_REMATCH[1]:1:-1} additionally needs bash 4.2 or later): [Note 1]
each_field () {
local v=,$1;
while [[ $v =~ ^,(([^\",]*)|\"[^\"]*\") ]]; do
printf "%s\n" "${BASH_REMATCH[2]:-${BASH_REMATCH[1]:1:-1}}";
v=${v:${#BASH_REMATCH[0]}};
done
}
Its argument is a single line (remember to quote it!) and it prints each comma-separated field on a separate line. As written, it assumes that no field has an enclosed newline; that's legal in CSV, but it makes dividing the file into lines a lot more complicated. If you actually needed to deal with that scenario, you could change the \n in the printf statement to a \0 and then use something like xargs -0 to process the output. (Or you could insert whatever processing you need to do to the field in place of the printf statement.)
It goes to some trouble to dequote quoted fields without modifying unquoted fields. However, it will fail on fields with embedded double quotes. That's fixable, if necessary. [Note 2]
Here's a sample, in case that wasn't obvious:
while IFS= read -r line; do
each_field "$line"
printf "%s\n" "-----"
done <<EOF
type,cn,uid,gid,gecos,"description",timestamp,disabled
"top:shadowAccount:account:posixAccount","Jdupon","12345","6789","Jdupon","Jean Mark, Dupon","20140511083750Z","","Jean Mark, Dupon","/home/user/Jdupon","/bin/ksh","20120512083750Z","",""
EOF
Output:
type
cn
uid
gid
gecos
description
timestamp
disabled
-----
top:shadowAccount:account:posixAccount
Jdupon
12345
6789
Jdupon
Jean Mark, Dupon
20140511083750Z
Jean Mark, Dupon
/home/user/Jdupon
/bin/ksh
20120512083750Z
-----
Notes:
[Note 1] I'm not saying you should use this function. You should use a CSV parser, or a language which includes a good CSV parsing library, like Python. But I believe this bash function will work, albeit slowly, on correctly-formatted CSV files of a certain common CSV dialect.
[Note 2] Here's a version which handles doubled quotes inside quoted fields, which is the classic CSV syntax for interior quotes:
each_field () {
local v=,$1;
while [[ $v =~ ^,(([^\",]*)|\"(([^\"]|\"\")*)\") ]]; do
echo "${BASH_REMATCH[2]:-${BASH_REMATCH[3]//\"\"/\"}}";
v=${v:${#BASH_REMATCH[0]}};
done
}
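To work with the fields as variables rather than printed lines, the output can be collected into an array. The sketch below assumes bash 4 or later for mapfile. It repeats a simplified each_field (no doubled-quote handling) with the regex kept in a variable, which sidesteps the backslash quoting inside [[ ]]. The array name fields is made up for illustration, and the sample line is shortened from the question.

```shell
#!/bin/bash
# Simplified each_field: prints one field per line, with quotes removed.
each_field() {
    local v=",$1"
    local re='^,(([^",]*)|"([^"]*)")'    # unquoted field | quoted field
    while [[ $v =~ $re ]]; do
        # group 2 is an unquoted field, group 3 the inside of a quoted one
        printf '%s\n' "${BASH_REMATCH[2]:-${BASH_REMATCH[3]}}"
        v=${v:${#BASH_REMATCH[0]}}
    done
}

line='"top:shadowAccount:account:posixAccount","Jdupon","12345","6789","Jdupon","Jean Mark, Dupon","20140511083750Z",""'

# One array element per field; empty fields become empty elements.
mapfile -t fields < <(each_field "$line")
printf 'class=%s description=%s\n' "${fields[0]}" "${fields[5]}"
```

The array indexes then play the role of the positional variables (objectClass, description, and so on) in the original read statement.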
My suggestion, as in some previous answers (see below), is to switch the separator to | (and use IFS="|" instead):
sed -r 's/,([^,"]*|"[^"]*")/|\1/g'
This requires a sed that has extended regular expressions (-r) however.
Should I use AWK or SED to remove commas between quotation marks from a CSV file? (BASH)
Is it possible to write a regular expression that matches a particular pattern and then does a replace with a part of the pattern
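A small sketch of how the separator swap could be combined with read. It assumes the data never contains a literal | and that sed accepts -E as the portable spelling of -r. The function name to_pipes and the variable names are made up for illustration.

```shell
#!/bin/bash
# to_pipes (hypothetical name): turn the commas that separate fields
# into '|' while leaving commas inside double quotes alone.
to_pipes() {
    sed -E 's/,([^,"]*|"[^"]*")/|\1/g'
}

line='"hey","i'\''m","happy, like","you"'

IFS='|' read -r one two three four < <(printf '%s\n' "$line" | to_pipes)

three=${three//\"/}    # the surrounding quotes are still there; strip them
echo "$three"
```

The quotes survive the sed step, so each variable needs the same stripping as three before it is used.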

How can I grep contents of files with bash only without using find or grep -r?

I have an assignment to write a bash program which if I type in the following:
-bash-4.1$ ./sample.sh path regex keyword
it will produce something like this:
path/sample.txt:12
path/sample.txt:34
path/dir/sample1.txt:56
path/dir/sample2.txt:78
The numbers are the line number of the search results. I have absolutely no idea how can I achieve this in bash, without using find or grep -r. I am allowed to use grep, sed, awk, …
Break the problem into parts.
First, you need to obtain the file names to search in. How can you list the files in a directory and its subdirectories? (Hint: there's a glob pattern for that.)
You need to iterate over the files. What form of loop should this be?
For each file, you need to read each line from the file in turn. There's a builtin for that.
For each line, you need to test whether the line matches the specified regexp. There's a construct for that.
You need to maintain a counter of the number of lines read in a file to be able to print the line number.
Search for globstar in the bash manual.
See https://unix.stackexchange.com/questions/18886/why-is-while-ifs-read-used-so-often-instead-of-ifs-while-read/18936#18936 regarding while read loops.
shopt -s globstar # to enable **/
GLOBIGNORE=.:.. # to match dot files
dir=$1; regex=$2
for file in "$dir"/**/*; do
[[ -f $file ]] || continue
n=1
while IFS= read -r line; do
if [[ $line =~ $regex ]]; then
echo "$file:$n"
fi
((++n))
done <"$file"
done
It's possible that your teacher didn't intend you to use the globstar feature, which is a relatively recent addition to bash (appeared in version 4.0). If so, you'll need to write a recursive function to recurse into subdirectories.
traverse_directory () {
for x in "$1"/*; do
if [ -d "$x" ]; then
traverse_directory "$x"
elif [ -f "$x" ]; then
grep "$regexp" "$x"
fi
done
}
Putting this into practice:
#!/bin/sh
regexp="$2"
traverse_directory "$1"
Follow-up exercise: the glob pattern * omits files whose name begins with a . (dot files). You can easily match dot files as well by also looping over .*, i.e. for x in .* *; do …. However, this throws the function into an infinite loop as it recurses forever into . (and also ..). How can you change the function to work with dot files as well?
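One possible answer to the exercise, as a sketch only: loop over both patterns and skip the . and .. entries by name before recursing. Two things here go beyond the function above and are my own additions: symbolic links to directories are not followed, and grep -n is used so that the file:line format from the question is printed.

```shell
#!/bin/bash
# Recursive search that includes dot files. Uses the global $regexp,
# like the function above.
traverse_directory() {
    local x
    for x in "$1"/.* "$1"/*; do
        case ${x##*/} in
            .|..) continue ;;    # never recurse into the directory itself or its parent
        esac
        if [ -d "$x" ] && [ ! -L "$x" ]; then
            traverse_directory "$x"
        elif [ -f "$x" ]; then
            # grep -n prints "LINE:text"; keep only the line number
            grep -n -e "$regexp" "$x" | while IFS=: read -r n rest; do
                printf '%s:%s\n' "$x" "$n"
            done
        fi
    done
}

# Run only when a directory and a regex are given on the command line.
if [ "$#" -ge 2 ]; then
    regexp=$2
    traverse_directory "$1"
fi
```

A pattern that matches nothing is left as a literal string by the shell; it then fails both the -d and the -f test and is skipped.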
while read
do
[[ $REPLY =~ foo ]] && echo $REPLY
done < file.txt

Resources