I am processing a developer commit message in a git hook.
Let's say the file has the following content:
(a few leading blank lines here)
# this is a sample commit
# only for the developers
Ticket-ID: we fix old bugs and introduces new ones
we always do this stuff
so cool, not really :P
# company name
My intention is to get only this line: Ticket-ID: we fix old bugs and introduces new ones
User123's comment is nice and terse, grep -E "^[[:alnum:]]" file | head -n 1; however, it does not catch lines of text that start with a non-alphanumeric character other than #, such as commit messages that start with an emoji, dashes, parentheses, etc.:
🚀 yeah this line is an exception
--> This is also an edge case
(so is this)
To catch all edge cases, you can loop through the file and check each $line with the negated regex match operator (! ... =~) for:
Not starting with a pound sign: ! $line =~ ^#
Not being blank, i.e. empty or consisting only of whitespace: ! $line =~ ^[[:space:]]*$ (read strips the trailing newline, so an empty line arrives as an empty string, which this pattern also matches)
Then just print the $line and break once both conditions are met:
File parse.sh:
#!/bin/bash
if [[ -f $1 ]]; then
    while IFS= read -r line
    do
        # Print the first line that is neither a comment nor blank, then stop
        [[ ! $line =~ ^# && ! $line =~ ^[[:space:]]*$ ]] && printf '%s\n' "$line" && break
    done < "$1"
fi
File commit.txt:
# this is a sample commit
# only for the developers
Ticket-ID: we fix old bugs and introduces new ones
we always do this stuff
so cool, not really :P
# company name
Now you can invoke parse.sh like this:
bash parse.sh commit.txt
Or save the result to a variable using command substitution:
result=$(bash parse.sh commit.txt); echo "$result"
The following single-line grep should work for your requirement:
grep -E '^[[:alnum:]]' file | head -n 1
Explanation:
^[[:alnum:]] :: matches only lines starting with an alphanumeric character ([0-9A-Za-z] in the C locale)
head -n 1 :: keeps only the first match
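A minimal sketch that keeps the grep approach but inverts the test, assuming a grep that supports -E (GNU and BSD grep both do): instead of keeping lines that start with an alphanumeric character, it drops comment lines and blank lines, so a message starting with an emoji, dashes or a parenthesis is still found. The function name first_message_line is hypothetical.

```shell
#!/usr/bin/env bash
# Print the first line that is neither a comment (starting with #)
# nor blank (empty or whitespace-only). Reads the given file,
# or stdin when no argument is given.
first_message_line() {
    grep -v -E '^#|^[[:space:]]*$' "$@" | head -n 1
}
```

In a commit-msg hook, Git passes the path of the message file as $1, so the hook could call first_message_line "$1".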
Related
I'm trying to get the first character of each word in a string using a regex and BASH_REMATCH in a shell script.
My input text file contains:
config_text = STACK OVER FLOW
The words STACK OVER FLOW must be uppercase like that.
My output should be something like this:
SOF
My code for now is:
var = config_text
values=$(grep $var test_file.txt | tr -s ' ' '\n' | cut -c 1)
if [[ $values =~ [=(.*)]]; then
echo $values
fi
As you can see, I'm using tr and cut, but I'm looking to replace them with BASH_REMATCH only, because these two commands have been reported in many links as not working on macOS.
I tried something like this:
var = config_text
values=$(grep $var test_file.txt)
if [[ $values =~ [=(.*)(\b[a-zA-Z])]]; then
echo $values
fi
As I explained, the values should be:
S O F
But it seems \b does not work in shell scripts.
Does anyone have an idea how to get my desired output with BASH_REMATCH only?
Thanks in advance for any help.
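A generic BASH_REMATCH solution handling any number of words and any separator (it assumes the input starts with an uppercase word, because each iteration strips BASH_REMATCH[0] from the front of the input):
input="STACK OVER FLOW" pattern='([[:upper:]]+)([^[:upper:]]*)' result=""
while [[ $input =~ $pattern ]]; do
    # Keep the first letter of the uppercase run plus the separator after it
    result+="${BASH_REMATCH[1]::1}${BASH_REMATCH[2]}"
    # Drop the matched text from the front of the input
    input="${input:${#BASH_REMATCH[0]}}"
done
echo "$result"
# Output: "S O F"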
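The loop above slices BASH_REMATCH[0] off the front of the input, which only works if every match starts at offset 0. If you want to feed it the whole config_text = STACK OVER FLOW line, here is a sketch with an anchored per-word regex that avoids that assumption. The function name initials is hypothetical, and it assumes the key is literally config_text.

```shell
#!/usr/bin/env bash
# Print the first letter of every word after "config_text =" on a line,
# using only [[ =~ ]] and BASH_REMATCH (no tr, cut or sed).
initials() {
    local line=$1 rest out=""
    local key_re='^config_text[[:space:]]*=(.*)$'
    # Anchored: optional leading whitespace, then one whole word in group 1
    local word_re='^[[:space:]]*([^[:space:]]+)'

    [[ $line =~ $key_re ]] || return 1
    rest=${BASH_REMATCH[1]}

    while [[ $rest =~ $word_re ]]; do
        out+=${BASH_REMATCH[1]:0:1}        # first character of the word
        rest=${rest:${#BASH_REMATCH[0]}}   # drop the matched text
    done
    printf '%s\n' "$out"
}

initials 'config_text = STACK OVER FLOW'
```

Because word_re is anchored with ^, BASH_REMATCH[0] always starts at the beginning of rest, so the slicing is safe for any number of words.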
Bash's regexes are kind of cumbersome if you don't know how many words there are in the input string. How's this instead?
config_text="STACK OVER FLOW"
sed 's/\([^[:space:]]\)[^[:space:]]*/\1/g' <<<"$config_text"
First, put a valid shebang in your script and paste it at https://shellcheck.net for validation and recommendations.
Assuming the line starts with config and ends with FLOW, e.g.:
config_text = STACK OVER FLOW
Now the script:
#!/usr/bin/env bash
values="config_text = STACK OVER FLOW"
regexp="config_text = ([[:upper:]]{1})[^ ]+ ([[:upper:]]{1})[^ ]+ ([[:upper:]]{1}).+$"
while IFS= read -r line; do
[[ "$line" = "$values" && "$values" =~ $regexp ]] &&
printf '%s %s %s\n' "${BASH_REMATCH[1]}" "${BASH_REMATCH[2]}" "${BASH_REMATCH[3]}"
done < test_file.txt
If there is only one line, or the target string is on the first line of test_file.txt, the while loop is not needed:
#!/usr/bin/env bash
values="config_text = STACK OVER FLOW"
regexp="config_text = ([[:upper:]]{1})[^ ]+ ([[:upper:]]{1})[^ ]+ ([[:upper:]]{1}).+$"
IFS= read -r line < test_file.txt
[[ "$line" = "$values" && "$values" =~ $regexp ]] &&
printf '%s %s %s\n' "${BASH_REMATCH[1]}" "${BASH_REMATCH[2]}" "${BASH_REMATCH[3]}"
Make sure you have Bash v4+ installed and that your script actually runs with it, since macOS defaults to Bash v3.
See How can I read a file (data stream, variable) line-by-line (and/or field-by-field)?
Rather than a Bash regex, another option is to use Bash's substring parameter expansion ${parameter:offset:length} to extract the desired characters:
$ read -ra arr <text.file ; printf "%s%s%s\n" "${arr[2]:0:1}" "${arr[3]:0:1}" "${arr[4]:0:1}"
SOF
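The same parameter-expansion idea generalizes to any number of words with an array slice. This is a sketch assuming the line always has the form config_text = WORD WORD ... with whitespace-separated fields; initials_from_line is a hypothetical name, and it reads one line from stdin.

```shell
#!/usr/bin/env bash
# Read one "config_text = WORD WORD ..." line from stdin and print
# the first character of every word after the "=".
initials_from_line() {
    local -a words
    local word out=""
    read -ra words
    # "${words[@]:2}" skips the first two fields: "config_text" and "="
    for word in "${words[@]:2}"; do
        out+=${word:0:1}
    done
    printf '%s\n' "$out"
}

printf 'config_text = STACK OVER FLOW\n' | initials_from_line
```

With a file, initials_from_line < test_file.txt reads its first line.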
I have an exercise where I have a file, and at the beginning of it I have something like:
#!usr/bin/bash
# tototata
#tititutu
#ttta
Hello world
Hi
Test test
#zabdazj
#this is it
I have to take each leading line starting with a #, up to the first line that doesn't start with one, and store them in a variable. If there is a shebang, it has to be skipped, and if there are blank lines in between, they have to be skipped too. We just want the comments between the shebang and the first non-comment line.
I'm new to Bash and would like to know if there's a way to do it, please.
Expected output:
# tototata
#tititutu
#ttta
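Try this simple approach to better understand it:
#!/bin/bash
# Drop the first line (the shebang), then print comment lines (starting
# with # or ;) until the first line that is neither a comment nor blank.
sed 1d your_input_file | while IFS= read -r line
do
    if [[ -z $line ]]
    then
        continue    # skip blank lines
    elif [[ $line == [#\;]* ]]
    then
        printf '%s\n' "$line"
    else
        break
    fi
done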
This may be more correct, although your question was unclear about whether the input file had a script shebang, whether the shebang had to be skipped to match your sample output, or whether the input file's shebang was just bogus.
It is also unclear what to do if the first lines of the input file do not start with #.
You should really post your assignment's text as a reference.
Anyway, here is a script that collects the first set of consecutive lines starting with a hash (#) into the arr array variable.
It may not be an exact solution to your assignment (which you should be able to solve with what your previous lessons taught you), but it should give you some clues on how to read lines from a file in a loop and test whether each line starts with a #.
#!/usr/bin/env bash
# Our variable to store parsed lines
# Is an array of strings with an entry per line
declare -a arr=()
# Iterate reading lines from the file
# while it matches Regex: ^[#]
# mean while lines starts with a sharp #
while IFS=$'\n' read -r line && [[ "$line" =~ ^[#] ]]; do
# Add line to the arr array variable
arr+=("$line")
done <a.txt
# Print each array entries with a newline
printf '%s\n' "${arr[@]}"
How about this (not tested, so you may have to debug it a bit, but my comments in the code should explain what is going on):
while IFS= read -r line
do
    # initial is 1 on the first line, and 0 after this. When the script starts,
    # the variable is undefined.
    : "${initial:=1}"
    # Test for lines starting with #. Need to quote the hash
    # so that it is not taken as a comment.
    if [[ $line == '#'* ]]
    then
        # Test for initial #!
        if (( initial == 1 )) && [[ $line == '#!'* ]]
        then
            : # ignore it
        else
            printf '%s\n' "$line" # or do whatever you want to do with it
        fi
    # stop on the first non-blank, non-comment line
    elif [[ $line == *[^[:space:]]* ]]
    then
        break
    fi
    initial=0 # Next line won't be an initial line
done < your_file
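Putting the requirements from the question together (skip a shebang on the first line, skip blank lines, stop at the first line that does not start with #, keep the result in a variable), a sketch could look like this. leading_comments is a hypothetical name, and it treats any first line starting with #! as a shebang, including the malformed #!usr/bin/bash from the sample.

```shell
#!/usr/bin/env bash
# Print the leading comment block of the text read from stdin (a sketch):
#  - a first line starting with #! is treated as a shebang and skipped
#  - blank (empty or whitespace-only) lines are skipped
#  - reading stops at the first non-blank line that does not start with #
leading_comments() {
    local line comments="" first=1
    while IFS= read -r line || [[ -n $line ]]; do
        if (( first )); then
            first=0
            [[ $line == '#!'* ]] && continue
        fi
        [[ $line =~ ^[[:space:]]*$ ]] && continue
        [[ $line == '#'* ]] || break
        comments+=$line$'\n'
    done
    printf '%s' "$comments"
}
```

comments=$(leading_comments < your_file) then stores the block in a variable; command substitution strips the final newline.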
I'm using bash on Cygwin.
I have to take a .csv file that is a subset of a much larger set of settings and shuffle the new csv settings (same keys, different values) into the 1000-plus-line original, making a new .json file.
I have put together a script to automate this. The first step in the process is to "clean up" the csv file by extracting lines that start with "mme " and "sms ". Everything else is to pass through cleanly to the "clean" .csv file.
This routine is as follows:
# clean up the settings, throwing out mme and sms entries
cat extract.csv | while read -r LINE; do
if [[ $LINE == "mme "* ]]
then
printf "$LINE\n" >> mme_settings.csv
elif [[ $LINE == "sms "* ]]
then
printf "$LINE\n" >> sms_settings.csv
else
printf "$LINE\n" >> extract_clean.csv
fi
done
My problem is that this thing stubs its toe on the following string at the end of one entry: 100%." When it's done with the line, it simply elides the %." and the new-line marker following it, and smears the two lines together:
... 100next.entry.keyname...
I would love to reach in and simply manually delimit the % sign, but it's not a realistic option for my use case. Clearly I'm missing something. My suspicion is that I am in some wise abusing cat or read in the first line.
If there is some place I should have looked to find the answer before bugging you all, by all means point me in that direction and I'll sod off.
The syntax for printf is:
printf format [argument]...
In the printf format string, a % introduces a format specifier, so the %." in your data is treated as the start of one. What you want is:
while IFS= read -r line; do # Replaced LINE with line: all-uppercase names are conventionally reserved for environment and shell variables
    if [[ $line = "mme "* ]] # Here * matches anything that comes next
    then
        printf '%s\n' "$line" >> mme_settings.csv
    elif [[ $line = "sms "* ]]
    then
        printf '%s\n' "$line" >> sms_settings.csv
    else
        printf '%s\n' "$line" >> extract_clean.csv
    fi
done < extract.csv # Avoided the useless use of cat
As pointed out, your problem is expanding a parameter containing a formatting instruction in the formatting argument of printf, which can be solved by using echo instead or moving the parameter to be expanded out of the formatting string, as demonstrated in other answers.
I recommend not looping over your whole file with Bash in the first place, as it's notoriously slow; you're extracting lines starting with certain patterns, which is a job at which grep excels:
grep '^mme ' extract.csv > mme_settings.csv
grep '^sms ' extract.csv > sms_settings.csv
grep -v '^mme \|^sms ' extract.csv > extract_clean.csv
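The third command uses the -v option (select lines that don't match) and alternation to exclude lines starting with either mme or sms. Note that \| in a basic regular expression is a GNU grep extension (Cygwin ships GNU grep); grep -Ev '^(mme|sms) ' is the portable equivalent.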
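If you prefer a single pass over the file instead of three greps, awk can route each line to its file. awk's print writes the record as-is, so the % in 100%." is never treated as a format. This is a sketch with one behavioral difference from the >> in the question: print > "file" truncates each output file the first time it is written in a run, and a file is only created if at least one line goes to it. split_settings is a hypothetical name.

```shell
#!/usr/bin/env bash
# Split a CSV file in one pass:
#   lines starting with "mme " -> mme_settings.csv
#   lines starting with "sms " -> sms_settings.csv
#   everything else            -> extract_clean.csv
# print > "file" truncates each output file on its first write in this run.
split_settings() {
    awk '
        /^mme / { print > "mme_settings.csv"; next }
        /^sms / { print > "sms_settings.csv"; next }
                { print > "extract_clean.csv" }
    ' "$1"
}
```

Usage: split_settings extract.csv, run from the directory where the output files should be written.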
I am trying to read lines from a file containing multiple lines. I want to identify lines that contain only spaces.
By definition, an empty line is empty and does not contain anything (including spaces).
I want to detect lines that seem to be empty but are not (lines that contain only spaces).
while read line; do
if [[ `echo "$line" | wc -w` == 0 && `echo "$line" | wc -c` > 1 ]];
then
echo "Fake empty line detected"
fi
done < "$1"
But because read strips leading and trailing spaces from each line, my code isn't working.
An example file:
hi
hi
(empty line, no spaces or any other char)
hi
(two spaces)
hey
Please help me fix the code.
Prevent read from stripping leading and trailing whitespace by clearing the value of IFS (the internal field separator):
while IFS= read -r line; do
....
done < "$1"
The -r isn't strictly necessary, but it is good practice.
Also, a simpler way to check the value of line (I assume you're looking for a non-empty line with nothing but whitespace):
if [[ $line =~ ^[[:space:]]+$ ]]; then
    echo "Fake empty line detected"
fi
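Putting both fixes into the loop, here is a sketch that reads lines without trimming (IFS= read -r), also processes a last line that has no trailing newline, and reports the line number. report_fake_empty_lines and the message format are hypothetical.

```shell
#!/usr/bin/env bash
# Report lines that look empty but contain only whitespace.
# IFS= keeps leading/trailing whitespace, -r keeps backslashes, and
# "|| [[ -n $line ]]" also handles a final line without a newline.
report_fake_empty_lines() {
    local line n=0
    while IFS= read -r line || [[ -n $line ]]; do
        n=$((n + 1))
        # One or more whitespace characters and nothing else
        if [[ $line =~ ^[[:space:]]+$ ]]; then
            printf 'Fake empty line detected at line %d\n' "$n"
        fi
    done
}
```

Usage: report_fake_empty_lines < "$1". A truly empty line does not match ^[[:space:]]+$, so only the whitespace-only lines are reported.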
Building on your code, it can be improved:
while read line; do
if [ -z "$line" ]
then
echo "Fake empty line detected"
fi
done < "$1"
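The test -z checks whether $line is empty. Because read (without IFS=) strips leading and trailing whitespace, both truly empty lines and whitespace-only lines end up empty here, so this reports both kinds.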
Output:
Fake empty line detected
Fake empty line detected
I'm trying to get the first character of a variable, but I'm getting a Bad substitution error. Can anyone help me fix it?
The code is:
while IFS=$'\n' read line
do
if [ ! ${line:0:1} == "#"] # Error on this line
then
eval echo "$line"
eval createSymlink $line
fi
done < /some/file.txt
Am I doing something wrong or is there a better way of doing this?
-- EDIT --
As requested, here's some sample input, stored in /some/file.txt:
$MOZ_HOME/mobile/android/chrome/content/browser.js
$MOZ_HOME/mobile/android/locales/en-US/chrome/browser.properties
$MOZ_HOME/mobile/android/components/ContentPermissionPrompt.js
To get the first character of a variable you need to say:
$ v="hello"
$ echo "${v:0:1}"
h
However, your code has a syntax error:
[ ! ${line:0:1} == "#"]
# ^-- missing space
So this can do the trick:
$ a="123456"
$ [ ! "${a:0:1}" == "#" ] && echo "doesnt start with #"
doesnt start with #
$ a="#123456"
$ [ ! "${a:0:1}" == "#" ] && echo "doesnt start with #"
$
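It can also be done like this (note that expr substr is a GNU extension and is not available in the BSD expr shipped with macOS):
$ a="#123456"
$ [ "$(expr substr "$a" 1 1)" != "#" ] && echo "does not start with #"
$
$ a="123456"
$ [ "$(expr substr "$a" 1 1)" != "#" ] && echo "does not start with #"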
does not start with #
Update
Based on your update, this works for me:
while IFS=$'\n' read line
do
echo "$line"
if [ ! "${line:0:1}" == "#" ]
then
eval echo "$line"
eval createSymlink $line
fi
done < file
Adding the missing space (as suggested in fedorqui's answer ;) ) works for me.
An alternative method/syntax
Here's what I would do in Bash if I wanted to check the first character of a string:
if [[ $line != "#"* ]]
On the right-hand side of == (or !=), the quoted part is treated literally, whereas * is a wildcard for any sequence of characters.
For more information, see the last part of the Conditional Constructs section of the Bash Reference Manual:
When the ‘==’ and ‘!=’ operators are used, the string to the right of the operator is considered a pattern and matched according to the rules described below in Pattern Matching
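A small sketch showing the difference between quoting only part of the pattern and quoting all of it; the function names are hypothetical.

```shell
#!/usr/bin/env bash
# Quoting only "#" keeps * as a wildcard: matches any string starting with #
starts_with_hash() {
    [[ $1 == "#"* ]]
}
# Quoting the whole pattern makes * literal: matches only the string #*
matches_literal_hash_star() {
    [[ $1 == "#*" ]]
}

starts_with_hash '# a comment' && echo "starts with #"
```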
Checking that you're using the right shell
If you are getting errors such as "Bad substitution" and "[[: not found" even though your syntax is fine (and works for others), it might indicate that you are using the wrong shell (i.e. not Bash).
So to make sure you are using Bash to run the script, either
make the script executable and use an appropriate shebang e.g. #!/bin/bash
or execute it via bash my_script
Also note that sh is not necessarily bash; sometimes it can be dash (e.g. on Ubuntu) or just plain ol' Bourne shell.
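As an extra safeguard, a script can check at the top that it is really running under Bash. This sketch relies on BASH_VERSION, which Bash sets itself and dash does not (unless someone exported it into the environment); require_bash is a hypothetical helper, not the only way to do this.

```shell
#!/bin/bash
# Abort early with a clear message if the script is not run by Bash,
# e.g. when started as "sh script" on a system where sh is dash.
require_bash() {
    if [ -z "${BASH_VERSION:-}" ]; then
        echo "This script must be run with bash, not sh" >&2
        return 1
    fi
}

require_bash || exit 1
```

The guard uses only POSIX syntax, so dash can execute it before reaching any Bash-only constructs later in the script.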
Try this:
while IFS=$'\n' read line
do
if ! [ "${line:0:1}" = "#" ]; then
eval echo "$line"
eval createSymlink $line
fi
done < /some/file.txt
or you can use the following for your if syntax:
if [[ ! ${line:0:1} == "#" ]]; then
TIMTOWTDI ^^
while IFS='' read -r line
do
case "${line}" in
"#"*) echo "${line}"
;;
*) createSymlink "${line}"
;;
esac
done < /some/file.txt
Note: I dropped the eval, which is usually dangerous and only rarely needed. Be aware, though, that your sample lines contain a literal $MOZ_HOME, which will not be expanded without eval or some other explicit substitution.
Note2: I added a "safer" IFS and read (-r, raw), but you can revert to your own if it is better suited. Note that it still reads line by line.
Note3: I got into the habit of always using ${var} instead of $var; it works for me (variables are easy to spot in complex text, and it is always clear where they begin and end), but it is not necessary here.
Note4: you can also change the test to *"#"*) if some of the (comment?) lines can have spaces or tabs before the '#' (and none of the symlink lines contain a '#').