Store into variable a string that changes dynamically - shell

I'm quite new to Unix shell scripting, so I need help with the following issue:
I want to store in a variable a string from a log file that changes dynamically. E.g. one time you run the program that creates this log, the string may be server1, and the next time server238. I have found ways to find the first occurrence of this string with sed, or with grep and cut. However, since the log file that this software creates may differ from version to version, I can't count on a specific line containing this string. E.g. one version may log "The server you are using is server98" and the next one "Used server is server98". Is there a way, through sed or awk, to retrieve this string regardless of the log layout?
Thanks in advance.

I'd go with:
server=$(grep -Eo 'server[0-9]+' file | head -n 1)
to find any occurrence of the word server followed by some digits, e.g. server3, server98
-E means to use extended regular expressions, so + means "one or more" without needing a backslash (note that \d is not part of ERE; use [0-9]+ for multiple digits)
-o means only output the matching part of the string - not the whole line that contains it.
Here it is in action on OSX:
cat file
server9
fred server98 fred
3
/usr/bin/grep -Eo 'server[0-9]+' file
server9
server98
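As a minimal sketch of how this copes with the two layouts from the question (the temporary file and the variable names server_a and server_b are made up for the illustration):

```shell
# Two hypothetical log layouts that mention the same server name.
log=$(mktemp)

printf '%s\n' 'startup ok' 'The server you are using is server98' > "$log"
server_a=$(grep -Eo 'server[0-9]+' "$log" | head -n 1)

printf '%s\n' 'Used server is server98' 'shutdown' > "$log"
server_b=$(grep -Eo 'server[0-9]+' "$log" | head -n 1)

echo "$server_a $server_b"   # prints: server98 server98
rm -f "$log"
```

The plain word "server" in the log text is skipped because the pattern requires at least one digit right after it.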

Try this:
MY_VAR="$(sed -n 's/^.*\(server[0-9][0-9]*\).*$/\1/p' my_file.log | sort -u)"

Related

Counting char in word with different delimiter

I am writing a shell script, in which I get the location of java via which java. As response I get (for example)
/usr/pi/java7_32/jre/bin/java.
I need the path to be cut so it ends with /jre/, more specifically
/usr/pi/java7_32/jre/
as the program this information is provided to cannot handle the longer path.
I have used cut with / as the delimiter, and as I thought that the directory of the Java installation is always the same, a
cut -d'/' -f1-5
worked just fine to get this result:
/usr/pi/java7_32/jre/
But as Java could be installed somewhere else as well, for example at
/usr/java8_64/jre/
the statement would not work correctly.
I tried sed, awk, cut and different combinations of them but found no answer I liked.
As the title says, I would count the number of occurrences of the character / until the substring jre/ is found, on the premise that the shell counts from left to right.
The resulting count would be the field I want to cut up to with the delimiter.
path=$(which java) # example: /usr/pi/java7_32/jre/bin/java
i=0
#while loop with a statment which would go through path
while substring != jre/ {
if (char = '/')
i++
}
#cut the path
path=$path | cut -d'/' -f 1-i
#/usr/pi/java7_32/jre result
Problem is the eventual difference in the path before and after
/java7_64/jre/, like */java*/jre/
I am open for any ideas and solutions, thanks a lot!
Greets
Jan
You can use the shell's built-in parameter operations to get what you need. (This will save the need to create other processes to extract the information you need).
jpath="$(which java)"
# jpath now /usr/pi/java7_32/jre/bin/java
echo ${jpath%jre*}jre
produces
/usr/pi/java7_32/jre
The same works for
jpath=/usr/java8_64/jre/
The % means: remove from the right side of the string the shortest part matching the shell pattern (a glob pattern, not a regex). Then we just put back jre to have your required path.
You can overwrite the value from which java
jpath=${jpath%jre*}jre
IHTH
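A small sketch of the same expansion applied to both install locations from the question, this time keeping the trailing slash the asker wanted. The function name jre_root is hypothetical, and the sketch assumes the path really contains "jre"; if it does not, nothing is stripped and "jre/" is simply appended.

```shell
# Hypothetical helper: strip the shortest suffix that starts at "jre",
# then put "jre/" back so the result ends at the jre directory.
jre_root() {
    printf '%s\n' "${1%jre*}jre/"
}

jre_root /usr/pi/java7_32/jre/bin/java   # prints /usr/pi/java7_32/jre/
jre_root /usr/java8_64/jre/bin/java      # prints /usr/java8_64/jre/
```

No external process is started here, which is the point of using parameter expansion instead of cut or sed.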
You can get the results with grep:
path=$(echo "$path" | grep -o ".*/jre/")

Ruby or Bash or Grep regex is not working for '?' i.e. 0 or 1 occurrence of previous character

Linux Red Hat Enterprise Linux Server release 6.9 (Santiago)
Issue:
I have 2 Jenkins jobs, and both call another common/reusable downstream job (which uses a regex to pick its rpm). The actual code is written in Ruby, where I was using ::Dir.glob("<pattern>",'') and it didn't pick the correct rpm name (without giving me any error), but here I'm just focusing on the regex part.
In job1, in my workspace, I see myrpm.rpm and myrpm-extra.rpm.
In job2, in my workspace, I see myrpm-3.0.0.1027-2018_12_21_121519.noarch.rpm and myrpm-extra-3.0.0.1027-2018_12_21_121519.noarch.rpm rpms which I'm getting after downloading these files from Artifactory via some AQL.
Once the rpms are downloaded from Artifactory, i.e. available in the Jenkins workspace, I use this common downstream job to pick the rpm that I need by
using the "${rpmname}*.rpm" regex.
The issue is, when I pass the rpmname parameter value as "myrpm", the logic picks myrpm-extra.rpm (in Job1) or myrpm-extra-3.0.0.1027-2018_12_21_121519.noarch.rpm (Job2) instead of the correct one (the non-extra one), because the - character sorts first in ASCII order.
I tested the regex and am trying to see why the last command below didn't give the expected output. Isn't 1? in the regex supposed to match any line that has arun followed by 0 or 1 occurrences of the character 1?
[giga@linux-server giga]# echo -e "arun\narun1\narun2\narun11" |grep "arun"
arun
arun1
arun2
arun11
[giga@linux-server giga]#
[giga@linux-server giga]# echo -e "arun\narun1\narun2\narun11" |grep "arun1"
arun1
arun11
[giga@linux-server giga]#
[giga@linux-server giga]# echo -e "arun\narun1\narun2\narun11" |grep "arun1?"
[giga@linux-server giga]#
Questions:
1. Why does this work if I use egrep?
2. Why didn't it work with grep, when the grep man page and examples say it is supported?
3. What regex can I use so that if I pass myrpm as the job parameter's value, it works in both Job1 and Job2, where the rpm filename contains either the short or the full rpm name?
Here: https://www.cyberciti.biz/faq/grep-regular-expressions/ (search for grep Regular Expression Operator) and
man grep shows:
Repetition
A regular expression may be followed by one of several repetition operators:
---------------------------------------------------------------
? The preceding item is optional and matched at most once.
---------------------------------------------------------------
* The preceding item will be matched zero or more times.
+ The preceding item will be matched one or more times.
{n} The preceding item is matched exactly n times.
{n,} The preceding item is matched n or more times.
{,m} The preceding item is matched at most m times. This is a GNU extension.
{n,m} The preceding item is matched at least n times, but not more than m times.
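egrep is the same as grep -E. In your case the regex needs -E or -P for ? to mean "zero or one": plain grep uses basic regular expressions, where an unescaped ? is a literal question mark (GNU grep also accepts \? in basic mode).
Please read up on the different regex flavours: POSIX basic, POSIX extended, Perl, etc.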
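To make the difference concrete, here is a short sketch using the same four input lines as the question. The variable names are only for illustration, and -x is added in the last command to show whole-line matching, which the original commands did not use.

```shell
input=$(printf '%s\n' arun arun1 arun2 arun11)

# Basic regex: "?" is literal, so no line matches (grep -c prints 0 and exits non-zero).
bre=$(printf '%s\n' "$input" | grep -c 'arun1?' || true)
# Extended regex: "1?" is optional, so every line containing "arun" matches.
ere=$(printf '%s\n' "$input" | grep -Ec 'arun1?')
# -x requires the whole line to match, which leaves only arun and arun1.
exact=$(printf '%s\n' "$input" | grep -Exc 'arun1?')

echo "bre=$bre ere=$ere exact=$exact"   # prints: bre=0 ere=4 exact=2
```

Without anchoring or -x, an optional trailing character cannot narrow the result, which is why grep -E 'arun1?' still prints arun2 and arun11.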
Resources:
POSIX Basic and Extended Regular Expressions
Perl regular expressions
Regular expression
Final Solution:
Used this regex pattern: ${package_name}(?:-[0-9]*)?\..*rpm. Example code (in Ruby) is shown below.
In Ruby: after changing to the directory where the rpms are, I used these lines to find the correct rpm name. NOTE: In Ruby, the ::Dir.glob("<pattern>",..) argument is not a real regex pattern, it's just a shell glob pattern.
# The backslash is doubled because "\." inside a double-quoted Ruby string collapses to a plain "."
wildcard = "#{new_resource.session['package_name']}(?:-[0-9]*)?\\..*#{ext_type}"
rpmfullname = Dir["#{new_resource.session['package_name']}*.#{ext_type}"].select { |f| f =~ /#{wildcard}/ }.sort.first
This website greatly helped.
https://rubular.com/
with example pattern being myrpm(?:-[0-9]*)?\..*rpm:
and test cases strings as:
myrpm.rpm
myrpm-extra.rpm
myrpm-1.0.0_112233.noarch.rpm
myrpm-11.0.0_112233.noarch.rpm
myrpm-111.0.0_112233.noarch.rpm
myrpm-1.0_112233.noarch.rpm
myrpm-extra-1.0.0_112233.noarch.rpm
foo.yum
berayan.rpm
arun.yum
Toni.rpm
myrpm-1.0.0_112233.rpm
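For readers who want to try the pattern from a shell rather than rubular, here is a rough equivalent with grep -E. This is a sketch under assumptions: ERE has no (?:...) non-capturing group, so a plain group is used; the ^ and $ anchors are an addition; and package_name is assumed to contain no regex metacharacters.

```shell
package_name=myrpm   # assumed job parameter value

# Feed a few of the test-case names through the ERE version of the pattern.
picked=$(printf '%s\n' \
    myrpm.rpm \
    myrpm-extra.rpm \
    myrpm-1.0.0_112233.noarch.rpm \
    myrpm-extra-1.0.0_112233.noarch.rpm \
    foo.yum \
    Toni.rpm |
    grep -E '^'"$package_name"'(-[0-9]*)?\..*rpm$')

printf '%s\n' "$picked"
# prints:
# myrpm.rpm
# myrpm-1.0.0_112233.noarch.rpm
```

The "-extra" names are rejected because after the optional "-digits" group the next character has to be a literal dot.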

How to get date and string separately in a given file name using shell script

Hi, I am trying to get the date and a string separately from a given file name, but I don't know exactly how to do it.
This is the file name "95FILRDF01PUBLI20170823XEURC0V41000.XML"
I want to extract date "20170823" and string "XEUR" from this file name.
I was going through lots of posts in Stackexchange/Stackoverflow, but didn't understand the regular expression they are using.
https://unix.stackexchange.com/questions/182563/how-to-extract-a-part-of-file-name-in-unix-linux-shell-script
Extract part of filename
To extract date and name:
$ name="95FILRDF01PUBLI20170823XEURC0V41000.XML"
$ echo "$name" | sed -E 's/.*([[:digit:]]{8})([[:alpha:]]{4}).*/date=\1 name=\2/'
date=20170823 name=XEUR
The key part of the regex is ([[:digit:]]{8})([[:alpha:]]{4}). The first part of that, ([[:digit:]]{8}) matches 8 digits and saves them as group 1. The second part of that, ([[:alpha:]]{4}) matches four letters that follow the date and saves them as group 2.
The key part is surrounded by .* before and .* after which matches whatever is left over.
The replacement text is date=\1 name=\2 which formats the output.
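If the two parts are needed in separate shell variables rather than printed, one possible sketch is to let sed emit both groups separated by a space and split them with parameter expansion. The variable names fields, file_date and file_code are made up for this example.

```shell
name="95FILRDF01PUBLI20170823XEURC0V41000.XML"

# Same regex as above, but the replacement is just "group1 group2".
fields=$(printf '%s\n' "$name" |
    sed -E 's/.*([[:digit:]]{8})([[:alpha:]]{4}).*/\1 \2/')

file_date=${fields% *}   # everything before the space
file_code=${fields#* }   # everything after the space

echo "date=$file_date name=$file_code"   # prints: date=20170823 name=XEUR
```

If the file name does not contain eight digits followed by four letters, sed leaves the line unchanged, so a real script would want to check the result before using it.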

how to edit url string with sed

My Linux repository file contains a link that until now used http with a port number to point to its repository.
baseurl=http://host.domain.com:123/folder1/folder2
I now need a way to change that URL to use https with no port or a different port.
I also need to be able to change the server name, for example from host.domain.com to host2.domain.com.
So my idea was to use sed to match from the start of the http up to the first / that comes after the //, thus catching whatever is in between, which gives me the ability to change the server name, the port, or the http/https scheme.
I'm now using this code (I'm using echo just for the example).
The example shows two cases: in the first, a link with http and port 123 is converted to https, and in the second it is the other way around.
In both cases I used the same sed command, to keep it generic.
WANTED_URL="https://host.domain.com"
echo 'http://host.domain.com:123/folder1/folder2' | sed -i "s|http.*://[^/]*|$WANTED_URL|"
OR
WANTED_URL="http://host.domain.com:123"
echo 'https://host.domain.com/folder1/folder2' | sed -i "s|http.*://[^/]*|$WANTED_URL|"
Is that the correct way to do it?
sed regexes are greedy by default. You can tell sed to consume only non-slashes, like this:
echo 'http://host.domain.com:123/folder1/folder2' | sed -e 's|http://[^/]*|https://host.domain.com|'
result:
https://host.domain.com/folder1/folder2
(BTW you don't have to escape slashes because you are using an alternate separating character)
the key is using [^/]* which will match anything but slashes so it stops matching at the first slash (non-greedy).
You used .* and .* can match slashes too, which is not what you wanted (greedy by default).
Anyway my approach is different because expression does not include the trailing slash so it is not removed from final output.
Assuming it doesn't really matter if you have 1 sed script or 2 and there isn't a good reason to hard-code the URLs:
$ echo 'http://host.domain.com:123/folder1/folder2' |
sed 's|\(:[^:]*\)[^/]*|s\1|'
https://host.domain.com/folder1/folder2
$ port='123'; echo 'https://host.domain.com/folder1/folder2' |
sed 's|s\(://[^/]*\)|\1:'"$port"'|'
http://host.domain.com:123/folder1/folder2
If that isn't what you need then edit your question to clarify your requirements and in particular explain why:
You want to use hard-coded URLs, and
You need 1 script to do both transformations.
and provide concise, testable sample input and expected output that demonstrates those needs (i.e. cases where the above doesn't work).
wrt what you had:
WANTED_URL="https://host.domain.com"
echo 'http://host.domain.com:123/folder1/folder2' | sed -i "s|http.*://[^/]*|$WANTED_URL|"
The main issues are:
Don't use all-upper-case for non-exported shell variable names to avoid clashes with exported variables and to avoid obfuscating your code (this convention has been around for 40 years so people expect all upper case variables to be exported).
Never enclose any script in double quotes as it exposes the whole script to the shell for interpretation before the command you want to execute even sees it. Instead just open up the single quotes around the smallest script segment possible when necessary, i.e. to expand $y in a script use cmd 'x'"$y"'z' not cmd "x${y}z" because the latter will fail cryptically and dangerously given various input, script text, environment settings and/or the contents of the directory you run it from.
The -i option for sed is to edit a file in-place so you can't use it on an incoming pipe because you can't edit a pipe in-place.
When you let a shell variable expand to become part of a script, you have to take care about the possible characters it contains and how they'll be interpreted by the command given the context the variable expands into. If you let a whole URL expand into the replacement section of a sed script then you have to be careful to first escape any potential backreference characters or script delimiters. See Is it possible to escape regex metacharacters reliably with sed. If you just let the port number expand then you don't have to deal with any of that.
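If a whole URL prefix does have to be passed in, the points above could be combined roughly as follows. This is only a sketch: rewrite_base is a hypothetical helper name, it reads from stdin because -i cannot work on a pipe, and it escapes backslash, & and the | delimiter before the value reaches the replacement part of the sed script.

```shell
# Hypothetical helper: $1 = new "scheme://host[:port]", text is read from stdin.
rewrite_base() {
    # Escape the characters that are special in a sed replacement
    # (backslash and &) plus the | used as the script delimiter.
    escaped=$(printf '%s\n' "$1" | sed 's/[\\&|]/\\&/g')
    # Match http or https, then everything up to the first single slash.
    sed 's|https\{0,1\}://[^/]*|'"$escaped"'|'
}

printf '%s\n' 'http://host.domain.com:123/folder1/folder2' |
    rewrite_base 'https://host2.domain.com'
# prints https://host2.domain.com/folder1/folder2

printf '%s\n' 'https://host.domain.com/folder1/folder2' |
    rewrite_base 'http://host.domain.com:123'
# prints http://host.domain.com:123/folder1/folder2
```

Only the variable is left outside the single quotes, so the rest of the sed script is never interpreted by the shell.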

Call script on all file names starting with string in folder bash

I have a set of files in a folder that I want to perform an action on, and I'm hoping to write a script for it. Each file name starts with mazeFilex, where x can be any number. Is there a quick and easy way to perform an action on each file? E.g. I will be doing
cat mazeFile0.txt | ./maze_ppm 5 | convert - maze0.jpg
How can I select each file, knowing that the file name will always start with mazeFile?
for fname in mazeFile*
do
base=${fname%.txt}
base=${base#mazeFile}
./maze_ppm 5 <"$fname" | convert - "maze${base}.jpg"
done
Notes
for fname in mazeFile*; do
This code starts the loop. Written this way, it is safe for all filenames, whether they have spaces, tabs or whatever in their names.
base=${fname%.txt}; base=${base#mazeFile}
This removes the mazeFile prefix and .txt suffix to just leave the base name that we will use for the output file.
./maze_ppm 5 <"$fname" | convert - "maze${base}.jpg"
The output filename is constructed using base. Note also that cat was unnecessary and has been removed here.
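To see just the name handling without needing maze_ppm or convert, here is a small sketch. The function name out_name and the sample file names are made up for the illustration.

```shell
# Hypothetical helper: map an input name like mazeFile12.txt to maze12.jpg.
out_name() {
    base=${1%.txt}          # drop the .txt suffix
    base=${base#mazeFile}   # drop the mazeFile prefix
    printf '%s\n' "maze${base}.jpg"
}

out_name mazeFile0.txt      # prints maze0.jpg
out_name mazeFile12.txt     # prints maze12.jpg
out_name 'mazeFile 7.txt'   # prints "maze 7.jpg", the space survives intact
```

Because every expansion is quoted, the third name stays one word all the way through, which is what the note about spaces and tabs refers to.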
for i in mazeFile*.txt ; do ./maze_ppm 5 <$i | convert - `basename maze${i:8} .txt`.jpg ; done
You can use a for loop to run through all the filenames.
#!/bin/bash
for fn in mazeFile*; do
echo "the next file is $fn"
# do something with file $fn
done
See answer here as well: Bash foreach loop
I see you want a backreference to the number in the mazeFile. Thus I recommend John1024's answer.
Edit: removed the unnecessary ls command, per @guido's comment.

Resources