How to take string from a file name and use it as an argument - bash

If a file name is in this format
assignment_number_username_filename.extension
Ex.
assignment_01_ssaha_homework1.txt
I need to extract just the username to use it in the rest of the script.
How do I take just the username and use it as an argument?
This is close to what I'm looking for but not exactly:
Extracting a string from a file name
if someone could explain how sed works in that scenario that would be just as helpful!
Here's what I have so far; I haven't used cut in a while so I'm getting error messages while trying to refresh myself.
#!/bin/sh
a = $1
grep $a /home | cut -c 1,2,4,5 echo $a

You probably need command substitution, plus echo plus sed. You need to know that sed regular expressions can remember portions of the match. And you need to know basic regular expressions. In context, this adds up to:
filename="assignment_01_ssaha_homework1.txt"
username=$(echo "$file" | sed 's/^[^_]*_[^_]*_\([^_]*\)_[^.]*\.[^.]*$/\1/')
The $(...) notation is command substitution. The commands in between the parentheses are run and the output is captured as a string. In this case, the string is assigned to the variable username.
In the sed command, the overall command applies a particular substitution (s/match/replace/) operation to each line of input (here, that will be one line). The [^_]* components of the regular expression match a sequence of (zero or more) non-underscores. The \(...\) part remembers the enclosed regex (the third sequence of non-underscores, aka the user name). The switch to [^.]* at the end recognizes the change in delimiter from underscore to dot. The replacement text \1 replaces the entire name with the remembered part of the pattern. In general, you can have several remembered subsections of the pattern. If the file name does not match the pattern, you'll get the input as output.
In bash, there are ways of avoiding the echo; you might well be able to use some of the more esoteric (meaning 'not available in other shells') mechanisms to extract the data. The $(...) solution above, though, will work on the majority of modern POSIX-derived shells (Korn, Bash, and others).
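For instance, a minimal sketch that avoids the echo by using a here-string (a bash/ksh93/zsh feature, not plain sh):
filename="assignment_01_ssaha_homework1.txt"
username=$(sed 's/^[^_]*_[^_]*_\([^_]*\)_[^.]*\.[^.]*$/\1/' <<< "$filename")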

filename="assignment_01_ssaha_homework1.txt"
username=$(echo "$file" | awk -F_ '{print $3}')

Just bash:
filename="assignment_01_ssaha_homework1.txt"
tmp=${filename%_*}       # strip the shortest trailing "_*" match: assignment_01_ssaha
username=${tmp##*_}      # strip the longest leading "*_" match: ssaha
http://www.gnu.org/software/bash/manual/bashref.html#Shell-Parameter-Expansion
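If the underscore positions are fixed, a read with a custom IFS is another bash-only sketch (assuming the username itself never contains an underscore):
IFS=_ read -r _ _ username _ <<< "$filename"
echo "$username"    # ssaha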

Related

sed: Can't replace latest text occurrence including "-" dashes using variables

Trying to replace a text to another with sed, using a variable. It works great until the variable's content includes a dash "-" and sed tries to interpret it.
It is to be noted that in this context, I need to replace only the last occurrence of the source variable ${source}, which is why my sed command looks like this:
sed -e "s:${source}([^${source}]*)$:${dest}\1:"
"sed" is kind of new to me, I always got my results with "replace" or "awk" whenever possible, but here I'm trying to make the code as versatile as possible, hence using sed. If you think of another solution, that is viable as well.
Example for the issue:
# mkdir "/home/youruser/TEST-master"
# source="TEST-master" ; dest="test-master" ; find /home/youruser/ -depth -type d -name '*[[:upper:]]*' | grep "TEST" | sed -e "s:${source}([^${source}]*)$:${dest}\1:"
sed: -e expression #1, char 46: Invalid range end
Given that I don't know how many dashes every single variable may contain, does any sed expert know how could I make this work?
Exact context: Open source project LinuxGSM for which I'm rewriting a function to recursively lowercase files and directories.
Bash function I'm working on and comment here: https://github.com/GameServerManagers/LinuxGSM/issues/1868#issuecomment-996287057
If I'm understanding the context right, the actual goal is to take a path that contains some uppercase characters in its last element, and create a version with the last element lowercased. For example, /SoMe/PaTh/FiLeNaMe would be converted to /SoMe/PaTh/filename. If that's the case, rather than using string substitution, use dirname and basename to split it into components, lowercase the last, then reassemble it:
parentdir=$(dirname "$src")
filename=$(basename "$src")
lowername=$(echo "$filename" | tr '[:upper:]' '[:lower:]')
dst="$parentdir/$lowername"
(Side note: it's important to quote the parameters to tr, to make sure the shell doesn't treat them as filename wildcards and replace them with lists of matching files.)
As long as the paths contain at least one "/" and do not end with "/", you can use bash substitutions instead of dirname and basename:
parentdir="${src%/*}"
filename="${src##*/}"
As long as you're using bash v4.0 or later, you can also use a builtin substitution to do the lowercasing:
lowername="${filename,,}"

Grepping for exact string while ignoring regex for dot character

So here's my issue. I need to develop a small bash script that can grep a file containing account names (let's call it file.txt). The contents would be something like this:
accounttest
account2
account
accountbtest
account.test
Matching an exact line SHOULD be easy but apparently it's really not.
I tried:
grep "^account$" file.txt
The output is:
account
So in this situation the output is OK, only "account" is displayed.
But if I try:
grep "^account.test$" file.txt
The output is:
accountbtest
account.test
So the next obvious solution that comes to mind, in order to stop interpreting the dot character as "any character", is using fgrep, right?
fgrep account.test file.txt
The output, as expected, is correct this time:
account.test
But what if I try now:
fgrep account file.txt
Output:
accounttest
account2
account
accountbtest
account.test
This time the output is completely wrong, because I can't use the beginning/end line characters with fgrep.
So my question is, how can I properly grep a whole line, including the beginning and end of line special characters, while also matching exactly the "." character?
EDIT: Please note that I do know that the "." character needs to be escaped, but in my situation, escaping is not an option, because of further processing that needs to be done to the account name, which would make things too complicated.
The . is a special character in regex notation which needs to be escaped to match it as a literal string when passing to grep, so do
grep "^account\.test$" file.txt
Or, if you cannot afford to modify the search string, use the -F flag so grep treats the pattern as a literal string and does no regex processing on it:
grep -Fx 'account.test' file.txt
From man grep
-F, --fixed-strings
Interpret PATTERN as a list of fixed strings (instead of regular expressions), separated by newlines, any of which is to be matched.
-x, --line-regexp
Select only those matches that exactly match the whole line. For a regular expression pattern, this is like parenthesizing the pattern and then surrounding it with ^ and $.
fgrep is the same as grep -F. grep also has the -x option which matches against whole lines only. You can combine these to get what you want:
grep -Fx account.test file.txt
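With the sample file.txt from the question, both lookups now behave as wanted (hypothetical shell session, output shown for illustration):
$ grep -Fx account file.txt
account
$ grep -Fx account.test file.txt
account.test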

Dynamic delimiter in Unix

Input:
echo "1234ABC89,234"
echo "0520001DEF78,66"
echo "46545455KRJ21,00"
From the above strings, I need to split the characters to get the alphabetic field and the number after that.
From "1234ABC89,234", the output should be:
ABC
89,234
From "0520001DEF78,66", the output should be:
DEF
78,66
I have many strings that I need to split like this.
Here is my script so far:
echo "1234ABC89,234" | cut -d',' -f1
but it gives me 1234ABC89 which isn't what I want.
Assuming that you want to discard leading digits only, and that the letters will be all upper case, the following should work:
echo "1234ABC89,234" | sed 's/^[0-9]*\([A-Z]*\)\([0-9].*\)/\1\n\2/'
This works fine with GNU sed (I have 4.2.2), but other sed implementations might not like the \n, in which case you'll need to substitute something else.
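One portable substitute: POSIX sed accepts a backslash followed by a literal newline in the replacement, so the following sketch should work across implementations:
echo "1234ABC89,234" | sed 's/^[0-9]*\([A-Z]*\)\([0-9].*\)/\1\
\2/'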
Depending on the version of sed you can try:
echo "0520001DEF78,66" | sed -E -e 's/[0-9]*([A-Z]*)([,0-9]*)/\1\n\2/'
or:
echo "0520001DEF78,66" | sed -E -e 's/[0-9]*([A-Z]*)([,0-9]*)/\1$\2/' | tr '$' '\n'
DEF
78,66
Explanation: the regular expression rewrites the input into the expected output, except that instead of the newline it puts a "$" sign, which we then replace with a newline using the tr command (this assumes the data itself never contains a "$").
Where do the strings come from? Are they read from a file (or other source external to the script), or are they stored in the script? If they're in the script, you should simply reformat the data so it is easier to manage. Therefore, it is sensible to assume they come from an external data source such as a file or being piped to the script.
You could simply feed the data through sed:
sed 's/^[0-9]*\([A-Z]*\)/\1 /' |
while read alpha number
do
…process the two fields…
done
The only trick to watch there is that if you set variables in the loop, they won't necessarily be visible to the script after the done. There are ways around that problem — some of which depend on which shell you use. This much is the same in any derivative of the Bourne shell.
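For illustration, a minimal sketch of that pitfall in bash (default settings; the hypothetical count variable is lost when the loop's subshell exits):
count=0
printf '1234ABC89,234\n' |
sed 's/^[0-9]*\([A-Z]*\)/\1 /' |
while read alpha number
do
    count=$((count + 1))    # incremented inside the pipeline's subshell
done
echo "$count"               # prints 0 in bash, not 1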
You said you have many strings like this, so I recommend saving them, if possible, to a file such as input.txt:
1234ABC89,234
0520001DEF78,66
46545455KRJ21,00
On your command line, try this sed command, reading input.txt as a file argument:
$ sed -E 's/([[:digit:]]+)([[:alpha:]]{3})(.+)/\2\t\3/g' input.txt
ABC 89,234
DEF 78,66
KRJ 21,00
How it works
uses -E for extended regular expressions to save on typing; otherwise, for grouping, we would have to escape the parentheses as \(
uses grouping ( and ), searching for three groups:
first, digits; + specifies one or more digits. Oddly, using [0-9] produced an extra blank space in the results, so the POSIX class [[:digit:]] is used instead
the next group searches for POSIX alphabetical characters, regardless of whether they are lowercase or uppercase, and {3} specifies to search for exactly 3 of them
the last group searches for ., meaning any character, with + for one or more times
\2\t\3 then returns group 2 and group 3, with a tab separator
Thus you are able to extract two separate fields per line, just separated by tab, for easier manipulation later.

Using sed/awk to process a pattern in bash

I have a command whose output is of the form:
[{"foo1":<some value>,"foo2":<some value>,"foo3":<some value>}]
I want to take the output of this command and just get the value corresponding to foo2
How do I use sed/awk or any other shell utility readily available in a bash script to do this?
Assuming that the values do not contain commas, this sed rune will do it:
sed -n 's/.*"foo2":\([^,]*\),.*/\1/'p
sed -n tells sed not to print lines by default.
The s ("substitute") command uses a regexp group delimited by \( and \) to pick out just the bit you want.
"foo2": provides the context needed to find the right value.
[^,]* means "a character that is not a comma, any number of times". This is your <some value>. If values are not delimited by commas, change this (and the comma after the grouping parens) to match correctly.
.* means "any character, any number of times", and it is used to match all the characters before and after the bit you want. Now the regexp will match the entire line.
\1 means the contents of the grouping parentheses. sed will substitute the string that matches the pattern (which is the whole line, because we used .* at the beginning and end) with the contents of the parens, <some value>.
Finally, the p on the end means "print the resulting line".
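For instance, with the sample structure filled in with hypothetical numeric values:
$ echo '[{"foo1":11,"foo2":222,"foo3":3333}]' | sed -n 's/.*"foo2":\([^,]*\),.*/\1/p'
222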
With this awk for example:
$ awk -F[:,] '{print $4}' file
<some value2>
-F[:,] sets the possible field separators to : or ,. Then it is a matter of counting the position in which the <some value> of foo2 appears. It happens to be the 4th.
With sed:
$ sed 's/.*"foo2":\([^,]*\).*/\1/g' file
<some value2>
.*"foo2":\([^,]*\).* gets the string coming after foo2: and until the comma appears. Then it prints it back with \1.
Your block of data looks like JSON. There is no native JSON parsing in bash, sed or awk, so ALL the answers here will either suggest that you use a different, more appropriate tool, or they will be hackish and might easily fail if your real data looks different from the example you've provided here.
That said, if you are confident that your variable:value blocks and line structure are always in the same format as this example, you may be able to get away with writing your own (very) basic parser that will work for just your use case.
Note that you can't really parse things in sed, it's just not designed for that. If your data always looks the same, a sed solution may be sufficient ... but remember that you are simply pattern matching, not parsing the input data. There are other answers already which cover this.
For very simple matching of the string that appears after the colon after "foo2", as Peter suggested, you could use the following:
$ data='[{"foo1":11,"foo2":222,"foo3":3333}]'
$ echo "$data" | sed -ne 's/.*"foo2":\([^,]*\),.*/\1/p'
As I say, this should in no way be confused with parsing of your JSON. It would work equally well (or badly) with an input string of abcde"foo2":bar,abcde.
In awk, you can make things that are a bit more advanced, but you still have serious limitations when it comes to JSON. For example, if you choose to separate fields with commas, but then you put a comma inside the <some value> in your data, awk doesn't know how to distinguish it from a field separator.
That said, if your JSON is only one level deep (i.e. matches your sample data), the following might work for you:
$ data='[{"foo1":11,"foo2":222,"foo3":3333}]'
$ echo "$data" | awk -F: -vRS=, '{gsub(/[^[:alnum:]]/,"",$1)} $1=="foo2" {print $2}'
This awk script considers commas as record separators and colons as field separators. It does not support any level of depth in your JSON, and depends on alphanumeric variable names. But it should handle JSON split on to multiple lines.
Alternately, if you want to avoid ugly hacks, and perl or python solutions don't work for you, you might want to try out jsawk. With it, you might use something like this:
$ data='[{"foo1":11,"foo2":222,"foo3":3333}]'
$ echo "$data" | jsawk -a 'return this.foo2'
[222]
SEE ALSO: Parsing json with awk/sed in bash to get key value pair
This worked for me. You can try this one:
echo '[{"foo1":<some value>,"foo2":<some value>,"foo3":<some value>}]' | awk -F'[:,]+' '{ if ($3 == "\"foo2\"") { print $4 } }'
The awk command above uses multiple field separators; I have used colon and comma here. Note that the field still carries its double quotes, hence the escaped "\"foo2\"" in the comparison.
Since this looks like JSON, let's parse it like JSON:
perl -MJSON -ne '$json = decode_json($_); print $json->[0]{foo2}, "\n"' <<END
[{"foo1":"some value","foo2":"some, value","foo3":"some value"}]
END
some, value
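If Python is more convenient than Perl, a comparable sketch with its standard json module (same data, same quoting assumptions):
echo '[{"foo1":"some value","foo2":"some, value","foo3":"some value"}]' | python3 -c 'import json, sys; print(json.load(sys.stdin)[0]["foo2"])'
some, value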

Delete a specific string with tr

Is it possible to delete a specific string with the tr command in a Unix shell?
For example: If I type:
tr -d "1."
and the input is 1.1231, it would show 23 as an output, but I want it to show 1231 (notice only the first 1 has gone). How would I do that?
If you know a solution or a better way, please explain the syntax since I don't want to just copy&paste but also to learn.
I have huge problems with awk, so if you use this, please explain it even more.
In your example above the cut command would suffice.
Example: echo '1.1231' | cut -d '.' -f 2 would return 1231.
For more information on cut, just type man cut.
You would be better off using some kind of regex (maybe something like sed).
For example, with the input 1.1231 you could use the following to get the 1231 output:
sed 's/1\.//g'
Maybe have a look here:
http://tldp.org/LDP/abs/html/string-manipulation.html
You could also use sed for this kind of thing:
$ echo "1.1231" | sed -e "s/1\.//"
1231
This is just using sed to run a regular expression search and replace, replacing "1." (with appropriate escaping) with "". It only deletes the first match by default.
If you are using bash, you can do this easily with parameter substitution:
$ a=1.1231
$ echo ${a#1.}
1231
This will remove the leading "1." string. If you want to remove up to and including the first occurrence, use ${a#*1.} and if you want to remove everything up to and including the last occurrence, use ${a##*1.}.
The TLDP page on string manipulation has further options (such as substring extraction).
Note that using standard sh built-in string manipulation tools for such simple transformations will always be much faster than using an external tool, such as sed, awk or cut, because the shell doesn't have to create a sub-process to perform the operation. However, for more complicated things (e.g. when you need to use regular expressions or when the input is large), you're better off using the dedicated tools.
Since you asked specifically about awk, here is another one.
awk '{ gsub(/1\./,"") }1' input.txt
As any awk tutorial will tell you, the general form of an awk program is a sequence of 'condition { actions }'. If you have no actions, the default action is to print. If you have no conditions, the actions will be taken unconditionally. This program uses both of these special cases.
The first part is an action without a condition, i.e. it will be taken for all lines. The action is to substitute all occurrences of the regular expression /1\./ with nothing. So this will trim any '1.' (regardless of context) from a line.
The second part is a condition without an action, i.e. it will print if the condition is true, and the condition is always true. This is a common idiom for "we are done -- print whatever we have now". It consists simply of the constant 1 (which, when used as a condition, means "true").
This could be reformulated in a number of ways. For example, you could factor the print into the first action;
awk '{ gsub(/1\./,""); print }' input.txt
Perhaps you want to substitute the integer part, i.e. any numbers before a period sign. The regex for that would be something like /[0-9]+\./.
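A quick sketch of that variant (a hypothetical generalisation: it deletes any run of digits immediately followed by a dot, not just "1."):
echo "12.1231" | awk '{ gsub(/[0-9]+\./, "") } 1'
1231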
gsub is not available in the original, pre-POSIX awk, so you might want to replace it with a loop around index and substr if you need portability to truly legacy awk implementations; any POSIX awk (nawk, gawk, mawk) supports both sub and gsub.
