Trimming pathnames beyond a keyword (awk, sed, ?) - bash

I want to trim a pathname beyond a certain point after finding a keyword. I'm drawing a blank this morning.
/home/quikq/1.0/dev/Java/tsupdater/src/tsupdater.java
I want to find the keyword Java, save the pathname beyond that (tsupdater), then cut everything off after the Java portion.

I don't know if this is what you want, but you can split the pathname into two with:
echo "/home/quikq/1.0/dev/Java/tsupdater/src/tsupdater.java" | sed 'h;s/.*Java//p;g;s/Java.*/Java/'
Which outputs:
/tsupdater/src/tsupdater.java
/home/quikq/1.0/dev/Java
If you would like to save the second part into a file part2.txt and print the first part, you could do:
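echo "/home/quikq/1.0/dev/Java/tsupdater/src/tsupdater.java" | sed -e 'h;s/.*Java//;w part2.txt' -e 'g;s/Java.*/Java/'
(The w command treats everything up to the end of the script line as the file name, so it has to end its own -e expression.)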
If you're writing a shell script:
myvar="/home/quikq/1.0/dev/Java/tsupdater/src/tsupdater.java"
part1="${myvar%Java*}Java"
part2="${myvar#*Java/}"
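The two expansions above can be wrapped in a small function. This is only a sketch: the name split_at_keyword is hypothetical, and the function assumes the keyword actually occurs in the path. Without the guard, ${path%Java*}Java would return the whole path with "Java" appended.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the path up to and including KEYWORD,
# then the part after "KEYWORD/", one per line.
split_at_keyword() {
  local path=$1 keyword=$2
  # Refuse paths that do not contain the keyword at all.
  if [[ $path != *"$keyword"* ]]; then
    return 1
  fi
  # %: remove the shortest suffix that starts with KEYWORD (its last occurrence), then re-append KEYWORD.
  printf '%s\n' "${path%"$keyword"*}$keyword"
  # #: remove the shortest prefix that ends with "KEYWORD/" (its first occurrence).
  printf '%s\n' "${path#*"$keyword"/}"
}

split_at_keyword /home/quikq/1.0/dev/Java/tsupdater/src/tsupdater.java Java
```

If the keyword can appear more than once in a path, the two halves can disagree, because % stops at the last occurrence and # at the first.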
Hope this helps =)

Take the one you need:
kent$ echo "/home/quikq/1.0/dev/Java/tsupdater/src/tsupdater.java"|sed -r 's#(.*Java/[^/]*).*#\1#g'
/home/quikq/1.0/dev/Java/tsupdater
kent$ echo "/home/quikq/1.0/dev/Java/tsupdater/src/tsupdater.java"|sed -r 's#(.*Java).*#\1#g'
/home/quikq/1.0/dev/Java
kent$ echo "/home/quikq/1.0/dev/Java/tsupdater/src/tsupdater.java"|sed -r 's#.*Java/([^/]*).*#\1#g'
tsupdater

I'm not entirely sure what you want as output (please specify more clearly), but this command:
echo "/home/quikq/1.0/dev/Java/tsupdater/src/tsupdater.java" | sed 's/.*Java//'
results in:
/tsupdater/src/tsupdater.java
If you want the preceding part then this command:
echo "/home/quikq/1.0/dev/Java/tsupdater/src/tsupdater.java" | sed 's/Java.*//'
results in:
/home/quikq/1.0/dev/

Like I said, I was having a weird morning, but it dawned on me.
echo /home/quikq/1.0/dev/Java/TSUpdater/src/TSUpdater.java | sed 's/Java.*//g'
Yields
/home/quikq/1.0/dev/
Lots of great tips here for chopping it up different ways though. Thanks a bunch!

Related

How to remove the Nth target word (='remove_mark') in a line with sed?

I am learning sed in the shell.
I tried the following code:
echo "one tworemove_markthree fourremove_markfive" | sed -E "s?(.*)remove_mark(.*)?\1\2?"
I expected the output to be
one twothree fourremove_markfive
But the actual output of the code above is:
one tworemove_markthree fourfive
The second remove_mark is removed, but the first one remains.
I would like to remove the first one instead. How can I do that? And how can I remove every occurrence of the target word? Thank you very much.
Just match remove_mark and replace it with nothing. Without a flag, s replaces only the first match on the line.
Example
$ echo "one tworemove_markthree fourremove_markfive" | sed 's/remove_mark//'
one twothree fourremove_markfive
To remove all occurrences, use the g (global) flag.
Example
$ echo "one tworemove_markthree fourremove_markfive" | sed 's/remove_mark//g'
one twothree fourfive
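For the Nth occurrence asked about in the title, the s command also accepts a number as a flag, which replaces only that match. (The pattern in the question hit the second remove_mark because the greedy (.*) grabs as much of the line as it can, so it lines up with the last occurrence.) A short sketch:

```shell
#!/usr/bin/env bash
line="one tworemove_markthree fourremove_markfive"

# A numeric flag N makes s replace only the Nth match on the line.
first=$(echo "$line" | sed 's/remove_mark//1')
second=$(echo "$line" | sed 's/remove_mark//2')

echo "$first"    # one twothree fourremove_markfive
echo "$second"   # one tworemove_markthree fourfive
```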

Bash show characters if not in string

I am trying out bash, and I am trying to make a simple hangman game now.
Everything is working but I don't understand how to do one thing:
I am showing the user the word with the guessed letters (so, for example, if the word is hello world and the user guessed 'l', I show them **ll* ***l*).
I store the letters that the user already tried in the variable guess.
I do that with the following:
echo "${word//[^[:space:]$guess]/*}"
The thing I want to do now is echo the alphabet, but leave out the letters that the user already tried, so in this case show the full alphabet without the L.
I already tried to do it the same way as shown above, but it doesn't quite work.
If you need any more info please let me know.
Thanks,
Tim
You don't show what you tried, but parameter expansion works fine.
$ alphabet=abcdefghijklmnopqrstuvwxyz
$ word="hello world"
$ guesses=aetl
$ echo "${word//[^[:space:]$guesses]/*}"
*ell* ***l*
$ echo "${alphabet//[$guesses]/*}"
*bcd*fghijk*mnopqrs*uvwxyz
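The question asks to leave the guessed letters out rather than mask them; an empty replacement does that. A sketch, assuming $guesses holds only lowercase letters (characters such as ] or ^ would change the meaning of the bracket expression):

```shell
#!/usr/bin/env bash
alphabet=abcdefghijklmnopqrstuvwxyz
guesses=aetl

# Replacing with nothing deletes every character listed in $guesses.
remaining=${alphabet//[$guesses]/}
echo "$remaining"   # bcdfghijkmnopqrsuvwxyz
```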
First store both strings in files, one character per line:
sed 's/./&\n/g' <<< "$guess" | sort > guessfile
sed 's/./&\n/g' <<< "$word" | sort > wordfile
Then keep only the characters of wordfile that do not appear in guessfile, and paste the lines back together into a single string:
grep -xvf guessfile wordfile | paste -s -d'\0'
And of course we clean up after ourselves:
rm wordfile
rm guessfile
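grep -xvf guessfile wordfile prints the characters of wordfile that are not in guessfile. If you want the opposite (guessed characters that do not occur in the word), swap the arguments: wordfile guessfile instead of guessfile wordfile.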
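The same filtering can be done without temporary files by using bash process substitution. This sketch swaps sed for fold -w1 to put one character per line, and applies it to the alphabet, as the question asks; sorting is not needed for grep -f.

```shell
#!/usr/bin/env bash
alphabet=abcdefghijklmnopqrstuvwxyz
guesses=aetl

# fold -w1 emits one character per line; <( ... ) hands those lines to grep as files.
# -v: keep non-matching lines, -x: whole-line matches, -F: fixed strings, -f: read patterns from a file.
# paste -s -d '\0' joins the surviving lines with no separator.
remaining=$(grep -vxFf <(fold -w1 <<< "$guesses") <(fold -w1 <<< "$alphabet") | paste -s -d '\0' -)
echo "$remaining"
```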

Use awk to extract value from a line

I have these two lines within a file:
<first-value system-property="unique.setting.limit">3</first-value>
<second-value-limit>50000</second-value-limit>
where I'd like to get the following as output using awk or sed:
3
50000
This sed command does not work as I had hoped, and I suspect this is due to the quotes and delimiters in my lines:
sed -n '/WORD1/,/WORD2/p' /path/to/file
How can I extract the values I want from the file?
awk -F'[<>]' '{print $3}' input.txt
input.txt:
<first-value system-property="unique.setting.limit">3</first-value>
<second-value-limit>50000</second-value-limit>
Output:
3
50000
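sed -e 's/[a-zA-Z.<\/>= "-]//g' file
This deletes every letter and every punctuation character that occurs in these lines (including the double quotes around the attribute value), leaving only the digits.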
Using sed:
sed -E 's/.*limit"*>([0-9]+)<.*/\1/' file
Explanation:
.* takes care of everything that comes before the string limit
limit"* takes care of both the lines, one with limit" and the other one with just limit
([0-9]+) takes care of matching numbers and only numbers as stated in your requirement.
\1 is a back-reference to the first capturing group. When a pattern groups all or part of its content in a pair of parentheses, it captures that content and stores it temporarily in memory. For more details, please refer to https://www.inkling.com/read/introducing-regular-expressions-michael-fitzgerald-1st/chapter-4/capturing-groups-and
The script solution with parameter expansion:
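#!/bin/bash
# IFS= and -r keep each line exactly as read; the test handles a last line without a newline
while IFS= read -r line || test -n "$line" ; do
    value="${line%<*}"               # drop the closing tag, e.g. "...>3"
    printf "%s\n" "${value##*\>}"    # drop everything up to the last ">", e.g. "3"
done <"$1"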
output:
$ ./ltags.sh dat/ltags.txt
3
50000
Looks like XML to me, so assuming it forms part of some valid XML, e.g.
<root>
<first-value system-property="unique.setting.limit">3</first-value>
<second-value-limit>50000</second-value-limit>
</root>
You can use Perl's XML::Simple and do something like this:
perl -MXML::Simple -E '$xml = XMLin("file"); say $xml->{"first-value"}->{"content"}; say $xml->{"second-value-limit"}'
Output:
3
50000
If the XML structure is more complicated, then you may have to drill down a bit deeper to get to the values you want. If that's the case, you should edit the question to show the bigger picture.
Ashkan's awk solution is straightforward, but let me suggest a sed solution that accepts non-integer numbers:
sed -n 's/[^>]*>\([.[:digit:]]*\)<.*/\1/p' input.txt
This extracts the number between the first > character of the line and the following <. In my RE this "number" can be the empty string; if you don't want to accept an empty string, add the -r option to sed and replace \([.[:digit:]]*\) with ([.[:digit:]]+).
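Here is the stricter variant described above written out. It is a sketch that uses -E, which current GNU and BSD sed both accept (-r is the older GNU spelling); the third sample line is made up to show that an empty value is skipped.

```shell
#!/usr/bin/env bash
# -E: extended regex, so the group needs no backslashes and + is available.
# + requires at least one digit or dot, so an empty element prints nothing.
values=$(printf '%s\n' \
  '<first-value system-property="unique.setting.limit">3</first-value>' \
  '<second-value-limit>50000</second-value-limit>' \
  '<empty-value></empty-value>' |
  sed -nE 's/[^>]*>([.[:digit:]]+)<.*/\1/p')
echo "$values"
```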

Remove nth character from middle of string using Shell

I've been searching Google forever, and I cannot find an example of how to do this. I also do not grasp the concept of how to construct a regular expression for sed, so I was hoping someone could explain this to me.
I'm running a bash script against a file full of lines of text that look like this: 2222,H,73.82,04,07,2012
and I need to make them all look like this: 2222,H,73.82,04072012
I need to remove the last two commas, which are the 16th and 19th characters in the line.
Can someone tell me how to do that? I was going to use colrm, which is blessedly simple, but I can't seem to get it installed in Cygwin. Please and thank you!
I'd use awk for this:
awk -F',' -v OFS=',' '{ print $1, $2, $3, $4$5$6 }' inputfile
This takes a CSV file and prints the first, second and third fields, each followed by the output field separator (",") and then the fourth, fifth and sixth fields concatenated.
Personally I find this easier to read and maintain than regular expression-based solutions in sed and it will cope well if any of your columns get wider (or narrower!).
This will work on any string and will remove only the last 2 commas:
sed -e 's/\(.*\),\([^,]*\),\([^,]*\)$/\1\2\3/' infile.txt
Note that in my sed variant I have to escape parentheses, YMMV.
"I also do not grasp the concept of how to construct a regular expression for SED, so I was hoping someone could explain this to me."
The basic notation that people are telling you here is: s/PATTERN/REPLACEMENT/
Your PATTERN is a regular expression, which may contain parts that are in brackets. Those parts can then be referred to in the REPLACEMENT part of the command. For example:
> echo "aabbcc" | sed 's/\(..\)\(..\)\(..\)/\2\3\1/'
bbccaa
Note that the version of sed I'm using defaults to the "basic" RE dialect, where the brackets in expressions need to be escaped. You can do the same thing in the "extended" dialect:
> echo "aabbcc" | sed -E 's/(..)(..)(..)/\2\3\1/'
bbccaa
(In GNU sed, which you'd find on Linux, you can get the same result with the -r option instead of -E. I'm using OS X.)
I should say that for your task, I would definitely follow Johnsyweb's advice and use awk instead of sed. Much easier to understand. :)
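To connect the s/PATTERN/REPLACEMENT/ explanation to the question, here is a sketch that applies two groups to the sample line. It assumes every line has the six comma-separated fields shown in the question: \1 keeps the first four fields, \2 keeps the fifth, and the two commas around \2 are matched but not copied into the replacement.

```shell
#!/usr/bin/env bash
# \1 = "2222,H,73.82,04", \2 = "07"; the commas after \1 and after \2 are dropped.
result=$(echo "2222,H,73.82,04,07,2012" |
  sed -E 's/^([^,]*,[^,]*,[^,]*,[^,]*),([^,]*),/\1\2/')
echo "$result"   # 2222,H,73.82,04072012
```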
This should work:
sed -e 's~,~~4g' file.txt
It removes the 4th comma and every comma after it (combining a number with the g flag is a GNU sed extension).
echo "2222,H,73.82,04,07,2012" | sed -r 's/(.{15}).(..)./\1\2/'
Take 15 chars, drop one, take 2, drop one.
sed -E 's/(..),(..),(....)$/\1\2\3/' myfile.txt

Bash substring with pipes and stdin

My goal is to cut the output of a command down to an arbitrary number of characters (let's use 6). I would like to be able to append this command to the end of a pipeline, so it should be able to just use stdin.
echo "1234567890" | your command here
# desired output: 123456
I checked out awk, and I also noticed bash has substring expansion (${var:offset:length}), but both of the solutions I've come up with seem longer than they need to be and I can't shake the feeling I'm missing something easier.
I'll post the two solutions I've found as answers; I welcome any critique as well as new solutions!
Solution found, thank you to all who answered!
It was close between jcollado and Mithrandir; I will probably end up using both in the future. Mithrandir's answer produces an actual substring and makes the result easier to view, but jcollado's answer lets me pipe it to the clipboard with no EOL character in the way.
Do you want something like this:
echo "1234567890" | cut -b 1-6
What about using head -c/--bytes?
$ echo t9p8uat4ep | head -c 6
t9p8ua
I had come up with:
echo "1234567890" | ( read h; echo ${h:0:6} )
and
echo "1234567890" | awk '{print substr($0,1,6)}'
But both seemed like I was using a sledgehammer to hit a nail.
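The read-based idea can be turned into a reusable pipeline stage. This is a sketch with a hypothetical function name, truncate_lines; unlike head -c 6, which stops after 6 bytes of the whole stream, it truncates every line and counts characters rather than bytes.

```shell
#!/usr/bin/env bash
# Hypothetical helper: truncate every line read from stdin to $1 characters.
truncate_lines() {
  local n=$1 line
  # IFS= and -r keep leading spaces and backslashes intact;
  # the || test also handles a last line without a trailing newline.
  while IFS= read -r line || [ -n "$line" ]; do
    printf '%s\n' "${line:0:n}"
  done
}

echo "1234567890" | truncate_lines 6
```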
This might work for you:
printf "%.6s" 1234567890
123456
If your_command_here is cat:
% OUTPUT=t9p8uat4ep
% cat <<<${OUTPUT:0:6}
t9p8ua
