Parsing data in Bash - bash

I have a file with lots of Rubbish inside. All of it is in one line.
But there are often things like:
"var":"value"
Before and after are different characters ...
What I want to do is, to extract only the above mentioned format and put them into a single line. Any ideas how I could realize it in Shell scripting?
Best regards,
Alex

I believe
grep -o '"[^"]*":"[^"]*"' yourFile.txt > yourOutput.txt
would do the trick:
> echo 'xxx "a":"b" yyy"x":"y"' | grep -o '"[^"]*":"[^"]*"'
"a":"b"
"x":"y"

Related

grep a string between two patterns multiple instances in a file?

I'm new to bash scripting and I need to make a script that will go through files of logs about jobs that ran and I need to extract certain values such as the memory used and then the memory requested to calculate the memory used.
To begin this I'm simply trying to get a grep command that will grep a value between two patterns in a file, which will be my starting point for this script.
The file looks something like this:
20200429:04/29/2020 04:25:32;S;1234567.vpbs3;user=xx group=xxxxxx=_xxx_xxx_xxxx jobname=xx_xxxxxx queue=xxx ctime=1588148732 qtime=1588148732 etime=1588148732 start=1588148732 exec_host=xxx2/1*8 exec_vnode=(xx2:mem=402653184kb:ncpus=8) Resource_List.mem=393216mb Resource_List.ncpus=8 Resource_List.nodect=1 Resource_List.place=free Resource_List.preempt_targets=NONE Resource_List.Qlist=xxxq Resource_List.select=1:mem=393216mb:ncpus=8 Resource_List.walltime=24:00:00 resource_assigned.mem=402653184kb resource_assigned.ncpus=8
The values in bold are what I need to extract. Its multiple jobs and dates, so the file goes on with multiple paragraphs like this of data with different dates and numbers.
From going through similar questions online, I've come up with:
egrep -Eo 'Resource_List.mem=.{1,50}' sampleoutput.txt | cut -d "=" -f 2-
and I get multple lines of this:
393216mb Resource_List.ncpus=8 Resource_List.nodec
and I'm stuck as to how to get only that '393216mb' as I've never really used grep or cut much. Any suggestions, even if its not using grep, would be greatly appreciated!
Use:
grep -o -E 'Resource_List.mem=[^\ ]+|resource_assigned.mem=[^\ ]+'
Very close! . is a wildcard, you want to match numbers.
egrep -Eo 'Resource_List.mem=[0-9]*..' sampleoutput.txt

Unable to modify file as part of entry point command

My Dockerfile's entry point CMD executes a shell script to modify a local file based on an environment variable before executing my application (Flask). The shell script is like so:
cat static/login.html | sed "s/some_match/some_substitute/g" > static/login.html
However, I am finding that the resulting file is zero bytes. Any ideas what might be going on?
Thanks.
I can reproduce your problem, but not explain it. Maybe other answers follow.
But here is a solution which fixes the problem:
Use a different file name for the output than for the input.
cat input.txt | sed "s/a/b/g" > input2.txt
Alternatively, use the -i.bak option.
sed -i.bak "s/b/a/g" input.txt
I can only speculate about what exactly is going on:
Maybe the output of your pipe is opened for writing (non-appending) before the input is read.

Parsing simple text file and integrate content in bash script

I want to automatically parse a text file with a content like this:
car 12345 W
train 54321 D
To be integrated in a new bash file. The content after that should look like that:
curl http://example.com/?vehicle=car&number=12345&period=W
curl http://example.com/?vehicle=train&number=54321&period=D
My problem is that I really don't now how to realize that or which program to use, sed, awk, etc..
What do I have to do?
Here's probably the funniest answer: assume your file is file.txt, then just do this:
printf 'curl http://example.com/?vehicle=%s&number=%s&period=%s\n' $(<file.txt)
It:
is 100% pure bash,
has no explicit loops
is the shortest answer
is very funny
Hope this helps :)
Try this:
while read vehicle number period ; do
echo curl http://example.com/\?vehicle="$vehicle"\&number="$number"\&period="$period"
done < input.txt
Maybe use sed to interleave the URL arguments with other parts. Guessing from the sample you posted:
sed 's/\([a-z]*\) \([0-9]*\) \([A-Z]\)/curl http:\/\/example.com\/?vehicle=\1\&number=\2\&period=\3/'

Trimming pathnames beyond a keyword (awk, sed, ?)

I want to trim a pathname beyond a certain point after finding a keyword. I'm drawing a blank this morning.
/home/quikq/1.0/dev/Java/tsupdater/src/tsupdater.java
I want to find the keyword Java, save the pathname beyond that (tsupdater), then cut everything off after the Java portion.
I don't know if this is what you want, but you can split the pathname into two with:
echo "/home/quikq/1.0/dev/Java/tsupdater/src/tsupdater.java" | sed 'h;s/.*Java//p;g;s/Java.*/Java/'
Which outputs:
/tsupdater/src/tsupdater.java
/home/quikq/1.0/dev/Java
If you would like to save the second part into a file part2.txt and print the first part, you could do:
echo "/home/quikq/1.0/dev/Java/tsupdater/src/tsupdater.java" | sed 'h;s/.*Java//;wpart2.txt;g;s/Java.*/Java/'
If you're writing a shell script:
myvar="/home/quikq/1.0/dev/Java/tsupdater/src/tsupdater.java"
part1="${myvar%Java*}Java"
part2="${myvar#*Java/}"
Hope this helps =)
take one you need:
kent$ echo "/home/quikq/1.0/dev/Java/tsupdater/src/tsupdater.java"|sed -r 's#(.*Java/[^/]*).*#\1#g'
/home/quikq/1.0/dev/Java/tsupdater
kent$ echo "/home/quikq/1.0/dev/Java/tsupdater/src/tsupdater.java"|sed -r 's#(.*Java).*#\1#g'
/home/quikq/1.0/dev/Java
kent$ echo "/home/quikq/1.0/dev/Java/tsupdater/src/tsupdater.java"|sed -r 's#.*Java/([^/]*).*#\1#g'
tsupdater
I'm not entirely sure what you want as output (please specify more clearly), but this command:
echo "/home/quikq/1.0/dev/Java/tsupdater/src/tsupdater.java" | sed 's/.*Java//'
results in:
/tsupdater/src/tsupdater.java
If you want the preceding part then this command:
echo "/home/quikq/1.0/dev/Java/tsupdater/src/tsupdater.java" | sed 's/Java.*//'
results in:
/home/quikq/1.0/dev/
Like I said, I was having a weird morning, but it dawned on me.
echo /home/quikq/1.0/dev/Java/TSUpdater/src/TSUpdater.java | sed s/Java.*//g
Yields
/home/quikq/1.0/dev
Lots of great tips here for chopping it up different ways though. Thanks a bunch!

BASH file attribute gymnastics: How do I easily get a file with full paths and privileges?

Dear Masters of The Command Line,
I have a directory tree for which I want to generate a file that contains on two entries per line: full path for each file and the corresponding privileges of said file.
For example, one line might contain:
/v1.6.0.24/lib/mylib.jar -r-xr-xr-x
The best way to generate the left hand column there appears to be find. However, because ls doesn't seem to have a capability to either read a list of filenames or take stdin, it looks like I have to resort to a script that does this for me. ...Cumbersome.
I was sure I've seen people somehow get find to run a command against each file found but I must be daft this morning as I can't seem to figure it out!
Anyone?
In terms of reading said file there might be spaces in filenames, so it sure would be nice if there was a way to get some of the existing command-line tools to count fields right to left. For example, we have cut. However, cut is left-hand-first and won't take a negative number to mean start the numbering on the right (as seems the most obvious syntax to me). ... Without having to write a program to do it, are there any easy ways?
Thanks in advance, and especial thanks for explaining any examples you may provide!
Thanks,
RT
GNU findutils 4.2.5+:
find -printf "$PWD"'/%p %M\n'
It can also be done with ls and awk:
ls -l -d $PWD/* | awk '{print $9 " " $1}' > my_files.txt
stat -c %A file
Will print file permissions for file.
Something like:
find . -exec echo -ne '{}\t\t' ';' -exec stat -c %A {} ';'
Will give you a badly formatted version of what your after.
It is made much trickier because you want everything aligned in tables. You might want to look into the 'column' command. TBH I would just relax my output requirements a little bit. Formatting output in SH is a pain in the ass.
bash 4
shopt -s globstar
for file in /path/**
do
stat -c "%n %A" "$file"
done

Resources