How to use grep/awk/sed to print until a certain character? - bash

I am a complete beginner at shell scripting. I am trying to iterate through a set of JSON files and extract a certain field from each one. Each JSON file has a "country":"xxx" field. Within a file the field occurs about 10k times, always with the same country name, so I only need the first occurrence, which I can get with "-m 1".
I tried to use grep for this but could not figure out how to extract the whole field, including the country name, at its first occurrence in each file.
for FILE in *.json;
do
grep -o -a -m 1 -h -r '"country":"' "$FILE";
done
I also tried piping the output through the pattern below, but it did not work:
| egrep -o '^[^"]+'
Actual Output:
"country":"
"country":"
"country":"
Desired Output:
"country":"romania"
"country":"united kingdom"
"country":"tajikistan"
I need the whole field, not just the key. Any help would be great. Thanks

There is one general answer to the question "I only want the first occurrence":
... | head -n 1
This means: whatever you do, take the head (the first lines) of the output; the -n switch lets you say how many lines you want (one in this case).
The same can be done for the last occurrence(s) by using tail instead of head (it also accepts the -n switch).

After trying many things, I found the pattern I was looking for:
grep -Po '"country":.*?[^\\]",' "$FILE" | head -n 1;
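A minimal sketch of an alternative that does not need -P, assuming the country value contains no escaped double quotes. The function name first_country is made up for illustration. Note that -m 1 counts matching lines rather than matches, so if a whole file sits on one line, -o would still print every match; head -n 1 keeps only the first.

```shell
# Print the first "country":"..." field found in one file.
# Assumption: the value contains no escaped double quotes (\").
first_country() {
    # [^"]* matches everything up to the next double quote;
    # -o prints only the matched text, -a treats the file as text.
    grep -o -a '"country":"[^"]*"' "$1" | head -n 1
}

for file in *.json; do
    [ -f "$file" ] || continue   # skip if the glob matched nothing
    first_country "$file"
done
```

For values that may contain escaped quotes, a real JSON parser is the safer tool than a regular expression.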

Related

grep a string between two patterns multiple instances in a file?

I'm new to bash scripting and I need to write a script that goes through log files about jobs that ran and extracts certain values, such as the memory used and the memory requested, so that I can calculate memory usage.
To begin, I'm simply trying to build a grep command that extracts a value between two patterns in a file, which will be my starting point for this script.
The file looks something like this:
20200429:04/29/2020 04:25:32;S;1234567.vpbs3;user=xx group=xxxxxx=_xxx_xxx_xxxx jobname=xx_xxxxxx queue=xxx ctime=1588148732 qtime=1588148732 etime=1588148732 start=1588148732 exec_host=xxx2/1*8 exec_vnode=(xx2:mem=402653184kb:ncpus=8) Resource_List.mem=393216mb Resource_List.ncpus=8 Resource_List.nodect=1 Resource_List.place=free Resource_List.preempt_targets=NONE Resource_List.Qlist=xxxq Resource_List.select=1:mem=393216mb:ncpus=8 Resource_List.walltime=24:00:00 resource_assigned.mem=402653184kb resource_assigned.ncpus=8
The values I need to extract are the memory values, such as the 393216mb after Resource_List.mem=. There are multiple jobs and dates, so the file goes on with many more entries like this, with different dates and numbers.
From going through similar questions online, I've come up with:
egrep -Eo 'Resource_List.mem=.{1,50}' sampleoutput.txt | cut -d "=" -f 2-
and I get multiple lines like this:
393216mb Resource_List.ncpus=8 Resource_List.nodec
and I'm stuck as to how to get only that '393216mb', as I've never really used grep or cut much. Any suggestions, even if they don't use grep, would be greatly appreciated!
Use:
grep -o -E 'Resource_List\.mem=[^ ]+|resource_assigned\.mem=[^ ]+' sampleoutput.txt
Very close! The . is a wildcard that matches any character; you want to match the digits and then the unit.
grep -Eo 'Resource_List\.mem=[0-9]+[a-z]+' sampleoutput.txt
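To get only the value (for example 393216mb) rather than the whole key=value pair, the match can be passed through cut, as the question already attempted. This is a sketch under the assumption that every value is digits followed by a lowercase unit; mem_value is a hypothetical helper name.

```shell
# Print only the value of each KEY=<digits><unit> field in a log file.
# $1 = key as an extended regex (escape the dot), $2 = log file
mem_value() {
    # grep -o emits just "key=value"; cut keeps what follows the "=".
    grep -oE "$1=[0-9]+[a-z]+" "$2" | cut -d '=' -f 2
}

# Example calls (file name taken from the question):
#   mem_value 'Resource_List\.mem' sampleoutput.txt
#   mem_value 'resource_assigned\.mem' sampleoutput.txt
```

Because the key is matched in full, fields such as Resource_List.select=1:mem=... are not picked up by accident.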

Defining a variable using head and cut

This might be an easy question; I'm new to bash and haven't been able to find the solution.
I'm writing the following script:
for file in `ls *.map`; do
ID=${file%.map}
convertf -p ${ID}_par #this is a program that I use, no problem
NAME=head -n 1 ${ID}.ind | cut -f1 -d":" #Now: This step is the problem: don't seem to be able to make a proper NAME function. I just want to take the first column of the first line of the file ${ID}.ind
It gives me this error:
line 5: bad substitution
any help?
Thanks!
There are a couple of issues in your code:
for file in `ls *.map` does not do what you want. It will fail e.g. if any of the filenames contains a space or *, but there's more. See http://mywiki.wooledge.org/BashPitfalls#for_i_in_.24.28ls_.2A.mp3.29 for details.
You should just use for file in *.map instead.
ALL_UPPERCASE names are generally used for system variables and built-in shell variables. Use lowercase for your own names.
That said,
for file in *.map; do
id="${file%.map}"
convertf -p "${id}_par"
name="$(head -n 1 "${id}.ind" | cut -f1 -d":")"
...
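looks like it would work. The original line NAME=head -n 1 ... does not capture anything: the shell treats NAME=head as a temporary environment assignment and then tries to run -n as a command. Use $( cmd ) to capture the output of a command in a string.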
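The same idea as a self-contained sketch, assuming the .ind files are colon-delimited as in the question. The helper name first_field is made up for illustration, and the convertf step is left out.

```shell
# Print the first colon-separated field of the first line of a file.
first_field() {
    head -n 1 "$1" | cut -d ':' -f 1
}

for file in *.map; do
    [ -e "$file" ] || continue          # skip if the glob matched nothing
    id="${file%.map}"                   # strip the .map suffix
    name="$(first_field "${id}.ind")"   # capture the command's output
    printf '%s -> %s\n' "$id" "$name"
done
```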

How to use grep to filter out words like food and foot but not liked foody or footed

Hey people, I know what I want to do: use the grep command to filter out words like foot, food and fool from a dictionary file, but still retain words like footed and foodilicous.
This is the code I have so far:
cat /home1/02836/sulstice/dictionary.txt | grep -E foo | grep -vE '^foo'
The cat command just pulls in the dictionary txt file, which is just a list of words.
For the last command, I feel there should be something I can add to ^foo to say "if there is exactly one more character and then the end of the word, omit that too".
There must be a way to do this with grep. Has anyone got a way?
Thank you
With GNU grep, you can use the -w flag to restrict the match to full words, so:
grep -w 'foo[[:alpha:]]' /home1/02836/sulstice/dictionary.txt
will match full words which consist of foo plus one letter. Add -v to drop those lines instead of printing them.
Note that there is no need for cat. You can tell grep which file(s) to search in.
Assuming that dictionary.txt has a single word per line, you should be able to just anchor the pattern: ^foo[[:alpha:]]$
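Putting the pieces together, here is a sketch that keeps words containing foo but drops those that are exactly foo plus one letter. It assumes one word per line; filter_foo is a hypothetical name. The -x option makes the pattern match the whole line, which has the same effect as anchoring with ^ and $.

```shell
# Keep words containing "foo", except those that are exactly
# "foo" followed by a single letter (foot, food, fool, ...).
# $1 = dictionary file with one word per line
filter_foo() {
    grep 'foo' "$1" | grep -vx 'foo[[:alpha:]]'
}

# Example call (path taken from the question):
#   filter_foo /home1/02836/sulstice/dictionary.txt
```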

Unix script is sorting the input

I am having some trouble with my homework assignment. Maybe you can advise what to read or which commands I can use in order to create the following:
Create a shell script test that will act as follows:
The script will display the following message on the terminal screen:
Enter file names (wild cards OK)
The script will read the list of names.
For each file on the list that is a proper file, display a table giving the ten most frequently used words in the file, sorted with the most frequent first. Include the count.
Repeat steps 1-3 over and over until the user indicates end-of-file. This is done by entering the single character Ctrl-d as a file name.
Here is what I have so far:
#!/bin/bash
echo 'Enter file names (wild cards OK)'
read input_source
if test -f "$input_source"
then
I usually ignore homework questions that don't show some progress and effort to learn something, but you're so beautifully cheeky that I'll make an exception.
Here is what you want:
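while read -r -e -p 'Files?> ' files
do
    for file in $files   # unquoted on purpose so that wildcards expand
    do
        echo "== word counts for $file =="
        # one word per line, count duplicates, most frequent first, keep ten
        tr -cs '[:alnum:]' '\n' < "$file" | sort | uniq -c | sort -nr | head -n 10
    done
done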
And now, at least try to understand what the above is doing...
PS: voting to close...
How to find the ten most frequently used words in a file
Assumptions:
The files given have one word per line.
The files are not huge, so efficiency isn't a primary concern.
You can use sort and uniq -c to count how often each value occurs, then a reverse numeric sort to put the counts in descending order, and head to keep only the first ten.
sort "$afile" | uniq -c | sort -rn | head -n 10
Some tips:
Keep the complete bash manual at hand: it's daunting at first, but it's an invaluable reference -- http://www.gnu.org/software/bash/manual/bashref.html
You can get help about bash builtins at the command line: try help read
The read command can print the prompt itself with the -p option (see previous tip)
You'll accomplish the last step with a while loop:
while read -p "the prompt" filenames; do
# ...
done
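The tips above can be combined into one sketch of the assignment. It assumes bash (read -p is a bash feature) and treats a word as a run of letters and digits; upper and lower case are counted separately. The function names top_words and word_count_loop are made up for illustration.

```shell
# Print the ten most frequent words of a file, most frequent first,
# each preceded by its count.
top_words() {
    tr -cs '[:alnum:]' '\n' < "$1" |   # one word per line
        sort | uniq -c |               # count identical words
        sort -rn | head -n 10          # highest counts first, keep ten
}

# Prompt repeatedly until end-of-file (Ctrl-d) is entered.
word_count_loop() {
    while read -r -p 'Enter file names (wild cards OK): ' names; do
        for file in $names; do         # unquoted so wildcards expand
            [ -f "$file" ] || continue # only regular files
            echo "== $file =="
            top_words "$file"
        done
    done
}
```

Calling word_count_loop at the end of the script starts the interactive prompt.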

bash grep 'random matching' string

Is there a way to grab a 'random matching' string via bash from a text file?
I am currently grabbing a download link via bash, curl & grep from an online text file.
Example:
DOWNLOADSTRING="$(curl -o - "http://example.com/folder/downloadlinks.txt" | grep "$VARIABLE")"
from online text file which contains
http://alphaserver.com/files/apple.zip
http://alphaserver.com/files/banana.zip
where $VARIABLE is something the user selected.
This works great, but I wanted to add some mirrors to the text file.
So when the variable 'banana' is selected, the text file which I grep contains:
http://alphaserver.com/files/apple.zip
http://betaserver.com/files/apple.zip
http://gammaserver.com/files/apple.zip
http://deltaserver.com/files/apple.zip
http://alphaserver.com/files/banana.zip
http://betaserver.com/files/banana.zip
http://gammaserver.com/files/banana.zip
http://deltaserver.com/files/banana.zip
The code should pick a random 'banana' string and store it in the 'DOWNLOADSTRING' variable.
The current code above only works with one matching string in the text file, since it grabs every line containing 'banana'.
What this is for: I wanted to add some mirror download links for the files in the online text file, and the current code doesn't allow that.
Can I let grep grab one random 'banana' string (and not all of them)?
See the question "What's an easy way to read random line from a file in Unix command line?" for how to get a random line after grep. rl seems like a good candidate.
Then do a grep ... | rl | head -n 1
Try this:
DOWNLOADSTRING="$(curl -o - "http://example.com/folder/downloadlinks.txt" | grep "$VARIABLE" | sort -R | head -n 1)"
The output will be random-sorted and then the first line will be selected.
If mirrors.txt has the following data, which you provided in your question:
http://alphaserver.com/files/apple.zip
http://betaserver.com/files/apple.zip
http://gammaserver.com/files/apple.zip
http://deltaserver.com/files/apple.zip
http://alphaserver.com/files/banana.zip
http://betaserver.com/files/banana.zip
http://gammaserver.com/files/banana.zip
http://deltaserver.com/files/banana.zip
Then you can use the following command to get a random "matched string" from the file:
grep -E "${VARIABLE}" mirrors.txt | shuf -n1
Then you can store it in the variable DOWNLOADSTRING by wrapping the command in a function and calling it like so:
rand_mirror_call() { grep -E "${1}" mirrors.txt | shuf -n1; }
DOWNLOADSTRING="$(rand_mirror_call "${VARIABLE}")"
This will give you a random matching line from the text file based on the user's ${VARIABLE} input. It is a lot less typing this way.
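A slightly more defensive sketch of the same approach, assuming shuf (GNU coreutils) is available. It takes the file as a parameter and uses grep -F so that the user's input is treated as a literal string rather than a regular expression; random_mirror is a hypothetical name.

```shell
# Print one randomly chosen line that contains the search term.
# $1 = search term (matched literally), $2 = file with one URL per line
random_mirror() {
    # -F: fixed-string match; --: a term starting with "-" is not an option
    grep -F -- "$1" "$2" | shuf -n 1
}

# Example, with the names used in the question:
#   DOWNLOADSTRING="$(random_mirror "$VARIABLE" mirrors.txt)"
```

If no line matches, the function prints nothing, so the caller may want to check for an empty DOWNLOADSTRING before downloading.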
