Using grep (on Windows) to find a string contained by 's

I'm trying to write a shell script on Windows, which is why I'm not using something like awk or grep -o, etc.
I'm trying to parse my angular files for the controllers being used. For example, I'll have a line like this in a file.
widgetList.controller('widgetListController', [
What I want is to pull out widgetListController
Here's what I've got so far:
grep -h "[[:alpha:]]*Controller[[:alpha:]]*" C:/workspace/AM/$file | tr ' ' '\n' | grep -h "[[:alpha:]]*Controller[[:alpha:]]*"
It works decently well, but it will pull out the entire line like so:
widgetList.controller('widgetListController', rather than just the word.
Also in instances where the controller is formatted as so:
controller : 'widgetListController',
It returns 'widgetListController',
How can I adjust this to simply return whatever is between the 's? I've tried various ways of escaping that character but it doesn't seem to be working.

You can use this sed command:
sed "/Controller/s/.*'\([^']*\)'.*$/\1/" C:/workspace/AM/$file
Output:
widgetListController
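You can sanity-check the sed expression offline by feeding it both line formats from the question:

```shell
# Both formats from the question, piped through the answer's sed expression.
# On lines containing "Controller", everything outside the single quotes is
# stripped, leaving only the quoted name.
printf "widgetList.controller('widgetListController', [\ncontroller : 'widgetListController',\n" \
  | sed "/Controller/s/.*'\([^']*\)'.*$/\1/"
```

Both lines reduce to widgetListController: the expression keeps the capture group `\([^']*\)` (everything between the quotes) and discards the rest of the line.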

Related

Text Processing - how to remove part of string from search results using sed?

I am parsing through .xml files looking for names that are inside HTML tags.
I have found what I need, but I would just like to keep the family names.
This is what I have so far (a grep for the names plus clean-up of the result, removing the tags and the file name; I will later sort them and keep only unique names):
grep -oP '<name>([A-ZÖÄÜÕŽS][a-zöäüõžš]*)[\s-]([A-ZÖÄÜÕŽS][a-zöäüõžš]*)</name>' *.xml --colour | sed -e 's/<[^>]*>//g' | sed 's/la[0-9]*//' | sed 's/$*.xml://'
The output looks like this:
Mart Kreos
Hans Väär
Karel Väär
Jaan Tibbin
Jüri Kull
I would like to keep the family names, but remove the first names.
I tried to use the following command, but it only worked for some names and not for the others:
sed -r 's/([A-ZÖÄÜÕŽŠ][a-zöäüõžš]+[ ])([A-ZÖÄÜÕŽS][a-zöäüõžš]+)/\2/g'
You should use cut; it is better suited to what you're trying to achieve here, and you avoid struggling with multi-byte UTF-8 characters in regular expressions. (Incidentally, the second character class in your sed pattern has a plain S where Š should be, which is why the substitution fails for some names.)
This would give you the expected result for all names in your sample output:
cut -d ' ' -f 2
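Applied to the sample output above, the whole clean-up stage reduces to:

```shell
# Feed the sample names through cut: split on spaces and keep the second
# field (the family name). cut only has to match the ASCII space delimiter,
# so the multi-byte letters inside the names are passed through untouched.
printf 'Mart Kreos\nHans Väär\nJaan Tibbin\n' | cut -d ' ' -f 2
```

which prints Kreos, Väär and Tibbin, one per line.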

Defining a variable using head and cut

This might be an easy question; I'm new to bash and haven't been able to find the solution to my question.
I'm writing the following script:
for file in `ls *.map`; do
ID=${file%.map}
convertf -p ${ID}_par #this is a program that I use, no problem
NAME=head -n 1 ${ID}.ind | cut -f1 -d":" #This step is the problem: I don't seem to be able to assign NAME properly. I just want to take the first column of the first line of the file ${ID}.ind
It gives me the return
line 5: bad substitution
any help?
Thanks!
There are a couple of issues in your code:
for file in `ls *.map` does not do what you want. It will fail e.g. if any of the filenames contains a space or *, but there's more. See http://mywiki.wooledge.org/BashPitfalls#for_i_in_.24.28ls_.2A.mp3.29 for details.
You should just use for file in *.map instead.
ALL_UPPERCASE names are generally used for system variables and built-in shell variables. Use lowercase for your own names.
That said,
for file in *.map; do
    id="${file%.map}"
    convertf -p "${id}_par"
    name="$(head -n 1 "${id}.ind" | cut -f1 -d":")"
    ...
done
looks like it would work. We just use $( cmd ) to capture the output of a command in a string.
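You can try the command-substitution step on its own with a throwaway file (demo.ind below is a made-up sample, not one of your real files):

```shell
# Create a two-line sample file; the first field of the first line is "sample1".
printf 'sample1:pop1:extra\nsample2:pop2:extra\n' > demo.ind

# $( ... ) captures the pipeline's output; head takes the first line,
# cut takes the first colon-separated field of it.
name="$(head -n 1 demo.ind | cut -f1 -d":")"
echo "$name"
```

This prints sample1, which is exactly what the broken `NAME=head -n 1 ...` line was trying (and failing) to assign.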

using curl to call data, and grep to scrub output

I am attempting to call an API for a series of IDs in a bash script: for each ID I use curl to query a machine for some information, then scrub the output for a select few fields before writing it out.
#!/bin/bash
url="http://<myserver:myport>/ws/v1/history/mapreduce/jobs"
for a in $(cat jobs.txt); do
content="$(curl "$url/$a/counters" "| grep -oP '(FILE_BYTES_READ[^:]+:\d+)|FILE_BYTES_WRITTEN[^:]+:\d+|GC_TIME_MILLIS[^:]+:\d+|CPU_MILLISECONDS[^:]+:\d+|PHYSICAL_MEMORY_BYTES[^:]+:\d+|COMMITTED_HEAP_BYTES[^:]+:\d+'" )"
echo "$content" >> output.txt
done
This is for a MapR project I am currently working on to peel some fields out of the API.
In the example above, I only care about 6 fields, though the output that comes from the curl command gives me about 30 fields and their values, many of which are irrelevant.
If I use the curl command in a standard prompt, I get the fields I am looking for, but when I add it to the script I get nothing.
The quoting is misplaced: the closing quote after counters ends the URL argument, and the pipe plus the whole grep command are passed to curl as an extra quoted argument instead of forming a pipeline. Quote only the URL and keep the pipe outside the quotes:
content="$(curl "$url/$a/counters" | grep -oP '(FILE_BYTES_READ[^:]+:\d+)|FILE_BYTES_WRITTEN[^:]+:\d+|GC_TIME_MILLIS[^:]+:\d+|CPU_MILLISECONDS[^:]+:\d+|PHYSICAL_MEMORY_BYTES[^:]+:\d+|COMMITTED_HEAP_BYTES[^:]+:\d+')"
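With the quoting fixed, the grep stage can be checked offline against a canned fragment (the counter names are from the question; the values and the surrounding text are made up):

```shell
# A made-up slice of counter output. grep -oP prints only the parts that
# match the alternation, one match per line, and skips the other fields.
printf '"FILE_BYTES_READ":12345,"RACK_LOCAL_MAPS":1,"CPU_MILLISECONDS":6789\n' \
  | grep -oP 'FILE_BYTES_READ[^:]+:\d+|CPU_MILLISECONDS[^:]+:\d+'
```

This prints the two wanted name:value pairs and drops RACK_LOCAL_MAPS, mirroring what the full six-field pattern does to the real API output.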

How to add input into a website that is retrieved by Shell script so that the out can be printed out?

I have the following shell script, which downloads a website into a variable; this is as far as I have got. The site (freegeoip.net) accepts an IP address and returns its geographical location. I would like to pass an IP address as an argument when I execute the script, so that it prints out the location for that address. Can anyone help?
#! /bin/bash
read input
content=$(wget http://freegeoip.net -q )
echo $content
Save the following as a file called getgeo:
#!/bin/bash
location=$(curl -s "http://freegeoip.net/csv/$1")
echo "$location"
Then use it like this:
chmod +x getgeo
./getgeo 141.20.1.33
"141.20.1.33","DE","Germany","16","Berlin","Berlin","","52.5167","13.4000","",""
Or, if you just want the 5th and 3rd field and no quotes, do this:
./getgeo 141.20.1.33 | tr -d '"' | awk -F, '{print $5,$3}'
Berlin Germany
Or, you can do the trimming inside the script itself:
#!/bin/bash
location=$(curl -s "http://freegeoip.net/csv/$1")
echo "$location" | tr -d '"' | awk -F, '{print $5,$3}'
If you prefer parsing XML or JSON, you can change the /csv/ to /xml/ or /json/ and you will get the following:
<?xml version="1.0" encoding="UTF-8"?> <Response> <Ip>92.238.99.46</Ip> <CountryCode>GB</CountryCode> <CountryName>United Kingdom</CountryName> <RegionCode>E6</RegionCode> <RegionName>Gloucestershire</RegionName> <City>Gloucester</City> <ZipCode>GL3</ZipCode> <Latitude>51.8456</Latitude> <Longitude>-2.1575</Longitude> <MetroCode></MetroCode> <AreaCode></AreaCode> </Response>
or JSON
{"ip":"141.20.1.33","country_code":"DE","country_name":"Germany","region_code":"16","region_name":"Berlin","city":"Berlin","zipcode":"","latitude":52.5167,"longitude":13.4,"metro_code":"","areacode":""}
Notes:
The command tr -d '"' removes all double quotes from whatever it receives as input.
The -F, switch tells awk to use the comma as the field separator.
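Both clean-up steps can be exercised without a network call by echoing the sample CSV line from above through the same pipeline:

```shell
# The CSV line returned for 141.20.1.33 (copied from the example above):
# strip the double quotes, then print field 5 (city) and field 3 (country).
echo '"141.20.1.33","DE","Germany","16","Berlin","Berlin","","52.5167","13.4000","",""' \
  | tr -d '"' | awk -F, '{print $5,$3}'
```

which prints Berlin Germany, just as the getgeo pipeline does against the live service.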
You may want to use curl; go ahead and refer to the documentation on the following site:
http://curl.haxx.se/docs/httpscripting.html#GET
The simplest and most common request/operation made using HTTP is to get a URL. The URL could itself refer to a web page, an image or a file. The client issues a GET request to the server and receives the document it asked for. If you issue the command line
curl http://curl.haxx.se
you get a web page returned in your terminal window. The entire HTML document that that URL holds.
Therefore, you can achieve what you want by redirecting the output of curl freegeoip.net/{format}/{ip_or_hostname} to a file, and then grep the info you want from it.

reverse geocoding in bash

I have a GPS unit which extracts longitude and latitude and outputs them as a Google Maps API link:
http://maps.googleapis.com/maps/api/geocode/xml?latlng=51.601154,-0.404765&sensor=false
From this I'd like to call it via curl and display the "short_name" on line 20:
"short_name" : "Northwood",
so I'd just like to be left with
Northwood
so something like
curl -s http://maps.googleapis.com/maps/api/geocode/xml?latlng=latlng=51.601154,-0.404765&sensor=false sed sort_name
Mmmm, this is kind of quick and dirty:
curl -s "http://maps.googleapis.com/maps/api/geocode/json?latlng=40.714224,-73.961452&sensor=false" | grep -B 1 "route" | awk -F'"' '/short_name/ {print $4}'
Bedford Avenue
It looks for the line before the line with "route" in it, then the word "short_name" and then prints the 4th field as detected by using " as the field separator. Really you should use a JSON parser though!
Notes:
This doesn't require you to install anything.
I look for the word "route" in the JSON because you seem to want the road name - you could equally look for anything else you choose.
This isn't a very robust solution as Google may not always give you a route, but I guess other programs/solutions won't work then either!
You can play with my solution by successively removing parts from the right hand end of the pipeline to see what each phase produces.
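The pipeline's extraction logic can also be checked offline on a canned two-line fragment shaped like Google's JSON response (the lines below are a hand-written sample, not live output):

```shell
# In the JSON, the "short_name" line precedes the line carrying the "route"
# type, so grep -B 1 keeps both lines; awk then splits on double quotes and
# prints the value field ($4) of the short_name line.
printf '   "short_name" : "Bedford Avenue",\n   "types" : [ "route" ]\n' \
  | grep -B 1 '"route"' | awk -F'"' '/short_name/ {print $4}'
```

This prints Bedford Avenue, matching the output shown for the live call above.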
EDITED
Mmm, you have changed from JSON to XML, I see... well, this parses out what you want, but I note you are now looking for a locality whereas before you were looking for a route or road name? Which do you want?
curl -s "http://maps.googleapis.com/maps/api/geocode/xml?latlng=51.601154,-0.404765&sensor=false" | grep -B1 locality | grep short_name| head -1|sed -e 's/<\/.*//' -e 's/.*>//'
The "grep -B1" looks for the line before the line containing "locality". The "grep short_name" then gets the locality's short name. The "head -1" discards all but the first locality if there are more than one. The "sed" stuff removes the <> XML delimiters.
This isn't text; it's structured JSON. You don't want the value after the colon on line 12; you want the value of short_name in the address_component with type 'route' from the result.
You could do this with jsawk or python, but it's easier to get it from XML output with xmlstarlet, which is lighter than python and more available than jsawk. Install xmlstarlet and try:
curl -s 'http://maps.googleapis.com/maps/api/geocode/xml?latlng=40.714224,-73.961452&sensor=false' \
| xmlstarlet sel -t -v '/GeocodeResponse/result/address_component[type="route"]/short_name'
This is much more robust than trying to parse JSON as plaintext.
The following seems to work, assuming the short_name is always at line 12:
curl -s 'http://maps.googleapis.com/maps/api/geocode/json?latlng=40.714224,-73.961452&sensor=false' | sed -n -e '12s/^.*: "\([a-zA-Z ]*\)",/\1/p'
Or, if you are using the XML API and want to trap the short_name on line 20:
curl -s 'http://maps.googleapis.com/maps/api/geocode/xml?latlng=51.601154,-0.404765&sensor=false' | sed -n -e '19s/<short_name>\([a-zA-Z ]*\)<\/short_name>/\1/p'
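The line-addressed sed can be verified offline too; the address here is 2 because this hand-made two-line sample stands in for the real response, where the short_name sits on line 20:

```shell
# sed -n suppresses default output; the '2s/.../p' only substitutes and
# prints on line 2, stripping the <short_name> tags around the value.
printf '<City>Gloucester</City>\n<short_name>Northwood</short_name>\n' \
  | sed -n -e '2s/<short_name>\([a-zA-Z ]*\)<\/short_name>/\1/p'
```

This prints Northwood. As noted above, though, pinning a field to a line number is fragile; an XML-aware tool like xmlstarlet is more robust.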
