I'm trying to fetch Yammer followers using the REST API below:
https://www.yammer.com/api/v1/users.json
The API returns details for each user. From this, I only need to extract the followers count:
{"type":"user","id":1517006975,"network_id":461,"stats":{"following":0,"followers":0,"updates":0}}
The page size is limited to 50 users per request, and as we have 100,000+ users, I need to iterate 2,000+ times to get the whole dump, which is slow.
So I need a method to extract just the necessary data directly.
I am using a shell script + Pentaho.
I think you have two options.
If you are bound to shell, you could run the JSON response through a series of sed silliness to get a list that you can then parse more effectively with shell tools. Something like
curl http://foo.com | sed 's/,/\n/g'
will get you something more row-based, and from there you can parse it out with more sed, or with awk, cut and tr.
Otherwise, look at jq: it is a statically linked, standalone C binary that allows really nice filtering of JSON.
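For comparison, here is a minimal Python sketch of the same filtering, assuming each users.json page is a JSON array of user objects shaped like the sample record in the question. It keeps only `id` and `stats.followers`, roughly what a jq filter such as `.[] | {id, followers: .stats.followers}` would print. It does not reduce the number of requests; `followers_by_user` is a hypothetical helper name.

```python
import json


def followers_by_user(page_text):
    """Map user id -> followers count for one page of users.json output.

    Assumes the page is a JSON array of objects like
    {"type":"user","id":...,"stats":{"followers":...}}.
    """
    users = json.loads(page_text)
    return {
        user["id"]: user.get("stats", {}).get("followers", 0)
        for user in users
        if user.get("type") == "user"
    }


page = '[{"type":"user","id":1517006975,"network_id":461,"stats":{"following":0,"followers":0,"updates":0}}]'
print(followers_by_user(page))  # {1517006975: 0}
```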
I'm using JMeter to support functional API testing and have run into a problem reading data from a CSV file. The data from the file is used to build a POST data body which contains something like this:
"wibbles" : ${wibble-var},
${wibble-var} is read from a CSV file and has the format:
["wibble1","wibble2","wibble3"]
... there are over 1000 wibble values in the list.
If "wibbles" : ["wibble1","wibble2","wibble3"]... is hard-coded into the POST body, then JMeter is happy, builds the POST request and does the business, but it has proved impossible to create a CSV file, even for the 3-value example above, that JMeter will parse. JMeter skips the thread containing the 'CSV read' without building or sending the POST request, so there's no response to examine, and a Debug Sampler is similarly skipped. I've heard rumours that doubling up the quotes can work, but I haven't been able to find the right syntax. Can anyone throw any light on this issue? Thanks
Double quotes will work if the CSV field is "wibble1,wibble2,wibble3" (the whole list inside one pair of quotes) and you set Allow quoted data to True in the CSV Data Set Config.
You can read that value and then use a BeanShell PreProcessor to convert it to the format "wibble1","wibble2","wibble3".
If you want to get the format "wibble1","wibble2","wibble3" directly, you can use \t as the delimiter and modify the data in the CSV file accordingly.
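The conversion step is plain string manipulation. Here is a minimal sketch of it in Python; a BeanShell PreProcessor would do the same with Java string methods, reading and writing the JMeter variable via `vars.get(...)` and `vars.put(...)`. `quote_items` is a hypothetical name, and the sketch assumes no individual wibble contains a comma.

```python
def quote_items(raw):
    """Turn 'wibble1,wibble2,wibble3' into '"wibble1","wibble2","wibble3"'.

    Assumes the items themselves never contain commas.
    """
    return ",".join('"' + item.strip() + '"' for item in raw.split(","))


print(quote_items("wibble1,wibble2,wibble3"))  # "wibble1","wibble2","wibble3"
```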
Trial and error led to the following solution.
The format of the single data variable I needed to parse is ["value1","value2","value3"] (i.e. a JSON array), and this is exactly what the CSV file contained (with a header name on the first row, of course), including the [ and ] brackets.
I modified the parameterised POST body to:
"wibbles": [${wibble-var}],
That is, I moved the square brackets out of the CSV file, so that the CSV file now contained just the quoted elements of the array:
"value1","value2","value3" etc
I then set the delimiter in the CSV Data Set Config to |
and Allow Quoted Data to False. This was a bit counter-intuitive, but without it JMeter would not read the whole comma-separated list of 2000 quoted strings as a single variable.
With these changes in place the script executed correctly.
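Python's `csv` module is not JMeter's CSV Data Set Config, but it shows the same effect in a minimal sketch: with `|` as the delimiter and quote handling switched off, the whole line of quoted, comma-separated values comes back as one field, which can then be wrapped in the brackets from the POST body. With quote handling on, the leading `"` is taken as the start of a quoted field and the value is mangled. The helper names are hypothetical.

```python
import csv


def read_single_field(line, honour_quotes):
    """Parse one data row with '|' as the delimiter and return its first field.

    honour_quotes=False is the analogue of "Allow Quoted Data = False".
    """
    quoting = csv.QUOTE_MINIMAL if honour_quotes else csv.QUOTE_NONE
    row = next(csv.reader([line], delimiter="|", quoting=quoting))
    return row[0]


def build_body(wibbles):
    """Wrap the raw CSV value in brackets, as in the parameterised POST body."""
    return '"wibbles": [' + wibbles + '],'


raw = '"value1","value2","value3"'
print(read_single_field(raw, honour_quotes=False))  # "value1","value2","value3"
print(read_single_field(raw, honour_quotes=True))   # quotes consumed, value mangled
print(build_body(read_single_field(raw, honour_quotes=False)))
```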
Thanks again for the responses, I will definitely look at the __String functions mentioned.
I would go for the following options:
If your "wibbles" are a single string which you need to pass as a JSON array, it might be a lot easier to access them via the __StringFromFile() or __FileToString() functions, like:
"wibbles" : ${__StringFromFile(/path/to/file/containing/wibbles,,,)},
If you need to access individual "wibbles" and your CSV file is basically a JSON file:
Add an HTTP Request Sampler to your test plan (before the one which sends these "wibbles") and configure it as follows:
Protocol: file
Path: c:/testdata/yourfile.csv
Add a JSON Path PostProcessor and use a JSON Path query to store the "wibbles" in JMeter Variable(s)
I need help understanding a weird problem with sed, bash and a while loop.
My data looks like this:
-File 1- CSV
account,hostnames,status,ipaddress,port,user,pass
-File 2- XML - This is a sample record set for two entries under one account
<accountname="account">
<cname="fqdn or simple name goes here">
<field="hostname">ahostname or ipv4 goes here</field>
<protocol>aprotocol</protocol>
<field="port">aportnumber</field>
<field="username">ausername</field>
<field="password">apassword</field>
</cname>
<cname="fqdn or simple name goes here">
<field="hostname">ahostname or ipv4 goes here</field>
<protocol>aprotocol</protocol>
<field="port">aportnumber</field>
<field="username">ausername</field>
<field="password">apassword</field>
</cname>
</accountname>
So far, I can add records from File 1 in between the respective account holder's entries in File 2. But when I need to remove records that no longer exist, it does not work: it wipes records from other accounts too, i.e. it does not delete only between the matched accountname.
I import from File 1 into File 2 with a while loop in my bash program:
-Bash Program excerpts-
# Read File2 line by line into F
cat File2 | while read F
do
# extract fields from F into variables
_vmname="$(echo $F |grep 'cname'| sed 's/<cname="//g' |sed 's/.\{2\}$//g')"
_account="$(echo $F | grep 'accountname' | sed 's/accountname="//g' |sed 's/.\{2\}$//g')"
# I then compare with File1 and look for stale records that are still in File2
if grep "$_vmname" File1 ;then
continue
else
# if not matched, delete between the respective accountname
sed -i '/'"$_account"'/,/<\/accountname>/ {/'"$_vmname"'/,/<\/cname>/d}' File2
fi
done
If I manually declare _vmname and _account and run
sed -i '/'"$_account"'/,/<\/accountname>/ {/'"$_vmname"'/,/<\/cname>/d}' File2
It removes the stale records from File2. When I let my bash script run, it does not.
I think I have three problems:
Reading _vmname and _account inside a loop makes the script read them numerous times. Any better way to do this is appreciated.
I do not think the sed statement for matching these two patterns and then deleting works the way I want inside a while loop.
I may have a logic problem with my thought chain.
Any pointers, and please no awk, perl, lxml or python for this one.
Thanks!
and please no awk
I appreciate that you want to keep things simple, and I suppose awk seems more complicated than what you're doing. But I'd like to point out you have so far 3 grep and 4 sed invocations per line in the file, to process another file N times, once per line. That's O(mn) using the slowest method on the planet to read the file (a while loop). And it doesn't work.
I may have a logic problem with my thought chain.
I'm afraid we must allow for that possibility!
The right advice is to tackle XML with an XML parser, because XML is not a regular language and so can't be parsed with regular expressions. And that's really what you need here, because your program processes the whole XML document. You're not just plucking out bits and depending on incidental formatting artifacts; you want to add records that aren't there and remove those that "no longer exist". Apparently there is information in the XML document you need to preserve, else you would just produce it from the CSV. A parser would spoon-feed it to you.
The second-best advice is to use awk. I suppose you might try an approach like:
Process the CSV and produce the XML to be inserted.
In awk, first read the new input XML into an array keyed by cname. Then process the XML target once. For every CNAME, consult your array; if you find a match, insert your pre-constructed XML replacement (or modify the "paragraph" accordingly).
I'm not sure what the delete criteria are, so I don't know if it can be done in the same pass with step #2. If not, extract the salient information somehow. Maybe print a list of keys from each of the two files, and use comm(1) to produce a list of to-be-deleted. Then, similar to step #2, read in that list, and process the XML file one more time. Write anything you delete to stderr so you can keep track of what went missing, from what lines.
Any pointers
Whenever you find yourself processing the same file N times for N inputs, you know you're headed for trouble. One of the two inputs is always smaller, and that one can be put in some kind of array. cat file | while read is another warning signal, telling you to use awk or any of a dozen obvious utilities that understand lines of text.
You posted your question on SO two weeks ago. I suspect no one answered it because you warned them away: preemptively saying, in effect, don't tell me to use good tools. I'm only here to suggest that you'll be more comfortable after you take off that straitjacket. Better tools, in this case, are the only right answer.
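The answer recommends awk; the same "load the smaller input into an array, then make one pass over the other file" idea is sketched here in Python purely for illustration. It assumes, hypothetically, that each File2 record is identified by its enclosing accountname plus its cname value, and that these match the account and hostnames columns of File1. Because `<accountname="account">` is not well-formed XML, the sketch relies on the one-tag-per-line layout of the sample rather than a real parser. Removed lines are returned so they can be logged, as suggested in step 3.

```python
import csv
import io
import re

ACCOUNT_RE = re.compile(r'<accountname="([^"]*)">')
CNAME_RE = re.compile(r'<cname="([^"]*)">')


def load_keys(csv_text):
    """Read File1 once; keep (account, hostname) pairs in a set for fast lookups."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {(row["account"], row["hostnames"]) for row in reader}


def drop_stale_records(lines, keep):
    """Single pass over File2 lines: drop each <cname> block not listed in keep.

    Returns (kept_lines, removed_lines).
    """
    kept, removed = [], []
    account = None
    skipping = False
    for line in lines:
        if not skipping:
            match = ACCOUNT_RE.search(line)
            if match:
                account = match.group(1)  # remember which account we are inside
            match = CNAME_RE.search(line)
            if match and (account, match.group(1)) not in keep:
                skipping = True  # stale record: drop lines up to its </cname>
        if skipping:
            removed.append(line)
            if "</cname>" in line:
                skipping = False
            continue
        kept.append(line)
    return kept, removed


file1 = "account,hostnames,status,ipaddress,port,user,pass\nacme,web1,up,10.0.0.1,22,u,p\n"
file2 = [
    '<accountname="acme">',
    '<cname="web1">',
    '<field="hostname">web1</field>',
    '</cname>',
    '<cname="old">',
    '<field="hostname">old</field>',
    '</cname>',
    '</accountname>',
]
kept, removed = drop_stale_records(file2, load_keys(file1))
print("\n".join(kept))
```

Each file is read exactly once, so the cost is O(m + n) instead of O(mn).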
I've been trying to get a response over HTTP with curl. The response is in JSON format and contains numbers.
When I get the reply, fields with numeric values have been rewritten in floating-point notation as follows:
"value": 2.7123123E7 instead of just "value": 27123123
Why is this happening, and how can I disable it? I do not want to parse the file a second time to make the change; I just want to disable this behavior. For example, my web browser, where I submit the same query, does not have this behavior, but I cannot use the browser because the response I want to gather is very big and it gets stuck :S
Thank you
It looks like jq will do this for you if you want a simple filter to convert the notation:
$ echo '{"value":2.7123123E7}' | jq '.'
{
"value": 27123123
}
See the manual for more info. So, a simple parsing would just be to pipe the output of curl through jq.
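Note that 2.7123123E7 is just exponent notation for the same number, 27123123, so no data is lost. If post-processing in Python is an option, here is a minimal sketch that uses the standard `json` module's `parse_float` hook to turn such values back into integers when they have no fractional part. The helper name is hypothetical, and because values pass through a double, integers above 2**53 could lose precision (passing `decimal.Decimal` as `parse_float` avoids that).

```python
import json


def integral_floats_as_ints(text):
    """Parse JSON, turning numbers such as 2.7123123E7 into int when they are whole."""
    def parse_float(literal):
        value = float(literal)
        return int(value) if value.is_integer() else value
    return json.loads(text, parse_float=parse_float)


data = integral_floats_as_ints('{"value": 2.7123123E7}')
print(json.dumps(data))  # {"value": 27123123}
```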
I am merging two feeds using Yahoo Pipes and using the output feed on a website. However, I would like to identify the "feed source" for each item in the output feed. Is it possible to manipulate the original feeds so I can add another node/element to the feed items?
Thanks
One way to do that is using the Regex operator. Let's say you want to add a new field called source. You could use Regex with parameters:
In: item.source
replace: .*
with: (the text you want)
See it in action here:
http://pipes.yahoo.com/janos/7a3b9993cfc143d414fe7b637b1bd95a
That is, I have two feeds, I added a source attribute in the first with value "Question 1" and in the second with value "Question 2".
As an added bonus (an interesting, undocumented Yahoo Pipes hack), I used one more Regex after the Union to make the source appear in the title.
However, this only adds the attribute to the node in the pipe debugger. You can use it for further processing (as I did here by adding it to the title), but it won't create a <source> tag in the output. That's because the RSS output of Yahoo Pipes removes all fields that are not in the RSS standard. You can still see it in the JSON output, though.
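Outside Pipes, the same idea (tag every item with its feed of origin before the union) can be sketched with Python's `xml.etree.ElementTree`. RSS 2.0 defines an optional `<source url="...">` item element for exactly this purpose, so the sketch uses it. The labels reuse "Question 1" and "Question 2" from above; the URLs and the `tag_and_merge` helper are hypothetical.

```python
import xml.etree.ElementTree as ET


def tag_and_merge(feeds):
    """feeds: list of (label, url, rss_text). Return one <channel> holding every
    item, each with a <source url="..."> child naming the feed it came from."""
    merged = ET.Element("channel")
    for label, url, rss_text in feeds:
        root = ET.fromstring(rss_text)
        for item in list(root.iter("item")):
            source = ET.SubElement(item, "source", url=url)
            source.text = label
            merged.append(item)
    return merged


feed_a = "<rss><channel><title>A</title><item><title>first</title></item></channel></rss>"
feed_b = "<rss><channel><title>B</title><item><title>second</title></item></channel></rss>"
channel = tag_and_merge([
    ("Question 1", "http://example.com/a.rss", feed_a),
    ("Question 2", "http://example.com/b.rss", feed_b),
])
print(ET.tostring(channel, encoding="unicode"))
```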
My SOAP/XML response looks like this:
<Account><Accountnumber>1234<Description>savings</Account><Account><Accountnumber>1235<Description>Savings1</Account>
I would like to store the account numbers in a variable or array and use them in another SOAP/XML request in JMeter to get their details. Can somebody help me with how to store that variable and how to call it? I am new to JMeter.
Thanks in advance.
If the account numbers are static, you're better off using a .csv file, as mentioned by Vance, because the CSV data reader has less overhead than regex.
However, if you want dynamic data, it's very easy to do.
Download "regex coach" to help you write regular expressions. It's an amazing tool.
Attach a "regular expression extractor" as a child to your SOAP/XML request
Run the request once, to get the response
Copy the response into regex coach (or whatever tool you use), and write your regex. It'll look something like this: <Accountnumber>(\d+?)\D (look for digits after the text Accountnumber and stop at the first non-digit)
Configure the rest of the extractor. In this case, you'll want:
Apply to: Main Sample Only
Response Field to check: Main Body
Reference Name: VariableName
Regular Expression: the regex written in the previous step
Template: $1$ (the first capture group)
Match No: 1 (1st match), 0 (a random match) or -1 (all matches, useful when doing "FOR EACH found" logic)
Default Value: failed
To use your account number variable in other requests, simply use the reference name. In this example: ${VariableName}
Reference: http://jmeter.apache.org/usermanual/component_reference.html#Regular_Expression_Extractor
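To illustrate what Match No -1 produces, here is a minimal Python sketch of the variable naming JMeter documents for that setting (VariableName_1, VariableName_2, ... plus VariableName_matchNr), applied to the sample response with the pattern `<Accountnumber>(\d+?)\D`. JMeter itself uses Java regular expressions, so this only mirrors the result; `extract_all` is a hypothetical helper.

```python
import re


def extract_all(response, pattern, ref_name):
    """Mimic Match No = -1: store every group-1 match as ref_name_1..n,
    plus ref_name_matchNr holding the match count (all values are strings)."""
    matches = re.findall(pattern, response)
    variables = {f"{ref_name}_{i}": m for i, m in enumerate(matches, start=1)}
    variables[f"{ref_name}_matchNr"] = str(len(matches))
    return variables


response = ("<Account><Accountnumber>1234<Description>savings</Account>"
            "<Account><Accountnumber>1235<Description>Savings1</Account>")
print(extract_all(response, r"<Accountnumber>(\d+?)\D", "VariableName"))
```

In JMeter, a ForEach Controller with VariableName as its input variable prefix can then loop over these values, exposing each one under an output variable name of your choice for the follow-up SOAP request.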
You may save your data in a .csv file, and JMeter can read it easily through its CSV Data Set Config.
Use ${your data variable} in your scripts.