How to extract float values in number format instead of exponential values using jq?

I'm extracting a few fields from a list of JSON objects in a file using jq, and one of the fields contains a float value (e.g. 0.0000875) which, when extracted through jq, is changed to '8.75e-05'. Is there a way to extract these values without having them converted to exponential form?

The "master" version of jq now generally preserves the "external" number format, e.g.
$ jq -n '0.0000875'
0.0000875
The relevant commit date was Oct 21, 2019.
For some installation guidelines, see https://github.com/stedolan/jq/wiki/Installation
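A quick way to check whether your local jq build has this behavior (the literal is just the example value from the question):

```shell
# Builds including the Oct 2019 commit echo the value back unchanged;
# older jq releases print it in exponential notation (8.75e-05).
echo '{"v": 0.0000875}' | jq '.v'
```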

Related

Bash: Sort file numerically, but only where the first field matches a pattern

Due to poor past naming practices, I'm left with a list of names that is proving to be a challenge to work with. The bottom line is that I want the most current name (by date) to be placed in a variable. All the names are listed (unsorted) in a file called bar.txt.
In this case I can't rename, and there's no way to get the actual dates of the images; these names are all I have to go on. The names can follow one of several patterns:
foo
YYYYMMDD-foo
YYYYMMDD##-foo
foo can be anything from a single character to a long string of letters/numbers/symbols. I am interested only in the names matching the second pattern, YYYYMMDD-foo, as those are from after we started tagging consistently.
I would like to end up with a variable containing the most recent date that follows the pattern YYYYMMDD-foo.
I know sort -k1 -n < bar.txt, but then I'm not sure how to isolate the second pattern's results to extract what I need.
How do I sort the file to ignore anything but the second pattern, and return the most current date?
Sample
Given that bar.txt looks like this:
test
2017120901-develop-BUILD-31
20170326-TEST-1.2.0
20170406-BUILD-40-1.2.0-test
2010818_001
I would want to extract 20170406-BUILD-40-1.2.0-test
Since your requirement involves (1) keeping only lines of a certain format and (2) sorting them to take the latest one, you can use Awk and GNU sort together to achieve it:
awk -F'-' 'length($1) == 8' file | sort -nrk1 | head -1
20170406-BUILD-40-1.2.0-test
This works by keeping only those lines whose first -delimited field is exactly 8 characters long, matching the YYYYMMDD alignment. Once filtered, a reverse numeric sort is applied on the first field and the top line is taken with head.
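A slightly stricter variant (my own addition, not part of the answer above) filters on eight digits rather than eight characters, so a first field like abcdefgh- would not slip through:

```shell
# Keep only lines whose first dash-separated field is exactly
# eight digits, then sort those dates descending and take the newest.
grep -E '^[0-9]{8}-' bar.txt | sort -t- -k1,1nr | head -n1
```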

Applying String Manipulations/ Mathematical operations to the contents of a flow file in nifi

I have a flow file coming in, which has fixed width data in the following format :
ABC 0F 15343543543454434 gghhhhhg
ABC 01 433534343434 hjvh
I want to have my output data in the following format:
ABC|15|15343543543454434|gghhhhhg
ABC|1|433534343434|hjvh
To get this output I need to convert the second field in each line to a base-10 integer and strip the whitespace from all the other fields.
I tried using the ReplaceText processor but I could not find a way to convert the second field to a base-10 integer or apply a strip function to the string fields.
Working with hexadecimal numbers is not something that is easily done in a current release of NiFi. In order to get it to work you'd need to use one of the scripting processors, ExecuteScript or InvokeScriptedProcessor.
That said, doing numeric evaluations is one of my focuses in this upcoming release (which is currently being curated to be finalized) and I've been able to create a solution involving just the ReplaceText processor. I used the following configuration:
Search Value: ^(\w*)\ *(\w*)\ *(\d*)\ *(\w*)$
Replacement Value: $1|${'$2':prepend('0x'):append('p0'):toNumber()}|$3|$4
Replacement Strategy: Regex Replace
Evaluation Mode: Line-by-line
The rest is up to your use-case (i.e. whichever character set it is in). The search value creates capture groups for each of the sections. Then in the replacement value I use the second group (the one holding the hex digits) in an Expression Language function to convert it to base 10. The purpose of the "prepend" and "append" is that on the current master only decimals/doubles accept hex numbers (I need to improve that), so I make it parse as a double.
So while it is unfortunate this use-case isn't currently handled out of the box, it soon will be!
Edit: I've created a Jira to track adding hex -> whole numbers in EL here: https://issues.apache.org/jira/browse/NIFI-2950
Edit2: A commit addressing the issue has been merged to master and will be in versions 1.1+: https://github.com/apache/nifi/commit/c4be800688bf23a3bdea8def75b84c0f4ded243d
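Outside NiFi, the same per-line transformation can be sketched in plain shell (input.txt is a hypothetical file holding the fixed-width records; printf's %d accepts the 0x-prefixed hex value):

```shell
# Split each line on whitespace (which also trims the padding),
# convert field 2 from hex to decimal, and rejoin with pipes.
while read -r sys hex id name; do
  printf '%s|%d|%s|%s\n' "$sys" "0x$hex" "$id" "$name"
done < input.txt
```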

How to get the values from JSON URL

I'm trying to fetch Yammer followers using the REST API below.
https://www.yammer.com/api/v1/users.json
The API returns details for each user; from this I need to extract the followers count alone.
{"type":"user","id":1517006975,"network_id":461,"stats":{"following":0,"followers":0,"updates":0}}
The rate limit is 50 results per page, and as we have 100,000+ users I need to iterate 2,000+ times to get the whole dump, which is slow.
So I need a way to extract only the necessary data directly.
I am using shell scripting + Pentaho.
I think you have two options.
If you are bound to shell, you could run the JSON response through a series of sed silliness to get to a list that you can then parse more effectively with shell tools. Something like:
curl http://foo.com | sed 's/,/\n/g'
will get you something more row-based, and then you can start to parse it out from there using more sed, awk, cut, and tr.
Look at jq: it is a statically linked, standalone C binary that allows really nice filtering of JSON.
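For instance, with jq the followers count can be pulled straight out of each user record (the path below assumes the response is an array of objects shaped like the sample above):

```shell
# Extract only the followers count from each user record.
echo '[{"type":"user","id":1517006975,"network_id":461,"stats":{"following":0,"followers":0,"updates":0}}]' \
  | jq '.[].stats.followers'
```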

curl bash shell disable floating point transformation on

I've been trying to get a response over HTTP with curl. The response is in JSON format and contains numbers.
When I get the reply there are fields with numeric values, but the floating point notation has been changed as follows:
"value": 2.7123123E7 instead of just "value": 27123123
Why is this happening, and how can I disable it? I do not want to parse the file a second time to undo the change; I just want to disable this behavior. For example, my web browser, where I submit the same query, does not show this behavior, but I cannot use the browser because the response I want to gather is very big and it gets stuck.
Thank you
It looks like jq will do this for you if you want a simple filter to convert the notation:
$ echo '{"value":2.7123123E7}' | jq '.'
{
"value": 27123123
}
See the manual for more info. So, a simple fix would be to pipe the output of curl through jq.

Figure date format from string in ruby

I am working on a simple data loader for text files and would like to add a feature for correctly loading dates into the tables. The problem is that I do not know the date format beforehand, and it will not be my script doing the inserts; it has to generate insert statements for later use.
Date.parse is almost what I need. If there were a way to grab the format it identified in the string, in a form I could use to generate a to_date(...) (the Oracle idiom), it would be perfect.
An example:
My input file:
user_name;birth_date
Sue;20130427
Amy;31/4/1984
Should generate:
insert into my_table values ('Sue', to_date('20130427','yyyymmdd'));
insert into my_table values ('Amy', to_date('31/4/1984','dd/mm/yyyy'));
Note that it is important the original string remains unchanged - so I cannot parse it to a standard format used in the inserts (it is a requirement).
At the moment I am just testing a bunch of regexes and doing some validation, but I was wondering if there was a more robust way.
Suppose (using, for example, String#scan) you extracted an array of the date strings from a single file. It may look like:
strings = ["20130427", "20130102", ...]
Prepare in advance an array of all formats you can think of. It may be like:
Formats = ["%Y%m%d", "%y%m%d", "%y/%m/%d", "%m/%d/%y", "%d/%m/%y", ...]
Then check all formats that can parse all of the strings:
require "date"
formats =
Formats.select{|format| strings.all?{|s| Date.strptime(s, format) rescue nil}}
If this array formats includes exactly one element, the strings were unambiguously parsed with that format, and you can go back and parse each string with it.
Otherwise, either you failed to include the appropriate format in Formats, or the strings remain ambiguous.
I would use the Chronic gem. It can extract dates in most formats.
It has options to resolve the ambiguity in the xx/xx/xxxx format, but you'd have to specify which interpretation to prefer when either matches.