labelling different lines on split operation - elasticsearch

I am using split on one of my fields.
The result is split into different lines.
If I use .label('something'), every line gets the same name, but I want to give each line a different name.
How can I label each line with a different name?

You need to use the regex functionality of .label
Try this:
.label("Ride ID: $1", "^.* > ride_id:(.+) > .*")
The $1 will be replaced by the first capture group in the regex, the (.+) after ride_id:, so you should end up with labels such as:
Ride ID: 4
Ride ID: 54
Ride ID: 5
Ride ID: 14
Ride ID: 50
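To see what the pattern captures, the substitution can be reproduced outside Timelion. The following is a minimal Ruby sketch, not Timelion code; the series names are hypothetical and assume that the split yields names of the form "q:* > ride_id:N > count".

```ruby
# Hypothetical series names; the real names depend on the query and the split.
series_names = [
  "q:* > ride_id:4 > count",
  "q:* > ride_id:54 > count",
  "q:* > ride_id:5 > count"
]

# Same pattern as in the .label call; \1 plays the role of $1.
pattern = /^.* > ride_id:(.+) > .*/

labels = series_names.map { |name| name.sub(pattern, 'Ride ID: \1') }
puts labels
```

If the real series names have a different shape, the pattern has to be adjusted to match them.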


How to use two different excel files in same syntax procedure?

I have an excel file with information about variables (excel1) and another one with information about lists (excel2).
To create syntax that generates new syntax for VARIABLE LABELS and VALUE LABELS, I used the solution proposed by @eli.k.
But with this solution I need a dataset containing the lists, so that I can use it instead of writing the labels “by hand” (copy/paste). One problem came with L2, which has 195 entries, so the newly created variable would need to be bigger than 20,000 characters (is this possible in SPSS?), with everything appearing on one line.
What I want to know is if it’s possible to use excel2 automatically in code, line by line.
Using the following code:
GET DATA
/TYPE=XLSX
/FILE="D:\excel1.xlsx"
/SHEET=name 'Folha1'
/CELLRANGE=FULL
/READNAMES=ON
/DATATYPEMIN PERCENTAGE=95.0.
STRING cmd1 cmd2 (a200).
SORT CASES by List.
MATCH FILES /FILE=* /FIRST=first /LAST=last /BY List. /* marking first and last lines.
DO IF first.
COMPUTE cmd1="VARIABLE LABELS".
COMPUTE cmd2="VALUE LABELS".
END IF.
IF not first cmd1=concat(rtrim(cmd1), " /"). /* "/" only appears from the second varname.
COMPUTE cmd1=concat(rtrim(cmd1), " ", Var_label).
COMPUTE cmd2=concat(rtrim(cmd2), " ", Var).
DO IF last.
COMPUTE cmd1=concat(rtrim(cmd1), ".").
COMPUTE cmd2=concat(rtrim(cmd2), " ",' 1 "Afghanistan" 2 "Albania" (…) 195 "Zimbabwe".').
END IF.
EXECUTE.
SELECT IF (List = 'L2').
ADD FILES /file=* /rename cmd1=cmd /file=* /rename cmd2=cmd.
EXECUTE.
I would like to know if there is a way to replace ' 1 "Afghanistan" 2 "Albania" (…) 195 "Zimbabwe".' with some function/procedure that grabs the information about L2 from excel2 and shows it line by line:
(…)
VARIABLE LABELS V2 "Country"
/ V3 "Country Mother"
/ V4 "Country Father".
VALUE LABELS V2
V3
V4
1 "Afghanistan"
2 "Albania"
(…)
195 "Zimbabwe".
Thanks for helping me!
This issue is pretty complex and would usually be beyond the scope of a Stack Overflow Q&A, but here's my answer anyway:
First I recreate the parts of your example data concerning the value labels only:
data list list/var list (2a5).
begin data
"v1" "L1"
"v2" "L2"
"v3" "L2"
"v4" "L2"
end data.
dataset name xl1.
data list list/list (a5) nb (f5) nb_txt (a20).
begin data
"L1" 1 "Female"
"L1" 2 "Male"
"L2" 1 "Afghanistan"
"L2" 2 "Albania"
"L2" 43 "Israel"
"L2" 195 "Zimbabwe"
end data.
dataset name xl2.
data list list/v1 v2 v3 v4 (4f3).
begin data
1 1 2 3
2 2 2 43
1 2 1 195
end data.
dataset name gen.
Now to work:
The first part is to create a macro for each list of value labels. Since some of the lists are long, I use ADD VALUE LABELS separately for each value.
dataset activate xl2.
string cmd (a200) cmdFin (a20).
sort cases by list nb.
match files /file=* /by list /first=first /last=last.
compute cmd=concat("add value labels !1 ", string(nb,f6), " '", rtrim(nb_txt), "' .").
if first cmd=concat("define dolist_", list, " (!pos=!cmdend) ", rtrim(cmd)).
if last cmdFin=" !enddefine .".
write outfile="path\create value label macros.sps"/cmd/cmdfin.
exe.
insert file="path\create value label macros.sps".
After inserting the generated syntax a macro has been defined for each of the value lists. Now we create an additional syntax that will run the related macro for each of the variable names in the list:
dataset activate xl1.
string cmd (a200).
compute cmd=concat("dolist_", list, " ", var, " .").
write outfile="path\run value label macros.sps"/cmd.
exe.
Now we can actually try out the generated macros on our original data:
dataset activate gen.
insert file="path\run value label macros.sps".

I need to grab the total number of items using regex

How can I grab the value 3 from "Page 1 of 3" given the below text:
Displaying Results Items 1 - 50 of 120, Page 1 of 3
If someone can briefly explain the regex that would be helpful.
The regular expression you need consists of a + quantifier (one or more times, greedy), a numerical character class [0-9] and a capturing group (...).
str = "Displaying Results Items 1 - 50 of 120, Page 1 of 3"
print str.match(/Page +[0-9]+ +of +([0-9]+)/)[1]
Explanation:
Page +     # Match `Page` followed by one or more spaces
[0-9]+     # Then one or more digits
 +of       # Then one or more spaces followed by `of`
 +         # Then one or more spaces
([0-9]+)   # Finally another sequence of digits, captured by the capturing group
There is a good reference here to learn more about RegExes.
You can also do:
str.scan(/Page \d+ of (\d+)/) #=> [["3"]]
It matches the pattern "Page # of #" and grabs the last capture group. This also works if the pattern occurs several times in the string: every occurrence becomes part of the resulting array.
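Since the title asks about the total number of items as well, here is a minimal sketch that captures both totals in one pass. The group names items and pages are chosen for illustration only, and the pattern assumes the text always has the shape shown in the question.

```ruby
text = "Displaying Results Items 1 - 50 of 120, Page 1 of 3"

# Named groups make it clear which number is which.
pattern = /Items\s+\d+\s*-\s*\d+\s+of\s+(?<items>\d+),\s*Page\s+\d+\s+of\s+(?<pages>\d+)/

m = text.match(pattern)

# match returns nil when the text does not fit the pattern, so guard for that.
total_items = m && m[:items].to_i
total_pages = m && m[:pages].to_i

puts "items: #{total_items}, pages: #{total_pages}"
```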

Ruby Regex: How to match pattern that follows another pattern?

I have ID numbers that should come after the text ID: so my file consists of
ID: A1234
ID: A1235
ID: A1236
etc. I want to match /[A-Z]*[0-9]+/ but only if it comes after the characters ID:. How would I add that to the regular expression but not make it return ID: as part of the result? I just want it to match the regex that follows ID:, because at the end of the file I have numbers and it's returning them, but those aren't ID numbers.
/ID:\s*([A-Z]*[0-9]+)/
The parentheses capture whatever they enclose, and you can then refer to the captured text (in Ruby, via $1 or [1] on the match data). If you post some code showing how you're using the regex, I can try to add some more detail.
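A minimal sketch of how this could be used with String#scan, assuming the file contents have already been read into a string. The last line of the sample text is made up to stand in for the trailing numbers mentioned in the question.

```ruby
text = <<~DATA
  ID: A1234
  ID: A1235
  ID: A1236
  Total 3 records, 42 pages
DATA

# With a capture group, scan returns one array per match that holds
# only the captured part, so "ID:" itself is not in the result.
ids = text.scan(/ID:\s*([A-Z]*[0-9]+)/).flatten

# Alternative: a lookbehind requires "ID: " before the match
# without making it part of the match.
ids_lookbehind = text.scan(/(?<=ID: )[A-Z]*[0-9]+/)

p ids
p ids_lookbehind
```

The lookbehind variant needs a fixed-width prefix, so it only fits if exactly one space follows ID:.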

sed: flexible template w/ line number constraint

Problem
I need to insert text of arbitrary length ( # of lines ) into a template while maintaining an exact number of total lines.
Sample source data file:
You have a hold available for pickup as of 2012-01-13:
Title: Really Long Test Title Regarding Random Gibberish. Volume 1, A-B, United States
and affiliated territories, United Nations, countries of the world
Author: Barrel Roll Morton
Title: How to Compromise Free Speech Using Everyday Tools. Volume XXVI
Author: Lamar Smith
#end-of-record
You have a hold available for pickup as of 2012-01-13:
Title: Selling Out Democracy For Fun and Profit. Volume 1, A-B, United States
Author: Lamar Smith
Copy: 12
#end-of-record
Sample Template ( simplified for brevity ):
<%CUST-NAME%>
<%CUST-ADDR%>
<%CUST-CTY-ZIP%>
<%TITLES GO HERE%>
<%STORE-NAME%>
<%STORE-ADDR%>
<%STORE-CTY-ZIP%>
At this point I use bash's 'mapfile' to load the source file
record by record using the /^#end-of-record/ regex ...so far so good.
Then I pull predictable aspects of each record according to the line
on which they occur, then process the info using a series of sed
search replace statements.
The Hang-Up
So the problem is the unknown number of 'title' records that could occur.
How can I accommodate an unknown number of titles and always have output
of precisely 65 lines?
Given that title records always occur starting on line 8, I can pull the
titles easily with:
sed -n '8,$p' test-match.txt
However, how can I insert this within an allotted space, ex, between <%CUST-CTY-ZIP%> and <%STORE-NAME%> without pushing the store info out of place in the template?
My idea so far:
-first send the customer info through:
Ex.
sed 's/<%CUST-NAME%>/Benedict Arnold/' template.txt
-Append title records
???
-Then the store/location info
sed "s/<%STORE-NAME%>/Smith's House of Greasy Palms/" template.txt
I have code and functions for this stuff if interested but this post is 'windy' as it is.
Just need help with inserting the title records while maintaining the position of the following text and a total line count of 65.
UPDATE
I've decided to change tactics. I'm going to create place holders in the template for all available lines between customer and store info --- then:
Test if line is null in source
if yes -- replace placeholder with null leaving the line ending. Line number maintained.
if not null -- again, replace with text, maintaining line number and line endings in template.
Eventually, I plan to invest some time looking closer at Triplee's suggestion regarding Perl. The Perl way really does look simpler and easier to maintain if I'm going to be stuck with this project long term.
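The placeholder idea from the update can be sketched as follows. This is Ruby rather than sed, purely to illustrate the logic; the slot count of 5, the <%TITLE-n%> placeholder names and the helper fit_to_slots are assumptions made for the example, not part of the original template.

```ruby
TITLE_SLOTS = 5 # assumed number of lines reserved for titles

# Return exactly `slots` lines: drop extra lines, pad with empty strings.
def fit_to_slots(lines, slots)
  fitted = lines.first(slots)
  fitted + [""] * (slots - fitted.length)
end

# Shortened template with one hypothetical placeholder per title line.
template = [
  "<%CUST-NAME%>",
  "<%CUST-ADDR%>",
  "<%CUST-CTY-ZIP%>",
  *Array.new(TITLE_SLOTS) { |i| "<%TITLE-#{i + 1}%>" },
  "<%STORE-NAME%>"
]

titles = [
  "Title: Selling Out Democracy For Fun and Profit. Volume 1, A-B, United States",
  "Author: Lamar Smith",
  "Copy: 12"
]
fitted = fit_to_slots(titles, TITLE_SLOTS)

# Each title placeholder is replaced by its line or by an empty string,
# so the number of output lines always equals the number of template lines.
output = template.map do |line|
  m = line.match(/\A<%TITLE-(\d+)%>\z/)
  m ? fitted[m[1].to_i - 1] : line
end

puts output
```

Because unused slots become empty lines instead of being removed, the store block stays at the same line number however many titles a record has.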
This might work for you:
cat <<! >titles.txt
> 1
> 2
> 3
> 4
> 5
> 6
> 7
> Title 1
> Title 2
> Title 3
> Title 4
> Title 5
> Title 6
> !
cat <<! >template.txt
> <%CUST-NAME%>
> <%CUST-ADDR%>
> <%CUST-CTY-ZIP%>
>
> <%TITLES GO HERE%>
>
> <%STORE-NAME%>
> <%STORE-ADDR%>
> <%STORE-CTY-ZIP%>
> !
sed '1,7d;:a;$!{N;ba};:b;G;s/\n[^\n]*//5g;tc;bb;:c;s/\n/\\n/g;s|.*|/<%TITLES GO HERE%>/c\\&|' titles.txt |
sed -f - template.txt
<%CUST-NAME%>
<%CUST-ADDR%>
<%CUST-CTY-ZIP%>
Title 1
Title 2
Title 3
Title 4
Title 5
<%STORE-NAME%>
<%STORE-ADDR%>
<%STORE-CTY-ZIP%>
This pads/squeezes the titles to 5 lines (s/\n[^\n]*//5g). If you want fewer or more, change the 5 to the desired number.
This will give you five lines of output regardless of the number of lines in titles.txt:
sed -n '$s/$/\n\n\n\n\n/;8,$p' test-match.txt | head -n 5
Another version:
sed -n '8,$N; ${s/$/\n\n\n\n\n/;s/\(\([^\n]*\n\)\{4\}\).*/\1/p}' test-match.txt
Use one less than the number of lines you want (4 in this example will cause 5 lines of output).
Here's a quick proof of concept using Perl formats. If you are unfamiliar with Perl, I guess you will need some additional help with how to get the values from two different files, but it's quite doable, of course. Here, the data is simply embedded into the script itself.
I set the $titles format to 5 lines instead of the proper value (58 or something?) in order to make this easier to try out in a terminal window, and to demonstrate that the output is indeed truncated when it is longer than the allocated space.
#!/usr/bin/perl
use strict;
use warnings;
use vars (qw($cust_name $cust_addr $cust_cty_zip $titles
$store_name $store_addr $store_cty_zip));
my $fmtline = '@' . '<' x 78;
my $titlefmtline = '^' . '<' x 78;
my $empty = '';
my $fmt = join ("\n$fmtline\n", 'format STDOUT = ',
'$cust_name', '$cust_addr', '$cust_cty_zip', '$empty') .
("\n$titlefmtline\n" . '$titles') x 5 . #58
join ("\n$fmtline\n", '', '$empty',
'$store_name', '$store_addr', '$store_cty_zip');
#print $fmt;
eval "$fmt\n.\n";
$titles = <<____HERE;
Title: Really Long Test Title Regarding Random Gibberish. Volume 1, A-B, United States
and affiliated territories, United Nations, countries of the world
Author: Barrel Roll Morton
Title: How to Compromise Free Speech Using Everyday Tools. Volume XXVI
Author: Lamar Smith
____HERE
# Preserve line breaks -- ^<< will fill lines, but preserves line breaks on \r
$titles =~ s/\n/\r\n/g;
while (<DATA>) {
chomp;
($cust_name, $cust_addr, $cust_cty_zip, $store_name, $store_addr, $store_cty_zip)
= split (",");
write STDOUT;
}
__END__
Charlie Bravo,23 Alpa St,Delta ND 12345,Spamazon,98 Spamway,Atlanta GA 98765
The use of $empty to get an empty line is pretty ugly, but I wanted to keep the format as regular as possible. I'm sure it could be avoided, but at the cost of additional code complexity IMHO.
If you are unfamiliar with Perl, the use strict is a complication, but a practical necessity; it requires you to declare your variables either with use vars or my. It is a best practice which helps immensely if you try to make changes to the script.
Here documents with <<HERE work like in shell scripts; it allows you to create a multi-line string easily.
The x operator is for repetition; 'string' x 3 is 'stringstringstring' and ("list") x 3 is ("list", "list", "list"). The dot operator is string concatenation; that is, "foo" . "bar" is "foobar".
Finally, the DATA filehandle allows you to put arbitrary data in the script file itself after the __END__ token which signals the end of the program code. For reading from standard input, use <> instead of <DATA>.

What's wrong with this RegEx?

I'm trying to implement this in a small Ruby script, and tested it on http://www.rubular.com/, where it worked perfectly. Not sure why it's not performing in the actual script.
The RegEx: /(motion|links|sound|button|symbol)|(0\.\d{8})|(\s\d{1}\s)|(\d{10}\s)/
The Text it's Against:
Trial ID: 1 | Trial Type: motion | Trick? 1
Click Time: 0.87913100 1302969732
Trial ID: 7 | Trial Type: button | Trick? 0
Click Time: 0.19817800 1302987043
etc. etc.
What I am trying to grab: Only the numbers, and the single word after "Trial Type". So for the first line of the example, I would only want " 1 motion 1 0.87913100 1302969732" to be returned. I also want to keep the space before the first number in each trial.
My short ruby script:
File.open('log.txt', 'r') do |file|
contents = file.readlines.to_s
regex = Regexp.new(/(motion|links|sound|button|symbol)|(0\.\d{8})|(\s\d{1}\s)|(\d{10}\s)/)
matchdata = regex.match(contents).to_a
matchdata.each do |match|
if match != nil
puts match
end
end
end
It only outputs two "1"s though. Hmm... I know it's reading the file contents right, and when I tried an alternate, simpler regex it worked fine.
Thanks for any help I get here!! : )
You want to use String#scan
matchdata = contents.scan(regex)
Also, @Mike Penington is correct: you shouldn't need the if match != nil check if you do it right. You have to clean up your regex as well. The pipe character is special in a regex (it means match the left side OR the right side), so the literal pipe characters in your text must be escaped.
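A minimal sketch of the difference, using the regex from the question unchanged. The first two lines of the sample log are embedded as a string here instead of being read from log.txt.

```ruby
contents = "Trial ID: 1 | Trial Type: motion | Trick? 1\n" \
           "Click Time: 0.87913100 1302969732\n"

regex = /(motion|links|sound|button|symbol)|(0\.\d{8})|(\s\d{1}\s)|(\d{10}\s)/

# match stops at the first hit: the whole match plus one slot per group,
# nil for every group that did not take part.
first_only = regex.match(contents).to_a

# scan keeps going and returns one array of groups per hit;
# compact.first picks the single group that matched each time.
all_hits = contents.scan(regex).map { |groups| groups.compact.first }

p first_only
p all_hits
```

This is why the original script printed only two "1"s: the first hit is " 1 ", once as the whole match and once as the third group.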
You need to escape the literal pipes inside the regex, fill in other missing literals (like Trick, \?, Click\sTime:, remove some of the spaces, etc...), and insert regex spaces where appropriate... i.e.
regex = Regexp.new(/(motion|links|sound|button|symbol)\s\|\sTrick\?\s*\d\s*Click\s+Time:\s+(0\.\d{,8})\s(\d{10})/)
EDIT: fixed parenthesis nesting in the original
If you know that the data follows a particular pattern, you can just follow that pattern in the regex, and pick up the portions you want with ( ).
/Trial ID: (\d+) \| Trial Type: (\w+) \| Trick\? (\d+) Click Time: ([\.\d]+) ([\.\d]+)/
The more you know about the data in advance, the more specific you can make the regex.
If you see some variations in the data, and the regex fails to match, then just relax the pattern:
If the Trial ID may include a decimal point, use [\.\d]+ instead of \d+.
If the space can be more than one, then replace it with [ ]+
If the space can be a tab, or can be absent, use \s* or [ \t]*.
If the Trial ID: part may appear as a different phrase, replace it with .*?,
and so on.
If you are not sure how many spaces/tabs appear, use this:
/Trial\s*ID:\s*(\d+)\s*\|\s*Trial\s*Type:\s*(\w+)\s*\|\s*Trick\?\s*(\d+)\s*Click\s*Time:\s*([\.\d]+)\s+([\.\d]+)/
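A minimal sketch of that whitespace-tolerant pattern applied to the two-line records from the question. The log text is embedded as a heredoc here; in the original script it would come from log.txt.

```ruby
log = <<~LOG
  Trial ID: 1 | Trial Type: motion | Trick? 1
  Click Time: 0.87913100 1302969732
  Trial ID: 7 | Trial Type: button | Trick? 0
  Click Time: 0.19817800 1302987043
LOG

# \s* also matches the line break between "Trick? 1" and "Click Time:",
# so one match covers a whole two-line record.
pattern = /Trial\s*ID:\s*(\d+)\s*\|\s*Trial\s*Type:\s*(\w+)\s*\|\s*Trick\?\s*(\d+)\s*Click\s*Time:\s*([\.\d]+)\s+([\.\d]+)/

# scan returns one array of captured fields per record.
records = log.scan(pattern)

records.each { |fields| puts fields.join(" ") }
```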
This is one of those times that trying to do everything in a big regex makes you work too hard. Simplify things:
ary = [
'Trial ID: 1 | Trial Type: motion | Trick? 1 Click Time: 0.87913100 1302969732',
'Trial ID: 7 | Trial Type: button | Trick? 0 Click Time: 0.19817800 1302987043'
]
ary.each do |li|
numbers = li.scan(/[\d.]+/)
trial_type = li[/Trial Type: (\w+)/, 1]
puts "%d %s %d %f %d\n" % [numbers.first, trial_type, *numbers[1 .. -1]]
end
# >> 1 motion 1 0.879131 1302969732
# >> 7 button 0 0.198178 1302987043
Regex patterns are powerful, but people think it's macho to do everything in one big line. You have to weigh doing that with the increased work necessary to put together the regex in the first place, plus maintain it if something changes in the text being parsed later.
