I'm using SUTime / Stanford NLP, and it's doing a great job, but I can't figure out how to get it to read plain numeric date formats.
For instance, for:
'we went at 27/10/1988 to the event'
it returns null.
For an expression like 'we went at October 27th 1988 to the event', it works just fine.
Any ideas?
Cheers
I am not experienced with the Stanford temporal package, but it is probably not tuned for that date format.
Something I suggest you take a look at is this:
http://cogcomp.cs.illinois.edu/page/software_view/IllinoisTemporalExtractor
which is essentially based on HeidelTime:
https://code.google.com/p/heideltime/
OK everyone, I think I got it.
In sutime/english.sutime.txt, around line 319, there are a few patterns for ISO and US date tagging:
{ ruleType: "time", pattern: /yyyy-?MM-?dd-?'T'HH(:?mm(:?ss([.,]S{1,3})?)?)?(Z)?/ }
{ ruleType: "time", pattern: /yyyy-MM-dd/ }
{ ruleType: "time", pattern: /'T'HH(:?mm(:?ss(.,)?)?)?(Z)?/ }
# Tokenizer sometimes adds extra slash
{ ruleType: "time", pattern: /yyyy\?/MM\?/dd/ }
{ ruleType: "time", pattern: /MM?\?/dd?\?/(yyyy|yy)/ }
{ ruleType: "time", pattern: /MM?-dd?-(yyyy|yy)/ }
{ ruleType: "time", pattern: /HH?:mm(:ss)?/ }
{ ruleType: "time", pattern: /yyyy-MM/ }
You just need to add a few ruleType entries of your own, in the needed order.
I'll put this here in case someone finds it useful.
The problem is that some time formats are not supported.
Taking a look at the sutime/english.sutime.txt file, you'll see lines like those below. The TODO there shows that other formats can still be added. I added two more to mine, as seen below:
# TODO: Support other timezone formats
{ ruleType: "time", pattern: /yyyy-?MM-?dd-?'T'HH(:?mm(:?ss([.,]S{1,3})?)?)?(Z)?/ }
{ ruleType: "time", pattern: /yyyy-MM-dd/ }
{ ruleType: "time", pattern: /'T'HH(:?mm(:?ss([.,](S{1,3}))?)?)?(Z)?/ }
# The entries below are newly added to support other time formats.
{ ruleType: "time", pattern: /dd\/MM\/yyyy/ }
{ ruleType: "time", pattern: /dd-MM-yyyy/ }
The newly added entries enable SUTime to correctly identify dates of the form
20-12-2014 or 28/12/2014,
which matches the format the OP needs.
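To double-check that your edited rules are actually the ones being loaded, you can point SUTime at the rule files explicitly through the sutime.rules property. Below is a rough sketch based on the usual SUTime demo pipeline; the rule-file paths are only examples and should point at wherever your edited copies live:

import edu.stanford.nlp.ling.CoreAnnotations;
import edu.stanford.nlp.pipeline.Annotation;
import edu.stanford.nlp.pipeline.AnnotationPipeline;
import edu.stanford.nlp.pipeline.POSTaggerAnnotator;
import edu.stanford.nlp.pipeline.TokenizerAnnotator;
import edu.stanford.nlp.pipeline.WordsToSentencesAnnotator;
import edu.stanford.nlp.time.TimeAnnotations;
import edu.stanford.nlp.time.TimeAnnotator;
import edu.stanford.nlp.time.TimeExpression;
import edu.stanford.nlp.util.CoreMap;
import java.util.List;
import java.util.Properties;

public class SUTimeCustomRulesDemo {
  public static void main(String[] args) {
    Properties props = new Properties();
    // Load the edited rules file(s); adjust the paths to your local copies.
    props.setProperty("sutime.rules",
        "sutime/defs.sutime.txt,sutime/english.sutime.txt,sutime/english.holidays.sutime.txt");

    AnnotationPipeline pipeline = new AnnotationPipeline();
    pipeline.addAnnotator(new TokenizerAnnotator(false));
    pipeline.addAnnotator(new WordsToSentencesAnnotator(false));
    pipeline.addAnnotator(new POSTaggerAnnotator(false));
    pipeline.addAnnotator(new TimeAnnotator("sutime", props));

    Annotation annotation = new Annotation("we went at 27/10/1988 to the event");
    // SUTime needs a reference date to resolve expressions against.
    annotation.set(CoreAnnotations.DocDateAnnotation.class, "2013-07-14");
    pipeline.annotate(annotation);

    List<CoreMap> timexAnns = annotation.get(TimeAnnotations.TimexAnnotations.class);
    for (CoreMap cm : timexAnns) {
      System.out.println(cm + " --> " + cm.get(TimeExpression.Annotation.class).getTemporal());
    }
  }
}

With the dd/MM/yyyy rule added, the sentence from the question should come back with a temporal value of 1988-10-27 instead of returning null.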
I'm creating a search query for a simple search on a site using GraphQL.
I've been looking for the absolutely simplest working examples of GraphQL, but the questions/examples I've found on Stack Overflow are either too simple, too specific, or way too technical, including too many unnecessary bits of the API.
The patterns of when to use the {}, when to use where, and when optional naming comes into play seem to disrupt the patterns explained in the docs.
Any insight here would be much appreciated.
Great question.
Here's how I would start. I will assume you have set up a database with Products, and that those Products have a name and a description.
First - here's how you get all the products (you will be inputting this into the GraphQL Playground):
query {
  allProducts {
    name
    description
  }
}
Second - here's how you get a product with a specific name:
query {
  allProducts(where: { name: "Nike Air VaporMax" }) {
    name
    description
  }
}
Third - here's how to introduce "contains", as in name or description contains "nike". The _i suffix means case-insensitive.
query {
  allProducts(where: { name_contains_i: "nike" }) {
    name
    description
  }
}
Fourth - here's how to introduce an OR (note the commas and the container curly brackets):
query {
  allProducts(
    where: {
      OR: [{ description_contains_i: "shoes" }, { name_contains_i: "shoes" }]
    }
  ) {
    name
    description
  }
}
Fifth - here's how to introduce the AND (same as above, note the comma and the curly brackets):
query {
  allProducts(
    where: {
      AND: [{ description_contains_i: "shoes" }, { name_contains_i: "shoes" }]
    }
  ) {
    name
    description
  }
}
Sixth - here is how to start introducing variables - we'll use this with a WHERE + OR:
query ($varTest: String!) {
  allProducts(
    where: {
      OR: [{ description_contains_i: "shoes" }, { name_contains_i: $varTest }]
    }
  ) {
    name
    description
  }
}
And !important! for the above, you will need to fill in the Query Variables:
{
  "varTest": "Nike"
}
In case you're not familiar with where the Query Variables go: the Playground has a second, smaller editor pane below the main query window labelled Query Variables; that's where the JSON above belongs.
Seventh - here is the kicker. You can optionally name these queries. The disruption in the pattern consistency threw me off initially. Let me add it here with a pretty obvious name so you can see it too:
query THIS_IS_MY_COOL_QUERY_NAME($varTest: String!) {
  allProducts(
    where: {
      OR: [{ description_contains_i: "shoes" }, { name_contains_i: $varTest }]
    }
  ) {
    name
    description
  }
}
Eighth - bonus. You won't need this, BUT I want to introduce it here so it doesn't throw you off in the future. When you submit the query, you can assign your own name to the returned array of objects. Don't let the previous sentence confuse you; I'll give you examples of the returned array so it's clear.
Here is the Eighth query (don't forget to use a Query Variable as you did in the Seventh example). I'll add a pretty obvious name directly in the query:
query THIS_IS_MY_COOL_QUERY_NAME($varTest: String!) {
  resultsWillBeReturnedAsArrayWithThisName: allProducts(
    where: {
      OR: [{ description_contains_i: "shoes" }, { name_contains_i: $varTest }]
    }
  ) {
    name
    description
  }
}
The results from the previous query (Seventh) will look like this:
{
  "data": {
    "allProducts": [
      {
        "name": "Air Jordan 1",
        "description": "Wow - there are shoes!"
      },
      {
        "name": "Nike Blazer Mid",
        "description": "Very nice!"
      },
      {
        "name": "Shoes",
        "description": "These are shoes!"
      }
    ]
  }
}
But the results from the Eighth query will look like this (notice how the name you introduced comes back to you from GraphQL):
{
  "data": {
    "resultsWillBeReturnedAsArrayWithThisName": [
      {
        "name": "Air Jordan 1",
        "description": "Wow - there are shoes!"
      },
      {
        "name": "Nike Blazer Mid",
        "description": "Very nice!"
      },
      {
        "name": "Shoes",
        "description": "These are shoes!"
      }
    ]
  }
}
That should give you a solid set of building blocks for understanding GraphQL.
I have a question related to the grok processor.
For example, this is my message field:
{
  "message": "agentId:agent003"
}
I want to grok this, and my output should be something like this:
{
  "message": "agentId:agent003",
  "agentId": "agent003"
}
Could someone help me with how to achieve this? If I am able to do it for one field, I can manage the rest of my fields. Thanks in advance.
This is the pipeline I have created in Elasticsearch:
PUT _ingest/pipeline/dissectpipeline
{
  "description": "split message content",
  "processors": [
    {
      "dissect": {
        "field": "message",
        "pattern": "%{apm_application_message.agentId}:%{apm_application_message.agentId}"
      }
    }
  ]
}
Other config added to the Filebeat module via central management:
- pipeline:
    if: ctx.first_char == '{'
    name: '{< IngestPipeline "dissectpipeline" >}'
There are no errors from Filebeat and it's working fine, but I am unable to find any field like apm_application_message.agentId in the index.
How can I make sure whether my pipeline is working or not? Also, if I am doing something wrong, please let me know.
Instead of grok, I'd suggest using the dissect filter, which is more intuitive and easier to use:
dissect {
  mapping => {
    "message" => "%{?agentId}:%{&agentId}"
  }
}
If you're using Filebeat, there is also the possibility to use the dissect processor:
processors:
  - dissect:
      tokenizer: "%{?agentId}:%{&agentId}"
      field: "message"
      target_prefix: ""
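If you'd rather keep the Elasticsearch ingest pipeline from the question, note that its pattern maps both the literal "agentId" prefix and the value to the same key, which won't give you the field you're after. Assuming your Elasticsearch version's dissect processor supports the ? (named skip key) modifier, a sketch of a corrected pipeline could look like this (the prefix key name is arbitrary):

PUT _ingest/pipeline/dissectpipeline
{
  "description": "split message content",
  "processors": [
    {
      "dissect": {
        "field": "message",
        "pattern": "%{?prefix}:%{apm_application_message.agentId}"
      }
    }
  ]
}

To verify the pipeline is doing what you expect before sending real traffic through Filebeat, you can run a sample document through the simulate API:

POST _ingest/pipeline/dissectpipeline/_simulate
{
  "docs": [
    { "_source": { "message": "agentId:agent003" } }
  ]
}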
I have logs in the format:
2018-09-17 15:24:34;Count of files in error folder in;C:\Scripts\FOLDER\SUBFOLDER\error;1
I want to put the folder path and the number after it into separate fields, like:
dirTEST=C:\Scripts\FOLDER\SUBFOLDER\
count.of.error.filesTEST=1
or
dir=C:\Scripts\FOLDER\SUBFOLDER\
count.of.error.files=1
For this I use the following grok pattern in my Logstash config:
if "TestLogs" in [tags] {
grok{
match => { "message" => "%{DATE:date_in_log}%{SPACE}%{TIME:time.in.log};%{DATA:message.text.log};%{WINPATH:dir};%{INT:count.of.error.files}" }
add_field => { "dirTEST" => "%{dir}" }
add_field => { "count.of.error.filesTEST" => "%{count.of.error.files}" }
}
}
There are no errors in the Logstash logs, but in Kibana I get the usual log entry without the new fields.
A couple of notes here. First of all, the grok solution seems to be doing what you expect, so the problem is probably that your index pattern has not been updated with the new fields. To update it in Kibana, go to Management -> Kibana -> Index Patterns and refresh the field list using the button in the upper right corner (next to the delete index pattern button).
Second, you must take into account that using dots to separate the terms makes the structured data look like this:
{
  "date_in_log": "18-09-17",
  "count": {
    "of": {
      "error": {
        "files": "1"
      }
    }
  },
  "time": {
    "in": {
      "log": "15:24:34"
    }
  },
  "message": {
    "text": {
      "log": "Count of files in error folder in"
    }
  },
  "dir": "C:\\Scripts\\FOLDER\\SUBFOLDER\\error"
}
I don't know if this is how you want your data to be represented, but you may want to consider changing the naming of the fields in the grok pattern instead.
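For instance, here is a sketch of the same grok with flat, underscore-separated field names; the names time_in_log, log_message_text and error_file_count are just illustrative, so pick whatever fits your naming conventions. With flat names the extra add_field copies are no longer needed:

if "TestLogs" in [tags] {
  grok {
    match => { "message" => "%{DATE:date_in_log}%{SPACE}%{TIME:time_in_log};%{DATA:log_message_text};%{WINPATH:dir};%{INT:error_file_count}" }
  }
}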
I have a bunch of log files named like 'XXXXXX_XX_yymmdd_hh:mm:ss.txt'. I need to include the date and time from the filename as separate fields added in Logstash.
Can anyone help?
Thanks
Use a grok filter to extract the date and time:
filter {
  grok {
    match => [
      "path",
      "^%{GREEDYDATA}/[^/]+_%{INT:date}_%{TIME:time}\.txt$"
    ]
  }
}
Depending on what goes instead of XXXXXX_XX you might prefer a stricter expression. Also, GREEDYDATA isn't very efficient. This might yield better performance:
filter {
  grok {
    match => [
      "path", "^(?:/[^/]+)+/[^/]+_%{INT:date}_%{TIME:time}\.txt$"
    ]
  }
}
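As an illustration of the stricter variant: if the prefix really is six alphanumeric characters, an underscore and two digits (which is only an assumption based on the XXXXXX_XX placeholder), the pattern could be tightened like this:

filter {
  grok {
    match => [
      "path", "^(?:/[^/]+)+/[A-Z0-9]{6}_\d{2}_%{INT:date}_%{TIME:time}\.txt$"
    ]
  }
}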
QUESTION: Does anyone have a solution for how to filter/query Elasticsearch data by month or day? Let's say I need to get all users who are celebrating their birthdays today.
mapping
mappings:
  dob: { type: date, format: "dd-MM-yyyy HH:mm:ss||yyyy-MM-dd'T'HH:mm:ss'Z'||yyyy-MM-dd'T'HH:mm:ss+SSSS" }
and stored in this way:
dob: 1950-06-03T00:00:00Z
The main problem is how to search users by month and day only, ignoring the year, because birthdays recur annually, as we know.
SOLUTION
I found a solution: query birthdays with wildcards. As we know, if we want to use wildcards, the field must be mapped as a string, so I used a multi-field mapping.
mappings:
  dob:
    type: multi_field
    fields:
      dob: { type: date, format: "yyyy-MM-dd'T'HH:mm:ss'Z'" }
      string: { type: string, index: not_analyzed }
and the query to get users by month and day only is:
{
  "query": {
    "wildcard": {
      "dob.string": "*-06-03*"
    }
  }
}
NOTE
This query can be slow, as it needs to iterate over many terms.
CONCLUSION
It's not a pretty way, but it's the only one I've found, and it works!
You should store the value to be searched in Elasticsearch. The string/wildcard solution is halfway there, but storing the numbers would be even better (and faster):
mappings:
  dob:
    type: date, format: "yyyy-MM-dd'T'HH:mm:ss'Z'"
  dob_day:
    type: byte
  dob_month:
    type: byte
Example:
dob: 1950-03-06
dob_day: 06
dob_month: 03
Filtering (or querying) for plain numbers is easy: Match on both fields.
"query": {
"filtered": {
"filter": {
"bool": {
"must": [
{
"term": {
"dob_day": 06,
}
},
{
"term": {
"dob_month": 03,
}
},
]
}
}
}
}
PS: While thinking about the solution: storing the date as a combined number like "06-03" -> 603 or 6.03 would be less obvious, but would allow range queries to be used. Just remember that 531 (05-31) plus one day would be 601 (06-01).
A manually computed Julian date might also be handy, but the calculation must always assume 29 days for February, and the range query would have a chance of being off by one if the range includes the 29th of February.
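For illustration only, with a hypothetical dob_mmdd integer field holding month * 100 + day (so 603 for 06-03), a range covering June 1st to June 3rd could use the same filtered-query style as above:

"query": {
  "filtered": {
    "filter": {
      "range": {
        "dob_mmdd": {
          "gte": 601,
          "lte": 603
        }
      }
    }
  }
}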
Based on your question, I am assuming that you want a query and not a filter (they are different). You can use date math/formats combined with a range query.
See the Elasticsearch range query documentation for usage, and its date math section for an explanation of date math.
curl -XPOST 'http://localhost:9200/twitter/tweet/_search' -d '
{
  "query": {
    "range": {
      "birthday": {
        "gte": "2014-01-01",
        "lte": "2014-01-01"
      }
    }
  }
}'
I have tested this with the latest Elasticsearch.
If you don't want to parse strings, you can use a simple script. This filter will match when dateField has a specific month in any year:
"filter": {
"script": {
"lang": "expression",
"script": "doc['dateField'].getMonth() == month",
"params": {
"month": 05,
}
}
}
Note: the month parameter is 0-indexed.
The same method will work for day of month or any other date component.
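For example, to match a full birthday (month plus day of month) in a single expression script, something along these lines should work; dateField and the parameter values are placeholders, and the month is again 0-indexed:

"filter": {
  "script": {
    "lang": "expression",
    "script": "doc['dateField'].getMonth() == month && doc['dateField'].getDayOfMonth() == day",
    "params": {
      "month": 5,
      "day": 3
    }
  }
}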