Kibana 4 doesn't recognize my timestamp, ES is mapping it as a string - elasticsearch

I want to push data I have in my Hadoop cluster to ES and then visualize the whole thing in Kibana 4.
This is what I've done:
1)
CREATE TABLE xx(traffic_type_id INT, caller INT, time STRING, tranche_horaire INT, called INT, call_duration INT, code_type_trafic STRING, code_destination_trafic STRING, location_number STRING, id_offre INT, id_service INT)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';
LOAD DATA INPATH '/user/hive/outt.csv' OVERWRITE INTO TABLE xx;
2)
CREATE EXTERNAL TABLE esxx (caller INT, time STRING, tranche INT, called_number INT, duration INT, code_type STRING, code_destination STRING, location STRING, offre INT, service INT)
STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler'
TBLPROPERTIES('es.resource' = 'xx/xx',
'es.nodes'='192.168.238.130:9200',
'es.mapping.names' = 'time:#timestamp');
INSERT OVERWRITE TABLE esxx SELECT s.caller, s.time, s.tranche_horaire, s.called, s.call_duration, s.code_type_trafic, s.code_destination_trafic, s.location_number, s.id_offre, s.id_service FROM xx s;
3)
CREATE EXTERNAL TABLE xx (
caller INT,
time TIMESTAMP,
tranche INT,
called_number INT,
duration INT,
code_type STRING,
code_destination STRING,
location STRING,
offre INT,
service INT)
STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler'
TBLPROPERTIES('es.resource' = 'xx/xx/',
'es.nodes'='192.168.238.130:9200',
'es.mapping.names' = 'time:#timestamp');
But Kibana doesn't seem to recognize my timestamp "time"; ES keeps mapping it as a string (the time field in my CSV file looks like this: 01AUG2014:19:02:11). What should I do and change so that ES does the appropriate mapping and thus recognizes my timestamp?
Best regards,
Omar,

If I were you I would convert this strange timestamp format to basic ISO8601 on the fly while importing, so that your timestamps look like 2014-08-01T19:02:11Z (or +HH:MM for whatever timezone you have your time in; I have no way to tell).
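A minimal sketch of that conversion, reusing the table names from the question; the pattern string passed to unix_timestamp is an assumption about your ddMMMyyyy:HH:mm:ss format and may need adjusting, and appending 'Z' assumes the source times are UTC:
INSERT OVERWRITE TABLE esxx
SELECT s.caller,
       -- reformat 01AUG2014:19:02:11 into 2014-08-01T19:02:11Z while loading
       from_unixtime(unix_timestamp(s.time, 'ddMMMyyyy:HH:mm:ss'), "yyyy-MM-dd'T'HH:mm:ss'Z'") AS time,
       s.tranche_horaire, s.called, s.call_duration, s.code_type_trafic,
       s.code_destination_trafic, s.location_number, s.id_offre, s.id_service
FROM xx s;
Once the field arrives as an ISO 8601 string, Elasticsearch's default date detection should map it as a date (or you can pin it down with an explicit mapping or index template), and Kibana 4 will then offer it as a time field.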

Related

Spark repartitionAndSortWithinPartitions with tuples

I'm trying to follow this example to partition HBase rows: https://www.opencore.com/blog/2016/10/efficient-bulk-load-of-hbase-using-spark/
However, I already have data stored as (String, String, String), where the first element is the row key, the second is the column name, and the third is the column value.
I tried writing an implicit ordering to provide the implicit Ordering that OrderedRDDFunctions requires:
implicit val caseInsensitiveOrdering: Ordering[(String, String, String)] = new Ordering[(String, String, String)] {
override def compare(x: (String, String, String), y: (String, String, String)): Int = ???
}
but repartitionAndSortWithinPartitions is still not available. Is there a way I can use this method with this tuple?
The RDD must contain keys and values, not only values. For example:
val data = List((("5", "6", "1"), 1))
val rdd: RDD[((String, String, String), Int)] = sparkContext.parallelize(data)
// Dummy ordering for illustration; replace compare with real comparison logic.
implicit val caseInsensitiveOrdering = new Ordering[(String, String, String)] {
  override def compare(x: (String, String, String), y: (String, String, String)): Int = 1
}
rdd.repartitionAndSortWithinPartitions(..)

What's the format of a timestamp when writing a Parquet file in Go

I am trying to write a Go struct to a Parquet file and upload it to S3. What format and type do I specify for the timestamp parameter in the struct so that Athena displays the correct timestamp when reading from the Parquet file?
type example struct {
    ID        int64 `parquet:"name=id, type=INT64"`
    CreatedAt int64 `parquet:"name=created_at,type=TIMESTAMP_MILLIS"`
}
ex := example{}
ex.ID = int64(10)
ex.CreatedAt = time.Now().Unix()
fw, err := ParquetFile.NewLocalFileWriter("new.parquet")
pw, err := ParquetWriter.NewParquetWriter(fw, new(example), 1)
pw.Write(ex)
Upload the file new.parquet to S3
Reference - https://github.com/xitongsys/parquet-go. I created a table in Athena with an int and a timestamp field for the same and tried querying the table. The date shows up as something like 1970-01-18 21:54:23.751, which is nowhere near the current timestamp.
TIMESTAMP_MILLIS expects milliseconds since the epoch, but time.Now().Unix() returns seconds, so Athena reads the value as a date in early 1970. Convert to milliseconds instead. For example,
package main

import (
    "fmt"
    "time"
)

func main() {
    type example struct {
        CreatedAt int64 `parquet:"name=created_at,type=TIMESTAMP_MILLIS"`
    }
    ex := example{}
    ex.CreatedAt = time.Now().UnixNano() / int64(time.Millisecond)
    fmt.Println(ex.CreatedAt)
}
Playground: https://play.golang.org/p/ePOlUKiT6fD
Output:
1257894000000

Can't store byte array in google's datastore

I'm using Google's datastore in my Go app. I have a Song struct, which has a uuid.UUID field.
type Song struct {
    ID    uuid.UUID
    Title string
    ...
}
This UUID is taken from github.com/satori/go.uuid and is defined as
type UUID [16]byte
It seems that datastore can't handle byte arrays, only byte slices or strings. With the json package I can use a tag to interpret it as a string:
type Song struct {
    ID uuid.UUID `json:"id,string"`
    ....
}
Is there a way of telling datastore to interpret the UUID as a slice/string or do I either have to give up "type"-safety and just store a string or use a custom PropertyLoadSaver?
Per Google's Documentation:
Valid value types are:
signed integers (int, int8, int16, int32 and int64),
bool,
string,
float32 and float64,
[]byte (up to 1 megabyte in length),
any type whose underlying type is one of the above predeclared types,
ByteString,
*Key,
time.Time (stored with microsecond precision),
appengine.BlobKey,
appengine.GeoPoint,
structs whose fields are all valid value types,
slices of any of the above.
So, you will have to use a byte slice or a string. You could do some behind-the-scenes manipulation when you need to do your setting or getting, like this (Playground Example):
type uuid [16]byte

type song struct {
    u []byte
}

func main() {
    var b [16]byte
    copy(b[:], "0123456789012345")
    var u uuid = uuid(b) // this would represent when you get the uuid
    s := song{u: []byte(u[:])}
    copy(b[:], s.u)
    u = uuid(b)
    fmt.Println(u)
}
This could also be done through methods. (Playground example)
Alternatively, you could have an entity specific to the datastore that carries the byte slice, and the transformers that go to and from that entity know how to do the conversion.
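A rough sketch of that idea, assuming the satori/go.uuid package from the question; songEntity, toEntity, and fromEntity are made-up names for illustration:
package main

import (
    "fmt"

    uuid "github.com/satori/go.uuid"
)

// Song is the type the rest of the application works with.
type Song struct {
    ID    uuid.UUID
    Title string
}

// songEntity is what actually goes into the datastore: the UUID travels
// as a []byte, which is a valid datastore value type.
type songEntity struct {
    ID    []byte
    Title string
}

// toEntity converts the application type into the storable form.
func toEntity(s Song) songEntity {
    return songEntity{ID: s.ID.Bytes(), Title: s.Title}
}

// fromEntity converts the stored form back, restoring the [16]byte UUID.
func fromEntity(e songEntity) (Song, error) {
    id, err := uuid.FromBytes(e.ID)
    if err != nil {
        return Song{}, err
    }
    return Song{ID: id, Title: e.Title}, nil
}

func main() {
    var s Song
    copy(s.ID[:], "0123456789012345") // stand-in for a real UUID value
    s.Title = "example"

    e := toEntity(s)           // this is what you would Put into the datastore
    back, err := fromEntity(e) // and this is what you would get back after a Get
    if err != nil {
        fmt.Println(err)
        return
    }
    fmt.Println(back.ID, back.Title)
}
The datastore Put and Get would operate on songEntity, while the rest of the code keeps working with the typed Song.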

struct time.Time property doesn't load using Go's sqlx library

I have a struct with a time property:
type Basket struct {
    ...
    Created_at time.Time `db:"created_at"`
}
with the time saved as:
basket.Created_at = time.Now().UTC()
If I save it using an INSERT SQL statement, it saves the time nicely in SQLite3, but when I select the desired record using:
ret_basket := Basket{}
err := database.DB.Get(&ret_basket, "SELECT id, ..., created_at FROM baskets WHERE user_id = ?", some_user_id)
It returns the record with the other properties loaded properly, except the time property, ret_basket.Created_at, which comes back as 0001-01-01 00:00:00 +0000 UTC.
Any suggestions?
There is no official package for SQLite, so I assume you are using https://github.com/mattn/go-sqlite3. Your issue is probably the result of a wrong created_at field declaration in the database; it should be DATETIME, because the following code works perfectly on my machine (I've removed all error checks):
package main

import (
    "github.com/jmoiron/sqlx"
    _ "github.com/mattn/go-sqlite3"
    "log"
    "time"
)

type Post struct {
    Id      int64     `db:"post_id"`
    Created time.Time `db:"created"`
}

func main() {
    db, _ := sqlx.Connect("sqlite3", "post_db.db")
    db.MustExec("DROP TABLE IF EXISTS posts; CREATE TABLE posts (post_id INT, created DATETIME);")
    p1 := Post{Id: 1, Created: time.Now().UTC()}
    p2 := Post{}
    tx := db.MustBegin()
    tx.NamedExec("INSERT INTO posts (post_id, created) VALUES (:post_id, :created)", &p1)
    tx.Commit()
    db.Get(&p2, "SELECT post_id, created FROM posts WHERE post_id = $1", p1.Id)
    log.Println(p2.Created.Format("2006-01-02"))
}

Deletion of folder on Amazon S3 while creating external table

We are getting very unusual behavior on our S3 bucket, and this behavior is not consistent, so we are not able to pinpoint the problem. Coming to the issue: I fire one query (creation of an external table), which leads to deletion of the folder I was pointing to in the external table. This has happened 3-4 times to us. Could you please explain this behaviour? For your convenience I am attaching the external table query and the log of the operations performed on the S3 bucket.
Query:
create table apr_2(date_local string, time_local string,s_computername string,c_ip string,s_ip string,s_port string,s_sitename string, referer string, localfile string, TimeTakenMS string, status string, w3status string, sc_substatus string, uri string, qs string, sc_bytes string, cs_bytes string, cs_username string, cs_User_Agent string, s_proxy string, c_protocol string, cs_version string, cs_method string, cs_Cookie string, cs_Host string, w3wpbytes string, RequestsPerSecond string, CPU_Utilization string, BeginRequest_UTC string, EndRequest_UTC string, time string, logdate string)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\001' location 's3://logs/apr_2_com'
logs:
REST.DELETE.OBJECT logs/apr_2_com/000002.tar.gz
REST.DELETE.OBJECT logs/apr_2_com/000001.tar.gz
Try using this syntax instead. The key difference is the external keyword: Hive treats the data under a managed (non-external) table's location as its own and may delete it, whereas for an external table it leaves the underlying S3 objects alone -
create external table if not exists apr_2(date_local string, time_local string,s_computername string,c_ip string,s_ip string,s_port string,s_sitename string, referer string, localfile string, TimeTakenMS string, status string, w3status string, sc_substatus string, uri string, qs string, sc_bytes string, cs_bytes string, cs_username string, cs_User_Agent string, s_proxy string, c_protocol string, cs_version string, cs_method string, cs_Cookie string, cs_Host string, w3wpbytes string, RequestsPerSecond string, CPU_Utilization string, BeginRequest_UTC string, EndRequest_UTC string, time string, logdate string)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\001' location 's3://logs/apr_2_com'
