I am currently getting data from an external API for use in my Laravel API. I have everything working, but it feels slow.
I'm getting the data from the API with Http::get('url'), and that part is fast. It is only when I start looping through the data and making edits that things slow down.
I don't need all the data, but it would still be nice to clean it up before inserting it into the database, if possible, since the source isn't very consistent. I also derive a few new columns from the data with some logic so that each app/site doesn't need to do that itself.
I am saving to the database on each iteration of the foreach loop with the Eloquent Model::updateOrCreate() method. It works, but these JSON files can easily be 6,000 lines long or more, so it obviously takes time to loop through each set, modify values, and then save to the database every time. There usually aren't more than 200 or so entries, but it still takes time. I will probably eventually switch to the new upsert() method to make fewer queries to the database. Running on my localhost it currently takes about a minute and a half, which just seems way too long.
Here is a shortened version of how I was looping through the data.
$json = json_decode($contents, true);
$features = $json['features'];

foreach ($features as $feature) {
    // Get ID
    $id = $feature['id'];

    // Get primary condition data
    $geometry = $feature['geometry'];
    $properties = $feature['properties'];

    // Get secondary geometry data
    $geometryType = $geometry['type'];
    $coordinates = $geometry['coordinates'];

    Model::updateOrCreate(
        [
            'id' => $id,
        ],
        [
            'coordinates' => $coordinates,
            'geometry_type' => $geometryType,
        ]
    );
}
Most of what I do to the data behind the scenes before it goes into the database is cleaning up text strings, plus a bit of logic to normalize or prep the data for the websites and apps.
Is there a more efficient way to get the same result? This will ultimately be used in a scheduler and run on an interval.
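For reference, here's a rough sketch of how I'm picturing the upsert() version, building the rows in memory and writing once at the end (this assumes Laravel 8+, a unique index on id, and that the string clean-up still happens inside the loop):
$json = json_decode($contents, true);

$rows = [];
foreach ($json['features'] as $feature) {
    // string clean-up / normalization would happen here before queuing the row
    $rows[] = [
        'id' => $feature['id'],
        // upsert() bypasses Eloquent casts, so JSON columns need encoding manually
        'coordinates' => json_encode($feature['geometry']['coordinates']),
        'geometry_type' => $feature['geometry']['type'],
    ];
}

// one bulk statement instead of ~200 individual updateOrCreate() queries
Model::upsert($rows, ['id'], ['coordinates', 'geometry_type']);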
Example data structure from the API documentation:
{
"$schema": "http://json-schema.org/draft-04/schema#",
"additionalProperties": false,
"properties": {
"features": {
"items": {
"additionalProperties": false,
"properties": {
"attributes": {
"type": [
"object",
"null"
]
},
"geometry": {
"additionalProperties": false,
"properties": {
"coordinates": {
"items": {
"items": {
"type": "number"
},
"type": "array"
},
"type": "array"
},
"type": {
"type": "string"
}
},
"required": [
"coordinates",
"type"
],
"type": "object"
},
"properties": {
"additionalProperties": false,
"properties": {
"currentConditions": {
"items": {
"properties": {
"additionalData": {
"type": "string"
},
"conditionDescription": {
"type": "string"
},
"conditionId": {
"type": "integer"
},
"confirmationTime": {
"type": "integer"
},
"confirmationUserName": {
"type": "string"
},
"endTime": {
"type": "integer"
},
"id": {
"type": "integer"
},
"sourceType": {
"type": "string"
},
"startTime": {
"type": "integer"
},
"updateTime": {
"type": "integer"
}
},
"required": [
"id",
"userName",
"updateTime",
"startTime",
"conditionId",
"conditionDescription",
"confirmationUserName",
"confirmationTime",
"sourceType",
"endTime"
],
"type": "object"
},
"type": "array"
},
"id": {
"type": "string"
},
"name": {
"type": "string"
},
"nameId": {
"type": "string"
},
"parentAreaId": {
"type": "integer"
},
"parentSubAreaId": {
"type": "integer"
},
"primaryLatitude": {
"type": "number"
},
"primaryLongitude": {
"type": "number"
},
"primaryMP": {
"type": "number"
},
"routeId": {
"type": "integer"
},
"routeName": {
"type": "string"
},
"routeSegmentIndex": {
"type": "integer"
},
"secondaryLatitude": {
"type": "number"
},
"secondaryLongitude": {
"type": "number"
},
"secondaryMP": {
"type": "number"
},
"sortOrder": {
"type": "integer"
}
},
"required": [
"id",
"name",
"nameId",
"routeId",
"routeName",
"primaryMP",
"secondaryMP",
"primaryLatitude",
"primaryLongitude",
"secondaryLatitude",
"secondaryLongitude",
"sortOrder",
"parentAreaId",
"parentSubAreaId",
"routeSegmentIndex",
"currentConditions"
],
"type": "object"
},
"type": {
"type": "string"
}
},
"required": [
"type",
"geometry",
"properties",
"attributes"
],
"type": "object"
},
"type": "array"
},
"type": {
"type": "string"
}
},
"required": [
"type",
"features"
],
"type": "object"
}
Second, related question.
Since this runs on an interval, it updates and creates records from the JSON data, but is there an efficient way to delete old records that are no longer in the JSON file? I currently get an array of the existing IDs, compare them to the new IDs, and then loop through and delete each one. There has to be a better way.
I don't have much to offer on your first question, but regarding the second one, you could try something like this:
SomeModel::query()->whereNotIn('id', $newIds)->delete();
You can collect $newIds during the first loop.
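For example (just a sketch, reusing the column names from your shortened loop):
$newIds = [];

foreach ($features as $feature) {
    $newIds[] = $feature['id'];

    Model::updateOrCreate(
        ['id' => $feature['id']],
        [
            'coordinates' => $feature['geometry']['coordinates'],
            'geometry_type' => $feature['geometry']['type'],
        ]
    );
}

// delete everything that is no longer present in the feed
Model::whereNotIn('id', $newIds)->delete();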
I followed https://www.elastic.co/guide/en/elasticsearch/reference/current/mapping-date-format.html but could not find the answer I needed.
I created the index and checked its mapping with
GET film/_mapping
Also, as per the accepted answer in "Elasticsearch: Datatype for time (HH:mm:ss.SSS) field",
I created my mapping as below.
PUT film/_mapping
{
"properties": {
"filmname": {
"type": "keyword"
},
"runtime": {
"type": "date",
"format": "hour_minute_second_fraction"
},
"genre": {
"type": "text"
},
"releasedate":{
"type": "date",
"format": "yyyy-MM-dd"
},
"budget":{
"type": "double"
}
}
}
I tried to add a document using the below request.
PUT film/_doc/1/
{
"film": "The Terminator",
"runtime": "12_50_30_40",
"genre": "SCIFI war Action",
"releasedate": "1984-11-30",
"budget" : "45.12"
}
But it throws the attached error. Actually, I only want to add this field in hour_minute_second format.
The format expected by hour_minute_second_fraction is HH:mm:ss.SSS, so in your case, it would mean 12:50:30.400
If you want to keep using 12_50_30_40, then you need to set your own format, like this:
"runtime": {
"type": "date",
"format": "HH_mm_ss_SS"
},
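With a custom format like that in place, the document from the question should then be accepted as-is, something along these lines (untested sketch reusing the request from the question):
PUT film/_doc/1
{
  "film": "The Terminator",
  "runtime": "12_50_30_40",
  "genre": "SCIFI war Action",
  "releasedate": "1984-11-30",
  "budget": "45.12"
}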
I am trying to create a mapping for my new index named region. Please find my mapping file below.
PUT region
{
"mappings": {
"doc": {
"properties": {
"catalog_product_id": {
"type": "long",
},
"id": {
"type": "long"
},
"region_id":{
"type": "text"
},
"region_type":{
"type" : "text"
}
}
}
}
}
}
While trying to execute this mapping script I am getting the below error:
was expecting double-quote to start field name
I have double-checked your mapping in my Kibana, and it is a JSON parsing error. Remove the , after "type": "long" and remove one } from the end, as follows:
PUT region
{
"mappings": {
"doc": {
"properties": {
"catalog_product_id": {
"type": "long"
},
"id": {
"type": "long"
},
"region_id":{
"type": "text"
},
"region_type":{
"type" : "text"
}
}
}
}
}
I have created an index in Elasticsearch with the following mapping:
{
"test": {
"mappings": {
"documents": {
"properties": {
"fields": {
"type": "nested",
"properties": {
"uid": {
"type": "keyword"
},
"value": {
"type": "text",
"copy_to": [
"fulltext"
]
}
}
},
"fulltext": {
"type": "text"
},
"tags": {
"type": "text"
},
"title": {
"type": "text",
"fields": {
"raw": {
"type": "keyword"
}
}
},
"url": {
"type": "text",
"fields": {
"raw": {
"type": "keyword"
}
}
}
}
}
}
}
}
While searching, I want to set a preference among fields: for example, if the search text is found in title or url, that document should come before the other documents.
Can we set a field preference for the order of search results (in my case, a preference like title, url, tags, fields)? How can I do this?
This is called "boosting" . Prior to elasticsearch 5.0.0 - boosting could be applied in indexing phase or query phase( added as part of field mapping ). This feature is deprecated now and all mappings after 5.0 are applied in query time .
Current recommendation is to to use query time boosting.
Please read this documents to get details on how to use boosting:
1 - https://www.elastic.co/guide/en/elasticsearch/guide/current/_boosting_query_clauses.html
2 - https://www.elastic.co/guide/en/elasticsearch/guide/current/_boosting_query_clauses.html
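For example, with the mapping above, a minimal query-time boosting sketch could look like this (the boost values are arbitrary, and the nested fields.value field would need a separate nested clause):
GET test/_search
{
  "query": {
    "multi_match": {
      "query": "search text",
      "fields": ["title^4", "url^3", "tags^2"]
    }
  }
}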
Is it possible to use multi-fields to set and query multilingual fields?
Consider this mapping:
PUT multi_test
{
"mappings": {
"data": {
"_field_names": {
"enabled": false
},
"properties": {
"book_title": {
"type": "text",
"fields": {
"english": {
"type": "text",
"analyzer": "english"
},
"german": {
"type": "text",
"analyzer": "german"
},
"italian": {
"type": "text",
"analyzer": "italian"
}
}
}
}
}
}
}
I tried the following, but it doesn't work:
PUT multi_test/data/1
{
"book_title.english": "It's good",
"book_title.german": "Das gut"
}
The error seems to indicate I'm trying to add new fields:
{ "error": { "root_cause": [ { "type": "mapper_parsing_exception",
"reason": "Could not dynamically add mapping for field
[book_title.english]. Existing mapping for [book_title] must be of
type object but found [text]." } ], "type":
"mapper_parsing_exception", "reason": "Could not dynamically add
mapping for field [book_title.english]. Existing mapping for
[book_title] must be of type object but found [text]." }, "status":
400 }
What am I doing wrong here?
If my approach is unworkable, what is a better way to do this?
The problem is that you are using fields (multi-fields) for the field book_title.
The fields keyword is used when you want to index the same field and data in multiple ways, i.e. with different analyzers or other setting changes, but the value is the same across all of the field names under fields. Here is the link describing multi-fields: https://www.elastic.co/guide/en/elasticsearch/reference/2.4/multi-fields.html
In your use case the mapping should be like below:
PUT multi_test
{
"mappings": {
"data": {
"_field_names": {
"enabled": false
},
"properties": {
"book_title": {
"properties": {
"english": {
"type": "text",
"analyzer": "english"
},
"german": {
"type": "text",
"analyzer": "german"
},
"italian": {
"type": "text",
"analyzer": "italian"
}
}
}
}
}
}
}
This will define book_title as an object type, and you can add multiple fields with different data under book_title.
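With that mapping, the document from the question can then be indexed, for example as a nested object (equivalent to the dotted field names):
PUT multi_test/data/1
{
  "book_title": {
    "english": "It's good",
    "german": "Das gut"
  }
}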