enableAutoTierToHotFromCool does not move blobs from cool to hot - azure-blob-storage

I have some Azure Storage Accounts (StorageV2) located in West Europe. All blobs uploaded are by default in the Hot tier and I have this lifecycle rule defined on them:
{
"rules": [
{
"enabled": true,
"name": "moveToCool",
"type": "Lifecycle",
"definition": {
"actions": {
"baseBlob": {
"enableAutoTierToHotFromCool": true,
"tierToCool": {
"daysAfterLastAccessTimeGreaterThan": 1
}
}
},
"filters": {
"blobTypes": [
"blockBlob"
]
}
}
}
]
}
Somehow the uploaded blobs are moved to cool, but after I access them again they still appear under the Cool tier in the portal. Any idea why? (I have waited more than 24 hours for the rule to take effect.)
Some more questions about "enableAutoTierToHotFromCool": true:
Does it depend on the blob size? (For example, if some blobs were moved to cool and are then accessed simultaneously, is the time until a 1 GiB blob is moved back to hot the same as for a 10 KiB blob?)
Does it depend on the number of blobs that are accessed? (Is there a queue, and if multiple blobs in the cool tier are accessed at the same time, are the requests served in queue order?)

The enableAutoTierToHotFromCool property is a Boolean value that indicates whether a blob should automatically be tiered from cool back to hot if it is accessed again after being tiered to cool.
Also, it can take up to 48 hours for a new or updated policy to take effect, and "enableAutoTierToHotFromCool": true does not depend on the size of the blob, nor on the number of blobs that are accessed.
If you enable firewall rules for your storage account, lifecycle management requests may be blocked. You can unblock these requests by providing exceptions for trusted Microsoft services. For more information, refer to the Exceptions section in Configure firewalls and virtual networks.
A lifecycle management policy must be read or written in full; partial updates are not supported. So try writing the full policy and adding a prefix filter, for example:
"prefixMatch": [
"containerName/log"
]
For more details, refer to the lifecycle management documentation.
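Separately, if you want to confirm what the service itself reports for a blob (rather than relying on the portal view), a minimal sketch with the azure-storage-blob Python SDK can read the current tier and its last change time, and set the blob back to Hot manually as a fallback. The connection string environment variable and the container/blob names below are placeholders, not from the original question.
import os

from azure.storage.blob import BlobServiceClient

# Placeholders: set AZURE_STORAGE_CONNECTION_STRING and use your own container/blob names.
service = BlobServiceClient.from_connection_string(os.environ["AZURE_STORAGE_CONNECTION_STRING"])
blob = service.get_blob_client(container="mycontainer", blob="myblob.bin")

props = blob.get_blob_properties()
print("Current tier:", props.blob_tier)               # e.g. "Hot" or "Cool"
print("Last tier change:", props.blob_tier_change_time)

# Fallback: move the blob back to Hot explicitly instead of waiting for
# enableAutoTierToHotFromCool to do it after an access.
if props.blob_tier == "Cool":
    blob.set_standard_blob_tier("Hot")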

Rate limit exceeded - only used on one computer

I'm getting an error with Plaid that my rate limit has been exceeded, since I have 5 items in use on my developer account. I have only used Plaid on my localhost from my browser, and the quickstart app to look up my actual accounts. I'm confused how it thinks these are new systems, and also how to release one of these items so it frees up developer slots. The documentation says I can hit the release route, but that doesn't restore an item slot.
Is there anything I'm missing?
{
"display_message": null,
"documentation_url": "https://plaid.com/docs/?ref=error#rate-limit-exceeded-errors",
"error_code": "ADDITION_LIMIT",
"error_message": "addition limit exceeded for this client_id. contact support to increase the limit.",
"error_type": "RATE_LIMIT_EXCEEDED",
"request_id": "#####",
"suggested_action": null
}
Ah yes you are not the first person to be confused by this! You need to request access to Development via the Plaid dashboard, which, once approved, will unlock access to 95 additional Items. You can do this here: https://dashboard.plaid.com/overview/development
The number of computers you are using doesn't matter; the only thing being counted is Items. Each Item takes up one slot, and in Development deleting Items does not free up slots.
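If you do want to clean up Items you no longer need (keeping in mind that, as noted above, this will not give you the Development slot back), the /item/remove endpoint can be called directly. A rough sketch with plain requests; the host and credential values below are placeholders.
import requests

# Placeholders: use your own credentials and the access_token of the Item to remove.
PLAID_HOST = "https://development.plaid.com"  # or https://sandbox.plaid.com
payload = {
    "client_id": "your-client-id",
    "secret": "your-development-secret",
    "access_token": "access-development-xxxxxxxx",
}

# /item/remove invalidates the access_token and removes the Item.
# Note: in Development, removing an Item does not free up an Item slot.
resp = requests.post(f"{PLAID_HOST}/item/remove", json=payload)
print(resp.status_code, resp.json())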

Any way to monitor NiFi Processors? Any Utility Dashboard?

If I have developed a NiFi flow and a support person wants to view its current state (which processor is currently running, which processor has already run, which ones have completed), how can they do that?
I mean, is there any dashboard-style utility provided by NiFi to monitor these activities?
You can use Reporting Tasks and NiFi itself (or a separate NiFi instance, which is what I chose).
To do that you must do the following:
Open the Reporting Tasks menu
Add the desired reporting tasks
Configure them properly
Then create a flow to manage the reporting data
In my case I am putting the information into Elasticsearch.
There are numerous ways to monitor NiFi flows and status. The status bar along the top of the UI shows running/stopped/invalid processor counts, cluster status, thread count, etc. The global menu at the top right has options for monitoring JVM usage, flowfiles processed/in/out, CPU, etc.
Each individual processor will show a status icon for running/stopped/invalid/disabled, and can be right-clicked for the same JVM usage, flowfile status, etc. graphs as the global view, but for the individual processor. There are also some Reporting Tasks provided by default to integrate with external monitoring systems, and custom reporting tasks can be written for any other desired visualization or monitoring dashboard.
NiFi doesn’t have the concept of batch/job processing, so processors aren’t “complete”.
1. In-built monitoring in Apache NiFi
Bulletin Board
The bulletin board shows the latest ERROR and WARNING bulletins generated by NiFi processors in real time. To access it, go to the right-hand drop-down menu and select the Bulletin Board option. It refreshes automatically, and a user can disable the refresh. A user can also navigate to the actual processor by double-clicking the error. Bulletins can be filtered by the following (see the sketch after this list):
by message
by name
by id
by group id
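If you prefer to pull the same bulletins from a script, the bulletin board is also exposed through the REST API. A hedged sketch, assuming an unsecured NiFi at localhost:8080 (the filter parameters mirror the list above):
import requests

# Unsecured NiFi at localhost:8080 assumed; add auth for a secured instance.
resp = requests.get(
    "http://localhost:8080/nifi-api/flow/bulletin-board",
    params={"limit": 20},  # other filters: message, sourceName, sourceId, groupId
)
resp.raise_for_status()

for entry in resp.json()["bulletinBoard"]["bulletins"]:
    b = entry.get("bulletin", {})
    print(b.get("timestamp"), b.get("level"), b.get("sourceName"), "-", b.get("message"))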
Data Provenance UI
To monitor the events occurring on any specific processor or throughout NiFi, a user can access the Data Provenance UI from the same menu as the bulletin board. Events in the data provenance repository can be filtered by the following fields:
by component name
by component type
by type
NiFi Summary UI
The Apache NiFi summary can also be accessed from the same menu as the bulletin board. This UI contains information about all the components of that particular NiFi instance or cluster. They can be filtered by name, by type, or by URI, and there are different tabs for different component types. The following components can be monitored in the NiFi Summary UI:
Processors
Input ports
Output ports
Remote process groups
Connections
Process groups
In this UI, there is a link named system diagnostics at the bottom right-hand side to check the JVM statistics.
2. Reporting Tasks
Apache NiFi provides multiple reporting tasks to support external monitoring systems like Ambari, Grafana, etc. A developer can create a custom reporting task or configure the built-in ones to send NiFi metrics to external monitoring systems. The following list shows the reporting tasks offered by NiFi 1.7.1:
AmbariReportingTask - to set up the Ambari Metrics Service for NiFi.
ControllerStatusReportingTask - to report the information from the NiFi Summary UI for the last 5 minutes.
MonitorDiskUsage - to report and warn about the disk usage of a specific directory.
MonitorMemory - to monitor the amount of Java heap used in a JVM memory pool.
SiteToSiteBulletinReportingTask - to report the errors and warnings in bulletins using the Site-to-Site protocol.
SiteToSiteProvenanceReportingTask - to report NiFi Data Provenance events using the Site-to-Site protocol.
3. NiFi API
There is an API endpoint named system-diagnostics, which can be used to monitor NiFi stats in any custom-developed application.
Request
http://localhost:8080/nifi-api/system-diagnostics
Response
{
"systemDiagnostics": {
"aggregateSnapshot": {
"totalNonHeap": "183.89 MB",
"totalNonHeapBytes": 192819200,
"usedNonHeap": "173.47 MB",
"usedNonHeapBytes": 181894560,
"freeNonHeap": "10.42 MB",
"freeNonHeapBytes": 10924640,
"maxNonHeap": "-1 bytes",
"maxNonHeapBytes": -1,
"totalHeap": "512 MB",
"totalHeapBytes": 536870912,
"usedHeap": "273.37 MB",
"usedHeapBytes": 286652264,
"freeHeap": "238.63 MB",
"freeHeapBytes": 250218648,
"maxHeap": "512 MB",
"maxHeapBytes": 536870912,
"heapUtilization": "53.0%",
"availableProcessors": 4,
"processorLoadAverage": -1,
"totalThreads": 71,
"daemonThreads": 31,
"uptime": "17:30:35.277",
"flowFileRepositoryStorageUsage": {
"freeSpace": "286.93 GB",
"totalSpace": "464.78 GB",
"usedSpace": "177.85 GB",
"freeSpaceBytes": 308090789888,
"totalSpaceBytes": 499057160192,
"usedSpaceBytes": 190966370304,
"utilization": "38.0%"
},
"contentRepositoryStorageUsage": [
{
"identifier": "default",
"freeSpace": "286.93 GB",
"totalSpace": "464.78 GB",
"usedSpace": "177.85 GB",
"freeSpaceBytes": 308090789888,
"totalSpaceBytes": 499057160192,
"usedSpaceBytes": 190966370304,
"utilization": "38.0%"
}
],
"provenanceRepositoryStorageUsage": [
{
"identifier": "default",
"freeSpace": "286.93 GB",
"totalSpace": "464.78 GB",
"usedSpace": "177.85 GB",
"freeSpaceBytes": 308090789888,
"totalSpaceBytes": 499057160192,
"usedSpaceBytes": 190966370304,
"utilization": "38.0%"
}
],
"garbageCollection": [
{
"name": "G1 Young Generation",
"collectionCount": 344,
"collectionTime": "00:00:06.239",
"collectionMillis": 6239
},
{
"name": "G1 Old Generation",
"collectionCount": 0,
"collectionTime": "00:00:00.000",
"collectionMillis": 0
}
],
"statsLastRefreshed": "09:30:20 SGT",
"versionInfo": {
"niFiVersion": "1.7.1",
"javaVendor": "Oracle Corporation",
"javaVersion": "1.8.0_151",
"osName": "Windows 7",
"osVersion": "6.1",
"osArchitecture": "amd64",
"buildTag": "nifi-1.7.1-RC1",
"buildTimestamp": "07/12/2018 12:54:43 SGT"
}
}
}
}
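To consume this endpoint from a script rather than a browser, a small sketch along these lines pulls out a few of the fields shown above (again assuming an unsecured NiFi at localhost:8080; a secured instance would also need a token or client certificate):
import requests

# Unsecured NiFi assumed; adjust host/port and add auth for your instance.
resp = requests.get("http://localhost:8080/nifi-api/system-diagnostics")
resp.raise_for_status()
snapshot = resp.json()["systemDiagnostics"]["aggregateSnapshot"]

print("Heap utilization:", snapshot["heapUtilization"])
print("Total threads   :", snapshot["totalThreads"])
print("Uptime          :", snapshot["uptime"])

for gc in snapshot["garbageCollection"]:
    print(f"GC {gc['name']}: {gc['collectionCount']} collections in {gc['collectionMillis']} ms")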
You can also use the nifi-api for monitoring. It returns detailed information about each process group, controller service, and processor.
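For example, the flow status endpoint reports the run status and five-minute throughput counters for every processor, which covers the "which processor is currently running" part of the question. A hedged sketch (unsecured NiFi at localhost:8080 assumed; "root" is an alias for the root process group):
import requests

# Unsecured NiFi at localhost:8080 assumed; "root" aliases the root process group.
resp = requests.get("http://localhost:8080/nifi-api/flow/process-groups/root/status")
resp.raise_for_status()
snapshot = resp.json()["processGroupStatus"]["aggregateSnapshot"]

# Processors directly in the root group; nested groups appear under
# "processGroupStatusSnapshots" and can be walked the same way.
for entity in snapshot.get("processorStatusSnapshots", []):
    proc = entity["processorStatusSnapshot"]
    # runStatus is Running / Stopped / Invalid / Disabled; input/output are 5-minute counters.
    print(f"{proc['name']:<40} {proc['runStatus']:<9} in={proc['input']} out={proc['output']}")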
You can use MonitoFi. It is a highly configurable open source tool that uses the nifi-api to collect stats about various NiFi processors and stores them in InfluxDB. It also comes with Grafana dashboards and alerting functionality.
http://www.monitofi.com
or
https://github.com/microsoft/MonitoFi

Is it possible to ask Terraform to destroy AWS nodes with known IPs

We use Terraform to create and destroy a Mesos DC/OS cluster on AWS EC2. The number of agent nodes is defined in a variable.tf file:
variable "instance_counts" {
type = "map"
default = {
master = 1
public_agent = 2
agent = 5
}
}
Once the cluster is up, you can add or remove agent nodes by changing the number of agents in that file and applying again. Terraform is smart enough to recognize the difference and act accordingly. When it destroys nodes, it tends to go for the highest-numbered ones. For example, if I have an 8-node DC/OS cluster and want to terminate 2 of the agents, Terraform would take down dcos_agent_node-6 and dcos_agent_node-7.
What if I want to destroy an agent with a particular IP? Terraform must be aware of the IPs because it knows the order of the instances. How do I hack Terraform to remove agents by providing the IPs?
I think you're misunderstanding how Terraform works.
Terraform takes your configuration and builds out a dependency graph of how to create the resources described in the configuration. If it has a state file, it then overlays information from the provider (such as AWS) to see what is already created and managed by Terraform, removes that from the plan, and potentially creates destroy plans for resources that exist in the provider and state file but are no longer in the configuration.
So if you have a configuration with a 6 node cluster and a fresh field (no state file, nothing built by Terraform in AWS) then Terraform will create 6 nodes. If you then set it to have 8 nodes then Terraform will attempt to build a plan containing 8 nodes, realises it already has 6 and then creates a plan to add the 2 missing nodes. When you then change your configuration back to 6 nodes Terraform will build a plan with 6 nodes, realise you have 8 nodes and create a destroy plan for nodes 7 and 8.
To try and get it to do anything different to that would involve some horrible hacking of the state file so that it thinks that nodes 7 and 8 are different to the ones most recently added by Terraform.
As an example your state file might look something like this:
{
"version": 3,
"terraform_version": "0.8.1",
"serial": 1,
"lineage": "7b565ca6-689a-4aab-a3ec-a1ed77e83678",
"modules": [
{
"path": [
"root"
],
"outputs": {},
"resources": {
"aws_instance.test.0": {
"type": "aws_instance",
"depends_on": [],
"primary": {
"id": "i-01ee444f57aa32b8e",
"attributes": {
...
},
"meta": {
"schema_version": "1"
},
"tainted": false
},
"deposed": [],
"provider": ""
},
"aws_instance.test.1": {
"type": "aws_instance",
"depends_on": [],
"primary": {
"id": "i-07c1999f1109a9ce2",
"attributes": {
...
},
"meta": {
"schema_version": "1"
},
"tainted": false
},
"deposed": [],
"provider": ""
}
},
"depends_on": []
}
]
}
If I wanted to go back to a single instance instead of 2, then Terraform would attempt to remove the i-07c1999f1109a9ce2 instance, as the configuration is telling it that aws_instance.test.0 should exist but not aws_instance.test.1. To get it to remove i-01ee444f57aa32b8e instead, I could edit my state file to flip the two around, and then Terraform would think that that instance should be removed instead.
However, you're getting into very difficult territory as soon as you start doing things like that and hacking the state file. While it's something you can do (and occasionally may need to) you should seriously consider how you are working if this is anything other than a one off case for a special reason (such as moving raw resources into modules - now made easier with Terraform's state mv command).
In your case I'd question why you need to remove two specific nodes in a Mesos cluster rather than just specifying the size of the Mesos cluster. If it's a case of a specific node being bad then I'd always terminate it and allow Terraform to build me a fresh, healthy one anyway.
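To make the "flip the two around" idea concrete, here is a rough Python sketch for the v3 state file format shown above. It is illustration only, not a recommended workflow: back up the state first, and prefer terraform state mv (or simply replacing the bad node) wherever possible.
import json
import shutil

# Illustration only: swap which real EC2 instance Terraform records as index 0 vs 1,
# so that reducing the count destroys the other machine. Back up the state first.
shutil.copy("terraform.tfstate", "terraform.tfstate.manual-backup")

with open("terraform.tfstate") as f:
    state = json.load(f)

resources = state["modules"][0]["resources"]
resources["aws_instance.test.0"], resources["aws_instance.test.1"] = (
    resources["aws_instance.test.1"],
    resources["aws_instance.test.0"],
)

# Bump the serial so Terraform treats this as a newer revision of the state.
state["serial"] += 1

with open("terraform.tfstate", "w") as f:
    json.dump(state, f, indent=4)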

Resource Identifiers between two FHIR servers

Our scenario is that an EHR system is integrating with a device sensor partner using FHIR. Both companies will have independent FHIR servers. Each of them has different Patient and Organization records with their own identifiers. The preference is that the sensor FHIR server keeps the mapping of EHR identifiers to its own internal identifiers for these resources.
The EHR wants to assign a Patient to a Device with the sensor FHIR server.
Step 1: First the EHR would #GET, from the sensor FHIR server, the list of Device resources for a given Organization where a Patient is not currently assigned, e.g.
/api/Device?organization.identifier=xyz&patient:missing=true
Here I would assume the Organization identifier is that of the EHR system since the EHR system doesn't have knowledge of the sensor system Organization identifier at this point.
The reply to this call would be a bundle of devices:
... snip ...
"owner": {
"reference": "http://sensor-server.com/api/Organization/3"
},
... snip ...
Question 2: Would the owner Organization reference have the identifier from the search or the internal/logical ID as it's known by the sensor FHIR server as in the snippet above?
Step 2: The clinician of the EHR system chooses a Device from the list to assign it to a Patient in the EHR system
Step 3: The EHR system will now issue a #PUT /api/Device/{id} request back to the sensor FHIR server to assign a Patient resource to a Device resource e.g.
{
"resourceType": "Device",
"owner": {
"reference": "http://sensor-server.com/api/Organization/3"
},
"id": "b4994c31f906",
"patient": {
"reference": "https://ehr-server.com/api/Patient/4754475"
},
"identifier": [
{
"use": "official",
"system": "bluetooth",
"value": "b4:99:4c:31:f9:06",
"label": "Bluetooth address"
}
]
}
Question 3: What resource URI/identifier should be used for the Patient resource? I would assume it is that of the EHR system since the EHR system doesn't have knowledge of the sensor system Patient identifier. Notice however, that the Organization reference is to a URI in the sensor FHIR server while the Patient reference is a URI to the EHR system - this smells funny.
Step 4: The EHR can issue a #GET /api/Device/{id} on the sensor FHIR server and get back the Device resource e.g.
{
"resourceType": "Device",
"owner": {
"reference": "http://sensor-server.com/api/Organization/3"
},
"id": "b4994c31f906",
"patient": {
"reference": "https://sensor-server.com/api/Patient/abcdefg"
},
"identifier": [
{
"use": "official",
"system": "bluetooth",
"value": "b4:99:4c:31:f9:06",
"label": "Bluetooth address"
}
]
}
Question 4: Would we expect to see a reference to the Patient containing the absolute URI to the EHR FHIR server (as it was on the #PUT in Step 3), or would/could the sensor FHIR server have modified that to return a reference to a resource in its own FHIR server using its internal logical ID?
I didn't see a Question 1, so I'll presume it's the "assume" sentence in front of your first example. If the EHR is querying the device sensor server and the organizations on the device sensor server include the business identifier known by the EHR, then that's reasonable. You would need some sort of business process to ensure that occurs though.
Question 2: The device owner element would be using a resource reference, which means it's pointing to the "id" element of the target organization. Think of resource ids as primary keys. They're typically assigned by the server that's storing the data, though in some architectures they can be set by the client (who creates the record using PUT instead of POST). In any event, you can't count on them being meaningful business identifiers, and according to most data storage best practices, they generally shouldn't be. And if, as I expect, your scenario involves multiple EHR clients potentially talking to the "device" server, the resource id couldn't possibly align with the business ids of all of the EHRs. (That's a long way of saying no, 'xyz' probably won't be '3'.)
Question 3: If the EHR has its own server, the EHR client could update the device on the "sensor" server to point to a URL on the EHR server. Whether that's appropriate or not depends on your architecture. If you want other EHRs to recognize the patient, then you'd probably want the "sensor" server to host patients too and for the EHR to look up the patient by business id and then reference the "sensor" server's URL. If not, then pointing to the EHR server's URL is fine.
Question 4: When you do a "GET", it's normal to receive back the same data you specified on a POST. It's legal for the server to change the data, including possibly updating references, but that's likely to confuse a lot of client systems, so it's not generally recommended or typical.
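To make Steps 1 and 3 concrete, here is a rough sketch of the two calls against the sensor FHIR server, reusing the illustrative URLs and identifiers from the question. A real client would also need authentication, and the exact FHIR JSON media type depends on the FHIR version in play.
import requests

SENSOR_BASE = "http://sensor-server.com/api"     # illustrative base URL from the question
HEADERS = {"Accept": "application/fhir+json", "Content-Type": "application/fhir+json"}

# Step 1: search the sensor server for the organization's devices with no patient assigned.
bundle = requests.get(
    f"{SENSOR_BASE}/Device",
    params={"organization.identifier": "xyz", "patient:missing": "true"},
    headers=HEADERS,
).json()
device = bundle["entry"][0]["resource"]          # Step 2: a clinician would pick one of these

# Step 3: PUT the chosen Device back with a reference to the selected Patient.
device["patient"] = {"reference": "https://ehr-server.com/api/Patient/4754475"}
resp = requests.put(f"{SENSOR_BASE}/Device/{device['id']}", json=device, headers=HEADERS)
print(resp.status_code)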

Why isn't the Google QPX Express API returning results for all airlines?

I enabled access to the Google QPX Express API to do some analytics on the prices of Delta's tickets and Fare Classes. But the response seems to only include flights from a limited set of airlines.
For example, the following request
{
"request": {
"passengers": {
"adultCount": 1
},
"slice": [
{
"origin": "JFK",
"destination": "SFO",
"date": "2015-02-15",
"maxStops": 0
}
],
"solutions": 500
}
}
only returns flights for AS (Alaska Airlines), US (US Air), VX (Virgin America), B6 (JetBlue), and UA (United Airlines).
If I add "permittedCarriers": [DL], then I get an empty response. Likewise, I get an empty response if I leave out permittedCarriers and look for flights between Delta hubs (e.g., "origin": "ATL", "destination": "MSP").
The documentation suggests that QPX Express is supposed to have most airline tickets available. Is there something wrong with my request? Why am I not seeing any results for Delta?
I received a response from Google's QPX Express help team about the missing data for Delta. The response was that:
Delta's data, as well as American Airlines' data, is not included in QPX Express search results as a default. Access to their data requires approval by those carriers.
After informing him that my plans to use the data were for research purposes, he responded:
American and Delta restrict access to their pricing and availability to companies which they approve, which are primarily organizations driving the sale of airline tickets. Unfortunately, requests for access are only being reviewed for companies that plan to use the API for commercial purposes.
