I'm trying to use CSVLogger to log metrics during training. Although I see the loss for both train and valid, only the train metrics (e.g. accuracy) appear in the saved log file at each epoch.
How can I extract the validation metrics as well?
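Here is a minimal sketch (assumed, not the poster's actual code) of the kind of setup being described, where model, x_train, y_train, x_valid and y_valid are already defined; the expectation is that val_loss and val_accuracy columns also show up in training.log:

# minimal Keras setup with CSVLogger; the model and the data arrays are assumed to exist
from tensorflow.keras.callbacks import CSVLogger

csv_logger = CSVLogger("training.log")

model.compile(optimizer="adam",
              loss="categorical_crossentropy",
              metrics=["accuracy"])

model.fit(x_train, y_train,
          validation_data=(x_valid, y_valid),  # validation data is needed for any val_* columns
          epochs=10,
          callbacks=[csv_logger])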
I just started trying to integrate Micrometer, Prometheus and Grafana into my microservices. At first glance, it is very easy to use and there are many existing dashboards you can rely on. But the more I test, the more confusing it gets. Maybe I don't understand the main idea behind this technology stack.
I would like to start my custom Grafana dashboard by showing the number of requests per endpoint for the selected time range (as a single stat), but I am not able to find the right query for that (and I am not sure one exists).
I tried different queries, such as:
http_server_requests_seconds_count{uri="/users"}
which always shows the current counter value. For example, if I sent 10 requests 30 minutes ago, this query still returns 10 when I change the time range to the last 5 minutes (even though no requests entered the system during those 5 minutes).
When I use
increase(http_server_requests_seconds_count{uri="/users"}[$__range])
the query does not return an exact value, only something close to the actual number of requests. At least it works for a time range that doesn't include new incoming requests: in that case the query returns 0.
So my question is: is there a way to use this technology stack to get the number of new requests for the selected period of time?
For the sake of performance when operating with millions of time series, many Prometheus functions return approximate and/or interpolated values. For example, the increase() function is basically the per-second rate() multiplied by the number of seconds in the interval. With such a formula and possibly missing data points, an exact result is the exception rather than the norm.
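For illustration, here is that relationship written out in PromQL, reusing the metric from the question with an assumed 1-hour window:

# increase() is essentially the per-second rate() multiplied by the window length in
# seconds, so over a 1-hour window these two expressions return roughly the same value
increase(http_server_requests_seconds_count{uri="/users"}[1h])
rate(http_server_requests_seconds_count{uri="/users"}[1h]) * 3600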
The reason is that Prometheus trades accuracy for performance and reliability. It doesn't really matter whether your server's actual CPU usage is 86.3% or 86.4%, but it does matter that you can get this information instantly. Prometheus even has this statement in its docs:
Prometheus values reliability. You can always view what statistics are available about your system, even under failure conditions. If you need 100% accuracy, such as for per-request billing, Prometheus is not a good choice as the collected data will likely not be detailed and complete enough. In such a case you would be best off using some other system to collect and analyze the data for billing, and Prometheus for the rest of your monitoring.
That being said, if you really need accurate values, consider using something else. You can, for example, store logs and count lines (Grafana Loki, the Elastic Stack), or write and retrieve this information from a traditional database with your own solution.
I am beginning to train a semantic segmentation model in AWS SageMaker and they provide the following metrics for the output. I understand mIOU, loss, and pixel accuracy, but I do not know what throughput is or how to interpret it. Please see the image below and let me know if you need additional information.
Throughput is reported in records per second (i.e. images per second). It shows how fast the algorithm can iterate over training or validation data. For example, with a throughput of 30 records/sec it would take a minute to iterate over 1800 images.
I am quite new to "Big Data" technologies, especially Cassandra, so I need your advice on the task I have to do. I have been looking at the DataStax examples on handling time series, and at different discussions here on this topic, but if you think I might have missed something, feel free to tell me.
Here is my problem.
I need to store and analyze data coming from about 100 sensor stations that we are testing. In each sensor station, we have several thousand sensors. So for each station, we run several tests (about 10, each lasting about 2h30), during which the sensors record a value every millisecond (boolean, integer or float). The records of each test are kept on the station during the test, then they are sent to me once the test is completed. This means about 10 GB per test (each parameter amounts to about 1 MB of information).
Here is a schema to illustrate the hierarchy:
Hierarchy description
Right now, I have access to a small Hadoop cluster with Spark and Cassandra for testing. I may be able to install other tools, but I would really appreciate being able to keep working with Spark/Cassandra.
My question is: what could be the best data model for storing then analyzing the information coming from these sensors?
By “analyzing”, I mean:
find the min, max and average value of a specific parameter recorded by a specific sensor on a specific station; or find those values for a specific parameter across all stations; or find those values for a specific parameter only when one or two other parameters of the same station are above a given limit
plot the evolution of one or more parameters to compare them visually (the same parameter on different stations, or different parameters on the same station)
do some correlation analysis between parameters or stations (e.g. to find whether a sensor is not working).
I was thinking of putting all the information in a Cassandra Table with the following data model:
CREATE TABLE data_stations (
    station text,        // station ID
    test int,            // test ID
    parameter text,      // name of recorded parameter/sensor
    tps timestamp,       // timestamp
    val float,           // measured value
    PRIMARY KEY ((station, test, parameter), tps)
);
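With this model, a typical per-sensor aggregation would stay within a single partition; for example (hypothetical IDs, and assuming the built-in min/max/avg aggregates of recent Cassandra versions):

// hypothetical station/test/parameter values, just to show the access pattern
SELECT min(val), max(val), avg(val)
FROM data_stations
WHERE station = 'station_01' AND test = 3 AND parameter = 'temperature';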
However, I don’t know whether one table would be able to handle all the data: a quick calculation gives about 10^14 rows with the preceding data model (100 stations x 10 tests x 10,000 parameters x 9,000,000 ms (2h30 in milliseconds) ~= 10^14), even if each partition is “only” 9,000,000 rows.
Other ideas were to split the data into different tables (e.g. one table per station, or one table per test per station, etc.). I don’t know what to choose or how, so any advice is welcome!
Thank you very much for your time and help, if you need more information or details I would be glad to tell you more.
Piar
You are on the right track: Cassandra can handle such data. You may store all the data you want in column families and use Apache Spark over Cassandra to do the required aggregations.
I feel Apache Spark is a good fit for your use case, as it can be used both for aggregations and for calculating correlations.
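As a minimal sketch (using an assumed keyspace name with the data_stations table from the question, and assuming the spark-cassandra-connector package is available and the Cassandra host is configured), such an aggregation from PySpark could look like this:

# read the Cassandra table through the spark-cassandra-connector data source
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = (SparkSession.builder
         .appName("sensor-aggregations")
         .config("spark.cassandra.connection.host", "127.0.0.1")  # assumed host
         .getOrCreate())

df = (spark.read
      .format("org.apache.spark.sql.cassandra")
      .options(keyspace="sensors", table="data_stations")  # assumed keyspace name
      .load())

# min/max/avg of one parameter on one station/test (hypothetical IDs)
(df.filter((F.col("station") == "station_01") &
           (F.col("test") == 3) &
           (F.col("parameter") == "temperature"))
   .agg(F.min("val"), F.max("val"), F.avg("val"))
   .show())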
You may also check out Apache Hive, as it can work on/query data in HDFS directly (through external tables).
Check these:
Cassandra - Max. size of wide rows?
Limitations of Cassandra
Current Situation:
I am using JMeter for the performance regression testing of my application. The scripts are prepared and are executed every night.
I am also using JMeter plugins to capture the PerfMon stats and JMX stats during the test. The response time stats, PerfMon stats and JMX stats are all stored in files in CSV format.
Problem Statement:
Q1: The daily analysis of the results is a tedious task. Also, we want to plot the daily trends of response time and server metrics and share them with a larger group. Do you have any suggestions on available tools (open source/free preferred) that can help us plot daily trends for response time and server metrics?
If we have to develop our own tool then...
Q2: While plotting the trend, what would be the best way to convey the regression status with a minimum number of graphs? Our suite has more than 200 samplers and is growing every month. Plotting the daily trends for 200+ samplers in one graph is very confusing for the end audience. Can you suggest a way to get a single number to plot as the daily trend?
I would recommend going for Jenkins. With the Performance Plugin it can:
execute JMeter tests on demand or automatically, based on many possible triggers
plot the performance trend based on previous execution results
conditionally fail builds when, for example, response time exceeds a certain threshold
and much more. See the Continuous Integration 101: How to Run JMeter With Jenkins article for a more detailed explanation of Jenkins, Performance Plugin and JMeter installation and configuration.
Another possible solution could be using JChav - JMeter Chart History and Visualization
I want to know the difference between Hadoop batch analytics and Hadoop real-time analytics.
E.g. Hadoop real-time analytics can be done using Apache Spark, while Hadoop batch analytics can be done using MapReduce programming.
Also, if real-time analytics is the preferred one, then what is batch analytics required for?
thanks
Batch means you process all the data you have collected so far. Real-time means you process data as it enters the system. Neither one is "preferred".
Let me explain use cases for batch processing and real-time processing.
Batch processing:
In a stock market application, you have a requirement to provide the following summary data on a daily basis:
For each stock, total number of buy orders and sum of all buy orders
For each stock, total number of sell orders and sum of all sell orders
For each stock, total number of successful orders & failed orders
etc.
Here you need 24 hours of stock market data to generate these reports.
Weather application:
Save weather reports of all places in the world, for all countries. For a given place like New York or a country like America, find the hottest and coldest day since 1900. This query requires huge input data sets, which require processing on thousands of nodes.
You can use a Hadoop MapReduce job to produce the above summary. You may have to process petabytes of data, stored on 4000+ servers in a Hadoop cluster.
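For illustration only, here is a bare-bones Hadoop Streaming mapper/reducer pair (a sketch, with assumed input lines of the form date,place,temperature) for the "hottest/coldest day per place" style of batch query:

# mapper.py - emit "place <TAB> temperature" for each input record
import sys

for line in sys.stdin:
    fields = line.strip().split(",")
    if len(fields) >= 3:
        place, temp = fields[1], fields[2]
        print("%s\t%s" % (place, temp))

# reducer.py - keys arrive grouped and sorted, so track min and max per place
import sys

current_place, tmin, tmax = None, None, None
for line in sys.stdin:
    place, temp = line.strip().split("\t")
    temp = float(temp)
    if place != current_place:
        if current_place is not None:
            print("%s\t%.1f\t%.1f" % (current_place, tmin, tmax))
        current_place, tmin, tmax = place, temp, temp
    else:
        tmin, tmax = min(tmin, temp), max(tmax, temp)
if current_place is not None:
    print("%s\t%.1f\t%.1f" % (current_place, tmin, tmax))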
Real-time analytics:
Another use case: you are logged into a social networking site like Facebook or Twitter. A friend posts a message on your wall or tweets on Twitter, and you have to get these notifications in real time.
When you visit a site like Booking.com to book a hotel, you will get real-time notifications like "X users are currently viewing this hotel". These notifications are generated in real time.
In the above use cases, the system should process streams of data and generate real-time notifications to users, instead of waiting for a full day of data. Spark Streaming provides excellent support for handling these types of scenarios.
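As a minimal sketch (not tied to any of the sites above; the host, port and 10-second batch interval are assumptions for illustration), a Spark Streaming job that turns an incoming event stream into per-user counts could look like this:

# count events per user in each 10-second micro-batch and print the counts;
# a real system would publish them to a notification service instead
from pyspark import SparkContext
from pyspark.streaming import StreamingContext

sc = SparkContext(appName="NotificationStream")
ssc = StreamingContext(sc, 10)  # 10-second batches

# each incoming line is assumed to look like "user_id,event"
events = ssc.socketTextStream("localhost", 9999)
per_user = (events.map(lambda line: (line.split(",")[0], 1))
                  .reduceByKey(lambda a, b: a + b))
per_user.pprint()

ssc.start()
ssc.awaitTermination()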
Spark uses in-memory processing for faster query execution, but it is not always possible to keep petabytes of data in memory. Spark can process terabytes of data, while Hadoop can process petabytes of data.
Hadoop batch analytics and real-time analytics are totally different; which one you need depends on your use case. For example, if you have a large volume of raw data and want to extract only a little information from that dataset (based on some calculation, trending, etc.), this can be done with batch processing, like finding the minimum temperature over the last 50 years.
Real-time analytics, in contrast, means you need the expected output as soon as possible, like getting your friend's tweet on Twitter as soon as your friend posts it.
Batch data processing is an efficient way of processing high volumes of data, where a group of transactions is collected over a period of time. Data is collected, entered and processed, and then the batch results are produced (Hadoop is focused on batch data processing). Batch processing requires separate programs for input, processing and output. Examples are payroll and billing systems.
In contrast, real-time data processing involves continuous input, processing and output of data. Data must be processed within a small time period (in near real time). Radar systems, customer service systems and bank ATMs are examples.