I have a data table that is periodically updated by another service. I log when these updates occur in the database. I'd like to be able to somehow rate the most recently updated data records.
I don't want to select exactly the most recent, I'd like to sort of account for previous updates as well. I'll try and explain what I mean using an example. Suppose I have 3 data items
Item | Updates (Days Ago)
One | 30, 25, 19, 1
Two | 5, 3, 2
Three| 30, 25, 20, 15, 10, 5
So judging by the above list, I'd like to have Three first because it appears to be the most consistent and has the most updates. Next I would like two because it has been active as of late. Then finally one because despite it having the most recent update, it had few updates in the iterim time window.
I haven't outlined exactly how my algorithm will work, as I don't know yet, but I've hopefully explained what I'm generally hoping to achieve.
I'm not explicitly looking for an implementation, I'm looking to be pointed in a helpful direction. Are there specific algorithms that address this problem, or a similar problem?
You could create some form of weighted equation. Say you want the following criteria:
Number of updates (numUpdates)
1 / days since last post (lastPost)
Average updates per day since creation (avgUpdates)
Then, you could give each criteria a specific weight. E.G,
rating = (weight1 * numUpdates) + (weight2 * lastPost) + (weight3 * avgUpdates)
Related
I have been struggling with the following calculation. I have tried a few previous, next and overs but I cant seem to get the syntax correct.
Basically i need to subtract demand from stock on hand, to get a new column. the the next row will use the newly created column as stock on hand and the subtract the demand for that row, then that result becomes the new stock on hand etc. i cant get it to loop. I have ranked the demand in order of date required per plant. AS the data set will have multiple Plants, SOH and demand.
The attached pic shows A020 only has one QTY short so that is straight forward, but for A030 opening SOH is 152, and the 1st date QTY short is 12, so i need 152-12 = 140. then the second date QTY which is ranked 2, needs to be 140 - 12 = 128, so then rank 3 uses 128 - 12 and so on. ie the SOH needs to dynamically update.
data set
It might not be natively possible in point-and-click Spotfire (happy to be corrected if this is incorrect).
You should consider writing a data function using R to do this groupby-loop operation.
I have a PowerPivot table for which I need to be able to determine how long an item was in an Error state. My data set looks something like this:
What I need to be able to do is to look at the values in the ID and State columns, and see if the value in the previous row is ERROR in the State column, and the same in the ID column. If it is, I then need to calculate the difference between the Changed Date values in those two rows.
So, for example, when I got to row 4, I would see that the value in the State column for Row 3, the previous row, is ERROR, and that the value in the ID column in the previous row is the same as the current row, so I would then calculate the difference between the Changed Date values in Row 3 and Row 4 (I don't care about the values in any of the other columns for this particular requirement).
Is there a way to do this in PowerPivot? I've done a fair amount of Internet searching, and it looks like if it can be done, it would use the EARLIER or EARLIEST DAX functions, but I can't find anything that tells me how, or even if, this can be done.
Thanks.
Chris,
I have had similar requirements many times and after a really long time of trial-and-error, I finally understood how EARLIER works. It can be very powerful, but also very slow so always check for the performance of your calculations.
To answer your question, you will need to create 4 calculated columns:
1) Item Rank - used for ranking the issues with same Item ID
=COUNTROWS(FILTER('ID', EARLIER([Item ID]) = [Item ID] && EARLIER([Date]) >= [Date]))
2) Follows Error - to easily find issue that follows EROR issue
=IF([State] = "EROR",[Item Rank]+1)
3) Time of Following Issue - simple lookup so that you can calculate the different
=IF([Follows Error]>0,
LOOKUPVALUE([Date], [User], [User], [Item Rank], [Follows Error]),
BLANK()
)
4) Time Diff - calculation of time different for the specific issue
=IF([State]="EROR",
DAY([Time of Following Issue])-DAY([Date]),
BLANK()
)
With those calculated columns, you can then easily create a powerpivot table, drag State and Item Id onto the ROWS pane and then simply add Time Diff to Values. You will get an overview of issues that contain string "EROR" issue and the time it took to resolve them.
This is what it looks like in PowerPivot window:
And the resulting Pivot table:
You can download my Excel file here (2013).
As I mentioned, be careful with the performance as the calculated columns with nested EARLIER and IF conditions might be a bit too performance-demanding. If there is a smarter way, I would be very happy to see it, but for now this works for me just fine.
Also, keep in mind that all calculated columns could be nested into 1, but I kept them separated to make it easier to understand the formulas.
Hope this helps :-)
Context
I'm working on a small web app to store photos. Photos are ordered according to their timestamp (the date they've been taken), and it's working great. Here's a simplified look at the database:
+--------------+-------------------+
| id | timestamp |
+--------------+-------------------+
| 1 | 1000000003 |
| 2 | 1000000000 |
+--------------+-------------------+
Now I'd like to add the possibility to re-order photos. And I can't find a way of doing that without any downsides.
What I did
I first added a column to the table to save a custom order.
+--------------+-------------------+-------------+
| id | timestamp | order |
+--------------+-------------------+-------------+
| 1 | 1000000003 | 1 |
| 2 | 1000000000 | 2 |
+--------------+-------------------+-------------+
First issue: I believe I can't order photos according to two different criteria, because it'd be hard to know which one has to be given precedence.
So I'm ordering them using the order column, and only this one. When I added the order column, I gave each photo a value so that the current order would remain. I now have photos ordered by order, in the same order as when they were ordered by timestamp.
I can now re-order some photos manually, and the other ones will stay where they belong. The first issue has been solved.
But now, I want to add a new photo.
Second issue: I know when the new photo I'm adding has been taken, but my photos aren't ordered by their timestamp anymore. This photo needs to be correctly ordered, thus it needs a correct order value.
This is the issue: a correct order value.
Here are two ways I could handle a new photo:
Give it an order value greater than others. In the previous table, a new photo would be given order = 3. This is obviously a bad idea, since it doesn't take its timestamp into account. A recent photo would still be the last one displayed.
"Insert" it where it belongs, according to its timestamp. Looking at the same table, if the timestamp of the new photo was 1000000002, the new photo would be given order = 2, and the order of every following photo would be increased by 1.
The second solution looks great, except in one case: if the order of the photo #2 had been manually changed to let's say 50, the new photo would have been given order = 50 even though it belongs among the first photos (according to its timestamp).
What I need
What I need is a way of ordering photos according to their timestamp and to their manually-set order.
Maybe you have a solution to the second issue I highlighted, or maybe you're aware of a whole other way to deal with this. Either way, thank you for your help.
At no point in your question do you mention computers or programming languages. This is OK (actually, it's a good approach, get the problem and solution worked out on paper before coding) and here's an answer which also ignores computers and programming languages.
Put all your photos into a shoebox in the order in which you get them.
Now, take three pieces of paper:
On page 1 write the numbers (one to a line) from 1 to N (the number of photos the box can hold). Whenever you put a photo in the box, write its timestamp on the line corresponding to its order in the box.
On page 2 write the timestamp of photo 1 a few lines down. Write a 1 on the same line. For the next photo, write its timestamp in the appropriate place on the paper, leaving as much space above and below as seems necessary for future photo insertions. Write a 2 on the same line. Continue until you run out of space between lines, when you need to copy all the information onto a new version of the page with more space for insertions. The information on this page is the same as the information on page 1, but with the two numbers on each line swapping positions.
On page 3 write the numbers from 1 to N again. As you collect each photo write its number from page 1 (ie its number in the sequence of all photo numbers) in the correct position for your manually-set ordering. You'll probably have to do a lot of rubbing-out and re-writing on this page as you decide that latecomers ought to be inserted high onto this page.
Now you have:
a store for your photos, the shoebox; you should already have realised that you can't store the photos in more than one order at a time;
three indexes (indices if you prefer); the first is fixed and simply assigns a unique sequence number to each photo; it also tells you the timestamp of each photo in the box;
the second index enables you to find the unique sequence number of a photo given its timestamp, and then find the photo in the shoebox;
the third index allows you to order photos as you wish; the first number on each line is the sequence number in the sorted order, the second number is the photo's unique sequence number from the first index.
All of this is an extremely long-winded way of telling you that, since you can't (either in a shoebox or a computerised data store) keep photos in multiple orders simultaneously, you will have to maintain indices for the orderings you wish to use. Those indices point (that's what an index does) from a number to a location in the shoebox, either directly or indirectly.
Consider I have two collections in MongoDB. One for products with documents like:
{'_id': ObjectId('lalala'), 'title': 'Yellow banana'}
And another stores price changes with documents like:
{'product': DBRef('products', ObjectId('lalala')),
'since': datetime(2011, 4, 5),
'new_price': 150 }
One product may have many price changes. The price lasts until a new change with later time stamp. I guess you've caught idea.
Say, I have 100 products. I want to query my DB to get know what's the price of each product at the moment of June 9, 2011. What is the most efficient (quick) way to perform this query in MongoDB? Suppose I have no cache solution or cache is empty.
I thought about group statement on prices collection, where reduce function would select last since before a date provided, grouping by product.$id. But in this case I would not benefit from an index on since field and all documents would be scanned.
Any ideas?
I had a similar problem, but for GPS locations. I found the fastest way was to set up a query for each item, which is rather counter-intuitive if your used to SQL databases.
Query for the item where it's timestamp is less or equal than the date your looking for, and limit the result to 1. Repeat for each item. To really speed things up, run multiple querys in parallel to utilise all the cores on the MongoDB server.
I need to create an RSS feed for our information system, which is written in PHP.
I had no problems with the RSS 2.0 specification, nor with the creation of RSS feed generator. Items for the feed are to be fetched from a large table containing lots of records, so it will take a lot of time to get all the necessary information from this table. Therefore, it is necessary to implement the following scheme:
To show 5 latest items to new
subscribers.
For the existing subscribers – to
show only those items which have
been added since their last view of
the feed.
I have no problems with the first condition: I can simply use the LIMIT clause
to limit the number of fetched rows. Something like this:
$items = function_select(“SELECT * FROM some_table ORDER BY date DESC LIMIT 5);
But this creates the following problem: Suppose there are real feed subscribers who have already read the items from 1 up to 10. After they've been away for some period of time new items have been created; say, 10 new items.
During their next check-in we want them to see all the new 10 items, but not all at once. They will see only the last 5 ones (from 16 up to 20), but not all 10 of them. The items from 11 up to 15 will be omitted.
I suppose that in order to succeed in solving this problem there should be a kind of a flag to be sent to feed. For example: pubDate of the lasted fetched item. Twitter's feed uses something similar. However, that link is hand-made. How could it be done another way?
Please let me know if you have any ideas. If you have any example ready (no matter in what language) just share a link with me. I would appreciate it greatly.
Thank you in advance.
Standard RSS feeds don't render different content to different users. They simply always provide the most recent few items (often 10), and rely on the RSS reader to poll them often enough that they don't miss any updates. Unless you have a particularly compelling reason not to do this, this is the simplest and most effective mechanism.