RSS functioning problem - algorithm

I need to create an RSS feed for our information system, which is written in PHP.
I had no problems with the RSS 2.0 specification, nor with the creation of RSS feed generator. Items for the feed are to be fetched from a large table containing lots of records, so it will take a lot of time to get all the necessary information from this table. Therefore, it is necessary to implement the following scheme:
To show 5 latest items to new
subscribers.
For the existing subscribers – to
show only those items which have
been added since their last view of
the feed.
I have no problems with the first condition: I can simply use the LIMIT clause
to limit the number of fetched rows. Something like this:
$items = function_select(“SELECT * FROM some_table ORDER BY date DESC LIMIT 5);
But this creates the following problem: Suppose there are real feed subscribers who have already read the items from 1 up to 10. After they've been away for some period of time new items have been created; say, 10 new items.
During their next check-in we want them to see all the new 10 items, but not all at once. They will see only the last 5 ones (from 16 up to 20), but not all 10 of them. The items from 11 up to 15 will be omitted.
I suppose that in order to succeed in solving this problem there should be a kind of a flag to be sent to feed. For example: pubDate of the lasted fetched item. Twitter's feed uses something similar. However, that link is hand-made. How could it be done another way?
Please let me know if you have any ideas. If you have any example ready (no matter in what language) just share a link with me. I would appreciate it greatly.
Thank you in advance.

Standard RSS feeds don't render different content to different users. They simply always provide the most recent few items (often 10), and rely on the RSS reader to poll them often enough that they don't miss any updates. Unless you have a particularly compelling reason not to do this, this is the simplest and most effective mechanism.

Related

Camel + CassandraQL : Process a table without putting all in memory

Goal: read a big Cassandra table, process line by line in parallel
Constraints:
not all rows in memory
no Spark, we have to use Camel
One shot, no need polling the table
I did a first working version with CassandraQL but this Component seems to be limited to one query with all in memory, I did not find mechanics like fetSize/fetchMoreResult. I looked CassandraProducer class, PollingConsumerPollingStrategy, ResultSetConversionStrategy... See nothing.
Could it be possible to read a table by chunks of 1000 elements for example, each chunk would generate an exchange lately split in different threads ?
I think that maybe the ProducerTemplate injecting first exchanges in the route could be the answer. But I don't undertand how I could manage the production exchange rate to avoid to have too many rows in memory (to do so, we would need for example to check the size of the next blocking queue, if more than X no consumed elements, wait before producing more messages).
Maybe there are other options to do something like this ?
Maybe I did not see the magic parameter in CassandraQL ?
Maybe we can override some parts of CassandraQL ?
Thank you
This is not going to be answer to be a your question but hope to kick off some discussion. But as someone learning Cassandra and spending bit of time, it made me thinking. And mainly targets fetSize/fetchMoreResult part of the question
First of all, two of your constraints are contradicting
Not all rows in memory
I don't want all them fetched at once
One shot, no need polling the partition
I don't want to go back to db more than once.
Unless what you actually you meant is
Not all rows in memory
I don't want all them fetched at once
You can go back to partition many times, as long as you go back straight to where you left it last time.
As long as the time it takes for the first page is same as time it takes for the second page. And the time it takes for the 19th Page is same as the time it takes for the 20th page.
i.e Not starting from the first row
So I am going to assume that what you meant is Second Scenario and go with it.
Queries for Cassandra are going to satisfy the following two:
They are going to have a restriction on clustering columns
They are already ordered by clustering columns
Now Consider the following table
department(partition key), firstName(clustering_key), personId(clustering_key), lastname, etc as normal cols
First query
select department, firstName, lastname, etc
from person
where department = 'depart1`
order by firstName ASC
limit 25;
Second query (lets say last record in the page had userId=25 and firstName=kavi)
select department, firstName, lastname, etc
from person
where department = 'depart1` and firstName='kavi' and userId > 25
order by firstName ASC
limit 25;
As you can see, we can easily construct a Cassandra query that brings each chunk with certain size in constant time.
Now back to integration framework
I remember this concept called watermark in mule where the endpoint can store and remember so that they can start from there next time. In this case, value of userId and firstName of the last record of the last page is the watermark. So they can issue the second. I am sure we should be able to do the same with camel
I hope I have convinced that polling is not an issue where each chunk is retrieved in constant time

Smart pagination algorithm that works with local data cache

This is a problem I have been thinking about for a long time but I haven't written any code yet because I first want to solve some general problems I am struggling with. This is the main one.
Background
A single page web application makes requests for data to some remote API (which is under our control). It then stores this data in a local cache and serves pages from there. Ideally, the app remains fully functional when offline, including the ability to create new objects.
Constraints
Assume a server side database of products containing +- 50000 products (50Mb)
Assume no db type, we interact with it via REST/GraphQL interface
Assume a single product record is < 1kB
Assume a max payload for a resultset of 256kB
Assume max 5MB storage on the client
Assume search result sets ranging between 0 ... 5000 items per search
Challenge
The challenge is to define a stateless but (network) efficient way fetch pages from a result set so that it is deterministic which results we will get.
Example
In traditional paging, when getting the next 100 results for some query using this url:
https://example.com/products?category=shoes&firstResult=100&pageSize=100
the search result may look like this:
{
"totalResults": 2458,
"firstResult": 100,
"pageSize": 100,
"results": [
{"some": "item"},
{"some": "other item"},
// 98 more ...
]
}
The problem with this is that there is no way, based on this information, to get exactly the objects that are on a certain page. Because by the time we request the next page, the result set may have changed (due to changes in the DB), influencing which items are part of the result set. Even a small change can have a big impact: one item removed from the DB, that happened to be on page 0 of the result set, will change what results we will get when requesting all subsequent pages.
Goal
I am looking for a mechanism to make the definition of the result set independent of future database changes, so if someone was looking for shoes and got a result set of 2458 items, he could actually fetch all pages of that result set reliably even if it got influenced by later changes in the DB (I plan to not really delete items, but set a removed flag on them, for this purpose)
Ideas so far
I have seen a solution where the result set included a "pages" property, which was an array with the first and last id of the items in that page. Assuming your IDs keep going up in number and you don't really delete items from the DB ever, the number of items between two IDs is constant. Meaning the app could get all items between those two IDs and always get the exact same items back. The problem with this solution is that it only works if the list is sorted in ID order... I need custom sorting options.
The only way I have come up with for now is to just send a list of all IDs in the result set... That way pages can be fetched by doing a SELECT * FROM products WHERE id IN (3,4,6,9,...)... but this feels rather inelegant...
Any way I am hoping it is not too broad or theoretical. I have a web-based DB, just no good idea on how to do paging with it. I am looking for answers that help me in a direction to learn, not full solutions.
Versioning DB is the answer for resultsets consistency.
Each record has primary id, modification counter (version number) and timestamp of modification/creation. Instead of modification of record r you add new record with same id, version number+1 and sysdate for modification.
In fetch response you add DB request_time (do not use client timestamp due to possibly difference in time between client/server). First page is served normally, but you return sysdate as request_time. Other pages are served differently: you add condition like modification_time <= request_time for each versioned table.
You can cache the result set of IDs on the server side when a query arrives for the first time and return a unique ID to the frontend. This unique ID corresponds to the result set for that query. So now the frontend can request something like next_page with the unique ID that it got the first time it made the query. You should still go ahead with your approach of changing DELETE operation to a removed operation because it would make sure that none of the entries from the result set it deleted. You can discard the result set of the query from the cache when the frontend reaches the end of the result set or you can set a time limit on the lifetime of the cache entry.

Advanced search in laravel

I am trying to implementing a search in my application, specifically it checks a table in my database against certain parameters specified by the user.
For example if my user wants to return a list of records created with the last 7 days. Then he types last 7 days or last seven days in the search bar.
My issue now is being able to change string like this into a valid date that can be used in a where when checking my database.
The number can also be random as in the user can simply type last 2 days or last two days or last 100 days or last hundred days, inputs like week 25 and last month should also be allowed.
Instead of creating multiple drop downs and input boxes to allow the user to select form the front end i would like to do something much simpler like this.
My question now is that is there a feature or a package in laravel that already takes care of this?
If there isn't how would i go about doing something like that??
I think what you need to look up is "NLP" or "Natural Language Processing". There are many libraries and API's within the field that can help you out so that you don't reinvent the wheel (so to speak).
Here is a package in Laravel: Laravel Aylien Wrapper or nlp-tools, but there are many others for PHP in general.
Just do a quick search or look around at Mashape to find some examples.

Count how many times one post has been read

Is it possible to show how many times one post has been read? In WordPress there is a plug-in,https://wordpress.org/plugins/wp-postviews/
I don't know whether there is such a plug-in in Anypic of Parse to count the times?
Of course it will be nice if it can display who has read a post as well.
Thanks
I'm not sure which language you working on.
But anyway you need to create:
Array column in Parse.com
And then just make query to add his name when viewWillAppear
Now you can count the array to get integer number for views and you can display their names from the array.
Two options are;
Add a viewcount column and increment it whenever needed.
Add an actions table which consist all actions within your webpage or app. This way you can store more data(custom analytics) in it like button pressing etc.. When you want to check the viewcount you can just count objects with specific type. For iOS SDK countObjectsInBackgroundWithBlock does this job.

Linq to SQL - Random Select Order and Paging

We have a database with 2,00,000 vendor in 100 plus category, if someone visit the website we want to allow them to select a category and show them 25 Vendor per page, first we kept order by VendorId but it always use to get first 25, but we removed it, but now in paging it sometime repeat the vendor, is there a way to get random 25 vendor and also keep the paging.
Regards
you can randomize your result but everytime you dot he query, it will create new random list so unless you randomize and save the randomized state in your Code and page over it, it cant be done straightforward way.
refer, SQL Query results pagination with random Order by in SQL Server 2008
I believe this requirement is impossible to implement if a new random order is needed every time, there needs to be good performance and every item should have equal chance to get selected. I believe you should redesign the way your application works.
One possible workaround is to have a couple of columns in a table and fill them with random numbers. When a user requests the list assign the random column to him (stick it in the URL for example). Then do an order by that column and display the results. Randomly switch 4-5 columns to create the appearance of randomness. Update the random numbers in the columns once a day.

Resources