Something I'm curious about: what would be the "most efficient" thing to cache when generating, say, an RSS feed? Or an API response (like the response to /api/films/info/a12345)?
For example, should I cache the entire feed and try to return that, as pseudocode:
id = GET_PARAMS['id']
cached = memcache.get("feed_%s" % id)
if cached is not None:
    return cached
else:
    feed = generate_feed(id)
    memcache.put("feed_%s" % id, feed)
    return feed
Or cache the query's result, and generate the document each time?
id = sanitise(GET_PARAMS['id'])
query = query("SELECT title, body FROM posts WHERE id=%%", id)
cached_query_result = memcache.get(query.hash())
if cached_query_result:
    feed = generate_feed(cached_query_result)
    return feed
else:
    query_result = query.execute()
    memcache.put("feed_%s" % id, query_result)
    feed = generate_feed(query_result)
    return feed
(Or, some other way I'm missing?)
In my experience, you should use multiple levels of cache. Implement both of your solutions (provided that this isn't the only code that uses "SELECT title, body FROM posts WHERE id=%%"; if it is, use only the first one).
In the second version of the code, you memcache.get(query.hash()) but memcache.put("feed_%s" % id, query_result). This might not work as you want it to (unless you have an unusual version of hash() ;) ).
I would avoid query.hash(). It's better to use something like posts-title-body-%id. Try deleting a video while it's stored in the cache under query.hash(): it can hang there for months as a zombie video.
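Putting those pieces together, a minimal sketch of the two-level idea, in the same pseudocode style as the question (memcache, query and generate_feed are the names from above; the explicit key scheme is the one suggested here):

def get_feed(id):
    # Level 1: the fully rendered feed document.
    feed = memcache.get("feed-%s" % id)
    if feed is not None:
        return feed
    # Level 2: the raw query result, under an explicit, deletable key.
    row = memcache.get("posts-title-body-%s" % id)
    if row is None:
        row = query("SELECT title, body FROM posts WHERE id=%%", id).execute()
        memcache.put("posts-title-body-%s" % id, row)
    feed = generate_feed(row)
    memcache.put("feed-%s" % id, feed)
    return feed

This way a changed post only needs its two keys deleted, and any unrelated code that runs the same SELECT can share the level-2 entry.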
By the way:
id = GET_PARAMS['id']
query = query("SELECT title, body FROM posts WHERE id=%%", id)
You take something from GET and put it straight into the SQL query? That's bad (it opens you up to SQL injection attacks).
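If the query() helper doesn't already escape its arguments, the standard fix is to let the database driver bind them. A sketch with Python's DB-API placeholders (cursor is assumed to come from any standard driver, e.g. sqlite3):

# The driver escapes the bound value, so user input can't alter the SQL.
cursor.execute("SELECT title, body FROM posts WHERE id = ?", (id,))
row = cursor.fetchone()

Note that ? is the sqlite3 placeholder style; psycopg2 and MySQLdb use %s instead.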
It depends on the usage pattern, but all things being equal I'd vote for the first way, because you'll only do the work of generating the feed once.
It really depends on what your app does. The only way to answer this is to get some performance numbers from your existing app; then you can find the code that takes the largest amount of time and work on improving it.
As others have suggested here, I'd profile your code and work out which part of the operation is the slowest or most expensive.
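In Python, for instance, the standard library can tell you whether the query or the rendering dominates (a minimal sketch; generate_feed stands in for the feed code from the question):

import cProfile
import pstats

# Profile one feed generation and print the ten most expensive calls.
cProfile.run("generate_feed('a12345')", "feed.prof")
pstats.Stats("feed.prof").sort_stats("cumulative").print_stats(10)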
Related
I'm new to Laravel (and coding in general), and I have a little pizza-ordering system that stores the orders placed by clients of a local pizzeria.
Inside the "New Order" form, when you start typing down the name of a pizza (four cheeses, chicken, yada yada), the program returns a query search that is run every 2 keydowns with products with a similar name.
Here's the search query, a pretty simple and basic one:
$pesquisa = json_decode($request->getContent(), true);
$produtos = Produto::select('nome', 'valor')->where('nome', 'LIKE', '%'.$pesquisa.'%')->get();
return response()->json($produtos);
Here's the "problem" I'm having: The current database has about 50 items, and it takes about ~500ms to get a return. This in my local machine, the problem gets a little bigger when it's actually hosted in a server, where it can spike from ~500ms to ~2s, depending on user connection.
In my study, I've heard about caching, and that it can shorten or remove the need for queries (which was already implemented in the "show all orders placed" list, and REALLY minimized the speed of loading), but I don't know if caching can be done with user-inputted search?
First question: How would one go about saving those pizza names to a cache, while still sorting through them based on user input?
Second question: Is caching like this the "best" way to speed up user-inputted search? Is there something else I should be doing first? (I've heard that a LIKE query is the slowest kind of search there is... should I research and try another type?)
All explanations, tips and tricks are greatly appreciated! Thank you!
For caching you can use this (caching the query result rather than the response object, so the cached value serializes cleanly; $seconds is the cache lifetime):
$pesquisa = json_decode($request->getContent(), true);

$produtos = Cache::remember('produtos_'.$pesquisa, $seconds, function () use ($pesquisa) {
    return Produto::select('nome', 'valor')->where('nome', 'LIKE', '%'.$pesquisa.'%')->get();
});

return response()->json($produtos);
rememberForever() can also be used instead of remember(); if you do, clear the entry with Cache::forget() whenever the products change.
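For the first question above, an alternative to caching one entry per search term is to cache the whole product list once and filter it in memory on each keystroke; with ~50 items that's cheap. The pattern, sketched in Python for brevity (the cache client and the fetch_all_products helper are illustrative, not Laravel APIs):

def search_products(term, cache, fetch_all_products):
    # Cache the full (small) product list once, not one entry per search term.
    products = cache.get("produtos_all")
    if products is None:
        products = fetch_all_products()  # e.g. SELECT nome, valor FROM produtos
        cache.put("produtos_all", products)
    # Filter in memory: the equivalent of WHERE nome LIKE '%term%'.
    term = term.lower()
    return [p for p in products if term in p["nome"].lower()]

This avoids one cache entry per distinct search string and keeps invalidation down to a single key.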
I am working on a search filter at the moment where people can specify things like (for example) size, color, fabric, and so on.
Obviously I have models for each, i.e. size, color, and fabric. But since this search should return results across all of them (size + color + fabric) and not just one of the three, I would need to make a new struct containing all of them (size, color, and fabric) to be able to consume the result that Gorm returns.
Since we have a LOT of filters, this could get very messy. Does anyone know a better way to do it, or what the best practice would be?
type Result struct {
    ID   int
    Name string
    Age  int
}
var result Result
db.Raw("SELECT id, name, age FROM users WHERE name = ?", 3).Scan(&result)
The above example illustrates how I think it should be done, but as you can imagine, with the amount of data I need to return, this struct would become huge.
The results are quite large in terms of data. What I would want returned in the final version, for example (this is about sales): which products were bought, plus the amount, color, and size; payment data, i.e. price, tax, and payment method; customer information; et cetera.
So in total there is a lot of information to store.
All this information should be returned as JSON so we can serve it through an API call.
Each call should give back 100 up to 15,000 results, to give you an idea of the size of the data.
I hope someone can explain a bit about a best-practice method and/or how I should solve this problem, as I am unsure how to code this effectively.
tl;dr: Can I securely pass a raw query string (retrieved as a URL parameter) into a Lucene QueryParser without any added input sanitization?
I'm not a security expert, but I need some advice. As the title states, is it safe to use this controller method:
@CrossOrigin(origins = "${allowed-origin}")
@GetMapping(value = "/search/{query_string}", produces = MediaType.APPLICATION_JSON_VALUE)
public List doSearch(@PathVariable("query_string") String queryString) {
    return searchQueryHandlerService.doSearch(queryString);
}
In tandem with this service method (the error handling is for testing only):
public List doSearch(String queryString) {
    LOGGER.debug("Parsing query string: " + queryString);
    try {
        Query q = new QueryParser(null, standardAnalyzer).parse(queryString);
        FullTextEntityManager manager = Search.getFullTextEntityManager(entityManager);
        FullTextQuery fullTextQuery = manager.createFullTextQuery(q, Poem.class, Book.class, Section.class);
        return fullTextQuery.getResultList();
    } catch (ParseException e) {
        LOGGER.error(e);
        return Collections.emptyList();
    }
}
with only basic input sanitization? If this isn't safe, are there measures I can take to make it safe?
Any help is greatly appreciated.
I've been looking into this on and off for the last few weeks and I cannot find any reason why it wouldn't be safe, but it's such an obscure question (in an area I'm unfamiliar with) that I may be missing some obvious, fundamental problem that anyone working in the area would see immediately.
A FullTextQuery is always read-only, so you don't have to be concerned about people dropping tables or similar issues that you might have to consider when dealing with SQL injection.
But you might want to be careful if you have security restrictions on what data can be seen by your users.
The API also restricts the operation to a certain set of indexes - in your case those containing the Poem entities - so it's also not possible to break out of the chosen indexes.
But you need to consider:
whether it's OK if the user is able to somehow find a different Poem than the one you expected them to look for;
whether, if you share the same index with other entities, there might be ways to infer data about those other entities.
So, to be security conscious, you might want to:
index each entity type into its own index (which is the default);
enable a FullTextFilter to restrict the user query based on your custom rules;
actually check the content of each result before rendering it, to remove content that your other filters didn't catch.
If you are extremely paranoid, consider that any full-text index can reveal a bit about how frequent certain terms are across the whole index. People are normally not too concerned about this, as it's extremely hard to take advantage of and only minimal clues about the data distribution are revealed.
So, back to your example: if this index just contains poems and you're OK with allowing any user to see any poem you have stored, giving away clues about which poems are available is normally not a security concern but rather the whole point of your service.
http://kb.mailchimp.com/api/resources/lists/members/lists-members-collection
Using this resource we can obtain only the first 10 members. How do we get all of them?
The answer is quite simple: use the offset and count parameters in the URL query:
https://us10.api.mailchimp.com/3.0/lists/b5b5fdc2fa/members?offset=150&count=10
Finally I found a PHP API client for MailChimp API v3:
https://github.com/pacely/mailchimp-api-v3
And the official docs about pagination... I missed them before :(
http://kb.mailchimp.com/api/article/api-3-overview
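The same offset/count pagination, sketched as a loop with Python's requests (the list id and datacenter are the ones from the example URL above; the API key and page size are placeholders):

import requests

BASE = "https://us10.api.mailchimp.com/3.0"
members, offset, count = [], 0, 100

# Page through the collection; a short page signals the end.
while True:
    resp = requests.get(
        BASE + "/lists/b5b5fdc2fa/members",
        auth=("anystring", "YOUR_API_KEY"),  # any username works with basic auth
        params={"offset": offset, "count": count},
    )
    page = resp.json()["members"]
    members.extend(page)
    if len(page) < count:
        break
    offset += count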
I stumbled on this one while researching a way to get all list members in MC API 3.0 as well. I noticed some comments about the API timing out when trying to get all list members on one page. I also encountered this at first, but was able to overcome it by limiting the fields in the result with the fields param. My code is for a mass deleter, so all I really needed was the ID of each member to put together a batch delete request. Here's how my fetch request looks (pseudo-code):
$total_members = $result['total_items']; // number of members in the list, from a previous request
https://usXX.api.mailchimp.com/3.0/lists/foobarx/members?fields=members.id&count=$total_members
This way I'm able to fetch over 15,000 subscribers on one page without error.
offset and count are the official way in the docs, but the problem is that paging that way slows down linearly: each page costs more than the last, so fetching a whole list behaves like an n^2 operation, and if you have 20,000 items you're in trouble. Their docs http://developer.mailchimp.com/documentation/mailchimp/reference/lists/members/#read-get_lists_list_id_members warn you against using offset.
If your scenario permits using other filters (like since_last_changed), then you can do it quickly. See What is the right syntax for "timeframe" in MailChimp API 3.0 for the datetime format.
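A sketch of that filtered approach with requests (since_last_changed is a documented parameter of the member-collection endpoint; the list id, datacenter, API key, and cutoff datetime are placeholders):

import requests

# Fetch only members changed since the cutoff, avoiding deep offsets entirely.
resp = requests.get(
    "https://usXX.api.mailchimp.com/3.0/lists/YOUR_LIST_ID/members",
    auth=("anystring", "YOUR_API_KEY"),
    params={"since_last_changed": "2015-09-01T00:00:00+00:00", "count": 1000},
)
members = resp.json()["members"]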
Using the offset and count parameters is correct, as mentioned in some of the other answers, but it becomes tedious for large lists.
A more efficient way is to use a client for the MailChimp API. I used mailchimp3 for Python. With it, it's pretty easy to get all the members on your list, because it handles the pagination for you. Here's how you would do it:
from mailchimp3 import MailChimp

client = MailChimp('YOUR_USERNAME', 'YOUR_SECRET_KEY')
# get_all=True follows the pagination; fields trims the response to the addresses.
client.lists.members.all('YOUR_LIST_ID', get_all=True, fields="members.email_address")
You can do it just with count: make an API call to the list root to get the total number of members, then include that number as the count parameter in the next API call and you'll have your whole list.
I ran into issues with this because I had a moderate list of 2,600 members and MailChimp was throwing an error, though it worked with 1,500 people.
So for lists bigger than 1,500 members I use the MailChimp export API; bear in mind that it is going to be discontinued, but I could not find any other acceptable solution.
Alternatively, for bigger lists (>1,500) you could get the total number of members and then make multiple API calls to the Members endpoint, but I really dislike that :(
If anyone has a better alternative I would be really glad to hear it.
With MailChimp.Net, use the offset value:
List<Member> allMembers = new List<Member>();
IMailChimpManager manager = new MailChimpManager(MailChimpApiKey);
bool moreAvailable = true;
int offset = 0;
while (moreAvailable)
{
    // Fetch the next page of up to 250 subscribed members.
    var page = manager.Members.GetAllAsync(yourListId, new MemberRequest
    {
        Status = Status.Subscribed,
        Limit = 250,
        Offset = offset
    }).GetAwaiter().GetResult();

    allMembers.AddRange(page);

    // A full page of 250 means there may be more results; a short page means we're done.
    if (page.Count() == 250)
        offset += 250;
    else
        moreAvailable = false;
}
I'm relatively comfortable with EWS programming and Exchange schemas, but I'm running into an interesting problem.
I have a property set asking for:
ItemClass
DateTimeReceived
LastModifiedTime
Size
for every Item in the AllItems folder at the root.
I get the result set and then attempt LINQ queries against it, in particular on DateTimeReceived. Not all items have a DateTimeReceived returned by the server, and those that don't throw an exception. I'm trying a...
long msgCount = (from msg in allItems
                 where !msg.DateTimeReceived.Equals(null)
                 select msg).Count();
... which (IMO) should return the count of allItems that have a DateTimeReceived. However, the property isn't null; it simply isn't there, and accessing it throws an exception.
I'm trying to avoid iterating through the set one by one, trying each record. Anyone have a thought or experience?
Thanks TTY for the input that definitely led to the following code, which returns what I need. (Still in final testing.)
List<EWS.Item> withReceivedProperty = inputlist
    .Where(m => m.TryGetProperty(EWS.ItemSchema.DateTimeReceived, out object received)) // false when the server didn't return the property
    .ToList<EWS.Item>();
Then of course, take withReceivedProperty.Count or such as needed.