Laravel cache queries with conditions - caching

I am trying to load a cached value by providing a segment of its key and I can't find out how. I want something like this SQL syntax, but with the Laravel caching system:
select * where key like '%{segment of the key}%'
PS: I am using the file driver.

As of Laravel 7.x there is no native way to do this.
A less direct option is to build on the cache events (https://laravel.com/docs/7.x/cache#events).
You could save your keys in an in-memory table (very simple to do in MySQL) through the KeyWritten event.
Then you could use the query you mentioned to search for the key in that memory table.
Of course the KeyForgotten event will have to delete the keys from the memory table; a minimal listener sketch follows.
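A minimal sketch of such listeners, assuming a cache_keys table (for example a MySQL MEMORY table) and class/method names that are illustrative, not part of Laravel:

<?php

namespace App\Listeners;

use Illuminate\Cache\Events\KeyForgotten;
use Illuminate\Cache\Events\KeyWritten;
use Illuminate\Support\Facades\DB;

class TrackCacheKeys
{
    // Mirror every written cache key into the cache_keys table.
    public function handleWritten(KeyWritten $event): void
    {
        DB::table('cache_keys')->updateOrInsert(['key' => $event->key]);
    }

    // Remove the key again when the cache entry is forgotten.
    public function handleForgotten(KeyForgotten $event): void
    {
        DB::table('cache_keys')->where('key', $event->key)->delete();
    }
}

Register the two methods against KeyWritten and KeyForgotten (for example with Event::listen in a service provider); the LIKE query from the question can then be run against cache_keys.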


How to avoid data duplicates in ClickHouse

I already read this but I still have questions. I only have one VM with 16 GB of RAM, 4 cores and a 100 GB disk, with only ClickHouse and a light web API running on it.
I'm storing leaked credentials in a database:
CREATE TABLE credential (
    user String,
    domain String,
    password String,
    first_seen Date,
    leaks Array(UInt64)
) ENGINE = ReplacingMergeTree
PARTITION BY first_seen
ORDER BY (user, domain, password, first_seen)
It sometimes happens that some credentials appear more than once (inside a file or across several files).
My long-term objective is (was) the following:
- when inserting a credential which is already in the database, keep the smaller first_seen and add the new leak id to the leaks field.
I have tried the ReplacingMergeTree engine: I inserted the same data twice ($ cat "data.csv" | clickhouse-client --query 'INSERT INTO credential FORMAT CSV') and then ran OPTIMIZE TABLE credential to force the replacing engine to do its asynchronous job, as described in the documentation. Nothing happens; the data is still there twice.
So I wonder:
- what did I miss with the ReplacingMergeTree engine?
- how does OPTIMIZE work, and why doesn't it do what I was expecting from it?
- is there a real solution for avoiding duplicated data on a single instance of ClickHouse?
I have already tried to do it manually. My problem is that I have 4.5 billion records in my database, and identifying duplicates inside a 100k-entry sample already takes almost 5 minutes with the following query:
SELECT DISTINCT user, domain, password, count() AS c FROM credential WHERE has(leaks, 0) GROUP BY user, domain, password HAVING c > 1
This query obviously does not work on the 4.5b entries, as I do not have enough RAM.
Any ideas will be tried.
Multiple things are going wrong here:
You partition very granularly... you should partition by something like a month of data instead. Right now ClickHouse has to scan lots of files.
You don't provide the table engine with a version. The problem here is that ClickHouse is not able to figure out which row should replace the other.
I suggest you use the "version" parameter of the ReplacingMergeTree, as it allows you to provide an incremental version as a number or, if this works better for you, the current DateTime (where the last DateTime always wins).
You should never design your solution to require OPTIMIZE to be called to make your data consistent in your result sets; it is not designed for this.
ClickHouse always allows you to write a query that gives you (eventual) consistency without using OPTIMIZE beforehand.
Another reason for avoiding OPTIMIZE, besides it being really slow and heavy on your DB: you could end up with race conditions, where other clients of the database (or replicating ClickHouse nodes) invalidate your data between the moment OPTIMIZE finishes and the moment the SELECT runs.
Bottom line, as a solution:
What you should do here is add a version column, and when inserting rows, insert the current timestamp as the version.
Then, for each row, select only the one that has the highest version in your result, so that you do not depend on OPTIMIZE for anything other than garbage collection. A sketch of both parts follows.
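A minimal sketch of that idea (the version column, the monthly partitioning and the argMax read are illustrative additions; the sorting key here deliberately omits first_seen so duplicates of the same credential actually collapse):

CREATE TABLE credential (
    user String,
    domain String,
    password String,
    first_seen Date,
    leaks Array(UInt64),
    version DateTime DEFAULT now()
) ENGINE = ReplacingMergeTree(version)
PARTITION BY toYYYYMM(first_seen)
ORDER BY (user, domain, password);

-- read the latest state per credential without relying on OPTIMIZE
SELECT
    user,
    domain,
    password,
    argMax(first_seen, version) AS latest_first_seen,
    argMax(leaks, version) AS latest_leaks
FROM credential
GROUP BY user, domain, password;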

Check all table columns for a value

OK, tricky question. I am trying to figure out where a database schema is storing a particular pointer. I know the pointer value; I just don't know what table or column it is in. The pointer is 123123123. How do I check all table columns to see if any of them have that value?
Thanks.
In H2 you can use full-text search, but then you would need to add all the tables to the search scope and index them.
If you need to index only primary keys it might be better, but you still need to come up with individual FT_CREATE_INDEX() calls for each table. You can automate this with several languages or with ETL tools (like Scriptella).
If you have enough disk space, you could dump the SQL from your DB and use a viewer for big files like glogg.
The advantage of the first solution is that it needs no external tools, but you need to work out a specific indexing script for any existing or new table; a sketch follows below. The second solution is a one-time fix.
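A minimal sketch of the first (H2 full-text) approach; the schema and table names are placeholders:

CREATE ALIAS IF NOT EXISTS FT_INIT FOR "org.h2.fulltext.FullText.init";
CALL FT_INIT();
-- repeat (or generate from INFORMATION_SCHEMA) for every table you want searchable
CALL FT_CREATE_INDEX('PUBLIC', 'MY_TABLE', NULL);
-- then search for the value across all indexed tables
SELECT * FROM FT_SEARCH('123123123', 0, 0);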
I use SQL Search from RedGate. It's free and it helps you find any text anywhere in the database.
https://www.red-gate.com/products/?gclid=CjwKEAjwiYG9BRCkgK-G45S323oSJABnykKAE7IH_EMhnmq7OdLdXljfIkdGZrDD6OnOrT4VB0agahoCVn3w_wcB

Using dynamic lookup from parallel sessions with synchronized cache in Informatica

Using Informatica 9.1.0
Scenario
Get the Dimension key generated and inserted to the Fact table from the Fact load.
I have to load the Fact table with a dimension key along with other columns. This dimension record is created from within the same mapping. There are five different sessions using the same mapping that execute simultaneously to load the Fact table. In this case I'm using a dynamic lookup with 'Synchronize dynamic cache' enabled to get unique dimension records generated from the 5 sessions based on some conditions. The dimension ID is generated using the Sequence-ID in the associated expression of the lookup. When a single session alone was run it worked perfectly fine. But when the sessions were run in parallel it started to show unique key violation errors, as random sessions tried to insert a sequence value that was already there.
To fix the issue I had to enable the persistent lookup cache and set a cache file name prefix. But I did not find this solution, or this issue, in any of the forums or INFA communities, so I'm not sure whether this is the right way of doing it or whether this is a bug of some kind.
Please let me know if you had similar issue or some different thoughts.
Thanks in advance
One other possible solution I can think of is to have the database generate the sequence instead of using Informatica's sequencer; a sketch follows below. The database should be capable of avoiding any unique key violations.
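A minimal sketch of that idea, assuming an Oracle target; the sequence, table and column names are placeholders:

CREATE SEQUENCE dim_key_seq START WITH 1 INCREMENT BY 1 CACHE 100;

-- used wherever the dimension row is inserted (e.g. via a SQL override)
INSERT INTO dim_table (dim_key, dim_name)
VALUES (dim_key_seq.NEXTVAL, 'some dimension value');

Because NEXTVAL is resolved atomically on the database side, the five parallel sessions can never be handed the same key.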

managing/implementing auto-increment primary key in oracle without triggers

We have many tables in our database with auto-increment primary key ids set up the way they are in MySQL, since we are in the process of migrating from MySQL to Oracle.
Now, in Oracle, I recently learned that implementing this requires creating a sequence and a trigger on the id field for each such table. We have 30-40 tables in our schema and we want to avoid using database triggers in our product, since database management is out of scope for our software appliance.
What are my options for implementing the auto-increment id feature in Oracle, apart from manually specifying the id in the code and managing it there, which would change a lot of existing insert statements?
... I wonder if there is a way to do this from Grails code itself? (By the way, specifying the id as 'increment' in the domain class mapping doesn't work; it only works for MySQL.)
Some info about our application environment: Grails/Groovy, Hibernate, Oracle and MySQL support.
This answer has Grails/Hibernate handle the sequence generation by itself. It creates a sequence per table for primary key generation and doesn't cache any numbers, so you won't lose any identifiers if and when the cache times out. Grails/Hibernate calls the sequence directly, so it doesn't make use of any triggers either.
If you are using Grails, Hibernate will handle this for you automatically.
You can specify which sequence to use by putting the following in your domain object:
static mapping = {
    id generator: 'sequence', params: [sequence: 'MY_SEQ']
}
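If the MY_SEQ sequence does not already exist in the Oracle schema (for example because Hibernate is not managing the DDL), a sketch of creating it up front:

CREATE SEQUENCE MY_SEQ START WITH 1 INCREMENT BY 1 NOCACHE;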

Can I capture Performance Counters for an Azure Web/Worker Role remotely...?

I am aware of how the Performance Counters and Diagnostics are generated for web roles and worker roles in Azure.
My question is: can I get the Performance Counters from a remote place or a remote app, given the subscription ID and the certificates (a 3rd-party app that reports the Performance Counters)?
In other words, can I get the Performance Counter data the way I use the Service Management API for any hosted service?
What pre-configuration is required on the server to get CPU data?
Following is a description of the attributes of the performance counters table:
EventTickCount: Stores the tick count (in UTC) when the log entry was recorded.
DeploymentId: Id of your deployment.
Role: Role name
RoleInstance: Role instance name
CounterName: Name of the counter
CounterValue: Value of the performance counter
One of the key things here is to understand how to effectively query this table (and the other diagnostics tables). One of the things we would want from the diagnostics tables is to fetch the data for a certain period of time. Our natural instinct would be to query this table on the Timestamp attribute. However, that's a BAD DESIGN choice, because in an Azure table the data is indexed on PartitionKey and RowKey. Querying on any other attribute results in a full table scan, which becomes a problem when your table contains a lot of data.
The good thing about these log tables is that the PartitionKey value in a way represents the date/time when the data point was collected. Basically, the PartitionKey is created by using the higher-order bits of DateTime.Ticks (in UTC). So if you were to fetch the data for a certain date/time range, first you would need to calculate the ticks for your range (in UTC), then prepend a "0" in front of each value, and use those values in your query.
If you're querying using the REST API, you would use syntax like:
PartitionKey ge '0<from date/time ticks in UTC>' and PartitionKey le '0<to date/time ticks in UTC>'
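As a rough sketch (plain C#, not the Storage Client library; the date range and variable names are just an example), the two PartitionKey bounds could be built like this:

// build the "0" + ticks PartitionKey bounds for a UTC time range
DateTime fromUtc = new DateTime(2012, 6, 1, 0, 0, 0, DateTimeKind.Utc);
DateTime toUtc = fromUtc.AddHours(1);

string fromKey = "0" + fromUtc.Ticks;
string toKey = "0" + toUtc.Ticks;

string filter = string.Format(
    "PartitionKey ge '{0}' and PartitionKey le '{1}'", fromKey, toKey);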
You could use this filter syntax if you're querying table storage in our tool Cloud Storage Studio, in Visual Studio or in Azure Storage Explorer.
Unfortunately I don't have much experience with the Storage Client library, but let me work something out. Maybe I will write a blog post about it. Once I do that, I will post the link to my blog post here.
Gaurav
Since the performance counters data gets persisted in Windows Azure Table Storage (the WADPerformanceCountersTable), you can query that table from a remote app (either by using Microsoft's Storage Client library or by writing your own custom wrapper around the Azure Table Service REST API) to retrieve the data. All you will need is the storage account name and key.
