I'm currently getting too much data from my Cosmos DB, which I want to reduce to the last 8 weeks.
How can I filter in Power Query to get the last 8 weeks based on my date column?
This is my Power Query to get the data:
let
    Source = DocumentDB.Contents("https://xxx.xxx", "xxx", "xxx"),
    #"Expanded Document" = Table.ExpandRecordColumn(Source, "Document", {"$v"}, {"Document.$v"}),
    #"Expanded Document.$v" = Table.ExpandRecordColumn(#"Expanded Document", "Document.$v", {"date"}, {"Document.$v.date"}),
    #"Expanded Document.$v.date" = Table.ExpandRecordColumn(#"Expanded Document.$v", "Document.$v.date", {"$v"}, {"Document.$v.date.$v"}),
    #"Changed Type" = Table.TransformColumnTypes(#"Expanded Document.$v.date", {{"Document.$v.date.$v", type text}})
in
    #"Changed Type"
And this is how the data looks in my Cosmos DB:
{
    "_id" : ObjectId("5c6144bdf7ce070001acc213"),
    "date" : {
        "$date" : 1549792055030
    },
If you want to do all the work on your end (though maybe the server can do some or all of it; see further below):
Assuming the 1549792055030 shown in your example is a Unix timestamp expressed in milliseconds, you can convert it to a datetime in Power Query with something like: #datetime(1970, 1, 1, 0, 0, 0) + #duration(0, 0, 0, 1549792055030/1000)
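As a quick sanity check you can wrap that in a small helper (the function name fromUnixMs is just mine for this sketch):

    // Converts a Unix timestamp in milliseconds to a UTC datetime
    fromUnixMs = (ms as number) as datetime =>
        #datetime(1970, 1, 1, 0, 0, 0) + #duration(0, 0, 0, ms / 1000)
    // fromUnixMs(1549792055030) returns 2019-02-10T09:47:35.030 (UTC)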
In your M code you expand a record field named $v (which itself was nested within a field named date, which itself was nested within a field named $v), but no $v field is shown in the structure you posted. I mention this because it's confusing to know whether to follow your M code or the structure. I'm going to assume that you have a $v field, which contains a date field, which itself contains a $date field. To get at the nested Unix timestamp, you could try something like: someRecord[#"$v"][date][#"$date"]
Since you're interested in only the last 8 weeks, you could test for something like: Date.IsInPreviousNWeeks(DateTime.AddZone(someDatetime, 0), 8). (You could also do it the other way around: convert the datetime 8 weeks before now to a Unix timestamp, then filter for timestamps >= the value you've worked out.)
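That other direction might look something like this (a sketch; cutoffMs is my name for it, and it assumes you compare against the raw $date value):

    // Unix timestamp (ms) for the moment 8 weeks (56 days) before now
    cutoffMs = Duration.TotalSeconds(
        DateTime.LocalNow() - #duration(56, 0, 0, 0) - #datetime(1970, 1, 1, 0, 0, 0)
    ) * 1000
    // ...then keep only rows whose $date value is >= cutoffMs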
Putting the above together, we might get some M code that looks like:
let
    Source = DocumentDB.Contents("https://xxx.xxx", "xxx", "xxx"),
    filterDates = Table.SelectRows(Source, each
        let
            // Field access is case-sensitive; "Document" matches the column name in your query
            millisecondsSinceEpoch = Number.From([Document][#"$v"][date][#"$date"]),
            toDatetime = #datetime(1970, 1, 1, 0, 0, 0) + #duration(0, 0, 0, millisecondsSinceEpoch / 1000),
            toFilter = Date.IsInPreviousNWeeks(DateTime.AddZone(toDatetime, 0), 8)
        in
            toFilter
    )
in
    filterDates
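One thing to be aware of with this approach: a row filter built from a custom function like this is unlikely to fold back to the source, so Power Query will generally download all the rows from Cosmos DB first and only then discard the ones outside the 8-week window.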
The code above may be functional (hopefully) but, conceptually, it might not be the right way to do it. I am not familiar with the function DocumentDB.Contents, but this link (https://www.powerquery.io/accessing-data/document-db/documentdb.contents) suggests it has these parameters:
function (url as text, optional database as nullable any, optional collection as nullable any, optional options as nullable record) as table
and it goes on to say:
if the field Query is specified in the options record the results of the query being executed on either the specified database and/or collection will be returned.
What I understand this to mean is that if you change your first line to something like:
Source = DocumentDB.Contents("https://xxx.xxx", "xxx", "xxx", [Query = "..."])
and the query you specify in "..." is understood by the server (presumably the query needs to be in Cosmos DB's native query language), then only the last 8 weeks' worth of data will be returned to you, meaning less data is sent and less work happens on your end. As I said, I'm unfamiliar with Azure Cosmos DB, so I can't really comment further. But this seems the better way of doing it.
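For illustration only, that might look something like the sketch below (in Cosmos DB's SQL dialect; the container alias c, the document shape, and the hard-coded cutoff are all assumptions on my part):

    Source = DocumentDB.Contents("https://xxx.xxx", "xxx", "xxx",
        [Query = "SELECT * FROM c WHERE c[""date""][""$date""] >= 1544600000000"]
    )

Here 1544600000000 stands in for the Unix timestamp (in milliseconds) of the moment 8 weeks before now, worked out as shown earlier.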
I am trying to clean up my data by converting certain values to the number 2, but I need to leave the remaining data and its data type as is.
I am using the following code within a custom column step in Power Query (Excel), but it gives me an error when returning a number.
Number.From([Values_old]) otherwise if Text.Contains([Value_old],"not required",Comparer.OrdinalIgnoreCase) or Text.Lower([Value_old]) = "N/A" or Text.Lower([Value_old]) ="NA" or [Value_old] = "100" or [Value_old] = 100 then 2 else [Value_old]
I based my conditional column on the following comment I found in a forum: https://community.powerbi.com/t5/Desktop/data-type-which-contains-both-test-and-number/m-p/55785/highlight/true#M22664
However, this seems to break my if ... then ... else condition as well.
It's simple: you used the wrong column name, Values_old instead of Value_old.
But even then your formula won't work, since neither of these will ever be true:
Text.Lower([Value_old]) = "N/A" or Text.Lower([Value_old]) = "NA"
because you are comparing something you just converted to lowercase against an uppercase string.
So you probably want the following, which includes the try part you seem to have left out of your code:
= Table.AddColumn(#"Changed Type", "Custom", each
    try Number.From([Value_old]) otherwise
        if Text.Contains([Value_old], "not required", Comparer.OrdinalIgnoreCase)
            or Text.Lower([Value_old]) = "n/a"
            or Text.Lower([Value_old]) = "na"
            or [Value_old] = "100"
            or [Value_old] = 100
        then 2
        else [Value_old])
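As a quick check of how this evaluates: "7" becomes 7 via the successful Number.From, "Not Required" and "N/A" become 2 via the otherwise branch, and "apple" falls through to the else and stays "apple". Note that "100" is converted to the number 100 by the successful try, so the = "100" test in the otherwise branch is never reached for it.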
I would like to create an index on a text column for the following use case. We have a table segments with a column content of type text. We perform queries based on similarity using pg_trgm. This is used in a translation editor for finding similar strings.
Here are the table details:
CREATE TABLE public.segments
(
    id integer NOT NULL DEFAULT nextval('segments_id_seq'::regclass),
    language_id integer NOT NULL,
    content text NOT NULL,
    created_at timestamp without time zone NOT NULL,
    updated_at timestamp without time zone NOT NULL,
    CONSTRAINT segments_pkey PRIMARY KEY (id),
    CONSTRAINT segments_language_id_fkey FOREIGN KEY (language_id)
        REFERENCES public.languages (id) MATCH SIMPLE
        ON UPDATE NO ACTION ON DELETE CASCADE,
    CONSTRAINT segments_content_language_id_key UNIQUE (content, language_id)
)
And here is the query (Ruby + Hanami):
def find_by_segment_match(source_text_for_lookup, source_lang, sim_score)
  aggregate(:translation_records)
    .where(language_id: source_lang)
    .where { similarity(:content, source_text_for_lookup) > sim_score / 100.00 }
    .select_append { float::similarity(:content, source_text_for_lookup).as(:similarity) }
    .order { similarity(:content, source_text_for_lookup).desc }
end
---EDIT---
This is the query:
SELECT "id", "language_id", "content", "created_at", "updated_at", SIMILARITY("content", 'This will not work.') AS "similarity" FROM "segments" WHERE (("language_id" = 2) AND (similarity("content", 'This will not work.') > 0.45)) ORDER BY SIMILARITY("content", 'This will not work.') DESC
SELECT "translation_records"."id", "translation_records"."source_segment_id", "translation_records"."target_segment_id", "translation_records"."domain_id",
"translation_records"."style_id",
"translation_records"."created_by", "translation_records"."updated_by", "translation_records"."project_name", "translation_records"."created_at", "translation_records"."updated_at", "translation_records"."language_combination", "translation_records"."uid",
"translation_records"."import_comment" FROM "translation_records" INNER JOIN "segments" ON ("segments"."id" = "translation_records"."source_segment_id") WHERE ("translation_records"."source_segment_id" IN (27548)) ORDER BY "translation_records"."id"
---END EDIT---
---EDIT 1---
What about re-indexing? Initially we'll import about 2 million legacy records. When and how often, if at all, should we rebuild the index?
---END EDIT 1---
Would something like CREATE INDEX ON segments USING gist (content) be OK? I can't really find which of the available index types would be best suited to our use case.
Best, seba
The 2nd query you show seems to be unrelated to this question.
Your first query can't use a trigram index, as the query would have to be written in operator form, not function form, to do that.
In operator form, it would look like this:
SELECT "id", "language_id", "content", "created_at", "updated_at", SIMILARITY("content", 'This will not work.') AS "similarity"
FROM segments
WHERE language_id = 2 AND content % 'This will not work.'
ORDER BY content <-> 'This will not work.';
In order for % to be equivalent to similarity("content", 'This will not work.') > 0.45, you would first need to run SET pg_trgm.similarity_threshold TO 0.45;.
Now how you get ruby/hanami to generate this form, I don't know.
The % operator can be supported by either the gin_trgm_ops or the gist_trgm_ops operator class. The <-> operator can only be supported by gist_trgm_ops. But it is pretty hard to predict how efficient that support will be: if your content column is long or your text to compare is long, it is unlikely to be very efficient, especially in the case of gist.
Ideally you would partition your table by language_id. If not, then it might be helpful to index both columns:
CREATE INDEX segments_language_id_idx ON segments USING btree (language_id);
CREATE INDEX segments_content_gin ON segments USING gin (content gin_trgm_ops);
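Two side notes. First, the trigram operator classes require the pg_trgm extension (a standard contrib module). Second, if you want the ORDER BY content <-> ... step to be index-assisted as well, you would need a gist index rather than gin, since only gist_trgm_ops supports the distance operator. A sketch (the index name is mine):

    CREATE EXTENSION IF NOT EXISTS pg_trgm;
    CREATE INDEX segments_content_gist ON segments USING gist (content gist_trgm_ops);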
I have a model called AdInteraction; these interactions can either be a click or a view (they have either a boolean clicked or a boolean viewed set to true).
Along with every interaction I save the created_at date.
Now this is what I want to end up with in order to have all the data I need to populate a ChartJS Chart:
[
    {
        "date": "01-01-2018",
        "clicks": 13,
        "views": 25
    },
    {
        "date": "02-01-2018",
        "clicks": 25,
        "views": 74
    },
    {
        "date": "03-01-2018",
        "clicks": 0,
        "views": 0
    }
]
This is a query I already got on my Ad model which is related to AdInteraction:
public function getClicksForLastDays()
{
    return $this->clicks()->get()->groupBy(function ($date) {
        return Carbon::parse($date->created_at)->format('y-m-d');
    });
}
However, this returns me an array of arrays (the interactions grouped by date) rather than the counts per day.
What would be the correct and most efficient way to fetch the clicks and views and count them by day?
Try this and let me know. I assume your column names are date, clicks and views; if they're different, let me know and I'll adjust the answer, or you can do it yourself.
AdInteraction::select([
        DB::raw('DATE(date) as date'),
        DB::raw('count(case when clicks = "true" then 1 end) as "Clicks"'),
        DB::raw('count(case when views = "true" then 1 end) as "Views"'),
    ])
    ->groupBy(DB::raw('DATE(date)'))
    ->get();
or try this
AdInteraction::select([
        DB::raw('DATE(date) as date'),
        DB::raw('count(case when clicks = true then 1 end) as "Clicks"'),
        DB::raw('count(case when views = true then 1 end) as "Views"'),
    ])
    ->groupBy(DB::raw('DATE(date)'))
    ->get();
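One caveat with either version: days with no interactions at all simply won't appear in the grouped result, so to get zero rows like the 03-01-2018 entry in your desired output you would still need to fill in the missing dates yourself (for example by looping over the date range in PHP).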
You should consider abandoning the idea of grouping by date using a datetime column, since such a query will be very inefficient. When you, for example, GROUP BY DATE(created_at), MySQL will perform this cast for each row and won't be able to use an index on created_at.
Therefore I recommend denormalizing your table by introducing a separate DATE column created_date_at for the created_at value, and creating an index on it. Then you will be able to efficiently group your stats by this new column's value. Just be sure to register the following code for your model:
// Keep the denormalized date column in sync whenever an interaction is created
AdInteraction::creating(function ($adInteraction) {
    $adInteraction->created_date_at = $adInteraction->created_at->format('Y-m-d');
});
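If you are adding the column to an existing table, it might look something like this migration sketch (the ad_interactions table name is my assumption; this goes in a migration's up() method):

    // Hypothetical migration: add the denormalized date column and index it
    Schema::table('ad_interactions', function (Blueprint $table) {
        $table->date('created_date_at')->nullable()->index();
    });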
Or you can consider creating separate int columns for year, month and day. Then you can create a multi-column index and group by these columns. This way you will also be able to easily retrieve stats by day, month and year if needed.
Beginner Lua question: I'm just learning Lua, and I wrote some code, a nested table, to create something like a table with rows and columns.
However, when I iterate through the table using pairs(), it doesn't output in the same order I put it in. I put it in as Serial, Service Days, Connected, and it's coming out as Service Days, Serial, Connected. I am at a loss to figure out why. I intentionally created the three rows in different ways, since I'm just learning and trying to get comfortable with the different ways of dealing with Lua tables...
The code:
myTable = {}
myTable["headerRow"] = {
Serial = "Serial",
ServDays = "Service Days",
Connected = "Connected" }
myTable[1] = {
Serial = "B9FX",
ServDays = 7,
Connected = true }
myTable[2] = {}
myTable[2]["Serial"] = "2SHA"
myTable[2]["ServDays"] = 3
myTable[2]["Connected"] = true
for k, v in pairs(myTable) do
    for k2, v2 in pairs(v) do
        io.write(tostring(v2), ",")
    end
    io.write("\n") -- End the row
end
The result:
c:\lua>lua53 primer.lua
7,B9FX,true,
3,2SHA,true,
Service Days,Serial,Connected,
pairs uses the next function. Hence the order of traversal in a generic for loop using the pairs iterator is unspecified.
From the Lua reference manual:
https://www.lua.org/manual/5.3/manual.html#pdf-next
The order in which the indices are enumerated is not specified, even
for numeric indices. (To traverse a table in numerical order, use a
numerical for.)
The behavior of next is undefined if, during the traversal, you assign
any value to a non-existent field in the table. You may however modify
existing fields. In particular, you may clear existing fields.
If you do something like this:
myTable[2] = {}
myTable[2]["Serial"] = "2SHA"
myTable[2]["ServDays"] = 3
myTable[2]["Connected"] = true
Lua will not remember the order in which you assigned values to table keys. It only maps keys to values.
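If you want a stable column order, a common pattern is to keep the order in a separate array and iterate it with ipairs, which does walk numeric indices 1..n in order. A minimal sketch against your table (the columns list is my addition):

    local columns = { "Serial", "ServDays", "Connected" } -- explicit column order

    local function printRow(row)
        for _, col in ipairs(columns) do -- ipairs is deterministic: 1, 2, ..., n
            io.write(tostring(row[col]), ",")
        end
        io.write("\n")
    end

    printRow(myTable["headerRow"]) -- header first; ipairs below skips the string key
    for _, row in ipairs(myTable) do
        printRow(row)
    end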
I have a column with a mix of Number and Text values and am trying to separate them into different columns using an if ... then ... else conditional. Is there an ISNUMBER() or ISTEXT() equivalent for Power Query?
Here is how to check a value's type in Excel Power Query:
IsNumber
=Value.Is(Value.FromText([ColumnOfMixedValues]), type number)
IsText
=Value.Is(Value.FromText([ColumnOfMixedValues]), type text)
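For example, Value.FromText("1") comes back as the number 1, so the IsNumber test is true for it, while Value.FromText("a") stays text and only the IsText test is true.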
hope it helps!
That depends a bit on the nature of the data and how it is originally encoded. Power Query is more strongly typed than Excel.
For example:
Source = Table.FromRecords({[A=1],[A="1"],[A="a"]})
Creates a table with three rows. The first row's data type is number. The second and third rows are both text. But the second row's text could be interpreted as a number.
The following is a query that creates two new columns showing if each row is a text or number type. The first column checks the data type. The second column attempts to guess the data type based on the value. The guessing code assumes everything that isn't a number is text.
Example Code
Edit: Borrowing from @AlejandroLopez-Lago-MSFT's comment for the interpreted type.
let
Source = Table.FromRecords({[A=1],[A="1"],[A="a"]}),
#"Added Custom" = Table.AddColumn(Source, "Type", each
let
TypeLookup = (inputType as type) as text =>
Table.FromRecords(
{
[Type=type text, Value="Text"],
[Type=type number, Value="Number"]
}
){[Type=inputType]}[Value]
in
TypeLookup(Value.Type([A]))
),
#"Added Custom 2" = Table.AddColumn(#"Added Custom", "Interpreted Type", each
let
result = try Number.From([A]) otherwise "Text",
resultType = if result = "Text" then "Text" else "Number"
in
resultType
)
in
#"Added Custom 2"
Sample output: for the three rows, the Type column reads Number, Text, Text and the Interpreted Type column reads Number, Number, Text.
Put it in logical test format
Value.Type([Column1]) = type number
Value.Type([Column1]) = type text
The function Value.Type returns a type, so comparing its result with = as above yields true/false.
Also, equivalently,
Value.Type([Column1]) = Date.Type
Value.Type([Column1]) = Text.Type
HTH
ISTEXT() doesn't exist in any language I've worked with - typically any numeric or date value can be converted to text, so what would a false result even be?
For ISNUMBER, I would solve this without any code by changing the Data Type to a number type, e.g. Whole Number. Any rows that don't convert will show Error; you can then apply Replace Errors or Remove Errors to handle them.
Use Duplicate Column first if you don't want to disturb the original column.
I agree with Mike Honey.
I have a SKU code that is a mix of characters and numbers. Normally the last 8 characters are numbers, but in some weird circumstances the SKU is repeated with an additional letter yet given the same EAN, which causes chaos.
By creating a new temp column using Text.End(SKU, 1) I get only the last character. I then convert that column to Whole Number. Any error rows are then removed, leaving only the rows I need. I then delete the temp column and am left with the rows I need, in the format I started with.
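In M, those steps might look roughly like this (a sketch; the sample rows and the SKU column name are assumptions):

    let
        // Hypothetical sample data standing in for the real source
        Source = Table.FromRecords({[SKU = "ABC12345678"], [SKU = "ABC1234567X"]}),
        // Temp column holding just the last character of each SKU
        AddedLast = Table.AddColumn(Source, "LastChar", each Text.End([SKU], 1)),
        // Converting to Whole Number produces an error for non-digit characters...
        Typed = Table.TransformColumnTypes(AddedLast, {{"LastChar", Int64.Type}}),
        // ...so removing error rows keeps only SKUs that end in a digit
        NoErrors = Table.RemoveRowsWithErrors(Typed, {"LastChar"}),
        // Drop the temp column to get back to the original shape
        Result = Table.RemoveColumns(NoErrors, {"LastChar"})
    in
        Result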