I want to know whether there is a set of entities matching the following rule.
I have a table with a two-column composite primary key:
| id | key |
|----|-----|
| 1  | a   |
| 2  | b   |
| 1  | c   |
So, I want to do something like this:
boolean existsByIdAndAllOfKey(
    long id,
    Set<Key> keys
)
This query should return true if the database contains entities with all of the keys present in the input Set.
I'm wondering whether there is a keyword in Spring Data for this, or what the best way to do it is.
I found the following solution:
int countByIdAndKeyIn(
    long id,
    Set<Key> keys
)

boolean isThereEntityWithAllKeys(long id, Set<Key> keys) {
    return countByIdAndKeyIn(id, keys) == keys.size();
}
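For reference, here is a minimal sketch (the repository and entity names are assumed, not from the original question) of how this workaround can live on the repository itself as a default method. It relies on the fact that (id, key) pairs are unique, which the composite primary key guarantees, so the count can only equal keys.size() when every key is present:

import java.util.Set;
import org.springframework.data.jpa.repository.JpaRepository;

public interface MyEntityRepository extends JpaRepository<MyEntity, Long> {

    // Derived query: counts rows with this id whose key is in the given set
    int countByIdAndKeyIn(long id, Set<Key> keys);

    // True only if every key in the set exists for this id; correct because
    // (id, key) is the composite primary key, so each key is counted at most once
    default boolean existsWithAllKeys(long id, Set<Key> keys) {
        return countByIdAndKeyIn(id, keys) == keys.size();
    }
}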
Being pretty new to Power Query, I find myself faced with this problem I wish to solve.
I have a TableA with these columns. Example:
Key | Sprint | Index
-------------------------
A | PI1-I1 | 1
A | PI1-I2 | 2
B | PI1-I3 | 1
C | PI1-I1 | 1
I want to end up with a set looking like this:
Key | Sprint | Index | HasSpillOver
-------------------------
A | PI1-I1 | 1 | Yes
A | PI2-I2 | 2 | No
B | PI1-I3 | 1 | No
C | PI1-I1 | 1 | No
I thought I could maybe NestedJoin TableA on itself, then compare indices, strip them away, and count the rows in the table, as outlined below.
TableA=Key, Sprint, Index
// TableA Nested joined on itself (Key, Sprint, Index, Nested)
TableB=NestedJoin(#"TableA", "Key", #"TableA", "Key", "Nested", JoinKind.Inner)
TableC= Table.TransformColumns(#"TableB", {"Nested", (x)=>Table.SelectRows(x, each [Index] <x[Index])} )
... and then do the count; however, this throws an error:
Can not apply operator < on types List and Number.
Any suggestions on how to approach this problem? Possibly (probably) in a different way.
You did not define very well what "spillover" means, but this should get you most of the way.
Mine assumes adding another index; you could use what you have if it is relevant.
Then the code counts the number of rows where the (second) index is higher and the [Key] field matches. You could add code so that the Sprint field has to match as well, if relevant.
let
    Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
    #"Added Index" = Table.AddIndexColumn(Source, "Index.1", 0, 1),
    #"Added Custom" = Table.AddColumn(#"Added Index", "Count", (i) =>
        Table.RowCount(Table.SelectRows(#"Added Index", each [Key] = i[Key] and [Index.1] > i[Index.1])))
in
    #"Added Custom"
I'm trying to scan the result of a query into a res struct.
The code builds and the query runs, but the result slice consists of default values, like this:
[{0 0 0} {0 0 0} {0 0 0} {0 0 0} {0 0 0} {0 0 0}]
Also, the result slice has exactly the length the query result should have.
When I try the generated query in the postgres shell, it returns the results correctly.
Code:
type res struct {
    id      int
    number  int
    user_id int
}

func getDataJoin() {
    new := []res{}
    db.Db.Table("users").
        Select("users.id as id, credit_cards.number as number, credit_cards.user_id as user_id").
        Joins("left join credit_cards on credit_cards.user_id = users.id").
        Scan(&new)
    fmt.Println("user\n", new)
}
Generated Query:
SELECT users.id as id, credit_cards.number as number, credit_cards.user_id as user_id FROM "users" left join credit_cards on credit_cards.user_id = users.id
Database result
id | number | user_id
----+--------+---------
1 | 1 | 1
1 | 2 | 1
2 | 1 | 2
2 | 2 | 2
3 | 1 | 3
3 | 2 | 3
(6 rows)
Since go-gorm has certain naming conventions, you might want to try two things.
Make your res struct publicly available, with exported fields:
type Res struct {
    ID     int
    Number int
    UserID int
}
Or, specify mappings between columns and fields:
type res struct {
    id      int `gorm:"column:id"`
    number  int `gorm:"column:number"`
    user_id int `gorm:"column:user_id"`
}
gorm can only read/write exported fields, much like the Marshal/Unmarshal functions of the json package: if the first letter of a field is capitalized, it will be used. By default, gorm matches struct fields to the snake_cased form of their names as column names; you can also define your own column names with tags.
Since the snake_cased form of both ID and Id is id, as long as the first letter of your field is capitalized, it should work. On a different note, it's good practice to write ID with both letters capitalized.
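Putting both suggestions together, a minimal sketch (assuming the classic github.com/jinzhu/gorm import path; the column tags are redundant here, since gorm's default snake_casing already maps ID to id, Number to number, and UserID to user_id, but they make the mapping explicit):

package main

import (
    "fmt"

    "github.com/jinzhu/gorm"
    _ "github.com/jinzhu/gorm/dialects/postgres"
)

// Exported struct with exported fields so gorm's reflection can see them.
type Res struct {
    ID     int `gorm:"column:id"`
    Number int `gorm:"column:number"`
    UserID int `gorm:"column:user_id"`
}

func getDataJoin(db *gorm.DB) {
    results := []Res{}
    db.Table("users").
        Select("users.id as id, credit_cards.number as number, credit_cards.user_id as user_id").
        Joins("left join credit_cards on credit_cards.user_id = users.id").
        Scan(&results)
    fmt.Println("user\n", results)
}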
Sorry for a newbie question.
Currently I have log files that contain fields such as userId, event, and timestamp, but lack a sessionId. My aim is to create a sessionId for each record based on the timestamp and a pre-defined value TIMEOUT.
If the TIMEOUT value is 10, the sample DataFrame is:
scala> eventSequence.show(false)
+----------+------------+----------+
|userId    |event       |timestamp |
+----------+------------+----------+
|U1        |A           |1         |
|U2        |B           |2         |
|U1        |C           |5         |
|U3        |A           |8         |
|U1        |D           |20        |
|U2        |B           |23        |
+----------+------------+----------+
The goal is:
+----------+------------+----------+----------+
|userId    |event       |timestamp |sessionId |
+----------+------------+----------+----------+
|U1        |A           |1         |S1        |
|U2        |B           |2         |S2        |
|U1        |C           |5         |S1        |
|U3        |A           |8         |S3        |
|U1        |D           |20        |S4        |
|U2        |B           |23        |S5        |
+----------+------------+----------+----------+
I found one solution in R (Create a "sessionID" based on "userID" and differences in "timeStamp"), but I am not able to figure it out in Spark.
Thanks for any suggestions on this problem.
Shawn's answer addresses "how to create a new column", while my aim is "how to create a sessionId column based on timestamp". After days of struggling, I found that the Window function applies to this scenario as a simple solution.
Window functions were introduced in Spark 1.4; they provide functions that
both operate on a group of rows while still returning a single value for every input row
In order to create a sessionId based on timestamp, I first need to get the difference between a user's two consecutive operations. windowDef defines a Window partitioned by "userId" and ordered by timestamp; diff is then a column whose value, for each row, is the timestamp one row after the current row in the partition (group), or null if the current row is the last row in its partition:
def handleDiff(timeOut: Int) = udf { (timeDiff: Int, timestamp: Int) =>
  if (timeDiff > timeOut) timestamp + ";" else timestamp + ""
}

val windowDef = Window.partitionBy("userId").orderBy("timestamp")
val diff: Column = lead(eventSequence("timestamp"), 1).over(windowDef)

val dfTSDiff = eventSequence.
  withColumn("time_diff", diff - eventSequence("timestamp")).
  withColumn("event_seq", handleDiff(TIME_OUT)(col("time_diff"), col("timestamp"))).
  groupBy("userId").agg(GroupConcat(col("event_seq")).alias("event_seqs"))
Updated:
Then exploit the Window function to apply a "cumsum"-like operation (as provided in Pandas):
// Define a Window partitioned by userId (partitionBy), ordered by timestamp (orderBy),
// whose frame covers all rows from the start of the partition up to and including the current row (rowsBetween)
val windowSpec = Window.partitionBy("userId").orderBy("timestamp").rowsBetween(Long.MinValue, 0)
val sessionDf = dfTSDiff.
  withColumn("ts_diff_flag", genTSFlag(TIME_OUT)(col("time_diff"))).
  select(col("userId"), col("eventSeq"), col("timestamp"),
    sum("ts_diff_flag").over(windowSpec).alias("sessionInteger")).
  withColumn("sessionId", genSessionId(col("userId"), col("sessionInteger")))
Previously:
I split by ";" to get each session and created a sessionId; afterwards I split by "," and exploded to the final result. Thus the sessionId was created with the help of string operations.
(That part should be replaced by the cumulative-sum operation above instead; at the time I had not found a good solution.)
Any idea or thought about this question is welcomed.
GroupConcat can be found here: SPARK SQL replacement for mysql GROUP_CONCAT aggregate function
Reference: databricks introduction
You can add the column with withColumn:
dt.withColumn("sessionId", <expression for the new sessionId column>)
where the expression is built, for example, from dt("timestamp") and the pre-defined TIMEOUT value.
I am trying to make a table that stores 3 parts, each of which will be huge in length. The first is the name, the second is EID, and the third is SID. I want to be able to get the information like this: name[1] gives me the first name in the list of names, and likewise for the other two. I'm running into problems with how to do this because it seems like everyone has their own way, all of which are very different from one another. Right now this is what I have.
info = {
    {name = "btest",  EID = "19867", SID = "664"},
    {name = "btest1", EID = "19867", SID = "664"},
    {name = "btest2", EID = "19867", SID = "664"},
    {name = "btest3", EID = "19867", SID = "664"},
}
Theoretically speaking, would I be able to just say info.name[1]? Or how else would I be able to arrange the table so I can access each part separately?
There are two main "ways" of storing the data:
Horizontal partitioning (Object-oriented)
Store each row of the data in a table. All tables must have the same fields.
Advantages: Each table contains related data, so it's easier to pass around (e.g., f(info[5])).
Disadvantages: A table is to be created for each element, adding some overhead.
This looks exactly like your example:
info = {
    {name = "btest", EID = "19867", SID = "664"},
    -- etc ...
}
print(info[2].name) -- access second name
Vertical partitioning (Array-oriented)
Store each property in a table. All tables must have the same length.
Advantages: Fewer tables overall, and slightly more time- and space-efficient (the Lua VM uses actual arrays).
Disadvantages: Needs two objects to refer to a row: the table and the index. It's harder to insert/delete (see the sketch after the example below).
Your example would look like this:
info = {
    names = { "btest", "btest1", "btest2", "btest3" },
    EID   = { "19867", "19867", "19867", "19867" },
    SID   = { "664", "664", "664", "664" },
}
print(info.names[2]) -- access second name
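As a small illustration of the insert/delete disadvantage (this helper is mine, not part of the original answer), every parallel array has to be kept in sync by hand:

local function addRow(info, name, eid, sid)
    table.insert(info.names, name)
    table.insert(info.EID, eid)
    table.insert(info.SID, sid)
end

addRow(info, "btest4", "19867", "664")
print(info.names[5]) --> btest4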
So which one should I choose?
Unless you really need performance, you should go with horizontal partitioning. Working over full rows is far more common, and it gives you more freedom in how you use your structures. If you decide to go full OO, having your data in horizontal form will be much easier.
Addendum
The names "horizontal" and "vertical" come from the table representation of a relational database.
  | names | EID | SID |      | names |
--+-------+-----+-----+      +-------+
1 |       |     |     |      |       |
2 |       |     |     |      |       |   --+-------+-----+-----+
3 |       |     |     |      |       |   2 |       |     |     |
                                         --+-------+-----+-----+

(left: the full table; middle: a vertical slice, one column; right: a horizontal slice, one row)
Your info table is an array, so you can access items using info[N], where N is any number from 1 to the number of items in the table. Each item of the info table is itself a table. The 2nd item of info is info[2], so the name field of that item is info[2].name.
Let's assume that I have a table which contains Usernames and another one which contains FirstNames. A single user can have multiple FirstNames:
How can I obtain a record which will contain the Username and all first names separated by commas?
e.g.
Users
id | Username
1 | Test
Names
UserId | FirstName
1 | Mike
1 | John
I would like to receive a record which will contain
Test, "Mike, John"
How can I do that?
Edit: What if the Users table has more columns that I want to get?
e.g.
id | Username | Status
1 | Test | Active
How do I get Test, Active, "Mike, John"?
You can use GroupBy and String.Join
var userGroups = from u in users
                 join n in names
                 on u.ID equals n.UserID
                 group new { n, u } by n.UserID into UserGrp
                 select new
                 {
                     // UserGrp.Key is the numeric UserID, so take the
                     // username (and status) from the first joined row
                     Username = UserGrp.First().u.Username,
                     Status = UserGrp.First().u.Status,
                     Names = string.Join(", ", UserGrp.Select(x => x.n.FirstName))
                 };

foreach (var ug in userGroups)
    Console.WriteLine("{0}, {1}, \"{2}\"", ug.Username, ug.Status, ug.Names);
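For completeness, a self-contained version (the class and variable names are assumed, not from the original post) that can be run against the sample data from the question:

using System;
using System.Linq;

class User { public int ID; public string Username; public string Status; }
class Name { public int UserID; public string FirstName; }

class Program
{
    static void Main()
    {
        var users = new[] { new User { ID = 1, Username = "Test", Status = "Active" } };
        var names = new[]
        {
            new Name { UserID = 1, FirstName = "Mike" },
            new Name { UserID = 1, FirstName = "John" }
        };

        var userGroups = from u in users
                         join n in names on u.ID equals n.UserID
                         group new { n, u } by n.UserID into userGrp
                         select new
                         {
                             Username = userGrp.First().u.Username,
                             Status = userGrp.First().u.Status,
                             Names = string.Join(", ", userGrp.Select(x => x.n.FirstName))
                         };

        foreach (var ug in userGroups)
            Console.WriteLine("{0}, {1}, \"{2}\"", ug.Username, ug.Status, ug.Names);
        // Output: Test, Active, "Mike, John"
    }
}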