semantics for join query #37

Open
taegyunkim opened this issue Sep 3, 2020 · 0 comments

taegyunkim commented Sep 3, 2020

Questions

  1. What are tables in our context?
    In our case, a single query is converted to a table, so a join combines the results of two queries.
    What is an example of two queries that can't be merged into one?

  2. What can be the key for query joins?

  • time: what was happening at one service while another was doing x? This applies when the two are not causally related.
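
To make the time-keyed case concrete, here is a minimal sketch of a window join over two query results. Everything in it is an assumption for illustration: the row shape (dicts with a `ts` field), the function name `time_window_join`, and the choice of a symmetric window are hypothetical, not something this project defines.

```python
from bisect import bisect_left, bisect_right


def time_window_join(left, right, window):
    """Pair each left row with every right row whose timestamp lies
    within +/- window of the left row's timestamp (hypothetical helper)."""
    right_sorted = sorted(right, key=lambda row: row["ts"])
    right_ts = [row["ts"] for row in right_sorted]
    joined = []
    for l_row in left:
        # Binary search for the slice of right rows inside the window.
        lo = bisect_left(right_ts, l_row["ts"] - window)
        hi = bisect_right(right_ts, l_row["ts"] + window)
        for r_row in right_sorted[lo:hi]:
            joined.append((l_row, r_row))
    return joined


# Two query results ("tables") from services that are not causally related.
service_a = [{"ts": 10.0, "event": "x"}, {"ts": 50.0, "event": "x"}]
service_b = [
    {"ts": 9.5, "cpu": 0.9},
    {"ts": 11.0, "cpu": 0.4},
    {"ts": 30.0, "cpu": 0.1},
]

pairs = time_window_join(service_a, service_b, window=1.0)
for a_row, b_row in pairs:
    print(a_row["ts"], a_row["event"], b_row["ts"], b_row["cpu"])
```

Only the event at `ts=10.0` finds matches here; the one at `ts=50.0` has no rows from the other service inside its window, so an inner join drops it.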

Join semantics from other systems

  1. SQL join: https://www.w3schools.com/sql/sql_join.asp
    A JOIN clause combines rows from two or more tables, based on a related column between them.
SELECT Orders.OrderID, Customers.CustomerName, Orders.OrderDate
FROM Orders
INNER JOIN Customers ON Orders.CustomerID=Customers.CustomerID;
  2. Spark (streaming) join: https://spark.apache.org/docs/latest/structured-streaming-programming-guide.html#stream-stream-joins
    "The challenge of generating join results between two data streams is that, at any point of time, the view of the dataset is incomplete for both sides of the join making it much harder to find matches between inputs"

  3. Pivot Tracing happened-before join: https://www2.cs.uic.edu/~brents/cs494-cdcs/papers/pivot-tracing.pdf
    happened-before relation: https://lamport.azurewebsites.net/pubs/time-clocks.pdf
    The relation "->" on the set of events of a system is the smallest relation satisfying the following three conditions:

  • If a and b are events in the same process, and a comes before b, then a->b.
  • If a is the sending of a message by one process and b is the receipt of the same message by another process, then a->b.
  • If a->b and b->c, then a->c.
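
The three conditions can be read as a recipe for computing the relation on a small trace. The sketch below is only an illustration under assumed inputs: the event names, the `process_events` and `messages` structures, and the function name are hypothetical. It uses a naive fixed-point closure rather than anything a real tracing system would run.

```python
from itertools import product


def happened_before(process_events, messages):
    """Return the set of pairs (a, b) such that a -> b.

    process_events: dict mapping a process name to its events in local order.
    messages: list of (send_event, receive_event) pairs.
    """
    relation = set()
    # Condition 1: order of events within a single process.
    for events in process_events.values():
        for i, a in enumerate(events):
            for b in events[i + 1:]:
                relation.add((a, b))
    # Condition 2: a send happens before the matching receive.
    relation.update(messages)
    # Condition 3: close the relation under transitivity.
    changed = True
    while changed:
        changed = False
        for (a, b), (c, d) in product(list(relation), repeat=2):
            if b == c and (a, d) not in relation:
                relation.add((a, d))
                changed = True
    return relation


process_events = {"P": ["p1", "p2", "p3"], "Q": ["q1", "q2", "q3"]}
messages = [("p1", "q2"), ("q3", "p3")]
hb = happened_before(process_events, messages)

print(("p1", "q3") in hb)  # p1 -> q2 -> q3, so this pair is related
print(("p2", "q2") in hb or ("q2", "p2") in hb)  # p2 and q2 are concurrent
```

Pairs that appear in neither direction, such as `p2` and `q2`, are concurrent. A happened-before join would not match them, which is where a time-based key might be used instead.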

Example query from the Pivot Tracing paper:

From incr In DataNodeMetrics.incrBytesRead
Join cl In First(ClientProtocols) On cl -> incr
GroupBy cl.procName
Select cl.procName, SUM(incr.delta)

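The join is evaluated by propagating baggage along with the request: tuples from the earlier tracepoint (cl) are packed into the baggage and unpacked where the later tracepoint (incr) fires.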
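
As a rough illustration of how baggage lets a happened-before join run inline, here is a toy simulation of the query above. It is not Pivot Tracing's API: the `Request` class, the two tracepoint functions, the baggage key, and the process names `procA` and `procB` are all hypothetical, and aggregation is done in one local dict for brevity.

```python
from collections import defaultdict


class Request:
    """Hypothetical request context whose baggage travels with the request."""

    def __init__(self):
        self.baggage = {}


def client_protocols_tracepoint(request, proc_name):
    # First(ClientProtocols): keep only the first tuple packed on this request.
    request.baggage.setdefault("cl.procName", proc_name)


def incr_bytes_read_tracepoint(request, delta, totals):
    # Join ... On cl -> incr: a packed cl tuple is present only if the cl
    # tracepoint fired earlier on this same request.
    proc_name = request.baggage.get("cl.procName")
    if proc_name is not None:
        # GroupBy cl.procName, Select SUM(incr.delta)
        totals[proc_name] += delta


totals = defaultdict(int)

r1 = Request()
client_protocols_tracepoint(r1, "procA")
incr_bytes_read_tracepoint(r1, 100, totals)
incr_bytes_read_tracepoint(r1, 50, totals)

r2 = Request()
client_protocols_tracepoint(r2, "procB")
client_protocols_tracepoint(r2, "procA")  # ignored: First keeps procB
incr_bytes_read_tracepoint(r2, 10, totals)

r3 = Request()
incr_bytes_read_tracepoint(r3, 999, totals)  # no preceding cl: not joined

print(dict(totals))
```

The third request shows the inner-join behavior: an `incr` event with no causally preceding `cl` event contributes nothing to the result.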
