[QST] Shapes of partition_fragment_{A,B}
#1299
Comments
I've attached the output of:

```cpp
using MMA_Op          = SM80_16x8x16_F32F16F16F32_TN;  // SM80 tensor-core atom, 16x8x16 (MxNxK)
using MMA_Traits      = MMA_Traits<MMA_Op>;
using MMA_Atom        = MMA_Atom<MMA_Traits>;
using ThreadLayoutMNK = Layout<Shape<Int<4>, _1, _1>>;  // 4 atoms along M, 1 along N and K
using TiledMma        = TiledMMA<MMA_Atom, ThreadLayoutMNK>;
print_latex(TiledMma{});  // LaTeX picture of the TiledMMA's thread/value layouts
```

(Note that a recent update removed the …) While the Atom is 16x8x16, an AtomLayout of 4x1x1 causes the TiledMma to be 64x8x16. This is what … For …
Thanks! That makes sense. How to obtain the same …

Is … what you're looking for? That's …

That could work, but that doesn't give the same … Original, with … New, no …

Then you probably want … to produce a 64x16x16 TiledMMA from the 64x8x16 TiledMMA. (But it will have the same partitioning effect as without the …)

Thanks!
What is your question?
Assuming a `TiledMma` defined as:

…

where `MMA_Op` for `SM80_16x8x16...` is:

…

Then I define the following `A` and `B` tensors:

…

When I get a `thr_mma` slice as such:

…

I get the following output.

`thr_mma`:

…

`tSrA`:

…

`tSrB`:

…
Since `gA` has shape `128 x 32`, this would require 4 tiles of the `tiled_mma`, each of which is `64 x 16`. Thus `tSrA` makes sense: for the given `MmaAtom`, each thread is responsible for `2 x 2 x 2 = 8` fp16 `A` values, and the `2 x 2 = 4` refers to the required tiling.

For `tSrB`, how is the shape derived? The `(2, 2)` makes sense, since for the `MmaAtom` each thread is responsible for 4 fp16 `B` values. However, I'm having trouble reconciling how the `16` and `2` are derived. One could reason that `16 = 128 / 8` and `2 = 32 / 16`, where the divisors `8` and `16` come from `N` and `K` of the `MmaAtom`, and `16` and `2` are the number of `MmaAtom`s needed to cover the desired `128 x 32`.

However, `tSrA` wouldn't make sense then, since applying the same logic would require `tSrA` to have shape `((2, 2, 2), 8, 2)`, where `8 = 128 / 16` and `2 = 32 / 16`, with `16` coming from the `MmaAtom`'s `M` and `K`.