
make internal broadcast and unbroadcast both primitives #292

Open · wants to merge 4 commits into dev-1.2
Conversation

@mattjj (Contributor) commented Sep 12, 2017

At @dougalm's suggestion, I took a stab at making our internal broadcast and unbroadcast functions into primitives. They seem to form a nice pair!

This might prevent graph expansion and be a bit faster, though I haven't actually run asv on this change yet.
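
Roughly, the shape of the idea (a simplified sketch using the extend API, assuming broadcast_idx = 0 and eliding the metadata handling; the actual diff differs in the details):

import numpy as onp
from autograd.extend import primitive, defvjp

@primitive
def broadcast(x, target_shape):
    return onp.broadcast_to(x, target_shape)

@primitive
def unbroadcast(g, x_shape):
    # Reduce g back down to x_shape: sum away the leading broadcast axes,
    # then sum (with keepdims) over axes where x_shape has size 1.
    while onp.ndim(g) > len(x_shape):
        g = onp.sum(g, axis=0)
    for axis, size in enumerate(x_shape):
        if size == 1:
            g = onp.sum(g, axis=axis, keepdims=True)
    return g

# The pairing: each function's VJP is (essentially) the other one.
defvjp(broadcast, lambda ans, x, shape: lambda g: unbroadcast(g, onp.shape(x)))
defvjp(unbroadcast, lambda ans, g, x_shape: lambda gg: broadcast(gg, onp.shape(g)))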

Any thoughts on this first pass, @j-towns?

@j-towns (Collaborator) commented Sep 13, 2017

Yeah looks like a good idea. I'd been meaning to switch to using numpy's broadcast_to inside our broadcast for a while.
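
For reference, the attraction of numpy's broadcast_to is that it returns a read-only view, so nothing gets copied:

import numpy as np

x = np.arange(3.)               # shape (3,)
y = np.broadcast_to(x, (4, 3))  # shape (4, 3), no data copied
print(y.flags.writeable)        # False -- it's a read-only view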

@j-towns (Collaborator) commented Sep 13, 2017

Also, I feel we should get the optional vspace checking set up as a matter of priority, so that we can rigorously test that these functions output the correct thing (the dtype in particular).

target_shape, target_ndim, _, target_iscomplex = target_meta
x_shape = onp.shape(x)
while onp.ndim(x) > target_ndim:
    x = onp.sum(x, axis=broadcast_idx)
@j-towns (Collaborator) commented on the diff, Sep 13, 2017

I was wondering if we should replace the above two lines with:

x = onp.sum(x, axis=tuple(range(broadcast_idx, broadcast_idx + onp.ndim(x) - target_ndim)))

or similar. Am I right that only calling sum once might lead to better performance, basically because only one output array has to be allocated?

@mattjj (Contributor, author) replied

I briefly tried something like that (though if broadcast_idx is -1, which is the only nonzero use case I noticed in the code, then I think we want something different) and it didn't seem to make a speed difference, so I dropped it. Now is a good time to make sure it's performant, though!
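
To spell out the broadcast_idx == -1 case: the extra axes are then trailing rather than leading, so a one-shot sum would need a different axis range. A hypothetical helper, just to illustrate the two cases:

import numpy as onp

def sum_extra_dims(x, target_ndim, broadcast_idx):
    n_extra = onp.ndim(x) - target_ndim
    if n_extra > 0:
        if broadcast_idx == 0:   # extra axes are leading
            axes = tuple(range(n_extra))
        else:                    # broadcast_idx == -1: extra axes are trailing
            axes = tuple(range(target_ndim, onp.ndim(x)))
        x = onp.sum(x, axis=axes)
    return x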

@j-towns (Collaborator) replied

Doing a few timings, it looks like there is a benefit for small arrays, but it's not massive:

In [15]: a = np.ones((5, 5, 5))

In [16]: %timeit np.sum(a, axis=(0, 1))
5.38 µs ± 112 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

In [17]: %timeit x = np.sum(a, axis=0); x = np.sum(x, axis=0)
8.62 µs ± 124 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

and for slightly bigger arrays it's the other way round (maybe I've made some mistake?):

In [18]: a = np.ones((50, 50, 50))

In [19]: %timeit np.sum(a, axis=(0, 1))
118 µs ± 930 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

In [20]: %timeit x = np.sum(a, axis=0); x = np.sum(x, axis=0)
81.6 µs ± 1.54 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

@mattjj (Contributor, author) replied

Wow, I got similar timings. That seems weird for the bigger arrays...
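
For anyone re-running these numbers outside IPython, a stand-alone equivalent of the two timings (this reports best-of-repeat totals rather than per-loop means):

import timeit
import numpy as np

for shape in [(5, 5, 5), (50, 50, 50)]:
    a = np.ones(shape)
    one_call = min(timeit.repeat(lambda: np.sum(a, axis=(0, 1)), number=10000))
    two_calls = min(timeit.repeat(lambda: np.sum(np.sum(a, axis=0), axis=0), number=10000))
    print(shape, 'one call:', one_call, 'two calls:', two_calls)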

if anp.iscomplexobj(x) and not target_iscomplex:
    x = anp.real(x)
if size == 1:  # TODO(mattjj): bug here w/ passing through scalars?
    x = onp.sum(x, axis=axis, keepdims=True)
@j-towns (Collaborator) commented on the diff, Sep 13, 2017
You could do a similar thing for this one.
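
i.e. collect the size-1 target axes first and reduce them in a single keepdims sum, something like this (illustrative, reusing target_shape and x from the surrounding code; not the PR's actual diff):

axes = tuple(axis for axis, size in enumerate(target_shape)
             if size == 1 and onp.shape(x)[axis] != 1)
if axes:
    x = onp.sum(x, axis=axes, keepdims=True)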

@mattjj (Contributor, author) commented Sep 13, 2017

Btw I think this change was inspired by your improvements to the dot VJPs.

Re: vspaces, I generally agree, though I'm thinking that if vspaces are primarily for testing, we should use them extensively in our testing code but not incur the costs at runtime for every grad eval.
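
For the test-code version, I'm imagining something like the following sketch; it assumes autograd's vspace utility, and check_vjp_output is a made-up name:

from autograd.extend import vspace

def check_vjp_output(vjp_out, expected_like):
    # A VJP's output should live in the same vector space as the input it
    # corresponds to -- same shape and same dtype (real vs. complex).
    assert vspace(vjp_out) == vspace(expected_like), \
        (vspace(vjp_out), vspace(expected_like))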

@j-towns (Collaborator) commented Sep 13, 2017 via email

j-towns added a commit to j-towns/autograd that referenced this pull request Oct 18, 2017
@j-towns (Collaborator) commented Oct 26, 2017

I've effectively incorporated the changes in this PR into #312.
