-
Notifications
You must be signed in to change notification settings - Fork 2
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
optimize writes in genomix data using nulls #57
optimize writes in genomix data using nulls #57
Conversation
Any comments on this one? It's been open for 4 days now o_O |
if (n.internalKmer != null && n.internalKmer.getKmerLetterLength() > 0) { | ||
n.internalKmer.write(out); | ||
} | ||
if (n.averageCoverage != null) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this write()
block consist with the setByCopy()
, in terms of n.getActiveFields & NODE_FIELDS.XXXX
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, they should be consistent. n.getActiveFields
writes a single byte whose bits are defined in NODE_FIELDS
and are 0 if the particular values are null
or size() == 0
. The setByCopy()
function might be a little more permissive in that if size() == 0
, it will create a new list of length 0 rather than being null
. Shouldn't affect behavior but I could change it to be more consistent if we want.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd like to make it more consistent. Because there are several fields to write and read, then the order became very critical. If we want to change or add some fields in the future, it would be easier to change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, maybe I should be more clear:
setAsCopy(byte[] data, int offset)
reads the very first byte from thebyte[]
just aswrite()
writes it andreadFields()
reads it. There is no difference there and they should have the exact same effect. They should be completely consistent withNODE_FIELDS
andgetActiveFields()
.setAsReference(byte[] data, int offset)
is almost an exact copy of that function with the exception thatVKmer
instances (in theEdgeMap
s and theinternalKmer
will be references rather than copies-- everything else must be a copy.setAsCopy(Node other)
is what I meant when I said they're not consistent-- for that function, whenother
has a list whosesize() == 0
,this.<list>
could either benull
or ofsize() == 0
as well. Right now, it'ssize() == 0
, just like theNode
that's being copied. But this really shouldn't affect anything practically.
LGTM 👍 |
@anbangx the test cases in travis are currently failing because the pregelix input files are out of date. Sure would be nice to have that automatic generation tool you've been working on 😄. |
members alway be null until we need it~ I think I can modify and add more 'null' cases in nodeTest to test some functions. |
@Nan-Zhang yes, we could return a null in some cases instead of creating the object on the fly. The problem with that approach is when you want to iterate, you have to check for if (node.getEdgeMap(FF) != null)
for (EdgeMap m : node.getEdgeMap(FF))
// do something with m which makes the interface ugly. So without changing the interface at all (we never return |
|
LGTM On Tue, Nov 12, 2013 at 3:01 PM, Jake Biesinger [email protected]:
|
…mix-data-using-nulls optimize writes in genomix data using nulls
@anbangx @JavierJia @Nan-Zhang
This commit is based on and takes #56 further into Node-- only those values that are non-null and collections of length > 0 are saved into the output stream.
Outside of
Node
, the only visible difference is that there are no default values for averageCoverage (it's null by default and trying to use it without setting it first will give a NPE).Inside of
Node
, anywhere aget()
is called, we initialize the requested value if it's not already existing. There's also a fair bit of extra code involved in checking fornull
(curse you, Java) especially in comparison with raw types.The first commit is kind of unrelated-- it removes
EnumSet.allOf()
where it can be replaced with much less overhead withenum.values()
.