The importance of setting dataverse.siteUrl should be emphasized in the Installation Guide #4517

Closed
ajs6f opened this issue Mar 16, 2018 · 24 comments

@ajs6f
Contributor

ajs6f commented Mar 16, 2018

I installed Dataverse (4.8.3) and let some users sign up for accounts. I exposed the actual address of the application with port number, i.e. https://my.server:8080/.

The email messages being sent to users have links in them that drop the port number, i.e.

https://my.server/passwordreset.xhtml?token=blahblahblah

If Dataverse can't be exposed directly but must be proxied to port 80, that should be clearly explained in the install documentation. If it was, I missed it. Of course, it would be better for Dataverse to correct the code used to construct URLs.

Please ask me for any further information that would be useful.

@pdurbin
Member

pdurbin commented Mar 16, 2018

I wrote http://guides.dataverse.org/en/4.8.5/installation/config.html#network-ports and I think I argue fairly strongly that putting Apache in front of Glassfish is a good idea. What I wrote there needs to be cleaned up, though. Right now I link to the Shibboleth page.

I think you can get the password resets working with a port in the URL if you include that port in your dataverse.siteUrl JVM option: http://guides.dataverse.org/en/4.8.5/installation/config.html#dataverse-siteurl

If that doesn't help, please let me know.

@ajs6f
Contributor Author

ajs6f commented Mar 16, 2018

You do give that advice. You do not (to my reading) explain that the software is not functional otherwise.

At this point, users who have corrected the port number still cannot use the links because they time out after a minute or so. I can't find anything that seems relevant in the logs beyond:

[2018-03-15T20:50:00.591-0400] [glassfish 4.1] [INFO] [] [edu.harvard.iq.dataverse.timer.DataverseTimerServiceBean] [tid: _ThreadID=173 _ThreadName=__ejb-thread-pool14] [timeMillis: 1521161400591] [levelValue: 800] [[
  Behold! I am the Master Timer, king of all timers! I'm here to create all the lesser timers!]]

[2018-03-15T20:50:00.591-0400] [glassfish 4.1] [INFO] [] [edu.harvard.iq.dataverse.timer.DataverseTimerServiceBean] [tid: _ThreadID=173 _ThreadName=__ejb-thread-pool14] [timeMillis: 1521161400591] [levelValue: 800] [[
  Removing existing harvest timers..]]

[2018-03-15T20:50:00.593-0400] [glassfish 4.1] [INFO] [] [edu.harvard.iq.dataverse.timer.DataverseTimerServiceBean] [tid: _ThreadID=173 _ThreadName=__ejb-thread-pool14] [timeMillis: 1521161400593] [levelValue: 800] [[
  HarvesterService: checking timer 1]]

[2018-03-15T20:50:00.594-0400] [glassfish 4.1] [INFO] [] [edu.harvard.iq.dataverse.timer.DataverseTimerServiceBean] [tid: _ThreadID=173 _ThreadName=__ejb-thread-pool14] [timeMillis: 1521161400594] [levelValue: 800] [[
  HarvesterService: checking timer 2]]

[2018-03-15T21:50:00.577-0400] [glassfish 4.1] [INFO] [] [edu.harvard.iq.dataverse.timer.DataverseTimerServiceBean] [tid: _ThreadID=172 _ThreadName=__ejb-thread-pool13] [timeMillis: 1521165000577] [levelValue: 800] [[
  Handling timeout on si-dataverse.si.edu]]

so I'm not sure how much longer our evaluation is going to last. In any event, thank you for the pointer to dataverse.siteUrl.

@pdurbin
Member

pdurbin commented Mar 16, 2018

Huh. Please feel free to email your server.log to [email protected] and we can take a look. If there's nothing in the log we may need to increase the logging levels as mentioned at http://guides.dataverse.org/en/4.8.5/admin/troubleshooting.html#glassfish but I don't off the top of my head know the exact classes or packages to increase the levels for.

@ajs6f
Contributor Author

ajs6f commented Mar 16, 2018

Okay, thanks. I will set up a controlled sequence that I can mark in the log transcript to show the timing of the event, but that will probably not happen before Monday, because (just to make this more pleasant) this timeout phenomenon only happens for some users.

@pdurbin
Member

pdurbin commented Mar 16, 2018

@ajs6f ok. Thanks. Please either post the transcript here or email it, like I mentioned. I'm not sure if this will help or be more confusing, but you can see an example of setting the site URL at https://github.com/IQSS/dataverse/blob/v4.8.5/conf/docker-aio/readme.txt#L17

@ajs6f
Contributor Author

ajs6f commented Mar 16, 2018

Thanks, @pdurbin. My concern is twofold: One, if it is the case that either that property must be set or the app must be proxied, I suggest that that should be made more clear than it is. Two, JEE gives good tools for constructing URLs, so in the absence of that property, they should be used. It seems that they are not...

@pdurbin
Member

pdurbin commented Mar 16, 2018

I feel like @kcondon especially has reported many times that not having dataverse.siteUrl configured properly is a gotcha and I agree that this should be documented better in the guides. Maybe this issue could be pulled into a sprint to do that. I'm eager to hear other feedback on the Installation Guide as well, since you just went through it. Please feel free to leave a brain dump in this issue while it's fresh in your mind. I'd like to make it easier to install Dataverse.

I'm not sure I'm following your point about Java EE URLs. It sounds like you have some experience with Java EE? Most people who install Dataverse don't seem to have much direct experience with it.

@ajs6f
Contributor Author

ajs6f commented Mar 16, 2018

I don't know how you are constructing your email messages, but since they are being emitted in response to an HTTP request, you should be passing through some controller technology (e.g. JAX-RS). At that stage you should be able to use JEE facilities for building URLs, which will automatically account for port numbers, context names, etc. You could use that to create URLs and send them into the mail messages instead of relying on application-specific configuration.
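A minimal sketch of the idea in the comment above, assuming the email is built while an HTTP request is in scope. In a servlet container the four inputs would typically come from `HttpServletRequest` (`getScheme()`, `getServerName()`, `getServerPort()`, `getContextPath()`), and in JAX-RS from `UriInfo.getBaseUri()`. The class and method names here are hypothetical and are not part of the Dataverse code base; plain parameters are used so the example runs without a container.

```java
import java.net.URI;
import java.net.URISyntaxException;

/** Hypothetical helper for illustration only; not Dataverse code. */
final class RequestBaseUrl {

    /**
     * Builds "scheme://host[:port][/context]" from values that a servlet
     * request exposes. Default ports (80 for http, 443 for https) are omitted.
     */
    static String baseUrl(String scheme, String host, int port, String contextPath) {
        boolean defaultPort = ("http".equals(scheme) && port == 80)
                || ("https".equals(scheme) && port == 443);
        try {
            // A port of -1 tells java.net.URI to leave the port out.
            URI uri = new URI(scheme, null, host, defaultPort ? -1 : port,
                    contextPath, null, null);
            return uri.toString();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException("Cannot build base URL", e);
        }
    }

    /** Appends the password reset path from this issue to a base URL. */
    static String passwordResetUrl(String baseUrl, String token) {
        return baseUrl + "/passwordreset.xhtml?token=" + token;
    }
}
```

With the values from the original report (`https`, `my.server`, `8080`, root context), `baseUrl` keeps the port, so the resulting link starts with `https://my.server:8080/`. One caveat: these values describe the request as the container sees it, which can differ from the public address when a proxy sits in front.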

@ajs6f
Contributor Author

ajs6f commented Mar 16, 2018

Thanks for the invite to braindump, but unfortunately, I actually did the install a few months ago and don't recall that much. From what I do recall, the install was not especially problematic, although I do recall wondering why a full JEE container is required but normal JEE artifacts don't seem to be provided. I would much rather have just installed my container of choice and a normal JEE deployment. Using the JEE framework but requiring a specific container (Glassfish) and offering a custom installer but no normal JEE artifacts... it's all very counterintuitive.

More pointedly, my whole purpose in this exercise is to give my institution a chance to evaluate Dataverse. That end could have been accomplished by running a VM image or using a docker-compose ensemble or by other means. I heartily commend to you the possibility of releasing such an artifact so that evaluators just don't need to do an install.

@pdurbin
Member

pdurbin commented Mar 18, 2018

@ajs6f thanks, you've given me a lot of food for thought. I have many more questions and things to say about...

  • normal Java EE artifacts and deployment
  • using a Java EE container of your choice
  • options for evaluating Dataverse (NDS Labs Workbench, DANS's docker-compose effort, "all in one" Docker image, Dataverse on OpenShift)

... but for the matter at hand of password reset not working when you use a port, you're completely right. It's a bug. Here's the line where the URL is constructed:

this.resetUrl = "https://" + finalHostname + "/passwordreset.xhtml?token=" + passwordResetData.getToken();

finalHostname operates on the older dataverse.fqdn JVM option and we should probably seek to eliminate that JVM option in favor of the newer dataverse.siteUrl JVM option. The former can be derived from the latter anyway. All this is just an artifact of the order in which the code was written and the fact that two different developers were working on different parts of the code at the same time in the early days of Dataverse 4 development. It's technical debt that should be paid down. I'd be happy to mentor anyone who's interested in fixing up this part of the code. Here's a link to the code above I'm talking about:

https://github.com/IQSS/dataverse/blob/v4.8.5/src/main/java/edu/harvard/iq/dataverse/passwordreset/PasswordResetInitResponse.java#L21-L42

If we do eliminate dataverse.fqdn, we need to have the installer prompt for dataverse.siteUrl instead.
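A rough sketch of the refactoring described above, under the assumption that the value of `dataverse.siteUrl` is available as a string (for example via `System.getProperty("dataverse.siteUrl")`). It shows the reset link built from the full site URL so that a port survives, and the host name being derived from the site URL rather than stored separately. The class and method names are hypothetical and do not come from the Dataverse code base.

```java
import java.net.URI;

/** Hypothetical sketch for illustration only; not Dataverse code. */
final class SiteUrlSketch {

    /** Trims whitespace and trailing slashes so concatenation never yields "//". */
    static String normalize(String siteUrl) {
        String s = siteUrl.trim();
        while (s.endsWith("/")) {
            s = s.substring(0, s.length() - 1);
        }
        return s;
    }

    /** Builds the password reset link from the full site URL, keeping any port. */
    static String resetUrl(String siteUrl, String token) {
        return normalize(siteUrl) + "/passwordreset.xhtml?token=" + token;
    }

    /** Derives the host name (what dataverse.fqdn holds) from the site URL. */
    static String fqdnFrom(String siteUrl) {
        return URI.create(normalize(siteUrl)).getHost();
    }
}
```

For a site URL of `https://my.server:8080`, `resetUrl` keeps the `:8080` that the current string concatenation drops, and `fqdnFrom` returns `my.server`.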

@pdurbin
Member

pdurbin commented Mar 18, 2018

options for evaluating Dataverse

@ajs6f The bottom line is that as of this writing your best bet for evaluating dataverse is to do a "pilot installation" as described at http://guides.dataverse.org/en/4.8.5/installation/prep.html#choose-your-own-installation-adventure . This is what you're doing already and I'm sorry to hear that it isn't going especially well for you. Please keep the feedback coming so we make the user experience better in the future. Thanks for opening #4515 for example.

That page also mentions "NDS Labs Workbench (for Testing Only)" but it has a slightly dated version of Dataverse. Kubernetes is used under the covers (and the code is available at https://github.com/nds-org/ndslabs-dataverse ) but I'm not sure how long the installation sticks around after you spin it up. #4152 is about documenting it better. @craig-willis and I have been talking about how if IQSS starts pushing production-ready images to DockerHub, he'll switch to them. The ones I pushed to DockerHub are highly experimental, as I mentioned at http://guides.dataverse.org/en/4.8.5/developers/dev-environment.html#future-production-use-on-minishift-openshift-kubernetes

Over at #4040 I made an attempt at getting Dataverse to run on the free tier of OpenShift (among other things) but Dataverse is too big (over 1 GB) to run on the free tier. This would have been a nice option for evaluating Dataverse, I believe. I'm not sure if it's possible to slim Dataverse down enough to run on the free tier.

Recently @4tikhonov from DANS mentioned that they have a docker-compose file for Dataverse at https://github.com/Dans-labs/dataverse-docker but no one at IQSS has tried it yet.

Some of these community efforts are mentioned in the Dev Efforts by the Dataverse Community spreadsheet. For more context on the spreadsheet, please see this post on the dataverse-community list.

If you're into Docker, there's a new "all in one" image that's used mostly for integration tests at conf/docker-aio but I've never heard of it being used for evaluation purposes.

Back in the day, people asked if we could provide VMWare images (#2280) but it sounds like a docker-compose file is what you'd find most useful these days? It sounds like you're looking for something easier than following the Installation Guide. There's also a community effort to install Dataverse with Ansible over at https://github.com/IQSS/dataverse-ansible if that's of interest. I hope this brain dump helps. I do want to make it easier for people to evaluate Dataverse. Ideas are very welcome!

@ajs6f
Contributor Author

ajs6f commented Mar 19, 2018

@pdurbin This is all useful info, and I'm glad to hear that you are actively working on this question of "easy-bake" installs (for evaluation or otherwise). For my purposes, either a VMWare image or a docker-compose ensemble would have been superb. An AWS image would also have been a wonderful tool. You are right that I was looking for something that would require much less time than a proper install. I may look into some of these community-supported efforts if it appears that debugging my current install is going to demand too much time.

As for Java EE artifacts, I would have been very happy to just download an EAR and deploy it to my container of choice (probably Wildfly), setting up a database connection in the usual way for my container. I realize that the effort involved in creating a reliable build process for an EAR is non-trivial, but having done both, I doubt that it is much greater than the effort to maintain a custom installer. Unfortunately, the choice of Solr seems to prevent anyone from deploying it into the same container as the Dataverse application, since Solr has chosen to require its own server (ElasticSearch doesn't have the same problem), so there would still have been an extra step to set up a Solr instance, but that would still have been (to me) an improved experience.

As for this specific ticket, I can see that using dataverse.siteUrl is better than what is happening now, but please don't neglect the possibility of leaving dataverse.siteUrl unset. If an installer does that, the graceful alternative for Dataverse would be to rely on the JEE URL-building services to use the container-supplied URL base. If that were the case now, I would never have seen a problem.
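The fallback suggested above could look roughly like the following. This is only a sketch under stated assumptions: `configuredSiteUrl` stands for whatever `dataverse.siteUrl` is set to (possibly `null` or blank when unset), and `requestBaseUrl` stands for a base URL obtained from the container for the current request. The class and method names are hypothetical, not Dataverse code.

```java
import java.util.Optional;

/** Hypothetical sketch for illustration only; not Dataverse code. */
final class BaseUrlResolver {

    /**
     * Prefers the explicitly configured site URL; when it is unset or blank,
     * falls back to the base URL supplied by the container for this request.
     */
    static String resolve(String configuredSiteUrl, String requestBaseUrl) {
        return Optional.ofNullable(configuredSiteUrl)
                .map(String::trim)
                .filter(s -> !s.isEmpty())
                .orElse(requestBaseUrl);
    }
}
```

Keeping the configured value as the first choice preserves the behavior of installations that run behind a proxy and set the option deliberately, while an installation that never set it would still produce links matching the address users actually reached.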

@ajs6f
Contributor Author

ajs6f commented Mar 19, 2018

Rereading my comment, I think it's important for me to emphasize that while the actual installation process was more involved than it could have been, it really wasn't that burdensome. What has been troubling has been the number of what appear to be either problems with my installed instance or bugs, a set of worries that has been ameliorated to some extent by the expedition with which you've been responding to tickets.

@ajs6f
Contributor Author

ajs6f commented Mar 19, 2018

I've broken off the "what kind of JEE artifact?" issue to #4523, so that this ticket can stay with the question of URL-building.

@pdurbin
Member

pdurbin commented Mar 20, 2018

@ajs6f I'm glad the installation process wasn't that burdensome for you. You do seem to have hit a number of issues such as #4515 and this one about the port number (good catch, thanks again for the bug report). Thanks for opening #4523 as a separate issue. I just left you a reply about artifacts and Java EE stuff there.

@pdurbin
Member

pdurbin commented Jul 18, 2018

> That end could have been accomplished by running a VM image or using a docker-compose ensemble or by other means. I heartily commend to you the possibility of releasing such an artifact so that evaluators just don't need to do an install.

Just an update that #4665 is about using Docker and there's been a lot of chatter in there.

@ajs6f do you still think we should fix this port issue? Again, I think people mostly run Dataverse behind a proxy and configure dataverse.siteUrl to not have a port.

@ajs6f
Contributor Author

ajs6f commented Jul 20, 2018

The point I was making above is that I cannot see why Dataverse has a setting like dataverse.siteUrl at all. That's normally the province of the container, not contained applications. If Dataverse is going to continue to manage its own URL, this ticket could be closed as "won't-fix".

@pdurbin
Member

pdurbin commented Jul 20, 2018

@ajs6f thanks. I hear what you're saying about containers (but I don't know what the fix would be) but right now, a variety of Dataverse features absolutely depend on dataverse.siteUrl such as these from a quick swing through the code:

  • email confirmation (this issue)
  • password reset link (code should be refactored to use the setting)
  • generating a Private URL
  • exporting to Schema.org format (and showing JSON-LD in html meta tag)
  • exporting to DDI format
  • which Dataverse installation an "external tool" should return to
  • which Dataverse installation Geoconnect should return to.

I'm trying to decide what we should do with this issue. What if we change the title to something like "document the importance of configuring dataverse.siteUrl"? Documentation-only issues are easier to move through the sausage factory.

@ajs6f
Contributor Author

ajs6f commented Jul 20, 2018

The issues that you mention in that list require the application to have some URL in hand, but I'm not sure why any of them require that the application maintain that URL itself. In any event, I'm perfectly fine with however you'd like to handle this ticket. If I had seen some really blaring blatant warnings in the install pages (YOU MUST SET THIS CONFIG OR NOTHING ON THE FOLLOWING LIST OF FUNCTIONS WILL WORK) then I probably would have managed to do it. Probably.

pdurbin changed the title from "Port number dropped in email confirmation links" to "The importance of setting dataverse.siteUrl should be emphasized in the Installation Guide" on Jul 20, 2018
@pdurbin
Member

pdurbin commented Jul 20, 2018

@ajs6f cool. Makes sense. I just changed the title of this issue to "The importance of setting dataverse.siteUrl should be emphasized in the Installation Guide" and spoke with @kcondon about it who agrees with us about how important that setting is.

Do you feel like making a pull request? It would be a way for you to see our dev process. Right now this issue is in the "Inbox" at https://waffle.io/IQSS/dataverse

@ajs6f
Contributor Author

ajs6f commented Jul 24, 2018

Here? I can try quickly to insert a new paragraph, but that's about it. (All of my engagement with DV has been entirely speculative, and as such I appreciate your attention and time, but before I invest much more I need to hear from other SI staff that we have some real likelihood of using the product.) Is dataverse.siteUrl something that is configured by the installer script?

@pdurbin
Member

pdurbin commented Jul 24, 2018

Pull request #4887 looks good! Thanks! I moved it and this issue to QA at https://waffle.io/IQSS/dataverse

@pdurbin
Member

pdurbin commented Aug 14, 2018

Related: #4947
