Search in shared files using a single index #10
Comments
maybe @craigpg has some input on how to approach searching inside shared files |
This is exactly what we're looking for. Is this planned? |
would be nice to combine with #40 |
Any update here, please? |
I will have a look for OC9. |
I have a client who is interested in this work and we may be able to provide some programming time or other resources. We'd prefer to collaborate on work in progress and not duplicate work. |
@NacreData I suspect there is no duplicate work in place right now. Would you be able to provide some time for this? |
Yes, I have some time allocated for it over the next month or so; I will probably be starting next week. Do you have any ideas, direction, or anything else to tell me that would make my efforts more successful?
devin, contact info: http://nacredata.com/devin |
Looking today at how much more (compared to my hack described above) would be involved in doing it the "right" way described in "planned approach" at the top. It would help greatly to have the code used for "Initial testing indicates that query hits can be used to obtain the original document, update it with the updated list of users / groups who can access the document and then delete & reinsert the document into the index." Is that possible? |
@butonic could you provide @NacreData that code? |
@NacreData Don't assume that the user's home directory is named after their username. That's the default but is not guaranteed. In an AD-backed system the default internal username will be the value of the objectGUID attribute, a long string of letters and numbers. The user_ldap app allows the home directory to be named after a different attribute, as this is often much more convenient and certainly easier to type. So the only way to find out the path of the home directory for a given user object is to ask it. \OC::$server->getUserManager()->get($uid)->getHome() is what you're looking for. If the backend can provide the home path like user_ldap does, you get that returned. Otherwise you get a constructed path of datadirectory.'/'.$uid. There are lots of bugs in lots of apps because of this incorrect file system assumption. |
The code I have at https://github.com/NacreData/search_lucene/tree/shared-files is now working for me to search across shared files. This does the work to move the index to a combined/site-wide index and then use user-specific file attributes to filter the results. I have submitted a pull request #120. Much of what I've done could probably be done in a better/more maintainable way by folks with more OC experience - I hope some folks will help improve this so it can be committed and maintained. Thanks. |
@georgehrke mind having a look at the above comments? |
I'll take a look Monday when I'm back from vacation. Please excuse my brevity and typos. |
Any help needed on this issue? Can you say anything about the current status? |
@feuse8 help would be awesome! As far as I can tell, the best place to look for the current status is the last two comments in the pull-request thread here: #121. I am happy to explain what I've done and what I'm thinking to help move this forward, and I'll be writing more code if/when possible over the next month or two. |
any update on this? |
The non-profit funding my work on this is waiting for new funding, so
I've not been working on it recently.
-- devin
On 11 May 2017, at 11:40, AsimPervez93 wrote:
any update on this?
|
Originally opened as owncloud-archive/apps#1464
Steps to reproduce
Expected behaviour
Users should be able to find files that have been shared with them by searching in the content.
Actual behaviour
Currently, only the user's own files are indexed.
Technical background
The Lucene index is stored on a per-user basis and resides in /<userhome>/lucene_index. For performance reasons it is currently not encrypted. Encrypting it is possible, but would prevent using another user's index for the full-text search (because we cannot access their encrypted index without their secret key).
Planned Approach
The current plan is to make the documents in the Lucene index contain the names of the users and groups allowed to access the file. Whenever a file is shared / unshared we need to update the document in the Lucene index. Unfortunately, Lucene, by design, only allows adding or deleting documents in the index; it cannot update one in place. Initial testing indicates that query hits can be used to obtain the original document, update it with the updated list of users / groups who can access the document and then delete & reinsert the document into the index, all without having to reindex the original file, which would take far too long.
Maintaining the permissions like this is described in http://www.lucenetutorial.com/techniques/permission-filtering.html and we can add the user that is querying to the query as a subquery, as shown in http://framework.zend.com/manual/1.12/en/zend.search.lucene.searching.html
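To make the idea concrete, here is a minimal sketch in Python. It is not the search_lucene code and does not use the Zend_Search_Lucene API: `GlobalIndex`, `Doc` and `update_access` are hypothetical names, and the "index" is just an in-memory dictionary that, like Lucene, only supports adding and deleting documents. It shows the two pieces described above: changing the access list by deleting and reinserting the stored document (without re-extracting the file content), and restricting every search with a mandatory user / group condition.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Doc:
    """One indexed file: extracted text plus who may read it."""
    path: str
    content: str
    users: frozenset = frozenset()
    groups: frozenset = frozenset()


class GlobalIndex:
    """Toy stand-in for a single site-wide index (add and delete only)."""

    def __init__(self):
        self._docs = {}  # document id -> Doc
        self._next_id = 0

    def add(self, doc):
        doc_id = self._next_id
        self._next_id += 1
        self._docs[doc_id] = doc
        return doc_id

    def delete(self, doc_id):
        del self._docs[doc_id]

    def find_by_path(self, path):
        # Returns a list, so callers may delete entries while looping over it.
        return [(i, d) for i, d in self._docs.items() if d.path == path]

    def search(self, term, user, user_groups=()):
        """Content match AND (user is allowed OR one of the user's groups is)."""
        groups = set(user_groups)
        return sorted(
            d.path for d in self._docs.values()
            if term.lower() in d.content.lower()
            and (user in d.users or groups & d.groups)
        )


def update_access(index, path, users, groups=()):
    """Replace a file's access list without re-extracting its text."""
    for doc_id, doc in index.find_by_path(path):
        updated = replace(doc, users=frozenset(users), groups=frozenset(groups))
        index.delete(doc_id)  # no in-place update is available ...
        index.add(updated)    # ... so delete and reinsert the stored document


index = GlobalIndex()
index.add(Doc("/alice/files/report.txt", "quarterly budget report",
              users=frozenset({"alice"})))
index.add(Doc("/bob/files/notes.txt", "budget meeting notes",
              users=frozenset({"bob"})))

print(index.search("budget", "bob"))  # only bob's own file
update_access(index, "/alice/files/report.txt", users={"alice", "bob"})
print(index.search("budget", "bob"))  # now also the file shared by alice
```

In a real index the content would have to be retrievable from the hit (for example as a stored field) for the reinsert step to avoid reindexing; whether that holds for every field search_lucene writes is an assumption this sketch does not check.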
Further thoughts
When we add user / group permissions we could create a single global index and use that instead of querying each individual user index. Whether that will improve performance (because we only need to access one index) or degrade it (because the index might grow very large) remains to be tested.
Using a single index simplifies the whole architecture and is the way to go.