PyDocuShare provides a Python API to access Collections, Documents, and their versions on a Xerox DocuShare site. You can use this API to automate tasks and workflows that require access to Xerox DocuShare. This document gives an overview of what the API can do.
In DocuShare, each document and object is identified by a handle such as Document-00000, Version-000000, or Collection-00000. These handles typically appear as part of the URL when you access your DocuShare site. For example, when you open a document on your DocuShare site, the URL in your Web browser should look like:
https://your.docushare.domain/docushare/dsweb/Get/Document-98765/xxxxx.pdf
"Document-98765" within this URL is what we call a handle. A handle is essentially the key, or identifier, used to access collections, documents, and versions.
You need to log in first to access your DocuShare site:
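If you often copy URLs from the browser, a small helper can pull the handle out for you. The sketch below is not part of PyDocuShare: the function name ``extract_handle`` is hypothetical, and the pattern only covers the three handle types mentioned in this document, so a real site may need a broader pattern.

```python
import re
from typing import Optional

# Only the handle types mentioned in this document; an assumption, since a
# real DocuShare site may expose other object types as well.
_HANDLE_PATTERN = re.compile(r'\b(?:Collection|Document|Version)-\d+\b')


def extract_handle(url: str) -> Optional[str]:
    """Return the first DocuShare-style handle found in url, or None."""
    match = _HANDLE_PATTERN.search(url)
    return match.group(0) if match else None


url = 'https://your.docushare.domain/docushare/dsweb/Get/Document-98765/xxxxx.pdf'
print(extract_handle(url))  # prints: Document-98765
```

The returned string can then be used wherever this document uses a handle.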
>>> from docushare import *
>>> ds = DocuShare(base_url='https://your.docushare.domain/docushare/')
>>> ds.login()
_
Enter your username for https://your.docushare.domain/docushare/
Username: your_user_name
_
Enter password of "your_user_name" for https://your.docushare.domain/docushare/
Password:
_
After a successful login, you can access your DocuShare resources through the :py:class:`docushare.DocuShare` instance in the ``ds`` variable. The example below downloads Document-98765:
>>> doc = ds.object('Document-98765')
>>> print(f'Download "{doc.title}" as "{doc.filename}".')
>>> doc.download()
PosixPath('/path/to/your/current/directory/{doc.filename}')
Document-98765 should now have been downloaded to your local storage at the path shown.
``ds.object(handle)`` may be replaced by ``ds[handle]``, as shown below:
>>> doc = ds['Document-98765']
To download a specific version, you can specify a Version handle instead:
>>> ver = ds['Version-111111']
>>> print(f'Download "{ver.title}" as "{ver.filename}".')
>>> ver.download()
PosixPath('/path/to/your/current/directory/{ver.filename}')
You can get the version information as shown below:
>>> doc = ds['Document-98765']
>>> for ver_hdl in doc.version_handles:
...     ver = ds[ver_hdl]
...     print(f'{ver_hdl} is version #{ver.version_number} for {doc.handle}.')
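Building on the loop above, the following sketch picks the handle of the newest version of a document. The helper name ``latest_version_handle`` is hypothetical. It relies only on the ``version_handles`` and ``version_number`` attributes shown above, and it assumes that a larger ``version_number`` means a newer version.

```python
def latest_version_handle(ds, document_handle):
    """Return the version handle with the highest version_number, or None.

    ds is expected to behave like the logged-in DocuShare instance used in
    this document: ds[handle] returns the object for that handle.
    """
    doc = ds[document_handle]
    handles = list(doc.version_handles)
    if not handles:
        return None
    # Assumption: version_number values are comparable (e.g. integers) and
    # a larger number means a newer version.
    return max(handles, key=lambda hdl: ds[hdl].version_number)
```

For example, ``ds[latest_version_handle(ds, 'Document-98765')].download()`` would download that version, provided the document has at least one version. Each ``ds[hdl]`` lookup may trigger requests to your DocuShare site, so this can be slow for documents with many versions.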
The example below shows how you can download all documents in a Collection:
>>> col = ds['Collection-55555']
>>> col.download(destination_path = 'output_dir', option = CollectionDownloadOption.ALL)
[PosixPath('output_dir/dir1/document11.pdf'), PosixPath('output_dir/dir1/document12.pdf'), PosixPath('output_dir/dir2/document21.pdf'), PosixPath('output_dir/dir2/document22.pdf'), PosixPath('output_dir/document01.pdf')]
The :py:meth:`docushare.CollectionObject.download()` method returns the list of successfully downloaded files. If the Collection contains many documents, it may take some time before the method actually starts downloading, most likely because it takes time to get the properties of each document from your DocuShare site. You may change the log level of the PyDocuShare API to ``INFO`` so that you can see what is going on behind the scenes:
>>> import logging
>>> ds.logger.setLevel(logging.INFO)
>>> col = ds['Collection-66666']
2022-07-02 14:05:30,998: INFO - HTTP GET https://your.docushare.domain/docushare/dsweb/Services/Collection-66666
2022-07-02 14:05:30,299: INFO - HTTP GET https://your.docushare.domain/docushare/dsweb/View/Collection-66666
>>> downloaded_paths = col.download(destination_path = 'output_dir', option = CollectionDownloadOption.ALL, progress_report = False)
2022-07-02 14:05:33,327: INFO - HTTP GET https://your.docushare.domain/docushare/dsweb/Services/Collection-77777
2022-07-02 14:05:33,650: INFO - HTTP GET https://your.docushare.domain/docushare/dsweb/View/Collection-77777
2022-07-02 14:05:33,654: INFO - HTTP GET https://your.docushare.domain/docushare/dsweb/Services/Document-10001
2022-07-02 14:05:33,886: INFO - HTTP GET https://your.docushare.domain/docushare/dsweb/ServicesLib/Document-10001/History
2022-07-02 14:05:34,133: INFO - HTTP GET https://your.docushare.domain/docushare/dsweb/Services/Document-10002
2022-07-02 14:05:34,313: INFO - HTTP GET https://your.docushare.domain/docushare/dsweb/ServicesLib/Document-10002/History
2022-07-02 14:05:34,317: INFO - HTTP GET https://your.docushare.domain/docushare/dsweb/Get/Document-10001
2022-07-02 14:05:34,372: INFO - Started downloading: https://your.docushare.domain/docushare/dsweb/Get/Document-10001 => output_dir/dir1/document1.pdf
2022-07-02 14:05:34,511: INFO - Completed downloading: https://your.docushare.domain/docushare/dsweb/Get/Document-10001 => output_dir/dir1/document1.pdf
2022-07-02 14:05:34,511: INFO - HTTP GET https://your.docushare.domain/docushare/dsweb/Get/Document-10002
2022-07-02 14:05:34,543: INFO - Started downloading: https://your.docushare.domain/docushare/dsweb/Get/Document-10002 => output_dir/document2.pdf
2022-07-02 14:05:34,892: INFO - Completed downloading: https://your.docushare.domain/docushare/dsweb/Get/Document-10002 => output_dir/document2.pdf
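For long-running downloads, it can be handy to keep these messages in a file as well. The sketch below uses only the standard ``logging`` module. The helper name ``add_file_log`` is hypothetical, and it assumes that ``ds.logger`` is a regular ``logging.Logger``, as the ``setLevel()`` call above suggests.

```python
import logging


def add_file_log(logger, path, level=logging.INFO):
    """Attach a file handler so that log records are also written to path."""
    handler = logging.FileHandler(path, encoding='utf-8')
    handler.setLevel(level)
    # Same layout as the log lines shown above: "timestamp: LEVEL - message".
    handler.setFormatter(
        logging.Formatter('%(asctime)s: %(levelname)s - %(message)s'))
    logger.addHandler(handler)
    return handler


# Hypothetical usage with the DocuShare instance from this document:
#   ds.logger.setLevel(logging.INFO)
#   add_file_log(ds.logger, 'docushare.log')
```

The handler is returned so that you can detach it later with ``logger.removeHandler(handler)`` and close it with ``handler.close()``.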
You may access more information about the Collection through the instance of :py:class:`docushare.CollectionObject` returned by ``ds['Collection-xxxxx']``. In particular, the :py:attr:`docushare.CollectionObject.object_handle_tree` attribute lets you traverse all collections and documents under that Collection. The example code block below shows how to display the tree structure under Collection-70000:
>>> from anytree import RenderTree
>>> col = ds['Collection-70000']
>>> for pre, fill, handle in RenderTree(col.object_handle_tree):
...     node_str = f'{pre}{handle}'
...     hdl_obj = ds[handle]
...     print(node_str.ljust(25), hdl_obj.title)
Collection-70000          (Title of Collection-70000)
├── Collection-71000      (Title of Collection-71000)
│   ├── Document-71001    (Title of Document-71001)
│   └── Document-71002    (Title of Document-71002)
├── Collection-72000      (Title of Collection-72000)
│   ├── Document-72001    (Title of Document-72001)
│   └── Document-72002    (Title of Document-72002)
├── Document-70001        (Title of Document-70001)
├── Document-70002        (Title of Document-70002)
└── Document-70003        (Title of Document-70003)
See the API reference of :py:class:`docushare.CollectionObject` for more details.
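The same tree can also be walked without rendering it, for example to list every Document handle under a Collection. The sketch below is an illustration rather than part of the PyDocuShare API. The helper names are hypothetical, and it assumes that each node in ``object_handle_tree`` exposes a ``children`` attribute (as anytree nodes do) and that ``str(node)`` yields the handle text, as the rendered output above suggests.

```python
def iter_handles(node):
    """Yield node and all of its descendants in depth-first (pre-order) order."""
    yield node
    # Assumption: nodes expose their child nodes through a children attribute.
    for child in node.children:
        yield from iter_handles(child)


def document_handles(root):
    """Return the string form of every Document handle under root."""
    # Assumption: str(node) gives the handle text, e.g. 'Document-70001'.
    return [str(h) for h in iter_handles(root) if str(h).startswith('Document-')]


# Hypothetical usage with the Collection from the example above:
#   for hdl in document_handles(ds['Collection-70000'].object_handle_tree):
#       print(hdl)
```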
User authentication is one of the key things that PyDocuShare does for you to automate your task. By default, :py:meth:`docushare.DocuShare.login()` asks the user to enter the username and password. If you want to fully automate your workflow without any user interaction, you may pass the username and password as the arguments:
>>> ds.login(username = 'your_user_name', password = 'your_password')
However, it is not desirable to hard-code your password in a Python script. If you commit that script to Git by mistake, it can turn into a SERIOUS SECURITY INCIDENT! It is highly recommended to store the password somewhere else in a secure way and reuse it in successive logins for task automation. PyDocuShare provides a convenient option that automatically saves the password in a user directory in a secure way and reuses it next time. Call the :py:meth:`docushare.DocuShare.login()` method with the ``password = PasswordOption.USE_STORED`` argument to enable that feature:
>>> ds.login(username = 'your_user_name', password = PasswordOption.USE_STORED)
The very first time, a dialog may pop up and ask you to set the master password of your keyring.
The master password of your keyring is DIFFERENT from your DocuShare password. It is like the master password of a password manager. You should set a very strong password, but you also need to remember it. It is recommended to set the same password as your DocuShare password so that you are less likely to forget it. You will be asked to enter the master password when you call the :py:meth:`docushare.DocuShare.login()` method with the ``password = PasswordOption.USE_STORED`` argument for the first time after a system reboot. Once you have entered the master password, you will not be asked for it again until the system shuts down.
After that, if this is your first login, you may be asked in the console to enter your password for the DocuShare site. If so, just enter your DocuShare password. If the user authentication succeeds, the password is stored in your user directory. This storage is persistent, so you do not have to enter the DocuShare password again until you change it.
Let's confirm that the password was stored correctly. Run the same method again:
>>> ds.login(username = 'your_user_name', password = PasswordOption.USE_STORED)
This time the method should not ask you to enter the password, which means that PyDocuShare uses the stored DocuShare password for user authentication. As long as you call the :py:meth:`docushare.DocuShare.login()` method with the same arguments, all you need to enter is the master password (not the DocuShare password), and only once after every system boot.
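Where no keyring is available, for instance in a scheduled job on a headless server, one common alternative to hard-coding is to read the credentials from environment variables. The sketch below is not a PyDocuShare feature: the function name and the variable names ``DOCUSHARE_USERNAME`` and ``DOCUSHARE_PASSWORD`` are hypothetical, and whether environment variables are secure enough depends on your environment.

```python
import os


def credentials_from_env(environ=os.environ):
    """Return (username, password) read from environment variables.

    Raises KeyError naming the missing variable if either one is unset.
    The variable names are illustrative, not defined by PyDocuShare.
    """
    return environ['DOCUSHARE_USERNAME'], environ['DOCUSHARE_PASSWORD']


# Hypothetical usage with the login() call shown earlier:
#   username, password = credentials_from_env()
#   ds.login(username = username, password = password)
```

Failing with ``KeyError`` when a variable is missing keeps the script from falling back to an interactive prompt in an unattended run.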