Replace pickle with shelve #4

Merged
iabraham merged 18 commits into master from pickle-replace
Apr 2, 2019

iabraham
Owner

Reviewer(s), please try to break this branch before approving the merge request. See issue for details. In short, legacy code was updated and is now used to compress shelve files.

Suggested actions:

  • Run automate.py in a fresh Anaconda environment to generate the shelve files in the HCP_1200 folder.
  • Run the suggested code in HCP_1200/README.md to ensure the generated binaries are accessible.
  • Copy the generated files to a new folder outside the repo, along with zipshelve.py from the repo.
  • Create a test.py file in the new folder. In that file, import zipshelve and run all sorts of crazy tests on the binaries to see what can be broken.
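
As a starting point for such a test.py, here is a minimal sketch of a round-trip check. It uses the standard-library shelve.open as a stand-in; the helper names roundtrip and missing_key_raises are hypothetical, and passing zipshelve.open as the opener assumes it accepts (filename, flag) the way shelve.open does, which should be checked against zipshelve.py.

```python
import os
import shelve
import tempfile


def roundtrip(opener, path, payload):
    """Write payload with opener, reopen read-only, return what comes back."""
    with opener(path, "c") as db:
        for key, value in payload.items():
            db[key] = value
    with opener(path, "r") as db:
        return {key: db[key] for key in db}


def missing_key_raises(opener, path):
    """Return True if reading an absent key raises KeyError."""
    with opener(path, "r") as db:
        try:
            db["no-such-key"]
        except KeyError:
            return True
    return False


if __name__ == "__main__":
    # Made-up sample values; swap in real keys from the generated files.
    sample = {"subject_a": [0.1, 0.2, 0.3], "empty": [], "nested": {"a": (1, 2)}}
    with tempfile.TemporaryDirectory() as folder:
        path = os.path.join(folder, "scratch")
        # Assumption: replace shelve.open with zipshelve.open to test the branch.
        print("round trip ok:", roundtrip(shelve.open, path, sample) == sample)
        print("missing key ok:", missing_key_raises(shelve.open, path))
```

Passing the opener as an argument keeps the same checks usable for both the plain and the compressed shelf.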

Collaborator

@neurobahar neurobahar left a comment

Please fix this error:

Traceback (most recent call last):
  File "", line 1, in
  File "/Users/bahar/Documents/1-UIUC/Analytics/pyhcp/zipshelve.py", line 359, in open
    return ZipShelf(filename, mode, protocol, compress_level, writeback, silent)
  File "/Users/bahar/Documents/1-UIUC/Analytics/pyhcp/zipshelve.py", line 107, in __init__
    shelve.Shelf.__init__(self, dbm.open(self.__filename, mode), protocol, writeback)
  File "/anaconda3/envs/botohcp/lib/python3.7/dbm/__init__.py", line 88, in open
    raise error[0]("db type could not be determined")
dbm.error: db type could not be determined

Owner Author

iabraham commented Apr 2, 2019

This error is caused by a file-naming issue. Dropping the extension from the name hcp_data.* should fix it. On macOS machines, however, that leads to a different error:

SystemError: Negative size passed to PyBytes_FromStringAndSize

See here for more details.
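
To see why a given file name trips the first error, it can help to ask dbm how it classifies the path before opening it. The sketch below uses the standard-library dbm.whichdb and dbm.open; the helper names diagnose and try_open are hypothetical and not part of zipshelve.

```python
import dbm


def diagnose(path):
    """Describe how dbm would classify path, without modifying anything."""
    kind = dbm.whichdb(path)
    if kind is None:
        return "missing"        # nothing readable at that name
    if kind == "":
        return "unrecognised"   # a file exists but matches no known dbm format
    return kind                 # for example "dbm.gnu", "dbm.ndbm" or "dbm.dumb"


def try_open(path):
    """Open path read-only; return None on success, else the error message."""
    try:
        db = dbm.open(path, "r")
    except dbm.error as exc:    # dbm.error is a tuple of exception classes
        return str(exc)
    db.close()
    return None
```

Calling diagnose on both hcp_data and the name with its extension shows which of the two spellings the installed backends recognise.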

@iabraham iabraham requested a review from neurobahar April 2, 2019 22:13
Owner Author

iabraham commented Apr 2, 2019

The failure when reading the database is a macOS issue caused by dbm.ndbm. On Campus Cluster (which is Linux) we should be using the GNU backend instead (gdbm, exposed in Python 3 as dbm.gnu). Getting this to work on macOS with the Anaconda environment botohcp requires some finagling.
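
One way to avoid depending on whichever backend dbm.open picks is to name the backend explicitly. The following is a minimal sketch of that idea, not what zipshelve.py does: open_shelf is a hypothetical helper that prefers dbm.gnu and falls back to the pure-Python dbm.dumb when the _gdbm extension is missing.

```python
import os
import shelve
import tempfile

import dbm.dumb

try:
    import dbm.gnu as backend   # requires the _gdbm C extension
    BACKEND_NAME = "dbm.gnu"
except ImportError:
    backend = dbm.dumb          # always available, but slower
    BACKEND_NAME = "dbm.dumb"


def open_shelf(path, flag="c"):
    """Open a shelf on an explicitly chosen dbm backend."""
    return shelve.Shelf(backend.open(path, flag))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as folder:
        path = os.path.join(folder, "scratch")
        with open_shelf(path) as db:
            db["answer"] = [1, 2, 3]
        with open_shelf(path, "r") as db:
            print(BACKEND_NAME, db["answer"])
```

Files written this way are only readable where the same backend is installed, which is the situation STEP 1 and STEP 2 set up on macOS.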

STEP 1 - Getting brew Python

Run which brew to confirm you have Homebrew installed. Then run

brew install python3

which may tell you that Python is already installed as some version of Python 2.7.X. In that case, follow the instruction it prints and run

brew upgrade python

which should install Python 3.7.X correctly. The brew recipe seems to include the gdbm library by default, but if it is not picked up, the package may need to be installed and linked explicitly:

brew install gdbm
brew unlink gdbm && brew link gdbm

Hopefully those commands won't be required. You may also have to fix a permission denied error along the way.

STEP 2 - Linking gdbm to Anaconda

Next activate the botohcp environment and run

python -c 'import sys; [print(x) for x in sys.path if "lib-dynload" in x]'

on the terminal. The output is your destination folder. Then deactivate the environment and run:

python3 -c 'import sys; [print(x) for x in sys.path if "lib-dynload" in x]'

That is our source folder. Sometimes you may have to use python3.7 instead of python3. cd into that folder, list the files, and look for one named like _gdbm*. Copy that file to your destination folder. The Anaconda environment botohcp now has the gdbm library and should be able to open GNU DBs.
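
The lookup in this step can also be sketched in Python. The helper names below are hypothetical: run lib_dynload_dirs and find_gdbm_extensions under the brew Python to locate the source file, and gdbm_importable inside botohcp after copying to check that the extension is found.

```python
import glob
import importlib.util
import os
import sys


def lib_dynload_dirs(paths=None):
    """Return the lib-dynload entries of sys.path (or of the given list)."""
    paths = sys.path if paths is None else paths
    return [p for p in paths if "lib-dynload" in p]


def find_gdbm_extensions(folder):
    """List files in folder whose names start with _gdbm."""
    return sorted(glob.glob(os.path.join(folder, "_gdbm*")))


def gdbm_importable():
    """True if this interpreter can locate the _gdbm extension module."""
    return importlib.util.find_spec("_gdbm") is not None


if __name__ == "__main__":
    for folder in lib_dynload_dirs():
        print(folder, find_gdbm_extensions(folder))
    print("gdbm importable:", gdbm_importable())
```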


REMARK: Confirm that python on your terminal loads Python 2.7.X (the system Python) and python3 loads Python 3.7.X (the brew Python). Of course, after activating an Anaconda environment, python on the terminal loads the Python corresponding to that environment.

@iabraham iabraham merged commit 6f14f88 into master Apr 2, 2019
@iabraham iabraham deleted the pickle-replace branch April 2, 2019 22:22