Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[JOSS REVIEW] Install instructions #6

Open
dbuscombe-usgs opened this issue Sep 3, 2021 · 11 comments
Open

[JOSS REVIEW] Install instructions #6

dbuscombe-usgs opened this issue Sep 3, 2021 · 11 comments

Comments

@dbuscombe-usgs
Copy link

dbuscombe-usgs commented Sep 3, 2021

Installation instructions should be improved - as currently written, the instructions are not an optimal way to ensure success across different platforms and for relatively novice users who don't necessarily know the pitfalls with using pip and conda together in this way ...

and, I cannot get them to work, because the pip repository (why?) is packaged with dependencies that break the conda environment. In fact, the installation is a mess for a number of reasons:

  1. the instructions don't mention what version of python to use, which is crucial. You should recommend a version that you know to work across multiple platforms. I am initially testing on Windows, so my python version will naturally be ahead of my Linux computer

  2. installing some dependencies prior to installing the conda environment is a VERY bad idea because they'll install into the base environment, so won't be accessible to the sandpyper environment, and also would very likely break the base environment

  3. You should consider providing a conda yml file

  4. Further, you are being very restrictive with these versions (that are not compatible with my conda install and OS, for example)

conda install geopandas=0.8.2 matplotlib=3.3.4 numpy=1.20.1 pandas=1.2.2 tqdm=4.56.2 pysal=2.1 rasterio=1.2.0 richdem=0.3.4 scikit-image=0.18.1 scikit-learn=0.24.1 scipy=1.6.0 seaborn=0.11.1 tqdm=4.56.2 pooch=1.4.0 fuzzywuzzy

and that is bad way to 'future-proof' your installation (sorry, but you WILL be getting issues about this). Further, that's not a good way to use conda - conda is designed to tell YOU what versions YOU need, and the requirements.txt file would be the usual place to put really specific versions (for a pip install)

  1. There is no need to install richdem and visual studio tools (thatmight require admin privileges). Conda will install vs build tools for you that for you if you DONT provide the specific version numbers. And richdem is installed by conda, so that's a confusing instruction to install that separately. You can therefore remove that confusing step.

  2. the jupyter package should be installed with the rest, because of the very complex dependencies that might break conda with a posthoc installation. Also the package is called jupyter, not jupyter notebook (that is the command you run afterwards)

  3. the pip packages sandpyper ships with versions of numpy, pandas, rasterio and matplotlib that either conflict with the conda environment, or should simply be installed with the conda environment. The pip installation also raises GDAL dependency errors. That should be in the conda environment!

Conda and pip have very different purposes. I STRONGLY recommend you make a conda environment with all of the dependencies contained within. Its not at all clear to me why the package itself has to be a pip installation, but if you insist of that, it should ship with no additionally dependencies. At which point, there's no point making it a pip installation - you see my point? Its very confused.

I am therefore recommending the following approach, creating an environment with a specific python version

conda create --name sandpyper_env python=3.7

then installing ALL the dependencies inside it.

conda install geopandas matplotlib numpy pandas tqdm pysal rasterio richdem scikit-image scikit-learn scipy seaborn tqdm pooch fuzzywuzzy jupyter gdal

That approach a) was the only thing that worked for me, b) is superior for the reasons I describe above, and c) circumvents the need to install richdem and visual studio tools

Therefore the entire installation could be simply for following

conda config --add channels conda-forge

then

conda create --name sandpyper_env python=3.7

then

conda activate sandpyper

conda install geopandas matplotlib numpy pandas tqdm pysal rasterio richdem scikit-image scikit-learn scipy seaborn tqdm pooch fuzzywuzzy gdal

and finally

pip install sandpyper

(however, that results in GDAL dependency errors because pip/pypi is not appropriate here)

so, use pip/git

pip install git+https://github.com/npucino/sandpyper.git

that also gives GDAL errors

So, I cant install on windows. I cant review the code until I can install

I strongly recommend testing installation instructions on multiple machines before posting them

@dbuscombe-usgs
Copy link
Author

I didnt install GDAL from a wheel like instructed because I happen to know that rasterio/gdal will install on my machine through conda. It is the pip package that is breaking GDAL

@dbuscombe-usgs
Copy link
Author

If a conda environment is created this way (with the package that has least chance of successful integration with the others, which is probably gdal)

conda create --name sandpyper_env gdal

conda activate sandpyper_env
conda install geopandas matplotlib numpy pandas tqdm pysal rasterio richdem scikit-image scikit-learn scipy seaborn tqdm pooch fuzzywuzzy

At this point. all the dependencies are installed correctly. It is then the pip package pip install sandpyper (with its really specific and conflicting dependencies) that break GDAL

I'll keep trying to make this work ...

@dbuscombe-usgs
Copy link
Author

Progress! I have a working solution for my windows machine. Conda works best when you tell it upfront everything you're asking it to install, including all dependencies and pip dependencies. Therefore I recommend going the route of installing from a yml file.

This yml file successfully installs the package and its dependencies . Paste the following into a file called sandpyper.yml

name: sandpyper_env
channels:
  - conda-forge
  - defaults
dependencies:
 - python
 - scipy
 - numpy
 - scikit-image
 - scikit-learn
 - seaborn
 - pooch
 - fuzzywuzzy
 - ipython
 - jupyter
 - rasterio
 - tqdm
 - geopandas
 - matplotlib
 - richdem
 - pip
 - pip:
   - sandpyper

then, install using

conda env create --file sandpyper.yml

That is successful, and also much easier to communicate

however, when I import sandpyper there is a syntax error in the init file of the package

In [2]: import sandpyper
Traceback (most recent call last):

  File "C:\Users\dbuscombe\Anaconda3\envs\sandpyper_env\lib\site-packages\IPython\core\interactiveshell.py", line 3441, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)

  File "<ipython-input-2-bf14279135af>", line 1, in <module>
    import sandpyper

  File "C:\Users\dbuscombe\Anaconda3\envs\sandpyper_env\lib\site-packages\sandpyper\__init__.py", line 1
    __version__='__version__ = '0.0.2''
                                ^
SyntaxError: invalid syntax

@dbuscombe-usgs
Copy link
Author

The fix is to change the first line of init.py in the installed package to

__version__="__version__ = '0.0.2'"

then import sandpyper gives another dependency error:

ModuleNotFoundError: No module named 'pysal'

luckily, conda has that:
conda install pysal

that install successfully. However, now I run into an issue with pysal, so I'm guessing the pysal version that conda installs is older?

I'm going to give up troubleshooting for now until @npucino has a chance to take a look at all of this

In [1]: import sandpyper
C:\Users\dbuscombe\Anaconda3\envs\sandpyper_env\lib\site-packages\numba\np\ufunc\parallel.py:366: NumbaWarning: The TBB threading layer requires TBB version 2021 update 1 or later i.e., TBB_INTERFACE_VERSION >= 12010. Found TBB_INTERFACE_VERSION = 11100. The TBB threading layer is disabled.
  warnings.warn(problem)
---------------------------------------------------------------------------
ImportError                               Traceback (most recent call last)
<ipython-input-1-bf14279135af> in <module>
----> 1 import sandpyper

~\Anaconda3\envs\sandpyper_env\lib\site-packages\sandpyper\__init__.py in <module>
      3 __email__ = '[email protected]'
      4
----> 5 from . import dynamics
      6 from . import hotspot
      7 from . import labels

~\Anaconda3\envs\sandpyper_env\lib\site-packages\sandpyper\dynamics\__init__.py in <module>
      1 __version__='0.0.1'
----> 2 from .dynamics import (
      3 attach_trs_geometry,
      4 infer_weights,
      5 get_coastal_Markov,

~\Anaconda3\envs\sandpyper_env\lib\site-packages\sandpyper\dynamics\dynamics.py in <module>
      4 import geopandas as gpd
      5
----> 6 from pysal.explore.giddy.markov import Markov
      7 import matplotlib.pyplot as plt
      8 import seaborn as sb

ImportError: cannot import name 'Markov' from 'pysal.explore.giddy.markov' (C:\Users\dbuscombe\Anaconda3\envs\sandpyper_env\lib\site-packages\pysal\explore\giddy\markov\__init__.py)

@dbuscombe-usgs
Copy link
Author

Updated yml file to deal with the pysal dependency:

name: sandpyper_env
channels:
  - conda-forge
  - defaults
dependencies:
 - python
 - scipy
 - numpy
 - scikit-image
 - scikit-learn
 - seaborn
 - pooch
 - fuzzywuzzy
 - ipython
 - jupyter
 - rasterio
 - tqdm
 - geopandas
 - matplotlib
 - richdem
 - pysal
 - pip
 - pip:
   - sandpyper

@dbuscombe-usgs
Copy link
Author

The pysal version is 2.5.0

@npucino
Copy link
Owner

npucino commented Sep 6, 2021

Thanks @dbuscombe-usgs and sorry for the numerus issues you encountered during installation.

I just tried the installation on a new Windows machine by following the guidelines I wrote in the README and everything worked out smoothly with not a single errors. Which is not necessarily a good thing cause I have to understand what went wrong with yours.
Let's review your issues:

  1. I must write the supported Python versions and OS you are right, thanks for noting that. At the moment are (tested with github CI here)
- { os: windows-latest, py: "3.8" }
- { os: macOS-latest, py: "3.8" }
- { os: ubuntu-latest, py: "3.8" }
- { os: ubuntu-latest, py: "3.9" }  
  1. The only dependency I thought people will need to have prior the sandpyper environment is VS build tools, not richdem. I see the confusion here. What I meant in the README is simply that you need to have VS build tools installed in your machine because richdem will need it (during the conda installation). The fact that I put an hyperlink to richdem must have misled you into thinking that richdem must be installed prior the sandpyper_env in your base environment, which is definitely not what I meant and not what is intended. I change the sentence to make it more clear, thanks!
  2. I agree that PIP + CONDA = MESS, so one unique yml file would be ideal and also to package everything into conda would be much more efficient. So I do that asap. Hopefully there will not be the need to install VS build tools separately.
  3. Yes. I am being restrictive because those are the versions of the packages I am sure will work. In fact, if you allow conda to choose the versions for you when creating the environment, these will not match sandpyper requirements when imported with pip and you will get the conflicts you noted in issue number 7.
  4. see point number 3.
  5. absolutely, thanks for noting this!

The init typos and the pysal dependency errors are both due to the fact that your installed version was a very old one 0.0.2 (instead of the new 1.0.0).
I suspect that by allowing conda to decide for you which package to download (without specifying the package versions) caused an error somewhere (most probably GDAL via rasterio) and made pip look for an older compatible version of Sandpyper which happened to be the one that had the above mentioned issues.

So, I now:

  • correct the readme
  • package everything in conda

Thanks for this!

@npucino
Copy link
Owner

npucino commented Sep 6, 2021

By the way, @dbuscombe-usgs can you explain me why being strict with packages versions during installation is not future-proof, assuming these packages versions will not be deleted from servers? Thanks!

@dbuscombe-usgs
Copy link
Author

Hi @npucino, thanks, that sounds like an excellent plan! And yes I believe you are correct about the reason for the installation of the old sandpyper package. I will wait for your changes before attempting again with a fresh installation.

I mentioned about the version numbers because conda probably wants to decide for itself about the versions that are compatible. As bugs are being fixed constantly, my personal philosophy would be to try to use the latest stable versions where possible. Additionally, I think if you want to be restrictive with package versions, you may want to specify a specific version for all packages, not just most ...

@crvernon
Copy link

Tagging our review thread: openjournals/joss-reviews#3666 (comment)

@chrisleaman
Copy link

Hi @dbuscombe-usgs & @npucino - aren't python packaging issues just fun to deal with 😑😅? Just to add another data point, I did manage to follow the instructions in the "Getting Started" section in the README, get the environment installed and the jupyter notebooks running (on Windows 10 x64).

Still, I would recommend going with the approach @dbuscombe-usgs suggested #6 (comment) and creating a enviroment.yml file. This would simplify installation into one command (conda env create --file sandpyper.yml) and be less prone to conflicts if users want to install additional packages in the same environment to do their own analysis. In future, it might also be worth putting sandpyper on conda-forge as well.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants