# Dev environment setup for a Python data science project

A simple, easy-to-follow dev setup guide for a Python-based data science project.

This post is a note to myself for when I want to build a Python-based data science project. It is a minimal setup, aimed mostly at ad-hoc or lightweight DS experiment projects. We can drop the pytest part if it is not needed, or add more pieces such as continuous integration and deployment as the project grows heavier and more mature.

### Basic environment requirements

First, let’s lay out the basic environment. I’m using macOS Catalina 10.15.3 at the time of writing. Also, I no longer use conda. I have to admit that I used it heavily, but then realized that it is not flexible enough for quickly setting up or removing lightweight project environments.

• To uninstall conda, we can do this:

Don’t forget to remove the conda initialization block and the conda entry in your PATH from `~/.bash_profile` (or from `~/.zshrc` if you are using zsh).

• Make sure your Xcode is up to date.

• Make sure your Homebrew is up to date.

If you still want to keep conda, here is a recipe, thanks to this Stack Overflow thread: expose the conda command, but don’t activate any environment, not even the base environment. Execute the following commands in your shell.

Note: after this setup, the default `python` is the one set by `pyenv global`. Use pyenv and conda to manage their environments separately.

### Use pyenv to manage Python versions

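pyenv is a Python installation manager: it lets you install several Python versions side by side and choose which one to use globally, per project directory, or per shell session.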

Figure (from realpython.com): the hierarchy pyenv uses to decide which installed Python version to use.
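To make that lookup order concrete, here is a simplified Python sketch of it. It is not pyenv’s actual implementation (pyenv itself is written in shell) and it ignores details such as several versions listed in one file; it only models the documented precedence: the `PYENV_VERSION` environment variable set by `pyenv shell`, then a `.python-version` file written by `pyenv local` in the current directory or any parent, then the global `version` file in the pyenv root written by `pyenv global`, and finally the system Python. The function names are hypothetical.

```python
from pathlib import Path
from typing import Mapping, Optional


def _first_version(version_file: Path) -> Optional[str]:
    """Return the first version named in a pyenv version file, or None if it is empty."""
    words = version_file.read_text().split()
    return words[0] if words else None


def resolve_pyenv_version(cwd: Path, env: Mapping[str, str], pyenv_root: Path) -> str:
    """Simplified model of pyenv's precedence rules (hypothetical helper)."""
    # 1. `pyenv shell <version>` sets PYENV_VERSION for the current shell session.
    if env.get("PYENV_VERSION"):
        return env["PYENV_VERSION"]
    # 2. `pyenv local <version>` writes .python-version; search cwd, then each parent.
    for directory in (cwd, *cwd.parents):
        local_file = directory / ".python-version"
        if local_file.is_file():
            version = _first_version(local_file)
            if version:
                return version
    # 3. `pyenv global <version>` writes the "version" file in the pyenv root.
    global_file = pyenv_root / "version"
    if global_file.is_file():
        version = _first_version(global_file)
        if version:
            return version
    # 4. Nothing configured: fall back to the system Python found on PATH.
    return "system"
```

For example, `resolve_pyenv_version(Path.cwd(), os.environ, Path.home() / ".pyenv")` roughly approximates what `pyenv version-name` prints in the current directory, assuming the default pyenv root.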

#### Install pyenv

Now let’s install some commonly used Python versions with `pyenv install <version>` (`pyenv install --list` shows every version available).
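Each `pyenv install <version>` puts the interpreter in its own directory under `$(pyenv root)/versions/` (by default `~/.pyenv/versions/`), and `pyenv versions` lists them. As a small illustration of that layout, the sketch below lists those directories in numeric version order, so that 3.10 sorts after 3.9 instead of before it. The helper names are hypothetical, and the sort only understands plain `X.Y.Z` names; anything else (for example a conda or dev build) is placed last.

```python
from pathlib import Path
from typing import List, Tuple


def _version_key(name: str) -> Tuple[int, Tuple[int, ...], str]:
    """Sort key: numeric X.Y.Z names first (compared as integers), other names after."""
    parts = name.split(".")
    if all(part.isdigit() for part in parts):
        return (0, tuple(int(part) for part in parts), "")
    return (1, (), name)


def installed_pyenv_versions(pyenv_root: Path) -> List[str]:
    """List the version directories under <pyenv_root>/versions (hypothetical helper)."""
    versions_dir = pyenv_root / "versions"
    if not versions_dir.is_dir():
        return []
    # Only directories are installed versions; stray files are ignored.
    names = [entry.name for entry in versions_dir.iterdir() if entry.is_dir()]
    return sorted(names, key=_version_key)
```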

### Use poetry for package/environment management

From my point of view, its biggest advantage over conda is that it ties both the virtual environment and the dependency specification (declared in pyproject.toml and pinned in poetry.lock) to the project itself. And we can choose to put the virtual environment folder inside the project directory.

#### Install poetry

If you want to install the beta version, e.g. 1.0.0b9, which provides more functionality:

Make sure poetry’s bin directory, `$HOME/.poetry/bin`, is on your `$PATH`. If you are using bash, you can achieve this by adding `source $HOME/.poetry/env` (which prepends that directory to `$PATH`) to `~/.bash_profile` and then restarting your terminal.
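If poetry is still not found after restarting the terminal, it is worth checking PATH directly. The sketch below, with a hypothetical helper name, tests whether a directory such as `~/.poetry/bin` (where this installer puts the poetry executable) is one of the PATH entries; `shutil.which("poetry")` from the standard library is a quicker check of whether the command resolves at all. Newer poetry installers use a different location, so adjust the directory to match your install.

```python
import os
from pathlib import Path


def dir_on_path(directory: Path, path_value: str) -> bool:
    """Return True if `directory` is one of the entries of a PATH-style string."""
    target = os.path.normpath(os.path.expanduser(str(directory)))
    for entry in path_value.split(os.pathsep):
        # Skip empty entries and compare normalized paths, so a trailing "/" does not matter.
        if entry and os.path.normpath(os.path.expanduser(entry)) == target:
            return True
    return False
```

Usage: `dir_on_path(Path.home() / ".poetry" / "bin", os.environ.get("PATH", ""))`.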

When in doubt, always refer to poetry’s official tutorial.

### Build our project

#### Initial setup

If you want to use poetry in an existing project, run `poetry init` in your project directory to create a `pyproject.toml` file.

If you want to create a new project and manage it with poetry, run `poetry new <name>`; it creates a project skeleton containing a `pyproject.toml`, a package directory, and a `tests` directory. Then `cd` into the project directory.

Since I prefer to keep the virtual environment folder inside the project, let’s change the configuration with `poetry config virtualenvs.in-project true`.
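With `virtualenvs.in-project` set to true, poetry creates the environment in a `.venv` folder at the project root, which makes the interpreter easy to find, for example when pointing an editor at it. The hypothetical helper below only computes that interpreter path, using the standard venv layout (`bin/python` on macOS and Linux, `Scripts\python.exe` on Windows).

```python
import sys
from pathlib import Path


def in_project_venv_python(project_dir: Path, platform: str = sys.platform) -> Path:
    """Interpreter path inside <project_dir>/.venv, the folder poetry uses when
    virtualenvs.in-project is true (hypothetical helper)."""
    venv_dir = project_dir / ".venv"
    if platform.startswith("win"):
        # Windows venvs keep their executables in the Scripts folder.
        return venv_dir / "Scripts" / "python.exe"
    # macOS and Linux venvs keep them in bin/.
    return venv_dir / "bin" / "python"
```

Since `.venv` lives inside the project, it is also worth adding it to `.gitignore`.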

Then run `poetry install`. It installs from `poetry.lock` if that file exists; otherwise it resolves the dependencies declared in `pyproject.toml` and installs them accordingly. Poetry also checks whether it is currently running inside a virtualenv and, if not, uses an existing one or creates a brand-new one, so that you always work isolated from your global Python installation.
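The "am I already inside a virtualenv?" question can be answered from Python itself. A minimal sketch, not poetry’s own code: the standard `venv` module makes `sys.prefix` point at the environment while `sys.base_prefix` keeps pointing at the base interpreter, and older `virtualenv` releases set `sys.real_prefix` instead. The function name is hypothetical.

```python
import sys


def in_virtualenv() -> bool:
    """Return True when this interpreter runs inside a virtual environment."""
    # Legacy virtualenv (before version 20) marked environments with sys.real_prefix.
    if getattr(sys, "real_prefix", None) is not None:
        return True
    # venv (and modern virtualenv) set sys.prefix to the environment directory,
    # while sys.base_prefix still points at the interpreter it was created from.
    return sys.prefix != sys.base_prefix
```

Running `poetry run python -c "import sys; print(sys.prefix)"` should print the path of the virtual environment poetry uses for the project.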

#### Add some essential packages for DS development

To install a specific version, run `poetry add sdk="1.4.4"`. In the version constraint syntax, `^1.4` means `>=1.4.0 <2.0.0`. Downgrading a dependency to a certain version works the same way; there is no need to run `poetry remove sdk` first.
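To make the caret rule concrete, here is a small Python sketch of how a caret constraint turns into a version range. It follows the rule described in poetry’s documentation (bump the left-most non-zero component, so `^0.2.3` means `>=0.2.3 <0.3.0`), but it is a simplified, hypothetical helper: it only handles plain numeric versions, not pre-releases or the other constraint operators.

```python
from typing import List, Tuple

Version = Tuple[int, ...]


def _pad(parts: List[int]) -> Version:
    """Pad a version to three components: [1, 4] -> (1, 4, 0)."""
    return tuple(parts + [0] * (3 - len(parts)))


def caret_bounds(constraint: str) -> Tuple[Version, Version]:
    """Return (inclusive lower, exclusive upper) bounds for a caret constraint like "^1.4"."""
    if not constraint.startswith("^"):
        raise ValueError(f"not a caret constraint: {constraint!r}")
    parts = [int(part) for part in constraint[1:].split(".")]
    if not 1 <= len(parts) <= 3:
        raise ValueError(f"expected 1 to 3 version components: {constraint!r}")
    # Bump the left-most non-zero component; if all given components are zero, bump the last one.
    index = next((i for i, part in enumerate(parts) if part != 0), len(parts) - 1)
    upper = parts[:index] + [parts[index] + 1]
    return _pad(parts), _pad(upper)


def satisfies(version: str, constraint: str) -> bool:
    """Check a plain numeric version such as "1.4.4" against a caret constraint."""
    lower, upper = caret_bounds(constraint)
    return lower <= _pad([int(part) for part in version.split(".")]) < upper
```

For instance, `caret_bounds("^1.4")` gives `((1, 4, 0), (2, 0, 0))`, matching the `>=1.4.0 <2.0.0` range above.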

#### Use jupyter with poetry

This part was a bit tricky for me the first time. Jupyter runs code through kernels, and it does not automatically use the virtual environment that poetry created for you. If you want to work in a Jupyter notebook based on your virtual environment, you need to register a kernel for that environment. The prerequisite is that you have added both jupyter and ipykernel as dependencies in your poetry project.
Then register the kernel by running ipykernel inside your virtual environment: `poetry run python -m ipykernel install --user --name <kernel-name>`, where `<kernel-name>` is any name you choose.
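The kernel registration can also be scripted from Python. The sketch below (hypothetical helper names) builds the command `python -m ipykernel install --user --name <kernel-name>`, using `sys.executable` so that the kernel points at whichever interpreter runs the script; launch it with `poetry run python ...` so that this interpreter is the project’s virtual environment. `--display-name` is optional and only changes the label shown in Jupyter’s kernel list.

```python
import subprocess
import sys
from typing import List, Optional


def kernel_install_command(name: str, display_name: Optional[str] = None) -> List[str]:
    """Build the ipykernel command that registers the current interpreter as a Jupyter kernel."""
    # sys.executable is the interpreter running this script, i.e. the poetry venv under `poetry run`.
    command = [sys.executable, "-m", "ipykernel", "install", "--user", "--name", name]
    if display_name:
        command += ["--display-name", display_name]
    return command


def register_kernel(name: str, display_name: Optional[str] = None) -> None:
    """Run the command; requires ipykernel to be installed in this environment."""
    subprocess.run(kernel_install_command(name, display_name), check=True)
```

Afterwards, `jupyter kernelspec list` should show the new kernel.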

Say we have a demo script, `src/example.py`, which uses the dependencies we installed with `poetry add`. To run it inside the project’s virtual environment, use `poetry run python src/example.py`.
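As a stand-in for such a demo script, here is a hypothetical `src/example.py` that uses only the standard library and reports which interpreter it runs under. Run through `poetry run`, the printed executable should live inside the project’s virtual environment rather than pyenv’s global Python, which is a quick way to confirm the isolation works. A real script would import the packages added with `poetry add` instead.

```python
# src/example.py: hypothetical demo script that reports its interpreter.
import sys
from typing import Dict


def interpreter_info() -> Dict[str, str]:
    """Describe the interpreter and environment this script is running in."""
    return {
        "executable": sys.executable,
        "prefix": sys.prefix,
        "version": ".".join(str(part) for part in sys.version_info[:3]),
    }


if __name__ == "__main__":
    for key, value in interpreter_info().items():
        print(f"{key}: {value}")
```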

If you want to add custom rules for the black formatter, add them to `pyproject.toml` in a `[tool.black]` section.
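For illustration, here is a sketch of such a section and of how it parses. The values are examples, not recommendations (black’s default line length is 88); `line-length`, `target-version`, and `skip-string-normalization` are existing black options. The snippet is embedded in Python and read with the standard-library `tomllib` module (Python 3.11+; the `tomli` package offers the same API on older versions), and `black_settings` is a hypothetical helper.

```python
try:
    import tomllib  # standard library on Python 3.11+
except ModuleNotFoundError:  # older Pythons: pip install tomli (same loads() API)
    import tomli as tomllib

# Example [tool.black] section for pyproject.toml (values are illustrative).
PYPROJECT_SNIPPET = """
[tool.black]
line-length = 100
target-version = ["py38"]
skip-string-normalization = true
"""


def black_settings(pyproject_text: str) -> dict:
    """Return the [tool.black] table from pyproject.toml text, or {} if absent."""
    return tomllib.loads(pyproject_text).get("tool", {}).get("black", {})
```

Running black from the project root (for example `poetry run black .`) picks these settings up from `pyproject.toml` automatically.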

#### Install pre-commit

The pre-commit framework is a tool for managing git pre-commit hooks in your project. The configured hooks run every time you run `git commit` and block the commit if any hook fails. Install it with `brew install pre-commit`.
To install black as a pre-commit hook, add a `.pre-commit-config.yaml` file at the root of your project that lists black as a hook.
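A minimal `.pre-commit-config.yaml` for black follows the usual pre-commit layout: a list of `repos`, each with a `repo` URL, a `rev` tag, and the `hooks` to use from it. To keep all examples in Python, the sketch below writes that file from a string; `write_pre_commit_config` is a hypothetical helper, and the `rev` value is only an example pin, so use the black release you actually want.

```python
from pathlib import Path

# Example pin only: choose the black release you actually use.
BLACK_REV = "22.3.0"

PRE_COMMIT_CONFIG = f"""repos:
  - repo: https://github.com/psf/black
    rev: {BLACK_REV}
    hooks:
      - id: black
"""


def write_pre_commit_config(project_root: Path) -> Path:
    """Write a minimal .pre-commit-config.yaml that runs black, and return its path."""
    config_path = project_root / ".pre-commit-config.yaml"
    config_path.write_text(PRE_COMMIT_CONFIG)
    return config_path
```

Later, `pre-commit autoupdate` can bump the pinned `rev` to the latest tag.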

To install the hook, go to the root of your project and run `pre-commit install`.

### Summary

Once all the tools above are installed, these are the general steps for initializing the next DS Python project:

• Add the black configuration to `pyproject.toml`.

• Add a `.pre-commit-config.yaml` file, then run `pre-commit install`.
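As a rough sanity check for these steps, the hypothetical helper below looks for the files they leave behind: `pyproject.toml` with a `[tool.black]` section, the in-project `.venv`, the `.pre-commit-config.yaml` file, and the git hook at `.git/hooks/pre-commit` that `pre-commit install` writes. It is a sketch only: it does a plain text search instead of parsing the TOML, and it does not account for git setups that move the hooks directory (for example via `core.hooksPath`).

```python
from pathlib import Path
from typing import List


def missing_setup_steps(project_root: Path) -> List[str]:
    """Return a description of each setup step that looks unfinished in project_root."""
    missing = []
    pyproject = project_root / "pyproject.toml"
    if not pyproject.is_file():
        missing.append("pyproject.toml (poetry new / poetry init)")
    elif "[tool.black]" not in pyproject.read_text():
        # Plain text search, not a TOML parse: good enough for a quick check.
        missing.append("[tool.black] section in pyproject.toml")
    if not (project_root / ".venv").is_dir():
        missing.append(".venv (poetry config virtualenvs.in-project true, then poetry install)")
    if not (project_root / ".pre-commit-config.yaml").is_file():
        missing.append(".pre-commit-config.yaml")
    if not (project_root / ".git" / "hooks" / "pre-commit").is_file():
        missing.append("git pre-commit hook (pre-commit install)")
    return missing
```

Calling `missing_setup_steps(Path.cwd())` from the project root returns an empty list once everything is in place.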