Poetry (Packaging) in motion

Publishing Public Python Packages on PyPI (WHEW!)

Intro 👋

As you might have guessed, I've decided to write this entry on Poetry (as in the package manager). I'll start right off by saying that I'm no expert by far. While I've been programming in Python for several years, much of my original experience was with good old Pip alone. This was partly because back in the early 2010s there weren't as many options as there are now. If PyPI is accurate, Pipenv was only started back in 2017 and didn't hit maturity until 2020. Poetry only hit version 1 at the end of 2019.

Now here's the interesting part. I was still working on decent-sized Python projects back then, but at that point, I discovered Docker. This magical container system let me throw caution to the wind (sort of) and finally be out of the "it works on my machine" camp. I was happily tossing requirements.txt files generated by pip freeze into my containers (yeah, I know...). I did have a few issues with having to pin certain versions of Waitress to keep my apps running properly. It was annoying for sure, but at least Docker let me sidestep the problems that other developers had created.

A quick side note: the only reason I was using Waitress was that I was developing on a Windows 7 Pro box. The app I was working on needed to run on a Debian Python 3.7 Docker image, so you can see my conundrum 😅. Looking back, I really wish I had been rocking Linux on my work PC. Yes, I was aware of PyCharm's remote development, Docker on Windows (gross!), and running a Debian Desktop VM, but I didn't want to make waves or headaches. I made do with what I had. At least I had GitLab and PyTest back then to save my bacon.

Fast forward to late 2022: I was still sticking to Docker and requirements files (if it ain't broke, don't fix it). But then I stumbled upon a GitHub Actions file that I was reviewing for work... what was this Pipenv thing? I started to dig... "Oh, it's like if virtualenvs were easy," I said to myself, oversimplifying a tad. A glance at a few CookieCutter templates and a little YAML later, I had something to start with. For sure it was a taste of things to come!

name: Test
on:
  pull_request: {}
  push:
    branches: main
    tags: "*"
jobs:
  test:
    runs-on: ubuntu-latest
    steps:
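      - uses: actions/checkout@v4
      - name: Setup Python
        uses: actions/setup-python@v5
        with:
          # Quote the version: a bare 3.10 is parsed by YAML as the float 3.1
          python-version: "3.10"
      - name: Install dependencies with pipenv
        run: |
          pip install pipenv
          pipenv install --deploy --dev
      # isort 5+ recurses by default; --check-only makes a diff fail the job
      - run: pipenv run isort --check-only --diff .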
      - run: pipenv run black --check .
      - run: pipenv run flake8
      - run: pipenv run mypy
      - run: pipenv run pytest --cov --cov-fail-under=100

A Package with a purpose 💪

For an excellent short read on the history of Python packages, check out this link

PipEnv

It was shortly after I had dug into PipEnv that I decided to start working on some personal Python projects again. I had spent the last several months working mostly with PowerShell and Terraform for work, and I missed Python. It was my digital mother tongue, warm to me like that favorite pair of PJs right out of the dryer. Well... that's a half lie. TECHNICALLY my first programming language was Java (I did play with BASIC when I was a kid as well), but Python was the first language I learned to use WELL. This new project is a simple script to change the color of a porch light bulb to a more holiday-appropriate theme.

Cute, right? Microscale and tiny scope, a perfect project to relax after work with. Nothing too crazy...

PipEnv seemed to fill the role of, and improve on, what I had been doing with a requirements.txt back in the day. It was simple enough and easy to update and add to. Removing packages wasn't always the cleanest, though you can run pipenv clean and pipenv update to get your virtualenv in line with your project manifest. I never ran into any problems on the smaller projects I've worked on (with PipEnv), but I was only looking to pin my module versions and nothing more. One of the big shortcomings of PipEnv is that it does NOT handle publishing packages. For that, you need a package builder. After a quick Google and Reddit search, I came up with Flit.

Flit

One thing Flit IS is an easy-to-use build system. Churning out installable packages is the name of the game with it. But there's a catch... Flit doesn't help you manage dependencies: you have to add them to pyproject.toml by hand. For small projects, this isn't too bad, but the manual steps are tedious for sure. Below is a nice simple example of a Flit project file.

The TOML file will typically read as follows:

[build-system]
requires = ["flit_core >=3.2,<4"]
build-backend = "flit_core.buildapi"

[project]
name = "APP_NAME_HERE"
authors = [{name = "YourNameHere", email = "username@gmail.com"}]
dependencies = [
    "flit_core >=3.8.0",
    "requests",
]
readme = "README.md"
license = {file = "LICENSE"}
classifiers = ["License :: OSI Approved :: MIT License"]
dynamic = ["version", "description"]

[project.urls]
Home = "https://github.com/username/APP_NAME_HERE"

A simple flit build and flit publish and you're off to the races, more or less. I would normally dig in further on Flit, but at this point, I saw the limitations of having to use two tools that would not stay in sync with each other easily. This annoyed the heck out of me. I wanted something to "just work" and not to fight with, especially during my free time after work. Yeah, I could script my way out of it with some Make / Taskfile commands, as I've seen some do, but then I discovered Poetry... and it was beautiful. And on that note, onto the main event (sorry Flit)...

Poetry

I had now (after some digging) landed on Poetry. Poetry for all practical purposes combines BOTH building/packaging and dependency management. Even if you never plan on publishing anything to PyPI, Poetry offers better dependency resolution out of the box. I did a few small tests to compare PipEnv and Poetry. I found that PipEnv worked well enough when I was not doing any version pinning, and it installed all my dev packages properly, but I would be hesitant to use it for any larger, more complex projects due to the amount of flak it's gotten online. YMMV.

Like pipenv install --dev, Poetry supports an extra set of packages used only for dev environments and testing. Running poetry install installs everything listed in your manifest, dev group included. To install only what is needed to run, use poetry install --no-root --without dev, where --without dev skips the dev group and --no-root skips installing the project package itself.

The pyproject.toml looks a little different compared to Flit but is easy enough to make sense of. A key difference is that your version is defined in the pyproject.toml file, whereas the Flit example above marks the version as dynamic and reads it from your module. Poetry treats the manifest as the single source of truth, since it handles dependency management in addition to packaging.
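
Since the version lives in the manifest rather than in the module, code that wants to report its own version can ask the installed package metadata for it. This is a rough sketch using the standard library's importlib.metadata. The get_version helper and its "0.0.0" fallback are my own illustration, and APP_NAME_HERE simply mirrors the placeholder name used in the manifests in this post:

```python
from importlib.metadata import PackageNotFoundError, version


def get_version(dist_name: str, default: str = "0.0.0") -> str:
    """Return the installed version of dist_name, or default if it is not installed."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        # Happens when running from a checkout that was never installed.
        return default


if __name__ == "__main__":
    # Placeholder distribution name; swap in the name from your manifest.
    print(get_version("APP_NAME_HERE"))
```

The fallback keeps the call from blowing up when the code is run straight from a checkout.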

The TOML file will typically read as follows for Poetry:

[tool.poetry]
name = "APP_NAME_HERE"
version = "1.0.0"
description = "Fill in this"
authors = ["You <you@gmail.com>"]
license = "MIT"
readme = "README.md"
homepage = "https://github.com/you/repo"
keywords = ["neat", "app"]
classifiers = ["Development Status :: 5 - Production/Stable",
 "Environment :: No Input/Output (Daemon)",
 "Operating System :: OS Independent",
]

[tool.poetry.dependencies]
python = "^3.10"

[tool.poetry.group.dev.dependencies]
setuptools = "*"
pytest = "*"
pytest-cov = "*"
python-dotenv = "*"
pytest-pycharm = "*"

[build-system]
requires = ["poetry-core"]
build-backend = "poetry.core.masonry.api"
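
The python = "^3.10" line above uses Poetry's caret constraint, which allows updates that do not change the leftmost non-zero part of the version. To make that concrete, here is a small sketch that expands a caret constraint into its lower and upper bounds. It follows the caret rules as described in Poetry's documentation, only handles plain numeric versions, and is an illustration rather than Poetry's own implementation (caret_range is a made-up name):

```python
def caret_range(constraint: str) -> tuple[str, str]:
    """Expand '^X.Y.Z' into (inclusive lower bound, exclusive upper bound)."""
    parts = [int(p) for p in constraint.lstrip("^").split(".")]
    # The leftmost non-zero part may not change; if every part is zero,
    # the last part that was written is the one that gets bumped.
    pivot = next((i for i, p in enumerate(parts) if p != 0), len(parts) - 1)
    upper = parts[:pivot] + [parts[pivot] + 1]

    def pad(nums: list[int]) -> str:
        # Pad to MAJOR.MINOR.PATCH so both bounds read the same way.
        return ".".join(str(n) for n in nums + [0] * (3 - len(nums)))

    return pad(parts), pad(upper)


if __name__ == "__main__":
    for spec in ("^3.10", "^1.2.3", "^0.2.3", "^0.0.3"):
        low, high = caret_range(spec)
        print(f"{spec:>7}  ->  >={low},<{high}")
```

So "^3.10" accepts any Python from 3.10 up to, but not including, 4.0.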

The PyProject.toml Advantage 🤓

Poetry and its associated PyProject file shine when things start to get complicated, as all your project settings can be consolidated into one handy place.

For example, my project manifest includes some additional settings for PyTest and some handy linting. Everything is in one nice place! (Feel free to take some inspiration from these settings)

[tool.pytest.ini_options]
pythonpath = ["."]
testpaths = ["tests"]
log_cli = true
log_cli_level = "DEBUG"
log_cli_format = "[%(asctime)s] [%(levelname)8s] --- %(message)s (%(filename)s:%(funcName)s():%(lineno)s)"
log_cli_date_format = "%Y-%m-%d %H:%M:%S"

[tool.black]
line-length = 120
skip-string-normalization = true

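# Note: flake8 does not read pyproject.toml on its own; this table only
# takes effect with a plugin such as Flake8-pyproject.
[tool.flake8]
exclude = ["./tests"]
max-line-length = 120
count = false
statistics = true
diff = true
format = "pylint"

[tool.isort]
multi_line_output = 3
include_trailing_comma = true
force_grid_wrap = 0
use_parentheses = true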
line_length = 120
profile = "black"

[tool.tartufo]
repo-path = "."
regex = true
entropy = true
exclude-path-patterns = [
 {path-pattern = 'poetry\.lock'},
 {path-pattern = 'pyproject\.toml'},
 # To not have to escape `\` in regexes, use single quoted
 # TOML 'literal strings'
 {path-pattern = 'docs/source/(.*)\.rst'},
]
exclude-entropy-patterns = [
    {path-pattern = '\.github/workflows/.*\.yml', pattern = 'uses: .*@[a-zA-Z0-9]{40}', reason = 'GitHub Actions'},
    {path-pattern = 'poetry\.lock', pattern = '.'},
    {path-pattern = 'Pipfile\.lock', pattern = '.'},
    {path-pattern = 'README\.md', pattern = '.'},
]

Versioning

Another issue you may run into is keeping your project version in sync with your Git tags (which tend to be used for releases).

There is an elegant solution to this, provided you are using Poetry for your build/publish workflow:

  1. Update pyproject.toml to have a generic version placeholder.

    [tool.poetry]
    version = "0.0.0"
    
  2. Update your release script to fetch the git version before building the package artifacts.

    poetry version $(git describe --tags --abbrev=0)
    poetry build
    

With this setup, the release script looks up the latest Git tag and hands it to Poetry, which writes it into pyproject.toml as the version before building. You can even integrate this as a GitHub Action.

name: Publish

on:
  release:
    types: [created]

jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
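    - uses: actions/checkout@v4
      with:
        # Fetch full history and tags so git describe can find the release tag
        fetch-depth: 0
    - name: Set up Python
      uses: actions/setup-python@v5
      with:
        python-version: '3.x'
    - name: Install dependencies
      run: |
        python -m pip install --upgrade pip
        pip install poetry
    - name: Build and publish
      run: |
        poetry version "$(git describe --tags --abbrev=0)"
        poetry build
        # PyPI uploads now require an API token: use __token__ as the username
        # and store the token itself in the PYPI_PASSWORD secret
        poetry publish --username "${{ secrets.PYPI_USERNAME }}" --password "${{ secrets.PYPI_PASSWORD }}"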
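
One thing to watch with the tag-based approach: git describe returns the tag name verbatim, so a stray tag like "latest" would be handed straight to poetry version. Below is a small sketch of a sanity check a release script could run first. The tag_to_version helper is hypothetical, and it assumes a deliberately narrow convention of an optional leading "v" followed by MAJOR.MINOR.PATCH, which may be stricter than what your project needs:

```python
import re

# Deliberately narrow: optional "v", then MAJOR.MINOR.PATCH.
_TAG_PATTERN = re.compile(r"^v?(\d+)\.(\d+)\.(\d+)$")


def tag_to_version(tag: str) -> str:
    """Turn a release tag such as 'v1.2.3' into the bare version '1.2.3'."""
    match = _TAG_PATTERN.match(tag.strip())
    if match is None:
        raise ValueError(f"tag {tag!r} does not look like a release version")
    return ".".join(match.groups())


if __name__ == "__main__":
    print(tag_to_version("v1.2.3"))
```

Failing loudly here is cheaper than publishing a package with a nonsense version.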

Folder Structure 📁

Part of my digital journey was digging into the question of folder structure in projects: flat vs. module vs. src. Which one to choose depends on one big question: will this project be used as a library or not? If the project will never be a library, then you can just go with the flat layout.

For a really good deeper read on folder structure check this article out.

Flat

├── README.md
├── pyproject.toml
├── app.py
└── test_app.py

I'd recommend against this for anything but the simplest projects. Mixing source code with tests and non-source code files is just going to get messy fast. Don't do this.

Module

├── README.md
├── pyproject.toml
├── app
│   ├── __init__.py
│   └── app.py
└── tests/
    └── test_app.py

This is easy enough to manage, and you can still have a separate folder for each module. I would recommend you keep your tests separate from your module code so they don't get mixed in when you build/publish. The module-based approach also allows you to include multiple packages in your namespace without having to have lots of path statements. A nice compromise between chaos and strict structure.

Src

This allows for import purity and for having multiple modules in one source code folder. By moving your code into /src, you prevent folks from being able to run your project from the root directory. Simply put, your Python path and working directory get moved one layer down. To test your code properly, you will need to install it. This keeps things consistent with how an end user would run the code and related imports.

PyCharm and other IDEs will also let you pick your source root for running your code and tests. This lets you leverage src without too much headache. Poetry and PyTest also handle the src structure well for testing and building.
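
If you are ever unsure which copy of your code a test run is actually exercising (the source tree or an installed copy), you can simply ask the module where it was loaded from. Here is a minimal sketch of that idea. The is_imported_from helper is my own invention and needs Python 3.9+ for Path.is_relative_to. Keep in mind that an editable install points back at your source tree, so the answer depends on how the package was installed:

```python
from pathlib import Path
from types import ModuleType


def is_imported_from(module: ModuleType, directory: Path) -> bool:
    """Return True if the module's file lives somewhere under directory."""
    module_file = getattr(module, "__file__", None)
    if module_file is None:
        # Built-in and namespace modules have no single file on disk.
        return False
    return Path(module_file).resolve().is_relative_to(directory.resolve())


if __name__ == "__main__":
    import json

    # The standard library normally does not live in the working directory.
    print(is_imported_from(json, Path.cwd()))
```

Dropping a check like this into a test can catch the case where a flat or module layout quietly imports from the working directory instead of the installed package.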

Testing builds 🗑️🔥

If you do any builds, please do them using Docker. If you don't do an isolated test, at least at some point in your build, you could potentially run into packages not being representative of what the end user will see. I would recommend you do something along the lines of the example below in your Dockerfile.

This will let you test your build, while still keeping your tests bundled in. If everything is in line, the build should pass. While this is not quite as nice as having everything bundled in one git checkout, it should keep you out of hot water, at least for smaller projects.

Both module and src layouts keep your project builds clean by only including the relevant code and not everything in the repository when your packaging tools run.

FROM python:3.10-slim AS base

# Setup env
ENV LANG=C.UTF-8
ENV LC_ALL=C.UTF-8
ENV PYTHONDONTWRITEBYTECODE=1
ENV PYTHONFAULTHANDLER=1
ENV PYTHONHASHSEED=random
ENV PYTHONUNBUFFERED=1

#FROM base AS python-deps
WORKDIR /app

# Set timezone
ENV TZ=America/New_York
RUN ln -snf /usr/share/zoneinfo/$TZ /etc/localtime && echo $TZ > /etc/timezone

# Install system dependencies
RUN apt-get update && apt-get install -y --no-install-recommends gcc git tree \
    && rm -rf /var/lib/apt/lists/*

# Setup Poetry
RUN pip install poetry
# Skip venvs for Docker
RUN poetry config virtualenvs.create false

# Install application into container
# Don't forget to check the .dockerignore
# Install ALL packages
# --no-interaction stops poetry init from prompting, which would stall the build
RUN poetry init --no-interaction
RUN poetry add pytest
RUN poetry add python-dotenv
RUN poetry add git+https://github.com/username/repo.git

COPY tests tests
RUN tree
# Create and switch to a new user
RUN useradd --create-home appuser
USER appuser
# Run the executable
CMD [ "pytest" ]
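
To double-check that only the relevant code ends up in a build, you can also peek inside the wheel, which is just a zip file. The sketch below lists anything that falls outside the prefixes you expect. The stray_files helper and the file names in the usage example are hypothetical placeholders, not output from a real build:

```python
import zipfile
from pathlib import Path


def stray_files(archive: Path, allowed_prefixes: tuple[str, ...]) -> list[str]:
    """List archive members that fall outside the allowed top-level prefixes."""
    with zipfile.ZipFile(archive) as wheel:
        return [
            name
            for name in wheel.namelist()
            if not name.startswith(allowed_prefixes)
        ]


if __name__ == "__main__":
    # Placeholder wheel name; adjust to whatever poetry build writes to dist/.
    wheel_path = Path("dist/app-1.0.0-py3-none-any.whl")
    if wheel_path.exists():
        print(stray_files(wheel_path, ("app/", "app-1.0.0.dist-info/")))
```

An empty list means nothing like tests/ or stray config files hitched a ride into the package.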

Badges (they're legit) 💯

I would highly recommend adding (a few!) badges to any repository that you plan on publishing. You can get some great badges from https://shields.io/ along with the info on how to actually generate them. If your repository is public, this should be easy enough. I would say to avoid spamming a ton and having your README look like a technicolor dreamland. Just having things like package health, SourceRank and dependencies can help inspire faith in a smaller project. Having that ✅ can set you apart by showing that your code works! 😄

Pulling packages from Git directly 🌐

A very neat feature included in most package managers is the ability to pull packages that have a correct manifest straight from Git and install them directly, without the need to publish the source on PyPI. I'm including this because it will hopefully come in handy for folks who don't want to publish their private repositories to PyPI or who just want to keep everything on GitHub for very small projects.

| Package Manager | Manifest File | Code |
| --- | --- | --- |
| Pip | requirements.txt | packagename @ git+https://github.com/ACCOUNT_NAME/PROJECT_NAME.git@master |
| PipEnv | Pipfile | packagename = {git = "https://github.com/ACCOUNT_NAME/PROJECT_NAME.git", editable = true, ref = "master"} |
| Poetry | PyProject.toml | packagename = {git = "https://github.com/ACCOUNT_NAME/PROJECT_NAME.git"} |
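
If you generate requirements files from a script, the Pip form can be assembled with a couple of lines. This is just a sketch: pip_git_requirement is a made-up helper, and the account and project names are the same placeholders used in the rows above:

```python
def pip_git_requirement(name: str, repo_url: str, ref: str = "") -> str:
    """Build a requirements.txt line that installs name from a Git repository."""
    line = f"{name} @ git+{repo_url}"
    # An optional branch, tag, or commit goes after a trailing "@".
    return f"{line}@{ref}" if ref else line


if __name__ == "__main__":
    print(
        pip_git_requirement(
            "packagename",
            "https://github.com/ACCOUNT_NAME/PROJECT_NAME.git",
            ref="master",
        )
    )
```

Pinning ref to a tag or commit instead of a branch keeps installs reproducible.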

To summarize all of the above (with a short little chart to follow): if all you need to do is upgrade from requirements.txt, give PipEnv a try. It's very easy to start with (as I did). If you think you will want to give packaging a try, like a consolidated project config, or want to pick "just one", go with Poetry. It's easy to start with and grow into, and there is a reason why it has 23,000 stars on GitHub. I would also say a src-less, module-based structure is both easy to use and easy to publish.

PipEnv is nice, but Poetry is nicer

| Tool | Package Management | Environment Management | App Publishing |
| --- | --- | --- | --- |
| Pip | Yes | No | No |
| PipEnv | Yes | Yes | No |
| Flit | No | No | Yes |
| Poetry | Yes | Yes | Yes |

Full disclosure: I did not review Conda or Hatch fully. Not that there is anything explicitly wrong with either of them. Conda is too specific to the scientific community for my general taste. Hatch seems to go well with Conda and also uses the PyProject manifest. It's nice that it gives you several built-in tools, similar to commit hooks, but I tend to like to roll my own via a Taskfile and run them with Poetry.

PDM has definitely caught my eye, and I will be trying to find the time down the road to look into it further once it matures more. I have a rule of thumb: if my IDE doesn't fully support the build system (the way it supports Poetry and PipEnv), I'm going to have a hard time adopting it.

If anyone reading this would like to weigh in, please feel free to either comment here or email me. 🤘
