Private PyPI repository on AWS - part 1
Development teams often need to share common code across different projects.
Reusing code not only saves time but also ensures consistency and improves software reliability.
Consider the scenario where a development team has a set of utility functions that are used across multiple projects. These functions could be anything from data processing utilities to logging and error handling functions.
The team needs a way to share these functions without duplicating the code in every project.
The Different Approaches
There are several ways to share common code among projects, each with its own set of pros and cons:
- Copy-pasting the common code in every project: This is the simplest approach but can lead to code duplication and inconsistencies. It's also difficult to maintain and update the code across all projects.
- Using pip install directly from GitHub repositories: This method is straightforward but can be slow, especially for large projects. It also requires SSH credentials for private repositories, adding an extra layer of complexity.
- Using git submodules: This approach allows you to include a repository as a submodule in another repository, making it easy to keep track of changes. However, it adds complexity when managing changes across repositories and can lead to issues with dependency management.
Turning Common Code into Installable Packages
A more scalable and maintainable solution is to turn the common code into installable packages. This approach allows teams to easily share, update, and manage common code across projects.
However, the initial step of setting up a central artifact repository can be time-consuming. This is an upfront cost that the team needs to bear.
AWS CodeArtifact
If you're using AWS, AWS CodeArtifact provides a solution to host a PyPI repository. AWS CodeArtifact is a fully managed artifact repository service that makes it easy to store, publish, and share software packages used in your software development process.
It handles TLS, authentication, and authorization, and uses IAM to control access. The service is serverless, meaning you only pay for what you use.
Creating a Private PyPI repository
To create an AWS CodeArtifact repository, you can use Terraform.
Create a domain
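# local.region and data.aws_caller_identity.current are assumed to be
# defined elsewhere in the configuration.
locals {
  codeartifact_domain_name     = "example"
  codeartifact_repository_name = "private-pypi"

  # Principals that may only install packages.
  codeartifact_readonly_access_arns = [
    module.iam_role.arn,
    module.example_glue_job_orchestrator_role.arn,
  ]

  # Principals that may also publish packages.
  codeartifact_readwrite_access_arns = [
    module.example_github_oidc_role.arn,
  ]

  # Matches every package in the repository.
  codeartifact_packages_arn = "arn:aws:codeartifact:${local.region}:${data.aws_caller_identity.current.account_id}:package/${local.codeartifact_domain_name}/${local.codeartifact_repository_name}/*"
}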
resource "aws_codeartifact_domain" "example" {
  domain = "example"
}

data "aws_iam_policy_document" "example_domain_policy_document" {
  statement {
    effect    = "Allow"
    actions   = ["codeartifact:GetAuthorizationToken"]
    resources = [aws_codeartifact_domain.example.arn]

    principals {
      type = "AWS"
      identifiers = concat(
        local.codeartifact_readonly_access_arns,
        local.codeartifact_readwrite_access_arns,
      )
    }
  }
}

resource "aws_codeartifact_domain_permissions_policy" "example_domain_permissions_policy" {
  domain          = aws_codeartifact_domain.example.domain
  policy_document = data.aws_iam_policy_document.example_domain_policy_document.json
}
I have defined two access groups:
- those who can perform pip install from the private PyPI.
- those who can do the above and publish new packages into the private PyPI.
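Both groups need an authorization token to connect to the service. Clients such as pip send it as the password in HTTP Basic Auth, which is why the domain policy grants codeartifact:GetAuthorizationToken to both groups.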
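To make the token's role concrete, here is a minimal Python sketch; it is not part of the Terraform setup. It assumes boto3 is installed and relies on its CodeArtifact `get_authorization_token` call. The helper names `fetch_token` and `basic_auth_header` are hypothetical, and the one-hour lifetime is an arbitrary choice. CodeArtifact expects the user name `aws`, with the token as the password.

```python
import base64


def fetch_token(client, domain: str, domain_owner: str) -> str:
    """Return a temporary CodeArtifact authorization token.

    `client` is expected to be boto3.client("codeartifact"). It is passed in
    so the function can be exercised without AWS credentials.
    """
    response = client.get_authorization_token(
        domain=domain,
        domainOwner=domain_owner,
        durationSeconds=3600,  # assumed lifetime of one hour
    )
    return response["authorizationToken"]


def basic_auth_header(token: str, username: str = "aws") -> str:
    """Build the HTTP Basic Auth header value that carries the token."""
    credentials = f"{username}:{token}".encode("utf-8")
    return "Basic " + base64.b64encode(credentials).decode("ascii")
```

With real credentials, the call would look roughly like `fetch_token(boto3.client("codeartifact"), "example", "111122223333")`, where the account ID is the placeholder used in this article.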
Create a repository within the domain
Then we create the repository and grant the necessary permissions to the two groups at the repository and package levels.
resource "aws_codeartifact_repository" "private_pypi" {
  repository  = "private-pypi"
  description = "Private PyPI repository"
  domain      = aws_codeartifact_domain.example.domain
}

data "aws_iam_policy_document" "private_pypi_policy_document" {
  # Repository level: both groups may read from the repository.
  statement {
    effect    = "Allow"
    actions   = ["codeartifact:ReadFromRepository"]
    resources = [aws_codeartifact_repository.private_pypi.arn]

    principals {
      type = "AWS"
      identifiers = concat(
        local.codeartifact_readonly_access_arns,
        local.codeartifact_readwrite_access_arns,
      )
    }
  }

  # Package level: both groups may download package assets and readmes.
  statement {
    effect = "Allow"
    actions = [
      "codeartifact:GetPackageVersionAsset",
      "codeartifact:GetPackageVersionReadme",
    ]
    resources = [local.codeartifact_packages_arn]

    principals {
      type = "AWS"
      identifiers = concat(
        local.codeartifact_readonly_access_arns,
        local.codeartifact_readwrite_access_arns,
      )
    }
  }

  # Package level: only the read-write group may publish.
  statement {
    effect    = "Allow"
    actions   = ["codeartifact:PublishPackageVersion"]
    resources = [local.codeartifact_packages_arn]

    principals {
      type        = "AWS"
      identifiers = local.codeartifact_readwrite_access_arns
    }
  }
}

resource "aws_codeartifact_repository_permissions_policy" "private_pypi_permissions_policy" {
  repository      = aws_codeartifact_repository.private_pypi.repository
  domain          = aws_codeartifact_domain.example.domain
  policy_document = data.aws_iam_policy_document.private_pypi_policy_document.json
}
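Taken together, the domain and repository policies amount to a small permission matrix. The Python sketch below only restates that matrix so the difference between the two groups is easy to check. The group labels `readonly` and `readwrite` and the helper `can` are hypothetical names. IAM does the real enforcement, not this code.

```python
# Actions granted to every principal, taken from the Terraform policies.
READ_ACTIONS = frozenset({
    "codeartifact:GetAuthorizationToken",    # domain policy
    "codeartifact:ReadFromRepository",       # repository level
    "codeartifact:GetPackageVersionAsset",   # package level
    "codeartifact:GetPackageVersionReadme",  # package level
})

# Publishing is the only extra action for the read-write group.
PERMISSIONS = {
    "readonly": READ_ACTIONS,
    "readwrite": READ_ACTIONS | {"codeartifact:PublishPackageVersion"},
}


def can(group: str, action: str) -> bool:
    """Return True if the given access group is granted the action."""
    return action in PERMISSIONS[group]
```

For example, `can("readonly", "codeartifact:PublishPackageVersion")` is false, while the same check for `"readwrite"` is true.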
Output the repository endpoint URL
data "aws_codeartifact_repository_endpoint" "private_pypi_endpoint" {
  domain     = aws_codeartifact_domain.example.domain
  repository = aws_codeartifact_repository.private_pypi.repository
  format     = "pypi"
}

output "private_pypi_endpoint_url" {
  description = "Private PyPI repository endpoint URL"
  value       = data.aws_codeartifact_repository_endpoint.private_pypi_endpoint.repository_endpoint
}
Create the resources with
$ terraform plan
$ terraform apply
and take note of the endpoint URL in the outputs. It looks like this:
https://example-111122223333.d.codeartifact.us-west-2.amazonaws.com/pypi/private-pypi/
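As a rough illustration of how the endpoint is used later, the Python sketch below turns the endpoint and an authorization token into an index URL for pip. The function name `pip_index_url` is hypothetical. It assumes the package index is served under the endpoint's `simple/` path and that the user name is `aws`.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def pip_index_url(endpoint: str, token: str, username: str = "aws") -> str:
    """Return an authenticated index URL for pip.

    `endpoint` is the value of the private_pypi_endpoint_url output.
    """
    parts = urlsplit(endpoint)
    # Embed the credentials as user info; quote() escapes characters that
    # are not safe inside a URL.
    netloc = f"{quote(username, safe='')}:{quote(token, safe='')}@{parts.hostname}"
    # Tolerate endpoints with or without a trailing slash.
    path = parts.path.rstrip("/") + "/simple/"
    return urlunsplit((parts.scheme, netloc, path, "", ""))
```

The result can be handed to pip through `--index-url` or the `PIP_INDEX_URL` environment variable. Because it contains the token, it should not be committed or logged.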
To be continued in part 2, publishing packages to the private PyPI repository.