Example tasks

Note

Some of the tasks described below are only available if you self-host Carto-Lab Docker. These tasks require familiarity with Docker and, on Windows, WSL.

Updating packages and custom envs

If you need to change/update packages in worker_env, you have two main options:

1. Temporary package installs

The easiest way is to run a temporary package install into the base worker_env directly from a Jupyter cell:

!/opt/conda/envs/worker_env/bin/python -m pip install geoplot alphashape

You can also use the terminal and install additional packages with conda:

  • open a terminal in Jupyter Lab, type bash
  • type conda activate worker_env
  • install your dependencies (e.g. conda install hdbscan)

Note

The worker_env will be reset once the Carto-Lab Docker container is restarted.

2. Persistent package installs

There are several options to do this. See the Jupyter introduction for a quick way to create a new environment.

If you make use of custom environments often, you may want to add a persistent bind mount to the Carto-Lab Docker configuration, pointing to a location outside the container. This folder can then be used inside the container to store persistent data such as custom environments.
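As a sketch, such a bind mount could be declared in docker-compose.yml like this (service name and the host path ${HOME}/envs are illustrative; adapt them to your setup):

```yaml
services:
  jupyterlab:
    volumes:
      # host path (left) persists across container restarts;
      # container path (right) is visible inside JupyterLab
      - ${HOME}/envs:/envs
```

Anything written to /envs inside the container then survives container resets and image updates.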

Create your own environment in a bind-mount and install the IPKernel

You can install additional environments to the /envs folder, which is bind-mounted to ${HOME}/envs by default via the environment variable CONDA_ENVS; see .env and docker-compose.yml.

Optionally, update CONDA_ENVS in .env to point to a bind path that suits your needs.
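For illustration, the corresponding entry in .env could look like this (the host path is an example):

```shell
CONDA_ENVS=${HOME}/envs
```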

In JupyterLab, install a new environment using the --prefix option:

  1. Open a terminal in Jupyter Lab, type bash
  2. Create an environment using conda; make sure to install ipykernel as a package (below, pip, numpy and pandas are used as example packages):

conda create \
    --prefix /envs/example_env \
    --channel conda-forge \
    pip numpy pandas ipykernel
conda activate /envs/example_env

  3. Afterwards, link the env kernel to Jupyter/IPython (this only needs to be done once):

/envs/example_env/bin/python \
    -m ipykernel install --user --name=example_env
conda deactivate

  4. Refresh with F5, open a notebook and select the new environment

Warning

  • Every time you reset or pull a new version of Carto-Lab Docker, you will need to re-link kernels
  • You are responsible for upgrading or backing up your environment; it is not maintained within the Docker container
  • Reproducibility is no longer guaranteed; you need your own workflow for sharing your dependency setup with others

Example: Create an environment with a specific R version

Prerequisite: You are using the rstudio tag of Carto-Lab Docker.


1. Open a new terminal in your Jupyter web interface

2. Activate the r_env

conda activate r_env

3. Get the current R-version

R --version

R version 4.4.1 (2024-06-14) -- "Race for Your Life"


4. Create a new R-Env with a custom R-Kernel version

Below, the specific version 4.2.3 is specified:

conda deactivate
conda create \
    --prefix /envs/custom_r_env \
    --channel conda-forge \
    r-base=4.2.3
conda activate /envs/custom_r_env
R --version

Example output:

R version 4.2.3 (2023-03-15) -- "Shortstop Beagle"


5. Link the custom env kernel to Jupyter

First, install the R kernel package from within R:

R
install.packages('IRkernel')

Exit the R session with CTRL+D.

Now, link the new custom R kernel with a Jupyter kernelspec:

# Add Carto-Lab jupyter's bin path to the end of PATH
export PATH="$PATH:/opt/conda/envs/jupyter_env/bin"

# Run the installspec command
Rscript -e "IRkernel::installspec(name='custom_r_env', displayname='Custom R', user=TRUE)"

# Deactivate R environment
conda deactivate

6. Verify

Refresh your browser with F5.

Create a new Jupyter notebook and select the new Custom R kernel.

Custom R Env

Start working with your custom R env!


7. Additional Steps

After each Carto-Lab Docker update, the custom kernel environment may need to be re-linked.

You can do this by including the following commands in an R cell at the top of your notebooks:

# Extend PATH so IRkernel can find jupyter
Sys.setenv(PATH = paste(Sys.getenv("PATH"), "/opt/conda/envs/jupyter_env/bin", sep = ":"))

# Link the kernel
IRkernel::installspec(name = "custom_r_env", displayname = "Custom R", user = TRUE)

8. Backup the Environment for Reproducibility

To preserve installed package versions, you can back up the environment using Conda:

In an R cell, run:

system("conda env export > custom_r_env.yml")

This generates a YAML file (custom_r_env.yml) that includes:

  • All Conda packages (including r-* R packages)
  • Version constraints
  • The name of the environment
  • The channels used to install the packages
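For orientation, an export for the environment above might look roughly like this (versions, package list, and the prefix path are illustrative, not real output):

```yaml
name: custom_r_env
channels:
  - conda-forge
dependencies:
  - r-base=4.2.3
  - r-irkernel=1.3
prefix: /envs/custom_r_env
```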

To restore the environment, open a terminal and run:

conda env create -f custom_r_env.yml

This recreates the environment with matching package versions (without exact build hashes).


For Exact Reproducibility:

If you require full reproducibility down to the exact build hash (e.g. for archival or deployment), use:

system("conda list --explicit > custom_r_env.txt")

To restore:

conda create --name restored_env --file custom_r_env.txt

This installs exact builds (requires the original channels to still be available).

Tip

Add your custom_r_env.txt and custom_r_env.yml to git to track changes and version your dependencies.

Further options for package installation

For specific purposes, a number of alternatives are possible.

Multi-stage Dockerfile

If you need specific dependencies and always want the most recent updates, create a chained Dockerfile based on this image. Have a look at how we implemented chaining in the mapnik/Dockerfile:

ARG VERSION=latest

FROM registry.gitlab.vgiscience.org/lbsn/tools/jupyterlab:$VERSION

ENV PYTHON_BINDINGS=" \
    autoconf \
    apache2-dev \
    libtool \
    libxml2-dev \
    libbz2-dev \
    libgeos-dev \
    libgeos++-dev \
    gdal-bin \
    python3 \
    python3-pip \
    python3-mapnik \
    python3-psycopg2 \
    python3-yaml"

RUN apt-get update \
    && apt-get install -y --no-install-recommends \
        $PYTHON_BINDINGS \
    && apt-get clean && rm -rf /var/lib/apt/lists/*

RUN git clone --depth 1 \
        https://gitlab.vgiscience.de/ad/mapnik_cli.git \
        /mapnik_cli \
    && /usr/bin/python3 -m pip config set global.break-system-packages true \
    && /usr/bin/python3 -m pip install --no-cache-dir \
        --no-dependencies --editable /mapnik_cli

Persistent modification of worker_env

  • Edit the environment.yml
  • Rebuild and start the image with docker compose -f docker-compose.build.yml build --no-cache && docker compose up -d --force-recreate
  • Make sure you are running your local image, not the remote one

Add your own environment.yml

In .env, update the environment file to use when building worker_env, e.g.:

ENVIRONMENT_FILE=envs/environment_custom.yml

Afterwards, rebuild the Docker container with docker compose -f docker-compose.build.yml build.

  • Make sure that the path is within the repository

  • Use a symlink/hardlink to include environment.yml files from elsewhere

  • The env/ directory is excluded from git through .gitignore
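A minimal custom environment file might look like this (the package selection is purely illustrative):

```yaml
name: worker_env
channels:
  - conda-forge
dependencies:
  - python=3.11
  - pip
  - numpy
  - pandas
  - ipykernel
```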

Add selenium and webdriver

The base container is kept lightweight and comes without a webdriver.

If you need a webdriver (e.g. for SVG output in Bokeh), either update the Dockerfile or temporarily install Selenium and a matching Chromedriver.

Manual Steps: Chrome

  1. Install Selenium:

conda activate worker_env
conda install selenium webdriver-manager -c conda-forge

  2. Install Chrome:

apt-get update && apt-get install -y gnupg2 zip wget
curl -sS -o - https://dl-ssl.google.com/linux/linux_signing_key.pub | apt-key add -
echo "deb [arch=amd64] http://dl.google.com/linux/chrome/deb/ stable main" >> /etc/apt/sources.list.d/google-chrome.list
apt-get -y update
apt-get -y install google-chrome-stable

  3. Optional: Install Chromedriver

This is an optional step, since webdriver_manager will automatically install the matching Chromedriver (see below).

Get the Chrome version and install the matching Chromedriver:

google-chrome --version

Example output:

Google Chrome 104.0.5112.101

  • go to the ChromeDriver downloads page
  • click on the matching version:
    • If you are using Chrome version 104, please download ChromeDriver 104.0.5112.79

  • copy the path to chromedriver_linux64.zip
cd /tmp/
wget https://chromedriver.storage.googleapis.com/104.0.5112.79/chromedriver_linux64.zip
unzip chromedriver_linux64.zip
mv chromedriver /usr/bin/chromedriver
chown root:root /usr/bin/chromedriver
chmod +x /usr/bin/chromedriver
  4. Use in Jupyter:
from bokeh.io import export_svgs
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager

options = webdriver.ChromeOptions()
options.add_argument('--headless')
options.add_argument('--disable-gpu')
options.add_argument("--no-sandbox")
options.add_argument("--window-size=2000x2000")
options.add_argument('--disable-dev-shm-usage')

# webdriver_manager downloads the Chromedriver matching the installed Chrome
service = Service(ChromeDriverManager().install())
driver = webdriver.Chrome(service=service, options=options)

# Export svg in Bokeh/Holoviews (hv = holoviews; output is a pathlib.Path)
p = hv.render(my_layers, backend='bokeh')
p.output_backend = "svg"
export_svgs(p,
    filename=output / 'svg' / 'graphic.svg',
    webdriver=driver)

Note that --disable-dev-shm-usage is necessary for Chrome to work inside Docker.