Miscellaneous scripts and documentation from the DNAstack bioinformatics team.
Recursively build Docker images in a given directory.
Each target image is defined via the presence of a build.env file, which is used to specify the name and version tag for the corresponding Docker image. It must contain at minimum the following variables:
- IMAGE_NAME
- IMAGE_TAG
All variables defined in the build.env file will be made available as build arguments during Docker image build.
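As a rough illustration of how build.env variables could become build arguments, the sketch below sources a throwaway build.env and turns each assignment into a --build-arg flag. It is not the actual build_docker_images implementation; the sed-based variable discovery and the printed docker build command are assumptions made for the example, and values containing spaces are not handled.

```shell
# Hypothetical sketch, not the real build_docker_images logic.
# Create a throwaway build.env for the demonstration.
workdir=$(mktemp -d)
cat > "${workdir}/build.env" <<'EOF'
# Tool versions
BWA_VERSION=0.7.8
# Image info
IMAGE_NAME=bwa
IMAGE_TAG=${BWA_VERSION}
EOF

# Source the file so that references such as ${BWA_VERSION} are expanded.
. "${workdir}/build.env"

# Collect one --build-arg flag per variable assigned in the file.
build_args=""
for name in $(sed -n 's/^\([A-Za-z_][A-Za-z0-9_]*\)=.*/\1/p' "${workdir}/build.env"); do
    eval "value=\${${name}}"
    build_args="${build_args} --build-arg ${name}=${value}"
done

# Print (rather than run) the resulting command.
echo "docker build${build_args} -t ${IMAGE_NAME}:${IMAGE_TAG} ."
# docker build --build-arg BWA_VERSION=0.7.8 --build-arg IMAGE_NAME=bwa --build-arg IMAGE_TAG=0.7.8 -t bwa:0.7.8 .
rm -rf "${workdir}"
```

Inside a Dockerfile, each of these values would only be visible after a matching ARG instruction (for example, ARG BWA_VERSION).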
The IMAGE_TAG variable can be built using other variables defined in the build.env file, as long as those other variables are defined before the IMAGE_TAG definition. For example, the following IMAGE_TAG would be set to 0.7.8_1.15:
# Tool versions
BWA_VERSION=0.7.8
SAMTOOLS_VERSION=1.15
# Image info
IMAGE_NAME=bwa_samtools
IMAGE_TAG=${BWA_VERSION}_${SAMTOOLS_VERSION}
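The ordering requirement can be checked with a small experiment. The sketch below assumes the build.env is evaluated top to bottom like a sourced shell file (an assumption about how the script reads it); resolve_image_tag is a hypothetical helper written for this example, not part of the repository.

```shell
# Hypothetical helper: print the IMAGE_TAG obtained by sourcing a build.env
# in a clean subshell. Unset variables expand to the empty string.
resolve_image_tag() {
    (
        set +u
        unset BWA_VERSION SAMTOOLS_VERSION IMAGE_TAG
        . "$1"
        printf '%s\n' "${IMAGE_TAG}"
    )
}

workdir=$(mktemp -d)

# Correct order: tool versions are defined before IMAGE_TAG.
cat > "${workdir}/good.env" <<'EOF'
BWA_VERSION=0.7.8
SAMTOOLS_VERSION=1.15
IMAGE_NAME=bwa_samtools
IMAGE_TAG=${BWA_VERSION}_${SAMTOOLS_VERSION}
EOF

# Wrong order: IMAGE_TAG is defined before the variables it references.
cat > "${workdir}/bad.env" <<'EOF'
IMAGE_NAME=bwa_samtools
IMAGE_TAG=${BWA_VERSION}_${SAMTOOLS_VERSION}
BWA_VERSION=0.7.8
SAMTOOLS_VERSION=1.15
EOF

good_tag=$(resolve_image_tag "${workdir}/good.env")
bad_tag=$(resolve_image_tag "${workdir}/bad.env")
echo "good: ${good_tag}"   # good: 0.7.8_1.15
echo "bad:  ${bad_tag}"    # bad:  _
rm -rf "${workdir}"
```

In the second file the tag collapses to a bare underscore because both version variables are still empty when IMAGE_TAG is assigned.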
The Dockerfile corresponding to a build.env must either:
- Be named Dockerfile and exist in the same directory as the build.env file, or
- Be specified using the DOCKERFILE variable in the build.env file (e.g. for image builds that reuse a common Dockerfile). The path to the Dockerfile should be relative to the directory containing the build.env file.
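The two Dockerfile lookup rules above can be sketched as a single fallback expression. resolve_dockerfile is a hypothetical helper written for this example, and the directory names are made up; the real script may resolve the path differently.

```shell
# Hypothetical helper: print the Dockerfile path for a given build.env.
# Uses DOCKERFILE (relative to the build.env directory) when it is set,
# otherwise falls back to a file named Dockerfile next to the build.env.
resolve_dockerfile() {
    build_env=$1
    build_dir=$(dirname "${build_env}")
    (
        unset DOCKERFILE
        . "${build_env}"
        printf '%s\n' "${build_dir}/${DOCKERFILE:-Dockerfile}"
    )
}

# Two throwaway image definitions: one default, one sharing a common Dockerfile.
workdir=$(mktemp -d)
mkdir -p "${workdir}/bwa" "${workdir}/samtools"
printf 'IMAGE_NAME=bwa\nIMAGE_TAG=0.7.8\n' > "${workdir}/bwa/build.env"
printf 'IMAGE_NAME=samtools\nIMAGE_TAG=1.15\nDOCKERFILE=../common/Dockerfile\n' \
    > "${workdir}/samtools/build.env"

default_path=$(resolve_dockerfile "${workdir}/bwa/build.env")
shared_path=$(resolve_dockerfile "${workdir}/samtools/build.env")
echo "${default_path#"${workdir}"/}"   # bwa/Dockerfile
echo "${shared_path#"${workdir}"/}"    # samtools/../common/Dockerfile
rm -rf "${workdir}"
```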
These variables have special meanings when the Docker image is being built. † specifies that the variable is required.
- IMAGE_NAME †: specifies the name of the built image
- IMAGE_TAG †: specifies the tag of the built image
- NOBUILD: When set to true, skip building this image
- DOCKERFILE: Specify an alternate path to a Dockerfile, relative to the build.env file. If left undefined, the Dockerfile must be named Dockerfile and be present at the same directory level as the corresponding build.env file.
- CONDA_ENVIRONMENT_TEMPLATE: For conda-based images, set this to the path (relative to the build.env file) to the conda environment template file. This file may have variables set in place of tool versions and will be populated with the variables from build.env at build time.
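To make the CONDA_ENVIRONMENT_TEMPLATE idea concrete, the sketch below fills a template from build.env variables. The template file name, the ${VAR} placeholder syntax, and the eval/heredoc rendering are all assumptions made for illustration; the actual script may use a different placeholder format or tool. Because eval executes any command substitutions in the template, this approach is only suitable for trusted files.

```shell
# Hypothetical sketch of template population; not the real implementation.
workdir=$(mktemp -d)
cat > "${workdir}/build.env" <<'EOF'
SAMTOOLS_VERSION=1.15
IMAGE_NAME=samtools
IMAGE_TAG=${SAMTOOLS_VERSION}
CONDA_ENVIRONMENT_TEMPLATE=samtools.conda_env.template.yaml
EOF

# Template with a variable in place of the tool version (assumed syntax).
cat > "${workdir}/samtools.conda_env.template.yaml" <<'EOF'
name: samtools
channels:
  - bioconda
dependencies:
  - samtools=${SAMTOOLS_VERSION}
EOF

# Load the variables, then locate the template relative to the build.env.
. "${workdir}/build.env"
template="${workdir}/${CONDA_ENVIRONMENT_TEMPLATE}"

# Expand ${...} references by evaluating the template as a heredoc body.
rendered=$(eval "cat <<RENDER_EOF
$(cat "${template}")
RENDER_EOF")
printf '%s\n' "${rendered}"
rm -rf "${workdir}"
```

The rendered output ends with the line `  - samtools=1.15`, with the version taken from build.env.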
# build Docker images found in a directory (and its subdirectories)
./build_docker_images -d docker/some/path
# build all Docker images defined in the `docker` directory using the provided container registry
./build_docker_images -d docker -c dnastack
# build and push all Docker images defined in the `docker` directory to the provided container registry
./build_docker_images -d docker -c dnastack -p
# build an image targeting AMD-64 from an ARM-based Mac (requires [docker buildx](https://github.com/docker/buildx))
./build_docker_images -d docker -x
# Build an image with global build args; useful for not committing secrets to git, but getting them into the built image
./build_docker_images -d docker -b ARG1=value1,ARG2=value2
chunk_bam -b bam_file [-n n_chunks]
Chunk a BAM into n_chunks. Splits the BAM by BGZF block. Defaults to 6 output chunks.
Output chunks will be materialized in the pwd, named ${BAM_BASENAME}.chunk_${chunk_index}.bam.
# Chunk an input bam into 6 chunks (the default)
./chunk_bam -b sampleA.bam
## Output
├── sampleA.chunk_0.bam
├── sampleA.chunk_1.bam
├── sampleA.chunk_2.bam
├── sampleA.chunk_3.bam
├── sampleA.chunk_4.bam
└── sampleA.chunk_5.bam
# Chunk an input bam into 2 chunks
./chunk_bam -b sampleA.bam -n 2
## Output
├── sampleA.chunk_0.bam
└── sampleA.chunk_1.bam
Get the status for recently run workflows. Interacts with Cromwell running in server mode. Set and export CROMWELL_URL to change the location where your Cromwell is running in server mode (defaults to localhost:8000).
The output columns are workflow name, workflow ID, and workflow status.
# Use a different Cromwell URL (default: localhost:8000)
export CROMWELL_URL=localhost:9000
# Get information about the last 10 workflows to run
getIDs
# Get information about the last 30 workflows to run
getIDs 30
# Get information about the last 5 workflows that are currently running (there may not be 5 running currently)
getIDs -r 5
Get metadata for a Cromwell workflow run. Interacts with Cromwell running in server mode. Set and export CROMWELL_URL to change the location where your Cromwell is running in server mode (defaults to localhost:8000).
# Use a different Cromwell URL (default: localhost:8000)
export CROMWELL_URL=localhost:9000
# set this to a specific workflow ID you want info about, e.g. check the output of getIDs
workflow_id=1c0c4711-c5d6-46dc-b176-c52fb612c1c6
# Get information about a workflow
getMeta ${workflow_id}
# Save the metadata from a workflow to a file (named ${workflow_id}.meta.json)
getMeta -s ${workflow_id}
# Get raw metadata; retrieve workflow inputs using `jq`
getMeta -r ${workflow_id} | jq '.inputs'
# Get workflow status
getMeta -r ${workflow_id} | jq '.status'
# List task calls
getMeta -r ${workflow_id} | jq '.calls | keys'
# Get information about a specific call
# Note that the call key must be double quoted since it contains a `.` character
# The format of call keys is workflow_name.task_name
getMeta -r ${workflow_id} | jq '.calls."hello_world.say_hello"'
# Get the location of the stderr file for a specific call
getMeta -r ${workflow_id} | jq '.calls."hello_world.say_hello"[0].stderr'
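For orientation, the CROMWELL_URL default described above can be sketched as a shell fallback. cromwell_metadata_url is a hypothetical helper, not the real getMeta; the http:// scheme and the use of Cromwell's /api/workflows/v1/<id>/metadata endpoint are assumptions about what such a wrapper would call.

```shell
# Hypothetical helper: build the metadata URL for a workflow ID,
# falling back to localhost:8000 when CROMWELL_URL is unset or empty.
cromwell_metadata_url() {
    printf 'http://%s/api/workflows/v1/%s/metadata\n' \
        "${CROMWELL_URL:-localhost:8000}" "$1"
}

workflow_id=1c0c4711-c5d6-46dc-b176-c52fb612c1c6
url=$(cromwell_metadata_url "${workflow_id}")
echo "${url}"

# With a Cromwell server running, the raw metadata could then be fetched with:
#   curl -s "${url}" | jq '.status'
```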