Histology in Google Cloud Storage

PHAS supports histology data stored in Google Cloud Storage (GCS) buckets. This works well when PHAS itself is hosted in Google Cloud, because the latency between the data and the PHAS server is then relatively low.

This document assumes that you are familiar with GCS and have installed the gcloud CLI and the gsutil utility on your system.

Your data organization should follow the pattern suggested in Organizing Histology Data, but inside a GCS bucket. For example, an organized dataset in the bucket histo_bucket might look like this:

> gsutil ls -R gs://histo_bucket
gs://histo_bucket/S1/histo_raw/S1_slide_001.svs
gs://histo_bucket/S1/histo_raw/S1_slide_001.json
gs://histo_bucket/S1/histo_proc/S1_slide_001/preproc/S1_slide_001_thumbnail.tiff
gs://histo_bucket/S1/histo_proc/S1_slide_001/preproc/S1_slide_001_metadata.json
gs://histo_bucket/S1/histo_raw/S1_slide_002.svs
...
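
As a quick sanity check of the layout, the object names from a listing like the one above can be grouped by specimen. The following is a minimal sketch, not part of PHAS: the function name group_raw_slides and its parameters are hypothetical, and the object names are hard-coded (relative to gs://histo_bucket) rather than fetched from GCS.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def group_raw_slides(object_names, raw_dir="histo_raw", slide_exts=("svs",)):
    """Map each specimen to the sorted slide names found in its raw folder."""
    slides = defaultdict(set)
    for name in object_names:
        path = PurePosixPath(name)
        # Only consider objects laid out as <specimen>/<raw_dir>/<slide>.<ext>
        if len(path.parts) != 3 or path.parts[1] != raw_dir:
            continue
        if path.suffix.lstrip(".").lower() in slide_exts:
            slides[path.parts[0]].add(path.stem)
    return {specimen: sorted(names) for specimen, names in slides.items()}


# Hypothetical listing: object names relative to gs://histo_bucket
listing = [
    "S1/histo_raw/S1_slide_001.svs",
    "S1/histo_raw/S1_slide_001.json",
    "S1/histo_proc/S1_slide_001/preproc/S1_slide_001_metadata.json",
    "S1/histo_raw/S1_slide_002.svs",
]
print(group_raw_slides(listing))
```

Sidecar .json files and anything under histo_proc are skipped, so the result lists only the raw slides found for each specimen.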

Obtaining and Using a GCP Key

In order for PHAS to access your bucket, you need to create a GCP service account and obtain a key for it.

Go to https://console.cloud.google.com/iam-admin/serviceaccounts
Create a service account with read access to your project (grant it the Storage Object Viewer and Viewer roles).
Create a JSON key for your service account. This downloads a file with a name like my-project-7881856a8832.json.
Save this file into a folder in the PHAS instance directory.

For PHAS to be able to connect to Google Cloud, the environment variable GOOGLE_APPLICATION_CREDENTIALS must be set to the full path of the key, e.g., /home/foo/phas/instance/secrets/my-project-7881856a8832.json. Set it in the env.sh file and, if you are using uwsgi, also in the phas_uwsgi.ini file. A simple test of the connection to GCP is to run the code below. It should run without errors and print the names of all of your buckets.

cd /home/foo/phas
source env.sh
python -c "from google.cloud import storage; c=storage.Client(); print([b.name for b in c.list_buckets()])"
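
If the test above fails, it can help to first rule out problems with the key file itself, without contacting GCP. The following is a minimal sketch using only the Python standard library. The helper check_key_file and the choice of fields to check are assumptions made for illustration and are not part of PHAS or of the Google client library.

```python
import json
import os

# Fields normally present in a service account key file (not exhaustive)
REQUIRED_FIELDS = ("type", "project_id", "private_key", "client_email")


def check_key_file(path):
    """Return a list of problems found with the key file (empty if none)."""
    if not path:
        return ["GOOGLE_APPLICATION_CREDENTIALS is not set"]
    if not os.path.isabs(path):
        return [f"path is not absolute: {path}"]
    if not os.path.isfile(path):
        return [f"file not found: {path}"]
    try:
        with open(path, encoding="utf-8") as f:
            key = json.load(f)
    except (OSError, json.JSONDecodeError) as exc:
        return [f"cannot read key as JSON: {exc}"]
    if not isinstance(key, dict):
        return ["key file does not contain a JSON object"]
    problems = [f"missing field: {name}" for name in REQUIRED_FIELDS if name not in key]
    if "type" in key and key["type"] != "service_account":
        problems.append(f"unexpected key type: {key['type']}")
    return problems


problems = check_key_file(os.environ.get("GOOGLE_APPLICATION_CREDENTIALS"))
print("\n".join(problems) if problems else "key file looks plausible")
```

This only checks that the variable points to a readable, plausibly structured key. It does not verify that the service account actually has access to your bucket.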

Project Descriptor JSON

Once you have organized the data in this manner, you can create a PHAS project as described in PICSL Histology Annotation Service Quick Start. For the organization used in the example above, the project descriptor JSON file would look like this. It is identical to the one used for local data organization, except that base_url uses the gs:// prefix to point to your bucket.

{
    "base_url": "gs://histo_bucket",
    "disp_name": "Some Project",
    "desc": "Project demonstrating data organization in PHAS",
    "manifest_mode": "individual_json",
    "url_schema": {
        "pattern": {
            "raw": "{specimen}/histo_raw/{slide_name}.{slide_ext}",
            "thumb": "{specimen}/histo_proc/{slide_name}/preproc/{slide_name}_thumbnail.tiff",
            "metadata": "{specimen}/histo_proc/{slide_name}/preproc/{slide_name}_metadata.json"
        },
        "raw_slide_ext": [ "svs" ]
    }
}
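
To see how the patterns relate to the bucket listing shown earlier, the placeholders can be filled in by hand. This is only an illustrative sketch: it assumes that a pattern is expanded by simple placeholder substitution and then appended to base_url, which may differ from how PHAS resolves URLs internally. The function expand_urls is hypothetical.

```python
def expand_urls(base_url, patterns, **fields):
    """Fill the placeholders of each pattern and prefix the result with base_url."""
    base = base_url.rstrip("/")
    return {
        key: f"{base}/{pattern.format(**fields)}"
        for key, pattern in patterns.items()
    }


# Two of the patterns from the descriptor above
patterns = {
    "raw": "{specimen}/histo_raw/{slide_name}.{slide_ext}",
    "metadata": "{specimen}/histo_proc/{slide_name}/preproc/{slide_name}_metadata.json",
}

urls = expand_urls(
    "gs://histo_bucket",
    patterns,
    specimen="S1",
    slide_name="S1_slide_001",
    slide_ext="svs",
)
for key, url in urls.items():
    print(key, url)
```

For specimen S1 and slide S1_slide_001, the expanded URLs should correspond to objects in the example listing. If they do not, the descriptor and the bucket layout are out of sync.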