Important Material:

  • GitHub Repository: Contains an example algorithm, JSON schemas for the individual data sources, and an extensive README to help you get started.
  • HANCOCK download page: Here you can download the training data for this challenge.

Data

The data in the challenge differs slightly from the data available on our download page, because Grand Challenge does not support certain data formats. The example-algorithm already reads all data sources for you, but there are still some things to consider.

Clinical Data

The clinical data can be downloaded as part of the Structured Data. During the challenge you only receive a single JSON dictionary. This differs from the test data, where all patients are stored in an array. In the challenge the clinical data will look like this:

{
    "patient_id": "001",
    "year_of_initial_diagnosis": 2015,
    "age_at_initial_diagnosis": 65,
    "sex": "male",
    "smoking_status": "former",
    "primarily_metastasis": "no",
    "days_to_last_information": 2258,
    "first_treatment_intent": "curative",
    "first_treatment_modality": "local surgery",
    "days_to_first_treatment": 28,
    "adjuvant_treatment_intent": "curative",
    "adjuvant_radiotherapy": "yes",
    "adjuvant_radiotherapy_modality": "percutaneous radiotherapy",
    "adjuvant_systemic_therapy": "yes",
    "adjuvant_systemic_therapy_modality": "fluorouracil + cisplatin",
    "adjuvant_radiochemotherapy": "yes"
}

[!warning] This means you will not have access to the following columns:
  • survival_status, survival_status_with_cause
  • recurrence, days_to_recurrence
  • progress_{n}, days_to_progress_{n}
  • metastasis_{n}_locations, days_to_metastasis_{n}

Please consider this for training your model.
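
One way to respect this during training is to strip the unavailable columns from the downloaded records before building features. The following is a minimal sketch, assuming the downloaded clinical data is a list of patient dictionaries; the helper names, the regular expressions for the numbered columns, and the sample values are illustrative and not part of the example-algorithm.

```python
import re

# Columns listed in the warning above; numbered columns are matched by pattern.
UNAVAILABLE_EXACT = {
    "survival_status",
    "survival_status_with_cause",
    "recurrence",
    "days_to_recurrence",
}
UNAVAILABLE_PATTERNS = [
    re.compile(r"^progress_\d+$"),
    re.compile(r"^days_to_progress_\d+$"),
    re.compile(r"^metastasis_\d+_locations$"),
    re.compile(r"^days_to_metastasis_\d+$"),
]


def is_unavailable(column: str) -> bool:
    """True if the column will be missing from the challenge input."""
    return column in UNAVAILABLE_EXACT or any(
        pattern.match(column) for pattern in UNAVAILABLE_PATTERNS
    )


def to_challenge_view(record: dict) -> dict:
    """Keep only the columns that are also present during the challenge."""
    return {key: value for key, value in record.items() if not is_unavailable(key)}


# Made-up training record, for illustration only.
training_records = [
    {
        "patient_id": "001",
        "age_at_initial_diagnosis": 65,
        "survival_status": "living",
        "recurrence": "no",
        "progress_1": "no",
        "days_to_metastasis_1": None,
    }
]
challenge_like = [to_challenge_view(record) for record in training_records]
print(challenge_like)
```

Filtering by name rather than by position keeps the sketch independent of the column order in the downloaded files.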

Pathological Data

The pathological data can be downloaded as part of the Structured Data. During the challenge you only receive a single JSON dictionary. This differs from the test data, where all patients are stored in an array. In the challenge the pathological data will look like this:

{
    "patient_id": "001",
    "primary_tumor_site": "Hypopharynx",
    "pT_stage": "pT4a",
    "pN_stage": "pN2b",
    "grading": "G3",
    "hpv_association_p16": "not_tested",
    "number_of_positive_lymph_nodes": 3.0,
    "number_of_resected_lymph_nodes": 61,
    "perinodal_invasion": "yes",
    "lymphovascular_invasion_L": "yes",
    "vascular_invasion_V": "no",
    "perineural_invasion_Pn": "no",
    "resection_status": "R0",
    "resection_status_carcinoma_in_situ": "CIS Absent",
    "carcinoma_in_situ": "no",
    "closest_resection_margin_in_cm": "<0.1",
    "histologic_type": "SCC_Basaloid",
    "infiltration_depth_in_mm": 19.0
}
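
Note that some fields mix types: in the example above, closest_resection_margin_in_cm is the string "<0.1" rather than a number. The following sketch shows one possible way to turn such a value into a number plus a flag. It assumes that other patients may carry plain numeric values or null in this field, which is not confirmed by the example; the function name is hypothetical.

```python
from typing import Optional, Tuple


def parse_margin_cm(raw) -> Tuple[Optional[float], bool]:
    """Return (value, is_upper_bound) for a margin such as "<0.1" or 0.3.

    is_upper_bound is True when the raw value started with "<", meaning the
    true margin is smaller than the returned number.
    """
    if raw is None:
        return None, False
    text = str(raw).strip()
    is_upper_bound = text.startswith("<")
    if is_upper_bound:
        text = text[1:].strip()
    try:
        return float(text), is_upper_bound
    except ValueError:
        # Unparseable entries are treated as missing.
        return None, False


print(parse_margin_cm("<0.1"))
print(parse_margin_cm(0.3))
print(parse_margin_cm(None))
```

Keeping the flag as a separate feature avoids silently treating "<0.1" as an exact measurement of 0.1 cm.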

Blood Data

The blood data can be downloaded as part of the Structured Data. The blood data is in array format, and each entry in the array corresponds to a single blood value. Each blood value has the following columns:
  • patient_id: Identifier of the patient
  • value: The measured value
  • unit: Unit of the measured value (can be null)
  • analyte_name: Short name of the analyte
  • LOINC_code: Logical Observation Identifiers Names and Codes
  • LOINC_name: LOINC Long Common Name
  • group: Analyte group
  • days_before_first_treatment: Number of days before the surgery, i.e. 0 corresponds to the surgery day and 1 corresponds to one day before the surgery

In the challenge the blood data will look like this:

[
  {
    "patient_id": "001",
    "value": 0.8899999857,
    "unit": null,
    "analyte_name": "INR",
    "LOINC_code": "34714-6",
    "LOINC_name": "INR in Blood by Coagulation assay",
    "group": "Routine",
    "days_before_first_treatment": 0
  },
  {
    "patient_id": "001",
    "value": 1.2599999905,
    "unit": "x10^3/µl",
    "analyte_name": "Lymphocytes",
    "LOINC_code": "26474-7",
    "LOINC_name": "Lymphocytes [#/volume] in Blood",
    "group": "Hematology",
    "days_before_first_treatment": 0
  },
  {
    "patient_id": "001",
    "value": 0.8999999762,
    "unit": "%",
    "analyte_name": "Eosinophils %",
    "LOINC_code": "26450-7",
    "LOINC_name": "Eosinophils/100 leukocytes in Blood",
    "group": "Hematology",
    "days_before_first_treatment": 0
  }
]
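
Because every array entry is a single measurement, most models need the array reshaped into one value per analyte. The sketch below keeps, for each analyte_name, the measurement closest to the treatment day. It assumes that days_before_first_treatment is non-negative, so the smallest value is the closest one; the function name and the second INR entry are made up for illustration.

```python
def latest_values_per_analyte(blood: list) -> dict:
    """Map analyte_name -> value, keeping the measurement closest to treatment."""
    best = {}  # analyte_name -> (days_before_first_treatment, value)
    for entry in blood:
        if entry["value"] is None:
            continue  # skip measurements without a value
        name = entry["analyte_name"]
        days = entry["days_before_first_treatment"]
        if name not in best or days < best[name][0]:
            best[name] = (days, entry["value"])
    return {name: value for name, (_, value) in best.items()}


blood = [
    {"analyte_name": "INR", "value": 0.89, "days_before_first_treatment": 0},
    {"analyte_name": "INR", "value": 1.10, "days_before_first_treatment": 3},
    {"analyte_name": "Lymphocytes", "value": 1.26, "days_before_first_treatment": 0},
]
print(latest_values_per_analyte(blood))
```

Keying on LOINC_code instead of analyte_name would work the same way and may be more robust if analyte names vary.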

Text Data in English

The text data in English can be downloaded as part of the Text Data. In the challenge this will be a single JSON file with dictionary structure. The available columns are:
  • patient_id: Identifier of the patient
  • history: Medical histories
  • report: Report of the surgery
  • description: Short description of the surgery

In the challenge the text data will look like this:

{
    "patient_id": "001",
    "history": "In the patient, a G3-differentiated cT3 hypopharyngeal laryngeal carcinoma of the right side was histologically confirmed <2015>. Clinically as well as sonographically and computed tomography cT3 cN2b. In addition, nodular goiter on both sides with nodes > 2 cm each. All common tumor options (surgery versus primary radiochemotherapy) were discussed in detail with the patient in advance. There is now an indication for surgical treatment in the form of a laryngectomy with partial pharyngectomy, neck dissection on both sides and hemi-thyroidectomy. The patient had ample opportunity to ask questions about the procedure before the operation.",
    "report": "After active patient identification, the patient is brought into the operating theater. Carrying out the team time-out. Introductory consultation with the anesthesia department. Induction of anesthesia and orotracheal intubation of the patient. Initial positioning of the patient by the surgeon. First, a new panendoscopy is performed to plan the surgical procedure. Insertion of the mouth guard and insertion of the size C small bore tube. This confirms the size expansion described above. Subsequent placement of a nasogastric feeding tube without any problems. Positioning of the patient in head reclination. Superficial skin disinfection. Infiltration anesthesia with a total of 15 ml xylocaine with adrenaline added in the area of the incision for the planned apron flap. Subsequent ablation of the surgical area and sterile draping. First mark the planned incision. Skin incision and lifting of a broad-based apron flap, with strictly subplatysmal preparation. Insertion of the chain dog. First start of neck dissection on the right side. Exposure of the sternocleidomastoid muscle and the omohyoid muscle as the caudal border and the posterior digastric venter muscle as the cranial border. Expose the capsule of the submandibular gland. Turning to the cervical vascular sheath. Exposure of the internal jugular vein and the venous angle including protection of the facial vein. Exposure of the common carotid artery, the bifurcation and the internal and external carotid artery. Exposure of the accessorius nerve and the hypoglossal nerve. The cervical artery cannot be spared intraoperatively. An extensive metastatic conglomerate can now be seen in levels II and III up to just before level IV, which appears to be firmly attached to the surrounding area. The soft tissue metastasis can only be separated from the internal jugular vein with difficulty. The metastatic conglomerate extends very far medially and reaches the vagus nerve. 
Therefore, first relocation, neurolysis and re-embedding of the vagus nerve. This can be spared. Intraoperatively, the border cord appears to be firmly attached to the metastatic conglomerate. First develop the lateral neck preparation, whereby the cervical plexus branches can largely be spared here. Turn to the medial neck preparation. The hypoglossal nerve is also exposed here. Displacement, neurolysis and re-embedding of the hypoglossal nerve. Protection of the nerve. Successive development of the medial neck preparation. Moving on to the left side: the same procedure in principle here. First expose the sternocleidomastoid muscle and the omohyoid muscle as the caudal border and the posterior digastric venter muscle as the cranial border. Turning to the cervical vascular sheath. Exposure of the internal jugular vein from the caudal to the cranial border. Exposure of the facial vein and protection of the same. Exposure of the common carotid artery, the bifurcation and the internal and external carotid artery. Exposure of the accessorius nerve, displacement, neurolysis and re-embedding of the same. Protection of the nerve. Development of the lateral neck preparation while sparing the cervical plexus branches. Exposure of the submandibular gland capsule and the hypoglossal nerve. Displacement, neurolysis and re-embedding of the nerve. Development of the medial neck preparation. Transition to visualization of the hypoid. Subsequent separation of the infrahyoid muscles using the electric knife. The sternohyoid muscle is cut caudally. Then supply the upper laryngeal bundle on both sides. Exposure of the thyroid cartilage horn on the left side. Incision of the perichondrium and release of the left-sided piriform sinus with the raspatory as well as with the pedicle swab. Subsequent exposure of the trachea and the thyroid isthmus. Dissection of the isthmus and repositioning of the isthmus. Exposure, displacement, neurolysis and re-embedding of the vagus nerve. 
This shows the large thyroid nodus described by sonography and computed tomography. Due to the prominent left-sided thyroid lobe and the slightly suspicious intraoperative appearance of the nodule, the decision was made to perform a hemithyroidectomy on the left side. Care is taken to preserve the epithelial bodies. The ventral trachea is then exposed. Proceed to the tracheotomy. Enter below the 1st tracheal clasp and transfer to an LE tube. Then open the pharynx infrahyoidally in the median line. Expose the upper edge of the epiglottis. Grasp the epiglottis and dislocate it through a relatively small pharyngotomy. You now have a good view of the extent of the tumor. First start resection in the area of the left-sided aryepiglottic fold. Mucosa-sparing incision up to just before the arytenoid cartilage. Then widen the incision caudally and in the area of the postcricoid. Hemostasis of several small bleedings from the laryngopharyngeal plexus. Subsequent counter-preparation from the left side. Here, too, the incision is made in the area of the aryepiglottic fold, whereby this is extended into the pharynx due to the tumor extension into the right-sided piriform sinus. In between, further release of the laryngeal skeleton with the attached soft tissue cuff. Deposition of the trachea at the level of the 1st tracheal cartilage and resection of the larynx and the right-sided piriform sinus in toto. The specimen is sent in thread-marked for definitive histology. Meticulous inspection of the tumor specimen and subsequent removal of corresponding marginal samples, which are sent for intraoperative frozen section diagnostics. In the meantime, meticulous hemostasis. Announcement of the intraoperative frozen section diagnosis: This shows a tumor-free resection margin, which is why an R0 situation can be assumed intraoperatively. Perform a rather restrained myotomy in the area of the upper esophageal sphincter so as not to provoke reflux problems postoperatively. 
The placement of a Provox prosthesis is deliberately avoided for the time being due to the relatively narrow remaining pharyngeal tube. Dissection of the left-sided sternal part of the sternocleidomastoid muscle. This results in slightly more severe venous bleeding, which can be stopped with the aid of three inversion stitches. Then perform the inverting pharyngeal suture with Vicryl 3.0 RB-1 using the single-button suture technique. This is performed in a T-shaped configuration. Extremely meticulous work is carried out at the base of the tongue and caudally at the level of the stoma in order to further reduce the risk of fistula formation at these predilection sites. Grasping the lateral musculature and suturing it using a continuous overlocking suture technique in the sense of a second layer. Application of a layer of TachoSil. Grasping of the remaining peripheral soft tissue, which is additionally stitched over the pharyngeal suture as a fourth layer. The sternohyoid muscle, which has previously been folded caudally, is also stitched over the wound bed. Then irrigate the wound with H2O2 and Ringer's solution. Insertion of two 10-gauge Redon drains and circular epithelialization of the tracheostoma. Subcutaneous suturing with Vicryl 4.0 and skin suturing with Ethilon 5.0. Application of a pressure dressing and transfer of the patient to a 10-gauge high-volume low-pressure cannula. Completion of the procedure without complications. Final consultation with the anesthesia department. The patient received 3 g Unacid i.v. intraoperatively. Antibiotics should be continued postoperatively for five days.",
    "description": "TU resection, Neck dissection bilateral\n"
}

The text data is only available for some patients, so it can be null. Please consider this during training.
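
A defensive reader for the text data could look like the sketch below. It is not taken from the example-algorithm, and it assumes that either the whole JSON document or individual fields may be null, since the exact form of the missing data is not specified here. The function name and the has_text flag are hypothetical.

```python
TEXT_FIELDS = ("history", "report", "description")


def normalize_text_record(record) -> dict:
    """Replace missing text with empty strings and record whether any text exists."""
    if record is None:  # the whole JSON document was null
        record = {}
    cleaned = {}
    for field in TEXT_FIELDS:
        value = record.get(field)
        cleaned[field] = value.strip() if isinstance(value, str) else ""
    cleaned["has_text"] = any(cleaned[field] for field in TEXT_FIELDS)
    return cleaned


print(normalize_text_record(None))
print(
    normalize_text_record(
        {
            "patient_id": "001",
            "history": None,
            "report": None,
            "description": "TU resection, Neck dissection bilateral\n",
        }
    )
)
```

The explicit has_text flag lets a model distinguish "no text provided" from "empty text" without relying on string length.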

Pre-Extracted Tumor Center Cores from Tissue Microarrays

[!NOTE] This is a 196 GB file. Make sure you have the resources available for this download.

The pre-extracted Tumor Center Cores from Tissue Microarrays (TMA Cores) can be downloaded here. Before you download, please make sure that you have at least 196 GB of free storage and, depending on your internet connection, some time available.
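
The free storage can be checked up front with the Python standard library. This is a small sketch that treats 1 GB as 10^9 bytes and only covers the archive size stated above; the unzipped data needs additional space that is not quantified here. The function name is hypothetical.

```python
import shutil

REQUIRED_GB = 196  # archive size stated above


def has_enough_space(path: str = ".", required_gb: float = REQUIRED_GB) -> bool:
    """Check whether the file system containing `path` has enough free space."""
    free_gb = shutil.disk_usage(path).free / 1e9
    return free_gb >= required_gb


if not has_enough_space("."):
    print(f"Less than {REQUIRED_GB} GB free in the current directory.")
```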

In the challenge, the TMA Cores are provided in MHA format. The example-algorithm already reads the cores into arrays using the Python library SimpleITK.

There are at most 8 different stains available. During the challenge, each stain is available at most twice, which gives a maximum of 16 TMA Cores. The potentially available stains are:
  • Hematoxylin and Eosin (HE)
  • Cluster of Differentiation (CD) 3
  • CD8
  • CD56
  • CD68
  • CD163
  • Major Histocompatibility Complex class I (MHC-1)
  • Programmed death-ligand 1 (PD-L1)

[!warning] For each patient, there will be at most 16 cores, but there might be cases where they are not entirely available. In these cases, you will receive an image with only a single pixel. Please consider this during training.
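
A single-pixel placeholder can be detected from the image size before any further processing. The sketch below uses SimpleITK's ReadImage, GetSize and GetArrayFromImage; the helper names are hypothetical, and the example-algorithm may handle this case differently.

```python
from typing import Sequence


def is_placeholder_size(size: Sequence[int]) -> bool:
    """True if every spatial dimension has length 1, i.e. a single-pixel image."""
    return all(dim == 1 for dim in size)


def load_core(path: str):
    """Return the core as a NumPy array, or None for a single-pixel placeholder."""
    # Imported lazily so that the size check above has no third-party dependency.
    import SimpleITK as sitk

    image = sitk.ReadImage(path)
    if is_placeholder_size(image.GetSize()):
        return None
    return sitk.GetArrayFromImage(image)


print(is_placeholder_size((1, 1)))
print(is_placeholder_size((512, 512)))
```

Returning None for placeholders makes it easy to mask missing cores instead of feeding a one-pixel image into an image encoder.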

Pre-Extracted Embeddings of Whole Slide Images generated with UNI

The pre-extracted embeddings of the primary tumor and lymph node whole slide images can be downloaded here. They contain the patch-level UNI embeddings of the Whole Slide Images, with a patch size of 2048 x 2048 pixels. If you do not know UNI yet, we recommend that you read this paper.

Both the HDF5 container that you can download from the HANCOCK website and the JSON file that you will be provided during the challenge contain the following keys:
  • features (list[list[float]]): The features from all patches that were segmented from the WSI. Each inner array holds the embedding of a single patch.
  • coords (list[list[int]]): For each patch, the coordinates at which it was extracted from the original whole slide image.

The JSON for the challenge will be constructed like this:

{
    "features": [
        [  
            -2.6923696994781494,
            2.5603489875793457,
            -0.4249601662158966,
            -1.039771556854248,
            0.43003350496292114,
            0.7976626753807068,
            0.8344122171401978,
            -1.220271348953247,
            -0.5300487875938416,
            0.1353420913219452,
            1.3537638187408447,
            1.3043394088745117
        ]
    ],
    "coords": [
        [
            10000,
            115728
        ]
    ]
}

[!warning] The lymph node slides will not be available for all patients in the test set. In these cases, the JSON file will have empty entries for the features and coords keys.
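
One simple way to turn the patch-level embeddings into a single slide-level vector, while handling the empty case, is mean pooling. The sketch below is only one of many possible aggregation strategies and is not the method used in the example-algorithm; the function name is hypothetical, and the two-dimensional feature vectors are toy data.

```python
from typing import List, Optional


def mean_pool(features: Optional[List[List[float]]]) -> Optional[List[float]]:
    """Average all patch embeddings; return None if there are no patches."""
    if not features:  # covers both an empty list and null
        return None
    patch_count = len(features)
    dim = len(features[0])
    return [sum(patch[i] for patch in features) / patch_count for i in range(dim)]


slide = {"features": [[1.0, 2.0], [3.0, 4.0]], "coords": [[0, 0], [2048, 0]]}
missing = {"features": [], "coords": []}

print(mean_pool(slide["features"]))    # [2.0, 3.0]
print(mean_pool(missing["features"]))  # None
```

Returning None instead of a zero vector keeps "no lymph node slide" distinguishable from a real embedding.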

Download all Data

We received feedback that the download process on our web page can be frustrating. To make your life a little easier, you can use this bash script to download the necessary data all at once with wget. Please be aware that this script does not check your available storage and does not provide any error handling.

base_download_url="https://data.fau.de/public/24/87/322108724/"

file_names=(
  "StructuredData.zip"              # Challenge relevant
  "TextData.zip"                    # Challenge relevant
  # "DataSplits_DataDictionaries.zip" # Not needed for the challenge
  # "TMA_TumorCenter.zip"             # Not needed for the challenge
  # "TMA_InvasionFront.zip"           # Not needed for the challenge
  # "TMA_Maps.zip"                    # Not needed for the challenge
  # "TMA_CellDensityMeasurements.zip" # Not needed for the challenge
  # "WSI_PrimaryTumor_Oropharynx_Part1.zip" # Not needed for the challenge
  # "WSI_PrimaryTumor_Oropharynx_Part2.zip" # Not needed for the challenge
  # "WSI_PrimaryTumor_OralCavity.zip"       # Not needed for the challenge
  # "WSI_PrimaryTumor_Larynx.zip"           # Not needed for the challenge
  # "WSI_PrimaryTumor_Hypopharynx.zip"      # Not needed for the challenge
  # "WSI_PrimaryTumor_CUP.zip"              # Not needed for the challenge
  # "WSI_PrimaryTumor_Annotations.zip"      # Not needed for the challenge
  # "WSI_LymphNode.zip"                     # Not needed for the challenge
)

for file_name in "${file_names[@]}"
do
  download_url="${base_download_url}${file_name}"
  echo "Try to download file $file_name from $download_url"
  # Quote the variables so that unusual characters cannot split the arguments
  wget "$download_url"
  echo "Unzipping $file_name"
  unzip "$file_name"
  echo "Removing $file_name"
  rm "$file_name"
  echo -e "\n\n"
done

base_download_url="https://hancock.research.fau.eu/public/assets/"
file_names=(
  "TMA_TumorCenter_Cores.zip"             # Challenge relevant
  "WSI_UNI_encodings.zip"                 # Challenge relevant
)
for file_name in "${file_names[@]}"
do
  download_url="${base_download_url}${file_name}"
  echo "Try to download file $file_name from $download_url"
  # Quote the variables so that unusual characters cannot split the arguments
  wget "$download_url"
  echo "Unzipping $file_name"
  unzip "$file_name"
  echo "Removing $file_name"
  rm "$file_name"
  echo -e "\n\n"
done