CheXpert Project Review
Goal:
Motivation for Automation:
  - automated chest radiograph interpretation at the level of practicing radiologists could provide substantial benefit in many medical settings:
    
      - improved workflow prioritization
 
      - clinical decision support
 
      - large-scale screening
 
      - global population health initiatives
 
    
   
X-Ray Reports:
  - each imaging study can include one or more images, but is most often associated with two:
    
      - a frontal view, and
 
      - a lateral view
 
    
   
  - images are provided with 14 labels derived from a natural language processing (NLP) tool applied to the corresponding free-text radiology reports
 
The labeler looks for the following 14 ‘observations’ (radiological findings or diagnoses) in each radiology report:
  - No Finding
 
  - Enlarged Cardiomediastinum
 
  - Cardiomegaly
 
  - Lung Lesion
 
  - Lung Opacity
 
  - Edema
 
  - Consolidation
 
  - Pneumonia
 
  - Atelectasis
 
  - Pneumothorax
 
  - Pleural Effusion
 
  - Pleural Other
 
  - Fracture
 
  - Support Devices
 
Report NLP to label the associated X-Ray images:
  - Each report was processed by an NLP labeler, and the associated x-rays were assigned one of the following values for each of the 14 observations listed above:
    
      - positive (observation exists in x-ray image),
 
      - negative (observation does not exist in x-ray image), or
 
      - uncertain
 
    
   
  - An automated rule-based labeler (Natural Language Processor) extracted observations from the radiology reports to be used as structured labels for the chest radiographs (x-ray images)
   
  - The NLP labeler is set up in three distinct stages:
    
      - mention extraction:
        
          - the labeler extracts mentions of the observations listed above from the impression section of each radiology report
 
          - the impression section summarizes the key findings in the radiographic study
 
        
       
      - mention classification:
        
          - mentions of observations are classified as negative, uncertain, or positive
 
        
       
      - mention aggregation:
        
          - the classifications of all mentions of an observation are combined into a single final label for each of the 14 observations:
            
              - blank for unmentioned, 0 (negative), 1 (positive), or u (uncertain).
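
The aggregation stage above can be illustrated with a minimal sketch. These notes do not state the precedence rule, so the ordering used here (positive over uncertain over negative) is an assumption, and the function name `aggregate_mentions` is hypothetical rather than part of the actual labeler.

```python
from typing import Iterable, Optional

POSITIVE, NEGATIVE, UNCERTAIN = "1", "0", "u"


def aggregate_mentions(mention_classes: Iterable[str]) -> Optional[str]:
    """Collapse the per-mention classes of one observation into one label.

    Assumed precedence (not stated in these notes):
    positive > uncertain > negative.
    Returns None (blank) when the observation is never mentioned.
    """
    classes = set(mention_classes)
    unknown = classes - {POSITIVE, NEGATIVE, UNCERTAIN}
    if unknown:
        raise ValueError(f"unexpected mention classes: {sorted(unknown)}")
    if not classes:
        return None  # blank: observation not mentioned in the report
    if POSITIVE in classes:
        return POSITIVE
    if UNCERTAIN in classes:
        return UNCERTAIN
    return NEGATIVE


# Illustrative mentions for a single report (made-up example data).
report_mentions = {"Edema": ["u", "1"], "Pneumonia": ["0"], "Fracture": []}
labels = {obs: aggregate_mentions(m) for obs, m in report_mentions.items()}
```

Under this assumed rule, one positive mention is enough to label the observation positive, even if other mentions in the same report are uncertain or negative.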
 
            
           
        
       
    
   

Model Training:

  - Input: single-view chest radiograph
 
  - Output: probability of each of the 14 observations
 
  - When more than one view is available, the models output the maximum probability of the observations across the views
 
  - The training labels in the dataset for each observation are either 0 (negative), 1 (positive), or u (uncertain)
 
  - For the uncertain labels, different approaches are explored during the model training:
    
      - U-Ignore: We ignore the uncertain labels during training
 
      - U-Zeroes: We map all instances of the uncertain label to 0
 
      - U-Ones: We map all instances of the uncertain label to 1
 
      - U-SelfTrained: We first train a model to convergence using the U-Ignore approach, and then use that model's predictions to re-label each uncertain label with the predicted probability
 
      - U-MultiClass: We treat the uncertainty label as its own class
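
The three simplest approaches above (U-Ignore, U-Zeroes, U-Ones) can be sketched as a label-mapping step followed by a loss that skips ignored entries. This is a minimal illustration, not the project's training code: the function names `map_uncertain` and `masked_bce` are hypothetical, and treating "ignore" as masking the entry out of a binary cross-entropy loss is an assumption.

```python
import math
from typing import List, Optional, Sequence, Union

Label = Union[int, str]  # 0, 1, or "u"


def map_uncertain(labels: Sequence[Label], policy: str) -> List[Optional[float]]:
    """Map raw labels for one observation to training targets.

    A target of None means the entry is excluded from the loss (U-Ignore).
    """
    fill = {"U-Ignore": None, "U-Zeroes": 0.0, "U-Ones": 1.0}[policy]
    return [fill if y == "u" else float(y) for y in labels]


def masked_bce(probs: Sequence[float], targets: Sequence[Optional[float]]) -> float:
    """Mean binary cross-entropy over entries whose target is not None.

    Assumes every probability lies strictly between 0 and 1.
    """
    terms = [
        -(t * math.log(p) + (1.0 - t) * math.log(1.0 - p))
        for p, t in zip(probs, targets)
        if t is not None
    ]
    return sum(terms) / len(terms) if terms else 0.0


raw = [1, "u", 0]                          # made-up labels for one observation
ones = map_uncertain(raw, "U-Ones")        # uncertain treated as positive
ignored = map_uncertain(raw, "U-Ignore")   # uncertain dropped from the loss
loss = masked_bce([0.5, 0.9, 0.5], ignored)
```

U-SelfTrained and U-MultiClass are not covered by this sketch, since they need a trained model and a three-class output respectively.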
 
    
   
  - The baseline model was selected using the best-performing approach for each competition task on the validation set:
    
      - U-Ones for Atelectasis and Edema,
 
      - U-MultiClass for Cardiomegaly and Pleural Effusion, and
 
      - U-SelfTrained for Consolidation
 
    
   
  - The model output looks like the following table:
 

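The multi-view rule described under Model Training (take the maximum probability across views) can be sketched as follows. The function name `combine_views` and the probability values are illustrative assumptions; the sketch also assumes every view reports the same set of observations.

```python
from typing import Dict, List


def combine_views(view_probs: List[Dict[str, float]]) -> Dict[str, float]:
    """Return, per observation, the maximum probability over all views."""
    if not view_probs:
        raise ValueError("at least one view is required")
    observations = view_probs[0].keys()
    return {obs: max(view[obs] for view in view_probs) for obs in observations}


# Made-up per-view predictions for one study.
frontal = {"Edema": 0.62, "Atelectasis": 0.10}
lateral = {"Edema": 0.35, "Atelectasis": 0.41}
study = combine_views([frontal, lateral])
```

With a single view, the function simply returns that view's probabilities unchanged.
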
Submission Goals:
Glossary:
  - pathology: the anatomic or functional manifestations of a disease (for example, the pathology of cancer)
 
References: