Pathology-LLaVA-(PCaption-0.5M dataset)

We developed a domain-specific large language-vision assistant (PA-LLaVA) for pathology image understanding. Specifically: (1) we construct a human pathology image-text dataset by cleaning public medical image-text data for domain-specific alignment; (2) using this image-text data, we first train a pathology language-image pretraining (PLIP) model as the specialized visual encoder for pathology images, and then develop a scale-invariant connector to avoid the information loss caused by image scaling; (3) we adopt two-stage learning to train PA-LLaVA, with the first stage for domain alignment and the second stage for end-to-end visual question answering (VQA).

Updates:

[24/08/30] The test code has been updated!

[24/08/20] Model weights have been uploaded to HuggingFace: OpenFace-CQUPT/Pathology-LLaVA

Architecture

Checkpoint

The weights for PLIP, as well as the PA-LLaVA weights from both the domain alignment and instruction fine-tuning phases, are released on HuggingFace (https://huggingface.co/OpenFace-CQUPT/Pathology-LLaVA).

Human Pathology Image-Text Data (PCaption-0.5M)

Introduction

These public datasets contain substantial amounts of data unrelated to human pathology. To obtain human pathology image-text data, we applied two cleaning steps to the raw data, as illustrated in the Data Cleaning Process figure: (1) removing non-pathological images, and (2) removing non-human pathology data. We also excluded image-text pairs whose captions contain fewer than 20 words. In total, we obtained 518,413 image-text pairs (named "PCaption-0.5M") as the training set for domain alignment.
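The caption-length rule above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' cleaning code: it assumes each pair is a dict with a "caption" field and counts words by splitting on whitespace, and the helper name filter_short_captions is hypothetical.

```python
from typing import Dict, List

MIN_WORDS = 20  # pairs whose caption has fewer words than this are dropped


def filter_short_captions(
    pairs: List[Dict[str, str]], min_words: int = MIN_WORDS
) -> List[Dict[str, str]]:
    """Keep only image-text pairs whose caption has at least `min_words` words.

    Words are counted by whitespace splitting (an assumption; the original
    cleaning pipeline may tokenize differently).
    """
    return [p for p in pairs if len(p["caption"].split()) >= min_words]


if __name__ == "__main__":
    pairs = [
        {"image": "a.jpg", "caption": "H&E stain."},
        {"image": "b.jpg", "caption": " ".join(["word"] * 25)},
    ]
    print([p["image"] for p in filter_short_captions(pairs)])
```

Whitespace splitting is the simplest reading of "fewer than 20 words"; a tokenizer-based count would shift the cutoff slightly for captions with punctuation or hyphenated terms.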

For the instruction fine-tuning phase, we cleaned only PMC-VQA in the same way, obtaining 15,788 question-answer pairs related to human pathology. Finally, we combined PathVQA with the human pathology data obtained from PMC-VQA, constructing a dataset of 35,543 question-answer pairs.

Data Cleaning Process

Get the Dataset

Step 1: Download the public datasets.

Here we provide only the download links for the public datasets, and we release the image ID index of our cleaned dataset on HuggingFace (https://huggingface.co/OpenFace-CQUPT/Pathology-LLaVA).

Step 2: Data processing.

First, use the image index of our cleaned dataset to extract the human pathology data, then convert it into the following format:

[
	{
		"image": "<image file>",
		"caption": "<caption text>"
	}
]

Finally, run dataformat.py to convert the data into the format required for training:

python dataformat.py
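The first part of Step 2 (filtering the raw public data by the released image index and writing the intermediate JSON) can be sketched as follows. This is a hypothetical helper, not part of the repository: the index file is assumed to hold one image ID per line, and the raw-record field names image_id, image_path and caption are placeholders to adapt to each public dataset.

```python
import json
from typing import Dict, Iterable, List, Set


def extract_subset(
    records: Iterable[Dict[str, str]],
    keep_ids: Set[str],
    id_key: str = "image_id",  # hypothetical field names in the raw dataset
    image_key: str = "image_path",
    caption_key: str = "caption",
) -> List[Dict[str, str]]:
    """Keep records listed in the released index, in the {"image", "caption"} format."""
    return [
        {"image": rec[image_key], "caption": rec[caption_key]}
        for rec in records
        if rec[id_key] in keep_ids
    ]


def load_index(path: str) -> Set[str]:
    """Read the image-ID index, assumed here to contain one ID per line."""
    with open(path, encoding="utf-8") as f:
        return {line.strip() for line in f if line.strip()}


def save_subset(path: str, subset: List[Dict[str, str]]) -> None:
    """Write the subset as a JSON list in the intermediate format shown above."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(subset, f, ensure_ascii=False, indent=2)
```

For example, save_subset("subset.json", extract_subset(raw_records, load_index("index.txt"))), where raw_records, index.txt and subset.json are placeholders for your own data and file names.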

Training

We use XTuner for training, so first set up the environment following the official XTuner repository (https://github.com/InternLM/xtuner). Then add the contents of xtuner_add/pallava to your installed XTuner code, at the corresponding location in its source tree.
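To add files to the installed XTuner code you first need its location on disk. A minimal standard-library sketch (the helper name package_dir is hypothetical; pip show xtuner reports the same information in its Location field):

```python
import importlib.util
import os
from typing import Optional


def package_dir(name: str) -> Optional[str]:
    """Return the directory of an installed top-level package, or None if absent."""
    spec = importlib.util.find_spec(name)
    if spec is None or spec.origin is None:
        return None  # not installed, or a namespace package without __init__.py
    return os.path.dirname(spec.origin)


if __name__ == "__main__":
    print(package_dir("xtuner"))
```

With an editable install (pip install -e .), this points into your XTuner source checkout rather than site-packages.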

Domain Alignment

NPROC_PER_NODE=8 NNODES=2 PORT=12345 ADDR= NODE_RANK=0 xtuner train pallava_domain_alignment.py --deepspeed deepspeed_zero2 --seed 1024

Instruction Tuning

NPROC_PER_NODE=8 NNODES=2 PORT=12345 ADDR= NODE_RANK=0 xtuner train pallava_instruction_tuning.py --deepspeed deepspeed_zero2 --seed 1024
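In both commands, NPROC_PER_NODE is the number of GPUs per node and NNODES the number of nodes (here 2 x 8 = 16 processes in total). In XTuner's multi-node launch, ADDR and PORT give the address of node 0, and the same command is run on every node with only NODE_RANK changed (0 on the first node, 1 on the second). The hypothetical helper below only prints those per-node command lines; the address 10.0.0.1 is a placeholder for your own node-0 address.

```python
from typing import List


def launch_commands(
    config: str,
    nnodes: int = 2,
    nproc_per_node: int = 8,
    port: int = 12345,
    addr: str = "10.0.0.1",  # placeholder: IP address of node 0
) -> List[str]:
    """Build one launch command per node; only NODE_RANK differs between nodes."""
    cmds = []
    for rank in range(nnodes):
        cmds.append(
            f"NPROC_PER_NODE={nproc_per_node} NNODES={nnodes} PORT={port} "
            f"ADDR={addr} NODE_RANK={rank} "
            f"xtuner train {config} --deepspeed deepspeed_zero2 --seed 1024"
        )
    return cmds


if __name__ == "__main__":
    for cmd in launch_commands("pallava_domain_alignment.py"):
        print(cmd)  # run line i on node i
```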

Test

First, copy all files in xtuner_add/tool_add into the tools directory of your installed XTuner, adding new files and replacing existing files with the same name.
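The copy step above can be scripted as follows. This is a minimal sketch: the helper name overlay_files is hypothetical, and the destination (assumed here to be the tools directory inside the installed xtuner package, whose parent location pip show xtuner reports) should be checked against the layout the repository expects.

```python
import shutil
from pathlib import Path
from typing import List


def overlay_files(src_dir: str, dst_dir: str) -> List[str]:
    """Copy every file under src_dir into dst_dir, keeping relative paths.

    Files that already exist at the same relative path are overwritten
    ("replace"); missing ones are created along with their folders ("add").
    Returns the relative paths that were copied.
    """
    src, dst = Path(src_dir), Path(dst_dir)
    copied = []
    for path in sorted(src.rglob("*")):
        if path.is_file():
            target = dst / path.relative_to(src)
            target.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(path, target)
            copied.append(str(path.relative_to(src)))
    return copied
```

For example, overlay_files("xtuner_add/tool_add", "<site-packages>/xtuner/tools"), with <site-packages> replaced by the actual install location.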

Before Test

Our released weights are distributed-training checkpoints that XTuner can load directly for training. If you need merged weights, convert and merge them with XTuner as follows.

To test the caption inference task with the stage-1 (domain alignment) weights:

xtuner convert pth_to_hf path/pallava_domain_alignment.py ./domain_alignment_weight.pth ./domain_alignment_weight_ft
xtuner convert merge meta-llama/Meta-Llama-3-8B-Instruct ./domain_alignment_weight_ft/llm_adapter ./domain_alignment_weight_ft/llm_merge_lora

To use the stage-2 (instruction tuning) weights for classification or VQA tasks (the VQA question-answer format is described in the paper):

xtuner convert pth_to_hf path/pallava_instruction_tuning.py ./instruction_tuning_weight.pth ./instruction_tuning_weight_ft
xtuner convert merge meta-llama/Meta-Llama-3-8B-Instruct ./instruction_tuning_weight_ft/llm_adapter ./instruction_tuning_weight_ft/llm_merge_lora
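Both stages use the same two-step conversion: pth_to_hf, then merging the LLM adapter into the base model. A small wrapper can build and run these commands exactly as written above. The function names are hypothetical; the subfolder names llm_adapter and llm_merge_lora are taken from the commands above.

```python
import subprocess
from typing import List

BASE_LLM = "meta-llama/Meta-Llama-3-8B-Instruct"


def convert_commands(
    config: str, pth: str, out_dir: str, base_llm: str = BASE_LLM
) -> List[List[str]]:
    """Return the two XTuner commands: checkpoint -> HF format, then LoRA merge."""
    return [
        ["xtuner", "convert", "pth_to_hf", config, pth, out_dir],
        [
            "xtuner", "convert", "merge", base_llm,
            f"{out_dir}/llm_adapter", f"{out_dir}/llm_merge_lora",
        ],
    ]


def run_all(cmds: List[List[str]]) -> None:
    """Run each command in order, stopping at the first failure."""
    for cmd in cmds:
        subprocess.run(cmd, check=True)
```

For example, run_all(convert_commands("path/pallava_instruction_tuning.py", "./instruction_tuning_weight.pth", "./instruction_tuning_weight_ft")) reproduces the stage-2 commands.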

PathVQA

NPROC_PER_NODE=8 xtuner pathvqa meta-llama/Meta-Llama-3-8B-Instruct --visual-encoder PLIP --llava ./instruction_tuning_weight_ft --prompt-template llama3_chat --data-path absolute_path/Path_VQA/path_vqa_test.json --work-dir absolute_path/logs/pathvqa --launcher pytorch --anyres-image
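PathVQA contains both closed-ended (yes/no) and open-ended questions, which are commonly scored separately. The sketch below is a generic illustration, not this repository's evaluation code and not necessarily the metric used in the paper. It assumes predictions and ground-truth answers are already paired as strings, treats a question as closed-ended when its ground truth is yes or no, and uses normalized exact match for both groups; the names normalize and pathvqa_scores are hypothetical.

```python
from typing import Dict, List, Tuple


def normalize(text: str) -> str:
    """Lowercase, trim whitespace and a trailing period."""
    return text.strip().lower().rstrip(".")


def pathvqa_scores(pairs: List[Tuple[str, str]]) -> Dict[str, float]:
    """Accuracy on closed (yes/no) questions and exact match on open ones."""
    closed: List[bool] = []
    open_ended: List[bool] = []
    for pred, gt in pairs:
        hit = normalize(pred) == normalize(gt)
        (closed if normalize(gt) in {"yes", "no"} else open_ended).append(hit)
    return {
        "closed_acc": sum(closed) / len(closed) if closed else 0.0,
        "open_exact_match": sum(open_ended) / len(open_ended) if open_ended else 0.0,
    }
```

Exact match is a strict proxy for open-ended answers; recall- or token-overlap-based scores are also used in the literature.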

PMCVQA

NPROC_PER_NODE=8 xtuner pmcvqa meta-llama/Meta-Llama-3-8B-Instruct --visual-encoder PLIP --llava ./instruction_tuning_weight_ft --prompt-template llama3_chat --data-path absolute_path/PMC-VQA/pmc-vqa_test_clean_answer_abcd.json --work-dir absolute_path/logs/pmcvqa --launcher pytorch --anyres-image
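PMC-VQA is a multiple-choice benchmark (options A to D, as the test file name suggests). A generic way to score such outputs is to extract the option letter from each generated answer and compare it with the ground truth. The helpers below are hypothetical and assume the model begins its answer with the option letter (for example "B: ..." or "(C) ..."); the repository's own evaluation may parse answers differently.

```python
import re
from typing import List, Optional

# Option letter at the start of the answer, optionally in parentheses,
# followed by ")", ".", ":", whitespace, or the end of the string.
_CHOICE = re.compile(r"^\s*\(?([ABCD])(?:[).:\s]|$)")


def extract_choice(answer: str) -> Optional[str]:
    """Return the leading option letter of a model answer, or None."""
    m = _CHOICE.match(answer)
    return m.group(1) if m else None


def choice_accuracy(preds: List[str], golds: List[str]) -> float:
    """Fraction of answers whose extracted letter equals the gold letter."""
    if len(preds) != len(golds):
        raise ValueError("preds and golds must have the same length")
    if not golds:
        return 0.0
    correct = sum(extract_choice(p) == g.strip().upper() for p, g in zip(preds, golds))
    return correct / len(golds)
```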

Zero-Shot

Here is an example using the OSCC data.

Generate answer

NPROC_PER_NODE=8 xtuner zero_shot meta-llama/Meta-Llama-3-8B-Instruct --visual-encoder PLIP --llava ./instruction_tuning_weight_ft --prompt-template llama3_chat --data-path absolute_path/OSCC/oscc.json --work-dir absolute_path/logs/oscc --launcher pytorch --anyres-image

Calculate score

python test/f1.py
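The script name test/f1.py indicates that zero-shot classification is scored with F1. The sketch below computes per-class and macro-averaged F1 from scratch as a generic reference. It is not test/f1.py, which may average differently (for example binary F1 on the positive class), and it assumes the generated answers have already been mapped to class labels; the function names are hypothetical.

```python
from typing import Dict, List


def f1_per_class(preds: List[str], golds: List[str]) -> Dict[str, float]:
    """F1 for every label that appears in either predictions or ground truth."""
    labels = sorted(set(golds) | set(preds))
    scores = {}
    for label in labels:
        tp = sum(p == label and g == label for p, g in zip(preds, golds))
        fp = sum(p == label and g != label for p, g in zip(preds, golds))
        fn = sum(p != label and g == label for p, g in zip(preds, golds))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        scores[label] = 2 * precision * recall / denom if denom else 0.0
    return scores


def macro_f1(preds: List[str], golds: List[str]) -> float:
    """Unweighted mean of the per-class F1 scores."""
    per_class = f1_per_class(preds, golds)
    return sum(per_class.values()) / len(per_class) if per_class else 0.0
```

Macro averaging weights each class equally, which matters when one class (for example normal tissue) dominates the test set.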

GPT4-Score

python test/gpt4-scores.py

Result

Citation

@INPROCEEDINGS{10821785,
  author={Dai, Dawei and Zhang, Yuanhui and Xu, Long and Yang, Qianlan and Shen, Xiaojing and Xia, Shuyin and Wang, Guoyin},
  booktitle={2024 IEEE International Conference on Bioinformatics and Biomedicine (BIBM)}, 
  title={PA-LLaVA: A Large Language-Vision Assistant for Human Pathology Image Understanding}, 
  year={2024},
  volume={},
  number={},
  pages={3138-3143},
  keywords={Connectors;Pathology;Visualization;Codes;Computational modeling;Biological system modeling;Data models;Cleaning;Bioinformatics;Biomedical imaging;Pathology Image Understanding;VQA;LLaVA},
  doi={10.1109/BIBM62325.2024.10821785}}

Contact

This repo is currently maintained by Dawei Dai (dw_dai@163.com) and his master's student Yuanhui Zhang (S230233056@stu.cqupt.edu.cn).

Repository: ddw2AIGROUP2CQUPT/PA-LLaVA
