classification
TB-positive 87.4% confidenceRefer to specialist within 7 days. Sputum smear advised.
Drishti is an open vision-language model experiment for chest X-rays in places where doctors and internet access are scarce.
classification
TB-positive 87.4% confidenceRefer to specialist within 7 days. Sputum smear advised.
02 / the problem
Most medical AI requires a cloud connection. In rural clinics, neither the radiologist nor the network is reliably present.
03 / who it is for
Triaging chest X-rays in outreach settings before specialist review is available.
Running first-pass analysis on commodity Android hardware, even without a stable connection.
Experimenting with quantized vision-language models and interpretable TB screening workflows.
04 / how it works
TBX11K dataset
Chest X-ray annotations are split, validated, and formatted into Qwen-VL style conversations.
Qwen2-VL + QLoRA
Low-rank adapters target the model efficiently while keeping the experiment reproducible.
Vision-LoRA ablation
Diagnostic gates catch class collapse before moving toward a smaller on-device runtime.
Android path
The design centers the phone because the useful version runs where bandwidth does not.
05 / live demo
Upload a chest X-ray to run the live run4 classifier (GPU-hosted, 4-bit NF4 LoRA). The same checkpoint is exported as GGUF Q4_K_M for offline llama.cpp use.
Capture or upload a chest X-ray. Inference runs on the demo API.
0% / scoring
classification
— —Upload a chest X-ray to run the live Drishti run4 classifier.
$ curl -F "image=@scan.png" /api/analyze
{
"status": "ready"
}
06 / evaluation
Held-out TBX11K validation workflows are tracked with explicit diagnostic gates for class collapse and proportional minimum predictions.
Run #4 · 1,800 val samples
Vision-LoRA ablation
200 active_tb support
4.68 GB on Hugging Face
step 4950 · Qwen2-VL-7B-Instruct
Q4_K_M text + F16 mmproj
GPU-hosted · run4 LoRA
07 / built on
Drishti is an integration project: dataset tooling, vision-language fine-tuning, adapter evaluation, and Grad-CAM overlays in one reproducible repository.
Vision-language base
AlibabaTraining dataset
Liu et al.Visual explanation overlays
vision encoderDemo frontend
drishti-demo.vercel.app08 / install
The current repository contains the research and training pipeline. The landing page frames the Android/offline direction from the design handoff.
# CUDA / Colab workflow
git clone https://github.com/ShivamSinghNow/Drishti.git
cd Drishti
python -m pip install -r requirements-colab.txt
python generate_jsonl.py --output-dir data/processed
# AMD MI300X / ROCm workflow
python3 -m venv .venv
source .venv/bin/activate
python -m pip install --upgrade pip setuptools wheel
python -m pip install -r requirements-rocm.txt
# GPU inference backend (see modal_app.py in repo)
pip install modal
modal deploy modal_app.py
# Vercel frontend
vercel deploy --prod --archive=tgz
# https://drishti-demo.vercel.app
# Offline GGUF (llama.cpp) — needs text GGUF + mmproj
llama-server \
-hf ShivSingh123/drishti-qwen2vl-run4-gguf:Q4_K_M \
--mmproj hf://ShivSingh123/drishti-qwen2vl-run4-gguf/mmproj-drishti-qwen2vl-run4-quantized-f16.gguf