open source · runs offline · model research

Tuberculosis detection that runs offline on a phone.

Drishti is an open vision-language model experiment for chest X-rays in places where doctors and internet access are scarce.

Try the demo View on GitHub GGUF model (run4)

Live demo: GPU-hosted · run4 LoRA
Val accuracy: 98.72%
Offline GGUF: Q4_K_M · 4.68 GB

Grad-CAM heatmap

Airplane mode

9:41

Drishti 1.89s

classification

TB-positive 87.4% confidence

regionupper-L lobe

cavitationmild

recommended

Refer to specialist within 7 days. Sputum smear advised.

02 / the problem

Where existing AI does not reach.

Most medical AI requires a cloud connection. In rural clinics, neither the radiologist nor the network is reliably present.

600M+ people in India live in areas with limited doctor access

#1 TB remains one of India's most urgent infectious disease burdens

0 KB network traffic needed for an on-device inference path

03 / who it is for

A second set of eyes for field screening.

Community health workers

Triaging chest X-rays in outreach settings before specialist review is available.

Primary clinics

Running first-pass analysis on commodity Android hardware, even without a stable connection.

Research teams

Experimenting with quantized vision-language models and interpretable TB screening workflows.

04 / how it works

From labeled dataset to phone in a clinic.

Data

TBX11K dataset

Chest X-ray annotations are split, validated, and formatted into Qwen-VL style conversations.

Train

Qwen2-VL + QLoRA

Low-rank adapters target the model efficiently while keeping the experiment reproducible.

Optimize

Vision-LoRA ablation

Diagnostic gates catch class collapse before moving toward a smaller on-device runtime.

Deploy

Android path

The design centers the phone because the useful version runs where bandwidth does not.

05 / live demo

Watch the inference flow.

Upload a chest X-ray to run the live run4 classifier (GPU-hosted, 4-bit NF4 LoRA). The same checkpoint is exported as GGUF Q4_K_M for offline llama.cpp use.

9:41

Drishti live

New scan

Capture or upload a chest X-ray. Inference runs on the demo API.

Drishti live

Analyzing...

Chest X-ray being analyzed — forward pass

preprocess done

vision encoder done

language head ...

grad-cam pending

0% / scoring

Drishti —

classification

— —

guidance

Upload a chest X-ray to run the live Drishti run4 classifier.

drishti.api run4 · step 4950

$ curl -F "image=@scan.png" /api/analyze

{
  "status": "ready"
}

ready

06 / evaluation

How well does it read?

Held-out TBX11K validation workflows are tracked with explicit diagnostic gates for class collapse and proportional minimum predictions.

Research artifact, not a medical device. Not validated for clinical use.

Accuracy98.72%

Run #4 · 1,800 val samples

Macro-F10.979

Vision-LoRA ablation

Active TB F10.953

200 active_tb support

GGUFQ4_K_M

4.68 GB on Hugging Face

Deployment

Run 4 LoRA

step 4950 · Qwen2-VL-7B-Instruct

Offline GGUF

Q4_K_M text + F16 mmproj

Live demo

GPU-hosted · run4 LoRA

07 / built on

Open weights, open code, open dataset.

Drishti is an integration project: dataset tooling, vision-language fine-tuning, adapter evaluation, and Grad-CAM overlays in one reproducible repository.

Qwen2-VL-7B

Vision-language base

Alibaba

TBX11K

Training dataset

Liu et al.

Grad-CAM

Visual explanation overlays

vision encoder

Vercel

Demo frontend

drishti-demo.vercel.app

08 / install

Clone the repo and run the workflow.

The current repository contains the research and training pipeline. The landing page frames the Android/offline direction from the design handoff.

Open repository Read setup notes

# CUDA / Colab workflow
git clone https://github.com/ShivamSinghNow/Drishti.git
cd Drishti
python -m pip install -r requirements-colab.txt
python generate_jsonl.py --output-dir data/processed

# AMD MI300X / ROCm workflow
python3 -m venv .venv
source .venv/bin/activate
python -m pip install --upgrade pip setuptools wheel
python -m pip install -r requirements-rocm.txt

Vercel drishti-demo.vercel.app

# GPU inference backend (see modal_app.py in repo)
pip install modal
modal deploy modal_app.py

# Vercel frontend
vercel deploy --prod --archive=tgz
# https://drishti-demo.vercel.app

# Offline GGUF (llama.cpp) — needs text GGUF + mmproj
llama-server \
  -hf ShivSingh123/drishti-qwen2vl-run4-gguf:Q4_K_M \
  --mmproj hf://ShivSingh123/drishti-qwen2vl-run4-gguf/mmproj-drishti-qwen2vl-run4-quantized-f16.gguf