Class III - Open Model
Included components
- Model architecture
- Model parameters (Final)
- Model card
- Data card
- Technical report
- Evaluation results
Class II - Open Tooling
Included components
- Model architecture
- Training code
- Inference code
- Evaluation code
- Model parameters (Final)
- Evaluation data
- Model card
- Data card
- Technical report
- Evaluation results
Class I - Open Science
Included components
- Model architecture
- Training code
- Inference code
- Evaluation code
- Model parameters (Final)
- Datasets
- Evaluation data
- Model card
- Data card
- Technical report
- Research paper
- Evaluation results
Missing components
- Data preprocessing code
- Model parameters (Intermediate)
Description
Aquila-VL-2B is a vision-language model (VLM) trained with the LLaVA-OneVision framework. Qwen2.5-1.5B-Instruct serves as the language model, while siglip-so400m-patch14-384 is used as the vision tower.
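As a rough illustration of how such a LLaVA-OneVision-based model is typically used, here is a minimal inference sketch with the Hugging Face `transformers` library. This assumes the model weights are published on the Hub and are compatible with the `LlavaOnevisionForConditionalGeneration` class; the repository id and image path below are illustrative assumptions, not confirmed by this page.

```python
# Hypothetical inference sketch for Aquila-VL-2B.
# Assumptions (not stated on this page): the model is on the Hugging Face Hub
# under the illustrative id below and works with transformers' LLaVA-OneVision
# classes, as LLaVA-OneVision-derived checkpoints commonly do.

def build_conversation(question: str):
    """Build a LLaVA-OneVision-style chat message with one image slot."""
    return [
        {
            "role": "user",
            "content": [
                {"type": "image"},
                {"type": "text", "text": question},
            ],
        }
    ]

def main():
    # Heavy imports and downloads are kept inside main() so the pure helper
    # above can be reused without network access or GPU.
    import torch
    from PIL import Image
    from transformers import AutoProcessor, LlavaOnevisionForConditionalGeneration

    model_id = "BAAI/Aquila-VL-2B-llava-qwen"  # assumed repository id
    processor = AutoProcessor.from_pretrained(model_id)
    model = LlavaOnevisionForConditionalGeneration.from_pretrained(
        model_id, torch_dtype=torch.float16, device_map="auto"
    )

    image = Image.open("example.jpg")  # illustrative local image path
    prompt = processor.apply_chat_template(
        build_conversation("Describe this image."), add_generation_prompt=True
    )
    inputs = processor(images=image, text=prompt, return_tensors="pt").to(model.device)
    out = model.generate(**inputs, max_new_tokens=64)
    print(processor.decode(out[0], skip_special_tokens=True))

if __name__ == "__main__":
    main()
```

The conversation format (interleaved `image` and `text` content entries) is the part most specific to LLaVA-OneVision-style processors; the rest follows the standard `transformers` generate workflow.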
Version/Parameters
2.18B
Organization
Beijing Academy of Artificial Intelligence (BAAI)
Type
Multimodal model
Status
Approved
Architecture
Transformer (Decoder-only)
Treatment
Instruct fine-tuned
Base model
Qwen2.5-1.5B-instruct
Last updated