
llava-1.5-7b-hf Beta

Image-to-Text · llava-hf
@cf/llava-hf/llava-1.5-7b-hf

LLaVA is an open-source chatbot trained by fine-tuning LLaMA/Vicuna on GPT-generated multimodal instruction-following data. It is an auto-regressive language model, based on the transformer architecture.

    Usage

    Workers - TypeScript

    export interface Env {
      AI: Ai;
    }

    export default {
      async fetch(request: Request, env: Env): Promise<Response> {
        // Fetch a sample image and read it as raw bytes.
        const res = await fetch("https://cataas.com/cat");
        const blob = await res.arrayBuffer();
        const input = {
          // Image bytes as an array of 8-bit integers.
          image: [...new Uint8Array(blob)],
          prompt: "Generate a caption for this image",
          max_tokens: 512,
        };
        const response = await env.AI.run(
          "@cf/llava-hf/llava-1.5-7b-hf",
          input
        );
        return new Response(JSON.stringify(response));
      },
    } satisfies ExportedHandler<Env>;
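
    The same binding can also caption an image supplied by the client rather than one fetched by the Worker. The variant below is a minimal sketch that assumes the raw image bytes arrive as the body of a POST request; the model name and input shape are the same as above.

    export interface Env {
      AI: Ai;
    }

    export default {
      async fetch(request: Request, env: Env): Promise<Response> {
        // Sketch: expects the raw image bytes as the POST body.
        if (request.method !== "POST") {
          return new Response("POST an image to receive a caption", { status: 405 });
        }
        const bytes = new Uint8Array(await request.arrayBuffer());
        const response = await env.AI.run("@cf/llava-hf/llava-1.5-7b-hf", {
          image: [...bytes],
          prompt: "Generate a caption for this image",
          max_tokens: 512,
        });
        return new Response(JSON.stringify(response));
      },
    } satisfies ExportedHandler<Env>;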

    Parameters

    Input

    • 0 string

      Binary string representing the image contents.

    • 1 object

      • temperature number

        Controls the randomness of the output; higher values produce more random results.

      • prompt string

        The input text prompt for the model to generate a response.

      • raw boolean

        If true, a chat template is not applied and you must adhere to the specific model's expected formatting.

      • image

        • 0 array

          An array of integers that represent the image data constrained to 8-bit unsigned integer values

          • items number

            A value between 0 and 255

        • 1 string

          Binary string representing the image contents.

      • max_tokens integer default 512

        The maximum number of tokens to generate in the response.
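
    Putting these together, the object form of the input combines the prompt, the image bytes, and the optional controls listed above. A minimal sketch, assuming the image bytes are already available as a Uint8Array (the helper name and parameter values are illustrative, not part of the API):

    // Illustrative helper; the function name and parameter values are examples only.
    async function captionImage(ai: Ai, imageBytes: Uint8Array) {
      const input = {
        prompt: "Describe the objects in this image",
        image: [...imageBytes], // integers constrained to 0-255
        temperature: 0.2,       // lower values give more deterministic output
        raw: false,             // keep the default chat template
        max_tokens: 256,
      };
      return ai.run("@cf/llava-hf/llava-1.5-7b-hf", input);
    }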

    Output

    • description string
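
    The response is an object whose description field holds the generated caption. A short sketch of unpacking it (the type assertion is an illustrative assumption, not part of the official typings):

    // Minimal sketch: pull the caption out of the model output.
    function getCaption(output: unknown): string {
      return (output as { description: string }).description;
    }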

    API Schemas

    The following input schema is based on JSON Schema.

    {
      "oneOf": [
        {
          "type": "string",
          "format": "binary",
          "description": "Binary string representing the image contents."
        },
        {
          "type": "object",
          "properties": {
            "temperature": {
              "type": "number",
              "description": "Controls the randomness of the output; higher values produce more random results."
            },
            "prompt": {
              "type": "string",
              "description": "The input text prompt for the model to generate a response."
            },
            "raw": {
              "type": "boolean",
              "default": false,
              "description": "If true, a chat template is not applied and you must adhere to the specific model's expected formatting."
            },
            "image": {
              "oneOf": [
                {
                  "type": "array",
                  "description": "An array of integers that represent the image data constrained to 8-bit unsigned integer values",
                  "items": {
                    "type": "number",
                    "description": "A value between 0 and 255"
                  }
                },
                {
                  "type": "string",
                  "format": "binary",
                  "description": "Binary string representing the image contents."
                }
              ]
            },
            "max_tokens": {
              "type": "integer",
              "default": 512,
              "description": "The maximum number of tokens to generate in the response."
            }
          },
          "required": [
            "image"
          ]
        }
      ]
    }
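
    For TypeScript users, the input schema above corresponds roughly to the following union type. This is a sketch for reference only; the type name is illustrative and not part of the Workers AI typings.

    // Illustrative mirror of the input schema; the type name is not official.
    type LlavaInput =
      | string // binary string representing the image contents
      | {
          prompt?: string;
          temperature?: number;
          raw?: boolean;             // defaults to false
          max_tokens?: number;       // defaults to 512
          image: number[] | string;  // 8-bit values (0-255) or a binary string
        };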