# inference

## Index

- [Schemas](#schemas)
  - [Inference](#inference)

## Schemas

### Inference
Inference is a module schema that defines the configuration of a model inference service, including the model to serve, the serving framework, and optional generation parameters.
#### Attributes
| name | type | description | default value |
|---|---|---|---|
| **framework** *required* | "Ollama" \| "KubeRay" | The framework or environment in which the model operates. | |
| **model** *required* | str | The model name to be used for inference. | |
| **num_ctx** | int | The size of the context window used to generate the next token. | 2048 |
| **num_predict** | int | The maximum number of tokens to predict when generating text. | 128 |
| **system** | str | The system message, which will be set in the template. | "" |
| **temperature** | float | A parameter that determines whether the model's output is more random and creative or more predictable. | 0.8 |
| **template** | str | The full prompt template, which will be sent to the model. | "" |
| **top_k** | int | A higher value (e.g. 100) will give more diverse answers, while a lower value (e.g. 10) will be more conservative. | 40 |
| **top_p** | float | A higher value (e.g. 0.9) will give more diverse answers, while a lower value (e.g. 0.5) will be more conservative. | 0.9 |
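
As a minimal sketch, only the two required attributes need to be set; every other attribute falls back to the defaults listed above. The module key `"inference@v0.1.0"` here follows the full example in the next section:

```
import inference.v1.infer

accessories: {
    "inference@v0.1.0": infer.Inference {
        # Only the required attributes are set; num_ctx, num_predict,
        # system, temperature, template, top_k, and top_p keep the
        # default values from the table above.
        model: "llama3"
        framework: "Ollama"
    }
}
```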
#### Examples
```
import inference.v1.infer

accessories: {
    "inference@v0.1.0": infer.Inference {
        model: "llama3"
        framework: "Ollama"
        system: "You are Mario from super mario bros, acting as an assistant."
        template: "{{ if .System }}<|im_start|>system {{ .System }}<|im_end|> {{ end }}{{ if .Prompt }}<|im_start|>user {{ .Prompt }}<|im_end|> {{ end }}<|im_start|>assistant"
        top_k: 40
        top_p: 0.9
        temperature: 0.8
        num_predict: 128
        num_ctx: 2048
    }
}
```
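
In the template above, the Go-template placeholders `{{ .System }}` and `{{ .Prompt }}` follow Ollama's template syntax: at request time they are replaced with the `system` message and the user's prompt, respectively, before the assembled prompt is sent to the model.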