API Call Details#

GET /health#

HEAD /health#

GET /healthz#

Get the current server running status

Call Example:

$ curl http://0.0.0.0:8080/health

Output Example:

{"message":"Ok"}

GET /token_load#

Get the current server token usage status

Call Example:

$ curl http://0.0.0.0:8080/token_load

Output Example:

{"current_load":0.0,"logical_max_load":0.0,"dynamic_max_load":0.0}

POST /generate#

Call the model to implement text completion

Call Example:

$ curl http://localhost:8080/generate \
$ -H "Content-Type: application/json" \
$ -d '{
$      "inputs": "What is AI?",
$      "parameters":{
$        "max_new_tokens":17,
$        "frequency_penalty":1
$      },
$      "multimodal_params":{}
$     }'

Output Example:

{"generated_text": [" What is the difference between AI and ML? What are the differences between AI and ML"], "count_output_tokens": 17, "finish_reason": "length", "prompt_tokens": 4}

POST /generate_stream#

Stream return text completion results

Call Example:

$ curl http://localhost:8080/generate_stream \
$ -H "Content-Type: application/json" \
$ -d '{
$      "inputs": "What is AI?",
$      "parameters":{
$        "max_new_tokens":17,
$        "frequency_penalty":1
$      },
$      "multimodal_params":{}
$     }'

Output Example:

data:{"token": {"id": 3555, "text": " What", "logprob": -1.8383026123046875, "special": false, "count_output_tokens": 1, "prompt_tokens": 4}, "generated_text": null, "finished": false, "finish_reason": null, "details": null}

data:{"token": {"id": 374, "text": " is", "logprob": -0.59185391664505, "special": false, "count_output_tokens": 2, "prompt_tokens": 4}, "generated_text": null, "finished": false, "finish_reason": null, "details": null}

data:{"token": {"id": 279, "text": " the", "logprob": -1.5594439506530762, "special": false, "count_output_tokens": 3, "prompt_tokens": 4}, "generated_text": null, "finished": true, "finish_reason": "length", "details": null}

POST /get_score#

Reward model, get conversation score

Call Example:

import json
import requests

query = "<|im_start|>user\nHello! What's your name?<|im_end|>\n<|im_start|>assistant\nMy name is InternLM2! A helpful AI assistant. What can I do for you?<|im_end|>\n<|reward|>"

url = "http://127.0.0.1:8080/get_score"
headers = {'Content-Type': 'application/json'}

data = {
    "chat": query,
    "parameters": {
        "frequency_penalty":1
    }
}
response = requests.post(url, headers=headers, data=json.dumps(data))

if response.status_code == 200:
    print(f"Result: {response.json()}")
else:
    print(f"Error: {response.status_code}, {response.text}")

Output Example:

Result: {'score': 0.4892578125, 'prompt_tokens': 39, 'finish_reason': 'stop'}