API Call Details
GET /health
HEAD /health
GET /healthz
Returns the server's current running status.
Call Example:
$ curl http://0.0.0.0:8080/health
Output Example:
{"message":"Ok"}
GET /token_load
Returns the server's current token load.
Call Example:
$ curl http://0.0.0.0:8080/token_load
Output Example:
{"current_load":0.0,"logical_max_load":0.0,"dynamic_max_load":0.0}
POST /generate
Calls the model to perform text completion.
Call Example:
$ curl http://localhost:8080/generate \
    -H "Content-Type: application/json" \
    -d '{
          "inputs": "What is AI?",
          "parameters": {
            "max_new_tokens": 17,
            "frequency_penalty": 1
          },
          "multimodal_params": {}
        }'
Output Example:
{"generated_text": [" What is the difference between AI and ML? What are the differences between AI and ML"], "count_output_tokens": 17, "finish_reason": "length", "prompt_tokens": 4}
POST /generate_stream
Streams text completion results back as they are generated.
Call Example:
$ curl http://localhost:8080/generate_stream \
    -H "Content-Type: application/json" \
    -d '{
          "inputs": "What is AI?",
          "parameters": {
            "max_new_tokens": 17,
            "frequency_penalty": 1
          },
          "multimodal_params": {}
        }'
Output Example:
data:{"token": {"id": 3555, "text": " What", "logprob": -1.8383026123046875, "special": false, "count_output_tokens": 1, "prompt_tokens": 4}, "generated_text": null, "finished": false, "finish_reason": null, "details": null}
data:{"token": {"id": 374, "text": " is", "logprob": -0.59185391664505, "special": false, "count_output_tokens": 2, "prompt_tokens": 4}, "generated_text": null, "finished": false, "finish_reason": null, "details": null}
data:{"token": {"id": 279, "text": " the", "logprob": -1.5594439506530762, "special": false, "count_output_tokens": 3, "prompt_tokens": 4}, "generated_text": null, "finished": true, "finish_reason": "length", "details": null}
POST /get_score
For reward models: returns a score for a conversation.
Call Example:
import json
import requests

# Conversation rendered in the model's chat template, ending with the reward marker.
query = "<|im_start|>user\nHello! What's your name?<|im_end|>\n<|im_start|>assistant\nMy name is InternLM2! A helpful AI assistant. What can I do for you?<|im_end|>\n<|reward|>"

url = "http://127.0.0.1:8080/get_score"
headers = {'Content-Type': 'application/json'}

data = {
    "chat": query,
    "parameters": {
        "frequency_penalty": 1
    }
}

response = requests.post(url, headers=headers, data=json.dumps(data))

if response.status_code == 200:
    print(f"Result: {response.json()}")
else:
    print(f"Error: {response.status_code}, {response.text}")
Output Example:
Result: {'score': 0.4892578125, 'prompt_tokens': 39, 'finish_reason': 'stop'}
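The `chat` string in the example follows a fixed pattern, so it can be assembled by a small helper instead of being written by hand. This sketch assumes the single-turn template shown in the example above. `build_reward_query` is a hypothetical name, and other reward models may expect a different template.

```python
def build_reward_query(user_msg: str, assistant_msg: str) -> str:
    """Render one user/assistant turn in the template used by the example above."""
    return (
        f"<|im_start|>user\n{user_msg}<|im_end|>\n"
        f"<|im_start|>assistant\n{assistant_msg}<|im_end|>\n"
        "<|reward|>"
    )


# Reproduces the `query` string from the call example.
query = build_reward_query(
    "Hello! What's your name?",
    "My name is InternLM2! A helpful AI assistant. What can I do for you?",
)
```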