Let's start by submitting your first request to Craylm.

Setup

Clone the Craylm repository and start the server.

git clone [email protected]:cray-lm/cray-lm.git
cd cray-lm
./cray up

This will bring up the Craylm development server on localhost:8000, which exposes an OpenAI-compatible API.

Your first request

curl http://localhost:8000/v1/openai/chat/completions \
    -H "Content-Type: application/json" \
    -d '{
        "model": "masint/tiny-random-llama",
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": "Who won the world series in 2020?"}
        ]
    }'
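
Because the endpoint is OpenAI-compatible, the official openai Python package should also work against the local server. The snippet below is a minimal sketch, assuming openai v1.x is installed and that the local server does not validate the API key (neither is confirmed by this guide).

from openai import OpenAI

# Point the client at the local Craylm server; the key is a placeholder
client = OpenAI(base_url="http://localhost:8000/v1/openai", api_key="not-needed")

response = client.chat.completions.create(
    model="masint/tiny-random-llama",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Who won the world series in 2020?"},
    ],
)

print(response.choices[0].message.content)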

Using the Python client

You can also use the masint Python client to interact with the local Craylm server.


import masint

# Make sure to set the API URL to the local Craylm server
masint.api_url = "http://localhost:8000"

def get_dataset():
    dataset = []

    count = 4

    for i in range(count):
        dataset.append(f"What is {i} + {i}?")

    return dataset


# Create a client for the local Craylm server
llm = masint.SupermassiveIntelligence()

# Build the list of prompts
dataset = get_dataset()

# Submit all of the prompts to the server and collect the completions
results = llm.generate(prompts=dataset)

print(results)
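
The exact shape of results isn't documented in this guide. Assuming generate returns one completion per prompt, in order (an assumption, not confirmed here), you can pair each prompt with its output:

# Assumes results is a sequence with one completion per prompt, in order
for prompt, result in zip(dataset, results):
    print(f"{prompt} -> {result}")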

Loading a different model

Edit the file cray-lm/infra/cray_infra/util/default_config.py and set the model field to the model you want to serve.

model = "meta-llama/Llama-3.2-1B-Instruct"

Then restart the server.

./cray up

Submitting a request to the new model

curl http://localhost:8000/v1/openai/chat/completions \
    -H "Content-Type: application/json" \
    -d '{
        "model": "meta-llama/Llama-3.2-1B-Instruct",
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": "Who won the world series in 2020?"}
        ]
    }'
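
The response body is not shown in this guide, but an OpenAI-compatible chat completion typically looks something like this (illustrative values only):

{
    "id": "chatcmpl-...",
    "object": "chat.completion",
    "model": "meta-llama/Llama-3.2-1B-Instruct",
    "choices": [
        {
            "index": 0,
            "message": {
                "role": "assistant",
                "content": "The Los Angeles Dodgers won the 2020 World Series."
            },
            "finish_reason": "stop"
        }
    ]
}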