
Quick Start Guide: Integrating Ollama with CodeGeeX4-ALL-9B

Updated 2025-01-03 11:39

Quick Start

Ollama Tutorial

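Ollama is also an open-source project and one of the best options for quickly running open-source large models locally. CodeGeeX4-ALL-9B gained Ollama support less than 24 hours after it was open-sourced, and it has since been downloaded through Ollama more than 10,000 times.

Configuration on Mac:

Installation and use are simple. Follow the steps below to try it yourself:

First, install the Ollama open-source project on your computer. Ollama 0.2 or later is recommended. One-click installers are available for macOS and Windows, and Linux only requires running a single command.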

After installation, open a terminal and type ollama. If Ollama prints its usage information, it has been installed successfully.
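The same check can be scripted. The following is a minimal sketch, assuming that `ollama --version` prints a line containing a dotted version number (for example "ollama version is 0.2.1"); the helper names `parse_version`, `installed_ollama_version`, and `meets_minimum` are hypothetical and only serve this illustration.

```python
import re
import shutil
import subprocess

MIN_VERSION = (0, 2)  # this guide recommends Ollama 0.2 or later


def parse_version(text):
    """Extract the first dotted version number in text as a tuple of ints."""
    match = re.search(r"(\d+(?:\.\d+)+)", text)
    if match is None:
        return None
    return tuple(int(part) for part in match.group(1).split("."))


def installed_ollama_version():
    """Return the installed Ollama version, or None if it cannot be determined."""
    if shutil.which("ollama") is None:
        return None  # the binary is not on PATH
    result = subprocess.run(
        ["ollama", "--version"], capture_output=True, text=True, check=False
    )
    # Assumption: the version appears somewhere in stdout or stderr
    return parse_version(result.stdout + result.stderr)


def meets_minimum(version, minimum=MIN_VERSION):
    """True when the version is known and at least the recommended minimum."""
    return version is not None and version >= minimum
```

Calling `meets_minimum(installed_ollama_version())` returns False both when Ollama is missing and when it is older than 0.2, so either case is a signal to reinstall.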

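Next, open the Ollama website and search for CodeGeeX4.

Open the detail page to see the description of the CodeGeeX4-ALL-9B model and the commands for using it, then copy the run command.

Run the copied command in the terminal to start installing the CodeGeeX4-ALL-9B model.

When the terminal shows the prompt "Send a message", CodeGeeX4-ALL-9B has been installed on your computer, and you can type questions right there to chat with it.

Next, follow the steps below to connect CodeGeeX4-ALL-9B to the local mode of your CodeGeeX plugin.

The steps are the same on both platforms.

In the plugin marketplace of VSCode or any JetBrains IDE, search for CodeGeeX and click to download and install the plugin.

Configure the environment variable required for cross-origin requests. In the terminal, enter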

export OLLAMA_ORIGINS="*"

or

launchctl setenv OLLAMA_ORIGINS "*"

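to set the environment variable. On Windows, watch the video below to learn how to configure the environment variable manually.

After setting it, restart the Ollama service and the IDE (VSCode or another environment) so that the environment variable takes effect.

Start CodeGeeX4. In the terminal, enter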

ollama serve
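If the variable does not seem to reach the server, one option is to launch the server from a script that passes the variable explicitly. This is a minimal sketch, assuming no other Ollama instance is already running on the machine; `serve_environment` and `start_ollama` are hypothetical helper names.

```python
import os
import subprocess


def serve_environment(origins="*"):
    """Copy the current environment and add OLLAMA_ORIGINS for the server process."""
    env = dict(os.environ)
    env["OLLAMA_ORIGINS"] = origins
    return env


def start_ollama(origins="*"):
    """Start `ollama serve` as a child process that sees OLLAMA_ORIGINS."""
    return subprocess.Popen(["ollama", "serve"], env=serve_environment(origins))
```

`server = start_ollama()` starts the service, and `server.terminate()` stops it again. Because the variable is set only for the child process, the shell's own environment is left untouched.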

Open a new terminal and enter

ollama run codegeex4

Configure the endpoint address. In the local mode settings of the CodeGeeX plugin, enter the model address:

http://localhost:11434/v1/chat/completions

Open the advanced mode of the model configuration and enter the following in the model name field

codegeex4

You can now enjoy the coding experience that CodeGeeX4 provides locally!
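If the plugin does not respond, it can help to query the same address directly. The sketch below assumes the endpoint accepts an OpenAI-style chat completion request and returns the reply under `choices[0].message.content`; `build_request` and `ask` are hypothetical helper names, and the 120-second timeout is an arbitrary choice.

```python
import json
import urllib.request

ENDPOINT = "http://localhost:11434/v1/chat/completions"  # address from this guide
MODEL = "codegeex4"  # model name from this guide


def build_request(prompt, endpoint=ENDPOINT, model=MODEL):
    """Build an OpenAI-style chat completion request without sending it."""
    payload = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        endpoint,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(prompt, timeout=120):
    """Send the prompt to the local server and return the reply text."""
    with urllib.request.urlopen(build_request(prompt), timeout=timeout) as response:
        body = json.load(response)
    # Assumption: the response follows the OpenAI chat completion layout
    return body["choices"][0]["message"]["content"]
```

With the Ollama service running, `print(ask("write a quick sort"))` should print the model's answer. A connection error here points at the service or the address rather than at the plugin.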

For more model deployment tutorials, see the tutorials and demos in the CodeGeeX4 repository on GitHub. If you like our project and find it helpful, please give CodeGeeX4 a Star on GitHub!

Huggingface Transformers

Use 4.39.0<=transformers<=4.40.2 to deploy codegeex4-all-9b:

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

# Use the GPU when one is available, otherwise fall back to the CPU
device = "cuda" if torch.cuda.is_available() else "cpu"
tokenizer = AutoTokenizer.from_pretrained("THUDM/codegeex4-all-9b", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    "THUDM/codegeex4-all-9b",
    torch_dtype=torch.bfloat16,
    low_cpu_mem_usage=True,
    trust_remote_code=True
).to(device).eval()
# Format the conversation with the model's chat template
inputs = tokenizer.apply_chat_template([{"role": "user", "content": "write a quick sort"}], add_generation_prompt=True, tokenize=True, return_tensors="pt", return_dict=True).to(device)
with torch.no_grad():
    outputs = model.generate(**inputs)
    # Keep only the newly generated tokens, dropping the prompt
    outputs = outputs[:, inputs['input_ids'].shape[1]:]
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))
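Because the supported transformers range is narrow, it may be worth checking the installed version before loading the model. This is a minimal sketch using only the standard library; the helper names `as_tuple`, `in_supported_range`, and `installed_transformers_ok` are hypothetical, and pre-release suffixes such as `.dev0` are simply ignored.

```python
from importlib import metadata

LOWEST, HIGHEST = (4, 39, 0), (4, 40, 2)  # range stated in this guide


def as_tuple(version):
    """Turn '4.40.2' into (4, 40, 2), stopping at the first non-numeric piece."""
    parts = []
    for piece in version.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    while len(parts) < 3:  # pad short versions such as '4.40' with zeros
        parts.append(0)
    return tuple(parts)


def in_supported_range(version):
    """True when LOWEST <= version <= HIGHEST."""
    return LOWEST <= as_tuple(version) <= HIGHEST


def installed_transformers_ok():
    """Check the installed transformers package; None means it is not installed."""
    try:
        return in_supported_range(metadata.version("transformers"))
    except metadata.PackageNotFoundError:
        return None
```

If `installed_transformers_ok()` returns False, installing a version inside the range above is the first thing to try before debugging the model code itself.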

vLLM

Use vllm==0.5.1 to quickly launch codegeex4-all-9b:

from transformers import AutoTokenizer
from vllm import LLM, SamplingParams

# CodeGeeX4-ALL-9B
# max_model_len, tp_size = 1048576, 4
# If you run out of memory (OOM), reduce max_model_len or increase tp_size
max_model_len, tp_size = 131072, 1
model_name = "codegeex4-all-9b"
prompt = [{"role": "user", "content": "Hello"}]

tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
llm = LLM(
    model=model_name,
    tensor_parallel_size=tp_size,
    max_model_len=max_model_len,
    trust_remote_code=True,
    enforce_eager=True,
    # If you run out of memory (OOM), try the following parameters
    # enable_chunked_prefill=True,
    # max_num_batched_tokens=8192
)
stop_token_ids = [151329, 151336, 151338]
sampling_params = SamplingParams(temperature=0.95, max_tokens=1024, stop_token_ids=stop_token_ids)

inputs = tokenizer.apply_chat_template(prompt, tokenize=False, add_generation_prompt=True)
outputs = llm.generate(prompts=inputs, sampling_params=sampling_params)

print(outputs[0].outputs[0].text)

To set up an OpenAI-compatible service with vllm, run the command below. For details, see the OpenAI-compatible server documentation:

python -m vllm.entrypoints.openai.api_server \
    --model THUDM/codegeex4-all-9b \
    --trust_remote_code
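Once the server is up, a quick way to confirm that it is serving the model is to list the models it exposes. The sketch below assumes the server listens on vLLM's default address `http://localhost:8000` and exposes the OpenAI-style `/v1/models` route; `model_ids` and `served_models` are hypothetical helper names.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # assumption: vLLM's default host and port


def model_ids(listing):
    """Extract the model ids from an OpenAI-style /v1/models response body."""
    return [entry["id"] for entry in listing.get("data", [])]


def served_models(base_url=BASE_URL, timeout=10):
    """Ask the running server which models it serves."""
    with urllib.request.urlopen(f"{base_url}/v1/models", timeout=timeout) as response:
        return model_ids(json.load(response))
```

With the command above running, `served_models()` is expected to include `THUDM/codegeex4-all-9b`, which is also the value to pass as `model` in chat completion requests.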
