AI LLM Models

Last updated: 2024-10-10


Llama

 

Llama 3.1

  • 8B, 4.7G
  • 70B, 40G
  • 405B, 229G

Context length: 128K

Llama 3

Publicly released in 8b (4.7G) and 70b (40G) versions; the 405b (229G) model belongs to Llama 3.1.

ollama pull llama3               # gets the 8b version by default
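
After pulling a model it can also be called from Python. A minimal sketch, assuming the Ollama server is running locally and the official ollama Python package is installed (pip install ollama); the prompt text is made up for illustration:

```python
# Minimal sketch: chat with a locally pulled Llama 3 model via the ollama Python client.
# Assumes `ollama serve` is running and the model tag below has already been pulled.
import ollama

response = ollama.chat(
    model="llama3",  # or another pulled tag, e.g. "llama3.1:70b"
    messages=[{"role": "user", "content": "Explain in one sentence what a context window is."}],
)
print(response["message"]["content"])
```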

 


Code Llama

 

Web

Code Llama is a code-specialized version of Llama 2 that was created by further training Llama 2 on its code-specific datasets, sampling more data from that same dataset for longer.

Size

  • 70B     131GB
  • 34B     63GB
  • 13B     24GB
  • 7B       ~12.55GB

Variants

e.g.

  • 7b-instruct    # natural language
  • 7b-code        # Base model for code completion
  • 7b-python     # fine-tuned on 100B tokens of Python code

Example prompts

  • Instruct (default)
  • Code completion
  • Python

Instruct

# It is trained to output human-like answers to questions (closest to ChatGPT)

ollama run codellama "Where is the bug in this code? $(cat fib.py)"

ollama run codellama "write a unit test for this function: $(cat fib.py)"

 

ollama run codellama 'You are an expert programmer that writes simple,
concise code and explanations. Write a python function to generate the nth fibonacci number.'

 

Code completion

Generate by comment

# generate subsequent tokens based on the provided prompt

ollama run codellama:7b-code '# A simple python function to remove whitespace from a string:'

Fill-in-the-middle (FIM)

# the model can complete code between two already-written code blocks

Format: <PRE> {prefix} <SUF>{suffix} <MID>

e.g.

def compute_gcd(x, y):
    <FILL>
    return result

which corresponds to:

ollama run codellama:7b-code '<PRE> def compute_gcd(x, y): <SUF>return result <MID>'
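
The same fill-in-the-middle prompt can be sent from Python. A minimal sketch, assuming the ollama Python package is installed and codellama:7b-code has been pulled:

```python
# Minimal FIM sketch: ask codellama:7b-code to fill in the function body
# between a prefix and a suffix. Assumes `ollama pull codellama:7b-code` was run.
import ollama

prefix = "def compute_gcd(x, y):\n    "
suffix = "\n    return result"
prompt = f"<PRE> {prefix} <SUF>{suffix} <MID>"

out = ollama.generate(model="codellama:7b-code", prompt=prompt)
print(out["response"])  # the generated middle, e.g. a loop that computes the GCD into `result`
```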

Python

fine-tuned on 100B additional Python tokens

 


Llama2-Chinese

 

Web

 


Gemma
 

It is the open-source counterpart of Gemini, developed by Google.

Model: lightweight text2text model

https://ai.google.dev/gemma

gemma2

https://ollama.com/library/gemma2

  • 2b        # 1.6GiB
  • 9b        # 5.4 GiB (Default)
  • 27b      # 16GiB

gemma(v1)

https://ollama.com/library/gemma

It comes in two versions:

  • 2b   # recommended for mobile devices (1.7 GiB)
  • 7b   # recommended for desktop computers (5 GiB)

 


Phi

 

3.5

3.8b, 2.2G

Context length: 128K tokens

Optimization

  • supervised fine-tuning
  • proximal policy optimization
  • direct preference optimization

v3

There are two versions: 3.8B (2.2G) and 14B (7.9G).

 


 

 


Qwen

 

Tongyi Qianwen (通义千问), by Alibaba Cloud

Qwen2

https://ollama.com/library/qwen2

  • 7b (Default)     # 4.4 GiB
  • 72b                 # 41 GiB

Qwen 1.5

 

Seven model sizes: 0.5B, 1.8B, 4B (default), 7B, 14B, 32B (new), and 72B

 


mixtral

 

A set of Mixture of Experts (MoE) models with open weights, released by Mistral AI in 8x7b and 8x22b parameter sizes.

  • It has strong maths and coding capabilities
  • It is natively capable of function calling (see the sketch below)
  • A 64K-token context window allows precise information recall from large documents
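
Function calling can be exercised with plain prompting: describe the available function and ask the model to answer with a JSON call. A minimal sketch, assuming the ollama Python package and a pulled mixtral:8x7b; the get_weather function and its schema are hypothetical, purely for illustration:

```python
# Minimal sketch of prompt-based function calling with mixtral.
# The get_weather function and its schema are hypothetical.
import json
import ollama

system = (
    "You can call this function:\n"
    "  get_weather(city: str) -> {\"temp_c\": float}\n"
    "Reply ONLY with JSON of the form "
    "{\"function\": \"get_weather\", \"arguments\": {\"city\": \"...\"}}"
)

reply = ollama.chat(
    model="mixtral:8x7b",
    messages=[
        {"role": "system", "content": system},
        {"role": "user", "content": "What is the weather in Taipei right now?"},
    ],
)

call = json.loads(reply["message"]["content"])  # may need cleanup if the model adds extra text
print(call["function"], call["arguments"])
```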

 


Yi-Coder

 

Link

Supports 52 major programming languages.

Context length: 128K tokens.

Features

  • Code Completion
  • Code Insertion
  • Repo Q&A
  • A Powerful Natural Language to SQL Converter

Size

  • 9b (default)      # 5G
  • 1.5b              # 866M

Usage Example

System Prompt:

You are Yi-Coder, you are exceptionally skilled in programming, coding, and any computer-related issues.

[1] Code generation

Write a quick sort algorithm.
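
A minimal sketch of sending the system prompt above together with prompt [1], assuming the ollama Python package and a pulled yi-coder (9b) model:

```python
# Minimal sketch: Yi-Coder with the system prompt above plus a user request.
# Assumes `ollama pull yi-coder` was run.
import ollama

system = ("You are Yi-Coder, you are exceptionally skilled in programming, "
          "coding, and any computer-related issues.")

reply = ollama.chat(
    model="yi-coder",
    messages=[
        {"role": "system", "content": system},
        {"role": "user", "content": "Write a quick sort algorithm."},
    ],
)
print(reply["message"]["content"])
```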

[2] To identify errors and insert the correct code to fix them

prompt = """
```python
def quick_sort(arr):
    if len(arr) <= 1:
        return arr
    else:
        pivot = arr[len(arr) // 2]
        left = [x for x in arr if x < pivot]

        right = [x for x in arr if x > pivot]
        return quick_sort(left) + middle + quick_sort(right)

print(quick_sort([3,6,8,10,1,2,1]))
# Prints "[1, 1, 2, 3, 6, 8, 10]"
```
Is there a problem with this code?
"""

[3] Natural language to SQL

Key components (sketched below):

  • NL2SQLConverter
  • DatabaseManager
  • Main Function

Example queries:

  • Count the number of orders for each city
  • Who are the top 5 users with the most orders
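
A minimal sketch of how these components could fit together; NL2SQLConverter and DatabaseManager are hypothetical helper classes built on the ollama Python package and sqlite3, and the orders table schema is made up for illustration:

```python
# Minimal NL2SQL sketch: hypothetical helpers, illustrative schema only.
import sqlite3
import ollama


class DatabaseManager:
    """Tiny wrapper around an in-memory SQLite database (illustrative only)."""

    def __init__(self):
        self.conn = sqlite3.connect(":memory:")
        self.conn.execute("CREATE TABLE orders (id INTEGER, user TEXT, city TEXT)")

    def run(self, sql):
        return self.conn.execute(sql).fetchall()


class NL2SQLConverter:
    """Asks Yi-Coder to turn a natural-language question into one SQLite query."""

    def __init__(self, model="yi-coder"):
        self.model = model

    def to_sql(self, question, schema):
        prompt = (f"Schema:\n{schema}\n"
                  f"Write one SQLite query that answers: {question}\n"
                  "Return only the SQL, with no explanation.")
        reply = ollama.generate(model=self.model, prompt=prompt)
        return reply["response"].strip().strip("`")


def main():
    db = DatabaseManager()
    converter = NL2SQLConverter()
    schema = "orders(id INTEGER, user TEXT, city TEXT)"
    for question in ("Count the number of orders for each city",
                     "Who are the top 5 users with the most orders"):
        sql = converter.to_sql(question, schema)
        print(question, "->", sql)
        print(db.run(sql))  # model output is not validated here; this is only a sketch


if __name__ == "__main__":
    main()
```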

 


Starcoder

 

Transparently trained open code LLMs.

starcoder2

https://github.com/bigcode-project/starcoder2

A context window of 16,384 tokens, with sliding-window attention of 4,096 tokens.

StarCoder2 models are intended for code completion; they are not instruction models, so commands like "Write a function that computes the square root." do not work well.
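
Since these are completion models, the natural way to use them is to hand over a code prefix and let the model continue it. A minimal sketch, assuming the ollama Python package and a pulled starcoder2:3b:

```python
# Minimal sketch: code completion with starcoder2, a base model that continues a prefix.
# Assumes `ollama pull starcoder2:3b` was run.
import ollama

prefix = "def fibonacci(n):\n    "
out = ollama.generate(model="starcoder2:3b", prompt=prefix)
print(prefix + out["response"])
```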

  • 15b               # 9.1G (trained on 600+ programming languages)
  • 7b                # 4G (17 languages)
  • 3b (default)      # 1.7G (17 languages)
  • instruct          # 9.1G (follows natural and human-written instructions)

 


deepseek-coder

 

deepseek-coder-v2

https://ollama.com/library/deepseek-coder-v2

  • 16b (Default)     # 9 GiB
  • 236b                 # 133 GiB

 

 

 
