AI LLM Models

Last updated: 2024-10-10


Llama

 

Llama 3.1

  • 8B, 4.7G
  • 70B, 40G
  • 405B, 229G

Context length: 128K

Llama 3

Publicly released in 8b (4.7G) and 70b (40G) versions; the 405b (229G) model belongs to Llama 3.1.

ollama pull llama3               # gets the 8b version by default
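
After pulling a model it can also be called from Python. A minimal sketch, assuming the Ollama server is running locally and the official ollama Python package is installed (pip install ollama); the prompt text is made up for illustration:

```python
# Minimal sketch: chat with a locally pulled Llama 3 model via the ollama Python client.
# Assumes `ollama serve` is running and the model tag below has already been pulled.
import ollama

response = ollama.chat(
    model="llama3",  # or another pulled tag, e.g. "llama3.1:70b"
    messages=[{"role": "user", "content": "Explain in one sentence what a context window is."}],
)
print(response["message"]["content"])
```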

 


Code Llama

 

Web

Code Llama is a code-specialized version of Llama 2 that was created by further training Llama 2 on its code-specific datasets, sampling more data from that same dataset for longer.

Size

  • 70B     131GB
  • 34B     63GB
  • 13B     24GB
  • 7B       ~12.55GB

Variants

e.g.

  • 7b-instruct    # natural language
  • 7b-code        # Base model for code completion
  • 7b-python     # fine-tuned on 100B tokens of Python code

Example prompts

  • Instruct (default)
  • Code completion
  • Python

Instruct

# It is trained to output human-like answers to questions (closest to ChatGPT)

ollama run codellama "Where is the bug in this code? $(cat fib.py)"

ollama run codellama "write a unit test for this function: $(cat fib.py)"

 

ollama run codellama 'You are an expert programmer that writes simple,
concise code and explanations. Write a python function to generate the nth fibonacci number.'

 

Code completion

Generate by comment

# generate subsequent tokens based on the provided prompt

ollama run codellama:7b-code '# A simple python function to remove whitespace from a string:'

Fill-in-the-middle (FIM)

# the model can complete code between two already-written code blocks

Format: <PRE> {prefix} <SUF>{suffix} <MID>

e.g.

def compute_gcd(x, y):
    <FILL>
    return result

which corresponds to:

ollama run codellama:7b-code '<PRE> def compute_gcd(x, y): <SUF>return result <MID>'
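
The same fill-in-the-middle prompt can be sent from Python. A minimal sketch, assuming the ollama Python package is installed and codellama:7b-code has been pulled:

```python
# Minimal FIM sketch: ask codellama:7b-code to fill in the function body
# between a prefix and a suffix. Assumes `ollama pull codellama:7b-code` was run.
import ollama

prefix = "def compute_gcd(x, y):\n    "
suffix = "\n    return result"
prompt = f"<PRE> {prefix} <SUF>{suffix} <MID>"

out = ollama.generate(model="codellama:7b-code", prompt=prompt)
print(out["response"])  # the generated middle, e.g. a loop that computes the GCD into `result`
```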

Python

fine-tuned on 100B additional Python tokens

 


Llama2-Chinese

 

Web

 


Gemma
 

It is the open-source counterpart of Gemini, developed by Google.

Model: lightweight text2text model

https://ai.google.dev/gemma

gemma2

https://ollama.com/library/gemma2

  • 2b        # 1.6GiB
  • 9b        # 5.4 GiB (Default)
  • 27b      # 16GiB

gemma(v1)

https://ollama.com/library/gemma

It comes in two versions:

  • 2b   # recommended for mobile devices (1.7 GiB)
  • 7b   # recommended for desktop computers (5 GiB)

 


Phi

 

3.5

3.8b, 2.2G

Context length: 128K tokens

Optimization

  • supervised fine-tuning
  • proximal policy optimization
  • direct preference optimization

v3

There are two versions: 3.8B (2.2G) and 14B (7.9G).

 


 

 


Qwen

 

Tongyi Qianwen (通义千问), by Alibaba Cloud

Qwen2

https://ollama.com/library/qwen2

  • 7b (Default)     # 4.4 GiB
  • 72b                 # 41 GiB

Qwen 1.5

 

Seven model sizes: 0.5B, 1.8B, 4B (default), 7B, 14B, 32B (new), and 72B

 


mixtral

 

A set of Mixture of Experts (MoE) models with open weights, released by Mistral AI in 8x7b and 8x22b parameter sizes.

  • It has strong maths and coding capabilities
  • It is natively capable of function calling (see the sketch below)
  • A 64K-token context window allows precise information recall from large documents
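
Function calling can be exercised with plain prompting: describe the available function and ask the model to answer with a JSON call. A minimal sketch, assuming the ollama Python package and a pulled mixtral:8x7b; the get_weather function and its schema are hypothetical, purely for illustration:

```python
# Minimal sketch of prompt-based function calling with mixtral.
# The get_weather function and its schema are hypothetical.
import json
import ollama

system = (
    "You can call this function:\n"
    "  get_weather(city: str) -> {\"temp_c\": float}\n"
    "Reply ONLY with JSON of the form "
    "{\"function\": \"get_weather\", \"arguments\": {\"city\": \"...\"}}"
)

reply = ollama.chat(
    model="mixtral:8x7b",
    messages=[
        {"role": "system", "content": system},
        {"role": "user", "content": "What is the weather in Taipei right now?"},
    ],
)

call = json.loads(reply["message"]["content"])  # may need cleanup if the model adds extra text
print(call["function"], call["arguments"])
```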

 


Yi-Coder

 

Link

Supports 52 major programming languages.

Context length: 128K tokens.

Features

  • Code Completion
  • Code Insertion
  • Repo Q&A
  • A Powerful Natural Language to SQL Converter

Size

  • 9b (default)      # 5G
  • 1.5b              # 866M

Usage Example

System Prompt:

You are Yi-Coder, you are exceptionally skilled in programming, coding, and any computer-related issues.

[1] Code generation

Write a quick sort algorithm.
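
A minimal sketch of sending the system prompt above together with prompt [1], assuming the ollama Python package and a pulled yi-coder (9b) model:

```python
# Minimal sketch: Yi-Coder with the system prompt above plus a user request.
# Assumes `ollama pull yi-coder` was run.
import ollama

system = ("You are Yi-Coder, you are exceptionally skilled in programming, "
          "coding, and any computer-related issues.")

reply = ollama.chat(
    model="yi-coder",
    messages=[
        {"role": "system", "content": system},
        {"role": "user", "content": "Write a quick sort algorithm."},
    ],
)
print(reply["message"]["content"])
```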

[2] To identify errors and insert the correct code to fix them

prompt = """
```python
def quick_sort(arr):
    if len(arr) <= 1:
        return arr
    else:
        pivot = arr[len(arr) // 2]
        left = [x for x in arr if x < pivot]

        right = [x for x in arr if x > pivot]
        return quick_sort(left) + middle + quick_sort(right)

print(quick_sort([3,6,8,10,1,2,1]))
# Prints "[1, 1, 2, 3, 6, 8, 10]"
```
Is there a problem with this code?
"""

[3] Natural language to SQL

Key components (sketched below):

  • NL2SQLConverter
  • DatabaseManager
  • Main Function

Example queries:

  • Count the number of orders for each city
  • Who are the top 5 users with the most orders
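
A minimal sketch of how these components could fit together; NL2SQLConverter and DatabaseManager are hypothetical helper classes built on the ollama Python package and sqlite3, and the orders table schema is made up for illustration:

```python
# Minimal NL2SQL sketch: hypothetical helpers, illustrative schema only.
import sqlite3
import ollama


class DatabaseManager:
    """Tiny wrapper around an in-memory SQLite database (illustrative only)."""

    def __init__(self):
        self.conn = sqlite3.connect(":memory:")
        self.conn.execute("CREATE TABLE orders (id INTEGER, user TEXT, city TEXT)")

    def run(self, sql):
        return self.conn.execute(sql).fetchall()


class NL2SQLConverter:
    """Asks Yi-Coder to turn a natural-language question into one SQLite query."""

    def __init__(self, model="yi-coder"):
        self.model = model

    def to_sql(self, question, schema):
        prompt = (f"Schema:\n{schema}\n"
                  f"Write one SQLite query that answers: {question}\n"
                  "Return only the SQL, with no explanation.")
        reply = ollama.generate(model=self.model, prompt=prompt)
        return reply["response"].strip().strip("`")


def main():
    db = DatabaseManager()
    converter = NL2SQLConverter()
    schema = "orders(id INTEGER, user TEXT, city TEXT)"
    for question in ("Count the number of orders for each city",
                     "Who are the top 5 users with the most orders"):
        sql = converter.to_sql(question, schema)
        print(question, "->", sql)
        print(db.run(sql))  # model output is not validated here; this is only a sketch


if __name__ == "__main__":
    main()
```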

 


Starcoder

 

Transparently trained open code LLMs.

starcoder2

https://github.com/bigcode-project/starcoder2

A context window of 16,384 tokens, with sliding-window attention of 4,096 tokens.

StarCoder2 models are intended for code completion; they are not instruction models, so commands like "Write a function that computes the square root." do not work well.
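
Since these are completion models, the natural way to use them is to hand over a code prefix and let the model continue it. A minimal sketch, assuming the ollama Python package and a pulled starcoder2:3b:

```python
# Minimal sketch: code completion with starcoder2, a base model that continues a prefix.
# Assumes `ollama pull starcoder2:3b` was run.
import ollama

prefix = "def fibonacci(n):\n    "
out = ollama.generate(model="starcoder2:3b", prompt=prefix)
print(prefix + out["response"])
```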

  • 15b               # 9.1G (trained on 600+ programming languages)
  • 7b                # 4G (17 languages)
  • 3b (default)      # 1.7G (17 languages)
  • instruct          # 9.1G (follows natural and human-written instructions)

 


deepseek-coder

 

deepseek-coder-v2

https://ollama.com/library/deepseek-coder-v2

  • 16b (Default)     # 9 GiB
  • 236b                 # 133 GiB

 

 

 
