所有文章 > API对比报告 > 国内开源AI大模型对比:DeepSeek R1 对比通义千问Max

国内开源AI大模型对比:DeepSeek R1 对比通义千问Max

近年来,国内开源AI大模型发展得特别快,像DeepSeek R1和通义千问Max这样的明星模型,正在让人工智能技术变得更普及、更好用。DeepSeek R1 主打轻便高效,尤其擅长处理图片、文本等多模态任务,特别适合开发者快速部署到实际应用中。而通义千问Max 则是个”大块头”,参数规模超大,还融入了丰富的行业知识,在复杂任务上表现非常强悍。

幂简集成精心整理了一份详尽的对比表格,全面覆盖国内主流开源AI大模型的核心性能指标、API产品特性、价格等关键信息。本文将重点从API产品表格和API接口效果两个角度进行分析,为您提供直观的比较视角,同时诚邀您查阅完整报表,获取更全面的洞察!

国内开源AI大模型基准和规格

AI大模型的规格

Deepseek R1 标志着人工智能技术的一次大胆进步,带来了创新功能,并注重精度。与此同时, 通义千问Max 代表着阿里云在人工智能领域的又一次飞跃,展示了其在多功能性和可靠性方面的显著提升。

在讨论大语言模型的性能时,我们需要关注其核心参数和表现指标。以下表格详细对比了通义千问Max与Deepseek R1两款模型在输入输出能力、输出效率等方面的关键数据,为进一步分析模型性能提供直观依据。

规格通义千问MaxDeepseek R1
输入上下文窗口32k64K
最大输出令牌最大输出约4k token最大输出约8k token
参数数量基于大量公开数据训练基于深度优化的领域数据训练
知识截止截至2024年中期截至2024年后期
发布日期2024年早期2024年中期
每秒输出令牌数约30-50 token/s约50-80 token/s
想了解更多模型、更完整的深度报告内容,点此查看完整报告

该表格比较了通义千问Max和Deepseek R1两种模型:

  • 输入上下文窗口:通义千问Max支持4K-8K token,Deepseek R1支持32K token。
  • 最大输出令牌:通义千问Max最大输出4K token,Deepseek R1最大输出8K token。
  • 参数量:通义千问Max基于大星公开数据训练,Deepseek R1基于深度优化的闭源数据训练。
  • 知识截止:两者知识都截至2024年中。
  • 发布日期:两者均于2024年中发布。
  • 每秒输出令牌数:通义千问Max为30-50 token/s,Deepseek R1为50-80 token/s。

总结:Deepseek R1在上下文窗口、最大输出令牌和每秒输出速度上优于通义千问Max,但两者知识更新和发布日期相同,训练数据来源不同。

AI大模型的基准

该基准测试是两种型号的官方发布说明以及多个开放基准测试的组合。

基准通义千问MaxDeepseek R1
MMLU90.890.8
GPQA6771.5
Human Eval9696.3
CTFs43.0
MATH-5009297.3
想了解更多模型、更完整的深度报告内容,点此查看完整报告

该表格比较了通义千问Max和Deepseek R1两个模型在不同基准测试中的性能表现,数据如下:

说明:

  1. MMLU(本科水平知识):两模型表现相同,均为90.8。
  2. GPQA(研究生级别推理):Deepseek R1(71.5)优于通义千问Max(67)。
  3. Human Eval(编程):Deepseek R1(96.3)略优于通义千问Max(96)。
  4. CTFs(网站安全):仅通义千问Max有数据(43.0),Deepseek R1无数据。
  5. MATH-500(数学问题解决):Deepseek R1(97.3)优于通义千问Max(92)。

总体来看,Deepseek R1在大多数测试中表现略优或持平,通义千问Max在CTFs上有数据但表现较低。

DeepSeek R1模型

国内开源AI大模型实践测试

我们在充满挑战的真实场景中对人工智能模型进行严格测试。我们的评估重点关注对话能力、逻辑推理、数学问题解决能力和编程性能等关键领域,以评估每个模型的优势和劣势。

1. 对话能力测试:

让我们从一个熟悉的挑战开始:很多人都知道,AI 模型在诸如计算单词中特定字母的数量之类的任务上经常遇到困难。让我们看看 DeepSeek R1 和 通义千问Max 如何处理这项任务。(我故意拼错了这个词)

How many letters 'r' are in the word strawbery?

通义千问Max 输出

There are **2** instances of the letter 'r' in "strawbery."

正确答案:

Deepseek R1 输出

The word “strawbery” (as spelled) contains 2 instances of the letter ‘r’. Here’s the breakdown:

S
T
R (first ‘r’)
A
W
B
E
R (second ‘r’)
Y
Note: The correct spelling of the fruit is “strawberry” (with two 'r’s and two 'b’s), but the count here is based on the provided spelling.

正确答案:

点击试用验证更多模型效果

2. 逻辑推理能力测试:

模型将接受一个陈述和几个可能的答案。它的任务是选出正确的答案。

The medieval Arabs had many manuscripts of the ancient Greek. 
When needed, they translate them into Arabic.
Medieval Arab philosophers were very interested in Aristotle's Theory of Poetry,
which was obviously not shared by Arab poets,
because a poet interested in it must want to read Homer's poems.
Aristotle himself often quotes Homer's poems.
However, Homer's poems were not translated into Arabic until modern times.

Which of the following options, if true, strongly supports the above argument?
Please only return the letter in front of your predict correct option, A, B, C or D.

A. Some medieval Arab translators have manuscripts of Homer poems in ancient Greek.

B. Aristotle's Theory of Poetry is often quoted and commented by modern Arab poets.

C. In Aristotle's Theory of Poetry, most of the content is related to drama,
and medieval Arabs also wrote plays and performed them.

D. A series of medieval Arab stories, such as Arab Night,
are very similar to some parts of Homer's epic.
通义千问 MAX模型

通义千问Max 输出

A

正确答案:✅
通义千问Max 成功完成了这项任务

Deepseek R1 输出

A

正确答案:
Deepseek R1 无法成功完成此任务

点击试用验证更多模型效果

3.创造力能力测试:

Write a short poem about friendship

通义千问Max 输出

In life’s journey, paths may twist and turn,
Friendship’s light forever will burn.
A bond so pure, a heart’s true find,
Through every season, love intertwined.

Hand in hand, we face the storm,
With gentle words, a healing balm.
Laughter shared, or silent gaze,
In friendships deep, our souls find ways.

To lift, support, and understand—
An open heart, a helping hand.
So cherish near, each kindred soul,
For friendships make us whole.

正确答案: (4/5)
这是一首扎实、真挚的诗,但在执行上却略显可预测。

Deepseek R1 输出

Bonds Unseen

In laughter’s echo, through silent tears,
A thread weaves softly, spanning years—
Not bound by time, nor duty’s chain,
But roots that hold through wind and rain.

Shared sunsets, secrets whispered low,
A shelter when the cold winds blow.
No score, no count, just open hands,
A map of trust where each soul stands.

When shadows stretch and paths divide,
Your voice becomes my steady guide.
Not perfect, flawed, yet fiercely true,
A mirror showing the best in you.

For friendship’s not a blaze, bright-brief,
But embers glowing, slow, beneath—
A quiet flame that time won’t dim,
The quiet hum of home within.

正确答案:(4.5/5)
这是一首生动且发人深省的诗歌,虽然不太容易理解,但值得深入思考。

很难评判哪首诗更好,因为每首诗都有其独特的优点。此外,对诗歌的看法非常主观,因人而异。不过,我上面的评分只是对每首诗的个人看法。

点击试用验证更多模型效果

4.数学能力测试:

我用几个不同的任务测试了这些模型,它们都成功完成了。然而,GPT 在这个任务上遇到了困难。

The expression 2⋅3⋅4⋅5+1=121, as multiplication is performed before addition. 
However, other values can be obtained by inserting parentheses into the expression.
For example:
(2⋅(3⋅4))⋅(5+1)=144
How many distinct values can be obtained by inserting parentheses into
2⋅3⋅4⋅5+1, without rearranging the terms?

通义千问Max 输出

To solve this problem, we need to analyze the different ways parentheses can be inserted into the expression ( 2 \cdot 3 \cdot 4 \cdot 5 + 1 ), while ensuring that the terms are not rearranged. The goal is to determine how many distinct values can be obtained.

Step 1: Understand the structure of the expression
The original expression is: [ 2 \cdot 3 \cdot 4 \cdot 5 + 1 ] Here, multiplication has higher precedence than addition. Without parentheses, the default evaluation order is: [ (2 \cdot 3 \cdot 4 \cdot 5) + 1 = 120 + 1 = 121 ]

By inserting parentheses, we can change the grouping of terms and thus alter the result. However, the order of the terms ( 2, 3, 4, 5, +1 ) must remain fixed.

Step 2: Count the number of ways to insert parentheses
The expression consists of four multiplications and one addition. To evaluate it, we must group the terms using parentheses. This is equivalent to finding all possible binary trees with four leaves (one for each number ( 2, 3, 4, 5 )) and a single addition operation.

Catalan numbers and binary tree enumeration
The number of ways to parenthesize ( n ) terms in an expression is given by the ( (n-1) )-th Catalan number. For four terms (( n = 4 )), the number of ways to parenthesize them is: [ C_3 = \frac{1}{3+1} \binom{6}{3} = \frac{1}{4} \cdot 20 = 5 ] Thus, there are 5 ways to parenthesize the multiplication part of the expression ( 2 \cdot 3 \cdot 4 \cdot 5 ).

For each of these 5 groupings, the addition ( +1 ) can either be applied last or grouped earlier with some subset of the terms. This introduces additional possibilities.

Step 3: Evaluate all possible parenthesizations
We now systematically evaluate all distinct results that can arise from the different parenthesizations.

Case 1: Default order (no parentheses)
Without any parentheses, the expression evaluates as: [ (2 \cdot 3 \cdot 4 \cdot 5) + 1 = 120 + 1 = 121 ]

Case 2: Grouping two terms at a time
We consider all possible ways to group two terms together first, then proceed with the rest of the computation.

Group ( (2 \cdot 3) ): [ ((2 \cdot 3) \cdot 4 \cdot 5) + 1 = (6 \cdot 4 \cdot 5) + 1 = (24 \cdot 5) + 1 = 120 + 1 = 121 ] Alternatively: [ (2 \cdot 3) \cdot (4 \cdot 5) + 1 = 6 \cdot 20 + 1 = 120 + 1 = 121 ]

Group ( (3 \cdot 4) ): [ (2 \cdot (3 \cdot 4) \cdot 5) + 1 = (2 \cdot 12 \cdot 5) + 1 = (24 \cdot 5) + 1 = 120 + 1 = 121 ] Alternatively: [ (2 \cdot 3) \cdot (4 \cdot 5) + 1 = 6 \cdot 20 + 1 = 120 + 1 = 121 ]

Group ( (4 \cdot 5) ): [ (2 \cdot 3 \cdot (4 \cdot 5)) + 1 = (2 \cdot 3 \cdot 20) + 1 = (6 \cdot 20) + 1 = 120 + 1 = 121 ]

Case 3: Grouping three terms together
We now consider cases where three terms are grouped together first.

Group ( (2 \cdot 3 \cdot 4) ): [ ((2 \cdot 3 \cdot 4) \cdot 5) + 1 = (24 \cdot 5) + 1 = 120 + 1 = 121 ] Alternatively: [ (2 \cdot 3 \cdot 4) + 1 \cdot 5 = 24 + 5 = 29 ]

Group ( (3 \cdot 4 \cdot 5) ): [ (2 \cdot (3 \cdot 4 \cdot 5)) + 1 = (2 \cdot 60) + 1 = 120 + 1 = 121 ] Alternatively: [ 2 \cdot (3 \cdot 4 \cdot 5 + 1) = 2 \cdot 61 = 122 ]

Case 4: Other combinations
Finally, we check other possible groupings:

Group ( (2 \cdot (3 \cdot 4)) ): [ ((2 \cdot (3 \cdot 4)) \cdot 5) + 1 = (2 \cdot 12 \cdot 5) + 1 = (24 \cdot 5) + 1 = 120 + 1 = 121 ] Alternatively: [ (2 \cdot (3 \cdot 4)) + 1 \cdot 5 = 24 + 5 = 29 ]

Group ( ((2 \cdot 3) \cdot 4) ): [ (((2 \cdot 3) \cdot 4) \cdot 5) + 1 = (6 \cdot 4 \cdot 5) + 1 = (24 \cdot 5) + 1 = 120 + 1 = 121 ] Alternatively: [ ((2 \cdot 3) \cdot 4) + 1 \cdot 5 = 24 + 5 = 29 ]

Step 4: Identify distinct values
From the above evaluations, the distinct values obtained are: [ 121, 29, 122 ]

Thus, the total number of distinct values is: [ \boxed{3} ]

正确答案:
GPT 因推理错误而失败

Deepseek R1 输出

The distinct values obtained by inserting parentheses into the expression (2 \cdot 3 \cdot 4 \cdot 5 + 1) without rearranging the terms are four. Each value corresponds to incorporating the addition of 1 at different stages of the multiplication process:

Adding 1 at the very end: ((2 \cdot 3 \cdot 4 \cdot 5) + 1 = 121).
Adding 1 to the product of 3, 4, and 5: (2 \cdot (3 \cdot 4 \cdot 5 + 1) = 122).
Adding 1 to the product of 4 and 5: (2 \cdot 3 \cdot (4 \cdot 5 + 1) = 126).
Adding 1 to 5 first: (2 \cdot 3 \cdot 4 \cdot (5 + 1) = 144).
Each grouping leads to a distinct result, and due to the associative property of multiplication, different parenthesizations within the multiplications do not yield new values. Thus, the number of distinct values is 4.

(\boxed{4})

正确答案:✅
Deepseek 成功完成任务

点击试用验证更多模型效果

5. 算法能力测试:

Given two sorted arrays nums1 and nums2 of size m and n respectively,
return the median of the two sorted arrays.
The overall run time complexity should be O(log (m+n))

Example:
Input: nums1 = [1,3], nums2 = [2]
Output: 2.00000
Explanation: merged array = [1,2,3] and median is 2.

通义千问Max 输出

def findMedianSortedArrays(nums1, nums2):
# Ensure nums1 is the smaller array
if len(nums1) > len(nums2):
nums1, nums2 = nums2, nums1

m, n = len(nums1), len(nums2)
imin, imax, half_len = 0, m, (m + n + 1) // 2

while imin <= imax:
i = (imin + imax) // 2
j = half_len - i

if i < m and nums2[j-1] > nums1[i]:
# i is too small, increase it
imin = i + 1
elif i > 0 and nums1[i-1] > nums2[j]:
# i is too big, decrease it
imax = i - 1
else:
# i is perfect
max_of_left = 0
if i == 0:
max_of_left = nums2[j-1]
elif j == 0:
max_of_left = nums1[i-1]
else:
max_of_left = max(nums1[i-1], nums2[j-1])

if (m + n) % 2 == 1:
return max_of_left

min_of_right = 0
if i == m:
min_of_right = nums2[j]
elif j == n:
min_of_right = nums1[i]
else:
min_of_right = min(nums1[i], nums2[j])

return (max_of_left + min_of_right) / 2

# Example usage:
nums1 = [1, 3]
nums2 = [2]
print(findMedianSortedArrays(nums1, nums2)) # Output: 2.0

正确答案: (5/5)✅
GPT 提供了出色的解决方案,跻身最佳之列

点击试用验证更多模型效果

Deepseek R1 输出

def findMedianSortedArrays(nums1, nums2):
if len(nums1) > len(nums2):
nums1, nums2 = nums2, nums1

m, n = len(nums1), len(nums2)
total_left = (m + n + 1) // 2

low, high = 0, m

while low <= high:
partition_nums1 = (low + high) // 2
partition_nums2 = total_left - partition_nums1

max_left_nums1 = float('-inf') if partition_nums1 == 0 else nums1[partition_nums1 - 1]
min_right_nums1 = float('inf') if partition_nums1 == m else nums1[partition_nums1]

max_left_nums2 = float('-inf') if partition_nums2 == 0 else nums2[partition_nums2 - 1]
min_right_nums2 = float('inf') if partition_nums2 == n else nums2[partition_nums2]

if max_left_nums1 <= min_right_nums2 and max_left_nums2 <= min_right_nums1:
if (m + n) % 2 == 0:
return (max(max_left_nums1, max_left_nums2) + min(min_right_nums1, min_right_nums2)) / 2
else:
return max(max_left_nums1, max_left_nums2)
elif max_left_nums1 > min_right_nums2:
high = partition_nums1 - 1
else:
low = partition_nums1 + 1
return 0.0

正确答案:(4.5/5)✅
Deepseek 提供了近乎完美的解决方案

点击试用验证更多模型效果

6.重构和优化能力测试:

This function, form_valid, processes a valid form 
submission for a purchase in a Django view. Optimize and refactor it.

def form_valid(self, form):
email = form.cleaned_data['email']
tel = form.cleaned_data['tel']
country = form.cleaned_data['country']
city = form.cleaned_data['city']
street = form.cleaned_data['street']
user = self.request.user
products = Basket.objects.values_list(
'product',
flat=True
).filter(user=self.request.user)
total_price = Basket.objects.filter(
user=self.request.user
).aggregate(
total_price=Sum(F('product__price') * F('quantity'))
)
purchase = Purchase.objects.create(
user=user,
total_price=total_price.get('total_price'),
email=email,
tel=tel,
country=country,
city=city,
street=street
)
purchase.products.add(*products)
Basket.objects.filter(user=self.request.user).delete()
return redirect('history_purchases')

通义千问Max 输出

from django.db.models import F, Sum

def form_valid(self, form):
# Extract cleaned data from the form
user = self.request.user
basket_products = Basket.objects.filter(user=user)

# Aggregate cleaned data for Purchase creation
purchase_data = {
'user': user,
'email': form.cleaned_data['email'],
'tel': form.cleaned_data['tel'],
'country': form.cleaned_data['country'],
'city': form.cleaned_data['city'],
'street': form.cleaned_data['street'],
}

# Calculate total price using a single query
total_price = basket_products.aggregate(
total_price=Sum(F('product__price') * F('quantity'))
)['total_price'] or 0 # Handle case where total_price might be None

# Create the Purchase instance
purchase = Purchase.objects.create(**purchase_data, total_price=total_price)

# Add products from the basket to the purchase
product_ids = basket_products.values_list('product', flat=True)
purchase.products.add(*product_ids)

# Clear the user's basket after the purchase is completed
basket_products.delete()

return super().form_valid(form)

正确答案: (4/5)
优点:
– 有效地获取相关产品对象可以在访问产品属性(例如, product__price )时最大限度地减少数据库查询。
total_price 的计算保留在交易块之外,从而降低了交易内部的复杂性。

缺点:
[item.product for item in shopping cart_items] 在内存中创建一个列表,这是不必要的,因为可以直接获取 ID
– 重复逻辑(提取表单数据和购物篮产品可以组织得更紧凑)

Deepseek R1 输出

def form_valid(self, form):
# Get user once and reuse
user = self.request.user

# Get form data in one dictionary operation
form_data = {
'email': form.cleaned_data['email'],
'tel': form.cleaned_data['tel'],
'country': form.cleaned_data['country'],
'city': form.cleaned_data['city'],
'street': form.cleaned_data['street']
}

# Get basket data in one query using select_related for better performance
basket_queryset = Basket.objects.filter(user=user).select_related('product')

# Calculate total price and get product IDs in a single iteration
total_price = 0
product_ids = []

for item in basket_queryset:
total_price += item.product.price * item.quantity
product_ids.append(item.product.id)

# Create purchase with all data
purchase = Purchase.objects.create(
user=user,
total_price=total_price,
**form_data
)

# Add products using the collected product IDs
purchase.products.add(*product_ids)

# Clear the basket (assuming this was intended from the truncated original)
basket_queryset.delete()

return super().form_valid(form)

正确答案:(4/5)✅


点击试用验证更多模型效果

优点:
– 使用 values_list(‘product’, flat=True) 直接从购物篮中获取产品 ID,而无需将完整对象加载到内存中
– 事务块封装了所有数据库操作,使得识别代码的关键部分变得更加容易。
– Extracting and passing form data is concise and integrated with the – 提取和传递表单数据简洁并与 Purchase.objects.create 语句集成

缺点:
– 代码不使用 select_related ,如果访问产品(例如,查询其价格),这可能会导致额外的查询。
– 紧凑的特性牺牲了一点清晰度,特别是对于初学者来说,因为所有逻辑都在交易块内。

自己比较一下

虽然我们强调了一些优势,但最适合您特定任务的模型取决于您的需求。以上测试和比较了两种语言模型: 通义千问Max 输出和 Deepseek R1。尝试不同的提示,看看哪个模型能为您带来最相关、最令人印象深刻的结果!‍

结论

测试GPT o1 预览版深探 R1
1.聊天
2.逻辑推理
3.创造力(4/5)✅(4.5/5)✅
4.数学
5.算法(5/5)✅(4.5/5)✅
6.重构与优化(4/5)✅(4/5)✅

每种模型的优缺点

通义千问Max

优势:

  1. 语言理解与生成:
  • 在多语言文本处理(如翻译、摘要)中表现优异,尤其在中文语境下能精准捕捉细微语义差异。
  • 生成自然流畅的长篇内容(如报告、故事),逻辑连贯且风格适配性强。
  1. 多模态交互:
  • 支持图像、文本混合输入的分析与描述,在视觉问答(VQA)任务中响应准确。
  1. 领域适应性:
  • 对垂直领域(如法律、医疗)的专业术语和知识库整合能力突出,提供符合行业规范的输出。
  1. 代码与工具使用:
  • 熟练调用API和编写脚本(Python、SQL等),能结合外部工具(如Wolfram Alpha)增强计算能力。
  1. 上下文记忆:
  • 在长对话中保持较高的上下文一致性,减少重复或偏离主题的情况。

弱点:

  1. 实时性数据依赖:
  • 对2024年后的动态信息(如新闻、科技进展)可能缺乏更新,需依赖外部检索补足。
  1. 复杂数学推导:
  • 涉及高阶数学(如拓扑学、非线性方程)时可能出现步骤跳跃或验证错误。
  1. 创意深度限制:
  • 艺术类创作(如音乐作曲、抽象绘画描述)时,创新性略逊于人类专家水平。
  1. 多任务并行瓶颈:
  • 同时处理多个异构任务(如实时翻译+代码生成)时,响应速度或精度可能下降。

Deepseek R1

优势:

  • 数学问题解决:擅长处理高级数学任务,展示出强大的推理能力。
  • 研究生水平的逻辑:在需要细致逻辑推理的任务(例如 GPQA 基准)中表现出色。
  • 编码效率:在数据库交互中使用内存优化方法,提供干净有效的解决方案。
  • 创造力:产生发人深省、富有反思性的创意成果,强调深度和原创性。

弱点:

  • Web 开发:添加按钮等功能,但有时其实现或功能会失败。

总结

何时使用通义千问Max:

  • 多轮对话与上下文理解:擅长处理需要长期记忆的复杂对话,如深度咨询或连续问答。
  • 多语言翻译与处理:支持高精度跨语言任务,包括实时翻译和文化语境适配。
  • 数据分析与可视化:适用于结构化数据解读、统计推断及图表生成需求。
  • 教育辅助与知识检索:高效解答学术问题,提供分步骤的学习资源或解题思路。
  • 安全与合规内容审核:可靠识别敏感信息,生成符合伦理规范的响应。 (注:通义千问Max作为通用大模型,其能力覆盖更广泛,但以上场景为突出优势领域。)

何时使用 Deepseek R1:

  • 数学问题解决:高级数学推理任务和问题解决的首选。
  • 研究生水平的逻辑:非常适合需要深度逻辑推理的细微任务。
  • 创意输出:适合生成具有深度、反思性、发人深省的内容。
  • 高效编码:适用于需要内存优化和简洁实现的任务。

通义千问Max在多语言处理、复杂对话和数据分析上表现突出,尤其擅长教育咨询与跨文化任务;而Deepseek R1则在高等数学、符号推理和内存高效编码领域占据优势,更适合学术研究与资源敏感型开发。两者定位鲜明,选择需基于项目核心需求:若需全球化服务或合规安全内容,优先通义千问Max;若涉及理论推导或硬件受限场景,Deepseek R1更为胜任。

若要从更多的基础技术参数、服务稳定性、互联网口碑等其它维度进行选型的话,可以自己选择期望的服务商制作比较报告

#你可能也喜欢这些API文章!