{"id":141,"date":"2025-05-01T17:04:00","date_gmt":"2025-05-01T09:04:00","guid":{"rendered":"https:\/\/blog.liu-qi.cn\/?p=141"},"modified":"2026-04-18T21:53:32","modified_gmt":"2026-04-18T13:53:32","slug":"qwen3%e5%80%bc%e4%b8%8d%e5%80%bc%e5%be%97%e6%99%ae%e9%80%9a%e7%94%a8%e6%88%b7%e6%9c%ac%e5%9c%b0%e9%83%a8%e7%bd%b2%ef%bc%9f3%e4%b8%aa%e8%90%bd%e5%9c%b0%e5%9c%ba%e6%99%af%ef%bc%8c30%e9%81%93%e9%a2%98","status":"publish","type":"post","link":"https:\/\/en.blog.liu-qi.cn\/2025\/05\/01\/qwen3%e5%80%bc%e4%b8%8d%e5%80%bc%e5%be%97%e6%99%ae%e9%80%9a%e7%94%a8%e6%88%b7%e6%9c%ac%e5%9c%b0%e9%83%a8%e7%bd%b2%ef%bc%9f3%e4%b8%aa%e8%90%bd%e5%9c%b0%e5%9c%ba%e6%99%af%ef%bc%8c30%e9%81%93%e9%a2%98\/","title":{"rendered":"Is Qwen3 Worth Local Deployment for Everyday Users? 3 Use Cases, 30 Questions, 300 Responses, 10-Model Showdown Scored by Doubao AI!"},"content":{"rendered":"<p>A few days ago, Alibaba released Qwen3. Of all the announcements, the one that caught my attention most was:<\/p>\n<p>Qwen3-4B can rival the previous generation&#8217;s open-source powerhouse, Qwen2.5-72B<\/p>\n<p>That&#8217;s right: as a pure end user, I&#8217;m more interested in the small models in this Qwen3 release.<\/p>\n<p>Qwen3-235B-A22B is powerful too, ranking first on the global open-source leaderboard (respect), but I can&#8217;t deploy it locally. As for using it online, for someone like me who has recently collected the trio of GPT, Claude, and Gemini plus a Cursor membership, the appeal isn&#8217;t that great.<\/p>\n<p>Better small-model performance makes local deployment more feasible for individuals. 
Being able to deploy locally means private, offline data and unlimited tokens, which successfully piqued the interest of this 4090 user.<\/p>\n<p>But there&#8217;s a premise: Qwen3&#8217;s small models must actually perform well.<\/p>\n<p>Long-time readers should remember that when local deployment of DeepSeek became popular among individuals, I was singing a different tune:<\/p>\n<p><a href=\"https:\/\/blog.liu-qi.cn\/2025\/01\/29\/%E9%86%92%E9%86%92%EF%BC%81%E4%BD%A0%E6%9C%AC%E5%9C%B0%E9%83%A8%E7%BD%B2%E7%9A%84deepseek-r1%EF%BC%8C%E5%AE%83%E4%B8%8D%E6%98%AFr1\/\">Wake up! The DeepSeek-R1 you deployed locally isn&#8217;t the real R1<\/a><\/p>\n<p>First, the distilled models from DeepSeek aren&#8217;t actually DeepSeek; second, the few models that can be deployed on consumer-grade graphics cards perform quite poorly. By comparison, using the API is the better option.<\/p>\n<p>So, on to today&#8217;s topic: let&#8217;s see how Qwen3&#8217;s small models actually perform.<\/p>\n<p>First, a disclaimer: my testing scenarios this time are quite subjective, and this is not a serious review. The scores are given by an AI (Doubao-1.5-thinking-pro) and are for entertainment only.<\/p>\n<p>In terms of scenarios, I selected three based on my own usage. I think a significant proportion of ordinary office workers have scenarios similar to mine:<\/p>\n<ol>\n<li>Review and revision of purpose-driven copywriting. The test questions were 10 flawed positive-review texts from Taobao.<\/li>\n<li>Summarization and understanding of content. The test questions were 10 articles randomly selected from the internet.<\/li>\n<li>Logical and computational abilities in basic Q&amp;A. 
The test questions were 10 randomly selected questions (with answers) from the &#8216;Weak Intelligence Bar&#8217; training set.<\/li>\n<\/ol>\n<p>A total of 10 large models participated in the test:<\/p>\n<ul>\n<li>6 were deployed locally via Ollama: qwen3:8b, qwen3:14b, qwen3:32b, qwen3:30b-a3b, deepseek-r1:8b, deepseek-r1:32b<\/li>\n<li>4 were accessed via online APIs: QwQ-32B, GLM-4-Flash, DeepSeek-V3, Grok-3<\/li>\n<\/ul>\n<p>The judge is Doubao-1.5-thinking-pro:<\/p>\n<ul>\n<li>For the first two scenarios, it takes a God&#8217;s-eye view, reviewing all contestants&#8217; answers before scoring<\/li>\n<li>For the Weak Intelligence Bar test, it scores against the standard answers<\/li>\n<\/ul>\n<p>For the testing environment, I initially planned to use CherryStudio, but later found that models would copy each other&#8217;s homework when answering the same question. So I ended up using a multi-dimensional table.<\/p>\n<p><img decoding=\"async\" alt=\"\" loading=\"lazy\" src=\"https:\/\/blog.liu-qi.cn\/wp-content\/uploads\/2025\/05\/001-ada482f09ea3.png\" \/><\/p>\n<p><img decoding=\"async\" alt=\"\" loading=\"lazy\" src=\"https:\/\/blog.liu-qi.cn\/wp-content\/uploads\/2025\/05\/002-1d2e574c627a.png\" \/><\/p>\n<p>This also made it convenient to present the complete test questions and contestants&#8217; answers at the end.<\/p>\n<p>Below are the official test results.<\/p>\n<p>Each scenario has 10 questions; the average score is the sum of scores divided by 10.<\/p>\n<p>Test 1: Copywriting Analysis and Revision:<\/p>\n<p><img decoding=\"async\" alt=\"\" loading=\"lazy\" src=\"https:\/\/blog.liu-qi.cn\/wp-content\/uploads\/2025\/05\/003-261f1db45ea9.png\" \/><\/p>\n<p>Scores:<\/p>\n<p><img decoding=\"async\" alt=\"\" loading=\"lazy\" src=\"https:\/\/blog.liu-qi.cn\/wp-content\/uploads\/2025\/05\/004-0342e6db50bc.png\" \/><\/p>\n<p>Qwen3 indeed 
performed impressively, with the 32B model taking the top spot in the first test.<\/p>\n<p>The two distilled DeepSeek models ranked relatively low, just as I said earlier; I wasn&#8217;t lying.<\/p>\n<p>The freely callable GLM-4-Flash unfortunately ranked last. But to be fair to it, it&#8217;s hard to find another API that&#8217;s as free, unlimited, and high-concurrency as this one.<\/p>\n<p>Test 2: Article Summarization and Reflection:<\/p>\n<p><img decoding=\"async\" alt=\"\" loading=\"lazy\" src=\"https:\/\/blog.liu-qi.cn\/wp-content\/uploads\/2025\/05\/005-ea11c5b0546c.png\" \/><\/p>\n<p>Scores:<\/p>\n<p><img decoding=\"async\" alt=\"\" loading=\"lazy\" src=\"https:\/\/blog.liu-qi.cn\/wp-content\/uploads\/2025\/05\/006-9deca7a7f80e.png\" \/><\/p>\n<p>QwQ-32B took the lead.<\/p>\n<p>DeepSeek-V3 secured second place.<\/p>\n<p>Qwen3:32B, first in the previous test, came in third this time.<\/p>\n<p>The distilled 32B from DeepSeek (itself built on a Qwen base model) rose to fourth, while the 8B still lagged behind.<\/p>\n<p>GLM-4-Flash continued to trail along.<\/p>\n<p>Test 3: Weak Intelligence Bar Answer Exam:<\/p>\n<p><img decoding=\"async\" alt=\"\" loading=\"lazy\" src=\"https:\/\/blog.liu-qi.cn\/wp-content\/uploads\/2025\/05\/007-ffc5c0bdc2ab.png\" \/><\/p>\n<p>Scores:<\/p>\n<p><img decoding=\"async\" alt=\"\" loading=\"lazy\" src=\"https:\/\/blog.liu-qi.cn\/wp-content\/uploads\/2025\/05\/008-88cc05662f20.png\" \/><\/p>\n<p>DeepSeek-V3 remained strong (this time using the 0324 version).<\/p>\n<p>Qwen3:14B made a bold move to secure second place, with Qwen3:32B in third, also an excellent performer.<\/p>\n<p>The distilled 8B from DeepSeek unfortunately ranked last, with the distilled 32B third from the bottom.<\/p>\n<p>GLM-4-Flash made a comeback on the Weak Intelligence Bar, saving some face.<\/p>\n<p>Final Test Results:<\/p>\n<p><img decoding=\"async\" alt=\"\" loading=\"lazy\" src=\"https:\/\/blog.liu-qi.cn\/wp-content\/uploads\/2025\/05\/009-edae7fdd777c.png\" 
\/><\/p>\n<p>DeepSeek-V3 took first place.<\/p>\n<p>Qwen3:32B came in second.<\/p>\n<p>QwQ-32B surprisingly secured third place.<\/p>\n<p>Qwen3:30B-A3B, which I had high hopes for, unexpectedly performed poorly in these tests.<\/p>\n<p>Based on these results, for friends with x90-class GPUs (such as the RTX 4090) or better, I recommend choosing Qwen3:32B for local deployment. The slightly weaker 14B is also a good choice, and even the 8B outperforms the distilled DeepSeek models tested here.<\/p>\n<p>You can view the complete test questions and results at this link:<\/p>\n<p>https:\/\/ilovezhiwai.feishu.cn\/wiki\/HThDwnX0FiyTIakyDe9c2z99nef?from=from_copylink<\/p>\n<p>Let me reiterate: this test is neither scientific nor rigorous, and is purely for entertainment. Please do not use its conclusions in serious contexts.<\/p>\n<p>Wishing everyone a happy May Day holiday!<\/p>\n","protected":false},"excerpt":{"rendered":"<p>A practical test evaluates Qwen3 small models for local deployment across three real-world scenarios, comparing 10 models with AI scoring to see if they&#8217;re viable for ordinary 
users.<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[24],"tags":[21,20,10],"class_list":["post-141","post","type-post","status-publish","format-standard","hentry","category-articles","tag-ollama","tag-qwen","tag-10"],"_links":{"self":[{"href":"https:\/\/en.blog.liu-qi.cn\/index.php\/wp-json\/wp\/v2\/posts\/141","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/en.blog.liu-qi.cn\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/en.blog.liu-qi.cn\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/en.blog.liu-qi.cn\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/en.blog.liu-qi.cn\/index.php\/wp-json\/wp\/v2\/comments?post=141"}],"version-history":[{"count":0,"href":"https:\/\/en.blog.liu-qi.cn\/index.php\/wp-json\/wp\/v2\/posts\/141\/revisions"}],"wp:attachment":[{"href":"https:\/\/en.blog.liu-qi.cn\/index.php\/wp-json\/wp\/v2\/media?parent=141"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/en.blog.liu-qi.cn\/index.php\/wp-json\/wp\/v2\/categories?post=141"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/en.blog.liu-qi.cn\/index.php\/wp-json\/wp\/v2\/tags?post=141"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}