Show HN: How do Markov chains actually differ from small LLMs?
4 points • by JPLeRouzic • 7 months ago
I polished a Markov chain generator and trained it on an article by Uri Alon et al. [0].<p>The text it generates seems to me at least on par with the output of tiny LLMs, such as those demonstrated by NanoGPT. Here is an example:<p>jplr@mypass:~/Documenti/2025/SimpleModels/v3_very_good$
./SLM10b_train UriAlon.txt 3<p>Training model with order 3...<p>Skip-gram detection: DISABLED (order < 5)<p>Pruning is disabled<p>Calculating model size for JSON export...<p>Will export 29832 model entries<p>Exporting vocabulary (1727 entries)...<p>Vocabulary export complete.<p>Exporting model entries...<p><pre><code> Processed 12000 contexts, written 28765 entries (96.4%)...
</code></pre>
JSON export complete: 29832 entries written to model.json<p>Model trained and saved to model.json<p>Vocabulary size: 1727<p>jplr@mypass:~/Documenti/2025/SimpleModels/v3_very_good$ ./SLM9_gen model.json<p>Aging cell model requires comprehensive incidence data. To obtain such a large medical database of the joints are risk factors. Therefore, the theory might be extended to describe the evolution of atherosclerosis and metabolic syndrome. For example, late‐stage type 2 diabetes is associated with collapse of beta‐cell function. This collapse has two parameters: the fraction of the senescent cells are predicted to affect disease threshold . For each individual, one simulates senescent‐cell abundance using the SR model has an approximately exponential incidence curve with a decline at old ages In this section, we simulated a wide range of age‐related incidence curves. The next sections provide examples of classes of diseases, which show improvement upon senolytic treatment tends to qualitatively support such a prediction. model different disease thresholds as values of the disease occurs when a physiological parameter ϕ increases due to the disease. Increasing susceptibility parameter s, which varies about 3‐fold between BMI below 25 (male) and 54 (female) are at least mildly age‐related and 25 (male) and 28 (female) are strongly age‐related, as defined above. Of these, we find that 66 are well described by the model as a wide range of feedback mechanisms that can provide homeostasis to a half‐life of days in young mice, but their removal rate slows down in old mice to a given type of cancer have strong risk factors should increase the removal rates of the joint that bears the most common biological process of aging that governs the onset of pathology in the records of at least 104 people, totaling 877 disease category codes (See SI section 9), increasing the range of 6–8% per year. 
The two‐parameter model describes well the strongly age‐related ICD9 codes: 90% of the codes show R 2 > 0.9) (Figure 4c). This agreement is similar to that of the previously proposed IMII model for cancer, major fibrotic diseases, and hundreds of other age‐related disease states obtained from 10−4 to lower cancer incidence. A better fit is achieved when allowing to exceed its threshold mechanism for classes of disease, providing putative etiologies for diseases with unknown origin, such as bone marrow and skin. Thus, the sudden collapse of the alveoli at the outer parts of the immune removal capacity of cancer. For example, NK cells remove senescent cells also to other forms of age‐related damage and decline contribute (De Bourcy et al., 2017). There may be described as a first‐passage‐time problem, asking when mutated, impair particle removal by the bronchi and increase damage to alveolar cells (Yang et al., 2019; Xu et al., 2018), and immune therapy that causes T cells to target senescent cells (Amor et al., 2020). Since these treatments are predicted to have an exponential incidence curve that slows at very old ages. Interestingly, the main effects are opposite to the case of cancer growth rate to removal rate We next consider the case of frontline tissues discussed above.<p>[0] <a href="https://pmc.ncbi.nlm.nih.gov/articles/PMC7963340/" rel="nofollow">https://pmc.ncbi.nlm.nih.gov/articles/PMC7963340/</a>
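For readers who want to see what "order 3" means in the training log above, here is a minimal word-level sketch in Python. It is not the code behind `SLM10b_train` or `SLM9_gen`, which the post does not show. The function names (`train`, `entry_count`, `generate`), the whitespace tokenization, and the toy sentence are all assumptions made for illustration, and the skip-gram detection, pruning, and JSON export mentioned in the log are left out. Counting one "model entry" per distinct (context, next token) pair is likewise a guess at what the log reports.

```python
import random
from collections import Counter, defaultdict


def train(tokens, order=3):
    """Count which token follows each run of `order` consecutive tokens."""
    model = defaultdict(Counter)
    for i in range(len(tokens) - order):
        context = tuple(tokens[i:i + order])
        model[context][tokens[i + order]] += 1
    return model


def entry_count(model):
    """Number of distinct (context, next token) pairs (assumed meaning of 'entries')."""
    return sum(len(followers) for followers in model.values())


def generate(model, length=50, seed=None):
    """Start from a random context, then sample up to `length` more tokens."""
    rng = random.Random(seed)
    context = rng.choice(sorted(model))  # sorted so a given seed is reproducible
    out = list(context)
    for _ in range(length):
        followers = model.get(context)  # .get avoids creating empty entries
        if not followers:
            break  # this context only occurred at the very end of the text
        words = list(followers)
        weights = list(followers.values())
        nxt = rng.choices(words, weights=weights, k=1)[0]
        out.append(nxt)
        context = context[1:] + (nxt,)  # slide the window by one token
    return " ".join(out)


if __name__ == "__main__":
    # Toy corpus; a real run would read the article text from a file instead.
    tokens = "the cat sat on the mat and the cat sat on the rug".split()
    model = train(tokens, order=3)
    print(len(set(tokens)), "vocabulary entries,", entry_count(model), "model entries")
    print(generate(model, length=20, seed=0))
```

Every four-token window in the generated text is copied from the training text, which is why output from a single article stays locally fluent while drifting in meaning from sentence to sentence, as in the sample above.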