Ask HN: A high-performance computing learning path for data scientists

1 point by proudmo 8 months ago
I have a degree in mathematics and currently work as a data scientist. While I'm comfortable with Python and core machine learning techniques, I've realized that I need to deepen my understanding of high-performance computing (HPC) and performance engineering in order to optimize my code for speed and scale up algorithms for large systems.

Specifically, I'm interested in:

* Writing high-performance, memory-efficient code (e.g., using C++, SIMD, GPU, parallel computing)
* HPC system design and architecture
* Optimizing large-scale data processing and ML infrastructure
* Profiling, latency optimization, and memory management for data-heavy tasks

I'm looking for:

1. Books, resources, tutorials, online degrees that can guide me from a strong mathematical and ML foundation into performance optimization
2. Effective learning paths to transition from a general data science role to working with performance-critical systems and large-scale compute environments

I'm keen to improve my ability to build more efficient systems and handle large datasets or complex models with near real-time performance where necessary.

Would love any recommendations, personal experiences, or resources to help guide my learning!
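As a small, hypothetical illustration of the kind of Python-level optimization and profiling the post asks about (not part of the original question), the sketch below assumes NumPy is installed. It computes the same sum of squares twice, once with a pure-Python loop and once with `np.dot`, times both with `time.perf_counter`, and shows that storing the data as float32 instead of float64 halves the array's memory footprint. The function names `sum_of_squares_loop` and `sum_of_squares_vectorized` are made up for this example; actual timings depend on the machine and the BLAS library NumPy is linked against.

```python
import time

import numpy as np  # assumption: NumPy is installed


def sum_of_squares_loop(values):
    """Pure-Python loop: the interpreter executes one iteration per element."""
    total = 0.0
    for v in values:
        total += v * v
    return total


def sum_of_squares_vectorized(arr):
    """np.dot runs the loop in compiled code (typically a BLAS routine)."""
    return float(np.dot(arr, arr))


def main(n=1_000_000, seed=0):
    rng = np.random.default_rng(seed)
    arr64 = rng.random(n)             # float64 by default: 8 bytes per element
    arr32 = arr64.astype(np.float32)  # float32 copy: 4 bytes per element

    t0 = time.perf_counter()
    loop_result = sum_of_squares_loop(arr64.tolist())
    t1 = time.perf_counter()
    vec_result = sum_of_squares_vectorized(arr64)
    t2 = time.perf_counter()

    print(f"loop:       {loop_result:.6f} in {t1 - t0:.4f} s")
    print(f"vectorized: {vec_result:.6f} in {t2 - t1:.4f} s")
    print(f"float64: {arr64.nbytes} bytes, float32: {arr32.nbytes} bytes")
    return loop_result, vec_result, arr64.nbytes, arr32.nbytes


if __name__ == "__main__":
    main()
```

The two results can differ in the last few digits because the loop and the BLAS routine may add the terms in a different order; measuring before and after a change like this is the basic habit that dedicated profilers build on.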