GitHub recently released a report listing the top 10 machine learning programming languages, the most widely used software packages, and the projects with the most contributors in 2018. Python, NumPy, and TensorFlow each ranked first in its category, as many expected. C++, Java, pandas, and scikit-learn also made the top 10 lists.
In our 2018 Octoverse report, machine learning and data science were hot topics on GitHub. TensorFlow was one of the most contributed-to projects, PyTorch was one of the fastest-growing projects, and Python was the third most popular language on GitHub. We decided to dig deeper into the current state of machine learning and data science on GitHub.
We collected contribution data from January 1, 2018 to December 31, 2018. A contribution here may be pushing code, opening an issue or pull request, commenting on an issue or pull request, or reviewing a pull request. For the most imported packages, we used data from the dependency graph, which covers all public repositories and any private repositories that have opted in to the dependency graph.
The most popular machine learning programming language: Python leads
The most popular machine learning language on GitHub in 2018
We looked at the contributors to repositories tagged with the machine-learning topic and ranked the most common primary languages of those repositories. Python is the most commonly used language in machine learning repositories and the third most commonly used language on GitHub overall. However, not all machine learning projects use Python: some of the most common languages on GitHub overall are also common languages for machine learning projects.
Julia, R, and Scala all appear in the top 10 for machine learning projects but not in the top 10 for GitHub as a whole. Julia and R are both commonly used by data scientists, and Scala is increasingly used to interact with big data systems such as Apache Spark.
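The language ranking described above can be illustrated with a small sketch. It assumes one possible reading of the method: count the distinct contributors across tagged repositories, grouped by each repository's primary language. The repository names, contributor logins, and the rank_languages helper are all hypothetical and are not taken from the report.

```python
from collections import defaultdict

# Hypothetical sample data: (repository, primary language, contributor logins).
# None of these names or numbers come from the GitHub report.
repos = [
    ("org/a", "Python", {"ann", "bob", "cy"}),
    ("org/b", "Python", {"bob", "dee"}),
    ("org/c", "R", {"eve"}),
    ("org/d", "Scala", {"ann", "fay"}),
]


def rank_languages(repos):
    """Rank primary languages by their number of distinct contributors."""
    contributors = defaultdict(set)
    for _name, language, logins in repos:
        # A person contributing to two repositories in the same language
        # is counted once for that language.
        contributors[language] |= logins
    counts = {lang: len(people) for lang, people in contributors.items()}
    # Sort by contributor count (descending), then by name to break ties.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


ranking = rank_languages(repos)
for language, count in ranking:
    print(f"{language}: {count} contributors")
```

Counting distinct people rather than raw contribution events is a design choice made for this sketch; the report may have weighted contributions differently.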
The most widely used machine learning and data science software package: NumPy first
Top imported packages in GitHub repositories in 2018
We extracted data from the dependency graph and calculated the percentage of projects tagged with the machine-learning or data-science topics that import each popular Python package. The top ten packages imported by these projects are as follows:
NumPy is a package that supports mathematical operations on multidimensional data. It is the most frequently imported package and is used in nearly three-quarters of machine learning and data science projects.
SciPy is a package for scientific computing, pandas is a package for managing data sets, and Matplotlib is a visualization library; all three are used in more than 40% of machine learning and data science projects.
scikit-learn is a very popular machine learning package that implements a large number of machine learning algorithms. It is used by nearly 40% of projects.
TensorFlow is a package for working with neural networks. Nearly a quarter of projects use it.
The other packages in the top ten are utility packages: six is a Python 2 and 3 compatibility library, and python-dateutil and pytz are packages for working with dates.
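The percentage calculation behind the package figures above can be sketched as follows. This is only an illustration under stated assumptions: the project names, their import sets, and the import_share helper are hypothetical, and the real analysis used GitHub's dependency graph rather than a hand-written dictionary.

```python
from collections import Counter

# Hypothetical sample: project name -> set of top-level packages it imports.
# These values are invented for illustration only.
projects = {
    "proj1": {"numpy", "pandas"},
    "proj2": {"numpy", "tensorflow"},
    "proj3": {"numpy", "scipy", "pandas"},
    "proj4": {"six"},
}


def import_share(projects):
    """Return the percentage of projects that import each package."""
    counts = Counter()
    for packages in projects.values():
        # Each project's imports are a set, so a package is counted
        # at most once per project.
        counts.update(packages)
    total = len(projects)
    return {pkg: 100.0 * n / total for pkg, n in counts.items()}


shares = import_share(projects)
# Order by share (descending), then by name to break ties.
top = sorted(shares.items(), key=lambda item: (-item[1], item[0]))
for package, share in top[:10]:
    print(f"{package}: {share:.0f}% of projects")
```

Because the shares are computed per package, they do not sum to 100%: a single project that imports both NumPy and pandas counts toward both percentages.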
The most popular machine learning project: TensorFlow takes the crown
Top machine learning projects on GitHub in 2018
We also looked at the open source projects tagged with the machine-learning topic that had the most contributors in 2018. TensorFlow is by far the most popular project, with more than five times as many contributors as the second-ranked project, scikit-learn.
The explosion/spaCy and RasaHQ/rasa_nlu projects focus on natural language processing.
Four other projects, CMU-Perceptual-Computing-Lab/openpose, thtrieu/darkflow, ageitgey/face_recognition, and tesseract-ocr/tesseract, focus on image processing. The source code of the Julia language is also among the projects with the most contributors in 2018.