Julia Programming Projects

By: Adrian Salceanu

Overview of this book

Julia is a new programming language that offers a unique combination of performance and productivity. Its powerful features, friendly syntax, and speed are attracting a growing number of adopters from Python, R, and MATLAB, effectively raising the bar for modern general and scientific computing. After six years in the making, Julia has reached version 1.0. Now is the perfect time to learn it, due to its large-scale adoption across a wide range of domains, including fintech, biotech, education, and AI. Beginning with an introduction to the language, Julia Programming Projects goes on to illustrate how to analyze the Iris dataset using DataFrames. You will explore functions, the type system, methods, and multiple dispatch while building a web scraper and a web app. Next, you'll delve into machine learning, where you'll build a books recommender system. You will also see how to apply unsupervised machine learning to perform clustering on the San Francisco business database. After a chapter on metaprogramming, the final chapters discuss dates and time, time series analysis, visualization, and forecasting. We'll close with package development, documentation, testing, and benchmarking. By the end of the book, you will have gained the practical knowledge to build real-world applications in Julia.

Data harvesting through web scraping


Web scraping is the technique of extracting data from web pages using software. It is an important component of data harvesting and is typically implemented through programs called web crawlers. Data harvesting, or data mining, is frequently used in data science workflows to collect information from the internet (usually from websites, as opposed to APIs) and then to process that data for different purposes using various algorithms.

At a very high level, the process involves making a request for a web page, fetching its content, parsing its structure, and then extracting the desired information. This can be images, paragraphs of text, or tabular data containing stock information and prices, for example; pretty much anything that is present on a web page. If the content is spread across multiple web pages, the crawler will also extract the links and will automatically follow them to pull the rest of the pages, repeatedly applying the same process.
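To make this concrete, here is a minimal sketch of that request-fetch-parse-extract cycle in Julia. It assumes the HTTP.jl, Gumbo.jl, and Cascadia.jl packages are installed (a stack commonly used for scraping in Julia), and the Wikipedia URL is just a placeholder; this illustrates the general idea rather than reproducing the book's exact code.

```julia
# Minimal scraping sketch: fetch a page, parse its HTML,
# extract paragraph text, and collect the links to follow.
using HTTP, Gumbo, Cascadia

# Step 1: request the page and fetch its raw HTML (placeholder URL).
response = HTTP.get("https://en.wikipedia.org/wiki/Julia_(programming_language)")

# Step 2: parse the response body into an HTML document tree.
html = parsehtml(String(response.body))

# Step 3: extract the desired information, here the text of every <p> element.
for p in eachmatch(Selector("p"), html.root)
    println(nodeText(p))
end

# Step 4: collect the hyperlinks a crawler would queue up and visit next.
links = [getattr(a, "href", "") for a in eachmatch(Selector("a[href]"), html.root)]
```

A full crawler would push the collected links onto a queue, keep track of pages already visited, and repeat the same fetch-parse-extract steps for each new page.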