Ruby-Capstone-Project-Scraper

Web Scraping

Definition

Web scraping, web harvesting, or web data extraction is data scraping used for extracting data from websites. The web scraping software may access the World Wide Web directly using the Hypertext Transfer Protocol or through a web browser. While web scraping can be done manually by a software user, the term typically refers to automated processes implemented using a bot or web crawler. It is a form of copying, in which specific data is gathered and copied from the web, typically into a central local database or spreadsheet, for later retrieval or analysis.
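The last step in that definition, copying the extracted data into a spreadsheet for later analysis, can be sketched with Ruby's standard `csv` library. This is a generic illustration rather than code from this project: the `records_to_csv` helper and the record fields (`title`, `company`, `salary`) are hypothetical.

```ruby
require 'csv'

# Hypothetical helper: turn scraped records (an array of hashes) into CSV text
# that a spreadsheet program can open.
def records_to_csv(records, headers)
  CSV.generate do |csv|
    csv << headers # header row
    records.each do |record|
      # Emit values in header order; a missing key becomes an empty cell.
      csv << headers.map { |h| record[h] }
    end
  end
end

records = [
  { 'title' => 'Junior Web Developer', 'company' => 'Acme, Inc.', 'salary' => '$50,000' },
  { 'title' => 'Front-End Intern', 'company' => 'Example Co' }
]

puts records_to_csv(records, %w[title company salary])
# title,company,salary
# Junior Web Developer,"Acme, Inc.","$50,000"
# Front-End Intern,Example Co,
```

Fields that contain commas are quoted automatically, so values such as salaries and company names survive the round trip into a spreadsheet.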

Techniques

Web scraping is the process of automatically mining data or collecting information from the World Wide Web. It is a field with active developments sharing a common goal with the semantic web vision, an ambitious initiative that still requires breakthroughs in text processing, semantic understanding, artificial intelligence, and human-computer interactions. Current web scraping solutions range from the ad-hoc, requiring human effort, to fully automated systems that can convert entire websites into structured information, with limitations.

About Project

The purpose of this project is to build a web scraping tool. It was built with Ruby, Open-URI, and the Nokogiri gem, and the Byebug debugger was used to inspect the values scraped from the page. In this project, I created a scraper that extracts job advertisements for junior web developers from Simplyhired.com.

Built With

  • Ruby 2.7.1

Getting Started

To get started, get a copy of this project on your local machine, either by downloading it or by running `git clone https://github.com/arslanbisharat/Capstone_Project_Ruby/`

Prerequisites

- Ruby installed on local machine
- Text editor (preferably: VSCode, Atom, Sublime)
- Git

Setup

If you have installed Ruby on your machine:

  1. Clone the project to your local machine with the `git clone` command, or download the ZIP file.
  2. Move into the project directory with `cd <directory name>`.
  3. Install the required gems with `gem install <gem name>`:
     • `gem install nokogiri`
     • `gem install colorize`
  4. From the root directory, run `ruby bin/main.rb`.
  5. Run `rspec` to test the various methods in the classes (this requires the rspec gem, installed with `gem install rspec`).

Running the scraper

When you run the project, it shows the job advertisements on the selected page and then asks whether you want to see more. To see more results, press 'y' or the Enter/Return key. To stop, for example once you have found a job that suits you, press 'n' or 'a'; the scraping process then stops.
Each job advertisement shows the job title, the hiring company and its location, the estimated yearly salary, and a link to the job description. If you are interested in a job, follow its link and apply.
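The see-more prompt described in this section can be sketched as a small paging loop. This is an illustration under assumptions, not the project's actual code: `browse`, `prompt`, `format_job`, and the `fetch_page` callable are hypothetical names, and the job hash keys are assumed from the fields listed above.

```ruby
# Keys described in this README: 'y' or Enter shows more results,
# 'n' or 'a' stops the scraper.
CONTINUE_KEYS = ['y', ''].freeze
STOP_KEYS = %w[n a].freeze

# Hypothetical formatter: one line per listing with title, company,
# location, yearly salary and link.
def format_job(job)
  "#{job[:title]} | #{job[:company]} (#{job[:location]}) | " \
    "#{job[:salary] || 'salary not listed'} | #{job[:url]}"
end

# Ask until the user presses a recognised key; returns nil at end of input.
def prompt(input, output)
  loop do
    output.print('See more results? [Y/n] ')
    line = input.gets
    return nil if line.nil?

    answer = line.strip.downcase
    return answer if CONTINUE_KEYS.include?(answer) || STOP_KEYS.include?(answer)

    output.puts("Press 'y' or Enter to continue, 'n' or 'a' to stop.")
  end
end

# Show one page of listings at a time. `fetch_page` is any callable that takes
# a page number and returns an array of listing strings (empty when there are
# no more results). Returns the number of pages shown.
def browse(fetch_page, input: $stdin, output: $stdout)
  pages_shown = 0
  loop do
    listings = fetch_page.call(pages_shown + 1)
    break if listings.empty?

    listings.each { |listing| output.puts(listing) }
    pages_shown += 1
    answer = prompt(input, output)
    break if answer.nil? || STOP_KEYS.include?(answer)
  end
  pages_shown
end
```

Passing `StringIO` objects for `input` and `output` keeps the loop independent of a real keyboard, which makes it straightforward to cover with RSpec; in a `bin/main.rb`, `fetch_page` would wrap whatever code downloads and parses a results page.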

Contributing

🤝 Contributions, issues, and feature requests are welcome! Start by:

1. Forking the project
2. Cloning the project to your local machine
3. cd into the project directory
4. Run git checkout -b your-branch-name
5. Make your contributions
6. Push your branch up to your forked repository
7. Open a pull request against the development branch of the original project, with a detailed description, for review

Please feel free to contribute to any of these!

Feel free to check the issues page.

👤 Muhammad Arslan

Show your support!

Give a 🌟 if you like this project! 😊

📝 Copyright

Acknowledgments

📝 License

This project is MIT licensed

Happy coding!