
Web crawler that logs into the website and locates the secret flags, passing along the appropriate cookies and CSRF tokens to retrieve account data.


High-Level Approach

The high-level approach begins with establishing a TCP socket connection; since we are crawling the website, it is critical that this connection be secure. Once the connection has been established, the crawler logs in to the FakeSpot website. The login consists of a GET request to fetch the login page, a POST request to actually log in, passing along the cookies from those responses, and then receiving all of the links on the page. The crawler must be logged in properly before it can collect links and search for the flags. From there, detection continues: the crawler keeps searching the website until it finds all 5 of the flags. We also built in a load-management feature: if the crawler exceeds 5,000 requests, it restarts the socket connection.
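Below is a minimal sketch of that connection and login flow. It assumes the site is served over HTTPS, a Django-style login form at /accounts/login/, and a csrftoken cookie; the host name, port, paths, and form field names are placeholders for illustration, not the project's actual values.

```python
import re
import socket
import ssl

HOST = "fakespot.example.edu"   # placeholder host name (assumption)
PORT = 443                      # assumed HTTPS/TLS port

def open_connection(host=HOST, port=PORT):
    """Open a TLS-wrapped TCP socket to the crawl target."""
    ctx = ssl.create_default_context()
    return ctx.wrap_socket(socket.create_connection((host, port)),
                           server_hostname=host)

def get_cookies(response):
    """Collect Set-Cookie values into a single Cookie header string."""
    return "; ".join(re.findall(r"Set-Cookie: ([^;\r\n]+)", response))

def login(sock, username, password):
    """GET the login page, then POST the credentials with the cookies and CSRF token."""
    # 1. GET the login page to obtain the session cookie and CSRF token.
    sock.sendall((f"GET /accounts/login/ HTTP/1.1\r\nHost: {HOST}\r\n"
                  "Connection: keep-alive\r\n\r\n").encode())
    page = sock.recv(65535).decode(errors="replace")
    cookies = get_cookies(page)
    match = re.search(r"csrftoken=([^;\s]+)", page)  # assumed cookie name
    token = match.group(1) if match else ""

    # 2. POST the credentials, passing the cookies and CSRF token along.
    body = (f"username={username}&password={password}"
            f"&csrfmiddlewaretoken={token}")
    sock.sendall((f"POST /accounts/login/ HTTP/1.1\r\nHost: {HOST}\r\n"
                  f"Cookie: {cookies}\r\n"
                  "Content-Type: application/x-www-form-urlencoded\r\n"
                  f"Content-Length: {len(body)}\r\n"
                  "Connection: keep-alive\r\n\r\n" + body).encode())
    # The session cookies in this response are what later requests must carry.
    more = get_cookies(sock.recv(65535).decode(errors="replace"))
    return "; ".join(c for c in (cookies, more) if c)
```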

Next comes the search for the secret flags. If a link has not been seen before, it is added to a global list. The crawler then extracts the cookies from the response and sends them back with each subsequent request, mimicking the way a browser works; passing the cookies along prevents running into CSRF errors. For each link it retrieves, the crawler checks the response code, which is parsed out by a helper function. Based on that response code, it either ignores the link, retries it, retries at a new link, or scans the page for a secret flag. Whenever a secret flag is found, it is printed out. This continues indefinitely until all of the flags (a maximum of 5) have been printed.
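A minimal sketch of that crawl loop is shown below, continuing from the connection sketch above (it reuses HOST and the cookie string returned by login). The status-code handling mirrors the description: dead links are ignored, server errors are retried, redirects are retried at the new location, and 200 responses are scanned for flags. The flag pattern, start path, and link format are assumptions for illustration.

```python
import re
from collections import deque

FLAG_RE = re.compile(r"FLAG: ([0-9a-f]{64})")   # assumed flag format
LINK_RE = re.compile(r'href="(/[^"]*)"')        # assumed internal-link format

def fetch(sock, path, cookies):
    """Send a GET with the session cookies and split the raw response."""
    sock.sendall((f"GET {path} HTTP/1.1\r\nHost: {HOST}\r\n"
                  f"Cookie: {cookies}\r\nConnection: keep-alive\r\n\r\n").encode())
    raw = sock.recv(65535).decode(errors="replace")
    head, _, body = raw.partition("\r\n\r\n")
    status = int(head.split(" ", 2)[1])
    headers = dict(line.split(": ", 1)
                   for line in head.split("\r\n")[1:] if ": " in line)
    return status, headers, body

def crawl(sock, cookies, start_path="/"):
    frontier = deque([start_path])      # links waiting to be visited
    seen = {start_path}
    flags = []

    while frontier and len(flags) < 5:
        path = frontier.popleft()
        status, headers, body = fetch(sock, path, cookies)

        if status in (403, 404):
            continue                                        # abandon the link
        elif status == 503:
            frontier.append(path)                           # retry the same link later
        elif status in (301, 302):
            frontier.append(headers.get("Location", path))  # retry at the new URL
        elif status == 200:
            for flag in FLAG_RE.findall(body):
                print(flag)                                 # print each flag as it is found
                flags.append(flag)
            for link in LINK_RE.findall(body):
                if link not in seen:                        # only queue unseen links
                    seen.add(link)
                    frontier.append(link)
```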

Challenges you faced

The major challenge we faced was understanding what was expected and how to go about it. We had never worked on something like this before, so there were obstacles to get through just to understand what was going on before we could start coding it. An hour-long video from one of the professors, specifically designed to explain how the details of the channels worked, would have improved our understanding of the material and led us to the correct path faster.

Overview of testing the code

The main way we tested the code was by running it and checking whether the secret flags came back. We added debugging print statements to make sure that links were being added to the appropriate variable. We then verified that each added link was parsed correctly, so that if a secret flag was embedded in the page we would be able to locate it. After finding a secret flag, we kept the crawler running to see whether there were other secret flags. We ran it multiple times locally to ensure that the code works correctly.
