- Graceful Shutdown.
- Node Info Optimization.
- Append System Environment Variables to Tasks (see the sketch after this group).
- Auto Refresh Task Log.
- Enable HTTPS Deployment.
- Unable to fetch spider list info in scheduled jobs.
- Unable to fetch node info from worker nodes.
- Unable to select node when trying to run spider tasks.
- Unable to fetch result count when result volume is large. #260
- Node issue in scheduled tasks. #244
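A minimal sketch of the environment-variable feature above: before spawning a spider process, the task runner merges the host's environment into the per-task variables. The function and variable names here are illustrative, not Crawlab's actual implementation.

```python
import os
import subprocess

def run_task(cmd, task_env):
    """Spawn a spider process with system environment variables appended.

    task_env holds the per-task variables configured in the UI (illustrative).
    """
    # Start from the full system environment, then overlay task-specific vars
    env = {**os.environ, **task_env}
    return subprocess.Popen(cmd, env=env, shell=True)

# The spawned process sees both PATH (from the system) and the task variable
run_task("python spider.py", {"CRAWLAB_TASK_ID": "task-123"})
```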
- Docker Image Optimization. Split the Docker image further into master, worker and frontend images based on Alpine.
- Unit Tests. Covered part of the backend code with unit tests.
- Frontend Optimization. Optimized the login page, button sizes and upload UI hints.
- More Flexible Node Registration. Allow users to pass a custom variable as the node registration key instead of the default MAC address.
- Uploading Large Spider Files Error. Fixed a memory crash issue when uploading large spider files. #150
- Unable to Sync Spiders. Fixed by increasing the write permission level when synchronizing spider files. #114
- Spider Page Issue. Fixed by removing the "Site" field. #112
- Node Display Issue. Fixed nodes not displaying correctly when running Docker containers on multiple machines. #99
- Golang Backend: Refactored the backend from Python to Golang for much better stability and performance.
- Node Network Graph: Visualization of node topology.
- Node System Info: View node system info including OS, CPUs and executables.
- Node Monitoring Enhancement: Nodes are monitored and registered through Redis (see the heartbeat sketch after this group).
- File Management: Edit spider files online, with code highlighting.
- Login/Register/User Management: Require users to log in to use Crawlab; allow user registration and user management with basic role-based authorization.
- Automatic Spider Deployment: Spiders are deployed/synchronized to all online nodes automatically.
- Smaller Docker Image: Reduced the Docker image size from 1.3G to ~700M by applying a multi-stage build.
- Node Status. Fixed node status not changing even when a node actually goes offline. #87
- Spider Deployment Error. Fixed through Automatic Spider Deployment. #83
- Node not showing. Fixed nodes not able to show as online. #81
- Cron Job not working. Fixed through the new Golang backend. #64
- Flower Error. Fixed through the new Golang backend. #57
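The Redis-based node monitoring above can be pictured as a heartbeat scheme: each node periodically refreshes a key with a short TTL, and the master treats any node whose key has expired as offline. Below is a minimal Python sketch of the idea (the actual backend is Golang); the key scheme and TTL are assumptions.

```python
import json
import socket
import time

import redis

r = redis.Redis(host="localhost", port=6379)
HEARTBEAT_TTL = 10  # seconds; a node counts as offline once its key expires (assumed)

def heartbeat():
    """Worker side: keep refreshing this node's registration key."""
    key = f"nodes:{socket.gethostname()}"  # illustrative key scheme
    while True:
        info = {"hostname": socket.gethostname(), "ts": time.time()}
        r.setex(key, HEARTBEAT_TTL, json.dumps(info))
        time.sleep(HEARTBEAT_TTL / 2)

def online_nodes():
    """Master side: nodes with a live heartbeat key are online."""
    return [key.decode() for key in r.scan_iter("nodes:*")]
```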
- Documentation: Better and much more detailed documentation.
- Better Crontab: Compose crontab expressions through a crontab UI.
- Better Performance: Switched from the native Flask engine to `gunicorn` (see the config sketch after this group). #78
- Deleting Spider. Deleting a spider now removes not only its database record but also the related folder, tasks and schedules. #69
- MongoDB Auth. Allow users to specify an `authenticationDatabase` when connecting to MongoDB (see the connection sketch after this group). #68
- Windows Compatibility. Added `eventlet` to `requirements.txt`. #59
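For the `gunicorn` switch above (#78): gunicorn reads its settings from a `gunicorn.conf.py`, which is plain Python. The values and the `app:app` module path below are illustrative, not Crawlab's actual configuration.

```python
# gunicorn.conf.py -- illustrative values, not Crawlab's actual settings
bind = "0.0.0.0:8000"   # address and port the API listens on
workers = 4             # number of worker processes
worker_class = "sync"   # gunicorn's default worker type
timeout = 120           # restart workers silent for this many seconds

# Run with: gunicorn -c gunicorn.conf.py app:app
```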
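And for the MongoDB `authenticationDatabase` option above (#68): in `pymongo` the same setting is exposed as the `authSource` parameter. Host, credentials and database names below are placeholders.

```python
from pymongo import MongoClient

# authSource is pymongo's name for the authenticationDatabase
client = MongoClient(
    host="localhost",
    port=27017,
    username="crawlab_user",  # placeholder credentials
    password="secret",
    authSource="admin",       # database to authenticate against
)
db = client["crawlab"]
```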
- Docker: Users can run the Docker image to speed up deployment.
- CLI: Allow users to execute Crawlab programs through a command-line interface.
- Upload Spider: Allow users to upload a Customized Spider to Crawlab.
- Edit Fields on Preview: Allow users to edit fields when previewing data in a Configurable Spider.
- Spiders Pagination. Fixed a pagination problem on the spider page.
- Automatic Extract Fields: Automatically extract data fields from list pages for the Configurable Spider.
- Download Results: Allow downloading results as a CSV file.
- Baidu Tongji: Allow users to choose to report usage info to Baidu Tongji.
- Results Page Pagination: Fixed the results page pagination so it works correctly. #45
- Scheduled Tasks Duplicated Triggers: Set Flask DEBUG to False so scheduled tasks won't trigger twice (see the sketch after this group). #32
- Frontend Environment: Added `VUE_APP_BASE_URL` as a production-mode environment variable so API calls won't always go to `localhost` in a deployed environment. #30
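The duplicated-trigger fix above (#32) follows from how Flask's debug mode works: the Werkzeug reloader runs the application in a parent and a child process, so a scheduler started at import time registers its jobs twice. A minimal sketch of the failure mode and the fix, with illustrative job code:

```python
from apscheduler.schedulers.background import BackgroundScheduler
from flask import Flask

app = Flask(__name__)

scheduler = BackgroundScheduler()
scheduler.add_job(lambda: print("tick"), "interval", seconds=5)
scheduler.start()  # started at import time

if __name__ == "__main__":
    # With debug=True the reloader spawns a second process that re-imports
    # this module and starts a second scheduler, so every job fires twice.
    app.run(debug=False)
```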
- Configurable Spider: Allow users to create a spider to crawl data without coding.
- Advanced Stats: Advanced analytics in spider detail view.
- Sites Data: Added sites list (China) for users to check info such as robots.txt and home page response time/code.
- Basic Stats: Users can view basic stats such as the number of failed tasks and the number of results on the spiders and tasks pages.
- Near Realtime Task Info: Poll data from the server periodically (every 5 seconds) to view task info in a near-realtime fashion.
- Scheduled Tasks: Allow users to set up cron-like scheduled/periodical tasks using `apscheduler` (see the sketch below).
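A minimal sketch of cron-like scheduling with `apscheduler`, as referenced in the scheduled-tasks item above; the job function and cron expression are illustrative.

```python
from apscheduler.schedulers.background import BackgroundScheduler
from apscheduler.triggers.cron import CronTrigger

def run_spider():
    print("dispatching spider task...")  # placeholder for the real dispatch

scheduler = BackgroundScheduler()
# CronTrigger.from_crontab parses a standard five-field crontab expression;
# this one fires at minute 0 of every hour
scheduler.add_job(run_spider, CronTrigger.from_crontab("0 * * * *"))
scheduler.start()
```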
- Initial Release