V : 0.14.0
Matthieu committed Feb 24, 2018
1 parent 0818b32 commit dfd5c4e
Showing 14 changed files with 215 additions and 15 deletions.
4 changes: 4 additions & 0 deletions Changelog.md
@@ -1,5 +1,9 @@
### Changelog :

0.14.0 : stability : corrected the scraper so that it can keep using mangafox, which changed its domain name ( mangafox.la to fanfox.net ) + implemented a parameter ( loop_on_todo_times ) to configure how many times the loop_on_todo option may loop<br />

###### Warning : the params database is not compatible<br />Please look at the file migration/0.13.x_to_0.14.x.txt<br />

0.13.0 : stability : corrected the scraper so that it can keep using mangafox, which changed its domain name ( .me to .la ) + implemented a new param loop_on_todo to allow MangaScrap to download a maximum of todo pages in one go without having to chain multiple instructions<br />

###### Warning : the params database is not compatible<br />Please look at the file migration/0.12.x_to_0.13.x.txt<br />
13 changes: 13 additions & 0 deletions migration/0.13.x_to_0.14.x.sh
@@ -0,0 +1,13 @@
#!/usr/bin/env bash

echo "ALTER TABLE Download ADD loop_on_todo_times INT;
.exit
" | sqlite3 ~/.MangaScrap/db/params.db

ruby ./../MangaScrap.rb param set ltt 5

ruby ../tools/mangafox_to_fanfox.rb

echo ""
echo "done"
echo ""
7 changes: 7 additions & 0 deletions migration/0.13.x_to_0.14.x.txt
@@ -0,0 +1,7 @@
1. update the params database with this instruction :
ALTER TABLE Download ADD loop_on_todo_times INT;

2. give the new parameter ( ltt ) a value. Recommended is 5

3. use the tool to update the database and ensure that MangaScrap will be able to continue to update your mangas
ruby tools/mangafox_to_fanfox.rb
7 changes: 6 additions & 1 deletion sources/DB/Manga_database.rb
@@ -72,7 +72,12 @@ def get_manga(manga_data)
puts 'Error '.red + ': while trying to get manga in database => Data is nil'
exit 2
end
Utils_database::db_exec('SELECT * FROM manga_list WHERE link=?', "Exception while getting #{manga_data[:name]} in database", @db, [manga_data[:link]])[0]
buff = nil
if manga_data[:id] != nil
Utils_database::db_exec('SELECT * FROM manga_list WHERE id=?', "Exception while getting #{manga_data[:name]} in database", @db, [manga_data[:id]])[0]
else
Utils_database::db_exec('SELECT * FROM manga_list WHERE link=?', "Exception while getting #{manga_data[:name]} in database", @db, [manga_data[:link]])[0]
end
end

def manga_in_data?(manga_data)
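
The `get_manga` hunk prefers the row id when the caller has one and falls back to the link otherwise. A minimal sketch of that selection logic, assuming a hypothetical helper `manga_lookup_query` (not part of MangaScrap) that only returns the SQL string and its bound arguments instead of running the query:

```ruby
# Hypothetical helper (not part of MangaScrap): picks the query used to find
# a manga, preferring the primary key over the link when an id is present.
def manga_lookup_query(manga_data)
  raise ArgumentError, 'manga_data is nil' if manga_data.nil?

  if manga_data[:id].nil?
    ['SELECT * FROM manga_list WHERE link=?', [manga_data[:link]]]
  else
    ['SELECT * FROM manga_list WHERE id=?', [manga_data[:id]]]
  end
end

sql, args = manga_lookup_query({ id: 12, link: 'http://fanfox.net/manga/example/' })
puts sql
puts args.inspect
```

Looking up by id keeps working even after a migration has rewritten the stored link, which is presumably why the id branch comes first.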
6 changes: 3 additions & 3 deletions sources/DB/manga_data/Manga_data.rb
@@ -40,7 +40,7 @@ def extract_data(display)
if @data[:website] == nil
return false
end
@data[:link] = @data[:website][:link] + @data[:website][:to_complete] + @data[:name]
@data[:link] = @data[:website][:link] + @data[:website][:to_complete] + @data[:name] + '/'
elsif @data[:link] != nil # got manga_data with link
unless Web_data.instance.get_web_info_from_link(@data, display)
return false
@@ -64,11 +64,11 @@ def validate_data(connect, display)
return false
end
if ret && connect # if connect is true, the manga should not be found in the database
puts 'Warning :'.yellow + ' ' + @data[:name] + ' of ' + @data[:website][:link] + ' is already in the database, ignoring it'
puts 'Warning :'.yellow + ' ' + @data[:name] + ' / ' + @data[:link] + ' is already in the database, ignoring it'
return false
end
if !ret && !connect # if connect is false, it should be in the database
puts 'Warning :'.yellow + ' ' + @data[:name] + ' of ' + @data[:website][:link] + ' is not in the database, ignoring it'
puts 'Warning :'.yellow + ' ' + @data[:name] + ' / ' + @data[:link] + ' is not in the database, ignoring it'
return false
end
# should the manga not be in the database and Manga_data require connection,
6 changes: 3 additions & 3 deletions sources/DB/manga_data/Web_data.rb
@@ -8,8 +8,8 @@ class Web_data
# private instance methods
def initialize
@sites = []
@sites << Struct::Website.new('http://mangafox.la/', %w(http://mangafox.la mangafox.la mangafox),
'mangafox/', 'manga/', Download_Mangafox)
@sites << Struct::Website.new('http://fanfox.net/', %w(http://fanfox.net fanfox.net fanfox),
'mangafox/', 'manga/', Download_Mangafox) # purposely left in the old directory
@sites << Struct::Website.new('http://www.mangareader.net/', %w(http://www.mangareader.net www.mangareader.net mangareader.net mangareader),
'mangareader/', '', Download_Mangareader_Pandamanga)
@sites << Struct::Website.new('http://www.mangapanda.com/', %w(http://www.mangapanda.com www.mangapanda.com mangapanda.com mangapanda),
@@ -21,7 +21,7 @@ def initialize

def extract_values_from_link(link)
# get site
# throw exception if bad site
# throw exception if bad sites
# call static method of download class
end
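
The site table maps several aliases to one website entry, and the `Manga_data` hunk builds the manga link from the site link, the `to_complete` segment, the name and a trailing slash. A rough sketch of both steps, assuming a stand-in `Site` struct and the hypothetical helpers `find_site` and `manga_link` (only `:link` and `:to_complete` are field names taken from the diff, the rest are illustrative):

```ruby
# Illustrative stand-in for Struct::Website; :aliases and :dir are assumed names.
Site = Struct.new(:link, :aliases, :dir, :to_complete)

SITES = [
  Site.new('http://fanfox.net/', %w[http://fanfox.net fanfox.net fanfox],
           'mangafox/', 'manga/'), # download directory keeps the old name
  Site.new('http://www.mangareader.net/',
           %w[http://www.mangareader.net www.mangareader.net mangareader.net mangareader],
           'mangareader/', '')
].freeze

# Returns the site whose alias list contains the given name, or nil.
def find_site(name)
  SITES.find { |site| site.aliases.include?(name) }
end

# Builds the manga index link: site link + to_complete + name + trailing slash.
def manga_link(site, name)
  site.link + site.to_complete + name + '/'
end

site = find_site('fanfox')
puts manga_link(site, 'example_manga') # http://fanfox.net/manga/example_manga/
```

In this sketch the old `mangafox` alias no longer resolves, so existing database rows have to be migrated before they can be matched again.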

8 changes: 5 additions & 3 deletions sources/DB/sub_params/Download.rb
@@ -15,7 +15,7 @@ def param_set(id, value, display = true)
case id
when 'lt'
return param_check_bool(param, value)
when 'bs', 'fs', 'nbf', 'es', 'ct', 'dt'
when 'bs', 'fs', 'nbf', 'es', 'ct', 'dt', 'ltt'
return param_check_nb(param, value)
when 'mp'
begin
@@ -42,14 +42,15 @@ def get_params(default = false)
ret << Struct::Param_value.new('connect_timeout', 'cto', 'int', ((default) ? 20 : @params[:connect_timeout]), self, 1, 300)
ret << Struct::Param_value.new('download_timeout', 'dt', 'int', ((default) ? 300 : @params[:download_timeout]), self, 0, 300)
ret << Struct::Param_value.new('loop_on_todo', 'lt', 'bool', ((default) ? true : @params[:loop_on_todo]), self)
ret << Struct::Param_value.new('loop_on_todo_times', 'ltt', 'int', ((default) ? 5 : @params[:loop_on_todo_times]), self, 1, 10)
end

def initialize
@display = true
@db_name = 'Download'
@template_file = 'sources/templates/text/params/download.txt'
Struct.new('Download_params', :id, :manga_path, :between_sleep, :failure_sleep, :nb_tries_on_fail, :error_sleep,
:connect_timeout, :download_timeout, :loop_on_todo)
:connect_timeout, :download_timeout, :loop_on_todo, :loop_on_todo_times)
init("CREATE TABLE IF NOT EXISTS #{@db_name} (
Id INTEGER PRIMARY KEY AUTOINCREMENT,
manga_path TEXT,
@@ -59,7 +60,8 @@ def initialize
error_sleep FLOAT,
connect_timeout INT,
download_timeout INT,
loop_on_todo VARCHAR(5))") do |data|
loop_on_todo VARCHAR(5),
loop_on_todo_times INT)") do |data|
@params = Struct::Download_params.new(*data)
end
end
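
The new `ltt` parameter is declared as an int with a minimum of 1 and a maximum of 10. The diff does not show how `param_check_nb` enforces that range, so the following is only a sketch of such a check, using a hypothetical `check_int_param` helper:

```ruby
# Hypothetical validator (not MangaScrap's param_check_nb): accepts a string
# or an integer and returns the integer only when it lies within min..max.
def check_int_param(value, min, max)
  number = Integer(value.to_s, 10) # raises ArgumentError on non-numeric input
  return nil unless number.between?(min, max)

  number
rescue ArgumentError
  nil
end

p check_int_param('5', 1, 10)  # 5
p check_int_param('11', 1, 10) # nil
```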
2 changes: 1 addition & 1 deletion sources/DB/sub_params/params_module.rb
@@ -148,7 +148,7 @@ def init(init_exec)
data = get_data_from_db(init_exec)
data = prepare_data(data)
yield(data)
@params_list = get_params # todo : the params have already been fetched with the default values, check whether it would be possible to make us of the array of structs rather than requesting another one
@params_list = get_params # todo : the params have already been fetched with the default values, check whether it would be possible to make use of the array of structs rather than requesting another one
end

public # ========================================================================================================= public
113 changes: 113 additions & 0 deletions sources/Download/fanfox.rb
@@ -0,0 +1,113 @@
class Download_Mangafox
include Base_downloader

private
def extract_links(manga)
links = @manga_data[:index_page].xpath('//a[@class="tips"]').map{ |link| link['href'] }
if links == nil || links.size == 0
raise ('failed to get manga '.red + manga[:name].yellow + ' chapter index'.red)
end
links
end

public
def self.volume_string_to_int(string)
case string
when 'TBD'
volume = -2
when 'NA'
volume = -3
when 'ANT'
volume = -4
else
volume = string.to_i
end
volume
end

def self.data_extractor(link)
link += '1.html'
link_split = link.split('/')
page = link_split[link_split.size - 1].chomp('.html').to_i
link_split[link_split.size - 2][0] = ''
chapter = link_split[link_split.size - 2].to_f
if chapter % 1 == 0
chapter = chapter.to_i
end
if link_split.size == 8
link_split[link_split.size - 3][0] = ''
if link_split[link_split.size - 3] =~ /\A\d+\z/
volume = link_split[link_split.size - 3].to_i
else
if link_split[link_split.size - 3] == 'NA'
volume = -3
elsif link_split[link_split.size - 3] == 'TBD'
volume = -2
elsif link_split[link_split.size - 3] == 'ANT'
volume = -4
else
volume = -42 # error value
end
end
else
volume = -1 # no volume
end
ret = Array.new
ret << volume << chapter << page
ret
end

def link_generator(volume, chapter, page)
chapter = chapter.to_i if chapter % 1 == 0
link = @manga_data[:website][:link] + 'manga/' + @manga_data[:name] + '/'
if volume >= 0
vol_buffer = ((volume >= 10) ? '' : '0')
link += 'v' + vol_buffer + volume.to_s + '/'
elsif volume == -2
link += 'vTBD/'
elsif volume == -3
link += 'vNA/'
elsif volume == -4
link += 'vANT/'
end
chap_buffer = ((chapter < 10) ? '00' : ((chapter < 100) ? '0' : ''))
link += 'c' + chap_buffer
if chapter % 1 == 0
link += chapter.to_i.to_s
else
link += chapter.to_s
end
link + '/' + ((page < 10) ? '0' : '') + page.to_s + '.html'
end

# downloads a page, with link = the link, data = [volume, chapter, page]
def page_link(link, data)
get_page_from_link(link, data, '//img[@id="image"]')
end

# downloads a chapter with link = the link and prep_display = small string displayed when announcing the download of the chapter
def chapter_link(link, prep_display = '')
get_chapter_from_link(link, prep_display, '.html') do |page|
page.xpath('//div[@class="l"]').text.split.last.to_i
end
end

def data
alternative_names = @manga_data[:index_page].xpath('//div[@id="title"]/h3').text
release_author_artist_genres = @manga_data[:index_page].xpath('//td[@valign="top"]')
release = release_author_artist_genres[0].text.to_i
author = release_author_artist_genres[1].text.gsub(/\s+/, '').gsub(',', ', ')
artist = release_author_artist_genres[2].text.gsub(/\s+/, '').gsub(',', ', ')
genres = release_author_artist_genres[3].text.gsub(/\s+/, '').gsub(',', ', ')
description = @manga_data[:index_page].xpath('//p[@class="summary"]').text
data = @manga_data[:index_page].xpath('//div[@class="data"]/span')
status = data[0].text.gsub(/\s+/, '').split(',')[0]
rank = data[1].text[/\d+/]
rating = data[2].text[/\d+[.,]\d+/]
rating_max = 5 # rating max is a constant in mangafox
tmp_type = @manga_data[:index_page].xpath('//div[@id="title"]/h1')[0].text.split(' ')
type = tmp_type[tmp_type.size - 1]
html_name = tmp_type.take(tmp_type.size - 1).join(' ')
validate_data(description, author, artist, type, status, genres, release, html_name, alternative_names, rank, rating, rating_max, '//div[@class="cover"]/img')
end
end
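
`data_extractor` recovers the volume and chapter by splitting the link on `/` and counting segments. As an alternative illustration only, the same values can be pulled out with a regular expression. This sketch assumes links shaped like `.../manga/<name>/[v<volume>/]c<chapter>/`; `parse_chapter_link` and `SPECIAL_VOLUMES` are hypothetical names, and the sentinel values are the ones used in the class above:

```ruby
# Sentinel values for non-numeric volumes, as used by Download_Mangafox.
SPECIAL_VOLUMES = { 'TBD' => -2, 'NA' => -3, 'ANT' => -4 }.freeze

# Hypothetical parser: returns [volume, chapter] for a chapter link, or nil
# when the link does not end with a chapter segment.
def parse_chapter_link(link)
  match = link.match(%r{/manga/[^/]+/(?:v([^/]+)/)?c(\d+(?:\.\d+)?)/?\z})
  return nil if match.nil?

  volume =
    if match[1].nil?
      -1 # no volume segment in the link
    elsif match[1] =~ /\A\d+\z/
      match[1].to_i
    else
      SPECIAL_VOLUMES.fetch(match[1], -42) # -42 is the error value
    end
  chapter = match[2].to_f
  chapter = chapter.to_i if chapter % 1 == 0 # whole chapters become integers
  [volume, chapter]
end

p parse_chapter_link('http://fanfox.net/manga/example/vTBD/c010.5/') # [-2, 10.5]
```

Anchoring the pattern on `/manga/<name>/` avoids mistaking a manga name that starts with `v` for a volume segment.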
7 changes: 5 additions & 2 deletions sources/api/mangas.rb
@@ -42,9 +42,12 @@ def self.update(mangas, todo_only = false, fast_update = false)
end
begin
generate_html = (todo_only ? dw.todo : dw.update)
if params.download[:loop_on_todo] && (dw.generated_todo || dw.downloaded_a_page)
i = 0
i_max = params.download[:loop_on_todo_times]
while params.download[:loop_on_todo] && (dw.generated_todo || dw.downloaded_a_page) && i < i_max
i += 1
dw.todo_reset
puts 're-trying to download todo' + ' (loop-on-todo is set to true)'.yellow
puts 're-trying to download todo' + ' (loop-on-todo is set to true) '.yellow + i.to_s + ' / ' + i_max.to_s
dw.todo
end
rescue RuntimeError => e
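
The change above turns a single retry into a loop bounded by `loop_on_todo_times`. A minimal sketch of that control flow in isolation, assuming a hypothetical `loop_on_todo` helper whose block stands in for `dw.todo` and returns true when the pass made progress:

```ruby
# Hypothetical sketch: runs the block once, then re-runs it while it reports
# progress, at most max_loops more times. Returns the number of extra passes.
def loop_on_todo(enabled, max_loops)
  passes = 0
  progressed = yield(passes) # initial pass
  while enabled && progressed && passes < max_loops
    passes += 1
    progressed = yield(passes)
  end
  passes
end

remaining = 7 # pretend seven todo pages are left
extra = loop_on_todo(true, 5) do |_pass|
  downloaded = remaining > 0
  remaining -= 1 if downloaded
  downloaded
end
puts "extra passes: #{extra}, pages left: #{remaining}" # extra passes: 5, pages left: 1
```

Without the cap, a site that keeps producing todo entries would keep the loop alive indefinitely. With it, the worst case is `max_loops` extra passes per manga.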
2 changes: 1 addition & 1 deletion sources/init.rb
@@ -31,7 +31,7 @@ def self.get_file_list
html/html
html/html_manga
Download/base_downloader
Download/mangafox
Download/fanfox
Download/mangareader_mangapanda
DownloadDisplay
instructions/Parsers/exec/Instruction_parser
4 changes: 4 additions & 0 deletions sources/templates/text/params/download.txt
@@ -29,3 +29,7 @@ Values from 0 ( no timeout ) to 300 seconds
[lt] (lt) = {lt}
if set to true, will allow MangaScrap to loop on a manga if a todo was generated or downloaded
it will try to re-download the todo elements if it succeeded in downloading a todo on the previous try

[ltt] (ltt) = {ltt}
should |lt| be set to true, this parameter sets the maximum number of times the scraper may loop
values range from 1 to 10
2 changes: 1 addition & 1 deletion tools/mangafox_me_to_la.rb
@@ -17,7 +17,7 @@ def update_mangafox_mangas(db, mangas)
puts 'updating ' + i.to_s + ' / ' + mangas.size.to_s + ' ' + manga[1]
# update website
# update link
args = ['http://mangafox.la', manga[4].gsub('mangafox.me', 'mangafox.la'), manga[0]]
args = ['http://mangafox.la/', manga[4].gsub('mangafox.me', 'mangafox.la'), manga[0]]
db.exec_query('UPDATE manga_list SET site=?, link=? WHERE id=?',
'could not update ' + manga[1], args)
i += 1
49 changes: 49 additions & 0 deletions tools/mangafox_to_fanfox.rb
@@ -0,0 +1,49 @@
#!/usr/bin/env ruby
# coding: utf-8

require_relative '../sources/init'

$0="MangaScrap's mangafox migration"

def get_mangafox_mangas(db)
db.exec_query('SELECT * FROM manga_list WHERE site=? ORDER BY name COLLATE NOCASE', 'error while getting the manga list', ['http://mangafox.la/'])
end

def update_mangafox_mangas(db, mangas)
if mangas.size != 0
puts 'updating ' + mangas.size.to_s + ' manga(s)'
i = 1
mangas.each do |manga|
puts 'updating ' + i.to_s + ' / ' + mangas.size.to_s + ' ' + manga[1]
# update website
# update link
args = ['http://fanfox.net/', manga[4].gsub('mangafox.la', 'fanfox.net'), manga[0]]
if args[1][-1, 1] != '/'
args[1] += '/'
end
db.exec_query('UPDATE manga_list SET site=?, link=? WHERE id=?',
'could not update ' + manga[1], args)
i += 1
end
puts ''
puts 'done'
else
puts 'nothing to update'
end
end

begin
Init::initialize_mangascrap
db = Manga_database.instance
mangas = get_mangafox_mangas(db)
update_mangafox_mangas(db, mangas)
rescue Interrupt
puts ''
puts ''
puts 'MangaScrap was interrupted by user'.magenta
puts ''
puts 'backtrace'.yellow + ' is :'
pp $!.backtrace
puts ''
exit 7
end
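
The core of the migration tool is the link rewrite: swap the old domain for the new one and make sure the link ends with a slash. A small sketch of that step on its own, assuming a hypothetical `migrate_link` helper that is not part of the tool:

```ruby
# Hypothetical helper mirroring the tool's link rewrite: replaces the old
# domain with the new one and guarantees a trailing slash.
def migrate_link(link, old_domain = 'mangafox.la', new_domain = 'fanfox.net')
  migrated = link.gsub(old_domain, new_domain)
  migrated.end_with?('/') ? migrated : migrated + '/'
end

puts migrate_link('http://mangafox.la/manga/example') # http://fanfox.net/manga/example/
```

Applying the helper to an already migrated link returns it unchanged, so re-running such a rewrite is harmless.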
