How to Download Web Resources
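Purpose
China Machine Press (机工社) recently announced that it is opening its Engineering Science and Technology Digital Library to everyone online for free, to help us all pull through these hard times together!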
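I found that some of the books are shown to readers as web pages, one page at a time, which makes them hard to download in one go.
Is there a way to download them all at once?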
For example, this book:
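Research
Test 1: Chrome extensions
I found plenty of Chrome extensions online, but none of them could pick up the links on the page. That's because the page contains no real links at all.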
For example, a chapter link on the page looks like this:
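    <a href="javascript:void(0);" onclick="probation.readBook(this);" id="678612" ref="/openresources/teach_ebook/uncompressed/13780/OEBPS/Text/chapter33.html#heading_id_3">3.1 协商原则</a>
Its href is only javascript:void(0); the real chapter path sits in the nonstandard ref attribute, and clicking the link calls probation.readBook(this), which presumably reads that value. The link actually ends up as ?path=http://www.mamicode.com/openresources/teach_ebook/uncompressed/13780/OEBPS/Text/chapter33.html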
No wonder the extensions don't recognize these links.
Looks like I'll have to write a tool myself.
The simplest way is with Python.
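Since the chapter path lives in ref rather than href, a script has to read that attribute itself. A minimal sketch with Python's standard html.parser: it collects the ref of every <a> whose onclick calls readBook, then drops the #fragment and puts a host in front to get the ?path= form. ChapterLinkParser and to_path_query are hypothetical names, and both the "readBook" match and the host argument are assumptions based on the single sample link, not a documented rule of the site.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag


class ChapterLinkParser(HTMLParser):
    """Collect the ref attribute of every <a> whose onclick calls readBook."""

    def __init__(self):
        super().__init__()
        self.refs = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr = dict(attrs)
        # Chapter links carry href="javascript:void(0);"; the real path is in ref
        if "readBook" in (attr.get("onclick") or "") and attr.get("ref"):
            self.refs.append(attr["ref"])


def to_path_query(ref, host):
    """Drop the #fragment from a ref value and build the ?path=... form."""
    path, _fragment = urldefrag(ref)
    return "?path=" + host + path


# The sample chapter link from the page (assumed typical of all chapter links)
sample = ('<a href="javascript:void(0);" onclick="probation.readBook(this);" '
          'id="678612" ref="/openresources/teach_ebook/uncompressed/13780/'
          'OEBPS/Text/chapter33.html#heading_id_3">3.1 协商原则</a>')

parser = ChapterLinkParser()
parser.feed(sample)
parser.close()
for ref in parser.refs:
    print(to_path_query(ref, "http://www.mamicode.com"))
```

On the sample link this prints the same ?path=... value that the page finally opens for chapter33.html. Links without a ref, or whose onclick does not mention readBook, are skipped.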
Testing the link above:

    C:\Users\cutep>python -m wget ?path=http://www.mamicode.com/openresources/teach_ebook/uncompressed/13780/OEBPS/Text/chapter33.html -o 33.html
    100% [................................................................................] 4000 / 4000
    Saved under 33.html

It worked!
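Shelling out to python -m wget starts a new interpreter for every chapter, and a script that ignores os.system()'s return value will not notice a failed download. A minimal in-process alternative using the standard library's urllib.request is sketched below; fetch is a hypothetical helper name. It assumes the reader serves chapters to a plain GET without cookies or special headers (the successful wget test suggests so, but that is not verified), and it needs the complete absolute URL, since the host in front of ?path= is not shown here.

```python
import shutil
import urllib.request


def fetch(url, filename, timeout=30):
    """Download url into filename in-process; raise instead of failing silently."""
    # urlopen raises urllib.error.URLError (or its subclass HTTPError) on failure,
    # before the output file is created
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        with open(filename, "wb") as out:
            # Stream the response body to disk in chunks
            shutil.copyfileobj(resp, out)
    return filename
```

Usage would look like fetch(full_url, "33.html"), where full_url is the complete reader URL ending in ?path=... . Note that this overwrites an existing file with the same name.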
Test 2: a Python script
In the end I wrote the following Python script (it uses Windows cmd commands such as del, rd, md and xcopy):

    import os

    import requests


    def my_system(cmd):
        # Echo a shell command, then run it
        print(cmd)
        os.system(cmd)


    def download(url, file):
        # Save url as file via the "wget" package from PyPI (pip install wget)
        cmd = 'python -m wget %s -o %s' % (url, file)
        my_system(cmd)


    def download_chapter(click_url, file):
        # A chapter's ref path is fetched through the reader's ?path= address;
        # the part of the reader URL before "?path=" is not shown in this post
        download('?path=%s' % click_url, file)


    def get_bookname(cont):
        # The book title is the first <span> after <div class="book-name">
        s = '<div class="book-name">'
        p1 = cont.find(s)
        if p1 < 0:
            raise ValueError('book name not found on the main page')
        p1 = cont.find('<span>', p1 + len(s)) + len('<span>')
        p2 = cont.find('</span>', p1)
        return cont[p1:p2]


    def get_value_token(cont):
        # The main page carries ebookId and token as input values ending in "/>
        s = '"ebookId" value="'
        p1 = cont.find(s)
        if p1 < 0:
            raise ValueError('ebookId not found on the main page')
        p1 += len(s)
        p2 = cont.find('"/>', p1)
        ebookId = cont[p1:p2]
        s2 = 'name="token" value="'
        p3 = cont.find(s2, p2)
        if p3 < 0:
            raise ValueError('token not found on the main page')
        p3 += len(s2)
        p4 = cont.find('"/>', p3)
        token = cont[p3:p4]
        print('ebookId, token %s %s' % (ebookId, token))
        return ebookId, token


    def download_book(main_link):
        # Remove leftovers from the previous book, then fetch this book's main page
        my_system('del main*.html')
        download(main_link, 'main.html')
        with open('main.html', 'r', encoding='utf-8') as f:
            main_cont = f.read()
        ebookId, token = get_value_token(main_cont)
        bookname = get_bookname(main_cont)
        print(bookname)
        # A folder named after the book means it was already downloaded
        if os.path.isdir(bookname):
            return
        # Download every chapter into an empty my_temp folder
        my_system('rd/s/q my_temp')
        my_system('md my_temp')
        os.chdir('my_temp')
        my_system('cd')
        # Ask for the chapter list (the endpoint URL is not shown in this post), e.g.
        # data={'ebookId': 15917, 'token': "e87436c8bc7849c397a1db2f27c0ba5d"}
        response = requests.post('', data={'ebookId': ebookId, 'token': token})
        resp_json = response.json()
        for i in resp_json['data']['data']:
            ref_link = i['ref']
            # Name each file after the last path component, e.g. chapter52.xhtml
            file = ref_link[ref_link.rfind('/') + 1:]
            print(ref_link, file)
            download_chapter(ref_link, file)
        os.chdir('..')
        my_system('cd')
        # Copy the chapters into a folder named after the book
        my_system('md "%s"' % bookname)
        my_system('xcopy /c/d/e/y my_temp "%s"' % bookname)


    # One entry per book page; the page URLs are not shown in this post
    book_links = ['', '', '', '', '',
                  '', '', '', '', '',
                  '', '', '', '', '']
    for link in book_links:
        download_book(link)

Test result (excerpt):
    Saved under chapter51.xhtml
    /openresources/teach_ebook/uncompressed/16571/OEBPS/Text/chapter52.xhtml chapter52.xhtml
    python -m wget ?path=http://www.mamicode.com/openresources/teach_ebook/uncompressed/16571/OEBPS/Text/chapter52.xhtml -o chapter52.xhtml
    100% [................................................................................] 1058 / 1058
    Saved under chapter52.xhtml
    /openresources/teach_ebook/uncompressed/16571/OEBPS/Text/chapter53.xhtml chapter53.xhtml
    python -m wget ?path=http://www.mamicode.com/openresources/teach_ebook/uncompressed/16571/OEBPS/Text/chapter53.xhtml -o chapter53.xhtml
    100% [................................................................................] 4625 / 4625
    Saved under chapter53.xhtml
    /openresources/teach_ebook/uncompressed/16571/OEBPS/Text/chapter54.xhtml chapter54.xhtml
    python -m wget ?path=http://www.mamicode.com/openresources/teach_ebook/uncompressed/16571/OEBPS/Text/chapter54.xhtml -o chapter54.xhtml
    100% [..................................................................................] 705 / 705
    Saved under chapter54.xhtml
    /openresources/teach_ebook/uncompressed/16571/OEBPS/Text/chapter55.xhtml chapter55.xhtml
    python -m wget ?path=http://www.mamicode.com/openresources/teach_ebook/uncompressed/16571/OEBPS/Text/chapter55.xhtml -o chapter55.xhtml
    100% [................................................................................] 1814 / 1814
    Saved under chapter55.xhtml
    /openresources/teach_ebook/uncompressed/16571/OEBPS/Text/chapter56.xhtml chapter56.xhtml
    python -m wget ?path=http://www.mamicode.com/openresources/teach_ebook/uncompressed/16571/OEBPS/Text/chapter56.xhtml -o chapter56.xhtml
    100% [..............................................................................] 10025 / 10025
    Saved under chapter56.xhtml
    /openresources/teach_ebook/uncompressed/16571/OEBPS/Text/chapter57.xhtml chapter57.xhtml
    python -m wget ?path=http://www.mamicode.com/openresources/teach_ebook/uncompressed/16571/OEBPS/Text/chapter57.xhtml -o chapter57.xhtml

Other
What framework was the page below built with?
Note: This article was recommended by Jm Blog; please keep the link when reposting: https://www.jmwww.net/file/web/30715.html