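Python Enthusiasts Community official account: collection of historical articles | GitHub - thinkingpy/weixin_crawler: An efficient crawler for WeChat official accounts' historical articles and reading data, powered by scrapy...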
Published: 2019-06-22


Install MongoDB, Redis and Elasticsearch and run them in the background

Download MongoDB, Redis and Elasticsearch from their official sites and install them.

Run all of them at the same time with the default configuration, so that MongoDB listens on localhost:27017 and Redis on localhost:6379 (if yours differ, configure them in weixin_crawler/project/configs/auth.py).
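A quick way to confirm that the services are really up before going further is to try a TCP connection to each default port. This is a minimal sketch, not part of weixin_crawler; it assumes the default ports named above plus Elasticsearch's default HTTP port 9200.

```python
import socket

# Assumed defaults; change them if you configured auth.py differently.
DEFAULT_SERVICES = {
    "mongodb": ("localhost", 27017),
    "redis": ("localhost", 6379),
    "elasticsearch": ("localhost", 9200),
}


def is_listening(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable or timed out
        return False


def check_services(services=DEFAULT_SERVICES) -> dict:
    """Map each service name to whether its port accepts connections."""
    return {name: is_listening(host, port) for name, (host, port) in services.items()}


if __name__ == "__main__":
    for name, ok in check_services().items():
        print(f"{name:<14} {'up' if ok else 'DOWN'}")
```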

In order to tokenize Chinese, the elasticsearch-analysis-ik plugin has to be installed for Elasticsearch.
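Once the IK plugin is installed, an index can use its analyzers for Chinese text fields. The sketch below is only an illustration: it assumes the typeless mapping format of Elasticsearch 7, the IK analyzer names `ik_max_word` and `ik_smart`, and hypothetical field names `title` and `content`; weixin_crawler's own index definition may differ.

```python
import json
import urllib.request


def ik_index_body(fields=("title", "content")) -> dict:
    """Build an index body whose text fields are analyzed by the IK plugin."""
    return {
        "mappings": {
            "properties": {
                name: {
                    "type": "text",
                    "analyzer": "ik_max_word",      # fine-grained segmentation when indexing
                    "search_analyzer": "ik_smart",  # coarser segmentation for queries
                }
                for name in fields
            }
        }
    }


def create_index(name: str, es_url: str = "http://localhost:9200") -> int:
    """PUT the index body to Elasticsearch and return the HTTP status code."""
    req = urllib.request.Request(
        f"{es_url}/{name}",
        data=json.dumps(ik_index_body()).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status
```

Calling `create_index("articles")` (a hypothetical index name) fails if the plugin is missing, since Elasticsearch rejects unknown analyzers.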

Install the proxy server and run proxy.js

Install Node.js, then install anyproxy and redis with npm in weixin_crawler/proxy

cd to weixin_crawler/proxy and run node proxy.js

Install the AnyProxy HTTPS CA certificate on both the computer and the phone

If you are not sure how to use AnyProxy, refer to its documentation.
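To check that the proxy started by proxy.js is reachable, you can send a request through it from Python and watch it appear in AnyProxy. This sketch assumes AnyProxy listens on its default port 8001 on the same machine; if proxy.js configures another port, change `ANYPROXY` accordingly.

```python
import urllib.request

# Assumption: AnyProxy's default proxy port; adjust if proxy.js uses another one.
ANYPROXY = "http://127.0.0.1:8001"


def proxied_opener(proxy_url: str = ANYPROXY) -> urllib.request.OpenerDirector:
    """Build an opener that routes both HTTP and HTTPS traffic through the proxy."""
    handler = urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url})
    return urllib.request.build_opener(handler)
```

For example, `proxied_opener().open("http://example.com")` should show up in AnyProxy's web interface (port 8002 by default). HTTPS traffic additionally requires the CA certificate mentioned above.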

Install the required Python packages

NOTE: you may not be able to install every package simply with pip install -r requirements.txt; twisted, which scrapy depends on, is one of them. When a package (twisted, for instance) fails to install, there is always a fallback: download the right version of the package to your drive and run $ pip install package_name

I am not sure whether your Python environment will throw other package-not-found errors; if it does, just install whatever package is missing.
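Rather than waiting for import errors at runtime, you can list which modules are still missing. The module list here is an assumption: scrapy, twisted and pyecharts come from this guide, while pymongo, redis and elasticsearch are the usual Python clients for the three services and may not match requirements.txt exactly.

```python
import importlib.util

# Hypothetical list of import names; align it with requirements.txt.
REQUIRED_MODULES = ["scrapy", "twisted", "pyecharts", "pymongo", "redis", "elasticsearch"]


def missing_modules(names=REQUIRED_MODULES) -> list:
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    print("missing:", ", ".join(missing) if missing else "none")
```

Note that import names can differ from pip package names (for example, the module `twisted` is published as `Twisted`).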

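Some source code has to be modified (perhaps not the most reasonable approach): replace each installed file below with the corresponding file from weixin_crawler\source_code.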

scrapy: Python36\Lib\site-packages\scrapy\http\request\__init__.py --> weixin_crawler\source_code\request\__init__.py

scrapy: Python36\Lib\site-packages\scrapy\http\response\__init__.py --> weixin_crawler\source_code\response\__init__.py

pyecharts: Python36\Lib\site-packages\pyecharts\base.py --> weixin_crawler\source_code\base.py. In this version, the function get_echarts_options is added at line 106.
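The replacements can be scripted so they are easy to repeat after reinstalling scrapy or pyecharts. This sketch reads each arrow above as "overwrite the installed file with the repo's copy", keeps a one-time `.bak` backup of the original, and takes the weixin_crawler directory and your site-packages directory as arguments; the helper name `apply_patches` is hypothetical.

```python
import shutil
from pathlib import Path

# Repo file -> installed file (relative to site-packages), from the list above.
# Forward slashes are fine on Windows too; pathlib normalizes them.
PATCHES = {
    "source_code/request/__init__.py": "scrapy/http/request/__init__.py",
    "source_code/response/__init__.py": "scrapy/http/response/__init__.py",
    "source_code/base.py": "pyecharts/base.py",
}


def apply_patches(repo_dir, site_packages, patches=PATCHES) -> list:
    """Copy each patched file over its installed counterpart, keeping a .bak backup."""
    replaced = []
    for src_rel, dst_rel in patches.items():
        src = Path(repo_dir) / src_rel
        dst = Path(site_packages) / dst_rel
        backup = dst.with_name(dst.name + ".bak")
        if dst.exists() and not backup.exists():
            shutil.copy2(dst, backup)  # back up the original only the first time
        shutil.copy2(src, dst)
        replaced.append(dst)
    return replaced
```

`sysconfig.get_paths()["purelib"]` returns the site-packages directory of the running interpreter, e.g. `apply_patches("weixin_crawler", sysconfig.get_paths()["purelib"])`.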

If you want weixin_crawler to work automatically, the following steps are necessary; otherwise you have to operate the phone manually so that AnyProxy can capture the request data.

Install adb and add it to your PATH (Windows, for example)

Install an Android emulator (NOX is suggested) or plug in your phone, and make sure you can operate it with adb from the command line

If multiple phones are connected to your computer, you have to find out their adb ports, which will be used when adding crawlers
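The serials printed by `adb devices` are what you pass to `adb -s` when several phones or emulators are attached. A minimal parsing sketch follows; it keeps only entries in the `device` state and uses `subprocess.PIPE` with `universal_newlines` so it also runs on Python 3.6.

```python
import subprocess


def parse_adb_devices(output: str) -> list:
    """Extract serials (e.g. host:port for network devices) in the 'device' state."""
    serials = []
    for line in output.splitlines():
        parts = line.split()
        # Header and daemon status lines never have "device" as their second field.
        if len(parts) >= 2 and parts[1] == "device":
            serials.append(parts[0])
    return serials


def list_devices() -> list:
    """Run `adb devices` and return the serials of ready devices."""
    result = subprocess.run(["adb", "devices"], stdout=subprocess.PIPE,
                            universal_newlines=True, check=True)
    return parse_adb_devices(result.stdout)
```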

adb does not support Chinese input, which is bad news for searching WeChat official accounts. In order to input Chinese, ADB Keyboard has to be installed on your Android phone and set as the default input method; see the ADB Keyboard documentation for more.
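With ADB Keyboard installed, text is typed by broadcasting it to the keyboard instead of using `adb shell input text`. The sketch only builds the commands; the IME id `com.android.adbkeyboard/.AdbIME` and the `ADB_INPUT_TEXT` action with a `msg` extra are assumptions taken from the ADB Keyboard project, so check them against the version you install.

```python
import shlex

# Assumption: IME id and broadcast action as documented by the ADB Keyboard project.
ADB_KEYBOARD_IME = "com.android.adbkeyboard/.AdbIME"


def set_ime_cmd(serial: str) -> list:
    """Build an adb command that makes ADB Keyboard the active input method."""
    return ["adb", "-s", serial, "shell", "ime", "set", ADB_KEYBOARD_IME]


def input_text_cmd(serial: str, text: str) -> list:
    """Build an adb command that types `text` (Chinese included) via ADB Keyboard."""
    # The device shell re-parses the arguments, so quote the message once for it.
    return ["adb", "-s", serial, "shell", "am", "broadcast",
            "-a", "ADB_INPUT_TEXT", "--es", "msg", shlex.quote(text)]
```

Run a command with `subprocess.run(cmd, check=True)`, for example `input_text_cmd("127.0.0.1:62001", "python爱好者社区")` to type an account nickname into WeChat's search box.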

Why can weixin_crawler work automatically? Here is the reason:

If you want to crawl a WeChat official account, you have to search for the account on your phone and tap its "全部消息" (All Messages); you will then get a message list, and scrolling down loads more of it. Any message in the list can be tapped if you want to crawl this account's reading data.

Given the nickname of a WeChat official account, weixin_crawler operates the WeChat app installed on a phone while AnyProxy listens in the background. In this way weixin_crawler gets all the request data sent by the WeChat app, and then it is showtime for scrapy.
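How proxy.js hands the captured requests to scrapy is not described here, but the crawler has to tell the interesting WeChat requests apart from the rest. The classifier below is purely illustrative: the `mp.weixin.qq.com` paths and `action` values are assumptions about WeChat's endpoints at the time, not code from weixin_crawler.

```python
from urllib.parse import parse_qs, urlparse


def classify_request(url: str) -> str:
    """Roughly label a captured WeChat URL (endpoint names are assumptions)."""
    parsed = urlparse(url)
    if parsed.netloc != "mp.weixin.qq.com":
        return "other"
    action = parse_qs(parsed.query).get("action", [""])[0]
    if parsed.path == "/mp/profile_ext" and action == "home":
        return "history_home"   # first page of the "全部消息" list
    if parsed.path == "/mp/profile_ext" and action == "getmsg":
        return "history_more"   # loaded when scrolling down the list
    if parsed.path == "/mp/getappmsgext":
        return "reading_data"   # read / like counts of an opened article
    if parsed.path == "/s":
        return "article"
    return "other"
```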

As you might expect, in order to let weixin_crawler operate the WeChat app we have to tell adb where to tap, swipe and input; most of these positions are defined in weixin_crawler/project/phone_operate/config.py. BTW, phone_operate is responsible for operating WeChat just like a human being: its eyes are the Baidu OCR API and predefined tap areas, and its fingers are adb.
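Tapping and swiping map onto `adb shell input tap` and `adb shell input swipe`. In this sketch the coordinates in `POSITIONS` are made up for a 1080x1920 screen and stand in for whatever phone_operate/config.py actually defines; the OCR side is left out.

```python
import subprocess

# Hypothetical positions for a 1080x1920 screen; real values differ per device.
POSITIONS = {
    "search_button": (980, 140),
    "all_messages": (540, 1500),
}


def tap_cmd(serial: str, name: str, positions=POSITIONS) -> list:
    """Build an adb command that taps the named predefined position."""
    x, y = positions[name]
    return ["adb", "-s", serial, "shell", "input", "tap", str(x), str(y)]


def swipe_up_cmd(serial: str, x: int = 540, y_from: int = 1600,
                 y_to: int = 600, ms: int = 300) -> list:
    """Drag the finger upward so the message list scrolls and loads more items."""
    return ["adb", "-s", serial, "shell", "input", "swipe",
            str(x), str(y_from), str(x), str(y_to), str(ms)]


def run(cmd: list) -> None:
    """Execute one adb command, raising if adb reports an error."""
    subprocess.run(cmd, check=True)
```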

Run main.py

$ cd weixin_crawler/project/

$ python(3) ./main.py

Now open the browser; everything you want is at localhost:5000.

You may get stuck somewhere in this long list of steps. If so, join our community for help and tell us what you have done and what errors you have encountered.

Let's explore the world at localhost:5000 together.

Reposted from: http://btspa.baihongyu.com/
