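In a silent world, quietly waiting for him to appear ----"A Silent Voice"
It has been about two months since my last post, and I did not expect people to still be visiting this blog. The visitor count keeps climbing, which makes me pretty happy. I have been busy lately: school has started, I also signed up for driving school, and the written test (Subject 1) is coming up. I practiced reverse parking for a week; during the first hour I could back in without trouble, but a few hours later I could not get in anymore... sigh. OK, enough rambling. This time I will show how to log in to Sina Weibo with Python.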
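Visiting Sina Weibo
URL: http://weibo.com. Press F12 to open the developer tools and capture the traffic.
Then do nothing and wait a few seconds: a push_count.json request shows up, and a new one appears every few seconds.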

The response body looks like this:
try{STK_150823119986721({"code":1,"data":{"remind_settings":{"msgbox":1},"app_message":[]}});}catch(e){}
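Responses like this one are JSONP: a JSON object wrapped in a JavaScript callback. As a minimal sketch (the helper name parse_jsonp is my own, not something Weibo provides), the JSON part can be pulled out with a regex and handed to json.loads:

```python
import json
import re

# Sample body copied from the push_count.json response shown above.
body = ('try{STK_150823119986721({"code":1,"data":{"remind_settings":'
        '{"msgbox":1},"app_message":[]}});}catch(e){}')

def parse_jsonp(text):
    """Return the JSON object passed to the callback in a JSONP body."""
    # Match from the first "({" to the last "})" so that the trailing
    # "catch(e){}" wrapper is left out of the captured payload.
    match = re.search(r'\((\{.*\})\)', text, re.S)
    if match is None:
        raise ValueError("no JSONP payload found")
    return json.loads(match.group(1))

payload = parse_jsonp(body)
print(payload["code"])
print(payload["data"]["remind_settings"])
```

The same helper should also work on the prelogin.php response further down, since it has the same callback(...) shape.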
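How the username is generated
Next, type in the account and password (just type them, do not log in). Another request, prelogin.php, gets captured, and the push_count.json polling seen a moment ago stops.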

Take a look at its URL:
https://login.sina.com.cn/sso/prelogin.php?entry=weibo&callback=sinaSSOController.preloginCallBack&su=MTQ3MzAxODY3MSU0MHFxLmNvbQ%3D%3D&rsakt=mod&checkpin=1&client=ssologin.js(v1.4.19)&_=1508233679833
The only thing to watch in this URL is the su parameter:
MTQ3MzAxODY3MSU0MHFxLmNvbQ%3D%3D
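Let's decode it with Python's base64 module.
One catch: base64 cannot decode the string while %3D%3D is still in it, yet simply dropping that suffix raises binascii.Error: Incorrect padding.
The fix is to replace each %3D with =.
The username may contain symbols such as @; in the decoded result, @ shows up as %40.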
>>> import base64
>>> base64.b64decode("MTQ3MzAxODY3MSU0MHFxLmNvbQ==") # %3D is "=", so two %3D are "=="
b'1473018671%40qq.com' # %40 is "@"
>>>
See? It is the same account we typed in.
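The two layers of encoding can be undone, and redone, with the standard library alone. This is only a sketch of what the captured su value suggests (URL-encode, then base64, then URL-encode again for the query string); the function names encode_su and decode_su are my own:

```python
import base64
from urllib.parse import quote, unquote

def encode_su(username):
    # URL-encode first ("@" becomes "%40"), then base64-encode the result.
    return base64.b64encode(quote(username).encode("utf-8")).decode("ascii")

def decode_su(su_from_url):
    # Undo the URL encoding of the query value ("%3D" -> "="),
    # base64-decode, then undo the inner URL encoding ("%40" -> "@").
    b64_text = unquote(su_from_url)
    return unquote(base64.b64decode(b64_text).decode("utf-8"))

print(decode_su("MTQ3MzAxODY3MSU0MHFxLmNvbQ%3D%3D"))
print(encode_su("1473018671@qq.com"))
```

Using unquote instead of a manual replace of %3D also covers any other percent-escapes that might appear in the value.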
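Getting the prelogin parameters
We now know how the su parameter of the prelogin.php request is generated, so changing su in the URL is all it takes to fetch the prelogin data.
Open the captured prelogin.php URL in the browser; it returns: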
sinaSSOController.preloginCallBack({"retcode":0,"servertime":1508233777,"pcid":"gz-074aaac89dd6886019ebc121f0137b79ec17","nonce":"AFJHZY","pubkey":"EB2A38568661887FA180BDDB5CABD5F21C7BFD59C090CB2D245A87AC253062882729293E5506350508E7F9AA3BB77F4333231490F915F6D63C55FE2F08A49B353F444AD3993CACC02DB784ABBB8E42A9B1BBFFFB38BE18D78E87A0E41B9B8F73A928EE0CCEE1F6739884B9777E4FE9E88A1BBE495927AC4A799B3181D6442443","rsakv":"1330428213","is_openlock":0,"showpin":0,"exectime":88})
We need four parameters from it: servertime, nonce, pubkey, and rsakv.
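A rough sketch of pulling those four values out of the callback text follows. The sample string is abbreviated: its values come from the response above, except pubkey, which I shortened to a placeholder so the example stays readable. The function name parse_prelogin is hypothetical.

```python
import json
import re

# Abbreviated sample; "pubkey" is a shortened placeholder, not a real key.
sample = ('sinaSSOController.preloginCallBack({"retcode":0,'
          '"servertime":1508233777,"nonce":"AFJHZY","pubkey":"EB2A38",'
          '"rsakv":"1330428213","showpin":0})')

def parse_prelogin(text):
    # Capture the JSON object between the callback's parentheses.
    match = re.search(r'preloginCallBack\((\{.*\})\)', text, re.S)
    if match is None:
        raise ValueError("unexpected prelogin response")
    data = json.loads(match.group(1))
    wanted = ("servertime", "nonce", "pubkey", "rsakv")
    return {key: data[key] for key in wanted}

params = parse_prelogin(sample)
print(params)
```

Returning a dict instead of a bare tuple makes it harder to mix up the order of the four values later on.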
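Entering the account and password, then logging in
Now let's find the POST parameters, so keep capturing.
Before capturing, turn on preserve log; otherwise the request flashes by and is lost.
After you enter the account and password and log in, you will see a login.php request: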

https://login.sina.com.cn/sso/login.php?client=ssologin.js(v1.4.19)
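This URL is a POST, so we can read the data it sends, namely the contents of Form Data.
Some of the parameters look familiar, don't they?
su: the username; we can produce it with base64
servertime: the server time, obtained from prelogin
nonce: obtained from prelogin
rsakv: obtained from prelogin
sp: the password, RSA-encrypted; the logic lives in ssologin.js and is covered shortly
All other parameters stay unchanged.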
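Encrypting the password
Below login.php?client=ssologin.js(v1.4.19) there is an ssologin.js request.
While you are there, also take note of home?wvr=5&lf=reg; we will need it later.

It is over 2,000 lines long. Around line 900 you will find the following, which is what we are after: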
var RSAKey = new sinaSSOEncoder.RSAKey();
RSAKey.setPublic(me.rsaPubkey, "10001");
password = RSAKey.encrypt([me.servertime, me.nonce].join("\t") + "\n" + password)
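That is the encryption routine. There are actually two encryption methods, but we only need to look at the RSA one.
Note that '10001' is hexadecimal and has to be converted to decimal, which gives 65537.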
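To make the two details concrete, here is a small sketch of the exponent conversion and of the plaintext that the JavaScript above assembles before encrypting. The servertime and nonce values come from the sample prelogin response; the password is a made-up placeholder.

```python
# Values taken from the sample prelogin response earlier in this post.
servertime = 1508233777
nonce = "AFJHZY"
password = "my-password"  # hypothetical placeholder, not a real password

# "10001" is a hexadecimal string; RSA libraries expect an integer exponent.
exponent = int("10001", 16)

# Mirrors [me.servertime, me.nonce].join("\t") + "\n" + password
message = "\t".join([str(servertime), nonce]) + "\n" + password

print(exponent)
print(repr(message))
```

Only the plaintext layout is shown here; the RSA step itself is done with the rsa module in the code that follows.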
Now for the code!
Modules used:
import urllib
import urllib.request
import urllib.parse
import base64
import rsa
import json
import http.cookiejar
import binascii
import re
from bs4 import BeautifulSoup
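Modules to install (lxml is listed because the code below passes 'lxml' to BeautifulSoup as the parser):
pip install rsa
pip install bs4
pip install lxml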
Initialization
Store username and password on the instance:
class WeiboLogin():

    def __init__(self, username, password):
        self.username = username
        self.password = password
Cookie
Set up a cookie jar so that cookies are carried across every request in the login flow.
    def enableCookies(self):
        # Create a cookie container
        cookie_container = http.cookiejar.CookieJar()
        # Bind the cookie container to an HTTP cookie processor
        cookie_support = urllib.request.HTTPCookieProcessor(cookie_container)
        # Build an opener whose handlers take care of opening HTTP URLs
        opener = urllib.request.build_opener(cookie_support, urllib.request.HTTPHandler)
        # Install the opener; every later urlopen() call goes through it
        urllib.request.install_opener(opener)
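If you would rather not install a global opener, one possible variation (my own sketch, not what the code in this post does) is to keep the opener and its jar as ordinary objects and call opener.open() directly. The function name make_opener is hypothetical.

```python
import http.cookiejar
import urllib.request

def make_opener():
    """Build an opener that stores cookies in its own jar."""
    jar = http.cookiejar.CookieJar()
    processor = urllib.request.HTTPCookieProcessor(jar)
    opener = urllib.request.build_opener(processor)
    return opener, jar

opener, jar = make_opener()
# The jar starts out empty; cookies set by any response fetched
# through opener.open(...) would be stored in it and sent back later.
print(len(jar))
```

Keeping the jar around also makes it easy to inspect which cookies the server has set after each step.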
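Encoding the username
As seen above, @ shows up as %40 in the decoded value. If the account is purely numeric, %3D can also appear in su, which is simply a URL-encoded = (base64 padding).
Before base64-encoding, the username string must first be converted to its URL-encoded form. The function for that is quote; urllib.request.quote works, but it is the same function as urllib.parse.quote, which is the documented location.
b64encode returns bytes, but we need a str for the value to be accepted.
    def getusername(self):
        # URL-encode the username first ("@" becomes "%40")
        username_req_qo = urllib.parse.quote(self.username)
        username_bsencode = base64.b64encode(bytes(username_req_qo, encoding='utf-8'))
        # Convert bytes to str and drop the trailing "=" padding
        return username_bsencode.decode("utf-8").split("=")[0]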
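Because getusername() drops the = padding, its return value cannot be fed straight back into base64.b64decode (that is the Incorrect padding error mentioned earlier). A small sketch of restoring the padding when you want to check a value locally; restore_padding is a name I made up:

```python
import base64

def restore_padding(b64_text):
    # Base64 text must have a length that is a multiple of 4,
    # so add back however many "=" characters are missing.
    return b64_text + "=" * (-len(b64_text) % 4)

stripped = "MTQ3MzAxODY3MSU0MHFxLmNvbQ"  # su value with its padding removed
print(restore_padding(stripped))
print(base64.b64decode(restore_padding(stripped)))
```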
Fetching the four parameters
    def getprelogin(self):
        prelogin_url = "https://login.sina.com.cn/sso/prelogin.php?entry=weibo&callback=sinaSSOController.preloginCallBack&su={}&rsakt=mod&checkpin=1&client=ssologin.js(v1.4.19)&_=1507974787556".format(self.getusername())
        pre = re.compile('sinaSSOController.preloginCallBack(.*)')
        request = urllib.request.Request(prelogin_url)
        response = urllib.request.urlopen(request)
        read_data = response.read().decode("utf-8")
        # Strip the surrounding parentheses to leave plain JSON
        data = pre.search(read_data).group(1)[1:-1]
        data_json = json.loads(data)
        servertime = str(data_json['servertime'])
        nonce = data_json['nonce']
        pubkey = data_json['pubkey']
        rsakv = data_json['rsakv']
        return servertime, nonce, pubkey, rsakv
Encrypting the password
As mentioned above, '10001' has to be converted to decimal, which gives 65537.
    def getpassword(self):
        servertime, nonce, pubkey, rsakv = self.getprelogin()
        # Same layout as the JavaScript: servertime \t nonce \n password
        pw_string = str(servertime) + '\t' + str(nonce) + '\n' + str(self.password)
        key = rsa.PublicKey(int(pubkey, 16), 65537)  # hex 10001 == decimal 65537
        pw_encrypt = rsa.encrypt(pw_string.encode('utf-8'), key)
        self.password = ''  # clear the password, to be on the safe side
        passwd = binascii.b2a_hex(pw_encrypt)
        return passwd
POST
    def build_post_data(self):
        servertime, nonce, pubkey, rsakv = self.getprelogin()
        post_data = {
            'entry': 'weibo',
            'gateway': '1',
            'from': '',
            'savestate': '7',
            'qrcode_flag': 'false',
            'useticket': '1',
            "pagerefer": "http://passport.weibo.com/visitor/visitor?entry=miniblog&a=enter&url=http%3A%2F%2Fweibo.com%2F&domain=.weibo.com&ua=php-sso_sdk_client-0.6.14",
            'vsnf': '1',
            'su': self.getusername(),
            'service': 'miniblog',
            'servertime': servertime,
            'nonce': nonce,
            'pwencode': 'rsa2',
            'rsakv': rsakv,
            # getpassword() calls getprelogin() again internally
            'sp': self.getpassword(),
            'sr': '1920 * 1080',
            'ncoding': 'UTF - 8',
            'prelt': '912',
            'url': "http://weibo.com/ajaxlogin.php?framelogin=1&callback=parent.sinaSSOController.feedBackUrlCallBack",
            'returntype': 'META'
        }
        # Form-encode the dict and convert it to bytes for the POST body
        data = urllib.parse.urlencode(post_data).encode('utf-8')
        return data
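For a feel of what urlencode does to these fields, here is a tiny sketch using two of the values from this post. Spaces become +, while * and = are percent-escaped, which is why the redirect URL in the login response contains sr=1920+%2A+1080.

```python
from urllib.parse import urlencode

# Two fields borrowed from the form data, just to show the encoding.
fields = {"sr": "1920 * 1080", "su": "MTQ3MzAxODY3MSU0MHFxLmNvbQ=="}

body = urlencode(fields).encode("utf-8")
print(body)
```

The result is bytes, which is the type urllib.request.Request expects for its data argument.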
Login
    def login(self):
        url = 'https://login.sina.com.cn/sso/login.php?client=ssologin.js(v1.4.19)'
        data = self.build_post_data()
        self.enableCookies()
        headers = {"User-Agent": "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36"}
        requests1 = urllib.request.Request(url, data=data, headers=headers)
        reqopen1 = urllib.request.urlopen(requests1)
        # The page declares charset=GBK
        reqread1 = reqopen1.read().decode("GBK")
        print(reqread1)
At this point, right after logging in, an annoying redirect shows up...
The response:
<html>
<head>
<title>新浪通行证</title>
<meta http-equiv="refresh" content="0; url='https://login.sina.com.cn/crossdomain2.php?action=login&entry=weibo&r=https%3A%2F%2Fpassport.weibo.com%2Fwbsso%2Flogin%3Fssosavestate%3D1539793002%26url%3Dhttp%253A%252F%252Fweibo.com%252Fajaxlogin.php%253Fframelogin%253D1%2526callback%253Dparent.sinaSSOController.feedBackUrlCallBack%26ticket%3DST-NTUxMzA4ODA3Mw%3D%3D-1508257002-gz-755E02FA479243D1D971FC7697414B96-1%26retcode%3D0&sr=1920+%2A+1080'"/>
<meta http-equiv="Content-Type" content="text/html; charset=GBK" />
</head>
<body bgcolor="#ffffff" text="#000000" link="#0000cc" vlink="#551a8b" alink="#ff0000">
<script type="text/javascript" language="javascript">
location.replace("https://login.sina.com.cn/crossdomain2.php?action=login&entry=weibo&r=https%3A%2F%2Fpassport.weibo.com%2Fwbsso%2Flogin%3Fssosavestate%3D1539793002%26url%3Dhttp%253A%252F%252Fweibo.com%252Fajaxlogin.php%253Fframelogin%253D1%2526callback%253Dparent.sinaSSOController.feedBackUrlCallBack%26ticket%3DST-NTUxMzA4ODA3Mw%3D%3D-1508257002-gz-755E02FA479243D1D971FC7697414B96-1%26retcode%3D0&sr=1920+%2A+1080");
</script>
</body>
</html>
The redirect URL sits right after location.replace.
You can pull it out with a regex, with bs4, or with whatever method you think works better.
Here is the updated version; I use bs4:
    def login(self):
        url = 'https://login.sina.com.cn/sso/login.php?client=ssologin.js(v1.4.19)'
        data = self.build_post_data()
        self.enableCookies()
        headers = {"User-Agent": "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36"}
        requests1 = urllib.request.Request(url, data=data, headers=headers)
        reqopen1 = urllib.request.urlopen(requests1)
        reqread1 = reqopen1.read().decode("GBK")
        bs = BeautifulSoup(reqread1, 'lxml')
        # The first <script> holds location.replace("...")
        bfind = bs.find('script')
        for i in bfind:
            # The URL is the text between the first pair of double quotes
            p = i.strip().split('"')[1]
        requests2 = urllib.request.Request(p)
        reqopen2 = urllib.request.urlopen(requests2)
        reqread2 = reqopen2.read()
        print(reqread2)
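For comparison, the regex route mentioned above could look roughly like this. It is only a sketch: the HTML is a shortened stand-in for the real response, and find_redirect is a hypothetical helper name.

```python
import re

# Shortened stand-in for the HTML returned by login.php.
html = '''<script type="text/javascript" language="javascript">
location.replace("https://login.sina.com.cn/crossdomain2.php?action=login&entry=weibo");
</script>'''

def find_redirect(page):
    # Accept either quote style around the URL; the backreference \1
    # requires the closing quote to match the opening one.
    match = re.search(r'''location\.replace\((['"])(.*?)\1\)''', page)
    return match.group(2) if match else None

print(find_redirect(html))
```

Returning None when nothing matches lets the caller treat a missing redirect as a failed login instead of hitting an IndexError.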
And then...
b'<html>\n<head>\n<meta http-equiv="Content-Type" content="text/html; charset=GBK" />\n<title>\xd0\xc2\xc0\xcb\xcd\xa8\xd0\xd0\xd6\xa4</title>\n\n\n<script charset="utf-8" src="https://i.sso.sina.com.cn/js/ssologin.js"></script>\n</head>\n<body>\n\xd5\xfd\xd4\xda\xb5\xc7\xc2\xbc ...\n<script>\ntry{sinaSSOController.setCrossDomainUrlList({"retcode":0,"arrURL":["https:\\/\\/passport.weibo.com\\/wbsso\\/login?ticket=ST-NTUxMzA4ODA3Mw%3D%3D-1508257418-gz-1D704A04699B6A20EFE6F4123E677198-1&ssosavestate=1539793417","https:\\/\\/passport.97973.com\\/sso\\/crossdomain?action=login&savestate=1539793417","https:\\/\\/passport.weibo.cn\\/sso\\/crossdomain?action=login&savestate=1"]});}\n\t\tcatch(e){\n\t\t\tvar msg = e.message;\n\t\t\tvar img = new Image();\n\t\t\tvar type = 1;\n\t\t\timg.src = \'https://login.sina.com.cn/sso/debuglog?msg=\' + msg +\'&type=\' + type;\n\t\t}try{sinaSSOController.crossDomainAction(\'login\',function(){location.replace(\'https://passport.weibo.com/wbsso/login?ssosavestate=1539793417&url=http%3A%2F%2Fweibo.com%2Fajaxlogin.php%3Fframelogin%3D1%26callback%3Dparent.sinaSSOController.feedBackUrlCallBack&ticket=ST-NTUxMzA4ODA3Mw==-1508257417-gz-64BE9EFA64E07479B7DB6F8882CA661C-1&retcode=0\');});}\n\t\tcatch(e){\n\t\t\tvar msg = e.message;\n\t\t\tvar img = new Image();\n\t\t\tvar type = 2;\n\t\t\timg.src = \'https://login.sina.com.cn/sso/debuglog?msg=\' + msg +\'&type=\' + type;\n\t\t}\n</script>\n</body>\n</html>'
Yet another redirect...
All right, bs4 is awkward for this one, so let's use a regex.
    def login(self):
        url = 'https://login.sina.com.cn/sso/login.php?client=ssologin.js(v1.4.19)'
        data = self.build_post_data()
        self.enableCookies()
        headers = {"User-Agent": "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36"}
        requests1 = urllib.request.Request(url, data=data, headers=headers)
        reqopen1 = urllib.request.urlopen(requests1)
        reqread1 = reqopen1.read().decode("GBK")
        bs = BeautifulSoup(reqread1, 'lxml')
        bfind = bs.find('script')
        for i in bfind:
            p = i.strip().split('"')[1]
        requests2 = urllib.request.Request(p)
        reqopen2 = urllib.request.urlopen(requests2)
        reqread2 = reqopen2.read()
        bss = BeautifulSoup(reqread2, 'lxml')
        # The second <script> holds the next location.replace('...')
        bff = bss.find_all('script')[1]
        try:
            p2 = re.compile(r'location.replace(.*?);}')
            for i in bff:
                # Trim the leading (' and the trailing ')
                get_p2 = p2.findall(i)[0][2:-2]
            requests3 = urllib.request.Request(get_p2)
            reqopen3 = urllib.request.urlopen(requests3)
            reqread3 = reqopen3.read().decode('utf-8')
            print(reqread3)
        except IndexError:
            print("Login Error!")
Result:
/usr/bin/python3.5 /home/crazyrookie/Documents/Python/reptile/weibo_login.py
<html><head><script language='javascript'>parent.sinaSSOController.feedBackUrlCallBack({"result":true,"userinfo":{"uniqueid":"5513088073","userid":null,"displayname":null,"userdomain":"?wvr=5&lf=reg"}});</script></head><body></body></html>
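Yet another redirect...
But notice the ?wvr=5&lf=reg field in there. Look back at the packets captured during the manual login, at the spot I asked you to note in passing: it is part of the home page link.
Write one more regex, use ?wvr=5&lf=reg to assemble a URL, and the simulated login goes through without any fuss.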
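As a sketch of that last step, the callback argument can also be parsed as JSON instead of matching userdomain alone. The body below is copied from the output above; the http://weibo.com/ prefix is the one used in the full code.

```python
import json
import re

# Response body copied from the result shown above.
body = ("<html><head><script language='javascript'>parent.sinaSSOController."
        'feedBackUrlCallBack({"result":true,"userinfo":{"uniqueid":"5513088073",'
        '"userid":null,"displayname":null,"userdomain":"?wvr=5&lf=reg"}});'
        "</script></head><body></body></html>")

# Capture the JSON object handed to feedBackUrlCallBack(...).
match = re.search(r'feedBackUrlCallBack\((\{.*?\})\);', body)
info = json.loads(match.group(1))

home_url = "http://weibo.com/" + info["userinfo"]["userdomain"]
print(info["result"], home_url)
```

Parsing the whole object also gives access to result and uniqueid, which can serve as a quick success check.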
Full code
import urllib
import urllib.request
import urllib.parse
import base64
import rsa
import json
import http.cookiejar
import binascii
import re
from bs4 import BeautifulSoup


class WeiboLogin():

    def __init__(self, username, password):
        self.username = username
        self.password = password

    def enableCookies(self):
        # Create a cookie container
        cookie_container = http.cookiejar.CookieJar()
        # Bind the cookie container to an HTTP cookie processor
        cookie_support = urllib.request.HTTPCookieProcessor(cookie_container)
        # Build an opener whose handlers take care of opening HTTP URLs
        opener = urllib.request.build_opener(cookie_support, urllib.request.HTTPHandler)
        # Install the opener; every later urlopen() call goes through it
        urllib.request.install_opener(opener)

    # Encode the username
    def getusername(self):
        username_req_qo = urllib.parse.quote(self.username)
        username_bsencode = base64.b64encode(bytes(username_req_qo, encoding='utf-8'))
        return username_bsencode.decode("utf-8").split("=")[0]

    # Get servertime, nonce, pubkey, rsakv
    # After typing the username and password (without logging in), a prelogin request appears
    def getprelogin(self):
        prelogin_url = "https://login.sina.com.cn/sso/prelogin.php?entry=weibo&callback=sinaSSOController.preloginCallBack&su={}&rsakt=mod&checkpin=1&client=ssologin.js(v1.4.19)&_=1507974787556".format(self.getusername())
        pre = re.compile('sinaSSOController.preloginCallBack(.*)')
        request = urllib.request.Request(prelogin_url)
        response = urllib.request.urlopen(request)
        read_data = response.read().decode("utf-8")
        data = pre.search(read_data).group(1)[1:-1]
        data_json = json.loads(data)
        servertime = str(data_json['servertime'])
        nonce = data_json['nonce']
        pubkey = data_json['pubkey']
        rsakv = data_json['rsakv']
        return servertime, nonce, pubkey, rsakv

    # Encrypt the password
    def getpassword(self):
        servertime, nonce, pubkey, rsakv = self.getprelogin()
        pw_string = str(servertime) + '\t' + str(nonce) + '\n' + str(self.password)
        key = rsa.PublicKey(int(pubkey, 16), 65537)  # hex 10001 == decimal 65537
        pw_encrypt = rsa.encrypt(pw_string.encode('utf-8'), key)
        self.password = ''  # clear the password, to be on the safe side
        passwd = binascii.b2a_hex(pw_encrypt)
        return passwd

    # POST parameters
    def build_post_data(self):
        servertime, nonce, pubkey, rsakv = self.getprelogin()
        post_data = {
            'entry': 'weibo',
            'gateway': '1',
            'from': '',
            'savestate': '7',
            'qrcode_flag': 'false',
            'useticket': '1',
            "pagerefer": "http://passport.weibo.com/visitor/visitor?entry=miniblog&a=enter&url=http%3A%2F%2Fweibo.com%2F&domain=.weibo.com&ua=php-sso_sdk_client-0.6.14",
            'vsnf': '1',
            'su': self.getusername(),
            'service': 'miniblog',
            'servertime': servertime,
            'nonce': nonce,
            'pwencode': 'rsa2',
            'rsakv': rsakv,
            'sp': self.getpassword(),
            'sr': '1920 * 1080',
            'ncoding': 'UTF - 8',
            'prelt': '912',
            'url': "http://weibo.com/ajaxlogin.php?framelogin=1&callback=parent.sinaSSOController.feedBackUrlCallBack",
            'returntype': 'META'
        }
        data = urllib.parse.urlencode(post_data).encode('utf-8')
        return data

    def login(self):
        url = 'https://login.sina.com.cn/sso/login.php?client=ssologin.js(v1.4.19)'
        data = self.build_post_data()
        self.enableCookies()
        headers = {"User-Agent": "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36"}
        requests1 = urllib.request.Request(url, data=data, headers=headers)
        reqopen1 = urllib.request.urlopen(requests1)
        reqread1 = reqopen1.read().decode("GBK")

        # First redirect: URL inside location.replace("...")
        bs = BeautifulSoup(reqread1, 'lxml')
        bfind = bs.find('script')
        for i in bfind:
            p = i.strip().split('"')[1]

        requests2 = urllib.request.Request(p)
        reqopen2 = urllib.request.urlopen(requests2)
        reqread2 = reqopen2.read()

        # Second redirect: URL inside location.replace('...')
        bss = BeautifulSoup(reqread2, 'lxml')
        bff = bss.find_all('script')[1]
        try:
            p2 = re.compile(r'location.replace(.*?);}')
            p3 = re.compile(r'"userdomain":"(.*?)"')
            for i in bff:
                get_p2 = p2.findall(i)[0][2:-2]

            requests3 = urllib.request.Request(get_p2)
            reqopen3 = urllib.request.urlopen(requests3)
            reqread3 = reqopen3.read().decode('utf-8')

            # Build the home page URL from userdomain
            userdomain = p3.findall(reqread3)
            login_url = 'http://weibo.com/' + userdomain[0]

            requests4 = urllib.request.Request(login_url)
            reqopen4 = urllib.request.urlopen(requests4)
            reqread4 = reqopen4.read().decode('utf-8')

            # Read the nickname and uid from the page's CONFIG script
            bs_data = BeautifulSoup(reqread4, 'lxml')
            bfind_nick_uid = bs_data.find_all('script')
            nick_re = re.compile(r"CONFIG\['nick'\]='.*?';")
            uin_re = re.compile(r"CONFIG\['uid'\]='.*?';")
            for i in bfind_nick_uid[2]:
                print("Login success!")
                print("Username:", nick_re.search(i.strip()).group().split('=')[1][1:-2], "Uin:", uin_re.search(i).group().split('=')[1][1:-2])
        except IndexError:
            print("Login Error!")


if __name__ == '__main__':
    w = WeiboLogin('', '')
    w.login()
The result:
Login success!
Username: 呆呆的设计师 Uin: 5513088073
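The split-and-slice step that extracts the nickname and uid can also be written with a capture group. This is just a sketch: the script text below is a hypothetical sample shaped after the patterns in the full code (the real page is not reproduced in this post), and read_config is a name I made up.

```python
import re

# Hypothetical script text, shaped after the CONFIG patterns used above.
script = "$CONFIG['uid']='5513088073'; $CONFIG['nick']='呆呆的设计师';"

def read_config(text, key):
    # Capture the value between the quotes of CONFIG['<key>']='<value>';
    pattern = r"CONFIG\['" + re.escape(key) + r"'\]='(.*?)';"
    match = re.search(pattern, text)
    return match.group(1) if match else None

print(read_config(script, "nick"), read_config(script, "uid"))
```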
That's all. If something looks wrong or is unclear, leave a note in the comments below.
Thanks~