记一次投票爬虫开发

2019/09/19

前言

从 2010 年69圣战第一次接触饭圈文化起(彼时还是个犯中二病的小学生),我时常感叹粉丝对偶像的热爱实在是强烈得不可思议、疯狂到不可理喻。上初中后因为某些复杂的原因,我开始加入某些追星群体(偶有真心多为假意),有了更近距离观察他们的机会,也涨了些见识,多年来感触颇多。扯远了,以后该找时间写一写。

这几天打榜组群里又在喊人投票。有老哥用脚本注册了大量账号,并表示即将攻克自动投票难关,需要大家人工投几天。尽管我对这种事情稍微有些反感,但还是愿意趁着吃外卖的时间动手帮帮忙的。在我越过某堵담,打开投票页面后,不知为何我的左手在未经大脑指挥的情况下擅自给 F12 键施加了 5N 左右的压力——或许手贱打开开发者工具也是一种职业病吧。然后,我看着自动进入了无尽 debugger 的调试器(又是该死的反调试),脑中突然闪过一道灵光:自动投票我也会做啊!

想到就开干,今天咱无主之地3也不玩了,昆特牌也不打了,就来写写爬虫。

页面逻辑分析

抓包

既然投票页面做了反调试,那就先分析报文吧。

打开 Fiddler,然后访问投票页面,按正常流程登录、投票、注销。

抓包结果

抓包结果

除去获取静态资源的请求后,剩下的和登录、投票相关的报文只有几个,注销没有专门发送请求。

抓包结果

可能有用的请求

关键报文分析

筛选出最有用的报文之后,先看第一组报文。

请求(这里 email 打个码):

POST https://www.googleapis.com/identitytoolkit/v3/relyingparty/verifyPassword?key=AIzaSyAlFqKJC-D1xFUsZ6PE9tzsRcjNftrxXmY HTTP/1.1
Host: www.googleapis.com
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:69.0) Gecko/20100101 Firefox/69.0
Accept: */*
Accept-Language: zh-CN,zh;q=0.8,zh-TW;q=0.7,zh-HK;q=0.5,en-US;q=0.3,en;q=0.2
Accept-Encoding: gzip, deflate, br
Content-Type: application/json
X-Client-Version: Firefox/JsCore/6.5.0/FirebaseCore-web
Content-Length: 81
Origin: https://vote.aaavietnam2019.com.vn
Connection: keep-alive
Referer: https://vote.aaavietnam2019.com.vn/

{"email":"xxxxx@ooooo.com","password":"12345678","returnSecureToken":true}

响应载荷(idToken 打个码):

  "kind": "identitytoolkit#VerifyPasswordResponse",
  "localId": "q4m1qM1PHWOQAUd0l9qwmDEQdg62",
  "email": "xxxxx@ooooo.com",
  "displayName": "xxxxx",
  "idToken": "eyJhbGciOiJSUzI1NiIsImtpZCI6ImVlMjc0MWQ0MWY5ZDQzZmFiMWU2MjhhODVlZmI0MmE4OGVmMzIyOWYiLCJ0eXAiOiJKV1QifQ.eyJuYW1lIjoiZ2dnZnJpZW5kNDYyMCIsImlzcyI6Imh0dHBzOi8vc2VjdXJldG9rZW4uZ29vZ2xlLmNvbS91dG9wLWFhYTIwMTkiLCJhdWQiOiJ1dG9wLWFhYTIwMTkiLCJhdXRoX3RpbWUiOjE1Njg5Njc2MjIsInVzZXJfaWQiOiJxNG0xcU0xUEhXT1FBVWQwbDlxd21ERVFkZzYyIiwic3ViIjoicTRtMXFNMVBIV09RQVVkMGw5cXdtREVRZGc2MiIsImlhdCI6MTU2ODk2NzYyMiwiZXhwIjoxNTY4OTcxMjIyLCJlbWFpbCI6ImdnZ2ZyaWVuZDQ2MjBAYmNjdG8ubWUiLCJlbWFpbF92ZXJpZmllZCI6dHJ1ZSwiZmlyZWJhc2UiOnsiaWRlbnRpdGllcyI6eyJlbWFpbCI6WyJnZ2dmcmllbmQ0NjIwQGJjY3RvLm1lIl19LCJzaWduX2luX3Byb3ZpZGVyIjoicGFzc3dvcmQifX0.JJ_t7QMGp4aptqy3DcvpJYoibcoSZHMTiPUAOi25q1w9GwowJeWRiNALluLP0O2XZ9-NHLBznQci4dJ1_rNGE6iGSxxhyCWHsqJQFd6N5-iuZk5_mdQaa98vebsGFNJm3iFFc0YMXP1zISvVMKR5nLHFNkSNROTuq9hgAct66oG1GNNWBqsjotiEVWM8X4i6aOd49ZRKdFergBg4TqQEOD-********************************************-**************************************************************-w358qYb4YC8BQQJQRLw",
  "registered": true,
  "refreshToken": "AEu4IL1rLBlb2Kt82gnR0mVXzsSZW7mCCQAHQnCoq0aqOwp-OYUObxw0x4MhXfkGcQWmLJFd8oL8zQLHaV929ZmqWaQj4GFPA_KpeDjhU_mZ8m55jcxzH1kUg2VTwPGXaZFDNHQ1Xm9-vJ9WzjM1-8-XhY9JcVR65eFVwmtqWiTpT_ANf-FyvXWjL_T8oCLVMbKJsr5yxWqeeDAI1jVapd9C58ZfFO53zg6UsYosdh0BJgdJFLppSzo",
  "expiresIn": "3600"
}

这个请求是把登录邮箱和密码发到了 Google 的一个 API,获取对应的 idToken。请求的 URL 中带上的 key 暂时还看不出来是什么,不过既然使用了第三方 API,这个 key 应该是应用的身份标识,也就是说应该是死的。

换个帐号再登录,重新抓包,结果一样。证明 key 是写死的。

直接在所有抓到的包中搜索这个 key,可以发现是直接写在 config.js 中的。

app key

这个 key 是写死的

再看第二个请求:

POST https://www.googleapis.com/identitytoolkit/v3/relyingparty/getAccountInfo?key=AIzaSyAlFqKJC-D1xFUsZ6PE9tzsRcjNftrxXmY HTTP/1.1
Host: www.googleapis.com
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:69.0) Gecko/20100101 Firefox/69.0
Accept: */*
Accept-Language: zh-CN,zh;q=0.8,zh-TW;q=0.7,zh-HK;q=0.5,en-US;q=0.3,en;q=0.2
Accept-Encoding: gzip, deflate, br
Content-Type: application/json
X-Client-Version: Firefox/JsCore/6.5.0/FirebaseCore-web
Content-Length: 967
Origin: https://vote.aaavietnam2019.com.vn
Connection: keep-alive
Referer: https://vote.aaavietnam2019.com.vn/

{"idToken":"eyJhbGciOiJSUzI1NiIsImtpZCI6ImVlMjc0MWQ0MWY5ZDQzZmFiMWU2MjhhODVlZmI0MmE4OGVmMzIyOWYiLCJ0eXAiOiJKV1QifQ.eyJuYW1lIjoiZ2dnZnJpZW5kNDYyMCIsImlzcyI6Imh0dHBzOi8vc2VjdXJldG9rZW4uZ29vZ2xlLmNvbS91dG9wLWFhYTIwMTkiLCJhdWQiOiJ1dG9wLWFhYTIwMTkiLCJhdXRoX3RpbWUiOjE1Njg5Njc2MjIsInVzZXJfaWQiOiJxNG0xcU0xUEhXT1FBVWQwbDlxd21ERVFkZzYyIiwic3ViIjoicTRtMXFNMVBIV09RQVVkMGw5cXdtREVRZGc2MiIsImlhdCI6MTU2ODk2NzYyMiwiZXhwIjoxNTY4OTcxMjIyLCJlbWFpbCI6ImdnZ2ZyaWVuZDQ2MjBAYmNjdG8ubWUiLCJlbWFpbF92ZXJpZmllZCI6dHJ1ZSwiZmlyZWJhc2UiOnsiaWRlbnRpdGllcyI6eyJlbWFpbCI6WyJnZ2dmcmllbmQ0NjIwQGJjY3RvLm1lIl19LCJzaWduX2luX3Byb3ZpZGVyIjoicGFzc3dvcmQifX0.JJ_t7QMGp4aptqy3DcvpJYoibcoSZHMTiPUAOi25q1w9GwowJeWRiNALluLP0O2XZ9-NHLBznQci4dJ1_rNGE6iGSxxhyCWHsqJQFd6N5-iuZk5_mdQaa98vebsGFNJm3iFFc0YMXP1zISvVMKR5nLHFNkSNROTuq9hgAct66oG1GNNWBqsjotiEVWM8X4i6aOd49ZRKdFergBg4TqQEOD-********************************************-**************************************************************-w358qYb4YC8BQQJQRLw"}

响应载荷:

{
  "kind": "identitytoolkit#GetAccountInfoResponse",
  "users": [
    {
      "localId": "q4m1qM1PHWOQAUd0l9qwmDEQdg62",
      "email": "xxxxx@ooooo.com",
      "displayName": "xxxxx",
      "passwordHash": "UkVEQUNURUQ=",
      "emailVerified": true,
      "passwordUpdatedAt": 1568765118192,
      "providerUserInfo": [
        {
          "providerId": "password",
          "displayName": "xxxxx",
          "federatedId": "xxxxx@ooooo.com",
          "email": "xxxxx@ooooo.com",
          "rawId": "xxxxx@ooooo.com"
        }
      ],
      "validSince": "1568765118",
      "lastLoginAt": "1568967622691",
      "createdAt": "1568765118192",
      "lastRefreshAt": "2019-09-20T08:20:22.691Z"
    }
  ]
}

这个请求是获取用户信息的,应该是用于在页面上显示信息,对投票爬虫没有用处。

第三个请求是 OPTIONS 请求,爬虫不用发。

第四个请求:

POST https://vote-api.aaavietnam2019.com.vn/User/GenSession HTTP/1.1
Host: vote-api.aaavietnam2019.com.vn
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:69.0) Gecko/20100101 Firefox/69.0
Accept: */*
Accept-Language: zh-CN,zh;q=0.8,zh-TW;q=0.7,zh-HK;q=0.5,en-US;q=0.3,en;q=0.2
Accept-Encoding: gzip, deflate, br
Referer: https://vote.aaavietnam2019.com.vn/
Content-Type: application/json
ProGuard: proguard 8C9458547FB8B7B73B4A57F2A70DD858B7185D7E8EB7227AF8F1A9BA3CFE0843pro1568967624pro3dd82d27c8f34b0493c9122a4c6db50cproed010e3b0bb65efbe1db34e98c97192cdbfa1d7d4c078191789854c63d5c2860
Origin: https://vote.aaavietnam2019.com.vn
Content-Length: 965
Connection: keep-alive

{"Token":"eyJhbGciOiJSUzI1NiIsImtpZCI6ImVlMjc0MWQ0MWY5ZDQzZmFiMWU2MjhhODVlZmI0MmE4OGVmMzIyOWYiLCJ0eXAiOiJKV1QifQ.eyJuYW1lIjoiZ2dnZnJpZW5kNDYyMCIsImlzcyI6Imh0dHBzOi8vc2VjdXJldG9rZW4uZ29vZ2xlLmNvbS91dG9wLWFhYTIwMTkiLCJhdWQiOiJ1dG9wLWFhYTIwMTkiLCJhdXRoX3RpbWUiOjE1Njg5Njc2MjIsInVzZXJfaWQiOiJxNG0xcU0xUEhXT1FBVWQwbDlxd21ERVFkZzYyIiwic3ViIjoicTRtMXFNMVBIV09RQVVkMGw5cXdtREVRZGc2MiIsImlhdCI6MTU2ODk2NzYyMiwiZXhwIjoxNTY4OTcxMjIyLCJlbWFpbCI6ImdnZ2ZyaWVuZDQ2MjBAYmNjdG8ubWUiLCJlbWFpbF92ZXJpZmllZCI6dHJ1ZSwiZmlyZWJhc2UiOnsiaWRlbnRpdGllcyI6eyJlbWFpbCI6WyJnZ2dmcmllbmQ0NjIwQGJjY3RvLm1lIl19LCJzaWduX2luX3Byb3ZpZGVyIjoicGFzc3dvcmQifX0.JJ_t7QMGp4aptqy3DcvpJYoibcoSZHMTiPUAOi25q1w9GwowJeWRiNALluLP0O2XZ9-NHLBznQci4dJ1_rNGE6iGSxxhyCWHsqJQFd6N5-iuZk5_mdQaa98vebsGFNJm3iFFc0YMXP1zISvVMKR5nLHFNkSNROTuq9hgAct66oG1GNNWBqsjotiEVWM8X4i6aOd49ZRKdFergBg4TqQEOD-********************************************-**************************************************************-w358qYb4YC8BQQJQRLw"}

响应载荷:

{"voteSession":"eyJhbGciOiJodHRwOi8vd3d3LnczLm9yZy8yMDAxLzA0L3htbGRzaWctbW9yZSNobWFjLXNoYTI1NiIsInR5cCI6IkpXVCJ9.eyJ1c2VySWQiOiJxNG0xcU0xUEhXT1FBVWQwbDlxd21ERVFkZzYyIiwiZXhwIjoxNTc0MjM4MDI1fQ.cbOwvJALYqzko9Ax8xQAgf7ADEfZeNx03ED-9rxsvNc"}

第四个请求是使用通过 API 获取到的 idToken 来获取投票用的 voteSession ,其实也就是投票网站自己的 token。请求头中带有 ProGuard 字段,应该是签名认证,需要看 JS 才知道如何生成。

第五个请求也是 OPTIONS 请求,爬虫不用发。

第六个请求(ridCode 打个码):

POST https://vote-api.aaavietnam2019.com.vn/Vote/Vote?v=1568967654757 HTTP/1.1
Host: vote-api.aaavietnam2019.com.vn
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:69.0) Gecko/20100101 Firefox/69.0
Accept: */*
Accept-Language: zh-CN,zh;q=0.8,zh-TW;q=0.7,zh-HK;q=0.5,en-US;q=0.3,en;q=0.2
Accept-Encoding: gzip, deflate, br
Referer: https://vote.aaavietnam2019.com.vn/
Content-Type: application/json
ProGuard: proguard 8C9458547FB8B7B73B4A57F2A70DD858B7185D7E8EB7227AF8F1A9BA3CFE0843pro1568967655pro267ce6d8c9a046b5a074978ed7d0a579proe792e65bf3fbd3d97efbf671fa4ff3b3e9649b09f5122324a2f892980032f91a
Authorization: Bearer eyJhbGciOiJodHRwOi8vd3d3LnczLm9yZy8yMDAxLzA0L3htbGRzaWctbW9yZSNobWFjLXNoYTI1NiIsInR5cCI6IkpXVCJ9.eyJ1c2VySWQiOiJxNG0xcU0xUEhXT1FBVWQwbDlxd21ERVFkZzYyIiwiZXhwIjoxNTc0MjM4MDI1fQ.cbOwvJALYqzko9Ax8xQAgf7ADEfZeNx03ED-9rxsvNc
language: en
Origin: https://vote.aaavietnam2019.com.vn
Content-Length: 33
Connection: keep-alive

{"ridCode":"1700xxx","round":"2"}

响应载荷:

{"success":true,"message":"Voted successfully."}

这个请求就是投票了,请求载荷中 ridCode 是艺人编号, round 表示是第二轮投票,对我的爬虫来说都是可以写死的。 请求头中的 Authorization 字段是通过前面的POST /User/GenSession获取的,所以重点还是在于 ProGuard 的生成。

经过多次抓包对比后可以发现,ProGuard 由这几部分组成:

// 固定的
proguard 8C9458547FB8B7B73B4A57F2A70DD858B7185D7E8EB7227AF8F1A9BA3CFE0843
// 分隔符
pro
// 东八区的秒时间戳,也就是说直接用了本地时间
1568967655
// 分隔符
pro
// 32个字符的字符串,猜测是MD5
267ce6d8c9a046b5a074978ed7d0a579
// 分隔符
pro
// 64位的字符串,不知道是啥
e792e65bf3fbd3d97efbf671fa4ff3b3e9649b09f5122324a2f892980032f91a

找出关键 JS 文件

要知道 ProGuard 的加密方式,就必须通过 JS 阅读与调试。所以必须要知道是哪些 JS 文件和加密、反调试有关。

投票页面一共引入了 20 个 JS 文件,其中类似 JQuery 和 angular 这样的显然不是我要找的,外部 host 的应该也不是。在剩下的 JS 中只有 dshdjsdjsh.min.js 和 home.js 做了混淆,此地无银三百两。

js

20 个 JS 文件

另外,按下 F12 后自动进入了无穷无尽的 debugger,看一眼调用堆栈,debugger 是通过 dshdjsdjsh.min.js 中的匿名函数进入的。

js

通过调用堆栈找到了反调试代码所在文件

把 dshdjsdjsh.min.js 格式化之后,可以看到很多密码学相关的东西(惭愧,只认识其中一部分),以及'debu''gger'这样的字符串。这个文件很可疑。

为了方便测试,我把这个 js 文件下载到本地,然后在本地写了一个 a.html,格式化后引入这个 js。

<script type="text/javascript" src="dshdjsdjsh.min.js"></script>

打开后,控制台报错:

js

a.html 报错

在 a.html 中把 CryptoJs 也引入,解决。

<script type="text/javascript" src="crypto-js.js"></script>
<script type="text/javascript" src="dshdjsdjsh.min.js"></script>

此时在 a.html 的页面中打开浏览器的开发者工具,会陷入无尽的 debug。种种线索表明,这个文件就是需要破解的关键 JS文件。

初步翻译与反调试突破

把 dshdjsdjsh.min.js 格式化之后,可以看到第一行就是各种字符串,显然是用于函数调用和赋值、比较之类操作的,要读懂这个混淆过的文件,这就是最靠谱的突破口。

fucking1

dshdjsdjsh.min.js 开头的十几行

先给_0x1957起个名字,就叫它fuckingStrs吧,然后全局替换,开头这十几行就变成了这样:

var fuckingStrs = ['Hex', 'join', 'substr', 'Latin1', 'fromCharCode', 'charCodeAt', 'Utf8', 'Malformed\x20UTF-8\x20data', 'parse', 'BufferedBlockAlgorithm', '_data', '_nDataBytes', 'string', 'concat', 'blockSize', 'max', '_minBufferSize', 'min', '_doProcessBlock', 'splice', 'cfg', 'reset', '_doReset', '_append', '_process', '_doFinalize', 'HMAC', 'algo', 'Base64', '_map', 'charAt', 'indexOf', 'create', 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/=', 'abs', 'sin', 'MD5', '_hash', 'floor', '_createHelper', 'HmacMD5', 'EvpKDF', 'hasher', 'keySize', 'iterations', 'finalize', 'Cipher', '_ENC_XFORM_MODE', '_DEC_XFORM_MODE', '_xformMode', 'decrypt', 'StreamCipher', 'mode', '_iv', '_prevBlock', 'BlockCipherMode', 'Encryptor', 'Decryptor', '_cipher', 'encryptBlock', 'slice', 'decryptBlock', 'CBC', 'pad', 'BlockCipher', 'createEncryptor', '_mode', 'unpad', 'CipherParams', 'formatter', 'OpenSSL', 'salt', 'SerializableCipher', 'padding', '_parse', 'createDecryptor', 'compute', 'PasswordBasedCipher', 'kdf', 'execute', 'encrypt', 'key', 'format', 'ivSize', '_hasher', '_oKey', '_iKey', 'update', 'AES', '_key', '_nRounds', '_invKeySchedule', '_doCryptBlock', '_keySchedule', 'Hasher', 'sqrt', 'pow', 'SHA256', 'HmacSHA256', '_createHmacHelper', 'split', 'replaceAll', '9D4763DDF7320050E4CA8G159AB03CEV', '904A346434B5G47BC4E3FE793A553FGV', 'getItem', 'vbjvbdjv', 'vdkjfdjksfhk', 'utc', 'forEach', 'setTime', ';\x20expires=', 'toUTCString', 'cookie', ';\x20path=/', 'querySelector', '.jconfirm', '.jconfirm\x20.jconfirm-content', 'innerHTML', 'classList', 'remove', 'hide', 'round', 'performance', 'now', 'xxxxxxxxxxxx4xxxyxxxxxxxxxxxxxxx', 'replace', 'https:', 'toLowerCase', 'POST', '8C9458547FB8B7B73B4A57F2A70DD858B7185D7E8EB7227AF8F1A9BA3CFE0843', 'GET', 'A1C5315BBBA8AB95DA3A20B36B1ED1B681C0A85E09BC4114F7D37A5F54CDB6C5', 'proguard\x208C9458547FB8B7B73B4A57F2A70DD858B7185D7E8EB7227AF8F1A9BA3CFE0843pro', 'pro', 'getCookie', 'lang', 'html', 'attr', '/Vote/', 'headers', 'Authorization', 'Bearer\x20', 'SIGNATURE', 'language', 'body', 'CreateOrder', 'status', ';path=/', 'json', 'then', 'StatusCode', 'wrongpassword', 'Message', 'popupConfirmShow', 'log', 'constructor', 'while\x20(true)\x20{}', 'counter', 'debu', 'gger', 'action', 'stateObject', 'apply', 'function\x20*\x5c(\x20*\x5c)', '\x5c+\x5c+\x20*(?:_0x(?:[a-f0-9]){4,6}|(?:\x5cb|\x5cd)[a-z0-9]{1,4}(?:\x5cb|\x5cd))', 'test', 'lib', 'Base', 'mixIn', 'hasOwnProperty', 'init', '$super', 'toString', 'prototype', 'extend', 'WordArray', 'words', 'sigBytes', 'length', 'stringify', 'clamp', 'push', 'ceil', 'clone', 'call', 'random', 'enc'];

(function(_0x7639ed, _0x560f69) {
    var _0x2347d3 = function(_0x26cfe9) {
        while (--_0x26cfe9) {
            _0x7639ed['push'](_0x7639ed['shift']());
        }
    };
    _0x2347d3(++_0x560f69);
}(fuckingStrs, 0x15d));

var _0x3575 = function(_0x9509aa, _0xbe4c88) {
    _0x9509aa = _0x9509aa - 0x0;
    var _0x2cb768 = fuckingStrs[_0x9509aa];
    return _0x2cb768;
};

不难看出,中间这个一经定义就马上调用的匿名函数实际上是在给fuckingStrs调整顺序,而后面的_0x3575函数则是取fuckingStrs中的元素。_0x3575这个函数在整个文件中会频繁出现,把它全局替换改名为getFuckingStr

注意到fuckingStrs中有'debu''gger',直接通过 a.html 页面的控制台查看调换顺序后的fuckingStrs。发现它们的下标分别是 183 和 184。

findDebuggerString

本地调试查看调整顺序后的 fuckingStrs

现在在文件中查找 getFuckingStr('0xb7') 就可以看到是在哪个位置搞出了 debugger。对于这个文件,也可以直接通过调用堆栈查看它在哪里进入 debugger(有时这招行不通)。

找到了这一段:

function _0x3fc48b(_0x16e008) {
    function _0x11b740(_0x5173cb) {
        if (typeof _0x5173cb === 'string') {
            return function(_0x23e925) {}[getFuckingStr('0xb4')](getFuckingStr('0xb5'))['apply'](getFuckingStr('0xb6'));
        } else {
            if (('' + _0x5173cb / _0x5173cb)['length'] !== 0x1 || _0x5173cb % 0x14 === 0x0) {
                (function() {
                    return !![];
                }[getFuckingStr('0xb4')](getFuckingStr('0xb7') + getFuckingStr('0xb8'))['call'](getFuckingStr('0xb9')));
            } else {
                (function() {
                    return ![];
                }[getFuckingStr('0xb4')](getFuckingStr('0xb7') + getFuckingStr('0xb8'))[getFuckingStr('0x0')](getFuckingStr('0xba')));
            }
        }
        _0x11b740(++_0x5173cb);
    }
    try {
        if (_0x16e008) {
            return _0x11b740;
        } else {
            _0x11b740(0x0);
        }
    } catch (_0x95f686) {}
}

现在我们已经知道fuckingStrs调整后的顺序,也可以直接通过控制台获取getFuckingStr(i)的值,所以不难翻译成这样:

function _0x3fc48b(paramA) {
    function innerFunction(paramB) {
        if (typeof paramB === 'string') {
            return function(_0x23e925) {}['constructor']('while (true) {}')['apply'](counter);
        } else {
            if (('' + paramB / paramB)['length'] !== 0x1 || paramB % 0x14 === 0x0) {
                (function() {
                    return !![];
                }['constructor']('debu' + 'gger')['call']('action'));
            } else {
                (function() {
                    return ![];
                }['constructor']('debu' + 'gger')['apply']('stateObject'));
            }
        }
        innerFunction(++paramB);
    }
    try {
        if (paramA) {
            return innerFunction;
        } else {
            innerFunction(0x0);
        }
    } catch (_0x95f686) {}
}

好了,反调试相关代码找到了,把它们删掉,即改成这样:

function _0x3fc48b(paramA) {
    function innerFunction(paramB) {
        if (typeof paramB === 'string') {
            return function(_0x23e925) {}['constructor']('while (true) {}')['apply'](counter);
        }
    }
    try {
        if (paramA) {
            return innerFunction;
        } else {
            innerFunction(0x0);
        }
    } catch (_0x95f686) {}
}

再次打开 a.html,发现不会再自动进入无穷无尽的 debugger。

找到了反调试代码,接下来就是用修改过的 JS 文件替换掉投票网站上的原文件,就可以实现反反调试。使用 Fiddler 的 AutoResponder 功能即可做到。Fiddler 具体用法可以看 这篇文章

重新打开投票页面,再按下 F12,会发现 home.js 中也做了反调试,用同样的方法解决之。

找出 ProGuard 的加密方式

现在可以调试了,但是在哪里打断点呢?通过前面的报文分析发现,ProGuard 具有固定的前缀8C9458547FB8B7B73B4A57F2A70DD858B7185D7E8EB7227AF8F1A9BA3CFE0843,而且不难发现这个固定前缀出现在了fuckingStrs中。

在本地的 dshdjsdjsh.min.js 文件中查找getFucnkingStr('0x9a'),定位到了这一段代码:

'apiRequest': function(_0x16d7a3, _0x235270, _0x321b42) {
    var _0x4f78cd = Math[getFucnkingStr('0x92')](new Date()['getTime']() / 0x3e8),
        _0x428bfa = function() {
            var _0x321b42 = new Date()['getTime']();
            window[getFucnkingStr('0x93')] && 'function' === typeof window[getFucnkingStr('0x93')][getFucnkingStr('0x94')] && (_0x321b42 += performance[getFucnkingStr('0x94')]());
            return getFucnkingStr('0x95')[getFucnkingStr('0x96')](/[xy]/g, function(_0x4f78cd) {
                var _0x35020b = (_0x321b42 + 0x10 * Math['random']()) % 0x10 | 0x0;
                _0x321b42 = Math[getFucnkingStr('0x3f')](_0x321b42 / 0x10);
                return ('x' === _0x4f78cd ? _0x35020b : _0x35020b & 0x3 | 0x8)[getFucnkingStr('0xa')](0x10);
            });
        }(),
        _0x49e6b5 = encodeURIComponent(_0x16d7a3[getFucnkingStr('0x96')](getFucnkingStr('0x97'), '')['replace']('http:', ''))[getFucnkingStr('0x98')]();
    if (getFucnkingStr('0x99') === _0x235270) {
        var _0x44985b = CryptoJS[getFucnkingStr('0x3d')](JSON[getFucnkingStr('0x11')](_0x321b42))[getFucnkingStr('0xa')]();
        _0x44985b = getFucnkingStr('0x9a') + _0x235270 + _0x49e6b5 + _0x4f78cd + _0x428bfa + _0x44985b;
    } else getFucnkingStr('0x9b') === _0x235270 && (_0x44985b = getFucnkingStr('0x9a') + _0x235270 + _0x49e6b5 + _0x4f78cd + _0x428bfa);
    _0x49e6b5 = CryptoJS[getFucnkingStr('0x34')][getFucnkingStr('0x33')][getFucnkingStr('0x39')](CryptoJS[getFucnkingStr('0x34')][getFucnkingStr('0x7a')], getFucnkingStr('0x9c'));
    _0x49e6b5[getFucnkingStr('0x70')](_0x44985b);
    _0x4f78cd = getFucnkingStr('0x9d') + _0x4f78cd + 'pro' + _0x428bfa + getFucnkingStr('0x9e') + _0x49e6b5[getFucnkingStr('0x46')]()[getFucnkingStr('0xa')]();
    if (_0x428bfa = iyuresdhfjsnd[getFucnkingStr('0x9f')]('votesessionnew')) iyuresdhfjsnd['SIGNATURE'] = _0x428bfa;
    _0x428bfa = localStorage[getFucnkingStr('0x81')](getFucnkingStr('0xa0')) || $(getFucnkingStr('0xa1'))[getFucnkingStr('0xa2')](getFucnkingStr('0xa0'));
    'vn' === _0x428bfa && (_0x428bfa = 'vi');
    _0x4f78cd = {
        'method': _0x235270,
        'headers': {
            'Content-Type': 'application/json',
            'ProGuard': _0x4f78cd
        }
    }; - 0x1 < _0x16d7a3['indexOf'](getFucnkingStr('0xa3')) && (_0x4f78cd[getFucnkingStr('0xa4')][getFucnkingStr('0xa5')] = getFucnkingStr('0xa6') + iyuresdhfjsnd[getFucnkingStr('0xa7')], _0x4f78cd[getFucnkingStr('0xa4')][getFucnkingStr('0xa8')] = _0x428bfa);
    getFucnkingStr('0x99') === _0x235270 && (_0x4f78cd[getFucnkingStr('0xa9')] = JSON['stringify'](_0x321b42)); - 0x1 < _0x16d7a3[getFucnkingStr('0x38')](getFucnkingStr('0xaa')) && 'ko' === _0x428bfa && (_0x4f78cd[getFucnkingStr('0xa4')][getFucnkingStr('0xa8')] = 'en');
    return fetch(_0x16d7a3, _0x4f78cd)['then'](function(_0x321b42) {
        0x191 === parseInt(_0x321b42[getFucnkingStr('0xab')]) && (document[getFucnkingStr('0x89')]['split'](';')[getFucnkingStr('0x85')](function(_0x321b42) {
            document[getFucnkingStr('0x89')] = _0x321b42['replace'](/^ +/, '')[getFucnkingStr('0x96')](/=.*/, '=;expires=' + new Date()['toUTCString']() + getFucnkingStr('0xac'));
        }), location['reload']());
        return _0x321b42[getFucnkingStr('0xad')]();
    })[getFucnkingStr('0xae')](function(_0x321b42) {
        if (_0x321b42[getFucnkingStr('0xaf')]) getFucnkingStr('0xb0') != _0x321b42[getFucnkingStr('0xb1')][getFucnkingStr('0x98')]() && 'mailexist' != _0x321b42[getFucnkingStr('0xb1')][getFucnkingStr('0x98')]() && iyuresdhfjsnd[getFucnkingStr('0xb2')](_0x321b42[getFucnkingStr('0xb1')]);
        else return _0x321b42;
    })['catch'](function(_0x321b42) {
        console[getFucnkingStr('0xb3')](_0x321b42);
    });
}

打上断点,执行登录和投票操作,配合getFucnkingStr函数就可以把混淆代码翻译成可读代码。这个过程稍微有点长,但并不难,就不详述了。

翻译结果:

// 这里的data参数传的是请求body的json对象
'apiRequest': function(url, method, data) {
    // 秒时间戳
    var second = Math['round'](new Date()['getTime']() / 1000),
        // 实际上是一个随机数,但第13位一定是4,且第17位一定是{8,9,a,b}之一
        randomX4YX = function() {
            var timeStamp = new Date()['getTime']();
            // 这里把token加上了打开浏览器的毫秒数performance['now']()
            window['performance'] && 'function' === typeof window['performance']['now'] && (timeStamp += performance['now']());
            return 'xxxxxxxxxxxx4xxxyxxxxxxxxxxxxxxx'['replace'](/[xy]/g, function(letter) {
                // 16以内随机非负整数
                var random16 = (timeStamp + 16 * Math['random']()) % 16 | 0;
                // 这一步无用
                timeStamp = Math['floor'](timeStamp / 16);
                return ('x' === letter ? random16 : random16 & 3 | 8)['toString'](16);
            });
        }(),
        // url编码后的url
        urlEncoded = encodeURIComponent(url['replace']('https:', '')['replace']('http:', ''))['toLowerCase']();
    if ('POST' === method) {
        var tokenMD5 = CryptoJS['MD5'](JSON['stringify'](data))['toString']();
        tokenMD5 = '8C9458547FB8B7B73B4A57F2A70DD858B7185D7E8EB7227AF8F1A9BA3CFE0843' + method + urlEncoded + second + randomX4YX + tokenMD5;
    } else if('GET' === method) {
        tokenMD5 = '8C9458547FB8B7B73B4A57F2A70DD858B7185D7E8EB7227AF8F1A9BA3CFE0843' + method + urlEncoded + second + randomX4YX;
    }
    // 设置密钥
    hmac = CryptoJS['algo']['HMAC']['create'](CryptoJS['algo']['HmacSHA256'], 'A1C5315BBBA8AB95DA3A20B36B1ED1B681C0A85E09BC4114F7D37A5F54CDB6C5');
    // 设置消息
    hmac['update'](tokenMD5);
    // 拼接proguard 固定值+秒时间戳+随机字符串+HmacSHA256
    pguard = 'proguard 8C9458547FB8B7B73B4A57F2A70DD858B7185D7E8EB7227AF8F1A9BA3CFE0843pro' + second + 'pro' + randomX4YX + 'pro' + hmac['finalize']()['toString']();
    // SIGNATURE设为getCookie('votesessionnew') 通过POST GenSession获取
    if (randomX4YX = iyuresdhfjsnd['getCookie']('votesessionnew')) iyuresdhfjsnd['SIGNATURE'] = randomX4YX;
    // randomX4YX设为语言???
    randomX4YX = localStorage['getItem']('lang') || $('html')['attr']('lang');
    // randomX4YX = 'vi'
    'vn' === randomX4YX && (randomX4YX = 'vi');
    pguard = {
        'method': method,
        'headers': {
            'Content-Type': 'application/json',
            'ProGuard': pguard
        }
    }; 
    // 下面是设置请求头、发送请求、处理响应
    - 0x1 < url['indexOf']('/Vote/') && (pguard['headers']['Authorization'] = 'Bearer ' + iyuresdhfjsnd['SIGNATURE'], pguard['headers']['language'] = randomX4YX);
    'POST' === method && (pguard['body'] = JSON['stringify'](data));
    - 0x1 < url['indexOf']('CreateOrder') && 'ko' === randomX4YX && (pguard['headers']['language'] = 'en');
    return fetch(url, pguard)['then'](function(response) {
        401 === parseInt(response['status']) && (document['cookie']['split'](';')['forEach'](function(param) {
            document['cookie'] = param['replace'](/^ +/, '')['replace'](/=.*/, '=;expires=' + new Date()['toUTCString']() + ';path=/');
        }), location['reload']());
        return param['json']();
    })['then'](function(response) {
        if (response['StatusCode']) 'wrongpassword' != response['Message']['toLowerCase']() && 'mailexist' != response['Message']['toLowerCase']() && iyuresdhfjsnd['popupConfirmShow'](response['Message']);
        else return response;
    })['catch'](function(response) {
        console['log'](response);
    });
}

基本分析完毕,对页面逻辑稍作整理后就可以开始写爬虫了。

主要代码

哈希工具类

package utils;

import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;
import java.nio.charset.StandardCharsets;
import java.security.InvalidKeyException;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/**
 * @author yuan
 * @version 0.1
 * @date 2019/8/23
 */
public final class HashUtils {
    private static MessageDigest SHA1;
    private static MessageDigest MD5;
    private static Mac HmacSHA256;

    static {
        try {
            SHA1 = MessageDigest.getInstance("SHA1");
            MD5 = MessageDigest.getInstance("MD5");
            HmacSHA256 = Mac.getInstance("HmacSHA256");
        } catch (NoSuchAlgorithmException e) {
            e.printStackTrace();
        }
    }

    private static String toUpperHex(byte[] bytes) {
        StringBuilder stringBuilder = new StringBuilder();
        for (byte b : bytes) {
            String hexString = Integer.toHexString(b).replace("ff", "");
            if (hexString.length() < 2) {
                stringBuilder.append("0");
            }
            stringBuilder.append(hexString);
        }
        return stringBuilder.toString().toUpperCase();
    }

    public static String getMD5(String src) {
        return toUpperHex(MD5.digest(src.getBytes()));
    }

    public static String getSHA1(String src) {
        return toUpperHex(SHA1.digest(src.getBytes()));
    }

    public static String getHmacSHA256(String data, String key) throws InvalidKeyException {
        SecretKeySpec secretKey = new SecretKeySpec(key.getBytes(StandardCharsets.UTF_8), "HmacSHA256");
        HmacSHA256.init(secretKey);
        byte[] array = HmacSHA256.doFinal(data.getBytes(StandardCharsets.UTF_8));
        StringBuilder stringBuilder = new StringBuilder();
        for (byte item : array) {
            stringBuilder.append(Integer.toHexString((item & 0xFF) | 0x100), 1, 3);
        }
        return stringBuilder.toString().toUpperCase();
    }
}

爬虫类

package crawler;

import com.alibaba.fastjson.JSON;
import com.alibaba.fastjson.JSONObject;
import lombok.extern.slf4j.Slf4j;
import org.apache.commons.lang.math.RandomUtils;
import org.apache.http.HttpHost;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.client.methods.HttpPost;
import org.apache.http.entity.StringEntity;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;
import org.apache.http.impl.conn.DefaultProxyRoutePlanner;
import org.apache.http.util.EntityUtils;
import utils.HashUtils;
import java.io.IOException;
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.security.InvalidKeyException;
import java.time.LocalDateTime;
import java.time.ZoneOffset;

/**
 * @author yuan
 * @version 0.1
 * @date 2019/9/19
 */
@Slf4j
public class A3Voter {
    private static final String HOME_URL = "https://vote.aaavietnam2019.com.vn";
    private static final String PASSWORD_VERIFY_URL = "https://www.googleapis.com/identitytoolkit/v3/relyingparty/verifyPassword?key=AIzaSyAlFqKJC-D1xFUsZ6PE9tzsRcjNftrxXmY";
    private static final String GET_ACCOUNT_INFO_URL = "https://www.googleapis.com/identitytoolkit/v3/relyingparty/getAccountInfo?key=AIzaSyAlFqKJC-D1xFUsZ6PE9tzsRcjNftrxXmY";
    private static final String GENERATE_SESSION_URL = "https://vote-api.aaavietnam2019.com.vn/User/GenSession";
    private static final String VOTE_URL = "https://vote-api.aaavietnam2019.com.vn/Vote/Vote";

    private static final String RID = "1700***";
    private static final String VOTE_ROUND = "2";

    private static final int PERFORMANCE_BOUND = 3600;
    private static final int THOUSAND = 1000;
    private static final char[] RANDOM_FORMAT = "xxxxxxxxxxxx4xxxyxxxxxxxxxxxxxxx".toCharArray();
    private static final char RANDOM_FORMAT_4 = '4';
    private static final char RANDOM_FORMAT_X = 'x';
    private static final char RANDOM_FORMAT_Y = 'y';
    private static final int RANDOM_RADIX = 16;
    private static final String METHOD_POST = "POST";
    private static final String METHOD_GET = "GET";
    private static final String MD5_PREFIX = "8C9458547FB8B7B73B4A57F2A70DD858B7185D7E8EB7227AF8F1A9BA3CFE0843";
    private static final String HMAC_KEY = "A1C5315BBBA8AB95DA3A20B36B1ED1B681C0A85E09BC4114F7D37A5F54CDB6C5";
    private static final String SIGN_PREFIX = "proguard " + MD5_PREFIX;
    private static final String SIGN_DELIMITER = "pro";

    /** 一个帐号一天可以投两次 */
    private static final int BALLOT_NUM = 2;

    private static int LANTERN_PORT = 60649;
    private static int SSR_PORT = 28868;
    /** 代理 用于实现get over the wall */
    private static HttpHost proxy = new HttpHost("127.0.0.1", SSR_PORT, "HTTP");

    private static DefaultProxyRoutePlanner routePlanner = new DefaultProxyRoutePlanner(proxy);

    private CloseableHttpClient httpClient = HttpClients.custom()
            .setRoutePlanner(routePlanner)
            .build();

    /**
     * 打开投票页面,目的是设置Cookie
     *
     * @throws IOException IOException
     */
    private void openHomePage() throws IOException {
        httpClient.execute(new HttpGet(HOME_URL));
    }

    /**
     * 通过Google的API获取idToken
     *
     * @param email    登录邮箱
     * @param password 密码
     * @return 响应
     * @throws IOException IOException
     */
    private JSONObject passwordVerify(String email, String password) throws IOException {
        // fastjson顺序会乱,干脆直接写字符串
        String loginParamsJson = "{\"email\":\"" + email + "\",\"password\":\"" + password + "\",\"returnSecureToken\":true}";
        StringEntity entity = new StringEntity(loginParamsJson, StandardCharsets.UTF_8);
        entity.setContentEncoding("UTF-8");
        entity.setContentType("application/json");
        HttpPost verifyPost = new HttpPost(PASSWORD_VERIFY_URL);
        verifyPost.setEntity(entity);
        verifyPost.setHeader("Content-Encoding", "gzip");
        String result = EntityUtils.toString(httpClient.execute(verifyPost).getEntity());
        return JSON.parseObject(result);
    }

    /**
     * 生成Pguard用到的随机字符串
     *
     * @param timeStamp 时间戳
     * @return 随机字符串
     */
    private String createRandom(long timeStamp) {
        long performance = RandomUtils.nextInt(PERFORMANCE_BOUND) * THOUSAND + RandomUtils.nextInt(THOUSAND);
        StringBuilder stringBuilder = new StringBuilder();
        for (char ch : RANDOM_FORMAT) {
            int randomSmallerThan16 = (int) ((timeStamp + performance + RandomUtils.nextInt(RANDOM_RADIX)) % RANDOM_RADIX);
            switch (ch) {
                case RANDOM_FORMAT_X:
                    stringBuilder.append(Integer.toHexString(randomSmallerThan16));
                    break;
                case RANDOM_FORMAT_Y:
                    // &3|8使得二进制形式为000010??,即十六进制必为{8,9,a,b}中的一个
                    stringBuilder.append(Integer.toHexString(randomSmallerThan16 & 3 | 8));
                    break;
                case RANDOM_FORMAT_4:
                    stringBuilder.append(ch);
                default:
            }
        }
        return stringBuilder.toString();
    }

    /**
     * 生成Pguard
     *
     * @param url     请求URL
     * @param method  请求方法
     * @param payload 请求载荷
     * @return Pguard
     */
    private String createPguard(String url, String method, String payload) {
        try {
            long timeStamp = LocalDateTime.now().toInstant(ZoneOffset.of("+8")).toEpochMilli();
            long second = timeStamp / 1000;
            String random = createRandom(timeStamp);
            String encodedUrl = URLEncoder.encode(url.replace("https:", "").replace("http:", ""), "UTF-8");
            String methodSignature = MD5_PREFIX + method + encodedUrl.toLowerCase() + second + random;
            if (METHOD_POST.equals(method)) {
                methodSignature += HashUtils.getMD5(payload).toLowerCase();
            }
            String hmac = HashUtils.getHmacSHA256(methodSignature, HMAC_KEY);
            String pguard = SIGN_PREFIX + SIGN_DELIMITER + second + SIGN_DELIMITER + random + SIGN_DELIMITER + hmac.toLowerCase();
            return pguard;
        } catch (UnsupportedEncodingException | InvalidKeyException e) {
            e.printStackTrace();
        }
        return "";
    }

    /**
     * 获取投票token
     *
     * @param idToken 从Google的API获取的idToken
     * @return 投票token
     * @throws IOException IOException
     */
    private JSONObject generateSession(String idToken) throws IOException {
        String jsonString = "{\"Token\":\"" + idToken + "\"}";
        String pguard = createPguard(GENERATE_SESSION_URL, METHOD_POST, jsonString);
        StringEntity entity = new StringEntity(jsonString, StandardCharsets.UTF_8);
        entity.setContentEncoding("UTF-8");
        entity.setContentType("application/json");
        HttpPost generateSessionPost = new HttpPost(GENERATE_SESSION_URL);
        generateSessionPost.setEntity(entity);
        generateSessionPost.setHeader("ProGuard", pguard);
        generateSessionPost.setHeader("Referer", "https://vote.aaavietnam2019.com.vn/");
        generateSessionPost.setHeader("Origin", "https://vote.aaavietnam2019.com.vn");
        generateSessionPost.setHeader("Accept-Encoding", "gzip, deflate, br");
        String result = EntityUtils.toString(httpClient.execute(generateSessionPost).getEntity());
        return JSON.parseObject(result);
    }

    /**
     * 发起投票请求
     *
     * @param voteSession 投票token
     * @return 响应
     * @throws IOException IOException
     */
    private JSONObject vote(String voteSession) throws IOException {
        String voteJsonString = "{\"ridCode\":\"1700***\",\"round\":\"2\"}";
        long timeStamp = LocalDateTime.now().toInstant(ZoneOffset.of("+8")).toEpochMilli();
        String pguard = createPguard(VOTE_URL + "?v=" + timeStamp, METHOD_POST, voteJsonString);
        StringEntity entity = new StringEntity(voteJsonString, StandardCharsets.UTF_8);
        entity.setContentEncoding("UTF-8");
        entity.setContentType("application/json");
        HttpPost generateSessionPost = new HttpPost(VOTE_URL + "?v=" + timeStamp);
        generateSessionPost.setEntity(entity);
        generateSessionPost.setHeader("ProGuard", pguard);
        generateSessionPost.setHeader("Referer", "https://vote.aaavietnam2019.com.vn/");
        generateSessionPost.setHeader("Origin", "https://vote.aaavietnam2019.com.vn");
        generateSessionPost.setHeader("Authorization", "Bearer " + voteSession);
        generateSessionPost.setHeader("language", "en");
        generateSessionPost.setHeader("Accept-Encoding", "gzip, deflate, br");
        String result = EntityUtils.toString(httpClient.execute(generateSessionPost).getEntity());
        return JSON.parseObject(result);
    }

    /**
     * 投票
     *
     * @param email    登录邮箱
     * @param password 密码
     */
    public void vote(String email, String password) {
        try {
            JSONObject verifyResult = this.passwordVerify(email, password);
            String idToken = verifyResult.getString("idToken");
            if (null == idToken) {
                throw new RuntimeException("密码验证失败:" + verifyResult.getJSONObject("error").getString("message"));
            }
            JSONObject generation = this.generateSession(idToken);
            String voteSession = generation.getString("voteSession");
            if (null == voteSession) {
                throw new RuntimeException("获取voteSession失败");
            }
            for (int i = 0; i < BALLOT_NUM; i++) {
                JSONObject voteResult = this.vote(voteSession);
                log.info("{}:{}", email, voteResult);
            }
        } catch (IOException | RuntimeException e) {
            log.debug("{}出错:{}", email, e);
        }
    }
}

运行结果: 运行结果

后记

1

有段时间经常在网上看到有人说:

前端无秘密

一定程度上,确实如此。前端的加密、混淆做得再好,代码终究是要运行的,防护做得好,很多时候也只是增加破解难度和成本而已。

这个投票网站的反调试和代码混淆做得并不怎么好,但能把这个爬虫写出来也是挺有成就感的。

2

说到投票,千千万万饭圈女孩大部分还是用人力来做这种事情的。每逢投票评选和作品打榜(尽管她们未必在乎作品本身),一个个帐号地注册、每天一个个帐号地登录、投票,要花多少时间和金钱?更别说日常的那些乱七八糟的刷量、控评、表白、送花、打卡了(真是服了)。

某个商家搞个没有多少权威性的投票评选,免费用户每日一票,付费用户每日十票,就可以赚上一把;脚本提供商针对这个评选写一个投票爬虫,同样可以赚上一把(和军火商一样,同时卖给对立的各方)。令人惊讶的是,有些时候,不花钱的那一方竟能以碾压之势胜出,他们家 idol 是多有魅力?

我有时候会觉得这些人很可笑,偶尔会觉得他们很可爱,但更多时候还是觉得很可怕。

值得吗?如鱼饮水,冷暖自知。


(转载本站文章请注明作者和出处)

Show Disqus Comments

Post Directory