Batch-Scraping JD.com Product Categories with a Node Crawler

Preface

The title is only there for search engines, to improve the hit rate.

1. Open the product category page of the mobile JD.com site

https://so.m.jd.com/webportal/channel/m_category?searchFrom=home

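2. Look for the category data

I checked the browser's Network panel first, but that did not seem to help. I then checked storage, which did: after clicking each category once, the data for every category ends up there. It does not include the first-level (top-level) category data, though.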

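3. What I finally found

The complete product category data is embedded in a script tag in the HTML page itself.

To locate it, search the HTML source for `热门推荐` ("Hot Recommendations").

That turns up: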

<script>
window.bigpipe.setData("m1",{"json":{"errCode":"0","retCode":"0","msg":"","projectId":"1,715","projectName":"M站分类页","lastReleaseUser":"wangyangang6","lastReleaseTime":"2020-02-0818:11:29","keywordAreas":[{"areaId":"2006","areaName":"热门推荐","PTAG":"138624.10002.9999","extInfo1":"","extInfo2":"11053","label":"1","pattern":"4","customDoc":"","labelStartT=
...
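
As a rough sketch of how that embedded object could be pulled out in Node: the snippet below looks for the `window.bigpipe.setData("m1",` call seen above and cuts out the object literal that follows by matching braces, skipping braces that appear inside strings. The helper name `extractSetDataPayload` and the tiny `sampleHtml` string are made up for illustration, and it assumes the argument is valid JSON, as it appears to be in the fragment above.

```javascript
// Sketch only: extractSetDataPayload is a hypothetical helper, not a JD API.
// It returns the parsed object passed to window.bigpipe.setData("<key>", {...}),
// or null if the call is not found.
function extractSetDataPayload(html, key) {
  const marker = `window.bigpipe.setData("${key}",`;
  const start = html.indexOf(marker);
  if (start === -1) return null;

  const open = html.indexOf('{', start + marker.length);
  if (open === -1) return null;

  let depth = 0;        // current nesting depth of { }
  let inString = false; // true while inside a "..." string
  let escaped = false;  // true right after a backslash inside a string

  for (let i = open; i < html.length; i++) {
    const ch = html[i];
    if (inString) {
      if (escaped) escaped = false;
      else if (ch === '\\') escaped = true;
      else if (ch === '"') inString = false;
    } else if (ch === '"') {
      inString = true;
    } else if (ch === '{') {
      depth++;
    } else if (ch === '}') {
      depth--;
      // Depth back to zero means the outermost object just closed.
      if (depth === 0) return JSON.parse(html.slice(open, i + 1));
    }
  }
  return null; // braces never balanced (truncated page?)
}

// Tiny stand-in for the real page source (assumption, not the real HTML).
const sampleHtml =
  '<script>window.bigpipe.setData("m1",{"json":{"errCode":"0",' +
  '"keywordAreas":[{"areaId":"2006","areaName":"热门推荐"}]}});</script>';

const payload = extractSetDataPayload(sampleHtml, 'm1');
console.log(payload.json.keywordAreas[0].areaName); // prints 热门推荐
```

Brace matching is used here instead of a regular expression because the object is deeply nested and its end cannot be found reliably with a simple pattern. How the HTML is downloaded (for example with the global `fetch` in recent Node versions) is left out, since the request headers the page needs are not covered here.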

4. The extracted data

[
    {
        "areaId": "2006",
        "areaName": "热门推荐",
        "PTAG": "138624.10002.9999",
        "extInfo1": "",
        "extInfo2": "11053",
        "label": "1",
        "pattern": "4",
        "customDoc": "",
        "labelStartTime": "1445875200",
        "labelEndTime": "1445875300",
        "patternStartTime": "1527077059",
        "patternEndTime": "1842696261",
        "level1words": [{
            "keywordId": "64754",
            "keyword": "热门分类",
            "url": "&projectId=1715",
            "ptag": "",
            "imageUrl": "",
            "extInfo1": "",
            "extInfo2": "",
            "label": "1",
            "pattern": "4",
            "customDoc": "",
            "labelStartTime": "1445875200",
            "labelEndTime": "1445875300",
            "patternStartTime": "1527077040",
            "patternEndTime": "1842696240",
            "level2words": [{
                "keywordId": "65103",
                "keyword": "手机",
                "url": "//so.m.jd.com/products/9987-653-655.html?PTAG=138624.10002.130001",
                "ptag": "138624.10002.130001",
                "imageUrl": "//img14.360buyimg.com/focus/jfs/t27136/183/1628977274/31007/a6f7ed55/5be6ebd8Nb07ef492.png",
                "extInfo1": "",
                "extInfo2": "",
                "label": "1",
                "pattern": "1",
                "customDoc": "",
                "catId": "0:0:0",
                "catName": "不限类目",
                "labelStartTime": "1445875200",
                "labelEndTime": "1445875300",
                "patternStartTime": "1526976440",
                "patternEndTime": "1842595640",
                "noneSearchUrl": "",
                "noneSearchUrlStartTime": "0",
                "noneSearchUrlEndTime": "0"
            }

...
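
The nesting in the sample above (an area, then `level1words`, then `level2words`) can be flattened into one row per category. The following is a minimal sketch that assumes only the fields visible in the sample; `flattenCategories` is a hypothetical helper name, and whether deeper levels exist elsewhere in the real data is not something the sample shows.

```javascript
// Sketch only: flattenCategories is a hypothetical helper.
// Field names (areaName, level1words, level2words, keyword, url)
// are taken from the sample data above.
function flattenCategories(keywordAreas) {
  const rows = [];
  for (const area of keywordAreas) {
    for (const l1 of area.level1words || []) {
      for (const l2 of l1.level2words || []) {
        const url = l2.url || '';
        rows.push({
          area: area.areaName,
          group: l1.keyword,
          name: l2.keyword,
          // URLs in the sample are protocol-relative ("//so.m.jd.com/...").
          url: url.startsWith('//') ? 'https:' + url : url,
        });
      }
    }
  }
  return rows;
}

// Cut-down copy of the sample above, keeping only the fields used here.
const sampleAreas = [{
  areaName: '热门推荐',
  level1words: [{
    keyword: '热门分类',
    level2words: [{
      keyword: '手机',
      url: '//so.m.jd.com/products/9987-653-655.html?PTAG=138624.10002.130001',
    }],
  }],
}];

const rows = flattenCategories(sampleAreas);
console.log(rows);
```

Each row keeps the names of its parent levels, so the flat list can be written to CSV or a database table without losing the hierarchy.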

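5. Conclusion

This data is most likely the front-end (display) category tree, whereas products are usually linked directly to back-end categories, so it can only serve as reference data. I had originally expected to find a dedicated API from which all categories could be scraped.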