tedshd's DevNote


Develop & Design Note by Ted

Notes on Installing and Using Puppeteer on an Ubuntu Server

on 2020-11-20

Puppeteer is a Node.js library from Google that can control Chrome and Chromium, which makes it very handy for crawling, testing, and similar tasks.

These are my notes on creating a Compute Engine instance on GCP and installing Puppeteer on it.

1. Create a Compute Engine instance

Previously I used an f1-micro (1 vCPU, 0.6 GB RAM, shared core).

That tier can still cope if all you do is load a page and scrape its content,

but it falls short once you need to interact with the page, for example scrolling down to pull in AJAX-loaded content.

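So I switched to an e2-small (2 vCPUs, 2 GB RAM). Note that GCP also classifies e2-small as a shared-core machine type, but it provides more CPU time and memory than f1-micro.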
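The scrolling case mentioned above is what pushed resource usage up. Below is a minimal sketch of that pattern, assuming `page` is a Puppeteer Page: it scrolls to the bottom, waits, and stops once the page height stops growing. `autoScroll`, `maxScrolls`, and `delayMs` are hypothetical names and defaults chosen for illustration, not Puppeteer APIs; only `page.evaluate` comes from Puppeteer.

```javascript
// Hypothetical helper: keep scrolling until no new (AJAX-loaded) content appears.
// Assumes `page` is a Puppeteer Page; maxScrolls and delayMs are arbitrary defaults.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function autoScroll(page, { maxScrolls = 20, delayMs = 1000 } = {}) {
  let lastHeight = await page.evaluate(() => document.body.scrollHeight);
  for (let i = 0; i < maxScrolls; i++) {
    // Jump to the current bottom of the page
    await page.evaluate(() => window.scrollTo(0, document.body.scrollHeight));
    // Give the page time to fetch and render more content
    await sleep(delayMs);
    const newHeight = await page.evaluate(() => document.body.scrollHeight);
    if (newHeight === lastHeight) break; // height unchanged: nothing new loaded
    lastHeight = newHeight;
  }
  return lastHeight;
}
```

Usage would look like `await autoScroll(page, { delayMs: 1500 })` after `page.goto(...)`. Each iteration keeps more DOM in memory, which is why this kind of job needs more RAM than a plain page load.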

This time I installed Ubuntu 20.04 LTS.

2. Install Node.js

First, add the NodeSource package repository:

curl -sL https://deb.nodesource.com/setup_12.x | sudo -E bash -

Then install Node.js:

sudo apt install nodejs
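To confirm the install picked up the expected release, a small sketch that checks `process.version` against a minimum major version. The threshold of 12 is an assumption matching the `setup_12.x` script above, and `meetsMinimumMajor` is a hypothetical helper.

```javascript
// Hypothetical check: does the running Node.js meet a minimum major version?
// 12 matches the NodeSource setup_12.x script; adjust if you use another release.
function meetsMinimumMajor(version, minMajor) {
  // version looks like "v12.19.0" (the format of process.version)
  const major = Number.parseInt(version.replace(/^v/, "").split(".")[0], 10);
  return Number.isInteger(major) && major >= minMajor;
}

console.log(process.version, meetsMinimumMajor(process.version, 12) ? "OK" : "too old");
```

Save it as, say, `check-node.js` and run `node check-node.js`.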

Reference: How to Install Node.js and npm on Ubuntu 18.04

3. Install the packages needed to launch Puppeteer and Chrome

sudo apt update
sudo apt-get install ca-certificates fonts-liberation libappindicator3-1 libasound2 libatk-bridge2.0-0 libatk1.0-0 libc6 libcairo2 libcups2 libdbus-1-3 libexpat1 libfontconfig1 libgbm1 libgcc1 libglib2.0-0 libgtk-3-0 libnspr4 libnss3 libpango-1.0-0 libpangocairo-1.0-0 libstdc++6 libx11-6 libx11-xcb1 libxcb1 libxcomposite1 libxcursor1 libxdamage1 libxext6 libxfixes3 libxi6 libxrandr2 libxrender1 libxss1 libxtst6 lsb-release wget xdg-utils

Reference: Troubleshooting

When Puppeteer fails to start after installation, the cause is usually a missing system package that keeps Chrome/Chromium from launching.
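One way to find which package is missing is to run `ldd` on the bundled browser binary and look for libraries marked "not found". A minimal sketch of that check in Node (Linux only); `parseMissingLibs` and `findMissingLibs` are hypothetical helper names, and the binary path is assumed to come from `puppeteer.executablePath()`.

```javascript
const { execFileSync } = require("child_process");

// Parse `ldd` output and return the shared libraries marked "not found".
function parseMissingLibs(lddOutput) {
  return lddOutput
    .split("\n")
    .filter((line) => line.includes("=> not found"))
    .map((line) => line.trim().split(/\s+/)[0]);
}

// Run `ldd` (Linux only) on the browser binary and list missing libraries.
function findMissingLibs(executablePath) {
  const output = execFileSync("ldd", [executablePath], { encoding: "utf8" });
  return parseMissingLibs(output);
}
```

For example, `findMissingLibs(require("puppeteer").executablePath())` might return something like `["libgbm.so.1"]`, which points at the apt package to install (here `libgbm1`).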

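4. Install Puppeteer

npm i puppeteer

Do not install puppeteer-core here: it does not download a browser, so you would have to point it at an already-installed Chrome yourself. If you want to try it, play with it on your own machine.

GitHub - puppeteer

5. Write some sample code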

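const puppeteer = require("puppeteer");

(async () => {
  // Disable Chrome's sandbox; often required when running as root on a server
  const browser = await puppeteer.launch({
    args: ["--no-sandbox"],
  });
  try {
    const page = await browser.newPage();
    await page.goto("https://google.com");
    // Save a screenshot of the loaded page to the current directory
    await page.screenshot({ path: "example.png" });
  } finally {
    // Always close the browser, even if navigation fails
    await browser.close();
  }
})();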

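Remember to add --no-sandbox: when launching from the terminal fails, a missing --no-sandbox flag is sometimes the reason. It is still optional, and disabling the sandbox weakens isolation, so only use it for sites you trust.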
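The flag is typically needed because Chrome will not start its sandbox when run as root, which is common on a fresh server. A minimal sketch, assuming that is the cause, that adds --no-sandbox only when the process runs as root. `buildLaunchArgs` and `runningAsRoot` are hypothetical helpers, not part of Puppeteer.

```javascript
// Hypothetical helper: add --no-sandbox only when it is actually needed.
function buildLaunchArgs({ isRoot = false, extraArgs = [] } = {}) {
  const args = [...extraArgs];
  if (isRoot && !args.includes("--no-sandbox")) {
    args.push("--no-sandbox");
  }
  return args;
}

// process.getuid only exists on POSIX platforms (not on Windows).
function runningAsRoot() {
  return typeof process.getuid === "function" && process.getuid() === 0;
}
```

Then launch with `puppeteer.launch({ args: buildLaunchArgs({ isRoot: runningAsRoot() }) })`, so a non-root user keeps the sandbox enabled.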

Troubleshooting

The Puppeteer docs include a Troubleshooting page.

If you run into any problems, check that page first.

troubleshooting

Log

For the launch arguments (the args option of puppeteer.launch),

I used the following list as a reference:

List of Chromium Command Line Switches
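Those switches go into the `args` array as plain strings. A small sketch that builds that array from an object; `toChromiumArgs` is a hypothetical helper, and `--window-size` and `--lang` are just example switches from that list.

```javascript
// Hypothetical helper: turn a plain object into Chromium command-line switches.
// `true` produces a bare flag, strings/numbers produce --name=value,
// and false/null/undefined entries are skipped.
function toChromiumArgs(switches) {
  return Object.entries(switches)
    .filter(([, value]) => value !== false && value !== undefined && value !== null)
    .map(([name, value]) => (value === true ? `--${name}` : `--${name}=${value}`));
}
```

For example, `puppeteer.launch({ args: toChromiumArgs({ "no-sandbox": true, "window-size": "1280,800" }) })` passes `["--no-sandbox", "--window-size=1280,800"]`.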

The Puppeteer documentation also explains how to use its many methods:

https://pptr.dev/


JavaScript - Some Ways to Handle Sorting

on 2020-11-18

I often need to sort data structures in JavaScript.

Below are a few use cases and examples.

Getting the top-ranked entries (ranking values are unique)

Data

var data = {
  123: {
    count: 123,
    type: "video",
    source: "",
  },
  345: {
    count: 345,
    type: "video",
    source: "",
  },
  99: {
    count: 99,
    type: "image",
    source: "",
  },
  1: {
    count: 1,
    type: "video",
    source: "",
  },
  9786: {
    count: 9786,
    type: "image",
    source: "",
  },
  347: {
    count: 347,
    type: "video",
    source: "",
  },
};

Approach

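function topNine(data) {
  var arr = [];
  // Integer-like keys come back from Object.keys in ascending numeric order,
  // so reversing gives descending order
  var tmp = Object.keys(data).reverse();
  // Take at most 9 entries
  var l = tmp.length >= 9 ? 9 : tmp.length;
  for (let i = 0; i < l; i++) {
    arr.push(data[tmp[i]]);
  }
  return arr;
}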

This example takes the top 9 entries.

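It relies on the fact that Object.keys returns integer-like keys (such as "1", "99", "123") in ascending numeric order.

To keep the code easy to read,

Object.keys(data).reverse() reverses that array into descending order,

so the for loop can simply start from 0.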

Since there may be fewer than 9 entries,

(tmp.length >= 9) ? 9 : tmp.length determines how many times the loop runs.

With that, the function returns the top 9 entries.

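The catch with this data structure is that each count in the sample data must be unique, because an object cannot have duplicate keys. The ordering trick also only works because the keys are non-negative integers; ordinary string keys keep their insertion order instead.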
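A minimal sketch of an alternative that sorts on the count field itself, so it no longer depends on how object keys are ordered. `topNByCount` and its `n` parameter are hypothetical names; the data shape is the sample object above.

```javascript
// Sort the entries by their `count` field instead of relying on key order.
function topNByCount(data, n = 9) {
  // Object.values returns a new array, so sorting it leaves `data` untouched
  return Object.values(data)
    .sort((a, b) => b.count - a.count) // descending by count
    .slice(0, n); // slice simply returns fewer items when there are fewer than n
}
```

With the sample data, `topNByCount(data, 3)` gives the entries with counts 9786, 347, and 345. `slice` also replaces the `>= 9 ? 9 : length` check.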

Getting the top-ranked entries (ranking values may repeat)

Data

var data = [
  {
    source: "",
    type: "image",
    ts: 1600860409000,
    liked: 9680,
  },
  {
    source: "",
    type: "image",
    ts: 1603438465000,
    liked: 11746,
  },
  {
    source: "",
    type: "image",
    ts: 1602389743000,
    liked: 13289,
  },
  {
    source: "",
    type: "image",
    ts: 1605334079000,
    liked: 14095,
  },
  {
    source: "",
    type: "image",
    ts: 1602652312000,
    liked: 14138,
  },
  {
    source: "",
    type: "image",
    ts: 1603275324000,
    liked: 14310,
  },
  {
    source: "",
    type: "image",
    ts: 1603600329000,
    liked: 14448,
  },
  {
    source: "",
    type: "image",
    ts: 1603080713000,
    liked: 14625,
  },
  {
    source: "",
    type: "image",
    ts: 1604823132000,
    liked: 15351,
  },
  {
    source: "",
    type: "image",
    ts: 1604919156000,
    liked: 15373,
  },
  {
    source: "",
    type: "image",
    ts: 1601123545000,
    liked: 15442,
  },
  {
    source: "",
    type: "image",
    ts: 1601544281000,
    liked: 16659,
  },
  {
    source: "",
    type: "image",
    ts: 1605088396000,
    liked: 17137,
  },
  {
    source: "",
    type: "image",
    ts: 1602749094000,
    liked: 17483,
  },
  {
    source: "",
    type: "image",
    ts: 1600928882000,
    liked: 17928,
  },
  {
    source: "",
    type: "image",
    ts: 1602562980000,
    liked: 18217,
  },
  {
    source: "",
    type: "image",
    ts: 1603709259000,
    liked: 21845,
  },
  {
    source: "",
    type: "image",
    ts: 1602470351000,
    liked: 22292,
  },
  {
    source: "",
    type: "image",
    ts: 1603887487000,
    liked: 22559,
  },
  {
    source: "",
    type: "image",
    ts: 1602846057000,
    liked: 22824,
  },
  {
    source: "",
    type: "image",
    ts: 1601465442000,
    liked: 25226,
  },
  {
    source: "",
    type: "image",
    ts: 1604736681000,
    liked: 25580,
  },
  {
    source: "",
    type: "image",
    ts: 1601285690000,
    liked: 25796,
  },
  {
    source: "",
    type: "image",
    ts: 1601693594000,
    liked: 26078,
  },
];

Approach

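function topNine(data) {
  var arr = [];
  var i, j, temp;
  // Work on a copy so the caller's array is not reordered
  data = data.slice();
  // Bubble sort in descending order of liked: each pass moves the
  // smallest remaining value to the end of the unsorted part
  for (i = 0; i < data.length - 1; i++) {
    for (j = 0; j < data.length - 1 - i; j++) {
      if (data[j]["liked"] < data[j + 1]["liked"]) {
        temp = data[j];
        data[j] = data[j + 1];
        data[j + 1] = temp;
      }
    }
  }
  // Take at most 9 entries
  var l = data.length >= 9 ? 9 : data.length;
  for (var n = 0; n < l; n++) {
    arr.push(data[n]);
  }
  return arr;
}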

This one is not hard to follow either.

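The first (nested) loop runs a bubble sort on liked in descending order (textbook examples usually sort in ascending order). Because it only swaps when the left value is strictly smaller, entries with the same liked value keep their original relative order.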

temp = data[j];
data[j] = data[j + 1];
data[j + 1] = temp;

This swaps the two adjacent elements with the classic temporary-variable swap:

c = a;
a = b;
b = c;
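Since ES2015, the same swap can also be written with destructuring assignment, which avoids the temporary variable. A small sketch; `swap` is a hypothetical helper name.

```javascript
// Swap two array elements with destructuring assignment instead of a temp variable.
function swap(arr, i, j) {
  [arr[i], arr[j]] = [arr[j], arr[i]];
  return arr;
}
```

Inside the bubble sort, the three temp lines would become `[data[j], data[j + 1]] = [data[j + 1], data[j]];`.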

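The second loop takes the first 9 entries, i.e. the 9 with the most likes.

As before, data.length >= 9 ? 9 : data.length makes sure that when there are fewer than 9 entries, it simply takes all of them.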
