A Python Script for Notifying Experiment Results via WeChat Work

This is an automatically translated post by LLM. The original post is in Chinese. If you find any translation errors, please leave a comment to help me improve the translation. Thanks!

Deep learning experiments usually take a long time to run. Without proactive notification, one of two things tends to happen:

  • You keep checking on the experiment, unable to focus on anything else.
  • You leave the experiment unattended and no longer know whether it is still running or has finished.

To solve these problems, the experiment program needs a monitoring process that sends a notification with the results once the experiment ends. The script in this article consists of three parts: process monitoring, TensorBoard data extraction, and WeChat Work group bot notification.

Process Monitoring

Experiments are usually safer run in the background, where they will not stop when the terminal is closed. The command is:

nohup your_command >log_name.log 2>&1 &

Here >log_name.log redirects stdout to a log file, 2>&1 sends stderr to the same place, and the trailing & puts the command in the background.

Process monitoring serves two purposes:

  • Send a prompt alert if the experiment terminates abnormally.
  • Send the results when the experiment finishes normally.

It is simple to implement: just periodically check whether the process still exists. The code is as follows:

import os
import time
import datetime

from utils.wechat_bot import research_bot

def process_num(key_words):
    # Count running processes whose command line matches key_words
    cmd = f"ps -ef | grep -v grep | grep {key_words} | wc -l"
    return int(os.popen(cmd).read())

while True:
    if process_num("your_command") == 0:
        now = datetime.datetime.now().strftime("%Y-%m-%d %H:%M:%S")
        research_bot.send_text(f"[{now}] The experiment is over, analyzing the results...")
        break
    time.sleep(60)

# do something
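The ps-based check shells out and requires the ps binary. As a sketch of a Linux-only alternative (the function name process_num_proc is my own), the same count can be obtained by scanning /proc directly:

```python
import os

def process_num_proc(key_word):
    """Count running processes whose command line contains key_word,
    by reading /proc directly (Linux only) instead of shelling out to ps."""
    count = 0
    for pid in os.listdir("/proc"):
        if not pid.isdigit():
            continue  # skip non-process entries such as /proc/meminfo
        try:
            with open(f"/proc/{pid}/cmdline", "rb") as f:
                # arguments are NUL-separated in /proc/<pid>/cmdline
                cmdline = f.read().replace(b"\x00", b" ").decode(errors="replace")
        except OSError:
            continue  # the process exited while we were scanning
        if key_word in cmdline:
            count += 1
    return count

# the interpreter running this script matches "python" itself
print(process_num_proc("python"))
```

This variant also avoids the false match that grep can produce on its own process, which is what the grep -v grep in the original pipeline works around.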

TensorBoard Data Extraction

The training metrics of a deep learning run are usually recorded by TensorBoard in event log files. TensorBoard makes it convenient to view the curves, but inconvenient to process the data in other ways. The following script extracts the data from TensorBoard logs and writes it to an Excel file.

Just call the tb_to_excel function. Its arguments are:

  • path_list: a list of strings, each the path of one log directory, e.g. ['logs/0', 'logs/1']
  • excel_path: the output Excel path, e.g. 'export.xlsx'
  • one_sheet (default False): whether to merge all data into one sheet

Result: the logs in path_list are exported to the target Excel file, with each item in path_list written to its own sheet. If one_sheet=True, all data are merged into a single sheet via pd.concat.

import os
from typing import Dict

import pandas as pd
from tensorboard.backend.event_processing import event_accumulator

def get_tb_data(path):
    # Load the TensorBoard event file(s) under `path`
    tb_data = event_accumulator.EventAccumulator(path)
    tb_data.Reload()
    return tb_data

def tb_to_df(tb_data: event_accumulator.EventAccumulator) -> pd.DataFrame:
    # Collect every scalar series into one DataFrame, one column per tag
    keys = tb_data.scalars.Keys()
    df = pd.DataFrame(columns=keys)
    for key in keys:
        data = tb_data.scalars.Items(key)
        df[key] = pd.Series([item.value for item in data])
        if 'step' not in df.columns:
            df['step'] = pd.Series([item.step for item in data])
        if 'wall_time' not in df.columns:
            df['wall_time'] = pd.Series([item.wall_time for item in data])

    order = ['step', 'wall_time'] + keys
    df = df[order]

    return df

def get_tb_df(path):
    tb_data = get_tb_data(path)
    df = tb_to_df(tb_data)
    return df

def extract_all_data(path_list: list) -> Dict[str, pd.DataFrame]:
    # Map each log directory to a DataFrame, keyed by the directory name
    data_dict = {}
    for filepath in path_list:
        fn = filepath.split('/')[-1]
        if os.path.isdir(filepath):
            print("Reading and processing:", filepath)
            sheet_name = fn
            df = get_tb_df(filepath)
            data_dict[sheet_name] = df
    return data_dict

def df_to_excel(df_dict, path, one_sheet=False):
    if one_sheet:
        # Stack all runs into one sheet, renumbering the index
        df_all = pd.concat(df_dict.values(), ignore_index=True)
        df_all.to_excel(path, index=False)
    else:
        with pd.ExcelWriter(path) as writer:
            for sheet_name, df in df_dict.items():
                print("Writing sheet:", sheet_name)
                df.to_excel(writer, sheet_name=sheet_name, index=False)

def tb_to_excel(path_list, excel_path, one_sheet=False):
    df_dict = extract_all_data(path_list)
    df_to_excel(df_dict, excel_path, one_sheet)
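The one_sheet=True branch relies on pd.concat with ignore_index=True to stack the per-run frames and renumber the row index. A minimal illustration with made-up data standing in for two TensorBoard logs:

```python
import pandas as pd

# two hypothetical per-run frames, mimicking two extracted TensorBoard logs
df_a = pd.DataFrame({"step": [0, 1], "loss": [1.0, 0.8]})
df_b = pd.DataFrame({"step": [0, 1], "loss": [0.9, 0.7]})

merged = pd.concat([df_a, df_b], ignore_index=True)
print(len(merged))         # 4
print(list(merged.index))  # [0, 1, 2, 3]
```

Note that merging this way drops the information about which run each row came from; adding a tag column to each frame before concatenating (e.g. df_a['run'] = '0') would preserve it.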

WeChat Work Group Bot Notification

There are many channels to choose from for proactive notification, including but not limited to:

  • Email via SMTP
  • Push services such as Server Chan
  • A QQ bot
  • A WeChat bot
  • A WeChat Work application
  • A WeChat Work group bot

Each method has its pros and cons. The requirements for result notification are:

  • Real-time: the notification should arrive immediately after it is sent
  • One-way: only outgoing notifications are needed; no reply handling has to be set up

Weighing these requirements against implementation effort, the WeChat Work group bot was chosen as the push channel.

First, create a WeChat Work group bot:

  1. Create a WeChat Work group chat (at least 3 people)
  2. Remove the irrelevant members
  3. Add a group bot
  4. Open the bot's details, find the key value in its webhook URL, and record it

Then fill the key into the following script.

import base64
import hashlib

import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry


SLEEP_INTERVAL = 0.1  # backoff factor between retries
MAX_RETRIES = 1  # number of times to retry a request

retries = Retry(
    total=MAX_RETRIES,
    backoff_factor=SLEEP_INTERVAL,
    status_forcelist=[403, 500, 502, 503, 504],
)

default_headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/98.0.4758.102 Safari/537.36 Edg/98.0.1108.62",
    "Accept-Encoding": "gzip, deflate",
    "Content-Type": "application/json",
}

client = requests.Session()
client.mount("http://", HTTPAdapter(max_retries=retries))
client.mount("https://", HTTPAdapter(max_retries=retries))
client.headers = default_headers

BotList = {
    "Research Assistant": "XXX"  # fill in the group bot's webhook key here
}

class Bot:
    def __init__(self, key) -> None:
        self._base_url = "https://qyapi.weixin.qq.com/cgi-bin/webhook/send"
        self._key = key
        self._url = f"{self._base_url}?key={self._key}"

    def send_msg(self, msg):
        return client.post(self._url, json=msg)

    def send_text(self, content):
        msg = {
            "msgtype": "text",
            "text": {
                "content": content
            }
        }
        self.send_msg(msg)

    def send_image(self, image_path):
        with open(image_path, "rb") as f:
            image = f.read()
        # the API expects the image as base64, together with its md5 checksum
        base64_bytes = base64.b64encode(image).decode()
        md5 = hashlib.md5(image).hexdigest()
        msg = {
            "msgtype": "image",
            "image": {
                "base64": base64_bytes,
                "md5": md5
            }
        }
        self.send_msg(msg)

    def send_markdown(self, content):
        msg = {
            "msgtype": "markdown",
            "markdown": {
                "content": content
            }
        }
        self.send_msg(msg)

    def send_news(self, articles):
        msg = {
            "msgtype": "news",
            "news": {
                "articles": articles
            }
        }
        self.send_msg(msg)

research_bot = Bot(BotList["Research Assistant"])

if __name__ == "__main__":
    research_bot.send_text("Hello, I'm your research assistant")
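Putting the pieces together, the final report can be pushed as markdown. Below is a sketch of formatting hypothetical final metrics (the metric names and values are made up) into a markdown message; the <font color="info"> tag is part of the markdown subset the WeChat Work bot accepts:

```python
# hypothetical final metrics, e.g. taken from the extracted TensorBoard DataFrame
results = {"accuracy": 0.93, "loss": 0.18}

lines = ["**Experiment finished**"]
for name, value in results.items():
    # quote each metric and color the value with WeChat Work's markdown tag
    lines.append(f'> {name}: <font color="info">{value}</font>')
content = "\n".join(lines)

print(content)
# research_bot.send_markdown(content)  # uncomment to actually push the report
```

Keeping the formatting separate from the sending makes it easy to preview the message locally before wiring it into the monitoring loop.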