This is an automatically translated post by an LLM. The original post is in Chinese. If you find any translation errors, please leave a comment to help me improve the translation. Thanks!

The older I get, the faster time seems to pass. In the blink of an eye it is already 2023, and the students one year ahead of me are about to graduate. This post is mainly a place for photos and has very little text.

To everyone graduating: I wish you a bright future and a wonderful life. 🎉🎉🎉

Recently, Tsinghua University and SenseTime published a paper titled "Ghost in the Minecraft: Generally Capable Agents for Open-World Environments via Large Language Models with Text-based Knowledge and Memory" (GITM for short). It is an interesting read, and I recommend the original paper to anyone who is curious.

The deep reinforcement learning workflow can be abstracted as a loop over two steps:

  1. The agent interacts with the environment, generating and storing experience.
  2. The agent learns from that experience.

This post examines how the two ways an episode can end during experience collection affect collection and training, and how to handle each: natural termination (Terminated, which covers reaching the goal, failing, and so on) and artificial truncation (Truncated, mainly stopping after a fixed number of steps). It also reports a few experiments comparing performance.
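To make the Terminated/Truncated distinction concrete, here is a minimal sketch of one common treatment, which is not necessarily the one used in the post: bootstrap from the next state's value unless the episode terminated naturally, and reset the environment when either flag is set. The `Transition` class, `td_target`, `episode_over`, and the idea that a critic's estimate `next_value` is already available are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Transition:
    """One stored step of experience (hypothetical layout for this sketch)."""
    reward: float
    next_value: float  # critic's estimate V(s'), assumed to be given
    terminated: bool   # environment ended naturally (success, failure, ...)
    truncated: bool    # episode was cut off, e.g. by a step limit


def td_target(t: Transition, gamma: float = 0.99) -> float:
    """One-step TD target r + gamma * V(s'), dropping V(s') only on termination."""
    # A truncated episode did not really end, so its future value still counts.
    bootstrap = 0.0 if t.terminated else 1.0
    return t.reward + gamma * bootstrap * t.next_value


def episode_over(t: Transition) -> bool:
    """The environment must be reset after either kind of ending."""
    return t.terminated or t.truncated


if __name__ == "__main__":
    ended = Transition(reward=1.0, next_value=10.0, terminated=True, truncated=False)
    cut_off = Transition(reward=1.0, next_value=10.0, terminated=False, truncated=True)
    print(td_target(ended, gamma=0.9), td_target(cut_off, gamma=0.9))
    print(episode_over(ended), episode_over(cut_off))
```

Under this assumption, treating a truncated step as if it were terminated would set its future value to zero, which biases the value estimate for states reached near the step limit.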

It is now 00:57 on May 17, 2023. Whether it is the latte I drank this afternoon or the events of the past two days, I still do not feel sleepy, and thoughts keep surfacing and arguing in my head. After some back and forth, I decided that rather than lying in bed with my mind racing, or drinking to help myself sleep, I would come to my desk and write a post: sort through what is on my mind and turn the stream of chaotic ideas into organized, logical text.

Red beans grow in the southern land; how many branches sprout when spring comes?

I hope you gather plenty, for nothing speaks more of longing.


Common

  • Implement reinforcement learning algorithms with care: small details often decide whether training converges and how well it performs. This post records the pitfalls I ran into and the details worth watching while implementing various reinforcement learning algorithms, and it is updated continuously...

And here's my self-implemented RL algorithm library: https://github.com/KezhiAdore/RL-Algorithm
