October 2022 roundup

joydeep bhattacharjee
Oct 31, 2022

This is a monthly newsletter where I collate interesting blogs and papers that I have encountered during the month. Currently it covers advances in ML, semiconductors, and software engineering.

Machine Learning

Paper: Mega: Moving Average Equipped Gated Attention

link: https://arxiv.org/abs/2209.10655

The main reason behind the huge success of the transformer architecture across tasks and domains such as language, audio, images, and video is the self-attention mechanism.

Still, there are two major drawbacks in the attention mechanism:

  1. Weak inductive bias
  2. Quadratic computational complexity

These issues hamper the application of transformers to long-sequence tasks.
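To see where the quadratic cost comes from, here is a minimal sketch of plain scaled dot-product self-attention in NumPy (my own illustration, not code from the paper). The n × n score matrix is what blows up on long sequences.

```python
import numpy as np

def self_attention(q, k, v):
    # q, k, v: (n, d). The score matrix is (n, n), so time and
    # memory both grow quadratically with the sequence length n.
    n, d = q.shape
    scores = q @ k.T / np.sqrt(d)                   # (n, n)
    scores -= scores.max(axis=-1, keepdims=True)    # stabilize softmax
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ v                              # (n, d)

x = np.random.randn(1024, 64)    # 1024 tokens already imply a
out = self_attention(x, x, x)    # 1024 x 1024 score matrix
```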

In this paper, the authors propose a moving-average-equipped gated attention mechanism (MEGA). The inductive bias is introduced by using an exponential moving average (EMA) approach. This enables the model to both learn complex dependency patterns and use computationally efficient chunk-wise attention.

source: MEGA paper
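To make the EMA idea concrete, below is a simplified sketch of the damped EMA recurrence used in MEGA. In the real model the decay parameters are learned per dimension and the recurrence is computed efficiently as a convolution; the scalar toy values and the sequential loop here are just for illustration.

```python
import numpy as np

def damped_ema(x, alpha, delta):
    # Damped EMA: y_t = alpha * x_t + (1 - alpha * delta) * y_{t-1}.
    # The recency bias of this recurrence is the inductive bias
    # that plain attention lacks.
    y = np.zeros_like(x[0])
    out = []
    for x_t in x:  # O(n) in the sequence length
        y = alpha * x_t + (1.0 - alpha * delta) * y
        out.append(y)
    return np.stack(out)

x = np.random.randn(128, 16)  # (seq_len, dim)
y = damped_ema(x, alpha=0.9, delta=0.8)
```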

Another major difference is the choice of the output attention function. Softmax is generally the most popular, but the squared ReLU function has recently been shown to give the best results. Since squared ReLU is unstable during training, a Laplace attention function is proposed, which is a more stable approximation of the squared ReLU function.

laplace function: f_laplace(x) = 0.5 · [1 + erf((x − μ) / (σ√2))], with μ = √(1/2) and σ = √(1/(4π))

As per the experiments, softmax is better on text tasks, but the Laplace-based function is better on image tasks.
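As a rough illustration (the inputs are my own toy values, but the constants μ = √(1/2) and σ = √(1/(4π)) are the ones reported in the paper), the Laplace function can be compared against squared ReLU directly in NumPy:

```python
import numpy as np
from scipy.special import erf

def squared_relu(x):
    return np.maximum(x, 0.0) ** 2

def laplace(x, mu=np.sqrt(0.5), sigma=np.sqrt(0.25 / np.pi)):
    # Smooth, bounded approximation of squared ReLU based on the
    # Gaussian CDF, which is stable to train.
    return 0.5 * (1.0 + erf((x - mu) / (sigma * np.sqrt(2.0))))

x = np.linspace(-1.0, 1.5, 6)
print(squared_relu(x))
print(laplace(x))
```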

Paper: PACS: A Dataset for Physical Audiovisual CommonSense Reasoning

link: https://arxiv.org/abs/2203.11130

One of the major current trends in machine learning is improving AI systems' physical common sense about everyday objects. Common sense is a myriad of simple facts about everyday life, plus the ability to make use of that knowledge when appropriate.

source: pinterest.com

To push such models forward, the authors have built and released the PACS dataset, the first audiovisual benchmark annotated for physical commonsense attributes. The underlying tasks are mostly binary: given a question and a couple of choices, the model needs to pick the most appropriate one.

link: https://github.com/samuelyu2002/PACS
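Conceptually, a data point can be pictured as follows. Note that the field names are my own illustration, not the dataset's actual schema:

```python
# Hypothetical PACS-style example: a physical commonsense question
# about two objects, each backed by video and audio clips, with a
# binary label indicating the correct object.
example = {
    "question": "Which object would make a louder sound when dropped?",
    "object1": {"video": "object1.mp4", "audio": "object1.wav"},
    "object2": {"video": "object2.mp4", "audio": "object2.wav"},
    "label": 0,  # index of the correct object
}
```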

To measure human performance, 243 data points were taken from the dataset and given to 10 annotators. Half of the annotators worked with audio and half without, in order to quantify the importance of audio as well.

Below are descriptions of the models used to establish the benchmarks.

Backbones used:

  • Image model: Vision Transformer (ViT)
  • Audio model: Audio Spectrogram Transformer (AST)
  • Video model: Temporal Difference Network (TDN)
  • Text model: DeBERTa V3 (large)

Baseline models created:

  • LateFusion encodes all four modalities as a simple input baseline: the unimodal embeddings are concatenated and a linear layer creates the embedding used for prediction (see the sketch after this list).
  • CLIP embeds image and text into a shared vector space.
  • AudioCLIP extends CLIP to include audio inputs as well. A linear layer is used to project the audio embeddings into the same vector space as the text embeddings.
  • UNITER is a model trained on four different image-text tasks. Here the two objects are split and two object-question embeddings are created; these are concatenated and an MLP is used to classify the answer.
  • Merlot Reserve uses all the modalities and achieves the best results.
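As a rough illustration of the simplest baseline, here is a minimal LateFusion sketch in PyTorch. The dimensions and names are my own, and the backbone encoders are abstracted away as precomputed embeddings:

```python
import torch
import torch.nn as nn

class LateFusion(nn.Module):
    # Concatenate unimodal embeddings, then classify with one
    # linear layer, as in the LateFusion baseline described above.
    def __init__(self, img_dim, aud_dim, vid_dim, txt_dim, num_classes=2):
        super().__init__()
        self.classifier = nn.Linear(
            img_dim + aud_dim + vid_dim + txt_dim, num_classes)

    def forward(self, img_emb, aud_emb, vid_emb, txt_emb):
        fused = torch.cat([img_emb, aud_emb, vid_emb, txt_emb], dim=-1)
        return self.classifier(fused)

# Toy usage with random tensors standing in for backbone outputs.
model = LateFusion(768, 768, 512, 1024)
logits = model(torch.randn(1, 768), torch.randn(1, 768),
               torch.randn(1, 512), torch.randn(1, 1024))
```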

In the results, there is a big gap between model performance and human performance. As per the authors, the models are not able to utilize all the information in the multimodal input, and they also fail to capture a strong understanding of common sense.

Semiconductor Engineering

Fan-Out WLP: Various papers

Chip packaging is one of the steps in the manufacturing process. The important goals of any packaging process are size, cost, performance, and yield. It should also be easy to perform die testing.

Fan-out wafer-level packaging (FOWLP) is a promising technology for this, and recently there have been a lot of publications on improving it.

In earlier approaches, the wafer is diced first and the individual dies are then packaged, so package sizes become bigger. In wafer-level packaging (WLP), the integrated circuits are packaged while still part of the wafer, so the resulting package is almost the same size as the die. But the number of I/Os gets reduced, since they must all fit within the die footprint.

In fan-out WLP, the wafer is also diced first, but the dies are then repositioned on a carrier wafer, with space for fan-out kept around each die. A redistribution layer (RDL) is then built over the entire molded area, so the I/Os can fan out beyond the die edges.

source: https://ase.aseglobal.com/en/technology/fan_out

There are two categories of fan-out process flows: 1. die-first, otherwise known as mold-first, and 2. RDL-first.

source: https://semiengineering.com/fan-out-packaging-gets-competitive/

RDL-first approaches have some distinct advantages: high-density RDLs can be achieved with finer line width/space, which gives more performance, more I/Os, multi-chip integration, and better testing.

References:

  1. https://en.wikipedia.org/wiki/Fan-out_wafer-level_packaging
  2. https://semiengineering.com/fan-out-packaging-gets-competitive/
  3. https://semiengineering.com/solving-fan-out-wafer-level-warpage-challenges-using-material-science/

Thermal Scanning Probe Lithography for MoS2 transistors

link: https://pubs.acs.org/doi/10.1021/acsami.2c10150

2D semiconductors hold huge promise for circumventing the barriers to further miniaturization in semiconductors.

But fabricating devices from 2D materials is quite challenging, as their properties can be significantly altered by chemical and physical fabrication processes.

For example, UV radiation can affect the interface between graphene and SiO2, induce hysteresis in the electronic characteristics, and reduce the charge carrier mobility.

source: https://www.jstage.jst.go.jp/article/jsapmeeting/2019.1/0/2019.1_3560/_pdf

In the paper, thermal scanning probe lithography (t-SPL) is used for patterning small features, and direct laser writing (DLW) is used for larger features away from the 2D materials.

t-SPL is an emerging direct-write nanolithography method for improved nanopatterning: a sharp, heated tip is used to generate high-resolution patterns.

source https://www.epfl.ch/labs/lmis1/research/t-spl/

Currently, electron-beam lithography is considered the workhorse for modern high-end photomasks, but this method is expensive. Electron scattering also occurs, leading to additional undesired resist exposure that has to be corrected with compute-intensive algorithms.

Scanning probe lithography is another option, in which patterns are created using a nanometer-sharp tip to locally induce modifications. But the write speeds are quite slow, on the order of 0.1–50 μm/s, and hence impractical for production purposes.

In t-SPL, only small volumes need to be heated, and the writing speeds are fast, on the order of mm/s. There are no charged particles involved, thus avoiding unwanted covalent bonds. The current challenges are throughput and tip degradation.

The other method used in the paper is direct laser writing, where an ultrafast laser beam is focused into a small volume inside a photosensitive resin. It works via nonlinear absorption of two or more photons by photosensitive monomers.

source: https://www.sciencedirect.com/topics/materials-science/multiphoton-lithography

Using these techniques, fabrication is gentle and clean, which reduces the risk of deteriorating the properties of the MoS2.

Conclusion

Thanks for reading. October was a very exciting month, with various new trends in machine learning and semiconductors.

Please follow me on Twitter for more real-time updates.
