The SAMOS XXI Conference (Virtual)

The International Conference on Embedded Computer Systems: Architectures, Modeling and Simulation (SAMOS) XXI took place a couple of weeks ago. Like all other events in the past 18 months, it was virtual due to Covid-19. For more background on the SAMOS conference, see my blog post about SAMOS XIX (from 2019). This year, I presented a tutorial about our public release of the Simics simulator and took the chance to listen to most of the other conference talks.

Gather.Town virtual conference system

In 2020, SAMOS was a series of videos on YouTube. Not ideal, but the best that could be achieved on short notice in the early days of Covid.

This year, SAMOS used the gather.town virtual world system to provide a sense of physical presence. The system looks like a 1990s video game, with squarish sprites running around in a scrolling 2D world made up of multiple rooms. Users can talk to each other when they are close enough in the world, and other users can join in to create a group conversation. This works pretty well as a way to grab someone and have a conversation. Each user is supposed to use a microphone and a camera to show themselves when interacting with other users. It appears that gather.town is intended as a group collaboration and remote socialization tool, not primarily as a conference system.

When you arrived at the conference, you landed in the outdoor area of the virtual SAMOS island. The virtual Intel booth was there as well – the light area marks a zone where everyone can hear each other, just like at a real trade-show booth. If you walk up to the roll-up banner on the right, the top of it pops up on your screen.

For the conference talks, the SAMOS organizers made use of the gather.town “presenter spots”. When a user is standing on such a spot, everyone in the same virtual room can hear and see them. In a typical conference setting, there is one spot for the presenter, one for the session chair, and one or more open mics for questions (just as in the physical world).

To present talks, gather.town relies on screen sharing. With a dual-monitor setup this worked OK for me as a presenter… but it is a bit hard to keep an eye on everything at once. More refined and focused conference-calling tools like Teams or Zoom do this better – in particular, I like being able to upload PowerPoint presentations for broadcast from within the system, or to share a single window.

Presenting pre-recorded videos (as a few presentations did) was not all that smooth. gather.town does not appear to have a built-in function for showing videos; instead, a moderator had to share their screen, play the video there, and get the sound out by sharing audio as well (Mathias Jung got this working nicely, but I don’t think all users are as technically savvy). gather.town appears to be designed primarily for live interaction, which makes sense for its main target market of workplace collaboration and social gatherings.

I also tried accessing the conference using the gather.town app. This was strictly worse than using the web interface. Particularly annoying was that it detected when other programs were in focus and switched the sidebar view shown above. To me, this makes the app useless, since I like to take notes in OneNote while listening to talks – with the app, the talk would immediately disappear when I moved over to OneNote. The web browser had none of these issues.

Other notes

The conference schedule worked well for a virtual conference. Each day was kept to four to five hours of talks, which is about right. Longer than that, and people will drop off or lose their attention. The physical world variant of SAMOS used a similar schedule, dedicating the afternoons to social events and discussions.

Unfortunately, the social component did not work for the virtual SAMOS. It seems that getting people to stay online and discuss requires more organization or dedicated sessions. The system we used definitely supported it, but people did not seem all that interested in socializing virtually.

Indeed, the main shortcoming of the virtual SAMOS compared to the in-person event I attended two years ago was the lack of fun discussions after the talks. The free-flowing post-talk discussions that make SAMOS such a great event never really materialized in the virtual format, and we had very few senior people attending. It seems that if you are a busy senior researcher or practitioner, it is much easier to set aside time for discussions when you are attending an event in person.

Keynote on “UPEC”

The best talk of the conference was the keynote by Prof. Dr.-Ing. habil. Wolfgang Kunz from the Technische Universität Kaiserslautern. He talked about his work (in collaboration with Intel, among others) on using formal methods to find (or prove the absence of) transient execution effects that violate the security assumptions and guarantees of the specified instruction-set semantics.

TEE (transient execution effect) is the latest term for the class of problems that became famous with Spectre, Meltdown, et al. – that is, issues where the way processors execute instructions “under the hood” leads to unintended information leakage that should not be possible at the defined instruction-set level.

The method he and his group employ to catch TEEs is called UPEC, Unique Program Execution Checking.

The idea is clever: take two instances of a system (in RTL). Both start with the same initial state for the instruction-set-defined, user-visible part of the total system state, but differ in the secret/non-visible part. The state evolution of the two systems is explored side by side, and if at any point the user-visible state differs between the two, that execution is a counter-example showing how non-visible state leaks into the visible state.
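To make the two-copy idea concrete, here is a toy Python sketch of my own (the real UPEC runs formal property checking on the actual RTL; this is just a lockstep simulation of an invented machine with a planted leak, so everything in it is made up):

```python
# Toy illustration of the UPEC two-copy idea - my own sketch, not the tool.
# An invented machine has a visible register plus a secret; a planted
# "forwarding" bug lets one secret bit influence the visible state.

def step(visible, secret):
    """One step of the toy machine."""
    new_visible = (visible + 1) % 16
    if visible == 7:                  # the buggy forwarding condition
        new_visible ^= secret & 1     # a secret bit leaks into visible state
    return new_visible

def upec_check(initial_visible, secret_a, secret_b, max_steps=32):
    """Run two copies that agree on the visible state but differ in the
    secret; any visible divergence is a counter-example showing a leak."""
    va = vb = initial_visible
    for t in range(1, max_steps + 1):
        va, vb = step(va, secret_a), step(vb, secret_b)
        if va != vb:
            return t                  # leak detected at step t
    return None                       # no divergence within the bound

print(upec_check(initial_visible=0, secret_a=0, secret_b=1))  # prints 8
```

The real method explores the state evolution exhaustively with formal tools rather than simulating a single trace, which is what makes it possible to prove the absence of leaks rather than merely hunt for them.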

Using this method, they found a transient effect in the simple Rocket RISC-V core! This is an in-order, five-stage, non-speculating basic RISC core that you would expect to be immune to TEEs… However, that is not the case. The core-to-cache interface turns out to use some forwarding logic, and that is sufficient to allow data to leak across memory protection boundaries! See the paper “Processor Hardware Security Vulnerabilities and their Detection by Unique Program Execution Checking” by M. R. Fadiheh, D. Stoffel, S. Mitra, C. Barrett, and W. Kunz, presented at Design, Automation & Test in Europe (DATE), March 2019, and available on arXiv.

Another insight from the talk was that issues found this way might not be solved in the RTL/hardware design, but instead pushed to the firmware/software side. As long as the hardware behavior is known and documented, (trusted) firmware or software can take care not to expose the upper layers of the stack to the issue.

Negative results

Just like previous SAMOS conferences, there was a session on “negative results”. I must admit I felt that this edition was not quite as good as last time, but there were still some nice insights.

I found one paper in particular interesting as an example of a general issue. The paper, “(When) do Multiple Passes Save Energy?” by Louis Narmour, Tomofumi Yuki and Sanjay Rajopadhye, addressed a question about the power efficiency effects of tiling in parallel computation tasks.

Slide from the talk, showing the very nice effect of a black background in the gather.town presentation context.

The paper explored the trade-off between maximizing the number of parallel compute tasks and the increased combined cache pressure that comes from having more tasks run in parallel. An earlier paper had claimed that reducing the number of parallel compute tasks would increase the overall cache efficiency of the computation, and that this would in turn provide the same computation at up to 10% less energy (but with an increase in completion time due to less parallel work). The authors of the SAMOS XXI paper tried to replicate the results and found that they could not reproduce the promised savings in their setting. The key difference seems to be the transition from CPU-based to GPU-based computation – which should surprise nobody, as different types of processors should be expected to have different trade-offs.
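To make the trade-off concrete, here is a deliberately crude energy model of my own (neither the model nor any of the constants come from the paper; they are invented for illustration): more parallel tasks shorten the run and thus cut static energy, but raise cache pressure, lower the hit rate, and thus raise dynamic energy, so the minimum-energy point is not necessarily at maximum parallelism.

```python
# Crude, invented model of the parallelism-vs-cache-pressure trade-off.
# Illustration only: the model and all constants are made up, not taken
# from the SAMOS paper.

def energy_joules(p, work=1e9, rate_per_task=1e8, static_power_w=0.05,
                  hit_energy_nj=1.0, miss_energy_nj=10.0,
                  base_hit_rate=0.95, pressure_per_task=0.01):
    # More parallel tasks -> more combined cache pressure -> lower hit rate.
    hit_rate = max(0.0, base_hit_rate - pressure_per_task * (p - 1))
    # Dynamic energy: a cache miss costs more energy than a hit.
    dynamic = work * (hit_rate * hit_energy_nj
                      + (1.0 - hit_rate) * miss_energy_nj) * 1e-9
    # Static energy: leakage power integrated over the (shorter) run time.
    time_s = work / (p * rate_per_task)
    return dynamic + static_power_w * time_s, time_s

for p in (1, 2, 4, 8, 16):
    energy, time_s = energy_joules(p)
    print(f"{p:2d} tasks: {energy:4.2f} J in {time_s:5.2f} s")
```

With these made-up constants the energy minimum lands at two tasks rather than sixteen, at the cost of an eight-times-longer run – the same qualitative shape as the original claim. Whether that holds on real hardware depends entirely on the constants, which is exactly the CPU-versus-GPU point.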

To me, this paper pointed at a general issue I have had with some academic papers for a long time: a lack of clarity about their underlying assumptions and area of applicability. As a practitioner in industry, the first thing I want to know is just when an idea or technique makes sense, and when it does not. Too often, it seems that researchers either do not consider this aspect at all, or are afraid to make a clear statement about it since that would make their research seem less valuable or general.

Trends

The papers presented this year exhibited some trends that seem to be quite universal today.

RISC-V is definitely the most popular target architecture for software work – adding new instructions, implementing libraries, etc. Most of this work is actually connected to the RTL cores, with not so much simulation work. Indeed, it is rather popular to work at the RTL level, using open-source RISC-V cores running on FPGAs. There are at least six different open-source RISC-V cores in RTL available for researchers to play with. Arm is trying to muscle in on this with free access to some simple cores in RTL, but the RISC-V university cores are apparently winning.

Current deep learning and CNN technology comes up quite often – both as the target of hardware implementations and software architectures, and as a tool used to produce results in lieu of exhaustive search or other optimization techniques.

gem5 is the de facto standard simulator for cycle-accurate simulations, which it has been for a rather long time.

A bit late

It took a few weeks after the conference to get this post out – sorry about that.
