Observations from Uppsala Computer Simulation, Virtual Platforms, Embedded Programming, Multicore and More (by Jakob Engblom)

Memory Models: x86 is TSO, TSO is Good

2011 June 22 17:16 / 1 Comment / Jakob

By chance, I got to attend a day at the UPMARC Summer School with a very enjoyable talk by Francesco Zappa Nardelli from INRIA. He described his work (along with others) on understanding and modeling multiprocessor memory models. It is a very complex subject, but he managed to explain it very well.

He showed a very interesting discussion from a few years ago on the x86 memory model and the implementation of spinlocks in the Linux kernel. Various experts went back and forth over whether the final MOV that sets a lock variable to 1 needed to be prefixed by LOCK or not. The discussion seemed to end when Linus Torvalds said “I know that it is needed”, only for an Intel architect to finally intervene and say “you know, really, it isn’t needed”. This was followed by a series of releases of Intel manuals documenting the x86 memory model, with increasing precision in each release. Intel also actually changed the published rules along the way, withdrawing some optimizations as they realized that they would break existing software.

Note that such a description of a memory model must both describe existing hardware, and serve as the guideline for future hardware. Therefore, there are optimizations that are not implemented today but which are possible given the rules. Such optimization opportunities can be removed from the rulebook as long as they have never been part of shipping hardware, so it is not as crazy as it might sound.

Anyway, Francesco's aim was both to tell an interesting story from history and to make the point that describing and understanding memory models is hard. I certainly agree with that. I recall an ISCA many years ago when some computer architecture professors all agreed that very few people really understand consistency and weak memory models.

To make life easier for programmers, Francesco and Peter Sewell (in Cambridge) have defined their own set of rules for x86 memory consistency. This is not an architecture spec, but a rule set for regular programmers. It is found at http://www.cl.cam.ac.uk/~pes20/weakmemory/. Essentially, the conclusion is that x86 in practice implements the old SPARC TSO memory model.
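
What TSO allows can be made concrete with a toy abstract machine. The following is a minimal sketch, not code from the talk or from the Cambridge group: it assumes the usual operational picture of TSO, in which each hardware thread has a private FIFO store buffer, loads check the thread's own buffer before shared memory, and a fence waits until the buffer has drained. The function name `explore`, the tuple-based instruction encoding, and the programs `SB` and `SB_FENCED` are all hypothetical names chosen for illustration. The sketch exhaustively explores every interleaving of the classic store-buffering test.

```python
def replace(tup, i, value):
    """Return a copy of tuple `tup` with element i set to `value`."""
    return tup[:i] + (value,) + tup[i + 1:]


def explore(programs, buffered=True):
    """Return every reachable tuple of final register values.

    Instructions are ("store", addr, value), ("load", addr, reg) or ("fence",).
    With buffered=True each thread has a FIFO store buffer (toy TSO);
    with buffered=False stores hit memory at once (sequential consistency).
    Registers are reported in sorted order of their names.
    """
    n = len(programs)
    start = ((0,) * n, ((),) * n, (), ())  # pcs, buffers, memory, registers
    seen, work, outcomes = set(), [start], set()
    while work:
        state = work.pop()
        if state in seen:
            continue
        seen.add(state)
        pcs, bufs, mem, regs = state
        done = all(pcs[t] == len(programs[t]) for t in range(n))
        if done and not any(bufs):
            outcomes.add(tuple(v for _, v in regs))
            continue
        for t in range(n):
            # Step 1: the oldest buffered store of thread t may reach memory.
            if bufs[t]:
                addr, val = bufs[t][0]
                new_mem = tuple(sorted({**dict(mem), addr: val}.items()))
                work.append((pcs, replace(bufs, t, bufs[t][1:]), new_mem, regs))
            if pcs[t] == len(programs[t]):
                continue
            # Step 2: thread t may execute its next instruction.
            op = programs[t][pcs[t]]
            next_pcs = replace(pcs, t, pcs[t] + 1)
            if op[0] == "store":
                _, addr, val = op
                if buffered:
                    new_bufs = replace(bufs, t, bufs[t] + ((addr, val),))
                    work.append((next_pcs, new_bufs, mem, regs))
                else:
                    new_mem = tuple(sorted({**dict(mem), addr: val}.items()))
                    work.append((next_pcs, bufs, new_mem, regs))
            elif op[0] == "load":
                _, addr, reg = op
                # A thread sees its own newest pending store first.
                pending = [v for a, v in bufs[t] if a == addr]
                val = pending[-1] if pending else dict(mem).get(addr, 0)
                new_regs = tuple(sorted({**dict(regs), reg: val}.items()))
                work.append((next_pcs, bufs, mem, new_regs))
            elif op[0] == "fence" and not bufs[t]:
                # A fence can only complete once the buffer is empty.
                work.append((next_pcs, bufs, mem, regs))
    return outcomes


# Store-buffering test: each thread writes one location, then reads the other.
SB = [
    [("store", "x", 1), ("load", "y", "r0")],
    [("store", "y", 1), ("load", "x", "r1")],
]
SB_FENCED = [
    [("store", "x", 1), ("fence",), ("load", "y", "r0")],
    [("store", "y", 1), ("fence",), ("load", "x", "r1")],
]
```

In this model, `explore(SB, buffered=False)` never yields `(0, 0)`, while `explore(SB)` does: both stores can still be sitting in their buffers when the loads run. That store-to-load reordering is the one relaxation this toy machine has compared to sequential consistency, and `explore(SB_FENCED)` shows a fence between the store and the load removing it again.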

They have also attempted to formalize the Power Architecture memory model. Both the actual memory model and their model of it can only be described as very complex. The programmer’s model is expressed in terms of store queues, speculative instruction execution, and commits of instructions. Not something you easily keep in your head. It is interesting to note that ARM MPCore essentially copied the Power Architecture memory model.

He showed an interactive simulation of the Power memory model, and the way that you need to think about it in terms of propagating writes between threads and committing instructions. It is possible to propagate a value and then have another propagation override it before the thread commits… Fun. Or a headache.

The big take-away from the talk for me is that it confirms the observation made many times before that SPARC TSO seems to be the optimal memory model. It is sufficiently understandable that programmers can write correct code without having barriers everywhere. It is sufficiently weak that you can build fast hardware implementations that can scale to big machines.

Maybe TSO does not theoretically scale in the same insane way as Power or Alpha does/did. But the cost of that theoretical scalability is that programmers might have to litter their code with sync operations just to get it to run correctly. With too many sync operations, the code will run very slowly, negating any advantage on the hardware level. Note that sync operations can be very expensive. Doug Lea, in the audience, pointed out that a sync can cost up to 300 cycles on a POWER5.
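
The barrier trade-off can be illustrated with a message-passing test: one thread writes some data and then sets a flag, another reads the flag and then the data. This is the same shape as the spinlock release discussed earlier, where writes inside the critical section are followed by a plain store to the lock word. The sketch below is a toy under loud assumptions: `outcomes`, `MP`, and `MP_BARRIER` are hypothetical names, and the `fifo=False` machine, whose store buffer may drain in any order, is only a crude stand-in for a weaker model. It is not a faithful model of Power or Alpha, which can also reorder loads, so on real weak hardware the reader side needs ordering as well.

```python
def outcomes(programs, fifo=True):
    """Enumerate final register tuples on a toy store-buffer machine.

    Instructions: ("store", addr, value), ("load", addr, reg), ("barrier",).
    fifo=True  : buffered stores reach memory in program order (TSO-like).
    fifo=False : buffered stores to different addresses may reach memory
                 in any order (hypothetical weaker machine).
    """
    results, seen = set(), set()

    def put(tup, i, value):
        return tup[:i] + (value,) + tup[i + 1:]

    def visit(pcs, bufs, mem, regs):
        state = (pcs, bufs, mem, regs)
        if state in seen:
            return
        seen.add(state)
        if not any(bufs) and all(pc == len(p) for pc, p in zip(pcs, programs)):
            results.add(tuple(v for _, v in regs))  # sorted by register name
            return
        for t, prog in enumerate(programs):
            buf = bufs[t]
            # Drain one buffered store to shared memory.
            for i, (addr, val) in enumerate(buf):
                overtakes_same_addr = any(a == addr for a, _ in buf[:i])
                if i > 0 and (fifo or overtakes_same_addr):
                    continue  # only the oldest entry may go, or same-address order
                new_mem = tuple(sorted({**dict(mem), addr: val}.items()))
                visit(pcs, put(bufs, t, buf[:i] + buf[i + 1:]), new_mem, regs)
            if pcs[t] == len(prog):
                continue
            # Execute the next instruction of thread t.
            op, next_pcs = prog[pcs[t]], put(pcs, t, pcs[t] + 1)
            if op[0] == "store":
                visit(next_pcs, put(bufs, t, buf + ((op[1], op[2]),)), mem, regs)
            elif op[0] == "load":
                pending = [v for a, v in buf if a == op[1]]
                val = pending[-1] if pending else dict(mem).get(op[1], 0)
                new_regs = tuple(sorted({**dict(regs), op[2]: val}.items()))
                visit(next_pcs, bufs, mem, new_regs)
            elif op[0] == "barrier" and not buf:
                # The barrier completes only after earlier stores are visible.
                visit(next_pcs, bufs, mem, regs)

    visit((0,) * len(programs), ((),) * len(programs), (), ())
    return results


READER = [("load", "flag", "r0"), ("load", "data", "r1")]
MP = [[("store", "data", 1), ("store", "flag", 1)], READER]
MP_BARRIER = [[("store", "data", 1), ("barrier",), ("store", "flag", 1)], READER]

STALE = (1, 0)  # flag observed as set, but data still old
```

With `fifo=True` the stale outcome never appears for `MP`, so no barrier is needed. With `fifo=False` it does appear, and only the `MP_BARRIER` variant rules it out. Every such barrier stalls the writer until its buffer has drained, which is where the cost of sync operations comes from.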

