
Sunday, March 7, 2010

iPad Saga - Week 6: Apple's Mobile Device Strategy

This article has been moved to http://www.technomicon.com/iPad_Saga_-_Week_6.html .  Please go to my new web site.

Thanks,

Mark W. Hibben

20 comments:

  1. I think you are far out.

    " Intel’s have become incredibly bloated in terms of transistor count, physical size, and power dissipation."
    Really? Do you believe that myth?

    PA6T and SpecFP?

  2. The Apple and Amiga crowds are similar in some respects, such as their capacity for wishful thinking... :-P

    Who will be the manufacturer of NG Amiga CPU/SoC?
    http://a-eon.com/6.html

  3. Amiga X1000 PPC: what CPU will it use?

    PA6T?

  4. PPC is a joke compared to x86 in raw processing power and pricing for desktops. PPC is a joke when compared to power requirements and pricing compared to ARM for mobile devices.

    Once Jobs backstabbed the AIM Alliance by destroying the Mac clone market, that was the beginning of the end of the PPC for desktops. It lives on in gaming consoles and the embedded market for now.

    You know it's game over when even Freescale has stopped pushing PPC in favor of their ARM solutions.

  5. "Compared to RISC processors, Intel’s have become incredibly bloated in terms of transistor count, physical size, and power dissipation."

    ????


    "On a modern processor, if x86 decode hardware takes up twice as many transistors as RISC decode hardware, then you're only talking about a difference of, say, 4% of the total die area vs. 2%. (I used to have the exact numbers for the amount of die area that x86 decode hardware uses on the Pentium 4, but I can't find them at the moment.)"
    http://arstechnica.com/old/content/2005/11/5541.ars

  6. Response:

    I doubt that the relative bloat between Intel processors and comparable RISC processors is due exclusively to x86 translation, but the bloat is definitely there, and the numbers speak for themselves. Just look at the comparison between the PA6T and the Core 2 Duo transistor counts that I offered in this week's article. There's a reason why RISC is used almost exclusively in embedded systems, including ARM's. Why is this so controversial?

  7. "Why is this so controversial?"
    Because you are comparing different things.

    "There's a reason why RISC is used almost exclusively in embedded systems, including ARM's."
    And iMacs and MacBooks are embedded systems? I don't think so. ;-)

    "Just look at the comparison between the PA6T and the Core 2 Duo transistor counts that I offered in this week's article."
    Again, you are comparing different things: logic transistor count vs. total transistor count.

    Also, the numbers you are using are not established facts.
    For example, I have these numbers:

    About 200 M transistors
    21 M transistors per core
    http://www.power.org/devcon/07/Session_Downloads/PADC07_Hayter_PADC_FINALv3-jcc.pdf


    Where can I find the SPECfp numbers?

  8. Response:

    Thanks for the info on the PA6T total transistor count. So, we have 200 M transistors for the PA6T vs. 291 M for the Core 2 Duo T7600. Considering that the PA6T had on-board dual channel memory controllers and the Core 2 did not, I would call the Core 2 pretty bloated.

    Mark Hibben

  9. We don't have (real) facts about the PA6T. We don't even know whether the PA6T can compete with a Core 2 Duo on performance.


    Counterexample:

    PowerPC 970MP: Transistor Count 183 million
    https://www-01.ibm.com/chips/techlib/techlib.nsf/techdocs/B9C08F2F7CF5709587256F8C006727F1/$file/970MP_DS_DD1.1x_v1.3_17Jan2008_pub.pdf

    Intel Allendale (Core 2 Duo): 167 million transistors

  10. "So, we have 200 M transistors for the PA6T vs. 291 M for the Core 2 Duo T7600."

    Yes - if the numbers are right.
    But don't forget that the PA6T has, AFAIK, less cache.

    T7600 = 4MB vs. PA6T = 2 MB

    I found this:
    "1MB (1024K) of cache = 62500*1024 = 64 million transistors for 1MB of L2 cache."
    I don't know if it is true, but it sounds plausible.

    So the T7600 has about 256 million transistors just for L2 cache!
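
As a rough cross-check of the cache figures in this comment, here is a minimal sketch. It assumes a conventional six-transistor (6T) SRAM cell and counts data bits only, ignoring tags, ECC, and redundancy. The 6T assumption and the function name `cache_transistors` are illustrative and do not come from the thread.

```python
# Assumption (not from the thread): 6 transistors per SRAM bit cell,
# data array only; tag, ECC, and redundancy overhead are ignored.
TRANSISTORS_PER_BIT = 6
BITS_PER_MIB = 8 * 1024 * 1024


def cache_transistors(mib, per_bit=TRANSISTORS_PER_BIT):
    """Estimate the transistor count of a cache data array of `mib` mebibytes."""
    return mib * BITS_PER_MIB * per_bit


# Rule of thumb quoted in the comment: 62500 * 1024 transistors per MB.
quoted_per_mb = 62500 * 1024

for mib in (2, 4):
    estimate = cache_transistors(mib)
    quoted = mib * quoted_per_mb
    print(f"{mib} MB: 6T estimate {estimate / 1e6:.1f} M, "
          f"quoted rule {quoted / 1e6:.1f} M")
```

Under these assumptions the data array of a 4 MB cache comes to about 201 M transistors, against 256 M from the quoted rule. Real caches add tag and redundancy overhead, so neither number should be read as exact.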

  11. Response:

    Your counterexample isn't really reasonable or fair, since the Allendale E series isn't a mobile processor and isn't in the same power/performance class. For instance, the Core 2 Duo E6300 has a max TDP of 65 W and scores worse in SPECint2000 (1939), and SPECfp2000 (1978), but you are correct on transistor count (167 M) and Cache size (2 MB). A better comparison is with the Core 2 Duo T5670 (65 nm process), 2 MB L2 Cache, 35 W TDP, and 291 M transistors.

    Also your math on cache size seems a bit flaky, since you calculate a total cache transistor count of 256 M, which is almost the total transistor count of the chip (291 M). You may find this plausible, but I do not.

    Mark Hibben

  12. Response:

    The comparison with the Allendale Core 2 Duo is unreasonable, since the Allendale series is a desktop processor and is not comparable in power. For instance, the E6300 has a max TDP of 65 W and slightly lower SPECint2000 (1939) and SPECfp2000 (1978) scores. A better comparison, and my counter-counterexample, is the Core 2 Duo T5670, a mobile processor with 35 W max TDP, 2 MB L2 cache, and a 291 M transistor count.

    Also, I find your calculation of transistor count for the cache a little suspect. You calculate 256 M transistors, which is most of the entire chip transistor count of 291 M. You may consider this plausible, but I do not.

    Mark Hibben

  13. "Your counterexample isn't really reasonable or fair, since the Allendale E series isn't a mobile processor and isn't in the same power/performance class."

    Doesn't really matter. TDP is totally overrated for the Core 2 Duo.
    Take a look at X-bit labs:
    TDP = 65 W, but:
    Core 2 Duo E8500: idle 3.4 W, 100% load 33.4 W
    Core 2 Duo E8200: idle 2.5 W, 100% load 27.7 W
    http://www.xbitlabs.com/articles/cpu/display/intel-wolfdale_11.html#sect0

    OK, take Merom. Merom is a mobile processor:
    Number of transistors: 167 million.
    Core 2s with 4 MB of level 2 cache have 291 million transistors; Core 2s with 2 MB of L2 have 167 million.


    PA6T: You wrote:
    SPECint2000 > 2000
    SPECfp2000 > 4000

    Where do I find these numbers?

  14. You seem to be forgetting one thing: the A4 is ARM based. Current Apple desktop and laptop CPUs are x86 based. A lot of apps (including *huge* apps like Adobe's suite) are now x86-only: how are you going to run these apps on your A4-based laptop?

    No... I'm sure A4 will end up in anything the size of the iPhone/iPad, but you surely won't see it on laptop/desktops anytime soon.

  15. Response:

    Your assertion that Merom is 167 M transistors is simply in error. The counter-example I gave of the T5670 is a Merom processor. After checking Intel's documentation, I find that all the T5xxx series are listed as having 291 M transistors and 2 MB L2 cache.

    The performance results were published in the Lockheed Martin study which I have repeatedly referenced in the blog. You might want to go back to week 2. I'm sorry that the numbers are stated as > rather than specific values. L-M may have done this because of classification guidance they received from the US government.

    Mark Hibben

  16. Response to comment that A4 is ARM based:

    I'm certainly well aware that rumors that A4 is ARM based have been widely reported and accepted uncritically as fact. As a dues paying member of the iPhone Developer Program and having downloaded the beta SDK, I can understand this. The whole point of this series of blog articles is to take issue with the conclusion that A4 is ARM based. I could hardly have forgotten it.

    Mark Hibben

  17. "Your assertion that Merom is 167 M transistors is simply in error. The counter-example I gave of the T5670 is a Merom processor. After checking Intel's documentation, I find that all the T5xxx series are listed as having 291 M transistors and 2 MB L2 cache."

    I think there is an error in Intel's documentation, because it makes no sense.
    Do you really think more cache costs nothing?

    "Number of transistors: 167 million."
    http://www.chiplist.com/Intel_Core_2_Duo_T5xxx_T7xxx_series_mobile_processor_Merom_2M_Socket_P/tree3f-subsection--2313-/

    ->
    Core 2s with 4Mb of level 2 cache have 291 million transistors; Core 2s with 2Mb L2 have 167 million.

    291 - 167 = 124
    124 / 2 = 62

    -> around 62 million transistors for 1 MB of L2 cache


    "The performance results were published in the Lockheed Martin study which I have repeatedly referenced in the blog."
    OK, but there is no link. I also doubt these numbers (especially SpecFP 2000 > 4000).

    PS:
    I compared Allendale with PowerPC 970MP (= Dual G5).
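
The subtraction earlier in this comment can be written out as a short sketch. It assumes, as the commenter does, that the 291 M and 167 M parts differ only in having 4 MB versus 2 MB of L2 cache. The helper name `per_mb_from_difference` is hypothetical, and the "non-cache" remainder is only what that assumption implies, not a published figure.

```python
# Transistor counts quoted in the thread, in millions.
MEROM_4MB_L2 = 291
MEROM_2MB_L2 = 167


def per_mb_from_difference(big, small, extra_mb):
    """Transistors per MB of cache, assuming the two dies differ only in cache size."""
    return (big - small) / extra_mb


per_mb = per_mb_from_difference(MEROM_4MB_L2, MEROM_2MB_L2, extra_mb=2)

# What would be left for everything other than L2 on the 2 MB part,
# if the same per-MB figure applied to its cache as well.
non_cache = MEROM_2MB_L2 - 2 * per_mb

print(f"per MB of L2: {per_mb:.0f} M transistors")
print(f"implied non-L2 remainder: {non_cache:.0f} M transistors")
```

The per-MB figure is close to the 64 M rule of thumb quoted in comment 10, which is the commenter's point. It says nothing about whether a given 2 MB part is a separate die or a 4 MB die with half the cache disabled.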

  18. Mark, if you erase my posts it will not change the facts.

    Someone told me that one Conroe core has 19 million transistors (his source is Intel).

  19. Because of the large volume of comments, and the fact that they come in at all hours of the day and night, from all over the world, I will only post comments once per day. If you don't see your comment right away, be patient.

    As far as the Merom transistor count goes, I don't believe that Intel's documentation is in error; it's just not very detailed guidance. After reviewing it for a large number of processors, it appears that they simply list two different die sizes and transistor counts for the 65 nm process mobile processors: 111 mm^2 with 167 M transistors and 143 mm^2 with 291 M transistors. Apparently Intel, in their usual corporate paranoia, think that more detailed specs are somehow competition sensitive. But I still consider their numbers authoritative, if approximate. I agree that there should be some variation from processor to processor, especially for different cache sizes, but if Intel has assigned a processor to the 291 M class, then that's the number I will use. I still believe that chiplist.com is in error.

    Having said that, I think we're not merely losing the forest for the trees, we're stumbling over tree roots by getting so wrapped up in debates about transistor counts. My basic premise that the RISC architecture is inherently more power efficient must be true, or else ARM (whose acronym once stood for Advanced RISC Machine) would not have come to dominate the low power end of the embedded market. The problem with total transistor counts is that they tend to be obscured by other on-die functions which have nothing to do with the processor core architecture. The information I provided was for reference, and not intended as definitive proof. I agree that I have not yet adequately substantiated my sweeping claim that x86 processors have become "incredibly bloated".

    The point of this article is to identify a "mega-trend" in which RISC processors are spreading from their base in the low power handset market to a larger role in mobile computing. This is the significance of the A4 regardless of the specific RISC architecture that it incorporates.

    A final note: various readers have expressed the opinion that I am somehow biased in favor of PPC. Nothing could be further from the truth. I agree that Intel processors are superior in desktop processing, which is why I own a water-cooled, overclocked Core 2 Quad 9550 based machine that I assembled myself, and I'm looking forward to upgrading to the new 6 core i7. But I also own about a half a dozen other computers, split between Mac and PC, and I am a dues paying member of the Apple iPhone Developer Program. My interest in processor architecture for the A4 is purely pragmatic: I want to see Apple use the best architecture possible. But I'll still be developing for the iPad regardless.
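
The two die classes quoted in this comment (111 mm^2 with 167 M transistors, 143 mm^2 with 291 M) can be compared by transistor density. The sketch below uses only those quoted figures. The variable and function names are illustrative, and the reading of the result is an interpretation, not something Intel's documentation states.

```python
# (transistors in millions, die area in mm^2), as quoted in the comment.
SMALL_DIE = (167, 111)
LARGE_DIE = (291, 143)


def density(transistors_m, area_mm2):
    """Average transistor density in millions of transistors per mm^2."""
    return transistors_m / area_mm2


small_density = density(*SMALL_DIE)
large_density = density(*LARGE_DIE)

# Density of only the additional area on the larger die.
marginal_density = (LARGE_DIE[0] - SMALL_DIE[0]) / (LARGE_DIE[1] - SMALL_DIE[1])

print(f"small die:  {small_density:.2f} M/mm^2")
print(f"large die:  {large_density:.2f} M/mm^2")
print(f"added area: {marginal_density:.2f} M/mm^2")
```

The added 32 mm^2 carries far more transistors per unit area than either die does on average. That would be consistent with the extra area being mostly cache, since SRAM arrays generally pack more densely than logic, but these figures alone do not prove it.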

  20. "If you don't see your comment right away, be patient."

    OK. Sorry!!!
    :-)


    "As far as the Merom transistor count, I don't believe that Intel's documentation is in error, it's just not very detailed guidance.
    [...]
    I agree that there should be some variation from processor to processor, especially for different cache sizes, but if Intel has assigned a processor to the 291 M class, then that's the number I will use. I still believe that chiplist.com is in error."

    Mark, I think they (Intel) produce just one chip (= 291 M transistors) and deactivate 2 MB of the cache. That would explain the difference.

    (e.g. Conroe-2M and Conroe-4M)
