Future Research

© Grebyn Corp. 2006


General Purpose CPU / System Platforms

I have a request in to Azul Systems, which also has a try-and-buy program for its network-attached compute device with similar characteristics, albeit marketed as a Java compute engine. I don't yet know whether that machine allows native execution of programs in other languages. (1/2/2007 - Nope. Java only.)

It would also be interesting to look at other systems, such as the Cell Processor, the KiloCore CPU, and other commodity multi-core systems as they become available.

Another wild, off-the-wall, out-of-the-box approach would be to implement the algorithm in an Application-Specific Integrated Circuit (ASIC) or Field-Programmable Gate Array (FPGA) to see whether either could be used here, since an implementation on a standard commercial CPU and operating system wastes so much chip real estate and functionality.

Algorithm Research

The Aha! moment of moving the implementation from decimal powers to binary powers for storage has lit a fire regarding extending to higher powers (from 8 bits to 16, possibly even 32). At the next boundary (16 bits), the current naive implementation of pre-computed tables will require 8GB per thread (or 256GB for a T1000 environment). This could possibly be addressed through more or larger disk drives or network-attached storage (after all, the T1000 does have four Gigabit Ethernet (GbE) connections).
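One way to account for those figures (an assumption on my part about the table layout, not a description of the actual implementation) is a table indexed by a pair of n-bit chunks, with each entry n bits wide. A quick sketch of that sizing:

```python
# Hypothetical sizing of the pre-computed tables: assume a table indexed
# by a pair of n-bit chunks, each entry n bits (n/8 bytes) wide. This
# reproduces the 8GB-per-thread figure at n = 16; the real table layout
# may differ.

def table_bytes(n_bits):
    entries = 2 ** (2 * n_bits)   # one entry per pair of n-bit chunks
    entry_size = n_bits // 8      # bytes per entry
    return entries * entry_size

for n in (8, 16, 32):
    per_thread = table_bytes(n)
    t1000_total = per_thread * 32   # 32 hardware threads on a T1000
    print(f"{n:2d}-bit chunks: {per_thread:,d} bytes/thread, "
          f"{t1000_total:,d} bytes for 32 threads")
```

Under this accounting, 16-bit chunks come out to exactly 8GB per thread and 256GB for 32 threads, while 32-bit chunks balloon to roughly 64 exabytes per thread, which explains why 32 bits is only "possibly" on the table.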

Increasing the data lengths will eventually favor 64-bit systems over 32-bit systems because of their ability to perform arithmetic natively on larger values. How large that improvement will be is currently unknown.
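The general argument (not the project's actual code) is that a schoolbook multiply of two k-bit numbers split into n-bit limbs takes (k/n)&#178; single-word multiplies, so doubling the limb width cuts the count by a factor of four. A sketch that counts the primitive multiplies:

```python
# Schoolbook multiprecision multiply over fixed-width limbs, counting
# primitive single-word multiplies. Illustrative only -- it shows why
# 64-bit limbs need a quarter of the multiplies that 32-bit limbs do.

def to_limbs(x, limb_bits):
    mask = (1 << limb_bits) - 1
    limbs = []
    while x:
        limbs.append(x & mask)
        x >>= limb_bits
    return limbs or [0]

def mul(a, b, limb_bits):
    """Multiply a*b limb by limb; return (product, multiply count)."""
    la, lb = to_limbs(a, limb_bits), to_limbs(b, limb_bits)
    mask = (1 << limb_bits) - 1
    result = [0] * (len(la) + len(lb))
    count = 0
    for i, x in enumerate(la):
        carry = 0
        for j, y in enumerate(lb):
            count += 1
            t = result[i + j] + x * y + carry
            result[i + j] = t & mask
            carry = t >> limb_bits
        result[i + len(lb)] += carry
    prod = 0
    for k, limb in enumerate(result):
        prod |= limb << (k * limb_bits)
    return prod, count

a = 2**1024 - 159   # two arbitrary 1024-bit operands
b = 2**1024 - 189
p32, n32 = mul(a, b, 32)
p64, n64 = mul(a, b, 64)
assert p32 == p64 == a * b
print(n32, n64)   # 1024 multiplies with 32-bit limbs, 256 with 64-bit
```

The 4x reduction in primitive multiplies is an upper bound on the win; memory traffic and the relative cost of the wider multiply instruction decide how much of it shows up in practice.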

The current implementation also does not take advantage of the extensive memory available on the T1000. It is not clear whether there is a convenient way to do so without overly complicating it.

The current implementation is also entirely non-distributed and provides no reasonable mechanism for check-pointing or management of the work performed. I just ran across The Eight Fallacies of Distributed Computing, which will certainly have to be taken into consideration when distributing the workload.
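The minimal shape of check-pointing is simple enough to sketch (the file name, format, and helper names below are my own, hypothetical): record the last fully-processed work unit so an interrupted run resumes rather than restarting from zero.

```python
# Minimal checkpoint/resume sketch (hypothetical file format and names):
# persist the next unprocessed work unit after each unit completes.
import json
import os
import tempfile

CHECKPOINT = "search.ckpt"

def save_checkpoint(path, next_unit):
    # Write to a temp file, then rename: a crash mid-write must not
    # leave a corrupt checkpoint behind.
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(path) or ".")
    with os.fdopen(fd, "w") as f:
        json.dump({"next_unit": next_unit}, f)
    os.replace(tmp, path)

def load_checkpoint(path):
    if not os.path.exists(path):
        return 0   # no checkpoint: fresh start
    with open(path) as f:
        return json.load(f)["next_unit"]

start = load_checkpoint(CHECKPOINT)
for unit in range(start, start + 5):   # stand-in for real work units
    pass                               # ... process unit ...
    save_checkpoint(CHECKPOINT, unit + 1)
```

Managing distributed work adds the harder half: handing out units to remote workers, timing out the ones that never report back, and not trusting the network in exactly the ways the Eight Fallacies enumerate.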

It's not clear that the implementation really takes maximum advantage of the multicore architecture. Articles in the industry, such as Multicore faces a long road and Making the Move to Multicore, along with my experience here, make it clear that programming multicore systems for performance is going to take more than the simplistic approach of running N copies of a homogeneous application in order to maximize utilization. The T1000 is clearly a nice platform for research and development on multicore programming.
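One step past "N independent copies" (a generic pattern, not a claim about the right design for this problem) is a shared work queue: worker threads pull variable-sized units as they finish, so fast threads aren't idle while slow ones grind through their fixed share. A sketch:

```python
# Shared work queue feeding worker threads -- unlike N independent
# copies, idle threads immediately pick up whatever work remains.
import queue
import threading

work = queue.Queue()
results = queue.Queue()

def worker():
    while True:
        unit = work.get()
        if unit is None:              # sentinel: no more work
            break
        results.put(unit * unit)      # stand-in for the real computation

NTHREADS = 4
threads = [threading.Thread(target=worker) for _ in range(NTHREADS)]
for t in threads:
    t.start()
for unit in range(100):               # enqueue the work units
    work.put(unit)
for _ in threads:
    work.put(None)                    # one sentinel per worker
for t in threads:
    t.join()
print(results.qsize())                # all 100 results completed
```

On a T1000 the interesting questions start after this: whether 32 threads contending on one queue scales, and whether the units are sized so the shared L2 and memory system stay busy rather than thrashed.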
