Showing posts with label SGE. Show all posts

Monday, June 2, 2008

Big Buck Bunny Rocks

Way back when (June 19, 2007 at ~4am) I sent mail to Ton at Blender. I mentioned that we had a compute grid that already ran Blender and that, while I couldn't commit any time myself, I could put him in touch with the people who could if he was interested. Happily, he replied that they were indeed looking for a render farm sponsor.


Woot! On May 29, 2008 I received my copy of the Big Buck Bunny DVD. I brought it inside and we paused the Stanley Cup Playoffs so we could watch it. It looks really good and I think it was worth the occasional pain on our part in supporting the rendering (also because my name is in the credits as I bought a pre-release copy of the DVD to further support the development effort :) ).


Big Buck Bunny was rendered on Network.com (press release) using Blender, which is part of the Network.com application catalog.




To focus on the things that I think most people reading Sun blogs would find interesting, here is the rendering process as described in the Big Buck Bunny blogs. My role was down in the Sun Grid - Network.com bubble.


Renderfarm overview


If you look at the blogs there is plenty of information describing the development process, the rendering process and the fact that there were some complications in the execution. Interestingly, one of the most common problems from my perspective was the large number of core files from the various Blender processes. These cores would at times fill up /var in the zones they were running in, causing the Blender jobs to fail. Because of the implementation, a single sub-task failure would cascade and stop the whole rendering process.


I "fixed" the core issue (coreadm is your friend): cores are now disabled by default, and end users who want to capture them must make a local directory in their job setup scripts to hold their own cores. I believe that the cascading stop has also been made into an option that is controllable in the job definition through the portal.
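For anyone who does want their cores back, a job setup script could do something along these lines. This is only a sketch: the function name setup_core_dir and the directory layout are made up for illustration, and whether per-process core dumps are permitted in a given zone is up to the administrator. The coreadm call is skipped on systems that do not have the command.

```shell
#!/bin/sh
# Hypothetical job setup fragment: create a private directory for core
# files and point this shell (and the processes it starts) at it.
setup_core_dir() {
    core_dir="$1"
    mkdir -p "$core_dir" || return 1
    # coreadm is a Solaris command; skip it quietly where it is missing.
    if command -v coreadm >/dev/null 2>&1; then
        # Per-process core pattern: %f = executable name, %p = process ID.
        coreadm -p "$core_dir/core.%f.%p" $$
    fi
    printf '%s\n' "$core_dir"
}

# Usage in a job setup script (the path is illustrative):
#   setup_core_dir "$PWD/cores"
```

Keeping the cores in a directory the job owns means a crashing process fills the job's own space rather than /var in the zone.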


Go take a look at Big Buck Bunny and marvel at the power of open source software. Then realize how cool it is that a company such as Sun would spend/donate real time, money and resources on a Creative Commons animated movie. I'm glad we did it and I hope we can continue to do so in the future. (Maybe the next one as well.)


Monday, January 21, 2008

Network.com in Vegas (belated)

I have had these pictures sitting on my laptop for about half a year. I pulled a couple out for a screencast that I still haven't published because the audio blows. I have been doing a better job again of publishing pictures shortly after I take them. Network.com is the Sun Grid Compute Utility; here are a few pictures from the on-site work.

Network.com setup in Vegas, Bellagio, Glass Flower Ceiling
Network.com setup in Vegas, Shawn, Courtney
Network.com setup in Vegas, Compute Nodes
Network.com setup in Vegas, Thumper disk bay, green
Network.com setup in Vegas, Torx Peg in a Hex Hole. I didn't put it in there :), I just removed it so we could relocate the rails.
Network.com setup in Vegas, Courtney loves Storage
Network.com setup in Vegas, Core Switch Time Servers

The Whole Gallery

Monday, November 26, 2007

SGE quick and dirty how to find jobs on 'bad' slots

I occasionally have a need to find queues in Sun Grid Engine that are in one of the possibly problematic states which have an occupied slot. It is just infrequent enough that I don't remember exactly how I did it the last time.

qstat -f | awk '$6~/[cdsuE]/ && $3!~/^[0]/'
queuename qtype used/tot. load_avg arch states
zone.q@r130c24z0.network.com BIP 1/1 -NA- sol-amd64 adu
zone.q@r130c24z1.network.com BIP 1/1 -NA- sol-amd64 adu

An alternative is "qstat -f | awk '$6~/[cdsuE]/ && $3~/^[1-9]/'", which also avoids printing the header line. In the example above the 'states' value in $6 ('adu') matches 'd' and 'u', and the 'used' count in $3 does not begin with '0'.
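As a sketch, the header-free filter can be wrapped in a small shell function and tried against saved text instead of a live grid. The function name busy_bad_queues and the two example.com hosts are made up for illustration; the two network.com lines are the ones from the output above.

```shell
#!/bin/sh
# Keep queue instances whose states column ($6) contains c, d, s, u or E
# and whose used/tot. column ($3) starts with a non-zero digit.
# Reads `qstat -f` style text on stdin.
busy_bad_queues() {
    awk '$6 ~ /[cdsuE]/ && $3 ~ /^[1-9]/'
}

# Sample input: header, the two lines shown above, then two made-up hosts
# (one in a bad state with no used slot, one busy but healthy).
busy_bad_queues <<'EOF'
queuename qtype used/tot. load_avg arch states
zone.q@r130c24z0.network.com BIP 1/1 -NA- sol-amd64 adu
zone.q@r130c24z1.network.com BIP 1/1 -NA- sol-amd64 adu
zone.q@idle.example.com BIP 0/1 -NA- sol-amd64 adu
zone.q@ok.example.com BIP 1/1 0.02 sol-amd64
EOF
```

Only the two network.com lines should come out: the header fails the $3 test, the idle host has no used slot, and the healthy host has an empty states column. On a live system the function would simply sit at the end of the pipe, as in "qstat -f | busy_bad_queues".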

The possibly more elegant 'qstat -f -qs cdsuE' still requires a second comparison in awk of '$0!~/--/' to filter out the queue separator lines. (qstat -f -qs cdsuE | awk '$0!~/--/ && $3!~/^[0]/')
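The awk half of that pipeline can be sketched on its own and fed canned text. The function name drop_separators_and_idle, the separator width and the example.com host are illustrative assumptions, not real qstat output.

```shell
#!/bin/sh
# Post-filter for `qstat -f -qs cdsuE` style text on stdin: drop the dashed
# separator lines and any line whose third field starts with 0.
drop_separators_and_idle() {
    awk '$0 !~ /--/ && $3 !~ /^[0]/'
}

# Sample text with separators between the queue instances.
drop_separators_and_idle <<'EOF'
queuename qtype used/tot. load_avg arch states
----------------------------------------------------------------
zone.q@r130c24z0.network.com BIP 1/1 -NA- sol-amd64 adu
----------------------------------------------------------------
zone.q@idle.example.com BIP 0/1 -NA- sol-amd64 d
EOF
```

With this sample the header line and the r130c24z0 line remain. Unlike the '^[1-9]' form, this filter keeps the header, because 'used/tot.' does not start with '0'.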


Finally, because I can never remember exactly what all the queue states are and the qstat man page doesn't have the nice table:


aoACD – Number of queue instances that are in at least one of the following states:
a – Load threshold alarm
o – Orphaned
A – Suspend threshold alarm
C – Suspended by calendar
D – Disabled by calendar

 

cdsuE – Number of queue instances that are in at least one of the following states:
c – Configuration ambiguous
d – Disabled
s – Suspended
u – Unknown
E – Error
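Since the states column packs several letters together (for example 'adu'), a tiny decoder saves a trip back to the table. This is a minimal sketch: the function name describe_queue_states is made up, and the descriptions are copied from the two lists above, so any letter not listed there is reported as not in the table.

```shell
#!/bin/sh
# Expand a queue state string such as "adu" into one line per letter.
describe_queue_states() {
    states="$1"
    while [ -n "$states" ]; do
        rest="${states#?}"          # everything after the first character
        letter="${states%"$rest"}"  # the first character itself
        case "$letter" in
            a) desc="Load threshold alarm" ;;
            o) desc="Orphaned" ;;
            A) desc="Suspend threshold alarm" ;;
            C) desc="Suspended by calendar" ;;
            D) desc="Disabled by calendar" ;;
            c) desc="Configuration ambiguous" ;;
            d) desc="Disabled" ;;
            s) desc="Suspended" ;;
            u) desc="Unknown" ;;
            E) desc="Error" ;;
            *) desc="(not in the table above)" ;;
        esac
        printf '%s  %s\n' "$letter" "$desc"
        states="$rest"
    done
}

describe_queue_states adu
```

For 'adu' this prints three lines: load threshold alarm, disabled and unknown, which is the combination seen on the two queue instances in the qstat output earlier.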

 

Job State/Status:

d(eletion), E(rror), h(old), r(unning), R(estarted), s(uspended), S(uspended), t(ransferring), T(hreshold) or w(aiting).

References: SGE (N1GE 6.0) -- Monitoring and Controlling Queues

Edit: Added Job Status. I literally couldn't find that in any of the online docs (it is ~40% of the way through the qstat(1) man page, but targeted Google searches do a poor job of finding the link).