Status of submitting CCSM3 with a geometry suggested by George Carr
- I'm using this diff to define the
geometry for CCSM3. It's a geometry that UCAR used with a beta of
CCSM3 on a small (for UCAR) system, with all the numbers cut in
half to try to fit it onto UCI's SP2 system.
- This is what I get from llq -s when I submit
this job. It doesn't look healthy. IBM has confirmed my
suspicion: having 10 node types but only 7 available nodes
was preventing the job from being dispatched for execution.
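To make the dispatch problem concrete: in a LoadLeveler task_geometry such as {(0,1) (2,3)}, each parenthesized group of task IDs is placed on its own node, so a geometry with more groups than there are available nodes can never be scheduled. The Python sketch below counts the groups and compares the count with a number of nodes. It assumes the "node types" above correspond to those parenthesized node groups, and the function names and the 10-group example geometry are hypothetical, not the actual CCSM3 geometry.

```python
import re


def node_groups(task_geometry: str) -> list:
    """Split a task_geometry value like "{(0,1) (2,3)}" into per-node task lists.

    Assumes the LoadLeveler form: braces around parenthesized groups,
    one group per node, task IDs separated by commas.
    """
    groups = re.findall(r"\(([^)]*)\)", task_geometry)
    return [[int(t) for t in g.split(",") if t.strip()] for g in groups]


def can_dispatch(task_geometry: str, available_nodes: int) -> bool:
    """A geometry that needs more nodes than exist can never be dispatched."""
    return len(node_groups(task_geometry)) <= available_nodes


# Hypothetical geometry: 10 node groups of 2 tasks each, on a 7-node machine.
geometry = "{" + " ".join(f"({2 * i},{2 * i + 1})" for i in range(10)) + "}"
print(len(node_groups(geometry)))  # 10
print(can_dispatch(geometry, 7))   # False
```

A check like this only catches the "too many node groups" case; a geometry that fits can still wait in the queue for other reasons, such as what else is running.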
Here's our LoadL_admin file
Minimalist job with a large geometry (large for our IBM SP2, anyway;
small by many other sites' standards :), as suggested by Jim Edwards.
This job does little more than identify which nodes and CPUs were
allocated to it, and then exits. Is this too minimal?
The above job was getting stuck in the queue, so Jim suggested that I
work out a minimal geometry that runs right away. I suspect the delay
may depend on what sorts of jobs are already running on the machine,
but the bullets below document what happens when I substantially
reduce the geometry of a very minimalist LoadLeveler job.
- phost is a program I got from nersc.gov that is supposed to report
the nodes and CPUs allocated to your job. NERSC indicates that it
sometimes works where $LOADL_PROCESSOR_LIST does not.
- The job I'm submitting:
llsubmit-me
- The results of that job:
- poe.stderr.84.0 -
Please note all those setgroups errors. I don't see any occurrences
of setgroups in phost_mpi.c, but when I truss phost_mpi I do see it
calling setuid() (which, for some reason, reports a setgroups
problem). I don't see any reason for phost to call setuid or
setgroups; I've e-mailed NERSC to ask whether someone there can
make an informed comment.
- poe.stdout.84.0
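For comparison with phost's report, here is a minimal Python sketch that tallies tasks per node from $LOADL_PROCESSOR_LIST. It assumes the variable holds a blank-separated list of host names with one entry per task; the function name is hypothetical, and an unset or empty variable simply yields an empty tally, which is the kind of case where a tool like phost would be needed instead.

```python
import os
from collections import Counter


def tasks_per_node(processor_list: str) -> Counter:
    """Count how many tasks landed on each host.

    Assumes a blank-separated host list, one entry per task, as in
    $LOADL_PROCESSOR_LIST.
    """
    return Counter(processor_list.split())


if __name__ == "__main__":
    # Treat an unset variable as an empty list.
    raw = os.environ.get("LOADL_PROCESSOR_LIST", "")
    counts = tasks_per_node(raw)
    if not counts:
        print("LOADL_PROCESSOR_LIST is unset or empty")
    for host, n in sorted(counts.items()):
        print(f"{host}: {n} task(s)")
```

Run as the body of a job step, this prints one line per allocated host, which can be checked against both the requested task_geometry and phost's output.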
Wed Nov 10 10:54:58 PST 2004
- I've added -x at the head of a couple of csh scripts so they echo
each command as it runs, to get more information about what's happening
- I've added a grep for task_geometry, so I get an immediate report
of the geometry at submission time. This should probably be changed
to inspect llq instead of the batch file. Or both. :)
- I don't appear to be getting a poe.cmdfile...
- Revised to report both the task_geometry in the batch file and
the task_geometry shown by llq -s
- Added a sed for the geometry in local-script
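The batch-file half of that report (the grep for task_geometry) can be sketched in Python as follows. This assumes the keyword sits on a single "# @ task_geometry = ..." line with no backslash continuation; the function name is hypothetical, and reading the geometry back from llq -s would need a parser for that command's output, which this sketch does not attempt.

```python
import re
import sys
from typing import Optional

# Matches LoadLeveler keyword lines such as "# @ task_geometry = {(0,1) (2,3)}";
# spaces around "@" and "=" are optional in this pattern.
GEOMETRY_RE = re.compile(r"^\s*#\s*@\s*task_geometry\s*=\s*(.+?)\s*$",
                         re.MULTILINE)


def batch_geometry(batch_text: str) -> Optional[str]:
    """Return the task_geometry value from a batch file's text, or None."""
    m = GEOMETRY_RE.search(batch_text)
    return m.group(1) if m else None


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1]) as f:
        print("batch file task_geometry:", batch_geometry(f.read()))
```

For example, `python show_geometry.py my_job.cmd` (both file names hypothetical) prints the geometry the job asks for before it is handed to llsubmit.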