      program main

      character array1(5)
      character array2(5)
      character array3(5)
      integer i
      integer j

      do i=1,25
         do j=1,5
            array1(j) = 'a'
            array2(j) = 'a'
            array3(j) = 'a'
         end do
         call sub(array2,i)
         print *,i,' ',array1,' ',array2,' ',array3
      end do

      end

      subroutine sub(chars,maxind)
      character chars(*)
      integer maxind

      integer i

      do i=1,maxind
         chars(i) = 'b'
      end do

      return

      end

 1 aaaaa baaaa aaaaa
 2 aaaaa bbaaa aaaaa
 3 aaaaa bbbaa aaaaa
 4 aaaaa bbbba aaaaa
 5 aaaaa bbbbb aaaaa
 6 aaaaa bbbbb aaaaa
 7 aaaaa bbbbb aaaaa
 8 aaaaa bbbbb aaaaa
 9 aaaaa bbbbb aaaaa
10 aaaaa bbbbb aaaaa
11 aaaaa bbbbb aaaaa
12 aaaaa bbbbb aaaaa
13 aaaaa bbbbb aaaaa
14 aaaaa bbbbb aaaaa
15 aaaaa bbbbb aaaaa
16 aaaaa bbbbb aaaaa
17 baaaa bbbbb aaaaa
18 bbaaa bbbbb aaaaa
19 bbbaa bbbbb aaaaa
20 bbbba bbbbb aaaaa
21 bbbbb bbbbb aaaaa
22 bbbbb bbbbb aaaaa
23 bbbbb bbbbb aaaaa
24 bbbbb bbbbb aaaaa
25 bbbbb bbbbb aaaaa

 1 aaaaa baaaa aaaaa
 2 aaaaa bbaaa aaaaa
 3 aaaaa bbbaa aaaaa
 4 aaaaa bbbba aaaaa
 5 aaaaa bbbbb aaaaa
 6 ~B bbbbb aaaaa
 7 bbbbb aaaaa
make: *** [go] Segmentation fault

 1 aaaaa baaaa aaaaa
 2 aaaaa bbaaa aaaaa
 3 aaaaa bbbaa aaaaa
 4 aaaaa bbbba aaaaa
 5 aaaaa bbbbb aaaaa
 6 aaaaa bbbbb aaaaa
 7 aaaaa bbbbb aaaaa
 8 aaaaa bbbbb aaaaa
 9 aaaaa bbbbb aaaaa
10 aaaaa bbbbb aaaaa
11 aaaaa bbbbb aaaaa
12 aaaaa bbbbb aaaaa
13 aaaaa bbbbb aaaaa
14 aaaaa bbbbb aaaaa
15 aaaaa bbbbb aaaaa
16 aaaaa bbbbb aaaaa
17 aaaaa bbbbb baaaa
18 aaaaa bbbbb bbaaa
19 aaaaa bbbbb bbbaa
20 aaaaa bbbbb bbbba
21 aaaaa bbbbb bbbbb
22 aaaaa bbbbb bbbbb
23 aaaaa bbbbb bbbbb
24 aaaaa bbbbb bbbbb
25 aaaaa bbbbb bbbbb

 1 aaaaa baaaa aaaaa
 2 aaaaa bbaaa aaaaa
 3 aaaaa bbbaa aaaaa
 4 aaaaa bbbba aaaaa
 5 aaaaa bbbbb aaaaa
 6 aaaaa bbbbb aaaaa
 7 aaaaa bbbbb aaaaa
 8 aaaaa bbbbb aaaaa
 9 aaaaa bbbbb aaaaa
10 aaaaa bbbbb aaaaa
11 aaaaa bbbbb aaaaa
12 aaaaa bbbbb aaaaa
13 aaaaa bbbbb aaaaa
14 aaaaa bbbbb aaaaa
15 aaaaa bbbbb aaaaa
16 aaaaa bbbbb aaaaa
17 aaaaa bbbbb baaaa
18 aaaaa bbbbb bbaaa
19 aaaaa bbbbb bbbaa
20 aaaaa bbbbb bbbba
21 aaaaa bbbbb bbbbb
22 aaaaa bbbbb bbbbb
23 aaaaa bbbbb bbbbb
24 aaaaa bbbbb bbbbb
25 aaaaa bbbbb bbbbb

 1 aaaaa baaaa aaaaa
 2 aaaaa bbaaa aaaaa
 3 aaaaa bbbaa aaaaa
 4 aaaaa bbbba aaaaa
 5 aaaaa bbbbb aaaaa
 6 aaaaa bbbbb aaaaa
 7 aaaaa bbbbb aaaaa
 8 aaaaa bbbbb aaaaa
 9 aaaaa bbbbb baaaa
10 aaaaa bbbbb bbaaa
11 aaaaa bbbbb bbbaa
12 aaaaa bbbbb bbbba
13 aaaaa bbbbb bbbbb
14 aaaaa bbbbb bbbbb
15 aaaaa bbbbb bbbbb
16 aaaaa bbbbb bbbbb
17 aaaaa bbbbb bbbbb
make: *** [go] Segmentation fault (core dumped)

esmf04m-strombrg> xlf -C -g why-early.f -o why-early
** main   === End of Compilation 1 ===
** sub   === End of Compilation 2 ===
1501-510  Compilation successful for file why-early.f.
Fri Jun 10 17:41:55
esmf04m-strombrg> ./why-early
 1 aaaaa baaaa aaaaa
 2 aaaaa bbaaa aaaaa
 3 aaaaa bbbaa aaaaa
 4 aaaaa bbbba aaaaa
 5 aaaaa bbbbb aaaaa
Trace/BPT trap (core dumped)
Fri Jun 10 17:42:00
esmf04m-strombrg> dbx why-early core
Type 'help' for help.
reading symbolic information ...
[using memory image in core]

Trace/BPT trap in sub at line 29 in file "why-early.f"
   29            chars(i) = 'b'
(dbx) where
sub(chars = (...), maxind = warning: Unable to access address 0x110000a28 from core
-1, 0x1), line 29 in "why-early.f"
main(), line 16 in "why-early.f"
(dbx) list 1,50
    1         program main
    2
    3         character array1(5)
    4         character array2(5)
    5         character array3(5)
    6         integer i
    7         integer j
    8
    9
   10         do i=1,25
   11            do j=1,5
   12               array1(j) = 'a'
   13               array2(j) = 'a'
   14               array3(j) = 'a'
   15            end do
   16            call sub(array2,i)
   17            print *,i,' ',array1,' ',array2,' ',array3
   18         end do
   19
   20         end
   21
   22         subroutine sub(chars,maxind)
   23         character chars(5)
   24         integer maxind
   25
   26         integer i
   27
   28         do i=1,maxind
   29            chars(i) = 'b'
   30         end do
   31
   32         return
   33
   34         end
   35
(dbx)

For general information about the method I used to dig this up, please see debugging with system call tracers. I guess what I did is really all in there, but there are some ways of combining things that aren't necessarily immediately obvious.
Anyway, I fired up 20 truss processes, one against each of your pop processes on esmf08m (pop here being part of a climatology simulation, not the e-mail protocol), saving the output from each truss to a distinct compressed text file. I had to start them in the middle of the run (or so), because starting them at the beginning generated too much output, and I had to remove everything and restart the truss processes.
I then looked near the end of all 20 files, using bunzip2 (bzip2 compresses text harder than gzip) and less -c. 19 of them seemed very similar, all having died on signal 15 (SIGTERM), but one was different: it died on signal 1 (SIGHUP).
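For anyone who would rather script that comparison than page through 20 files by hand, here is a minimal Python sketch of the same idea. It is not what I actually ran. The file naming pattern (truss.*.out.bz2) and the function names are assumptions made up for illustration, and the sketch only prints the tail of each trace; it makes no attempt to parse truss output.

```python
import bz2
from collections import deque


def tail_lines(lines, count):
    """Return the last `count` lines of an iterable, newlines stripped."""
    # A bounded deque keeps only the newest `count` items, so a huge
    # trace is never held in memory all at once.
    return [line.rstrip("\n") for line in deque(lines, maxlen=count)]


def tail_of_trace(source, count=20):
    """Decompress a bzip2 trace on the fly and return its last lines.

    `source` is a path or a binary file object.
    """
    with bz2.open(source, mode="rt", errors="replace") as handle:
        return tail_lines(handle, count)


if __name__ == "__main__":
    import glob

    # Hypothetical naming scheme: one compressed trace per traced process.
    for path in sorted(glob.glob("truss.*.out.bz2")):
        print("==", path)
        for line in tail_of_trace(path, count=5):
            print(line)
```

Printing the tails one after another makes the odd one out easy to spot by eye, which is all the manual approach was doing anyway.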
So I examined that one unusual file further, and after wading back past all the localization-related I/O that a modern *ix system generates when reporting errors, found this:
This just about has to be the first-level cause of your error. (To coin a phrase: let's call the first-level cause the one that directly results in the error, the 2nd level cause whatever caused the first level, and so on. The nth level cause is then the problem in the source code: someone's source code, not necessarily yours!)
More specifically, that value, "0400000402", just about has to be illegal, and the cause of your EINVAL. I looked at the include file that defines which bits are legal in the second argument to open on AIX 5.1 (/usr/include/sys/mode.h; the values are logically or'd together). The value pop uses to open that file is 9 octal digits long, while the flags in the include file are normally only 6 digits long, and even the obscure ones are only 8 digits long. That would seem to account for the EINVAL: there are no bit flags that are 9 digits long, and just and'ing and or'ing those values shouldn't produce a 9-digit octal value (shifts could, but they aren't common for the second argument to open).
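The digit-counting argument can be checked mechanically. The following is a minimal Python sketch of that check, not something I ran at the time. The flag table in it is a set of made-up placeholders chosen only to be at most 8 octal digits wide; it is NOT copied from the AIX include file, and you would substitute the real definitions from your own system.

```python
from typing import Iterable


def octal_digits(value: int) -> int:
    """Number of significant octal digits in a non-negative integer."""
    return len(format(value, "o"))


def unknown_bits(value: int, known_flags: Iterable[int]) -> int:
    """Return the bits of `value` that no known flag accounts for."""
    known_mask = 0
    for flag in known_flags:
        known_mask |= flag  # or the legal flags together, as open does
    return value & ~known_mask


# The suspicious second argument to open, as seen in the trace.
SUSPECT = 0o400000402

# HYPOTHETICAL placeholder flags, not the real AIX values.
HYPOTHETICAL_FLAGS = [0o000002, 0o000400, 0o40000000]

if __name__ == "__main__":
    leftover = unknown_bits(SUSPECT, HYPOTHETICAL_FLAGS)
    print(octal_digits(SUSPECT), format(leftover, "#o"))
    # prints: 9 0o400000000
```

A nonzero leftover means some bit in the value cannot have come from or'ing the known flags together, which is the situation that would be expected to draw an EINVAL.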
Now, as far as finding the 2nd..nth level causes goes (the nth level one hopefully being the one that appears in the pop source code), that can be quite a chore sometimes. The first thing to do is probably to look at the open statement that's trying to open the file /ptmp/username/rungx3v5.140/ocn/rest/rungx3v5.140.pop.r.1791-01-01-00000, and see if there's anything unusual about it. If there is, you're in luck: you just correct that problem, and you know n is 2, which is always nice to discover.
If there isn't, then you may be stuck trying to guess what's changing this flag, which can be very difficult, for reasons described on my "checking early" page.
Tools like purify (which IBM recently acquired along with Rational, I believe), libefence, valgrind, and so forth may help you get this working. The alternative is cranking up warning levels and eliminating every nth level cause the warnings catch, in the hope of hitting the "right" nth level cause corresponding to your 1st level cause (the EINVAL).
Good luck, and feel free to let me know if you require further assistance with this potentially-messy issue.
Thanks!