I don’t know if anyone else has this problem, but I sometimes want to know what dependencies will be introduced into my program by linking against a static library (on Windows). If I’m linking against a DLL, I can just run depends, which will tell me what other DLLs that DLL needs to load (and even which exports it’s pulling in), but in the case of a static library, I can’t find an analogous tool.
This came up for me recently at work: I’ve been asked to port a static library which I wrote some time ago to another platform. To get a sense of what kind of dependencies this library will want to drag along, I wanted to get a list of all its unresolved externals.
This has turned out to be harder than you might think. A few minutes with Google didn’t really turn up anything, so I took a look at dumpbin. dumpbin /symbols prints out a nice report of the library’s symbol table, complete with unresolved symbols clearly marked as UNDEF. Great, I thought: I’ll just type:
dumpbin /symbols foo.lib | grep -e '^[0-9a-fA-F]\+ [0-9a-fA-F]\{8\} UNDEF'
(this would be in a Cygwin shell, obviously) and have my list (which I’d pretty up later).
Not so fast. Perusing the list, I started seeing symbols listed as undefined which I knew to be defined in this library… hmmmm. A few more minutes spent perusing the original dumpbin output showed that they were, in fact, defined in this library! The symbols would show up once as undefined, and a second time as defined.
I can only guess that dumpbin is just concatenating the output I would get if I ran it against each .obj separately. That is, if symbol _XYZ is defined in module a.obj, and referenced in module b.obj, we get two records (one for each module):
67F 00000000 SECT183 notype () External | _XYZ
....
107 00000000 UNDEF notype () External | _XYZ
Damnit. Ok, so I’m going to have to write a little code, here. What I want to do is walk dumpbin’s output, pulling three things out of each symbol record: the symbol’s name, its undecorated name (if present), and whether or not it’s defined. The trick is that a symbol may show up more than once.
IOW, a “mark & sweep” approach: as I parse each record, I need to check to see if the symbol has already been recorded. The first time I see it, I mark it as undefined or defined according to that record; on later sightings, any record that defines it flips the mark to defined, and it never flips back. Once I’m done, I’ll sweep the data structure of any records corresponding to symbols defined inside my library.
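The bookkeeping described above can be sketched independently of dumpbin. This is a minimal illustration in Python 3, not the code from the session below; the function name `unresolved` and the `(name, is_undef)` record shape are made up for the example.

```python
def unresolved(records):
    """Return the names that are UNDEF in every record where they appear.

    `records` is an iterable of (name, is_undef) pairs, one per symbol
    table entry, in the order they appear across all the .obj members.
    """
    still_undefined = {}  # name -> True while no definition has been seen
    for name, is_undef in records:
        if name not in still_undefined:
            # Mark: first sighting, take the record at its word.
            still_undefined[name] = is_undef
        elif not is_undef:
            # A definition anywhere in the library wins.
            still_undefined[name] = False
    # Sweep: keep only the names never defined inside the library.
    return [name for name, undef in still_undefined.items() if undef]


records = [
    ("_XYZ", False),    # defined in a.obj
    ("_XYZ", True),     # referenced from b.obj
    ("_memset", True),  # referenced, never defined here
]
print(unresolved(records))
```

The order of the records doesn’t matter: a symbol ends up in the result only if no record anywhere defines it.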
I fired up a Python shell, even tho this kind of little reporting problem “feels” like Perl to me, so that I could horse around with these ideas interactively:
>>> import os, re
>>> f = os.popen("dumpbin /symbols foo.lib", "r")
>>> x = f.readline()
>>> print x
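On a current Python 3, the same line-at-a-time reading can also be done with the subprocess module. What follows is a hedged sketch rather than what I ran; `iter_command_lines` is a name invented here, and it assumes Python 3.7 or later for the `text=True` argument.

```python
import subprocess


def iter_command_lines(argv):
    """Yield the standard output of a command one text line at a time."""
    proc = subprocess.Popen(argv, stdout=subprocess.PIPE, text=True)
    try:
        for line in proc.stdout:
            yield line
    finally:
        # Release the pipe and reap the child even if the caller stops early.
        proc.stdout.close()
        proc.wait()
```

Assuming dumpbin is on the PATH, `iter_command_lines(["dumpbin", "/symbols", "foo.lib"])` would stand in for the `os.popen` call above.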
Now, the records we want generally look like this:
023 00000000 SECT9 notype External | ?FRAG_ACK@WscMsg@ani8021x@@2EB (public: static unsigned char const ani8021x::WscMsg::FRAG_ACK)
but we get lots of stuff we don’t care about like:
Section length 1, #relocs 0, #linenums 0, checksum E963A535, selection 2 (pick any)
and some stuff that’s not un-decorated:
357 00000000 UNDEF notype () External | _memset
I guessed at a regexp,
^[0-9a-f]{3} [0-9a-f]{8} (SECT[0-9a-f]+|UNDEF) [^|]+\| ([^(]+) ?(?:\((.*)\))?
but how to tell? I tried it a few times in the interpreter:
>>> for i in range(1, 25):
...     x = f.readline()
...     m = re.search("^[0-9a-f]{3} [0-9a-f]{8} (SECT[0-9a-f]+|UNDEF) [^|]+\| ([^(]+) ?(?:\((.*)\))?", x, re.I)
...     print x
...     if m: print m.groups()
...     else: print None
...
Cool. This let me watch my regex in action over enough lines to get some confidence in my approach: it was discarding the stuff about which I didn’t care, and parsing what I wanted.
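The same check can be made repeatable by running the pattern against the three sample lines quoted above. This is a Python 3 sketch; the only change to the pattern is that it’s written as a raw string, and the `samples` list is just those lines copied from this post (the name `SYMBOL_RE` is mine).

```python
import re

SYMBOL_RE = re.compile(
    r"^[0-9a-f]{3} [0-9a-f]{8} (SECT[0-9a-f]+|UNDEF) [^|]+\| ([^(]+) ?(?:\((.*)\))?",
    re.I,
)

samples = [
    # A decorated C++ symbol with its undecorated form in parentheses.
    "023 00000000 SECT9 notype External | ?FRAG_ACK@WscMsg@ani8021x@@2EB "
    "(public: static unsigned char const ani8021x::WscMsg::FRAG_ACK)",
    # Noise that should not match at all.
    "Section length 1, #relocs 0, #linenums 0, checksum E963A535, selection 2 (pick any)",
    # A plain C symbol with no undecorated form.
    "357 00000000 UNDEF notype () External | _memset",
]

for line in samples:
    m = SYMBOL_RE.search(line)
    print(m.groups() if m else None)
```

Two things to watch for: the second group can carry trailing whitespace, so it needs a `strip()` before use, and the pattern assumes a three-digit hex index as in the samples. If a library’s symbol table produces longer indices, the `{3}` would presumably have to become `+`, as in the grep pattern earlier.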
So, let’s do this:
>>> program = re.compile("^[0-9a-f]{3} [0-9a-f]{8} (SECT[0-9a-f]+|UNDEF) [^|]+\| ([^(]+) ?(?:\((.*)\))?", re.I)
>>> print program
<_sre.SRE_Pattern object at 0x00A507B8>
With the regex now compiled, we’re ready to rock:
>>> f.close()
>>> data={}
>>> f = os.popen("dumpbin /symbols foo.lib", "r")
>>> x = f.readline()
>>> while x:
...     m = program.search(x)
...     if m:
...         sym = m.group(2).strip()
...         if sym[0] != '.' and sym[0] != '$':
...             undefd = m.group(1) == "UNDEF"
...             und = m.group(3)
...             if not data.has_key(sym):
...                 data[sym] = [ undefd, und ]
...             elif not undefd:
...                 data[sym][0] = False
...     x = f.readline()
...
So at this point, we’ve traversed all the symbols in our library, and marked those that are undefined. Cleanup,
>>> f.close()
& sweep:
>>> undefined_symbols = []
>>> for k in data.keys():
...     if data[k][0]:
...         undefined_symbols.append([k, data[k][1]])
...
That’s it: undefined_symbols is now a list of lists, each sub-list containing two elements: the symbol name and the undecorated version (which may be None).
We can just as quickly pretty-print our results to file:
>>> f = file("C:\\tmp\\report.txt", "w")
>>> for x in undefined_symbols:
...     und = ""
...     if x[1]: und = x[1]
...     f.write("%s | %s\n" % (x[0], und))
...
>>> f.close()
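For anyone following along on Python 3, where the `file()` built-in no longer exists, the report step could look roughly like this. `write_report` and the sorted output are choices made for this sketch rather than part of the session above.

```python
def write_report(undefined_symbols, path):
    """Write one 'symbol | undecorated name' line per entry, sorted by symbol."""
    with open(path, "w") as out:
        for sym, undecorated in sorted(undefined_symbols, key=lambda pair: pair[0]):
            # Symbols without an undecorated form get an empty second column.
            out.write("%s | %s\n" % (sym, undecorated or ""))
```

The `with` block closes the file even if a write fails, so the separate `f.close()` isn’t needed.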
Of course, I still have to figure out how to enumerate template instantiations made in my library, but whose definitions were pulled in from external code…