Hello! I need help constructing a regular expression in Python that extracts the file name from an include directive in C++. I think regular expressions are a good way to solve this problem, but I'm open to other ideas.

Consider the following includes:

#include "hello.h"
#include "d/hello.h"
#include "dir/hello.h"
#include "dir\hello.h"
#include <hello.h>
#include "a\b\c.h"
#include <ref\six\eight.h>
#include "123\456/789.h"
#include "bye.h"

In all these cases, I want the name of the included file (for example, hello.h, c.h, eight.h, 789.h and bye.h). I have written a regular expression that's not working (I think). Here it is:

fileNamePattern = re.compile(r"""
[\\/"<]  #The last part of the include file path begins with these characters."
[a-zA-Z0-9_]+
[>"]      #The end of include line must end with > or "
(\s)*     #Any whitespace character (including tabs, carriage returns)
$         #The end of the string.
""", re.VERBOSE)

I call the groups() method on every match, and for every string I pass in, the result is None (meaning that none of the strings match this regular expression).

Can you help me write a regular expression that works for my case?

Thanks in advance.

All 2 Replies

import re

# Raw string, so the backslashes in the paths are kept as-is
# (otherwise \b becomes a backspace and \456 an octal escape).
text = r'''
#include "hello.h"
#include "d/hello.h"
#include "dir/hello.h"
#include "dir\hello.h"
#include <hello.h>
#include "a\b\c.h"
#include <ref\six\eight.h>
#include "123\456/789.h"
#include "bye.h"
'''

# Word characters, a dot, then the whole extension (\w+ rather than a single \w).
new_text = re.findall(r'\w+\.\w+', text)
print(new_text)  # list
print('\n'.join(new_text))  # string
print(sorted(set(new_text)))  # no duplicates, sorted for a stable order

"""Output-->
['hello.h', 'hello.h', 'hello.h', 'hello.h', 'hello.h', 'c.h', 'eight.h', '789.h', 'bye.h']
hello.h
hello.h
hello.h
hello.h
hello.h
c.h
eight.h
789.h
bye.h
['789.h', 'bye.h', 'c.h', 'eight.h', 'hello.h']
"""

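As for why the pattern in the question never matches: [a-zA-Z0-9_]+ contains no dot, so it cannot get past the . in hello.h, and nothing wraps the file name in a capturing group, so groups() could not return it anyway. Here is a minimal sketch of a repaired verbose pattern, assuming each line is tested on its own with search(). The names INCLUDE_NAME and include_name are made up for this example.

```python
import re

# Hypothetical names; a sketch under the assumptions above, not the only way to do it.
INCLUDE_NAME = re.compile(r"""
    ^\s*\#\s*include\s*   # the directive itself
    [<"]                  # opening delimiter
    (?:[^<>"]*[\\/])?     # optional directory part, up to the last slash or backslash
    ([^\\/<>"]+)          # group 1: the file name, dots included
    [>"]                  # closing delimiter
    \s*$                  # optional trailing whitespace, then end of line
""", re.VERBOSE)


def include_name(line):
    """Return the base name of the included file, or None if line is not an #include."""
    m = INCLUDE_NAME.search(line)
    return m.group(1) if m else None
```

With that, include_name(r'#include "dir\hello.h"') gives hello.h, and a line that is not an include gives None.
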
Thank you for your help. It works!

By the way, I edited your regular expression to match files that have a . in their name (for example, tree.old.h). It gives this:

re.findall(r'[\w\.]+\.h', line)
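A small check of the edited pattern, in case it helps others. Note that it is tied to the .h extension, so a header such as vector.hpp comes back truncated to vector.h. The second pattern below is only a suggested variant (not from the posts above) that accepts any extension, and the sample lines are invented for the demonstration.

```python
import re

# Invented sample lines for the demonstration.
lines = [
    '#include "tree.old.h"',
    r'#include "dir\tree.old.h"',
    '#include <vector.hpp>',
]

h_only = re.compile(r'[\w.]+\.h')      # pattern from the post (a dot needs no escape inside [])
any_ext = re.compile(r'[\w.]+\.\w+')   # assumed variant: any extension

h_results = [h_only.findall(line) for line in lines]
any_results = [any_ext.findall(line) for line in lines]

print(h_results)    # [['tree.old.h'], ['tree.old.h'], ['vector.h']]
print(any_results)  # [['tree.old.h'], ['tree.old.h'], ['vector.hpp']]
```
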