The consistency of string during each process may not be well-concerned. #3

Open
thejimmylin opened this issue Jul 29, 2020 · 3 comments
Labels
invalid This doesn't seem right

Comments

@thejimmylin
Owner

thejimmylin commented Jul 29, 2020

Consider a Fortigate config like

config firewall policy
    edit 168
        set name "policy 168"
        set uuid 14435052-3097-4d70-98c7-1dd2d60e229f
        set srcintf "jimmylin__1688"
        set dstintf "port1"
        set srcaddr "address__jimmylin__10.100.168.11/32"
        set dstaddr "all"
        set action accept
        set schedule "always"
        set service "ALL"
        set comments "\"customer\": \"Jimmy Lin\""
        set nat enable
        set ippool enable
        set poolname "ippool__jimmylin__168.100.168.11"
    next
end

(This is a .conf file, which is just plain text.)

Say the file lives in /conf/ and is named firewall_policy.conf.
We want to load it into Python so that we can do all the data processing around it, so we do

>>> with open(file='/conf/firewall_policy.conf', mode='r', encoding='utf-8') as f:
...     lines = f.read().splitlines()
...

Now, let's look deeper into it.

>>> lines[2]
'        set name "policy 168"'
>>> print(lines[2])
        set name "policy 168"
>>> lines[10]
'        set service "ALL"'
>>> print(lines[10])
        set service "ALL"
>>> lines[11]
'        set comments "\\"customer\\": \\"Jimmy Lin\\""'
>>> print(lines[11])
        set comments "\"customer\": \"Jimmy Lin\""

We'll want something like

>>> def parse(string):
...     # some string-parsing things
...     return string
...
>>> parse(lines[2])
['set', 'name', '"policy 168"']
>>> parse(lines[10])
['set', 'service', '"ALL"']
>>> parse(lines[11])
['set', 'comments', '"\\"customer\\": \\"Jimmy Lin\\""']

If we simply use

>>> def parse(string):
...    return string.split()
...

Then we will get

>>> parse(lines[2])
['set', 'name', '"policy', '168"']

Obviously that is not what we want.

If we try shlex (Simple lexical analysis), a module in the Python Standard Library

>>> from shlex import split as shlex_split
>>> def parse(string):
...    return shlex_split(string)
...

Then we will get

>>> parse(lines[2])
['set', 'name', 'policy 168']

That's great, and we can join them back together with

>>> from shlex import join as shlex_join
>>> shlex_join(parse(lines[2]))
"set name 'policy 168'"

That's almost the same as the original string, but the double quotes around policy 168 became single quotes. It's acceptable though.

But when it comes to lines[10]

>>> parse(lines[10])
['set', 'service', 'ALL']
>>> shlex_join(parse(lines[10]))
'set service ALL'

The double quotes disappear entirely: shlex.split removes them in its default POSIX mode, and shlex.join only adds quotes to tokens that need them. ALL contains no spaces or special characters, so it is left bare.

I looked up the shlex documentation, and split takes a parameter called posix that we can use.
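To see why ALL comes back bare, here is a small sketch that calls shlex.quote directly, which is the function shlex.join applies to each token. Nothing here is specific to Forti configs; the two strings are just the values from the example above.

```python
from shlex import quote

# Tokens made only of "safe" characters are returned unchanged.
print(quote("ALL"))         # ALL

# A token containing a space is wrapped in single quotes.
print(quote("policy 168"))  # 'policy 168'
```

So the quoting decision depends only on the characters in the token, not on whether the source text was quoted.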

>>> from shlex import split as shlex_split
>>> def parse(string):
...    return shlex_split(string, posix=False)
...
>>> parse(lines[10])
['set', 'service', '"ALL"']

And we can simply use the built-in str.join to turn it back.

>>> ' '.join(parse(lines[10]))
'set service "ALL"'

But when it comes to lines[11], that is not the case.

>>> parse(lines[11])
['set', 'comments', '"\\"', 'customer\\":', '\\"Jimmy', 'Lin\\""']

I think Forti devices and the shlex module parse quoted strings differently.
A Forti device sees something like

'set comments \'"customer": "Jimmy Lin"\''

set comments ""customer": "Jimmy Lin""

and shlex is seeing something like

'set comments "\\"customer\\": \\"Jimmy Lin\\""'

If we use .replace('\\"', '\''), it seems better
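For comparison, here is a sketch of what shlex's default POSIX mode does with the same comments line. In that mode a backslash inside double quotes escapes the following double quote, so the value comes through as one token. The line is written as a raw string so that it holds exactly the characters that are in the file. Note that shlex.join needs Python 3.8 or later.

```python
import shlex

line = r'        set comments "\"customer\": \"Jimmy Lin\""'

tokens = shlex.split(line)  # posix=True is the default
print(tokens)
# ['set', 'comments', '"customer": "Jimmy Lin"']

print(shlex.join(tokens))
# set comments '"customer": "Jimmy Lin"'
```

The split is correct, but the joined form uses shell-style single quotes rather than Forti's backslash-escaped double quotes.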

>>> lines[11].replace('\\"', '\'')
'        set comments "\'customer\': \'Jimmy Lin\'"'
>>> parse(lines[11].replace('\\"', '\''))
['set', 'comments', '"\'customer\': \'Jimmy Lin\'"']

But this is a dirty workaround. If there are more escape characters or more deeply nested quotes, I think it may fail. The join step is also incorrect, because it does not restore the original escaped double quotes

>>> ' '.join(parse(lines[11].replace('\\"', '\'')))
'set comments "\'customer\': \'Jimmy Lin\'"'

Is there a way to make this parsing process correct and clean?
Am I missing something?

@EiffelFly

That's almost the same as the original string, but the double quotes around policy 168 became single quotes. It's acceptable though.

But when it comes to lines[10]

parse(lines[10])
['set', 'name', 'ALL'] # i believe this is 'service', not a big deal though.
shlex_join(parse(lines[10]))
'set service ALL'

@thejimmylin
Owner Author

This one may work, but we need to test more.

import re


def parse(string):
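    # Raw string: same pattern as before, without invalid escape sequences.
    # A token is a run of double-quoted sections and/or non-space characters.
    return re.findall(r'(?:".*?[^\\]"|\S)+', string)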


with open(file=r'C:\Users\LinKeiChi\envs\jupyter\conf\firewall_policy.conf', mode='r', encoding='utf-8') as f:
    lines = f.read().splitlines()
for line in lines:
    print(parse(line))

Output:

['config', 'firewall', 'policy']
['edit', '168']
['set', 'name', '"policy 168"']
['set', 'uuid', '14435052-3097-4d70-98c7-1dd2d60e229f']
['set', 'srcintf', '"jimmylin__1688"']
['set', 'dstintf', '"port1"']
['set', 'srcaddr', '"address__jimmylin__10.100.168.11/32"']
['set', 'dstaddr', '"all"']
['set', 'action', 'accept']
['set', 'schedule', '"always"']
['set', 'service', '"ALL"']
['set', 'comments', '"\\"customer\\": \\"Jimmy Lin\\""']
['set', 'nat', 'enable']
['set', 'ippool', 'enable']
['set', 'poolname', '"ippool__jimmylin__168.100.168.11"']
['next']
['end']
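One case worth testing: the pattern above requires at least one character between the quotes, so an empty value such as "" may not be matched as a single quoted unit. Below is a sketch of a stricter variant, not a tested replacement. The names TOKEN and tokenize and the `set alias` line are hypothetical, made up only for illustration.

```python
import re

# A quoted section is: an opening quote, then any mix of backslash-escaped
# characters or characters that are neither a quote nor a backslash,
# then a closing quote. A token is a run of quoted sections and/or
# non-whitespace characters.
TOKEN = re.compile(r'(?:"(?:\\.|[^"\\])*"|\S)+')


def tokenize(line):
    """Split a config line into tokens, keeping quotes and escapes verbatim."""
    return TOKEN.findall(line)


comments = r'        set comments "\"customer\": \"Jimmy Lin\""'
print(tokenize(comments))
# ['set', 'comments', '"\\"customer\\": \\"Jimmy Lin\\""']

# Hypothetical line with an empty quoted value.
print(tokenize('set alias "" "a b"'))
# ['set', 'alias', '""', '"a b"']

# Tokens are kept verbatim, so a plain join restores the stripped line.
print(' '.join(tokenize(comments)) == comments.strip())
# True
```

Because the tokens keep their original quotes and backslashes, joining them with single spaces gives back the original line minus its indentation, as long as the tokens were separated by single spaces.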

thejimmylin added the enhancement and invalid labels and removed the enhancement label on Aug 6, 2020
@thejimmylin
Owner Author

thejimmylin commented Aug 8, 2020

  1. Forti and shlex each parse strings in their own way, but in most (all?) cases Forti can understand shlex's format, and shlex can understand Forti's.
  2. The main differences are:
  • Forti always quotes certain built-in elements/words, even when they contain no whitespace; shlex doesn't.
  • shlex uses single quotes; Forti uses double quotes.
  • Forti uses a backslash to escape a quote inside a quoted string; shlex instead mixes single and double quotes and concatenates them.
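Given these differences, one option is to convert between a Forti-style token and its plain value ourselves rather than going through shlex. The sketch below assumes that Forti only escapes double quotes and backslashes inside a quoted string; that has not been checked against Fortinet's documentation. The names forti_unquote and forti_quote are hypothetical.

```python
def forti_unquote(token):
    """Strip surrounding double quotes and resolve backslash escapes.

    Assumption: inside quotes, a backslash escapes the next character.
    """
    if len(token) >= 2 and token[0] == '"' and token[-1] == '"':
        out = []
        chars = iter(token[1:-1])
        for ch in chars:
            if ch == '\\':
                # Take the escaped character; keep a lone trailing backslash.
                ch = next(chars, '\\')
            out.append(ch)
        return ''.join(out)
    return token  # unquoted tokens are returned unchanged


def forti_quote(value):
    """Wrap a value in double quotes, escaping backslashes and quotes."""
    escaped = value.replace('\\', '\\\\').replace('"', '\\"')
    return f'"{escaped}"'


token = r'"\"customer\": \"Jimmy Lin\""'
value = forti_unquote(token)
print(value)               # "customer": "Jimmy Lin"
print(forti_quote(value))  # "\"customer\": \"Jimmy Lin\""
```

forti_quote always adds quotes, so the caller still has to remember which tokens were quoted in the original config (for example "ALL" versus accept) to reproduce a line exactly.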
