[bug] XMLSyntaxError: Start tag expected, '<' not found, line 1, column 1 #2419

Open
5 of 6 tasks
fhjgch opened this issue Oct 15, 2024 · 2 comments

fhjgch commented Oct 15, 2024

Bug report checklist

  • Searched the issues page for similar reports

  • Read the relevant sections of the documentation

  • Browsed the tutorials and tests for useful code snippets and examples of use

  • Reproduced the issue after updating with pip install --upgrade pandapower (or git pull)

  • Tried basic troubleshooting (if a bug/error) like restarting the interpreter and checking the pythonpath

Reproducible Example

See `cim2pp` notebook in tutorials:

import os  # cim2pp itself is imported as in the tutorial notebook

# folder_path points to the directory where the CIM .zip files are stored:
folder_path = os.path.join(os.getcwd(), 'example_cim')

# cgmes_files is a list containing paths to both files needed for the CIM converter:
cgmes_files = [os.path.join(folder_path, 'CGMES_v2.4.15_SmallGridTestConfiguration_Boundary_v3.0.0.zip'),
               os.path.join(folder_path, 'CGMES_v2.4.15_SmallGridTestConfiguration_BaseCase_Complete_v3.0.0.zip')]

for f in cgmes_files:
    if not os.path.exists(f):
        raise UserWarning(f"Wrong path specified for the CGMES file {f}")

net = cim2pp.from_cim(file_list=cgmes_files, use_GL_or_DL_profile='DL')

print('Conversion successful')

Issue Description and Traceback

Reading the XML files fails with the following error: XMLSyntaxError: Start tag expected, '<' not found, line 1, column 1.
This only happens to some of the files stored in example_cim, not all of them.
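To find out which files are affected, one option is to check the first bytes of every archive member for the UTF-8 byte order mark. The following is only a sketch using the standard library; the helper name `members_with_utf8_bom` is made up for illustration and is not part of pandapower, and the demo archive is built in memory rather than taken from example_cim.

```python
import codecs
import io
import zipfile


def members_with_utf8_bom(zip_source):
    """Return the names of archive members that start with the UTF-8 BOM."""
    flagged = []
    with zipfile.ZipFile(zip_source) as archive:
        for name in archive.namelist():
            with archive.open(name) as member:
                # The UTF-8 BOM is the three bytes EF BB BF.
                if member.read(3) == codecs.BOM_UTF8:
                    flagged.append(name)
    return flagged


# Demo with a small in-memory archive (hypothetical file names).
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("plain.xml", b"<root/>")
    archive.writestr("with_bom.xml", codecs.BOM_UTF8 + b"<root/>")

flagged = members_with_utf8_bom(buffer)
print(flagged)
```

For the real files, `zip_source` would be one of the paths in `cgmes_files`.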

It seems that the encoding of the files is the problem: some are encoded as 'utf-8', others as 'utf-8-bom' (UTF-8 with a leading byte order mark). After converting all files to plain 'utf-8', the file import was successful.
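The conversion described above can be scripted. This is a minimal sketch, assuming the XML files have already been extracted to disk; `strip_utf8_bom` and `rewrite_without_bom` are hypothetical helper names, not pandapower functions. It removes the three BOM bytes so that the file starts directly with `<`.

```python
import codecs
import os
import tempfile


def strip_utf8_bom(data: bytes) -> bytes:
    """Return data without a leading UTF-8 BOM, if one is present."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data


def rewrite_without_bom(path) -> bool:
    """Rewrite the file in place without a BOM; return True if it changed."""
    with open(path, "rb") as handle:
        original = handle.read()
    cleaned = strip_utf8_bom(original)
    if cleaned == original:
        return False
    with open(path, "wb") as handle:
        handle.write(cleaned)
    return True


# Demo on a temporary file (hypothetical content).
with tempfile.TemporaryDirectory() as tmp_dir:
    sample_path = os.path.join(tmp_dir, "sample.xml")
    with open(sample_path, "wb") as handle:
        handle.write(codecs.BOM_UTF8 + b"<root/>")
    changed_first = rewrite_without_bom(sample_path)
    changed_second = rewrite_without_bom(sample_path)
    with open(sample_path, "rb") as handle:
        final_bytes = handle.read()
```

Working on raw bytes avoids any re-encoding of the rest of the file; only the BOM prefix is dropped.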

Expected Behavior

Message: "Conversion successful"

Installed Versions

INSTALLED VERSIONS

commit : 0691c5cf90477d3503834d983f69350f250a6ff7
python : 3.11.8
python-bits : 64
OS : Windows
OS-release : 10
Version : 10.0.22631
machine : AMD64
processor : Intel64 Family 6 Model 186 Stepping 3, GenuineIntel
byteorder : little
LOCALE : English_United Kingdom.1252

pandas : 2.2.3
numpy : 2.0.2
pytz : 2024.2
dateutil : 2.9.0.post0
pip : 23.2.1
IPython : 8.28.0
bs4 : 4.12.3
jinja2 : 3.1.4
lxml.etree : 5.3.0
matplotlib : 3.9.2
numba : 0.60.0
scipy : 1.13.1
tzdata : 2024.2

Label

  • Relevant labels are selected
fhjgch added the bug label Oct 15, 2024

Collaborator

KS-HTK commented Oct 31, 2024

@heckstrahler @mrifraunhofer I had the same issue.

Collaborator

KS-HTK commented Nov 19, 2024

The issue seems to be the encoding passed to the XMLParser object. According to help(etree.XMLParser), this should be a libiconv encoding name, which suggests that 'UTF-8' is a valid value. But if I omit the encoding keyword, the issue no longer occurs.

What is the reason for overriding the encoding?

Relevant code section: cim_classes.py, line 488

# Leads to error
parser = etree.XMLParser(encoding='UTF-8', resolve_entities=False)
xml_tree = etree.parse(file, parser)

# No error
parser = etree.XMLParser(encoding=None, resolve_entities=False)
xml_tree = etree.parse(file, parser)

print(xml_tree.docinfo.encoding)
# prints: 'UTF-8'
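To try the second variant in isolation, here is a small self-contained sketch. It assumes lxml is installed and uses a made-up sample file with a BOM; `parse_without_encoding_override` is a hypothetical name, not the function in cim_classes.py. It only exercises the variant without an encoding override and makes no claim about how the override behaves on other lxml or libxml2 versions.

```python
import codecs
import os
import tempfile

from lxml import etree  # third-party dependency


def parse_without_encoding_override(path):
    """Parse an XML file and let the parser detect the encoding itself."""
    parser = etree.XMLParser(resolve_entities=False)
    return etree.parse(path, parser)


# Demo: a BOM-prefixed file with a UTF-8 declaration (hypothetical content).
with tempfile.TemporaryDirectory() as tmp_dir:
    sample_path = os.path.join(tmp_dir, "sample.xml")
    with open(sample_path, "wb") as handle:
        handle.write(
            codecs.BOM_UTF8
            + b'<?xml version="1.0" encoding="UTF-8"?><root/>'
        )
    tree = parse_without_encoding_override(sample_path)
    root_tag = tree.getroot().tag
    detected_encoding = tree.docinfo.encoding

print(root_tag, detected_encoding)
```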
