Importing additional Python packages

Hi all,

I am trying to import some Python packages that I need for advanced document analysis and ran into some problems that I unfortunately wasn’t able to resolve myself.
As you explained in another issue, I changed my Python path to my Anaconda base environment using:

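import os
import sys
# user_path: the Windows user folder that contains Anaconda3 (defined earlier)
python_path = os.path.join(user_path, 'Anaconda3')
sys.path.append(python_path)
sys.path.append(os.path.join(python_path, 'Lib'))
sys.path.append(os.path.join(python_path, 'Lib', 'site-packages'))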
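To see which copy of a package the interpreter actually uses once these paths are appended, a small diagnostic sketch can help. `where_is` is a hypothetical helper (not part of Liberty RPA); it relies only on `sys.modules` and `importlib.util.find_spec` from the standard library.

```python
import importlib.util
import sys


def where_is(module_name):
    """Hypothetical helper: report the file a module is, or would be, loaded from.

    Returns None if the module cannot be found on the current sys.path.
    """
    module = sys.modules.get(module_name)
    if module is not None:
        # Already imported: this is the copy that is actually in use.
        return getattr(module, "__file__", None)
    # Not imported yet: ask the import system where it would load it from.
    spec = importlib.util.find_spec(module_name)
    return spec.origin if spec is not None else None


print(where_is("json"))
```

Calling `where_is("docx")` before and after the `sys.path.append` calls shows whether `docx` resolves to the copy bundled with Liberty RPA or to the one under `Anaconda3\Lib\site-packages`.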

When I now try to import

from docx.api import Document

everything works. However, when I then try to read the document, the following error occurs:

word_doc = Document(file)
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "docx\api.py", line 25, in Document
  File "docx\opc\package.py", line 128, in open
  File "docx\opc\pkgreader.py", line 32, in from_file
  File "docx\opc\phys_pkg.py", line 101, in __init__
  File "zipfile.py", line 1258, in __init__
  File "zipfile.py", line 1321, in _RealGetContents
  File "zipfile.py", line 259, in _EndRecData
ValueError: I/O operation on closed file.

This does not happen with my local Python interpreter (using the same environment).
When I tried to work around the problem with

with open(file, "r") as f:
    word_doc = Document(file)

I also get an error:

Traceback (most recent call last):
  File "<string>", line 16, in read_table_word
  File "docx\api.py", line 25, in Document
  File "docx\opc\package.py", line 128, in open
  File "docx\opc\pkgreader.py", line 32, in from_file
  File "docx\opc\phys_pkg.py", line 101, in __init__
  File "zipfile.py", line 1258, in __init__
  File "zipfile.py", line 1325, in _RealGetContents
zipfile.BadZipFile: File is not a zip file

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "<string>", line 26, in read_table_word
AttributeError: '_io.TextIOWrapper' object has no attribute 'split'
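For comparison: a .docx file is a ZIP archive, which is why both tracebacks end in `zipfile`. When passing a file object instead of a path, it has to be opened in binary mode (`"rb"`), and it is the file object `f`, not the path `file`, that gets passed on; python-docx's `Document()` accepts either a path or such a file-like object. The sketch below shows the pattern with the standard-library `zipfile` module only; `list_docx_parts` is a hypothetical helper name.

```python
import zipfile


def list_docx_parts(path):
    """Hypothetical helper: list the parts stored inside a .docx (a ZIP archive)."""
    # Open in binary mode ("rb"); a ZIP archive cannot be read through a
    # text-mode file object such as the one returned by open(path, "r").
    with open(path, "rb") as f:
        # Pass the open file object f itself, as one would with Document(f).
        with zipfile.ZipFile(f) as zf:
            return zf.namelist()
```

With python-docx the same pattern would be `with open(file, "rb") as f:` followed by `word_doc = Document(f)`.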

I have similar issues with a couple of other packages (for example, requests and spacy). All of them work with my local interpreter but not within Liberty RPA.

Is this simply a compatibility problem that cannot be resolved (i.e., I would have to find another workaround), or am I missing something?

Also, just as a side note, in case you are wondering why I’m not simply using the built-in node for reading Word files: I wanted to extract the tables from the Word file directly.
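For pulling the tables out directly: with python-docx, each table is available via `Document(path).tables`, and each row's `cells` have a `.text` attribute. If third-party imports keep failing inside Liberty RPA, a swapped-in, standard-library-only sketch can read the same information, because a .docx stores its body in `word/document.xml` as WordprocessingML (tables are `w:tbl`, rows `w:tr`, cells `w:tc`, text runs `w:t`). `read_docx_tables` is a hypothetical name, and the sketch deliberately ignores merged cells, headers/footers and content controls.

```python
import xml.etree.ElementTree as ET
import zipfile

# WordprocessingML namespace used by word/document.xml
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def read_docx_tables(path):
    """Hypothetical helper: return each table as a list of rows of cell strings.

    Simplified: merged cells, headers/footers and content controls are not
    handled, and nested tables show up as separate entries.
    """
    with zipfile.ZipFile(path) as zf:
        root = ET.fromstring(zf.read("word/document.xml"))
    tables = []
    for tbl in root.iter(W + "tbl"):
        rows = []
        for tr in tbl.findall(W + "tr"):
            row = []
            for tc in tr.findall(W + "tc"):
                # One line per paragraph directly inside the cell.
                paragraphs = [
                    "".join(t.text or "" for t in p.iter(W + "t"))
                    for p in tc.findall(W + "p")
                ]
                row.append("\n".join(paragraphs))
            rows.append(row)
        tables.append(rows)
    return tables
```

Usage would be along the lines of `for rows in read_docx_tables(file): ...`, where each `rows` is a list of lists of cell text.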

My environment uses Python 3.9.7. If you need any other information on packages etc., please let me know :slight_smile:

Thank you,
Michaela

Hi Michaela,

Welcome to the Community!

A couple of suggestions:

  • Have you tried using Python 3.7.8 for your Conda env? This is the version mentioned in the docs for advanced usage. You can create such an environment with the following command (but please do check the Anaconda docs, as I am freewheeling a bit here):
conda create -n "myenv" python=3.7.8

You can also try importing the docx package before changing your paths (i.e., before the sys.path.append calls you mentioned). That way you will import the python-docx package that ships with Liberty RPA.
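The reason import order matters is that Python caches every imported module in `sys.modules`; later `import` statements reuse the cached object, so path changes made afterwards only affect modules that have not been imported yet. A minimal, self-contained sketch of that mechanism, using the standard-library `json` module as a stand-in for `docx` and a hypothetical module name `hypothetical_new_mod` (this illustrates general Python behaviour, not Liberty RPA internals):

```python
import importlib
import os
import sys
import tempfile


def demo_import_order(cached_name="json"):
    """Show that sys.path changes only affect modules not imported yet."""
    # Imported before the path change (stands in for importing docx first).
    original = importlib.import_module(cached_name)
    with tempfile.TemporaryDirectory() as tmp:
        # A fake module with the same name, plus one that was never imported.
        for name in (cached_name, "hypothetical_new_mod"):
            with open(os.path.join(tmp, name + ".py"), "w") as fh:
                fh.write("FROM_NEW_PATH = True\n")
        sys.path.insert(0, tmp)  # new directory is searched before all others
        importlib.invalidate_caches()
        try:
            again = importlib.import_module(cached_name)  # served from sys.modules
            new = importlib.import_module("hypothetical_new_mod")  # found in tmp
            return {
                "cached_module_unchanged": again is original,
                "new_module_from_new_path": getattr(new, "FROM_NEW_PATH", False),
            }
        finally:
            sys.path.remove(tmp)
            sys.modules.pop("hypothetical_new_mod", None)


print(demo_import_order())
# {'cached_module_unchanged': True, 'new_module_from_new_path': True}
```

Even with the fake directory placed first on `sys.path`, the already-imported module is untouched, while the never-imported one is picked up from the new location.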

I would expect that changing your Conda env’s Python version to match the one listed in the docs will resolve your incompatibility issues, but if you are still facing issues after trying the suggestions above, I’m happy to help further!
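Version matching matters mostly for packages with compiled extension modules (spaCy, for example), which are usually built for one specific Python minor version; a site-packages folder from a 3.9 environment generally can't be loaded by a 3.7 interpreter. A small sketch for comparing the two interpreters, assuming you can run Python code both inside Liberty RPA and in the Conda env; both helper names are hypothetical:

```python
import sys
import sysconfig


def interpreter_info():
    """Hypothetical helper: the details worth comparing between two interpreters."""
    return {
        "version": "%d.%d.%d" % tuple(sys.version_info[:3]),
        "executable": sys.executable,  # may be the host application when embedded
        "site_packages": sysconfig.get_paths()["purelib"],
    }


def same_minor_version(a, b):
    """True if two "X.Y.Z" version strings share the same major.minor part."""
    return a.split(".")[:2] == b.split(".")[:2]


print(interpreter_info())
print(same_minor_version("3.7.8", "3.7.13"))  # True
print(same_minor_version("3.7.8", "3.9.7"))   # False
```

Only the major.minor part is compared, since patch releases within one minor version share the same extension-module ABI.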

Kind regards,
Koen

Hi Koen,

Thank you for your super quick response.
Unfortunately, Anaconda only lets me choose 3.7 or 3.8, which by default resolves to 3.7.13 or 3.8.13.
When I ran into these problems a few weeks ago, I remember having some dependency problems. However, I don’t remember exactly which packages were affected, and I somehow found a workaround. If I run into these problems again, I might reopen the issue, if that’s OK.

As for my current problem, importing docx before changing the paths solved it :slight_smile:

Thank you very much!

All the best,
Michaela

Hi Michaela,

Glad to hear everything worked out in the end.

You can often get the exact version you need from Python.org rather than through Anaconda, but you’ll then have to use the pip package manager instead of conda.
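When several interpreters are installed side by side, one way to make sure pip installs into the right one is to run pip as a module of that specific interpreter (`<python> -m pip install ...`). A minimal sketch; `pip_install` is a hypothetical helper, and `dry_run` only builds the command without running it:

```python
import subprocess
import sys


def pip_install(python_exe, packages, dry_run=True):
    """Hypothetical helper: install packages with the pip of a given interpreter.

    Using "<python> -m pip" ties the install to that interpreter's
    site-packages rather than to whichever pip is first on PATH.
    """
    cmd = [python_exe, "-m", "pip", "install", *packages]
    if not dry_run:
        subprocess.run(cmd, check=True)
    return cmd


print(pip_install(sys.executable, ["python-docx"]))
```

For a real install you would pass the path of the Python.org interpreter, e.g. `pip_install(r"C:\Python37\python.exe", ["python-docx"], dry_run=False)`, where that path is only a placeholder for wherever the installer put it.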

Hope this helps and best of luck on your automation journey!

Kind regards,
Koen