I wrote a Python script that uses the python-swiftclient module to connect to OpenStack Object Storage and upload some files to an Object Storage container.

It works great if I upload a file that ends with the extension .gz; however, when it comes to the compressed file that ends with the extension .tar.bz2, I get an error about the 'TarFile' object having no attribute 'read' after running my script.

I’ve included the Python script and the errors I got after running it. Please show me where I’m wrong; I’m a beginner in Python and would appreciate some assistance in solving this issue.

from keystoneauth1 import session
from keystoneauth1.identity import v3
from swiftclient.client import Connection
from swiftclient.client import ClientException
import gzip
import io
import tarfile

# Create a password auth plugin
auth = v3.Password(auth_url='https://cloud.company.com:5000/v3/',
                   username='myaccount',
                   password='mypassword',
                   user_domain_name='Default',
                   project_name='myproject',
                   project_domain_name='Default')

# Create session
keystone_session = session.Session(auth=auth)

# Create swiftclient Connection
swift_conn = Connection(session=keystone_session)

# Create a new object with the contents of the Netbox database backup
container = 'netbox-backups'
with gzip.open('/var/backup/netbox_backups/netbox_2024-03-16.psql.gz', 'rb') as file:
    swift_conn.put_object(
        container,
        'object_netbox_2024-03-16.psql.gz',
        contents=file,
        content_type='application/gzip'
    )

# Confirm the presence of the object holding the Netbox database backup
obj1 = 'object_netbox_2024-03-16.psql.gz'
container = 'netbox-backups'
try:
    resp_headers = swift_conn.head_object(container, obj1)
    print("The object " + obj1 + " was successfully created")
except ClientException as e:
    if e.http_status == 404:
        print("The object " + obj1 + " was not found!")
    else:
        print("An error occurred checking for the existence of the object " + obj1)

# Create a new object with the contents of the compressed Netbox media backup
with tarfile.open('/var/backup/netbox_backups/netbox_media_2024-03-20.tar.bz2', mode='r:bz2') as file_tar_bz2:

    # Read the contents of the compressed Netbox media backup file
    file_contents = file_tar_bz2.read()

    # Create a file-like object from the contents of the compressed Netbox media backup file
    my_file_like_object = io.BytesIO(file_contents)

    # Upload the returned contents to the OpenStack Object Storage container
    swift_conn.put_object(
        container,
        'object_netbox_media_2024-03-20.tar.bz2',
        contents=file_tar_bz2,
        content_type='application/x-tar'
    )

# Confirm the presence of the object holding the compressed Netbox media backup
obj2 = 'object_netbox_media_2024-03-16.tar.bz2'
container = 'netbox-backups'
try:
    resp_headers = swift_conn.head_object(container, obj2)
    print("The object " + obj2 + " was successfully created")
except ClientException as e:
    if e.http_status == 404:
        print("The object " + obj2 + " was not found!")
    else:
        print("An error occurred checking for the existence of the object " + obj2)

Below is the error I got after running the script.

Traceback (most recent call last):
File "/opt/scripts/netbox_backups_transfer.py", line 57, in <module>
file_contents = file_tar_bz2.read()
AttributeError: 'TarFile' object has no attribute 'read'


First, let's prepare two tar files using different compression schemes for demo purposes.

$ cat foo_1.txt 
This is file 1
$ cat foo_2.txt 
This is file 2
This is file two
This is file too

# Three tar files, two compressed and one uncompressed for reference
$ tar -j -c -f foo.tar.bz2 foo_1.txt foo_2.txt 
$ tar -z -c -f foo.tar.gz foo_1.txt foo_2.txt 
$ tar -c -f foo.tar foo_1.txt foo_2.txt

$ file foo.tar.bz2 foo.tar.gz foo.tar
foo.tar.bz2: bzip2 compressed data, block size = 900k
foo.tar.gz:  gzip compressed data, from Unix, original size modulo 2^32 10240
foo.tar: POSIX tar archive (GNU)

# tar understands the contents of all three formats
$ tar tf foo.tar.bz2
foo_1.txt
foo_2.txt
$ tar tf foo.tar.gz
foo_1.txt
foo_2.txt
$ tar tf foo.tar
foo_1.txt
foo_2.txt

# The file sizes; note how much larger the uncompressed tar file is.
$ ls -l foo.tar.gz foo.tar.bz2 foo.tar
-rw-rw-r-- 1 sc sc 181 Mar 23 08:06 foo.tar.bz2
-rw-rw-r-- 1 sc sc 170 Mar 23 08:07 foo.tar.gz
-rw-rw-r-- 1 sc sc 10240 Mar 23 08:26 foo.tar

The gzip Python library only handles the compression layer. It knows nothing of the structure of the file, and just gives you bytes.

>>> import gzip
>>> gz = gzip.open('foo.tar.gz','rb')
>>> bytes = gz.read()
>>> print(len(bytes))
10240
>>> print(str(bytes)[:80])
b'foo_1.txt\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\

Notice that the length of the data is the same as the uncompressed tar file.

with gzip.open('/var/backup/netbox_backups/netbox_2024-03-16.psql.gz', 'rb') as file:
    swift_conn.put_object(
        container,
        'object_netbox_2024-03-16.psql.gz',
        contents=file,
        content_type='application/gzip'
    )

So what you're actually presenting to put_object is a decompressed stream.
Is it being re-compressed when you say content_type='application/gzip'?
It might be worth comparing the local and remote file sizes, to check whether it is actually storing a compressed version.
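A quick way to check (a minimal sketch, reusing swift_conn and the container/object names from your script) is to compare the size on disk with the Content-Length that Swift reports:

import os

# Size of the local backup file on disk
local_size = os.path.getsize('/var/backup/netbox_backups/netbox_2024-03-16.psql.gz')

# head_object() returns the response headers as a dict (names normally come back lower-cased)
resp_headers = swift_conn.head_object('netbox-backups', 'object_netbox_2024-03-16.psql.gz')
remote_size = int(resp_headers['content-length'])

print(f"local: {local_size} bytes, stored: {remote_size} bytes")

If the stored size is much larger than the local size, you know the decompressed stream is what got uploaded.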

The tarfile python library does know about tar files, and gives you a richer set of functions to deal with.

>>> tf1 = tarfile.open('foo.tar.bz2',mode='r')  # let it figure out the compression
>>> print(tf1.getnames())
['foo_1.txt', 'foo_2.txt']
>>> tf2 = tarfile.open('foo.tar.gz',mode='r')   # let it figure out the compression
>>> print(tf2.getnames())
['foo_1.txt', 'foo_2.txt']

In particular, it knows what each member file is called, and can handle the contents of the tarfile on a per member basis.
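For example, a minimal sketch (using the same demo archive) that reads each member's contents individually:

import tarfile

# Walk the archive member by member; only regular files have readable contents
with tarfile.open('foo.tar.bz2', mode='r') as tf:
    for member in tf.getmembers():
        if member.isreg():
            data = tf.extractfile(member).read()
            print(member.name, len(data), 'bytes')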

If you actually don't care about the contents of the file (you're just making a backup), you can use regular file I/O.
Here you can see that it reads the raw compressed bytes; the length matches the compressed size.

>>> rawfile = open('foo.tar.bz2',mode='rb')
>>> rawbytes = rawfile.read()
>>> print(len(rawbytes))
181

In other words, treat every single file the same way:

with open('/var/backup/netbox_backups/netbox_2024-03-16.psql.gz', 'rb') as file:
    swift_conn.put_object(
        container,
        'object_netbox_2024-03-16.psql.gz',
        contents=file,
        content_type='application/octet-stream'
    )

with open('/var/backup/netbox_backups/netbox_media_2024-03-20.tar.bz2', 'rb') as file:
    swift_conn.put_object(
        container,
        'object_netbox_media_2024-03-20.tar.bz2',
        contents=file,
        content_type='application/octet-stream'
    )

It kinda depends on what swift_conn.put_object does with content_type.

Good day, Salem. My apologies for taking so long to reply to your suggestion.

I refactored my code to read the contents of the tar.bz2 file and pass them as a file-like object to put_object, and I also changed the content type for the file transfer to "application/octet-stream". The first file was sent through to object storage, but the tar file couldn't be sent; I got an error about the 'NoneType' object having no attribute 'read'.

Please see below the attempt I made and the error which occurred afterward.

# Create a new object with the contents of the compressed Netbox media backup
with tarfile.open("/var/backup/netbox_backups/netbox_media_2024-03-24.tar.bz2", "r:bz2") as file_tar_bz2:

    # Go over each file in the tar archive...
    for file_info in file_tar_bz2:

        if file_info.isreg():
            # Read the contents...
            logger.info(f"Is regular file: {file_info.name}")
            file_contents = file_tar_bz2.extractfile(file_info).read()

        elif file_info.isdir():
            # Read the contents...
            logger.info(f"Is directory: {file_info.name}")
            file_contents = file_tar_bz2.extractfile(file_info).read()

        elif file_info.issym():
            # Read the contents...
            logger.info(f"Is symbolic link: {file_info.name}")
            file_contents = file_tar_bz2.extractfile(file_info).read()

        elif file_info.islnk():
            # Read the contents...
            logger.info(f"Is hard link: {file_info.name}")
            file_contents = file_tar_bz2.extractfile(file_info).read()

        else:
            logger.info(f"Is something else: {file_info.name}. Skip it")
            continue

        # Create a file-like object from the contents...
        file_like_object = io.BytesIO(file_contents)

        # Upload the returned contents to Swift,
        # using the name of the file in the archive as the object name...
        swift_conn.put_object(
            container,
            file_info.name,
            contents=file_like_object,
            content_type='application/octet-stream'  # Set the appropriate content type...
        )

Below is the error

File "/opt/scripts/netbox_backups_transfer.py", line 69, in <module>
file_contents = file_tar_bz2.extractfile(file_info).read()
AttributeError: 'NoneType' object has no attribute 'read'

I don't understand why you need to extract all the files from the compressed tar.bz2 just to upload to a backup.

Also, "line 69" is now meaningless since you've only posted a snippet of the code.

Before the error, what was the last logger.info message?
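For what it's worth, tarfile's extractfile() only returns a file-like object for regular files (and links it can resolve); for directories and other special members it returns None, so .read() fails on whichever member the last logger message named. If you do go the per-member route, a minimal sketch of a guard (using your archive path) would be:

import tarfile

with tarfile.open('/var/backup/netbox_backups/netbox_media_2024-03-24.tar.bz2', 'r:bz2') as tf:
    for member in tf:
        file_obj = tf.extractfile(member)
        if file_obj is None:
            # Directories and other special members carry no data to read; skip them
            print(f"skipping {member.name}")
            continue
        file_contents = file_obj.read()
        print(f"{member.name}: {len(file_contents)} bytes")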
