Hey! I've been asked to store some data from a client in mysql in an encoded format using python. Nothing could be easier, right?

I will later need to decode the same data using mysql exclusively. Given that constraint, I thought that base64 would be the go to since nearly every contemporary data tool that we've heard of is base64-literate.

So from flatfile it goes into the db using python no problem.

When I try to decode the data for warehousing later, the mysql from_base64() function returns a hexadecimal string instead of....ascii string? Or whatever. Human readable string.

I've found various hacks online that...should work? But I want to code this as correctly as possible because portability is so important, and it's tough to remember whatever hack you used to get it running five years ago. I'm convinced I'm doing it wrong.

Is there a better way to python-encode data that's consistently reversible using mysql for processing?

Thanks in advance!

Yeah, even

SELECT UNHEX(HEX('string'));

is giving me

+--------------------------------------------+
| UNHEX(HEX('string'))                       |
+--------------------------------------------+
| 0x737472696E67                             |
+--------------------------------------------+
1 row in set (0.00 sec)

Is there a better way to python-encode data that's consistently reversible using mysql for processing?

I suppose the first question I would ask is why does the data need to be encoded in MySQL? What is the objective for it to be encoded? I suspect it's not security, since you're wanting something easily portable and encoded/decoded cross-platform, and encoded rather than encrypted. I also suspect it's not to conserve space, since, with the algorithm you're using, your encoded string is longer than your decoded version.

What is the problem you are trying to solve instead of just keeping everything in plain text?

The reason I ask is because something we use here at DaniWeb is:

// Convert from Base 10 to Base 36
function encrypt_id($id = 0)
{    
    return base_convert($id, 10, 36);
}

// Convert from Base 36 to Base 10
function decrypt_id($id = 0)
{    
    return intval(base_convert($id, 36, 10));
}

This is PHP, but you get the point. I suspect python has a similar function.

And then the MySQL equivalent of those would be:

CONV(id, 10, 36) // From Base 10 to Base 36
CONV(id, 36, 10) // From Base 36 to Base 10

Base 10 system allows for the characters 0,1,2,3...9 (Hence 10 different characters used to represent the number)
Base 36 system allows for 0-9 as well as A-Z (Hence 36 different characters)
Base 62 system allows for 0-9, A-Z, and a-z, but php's native base_convert() function doesn't support it

We do this for integer ID #s because the encoded versions are both portable and fewer bytes, which serves us well in a couple of different areas. For example, instead of the number 987654321 we can send GC0UY9.

You should be able to use the same/similar algorithm for ASCII strings.

Base 64, which is essentially Base 62 plus + and / characters, is used a lot to encode/decode strings, such as to transport binary data over email. Therefore, there's a MySQL FROM_BASE64() and MySQL TO_BASE64() as well as, in PHP, there's base64_encode() and base64_decode(). A quick Google search showed that Python has similar functions e.g.:

import base64
string = 'My String'
base64.b64encode(string)
commented: Thanks! This helped. Ended up solving it with CONVERT (FROM_BASE64('TXkgY2F0IGxpa2VzIHRvIGNoYXNlIGVsZXBoYW50cyE=') USING utf8mb4) +0

Sorry! I went on a bit of a tangent about BASE64 because I skimmed over your initial question and saw you were doing things such as UNHEX(HEX()).

Let me answer your question now: I think it has to do with the encoding you're using for your database. When I do SELECT UNHEX(HEX('string')); I get string back. I'm using MySQL 8 with a server charset of UTF-8 Unicode (utf8mb4).

What happens if, when connecting to MySQL from the command line, you add the flag --skip-binary-as-hex?

commented: This showed promise, but did not fix the error. Instead I had to use the CONVERT() function +0

What happens when you do:

SELECT CONVERT(UNHEX(HEX('string')) USING utf8mb4);

You can see from https://dev.mysql.com/doc/refman/8.0/en/mysql-command-options.html#option_mysql_binary-as-hex that it shows that as of MySQL 8.0.19, it displays binary data as hexadecimal when mysql is in interactive mode. You can disable it with the --skip-binary-as-hex flag when connecting via the command line.

I don't believe this is an issue if connecting via a web app and using an appropriate character set.

You mentioned in your initial post that you found various hacks online. Was CONVERT() one of the ones you were already looking into?

I’m sure there is a MySQL setting that can fix this for you. As mentioned, it works fine for me when I query MySQL from within PHP without having to do any type of conversions.

@Dani

You mentioned in your initial post that you found various hacks online. Was CONVERT() one of the ones you were already looking into?

I’m sure there is a MySQL setting that can fix this for you. As mentioned, it works fine for me when I query MySQL from within PHP without having to do any type of conversions.

Yes, the four possible solutions that I encountered during my mediation and during my research were changes to the python script engine configuration, changes to database engine configuration, changes to the SQL or changes to the python.

It was whilst I revisited the possibility (as you suggest) of changing the db engine that I bumped perchance into a code segment that happened to work, in fact the selfsame function (CONVERT()) that you had recommended.

You can imagine my relief. In terms of design priorities, fixing the issue on the db statement side of the equation is the most satisfactory because I tend to assume that the SQ Language is the most stable element in my stack. Python changes every five or six days, with web servers and database engines changing somewhat less frequently but still at an inconvenient rate.

Does that make sense? I think that 90% of application development issues should be tackled through design. Trying to solve a problem with page after page of source code to me is inferior. Wouldn't you agree? I'm very open to 3rd party opinions and I can't emphasize enough how all of the time that I spent in the design phase is now paying off, given how smoothly all of this is really going.

Personally, I don’t like the idea of having to use CONVERT() on every query in my application. i think the right strategy is to make sure the database is in the right character set and you're connecting to the database with the correct charset. As mentioned, it works for me without having to modify my queries with CONVERT(). You shouldn’t have to either.

Everything is moving along, thanks for all the help so far!

Be a part of the DaniWeb community

We're a friendly, industry-focused community of developers, IT pros, digital marketers, and technology enthusiasts meeting, networking, learning, and sharing knowledge.