Bulk Bin Updater (or How I Learned to Stop Worrying and Love ODBC Connections)

Quite a while ago, I posted a replacement for DDMS’ Release to On-Hand report, PO Bin. Over the years, I’ve had a variety of ways of keeping the bulk bin table updated. Here is how I’m doing it now, and it’s the most efficient way I’ve ever had.

To keep the bulk bin table on our MySQL server updated, originally I waited for one of my warehouse guys to let me know they’d changed something, whether it was new product coming in, product being dropped, or bins being rearranged and product being moved. Because they weren’t consistent about telling me when it needed to be updated, the usefulness of the report degraded significantly over time.

I came up with a method to update the table. It was a bit of a hack, involving copying DDMS’ i-bulk table, turning that into a csv file, then updating the MySQL table from that. I had it update twice a day and things were all right, as far as I knew.

More recently, I’ve got a new guy in the warehouse and a whole lot more product than I used to have, along with two new mezzanines. Needless to say, the warehouse is a lot different now than it was when I wrote PO Bin. I needed a better way to keep bulk_bin updated. Thanks to Python and ODBC connections, I’ve now got a process that works really well.

At the heart of it all, we’re using an ODBC connection to i-bulk.dbf itself. ODBC, for those who don’t know, stands for ‘Open Database Connectivity’. Much beyond that is outside the scope of this article, but the Wikipedia article on the subject gives a good overview. Most importantly here, ODBC is the key to a lot of what we can do with DDMS. Even better, ODBC connections abstract away a lot of application-specific hassles between differing database types.

First, you’ll need a DSN (Data Source Name) connection set up on the machine where this script will live. To keep things easy, most everything I do lives on my sandbox server, and this is no different. Creating a DSN to DDMS is a little beyond the scope of this article, but if demand warrants, I’ll put up a post with a walk-through on doing just that.

I’m using Python for this script. Mostly, this is due to Python’s ease of use and ridiculously vast array of libraries. We can quickly see what I mean with this script. To start, we use import pyodbc. On my sandbox server, I set up a dedicated DSN pointing at DDMS’ IN directory. To connect, it’s as easy as dbfConn = pyodbc.connect('DSN=DDMS-IN'), followed by dbfCur = dbfConn.cursor(). Of course, if we’re being responsible developers, the proper code looks like this:

# create odbc connection to DDMS server
try:
    dbfConn = pyodbc.connect('DSN=DDMS-IN')
    dbfCur = dbfConn.cursor()
except Exception as e:
    write_log("error opening odbc connection to DDMS server: " + str(e))

That is, for our own sanity, pretty much anytime we do anything in Python, we wrap it in a try/except block. This allows us to trap errors and figure out what has gone wrong. In the full file linked at the bottom of this post, you’ll find write_log is a function that writes messages to a log file. This can be invaluable when something goes wrong during a run, since whatever error was thrown gets written out.
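The full version is in the download, but if you want to follow along before grabbing it, a minimal write_log could look something like this sketch (the log file name here is a placeholder, not necessarily what the real script uses):

# a minimal write_log sketch; the real one lives in the full file below,
# and the log file name here is just a placeholder
import datetime

def write_log(message):
    # append a timestamped message to the log file
    with open('bb.log', 'a') as logFile:
        logFile.write(str(datetime.datetime.now()) + " " + message + "\n")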

Now then, we’ve got a connection to DDMS. We’re going to be updating our bulk_bin table on our MySQL server, so we’ll need to connect to that as well. We could use another ODBC connection, but Python has a dedicated MySQL library, so we’ll use that instead. To do so, we update our import statement at the top of the file so it reads: import pyodbc, MySQLdb as mysql. I tend to alias the MySQL lib so the name is similar to conventional Python libraries (e.g., all lowercase). It’s not required, but remember to adjust your function names accordingly if you don’t alias it. That done, we connect to MySQL:

# create connection to MySQL server
try:
    mariaCon = mysql.connect(db_params['host'], db_params['mysql_user'], db_params['mysql_pass'], db_params['db'])
    mariaCur = mariaCon.cursor()
except Exception as e:
    write_log("error connecting to MySQL server: " + str(e))

Here again we wrap everything in a try/except block and write out errors to our log file. As to the connection string itself, I use config files for parameters that might change, such as the location and credentials to the MySQL server. Structurally, they’re much like ini files. In my config file, I have a section for the MySQL server:

[db]
mysql_user = bb
mysql_pass = bbpassword
host = localhost
db = item
bulkbin = bulk_bin
hash = hashes
hashtable = table_hashes

Note I have a dedicated user for this, as well. I like to follow the principle of least privilege, so I created a user, bb, that is only allowed to make modifications to bulk_bin and table_hashes, which we’ll cover shortly. While we’re at it, though, we need to update the libraries we’re importing to allow Python to read config files. Our import statement will now look like so: import pyodbc, MySQLdb as mysql, ConfigParser. For any Python devs out there, this is the tip-off to something I hadn’t mentioned yet: we’re using Python 2.7 here. When I started hacking at Python, Python 3 was out, but didn’t seem ready for prime time. That was many years ago, and I’m still on Python 2.7. Eventually I’ll likely start converting old work to Python 3, but for now, I spend what time I have getting work done rather than re-learning a language.
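For completeness, pulling that [db] section into the db_params dictionary you’ll see throughout these snippets takes only a few lines. This is a sketch that assumes the config file is named bb.cfg, as in the download; path_params comes from a second section in the same file, built the same way:

# read the [db] section of the config file into a dictionary,
# giving us db_params['mysql_user'], db_params['host'], and so on
# (assumes the config file is named bb.cfg, as in the download)
config = ConfigParser.ConfigParser()
config.read('bb.cfg')
db_params = dict(config.items('db'))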

Let’s step back a moment before we go further. As referenced above, we’ve got another database in MySQL that the bb user can manipulate, hashes. There isn’t much to this db, just a single table called table_hashes. The table itself is also fairly simple, having an auto-incrementing index field, ‘idx’, along with ‘table_name’ as tinytext, ‘hash’ as varchar(255), and ‘timestamp’, a datetime field set to update with the current timestamp on update. I added the ‘table_name’ field for future use, in case I have need for a similar process later.
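If you want to recreate that table yourself, the one-time setup amounts to something like this sketch (the credentials are placeholders, and the DATETIME default/on-update behavior assumes MySQL 5.6 or newer):

# one-time setup sketch for the hashes db; credentials are placeholders,
# and DATETIME DEFAULT/ON UPDATE CURRENT_TIMESTAMP assumes MySQL 5.6+
import MySQLdb as mysql

setupCon = mysql.connect('localhost', 'admin_user', 'admin_pass', 'hashes')
setupCur = setupCon.cursor()
setupCur.execute("""
    CREATE TABLE IF NOT EXISTS table_hashes (
        idx INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
        table_name TINYTEXT,
        hash VARCHAR(255),
        `timestamp` DATETIME DEFAULT CURRENT_TIMESTAMP
                    ON UPDATE CURRENT_TIMESTAMP
    )
""")
setupCon.close()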

The reason we need this setup is that each time the script runs, we’ll check whether the table actually needs updating. To do that, we’ll hash i-bulk.dbf itself to see if anything has changed. If nothing’s changed, there’s no need to touch bulk_bin on the MySQL server. Getting into the theory of hashing is also something for another time, so suffice it to say the magic of math makes it quick and easy to determine whether our table has changed. We’ll again update our import statement, so we’ve now got import pyodbc, MySQLdb as mysql, ConfigParser, hashlib.
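If you’ve never played with hashing, the idea is easy to see in an interactive session: identical input always produces an identical digest, and any change at all produces a completely different one. The strings here are just stand-ins:

# same input, same digest; change anything and the digest changes entirely
import hashlib
print hashlib.sha1('i-bulk contents').hexdigest()
print hashlib.sha1('i-bulk contents').hexdigest()           # identical to the first
print hashlib.sha1('i-bulk contents, changed').hexdigest()  # nothing alike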

Next, we’ll open a connection to our hashes db and retrieve the last hash from table_hashes:

try:
    hashCon = mysql.connect(db_params['host'], db_params['mysql_user'], db_params['mysql_pass'], db_params['hash'])
    hashCur = hashCon.cursor()
except Exception as e:
    write_log("error connecting to MySQL hashes db: " + str(e))

try:
    hashCur.execute("SELECT `hash` FROM `" + db_params['hashtable'] + "` WHERE table_name = '" + path_params['tablename'] + "' ORDER BY timestamp DESC LIMIT 1")
    mrHash = hashCur.fetchone()
except Exception as e:
    write_log("error retrieving hash from table: " + str(db_params['hashtable']) + ": " + str(e))

In development, I inserted a row to avoid errors being thrown from the table being empty. It’s a little hacky, but since I never need to empty out table_hashes, I can avoid handling the case of no results being returned. Notice the end of the query – we let it pull all hashes, but sort them date-descending and limit the results to one. We only need the latest hash of the table, and this is an easy way to get it.
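For the record, that seed row was just a manual INSERT along these lines; the values here are arbitrary stand-ins, though table_name should match whatever path_params['tablename'] holds in your config:

# development-only: seed table_hashes so fetchone() never returns None
# on the very first run; both values below are placeholders
hashCur.execute("INSERT INTO table_hashes (table_name, hash) VALUES (%s, %s)",
                ('i-bulk.dbf', 'seed'))
hashCon.commit()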

Next, we need a way to hash i-bulk, which is comparatively simple:

hasher = hashlib.sha1()
with open(path_params['tablepath'] + path_params['tablename'], 'rb') as bulkFile:
    try:
        buffer = bulkFile.read(65536)
        while len(buffer) > 0:
            hasher.update(buffer)
            buffer = bulkFile.read(65536)
    except Exception as e:
        write_log("error hashing table: " + str(path_params['tablename']) + ": " + str(e))

Rather than trying to hash the whole file in one go, we create a buffer and feed the file to our hashing function in chunks. When we’re done, the finished hash is available from hasher.hexdigest().

Next, we’ll compare the hash we retrieved from table_hashes against the hash we just created from i-bulk:

if (mrHash[0] != hasher.hexdigest()):

Note we end on a colon. What we’re checking here is whether the two hashes differ, meaning i-bulk has changed since we last updated table_hashes and we therefore need to update bulk_bin. Within this block is where we actually open connections to DDMS and to the item database on our MySQL server, where bulk_bin is located. First, though, we’ll write our new hash into table_hashes:

# insert new hash into table
hList = (path_params['tablename'], hasher.hexdigest())
hashSql = "INSERT INTO " + db_params['hashtable'] + " (table_name, hash) VALUES (%s, %s)"
try:
    hashCur.execute(hashSql, hList)
    hashCon.commit()
except Exception as e:
    # print str(e)
    write_log("error executing hashCur in hash comparison: " + str(e))

Notice the commented-out print str(e). I use bits like this when I’m debugging and running the script manually from the command line. It’s helpful to see errors right where you’re running the script, hence tricks like this. Once things are working, they can be commented out or deleted. I tend to leave them in, in case I need to debug later.

Now we can connect to DDMS and MySQL’s item database as covered above. Since we know bulk_bin needs to be updated, let’s get rid of the existing data:

try:
    mariaCur.execute("TRUNCATE TABLE " + db_params['bulkbin'])
except Exception as e:
    write_log("error truncating table: " + str(db_params['bulkbin']) + ": " + str(e))

We’re executing our SQL statement directly in our execute() function, since it’s short and there’s no user input. Now we’ll retrieve the fresh data:

try:
    dbfIter = dbfCur.execute("select * from i-bulk")
except Exception as e:
    write_log("error getting data from i-bulk: " + str(e))

valList = []
for k, rec in enumerate(dbfIter):
    try:
        valList.append((str(rec[0]).strip(), int(str(rec[1]).strip()), str(rec[2]).strip(), int(str(rec[3]).strip()), str(rec[4]).strip(), str(rec[5]).strip(), int(str(rec[6]).strip())))
    except Exception as e:
        write_log("error appending to list from query result on line: " + str(k) + " " + str(e))

Again we’re executing our SQL directly in execute(). We start with an empty list, valList, into which we’ll load the results from querying i-bulk. Our for loop reads through the query results and adds them row-by-row to valList; wrapping the cursor in enumerate() gives us the row counter, k, that the error message uses if a row fails. The valList.append() call casts each field to a string, strips the spaces out (DDMS fields, as we all know, are littered with trailing or leading spaces), and converts the numeric fields to integers. From there, we update bulk_bin:

try:
    sql = "INSERT INTO " + db_params['bulkbin'] + " (bulk_keys, table_order, item_key, loc, bin, bulk_unit, bulk_qty) VALUES (%s, %s, %s, %s, %s, %s, %s)"
    mariaCur.executemany(sql, valList)
except Exception as e:
    write_log("error inserting values into table: " + str(db_params['bulkbin']) + ": " + str(e))

This is functionally the end of updating the table. While Python will handle this sort of thing itself, the Pythonic philosophy prefers explicit over implicit, so we end the script like so:

    hashCon.close()
    dbfConn.close()
    mariaCon.close()
else:
    print "match - no update needed"
    hashCon.close()

The first three lines, our close() calls, close our database connections. They would also close automatically when the script exits, but explicit calls to clean up what’s no longer needed are never a bad idea. Below that, we have close() called on the connection to the hashes database. Way up above, remember that we started the real work with an if clause, if (mrHash[0] != hasher.hexdigest()):. The else closes out that if block: if we compare the stored hash to the current hash and find them equal, there’s no work to be done, so we close our database connection and exit. The print statement is also for debugging; I missed commenting it out, but it doesn’t affect functionality, so no matter.

After all this, I create a new task in Windows Task Scheduler. I’ve set mine to run once an hour from 7am to 7pm, Monday through Friday. Everyone seems happy with hourly updates, and it’s easy enough to adjust later if need be.

Today, in what feels like a slightly rusty and disjointed way, I’ve demonstrated a method for updating bulk_bin on our MySQL server using Python, a new addition to our arsenal. To see the completed version, the files I use are posted here – http://www.bendustries.co/files/warehouse/bb.zip. This includes both the bb2.py script itself and the bb.cfg configuration file.

As ever, I remain
