Site icon Andy M Mallon – AM²

Parsing file names from paths using T-SQL

These files are way neater than the ones on my laptop.

SQL Server is great at a lot of things. I spend a lot of time blogging about all the cool things that SQL Server does. But (you knew there was a “but”, didn’t you) SQL Server isn’t great at everything. Lets write about something that SQL Server is bad at.

SQL Server sucks at parsing strings

That’s not hyperbole. If you’ve tried to write your own function to parse CSVs into a table, you know it’s painful. Judging by the number of comments on this other article about splitting CSVs, this is clearly a topic that people spend a lot of time on. Thankfully, SQL Server 2016 finally introduced a SPLIT_STRING() function that performs well and handles most use cases….but we’re not here to split CSVs.

SQL Server sucks at parsing strings

Yea, I really can’t say it enough. The string manipulation & analysis functions that SQL Server does provide tend to be clunky. I can never remember how they work. Why does CHARINDEX() take three parameters, but PATINDEX() only takes two parameters? Why do those functions only work left-to-right, but not right-to-left? Oh, there’s a REVERSE() function I can apply to the string, but if I reverse the string to search right-to-left, I have to reverse it back on output.

SQL Server sucks at parsing strings

I feel like any time I need to do anything remotely involved, I end up with nested functions. Sometimes a lot of nested functions. Sometimes you have to repeat the same bits of function multiple times. It can make your code hard to read, especially if you don’t already know what it’s supposed to be doing.

Why are we here again?

Oh, right, parsing file & directory names. As a DBA, there seems to be a constant trickle of automation where I need to parse a file name out of a full path. PowerShell does a much better job at this than T-SQL. Go ahead and write a comment below that I should use PowerShell (or Python, or .NET, or some other language) to parse file names. No, really… go ahead, I’ll wait…

Look, if I’m in the database already, sometimes it’s easier to just use T-SQL. It’s convenient. So let’s parse this stuff in T-SQL, even though we know there’s a better way. While I’m being defensive about my coding, I’m only going to deal with Windows paths. Even though SQL Server runs on Linux now, this code won’t work with Linux & their backwards (uhhh….forwards) slashes.

First, let’s take a sample file path that we can use to practice parsing. I’m going to use this file (that I recently wrote about) on my laptop: C:\Users\Andy\Documents\GitHub\dba-database\stored-procedures\dbo.Alert_Blocking.sql

Starting with plain English

Writing & reading code is easier if you understand the logic before attacking the code. I find this to be particularly important when you anticipate complicated code. SQL Server sucks at parsing strings, so I anticipate complicated code.

How do you identify the directory from a file path? That’s just everything up to the last slash–and I like to include that final slash to make it clear it’s a directory.

How do you identify the file name from a file path? That’s just everything after the final slash.

The key here is going to be identifying that final slash, and grabbing the text on either side.

Time to parse strings

We’re going to use CHARINDEX to find the position of the final slash…and here’s the trick… we want to start at the end and work right-to-left. So we’re actually getting the the position from the tail end. We do this because there could be a bunch of slashes in the path, and if we attack this left-to-right, we would need to loop through finding slashes until we stopped finding them.

In this example, we see that the string is 84 characters long, and the final slash is 23 characters from the end:

Lets pop that into a CTE, and use it to grab the stuff from the left & right sides of that slash. Heck, let’s use the LEFT and RIGHT functions to do it.

To get the directory path, we’ll apply the LEFT function to the FilePath, but for how many characters? The total length minus the stuff after the final slash…plus one if we want the directory to include that final slash.

To get the file name, we’ll apply the RIGHT function to the FilePath, and since FinalSlashPos is counting from the right, we’ll use that for the number of characters….minus one so that we don’t include the slash.

We’ve got this far, being very careful to avoid those off-by-one errors with our position:

While we’re parsing stuff, we can parse the file name into the file extension and a “bare” file name.

Getting the file extension is pretty easy. Remember all the logic we just did to parse the file name out of the full path? Do the same thing over again but using a dot instead of a slash.

You might have guessed–getting the bare file name is similar to getting the directory from the path.

Now here’s the catch: This code assumes that you’re actually parsing a Windows file path, and the files all have extensions. In short, if your value is not LIKE '%\%.%', then the code is going to error.

Making the code useful

I don’t parse file paths all that often. In fact, every time it comes up, I usually forget the finer points and have to re-learn how it works. I’m pretty lazy, so I don’t want to have to keep re-learning the same thing. I’d rather put the code in my DBA database, and then just call some function and know that it works. That’s exactly what I’ve done.

I created an inline table-valued function called dbo.ParseFilePath(). Now, if I need to parse files & directories out of paths, I can just APPLY this function.

SQL Server might suck at parsing strings, but at least it’ll be easier next time.

 

Exit mobile version