FILESTREAM is not a particularly popular feature of SQL Server–if you’re using it, you are in the minority. If you have a database with filestream in an Availability Group, then you’re REALLY in the minority. At my 9-5, this is being done, and it’s apparent that this combination of features isn’t a robust combination.
The bug
I ran into a situation where garbage collection process for filestream didn’t appear to be working properly. I initially noticed this when a significant amount of data was migrated out of the database, but no additional space was reclaimed on disk.
I tracked the problem back to this specific scenario: When you have a database that is part of an Availability Group, and you drop a table that contains filestream data, the filestream garbage collection does not clean up the data container subdirectory that corresponds to that table. Garbage collection will continue to clean up other items (eg, deleted rows), but the dropped table never gets cleaned up.
I’ve logged this as a UserVoice item for your up-voting pleasure.
The repro steps
Environment: I am testing this on SQL Server 2014 SP1-CU3 (12.0.4427.24), using an Availability Group with two replicas. Both servers run Windows 2012 R2.
- Create a table that contains filestream data
- Insert data to populate the filestream column
- Notice that files have been created on disk
- Drop the table (or Truncate the table)
- Force garbage collection to run
- Perform log backups so that garbage collection can clean up
- Repeat log backups & forcing garbage collection N times
- Notice that the num_unprocessed_items shows a pending item
- Notice that the files on disk are never cleaned up
And here’s a script I used. You’ll want to replace AM2_test
with the name of your database that you’re using for testing. Also, I do the log backup to NUL:
… That’s like sending your log backup to the shredder. Don’t do that if you actually care about your database.
--Create table CREATE TABLE dbo.FilestreamTestData( ID bigint IDENTITY(1,1) NOT NULL, FilestreamData varbinary(max) FILESTREAM NULL, FilestreamGUID uniqueidentifier ROWGUIDCOL NOT NULL UNIQUE DEFAULT (newsequentialid()), CONSTRAINT PK_FilestreamTestData PRIMARY KEY CLUSTERED (ID) ); GO --Put some random data into the table --I'll create 100 rows with 100 filestream files INSERT INTO dbo.FilestreamTestData(FilestreamData) SELECT TOP 100 CAST( 'Random Data' AS varbinary(max)) FROM sys.columns; GO --drop the table DROP TABLE dbo.FilestreamTestData; GO --force garbage collection to run multiple times --take log backups in between to make sure things are moving along. EXEC sp_filestream_force_garbage_collection @dbname = 'AM2_test'; BACKUP LOG AM2_test TO DISK = 'NUL:'; GO 10
The workaround
I haven’t figured out how to get garbage collection to collect the DROP TABLE and still keep the database in the Availability Group. In order to force the garbage collection to work, I use the following process:
- Remove database from Availability Group
- Backup Log
- Force garbage collection
- Repeat log backup & garbage collection until garbage collection is caught up
- You can tell garbage collection is caught up when sp_filestream_force_garbage_collection returns 0 for both num_marked_for_collection_items and num_unprocessed_items.
- Restore logs to secondary server(s)
- Re-add database to Availability Group
If you have another workaround–particularly one that doesn’t involve removing the database from the AG–let me know in the comments below.
UPDATE (2015-12-21):
In doing another round of testing, it appears that the TRUNCATE TABLE command does not get cleaned up by filestream garbage collection either. Both the DROP TABLE and TRUNCATE TABLE will remain unprocessed by garbage collection as long as the database is part of the Availability Group. Instead, use DELETE to delete the contents of the table.