convert sql-excel date to recognizable format

clip art of 
 a double-quote character

Question

I have a large csv file(1.1G) exported from SQL Server. I would like to do data wrangling in python, but I had a few problems doing this. The original csv file dates look like this 00:07.5, 00:08.3, 00:48.7, I can manually convert them to 1/12/2015 12:00:07 am, 1/12/2015 12:00:08 am, 1/12/2015 12:00:49 am in excel by formatting the entire column to make it in d/m/yy h:mm:ss, but sometimes output is weird like 100:00:00, or it just kept the time but not the dates, when I finally successfully did this, I noticed that the file size had shrank to 36.6 MB from 1.1G. I received the excel notification:

error message reading:Possible Data Lost: Some features might be lost if you save this notebook in the comma-delimited(.csv) format. To preserve these features, save it in an Excel file format.

My issue with the date time data is similar to this question.

My questions are:

  1. Is there any data lost during the process given that file size shrank?(most concerned)
  2. How does this sql to excel date conversion work?
  3. Is it possible to convert the dates in python?

asked 2013-10-26 by nilsinelabore


Answer

I notice sometimes you mention exporting to Excel, but you mention it is a CSV file (and the error message you mention supports that). Note that Excel will open a .csv file, but much functionality in Excel prefers to save in native .xslx format. Once you begin modifying data in Excel, I recommend doing a Save As... to save as a native Excel file.

Regarding your questions….

Is there any data lost during the process given that file size shrank?

Almost certainly, yes. CSV files are plain text files, so a dramatic decrease in size means a dramatic decrease in the number of characters stored in the file. Its possible those extra characters were a bunch of meaningless whitespace, and Excel trimmed it away–but I suspect not.

Excel is limited to opening only 1,048,576 rows (or 65,536 rows on very old versions of Excel). If your CSV file contained more than 1,048,576 rows, then opening it in Excel and saving the results would truncate it to that maximum row count.

Also, sometimes Excel can display date/time values in odd ways, which obscure the full date/time. Without seeing the raw CSV data, it’s hard to say what Excel might have been doing. For example, Excel may have been displaying only the time portion, and “hiding” the date portion (even though it was in the CSV). You would need to open the original CSV using a text editor to confirm.

How does this sql to excel date conversion work?

It sounds like there is an issue with your method for creating the CSV when you export data from SQL Server. You do not describe the export method from SQL Server, however, the format of the datetime values that you describe (00:07.5, 00:08.3, 00:48.7) is not a format that SQL Server would normally use to export datetime data. If you don’t apply any formatting, and accept the defaults, the date/time fields would be in the format according to your OS locality.

For example, you can create this sample data in SQL Server:

    CREATE TABLE dbo.ExportMe (
       ID int IDENTITY(1,1),
       SomeData varchar(10),
       DateField datetime,
       DateField2 datetime2(0)
    );
    
    INSERT INTO dbo.ExportMe (SomeData, DateField, DateField2)
    VALUES ('Something','20191030','20191030'),
           ('Something Spooky','20191031','20191031'),
           ('Something Jolly','20191225','20191225');

Then use the SQL Server Management Studio Import/Export Wizard or SQL Server Integration Services (SSIS) to export the table to a CSV destination, you’ll get a file that looks like this:

    "ID","SomeData","DateField","DateField2"
    "1","Something","2019-10-30 00:00:00","2019-10-30 00:00:00"
    "2","Something Spooky","2019-10-31 00:00:00","2019-10-31 00:00:00"
    "3","Something Jolly","2019-12-25 00:00:00","2019-12-25 00:00:00"

Also note that the Import/Export Wizard and SSIS can export directly to Excel–though this can be finicky with requiring the proper 32/64 bit version of Excel to match your version of SSMS/SSIS.

Is it possible to convert the dates in python?

Assuming that you have valid dates in your CSV file to begin with, Python should be able to consume the dates without a problem. However, as I mentioned, it sounds like your file does not have valid data, and you are probably out of luck. You might try posting a separate question with some plain-text examples from your CSV (ie, full rows from the CSV file with private data obscured), and see if a Python expert can help you wrangle those dates.

answered 2019-10-29 by Andy Mallon