
python 3.x - Pandas .to_sql fails silently randomly

I have several large pandas dataframes (30k+ rows each) and need to upload a different version of them to an MS SQL Server database every day. I am trying to do so with the pandas to_sql function. On occasion it works, but other times it fails silently: the code behaves as if all of the data had been uploaded, yet not a single row reaches the table.

Here is my code:

import urllib.parse

import sqlalchemy


class SQLServerHandler(DataBaseHandler):
    
    ...


    def _getSQLAlchemyEngine(self):
        '''
            Get an sqlalchemy engine
            from the connection string

            The fast_executemany fails silently:

            https://stackoverflow.com/questions/48307008/pandas-to-sql-doesnt-insert-any-data-in-my-table/55406717
        '''
        # escape special characters as required by sqlalchemy
        dbParams = urllib.parse.quote_plus(self.connectionString)
        # create engine
        engine = sqlalchemy.create_engine(
            'mssql+pyodbc:///?odbc_connect={}'.format(dbParams))

        return engine

    @logExecutionTime('Time taken to upload dataframe:')
    def uploadData(self, tableName, dataBaseSchema, dataFrame):
        '''
            Upload a pandas dataFrame
            to a database table <tableName>
        '''
        engine = self._getSQLAlchemyEngine()

        dataFrame.to_sql(
            tableName,
            con=engine,
            index=False,
            if_exists='append',
            method='multi',
            chunksize=50,
            schema=dataBaseSchema)

Switching the method to None seems to work properly, but then the data takes an insane amount of time to upload (30+ minutes). With 20 or so tables of this size every day, that rules this solution out.

The solution proposed here, adding the schema as a parameter, doesn't work. Neither does creating a SQLAlchemy session and passing it to the con parameter with session.get_bind().

I am using:

  • ODBC Driver 17 for SQL Server
  • pandas 1.2.1
  • sqlalchemy 1.3.22
  • pyodbc 4.0.30

Does anyone know how to make it raise an exception if it fails?

Or why it is not uploading any data?
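
One way to at least make the failure loud would be to count the rows before and after the upload and raise if the difference doesn't match the dataframe length. This is only a minimal sketch reusing the names from uploadData above; uploadDataChecked and _rowCount are hypothetical helpers, not from the original code:

import sqlalchemy

def _rowCount(conn, schema, table):
    # schema/table come from trusted code, not user input,
    # so plain string formatting is acceptable here
    return conn.execute(
        sqlalchemy.text('SELECT COUNT(*) FROM {}.{}'.format(schema, table))
    ).scalar()

def uploadDataChecked(self, tableName, dataBaseSchema, dataFrame):
    engine = self._getSQLAlchemyEngine()

    with engine.connect() as conn:
        before = _rowCount(conn, dataBaseSchema, tableName)

    dataFrame.to_sql(
        tableName,
        con=engine,
        index=False,
        if_exists='append',
        schema=dataBaseSchema)

    with engine.connect() as conn:
        after = _rowCount(conn, dataBaseSchema, tableName)

    inserted = after - before
    if inserted != len(dataFrame):
        # surface the problem instead of letting it pass silently
        raise RuntimeError(
            'Expected {} rows, but only {} were inserted into {}.{}'.format(
                len(dataFrame), inserted, dataBaseSchema, tableName))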

Question from: https://stackoverflow.com/questions/65940670/pandas-to-sql-fails-silently-randomly


1 Answer


In rebuttal to this answer: if to_sql() were falling victim to the issue described in

SQL Server does not finish execution of a large batch of SQL statements

then it would have to be constructing large anonymous code blocks of the form

-- Note no SET NOCOUNT ON;
INSERT INTO gh_pyodbc_262 (id, txt) VALUES (0, 'row0');
INSERT INTO gh_pyodbc_262 (id, txt) VALUES (1, 'row1');
INSERT INTO gh_pyodbc_262 (id, txt) VALUES (2, 'row2');
…

and that is not what to_sql() is doing. If it were, then it would start to fail well below 1_000 rows, at least on SQL Server 2017 Express Edition:

import pandas as pd
import pyodbc
import sqlalchemy as sa

print(pyodbc.version)  # 4.0.30

table_name = "gh_pyodbc_262"
num_rows = 400
print(f" num_rows: {num_rows}")  # 400

cnxn = pyodbc.connect("DSN=mssqlLocal64", autocommit=True)
crsr = cnxn.cursor()

crsr.execute(f"TRUNCATE TABLE {table_name}")

sql = "".join(
    [
        f"INSERT INTO {table_name} ([id], [txt]) VALUES ({i}, 'row{i}');"
        for i in range(num_rows)
    ]
)
crsr.execute(sql)

row_count = crsr.execute(f"SELECT COUNT(*) FROM {table_name}").fetchval()
print(f"row_count: {row_count}")  # 316

Using to_sql() for that same operation works

import pandas as pd
import pyodbc
import sqlalchemy as sa

print(pyodbc.version)  # 4.0.30

table_name = "gh_pyodbc_262"
num_rows = 400
print(f" num_rows: {num_rows}")  # 400

df = pd.DataFrame(
    [(i, f"row{i}") for i in range(num_rows)], columns=["id", "txt"]
)

engine = sa.create_engine(
    "mssql+pyodbc://@mssqlLocal64", fast_executemany=True
)

df.to_sql(
    table_name,
    engine,
    index=False,
    if_exists="replace",
)

with engine.connect() as conn:
    row_count = conn.execute(
        sa.text(f"SELECT COUNT(*) FROM {table_name}")
    ).scalar()
    print(f"row_count: {row_count}")  # 400

and indeed will work for thousands and even millions of rows. (I did a successful test with 5_000_000 rows.)
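
Translating that back to the handler in the question, one sketch (untested, and only a suggestion consistent with the test above, since the question's docstring reports fast_executemany having failed silently before) would be to enable fast_executemany on the engine and drop method='multi' and chunksize, letting pyodbc handle the bulk insert:

import urllib.parse

import sqlalchemy


class SQLServerHandler(DataBaseHandler):

    ...

    def _getSQLAlchemyEngine(self):
        dbParams = urllib.parse.quote_plus(self.connectionString)
        # fast_executemany is a pyodbc-dialect option accepted by
        # create_engine() since sqlalchemy 1.3
        return sqlalchemy.create_engine(
            'mssql+pyodbc:///?odbc_connect={}'.format(dbParams),
            fast_executemany=True)

    def uploadData(self, tableName, dataBaseSchema, dataFrame):
        engine = self._getSQLAlchemyEngine()
        dataFrame.to_sql(
            tableName,
            con=engine,
            index=False,
            if_exists='append',
            schema=dataBaseSchema)  # no method='multi', no chunksize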

