Dynamically Querying Table Column DDL Info in Redshift

Dynamic Column Info

A very useful ability when working with databases is being able to retrieve a table’s description (its column names, data types, and so on) dynamically. There are many reasons you may want to do this: a very common one is dynamically generated UI (also known as scaffolding); another is modifying or appending columns to an existing data set, for example during an ETL step or report generation.
The main point is that you’d like to write one block of code that won’t need to change when the schema of the tables in your databases changes. This reduces regression errors and maintenance effort throughout the life of your code.

Redshift vs other Databases

Redshift differs from other, more established databases in that it doesn’t include a pre-packaged command for describing the DDL of existing database objects. Older databases like MySQL make this task very easy (for example, with DESCRIBE or SHOW CREATE TABLE).
Since Redshift doesn’t come with commands for this task, what approach should we use? The answer is to use queries or views that read Redshift’s system catalog tables to piece together the DDL info we need.

Querying Column Info

To get our DDL information dynamically we’ll use the catalog tables. If we only want a list of the column names in a table, we can use this query…

SELECT QUOTE_IDENT(a.attname) AS col_name
FROM pg_namespace AS n
INNER JOIN pg_class AS c ON n.oid = c.relnamespace
INNER JOIN pg_attribute AS a ON c.oid = a.attrelid
LEFT OUTER JOIN pg_attrdef AS adef ON a.attrelid = adef.adrelid AND a.attnum = adef.adnum
WHERE c.relkind = 'r'
 AND a.attnum > 0
 AND n.nspname = '<schema name>'
 AND c.relname = '<table name>'
ORDER BY a.attnum

If we want more information, like what data type each column is, we can use a query like this…

SELECT
n.nspname AS schemaname
,c.relname AS tablename
,100000000 + a.attnum AS seq
,CASE WHEN a.attnum > 1 THEN ',' ELSE '' END AS col_delim
,QUOTE_IDENT(a.attname) AS col_name
,CASE WHEN STRPOS(UPPER(format_type(a.atttypid, a.atttypmod)), 'CHARACTER VARYING') > 0
  THEN REPLACE(UPPER(format_type(a.atttypid, a.atttypmod)), 'CHARACTER VARYING', 'VARCHAR')
 WHEN STRPOS(UPPER(format_type(a.atttypid, a.atttypmod)), 'CHARACTER') > 0
  THEN REPLACE(UPPER(format_type(a.atttypid, a.atttypmod)), 'CHARACTER', 'CHAR')
 ELSE UPPER(format_type(a.atttypid, a.atttypmod))
 END AS col_datatype
,CASE WHEN format_encoding((a.attencodingtype)::integer) = 'none'
 THEN ''
 ELSE 'ENCODE ' + format_encoding((a.attencodingtype)::integer)
 END AS col_encoding
,CASE WHEN a.atthasdef IS TRUE THEN 'DEFAULT ' + adef.adsrc ELSE '' END AS col_default
,CASE WHEN a.attnotnull IS TRUE THEN 'NOT NULL' ELSE '' END AS col_nullable
FROM pg_namespace AS n
INNER JOIN pg_class AS c ON n.oid = c.relnamespace
INNER JOIN pg_attribute AS a ON c.oid = a.attrelid
LEFT OUTER JOIN pg_attrdef AS adef ON a.attrelid = adef.adrelid AND a.attnum = adef.adnum
WHERE c.relkind = 'r'
 AND a.attnum > 0
 AND n.nspname = '<schema name>'
 AND c.relname = '<table name>'
ORDER BY a.attnum
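
The rows this query returns are meant to be stitched together into column definitions. As a rough sketch of that step, the shell snippet below assumes the rows were fetched with psql using unaligned output and a pipe field separator, in the same column order as the query. The helper name format_column_lines and the sample rows are made up for illustration, and a default value that itself contains a pipe character would break this simple splitting.

```shell
# Hypothetical helper: turn pipe-delimited rows from the query above into
# column definition lines.
# Assumed field order: schemaname|tablename|seq|col_delim|col_name|
# col_datatype|col_encoding|col_default|col_nullable
format_column_lines() {
  awk -F'|' '{
    # Leading comma (if any), then the column name and its data type.
    line = $4 $5 " " $6
    # Append encoding, default and nullability only when they are non-empty.
    for (i = 7; i <= 9; i++) if ($i != "") line = line " " $i
    print line
  }'
}

# Made-up sample rows standing in for real psql output.
printf '%s\n' \
  'public|sales|100000001||id|INTEGER|ENCODE az64||NOT NULL' \
  'public|sales|100000002|,|note|VARCHAR(256)|ENCODE lzo||' | format_column_lines
```

For the two sample rows this prints "id INTEGER ENCODE az64 NOT NULL" and ",note VARCHAR(256) ENCODE lzo", which is the shape of the column list inside a CREATE TABLE statement.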

The key to these queries is to put your schema name as the value for n.nspname, and your table name as the value for c.relname.
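
When the schema and table names come from script variables, it can help to build the query in one place and reject anything that is not a plain identifier, so a stray quote cannot change the SQL. The following is only a sketch: the helper names is_simple_identifier and build_column_query are hypothetical, the allowed character set is a deliberately conservative assumption, and the unused pg_attrdef join is left out.

```shell
# Hypothetical helper: accept only plain identifiers
# (letters, digits and underscore, not starting with a digit).
is_simple_identifier() {
  case "$1" in
    '' | [!A-Za-z_]* | *[!A-Za-z0-9_]*) return 1 ;;
    *) return 0 ;;
  esac
}

# Hypothetical helper: print the column-name query for one schema and table.
build_column_query() {
  if ! is_simple_identifier "$1" || ! is_simple_identifier "$2"; then
    echo "invalid schema or table name" >&2
    return 1
  fi
  printf '%s\n' \
    "SELECT QUOTE_IDENT(a.attname) AS col_name" \
    "FROM pg_namespace AS n" \
    "INNER JOIN pg_class AS c ON n.oid = c.relnamespace" \
    "INNER JOIN pg_attribute AS a ON c.oid = a.attrelid" \
    "WHERE c.relkind = 'r'" \
    " AND a.attnum > 0" \
    " AND n.nspname = '$1'" \
    " AND c.relname = '$2'" \
    "ORDER BY a.attnum;"
}

# Example: print the query for a made-up table, public.sales.
build_column_query public sales
```

The printed query can then be handed to psql through its -c option.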

Generic Script to Dump Modified Column Headers into a CSV

Say we want to write a quick shell script that pulls the column names from a table and uses them in another step of our ETL process, for example appending several tables’ data to a CSV file. Keep in mind that Redshift already comes with very robust data migration commands, COPY and UNLOAD, but there’s always some peculiar operation where the existing methods just won’t cut it (such is the life of a professional developer).
For our example we can use the PostgreSQL CLI client psql to query our table’s columns and put them in CSV form; from there the world is our oyster. For example…

fieldnames=""
# -A -t -q makes psql print one bare column name per line; c.relname expects the table name
fieldnames_results=$(psql -X "host=$rs_host port=$rs_port dbname=$rs_dbname user=$rs_user password=$rs_password" -A -t -q -c "SELECT QUOTE_IDENT(a.attname) AS col_name FROM pg_namespace AS n INNER JOIN pg_class AS c ON n.oid = c.relnamespace INNER JOIN pg_attribute AS a ON c.oid = a.attrelid LEFT OUTER JOIN pg_attrdef AS adef ON a.attrelid = adef.adrelid AND a.attnum = adef.adnum WHERE c.relkind = 'r' AND a.attnum > 0 AND n.nspname = '$schema_name' AND c.relname = '$table_name' ORDER BY a.attnum;")
# Word splitting on the unquoted variable yields one item per column name
for fieldnames_results_item in $fieldnames_results
do
fieldnames="$fieldnames,$fieldnames_results_item"
done

# Strip the leading comma added by the first loop iteration
fieldnames="${fieldnames:1}"

… now “fieldnames” holds a comma-delimited string of our table’s column names, created completely on the fly.
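
As an alternative sketch, the same comma-delimited string can be produced without a loop by piping the one-name-per-line output through the standard paste utility. The helper name join_with_commas and the sample column names are made up for illustration; in the real script the input would be the psql output.

```shell
# Hypothetical helper: join newline-separated names from stdin with commas.
join_with_commas() {
  # -s joins all input lines into one; -d, uses a comma as the separator.
  paste -sd, -
}

# Made-up column names standing in for the psql output (one name per line).
printf '%s\n' id customer_name created_at | join_with_commas
```

This prints id,customer_name,created_at, which can be written as the header line of a CSV file before the data rows are appended.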

Code

db-aws-redshift-dynamic-table-schema (GitHub)

Links

Examples of Catalog Queries (aws docs)
Querying the Catalog Tables (aws docs)
awslabs/amazon-redshift-utils (aws github)
v_generate_tbl_ddl.sql, the source of the query excerpts above (aws github): https://github.com/awslabs/amazon-redshift-utils/blob/master/src/AdminViews/v_generate_tbl_ddl.sql
PG_TABLE_DEF (aws docs)
Redshift table discovery sql queries (dev blog)
