Installing Flask

Flask uses Jinja as its HTML templating engine.
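
For example, a route can render a Jinja template directly (a minimal sketch to try after the install steps below; render_template_string is used here so no template file is needed):

from flask import Flask, render_template_string

app = Flask(__name__)

@app.route("/greet/<name>")
def greet(name):
    # Jinja fills in the {{ name }} placeholder and autoescapes it
    return render_template_string("<p>Hello, {{ name }}!</p>", name=name)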

Install Flask

Create a folder for the project

> mkdir myproject ; cd myproject

Create a requirements.txt file

Flask==2.0.2

Type the following commands:

python -m venv venv
.\venv\scripts\activate
pip install -r requirements.txt

Create a file named app.py. Any other name works too, e.g. hello.py, but do not call it flask.py, because that would shadow the Flask package itself.

from flask import Flask

app = Flask(__name__)

@app.route("/")
def hello_world():
    return "<p>Hello, World!</p>"

If the file is named app.py, run Flask like this:

flask run

But if the file has a different name, e.g. hello.py, run it like this (note that the --app option requires Flask 2.2 or newer; with Flask 2.0.2, set the FLASK_APP environment variable instead):

flask --app hello run

Run in debug mode, which restarts the server whenever you make changes to the code. The --debug option also requires Flask 2.2 or newer; on older versions set FLASK_ENV=development instead.

flask run --debug

Open a browser and go to http://127.0.0.1:5000/

HTML Escaping

from flask import Flask
from markupsafe import escape

app = Flask(__name__)

@app.route("/")
def hello_world():
    return "<p>Hello, World!</p>"

@app.route("/<name>")
def hello(name):
    return f"Hello, {escape(name)}!"

Go to http://127.0.0.1:5000/jack

You should see Hello, jack!

Routing

from flask import Flask
from markupsafe import escape

app = Flask(__name__)

@app.route('/')
def index():
    return 'Index Page'

@app.route('/hello')
def hello_world():
    return 'Hello, World'

@app.route("/<name>")
def hello(name):
    return f"Hello, {escape(name)}!"

Variable Rules

from flask import Flask
from markupsafe import escape

app = Flask(__name__)

@app.route('/')
def index():
    return 'Index Page'

@app.route('/hello')
def hello_world():
    return 'Hello, World'

@app.route("/<name>")
def hello(name):
    return f"Hello, {escape(name)}!"

@app.route('/user/<username>')
def show_user_profile(username):
    # show the user profile for that user
    return f'User {escape(username)}'

@app.route('/post/<int:post_id>')
def show_post(post_id):
    # show the post with the given id, the id is an integer
    return f'Post {post_id}'

@app.route('/path/<path:subpath>')
def show_subpath(subpath):
    # show the subpath after /path/
    return f'Subpath {escape(subpath)}'

Converter types:

string (default)  accepts any text without a slash
int               accepts positive integers
float             accepts positive floating point values
path              like string but also accepts slashes
uuid              accepts UUID strings
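
The routes above already cover string, int, and path; here is a minimal sketch of the remaining two converters (the route paths and function names are illustrative):

@app.route('/price/<float:amount>')
def show_price(amount):
    # the value must contain a decimal point, e.g. /price/1.5
    return f'Price {amount}'

@app.route('/item/<uuid:item_id>')
def show_item(item_id):
    # item_id is parsed into a uuid.UUID instance
    return f'Item {item_id}'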

Apache log4net™ with .NET 6 WinForms

Create a Windows Forms App (WinForms) project named WinFormsApp1

Look at the generated Program.cs:

namespace WinFormsApp1
{
    internal static class Program
    {
        /// <summary>
        ///  The main entry point for the application.
        /// </summary>
        [STAThread]
        static void Main()
        {
            // To customize application configuration such as set high DPI settings or default font,
            // see https://aka.ms/applicationconfiguration.
            ApplicationConfiguration.Initialize();
            Application.Run(new Form1());
        }
    }
}

Look at the generated Form1.cs:

namespace WinFormsApp1
{
    public partial class Form1 : Form
    {
        public Form1()
        {
            InitializeComponent();
        }
    }
}

Install the log4net, Microsoft.Extensions.Hosting, and Microsoft.Extensions.Logging.Log4Net.AspNetCore packages.
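
For example, with the dotnet CLI from the project folder (the NuGet package manager in Visual Studio works too):

dotnet add package log4net
dotnet add package Microsoft.Extensions.Hosting
dotnet add package Microsoft.Extensions.Logging.Log4Net.AspNetCore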

After installing, WinFormsApp1.csproj should look like this:

<Project Sdk="Microsoft.NET.Sdk">

  <PropertyGroup>
    <OutputType>WinExe</OutputType>
    <TargetFramework>net6.0-windows</TargetFramework>
    <Nullable>enable</Nullable>
    <UseWindowsForms>true</UseWindowsForms>
    <ImplicitUsings>enable</ImplicitUsings>
  </PropertyGroup>

  <ItemGroup>
    <PackageReference Include="log4net" Version="2.0.15" />
    <PackageReference Include="Microsoft.Extensions.Hosting" Version="6.0.1" />
    <PackageReference Include="Microsoft.Extensions.Logging.Log4Net.AspNetCore" Version="6.1.0" />
  </ItemGroup>

</Project>

Create a log4net.config file and set its Copy to Output Directory property to Copy always.

<?xml version="1.0" encoding="utf-8" ?>
<log4net>
  <appender name="RollingLogFileAppender" type="log4net.Appender.RollingFileAppender">
    <lockingModel type="log4net.Appender.FileAppender+MinimalLock"/>
    <file value="log/" />
    <datePattern value="yyMMdd'Jack.log'" />
    <staticLogFileName value="false"/>
    <appendToFile value="true"/>
    <rollingStyle value="Date"/>
    <maxSizeRollBackups value="100"/>
    <maximumFileSize value="15MB"/>
    <encoding value="UTF-8"/>
    <layout type="log4net.Layout.PatternLayout">
      <param name="ConversionPattern" value="%-5p%d{ HH:mm:ss} li:%line - [%method] %m  %n" />
    </layout>
  </appender>
  <root>
    <level value="ALL"/>
    <appender-ref ref="RollingLogFileAppender"/>
  </root>
</log4net>

Edit Program.cs:

using Microsoft.Extensions.DependencyInjection;
using Microsoft.Extensions.Hosting;
using Microsoft.Extensions.Logging;

namespace WinFormsApp1
{
    internal static class Program
    {
        /// <summary>
        ///  The main entry point for the application.
        /// </summary>
        [STAThread]
        static void Main()
        {
            // To customize application configuration such as set high DPI settings or default font,
            // see https://aka.ms/applicationconfiguration.
            ApplicationConfiguration.Initialize();

            // Generate the Host Builder and register the services for DI
            var builder = new HostBuilder()
               .ConfigureServices((hostContext, services) =>
               {
                   //Register all your services here
                   services.AddLogging(configure => configure.AddConsole())
                           .AddScoped<Form1>();
               }).ConfigureLogging(logBuilder =>
               {
                   logBuilder.SetMinimumLevel(LogLevel.Trace);
                   logBuilder.AddLog4Net("log4net.config");
               });

            var host = builder.Build();

            using (var serviceScope = host.Services.CreateScope())
            {
                var services = serviceScope.ServiceProvider;
                try
                {
                    var form1 = services.GetRequiredService<Form1>();
                    Application.Run(form1);

                    Console.WriteLine("Success");
                }
                catch (Exception ex)
                {
                    Console.WriteLine("Error");
                    Console.WriteLine(ex.ToString());
                }
            }
        }
    }
}

Edit Form1.cs:

using Microsoft.Extensions.Logging;

namespace WinFormsApp1
{
    public partial class Form1 : Form
    {
        private readonly ILogger _logger;
        public Form1(ILogger<Form1> logger)
        {
            InitializeComponent();
            _logger = logger;

            try
            {
                _logger.LogInformation("Form1 Started");
                MessageBox.Show("Hello World!");
            }
            catch (Exception ex)
            {
                _logger.LogError(ex.Message);
            }
        }
    }
}

Delta Lake Features

1. Create a students table

%sql
CREATE TABLE students (
  id INT, name STRING, value DOUBLE);

Look at the statement used to create this table with SHOW CREATE TABLE:

%sql
SHOW CREATE TABLE students
-- output:
CREATE TABLE spark_catalog.default.students (
  id INT,
  name STRING,
  value DOUBLE)
USING delta
TBLPROPERTIES (
  'delta.minReaderVersion' = '1',
  'delta.minWriterVersion' = '2')

Using DESCRIBE EXTENDED allows us to see important metadata about our table.

%sql
DESCRIBE EXTENDED students
%python
df1 = spark.sql('DESCRIBE EXTENDED students')
df1.show()
+--------------------+--------------------+-------+
|            col_name|           data_type|comment|
+--------------------+--------------------+-------+
|                  id|                 int|   null|
|                name|              string|   null|
|               value|              double|   null|
|                    |                    |       |
|# Detailed Table ...|                    |       |
|             Catalog|       spark_catalog|       |
|            Database|             default|       |
|               Table|            students|       |
|                Type|             MANAGED|       |
|            Location|dbfs:/user/hive/w...|       |
|            Provider|               delta|       |
|               Owner|                root|       |
| Is_managed_location|                true|       |
|    Table Properties|[delta.minReaderV...|       |
+--------------------+--------------------+-------+

DESCRIBE DETAIL is another command that allows us to explore table metadata.

%sql
DESCRIBE DETAIL students
%python
df2 = spark.sql('DESCRIBE DETAIL students')
df2.show(vertical=True)
-RECORD 0--------------------------------
 format           | delta                
 id               | 46681f33-7201-4c6... 
 name             | spark_catalog.def... 
 description      | null                 
 location         | dbfs:/user/hive/w... 
 createdAt        | 2023-04-17 16:24:... 
 lastModified     | 2023-04-17 16:24:32  
 partitionColumns | []                   
 numFiles         | 0                    
 sizeInBytes      | 0                    
 properties       | {}                   
 minReaderVersion | 1                    
 minWriterVersion | 2        

Look at the Delta Lake files

%python
display(dbutils.fs.ls('dbfs:/user/hive/warehouse/students'))
%python
li_file = dbutils.fs.ls('dbfs:/user/hive/warehouse/students')
df3 = sqlContext.createDataFrame(li_file)
df3.show()
+--------------------+--------------------+----+----------------+
|                path|                name|size|modificationTime|
+--------------------+--------------------+----+----------------+
|dbfs:/user/hive/w...|         _delta_log/|   0|   1681750558289|
+--------------------+--------------------+----+----------------+

Reviewing Delta Lake Transactions

%sql
DESCRIBE HISTORY students
%python
df4 = spark.sql('DESCRIBE HISTORY students')
df4.show(vertical=True)
-RECORD 0-----------------------------------
 version             | 0                    
 timestamp           | 2023-04-17 16:24:32  
 userId              | 8501686721698164     
 userName            | odl_user_915759@d... 
 operation           | CREATE TABLE         
 operationParameters | {isManaged -> tru... 
 job                 | null                 
 notebook            | {1477724271071511}   
 clusterId           | 0415-162149-6ai590aw 
 readVersion         | null                 
 isolationLevel      | WriteSerializable    
 isBlindAppend       | true                 
 operationMetrics    | {}                   
 userMetadata        | null                 
 engineInfo          | Databricks-Runtim... 

2. Insert data three times

%sql
INSERT INTO students VALUES (1, "Yve", 1.0);
INSERT INTO students VALUES (2, "Omar", 2.5);
INSERT INTO students VALUES (3, "Elia", 3.3);
%python
df2 = spark.sql('DESCRIBE DETAIL students')
df2.show(vertical=True)
-RECORD 0--------------------------------
 format           | delta                
 id               | 46681f33-7201-4c6... 
 name             | spark_catalog.def... 
 description      | null                 
 location         | dbfs:/user/hive/w... 
 createdAt        | 2023-04-17 16:24:... 
 lastModified     | 2023-04-17 16:51:01  
 partitionColumns | []                   
 numFiles         | 3                    
 sizeInBytes      | 2613                 
 properties       | {}                   
 minReaderVersion | 1                    
 minWriterVersion | 2  
%python
li_file = dbutils.fs.ls('dbfs:/user/hive/warehouse/students')
df3 = sqlContext.createDataFrame(li_file)
df3.show()
+--------------------+--------------------+----+----------------+
|                path|                name|size|modificationTime|
+--------------------+--------------------+----+----------------+
|dbfs:/user/hive/w...|         _delta_log/|   0|   1681750558289|
|dbfs:/user/hive/w...|part-00000-1d6df3...| 868|   1681750256000|
|dbfs:/user/hive/w...|part-00000-57bc12...| 872|   1681750261000|
|dbfs:/user/hive/w...|part-00000-ec8db3...| 873|   1681750259000|
+--------------------+--------------------+----+----------------+
%python
display(spark.sql(f"SELECT * FROM json.`dbfs:/user/hive/warehouse/students/_delta_log/00000000000000000001.json`"))

Running this statement will throw an error:

%sql
SELECT * FROM parquet.`dbfs:/user/hive/warehouse/students/part-00000-1d6df344-3187-42fd-8591-df4ed47b403f.c000.snappy.parquet`
AnalysisException: Incompatible format detected.

A transaction log for Delta was found at `dbfs:/user/hive/warehouse/students/_delta_log`,
but you are trying to read from `dbfs:/user/hive/warehouse/students/part-00000-57bc1277-813b-41c0-990d-4ba8fd05c9b1.c000.snappy.parquet` using format("parquet"). You must use
'format("delta")' when reading and writing to a delta table.

To disable this check, SET spark.databricks.delta.formatCheck.enabled=false
To learn more about Delta, see https://docs.databricks.com/delta/index.html; line 1 pos 14

SET this flag first:

%sql
SET spark.databricks.delta.formatCheck.enabled=false

Run it again and now it works:

%sql
SELECT * FROM parquet.`dbfs:/user/hive/warehouse/students/part-00000-1d6df344-3187-42fd-8591-df4ed47b403f.c000.snappy.parquet`

The CREATE VIEW statement

Constructs a virtual table that has no physical data, based on the result set of a SQL query. ALTER VIEW and DROP VIEW change only metadata.

Syntax

CREATE [ OR REPLACE ] [ TEMPORARY ] VIEW [ IF NOT EXISTS ] view_name
    [ column_list ]
    [ COMMENT view_comment ]
    [ TBLPROPERTIES clause ]
    AS query

column_list
   ( { column_alias [ COMMENT column_comment ] } [, ...] )

Example

%sql
CREATE OR REPLACE TEMPORARY VIEW demo_tmp2(name, value) AS
VALUES
  ("Yi", 1),
  ("Ali", 2),
  ("Selina", 3)

You can also use TEMP instead of TEMPORARY:

%sql
CREATE OR REPLACE TEMP VIEW demo_tmp1(name, value) AS
VALUES
  ("Yi", 1),
  ("Ali", 2),
  ("Selina", 3)

PySpark: display a Spark DataFrame in a table format

Create a PySpark DataFrame named df

%python
df = sqlContext.createDataFrame([("foo", 1), ("bar", 2), ("baz", 3)], ("k", "v"))
%python
print(type(df))

# <class 'pyspark.sql.dataframe.DataFrame'>

Display the table

%python
df.show()

# +---+---+
# |  k|  v|
# +---+---+
# |foo|  1|
# |bar|  2|
# |baz|  3|
# +---+---+

Display the table, specifying n = 2

%python
df.show(n=2)

# +---+---+
# |  k|  v|
# +---+---+
# |foo|  1|
# |bar|  2|
# +---+---+
# only showing top 2 rows
%python
df.show(2, True)

# +---+---+
# |  k|  v|
# +---+---+
# |foo|  1|
# |bar|  2|
# +---+---+
# only showing top 2 rows

Documentation

%python
df.show?
Signature:
df.show(
    n: int = 20,
    truncate: Union[bool, int] = True,
    vertical: bool = False,
) -> None
Docstring:
Prints the first ``n`` rows to the console.

.. versionadded:: 1.3.0

Parameters
----------
n : int, optional
    Number of rows to show.
truncate : bool or int, optional
    If set to ``True``, truncate strings longer than 20 chars by default.
    If set to a number greater than one, truncates long strings to length ``truncate``
    and align cells right.
vertical : bool, optional
    If set to ``True``, print output rows vertically (one line
    per column value).

Examples
--------
>>> df
DataFrame[age: int, name: string]
>>> df.show()
+---+-----+
|age| name|
+---+-----+
|  2|Alice|
|  5|  Bob|
+---+-----+
>>> df.show(truncate=3)
+---+----+
|age|name|
+---+----+
|  2| Ali|
|  5| Bob|
+---+----+
>>> df.show(vertical=True)
-RECORD 0-----
 age  | 2
 name | Alice
-RECORD 1-----
 age  | 5
 name | Bob
File:      /databricks/spark/python/pyspark/sql/dataframe.py

Install the Java JDK on Ubuntu 20.04

Installing Java

Update the package lists first

$ sudo apt update

Install Java JDK 11 (OpenJDK):

$ sudo apt install default-jdk

To install Java 8 instead, use:

sudo apt install openjdk-8-jdk

The programs are installed under /usr/lib/jvm/java-11-openjdk-amd64/bin/

    $ ls -l /usr/lib/jvm/java-11-openjdk-amd64/bin/java*
    -rwxr-xr-x 1 root root 14560 ม.ค.  20 16:07 /usr/lib/jvm/java-11-openjdk-amd64/bin/java
    -rwxr-xr-x 1 root root 14608 ม.ค.  20 16:07 /usr/lib/jvm/java-11-openjdk-amd64/bin/javac
    -rwxr-xr-x 1 root root 14608 ม.ค.  20 16:07 /usr/lib/jvm/java-11-openjdk-amd64/bin/javadoc
    -rwxr-xr-x 1 root root 14576 ม.ค.  20 16:07 /usr/lib/jvm/java-11-openjdk-amd64/bin/javap

    Verify the installation on Ubuntu 20.04.6:

    $ java --version
    openjdk 11.0.18 2023-01-17
    OpenJDK Runtime Environment (build 11.0.18+10-post-Ubuntu-0ubuntu120.04.1)
    OpenJDK 64-Bit Server VM (build 11.0.18+10-post-Ubuntu-0ubuntu120.04.1, mixed mode, sharing)
    $ javac --version
    javac 11.0.18

    Verify the installation on Ubuntu 22.04.2:

    $ java --version
    openjdk 11.0.18 2023-01-17
    OpenJDK Runtime Environment (build 11.0.18+10-post-Ubuntu-0ubuntu122.04)
    OpenJDK 64-Bit Server VM (build 11.0.18+10-post-Ubuntu-0ubuntu122.04, mixed mode, sharing)
    $ javac --version
    javac 11.0.18

    Managing Java

    Use the update-alternatives command:

    $ update-alternatives --help
    Usage: update-alternatives [<option> ...] <command>
    
    Commands:
      --install <link> <name> <path> <priority>
        [--slave <link> <name> <path>] ...
                               add a group of alternatives to the system.
      --remove <name> <path>   remove <path> from the <name> group alternative.
      --remove-all <name>      remove <name> group from the alternatives system.
      --auto <name>            switch the master link <name> to automatic mode.
      --display <name>         display information about the <name> group.
      --query <name>           machine parseable version of --display <name>.
      --list <name>            display all targets of the <name> group.
      --get-selections         list master alternative names and their status.
      --set-selections         read alternative status from standard input.
      --config <name>          show alternatives for the <name> group and ask the
                               user to select which one to use.
      --set <name> <path>      set <path> as alternative for <name>.
      --all                    call --config on all alternatives.
    
    <link> is the symlink pointing to /etc/alternatives/<name>.
      (e.g. /usr/bin/pager)
    <name> is the master name for this link group.
      (e.g. pager)
    <path> is the location of one of the alternative target files.
      (e.g. /usr/bin/less)
    <priority> is an integer; options with higher numbers have higher priority in
      automatic mode.
    
    Options:
      --altdir <directory>     change the alternatives directory.
      --admindir <directory>   change the administrative directory.
      --log <file>             change the log file.
      --force                  allow replacing files with alternative links.
      --skip-auto              skip prompt for alternatives correctly configured
                               in automatic mode (relevant for --config only)
      --quiet                  quiet operation, minimal output.
      --verbose                verbose operation, more output.
      --debug                  debug output, way more output.
      --help                   show this help message.
      --version                show the version.

    You can have multiple Java installations on one server. You can configure which version is the default for use on the command line by using the update-alternatives command.

    $ sudo update-alternatives --config java

    If there is only one java installed, you will see something like this:

    $ sudo update-alternatives --config java
    There is only one alternative in link group java (providing /usr/bin/java): /usr/lib/jvm/java-11-openjdk-amd64/bin/java
    Nothing to configure.

    But if several javas are installed, it will show a list and let you choose.

    The same applies to javac:

    $ sudo update-alternatives --config javac

    Setting the JAVA_HOME

    $ sudo nano /etc/environment

    At the end of this file, add one of the following lines; do not include the bin/ portion of the path (you can find the path with the update-alternatives command):

    JAVA_HOME="/usr/lib/jvm/java-11-openjdk-amd64"
    JAVA_HOME="/usr/lib/jvm/java-8-openjdk-amd64"

    Modifying this file will set the JAVA_HOME path for all users on your system.

    Save the file and exit the editor.

    Now reload this file to apply the changes to your current session:

    $ source /etc/environment

    Verify that the environment variable is set:

    $ echo $JAVA_HOME
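
    It should print the path you configured, e.g.:

    /usr/lib/jvm/java-11-openjdk-amd64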


    The apt command

    Look at the help for the apt command:

    $ apt --help
    apt 2.4.8 (amd64)
    Usage: apt [options] command
    
    apt is a commandline package manager and provides commands for
    searching and managing as well as querying information about packages.
    It provides the same functionality as the specialized APT tools,
    like apt-get and apt-cache, but enables options more suitable for
    interactive use by default.
    
    Most used commands:
      list - list packages based on package names
      search - search in package descriptions
      show - show package details
      install - install packages
      reinstall - reinstall packages
      remove - remove packages
      autoremove - Remove automatically all unused packages
      update - update list of available packages
      upgrade - upgrade the system by installing/upgrading packages
      full-upgrade - upgrade the system by removing/installing/upgrading packages
      edit-sources - edit the source information file
      satisfy - satisfy dependency strings
    
    See apt(8) for more information about the available commands.
    Configuration options and syntax is detailed in apt.conf(5).
    Information about how to configure sources can be found in sources.list(5).
    Package and version choices can be expressed via apt_preferences(5).
    Security details are available in apt-secure(8).
                                            This APT has Super Cow Powers.

    Update and Upgrade packages

    update list of available packages

    $ sudo apt update

    List the packages that can be upgraded:

    $ apt list --upgradable

    upgrade the system by installing/upgrading packages

    $ sudo apt upgrade

    Run apt update followed by apt upgrade in a single line using &&.

    When using &&, the second command is executed only if the first one has completed successfully.

    $ sudo apt update && sudo apt upgrade

    Install or Remove packages

    install packages

    $ sudo apt install <package_name>

    remove packages

    $ sudo apt remove <package_name>

    Remove automatically all unused packages

    $ sudo apt autoremove

    reinstall packages

    sudo apt reinstall <package_name>
    sudo apt reinstall lighttpd

    Hold a package with apt-mark

    sudo apt-mark hold <package_name>
    sudo apt-mark hold sudo

    Unhold a package with apt-mark

    sudo apt-mark unhold <package_name>
    sudo apt-mark unhold sudo

    Other

    show package details

    Show information about the given package(s), including dependencies, installation and download size, the sources the package is available from, a description of the package's content, and more:

    $ apt show <package_name>
    $ apt show sudo

    List package dependencies

    apt depends <package_name>
    apt depends sudo

    search in package descriptions

    apt search php
    apt search mysql-5.?
    apt search mysql-server-5.?
    apt search httpd*
    apt search ^apache
    apt search ^nginx
    apt search ^nginx$

    apt search looks through package descriptions, which can return too many results and make the package you want hard to find; try apt list instead.

    apt list
    apt list | more
    apt list | grep foo
    apt list | grep php7-
    
    apt list nginx
    apt list 'php7*'

    List all installed packages

    apt list --installed
    apt list --installed | grep <package_name>


    Write Excel with PySpark

    On the cluster, install com.crealytics:spark-excel-2.12.17-3.0.1_2.12:3.0.1_0.18.1

    Create a PySpark DataFrame

    %python
    data = [('A', "1"),
            ('B', "2"),
            ('C', "3"),
            ('D', "4")
            ]
    print(type(data))  # <class 'list'>
    df = spark.createDataFrame(data)
    print(type(df))    # <class 'pyspark.sql.dataframe.DataFrame'>
    display(df)

    Write an Excel file

    %python
    path = '/mnt/xxx/tmp/'
    filename = f'{path}output1.xlsx'
    print(f'filename = {filename}')
    df.write.format("com.crealytics.spark.excel")\
      .option("header", "true")\
      .mode("overwrite")\
      .save(filename)

    Run %fs ls '/mnt/xxx/tmp/' and you will now see the file dbfs:/mnt/xxx/tmp/output1.xlsx

    Create another DataFrame

    %python
    columns = ['Identifier', 'Value', 'Extra Discount']
    vals = [(1, 150, 0), (2, 160, 12)]
    df2 = spark.createDataFrame(vals, columns)
    df2.show()
    
    # +----------+-----+--------------+
    # |Identifier|Value|Extra Discount|
    # +----------+-----+--------------+
    # |         1|  150|             0|
    # |         2|  160|            12|
    # +----------+-----+--------------+

    Write in append mode, placing the data at cells B3 through C35. Because that range is only two columns wide, the third column (Extra Discount) is dropped:

    %python
    df2.write.format("com.crealytics.spark.excel") \
      .option("dataAddress", "'My Sheet'!B3:C35") \
      .option("header", "true") \
      .mode("append") \
      .save(filename)
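
    To check what landed in that sheet, you can read it back with the same connector (a minimal sketch; dataAddress and header are standard spark-excel read options):

    %python
    df_check = spark.read.format("com.crealytics.spark.excel") \
      .option("dataAddress", "'My Sheet'!B3") \
      .option("header", "true") \
      .load(filename)
    df_check.show()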

    Write Excel with Pandas

    Create a pandas DataFrame

    %python
    import pandas as pd
    import openpyxl
    
    df = pd.DataFrame([[11, 21, 31], [12, 22, 32], [31, 32, 33]],
                      index=['one', 'two', 'three'], columns=['a', 'b', 'c'])
    
    print(df)
    
    #         a   b   c
    # one    11  21  31
    # two    12  22  32
    # three  31  32  33
    
    print(type(df))
    # <class 'pandas.core.frame.DataFrame'>

    Write the pandas DataFrame to a file

    %python
    path = '/tmp/'
    filename = f'{path}output1.xlsx'
    print(filename)
    with pd.ExcelWriter(filename) as writer:  
         df.to_excel(writer, sheet_name='Sheet_name_1')
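
    To verify the result, read the file back (sheet_name and index_col match what was written above):

    %python
    df_check = pd.read_excel(filename, sheet_name='Sheet_name_1', index_col=0)
    print(df_check)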

    Python Naming Conventions

    The naming styles

    • b (single lowercase letter)
    • B (single uppercase letter)
    • lowercase
    • lower_case_with_underscores
    • UPPERCASE
    • UPPER_CASE_WITH_UNDERSCORES
    • CapitalizedWords (or CapWords, or CamelCase – so named because of the bumpy look of its letters [4]). This is also sometimes known as StudlyCaps. Note: when using acronyms in CapWords, capitalize all the letters of the acronym. Thus HTTPServerError is better than HttpServerError.
    • mixedCase (differs from CapitalizedWords by initial lowercase character!)
    • Capitalized_Words_With_Underscores (ugly!)

    Class Names

    Class names should normally use the CapWords convention.

    Function Names

    Function names should be lowercase, with words separated by underscores as necessary to improve readability. (lower_case_with_underscores)

    Variable Names

    Variable names follow the same convention as function names.
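
    A small sketch pulling these conventions together (all names are illustrative):

    MAX_RETRIES = 3  # module-level constant: UPPER_CASE_WITH_UNDERSCORES


    class HTTPServerError(Exception):  # class name: CapWords, acronym fully capitalized
        pass


    def fetch_with_retry(url, max_retries=MAX_RETRIES):  # function name: lower_case_with_underscores
        retry_count = 0  # variable names follow the same convention as function names
        while retry_count < max_retries:
            retry_count += 1
        return retry_count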