02. String Operations

Last updated: 2020-02-10

Introduction

A string is an instance of the built-in class str.

Strings are immutable: every operation that seems to change a string returns a new one.

 

Contents

  • Concatenating strings
  • " and '
  • """ and '''
  • \ line continuation
  • raw string
  • Converting numbers to strings
  • Slicing and Indexing
  • unicode
  • String Function
  • String attribute
  • Placeholder
  • Encode and Decode
  • ascii type()
  • newline char
  • remove newline in string
  • check empty line
  • print without newline
  • StringIO

 


Concatenating strings:

s = 'a' + 'b' + 'c'

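'+' does not convert types automatically: 'a' + 1 raises TypeError, so write 'a' + str(1).
 

" and '

  • 'Let\'s go!'
  • "Let's go!"

In Python, ' and " are equivalent. Choosing the other quote as the delimiter avoids the backslash escape.

 

""" and '''

They are used to enter multi-line text.

multi = """It was the best of times.
It was the worst of times."""

Multi-line:

If the text would otherwise begin with nothing but a line break ("\n"), add a '\' right after the opening quotes to drop that line break.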

'''\
<?xml version="1.0"?>
<!DOCTYPE root SYSTEM "test" [ <!ENTITY tasty "parsnips"> ]>
<root>
  <a>&tasty;</a>
</root>
'''
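
The effect of the backslash after the opening quotes can be checked directly. This is a minimal sketch for Python 3; the variable names and the tiny <root/> document are made up for illustration.

```python
# Without the backslash, the literal starts with a newline character.
with_newline = '''
<root/>
'''

# With the backslash, the line break after the opening quotes is dropped,
# so the text starts at the very first character (useful for an XML declaration).
without_newline = '''\
<root/>
'''

print(repr(with_newline))
print(repr(without_newline))
```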

 

\ line continuation

m = 1 + 2 + 3 + \
      4 + 5 + 6
 

raw string:

mystr = 'Let \n "s go!'
print(mystr)

# Let
#  "s go!

raw = r'Let \n "s go!'
print(raw)

# Let \n "s go!

A raw string literal is prefixed with 'r'; backslashes inside it are kept literally.
A 'u' prefix writes a unicode string literal (needed in Python 2; in Python 3 every str is already unicode).
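
Raw strings are mostly useful for regular expressions and Windows-style paths. The following is a small sketch for Python 3; the path and the pattern are invented examples.

```python
import re

plain = 'C:\new\table.txt'    # \n and \t are interpreted as newline and tab
raw = r'C:\new\table.txt'     # backslashes are kept exactly as written

print(len(plain), len(raw))   # the raw version is two characters longer

# In a regular expression the raw prefix keeps \d and \. intact.
pattern = r'\d+\.\d+'
match = re.search(pattern, 'version 3.14 released')
print(match.group())
```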

 


int to string

str

  • str(3)
  • `3`         <--  same as repr(3) (Python 2 only; backticks were removed in Python 3)

chr

print(chr(0x30))                    # 0

 


string to int

 

# Here age is a string object
age = "18"
print(age)
# Converting string to integer
int_age = int(age)
print(int_age)

 


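Left-padding a numeric string with "0"

 

Method 1: zfill()

zfill() returns a string of the given width: the original string is right-aligned and padded with 0 on the left.

ns = str(n).zfill(digits)

e.g.

print('3'.zfill(5))

00003

Method 2:

'%0Nd' % n          # N = total width, n = an int

e.g.

print('%05d' % 3)

00003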
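
The same zero padding can also be written with a computed width. A short sketch for Python 3; n and width are illustrative names.

```python
n = 3
width = 5

by_zfill = str(n).zfill(width)
by_percent = '%0*d' % (width, n)            # '*' takes the width from the tuple
by_format = '{:0{w}d}'.format(n, w=width)   # nested field supplies the width
by_fstring = f'{n:0{width}d}'               # f-string, Python 3.6+

print(by_zfill, by_percent, by_format, by_fstring)
```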

 


Slicing and Indexing

 

"abcdefg"[1]             # b

"abcdefg"[-1]            # g

"abcdefg"[0:4]          # abcd

"abcdefg"[-3:]           # efg

"abcdefg"[:3]            # abc

"abcdefg"[::2]           # aceg  <- 0,2,4,6

"abcdefg"[::-2]         # geca

"a" * 5                      # aaaaa

 'P' in 'Python'            # True
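
A few more slicing behaviours, sketched for Python 3 with the same sample string. The variable names are only for illustration.

```python
s = "abcdefg"

reversed_s = s[::-1]          # step -1 walks the string backwards
middle = s[2:-2]              # drop two characters from each end
clipped = s[4:100]            # out-of-range slice bounds are clipped, no error

try:
    s[100]                    # a single out-of-range index does raise
except IndexError:
    index_failed = True

# Strings are immutable, so "changing" a character means building a new string.
changed = s[:2] + 'X' + s[3:]

print(reversed_s, middle, clipped, changed)
```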

 


String Function

 

Basic

  • len(s)
  • s.lower()
  • s.upper()
  • s.strip()                              # remove leading and trailing whitespace

checking

  • s.isalpha()
  • s.isdigit()
  • s.isspace()
  • s.startswith('start')             # True, if s starts with 'start'
  • s.endswith('end')

search & replace

  • s.find('other')                           # return -1 if not found
  • s.replace('old', 'new' [,max])     # replace 'old' with 'new' in s; at most max occurrences if max is given, otherwise all of them

split & join

s.split('delim')                      #  'aaa,bbb,ccc'.split(',') -> ['aaa', 'bbb', 'ccc']

s.join(list)

e.g.

', '.join("abc")                         # 'a, b, c'

'-'.join(['a1', 'b2', 'c3'])           # 'a1-b2-c3'
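
The search, replace, split and join calls can be tried together. A minimal sketch for Python 3; the sample string is made up.

```python
s = 'aaa,bbb,ccc,aaa'

found = s.find('bbb')                      # index of the first match
missing = s.find('zzz')                    # -1 when there is no match

all_replaced = s.replace('aaa', 'X')       # no count: every occurrence
one_replaced = s.replace('aaa', 'X', 1)    # count=1: only the first one

parts = s.split(',')                       # string -> list
rejoined = '-'.join(parts)                 # list -> string

print(found, missing, all_replaced, one_replaced, parts, rejoined)
```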

 

Conversion

  • chr(i)                                            # integer code point i -> character c
  • ord(c)                                           # character c -> integer code point i

 

cmp comparison (Python 2 only; cmp() was removed in Python 3):

print cmp(2,1)                         # 1

print cmp(1,1)                         # 0

print cmp(1,2)                         # -1
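
In Python 3 there is no built-in cmp(). A common stand-in is the expression (a > b) - (a < b); the helper below is a sketch that reuses the old name for illustration only.

```python
def cmp(a, b):
    """Return 1, 0 or -1, like the Python 2 built-in cmp()."""
    return (a > b) - (a < b)

print(cmp(2, 1), cmp(1, 1), cmp(1, 2))
print(cmp('apple', 'banana'))      # strings compare lexicographically
```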

 


String attribute

 

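string.lowercase
string.uppercase
string.letters       # lowercase + uppercase
string.digits
string.hexdigits
string.printable
string.whitespace    # space, tab, linefeed, return, formfeed, and vertical tab

lowercase, uppercase and letters are locale-dependent and exist only in Python 2; Python 3 provides ascii_lowercase, ascii_uppercase and ascii_letters.

e.g.

import string
string.lowercase[:14]
'abcdefghijklmn'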

 


Placeholder

 

Methods

  • %X
  • str.format()
  • f-strings

 

%X

  • %d     # int
  • %s     # string
  • %f     # floating point

e.g.

>>> fmt = "Hello, %s. %s enough for ya?"
>>> values = ('world', 'Hot')
>>> print(fmt % values)

Hello, world. Hot enough for ya?

Width and Precision

print('%.3s' % 'abcdefg')          # abc

print('%.*s' % (3, 'abcdefg'))    # abc

e.g.

from math import pi

"%10.3f" % pi     # width 10, 3 decimals  '     3.142'

'%010.3f' % pi    # zero-padded           '000003.142'

'%-10.2f' % pi    # left-aligned          '3.14      '

Left-aligning %s

'%-6s' % 'ab'     # 'ab    '

str.format()

print("My name is {}.".format("Datahunter"))

# Multiple placeholders

print('{0} : {1}, {2}'.format("string0", "string1", "string2"))

# Variable substitutions

print("{who} name is {name}.".format(name="Datahunter", who="My"))

# Value formatting

print('{:.2f}'.format(12345.12345))

print('{0:.2f}'.format(12345.12345))

import datetime
today = datetime.datetime.today()
print("{t:%B %d, %Y}".format(t=today))

# Padding Substitutions

<     left-align text in the field
^     center text in the field
>     right-align text in the field

# Dictionary for string formatting

acl = {"uid": 0, "username": "datahunter", "is_admin": "True"}

print('{uid}, {username}: {is_admin}'.format(**acl))

 * is_admin is the string "True" here; a real boolean True in a plain {is_admin} field is also rendered as 'True'.
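
The padding options (<, ^, >) and dictionary substitution can be combined in one format string. A small sketch for Python 3 using the acl dictionary from the notes; the field widths are arbitrary choices.

```python
name = 'abc'

left = '{:<7}|'.format(name)      # pad on the right
center = '{:^7}|'.format(name)    # pad on both sides
right = '{:>7}|'.format(name)     # pad on the left
filled = '{:*^7}'.format(name)    # a fill character goes before the alignment

acl = {"uid": 0, "username": "datahunter", "is_admin": "True"}
row = '{uid:>4} | {username:<12} | {is_admin:^8}'.format(**acl)

for line in (left, center, right, filled, row):
    print(line)
```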

 

f-strings

Added by PEP 498 (Python 3.6); f-strings provide the same functionality as str.format().

e.g. a netdata API URL

api_url = "http://localhost:19999/api/v1/data?chart="
chart = "net.ens192"
after = 86400
url = f'{api_url}{chart}&after=-{after}'

e.g. datetime format

import datetime
today = datetime.datetime.today()
print(f"{today:%B %d, %Y}")

Out (when run on February 20, 2024): February 20, 2024
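
f-strings accept any expression and the same format specs as str.format(). A minimal sketch for Python 3.6+; the names and the fixed date are chosen only so that the output is stable.

```python
import datetime

user = 'datahunter'
width = 12
when = datetime.datetime(2020, 2, 10)    # fixed date instead of today()

padded = f'[{user:>{width}}]'            # nested field supplies the width
upper = f'{user.upper()}'                # any expression is allowed
amount = f'{12345.12345:,.2f}'           # thousands separator, 2 decimals
dated = f'{when:%Y-%m-%d}'               # strftime codes after the colon

print(padded, upper, amount, dated)
```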

 


Unicode

 

unicode type (Python 2)

*  built-in type

unicode(string[, encoding, errors])

string = an 8-bit (byte) string

>>> unicode('abcdef')
u'abcdef'

is equivalent to

>>> u'abcdef'

How len() differs

<1>

# coding=Big5
# The line above tells Python that the source file is encoded in Big5.

text = '測試'

# len() counts the bytes of the byte string, not the characters
print len(text)     # 4

<2>

# coding=Big5

text = u'測試'

# len() of a unicode instance counts characters
print len(text)     # 2

<3>

# coding=utf-8

text = '中文'

# UTF-8 uses three bytes for each of these characters
print len(text)     # 6
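
In Python 3 the same distinction is between str (characters) and bytes. A sketch assuming Python 3, using the sample text from the notes:

```python
text = '測試'                            # str: a sequence of characters

big5_bytes = text.encode('big5')         # 2 bytes per character
utf8_bytes = text.encode('utf-8')        # 3 bytes per character here

print(len(text), len(big5_bytes), len(utf8_bytes))
```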

 


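Reading a UTF-8 file

 

# Read a UTF-8 text file; encoding can be e.g. Big5 or UTF-8
# (the encoding argument of open() is Python 3; in Python 2 use io.open())

f = open(name, 'r', encoding='UTF-8')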
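
A round trip through a temporary file shows what the encoding argument does. This is a sketch for Python 3; the file name sample.txt is hypothetical.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'sample.txt')     # hypothetical file name

    with open(path, 'w', encoding='utf-8') as f:
        f.write('中文')

    # Text mode with an encoding returns decoded characters.
    with open(path, 'r', encoding='utf-8') as f:
        text = f.read()

    # Binary mode returns the raw bytes stored on disk.
    with open(path, 'rb') as f:
        raw = f.read()

print(len(text), len(raw))
```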

 


Encode and Decode

 

codecs - base classes for standard Python codecs

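# When Python converts between encodings

source encoding -> internal encoding -> target encoding

* In Python 2 the default source encoding is 'ascii'
* Internally Python uses unicode

Python 2 has two possible internal unicode representations

  • UCS-2 (65536 code points)
  • UCS-4 (2147483648 code points)

* The choice is made at build time with --enable-unicode=ucs2 or --enable-unicode=ucs4

# Check whether the build uses UCS-2 or UCS-4

import sys
print(sys.maxunicode)

Output

# Linux (UCS-4)

1114111

# Windows (UCS-2)

65535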

decode and encode

The str/unicode methods return the converted object. The codec-level functions (e.g. those obtained from codecs.lookup()) decode or encode the input and return a tuple (output, length consumed).

decode: source encoding -> Python's internal encoding

converts a plain string encoded using a particular character set encoding to a Unicode object

encode: Python's internal encoding -> target encoding

converts a Unicode object to a plain string using a particular character set encoding

Usage

decode([codec])

codec = ascii, utf-8, big5 ...                      # default is ascii (Python 2)

Using the wrong codec raises

UnicodeDecodeError: 'ascii' codec can't decode byte 0xe4 in position 0: ordinal not in range(128)

# Decode mystr (UTF-8 bytes) into an internal unicode string

newstr = mystr.decode('utf-8')

# Encode the internal unicode string as Big5 for output

newstr.encode('big5')
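
The same source -> internal -> target conversion in Python 3 goes bytes -> str -> bytes. A minimal sketch; the sample bytes stand in for data that would normally come from a file or socket.

```python
utf8_bytes = '中文'.encode('utf-8')       # stand-in for externally supplied data

text = utf8_bytes.decode('utf-8')         # source encoding -> str
big5_bytes = text.encode('big5')          # str -> target encoding

# Decoding with the wrong codec raises UnicodeDecodeError.
try:
    utf8_bytes.decode('ascii')
except UnicodeDecodeError as exc:
    reason = exc.reason
    position = exc.start

# errors='replace' substitutes U+FFFD for each undecodable byte instead.
lossy = utf8_bytes.decode('ascii', errors='replace')

print(text, len(big5_bytes), reason, position)
```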

Length before and after decoding Big5

#!/usr/bin/env python
# coding=big5

mytext = "中"

# 2 (a Big5 character takes two bytes)
print len(mytext)

mybig5 = mytext.decode("big5")

# 1
print len(mybig5)

"codecs" Usage

import codecs

codecs.encode(obj[, encoding[, errors]])
codecs.decode(obj[, encoding[, errors]])

usage:

#!/usr/bin/env python
# coding=big5
import codecs

mytext = "中"

# 2
print len(mytext)

big5_codec = codecs.lookup("big5")

# (u'\u4e2d', 2)
# (data, number of bytes consumed)
print big5_codec.decode(mytext)

codecs.open() opens a file and decodes its contents to internal unicode automatically while reading.
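
The difference between the codec-level decode (which returns a tuple) and codecs.decode() (which returns only the object) can be seen in Python 3 as well. A short sketch; the variable names are illustrative.

```python
import codecs

big5_bytes = '中'.encode('big5')

info = codecs.lookup('big5')                   # CodecInfo for the Big5 codec
decoded, consumed = info.decode(big5_bytes)    # (text, number of bytes consumed)

same_text = codecs.decode(big5_bytes, 'big5')  # returns only the decoded text
utf8_bytes = codecs.encode(same_text, 'utf-8') # returns only the encoded bytes

print(decoded, consumed, same_text, utf8_bytes)
```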

 


ascii type()

 

<1>

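#coding=utf-8

mytext=u"中文"

# Guess the language: any code point above 127 means the text is not plain English
def isEng(msg):
    for n in msg:
        if ord(n) > 127:
            return "zh"
    return "en"

print(isEng(mytext))                # zh

# True if every character is in the ASCII range
def is_ascii(s):
    return all(ord(c) < 128 for c in s)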
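
Two other ways to make the same ASCII check in Python 3, as a sketch: trying to encode to ASCII, and the str.isascii() method (available from Python 3.7). The helper name is hypothetical.

```python
def is_ascii_by_encoding(s):
    """Return True if s can be encoded as ASCII without errors."""
    try:
        s.encode('ascii')
    except UnicodeEncodeError:
        return False
    return True

english = 'hello'
chinese = '中文'

print(is_ascii_by_encoding(english), is_ascii_by_encoding(chinese))
print(english.isascii(), chinese.isascii())    # Python 3.7+
```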

<2>

In Python 2, a string may be of type str or of type unicode

def whatisthis(s):

    if isinstance(s, str):
        print "ordinary string"
    elif isinstance(s, unicode):
        print "unicode string"
    else:
        print "not a string"

# int
x = 5
print isinstance(x, int)            # True

# ordinary string
y = '測試'
print isinstance(y, str)            # True

# unicode string
z = u'測試'
print isinstance(z, unicode)        # True

# basestring is the common base class of str and unicode
print isinstance(y, basestring)     # True
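
In Python 3 the two string types are str (text) and bytes, and basestring and unicode no longer exist. A sketch of the equivalent check; the function name and return labels are made up.

```python
def what_is_this(value):
    """Classify a value as text, bytes, or something else (Python 3)."""
    if isinstance(value, str):
        return 'text string'
    if isinstance(value, (bytes, bytearray)):
        return 'byte string'
    return 'not a string'

print(what_is_this('測試'))
print(what_is_this('測試'.encode('utf-8')))
print(what_is_this(5))
```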

==================

type(s)

For a byte string type() reports str; for a unicode string it reports unicode.

text = '測試'
print len(text)     # 6 (UTF-8 source)
print type(text)    # <type 'str'>

text = u'測試'
print len(text)     # 2
print type(text)    # <type 'unicode'>

==================

s.decode('ascii')

# If it raises an exception, the string is not 100% ASCII.

 


ascii_*

 

>>> import string
>>> string.ascii_lowercase[:14]
'abcdefghijklmn'
>>> string.ascii_lowercase[:14:2]
'acegikm'

 


Newline char

 

import os

print("line 1" + os.linesep + "line 2")

 


Remove newline in string

 

>>> 'Mac EOL\r'.rstrip('\r\n')
'Mac EOL'

>>> 'Windows EOL\r\n'.rstrip('\r\n')
'Windows EOL'

>>> 'Unix EOL\n'.rstrip('\r\n')
'Unix EOL'

Using '\r\n' as the parameter to rstrip means that it will strip out any trailing combination of '\r' or '\n'.

That's why it works in all three cases above.
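
Because rstrip() takes a set of characters rather than a suffix, it removes every trailing '\r' and '\n', not just one line ending. A small sketch for Python 3 with made-up sample data, plus splitlines() as an alternative for whole texts:

```python
lines = ['Mac EOL\r', 'Windows EOL\r\n', 'Unix EOL\n', 'no EOL']
cleaned = [line.rstrip('\r\n') for line in lines]

# Several trailing line endings are all removed in one call.
collapsed = 'data\n\n\r\n'.rstrip('\r\n')

# splitlines() understands \r, \n and \r\n and drops them while splitting.
pieces = 'a\r\nb\nc\r'.splitlines()

print(cleaned, repr(collapsed), pieces)
```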

 


Check for empty lines

 

with open(myfile) as f:
    for line in f:
        if not line.strip():
            continue                     # skip blank lines
        print(line.rstrip('\r\n'))

line.strip() removes whitespace from both ends of the string.

If the line is completely empty, or contains only spaces and tabs,

line.strip() returns an empty string '', which evaluates to false.
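
The blank-line test can be tried without a real file. In this Python 3 sketch an io.StringIO object stands in for the opened file, and the sample text is invented.

```python
import io

# Stand-in for open(myfile): four lines, two of them blank or whitespace-only.
sample = io.StringIO('first\n\n   \t\nsecond\n')

kept = [line.rstrip('\r\n') for line in sample if line.strip()]

print(kept)
```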

 


Print without newline

 

import sys
sys.stdout.write('some thing')
sys.stdout.flush()

OR (Python 2: a trailing comma suppresses the newline and prints a space instead)

print "this should be",
print "on the same line"
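
In Python 3 print() is a function and the end argument controls what is written after the text. A minimal sketch; the output is sent to an in-memory buffer here only so that the result can be inspected.

```python
import io
import sys

buffer = io.StringIO()

# end=' ' replaces the default newline with a space.
print('this should be', end=' ', file=buffer)
print('on the same line', file=buffer)

result = buffer.getvalue()
sys.stdout.write(result)
```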

 


StringIO

 

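Introduction

 - An in-memory string buffer that supports multiple writes (Python 2 module StringIO; in Python 3 use io.StringIO)

Usage:

class StringIO.StringIO([buffer])

* If no string is given, the StringIO will start empty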

# Retrieve the entire contents

StringIO.getvalue()

# Free the memory buffer.

StringIO.close()

Example:

import StringIO

output = StringIO.StringIO()

# Write to the buffer several times
output.write('First line.\n')
print >> output, 'Second line.'

contents = output.getvalue()

output.close()
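
The Python 3 version of the same example uses io.StringIO, and print() takes the buffer through its file argument. A short sketch:

```python
import io

output = io.StringIO()

# Write to the buffer several times.
output.write('First line.\n')
print('Second line.', file=output)

contents = output.getvalue()     # read everything before closing
output.close()                   # frees the buffer; getvalue() fails afterwards

print(contents, end='')
```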

 

 
