Last updated: 2020-02-10
Introduction
String is a built-in class; the class name is str
Strings are immutable
Table of contents
- Concatenating strings
- " and '
- """ and '''
- \ line continuation
- raw string
- Converting numbers to strings
- Slicing and Indexing
- unicode
- String Function
- String attribute
- Placeholder
- Encode and Decode
- ascii type()
- newline char
- remove newline in string
- check empty line
- print without newline
- StringIO
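Concatenating strings:
s = 'a' + 'b' + 'c'
'+' does not convert types automatically; 'a' + 1 raises TypeError, so convert with str() first
" and '
- 'Let\'s go!'
- "Let's go!"
In Python, ' and " are interchangeable
""" and '''
They are used for multi-line text input
multi = """It was the best of times.
It was the worst of times."""
Multi-line:
If the first line would contain nothing but a line break ("\n" or "\r\n"), add a '\' right after the opening quotes to suppress it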
'''\
<?xml version="1.0"?>
<!DOCTYPE root SYSTEM "test" [ <!ENTITY tasty "parsnips"> ]>
<root>
<a>&tasty;</a>
</root>
'''
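The effect of the '\' after the opening quotes can be checked with repr(). The following is a minimal Python 3 sketch; the variable names and the shortened <root/> content are made up for illustration. It matters for the XML example above because an XML declaration has to be the very first thing in the document.

```python
# Without the backslash, the literal starts with a newline character.
with_newline = '''
<root/>
'''

# With the backslash, the line break after the opening quotes is skipped.
without_newline = '''\
<root/>
'''

print(repr(with_newline))
print(repr(without_newline))
```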
\ line continuation
m = 1 + 2 + 3 + \
4 + 5 + 6
raw string:
mystr = 'Let \n "s go!'
print mystr
# Let
# "s go!
raw = r'Let \n "s go!'
print raw
# Let \n "s go!
A "raw" string literal is prefixed by an 'r'
A 'u' prefix allows you to write a unicode string literal
int to string
str
- str(3)
- `3` <-- equivalent to repr(3) (Python 2 only; backticks were removed in Python 3)
chr
print chr(0x30) # 0
string to int
# Here age is a string object
age = "18"
print(age)
# Converting string to integer
int_age = int(age)
print(int_age)
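Left-padding a numeric string with "0"
Method 1: zfill()
zfill() returns a string of the given length, with the original string right-aligned and padded with 0 on the left
ns = str(n).zfill(digits)
e.g.
print '3'.zfill(5)
00003
Method 2:
print '%0Nd' % int   # N is the total width
e.g.
print '%05d' % 3
00003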
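The two methods above, plus the str.format() equivalent, can be compared side by side. This is a Python 3 sketch; n and digits are just sample values, and '%0*d' takes the width from the tuple instead of hard-coding it.

```python
n = 3
digits = 5

by_zfill = str(n).zfill(digits)                      # method 1
by_percent = '%0*d' % (digits, n)                    # method 2, width passed as an argument
by_format = '{:0{width}d}'.format(n, width=digits)   # str.format() variant

print(by_zfill, by_percent, by_format)

# All three keep the sign in front of the padding.
negative = str(-3).zfill(digits)
print(negative)
```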
Slicing and Indexing
"abcdefg"[1] # b
"abcdefg"[-1] # g
"abcdefg"[0:4] # abcd
"abcdefg"[-3:] # efg
"abcdefg"[:3] # abc
"abcdefg"[::2] # aceg <- 0,2,4,6
"abcdefg"[::-2] # geca
"a" * 5 # aaaaa
'P' in 'Python' # True
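The slice forms above all follow s[start:stop:step], where start is inclusive and stop is exclusive. A small Python 3 sketch (variable names are illustrative only) that also shows one difference worth remembering: an out-of-range slice is clipped silently, while an out-of-range index raises IndexError.

```python
s = "abcdefg"

middle = s[2:5]          # start inclusive, stop exclusive
every_other = s[1:6:2]   # indices 1, 3, 5
reversed_s = s[::-1]     # a negative step walks backwards
beyond = s[5:100]        # slices are clipped to the string length

index_failed = False
try:
    s[100]               # a single index is not clipped
except IndexError:
    index_failed = True

print(middle, every_other, reversed_s, beyond, index_failed)
```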
String Function
Basic
- len(s) # len() is a built-in function, not a string method
- s.lower()
- s.upper()
- s.strip() # strips leading and trailing whitespace
checking
- s.isalpha()
- s.isdigit()
- s.isspace()
- s.startswith('start') # True, if s starts with 'start'
- s.endswith('end')
search & replace
- s.find('other') # return -1 if not found
- s.replace('old', 'new' [,max]) # replaces at most max occurrences of 'old' in s with 'new'; by default all occurrences are replaced
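A short Python 3 sketch of find() and replace(), using a made-up sample string. It shows the -1 result for a missing substring, the effect of the optional count argument, and that the original string is left untouched because strings are immutable.

```python
s = 'one two one two one'

first_two = s.find('two')                  # index of the first match
missing = s.find('xyz')                    # -1 when nothing matches

replace_all = s.replace('one', '1')        # no count: every occurrence
replace_first = s.replace('one', '1', 1)   # count=1: only the first occurrence

print(first_two, missing)
print(replace_all)
print(replace_first)
print(s)                                   # unchanged
```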
split & join
s.split('delim') # 'aaa,bbb,ccc'.split(',') -> ['aaa', 'bbb', 'ccc']
s.join(list)
e.g.
', '.join("abc") # 'a, b, c'
'-'.join(['a1', 'b2', 'c3']) # 'a1-b2-c3'
Conversion
- chr(i) # return c
- ord(c) # return i
cmp comparison (Python 2 only; cmp() was removed in Python 3):
print cmp(2,1) # 1
print cmp(1,1) # 0
print cmp(1,2) # -1
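On Python 3, where the built-in cmp() no longer exists, the same three-way result can be rebuilt from the comparison operators. The helper below is a commonly used substitute, written here as a sketch rather than an official replacement.

```python
def cmp(a, b):
    """Return 1 if a > b, 0 if a == b, -1 if a < b."""
    # True and False behave as 1 and 0 in arithmetic.
    return (a > b) - (a < b)

print(cmp(2, 1))
print(cmp(1, 1))
print(cmp(1, 2))
print(cmp('a', 'b'))   # also works for strings
```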
String attribute
string.lowercase
string.uppercase
string.letters (lowercase + uppercase)
string.digits
string.hexdigits
string.printable
string.whitespace # space, tab, linefeed, return, formfeed, and vertical tab
e.g.
>>> import string
>>> string.lowercase[:14]
'abcdefghijklmn'
Placeholder
Methods
- %X
- str.format()
- f-strings
%X
- %d # int
- %s # string
- %f # floating point
e.g.
>>> format = "Hello, %s. %s enough for ya?"
>>> values = ('world', 'Hot')
>>> print format % values
Hello, world. Hot enough for ya?
Width and Precision
print '%.3s' % 'abcdefg' # abc
print '%.*s' % (3, 'abcdefg') # abc
e.g.
from math import pi
"%10.3f" % pi    # width 10, 3 decimal places: '     3.142'
'%010.3f' % pi   # zero-padded: '000003.142'
'%-10.2f' % pi   # left-aligned: '3.14      '
%s left align
%-6s
str.format()
print("My name is {}.".format("Datahunter"))
# Multiple placeholders
print('{0} : {1}, {2}'.format("string0", "string1", "string2"))
# Variable substitutions
print("{who} name is {name}.".format(name="Datahunter", who="My"))
# Value formatting
print('{:.2f}'.format(12345.12345))
print('{0:.2f}'.format(12345.12345))
import datetime
today = datetime.datetime.today()
print("{t:%B %d, %Y}".format(t=today))
# Padding Substitutions
< left-align text in the field
^ center text in the field
> right-align text in the field
# Dictionary for string formatting
acl = {"uid": 0, "username": "datahunter", "is_admin": "True"}
print('{uid}, {username}: {is_admin}'.format(**acl))
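* Use the string "True" rather than a boolean: once a format spec such as {is_admin:>5} is applied, a boolean True is formatted as the integer 1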
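The alignment specifiers (<, ^, >) listed above take a field width, and optionally a fill character in front. A minimal Python 3 sketch with made-up values follows; the last two lines illustrate how a boolean behaves under a format spec, and the !s conversion is one possible workaround.

```python
left = '{:<10}|'.format('ab')      # left-align in a field of 10
center = '{:^10}|'.format('ab')    # center
right = '{:>10}|'.format('ab')     # right-align
starred = '{:*^10}'.format('ab')   # '*' as the fill character

# With a format spec, bool falls back to integer formatting.
bool_as_int = '{:>5}'.format(True)
# !s converts to str first, so the text "True" is padded instead.
bool_as_text = '{!s:>5}'.format(True)

for value in (left, center, right, starred, bool_as_int, bool_as_text):
    print(repr(value))
```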
f-strings
Introduced by PEP 498 (Python 3.6); they provide the same functionality as str.format()
e.g. API URL of netdata
api_url = "http://localhost:19999/api/v1/data?chart="
chart = "net.ens192"
after = 86400
url = f'{api_url}{chart}&after=-{after}'
e.g. datetime format
import datetime
today = datetime.datetime.today()
print(f"{today:%B %d, %Y}")
Out: "February 20, 2024"
Unicode
unicode type
* built-in types
unicode(string[, encoding, errors])
string = 8-bit strings
>>> unicode('abcdef')
u'abcdef'
Equivalent to
>>> u'abcdef'
Different len() results
<1>
# len() counts the length of the byte sequence, not the number of characters
# Tell Python that the source code is encoded in Big5
# coding=Big5
text = '測試'
print len(text)   # 4
<2>
# len() returns the length of a unicode instance
# coding=Big5
text = u'測試'
print len(text)   # 2
<3>
# coding=utf-8
text = '中文'
print len(text)   # 6 (each of these characters takes 3 bytes in UTF-8)
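The three cases above are Python 2 behaviour, where a plain '...' literal is a byte string. In Python 3 a str is always a sequence of characters, and the byte counts only show up after an explicit encode(). A small sketch, with illustrative variable names:

```python
text = '中文'

chars = len(text)                         # number of characters
utf8_bytes = len(text.encode('utf-8'))    # 3 bytes per character here
big5_bytes = len(text.encode('big5'))     # 2 bytes per character here

print(chars, utf8_bytes, big5_bytes)
```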
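Reading a UTF-8 file
# Read a UTF-8 text file; the encoding option can be e.g. Big5 or UTF-8
# (the encoding argument needs Python 3; on Python 2 use io.open or codecs.open)
file = open(name, 'r', encoding='UTF-8')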
Encode and Decode
codecs - base classes for standard Python codecs
# When Python converts between encodings
source encoding -> internal encoding -> target encoding
* The default source encoding is 'ascii'
* Internally, unicode is used
Python has two kinds of unicode builds
- UCS-2 (65536)
- UCS-4 (2147483648)
* Chosen at build time with --enable-unicode=ucs2 or --enable-unicode=ucs4
# Check whether the build uses UCS-2 or UCS-4
import sys
print sys.maxunicode
Output
# Linux
1114111
# Windows (UCS-2)
65535
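decode and encode
At the codec level (the object returned by codecs.lookup()), both take the input object and return a tuple (output, length consumed); the decode() / encode() methods on strings return only the converted object
decode: original encoding -> Python internal encoding
converts a plain string encoded using a particular character set encoding to a Unicode object
encode: Python internal encoding -> original encoding
encoding converts a Unicode object to a plain string using a particular character set encoding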
Usage
decode([codec])
codec = ascii, utf-8, big5 ... # the default is ascii
Using the wrong codec raises
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe4 in position 0: ordinal not in range(128)
# Decode mystr from utf-8 into the internal (unicode) string
newstr = mystr.decode('utf-8')
# Encode the internal string as big5 for output
newstr.encode('big5')
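The same source -> internal -> target flow in Python 3, where the byte string is a bytes object and the internal form is str. This is a sketch under the assumption that the input is the UTF-8 encoding of '中'; the variable names are illustrative.

```python
raw_utf8 = '中'.encode('utf-8')      # stand-in for bytes read from a file or socket

text = raw_utf8.decode('utf-8')      # source encoding -> internal str
raw_big5 = text.encode('big5')       # internal str -> target encoding

# Decoding with the wrong codec fails instead of guessing.
error_message = None
try:
    raw_utf8.decode('ascii')
except UnicodeDecodeError as exc:
    error_message = str(exc)

print(len(raw_utf8), len(text), len(raw_big5))
print(error_message)
```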
big5 -> len of the internal encoding
#!/usr/bin/env python
# coding=big5
mytext = "中"
# 2 (one Big5 character occupies 2 bytes)
print len(mytext)
mybig5 = mytext.decode("big5")
# 1
print len(mybig5)
"codecs" Usage
import codecs
codecs.encode(obj[, encoding[, errors]])
codecs.decode(obj[, encoding[, errors]])
usage:
#!/usr/bin/env python
# coding=big5
import codecs
mytext = "中"
# 2
print len(mytext)
big5toutf8 = codecs.lookup("big5")
# (u'\u4e2d', 2)
# (data, len)
print big5toutf8.decode(mytext)
Use codecs.open() to open a file; its contents are converted to internal unicode automatically on read
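The (data, len) tuple from codecs.lookup() can be reproduced in Python 3 as well. A minimal sketch, assuming Big5 input bytes produced by encoding '中'; it also shows that the module-level codecs.decode() returns only the data.

```python
import codecs

raw_big5 = '中'.encode('big5')             # sample Big5 bytes

big5 = codecs.lookup('big5')               # CodecInfo object
decoded, consumed = big5.decode(raw_big5)  # (data, number of bytes consumed)
encoded, _ = big5.encode(decoded)          # back to Big5 bytes

plain = codecs.decode(raw_big5, 'big5')    # returns the data only, no tuple

print(repr(decoded), consumed)
print(repr(plain))
```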
ascii type()
<1>
#coding=utf-8
mytext = u"中文"
def isEng(msg):
    lang = "en"
    for n in msg:
        if ord(n) > 127:
            lang = "zh"
    return lang
print isEng(mytext)
def is_ascii(s):
    return all(ord(c) < 128 for c in s)
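A Python 3 take on the same check. is_ascii() is the function from above; is_ascii_by_encoding() is a hypothetical helper added for comparison, and str.isascii() is only available from Python 3.7.

```python
def is_ascii(s):
    # Every code point must be below 128.
    return all(ord(c) < 128 for c in s)


def is_ascii_by_encoding(s):
    # Hypothetical alternative: let the ascii codec decide.
    try:
        s.encode('ascii')
    except UnicodeEncodeError:
        return False
    return True


for sample in ('abc', '中文', 'abc中'):
    print(sample, is_ascii(sample), is_ascii_by_encoding(sample))
```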
<2>
In Python 2, a string may be of type str or of type unicode
def whatisthis(s):
    if isinstance(s, str):
        print "ordinary string"
    elif isinstance(s, unicode):
        print "unicode string"
    else:
        print "not a string"
# int
x = 5
print isinstance(x, int)
# ordinary string
y = '測試'
print isinstance(y, str)
# unicode string
z = u'測試'
print isinstance(z, unicode)
# str and unicode
print isinstance(y, basestring)
==================
type(s)
One will say unicode, the other will say str.
text = '測試'
print len(text) # 6
text = u'測試'
print len(text) # 2
type(s)
<type 'unicode'>
<type 'str'>
==================
s.decode('ascii')
# if it raises an exception, the string is not 100% ASCII.
s.decode('ascii')
ascii_*
>>> import string
>>> string.ascii_lowercase[:14]
'abcdefghijklmn'
>>> string.ascii_lowercase[:14:2]
'acegikm'
Newline char
import os
print "line 1" + os.linesep + "line 2"
Remove newline in string
>>> 'Mac EOL\r'.rstrip('\r\n')
'Mac EOL'
>>> 'Windows EOL\r\n'.rstrip('\r\n')
'Windows EOL'
>>> 'Unix EOL\n'.rstrip('\r\n')
'Unix EOL'
Using '\r\n' as the parameter to rstrip means that it will strip out any trailing combination of '\r' or '\n'.
That's why it works in all three cases above.
Check empty line
f = open(myfile)
for line in f.readlines():
    if not line.strip():
        continue
    else:
        print line
What line.strip() does is remove whitespace from both ends of the string.
If the line is completely empty, or contains only spaces and tabs,
line.strip() returns an empty string '', which evaluates to false.
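The same filter as a Python 3 sketch. An io.StringIO object with made-up content stands in for the opened file, so the example runs without touching the disk; iterating over a real file object works the same way.

```python
import io

# Stand-in for open(myfile): one empty line and one whitespace-only line.
sample = io.StringIO("first\n\n   \t\nsecond\n")

# Keep only lines that still have content after strip().
non_empty = [line.rstrip('\r\n') for line in sample if line.strip()]

print(non_empty)
```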
Print without newline
import sys
sys.stdout.write('some thing')
sys.stdout.flush()
OR
print "this should be",
print "on the same line"
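The trailing-comma form is Python 2 syntax. In Python 3, print() takes an end argument instead. The sketch below redirects stdout into a buffer only so that the result can be inspected; in normal use the two print() calls are all that is needed.

```python
import contextlib
import io

buffer = io.StringIO()
with contextlib.redirect_stdout(buffer):
    print("this should be", end=" ")   # end replaces the default newline
    print("on the same line")

captured = buffer.getvalue()
print(repr(captured))
```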
StringIO
Introduction
- Supports writing to an in-memory string buffer multiple times
Usage:
class StringIO.StringIO([buffer])
* If no string is given, the StringIO will start empty
# Retrieve the entire contents
StringIO.getvalue()
# Free the memory buffer.
StringIO.close()
Example:
import StringIO
output = StringIO.StringIO()
# Write to it multiple times
output.write('First line.\n')
print >> output, 'Second line.'
contents = output.getvalue()
output.close()
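The StringIO module is Python 2 only. In Python 3 the class lives in the io module, and print >> output becomes print(..., file=output). A minimal sketch of the same example:

```python
import io

output = io.StringIO()

# Write to the buffer several times.
output.write('First line.\n')
print('Second line.', file=output)

contents = output.getvalue()   # must be read before close()
output.close()

print(repr(contents))
```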